Sambar in N-Dimensional Space

You must be wondering: what does sambar have to do with N-dimensional space? Here's a question. You taste sambar. Your mouth explodes with a variety of flavours. If I were to tell you that in the 4th minute I added jeera, in the 11th minute I added drumsticks, and in the 25th minute homemade sambar powder went in, you may not be interested. You just want to relish!

That's how we are today in the world of LLMs. We just want to use them. What happens inside? Your guess is as good as mine.

LLMs are powered by Transformers. A Transformer doesn't read a sentence left to right. It inhales the entire sentence at once the way one's nose inhales tadka.

Every word gets flung into N-dimensional space as a vector. Onion acquires coordinates: pungency (9/10), crunch (8/10), tear-jerk quotient (10/10).
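
If you want to see what that looks like in code, here's a minimal sketch in Python with NumPy. The feature axes and the numbers are entirely made up for illustration; real embeddings are learned, and their dimensions don't come with tidy human labels like "pungency".

```python
import numpy as np

# A toy "embedding": onion as a point in a (very small) N-dimensional space.
# The axes and scores are invented; real embedding dimensions are learned
# during training and carry no human-readable meaning.
onion = np.array([0.9, 0.8, 1.0])      # pungency, crunch, tear-jerk quotient
pumpkin = np.array([0.2, 0.3, 0.0])    # same axes, different coordinates

# Ingredients that "taste" similar end up close together; cosine similarity
# is one common way to measure that closeness.
cosine = onion @ pumpkin / (np.linalg.norm(onion) * np.linalg.norm(pumpkin))
print(f"onion vs pumpkin similarity: {cosine:.2f}")
```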

Attention: Every token turns to every other token and asks: How relevant are you to me, right now? Just like the drumstick locks eyes with the pumpkin across a crowded pot and asks, how much attention should I give you?

Three vectors broker the conversation: Query, Key, Value (Q,K,V)

  • Query (drumstick): Who complements my flavour?
  • Key (pumpkin): I do, sweetness against your earthiness
  • Value (pumpkin): Here's exactly what I bring

Query meets Key. Millions of matrix multiplications erupt like ghee hitting a hot pan. Out comes an attention score: these two belong together.
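
For the curious, here's roughly what "Query meets Key" boils down to, sketched in NumPy with toy vectors I've invented. In a real Transformer, Q, K and V come from multiplying each token's embedding by three learned weight matrices.

```python
import numpy as np

d = 4  # dimension of the query/key vectors (tiny, for illustration)

# Toy query and key vectors; in a real model these are produced by
# learned projection matrices applied to the token embeddings.
q_drumstick = np.array([0.9, 0.1, 0.7, 0.3])   # "who complements my flavour?"
k_pumpkin   = np.array([0.8, 0.2, 0.6, 0.4])   # "I do, sweetness vs earthiness"

# The attention score is a scaled dot product: how well this key answers
# this query. Dividing by sqrt(d) keeps the scores in a reasonable range.
score = (q_drumstick @ k_pumpkin) / np.sqrt(d)
print(f"drumstick -> pumpkin attention score: {score:.2f}")
```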

Self-Attention: Every ingredient tastes every other. It means every token attends to every other token. Tamarind consults jaggery, curry leaf interrogates mustard seeds, toor dal checks in with every single ingredient. Flavour democracy.
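
Here's a sketch of "everyone attends to everyone", again with invented numbers: stack the queries and keys for a few ingredients, and one matrix multiplication produces every pairwise score at once.

```python
import numpy as np

ingredients = ["tamarind", "jaggery", "curry leaf", "mustard seed", "toor dal"]
d = 4

rng = np.random.default_rng(0)
Q = rng.normal(size=(len(ingredients), d))  # one query vector per token
K = rng.normal(size=(len(ingredients), d))  # one key vector per token

# Self-attention scores: row i holds how much token i cares about every
# other token (including itself). One matmul, all pairs at once.
scores = Q @ K.T / np.sqrt(d)
print(scores.shape)   # (5, 5): every ingredient scored against every other
```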

Softmax: The decider. But the pot is chaos with every ingredient shouting its relevance at once. Enter Softmax, the quiet, ruthless function that converts raw scores into probabilities summing to exactly one. Not a democracy of equals. A weighted verdict.

The drumstick doesn't get the same say as the tamarind. Softmax decides who matters most, right now, in this context.
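
And softmax itself is only a few lines: exponentiate the raw scores and normalise so they sum to exactly one. The scores below are made up, just to show the weighted verdict.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability, then normalise to sum to 1.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Raw relevance scores the drumstick assigned to its pot-mates (invented).
raw = np.array([2.1, 0.3, -1.0, 1.2])   # tamarind, pumpkin, chilli, onion
weights = softmax(raw)

print(weights)        # roughly [0.62, 0.10, 0.03, 0.25] -- a weighted verdict
print(weights.sum())  # 1.0: not a democracy of equals
```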

The Black Box, or the 44th Minute

I know the sequence. Oil, mustard seeds, curry leaves, dried chillies, hing. I have done this a thousand times. But what happens at the 44th minute when the asafoetida has been simmering quietly in its undisclosed dimensions, and the sambar crosses some invisible threshold from good to transcendent, I cannot explain. My mother-in-law cannot explain it. Her mother couldn't either.

You may be an expert in training LLMs. You can explain Transformers, derive backpropagation, tune hyperparameters across a thousand GPUs. But can you tell me what actually happens in the black box? That, my friends, is how LLMs work!