Multi-Token Generation (Predictions)

Generation begins: the token "then" queries the context to predict the next token.

Autoregressive token prediction (no caching):

[Figure: context tokens "John" and "sat" each have key/value vectors (K1/V1, K2/V2, values elided); the query token "then" produces Q3 = [0.31, 0.92, 0.56] and attends over the context.]
🔄 Autoregressive Generation
• Context length: 2 tokens ("John", "sat")
• Query token: "then" computes attention over the full context
• No caching: key and value vectors are recomputed at every generation step
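The recomputation described above can be sketched as follows. This is a minimal single-head toy example, assuming random stand-in embeddings and hypothetical projection weights (`W_q`, `W_k`, `W_v` are illustrative, not from the original figure):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy embedding / head dimension

# Hypothetical toy projection matrices (illustrative only)
W_q = rng.standard_normal((d, d))
W_k = rng.standard_normal((d, d))
W_v = rng.standard_normal((d, d))

def attend_no_cache(context_embeds, query_embed):
    """One attention step WITHOUT caching: K and V for the entire
    context are recomputed from scratch on every call."""
    K = context_embeds @ W_k          # recomputed each generation step
    V = context_embeds @ W_v          # recomputed each generation step
    q = query_embed @ W_q
    scores = K @ q / np.sqrt(d)       # scaled dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()          # softmax over the context
    return weights @ V                # attention output for the query token

# Context: "John", "sat"; query: "then" (embeddings are random stand-ins)
context = rng.standard_normal((2, d))
query = rng.standard_normal(d)
out = attend_no_cache(context, query)
print(out.shape)  # (4,)
```

Each generation step calls `attend_no_cache` with a context that is one token longer, so the K/V projections for all earlier tokens are redone every time; a KV cache avoids exactly this repeated work.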