Multi-Token Generation (Predictions)
[Interactive figure: autoregressive token prediction with no caching. The context tokens "John" (K1, V1) and "sat" (K2, V2) are shown with their key and value vectors, alongside the query token "then" (Q3), which is starting generation by querying for the next-token prediction.]
Autoregressive Generation
• Context length: 2 tokens
• Query token: "then" computes attention over the full context
• No caching: key and value vectors are recomputed at every generation step (see the sketch below)
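The recomputation cost is easiest to see in code. The following is a minimal sketch, assuming toy projection weights `W_q`, `W_k`, `W_v` and a single softmax attention head; all names are illustrative rather than taken from any particular library.

```python
import numpy as np

d = 4                      # toy head dimension (illustrative)
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

def attend(context, x_new):
    """One generation step: the new token's query attends over the full context.

    Because nothing is cached, the keys and values for every context token
    ("John", "sat", ...) are recomputed from scratch on every call.
    """
    K = context @ W_k            # (t, d) keys, recomputed each step
    V = context @ W_v            # (t, d) values, recomputed each step
    q = x_new @ W_q              # (d,) query for the new token, e.g. "then"
    scores = K @ q / np.sqrt(d)  # scaled dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()     # softmax over the 2-token context
    return weights @ V           # attention output for the new token

# Usage: a 2-token context ("John", "sat") and the query token "then".
context = rng.standard_normal((2, d))  # stand-ins for the token embeddings
x_then = rng.standard_normal(d)
out = attend(context, x_then)          # output used to predict the next token
```

Each new token repeats the `context @ W_k` and `context @ W_v` projections over the whole growing context, which is exactly the redundant work a KV cache avoids.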