Multi-Token Generation (With KV Cache)
Generation begins: the new token "then" issues a query to predict the next token.
Autoregressive token prediction (with cache):
"John"
K1
[--]
[--]
[--]
V1
[--]
[--]
[--]
"sat"
K2
[--]
[--]
[--]
V2
[--]
[--]
[--]
"then"
Q3
[0.31]
[0.92]
[0.56]
⚡
Autoregressive Generation with KV Cache
• Context length: 2 tokens
• Query token: "then" computes attention over the cached context
• KV caching enabled: key and value vectors are computed once and reused (a runnable sketch follows below)
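To make the caching step concrete, here is a minimal NumPy sketch of single-head attention with a KV cache. It is illustrative only: the names (d_model, W_q, kv_cache, attend_one_step), the random projection matrices, and the toy embeddings are assumptions, not part of any particular model or library. The point it demonstrates is the one in the diagram: at each step only the new token's query, key, and value are projected, and earlier keys and values are read back from the cache.

```python
# Minimal single-head attention with a KV cache (NumPy sketch).
# All names here (d_model, W_q, kv_cache, attend_one_step) are illustrative
# assumptions, not taken from any particular library or model.
import numpy as np

d_model = 4
rng = np.random.default_rng(0)

# Fixed random projection matrices stand in for learned weights.
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_one_step(x_new, kv_cache):
    """Process one new token embedding, reusing cached K/V from earlier steps."""
    q = x_new @ W_q                    # only the new token's query is computed
    kv_cache["K"].append(x_new @ W_k)  # its key/value are computed once ...
    kv_cache["V"].append(x_new @ W_v)  # ... and stay in the cache for later steps
    K = np.stack(kv_cache["K"])        # (context_len, d_model)
    V = np.stack(kv_cache["V"])
    weights = softmax(K @ q / np.sqrt(d_model))
    return weights @ V                 # attention output for the new token

kv_cache = {"K": [], "V": []}
for token in ["John", "sat", "then"]:  # toy random embeddings stand in for real ones
    x = rng.standard_normal(d_model)
    out = attend_one_step(x, kv_cache)

print(len(kv_cache["K"]))              # 3: one cached key vector per processed token
```

At the "then" step the cache already holds the key/value pairs for "John" and "sat", so only three new projections (Q3, K3, V3) are needed instead of recomputing keys and values for the whole sequence.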