Multi-Token Generation (With KV Cache)

Starting generation: the token "then" issues a query for next-token prediction.

Autoregressive token prediction (with cache):

[Figure: the KV cache holds the key/value pairs for the context tokens, K1/V1 for "John" and K2/V2 for "sat" (values not shown); the new token "then" contributes only a query vector Q3 = [0.31, 0.92, 0.56], which attends over the cached entries.]
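
A minimal sketch of the attention step the figure depicts: the query vector for "then" (values taken from the figure) is scored against the cached key vectors for "John" and "sat", and the resulting softmax weights mix the cached value vectors. The key/value numbers here are assumed placeholders, since the figure does not show them.

```python
import numpy as np

# Cached key/value vectors for the context tokens "John" and "sat".
# (Placeholder values -- the figure does not show the actual numbers.)
K_cache = np.array([[0.10, 0.40, 0.70],   # K1 for "John"
                    [0.20, 0.50, 0.30]])  # K2 for "sat"
V_cache = np.array([[0.90, 0.10, 0.20],   # V1 for "John"
                    [0.30, 0.80, 0.60]])  # V2 for "sat"

# Query vector for the new token "then" (values from the figure).
q3 = np.array([0.31, 0.92, 0.56])

d_k = q3.shape[0]
scores = K_cache @ q3 / np.sqrt(d_k)             # one score per cached token
weights = np.exp(scores) / np.exp(scores).sum()  # softmax over the 2-token context
attn_out = weights @ V_cache                     # weighted mix of cached values

print(weights)   # attention of "then" over ["John", "sat"]
print(attn_out)  # context vector used to predict the next token
```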
Autoregressive Generation with KV Cache
• Context length: 2 tokens
• Query token: "then", which attends over the cached context
• KV caching enabled: key and value vectors are computed once and reused (see the loop sketch after this list)
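
A sketch of the generation loop under stated assumptions: `W_q`, `W_k`, and `W_v` are hypothetical stand-ins for the model's learned projections, and `attend`/`step` are illustrative helpers, not a real library API. The point it shows is that each token's key and value are computed exactly once, appended to the cache, and reused at every later step, so only the new query is recomputed.

```python
import numpy as np

d_model = 3
rng = np.random.default_rng(0)

# Stand-ins for the model's learned projection matrices (hypothetical).
W_q = rng.standard_normal((d_model, d_model))
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

def attend(q, K, V):
    """Scaled dot-product attention of one query over the cached keys/values."""
    scores = K @ q / np.sqrt(d_model)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

K_cache, V_cache = [], []

def step(x):
    """Process one new token embedding x, reusing the cache for all prior tokens."""
    # K and V for this token are computed once, then cached for later steps.
    K_cache.append(W_k @ x)
    V_cache.append(W_v @ x)
    q = W_q @ x
    return attend(q, np.stack(K_cache), np.stack(V_cache))

# Prefill with the context tokens ("John", "sat"), then one step for "then".
for emb in [rng.standard_normal(d_model),   # "John"
            rng.standard_normal(d_model),   # "sat"
            rng.standard_normal(d_model)]:  # "then"
    out = step(emb)

print(out)  # attention output for "then" over the 2-token cached context
```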