Attention Distribution

Full n×n attention distribution over the sequence "John sat then he ran", with causal masking applied. Each row shows how one token attends to itself and to all preceding tokens; future positions are masked out, as required for autoregressive decoding. Note, for example, that "he" attends most strongly to "John" (0.72), the pronoun's antecedent.
"John"
"sat"
"then"
"he"
"ran"
"John"
1.00
JohnJohn: 1.000
"sat"
0.64
satJohn: 0.640
0.36
satsat: 0.360
"then"
0.45
thenJohn: 0.450
0.32
thensat: 0.320
0.23
thenthen: 0.230
"he"
0.72
heJohn: 0.720
0.15
hesat: 0.150
0.08
hethen: 0.080
0.05
hehe: 0.050
"ran"
0.22
ranJohn: 0.220
0.28
ransat: 0.280
0.25
ranthen: 0.250
0.15
ranhe: 0.150
0.10
ranran: 0.100
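As a minimal sketch of how such a distribution is produced, the snippet below applies a causal mask to a matrix of raw query-key scores and takes a row-wise softmax, assuming single-head attention with pre-scaled scores. The random scores here are illustrative only; they are not the scores that produced the table above.

```python
import numpy as np

def causal_attention_weights(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax over raw attention scores with a causal mask.

    scores: (n, n) matrix of (already scaled) query-key dot products.
    Returns an (n, n) matrix where row i is a distribution over keys 0..i.
    """
    n = scores.shape[0]
    # Causal mask: position i may attend only to positions j <= i,
    # so the strict upper triangle (future positions) is masked.
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    masked = np.where(future, -np.inf, scores)
    # Numerically stable softmax per row; exp(-inf) makes masked cells exactly 0.
    masked -= masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)

tokens = ["John", "sat", "then", "he", "ran"]
rng = np.random.default_rng(0)
scores = rng.normal(size=(len(tokens), len(tokens)))  # illustrative, not from a trained model
A = causal_attention_weights(scores)
print(np.round(A, 2))  # lower-triangular; each row sums to 1.0
```

Because the mask sets future scores to negative infinity before the softmax, the resulting matrix is lower-triangular by construction, matching the zeroed upper triangle in the table.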