Attention & Transformers

The single mechanism that reshaped deep learning

Lessons (3)

  1. Self-Attention

     Queries, keys, values: derived and animated. (See the first sketch after this list.)

     Difficulty: Hard
  2. Multi-Head Self-Attention

     Parallel attention heads specializing in different patterns. (See the second sketch after this list.)

     Difficulty: Hard
  3. Transformer Block

     Attention + MLP + norms + residuals: one layer. (See the third sketch after this list.)

     Difficulty: Hard
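
As a preview of Lesson 01, here is a minimal NumPy sketch of scaled dot-product self-attention. The weight matrices Wq, Wk, Wv and the toy shapes are illustrative assumptions, not the lesson's actual code.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q = X @ Wq                       # queries  (seq_len, d_k)
    K = X @ Wk                       # keys     (seq_len, d_k)
    V = X @ Wv                       # values   (seq_len, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise query-key similarity, scaled by sqrt(d_k)
    # softmax over the key axis so each row of weights sums to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # each position gets a weighted mix of all values

# Toy example (assumed shapes): 4 tokens, model width 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```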
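
For Lesson 02, a sketch of multi-head self-attention under the same assumptions: the model width is split across heads, each head attends independently in its own subspace, and an assumed output projection Wo mixes the concatenated heads.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Multi-head self-attention: split d_model into n_heads parallel subspaces."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # reshape to (n_heads, seq_len, d_head) so each head attends independently
    split = lambda M: M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    Q, K, V = split(Q), split(K), split(V)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, seq, seq)
    out = softmax(scores) @ V                            # (n_heads, seq, d_head)
    # concatenate the heads, then mix them with the output projection
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

rng = np.random.default_rng(0)
d_model, n_heads = 8, 2
X = rng.normal(size=(4, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
print(multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads).shape)  # (4, 8)
```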
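
For Lesson 03, a sketch of one transformer block. Assumptions made here for brevity: the pre-norm layout (x + Attn(LN(x)), then x + MLP(LN(x))), single-head attention, a ReLU MLP widened 4x, and layer norm without learned gain or bias; the lesson's own layer may differ.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each position across the feature axis (no learned gain/bias here)
    return (x - x.mean(axis=-1, keepdims=True)) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(X, Wq, Wk, Wv):
    # single-head self-attention, an assumed simplification of the lesson's multi-head version
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def transformer_block(X, params):
    """One pre-norm transformer layer: x + Attn(LN(x)), then x + MLP(LN(x))."""
    Wq, Wk, Wv, W1, b1, W2, b2 = params
    X = X + attention(layer_norm(X), Wq, Wk, Wv)  # attention sublayer + residual
    h = np.maximum(0, layer_norm(X) @ W1 + b1)    # ReLU MLP, hidden width 4 * d
    return X + h @ W2 + b2                        # MLP sublayer + residual

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(4, d))
params = (rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=(d, d)),
          rng.normal(size=(d, 4 * d)), np.zeros(4 * d),
          rng.normal(size=(4 * d, d)), np.zeros(d))
print(transformer_block(X, params).shape)  # (4, 8)
```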