Build GPT
A working GPT, built lesson by lesson
10 lessons
Lessons
- 01 · Tokenizer (Byte Pair Encoding) · Hard
  BPE from scratch — merges, vocabs, edge cases.
- 02 · Build Vocabulary · Medium
  Train a BPE vocab on real text.
- 03 · Tokenization Edge Cases · Medium
  Whitespace, unicode, emoji, and the quirks of tiktoken.
- 04 · GPT Data Loader · Medium
  Streaming tokens into the model efficiently.
- 05 · GPT Dataset · Medium
  Context windows, next-token targets, packing.
- 06 · Code GPT · Hard
  Assemble the full GPT architecture.
- 07 · Train Your GPT · Hard
  AdamW, warmup, cosine decay — the real recipe.
- 08 · Make GPT Talk Back · Medium
  Sampling: temperature, top-k, nucleus.
- 09 · KV-Cache · Hard
  The single trick behind fast inference.
- 10 · Grouped Query Attention · Hard
  Llama-style attention: memory savings without accuracy loss.
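The course opens with byte-pair encoding, so as a taste of lesson 01, here is a minimal sketch of one BPE training loop: count adjacent pairs, merge the most frequent pair into a new token ID, repeat. The toy corpus and the number of merges are illustrative choices, not taken from the course materials.

```python
from collections import Counter

def get_pair_counts(tokens):
    """Count adjacent symbol pairs in a token sequence."""
    return Counter(zip(tokens, tokens[1:]))

def merge_pair(tokens, pair, new_token):
    """Replace every non-overlapping occurrence of `pair` with `new_token`."""
    out, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

# Toy training run: start from raw bytes, learn 3 merges.
text = b"low lower lowest"          # illustrative corpus
tokens = list(text)                 # byte IDs 0..255 are the base vocab
merges = {}                         # (left, right) -> new token ID
next_id = 256
for _ in range(3):
    counts = get_pair_counts(tokens)
    if not counts:
        break
    best = max(counts, key=counts.get)   # most frequent adjacent pair
    merges[best] = next_id
    tokens = merge_pair(tokens, best, next_id)
    next_id += 1
```

After training, `merges` is applied in the same order at encode time; real tokenizers (e.g. tiktoken) add byte-level pretokenization and special-token handling on top of this core loop.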