Inference & Serving
Ship the model: make it fast, cheap, and production-ready.
5 lessons
Lessons
- 01 Quantization Basics
  Why lower precision is an (almost) free lunch. (Medium)
- 02 INT8 & INT4 Quantization
  Post-training quantization and QAT, in detail. (Hard)
- 03 Speculative Decoding
  A small model drafts, the big model verifies: 2-3× faster. (Hard)
- 04 Continuous Batching
  The throughput trick that makes production LLMs economical. (Hard)
- 05 Paged Attention
  vLLM's virtual-memory-inspired KV cache. (Hard)
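A taste of Lesson 01: the "(almost) free lunch" of quantization comes down to mapping floats onto a small integer range with a scale factor, losing only a bounded rounding error. A minimal sketch with made-up weights, assuming symmetric per-tensor INT8 quantization:

```python
# Minimal symmetric INT8 quantization sketch. The weights are hypothetical;
# real systems quantize per-channel or per-group, but the round-trip idea
# is the same.

def quantize_int8(weights):
    """Map floats to int8 values with a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the int8 values."""
    return [x * scale for x in q]

weights = [0.82, -1.27, 0.003, 0.5]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
# Worst-case round-trip error is about half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
```

Note the small weight 0.003 rounds all the way to 0: outliers stretch the scale and crush small values, which is exactly why the lunch is only *almost* free and why Lesson 02's finer-grained schemes exist.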
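The draft-then-verify loop described in Lesson 03 can be sketched with toy greedy models. `draft_model` and `target_model` are hypothetical stand-ins, each mapping a token sequence to its predicted next token; the speedup in a real system comes from the target scoring all k drafted positions in one parallel pass.

```python
# Toy speculative decoding sketch with deterministic (greedy) toy models.
# Model names and the cycle-based toy data are illustrative, not any
# library's API.

def speculative_decode(target_model, draft_model, prompt, k=4, steps=8):
    """Draft k tokens cheaply, keep the longest prefix the target agrees with."""
    tokens = list(prompt)
    while len(tokens) < len(prompt) + steps:
        # 1) Draft: the small model proposes k tokens autoregressively.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify: the target checks each drafted position in order and
        #    accepts the longest matching prefix (parallel in real systems).
        accepted, ctx = 0, list(tokens)
        for t in draft:
            if target_model(ctx) != t:
                break
            accepted += 1
            ctx.append(t)
        tokens.extend(draft[:accepted])
        # 3) Emit one token from the target itself, so every iteration
        #    makes progress even when the first drafted token is rejected.
        tokens.append(target_model(tokens))
    return tokens[len(prompt):]

# Toy models: the target repeats a fixed cycle; the draft usually agrees.
cycle = [1, 2, 3, 4]
target = lambda ctx: cycle[len(ctx) % 4]
draft = lambda ctx: cycle[len(ctx) % 4] if len(ctx) % 7 else 0  # sometimes wrong
out = speculative_decode(target, draft, prompt=[0], k=4, steps=8)
```

When the draft agrees, one verify pass yields up to k+1 tokens instead of 1, which is where the 2-3× figure comes from; a wrong draft costs only the rejected suffix.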
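Lesson 04's throughput trick is scheduling at the iteration level: rather than waiting for an entire batch to drain, the server retires finished requests and admits waiting ones at every decode step. A toy scheduler sketch, with made-up request lengths:

```python
# Toy continuous (iteration-level) batching sketch. Request ids and token
# counts are illustrative; a real scheduler also tracks KV-cache memory.

from collections import deque

def continuous_batching(requests, max_batch):
    """requests: list of (id, tokens_to_generate). Returns per-step batch ids."""
    waiting = deque(requests)
    running = {}                        # id -> tokens still to generate
    trace = []
    while waiting or running:
        # Admit new requests into free slots (static batching would instead
        # wait for the whole batch to finish before admitting anyone).
        while waiting and len(running) < max_batch:
            rid, need = waiting.popleft()
            running[rid] = need
        trace.append(sorted(running))
        # One decode step produces one token for every running request.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:       # finished: its slot frees immediately
                del running[rid]
    return trace

trace = continuous_batching([("a", 2), ("b", 4), ("c", 1)], max_batch=2)
```

Here "c" starts as soon as "a" finishes instead of idling until "b" is done too, so the GPU stays at full batch occupancy nearly the whole time.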
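The virtual-memory analogy from Lesson 05 can be sketched as a block table: each sequence's KV cache lives in fixed-size blocks drawn from a shared pool, so memory is allocated on demand instead of reserved for the maximum length. Class and method names below are illustrative, not vLLM's actual API.

```python
# Minimal paged KV-cache bookkeeping sketch (the idea behind PagedAttention).
# Only the block-table accounting is shown; the actual K/V tensors are omitted.

BLOCK_SIZE = 4  # tokens per physical block (illustrative)

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # pool of physical block ids
        self.tables = {}                     # seq_id -> list of physical blocks
        self.lengths = {}                    # seq_id -> tokens stored

    def append_token(self, seq_id):
        """Reserve cache space for one new token of `seq_id`."""
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:              # current block full: map a new one
            if not self.free:
                raise MemoryError("cache exhausted; scheduler must preempt")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def free_seq(self, seq_id):
        """Return a finished sequence's blocks to the shared pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(6):          # 6 tokens occupy ceil(6/4) = 2 physical blocks
    cache.append_token("seq-A")
```

Wasted memory is capped at one partially filled block per sequence, versus a whole max-length reservation under contiguous allocation, which is what makes much larger batches fit.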