6 lessons
Fine-Tuning & RLHF
From a base model to an aligned, instruction-following assistant
Lessons
- 01 Supervised Fine-Tuning (Medium)
  Turn a base model into an instruction-follower.
- 02 LoRA (Hard)
  Low-rank adapters — fine-tune 0.1% of the parameters.
- 03 QLoRA (Hard)
  LoRA on 4-bit weights — fine-tune a 70B on a single GPU.
- 04 Reward Modeling (Hard)
  Train a preference model from human pairwise comparisons.
- 05 PPO for RLHF (Hard)
  Policy optimization against a learned reward model.
- 06 Direct Preference Optimization (Hard)
  RLHF without a separate reward model — the elegant alternative.
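The "fine-tune 0.1% of the parameters" claim in the LoRA lesson comes from the low-rank structure of the adapter: the frozen weight W is augmented by a product B·A of two thin matrices, and only A and B are trained. A minimal NumPy sketch, with hypothetical layer sizes and rank (not taken from the course material):

```python
import numpy as np

d_in, d_out, r = 4096, 4096, 8  # hypothetical layer width and LoRA rank

W = np.random.randn(d_out, d_in) * 0.02  # frozen pretrained weight
A = np.random.randn(r, d_in) * 0.02      # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, zero-init

def lora_forward(x, alpha=16.0):
    # Base path plus scaled low-rank update; in training,
    # only A and B would receive gradients.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = np.random.randn(d_in)
y = lora_forward(x)  # equals W @ x at init, since B is zero

trainable = A.size + B.size
frozen = W.size
print(f"trainable fraction for this layer: {trainable / frozen:.4%}")
```

For this single square layer the trainable fraction is r·(d_in + d_out)/(d_in·d_out) ≈ 0.39%; across a full model, where embeddings and most layers carry no adapters, the overall fraction drops toward the ~0.1% figure the lesson quotes.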