Fine-Tuning & RLHF

From a base model to an aligned, instruction-following assistant (6 lessons).

Lessons

01 · Supervised Fine-Tuning (Medium)
Turn a base model into an instruction-follower.
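
The core trick is the label mask: compute next-token cross-entropy only on the response tokens, never the prompt. A minimal sketch assuming a Hugging Face causal LM; "gpt2" and the instruction template are stand-ins, not this lesson's actual setup.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # "gpt2" is a placeholder base model; any causal LM works the same way.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "### Instruction:\nName the capital of France.\n### Response:\n"
    response = "Paris."

    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + response, return_tensors="pt").input_ids

    # -100 labels are ignored by the loss, so only response tokens are trained on.
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100

    loss = model(input_ids=full_ids, labels=labels).loss
    loss.backward()  # an optimizer.step() on top of this is the whole SFT loop
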
02 · LoRA (Hard)
Low-rank adapters — fine-tune 0.1% of the parameters.
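
The whole mechanism fits in a few lines: freeze the pretrained weight and learn a low-rank update B @ A beside it, so only the two small factors receive gradients. A from-scratch sketch; the dimensions, rank, and initialization here are illustrative.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen nn.Linear plus a trainable low-rank update (B @ A) * alpha/r."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # pretrained weight stays frozen
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at 0
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

    layer = LoRALinear(nn.Linear(768, 768))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"{trainable / total:.1%} trainable")  # ~2% for this toy layer; far less across a full model
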
03 · QLoRA (Hard)
LoRA on 4-bit weights — fine-tune a 70B model on a single GPU.
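
A sketch of the usual recipe, assuming the transformers, peft, and bitsandbytes libraries: load the base weights quantized to 4-bit NF4, then attach ordinary LoRA adapters in higher precision. The model name and target module names are placeholders, and exact flags can differ across library versions.

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",               # NormalFloat4 quantization
        bnb_4bit_compute_dtype=torch.bfloat16,   # dequantize to bf16 for matmuls
        bnb_4bit_use_double_quant=True,          # also quantize the quantization constants
    )
    # "some-base-model" is a placeholder; target_modules depend on the architecture.
    model = AutoModelForCausalLM.from_pretrained("some-base-model", quantization_config=bnb)
    model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                             target_modules=["q_proj", "v_proj"]))
    model.print_trainable_parameters()           # only the adapters train; 4-bit weights stay frozen
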
04 · Reward Modeling (Hard)
Train a preference model from human pairwise comparisons.
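
The training signal is a Bradley-Terry loss: the model's scalar score for the human-preferred response should beat its score for the rejected one. A toy sketch in which random vectors stand in for the sequence embeddings a real reward model would produce.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RewardHead(nn.Module):
        """Maps a sequence embedding to a scalar reward score."""
        def __init__(self, d: int = 768):
            super().__init__()
            self.score = nn.Linear(d, 1)

        def forward(self, h):
            return self.score(h).squeeze(-1)

    rm = RewardHead()
    # Toy stand-ins for embeddings of (prompt + chosen) and (prompt + rejected).
    h_chosen, h_rejected = torch.randn(4, 768), torch.randn(4, 768)

    # Maximize P(chosen beats rejected) = sigmoid(r_chosen - r_rejected).
    loss = -F.logsigmoid(rm(h_chosen) - rm(h_rejected)).mean()
    loss.backward()
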
05 · PPO for RLHF (Hard)
Policy optimization against a learned reward model.
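
A single-step toy sketch of the objective: the reward is the RM score minus a KL penalty that keeps the policy near a frozen reference model, and PPO's clipped ratio limits how far each update moves. Value baselines and GAE are omitted for brevity, and all the numbers are made up.

    import torch

    logp_new = torch.tensor([-1.0, -0.8], requires_grad=True)  # current policy
    logp_old = torch.tensor([-1.1, -0.9])                      # snapshot at rollout time
    logp_ref = torch.tensor([-1.2, -1.0])                      # frozen SFT reference
    rm_score = torch.tensor([0.7, 0.3])                        # learned reward model's scores
    beta, eps = 0.1, 0.2

    # KL-shaped reward: RM score minus a penalty for drifting from the reference.
    advantage = rm_score - beta * (logp_new.detach() - logp_ref)

    # PPO clipped surrogate: cap how much one update can exploit the advantage.
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
    loss = -torch.min(ratio * advantage, clipped * advantage).mean()
    loss.backward()
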
06 · Direct Preference Optimization (Hard)
RLHF without a separate reward model — the elegant alternative.
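
A toy sketch of the DPO loss on made-up sequence log-probabilities: the policy trains directly on preference pairs, with a frozen reference model in place of an explicit reward model, whose role is absorbed into the implicit reward beta * log(pi / ref).

    import torch
    import torch.nn.functional as F

    # Summed log-probs of whole responses under the policy and the frozen reference.
    pi_chosen = torch.tensor([-12.0], requires_grad=True)
    pi_rejected = torch.tensor([-14.0], requires_grad=True)
    ref_chosen, ref_rejected = torch.tensor([-13.0]), torch.tensor([-13.5])
    beta = 0.1

    # Prefer the chosen response more (relative to the reference) than the rejected one.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    loss = -F.logsigmoid(beta * margin).mean()
    loss.backward()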