Prof. Ramesh Kumar, PES University
6 lessons

Reinforcement Learning

Learn from reward signals — the algorithms behind AlphaGo and RLHF

Lessons

  1. Markov Decision Processes

    States, actions, rewards, transitions: the RL contract.

    Difficulty: Medium
  2. Q-Learning

    Learn a value function from experience, one update at a time.

    Difficulty: Medium
  3. Policy Gradients

    Optimize the policy directly via gradient ascent on expected reward.

    Difficulty: Hard
  4. REINFORCE

    The cleanest policy-gradient algorithm, and its variance problem.

    Difficulty: Hard
  5. Actor-Critic

    Combine policy learning with a value baseline.

    Difficulty: Hard
  6. Proximal Policy Optimization

    A stable policy-gradient algorithm widely used for RLHF.

    Difficulty: Hard