Prof. Ramesh Kumar, PES University
5 lessons

Inference & Serving

Ship the model — make it fast, cheap, and production-ready

Lessons

  1. Quantization Basics

    Why lower precision is an (almost) free lunch.

    Medium
  2. INT8 & INT4 Quantization

    Post-training quantization (PTQ) and quantization-aware training (QAT), in detail.

    Hard
  3. Speculative Decoding

    A small model drafts tokens and the big model verifies them, for roughly 2-3× faster decoding.

    Hard
  4. Continuous Batching

    The throughput trick that makes production LLMs economical.

    Hard
  5. Paged Attention

    vLLM's virtual-memory-inspired KV cache.

    Hard