Lecture 10: Kernel Methods & Gaussian Processes

Learning Probability Distributions Over Functions

Overview

In this lecture, we explore Gaussian Processes (GPs), an approach to machine learning that differs fundamentally from neural networks and often works well where they struggle: on small datasets. Instead of learning a single prediction function, a GP places a probability distribution over functions and updates it with the observed data, which gives principled uncertainty estimates even when data is limited.

  • Gaussian Processes as distributions over functions: priors, posteriors, and the role of kernels
  • Kernel engineering: composing simple kernels (RBF, Periodic, Linear) to encode domain knowledge
  • Small data advantage: why GPs often outperform neural networks when training data is sparse
  • Deep kernel learning: combining neural network feature extractors with GP inference
  • Real-world applications: CO₂ forecasting with uncertainty and semantic ambiguity prediction from images
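
As a rough illustration of the kernel-composition idea listed above, the following sketch fits a GP to a synthetic series with a slow trend and a yearly cycle. It is a minimal example under stated assumptions, not the lecture notebook's code: the data are generated on the spot (not the Mauna Loa file), the kernel structure and starting hyperparameters are illustrative guesses, and scikit-learn's ExpSineSquared is used as the periodic kernel.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

# Synthetic stand-in for a monthly series (assumption: NOT the real CO2 data):
# a slow linear trend plus a yearly cycle plus small observation noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 120)  # time in "years"
y = 0.5 * t + np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(t.size)

# Sum of kernels: smooth long-term trend + periodic component + noise term.
# The starting values are illustrative; fit() tunes them by maximizing
# the log marginal likelihood.
kernel = (
    10.0 * RBF(length_scale=5.0)
    + 1.0 * ExpSineSquared(length_scale=1.0, periodicity=1.0)
    + WhiteKernel(noise_level=0.01)
)

gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gp.fit(t[:, None], y)

# Query one point inside the training range and one well beyond it.
# return_std=True also returns the posterior standard deviation.
t_new = np.array([[5.0], [15.0]])
mean, std = gp.predict(t_new, return_std=True)

print("fitted kernel:", gp.kernel_)
for ti, m, s in zip(t_new.ravel(), mean, std):
    # Approximate 95% interval from the Gaussian posterior.
    print(f"t={ti:4.1f}  mean={m:6.2f}  95% interval=[{m - 1.96 * s:6.2f}, {m + 1.96 * s:6.2f}]")
```

The posterior standard deviation at the extrapolated point is typically wider than at the interpolated one, which is the behavior the CO₂ forecasting demonstration relies on. Swapping or adding kernels (for example DotProduct for a linear component) changes which patterns the prior can express.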

Learning Objectives

By the end of this lecture, you will:

  • Understand the paradigm shift from learning single predictions to learning probability distributions over functions
  • Design and combine kernels to encode domain knowledge about patterns in data (trends, seasonality, smoothness)
  • Apply GPs effectively in settings where neural networks struggle because training data is limited
  • Obtain calibrated confidence intervals and quantify prediction uncertainty
  • Implement deep kernel learning by using neural networks as feature extractors for GPs
  • Make informed model choices between GPs and neural networks for real-world problems

Materials

Datasets & Acknowledgments

  • Mauna Loa CO₂ Data: Monthly atmospheric CO₂ measurements from 1958-2017 used to demonstrate GP forecasting with uncertainty
    • Source: NOAA Global Monitoring Laboratory
    • Download: Available on Kaggle
    • Note: Dataset is included in the repository at Lecture 10 Kernel Methods/data/archive.csv
  • CIFAR-10 (Krizhevsky et al.): Natural images used for deep kernel learning demonstration
    • Source: https://www.cs.toronto.edu/~kriz/cifar.html
    • Downloaded automatically by PyTorch when running the notebook
    • Please review dataset license/terms of use before redistribution
  • Libraries: scikit-learn (GP implementation, kernels), PyTorch/torchvision (ResNet features for deep kernels)
