Lecture 10: Kernel Methods & Gaussian Processes

Learning Probability Distributions Over Functions

Overview

In this lecture, we explore Gaussian Processes (GPs), an approach to machine learning that differs fundamentally from neural networks and often works well where they struggle, particularly on small datasets. Instead of learning a single prediction function, a GP maintains a probability distribution over functions, which gives principled uncertainty quantification even with limited data.

Gaussian Processes as distributions over functions: priors, posteriors, and the role of kernels
Kernel engineering: composing simple kernels (RBF, Periodic, Linear) to encode domain knowledge
Small data advantage: why GPs can outperform neural networks when training data is sparse
Deep kernel learning: combining neural network feature extractors with GP inference
Real-world applications: CO₂ forecasting with uncertainty and semantic ambiguity prediction from images
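
The idea of a GP as a distribution over functions can be sketched with scikit-learn as follows. This is a minimal illustration, not the notebook's code: the sine target, the five training inputs, and the fixed RBF length scale are assumptions chosen for the example. It draws functions from the prior (kernel only), then conditions on the data and draws from the posterior.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical toy data: five noise-free observations of sin(x).
X_train = np.array([[-4.0], [-2.0], [0.0], [1.5], [3.0]])
y_train = np.sin(X_train).ravel()

# Grid of inputs at which the functions are evaluated.
X_grid = np.linspace(-5, 5, 101).reshape(-1, 1)

# The kernel encodes the prior assumption that functions are smooth.
kernel = RBF(length_scale=1.0)

# Prior: an unfitted GP samples functions using the kernel alone.
gp_prior = GaussianProcessRegressor(kernel=kernel)
prior_samples = gp_prior.sample_y(X_grid, n_samples=3, random_state=0)

# Posterior: condition on the observations. optimizer=None keeps the
# length scale fixed so that only the effect of the data is visible.
gp = GaussianProcessRegressor(kernel=kernel, optimizer=None)
gp.fit(X_train, y_train)
post_mean, post_std = gp.predict(X_grid, return_std=True)
post_samples = gp.sample_y(X_grid, n_samples=3, random_state=0)

# Each column of prior_samples / post_samples is one sampled function.
print("prior samples:", prior_samples.shape)
print("posterior samples:", post_samples.shape)
print("largest posterior std on the grid:", post_std.max())
```

The posterior standard deviation shrinks toward zero at the training inputs and grows again away from them, which is the uncertainty behavior the lecture builds on.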

Learning Objectives

By the end of this lecture, you will:

Understand the paradigm shift from learning single predictions to learning probability distributions over functions
Design and combine kernels to encode domain knowledge about patterns in data (trends, seasonality, smoothness)
Apply GPs effectively in settings where neural networks struggle due to limited training data
Obtain calibrated confidence intervals and quantify prediction uncertainty
Implement deep kernel learning by using neural networks as feature extractors for GPs
Make informed model choices between GPs and neural networks for real-world problems
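
As a rough sketch of kernel design and interval prediction, the example below composes a linear, a periodic, and a noise kernel and forecasts a held-out stretch of a synthetic series. The series (linear trend plus a yearly cycle plus noise) is an assumed stand-in, not the Mauna Loa data, and the kernel choice and initial hyperparameters are illustrative rather than the notebook's. In scikit-learn the periodic kernel is `ExpSineSquared` and the linear kernel is `DotProduct`.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, ExpSineSquared, WhiteKernel

rng = np.random.default_rng(42)

# Synthetic stand-in series: linear trend + yearly cycle + noise.
t = np.linspace(0, 10, 120)  # time in "years" (assumed units)
y = 0.5 * t + np.sin(2 * np.pi * t) + rng.normal(scale=0.1, size=t.shape)

# Train on the first 8 years, forecast the rest.
X = t.reshape(-1, 1)
train = t <= 8.0
X_train, y_train = X[train], y[train]
X_test, y_test = X[~train], y[~train]

# Each term encodes one assumption about the data.
kernel = (
    DotProduct(sigma_0=1.0)                                        # linear trend
    + 1.0 * ExpSineSquared(length_scale=1.0, periodicity=1.0)      # seasonality
    + WhiteKernel(noise_level=0.01)                                # observation noise
)

# Hyperparameters are tuned by maximizing the log marginal likelihood.
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gp.fit(X_train, y_train)

# Predictive mean and standard deviation on the held-out period.
mean, std = gp.predict(X_test, return_std=True)
lower, upper = mean - 1.96 * std, mean + 1.96 * std

# Fraction of held-out points inside the nominal 95% interval.
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print("fitted kernel:", gp.kernel_)
print("empirical coverage of the 95% interval:", coverage)
```

Comparing the empirical coverage with the nominal 95% is one simple check of calibration. The intervals are only as trustworthy as the kernel assumptions, so a poorly chosen kernel can give confident but wrong forecasts.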

Materials

Quick Access

Kernel Methods & GP Notebook

Datasets & Acknowledgments

Mauna Loa CO₂ Data: Monthly atmospheric CO₂ measurements from 1958-2017 used to demonstrate GP forecasting with uncertainty
- Source: NOAA Global Monitoring Laboratory
- Download: Available on Kaggle
- Note: Dataset is included in the repository at Lecture 10 Kernel Methods/data/archive.csv
CIFAR-10 (Krizhevsky et al.): Natural images used for deep kernel learning demonstration
- Source: https://www.cs.toronto.edu/~kriz/cifar.html
- Downloaded automatically by PyTorch when running the notebook
- Please review dataset license/terms of use before redistribution
Libraries: scikit-learn (GP implementation, kernels), PyTorch/torchvision (ResNet features for deep kernels)
