Lecture 10: Kernel Methods & Gaussian Processes
Learning Probability Distributions Over Functions
Overview
In this lecture, we explore Gaussian Processes (GPs), an approach to machine learning that differs fundamentally from neural networks and often works well where they struggle, particularly when training data is scarce. Instead of learning a single prediction function, a GP places a probability distribution over functions and conditions it on the observed data, which yields principled uncertainty estimates even with limited data.
- Gaussian Processes as distributions over functions: priors, posteriors, and the role of kernels
- Kernel engineering: composing simple kernels (RBF, Periodic, Linear) to encode domain knowledge
- Small data advantage: why GPs often outperform neural networks when training data is sparse
- Deep kernel learning: combining neural network feature extractors with GP inference
- Real-world applications: CO₂ forecasting with uncertainty and semantic ambiguity prediction from images
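To make the kernel engineering idea concrete, here is a minimal sketch of composing Linear, Periodic, and RBF kernels with scikit-learn. The data is a synthetic trend-plus-seasonality series invented for illustration (it is not the Mauna Loa dataset), and the initial hyperparameter values are assumptions rather than the settings used in the lecture notebook.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    RBF,
    DotProduct,
    ExpSineSquared,
    WhiteKernel,
)

# Synthetic stand-in data (assumption): linear trend + yearly-style cycle + noise.
rng = np.random.default_rng(0)
X_train = np.linspace(0.0, 10.0, 40).reshape(-1, 1)
x = X_train.ravel()
y_train = 0.5 * x + np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

# Each term in the sum encodes one assumption about the underlying function.
kernel = (
    DotProduct(sigma_0=1.0)                                     # linear trend
    + 1.0 * ExpSineSquared(length_scale=1.0, periodicity=1.0)  # seasonality
    + 1.0 * RBF(length_scale=5.0)                               # smooth slow drift
    + WhiteKernel(noise_level=0.1)                              # observation noise
)

# Fitting maximizes the marginal likelihood over the kernel hyperparameters.
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gp.fit(X_train, y_train)

# Predict inside the training range (x = 5) and far outside it (x = 20).
X_test = np.array([[5.0], [20.0]])
mean, std = gp.predict(X_test, return_std=True)

print("fitted kernel:", gp.kernel_)
print("predictive mean:", mean)
print("predictive std: ", std)
```

The predictive standard deviation at x = 20 should come out larger than at x = 5, reflecting that the model is less certain when extrapolating beyond the observed range.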
Learning Objectives
By the end of this lecture, you will:
- Understand the paradigm shift from learning single predictions to learning probability distributions over functions
- Design and combine kernels to encode domain knowledge about patterns in data (trends, seasonality, smoothness)
- Apply GPs effectively when neural networks fail due to limited training data
- Obtain calibrated confidence intervals and quantify prediction uncertainty
- Implement deep kernel learning by using neural networks as feature extractors for GPs
- Make informed model choices between GPs and neural networks for real-world problems
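As a rough illustration of the first and fourth objectives, the sketch below computes a GP posterior from scratch with NumPy and derives a 95% interval from it. The helper names (rbf_kernel, gp_posterior), the toy data, and the fixed hyperparameters are all hypothetical choices for this example. It assumes a zero-mean prior with an RBF kernel, and it does not by itself demonstrate that the intervals are calibrated.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and covariance of a zero-mean GP with an RBF kernel."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)   # train/test cross-covariance
    K_ss = rbf_kernel(x_test, x_test)   # prior covariance at the test points
    L = np.linalg.cholesky(K)           # K = L @ L.T, for stable solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha                # conditional (posterior) mean
    cov = K_ss - v.T @ v                # conditional (posterior) covariance
    return mean, cov

# Four toy observations of sin(x); test points extend beyond the data.
x_train = np.array([-2.0, -1.0, 0.0, 1.5])
y_train = np.sin(x_train)
x_test = np.linspace(-4.0, 4.0, 9)      # -4, -3, ..., 4

mean, cov = gp_posterior(x_train, y_train, x_test)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))

# 95% interval for the latent function under the Gaussian posterior.
lower, upper = mean - 1.96 * std, mean + 1.96 * std

for xi, lo, hi in zip(x_test, lower, upper):
    print(f"x = {xi:+.1f}: 95% interval [{lo:+.2f}, {hi:+.2f}]")
```

The interval is narrow near the observed points and widens toward the prior as the test points move away from the data, which is the behaviour the uncertainty-quantification objective refers to.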
Materials
Datasets & Acknowledgments
- Mauna Loa CO₂ Data: Monthly atmospheric CO₂ measurements from 1958-2017 used to demonstrate GP forecasting with uncertainty
  - Source: NOAA Global Monitoring Laboratory
  - Download: Available on Kaggle
  - Note: Dataset is included in the repository at Lecture 10 Kernel Methods/data/archive.csv
- CIFAR-10 (Krizhevsky et al.): Natural images used for deep kernel learning demonstration
  - Source: https://www.cs.toronto.edu/~kriz/cifar.html
  - Downloaded automatically by PyTorch when running the notebook
  - Please review dataset license/terms of use before redistribution
- Libraries: scikit-learn (GP implementation, kernels), PyTorch/torchvision (ResNet features for deep kernels)