Lecture 15: Convolutional Neural Networks - Learning to See
From edge detection to hierarchical feature learning in computer vision
Overview
In 2012, AlexNet achieved a breakthrough 15.3% top-5 error rate on ImageNet, fundamentally changing computer vision. This lecture explores why CNNs work so well for visual tasks through hands-on experimentation. The core insight: instead of learning separate parameters for every pixel location, CNNs learn small patterns that slide across the image through weight sharing, reducing a layer’s parameter count from millions to thousands while encoding translation equivariance. You’ll implement convolution from scratch, compare MLP vs CNN robustness on MNIST, visualize hierarchical features in ResNet50, and use saliency maps to understand what CNNs actually “see.”
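To make the weight-sharing idea concrete before the full exercise, here is a minimal from-scratch sketch of the “valid” 2D operation that deep learning calls convolution (technically cross-correlation). The Sobel-style kernel and the toy 5×5 image are illustrative choices, not part of the lecture materials:

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' 2D cross-correlation (what CNN layers compute).

    The same small kernel slides over every spatial position, so the
    layer's parameters are just the kernel entries, not one weight per
    pixel -- this is weight sharing.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Dot product of the kernel with the local image patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark left half, bright right half (a vertical edge)
image = np.zeros((5, 5))
image[:, 2:] = 1.0

# Sobel-style kernel that responds to horizontal intensity changes
kernel = np.array([[1, 0, -1],
                   [2, 0, -2],
                   [1, 0, -1]], dtype=float)

out = conv2d(image, kernel)
# Strong response (-4) in the columns straddling the edge,
# zero over the flat bright region:
# [[-4. -4.  0.]
#  [-4. -4.  0.]
#  [-4. -4.  0.]]
```

The same nine kernel weights detect the edge wherever it appears in the image, which is exactly the translation property the overview describes.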
Learning Objectives
By the end of this lecture, you will be able to:
- Explain why spatial structure matters in vision and how CNNs exploit it
- Describe how convolution operations extract local features through weight sharing
- Work through the mathematics of convolution and its relationship to cross-correlation
- Explain how CNNs build hierarchical representations from edges to objects
- Describe the role of pooling in creating translation invariance and reducing dimensionality
- Implement convolution from scratch to understand the operation deeply
- Visualize learned features at different network depths
- Apply transfer learning to solve custom image classification tasks
- Diagnose what CNNs actually “see” using activation and gradient visualization
- Compare translation robustness between fully connected networks and CNNs
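The pooling objective above can be previewed with a minimal sketch: a 2×2 max pool (stride 2) leaves its output unchanged when an activation shifts within a single pooling window, which is the source of the small-shift invariance discussed in the lecture. The helper name and toy feature maps are illustrative assumptions:

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling with stride 2 (assumes even spatial dimensions)."""
    h, w = x.shape
    # Group pixels into non-overlapping 2x2 blocks, take the max of each
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.zeros((4, 4)); a[0, 0] = 1.0   # strong activation at (0, 0)
b = np.zeros((4, 4)); b[1, 1] = 1.0   # same activation shifted to (1, 1)

# Both activations fall inside the same 2x2 window,
# so the pooled feature maps are identical
pooled_a = max_pool2x2(a)
pooled_b = max_pool2x2(b)
```

Shifts that cross a window boundary *do* change the output, so pooling gives invariance only to small translations; stacking pool layers extends the effect.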
Materials
Resources
- Papers:
- “ImageNet Classification with Deep Convolutional Neural Networks” (Krizhevsky et al., 2012) - AlexNet
- “Deep Residual Learning for Image Recognition” (He et al., 2015) - ResNet
- “Visualizing and Understanding Convolutional Networks” (Zeiler & Fergus, 2013)
- “Deep Inside Convolutional Networks: Visualising Image Classification Models” (Simonyan et al., 2013) - Saliency maps
Previous: ← Lecture 14: Understanding Transformers | Next: Lecture 16: Recurrent Neural Networks →