Lecture 14: Understanding Transformers Through Exploration and Visualization
Attention mechanisms, GPT-2 architecture, and interpretability
Overview
Transformers have revolutionized natural language processing since their introduction in “Attention is All You Need” (Vaswani et al., 2017). This lecture explores pre-trained transformer models through visualization and analysis, specifically examining GPT-2. Rather than building transformers from scratch, we perform an “MRI scan” on a transformer’s brain while it processes text—extracting attention patterns, watching hidden representations evolve, and discovering how different components specialize in different tasks. You’ll gain an intuitive understanding of how transformers work by examining them in action.
Learning Objectives
By the end of this lecture, you will be able to:
- Extract and visualize attention weights from any pre-trained transformer model
- Identify attention head specialization and understand how different heads focus on different linguistic patterns
- Track hidden state evolution and see how word representations change through transformer layers
- Understand causal masking in autoregressive models like GPT-2
- Interpret attention patterns to understand model predictions
- Use visualization tools like BertViz and custom heatmaps for model analysis
- Recognize common attention patterns (positional, syntactic, semantic)
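The causal-masking objective above can be made concrete with a small NumPy sketch: before the softmax, every score for a position j > i (a "future" token) is set to negative infinity, so token i can only attend to itself and earlier tokens. The tiny random Q/K matrices here stand in for GPT-2's learned projections.

```python
import numpy as np

def causal_attention_weights(Q, K):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (seq, seq) similarity scores
    mask = np.triu(np.ones_like(scores), k=1)      # 1s strictly above the diagonal
    scores = np.where(mask == 1, -np.inf, scores)  # hide future positions
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)                       # exp(-inf) = 0 for masked slots
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 8))
K = rng.standard_normal((5, 8))
W = causal_attention_weights(Q, K)

print(np.round(W, 2))
# The upper triangle is exactly zero (no token attends to the future),
# and the first row puts all its weight on position 0.
```

This lower-triangular structure is visible in every GPT-2 attention heatmap, which is why autoregressive models always show a blank upper-right triangle in visualization tools.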
Materials
Connection to Project 2
This lecture provides the foundation for Project 2: Neural Archaeology, where you’ll go deeper into the hidden layers of language models to decode how they represent concepts like safety, emotion, and truthfulness. You’ll apply cutting-edge interpretability techniques from AI safety research to understand what models actually learn beyond just attention patterns.
Resources
- Papers:
- “Attention is All You Need” (Vaswani et al., 2017)
- “What Does BERT Look At?” (Clark et al., 2019)
- Tutorials:
- Jay Alammar’s “The Illustrated Transformer”
- Andrej Karpathy’s “minGPT”