Lecture 14: Understanding Transformers Through Exploration and Visualization
Attention mechanisms, GPT-2 architecture, and interpretability
Overview
Transformers have revolutionized natural language processing since their introduction in “Attention is All You Need” (Vaswani et al., 2017). This lecture explores pre-trained transformer models through visualization and analysis, specifically examining GPT-2. Rather than building transformers from scratch, we perform an “MRI scan” on a transformer’s brain while it processes text—extracting attention patterns, watching hidden representations evolve, and discovering how different components specialize in different tasks. You’ll gain an intuitive understanding of how transformers work by examining them in action.
Learning Objectives
By the end of this lecture, you will be able to:
- Extract and visualize attention weights from any pre-trained transformer model
- Identify attention head specialization and understand how different heads focus on different linguistic patterns
- Track hidden state evolution and see how word representations change through transformer layers
- Understand causal masking in autoregressive models like GPT-2
- Interpret attention patterns to understand model predictions
- Use visualization tools like BertViz and custom heatmaps for model analysis
- Recognize common attention patterns (positional, syntactic, semantic)
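The causal-masking objective above can be made concrete with a small NumPy sketch: before the softmax, every score for a position j > i (a "future" token) is set to negative infinity, so token i can only attend to itself and earlier tokens. The tiny random Q/K matrices here stand in for GPT-2's learned projections.

```python
import numpy as np

def causal_attention_weights(Q, K):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (seq, seq) similarity scores
    mask = np.triu(np.ones_like(scores), k=1)      # 1s strictly above the diagonal
    scores = np.where(mask == 1, -np.inf, scores)  # hide future positions
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)                       # exp(-inf) = 0 for masked slots
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 8))
K = rng.standard_normal((5, 8))
W = causal_attention_weights(Q, K)

print(np.round(W, 2))
# The upper triangle is exactly zero (no token attends to the future),
# and the first row puts all its weight on position 0.
```

This lower-triangular structure is visible in every GPT-2 attention heatmap, which is why autoregressive models always show a blank upper-right triangle in visualization tools.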
Materials
Connection to Project 2
This lecture provides the foundation for Project 2: Neural Archaeology, where you’ll go deeper into the hidden layers of language models to decode how they represent concepts like safety, emotion, and truthfulness. You’ll apply cutting-edge interpretability techniques from AI safety research to understand what models actually learn beyond just attention patterns.
Resources
- Papers:
- “Attention is All You Need” (Vaswani et al., 2017)
- “What Does BERT Look At?” (Clark et al., 2019)
- Tutorials:
- Jay Alammar’s “The Illustrated Transformer”
- Andrej Karpathy’s “minGPT”