Lecture 6: Model Evaluation and Data Deception
When Data Lies - Real-World Deception in Data Science
Overview
In this eye-opening lecture, we expose the hidden traps that lead even experienced data scientists to wrong conclusions. Through real datasets (the Datasaurus Dozen, the 1973 UC Berkeley admissions data, and credit card fraud records), you’ll learn how identical summary statistics can hide wildly different patterns, why 99.8% accuracy can be completely useless, and how data leakage creates models that fail catastrophically in production. This lecture will fundamentally change how you evaluate models and interpret data.
Learning Objectives
By the end of this lecture, you will:
- Detect when summary statistics hide critical patterns using visualization techniques
- Recognize and resolve Simpson’s Paradox in aggregated data
- Choose appropriate metrics beyond accuracy for imbalanced problems
- Identify and prevent data leakage in time series and customer data
- Interpret ML visualizations (ROC curves, confusion matrices) correctly
- Build a systematic evaluation framework for production models
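One objective above, preventing data leakage, can be previewed with a tiny sketch. The numbers below are illustrative, not from the lecture datasets: the point is that any preprocessing statistic (here, a mean used for scaling) must be computed on the training split alone, or information about the future leaks into the model.

```python
# A minimal sketch of preprocessing leakage (illustrative numbers, not
# from the lecture datasets). Ask: "Would I have this information at
# prediction time?" The test values simulate a future regime shift.

train = [10.0, 12.0, 11.0, 13.0]   # values available at training time
test_vals = [100.0, 110.0]         # future values, unseen until deployment

# Leaky: the scaling statistic is computed over train + test together,
# so the "model" has already peeked at the future.
all_vals = train + test_vals
leaky_mean = sum(all_vals) / len(all_vals)

# Correct: the statistic comes from the training split alone,
# matching what a production system would actually know.
train_mean = sum(train) / len(train)

print(f"leaky mean:   {leaky_mean:.2f}")   # pulled upward by test data
print(f"correct mean: {train_mean:.2f}")   # what production would see
```

The same principle applies to time series (never split randomly across time) and to customer features (never aggregate over events that happen after the prediction point).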
Materials
Tip: Quick Access
Datasets & Acknowledgments
Real Datasets Used
- Datasaurus Dozen: Created by Autodesk Research — 13 datasets whose summary statistics (means, standard deviations, correlation) are identical to two decimal places but whose shapes differ wildly when plotted
- UC Berkeley Admissions (1973): Historical admissions data illustrating Simpson’s Paradox
- Credit Card Fraud (Kaggle): Transactions from European cardholders over two days; 492 frauds out of 284,807 transactions (0.172%), resulting in a highly imbalanced dataset
- Telco Customer Churn (IBM, Kaggle): Customer account data used to demonstrate evaluation pitfalls and potential data leakage
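The fraud dataset’s class balance quoted above makes the accuracy trap concrete. This sketch uses only those published counts (492 frauds in 284,807 transactions) and a trivial “model” that predicts “not fraud” for everything:

```python
# A minimal sketch of the accuracy trap on the fraud dataset's class
# balance (492 frauds in 284,807 transactions, as quoted above).
# The "model" predicts "not fraud" for every single transaction.

total, frauds = 284_807, 492

accuracy = (total - frauds) / total    # fraction of predictions correct
recall = 0 / frauds                    # frauds actually caught: none

print(f"accuracy: {accuracy:.1%}")     # ~99.8%, yet the model is useless
print(f"recall:   {recall:.1%}")       # 0.0% of frauds detected
```

This is why imbalanced problems need precision, recall, and ROC/PR analysis rather than accuracy alone.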
Key Takeaways
Warning: Critical Evaluation Principles
- Always visualize your data — Summary statistics alone can hide dinosaurs (literally!)
- Accuracy lies for imbalanced data — A 99.8% accurate model can be completely useless
- Data leakage is subtle and devastating — Always ask: “Would I have this information at prediction time?”
- Context matters — What looks like discrimination at one level may be the opposite at another (Simpson’s Paradox)
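The Simpson’s Paradox takeaway can be reproduced in a few lines. The admission counts below are illustrative, not the actual 1973 Berkeley figures, but they follow the same mechanism: women apply mostly to the more selective department, so aggregating across departments reverses the within-department conclusion.

```python
# A minimal sketch of Simpson's Paradox with illustrative numbers (not
# the real 1973 Berkeley data): women are admitted at a higher rate in
# every department, yet the aggregate rate favors men, because women
# applied mostly to the harder department.

# (applicants, admits) per department and group
data = {
    "easy": {"men": (80, 60), "women": (20, 16)},   # 75% vs 80% admitted
    "hard": {"men": (20, 4),  "women": (80, 20)},   # 20% vs 25% admitted
}

def rate(applicants, admits):
    return admits / applicants

for dept, groups in data.items():
    m, w = rate(*groups["men"]), rate(*groups["women"])
    print(f"{dept}: men {m:.0%}, women {w:.0%}")    # women higher in both

# Aggregating drops the department variable and flips the conclusion
men_total = [sum(x) for x in zip(data["easy"]["men"], data["hard"]["men"])]
women_total = [sum(x) for x in zip(data["easy"]["women"], data["hard"]["women"])]
print(f"overall: men {rate(*men_total):.0%}, women {rate(*women_total):.0%}")
```

The lesson: before trusting an aggregate comparison, check whether a lurking grouping variable (here, department selectivity) changes the direction of the effect.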
Previous: ← Lecture 5: Probabilistic Classification | Next: Lecture 7: Overfitting and Regularization →