Lecture 6: Model Evaluation and Data Deception
When Data Lies - Real-World Deception in Data Science
Overview
In this eye-opening lecture, we expose the hidden traps that lead even experienced data scientists to wrong conclusions. Through real datasets (the Datasaurus Dozen, the 1973 UC Berkeley admissions data, and credit card fraud records), you’ll learn how identical summary statistics can hide wildly different patterns, why 99.8% accuracy can be completely useless, and how data leakage creates models that fail catastrophically in production. This lecture will fundamentally change how you evaluate models and interpret data.
Learning Objectives
By the end of this lecture, you will:
- Detect when summary statistics hide critical patterns using visualization techniques
- Recognize and resolve Simpson’s Paradox in aggregated data
- Choose appropriate metrics beyond accuracy for imbalanced problems
- Identify and prevent data leakage in time series and customer data
- Interpret ML visualizations (ROC curves, confusion matrices) correctly
- Build a systematic evaluation framework for production models
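One objective above, preventing data leakage, can be previewed with a tiny sketch. The numbers below are illustrative, not from the lecture datasets: the point is that any preprocessing statistic (here, a mean used for scaling) must be computed on the training split alone, or information about the future leaks into the model.

```python
# A minimal sketch of preprocessing leakage (illustrative numbers, not
# from the lecture datasets). Ask: "Would I have this information at
# prediction time?" The test values simulate a future regime shift.

train = [10.0, 12.0, 11.0, 13.0]   # values available at training time
test_vals = [100.0, 110.0]         # future values, unseen until deployment

# Leaky: the scaling statistic is computed over train + test together,
# so the "model" has already peeked at the future.
all_vals = train + test_vals
leaky_mean = sum(all_vals) / len(all_vals)

# Correct: the statistic comes from the training split alone,
# matching what a production system would actually know.
train_mean = sum(train) / len(train)

print(f"leaky mean:   {leaky_mean:.2f}")   # pulled upward by test data
print(f"correct mean: {train_mean:.2f}")   # what production would see
```

The same principle applies to time series (never split randomly across time) and to customer features (never aggregate over events that happen after the prediction point).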
Materials
Tip: Quick Access
Datasets & Acknowledgments
Real Datasets Used
- Datasaurus Dozen: Created by Autodesk Research — 13 datasets whose summary statistics (means, standard deviations, correlation) are identical to two decimal places but whose shapes differ wildly when plotted
- UC Berkeley Admissions (1973): Historical admissions data illustrating Simpson’s Paradox
- Credit Card Fraud (Kaggle): Transactions from European cardholders over two days; 492 frauds out of 284,807 transactions (0.172%), resulting in a highly imbalanced dataset
- Telco Customer Churn (IBM, Kaggle): Customer account data used to demonstrate evaluation pitfalls and potential data leakage
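The fraud dataset’s class balance quoted above makes the accuracy trap concrete. This sketch uses only those published counts (492 frauds in 284,807 transactions) and a trivial “model” that predicts “not fraud” for everything:

```python
# A minimal sketch of the accuracy trap on the fraud dataset's class
# balance (492 frauds in 284,807 transactions, as quoted above).
# The "model" predicts "not fraud" for every single transaction.

total, frauds = 284_807, 492

accuracy = (total - frauds) / total    # fraction of predictions correct
recall = 0 / frauds                    # frauds actually caught: none

print(f"accuracy: {accuracy:.1%}")     # ~99.8%, yet the model is useless
print(f"recall:   {recall:.1%}")       # 0.0% of frauds detected
```

This is why imbalanced problems need precision, recall, and ROC/PR analysis rather than accuracy alone.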
Key Takeaways
Warning: Critical Evaluation Principles
- Always visualize your data — Summary statistics alone can hide dinosaurs (literally!)
- Accuracy lies for imbalanced data — A 99.8% accurate model can be completely useless
- Data leakage is subtle and devastating — Always ask: “Would I have this information at prediction time?”
- Context matters — What looks like discrimination at one level may be the opposite at another (Simpson’s Paradox)
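The Simpson’s Paradox takeaway can be reproduced in a few lines. The admission counts below are illustrative, not the actual 1973 Berkeley figures, but they follow the same mechanism: women apply mostly to the more selective department, so aggregating across departments reverses the within-department conclusion.

```python
# A minimal sketch of Simpson's Paradox with illustrative numbers (not
# the real 1973 Berkeley data): women are admitted at a higher rate in
# every department, yet the aggregate rate favors men, because women
# applied mostly to the harder department.

# (applicants, admits) per department and group
data = {
    "easy": {"men": (80, 60), "women": (20, 16)},   # 75% vs 80% admitted
    "hard": {"men": (20, 4),  "women": (80, 20)},   # 20% vs 25% admitted
}

def rate(applicants, admits):
    return admits / applicants

for dept, groups in data.items():
    m, w = rate(*groups["men"]), rate(*groups["women"])
    print(f"{dept}: men {m:.0%}, women {w:.0%}")    # women higher in both

# Aggregating drops the department variable and flips the conclusion
men_total = [sum(x) for x in zip(data["easy"]["men"], data["hard"]["men"])]
women_total = [sum(x) for x in zip(data["easy"]["women"], data["hard"]["women"])]
print(f"overall: men {rate(*men_total):.0%}, women {rate(*women_total):.0%}")
```

The lesson: before trusting an aggregate comparison, check whether a lurking grouping variable (here, department selectivity) changes the direction of the effect.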
Previous: ← Lecture 5: Probabilistic Classification | Next: Lecture 7: Overfitting and Regularization →