Lecture 2: k-Nearest Neighbors

Framing Prediction Problems & Instance-Based Learning

Overview

In this hands-on lecture, we tackle a real problem: finding compatible ML project teammates using k-Nearest Neighbors. Yes, you’ll actually use this system to form your project teams! We’ll explore how to frame real-world problems as ML tasks, understand the elegance of lazy learning, and confront the challenges of high-dimensional spaces.

Learning Objectives

By the end of this lecture, you will:

  • Frame real-world problems as ML tasks with inputs (X) and outputs (y)
  • Apply k-NN for classification and regression tasks
  • Analyze the impact of distance metrics and feature scaling
  • Understand the curse of dimensionality and its practical implications
  • Design fair and effective matching systems with domain constraints
  • Evaluate trade-offs between different similarity measures
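To preview why distance metrics and feature scaling matter, here is a minimal sketch using hypothetical teammate vectors (the feature names and values are invented for illustration, not taken from the actual matchmaker data). When one feature spans a much larger numeric range than another, unscaled Euclidean distance is dominated by that feature, and min-max scaling can change who your nearest neighbor is:

```python
import math

# Hypothetical teammate vectors: (years_of_experience, preferred_meeting_hour).
# The hour feature spans a wider numeric range than experience, so it
# dominates unscaled Euclidean distance.
points = {
    "alice": (1.0, 9.0),
    "bob":   (5.0, 10.0),
    "carol": (1.2, 14.0),
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_scale(data):
    """Rescale each feature to [0, 1] across the dataset."""
    dims = list(zip(*data.values()))
    lo = [min(d) for d in dims]
    hi = [max(d) for d in dims]
    return {
        name: tuple((v - l) / (h - l) for v, l, h in zip(vec, lo, hi))
        for name, vec in data.items()
    }

def nearest(name, data):
    """Return the name of the closest other point under Euclidean distance."""
    return min((k for k in data if k != name),
               key=lambda k: euclidean(data[name], data[k]))

print(nearest("alice", points))                 # raw features: "bob"
print(nearest("alice", min_max_scale(points)))  # scaled features: "carol"
```

On the raw features, Alice matches Bob because their meeting hours are close; after scaling, the experience gap to Bob matters equally, and Carol becomes the nearest neighbor. The lecture examines when each choice is appropriate.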

Materials

Important: Pre-Class Requirements

Complete the Project Matchmaker Form by Mon 8/26, 12:00 PM ET. Required for Lecture 2: k-NN; counts toward participation.

Interactive Demo

Note: 🎮 Team Matcher Visualization

Experience k-NN in action with our Interactive Team Matching Demo

This real-time visualization lets you:

  • Adjust k parameter and see immediate effects
  • Explore different dimension pairs
  • View how proximity translates to similarity
  • Discover natural clustering patterns in the class
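The mechanics behind the demo can be sketched in a few lines: a k-NN classifier is just "sort by distance, take a majority vote among the k closest." The toy 2-D points and interest labels below are made up for illustration:

```python
import math
from collections import Counter

# Toy 2-D "teammate" points with interest labels (hypothetical data).
train = [
    ((1.0, 1.0), "vision"),
    ((1.2, 0.8), "vision"),
    ((0.9, 1.1), "vision"),
    ((3.0, 3.0), "nlp"),
    ((3.2, 2.9), "nlp"),
]

def knn_predict(query, train, k):
    """Classify `query` by majority vote among its k nearest training points."""
    by_dist = sorted(train, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((1.1, 1.0), train, k=3))  # -> "vision"
print(knn_predict((3.1, 3.0), train, k=3))  # -> "nlp"
```

Changing k trades off noise sensitivity (small k) against oversmoothing (large k), which is exactly the effect the slider in the demo lets you explore.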

Key Topics

  1. The Data Journey: From text surveys → NLP features → 8D vectors
  2. k-NN Fundamentals: The beautiful simplicity of “you are your neighbors”
  3. Distance Metrics: Euclidean, Manhattan, Cosine, and when each matters
  4. Curse of Dimensionality: When all points become equidistant
  5. Fairness in ML: Ensuring inclusive team formation
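Topic 4, the curse of dimensionality, can be seen in a short simulation. For random points in the unit hypercube, the relative spread of distances, (d_max − d_min) / d_min, shrinks as the number of dimensions grows, so "nearest" neighbors become barely nearer than anyone else (the sample sizes and dimensions below are arbitrary choices for illustration):

```python
import random

random.seed(0)

def contrast(dim, n=200):
    """(d_max - d_min) / d_min over distances from the origin to n uniform
    random points in [0, 1]^dim. As dim grows, distances concentrate around
    their mean and this ratio shrinks toward zero."""
    dists = []
    for _ in range(n):
        p = [random.random() for _ in range(dim)]
        dists.append(sum(x * x for x in p) ** 0.5)
    return (max(dists) - min(dists)) / min(dists)

for d in (2, 8, 100, 1000):
    print(d, round(contrast(d), 3))  # ratio drops steadily as d grows
```

This is why the 8-D teammate vectors already need care, and why adding many weakly informative survey features can make every classmate look roughly equidistant.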

Additional Resources


Previous: ← Lecture 1: Welcome to ML | Next: Lecture 3: Linear Regression →