AI Skill Course · Course 2 · Intermediate
Module 05 of 12
Course 2 · Module 05 · 65 minutes

Find structure
nobody told
you to look for.

Until now, you've trained models on labeled data — "this is spam, this isn't." Unsupervised learning is the opposite: you give the algorithm raw data with no labels, and it discovers patterns. Customer segments. Anomaly clusters. The hidden groups you didn't know existed.

You'll watch: k-means converge live
You'll adjust: k with a slider
You'll segment: a real dataset
Figure: clustering with no labels needed (◆ centroid · ● data point)
Part 01 · The shift

No labels.
No teacher.

In supervised learning (Modules 3 & 4), every training example came with the right answer. In unsupervised learning, you just have data — and the algorithm has to find the structure on its own. Both are essential. They solve different problems.

// Modules 3 & 4 · Supervised

You knew the answer.

"Here are 1000 examples with the right answer. Learn the pattern, then predict on new ones."
You have
Features (X) + Labels (y)
Goal
Predict y from X on new data
Examples → Spam classifier (label: spam/not)
→ House price predictor (label: price)
→ Disease diagnosis (label: condition)
// This module · Unsupervised

You don't.

"Here are 1000 data points. Find the structure. Tell me what you see."
You have
Features (X) — no labels at all
Goal
Discover groups, anomalies, or simpler structure
Examples → Customer segmentation (who behaves alike?)
→ Anomaly detection (what's weird?)
→ Topic modeling (what themes exist?)
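
In code, the difference between the two cards comes down to whether labels are ever handed to the model. Here is a minimal sketch, assuming scikit-learn and a made-up toy dataset (the two blob centers are arbitrary choices for illustration, not course data):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Toy data (an assumption for illustration): 1000 points in 2 well-separated groups.
X, y = make_blobs(n_samples=1000, centers=[[-5, -5], [5, 5]],
                  cluster_std=1.0, random_state=0)

# Supervised: the model sees features AND labels.
clf = LogisticRegression().fit(X, y)
print("supervised predictions:", clf.predict(X[:5]))

# Unsupervised: the model sees features only. y is never passed in.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("unsupervised cluster ids:", km.labels_[:5])

# Cluster ids are arbitrary (cluster 0 may correspond to label 1),
# so compare against y under both possible matchings.
agreement = max(np.mean(km.labels_ == y), np.mean(km.labels_ != y))
print(f"agreement with the hidden labels: {agreement:.3f}")
```

The cluster ids carry no meaning by themselves. K-means can recover the two groups here, but it cannot tell you what they are called.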
Part 02 · Hands on · Live algorithm

Watch k-means
find the clusters.

150 data points. No labels. Below: the actual k-means algorithm running step-by-step, in your browser. Adjust k. Press "Step" to do one iteration at a time, or "Run" to watch it converge. Try "Shuffle" to see how random initialization changes the outcome.

How to play.

Set k (number of clusters) with the slider. Then either press Step to do one iteration (assign points → move centroids), or Run to animate to convergence. Shuffle randomizes the starting centroids — important! Different starts can give different final clusters. That's the k-means catch.

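Interactive demo: 150 data points, no labels, plotted as Feature 1 vs. Feature 2. Diamonds are centroids; circles are points colored by their current cluster. Controls: a slider for the number of clusters k (1 to 8, default 4). Readouts: iteration count, inertia, and status (initially "Initialized", with centroids randomly placed).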
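
If you want to reproduce the "Shuffle" effect outside the browser, here is a rough sketch using scikit-learn. The data is a synthetic stand-in (not the 150 points in the demo), and `init="random"` with `n_init=1` is an assumption chosen to mimic a single random start:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# 150 unlabeled points (synthetic stand-in for the demo data).
X, _ = make_blobs(n_samples=150, centers=4, cluster_std=1.2, random_state=42)

# One random start per seed, no retries: each seed acts like one "Shuffle".
inertias = {}
for seed in range(10):
    km = KMeans(n_clusters=4, init="random", n_init=1, random_state=seed).fit(X)
    inertias[seed] = km.inertia_
    print(f"seed={seed}  iterations={km.n_iter_}  inertia={km.inertia_:.1f}")

# A common remedy: run several starts and keep the lowest-inertia result.
best = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=0).fit(X)
print(f"best of 10 starts: inertia={best.inertia_:.1f}")
```

Seeds that end with a noticeably higher inertia got stuck in a worse local optimum. Running multiple starts (`n_init`) and using smarter seeding (`k-means++`) are the usual ways to soften the catch.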
Part 03 · The algorithm

Four steps. Forever.
That's all it does.

K-means (Lloyd's algorithm, 1957) is one of the oldest ML algorithms and still one of the most used. The mechanics fit in four lines.

01

Place k centroids randomly

Pick k random points in the data space. These are your initial cluster centers. Their positions will move as the algorithm runs, but where you start matters more than you'd think.

// random_state controls reproducibility
02

Assign each point to nearest centroid

For every data point, compute its distance to each centroid. Assign it to the closest one. Now you have k provisional clusters.

// usually Euclidean distance: √( Σᵢ (xᵢ − cᵢ)² )
03

Move centroids to cluster means

For each cluster, compute the average position of all its assigned points. Move the centroid to that average. This is the "means" in k-means.

// new_centroid = mean of assigned points
04

Repeat until centroids stop moving

Go back to step 2. Assignments may change because the centroids moved. Loop until the centroids barely move between iterations: that is convergence. On small, well-separated datasets this often takes only about 5 to 20 iterations.

// stop when Δcentroid < tolerance
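
The four steps above can be sketched in plain NumPy. This is a teaching version, not what scikit-learn runs internally. The function name `kmeans`, its parameters, and the choice to seed centroids from existing data points (rather than arbitrary positions in the data space) are assumptions made for illustration:

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-4, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels, n_iterations)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    for iteration in range(1, max_iter + 1):
        # Step 2: Euclidean distance from every point to every centroid, shape (n, k).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)

        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)

        # Step 4: stop once no centroid moved more than `tol`.
        shift = np.linalg.norm(new_centroids - centroids, axis=1).max()
        centroids = new_centroids
        if shift < tol:
            break

    return centroids, labels, iteration

# Usage on made-up data: three tight groups of 50 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ([0, 0], [5, 5], [0, 5])])
centroids, labels, n_iter = kmeans(X, k=3)
print("iterations:", n_iter)
print("centroids:\n", centroids.round(2))
```

Try changing `seed`: with an unlucky start, two centroids can end up sharing one group while a single centroid covers the other two.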
Part 04 · The hardest question

How do you
pick k?

K-means won't tell you how many clusters to use. You have to choose. The most-used trick is the "elbow method" — try several values of k, plot the result, and look for where the improvement bends.

Figure: inertia plotted against k (number of clusters) for k = 1 to 7, with the elbow marked at k = 4.

The elbow method

For each k from 1 to ~10, run k-means and record the inertia (the sum of squared distances from each point to its assigned centroid). Inertia generally keeps dropping as k increases (with one cluster per point it would reach zero), but at some point the marginal improvement gets tiny. That's the elbow. That's your k.

The intuition: Below the elbow, each extra cluster captures a real group. Above the elbow, you're just splitting noise.

Elbow method: visual and cheap to compute. Most common in practice.
Silhouette score: measures how close each point is to its own cluster compared with the nearest other cluster. Ranges from −1 to 1; higher is better. More principled but slower.
Domain knowledge: sometimes you just know there should be ~3 customer segments. Don't overthink it.
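
Here is a small sketch of the first two approaches side by side, assuming scikit-learn. The data is synthetic with four well-separated blobs (centers chosen by hand for illustration), so there is a known answer to compare against:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 4 blobs (an assumption, so the "right" k is known).
X, _ = make_blobs(n_samples=300,
                  centers=[[-6, -6], [-6, 6], [6, -6], [6, 6]],
                  cluster_std=1.0, random_state=7)

inertia_by_k = {}
silhouette_by_k = {}
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertia_by_k[k] = km.inertia_
    if k >= 2:  # silhouette is undefined for a single cluster
        silhouette_by_k[k] = silhouette_score(X, km.labels_)

# Print the table you would normally plot as the elbow curve.
for k in inertia_by_k:
    sil = f"{silhouette_by_k[k]:.3f}" if k in silhouette_by_k else "  n/a"
    print(f"k={k:2d}  inertia={inertia_by_k[k]:9.1f}  silhouette={sil}")

best_k = max(silhouette_by_k, key=silhouette_by_k.get)
print("k with the highest silhouette score:", best_k)
```

Read the inertia column top to bottom and look for where the drops shrink sharply. On real data the elbow is usually fuzzier than on clean blobs like these, which is why it helps to check the silhouette score as well.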
Part 05 · Real segmentation

Segment 500 customers
with three lines of sklearn.

A real customer dataset (annual spend + visit frequency). Find natural groups. Use the elbow method to pick k. Visualize the segments. This is what your data team is doing in their Slack #segmentation channel today.
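
In case the in-browser runtime is unavailable, here is a hedged sketch of what such a segmentation might look like. The course dataset is not reproduced here: the 500 customers below are randomly generated, the group sizes and averages are invented, and k = 4 is assumed rather than chosen by the elbow method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the course dataset: 500 customers,
# columns = [annual_spend, visits_per_year]. All values are made up.
rng = np.random.default_rng(0)
groups = [  # (mean spend, mean visits, number of customers)
    (300, 4, 200),
    (1200, 12, 150),
    (4000, 30, 100),
    (900, 45, 50),
]
X = np.vstack([
    np.column_stack([rng.normal(spend, spend * 0.15, n),
                     rng.normal(visits, 2.0, n)])
    for spend, visits, n in groups
])

# The three sklearn lines: scale, fit, read off the segments.
X_scaled = StandardScaler().fit_transform(X)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_scaled)
segments = km.labels_

# Summarize each segment in the original units.
for s in range(4):
    members = X[segments == s]
    print(f"segment {s}: n={len(members):3d}  "
          f"avg spend={members[:, 0].mean():7.0f}  "
          f"avg visits={members[:, 1].mean():5.1f}")
```

Scaling matters here because k-means uses Euclidean distance. Without it, spend (in the hundreds or thousands) would drown out visit frequency (in the tens).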

Part 06 · Beyond k-means

Three other unsupervised
methods you'll meet.

K-means is the workhorse, but it's not the only tool. Each of these handles things k-means can't.

Hierarchical

Agglomerative Clustering

Builds a tree of merges: start with each point as its own cluster, then repeatedly merge the closest pair. You get every level of clustering at once.

Use when: you don't know k, and you want a dendrogram
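
A minimal sketch of "every level of clustering at once", assuming SciPy's hierarchy module and a small synthetic dataset (Ward linkage is one of several merge rules and is picked here only for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

# 60 made-up points in three well-separated groups.
X, _ = make_blobs(n_samples=60, centers=[[-5, 0], [0, 5], [5, 0]],
                  cluster_std=0.7, random_state=0)

# Build the full merge tree once. Each row of Z records one merge:
# [cluster_a, cluster_b, merge_distance, size_of_new_cluster].
Z = linkage(X, method="ward")

# Cut the same tree at different levels without re-running anything.
for k in (2, 3, 4):
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(f"k={k}: cluster sizes = {np.bincount(labels)[1:]}")

labels_3 = fcluster(Z, t=3, criterion="maxclust")
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` (with matplotlib installed) draws the tree, which is often the quickest way to see where a sensible cut lies.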
Density-based

DBSCAN

Finds clusters of any shape (not just blobs). Marks outliers as "noise" instead of forcing them into a cluster. Doesn't need k upfront, just two density parameters (eps and min_samples in scikit-learn).

Use when: clusters are non-spherical or you want anomaly detection
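
To see both claims in action, here is a sketch on the classic two-moons shape, assuming scikit-learn. The `eps` and `min_samples` values and the three hand-placed outliers are assumptions tuned for this toy data. Real data usually needs some experimentation.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons

# Two interleaving half-moons: clearly two groups, but not blob-shaped.
X, y = make_moons(n_samples=300, noise=0.05, random_state=0)
# Add a few far-away points that belong to neither moon.
outliers = np.array([[3.0, 3.0], [-2.5, 2.5], [3.0, -2.5]])
X_all = np.vstack([X, outliers])

# The two density parameters: eps (neighborhood radius) and min_samples.
db = DBSCAN(eps=0.2, min_samples=5).fit(X_all)
labels = db.labels_  # -1 means "noise"

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print("clusters found:", n_clusters)
print("points marked as noise:", n_noise)

def agreement(pred, truth):
    # Cluster ids are arbitrary, so score both matchings of the 2 moons.
    return max(np.mean(pred == truth), np.mean(pred == 1 - truth))

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(f"DBSCAN agreement with the true moons:  {agreement(labels[:300], y):.2f}")
print(f"k-means agreement with the true moons: {agreement(km_labels, y):.2f}")
```

K-means tends to slice the moons with a straight boundary, while DBSCAN follows the dense curve of each moon and leaves the stray points unassigned.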
Figure: PCA reducing 100 dimensions to 2, with PC1 as the direction of most variance.
Dim. reduction

PCA

Principal Component Analysis has a different goal: compress many dimensions into a few while keeping as much of the variance as possible. Used before plotting high-dimensional data, or as input to other models.

Use when: too many features to visualize or to feed downstream
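
A quick sketch of the 100-to-2 compression, assuming scikit-learn. The data is fabricated so that 100 features secretly depend on only 2 underlying factors. Real datasets rarely compress this cleanly.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 500 samples with 100 features that are driven by
# only 2 hidden factors, plus a little noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 100))
X = factors @ mixing + 0.1 * rng.normal(size=(500, 100))

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print("shape before:", X.shape)
print("shape after: ", X_2d.shape)
print("share of variance kept by PC1, PC2:",
      pca.explained_variance_ratio_.round(3))
```

The `explained_variance_ratio_` values tell you how much of the original spread each component keeps. If the first two add up to a small fraction, a 2D plot will hide most of the structure.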
Part 07 · Knowledge check

Five questions on what
you just discovered.

Aim for 4/5. Wrong answers explain themselves.

Course 2 · Module 05 complete

You can now find groups
without being told what to look for.

You ran k-means yourself, watched it converge, segmented real customers, and know when to reach for DBSCAN or PCA instead. That covers the practical unsupervised toolkit — and you've now done the full classical ML quartet: regression, classification, trees, and clustering.

Up next · Course 2 · Module 06

Neural Networks from Scratch

The pivot to deep learning starts here. You'll build a single neuron in JavaScript, watch backpropagation happen step by step, then train a real neural net in the browser. After this module, "deep learning" stops being a black box.
