Until now, you've trained models on labeled data — "this is spam, this isn't." Unsupervised learning is the opposite: you give the algorithm raw data with no labels, and it discovers patterns. Customer segments. Anomaly clusters. The hidden groups you didn't know existed.
Find the patterns
In supervised learning (Modules 3 & 4), every training example came with the right answer. In unsupervised learning, you just have data, and the algorithm has to find the structure on its own. Both are essential. They solve different problems.
150 data points. No labels. Below: the actual k-means algorithm running step-by-step, in your browser. Adjust k. Press "Step" to do one iteration at a time, or "Run" to watch it converge. Try "Shuffle" to see how random initialization changes the outcome.
Set k (number of clusters) with the slider. Then either press Step to do one iteration (assign points → move centroids), or Run to animate to convergence. Shuffle randomizes the starting centroids — important! Different starts can give different final clusters. That's the k-means catch.
K-means (Lloyd's algorithm, 1957) is one of the oldest ML algorithms and still one of the most used. The mechanics fit in four steps.
Step 1. Pick k random points in the data space. These are your initial cluster centers (centroids). Their positions will move as the algorithm runs, but where you start matters more than you'd think.
Step 2. For every data point, compute its distance to each centroid. Assign it to the closest one. Now you have k provisional clusters.
Step 3. For each cluster, compute the average position of all its assigned points. Move the centroid to that average. This is the "means" in k-means.
Step 4. Go back to step 2. Assignments may change because the centroids moved. Loop until the centroids barely move between iterations: that's convergence. It usually takes 5-20 iterations.
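The four steps above can be sketched in plain Python, with no libraries, so every move is visible. This is a minimal illustration rather than a production implementation: the function name `kmeans`, the `tol` threshold, and the choice to draw starting centroids from the data points themselves are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iters=100, tol=1e-6, seed=0):
    """Plain-Python Lloyd's algorithm for 2-D points (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 1: pick k starting centroids. Sampling them from the data is an
    # assumption of this sketch; any random positions in the data space work.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)

    for _ in range(max_iters):
        # Step 2: assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]

        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                xs, ys = zip(*members)
                new_centroids.append((sum(xs) / len(members), sum(ys) / len(members)))
            else:
                # Empty cluster: keep the old centroid (an assumption; other
                # implementations re-seed it instead).
                new_centroids.append(centroids[c])

        # Step 4: stop once no centroid moved more than `tol`.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break

    return centroids, labels


# Two small, well-separated groups (made-up numbers for illustration).
demo = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, labels = kmeans(demo, k=2)
print(centroids, labels)
```

Changing `seed` changes the starting centroids. On data this clean the final grouping stays the same, but on messier data different seeds can settle on different clusters.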
K-means won't tell you how many clusters to use. You have to choose. The most-used trick is the "elbow method" — try several values of k, plot the result, and look for where the improvement bends.
For each k from 1 to ~10, run k-means and record the inertia (the sum of squared distances from each point to its assigned centroid). Inertia keeps dropping as k increases, but at some point the marginal improvement gets tiny. That's the elbow. That's your k.
The intuition: Below the elbow, each extra cluster captures a real group. Above the elbow, you're just splitting noise.
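Here is one way the elbow search might look in code. It is a sketch under stated assumptions: the data is synthetic (three made-up blobs), the helper names `kmeans_inertia` and `inertia_curve` are invented for this example, and each k is run from several random starts with the best result kept, so that one unlucky initialization doesn't distort the curve.

```python
import math
import random


def kmeans_inertia(points, k, rng, max_iters=100):
    """Run a basic k-means once; return inertia (sum of squared distances)."""
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; empty clusters stay put.
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)


def inertia_curve(points, k_values, restarts=20, seed=0):
    """Best-of-`restarts` inertia for each k."""
    rng = random.Random(seed)
    return {
        k: min(kmeans_inertia(points, k, rng) for _ in range(restarts))
        for k in k_values
    }


# Synthetic data: three made-up blobs, purely for illustration.
data_rng = random.Random(42)
blob_centers = [(0, 0), (10, 0), (5, 8)]
points = [
    (data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0))
    for cx, cy in blob_centers
    for _ in range(50)
]

curve = inertia_curve(points, range(1, 7))
for k, inertia in curve.items():
    print(f"k={k}: inertia={inertia:.1f}")
```

On data like this, the printed inertia should fall steeply up to k=3 and then flatten. That flattening is the bend you would look for on a plot.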
A real customer dataset (annual spend + visit frequency). Find natural groups. Use the elbow method to pick k. Visualize the segments. This is what your data team is doing in their Slack #segmentation channel today.
K-means is the workhorse, but it's not the only tool. Each of these handles things k-means can't.
Hierarchical (agglomerative) clustering. Builds a tree of merges: start with each point as its own cluster, then repeatedly merge the closest pair. You get every level of clustering at once.
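The merge loop described above might be sketched like this. One assumption is baked in: "closest pair" is measured with single linkage (the distance between the two nearest members of each cluster), which is only one of several common choices. The function name `agglomerative` and the sample points are invented for the example, and the brute-force search is fine for a handful of points but far too slow for real datasets.

```python
import math


def agglomerative(points):
    """Single-linkage agglomerative clustering; returns the merge history."""
    # Start with every point in its own cluster (stored as lists of indices).
    clusters = [[i] for i in range(len(points))]
    merges = []

    def linkage(a, b):
        # Single linkage: distance between the closest pair across two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the two clusters that are currently closest to each other.
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        distance = linkage(clusters[a], clusters[b])
        merges.append((sorted(clusters[a]), sorted(clusters[b]), distance))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected

    return merges


# Made-up points: two tight pairs and one far-away straggler.
sample = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
history = agglomerative(sample)
for left, right, distance in history:
    print(f"merge {left} + {right} at distance {distance:.2f}")
```

The merge history is the tree. Cutting it at a chosen distance (say, ignoring every merge above 2.0) gives one flat clustering, and cutting higher or lower gives coarser or finer ones.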
DBSCAN. Finds clusters of any shape (not just blobs). Marks outliers as "noise" instead of forcing them into a cluster. Doesn't need k upfront, just two density parameters.
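A compact sketch of the density idea follows, in the style of DBSCAN. The two density parameters are named `eps` (neighborhood radius) and `min_pts` (how many neighbors make a region "dense") here; those names, the `-1` noise label, and the sample points are assumptions of this example, and the brute-force neighbor search is only suitable for tiny datasets.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE (-1) for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # i is a core point: grow a new cluster outward from it.
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core, keep expanding
        cluster_id += 1
    return labels


# Made-up points: two dense squares and one isolated outlier.
sample = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
result = dbscan(sample, eps=1.5, min_pts=3)
print(result)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Notice that the number of clusters is never passed in. It falls out of how many dense regions the two parameters uncover, and the isolated point is left as noise rather than forced into a group.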
PCA (Principal Component Analysis). Different goal: compress many dimensions into a few while keeping the structure. Used before plotting high-D data, or as input to other models.
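To make "compress while keeping the structure" concrete, here is a small sketch that finds only the first principal component, the single direction along which the data varies most, and projects each row onto it. It uses power iteration on the covariance matrix, which is one simple way to get that direction and not how full PCA libraries typically compute it. The function names and the sample rows are invented for the example, and the sketch assumes the starting vector is not exactly perpendicular to the answer.

```python
import math


def first_principal_component(rows, iters=200):
    """Return (mean, direction), where direction is the top principal axis."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]

    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]

    # Power iteration: repeatedly applying cov pulls a vector toward the
    # eigenvector with the largest eigenvalue (the direction of most variance).
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, direction):
    """Compress each row to one number: its coordinate along `direction`."""
    return [
        sum((r[j] - mean[j]) * direction[j] for j in range(len(mean)))
        for r in rows
    ]


# Made-up 3-D rows that mostly vary along one line (y is roughly 2x, z is tiny).
rows = [(1, 2, 0.1), (2, 4.1, 0), (3, 5.9, 0.1), (4, 8, 0), (5, 10.1, 0.1)]
mean, direction = first_principal_component(rows)
projected = project(rows, mean, direction)
print(direction)
print(projected)
```

Three columns become one, yet the ordering and spacing of the rows along their main trend is preserved. That is the trade PCA makes: fewer dimensions, most of the variance kept.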
You ran k-means yourself, watched it converge, segmented real customers, and know when to reach for DBSCAN or PCA instead. That covers the practical unsupervised toolkit — and you've now done the full classical ML quartet: regression, classification, trees, and clustering.