Course 2 · Module 04 · 75 minutes

When linear
isn't enough,
trees branch.

Decision trees split your data with questions until each piece is pure. Random forests grow many trees and let them vote. Gradient boosting trains trees in sequence, each one correcting the mistakes of the trees before it. These three algorithms still win many Kaggle competitions on tabular data, and they're easier to understand than any neural net.

You'll build: A tree by hand

You'll compare: Tree vs Forest vs Boost

You'll see: Why ensembles win

Part 01 · The why

Four problems where
regression can't help —
but a tree can.

Linear regression draws one line. That's powerful, but limited. Real data often needs different rules in different regions — exactly what trees do naturally.

// Problem 01

Non-linear interactions

"Customers under 30 buy if income is high. Customers over 30 buy if they've visited the site 3+ times." A regression line can't say "it depends." A tree can.

Trees handle: if age < 30 & income > 50k → buy
if age >= 30 & visits >= 3 → buy
else → skip
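
The rule above reads directly as nested conditions. A minimal sketch, assuming hypothetical inputs named `age`, `income`, and `visits`, and taking "3+ times" to mean `visits >= 3`. A real tree would learn these thresholds from data rather than have them written by hand.

```python
def predict_purchase(age: float, income: float, visits: int) -> str:
    """Hand-written version of the two-branch rule from the example."""
    if age < 30:
        # Younger customers: the decision depends on income.
        return "buy" if income > 50_000 else "skip"
    # Customers aged 30 and over: the decision depends on site visits.
    return "buy" if visits >= 3 else "skip"


print(predict_purchase(age=25, income=80_000, visits=0))  # buy
print(predict_purchase(age=45, income=80_000, visits=1))  # skip
print(predict_purchase(age=45, income=20_000, visits=5))  # buy
```

Note that income matters only on one side of the age split and visits only on the other. That "it depends" structure is what a single regression line cannot express without hand-built interaction terms.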

// Problem 02

Mixed feature types

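Categorical features ("city = Mumbai") and numeric features ("salary = 85k") sit side by side. Regression typically needs encoding, scaling, and normalization. Trees split on either kind of feature and are unaffected by feature scale, though some libraries (scikit-learn among them) still expect categories to be encoded as numbers first.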

Trees handle: if city in [Mumbai, Delhi] → branch left
if salary > 80k → branch right
no scaling or normalization required

// Problem 03

You need to explain it

A neural net or even a regression with 200 features is a black box. A tree is a flowchart you can show to a regulator, a manager, or a customer. "Why was I rejected?" → trace the path.

Trees handle: "Your loan was denied because
credit_score < 650 AND
debt_ratio > 0.4"
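
To make "trace the path" concrete, here is a small sketch using scikit-learn's `export_text`, which prints a fitted tree as readable rules. The nine applicants below are made up for illustration, so the thresholds the tree learns will not match the 650 and 0.4 in the example above.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up applicants: [credit_score, debt_ratio]; 1 = approved, 0 = denied.
X = [
    [720, 0.20], [690, 0.35], [700, 0.50], [660, 0.30], [680, 0.45],
    [640, 0.45], [600, 0.50], [630, 0.20], [610, 0.42],
]
y = [1, 1, 1, 1, 1, 0, 0, 1, 0]

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the fitted tree as an indented, human-readable rule list.
rules = export_text(clf, feature_names=["credit_score", "debt_ratio"])
print(rules)

# Classify one new applicant; the printed rules show which path it followed.
print(clf.predict([[620, 0.48]]))
```

Reading the printed rules from the top down to a leaf gives the explanation for any single prediction.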

// Problem 04

You don't know the shape yet

For linear regression you have to guess: is this relationship linear? Quadratic? With trees you don't guess — they learn whatever piecewise shape the data has.

Trees handle: No assumption about the shape
of the input-output relationship.
Just split where it helps.

Part 02 · Hands on

Build a tree.
One split at a time.

Below: 60 customers plotted by months with company and monthly spend. Cyan = stayed, rose = churned. The pattern is non-linear (a single line can't separate them) — but a tree of 3-4 splits can. You pick the splits.

How to play.

Click a candidate split below. The dashed amber line appears on the plot, splitting one region into two. Each region becomes a colored prediction (cyan or rose) based on the majority class inside. Track your accuracy. Try to get above 90% — most learners need 3-4 splits.

Customer churn — predict who leaves

Cyan dots stayed · rose dots churned

Part 03 · How the math picks splits

The tree's three
simple rules.

You were guessing which split looked good. A real algorithm does it systematically: it tries every candidate split and picks the one that reduces "impurity" the most.

Try every split

For each feature, for each possible threshold, ask: if I split here, how pure are the two resulting groups?

"Pure" = all one class. "Impure" = mixed.

Measure with Gini

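Gini impurity is 0 when a group is all one class and, with two classes, 0.5 when they are split 50/50. Algorithms pick the split that most reduces the weighted Gini, meaning the average impurity of the two resulting groups, weighted by group size.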

Gini = 1 − (p_cyan² + p_rose²)
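
The first two rules can be sketched together in plain Python. This is an illustrative brute-force version, not how production libraries implement it. The function names and the six toy customers are assumptions made for the example, and candidate thresholds are taken as midpoints between consecutive distinct values.

```python
def gini(labels):
    """Gini = 1 - (p_cyan^2 + p_rose^2) for a list of "cyan"/"rose" labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention for an empty group
    p_cyan = labels.count("cyan") / n
    p_rose = labels.count("rose") / n
    return 1 - (p_cyan**2 + p_rose**2)


def weighted_gini(left, right):
    """Size-weighted average impurity of the two groups a split creates."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def best_split(rows, labels):
    """Try every feature and threshold; keep the lowest weighted Gini.
    Returns (score, feature_index, threshold)."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = weighted_gini(left, right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


print(gini(["cyan"] * 10))                # 0.0 (pure)
print(gini(["cyan"] * 5 + ["rose"] * 5))  # 0.5 (50/50)

# Made-up customers: (months_with_company, monthly_spend)
rows = [(2, 30), (3, 80), (5, 40), (14, 35), (18, 90), (24, 60)]
labels = ["rose", "rose", "rose", "cyan", "cyan", "cyan"]
print(best_split(rows, labels))           # (0.0, 0, 9.5)
```

On this toy data, splitting feature 0 (months with company) at 9.5 yields two pure groups, so the weighted Gini drops to 0 and no split on monthly spend can beat it.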

Repeat until pure (or you stop it)

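Each branch can split again. Without a stopping rule, a tree keeps splitting until every leaf is pure, often down to a single sample per leaf. That's overfitting. There are two remedies: stopping rules like the parameters below, and ensembles such as random forests.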

max_depth · min_samples_split · min_samples_leaf
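
A short sketch of what those three parameters do in scikit-learn's `DecisionTreeClassifier`. The dataset is synthetic (`make_moons` with added noise) and the parameter values are illustrative, not tuned recommendations.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy two-class data standing in for a real dataset.
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# No stopping rule: the tree grows until every leaf is pure.
unlimited = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# The three stopping rules named above, with illustrative values.
limited = DecisionTreeClassifier(
    max_depth=4, min_samples_split=20, min_samples_leaf=5, random_state=0
).fit(X_train, y_train)

for name, tree in [("unlimited", unlimited), ("limited", limited)]:
    print(
        name,
        "depth:", tree.get_depth(),
        "leaves:", tree.get_n_leaves(),
        "train acc:", round(tree.score(X_train, y_train), 3),
        "test acc:", round(tree.score(X_test, y_test), 3),
    )
```

The thing to look for is the gap between training and test accuracy: an unrestricted tree fits the training set perfectly, and a large gap to its test score is the signature of overfitting.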

Part 05 · The three families

When to reach for
which.

These three account for the bulk of tabular ML work in production. Memorize the trade-offs.

Single tree

Decision Tree

One tree, grown deep. Easy to explain (a flowchart you can read). Famously prone to overfitting — memorizes training data instead of generalizing.

Accuracy: Decent

Speed: Very fast

Interpretable: Yes, fully

Overfits: Easily

In sklearn: DecisionTreeClassifier

Bagging

Random Forest

100+ trees, each trained on a bootstrap sample of the data (rows drawn at random with replacement), with a random subset of features considered at each split. They vote. Averaging over many trees cancels out much of each individual tree's overfitting.

Accuracy: Strong

Speed: Fast

Interpretable: Partly

Overfits: Rarely

In sklearn: RandomForestClassifier
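
To see the idea without the library doing it for you, here is a hand-rolled sketch of bagging: bootstrap samples, one tree per sample, then a majority vote. The data is synthetic and the tree count of 25 is arbitrary. `RandomForestClassifier` itself differs in details; for example, it averages the trees' predicted probabilities rather than counting hard votes.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy two-class data standing in for a real dataset.
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
rng = np.random.default_rng(0)

trees = []
for i in range(25):
    # Bootstrap sample: draw n row indices at random, with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt" makes each split consider a random subset of features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    trees.append(tree.fit(X[idx], y[idx]))

# Majority vote: labels are 0/1, so the mean prediction is the share of votes for 1.
vote_share = np.mean([t.predict(X) for t in trees], axis=0)
forest_pred = (vote_share > 0.5).astype(int)

print("agreement with labels:", (forest_pred == y).mean())
```

An odd number of trees avoids tied votes in the two-class case.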

Boosting

Gradient Boosting

Trees in a sequence. Each new tree is trained on the errors (residuals) left by the trees before it. Slower to train, but often the most accurate option on tabular data.

Accuracy: Best

Speed: Slower train

Interpretable: Partly

Overfits: Possible

In sklearn: GradientBoostingClassifier
Industry favorites: XGBoost · LightGBM
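
A minimal sketch of putting the three sklearn classes side by side on the same data. The dataset is synthetic (`make_moons`) and all models use default settings apart from `random_state`, so treat the printed numbers as a demonstration of the workflow rather than a benchmark. Rankings can change with the dataset and with tuning.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy two-class data standing in for a real tabular dataset.
X, y = make_moons(n_samples=500, noise=0.3, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 cross-validation folds.
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:18s} {scores[name]:.3f}")
```

Cross-validation is used here instead of a single train/test split so that one lucky or unlucky split does not decide the comparison.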

Course 2 · Module 04 complete

You now know 80%
of practical tabular ML.

Trees, forests, boosting: these three algorithms dominate real-world ML on tabular data. You built a tree by hand. You trained all three with sklearn. You know when to reach for which. You're genuinely a working ML practitioner now.

Up next · Course 2 · Module 05

Unsupervised Learning

Until now you've had labels. What if you don't? Clustering finds groups in data without being told what to look for. You'll play with a live k-means visualization, adjust k with a slider, and watch the centroids dance.
