AI Skill Course · Course 2 · Intermediate
Module 04 of 12 · 75 minutes

When linear
isn't enough,
trees branch.

Decision trees split your data with questions until each piece is pure. Random forests grow many trees and let them vote. Gradient boosting trains trees in sequence, each one fixing the mistakes of the trees before it. Ensembles like these still win many tabular-data Kaggle competitions, and they're easier to understand than a neural net.

You'll build: a tree by hand
You'll compare: tree vs forest vs boosting
You'll see: why ensembles win
[Figure: a decision tree that asks age < 30?, then income > 50k? or visits > 3?, with buy/skip leaves on the yes/no branches]
Part 01 · The why

Four problems where
regression can't help,
but a tree can.

Linear regression draws one line. That's powerful, but limited. Real data often needs different rules in different regions — exactly what trees do naturally.

// Problem 01

Non-linear interactions

"Customers under 30 buy if income is high. Customers over 30 buy if they've visited the site 3+ times." A regression line can't say "it depends." A tree can.

Trees handle: if age < 30 & income > 50k → buy
if age >= 30 & visits > 3 → buy
else → skip
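To see a tree learn the "it depends" rule, here is a minimal sketch on synthetic customers labeled with exactly the rule above. The data, the seed, and the max_depth=3 setting are illustrative assumptions; scikit-learn's export_text prints the fitted tree as nested conditions you can compare against the rule.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 500
# Hypothetical customers: age in years, income in $k, number of site visits
age = rng.integers(18, 70, size=n)
income = rng.integers(20, 120, size=n)
visits = rng.integers(0, 10, size=n)
X = np.column_stack([age, income, visits])

# Label with the rule from the text: 1 = buy, 0 = skip
y = (((age < 30) & (income > 50)) | ((age >= 30) & (visits > 3))).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age", "income", "visits"]))
print("training accuracy:", tree.score(X, y))
```

The printed rules may start from a different feature than the rule as written (a greedy tree splits first on whatever reduces impurity most), but the branches encode the same "it depends" logic.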
// Problem 02

Mixed feature types

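Categorical features ("city = Mumbai") and numeric features ("salary = 85k") side by side. Regression typically needs encoding, scaling, and normalization. Trees just split: a split only asks whether a value is above a threshold (or in a set), so rescaling a feature never changes which points go left or right. One caveat: scikit-learn's DecisionTreeClassifier still expects categories encoded as numbers, while some implementations, such as LightGBM and scikit-learn's HistGradientBoostingClassifier, can split on categories natively.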

Trees handle: if city in [Mumbai, Delhi] → branch left
if salary > 80k → branch right
no scaling or normalization required
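A quick sketch of why trees skip scaling, using made-up numeric features (random data, purely illustrative): standardizing the columns with StandardScaler changes every threshold value but not the ordering of the points, so the tree chooses the same feature at every node and makes the same predictions. This covers numeric features only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
# Hypothetical features on very different scales: salary in dollars, years of tenure
X = np.column_stack([
    rng.integers(20_000, 150_000, size=300),
    rng.integers(0, 30, size=300),
])
y = ((X[:, 0] > 80_000) & (X[:, 1] > 10)).astype(int)

X_scaled = StandardScaler().fit_transform(X)

raw_tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
scaled_tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_scaled, y)

# Same feature chosen at every node and same predictions; only the threshold values differ
same_structure = np.array_equal(raw_tree.tree_.feature, scaled_tree.tree_.feature)
same_predictions = np.array_equal(raw_tree.predict(X), scaled_tree.predict(X_scaled))
print(same_structure, same_predictions)
```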
// Problem 03

You need to explain it

A neural net, or even a regression with 200 features, is hard to explain. A shallow tree is a flowchart you can show to a regulator, a manager, or a customer. "Why was I rejected?" → trace the path.

Trees handle: "Your loan was denied because
credit_score < 650 AND
debt_ratio > 0.4"
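Here is a sketch of "trace the path" in scikit-learn, assuming a hypothetical two-feature loan dataset labeled by the rule above. The data and the explain helper are illustrative, not a scikit-learn API; the helper relies on decision_path and apply, plus the fitted tree's tree_.feature and tree_.threshold arrays, to turn each node an applicant visits into a readable condition.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
n = 400
# Hypothetical applicants: credit score and debt-to-income ratio
credit_score = rng.integers(500, 850, size=n)
debt_ratio = rng.uniform(0.0, 0.8, size=n).round(2)
X = np.column_stack([credit_score, debt_ratio])
# 1 = denied, following the rule in the text
y = ((credit_score < 650) & (debt_ratio > 0.4)).astype(int)

feature_names = ["credit_score", "debt_ratio"]
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

def explain(clf, x, feature_names):
    """Hypothetical helper: list the conditions on the root-to-leaf path for one row x."""
    row = x.reshape(1, -1)
    node_ids = clf.decision_path(row).indices   # every node this row passes through
    leaf = clf.apply(row)[0]                    # the leaf it ends in
    tree = clf.tree_
    conditions = []
    for node in node_ids:
        if node == leaf:
            continue                            # leaves hold a prediction, not a test
        feature = tree.feature[node]
        threshold = tree.threshold[node]
        op = "<=" if x[feature] <= threshold else ">"
        conditions.append(f"{feature_names[feature]} {op} {threshold:.2f}")
    return conditions

applicant = np.array([610, 0.55])
decision = clf.predict(applicant.reshape(1, -1))[0]
print("decision:", "denied" if decision == 1 else "approved")
print("because:", " AND ".join(explain(clf, applicant, feature_names)))
```

scikit-learn sends a row left when its value is `<=` the threshold, so the printed conditions use `<=` and `>` rather than the `<` in the rule above.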
// Problem 04

You don't know the shape yet

For linear regression you have to guess: is this relationship linear? Quadratic? With trees you don't guess. They approximate whatever shape the data has with piecewise-constant steps.

Trees handle: No assumption about the shape
of the input-output relationship.
Just split where it helps.
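A sketch of that claim, using one period of a sine wave as a stand-in for an unknown non-linear relationship (the data is synthetic): a straight line fit by LinearRegression misses the curve, while a DecisionTreeRegressor with max_depth=4 approximates it with at most 16 flat steps, one per leaf.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# A clearly non-linear relationship: one full sine wave
x = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(x).ravel()

linear = LinearRegression().fit(x, y)
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(x, y)

# R^2 on the training data: the tree's flat steps trace the curve, the line cannot
print("linear R^2:", round(linear.score(x, y), 3))
print("tree   R^2:", round(tree.score(x, y), 3))
print("distinct tree outputs:", len(np.unique(tree.predict(x))))
```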
Part 02 · Hands on

Build a tree.
One split at a time.

Below: 60 customers, plotted by how many months they've been with the company and their monthly spend. Cyan = stayed, rose = churned. The pattern is non-linear (no single straight line separates the two groups), but a tree with 3-4 splits can. You pick the splits.

How to play.

Click a candidate split below. The dashed amber line appears on the plot, splitting one region into two. Each region becomes a colored prediction (cyan or rose) based on the majority class inside. Track your accuracy. Try to get above 90% — most learners need 3-4 splits.

[Interactive: Customer churn, predict who leaves. Scatter plot of months with company vs. monthly spend ($), cyan dots stayed, rose dots churned. Pick candidate splits to grow the tree while counters track splits made, leaf regions, and accuracy.]
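The game's scoring can be sketched in plain Python. This assumes a hand-built tree written as nested tuples (feature index, threshold, left subtree, right subtree), with None marking a leaf; the six customers and the leaf_of and score helpers are hypothetical. Each leaf predicts the majority class of the points that land in it, and accuracy is the share of points whose leaf prediction matches their label.

```python
from collections import Counter, defaultdict

# Hypothetical customers: [months with company, monthly spend ($)]; 1 = churned, 0 = stayed
X = [[2, 90], [3, 80], [30, 40], [40, 30], [5, 20], [50, 95]]
y = [1, 1, 0, 0, 0, 0]

def leaf_of(tree, x, path=""):
    """Follow the splits for one point; return its leaf as a path string like 'LR'."""
    if tree is None:
        return path
    feature, threshold, left, right = tree
    if x[feature] < threshold:
        return leaf_of(left, x, path + "L")
    return leaf_of(right, x, path + "R")

def score(tree, X, y):
    """Label each leaf with its majority class, then measure accuracy."""
    members = defaultdict(list)
    for x, label in zip(X, y):
        members[leaf_of(tree, x)].append(label)
    leaf_labels = {leaf: Counter(labels).most_common(1)[0][0] for leaf, labels in members.items()}
    correct = sum(leaf_labels[leaf_of(tree, x)] == label for x, label in zip(X, y))
    return leaf_labels, correct / len(y)

no_splits = None                                 # one region: predict the majority class everywhere
two_splits = (0, 12, (1, 50, None, None), None)  # months < 12? then spend < 50?

print(score(no_splits, X, y))
print(score(two_splits, X, y))
```

With no splits, every customer gets the majority label ("stayed"), so accuracy is 4 of 6. Two splits isolate the short-tenure, high-spend churners and every leaf becomes pure.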

Part 03 · How the math picks splits

The tree's three
simple rules.

You were guessing which split looked good. A real algorithm does it systematically, by trying every possible split and picking the one that reduces "impurity" the most.

01

Try every split

For each feature, for each possible threshold, ask: if I split here, how pure are the two resulting groups?

"Pure" = all one class. "Impure" = mixed.
02

Measure with Gini

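Gini impurity is 0 when a group is all one class and 0.5 when two classes are split 50/50. The algorithm picks the split with the lowest weighted Gini: each child group's Gini, weighted by its share of the samples, summed over both children.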

Gini = 1 − (p_cyan² + p_rose²)
03

Repeat until pure (or you stop it)

Each branch can split again. Without a stopping rule, a tree keeps splitting until every leaf is pure, sometimes down to a single sample per leaf. That's overfitting. The fixes: stopping rules like the ones below, or averaging many trees (random forests).

max_depth · min_samples_split · min_samples_leaf
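The first two rules can be sketched from scratch in NumPy. gini and best_split are illustrative helper names, not scikit-learn functions, and real implementations avoid re-filtering the data for every threshold by sorting each feature and updating class counts incrementally. For the four toy points below, splitting feature 0 at 2.5 separates the classes perfectly, so the weighted Gini drops from 0.5 to 0.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a group: 1 minus the sum of squared class proportions."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Try every feature and every midpoint threshold; keep the lowest weighted Gini."""
    n = len(y)
    best = (None, None, gini(y))  # (feature, threshold, weighted Gini); starts as "no split"
    for feature in range(X.shape[1]):
        values = np.unique(X[:, feature])
        for threshold in (values[:-1] + values[1:]) / 2:
            left = y[X[:, feature] <= threshold]
            right = y[X[:, feature] > threshold]
            weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
            if weighted < best[2]:
                best = (feature, threshold, weighted)
    return best

X = np.array([[1, 10], [2, 20], [3, 30], [4, 40]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))
```

Rule 03 is this same search applied again inside each child group, stopping when a group is pure or a limit like max_depth is reached.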
Part 04 · The three-way fight

Single tree. Random forest.
Gradient boosting. On the same data.

Same dataset (1000 customers, churn prediction). Three models. Watch test accuracy climb and the overfitting gap (training accuracy minus test accuracy) shrink.

[Interactive: the comparison runs in the browser with Python (Pyodide) and scikit-learn.]
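The module's own 1000-customer dataset isn't reproduced here, so this sketch assumes a synthetic stand-in from make_classification (the feature counts, split, and seeds are illustrative). It trains the three models with scikit-learn and prints train accuracy, test accuracy, and the gap between them; the exact numbers depend on the data, so run it rather than assume which model wins.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 1000-customer churn dataset (not the module's actual data)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

models = {
    "Decision tree": DecisionTreeClassifier(random_state=42),          # grown until leaves are pure
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Gradient boosting": GradientBoostingClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    results[name] = (train_acc, test_acc)
    # Overfitting gap: how much better the model does on data it has already seen
    print(f"{name:18s} train={train_acc:.3f} test={test_acc:.3f} gap={train_acc - test_acc:.3f}")
```

The unrestricted single tree scores perfectly on its own training data, which is exactly why its train/test gap is the one to watch.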
Part 05 · The three families

When to reach for
which.

These three families cover most tabular ML work in production. Memorize the trade-offs.

Single tree

Decision Tree

One tree. Easy to explain when kept shallow (a flowchart you can read). Grown deep, it is famously prone to overfitting: it memorizes the training data instead of generalizing.

Accuracy: Decent
Speed: Very fast
Interpretable: Yes, fully
Overfits: Easily
In sklearn: DecisionTreeClassifier
Bagging

Random Forest

100+ trees, each trained on a bootstrap sample of the rows (drawn with replacement), with each split considering only a random subset of features. They vote. Averaging the votes cancels out much of the individual trees' overfitting.

Accuracy: Strong
Speed: Fast
Interpretable: Partly
Overfits: Rarely
In sklearn: RandomForestClassifier
Boosting

Gradient Boosting

Trees in a sequence. Each new tree trains specifically on what the previous trees got wrong. Slower to train, but often the most accurate option on tabular data.

Accuracy: Best
Speed: Slower to train
Interpretable: Partly
Overfits: Possible
In sklearn: GradientBoostingClassifier
Industry favorites: XGBoost · LightGBM
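The two ensemble ideas on these cards can each be sketched in a few lines on top of scikit-learn's single trees. This is a simplified illustration, not how RandomForestClassifier or GradientBoostingClassifier are implemented internally. The hypothetical bagged_vote trains trees on bootstrap samples (with a random feature subset per split via max_features="sqrt") and takes a majority vote. The hypothetical boost_regression uses squared-error loss, where "what the previous trees got wrong" is simply the residual, the target minus the current prediction. The tree counts and learning rate of 0.1 are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

def bagged_vote(X_train, y_train, X_test, n_trees=25, seed=0):
    """Bagging: each tree sees a bootstrap sample; predict by majority vote (0/1 labels)."""
    rng = np.random.default_rng(seed)
    votes = []
    for i in range(n_trees):
        rows = rng.integers(0, len(X_train), size=len(X_train))  # sample rows with replacement
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
        tree.fit(X_train[rows], y_train[rows])
        votes.append(tree.predict(X_test))
    return (np.mean(votes, axis=0) > 0.5).astype(int)  # odd n_trees means no ties

def boost_regression(X, y, n_rounds=50, learning_rate=0.1):
    """Boosting with squared error: each shallow tree fits the current residuals."""
    prediction = np.full(len(y), y.mean())
    trees = []
    for _ in range(n_rounds):
        residual = y - prediction  # what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residual)
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees, prediction

# Usage on synthetic data
Xc, yc = make_classification(n_samples=500, random_state=0)
bagged = bagged_vote(Xc[:400], yc[:400], Xc[400:])
print("bagged test accuracy:", np.mean(bagged == yc[400:]))

Xr, yr = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
for rounds in (5, 50):
    _, fitted = boost_regression(Xr, yr, n_rounds=rounds)
    print(rounds, "rounds, training MSE:", round(float(np.mean((yr - fitted) ** 2)), 1))
```

The learning rate shrinks each tree's correction, so many small steps replace one large one. That is the usual trade: more rounds and slower training in exchange for less overfitting.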
Part 06 · Knowledge check

Five questions on
what you just grew.

Aim for 4/5. Wrong answers explain themselves.

Course 2 · Module 04 complete

You now know the core
of practical tabular ML.

Trees, forests, boosting: these three algorithm families dominate most real-world tabular ML problems. You built a tree by hand. You trained all three with sklearn. You know when to reach for which. You're genuinely a working ML practitioner now.

Up next · Course 2 · Module 05

Unsupervised Learning

Until now you've had labels. What if you don't? Clustering finds groups in data without being told what to look for. You'll play with a live k-means visualization, adjust k with a slider, and watch the centroids dance.
