Decision trees split your data with questions until each piece is pure. Random forests grow many trees and let them vote. Gradient boosting trains trees in sequence, each one correcting the mistakes of those before it. These three algorithms still win many Kaggle competitions on tabular data, and they're easier to understand than any neural net.
Linear regression draws one line. That's powerful, but limited. Real data often needs different rules in different regions, which is exactly what trees do naturally.
"Customers under 30 buy if income is high. Customers over 30 buy if they've visited the site 3+ times." A regression line can't say "it depends." A tree can.
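Categorical features ("city = Mumbai") and numeric features ("salary = 85k") side by side. Regression needs categories encoded and usually benefits from scaling. Trees just split and never need scaling. One caveat: scikit-learn's trees still expect categories encoded as numbers, while libraries such as LightGBM can take them natively.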
A neural net or even a regression with 200 features is a black box. A tree is a flowchart you can show to a regulator, a manager, or a customer. "Why was I rejected?" → trace the path.
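To make the "trace the path" idea concrete, here is a minimal sketch using scikit-learn. The loan-style data, the feature names `income_k` and `missed_payments`, and the applicant are all hypothetical, invented only for illustration. `export_text` prints the fitted tree as readable rules, and `decision_path` reports which nodes a single sample passed through.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: [income_k, missed_payments] -> approved (1) or rejected (0).
X = [[30, 0], [35, 2], [45, 1], [60, 0], [75, 0], [80, 3], [90, 0], [55, 4]]
y = [0, 0, 0, 1, 1, 0, 1, 0]
feature_names = ["income_k", "missed_payments"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the fitted tree as indented if/else rules (the "flowchart").
rules = export_text(clf, feature_names=feature_names)
print(rules)

# Trace one applicant: which tree nodes did this sample pass through?
applicant = [[80, 3]]
prediction = clf.predict(applicant)[0]
node_ids = list(clf.decision_path(applicant).indices)
print("prediction:", prediction)
print("nodes visited:", node_ids)
```

Reading the printed rules from the root down to the leaf the applicant lands in answers "Why was I rejected?" in plain thresholds.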
For linear regression you have to guess: is this relationship linear? Quadratic? With trees you don't guess — they learn whatever piecewise shape the data has.
Below: 60 customers plotted by months with company and monthly spend. Cyan = stayed, rose = churned. The pattern is non-linear (a single line can't separate them) — but a tree of 3-4 splits can. You pick the splits.
Click a candidate split below. The dashed amber line appears on the plot, splitting one region into two. Each region becomes a colored prediction (cyan or rose) based on the majority class inside. Track your accuracy. Try to get above 90% — most learners need 3-4 splits.
You were guessing which split looked good. A real algorithm does it systematically — by literally trying all possible splits and picking the one that reduces "impurity" most.
For each feature, for each possible threshold, ask: if I split here, how pure are the two resulting groups?
Gini impurity is 0 when a group is all one class and, with two classes, peaks at 0.5 when they are split 50/50. Algorithms pick the split that lowers the weighted Gini of the two resulting groups the most.
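The search described above can be sketched in plain Python. This is an illustrative version, not how any particular library implements it: the helper names (`gini`, `weighted_gini`, `best_split`) and the six toy customers are made up here, and using midpoints between neighbouring values as candidate thresholds is an assumption (a common convention).

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a split: each side's Gini weighted by its share of samples."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(rows, labels):
    """Try every feature and every midpoint threshold; keep the lowest weighted Gini."""
    best = None  # (score, feature_index, threshold)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = weighted_gini(left, right)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best

print(gini(["stayed", "stayed"]))    # 0.0 (pure group)
print(gini(["stayed", "churned"]))   # 0.5 (50/50 group)

# Hypothetical customers: (months with company, monthly spend).
rows = [(2, 80), (3, 20), (5, 60), (12, 30), (18, 90), (24, 50)]
labels = ["churned", "churned", "churned", "stayed", "stayed", "stayed"]
print(best_split(rows, labels))      # (0.0, 0, 8.5): split on months at 8.5
```

A full tree builder would call `best_split` again on each of the two resulting groups.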
Each branch can split again. Without a stopping rule, a tree keeps splitting until every leaf is pure, often down to a single sample per leaf. That's overfitting. Hence: random forests.
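To see the overfitting itself before moving on, here is a minimal sketch comparing a tree with no stopping rule against one capped with `max_depth`. The data is synthetic, generated by `make_classification` with 10% label noise as a stand-in for a real dataset, and the depth of 4 is an arbitrary choice for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; flip_y adds label noise that a deep tree will memorize.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

settings = {"no stopping rule": {}, "max_depth=4": {"max_depth": 4}}
results = {}
for name, params in settings.items():
    tree = DecisionTreeClassifier(random_state=0, **params).fit(X_train, y_train)
    results[name] = {
        "leaves": tree.get_n_leaves(),
        "depth": tree.get_depth(),
        "train_acc": tree.score(X_train, y_train),
        "test_acc": tree.score(X_test, y_test),
    }
    print(name, results[name])
```

The unconstrained tree fits the training set perfectly by growing many more leaves. The number to watch is the gap between `train_acc` and `test_acc`.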
Same dataset (1000 customers, churn prediction). Three models. Watch the accuracy climb — and the overfitting gap shrink.
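A comparison like this can be sketched with scikit-learn. The churn dataset itself is not available here, so the code below assumes 1000 synthetic rows from `make_classification`. The hyperparameters are library defaults apart from `random_state`, and the exact scores will differ from the lesson's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 1000-customer churn data (an assumption).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    results[name] = (train_acc, test_acc)
    # The overfitting gap is train accuracy minus test accuracy.
    print(f"{name:18s} train={train_acc:.3f} test={test_acc:.3f} "
          f"gap={train_acc - test_acc:.3f}")
```

All three models share the same `fit` / `score` interface, so swapping one for another is a one-line change.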
These three are 80% of all tabular ML work in production. Memorize the trade-offs.
One tree, grown deep. Easy to explain (a flowchart you can read). Famously prone to overfitting: it memorizes training data instead of generalizing.
DecisionTreeClassifier
100+ trees, each trained on a random sample of the data with a random subset of features. They vote. The averaging cancels out individual trees' overfitting.
RandomForestClassifier
Trees in a sequence. Each new tree trains specifically on what the previous trees got wrong. Slower to train, but often the highest accuracy on tabular data.
GradientBoostingClassifier · XGBoost · LightGBM
Aim for 4/5. Wrong answers explain themselves.
Trees, forests, boosting: these three algorithms dominate most real-world tabular ML problems. You built a tree by hand. You trained all three with sklearn. You know when to reach for which. You're genuinely a working ML practitioner now.