Course 2 · Module 03 · 75 minutes

Your first predictive model.

Regression is the workhorse of practical ML. Predicting house prices, sales, ad clicks, exam scores — it's all the same trick: find the line (or curve) that best matches your data. In the next 75 minutes, you'll do this by hand, then with scikit-learn.

You'll drag

A real fit line

You'll train

A real sklearn model

You'll predict

On unseen data


Part 01 · The big idea

Regression is just prediction with a line.

Given some examples of "input → output", regression finds the best line (or curve) that maps inputs to outputs. Then you use that line to predict outputs for new inputs you haven't seen. That's it. That's the whole game.

Input

Square feet

↓ regression model ↓

Predicts

House price

Train on 10,000 past sales, predict the price of a new listing.

Input

Hours studied

↓ regression model ↓

Predicts

Exam score

The example you're about to fit by hand. Below.

Input

Marketing spend

↓ regression model ↓

Predicts

Q4 revenue

Many business analytics dashboards run a model like this under the hood.

Part 02 · Hands on

Drag the line. Find the best fit.

15 data points: study hours vs exam score for a simulated class. Your job: drag the amber line endpoints so the line "best matches" all the points. Watch the error (MSE) drop as you get closer.

How it works.

Grab either endpoint of the amber line and drag. The dashed lines show residuals: the vertical gap between each point and your line. The colored squares show squared errors (the bigger the square, the worse the prediction). Try to minimize the total. When you're satisfied, click "Find best fit" to see what the math computes, and how close you got.
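What the widget computes as you drag can be sketched in a few lines of plain Python. The five data points below are made up for illustration (they are not the module's 15 points), and the helper names `line_from_endpoints` and `mean_squared_error` are hypothetical, not taken from the course code.

```python
# Hypothetical study-hours data, invented for this sketch.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 58.0, 65.0, 69.0, 76.0]


def line_from_endpoints(x1, y1, x2, y2):
    """Slope m and intercept b of the line through two dragged endpoints."""
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    return m, b


def mean_squared_error(xs, ys, m, b):
    """Average of the squared residuals for the line y = m*x + b."""
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals) / len(residuals)


# A reasonable hand-drawn guess: endpoints at (0, 45) and (6, 81).
m, b = line_from_endpoints(0.0, 45.0, 6.0, 81.0)
good_guess_mse = mean_squared_error(hours, scores, m, b)

# A poor guess: a flat line at the average score.
flat_mse = mean_squared_error(hours, scores, 0.0, 64.0)

print(f"guess: y = {m:.1f}x + {b:.1f}, MSE = {good_guess_mse:.2f}")
print(f"flat line: MSE = {flat_mse:.2f}")
```

On this toy data the sloped guess scores an MSE of 1.4 while the flat line scores 70, which is the same drop you see in the widget as the line rotates toward the points.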

Study hours vs exam score

Drag the amber endpoints to adjust the line.

Your line: y = m·x + b

Slope (m): score gain per study hour
Intercept (b): predicted score at 0 hours
Mean squared error: lower is better

Part 03 · The math (just enough)

How the math knows the "best" line.

You don't need calculus to understand this. Three concepts. That's it.

The line equation

A line in 2D is defined by two numbers: how steep it is (slope) and where it crosses the y-axis (intercept).

y = m·x + b

Measure the error

For each point, the error is the gap between the actual y and what the line predicts. Square it, so positive and negative errors don't cancel and big mistakes hurt more than small ones. Average the squares over all n points to get the mean squared error (MSE).

MSE = (1/n) · Σ(y_actual − y_pred)²

Find m, b that minimize it

Calculus gives you an exact formula. No guessing, no iteration. The optimal m and b can be computed in one shot — which is why linear regression is so fast.

argmin_m,b Σ(y − (mx+b))²
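For a single input, the one-shot formula is the standard least-squares result: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − m·x̄. Here is a minimal sketch in plain Python, using made-up data and a hypothetical helper name `fit_line`.

```python
# Hypothetical study-hours data, invented for this sketch.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 58.0, 65.0, 69.0, 76.0]


def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept for one input variable."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # How x and y vary together, and how much x varies on its own.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    m = s_xy / s_xx
    b = y_mean - m * x_mean
    return m, b


m, b = fit_line(hours, scores)
print(f"best fit: y = {m:.2f}x + {b:.2f}")  # slope about 5.9, intercept about 46.3

# Use the fitted line on an input that was not in the data.
predicted = m * 6.0 + b
print(f"predicted score after 6 hours: {predicted:.1f}")
```

There is no loop over candidate lines here: two averages and two sums give the minimizing m and b directly.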

Part 04 · Real model on real data

Now let scikit-learn do it. Properly.

In practice, you rarely compute regressions by hand. You use scikit-learn: a few lines of code give you a trained model. Let's predict house prices from square footage, bedrooms, and age.
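The module's actual dataset and notebook code are not reproduced here, so the following is only a sketch of the usual scikit-learn workflow. The data is synthetic, and the price formula used to generate it (150 per square foot, 10,000 per bedroom, minus 1,000 per year of age, plus noise) is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic listings (assumed, not the course dataset).
rng = np.random.default_rng(0)
n = 500
sqft = rng.uniform(500, 3500, n)
bedrooms = rng.integers(1, 6, n)  # 1 to 5 bedrooms
age = rng.uniform(0, 50, n)
price = 150 * sqft + 10_000 * bedrooms - 1_000 * age + rng.normal(0, 20_000, n)

# One row per house, one column per feature.
X = np.column_stack([sqft, bedrooms, age])

# Hold out 20% of the houses so the model is scored on data it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

# The core three lines: create, fit, predict.
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

test_mse = mean_squared_error(y_test, y_pred)
test_r2 = r2_score(y_test, y_pred)
print("coefficients (sqft, bedrooms, age):", model.coef_)
print("intercept:", model.intercept_)
print(f"test MSE: {test_mse:.0f}, test R^2: {test_r2:.3f}")
```

With three inputs the model fits a plane rather than a line, but the idea is unchanged: one coefficient per feature plus an intercept, chosen to minimize squared error on the training rows.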


Part 05 · The bigger picture

Two regressions you'll meet every day.

"Linear regression" predicts a number. "Logistic regression" predicts the probability of a category. Despite the name, logistic regression is a classification algorithm, and one of the most widely used.

Linear regression

Predicts a number.

How much? How long? How many?

Output type

A continuous number

House prices (₹85 lakhs)
Sales next quarter ($2.4M)
Exam scores (87/100)
Delivery time (32 mins)

Logistic regression

Predicts a probability.

Will they? Is it? Yes or no?

Output type

A probability (0 to 1)

Spam? (yes/no)
Will user churn? (probability)
Click the ad? (yes/no)
Loan default? (probability)
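To make the contrast concrete, here is a hedged sketch of logistic regression in scikit-learn. The pass/fail data is synthetic and the rule that generates it (a student passes when study hours plus some noise exceed 5) is an assumption for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data (assumed): hours studied and whether the student passed.
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, 200)
passed = (hours + rng.normal(0, 1.5, 200) > 5).astype(int)  # 1 = pass, 0 = fail

X = hours.reshape(-1, 1)  # scikit-learn expects a 2D feature array

clf = LogisticRegression()
clf.fit(X, passed)

new_students = [[2.0], [8.0]]
# predict_proba returns one column per class; column 1 is P(pass).
proba = clf.predict_proba(new_students)[:, 1]
# predict applies a 0.5 threshold to turn the probability into a yes/no label.
labels = clf.predict(new_students)

print("P(pass) at 2 and 8 hours:", proba)
print("predicted labels:", labels)
```

The model's raw output is a probability between 0 and 1. The yes/no answer comes from thresholding that probability, which is why the same algorithm serves both the "probability" and the "yes/no" examples above.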

Part 06 · Knowledge check

Five questions on what you just fit.

Aim for 4/5. Wrong answers explain themselves.


Course 2 · Module 03 complete

You trained a real ML model. From scratch.

You fit a line by hand. You watched the math beat you (slightly). You trained a scikit-learn model. You predicted prices on data the model never saw. That's the entire ML loop in 75 minutes. The rest of the course is variations on this theme.

Up next · Course 2 · Module 04

Supervised Learning II — Trees & Forests

Decision trees, random forests, gradient boosting: the algorithms that dominated Kaggle competitions on tabular data before deep learning came along (and still win many of them). You'll build a tree by hand, then train a forest in two lines.

