AI Skill Course · Course 2 · Intermediate · Module 03 of 12 · 75 minutes

Your first predictive model.

Regression is the workhorse of practical ML. Predicting house prices, sales, ad clicks, exam scores — it's all the same trick: find the line (or curve) that best matches your data. In the next 75 minutes, you'll do this by hand, then with scikit-learn.

  • You'll drag a real fit line.
  • You'll train a real scikit-learn model.
  • You'll predict on unseen data.
Part 01 · The big idea

Regression is just prediction with a line.

Given some examples of "input → output", regression finds the best line (or curve) that maps inputs to outputs. Then you use that line to predict outputs for new inputs you haven't seen. That's it. That's the whole game.
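A minimal sketch of that two-step loop in NumPy, assuming a handful of made-up (hypothetical) study-hour examples that happen to lie exactly on a line. `np.polyfit(..., deg=1)` learns the slope and intercept, and plugging a new input into y = mx + b gives the prediction:

```python
import numpy as np

# Hypothetical training examples: hours studied -> exam score.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([52.0, 57.0, 62.0, 67.0, 72.0])

# Step 1: learn the line. With deg=1, np.polyfit returns [slope, intercept].
m, b = np.polyfit(hours, scores, deg=1)

# Step 2: use the line on an input that was not in the training examples.
new_hours = 6.0
predicted = m * new_hours + b
print(f"y = {m:.1f}x + {b:.1f}; predicted score at {new_hours} h: {predicted:.1f}")
# → y = 5.0x + 47.0; predicted score at 6.0 h: 77.0
```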

  • Square feet → regression model → house price. Train on 10,000 past sales, predict the price of a new listing.
  • Hours studied → regression model → exam score. This is the example you're about to fit by hand, below.
  • Marketing spend → regression model → Q4 revenue. Many business analytics dashboards run a model like this under the hood.
Part 02 · Hands on

Drag the line. Find the best fit.

15 data points: study hours vs. exam score for a simulated class. Your job: drag the endpoints of the amber line so that it matches all the points as closely as possible. Watch the mean squared error (MSE) drop as you get closer.

How it works.

Grab either endpoint of the amber line and drag. The dashed lines show residuals — the gap between each point and your line. The colored squares show squared errors (the bigger the square, the worse the prediction). Try to minimize the total. When you're satisfied, click "Find best fit" to see what the math computes — and how close you got.

[Interactive chart: study hours vs. exam score, with draggable amber line endpoints. Readouts: your line (y = mx + b), slope m (score gain per study hour), intercept b (predicted score at 0 hours), and mean squared error (lower is better) compared with the optimal value.]
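The chart's residual and MSE readouts boil down to a few lines of NumPy. A minimal sketch, assuming made-up points (not the chart's actual 15-point dataset) and two hypothetical helpers, `line_from_endpoints` and `mse`, that turn two dragged endpoints into y = mx + b and then score that line:

```python
import numpy as np

def line_from_endpoints(x1, y1, x2, y2):
    """Slope m and intercept b of the line through two dragged endpoints."""
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    return m, b

def mse(x, y, m, b):
    """Mean squared error of the line y = m*x + b on the points (x, y)."""
    residuals = y - (m * x + b)      # the dashed gaps between points and line
    return np.mean(residuals ** 2)   # average area of the colored squares

# Made-up points: study hours -> exam score.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([50.0, 60.0, 58.0, 70.0, 75.0])

# Two guesses, each defined by endpoints at x = 0 and x = 6.
guesses = {
    "A": line_from_endpoints(0, 40, 6, 100),  # steep: y = 10x + 40
    "B": line_from_endpoints(0, 46, 6, 82),   # flatter: y = 6x + 46
}

for name, (m, b) in guesses.items():
    print(f"guess {name}: y = {m:.1f}x + {b:.1f}, MSE = {mse(x, y, m, b):.2f}")
```

On these points guess B scores an MSE of 9.00 versus 93.80 for guess A. The least-squares line for the same points, y = 6x + 44.6, gets 7.04, so a good hand-dragged guess lands close to the optimum but not exactly on it.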

Part 03 · The math (just enough)

How the math knows the "best" line.

You don't need calculus to understand this. Three concepts. That's it.

01

The line equation

A line in 2D is defined by two numbers: how steep it is (slope) and where it crosses the y-axis (intercept).

y = m·x + b
02

Measure the error

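For each point, the error is the gap between the actual y and the y your line predicts. Square each gap, so positive and negative errors don't cancel and big mistakes hurt more than small ones, then add them all up. Dividing that total by the number of points gives the mean squared error shown in the chart.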

$$\text{error} = \sum_{i} \left( y_i^{\text{actual}} - y_i^{\text{pred}} \right)^2$$
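A quick numeric check of why the errors get squared, using two made-up points and a deliberately bad flat line that misses each by 10 points in opposite directions:

```python
import numpy as np

# Two made-up points: 2 hours -> 60, 6 hours -> 80.
y_actual = np.array([60.0, 80.0])
# A deliberately bad flat line at y = 70 misses each point by 10, in opposite directions.
y_pred = np.array([70.0, 70.0])

errors = y_actual - y_pred                              # [-10.0, 10.0]
print("sum of raw errors:    ", errors.sum())           # 0.0: the misses cancel out
print("sum of squared errors:", (errors ** 2).sum())    # 200.0: both misses count
print("mean squared error:   ", (errors ** 2).mean())   # 100.0: the sum divided by 2 points

# Squaring also punishes big misses disproportionately.
small_miss, big_miss = 10.0, 20.0
print("cost ratio, 20-point vs 10-point miss:", big_miss**2 / small_miss**2)  # 4.0
```

Raw errors would rate this bad line as perfect; squared errors expose it.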
03

Find m, b that minimize it

Setting the derivatives of the error with respect to m and b to zero gives an exact, closed-form formula. No guessing, no iteration: the optimal m and b can be computed in one shot, which is why linear regression is so fast.

$$\operatorname*{arg\,min}_{m,\,b} \sum_{i} \left( y_i - (m x_i + b) \right)^2$$
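Solving that minimization gives the standard ordinary least squares formulas m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − m·x̄, where x̄ and ȳ are the means of x and y. A minimal NumPy sketch of that one-shot computation, assuming hypothetical data and a helper named `fit_line` introduced here for illustration, cross-checked against `np.polyfit`:

```python
import numpy as np

def fit_line(x, y):
    """Closed-form least-squares slope and intercept for y = m*x + b."""
    x_mean, y_mean = x.mean(), y.mean()
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - m * x_mean          # the best line always passes through (x_mean, y_mean)
    return m, b

# Hypothetical data: study hours -> exam score.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([50.0, 60.0, 58.0, 70.0, 75.0])

m, b = fit_line(x, y)
print(f"closed form: y = {m:.2f}x + {b:.2f}")   # closed form: y = 6.00x + 44.60

# Cross-check against NumPy's own least-squares polynomial fit.
m_np, b_np = np.polyfit(x, y, deg=1)
assert np.isclose(m, m_np) and np.isclose(b, b_np)
```

`np.polyfit` solves the same least-squares problem, so the two agree up to floating-point rounding.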
Part 04 · Real model on real data

Now let scikit-learn do it. Properly.

In practice, you rarely compute regressions by hand. You use scikit-learn: a few lines of code give you a trained model. Let's predict house prices from square footage, bedrooms, and age.

[Interactive Python notebook: scikit-learn running in the browser via Pyodide]
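A minimal standalone sketch of that workflow, assuming synthetic data: the pricing rule below (a base price plus a made-up amount per square foot, per bedroom, and per year of age, plus noise) is invented for illustration and is not the module's dataset. It uses scikit-learn's `LinearRegression`, holds out 20% of the houses with `train_test_split`, and reports R² on those unseen houses via `model.score`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, hypothetical housing data generated from a made-up pricing rule.
rng = np.random.default_rng(seed=0)
n = 500
sqft = rng.uniform(500, 3500, n)        # square footage
bedrooms = rng.integers(1, 6, n)        # 1 to 5 bedrooms (upper bound exclusive)
age = rng.uniform(0, 50, n)             # age of the house in years
noise = rng.normal(0, 20_000, n)
price = 50_000 + 150 * sqft + 10_000 * bedrooms - 1_000 * age + noise

# Feature matrix: one row per house, one column per input.
X = np.column_stack([sqft, bedrooms, age])

# Hold out 20% of the houses so the model is scored on data it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)             # learns one slope per feature plus an intercept

print("slopes (per sq ft, per bedroom, per year):", model.coef_.round(1))
print("intercept:", round(model.intercept_, 1))
print("R² on held-out houses:", round(model.score(X_test, y_test), 3))

# Predict a new listing: 1,800 sq ft, 3 bedrooms, 10 years old.
new_house = np.array([[1800, 3, 10]])
predicted = model.predict(new_house)[0]
print("predicted price:", round(predicted))
```

Because this data comes from a known rule, the learned slopes should land near 150, 10,000, and −1,000. On real data there is no such ground truth, which is why the held-out R² matters.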
Part 05 · The bigger picture

Two regressions you'll meet every day.

"Linear regression" predicts a number. "Logistic regression" predicts a category, by way of a probability. Despite the name, logistic regression is a classification algorithm, and one of the most widely used.

Linear regression

Predicts a number.

How much? How long? How many?

y = mx + b
Output type: a continuous number
  • House prices (₹85 lakhs)
  • Sales next quarter ($2.4M)
  • Exam scores (87/100)
  • Delivery time (32 mins)
Logistic regression

Predicts a probability.

Will they? Is it? Yes or no?

sigmoid(mx + b): an S-shaped curve that maps any input to a value between 0.0 and 1.0
Output type: a probability (0 to 1)
  • Spam? (yes/no)
  • Will user churn? (probability)
  • Click the ad? (yes/no)
  • Loan default? (probability)
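A minimal sketch of logistic regression in scikit-learn, assuming hypothetical pass/fail data for study hours. It shows that, for a binary `LogisticRegression`, `predict_proba` is exactly sigmoid(mx + b) with the learned `coef_` and `intercept_`, and that `predict` turns those probabilities into yes/no answers at a 0.5 threshold. Note that scikit-learn applies L2 regularization by default (`C=1.0`), so the learned slope is somewhat shrunk toward zero:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: hours studied -> failed (0) or passed (1).
hours = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

clf = LogisticRegression()   # default: L2 regularization with C=1.0
clf.fit(hours, passed)

test_hours = np.array([[1.0], [3.0], [5.0]])

# predict_proba returns [P(class 0), P(class 1)] per row; column 1 is P(pass).
p_pass = clf.predict_proba(test_hours)[:, 1]

# The same numbers computed by hand as sigmoid(m*x + b).
m, b = clf.coef_[0, 0], clf.intercept_[0]
p_manual = sigmoid(m * test_hours[:, 0] + b)

print("P(pass):", p_pass.round(2))
print("matches sigmoid(mx + b):", np.allclose(p_pass, p_manual))
print("yes/no at the 0.5 threshold:", clf.predict(test_hours))
```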
Part 06 · Knowledge check

Five questions on what you just fit.

Aim for 4/5. Wrong answers explain themselves.

Course 2 · Module 03 complete

You trained a real ML model. From scratch.

You fit a line by hand. You watched the math beat you (slightly). You trained a scikit-learn model. You predicted prices on data the model never saw. That's the entire ML loop in 75 minutes. The rest of the course is variations on this theme.

Up next · Course 2 · Module 04

Supervised Learning II — Trees & Forests

Decision trees, random forests, gradient boosting: the algorithms behind many winning Kaggle solutions before deep learning came along (and still behind many winners on tabular, non-image data). You'll build a tree by hand, then train a forest in two lines.
