AI Skill Course · Course 2 · Intermediate
Module 06 of 12
Course 2 · Module 06 · 85 minutes

The pivot to
deep learning
starts here.

Every neural network — from your phone's autocomplete to GPT-5 — is built from one component repeated millions of times: the neuron. It's not magic. It's a weighted sum plus a squish. By the end of this module, you'll have built one yourself, watched it learn, and trained a real network in the browser.

You'll tune · A neuron by hand
You'll watch · Gradient descent run
You'll train · A real MLP
[Figure: a small neural network. Inputs x₁ and x₂ feed a hidden layer, which feeds the output y.]
Part 01 · The atom

A neuron is just
three steps.

The biological metaphor is mostly marketing. Mathematically, a neuron is one of the simplest things in machine learning: multiply, add, squish.

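[Figure: inputs x₁ and x₂, plus a constant 1, are weighted by w₁, w₂, and b, then summed (Σ) and passed through the activation (σ) to produce the output y.]

y = σ(w₁·x₁ + w₂·x₂ + b)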

Three operations.

// Step 1
Multiply inputs by weights

Each input gets multiplied by a learned weight. Bigger weight = the input matters more. Negative weight = the input pushes the answer the other way.

// Step 2
Add them up · plus bias

Sum all the weighted inputs, then add a bias term, which lets the neuron shift its decision boundary away from the origin. The result is a single number.

// Step 3
Squish through an activation

The sigmoid (σ) squishes any number into the range between 0 and 1. ReLU passes positive numbers through unchanged and turns negative ones into 0. Without this non-linearity, stacking neurons would be pointless: a stack of purely linear layers collapses into one big linear function.
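The three steps fit in a few lines of Python. This is a minimal sketch, not the code behind this page; the function names and the example numbers are made up for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Squish any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    """Pass positive numbers through unchanged; turn negatives into 0."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation=sigmoid):
    # Step 1: multiply each input by its weight.
    weighted = [w * x for w, x in zip(weights, inputs)]
    # Step 2: add them up, plus the bias. This is one number now.
    z = sum(weighted) + bias
    # Step 3: squish through the activation.
    return activation(z)

# Illustrative values: 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0
print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))                   # 0.5
print(neuron([1.0, 2.0], [0.5, -0.25], 0.0, activation=relu))  # 0.0
```

A weighted sum of exactly 0 is the neuron's "undecided" point: the sigmoid returns 0.5 there, which is why the decision boundary sits where w₁·x₁ + w₂·x₂ + b = 0.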

Part 02 · Hands on · Build a neuron

Three sliders. One neuron.
Find the right weights.

40 data points, two classes. Your neuron has 3 knobs: weight 1, weight 2, bias. Move them. Find the line that separates the classes. Then click "Train this for me" and watch gradient descent move them automatically.

How it works.

Each slider is one parameter of the neuron. As you move the sliders, the orange decision line updates. Points on one side are predicted class A (cyan); points on the other side are predicted class B (rose). A point with an amber outline is misclassified. Toggle output gradient to shade the plot by the sigmoid's output probability. Train runs real gradient descent and animates the sliders toward the values it finds.
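Here is roughly what the plot has to compute each time a slider moves. It's a sketch under assumptions: the six points are invented (the page uses 40), label 1 stands for one class and 0 for the other, and the helper names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(point, w1, w2, b):
    """The neuron's output for one point: a probability between 0 and 1."""
    x1, x2 = point
    return sigmoid(w1 * x1 + w2 * x2 + b)

def boundary_x2(x1, w1, w2, b):
    """x2 on the decision line (where w1*x1 + w2*x2 + b = 0). Needs w2 != 0."""
    return -(w1 * x1 + b) / w2

def accuracy(points, labels, w1, w2, b):
    """Fraction of points that land on the correct side of the line."""
    hits = sum(
        (predict_proba(p, w1, w2, b) >= 0.5) == (y == 1)
        for p, y in zip(points, labels)
    )
    return hits / len(points)

# Made-up data: three points near the origin (label 0), three further out (label 1).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0), (3.0, 2.0), (2.0, 3.0)]
labels = [0, 0, 0, 1, 1, 1]

print(accuracy(points, labels, 0.0, 0.0, 0.0))    # 0.5  (all sliders at zero)
print(accuracy(points, labels, 1.0, 1.0, -2.5))   # 1.0  (a line that separates them)
print(boundary_x2(0.0, 1.0, 1.0, -2.5))           # 2.5  (where that line crosses x1 = 0)
```

With every slider at zero the neuron outputs 0.5 everywhere, so it calls every point the same class and gets half of this balanced toy set right.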

Two classes · find the separator
Cyan = class A · rose = class B · amber outline = misclassified
Axes: x₁, x₂
y = σ(w₁·x₁ + w₂·x₂ + b)
Sliders (initial values): Weight 1 · w₁ = 0.0, Weight 2 · w₂ = 0.0, Bias · b = 0.0
Readouts: Accuracy, Loss
Part 03 · Gradient descent

How the neuron learns:
roll downhill.

Training a neuron is just: try some weights, measure how wrong you are (loss), nudge the weights toward less wrong, repeat. The "nudge" direction comes from calculus — but the picture is simpler than the math.

[Figure: loss plotted over parameter space (w₁, w₂, b). A path runs from a high-loss starting point down to the minimum.]

The bowl metaphor.

Imagine a 3D bowl. The bottom of the bowl is where the loss is lowest, which means the best weights. Your neuron starts at some random point on the side of the bowl. Gradient descent is just: look at the slope, take a small step downhill, repeat.

The "gradient" is the calculus answer to "which direction is steepest downhill from here?" Subtract a fraction of it from your weights and you're closer to the bottom.
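The loop of "measure the loss, find the slope, step downhill" looks roughly like this for a single sigmoid neuron. It's a sketch, not the page's trainer: it assumes cross-entropy as the loss (the page doesn't say which one it uses), and the data, learning rate, and epoch count are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_gradient(points, labels, w1, w2, b):
    """Mean cross-entropy loss and its slope with respect to (w1, w2, b)."""
    n = len(points)
    loss = g1 = g2 = gb = 0.0
    for (x1, x2), y in zip(points, labels):
        p = sigmoid(w1 * x1 + w2 * x2 + b)
        p = min(max(p, 1e-12), 1 - 1e-12)   # keep log() away from 0
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
        err = p - y                          # how wrong, and in which direction
        g1 += err * x1
        g2 += err * x2
        gb += err
    return loss / n, (g1 / n, g2 / n, gb / n)

def train(points, labels, learning_rate=0.5, epochs=200):
    w1 = w2 = b = 0.0                        # start with every "slider" at zero
    history = []
    for _ in range(epochs):
        loss, (g1, g2, gb) = loss_and_gradient(points, labels, w1, w2, b)
        history.append(loss)
        # Subtract a fraction of the gradient: one small step downhill.
        w1 -= learning_rate * g1
        w2 -= learning_rate * g2
        b -= learning_rate * gb
    return (w1, w2, b), history

# Made-up, linearly separable data.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0), (3.0, 2.0), (2.0, 3.0)]
labels = [0, 0, 0, 1, 1, 1]

(w1, w2, b), history = train(points, labels)
print(f"loss at start: {history[0]:.3f}, loss at end: {history[-1]:.3f}")
```

The loss starts at ln 2 ≈ 0.693, which is what you get when the neuron answers 0.5 for everything, and falls from there.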

Learning rate: How big the steps are. Too small = slow. Too big = overshoot and oscillate.
Epoch: One full pass over the training data. Training usually takes tens to thousands of them.
Local minimum: A "ditch" that isn't the deepest spot. Big neural networks have lots of these; surprisingly, this turns out to be okay in practice.
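You can see the learning-rate trade-off on the simplest possible bowl. The sketch below assumes a toy loss, loss(w) = w², chosen only because its slope (2·w) is easy to write down; the three learning rates are illustrative.

```python
def descend(learning_rate, start=5.0, steps=20):
    """Run gradient descent on loss(w) = w**2, whose slope at w is 2 * w."""
    w = start
    for _ in range(steps):
        gradient = 2 * w
        w -= learning_rate * gradient
    return w

for lr in (0.01, 0.4, 1.1):
    print(f"learning rate {lr}: w after 20 steps = {descend(lr):.4g}")
```

The minimum is at w = 0. With 0.01 the weight is still far from it after 20 steps, with 0.4 it is practically there, and with 1.1 every step jumps past the bottom and lands higher up the opposite wall, so w ends up further from 0 than where it started.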
Part 04 · Why depth wins

One neuron draws a line.
Stack them to draw anything.

A single neuron with sigmoid activation can only draw a straight decision boundary. That's enough for some problems — but not most real ones. Adding a hidden layer of neurons lets the network learn curves, blobs, spirals, anything.

Single neuron

Linear only.

One straight line. It can solve AND and OR, but not XOR, where the output is 1 only when exactly one of the two inputs is 1. No single straight line puts both "1" cases on one side and both "0" cases on the other.

XOR: no single line works
// What it can do
Linear classification (AND, OR)
Linear regression
// What it can't do
XOR · circles · spirals · anything curved
Images, language, audio
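The XOR limit is easy to check by brute force. This sketch tries every combination of w₁, w₂, and b on a coarse grid (from -5 to 5 in steps of 0.5, an arbitrary choice) and reports the best accuracy a single neuron reaches on each truth table.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]
OR = [0, 1, 1, 1]
XOR = [0, 1, 1, 0]

def best_accuracy(targets):
    """Best accuracy any single neuron on the grid reaches for this truth table."""
    grid = [i / 2 for i in range(-10, 11)]   # -5.0, -4.5, ..., 5.0
    best = 0.0
    for w1, w2, b in product(grid, repeat=3):
        # sigmoid(z) >= 0.5 exactly when z >= 0, so the sigmoid can be skipped here.
        hits = sum(
            (w1 * x1 + w2 * x2 + b >= 0) == (t == 1)
            for (x1, x2), t in zip(INPUTS, targets)
        )
        best = max(best, hits / 4)
    return best

print("AND:", best_accuracy(AND))   # AND: 1.0
print("OR: ", best_accuracy(OR))    # OR:  1.0
print("XOR:", best_accuracy(XOR))   # XOR: 0.75
```

The grid is only a demonstration, but the result holds for any weights: a straight line can get at most three of the four XOR cases right.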
Multi-layer network

Anything continuous.

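With just one hidden layer of "enough" neurons and a non-linear activation, a network can approximate any continuous function on a bounded range of inputs as closely as you like. This is the universal approximation theorem. It guarantees that such a network exists; it doesn't promise that training will find it, or that "enough" is a practical number of neurons.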
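For the mathematically curious, one classical form of the theorem (for a sigmoid activation σ) can be written as follows. The symbols N, vᵢ, wᵢ, bᵢ, K, and ε are notation introduced here, not taken from this module.

```latex
% f: any continuous function on a closed, bounded set K of inputs.
% epsilon: how close you want the approximation to be.
\text{For every continuous } f : K \to \mathbb{R} \text{ and every } \varepsilon > 0,
\text{ there exist } N,\ v_i,\ b_i \in \mathbb{R} \text{ and } w_i \in \mathbb{R}^n \text{ such that}
\[
  \sup_{x \in K} \left|\, f(x) - \sum_{i=1}^{N} v_i \,\sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon .
\]
```

The sum is exactly a one-hidden-layer network: N hidden neurons σ(wᵢ·x + bᵢ), combined by a linear output with weights vᵢ.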

Curves come easily: a hidden layer solves XOR
// What it can do
XOR · circles · spirals · any shape
Image recognition (with depth)
Language modeling (with depth + attention)
Anything you can label and feed it
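To see why a hidden layer fixes XOR, here is a tiny network with its weights set by hand rather than learned. The numbers (20, 10, 30) are illustrative choices that push each sigmoid close to 0 or 1; a trained network would find its own.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_network(x1, x2):
    # Hidden layer: two neurons, each drawing its own straight line.
    h_or = sigmoid(20 * x1 + 20 * x2 - 10)      # near 1 if at least one input is 1
    h_nand = sigmoid(-20 * x1 - 20 * x2 + 30)   # near 1 unless both inputs are 1
    # Output neuron: near 1 only when both hidden neurons are near 1.
    return sigmoid(20 * h_or + 20 * h_nand - 30)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", round(xor_network(x1, x2)))
# 0 0 -> 0
# 0 1 -> 1
# 1 0 -> 1
# 1 1 -> 0
```

Each hidden neuron is still just a line. The output neuron combines the two lines into a band, and a band is exactly the shape XOR needs.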
Part 05 · The proof

Single neuron vs MLP.
Same data. Different result.

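A two-moons dataset: two interlocking curved clusters. A single neuron can't separate them, because the boundary between the moons is curved and a single neuron can only draw a straight line. A small MLP (multi-layer perceptron) with one hidden layer separates them cleanly. This is the same shift that made deep learning possible.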
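The demo on this page runs scikit-learn in the browser. The sketch below is a dependency-free stand-in, so you can see the mechanics: it generates its own two-moons data and trains both models with plain stochastic gradient descent. Everything in it is an assumption made for illustration (the data generator, 8 tanh hidden neurons, the learning rate, the seeds), and accuracy is measured on the training points themselves.

```python
import math
import random

def make_moons(n, noise, rng):
    """Two interlocking half-circles, labelled 0 (upper) and 1 (lower)."""
    points, labels = [], []
    for i in range(n):
        t = math.pi * rng.random()
        if i % 2 == 0:
            x1, x2, label = math.cos(t), math.sin(t), 0
        else:
            x1, x2, label = 1 - math.cos(t), 0.5 - math.sin(t), 1
        points.append((x1 + rng.gauss(0, noise), x2 + rng.gauss(0, noise)))
        labels.append(label)
    return points, labels

def sigmoid(z):
    # Numerically stable form: never exponentiates a large positive number.
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def train_neuron(points, labels, epochs=200, lr=0.1):
    """A single sigmoid neuron: one straight decision line."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y
            w1, w2, b = w1 - lr * err * x1, w2 - lr * err * x2, b - lr * err
    return lambda x1, x2: sigmoid(w1 * x1 + w2 * x2 + b)

def train_mlp(points, labels, hidden=8, epochs=200, lr=0.1, seed=0):
    """One hidden layer of tanh neurons feeding one sigmoid output neuron."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x1, x2):
        h = [math.tanh(W1[j][0] * x1 + W1[j][1] * x2 + b1[j]) for j in range(hidden)]
        return h, sigmoid(sum(W2[j] * h[j] for j in range(hidden)) + b2)

    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            h, p = forward(x1, x2)
            d_out = p - y                                # slope of the loss at the output
            for j in range(hidden):
                d_h = d_out * W2[j] * (1 - h[j] ** 2)    # pushed back through tanh
                W2[j] -= lr * d_out * h[j]
                W1[j][0] -= lr * d_h * x1
                W1[j][1] -= lr * d_h * x2
                b1[j] -= lr * d_h
            b2 -= lr * d_out
    return lambda x1, x2: forward(x1, x2)[1]

def accuracy(model, points, labels):
    hits = sum((model(x1, x2) >= 0.5) == (y == 1) for (x1, x2), y in zip(points, labels))
    return hits / len(points)

rng = random.Random(42)
points, labels = make_moons(200, noise=0.1, rng=rng)
neuron_acc = accuracy(train_neuron(points, labels), points, labels)
mlp_acc = accuracy(train_mlp(points, labels), points, labels)
print(f"single neuron: {neuron_acc:.2f}   MLP: {mlp_acc:.2f}")
```

On data like this the single neuron tends to plateau, because the tips of each moon sit on the wrong side of any straight line, while the MLP usually gets closer to perfect. The exact numbers depend on the seed and the settings, so treat them as a demonstration rather than a benchmark.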

[Interactive: Python runtime (Pyodide) with scikit-learn, running in the browser]
Part 06 · Knowledge check

Five questions on what
you just trained.

Aim for 4/5. Wrong answers explain themselves.

Course 2 · Module 06 complete

"Deep learning" is no longer
a black box to you.

You built a single neuron, watched gradient descent move its weights, then saw a multi-layer network solve a problem the single neuron couldn't. Every modern AI system, from ChatGPT to Midjourney to Tesla Autopilot, is built from millions of these neurons, organized in clever ways. You now know the unit they're all made of.

Up next · Course 2 · Module 07

Computer Vision Basics

Why one layer became thousands. CNNs (convolutional neural networks) added two simple tricks to the basic neuron — filters and pooling — and unlocked computer vision. You'll train an image classifier in the browser.
