Every neural network — from your phone's autocomplete to GPT-5 — is built from one component repeated millions of times: the neuron. It's not magic. It's a weighted sum plus a squish. By the end of this module, you'll have built one yourself, watched it learn, and trained a real network in the browser.
Open the neuron
The biological metaphor is mostly marketing. Mathematically, a neuron is one of the simplest things in machine learning: multiply, add, squish.
Each input gets multiplied by a learned weight. Bigger weight = the input matters more. Negative weight = the input pushes the answer the other way.
Sum all the weighted inputs, then add a bias term, which lets the neuron shift its decision boundary away from the origin. The result is a single number.
The sigmoid (σ) squishes any number into the range between 0 and 1. ReLU passes positive values through unchanged and sets negative values to zero. Without a non-linearity like these, stacking neurons would be pointless: a stack of linear functions is itself just one big linear function.
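The three steps above (multiply, add, squish) fit in a few lines of code. This is a minimal sketch, not the module's own implementation: the function names and the example inputs, weights, and bias are made up for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Squish any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    """Pass positive values through unchanged; map negatives to zero."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation=sigmoid):
    """Multiply each input by its weight, sum, add the bias, then squish."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Illustrative values only (assumed, not taken from the module).
inputs = [1.0, 2.0]
weights = [0.5, -0.25]   # the negative weight pushes the answer the other way
bias = 0.0

# Weighted sum: 1.0 * 0.5 + 2.0 * (-0.25) + 0.0 = 0.0
print(neuron(inputs, weights, bias))        # sigmoid(0.0) -> 0.5
print(neuron(inputs, weights, bias, relu))  # relu(0.0)    -> 0.0
```

Swapping the `activation` argument is the only difference between a sigmoid neuron and a ReLU neuron here; the multiply-and-add part is identical.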
40 data points, two classes. Your neuron has 3 knobs: weight 1, weight 2, bias. Move them. Find the line that separates the classes. Then click "Train this for me" and watch gradient descent move them automatically.
Each slider is one parameter of the neuron. As you move them, the orange decision line updates. Points on one side are predicted class A (cyan), points on the other side class B (rose). A point with an amber outline is misclassified. Toggle "output gradient" to see the sigmoid probability as shading. Train runs real gradient descent and animates the sliders toward the values it finds.
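If you're curious where the decision line comes from: the sigmoid output crosses 0.5 exactly where the weighted sum is zero, so the line is the set of points with w1·x1 + w2·x2 + b = 0. Here is a small sketch of that relationship. The function names, the slider values, and the choice of which side counts as class A versus class B are assumptions for illustration; the playground may map them differently.

```python
def weighted_sum(x1, x2, w1, w2, b):
    """The neuron's value before the sigmoid is applied."""
    return w1 * x1 + w2 * x2 + b

def predict(x1, x2, w1, w2, b):
    """Class 'B' where sigmoid would be >= 0.5 (weighted sum >= 0), else 'A'.

    Which side is A and which is B is an assumption made for this sketch.
    """
    return "B" if weighted_sum(x1, x2, w1, w2, b) >= 0 else "A"

def boundary_x2(x1, w1, w2, b):
    """Height of the decision line at horizontal position x1 (needs w2 != 0)."""
    if w2 == 0:
        raise ValueError("w2 is zero: the decision line is vertical")
    return -(w1 * x1 + b) / w2

# Hypothetical slider settings: the line x1 + x2 = 1.
w1, w2, b = 1.0, 1.0, -1.0

print(predict(0.0, 0.0, w1, w2, b))   # weighted sum -1.0 -> A
print(predict(1.0, 1.0, w1, w2, b))   # weighted sum  1.0 -> B
print(boundary_x2(0.0, w1, w2, b))    # line passes through (0, 1.0)
```

Changing the weights rotates the line; changing the bias slides it without rotating it.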
Training a neuron is just: try some weights, measure how wrong you are (loss), nudge the weights toward less wrong, repeat. The "nudge" direction comes from calculus — but the picture is simpler than the math.
Imagine a 3D bowl. The bottom of the bowl is where loss is minimum — the best weights. Your neuron starts at some random point on the side of the bowl. Gradient descent is just: look at the slope, take a small step downhill, repeat.
The "gradient" is the calculus answer to "which direction is steepest uphill from here?" Subtract a small fraction of it from your weights and you step the opposite way, downhill, closer to the bottom.
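The try, measure, nudge, repeat loop can be sketched for a single sigmoid neuron as follows. Everything specific here is an assumption chosen for illustration: the six toy data points, the cross-entropy loss, the learning rate of 0.5, and the 500 steps. The playground may use different choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy, linearly separable data (made up): label 1 when x1 + x2 is large.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 2.0), (2.0, 3.0), (3.0, 2.0)]
labels = [0, 0, 0, 1, 1, 1]

def loss_and_gradient(w1, w2, b):
    """Mean cross-entropy loss and its gradient with respect to (w1, w2, b)."""
    n = len(points)
    loss = g1 = g2 = gb = 0.0
    for (x1, x2), y in zip(points, labels):
        p = sigmoid(w1 * x1 + w2 * x2 + b)
        loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        err = p - y          # derivative of the loss w.r.t. the weighted sum
        g1 += err * x1
        g2 += err * x2
        gb += err
    return loss / n, (g1 / n, g2 / n, gb / n)

w1 = w2 = b = 0.0            # start somewhere on the side of the bowl
learning_rate = 0.5          # the "fraction" of the gradient to subtract

initial_loss, _ = loss_and_gradient(w1, w2, b)
for step in range(500):
    _, (g1, g2, gb) = loss_and_gradient(w1, w2, b)
    w1 -= learning_rate * g1  # small step downhill
    w2 -= learning_rate * g2
    b -= learning_rate * gb
final_loss, _ = loss_and_gradient(w1, w2, b)

print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
print(f"learned w1={w1:.2f}, w2={w2:.2f}, b={b:.2f}")
```

With all parameters at zero the neuron outputs 0.5 for every point, so the starting loss is ln 2 (about 0.693). Each pass through the loop subtracts a fraction of the gradient, and the loss falls from there.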
A single neuron with sigmoid activation can only draw a straight decision boundary. That's enough for some problems — but not most real ones. Adding a hidden layer of neurons lets the network learn curves, blobs, spirals, anything.
A single neuron draws one straight line. That is enough to solve "AND" and "OR", but not "XOR", where no single straight line separates the inputs that should give 1 from the inputs that should give 0.
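You can check the AND, OR, and XOR claim directly. The sketch below uses a thresholded neuron (output 1 when the weighted sum is positive), which draws the same straight boundary as a sigmoid neuron cut at 0.5. The hand-picked weights and the search grid are assumptions for illustration. The grid search is a demonstration rather than a proof; the actual proof is a short inequality argument.

```python
from itertools import product

def step_neuron(x1, x2, w1, w2, b):
    """Thresholded neuron: 1 when the weighted sum is positive, else 0."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]
OR = [0, 1, 1, 1]
XOR = [0, 1, 1, 0]

def outputs(w1, w2, b):
    """The neuron's answer for each of the four input pairs, in order."""
    return [step_neuron(x1, x2, w1, w2, b) for x1, x2 in INPUTS]

# Hand-picked weights (illustrative): same line direction, different bias.
print(outputs(1, 1, -1.5) == AND)   # True
print(outputs(1, 1, -0.5) == OR)    # True

# Coarse search over w1, w2, b in -5.0 .. 5.0 (steps of 0.5): 9261 combinations.
grid = [i / 2 for i in range(-10, 11)]
xor_solutions = [p for p in product(grid, repeat=3) if outputs(*p) == XOR]
print(len(xor_solutions))           # 0: no setting on the grid reproduces XOR
```

The search comes up empty for any grid, not just this one: XOR would need b ≤ 0, w1 + b > 0, and w2 + b > 0, which together force w1 + w2 + b > 0, yet XOR also demands w1 + w2 + b ≤ 0.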
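With just one hidden layer of "enough" neurons and a non-linear activation, a network can approximate any continuous function on a bounded range of inputs as closely as you like. This is the universal approximation theorem. It guarantees that suitable weights exist, not that training will find them.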
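To make the hidden-layer idea concrete, here is a tiny network with hand-set weights that computes XOR, the problem a single neuron cannot solve. It illustrates what a hidden layer buys you; it is not a demonstration of the universal approximation theorem itself. The weights are picked by hand rather than learned, and the output neuron is left linear (no squish) so the arithmetic stays exact.

```python
def relu(z):
    return max(0.0, z)

def xor_network(x1, x2):
    """Two hidden ReLU neurons feeding one linear output neuron."""
    # Hidden layer: both neurons look at x1 + x2, with different biases.
    h1 = relu(x1 + x2)         # weights (1, 1), bias 0
    h2 = relu(x1 + x2 - 1.0)   # weights (1, 1), bias -1: only fires for (1, 1)
    # Output: h1 counts the active inputs, h2 cancels the "both on" case.
    return h1 - 2.0 * h2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), xor_network(x1, x2))
# (0, 0) 0.0
# (0, 1) 1.0
# (1, 0) 1.0
# (1, 1) 0.0
```

Remove the `relu` calls and the network collapses to the single linear function -(x1 + x2) + 2, which cannot produce XOR. The non-linearity in the hidden layer is what makes the difference.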
A two-moons dataset: two interlocking curved clusters. A single neuron can't separate them, because the boundary between them is not a straight line. A small MLP (multi-layer perceptron) with one hidden layer separates them cleanly. This is the same shift that made deep learning possible.
Aim for 4/5. Wrong answers explain themselves.
You built a single neuron, watched gradient descent move its weights, then saw a multi-layer network solve a problem the single neuron couldn't. Every modern AI system, from ChatGPT to Midjourney to Tesla Autopilot, is built from millions of these neurons, organized in clever ways. You now know the unit they're all made of.