AI Skill Course · Course 2 · Intermediate
Module 08 of 12
Course 2 · Module 08 · 70 minutes

From pixels
to words.
All numbers.

Computers can't read English. They can only do math on numbers. So before any model can understand a sentence, two things must happen: the text gets broken into pieces (tokens), and each piece becomes a vector of numbers (embedding). Get those two right and everything from search to ChatGPT becomes possible.

You'll tokenize
Any text you type
You'll explore
A 3D word space
You'll learn
Why "king − man + woman = queen"
Words become numbers
"cat" [0.21, -0.45, 0.73, 0.12, ...]
"dog" [0.18, -0.41, 0.69, 0.15, ...]
"pizza" [-0.62, 0.31, -0.15, 0.81, ...]
similar words → similar vectors
Part 01 · The fundamental problem

Computers can't read.
They only do math.

Before any NLP model can do anything useful, raw text has to become numbers. The three-step pipeline below sits underneath semantic search, chatbots, and language models, including ChatGPT, Claude, and Gemini.

// Step 01

Tokenization

Break the text into pieces called tokens. Sometimes a token is a whole word ("cat"). Sometimes it's a subword ("anti" + "freeze"). Sometimes a single character.

Example "Hello, world!" →
["Hello", ",", " world", "!"]
4 tokens
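To make the split above concrete, here is a minimal sketch of a toy tokenizer. The regex rule and the name `toy_tokenize` are assumptions made up for illustration. Production tokenizers learn their subword pieces from data (for example with byte-pair encoding) rather than using a hand-written pattern like this one.

```python
import re

# Hypothetical toy rule, not a real model's tokenizer: a word keeps its
# leading space, each punctuation mark is its own token, and leftover
# whitespace is grouped together.
TOKEN_PATTERN = re.compile(r" ?\w+|[^\w\s]|\s+")


def toy_tokenize(text: str) -> list[str]:
    """Split text into word, punctuation, and whitespace pieces."""
    return TOKEN_PATTERN.findall(text)


tokens = toy_tokenize("Hello, world!")
print(tokens)       # ['Hello', ',', ' world', '!']
print(len(tokens))  # 4
```

Note that joining the tokens back together gives the original text, which is why the space travels with " world" instead of being thrown away.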
// Step 02

Token IDs

Each unique token gets an integer ID from a fixed vocabulary (GPT-2 uses about 50,000 entries; newer models often use 100,000 or more). The model only ever sees these numbers, never the original text.

Example "Hello" → ID 15496
" world" → ID 995
(IDs from the GPT-2 vocabulary)
The model never sees the letters
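A vocabulary is just a lookup table from token to integer. The sketch below builds a tiny one and uses it in both directions. The helper names and the IDs it produces are hypothetical: a real vocabulary is fixed before training and its IDs will not match these.

```python
def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Give each unique token the next free ID, in order of first appearance."""
    vocab: dict[str, int] = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Turn tokens into IDs. Raises KeyError for a token outside the vocabulary."""
    return [vocab[token] for token in tokens]


def decode(ids: list[int], vocab: dict[str, int]) -> str:
    """Turn IDs back into text by inverting the vocabulary."""
    id_to_token = {i: token for token, i in vocab.items()}
    return "".join(id_to_token[i] for i in ids)


tokens = ["Hello", ",", " world", "!"]
vocab = build_vocab(tokens)
ids = encode(tokens, vocab)
print(ids)                 # [0, 1, 2, 3]
print(decode(ids, vocab))  # Hello, world!
```

Everything downstream works on the list of integers. The text only comes back when you decode.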
// Step 03

Embeddings

Each token ID becomes a long vector (often 768 or 4096 numbers). These vectors are learned so similar words end up with similar vectors.

Example ID 15496 → [0.23, -0.41,
0.67, 0.15, ... ] (768 floats)
Now you can do math.
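Under the hood, Step 03 is a row lookup in a big table. Here is a minimal sketch with made-up sizes and random values. `VOCAB_SIZE`, `EMBED_DIM`, and `embed` are illustrative names, and a trained model would have learned the numbers instead of drawing them at random.

```python
import random

VOCAB_SIZE = 8  # hypothetical; real vocabularies are far larger
EMBED_DIM = 4   # hypothetical; the text mentions 768 or 4096

rng = random.Random(0)  # fixed seed so the table is reproducible

# One row of EMBED_DIM floats per token ID. Random here; learned in a real model.
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(VOCAB_SIZE)
]


def embed(token_ids: list[int]) -> list[list[float]]:
    """Look up the vector for each token ID."""
    return [embedding_table[i] for i in token_ids]


vectors = embed([3, 0, 3])
print(len(vectors), len(vectors[0]))  # 3 4
print(vectors[0] == vectors[2])       # True: same ID, same vector
```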
Part 02 · Hands on · Tokenization

Type something.
Watch the model see it.

Below is a simplified tokenizer working in real time. Type any text and see how it breaks into tokens, each with its own ID and color. Try common words, made-up words, code, even other languages — and notice the token count. (That's also how API providers charge you: per token.)

Try a few things.

Common English words usually get one token each. Uncommon or technical words get broken into subword pieces. Capitalization changes which token a word maps to, punctuation marks get their own tokens, and a single emoji can take several tokens. The total token count is what matters for GPT-4, Claude, and most LLM APIs, which bill by the token.
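The arithmetic behind per-token billing is short enough to sketch. The function name and the price used below are placeholders, not any provider's actual rate, so check the current price list before relying on a number.

```python
def token_stats(text: str, tokens: list[str], usd_per_million_tokens: float) -> dict[str, float]:
    """Summarize a tokenized text: sizes, density, and an estimated input cost."""
    n_chars = len(text)
    n_tokens = len(tokens)
    return {
        "characters": n_chars,
        "tokens": n_tokens,
        "chars_per_token": n_chars / n_tokens if n_tokens else 0.0,
        "estimated_cost_usd": n_tokens * usd_per_million_tokens / 1_000_000,
    }


# The rate of 10.0 USD per million tokens is a made-up number for illustration.
stats = token_stats("Hello, world!", ["Hello", ",", " world", "!"], 10.0)
print(stats["characters"], stats["tokens"])  # 13 4
print(stats["chars_per_token"])              # 3.25
```

A lower chars-per-token figure means the text is "expensive" for its length, which is typical for rare words, code, and many non-English languages.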

Part 03 · The leap

From IDs to
meaning.

Token IDs are just arbitrary numbers — there's nothing meaningful about ID 15496. The real magic happens next: each token gets turned into a long vector of numbers (an embedding) where similar words end up close to each other in that vector space. This is what makes everything else possible.

An embedding is a learned vector.

An embedding is a long list of numbers (typically 256-4096 floats) assigned to each token. These numbers aren't designed by hand — they're learned from billions of examples of how words are used together.

The key property: similar words get similar vectors. "Dog" and "puppy" will have vectors that point in nearly the same direction. "Dog" and "spreadsheet" will point in different directions.

This isn't just for words. Modern systems embed sentences, paragraphs, even entire documents. The whole field of "semantic search" is just comparing these vectors.

// Real embeddings, simplified (4 dims shown of 768)
cat     +0.21 −0.45 +0.73 +0.12
dog     +0.18 −0.41 +0.69 +0.15
puppy   +0.20 −0.39 +0.74 +0.10
pizza   −0.62 +0.31 −0.15 +0.81
code    +0.55 +0.32 −0.61 −0.39
↑ "cat", "dog", "puppy" share a direction. "pizza" doesn't.
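"Share a direction" has a standard measurement: cosine similarity, which is close to 1 when two vectors point the same way and drops toward −1 as they point apart. Here is a small sketch using the four-number vectors from the table. Treat the values as illustrative, since only 4 of the 768 dimensions are shown.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot product over the two lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


cat = [0.21, -0.45, 0.73, 0.12]
dog = [0.18, -0.41, 0.69, 0.15]
pizza = [-0.62, 0.31, -0.15, 0.81]

print(cosine_similarity(cat, dog))    # close to 1: nearly the same direction
print(cosine_similarity(cat, pizza))  # negative: pointing away from each other
```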
Part 04 · Hands on · The semantic universe

Drag the universe.
Click a word. See its neighbors.

36 words plotted in 3D space using simulated embeddings. Real embeddings live in 768+ dimensions — impossible to visualize directly — but the same clustering happens. Animals near animals, royalty near royalty, foods near foods. Drag to rotate. Click any word to highlight its 5 nearest neighbors.

Explore.

Click and drag the cube to rotate. Click any word to select it — its 5 nearest semantic neighbors light up, connected by dashed cyan lines. Notice how "king", "queen", "prince" cluster together; "happy", "sad", "joy" form their own region; "computer" and "code" are practically on top of each other. This is what neural networks "see" when they read.

3D embedding space
36 words · 6 categories · drag to rotate
// Categories
Royalty
Family
Animals
Foods
Colors
Emotions
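Highlighting a word's neighbors comes down to "score every other word, keep the top few." The sketch below does that with cosine similarity over the five toy vectors from the Part 03 table. The function names are hypothetical, and this is not the code behind the 3D demo. Real systems with millions of vectors use approximate indexes instead of scoring everything.

```python
import math

# Toy 4-dimensional vectors from the Part 03 table (real ones have 768+ dims).
EMBEDDINGS = {
    "cat": [0.21, -0.45, 0.73, 0.12],
    "dog": [0.18, -0.41, 0.69, 0.15],
    "puppy": [0.20, -0.39, 0.74, 0.10],
    "pizza": [-0.62, 0.31, -0.15, 0.81],
    "code": [0.55, 0.32, -0.61, -0.39],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest_neighbors(word: str, k: int = 2) -> list[str]:
    """Return the k other words whose vectors are most similar to `word`."""
    query = EMBEDDINGS[word]
    others = [w for w in EMBEDDINGS if w != word]
    others.sort(key=lambda w: cosine_similarity(query, EMBEDDINGS[w]), reverse=True)
    return others[:k]


print(nearest_neighbors("cat"))  # the two other animals, dog and puppy
```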
Part 05 · The party trick

Vectors do arithmetic.
And it just… works.

Because embeddings live in a coherent vector space, you can add and subtract them. And the results are often poetic.

king − man + woman = queen

Subtract the "male" component of "king" by removing "man." Add the "female" component by adding "woman." The closest vector in embedding space to the result, once the three input words are excluded from the search, is almost spookily "queen." This is real: it works on actual trained embeddings like Word2Vec and GloVe.

This isn't magic. It's geometry. The gender direction is roughly consistent across word pairs because the model learned (from billions of sentences) that "king" relates to "queen" the same way "man" relates to "woman." So the two vector differences are approximately equal: king − man ≈ queen − woman.

Geography
Paris − France + Japan ≈ Tokyo
Verb tense
walking − walk + swim ≈ swimming
Comparatives
bigger − big + small ≈ smaller
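Here is the trick as a runnable sketch. The two-number vectors are invented for illustration (one axis loosely "royalty", one loosely "gender"), so the result is clean by construction. On trained embeddings the match is only approximate. The search skips the three input words, which is how these analogy demos are usually run.

```python
import math

# Hypothetical 2-D vectors made up for this example: [royalty, gender].
TOY = {
    "king": [0.9, 0.8],
    "queen": [0.9, -0.8],
    "man": [0.1, 0.8],
    "woman": [0.1, -0.8],
    "prince": [0.7, 0.7],
    "apple": [-0.5, 0.0],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(a: str, b: str, c: str) -> str:
    """Solve 'a - b + c ≈ ?' by nearest cosine neighbor, skipping the inputs."""
    target = [x - y + z for x, y, z in zip(TOY[a], TOY[b], TOY[c])]
    candidates = [w for w in TOY if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, TOY[w]))


print(analogy("king", "man", "woman"))  # queen
```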
Part 06 · The real world

Embeddings power the
internet you use every day.

Every product that "understands meaning" is doing some version of this. Three big ones:

Semantic Search

"Find documents about AI risks" — not just ones with the exact words. Search engines embed your query and every document, then find the closest vectors. Works across synonyms and paraphrasing.

// Google · Notion · Algolia · every modern search
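A stripped-down version of that idea: embed the query, embed each document, sort by similarity. The sketch below assumes the toy word vectors from Part 03 and embeds a text by averaging its word vectors, which is a simple baseline and not how any of the products above necessarily do it. `embed_text` and `search` are hypothetical names.

```python
import math

# Toy word vectors from the Part 03 table.
WORD_VECTORS = {
    "cat": [0.21, -0.45, 0.73, 0.12],
    "dog": [0.18, -0.41, 0.69, 0.15],
    "puppy": [0.20, -0.39, 0.74, 0.10],
    "pizza": [-0.62, 0.31, -0.15, 0.81],
    "code": [0.55, 0.32, -0.61, -0.39],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def embed_text(text: str) -> list[float]:
    """Average the vectors of the known words in the text (mean pooling)."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError(f"no known words in {text!r}")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(vectors[0]))]


def search(query: str, documents: list[str]) -> list[str]:
    """Rank documents from most to least similar to the query."""
    q = embed_text(query)
    return sorted(documents, key=lambda d: cosine_similarity(q, embed_text(d)), reverse=True)


docs = ["cat dog", "pizza", "code"]
print(search("puppy", docs)[0])  # cat dog
```

The query "puppy" shares no words with "cat dog", yet that document ranks first. Keyword search would have returned nothing.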

Recommendation

"You liked this movie — here are similar ones." Movies, products, songs, articles — all get embedded as vectors. Recommendations are just nearest neighbor lookups in vector space.

// Netflix · Spotify · Amazon · TikTok

Large Language Models

Every LLM starts by embedding the input tokens, processes those vectors through dozens of layers, and outputs new vectors. The whole transformer architecture (next module) operates entirely in embedding space.

// GPT-4 · Claude · Gemini · Llama · all of them
Part 07 · Knowledge check

Five questions on what
you just decoded.

Aim for 4/5. Wrong answers explain themselves.

Course 2 · Module 08 complete

Language is now
just geometry.

You tokenized text and saw it become numbered pieces. You explored a 3D embedding space and felt how meaning becomes distance. You learned why "king − man + woman = queen." Everything modern in NLP — every LLM, every search engine, every recommendation system — is built on these two ideas. You now have the foundation.

Up next · Course 2 · Module 09

Working with LLMs

Now we put embeddings to work. You'll build a Q&A bot over a PDF using retrieval-augmented generation (RAG) — embed chunks, semantic-search for relevance, and feed the relevant pieces to an LLM. This is the architecture behind every "chat with your docs" product.
