Every AI term you keep hearing, explained in two or three plain-English sentences — no math, no jargon-defined-with-more-jargon. Each entry links to the free lesson that teaches it properly.
Artificial General Intelligence (AGI)
A hypothetical AI that can understand, learn, and apply knowledge across any intellectual task a human can, not just one narrow specialty. No AGI exists today; every AI system you use is narrow AI. When it might arrive, and whether current methods can get there, is one of the most debated questions in the field.
→ Learn more: What Is AI?
Artificial Intelligence (AI)
Software that performs tasks which normally require human intelligence: recognizing images, understanding language, making predictions, or generating content. Most modern AI is not programmed with explicit rules; it learns patterns from large amounts of data. AI is the umbrella term; machine learning and deep learning are ways of building it.
→ Learn more: What Is AI?
AI Agent
An AI system that doesn't just answer a question but takes actions to complete a goal: it can plan steps, call tools and APIs, check its results, and try again. Think of the difference between asking for directions and having someone drive you there. Agents combine a language model with tool use, memory, and a planning loop.
→ Learn more: AI Agents & Multi-step Reasoning
Algorithm
A step-by-step procedure a computer follows to solve a problem, like a recipe. In machine learning, the algorithm is the learning method (for example, gradient descent or decision trees), and the model is what the algorithm produces after training on data.
→ Learn more: How Machines Learn
Alignment
The effort to make AI systems reliably do what humans actually want (being helpful, honest, and harmless) rather than what a literal reading of their training objective rewards. Techniques like RLHF are alignment methods: they tune a raw language model to follow instructions and refuse harmful requests.
→ Learn more: RLHF & LLM Alignment
Attention
The mechanism that lets a neural network decide which parts of its input matter most for the current prediction, for example connecting the word "it" back to the noun it refers to. Attention is the core idea behind the transformer architecture that powers every modern large language model.
→ Learn more: Attention Mechanisms Deep Dive
Backpropagation
The algorithm neural networks use to learn from mistakes. After the network makes a prediction, the error is measured and backpropagation works backwards through every layer, calculating how much each connection contributed to the mistake so its weight can be nudged in the right direction. Repeat millions of times and the network gets good.
→ Learn more: Neural Networks from Scratch
Bias
Systematic unfairness in an AI system's outputs, usually inherited from patterns in its training data. If a hiring model is trained on historical decisions that favored one group, it learns to repeat that preference. Bias is usually not a bug in the code but a reflection of the data, which is why it's hard to fix.
→ Learn more: When AI Gets It Wrong
Classification
A machine learning task where the model assigns an input to one of a fixed set of categories: spam or not spam, cat or dog, approve or deny. It's one of the two classic supervised learning problems; the other is regression, which predicts a number instead of a category.
→ Learn more: Supervised Learning & Regression
Clustering
An unsupervised learning technique that groups similar data points together without being told what the groups are. Give it customer data and it might discover natural segments (bargain hunters, loyal regulars, one-time buyers) that nobody labeled in advance. K-means is the best-known clustering algorithm.
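For readers who want to see the idea in motion, here is a minimal sketch of k-means on a single column of numbers. The spending figures and the three starting centers are made up for illustration; real k-means usually picks its starting centers at random and works on many columns at once.

```python
def kmeans_1d(points, centers, rounds=10):
    """Tiny 1-D k-means: assign each point to its nearest center, then move the centers."""
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical yearly spend per customer, with no labels attached.
spend = [12, 15, 14, 95, 102, 98, 480, 510]
centers, groups = kmeans_1d(spend, centers=[0.0, 100.0, 500.0])
print(groups)
```

Nobody told the code which customers belong together; the three groups fall out of the distances alone.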
→ Learn more: Unsupervised Learning & Clustering
CNN (Convolutional Neural Network)
A neural network designed for images. Instead of looking at every pixel independently, a CNN slides small filters across the image to detect edges, then textures, then shapes, then whole objects, building understanding layer by layer, roughly the way the visual cortex does. CNNs power face recognition, medical imaging, and self-driving car vision.
→ Learn more: Computer Vision & CNNs
Computer Vision
The field of AI concerned with understanding images and video: recognizing objects, reading text in photos, detecting faces, tracking movement. It's how your phone unlocks when it sees you and how a car spots a pedestrian.
→ Learn more: Computer Vision & CNNs
Context Window
The maximum amount of text a language model can "see" at once: its working memory, measured in tokens. Everything you've typed in a conversation, plus the model's replies, must fit inside it; anything pushed out is forgotten. Modern models range from a few thousand to over a million tokens of context.
→ Learn more: Long Context LLMs
Dataset
The collection of examples a model learns from: images, sentences, table rows, sensor readings. The quality, size, and diversity of the dataset matter more than almost anything else in machine learning: a brilliant algorithm trained on bad data produces a bad model.
→ Learn more: The Data Mindset
Deep Learning
Machine learning using neural networks with many layers ("deep" refers to the layer count). Each layer learns progressively more abstract features, from edges to shapes to faces. Deep learning is behind essentially every headline AI advance of the last decade, from image recognition to ChatGPT.
→ Learn more: What Is AI?
Diffusion Model
The type of AI behind image generators like Stable Diffusion and DALL·E. It's trained by adding noise to images until they're pure static, then learning to reverse the process. To generate a new image, it starts from random noise and "denoises" step by step toward something that matches your text prompt.
→ Learn more: Diffusion Models Explained
Embedding
A list of numbers that represents the meaning of a piece of text, an image, or any other data, positioned so that similar things end up close together. "King" and "queen" get nearby embeddings; "king" and "toaster" don't. Embeddings are how computers do math on meaning, and they power semantic search and RAG.
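To make "close together" concrete, here is a small sketch that compares embeddings with cosine similarity, one common way to measure closeness. The three-number vectors are invented for illustration; real embeddings come from a trained model and have hundreds or thousands of numbers.

```python
import math

def cosine_similarity(a, b):
    """Return a score near 1.0 for vectors pointing the same way, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings, purely illustrative.
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "toaster": [0.1, 0.05, 0.95],
}

king_queen = cosine_similarity(embeddings["king"], embeddings["queen"])
king_toaster = cosine_similarity(embeddings["king"], embeddings["toaster"])
print(round(king_queen, 2), round(king_toaster, 2))
```

With these toy numbers, "king" scores far higher against "queen" than against "toaster".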
→ Learn more: NLP Foundations
Epoch
One complete pass through the entire training dataset. Models typically train for many epochs, seeing every example repeatedly, with performance checked after each pass. Too few epochs and the model underfits; too many and it starts memorizing instead of learning.
→ Learn more: Neural Networks from Scratch
Feature
An individual measurable property the model uses as input: a house's square footage, a customer's age, the number of exclamation marks in an email. Choosing and shaping good features (feature engineering) is often the difference between a mediocre model and a great one.
→ Learn more: The Data Mindset
Few-Shot Learning
Getting a language model to perform a task by showing it just a handful of examples inside the prompt, with no retraining required. Show it three examples of the tone you want, and it picks up the pattern for the fourth. Contrast with zero-shot, where you give instructions but no examples.
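As a rough sketch, a few-shot prompt is just text assembled from examples. The function name, the reviews, and the "Review / Sentiment" layout below are all illustrative choices, not a required format.

```python
def build_few_shot_prompt(examples, new_input):
    """Put worked examples in the prompt, then leave the last answer blank for the model."""
    parts = ["Classify the sentiment of each review."]
    for text, label in examples:
        parts.append(f"Review: {text}\nSentiment: {label}")
    parts.append(f"Review: {new_input}\nSentiment:")
    return "\n\n".join(parts)

# Three hypothetical examples, then the case we actually want answered.
examples = [
    ("The food was wonderful.", "positive"),
    ("Service was slow and rude.", "negative"),
    ("Best coffee in town!", "positive"),
]
prompt = build_few_shot_prompt(examples, "The room smelled awful.")
print(prompt)
```

The resulting string is what you would send to a language model; the model's job is to continue the pattern after the final "Sentiment:".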
→ Learn more: Prompt Engineering for Beginners
Fine-Tuning
Taking a model that's already been trained on general data and training it further on a smaller, specialized dataset, for example teaching a general language model your company's support style or medical terminology. It's far cheaper than training from scratch because the model already knows language; it just needs the specialty.
→ Learn more: Fine-tuning LLMs with LoRA
Foundation Model
A very large model trained on broad data that serves as the base for many different applications. GPT-4, Claude, Gemini, and Llama are foundation models. Instead of building a new model per task, you adapt one foundation model through prompting, fine-tuning, or RAG.
→ Learn more: Working with LLMs
Function Calling
A capability that lets a language model invoke external tools (search the web, query a database, send an email, run code) by outputting a structured request that your software executes. It's how chatbots go from "talking about things" to "doing things," and it's the building block of AI agents.
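The "structured request" part can be sketched in a few lines. Everything here is hypothetical: the `get_weather` tool, the JSON field names, and the hard-coded model output stand in for what a real model and a real API would provide.

```python
import json

def get_weather(city):
    """Hypothetical tool; a real one would call a weather service."""
    return f"Sunny in {city}"

# The tools your software is willing to run, looked up by name.
TOOLS = {"get_weather": get_weather}

def run_tool_call(model_output):
    """Parse the model's structured request and execute the named tool."""
    request = json.loads(model_output)
    tool = TOOLS[request["name"]]
    return tool(**request["arguments"])

# Pretend the model produced this text instead of a normal reply.
model_output = '{"name": "get_weather", "arguments": {"city": "Lisbon"}}'
result = run_tool_call(model_output)
print(result)
```

Note that the model only writes the request; your own code decides which tools exist and actually runs them.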
→ Learn more: LLM Tool Use & Function Calling
Generative AI
AI that creates new content (text, images, music, video, code) rather than just classifying or predicting from existing data. ChatGPT generating an essay and Midjourney painting an image are both generative AI. The generated output is new, but it's shaped entirely by patterns learned from training data.
→ Learn more: AI That Sees, Hears & Creates
GPT (Generative Pre-trained Transformer)
A family of large language models built on the transformer architecture: generative because it produces text, pre-trained because it first learns from vast amounts of internet text, and transformer after its architecture. ChatGPT is a GPT model fine-tuned to be a helpful conversational assistant.
→ Learn more: Working with LLMs
Gradient Descent
The optimization method most machine learning uses to learn: measure how wrong the model is (the loss), work out which direction of change reduces the error fastest, take a small step that way, repeat. Picture descending a foggy mountain by always stepping downhill; eventually you reach the valley of low error.
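The measure, step, repeat loop fits in a few lines. This is a toy sketch with one adjustable number and a made-up loss whose lowest point sits at 3; real models run the same loop over millions or billions of parameters.

```python
def loss(w):
    """How wrong we are: zero when w is exactly 3 (an arbitrary target for illustration)."""
    return (w - 3) ** 2

def gradient(w):
    """Slope of the loss at w; its sign tells us which way is uphill."""
    return 2 * (w - 3)

w = 0.0               # starting guess
learning_rate = 0.1   # size of each downhill step (an assumed value)
for step in range(100):
    w -= learning_rate * gradient(w)  # step against the slope

print(round(w, 4), round(loss(w), 8))
```

Each pass shrinks the remaining error by a fixed fraction, so `w` settles very close to 3.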
→ Learn more: Supervised Learning & Regression
Hallucination
When an AI confidently states something false, inventing citations, facts, or events that don't exist. It happens because language models generate statistically plausible text, not verified truth; a made-up study title can be perfectly plausible. Hallucination is the single most important limitation to understand before trusting AI output.
→ Learn more: When AI Gets It Wrong
Inference
Using a trained model to produce outputs; every ChatGPT reply is inference. Training a large model is a rare, expensive event that can cost millions of dollars, while inference runs every time someone uses the model, which is why so much engineering goes into making it fast and cheap (quantization, caching, batching).
→ Learn more: LLM Inference Optimization
KV Cache
A memory trick that makes language models generate text fast. Without it, the model would re-process the entire conversation for every new word; the KV cache stores the intermediate attention results (keys and values) so each new token only requires computing what's new. It's the main reason long chats consume so much GPU memory.
→ Learn more: LLM Inference Optimization
Label
The correct answer attached to a training example: the "spam" tag on an email, the "cat" tag on a photo, the actual sale price of a house. Supervised learning needs labels to learn from; producing them at scale (often by human annotators) is one of the hidden costs of AI.
→ Learn more: How Machines Learn
LLM (Large Language Model)
A neural network with billions of parameters trained on enormous amounts of text to predict the next token and, through that simple objective, to write, summarize, translate, reason, and code. ChatGPT, Claude, and Gemini are all LLMs. "Large" refers to both the parameter count and the training data.
→ Learn more: Working with LLMs
LoRA (Low-Rank Adaptation)
A technique for fine-tuning huge models cheaply. Instead of updating all billions of parameters, LoRA freezes the original model and trains tiny add-on matrices, often less than 1% of the original size, that steer its behavior. This can bring fine-tuning within reach of a single GPU instead of a data center, especially when combined with quantization (QLoRA).
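A quick back-of-the-envelope calculation shows where the savings come from. The sizes below are assumptions picked for illustration: one square weight matrix of width 4096, adapted by a pair of thin matrices of rank 8.

```python
d = 4096   # assumed width of one frozen weight matrix (d x d)
r = 8      # assumed LoRA rank: the "thin" dimension of the add-on matrices

full_params = d * d            # numbers you would update with ordinary fine-tuning
lora_params = d * r + r * d    # numbers in the two add-on matrices (d x r and r x d)
fraction = lora_params / full_params

print(full_params, lora_params, f"{fraction:.2%}")
```

For this one matrix the add-ons hold 65,536 trainable numbers against 16,777,216 frozen ones, or about 0.39%.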
→ Learn more: Fine-tuning LLMs with LoRA & QLoRA
Loss Function
The formula that scores how wrong a model's predictions are: the single number training tries to minimize. Choosing the loss function defines what "good" means for the model: penalize big errors heavily, treat false alarms differently from misses, and so on.
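Here is a small sketch of how the choice of formula changes what counts as "bad". It compares two standard loss functions, mean squared error and mean absolute error, on made-up predictions.

```python
def mean_squared_error(actual, predicted):
    """Squares each miss, so one big error hurts far more than several small ones."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_error(actual, predicted):
    """Treats every unit of error the same, large or small."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [10, 10, 10, 10]
small_misses = [12, 8, 12, 8]     # off by 2 every time
one_big_miss = [10, 10, 10, 18]   # perfect three times, off by 8 once

print(mean_absolute_error(actual, small_misses), mean_absolute_error(actual, one_big_miss))
print(mean_squared_error(actual, small_misses), mean_squared_error(actual, one_big_miss))
```

Mean absolute error rates both sets of predictions the same (2.0 each), while mean squared error rates the single big miss four times worse (16.0 versus 4.0).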
→ Learn more: Supervised Learning & Regression
Machine Learning
The approach of teaching computers by example instead of by explicit rules. Rather than writing "spam contains the word FREE," you show the system thousands of spam and non-spam emails and it works out the patterns itself. Nearly all modern AI is machine learning underneath.
→ Learn more: How Machines Learn
Mixture of Experts (MoE)
An architecture where a model contains many specialized sub-networks ("experts") but only activates a few of them per token, routed by a small gating network. This lets a model have enormous total capacity while keeping each individual computation cheap. It's the trick behind models like Mixtral and (reportedly) GPT-4.
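The routing step can be sketched with toy stand-ins. In this illustration the "experts" are trivial one-line functions and the gate scores are typed in by hand; in a real model both are learned neural networks, and the scores are computed fresh for every token.

```python
# Four toy "experts"; real experts are full neural sub-networks.
experts = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 5,
    lambda x: x ** 2,
]

def moe_forward(x, gate_scores, k=2):
    """Run only the k highest-scoring experts and blend their outputs by score."""
    chosen = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    total = sum(gate_scores[i] for i in chosen)
    output = sum(gate_scores[i] / total * experts[i](x) for i in chosen)
    return output, chosen

# Hand-written gate scores (an assumption for this sketch).
output, chosen = moe_forward(3.0, gate_scores=[0.1, 0.6, 0.1, 0.2])
print(chosen, output)
```

Only two of the four experts do any work for this input, which is the source of the savings described above.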
→ Learn more: Mixture of Experts (MoE)
Model
The output of training: a mathematical structure (usually a network of weighted connections) that has absorbed patterns from data and can now make predictions on new inputs. The algorithm is the recipe; the model is the dish. When people say "GPT-4," they mean a specific trained model.
→ Learn more: How Machines Learn
Multimodal AI
AI that works across multiple types of input and output (text, images, audio, video) in a single model. You can show it a photo and ask questions about it, or have it describe a chart aloud. GPT-4V and Gemini are multimodal; most earlier models handled only one modality each.
→ Learn more: Multimodal AI Models
Narrow AI
AI built for one specific job: recommending videos, translating text, spotting tumors. It can be superhuman at that job and useless at everything else; a chess engine can't drive a car. Every AI system in existence today, including ChatGPT, is narrow AI. The opposite is AGI.
→ Learn more: What Is AI?
Neural Network
A model made of layers of simple units ("neurons"), each taking weighted inputs and passing a signal forward, loosely inspired by the brain. Alone, each unit does almost nothing; stacked in layers and trained on data, they can recognize faces, translate languages, and write essays.
→ Learn more: Neural Networks from Scratch
NLP (Natural Language Processing)
The field of AI focused on understanding and generating human language: translation, sentiment analysis, summarization, chatbots. Large language models are NLP's biggest breakthrough, but the field also includes the humbler machinery (tokenization, embeddings, text classification) that makes them work.
→ Learn more: NLP Foundations
Overfitting
When a model memorizes its training data instead of learning general patterns, like a student who memorizes past exam answers and fails on new questions. An overfit model scores brilliantly on data it has seen and poorly on data it hasn't, which is why models are always evaluated on held-out test data.
→ Learn more: How Machines Learn
Parameters
The internal numbers a model adjusts during training: the weights on the connections between neurons. They're where the "knowledge" lives. Model sizes are quoted in parameters: a 7B model has seven billion of them. More parameters generally means more capability, more cost, and more data needed to train well.
→ Learn more: Neural Networks from Scratch
Prompt
The input you give a language model: a question, an instruction, a document to work on, or all three. The prompt is the model's entire brief for the task; it knows nothing about what you want except what the prompt contains, which is why phrasing changes results so dramatically.
→ Learn more: Prompt Engineering for Beginners
Prompt Engineering
The craft of writing prompts that get reliably good results: giving context, showing examples, specifying format, assigning a role, breaking tasks into steps. It requires no coding, and it's the highest-return AI skill for most people, because the same model can perform terribly or brilliantly depending on how it's asked.
→ Learn more: Prompt Engineering for Beginners
Quantization
Shrinking a model by storing its parameters with less precision, using 4 or 8 bits per number instead of 16 or 32. The model gets dramatically smaller and faster with only a small quality loss, which is how large models run on laptops and phones instead of server farms.
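A minimal sketch of the idea, assuming the simplest scheme: map every weight to a whole number between -127 and 127 using one shared scale factor. The four weights are made up, and production methods are more elaborate (per-group scales, 4-bit formats, and so on).

```python
def quantize_int8(weights):
    """Map each weight to an integer in [-127, 127] using one shared scale."""
    scale = max(abs(w) for w in weights) / 127
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate weights from the small integers."""
    return [c * scale for c in codes]

weights = [0.82, -1.27, 0.05, 0.4]   # hypothetical model weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(codes, max_error)
```

Each stored number now fits in 8 bits, and the restored values differ from the originals by at most half a scale step, which is the "small quality loss" in miniature.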
→ Learn more: LLM Inference Optimization
RAG (Retrieval-Augmented Generation)
A technique that lets a language model answer questions using knowledge it wasn't trained on. Relevant documents are retrieved from a database (usually via embeddings and vector search) and pasted into the prompt, so the model answers from your up-to-date sources instead of its frozen training data. It's the standard way to build "chat with your documents" products and reduce hallucination.
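The retrieve-then-paste flow can be sketched without any AI at all. To stay self-contained, this illustration swaps in a crude stand-in for retrieval (counting shared words) where a real system would compare embeddings; the documents, function names, and prompt wording are all made up.

```python
def retrieve(question, documents, top_k=1):
    """Stand-in retriever: rank documents by how many words they share with the question."""
    question_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, documents):
    """Paste the retrieved text into the prompt ahead of the question."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

documents = [
    "Refunds are accepted within 30 days of purchase.",
    "Our office is closed on public holidays.",
]
prompt = build_prompt("How many days do I have to request refunds?", documents)
print(prompt)
```

The language model never needs to have seen the refund policy during training; it only needs to read it in the prompt.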
→ Learn more: Vector Databases & RAG
Regression
A supervised learning task where the model predicts a continuous number (a house price, tomorrow's temperature, expected revenue) rather than a category. Linear regression, the simplest version, fits a straight line through data points and is often the first algorithm anyone learns.
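Fitting that straight line takes only a few lines of arithmetic. This sketch uses the standard least-squares formulas on made-up house data in which price is exactly three times size, so the answer is easy to check by eye; real data would be noisier.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

sizes = [50, 80, 100, 120]      # hypothetical house sizes
prices = [150, 240, 300, 360]   # hypothetical prices, in thousands

slope, intercept = fit_line(sizes, prices)
predicted = slope * 90 + intercept   # price estimate for a size-90 house
print(slope, intercept, predicted)
```

The fitted line recovers the hidden rule (slope 3, intercept 0) and uses it to price a house that was not in the data.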
→ Learn more: Supervised Learning & Regression
Reinforcement Learning
Learning by trial, error, and reward: the system takes actions, receives points for good outcomes, and gradually discovers strategies that maximize its score, the way you'd train a dog or the way AlphaGo learned to beat world champions. It's one of the three main learning paradigms, alongside supervised and unsupervised learning.
→ Learn more: How Machines Learn
RLHF (Reinforcement Learning from Human Feedback)
The training stage that turns a raw text-predictor into a helpful assistant. Humans rank pairs of model answers, a reward model learns to predict those preferences, and the language model is then optimized to produce answers humans would prefer. RLHF is the main reason ChatGPT feels cooperative rather than like an autocomplete engine.
→ Learn more: RLHF & LLM Alignment
Self-Attention
The specific form of attention used inside transformers, where every word in a sequence looks at every other word to figure out what's relevant to its own meaning. In "the animal didn't cross the road because it was tired," self-attention is what links "it" to "animal" rather than "road."
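Here is a stripped-down sketch of the calculation, using the "it / animal / road" example. Two simplifications are assumed: the word vectors are invented two-number toys, and each word's vector is used directly as its query, key, and value, whereas a real transformer first passes it through three learned transformations.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Every vector scores every vector, then takes a weighted blend of all of them."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        blended = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
        outputs.append(blended)
        all_weights.append(weights)
    return outputs, all_weights

words = ["animal", "road", "it"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]   # made-up vectors; "it" sits near "animal"
outputs, all_weights = self_attention(vectors)
it_weights = dict(zip(words, all_weights[2]))
print(it_weights)
```

Because the toy vector for "it" points almost the same way as "animal," the word "it" gives "animal" much more weight than "road."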
→ Learn more: Attention Mechanisms Deep Dive
Semantic Search
Search by meaning instead of keywords. A query for "how do I make my laptop battery last longer" matches a document titled "extending notebook power life" even though they share almost no words, because both are converted to embeddings and compared by similarity. It's the retrieval half of RAG.
→ Learn more: Vector Databases & Semantic Search
Supervised Learning
Training a model on examples that include the correct answer: emails labeled spam/not-spam, photos labeled by object, houses with their sale prices. The model learns the mapping from input to answer and applies it to new cases. It's the most common and commercially important form of machine learning.
→ Learn more: How Machines Learn
Temperature
A setting that controls how adventurous a language model's word choices are. Low temperature makes it pick the most likely next token almost every time, which is predictable and consistent; higher temperature lets less likely tokens through, which is more creative, more varied, and more prone to going off the rails. Factual tasks want low temperature; brainstorming wants it higher.
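Under the hood, temperature is a number the model's raw scores are divided by before they are turned into probabilities. The sketch below shows the effect on three made-up scores; the scores and the two temperature values are illustrative.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Divide scores by the temperature, then convert them to probabilities."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.0]   # hypothetical raw scores for three candidate tokens

cold = softmax_with_temperature(scores, 0.2)   # low temperature
hot = softmax_with_temperature(scores, 2.0)    # high temperature
print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
```

At the low setting the top token takes over 99% of the probability; at the high setting it gets only about half, leaving real room for the other two.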
→ Learn more: Working with LLMs
Token
The unit of text a language model actually reads and writes: usually a word fragment, averaging about four characters in English. "Understanding" might be split into "under," "stand," "ing." Context windows, API pricing, and generation speed are all measured in tokens, which is why the word appears on every AI pricing page.
→ Learn more: NLP Foundations
Tokenization
The process of chopping text into tokens before a model processes it. The tokenizer's vocabulary is fixed at training time, which explains some odd model behaviors, like struggling to count letters in a word, since the model sees tokens, not characters.
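A toy tokenizer makes the "fixed vocabulary" point concrete. This sketch uses a simple rule (always take the longest known piece, falling back to single characters) and a five-entry made-up vocabulary; real tokenizers learn vocabularies of tens of thousands of pieces and use more refined methods such as byte-pair encoding.

```python
def tokenize(text, vocabulary):
    """Greedy longest-match: repeatedly take the longest vocabulary piece that fits."""
    tokens = []
    i = 0
    while i < len(text):
        for end in range(len(text), i, -1):
            piece = text[i:end]
            # Accept a known piece, or fall back to a single character.
            if piece in vocabulary or end == i + 1:
                tokens.append(piece)
                i = end
                break
    return tokens

vocabulary = {"under", "stand", "ing", "un", "der"}   # hypothetical vocabulary
tokens = tokenize("understanding", vocabulary)
print(tokens)
```

The word comes out as three pieces ("under", "stand", "ing"), so a model working with these tokens never directly sees the thirteen individual letters.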
→ Learn more: NLP Foundations
Training Data
The examples a model learns from. For an LLM, that's a huge slice of the internet plus books and code; for a fraud model, it's millions of past transactions. A model can only reflect what its training data contains: its knowledge cutoff, its blind spots, and its biases all trace back here.
→ Learn more: How Machines Learn
Transformer
The neural network architecture, introduced in the 2017 paper "Attention Is All You Need," that powers virtually all modern AI language systems. Its self-attention mechanism processes all words in parallel while modeling how they relate, making it both more capable and more efficient to train than its predecessors. The "T" in GPT stands for transformer.
→ Learn more: Transformer Architecture Explained
Turing Test
Alan Turing's 1950 proposal: if a human conversing with a machine can't tell it apart from a person, the machine can be said to "think." Modern chatbots arguably pass casual versions of it, which has mostly taught us that the test measures imitation, not understanding. Even so, it framed the AI debate for 70 years.
→ Learn more: What Is AI?
Underfitting
When a model is too simple to capture the real patterns in the data, like summarizing a novel in three words. An underfit model performs poorly even on its own training data. The fix is more capacity, better features, or more training; the art is stopping before you swing into overfitting.
→ Learn more: How Machines Learn
Unsupervised Learning
Learning from data with no answer key: the model finds structure on its own, grouping similar items or compressing data to its essentials. Clustering customers into segments and detecting unusual transactions are classic uses. It's how you learn from the vast majority of data, which nobody has labeled.
→ Learn more: Unsupervised Learning & Clustering
Vector Database
A database built to store embeddings and answer one question extremely fast: "which stored items are most similar to this one?" Regular databases match exact values; vector databases match by meaning, using approximate nearest-neighbor search across millions of vectors. They're the storage layer behind RAG and semantic search. Pinecone (a hosted database), FAISS (a search library), and pgvector (a PostgreSQL extension) are common choices.
→ Learn more: Vector Databases at Scale
Zero-Shot Learning
Asking a model to do a task it was never explicitly trained or shown examples for ("translate this into pirate speak") and having it succeed purely from general knowledge absorbed during training. The surprising zero-shot abilities of large language models are a big part of why they feel intelligent.
→ Learn more: Prompt Engineering for Beginners
Every term above is taught properly, with visuals, analogies, and quizzes, in the free 34-module course.
Explore the full course →