LLMs have a frozen knowledge cutoff. They can't run code. They can't see your database. They can't book a flight. Tool use is how you fix all of that — give the model a list of functions it can call, and it decides when to call them, with what arguments, and how to use the results. This module unpacks the actual mechanic that turned LLMs into agents.
Wire it upA frozen-weights language model has hard limits. Tool use lets it route around them — by deferring to systems that can do those things and treating the result as new context.
The model's knowledge cuts off at training. Real-time info, current weather, news today — invisible to it.
LLMs are bad at long arithmetic. They hallucinate digits. A calculator tool fixes this for free.
Your company database, your spreadsheets, your private docs — outside the model's training set.
Need to actually execute Python, parse JSON, fit a regression? A code-execution tool does it for real.
Send an email. Book a flight. Make an API call. Effects in the world, not just outputs.
Pick a scenario. Press play. Watch the full agent loop unfold: model decides → tool call(s) → tool result(s) → model continues → final answer. This is exactly what Claude, GPT-4, and Gemini do under the hood when you give them tools.
The model is given a list of available tools with JSON schemas. It outputs either a regular text reply, OR a special tool_use block containing the function name and arguments. Your runtime executes the tool and sends back a tool_result block with the response. The model sees the result, reasons over it, then either calls more tools or gives the final answer.
Before the model can use a tool, you give it the tool's JSON schema — name, description, parameters, types, required fields. The model reads this list (as part of its system prompt) and decides when each tool is appropriate. Below: three canonical examples.
The description field is the most important — it's how the model knows when to use the tool. Vague description → model uses the tool wrong (or doesn't use it). Precise description → reliable behavior. This is the most underrated skill in building with tool use.
—
If you need the weather in Tokyo AND Paris, those calls are independent — they can run at the same time. If you need to find a flight first, then book it, those are dependent — second waits for first. Modern models (Claude, GPT-4) emit multiple tool_use blocks in a single response when they detect independence, letting your runtime fan them out.
You don't tell the model "these are parallel." It looks at the user's query, decides what tools to call, and emits them as multiple separate tool_use blocks within one response if they're independent. Your runtime executes them concurrently. If they're dependent, the model emits one tool call, sees the result, then emits the next.
Once you have tool use, you have agents. Here's how teams actually deploy it.
The model has a web_search tool. For any question that smells time-sensitive ("latest", "current", "today"), it searches first, then synthesizes the result.
Used by: Claude, ChatGPT, Perplexity, Gemini. The reason they can answer "what happened today" despite a stale knowledge cutoff.
The model gets read_file, write_file, run_bash, and search_codebase tools. It reads, edits, runs tests, iterates — all autonomously.
This is what makes Claude Code, Cursor, and Devin work. The loop runs dozens or hundreds of tool calls per task without user intervention.
Instead of stuffing retrieved docs into context up front, give the model a search_knowledge_base tool and let it decide when (and how often) to query.
Smarter than fixed RAG: the model can refine its query based on partial results, search again with better keywords, only retrieve what's needed. "Agentic RAG" is the standard for serious deployments now.
Anthropic's Computer Use gives Claude tools to take screenshots, move the mouse, type, click. The model sees the screen, decides what to do, the action runs, takes another screenshot, repeats.
Same protocol as text tool use — just with vision input and GUI-action tools. Lets the model operate any app it has access to. The bridge from chatbot to assistant that actually does things.
Aim for 4/5. Wrong answers explain themselves.
You watched the full loop: query → decide → tool call → result → continue → answer. You read real JSON schemas. You understand why some calls go parallel and others can't. You know that "agent" mostly just means "LLM in a tool-use loop". The mechanics are simple — the implications are not.
Continue to Module 09