Tool use: giving a model hands
How tool calling works under the hood, and how to design tools models can use.
An agent without tools is a chat bot. An agent with too many tools is a confused chat bot. Tool design is the lever with the biggest effect on agent quality.
How tool calling actually works
The model sees tool definitions in its context — name, description, parameter schema. When generating a response, it can emit a structured "tool call" instead of a final answer. Your code executes the tool, feeds the result back, the model continues.
Mechanically: same completion API, with a structured format for calls.
Designing tools for model consumption
Models aren't reading your tools the way your engineers do. They read the description and parameter names. That's ~80% of what they have to decide on.
Rules:
- Name the tool for its job, not its implementation.
get_customer_detailsbeatsfetchCustomerFromCRMV2. - Write the description for a new hire. "Returns the customer's current plan, billing status, and last-login timestamp" beats "Customer object getter."
- Name parameters what they are.
customer_idbeatsid. The model uses names as type hints. - Constrain parameters with enums when possible. Fewer wrong calls.
- Keep the tool list short. 5-15 tools is healthy. 30+ and the model gets confused.
The reliability trade-off
Make tools idempotent where possible. If the model calls send_email twice due to a retry, you don't want two emails. Either the tool itself is idempotent (by design) or the orchestration layer deduplicates.
Error handling inside tools
When a tool fails, what does the model see? Three options:
- Throw/crash. Bad — the agent has no way to recover.
- Return a generic error. "Error: failed." Model can't do anything useful with this.
- Return a structured error with what went wrong and what to try. "Invalid customer_id: no record found. Suggestions: check format (expected: uuid), or call list_customers to find the right id."
The third is meaningfully better. It turns the model into a partner in error recovery instead of a victim of it.
Tool composition patterns
- Atomic tools. One tool per clean capability. Best for flexibility.
- Macro tools. A single tool that wraps a multi-step workflow. Best for cost and reliability — fewer model-in-loop decisions.
- Mixed. Most real systems. Atomic tools for exploration, macro tools for well-known flows.
MCP as the standardization moment
Model Context Protocol (MCP) is the 2025-onwards standard for exposing tools to LLMs. One tool server can serve Cursor, Claude Code, and your own agents. If you're designing tools in 2026, design them as MCP servers — you'll get multi-client compatibility for free.
What breaks in practice
- Tool permissions too broad.
run_sql(query)is a power tool and a liability. Either split (get_customer(id),get_orders(id)) or heavily sandbox. - Ambiguous errors. "Something went wrong" — useless. Always be specific.
- Silent rate limits. The tool returns
{ok: false}but not why. The model keeps retrying.
Check your understanding
2-question self-check
Optional. Your answers feed your knowledge score on the track certificate.
Q1.The biggest lever for agent reliability on tool design is…
Q2.When a tool fails, what should it return to the agent?
Continue in this track
More lessons from Building AI Agents.
Lesson 1
What an agent actually is (and isn't)
Cut through the marketing. Define agents by behavior, not hype.
Lesson 3
Memory systems: short, long, and associative
The three kinds of memory an agent needs and how to build each.
Lesson 4
Planning strategies: ReAct, Plan-and-Execute, and beyond
Different shapes of agent reasoning and when to use each.