Tool use: giving a model hands

An agent without tools is a chat bot. An agent with too many tools is a confused chat bot. Tool design is the lever with the biggest effect on agent quality.

How tool calling actually works

The model sees tool definitions in its context — name, description, parameter schema. When generating a response, it can emit a structured "tool call" instead of a final answer. Your code executes the tool, feeds the result back, the model continues.

Mechanically: same completion API, with a structured format for calls.

Designing tools for model consumption

Models aren't reading your tools the way your engineers do. They read the description and parameter names. That's ~80% of what they have to decide on.

Rules:

Name the tool for its job, not its implementation. get_customer_details beats fetchCustomerFromCRMV2.
Write the description for a new hire. "Returns the customer's current plan, billing status, and last-login timestamp" beats "Customer object getter."
Name parameters what they are. customer_id beats id. The model uses names as type hints.
Constrain parameters with enums when possible. Fewer wrong calls.
Keep the tool list short. 5-15 tools is healthy. 30+ and the model gets confused.

The reliability trade-off

Make tools idempotent where possible. If the model calls send_email twice due to a retry, you don't want two emails. Either the tool itself is idempotent (by design) or the orchestration layer deduplicates.

Error handling inside tools

When a tool fails, what does the model see? Three options:

Throw/crash. Bad — the agent has no way to recover.
Return a generic error. "Error: failed." Model can't do anything useful with this.
Return a structured error with what went wrong and what to try. "Invalid customer_id: no record found. Suggestions: check format (expected: uuid), or call list_customers to find the right id."

The third is meaningfully better. It turns the model into a partner in error recovery instead of a victim of it.

Tool composition patterns

Atomic tools. One tool per clean capability. Best for flexibility.
Macro tools. A single tool that wraps a multi-step workflow. Best for cost and reliability — fewer model-in-loop decisions.
Mixed. Most real systems. Atomic tools for exploration, macro tools for well-known flows.

MCP as the standardization moment

Model Context Protocol (MCP) is the 2025-onwards standard for exposing tools to LLMs. One tool server can serve Cursor, Claude Code, and your own agents. If you're designing tools in 2026, design them as MCP servers — you'll get multi-client compatibility for free.

What breaks in practice

Tool permissions too broad. run_sql(query) is a power tool and a liability. Either split (get_customer(id), get_orders(id)) or heavily sandbox.
Ambiguous errors. "Something went wrong" — useless. Always be specific.
Silent rate limits. The tool returns {ok: false} but not why. The model keeps retrying.

An agent without tools is a chat bot. An agent with too many tools is a confused chat bot. Tool design is the lever with the biggest effect on agent quality.

How tool calling actually works

Mechanically: same completion API, with a structured format for calls.

Designing tools for model consumption

Models aren't reading your tools the way your engineers do. They read the description and parameter names. That's ~80% of what they have to decide on.

Rules:

Name the tool for its job, not its implementation. get_customer_details beats fetchCustomerFromCRMV2.
Write the description for a new hire. "Returns the customer's current plan, billing status, and last-login timestamp" beats "Customer object getter."
Name parameters what they are. customer_id beats id. The model uses names as type hints.
Constrain parameters with enums when possible. Fewer wrong calls.
Keep the tool list short. 5-15 tools is healthy. 30+ and the model gets confused.

The reliability trade-off

Error handling inside tools

When a tool fails, what does the model see? Three options:

Throw/crash. Bad — the agent has no way to recover.
Return a generic error. "Error: failed." Model can't do anything useful with this.
Return a structured error with what went wrong and what to try. "Invalid customer_id: no record found. Suggestions: check format (expected: uuid), or call list_customers to find the right id."

The third is meaningfully better. It turns the model into a partner in error recovery instead of a victim of it.

Tool composition patterns

Atomic tools. One tool per clean capability. Best for flexibility.
Macro tools. A single tool that wraps a multi-step workflow. Best for cost and reliability — fewer model-in-loop decisions.
Mixed. Most real systems. Atomic tools for exploration, macro tools for well-known flows.

MCP as the standardization moment

What breaks in practice

Tool permissions too broad. run_sql(query) is a power tool and a liability. Either split (get_customer(id), get_orders(id)) or heavily sandbox.
Ambiguous errors. "Something went wrong" — useless. Always be specific.
Silent rate limits. The tool returns {ok: false} but not why. The model keeps retrying.

Tool use: giving a model hands

How tool calling actually works

Designing tools for model consumption

The reliability trade-off

Error handling inside tools

Tool composition patterns

MCP as the standardization moment

What breaks in practice

2-question self-check

Continue in this track

Tool use: giving a model hands

How tool calling actually works

Designing tools for model consumption

The reliability trade-off

Error handling inside tools

Tool composition patterns

MCP as the standardization moment

What breaks in practice

2-question self-check

Continue in this track