Entry
Designing Tools AI Agents Can Actually Use
Most agents are bad at using tools because the tools were built for programmers, not for a model reading a one-line description under load. How to design tools, and MCP servers, that agents get right.
Here is a failure I have now seen at three different companies, almost word for word.
A team gives their agent a tool called update_record. It takes eleven optional parameters and returns a 4KB JSON object with nested metadata. In the demo it works, because the engineer who built it knew exactly which three parameters to pass. In production the agent guesses, picks the wrong record, misreads the response, and the team spends a week blaming the model.
The model was fine. The tool was built for a programmer who already understood the system, not for a model reading a one-line description with a thousand other tokens competing for its attention. Tool design is among the highest-payoff parts of building an agent that holds up, and it is mostly a writing problem.
Whatever you write as the tool's description, the model treats as instructions. So write it as instructions.
update_record(id, fields) with the description "updates a record" tells the model nothing about when to reach for it, what a valid id looks like, or what happens if the record does not exist. Compare "Update the status of an existing support ticket. Use this only after you have confirmed the ticket ID with find_ticket. Fails if the ticket is already closed." Now the model knows the preconditions, the ordering, and one of the failure modes before it ever calls the thing.
Vague tool names are the same trap as vague variable names, except the consumer cannot ask a colleague what you meant. Name the tool after the action a person would describe. refund_order beats process_transaction. find_customer_by_email beats query.
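To make the description and naming advice concrete, here is a minimal sketch of the ticket tool written out as a definition. The tool name update_ticket_status, the parameter names, the status values, and the dict layout (a JSON Schema under an "input_schema" key) are all assumptions for illustration, not any particular framework's API. Only the description text comes from the example above.

```python
# Hypothetical tool definition. The layout (name, description, JSON Schema
# under "input_schema") is an assumed convention, not a specific vendor's API.
UPDATE_TICKET_STATUS = {
    # Named after the action a person would describe, not "update_record".
    "name": "update_ticket_status",
    # Written as instructions: what it does, when to call it, how it fails.
    "description": (
        "Update the status of an existing support ticket. "
        "Use this only after you have confirmed the ticket ID with find_ticket. "
        "Fails if the ticket is already closed."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "ticket_id": {
                "type": "integer",
                # Tells the model what a valid ID looks like and where it comes from.
                "description": "Numeric ticket ID returned by find_ticket, e.g. 4821.",
            },
            "status": {
                "type": "string",
                # Assumed status values; an enum removes one thing to guess.
                "enum": ["open", "pending", "resolved"],
                "description": "The new status for the ticket.",
            },
        },
        # Two required parameters instead of eleven optional ones.
        "required": ["ticket_id", "status"],
        "additionalProperties": False,
    },
}
```

The parameter descriptions do the same job as the tool description: each one answers a question the model would otherwise have to guess at.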
The model has to read the return value and decide what to do next. A wall of JSON is the worst possible thing to hand it.
If a tool returns the full record when the agent only needs to know the update succeeded, you have spent tokens and attention on nothing, and you have given the model more surface to misread. Return the smallest thing that lets it take the next step. "Ticket 4821 set to resolved." Done. If it needs the data, return the three fields it needs, not the row.
This is the same instinct as designing a good API response for a junior engineer who will not read the docs. Except this junior engineer never reads the docs and answers in milliseconds.
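A rough sketch of what "return the smallest thing" can look like in code. The record shape and the helper names summarize_update and pick_fields are hypothetical. The point is only that the full row stays on the server side and the model sees one sentence or a handful of fields.

```python
def summarize_update(record: dict) -> str:
    """Collapse a full ticket record into the one line the agent needs."""
    return f"Ticket {record['id']} set to {record['status']}."


def pick_fields(record: dict, fields: tuple[str, ...]) -> dict:
    """Return only the named fields, skipping any the record lacks."""
    return {name: record[name] for name in fields if name in record}


# Hypothetical full record, standing in for the 4 KB response with nested metadata.
full_record = {
    "id": 4821,
    "status": "resolved",
    "assignee": "dana",
    "metadata": {"created_by": "system", "audit": ["..."]},
}

# What the agent sees after an update: one sentence, nothing to misread.
print(summarize_update(full_record))  # Ticket 4821 set to resolved.

# What the agent sees when it needs data: three fields, not the row.
print(pick_fields(full_record, ("id", "status", "assignee")))
```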
The pull toward a single flexible tool with a mode parameter is strong and you should resist it. A manage_calendar tool that creates, updates, deletes, and lists depending on an action argument forces the model to get two decisions right at once: which action, and which arguments for that action. Two coupled decisions fail more often than two clean ones.
Four tools (create_event, move_event, cancel_event, list_events) each read like a sentence and each has obvious arguments. Yes, it is more tools. The model handles a clear menu far better than it handles one tool wearing four hats.
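As an illustration, the four calendar tools might look like this over a toy in-memory store. The Calendar class, its argument names, and the return strings are assumptions made up for this sketch, and error handling is left out to keep it short. What matters is that each function has one job and a signature with no action argument to get wrong.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Calendar:
    """Hypothetical in-memory calendar, standing in for a real backend."""

    events: dict[int, dict] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def create_event(self, title: str, start: str) -> str:
        """Create a new event and report its ID."""
        event_id = next(self._ids)
        self.events[event_id] = {"title": title, "start": start}
        return f"Created event {event_id}: {title} at {start}."

    def move_event(self, event_id: int, new_start: str) -> str:
        """Change the start time of an existing event."""
        self.events[event_id]["start"] = new_start
        return f"Moved event {event_id} to {new_start}."

    def cancel_event(self, event_id: int) -> str:
        """Remove an existing event."""
        title = self.events.pop(event_id)["title"]
        return f"Cancelled event {event_id}: {title}."

    def list_events(self) -> list[str]:
        """List events as short lines, ordered by ID."""
        return [
            f"{event_id}: {event['title']} at {event['start']}"
            for event_id, event in sorted(self.events.items())
        ]
```

Compare this with a single manage_calendar(action, ...) function, where every argument would have to be optional because no single action uses them all.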
Most tools, when they fail, throw the same generic error a human would see in a stack trace. The agent reads "Error 422: unprocessable entity" and has no idea whether to retry, change an argument, or give up.
Write errors the model can act on. "No ticket found with that ID. Call find_ticket with the customer email first" turns a dead end into a next step. An error message is the only steering you get at the exact moment the agent is off track, so spend words there. (This is the cheapest reliability win in the whole stack and almost nobody does it.)
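Here is one way that could look, as a sketch rather than a prescription. ToolError, call_tool, the TICKETS data, and the wording of the closed-ticket message are all hypothetical. The not-found message is the one quoted above.

```python
class ToolError(Exception):
    """An error whose message is written for the model, not for a stack trace."""


# Hypothetical ticket store for illustration.
TICKETS = {4821: {"status": "open"}, 4822: {"status": "closed"}}


def update_ticket_status(ticket_id: int, status: str) -> str:
    ticket = TICKETS.get(ticket_id)
    if ticket is None:
        # Names the next step instead of returning a bare 404 or 422.
        raise ToolError(
            "No ticket found with that ID. "
            "Call find_ticket with the customer email first."
        )
    if ticket["status"] == "closed":
        # Tells the model that retrying will not help (assumed policy).
        raise ToolError(
            f"Ticket {ticket_id} is already closed and cannot be updated. "
            "Do not retry. Tell the user the ticket is closed."
        )
    ticket["status"] = status
    return f"Ticket {ticket_id} set to {status}."


def call_tool(tool, **kwargs) -> str:
    """Run a tool and hand the model either its result or an actionable error."""
    try:
        return tool(**kwargs)
    except ToolError as exc:
        return f"Error: {exc}"


print(call_tool(update_ticket_status, ticket_id=9999, status="resolved"))
```

Catching only ToolError is deliberate in this sketch: unexpected exceptions still surface as bugs for a human to fix, while anticipated failures reach the model as advice.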
Everything above is true for any tool you hand a model. MCP matters because it turns tool design from a thing you redo per agent into a contract you build once.
Before, wiring an agent into your calendar, your docs, and your CRM meant gluing three integrations into that one agent. The next agent re-glued the same three. MCP makes each of those a server with a stable, versioned interface that any agent can speak to. Build the calendar server once, design its tools well once, and every agent you ship after that inherits the work.
That is why I put an MCP server in front of the tools I actually want agents driving. GNO exposes hybrid search over your documents as an MCP server, so any agent can retrieve against your knowledge base without me rebuilding retrieval each time. Dettivo ships a stdio MCP server so an agent can drive dictation and meeting capture on your Mac. sheets-cli installs as an agent skill for the same reason. The integration is the asset. The agent is downstream of it.
The teams whose agents feel reliable are not running better models. They have spent the time to give those models tools that read like sentences, return like answers, and fail like advice. It is unglamorous work. It is also most of the job.
This is one layer of the system around the model. The other four (verification, memory, guardrails, observability) are in the field guide to harness engineering.