Project: build a research agent end-to-end
Ship a working research agent with tools, memory, and eval.
This lesson is the capstone — a working research agent you can study, modify, and ship. It's deliberately small; production agents are built by iterating on a working base, not by architecting a perfect one upfront.
The spec
Build an agent that:
- Takes a research question as input.
- Plans 3-5 subtopics to investigate.
- For each subtopic, searches the web and reads a few sources.
- Synthesizes findings into a structured brief.
- Cites its sources.
The tool graph
Only three tools needed:
web_search(query)— returns a list of {title, url, snippet}.fetch_page(url)— returns the cleaned text of a web page.finish(brief)— ends the agent with the final output.
That's it. Resist the urge to add more.
The loop
system prompt: you are a research assistant. produce a 400-600 word
brief with citations on the question below. use the tools iteratively.
when done, call finish(brief).
user: <research question>
[loop]
model generates: tool call or finish
if tool call: execute, append result to context, continue
if finish: return brief
if step count > 20 or cost > $1.00: abort with partial result
Key decisions worth getting right
Planning up front vs. as-you-go. Start with a plan-and-execute shape — model produces a list of subtopics first, then works through them. More coherent than a pure ReAct loop on this task.
Source diversity. If the first two results are the same site, encourage the agent to seek different perspectives. Add a rule: "prefer sources from different domains when possible."
Citation format. Decide upfront: inline [1], [2] with a references section? Or hyperlinks? Consistency matters more than format choice.
Token budget per subtopic. If the brief is 500 words, each subtopic gets ~100 words. Tell the model that. Agents tend to over-write without explicit length rules.
Eval set
Even for this small project, build 10-20 test questions. Graded by:
- Coverage — did the brief address the question?
- Factuality — do the cited sources support the claims? (spot-check.)
- Conciseness — is it in the requested length?
- Source diversity — do citations span multiple domains?
Run the set weekly as you iterate.
What you'll hit
- Dead-end searches. The first search returns garbage. The agent needs a re-query strategy.
- Conflicting sources. The agent sometimes picks one and ignores the other. Better: note the conflict in the brief.
- Hallucinated citations. A classic failure — the model invents sources. Mitigation: validate every URL exists before including in the brief.
- Over-long output. The model ignores the length rule. Enforce with a post-check that trims if over budget.
Where to take it
Once the basic version works:
- Memory. Remember what was researched before; don't repeat.
- Expert models. Use a stronger model for synthesis, a cheaper one for initial search.
- Human review checkpoint. After the plan, pause for user edits before executing.
- Interactive. Let the user ask follow-up questions that reuse the research context.
Each of these is 20-100 lines of code on top of the basic version.
The meta-lesson
Your first agent won't be your best. It'll reveal the next five things to improve. Ship the scrappy version first; iterate from real usage, not from imagination.
Check your understanding
2-question self-check
Optional. Your answers feed your knowledge score on the track certificate.
Q1.The meta-lesson from building a working 100-line research agent is…
Q2.Why does the capstone agent include a step-count cap and cost cap?
Continue in this track
More lessons from Building AI Agents.
Lesson 5
Multi-agent systems without the chaos
When multiple agents help, when they don't, and how to coordinate them.
Lesson 6
Evaluating agents (this is hard)
Why agent eval is different from LLM eval, and the harness patterns that work.
Lesson 7
Agent safety and guardrails
Defense in depth for agents that take real actions.