AgentKit Tutorial: Build an AI Agent in ChatGPT
Goal: ship a production-ready ChatGPT Agent that plans multi-step tasks, calls tools, handles errors, and hands off to an App for human confirmation or in-chat checkout when needed.
If you’re deciding between surfaces, read Apps vs Agents and What Is OpenAI AgentKit? first. For UI surfaces and forms, you’ll likely pair this with Inline UI & Widgets.
What we’ll build
A Research → Decide → Act agent that:
- Gathers inputs (topic + constraints)
- Plans steps (search → extract → compile → send)
- Calls tools (web search, file write, email/send)
- Recovers from errors (retries/backoff/alternates)
- Logs traces & costs
- Opens an App screen for confirmation before sending
Prerequisites
- Working knowledge of JS/TS or Python (examples shown in pseudo-TS)
- Tool endpoints you can call (HTTP APIs or your MCP server)
- Familiarity with scopes/consent and privacy: Security • Data Privacy
1) Define your tools (capabilities)
Keep the toolkit small: include only the tools your hero job (the one task this agent must do well) requires.
// A tool is an async function from a typed input to a typed output.
type ToolCall<I, O> = (input: I) => Promise<O>;

// Signatures only; each implementation calls your HTTP API or MCP server.
declare const searchWeb: ToolCall<{ q: string; k?: number }, { results: { title: string; url: string }[] }>;
declare const extractFacts: ToolCall<{ url: string }, { facts: string[] }>;
declare const compileBrief: ToolCall<{ topic: string; facts: string[] }, { docUrl: string; wordCount: number }>;
declare const sendEmail: ToolCall<{ to: string; subject: string; body: string }, { messageId: string }>;
If you already expose tools via MCP, reuse those endpoints here.
- Server side patterns: MCP Server Tutorial
- Tool contracts & UI: Model Context Protocol
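To make the reuse idea concrete, here is a minimal sketch of wrapping a JSON endpoint as a typed tool. The names httpTool, PostJson, parseSearchOutput, and fakePost are hypothetical helpers introduced for illustration, and the URL is a placeholder; swap in whatever transport your HTTP API or MCP client actually provides.

```typescript
type ToolCall<I, O> = (input: I) => Promise<O>;

// Hypothetical transport: posts JSON and resolves with the parsed response body.
type PostJson = (url: string, body: unknown) => Promise<unknown>;

type SearchOutput = { results: { title: string; url: string }[] };

// Runtime check so a malformed upstream response fails loudly at the boundary.
function parseSearchOutput(raw: unknown): SearchOutput {
  const results = (raw as { results?: unknown } | null)?.results;
  if (!Array.isArray(results)) throw new Error("search: missing results array");
  return {
    results: results.map((r: unknown, i: number) => {
      const item = (r ?? {}) as { title?: unknown; url?: unknown };
      const title = item.title;
      const url = item.url;
      if (typeof title !== "string" || typeof url !== "string") {
        throw new Error(`search: result ${i} is malformed`);
      }
      return { title, url };
    }),
  };
}

// Wrap any JSON endpoint as a typed tool.
function httpTool<I, O>(url: string, post: PostJson, parse: (raw: unknown) => O): ToolCall<I, O> {
  return async (input: I) => parse(await post(url, input));
}

// In-memory stand-in for a real endpoint (canned data, no network).
const fakePost: PostJson = async (_url, body) => {
  const { q, k = 5 } = body as { q: string; k?: number };
  const all = [
    { title: `${q} overview`, url: "https://example.gov/overview" },
    { title: `${q} report`, url: "https://example.edu/report" },
  ];
  return { results: all.slice(0, k) };
};

const searchWebStub = httpTool<{ q: string; k?: number }, SearchOutput>(
  "https://tools.example.com/search", // placeholder URL
  fakePost,
  parseSearchOutput,
);
```

Injecting the transport keeps the tool testable offline, and the parse step means the executor never sees a response that violates the declared output type.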
2) Write a planner (lightweight but explicit)
The planner converts goals → steps with exit criteria.
// One variant per step kind; the executor switches on `kind`.
type Step =
  | { kind: "search"; query: string }
  | { kind: "extract"; url: string }
  | { kind: "compile"; topic: string }
  | { kind: "handoff_confirm"; docUrl: string }
  | { kind: "send"; to: string; subject: string; body: string };

// "<from-...>" values are placeholders the executor fills in from earlier steps.
function plan({ topic, recipient }: { topic: string; recipient: string }): Step[] {
  return [
    { kind: "search", query: `${topic} latest developments site:.gov OR site:.edu` },
    { kind: "extract", url: "<from-search>" },
    { kind: "compile", topic },
    { kind: "handoff_confirm", docUrl: "<from-compile>" },
    { kind: "send", to: recipient, subject: `Brief: ${topic}`, body: "<from-compile>" },
  ];
}
- Keep steps deterministic and small.
- Add heuristics (for example, if a search returns too few results, widen the query).
- For complex goals, generate plans with an LLM, but validate them against the Step schema and your policy before running them.
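If an LLM produces the plan, treat its output as untrusted JSON. The following is a minimal sketch of a validator, assuming the Step union from this section; validatePlan and requiredFields are hypothetical names, and the "send must follow a confirmation" rule is one example of a policy check you might add.

```typescript
type Step =
  | { kind: "search"; query: string }
  | { kind: "extract"; url: string }
  | { kind: "compile"; topic: string }
  | { kind: "handoff_confirm"; docUrl: string }
  | { kind: "send"; to: string; subject: string; body: string };

// Required string fields per step kind, mirroring the Step union above.
const requiredFields: Record<Step["kind"], string[]> = {
  search: ["query"],
  extract: ["url"],
  compile: ["topic"],
  handoff_confirm: ["docUrl"],
  send: ["to", "subject", "body"],
};

// Returns a list of problems; an empty list means the plan passed every check.
function validatePlan(raw: unknown, maxSteps: number): string[] {
  if (!Array.isArray(raw)) return ["plan is not an array"];
  const errors: string[] = [];
  if (raw.length > maxSteps) errors.push(`plan has ${raw.length} steps, limit is ${maxSteps}`);
  let confirmed = false;
  for (let i = 0; i < raw.length; i++) {
    const item: unknown = raw[i];
    const step = (typeof item === "object" && item !== null ? item : {}) as Record<string, unknown>;
    const kind = step.kind;
    if (typeof kind !== "string" || !Object.prototype.hasOwnProperty.call(requiredFields, kind)) {
      errors.push(`step ${i}: unknown kind`);
      continue;
    }
    for (const field of requiredFields[kind as Step["kind"]]) {
      if (typeof step[field] !== "string") errors.push(`step ${i}: missing string field "${field}"`);
    }
    if (kind === "handoff_confirm") confirmed = true;
    // Policy check: a write step must come after a human confirmation step.
    if (kind === "send" && !confirmed) errors.push(`step ${i}: send before handoff_confirm`);
  }
  return errors;
}
```

Returning a list of errors rather than throwing on the first one makes it easy to feed every problem back to the model in a single repair prompt.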
3) Guardrails & budgets (must-have)
Define explicit limits before you run the plan.
const policy = {
  allowedDomains: ["gov", "edu", "reputable-news.com"],
  maxSteps: 12,
  maxCostUSD: 0.50,
  maxSeconds: 120,
  denyWritesWithoutConfirm: true
};
- Block disallowed domains/tools.
- Cap steps/time/cost; short-circuit if exceeded.
- Require human-in-the-loop for writes (email, orders, payments).
Governance guides: Security for ChatGPT Apps • Compliance & PII
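The two checks the executor relies on can be sketched as follows. This is an illustration under assumptions: the Budget shape is hypothetical (the tutorial's Ctx would carry equivalent counters), enforceBudgets here takes the policy and clock explicitly so it is easy to test, and allowedDomains entries are assumed to match as hostname suffixes.

```typescript
const policy = {
  allowedDomains: ["gov", "edu", "reputable-news.com"],
  maxSteps: 12,
  maxCostUSD: 0.5,
  maxSeconds: 120,
  denyWritesWithoutConfirm: true,
};

// Hypothetical run state tracked alongside the plan.
type Budget = { stepsRun: number; costUSD: number; startedAtMs: number };

// Allow a host if it equals an entry or ends with "." + entry.
// Comparing parsed hostnames (not raw strings) avoids "evil.com/?x=.gov" tricks.
function isDomainAllowed(url: string, allowed: string[]): boolean {
  let host: string;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // unparseable URLs are never allowed
  }
  return allowed.some((d) => host === d || host.endsWith("." + d));
}

function assertDomainAllowed(url: string, p: typeof policy): void {
  if (!isDomainAllowed(url, p.allowedDomains)) throw new Error(`Domain not allowed: ${url}`);
}

// Call before each step; throws as soon as any cap has been reached.
function enforceBudgets(b: Budget, p: typeof policy, nowMs: number = Date.now()): void {
  if (b.stepsRun >= p.maxSteps) throw new Error("Budget exceeded: steps");
  if (b.costUSD >= p.maxCostUSD) throw new Error("Budget exceeded: cost");
  if ((nowMs - b.startedAtMs) / 1000 >= p.maxSeconds) throw new Error("Budget exceeded: time");
}
```

Because the check runs before each step, a run stops at the first step that would start over budget rather than partway through a tool call.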
4) The executor (routing + retries)
async function runPlan(steps: Step[], ctx: Ctx) {
  for (const s of steps) {
    enforceBudgets(ctx); // time/cost/step caps
    try {
      if (s.kind === "search") {
        ctx.search = await searchWeb({ q: s.query, k: 5 });
      } else if (s.kind === "extract") {
        const top = ctx.search?.results[0]?.url;
        if (!top) throw new Error("No search results to extract from");
        assertDomainAllowed(top, policy);
        ctx.facts = await extractFacts({ url: top });
      } else if (s.kind === "compile") {
        ctx.brief = await compileBrief({ topic: s.topic, facts: ctx.facts });
      } else if (s.kind === "handoff_confirm") {
        // Record the user's decision on ctx so later write steps can check it.
        ctx.confirmed = await openAppConfirmUI(ctx.brief.docUrl); // App confirm
        if (!ctx.confirmed) return { status: "cancelled" };
      } else if (s.kind === "send") {
        // When the policy requires it, a write may run only after confirmation.
        assert(!policy.denyWritesWithoutConfirm || ctx.confirmed, "Needs confirm");
        await sendEmail({ to: s.to, subject: s.subject, body: makeEmailBody(ctx.brief) });
      }
      recordStepSuccess(s, ctx);
    } catch (err) {
      // maybeRetry is expected to re-run the step (backoff + alternates)
      // and resolve to true only if the step eventually succeeded.
      const retried = await maybeRetry(s, err, ctx);
      if (!retried) return { status: "failed", at: s.kind, error: String(err) };
    }
  }
  return { status: "success", docUrl: ctx.brief.docUrl };
}
Resilience tips:
- maybeRetry: exponential backoff, change provider, or reduce k.
- Idempotency for writes (message keys, request hashes).
- Circuit breakers around flaky upstreams.
Ops references: App Analytics
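One way to implement the backoff-plus-alternates tip is a small generic helper. This is a sketch, not the tutorial's maybeRetry: withRetry and RetryOptions are hypothetical names, and the sleep function is injectable so tests do not have to wait in real time.

```typescript
type RetryOptions = {
  maxAttempts: number; // tries per alternate
  baseDelayMs: number; // delay before the 2nd try; doubles on each further try
  sleep?: (ms: number) => Promise<void>;
};

// Tries each alternate in order (for example, primary provider, then fallback).
// Each alternate gets up to maxAttempts tries with exponentially growing delays.
async function withRetry<T>(
  alternates: Array<() => Promise<T>>,
  opts: RetryOptions,
): Promise<T> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let lastError: unknown = new Error("no alternates provided");
  for (const attempt of alternates) {
    for (let n = 0; n < opts.maxAttempts; n++) {
      try {
        return await attempt();
      } catch (err) {
        lastError = err;
        // Back off before the next try, but not after the final one.
        if (n < opts.maxAttempts - 1) await sleep(opts.baseDelayMs * 2 ** n);
      }
    }
  }
  throw lastError; // every alternate exhausted its attempts
}
```

For a search step you might pass two closures, one per provider, or one closure with the original k and another with a reduced k. In production you would likely add random jitter to the delay and skip retries for errors that cannot succeed (such as a blocked domain).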
5) Human-in-the-loop via an App handoff
Before sending or charging, open a confirm UI in your App:
- Collect final edits/recipients.
- Show cost/time summary and what will happen next.
- If the step involves payment, use the Agentic Commerce Protocol with in-chat checkout.
UI patterns: Inline UI & Widgets
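The data that crosses the handoff boundary can be sketched as a request the agent builds and a response the App returns. Every type and function name here (ConfirmRequest, ConfirmResponse, buildConfirmRequest, decide) is a hypothetical shape for illustration; the actual App API will define its own.

```typescript
type Brief = { docUrl: string; wordCount: number };

// What the agent hands to the confirm screen.
type ConfirmRequest = {
  docUrl: string;
  recipients: string[];
  summary: string; // cost/time so far
  nextAction: string; // what happens if the user approves
};

// What the confirm screen hands back, including any user edits.
type ConfirmResponse = { approved: boolean; recipients: string[] };

type Decision =
  | { status: "confirmed"; recipients: string[] }
  | { status: "cancelled"; reason: string };

function buildConfirmRequest(
  brief: Brief,
  recipients: string[],
  costUSD: number,
  elapsedSeconds: number,
): ConfirmRequest {
  return {
    docUrl: brief.docUrl,
    recipients,
    summary: `Cost so far: $${costUSD.toFixed(2)}, elapsed: ${Math.round(elapsedSeconds)}s`,
    nextAction: `Email a ${brief.wordCount}-word brief to ${recipients.length} recipient(s)`,
  };
}

// Turn the App's response into a decision the executor can act on.
function decide(response: ConfirmResponse): Decision {
  if (!response.approved) return { status: "cancelled", reason: "user declined" };
  const recipients = response.recipients.map((r) => r.trim()).filter((r) => r.length > 0);
  if (recipients.length === 0) return { status: "cancelled", reason: "no recipients" };
  return { status: "confirmed", recipients };
}
```

The send step should use the recipients from the decision, not the ones from the original plan, so that edits made on the confirm screen take effect.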
6) Telemetry & audits
Capture:
- Trace ID for the run; per-step timings and errors
- Token & API cost estimates (by tool)
- Policy events (denied domain, write blocked, confirmation obtained)
- Outcome (success/failed/cancelled)
Pipe these events to your data warehouse and review them weekly.
Analytics guide: Analytics for ChatGPT Apps
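A trace that captures the items above can be a plain object plus a timing wrapper. This is a minimal sketch with hypothetical names (RunTrace, newTrace, timed, costByTool); the per-step cost is whatever estimate you pass in, and the clock is injectable for testing.

```typescript
type StepRecord = { kind: string; tool: string; ms: number; costUSD: number; error?: string };
type PolicyEvent = {
  type: "denied_domain" | "write_blocked" | "confirmation_obtained";
  detail: string;
};
type Outcome = "success" | "failed" | "cancelled";
type RunTrace = {
  traceId: string;
  steps: StepRecord[];
  policyEvents: PolicyEvent[];
  outcome?: Outcome;
};

function newTrace(traceId: string): RunTrace {
  return { traceId, steps: [], policyEvents: [] };
}

// Runs one step, recording its duration and cost whether it succeeds or throws.
async function timed<T>(
  trace: RunTrace,
  kind: string,
  tool: string,
  costUSD: number,
  run: () => Promise<T>,
  now: () => number = Date.now,
): Promise<T> {
  const start = now();
  try {
    const out = await run();
    trace.steps.push({ kind, tool, ms: now() - start, costUSD });
    return out;
  } catch (err) {
    trace.steps.push({ kind, tool, ms: now() - start, costUSD, error: String(err) });
    throw err;
  }
}

// Aggregates estimated cost by tool, including failed calls.
function costByTool(trace: RunTrace): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const s of trace.steps) totals[s.tool] = (totals[s.tool] ?? 0) + s.costUSD;
  return totals;
}
```

Serializing the finished trace with JSON.stringify gives you one row per run to load into the warehouse.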
7) Evaluations (Evals) that matter
Write lightweight checks that run in CI:
- Factuality probe for extract/compile on known URLs
- Plan length ≤ N for a given task
- Guardrail checks (no disallowed domains/tools)
- Cost/time limits enforced under synthetic load
Tune prompts/tools until evals pass reliably.
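Checks like these can be ordinary functions that return a pass/fail record, which any test runner can assert on. The sketch below is illustrative: the function names and EvalResult shape are hypothetical, and the "factuality probe" is a deliberately crude phrase-coverage check against facts you already know a given URL should yield.

```typescript
type EvalResult = { name: string; pass: boolean; detail: string };

// Plan length ≤ N for a given task.
function evalPlanLength(stepKinds: string[], maxSteps: number): EvalResult {
  const pass = stepKinds.length <= maxSteps;
  return { name: "plan_length", pass, detail: `${stepKinds.length} steps (limit ${maxSteps})` };
}

// Guardrail check: every visited URL must match the allowlist by hostname suffix.
function evalAllowedDomains(visitedUrls: string[], allowed: string[]): EvalResult {
  const bad = visitedUrls.filter((u) => {
    try {
      const host = new URL(u).hostname.toLowerCase();
      return !allowed.some((d) => host === d || host.endsWith("." + d));
    } catch {
      return true; // unparseable URLs count as violations
    }
  });
  return { name: "allowed_domains", pass: bad.length === 0, detail: `violations: ${bad.length}` };
}

// Crude factuality probe: share of expected phrases found in the extracted facts.
function evalFactCoverage(facts: string[], expected: string[], minCoverage: number): EvalResult {
  const haystack = facts.join(" ").toLowerCase();
  const found = expected.filter((p) => haystack.includes(p.toLowerCase())).length;
  const coverage = expected.length === 0 ? 1 : found / expected.length;
  return {
    name: "fact_coverage",
    pass: coverage >= minCoverage,
    detail: `${found}/${expected.length} expected phrases found`,
  };
}

// CI gate: true only if every check passed.
function allPassed(results: EvalResult[]): boolean {
  return results.every((r) => r.pass);
}
```

Phrase matching will miss paraphrases, so treat a failure as a prompt to inspect the output rather than as proof of a hallucination.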
8) Shipping checklist (Agent edition)
- ✅ One hero job with a clear success criterion
- ✅ Minimal, well-typed toolset
- ✅ Budgets (steps/time/cost) + allowlists/denylists
- ✅ Confirm screen before writes/payments
- ✅ Telemetry + audits reviewed weekly; evals passing in CI
- ✅ Clear user messaging on what the agent will/won’t do
9) Extending your agent
- Add alternates (multiple search/data providers).
- Cache intermediate artifacts to cut cost/time.
- Introduce a review step that routes to a human on low confidence.
- Split the system into sub-agents by skill; orchestrate with a top-level planner.
- Expose safe portions as an App for user-driven flows.
Deep dives: Agent Orchestration & Multi-Agent Workflows
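As an illustration of the caching idea, a tool can be wrapped so that repeated calls with the same input reuse the first result. This is a minimal in-memory sketch; cached is a hypothetical helper, the JSON-based key assumes inputs are plain serializable objects with a stable property order, and there is no expiry or size limit.

```typescript
type ToolCall<I, O> = (input: I) => Promise<O>;

// Wraps a tool so identical inputs share one in-flight or completed call.
function cached<I, O>(
  tool: ToolCall<I, O>,
  keyOf: (input: I) => string = (input) => JSON.stringify(input),
): ToolCall<I, O> {
  const store = new Map<string, Promise<O>>();
  return (input: I) => {
    const key = keyOf(input);
    const hit = store.get(key);
    if (hit) return hit;
    // Cache the promise itself so concurrent callers do not duplicate work,
    // and evict on failure so errors are not served from the cache.
    const pending = tool(input).catch((err) => {
      store.delete(key);
      throw err;
    });
    store.set(key, pending);
    return pending;
  };
}
```

Only wrap read-style tools such as search or extraction this way; write tools like sendEmail should rely on idempotency keys instead, since a cached "success" would silently skip the send.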
FAQ
Do I need an App if I’m using an Agent?
You’ll want one for structured inputs, previews, and confirmations—especially for risky writes or payments.
How do I keep costs predictable?
Set strict budgets and step caps, cache intermediate results, and exit early when a check fails. Track cost per successful task as your north-star metric.
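Cost per successful task can be computed from run summaries as total spend divided by the number of successful runs. The sketch below assumes a hypothetical RunSummary shape and deliberately counts the cost of failed and cancelled runs in the numerator, since that spend was still incurred to get the successes.

```typescript
type RunSummary = { outcome: "success" | "failed" | "cancelled"; costUSD: number };

// Total spend across all runs divided by the number of successful runs.
// Returns null when there are no successes, since the ratio is undefined.
function costPerSuccessfulTask(runs: RunSummary[]): number | null {
  const totalCost = runs.reduce((sum, r) => sum + r.costUSD, 0);
  const successes = runs.filter((r) => r.outcome === "success").length;
  return successes === 0 ? null : totalCost / successes;
}

const example = costPerSuccessfulTask([
  { outcome: "success", costUSD: 0.25 },
  { outcome: "failed", costUSD: 0.25 },
  { outcome: "success", costUSD: 0.5 },
]);
// example is 0.5: $1.00 total spend across 2 successful runs
```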
Can Agents reuse my MCP tools?
Yes—expose your actions as tools once and call them from both the Agent and the App.
