AgentKit Tutorial: Build an AI Agent in ChatGPT

Goal: ship a production-ready ChatGPT Agent that plans multi-step tasks, calls tools, handles errors, and hands off to an App for human confirmation or in-chat checkout when needed.

If you’re deciding between surfaces, read Apps vs Agents and What Is OpenAI AgentKit? first. For UI surfaces and forms, you’ll likely pair this with Inline UI & Widgets.

What we’ll build

A Research → Decide → Act agent that:

Gathers inputs (topic + constraints)
Plans steps (search → extract → compile → send)
Calls tools (web search, file write, email/send)
Recovers from errors (retries/backoff/alternates)
Logs traces & costs
Opens an App screen for confirmation before sending

Prerequisites

Working knowledge of JS/TS or Python (examples shown in pseudo-TS)
Tool endpoints you can call (HTTP APIs or your MCP server)
Familiarity with scopes/consent and privacy: Security • Data Privacy

1) Define your tools (capabilities)

Keep the toolkit small: include only the tools your hero job (the one task the agent must do well) actually needs.

// A tool is an async function from a typed input to a typed output.
type ToolCall<I, O> = (input: I) => Promise<O>;

// Contracts only: `declare` leaves the implementation to your HTTP client or MCP server.
declare const searchWeb: ToolCall<{q: string; k?: number}, {results: {title: string; url: string}[]}>;
declare const extractFacts: ToolCall<{url: string}, {facts: string[]}>;
declare const compileBrief: ToolCall<{topic: string; facts: string[]}, {docUrl: string; wordCount: number}>;
declare const sendEmail: ToolCall<{to: string; subject: string; body: string}, {messageId: string}>;

If you already expose tools via MCP, reuse those endpoints here.
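
Whatever sits behind a tool, checking inputs at the boundary keeps bad arguments away from a paid or rate-limited API. The sketch below shows one way to do that. `withInputCheck` and `checkSearchInput` are hypothetical helpers, not part of AgentKit or MCP, and the cap of 20 results is an arbitrary assumption.

```typescript
type ToolCall<I, O> = (input: I) => Promise<O>;

// Returns a human-readable problem, or null when the input looks usable.
type InputCheck<I> = (input: I) => string | null;

// Hypothetical wrapper: rejects bad input before the underlying tool is called.
function withInputCheck<I, O>(
  name: string,
  check: InputCheck<I>,
  tool: ToolCall<I, O>
): ToolCall<I, O> {
  return async (input: I): Promise<O> => {
    const problem = check(input);
    if (problem !== null) throw new Error(`${name}: ${problem}`);
    return tool(input);
  };
}

type SearchInput = { q: string; k?: number };

// Assumed limits for illustration; tune them to your search provider.
function checkSearchInput(input: SearchInput): string | null {
  if (input.q.trim().length === 0) return "query is empty";
  if (
    input.k !== undefined &&
    (!Number.isInteger(input.k) || input.k < 1 || input.k > 20)
  ) {
    return "k must be an integer between 1 and 20";
  }
  return null;
}
```

Wrapping is then a single call, for example `withInputCheck("searchWeb", checkSearchInput, searchWeb)`, and the wrapped tool keeps the same `ToolCall` type as the original.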

Server side patterns: MCP Server Tutorial
Tool contracts & UI: Model Context Protocol

2) Write a planner (lightweight but explicit)

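The planner converts a goal into an ordered list of typed steps. Give each step a clear exit criterion (for example, search is done once it returns at least one result from an allowed domain).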

type Step =
  | {kind:"search"; query:string}
  | {kind:"extract"; url:string}
  | {kind:"compile"; topic:string}
  | {kind:"handoff_confirm"; docUrl:string}
  | {kind:"send"; to:string; subject:string; body:string};

function plan({topic, recipient}:{topic:string, recipient:string}): Step[] {
  return [
    {kind:"search",  query:`${topic} latest developments site:.gov OR site:.edu`},
    {kind:"extract", url:"<from-search>"},
    {kind:"compile", topic},
    {kind:"handoff_confirm", docUrl:"<from-compile>"},
    {kind:"send", to: recipient, subject:`Brief: ${topic}`, body:"<from-compile>"}
  ];
}

Keep steps deterministic and small.
Add heuristics (if search sparse → widen query).
For complex goals, generate plans with an LLM but validate them.
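
Validating a plan, whether hand-written or LLM-generated, can be as simple as a function that returns a list of problems. The sketch below is a minimal example under assumptions: `validatePlan` is a hypothetical helper, and the ordering rules it enforces (search before extract, extract before compile, a confirmation before every send) are inferred from the plan shown in this tutorial rather than required by any framework.

```typescript
type Step =
  | { kind: "search"; query: string }
  | { kind: "extract"; url: string }
  | { kind: "compile"; topic: string }
  | { kind: "handoff_confirm"; docUrl: string }
  | { kind: "send"; to: string; subject: string; body: string };

// Returns a list of problems; an empty list means the plan passed these checks.
function validatePlan(steps: Step[], maxSteps: number): string[] {
  const problems: string[] = [];
  if (steps.length === 0) problems.push("plan is empty");
  if (steps.length > maxSteps) {
    problems.push(`plan has ${steps.length} steps, limit is ${maxSteps}`);
  }

  let searched = false;
  let extracted = false;
  let confirmed = false;
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i];
    if (step.kind === "search") searched = true;
    if (step.kind === "extract") {
      if (!searched) problems.push(`step ${i}: extract without a preceding search`);
      extracted = true;
    }
    if (step.kind === "compile" && !extracted) {
      problems.push(`step ${i}: compile without a preceding extract`);
    }
    if (step.kind === "handoff_confirm") confirmed = true;
    if (step.kind === "send") {
      if (!confirmed) problems.push(`step ${i}: send without a preceding handoff_confirm`);
      confirmed = false; // each write needs its own confirmation
    }
  }
  return problems;
}
```

Returning every problem at once, instead of throwing on the first, makes it easier to feed the full list back to an LLM planner for a corrected plan.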

3) Guardrails & budgets (must-have)

Define explicit limits before you run the plan.

const policy = {
  allowedDomains: ["gov","edu","reputable-news.com"],
  maxSteps: 12,
  maxCostUSD: 0.50,
  maxSeconds: 120,
  denyWritesWithoutConfirm: true
};

Block disallowed domains/tools.
Cap steps/time/cost; short-circuit if exceeded.
Require human-in-the-loop for writes (email, orders, payments).
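
The domain and budget checks above can be sketched as two small pure functions. This is an illustration under assumptions: `isDomainAllowed`, `budgetViolation`, and the `RunState` shape are hypothetical names, and entries in `allowedDomains` are treated as suffixes, so "gov" matches any host ending in ".gov".

```typescript
interface Policy {
  allowedDomains: string[];
  maxSteps: number;
  maxCostUSD: number;
  maxSeconds: number;
}

// Hypothetical running totals the executor would keep per run.
interface RunState {
  startedAtMs: number;
  stepsRun: number;
  costUSD: number;
}

// True when the URL's host equals an allowed entry or is a subdomain of it.
function isDomainAllowed(url: string, allowedDomains: string[]): boolean {
  let host: string;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // unparseable URLs are treated as disallowed
  }
  return allowedDomains.some((d) => host === d || host.endsWith("." + d));
}

// Returns the first exceeded cap, or null when the run may continue.
function budgetViolation(
  state: RunState,
  policy: Policy,
  nowMs: number
): string | null {
  if (state.stepsRun >= policy.maxSteps) return "step cap reached";
  if (state.costUSD >= policy.maxCostUSD) return "cost cap reached";
  if ((nowMs - state.startedAtMs) / 1000 >= policy.maxSeconds) return "time cap reached";
  return null;
}
```

Matching on a leading dot matters: a plain suffix test would let "evil-reputable-news.com" pass as "reputable-news.com". Passing `nowMs` in, instead of reading the clock inside the function, keeps the check easy to test.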

Governance guides: Security for ChatGPT Apps • Compliance & PII

4) The executor (routing + retries)

async function runPlan(steps: Step[], ctx: Ctx) {
  for (let i=0; i<steps.length; i++) {
    enforceBudgets(ctx);  // time/cost/step caps

    const s = steps[i];
    try {
      if (s.kind === "search") {
        ctx.search = await searchWeb({ q: s.query, k: 5 });
      }
      if (s.kind === "extract") {
        const top = ctx.search?.results[0]?.url;
        if (!top) throw new Error("No search result to extract from");
        assertDomainAllowed(top, policy);
        ctx.facts = (await extractFacts({ url: top })).facts;  // unwrap to string[]
      }
      if (s.kind === "compile") {
        ctx.brief = await compileBrief({ topic: s.topic, facts: ctx.facts });
      }
      if (s.kind === "handoff_confirm") {
        const ok = await openAppConfirmUI(ctx.brief.docUrl); // App confirm
        if (!ok) return {status:"cancelled"};
        ctx.confirmed = true;  // remember the approval for later writes
      }
      if (s.kind === "send") {
        // Writes need a prior confirmation whenever the policy demands one.
        assert(!policy.denyWritesWithoutConfirm || ctx.confirmed === true, "Needs confirm");
        await sendEmail({ to:s.to, subject:s.subject, body: makeEmailBody(ctx.brief) });
      }
      recordStepSuccess(s, ctx);
    } catch (err) {
      // Assumption: maybeRetry re-runs the step itself (backoff + alt) and reports whether it succeeded.
      const retried = await maybeRetry(s, err, ctx);
      if (!retried) return {status:"failed", at: s.kind, error: String(err)};
    }
  }
  return {status:"success", docUrl: ctx.brief?.docUrl};
}

Resilience tips:

maybeRetry: exponential backoff, change provider, or reduce k.
Idempotency for writes (message keys, request hashes).
Circuit breakers around flaky upstreams.
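
A retry helper with exponential backoff could look like the sketch below. It is not the tutorial's `maybeRetry`; `withRetry`, `backoffDelayMs`, and the 200 ms base and 5 s cap are assumptions chosen for illustration. The `sleep` function is injectable so tests do not have to wait in real time.

```typescript
// Delay before the next attempt: base * 2^attempt, capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 5000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

interface RetryOptions {
  maxAttempts: number;
  // Optional: return false for errors that should not be retried (e.g. bad input).
  isRetryable?: (err: unknown) => boolean;
  // Optional: replace the real timer, mainly for tests.
  sleep?: (ms: number) => Promise<void>;
}

// Runs op until it succeeds, the error is not retryable, or attempts run out.
async function withRetry<T>(op: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let lastErr: unknown = new Error("withRetry: maxAttempts must be at least 1");
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastErr = err;
      if (opts.isRetryable && !opts.isRetryable(err)) break;
      const isLastAttempt = attempt === opts.maxAttempts - 1;
      if (!isLastAttempt) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastErr;
}
```

Only wrap writes such as `sendEmail` in a retry when the call carries an idempotency key; otherwise a timeout followed by a retry can send the same message twice.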

Ops references: App Analytics

5) Human-in-the-loop via an App handoff

Before sending or charging, open a confirm UI in your App:

Collect final edits/recipients.
Show cost/time summary and what will happen next.
If the action involves payment, use the Agentic Commerce Protocol with in-chat checkout.
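
On the agent side, the handoff reduces to two pieces of data: what you show the user and what comes back. The sketch below assumes a confirm screen that returns an approval flag plus optional edits. `ConfirmResponse`, `buildConfirmSummary`, and `applyConfirmation` are hypothetical names, not part of any App or AgentKit API.

```typescript
interface EmailDraft {
  to: string;
  subject: string;
  body: string;
}

// Assumed shape of what the App's confirm screen returns.
interface ConfirmResponse {
  approved: boolean;
  edits?: Partial<EmailDraft>; // fields the user changed before approving
}

// Lines to show on the confirm screen: the action, its cost so far, and what happens next.
function buildConfirmSummary(
  draft: EmailDraft,
  costUSD: number,
  elapsedSeconds: number
): string[] {
  return [
    `Send email to ${draft.to}`,
    `Subject: ${draft.subject}`,
    `Cost so far: $${costUSD.toFixed(2)}, time so far: ${Math.round(elapsedSeconds)}s`,
    "Nothing is sent until you confirm.",
  ];
}

// Returns the final draft to send, or null when the user declined.
function applyConfirmation(
  draft: EmailDraft,
  response: ConfirmResponse
): EmailDraft | null {
  if (!response.approved) return null;
  return { ...draft, ...(response.edits ?? {}) };
}
```

Returning `null` on decline forces the caller to handle cancellation explicitly, so a write cannot happen by falling through.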

UI patterns: Inline UI & Widgets

6) Telemetry & audits

Capture:

Trace ID for the run; per-step timings and errors
Token & API cost estimates (by tool)
Policy events (denied domain, write blocked, confirmation obtained)
Outcome (success/failed/cancelled)
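
A trace record covering the fields above might look like the sketch below, together with two aggregations that are handy in a weekly review. The `RunTrace` shape and both function names are assumptions for illustration; adapt them to whatever your warehouse schema expects.

```typescript
type Outcome = "success" | "failed" | "cancelled";

interface StepRecord {
  tool: string; // e.g. "searchWeb"
  ms: number; // wall-clock duration of the step
  costUSD: number; // estimated token + API cost
  error?: string;
}

interface RunTrace {
  traceId: string;
  steps: StepRecord[];
  policyEvents: string[]; // e.g. "denied domain", "confirmation obtained"
  outcome: Outcome;
}

// Total estimated cost per tool within one run.
function costByTool(trace: RunTrace): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const step of trace.steps) {
    totals[step.tool] = (totals[step.tool] || 0) + step.costUSD;
  }
  return totals;
}

// Total spend across all runs divided by the number of successful runs.
// Returns null when nothing succeeded, since the ratio is undefined.
function costPerSuccessfulTask(traces: RunTrace[]): number | null {
  let total = 0;
  let successes = 0;
  for (const trace of traces) {
    for (const step of trace.steps) total += step.costUSD;
    if (trace.outcome === "success") successes++;
  }
  return successes === 0 ? null : total / successes;
}
```

Counting the cost of failed and cancelled runs in the numerator is deliberate: it makes wasted spend visible in the metric.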

Pipe to your warehouse and review weekly.

Analytics guide: Analytics for ChatGPT Apps

7) Evaluations (Evals) that matter

Write lightweight checks that run on CI:

Factuality probe for extract/compile on known URLs
Plan length ≤ N for a given task
Guardrail checks (no disallowed domains/tools)
Cost/time limits enforced under synthetic load
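
Two of these checks can be sketched as plain functions that a CI job calls and fails on. This is a rough illustration: `evalPlanLength` and `evalFactRecall` are hypothetical names, and the factuality probe here is only a case-insensitive substring match against expected facts for a known URL. A real probe would likely need a stronger comparison.

```typescript
interface EvalResult {
  name: string;
  pass: boolean;
  detail: string;
}

// Plan length check: passes when the plan stays within maxSteps.
function evalPlanLength(stepCount: number, maxSteps: number): EvalResult {
  return {
    name: "plan-length",
    pass: stepCount <= maxSteps,
    detail: `${stepCount} steps, limit ${maxSteps}`,
  };
}

// Crude factuality probe: share of expected facts that appear
// (case-insensitive substring) somewhere in the extracted facts.
function evalFactRecall(
  expected: string[],
  extracted: string[],
  minRecall: number
): EvalResult {
  const haystack = extracted.join("\n").toLowerCase();
  const found = expected.filter((fact) => haystack.indexOf(fact.toLowerCase()) !== -1);
  const recall = expected.length === 0 ? 1 : found.length / expected.length;
  return {
    name: "fact-recall",
    pass: recall >= minRecall,
    detail: `${found.length}/${expected.length} expected facts found`,
  };
}

// CI gate: true only when every check passed.
function allPassed(results: EvalResult[]): boolean {
  return results.every((r) => r.pass);
}
```

Keeping each check as a function that returns an `EvalResult` means the CI job can print every failing `detail` before exiting, instead of stopping at the first failure.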

Tune prompts/tools until evals pass reliably.

8) Shipping checklist (Agent edition)

✅ One hero job with a clear success criterion
✅ Minimal, well-typed toolset
✅ Budgets (steps/time/cost) + allowlists/denylists
✅ Confirm screen before writes/payments
✅ Telemetry + audits reviewed weekly, evals running on CI
✅ Clear user messaging on what the agent will/won’t do

9) Extending your agent

Add alternates (multiple search/data providers).
Cache intermediate artifacts to cut cost/time.
Introduce a review step that routes to a human on low confidence.
Split the system into sub-agents by skill; orchestrate with a top-level planner.
Expose safe portions as an App for user-driven flows.
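
Caching intermediate artifacts can start as an in-memory wrapper around read-only tools. The sketch below is one minimal approach: `cached` is a hypothetical helper, the cache lives only for the life of the process, and the default key (`JSON.stringify` of the input) is sensitive to property order. Do not wrap tools with side effects, such as `sendEmail`, this way.

```typescript
type ToolCall<I, O> = (input: I) => Promise<O>;

// Wraps a read-only tool so that repeated calls with the same input
// share one underlying request (and one cost).
function cached<I, O>(
  tool: ToolCall<I, O>,
  keyOf: (input: I) => string = (input) => JSON.stringify(input)
): ToolCall<I, O> {
  const store = new Map<string, Promise<O>>();
  return (input: I): Promise<O> => {
    const key = keyOf(input);
    const hit = store.get(key);
    if (hit !== undefined) return hit;
    // Store the promise itself so concurrent callers share the in-flight request.
    // Failed calls are evicted so a later call can try again.
    const pending = tool(input).catch((err: unknown) => {
      store.delete(key);
      throw err;
    });
    store.set(key, pending);
    return pending;
  };
}
```

Storing the promise, not the resolved value, means two steps that request the same URL at the same time trigger only one upstream call.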

Deep dives: Agent Orchestration & Multi-Agent Workflows

FAQ

Do I need an App if I’m using an Agent?
You’ll want one for structured inputs, previews, and confirmations—especially for risky writes or payments.

How do I keep costs predictable?
Strict budgets, step caps, caching, and early-exit evals. Track cost per successful task as a north star.

Can Agents reuse my MCP tools?
Yes—expose your actions as tools once and call them from both the Agent and the App.
