@standardagents/skill 0.14.0 → 0.14.1

@@ -12,970 +12,577 @@ description: >
12
12
 
13
13
  # AgentBuilder Skill
14
14
 
15
- The effective agents guide is a living record that is not part of the specification, but intended to help humans and agents use the specification to craft effective agents. It is updated regularly and not bound to any specific version of the specification.
15
+ This guide is a living record. It is not part of the Standard Agent Specification — it exists to help humans and coding agents *use* the spec to build effective agents. It is updated regularly and not bound to any spec version.
16
16
 
17
- Note: in this guide we occasionally discuss commands like `npmx` or `pnpm`. Use whatever package manager the user prefers; these are just placeholders for the correct command.
17
+ > Commands like `pnpm exec agents …` are placeholders. Use whichever package manager (`npm`, `pnpm`, `yarn`, `bun`) the project actually uses.
18
18
 
19
- ## What is a Standard Agent?
19
+ For deeper reference material that this skill links to, see:
20
20
 
21
- In the Standard Agent paradigm agents are the atomic unit of an AI system and it is the composition of many domain-specific agents that produces efficacy. How large should the concerns of a given agent be? There is no hard or fast rule here, the question is a bit like how big should a React component be. However, in general Standard Agents can be effective using smaller and cheaper models — but smaller and cheaper models typically suffer from poor tool discernment when presented with a broad range of tools, especially a broad range of similar tools.
21
+ - `agents/agents/AGENTS.md`: agent definition reference (created by `agents scaffold`)
22
+ - `agents/prompts/AGENTS.md` — prompt and tool config reference
23
+ - `agents/tools/AGENTS.md` — tool-writing patterns and `ThreadState` examples
24
+ - `agents/models/AGENTS.md` — recommended model list (authoritative)
25
+ - `agents/hooks/AGENTS.md` — hook reference
26
+ - Canonical TypeScript types (full signatures): read `node_modules/@standardagents/spec/dist/` in your project, or browse `packages/spec/src/` on GitHub at https://github.com/standardagents/agentbuilder
27
+ - Specification: https://standardagentspec.org/llms.txt
28
+ - Builder docs: https://docs.standardagentbuilder.com/llms.txt
22
29
 
23
- A "gmail" agent, for example, will typically do better than a "google apps" agent. Those higher-level agents would instead be composed of many sub-service agents, and a larger coordinator agent would be the one with the primary objectives, business goals and logic, and ultimate authority. This is a fractal that can be scaled up to create even larger coordinators and larger agents. These agent-graphs, or agent-trees, are the core philosophy of Standard Agents and solve many industry problems such as progressive tool discovery, model-prompt tuning, context dilution, task resumability, and even compaction.
30
+ > If the `agents/*/AGENTS.md` files don't exist in your project yet, run `pnpm exec agents scaffold` to create them.
24
31
 
25
- However, just because we can use subagents to create complex interactions doesn't mean we have to. A subagent is always a two-sided conversation where each side can perform multiple steps per turn. Sometimes you just need to use a feature of a different model, without it being a full subagent, for example using Nano Banana to generate an image. If you just need to have a tool call that generates an image, use a subprompt. You may want an asset generator subagent if, for example, the agent needs to perform QA on the generated asset to ensure it meets certain criteria.
32
+ ---
26
33
 
27
- Another example may be using an instruct model variant to run an agent, but a more eloquent model to write the actual text of the email that is sent to people. Again, that could be a subprompt. Having another model proofread the email that is written — that would be a subagent. One small note on that: when performing QA in a subthread, try to use different models from different providers on the agent side that is doing the proofreading. Models from the same labs tend to think their own output is quite good, even if it is not.
34
+ ## Before you write any code
28
35
 
29
- Example structure:
36
+ Most failed Standard Agent projects fail in the same six ways. Read this section first and answer all six questions explicitly before creating any code. Each links to the deeper section that explains *how*.
30
37
 
31
- - `personal_assistant_coordinator`
32
- - central dual-ai coordinator
33
- - decides which domain owns the work
34
- - aggregates state, plans, and responses
35
- - has subprompts for producing images, or using a more eloquent model for writing
36
- - `communications_coordinator`
37
- - owns all inbound and outbound communications work
38
- - delegates to `gmail_agent`, `slack_agent`, and `text_message_agent`
39
- - receives escalations from those channel agents and routes the next action
40
- - `research_assistant`
41
- - owns information gathering and synthesis
42
- - delegates to `browser_use_agent`, `google_drive_agent`, and `notes_agent`
43
- - optional additional branches
44
- - `scheduling_coordinator`
45
- - `travel_coordinator`
46
- - `finance_ops_coordinator`
47
- - `crm_coordinator`
38
+ **1. Architect the agent graph before picking types.**
39
+ List the domains the system touches. Determine the tree (coordinator → domain agents → sub-domain agents). *Then* decide what each node is. Do not start by writing one agent and bolting tools onto it. → [Architecture & decomposition](#architecture--decomposition)
48
40
 
49
- ## Standard Agent Stack
41
+ **2. `dual_ai` is the default. `ai_human` is the exception.**
42
+ `ai_human` is correct only when the thread itself is the chat surface (chat UI, website widget, AgentBuilder admin, direct API chat). For Slack, email, SMS, Discord, webhooks, polled inboxes, schedulers, or any other mediated channel, the human is just a tool target — the agent is `dual_ai`. The whole graph being `dual_ai` is fine and often cleaner. → [Interaction type](#interaction-type-dual_ai-is-the-default)
50
43
 
51
- The standard agent spec defines exactly what an agent is made up of:
44
+ **3. Every `dual_ai` agent must have explicit session boundaries.**
45
+ Name its `sessionStop` tool. Name its `sessionFail` tool. Set a finite `maxSessionTurns`. Give side_b a real, non-redundant job. If you cannot say in one sentence what tool call ends the session, the agent is not designed yet. → [Session boundary discipline](#session-boundary-discipline)
52
46
 
53
- - Providers (define how to interact with different LLM providers, and different model variants.)
54
- - Prompts (system instructions, and how to use tools, subagents, and subprompts)
55
- - Tools (define the functions that an agent can call to interact with the world, and other agents)
56
- - Agents (define the agent itself, how it uses prompts and tools)
57
- - Hooks (define custom code that can perform various internal operations such as modifying the prompt or changing the message history)
58
- - Effects (custom code that executes at a certain time)
59
- - Threads (the instance of a given agent, with its own state, message history, and filesystem)
60
- - Endpoints (built-in and custom endpoints exposed by an agent thread)
47
+ **4. Pick models from `current-models` only.**
48
+ Never invent model strings from memory. Run `pnpm exec agents current-models`, choose by category, then run `pnpm exec agents available-models --provider=<name>` to confirm the exact string. → [Model selection](#model-selection)
61
49
 
62
- Each of these is defined by the Standard Agent Specification.
50
+ **5. Research every third-party API before writing the tool.**
51
+ Fetch the official current docs. Confirm auth, base URL, endpoint paths, payload shapes, rate limits, error codes. Do not write a tool from memory of how an API "usually" works. APIs change. → [API research checklist](#api-research-checklist)
63
52
 
64
- ## Models
53
+ **6. Check `ThreadState` before adding any dependency.**
54
+ Before reaching for S3, Redis, an external cron, a queue service, or even `node:fs`, confirm the framework does not already provide the capability via `ThreadState`. It almost always does. → [ThreadState first](#threadstate-first)
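Question 3 above lends itself to a mechanical check during design review. The following is a minimal sketch, assuming a flattened planning record; `DualAiDesign`, `sessionBoundaryProblems`, and the `report_to_parent` tool name are hypothetical and are not the spec's real agent definition type. Only the field names `sessionStop`, `sessionFail`, and `maxSessionTurns` come from this guide.

```ts
// Hypothetical planning record, flattened for illustration. The real agent
// definition type lives in the spec package and may be shaped differently.
interface DualAiDesign {
  name: string;
  sessionStop?: string; // tool call that ends the session successfully
  sessionFail?: string; // tool call that ends the session as a failure
  maxSessionTurns?: number; // hard cap on turns
}

// Returns one message per missing or invalid session boundary.
function sessionBoundaryProblems(agent: DualAiDesign): string[] {
  const problems: string[] = [];
  if (!agent.sessionStop) problems.push(`${agent.name}: no sessionStop tool named`);
  if (!agent.sessionFail) problems.push(`${agent.name}: no sessionFail tool named`);
  const turns = agent.maxSessionTurns;
  if (turns === undefined || !isFinite(turns) || turns <= 0) {
    problems.push(`${agent.name}: maxSessionTurns must be finite and positive`);
  }
  return problems;
}

// `report_to_parent` is a made-up tool name used only for this example.
const draft: DualAiDesign = { name: "gmail_agent", sessionStop: "report_to_parent" };
const draftProblems = sessionBoundaryProblems(draft);
```

An empty result would suggest the boundaries are at least named; it says nothing about whether side_b has a real job, which still needs human judgment.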
65
55
 
66
- The standard agent spec is model-agnostic, but it is important to understand how to use different models effectively within the paradigm. The most effective agents use a combination of smaller and larger models. The smallest model that produces sufficiently good results is always the best choice. Smaller models are cheaper, faster, and more likely to run on local hardware (eventually). However smaller models often struggle with discernment, coordination, and higher order reasoning like business objectives. So typically we'll want to use larger models for coordinators, and smaller models for discrete subagents.
56
+ ---
67
57
 
68
- Different models are good at different things. The Standard Agents organization has identified a number of well-suited models for various tasks. To get the latest curated list run:
58
+ ## What is a Standard Agent?
69
59
 
70
- ```bash
71
- pnpm exec agents current-models
72
- ```
60
+ In the Standard Agent paradigm, agents are the atomic unit of an AI system, and it is the *composition* of many domain-specific agents that produces efficacy. Standard Agents can be effective using small and cheap models — but small and cheap models suffer from poor tool discernment when presented with a broad, undifferentiated set of tools. Decomposition is what makes them work.
73
61
 
74
- When writing an agent, use `current-models` to choose a model that fits the use case, then use `available-models` to check the exact model string your installed provider exposes. If you have more than one configured provider, pass `--provider=<name>`.
62
+ A "gmail" agent will outperform a "google apps" agent. A higher-level "communications" coordinator composes the gmail, slack, and SMS agents. A still-higher "personal assistant" coordinator composes communications, research, scheduling, and finance. This is fractal: the same shape scales up to teams, departments, and entire products. These agent-graphs solve real industry problems: progressive tool discovery, model/prompt tuning, context dilution, task resumability, and compaction.
63
+
64
+ Just because subagents *can* compose complex behavior doesn't mean every step needs one. A subagent is a two-sided conversation where each side may take multiple steps per turn. Sometimes you only need a feature of a different model — generating an image, rewriting a paragraph in a more eloquent voice. For those, use a **subprompt** (a single LLM step exposed as a tool), not a subagent. Use a **subagent** when you need iteration, QA, reflection, or long-lived addressable behavior.
65
+
66
+ When a subagent does QA on another model's output, prefer a *different provider* on the reviewing side. Same-lab models tend to rate their own output too generously.
67
+
68
+ ### Example graph
75
69
 
76
- ```bash
77
- pnpm exec agents available-models --provider=openai
70
+ ```
71
+ personal_assistant_coordinator (dual_ai, top-level reasoning model)
72
+ ├── communications_coordinator (dual_ai, resumable, explicit parent communication, note: children are explicit because they receive inbound messages and should filter out noise before returning)
73
+ │ ├── gmail_agent (dual_ai, resumable, immediate, explicit parent communication)
74
+ │ ├── slack_agent (dual_ai, resumable, immediate, explicit parent communication)
75
+ │ └── sms_agent (dual_ai, resumable, immediate, explicit parent communication)
76
+ ├── research_assistant (dual_ai, resumable, explicit parent communication)
77
+ │ ├── browser_use_agent (dual_ai, resumable, implicit parent communication)
78
+ │ ├── google_drive_agent (dual_ai, resumable, implicit parent communication)
79
+ │ └── notes_agent (dual_ai, resumable, implicit parent communication)
80
+ ├── scheduling_coordinator (subprompt)
81
+ └── finance_ops_coordinator (dual_ai)
78
82
  ```
79
83
 
80
- Note: the `name` of a model should indicate the use case for the model, for example: `extra_reasoning`, or `fast_tool_calls` or `image_generation`. Fallback models can be created for similar use cases, and should typically use different providers to increase the chances of at least one model producing good results. For example, you may have `extra_reasoning` models from both OpenAI and Anthropic.
84
+ Note: no `ai_human` anywhere. The "human" enters via the gmail/slack/sms tool calls. The whole graph is autonomous.
81
85
 
82
- ### Prompts
86
+ ## The Standard Agent stack
83
87
 
84
- Prompts define what actually gets sent to the LLM. They encompass model instructions, tool definitions, subprompts, and subagent tools, history inclusion, model selection, and more. The typescript definition of a prompt is as follows:
88
+ The spec defines an agent as a composition of:
85
89
 
86
- ```ts
87
- export interface PromptDefinition<
88
- N extends string = string,
89
- S extends ToolArgs = ToolArgs,
90
- > {
91
- /**
92
- * Unique name for this prompt.
93
- * Used as the identifier when referencing from agents or as a tool.
94
- * Should be snake_case (e.g., 'customer_support', 'data_analyst').
95
- */
96
- name: N;
97
-
98
- /**
99
- * Description shown when this prompt is exposed as a tool.
100
- * Should clearly describe what this prompt does for LLM tool selection.
101
- */
102
- toolDescription: string;
103
-
104
- /**
105
- * The system prompt content sent to the LLM.
106
- * Can be either a plain string or a structured array for composition.
107
- */
108
- prompt: PromptContent;
109
-
110
- /**
111
- * Model to use for this prompt.
112
- * Must reference a model defined in agents/models/.
113
- */
114
- model: StandardAgentSpec.Models;
115
-
116
- /**
117
- * Include full chat history in the LLM context.
118
- * @default false
119
- */
120
- includeChat?: boolean;
121
-
122
- /**
123
- * Include results from past tool calls in the LLM context.
124
- * @default false
125
- */
126
- includePastTools?: boolean;
127
-
128
- /**
129
- * Allow parallel execution of multiple tool calls.
130
- * @default false
131
- */
132
- parallelToolCalls?: boolean;
133
-
134
- /**
135
- * Tool calling strategy for the LLM.
136
- *
137
- * - `auto`: Model decides when to call tools (default)
138
- * - `none`: Disable tool calling entirely
139
- * - `required`: Force the model to call at least one tool
140
- *
141
- * @default 'auto'
142
- */
143
- toolChoice?: 'auto' | 'none' | 'required';
144
-
145
- /**
146
- * Zod schema for validating inputs when this prompt is called as a tool.
147
- */
148
- requiredSchema?: S;
149
-
150
- /**
151
- * Declared variables for this prompt.
152
- */
153
- variables?: VariableDefinition[];
154
-
155
- /**
156
- * Tools available to this prompt.
157
- * Can be:
158
- * - string: Simple tool name (custom or provider tool)
159
- * - SubpromptConfig: Sub-prompt used as a tool
160
- * - PromptToolConfig: Tool with environment values and/or options
161
- * - SubagentToolConfig: `dual_ai` subagent invocation behavior
162
- *
163
- * To enable handoffs, include ai_human agent names in this array.
164
- *
165
- * @example
166
- * ```typescript
167
- * tools: [
168
- * 'custom_tool', // Simple tool name
169
- * { name: 'other_prompt' }, // Sub-prompt as tool
170
- * { name: 'file_search', env: { VECTOR_STORE_ID: 'vs_123' } }, // Tool with env values
171
- * ]
172
- * ```
173
- */
174
- tools?: (
175
- | StandardAgentSpec.Callables
176
- | SubpromptConfig
177
- | PromptToolConfig
178
- | SubagentToolConfig
179
- )[];
180
-
181
- /**
182
- * Environment values provided by this prompt.
183
- * Prompt values are the lowest-precedence source in runtime resolution.
184
- */
185
- env?: Record<string, string>;
186
-
187
- /**
188
- * Reasoning configuration for models that support extended thinking.
189
- */
190
- reasoning?: ReasoningConfig;
191
-
192
- /**
193
- * Number of recent messages to keep actual images for in context.
194
- * @default 10
195
- */
196
- recentImageThreshold?: number;
197
-
198
- /**
199
- * Provider-specific options passed through to the provider.
200
- * These override model-level providerOptions for this prompt.
201
- *
202
- * Options are merged in order (later wins):
203
- * 1. model.providerOptions (defaults)
204
- * 2. prompt.providerOptions (this field - overrides)
205
- *
206
- * @example
207
- * ```typescript
208
- * providerOptions: {
209
- * response_format: { type: 'json_object' },
210
- * }
211
- * ```
212
- */
213
- providerOptions?: Record<string, unknown>;
214
-
215
- /**
216
- * Hook IDs to run when this prompt is active.
217
- * References hooks by their unique `id` property from defineHook().
218
- * If not specified, falls back to agent-level hooks.
219
- *
220
- * @example
221
- * ```typescript
222
- * hooks: ['limit_to_20_messages', 'log_tool_calls']
223
- * ```
224
- */
225
- hooks?: StandardAgentSpec.HookIds[];
226
- }
90
+ - **Providers** — how to talk to LLM providers and model variants
91
+ - **Models** — named model definitions referencing a provider
92
+ - **Prompts**: system instructions plus the tools, subprompts, and subagents available at that step
93
+ - **Tools**: functions the agent can call to interact with the world (and other agents)
94
+ - **Agents** — bind it all together: name, type, sides, session bindings
95
+ - **Hooks** — custom code that intercepts lifecycle events (history filtering, message injection, tool result transforms)
96
+ - **Effects**: custom code scheduled to run later
97
+ - **Threads**: runtime instances of an agent, each with its own state, message history, and filesystem
98
+ - **Endpoints**: built-in and custom HTTP endpoints exposed by a thread
99
+
100
+ ---
101
+
102
+ ## Architecture & decomposition
103
+
104
+ **Procedure (do this in order, every time):**
105
+
106
+ 1. List the domains the system actually touches.
107
+ 2. Draw the tree on paper or in a comment block. Coordinators on top, leaf agents at the bottom.
108
+ 3. Pick interaction types. (Default `dual_ai`. Use `ai_human` only at chat-surface entry points.)
109
+ 4. Name session boundaries for every `dual_ai` node.
110
+ 5. Pick a model category for every node.
111
+ 6. List the third-party APIs each leaf needs and queue them for research.
112
+ 7. Map each capability to a `ThreadState` primitive before writing any custom plumbing.
113
+
114
+ Only then start writing files.
115
+
116
+ ### Mega-agent smell test
117
+
118
+ If a single agent has **more than ~8 tools across unrelated domains**, decompose it. Decomposition is rarely wrong; flattening usually is.
119
+
120
+ Other smells that mean "decompose now":
121
+
122
+ - One prompt has tools from two clearly different worlds (e.g., `send_email` and `query_postgres` and `generate_image`).
123
+ - The system prompt is trying to teach the model when to use which tool by writing rules in English. Rules-in-prose are coordinator logic; promote them to a coordinator that picks subagents.
124
+ - A model keeps calling the wrong tool because two tools have similar names or overlapping descriptions. Different domains, different agents.
125
+ - The agent's prompt is over ~150 lines. That's almost always context dilution — split.
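The two numeric smells above (roughly 8 tools across unrelated domains, roughly 150 prompt lines) can be checked mechanically while planning. This is a rough sketch only: `PlannedAgent`, `PlannedTool`, and the per-tool `domain` tag are hypothetical planning aids, not part of the spec, and the thresholds are the approximate ones quoted in this section.

```ts
// Hypothetical planning records: each tool is tagged by hand with its domain.
interface PlannedTool { name: string; domain: string; }
interface PlannedAgent { name: string; tools: PlannedTool[]; promptLines: number; }

const MAX_TOOLS = 8; // the "~8 tools" threshold from the smell test
const MAX_PROMPT_LINES = 150; // the "~150 lines" threshold

// Returns a human-readable message for each smell the agent exhibits.
function decompositionSmells(agent: PlannedAgent): string[] {
  const smells: string[] = [];
  // Distinct domains: keep only the first occurrence of each domain name.
  const domains = agent.tools
    .map((tool) => tool.domain)
    .filter((domain, index, all) => all.indexOf(domain) === index);
  if (agent.tools.length > MAX_TOOLS && domains.length > 1) {
    smells.push(`${agent.name}: ${agent.tools.length} tools across ${domains.length} domains`);
  }
  if (agent.promptLines > MAX_PROMPT_LINES) {
    smells.push(`${agent.name}: prompt is ${agent.promptLines} lines`);
  }
  return smells;
}

const mega: PlannedAgent = {
  name: "personal_assistant",
  promptLines: 60,
  tools: [
    { name: "send_gmail", domain: "gmail" },
    { name: "read_gmail", domain: "gmail" },
    { name: "search_ebay_listings", domain: "ebay" },
    { name: "place_ebay_order", domain: "ebay" },
    { name: "track_ebay_shipment", domain: "ebay" },
    { name: "create_calendar_event", domain: "calendar" },
    { name: "list_calendar_events", domain: "calendar" },
    { name: "post_slack_message", domain: "slack" },
    { name: "read_slack_channel", domain: "slack" },
  ],
};
const megaSmells = decompositionSmells(mega);
```

The other smells in the list (rules-in-prose, overlapping tool descriptions) are qualitative and are deliberately left out of this sketch.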
126
+
127
+ ### Worked example: wrong vs. right
128
+
129
+ **Wrong** — one mega-agent with tools from unrelated domains:
130
+
131
+ ```
132
+ personal_assistant (ai_human)
133
+ tools: send_gmail, read_gmail, search_ebay_listings, place_ebay_order,
134
+ track_ebay_shipment, create_calendar_event, list_calendar_events,
135
+ post_slack_message, read_slack_channel, query_stock_price,
136
+ execute_stock_trade, search_recipes, generate_image
227
137
  ```
228
138
 
229
- The `prompt` property is the system message for that LLM step. It is defined as a string or array of parts. If defined with parts, the prompt is rendered before being sent to the LLM and can contain dynamic content. For example, let's say our agent is powering a chatbot for a website that sells athletic shoes. We want the chatbot to be able to answer questions about the shoes, billing, shipping etc. To do this, we have many prompts, but we always want to use the same friendly and helpful tone.
139
+ Gmail, eBay, calendar, Slack, stock trading, recipes, image gen: seven unrelated worlds in one tool list. The model picks `send_gmail` when it meant `post_slack_message`. It calls `execute_stock_trade` with eBay listing IDs. The system prompt grows 80 lines of rules trying to teach it which tool belongs to which world. Every bug fix breaks two unrelated flows.
230
140
 
231
- To do this, we can create a shared "tone" prompt that just defines the tone and style of the chatbot, and then reference that prompt in all our other prompts as a part. This allows us to change the tone of the chatbot across all its functions by just changing one prompt.
141
+ **Right** (coordinator + domain subagents):
232
142
 
233
- ```ts
234
- const tonePrompt: PromptDefinition = {
235
- name: 'company_tone',
236
- toolDescription: 'Defines the tone and style of the chatbot.',
237
- prompt: `You are a friendly and helpful customer support assistant for an athletic shoe company. You always respond in a positive and upbeat tone, even when the customer is upset. You use simple language and avoid technical jargon. Your goal is to help the customer with their issue and make them feel good about shopping with us.`,
238
- model: 'tiny_model',
239
- };
143
+ ```
144
+ personal_assistant_coordinator (dual_ai, reasoning model)
145
+ ├── gmail_agent tools: send_gmail, read_gmail
146
+ ├── ebay_agent tools: search_ebay_listings, place_ebay_order,
147
+ │ track_ebay_shipment
148
+ ├── calendar_agent tools: create_calendar_event, list_calendar_events
149
+ ├── slack_agent tools: post_slack_message, read_slack_channel
150
+ ├── trading_agent tools: query_stock_price, execute_stock_trade
151
+ └── content_helpers subprompts: search_recipes, generate_image
240
152
  ```
241
153
 
242
- Then in another prompt we can reference that tone prompt:
154
+ `personal_assistant_coordinator` runs a reasoning model and decides which domain owns the next step. Each leaf is a fast tool-calling model with a tight tool set it can actually discern between. A fix to the eBay flow can't break Gmail.
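The isolation claim rests on every tool having exactly one owner in the graph. As an illustrative sketch, the decomposed graph can be written down as plain data and checked for tools listed under more than one node. The `assistantGraph` record and `sharedTools` helper are hypothetical planning aids, not framework APIs; the agent and tool names are the ones from the example.

```ts
// Planning-time view of the graph: node name -> tools (or subprompts) it owns.
const assistantGraph: Record<string, string[]> = {
  gmail_agent: ["send_gmail", "read_gmail"],
  ebay_agent: ["search_ebay_listings", "place_ebay_order", "track_ebay_shipment"],
  calendar_agent: ["create_calendar_event", "list_calendar_events"],
  slack_agent: ["post_slack_message", "read_slack_channel"],
  trading_agent: ["query_stock_price", "execute_stock_trade"],
  content_helpers: ["search_recipes", "generate_image"],
};

// Returns the tools that appear under more than one node.
function sharedTools(graph: Record<string, string[]>): string[] {
  const owners: Record<string, string[]> = {};
  for (const agent of Object.keys(graph)) {
    for (const tool of graph[agent]) {
      // Record every node that lists this tool.
      (owners[tool] = owners[tool] || []).push(agent);
    }
  }
  return Object.keys(owners).filter((tool) => owners[tool].length > 1);
}

const overlaps = sharedTools(assistantGraph);
```

If `overlaps` is non-empty, two nodes share a tool and a change to that tool can affect both flows, which is the coupling the decomposition is meant to remove.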
155
+
156
+ ### Coordinators
157
+
158
+ Coordinators provide two things flat agents can't:
159
+
160
+ 1. **Inter-domain communication.** A `coding_coordinator` can ask `research_agent` to investigate a library, then hand the findings to `bash_agent` to run commands. The bash agent never needed to know the research agent existed.
161
+ 2. **Filtering.** A `gmail_agent` can filter spam well on its own. But the higher-level objectives of the organization — "what email actually matters to the CEO of the sprinkler hose company" — belong to a coordinator above it. The coordinator filters again before escalating.
162
+
163
+ ### Graph depth tradeoffs
164
+
165
+ Deeper graphs add latency and create more inter-agent communication that can fail. But they also enable parallelism and isolation. A `marketing_communications_agent` can have a `social_media_agent` which has `twitter_agent`, `linkedin_agent`, and `facebook_agent` as children. That subtree handles posting workflows entirely without involving the top coordinator. Aim for depth that matches the natural hierarchy of the work, not depth for its own sake.
166
+
167
+ ---
168
+
169
+ ## Interaction type: `dual_ai` is the default
170
+
171
+ > **`dual_ai` is the default agent shape. `ai_human` is the exception**, used only when the thread itself is the chat surface — chat UI, website widget, AgentBuilder admin, or an API the user is talking to directly.
172
+
173
+ For **Slack, email, SMS, Discord, webhooks, polled inboxes, schedulers, or any other mediated channel**, the human is just a tool target. The agent on the framework side is `dual_ai`. The whole graph being `dual_ai` is fine — often cleaner.
174
+
175
+ ### The single decision
176
+
177
+ Ask: **"Is the human typing directly into this thread?"**
178
+
179
+ - **Yes** → the top-level agent is `ai_human`.
180
+ - **No** → the top-level agent is `dual_ai`. The human enters the graph via a tool somewhere (e.g., a `send_slack_message` call inside a slack subagent).
181
+
182
+ Do not default to `ai_human` just because a human is eventually involved. Ask where messages physically arrive.
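The decision above reduces to a lookup on where messages physically arrive. The sketch below encodes it under that assumption; the `Channel` union and `topLevelInteractionType` are hypothetical names built from the channels this section lists, not identifiers from the spec.

```ts
type InteractionType = "ai_human" | "dual_ai";

// Channels named in this guide. The first four are chat surfaces where the
// human types directly into the thread; the rest are mediated.
type Channel =
  | "chat_ui"
  | "website_widget"
  | "agentbuilder_admin"
  | "direct_api_chat"
  | "slack"
  | "email"
  | "sms"
  | "discord"
  | "webhook"
  | "polled_inbox"
  | "scheduler";

// Picks the top-level agent type from the channel messages arrive on.
function topLevelInteractionType(channel: Channel): InteractionType {
  switch (channel) {
    case "chat_ui":
    case "website_widget":
    case "agentbuilder_admin":
    case "direct_api_chat":
      return "ai_human"; // the thread itself is the chat surface
    default:
      return "dual_ai"; // the human is reached through a tool call
  }
}
```

Defaulting to `dual_ai` in the `default` branch mirrors the guidance: a new, unlisted channel is treated as mediated unless it is explicitly a chat surface.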
183
+
184
+ ### The two shapes
185
+
186
+ **Shape A — chat surface (`ai_human` at top):**
243
187
 
244
- ```ts
245
- const shippingPrompt: PromptDefinition = {
246
- name: 'shipping_inquiries',
247
- toolDescription: 'Handles customer questions about shipping.',
248
- prompt: [
249
- { type: 'include', prompt: 'company_tone' }, // Reference the tone prompt as a part
250
- { type: 'text', content: `Details about the products:...` }
251
- ],
252
- model: 'tiny_model',
253
- };
188
+ ```
189
+ website_chatbot (ai_human) ← user types in a chat widget
190
+ tools: search_products, lookup_order, escalate_to_human
254
191
  ```
255
192
 
256
- Additionally, we can use the `env` property to provide dynamic content that can be referenced in the prompt. This can be especially helpful for packed agents that get distributed to other people, businesses, or industries. For example, we could create a generic ecommerce assistant agent that uses a `PRODUCT_INVENTORY` env to customize the agent to the specific products of the business that is using it.
193
+ **Shape B, mediated (`dual_ai` everywhere):**
257
194
 
258
- ```ts
259
- {
260
- // ...
261
- name: 'ecommerce_assistant',
262
- toolDescription: 'An assistant for ecommerce businesses that can answer questions about products, inventory, and orders.',
263
- prompt: [
264
- { type: 'text', content: `You are an assistant for an ecommerce business. You help customers with their questions about products, inventory, and orders. Here is the current product inventory: ` },
265
- { type: 'env', property: 'PRODUCT_INVENTORY' }, // Reference the PRODUCT_INVENTORY env variable
266
- ],
267
- model: 'tiny_model',
268
- variables: [
269
- {
270
- /** Environment variable/property name */
271
- name: 'PRODUCT_INVENTORY',
272
- /** Value type: 'text' or 'secret' */
273
- type: 'text',
274
- /** Whether this variable is required to execute */
275
- required: true,
276
- /**
277
- * Whether this variable is scoped to the declarer agent subtree.
278
- *
279
- * Scoped variables do not inherit parent thread env values. Descendants of
280
- * the declarer still inherit scoped values from that declarer thread.
281
- *
282
- * @default false
283
- */
284
- scoped: false,
285
- /** Human-readable description (empty string when not provided) */
286
- description: 'The full inventory of products, including names, descriptions, and stock levels.',
287
- }
288
- ]
289
- }
195
+ ```
196
+ slack_research_assistant (dual_ai, reasoning model)
197
+ ├── slack_agent (dual_ai, resumable) ← messages arrive via tool calls,
198
+ │ not as thread messages
199
+ └── research_agent (dual_ai)
290
200
  ```
291
201
 
292
- ## Tools
202
+ In Shape B, side_a of `slack_research_assistant` plans the work and dispatches subagents. Side_b reviews and decides when the work is done. The "user" is a Slack channel reachable through `slack_agent`'s tools.
293
203
 
294
- Tools are a generic term for anything an agent can "call". In practice there are 3 types of tools:
204
+ ### Handoffs (a special case of `ai_human`)
295
205
 
296
- 1. Callables: these are TypeScript functions that are defined via `defineTool` in the `tools` directory. These should be created as the primary mechanism for interfacing with the outside world, interacting with APIs, database, business logic, and often even other agents. Each callable receives a `ThreadState` object. You should not assume that the function is being executed on a node server; it may be an edge function, for example, so do not use node imports if possible. Instead use the `ThreadState` to perform any necessary operations (see the ThreadState documentation for more details and examples).
206
+ When an `ai_human` agent calls another `ai_human` agent as a tool, the runtime does **not** spawn a new thread; it changes which prompt "owns" the existing thread. This is a *handoff*. Useful for stepwise human flows: an `onboarding_agent` collects information, then hands off to a `scheduling_agent` that books a meeting. The user keeps talking to the same thread.
297
207
 
298
- 2. Subprompts: these are prompts that can be called as tools. They are defined in the `prompts` directory and have a `toolDescription` property that indicates they can be used as tools, and are useful for switching models for a small task, like writing some text, or generating an image.
208
+ ### Subprompts vs. subagents vs. handoffs
299
209
 
300
- 3. Subagents: these are full agents that can be called as tools. They are defined in the `agents` directory and have `exposeAsTool: true` in their agent definition. Subagents are useful when you need a more complex interaction that requires multiple steps, or when you want to leverage the unique capabilities of a different model that is better suited for a specific task.
210
+ - **Subprompt**: one LLM step exposed as a tool. Use to switch models for a focused task: image generation, polished writing, JSON extraction.
211
+ - **Subagent** — full child thread with its own `ThreadState`. Use when you need iteration, QA, reflection, or long-lived addressable behavior. Always `dual_ai`.
212
+ - **Handoff** — `ai_human` → `ai_human`, swaps prompt ownership of the same thread. Use for stepwise human-driven flows.
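The three options above can be read as a small decision function. This is one possible encoding, offered as a sketch: `DelegationNeed` and `chooseCallableKind` are hypothetical names, and the order of the checks is an assumption about how to break ties, not something the spec prescribes.

```ts
type CallableKind = "subprompt" | "subagent" | "handoff";

// Hypothetical description of what the caller needs from the callee.
interface DelegationNeed {
  callerType: "ai_human" | "dual_ai";
  targetIsAiHumanAgent: boolean; // the callee is itself an ai_human agent
  needsIteration: boolean; // iteration, QA, or reflection
  longLived: boolean; // must stay addressable across turns
}

function chooseCallableKind(need: DelegationNeed): CallableKind {
  // ai_human calling ai_human swaps prompt ownership of the same thread.
  if (need.callerType === "ai_human" && need.targetIsAiHumanAgent) return "handoff";
  // Multi-step or long-lived work gets its own child thread.
  if (need.needsIteration || need.longLived) return "subagent";
  // Otherwise a single LLM step exposed as a tool is enough.
  return "subprompt";
}

// Example: generating one image from a dual_ai coordinator.
const imageKind = chooseCallableKind({
  callerType: "dual_ai",
  targetIsAiHumanAgent: false,
  needsIteration: false,
  longLived: false,
});
```

Adding QA on the generated image would flip `needsIteration` to `true` and the result to `"subagent"`, matching the guidance given earlier for asset generation.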
301
213
 
302
- There are lots of options for how tools are defined on their respective prompt:
214
+ A subagent can be:
303
215
 
304
- ```ts
305
- {
306
- // ...
307
- tools: {
308
- /**
309
- * Agent callable name.
310
- *
311
- * Must reference a `dual_ai` agent with `exposeAsTool: true`.
312
- */
313
- name: T;
314
-
315
- /**
316
- * Whether parent execution blocks until the subagent returns a result.
317
- *
318
- * - `true`: Parent waits for completion (tool-call style)
319
- * - `false`: Parent continues immediately and receives results asynchronously
320
- *
321
- * @default true
322
- */
323
- blocking?: boolean;
324
-
325
- /**
326
- * Property from tool-call arguments used as the initial message sent to the
327
- * subagent on invocation.
328
- *
329
- * Uses the same semantics as {@link SubpromptConfig.initUserMessageProperty}.
330
- */
331
- initUserMessageProperty?: StandardAgentSpec.SchemaFields<T>;
332
-
333
- /**
334
- * Property from tool-call arguments containing attachment path(s) that should
335
- * be sent to the subagent on invocation.
336
- *
337
- * Uses the same semantics as {@link SubpromptConfig.initAttachmentsProperty}.
338
- */
339
- initAttachmentsProperty?: StandardAgentSpec.SchemaFields<T>;
340
-
341
- /**
342
- * Property from tool-call arguments used to assign a human-readable name for
343
- * each spawned child thread instance.
344
- *
345
- * Implementations SHOULD store this as a thread tag in the form
346
- * `name:<value>` so UIs can render a concise per-instance title.
347
- */
348
- initAgentNameProperty?: StandardAgentSpec.SchemaFields<T>;
349
-
350
- /**
351
- * Execute this tool immediately when the prompt becomes active.
352
- *
353
- * - `true`: Execute immediately using runtime defaults.
354
- * - Object: Execute immediately with explicit per-instance env relationships.
355
- *
356
- * When the object form is used:
357
- * - `scopedEnv` names the per-instance env values copied into the child thread.
358
- * - `nameEnv` and `descriptionEnv` identify the only per-instance env values
359
- * that runtimes may expose to an internal bootstrap model when deriving
360
- * initial child arguments.
361
- *
362
- * Runtimes MUST NOT expose `scopedEnv` values to the model unless the same env
363
- * name is explicitly designated by `nameEnv` or `descriptionEnv`.
364
- *
365
- * Immediate tools run before the first LLM step for that activation.
366
- */
367
- immediate?:
368
- | boolean
369
- | {
370
- /**
371
- * Scoped env name whose value may be used as the safe per-instance name
372
- * hint for child bootstrap.
373
- */
374
- nameEnv?: string;
375
-
376
- /**
377
- * Scoped env name whose value may be used as the safe per-instance
378
- * description hint for child bootstrap.
379
- */
380
- descriptionEnv?: string;
381
-
382
- /**
383
- * Scoped env names that should be copied into the child thread for each
384
- * immediate instance group.
385
- */
386
- scopedEnv?: string[];
387
- };
388
-
389
- /**
390
- * Optional branch flag env name.
391
- *
392
- * When set, this subagent is only enabled when the named env resolves to
393
- * `true`, `1`, or `yes` (case-insensitive).
394
- */
395
- optional?: string;
396
-
397
- /**
398
- * Resumability configuration.
399
- *
400
- * - `false` (default): Non-resumable subagent
401
- * - Object: Resumable subagent with message routing and instance limits
402
- *
403
- * When resumable mode is enabled, runtimes SHOULD provide a built-in create
404
- * and message lifecycle interface instead of exposing raw agent callables for
405
- * new instance creation.
406
- */
407
- resumable?:
408
- | false
409
- | {
410
- /**
411
- * Which side of the child `dual_ai` conversation receives parent messages.
412
- *
413
- * - `side_a`: Messages are queued as `role: 'user'`
414
- * - `side_b`: Messages are queued as `role: 'assistant'`
415
- */
416
- receives_messages: 'side_a' | 'side_b';
417
-
418
- /**
419
- * Maximum concurrent instances for this subagent tool.
420
- *
421
- * When reached, implementations may remove this tool from subsequent LLM
422
- * requests and route new messages to existing instances.
423
- *
424
- * @default unlimited
425
- */
426
- maxInstances?: number;
427
-
428
- /**
429
- * How this child reports back to its parent.
430
- *
431
- * - `implicit` (default): Child completion is automatically queued to the parent.
432
- * - `explicit`: The runtime does not auto-queue child completion; tools/hooks may
433
- * use thread APIs such as `state.notifyParent()` when they choose to escalate.
434
- */
435
- parentCommunication?: 'implicit' | 'explicit';
436
- };
437
- } |
438
- {
439
- /**
440
- * Name of the tool (custom tool or provider tool).
441
- */
442
- name: StandardAgentSpec.Callables;
443
-
444
- /**
445
- * Environment variable values for this tool.
446
- */
447
- env?: Record<string, string>;
448
- /**
449
- * @deprecated Use `env` instead.
450
- */
451
- tenvs?: Record<string, unknown>;
452
-
453
- /**
454
- * Static options for this tool.
455
- * Passed to the tool handler at execution time.
456
- */
457
- options?: Record<string, unknown>;
458
- } | {
459
- /**
460
- * Name of the sub-prompt or agent to call.
461
- * Must be a prompt defined in agents/prompts/ or an agent in agents/agents/.
462
- */
463
- name: T;
464
-
465
- /**
466
- * Include text response content from sub-prompt execution in the result string.
467
- * @default true
468
- */
469
- includeTextResponse?: boolean;
470
-
471
- /**
472
- * Serialize tool calls made by the sub-prompt (and their results) into the result string.
473
- * @default true
474
- */
475
- includeToolCalls?: boolean;
476
-
477
- /**
478
- * Serialize any errors from the sub-prompt into the result string.
479
- * @default true
480
- */
481
- includeErrors?: boolean;
482
-
483
- /**
484
- * Property from the tool call arguments to use as the initial user message
485
- * when invoking the sub-prompt or agent.
486
- *
487
- * Autocompletes to fields from the prompt's requiredSchema (or agent's side_a prompt schema).
488
- *
489
- * @example
490
- * If the tool is called with `{ query: "search term", limit: 10 }` and
491
- * `initUserMessageProperty: 'query'`, the sub-prompt will receive
492
- * "search term" as the initial user message.
493
- */
494
- initUserMessageProperty?: StandardAgentSpec.SchemaFields<T>;
495
-
496
- /**
497
- * Property containing attachment path(s) to include as multimodal content
498
- * when invoking the sub-prompt or agent.
499
- *
500
- * Autocompletes to fields from the prompt's requiredSchema (or agent's side_a prompt schema).
501
- * Supports both a single path string or an array of paths.
502
- *
503
- * @example
504
- * If the tool is called with `{ image: "/attachments/123.jpg" }` and
505
- * `initAttachmentsProperty: 'image'`, the sub-prompt will receive
506
- * the image as an attachment in the user message.
507
- *
508
- * @example
509
- * If the tool is called with `{ images: ["/attachments/a.jpg", "/attachments/b.jpg"] }` and
510
- * `initAttachmentsProperty: 'images'`, the sub-prompt will receive
511
- * both images as attachments.
512
- */
513
- initAttachmentsProperty?: StandardAgentSpec.SchemaFields<T>;
514
- } |
515
- string /* simple tool name with no extra options * /;
516
- }
216
+ - **Blocking + non-resumable** — the parent waits, the child runs once, returns, and is gone (tool-call style)
217
+ - **Blocking + resumable** — the parent waits but the child remains addressable for future calls
218
+ - **Non-blocking + non-resumable** — fire-and-forget one-shot
219
+ - **Non-blocking + resumable** — long-lived addressable child the parent can re-message later (e.g., a Slack monitor that lives as long as the parent)
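
As a rough illustration of how the four combinations fall out of two settings, the sketch below classifies a config by its `blocking` and `resumable` fields. The `SubagentModeFields` interface and `describeSubagentMode` helper are hypothetical and exist only for this example; the defaults (`blocking: true`, `resumable: false`) are taken from the spec comments quoted in the 0.14.0 text, and the real `SubagentToolConfig` carries more fields than shown.

```typescript
// Hypothetical, trimmed-down shape: only the two fields that decide the mode.
// The real SubagentToolConfig in the spec has additional fields.
interface SubagentModeFields {
  blocking?: boolean; // assumed default: true
  resumable?: false | { receives_messages: 'side_a' | 'side_b' }; // assumed default: false
}

type SubagentMode =
  | 'blocking + non-resumable'
  | 'blocking + resumable'
  | 'non-blocking + non-resumable'
  | 'non-blocking + resumable';

// Illustrative helper (not part of the spec): name the mode a config selects.
function describeSubagentMode(config: SubagentModeFields): SubagentMode {
  const blocking = config.blocking ?? true;
  // Any object value enables resumable mode; `false` or omitted means one-shot.
  const resumable = typeof config.resumable === 'object';
  if (blocking) {
    return resumable ? 'blocking + resumable' : 'blocking + non-resumable';
  }
  return resumable ? 'non-blocking + resumable' : 'non-blocking + non-resumable';
}
```

Under these assumptions, an empty config resolves to the tool-call style (blocking, one-shot), and the long-lived monitor case needs both `blocking: false` and a `resumable` object.
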
220
+
221
+ ---
222
+
223
+ ## Session boundary discipline
224
+
225
+ This section is a hard rule, not a suggestion. **Every `dual_ai` agent in the graph must explicitly define its session boundaries.** This is what makes whole-graph `dual_ai` safe.
226
+
227
+ ### Required for every `dual_ai` agent
228
+
229
+ - **`sessionStop` tool binding** — names the tool call that ends the session with a result. Side_a or side_b invokes it when the work is done. Common patterns: `report_findings`, `submit_to_parent`, `final_answer`, `done`.
230
+ - **`sessionFail` tool binding**: names the tool call that ends the session with a failure the parent should see. Common patterns: `escalate_blocked`, `report_unresolvable`, `give_up_with_reason`.
231
+ - **`maxSessionTurns`**: a finite integer sized to the realistic upper bound for the task. Never omit this. A research subagent might use 20; an asset QA loop might use 6; a tight reflection loop might use 3.
232
+ - **A real job for side_b** — reflection, QA, judging, alternative-perspective driving, devil's advocate. If side_b is "another instance of side_a," you don't have a `dual_ai` agent — collapse it to a single-side prompt or a coordinator pattern.
233
+ - **Cross-provider QA** — when side_b reviews side_a's output, pick a model from a *different provider* than side_a. A reviewer from the same lab as the author model tends to over-rate its output.
234
+
235
+ ### The smell test
236
+
237
+ > If you cannot articulate in one sentence what tool call ends the session, the agent is not designed yet.
238
+
239
+ Examples that pass:
240
+
241
+ - "Side_a calls `submit_video_assets` once side_b approves the renders."
242
+ - "Side_b calls `report_findings` after the research turn count exceeds 5 or it judges the topic exhausted."
243
+ - "Either side calls `escalate_to_parent` if the customer policy question is unresolvable."
244
+
245
+ Examples that fail:
246
+
247
+ - "It just stops when it's done." (How does the runtime know?)
248
+ - "Whichever side decides." (Decides by calling *what*?)
249
+
250
+ ### Why this matters
251
+
252
+ A `dual_ai` agent with no `sessionStop`, no `sessionFail`, and no `maxSessionTurns` will burn turns until it hits a runtime cap, then fail in a way the parent can't interpret. Coding agents that have been bitten by this once will start defaulting to `ai_human` to "feel safer" — and then we're back to mega-agents and chat-surface confusion. Boundary discipline is what keeps `dual_ai` the default.
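
One way to enforce the rule mechanically is a small lint-style check run over agent definitions before deploying. The sketch below is illustrative only: `SessionBoundaryFields` is a hypothetical, flattened shape invented for this example, and where `sessionStop` and `sessionFail` are actually configured in the spec may differ, so check the spec types before adapting it.

```typescript
// Hypothetical flattened shape for illustration. The real agent definition has
// more fields, and the stop/fail bindings may live elsewhere in the spec types.
interface SessionBoundaryFields {
  name: string;
  type: 'ai_human' | 'dual_ai';
  sessionStop?: string;     // tool call that ends the session with a result
  sessionFail?: string;     // tool call that ends the session with a failure
  maxSessionTurns?: number; // finite upper bound on turns
}

// Illustrative helper (not part of the spec): list the missing boundaries.
function checkSessionBoundaries(agent: SessionBoundaryFields): string[] {
  // The rule in this section applies to dual_ai agents only.
  if (agent.type !== 'dual_ai') return [];

  const problems: string[] = [];
  if (!agent.sessionStop) {
    problems.push(`${agent.name}: missing sessionStop tool binding`);
  }
  if (!agent.sessionFail) {
    problems.push(`${agent.name}: missing sessionFail tool binding`);
  }
  const turns = agent.maxSessionTurns;
  // Number.isInteger rejects undefined-adjacent cases such as NaN and Infinity.
  if (turns === undefined || !Number.isInteger(turns) || turns <= 0) {
    problems.push(`${agent.name}: maxSessionTurns must be a finite positive integer`);
  }
  return problems;
}
```

An empty array means the agent passes the smell test at the configuration level; it does not prove that side_b has a real job, which still needs a human or reviewer judgment.
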
253
+
254
+ ---
255
+
256
+ ## Model selection
257
+
258
+ **Rule: never write a model string the user did not request and `current-models` did not produce.**
259
+
260
+ ### Required workflow
261
+
262
+ 1. Run `pnpm exec agents current-models` to see the curated category list (e.g. `extra_reasoning`, `fast_tool_calls`, `writing`, `image_generation`, `tiny`).
263
+ 2. Choose the category that fits the role (table below).
264
+ 3. Run `pnpm exec agents available-models --provider=<name>` to confirm the exact provider model string.
265
+ 4. Define the model in `agents/models/<name>.ts` using `defineModel`.
266
+
267
+ If you have multiple configured providers, pass `--provider=<name>` explicitly.
268
+
269
+ ### Role category mapping
270
+
271
+ | Role | Category | Notes |
272
+ |---|---|---|
273
+ | Top-level coordinator | `extra_reasoning` | The discerning brain. Don't skimp here. |
274
+ | Domain subagent (gmail, slack, etc.) | `fast_tool_calls` | High volume, narrow tool set, latency matters. |
275
+ | Eloquent text generation | `writing` | Use as a subprompt from a fast-tool-calling agent. |
276
+ | Image generation | `image_generation` | Use as a subprompt. |
277
+ | QA / reviewer side_b | `extra_reasoning` from a *different provider* than side_a | Same-lab QA is biased. |
278
+ | Cheap classification, tagging, routing | `tiny` | Where speed and cost dominate quality. |
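
The table above can also be kept as a typed lookup so that a role never maps to an ad hoc model string. This is a minimal sketch: the category names come from the table, while the `AgentRole` keys and both helper names are hypothetical labels chosen for this example.

```typescript
// Category names as listed in this guide; the live list comes from `current-models`.
type ModelCategory =
  | 'extra_reasoning'
  | 'fast_tool_calls'
  | 'writing'
  | 'image_generation'
  | 'tiny';

// Hypothetical role labels, one per table row.
type AgentRole =
  | 'coordinator'
  | 'domain_subagent'
  | 'text_generation'
  | 'image_generation'
  | 'qa_reviewer'
  | 'classification';

const categoryForRole: Record<AgentRole, ModelCategory> = {
  coordinator: 'extra_reasoning',
  domain_subagent: 'fast_tool_calls',
  text_generation: 'writing',
  image_generation: 'image_generation',
  qa_reviewer: 'extra_reasoning',
  classification: 'tiny',
};

// Illustrative check for the QA row: the reviewer should not share side_a's provider.
function reviewerIsCrossProvider(sideAProvider: string, sideBProvider: string): boolean {
  return sideAProvider.trim().toLowerCase() !== sideBProvider.trim().toLowerCase();
}
```

The lookup yields a category, not a model string; the exact string still has to come from `available-models` as described in the workflow.
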
279
+
280
+ ### Fallback strategy
281
+
282
+ Define more than one model per category, on different providers. The model `name` should describe the *use case*, not the provider, so a fallback can substitute transparently:
283
+
284
+ ```
285
+ agents/models/extra_reasoning.ts → primary (provider A)
286
+ agents/models/extra_reasoning_fallback.ts → secondary (provider B)
517
287
  ```
518
288
 
519
- Excellent orchestration is the creation of models definitions, prompts, tools, subprompts, subagents, and agent definitions to tie everything together.
289
+ Prompts and agents reference `extra_reasoning`; if the primary provider is down, the fallback is one rename away.
520
290
 
291
+ For the authoritative list of which models currently fill each category, see `agents/models/AGENTS.md` in your project (created by `agents scaffold`). Do not embed the list here — it drifts.
521
292
 
522
- ## Subprompts vs Handoffs vs Subagents
293
+ ---
523
294
 
524
- A subprompt allows a `prompt` definition to be run with a single step as if it was a tool. The result of the subprompt is returned to the thread as if it is a tool result. Subprompts are a useful way to use a different model for a specific task. For example, an agent may use a subprompt to leverage a better image generation model.
295
+ ## API research checklist
296
+
297
+ This is the single biggest gap in most coding-agent-built tools. **Do not write a `defineTool` that touches a third-party API from memory.** APIs change. Auth flows change. Endpoints get deprecated. Read the current docs every time.
298
+
299
+ ### Before writing the tool
300
+
301
+ 1. **Fetch the official current docs.** Use WebFetch, the read-website skill, or a browser. Confirm:
302
+ - Auth method (bearer, OAuth, signed request, API key in header vs. query)
303
+ - Base URL and current API version
304
+ - Endpoint paths and HTTP methods
305
+ - Request payload shape (and required vs. optional fields)
306
+ - Response payload shape (success and error)
307
+ - Rate limits and `Retry-After` handling
308
+ - Pagination model (cursor, offset, link header)
309
+ - Idempotency rules — does retrying duplicate side effects?
310
+ 2. **Confirm SDK availability and Workers compatibility.** Is there a first-party JS/TS SDK? Does it run in Cloudflare Workers (no Node built-ins, no `fs`, no native modules, no long-lived sockets)? If not, use `fetch` directly. Most provider SDKs are not Workers-compatible out of the box.
311
+ 3. **Identify required secrets.** Declare them as `secret` variables, never `text`. Secrets must never be referenced in prompt text or returned to the model.
312
+ 4. **Map error modes explicitly.** Each gets handled, not ignored:
313
+ - `401` / `403` — auth failed. Surface a clear message; do not retry blindly.
314
+ - `404` — missing resource. Often a user-facing error; surface it.
315
+ - `409` — conflict. Often means the operation already happened; check before retrying.
316
+ - `429` — rate limited. Honor `Retry-After`. Retry with backoff.
317
+ - `5xx` — server error. Retry with exponential backoff, finite cap.
318
+ - `4xx` (other) — surface to the model so it can correct its arguments.
319
+ 5. **Prototype the raw call** in isolation — a single `fetch` against the real endpoint with a real token. Confirm the response shape *as observed*, not as documented. Docs lag.
320
+ 6. **Return shapes the model can actually use.** Strip noise. Surface IDs, names, statuses, and the fields the next step needs. Don't return the entire 12 KB JSON blob and hope the model picks the right field.
321
+
322
+ > **Anti-pattern:** "I know the Slack API, I'll just write it." You don't. It changed last quarter. Read the docs.
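
The error-mode mapping in step 4 can be made explicit as two small pure functions that a tool's `fetch` wrapper consults. This is a sketch under stated assumptions: the action names, `classifyStatus`, and `retryDelayMs` are all hypothetical, the base delay and cap are arbitrary example values, and only the delta-seconds form of `Retry-After` is parsed (an HTTP-date value falls back to backoff).

```typescript
// Hypothetical action labels, one per error mode listed in step 4.
type ErrorAction =
  | 'ok'
  | 'surface_auth_error'   // 401 / 403: do not retry blindly
  | 'surface_missing'      // 404
  | 'check_before_retry'   // 409: the operation may already have happened
  | 'retry_after_delay'    // 429: honor Retry-After
  | 'retry_with_backoff'   // 5xx: exponential backoff, finite cap
  | 'surface_to_model'     // other 4xx: let the model correct its arguments
  | 'unexpected';          // 1xx / 3xx or anything else

function classifyStatus(status: number): ErrorAction {
  if (status >= 200 && status < 300) return 'ok';
  if (status === 401 || status === 403) return 'surface_auth_error';
  if (status === 404) return 'surface_missing';
  if (status === 409) return 'check_before_retry';
  if (status === 429) return 'retry_after_delay';
  if (status >= 500 && status <= 599) return 'retry_with_backoff';
  if (status >= 400 && status < 500) return 'surface_to_model';
  return 'unexpected';
}

// Delay before retry number `attempt` (0-based). A numeric Retry-After header
// wins; otherwise use capped exponential backoff. Defaults are example values.
function retryDelayMs(
  attempt: number,
  retryAfterHeader: string | null,
  baseMs: number = 500,
  capMs: number = 30000,
): number {
  if (retryAfterHeader !== null && retryAfterHeader.trim() !== '') {
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  }
  return Math.min(baseMs * Math.pow(2, attempt), capMs);
}
```

Keeping these pure makes them easy to unit-test without network access; the caller still needs its own maximum attempt count so retries stay finite.
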
525
323
 
526
- Subprompts can receive the entire context of the parent thread if the prompt definition has `includeChat: true` and `includePastTools: true`. If those are set to false, then the subprompt receives no context at all, and it should use `initUserMessageProperty` and optionally `initAttachmentsProperty` instead. These properties specify what explicit context from the parent thread should be exposed, and their values are provided as if it is a `user` requesting the subprompt to run. The exact text the user provides will be the `initUserMessageProperty` value, and optionally any file attachments specified using the `/attachments/{filename}.{ext}` convention (`initAttachmentsProperty` should be an array of strings).
324
+ ---
527
325
 
528
- A subagent on the other hand is a full agent thread, with its own `ThreadState`. Subagents can be:
326
+ ## ThreadState first
529
327
 
530
- - Blocking and non-resumable
531
- - Blocking and resumable
532
- - Non-blocking and non-resumable
533
- - Non-blocking and resumable
328
+ > **Rule:** Before adding any dependency or external service, check whether `ThreadState` already provides the capability.
534
329
 
535
- These distinctions indicate how a subagent appears to its parent agent. A non-resumable subagent is very much like a tool call from the parent's perspective. It calls the "tool" the subagent runs for a while and returns a result. A resumable subagent however, becomes an addressable part of the agent graph, the parent can communicate with it receive results from it, and inspect its status. Resumable subagents can be as long-lived as the parent itself. For example a LinkedIn subagent may be permanently monitoring a particular linked in account and advising the parent coordinator when messages that would be important for the owner are received.
536
- Blocking and non-blocking indicates whether the parent thread waits for the subagent to finish before it continues. A non-blocking subagent allows the parent thread to continue its work while the subagent is running, and the parent can receive messages from the subagent.
330
+ `ThreadState` is the unified API passed to every callable, hook, and endpoint. It abstracts the runtime: your tool may run on the edge, on a Worker, or on a Node server, and the same `state.readFile()` call works in all of them. Tools should **not** import `node:fs`, read `process.env`, or assume Node-shaped APIs.
537
331
 
538
- A handoff is when an `ai_human` agent is used as a tool by another `ai_human` agent. This is a special case and rather than spawn a new thread, it changes which prompt "owns" that thread — in other words it hands off control from one agent to another. This is very useful in a number of applications where a human is being led through a series of steps, for example, an onboarding agent may collect necessary information from a user, and then "handoff" to another agent which performs a different function, for example, a scheduling agent that books meetings based on the information collected by the onboarding agent.
332
+ ### Capability lookup table
539
333
 
540
- Subagents should always be `dual_ai`. `dual_ai` agents are fully autonomous and require no input from a human. Use `side_a` and `side_b` to perform reflection and reasoning on the results of the communication. For example if a subagent is tasked with generating assets for a video game, and those assets need to be on a green background, the `side_a` of the subagent can generate the assets, and the `side_b` can review the assets and determine if they meet the criteria. If they do, the `side_b` can communicate back to the parent coordinator via the tool defined as the `sessionStop` tool. If they do not meet the criteria, `side_b` can inform `side_a` in which ways the image needs to change before it can be approved.
334
+ | What you need | Use this | Don't use |
335
+ |---|---|---|
336
+ | Store a file between turns | `state.writeFile` / `state.readFile` | S3, external blob store |
337
+ | Persist structured data across turns | `state.context` (in-memory) + `state.writeFile` JSON (durable) | External KV, Redis |
338
+ | Trigger work later | `state.scheduleEffect` | External cron, queue service |
339
+ | Invoke another tool from inside a tool | `state.invokeTool` / `state.queueTool` | Re-implementing tool logic inline |
340
+ | Read/write config and secrets | `state.env` / `state.setEnv` | `process.env`, `.env` files |
341
+ | Search files the thread has seen | `state.grepFiles` / `state.findFiles` | Reimplementing search |
342
+ | Escalate / report status to the parent | `state.notifyParent` / `state.setStatus` | Custom message bus |
343
+ | Load a sibling prompt / agent / model | `state.loadPrompt` / `state.loadAgent` / `state.loadModel` | Duplicating the definition |
344
+ | Inspect or message a child thread | `state.children` / `state.getChildThread` | External orchestration |
345
+ | Inject context the model should see | `state.injectMessage` / `state.queueMessage` | Stuffing the system prompt |
346
+ | Read the thread's message history | `state.getMessages` / `state.getMessage` | Reimplementing storage |
347
+ | Update an existing message | `state.updateMessage` | Mutating storage directly |
348
+ | Read execution logs | `state.getLogs` | External observability shim |
349
+ | Emit a runtime event | `state.emit` | Console logging |
350
+ | Stop the thread | `state.terminate` | Throwing and hoping |
541
351
 
542
- The following is the official shape of the agent definition by which these kinds of interactions can be configured:
352
+ ### Method cheat sheet
543
353
 
544
- ```ts
545
- export interface AgentDefinition<
546
- N extends string = string,
547
- Prompt extends string = StandardAgentSpec.Prompts,
548
- Callable extends string = StandardAgentSpec.Callables,
549
- > {
550
- /**
551
- * Unique name for this agent.
552
- * Used as the identifier for thread creation and handoffs.
553
- * Should be snake_case (e.g., 'support_agent', 'research_flow').
554
- */
555
- name: N;
556
-
557
- /**
558
- * Agent conversation type.
559
- *
560
- * - `ai_human`: AI conversing with a human user (default)
561
- * - `dual_ai`: Two AI participants conversing
562
- *
563
- * @default 'ai_human'
564
- */
565
- type?: AgentType;
566
-
567
- /**
568
- * Maximum total turns across both sides.
569
- * Only applies to `dual_ai` agents.
570
- * Prevents infinite loops in AI-to-AI conversations.
571
- */
572
- maxSessionTurns?: number;
573
-
574
- /**
575
- * Configuration for Side A.
576
- * For `ai_human`: This is the AI side.
577
- * For `dual_ai`: This is the first AI participant.
578
- */
579
- sideA: SideConfig<Prompt, Callable>;
580
-
581
- /**
582
- * Configuration for Side B.
583
- * For `ai_human`: Optional, the human side doesn't need config.
584
- * For `dual_ai`: Required, the second AI participant.
585
- */
586
- sideB?: SideConfig<Prompt, Callable>;
587
-
588
- /**
589
- * Expose this agent as a tool for other prompts.
590
- * Enables agent composition and handoffs.
591
- * When true, other prompts can invoke this agent as a tool.
592
- * @default false
593
- */
594
- exposeAsTool?: boolean;
595
-
596
- /**
597
- * Description shown when agent is used as a tool.
598
- * Required if exposeAsTool is true.
599
- * Should clearly describe what this agent does.
600
- */
601
- toolDescription?: string;
602
-
603
- /**
604
- * Brief description of what this agent does.
605
- * Useful for UIs and documentation.
606
- *
607
- * @example 'Handles customer support inquiries and resolves issues'
608
- */
609
- description?: string;
610
-
611
- /**
612
- * Icon URL or absolute path for the agent.
613
- * Absolute paths (starting with `/`) are converted to full URLs in API responses.
614
- *
615
- * @example 'https://example.com/icon.svg' or '/icons/support.svg'
616
- */
617
- icon?: string;
618
-
619
- /**
620
- * Environment values provided by this agent.
621
- * Agent values are lower priority than thread/account/instance values.
622
- */
623
- env?: Record<string, string>;
624
-
625
- // ============================================================================
626
- // Package Metadata (for packing/unpacking)
627
- // ============================================================================
628
-
629
- /**
630
- * npm package name for this agent when packed.
631
- * Used by the packing system to maintain consistent package identity
632
- * across pack/unpack cycles.
633
- *
634
- * @example 'standardagent-support-agent', '@myorg/support-agent'
635
- */
636
- packageName?: string;
637
-
638
- /**
639
- * Package version (semver format).
640
- * Used by the packing system to track versions across pack/unpack cycles.
641
- * When re-packing, this version is auto-incremented by the pack modal.
642
- *
643
- * @example '1.0.0', '2.3.1-beta.1'
644
- */
645
- version?: string;
646
-
647
- /**
648
- * Package author/copyright holder.
649
- * Used by the packing system for the LICENSE file and package.json author field.
650
- *
651
- * @example 'John Doe', 'Acme Corp'
652
- */
653
- author?: string;
654
-
655
- /**
656
- * License identifier (SPDX format).
657
- * Used by the packing system for LICENSE file generation.
658
- *
659
- * @example 'MIT', 'Apache-2.0', 'ISC'
660
- */
661
- license?: string;
662
-
663
- /**
664
- * Hook IDs to run for this agent.
665
- * References hooks by their unique `id` property from defineHook().
666
- * These run when prompts don't specify their own hooks.
667
- *
668
- * @example
669
- * ```typescript
670
- * hooks: ['log_messages', 'track_tool_usage']
671
- * ```
672
- */
673
- hooks?: StandardAgentSpec.HookIds[];
674
- }
354
+ Discoverability reference. For full signatures, read the spec types from `node_modules/@standardagents/spec/dist/` (or browse `packages/spec/src/` on GitHub). Worked examples live in `agents/tools/AGENTS.md`.
355
+
356
+ ```
357
+ Identity threadId, agentId, userId, createdAt, children, terminated
358
+ Messages getMessages, getMessage, injectMessage, queueMessage, updateMessage
359
+ Logs getLogs
360
+ Resources loadModel, loadPrompt, loadAgent,
361
+ getChildThread, getParentThread,
362
+ getPromptNames, getAgentNames, getModelNames
363
+ Env env, setEnv
364
+ Parent notifyParent, setStatus
365
+ Tools queueTool, invokeTool
366
+ Effects scheduleEffect, getScheduledEffects, removeScheduledEffect
367
+ Events emit
368
+ Context context (Record<string, unknown>, in-memory only)
369
+ Files writeFile, readFile, readFileStream, statFile, readdirFile,
370
+ unlinkFile, mkdirFile, rmdirFile, getFileStats,
371
+ grepFiles, findFiles, getFileThumbnail
372
+ Execution execution, terminate
373
+ ```
374
+
375
+ ### Notes on a few that are easy to misuse
376
+
377
+ - **`state.context`** is in-memory for the *current execution*. It is not durable across thread restarts. For durable structured state, write a JSON file with `state.writeFile`.
378
+ - **`state.scheduleEffect`** runs a named effect after a delay. It survives restarts. This is your cron, your queue, and your retry timer all in one.
379
+ - **`state.invokeTool` vs `state.queueTool`** — `invokeTool` runs synchronously and returns the result; `queueTool` schedules the call to run later in the normal tool-call flow. Prefer `queueTool` when the model should see the result as a regular tool call.
380
+ - **`state.notifyParent`**: for resumable subagents with `parentCommunication: 'explicit'`, this is the only way the child talks to the parent. Use it sparingly; every notification interrupts the parent.
381
+ - **File attachments** use the path convention `/attachments/{filename}.{ext}`. Always use this path when passing files between agents — the runtime copies them across thread filesystems automatically.
382
+
383
+ ---
384
+
385
+ ## Tools
386
+
387
+ A "tool" is anything an agent can call. There are three kinds:
388
+
389
+ 1. **Callables** — TypeScript functions defined via `defineTool` in `agents/tools/`. The primary way to interface with the outside world: APIs, databases, business logic, and (sometimes) other agents. Each callable receives a `ThreadState`. Do not assume Node APIs are available — your code may run on the edge.
390
+ 2. **Subprompts** — prompts exposed as tools via their `toolDescription`. A single-step LLM call. Use for switching models on a focused task (image generation, polished writing, JSON extraction).
391
+ 3. **Subagents** — full agents exposed as tools via `exposeAsTool: true` on the agent definition. Use when you need iteration, QA, reflection, or long-lived addressable behavior. Always `dual_ai`.
392
+
393
+ ### `PromptDefinition` cheat sheet
394
+
395
+ A prompt is what actually gets sent to the LLM at one step. Set on each prompt file in `agents/prompts/`. For full signatures, read the spec types from `node_modules/@standardagents/spec/dist/` (or browse `packages/spec/src/` on GitHub), and see `agents/prompts/AGENTS.md`.
396
+
397
+ ```
398
+ PromptDefinition
399
+ name string unique snake_case identifier
400
+ toolDescription string shown when this prompt is exposed as a tool
401
+ prompt string | PromptContent[] system prompt (string, or composable parts)
402
+ model ModelName references agents/models/<name>
403
+ includeChat boolean (default false) pass full chat history to this LLM step
404
+ includePastTools boolean (default false) pass past tool call results
405
+ parallelToolCalls boolean (default false) allow multiple tool calls per turn
406
+ toolChoice 'auto' | 'none' | 'required' tool calling strategy (default 'auto')
407
+ requiredSchema ZodSchema validate args when called as a tool
408
+ variables VariableDefinition[] declared text/secret variables
409
+ tools (string | SubpromptConfig | PromptToolConfig | SubagentToolConfig)[]
410
+ tools available at this step
411
+ env Record<string, string> prompt-level env values (lowest precedence)
412
+ reasoning ReasoningConfig extended thinking config (for models that support it)
413
+ recentImageThreshold number (default 10) how many recent messages keep real images
414
+ providerOptions Record<string, unknown> passthrough to the provider (overrides model defaults)
415
+ hooks HookId[] prompt-scoped hooks (overrides agent hooks)
675
416
  ```
676
417
 
677
- ## Receiving and sending input
418
+ ### Composable prompts: the `tone` pattern
678
419
 
679
- Within an agent graph, input can come directly via a user, a tool call, or even subagents. In its simplest form, you may have an ai+human agent at the top of the graph, although this works well in simple cases, it's often preferable to have the highest order agent be a high-level coordinator that uses a more intelligent discerning model and is guided by prompting to achieve high-level objectives (ex: "You are project manager for the Jichael Mordan line of shoes at an athletic shoe company. You are in charge of...").
420
+ `prompt` can be a string or an array of parts. Use parts to compose a shared "tone" or "persona" across many prompts so changes flow through one place.
680
421
 
681
- It is quite possible to have an entire agent graph with no `ai_human` agent types, where the only i/o with a human is through a subagent that performs tool calls to, for example, Slack. New messages are queued and posted by `side_b` as "Message from user: "Mow the lawn clanker", and responses would be sent via a tool call send_to_slack(channel, "I don't know how to do that"). However, for a chat-bot-like implementation it's often preferable to use `ai_human` as the top-level coordinator.
422
+ ```ts
423
+ const tonePrompt: PromptDefinition = {
424
+ name: 'company_tone',
425
+ toolDescription: 'Defines the tone and style of the chatbot.',
426
+ prompt: `You are a friendly and helpful customer support assistant for an athletic shoe company. You always respond in a positive and upbeat tone, even when the customer is upset. You use simple language and avoid technical jargon.`,
427
+ model: 'tiny',
428
+ };
682
429
 
683
- ## Inter-agent communication
430
+ const shippingPrompt: PromptDefinition = {
431
+ name: 'shipping_inquiries',
432
+ toolDescription: 'Handles customer questions about shipping.',
433
+ prompt: [
434
+ { type: 'include', prompt: 'company_tone' },
435
+ { type: 'text', content: 'Details about shipping policies: ...' },
436
+ ],
437
+ model: 'tiny',
438
+ };
439
+ ```
684
440
 
685
- A standard agent will often be communicating with many other agents. In almost every situation the agent graph is a tree, meaning each agent can have children, and can have parents. There are a few options that are important architectural decisions when creating these parent/child relationships. The shape of the options is this:
441
+ Use `{ type: 'env', property: 'PRODUCT_INVENTORY' }` parts to inject runtime values into the prompt. Combined with `variables`, this lets a generic agent be specialized per-thread without code changes:
686
442
 
687
443
  ```ts
688
- interface SubagentToolConfig<T extends string = StandardAgentSpec.Callables> {
689
- /**
690
- * Agent callable name.
691
- *
692
- * Must reference a `dual_ai` agent with `exposeAsTool: true`.
693
- */
694
- name: T;
695
- /**
696
- * Whether parent execution blocks until the subagent returns a result.
697
- *
698
- * - `true`: Parent waits for completion (tool-call style)
699
- * - `false`: Parent continues immediately and receives results asynchronously
700
- *
701
- * @default true
702
- */
703
- blocking?: boolean;
704
- /**
705
- * Property from tool-call arguments used as the initial message sent to the
706
- * subagent on invocation.
707
- *
708
- * Uses the same semantics as {@link SubpromptConfig.initUserMessageProperty}.
709
- */
710
- initUserMessageProperty?: StandardAgentSpec.SchemaFields<T>;
711
- /**
712
- * Property from tool-call arguments containing attachment path(s) that should
713
- * be sent to the subagent on invocation.
714
- *
715
- * Uses the same semantics as {@link SubpromptConfig.initAttachmentsProperty}.
716
- */
717
- initAttachmentsProperty?: StandardAgentSpec.SchemaFields<T>;
718
- /**
719
- * Property from tool-call arguments used to assign a human-readable name for
720
- * each spawned child thread instance.
721
- *
722
- * Implementations SHOULD store this as a thread tag in the form
723
- * `name:<value>` so UIs can render a concise per-instance title.
724
- */
725
- initAgentNameProperty?: StandardAgentSpec.SchemaFields<T>;
726
- /**
727
- * Execute this tool immediately when the prompt becomes active.
728
- *
729
- * - `true`: Execute immediately using runtime defaults.
730
- * - Object: Execute immediately with explicit per-instance env relationships.
731
- *
732
- * When the object form is used:
733
- * - `scopedEnv` names the per-instance env values copied into the child thread.
734
- * - `nameEnv` and `descriptionEnv` identify the only per-instance env values
735
- * that runtimes may expose to an internal bootstrap model when deriving
736
- * initial child arguments.
737
- *
738
- * Runtimes MUST NOT expose `scopedEnv` values to the model unless the same env
739
- * name is explicitly designated by `nameEnv` or `descriptionEnv`.
740
- *
741
- * Immediate tools run before the first LLM step for that activation.
742
- */
743
- immediate?: boolean | {
744
- /**
745
- * Scoped env name whose value may be used as the safe per-instance name
746
- * hint for child bootstrap.
747
- */
748
- nameEnv?: string;
749
- /**
750
- * Scoped env name whose value may be used as the safe per-instance
751
- * description hint for child bootstrap.
752
- */
753
- descriptionEnv?: string;
754
- /**
755
- * Scoped env names that should be copied into the child thread for each
756
- * immediate instance group.
757
- */
758
- scopedEnv?: string[];
759
- };
760
- /**
761
- * Optional branch flag env name.
762
- *
763
- * When set, this subagent is only enabled when the named env resolves to
764
- * `true`, `1`, or `yes` (case-insensitive).
765
- */
766
- optional?: string;
767
- /**
768
- * Resumability configuration.
769
- *
770
- * - `false` (default): Non-resumable subagent
771
- * - Object: Resumable subagent with message routing and instance limits
772
- *
773
- * When resumable mode is enabled, runtimes SHOULD provide a built-in create
774
- * and message lifecycle interface instead of exposing raw agent callables for
775
- * new instance creation.
776
- */
777
- resumable?: false | {
778
- /**
779
- * Which side of the child `dual_ai` conversation receives parent messages.
780
- *
781
- * - `side_a`: Messages are queued as `role: 'user'`
782
- * - `side_b`: Messages are queued as `role: 'assistant'`
783
- */
784
- receives_messages: 'side_a' | 'side_b';
785
- /**
786
- * Maximum concurrent instances for this subagent tool.
787
- *
788
- * When reached, implementations may remove this tool from subsequent LLM
789
- * requests and route new messages to existing instances.
790
- *
791
- * @default unlimited
792
- */
793
- maxInstances?: number;
794
- /**
795
- * How this child reports back to its parent.
796
- *
797
- * - `implicit` (default): Child completion is automatically queued to the parent.
798
- * - `explicit`: The runtime does not auto-queue child completion; tools/hooks may
799
- * use thread APIs such as `state.notifyParent()` when they choose to escalate.
800
- */
801
- parentCommunication?: 'implicit' | 'explicit';
802
- };
+ {
+   name: 'ecommerce_assistant',
+   toolDescription: 'Assistant for ecommerce businesses.',
+   prompt: [
+     { type: 'text', content: 'You help customers with products and orders. Current inventory: ' },
+     { type: 'env', property: 'PRODUCT_INVENTORY' },
+   ],
+   model: 'tiny',
+   variables: [
+     {
+       name: 'PRODUCT_INVENTORY',
+       type: 'text',
+       required: true,
+       description: 'Full product inventory with names, descriptions, and stock levels.',
+     },
+   ],
803
460
  }
804
461
  ```
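
To make the `env` part concrete, here is a minimal sketch of how a runtime might render such a prompt for one thread. The `PromptPart` and `VariableDef` types and the `renderPrompt` function are hypothetical names introduced for illustration; they are not the spec's API, and the real runtime may resolve values differently.

```typescript
// Hypothetical shapes for illustration only; the real types live in the spec package.
type PromptPart =
  | { type: 'text'; content: string }
  | { type: 'env'; property: string };

interface VariableDef {
  name: string;
  type: 'text' | 'secret';
  required?: boolean;
}

// Renders prompt parts against per-thread env values.
// Throws when a required variable is missing, and refuses to render secrets,
// since secret variables must never reach the model.
function renderPrompt(
  parts: PromptPart[],
  variables: VariableDef[],
  env: Record<string, string>,
): string {
  for (const v of variables) {
    if (v.required && env[v.name] === undefined) {
      throw new Error(`Missing required variable: ${v.name}`);
    }
  }
  return parts
    .map((part) => {
      if (part.type === 'text') return part.content;
      const def = variables.find((v) => v.name === part.property);
      if (def?.type === 'secret') {
        throw new Error(`Secret variable ${part.property} must not be rendered in a prompt`);
      }
      return env[part.property] ?? '';
    })
    .join('');
}

// The prompt and variables from the example above.
const examplePrompt: PromptPart[] = [
  { type: 'text', content: 'You help customers with products and orders. Current inventory: ' },
  { type: 'env', property: 'PRODUCT_INVENTORY' },
];
const exampleVariables: VariableDef[] = [
  { name: 'PRODUCT_INVENTORY', type: 'text', required: true },
];
```

Two threads that supply different `PRODUCT_INVENTORY` values would get different rendered prompts from the same agent definition, which is the specialization the paragraph above describes.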
805
462
 
806
- 1. Parents always create children.
807
- 1. Parents explicitly create children by calling the tool `subagent_create`.
808
- 2. Parents implicitly create children by having a subagent tool call with `immediate: { ... }` which creates a child thread as soon as the parent thread is activated, without the parent having to explicitly call the tool.
809
- 2. Children only communicate back to their parents:
810
- 1. Implicit communication: the child thread automatically queues a message to the parent thread when it ends a "session". A session ends, typically, when one side calls the tools assigned to `sessionStop` or `sessionFail` properties in the agent definition, or the `maxSessionTurns` is reached. All subagents communicate implicitly, unless they are resumable and have `resumable.parentCommunication` set to `explicit`.
811
- 2. Explicit communication: the child thread never communicates with the parent unless `state.notifyParent()` is called. Typically this is done in the tools that are given to the child thread. These calls are independent from session management. Subagents that receive a lot of inbound traffic, for example a Slack subagent, may want to use explicit communication to have more control over when the parent is notified and reduce noise.
812
- 3. Resumable subagents can receive messages from their parents. Which side receives the message is indicated by the `resumable.receives_messages`.
813
- 4. When sending messages back to the parent, subagents MUST indicate if they require the parent to provide a response or not (in plain english, for example "After researching you must provide a response back so I can continue with sending this email."). For example, a Gmail subagent may ask its parent for guidance on how to respond to an email regarding company policy, the coordinator may ask another subagent with expertise in legal matters and company policy for advice. The legal subagent does not need a response, it is just providing information, but the gmail subagent does require a response in order to proceed. Thus it is critical that the gmail subagent indicates to the parent that it needs a response in order to continue.
814
- 5. File attachments are represented by simple strings. Anytime a file is added to a standard agent's filesystem by generation or upload it is given an explicit attachment "path" `/attachments/{filename}.{ext}`. This MUST be used anytime an agent is coordinating with subagents. The tool definition for a subagent can indicate an `initAttachmentsProperty`, which should be an array of strings — if these strings are valid attachments in the parent's file system, then those attachments will be copied into the subagent's file system and attached along with the `initUserMessageProperty` when the subagent is created (note: this also is true for sub-prompts).
815
- 6. Resumable agents communicate from one agent to another via "silent" user messages. These messages indicate which agent instance they are from (via uuid) as well as the content of the message. The parent agent can then decide what to do with the message, whether to respond to it, or just use the information in the message to make a decision.
816
- 7. If a message is sent to a parent agent or a sub agent that is currently busy, it will be queued and sent when the agent is free.
817
- 8. When writing the language for prompts that include resumable subagents, ensure you clearly describe when the subagent should be created (assuming it's not immediate) via `subagent_create` vs when it should just be given a message via `subagent_message`. For example, if you have a research subagent that is researching a given topic, and another subagent needs additional information on that topic, just send a message back to the same resumable subagent, so its existing context can be useful. But if it's requesting information on a totally new topic, then perhaps it should use `subagent_create`.
463
+ ### `AgentDefinition` cheat sheet
818
464
 
465
+ The agent definition binds prompts and sides together. It is set in each file under `agents/agents/`. For full signatures, see `agents/agents/AGENTS.md`.
819
466
 
820
- ## Coordinators and the agent graph
+ ```
+ AgentDefinition
+   name              string                    unique snake_case identifier
+   type              'ai_human' | 'dual_ai'    default 'ai_human'; but see "dual_ai is the default"
+   maxSessionTurns   number                    REQUIRED for dual_ai. Finite turn cap.
+   sideA             SideConfig                AI side (or first AI in dual_ai)
+   sideB             SideConfig                second AI side; required for dual_ai
+   exposeAsTool      boolean (default false)   enables this agent to be called as a tool by other prompts
+   toolDescription   string                    required if exposeAsTool: true
+   description       string                    brief human description for UIs
+   icon              string                    URL or absolute path
+   env               Record<string, string>    agent-level env values
+   hooks             HookId[]                  agent-scoped hooks (when prompts don't specify their own)
+ ```
821
481
 
822
- The coordinator pattern is akin to the actor model. It can work well when a number of related subagents need to work together. Coordinators provide two significant benefits:
482
+ `SideConfig` is where you bind `defaultPrompt`, `defaultModel`, and the **session lifecycle bindings**:
823
483
 
824
- 1. Provide subagent communication for similar domains. For example, a coding agent with a `research_agent` and a `bash_agent`. The coordinator could request the `research_agent` research a given operation. Once the `research_agent` completes its task, it communicates back to the coordinator. The coordinator, being satisfied with the results, can then instruct the `bash_agent` to run certain commands and provide the results. In this scenario, the bash subagent didn't even need to know the research subagent existed in order to benefit from it.
825
- 2. Filtering. Imagine a `gmail_agent`. On its own it can probably do a pretty good job of filtering out spam. But the `gmail_agent` shouldn't be responsible for the higher-level objectives of an organization. A coordinator that sits above a `gmail_agent` could have directives for what kind of email really matters to the CEO of the sprinkler hose company and can further reduce clutter before it passes email results to it's parent.
+ - `sessionStop`: name of the tool whose call ends the session with a result
+ - `sessionFail`: name of the tool whose call ends the session with a failure
+ - `sessionStatus`: optional; a tool used to update the session status mid-run
826
487
 
827
- The agent graph is critical to get right. The deeper a graph, the more latency is introduced and the more chances for inter-agent communication to fail. However, deep graphs can also allow for better parallelization. A `marketing_communications_agent` may have a social media subagent: `social_media_agent` which may in turn have a `twitter_agent`, `linkedin_agent`, and `facebook_agent` as subagents. This subtree can perform many of its functions without ever even involving the top-level coordinator, allowing it additional flexibility.
488
+ These are the bindings the [Session boundary discipline](#session-boundary-discipline) section refers to. Every `dual_ai` agent must set them.
828
489
 
829
- ## Hooks
490
+ Packaging fields (`packageName`, `version`, `author`, `license`) exist for the agent packing system; ignore them unless you're publishing.
830
491
 
831
- Hooks are a powerful mechanism for extending the functionality of agents without having to modify their core logic. They can be used for a variety of purposes, such as logging, monitoring, modifying inputs/outputs, and more. Hooks are defined via `defineHook` and can be referenced in both agent and prompt definitions by their unique `id`.
492
+ ### `SubagentToolConfig` cheat sheet
832
493
 
833
- A common use for a hook is to truncate the conversation history, or inject artificial tool calls into the conversation. This can be useful, for example, to give a model awareness of the real time world clock, or to provide additional context without an explicit tool call. The following hooks are supported:
494
+ This is where parent/child architecture is actually configured: it lives on the *parent prompt's* `tools` array. Knowing that every field exists is essential; full signatures live in the spec types (`node_modules/@standardagents/spec/dist/`, or `packages/spec/src/` on GitHub).
834
495
 
835
- | Hook | Execution Point | Purpose |
836
- |------|-----------------|---------|
837
- | filter_messages | Before LLM context assembly | Filter/transform message history |
838
- | prefilter_llm_history | After context assembly | Final adjustments before LLM request |
839
- | before_create_message | Before message insert | Transform message before storage |
840
- | after_create_message | After message insert | Side effects after storage |
841
- | before_update_message | Before message update | Transform update data |
842
- | after_update_message | After message update | Side effects after update |
843
- | before_store_tool_result | Before tool result storage | Transform tool results |
844
- | after_tool_call_success | After successful tool call | Post-process success results |
845
- | after_tool_call_failure | After failed tool call | Handle/recover from errors |
+ ```
+ SubagentToolConfig: entry on a parent prompt's `tools` array
+   name                      string                    dual_ai agent to invoke (must have exposeAsTool: true)
+   blocking                  boolean (default true)    parent waits for result, vs. fire-and-forget
+   immediate                 bool | object             spawn child when prompt activates, before any LLM step
+                                                       object form: { nameEnv, descriptionEnv, scopedEnv }
+                                                       scopedEnv values are copied into the child but NOT
+                                                       exposed to the bootstrap model unless named in
+                                                       nameEnv/descriptionEnv
+   resumable                 false | object            false = tool-call style (default)
+                                                       object = long-lived addressable child; runtime
+                                                       exposes built-in subagent_create / subagent_message
+     receives_messages       'side_a' | 'side_b'       which side hears parent messages
+                                                       side_a = queued as 'user'
+                                                       side_b = queued as 'assistant'
+     maxInstances            number                    cap on concurrent instances; when reached the tool
+                                                       is hidden and new messages route to existing instances
+     parentCommunication     'implicit' | 'explicit'   implicit (default) auto-queues completion to parent
+                                                       explicit requires state.notifyParent() in code
+   initUserMessageProperty   schema field name         tool arg used as child's first user message
+   initAttachmentsProperty   schema field name         tool arg holding /attachments/* paths to copy to child
+   initAgentNameProperty     schema field name         tool arg used to tag the child thread (UI label)
+   optional                  string (env name)         subagent disabled unless this env resolves truthy
+ ```
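
As an illustration of how these fields combine, the sketch below declares a resumable child with explicit parent communication. `SubagentToolConfigSketch` is a local, simplified mirror of the fields listed above (the authoritative type is the one exported by the spec package), and the agent name `slack_agent`, the argument names `task` and `channel_name`, and the env name `ENABLE_SLACK` are all hypothetical. The `isOptionalEnabled` helper assumes the truthy values named in the 0.14.0 type comments (`true`, `1`, or `yes`, case-insensitive); the runtime's actual check may differ.

```typescript
// Simplified local mirror of the cheat-sheet fields, for illustration only.
interface SubagentToolConfigSketch {
  name: string;
  blocking?: boolean;
  immediate?: boolean | { nameEnv?: string; descriptionEnv?: string; scopedEnv?: string[] };
  resumable?: false | {
    receives_messages: 'side_a' | 'side_b';
    maxInstances?: number;
    parentCommunication?: 'implicit' | 'explicit';
  };
  initUserMessageProperty?: string;
  initAttachmentsProperty?: string;
  initAgentNameProperty?: string;
  optional?: string;
}

// Hypothetical entry for a parent prompt's `tools` array: a long-lived Slack
// child that only notifies the parent when its own code decides to.
const slackMonitor: SubagentToolConfigSketch = {
  name: 'slack_agent',
  resumable: {
    receives_messages: 'side_a', // parent messages are queued as 'user'
    maxInstances: 3,
    parentCommunication: 'explicit',
  },
  initUserMessageProperty: 'task',
  initAgentNameProperty: 'channel_name',
  optional: 'ENABLE_SLACK',
};

// Assumed truthiness rule for `optional`: enabled when the named env is
// 'true', '1', or 'yes' (case-insensitive). Configs without `optional`
// are always enabled.
function isOptionalEnabled(
  config: SubagentToolConfigSketch,
  env: Record<string, string>,
): boolean {
  if (config.optional === undefined) return true;
  const raw = env[config.optional];
  return raw !== undefined && ['true', '1', 'yes'].includes(raw.toLowerCase());
}
```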
846
520
 
847
- Note: Do be careful when applying hooks to ensure matching tool calls and results are not separated or truncated off as this can lead to errors in many models.
521
+ A second tool-config form exists for plain callables (`{ name, env, options }`), and a third for subprompts (`{ name, includeTextResponse, includeToolCalls, includeErrors, initUserMessageProperty, initAttachmentsProperty }`). Both are documented in full at `agents/prompts/AGENTS.md`.
848
522
 
849
- ## Variables and Environment
523
+ ### Inter-agent communication rules
850
524
 
851
- Variables allow tools, prompts, and agents to specify dynamic content they require to function properly. For example a `weather_agent` may have a variable `LOCATION` that it needs in order to provide accurate weather information. Variables can be one of two types: `text` or `secret`. Text variables are simple strings that can be used for any purpose. Secret variables are encrypted and MUST only be used in tools to prevent exposing them to a model. For example, an `GMAIL_API_KEY` variable should be of type `secret` and only used in a tool that makes API calls, it should never be included in a prompt or exposed to the model in any way.
525
+ These are correct as written in the spec; internalize them.
852
526
 
853
- When creating a new thread, all required variables used within the agent graph must be provided. Variables can have their values provided by prompts, tools, or threads. An instance of a variable is called an "environment variable" or `env` whereas the definition of the variable is just called a variable.
+ 1. **Parents always create children.**
+    - Explicitly via the built-in `subagent_create` tool.
+    - Implicitly via `immediate: { ... }`: the child spawns the moment the parent prompt activates, before any LLM step.
+ 2. **Children only communicate back to their parents.** Two flavors:
+    - **Implicit**: the child auto-queues a message to the parent when the session ends (via `sessionStop`, `sessionFail`, or `maxSessionTurns`). Default for all subagents.
+    - **Explicit**: only when `state.notifyParent()` is called. Set with `resumable.parentCommunication: 'explicit'`. Use for high-traffic resumables (e.g., a Slack monitor) where you want control over when the parent is interrupted.
+ 3. **Resumable subagents can receive messages from their parents.** `resumable.receives_messages` chooses which side hears them.
+ 4. **When a child sends back to the parent, it MUST indicate whether it needs a response**, in plain English, e.g. "After researching, you must respond so I can continue with the email." A research subagent providing FYI info doesn't need a response; a gmail subagent waiting on guidance does.
+ 5. **File attachments use path strings.** `/attachments/{filename}.{ext}`. Use this convention everywhere; the runtime copies attachments across thread filesystems automatically when listed in `initAttachmentsProperty`.
+ 6. **Resumable agents communicate via "silent" user messages** that carry the source instance UUID. The receiving agent decides whether to respond or just absorb the information.
+ 7. **Messages to busy agents are queued** and delivered when the agent is free.
+ 8. **Prompt language must distinguish `subagent_create` from `subagent_message`.** When writing prompts that include resumable subagents, explicitly tell the model: *"To research a new topic, use `subagent_create`. To follow up on a topic an existing instance is already researching, use `subagent_message`."* Without this guidance, models will pick wrong.
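
Rules 2 and 4 can be combined in a small helper used by a child's tools. The sketch below assumes only the `notifyParent(content: string): Promise<void>` method from `ThreadState`; the `ParentNotifier` interface, the helper names, and the exact wording of the directive are hypothetical choices, not something the spec prescribes.

```typescript
// Minimal slice of ThreadState needed here; only notifyParent is assumed.
interface ParentNotifier {
  notifyParent(content: string): Promise<void>;
}

// Builds a parent-bound message that states, in plain English, whether the
// child is blocked waiting on a reply (rule 4).
function formatParentMessage(summary: string, needsResponse: boolean, reason?: string): string {
  const directive = needsResponse
    ? `RESPONSE REQUIRED: ${reason ?? 'I cannot continue until you reply.'}`
    : 'NO RESPONSE NEEDED: this is informational only.';
  return `${summary}\n\n${directive}`;
}

// Explicit communication (rule 2): the child decides when to interrupt the parent.
async function escalateToParent(
  state: ParentNotifier,
  summary: string,
  needsResponse: boolean,
  reason?: string,
): Promise<string> {
  const content = formatParentMessage(summary, needsResponse, reason);
  await state.notifyParent(content);
  return content;
}
```

A gmail child waiting on policy guidance would call `escalateToParent(state, summary, true, 'I need guidance before replying to this email.')`, while a research child reporting findings would pass `false`.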
854
539
 
855
- ## ThreadState
540
+ ---
856
541
 
857
- The `ThreadState` object is a powerful tool that is passed to all callables, hooks, and endpoints. It provides access to the current thread's context, including its messages, tools, variables, filesystem, and more. To increase portability, callable tools should only use the `ThreadState` APIs for interacting with the execution environment. For example, rather than using `node:fs` to read files, use `state.readFile()` which will work regardless of the underlying runtime environment.
542
+ ## Hooks
858
543
 
859
- ```ts
860
- /**
861
- * Thread state interface - the unified API for thread interactions.
862
- * Available to tools, hooks, and endpoints.
863
- */
864
- interface ThreadState {
865
- // ─────────────────────────────────────────────────────────────────────────
866
- // Identity (readonly)
867
- // ─────────────────────────────────────────────────────────────────────────
868
- readonly threadId: string;
869
- readonly agentId: string;
870
- readonly userId: string | null;
871
- readonly createdAt: number;
872
- readonly children: SubagentRegistryEntry[];
873
- readonly terminated: number | null;
874
-
875
- // ─────────────────────────────────────────────────────────────────────────
876
- // Messages
877
- // ─────────────────────────────────────────────────────────────────────────
878
- getMessages(options?: GetMessagesOptions): Promise<MessagesResult>;
879
- getMessage(messageId: string): Promise<Message | null>;
880
- injectMessage(input: InjectMessageInput): Promise<Message>;
881
- queueMessage(input: QueueMessageInput): Promise<void>;
882
- updateMessage(messageId: string, updates: MessageUpdates): Promise<Message>;
883
-
884
- // ─────────────────────────────────────────────────────────────────────────
885
- // Logs
886
- // ─────────────────────────────────────────────────────────────────────────
887
- getLogs(options?: GetLogsOptions): Promise<Log[]>;
888
-
889
- // ─────────────────────────────────────────────────────────────────────────
890
- // Resource Loading
891
- // ─────────────────────────────────────────────────────────────────────────
892
- loadModel<T = unknown>(name: string): Promise<T>;
893
- loadPrompt<T = unknown>(name: string): Promise<T>;
894
- loadAgent<T = unknown>(name: string): Promise<T>;
895
- getChildThread(referenceId: string): Promise<ThreadState | null>;
896
- getParentThread(): Promise<ThreadState | null>;
897
- getPromptNames(): string[];
898
- getAgentNames(): string[];
899
- getModelNames(): string[];
900
- env(propertyName: string): Promise<string>;
901
- setEnv(propertyName: string, value: string): Promise<void>;
902
- notifyParent(content: string): Promise<void>;
903
- setStatus(status: string): Promise<void>;
904
-
905
- // ─────────────────────────────────────────────────────────────────────────
906
- // Tool Invocation
907
- // ─────────────────────────────────────────────────────────────────────────
908
- queueTool(toolName: string, args: Record<string, unknown>): void;
909
- invokeTool(toolName: string, args: Record<string, unknown>): Promise<ToolResult>;
910
-
911
- // ─────────────────────────────────────────────────────────────────────────
912
- // Effect Scheduling
913
- // ─────────────────────────────────────────────────────────────────────────
914
- scheduleEffect(name: string, args: Record<string, unknown>, delay?: number): Promise<string>;
915
- getScheduledEffects(name?: string): Promise<ScheduledEffect[]>;
916
- removeScheduledEffect(id: string): Promise<boolean>;
917
-
918
- // ─────────────────────────────────────────────────────────────────────────
919
- // Events
920
- // ─────────────────────────────────────────────────────────────────────────
921
- emit(event: string, data: unknown): void;
922
-
923
- // ─────────────────────────────────────────────────────────────────────────
924
- // Context Storage
925
- // ─────────────────────────────────────────────────────────────────────────
926
- context: Record<string, unknown>;
927
-
928
- // ─────────────────────────────────────────────────────────────────────────
929
- // File System
930
- // ─────────────────────────────────────────────────────────────────────────
931
- writeFile(path: string, data: ArrayBuffer | string, mimeType: string, options?: WriteFileOptions): Promise<FileRecord>;
932
- readFile(path: string): Promise<ArrayBuffer | null>;
933
- readFileStream(path: string, options?: ReadFileStreamOptions): Promise<AsyncIterable<FileChunk> | null>;
934
- statFile(path: string): Promise<FileRecord | null>;
935
- readdirFile(path: string): Promise<ReaddirResult>;
936
- unlinkFile(path: string): Promise<void>;
937
- mkdirFile(path: string): Promise<FileRecord>;
938
- rmdirFile(path: string): Promise<void>;
939
- getFileStats(): Promise<FileStats>;
940
- grepFiles(pattern: string): Promise<GrepResult[]>;
941
- findFiles(pattern: string): Promise<FindResult>;
942
- getFileThumbnail(path: string): Promise<ArrayBuffer | null>;
943
-
944
- // ─────────────────────────────────────────────────────────────────────────
945
- // Execution State
946
- // ─────────────────────────────────────────────────────────────────────────
947
- execution: ExecutionState | null;
948
- terminate(): Promise<void>;
949
-
950
- // ─────────────────────────────────────────────────────────────────────────
951
- // Runtime Context (Non-Portable)
952
- // ─────────────────────────────────────────────────────────────────────────
953
- readonly _notPackableRuntimeContext?: Record<string, unknown>;
954
- }
955
- ```
544
+ Hooks extend agent behavior without modifying core logic. Defined via `defineHook`, referenced by `id` from agent or prompt definitions. Common uses: truncating history, injecting synthetic tool calls (e.g., real-time clock awareness), logging, and adapting tool results.
545
+
+ | Hook | Execution Point | Purpose |
+ |---|---|---|
+ | `filter_messages` | Before LLM context assembly | Filter/transform message history |
+ | `prefilter_llm_history` | After context assembly | Final adjustments before LLM request |
+ | `before_create_message` | Before message insert | Transform message before storage |
+ | `after_create_message` | After message insert | Side effects after storage |
+ | `before_update_message` | Before message update | Transform update data |
+ | `after_update_message` | After message update | Side effects after update |
+ | `before_store_tool_result` | Before tool result storage | Transform tool results |
+ | `after_tool_call_success` | After successful tool call | Post-process success results |
+ | `after_tool_call_failure` | After failed tool call | Handle/recover from errors |
557
+
558
+ > **Caution:** Hooks that filter or truncate messages must keep matching tool calls and tool results together. Separating them produces hard-to-debug failures in many models.
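
The caution above can be sketched as a pure truncation function of the kind a `filter_messages` hook might call. The `SketchMessage` shape (with `toolCallIds` and `toolCallId`) is an assumption loosely modeled on common chat formats, not the spec's message type, and wiring the function into `defineHook` is deliberately left out because its signature is not shown here.

```typescript
// Hypothetical message shape for illustration; the spec's Message type differs.
interface SketchMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
  toolCallIds?: string[]; // set on assistant messages that issue tool calls
  toolCallId?: string;    // set on tool-result messages
}

// Keeps at most the last `maxMessages` messages (a non-negative integer),
// then drops any tool result whose originating tool call fell outside the
// kept window, so no result is ever sent to the model without its call.
function truncateKeepingToolPairs(
  messages: SketchMessage[],
  maxMessages: number,
): SketchMessage[] {
  if (messages.length <= maxMessages) return messages.slice();
  const kept = messages.slice(messages.length - maxMessages);

  // Collect the ids of tool calls that survive truncation.
  const issued = new Set<string>();
  for (const m of kept) {
    for (const id of m.toolCallIds ?? []) issued.add(id);
  }

  // Remove orphaned tool results.
  return kept.filter(
    (m) => m.role !== 'tool' || (m.toolCallId !== undefined && issued.has(m.toolCallId)),
  );
}
```

Because the cut is taken from the tail, a surviving tool call always has its results after it in the window; only the opposite case (a result without its call) needs repair.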
956
559
 
560
+ ## Variables and environment
957
561
 
958
- ## Further reading
562
+ Variables let tools, prompts, and agents declare dynamic values they need. Two types:
959
563
 
960
- To get an even more detailed understanding of how to create agents, read the following:
+ - **`text`**: simple string. Safe to render in prompts.
+ - **`secret`**: encrypted; **MUST only be used inside tools**. Never reference a secret in prompt text and never return it to the model. A `GMAIL_API_KEY` is `secret`; a `LOCATION` is `text`.
961
566
 
962
- Specification Docummentation: https://standardagentspec.org/llms.txt
963
- Agent Builder Documentation: https://docs.standardagentbuilder.com/llms.txt
567
+ When a thread is created, all required variables in the agent graph must be provided. The instance of a variable on a thread is called an "environment variable" or `env`. Resolution precedence (low → high): prompt → tool → agent → thread.
964
568
 
569
+ Scoped variables (`scoped: true`) do not inherit from parent thread env — they reset for the declaring agent's subtree. Use this when a subagent must run with different config from its parent (e.g., a per-instance Slack channel ID).
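
The resolution precedence stated above (prompt, then tool, then agent, then thread, lowest to highest) can be pictured as a layered merge. This is a sketch only: `EnvLayers` and `resolveEnv` are hypothetical names, and scoped variables are not modeled here.

```typescript
type EnvLayer = Record<string, string>;

// One optional layer per source, named after the precedence list above.
interface EnvLayers {
  prompt?: EnvLayer;
  tool?: EnvLayer;
  agent?: EnvLayer;
  thread?: EnvLayer;
}

// Later spreads win, so the order below encodes low-to-high precedence:
// prompt < tool < agent < thread.
function resolveEnv(layers: EnvLayers): EnvLayer {
  return {
    ...layers.prompt,
    ...layers.tool,
    ...layers.agent,
    ...layers.thread,
  };
}
```

Under this reading, a `LOCATION` set on the thread overrides a default supplied by the prompt, while values defined only at lower layers pass through unchanged.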
570
+
571
+ ---
965
572
 
966
573
  ## Implementation checking
967
574
 
968
- When you edit `agents/`, `prompts/`, `models/`, `tools/`, `hooks/`, or `worker/`,
969
- validate before claiming the change is done.
970
-
971
- Validation order:
972
- 1. Read `package.json` and prefer project scripts if they exist.
973
- 2. Refresh Cloudflare types:
974
- - use `pnpm cf-typegen` if that script exists
975
- - otherwise run `npx wrangler types`
976
- 3. Regenerate AgentBuilder types by running the project's build command:
977
- - usually `pnpm build`, `npm run build`, `pnpm run dev` or equivalent
978
- 4. Run the project's type checker:
979
- - prefer `pnpm type-check` / `npm run type-check` if present
980
- - otherwise use the installed checker directly, such as `pnpm exec vue-tsc --build`, `pnpm exec tsc -b`, or `pnpm exec tsc --noEmit`
981
- 5. If any validation step cannot run, state exactly what is missing.
575
+ When you edit `agents/`, `prompts/`, `models/`, `tools/`, `hooks/`, or `worker/`, validate before claiming the change is done.
576
+
577
+ Validation order:
578
+
+ 1. Read `package.json` and prefer project scripts if they exist.
+ 2. Refresh Cloudflare types:
+    - use `pnpm cf-typegen` if that script exists
+    - otherwise run `npx wrangler types`
+ 3. Regenerate AgentBuilder types by running the project's build command:
+    - usually `pnpm build`, `npm run build`, `pnpm run dev`, or equivalent
+ 4. Run the project's type checker:
+    - prefer `pnpm type-check` / `npm run type-check` if present
+    - otherwise use the installed checker directly: `pnpm exec vue-tsc --build`, `pnpm exec tsc -b`, or `pnpm exec tsc --noEmit`
+ 5. If any validation step cannot run, state exactly what is missing.