@mantyx/sdk 0.10.1 → 0.12.0

@@ -16,7 +16,7 @@ Companion documents:
16
16
 
17
17
  ## 1. Concepts
18
18
 
19
- **Ephemeral agent.** A run-time agent that is *defined by the request* rather
19
+ **Ephemeral agent.** A run-time agent that is _defined by the request_ rather
20
20
  than persisted as a row in MANTYX's `Agent` table. The full spec (system
21
21
  prompt, model, tools) is stored as part of each session/run for observability
22
22
  but is not editable from the dashboard.
@@ -24,15 +24,15 @@ but is not editable from the dashboard.
24
24
  **Tool refs.** Seven flavours, all carried inside the agent spec's `tools`
25
25
  array:
26
26
 
27
- | `kind` | Resolved by | Notes |
28
- | ---------------- | ----------- | ----- |
29
- | `mantyx` | server | A workspace `Tool` row referenced by id (HTTP / Code / Plugin). |
30
- | `mantyx_plugin` | server | A platform plugin tool referenced by name. |
31
- | `local` | client | A custom tool defined and executed in the SDK's process. Carries `parameters` (input JSON Schema) plus optional `outputSchema` (return-value JSON Schema) and `longRunning` flag — see §4.1.1. |
32
- | `a2a` | server | A *remote* Agent2Agent peer MANTYX can reach; invoked via `message/send` and the reply is surfaced as the tool result. |
33
- | `a2a_local` | client | An A2A peer MANTYX **cannot** reach. SDK resolves the [Agent Card](https://google.github.io/A2A/specification/#agent-card) locally and ships it inline; MANTYX uses it for the model description and routes calls back to the SDK over SSE. |
34
- | `mcp` | server | A *remote* MCP server (Streamable HTTP). At run start MANTYX lists the catalog and exposes every tool as `<server>_<tool>` (subject to `toolFilter`). |
35
- | `mcp_local` | client | An MCP server MANTYX **cannot** reach. SDK runs `Initialize` + `tools/list` locally and ships the resolved `Tool[]` (with `inputSchema`); MANTYX exposes them to the model with the SDK-declared names and routes calls back over SSE. |
27
+ | `kind` | Resolved by | Notes |
28
+ | --------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
29
+ | `mantyx` | server | A workspace `Tool` row referenced by id (HTTP / Code / Plugin). |
30
+ | `mantyx_plugin` | server | A platform plugin tool referenced by name. |
31
+ | `local` | client | A custom tool defined and executed in the SDK's process. Carries `parameters` (input JSON Schema) plus optional `outputSchema` (return-value JSON Schema) and `longRunning` flag — see §4.1.1. |
32
+ | `a2a` | server | A _remote_ Agent2Agent peer MANTYX can reach; invoked via `message/send` and the reply is surfaced as the tool result. |
33
+ | `a2a_local` | client | An A2A peer MANTYX **cannot** reach. SDK resolves the [Agent Card](https://google.github.io/A2A/specification/#agent-card) locally and ships it inline; MANTYX uses it for the model description and routes calls back to the SDK over SSE. |
34
+ | `mcp` | server | A _remote_ MCP server (Streamable HTTP). At run start MANTYX lists the catalog and exposes every tool as `<server>_<tool>` (subject to `toolFilter`). |
35
+ | `mcp_local` | client | An MCP server MANTYX **cannot** reach. SDK runs `Initialize` + `tools/list` locally and ships the resolved `Tool[]` (with `inputSchema`); MANTYX exposes them to the model with the SDK-declared names and routes calls back over SSE. |
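The table above can be read as a lookup from `kind` to the side that resolves the tool. The sketch below encodes that lookup and the `<server>_<tool>` naming for remote MCP. The helper names (`clientResolvedKinds`, `exposedMcpNames`) are hypothetical and not part of the SDK, and it assumes `toolFilter` matches the unprefixed tool names, as the spec example with `"name": "github"` suggests.

```typescript
// The seven `kind` values are taken from the table; the rest is illustrative.
type ToolKind =
  | "mantyx"
  | "mantyx_plugin"
  | "local"
  | "a2a"
  | "a2a_local"
  | "mcp"
  | "mcp_local";

type Resolver = "server" | "client";

const RESOLVED_BY: Record<ToolKind, Resolver> = {
  mantyx: "server",
  mantyx_plugin: "server",
  local: "client",
  a2a: "server",
  a2a_local: "client",
  mcp: "server",
  mcp_local: "client",
};

// Hypothetical helper: list the kinds in a spec that the SDK must service itself.
function clientResolvedKinds(tools: { kind: ToolKind }[]): ToolKind[] {
  return tools.map((t) => t.kind).filter((k) => RESOLVED_BY[k] === "client");
}

// Hypothetical helper: model-facing names for a remote `mcp` ref.
// Assumption: `toolFilter` is an allowlist over the unprefixed catalog names.
function exposedMcpNames(
  server: string,
  catalog: string[],
  toolFilter?: string[],
): string[] {
  const allowed = toolFilter
    ? catalog.filter((t) => toolFilter.indexOf(t) !== -1)
    : catalog;
  return allowed.map((t) => `${server}_${t}`);
}
```

Keeping the mapping in one `Record` means a new `kind` fails to compile until its resolver is declared.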
36
36
 
37
37
  The split is deliberate:
38
38
 
@@ -42,7 +42,7 @@ The split is deliberate:
42
42
  MCP/A2A this also means MANTYX does discovery (`listTools`, agent-card
43
43
  fetch).
44
44
  - **Client-resolved / "local"** (`local`, `a2a_local`, `mcp_local`) —
45
- MANTYX has *no* access to the resource. The SDK does **all** of the
45
+ MANTYX has _no_ access to the resource. The SDK does **all** of the
46
46
  work: connection, discovery, listing, expansion, arg validation, auth,
47
47
  execution, retries. MANTYX is a thin LLM-routing layer that emits a
48
48
  `local_tool_call` event and blocks until the SDK POSTs back to
@@ -52,9 +52,9 @@ The split is deliberate:
52
52
 
53
53
  **One-shot run vs. session.** A run is an LLM execution. Runs may be:
54
54
 
55
- - *one-shot* (`POST /agent-runs`) — fire-and-stream, no persistent state apart
55
+ - _one-shot_ (`POST /agent-runs`) — fire-and-stream, no persistent state apart
56
56
  from observability.
57
- - *session-scoped* (`POST /agent-sessions/:id/messages`) — the run inherits the
57
+ - _session-scoped_ (`POST /agent-sessions/:id/messages`) — the run inherits the
58
58
  session's full message history, and the new user/assistant turns are
59
59
  appended back to the session on success.
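As a minimal sketch of the two run flavours, the function below picks the endpoint for each. `RunTarget` and `runEndpoint` are hypothetical names, and the paths are relative to a workspace base URL that this excerpt does not show.

```typescript
// Illustrative only: the two paths come from the list above.
type RunTarget =
  | { mode: "one-shot" }
  | { mode: "session"; sessionId: string };

function runEndpoint(target: RunTarget): { method: "POST"; path: string } {
  if (target.mode === "one-shot") {
    // Fire-and-stream: nothing is carried over between calls.
    return { method: "POST", path: "/agent-runs" };
  }
  // Session-scoped: the run inherits the session's message history.
  const id = encodeURIComponent(target.sessionId);
  return { method: "POST", path: `/agent-sessions/${id}/messages` };
}
```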
60
60
 
@@ -75,10 +75,10 @@ Authorization: Bearer <credential>
75
75
  X-API-Key: <credential>
76
76
  ```
77
77
 
78
- | Credential | Token format | Identifies | Bound to | Use when |
79
- | ------------------------- | --------------- | ------------------------ | ----------------------- | -------- |
80
- | **Workspace API key** | `mantyx_…` | The workspace | One workspace, no end-user | Personal scripts, internal automations, anything the SDK caller owns end-to-end. |
81
- | **OAuth 2.0 access token**| `mantyx_at_…` | An end user **and** the workspace they consented for | One workspace, one user (or one app for `client_credentials`) | "Sign in with MANTYX" apps, third-party integrations, anywhere consent + scopes matter. |
78
+ | Credential | Token format | Identifies | Bound to | Use when |
79
+ | -------------------------- | ------------- | ---------------------------------------------------- | ------------------------------------------------------------- | --------------------------------------------------------------------------------------- |
80
+ | **Workspace API key** | `mantyx_…` | The workspace | One workspace, no end-user | Personal scripts, internal automations, anything the SDK caller owns end-to-end. |
81
+ | **OAuth 2.0 access token** | `mantyx_at_…` | An end user **and** the workspace they consented for | One workspace, one user (or one app for `client_credentials`) | "Sign in with MANTYX" apps, third-party integrations, anywhere consent + scopes matter. |
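Both token formats in the table start with `mantyx_`, so any prefix check has to test the longer `mantyx_at_` prefix first. The sketch below only illustrates that ordering. `classifyCredential` is a hypothetical helper, not the server's actual resolver.

```typescript
type CredentialKind = "oauth_access_token" | "workspace_api_key" | "unknown";

// True when `value` begins with `prefix`.
function hasPrefix(value: string, prefix: string): boolean {
  return value.slice(0, prefix.length) === prefix;
}

// Hypothetical classifier based only on the two prefixes in the table.
function classifyCredential(token: string): CredentialKind {
  // Longest prefix first: `mantyx_at_...` would also match `mantyx_`.
  if (hasPrefix(token, "mantyx_at_")) return "oauth_access_token";
  if (hasPrefix(token, "mantyx_")) return "workspace_api_key";
  return "unknown";
}
```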
82
82
 
83
83
  The server resolves whichever it sees by token-prefix sniffing (see
84
84
  `packages/api/src/services/bearer-credential.ts`) — SDKs do **not** need
@@ -115,19 +115,19 @@ two differences:
115
115
  multi-scope ones — see §2.3). The SDK is expected to surface this
116
116
  verbatim. The agent-runs surface uses these scopes:
117
117
 
118
- | Endpoint | Required scope |
119
- | ------------------------------------------------------------ | -------------- |
120
- | `GET .../models` | `models:read` |
121
- | `POST .../agent-runs` | `runs:write` |
122
- | `GET .../agent-runs/{runId}` | `runs:read` |
123
- | `GET .../agent-runs/{runId}/stream` | `runs:read` |
124
- | `POST .../agent-runs/{runId}/cancel` | `runs:write` |
125
- | `POST .../agent-runs/{runId}/tool-results` | `runs:write` |
126
- | `POST .../agent-sessions` | `sessions:write` |
127
- | `GET .../agent-sessions/{sessionId}` | `sessions:read` |
128
- | `DELETE .../agent-sessions/{sessionId}` | `sessions:write` |
129
- | `POST .../agent-sessions/{sessionId}/messages` | `sessions:write` |
130
- | `GET /api/oauth/userinfo` | `mantyx.identity:read` |
118
+ | Endpoint | Required scope |
119
+ | ------------------------------------------------ | ---------------------- |
120
+ | `GET .../models` | `models:read` |
121
+ | `POST .../agent-runs` | `runs:write` |
122
+ | `GET .../agent-runs/{runId}` | `runs:read` |
123
+ | `GET .../agent-runs/{runId}/stream` | `runs:read` |
124
+ | `POST .../agent-runs/{runId}/cancel` | `runs:write` |
125
+ | `POST .../agent-runs/{runId}/tool-results` | `runs:write` |
126
+ | `POST .../agent-sessions` | `sessions:write` |
127
+ | `GET .../agent-sessions/{sessionId}` | `sessions:read` |
128
+ | `DELETE .../agent-sessions/{sessionId}` | `sessions:write` |
129
+ | `POST .../agent-sessions/{sessionId}/messages` | `sessions:write` |
130
+ | `GET /api/oauth/userinfo` | `mantyx.identity:read` |
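A client could use the scope table to check, before calling, whether a token's granted scopes cover the operations it plans to use. The sketch below is one way to do that. It assumes the granted scopes arrive as a space-delimited string, and the `REQUIRED_SCOPE` keys copy the table's endpoint labels verbatim, including the elided `...` prefix. `missingScopes` is a hypothetical helper.

```typescript
// Endpoint label -> required scope, copied from the table above.
const REQUIRED_SCOPE: Record<string, string> = {
  "GET .../models": "models:read",
  "POST .../agent-runs": "runs:write",
  "GET .../agent-runs/{runId}": "runs:read",
  "GET .../agent-runs/{runId}/stream": "runs:read",
  "POST .../agent-runs/{runId}/cancel": "runs:write",
  "POST .../agent-runs/{runId}/tool-results": "runs:write",
  "POST .../agent-sessions": "sessions:write",
  "GET .../agent-sessions/{sessionId}": "sessions:read",
  "DELETE .../agent-sessions/{sessionId}": "sessions:write",
  "POST .../agent-sessions/{sessionId}/messages": "sessions:write",
  "GET /api/oauth/userinfo": "mantyx.identity:read",
};

// Returns the de-duplicated scopes that `operations` need but the token lacks.
function missingScopes(grantedScope: string, operations: string[]): string[] {
  const granted = grantedScope.split(/\s+/).filter((s) => s.length > 0);
  const missing: string[] = [];
  for (const op of operations) {
    const scope = REQUIRED_SCOPE[op];
    if (scope === undefined) continue; // label not in the table: nothing to check
    if (granted.indexOf(scope) === -1 && missing.indexOf(scope) === -1) {
      missing.push(scope);
    }
  }
  return missing;
}
```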
131
131
 
132
132
  For an SDK that exposes one-shot runs and sessions end-to-end, request
133
133
  at minimum `models:read runs:read runs:write sessions:read sessions:write`,
@@ -143,9 +143,10 @@ two differences:
143
143
  OAuth tokens **also** honor the per-token agent allow-list
144
144
  (`OAuthAccessToken.agentIds`) the user picked at consent time — see
145
145
  [`docs/oauth.md`](./oauth.md) for the full registration / authorization-code
146
- + PKCE flow. PKCE (`S256`) is mandatory and every MANTYX OAuth app is a
147
- confidential client, so the token endpoint requires both `client_secret`
148
- and `code_verifier`.
146
+
147
+ - PKCE flow. PKCE (`S256`) is mandatory and every MANTYX OAuth app is a
148
+ confidential client, so the token endpoint requires both `client_secret`
149
+ and `code_verifier`.
149
150
 
150
151
  **Token lifetimes.** Access tokens live **1 hour** (`expires_in: 3600`).
151
152
  Refresh tokens are **persistent and non-rotating**: they have no
@@ -176,14 +177,14 @@ Content-Type: application/json
176
177
 
177
178
  ### 2.3 Error model for credentials
178
179
 
179
- | Status | Body shape | When |
180
- | ------ | ------------------------------------------------------------------------------------- | ---- |
181
- | `401` | `{ "error": "Unauthorized", "message": "API key or OAuth access token required..." }` | No `Authorization` / `X-API-Key` header. |
182
- | `401` | `{ "error": "Invalid API key or OAuth access token" }` | Token doesn't match a row, expired, or revoked. |
183
- | `403` | `{ "error": "This API key is not for the Developer API", "hint": "..." }` | API key has wrong `usage`. |
184
- | `403` | `{ "error": "Workspace API keys are not available on this plan.", "code": "api_keys_plan" }` <br> `{ "error": "OAuth applications are not available on this plan.", "code": "oauth_apps_plan" }` | Workspace tier lacks the `apiKeys` / `oauthApps` feature. |
185
- | `403` | `{ "error": "insufficient_scope", "required": "runs:write" }` (or an array if a route needs multiple) | OAuth token is missing a scope a route demands. The response also sets `WWW-Authenticate: Bearer error="insufficient_scope", scope="..."`. |
186
- | `404` | `{ "error": "Workspace path does not match this credential", "hint": "..." }` | URL slug ≠ token's workspace. |
180
+ | Status | Body shape | When |
181
+ | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------ |
182
+ | `401` | `{ "error": "Unauthorized", "message": "API key or OAuth access token required..." }` | No `Authorization` / `X-API-Key` header. |
183
+ | `401` | `{ "error": "Invalid API key or OAuth access token" }` | Token doesn't match a row, expired, or revoked. |
184
+ | `403` | `{ "error": "This API key is not for the Developer API", "hint": "..." }` | API key has wrong `usage`. |
185
+ | `403` | `{ "error": "Workspace API keys are not available on this plan.", "code": "api_keys_plan" }` <br> `{ "error": "OAuth applications are not available on this plan.", "code": "oauth_apps_plan" }` | Workspace tier lacks the `apiKeys` / `oauthApps` feature. |
186
+ | `403` | `{ "error": "insufficient_scope", "required": "runs:write" }` (or an array if a route needs multiple) | OAuth token is missing a scope a route demands. The response also sets `WWW-Authenticate: Bearer error="insufficient_scope", scope="..."`. |
187
+ | `404` | `{ "error": "Workspace path does not match this credential", "hint": "..." }` | URL slug ≠ token's workspace. |
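The rows above can be folded into a small classifier so that calling code branches on a tag instead of on message strings. This is a sketch under assumptions: the `CredentialError` union and `classifyCredentialError` are hypothetical names, and only the body fields shown in the table are read.

```typescript
interface ErrorBody {
  error?: string;
  message?: string;
  hint?: string;
  code?: string;
  required?: string | string[];
}

type CredentialError =
  | { kind: "unauthenticated" }
  | { kind: "insufficient_scope"; required: string[] }
  | { kind: "plan_restricted"; code: string }
  | { kind: "forbidden" }
  | { kind: "workspace_mismatch" }
  | { kind: "other" };

function classifyCredentialError(status: number, body: ErrorBody): CredentialError {
  if (status === 401) return { kind: "unauthenticated" };
  if (status === 403) {
    if (body.error === "insufficient_scope") {
      // `required` is a string, or an array when a route needs several scopes.
      const req = body.required;
      const required = req === undefined ? [] : Array.isArray(req) ? req : [req];
      return { kind: "insufficient_scope", required };
    }
    if (body.code === "api_keys_plan" || body.code === "oauth_apps_plan") {
      return { kind: "plan_restricted", code: body.code };
    }
    return { kind: "forbidden" };
  }
  // Assumption: match on the documented message, since other 404s may exist.
  if (status === 404 && (body.error || "").indexOf("Workspace path") === 0) {
    return { kind: "workspace_mismatch" };
  }
  return { kind: "other" };
}
```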
187
188
 
188
189
  ## 3. Models
189
190
 
@@ -204,7 +205,11 @@ platform-hosted offerings visible to the workspace's tier.
204
205
  "vendorModelId": "claude-sonnet-4-5",
205
206
  "source": "platform_offering",
206
207
  "contextWindowTokens": 200000,
207
- "pricing": { "inputPer1MUsd": 3.0, "outputPer1MUsd": 15.0, "cacheReadPer1MUsd": 0.3 }
208
+ "pricing": {
209
+ "inputPer1MUsd": 3.0,
210
+ "outputPer1MUsd": 15.0,
211
+ "cacheReadPer1MUsd": 0.3,
212
+ },
208
213
  },
209
214
  {
210
215
  "id": "provider:cm6def456",
@@ -213,10 +218,10 @@ platform-hosted offerings visible to the workspace's tier.
213
218
  "vendorModelId": "gpt-5.5",
214
219
  "source": "workspace_provider",
215
220
  "contextWindowTokens": 200000,
216
- "pricing": null
217
- }
221
+ "pricing": null,
222
+ },
218
223
  ],
219
- "defaultModelId": "platform:cm6abc123"
224
+ "defaultModelId": "platform:cm6abc123",
220
225
  }
221
226
  ```
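One thing a client might do with this response is estimate the cost of a run. The sketch below assumes, from the field names only, that `inputPer1MUsd` and `outputPer1MUsd` are USD per one million tokens. It returns `null` when `pricing` is `null`, as in the `workspace_provider` entry. `estimateCostUsd` is a hypothetical helper, and cache-read pricing is ignored.

```typescript
interface Pricing {
  inputPer1MUsd: number;
  outputPer1MUsd: number;
  cacheReadPer1MUsd?: number;
}

interface ModelEntry {
  id: string;
  pricing: Pricing | null;
}

// Rough estimate only; returns null when the model carries no pricing.
function estimateCostUsd(
  model: ModelEntry,
  inputTokens: number,
  outputTokens: number,
): number | null {
  const p = model.pricing;
  if (p === null) return null;
  return (inputTokens * p.inputPer1MUsd + outputTokens * p.outputPer1MUsd) / 1000000;
}
```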
222
227
 
@@ -240,11 +245,11 @@ The agent spec is the body shape used by `POST /agent-runs` and `POST
240
245
 
241
246
  ```jsonc
242
247
  {
243
- "name": "ephemeral", // optional, observability only
244
- "agentId": "agent_cm6abc123", // optional — see §4.1
245
- "systemPrompt": "You are helpful.", // required unless agentId is set
246
- "modelId": "platform:cm6abc123", // optional, see §3
247
- "reasoningLevel": "medium", // optional, see §4.4
248
+ "name": "ephemeral", // optional, observability only
249
+ "agentId": "agent_cm6abc123", // optional — see §4.1
250
+ "systemPrompt": "You are helpful.", // required unless agentId is set
251
+ "modelId": "platform:cm6abc123", // optional, see §3
252
+ "reasoningLevel": "medium", // optional, see §4.4
248
253
  "tools": [
249
254
  { "kind": "mantyx", "id": "tool_cm6..." },
250
255
  { "kind": "mantyx_plugin", "name": "web_search" },
@@ -252,20 +257,22 @@ The agent spec is the body shape used by `POST /agent-runs` and `POST
252
257
  "kind": "local",
253
258
  "name": "read_file",
254
259
  "description": "Read a file from the user's machine",
255
- "parameters": { // JSON Schema for the args object
260
+ "parameters": {
261
+ // JSON Schema for the args object
256
262
  "type": "object",
257
263
  "properties": { "path": { "type": "string" } },
258
264
  "required": ["path"],
259
- "additionalProperties": false
265
+ "additionalProperties": false,
260
266
  },
261
- "outputSchema": { // optional — JSON Schema for the return value
267
+ "outputSchema": {
268
+ // optional — JSON Schema for the return value
262
269
  "type": "object",
263
270
  "properties": {
264
- "bytes": { "type": "string", "description": "UTF-8 file contents" }
271
+ "bytes": { "type": "string", "description": "UTF-8 file contents" },
265
272
  },
266
- "required": ["bytes"]
273
+ "required": ["bytes"],
267
274
  },
268
- "longRunning": false // optional — default false
275
+ "longRunning": false, // optional — default false
269
276
  },
270
277
  {
271
278
  "kind": "a2a",
@@ -273,12 +280,13 @@ The agent spec is the body shape used by `POST /agent-runs` and `POST
273
280
  "description": "Delegate billing questions to the Acme billing agent.",
274
281
  "agentCardUrl": "https://billing.acme.com/.well-known/agent-card.json",
275
282
  "headers": { "Authorization": "Bearer ${BILLING_TOKEN}" },
276
- "contextId": "ctx_abc" // optional A2A context to thread turns
283
+ "contextId": "ctx_abc", // optional A2A context to thread turns
277
284
  },
278
285
  {
279
286
  "kind": "a2a_local",
280
287
  "name": "intranet_hr_agent",
281
- "agentCard": { // SDK-resolved A2A Agent Card content
288
+ "agentCard": {
289
+ // SDK-resolved A2A Agent Card content
282
290
  "protocolVersion": "0.3.0",
283
291
  "name": "Acme HR",
284
292
  "description": "Answers questions about HR policies and benefits.",
@@ -289,72 +297,83 @@ The agent spec is the body shape used by `POST /agent-runs` and `POST
289
297
  {
290
298
  "id": "pto_lookup",
291
299
  "name": "PTO lookup",
292
- "description": "Find a teammate's remaining PTO days for the year."
300
+ "description": "Find a teammate's remaining PTO days for the year.",
293
301
  },
294
302
  {
295
303
  "id": "benefits_qa",
296
304
  "name": "Benefits Q&A",
297
- "description": "Answer questions about insurance, 401k, and parental leave."
298
- }
299
- ]
300
- }
305
+ "description": "Answer questions about insurance, 401k, and parental leave.",
306
+ },
307
+ ],
308
+ },
301
309
  },
302
310
  {
303
311
  "kind": "mcp",
304
- "name": "github", // → tools become github_<tool>
312
+ "name": "github", // → tools become github_<tool>
305
313
  "url": "https://mcp.github.com/v1",
306
314
  "headers": { "Authorization": "Bearer ${GH_PAT}" },
307
- "toolFilter": ["search_repos", "read_file"] // optional allowlist
315
+ "toolFilter": ["search_repos", "read_file"], // optional allowlist
308
316
  },
309
317
  {
310
318
  "kind": "mcp_local",
311
- "name": "fs", // SDK-side server label only — NOT a prefix
312
- "serverInfo": { // optional; from MCP Initialize
319
+ "name": "fs", // SDK-side server label only — NOT a prefix
320
+ "serverInfo": {
321
+ // optional; from MCP Initialize
313
322
  "name": "mcp-server-filesystem",
314
- "version": "0.4.1"
323
+ "version": "0.4.1",
315
324
  },
316
- "tools": [ // verbatim MCP tools/list response
325
+ "tools": [
326
+ // verbatim MCP tools/list response
317
327
  {
318
- "name": "fs_read_file", // model-facing name, exactly as declared
328
+ "name": "fs_read_file", // model-facing name, exactly as declared
319
329
  "description": "Read a file from the user's workstation",
320
- "inputSchema": { // MCP's term — JSON Schema
330
+ "inputSchema": {
331
+ // MCP's term — JSON Schema
321
332
  "type": "object",
322
333
  "properties": { "path": { "type": "string" } },
323
- "required": ["path"]
324
- }
325
- }
326
- ]
327
- }
334
+ "required": ["path"],
335
+ },
336
+ },
337
+ ],
338
+ },
328
339
  ],
329
- "budgets": { "maxToolTurns": 32 }, // optional safety cap
330
- "outputSchema": { // optional, see §4.5
340
+ "budgets": { "maxToolTurns": 32 }, // optional safety cap
341
+ "outputSchema": {
342
+ // optional, see §4.5
331
343
  "name": "weather_report",
332
344
  "schema": {
333
345
  "type": "object",
334
346
  "properties": {
335
347
  "city": { "type": "string" },
336
- "temperature_c": { "type": "number" }
348
+ "temperature_c": { "type": "number" },
337
349
  },
338
- "required": ["city", "temperature_c"]
339
- }
350
+ "required": ["city", "temperature_c"],
351
+ },
340
352
  },
341
- "loopDetection": { // optional, see §4.6
353
+ "loopDetection": {
354
+ // optional, see §4.6
342
355
  "consecutiveThreshold": 3,
343
- "hardCutoffThreshold": 6
356
+ "hardCutoffThreshold": 6,
344
357
  },
345
- "toolBudgets": { // optional, see §4.7
346
- "recall": { "maxCalls": 4 },
358
+ "toolBudgets": {
359
+ // optional, see §4.7
360
+ "recall": { "maxCalls": 4 },
347
361
  "hive_consult_ontology": { "maxCalls": 4 },
348
- "scary_tool": { "maxCalls": 0 }
362
+ "scary_tool": { "maxCalls": 0 },
363
+ },
364
+ "supervisor": {
365
+ // optional, see §4.8 — platform LLM judge; pass false to disable
366
+ "interval": 5,
349
367
  },
350
- "metadata": { // optional, see §4.8
368
+ "metadata": {
369
+ // optional, see §4.9
351
370
  "customer": "acme",
352
- "env": "prod"
353
- }
371
+ "env": "prod",
372
+ },
354
373
  }
355
374
  ```
356
375
 
357
- `POST /agent-runs` additionally accepts `prompt` *or* `messages` (an array of
376
+ `POST /agent-runs` additionally accepts `prompt` _or_ `messages` (an array of
358
377
  `{role, content}`). Sending both is a `400 invalid_request`.
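A client can catch the "both supplied" case locally before the request is sent. The sketch below checks only the rule stated above. `RunInput` and `checkRunInput` are hypothetical names, and this excerpt does not say what happens when neither field is sent, so that case is left to the server.

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

interface RunInput {
  prompt?: string;
  messages?: ChatMessage[];
}

type InputCheck = { ok: true } | { ok: false; error: "invalid_request" };

// Mirrors the documented rule: `prompt` and `messages` are mutually exclusive.
function checkRunInput(input: RunInput): InputCheck {
  if (input.prompt !== undefined && input.messages !== undefined) {
    return { ok: false, error: "invalid_request" };
  }
  return { ok: true };
}
```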
359
378
 
360
379
  ### 4.1 Triggering a persisted MANTYX agent (`agentId`)
@@ -366,7 +385,7 @@ defining an ephemeral one inline. When `agentId` is set:
366
385
  stored system prompt at run time.
367
386
  - `modelId` becomes optional. If omitted, the server uses the agent's
368
387
  configured LLM provider (or the workspace automation provider if the agent
369
- has *Use workspace default model* turned on).
388
+ has _Use workspace default model_ turned on).
370
389
  - The agent's own tools are loaded from its workspace configuration —
371
390
  including memory, skills, and plugin tools — and your `tools` array is
372
391
  **merged on top**. This is typically used to attach `local` tools so the
@@ -389,14 +408,14 @@ the handler in its own process. MANTYX never executes the body — it
389
408
  emits a `local_tool_call` event when the model picks the tool and waits
390
409
  for the SDK to POST a tool-result.
391
410
 
392
- | Field | Required | Notes |
393
- | -------------- | -------- | ----- |
394
- | `kind` | yes | Discriminator literal `"local"`. |
395
- | `name` | yes | Model-facing tool name. Must match `/^[a-zA-Z0-9_]{1,64}$/`. |
396
- | `description` | no | Free-form. Empty when omitted (acceptable, but reduces tool-selection accuracy). |
397
- | `parameters` | no | JSON Schema for the tool's input. Must be a `type: "object"` schema with `properties`; non-object roots are coerced to an empty object schema server-side. Forwarded **verbatim** to the LLM provider so nested constraints (`array.items`, `enum`, `anyOf`, numeric formats, …) survive. Args that fail server-side validation produce a structured `tool_input_invalid` tool result the model can recover from instead of crashing the call. |
398
- | `outputSchema` | no | JSON Schema for the structured value the tool returns. When present, forwarded to providers that accept per-tool response schemas (Gemini's `responseJsonSchema` on the FunctionDeclaration); other engines surface it through the description and rely on host-side validation. Helps the model emit follow-up arguments that round-trip cleanly. Must be an object schema; non-object roots are dropped server-side. |
399
- | `longRunning` | no | When `true`, MANTYX appends a stable hint to the model-facing description so every provider treats the tool as long-running:<br>*"NOTE: This is a long-running operation. Do not call this tool again if it has already returned an intermediate or pending status."*<br>Useful for tools that return `pending` and rely on SDK-side polling — without the hint the model routinely fires repeat calls and burns turns. Pure declarative — MANTYX does not change scheduling. |
411
+ | Field | Required | Notes |
412
+ | -------------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
413
+ | `kind` | yes | Discriminator literal `"local"`. |
414
+ | `name` | yes | Model-facing tool name. Must match `/^[a-zA-Z0-9_]{1,64}$/`. |
415
+ | `description` | no | Free-form. Empty when omitted (acceptable, but reduces tool-selection accuracy). |
416
+ | `parameters` | no | JSON Schema for the tool's input. Must be a `type: "object"` schema with `properties`; non-object roots are coerced to an empty object schema server-side. Forwarded **verbatim** to the LLM provider so nested constraints (`array.items`, `enum`, `anyOf`, numeric formats, …) survive. Args that fail server-side validation produce a structured `tool_input_invalid` tool result the model can recover from instead of crashing the call. |
417
+ | `outputSchema` | no | JSON Schema for the structured value the tool returns. When present, forwarded to providers that accept per-tool response schemas (Gemini's `responseJsonSchema` on the FunctionDeclaration); other engines surface it through the description and rely on host-side validation. Helps the model emit follow-up arguments that round-trip cleanly. Must be an object schema; non-object roots are dropped server-side. |
418
+ | `longRunning` | no | When `true`, MANTYX appends a stable hint to the model-facing description so every provider treats the tool as long-running:<br>_"NOTE: This is a long-running operation. Do not call this tool again if it has already returned an intermediate or pending status."_<br>Useful for tools that return `pending` and rely on SDK-side polling — without the hint the model routinely fires repeat calls and burns turns. Pure declarative — MANTYX does not change scheduling. |
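Two rules in the table are easy to mirror on the client: the tool-name pattern, and the hint added for `longRunning` tools. The sketch below is illustrative. The regex and the hint text come from the table, but the helper names and the blank-line separator before the hint are assumptions, since the spec says only that the hint is appended.

```typescript
const TOOL_NAME_PATTERN = /^[a-zA-Z0-9_]{1,64}$/;

const LONG_RUNNING_HINT =
  "NOTE: This is a long-running operation. Do not call this tool again " +
  "if it has already returned an intermediate or pending status.";

interface LocalToolRef {
  kind: "local";
  name: string;
  description?: string;
  longRunning?: boolean;
}

function isValidToolName(name: string): boolean {
  return TOOL_NAME_PATTERN.test(name);
}

// Hypothetical preview of the description the model would see.
function modelFacingDescription(tool: LocalToolRef): string {
  const base = tool.description ?? "";
  if (!tool.longRunning) return base;
  // Assumption: a blank line separates the description from the hint.
  return base.length > 0 ? `${base}\n\n${LONG_RUNNING_HINT}` : LONG_RUNNING_HINT;
}
```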
400
419
 
401
420
  The `outputSchema` and `longRunning` fields are **additive** since wire
402
421
  protocol v1: SDKs that don't ship them keep working unchanged. Providers
@@ -410,10 +429,10 @@ A2A delegation lets the agent hand a task to another
410
429
  [Agent2Agent](https://google.github.io/A2A/) peer. The wire protocol exposes
411
430
  two kinds depending on **who can reach the peer**:
412
431
 
413
- - `kind: "a2a"` — *remote* (server-resolved). MANTYX dials `agentCardUrl`
432
+ - `kind: "a2a"` — _remote_ (server-resolved). MANTYX dials `agentCardUrl`
414
433
  directly. Pick this when the peer is on the public internet or in the
415
434
  same VPC as MANTYX.
416
- - `kind: "a2a_local"` — *local* (client-resolved). The SDK invokes the peer
435
+ - `kind: "a2a_local"` — _local_ (client-resolved). The SDK invokes the peer
417
436
  on its side and posts back the reply. Pick this when the peer lives on an
418
437
  intranet, behind a VPN, or on the user's device — anywhere MANTYX can't
419
438
  reach but the SDK can.
@@ -431,14 +450,14 @@ POSTs the model's `message` argument to `agentCardUrl` over A2A's standard
431
450
  and `/message/send` endpoints are probed in order) and forwards the remote
432
451
  agent's text reply back as the tool result.
433
452
 
434
- | Field | Required | Notes |
435
- | --------------- | -------- | ----- |
436
- | `kind` | yes | Discriminator literal `"a2a"`. |
437
- | `name` | yes | Tool name surfaced to the model — must match `/^[a-zA-Z0-9_]{1,64}$/`. |
438
- | `description` | no | Model-facing description. Defaults to `"Delegate a task to the <name> agent over A2A. Pass the full task as a single message."`. Mention the remote agent's purpose so the model picks it for the right turn. |
439
- | `agentCardUrl` | yes | URL of the remote Agent Card (`/.well-known/agent-card.json`) or the JSON-RPC root the peer accepts. |
440
- | `headers` | no | Flat string→string HTTP headers sent on every A2A request — typically `Authorization`. Each value is capped at 8 KB. |
441
- | `contextId` | no | A2A `contextId` to thread multiple delegations into the same remote conversation. Omit for fresh per-call context. |
453
+ | Field | Required | Notes |
454
+ | -------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
455
+ | `kind` | yes | Discriminator literal `"a2a"`. |
456
+ | `name` | yes | Tool name surfaced to the model — must match `/^[a-zA-Z0-9_]{1,64}$/`. |
457
+ | `description` | no | Model-facing description. Defaults to `"Delegate a task to the <name> agent over A2A. Pass the full task as a single message."`. Mention the remote agent's purpose so the model picks it for the right turn. |
458
+ | `agentCardUrl` | yes | URL of the remote Agent Card (`/.well-known/agent-card.json`) or the JSON-RPC root the peer accepts. |
459
+ | `headers` | no | Flat string→string HTTP headers sent on every A2A request — typically `Authorization`. Each value is capped at 8 KB. |
460
+ | `contextId` | no | A2A `contextId` to thread multiple delegations into the same remote conversation. Omit for fresh per-call context. |
442
461
 
443
462
  > **Secret handling.** `headers` are forwarded **as-is** by the SDK API. If
444
463
  > you need long-lived credentials (refresh tokens, rotating API keys),
@@ -476,30 +495,30 @@ Per-run lifecycle:
476
495
  5. **Continuation (MANTYX).** MANTYX feeds the reply back into the model
477
496
  loop as the tool result.
478
497
 
479
- | Field | Required | Notes |
480
- | --------------- | -------- | ----- |
481
- | `kind` | yes | Discriminator literal `"a2a_local"`. |
482
- | `name` | yes | Tool name surfaced to the model — must match `/^[a-zA-Z0-9_]{1,64}$/`. |
483
- | `description` | no | Model-facing description override. When omitted, MANTYX synthesizes one from `agentCard.name`, `agentCard.description`, and the first 12 skills. |
484
- | `agentCard` | yes | The resolved A2A Agent Card (JSON content). Schema follows the [A2A Agent Card spec](https://google.github.io/A2A/specification/#agent-card) — passthrough for unknown fields, so any spec-compliant card works. See the *Agent Card shape* table below for the fields MANTYX actually reads. |
498
+ | Field | Required | Notes |
499
+ | ------------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
500
+ | `kind` | yes | Discriminator literal `"a2a_local"`. |
501
+ | `name` | yes | Tool name surfaced to the model — must match `/^[a-zA-Z0-9_]{1,64}$/`. |
502
+ | `description` | no | Model-facing description override. When omitted, MANTYX synthesizes one from `agentCard.name`, `agentCard.description`, and the first 12 skills. |
503
+ | `agentCard` | yes | The resolved A2A Agent Card (JSON content). Schema follows the [A2A Agent Card spec](https://google.github.io/A2A/specification/#agent-card) — passthrough for unknown fields, so any spec-compliant card works. See the _Agent Card shape_ table below for the fields MANTYX actually reads. |
485
504
 
486
505
  **Agent Card shape** (only the fields MANTYX inspects; everything else is
487
506
  forwarded verbatim back to the SDK):
488
507
 
489
- | Card field | Used by MANTYX | Notes |
490
- | --------------------- | -------------- | ----- |
491
- | `protocolVersion` | echo only | A2A protocol version (e.g. `"0.3.0"`). |
492
- | `name` | description | Used when synthesizing the tool description (`"Delegate a task to the <name> agent ..."`). |
493
- | `description` | description | One-paragraph summary of what the peer does — surfaced to the model. |
494
- | `url` | echo only | Peer's A2A endpoint. Forwarded back to the SDK in the `local_tool_call` event so the SDK can dispatch by URL. Never fetched server-side. |
495
- | `version` | echo only | Peer agent version. |
496
- | `provider` | echo only | Vendor info. |
497
- | `capabilities` | echo only | A2A capability flags (streaming, push notifications, …). |
498
- | `defaultInputModes` | echo only | Modalities the peer accepts. |
499
- | `defaultOutputModes` | echo only | Modalities the peer returns. |
500
- | `skills[]` | description | First 12 skills (`name`, `description`) are bulleted into the tool description so the model knows what to ask for. |
501
- | `securitySchemes`, `security` | echo only | Forwarded to the SDK; MANTYX does no auth. |
502
- | *anything else* | echo only | Passthrough — survives round-trip unchanged. |
508
+ | Card field | Used by MANTYX | Notes |
509
+ | ----------------------------- | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
510
+ | `protocolVersion` | echo only | A2A protocol version (e.g. `"0.3.0"`). |
511
+ | `name` | description | Used when synthesizing the tool description (`"Delegate a task to the <name> agent ..."`). |
512
+ | `description` | description | One-paragraph summary of what the peer does — surfaced to the model. |
513
+ | `url` | echo only | Peer's A2A endpoint. Forwarded back to the SDK in the `local_tool_call` event so the SDK can dispatch by URL. Never fetched server-side. |
514
+ | `version` | echo only | Peer agent version. |
515
+ | `provider` | echo only | Vendor info. |
516
+ | `capabilities` | echo only | A2A capability flags (streaming, push notifications, …). |
517
+ | `defaultInputModes` | echo only | Modalities the peer accepts. |
518
+ | `defaultOutputModes` | echo only | Modalities the peer returns. |
519
+ | `skills[]` | description | First 12 skills (`name`, `description`) are bulleted into the tool description so the model knows what to ask for. |
520
+ | `securitySchemes`, `security` | echo only | Forwarded to the SDK; MANTYX does no auth. |
521
+ | _anything else_ | echo only | Passthrough — survives round-trip unchanged. |
503
522
 
504
523
  Local A2A respects the same `localToolTimeoutMs` budget (default 5 minutes)
505
524
  as `kind: "local"`. Tool-result POSTs after timeout return `409 run_terminal`.
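
As a rough illustration of how the `description`-tagged card fields in the table above could be combined, the sketch below builds a tool description from `name`, `description`, and the first 12 `skills[]`. The function name `describeA2aTool`, the `AgentCardSubset` type, and the exact output wording are assumptions made for this example; the text MANTYX actually generates is not specified here beyond the `"Delegate a task to the <name> agent ..."` prefix.

```typescript
// Hypothetical subset of an A2A Agent Card: only the fields the table
// above marks as feeding the tool description.
interface AgentCardSubset {
  name: string;
  description?: string;
  skills?: { name: string; description?: string }[];
  [extra: string]: unknown; // other card fields pass through untouched
}

const MAX_SKILLS = 12; // "First 12 skills" per the table

// Illustrative only: the real server-side wording may differ.
function describeA2aTool(card: AgentCardSubset): string {
  const lines = [`Delegate a task to the ${card.name} agent.`];
  if (card.description) lines.push(card.description);
  for (const skill of (card.skills ?? []).slice(0, MAX_SKILLS)) {
    lines.push(
      skill.description ? `- ${skill.name}: ${skill.description}` : `- ${skill.name}`,
    );
  }
  return lines.join("\n");
}
```
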
@@ -510,25 +529,25 @@ as `kind: "local"`. Tool-result POSTs after timeout return `409 run_terminal`.
510
529
  expose every tool published by an MCP server to the agent loop in one go.
511
530
  Like A2A, the protocol distinguishes by **where the server lives**:
512
531
 
513
- - `kind: "mcp"` — *remote* MCP (Streamable HTTP). MANTYX has network access
532
+ - `kind: "mcp"` — _remote_ MCP (Streamable HTTP). MANTYX has network access
514
533
  to the server, dials it, lists the catalog at run start, and proxies each
515
534
  call server-side. **MANTYX prefixes every discovered tool name with the
516
535
  ref's `name`** (e.g. `github_search_repos`) so multiple MCP servers
517
536
  can coexist without colliding.
518
- - `kind: "mcp_local"` — *local* MCP (stdio, on-device, intranet). MANTYX
537
+ - `kind: "mcp_local"` — _local_ MCP (stdio, on-device, intranet). MANTYX
519
538
  has **no** access to the server; the SDK does discovery, validation, and
520
539
  execution. The SDK declares the tool catalog with **the exact names it
521
540
  wants the model to see** — MANTYX does not auto-prefix.
522
541
 
523
542
  #### `kind: "mcp"` — remote MCP
524
543
 
525
- | Field | Required | Notes |
526
- | -------------- | -------- | ----- |
527
- | `kind` | yes | Discriminator literal `"mcp"`. |
528
- | `name` | yes | Server label — MANTYX prefixes every discovered tool name as `<name>_<tool>`. Must match `/^[a-zA-Z0-9_]{1,64}$/`. |
529
- | `url` | yes | Streamable HTTP MCP endpoint. |
530
- | `headers` | no | Flat string→string HTTP headers (e.g. `Authorization`). Each value capped at 8 KB. |
531
- | `toolFilter` | no | Allowlist of MCP tool names (un-prefixed, as the server returns them). When set, tools not in the list are silently dropped. When omitted, every published tool is exposed. |
544
+ | Field | Required | Notes |
545
+ | ------------ | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
546
+ | `kind` | yes | Discriminator literal `"mcp"`. |
547
+ | `name` | yes | Server label — MANTYX prefixes every discovered tool name as `<name>_<tool>`. Must match `/^[a-zA-Z0-9_]{1,64}$/`. |
548
+ | `url` | yes | Streamable HTTP MCP endpoint. |
549
+ | `headers` | no | Flat string→string HTTP headers (e.g. `Authorization`). Each value capped at 8 KB. |
550
+ | `toolFilter` | no | Allowlist of MCP tool names (un-prefixed, as the server returns them). When set, tools not in the list are silently dropped. When omitted, every published tool is exposed. |
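
The naming and filtering rules in this table can be sketched as a pure function. This is a minimal illustration, assuming the catalog is already available as a list of un-prefixed tool names; `exposeMcpTools` is a hypothetical helper, not part of the SDK.

```typescript
// Server label pattern taken from the `name` row above.
const MCP_NAME_RE = /^[a-zA-Z0-9_]{1,64}$/;

// Hypothetical helper: applies the allowlist to the un-prefixed names,
// then prefixes the survivors as `<name>_<tool>`.
function exposeMcpTools(
  serverName: string,
  published: string[],
  toolFilter?: string[],
): string[] {
  if (!MCP_NAME_RE.test(serverName)) {
    throw new Error(`invalid MCP server name: ${serverName}`);
  }
  const allowed = toolFilter ? new Set(toolFilter) : null;
  return published
    .filter((tool) => allowed === null || allowed.has(tool)) // others are dropped silently
    .map((tool) => `${serverName}_${tool}`);
}
```
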
532
551
 
533
552
  If the MCP server is unreachable when the run starts, MANTYX still exposes
534
553
  a single stub tool named `<server>_unavailable` so the model can report the
@@ -566,16 +585,17 @@ Per-run lifecycle:
566
585
  "type": "local_tool_call",
567
586
  "data": {
568
587
  "toolUseId": "tu_x",
569
- "name": "fs_read_file", // SDK-declared name; same string the model called
588
+ "name": "fs_read_file", // SDK-declared name; same string the model called
570
589
  "args": { "path": "/etc/hosts" },
571
590
  "kind": "mcp_local",
572
- "mcpServer": "fs", // the SDK-side label from the ref's `name`
591
+ "mcpServer": "fs", // the SDK-side label from the ref's `name`
573
592
  "mcpToolName": "fs_read_file", // duplicates `name` for the SDK's convenience
574
- "mcpServerInfo": { // present iff the ref carried `serverInfo`
593
+ "mcpServerInfo": {
594
+ // present iff the ref carried `serverInfo`
575
595
  "name": "mcp-server-filesystem",
576
- "version": "0.4.1"
577
- }
578
- }
596
+ "version": "0.4.1",
597
+ },
598
+ },
579
599
  }
580
600
  ```
581
601
 
@@ -587,12 +607,12 @@ Per-run lifecycle:
587
607
  updated `mcp_local` ref inside `POST /agent-sessions/:id/messages`'s
588
608
  `tools` field; the catalog snapshot lives on the run, not the session.
589
609
 
590
- | Field | Required | Notes |
591
- | -------------- | -------- | ----- |
592
- | `kind` | yes | Discriminator literal `"mcp_local"`. |
593
- | `name` | yes | SDK-side server label (e.g. `"fs"`, `"jira"`). Echoed back unchanged as `mcpServer` on every `local_tool_call`. **Not used to prefix tool names.** Match `/^[a-zA-Z0-9_]{1,64}$/`. |
594
- | `serverInfo` | no | The MCP `Implementation` block the SDK got from `Initialize` (`{ name, version? }`, plus any extra fields the server returned). Forwarded to the SDK in `local_tool_call.mcpServerInfo` for observability; not used to drive behavior. |
595
- | `tools` | yes | Verbatim MCP `tools/list` output (1–64 entries). Each item is the standard MCP `Tool` shape: `{ name, description?, inputSchema?, annotations?, … }`. `name` is the model-facing tool name (SDK owns naming). `inputSchema` is the MCP-spec JSON Schema for the tool's arguments — used to constrain the LLM's tool call. Empty `inputSchema` means a no-arg tool. |
610
+ | Field | Required | Notes |
611
+ | ------------ | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
612
+ | `kind` | yes | Discriminator literal `"mcp_local"`. |
613
+ | `name` | yes | SDK-side server label (e.g. `"fs"`, `"jira"`). Echoed back unchanged as `mcpServer` on every `local_tool_call`. **Not used to prefix tool names.** Must match `/^[a-zA-Z0-9_]{1,64}$/`. |
614
+ | `serverInfo` | no | The MCP `Implementation` block the SDK got from `Initialize` (`{ name, version? }`, plus any extra fields the server returned). Forwarded to the SDK in `local_tool_call.mcpServerInfo` for observability; not used to drive behavior. |
615
+ | `tools` | yes | Verbatim MCP `tools/list` output (1–64 entries). Each item is the standard MCP `Tool` shape: `{ name, description?, inputSchema?, annotations?, … }`. `name` is the model-facing tool name (SDK owns naming). `inputSchema` is the MCP-spec JSON Schema for the tool's arguments — used to constrain the LLM's tool call. Empty `inputSchema` means a no-arg tool. |
596
616
 
597
617
  Older SDKs that ignore the `kind` discriminator still see a normal
598
618
  `local_tool_call` and can match on `name` alone.
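
One way an SDK-side dispatcher might use the extra `mcp_local` fields while staying compatible with name-only matching is sketched below. The `LocalToolCall` type mirrors the event payload shown earlier; `pickHandler` and the two handler registries are hypothetical names chosen for this example.

```typescript
// Mirrors the `data` payload of a `local_tool_call` event.
interface LocalToolCall {
  toolUseId: string;
  name: string;
  args: Record<string, unknown>;
  kind?: string;        // e.g. "mcp_local"; absent on older servers
  mcpServer?: string;   // echoed from the ref's `name`
  mcpToolName?: string; // duplicates `name`
}

type Handler = (args: Record<string, unknown>) => Promise<unknown>;

// Hypothetical lookup: prefer the per-server registry when the event
// says it is an MCP call, otherwise fall back to matching on `name`.
function pickHandler(
  call: LocalToolCall,
  byName: Map<string, Handler>,
  byMcpServer: Map<string, Map<string, Handler>>,
): Handler | undefined {
  if (call.kind === "mcp_local" && call.mcpServer) {
    const found = byMcpServer.get(call.mcpServer)?.get(call.mcpToolName ?? call.name);
    if (found) return found;
  }
  return byName.get(call.name);
}
```
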
@@ -612,10 +632,10 @@ provider:
612
632
 
613
633
  Two equivalent input shapes are accepted:
614
634
 
615
- | Form | Values | Notes |
616
- | ----------- | ------------------------------------- | ----- |
617
- | **String** | `"off"`, `"low"`, `"medium"`, `"high"` | Snaps to the same anchors the web composer uses (Fast=30, Moderate=50, Smart=80; off=0). |
618
- | **Number** | integer `0`–`100` | Pass-through to `RunAgentOptions.reasoningLevel`. `0` explicitly disables provider thinking even on reasoning models. |
635
+ | Form | Values | Notes |
636
+ | ---------- | -------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
637
+ | **String** | `"off"`, `"low"`, `"medium"`, `"high"` | Snaps to the same anchors the web composer uses (Fast=30, Moderate=50, Smart=80; off=0). |
638
+ | **Number** | integer `0`–`100` | Pass-through to `RunAgentOptions.reasoningLevel`. `0` explicitly disables provider thinking even on reasoning models. |
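
The two input shapes can be normalised to a single number as sketched below. The pairing of `"low"`, `"medium"`, and `"high"` with the Fast, Moderate, and Smart anchors is an assumption read off the order in the table, and `toReasoningLevel` is a hypothetical helper name.

```typescript
type ReasoningInput = "off" | "low" | "medium" | "high" | number;

// Assumed pairing: low=Fast(30), medium=Moderate(50), high=Smart(80).
const REASONING_ANCHORS = { off: 0, low: 30, medium: 50, high: 80 } as const;

function toReasoningLevel(input: ReasoningInput): number {
  if (typeof input === "string") return REASONING_ANCHORS[input];
  if (!Number.isInteger(input) || input < 0 || input > 100) {
    throw new RangeError("reasoning level must be an integer in 0..100");
  }
  return input; // numbers pass through unchanged
}
```
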
619
639
 
620
640
  When omitted, MANTYX falls back to the agent's default — for ephemeral
621
641
  specs, that means thinking is off; for `agentId`-backed specs, it follows
@@ -649,29 +669,29 @@ reply directly into downstream code without LLM-flavoured prose to parse out.
649
669
  }
650
670
  ```
651
671
 
652
- | Field | Required | Notes |
653
- | -------- | -------- | ----- |
654
- | `name` | no | Stable identifier passed to providers (OpenAI `text.format.name`, Anthropic synthetic-tool name). Defaults to `"output"`. Must match `/^[a-zA-Z0-9_-]{1,64}$/`. |
672
+ | Field | Required | Notes |
673
+ | -------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
674
+ | `name` | no | Stable identifier passed to providers (OpenAI `text.format.name`, Anthropic synthetic-tool name). Defaults to `"output"`. Must match `/^[a-zA-Z0-9_-]{1,64}$/`. |
655
675
  | `schema` | yes | JSON Schema describing the final assistant text. Root must be a JSON **object** (most providers reject array / scalar roots in structured-output mode). The schema is passed through verbatim — MANTYX does not validate its contents; the provider does. |
656
676
 
657
677
  Validation (server-side, `400 invalid_request` on violation):
658
678
 
659
- | Constraint | Limit |
660
- | ----------------------------------- | ----- |
661
- | Serialized JSON size of `outputSchema` | ≤ 32 KB |
662
- | `name` regex | `/^[a-zA-Z0-9_-]{1,64}$/` |
663
- | `schema` shape | non-`null`, non-array JSON object |
679
+ | Constraint | Limit |
680
+ | -------------------------------------- | --------------------------------- |
681
+ | Serialized JSON size of `outputSchema` | ≤ 32 KB |
682
+ | `name` regex | `/^[a-zA-Z0-9_-]{1,64}$/` |
683
+ | `schema` shape | non-`null`, non-array JSON object |
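
A client can run the same three checks before sending a request. The sketch below assumes "32 KB" means 32 × 1024 UTF-8 bytes of the serialized `outputSchema` object; `checkOutputSchema` is a hypothetical helper, and the server remains the authority on what it accepts.

```typescript
const OUTPUT_NAME_RE = /^[a-zA-Z0-9_-]{1,64}$/;
const MAX_OUTPUT_SCHEMA_BYTES = 32 * 1024; // assumption: KB = 1024 bytes

// Returns a list of problems; an empty list means the checks passed.
function checkOutputSchema(outputSchema: { name?: string; schema: unknown }): string[] {
  const problems: string[] = [];
  const name = outputSchema.name ?? "output"; // documented default
  if (!OUTPUT_NAME_RE.test(name)) problems.push("name does not match the required pattern");
  const schema = outputSchema.schema;
  if (typeof schema !== "object" || schema === null || Array.isArray(schema)) {
    problems.push("schema must be a non-null, non-array JSON object");
  }
  const bytes = new TextEncoder().encode(JSON.stringify(outputSchema)).length;
  if (bytes > MAX_OUTPUT_SCHEMA_BYTES) problems.push("serialized outputSchema exceeds 32 KB");
  return problems;
}
```
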
664
684
 
665
685
  **Per-provider behaviour** (mirrors the SDK's `RunAgentOptions.finalResponseSchema`):
666
686
 
667
- | Provider | How the schema is enforced |
668
- | ------------------------------ | -------------------------- |
669
- | OpenAI Responses (o-series, GPT-5.x, …) | `text.format = { type: "json_schema", strict: true, name, schema }` on every turn (works alongside tool calls). |
670
- | Gemini 3+ (any turn) | `responseMimeType: "application/json"` + `responseJsonSchema` on every `completeTurn`. Gemini 3 accepts the schema alongside `functionDeclarations`. |
671
- | Gemini ≤ 2.5 (no-tools turn) | `responseMimeType: "application/json"` + `responseJsonSchema`. |
672
- | Gemini ≤ 2.5 (with tools) | Synthetic `set_model_response` function declaration is injected; its `parametersJsonSchema` is the supplied schema. The system instruction is augmented to direct the model to call this tool with the final answer. The engine intercepts the call, hides it from the SDK, and surfaces the call's arguments as the assistant text (JSON-stringified). Sidesteps the API rejection ("Function calling with a response mime type: 'application/json' is unsupported") without round-tripping a 4xx. |
673
- | Anthropic / Bedrock-Anthropic | Synthetic `final_report` tool whose `input_schema` is the supplied schema; `tool_choice` is forced on the no-tools finishing turn. The tool's input is surfaced as the assistant text. |
674
- | xAI Grok, others | Ignored (the model returns plain text). |
687
+ | Provider | How the schema is enforced |
688
+ | --------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
689
+ | OpenAI Responses (o-series, GPT-5.x, …) | `text.format = { type: "json_schema", strict: true, name, schema }` on every turn (works alongside tool calls). |
690
+ | Gemini 3+ (any turn) | `responseMimeType: "application/json"` + `responseJsonSchema` on every `completeTurn`. Gemini 3 accepts the schema alongside `functionDeclarations`. |
691
+ | Gemini ≤ 2.5 (no-tools turn) | `responseMimeType: "application/json"` + `responseJsonSchema`. |
692
+ | Gemini ≤ 2.5 (with tools) | Synthetic `set_model_response` function declaration is injected; its `parametersJsonSchema` is the supplied schema. The system instruction is augmented to direct the model to call this tool with the final answer. The engine intercepts the call, hides it from the SDK, and surfaces the call's arguments as the assistant text (JSON-stringified). Sidesteps the API rejection ("Function calling with a response mime type: 'application/json' is unsupported") without round-tripping a 4xx. |
693
+ | Anthropic / Bedrock-Anthropic | Synthetic `final_report` tool whose `input_schema` is the supplied schema; `tool_choice` is forced on the no-tools finishing turn. The tool's input is surfaced as the assistant text. |
694
+ | xAI Grok, others | Ignored (the model returns plain text). |
675
695
 
676
696
  The synthetic-tool paths (Gemini 2.5 + tools, Anthropic) are entirely
677
697
  internal: the SDK never receives a `local_tool_call` for
@@ -727,17 +747,17 @@ The wire shape also accepts the literal `false`:
727
747
  "loopDetection": false // explicitly disable the guard for this run
728
748
  ```
729
749
 
730
- | Field | Type | Required | Notes |
731
- | ---------------------- | --------------- | -------- | ----- |
732
- | `consecutiveThreshold` | integer ≥ 2 | no | Defaults to **3** when the field is omitted. Must be `>= 2` (one identical batch is just a single tool call, not a loop). |
750
+ | Field | Type | Required | Notes |
751
+ | ---------------------- | --------------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
752
+ | `consecutiveThreshold` | integer ≥ 2 | no | Defaults to **3** when the field is omitted. Must be `>= 2` (one identical batch is just a single tool call, not a loop). |
733
753
  | `hardCutoffThreshold` | integer ≥ 3 | no | Defaults to **6** when the field is omitted. Must be `> consecutiveThreshold`; otherwise the soft nudge would never get a chance to land. |
734
- | (top-level `false`) | literal `false` | no | Disables the guard entirely for this run. The pipeline still enforces `budgets.maxToolTurns`. |
754
+ | (top-level `false`) | literal `false` | no | Disables the guard entirely for this run. The pipeline still enforces `budgets.maxToolTurns`. |
735
755
 
736
756
  Validation (server-side, `400 invalid_request` on violation):
737
757
 
738
- | Constraint | Limit |
739
- | -------------------------------------------------- | ----- |
740
- | `consecutiveThreshold` / `hardCutoffThreshold` upper bound | `100` |
758
+ | Constraint | Limit |
759
+ | ------------------------------------------------------------------ | -------- |
760
+ | `consecutiveThreshold` / `hardCutoffThreshold` upper bound | `100` |
741
761
  | `hardCutoffThreshold` strictly greater than `consecutiveThreshold` | enforced |
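
The field defaults and constraints above can be combined into one resolver. This is a sketch under stated assumptions: `resolveLoopDetection` is a hypothetical name, and it treats a lone `consecutiveThreshold` that meets or exceeds the default cutoff of 6 as a violation, which the tables do not spell out.

```typescript
interface LoopDetectionConfig {
  consecutiveThreshold?: number;
  hardCutoffThreshold?: number;
}

interface ResolvedLoopDetection {
  consecutiveThreshold: number;
  hardCutoffThreshold: number;
}

// Returns null when the guard is disabled with the literal `false`.
function resolveLoopDetection(input: LoopDetectionConfig | false): ResolvedLoopDetection | null {
  if (input === false) return null;
  const consecutiveThreshold = input.consecutiveThreshold ?? 3; // documented default
  const hardCutoffThreshold = input.hardCutoffThreshold ?? 6;   // documented default
  for (const value of [consecutiveThreshold, hardCutoffThreshold]) {
    if (!Number.isInteger(value) || value > 100) {
      throw new RangeError("thresholds must be integers no greater than 100");
    }
  }
  if (consecutiveThreshold < 2) throw new RangeError("consecutiveThreshold must be >= 2");
  if (hardCutoffThreshold < 3) throw new RangeError("hardCutoffThreshold must be >= 3");
  if (hardCutoffThreshold <= consecutiveThreshold) {
    throw new RangeError("hardCutoffThreshold must be > consecutiveThreshold");
  }
  return { consecutiveThreshold, hardCutoffThreshold };
}
```
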
742
762
 
743
763
  **Defaults.** When `loopDetection` is omitted entirely, MANTYX applies the
@@ -776,31 +796,31 @@ tool result.
776
796
  }
777
797
  ```
778
798
 
779
- | Field | Type | Required | Notes |
780
- | ---------- | ----------- | -------- | ----- |
781
- | `<key>` | string | yes | Logical tool name as the model sees it (the same name on `ResolvedTool.name`; the SDK + pipeline handle sanitisation). 1–120 characters. |
799
+ | Field | Type | Required | Notes |
800
+ | ---------- | ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
801
+ | `<key>` | string | yes | Logical tool name as the model sees it (the same name on `ResolvedTool.name`; the SDK + pipeline handle sanitisation). 1–120 characters. |
782
802
  | `maxCalls` | integer ≥ 0 | yes | Hard cap on executed calls per run. `0` disables the tool entirely (every attempt returns the synthetic body on the first try). Budgets are **per-tool, not pooled**: `hive_search_deals: { maxCalls: 5 }` and `hive_search_meetings: { maxCalls: 5 }` give the agent five of each, not five between them. |
783
803
 
784
804
  Validation (server-side, `400 invalid_request` on violation):
785
805
 
786
- | Constraint | Limit |
787
- | --------------------- | ----- |
788
- | Max entries | `32` |
789
- | `<key>` length | `1..120` chars |
806
+ | Constraint | Limit |
807
+ | ---------------------- | -------------------------------------------------------------------------- |
808
+ | Max entries | `32` |
809
+ | `<key>` length | `1..120` chars |
790
810
  | `maxCalls` upper bound | `1000` (functionally unlimited; the SDK's `maxToolTurns: 100` fires first) |
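
The per-tool (not pooled) accounting can be pictured with a small counter. This is an illustrative model only; `ToolBudgetTracker` is a hypothetical class, and it assumes tools without an entry are uncapped.

```typescript
// Tracks executed calls per tool name; each tool has its own counter.
class ToolBudgetTracker {
  private readonly used = new Map<string, number>();
  private readonly limits: Map<string, number>;

  constructor(budgets: Record<string, { maxCalls: number }>) {
    this.limits = new Map(Object.entries(budgets).map(([tool, b]) => [tool, b.maxCalls]));
  }

  // Returns true if this call is within budget, false once the cap is exceeded.
  tryConsume(tool: string): boolean {
    const callIndex = (this.used.get(tool) ?? 0) + 1; // 1-based, like `callIndex` on the wire
    this.used.set(tool, callIndex);
    const maxCalls = this.limits.get(tool);
    return maxCalls === undefined || callIndex <= maxCalls;
  }
}
```

With `hive_search_deals: { maxCalls: 5 }` and `hive_search_meetings: { maxCalls: 5 }`, five calls to each tool succeed independently, and `maxCalls: 0` rejects the very first attempt.
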
791
811
 
792
812
  **Defaults.** When `toolBudgets` is omitted, MANTYX layers the runtime
793
813
  defaults from `runtime/default-run-guards.ts` on top of the spec. The
794
814
  default research-tool surface is:
795
815
 
796
- | Tool | Default `maxCalls` |
797
- | ------------------------------------------------------------------------------------------------ | ------------------ |
798
- | `recall` (workspace memory hybrid search) | `4` |
799
- | `traverse` (memory graph BFS) | `3` |
800
- | `hive_consult_ontology` (per-hive ontology read; same name across all three hives) | `4` |
801
- | `hive_search_deals` / `_meetings` / `_companies` / `_people` (Sales Hive general search) | `5` |
802
- | `hive_search_tickets` / `_conversations` / `_accounts` (Customer Hive general search) | `5` |
803
- | `hive_search_releases` / `_issues` (Product Hive general search) | `5` |
816
+ | Tool | Default `maxCalls` |
817
+ | ---------------------------------------------------------------------------------------- | ------------------ |
818
+ | `recall` (workspace memory hybrid search) | `4` |
819
+ | `traverse` (memory graph BFS) | `3` |
820
+ | `hive_consult_ontology` (per-hive ontology read; same name across all three hives) | `4` |
821
+ | `hive_search_deals` / `_meetings` / `_companies` / `_people` (Sales Hive general search) | `5` |
822
+ | `hive_search_tickets` / `_conversations` / `_accounts` (Customer Hive general search) | `5` |
823
+ | `hive_search_releases` / `_issues` (Product Hive general search) | `5` |
804
824
 
805
825
  Pass `"toolBudgets": {}` to start from a clean slate (no defaults applied
806
826
  on top — useful for runs that intentionally want unbounded research). When
@@ -828,7 +848,64 @@ during normal multi-entity reads. The loop-detection guard catches the
828
848
  pathological "same `(name, args)` batch over and over" case for that
829
849
  family without needing per-tool caps.
830
850
 
831
- ### 4.8 `metadata` (developer-supplied KV for filtering)
851
+ ### 4.8 `supervisor` (run judge)
852
+
853
+ `supervisor` controls the optional **run supervisor** — an LLM judge that
854
+ periodically reviews the agent's transcript (reasoning, tool calls, tool
855
+ results, visible text) and may steer the run:
856
+
857
+ - **`on_track`** — no-op; the run continues.
858
+ - **`redirect`** — a steering user message is injected; tools stay available.
859
+ - **`finalize`** — the next turn is forced tools-disabled so the run lands a
860
+ clean final answer.
861
+
862
+ Reviews fire every **`interval` LLM calls** (`completeTurn` invocations) at
863
+ the bottom of tool-emitting rounds. Default interval is **5** when enabled.
864
+
865
+ ```jsonc
866
+ "supervisor": {
867
+ "interval": 5 // optional — LLM calls between reviews; default 5
868
+ }
869
+
870
+ // or:
871
+ "supervisor": false // explicitly disable the platform judge for this run
872
+ ```
873
+
874
+ | Field | Type | Required | Notes |
875
+ | ----------------- | --------------- | -------- | ---------------------------------------------------------------------------------------------------------------------------- |
876
+ | `interval` | integer ≥ 1 | no | Defaults to **5** when the supervisor is enabled and `interval` is omitted. Capped at **100** server-side. |
877
+ | (literal `false`) | `false` | no | Disables the run supervisor for this run. `loopDetection` and `toolBudgets` still apply. |
878
+
879
+ **Defaults.** When `supervisor` is **omitted**, MANTYX enables the platform
880
+ LLM judge on ephemeral runs. Pass `"supervisor": false` to opt out.
881
+
882
+ **SDK-only usage.** When calling `@mantyx/ts-sdk` directly (not via
883
+ `POST /agent-runs`), the supervisor is **off unless explicitly configured**:
884
+ pass `supervisor: { review, interval? }` on `RunAgentOptions` to enable a
885
+ caller-supplied judge, or pass `supervisor: false` (or omit the field) to
886
+ keep it disabled. The wire field above controls the **platform-hosted** judge
887
+ on API/ephemeral runs only.
888
+
889
+ Validation (server-side, `400 invalid_request` on violation):
890
+
891
+ | Constraint | Limit |
892
+ | ----------------- | ----- |
893
+ | `interval` upper bound | `100` |
894
+
895
+ **Inheritance for sessions.**
896
+
897
+ - `POST /agent-sessions { supervisor }` — sets the session-default, applied
898
+ to every subsequent message run.
899
+ - `POST /agent-sessions/:id/messages { supervisor }` — optional per-message
900
+ override; applies to that one run only and does not mutate the session's
901
+ stored value.
902
+
903
+ **Observability.** Each review emits an SSE `supervisor` event (see §7),
904
+ including `on_track` checks, so SDK clients can render supervisor activity.
905
+ When `action` is `redirect` or `finalize`, the pipeline has already applied
906
+ the verdict by the time the event arrives.
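
A client that wants to render supervisor activity could model the event payload as a discriminated union. The field names follow the `supervisor` examples in §7; the `SupervisorEvent` type and `describeSupervisorEvent` function are hypothetical names for this sketch.

```typescript
// Payload shapes as they appear in the §7 `supervisor` event examples.
type SupervisorEvent =
  | { action: "on_track"; reason: string; llmCalls: number }
  | { action: "redirect"; reason: string; redirect: string; llmCalls: number }
  | { action: "finalize"; reason: string; llmCalls: number };

// Turns an event into a one-line status note. The verdict has already
// been applied server-side, so this is display-only.
function describeSupervisorEvent(event: SupervisorEvent): string {
  switch (event.action) {
    case "on_track":
      return `Supervisor check after ${event.llmCalls} LLM calls: on track (${event.reason})`;
    case "redirect":
      return `Supervisor redirected the run: ${event.redirect} (${event.reason})`;
    case "finalize":
      return `Supervisor forced a final answer (${event.reason})`;
  }
}
```
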
907
+
908
+ ### 4.9 `metadata` (developer-supplied KV for filtering)
832
909
 
833
910
  `metadata` is a flat string→string KV that is **persisted alongside the run /
834
911
  session** and surfaced in the MANTYX dashboard. Use it to tag runs with your
@@ -838,12 +915,12 @@ prompt.
838
915
 
839
916
  Validation (server-side, `400 invalid_request` on violation):
840
917
 
841
- | Constraint | Limit |
842
- | ------------------------- | ---------------------------------- |
843
- | Max entries | 16 |
844
- | Key pattern | `^[A-Za-z0-9._-]{1,64}$` |
845
- | Value type / length | string ≤ 256 chars |
846
- | Serialized JSON size | ≤ 4 KB |
918
+ | Constraint | Limit |
919
+ | -------------------- | ------------------------ |
920
+ | Max entries | 16 |
921
+ | Key pattern | `^[A-Za-z0-9._-]{1,64}$` |
922
+ | Value type / length | string ≤ 256 chars |
923
+ | Serialized JSON size | ≤ 4 KB |
847
924
 
848
925
  For session-scoped runs the inheritance rules are:
849
926
 
@@ -872,13 +949,18 @@ POST /api/v1/workspaces/{slug}/agent-runs/{runId}/cancel
872
949
  `POST /agent-runs` returns `202 Accepted` immediately:
873
950
 
874
951
  ```json
875
- { "runId": "run_abc", "streamUrl": "/api/v1/workspaces/acme/agent-runs/run_abc/stream" }
952
+ {
953
+ "runId": "run_abc",
954
+ "streamUrl": "/api/v1/workspaces/acme/agent-runs/run_abc/stream"
955
+ }
876
956
  ```
877
957
 
878
958
  `GET .../stream` is the canonical event channel; see §7.
879
959
 
880
960
  `GET /agent-runs/{runId}` returns the run snapshot (status, final text, error,
881
- spec) without subscribing to live events. Useful for polling long runs.
961
+ spec, plus the cost-attribution triple `tokens` / `turns` / `model`;
962
+ see §7.1) without subscribing to live events. Useful for polling long
963
+ runs or attributing spend after the SSE stream was already consumed.
882
964
 
883
965
  ## 6. Sessions
884
966
 
@@ -903,13 +985,15 @@ and returns `{ runId, streamUrl }` just like a one-shot run. Body:
903
985
  ```jsonc
904
986
  {
905
987
  "prompt": "What's in /etc/hosts?",
906
- "tools": [/* optional refresh of tool definitions */]
988
+ "tools": [
989
+ /* optional refresh of tool definitions */
990
+ ],
907
991
  }
908
992
  ```
909
993
 
910
994
  The server prepends the session's prior messages, runs the model, and on
911
995
  success appends the new user/assistant turns back to the session row. Local
912
- tool **handlers** are *not* persisted: the session stores definitions
996
+ tool **handlers** are _not_ persisted: the session stores definitions
913
997
  (name, schema, description) so that a restarted SDK can re-bind handlers and
914
998
  keep going.
915
999
 
@@ -939,9 +1023,6 @@ data: <utf-8 JSON>
939
1023
  `<type>` and `<data>` shapes:
940
1024
 
941
1025
  ```jsonc
942
- // running message
943
- { "seq": 1, "type": "started", "data": {} }
944
-
945
1026
  // streamed assistant tokens (zero or more per turn)
946
1027
  { "seq": 2, "type": "assistant_delta", "data": { "text": "Hello" } }
947
1028
 
@@ -978,9 +1059,30 @@ data: <utf-8 JSON>
978
1059
  // is observability so SDK clients can render "memory budget exhausted" status notes.
979
1060
  { "seq": 7, "type": "tool_budget_exceeded", "data": { "tool": "recall", "maxCalls": 4, "callIndex": 5 } }
980
1061
 
1062
+ // run-supervisor check (see §4.8). Fired on every review — on_track included.
1063
+ { "seq": 7, "type": "supervisor", "data": { "action": "on_track", "reason": "Agent is making progress.", "llmCalls": 5 } }
1064
+ { "seq": 8, "type": "supervisor", "data": { "action": "redirect", "reason": "Stuck re-querying.", "redirect": "Answer from the data you already have.", "llmCalls": 10 } }
1065
+ { "seq": 9, "type": "supervisor", "data": { "action": "finalize", "reason": "Enough to answer.", "llmCalls": 15 } }
1066
+
981
1067
  // terminal event
982
- { "seq": 8, "type": "result", "data": { "subtype": "success", "text": "Final reply" } }
983
- { "seq": 8, "type": "result", "data": { "subtype": "error_local_tool_timeout", "error": "..." } }
1068
+ // Every terminal `result` event also carries `tokens`, `turns`, and `model`
1069
+ // for cost attribution and dashboards; see §7.1. Older platforms (pre-
1070
+ // 2026-09) omit these fields; SDK clients detect "no usage data" by
1071
+ // checking that `model.provider` is empty / falsy.
1072
+ { "seq": 8, "type": "result", "data": {
1073
+ "subtype": "success",
1074
+ "text": "Final reply",
1075
+ "tokens": { "inputTokens": 1283, "cachedTokens": 512, "reasoningTokens": 96, "outputTokens": 240 },
1076
+ "turns": 3,
1077
+ "model": { "id": "platform:demo", "provider": "openai", "vendorModelId": "gpt-5.4-mini", "reasoningEffort": "low" }
1078
+ } }
1079
+ { "seq": 8, "type": "result", "data": {
1080
+ "subtype": "error_local_tool_timeout",
1081
+ "error": "...",
1082
+ "tokens": { "inputTokens": 980, "cachedTokens": 0, "reasoningTokens": 0, "outputTokens": 14 },
1083
+ "turns": 2,
1084
+ "model": { "id": "platform:demo", "provider": "anthropic", "vendorModelId": "claude-opus-4-7" }
1085
+ } }
984
1086
  { "seq": 8, "type": "cancelled", "data": {} }
985
1087
  ```
986
1088
 
@@ -991,6 +1093,117 @@ field and the parsed `type` inside `data` — they are always equal, but
991
1093
  implementations should rely on `data.type` because some HTTP middleware
992
1094
  strips the `event:` line.
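
A frame parser that follows this advice ignores the `event:` line and reads the type from the JSON payload. The sketch assumes the `data:` payload is the full `{ seq, type, data }` envelope shown in the examples above; `WireEvent` and `parseSseFrame` are hypothetical names.

```typescript
interface WireEvent {
  seq: number;
  type: string;
  data: Record<string, unknown>;
}

// Parses one SSE frame (the text between blank lines). Only `data:`
// lines are used, so a stripped `event:` line makes no difference.
function parseSseFrame(frame: string): WireEvent | null {
  const dataLines = frame
    .split(/\r?\n/)
    .filter((line) => line.startsWith("data:"))
    .map((line) => line.slice("data:".length).replace(/^ /, ""));
  if (dataLines.length === 0) return null;
  return JSON.parse(dataLines.join("\n")) as WireEvent;
}
```
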
993
1095
 
1096
+ ### 7.1 Cost-attribution fields (`tokens`, `turns`, `model`)
1097
+
1098
+ Every terminal `result` SSE event (and every terminal `error` event on
1099
+ platforms that emit it — see `docs/wire-protocol.md` §4.7) carries three
1100
+ additional fields so callers can drive cost dashboards, per-turn budgets,
1101
+ and provider/model spend reports without a follow-up
1102
+ `GET /agent-runs/:runId` round trip. The same fields are persisted on the
1103
+ `EphemeralAgentRun` row and surfaced by that endpoint.
1104
+
1105
+ | Field | Type | Notes |
1106
+ | -------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
1107
+ | `tokens` | object | Per-run token totals aggregated across every model invocation. See schema below. |
1108
+ | `turns` | int | Total `engine.completeTurn(...)` invocations for the run. Counts the failing call too — so a single-shot run is `1`, a tool loop is `>= 2`, and a run that errored on its first model call is `1`. Distinct from "tool turns" — `turns` is **model invocations**, regardless of whether the model called any tools. |
1109
+ | `model` | object | Resolved model that actually executed the run. See schema below. |
1110
+
1111
+ Always present on the terminal event for runs created against
1112
+ **MANTYX ≥ 2026-09** servers. Older servers omit these fields entirely;
1113
+ SDK clients (TS/Go/Python) detect "no usage data" by checking that
1114
+ `model.provider` is empty / falsy. JSON keys follow MANTYX's standard
1115
+ camelCase wire convention.
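
The "no usage data" check described above reduces to a truthiness test on `model.provider`. A minimal sketch, with `hasUsageData` as a hypothetical helper name:

```typescript
// True when the terminal event carries cost-attribution fields.
// Older servers omit `model` entirely, so every step is optional.
function hasUsageData(result: { model?: { provider?: string } }): boolean {
  return Boolean(result.model?.provider);
}
```
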
1116
+
1117
+ **`tokens` schema** — mirrors the wire shape produced by
1118
+ `tokenUsageToWireTokens` in `packages/ts-sdk/src/usage-wire.ts`:
1119
+
1120
+ | Field | Type | Notes |
1121
+ | ----------------- | ---- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
1122
+ | `inputTokens` | int | **Total billable input** — fresh prompt tokens **plus** the cached-read slice the provider still bills (at a discount) **plus** any cache-creation tokens **plus** tool-prompt tokens. Equal to the sum of every provider-reported input bucket for the run. |
1123
+ | `cachedTokens` | int | The discounted slice of `inputTokens` that came from a prompt cache hit (Anthropic prompt caching, OpenAI cached prompt, Gemini implicit cache). `0` when the provider doesn't report cache reads or the run didn't hit cache. |
1124
+ | `reasoningTokens` | int | Non-visible thinking tokens. **Already counted inside `outputTokens`** — surfaced separately so dashboards can break out "thinking cost" vs visible output. `0` when the model didn't reason or didn't report it. |
1125
+ | `outputTokens` | int | **All** tokens the model emitted for this run, visible + reasoning. Matches the provider's "completion tokens" / "output tokens" billing line. |
1126
+
1127
+ `inputTokens` and `outputTokens` together cover every billable token the
1128
+ run consumed; `cachedTokens` and `reasoningTokens` are diagnostic
1129
+ breakdowns _inside_ those two totals (not separate buckets to be added).
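
Because the two breakdown fields sit inside the totals, a dashboard derives the remaining slices by subtraction rather than addition. The sketch below uses the field names from the table; `breakDownTokens` and its output field names are assumptions for illustration.

```typescript
interface WireTokens {
  inputTokens: number;
  cachedTokens: number;
  reasoningTokens: number;
  outputTokens: number;
}

function breakDownTokens(t: WireTokens) {
  return {
    uncachedInput: t.inputTokens - t.cachedTokens,      // everything except cache reads
    cachedInput: t.cachedTokens,
    visibleOutput: t.outputTokens - t.reasoningTokens,  // output minus thinking
    reasoningOutput: t.reasoningTokens,
    totalBillable: t.inputTokens + t.outputTokens,      // breakdowns are NOT added again
  };
}
```

For the `success` example in §7 (`inputTokens: 1283`, `cachedTokens: 512`, `reasoningTokens: 96`, `outputTokens: 240`), this gives 771 uncached input tokens, 144 visible output tokens, and 1523 billable tokens in total.
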
1130
+
1131
+ **`model` schema** — fields the platform stamps onto every successful
1132
+ or failed run:
1133
+
1134
+ | Field | Type | Notes |
1135
+ | ----------------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
1136
+ | `id` | string | Catalog id — the same string a caller would pass back as `modelId` to re-select this exact entry (e.g. `"platform:demo"`, `"provider:cmf…"`). Empty string against legacy fallbacks that didn't synthesise a catalog id. |
1137
+ | `provider` | string | Lowercase provider id: `"openai"`, `"anthropic"`, `"google"`, `"azure-openai"`. |
1138
+ | `vendorModelId` | string | The model id the platform actually sent to the provider (e.g. `"gpt-5.4-mini"`, `"claude-opus-4-7"`, `"gemini-2.5-pro"`). Carried through from the `model` field on `AgentSpec` after resolution. |
1139
+ | `reasoningEffort` | string | Optional. `"off"`, `"low"`, `"medium"`, `"high"`. Omitted when the provider doesn't expose a reasoning-level knob or the run didn't request one. |
1140
+
1141
+ **Per-provider token mapping.** Provider responses vary in how they
1142
+ report token usage. MANTYX normalises them into the wire shape above as
1143
+ follows:
1144
+
1145
+ | Provider | `inputTokens` ← | `cachedTokens` ← | `reasoningTokens` ← | `outputTokens` ← |
1146
+ | --------- | ----------------------------------------------------------------------------------------------- | ------------------------------------------- | ----------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
1147
+ | OpenAI | `usage.prompt_tokens` (already includes cached read tokens) | `usage.prompt_tokens_details.cached_tokens` | `usage.completion_tokens_details.reasoning_tokens` | `usage.completion_tokens` |
1148
+ | Anthropic | `usage.input_tokens` + `usage.cache_read_input_tokens` + `usage.cache_creation_input_tokens` | `usage.cache_read_input_tokens` | (extended-thinking tokens; folded into `output_tokens` by the provider) | `usage.output_tokens` |
1149
+ | Google | `usageMetadata.promptTokenCount` + `usageMetadata.cachedContentTokenCount` + tool-prompt tokens | `usageMetadata.cachedContentTokenCount` | `usageMetadata.thoughtsTokenCount` | `usageMetadata.candidatesTokenCount` (or `totalTokenCount - promptTokenCount` for older Gemini SDKs) |
1150
+
1151
+ If a provider doesn't report a given bucket the corresponding field is
1152
+ `0`, never `null`.
1153
+
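The mapping table can be sketched in code for two of the providers. This is an illustration of the documented normalisation, not MANTYX's actual implementation: the function names and partial usage interfaces are assumptions, limited to the provider fields named in the table. Reporting `0` for Anthropic `reasoningTokens` is also an assumption, based on the table listing no separate provider field for it.

```typescript
interface TokenUsage {
  inputTokens: number;
  cachedTokens: number;
  reasoningTokens: number;
  outputTokens: number;
}

// Partial provider payloads: only the fields the mapping table references.
interface OpenAIUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  prompt_tokens_details?: { cached_tokens?: number };
  completion_tokens_details?: { reasoning_tokens?: number };
}

interface AnthropicUsage {
  input_tokens?: number;
  output_tokens?: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

// A missing or null bucket becomes 0, never null.
const orZero = (v: number | null | undefined): number => v ?? 0;

function fromOpenAI(u: OpenAIUsage): TokenUsage {
  return {
    inputTokens: orZero(u.prompt_tokens), // already includes cached reads
    cachedTokens: orZero(u.prompt_tokens_details?.cached_tokens),
    reasoningTokens: orZero(u.completion_tokens_details?.reasoning_tokens),
    outputTokens: orZero(u.completion_tokens),
  };
}

function fromAnthropic(u: AnthropicUsage): TokenUsage {
  const cacheRead = orZero(u.cache_read_input_tokens);
  return {
    inputTokens:
      orZero(u.input_tokens) + cacheRead + orZero(u.cache_creation_input_tokens),
    cachedTokens: cacheRead,
    // Assumption: thinking tokens are folded into output_tokens by the
    // provider and not reported separately, so this sketch reports 0.
    reasoningTokens: 0,
    outputTokens: orZero(u.output_tokens),
  };
}
```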
1154
+ **Tool-loop accounting.** When the run executes tool turns, every
1155
+ `engine.completeTurn(...)` invocation contributes its usage to the
1156
+ aggregated `tokens` object — so a run with one tool round (model →
1157
+ tool → model) reports `turns: 2` and the **sum** of both model calls'
1158
+ token usage. The terminal event carries only the cumulative totals, with
1159
+ no per-turn breakdown (use the
1160
+ `assistant_message` events for per-turn observability).
1161
+
1162
+ **Snapshot exposure.** `GET /api/v1/workspaces/{slug}/agent-runs/{runId}`
1163
+ also returns `tokens` / `turns` / `model` on the run snapshot JSON, with
1164
+ the same wire shape. The keys are always present (`null` until the
1165
+ worker writes the terminal event, and on legacy rows that predate the
1166
+ rollout), so SDK clients can probe server capability via
1167
+ `"tokens" in body` without having to distinguish `undefined` from
1168
+ `null` after HTTP/JSON serialization.
1169
+
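A client-side probe along these lines might look like the sketch below. The `usageState` function and its three state names are hypothetical. The sketch only relies on the documented behaviour that the `tokens` key is always present on capable servers and is `null` until usage has been written.

```typescript
// A parsed run snapshot body; only the presence and value of `tokens` matter here.
type Snapshot = Record<string, unknown>;

type UsageState = "unsupported" | "not_recorded" | "available";

function usageState(body: Snapshot): UsageState {
  // Key missing entirely: the server predates usage reporting.
  if (!("tokens" in body)) return "unsupported";
  // Key present but null: the terminal event has not been written yet,
  // or this is a legacy row.
  if (body.tokens === null) return "not_recorded";
  return "available";
}

console.log(usageState({ tokens: null }));
```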
1170
+ **A2A exposure.** The MANTYX-hosted A2A endpoint
1171
+ (`POST /api/a2a/{workspaceSlug}/agents/{agentSlug}`) returns the same
1172
+ triple on the JSON-RPC response under `result.metadata.mantyx`:
1173
+
1174
+ ```jsonc
1175
+ {
1176
+ "result": {
1177
+ "kind": "message",
1178
+ "messageId": "msg_abc",
1179
+ "role": "agent",
1180
+ "parts": [{ "kind": "text", "text": "Final reply" }],
1181
+ "metadata": {
1182
+ "mantyx": {
1183
+ "tokens": {
1184
+ "inputTokens": 1283,
1185
+ "cachedTokens": 512,
1186
+ "reasoningTokens": 96,
1187
+ "outputTokens": 240,
1188
+ },
1189
+ "turns": 3,
1190
+ "model": {
1191
+ "id": "platform:demo",
1192
+ "provider": "openai",
1193
+ "vendorModelId": "gpt-5.4-mini",
1194
+ "reasoningEffort": "low",
1195
+ },
1196
+ },
1197
+ },
1198
+ },
1199
+ }
1200
+ ```
1201
+
1202
+ The `metadata.mantyx` block is omitted entirely by legacy runners
1203
+ that haven't implemented `runWithUsage` on the A2A adapter (see
1204
+ `packages/ts-sdk/src/a2a/adapter.ts`); cross-platform A2A clients
1205
+ should treat its absence as "no usage data" rather than as zero usage.
1206
+
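A cross-platform A2A client could read the block defensively, as in this sketch. The interface and function names are hypothetical, and the response type is trimmed to the fields shown in the example above. Returning `undefined` (rather than a zero-filled object) keeps "no usage data" distinct from zero usage.

```typescript
interface MantyxUsageMeta {
  tokens: {
    inputTokens: number;
    cachedTokens: number;
    reasoningTokens: number;
    outputTokens: number;
  };
  turns: number;
  model: {
    id: string;
    provider: string;
    vendorModelId: string;
    reasoningEffort?: string;
  };
}

// Trimmed JSON-RPC response shape: only the path to metadata.mantyx.
interface A2AResponse {
  result?: { metadata?: { mantyx?: MantyxUsageMeta } };
}

// Returns undefined when the runner did not attach usage data.
function readMantyxUsage(response: A2AResponse): MantyxUsageMeta | undefined {
  return response.result?.metadata?.mantyx;
}

const usage = readMantyxUsage({ result: { metadata: {} } });
console.log(usage === undefined ? "no usage data" : `turns: ${usage.turns}`);
```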
994
1207
  ## 8. Local tool result
995
1208
 
996
1209
  ```
@@ -1027,23 +1240,25 @@ All non-2xx responses use this body shape:
1027
1240
 
1028
1241
  ```jsonc
1029
1242
  {
1030
- "error": "invalid_model", // machine-readable code
1243
+ "error": "invalid_model", // machine-readable code
1031
1244
  "message": "Model 'foo' is ambiguous; pick one of: provider:cm6...",
1032
- "candidates": [/* sometimes present */]
1245
+ "candidates": [
1246
+ /* sometimes present */
1247
+ ],
1033
1248
  }
1034
1249
  ```
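An SDK might narrow a non-2xx response body to this shape before branching on the code. The sketch below is one way to do that. The `MantyxApiError` and `isMantyxApiError` names are hypothetical, and treating `candidates` as an untyped array is an assumption, since its element shape is not shown here.

```typescript
interface MantyxApiError {
  error: string; // machine-readable code, e.g. "invalid_model"
  message: string; // human-readable explanation
  candidates?: unknown[]; // sometimes present; element shape not assumed
}

// Type guard: checks only the two fields shown as always present.
function isMantyxApiError(body: unknown): body is MantyxApiError {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return typeof b.error === "string" && typeof b.message === "string";
}

const parsed: unknown = JSON.parse(
  '{"error":"invalid_model","message":"Model \'foo\' is ambiguous"}',
);
if (isMantyxApiError(parsed)) {
  console.log(parsed.error);
}
```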
1035
1250
 
1036
1251
  Common codes:
1037
1252
 
1038
- | Code | HTTP | Notes |
1039
- | ---------------------- | ---: | ----- |
1040
- | `unauthorized` | 401 | Missing/invalid API key |
1041
- | `not_found` | 404 | Workspace, run, or session unknown |
1042
- | `invalid_request` | 400 | Body failed Zod validation |
1043
- | `invalid_model` | 400 | `modelId` couldn't be resolved |
1044
- | `unknown_tool_use` | 404 | Tool-result for an unknown `toolUseId` |
1045
- | `run_terminal` | 409 | Tool-result after run finished |
1046
- | `rate_limited` | 429 | Per-API-key sliding window |
1253
+ | Code | HTTP | Notes |
1254
+ | ------------------ | ---: | -------------------------------------- |
1255
+ | `unauthorized` | 401 | Missing/invalid API key |
1256
+ | `not_found` | 404 | Workspace, run, or session unknown |
1257
+ | `invalid_request` | 400 | Body failed Zod validation |
1258
+ | `invalid_model` | 400 | `modelId` couldn't be resolved |
1259
+ | `unknown_tool_use` | 404 | Tool-result for an unknown `toolUseId` |
1260
+ | `run_terminal` | 409 | Tool-result after run finished |
1261
+ | `rate_limited` | 429 | Per-API-key sliding window |
1047
1262
 
1048
1263
  ## 11. Suggested client architecture
1049
1264
 
@@ -1061,8 +1276,8 @@ A reference SDK should:
1061
1276
  model-side "don't double-call" hint without hand-editing the
1062
1277
  description.
1063
1278
  - **Local A2A peers** (`kind: "a2a_local"`) — caller-supplied A2A
1064
- clients. Resolve the peer's Agent Card *first* (e.g. `fetch
1065
- "<peer>/.well-known/agent-card.json"` or read from a local registry),
1279
+ clients. Resolve the peer's Agent Card _first_ (e.g. `fetch
1280
+ "<peer>/.well-known/agent-card.json"` or read from a local registry),
1066
1281
  attach it to the spec as `agentCard`, and in the dispatcher look the
1067
1282
  client up by `agentCard.url` (or any other field you indexed on)
1068
1283
  when the `local_tool_call` arrives.
@@ -1074,9 +1289,10 @@ A reference SDK should:
1074
1289
 
1075
1290
  `mantyx`, `mantyx_plugin`, `a2a`, and `mcp` refs are server-resolved —
1076
1291
  no SDK-side registry needed.
1292
+
1077
1293
  3. On `runAgent` / `session.send`:
1078
1294
  - Accept `reasoningLevel` from the caller and pass it through unchanged
1079
- (string `"off" | "low" | "medium" | "high"` *or* number `0–100`); do
1295
+ (string `"off" | "low" | "medium" | "high"` _or_ number `0–100`); do
1080
1296
  **not** translate to a vendor-specific knob — the server owns that
1081
1297
  mapping so all SDKs stay aligned with the web composer.
1082
1298
  - POST the run/message, get `{ runId, streamUrl }`.
@@ -1088,18 +1304,18 @@ A reference SDK should:
1088
1304
  - Treat `thinking_delta` events as opt-in callback fodder; many UIs hide
1089
1305
  them by default. Their presence depends on `reasoningLevel > 0` and
1090
1306
  on the active model exposing thought parts.
1091
- - Accept `loopDetection` and `toolBudgets` from the caller and pass
1092
- them through unchanged (see §4.6 / §4.7). Both fields are *additive*:
1093
- omitting them keeps MANTYX's runtime defaults; passing
1094
- `loopDetection: false` opts out; passing `toolBudgets: {}` clears the
1095
- defaults; passing entries layers caller overrides on top of the
1096
- defaults.
1097
- - Treat `loop_detected` and `tool_budget_exceeded` SSE events as
1098
- observability-only — the server already substituted the synthetic
1099
- tool-results / steering nudges, so the SDK's job is just to surface
1100
- the event to the caller (status banner, log line, telemetry). Do
1101
- **not** abort the run on these events; the run continues through
1102
- `result` / `error` / `cancelled` as usual.
1307
+ - Accept `loopDetection`, `toolBudgets`, and `supervisor` from the caller
1308
+ and pass them through unchanged (see §4.6 / §4.7 / §4.8). All three are
1309
+ _additive_: omitting them keeps MANTYX's runtime defaults; passing
1310
+ `loopDetection: false` or `supervisor: false` opts out; passing
1311
+ `toolBudgets: {}` clears the defaults; passing entries layers caller
1312
+ overrides on top of the defaults.
1313
+ - Treat `loop_detected`, `tool_budget_exceeded`, and `supervisor` SSE
1314
+ events as observability-only — the server already substituted synthetic
1315
+ tool-results / steering nudges / supervisor verdicts where applicable, so
1316
+ the SDK's job is just to surface the event to the caller (status banner,
1317
+ log line, telemetry). Do **not** abort the run on these events; the run
1318
+ continues through `result` / `error` / `cancelled` as usual.
1103
1319
  - On terminal `result`, resolve the call. On `error` subtype, throw.
1104
1320
  4. Re-emit assistant deltas/events as a stream/iterator for callers who care
1105
1321
  about live output.