@mantyx/sdk 0.9.0 → 0.10.0

@@ -0,0 +1,1102 @@
+ # MANTYX Wire Protocol — messaging & data structures
+
+ This is the SDK-builder reference for the **messaging layer** that sits on top
+ of the HTTP / SSE endpoints documented in
+ [`agent-runs-protocol.md`](./agent-runs-protocol.md). It catalogs every
+ message shape MANTYX and a client SDK exchange during an agent run, in the
+ order they flow on the wire, and pins down the resolved data structures the
+ SDK is expected to ship for client-resolved (`*_local`) tools.
+
+ If you're just looking for HTTP routes, auth, body shapes, or session
+ semantics, start with `agent-runs-protocol.md`. If you're writing or
+ maintaining an SDK and want to know *exactly* what a `local_tool_call` event
+ looks like for `mcp_local`, you're in the right place.
+
+ > **Authentication.** Every example below uses
+ > `Authorization: Bearer <api-key>` for brevity. The same header also
+ > accepts a MANTYX OAuth 2.0 access token (`mantyx_at_…`) — the server
+ > resolves either kind by token-prefix, so SDKs only need a single
+ > credential code path. OAuth tokens additionally enforce per-route
+ > **scopes** (`runs:read`, `runs:write`, `sessions:read`, `sessions:write`,
+ > `models:read`, `mantyx.identity:read`); see §2 of
+ > `agent-runs-protocol.md` for the per-endpoint scope table and
+ > [`docs/oauth.md`](./oauth.md) for the registration / Authorization Code
+ > with PKCE flow.
+
+ > **Stability.** Field names listed in *bold* are part of the documented
+ > stable surface. Any other fields are passed through verbatim and survive
+ > round-trips, but their semantics are not contractually guaranteed. The
+ > server uses Zod with `passthrough` for all `*_local` resolved-content
+ > blobs (Agent Card, MCP `Tool[]`, server `Implementation`) so future spec
+ > additions flow through without a server-side schema bump.
+
+ ---
+
+ ## 0. Glossary
+
+ | Term | Meaning |
+ | ------------------- | ------- |
+ | **MANTYX** | The agent operating system server (this repo). Owns LLM orchestration, tool execution for server-resolved tools, persistence. |
+ | **SDK** | Anything calling the public agent-runs API — typically `@mantyx/ts-sdk`, but also other-language SDKs and direct HTTP clients. |
+ | **Agent run** | A single LLM execution. Streams events; ends with a terminal `result` / `error` / `cancelled`. |
+ | **Spec** | The JSON object describing what the run does — model, prompt, tools, budgets, optional `reasoningLevel`. Sent in the `POST /agent-runs` (or `.../messages`) body. |
+ | **Tool ref** | One entry in `spec.tools[]`. A discriminated union keyed by `kind`. |
+ | **Server-resolved** | A tool MANTYX executes itself (`mantyx`, `mantyx_plugin`, `a2a`, `mcp`). The SDK only sees informational `tool_result` events. |
+ | **Client-resolved** | A tool the SDK executes (`local`, `a2a_local`, `mcp_local`). MANTYX emits `local_tool_call`, the SDK does the work, the SDK posts back to `.../tool-results`. |
+ | **Resolution** | The act of turning an external resource (A2A peer, MCP server) into a self-contained JSON document the model can reason about. For `*_local` kinds, resolution is the **SDK's** responsibility. |
+
+ ---
+
+ ## 1. The full message lifecycle
+
+ ```text
+ SDK MANTYX
+ │ │
+ │ ── (resolve A2A cards / MCP catalogs ──────▶ │ (offline, SDK-side)
+ │ locally; cache as needed) │
+ │ │
+ │ ── POST /agent-runs ─────────────────────▶ │
+ │ body: spec (model, prompt, tools, …) │
+ │ │
+ │ ◀────────────────────── 201 { runId, │
+ │ streamUrl } │
+ │ │
+ │ ── GET streamUrl (text/event-stream) ────▶ │
+ │ │
+ │ ┌─────────────────────────────────┤ ───┐
+ │ ◀ SSE event│ assistant_delta │ │
+ │ ◀ SSE event│ thinking_delta (iff reasoning) │ │ model loop
+ │ ◀ SSE event│ tool_result (server-resolved) │ │
+ │ ◀ SSE event│ local_tool_call ◀──┐ │ │
+ │ └────────────────────┼────────────┤ ───┘
+ │ │ │
+ │ ── POST .../tool-results ───────┘──────────▶ │
+ │ { toolUseId, result | error } │
+ │ │
+ │ ◀ SSE event result (terminal) │
+ │ │
+ │ (close stream) │
+ ```
+
+ The lifecycle has three logical stages:
+
+ 1. **Setup (SDK-only).** For any `*_local` tool the SDK plans to expose, it
+ pre-resolves the resource locally. For `a2a_local` that means fetching
+ or constructing the peer's [Agent Card](https://google.github.io/A2A/specification/#agent-card)
+ JSON. For `mcp_local` that means speaking MCP `Initialize` + `tools/list`
+ and capturing the result. All resolved content is shipped inline inside
+ the spec; nothing is fetched server-side.
+ 2. **Run (MANTYX-driven).** The SDK opens the SSE stream and listens. MANTYX
+ runs the LLM loop, executing server-resolved tools itself and emitting
+ `local_tool_call` for client-resolved ones. The SDK answers each
+ `local_tool_call` by posting back a tool result.
+ 3. **Termination.** A terminal `result` (success), `error` (failure), or
+ `cancelled` event closes the run. The SSE stream is then safe to close.
+
+ ---
+
+ ## 2. Spec submission (SDK → MANTYX)
+
+ `POST /api/v1/workspaces/{slug}/agent-runs` accepts the spec body. Only the
+ new wire-protocol additions are documented in detail here; for the full
+ spec body shape (system prompt, tool budgets, sessions, `agentId`
+ short-circuit, etc.) see `agent-runs-protocol.md` §4.
+
+ ### 2.1 Spec body (top-level shape)
+
+ ```jsonc
+ {
+ "modelId": "openai:gpt-5.5",
+ "systemPrompt": "...",
+ "prompt": "...", // OR "messages": [...]
+ "tools": [ /* tool refs — see §3 */ ],
+ "reasoningLevel": "medium", // optional; see §6
+ "budgets": { "maxToolTurns": 32 },
+ "outputSchema": { // optional; see §7
+ "name": "weather_report", // defaults to "output"
+ "schema": { /* JSON Schema */ }
+ },
+ "loopDetection": { // optional; see §8
+ "consecutiveThreshold": 3,
+ "hardCutoffThreshold": 6
+ },
+ "toolBudgets": { // optional; see §8
+ "recall": { "maxCalls": 4 },
+ "hive_consult_ontology": { "maxCalls": 4 }
+ },
+ "metadata": { "customer": "acme" } // optional, free-form k/v
+ }
+ ```
+
+ ### 2.2 Sessions
+
+ Same body shape, posted to `POST /agent-sessions/:id/messages`. The session
+ keeps the conversation history; per-message `tools`, `reasoningLevel`,
+ `outputSchema`, `loopDetection`, and `toolBudgets` *replace* the session's
+ defaults for that single run only — the next run falls back to whatever
+ the session was created with.
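The override rule can be read as a shallow, per-field replacement. The following is a minimal sketch under that reading, not the server's implementation: `RunOverrides` and `effectiveSettings` are hypothetical names, the `"high"` level is an assumed value, and whole-field replacement (no deep merge) is an assumption drawn from the word *replace* above.

```ts
// Hypothetical types for illustration only; not part of any MANTYX package.
type RunOverrides = {
  tools?: unknown[];
  reasoningLevel?: string;
  outputSchema?: { name?: string; schema: object };
  loopDetection?: { consecutiveThreshold?: number; hardCutoffThreshold?: number };
  toolBudgets?: Record<string, { maxCalls: number }>;
};

const OVERRIDABLE: (keyof RunOverrides)[] = [
  "tools", "reasoningLevel", "outputSchema", "loopDetection", "toolBudgets",
];

// Settings for ONE run: each field present on the message replaces the
// session default whole (assumed: no deep merge). Inputs are not mutated,
// so the next run starts again from the untouched session defaults.
function effectiveSettings(session: RunOverrides, message: RunOverrides): RunOverrides {
  const merged: Record<string, unknown> = { ...session };
  for (const key of OVERRIDABLE) {
    if (message[key] !== undefined) merged[key] = message[key];
  }
  return merged as RunOverrides;
}

const sessionDefaults: RunOverrides = {
  reasoningLevel: "medium",
  toolBudgets: { recall: { maxCalls: 4 } },
};
// "high" is an assumed level name; only "medium" appears in this document.
const oneOff = effectiveSettings(sessionDefaults, { reasoningLevel: "high" });
```

Here `oneOff` carries the overridden `reasoningLevel` together with the session's `toolBudgets`, while `sessionDefaults` itself is left as it was.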
+
+ ---
+
+ ## 3. Tool ref taxonomy
+
+ Every entry in `spec.tools[]` is one of the seven shapes below. The
+ *resolution column* is the contract that drives everything else: **server**
+ means MANTYX runs the tool itself and the SDK only ever sees a
+ `tool_result` event; **client** means MANTYX is a transport and the SDK
+ must answer `local_tool_call` events.
+
+ | Kind | Resolution | Wire-payload contract |
+ | ---------------- | ---------- | --------------------- |
+ | `mantyx` | server | `{ id }` reference to a workspace `Tool` row. |
+ | `mantyx_plugin` | server | `{ name }` reference to a platform plugin tool. |
+ | `local` | client | `{ name, description?, parameters?, outputSchema?, longRunning? }` — `parameters` is **JSON Schema** (object schema with `properties`/`required`); forwarded verbatim to the LLM provider and validated against incoming tool-call args before execution. `outputSchema` (optional) is JSON Schema for the tool's structured return value, surfaced to providers that accept per-tool response schemas. `longRunning` (optional, default `false`) annotates the model-facing description with a "don't double-call while pending" hint so every provider treats the tool as long-running. |
+ | `a2a` | server | `{ name, agentCardUrl, headers?, contextId?, description? }`. |
+ | `a2a_local` | client | `{ name, agentCard }` — **resolved A2A Agent Card JSON content**. |
+ | `mcp` | server | `{ name, url, headers?, toolFilter? }`. |
+ | `mcp_local` | client | `{ name, serverInfo?, tools[] }` — **resolved MCP `Tool[]`**. |
+
+ The remainder of this document focuses on `local`, `a2a_local`, and
+ `mcp_local`, because they're the ones that carry SDK-defined structured
+ content. For the wire shapes of the four server-resolved kinds, see
+ `agent-runs-protocol.md` §4.
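For SDK authors, the table maps naturally onto a discriminated union keyed by `kind`. The types below are an illustrative sketch: the names `ToolRef` and `isClientResolved` are this example's own, optional fields of the server-resolved kinds are abridged, and resolved blobs are typed loosely as `object`.

```ts
// Illustrative types derived from the table above; not exported by any MANTYX package.
type ToolRef =
  | { kind: "mantyx"; id: string }
  | { kind: "mantyx_plugin"; name: string }
  | {
      kind: "local";
      name: string;
      description?: string;
      parameters?: object;
      outputSchema?: object;
      longRunning?: boolean;
    }
  | { kind: "a2a"; name: string; agentCardUrl: string }
  | { kind: "a2a_local"; name: string; agentCard: object }
  | { kind: "mcp"; name: string; url: string }
  | { kind: "mcp_local"; name: string; serverInfo?: object; tools: object[] };

// Client-resolved kinds are the ones whose calls arrive as `local_tool_call`
// and must be answered by the SDK; the rest only produce `tool_result`.
function isClientResolved(ref: ToolRef): boolean {
  switch (ref.kind) {
    case "local":
    case "a2a_local":
    case "mcp_local":
      return true;
    case "mantyx":
    case "mantyx_plugin":
    case "a2a":
    case "mcp":
      return false;
  }
}
```

Listing every kind in the `switch` lets the compiler flag the function if a new `kind` is added to the union later.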
+
+ ### 3.1 `kind: "local"` — generic local tools
+
+ The minimal client-resolved tool: the SDK declares a name + JSON Schema
+ and implements the handler in its own process. Useful for any tool MANTYX
+ shouldn't (or can't) execute itself — file system access, on-device APIs,
+ caller-specific business logic.
+
+ **Wire shape:**
+
+ ```jsonc
+ {
+ "kind": "local",
+ "name": "send_email", // model-facing; /^[a-zA-Z0-9_]{1,64}$/
+ "description": "Send a transactional email.",
+ "parameters": { // OPTIONAL; JSON Schema for args
+ "type": "object",
+ "properties": {
+ "to": { "type": "string", "format": "email" },
+ "subject": { "type": "string" },
+ "body": { "type": "string" }
+ },
+ "required": ["to", "subject", "body"],
+ "additionalProperties": false
+ },
+ "outputSchema": { // OPTIONAL; JSON Schema for the return value
+ "type": "object",
+ "properties": { "id": { "type": "string" } },
+ "required": ["id"],
+ "additionalProperties": false
+ },
+ "longRunning": false // OPTIONAL; default false
+ }
+ ```
+
+ **Field reference:**
+
+ | Field | Required | Notes |
+ | -------------- | -------- | ----- |
+ | `kind` | yes | Discriminator literal `"local"`. |
+ | `name` | yes | Model-facing tool name. Must match `/^[a-zA-Z0-9_]{1,64}$/`. |
+ | `description` | no | Free-form. When omitted the model sees an empty description (acceptable but reduces tool selection accuracy). |
+ | `parameters` | no | JSON Schema for the tool's input. Must be an object schema (`type: "object"` with `properties`); other shapes are coerced to an empty object schema server-side. Nested constraints (`array.items`, `enum`, `anyOf`, …) are preserved end-to-end. Args that fail server-side validation produce a structured `tool_input_invalid` tool result the model can recover from instead of crashing the call. |
+ | `outputSchema` | no | JSON Schema for the structured value the tool returns. Forwarded to providers with per-tool response schemas (Gemini's `responseJsonSchema` on the FunctionDeclaration); other engines surface it through the description and rely on host-side validation. The model uses it to plan follow-up arguments more reliably. Must be an object schema; non-object roots are dropped server-side (engines reject non-object roots in this position). |
+ | `longRunning`  | no       | When `true`, MANTYX appends a stable hint to the description:<br>*"NOTE: This is a long-running operation. Do not call this tool again if it has already returned an intermediate or pending status."*<br>Useful for tools where a single call may yield a `pending` / status response and the SDK polls on its own; without the hint, models routinely fire repeat calls and waste turns. Purely declarative: MANTYX does not change scheduling. |
+
+ **Tool call dispatch.** When the model calls a `local` tool, the SSE
+ stream emits `local_tool_call` with `kind: "local"` (or omitted, for
+ backward compatibility). The SDK runs the handler and POSTs back to
+ `.../tool-results`. See §4.3.1 for the event shape.
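On the SDK side, dispatch for `local` tools usually reduces to a name-keyed handler table. The sketch below is one possible shape, not the official SDK's: `registerLocalTool`, `runLocalTool`, and the type names are hypothetical, handlers are synchronous for brevity, and a plain string for `error` is an assumption (the diagram in §1 only shows `{ toolUseId, result | error }`).

```ts
// The name pattern is quoted from the field reference above.
// Everything else in this block is a hypothetical illustration.
const TOOL_NAME = /^[a-zA-Z0-9_]{1,64}$/;

type LocalHandler = (args: Record<string, unknown>) => unknown;
type LocalToolCall = { toolUseId: string; name: string; args: Record<string, unknown> };
type ToolResultBody =
  | { toolUseId: string; result: unknown }
  | { toolUseId: string; error: string }; // assumed: error is a message string

const handlers: Record<string, LocalHandler> = {};

function registerLocalTool(name: string, handler: LocalHandler): void {
  if (!TOOL_NAME.test(name)) throw new Error(`invalid tool name: ${name}`);
  handlers[name] = handler;
}

// Build the tool-results body for one `local_tool_call`. A missing handler or
// a thrown exception becomes an `error` body, so the caller always has
// something to post back.
function runLocalTool(call: LocalToolCall): ToolResultBody {
  // Own-property check so inherited keys such as "toString" are not handlers.
  const known = Object.prototype.hasOwnProperty.call(handlers, call.name);
  const handler = known ? handlers[call.name] : undefined;
  if (!handler) {
    return { toolUseId: call.toolUseId, error: `unknown local tool: ${call.name}` };
  }
  try {
    return { toolUseId: call.toolUseId, result: handler(call.args) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { toolUseId: call.toolUseId, error: message };
  }
}

registerLocalTool("compute_total", (args) => ({
  total: Number(args["amount"]),
  currency: args["currency"],
}));
```

The returned body would then be sent to `.../tool-results`; validating the name at registration time surfaces a bad name before the spec is ever submitted.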
+
+ ### 3.2 `a2a_local` — SDK-resolved Agent Card
+
+ The defining feature of `a2a_local` is that the SDK ships a fully-resolved
+ [A2A Agent Card](https://google.github.io/A2A/specification/#agent-card) as
+ the `agentCard` field. MANTYX never reaches out to discover it.
+
+ **Wire shape:**
+
+ ```jsonc
+ {
+ "kind": "a2a_local",
+ "name": "intranet_hr_agent", // model-facing; /^[a-zA-Z0-9_]{1,64}$/
+ "description": "...", // OPTIONAL; overrides the synthesized one
+ "agentCard": { // REQUIRED; A2A Agent Card content
+ "protocolVersion": "0.3.0",
+ "name": "Acme HR",
+ "description": "Answers questions about HR policies and benefits.",
+ "url": "https://hr.intranet.acme/a2a",
+ "version": "1.4.0",
+ "provider": { "organization": "Acme Co.", "url": "https://acme.example/" },
+ "documentationUrl": "https://hr.intranet.acme/docs",
+ "iconUrl": "https://hr.intranet.acme/icon.png",
+ "capabilities": { "streaming": false, "pushNotifications": false },
+ "defaultInputModes": ["text/plain"],
+ "defaultOutputModes": ["text/plain"],
+ "skills": [
+ {
+ "id": "pto_lookup",
+ "name": "PTO lookup",
+ "description": "Find a teammate's remaining PTO days for the year.",
+ "tags": ["hr", "pto"],
+ "examples": ["How many PTO days does Alice have left?"]
+ }
+ ],
+ "securitySchemes": { /* spec-shaped, never read by MANTYX */ },
+ "security": [ /* spec-shaped, never read by MANTYX */ ]
+ /* …any other A2A spec field passes through unchanged. */
+ }
+ }
+ ```
+
+ **Where the SDK obtains `agentCard`:**
+
+ - *Well-known URL.* Most peers expose the card at
+ `<peer>/.well-known/agent-card.json`. The SDK can simply
+ `fetch` it (with whatever auth applies on the local network).
+ - *Static config.* For peers that don't publish a card, hand-craft one — the
+ spec only requires a couple of fields and the rest is all metadata.
+ - *Registry / cache.* Cache cards locally and refresh periodically. MANTYX
+ treats every spec submission as a fresh snapshot, so new cards take
+ effect on the next run / message.
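The well-known lookup and the wrapping step can be sketched as follows. Both helper names are hypothetical, and rejecting a card without a `name` is this sketch's own choice rather than a documented server rule; the actual HTTP fetch is left out so the example stays free of network access.

```ts
// Build the conventional card location for a peer base URL.
// Only the "/.well-known/agent-card.json" path comes from the text above.
function agentCardUrl(peerBaseUrl: string): string {
  return peerBaseUrl.replace(/\/+$/, "") + "/.well-known/agent-card.json";
}

// Wrap an already-fetched (or hand-written) card into an `a2a_local` ref.
// The card is shipped as-is; unknown fields are left untouched because
// MANTYX passes them through.
function toA2aLocalRef(refName: string, card: unknown) {
  if (typeof card !== "object" || card === null) {
    throw new Error("agent card must be a JSON object");
  }
  const name = (card as { name?: unknown }).name;
  if (typeof name !== "string" || name.length === 0) {
    // Assumption: a card without a name is not worth submitting, since
    // `name` feeds the synthesized tool description.
    throw new Error("agent card has no name");
  }
  return { kind: "a2a_local" as const, name: refName, agentCard: card };
}
```

A caller would fetch `agentCardUrl(peer)` with whatever auth applies, parse the JSON, and pass the result to `toA2aLocalRef` before adding it to `spec.tools[]`.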
+
+ **What MANTYX does with `agentCard`:**
+
+ | Field | Used for | Notes |
+ | ------------------------ | -------- | ----- |
+ | `name`, `description` | Tool description for the model | Used to compose `"Delegate a task to <name>: <description>"` if no `description` override is supplied at the ref level. |
+ | `skills[]` (first 12) | Tool description for the model | Bulleted into the description so the model can choose a peer based on capability. |
+ | All other fields | Echo only | Forwarded back to the SDK in every `local_tool_call` event so the SDK can dispatch by `url`, by `provider.organization`, by `protocolVersion`, or whatever it indexed on. |
+
+ ### 3.3 `mcp_local` — SDK-resolved Tool catalog
+
+ The defining feature of `mcp_local` is that the SDK ships the **verbatim
+ output of MCP `tools/list`** as `tools[]`, with field names matching the
+ MCP spec (`inputSchema`, not `parameters`). Optionally, the SDK can also
+ ship the `Implementation` block from MCP `Initialize` as `serverInfo`.
+
+ **Wire shape:**
+
+ ```jsonc
+ {
+ "kind": "mcp_local",
+ "name": "fs", // SDK-side server label; not a name prefix
+ "serverInfo": { // OPTIONAL; from MCP Initialize
+ "name": "mcp-server-filesystem",
+ "version": "0.4.1"
+ /* …any other Implementation field passes through unchanged. */
+ },
+ "tools": [ // REQUIRED; verbatim MCP tools/list output
+ {
+ "name": "fs_read_file", // model-facing; /^[a-zA-Z0-9_]{1,64}$/; SDK owns naming
+ "description": "Read a file under /workspace.",
+ "inputSchema": { // MCP's term for the JSON Schema
+ "type": "object",
+ "properties": { "path": { "type": "string" } },
+ "required": ["path"]
+ },
+ "annotations": { // OPTIONAL; spec-defined hints
+ "readOnlyHint": true,
+ "openWorldHint": false
+ }
+ /* …any other MCP Tool field passes through unchanged. */
+ }
+ ]
+ }
+ ```
+
+ **Where the SDK obtains `tools[]`:**
+
+ ```ts
+ // pseudo-code, MCP-SDK-flavoured
+ const client = new McpClient(stdio("./fs-server"));
+ const init = await client.initialize(); // → { name, version, … }
+ const list = await client.listTools(); // → { tools: [...] }
+
+ // drop straight into the spec
+ const ref = {
+ kind: "mcp_local" as const,
+ name: "fs",
+ serverInfo: init,
+ tools: list.tools,
+ };
+ ```
+
+ **What MANTYX does with the catalog:**
+
+ | Field | Used for | Notes |
+ | ------------------------ | -------- | ----- |
+ | `tools[].name` | Model-facing tool name | Used as-is. MANTYX does **not** prefix with the ref's `name`. The SDK is responsible for any naming convention (e.g. emit `fs_read_file` instead of `read_file` if you have multiple servers). |
+ | `tools[].description` | Model-facing description | Used as-is. |
+ | `tools[].inputSchema` | LLM tool-call schema | Forwarded **verbatim** to the LLM provider as the tool's JSON Schema, then validated against incoming tool-call args (Ajv) before execution. Nested constraints (`array.items`, `enum`, `anyOf`, …) are preserved end-to-end. Empty / missing schema → no-arg tool. Args that violate the schema produce a structured `tool_input_invalid` tool result the model can recover from instead of crashing the tool. |
+ | `tools[].annotations` | Echo only | Forwarded to the SDK in `local_tool_call` events (as part of the call envelope) for observability. |
+ | `serverInfo` | Echo only | Forwarded to the SDK in `local_tool_call.mcpServerInfo`. |
+
+ > **Naming convention reminder.** Because MANTYX doesn't prefix names for
+ > `mcp_local`, two refs that both expose a tool called `read_file` will
+ > collide. Either give the second one a different `name` in the catalog or
+ > drop it via SDK-side filtering. (For `mcp` — *remote* MCP — MANTYX does
+ > auto-prefix with the ref's `name`, so collisions are impossible.)
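Because collisions are the SDK's problem for `mcp_local`, a pre-flight check before submitting the spec is cheap insurance. A minimal sketch, with a hypothetical `findCollisions` helper and a pared-down `McpLocalRef` type:

```ts
// Pared-down ref shape for illustration; real refs carry more fields.
type McpLocalRef = { kind: "mcp_local"; name: string; tools: { name: string }[] };

// Returns every model-facing tool name declared more than once across the
// given refs, mapped to the ref labels that declare it.
function findCollisions(refs: McpLocalRef[]): Record<string, string[]> {
  // Null-prototype objects so tool names like "constructor" act as plain keys.
  const owners: Record<string, string[]> = Object.create(null);
  for (const ref of refs) {
    for (const tool of ref.tools) {
      const list = owners[tool.name] ?? (owners[tool.name] = []);
      list.push(ref.name);
    }
  }
  const collisions: Record<string, string[]> = Object.create(null);
  for (const toolName of Object.keys(owners)) {
    const labels = owners[toolName];
    if (labels !== undefined && labels.length > 1) collisions[toolName] = labels;
  }
  return collisions;
}
```

An empty result means the catalogs can be submitted together; otherwise the SDK can rename or filter the listed tools first.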
+
+ ---
+
+ ## 4. SSE event vocabulary
+
+ The SSE stream is opened with `GET /agent-runs/:runId/stream`. Standard
+ SSE rules apply: each frame is `data: <json>\n\n`, with an `id: <seq>` line
+ so reconnects can use `Last-Event-ID`.
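For clients that read the stream by hand rather than through an `EventSource`-style library, the framing above can be parsed with a few lines. This is a simplified sketch (it handles only the `id:` and `data:` fields); `parseFrame`, `splitFrames`, and the `RunEvent` type are hypothetical names.

```ts
type RunEvent = { seq: number; type: string; data: unknown };
type ParsedFrame = { id: string | undefined; event: RunEvent };

// Parse one SSE frame (the text between blank-line separators).
// Returns null for frames without a data field, e.g. keep-alive comments.
function parseFrame(frame: string): ParsedFrame | null {
  let id: string | undefined;
  const dataLines: string[] = [];
  for (const line of frame.split(/\r?\n/)) {
    if (line.indexOf("id:") === 0) {
      id = line.slice(3).trim();
    } else if (line.indexOf("data:") === 0) {
      // Per SSE rules, a single leading space after the colon is dropped.
      dataLines.push(line.slice(5).replace(/^ /, ""));
    }
  }
  if (dataLines.length === 0) return null;
  return { id, event: JSON.parse(dataLines.join("\n")) as RunEvent };
}

// Split a receive buffer into complete frames plus the unfinished remainder,
// which should be kept and prepended to the next chunk.
function splitFrames(buffer: string): { frames: string[]; rest: string } {
  const parts = buffer.split(/\r?\n\r?\n/);
  const rest = parts.pop() ?? "";
  return { frames: parts.filter((p) => p.length > 0), rest };
}
```

Keeping the most recent `id` around is what makes a reconnect with `Last-Event-ID` possible.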
+
+ Every event payload has the same envelope:
+
+ ```jsonc
+ { "seq": 7, "type": "<event-type>", "data": { /* type-specific */ } }
+ ```
+
+ The vocabulary (`EphemeralEventType` in `bus.ts`):
+
+ | Type | Direction | Frequency | Purpose |
+ | ----------------------- | --------- | --------- | ------- |
+ | `assistant_delta` | M → SDK | Many | Streamed assistant text token / chunk. |
+ | `thinking_delta` | M → SDK | Many (iff `reasoningLevel > 0`) | Streamed extended-thinking text (provider redacts when policy requires). |
+ | `tool_result` | M → SDK | Per server-resolved tool call | Informational — tells the SDK that MANTYX ran a server-resolved tool (`mantyx`, `mantyx_plugin`, `a2a`, `mcp`) and got a result. The SDK does not need to act on it. |
+ | `local_tool_call` | M → SDK | Per client-resolved tool call | **Action required.** SDK must POST a tool-result. |
+ | `local_tool_result_in` | M → SDK | Per client-resolved tool call | Informational mirror of the tool-result the SDK just posted, persisted for observability. Re-emitted to late subscribers so they can replay the conversation. |
+ | `loop_detected` | M → SDK | 0–2× per run (soft nudge + optional hard cutoff) | Observability for the loop-detection guard (see §8). The server already substituted the synthetic skip + steering nudge — SDK clients render a status note (`looping — nudged` / `looping — gave up`) and otherwise leave the run alone. |
+ | `tool_budget_exceeded` | M → SDK | Per intercepted tool call | Observability for per-tool call budgets (see §8). The synthetic `tool_result` carrying the "budget exceeded — pivot or finalize" body lands on the normal tool-result channel; this event is purely so SDK clients can surface a UI banner. |
+ | `assistant_message` | M → SDK | 1× per turn | Final assistant message for the turn (concatenated, persistence-ready). |
+ | `result` | M → SDK | 1× terminal | Successful completion. Carries the final assistant text and run summary. |
+ | `error` | M → SDK | 1× terminal | Failure. Carries `error` (message), `code` / `errorClass` (category), `finishReason`, and an optional `partialText` salvage payload. See §4.7. |
+ | `cancelled` | M → SDK | 1× terminal | Cancellation. Run was aborted via `POST /cancel`. |
+
+ `result`, `error`, and `cancelled` are the **terminal** events — the SDK
+ should close the SSE stream after one of them arrives.
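In client code the terminal check is a small type guard. A sketch with hypothetical names (`isTerminal`, `consume`), operating on already-parsed events:

```ts
const TERMINAL_TYPES = ["result", "error", "cancelled"] as const;
type TerminalType = (typeof TERMINAL_TYPES)[number];

function isTerminal(type: string): type is TerminalType {
  return (TERMINAL_TYPES as readonly string[]).indexOf(type) !== -1;
}

// Feed events in arrival order. Returns the terminal type that ended the
// run, or null if the batch ran out first (keep the stream open).
function consume(
  events: { type: string }[],
  onEvent: (type: string) => void,
): TerminalType | null {
  for (const event of events) {
    onEvent(event.type);
    if (isTerminal(event.type)) return event.type; // stop: nothing follows
  }
  return null;
}
```

Anything after the first terminal event is deliberately never handed to `onEvent`.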
+
+ ### 4.1 `assistant_delta` / `thinking_delta`
+
+ ```jsonc
+ { "seq": 3, "type": "assistant_delta", "data": { "text": "Hello" } }
+ { "seq": 4, "type": "thinking_delta", "data": { "text": "Considering options..." } }
+ ```
+
+ `thinking_delta` only fires when `reasoningLevel > 0` and the provider
+ exposes reasoning (OpenAI o-series / GPT-5.x, Anthropic extended thinking,
+ Gemini ≥ 3 with `thinkingConfig.includeThoughts`). Treat it as opaque
+ progress text — it's not part of the canonical assistant response.
+
+ ### 4.2 `tool_result` (server-resolved tools)
+
+ ```jsonc
+ {
+ "seq": 5,
+ "type": "tool_result",
+ "data": {
+ "toolUseId": "tu_a",
+ "name": "github_search_repos",
+ "result": "..." // truncated for display; never JSON-parsed by SDK
+ }
+ }
+ ```
+
+ Purely informational. The SDK does not respond.
+
+ ### 4.3 `local_tool_call` (client-resolved tools)
+
+ This is the workhorse event for SDK-implemented tools. Payload shape varies
+ slightly by `kind`, but the envelope is always:
+
+ ```jsonc
+ {
+ "seq": <int>,
+ "type": "local_tool_call",
+ "data": {
+ "toolUseId": "<opaque-id>", // round-trip back in the tool-result POST
+ "name": "<model-facing tool name>",
+ "args": { /* model-supplied args */ },
+ "kind": "<local | a2a_local | mcp_local>",
+ /* …kind-specific extras below… */
+ }
+ }
+ ```
+
+ Older SDKs that ignore the `kind` discriminator can still match on `name`
+ and dispatch correctly — the `kind` field is additive metadata.
+
+ #### 4.3.1 `kind: "local"` — generic local tools
+
+ No extras. Dispatch by `name`.
+
+ ```jsonc
+ {
+ "seq": 6,
+ "type": "local_tool_call",
+ "data": {
+ "toolUseId": "tu_x",
+ "name": "compute_total",
+ "args": { "amount": 42, "currency": "USD" },
+ "kind": "local" // OR omitted (legacy)
+ }
+ }
+ ```
+
+ #### 4.3.2 `kind: "a2a_local"` — local A2A delegations
+
+ Carries the **full Agent Card** echoed back from the spec, so the SDK can
+ dispatch to the right A2A client when it manages multiple peers.
+
+ ```jsonc
+ {
+ "seq": 7,
+ "type": "local_tool_call",
+ "data": {
+ "toolUseId": "tu_y",
+ "name": "intranet_hr_agent",
+ "args": { "message": "When does PTO reset?" },
+ "kind": "a2a_local",
+ "agentCard": { // full Agent Card from the spec
+ "name": "Acme HR",
+ "url": "https://hr.intranet.acme/a2a",
+ "skills": [ /* ... */ ]
+ /* ...all other fields the SDK shipped... */
+ }
+ }
+ }
+ ```
+
+ `args` is *always* `{ "message": string }` for `a2a_local`: the
+ LLM's task is reduced to "what do I want to ask the peer in plain text?"
+ so the SDK doesn't have to re-derive an A2A `message` envelope from a
+ tool-specific schema.
+
+ #### 4.3.3 `kind: "mcp_local"` — local MCP tool calls
+
+ Carries dispatch hints so the SDK can route to the right MCP client without
+ parsing the tool name back into pieces.
+
+ ```jsonc
+ {
+ "seq": 8,
+ "type": "local_tool_call",
+ "data": {
+ "toolUseId": "tu_z",
+ "name": "fs_read_file", // identical to what the SDK declared
+ "args": { "path": "/etc/hosts" },
+ "kind": "mcp_local",
+ "mcpServer": "fs", // ref's `name` — SDK's MCP-client key
+ "mcpToolName": "fs_read_file", // duplicates `name` for the SDK's convenience
+ "mcpServerInfo": { // present iff the spec carried `serverInfo`
+ "name": "mcp-server-filesystem",
+ "version": "0.4.1"
+ }
+ }
+ }
+ ```
+
+ The SDK's typical dispatch path is:
+
+ ```ts
+ const client = mcpClients.get(call.mcpServer); // by SDK label
+ if (!client) throw new Error(`unknown MCP server ${call.mcpServer}`);
+ const result = await client.callTool({
+ name: call.mcpToolName,
+ arguments: call.args,
+ });
+ const text = result.content
+ .filter((b) => b.type === "text")
+ .map((b) => b.text)
+ .join("\n");
+ await fetch(`${baseUrl}/agent-runs/${runId}/tool-results`, {
+ method: "POST",
+ headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
+ body: JSON.stringify({ toolUseId: call.toolUseId, result: text }),
+ });
+ ```
+
+ ### 4.4 `assistant_message`
+
+ ```jsonc
+ {
+ "seq": 12,
+ "type": "assistant_message",
+ "data": {
+ "text": "Here's what I found...",
+ "turn": 0,
+ "finishReason": "tool_use", // optional; canonical lowercase token
+ "toolCalls": [ // optional; absent when the turn was text-only
+ { "id": "call_abc", "name": "search", "input": { /* JSON Schema-matching args */ } }
+ ]
+ }
+ }
+ ```
+
+ | Field | Type | Required | Notes |
+ | ---------------- | -------- | -------- | ----- |
+ | `text` | string | yes | Full assistant text for this turn (concatenation of every preceding `assistant_delta` for this turn, plus any non-streaming snapshot the engine appended at close). May be empty when the turn was tool-only. |
+ | `turn` | integer | yes | 0-based tool-turn index this assistant message closes. Useful for SDK clients pairing the message with the subsequent `tool_result` rows. |
+ | `finishReason` | string\|null | no | Canonical lowercase stop reason normalized across providers (`"end_turn"`, `"tool_use"`, `"max_tokens"`, `"refusal"`, `"malformed_function_call"`, …). Pulled from the engine's per-turn `stopReason` after normalization — Gemini's `MAX_TOKENS` lands as `"max_tokens"`, OpenAI's `length` lands as `"max_tokens"`, etc. `null` / omitted when the provider did not report one. |
+ | `toolCalls`      | array    | no       | Tool calls the model emitted on this turn (id, sanitized pipeline-side name, and `input` args matching the tool's JSON Schema). Omitted when the model did not call any tools. |
+
+ **Emission frequency.** Exactly **one** `assistant_message` per completed
+ assistant turn — including the last turn before a terminal `error`. SDK
+ clients should treat this as the canonical "the model said something" anchor
+ and avoid stitching a turn out of `assistant_delta` chunks themselves
+ (deltas may be split arbitrarily for transport).
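Deltas are still useful for live display. One way to reconcile the two, sketched here with hypothetical names (`TurnView`, `applyTextEvent`), is to treat delta text as a provisional preview and discard it once the canonical `assistant_message` for the turn arrives:

```ts
type TextEvent =
  | { type: "assistant_delta"; data: { text: string } }
  | { type: "assistant_message"; data: { text: string; turn: number } };

// `preview` is provisional text for the turn in progress;
// `committed` holds the canonical text of each finished turn.
type TurnView = { preview: string; committed: string[] };

function applyTextEvent(view: TurnView, event: TextEvent): TurnView {
  if (event.type === "assistant_delta") {
    return { preview: view.preview + event.data.text, committed: view.committed };
  }
  // assistant_message is canonical: commit its text and reset the preview,
  // whatever the deltas happened to add up to.
  return { preview: "", committed: view.committed.concat(event.data.text) };
}
```

The function returns a new view instead of mutating its input, which keeps replay after a reconnect straightforward.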
+
+ **Truncation behaviour.** When the run terminates with `error` (e.g.
+ Gemini `MAX_TOKENS` while emitting `outputSchema` JSON), the last
+ `assistant_message` preceding the `error` carries the partial text plus
+ `finishReason: "max_tokens"`. The terminal `error` event then carries the
+ *same* text on `data.partialText` so reconnect / replay sees both pieces
+ without depending on event ordering.
+
+ ### 4.5 `loop_detected`
+
+ ```jsonc
+ // soft nudge — pipeline injected a "finalize OR change strategy" user message
+ { "seq": 13, "type": "loop_detected",
+ "data": { "consecutiveCount": 3, "hardCutoff": false, "tools": ["recall"] } }
+
+ // hard cutoff: pipeline forced a tools-disabled finalize turn
+ { "seq": 27, "type": "loop_detected",
+ "data": { "consecutiveCount": 6, "hardCutoff": true, "tools": ["recall"] } }
+ ```
+
+ | Field | Type | Notes |
+ | ------------------ | ------- | ----- |
+ | `consecutiveCount` | integer | Length of the identical-batch streak that just tripped the threshold (`>= consecutiveThreshold`). |
+ | `hardCutoff`       | boolean | `false` for the soft nudge round; `true` once the pipeline forces finalization. The SDK may see one of each in a single run. |
+ | `tools` | array | Names of the tool calls in the looping batch (no args — those are persisted on the matching `tool_result` events). |
+
+ Observability only: the synthetic skip + steering nudge are emitted on the
+ normal `tool_result` and assistant-message channels by the time this event
+ fires. SDK clients should render a status note (`looping — nudged` /
+ `looping — gave up`) and otherwise leave the run alone — the run still
+ continues to its terminal `result` / `error` / `cancelled`.
+
+ See §8 for the wire-spec field that controls thresholds.
+
+ ### 4.6 `tool_budget_exceeded`
+
+ ```jsonc
+ { "seq": 14, "type": "tool_budget_exceeded",
+ "data": { "tool": "recall", "maxCalls": 4, "callIndex": 5 } }
+ ```
+
+ | Field | Type | Notes |
+ | ----------- | ------- | ----- |
+ | `tool` | string | Logical tool name as the model saw it (matches the key in `spec.toolBudgets`). |
+ | `maxCalls` | integer | Configured cap. |
+ | `callIndex` | integer | 1-based count of attempts to call this tool over the run lifetime; always strictly greater than `maxCalls`. |
+
+ Observability only: the synthetic "budget exceeded — pivot or finalize"
+ tool-result lands on the normal `tool_result` channel before this event
+ fires, so the model already has the directive to pivot. SDK clients use
+ this event to render UI banners (`memory budget exhausted`, etc.) without
+ re-parsing tool-result bodies.
+
+ See §8 for the wire-spec field that defines budgets.
+
+ ### 4.7 Terminal events
+
+ ```jsonc
+ { "seq": 14, "type": "result", "data": { "ok": true, "text": "..." } }
+ { "seq": 14, "type": "error", "data": {
+ "error": "Model output was truncated (stop_reason=max_tokens). …",
+ "code": "truncation", // mirrors `errorClass`; legacy alias
+ "errorClass": "truncation", // canonical category (see below)
+ "finishReason": "max_tokens", // canonical lowercase stop reason
+ "partialText": "{\n \"answer\":… (truncated JSON) …",
+ "retryable": false // optional; per-class retry hint
+ } }
+ { "seq": 14, "type": "cancelled", "data": { "reason": "user" } }
+ ```
+
+ After one of these arrives, no further events will be emitted; close the
+ SSE stream.
+
+ **`error` event payload fields.** The runner enriches the `error` event
+ with structured triage attributes when the failure carried a salvage path
+ (typically truncation, upstream deadline, or max-budget-with-text):
+
+ | Field | Type | Required | Notes |
+ | -------------- | -------- | -------- | ----- |
+ | `error` | string | yes | Human-readable message (also persisted on `EphemeralAgentRun.error`). |
+ | `code` | string | yes | Legacy alias for `errorClass`. Equals `errorClass` when present; otherwise a small lowercase token (`"error"`, `"invalid_spec"`, `"worker_error"`, …) the SDK can switch on. |
627
+ | `errorClass` | string | no | Canonical category. One of `"rate_limit"`, `"overloaded"`, `"server"`, `"context_window"` (input too big), `"truncation"` (output budget exhausted), `"invalid_request"`, `"auth"`, `"timeout"`, `"local_timeout"`, `"upstream_deadline"`, `"unknown"`. New categories may land additively. |
628
+ | `finishReason` | string\|null | no | Canonical lowercase stop reason normalized across providers (`"max_tokens"`, `"refusal"`, `"malformed_function_call"`, …). When present, mirrors the value on the last `assistant_message`. |
629
+ | `partialText` | string | no | **Best-effort raw bytes** the model emitted before the failure. For `outputSchema` runs this is likely **incomplete JSON** that will fail `JSON.parse` — see §7 below. Also persisted on `EphemeralAgentRun.finalText` so the Calls UI can render it alongside a truncation banner. |
630
+ | `retryable` | boolean | no | Coarse retry hint inherited from the pipeline's error classifier. Informational; the SDK still owns the actual retry decision. |
631
+
632
+ When `errorClass` is `"truncation"`, the `EphemeralAgentRun` row that the
633
+ SDK can re-fetch via `GET /agent-runs/:runId` will have:
634
+
635
+ | Field | Value |
636
+ | --------------- | ----- |
637
+ | `status` | `"failed"` |
638
+ | `finalText` | Same string as `data.partialText` (so SDKs can ignore the SSE stream and still recover the salvage). |
639
+ | `error` | Same string as `data.error`. |
640
+ | `failureReason` | `{ "errorClass": "truncation", "finishReason": "max_tokens" }` (JSON object, future-proof for additional triage fields). |
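
The terminal-event tables above can be sketched as a small set of client-side types. The type names (`ErrorData`, `TerminalEvent`) and the retry policy in `shouldRetry` are illustrative assumptions, not part of the wire contract; only the field names and `errorClass` values come from this section.

```ts
// Illustrative types; only the field names are taken from the tables in 4.7.
interface ErrorData {
  error: string;
  code: string;
  errorClass?: string; // plain string: new categories may land additively
  finishReason?: string | null;
  partialText?: string;
  retryable?: boolean;
}

type TerminalEvent =
  | { seq: number; type: "result"; data: { ok: boolean; text: string } }
  | { seq: number; type: "error"; data: ErrorData }
  | { seq: number; type: "cancelled"; data: { reason: string } };

const TERMINAL_TYPES = new Set(["result", "error", "cancelled"]);

// True for the three event types after which no further events are emitted.
function isTerminalType(type: string): boolean {
  return TERMINAL_TYPES.has(type);
}

// Short status string suitable for logs or a UI banner.
function describeTerminal(ev: TerminalEvent): string {
  switch (ev.type) {
    case "result":
      return "completed";
    case "cancelled":
      return `cancelled (${ev.data.reason})`;
    case "error":
      return `failed: ${ev.data.errorClass ?? ev.data.code}`;
  }
}

// Assumed policy: never blindly retry size-related failures, otherwise
// follow the server's `retryable` hint, else fall back on the category.
function shouldRetry(data: ErrorData): boolean {
  const category = data.errorClass ?? data.code;
  if (category === "truncation" || category === "context_window") return false;
  if (typeof data.retryable === "boolean") return data.retryable;
  return category === "rate_limit" || category === "overloaded" || category === "server";
}
```

`shouldRetry` falls back to `code` when `errorClass` is absent, which matches the "legacy alias" relationship described in the field table.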
641
+
642
+ ---
643
+
644
+ ## 5. SDK → MANTYX: tool-result POST
645
+
646
+ When the SDK sees a `local_tool_call`, it owes MANTYX exactly one
647
+ tool-result POST (success or failure):
648
+
649
+ ```http
650
+ POST /api/v1/workspaces/{slug}/agent-runs/{runId}/tool-results
651
+ Content-Type: application/json
652
+ Authorization: Bearer <api-key>
653
+
654
+ {
655
+ "toolUseId": "tu_z", // copied from local_tool_call
656
+ "result": "<file contents>" // OR "error": "..." (mutually exclusive)
657
+ }
658
+ ```
659
+
660
+ | Field | Type | Required | Notes |
661
+ | ------------ | ------- | -------- | ----- |
662
+ | `toolUseId` | string | yes | Must match a pending `local_tool_call`'s id. |
663
+ | `result` | string | one-of | Successful textual result (≤ 2 MB). For MCP tools, flatten content blocks to text. For A2A delegations, the peer's reply text. |
664
+ | `error` | string | one-of | Human-readable failure message (≤ 8 KB). Surfaced to the model so it can recover. |
665
+
666
+ Server response codes:
667
+
668
+ | Code | When |
669
+ | ---- | ---- |
670
+ | `204` | Accepted; the runner was woken and will resume the model loop. |
671
+ | `400` | Body failed Zod validation (missing `toolUseId`, both/neither of `result`/`error`, etc.). |
672
+ | `404` | `unknown_tool_use` — `toolUseId` doesn't match any pending call (already answered or unknown id). |
673
+ | `409` | `run_terminal` — the run already finished (success, failure, cancel, or local-tool timeout). The result is dropped. |
674
+
675
+ The runner enforces a per-call `localToolTimeoutMs` (default 5 minutes).
676
+ After timeout the model loop unblocks with a synthetic
677
+ "Timed out waiting for local tool result" error — which is also why a
678
+ `409 run_terminal` for a tool-result POST is a normal occurrence.
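
A minimal sketch of the client side of this exchange follows. The helper names (`buildToolResultBody`, `toolResultDisposition`) are hypothetical, and the byte limits use the smaller decimal reading of "2 MB" and "8 KB" because the exact interpretation is not stated here.

```ts
type ToolOutcome =
  | { ok: true; text: string }
  | { ok: false; message: string };

type ToolResultBody =
  | { toolUseId: string; result: string }
  | { toolUseId: string; error: string };

// Assumption: conservative decimal reading of the documented "2 MB" limit.
const MAX_RESULT_BYTES = 2_000_000;
// 2000 UTF-16 code units encode to at most 6000 UTF-8 bytes, under "8 KB".
const MAX_ERROR_CHARS = 2000;

function utf8Length(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Builds a body carrying exactly one of `result` / `error`.
function buildToolResultBody(toolUseId: string, outcome: ToolOutcome): ToolResultBody {
  if (!outcome.ok) {
    return { toolUseId, error: outcome.message.slice(0, MAX_ERROR_CHARS) };
  }
  if (utf8Length(outcome.text) > MAX_RESULT_BYTES) {
    // Oversized results are reported as failures so the model can recover.
    return { toolUseId, error: `Tool result exceeded ${MAX_RESULT_BYTES} bytes and was not sent.` };
  }
  return { toolUseId, result: outcome.text };
}

// Maps the documented response codes to what the SDK should do next.
function toolResultDisposition(status: number): "delivered" | "dropped" | "invalid" | "unexpected" {
  if (status === 204) return "delivered";
  if (status === 404 || status === 409) return "dropped"; // late or duplicate; not an SDK fault
  if (status === 400) return "invalid"; // body failed validation: a client bug
  return "unexpected";
}
```

Wrapping the local tool invocation in `try` / `catch` and passing the caught message as `{ ok: false, message }` keeps the "exactly one POST per call" obligation even when the tool throws.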
679
+
680
+ ---
681
+
682
+ ## 6. `reasoningLevel`
683
+
684
+ `spec.reasoningLevel` controls the LLM's extended-thinking effort. Two
685
+ input shapes are accepted; both map to a numeric `0–100` internally.
686
+
687
+ | Form | Values | Notes |
688
+ | ----------- | ------------------------------------- | ----- |
689
+ | **String** | `"off"`, `"low"`, `"medium"`, `"high"` | Snaps to `0`, `30`, `50`, `80` (matches the web composer). |
690
+ | **Number** | integer `0`–`100` | Pass-through. `0` explicitly disables provider thinking. |
691
+
692
+ Per provider:
693
+
694
+ | Provider | Knob driven by `reasoningLevel` |
695
+ | -------------------------- | ------------------------------- |
696
+ | OpenAI Responses (o-series, GPT-5.x) | `reasoning.effort` |
697
+ | Gemini ≥ 3 | `thinkingConfig.thinkingLevel` |
698
+ | Gemini ≤ 2.5 | `thinkingConfig.thinkingBudget` (token budget; scaled) |
699
+ | Anthropic / Bedrock-Anthropic | extended thinking budget (≈ 512 tokens at `low` → ≈ 8 000 at `high`) |
700
+ | xAI Grok, others | ignored |
701
+
702
+ When `reasoningLevel > 0` and the provider supports it, the SSE stream
703
+ will include `thinking_delta` events alongside `assistant_delta`.
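
For local validation or UI display, the string-to-number mapping in the table above can be mirrored as below. This is a sketch only: `resolveReasoningLevel` is a hypothetical helper, and an SDK should still send the caller's original value unchanged, since the server owns the mapping.

```ts
type ReasoningLevel = "off" | "low" | "medium" | "high" | number;

// Snap points from the table in this section.
const NAMED_LEVELS = new Map<string, number>([
  ["off", 0],
  ["low", 30],
  ["medium", 50],
  ["high", 80],
]);

// Returns the numeric 0..100 level a given input corresponds to,
// or throws when the input is outside the accepted shapes.
function resolveReasoningLevel(level: ReasoningLevel): number {
  if (typeof level === "string") {
    const snapped = NAMED_LEVELS.get(level);
    if (snapped === undefined) {
      throw new RangeError(`unknown reasoningLevel "${level}"`);
    }
    return snapped;
  }
  if (!Number.isInteger(level) || level < 0 || level > 100) {
    throw new RangeError("numeric reasoningLevel must be an integer in 0..100");
  }
  return level;
}
```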
704
+
705
+ ---
706
+
707
+ ## 7. `outputSchema` (structured final reply)
708
+
709
+ `outputSchema` constrains the final assistant message to a JSON document
710
+ conforming to a JSON Schema. When set, the run's terminal `result` event
711
+ still carries the reply as `data.text: string`, but that string is
712
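+ expected to be parseable JSON matching the supplied schema on providers that enforce it (see the per-provider table and the SDK guidance below for the exceptions).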
713
+
714
+ ```jsonc
715
+ "outputSchema": {
716
+ "name": "weather_report", // optional; default "output"; /^[a-zA-Z0-9_-]{1,64}$/
717
+ "schema": { /* JSON Schema */ } // required, root must be a JSON object
718
+ }
719
+ ```
720
+
721
+ | Field | Type | Required | Notes |
722
+ | -------- | ------ | -------- | ----- |
723
+ | `name` | string | no | Stable identifier passed to providers (OpenAI `text.format.name`, Anthropic synthetic-tool name). Defaults to `"output"`. |
724
+ | `schema` | object | yes | JSON Schema for the assistant text. Root must be a JSON object — most providers reject array/scalar roots in structured-output mode. Passed through verbatim; MANTYX does not validate the schema's contents. |
725
+
726
+ Per provider:
727
+
728
+ | Provider | How the schema is enforced |
729
+ | ------------------------------ | -------------------------- |
730
+ | OpenAI Responses (o-series, GPT-5.x, …) | `text.format = { type: "json_schema", strict: true, name, schema }` on every `completeTurn` (compatible with tool calls). |
731
+ | Gemini 3+ (any turn) | `responseMimeType: "application/json"` + `responseJsonSchema` on every `completeTurn`. Gemini 3 accepts the schema alongside `functionDeclarations`. |
732
+ | Gemini ≤ 2.5 with no tools | Same as Gemini 3+: `responseMimeType: "application/json"` + `responseJsonSchema`. |
733
+ | Gemini ≤ 2.5 **with tools** | Synthetic `set_model_response` function declaration is injected; its `parametersJsonSchema` is the supplied schema. The system instruction is augmented to direct the model to call this tool with the final answer. The engine intercepts the call, hides it from the SDK, and surfaces the call's arguments as the assistant text (JSON-stringified). Sidesteps the API rejection ("Function calling with a response mime type: 'application/json' is unsupported") without round-tripping a 4xx. |
734
+ | Anthropic / Bedrock-Anthropic | Synthetic `final_report` tool whose `input_schema` is the supplied schema; `tool_choice` is forced on the no-tools finishing turn. The tool's input is surfaced as the assistant text. |
735
+ | xAI Grok, others | Ignored — the model returns plain text. |
736
+
737
+ The synthetic-tool paths (Gemini ≤ 2.5 with tools, Anthropic) are entirely
738
+ internal: the SDK still receives `data.text: string` on the terminal
739
+ `result` event and never sees a `local_tool_call` for `set_model_response`
740
+ or `final_report`. They never appear in the tools array the SDK declared.
741
+
742
+ Validation (server-side, `400 invalid_request` on violation):
743
+
744
+ | Constraint | Limit |
745
+ | ----------------------------------------- | ----- |
746
+ | Serialized JSON size of `outputSchema` | ≤ 32 KB |
747
+ | `name` regex | `/^[a-zA-Z0-9_-]{1,64}$/` |
748
+ | `schema` shape | non-`null`, non-array JSON object |
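
An SDK may choose to run the same checks client-side so the caller gets an early, typed failure instead of a `400`. The sketch below mirrors the three constraints in the table; `outputSchemaProblems` is a hypothetical name, and the size limit uses the smaller decimal reading of "32 KB" as an assumption.

```ts
const NAME_PATTERN = /^[a-zA-Z0-9_-]{1,64}$/;
// Assumption: conservative decimal reading of the documented "32 KB" limit.
const MAX_SCHEMA_BYTES = 32_000;

// Returns a list of human-readable problems; an empty list means the
// candidate passes the documented constraints.
function outputSchemaProblems(candidate: { name?: unknown; schema?: unknown }): string[] {
  const problems: string[] = [];
  const { name, schema } = candidate;

  if (name !== undefined && (typeof name !== "string" || !NAME_PATTERN.test(name))) {
    problems.push("name must match /^[a-zA-Z0-9_-]{1,64}$/");
  }
  if (typeof schema !== "object" || schema === null || Array.isArray(schema)) {
    problems.push("schema must be a non-null, non-array JSON object");
  }
  // The limit applies to the serialized size of the whole outputSchema value.
  const size = new TextEncoder().encode(JSON.stringify(candidate)).length;
  if (size > MAX_SCHEMA_BYTES) {
    problems.push("serialized outputSchema exceeds 32 KB");
  }
  return problems;
}
```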
749
+
750
+ **SDK guidance.** Even though the server enforces JSON shape via the
751
+ provider, transient model errors (refusal text, truncation under
752
+ `max_tokens` pressure, exotic Unicode normalisation) can still produce
753
+ a string that fails to `JSON.parse` in rare cases. Reference SDKs should:
754
+
755
+ 1. Pass the schema through unchanged from the developer's API.
756
+ 2. `JSON.parse` the terminal `result.data.text`.
757
+ 3. Re-validate against their source-of-truth Zod / Pydantic / JSON Schema
758
+ validator and surface a typed parse error instead of crashing.
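
Steps 2 and 3 above can be sketched as a parse helper that returns a typed outcome instead of throwing. `parseStructuredReply` and `ParsedReply` are hypothetical names; re-validation against the caller's own Zod / Pydantic / JSON Schema validator is left to the caller and would run on `value`.

```ts
type ParsedReply =
  | { ok: true; value: Record<string, unknown> }
  | { ok: false; reason: string; raw: string };

// Parses the terminal `result.data.text` of an outputSchema run.
function parseStructuredReply(text: string): ParsedReply {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    // Refusal text or truncated output ends up here.
    return { ok: false, reason: `invalid JSON: ${(err as Error).message}`, raw: text };
  }
  // The schema root is required to be an object, so the reply should be too.
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, reason: "expected a JSON object at the root", raw: text };
  }
  return { ok: true, value: parsed as Record<string, unknown> };
}
```

Keeping `raw` on the failure branch lets the SDK surface the unparsed text as diagnostic data without treating it as the answer.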
759
+
760
+ **Truncation contract.** When the model is mid-JSON and Gemini /
761
+ Anthropic / OpenAI hit the output budget, MANTYX does **not** discard the
762
+ bytes that already streamed. Instead:
763
+
764
+ 1. The last `assistant_message` for the turn (§4.4) carries the partial
765
+ text plus `finishReason: "max_tokens"`.
766
+ 2. The terminal SSE event is an `error` (not `result`) with
767
+ `errorClass: "truncation"` and `data.partialText` set to the same
768
+ bytes (§4.7).
769
+ 3. The run row exposes the salvage on
770
+ `GET /agent-runs/:runId` as `{ status: "failed", finalText: "<partial JSON>",
771
+ error: "Model output was truncated …", failureReason: { errorClass:
772
+ "truncation", finishReason: "max_tokens" } }`.
773
+
774
+ `partialText` is a **best-effort raw byte sequence** — for `outputSchema`
775
+ runs it will almost always fail `JSON.parse` because the JSON object was
776
+ not closed. SDKs should treat it as diagnostic data, never as a
777
+ schema-conformant reply. Surfacing it (as a "truncated reply — JSON
778
+ likely incomplete" status note) is the recommended pattern; silently
779
+ falling back to it as the answer is not.
780
+
781
+ `outputSchema` works for both ephemeral runs (`systemPrompt`-defined) and
782
+ `agentId`-backed runs — the runner applies the schema to whichever
783
+ `AgentSpec` it built. `outputSchema` is independent of `reasoningLevel`:
784
+ the model can think extensively *and* emit JSON.
785
+
786
+ ---
787
+
788
+ ## 8. Run guards (`loopDetection`, `toolBudgets`)
789
+
790
+ Two optional fields on the spec body (both guards are on by default) govern how MANTYX guards
791
+ against tight tool loops and runaway research-tool usage. Both are
792
+ **additive over the wire** — older SDKs that don't ship them keep working,
793
+ and the runtime defaults still apply server-side.
794
+
795
+ ### 8.1 `loopDetection`
796
+
797
+ The pipeline tracks an order-invariant canonical signature for every
798
+ assistant turn that emits one or more tool calls. When the same signature
799
+ repeats consecutively the guard intervenes:
800
+
801
+ | Trigger | Server action |
802
+ | -------------------------------------------------- | ------------- |
803
+ | `consecutiveThreshold` identical batches in a row | Skip the duplicate batch with a synthetic "you've made this exact call before" tool result, prepend a user-style **steering nudge** ("either deliver a final answer or change strategy") before the next model turn. |
804
+ | `hardCutoffThreshold` identical batches in a row | Force a tools-disabled finalise turn (same path as `budgets.maxToolTurnsExceeded: "finalize"`) so the run lands cleanly. |
805
+
806
+ ```jsonc
807
+ "loopDetection": {
808
+ "consecutiveThreshold": 3, // optional, default 3 — fires the steering nudge
809
+ "hardCutoffThreshold": 6 // optional, default 6 — forces finalisation
810
+ }
811
+
812
+ // or:
813
+ "loopDetection": false // explicitly disable for this run
814
+ ```
815
+
816
+ | Field | Type | Notes |
817
+ | ---------------------- | --------------- | ----- |
818
+ | `consecutiveThreshold` | integer ≥ 2 | Default `3`. A single batch cannot be a loop on its own, so the floor is `2`. |
819
+ | `hardCutoffThreshold` | integer ≥ 3 | Default `6`. Must be **strictly greater** than `consecutiveThreshold` (otherwise the soft nudge never gets a chance). |
820
+ | (top-level `false`) | literal `false` | Disables the guard. `budgets.maxToolTurns` still applies. |
821
+
822
+ Validation (server-side, `400 invalid_request` on violation): both
823
+ thresholds capped at `100`; `hardCutoffThreshold` must exceed
824
+ `consecutiveThreshold`.
825
+
826
+ The runtime default — applied when the field is omitted — is
827
+ `{ consecutiveThreshold: 3, hardCutoffThreshold: 6 }`. SDK-driven runs and
828
+ platform-driven runs inherit identical defaults.
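
The validation rules and defaults in this subsection can be mirrored in a client-side pre-check like the sketch below. `resolveLoopDetection` is a hypothetical helper, and filling a single omitted threshold with its default is an assumption about server behavior rather than something stated above.

```ts
interface LoopDetectionConfig {
  consecutiveThreshold: number;
  hardCutoffThreshold: number;
}

type LoopDetectionInput = false | Partial<LoopDetectionConfig> | undefined;

// Returns the effective config, `false` when the guard is disabled,
// or throws when a documented constraint is violated.
function resolveLoopDetection(input: LoopDetectionInput): LoopDetectionConfig | false {
  if (input === false) return false;

  const consecutiveThreshold = input?.consecutiveThreshold ?? 3;
  const hardCutoffThreshold = input?.hardCutoffThreshold ?? 6;

  const inRange = (n: number, min: number): boolean =>
    Number.isInteger(n) && n >= min && n <= 100;

  if (!inRange(consecutiveThreshold, 2)) {
    throw new RangeError("consecutiveThreshold must be an integer in 2..100");
  }
  if (!inRange(hardCutoffThreshold, 3)) {
    throw new RangeError("hardCutoffThreshold must be an integer in 3..100");
  }
  if (hardCutoffThreshold <= consecutiveThreshold) {
    throw new RangeError("hardCutoffThreshold must exceed consecutiveThreshold");
  }
  return { consecutiveThreshold, hardCutoffThreshold };
}
```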
829
+
830
+ ### 8.2 `toolBudgets`
831
+
832
+ Per-tool call caps enforced over the **lifetime of the run** (across every
833
+ LLM turn). Calls under the cap run normally; calls past the cap are
834
+ intercepted **before execution** and the model receives a synthetic
835
+ "budget exceeded — pivot or finalize" tool result. The model stays in the
836
+ loop and either changes strategy or finalises.
837
+
838
+ ```jsonc
839
+ "toolBudgets": {
840
+ "recall": { "maxCalls": 4 },
841
+ "hive_consult_ontology": { "maxCalls": 4 },
842
+ "traverse": { "maxCalls": 3 },
843
+ "scary_tool": { "maxCalls": 0 } // disables the tool for this run
844
+ }
845
+ ```
846
+
847
+ | Field | Type | Notes |
848
+ | ---------- | ----------- | ----- |
849
+ | `<key>` | string (1–120 chars) | Logical tool name as the model sees it (`ResolvedTool.name`). The SDK + pipeline handle internal sanitisation. |
850
+ | `maxCalls` | integer ≥ 0 | Hard cap. `0` disables the tool entirely (the first attempt returns the synthetic body). |
851
+
852
+ Budgets are **per-tool, not pooled** — `hive_search_deals: { maxCalls: 5 }`
853
+ and `hive_search_meetings: { maxCalls: 5 }` give the agent five of each,
854
+ not five between them.
855
+
856
+ Validation (server-side, `400 invalid_request` on violation):
857
+
858
+ | Constraint | Limit |
859
+ | --------------------- | ----- |
860
+ | Max entries | `32` |
861
+ | `<key>` length | `1..120` |
862
+ | `maxCalls` upper bound | `1000` (functionally unlimited; `maxToolTurns: 100` fires first) |
863
+
864
+ **Default budgets** (applied when the field is omitted; caller-provided
865
+ entries are layered on top so per-run overrides win):
866
+
867
+ | Tool | Default `maxCalls` |
868
+ | ------------------------------------------------------------------------------------------------ | ------------------ |
869
+ | `recall` (workspace memory hybrid search) | `4` |
870
+ | `traverse` (memory graph BFS) | `3` |
871
+ | `hive_consult_ontology` (per-hive ontology read; same name across all three hives) | `4` |
872
+ | `hive_search_deals` / `_meetings` / `_companies` / `_people` (Sales Hive general search) | `5` |
873
+ | `hive_search_tickets` / `_conversations` / `_accounts` (Customer Hive general search) | `5` |
874
+ | `hive_search_releases` / `_issues` (Product Hive general search) | `5` |
875
+
876
+ Pass `"toolBudgets": {}` to start from a clean slate (no defaults applied
877
+ on top — useful for runs that intentionally want unbounded research). When
878
+ both the caller and the runtime defaults specify a budget for the same
879
+ tool, **the caller's value wins**.
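
The layering rules above (omitted, empty object, or explicit entries) can be sketched as follows, for example to show the effective budgets in a UI. The merge itself happens server-side; `effectiveToolBudgets` is a hypothetical helper, and `DEFAULT_TOOL_BUDGETS` lists only a subset of the default table for brevity.

```ts
type ToolBudgets = Record<string, { maxCalls: number }>;

// Subset of the documented defaults; the full table has more entries.
const DEFAULT_TOOL_BUDGETS: ToolBudgets = {
  recall: { maxCalls: 4 },
  traverse: { maxCalls: 3 },
  hive_consult_ontology: { maxCalls: 4 },
  hive_search_deals: { maxCalls: 5 },
  hive_search_meetings: { maxCalls: 5 },
};

function effectiveToolBudgets(caller?: ToolBudgets): ToolBudgets {
  // Field omitted: runtime defaults apply as-is.
  if (caller === undefined) return { ...DEFAULT_TOOL_BUDGETS };
  // Empty object: clean slate, no defaults.
  if (Object.keys(caller).length === 0) return {};
  // Entries present: defaults first, caller values win on conflict.
  return { ...DEFAULT_TOOL_BUDGETS, ...caller };
}
```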
880
+
881
+ ### 8.3 Observability
882
+
883
+ Each intervention emits an SSE event so SDK clients can render UI status
884
+ banners without re-parsing tool-result bodies:
885
+
886
+ - `loop_detected` — fired on the soft nudge and again on the hard cutoff
887
+ if reached. See §4.5.
888
+ - `tool_budget_exceeded` — fired each time a call is intercepted. See §4.6.
889
+
890
+ Both events are observability-only: the server has already substituted
891
+ the synthetic tool-result / steering nudge by the time the SDK sees the
892
+ event. The run continues to its terminal `result` / `error` / `cancelled`
893
+ as usual.
894
+
895
+ ### 8.4 Session inheritance
896
+
897
+ Like `reasoningLevel` and `outputSchema`, both fields support
898
+ session-default + per-message override:
899
+
900
+ - `POST /agent-sessions { loopDetection, toolBudgets }` — sets the
901
+ session-default applied to every subsequent message run.
902
+ - `POST /agent-sessions/:id/messages { loopDetection, toolBudgets }` —
903
+ optional per-message override. Applies to that one run only and does
904
+ not mutate the session's stored value.
905
+
906
+ ---
907
+
908
+ ## 9. Cancellation
909
+
910
+ ```http
911
+ POST /api/v1/workspaces/{slug}/agent-runs/{runId}/cancel
912
+ Authorization: Bearer <api-key>
913
+ ```
914
+
915
+ Best-effort: publishes a Valkey signal the runner observes between LLM
916
+ turns. The runner aborts cleanly and emits a terminal `cancelled` event.
917
+ In-flight `local_tool_call`s are still fulfilled (or time out) before the
918
+ final event lands, so SDKs should keep the stream open until they see a
919
+ terminal event.
920
+
921
+ ---
922
+
923
+ ## 10. Reconnects & at-least-once delivery
924
+
925
+ - Every event has a monotonically-increasing `seq` per run, persisted to
926
+ `EphemeralAgentRunEvent`. Reopen with `Last-Event-ID: <seq>` to resume.
927
+ - The Valkey pub/sub is best-effort; the persisted log is the source of
928
+ truth. The server occasionally polls the DB during long waits (see
929
+ `bus.ts → waitForLocalToolResult`) so missed publishes still wake the
930
+ runner.
931
+ - `local_tool_result_in` is persisted in addition to the live publish, so
932
+ late-joining viewers can replay the SDK's response.
933
+ - Tool-result POSTs are deduplicated on `toolUseId`: a second POST for the
934
+ same `toolUseId` returns `404 unknown_tool_use` (or `409` if the run
935
+ already ended) and is **not** delivered to the model a second time.
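
Because delivery is at-least-once, a client can see the same `seq` more than once across reconnects. A minimal sketch of a cursor that drops replays and produces the resume header is shown below; `SeqCursor` is a hypothetical name, not an SDK API.

```ts
class SeqCursor {
  private last: number | null = null;

  // Returns true when the event is new and should be processed,
  // false when it is a replay of something already seen.
  accept(seq: number): boolean {
    if (this.last !== null && seq <= this.last) return false;
    this.last = seq;
    return true;
  }

  // Headers to send when reopening the stream after a disconnect.
  resumeHeaders(): Record<string, string> {
    return this.last === null ? {} : { "Last-Event-ID": String(this.last) };
  }
}
```

Calling `accept(ev.seq)` before dispatching each event keeps a replayed `local_tool_call` from running the tool twice on the client.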
936
+
937
+ ---
938
+
939
+ ## 11. Full worked example: `a2a_local` round-trip
940
+
941
+ ```ts
942
+ import { fetch } from "undici";
943
+
944
+ // ── 1. Resolve the Agent Card locally ───────────────────────────────────
945
+ const cardResp = await fetch("https://hr.intranet.acme/.well-known/agent-card.json", {
946
+ headers: { Authorization: `Bearer ${INTRANET_TOKEN}` },
947
+ });
948
+ const agentCard = await cardResp.json(); // ← whole document, passed through
949
+
950
+ // ── 2. Submit the spec ──────────────────────────────────────────────────
951
+ const create = await fetch(`${MANTYX}/api/v1/workspaces/${slug}/agent-runs`, {
952
+ method: "POST",
953
+ headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
954
+ body: JSON.stringify({
955
+ modelId: "openai:gpt-5.5",
956
+ systemPrompt: "You can delegate HR questions to the Acme HR agent.",
957
+ prompt: "How many PTO days does Alice have left this year?",
958
+ reasoningLevel: "low",
959
+ tools: [
960
+ { kind: "a2a_local", name: "intranet_hr_agent", agentCard },
961
+ ],
962
+ }),
963
+ });
964
+ const { runId, streamUrl } = await create.json();
965
+
966
+ // ── 3. Open the SSE stream and dispatch local_tool_calls ────────────────
967
+ const stream = await fetch(streamUrl, {
968
+ headers: { Authorization: `Bearer ${apiKey}`, Accept: "text/event-stream" },
969
+ });
970
+
971
+ for await (const ev of parseSSE(stream)) {
972
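+ if (ev.type === "result" || ev.type === "error" || ev.type === "cancelled") break; // terminal event (§4.7): stop reading
+ if (ev.type !== "local_tool_call") continue;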
973
+ if (ev.data.kind !== "a2a_local") continue;
974
+
975
+ const peer = a2aClients.get(ev.data.agentCard.url); // ← dispatch by URL
976
+ const reply = await peer.send({ message: ev.data.args.message });
977
+
978
+ await fetch(`${MANTYX}/api/v1/workspaces/${slug}/agent-runs/${runId}/tool-results`, {
979
+ method: "POST",
980
+ headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
981
+ body: JSON.stringify({ toolUseId: ev.data.toolUseId, result: reply.text }),
982
+ });
983
+ }
984
+ ```
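
The example above relies on a `parseSSE` helper that is not defined in this document. One possible shape for its framing step is sketched below, under a loudly stated assumption: each SSE frame's `data:` field is taken to carry the whole `{ seq, type, data }` envelope as JSON. The actual framing is specified in `agent-runs-protocol.md`, so treat `SseFrameParser` as hypothetical.

```ts
// Envelope shape as it appears in the event examples of this document.
interface RunEvent {
  seq: number;
  type: string;
  data: unknown;
}

class SseFrameParser {
  private buffer = "";

  // Feed one decoded text chunk; returns every event completed by it.
  push(chunk: string): RunEvent[] {
    // Normalise CRLF after concatenation so a pair split across chunks is handled.
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const events: RunEvent[] = [];
    let boundary = this.buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const frame = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);
      // Join all data: lines of the frame, dropping one optional leading space.
      const payload = frame
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n");
      // Frames without data (comments, keep-alives) are skipped.
      if (payload) events.push(JSON.parse(payload) as RunEvent);
      boundary = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}
```

An async `parseSSE` generator could decode each body chunk with `new TextDecoder().decode(chunk, { stream: true })`, pass it to `push`, and yield the returned events. Bare `\r` line endings, which the SSE format also permits, are not handled by this sketch.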
985
+
986
+ ---
987
+
988
+ ## 12. Full worked example: `mcp_local` round-trip
989
+
990
+ ```ts
991
+ // ── 1. Connect + resolve catalog locally ────────────────────────────────
992
+ const mcp = new McpClient(stdio("./mcp-server-filesystem"));
993
+ const initImpl = await mcp.initialize(); // → { name, version, ... }
994
+ const { tools } = await mcp.listTools(); // → MCP Tool[]
995
+
996
+ // ── 2. Submit the spec ──────────────────────────────────────────────────
997
+ const create = await fetch(`${MANTYX}/api/v1/workspaces/${slug}/agent-runs`, {
998
+ method: "POST",
999
+ headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
1000
+ body: JSON.stringify({
1001
+ modelId: "openai:gpt-5.5",
1002
+ prompt: "Tell me what's at /etc/hosts.",
1003
+ tools: [
1004
+ {
1005
+ kind: "mcp_local",
1006
+ name: "fs",
1007
+ serverInfo: initImpl,
1008
+ tools, // ← verbatim from listTools()
1009
+ },
1010
+ ],
1011
+ }),
1012
+ });
1013
+ const { runId, streamUrl } = await create.json();
1014
+
1015
+ // ── 3. Open SSE and dispatch ────────────────────────────────────────────
1016
+ for await (const ev of parseSSE(streamFromUrl(streamUrl, apiKey))) {
1017
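+ if (ev.type === "result" || ev.type === "error" || ev.type === "cancelled") break; // terminal event (§4.7): stop reading
+ if (ev.type !== "local_tool_call") continue;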
1018
+ if (ev.data.kind !== "mcp_local") continue;
1019
+
1020
+ const result = await mcp.callTool({
1021
+ name: ev.data.mcpToolName, // identical to ev.data.name
1022
+ arguments: ev.data.args,
1023
+ });
1024
+ const text = result.content
1025
+ .filter((b) => b.type === "text")
1026
+ .map((b) => b.text)
1027
+ .join("\n");
1028
+
1029
+ await fetch(`${MANTYX}/api/v1/workspaces/${slug}/agent-runs/${runId}/tool-results`, {
1030
+ method: "POST",
1031
+ headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
1032
+ body: JSON.stringify({ toolUseId: ev.data.toolUseId, result: text }),
1033
+ });
1034
+ }
1035
+ ```
1036
+
1037
+ ---
1038
+
1039
+ ## 13. Compliance checklist for SDK implementers
1040
+
1041
+ A reference SDK should:
1042
+
1043
+ - [ ] Accept `reasoningLevel` from the caller in either string or number
1044
+ form and pass it through unchanged. Do not translate it to a
1045
+ vendor-specific knob — the server owns that mapping.
1046
+ - [ ] Accept `outputSchema` from the caller as `{ name?, schema }` and pass
1047
+ it through unchanged. After the run terminates, `JSON.parse` the
1048
+ `result.data.text` and re-validate against the caller's
1049
+ source-of-truth schema (Zod / Pydantic / etc.) — the server enforces
1050
+ JSON shape via the provider, but transient model errors can still
1051
+ produce strings that fail to parse in rare cases.
1052
+ - [ ] Accept `loopDetection` and `toolBudgets` from the caller and pass
1053
+ them through unchanged (see §8). Both are *additive* — omitting
1054
+ them keeps the runtime defaults; passing `loopDetection: false` opts
1055
+ out; passing `toolBudgets: {}` clears the defaults; passing entries
1056
+ layers caller overrides on top of the defaults. Do **not** translate
1057
+ to vendor-specific knobs.
1058
+ - [ ] Treat `loop_detected` and `tool_budget_exceeded` SSE events as
1059
+ observability-only (see §4.5 / §4.6). Surface them as status notes
1060
+ / log lines / telemetry — the server already substituted the
1061
+ synthetic tool-results / steering nudges, so the SDK should keep
1062
+ consuming the stream until the terminal event lands.
1063
+ - [ ] Maintain three local-callback registries (or one tagged-union
1064
+ registry), keyed by `name`:
1065
+ - generic local tools (`kind: "local"`),
1066
+ - local A2A peers (`kind: "a2a_local"`, indexed by some Agent Card
1067
+ field — typically `agentCard.url`),
1068
+ - local MCP servers (`kind: "mcp_local"`, indexed by the SDK-side
1069
+ server label that matches `local_tool_call.mcpServer`).
1070
+ - [ ] For `kind: "local"`, accept developer-supplied `parameters` (Zod /
1071
+ JSON Schema) and serialize to JSON Schema before submission. When the
1072
+ caller declares an output schema, forward it as `outputSchema` (same
1073
+ JSON Schema shape) so providers with per-tool response schemas can
1074
+ enforce it. Surface a `longRunning` flag on the tool builder so the
1075
+ caller can opt into the model-side "don't double-call" hint without
1076
+ hand-editing the description.
1077
+ - [ ] For `a2a_local`, **resolve the Agent Card locally** and ship it as
1078
+ `agentCard`. Don't expect MANTYX to fetch anything.
1079
+ - [ ] For `mcp_local`, **speak `initialize` + `tools/list` locally** and
1080
+ ship the verbatim result as `serverInfo` + `tools[]`. Don't expect
1081
+ MANTYX to discover anything.
1082
+ - [ ] On `local_tool_call`, dispatch by the event's `kind` discriminator
1083
+ (defaulting to `"local"` when omitted). Validate args against the
1084
+ tool's schema, run it, POST the result back to `.../tool-results`.
1085
+ - [ ] On the terminal `result` / `error` / `cancelled` event, close the
1086
+ SSE stream.
1087
+ - [ ] Idempotency: only POST one tool-result per `toolUseId`. Treat
1088
+ `409 run_terminal` as a normal late-arrival outcome (the run already
1089
+ finished, for example after a local-tool timeout or a cancel).
1090
+ - [ ] Reconnects: send `Last-Event-ID: <last seq>` to resume, and rely on
1091
+ the persisted event log to backfill missed events.
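
The "one tagged-union registry" option from the checklist can be sketched as below. This is a minimal illustration keyed by `kind` plus `name`; the class and type names are hypothetical, and an SDK may prefer to index A2A peers by `agentCard.url` or MCP servers by their server label instead.

```ts
type LocalKind = "local" | "a2a_local" | "mcp_local";

// Fields used here follow the `local_tool_call` usage in the worked examples.
interface LocalToolCallData {
  toolUseId: string;
  name: string;
  kind?: LocalKind; // omitted means "local"
  args: unknown;
}

type LocalHandler = (args: unknown) => Promise<string>;

class LocalRegistry {
  private handlers = new Map<string, LocalHandler>();

  register(kind: LocalKind, name: string, handler: LocalHandler): void {
    this.handlers.set(`${kind}:${name}`, handler);
  }

  // Resolves the handler for a call, defaulting the discriminator to "local".
  lookup(call: LocalToolCallData): LocalHandler | undefined {
    const kind = call.kind ?? "local";
    return this.handlers.get(`${kind}:${call.name}`);
  }
}
```

When `lookup` returns `undefined`, the SDK still owes the server a tool-result POST, so reporting an `error` such as "no handler registered" is a reasonable fallback.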
1092
+
1093
+ ---
1094
+
1095
+ ## 14. See also
1096
+
1097
+ - [`agent-runs-protocol.md`](./agent-runs-protocol.md) — HTTP routes, auth,
1098
+ full body shapes, sessions, error codes.
1099
+ - [A2A spec](https://google.github.io/A2A/specification/) — canonical
1100
+ Agent Card schema.
1101
+ - [MCP spec](https://spec.modelcontextprotocol.io/) — canonical `Tool` and
1102
+ `Implementation` shapes.