@mantyx/sdk 0.2.0 → 0.4.0

@@ -5,8 +5,14 @@ speaks with SDKs. It is the source of truth for anyone implementing a new
5
5
  client (Python, Rust, Java…) and is shipped with each first-party SDK so the
6
6
  SDK repository can stand on its own when it is extracted from this monorepo.
7
7
 
8
- The companion document for this protocol — server-side overview, internals,
9
- deployment notes — is [`docs/agent-runs.md`](./agent-runs.md).
8
+ Companion documents:
9
+
10
+ - [`docs/wire-protocol.md`](./wire-protocol.md) — the messaging-layer
11
+ reference: every SSE event payload, the SDK-side dispatcher pattern, and
12
+ the resolved data structures (`a2a_local` Agent Card, `mcp_local`
13
+ `Tool[]`) the SDK is expected to ship.
14
+ - [`docs/agent-runs.md`](./agent-runs.md) — server-side overview, internals,
15
+ deployment notes.
10
16
 
11
17
  ## 1. Concepts
12
18
 
@@ -15,17 +21,34 @@ than persisted as a row in MANTYX's `Agent` table. The full spec (system
15
21
  prompt, model, tools) is stored as part of each session/run for observability
16
22
  but is not editable from the dashboard.
17
23
 
18
- **Tool refs.** Three flavours, all carried inside the agent spec:
24
+ **Tool refs.** Seven flavours, all carried inside the agent spec's `tools`
25
+ array:
19
26
 
20
27
  | `kind` | Resolved by | Notes |
21
28
  | ---------------- | ----------- | ----- |
22
29
  | `mantyx` | server | A workspace `Tool` row referenced by id (HTTP / Code / Plugin). |
23
30
  | `mantyx_plugin` | server | A platform plugin tool referenced by name. |
24
- | `local` | client | Defined and executed in the SDK's process. |
25
-
26
- When the model calls a `local` tool, MANTYX pauses the agent loop, emits a
27
- `local_tool_call` event over SSE and waits for the SDK to POST a tool-result
28
- back via HTTP.
31
+ | `local` | client | A custom tool defined and executed in the SDK's process. |
32
+ | `a2a` | server | A *remote* Agent2Agent peer MANTYX can reach; invoked via `message/send` and the reply is surfaced as the tool result. |
33
+ | `a2a_local` | client | An A2A peer MANTYX **cannot** reach. SDK resolves the [Agent Card](https://google.github.io/A2A/specification/#agent-card) locally and ships it inline; MANTYX uses it for the model description and routes calls back to the SDK over SSE. |
34
+ | `mcp` | server | A *remote* MCP server (Streamable HTTP). At run start MANTYX lists the catalog and exposes every tool as `<server>_<tool>` (subject to `toolFilter`). |
35
+ | `mcp_local` | client | An MCP server MANTYX **cannot** reach. SDK runs `Initialize` + `tools/list` locally and ships the resolved `Tool[]` (with `inputSchema`); MANTYX exposes them to the model with the SDK-declared names and routes calls back over SSE. |
36
+
37
+ The split is deliberate:
38
+
39
+ - **Server-resolved** (`mantyx`, `mantyx_plugin`, `a2a`, `mcp`) — MANTYX has
40
+ network access to the resource. The worker runs the tool itself and the
41
+ SDK only sees an informational `tool_result` event in the SSE stream. For
42
+ MCP/A2A this also means MANTYX does discovery (`listTools`, agent-card
43
+ fetch).
44
+ - **Client-resolved / "local"** (`local`, `a2a_local`, `mcp_local`) —
45
+ MANTYX has *no* access to the resource. The SDK does **all** of the
46
+ work: connection, discovery, listing, expansion, arg validation, auth,
47
+ execution, retries. MANTYX is a thin LLM-routing layer that emits a
48
+ `local_tool_call` event and blocks until the SDK POSTs back to
49
+ `.../tool-results`. The event payload carries a `kind` discriminator
50
+ (`"local"` implied when absent, `"a2a_local"` and `"mcp_local"` explicit)
51
+ so SDKs can dispatch to the right local handler.
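An SDK has to be able to serve the client-resolved half of this split. Below is a minimal TypeScript sketch that lists the model-facing names an SDK must be ready to answer `local_tool_call` for. It assumes trimmed ref shapes that keep only the discriminator and the fields used here (the real refs carry more), and `localToolNames` is a hypothetical helper:

```typescript
// Trimmed ref shapes: only the discriminator plus the fields this sketch reads.
type ToolRef =
  | { kind: "mantyx"; id: string }
  | { kind: "mantyx_plugin"; name: string }
  | { kind: "local"; name: string }
  | { kind: "a2a"; name: string; agentCardUrl: string }
  | { kind: "a2a_local"; name: string; agentCard: Record<string, unknown> }
  | { kind: "mcp"; name: string; url: string; toolFilter?: string[] }
  | { kind: "mcp_local"; name: string; tools: { name: string }[] };

// Names the SDK must be able to serve when MANTYX emits `local_tool_call`.
function localToolNames(refs: ToolRef[]): string[] {
  const names: string[] = [];
  for (const ref of refs) {
    switch (ref.kind) {
      case "local":
      case "a2a_local":
        names.push(ref.name); // the ref's own name is the tool name
        break;
      case "mcp_local":
        // One name per catalog entry, used exactly as declared (no prefixing).
        for (const tool of ref.tools) names.push(tool.name);
        break;
      default:
        // mantyx, mantyx_plugin, a2a, mcp: MANTYX executes these itself.
        break;
    }
  }
  return names;
}
```

Running this over a spec before submitting it gives an SDK a cheap way to fail fast when a handler is missing.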
29
52
 
30
53
  **One-shot run vs. session.** A run is an LLM execution. Runs may be:
31
54
 
@@ -114,6 +137,7 @@ The agent spec is the body shape used by `POST /agent-runs` and `POST
114
137
  "agentId": "agent_cm6abc123", // optional — see §4.1
115
138
  "systemPrompt": "You are helpful.", // required unless agentId is set
116
139
  "modelId": "platform:cm6abc123", // optional, see §3
140
+ "reasoningLevel": "medium", // optional, see §4.4
117
141
  "tools": [
118
142
  { "kind": "mantyx", "id": "tool_cm6..." },
119
143
  { "kind": "mantyx_plugin", "name": "web_search" },
@@ -127,10 +151,79 @@ The agent spec is the body shape used by `POST /agent-runs` and `POST
127
151
  "required": ["path"],
128
152
  "additionalProperties": false
129
153
  }
154
+ },
155
+ {
156
+ "kind": "a2a",
157
+ "name": "billing_agent",
158
+ "description": "Delegate billing questions to the Acme billing agent.",
159
+ "agentCardUrl": "https://billing.acme.com/.well-known/agent-card.json",
160
+ "headers": { "Authorization": "Bearer ${BILLING_TOKEN}" },
161
+ "contextId": "ctx_abc" // optional A2A context to thread turns
162
+ },
163
+ {
164
+ "kind": "a2a_local",
165
+ "name": "intranet_hr_agent",
166
+ "agentCard": { // SDK-resolved A2A Agent Card content
167
+ "protocolVersion": "0.3.0",
168
+ "name": "Acme HR",
169
+ "description": "Answers questions about HR policies and benefits.",
170
+ "url": "https://hr.intranet.acme/a2a",
171
+ "version": "1.4.0",
172
+ "capabilities": { "streaming": false },
173
+ "skills": [
174
+ {
175
+ "id": "pto_lookup",
176
+ "name": "PTO lookup",
177
+ "description": "Find a teammate's remaining PTO days for the year."
178
+ },
179
+ {
180
+ "id": "benefits_qa",
181
+ "name": "Benefits Q&A",
182
+ "description": "Answer questions about insurance, 401k, and parental leave."
183
+ }
184
+ ]
185
+ }
186
+ },
187
+ {
188
+ "kind": "mcp",
189
+ "name": "github", // → tools become github_<tool>
190
+ "url": "https://mcp.github.com/v1",
191
+ "headers": { "Authorization": "Bearer ${GH_PAT}" },
192
+ "toolFilter": ["search_repos", "read_file"] // optional allowlist
193
+ },
194
+ {
195
+ "kind": "mcp_local",
196
+ "name": "fs", // SDK-side server label only — NOT a prefix
197
+ "serverInfo": { // optional; from MCP Initialize
198
+ "name": "mcp-server-filesystem",
199
+ "version": "0.4.1"
200
+ },
201
+ "tools": [ // MCP tools/list entries (names chosen by the SDK)
202
+ {
203
+ "name": "fs_read_file", // model-facing name, exactly as declared
204
+ "description": "Read a file from the user's workstation",
205
+ "inputSchema": { // MCP's term — JSON Schema
206
+ "type": "object",
207
+ "properties": { "path": { "type": "string" } },
208
+ "required": ["path"]
209
+ }
210
+ }
211
+ ]
130
212
  }
131
213
  ],
132
214
  "budgets": { "maxToolTurns": 32 }, // optional safety cap
133
- "metadata": { // optional, see §4.2
215
+ "outputSchema": { // optional, see §4.5
216
+ "name": "weather_report",
217
+ "schema": {
218
+ "type": "object",
219
+ "properties": {
220
+ "city": { "type": "string" },
221
+ "temperature_c": { "type": "number" }
222
+ },
223
+ "required": ["city", "temperature_c"]
224
+ }
225
+ },
226
+ "metadata": { // optional, see §4.6
134
227
  "customer": "acme",
135
228
  "env": "prod"
136
229
  }
@@ -165,7 +258,292 @@ defining an ephemeral one inline. When `agentId` is set:
165
258
  Both `agentId` and `systemPrompt` may be supplied. The agent's stored prompt
166
259
  wins; the inline `systemPrompt` is ignored.
167
260
 
168
- ### 4.2 `metadata` (developer-supplied KV for filtering)
261
+ ### 4.2 A2A tool refs
262
+
263
+ A2A delegation lets the agent hand a task to another
264
+ [Agent2Agent](https://google.github.io/A2A/) peer. The wire protocol exposes
265
+ two kinds depending on **who can reach the peer**:
266
+
267
+ - `kind: "a2a"` — *remote* (server-resolved). MANTYX dials `agentCardUrl`
268
+ directly. Pick this when the peer is on the public internet or in the
269
+ same VPC as MANTYX.
270
+ - `kind: "a2a_local"` — *local* (client-resolved). The SDK invokes the peer
271
+ on its side and posts back the reply. Pick this when the peer lives on an
272
+ intranet, behind a VPN, or on the user's device — anywhere MANTYX can't
273
+ reach but the SDK can.
274
+
275
+ Both kinds present the **same** `{ "message": string }` argument shape to
276
+ the model, so an agent prompt that uses one transparently works with the
277
+ other. (This also matches MANTYX's internal `delegate_to_<name>` tools, so
278
+ models trained on one pattern carry across.)
279
+
280
+ #### `kind: "a2a"` — remote A2A
281
+
282
+ MANTYX resolves the tool server-side: when the model calls it, the worker
+ sends the model's `message` argument to the peer behind `agentCardUrl` over
+ A2A's standard `message/send` RPC, probing the Google ADK JSON-RPC root,
+ A2A `/rpc`, `/message:send`, and `/message/send` endpoints in that order,
+ and forwards the remote agent's text reply back as the tool result.
287
+
288
+ | Field | Required | Notes |
289
+ | --------------- | -------- | ----- |
290
+ | `kind` | yes | Discriminator literal `"a2a"`. |
291
+ | `name` | yes | Tool name surfaced to the model — must match `/^[a-zA-Z0-9_]{1,64}$/`. |
292
+ | `description` | no | Model-facing description. Defaults to `"Delegate a task to the <name> agent over A2A. Pass the full task as a single message."`. Mention the remote agent's purpose so the model picks it for the right turn. |
293
+ | `agentCardUrl` | yes | URL of the remote Agent Card (`/.well-known/agent-card.json`) or the JSON-RPC root the peer accepts. |
294
+ | `headers` | no | Flat string→string HTTP headers sent on every A2A request — typically `Authorization`. Each value is capped at 8 KB. |
295
+ | `contextId` | no | A2A `contextId` to thread multiple delegations into the same remote conversation. Omit for fresh per-call context. |
296
+
297
+ > **Secret handling.** `headers` are forwarded **as-is** by the SDK API. If
298
+ > you need long-lived credentials (refresh tokens, rotating API keys),
299
+ > register the peer as a workspace `ExternalAgent` instead — those headers
300
+ > support `{{secret:NAME}}` resolution against the workspace secrets store
301
+ > (see `runtime/a2a-client.ts`). The wire-protocol `a2a` ref is best for
302
+ > short-lived per-run tokens minted by your application.
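For the short-lived-token case, an SDK might assemble the ref as shown below. This is a sketch only. `remoteA2ARef` is a hypothetical helper, and treating "8 KB" as 8192 UTF-8 bytes is an assumption. The helper checks the documented name pattern and header cap before the spec leaves the process:

```typescript
// Hypothetical helper: assemble a `kind: "a2a"` ref carrying a per-run bearer
// token, enforcing the documented name pattern and 8 KB header-value cap.
const A2A_TOOL_NAME = /^[a-zA-Z0-9_]{1,64}$/;
const MAX_HEADER_VALUE_BYTES = 8 * 1024; // assumption: "8 KB" = 8192 UTF-8 bytes

interface RemoteA2ARef {
  kind: "a2a";
  name: string;
  agentCardUrl: string;
  description?: string;
  headers?: Record<string, string>;
  contextId?: string;
}

interface RemoteA2AOptions {
  name: string;
  agentCardUrl: string;
  description?: string;
  contextId?: string;
  token?: string; // short-lived token minted by your application
}

function remoteA2ARef(opts: RemoteA2AOptions): RemoteA2ARef {
  if (!A2A_TOOL_NAME.test(opts.name)) {
    throw new Error(`tool name must match ${A2A_TOOL_NAME}: ${opts.name}`);
  }
  new URL(opts.agentCardUrl); // throws TypeError if the URL is malformed
  const ref: RemoteA2ARef = { kind: "a2a", name: opts.name, agentCardUrl: opts.agentCardUrl };
  if (opts.description !== undefined) ref.description = opts.description;
  if (opts.contextId !== undefined) ref.contextId = opts.contextId;
  if (opts.token !== undefined) {
    const value = `Bearer ${opts.token}`;
    if (new TextEncoder().encode(value).length > MAX_HEADER_VALUE_BYTES) {
      throw new Error("Authorization header value exceeds 8 KB");
    }
    ref.headers = { Authorization: value };
  }
  return ref;
}
```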
303
+
304
+ #### `kind: "a2a_local"` — local A2A
305
+
306
+ > **MANTYX does no A2A work for this kind.** It does not fetch the agent
307
+ > card, validate transport, manage credentials, or speak `message/send`.
308
+ > The SDK owns the entire A2A relationship; MANTYX merely translates the
309
+ > model's call to the `<name>` tool into a `local_tool_call` event and
310
+ > waits for the SDK to POST back the reply text.
311
+
312
+ Per-run lifecycle:
313
+
314
+ 1. **Resolution (SDK).** Before submitting the spec, the SDK obtains the
315
+ peer's [A2A Agent Card](https://google.github.io/A2A/specification/#agent-card)
316
+ — typically by fetching `/.well-known/agent-card.json` from the local
317
+ peer, or by reading it from a config file / registry / inline constant.
318
+ 2. **Submission (SDK → MANTYX).** SDK posts the spec with the resolved
319
+ card embedded as `agentCard`. MANTYX uses the card's `name`,
320
+ `description`, and `skills[]` to compose the model-facing tool
321
+ description so the LLM understands what the peer can do.
322
+ 3. **Tool call (MANTYX → SDK).** When the model calls the tool, MANTYX
323
+ emits a `local_tool_call` event with `kind: "a2a_local"`,
324
+ `args: { message: string }`, and the **full `agentCard`** echoed back
325
+ so the SDK can route to the right local A2A handler (matching by URL,
326
+ name, skill set, or any other field).
327
+ 4. **Execution (SDK).** SDK invokes the A2A peer (its own client, its own
328
+ credentials, its own retries) and POSTs the reply text to
329
+ `POST /agent-runs/:runId/tool-results`.
330
+ 5. **Continuation (MANTYX).** MANTYX feeds the reply back into the model
331
+ loop as the tool result.
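Step 4 is entirely the SDK's responsibility. The sketch below shows what "invoke the A2A peer" can mean, assuming the peer speaks A2A v0.3 JSON-RPC (`Message` objects with `kind`, `messageId`, `role`, and text `parts`). It builds the request body and extracts the reply text. Transport, auth, retries, and JSON-RPC error handling are left out, and both function names are hypothetical:

```typescript
// Sketch: A2A v0.3 `message/send` request body plus reply-text extraction.
// Assumes the peer answers with either a Message or a Task; errors omitted.
interface TextPart {
  kind: "text";
  text: string;
}

interface LoosePart {
  kind: string;
  text?: string;
}

type SendResult =
  | { kind: "message"; parts: LoosePart[] }
  | {
      kind: "task";
      artifacts?: { parts: LoosePart[] }[];
      status?: { message?: { parts: LoosePart[] } };
    };

function messageSendRequest(id: number, messageId: string, text: string) {
  const parts: TextPart[] = [{ kind: "text", text }];
  return {
    jsonrpc: "2.0",
    id,
    method: "message/send",
    params: { message: { kind: "message", messageId, role: "user", parts } },
  };
}

// Join every text part; non-text parts (files, data) are skipped.
function textOf(parts: LoosePart[] = []): string {
  return parts
    .filter((p) => p.kind === "text" && typeof p.text === "string")
    .map((p) => p.text)
    .join("\n");
}

// Reply text that the SDK would POST back to `.../tool-results`.
function replyText(result: SendResult): string {
  if (result.kind === "message") return textOf(result.parts);
  const fromArtifacts = (result.artifacts ?? [])
    .map((a) => textOf(a.parts))
    .filter((t) => t.length > 0)
    .join("\n");
  return fromArtifacts || textOf(result.status?.message?.parts);
}
```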
332
+
333
+ | Field | Required | Notes |
334
+ | --------------- | -------- | ----- |
335
+ | `kind` | yes | Discriminator literal `"a2a_local"`. |
336
+ | `name` | yes | Tool name surfaced to the model — must match `/^[a-zA-Z0-9_]{1,64}$/`. |
337
+ | `description` | no | Model-facing description override. When omitted, MANTYX synthesizes one from `agentCard.name`, `agentCard.description`, and the first 12 skills. |
338
+ | `agentCard` | yes | The resolved A2A Agent Card (JSON content). Schema follows the [A2A Agent Card spec](https://google.github.io/A2A/specification/#agent-card) — passthrough for unknown fields, so any spec-compliant card works. See the *Agent Card shape* table below for the fields MANTYX actually reads. |
339
+
340
+ **Agent Card shape** (only the fields MANTYX inspects; everything else is
341
+ forwarded verbatim back to the SDK):
342
+
343
+ | Card field | Used by MANTYX | Notes |
344
+ | --------------------- | -------------- | ----- |
345
+ | `protocolVersion` | echo only | A2A protocol version (e.g. `"0.3.0"`). |
346
+ | `name` | description | Used when synthesizing the tool description (`"Delegate a task to the <name> agent ..."`). |
347
+ | `description` | description | One-paragraph summary of what the peer does — surfaced to the model. |
348
+ | `url` | echo only | Peer's A2A endpoint. Forwarded back to the SDK in the `local_tool_call` event so the SDK can dispatch by URL. Never fetched server-side. |
349
+ | `version` | echo only | Peer agent version. |
350
+ | `provider` | echo only | Vendor info. |
351
+ | `capabilities` | echo only | A2A capability flags (streaming, push notifications, …). |
352
+ | `defaultInputModes` | echo only | Modalities the peer accepts. |
353
+ | `defaultOutputModes` | echo only | Modalities the peer returns. |
354
+ | `skills[]` | description | First 12 skills (`name`, `description`) are bulleted into the tool description so the model knows what to ask for. |
355
+ | `securitySchemes`, `security` | echo only | Forwarded to the SDK; MANTYX does no auth. |
356
+ | *anything else* | echo only | Passthrough — survives round-trip unchanged. |
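An SDK that wants to control the wording can send its own `description` instead. The hypothetical `describeAgentCard` below draws on the same card fields MANTYX is documented to read (`name`, `description`, and the first 12 `skills[]`). It is only a sketch: its phrasing is not MANTYX's synthesized text.

```typescript
// Hypothetical helper: build a model-facing `description` override for an
// `a2a_local` ref from the same card fields MANTYX reads. Wording is ours.
interface AgentCardSkill {
  id: string;
  name: string;
  description?: string;
}

interface AgentCardLike {
  name: string;
  description?: string;
  skills?: AgentCardSkill[];
}

function describeAgentCard(card: AgentCardLike, maxSkills = 12): string {
  const lines = [`Delegate a task to the ${card.name} agent. Pass the full task as a single message.`];
  if (card.description) lines.push(card.description);
  const skills = (card.skills ?? []).slice(0, maxSkills); // MANTYX also stops at 12
  if (skills.length > 0) {
    lines.push("Skills:");
    for (const skill of skills) {
      lines.push(skill.description ? `- ${skill.name}: ${skill.description}` : `- ${skill.name}`);
    }
  }
  return lines.join("\n");
}
```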
357
+
358
+ Local A2A respects the same `localToolTimeoutMs` budget (default 5 minutes)
359
+ as `kind: "local"`. Tool-result POSTs after timeout return `409 run_terminal`.
360
+
361
+ ### 4.3 MCP tool refs
362
+
363
+ [Model Context Protocol](https://modelcontextprotocol.io/) connectors
364
+ expose every tool published by an MCP server to the agent loop in one go.
365
+ Like A2A, the protocol distinguishes by **where the server lives**:
366
+
367
+ - `kind: "mcp"` — *remote* MCP (Streamable HTTP). MANTYX has network access
368
+ to the server, dials it, lists the catalog at run start, and proxies each
369
+ call server-side. **MANTYX prefixes every discovered tool name with the
370
+ ref's `name`** (e.g. `github_search_repos`) so multiple MCP servers
371
+ can coexist without colliding.
372
+ - `kind: "mcp_local"` — *local* MCP (stdio, on-device, intranet). MANTYX
373
+ has **no** access to the server; the SDK does discovery, validation, and
374
+ execution. The SDK declares the tool catalog with **the exact names it
375
+ wants the model to see** — MANTYX does not auto-prefix.
376
+
377
+ #### `kind: "mcp"` — remote MCP
378
+
379
+ | Field | Required | Notes |
380
+ | -------------- | -------- | ----- |
381
+ | `kind` | yes | Discriminator literal `"mcp"`. |
382
+ | `name` | yes | Server label — MANTYX prefixes every discovered tool name as `<name>_<tool>`. Must match `/^[a-zA-Z0-9_]{1,64}$/`. |
383
+ | `url` | yes | Streamable HTTP MCP endpoint. |
384
+ | `headers` | no | Flat string→string HTTP headers (e.g. `Authorization`). Each value capped at 8 KB. |
385
+ | `toolFilter` | no | Allowlist of MCP tool names (un-prefixed, as the server returns them). When set, tools not in the list are silently dropped. When omitted, every published tool is exposed. |
386
+
387
+ If the MCP server is unreachable when the run starts, MANTYX still exposes
388
+ a single stub tool named `<server>_unavailable` so the model can report the
389
+ failure to the user instead of silently going without the catalog.
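The naming rule fits in a few lines of code. MANTYX applies it server-side, so an SDK would mirror it only to predict names, for example to mention `github_search_repos` in a system prompt. `exposedMcpToolNames` is hypothetical, and `null` stands in here for "server unreachable at run start":

```typescript
// Illustration of the documented `kind: "mcp"` naming rule. `published` holds
// the server's un-prefixed tool names, or null when the server was unreachable.
interface RemoteMcpRef {
  name: string;
  toolFilter?: string[];
}

function exposedMcpToolNames(ref: RemoteMcpRef, published: string[] | null): string[] {
  if (published === null) return [`${ref.name}_unavailable`]; // stub tool
  const allow = ref.toolFilter ? new Set(ref.toolFilter) : null;
  return published
    .filter((tool) => allow === null || allow.has(tool)) // unlisted tools are dropped
    .map((tool) => `${ref.name}_${tool}`);
}
```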
390
+
391
+ #### `kind: "mcp_local"` — local MCP
392
+
393
+ > **MANTYX does no MCP work for this kind.** It does not speak
394
+ > `initialize`, `tools/list`, or `tools/call`, does not validate args,
395
+ > and does not interpret result content blocks. The SDK owns the entire
396
+ > MCP relationship — including discovery — and gives MANTYX the resolved
397
+ > tool catalog so the model can be told what's available. MANTYX is
398
+ > purely a transport.
399
+
400
+ Per-run lifecycle:
401
+
402
+ 1. **Discovery (SDK).** Before submitting the spec, the SDK connects to
+ its local MCP server, sends `initialize` (capturing the returned
+ `Implementation` block as optional `serverInfo`), then calls `tools/list`.
+ The resulting `Tool[]` array is shipped as `tools[]` in MCP's own `Tool`
+ shape (the SDK still owns the `name` values; see step 2).
406
+ 2. **Submission (SDK → MANTYX).** SDK posts the spec with the resolved
407
+ catalog. Field names match the MCP spec exactly — `inputSchema`, not
408
+ `parameters` — so a TypeScript SDK can pass through what its MCP client
409
+ already decoded. The `tools[].name` values are exactly what the model
410
+ will see; MANTYX does **not** auto-prefix or rename anything. Sanitize
411
+ them to `[a-zA-Z0-9_]{1,64}` yourself (if you want `fs/read_file` to
412
+ surface as `fs_read_file`, declare it that way).
413
+ 3. **Tool call (MANTYX → SDK).** When the model calls a tool, MANTYX emits
414
+ a `local_tool_call` event with `kind: "mcp_local"` and these extra
415
+ hints so the SDK can dispatch to the right MCP client:
416
+
417
+ ```jsonc
418
+ {
419
+ "seq": 9,
420
+ "type": "local_tool_call",
421
+ "data": {
422
+ "toolUseId": "tu_x",
423
+ "name": "fs_read_file", // SDK-declared name; same string the model called
424
+ "args": { "path": "/etc/hosts" },
425
+ "kind": "mcp_local",
426
+ "mcpServer": "fs", // the SDK-side label from the ref's `name`
427
+ "mcpToolName": "fs_read_file", // duplicates `name` for the SDK's convenience
428
+ "mcpServerInfo": { // present iff the ref carried `serverInfo`
429
+ "name": "mcp-server-filesystem",
430
+ "version": "0.4.1"
431
+ }
432
+ }
433
+ }
434
+ ```
435
+
436
+ 4. **Execution (SDK).** SDK validates args against the locally-known
437
+ `inputSchema`, speaks MCP `tools/call`, flattens the response content
438
+ blocks (typically the joined `text` blocks), and POSTs the result back
439
+ to `.../tool-results`.
440
+ 5. **Refresh (optional).** To pick up new tools mid-session, send the
441
+ updated `mcp_local` ref inside `POST /agent-sessions/:id/messages`'s
442
+ `tools` field; the catalog snapshot lives on the run, not the session.
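Step 2 leaves naming to the SDK. The sketch below shows one possible policy. Prefixing with the ref's label, replacing disallowed characters with `_`, and cutting at 64 characters are choices made for this example, not protocol rules, and all names here are hypothetical. The sketch also keeps a reverse map: the event's `mcpToolName` repeats the model-facing name rather than the server's original one, while `tools/call` needs the original.

```typescript
// Hypothetical SDK-side helper: turn a local MCP server's `tools/list` result
// into an `mcp_local` ref. Naming policy (label prefix, `_` substitution,
// 64-char cut) is an assumption; MANTYX only requires /^[a-zA-Z0-9_]{1,64}$/.
interface McpTool {
  name: string;
  description?: string;
  inputSchema?: Record<string, unknown>;
  [extra: string]: unknown; // annotations etc. pass through untouched
}

interface McpLocalRef {
  kind: "mcp_local";
  name: string;
  serverInfo?: { name: string; version?: string };
  tools: McpTool[];
}

const LABEL_RE = /^[a-zA-Z0-9_]{1,64}$/;

function modelFacingName(label: string, mcpName: string): string {
  return `${label}_${mcpName}`.replace(/[^a-zA-Z0-9_]/g, "_").slice(0, 64);
}

function buildMcpLocalRef(label: string, tools: McpTool[], serverInfo?: { name: string; version?: string }) {
  if (!LABEL_RE.test(label)) throw new Error(`invalid server label: ${label}`);
  if (tools.length < 1 || tools.length > 64) throw new Error("mcp_local refs carry between 1 and 64 tools");
  const originalName = new Map<string, string>(); // model-facing name -> MCP name for tools/call
  const declared = tools.map((tool) => {
    const name = modelFacingName(label, tool.name);
    if (originalName.has(name)) throw new Error(`two tools map to ${name}`);
    originalName.set(name, tool.name);
    return { ...tool, name };
  });
  const ref: McpLocalRef = { kind: "mcp_local", name: label, tools: declared };
  if (serverInfo !== undefined) ref.serverInfo = serverInfo;
  return { ref, originalName };
}
```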
443
+
444
+ | Field | Required | Notes |
445
+ | -------------- | -------- | ----- |
446
+ | `kind` | yes | Discriminator literal `"mcp_local"`. |
447
+ | `name` | yes | SDK-side server label (e.g. `"fs"`, `"jira"`). Echoed back unchanged as `mcpServer` on every `local_tool_call`. **Not used to prefix tool names.** Match `/^[a-zA-Z0-9_]{1,64}$/`. |
448
+ | `serverInfo` | no | The MCP `Implementation` block the SDK got from `initialize` (`{ name, version? }`, plus any extra fields the server returned). Forwarded to the SDK in `local_tool_call.mcpServerInfo` for observability; not used to drive behavior. |
449
+ | `tools` | yes | Verbatim MCP `tools/list` output (1–64 entries). Each item is the standard MCP `Tool` shape: `{ name, description?, inputSchema?, annotations?, … }`. `name` is the model-facing tool name (SDK owns naming). `inputSchema` is the MCP-spec JSON Schema for the tool's arguments — used to constrain the LLM's tool call. Empty `inputSchema` means a no-arg tool. |
450
+
451
+ Older SDKs that ignore the `kind` discriminator still see a normal
452
+ `local_tool_call` and can match on `name` alone.
453
+
454
+ ### 4.4 `reasoningLevel` (provider thinking strength)
455
+
456
+ `reasoningLevel` controls how much extended-thinking / reasoning effort the
457
+ model spends per turn. MANTYX maps the same value onto every supported
458
+ provider:
459
+
460
+ - **OpenAI Responses** — `reasoning.effort` on reasoning models (o-series,
461
+ GPT-5.x, …; ignored on non-reasoning models and on xAI Grok).
462
+ - **Gemini 3+** — `thinkingConfig.thinkingLevel`; pre-Gemini-3 models
463
+ consume the equivalent `thinkingBudget` token count.
464
+ - **Anthropic / Bedrock-Anthropic** — extended thinking with a budget that
465
+ scales with strength (≈512 tokens at `low` → ≈8000 at `high`).
466
+
467
+ Two equivalent input shapes are accepted:
468
+
469
+ | Form | Values | Notes |
470
+ | ----------- | ------------------------------------- | ----- |
471
+ | **String** | `"off"`, `"low"`, `"medium"`, `"high"` | Snaps to the web composer's anchors: `"off"` = 0, `"low"` = 30 (Fast), `"medium"` = 50 (Moderate), `"high"` = 80 (Smart). |
472
+ | **Number** | integer `0`–`100` | Pass-through to `RunAgentOptions.reasoningLevel`. `0` explicitly disables provider thinking even on reasoning models. |
473
+
474
+ When omitted, MANTYX falls back to the agent's default — for ephemeral
475
+ specs, that means thinking is off; for `agentId`-backed specs, it follows
476
+ the persisted `Agent` configuration.
477
+
478
+ For session-scoped runs the inheritance rules are:
479
+
480
+ - `POST /agent-sessions { reasoningLevel }` — sets the session-default
481
+ applied to every subsequent message run.
482
+ - `POST /agent-sessions/:id/messages { reasoningLevel }` — optional
483
+ per-message override; applies to that one run only and does not mutate
484
+ the session's stored value.
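The SDK passes the value through unchanged, so the most it should do locally is reject values outside the two documented forms. A minimal sketch (the guard name is hypothetical):

```typescript
// Hypothetical guard: accept only the documented forms and return the value
// untouched; the server owns the mapping onto provider-specific knobs.
type ReasoningLevel = "off" | "low" | "medium" | "high" | number;

const REASONING_STRINGS: ReadonlySet<string> = new Set(["off", "low", "medium", "high"]);

function checkReasoningLevel(value: unknown): ReasoningLevel {
  if (typeof value === "string" && REASONING_STRINGS.has(value)) {
    return value as ReasoningLevel;
  }
  if (typeof value === "number" && Number.isInteger(value) && value >= 0 && value <= 100) {
    return value; // 0 explicitly disables provider thinking
  }
  throw new RangeError(
    `reasoningLevel must be "off" | "low" | "medium" | "high" or an integer 0-100, got ${JSON.stringify(value)}`,
  );
}
```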
485
+
486
+ ### 4.5 `outputSchema` (structured final reply)
487
+
488
+ `outputSchema` constrains the model's **final assistant text** to a JSON
489
+ document conforming to a JSON Schema. Useful when the SDK needs to feed the
490
+ reply directly into downstream code without LLM-flavoured prose to parse out.
491
+
492
+ ```jsonc
493
+ "outputSchema": {
494
+ "name": "weather_report", // optional; default "output"
495
+ "schema": { // required, root must be a JSON object
496
+ "type": "object",
497
+ "properties": {
498
+ "city": { "type": "string" },
499
+ "temperature_c": { "type": "number" }
500
+ },
501
+ "required": ["city", "temperature_c"]
502
+ }
503
+ }
504
+ ```
505
+
506
+ | Field | Required | Notes |
507
+ | -------- | -------- | ----- |
508
+ | `name` | no | Stable identifier passed to providers (OpenAI `text.format.name`, Anthropic synthetic-tool name). Defaults to `"output"`. Must match `/^[a-zA-Z0-9_-]{1,64}$/`. |
509
+ | `schema` | yes | JSON Schema describing the final assistant text. Root must be a JSON **object** (most providers reject array / scalar roots in structured-output mode). The schema is passed through verbatim — MANTYX does not validate its contents; the provider does. |
510
+
511
+ Validation (server-side, `400 invalid_request` on violation):
512
+
513
+ | Constraint | Limit |
514
+ | ----------------------------------- | ----- |
515
+ | Serialized JSON size of `outputSchema` | ≤ 32 KB |
516
+ | `name` regex | `/^[a-zA-Z0-9_-]{1,64}$/` |
517
+ | `schema` shape | non-`null`, non-array JSON object |
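An SDK can run the same three checks before sending, so that a bad schema fails locally instead of coming back as a `400`. The sketch below assumes "32 KB" means 32 × 1024 bytes of UTF-8-encoded JSON; the document does not say which unit the server counts. `outputSchemaProblems` is hypothetical.

```typescript
// Hypothetical client-side mirror of the server's outputSchema validation.
const OUTPUT_NAME_RE = /^[a-zA-Z0-9_-]{1,64}$/;
const MAX_OUTPUT_SCHEMA_BYTES = 32 * 1024; // assumption: 32 KB = 32768 UTF-8 bytes

interface OutputSchema {
  name?: string;
  schema: Record<string, unknown>;
}

// Returns a list of human-readable problems; empty means "looks valid".
function outputSchemaProblems(value: OutputSchema): string[] {
  const problems: string[] = [];
  if (value.name !== undefined && !OUTPUT_NAME_RE.test(value.name)) {
    problems.push("name must match /^[a-zA-Z0-9_-]{1,64}$/");
  }
  const schema: unknown = value.schema;
  if (typeof schema !== "object" || schema === null || Array.isArray(schema)) {
    problems.push("schema must be a non-null, non-array JSON object");
  }
  if (new TextEncoder().encode(JSON.stringify(value)).length > MAX_OUTPUT_SCHEMA_BYTES) {
    problems.push("serialized outputSchema exceeds 32 KB");
  }
  return problems;
}
```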
518
+
519
+ **Per-provider behaviour** (mirrors the SDK's `RunAgentOptions.finalResponseSchema`):
520
+
521
+ | Provider | How the schema is enforced |
522
+ | ------------------------------ | -------------------------- |
523
+ | OpenAI Responses (o-series, GPT-5.x, …) | `text.format = { type: "json_schema", strict: true, name, schema }` on every turn (works alongside tool calls). |
524
+ | Gemini ≥ 2.5 | `responseMimeType: "application/json"` + `responseJsonSchema` on no-tools turns (Gemini rejects schemas alongside `functionDeclarations`). |
525
+ | Anthropic / Bedrock-Anthropic | Synthetic `final_report` tool whose `input_schema` is the supplied schema; `tool_choice` is forced on the no-tools finishing turn. The tool's input is surfaced as the assistant text. |
526
+ | xAI Grok, others | Ignored (the model returns plain text). |
527
+
528
+ The terminal `result` event still carries the reply as
529
+ `data.text: string` — the SDK is expected to `JSON.parse` and validate
530
+ against its own source-of-truth schema (Zod, Pydantic, …) so it keeps
531
+ control of error handling on malformed-but-rare provider outputs.
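A minimal sketch of that client-side step for the `weather_report` schema used above. A hand-written type guard stands in for Zod here so that the example has no dependencies; `parseStructuredResult` and `isWeatherReport` are hypothetical names:

```typescript
// Parse the terminal `result` event's `data.text` and check it against the
// SDK's own expectations; a Zod schema would normally replace the guard.
interface WeatherReport {
  city: string;
  temperature_c: number;
}

function isWeatherReport(v: unknown): v is WeatherReport {
  if (typeof v !== "object" || v === null) return false;
  const r = v as Record<string, unknown>;
  return typeof r.city === "string" && typeof r.temperature_c === "number";
}

function parseStructuredResult(text: string): WeatherReport {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    throw new Error(`result text is not JSON: ${(err as Error).message}`);
  }
  if (!isWeatherReport(parsed)) throw new Error("result JSON does not match WeatherReport");
  return parsed;
}
```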
532
+
533
+ **Inheritance for sessions:**
534
+
535
+ - `POST /agent-sessions { outputSchema }` — sets the session-default,
536
+ applied to every subsequent message run.
537
+ - `POST /agent-sessions/:id/messages { outputSchema }` — optional
538
+ per-message override; applies to that one run only and does not mutate
539
+ the session's stored value.
540
+
541
+ `outputSchema` works for both ephemeral runs (`systemPrompt`-defined) and
542
+ `agentId`-backed runs — the runner applies the schema to whatever
543
+ `AgentSpec` it built for the run. When the field is omitted, runs return
544
+ unconstrained plain text as before.
545
+
546
+ ### 4.6 `metadata` (developer-supplied KV for filtering)
169
547
 
170
548
  `metadata` is a flat string→string KV that is **persisted alongside the run /
171
549
  session** and surfaced in the MANTYX dashboard. Use it to tag runs with your
@@ -282,6 +660,11 @@ data: <utf-8 JSON>
282
660
  // streamed assistant tokens (zero or more per turn)
283
661
  { "seq": 2, "type": "assistant_delta", "data": { "text": "Hello" } }
284
662
 
663
+ // streamed reasoning / extended-thinking tokens (only when reasoningLevel > 0
664
+ // AND the active provider exposes thought parts: Anthropic extended thinking,
665
+ // Gemini `includeThoughts`, OpenAI `reasoning_content` on reasoning models).
666
+ { "seq": 2, "type": "thinking_delta", "data": { "text": "First, I should…" } }
667
+
285
668
  // completed assistant message (text + any tool calls about to execute)
286
669
  { "seq": 3, "type": "assistant_message", "data": { "text": "...", "toolCalls": [...] } }
287
670
 
@@ -289,8 +672,13 @@ data: <utf-8 JSON>
289
672
  { "seq": 4, "type": "tool_call", "data": { "toolUseId": "...", "name": "...", "input": {...} } }
290
673
  { "seq": 5, "type": "tool_result", "data": { "toolUseId": "...", "name": "...", "ok": true, "summary": "..." } }
291
674
 
292
- // LOCAL tool call — SDK MUST POST a tool-result for the same toolUseId
293
- { "seq": 6, "type": "local_tool_call", "data": { "toolUseId": "tu_x", "name": "read_file", "input": { "path": "/etc/hosts" } } }
675
+ // LOCAL tool call — SDK MUST POST a tool-result for the same toolUseId.
676
+ // `kind` carries the discriminator so the SDK can dispatch to the right
677
+ // local handler (generic registry, A2A client, or MCP client). Older SDKs
678
+ // that ignore `kind` still match on `name`.
679
+ { "seq": 6, "type": "local_tool_call", "data": { "toolUseId": "tu_x", "name": "read_file", "args": { "path": "/etc/hosts" } } }
680
+ { "seq": 6, "type": "local_tool_call", "data": { "toolUseId": "tu_y", "name": "intranet_hr_agent", "args": { "message": "When does PTO reset?" }, "kind": "a2a_local", "agentCard": { "name": "Acme HR", "url": "https://hr.intranet.acme/a2a", "skills": [ { "id": "pto_lookup", "name": "PTO lookup" } ] } } }
681
+ { "seq": 6, "type": "local_tool_call", "data": { "toolUseId": "tu_z", "name": "fs_read_file", "args": { "path": "/etc/hosts" }, "kind": "mcp_local", "mcpServer": "fs", "mcpToolName": "fs_read_file", "mcpServerInfo": { "name": "mcp-server-filesystem", "version": "0.4.1" } } }
294
682
 
295
683
  // echo of the SDK's POSTed tool-result, persisted for replay
296
684
  { "seq": 7, "type": "local_tool_result_in", "data": { "toolUseId": "tu_x", "output": "127.0.0.1 ..." } }
@@ -368,17 +756,52 @@ A reference SDK should:
368
756
 
369
757
  1. Hold the API key + workspace slug and a small `fetch` (or stdlib HTTP)
370
758
  client.
371
- 2. Maintain a registry of local tool handlers, keyed by `name`.
759
+ 2. Maintain three local-callback registries (or one tagged-union registry),
760
+ keyed by tool `name`:
761
+ - **Generic local tools** (`kind: "local"`) — caller-supplied handler
762
+ functions, dispatched by `name`.
763
+ - **Local A2A peers** (`kind: "a2a_local"`) — caller-supplied A2A
764
+ clients. Resolve the peer's Agent Card *first* (e.g. `fetch
765
+ "<peer>/.well-known/agent-card.json"` or read from a local registry),
766
+ attach it to the spec as `agentCard`, and in the dispatcher look the
767
+ client up by `agentCard.url` (or any other field you indexed on)
768
+ when the `local_tool_call` arrives.
769
+ - **Local MCP servers** (`kind: "mcp_local"`) — caller-supplied MCP
770
+ client connections. Send `initialize` and `tools/list` once at
771
+ setup, ship the verbatim `tools[]` (with `inputSchema`) plus
772
+ optional `serverInfo`, and dispatch incoming calls by the `mcpServer`
773
+ field in the event payload.
774
+
775
+ `mantyx`, `mantyx_plugin`, `a2a`, and `mcp` refs are server-resolved —
776
+ no SDK-side registry needed.
372
777
  3. On `runAgent` / `session.send`:
778
+ - Accept `reasoningLevel` from the caller and pass it through unchanged
779
+ (string `"off" | "low" | "medium" | "high"` *or* number `0–100`); do
780
+ **not** translate to a vendor-specific knob — the server owns that
781
+ mapping so all SDKs stay aligned with the web composer.
373
782
  - POST the run/message, get `{ runId, streamUrl }`.
374
783
  - Open the SSE stream with `Last-Event-ID` if reconnecting.
375
- - On `local_tool_call`, look up the handler, validate args against the
376
- tool's schema, run it, POST the result back to `.../tool-results`.
784
+ - On `local_tool_call`, dispatch by the event's `kind` discriminator
785
+ (defaulting to `"local"` when omitted): generic registry / local A2A
786
+ client / local MCP client. Validate args against the tool's schema,
787
+ run it, POST the result back to `.../tool-results`.
788
+ - Treat `thinking_delta` events as opt-in callback fodder; many UIs hide
789
+ them by default. Their presence depends on `reasoningLevel > 0` and
790
+ on the active model exposing thought parts.
377
791
  - On terminal `result`, resolve the call. On `error` subtype, throw.
378
792
  4. Re-emit assistant deltas/events as a stream/iterator for callers who care
379
793
  about live output.
380
794
  5. Treat the protocol as the contract. Implementation details such as Valkey
381
795
  pub/sub or pgvector are server-side only.
382
-
383
- The TypeScript SDK in [`packages/mantyx-sdk/ts/`](../packages/mantyx-sdk/ts/) and the Go SDK in
384
- [`packages/mantyx-sdk/go/`](../packages/mantyx-sdk/go/) are reference implementations of this protocol.
796
+ 6. **Execution model (server-side, informational).** Runs are executed
797
+ out-of-process by the inbound worker, which consumes a dedicated
798
+ `mantyx:agent-runs` RabbitMQ queue; the API only persists the run row and
799
+ enqueues. Nothing about this is observable on the wire — clients still see
800
+ `202 { runId, streamUrl }` followed by the same SSE vocabulary — but it
801
+ means the `local_tool_call` ↔ `tool-results` round-trip is valid across
802
+ any API or worker replica, and transient broker failures surface as a
803
+ terminal `error` event on the stream, not as a 5xx on the initial POST.
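Items 2 and 3 combine into a single dispatch step. The sketch below is one way to write it, not the reference SDKs' code. The event field names come from the `local_tool_call` examples in this document, while the `Registries` layout and handler signatures are assumptions. The function returns the text that the caller would then POST to `.../tool-results`.

```typescript
// Hypothetical dispatcher for `local_tool_call` events.
type Args = Record<string, unknown>;

interface LocalToolCall {
  toolUseId: string;
  name: string;
  args: Args;
  kind?: "local" | "a2a_local" | "mcp_local"; // absent means "local"
  agentCard?: { url?: string; [field: string]: unknown };
  mcpServer?: string;
}

interface Registries {
  local: Map<string, (args: Args) => Promise<string>>; // keyed by tool name
  a2a: Map<string, (message: string) => Promise<string>>; // keyed by agentCard.url
  // keyed by mcpServer; receives the model-facing name, so map it back to the
  // server's original tool name first if the SDK renamed it
  mcp: Map<string, (tool: string, args: Args) => Promise<string>>;
}

async function dispatchLocalToolCall(ev: LocalToolCall, reg: Registries): Promise<string> {
  switch (ev.kind ?? "local") {
    case "local": {
      const handler = reg.local.get(ev.name);
      if (!handler) throw new Error(`no local handler for ${ev.name}`);
      return handler(ev.args);
    }
    case "a2a_local": {
      const url = ev.agentCard?.url;
      const client = url === undefined ? undefined : reg.a2a.get(url);
      if (!client) throw new Error(`no A2A client for ${url ?? ev.name}`);
      const message = ev.args.message;
      if (typeof message !== "string") throw new Error("a2a_local args.message must be a string");
      return client(message);
    }
    case "mcp_local": {
      const server = ev.mcpServer === undefined ? undefined : reg.mcp.get(ev.mcpServer);
      if (!server) throw new Error(`no MCP client for ${ev.mcpServer ?? ev.name}`);
      return server(ev.name, ev.args);
    }
    default:
      throw new Error(`unsupported local tool kind: ${String(ev.kind)}`);
  }
}
```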
804
+
805
+ The npm package [`@mantyx/sdk`](https://www.npmjs.com/package/@mantyx/sdk) and the Go module
806
+ [`github.com/mantyx/mantyx-go-sdk`](https://github.com/mantyx/mantyx-go-sdk) are reference implementations of this protocol
807
+ (maintained in the official **mantyx-sdk** repositories).
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@mantyx/sdk",
3
- "version": "0.2.0",
3
+ "version": "0.4.0",
4
4
  "description": "MANTYX as a hosted agent runtime: define ephemeral agents, mix server-side MANTYX tools with locally-executed tools, run them remotely.",
5
5
  "type": "module",
6
6
  "main": "./dist/index.cjs",
@@ -11,6 +11,11 @@
11
11
  "types": "./dist/index.d.ts",
12
12
  "import": "./dist/index.js",
13
13
  "require": "./dist/index.cjs"
14
+ },
15
+ "./a2a-server": {
16
+ "types": "./dist/a2a-server.d.ts",
17
+ "import": "./dist/a2a-server.js",
18
+ "require": "./dist/a2a-server.cjs"
14
19
  }
15
20
  },
16
21
  "files": [
@@ -23,8 +28,8 @@
23
28
  ],
24
29
  "scripts": {
25
30
  "sync-version": "node ../scripts/sync-version.mjs",
26
- "build": "tsup src/index.ts --format esm,cjs --dts --clean --sourcemap",
27
- "dev": "tsup src/index.ts --format esm,cjs --dts --watch",
31
+ "build": "tsup --config tsup.config.ts",
32
+ "dev": "tsup --config tsup.config.ts --watch",
28
33
  "typecheck": "tsc --noEmit",
29
34
  "test": "vitest run",
30
35
  "test:watch": "vitest",
@@ -34,10 +39,26 @@
34
39
  "node": ">=18.17.0"
35
40
  },
36
41
  "dependencies": {
42
+ "@modelcontextprotocol/sdk": "^1.29.0",
37
43
  "zod": "^3.23.8"
38
44
  },
45
+ "peerDependencies": {
46
+ "@a2a-js/sdk": "^0.3.0",
47
+ "express": "^4.0.0 || ^5.0.0"
48
+ },
49
+ "peerDependenciesMeta": {
50
+ "@a2a-js/sdk": {
51
+ "optional": true
52
+ },
53
+ "express": {
54
+ "optional": true
55
+ }
56
+ },
39
57
  "devDependencies": {
58
+ "@a2a-js/sdk": "^0.3.0",
59
+ "@types/express": "^5.0.0",
40
60
  "@types/node": "^22.0.0",
61
+ "express": "^5.0.0",
41
62
  "tsup": "^8.3.0",
42
63
  "typescript": "^5.6.0",
43
64
  "vitest": "^2.1.0"