@tangle-network/agent-runtime 0.23.0 → 0.25.0

package/README.md CHANGED
@@ -1,551 +1,138 @@
1
1
  # @tangle-network/agent-runtime
2
2
 
3
- Production runtime substrate for domain agents. Owns the task lifecycle
4
- (knowledge readiness, control loop, session resume, sanitized telemetry,
5
- canonical `RuntimeRunRow` persistence + cost ledger), the chat-turn
6
- engine (NDJSON envelope + product hooks), the chat-model catalog +
7
- admission, and the declarative `defineAgent` manifest — so domain
8
- repos stop inventing their own. Long-running execution durability
9
- (reconnect, replay, dedup) lives in `@tangle-network/sandbox`.
3
+ Production runtime substrate for domain agents. Owns the chat-turn engine, task lifecycle, knowledge readiness, sanitized telemetry, OTEL export, model admission, and the declarative `defineAgent` manifest. Long-running execution durability lives in `@tangle-network/sandbox`.
10
4
 
11
5
  ```bash
12
- pnpm add @tangle-network/agent-runtime @tangle-network/agent-eval
6
+ pnpm add @tangle-network/agent-runtime @tangle-network/agent-eval @tangle-network/sandbox
13
7
  ```
14
8
 
15
- ## What you get
9
+ ## Hello world
16
10
 
17
- | Entry point | When to reach for it |
18
- |---|---|
19
- | `runAgentTask` | Single-shot adapter-driven task with eval/verification |
20
- | `runAgentTaskStream` | Streaming product loop with session resume + backends |
21
- | `handleChatTurn` | Framework-neutral chat-turn orchestrator (NDJSON + `session.run.*` envelope + product hooks) |
22
- | `deriveExecutionId` | Stable substrate executionId for `X-Execution-ID` cross-process reconnect |
23
- | `startRuntimeRun` | Canonical production-run row + cost ledger |
24
- | `defineAgent` | Declarative per-vertical agent manifest — surfaces, knowledge, rubric, run fn |
25
- | `createMcpServer` (`/mcp`) + `agent-runtime-mcp` bin | Stdio MCP server with the 5 delegation tools (`delegate_code`, `delegate_research`, `delegate_feedback`, `delegation_status`, `delegation_history`) |
26
- | `resolveChatModel` / `validateChatModelId` / `getModels` | Router catalog fetch + fail-closed admission + precedence resolver |
27
- | `decideKnowledgeReadiness` | `ready` / `blocked` / `caveat` branch for routes / UI |
28
- | `createOpenAICompatibleBackend` | OpenAI-compatible streaming backend (TCloud / cli-bridge) |
29
- | `createSandboxPromptBackend` | Sandbox / sidecar `streamPrompt` clients |
30
- | `createRuntimeStreamEventCollector` | Default-redacted sanitized telemetry over a stream |
31
- | `PlatformAuthClient` + `PlatformHubClient` (`/platform`) | Cross-site SSO + integrations hub |
32
-
33
- Every public export is annotated `@stable` or `@experimental`. `@stable`
34
- exports do not change shape inside a minor; `@experimental` exports may
35
- change inside a minor and require a deliberate consumer bump.
36
-
37
- ## Quickstart
38
-
39
- ```ts
40
- import { runAgentTask } from '@tangle-network/agent-runtime'
41
-
42
- const result = await runAgentTask({
43
- task: { id: 'review-2026-return', intent: 'Review the return', domain: 'tax' },
44
- adapter: {
45
- async observe() { return { /* domain state */ } },
46
- async validate({ state }) { return [/* eval results */] },
47
- async decide({ state }) { return { type: 'stop', pass: true, score: 1, reason: 'done' } },
48
- async act() { return undefined },
49
- },
50
- })
51
- console.log(result.status, result.runRecords)
52
- ```
53
-
54
- ## Chat turns
55
-
56
- `handleChatTurn` wraps a product `produce()` hook with the `session.run.*`
57
- lifecycle envelope, drains the producer stream through the NDJSON line
58
- protocol, and calls the persist / post-process hooks after drain.
59
- Framework-neutral: takes already-resolved values, never a `Request` or
60
- `Context`.
11
+ Every product agent is a `handleChatTurn` call inside a route. This 20-line snippet is what gtm / creative / legal / tax all run:
61
12
 
62
13
  ```ts
63
14
  import { handleChatTurn } from '@tangle-network/agent-runtime'
64
15
 
65
- const result = handleChatTurn({
66
- identity: { tenantId: workspaceId, sessionId: threadId, userId, turnIndex },
67
- hooks: {
68
- produce: () => ({
69
- stream: box.streamPrompt(prompt, sandboxOptions),
70
- finalText: () => assembled,
71
- }),
72
- persistAssistantMessage: async ({ identity, finalText }) => db.insert(messages).values(...),
73
- onTurnComplete: async ({ identity, finalText }) => extractProposals(finalText),
74
- traceFlush: () => traceSink.flush(),
75
- },
76
- waitUntil: ctx.waitUntil,
77
- })
78
- return new Response(result.body, { headers: { 'content-type': result.contentType } })
79
- ```
80
-
81
- ## Execution continuity
82
-
83
- Long-running execution durability — reconnect, replay, dedup — lives in
84
- the substrate. `@tangle-network/sandbox`'s `box.streamPrompt`
85
- auto-reconnects in-call (extracts `executionId` from the response and
86
- replays via the runtime endpoint on drop). Cross-process reconnect —
87
- worker dies, a fresh worker resumes the same execution — requires
88
- either bypassing the SDK and POSTing directly with `X-Execution-ID`
89
- (see `tax-agent/sessions.ts`) or a future SDK release that surfaces the
90
- field on `PromptOptions`.
91
-
92
- `deriveExecutionId` is the convention helper for the stable id the
93
- product persists alongside its session row:
94
-
95
- ```ts
96
- import { deriveExecutionId } from '@tangle-network/agent-runtime'
97
-
98
- const executionId = deriveExecutionId({ projectId, sessionId, turnIndex })
99
- // pass as `X-Execution-ID` header when calling the orchestrator directly
100
- ```
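As a rough illustration of what "stable id" means here, the sketch below derives the same string for the same `{ projectId, sessionId, turnIndex }` triple on every call, so a fresh worker can recompute it without shared state. It is an assumption-laden stand-in: `sketchExecutionId` and `ExecutionKey` are hypothetical names, and the encoding the real `deriveExecutionId` uses is not shown in this README.

```ts
// Hypothetical sketch: NOT the package's deriveExecutionId implementation.
// It only demonstrates the property the README relies on: same inputs, same id.
interface ExecutionKey {
  projectId: string
  sessionId: string
  turnIndex: number
}

function sketchExecutionId({ projectId, sessionId, turnIndex }: ExecutionKey): string {
  if (!Number.isInteger(turnIndex) || turnIndex < 0) {
    throw new RangeError(`turnIndex must be a non-negative integer, got ${turnIndex}`)
  }
  // Escape each part so a ':' inside an id cannot be confused with the separator.
  return [projectId, sessionId, String(turnIndex)].map((part) => encodeURIComponent(part)).join(':')
}

// A worker that restarts can rebuild the id from its persisted session row.
const firstAttempt = sketchExecutionId({ projectId: 'p1', sessionId: 's1', turnIndex: 3 })
const afterRestart = sketchExecutionId({ projectId: 'p1', sessionId: 's1', turnIndex: 3 })
console.log(firstAttempt === afterRestart) // true
```

Because the id is a pure function of values the product already persists, nothing extra has to be stored to resume; the value would simply be recomputed and sent as `X-Execution-ID`.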
101
-
102
- ## Chat-model resolution
103
-
104
- One primitive every chat handler needs and was hand-rolling per repo:
105
- router catalog fetch, malformed-id guard, fail-closed catalog admission,
106
- precedence resolver. Policy-free — the caller passes its own precedence
107
- order and known-good allowlist.
108
-
109
- ```ts
110
- import {
111
- resolveChatModel, resolveRouterBaseUrl, validateChatModelId, getModels,
112
- } from '@tangle-network/agent-runtime'
113
-
114
- const routerBaseUrl = resolveRouterBaseUrl(env)
115
- const { model, source } = resolveChatModel(
116
- [
117
- { source: 'request', model: requestBody.model },
118
- { source: 'workspace', model: workspace.pinnedModel },
119
- { source: 'env', model: env.TCLOUD_CHAT_MODEL },
120
- ],
121
- { source: 'default', model: 'claude-sonnet-4-6' },
122
- )
123
- const validation = await validateChatModelId(model, {
124
- routerBaseUrl,
125
- allowlist: ['claude-sonnet-4-6'],
126
- })
127
- if (!validation.succeeded) throw new ConfigError(validation.error)
128
- ```
129
-
130
- Full runnable: [`examples/model-resolution/`](./examples/model-resolution/).
131
-
132
- ## Define an agent — declarative manifest
133
-
134
- `defineAgent` is the per-vertical layer that pairs a runtime adapter with
135
- the surfaces / knowledge / rubric / outcome contract `agent-eval`'s analyst
136
- loop drives improvement against.
137
-
138
- ```ts
139
- import { defineAgent } from '@tangle-network/agent-runtime/agent'
140
-
141
- export const myAgent = defineAgent({
142
- id: 'legal-agent',
143
- surfaces: { /* prompt, tools, skills — the levers an analyst can edit */ },
144
- knowledge: { /* requirements + provider */ },
145
- rubric: { /* dimensions + weights */ },
146
- run: async (ctx) => {
147
- /* product-specific run — typically wraps handleChatTurn or runAgentTaskStream */
148
- },
149
- })
150
- ```
151
-
152
- ## Canonical production-run lifecycle
153
-
154
- `startRuntimeRun` records what the agent did for a customer, what it
155
- cost, and how it ended. Replaces bespoke `agentRuns` helpers across
156
- consumer repos.
157
-
158
- ```ts
159
- import { startRuntimeRun, runAgentTaskStream } from '@tangle-network/agent-runtime'
160
-
161
- const run = startRuntimeRun({
162
- workspaceId: 'ws-1', sessionId: threadId, agentId: 'legal-chat-runtime',
163
- taskSpec, scenarioId: `legal-chat:${threadId}`,
164
- adapter: { upsert: (row) => db.insert(agentRuns).values(row) },
165
- })
166
- for await (const event of runAgentTaskStream({ task: taskSpec, backend, input })) {
167
- run.observe(event)
168
- if (event.type === 'final') {
169
- run.complete({ status: event.status === 'completed' ? 'completed' : 'failed', resultSummary: event.text ?? '' })
170
- }
171
- }
172
- await run.persist({ runtimeEvents: telemetry.events })
173
- ```
174
-
175
- Full runnable: [`examples/runtime-run/`](./examples/runtime-run/).
176
-
177
- ## Delegation tools (MCP)
178
-
179
- `@tangle-network/agent-runtime/mcp` ships a stdio MCP server that exposes
180
- five delegation tools to a sandbox coding-harness agent (claude-code,
181
- codex, opencode, ...). The product agent itself runs inside a sandbox
182
- during a chat; when it needs a long-running coder or researcher loop, it
183
- calls one of these tools instead of doing the work in-line.
184
-
185
- | Tool | Kind | Use |
186
- |---|---|---|
187
- | `delegate_code` | async | Code-modification task — returns a `taskId`; poll `delegation_status` for the patch |
188
- | `delegate_research` | async | Source-grounded research task — returns a `taskId`; poll for items + citations |
189
- | `delegate_feedback` | sync | Append an agent/user/judge rating against a delegation, artifact, or outcome |
190
- | `delegation_status` | sync | Snapshot of a delegation's state machine (`pending` → `running` → `completed` \| `failed` \| `cancelled`) |
191
- | `delegation_history` | sync | Newest-first read of past delegations + attached feedback |
192
-
193
- Mount the server from a Node entry point:
194
-
195
- ```ts
196
- import { Sandbox } from '@tangle-network/sandbox'
197
- import {
198
- createMcpServer,
199
- createDefaultCoderDelegate,
200
- } from '@tangle-network/agent-runtime/mcp'
201
-
202
- const sandboxClient = new Sandbox({ apiKey: process.env.TANGLE_API_KEY! })
203
- const server = createMcpServer({
204
- coderDelegate: createDefaultCoderDelegate({ sandboxClient }),
205
- // researcherDelegate: wire your own — see below.
206
- })
207
- await server.serve() // reads JSON-RPC from stdin, writes responses to stdout
208
- ```
209
-
210
- Or run the ready-made bin:
211
-
212
- ```bash
213
- TANGLE_API_KEY=sk_sandbox_... agent-runtime-mcp
214
- ```
215
-
216
- ### Surfacing the tools through `createOpenAICompatibleBackend`
217
-
218
- Sandbox callers discover MCP tools through the runtime mount. Callers that
219
- route through the OpenAI-compat backend (tcloud, OpenRouter, cli-bridge,
220
- OpenAI direct) must hand the model an explicit `tools[]` array — the
221
- backend does not auto-discover. `mcpToolsForRuntimeMcp()` returns the
222
- canonical projection so the model can call any of the 5 delegation tools
223
- through the OpenAI-compat path:
224
-
225
- ```ts
226
- import {
227
- createOpenAICompatibleBackend,
228
- mcpToolsForRuntimeMcp,
229
- } from '@tangle-network/agent-runtime'
230
-
231
- const backend = createOpenAICompatibleBackend({
232
- apiKey,
233
- baseUrl,
234
- model,
235
- tools: mcpToolsForRuntimeMcp(),
236
- })
237
- ```
238
-
239
- Use `mcpToolsForRuntimeMcpSubset(['delegate_research', 'delegation_status'])`
240
- when you want a curated subset (e.g. read-only research without the coder
241
- queue).
242
-
243
- The bin auto-wires the coder delegate and, when
244
- `@tangle-network/agent-knowledge` is installed as a peer, the researcher
245
- delegate. Environment knobs:
246
-
247
- - `TANGLE_API_KEY` — required (unless both `MCP_DISABLE_*` are set)
248
- - `SANDBOX_BASE_URL` — sandbox-SDK base URL override
249
- - `TANGLE_FLEET_ID` — switches placement from sibling-sandbox to fleet-workspace (see [Placement modes](#placement-modes))
250
- - `TANGLE_FLEET_EXCLUDE_MACHINES` — comma-separated machine ids to skip during fleet-mode round-robin (typically the coordinator)
251
- - `MCP_MAX_CONCURRENT_SANDBOXES` — kernel `maxConcurrency` cap (default 4)
252
- - `MCP_CODER_FANOUT_HARNESSES` — comma-separated harness ids for `variants > 1`
253
- - `MCP_DISABLE_CODER` / `MCP_DISABLE_RESEARCHER` — omit the matching tool
254
-
255
- ### Placement modes
256
-
257
- Where worker iterations land — sibling sandboxes vs the caller's fleet
258
- workspace — is controlled by `TANGLE_FLEET_ID`.
259
-
260
- **Sibling-sandbox mode (default).** No `TANGLE_FLEET_ID` set. Every
261
- `delegate_code` / `delegate_research` call invokes `sandboxClient.create(...)`
262
- and runs the worker in a fresh sandbox. The worker's diff lives in the
263
- worker's filesystem; the caller pulls it back via the structured tool
264
- result. Use this when the MCP server runs as a standalone CLI mounted
265
- outside a fleet (developer workflows, single-process integrations).
266
-
267
- **Fleet-workspace mode.** `TANGLE_FLEET_ID` set by the parent sandbox when
268
- it launches the MCP server. Each delegation dispatches onto an existing
269
- machine in that fleet via `fleet.sandbox(machineId).streamPrompt(...)`.
270
- The fleet's shared-workspace policy means worker machines mount the same
271
- filesystem as the caller — diffs land in-place, no cross-sandbox copy
272
- step. The bin logs `fleet-aware delegation: fleetId=...` to stderr on
273
- startup so the operator can confirm the placement.
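The fleet-mode round-robin with `TANGLE_FLEET_EXCLUDE_MACHINES` can be pictured with a small sketch. This is illustrative only: `parseMachineList` and `createRoundRobinPicker` are hypothetical helpers, not exports of `@tangle-network/agent-runtime/mcp`, and the real executor may order or select machines differently.

```ts
// Hypothetical sketch of round-robin placement with an exclusion list.

// Parse a comma-separated env value such as "coordinator, m7" into ids.
function parseMachineList(raw: string | undefined): string[] {
  if (!raw) return []
  return raw
    .split(',')
    .map((id) => id.trim())
    .filter((id) => id.length > 0)
}

// Returns a function that cycles through the eligible machines in order.
function createRoundRobinPicker(machineIds: string[], excludeMachineIds: string[] = []): () => string {
  const excluded = new Set(excludeMachineIds)
  const eligible = machineIds.filter((id) => !excluded.has(id))
  if (eligible.length === 0) {
    throw new Error('no eligible machines left after applying exclusions')
  }
  let cursor = 0
  return () => {
    const id = eligible[cursor % eligible.length]
    cursor += 1
    return id
  }
}

// The coordinator runs the MCP server, so it is skipped for worker placement.
const pick = createRoundRobinPicker(['coordinator', 'm1', 'm2'], parseMachineList('coordinator'))
console.log(pick(), pick(), pick()) // m1 m2 m1
```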
274
-
275
- Pass `TANGLE_FLEET_ID` from a parent sandbox's `AgentProfile.mcpServers`
276
- config:
277
-
278
- ```ts
279
- import { defineAgentProfile } from '@tangle-network/sandbox'
280
-
281
- const parentProfile = defineAgentProfile({
282
- name: 'tax-orchestrator',
283
- mcp: {
284
- 'agent-runtime': {
285
- transport: 'stdio',
286
- command: 'agent-runtime-mcp',
287
- env: {
288
- TANGLE_API_KEY: '${TANGLE_API_KEY}',
289
- TANGLE_FLEET_ID: '${TANGLE_FLEET_ID}', // injected by orchestrator
290
- TANGLE_FLEET_EXCLUDE_MACHINES: 'coordinator', // skip the machine running this MCP server
291
- },
16
+ export async function POST({ request, env, ctx }: { request: Request; env: Env; ctx: ExecutionContext }) {
17
+ const { workspaceId, threadId, userMessage } = await request.json()
18
+ const box = await ensureWorkspaceSandbox(workspaceId)
19
+
20
+ const result = handleChatTurn({
21
+ identity: { tenantId: workspaceId, sessionId: threadId, userId: 'demo', turnIndex: 0 },
22
+ hooks: {
23
+ produce: () => ({
24
+ stream: box.streamPrompt(userMessage),
25
+ finalText: () => box.lastResponse(),
26
+ }),
27
+ persistAssistantMessage: async ({ identity, finalText }) => env.db.insertMessage(identity, finalText),
28
+ traceFlush: () => env.traceSink.flush(),
292
29
  },
293
- },
294
- })
295
- ```
296
-
297
- For non-bin entry points, wire an executor directly:
298
-
299
- ```ts
300
- import { Sandbox } from '@tangle-network/sandbox'
301
- import {
302
- createMcpServer,
303
- createDefaultCoderDelegate,
304
- createFleetWorkspaceExecutor,
305
- createSiblingSandboxExecutor,
306
- detectExecutor,
307
- } from '@tangle-network/agent-runtime/mcp'
308
-
309
- const sandboxClient = new Sandbox({ apiKey: process.env.TANGLE_API_KEY! })
310
-
311
- // Either pick automatically from env:
312
- const executor = await detectExecutor({ sandboxClient })
313
-
314
- // Or pin it explicitly:
315
- const fleet = await sandboxClient.fleets.get(process.env.TANGLE_FLEET_ID!)
316
- const fleetExecutor = createFleetWorkspaceExecutor({
317
- fleet,
318
- excludeMachineIds: ['coordinator'],
319
- })
320
-
321
- const server = createMcpServer({
322
- coderDelegate: createDefaultCoderDelegate({ executor: fleetExecutor }),
323
- })
30
+ waitUntil: ctx.waitUntil.bind(ctx),
31
+ })
32
+ return new Response(result.body, { headers: { 'content-type': result.contentType } })
33
+ }
324
34
  ```
325
35
 
326
- The kernel emits a `loop.iteration.dispatch` trace event for every
327
- iteration: `{ placement: 'sibling', sandboxId }` in sibling mode,
328
- `{ placement: 'fleet', fleetId, machineId, sandboxId }` in fleet mode.
329
- Analyst loops use this to correlate worker activity with the caller's
330
- machine.
36
+ That's the centerpiece. Everything else is "when chat alone isn't enough."
331
37
 
332
- ### Async semantics
333
-
334
- Coder + researcher delegations are **fire-and-poll**. The handler returns
335
- a `taskId` immediately; the agent calls `delegation_status(taskId)` until
336
- the state is terminal. Identical inputs return the same `taskId` —
337
- duplicate-call safety is built in via canonical-form hashing.
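The two ideas in this paragraph, canonical-form hashing and fire-and-poll, can be sketched as follows. Everything here is an assumption for illustration: `canonicalize`, `taskIdFor`, and `pollUntilTerminal` are hypothetical names, the 32-bit FNV-1a hash is a stand-in for whatever digest the server actually uses, and `getStatus` / `sleep` are injected so the sketch carries no runtime dependencies.

```ts
// Hypothetical sketch: canonical-form task ids plus a polling loop.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

// Serialize with object keys sorted, so key order never changes the result.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== 'object') return JSON.stringify(value)
  if (Array.isArray(value)) return `[${value.map((item) => canonicalize(item)).join(',')}]`
  const keys = Object.keys(value).sort()
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`).join(',')}}`
}

// 32-bit FNV-1a, for illustration only; a real system would want a stronger digest.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return ('00000000' + hash.toString(16)).slice(-8)
}

// Identical tool + args (in any key order) map to the same task id.
function taskIdFor(tool: string, args: Json): string {
  return `task_${fnv1a(canonicalize({ tool, args }))}`
}

type DelegationStatus = 'pending' | 'running' | 'completed' | 'failed' | 'cancelled'
const TERMINAL: DelegationStatus[] = ['completed', 'failed', 'cancelled']

// Poll until the delegation reaches a terminal state or attempts run out.
async function pollUntilTerminal(
  getStatus: () => Promise<DelegationStatus>,
  sleep: (ms: number) => Promise<void>,
  intervalMs = 5000,
  maxAttempts = 120,
): Promise<DelegationStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus()
    if (TERMINAL.indexOf(status) !== -1) return status
    await sleep(intervalMs)
  }
  throw new Error(`delegation still not terminal after ${maxAttempts} polls`)
}
```

In practice `getStatus` would wrap a `delegation_status(taskId)` tool call; the interval and attempt cap above are placeholder values, not documented defaults.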
38
+ ## Which entry point do I reach for?
338
39
 
339
40
  ```
340
- agent → delegate_code(goal, repoRoot) → { taskId, estimatedDurationMs }
341
- agent → delegation_status(taskId) → { status: 'running', progress: { ... } }
342
- ... (minutes pass)
343
- agent → delegation_status(taskId) → { status: 'completed', result: { profile: 'coder', output: <CoderOutput> } }
344
- agent → delegate_feedback(refersTo, rating) → { recorded: true, id }
41
+ Production chat turn (90% of products) → handleChatTurn
42
+ Declarative agent manifest → defineAgent (/agent)
43
+ Cross-process reconnect (X-Execution-ID) → deriveExecutionId
44
+ One-shot task with verification + eval → runAgentTask
45
+ Streaming task without chat-turn envelope → runAgentTaskStream
46
+ Multi-iteration parallel fanout (coders /
47
+ researchers proposing N variants) → runLoop + a Driver (/loops)
48
+ Tool/MCP delegation server (stdio) → createMcpServer (/mcp)
49
+ Analyst surface mutations → runAnalystLoop (/analyst-loop)
50
+ Production-run persistence + cost ledger → startRuntimeRun
51
+ Cross-site SSO / integrations hub → PlatformAuthClient (/platform)
345
52
  ```
346
53
 
347
- Task state lives in-memory inside the server process. A restart drops
348
- pending delegations — Phase 2 will move state into sqlite.
54
+ ## Defaults
349
55
 
350
- ### Wiring a researcher delegate
56
+ When nothing is specified:
351
57
 
352
- `agent-runtime` cannot depend on `@tangle-network/agent-knowledge` (it
353
- would induce a dependency cycle). Wire the researcher delegate from your
354
- own integration code:
58
+ | Knob | Default | Override |
59
+ |---|---|---|
60
+ | Backend model | `gpt-4o-mini` (when via `createOpenAICompatibleBackend`) | `model` option, or `MODEL_NAME` env |
61
+ | Backend provider | `openai-compat` when `TANGLE_API_KEY` present, else `openai` if `OPENAI_API_KEY` | `MODEL_PROVIDER` env |
62
+ | Router base URL | `https://router.tangle.tools/v1` | `TANGLE_ROUTER_BASE_URL` env |
63
+ | Sandbox base URL | `https://sandbox.tangle.tools` | `SANDBOX_API_URL` env |
64
+ | Loop iteration cap | 8 | `runLoop({ maxIterations })` |
65
+ | Driver | none — required to pass `Refine` or `FanoutVote` | `createRefineDriver()` or `createFanoutVoteDriver({ n })` |
66
+ | Validator | none — required if using `runLoop` | profile preset (e.g., `coderProfile().validator`) or your own |
67
+ | OTEL export | off | set `OTEL_EXPORTER_OTLP_ENDPOINT` |
68
+ | Trace propagation through MCP subprocess | off until product wires it | `env.TRACE_ID` + `env.PARENT_SPAN_ID` at MCP launch |
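Read as code, the env-driven rows of the defaults table amount to a small resolution function. The sketch below is one possible reading, not the package's actual logic: `sketchResolveDefaults` and `EnvLike` are hypothetical names, and the assumption that an explicit `model` option wins over `MODEL_NAME` is mine, since the table does not state the precedence.

```ts
// Hypothetical sketch of how the documented defaults could be resolved.
interface EnvLike {
  [name: string]: string | undefined
}

interface ResolvedDefaults {
  model: string
  provider: string | undefined
  routerBaseUrl: string
  sandboxBaseUrl: string
  otelEnabled: boolean
}

function sketchResolveDefaults(env: EnvLike, options: { model?: string } = {}): ResolvedDefaults {
  // MODEL_PROVIDER overrides; otherwise infer the provider from whichever key is present.
  const inferredProvider = env.TANGLE_API_KEY ? 'openai-compat' : env.OPENAI_API_KEY ? 'openai' : undefined
  return {
    // Assumed precedence: explicit option, then MODEL_NAME, then the documented default.
    model: options.model ?? env.MODEL_NAME ?? 'gpt-4o-mini',
    provider: env.MODEL_PROVIDER ?? inferredProvider,
    routerBaseUrl: env.TANGLE_ROUTER_BASE_URL ?? 'https://router.tangle.tools/v1',
    sandboxBaseUrl: env.SANDBOX_API_URL ?? 'https://sandbox.tangle.tools',
    // OTEL export stays off unless an OTLP endpoint is configured.
    otelEnabled: Boolean(env.OTEL_EXPORTER_OTLP_ENDPOINT),
  }
}

const resolved = sketchResolveDefaults({ TANGLE_API_KEY: 'sk_example' })
console.log(resolved.provider, resolved.model) // openai-compat gpt-4o-mini
```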
355
69
 
356
- ```ts
357
- import { runLoop } from '@tangle-network/agent-runtime/loops'
358
- import { researcherProfile, multiHarnessResearcherFanout } from '@tangle-network/agent-knowledge/profiles'
359
- import { createMcpServer, type ResearcherDelegate } from '@tangle-network/agent-runtime/mcp'
360
-
361
- const researcherDelegate: ResearcherDelegate = async (args, ctx) => {
362
- const task = {
363
- question: args.question,
364
- knowledgeNamespace: args.namespace,
365
- scope: args.scope,
366
- sources: args.sources,
367
- /* ...map config.recencyWindow ISO strings to Date objects */
368
- }
369
- if ((args.variants ?? 1) <= 1) {
370
- const preset = researcherProfile({ task })
371
- const result = await runLoop({
372
- driver: { /* single-shot */ async plan(t, h) { return h.length === 0 ? [t] : [] }, decide(h) { return h.length > 0 ? 'pick-winner' : 'fail' } },
373
- agentRun: preset.agentRunSpec, output: preset.output, validator: preset.validator,
374
- task, ctx: { sandboxClient, signal: ctx.signal }, maxIterations: 1,
375
- })
376
- return result.winner!.output
377
- }
378
- const fanout = multiHarnessResearcherFanout({ task })
379
- const result = await runLoop({
380
- driver: fanout.driver,
381
- agentRuns: fanout.agentRuns.slice(0, args.variants),
382
- output: fanout.output, validator: fanout.validator,
383
- task, ctx: { sandboxClient, signal: ctx.signal },
384
- maxIterations: args.variants ?? 1,
385
- })
386
- return result.winner!.output
387
- }
70
+ ## Composition with the rest of the stack
388
71
 
389
- createMcpServer({ researcherDelegate })
390
72
  ```
73
+ agent-runtime ──── handleChatTurn (chat turn lifecycle)
74
+ defineAgent (declarative manifest)
75
+ runLoop (multi-shot kernel)
76
+ createMcpServer (delegation tools server)
77
+ OTEL export (trace pipeline)
391
78
 
392
- ## OpenAI-compat backend tools + fail-loud errors
79
+ agent-eval ──── runEvalCampaign / runProductionLoop / runAgentMatrix
80
+ (consumes agent-runtime traces, scores, gates promotion)
393
81
 
394
- `createOpenAICompatibleBackend` forwards an OpenAI Chat Completions
395
- `tools[]` array on every request when configured. Streamed tool calls
396
- (both OpenAI delta shape and the Anthropic `tool_use` shape proxied by
397
- the router) are assembled across SSE chunks and emitted as a single
398
- `tool_call` RuntimeStreamEvent per call. The backend does NOT execute
399
- tools — surfacing the call is the contract; dispatch is the caller's
400
- problem.
82
+ agent-knowledge ─── proposeKnowledgeWrites / applyKnowledgeWriteBlocks
83
+ (analyst-loop produces these; runtime consumes them)
401
84
 
402
- ```ts
403
- import {
404
- createOpenAICompatibleBackend,
405
- runAgentTaskStream,
406
- type OpenAIChatTool,
407
- } from '@tangle-network/agent-runtime'
408
-
409
- const delegateResearch: OpenAIChatTool = {
410
- type: 'function',
411
- function: {
412
- name: 'delegate_research',
413
- description: 'Spin up a researcher loop and return a taskId.',
414
- parameters: {
415
- type: 'object',
416
- properties: { question: { type: 'string' } },
417
- required: ['question'],
418
- },
419
- },
420
- }
421
-
422
- const backend = createOpenAICompatibleBackend({
423
- apiKey: process.env.TANGLE_API_KEY!,
424
- baseUrl: 'https://router.tangle.tools/v1',
425
- model: 'claude-sonnet-4-6',
426
- tools: [delegateResearch /* + delegate_code, delegate_feedback, etc. */],
427
- toolChoice: 'auto', // or 'none' | 'required' | { type: 'function', function: { name } }
428
- })
429
-
430
- for await (const event of runAgentTaskStream({ task, backend, input })) {
431
- if (event.type === 'tool_call') {
432
- // Dispatch through your MCP / sandbox runtime. `args` is JSON-parsed
433
- // when the model produced a valid object, raw string otherwise.
434
- const result = await dispatch(event.toolName, event.args)
435
- // Feed `result` back on a follow-up turn via `input.messages`.
436
- }
437
- }
85
+ sandbox ──── AgentProfile (substrate type), Sandbox.create, exportTraceBundle
86
+ (provides the harness execution surface)
438
87
  ```
439
88
 
440
- Callers integrating with `agent-runtime/mcp` typically project the MCP
441
- server's `tools/list` response into this shape once at config time and
442
- pass the array as `tools`. The runtime intentionally does NOT depend on
443
- `@modelcontextprotocol/sdk` — keeping the backend transport thin lets
444
- domain repos own MCP plumbing.
445
-
446
- ### Transport errors fail loud
447
-
448
- Non-success HTTP responses (4xx/5xx after retry exhaustion) and
449
- connection failures throw `BackendTransportError` from inside the
450
- `stream()` generator. `runAgentTaskStream` catches the throw and emits:
451
-
452
- - `backend_error` event with `error: { kind: 'transport', message, status, body }`
453
- - terminal `final` event with `status: 'failed'` carrying the same `error` detail
454
-
455
- Consumers building a `RunRecord` MUST map `final.error` onto
456
- `RunRecord.error`. Treating an empty `finalText` as "agent produced
457
- nothing" hides credit exhaustion (HTTP 402), auth failure (401),
458
- model-not-found (404), and upstream outages (5xx).
459
-
460
- ```ts
461
- for await (const event of runAgentTaskStream({ task, backend, input })) {
462
- run.observe(event)
463
- if (event.type === 'final') {
464
- run.complete({
465
- status: event.status === 'completed' ? 'completed' : 'failed',
466
- resultSummary: event.text ?? '',
467
- error: event.error
468
- ? `${event.error.kind} ${event.error.status ?? ''}: ${event.error.message}`
469
- : undefined,
470
- })
471
- }
472
- }
473
- ```
89
+ Self-improving products consume all four. See [`agent-stack-adoption` skill](https://github.com/drewstone/dotfiles/blob/main/claude/skills/agent-stack-adoption/SKILL.md) for the end-to-end 10-phase adoption runbook.
474
90
 
475
- The body is captured truncated to 2 KiB. By default the sanitized
476
- telemetry envelope surfaces `error.kind` + `error.status` but redacts
477
- `error.body` (it can echo user-visible text from a provider's error
478
- page). Opt in with `RuntimeTelemetryOptions.includeControlPayloads`.
91
+ ## Examples
479
92
 
480
- ## Error taxonomy
93
+ Ordered as a learning progression — each example introduces one concept.
481
94
 
482
- | Error | When |
483
- |---|---|
484
- | `ValidationError` | Caller passed invalid arguments |
485
- | `ConfigError` | Required env / config missing |
486
- | `NotFoundError` | A named resource does not exist |
487
- | `BackendTransportError` | Backend HTTP / IPC call returned non-success — carries `status` + truncated `body` |
488
- | `SessionMismatchError` | Resume requested against a different backend |
489
- | `RuntimeRunStateError` | `RuntimeRunHandle` lifecycle methods called out of order |
95
+ **Start here:**
96
+ - [`chat-handler/`](./examples/chat-handler/) — `handleChatTurn`, the production centerpiece
490
97
 
491
- All extend `AgentEvalError` (re-exported from `@tangle-network/agent-eval`)
492
- and carry a stable `code` so cross-package handlers pattern-match
493
- without importing the runtime.
98
+ **Add observability + readiness:**
99
+ - [`with-knowledge-readiness/`](./examples/with-knowledge-readiness/) — `requiredKnowledge` + `decideKnowledgeReadiness`
100
+ - [`sanitized-telemetry-streaming/`](./examples/sanitized-telemetry-streaming/) — `createRuntimeStreamEventCollector` + redaction
101
+ - [`runtime-run/`](./examples/runtime-run/) — `startRuntimeRun` + cost ledger persistence
494
102
 
495
- ## Sanitized telemetry
103
+ **Add delegation:**
104
+ - [`mcp-delegation/`](./examples/mcp-delegation/) — mount `agent-runtime-mcp` in an `AgentProfile`
496
105
 
497
- `task.intent` flows through sanitized telemetry on every event. **Never
498
- set it to user input** — use a fixed string describing the operation
499
- kind (e.g. `"Run a chat turn"`, `"Score a tax return"`). Route
500
- user-visible content through `task.inputs` (redacted by default).
106
+ **Multi-agent fanout (advanced):**
107
+ - [`coder-loop/`](./examples/coder-loop/) — `coderProfile` + `runLoop` + `FanoutVote`
108
+ - [`researcher-loop/`](./examples/researcher-loop/) — `researcherProfile` + `runLoop` (peer dep: `@tangle-network/agent-knowledge`)
109
+ - [`fleet-delegation/`](./examples/fleet-delegation/) — `TANGLE_FLEET_ID` + `createFleetWorkspaceExecutor`
501
110
 
502
- ```ts
503
- import { createRuntimeStreamEventCollector, runAgentTaskStream } from '@tangle-network/agent-runtime'
111
+ ## Stability
504
112
 
505
- const telemetry = createRuntimeStreamEventCollector()
506
- for await (const event of runAgentTaskStream({ task, backend })) telemetry.onEvent(event)
507
- console.log(telemetry.events, telemetry.summary())
508
- ```
113
+ Every public export is annotated `@stable` or `@experimental`. `@stable` exports do not change shape inside a minor. `@experimental` exports may change inside a minor and require a deliberate consumer bump.
509
114
 
510
115
  ## Package boundaries
511
116
 
512
117
  | Package | Owns |
513
118
  |---|---|
514
- | `agent-runtime` | Task lifecycle, adapters, backends, chat-turn engine, execution-handle contract, model resolution, trace bridge, `defineAgent`. **Does not** own long-running execution state — that lives in `@tangle-network/sandbox` + orchestrator. |
515
- | `agent-runtime/platform` | Cross-site SSO (`PlatformAuthClient`) + integrations hub (`PlatformHubClient`) |
119
+ | `agent-runtime` | Task lifecycle, adapters, backends, chat-turn engine, model resolution, trace bridge, `defineAgent` |
120
+ | `agent-runtime/platform` | Cross-site SSO + integrations hub |
516
121
  | `agent-runtime/agent` | `defineAgent` + surfaces / outcome adapters |
517
122
  | `agent-runtime/analyst-loop` | `runAnalystLoop` — analyst registry driver |
518
- | `agent-eval` | Control loops, readiness scoring, traces, evals, judges, RL, release evidence |
123
+ | `agent-runtime/loops` | `runLoop` kernel + `Refine` / `FanoutVote` drivers |
124
+ | `agent-runtime/profiles` | `coderProfile`, `researcherProfile` presets |
125
+ | `agent-runtime/mcp` | `createMcpServer` + `agent-runtime-mcp` bin (5 delegation tools) |
126
+ | `agent-eval` | Evals, judges, scorecards, RL bridge, release evidence, matrix |
519
127
  | `agent-knowledge` | Evidence, claims, wiki pages, retrieval |
520
- | Domain packages | Domain tools, policies, credentials, UI text, rubrics |
521
-
522
- See [`docs/concepts.md`](./docs/concepts.md) for the mental model.
523
-
524
- ## Examples
128
+ | `sandbox` | `AgentProfile`, `Sandbox.create`, `streamPrompt`, `exportTraceBundle` |
525
129
 
526
- Runnable in [`examples/`](./examples/). Every example imports from
527
- `@tangle-network/agent-runtime` (the same surface consumers use):
528
-
529
- - [`basic-task/`](./examples/basic-task/) — smallest `runAgentTask`
530
- - [`with-knowledge-readiness/`](./examples/with-knowledge-readiness/) — readiness gating
531
- - [`sanitized-telemetry/`](./examples/sanitized-telemetry/) + [`-streaming/`](./examples/sanitized-telemetry-streaming/) — redaction
532
- - [`sse-stream/`](./examples/sse-stream/) — SSE helpers for browser clients
533
- - [`sandbox-stream-backend/`](./examples/sandbox-stream-backend/) — `createSandboxPromptBackend`
534
- - [`openai-stream-backend/`](./examples/openai-stream-backend/) — `createOpenAICompatibleBackend`
535
- - [`runtime-run/`](./examples/runtime-run/) — production-run row + cost ledger
536
- - [`model-resolution/`](./examples/model-resolution/) — router catalog + fail-closed admission
537
- - [`agent-into-reviewer/`](./examples/agent-into-reviewer/) — pipe one runtime's stream into a reviewer agent
538
- - [`chat-handler/`](./examples/chat-handler/) — `handleChatTurn` (the centerpiece production pattern)
539
- - [`coder-loop/`](./examples/coder-loop/) — `coderProfile` + `runLoop` + `FanoutVote` (driven-loop kernel)
540
- - [`researcher-loop/`](./examples/researcher-loop/) — `researcherProfile` + `runLoop` + `FanoutVote` (peer dep: `@tangle-network/agent-knowledge`)
541
- - [`mcp-delegation/`](./examples/mcp-delegation/) — mount `agent-runtime-mcp` in a product `AgentProfile` + stdio `tools/list` smoke
542
- - [`fleet-delegation/`](./examples/fleet-delegation/) — `TANGLE_FLEET_ID` env flip + `createFleetWorkspaceExecutor` topology
130
+ See [`docs/concepts.md`](./docs/concepts.md) for the deeper mental model.
543
131
 
544
132
  ## Tests
545
133
 
546
134
  ```bash
547
- pnpm test
135
+ pnpm test # 283+ tests across the kernel + drivers + MCP + backends + analyst-loop
548
136
  pnpm typecheck
549
- pnpm lint
550
137
  pnpm build
551
138
  ```