@tangle-network/agent-runtime 0.15.0 → 0.16.0

package/README.md CHANGED
@@ -2,8 +2,11 @@
 
  Production runtime substrate for domain agents. Owns the task lifecycle
  (knowledge readiness, control loop, session resume, sanitized telemetry,
- canonical `RuntimeRunRow` persistence + cost ledger) so domain repos stop
- inventing their own.
+ canonical `RuntimeRunRow` persistence + cost ledger), the chat-turn
+ engine (NDJSON envelope + product hooks), the chat-model catalog +
+ admission, and the declarative `defineAgent` manifest — so domain
+ repos stop inventing their own. Long-running execution durability
+ (reconnect, replay, dedup) lives in `@tangle-network/sandbox`.
 
  ```bash
  pnpm add @tangle-network/agent-runtime @tangle-network/agent-eval
@@ -15,12 +18,17 @@ pnpm add @tangle-network/agent-runtime @tangle-network/agent-eval
  |---|---|
  | `runAgentTask` | Single-shot adapter-driven task with eval/verification |
  | `runAgentTaskStream` | Streaming product loop with session resume + backends |
+ | `handleChatTurn` | Framework-neutral chat-turn orchestrator (NDJSON + `session.run.*` envelope + product hooks) |
+ | `deriveExecutionId` | Stable substrate executionId for `X-Execution-ID` cross-process reconnect |
  | `startRuntimeRun` | Canonical production-run row + cost ledger |
+ | `defineAgent` | Declarative per-vertical agent manifest — surfaces, knowledge, rubric, run fn |
+ | `resolveChatModel` / `validateChatModelId` / `getModels` | Router catalog fetch + fail-closed admission + precedence resolver |
  | `createTraceBridge` | Map `RuntimeStreamEvent` → `agent-eval` `TraceEvent` |
  | `decideKnowledgeReadiness` | `ready` / `blocked` / `caveat` branch for routes / UI |
  | `createOpenAICompatibleBackend` | OpenAI-compatible streaming backend (TCloud / cli-bridge) |
  | `createSandboxPromptBackend` | Sandbox / sidecar `streamPrompt` clients |
  | `createRuntimeStreamEventCollector` | Default-redacted sanitized telemetry over a stream |
+ | `PlatformAuthClient` + `PlatformHubClient` (`/platform`) | Cross-site SSO + integrations hub |
 
  Every public export is annotated `@stable` or `@experimental`. `@stable`
  exports do not change shape inside a minor; `@experimental` exports may
@@ -32,55 +40,136 @@ change inside a minor and require a deliberate consumer bump.
  import { runAgentTask } from '@tangle-network/agent-runtime'
 
  const result = await runAgentTask({
-   task: {
-     id: 'review-2026-return',
-     intent: 'Review the return for missing evidence',
-     domain: 'tax',
-   },
+   task: { id: 'review-2026-return', intent: 'Review the return', domain: 'tax' },
    adapter: {
      async observe() { return { /* domain state */ } },
      async validate({ state }) { return [/* eval results */] },
-     async decide({ state }) {
-       return { type: 'stop', pass: true, score: 1, reason: 'review complete' }
-     },
+     async decide({ state }) { return { type: 'stop', pass: true, score: 1, reason: 'done' } },
      async act() { return undefined },
    },
  })
-
  console.log(result.status, result.runRecords)
  ```
 
+ ## Chat turns
+
+ `handleChatTurn` wraps a product `produce()` hook with the `session.run.*`
+ lifecycle envelope, drains the producer stream through the NDJSON line
+ protocol, and calls the persist / post-process hooks after drain.
+ Framework-neutral: takes already-resolved values, never a `Request` or
+ `Context`.
+
+ ```ts
+ import { handleChatTurn } from '@tangle-network/agent-runtime'
+
+ const result = handleChatTurn({
+   identity: { tenantId: workspaceId, sessionId: threadId, userId, turnIndex },
+   hooks: {
+     produce: () => ({
+       stream: box.streamPrompt(prompt, sandboxOptions),
+       finalText: () => assembled,
+     }),
+     persistAssistantMessage: async ({ identity, finalText }) => db.insert(messages).values(...),
+     onTurnComplete: async ({ identity, finalText }) => extractProposals(finalText),
+     traceFlush: () => traceSink.flush(),
+   },
+   waitUntil: ctx.waitUntil,
+ })
+ return new Response(result.body, { headers: { 'content-type': result.contentType } })
+ ```
+
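The NDJSON line protocol and lifecycle envelope named in the Chat turns section can be illustrated with a small standalone sketch. This is not the package's implementation: the `TurnEvent` shape, the helper names, and the concrete event names `session.run.started` / `session.run.completed` are assumptions made for illustration (the README only names the `session.run.*` prefix). It shows one JSON object per line on the wire and a decoder that tolerates chunks split at arbitrary byte positions.

```typescript
// Hypothetical event shape; the real envelope types live in the package.
type TurnEvent = { type: string; data?: unknown }

/** Serialize one event as a single NDJSON line (JSON.stringify escapes inner newlines). */
function encodeLine(event: TurnEvent): string {
  return JSON.stringify(event) + '\n'
}

/** Wrap producer events in an assumed start/complete lifecycle envelope. */
function withEnvelope(events: TurnEvent[]): string[] {
  return [
    encodeLine({ type: 'session.run.started' }), // assumed event name
    ...events.map(encodeLine),
    encodeLine({ type: 'session.run.completed' }), // assumed event name
  ]
}

/** Incrementally parse NDJSON text that may arrive split mid-line. */
function createNdjsonDecoder() {
  let buffer = ''
  return {
    push(chunk: string): TurnEvent[] {
      buffer += chunk
      const lines = buffer.split('\n')
      // The last element is either '' or a partial line; keep it for the next chunk.
      buffer = lines.pop() ?? ''
      return lines
        .filter((line) => line.trim() !== '')
        .map((line) => JSON.parse(line) as TurnEvent)
    },
  }
}

// Round-trip: encode three lines, then feed them back in two arbitrary chunks.
const wire = withEnvelope([{ type: 'text', data: 'hi' }]).join('')
const decoder = createNdjsonDecoder()
const mid = Math.floor(wire.length / 2)
const decoded = [...decoder.push(wire.slice(0, mid)), ...decoder.push(wire.slice(mid))]
```

Buffering the trailing partial line is what lets a client read the response body chunk by chunk without waiting for the whole turn to finish.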
+ ## Execution continuity
+
+ Long-running execution durability — reconnect, replay, dedup — lives in
+ the substrate. `@tangle-network/sandbox`'s `box.streamPrompt`
+ auto-reconnects in-call (extracts `executionId` from the response and
+ replays via the runtime endpoint on drop). Cross-process reconnect —
+ worker dies, a fresh worker resumes the same execution — requires
+ either bypassing the SDK and POSTing directly with `X-Execution-ID`
+ (see `tax-agent/sessions.ts`) or a future SDK release that surfaces the
+ field on `PromptOptions`.
+
+ `deriveExecutionId` is the convention helper for the stable id the
+ product persists alongside its session row:
+
+ ```ts
+ import { deriveExecutionId } from '@tangle-network/agent-runtime'
+
+ const executionId = deriveExecutionId({ projectId, sessionId, turnIndex })
+ // pass as `X-Execution-ID` header when calling the orchestrator directly
+ ```
+
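Why a stable id enables cross-process reconnect can be seen in a minimal sketch. The diff does not show how `deriveExecutionId` computes its result, so the function below (`sketchExecutionId`, a hypothetical name) is only one plausible approach: hash the same `{ projectId, sessionId, turnIndex }` triple deterministically, so a fresh worker recomputes the identical id without shared memory. The SHA-256 choice, the `exec_` prefix, and the 32-character truncation are all assumptions.

```typescript
import { createHash } from 'node:crypto'

interface TurnKey {
  projectId: string
  sessionId: string
  turnIndex: number
}

/** Hypothetical stand-in for a deterministic execution id; not the package's algorithm. */
function sketchExecutionId({ projectId, sessionId, turnIndex }: TurnKey): string {
  // Encode as a JSON array so field boundaries stay unambiguous
  // (['a', 'bc'] and ['ab', 'c'] produce different input strings).
  const canonical = JSON.stringify([projectId, sessionId, turnIndex])
  const digest = createHash('sha256').update(canonical).digest('hex')
  return `exec_${digest.slice(0, 32)}` // assumed format
}

// The original worker and a replacement worker derive the same id for the same turn.
const first = sketchExecutionId({ projectId: 'p1', sessionId: 's1', turnIndex: 3 })
const retry = sketchExecutionId({ projectId: 'p1', sessionId: 's1', turnIndex: 3 })
// The next turn gets a different id, so replays do not collide across turns.
const nextTurn = sketchExecutionId({ projectId: 'p1', sessionId: 's1', turnIndex: 4 })
```

Because the id is a pure function of values the product already persists with the session row, nothing extra has to be stored for a resumed worker to address the same execution.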
+ ## Chat-model resolution
+
+ One primitive every chat handler needs and was hand-rolling per repo:
+ router catalog fetch, malformed-id guard, fail-closed catalog admission,
+ precedence resolver. Policy-free — the caller passes its own precedence
+ order and known-good allowlist.
+
+ ```ts
+ import {
+   resolveChatModel, resolveRouterBaseUrl, validateChatModelId, getModels,
+ } from '@tangle-network/agent-runtime'
+
+ const routerBaseUrl = resolveRouterBaseUrl(env)
+ const { model, source } = resolveChatModel(
+   [
+     { source: 'request', model: requestBody.model },
+     { source: 'workspace', model: workspace.pinnedModel },
+     { source: 'env', model: env.TCLOUD_CHAT_MODEL },
+   ],
+   { source: 'default', model: 'claude-sonnet-4-6' },
+ )
+ const validation = await validateChatModelId(model, {
+   routerBaseUrl,
+   allowlist: ['claude-sonnet-4-6'],
+ })
+ if (!validation.succeeded) throw new ConfigError(validation.error)
+ ```
+
+ Full runnable: [`examples/model-resolution/`](./examples/model-resolution/).
+
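The two ideas in the Chat-model resolution section, a precedence resolver and fail-closed admission, can be sketched without the package. Everything here is illustrative: `pickModel` and `admit` are hypothetical names, the id character set is a guess, and the policy shown (allowlisted ids pass without a catalog, everything else must appear in the fetched catalog, an unavailable catalog rejects) is an assumption about what "fail-closed" means in this context rather than the documented behavior of `resolveChatModel` or `validateChatModelId`.

```typescript
interface Candidate { source: string; model: string | null | undefined }
interface Resolved { source: string; model: string }
type Admission = { succeeded: true } | { succeeded: false; error: string }

/** First candidate with a non-empty model wins; otherwise use the fallback. */
function pickModel(candidates: Candidate[], fallback: Resolved): Resolved {
  for (const candidate of candidates) {
    const model = candidate.model?.trim()
    if (model) return { source: candidate.source, model }
  }
  return fallback
}

/**
 * Assumed admission policy. `catalog` is null when the catalog fetch failed,
 * in which case only allowlisted ids are admitted (fail closed).
 */
function admit(model: string, catalog: string[] | null, allowlist: string[]): Admission {
  if (!/^[A-Za-z0-9._:\/-]+$/.test(model)) {
    return { succeeded: false, error: `malformed model id: ${JSON.stringify(model)}` }
  }
  if (allowlist.includes(model)) return { succeeded: true }
  if (catalog === null) return { succeeded: false, error: 'catalog unavailable' }
  return catalog.includes(model)
    ? { succeeded: true }
    : { succeeded: false, error: `model not in catalog: ${model}` }
}

const fallback: Resolved = { source: 'default', model: 'claude-sonnet-4-6' }
const picked = pickModel(
  [
    { source: 'request', model: undefined }, // not supplied
    { source: 'workspace', model: '   ' },   // blank counts as unset
    { source: 'env', model: 'm-env' },       // first usable value
  ],
  fallback,
)
const whenCatalogDown = admit(picked.model, null, [fallback.model])
const whenCatalogUp = admit(picked.model, ['m-env'], [fallback.model])
```

Keeping the resolver separate from admission mirrors the "policy-free" framing in the README: the caller decides the order and the allowlist, and the helpers only apply them.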
+ ## Define an agent — declarative manifest
+
+ `defineAgent` is the per-vertical layer that pairs a runtime adapter with
+ the surfaces / knowledge / rubric / outcome contract `agent-eval`'s analyst
+ loop drives improvement against.
+
+ ```ts
+ import { defineAgent } from '@tangle-network/agent-runtime/agent'
+
+ export const myAgent = defineAgent({
+   id: 'legal-agent',
+   surfaces: { /* prompt, tools, skills — the levers an analyst can edit */ },
+   knowledge: { /* requirements + provider */ },
+   rubric: { /* dimensions + weights */ },
+   run: async (ctx) => {
+     /* product-specific run — typically wraps handleChatTurn or runAgentTaskStream */
+   },
+ })
+ ```
+
  ## Canonical production-run lifecycle
 
- `startRuntimeRun` records what the agent did on behalf of a customer,
- what it cost, and how it ended. Replaces bespoke `agentRuns`-row helpers
- across consumer repos with a single contract.
+ `startRuntimeRun` records what the agent did for a customer, what it
+ cost, and how it ended. Replaces bespoke `agentRuns` helpers across
+ consumer repos.
 
  ```ts
  import { startRuntimeRun, runAgentTaskStream } from '@tangle-network/agent-runtime'
 
  const run = startRuntimeRun({
-   workspaceId: 'ws-1',
-   sessionId: threadId,
-   agentId: 'legal-chat-runtime',
-   taskSpec,
-   scenarioId: `legal-chat:${threadId}`,
+   workspaceId: 'ws-1', sessionId: threadId, agentId: 'legal-chat-runtime',
+   taskSpec, scenarioId: `legal-chat:${threadId}`,
    adapter: { upsert: (row) => db.insert(agentRuns).values(row) },
  })
-
  for await (const event of runAgentTaskStream({ task: taskSpec, backend, input })) {
-   run.observe(event) // llm_call events update the cost ledger
+   run.observe(event)
    if (event.type === 'final') {
-     run.complete({
-       status: event.status === 'completed' ? 'completed' : 'failed',
-       resultSummary: event.text ?? '',
-       error: event.status === 'failed' ? event.reason : undefined,
-     })
+     run.complete({ status: event.status === 'completed' ? 'completed' : 'failed', resultSummary: event.text ?? '' })
    }
  }
-
  await run.persist({ runtimeEvents: telemetry.events })
- console.log(run.cost()) // { tokensIn, tokensOut, costUsd, wallMs, llmCalls }
  ```
 
  Full runnable: [`examples/runtime-run/`](./examples/runtime-run/).
@@ -89,7 +178,7 @@ Full runnable: [`examples/runtime-run/`](./examples/runtime-run/).
 
  If you persist traces in agent-eval's `TraceStore`, the bridge maps
  runtime stream events to `TraceEvent` so consumer repos don't hand-roll
- the adapter:
+ the adapter.
 
  ```ts
  import { createTraceBridge } from '@tangle-network/agent-runtime'
@@ -103,8 +192,6 @@ for await (const event of runAgentTaskStream({ task, backend, input })) {
 
  ## Error taxonomy
 
- Every public function throws one of:
-
  | Error | When |
  |---|---|
  | `ValidationError` | Caller passed invalid arguments |
@@ -115,83 +202,60 @@ Every public function throws one of:
  | `RuntimeRunStateError` | `RuntimeRunHandle` lifecycle methods called out of order |
 
  All extend `AgentEvalError` (re-exported from `@tangle-network/agent-eval`)
- and carry a stable `code` so cross-package handlers can pattern-match
+ and carry a stable `code` so cross-package handlers pattern-match
  without importing the runtime.
 
  ## Sanitized telemetry
 
  `task.intent` flows through sanitized telemetry on every event. **Never
  set it to user input** — use a fixed string describing the operation
- kind (e.g. `"Run a chat turn"`, `"Score a tax return"`). Route user-
- visible content through `task.inputs` (redacted by default).
+ kind (e.g. `"Run a chat turn"`, `"Score a tax return"`). Route
+ user-visible content through `task.inputs` (redacted by default).
 
  ```ts
  import { createRuntimeStreamEventCollector, runAgentTaskStream } from '@tangle-network/agent-runtime'
 
  const telemetry = createRuntimeStreamEventCollector()
- for await (const event of runAgentTaskStream({ task, backend })) {
-   telemetry.onEvent(event)
- }
+ for await (const event of runAgentTaskStream({ task, backend })) telemetry.onEvent(event)
  console.log(telemetry.events, telemetry.summary())
  ```
 
- By default the collector redacts task inputs, user answers, credential
- questions, control payloads, evidence IDs, task metadata, and eval
- details. Private diagnostics opt-in via `RuntimeTelemetryOptions`.
-
  ## Package boundaries
 
  | Package | Owns |
  |---|---|
- | `agent-runtime` | Lifecycle, adapters, backends, `RuntimeRunHandle`, trace bridge |
- | `agent-runtime/platform` | Server-side clients for the Tangle platform: cross-site SSO (`PlatformAuthClient`) and integrations hub (`PlatformHubClient`) |
- | `agent-eval` | Control loops, readiness scoring, traces, evals, failure classes, release evidence |
- | `agent-knowledge` | Evidence, claims, wiki pages, retrieval, knowledge bundle builders |
+ | `agent-runtime` | Task lifecycle, adapters, backends, chat-turn engine, execution-handle contract, model resolution, trace bridge, `defineAgent`. **Does not** own long-running execution state — that lives in `@tangle-network/sandbox` + orchestrator. |
+ | `agent-runtime/platform` | Cross-site SSO (`PlatformAuthClient`) + integrations hub (`PlatformHubClient`) |
+ | `agent-runtime/agent` | `defineAgent` + surfaces / outcome adapters |
+ | `agent-runtime/analyst-loop` | `runAnalystLoop` analyst registry driver |
+ | `agent-eval` | Control loops, readiness scoring, traces, evals, judges, RL, release evidence |
+ | `agent-knowledge` | Evidence, claims, wiki pages, retrieval |
  | Domain packages | Domain tools, policies, credentials, UI text, rubrics |
 
- ### `agent-runtime/platform` Login with Tangle + integrations hub
-
- ```ts
- import {
-   PlatformAuthClient,
-   PlatformHubClient,
- } from '@tangle-network/agent-runtime/platform'
-
- // Login with Tangle (cross-site SSO bridge).
- const auth = new PlatformAuthClient({
-   baseUrl: process.env.TANGLE_PLATFORM_URL!, // https://id.tangle.tools
-   appId: 'gtm-agent', // must be registered in TRUSTED_APPS
- })
- const url = auth.authorizeUrl({ state: csrfToken, redirectUri: callbackUrl })
- // …user redirected to `url`, returns to callbackUrl with ?code=…
- const { apiKey, user } = await auth.exchange(code)
-
- // Integrations hub (uses the user's apiKey from cross-site exchange).
- const hub = new PlatformHubClient({
-   baseUrl: process.env.TANGLE_PLATFORM_URL!,
-   bearer: apiKey,
- })
- const connections = await hub.listConnections()
- const { authorizationUrl } = await hub.startAuth({
-   providerId: 'google',
-   connectorId: 'gmail',
-   returnUrl: 'https://gtm.tangle.tools/integrations',
- })
- ```
-
- The API uses `runAgentTask`, not `runVerticalAgentTask`. `domain` is
- metadata on the task because the runtime is reusable across many kinds of
- agents without baking taxonomy into type names.
+ See [`docs/concepts.md`](./docs/concepts.md) for the mental model.
 
  ## Examples
 
- Runnable in [`examples/`](./examples/):
+ Runnable in [`examples/`](./examples/). Every example imports from
+ `@tangle-network/agent-runtime` (the same surface consumers use):
 
  - [`basic-task/`](./examples/basic-task/) — smallest `runAgentTask`
- - [`with-knowledge-readiness/`](./examples/with-knowledge-readiness/) — readiness gating + `onKnowledgeBlocked`
- - [`sanitized-telemetry/`](./examples/sanitized-telemetry/) `createRuntimeEventCollector` + redaction
- - [`sanitized-telemetry-streaming/`](./examples/sanitized-telemetry-streaming/) — streaming collector + redaction
- - [`sse-stream/`](./examples/sse-stream/) — Server-Sent Events for browser clients
+ - [`with-knowledge-readiness/`](./examples/with-knowledge-readiness/) — readiness gating
+ - [`sanitized-telemetry/`](./examples/sanitized-telemetry/) + [`-streaming/`](./examples/sanitized-telemetry-streaming/) redaction
+ - [`sse-stream/`](./examples/sse-stream/) — SSE helpers for browser clients
  - [`sandbox-stream-backend/`](./examples/sandbox-stream-backend/) — `createSandboxPromptBackend`
  - [`openai-stream-backend/`](./examples/openai-stream-backend/) — `createOpenAICompatibleBackend`
- - [`runtime-run/`](./examples/runtime-run/) — `startRuntimeRun` + cost ledger + persistence adapter
+ - [`runtime-run/`](./examples/runtime-run/) — production-run row + cost ledger
+ - [`model-resolution/`](./examples/model-resolution/) — router catalog + fail-closed admission
+ - [`agent-into-reviewer/`](./examples/agent-into-reviewer/) — pipe one runtime's stream into a reviewer agent
+ - [`chat-handler/`](./examples/chat-handler/) — `handleChatTurn` (the centerpiece production pattern)
+ - [`production-trace-sink/`](./examples/production-trace-sink/) — `createProductionTraceSink` data capture
+
+ ## Tests
+
+ ```bash
+ pnpm test
+ pnpm typecheck
+ pnpm lint
+ pnpm build
+ ```
package/dist/agent.d.ts CHANGED
@@ -1,6 +1,6 @@
  import * as _tangle_network_agent_eval from '@tangle-network/agent-eval';
  import { FindingSubject, TraceAnalystKindSpec, AnalystFinding, TraceStore, RunCompleteHook, FeedbackLabel, FeedbackTrajectoryStore } from '@tangle-network/agent-eval';
- import { R as RuntimeStreamEvent } from './types-CYxfw14J.js';
+ import { R as RuntimeStreamEvent } from './types-DmhXdAhu.js';
  import { I as ImprovementAdapter, K as KnowledgeAdapter, a as RunAnalystLoopResult } from './types-D_MXrmJP.js';
 
  /**