@tangle-network/agent-runtime 0.5.6 → 0.7.0

package/README.md CHANGED
@@ -1,54 +1,32 @@
- # agent-runtime
-
- Reusable runtime lifecycle for domain-specific agents. Standardizes the
- task lifecycle (knowledge readiness → questions/acquisition → control loop
- → eval) and delegates domain behavior to an adapter. Owns no domain
- policy, models, tools, connectors, or UI.
-
- ## Contents
-
- - [Overview](#overview)
- - [Install](#install)
- - [Getting started](#getting-started)
- - [When to use which entry point](#when-to-use-which-entry-point)
- - [Backends for `runAgentTaskStream`](#backends-for-runagenttaskstream)
- - [Lifecycle events](#lifecycle-events)
- - [Knowledge providers](#knowledge-providers)
- - [Sanitized telemetry](#sanitized-telemetry)
- - [Package boundaries](#package-boundaries)
- - [Examples](#examples)
-
- ## Overview
-
- ```txt
- TaskSpec
- → Knowledge readiness
- → Question / acquisition decision
- → Agent control loop (observe / validate / decide / act)
- → Eval / verification
- → Run evidence
- ```
-
- For product agents that own a streaming model backend:
-
- ```txt
- TaskSpec
- → Knowledge readiness
- → Session create/resume
- → Backend stream
- → Sanitized RuntimeStreamEvent / SSE
- ```
+ # @tangle-network/agent-runtime
 
- ## Install
+ Production runtime substrate for domain agents. Owns the task lifecycle
+ (knowledge readiness, control loop, session resume, sanitized telemetry,
+ canonical `RuntimeRunRow` persistence + cost ledger) so domain repos stop
+ inventing their own.
 
  ```bash
  pnpm add @tangle-network/agent-runtime @tangle-network/agent-eval
  ```
 
- ## Getting started
+ ## What you get
 
- The smallest possible task, a domain adapter responding to one task with
- no streaming:
+ | Entry point | When to reach for it |
+ |---|---|
+ | `runAgentTask` | Single-shot adapter-driven task with eval/verification |
+ | `runAgentTaskStream` | Streaming product loop with session resume + backends |
+ | `startRuntimeRun` | Canonical production-run row + cost ledger (NEW in 0.7.0) |
+ | `createTraceBridge` | Map `RuntimeStreamEvent` → `agent-eval` `TraceEvent` (NEW in 0.7.0) |
+ | `decideKnowledgeReadiness` | `ready` / `blocked` / `caveat` branch for routes / UI |
+ | `createOpenAICompatibleBackend` | OpenAI-compatible streaming backend (TCloud / cli-bridge) |
+ | `createSandboxPromptBackend` | Sandbox / sidecar `streamPrompt` clients |
+ | `createRuntimeStreamEventCollector` | Default-redacted sanitized telemetry over a stream |
+
+ Every public export is annotated `@stable` or `@experimental`. `@stable`
+ exports do not change shape within a minor release; `@experimental` exports
+ may change within a minor release and require a deliberate consumer bump.
+
+ ## Quickstart
 
  ```ts
  import { runAgentTask } from '@tangle-network/agent-runtime'
@@ -63,7 +41,7 @@ const result = await runAgentTask({
      async observe() { return { /* domain state */ } },
      async validate({ state }) { return [/* eval results */] },
      async decide({ state }) {
-       return { kind: 'finish', reason: 'review complete' }
+       return { type: 'stop', pass: true, score: 1, reason: 'review complete' }
      },
      async act() { return undefined },
    },
@@ -72,160 +50,119 @@ const result = await runAgentTask({
  console.log(result.status, result.runRecords)
  ```
 
- Full runnable: [`examples/basic-task/`](./examples/basic-task/).
-
- ## When to use which entry point
-
- | You want… | Use |
- |---|---|
- | Single-shot task with eval/verification | `runAgentTask` |
- | Streaming product loop with session resume | `runAgentTaskStream` + a backend factory |
- | Just SSE serialization for an existing readiness report | `readinessServerSentEvent` |
- | Just sanitized telemetry over an existing run | `createRuntimeEventCollector` (+ `summarizeAgentTaskRun`) for `runAgentTask`, or `createRuntimeStreamEventCollector` for `runAgentTaskStream` |
- | Stable readiness branching (`ready` / `blocked` / `caveat`) in a route | `decideKnowledgeReadiness` |
-
- ## Backends for `runAgentTaskStream`
-
- Four SDK-agnostic factories ship in core:
-
- | Factory | When |
- |---|---|
- | `createOpenAICompatibleBackend` | TCloud / OpenAI-compatible chat APIs |
- | `createCliBridgeBackend` | HTTP CLI bridge streams |
- | `createSandboxPromptBackend` | Sandbox / sidecar `streamPrompt` clients |
- | `createIterableBackend` | Custom coding harnesses, browser agents |
-
- Adapters are intentionally thin. Product repos still own client
- construction, auth, concrete tool permissions, and UI behavior. See
- [`examples/sandbox-stream-backend/`](./examples/sandbox-stream-backend/) and
- [`examples/openai-stream-backend/`](./examples/openai-stream-backend/) for
- runnable wirings.
+ ## Canonical production-run lifecycle (NEW in 0.7.0)
 
- ## Lifecycle events
-
- `runAgentTask` and `runAgentTaskStream` emit typed lifecycle events
- through `onEvent`:
+ `startRuntimeRun` is the ONE abstraction for "the agent did a thing on
+ behalf of a customer; record what it did, what it cost, how it ended."
+ It replaces bespoke `agentRuns`-row helpers (legal-agent's
+ `completeProductionAgentRun` + `persistRuntimeRun` pair is the canonical
+ example of what this subsumes).
 
  ```ts
- await runAgentTask({
-   task, adapter,
-   onEvent(event) {
-     console.log(event.type)
-   },
+ import { startRuntimeRun, runAgentTaskStream } from '@tangle-network/agent-runtime'
+
+ const run = startRuntimeRun({
+   workspaceId: 'ws-1',
+   sessionId: threadId,
+   agentId: 'legal-chat-runtime',
+   taskSpec,
+   scenarioId: `legal-chat:${threadId}`,
+   adapter: { upsert: (row) => db.insert(agentRuns).values(row) },
  })
- ```
 
- Events cover readiness, question answering, acquisition, control-loop
- steps, and task completion. Every transition is observable without
- coupling domain adapters to logging, streaming, or telemetry concerns.
-
- This package does **not** stream model tokens for you. Domain adapters
- and product routes still own model calls, tool execution, and token
- streaming. agent-runtime emits lifecycle events around those actions.
+ for await (const event of runAgentTaskStream({ task: taskSpec, backend, input })) {
+   run.observe(event) // llm_call events update the cost ledger
+   if (event.type === 'final') {
+     run.complete({
+       status: event.status === 'completed' ? 'completed' : 'failed',
+       resultSummary: event.text ?? '',
+       error: event.status === 'failed' ? event.reason : undefined,
+     })
+   }
+ }
 
- ## Knowledge providers
+ await run.persist({ runtimeEvents: telemetry.events }) // `telemetry`: a stream event collector fed inside the loop (not shown)
+ console.log(run.cost()) // { tokensIn, tokensOut, costUsd, wallMs, llmCalls }
+ ```
 
- Optional. A knowledge provider implements:
+ Full runnable: [`examples/runtime-run/`](./examples/runtime-run/).
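
To make the `observe` / `complete` / `cost` flow in the new README concrete, here is a minimal in-memory sketch of a cost ledger. It is not the package's implementation. `SketchRun`, the `llm_call` field names (`tokensIn`, `tokensOut`, `costUsd`), and the rule that `observe` throws after `complete` are all assumptions made for illustration, and `wallMs` is left out to keep the example deterministic.

```typescript
// Hypothetical event shapes: only the fields this sketch needs.
type LlmCallEvent = { type: 'llm_call'; tokensIn: number; tokensOut: number; costUsd: number }
type OtherEvent = { type: 'text_delta' | 'final' }
type SketchEvent = LlmCallEvent | OtherEvent

type CostSnapshot = { tokensIn: number; tokensOut: number; costUsd: number; llmCalls: number }

class SketchRun {
  private ledger: CostSnapshot = { tokensIn: 0, tokensOut: 0, costUsd: 0, llmCalls: 0 }
  private status: 'running' | 'completed' | 'failed' = 'running'

  // Fold one stream event into the ledger; events other than llm_call are ignored.
  observe(event: SketchEvent): void {
    if (this.status !== 'running') throw new Error('observe() called after complete()')
    if (event.type !== 'llm_call') return
    this.ledger.tokensIn += event.tokensIn
    this.ledger.tokensOut += event.tokensOut
    this.ledger.costUsd += event.costUsd
    this.ledger.llmCalls += 1
  }

  // Close the run exactly once; a second call is treated as a lifecycle error.
  complete(status: 'completed' | 'failed'): void {
    if (this.status !== 'running') throw new Error('complete() called twice')
    this.status = status
  }

  // Return a copy so callers cannot mutate the ledger.
  cost(): CostSnapshot {
    return { ...this.ledger }
  }
}

const demoRun = new SketchRun()
demoRun.observe({ type: 'llm_call', tokensIn: 120, tokensOut: 40, costUsd: 0.002 })
demoRun.observe({ type: 'final' })
demoRun.complete('completed')
console.log(demoRun.cost())
```

Keeping the ledger behind `observe` means the streaming loop never does arithmetic itself, which is presumably why the README routes every event through `run.observe(event)`.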
 
- - `buildReadiness` — score readiness against the task's required knowledge
- - `answerQuestions` — handle outstanding user questions
- - `executeAcquisitionPlans` — fetch missing evidence
- - `refreshReadiness` — rerun scoring after acquisition
+ ## agent-eval trace bridge (NEW in 0.7.0)
 
- This lets a task collect missing context before the control loop starts, then
- rerun readiness against new evidence. If readiness fails, `runAgentTask`
- stops before domain actions; adapters can override `onKnowledgeBlocked`
- to emit a domain action (asking a user, querying a connector, etc.).
+ If you persist traces in agent-eval's `TraceStore`, map runtime stream
+ events to `TraceEvent` once and stop hand-rolling the adapter in every
+ domain repo:
 
- For control policies or route handlers that need a stable readiness
- branch, use `decideKnowledgeReadiness(report)`; it returns `ready`,
- `blocked`, or `caveat` plus gap IDs and the recommended action.
+ ```ts
+ import { createTraceBridge, runAgentTaskStream } from '@tangle-network/agent-runtime'
 
- ## Sanitized telemetry
+ const bridge = createTraceBridge({ runId, spanId })
+ for await (const event of runAgentTaskStream({ task, backend, input })) {
+   const trace = bridge.toTraceEvent(event)
+   if (trace) await traceStore.appendEvent(trace)
+ }
+ ```
 
- For logs, reports, UI telemetry — never serialize raw events directly.
- Use the built-in sanitized collector:
+ ## Error taxonomy
 
- ```ts
- import {
-   createRuntimeEventCollector,
-   summarizeAgentTaskRun,
- } from '@tangle-network/agent-runtime'
+ Errors thrown by public functions are one of:
 
- const telemetry = createRuntimeEventCollector()
- const result = await runAgentTask({ task, adapter, onEvent: telemetry.onEvent })
+ | Error | When |
+ |---|---|
+ | `ValidationError` | Caller passed invalid arguments |
+ | `ConfigError` | Required env / config missing |
+ | `NotFoundError` | A named resource does not exist |
+ | `BackendTransportError` | Backend HTTP / IPC call returned non-success |
+ | `SessionMismatchError` | Resume requested against a different backend |
+ | `RuntimeRunStateError` | `RuntimeRunHandle` lifecycle methods called out of order |
 
- console.log(telemetry.events)
- console.log(summarizeAgentTaskRun(result))
- ```
+ All extend `AgentEvalError` (re-exported from `@tangle-network/agent-eval`)
+ and carry a stable `code` so cross-package handlers can pattern-match
+ without importing the runtime.
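
As a rough illustration of matching on a stable `code` without importing the runtime, a handler can check the error structurally. This is only a sketch: `CodedError` is a local stand-in, and the code strings (`'VALIDATION'`, `'NOT_FOUND'`, `'BACKEND_TRANSPORT'`) are invented here, since the diff does not show the actual values.

```typescript
// Hypothetical stand-in for a runtime error that carries a stable `code`.
class CodedError extends Error {
  code: string
  constructor(code: string, message: string) {
    super(message)
    this.name = 'CodedError'
    this.code = code
  }
}

// Structural check: accepts any object with a string `code`, so no class import is needed.
function hasCode(err: unknown): err is { code: string } {
  return typeof err === 'object' && err !== null &&
    typeof (err as { code?: unknown }).code === 'string'
}

// Map error codes to HTTP statuses; the code strings are invented for this sketch.
function toHttpStatus(err: unknown): number {
  if (!hasCode(err)) return 500
  switch (err.code) {
    case 'VALIDATION': return 400
    case 'NOT_FOUND': return 404
    case 'BACKEND_TRANSPORT': return 502
    default: return 500
  }
}

console.log(toHttpStatus(new CodedError('NOT_FOUND', 'no such session')))
```

Matching on `code` rather than `instanceof` also keeps working when two copies of a package end up in the dependency tree, where class identity checks can fail.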
 
- By default, the collector redacts task inputs, user answers, credential
- questions, control payloads, evidence IDs, task metadata, and eval
- details. Private diagnostics are opt-in via `RuntimeTelemetryOptions` flags
- (`includeInputs`, `includeUserAnswers`, `includeControlPayloads`,
- `includeEvidenceIds`, `includeRequirementDescriptions`,
- `includeMetadata`, `includeEvalDetails`).
+ ## Sanitized telemetry
 
- For `runAgentTaskStream`, use the sibling
- `createRuntimeStreamEventCollector`:
+ `task.intent` flows through sanitized telemetry on every event. **Never
+ set it to user input** — use a fixed string describing the operation
+ kind (e.g. `"Run a chat turn"`, `"Score a tax return"`). Route
+ user-visible content through `task.inputs` (redacted by default).
 
  ```ts
- import {
-   createRuntimeStreamEventCollector,
-   runAgentTaskStream,
- } from '@tangle-network/agent-runtime'
+ import { createRuntimeStreamEventCollector, runAgentTaskStream } from '@tangle-network/agent-runtime'
 
  const telemetry = createRuntimeStreamEventCollector()
  for await (const event of runAgentTaskStream({ task, backend })) {
    telemetry.onEvent(event)
  }
-
- console.log(telemetry.events)
- console.log(telemetry.summary())
+ console.log(telemetry.events, telemetry.summary())
  ```
 
- Same `RuntimeTelemetryOptions` flags apply. Streaming and non-streaming
- events have different field shapes (timestamps, sessions, text/tool
- deltas), which is why the factories are siblings rather than overloads —
- a single dispatcher would silently misroute events whose `type` literals
- overlap (`task_start`, `readiness_end`, etc.).
-
- ### `task.intent` is sanitized telemetry by default
-
- `task.intent` flows through sanitized telemetry on every event. **Never
- set it to user input** — use a fixed string describing the operation
- kind (e.g. `"Run a chat turn"`, `"Score a tax return"`). If you need to
- log user-visible intent, route it through `inputs` (which are redacted
- by default) instead.
-
- For SSE-over-HTTP, use the helpers:
-
- ```ts
- import { readinessServerSentEvent } from '@tangle-network/agent-runtime'
- writer.write(encoder.encode(readinessServerSentEvent(readinessReport)))
- ```
+ By default the collector redacts task inputs, user answers, credential
+ questions, control payloads, evidence IDs, task metadata, and eval
+ details. Private diagnostics are opt-in via `RuntimeTelemetryOptions`.
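
The default-redaction rule and the `task.intent` warning can be pictured with a small stand-in collector. This is a sketch under stated assumptions, not the package's code. The event shape, the `'[redacted]'` placeholder, and `createSketchCollector` are hypothetical. The `includeInputs` flag name comes from the 0.5.6 README text in this diff, and whether it is unchanged in 0.7.0 is an assumption.

```typescript
// Hypothetical shapes; the real event and option types are richer.
type RawEvent = { type: string; intent: string; inputs?: Record<string, unknown> }
type SanitizedEvent = { type: string; intent: string; inputs?: Record<string, unknown> | '[redacted]' }
type SketchTelemetryOptions = { includeInputs?: boolean }

function createSketchCollector(options: SketchTelemetryOptions = {}) {
  const events: SanitizedEvent[] = []
  return {
    events,
    onEvent(event: RawEvent): void {
      // `intent` is copied verbatim, which is why it must never hold user input.
      const sanitized: SanitizedEvent = { type: event.type, intent: event.intent }
      if (event.inputs !== undefined) {
        // Inputs are replaced by a placeholder unless the caller explicitly opts in.
        sanitized.inputs = options.includeInputs ? event.inputs : '[redacted]'
      }
      events.push(sanitized)
    },
  }
}

const sketchTelemetry = createSketchCollector()
sketchTelemetry.onEvent({
  type: 'task_start',
  intent: 'Run a chat turn', // fixed operation kind, not user text
  inputs: { message: 'private text' },
})
console.log(sketchTelemetry.events)
```

Because the sketch copies `intent` straight through while masking `inputs`, a user message placed in `intent` would reach logs unredacted, which is the failure the README warns against.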
 
  ## Package boundaries
 
  | Package | Owns |
  |---|---|
- | `agent-runtime` | Reusable lifecycle and adapter contracts |
- | `agent-eval` | Control loops, readiness scoring, traces, evals, failure classes, optimization, release evidence |
+ | `agent-runtime` | Lifecycle, adapters, backends, `RuntimeRunHandle`, trace bridge |
+ | `agent-eval` | Control loops, readiness scoring, traces, evals, failure classes, release evidence |
  | `agent-knowledge` | Evidence, claims, wiki pages, retrieval, knowledge bundle builders |
  | Domain packages | Domain tools, policies, credentials, UI text, rubrics |
 
  The API uses `runAgentTask`, not `runVerticalAgentTask`. `domain` is
- metadata on the task, because the runtime should be reusable across many
- kinds of agents without baking taxonomy into type names.
+ metadata on the task because the runtime is reusable across many kinds of
+ agents without baking taxonomy into type names.
 
  ## Examples
 
  Runnable in [`examples/`](./examples/):
 
- - [`basic-task/`](./examples/basic-task/) — the smallest `runAgentTask`
- - [`with-knowledge-readiness/`](./examples/with-knowledge-readiness/) — readiness gating + custom `onKnowledgeBlocked`
- - [`sanitized-telemetry/`](./examples/sanitized-telemetry/) — `createRuntimeEventCollector` + redaction policy
- - [`sanitized-telemetry-streaming/`](./examples/sanitized-telemetry-streaming/) — `createRuntimeStreamEventCollector` + redaction policy for `runAgentTaskStream`
+ - [`basic-task/`](./examples/basic-task/) — smallest `runAgentTask`
+ - [`with-knowledge-readiness/`](./examples/with-knowledge-readiness/) — readiness gating + `onKnowledgeBlocked`
+ - [`sanitized-telemetry/`](./examples/sanitized-telemetry/) — `createRuntimeEventCollector` + redaction
+ - [`sanitized-telemetry-streaming/`](./examples/sanitized-telemetry-streaming/) — streaming collector + redaction
  - [`sse-stream/`](./examples/sse-stream/) — Server-Sent Events for browser clients
- - [`sandbox-stream-backend/`](./examples/sandbox-stream-backend/) — `runAgentTaskStream` with `createSandboxPromptBackend` (synthetic sandbox client; real one in `agent-builder`)
- - [`openai-stream-backend/`](./examples/openai-stream-backend/) — `runAgentTaskStream` with `createOpenAICompatibleBackend`
+ - [`sandbox-stream-backend/`](./examples/sandbox-stream-backend/) — `createSandboxPromptBackend`
+ - [`openai-stream-backend/`](./examples/openai-stream-backend/) — `createOpenAICompatibleBackend`
+ - [`runtime-run/`](./examples/runtime-run/) — `startRuntimeRun` + cost ledger + persistence adapter (NEW)