@tangle-network/agent-runtime 0.6.0 → 0.8.0

package/README.md CHANGED
@@ -1,54 +1,32 @@
1
- # agent-runtime
2
-
3
- Reusable runtime lifecycle for domain-specific agents. Standardizes the
4
- task lifecycle (knowledge readiness → questions/acquisition → control loop
5
- → eval) and delegates domain behavior to an adapter. Owns no domain
6
- policy, models, tools, connectors, or UI.
7
-
8
- ## Contents
9
-
10
- - [Overview](#overview)
11
- - [Install](#install)
12
- - [Getting started](#getting-started)
13
- - [When to use which entry point](#when-to-use-which-entry-point)
14
- - [Backends for `runAgentTaskStream`](#backends-for-runagenttaskstream)
15
- - [Lifecycle events](#lifecycle-events)
16
- - [Knowledge providers](#knowledge-providers)
17
- - [Sanitized telemetry](#sanitized-telemetry)
18
- - [Package boundaries](#package-boundaries)
19
- - [Examples](#examples)
20
-
21
- ## Overview
22
-
23
- ```txt
24
- TaskSpec
25
- → Knowledge readiness
26
- → Question / acquisition decision
27
- → Agent control loop (observe / validate / decide / act)
28
- → Eval / verification
29
- → Run evidence
30
- ```
31
-
32
- For product agents that own a streaming model backend:
33
-
34
- ```txt
35
- TaskSpec
36
- → Knowledge readiness
37
- → Session create/resume
38
- → Backend stream
39
- → Sanitized RuntimeStreamEvent / SSE
40
- ```
1
+ # @tangle-network/agent-runtime
41
2
 
42
- ## Install
3
+ Production runtime substrate for domain agents. Owns the task lifecycle
4
+ (knowledge readiness, control loop, session resume, sanitized telemetry,
5
+ canonical `RuntimeRunRow` persistence + cost ledger) so domain repos stop
6
+ inventing their own.
43
7
 
44
8
  ```bash
45
9
  pnpm add @tangle-network/agent-runtime @tangle-network/agent-eval
46
10
  ```
47
11
 
48
- ## Getting started
12
+ ## What you get
13
+
14
+ | Entry point | When to reach for it |
15
+ |---|---|
16
+ | `runAgentTask` | Single-shot adapter-driven task with eval/verification |
17
+ | `runAgentTaskStream` | Streaming product loop with session resume + backends |
18
+ | `startRuntimeRun` | Canonical production-run row + cost ledger (NEW in 0.7.0) |
19
+ | `createTraceBridge` | Map `RuntimeStreamEvent` → `agent-eval` `TraceEvent` (NEW in 0.7.0) |
20
+ | `decideKnowledgeReadiness` | `ready` / `blocked` / `caveat` branch for routes / UI |
21
+ | `createOpenAICompatibleBackend` | OpenAI-compatible streaming backend (TCloud / cli-bridge) |
22
+ | `createSandboxPromptBackend` | Sandbox / sidecar `streamPrompt` clients |
23
+ | `createRuntimeStreamEventCollector` | Default-redacted sanitized telemetry over a stream |
24
+
25
+ Every public export is annotated `@stable` or `@experimental`. `@stable`
26
+ exports do not change shape inside a minor; `@experimental` exports may
27
+ change inside a minor and require a deliberate consumer bump.
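Of the entry points in the table above, `decideKnowledgeReadiness` is described as returning a `ready` / `blocked` / `caveat` branch for routes and UI. A minimal sketch of how a route might branch on such a decision follows. The `ReadinessDecision` shape, its field names (`status`, `gapIds`, `recommendedAction`), and the HTTP codes are assumptions made for illustration, not the package's actual types.

```typescript
// Hypothetical decision shape; the real return type of
// decideKnowledgeReadiness is not shown in this README and may differ.
type ReadinessDecision =
  | { status: 'ready' }
  | { status: 'caveat'; gapIds: string[] }
  | { status: 'blocked'; gapIds: string[]; recommendedAction: string }

type RouteResponse = { http: number; body: string }

// Map each readiness branch to a route-level response.
function respondToReadiness(decision: ReadinessDecision): RouteResponse {
  switch (decision.status) {
    case 'ready':
      return { http: 200, body: 'proceed' }
    case 'caveat':
      // Proceed, but surface the known gaps to the caller.
      return { http: 200, body: `proceed with caveats: ${decision.gapIds.join(', ')}` }
    case 'blocked':
      // Stop before any domain action and report what to do next.
      return { http: 409, body: `blocked: ${decision.recommendedAction}` }
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a branch is added.
      const unreachable: never = decision
      throw new Error(`unhandled readiness decision: ${JSON.stringify(unreachable)}`)
    }
  }
}

const blocked = respondToReadiness({
  status: 'blocked',
  gapIds: ['gap-1'],
  recommendedAction: 'ask the user for the missing document',
})
const caveat = respondToReadiness({ status: 'caveat', gapIds: ['gap-2', 'gap-3'] })
console.log(blocked, caveat)
```

Modelling the decision as a discriminated union keeps the three branches exhaustive at compile time, which is one way to get the stable branching the table entry describes.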
49
28
 
50
- The smallest possible task — a domain adapter responding to one task with
51
- no streaming:
29
+ ## Quickstart
52
30
 
53
31
  ```ts
54
32
  import { runAgentTask } from '@tangle-network/agent-runtime'
@@ -63,7 +41,7 @@ const result = await runAgentTask({
63
41
  async observe() { return { /* domain state */ } },
64
42
  async validate({ state }) { return [/* eval results */] },
65
43
  async decide({ state }) {
66
- return { kind: 'finish', reason: 'review complete' }
44
+ return { type: 'stop', pass: true, score: 1, reason: 'review complete' }
67
45
  },
68
46
  async act() { return undefined },
69
47
  },
@@ -72,165 +50,119 @@ const result = await runAgentTask({
72
50
  console.log(result.status, result.runRecords)
73
51
  ```
74
52
 
75
- Full runnable: [`examples/basic-task/`](./examples/basic-task/).
76
-
77
- ## When to use which entry point
78
-
79
- | You want… | Use |
80
- |---|---|
81
- | Single-shot task with eval/verification | `runAgentTask` |
82
- | Streaming product loop with session resume | `runAgentTaskStream` + a backend factory |
83
- | Just SSE serialization for an existing readiness report | `readinessServerSentEvent` |
84
- | Just sanitized telemetry over an existing run | `createRuntimeEventCollector` (+ `summarizeAgentTaskRun`) for `runAgentTask`, or `createRuntimeStreamEventCollector` for `runAgentTaskStream` |
85
- | Stable readiness branching (`ready` / `blocked` / `caveat`) in a route | `decideKnowledgeReadiness` |
86
-
87
- ## Backends for `runAgentTaskStream`
53
+ ## Canonical production-run lifecycle (NEW in 0.7.0)
88
54
 
89
- Three SDK-agnostic factories ship in core:
90
-
91
- | Factory | When |
92
- |---|---|
93
- | `createOpenAICompatibleBackend` | TCloud / OpenAI-compatible chat APIs |
94
- | `createSandboxPromptBackend` | Sandbox / sidecar `streamPrompt` clients |
95
- | `createIterableBackend` | Custom coding harnesses, browser agents |
96
-
97
- For [cli-bridge](https://github.com/drewstone/cli-bridge) (or any other
98
- OpenAI-compatible HTTP gateway), use `createOpenAICompatibleBackend` pointed
99
- at the gateway's `/v1/chat/completions` URL — the cli-bridge harness/model
100
- selector is just an OpenAI `model` string like `claude/sonnet` or
101
- `codex/gpt-5-codex`.
102
-
103
- Adapters are intentionally thin. Product repos still own client
104
- construction, auth, concrete tool permissions, and UI behavior. See
105
- [`examples/sandbox-stream-backend/`](./examples/sandbox-stream-backend/) and
106
- [`examples/openai-stream-backend/`](./examples/openai-stream-backend/) for
107
- runnable wirings.
108
-
109
- ## Lifecycle events
110
-
111
- `runAgentTask` and `runAgentTaskStream` emit typed lifecycle events
112
- through `onEvent`:
55
+ `startRuntimeRun` is the ONE abstraction for "the agent did a thing on
56
+ behalf of a customer; record what it did, what it cost, how it ended."
57
+ It replaces bespoke `agentRuns`-row helpers (legal-agent's
58
+ `completeProductionAgentRun` + `persistRuntimeRun` pair is the canonical
59
+ example of what this subsumes).
113
60
 
114
61
  ```ts
115
- await runAgentTask({
116
- task, adapter,
117
- onEvent(event) {
118
- console.log(event.type)
119
- },
62
+ import { startRuntimeRun, runAgentTaskStream } from '@tangle-network/agent-runtime'
63
+
64
+ const run = startRuntimeRun({
65
+ workspaceId: 'ws-1',
66
+ sessionId: threadId,
67
+ agentId: 'legal-chat-runtime',
68
+ taskSpec,
69
+ scenarioId: `legal-chat:${threadId}`,
70
+ adapter: { upsert: (row) => db.insert(agentRuns).values(row) },
120
71
  })
121
- ```
122
72
 
123
- Events cover readiness, question answering, acquisition, control-loop
124
- steps, and task completion. Every transition is observable without
125
- coupling domain adapters to logging, streaming, or telemetry concerns.
126
-
127
- This package does **not** stream model tokens for you. Domain adapters
128
- and product routes still own model calls, tool execution, and token
129
- streaming. agent-runtime emits lifecycle events around those actions.
73
+ for await (const event of runAgentTaskStream({ task: taskSpec, backend, input })) {
74
+ run.observe(event) // llm_call events update the cost ledger
75
+ if (event.type === 'final') {
76
+ run.complete({
77
+ status: event.status === 'completed' ? 'completed' : 'failed',
78
+ resultSummary: event.text ?? '',
79
+ error: event.status === 'failed' ? event.reason : undefined,
80
+ })
81
+ }
82
+ }
130
83
 
131
- ## Knowledge providers
84
+ await run.persist({ runtimeEvents: telemetry.events })
85
+ console.log(run.cost()) // { tokensIn, tokensOut, costUsd, wallMs, llmCalls }
86
+ ```
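To make the cost-ledger idea concrete, here is a self-contained sketch of how `llm_call` events could be folded into the `{ tokensIn, tokensOut, costUsd, wallMs, llmCalls }` summary shown in the comment above. It does not use the package: `createCostLedger`, the `LlmCallEvent` fields, and the injectable clock are all hypothetical, and the real `RuntimeRunHandle` may account for cost differently.

```typescript
// Hypothetical event shapes; the package's real RuntimeStreamEvent may differ.
type LlmCallEvent = { type: 'llm_call'; tokensIn: number; tokensOut: number; costUsd: number }
type OtherEvent = { type: 'text_delta' | 'final' }
type StreamEvent = LlmCallEvent | OtherEvent

interface CostSummary {
  tokensIn: number
  tokensOut: number
  costUsd: number
  wallMs: number
  llmCalls: number
}

// The clock is injectable so the sketch is deterministic under test.
function createCostLedger(now: () => number = Date.now) {
  const startedAt = now()
  const totals = { tokensIn: 0, tokensOut: 0, costUsd: 0, llmCalls: 0 }
  return {
    // Only llm_call events affect the ledger; everything else is ignored.
    observe(event: StreamEvent): void {
      if (event.type !== 'llm_call') return
      totals.tokensIn += event.tokensIn
      totals.tokensOut += event.tokensOut
      totals.costUsd += event.costUsd
      totals.llmCalls += 1
    },
    // Wall time is measured from ledger creation to the moment of the call.
    cost(): CostSummary {
      return { ...totals, wallMs: now() - startedAt }
    },
  }
}

// Fake clock that advances 50 ms on every read.
let fakeTime = 0
const fakeClock = () => (fakeTime += 50)

const ledger = createCostLedger(fakeClock)
ledger.observe({ type: 'llm_call', tokensIn: 100, tokensOut: 20, costUsd: 0.25 })
ledger.observe({ type: 'text_delta' })
ledger.observe({ type: 'llm_call', tokensIn: 200, tokensOut: 30, costUsd: 0.5 })
ledger.observe({ type: 'final' })
const summary = ledger.cost()
console.log(summary)
```

Keeping the ledger as a pure fold over events means the same totals can be recomputed from persisted events later, should the stored row and the event log ever need reconciling.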
132
87
 
133
- Optional. A knowledge provider implements:
88
+ Full runnable: [`examples/runtime-run/`](./examples/runtime-run/).
134
89
 
135
- - `buildReadiness` — score readiness against the task's required knowledge
136
- - `answerQuestions` — handle outstanding user questions
137
- - `executeAcquisitionPlans` — fetch missing evidence
138
- - `refreshReadiness` — rerun scoring after acquisition
90
+ ## agent-eval trace bridge (NEW in 0.7.0)
139
91
 
140
- Lets a task collect missing context before the control loop starts, then
141
- rerun readiness against new evidence. If readiness fails, `runAgentTask`
142
- stops before domain actions; adapters can override `onKnowledgeBlocked`
143
- to emit a domain action (asking a user, querying a connector, etc.).
92
+ If you persist traces in agent-eval's `TraceStore`, map runtime stream
93
+ events to `TraceEvent` once and stop hand-rolling the adapter in every
94
+ domain repo:
144
95
 
145
- For control policies or route handlers that need a stable readiness
146
- branch, use `decideKnowledgeReadiness(report)`, which returns `ready`,
147
- `blocked`, or `caveat` plus gap IDs and the recommended action.
96
+ ```ts
97
+ import { createTraceBridge } from '@tangle-network/agent-runtime'
148
98
 
149
- ## Sanitized telemetry
99
+ const bridge = createTraceBridge({ runId, spanId })
100
+ for await (const event of runAgentTaskStream({ task, backend, input })) {
101
+ const trace = bridge.toTraceEvent(event)
102
+ if (trace) await traceStore.appendEvent(trace)
103
+ }
104
+ ```
150
105
 
151
- For logs, reports, UI telemetry — never serialize raw events directly.
152
- Use the built-in sanitized collector:
106
+ ## Error taxonomy
153
107
 
154
- ```ts
155
- import {
156
- createRuntimeEventCollector,
157
- summarizeAgentTaskRun,
158
- } from '@tangle-network/agent-runtime'
108
+ Every public function throws one of:
159
109
 
160
- const telemetry = createRuntimeEventCollector()
161
- const result = await runAgentTask({ task, adapter, onEvent: telemetry.onEvent })
110
+ | Error | When |
111
+ |---|---|
112
+ | `ValidationError` | Caller passed invalid arguments |
113
+ | `ConfigError` | Required env / config missing |
114
+ | `NotFoundError` | A named resource does not exist |
115
+ | `BackendTransportError` | Backend HTTP / IPC call returned non-success |
116
+ | `SessionMismatchError` | Resume requested against a different backend |
117
+ | `RuntimeRunStateError` | `RuntimeRunHandle` lifecycle methods called out of order |
162
118
 
163
- console.log(telemetry.events)
164
- console.log(summarizeAgentTaskRun(result))
165
- ```
119
+ All extend `AgentEvalError` (re-exported from `@tangle-network/agent-eval`)
120
+ and carry a stable `code` so cross-package handlers can pattern-match
121
+ without importing the runtime.
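The point about matching on a stable `code` without importing the runtime can be sketched as follows. The code strings (`'VALIDATION'`, `'NOT_FOUND'`, `'BACKEND_TRANSPORT'`), the `CodedError` stand-in class, and the HTTP mapping are assumptions for illustration; the actual code values are not listed in this README.

```typescript
// Stand-in for an error thrown by the runtime. Hypothetical: the real classes
// extend AgentEvalError and their code values are not documented here.
class CodedError extends Error {
  code: string
  constructor(code: string, message: string) {
    super(message)
    this.name = 'CodedError'
    this.code = code
  }
}

// Structural check: no import of the runtime (or of CodedError) is needed.
function hasCode(err: unknown): err is { code: string } {
  return (
    typeof err === 'object' &&
    err !== null &&
    typeof (err as { code?: unknown }).code === 'string'
  )
}

// A cross-package handler that matches on the code string only.
function toHttpStatus(err: unknown): number {
  if (!hasCode(err)) return 500
  switch (err.code) {
    case 'VALIDATION':
      return 400
    case 'NOT_FOUND':
      return 404
    case 'BACKEND_TRANSPORT':
      return 502
    default:
      return 500
  }
}

const statuses = [
  toHttpStatus(new CodedError('VALIDATION', 'bad arguments')),
  toHttpStatus(new CodedError('BACKEND_TRANSPORT', 'upstream returned 503')),
  toHttpStatus(new Error('no code on this one')),
]
console.log(statuses)
```

Matching on a string field rather than `instanceof` also sidesteps the duplicate-class problem that can appear when two copies of a package end up in one dependency tree.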
166
122
 
167
- By default, the collector redacts task inputs, user answers, credential
168
- questions, control payloads, evidence IDs, task metadata, and eval
169
- details. Private diagnostics opt-in via `RuntimeTelemetryOptions` flags
170
- (`includeInputs`, `includeUserAnswers`, `includeControlPayloads`,
171
- `includeEvidenceIds`, `includeRequirementDescriptions`,
172
- `includeMetadata`, `includeEvalDetails`).
123
+ ## Sanitized telemetry
173
124
 
174
- For `runAgentTaskStream`, use the sibling
175
- `createRuntimeStreamEventCollector`:
125
+ `task.intent` flows through sanitized telemetry on every event. **Never
126
+ set it to user input** — use a fixed string describing the operation
127
+ kind (e.g. `"Run a chat turn"`, `"Score a tax return"`). Route user-
128
+ visible content through `task.inputs` (redacted by default).
176
129
 
177
130
  ```ts
178
- import {
179
- createRuntimeStreamEventCollector,
180
- runAgentTaskStream,
181
- } from '@tangle-network/agent-runtime'
131
+ import { createRuntimeStreamEventCollector, runAgentTaskStream } from '@tangle-network/agent-runtime'
182
132
 
183
133
  const telemetry = createRuntimeStreamEventCollector()
184
134
  for await (const event of runAgentTaskStream({ task, backend })) {
185
135
  telemetry.onEvent(event)
186
136
  }
187
-
188
- console.log(telemetry.events)
189
- console.log(telemetry.summary())
137
+ console.log(telemetry.events, telemetry.summary())
190
138
  ```
191
139
 
192
- Same `RuntimeTelemetryOptions` flags apply. Streaming and non-streaming
193
- events have different field shapes (timestamps, sessions, text/tool
194
- deltas), which is why the factories are siblings rather than overloads —
195
- a single dispatcher would silently misroute events whose `type` literals
196
- overlap (`task_start`, `readiness_end`, etc.).
197
-
198
- ### `task.intent` is sanitized telemetry by default
199
-
200
- `task.intent` flows through sanitized telemetry on every event. **Never
201
- set it to user input** — use a fixed string describing the operation
202
- kind (e.g. `"Run a chat turn"`, `"Score a tax return"`). If you need to
203
- log user-visible intent, route it through `inputs` (which are redacted
204
- by default) instead.
205
-
206
- For SSE-over-HTTP, use the helpers:
207
-
208
- ```ts
209
- import { readinessServerSentEvent } from '@tangle-network/agent-runtime'
210
- writer.write(encoder.encode(readinessServerSentEvent(readinessReport)))
211
- ```
140
+ By default the collector redacts task inputs, user answers, credential
141
+ questions, control payloads, evidence IDs, task metadata, and eval
142
+ details. Private diagnostics are opt-in via `RuntimeTelemetryOptions`.
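The redact-by-default, opt-in-to-include policy can be illustrated with a small standalone sketch. It is not the collector's implementation: `sanitizeEvent` and the event shapes are hypothetical, and the flag names `includeInputs` and `includeMetadata` are taken from the 0.6.0 text removed in this diff, so they may not match 0.8.0.

```typescript
// Hypothetical option and event shapes for illustration only.
type TelemetryOptions = { includeInputs?: boolean; includeMetadata?: boolean }

type RawEvent = {
  type: string
  intent: string // passes through untouched, so it must never hold user input
  inputs?: Record<string, unknown>
  metadata?: Record<string, unknown>
}

type SanitizedEvent = {
  type: string
  intent: string
  inputs?: Record<string, unknown>
  metadata?: Record<string, unknown>
}

// Sensitive fields are dropped unless the caller explicitly opts in.
function sanitizeEvent(event: RawEvent, options: TelemetryOptions = {}): SanitizedEvent {
  const out: SanitizedEvent = { type: event.type, intent: event.intent }
  if (options.includeInputs && event.inputs) out.inputs = event.inputs
  if (options.includeMetadata && event.metadata) out.metadata = event.metadata
  return out
}

const raw: RawEvent = {
  type: 'task_start',
  intent: 'Run a chat turn', // fixed operation kind, not user text
  inputs: { message: 'my SSN is ...' },
  metadata: { tenant: 'acme' },
}

const redacted = sanitizeEvent(raw)
const diagnostic = sanitizeEvent(raw, { includeInputs: true })
console.log(redacted, diagnostic)
```

Because `intent` survives sanitization in this sketch, it shows why the guidance above asks for a fixed string there and routes user-visible content through `inputs` instead.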
212
143
 
213
144
  ## Package boundaries
214
145
 
215
146
  | Package | Owns |
216
147
  |---|---|
217
- | `agent-runtime` | Reusable lifecycle and adapter contracts |
218
- | `agent-eval` | Control loops, readiness scoring, traces, evals, failure classes, optimization, release evidence |
148
+ | `agent-runtime` | Lifecycle, adapters, backends, `RuntimeRunHandle`, trace bridge |
149
+ | `agent-eval` | Control loops, readiness scoring, traces, evals, failure classes, release evidence |
219
150
  | `agent-knowledge` | Evidence, claims, wiki pages, retrieval, knowledge bundle builders |
220
151
  | Domain packages | Domain tools, policies, credentials, UI text, rubrics |
221
152
 
222
153
  The API uses `runAgentTask`, not `runVerticalAgentTask`. `domain` is
223
- metadata on the task, because the runtime should be reusable across many
224
- kinds of agents without baking taxonomy into type names.
154
+ metadata on the task because the runtime is reusable across many kinds of
155
+ agents without baking taxonomy into type names.
225
156
 
226
157
  ## Examples
227
158
 
228
159
  Runnable in [`examples/`](./examples/):
229
160
 
230
- - [`basic-task/`](./examples/basic-task/) — the smallest `runAgentTask`
231
- - [`with-knowledge-readiness/`](./examples/with-knowledge-readiness/) — readiness gating + custom `onKnowledgeBlocked`
232
- - [`sanitized-telemetry/`](./examples/sanitized-telemetry/) — `createRuntimeEventCollector` + redaction policy
233
- - [`sanitized-telemetry-streaming/`](./examples/sanitized-telemetry-streaming/) — `createRuntimeStreamEventCollector` + redaction policy for `runAgentTaskStream`
161
+ - [`basic-task/`](./examples/basic-task/) — smallest `runAgentTask`
162
+ - [`with-knowledge-readiness/`](./examples/with-knowledge-readiness/) — readiness gating + `onKnowledgeBlocked`
163
+ - [`sanitized-telemetry/`](./examples/sanitized-telemetry/) — `createRuntimeEventCollector` + redaction
164
+ - [`sanitized-telemetry-streaming/`](./examples/sanitized-telemetry-streaming/) — streaming collector + redaction
234
165
  - [`sse-stream/`](./examples/sse-stream/) — Server-Sent Events for browser clients
235
- - [`sandbox-stream-backend/`](./examples/sandbox-stream-backend/) — `runAgentTaskStream` with `createSandboxPromptBackend` (synthetic sandbox client; real one in `agent-builder`)
236
- - [`openai-stream-backend/`](./examples/openai-stream-backend/) — `runAgentTaskStream` with `createOpenAICompatibleBackend`
166
+ - [`sandbox-stream-backend/`](./examples/sandbox-stream-backend/) — `createSandboxPromptBackend`
167
+ - [`openai-stream-backend/`](./examples/openai-stream-backend/) — `createOpenAICompatibleBackend`
168
+ - [`runtime-run/`](./examples/runtime-run/) — `startRuntimeRun` + cost ledger + persistence adapter (NEW)