@tangle-network/agent-runtime 0.6.0 → 0.8.0

Files changed (5)

package/README.md CHANGED

@@ -1,54 +1,32 @@
-# agent-runtime
-Reusable runtime lifecycle for domain-specific agents. Standardizes the
-task lifecycle (knowledge readiness → questions/acquisition → control loop
-→ eval) and delegates domain behavior to an adapter. Owns no domain
-policy, models, tools, connectors, or UI.
-## Contents
-- [Overview](#overview)
-- [Install](#install)
-- [Getting started](#getting-started)
-- [When to use which entry point](#when-to-use-which-entry-point)
-- [Backends for `runAgentTaskStream`](#backends-for-runagenttaskstream)
-- [Lifecycle events](#lifecycle-events)
-- [Knowledge providers](#knowledge-providers)
-- [Sanitized telemetry](#sanitized-telemetry)
-- [Package boundaries](#package-boundaries)
-- [Examples](#examples)
-## Overview
-```txt
-TaskSpec
-  → Knowledge readiness
-  → Question / acquisition decision
-  → Agent control loop (observe / validate / decide / act)
-  → Eval / verification
-  → Run evidence
-```
-For product agents that own a streaming model backend:
-```txt
-TaskSpec
-  → Knowledge readiness
-  → Session create/resume
-  → Backend stream
-  → Sanitized RuntimeStreamEvent / SSE
-```
+# @tangle-network/agent-runtime
-## Install
+Production runtime substrate for domain agents. Owns the task lifecycle
+(knowledge readiness, control loop, session resume, sanitized telemetry,
+canonical `RuntimeRunRow` persistence + cost ledger) so domain repos stop
+inventing their own.
 ```bash
 pnpm add @tangle-network/agent-runtime @tangle-network/agent-eval
 ```
-## Getting started
+## What you get
+| Entry point | When to reach for it |
+|---|---|
+| `runAgentTask` | Single-shot adapter-driven task with eval/verification |
+| `runAgentTaskStream` | Streaming product loop with session resume + backends |
+| `startRuntimeRun` | Canonical production-run row + cost ledger (NEW in 0.7.0) |
+| `createTraceBridge` | Map `RuntimeStreamEvent` → `agent-eval` `TraceEvent` (NEW in 0.7.0) |
+| `decideKnowledgeReadiness` | `ready` / `blocked` / `caveat` branch for routes / UI |
+| `createOpenAICompatibleBackend` | OpenAI-compatible streaming backend (TCloud / cli-bridge) |
+| `createSandboxPromptBackend` | Sandbox / sidecar `streamPrompt` clients |
+| `createRuntimeStreamEventCollector` | Default-redacted sanitized telemetry over a stream |
+Every public export is annotated `@stable` or `@experimental`. `@stable`
+exports do not change shape inside a minor; `@experimental` exports may
+change inside a minor and require a deliberate consumer bump.
-The smallest possible task — a domain adapter responding to one task with
-no streaming:
+## Quickstart
 ```ts
 import { runAgentTask } from '@tangle-network/agent-runtime'
@@ -63,7 +41,7 @@ const result = await runAgentTask({
     async observe() { return { /* domain state */ } },
     async validate({ state }) { return [/* eval results */] },
     async decide({ state }) {
-      return { kind: 'finish', reason: 'review complete' }
+      return { type: 'stop', pass: true, score: 1, reason: 'review complete' }
     },
     async act() { return undefined },
   },
@@ -72,165 +50,119 @@ const result = await runAgentTask({
 console.log(result.status, result.runRecords)
 ```
-Full runnable: [`examples/basic-task/`](./examples/basic-task/).
-## When to use which entry point
-| You want… | Use |
-|---|---|
-| Single-shot task with eval/verification | `runAgentTask` |
-| Streaming product loop with session resume | `runAgentTaskStream` + a backend factory |
-| Just SSE serialization for an existing readiness report | `readinessServerSentEvent` |
-| Just sanitized telemetry over an existing run | `createRuntimeEventCollector` (+ `summarizeAgentTaskRun`) for `runAgentTask`, or `createRuntimeStreamEventCollector` for `runAgentTaskStream` |
-| Stable readiness branching (`ready` / `blocked` / `caveat`) in a route | `decideKnowledgeReadiness` |
-## Backends for `runAgentTaskStream`
+## Canonical production-run lifecycle (NEW in 0.7.0)
-Three SDK-agnostic factories ship in core:
-| Factory | When |
-|---|---|
-| `createOpenAICompatibleBackend` | TCloud / OpenAI-compatible chat APIs |
-| `createSandboxPromptBackend` | Sandbox / sidecar `streamPrompt` clients |
-| `createIterableBackend` | Custom coding harnesses, browser agents |
-For [cli-bridge](https://github.com/drewstone/cli-bridge) (or any other
-OpenAI-compatible HTTP gateway), use `createOpenAICompatibleBackend` pointed
-at the gateway's `/v1/chat/completions` URL — the cli-bridge harness/model
-selector is just an OpenAI `model` string like `claude/sonnet` or
-`codex/gpt-5-codex`.
-Adapters are intentionally thin. Product repos still own client
-construction, auth, concrete tool permissions, and UI behavior. See
-[`examples/sandbox-stream-backend/`](./examples/sandbox-stream-backend/) and
-[`examples/openai-stream-backend/`](./examples/openai-stream-backend/) for
-runnable wirings.
-## Lifecycle events
-`runAgentTask` and `runAgentTaskStream` emit typed lifecycle events
-through `onEvent`:
+`startRuntimeRun` is the ONE abstraction for "the agent did a thing on
+behalf of a customer; record what it did, what it cost, how it ended."
+Replaces bespoke `agentRuns`-row helpers (legal-agent's
+`completeProductionAgentRun` + `persistRuntimeRun` pair is the canonical
+example of what this subsumes).
 ```ts
-await runAgentTask({
-  task, adapter,
-  onEvent(event) {
-    console.log(event.type)
-  },
+import { startRuntimeRun, runAgentTaskStream } from '@tangle-network/agent-runtime'
+const run = startRuntimeRun({
+  workspaceId: 'ws-1',
+  sessionId: threadId,
+  agentId: 'legal-chat-runtime',
+  taskSpec,
+  scenarioId: `legal-chat:${threadId}`,
+  adapter: { upsert: (row) => db.insert(agentRuns).values(row) },
 })
-```
-Events cover readiness, question answering, acquisition, control-loop
-steps, and task completion. Every transition is observable without
-coupling domain adapters to logging, streaming, or telemetry concerns.
-This package does **not** stream model tokens for you. Domain adapters
-and product routes still own model calls, tool execution, and token
-streaming. agent-runtime emits lifecycle events around those actions.
+for await (const event of runAgentTaskStream({ task: taskSpec, backend, input })) {
+  run.observe(event) // llm_call events update the cost ledger
+  if (event.type === 'final') {
+    run.complete({
+      status: event.status === 'completed' ? 'completed' : 'failed',
+      resultSummary: event.text ?? '',
+      error: event.status === 'failed' ? event.reason : undefined,
+    })
+  }
+}
-## Knowledge providers
+await run.persist({ runtimeEvents: telemetry.events })
+console.log(run.cost()) // { tokensIn, tokensOut, costUsd, wallMs, llmCalls }
+```
-Optional. A knowledge provider implements:
+Full runnable: [`examples/runtime-run/`](./examples/runtime-run/).
-- `buildReadiness` — score readiness against the task's required knowledge
-- `answerQuestions` — handle outstanding user questions
-- `executeAcquisitionPlans` — fetch missing evidence
-- `refreshReadiness` — rerun scoring after acquisition
+## agent-eval trace bridge (NEW in 0.7.0)
-Lets a task collect missing context before the control loop starts, then
-rerun readiness against new evidence. If readiness fails, `runAgentTask`
-stops before domain actions; adapters can override `onKnowledgeBlocked`
-to emit a domain action (asking a user, querying a connector, etc.).
+If you persist traces in agent-eval's `TraceStore`, map runtime stream
+events to `TraceEvent` once and stop hand-rolling the adapter in every
+domain repo:
-For control policies or route handlers that need a stable readiness
-branch, use `decideKnowledgeReadiness(report)` — it returns `ready`,
-`blocked`, or `caveat` plus gap IDs and the recommended action.
+```ts
+import { createTraceBridge } from '@tangle-network/agent-runtime'
-## Sanitized telemetry
+const bridge = createTraceBridge({ runId, spanId })
+for await (const event of runAgentTaskStream({ task, backend, input })) {
+  const trace = bridge.toTraceEvent(event)
+  if (trace) await traceStore.appendEvent(trace)
+}
+```
-For logs, reports, UI telemetry — never serialize raw events directly.
-Use the built-in sanitized collector:
+## Error taxonomy
-```ts
-import {
-  createRuntimeEventCollector,
-  summarizeAgentTaskRun,
-} from '@tangle-network/agent-runtime'
+Every public function throws one of:
-const telemetry = createRuntimeEventCollector()
-const result = await runAgentTask({ task, adapter, onEvent: telemetry.onEvent })
+| Error | When |
+|---|---|
+| `ValidationError` | Caller passed invalid arguments |
+| `ConfigError` | Required env / config missing |
+| `NotFoundError` | A named resource does not exist |
+| `BackendTransportError` | Backend HTTP / IPC call returned non-success |
+| `SessionMismatchError` | Resume requested against a different backend |
+| `RuntimeRunStateError` | `RuntimeRunHandle` lifecycle methods called out of order |
-console.log(telemetry.events)
-console.log(summarizeAgentTaskRun(result))
-```
+All extend `AgentEvalError` (re-exported from `@tangle-network/agent-eval`)
+and carry a stable `code` so cross-package handlers can pattern-match
+without importing the runtime.
-By default, the collector redacts task inputs, user answers, credential
-questions, control payloads, evidence IDs, task metadata, and eval
-details. Private diagnostics opt-in via `RuntimeTelemetryOptions` flags
-(`includeInputs`, `includeUserAnswers`, `includeControlPayloads`,
-`includeEvidenceIds`, `includeRequirementDescriptions`,
-`includeMetadata`, `includeEvalDetails`).
+## Sanitized telemetry
-For `runAgentTaskStream`, use the sibling
-`createRuntimeStreamEventCollector`:
+`task.intent` flows through sanitized telemetry on every event. **Never
+set it to user input** — use a fixed string describing the operation
+kind (e.g. `"Run a chat turn"`, `"Score a tax return"`). Route user-
+visible content through `task.inputs` (redacted by default).
 ```ts
-import {
-  createRuntimeStreamEventCollector,
-  runAgentTaskStream,
-} from '@tangle-network/agent-runtime'
+import { createRuntimeStreamEventCollector, runAgentTaskStream } from '@tangle-network/agent-runtime'
 const telemetry = createRuntimeStreamEventCollector()
 for await (const event of runAgentTaskStream({ task, backend })) {
   telemetry.onEvent(event)
 }
-console.log(telemetry.events)
-console.log(telemetry.summary())
+console.log(telemetry.events, telemetry.summary())
 ```
-Same `RuntimeTelemetryOptions` flags apply. Streaming and non-streaming
-events have different field shapes (timestamps, sessions, text/tool
-deltas), which is why the factories are siblings rather than overloads —
-a single dispatcher would silently misroute events whose `type` literals
-overlap (`task_start`, `readiness_end`, etc.).
-### `task.intent` is sanitized telemetry by default
-`task.intent` flows through sanitized telemetry on every event. **Never
-set it to user input** — use a fixed string describing the operation
-kind (e.g. `"Run a chat turn"`, `"Score a tax return"`). If you need to
-log user-visible intent, route it through `inputs` (which are redacted
-by default) instead.
-For SSE-over-HTTP, use the helpers:
-```ts
-import { readinessServerSentEvent } from '@tangle-network/agent-runtime'
-writer.write(encoder.encode(readinessServerSentEvent(readinessReport)))
-```
+By default the collector redacts task inputs, user answers, credential
+questions, control payloads, evidence IDs, task metadata, and eval
+details. Private diagnostics opt-in via `RuntimeTelemetryOptions`.
 ## Package boundaries
 | Package | Owns |
 |---|---|
-| `agent-runtime` | Reusable lifecycle and adapter contracts |
-| `agent-eval` | Control loops, readiness scoring, traces, evals, failure classes, optimization, release evidence |
+| `agent-runtime` | Lifecycle, adapters, backends, `RuntimeRunHandle`, trace bridge |
+| `agent-eval` | Control loops, readiness scoring, traces, evals, failure classes, release evidence |
 | `agent-knowledge` | Evidence, claims, wiki pages, retrieval, knowledge bundle builders |
 | Domain packages | Domain tools, policies, credentials, UI text, rubrics |
 The API uses `runAgentTask`, not `runVerticalAgentTask`. `domain` is
-metadata on the task, because the runtime should be reusable across many
-kinds of agents without baking taxonomy into type names.
+metadata on the task because the runtime is reusable across many kinds of
+agents without baking taxonomy into type names.
 ## Examples
 Runnable in [`examples/`](./examples/):
-- [`basic-task/`](./examples/basic-task/) — the smallest `runAgentTask`
-- [`with-knowledge-readiness/`](./examples/with-knowledge-readiness/) — readiness gating + custom `onKnowledgeBlocked`
-- [`sanitized-telemetry/`](./examples/sanitized-telemetry/) — `createRuntimeEventCollector` + redaction policy
-- [`sanitized-telemetry-streaming/`](./examples/sanitized-telemetry-streaming/) — `createRuntimeStreamEventCollector` + redaction policy for `runAgentTaskStream`
+- [`basic-task/`](./examples/basic-task/) — smallest `runAgentTask`
+- [`with-knowledge-readiness/`](./examples/with-knowledge-readiness/) — readiness gating + `onKnowledgeBlocked`
+- [`sanitized-telemetry/`](./examples/sanitized-telemetry/) — `createRuntimeEventCollector` + redaction
+- [`sanitized-telemetry-streaming/`](./examples/sanitized-telemetry-streaming/) — streaming collector + redaction
 - [`sse-stream/`](./examples/sse-stream/) — Server-Sent Events for browser clients
-- [`sandbox-stream-backend/`](./examples/sandbox-stream-backend/) — `runAgentTaskStream` with `createSandboxPromptBackend` (synthetic sandbox client; real one in `agent-builder`)
-- [`openai-stream-backend/`](./examples/openai-stream-backend/) — `runAgentTaskStream` with `createOpenAICompatibleBackend`
+- [`sandbox-stream-backend/`](./examples/sandbox-stream-backend/) — `createSandboxPromptBackend`
+- [`openai-stream-backend/`](./examples/openai-stream-backend/) — `createOpenAICompatibleBackend`
+- [`runtime-run/`](./examples/runtime-run/) — `startRuntimeRun` + cost ledger + persistence adapter (NEW)
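
Reviewer note on the new "Error taxonomy" section: the README says every error carries a stable `code` so that handlers can pattern-match without importing the runtime. A minimal sketch of what such a handler might look like follows. The code strings (`VALIDATION_ERROR`, `NOT_FOUND`, and so on), the `statusForError` helper, and the `DemoCodedError` class are all hypothetical and exist only for illustration; the diff does not show the actual `code` values, so check the published package before relying on any of them.

```ts
// Structural shape only: nothing here imports @tangle-network/agent-runtime.
type CodedError = { code: string; message?: string }

// Narrow an unknown thrown value to "object with a string `code`".
function isCodedError(err: unknown): err is CodedError {
  return (
    typeof err === 'object' &&
    err !== null &&
    typeof (err as { code?: unknown }).code === 'string'
  )
}

// HYPOTHETICAL code strings mapped to HTTP statuses. The real values are
// defined by the packages and are not visible in this diff.
const STATUS_BY_CODE = new Map<string, number>([
  ['VALIDATION_ERROR', 400],
  ['NOT_FOUND', 404],
  ['SESSION_MISMATCH', 409],
  ['BACKEND_TRANSPORT_ERROR', 502],
])

// Map any thrown value to an HTTP status, falling back to 500 for
// unknown codes and for values that carry no `code` at all.
function statusForError(err: unknown): number {
  if (!isCodedError(err)) return 500
  return STATUS_BY_CODE.get(err.code) ?? 500
}

// Stand-in error class used only to exercise the handler in this sketch.
class DemoCodedError extends Error {
  code: string
  constructor(message: string, code: string) {
    super(message)
    this.code = code
  }
}

console.log(statusForError(new DemoCodedError('bad input', 'VALIDATION_ERROR'))) // 400
console.log(statusForError(new Error('no code attached'))) // 500
```

Matching on the `code` string rather than on `instanceof` is the design choice the README appears to be pointing at: it keeps a cross-package handler free of a direct dependency on the runtime's error classes.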