@tangle-network/agent-runtime 0.5.0 → 0.5.1

@@ -1,58 +1,230 @@
- # Product Runtime Kernel Track
+ # Product Runtime Kernel
 
- This package should be useful in production because it owns the agent execution
- contract, not because it logs decorative lifecycle events.
+ Status: implemented in `@tangle-network/agent-runtime@0.5.0`; validated and
+ documented in `0.5.1`.
 
- ## Goal
+ This document tracks the production runtime kernel: what it is for, what is
+ complete, what is intentionally out of scope, and what product repos still need
+ to adopt.
 
- Provide a small, stable kernel that product routes, eval harnesses, and coding
- agent harnesses can all call:
+ ## Purpose
+
+ `agent-runtime` exists to make agent execution consistent across products and
+ eval harnesses. It should own the contract for:
+
+ - readiness gating before execution;
+ - session create/resume for long-running coding harnesses;
+ - backend-agnostic streaming;
+ - sanitized product/eval telemetry;
+ - durable evidence that can feed reports, failure classification, and
+   optimization.
+
+ It should not be a decorative event logger around unrelated product code. If a
+ product route still calls a backend directly, hand-rolls SSE, and only emits
+ `start/end`, it is not getting the full value.
+
+ ## Runtime Flow
 
  ```txt
- Task
+ TaskSpec
  -> knowledge readiness
- -> optional ask/acquire
+ -> optional ask/acquire/refresh
+ -> readiness decision
  -> session create/resume
- -> backend stream
- -> policy/eval-visible events
- -> persisted resumable evidence
+ -> execution backend stream
+ -> normalized RuntimeStreamEvent
+ -> sanitized SSE / persisted session event history
+ -> final task status
  ```
 
- ## Non-Goals
+ ## Completed API Surface
+
+ ### Execution
+
+ - `runAgentTaskStream(options)`
+   - Applies readiness before backend execution.
+   - Emits `task_start`, `readiness_start`, `readiness_end`.
+   - Stops before backend execution when blocking gaps remain.
+   - Creates or resumes a backend session.
+   - Normalizes backend output into `RuntimeStreamEvent`.
+   - Emits `backend_start`, `backend_end`, `task_end`, and `final`.
+   - Records backend stream events into an optional `RuntimeSessionStore`.
+
+ - `runAgentTask(options)`
+   - Existing control-loop path for eval-oriented agents.
+   - Still useful for deterministic eval/optimization harnesses that model
+     observe/validate/decide/act directly.
+
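The event ordering described for `runAgentTaskStream` can be sketched as a small generator. This is an illustration only, not the package's implementation: `runTaskStreamSketch`, `SketchOptions`, `checkReadiness`, `backendChunks`, the fallback session ID, and the `final` status values are hypothetical, and the generator is synchronous for brevity where a real streaming runtime would presumably be async. Whether `task_end` and `final` are emitted on a blocked run is also an assumption.

```typescript
// Hypothetical, simplified shapes; the real package's types may differ.
type ReadinessDecision = "ready" | "blocked" | "caveat";

type SketchEvent =
  | { type: "task_start" | "readiness_start" | "backend_start" | "backend_end" | "task_end" }
  | { type: "readiness_end"; decision: ReadinessDecision }
  | { type: "session_created" | "session_resumed"; sessionId: string }
  | { type: "text_delta"; text: string }
  | { type: "final"; status: "completed" | "blocked" };

interface SketchOptions {
  checkReadiness: () => ReadinessDecision; // assumed readiness hook
  backendChunks: () => Iterable<string>; // stand-in for a backend stream
  sessionId?: string;
  resume?: boolean;
}

function* runTaskStreamSketch(options: SketchOptions): Generator<SketchEvent> {
  yield { type: "task_start" };
  yield { type: "readiness_start" };
  const decision = options.checkReadiness();
  yield { type: "readiness_end", decision };

  if (decision === "blocked") {
    // Blocking gaps remain: finish without touching the backend.
    yield { type: "task_end" };
    yield { type: "final", status: "blocked" };
    return;
  }

  // Resume only when the caller asks for it and supplies a session ID.
  const resuming = options.resume === true && options.sessionId !== undefined;
  const sessionId = options.sessionId ?? "session-1"; // placeholder ID
  yield { type: resuming ? "session_resumed" : "session_created", sessionId };

  yield { type: "backend_start" };
  for (const text of options.backendChunks()) {
    // Normalize each backend chunk into a text_delta event.
    yield { type: "text_delta", text };
  }
  yield { type: "backend_end" };
  yield { type: "task_end" };
  yield { type: "final", status: "completed" };
}
```

In this sketch a blocked readiness decision yields no `backend_start` at all, which is the property the readiness gate is meant to guarantee.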
+ ### Stream Contract
+
+ - `RuntimeStreamEvent`
+   - Readiness: `readiness_start`, `readiness_end`.
+   - Context collection: `questions_start`, `questions_end`,
+     `acquisition_start`, `acquisition_end`.
+   - Session: `session_created`, `session_resumed`.
+   - Backend lifecycle: `backend_start`, `backend_end`, `backend_error`.
+   - Product stream: `text_delta`, `reasoning_delta`, `tool_call`,
+     `tool_result`, `artifact`.
+   - Completion: `task_end`, `final`.
+
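One way a consumer might work with the event groups listed above is a compile-time-checked lookup table. The event names come from the list; the `EventCategory` labels, the `EVENT_CATEGORY` table, and `countByCategory` are hypothetical names introduced for this sketch, and the real `RuntimeStreamEvent` type may carry more members (for example `task_start`) and payload fields.

```typescript
// Event names as listed in the stream contract above.
type ListedEventType =
  | "readiness_start" | "readiness_end"
  | "questions_start" | "questions_end" | "acquisition_start" | "acquisition_end"
  | "session_created" | "session_resumed"
  | "backend_start" | "backend_end" | "backend_error"
  | "text_delta" | "reasoning_delta" | "tool_call" | "tool_result" | "artifact"
  | "task_end" | "final";

// Hypothetical category labels mirroring the six groups in the list.
type EventCategory = "readiness" | "context" | "session" | "backend" | "product" | "completion";

// Record<...> makes the compiler reject the table if an event name is missing.
const EVENT_CATEGORY: Record<ListedEventType, EventCategory> = {
  readiness_start: "readiness",
  readiness_end: "readiness",
  questions_start: "context",
  questions_end: "context",
  acquisition_start: "context",
  acquisition_end: "context",
  session_created: "session",
  session_resumed: "session",
  backend_start: "backend",
  backend_end: "backend",
  backend_error: "backend",
  text_delta: "product",
  reasoning_delta: "product",
  tool_call: "product",
  tool_result: "product",
  artifact: "product",
  task_end: "completion",
  final: "completion",
};

// Tally a recorded stream by category, e.g. for a debug view.
function countByCategory(types: ListedEventType[]): Partial<Record<EventCategory, number>> {
  const counts: Partial<Record<EventCategory, number>> = {};
  for (const type of types) {
    const category = EVENT_CATEGORY[type];
    counts[category] = (counts[category] ?? 0) + 1;
  }
  return counts;
}
```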
+ ### Sessions
+
+ - `RuntimeSession`
+   - Stable `id`, backend kind, status, timestamps, optional `resumeToken`, and
+     metadata.
+
+ - `RuntimeSessionStore`
+   - Minimal persistence contract: `get`, `put`, `appendEvent`, `listEvents`.
+   - Product repos should back this with D1/Postgres/Redis/etc. for real resume.
+
+ - `InMemoryRuntimeSessionStore`
+   - Useful for tests, local demos, and short-lived worker processes.
+   - Not durable enough for production resume by itself.
+
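A minimal sketch of what a store with the four listed operations could look like. The method names `get`, `put`, `appendEvent`, and `listEvents` come from the list above; every signature, the `SketchSession` field names, the status values, and the `openSession` helper are assumptions for illustration. The methods are synchronous here for brevity, while a store backed by D1, Postgres, or Redis would almost certainly be async.

```typescript
// Hypothetical shapes; field names and status values are assumptions.
interface SketchSession {
  id: string;
  backend: string;
  status: "active" | "completed" | "failed";
  createdAt: number;
  updatedAt: number;
  resumeToken?: string;
  metadata?: Record<string, unknown>;
}

interface StoredEvent {
  type: string;
  [key: string]: unknown;
}

interface SketchSessionStore {
  get(id: string): SketchSession | undefined;
  put(session: SketchSession): void;
  appendEvent(sessionId: string, event: StoredEvent): void;
  listEvents(sessionId: string): StoredEvent[];
}

// In-memory variant: fine for tests, lost on process exit.
class MapSessionStore implements SketchSessionStore {
  private sessions = new Map<string, SketchSession>();
  private events = new Map<string, StoredEvent[]>();

  get(id: string): SketchSession | undefined {
    const session = this.sessions.get(id);
    return session ? { ...session } : undefined; // copy so callers cannot mutate stored state
  }

  put(session: SketchSession): void {
    this.sessions.set(session.id, { ...session });
  }

  appendEvent(sessionId: string, event: StoredEvent): void {
    const list = this.events.get(sessionId) ?? [];
    list.push(event);
    this.events.set(sessionId, list);
  }

  listEvents(sessionId: string): StoredEvent[] {
    return [...(this.events.get(sessionId) ?? [])];
  }
}

// Create-or-resume: reuse a stored session when the ID is already known.
function openSession(
  store: SketchSessionStore,
  id: string,
  backend: string,
  now: number
): { session: SketchSession; resumed: boolean } {
  const existing = store.get(id);
  if (existing) {
    const session = { ...existing, updatedAt: now };
    store.put(session);
    return { session, resumed: true };
  }
  const session: SketchSession = { id, backend, status: "active", createdAt: now, updatedAt: now };
  store.put(session);
  return { session, resumed: false };
}
```

Swapping `MapSessionStore` for a database-backed class with the same four methods is the part the document leaves to product repos.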
+ ### Backend Abstraction
+
+ - `AgentExecutionBackend`
+   - `start`, `resume`, `stream`, optional `stop`.
+   - SDK-agnostic: the package owns the contract, callers own concrete clients
+     and auth.
+
+ - `createIterableBackend`
+   - Escape hatch for custom harnesses, browser agents, and test doubles.
+
+ - `createSandboxPromptBackend`
+   - Wraps sandbox/sidecar clients that expose `streamPrompt`.
+   - Supports caller-provided session IDs and resume via backend `resume`.
+   - Maps common sandbox events to `text_delta`, `tool_call`, and `tool_result`.
+
+ - `createCliBridgeBackend`
+   - Posts task/message/session info to an HTTP CLI bridge.
+   - Passes `sessionId` and `resumeToken`.
+   - Parses SSE/NDJSON-style streamed responses through the common stream
+     parser.
+
+ - `createOpenAICompatibleBackend`
+   - Wraps TCloud/OpenAI-compatible `/chat/completions` streaming APIs.
+   - Normalizes streamed content deltas into `text_delta`.
+
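To illustrate the last item, here is a rough sketch of turning an OpenAI-style streamed `/chat/completions` body into `text_delta` events. It relies on the public streaming format (`data:` lines carrying JSON chunks with `choices[0].delta.content`, terminated by `data: [DONE]`). The function name `parseChatCompletionSse` and the `TextDelta` shape are hypothetical, and the sketch parses a complete body rather than handling chunk boundaries the way a real stream parser must.

```typescript
// Hypothetical normalized event shape for this sketch.
interface TextDelta {
  type: "text_delta";
  text: string;
}

// Parse a fully buffered SSE body; a real adapter would parse incrementally.
function parseChatCompletionSse(body: string): TextDelta[] {
  const events: TextDelta[] = [];
  for (const rawLine of body.split("\n")) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const data = line.slice("data:".length).trim();
    if (data === "[DONE]") break; // end-of-stream sentinel

    let chunk: unknown;
    try {
      chunk = JSON.parse(data);
    } catch {
      continue; // ignore malformed chunks in this sketch
    }
    if (typeof chunk !== "object" || chunk === null) continue;

    const content = (chunk as { choices?: Array<{ delta?: { content?: unknown } }> })
      .choices?.[0]?.delta?.content;
    if (typeof content === "string" && content.length > 0) {
      events.push({ type: "text_delta", text: content });
    }
  }
  return events;
}
```

Chunks that carry only a role or a finish reason produce no event here, so concatenating the `text` fields reconstructs the assistant message.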
+ ### Sanitization and SSE
+
+ - `sanitizeRuntimeStreamEvent(event, options)`
+   - Redacts task inputs, user answers, control payloads, metadata, artifact
+     URIs, and evidence IDs by default.
+   - Reveals payloads only through explicit diagnostic options.
+
+ - `runtimeStreamServerSentEvent(event, options)`
+   - Encodes any sanitized runtime stream event as SSE.
+   - Prevents every product route from hand-rolling inconsistent framing.
+
+ - Existing helpers remain:
+   - `sanitizeAgentRuntimeEvent`
+   - `createRuntimeEventCollector`
+   - `readinessServerSentEvent`
+   - `encodeServerSentEvent`
+
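The redact-by-default and SSE framing ideas above might look roughly like the following. This is a sketch under assumptions, not the package's code: `sanitizeSketchEvent`, `encodeSse`, the single `payload` field, and the `"[redacted]"` placeholder are hypothetical. The `includeControlPayloads` option name is taken from the validation list further down in this diff. The frame layout (`event:` line, `data:` line, blank line) follows the standard Server-Sent Events format.

```typescript
// Hypothetical minimal event shape with one sensitive field.
interface SketchStreamEvent {
  type: string;
  payload?: unknown;
}

interface SanitizeOptions {
  includeControlPayloads?: boolean; // explicit opt-in for diagnostics
}

// Redact the payload unless the caller explicitly opts in.
function sanitizeSketchEvent(
  event: SketchStreamEvent,
  options: SanitizeOptions = {}
): SketchStreamEvent {
  if (options.includeControlPayloads || event.payload === undefined) {
    return { ...event };
  }
  return { ...event, payload: "[redacted]" };
}

// Encode one SSE frame. Newlines in the event name would let a value inject
// extra fields, so they are collapsed to spaces; JSON.stringify already
// escapes newlines inside the data.
function encodeSse(eventName: string, data: unknown): string {
  const safeName = eventName.replace(/[\r\n]+/g, " ");
  return `event: ${safeName}\ndata: ${JSON.stringify(data)}\n\n`;
}
```

A route would then write `encodeSse(event.type, sanitizeSketchEvent(event))` to the response for each event, keeping framing and redaction in one place.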
+ ## Validation Matrix
+
+ Implemented test coverage in `tests/runtime.test.ts`:
+
+ - Ready task runs through the existing control lifecycle.
+ - Missing blocking knowledge stops before action.
+ - Knowledge question/acquisition hooks refresh readiness before control.
+ - Sanitized runtime telemetry redacts secrets by default.
+ - Readiness decisions return stable `ready`, `blocked`, and `caveat` states.
+ - SSE encoding strips unsafe control-field newlines.
+ - Readiness SSE payloads use sanitized reports.
+ - `runAgentTaskStream` blocks backend execution when readiness is missing.
+ - Streaming backend creates a session, persists events, and resumes by
+   `sessionId`.
+ - Sanitized tool-call stream events hide payloads by default and reveal them
+   only with `includeControlPayloads`.
+ - Sandbox prompt events map to text/tool runtime stream events.
+ - OpenAI-compatible streaming chat completions parse token deltas and produce a
+   final completed event.
+
+ Release verification:
+
+ - `pnpm test`
+ - `pnpm typecheck`
+ - `pnpm build`
+ - Published to npm as `@tangle-network/agent-runtime@0.5.0`.
+ - Documentation validation published in `@tangle-network/agent-runtime@0.5.1`.
+
+ ## Critique
+
+ The runtime kernel is now materially useful, but it is not magic. The most
+ important limitations are deliberate:
+
+ - It does not construct TCloud, sandbox, or CLI bridge clients. Product repos
+   own credentials and client lifecycle.
+ - It does not persist sessions durably unless a product supplies a durable
+   `RuntimeSessionStore`.
+ - It does not enforce all budgets/approvals/tool policies by itself yet. Those
+   still live in product adapters or `agent-eval` control loops.
+ - It does not guarantee backend resume works if the underlying backend cannot
+   resume. It passes stable session IDs/resume tokens and records history; the
+   backend must honor them.
+ - It does not replace domain-specific wrappers. Tax/legal/GTM/creative still
+   need their own requirements, tools, prompts, and report semantics.
+
+ These constraints are correct for a public package. The core should define the
+ contract and provide high-quality adapters, not absorb private product code.
+
+ ## Downstream Adoption Checklist
+
+ For product routes:
+
+ - Replace direct sandbox/CLI/TCloud stream loops with `runAgentTaskStream`.
+ - Forward `runtimeStreamServerSentEvent(event)` to UI.
+ - Preserve legacy UI events only as compatibility shims.
+ - Store `RuntimeSession` and `RuntimeStreamEvent[]` in the product database.
+ - Pass `sessionId` and `resume: true` for continuation.
+ - Persist `final.status`, readiness decision, and backend kind in run records.
+
+ For coding harnesses:
+
+ - Use `createSandboxPromptBackend`, `createCliBridgeBackend`, or a custom
+   `AgentExecutionBackend`.
+ - Require a stable `sessionId` for any long-running workspace.
+ - Surface `session_resumed` in telemetry so product/debug views can distinguish
+   continuation from a fresh run.
+ - Treat missing session state as a recoverable backend/runtime failure, not a
+   prompt failure.
+
+ For eval and optimization:
 
- - Do not make `agent-runtime` depend directly on private app code.
- - Do not force one model SDK, sandbox SDK, or CLI bridge implementation.
- - Do not hide domain tools, prompts, credentials, or UI policy in this package.
- - Do not replace token/tool streaming. Normalize it.
+ - Attach readiness decisions and stream session metadata to `RunRecord.raw`.
+ - Classify missing knowledge/runtime/session failures separately from prompt or
+   reasoning failures.
+ - Do not optimize prompts when dominant failures are missing context, bad
+   retrieval, missing credentials, or broken backend resume.
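The eval guidance above (classify infrastructure failures separately, and skip prompt optimization when they dominate) could be approximated as follows. Everything here is a hypothetical sketch: the `RunEvidence` shape, the `FailureClass` labels, the rule ordering, and the strict-majority threshold are assumptions, not part of the package or of `agent-eval`.

```typescript
// Hypothetical failure labels for this sketch.
type FailureClass = "missing_knowledge" | "session" | "backend" | "prompt_or_reasoning";

// Hypothetical per-run evidence, as it might be derived from stored stream events.
interface RunEvidence {
  readinessDecision: "ready" | "blocked" | "caveat";
  eventTypes: string[];
  resumeRequested: boolean;
  finalStatus: "completed" | "failed" | "blocked";
}

// Returns null for successful runs; checks infrastructure causes first.
function classifyFailure(run: RunEvidence): FailureClass | null {
  if (run.finalStatus === "completed") return null;
  if (run.readinessDecision === "blocked") return "missing_knowledge";
  if (run.resumeRequested && !run.eventTypes.includes("session_resumed")) return "session";
  if (run.eventTypes.includes("backend_error")) return "backend";
  return "prompt_or_reasoning";
}

// Optimize prompts only when prompt/reasoning failures are a strict majority
// of all failures (the threshold is an assumption).
function shouldOptimizePrompts(runs: RunEvidence[]): boolean {
  const failures = runs
    .map(classifyFailure)
    .filter((cls): cls is FailureClass => cls !== null);
  if (failures.length === 0) return false;
  const promptLike = failures.filter((cls) => cls === "prompt_or_reasoning").length;
  return promptLike * 2 > failures.length;
}
```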
 
- ## Runtime Kernel Requirements
+ ## Completed Downstream Proof
 
- - `RuntimeStreamEvent` is the canonical product/eval stream shape.
- - `AgentExecutionBackend` wraps TCloud, CLI bridge, sandbox SDK, browser drivers,
-   local harnesses, or custom agent loops.
- - `RuntimeSessionStore` persists session handles and event history so coding
-   harnesses can resume instead of starting over.
- - `runAgentTaskStream` applies readiness before backend execution, then streams
-   normalized backend events.
- - Sanitizers redact task inputs, credentials, answers, payloads, and evidence by
-   default.
- - SSE helpers encode any runtime stream event without apps hand-rolling framing.
+ `agent-builder` has a product-path proof in PR #61:
 
- ## Backend Shape
+ - Bumps `@tangle-network/agent-runtime` to `^0.5.0`.
+ - Routes sandbox chat through `runAgentTaskStream`.
+ - Uses `createSandboxPromptBackend`.
+ - Emits sanitized runtime stream SSE.
+ - Adds runtime session IDs to the compatibility `done` event.
 
- Backends should be thin adapters over real clients:
+ That validates the package against a real sandbox-backed product route, not only
+ unit tests.
 
- - TCloud/simple chat: message in, text deltas/final out.
- - CLI bridge: session id/resume token in, terminal/tool/text events out.
- - Sandbox SDK: existing sandbox id/session id in, `streamPrompt` events out.
+ ## Remaining Work
 
- The package owns the contract; callers own the concrete clients and auth.
+ This is downstream work, not missing kernel work:
 
- ## Acceptance
+ - Add durable `RuntimeSessionStore` implementations in product repos.
+ - Convert CLI bridge routes/harnesses to `createCliBridgeBackend`.
+ - Convert simple TCloud chat routes to `createOpenAICompatibleBackend` where
+   useful.
+ - Store runtime stream events in product trace/run-record tables.
+ - Add UI affordances for session resume/continuation and readiness blockers.
+ - Extend failure classifiers to consume `RuntimeStreamEvent` evidence directly.
 
- - A product route can call `runAgentTaskStream()` and forward SSE.
- - A sandbox or CLI bridge run can resume by passing `sessionId`.
- - Stream events distinguish readiness, session, text, tool, artifact, error, and
-   final states.
- - Domain repos can keep their wrappers while using the same backend/session
-   primitives.
+ The kernel is complete enough to adopt broadly. The next value comes from
+ removing bespoke product stream loops and using the same runtime contract
+ everywhere.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@tangle-network/agent-runtime",
-   "version": "0.5.0",
+   "version": "0.5.1",
    "description": "Reusable runtime lifecycle for domain-specific agents.",
    "homepage": "https://github.com/tangle-network/agent-runtime#readme",
    "repository": {