stable-harness 0.0.7 → 0.0.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (39)
  1. package/README.md +10 -0
  2. package/docs/0.1.0-p0-runtime-control-plane-plan.zh.md +171 -0
  3. package/docs/0.1.0-retry-policy.zh.md +87 -0
  4. package/docs/0.1.0-stable-runtime-development-roadmap.zh.md +393 -0
  5. package/docs/0.1.0-tool-guard-benchmark.zh.md +42 -0
  6. package/docs/adapter-contract.md +199 -0
  7. package/docs/architecture/backend-comparison.md +41 -0
  8. package/docs/architecture/runtime-events.md +263 -0
  9. package/docs/architecture/runtime-events.zh.md +248 -0
  10. package/docs/architecture/system-architecture.zh.md +435 -0
  11. package/docs/compatibility-matrix.md +139 -0
  12. package/docs/engineering-rules.md +111 -0
  13. package/docs/evaluation/0.1.0-bfcl-targeted-model-matrix.zh.md +1632 -0
  14. package/docs/evaluation/0.1.0-bfcl-targeted-review-matrix.zh.md +1952 -0
  15. package/docs/evaluation/0.1.0-bfcl-tool-guard.zh.md +1427 -0
  16. package/docs/granite-tool-calling-comparison.zh.md +206 -0
  17. package/docs/guides/getting-started.md +126 -0
  18. package/docs/guides/index.md +40 -0
  19. package/docs/guides/integration-guide.md +126 -0
  20. package/docs/guides/operator-runbook.md +153 -0
  21. package/docs/guides/workspace-authoring.md +212 -0
  22. package/docs/implementation-blueprint.md +233 -0
  23. package/docs/memory/0.1.0-memory-design.zh.md +719 -0
  24. package/docs/memory/0.1.0-step-09-deepagents-native-memory.zh.md +146 -0
  25. package/docs/memory/0.1.0-step-09-langmem-shaped-provider.zh.md +169 -0
  26. package/docs/memory/0.1.0-step-09-memory-adapter-projection.zh.md +123 -0
  27. package/docs/memory/0.1.0-step-09-memory-contract.zh.md +169 -0
  28. package/docs/memory/0.1.0-step-09-memory-governance-approval.zh.md +143 -0
  29. package/docs/memory/0.1.0-step-09-memory-lifecycle-hooks.zh.md +150 -0
  30. package/docs/memory/0.1.0-step-09-memory-maintenance-boundary.zh.md +118 -0
  31. package/docs/memory/0.1.0-step-09-memory-persistence-boundary.zh.md +118 -0
  32. package/docs/product/adoption-playbook.md +145 -0
  33. package/docs/product/market-positioning.md +137 -0
  34. package/docs/product-boundary.md +258 -0
  35. package/docs/protocols/http-runtime.md +37 -0
  36. package/docs/protocols/langgraph-compatible.md +107 -0
  37. package/docs/protocols/openai-compatible.md +121 -0
  38. package/docs/tooling/0.1.0-bettercall-tool-quality.zh.md +231 -0
  39. package/package.json +3 -1
@@ -0,0 +1,137 @@
# Market Positioning

Stable Harness sits between agent frameworks and production applications.

It is not trying to replace the framework that decides what an agent does next.
It gives application teams the runtime boundary they need once the prototype
must become inspectable, governable, recoverable, and callable through stable
interfaces.

## Category

Stable Harness is a stable agent application runtime and operator control plane.

It combines:

- workspace inventory
- runtime lifecycle
- tool-gateway reliability
- event traces
- governance hooks
- memory lifecycle
- protocol access
- backend adapters

## The Problem

Agent frameworks usually focus on execution semantics:

- model calls
- planning loops
- tool calls
- graph transitions
- delegation
- memory primitives

Production applications need additional surfaces:

- repeatable workspace definition
- request and session lifecycle
- operator inspection
- approval and sandbox policy
- tool repair and validation before execution
- trace and artifact capture
- protocol facades for existing clients
- backend portability without rewriting the product

Stable Harness owns those production surfaces.

## What It Is Not

Stable Harness is not:

- a new planning framework
- a LangGraph replacement
- a DeepAgents replacement
- a model router based on prompt keywords
- a hosted product by itself
- an unbounded tool-call repair layer

The boundary matters because users should be able to trust that upstream
framework semantics remain upstream-native.

## Competitive Map

| Category | Examples | Stable Harness Position |
| --- | --- | --- |
| Agent execution frameworks | DeepAgents, LangGraph, OpenAI Agents SDK | Keep them. Stable Harness wraps their runtime boundary. |
| Protocol gateways | OpenAI-compatible servers, MCP servers | Stable Harness exposes protocols over one workspace runtime. |
| Workflow engines | LangGraph workflows, custom DAG runners | Stable Harness can expose explicit topology while preserving adapter ownership. |
| Observability tools | tracing and logging platforms | Stable Harness emits runtime evidence that can feed those systems. |
| Governance systems | approval queues, policy engines, sandbox managers | Stable Harness provides runtime hooks and policy boundaries for agent work. |

## Differentiation

### Passthrough-first adapters

Stable Harness does not copy every backend feature into core runtime. It keeps a
thin adapter boundary and passes through upstream-native behavior when the
backend already owns it.

### YAML workspace inventory

Agents, models, tools, memory, workflows, and protocol exposure live in
workspace config. This makes agent applications easier to inspect, move, review,
and operate.

### Tool reliability at the runtime edge

Tool calls are validated and can be repaired before execution. The gateway
protects execution with inventory, schemas, semantic validators, and governance
policy.

### Operator control plane

Applications can inspect requests, sessions, events, artifacts, approvals,
memory lifecycle, and runs through stable runtime surfaces.

### Multi-protocol access

The same workspace can be used through CLI, SDK, HTTP, and OpenAI-compatible
clients without creating separate execution paths.

## Buyer Narrative

For a product team:

> We already have an agent prototype. Stable Harness makes it shippable by adding
> runtime inventory, sessions, traces, governance, memory lifecycle, and protocol
> access without rewriting the backend.

For a platform team:

> Stable Harness gives us one runtime contract across multiple agent apps while
> letting each team keep its backend framework.

For an engineering leader:

> Stable Harness reduces the cost of operating agent systems by making behavior
> inspectable, recoverable, testable, and governed through explicit runtime
> boundaries.

## Claims To Avoid

Avoid claims that imply Stable Harness owns correctness it cannot guarantee:

- "agents always call tools correctly"
- "drop-in production for every agent"
- "framework-independent behavior with no adapter work"
- "automatic routing from natural language"

Use claims that are true:

- "stable runtime boundary for agent workspaces"
- "framework-generic operator control plane"
- "validated and repairable tool gateway"
- "YAML-defined inventory and protocol exposure"
- "passthrough-first backend adapters"
@@ -0,0 +1,258 @@
# Stable Harness Product Boundary

`stable-harness` is a generic stable application runtime and operator control plane for agent workspaces.

It is not a new agent framework. It wraps upstream frameworks such as DeepAgents, LangChain v1, OpenAI Agents SDK, Gemini SDK, and future runtimes through adapters.

DeepAgents is the first backend target. The current DeepAgents feature set must be audited before adding any overlapping `stable-harness` behavior.

## Product Mission

The product should let a team define a production agent workspace in YAML, choose an execution backend, and get stable runtime operations without rewriting the backend framework.

The product must stay framework-generic. A native runtime capability is valid only when it is useful across workspaces and backends, has a stable owner, and does not duplicate an upstream execution primitive.

The public surface should stay small:

- load a workspace
- start a runtime
- run a request
- inspect runs, events, approvals, memory, and artifacts
- stop the runtime

Everything else should be configured through YAML or handled internally by the runtime.

## Complete Product Boundary

### 1. YAML-defined workspace

The primary product surface is a workspace folder with Kubernetes-style YAML.

The workspace owns:

- agents
- models
- tools
- MCP servers
- skills and resources
- routing
- backend adapter selection
- memory policy
- approval policy
- sandbox policy
- recovery policy
- protocol exposure
- maintenance policy

The YAML should express upstream concepts directly when the upstream framework already defines them.
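As an illustration of the workspace surface described above, a minimal agent document might look like the following sketch. The `kind: Agent` shape follows the document's own `apiVersion: stable-harness.dev/v1` convention, but the `spec` field names here are illustrative assumptions, not the published schema:

```yaml
# Hypothetical sketch only: spec field names are illustrative, not the published schema.
apiVersion: stable-harness.dev/v1
kind: Agent
metadata:
  name: support-agent
spec:
  backend: deepagents          # backend adapter selection
  model: models/default        # reference into workspace model inventory
  tools:
    - tools/search             # inventory references, not inline definitions
  approvalPolicy: require-human  # governance lives in workspace config
```

The point of the sketch is that everything is an inventory reference: the agent names a backend, a model, tools, and policies that the workspace defines elsewhere.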

### 2. Pluggable backend runtime adapters

The product contract must not be DeepAgents-shaped, LangGraph-shaped, Microsoft-Agent-Framework-shaped, or downstream-workspace-shaped.

The runtime should support backend adapters such as:

- DeepAgents
- LangChain v1 or LangGraph agents
- OpenAI Agents SDK
- Gemini client SDK
- local model runtimes
- customer-owned internal frameworks

Adapters translate stable runtime requests into upstream framework calls. They should be internal integration layers, not new public semantics.

The default adapter strategy is passthrough first:

- expose upstream-native primitives through typed workspace config when needed
- preserve upstream execution semantics instead of normalizing them into a local imitation
- add only small runtime wrappers for lifecycle, governance, observability, persistence, replay, protocol access, or operator inspection
- keep backend-specific options behind the adapter boundary unless they are intentionally part of workspace configuration

Every adapter feature should be classified before implementation:

- `passthrough`: the upstream framework already owns the capability
- `runtime wrapper`: `stable-harness` adds lifecycle, governance, observability, persistence, or protocol access around upstream execution
- `plugin capability`: the feature is runtime-owned but optional and replaceable
- `downstream workspace`: the feature is application-specific and should not enter the generic runtime
- `do not build`: the feature would duplicate upstream agent execution semantics
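The classification above can be made concrete as a typed review gate. This is only a sketch: `AdapterFeatureClass`, `FeatureDecision`, and the example entries are illustrative names, not the `stable-harness` API.

```typescript
// Hypothetical sketch: names are illustrative, not the stable-harness API.
type AdapterFeatureClass =
  | "passthrough"
  | "runtime-wrapper"
  | "plugin-capability"
  | "downstream-workspace"
  | "do-not-build";

interface FeatureDecision {
  feature: string;
  class: AdapterFeatureClass;
  rationale: string;
}

// A review table like this could gate adapter work before implementation.
const decisions: FeatureDecision[] = [
  {
    feature: "deepagents.subagents",
    class: "passthrough",
    rationale: "upstream already owns subagent planning semantics",
  },
  {
    feature: "run persistence",
    class: "runtime-wrapper",
    rationale: "lifecycle and persistence wrapped around upstream execution",
  },
  {
    feature: "local planning loop",
    class: "do-not-build",
    rationale: "would duplicate upstream agent execution semantics",
  },
];

// Only passthrough and runtime-wrapper features reach core runtime review.
const buildable = decisions.filter(
  (d) => d.class === "passthrough" || d.class === "runtime-wrapper"
);
```

Writing the gate down as data keeps the decision auditable: a feature cannot land in core runtime without an explicit classification.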

No DeepAgents feature should be rebuilt locally when upstream passthrough or upstream-native config is sufficient.

The same standard applies to every backend. Do not recreate OpenAI Agents SDK, Gemini SDK, LangGraph, Microsoft Agent Framework, or customer-framework concepts inside core runtime when an adapter can pass them through.

### 2.1 Agent graph overlay

The native user-facing definition should stay `Agent`. DeepAgents agents keep
their existing definition shape. LangGraph agents may append `edges` to connect
existing workspace inventory such as tools, skills, and subagents; the edge list
is topology, not a new execution language.

Standalone workflow documents are an operator/control-plane compatibility
surface for explicit graph inventory, inspection, and migration. They must not
become the default product concept when an `Agent` with backend-native graph
configuration is sufficient.

The first implementation layer should stay small:

- load and validate optional Agent edges for graph-capable backends
- verify graph edges against existing Agent inventory references
- render graph structure for inspection without redefining inventory
- expose explicit graph routing tables as typed runtime inventory when needed
- dispatch graph-capable agents to a pluggable backend adapter

Graph routing is a static control-plane table. It may name an explicit graph
route when a protocol needs it, but it must not infer a route from user prose,
prompt keywords, downstream domains, tool names, or benchmark cases.

When a workflow is compiled to LangGraph, Microsoft Agent Framework, or another backend, that compiler remains an adapter capability. Core runtime keeps lifecycle, governance, observability, persistence, request IDs, session IDs, events, and protocol ownership.

The LangGraph adapter is a concrete example of this boundary: it compiles the
workflow topology to upstream LangGraph and calls injected node handlers. It does
not decide what an `agents.*`, `tools.*`, or `skills.*` node means by itself.
Sub-workflows use the same rule. A `workflows.*` node is just an inventory
reference until a workflow adapter explicitly opts in and supplies bounded
execution behavior.

### 3. Long-term memory lifecycle

`stable-harness` owns runtime-wide memory lifecycle:

- memory namespace management
- memory persistence policy
- recall orchestration
- import, export, backup, and compaction
- memory observability
- cross-run memory governance

When a backend has native memory primitives, the adapter should use them directly. Runtime memory is an operational substrate, not a replacement for upstream memory semantics.

### 4. Runtime stability layer

The runtime should make upstream frameworks production-safe without reimplementing agent logic.

It owns:

- persisted runs and threads
- checkpoint discovery and recovery
- retry and resume lifecycle
- cancellation and timeout handling
- event normalization
- artifact capture
- structured error records
- background maintenance

It must not infer execution behavior from natural-language text.

### 5. Multi-protocol access

The same runtime should be reachable through multiple protocol surfaces:

- in-process SDK
- CLI
- HTTP server
- MCP
- ACP
- A2A
- AG-UI
- future protocol adapters

Protocol adapters expose stable runtime concepts. They should not expose backend-specific internal details unless explicitly required for compatibility.

### 6. Operator control plane

The runtime should give operators direct control over application lifecycle:

- list active and historical runs
- inspect events
- inspect pending approvals
- approve, deny, cancel, or retry work
- view memory and artifacts
- inspect backend health
- run maintenance jobs

Users should think in terms of runs, requests, approvals, events, and outcomes rather than raw checkpoint internals.

### 7. Governance and sandbox policy

The runtime owns application-level governance:

- approval decisions
- sandbox selection
- resource limits
- secret access policy
- tool execution policy
- audit records
- tenant or workspace isolation

Policy decisions must be driven by typed config, tool metadata, runtime state, approval state, or explicit request metadata.

### 8. Tool, MCP, skill, and resource integration

The runtime owns application inventory and registration:

- local tools
- MCP tools
- remote tools
- skills
- files and workspace resources
- artifact stores
- credentials and secret references

Tool execution semantics still belong to the selected backend or tool gateway.

Each runtime inventory capability must stay independently pluggable. A tool registry, MCP registry, skill registry, artifact store, memory store, approval policy, sandbox policy, event sink, replay store, or protocol surface should be replaceable without forcing a different backend adapter.

Pluggability is a design gate, not an implementation detail. A new capability should have a narrow interface, explicit config, focused tests, and a replacement point. If enabling one capability silently requires unrelated memory, approval, replay, protocol, or adapter behavior, the design is too coupled.
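One way to read the pluggability gate above as code: each capability gets a narrow interface and a single replacement point. The interface and function names here are illustrative assumptions, not the published contract:

```typescript
// Hypothetical sketch: a narrow, independently replaceable capability interface.
interface ToolRegistry {
  list(): string[];
  resolve(toolId: string): { schema: object } | undefined;
}

interface RuntimeCapabilities {
  toolRegistry: ToolRegistry;
  // other capabilities (memory store, event sink, ...) plug in the same way
}

// The replacement point: swapping the registry must not force a different
// backend adapter or drag in unrelated memory/approval behavior.
function createRuntimeCapabilities(
  overrides: Partial<RuntimeCapabilities> = {}
): RuntimeCapabilities {
  const defaultRegistry: ToolRegistry = {
    list: () => [],
    resolve: () => undefined,
  };
  return { toolRegistry: defaultRegistry, ...overrides };
}
```

A workspace-specific registry can then be supplied as `createRuntimeCapabilities({ toolRegistry: myRegistry })` without touching any other subsystem.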

### 9. Evaluation, replay, traces, and artifacts

The runtime should make production behavior inspectable and repeatable:

- event traces
- tool traces
- replay manifests
- regression cases
- evaluation datasets
- artifact capture
- run export and import

Replay should use structured events and recorded runtime state. It should not rebuild intent by parsing final prose.

### 10. Distribution and scaffold experience

The project should ship a clean day-one experience:

- `stable-harness init`
- example workspaces
- typed packages
- minimal SDK
- clear adapter templates
- local development scripts
- verification scripts

The default scaffold should teach the product boundary by example.

## Anti-goals

`stable-harness` must not:

- become a third agent execution framework
- reimplement DeepAgents, LangChain, or LangGraph semantics
- reimplement Microsoft Agent Framework, OpenAI Agents SDK, Gemini SDK, or any customer backend semantics
- invent a second subagent planning language
- introduce a stable-owned concept when an upstream primitive can be passed through
- use natural-language keyword matching to drive runtime control flow
- synthesize tool calls from TODO text
- locally replay upstream custom tool calls
- hardcode downstream domains, tickers, tools, or product-specific workflows
- expose raw checkpoint manipulation as a primary product API
- mirror every upstream helper export as product surface
- bundle unrelated runtime capabilities into one non-replaceable subsystem

## Boundary Rule

Upstream frameworks own agent-level execution behavior.

`stable-harness` owns application-level runtime orchestration, lifecycle, observability, governance, and protocol access.

When a feature sits near the boundary, the default answer is: pass through upstream execution semantics, then add only a small optional runtime capability around it.
@@ -0,0 +1,37 @@
# HTTP Runtime Protocol

The native HTTP server exposes stable runtime state and request submission.

`POST /requests` is a protocol adapter over `RuntimeRequest`. It may pass these
stable fields through to the runtime:

- `input`
- `agentId`
- `requestId`
- `sessionId`
- `parentRunId`
- `metadata`
- `memory`
- `toolCall`
- `workflow`

The endpoint must not infer backend behavior, choose tools from prose, or
rewrite workflow routes. Tool execution and workflow dispatch remain explicit
runtime requests.
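The whitelist behavior described above can be sketched as a small adapter step. `buildRuntimeRequest` is an illustrative name, not a published helper; it forwards only the stable `RuntimeRequest` fields and drops anything else a client might attach:

```typescript
// Hypothetical sketch: forward only the stable RuntimeRequest fields.
const STABLE_FIELDS = [
  "input",
  "agentId",
  "requestId",
  "sessionId",
  "parentRunId",
  "metadata",
  "memory",
  "toolCall",
  "workflow",
] as const;

type StableField = (typeof STABLE_FIELDS)[number];

function buildRuntimeRequest(
  body: Record<string, unknown>
): Partial<Record<StableField, unknown>> {
  const request: Partial<Record<StableField, unknown>> = {};
  for (const field of STABLE_FIELDS) {
    // Unknown fields are dropped here, so the endpoint cannot be coaxed
    // into inferring backend behavior from extra client input.
    if (field in body) request[field] = body[field];
  }
  return request;
}
```

Under this sketch, a stray client field never reaches the runtime; tool execution and workflow dispatch still require the explicit `toolCall` and `workflow` fields.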

Inspection endpoints expose normalized runtime state:

- `GET /inspect`
- `GET /requests`
- `GET /requests/:id`
- `GET /runs/:id/trace`
- `GET /sessions`
- `GET /workflows`
- `GET /workflows/:id/mermaid`
- `GET /workflows/:id/plan`

When the workspace enables `specDrivenWorkflow`, `GET /inspect` exposes the
configured spec-driven policy summary, and `GET /requests/:id` includes the
run-derived `specDrivenWorkflow` phase state. That state is projected from
stored `runtime.specDriven.phase.*` events and preserves the raw events in the
request timeline.
@@ -0,0 +1,107 @@
# LangGraph-Compatible Server

`stable-harness start` starts the official LangGraph Agent Server when
`protocols.langgraph` is enabled. The LangGraph HTTP protocol, assistant API,
thread API, run API, persistence model, and streaming behavior remain owned by
upstream LangGraph.

`stable-harness` owns only the runtime assembly:

- read YAML workspace inventory
- decide which agents to expose
- generate the thin LangGraph graph entry file
- call the official `@langchain/langgraph-api/server` startup API
- keep OpenAI-compatible and LangGraph-compatible services under one runtime
  start command

## Defaults

`stable-harness start` starts both services by default:

| Service | Default host | Default port |
| --- | --- | --- |
| OpenAI-compatible | `127.0.0.1` | `8642` |
| LangGraph-compatible | `127.0.0.1` | `2024` |

Configure or disable services from runtime YAML:

```yaml
apiVersion: stable-harness.dev/v1
kind: Runtime
spec:
  protocols:
    openaiCompatible:
      enabled: true
      host: 127.0.0.1
      port: 8642
      bearerToken: ${env:STABLE_HARNESS_OPENAI_API_KEY:-}
    langgraph:
      enabled: true
      host: 127.0.0.1
      port: 2024
      nWorkers: 10
      env: .env
      exposeAgents:
        - orchestra
```

If `exposeAgents` is omitted, all YAML agents are exposed as LangGraph graph
IDs.

Then run:

```bash
stable-harness start -w "$PWD"
```

## LangSmith and Studio

The LangGraph service is started through the official
`@langchain/langgraph-api/server`, so LangSmith Studio and LangSmith tracing use
upstream LangGraph behavior. `stable-harness` only loads the environment before
that server starts.

By default, `stable-harness start` reads `.env` from the workspace root for the
LangGraph service. Existing shell environment variables win over `.env` values.
For Studio, put your key in the workspace `.env` file:

```text
LANGSMITH_API_KEY=lsv2...
LANGSMITH_TRACING=true
LANGSMITH_PROJECT=stable-harness-local
```
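The precedence rule above (shell variables win over `.env` values) can be sketched as a small merge. `mergeEnv` is an illustrative name, not the actual loader:

```typescript
// Hypothetical sketch: shell environment variables win over .env file values.
function mergeEnv(
  dotenv: Record<string, string>,
  shell: Record<string, string | undefined>
): Record<string, string> {
  const merged: Record<string, string> = { ...dotenv };
  for (const [key, value] of Object.entries(shell)) {
    if (value !== undefined) merged[key] = value; // shell overrides .env
  }
  return merged;
}
```

So exporting `LANGSMITH_PROJECT` in your shell before `stable-harness start` would override the value in the workspace `.env` file.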

You can choose a different file or inline environment through runtime YAML:

```yaml
apiVersion: stable-harness.dev/v1
kind: Runtime
spec:
  protocols:
    langgraph:
      enabled: true
      env: .env.langsmith
      # or:
      # envFile: .env.langsmith
      # env:
      #   LANGSMITH_PROJECT: local-studio
```

When the server prints the LangGraph URL, open Studio with:

```text
https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
```

## Generated Bridge

The official LangGraph server requires each graph to be loaded from a file
entrypoint. `stable-harness start` generates that file under:

```text
.stable-harness/langgraph/bridge.mjs
```

The generated file exports one graph per exposed YAML agent. Each graph is a
thin adapter that calls the stable runtime for the selected `agentId`. It does
not implement the LangGraph HTTP protocol.
@@ -0,0 +1,121 @@
# OpenAI-Compatible Protocol Facade

`stable-harness` exposes OpenAI-compatible endpoints as protocol adapters over the stable runtime contract.

This surface is for client compatibility. It is not a model provider, a backend adapter, or a new execution framework.

## Supported MVP Surface

- `GET /v1/models`
- `GET /v1/capabilities`
- `POST /v1/chat/completions`
- `stream: true` Server-Sent Events for chat completion chunks
- `stable_harness.tool.progress` SSE events for runtime tool progress
- `stable_harness.progress.narration` SSE events for runtime progress narration

The `model` field maps to workspace agent IDs. `/v1/models` lists all workspace agents by default.

## Local Launch

From a workspace checkout:

```sh
stable-harness start -w "$PWD" --port 8642 --api-key change-me-local-dev
```

For embedded applications, the server can read auth settings from runtime YAML:

```yaml
apiVersion: stable-harness.dev/v1
kind: Runtime
metadata:
  name: app-runtime
spec:
  protocols:
    openaiCompatible:
      bearerToken: ${env:STABLE_HARNESS_API_KEY}
```

The application can then use:

```ts
const runtime = await createStableHarnessRuntime(workspaceRoot);
const server = createOpenAiCompatibleHttpServer(runtime);
```

and point OpenAI-compatible clients at:

```text
http://127.0.0.1:8642/v1
```

For local-only development, auth is optional. When binding beyond loopback, run behind a trusted network boundary and set a bearer token in runtime YAML or pass `--api-key` as an override.

## Runtime Boundary

The protocol adapter calls:

```ts
runtime.request({
  input,
  agentId,
  metadata: { protocol: "openai-compatible" }
});
```

It must not:

- create backend-specific model clients
- bypass workspace agent selection
- execute client-supplied tools directly
- mutate DeepAgents or other backend adapter behavior
- add prompt, keyword, or domain-specific routing logic

Client `tools` and `tool_choice` fields are ignored by this facade. Runtime tools still come from workspace inventory, governance policy, and the tool gateway.

## Message Mapping

`messages` are flattened into a text transcript:

```text
system: ...

user: ...

assistant: ...
```

Text content parts are preserved. Image URL parts are represented as `[image:<url>]` until a backend-native multimodal projection is added.

Unsupported content types return an OpenAI-shaped `invalid_request_error`.
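The flattening described above can be sketched as a pure function. The type and function names are illustrative, not the facade's internal API:

```typescript
// Hypothetical sketch of the message-to-transcript mapping described above.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string | ContentPart[];
}

function flattenMessages(messages: ChatMessage[]): string {
  const lines = messages.map((message) => {
    const text =
      typeof message.content === "string"
        ? message.content
        : message.content
            .map((part) =>
              part.type === "text" ? part.text : `[image:${part.image_url.url}]`
            )
            .join("");
    return `${message.role}: ${text}`;
  });
  // Blank line between turns, matching the transcript shape shown above.
  return lines.join("\n\n");
}
```

An unsupported part type would be rejected before this step with an OpenAI-shaped `invalid_request_error` rather than silently dropped.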

## Streaming

Streaming responses use `chat.completion.chunk` events followed by `data: [DONE]`.

Runtime tool lifecycle events are emitted as a separate custom SSE event:

```text
event: stable_harness.tool.progress
```

When runtime progress narration is enabled, narration events are emitted as:

```text
event: stable_harness.progress.narration
```

These keep tool progress and human-readable narration visible without polluting
the final assistant message.
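A client can split the custom events from ordinary completion chunks by the SSE `event:` field. A minimal frame-parsing sketch (the names are illustrative; real clients may use an SSE library instead):

```typescript
// Hypothetical sketch: parse one SSE frame into its event name and data payload.
interface SseFrame {
  event: string; // defaults to "message" when no event: line is present
  data: string;
}

function parseSseFrame(frame: string): SseFrame {
  let event = "message";
  const data: string[] = [];
  for (const line of frame.split("\n")) {
    if (line.startsWith("event:")) event = line.slice("event:".length).trim();
    else if (line.startsWith("data:")) data.push(line.slice("data:".length).trim());
  }
  return { event, data: data.join("\n") };
}
```

Frames named `stable_harness.tool.progress` or `stable_harness.progress.narration` can then be routed to a progress UI, while default frames feed the chat stream.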

## Next Surfaces

Add these only after the chat-completions subset is stable:

- `POST /v1/responses`
- `previous_response_id` state handling
- persisted response lookup
- run event polling endpoints
- idempotency-key response caching

Each addition should remain a protocol adapter over native runtime state.