stable-harness 0.0.8 → 0.0.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (40)
  1. package/LICENSE +21 -0
  2. package/README.md +10 -0
  3. package/docs/0.1.0-p0-runtime-control-plane-plan.zh.md +171 -0
  4. package/docs/0.1.0-retry-policy.zh.md +87 -0
  5. package/docs/0.1.0-stable-runtime-development-roadmap.zh.md +393 -0
  6. package/docs/0.1.0-tool-guard-benchmark.zh.md +42 -0
  7. package/docs/adapter-contract.md +199 -0
  8. package/docs/architecture/backend-comparison.md +41 -0
  9. package/docs/architecture/runtime-events.md +263 -0
  10. package/docs/architecture/runtime-events.zh.md +248 -0
  11. package/docs/architecture/system-architecture.zh.md +435 -0
  12. package/docs/compatibility-matrix.md +139 -0
  13. package/docs/engineering-rules.md +111 -0
  14. package/docs/evaluation/0.1.0-bfcl-targeted-model-matrix.zh.md +1632 -0
  15. package/docs/evaluation/0.1.0-bfcl-targeted-review-matrix.zh.md +1952 -0
  16. package/docs/evaluation/0.1.0-bfcl-tool-guard.zh.md +1427 -0
  17. package/docs/granite-tool-calling-comparison.zh.md +206 -0
  18. package/docs/guides/getting-started.md +126 -0
  19. package/docs/guides/index.md +40 -0
  20. package/docs/guides/integration-guide.md +126 -0
  21. package/docs/guides/operator-runbook.md +153 -0
  22. package/docs/guides/workspace-authoring.md +212 -0
  23. package/docs/implementation-blueprint.md +233 -0
  24. package/docs/memory/0.1.0-memory-design.zh.md +719 -0
  25. package/docs/memory/0.1.0-step-09-deepagents-native-memory.zh.md +146 -0
  26. package/docs/memory/0.1.0-step-09-langmem-shaped-provider.zh.md +169 -0
  27. package/docs/memory/0.1.0-step-09-memory-adapter-projection.zh.md +123 -0
  28. package/docs/memory/0.1.0-step-09-memory-contract.zh.md +169 -0
  29. package/docs/memory/0.1.0-step-09-memory-governance-approval.zh.md +143 -0
  30. package/docs/memory/0.1.0-step-09-memory-lifecycle-hooks.zh.md +150 -0
  31. package/docs/memory/0.1.0-step-09-memory-maintenance-boundary.zh.md +118 -0
  32. package/docs/memory/0.1.0-step-09-memory-persistence-boundary.zh.md +118 -0
  33. package/docs/product/adoption-playbook.md +145 -0
  34. package/docs/product/market-positioning.md +137 -0
  35. package/docs/product-boundary.md +258 -0
  36. package/docs/protocols/http-runtime.md +37 -0
  37. package/docs/protocols/langgraph-compatible.md +107 -0
  38. package/docs/protocols/openai-compatible.md +121 -0
  39. package/docs/tooling/0.1.0-bettercall-tool-quality.zh.md +231 -0
  40. package/package.json +2 -1
@@ -0,0 +1,206 @@
# Self-hosted Tool Calling Comparison

This diagnostic compares the tool-calling behavior of self-hosted models across different backends. The current low-VRAM default is Ollama `qwen3.5:0.8b`.

## Scope

- Default Ollama model id: `qwen3.5:0.8b`
- Default Ollama options: `num_ctx=2048`, `num_predict=256`
- Granite baseline model id: `granite4.1:3b`
- Hugging Face model id: `ibm-granite/granite-4.1-3b`
- Ollama native endpoint: `/api/chat`
- vLLM/SGLang endpoint: OpenAI-compatible `/v1/chat/completions`

The script is diagnostic only; it does not change the default stable-harness runtime behavior.

## Commands

Run local Ollama only:

```bash
TOOLCALL_RUN_OPENAI=false npm run compare:tool-calling
```

Run remote Ollama only:

```bash
TOOLCALL_OLLAMA_BASE_URL=http://192.168.0.33:11434 \
TOOLCALL_RUN_OPENAI=false \
npm run compare:tool-calling
```

Run an OpenAI-compatible backend:

```bash
TOOLCALL_RUN_OLLAMA=false \
TOOLCALL_OPENAI_BASE_URL=http://127.0.0.1:8000/v1 \
TOOLCALL_OPENAI_MODEL=ibm-granite/granite-4.1-3b \
npm run compare:tool-calling
```

Use named function mode to test argument generation after runtime-selected tool choice:

```bash
TOOLCALL_RUN_OLLAMA=false \
TOOLCALL_OPENAI_TOOL_CHOICE=named \
TOOLCALL_OPENAI_BASE_URL=http://127.0.0.1:8000/v1 \
TOOLCALL_OPENAI_MODEL=ibm-granite/granite-4.1-3b \
npm run compare:tool-calling
```
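
In `named` mode, the function is pinned in the request body rather than left to the backend. A minimal sketch of the payload shape, following the standard OpenAI chat-completions schema (the tool name and parameter schema here are illustrative, not the script's actual inventory):

```typescript
// Build an OpenAI-compatible chat-completions body that forces one named tool.
// The runtime has already selected the tool; the backend only fills arguments.
type Tool = {
  type: "function";
  function: { name: string; description: string; parameters: object };
};

const getWeather: Tool = {
  type: "function",
  function: {
    name: "get_current_weather", // illustrative tool name
    description: "Get the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
};

const body = {
  model: "ibm-granite/granite-4.1-3b",
  messages: [{ role: "user", content: "What is the weather in Berlin?" }],
  tools: [getWeather],
  // "named" mode: pin the exact function instead of "auto" or "required".
  tool_choice: { type: "function", function: { name: "get_current_weather" } },
};

console.log(JSON.stringify(body.tool_choice));
```

With `auto` the backend may answer in prose; with `required` it must call some tool; with a pinned `tool_choice` the backend's only remaining job is argument generation, which is exactly what the deep matrix below isolates.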

## Backend Startup

vLLM on a CUDA host:

```bash
vllm serve ibm-granite/granite-4.1-3b \
  --host 0.0.0.0 \
  --port 8000 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

SGLang on a CUDA host:

```bash
python -m sglang.launch_server \
  --model-path ibm-granite/granite-4.1-3b \
  --host 0.0.0.0 \
  --port 30000 \
  --grammar-backend xgrammar
```

Parser support varies by backend and model template. Keep parser selection as backend config, not stable-harness runtime logic.

## 2026-05-05 Baseline

Validated in this checkout:

- Local Ollama `http://127.0.0.1:11434`, model `granite4.1:3b`: 3/3 passed.
- Remote Ollama `https://ollama-rtx-4070.easynet.world`, model `granite4.1:3b`: 3/3 passed.
- Local Ollama through OpenAI-compatible `/v1`, `tool_choice=required`: 3/3 passed.
- Local Ollama through OpenAI-compatible `/v1`, `tool_choice=named`: 3/3 passed.
- CUDA host Ollama `http://192.168.0.201:11434`, model `granite4.1:3b`: 3/3 passed.
- CUDA host Ollama through OpenAI-compatible `/v1`, `tool_choice=required`: 3/3 passed.
- CUDA host Ollama through OpenAI-compatible `/v1`, `tool_choice=named`: 3/3 passed.
- CUDA host vLLM `http://192.168.0.201:8000/v1`, model `ibm-granite/granite-4.1-3b`, `tool_choice=auto`: 3/3 passed.
- CUDA host vLLM `http://192.168.0.201:8000/v1`, model `ibm-granite/granite-4.1-3b`, `tool_choice=required`: 3/3 passed.
- CUDA host vLLM `http://192.168.0.201:8000/v1`, model `ibm-granite/granite-4.1-3b`, `tool_choice=named`: 3/3 passed.

## 2026-05-05 Deep Matrix

Expanded matrix: 11 cases covering current weather, forecast vs. current weather, create vs. search ticket, shell vs. file read, a Chinese Kubernetes query, a Chinese stock quote, general news vs. finance, an exact shell command, and an enum priority field.

Results:

| Backend | Mode | Result | Notes |
| --- | --- | ---: | --- |
| Ollama `192.168.0.201:11434` | native auto tools | 11/11 | News query contained unnatural Chinese but still matched the relaxed semantic check. |
| Ollama `192.168.0.201:11434/v1` | auto | 11/11 | Same issue: news query contained unnatural Chinese. |
| Ollama `192.168.0.201:11434/v1` | required | 11/11 | Same issue: news query contained unnatural Chinese. |
| Ollama `192.168.0.201:11434/v1` | named | 10/11 | Failed the news-query semantic check: `不止 定期 AI 环境 新雪`. |
| vLLM `192.168.0.201:8000/v1` | auto | 10/11 | Tool selection was correct, but news-query args were semantically bad. |
| vLLM `192.168.0.201:8000/v1` | required | 10/11 | News case failed with a vLLM `400 Bad Request` from invalid JSON emitted by the tool parser. |
| vLLM `192.168.0.201:8000/v1` | named | 11/11 | Best result in this matrix. Runtime-selected tool plus backend-generated args was the most stable combination. |

Interpretation:

- Both backends can select the right tool on this matrix.
- The dominant failure mode is argument semantics, especially Chinese query generation.
- vLLM `required` is not automatically safer than `auto`; parser-level strictness can surface as hard 400 errors.
- `named` mode best matches the stable-harness two-step strategy: the runtime chooses the tool from the typed inventory, the backend fills schema-bound arguments, and the gateway validates before execution.

Not yet validated here:

- SGLang with `ibm-granite/granite-4.1-3b`.

vLLM startup details:

- Host: `boqiang@192.168.0.201`
- GPU: RTX 4070 Ti SUPER 16GB
- Driver: `550.163.01`
- vLLM image: `vllm/vllm-openai:v0.8.5`
- The `latest` image was not usable on this host because it required CUDA 13.
- Container: `stable-granite-vllm`

SGLang remains unvalidated because no SGLang server was listening on `192.168.0.201:30000`.

## 2026-05-05 CUDA 13 Upgrade And GPT-OSS

CUDA host `192.168.0.201` was upgraded from NVIDIA driver `550.163.01` to `580.142`.

Validation:

- Host `nvidia-smi`: driver `580.142`, CUDA `13.0`.
- CUDA 13 container smoke: `nvidia/cuda:13.0.0-base-ubuntu24.04` ran `nvidia-smi` successfully.
- `vllm/vllm-openai:latest` now starts on this host.

APT note:

- `apt update` was blocked by a stale Warp signing key and a removed Kubernetes `apt.kubernetes.io kubernetes-xenial` source.
- Those two source entries were disabled with `# codex-disabled`; backups were written beside them as `*.codex-bak`.

GPT-OSS run:

- Container: `stable-gpt-oss-vllm`
- Image: `vllm/vllm-openai:latest`
- vLLM version from logs: `0.20.1`
- Endpoint: `http://192.168.0.201:8001/v1`
- Model: `openai/gpt-oss-20b`
- Working startup flags: `--max-model-len 2048 --gpu-memory-utilization 0.97 --enforce-eager --no-enable-prefix-caching`

The initial attempt with a `4096` context failed before serving:

- Model loaded: `13.72 GiB`.
- Error: no available memory for KV cache blocks.

Tool matrix:

| Backend | Mode | Result | Notes |
| --- | --- | ---: | --- |
| vLLM latest GPT-OSS 20B | auto | 10/11 | Tool selection correct; `read_file` path lost its leading `/`. |
| vLLM latest GPT-OSS 20B | named | 10/11 | Same path failure. A runtime-selected tool does not fix path precision. |

Interpretation:

- GPT-OSS 20B can run on this 16GB GPU through vLLM latest after the driver upgrade, but only with conservative memory settings.
- It is faster than Granite in this matrix, but the absolute-path error shows that schema and tool parsing still need runtime semantic validation before execution.

## 2026-05-05 Model Matrix

Expanded matrix: 15 cases. The additional cases cover closed-ticket search, a Hong Kong stock lookup, monthly web freshness, and absolute paths containing spaces.

Ollama models currently visible on `192.168.0.201` include:

- Small/medium chat candidates: `qwen3.5:0.8b`, `qwen3:0.6b`, `lfm2.5-thinking:latest`, `granite4.1:3b`, `qwen3.5:2b`, `qwen3.5:4b`, `qwen3:latest`, `qwen3.5:9b`, `qwen2.5:7b-instruct`, `gemma4:e2b`, `gemma4:e4b`, `gpt-oss:latest`.
- Very large or slower candidates not included in the fast matrix: `qwen3.6:27b`, `qwen3.6:35b`, `gemma4:26b`, `qwen3.5:27b`, `nemotron-cascade-2:30b`, `lfm2:latest`.
- Non-chat/specialized models not included: `nomic-embed-text:latest`, `glm-ocr:q8_0`.

Fast matrix results:

| Backend | Model | Mode | Result | Main failure |
| --- | --- | --- | ---: | --- |
| Ollama | `qwen3.5:2b` | native auto tools | 15/15 | none |
| Ollama | `qwen3.5:4b` | native auto tools | 15/15 | none |
| Ollama | `qwen3.5:9b` | native auto tools | 15/15 | none |
| Ollama | `qwen3.5:0.8b` | native auto tools | 14/15 | `freshness` set to day instead of month |
| Ollama | `granite4.1:3b` | native auto tools | 14/15 | Chinese news query became unnatural text |
| Ollama | `qwen3:latest` | native auto tools | 14/15 | Workday ticker became `WORK` |
| Ollama | `qwen2.5:7b-instruct` | native auto tools | 14/15 | path with a space was collapsed |
| Ollama | `gemma4:e2b` | native auto tools | 14/15 | missed the Chinese Workday stock tool call |
| Ollama | `gemma4:e4b` | native auto tools | 14/15 | missed the Chinese Workday stock tool call |
| Ollama | `lfm2.5-thinking:latest` | native auto tools | 13/15 | namespace typo and HK market error |
| Ollama | `qwen3:0.6b` | native auto tools | 12/15 | one timeout, weaker exact title/query handling |
| Ollama | `gpt-oss:latest` | native auto tools | 12/15 | path and freshness enum errors |
| vLLM latest | `ibm-granite/granite-4.1-3b` | auto | 0/15 | no tool calls emitted |
| vLLM latest | `ibm-granite/granite-4.1-3b` | required | 13/15 | Chinese query corruption and HK ticker normalization |
| vLLM latest | `ibm-granite/granite-4.1-3b` | named | 14/15 | HK ticker normalized as `00700` |
| vLLM latest | `openai/gpt-oss-20b` | auto | 14/15 | absolute path lost its leading `/` |
| vLLM latest | `openai/gpt-oss-20b` | named | 14/15 | same path failure |

Current recommendation:

1. Stable default for this host: Ollama `qwen3.5:4b`.
2. Best runtime-controlled strategy: vLLM named tools, but only after the runtime has selected the tool and semantic validators are active.
3. Do not use vLLM Granite auto tool calling as a default on `vllm/vllm-openai:latest`; it emitted no tool calls in this matrix.
4. Do not trust any backend without semantic validation. Passing JSON Schema still missed cases such as absolute-path preservation, ticker normalization, freshness enum semantics, namespace spelling, and Chinese query quality.
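
Recommendation 4 can be made concrete. A minimal sketch of post-schema semantic checks of the kind this matrix exposed; the tool names, argument shapes, and rules here are illustrative stand-ins, not the actual stable-harness validators:

```typescript
// Semantic checks that JSON Schema alone does not enforce.
// Each rule targets a failure mode observed in the matrices above.
type ToolCall = { name: string; args: Record<string, string> };

function validateSemantics(call: ToolCall): string[] {
  const errors: string[] = [];
  const { name, args } = call;
  // Absolute-path preservation: "etc/hosts" passes schema but is wrong.
  if (name === "read_file" && args.path !== undefined && !args.path.startsWith("/")) {
    errors.push("absolute path lost its leading '/'");
  }
  // Freshness enum semantics: the value must be one of the allowed windows.
  if (name === "web_search" && args.freshness !== undefined &&
      !["day", "week", "month"].includes(args.freshness)) {
    errors.push("freshness outside the expected enum");
  }
  // Ticker normalization: e.g. a bare "00700" is neither a US nor an HK form.
  if (name === "stock_quote" && args.ticker !== undefined &&
      !/^[0-9]{4}\.HK$|^[A-Z]{1,5}$/.test(args.ticker)) {
    errors.push("ticker failed normalization check");
  }
  return errors;
}

const bad = validateSemantics({ name: "read_file", args: { path: "etc/hosts" } });
const good = validateSemantics({ name: "read_file", args: { path: "/etc/hosts" } });
console.log(bad, good);
```

The point is placement: these checks run at the gateway after schema validation and before execution, so a schema-valid but semantically wrong call is rejected regardless of which backend produced it.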
@@ -0,0 +1,126 @@
# Getting Started

This guide takes a new application from zero to a runnable Stable Harness
workspace without cloning the Stable Harness repository.

## Requirements

- Node.js `>=24 <25`
- An OpenAI-compatible model endpoint, or environment variables that point at
  the model provider you want to use

## Install

```bash
npm install stable-harness
```

For a one-off trial:

```bash
npx stable-harness init ./my-agent-app
```

## Initialize A Workspace

```bash
stable-harness init ./my-agent-app
cd ./my-agent-app
```

The scaffold creates:

```text
config/
  agents/orchestra.yaml
  catalogs/models.yaml
  runtime/workspace.yaml
resources/
  tools/echo_tool.mjs
```

It does not overwrite existing scaffold files. If the target already contains a
generated file, Stable Harness stops so you can decide what to keep.

## Configure The Model

The default scaffold expects an OpenAI-compatible endpoint:

```bash
export OPENAI_API_KEY=...
export STABLE_HARNESS_MODEL=gpt-4.1-mini
export STABLE_HARNESS_OPENAI_BASE_URL=https://api.openai.com/v1
```

For a local or private OpenAI-compatible endpoint, keep the same model catalog
shape and change the environment variables:

```bash
export STABLE_HARNESS_MODEL=granite4.1:3b
export STABLE_HARNESS_OPENAI_BASE_URL=http://127.0.0.1:11434/v1
export OPENAI_API_KEY=local
```

## Run The Workspace

Show the loaded workspace:

```bash
stable-harness -w .
```

Invoke the scaffolded tool through the runtime gateway:

```bash
stable-harness -w . --agent orchestra --tool echo_tool --tool-args-json '{"value":"hello"}'
```

Send a natural-language request to the default agent:

```bash
stable-harness -w . --agent orchestra "Summarize what this workspace can do."
```

Print runtime trace lines:

```bash
stable-harness -w . --agent orchestra --trace "Call the echo tool with hello."
```

## Start The OpenAI-Compatible Facade

```bash
stable-harness start -w . --port 8642
```

Clients can point at:

```text
http://127.0.0.1:8642/v1
```

The facade is a protocol adapter around the same Stable Harness runtime. It is
not a separate execution path.

## Develop The Framework

Clone the repository only when you are changing Stable Harness itself:

```bash
git clone git@github.com:botbotgo/stable-harness.git
cd stable-harness
npm install
npm run build
npm run check:rules
npm test
npm run example:minimal
```

## What To Do Next

- Add real tools under `resources/tools`.
- Let local module tools be auto-discovered, or declare non-module tools in
  `config/catalogs/tools.yaml`.
- Add specialist agents under `config/agents`.
- Enable memory, approvals, or protocol exposure in `config/runtime/workspace.yaml`.
- Embed the runtime in your service when the CLI path is stable.
@@ -0,0 +1,40 @@
# Stable Harness Documentation

Stable Harness is a stable application runtime and operator control plane for
agent workspaces. These docs are organized for teams that want to adopt it,
embed it, operate it, or explain why it exists.

## Start Here

- [Getting started](getting-started.md): install the package, initialize a
  workspace, run a tool, and start the OpenAI-compatible facade.
- [Workspace authoring](workspace-authoring.md): define agents, models, tools,
  workflows, memory, and protocol exposure in YAML.
- [Integration guide](integration-guide.md): embed the runtime inside an app,
  run it through the CLI, or expose it to HTTP/OpenAI-compatible clients.
- [Operator runbook](operator-runbook.md): validate a workspace, inspect
  events, run smoke tests, and keep the runtime operable.

## Product And Adoption

- [Adoption playbook](../product/adoption-playbook.md): practical paths for getting a
  team, a product, or a community to try Stable Harness.
- [Market positioning](../product/market-positioning.md): how to explain Stable Harness
  relative to agent frameworks, protocol gateways, and orchestration systems.
- [Product boundary](../product-boundary.md): the guardrails that keep the project
  framework-generic and passthrough-first.
- [Compatibility matrix](../compatibility-matrix.md): current backend and protocol
  surfaces.

## Engineering References

- [Implementation blueprint](../implementation-blueprint.md)
- [Adapter contract](../adapter-contract.md)
- [Engineering rules](../engineering-rules.md)
- [OpenAI-compatible protocol](../protocols/openai-compatible.md)
- [LangGraph-compatible protocol](../protocols/langgraph-compatible.md)
- [HTTP runtime protocol](../protocols/http-runtime.md)

The short version: keep execution semantics upstream-native, define product
inventory in YAML, and use Stable Harness for lifecycle, governance,
observability, recovery, tool reliability, protocol access, and operator control.
@@ -0,0 +1,126 @@
# Integration Guide

Stable Harness can be used three ways: as a CLI, as an embedded runtime, or as a
protocol facade. All three paths use the same workspace inventory and runtime
boundary.

## CLI Integration

Use the CLI while developing or validating a workspace:

```bash
stable-harness -w ./workspace
stable-harness -w ./workspace --agent orchestra "Review the latest release evidence."
stable-harness -w ./workspace --agent orchestra --tool echo_tool --tool-args-json '{"value":"hello"}'
```

Render inventory without executing a request:

```bash
stable-harness agent render orchestra -w ./workspace
stable-harness workflow render review-shell -w ./workspace
stable-harness workflow inspect review-shell -w ./workspace
```

The CLI is intentionally thin. It loads the workspace, creates the tool gateway,
starts memory services, registers adapters, and calls the core runtime.

## Embedded Runtime

Use the SDK when Stable Harness runs inside your service:

```ts
import { createStableHarnessRuntime } from "stable-harness";

const runtime = await createStableHarnessRuntime("/srv/my-agent-workspace");

const response = await runtime.request({
  input: "Review the current release evidence.",
  agentId: "orchestra",
  sessionId: "customer-123",
});

console.log(response.output);
```

The runtime exposes operator surfaces for applications that need more than a
single final answer:

- `subscribe`: stream runtime events
- `inspect`: inspect runtime inventory and state
- `getRun`: read one run record
- `listRequests`: list historical or active requests
- `listSessions`: inspect session state
- `inspectRequest`: inspect one request lifecycle
- `cancel`: stop active work
- `stop`: close the runtime

Use those surfaces to build product UI around requests, approvals, event traces,
artifacts, and memory lifecycle.
+
61
+ ## OpenAI-Compatible Facade
62
+
63
+ Start the server:
64
+
65
+ ```bash
66
+ stable-harness start -w ./workspace --host 127.0.0.1 --port 8642
67
+ ```
68
+
69
+ Point compatible clients at:
70
+
71
+ ```text
72
+ http://127.0.0.1:8642/v1
73
+ ```
74
+
75
+ Use this path when an existing product or evaluation harness already speaks
76
+ OpenAI-compatible chat completions. Stable Harness still owns workspace loading,
77
+ runtime events, tool-gateway policy, memory lifecycle, and adapter selection.
78
+
79
+ ## HTTP Runtime Protocol
80
+
81
+ Use the HTTP runtime protocol when the caller needs Stable Harness concepts
82
+ directly: request IDs, sessions, traces, approvals, cancellation, and runtime
83
+ inspection.
84
+
85
+ See [HTTP runtime protocol](protocols/http-runtime.md).
86
+
87
+ ## Backend Adapters
88
+
89
+ Adapters translate Stable Harness runtime requests into upstream backend calls.
90
+
91
+ Current public adapter surfaces:
92
+
93
+ - DeepAgents: primary agent backend
94
+ - LangGraph: workflow and graph topology adapter
95
+ - Custom adapters: implement the runtime adapter contract
96
+
97
+ Adapter rule: preserve upstream execution semantics first. Add wrappers only for
98
+ runtime lifecycle, governance, observability, persistence, recovery, protocol
99
+ access, memory lifecycle, or tool-gateway control.
100
+
101
+ ## Tool Gateway
102
+
103
+ The tool gateway is the boundary where Stable Harness can validate, repair,
104
+ authorize, execute, and observe tool calls.
105
+
106
+ The default CLI path configures BetterCall repair mode for registered tools.
107
+ This helps small models recover malformed arguments, missing wrappers, or
108
+ near-miss tool-call shapes without allowing arbitrary execution.
109
+
110
+ Repair remains constrained:
111
+
112
+ - registered tool inventory is authoritative
113
+ - schemas and semantic validators are authoritative
114
+ - approval and sandbox policy can block execution
115
+ - repaired calls remain visible through runtime events
116
+
117
+ ## Production Embedding Checklist
118
+
119
+ - Keep a workspace directory next to the service deployment.
120
+ - Keep secrets in environment variables or the service secret manager.
121
+ - Start one runtime per workspace or tenant boundary.
122
+ - Subscribe to events and persist the run IDs your product exposes.
123
+ - Treat request cancellation and timeout as product-level controls.
124
+ - Run a registry/package smoke test before publishing a service image.
125
+ - Validate OpenAI-compatible behavior only through the facade when that is the
126
+ public contract your callers use.
@@ -0,0 +1,153 @@
# Operator Runbook

This runbook is for teams operating a Stable Harness workspace in development,
CI, or production-like environments.

## Validate The Package

For framework development:

```bash
npm run check:rules
npm run check
npm test
```

Before publishing:

```bash
npm run release:pack
npm run release:smoke
```

`release:smoke` installs the packed package into an isolated temporary project,
initializes a workspace, and runs a CLI tool call. It catches missing published
dependencies that monorepo tests can hide.

## Validate A Workspace

Show workspace status:

```bash
stable-harness -w ./workspace
```

Render agent inventory:

```bash
stable-harness agent render orchestra -w ./workspace
```

Render or inspect workflow topology:

```bash
stable-harness workflow render review-shell -w ./workspace
stable-harness workflow inspect review-shell -w ./workspace
```

Run a direct tool smoke:

```bash
stable-harness -w ./workspace --agent orchestra --tool echo_tool --tool-args-json '{"value":"operator-smoke"}'
```

Run a model-backed request:

```bash
stable-harness -w ./workspace --agent orchestra --trace "Use the echo tool with operator-smoke."
```

## Start And Stop Protocol Serving

Start:

```bash
stable-harness start -w ./workspace --host 127.0.0.1 --port 8642
```

Stop:

```bash
stable-harness stop -w ./workspace
```

Use `STABLE_HARNESS_OPENAI_HOST`, `STABLE_HARNESS_OPENAI_PORT`, and
`STABLE_HARNESS_OPENAI_API_KEY` when the server is managed by an environment
wrapper.

## Read Runtime Evidence

Prefer structured runtime evidence over final prose:

- request ID
- session ID
- runtime event stream
- tool invocation events
- repaired tool-call diagnostics
- approval decisions
- memory lifecycle events
- artifacts and exported traces

Final answers are user-facing presentation. Runtime events are the operator
record.

## Common Failures

### The CLI cannot find a package after publishing

Run:

```bash
npm run release:smoke
```

If the smoke fails only after an isolated install, add the missing dependency to
the root package that is published to npm, not only to a workspace child package.

### A tool exists in code but is unavailable to the agent

Check both surfaces:

- the tool module under `resources/tools`
- the `kind: Tool` inventory document

The runtime only exposes registered inventory.

### A model can answer but does not call tools reliably

Check the tool schema and the gateway events first. BetterCall repair can fix
malformed calls, but it cannot authorize unknown tools or infer product intent
from prose.

### A workflow runs the wrong topology

Render it:

```bash
stable-harness workflow render <workflow-id> -w ./workspace
```

Workflow edges are static control-plane topology. They should not depend on user
phrasing or benchmark-specific cases.

### Output looks correct but production behavior is uncertain

Run the exact public path your users call:

- the CLI if the CLI is the product surface
- the OpenAI-compatible facade if clients use `/v1`
- the embedded runtime if your service calls the SDK

Do not treat a nearby internal smoke as proof of a different public contract.

## Release Evidence

A release is ready when the evidence includes:

- project rules pass
- TypeScript check passes
- full tests pass
- package dry-run passes
- isolated npm install smoke passes
- the published version can initialize and run a real workspace
- GitHub release and code-quality checks are green