stable-harness 0.0.7 → 0.0.9
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +10 -0
- package/docs/0.1.0-p0-runtime-control-plane-plan.zh.md +171 -0
- package/docs/0.1.0-retry-policy.zh.md +87 -0
- package/docs/0.1.0-stable-runtime-development-roadmap.zh.md +393 -0
- package/docs/0.1.0-tool-guard-benchmark.zh.md +42 -0
- package/docs/adapter-contract.md +199 -0
- package/docs/architecture/backend-comparison.md +41 -0
- package/docs/architecture/runtime-events.md +263 -0
- package/docs/architecture/runtime-events.zh.md +248 -0
- package/docs/architecture/system-architecture.zh.md +435 -0
- package/docs/compatibility-matrix.md +139 -0
- package/docs/engineering-rules.md +111 -0
- package/docs/evaluation/0.1.0-bfcl-targeted-model-matrix.zh.md +1632 -0
- package/docs/evaluation/0.1.0-bfcl-targeted-review-matrix.zh.md +1952 -0
- package/docs/evaluation/0.1.0-bfcl-tool-guard.zh.md +1427 -0
- package/docs/granite-tool-calling-comparison.zh.md +206 -0
- package/docs/guides/getting-started.md +126 -0
- package/docs/guides/index.md +40 -0
- package/docs/guides/integration-guide.md +126 -0
- package/docs/guides/operator-runbook.md +153 -0
- package/docs/guides/workspace-authoring.md +212 -0
- package/docs/implementation-blueprint.md +233 -0
- package/docs/memory/0.1.0-memory-design.zh.md +719 -0
- package/docs/memory/0.1.0-step-09-deepagents-native-memory.zh.md +146 -0
- package/docs/memory/0.1.0-step-09-langmem-shaped-provider.zh.md +169 -0
- package/docs/memory/0.1.0-step-09-memory-adapter-projection.zh.md +123 -0
- package/docs/memory/0.1.0-step-09-memory-contract.zh.md +169 -0
- package/docs/memory/0.1.0-step-09-memory-governance-approval.zh.md +143 -0
- package/docs/memory/0.1.0-step-09-memory-lifecycle-hooks.zh.md +150 -0
- package/docs/memory/0.1.0-step-09-memory-maintenance-boundary.zh.md +118 -0
- package/docs/memory/0.1.0-step-09-memory-persistence-boundary.zh.md +118 -0
- package/docs/product/adoption-playbook.md +145 -0
- package/docs/product/market-positioning.md +137 -0
- package/docs/product-boundary.md +258 -0
- package/docs/protocols/http-runtime.md +37 -0
- package/docs/protocols/langgraph-compatible.md +107 -0
- package/docs/protocols/openai-compatible.md +121 -0
- package/docs/tooling/0.1.0-bettercall-tool-quality.zh.md +231 -0
- package/package.json +3 -1

@@ -0,0 +1,206 @@
# Self-hosted Tool Calling Comparison

This diagnostic compares the tool-calling behavior of self-hosted models across backends. The current low-VRAM default is Ollama `qwen3.5:0.8b`.

## Scope

- Default Ollama model id: `qwen3.5:0.8b`
- Default Ollama options: `num_ctx=2048`, `num_predict=256`
- Granite baseline model id: `granite4.1:3b`
- Hugging Face model id: `ibm-granite/granite-4.1-3b`
- Ollama native endpoint: `/api/chat`
- vLLM/SGLang endpoint: OpenAI-compatible `/v1/chat/completions`

The script is diagnostic only; it does not change the default stable-harness runtime behavior.
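For context, the OpenAI-compatible runs in this comparison send function tools in the standard chat-completions shape. A minimal sketch follows; the `get_current_weather` tool and its fields are illustrative placeholders, not the script's actual fixtures:

```bash
# Illustrative function-tool definition in the standard OpenAI-compatible
# shape; the tool name and parameters are hypothetical placeholders.
cat <<'EOF' | python3 -m json.tool
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ]
}
EOF
```

Piping through `python3 -m json.tool` is just a local validity check; the comparison script builds the equivalent payload itself.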

## Commands

Run local Ollama only:

```bash
TOOLCALL_RUN_OPENAI=false npm run compare:tool-calling
```

Run remote Ollama only:

```bash
TOOLCALL_OLLAMA_BASE_URL=http://192.168.0.33:11434 \
TOOLCALL_RUN_OPENAI=false \
npm run compare:tool-calling
```

Run an OpenAI-compatible backend:

```bash
TOOLCALL_RUN_OLLAMA=false \
TOOLCALL_OPENAI_BASE_URL=http://127.0.0.1:8000/v1 \
TOOLCALL_OPENAI_MODEL=ibm-granite/granite-4.1-3b \
npm run compare:tool-calling
```

Use named function mode to test argument generation after runtime-selected tool choice:

```bash
TOOLCALL_RUN_OLLAMA=false \
TOOLCALL_OPENAI_TOOL_CHOICE=named \
TOOLCALL_OPENAI_BASE_URL=http://127.0.0.1:8000/v1 \
TOOLCALL_OPENAI_MODEL=ibm-granite/granite-4.1-3b \
npm run compare:tool-calling
```

## Backend Startup

vLLM on a CUDA host:

```bash
vllm serve ibm-granite/granite-4.1-3b \
  --host 0.0.0.0 \
  --port 8000 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

SGLang on a CUDA host:

```bash
python -m sglang.launch_server \
  --model-path ibm-granite/granite-4.1-3b \
  --host 0.0.0.0 \
  --port 30000 \
  --grammar-backend xgrammar
```

Parser support varies by backend and model template. Keep parser selection as backend config, not stable-harness runtime logic.

## 2026-05-05 Baseline

Validated in this checkout:

- local Ollama `http://127.0.0.1:11434`, model `granite4.1:3b`: 3/3 passed.
- remote Ollama `https://ollama-rtx-4070.easynet.world`, model `granite4.1:3b`: 3/3 passed.
- local Ollama through OpenAI-compatible `/v1`, `tool_choice=required`: 3/3 passed.
- local Ollama through OpenAI-compatible `/v1`, `tool_choice=named`: 3/3 passed.
- CUDA host Ollama `http://192.168.0.201:11434`, model `granite4.1:3b`: 3/3 passed.
- CUDA host Ollama through OpenAI-compatible `/v1`, `tool_choice=required`: 3/3 passed.
- CUDA host Ollama through OpenAI-compatible `/v1`, `tool_choice=named`: 3/3 passed.
- CUDA host vLLM `http://192.168.0.201:8000/v1`, model `ibm-granite/granite-4.1-3b`, `tool_choice=auto`: 3/3 passed.
- CUDA host vLLM `http://192.168.0.201:8000/v1`, model `ibm-granite/granite-4.1-3b`, `tool_choice=required`: 3/3 passed.
- CUDA host vLLM `http://192.168.0.201:8000/v1`, model `ibm-granite/granite-4.1-3b`, `tool_choice=named`: 3/3 passed.

## 2026-05-05 Deep Matrix

Expanded matrix: 11 cases across current weather, forecast vs current-weather, create vs search ticket, shell vs file read, Chinese Kubernetes, Chinese stock quote, general-news vs finance, exact shell command, and enum priority.

Results:

| Backend | Mode | Result | Notes |
| --- | --- | ---: | --- |
| Ollama `192.168.0.201:11434` | native auto tools | 11/11 | News query contained unnatural Chinese but still matched the relaxed semantic check. |
| Ollama `192.168.0.201:11434/v1` | auto | 11/11 | Same issue: news query contained unnatural Chinese. |
| Ollama `192.168.0.201:11434/v1` | required | 11/11 | Same issue: news query contained unnatural Chinese. |
| Ollama `192.168.0.201:11434/v1` | named | 10/11 | Failed news-query semantic check: `不止 定期 AI 环境 新雪`. |
| vLLM `192.168.0.201:8000/v1` | auto | 10/11 | Tool selection was correct, but news-query args were semantically bad. |
| vLLM `192.168.0.201:8000/v1` | required | 10/11 | News case failed with vLLM `400 Bad Request` from invalid JSON emitted by the tool parser. |
| vLLM `192.168.0.201:8000/v1` | named | 11/11 | Best result in this matrix. Runtime-selected tool plus backend-generated args was most stable. |

Interpretation:

- Both backends can select the right tool on this matrix.
- The failure mode is argument semantics, especially Chinese query generation.
- vLLM `required` is not automatically safer than `auto`; parser-level strictness can surface as hard 400 errors.
- `named` mode best matches the stable-harness two-step strategy: runtime chooses the tool from typed inventory, backend fills schema-bound arguments, gateway validates before execution.
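For reference, named mode pins the function in the request body itself. The standard OpenAI-compatible fragment looks like this (the function name is an illustrative placeholder):

```bash
# Standard OpenAI-compatible named tool_choice fragment; the function
# name "get_current_weather" is an illustrative placeholder.
cat <<'EOF' | python3 -m json.tool
{
  "tool_choice": {
    "type": "function",
    "function": { "name": "get_current_weather" }
  }
}
EOF
```

With this shape the backend only generates arguments; tool selection stays with the runtime.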

Not yet validated here:

- SGLang with `ibm-granite/granite-4.1-3b`.

vLLM startup details:

- Host: `boqiang@192.168.0.201`
- GPU: RTX 4070 Ti SUPER 16GB
- Driver: `550.163.01`
- vLLM image: `vllm/vllm-openai:v0.8.5`
- `latest` image was not usable on this host because it required CUDA 13.
- Container: `stable-granite-vllm`

SGLang remains unvalidated because no SGLang server was listening on `192.168.0.201:30000`.

## 2026-05-05 CUDA 13 Upgrade And GPT-OSS

CUDA host `192.168.0.201` was upgraded from NVIDIA driver `550.163.01` to `580.142`.

Validation:

- Host `nvidia-smi`: driver `580.142`, CUDA `13.0`.
- CUDA 13 container smoke: `nvidia/cuda:13.0.0-base-ubuntu24.04` ran `nvidia-smi` successfully.
- `vllm/vllm-openai:latest` now starts on this host.

APT note:

- `apt update` was blocked by a stale Warp signing key and a removed Kubernetes `apt.kubernetes.io kubernetes-xenial` source.
- Those two source entries were disabled with `# codex-disabled`; backups were written beside them as `*.codex-bak`.

GPT-OSS run:

- Container: `stable-gpt-oss-vllm`
- Image: `vllm/vllm-openai:latest`
- vLLM version from logs: `0.20.1`
- Endpoint: `http://192.168.0.201:8001/v1`
- Model: `openai/gpt-oss-20b`
- Working startup flags: `--max-model-len 2048 --gpu-memory-utilization 0.97 --enforce-eager --no-enable-prefix-caching`

The initial `4096` context attempt failed before serving:

- Model loaded: `13.72 GiB`.
- Error: no available memory for KV cache blocks.

Tool matrix:

| Backend | Mode | Result | Notes |
| --- | --- | ---: | --- |
| vLLM latest GPT-OSS 20B | auto | 10/11 | Tool selection correct; `read_file` path lost leading `/`. |
| vLLM latest GPT-OSS 20B | named | 10/11 | Same path failure. Runtime-selected tool does not fix path precision. |

Interpretation:

- GPT-OSS 20B can run on this 16GB GPU through vLLM latest after the driver upgrade, but only with conservative memory settings.
- It is faster than Granite in this matrix, but the absolute-path error shows that schema/tool parsing still needs runtime semantic validation before execution.

## 2026-05-05 Model Matrix

Expanded matrix: 15 cases. The additional cases cover closed-ticket search, Hong Kong stock lookup, monthly web freshness, and absolute paths containing spaces.

Ollama models currently visible on `192.168.0.201` include:

- Small/medium chat candidates: `qwen3.5:0.8b`, `qwen3:0.6b`, `lfm2.5-thinking:latest`, `granite4.1:3b`, `qwen3.5:2b`, `qwen3.5:4b`, `qwen3:latest`, `qwen3.5:9b`, `qwen2.5:7b-instruct`, `gemma4:e2b`, `gemma4:e4b`, `gpt-oss:latest`.
- Very large or slower candidates not included in the fast matrix: `qwen3.6:27b`, `qwen3.6:35b`, `gemma4:26b`, `qwen3.5:27b`, `nemotron-cascade-2:30b`, `lfm2:latest`.
- Non-chat/specialized models not included: `nomic-embed-text:latest`, `glm-ocr:q8_0`.

Fast matrix results:

| Backend | Model | Mode | Result | Main failure |
| --- | --- | --- | ---: | --- |
| Ollama | `qwen3.5:2b` | native auto tools | 15/15 | none |
| Ollama | `qwen3.5:4b` | native auto tools | 15/15 | none |
| Ollama | `qwen3.5:9b` | native auto tools | 15/15 | none |
| Ollama | `qwen3.5:0.8b` | native auto tools | 14/15 | `freshness` day instead of month |
| Ollama | `granite4.1:3b` | native auto tools | 14/15 | Chinese news query became unnatural text |
| Ollama | `qwen3:latest` | native auto tools | 14/15 | Workday ticker became `WORK` |
| Ollama | `qwen2.5:7b-instruct` | native auto tools | 14/15 | path with space was collapsed |
| Ollama | `gemma4:e2b` | native auto tools | 14/15 | missed Chinese Workday stock tool call |
| Ollama | `gemma4:e4b` | native auto tools | 14/15 | missed Chinese Workday stock tool call |
| Ollama | `lfm2.5-thinking:latest` | native auto tools | 13/15 | namespace typo and HK market error |
| Ollama | `qwen3:0.6b` | native auto tools | 12/15 | one timeout, weaker exact title/query handling |
| Ollama | `gpt-oss:latest` | native auto tools | 12/15 | path and freshness enum errors |
| vLLM latest | `ibm-granite/granite-4.1-3b` | auto | 0/15 | no tool calls emitted |
| vLLM latest | `ibm-granite/granite-4.1-3b` | required | 13/15 | Chinese query corruption and HK ticker normalization |
| vLLM latest | `ibm-granite/granite-4.1-3b` | named | 14/15 | HK ticker normalized as `00700` |
| vLLM latest | `openai/gpt-oss-20b` | auto | 14/15 | absolute path lost leading `/` |
| vLLM latest | `openai/gpt-oss-20b` | named | 14/15 | same path failure |

Current recommendation:

1. Stable default for this host: Ollama `qwen3.5:4b`.
2. Best runtime-controlled strategy: vLLM named tools, but only after the runtime has selected the tool and semantic validators are active.
3. Do not use vLLM Granite auto tool calling as a default on `vllm/vllm-openai:latest`; it emitted no tool calls in this matrix.
4. Do not trust any backend without semantic validation. Passing JSON schema still missed cases such as absolute path preservation, ticker normalization, freshness enum semantics, namespace spelling, and Chinese query quality.
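A minimal sketch of the kind of semantic check point 4 calls for (not the stable-harness validator itself): arguments that pass JSON schema can still be rejected on semantic grounds, such as a `read_file` path that lost its leading `/`.

```bash
# Hypothetical post-parse semantic check: schema-valid args can still
# be wrong, e.g. a read_file path that is no longer absolute.
args='{"path":"tmp/report.txt"}'
path=$(printf '%s' "$args" | python3 -c 'import json,sys; print(json.load(sys.stdin)["path"])')
case "$path" in
  /*) echo "accept: $path" ;;
  *)  echo "reject: path must be absolute" ;;
esac
```

Running this with the sample args prints the reject branch, which is exactly the failure the GPT-OSS rows above would have tripped.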

@@ -0,0 +1,126 @@
# Getting Started

This guide gets a new application from zero to a runnable Stable Harness workspace without cloning the Stable Harness repository.

## Requirements

- Node.js `>=24 <25`
- An OpenAI-compatible model endpoint, or environment variables that point at the model provider you want to use

## Install

```bash
npm install stable-harness
```

For a one-off trial:

```bash
npx stable-harness init ./my-agent-app
```

## Initialize A Workspace

```bash
stable-harness init ./my-agent-app
cd ./my-agent-app
```

The scaffold creates:

```text
config/
  agents/orchestra.yaml
  catalogs/models.yaml
  runtime/workspace.yaml
resources/
  tools/echo_tool.mjs
```

It does not overwrite existing scaffold files. If the target already contains a generated file, Stable Harness stops so you can decide what to keep.

## Configure The Model

The default scaffold expects an OpenAI-compatible endpoint:

```bash
export OPENAI_API_KEY=...
export STABLE_HARNESS_MODEL=gpt-4.1-mini
export STABLE_HARNESS_OPENAI_BASE_URL=https://api.openai.com/v1
```

For a local or private OpenAI-compatible endpoint, keep the same model catalog shape and change the environment variables:

```bash
export STABLE_HARNESS_MODEL=granite4.1:3b
export STABLE_HARNESS_OPENAI_BASE_URL=http://127.0.0.1:11434/v1
export OPENAI_API_KEY=local
```

## Run The Workspace

Show the loaded workspace:

```bash
stable-harness -w .
```

Invoke the scaffolded tool through the runtime gateway:

```bash
stable-harness -w . --agent orchestra --tool echo_tool --tool-args-json '{"value":"hello"}'
```

Send a natural-language request to the default agent:

```bash
stable-harness -w . --agent orchestra "Summarize what this workspace can do."
```

Print runtime trace lines:

```bash
stable-harness -w . --agent orchestra --trace "Call the echo tool with hello."
```

## Start The OpenAI-Compatible Facade

```bash
stable-harness start -w . --port 8642
```

Clients can point at:

```text
http://127.0.0.1:8642/v1
```

The facade is a protocol adapter around the same Stable Harness runtime. It is not a separate execution path.
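As a sketch, a client call against the facade can look like the following, assuming the facade accepts standard chat-completions fields and the scaffolded `orchestra` agent is addressable as the model name:

```bash
# Build and locally validate a standard chat-completions body;
# "orchestra" as the model name is an assumption from the scaffold.
body='{
  "model": "orchestra",
  "messages": [{"role": "user", "content": "Call the echo tool with hello."}]
}'
printf '%s' "$body" | python3 -m json.tool
# Then send it to the running facade:
# curl -s http://127.0.0.1:8642/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$body"
```

The `json.tool` step only confirms the payload is valid JSON; the commented `curl` shows the actual request path.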

## Develop The Framework

Clone the repository only when you are changing Stable Harness itself:

```bash
git clone git@github.com:botbotgo/stable-harness.git
cd stable-harness
npm install
npm run build
npm run check:rules
npm test
npm run example:minimal
```

## What To Do Next

- Add real tools under `resources/tools`.
- Let local module tools be auto-discovered, or declare non-module tools in `config/catalogs/tools.yaml`.
- Add specialist agents under `config/agents`.
- Enable memory, approvals, or protocol exposure in `config/runtime/workspace.yaml`.
- Embed the runtime in your service when the CLI path is stable.
@@ -0,0 +1,40 @@
# Stable Harness Documentation

Stable Harness is a stable application runtime and operator control plane for agent workspaces. These docs are organized for teams that want to adopt it, embed it, operate it, or explain why it exists.

## Start Here

- [Getting started](getting-started.md): install the package, initialize a workspace, run a tool, and start the OpenAI-compatible facade.
- [Workspace authoring](workspace-authoring.md): define agents, models, tools, workflows, memory, and protocol exposure in YAML.
- [Integration guide](integration-guide.md): embed the runtime inside an app, run it through the CLI, or expose it through HTTP/OpenAI-compatible clients.
- [Operator runbook](operator-runbook.md): validate a workspace, inspect events, run smoke tests, and keep the runtime operable.

## Product And Adoption

- [Adoption playbook](../product/adoption-playbook.md): practical paths for getting a team, a product, or a community to try Stable Harness.
- [Market positioning](../product/market-positioning.md): how to explain Stable Harness relative to agent frameworks, protocol gateways, and orchestration systems.
- [Product boundary](../product-boundary.md): the guardrails that keep the project framework-generic and passthrough-first.
- [Compatibility matrix](../compatibility-matrix.md): current backend and protocol surfaces.

## Engineering References

- [Implementation blueprint](../implementation-blueprint.md)
- [Adapter contract](../adapter-contract.md)
- [Engineering rules](../engineering-rules.md)
- [OpenAI-compatible protocol](../protocols/openai-compatible.md)
- [LangGraph-compatible protocol](../protocols/langgraph-compatible.md)
- [HTTP runtime protocol](../protocols/http-runtime.md)

The short version: keep execution semantics upstream-native, define product inventory in YAML, and use Stable Harness for lifecycle, governance, observability, recovery, tool reliability, protocol access, and operator control.
@@ -0,0 +1,126 @@
# Integration Guide

Stable Harness can be used three ways: as a CLI, as an embedded runtime, or as a protocol facade. All three paths use the same workspace inventory and runtime boundary.

## CLI Integration

Use the CLI while developing or validating a workspace:

```bash
stable-harness -w ./workspace
stable-harness -w ./workspace --agent orchestra "Review the latest release evidence."
stable-harness -w ./workspace --agent orchestra --tool echo_tool --tool-args-json '{"value":"hello"}'
```

Render inventory without executing a request:

```bash
stable-harness agent render orchestra -w ./workspace
stable-harness workflow render review-shell -w ./workspace
stable-harness workflow inspect review-shell -w ./workspace
```

The CLI is intentionally thin. It loads the workspace, creates the tool gateway, starts memory services, registers adapters, and calls the core runtime.

## Embedded Runtime

Use the SDK when Stable Harness runs inside your service:

```ts
import { createStableHarnessRuntime } from "stable-harness";

const runtime = await createStableHarnessRuntime("/srv/my-agent-workspace");

const response = await runtime.request({
  input: "Review the current release evidence.",
  agentId: "orchestra",
  sessionId: "customer-123",
});

console.log(response.output);
```

The runtime exposes operator surfaces for applications that need more than a single final answer:

- `subscribe`: stream runtime events
- `inspect`: inspect runtime inventory and state
- `getRun`: read one run record
- `listRequests`: list historical or active requests
- `listSessions`: inspect session state
- `inspectRequest`: inspect one request lifecycle
- `cancel`: stop active work
- `stop`: close the runtime

Use those surfaces to build product UI around requests, approvals, event traces, artifacts, and memory lifecycle.

## OpenAI-Compatible Facade

Start the server:

```bash
stable-harness start -w ./workspace --host 127.0.0.1 --port 8642
```

Point compatible clients at:

```text
http://127.0.0.1:8642/v1
```

Use this path when an existing product or evaluation harness already speaks OpenAI-compatible chat completions. Stable Harness still owns workspace loading, runtime events, tool-gateway policy, memory lifecycle, and adapter selection.

## HTTP Runtime Protocol

Use the HTTP runtime protocol when the caller needs Stable Harness concepts directly: request IDs, sessions, traces, approvals, cancellation, and runtime inspection.

See [HTTP runtime protocol](../protocols/http-runtime.md).

## Backend Adapters

Adapters translate Stable Harness runtime requests into upstream backend calls.

Current public adapter surfaces:

- DeepAgents: primary agent backend
- LangGraph: workflow and graph topology adapter
- Custom adapters: implement the runtime adapter contract

Adapter rule: preserve upstream execution semantics first. Add wrappers only for runtime lifecycle, governance, observability, persistence, recovery, protocol access, memory lifecycle, or tool-gateway control.

## Tool Gateway

The tool gateway is the boundary where Stable Harness can validate, repair, authorize, execute, and observe tool calls.

The default CLI path configures BetterCall repair mode for registered tools. This helps small models recover malformed arguments, missing wrappers, or near-miss tool-call shapes without allowing arbitrary execution.

Repair remains constrained:

- registered tool inventory is authoritative
- schemas and semantic validators are authoritative
- approval and sandbox policy can block execution
- repaired calls remain visible through runtime events

## Production Embedding Checklist

- Keep a workspace directory next to the service deployment.
- Keep secrets in environment variables or the service secret manager.
- Start one runtime per workspace or tenant boundary.
- Subscribe to events and persist the run IDs your product exposes.
- Treat request cancellation and timeout as product-level controls.
- Run a registry/package smoke test before publishing a service image.
- Validate OpenAI-compatible behavior only through the facade when that is the public contract your callers use.
@@ -0,0 +1,153 @@
# Operator Runbook

This runbook is for teams operating a Stable Harness workspace in development, CI, or production-like environments.

## Validate The Package

For framework development:

```bash
npm run check:rules
npm run check
npm test
```

Before publishing:

```bash
npm run release:pack
npm run release:smoke
```

`release:smoke` installs the packed package into an isolated temporary project, initializes a workspace, and runs a CLI tool call. It catches missing published dependencies that monorepo tests can hide.

## Validate A Workspace

Show workspace status:

```bash
stable-harness -w ./workspace
```

Render agent inventory:

```bash
stable-harness agent render orchestra -w ./workspace
```

Render or inspect workflow topology:

```bash
stable-harness workflow render review-shell -w ./workspace
stable-harness workflow inspect review-shell -w ./workspace
```

Run a direct tool smoke:

```bash
stable-harness -w ./workspace --agent orchestra --tool echo_tool --tool-args-json '{"value":"operator-smoke"}'
```

Run a model-backed request:

```bash
stable-harness -w ./workspace --agent orchestra --trace "Use the echo tool with operator-smoke."
```

## Start And Stop Protocol Serving

Start:

```bash
stable-harness start -w ./workspace --host 127.0.0.1 --port 8642
```

Stop:

```bash
stable-harness stop -w ./workspace
```

Use `STABLE_HARNESS_OPENAI_HOST`, `STABLE_HARNESS_OPENAI_PORT`, and `STABLE_HARNESS_OPENAI_API_KEY` when the server is managed by an environment wrapper.

## Read Runtime Evidence

Prefer structured runtime evidence over final prose:

- request ID
- session ID
- runtime event stream
- tool invocation events
- repaired tool-call diagnostics
- approval decisions
- memory lifecycle events
- artifacts and exported traces

Final answers are user-facing presentation. Runtime events are the operator record.

## Common Failures

### The CLI cannot find a package after publishing

Run:

```bash
npm run release:smoke
```

If the smoke fails only after isolated install, add the missing dependency to the root package that is published to npm, not only to a workspace child package.

### A tool exists in code but is unavailable to the agent

Check both surfaces:

- the tool module under `resources/tools`
- the `kind: Tool` inventory document

The runtime only exposes registered inventory.

### A model can answer but does not call tools reliably

Check the tool schema and the gateway events first. BetterCall repair can fix malformed calls, but it cannot authorize unknown tools or infer product intent from prose.

### A workflow runs the wrong topology

Render it:

```bash
stable-harness workflow render <workflow-id> -w ./workspace
```

Workflow edges are static control-plane topology. They should not depend on user phrasing or benchmark-specific cases.

### Output looks correct but production behavior is uncertain

Run the exact public path your users call:

- CLI if the CLI is the product surface
- OpenAI-compatible facade if clients use `/v1`
- embedded runtime if your service calls the SDK

Do not treat a nearby internal smoke as proof of a different public contract.

## Release Evidence

A release is ready when the evidence includes:

- project rules pass
- TypeScript check passes
- full tests pass
- package dry-run passes
- isolated npm install smoke passes
- the published version can initialize and run a real workspace
- GitHub release and code-quality checks are green