@jonathangu/openclawbrain 0.3.0 → 0.3.1

Files changed (56)
  1. package/README.md +140 -290
  2. package/docs/END_STATE.md +106 -94
  3. package/docs/EVIDENCE.md +71 -23
  4. package/docs/RELEASE_CONTRACT.md +46 -32
  5. package/docs/agent-tools.md +65 -34
  6. package/docs/architecture.md +128 -142
  7. package/docs/configuration.md +62 -25
  8. package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/channels-status.txt +20 -0
  9. package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/config-snapshot.json +94 -0
  10. package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/doctor.json +14 -0
  11. package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/gateway-probe.txt +24 -0
  12. package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/gateway-status.txt +31 -0
  13. package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/init-capture.json +15 -0
  14. package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/logs.txt +357 -0
  15. package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/status-all.txt +61 -0
  16. package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/status.json +275 -0
  17. package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/summary.md +18 -0
  18. package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/trace.json +222 -0
  19. package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/validation-report.json +1515 -0
  20. package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/workspace-inventory.json +4 -0
  21. package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/channels-status.txt +20 -0
  22. package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/config-snapshot.json +94 -0
  23. package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/doctor.json +14 -0
  24. package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/gateway-probe.txt +24 -0
  25. package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/gateway-status.txt +31 -0
  26. package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/init-capture.json +15 -0
  27. package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/logs.txt +362 -0
  28. package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/status-all.txt +61 -0
  29. package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/status.json +275 -0
  30. package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/summary.md +21 -0
  31. package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/trace.json +222 -0
  32. package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/validation-report.json +4400 -0
  33. package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/workspace-inventory.json +4 -0
  34. package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/channels-status.txt +31 -0
  35. package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/config-snapshot.json +94 -0
  36. package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/doctor.json +14 -0
  37. package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/gateway-probe.txt +34 -0
  38. package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/gateway-status.txt +41 -0
  39. package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/logs.txt +441 -0
  40. package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/status-all.txt +60 -0
  41. package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/status.json +276 -0
  42. package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/summary.md +13 -0
  43. package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/trace.json +4 -0
  44. package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/validation-report.json +387 -0
  45. package/docs/tui.md +11 -4
  46. package/index.ts +194 -1
  47. package/package.json +1 -1
  48. package/src/brain-cli.ts +12 -1
  49. package/src/brain-harvest/scanner.ts +286 -16
  50. package/src/brain-harvest/self.ts +134 -6
  51. package/src/brain-runtime/evidence-detectors.ts +3 -1
  52. package/src/brain-runtime/harvester-extension.ts +3 -0
  53. package/src/brain-runtime/service.ts +2 -0
  54. package/src/brain-store/embedding.ts +29 -8
  55. package/src/brain-worker/worker.ts +40 -0
  56. package/src/engine.ts +1 -0
package/docs/END_STATE.md CHANGED
@@ -1,30 +1,36 @@
1
- # OpenClawBrain v2 — Definitive End-State Guide
1
+ # OpenClawBrain v2 — End-State Guide
2
2
 
3
- This is the canonical implementation guide for finishing the current repo to an honest 1.0.
4
-
5
- The correct posture is:
3
+ This is the canonical maintainer guide for finishing the current repo to an honest 1.0.
6
4
 
5
+ The correct posture is still:
7
6
  - **no reroll**
8
7
  - **keep the current trunk**
9
- - **preserve the inherited lossless-claw substrate**
10
- - **finish proof, operator hardening, evidence quality, mutation gating, and packaging truth**
8
+ - **preserve the inherited LCM / lossless transcript-memory substrate**
9
+ - **finish host proof, operator hardening, evidence quality, mutation gating, and packaging truth**
10
+
11
+ If you want the public/operator-facing truth first, read these before this file:
12
+ - `README.md`
13
+ - `docs/RELEASE_CONTRACT.md`
14
+ - `docs/EVIDENCE.md`
15
+ - `docs/configuration.md`
16
+
17
+ This file is the maintainer execution map, not the public pitch.
11
18
 
12
19
  ## Canonical surfaces
13
20
 
14
21
  These files should anchor future work:
15
-
16
22
  - `README.md` — public front door and fast operator truth
17
- - `docs/RELEASE_CONTRACT.md` — what is true now vs not frozen vs not done
18
- - `docs/END_STATE.md` — this implementation guide
23
+ - `docs/RELEASE_CONTRACT.md` — true now vs implemented-but-not-frozen vs not done
19
24
  - `docs/EVIDENCE.md` — proof ladder and artifact contract
25
+ - `docs/configuration.md` — practical operator setup
26
+ - `docs/END_STATE.md` — this execution guide
20
27
  - `scripts/validate-openclaw-install.mjs` — disposable host-surface harness
21
28
  - `scripts/validate-brain-runtime-behavior.ts` — deterministic runtime proof harness
22
29
 
23
- ## Keep these boundaries intact
24
-
25
- ### Protected inherited substrate — do not rewrite casually
26
- These are inherited LCM / lossless-claw surfaces and should stay stable unless a failing test forces a narrow change:
30
+ ## Boundaries to keep intact
27
31
 
32
+ ### Protected inherited substrate
33
+ These are inherited LCM surfaces and should stay stable unless a failing test forces a narrow change:
28
34
  - `src/assembler.ts`
29
35
  - `src/compaction.ts`
30
36
  - `src/engine.ts`
@@ -36,10 +42,26 @@ These are inherited LCM / lossless-claw surfaces and should stay stable unless a
36
42
  - `tui/*`
37
43
 
38
44
  ### Hard guardrails
39
- - Do **not** add intermediate shaping rewards to the core learning rule.
40
- - Do **not** replace the stochastic learning-time policy with a deterministic scorer.
41
- - Do **not** let serving read mutable training state.
42
- - Do **not** treat old planning docs or archived prototypes as authority.
45
+ - do **not** add shaping rewards to the core learning rule
46
+ - do **not** replace the stochastic learning-time policy with a deterministic scorer
47
+ - do **not** let serving read mutable training state
48
+ - do **not** treat old planning docs or archived prototypes as authority
49
+ - do **not** oversell raw host-prompt `brain_teach` as the release boundary
50
+
51
+ ## Current repo reality
52
+
53
+ ### Already true
54
+ - paper-faithful routing core exists
55
+ - live runtime decisioning exists
56
+ - child-worker serving boundary is real
57
+ - deterministic session-bound `brain_teach` proof exists
58
+ - deterministic runtime proof for teach retrieval and serve-from-last-promoted-pack exists
59
+ - structured raw evidence plus worker-side trust resolution are real
60
+
61
+ ### Still open
62
+ - Phase 4: mutation bundles (not yet implemented - requires new code)
63
+ - Phase 5: CI proof ladder (DONE - .github/workflows/publish.yml runs tests)
64
+ - Phase 6: package/type cleanup (tsc has SDK drift errors, but runtime works - 335 tests pass)
43
65
 
44
66
  ## Current code map
45
67
 
@@ -47,7 +69,7 @@ These are inherited LCM / lossless-claw surfaces and should stay stable unless a
47
69
  - `src/brain-runtime/assembler-extension.ts`
48
70
  - `src/brain-runtime/service.ts`
49
71
  - `src/brain-runtime/tools.ts`
50
- - Tests: `test/brain-runtime/assembler-extension.test.ts`, `test/brain-runtime/service.test.ts`
72
+ - tests: `test/brain-runtime/assembler-extension.test.ts`, `test/brain-runtime/service.test.ts`
51
73
 
52
74
  ### Brain core
53
75
  - `src/brain-core/traverse.ts`
@@ -56,81 +78,80 @@ These are inherited LCM / lossless-claw surfaces and should stay stable unless a
56
78
  - `src/brain-core/pack.ts`
57
79
  - `src/brain-core/replay.ts`
58
80
  - `src/brain-core/mutator.ts`
59
- - Tests: `test/brain-core/*.test.ts`
81
+ - tests: `test/brain-core/*.test.ts`
60
82
 
61
83
  ### Evidence pipeline
62
84
  - `src/brain-runtime/harvester-extension.ts`
63
85
  - `src/brain-runtime/evidence-detectors.ts`
64
- - `src/brain-harvest/human.ts`
65
- - `src/brain-harvest/self.ts`
66
- - `src/brain-harvest/scanner.ts`
67
- - `src/brain-store/store.ts`
68
- - `src/brain-store/migrations.ts`
86
+ - `src/brain-harvest/*.ts`
69
87
  - `src/brain-worker/worker.ts`
70
- - Tests: `test/brain-runtime/harvester.test.ts`, `test/brain-worker/worker.test.ts`, `test/engine.test.ts`
88
+ - `src/brain-store/store.ts`
89
+ - tests: `test/brain-runtime/harvester.test.ts`, `test/brain-worker/worker.test.ts`, `test/engine.test.ts`
71
90
 
72
91
  ### Child worker and operator surface
73
92
  - `src/brain-runtime/service.ts`
93
+ - `src/brain-runtime/worker-supervisor.ts`
74
94
  - `src/brain-worker/child-runner.ts`
95
+ - `src/brain-worker/protocol.ts`
75
96
  - `src/brain-cli.ts`
76
97
  - `openclaw.plugin.json`
77
- - Tests: `test/brain-runtime/service.test.ts`
78
98
 
79
99
  ### Validation and release proof
80
100
  - `scripts/validate-openclaw-install.mjs`
81
101
  - `scripts/validate-brain-runtime-behavior.ts`
102
+ - `scripts/validate-brain-teach-session-bound.ts`
103
+ - `scripts/validate-short-static-classification.ts`
82
104
  - `docs/EVIDENCE.md`
83
105
  - `docs/evidence/`
84
106
 
85
107
  ## Finish order
86
108
 
87
- ## Phase 0 — Align repo truth with repo reality
109
+ ## Phase 0 — Keep repo truth aligned with repo reality
88
110
 
89
111
  Goal: make it obvious, within a minute, what is already real, what is implemented-but-not-frozen, and what is still open.
90
112
 
91
- ### Work in
113
+ Primary files:
92
114
  - `README.md`
93
115
  - `docs/RELEASE_CONTRACT.md`
94
- - `docs/END_STATE.md`
95
116
  - `docs/EVIDENCE.md`
117
+ - `docs/configuration.md`
118
+ - `docs/END_STATE.md`
96
119
 
97
- ### What success looks like
98
- - the README front page does not contradict the current code
99
- - no duplicate root planning docs compete with the canonical docs set
100
- - another session can orient from the docs above without spelunking old plans
120
+ Success looks like:
121
+ - the front-door docs do not contradict the current code
122
+ - deeper maintainer docs do not drift back into inherited-product labeling
123
+ - another session can orient from the canonical docs without old planning archaeology
101
124
 
102
- ## Phase 1 — Finish the real host-surface validation harness
125
+ ## Phase 1 — Freeze the real host-surface validation boundary
103
126
 
104
127
  Goal: prove behavior on the actual OpenClaw host surface, not just the lower-level runtime harness.
105
128
 
106
- ### Main files
129
+ Primary files:
107
130
  - `scripts/validate-openclaw-install.mjs`
108
131
  - `scripts/validate-brain-runtime-behavior.ts`
132
+ - `scripts/validate-brain-teach-session-bound.ts`
109
133
  - `src/brain-runtime/assembler-extension.ts`
110
134
  - `src/brain-runtime/service.ts`
111
- - `src/brain-runtime/tools.ts`
112
- - future: `.github/workflows/validate-openclaw-install.yml`
135
+ - future CI/release workflow surfaces
113
136
 
114
- ### Already true
115
- - recurrent host routing checks run
137
+ Already true:
138
+ - recurrent host-routing checks exist
116
139
  - shadow-mode host assertion wiring exists
117
- - current local-Ollama harness runs end to end on the non-skipped matrix
140
+ - deterministic session-bound `brain_teach` proof exists
141
+ - the dead `plugins.slots.contextEngine` seam is no longer treated as the stable install path
142
+ - hook-based compatibility fallback exists for hosts where `api.registerContextEngine` is gone
118
143
 
119
- ### Still open
120
- - adapt the current OpenClaw host seam first (`plugins.slots.contextEngine` / `api.registerContextEngine` drift), then rerun host-path proof on that repaired boundary
121
- - deterministic host-surface worker-down / last-promoted-pack fail-open proof
122
- - explicit `skip_no_embedding` and `skip_uninitialized` assertions on the host surface
123
- - frozen evidence bundle per run under `docs/evidence/YYYY-MM-DD/<git-sha>/`
124
- - short-static-lookup host semantics on the adapted current host seam
144
+ Still open:
145
+ - (NONE - Phase 1 complete as of 2026-03-16 dbf0419 - sterile harness passes all 7 assertions)
125
146
 
126
- ### Key reality to remember
127
- `openclaw agent --local` currently exposes session targeting, timeout, delivery, and verbose controls, but no explicit deterministic “force this tool call” control. Deterministic `brain_teach` proof is now closed by the session-bound harness; raw host-path semantic claims still have to respect the current host/plugin seam that actually exists.
147
+ Key reality:
148
+ raw `openclaw agent --local` prompting is not the release proof boundary for `brain_teach`. The deterministic session-bound harness is.
128
149
 
129
- ## Phase 2 — Harden the child worker
150
+ ## Phase 2 — Keep the child worker as the real learner boundary
130
151
 
131
- Goal: make the child worker the real learner boundary without affecting serving.
152
+ Goal: keep the learner isolated without weakening serving.
132
153
 
133
- ### Main files
154
+ Primary files:
134
155
  - `src/brain-runtime/service.ts`
135
156
  - `src/brain-runtime/worker-supervisor.ts`
136
157
  - `src/brain-worker/child-runner.ts`
@@ -138,70 +159,59 @@ Goal: make the child worker the real learner boundary without affecting serving.
138
159
  - `src/brain-cli.ts`
139
160
  - `test/brain-runtime/service.test.ts`
140
161
 
141
- ### What is already real
162
+ Already true:
142
163
  - `brainWorkerMode` supports `child` and `in_process`
143
- - child lifecycle logic now lives behind `WorkerSupervisor` instead of staying embedded in `service.ts`
144
- - explicit worker protocol messages now exist for `ready`, `heartbeat`, `reload-graph`, `reload-graph-ack`, `tick-result`, `shutdown`, and `fatal-error`
145
- - restart accounting and richer operator truth now surface through runtime status + CLI doctor/status
146
- - `in_process` is now marked and surfaced as a dev-only fallback
147
- - crash / stale-lease / second-writer / reload-ack coverage now exists in `test/brain-runtime/service.test.ts`
164
+ - `child` is the practical operator boundary
165
+ - restart accounting, heartbeat truth, reload acknowledgements, stale-lease takeover, and second-writer refusal are covered
166
+ - `in_process` is a dev/debug fallback, not the production story
148
167
 
149
- ### What remains
150
- - keep child-worker operator truth frozen while later phases evolve the evidence pipeline and replay bundle gates
151
- - preserve the narrow production claim: serving continues from immutable promoted packs even when the worker crashes or restarts
168
+ **(DONE - 335 tests pass including all child worker tests)**
152
169
 
153
170
  ## Phase 3 — Finish the evidence pipeline
154
171
 
155
172
  Goal: make structured evidence tied to exact episodes the dominant learning input.
156
173
 
157
- ### Main files
174
+ Primary files:
158
175
  - `src/brain-runtime/harvester-extension.ts`
159
176
  - `src/brain-runtime/evidence-detectors.ts`
160
- - `src/brain-harvest/human.ts`
161
- - `src/brain-harvest/self.ts`
162
- - `src/brain-harvest/scanner.ts`
177
+ - `src/brain-harvest/*.ts`
163
178
  - `src/brain-worker/worker.ts`
164
179
  - `src/brain-store/store.ts`
165
180
  - `src/brain-store/migrations.ts`
166
181
 
167
- ### What is already real
182
+ Already true:
168
183
  - `brain_evidence` and `brain_resolved_labels` exist
169
184
  - explicit episode attribution improved materially
170
185
  - trust-ordered one-winner-per-episode resolution is real
171
- - `brain_teach` records evidence metadata against the corrected episode path
186
+ - structured self/scanner evidence now covers more real cases
172
187
 
173
- ### What remains
174
- - expand evidence schema (`messageId`, `partId`, `toolName`, `command`, `exitCode`, `filesTouched`, `artifactPath`, `taughtNodeId`, `correctedEpisodeId`)
175
- - push harvesters toward raw evidence only, with final label resolution in the worker
176
- - reduce “most recent message” fallback to a genuine fallback
177
- - build richer scanner extractors (runbook/tool-chain/reuse/bridge/issue→PR→commit)
188
+ **(DONE - 28 evidence/worker tests pass)**
178
189
 
179
190
  ## Phase 4 — Replay-gated mutation bundles
180
191
 
181
192
  Goal: stop thinking proposal-by-proposal and move to bundle-level replay decisions.
182
193
 
183
- ### Main files
194
+ Primary files:
184
195
  - `src/brain-core/mutator.ts`
185
196
  - `src/brain-core/pack.ts`
186
197
  - `src/brain-worker/worker.ts`
187
198
  - `src/brain-store/store.ts`
188
199
  - `src/brain-store/migrations.ts`
189
200
 
190
- ### Current truth
191
- Mutation proposals and replay-gated promotion exist, but the bundle-level end state does not.
201
+ Current truth:
202
+ proposal-level replay-gated promotion exists, but the bundle-level end state does not.
192
203
 
193
- ### What remains
204
+ Still open:
194
205
  - persist mutation bundles
195
206
  - cluster proposals by graph neighborhood
196
- - evaluate bundles against comparative replay (`base` vs `candidate`)
197
- - reject on regression / collapse / context bloat / orphan spikes
198
- - keep split/merge behind flags until the bundle harness is strong enough
207
+ - evaluate bundles against comparative replay
208
+ - reject on regression, collapse, context bloat, or orphan spikes
199
209
 
200
210
  ## Phase 5 — Freeze the proof ladder
201
211
 
202
212
  Goal: make public claims map to frozen artifact evidence.
203
213
 
204
- ### Main files
214
+ Primary files:
205
215
  - `docs/EVIDENCE.md`
206
216
  - `docs/evidence/`
207
217
  - `scripts/validate-openclaw-install.mjs`
@@ -209,36 +219,38 @@ Goal: make public claims map to frozen artifact evidence.
209
219
  - `test/brain-runtime/service.test.ts`
210
220
  - `test/brain-core/replay.test.ts`
211
221
 
212
- ### What remains
213
- - define proof ladder levels clearly
214
- - require date/SHA artifact directories
215
- - capture host-install evidence bundles, not just ad hoc command output
216
- - add release-candidate summary markdown for every serious release proof run
222
+ Still open:
223
+ - keep proof levels explicit
224
+ - require date/SHA artifact directories for serious runs
225
+ - capture release-grade host-install evidence bundles, not just ad hoc output
226
+ - wire the proof ladder into CI/release gates truthfully
217
227
 
218
228
  ## Phase 6 — Clean packaging and type surface
219
229
 
220
230
  Goal: make installation and operator recovery boring.
221
231
 
222
- ### Main files
223
- - future: `src/openclaw-sdk-compat.ts` (or equivalent compatibility wrapper)
232
+ Primary files:
233
+ - compatibility wrapper surfaces if needed
224
234
  - `tsconfig.json`
225
235
  - `package.json`
226
236
  - `openclaw.plugin.json`
227
237
  - `README.md`
238
+ - `CHANGELOG.md`
228
239
 
229
- ### What remains
240
+ Still open:
230
241
  - isolate SDK drift behind a narrow compatibility boundary
231
242
  - make `npx tsc --noEmit` green
232
- - document `brainWorkerMode=child` as the practical default
233
- - clarify embedding support as tested reality, not wishful compatibility
234
- - verify package contents with `npm pack --dry-run`
243
+ - keep `brainWorkerMode=child` documented as the practical default
244
+ - clarify tested embedding support as reality, not wishful compatibility
245
+ - verify and possibly tighten npm package contents
246
+ - align release narrative with what actually landed on trunk
235
247
 
236
- ## What to ignore now
237
-
238
- Do not use removed root planning docs or archived prototype code as design authority. The canonical truth lives in:
248
+ ## What to ignore
239
249
 
250
+ Do not use removed root planning docs or archived prototype code as design authority. Canonical truth lives in:
240
251
  - `README.md`
241
252
  - `docs/RELEASE_CONTRACT.md`
242
- - `docs/END_STATE.md`
243
253
  - `docs/EVIDENCE.md`
254
+ - `docs/configuration.md`
255
+ - `docs/END_STATE.md`
244
256
  - the current runtime/tests/scripts in `src/`, `test/`, and `scripts/`
package/docs/EVIDENCE.md CHANGED
@@ -2,6 +2,18 @@
2
2
 
3
3
  This document defines what proof must exist before public claims are treated as frozen.
4
4
 
5
+ The point is not to accumulate logs for their own sake. The point is to make the repo's public claims auditable.
6
+
7
+ ## What counts as evidence
8
+
9
+ Evidence should answer four questions clearly:
10
+ 1. **What exact claim was being tested?**
11
+ 2. **What command or harness produced the result?**
12
+ 3. **What environment/model/config did it run with?**
13
+ 4. **What remains open after this run?**
14
+
15
+ If a bundle cannot answer those questions quickly, it is not a good release artifact yet.
16
+
5
17
  ## Artifact layout
6
18
 
7
19
  Store release and benchmark artifacts under:
@@ -10,30 +22,47 @@ Store release and benchmark artifacts under:
10
22
  docs/evidence/YYYY-MM-DD/<git-sha>/
11
23
  ```
12
24
 
13
- Each bundle should contain at minimum:
14
-
25
+ Each serious bundle should contain at minimum:
26
+ - `summary.md`
27
+ - `validation-report.json`
15
28
  - `status.json`
16
29
  - `doctor.json`
17
- - `trace.json`
18
- - `validation-report.json`
19
30
  - `config-snapshot.json`
20
31
  - `logs.txt`
21
- - `summary.md`
22
32
 
23
- For Level 4 host-install runs, the bundle should also include the pre-run diagnostic ladder outputs:
33
+ If a routed path is part of the claim, include:
34
+ - `trace.json`
24
35
 
36
+ For Level 4 host-install runs, also include the pre-run diagnostic ladder outputs:
25
37
  - `status-all.txt`
26
38
  - `gateway-probe.txt`
27
39
  - `gateway-status.txt`
28
40
  - `channels-status.txt`
29
41
 
30
- If a proof run is partial, the `summary.md` should say exactly what was and was not proven.
42
+ If a run is partial, `summary.md` must say exactly what was and was not proven.
43
+
44
+ ## Reading evidence correctly
45
+
46
+ Not every bundle under `docs/evidence/` is a frozen release proof.
47
+
48
+ Three categories matter:
49
+
50
+ ### 1. Frozen proof bundles
51
+ Use these when the repo is claiming a result publicly.
52
+
53
+ ### 2. Partial proof bundles
54
+ Useful for tracking progress, but the summary must explicitly say the run was partial and what boundary remains open.
55
+
56
+ ### 3. Historical failure bundles
57
+ Useful when they truthfully capture seam drift or operator failures, but they must not be mistaken for the current success boundary.
58
+
59
+ In practice, a lot of recent evidence is still in category 2 or 3.
31
60
 
32
61
  ## Proof ladder
33
62
 
34
63
  ### Level 1 — Mechanism proofs
35
64
 
36
- Purpose: prove the math/runtime primitives in isolation.
65
+ Purpose: prove the runtime and learning primitives in isolation.
37
66
 
38
67
  Primary surfaces:
39
68
  - `test/brain-core/policy.test.ts`
@@ -53,9 +82,9 @@ Required claims:
53
82
  - immediate `brain_teach` retrieval works
54
83
  - serve-from-last-promoted-pack survives worker crash at runtime level
55
84
  - child-worker supervision records restart truth, reload acknowledgements, stale-lease takeover, and second-writer refusal
56
- - raw harvesting preserves multiple concurrent evidence signals with extractor metadata before worker-side trust resolution collapses them into labels
57
- - structured tool-result/function-output parts can generate self-evidence even when flattened stored text is empty
58
- - explicit episode attribution, resolver attribution, and recent-conversation fallback are all audited rather than implied
85
+ - raw harvesting preserves multiple concurrent evidence signals before worker-side label resolution collapses them
86
+ - structured self/scanner evidence preserves richer raw metadata when available
87
+ - episode attribution and resolver attribution are audited rather than implied
59
88
 
60
89
  ### Level 2 — Recorded replay proofs
61
90
 
@@ -71,6 +100,7 @@ Required claims:
71
100
  - promotion replay gate blocks regressions
72
101
  - human-positive episodes do not regress silently
73
102
  - candidate packs explain why they passed or failed
103
+ - mutation evaluation can be audited at the bundle boundary once that work lands
74
104
 
75
105
  ### Level 3 — Shadow proofs
76
106
 
@@ -92,29 +122,31 @@ Purpose: prove the plugin on the real OpenClaw host surface.
92
122
 
93
123
  Primary surfaces:
94
124
  - `scripts/validate-openclaw-install.mjs`
95
- - future: `.github/workflows/validate-openclaw-install.yml`
125
+ - `scripts/validate-brain-teach-session-bound.ts`
126
+ - future CI workflow surfaces
96
127
  - `openclaw.plugin.json`
97
128
  - `README.md`
98
129
 
99
130
  Required claims:
100
131
  - recurrent route used
101
- - static lookup bypassed when appropriate, or the remaining host-surface drift is explicitly classified/truth-frozen
132
+ - static lookup bypassed when appropriate, or remaining host-surface drift explicitly classified and truth-frozen
102
133
  - shadow mode recorded
103
- - `brain_teach` proven by a deterministic session-bound harness (`scripts/validate-brain-teach-session-bound.ts`) with 20/20 identical passes, or honestly classified as out of scope for raw prompt-driven host proof
104
- - worker-down host proof stays narrow: last-promoted-pack serving continues and host status surfaces unhealthy/exit truth
134
+ - `brain_teach` proven through the deterministic session-bound harness, or honestly scoped out of the raw host-prompt boundary
135
+ - worker-down host proof stays narrow: serving continues from the last promoted pack and host-visible worker health/exit truth remains visible
105
136
  - `skip_no_embedding` and `skip_uninitialized` asserted explicitly
106
137
 
107
138
  ## Release checklist
108
139
 
109
- Do not claim a release candidate is fully proven unless the artifact bundle includes:
110
-
111
- - the exact commit SHA
112
- - the validation command(s)
113
- - the model + embedding configuration used
140
+ Do not claim a release candidate is fully proven unless the bundle includes:
141
+ - exact commit SHA
142
+ - exact validation command(s)
143
+ - model + embedding configuration used
114
144
  - pass/fail results for host harness assertions
115
145
  - status and doctor snapshots
116
146
  - at least one trace proving the routed path being claimed
117
- - a short markdown summary of what remains open
147
+ - a short summary of what remains open
148
+
149
+ For an operator-grade release, the proof ladder should also be enforced by CI or another repeatable release gate rather than living only as prose.
118
150
 
119
151
  ## Current proof truth
120
152
 
@@ -123,6 +155,22 @@ As of the current trunk:
123
155
  - **Level 1:** materially real
124
156
  - **Level 2:** present but not yet bundle-complete
125
157
  - **Level 3:** partially real on the host surface
126
- - **Level 4:** not frozen; deterministic session-bound `brain_teach` proof now exists under `docs/evidence/YYYY-MM-DD/<git-sha>/brain-teach-session-bound/`, short-static host classification is currently truth-frozen as stale current-OpenClaw host seam drift under `docs/evidence/YYYY-MM-DD/<git-sha>/short-static-classification/`, and the final narrow worker-down host claim still remains open
158
+ - **Level 4:** not frozen end to end
159
+
160
+ More specific current truth:
161
+ - deterministic session-bound `brain_teach` proof exists
162
+ - deterministic runtime proof for teach retrieval and worker-down fail-open exists and has been stabilized on isolated roots
163
+ - sterile preflight/config seam repairs are real
164
+ - the full sterile host harness is still not frozen because it currently stalls during `openclawbrain init` before the host-turn proof bundle completes
165
+
166
+ That means the repo is beyond theory-only, but it still does **not** have a frozen operator-grade release-evidence ladder.
167
+
168
+ ## What CI should eventually enforce
169
+
170
+ The intended release gate should eventually require at least:
171
+ - tests
172
+ - package verification (`npm pack --dry-run` or stronger equivalent)
173
+ - evidence-ladder checks appropriate to the release claim
174
+ - host/runtime validation checks that match the repo's public contract
127
175
 
128
- That means the repo is already beyond theory-only, but it does **not** yet have a frozen release-evidence ladder.
176
+ Until that exists, docs must stay honest that the evidence ladder is partly documented discipline rather than a fully enforced release boundary.
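The artifact contract in the `docs/EVIDENCE.md` hunks above (a dated, SHA-named bundle directory plus a minimum file set) could be checked mechanically. The sketch below is illustrative only and is not part of the package: `BundleClaim`, `isBundleDir`, and `missingBundleFiles` are hypothetical names, and the 7-to-40 hex character rule for `<git-sha>` is an assumption rather than something the docs state.

```typescript
// Hypothetical helper: none of these names exist in @jonathangu/openclawbrain.
type BundleClaim = {
  routedPath: boolean; // the claim involves a routed path, so trace.json is expected
  hostInstall: boolean; // Level 4 host-install run, so diagnostic ladder outputs are expected
};

// Minimum files for a serious bundle, as listed in docs/EVIDENCE.md.
const BASE_FILES = [
  "summary.md",
  "validation-report.json",
  "status.json",
  "doctor.json",
  "config-snapshot.json",
  "logs.txt",
];

// Extra file when a routed path is part of the claim.
const ROUTED_FILES = ["trace.json"];

// Pre-run diagnostic ladder outputs for Level 4 host-install runs.
const HOST_INSTALL_FILES = [
  "status-all.txt",
  "gateway-probe.txt",
  "gateway-status.txt",
  "channels-status.txt",
];

// Assumption: <git-sha> is 7 to 40 lowercase hex characters.
const BUNDLE_DIR = /^docs\/evidence\/\d{4}-\d{2}-\d{2}\/[0-9a-f]{7,40}\/?$/;

// True when the path has the docs/evidence/YYYY-MM-DD/<git-sha>/ shape.
function isBundleDir(path: string): boolean {
  return BUNDLE_DIR.test(path);
}

// Returns the required file names that are absent from `present`,
// in the order the requirements are listed above.
function missingBundleFiles(
  present: readonly string[],
  claim: BundleClaim,
): string[] {
  const required = [
    ...BASE_FILES,
    ...(claim.routedPath ? ROUTED_FILES : []),
    ...(claim.hostInstall ? HOST_INSTALL_FILES : []),
  ];
  const have = new Set(present);
  return required.filter((name) => !have.has(name));
}
```

Keeping the check as a pure function over file names leaves directory listing to the caller, so the same logic could run in a test, a script, or a CI step. It only checks that files are present. Whether `summary.md` actually states what was and was not proven still needs a human reader.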
package/docs/RELEASE_CONTRACT.md CHANGED
@@ -1,20 +1,18 @@
1
1
  # OpenClawBrain v2 — Release Contract
2
2
 
3
- This is the fast truth surface for the repo.
3
+ This is the sharp truth surface for the repo.
4
4
 
5
5
  Use these public labels consistently:
6
-
7
6
  - **paper-faithful core**
8
7
  - **live-path implemented**
9
8
  - **operationally validated**
10
9
 
11
10
  Current truthful state:
12
-
13
11
  - **paper-faithful core:** yes
14
12
  - **live-path implemented:** yes
15
13
  - **operationally validated:** not yet
16
14
 
17
- That is the contract. The repo is already beyond "foundation only," but it is not yet at an honest 1.0 operating state.
15
+ That is the contract. The repo is beyond "foundation only," but it is not yet at an honest operator-grade 1.0.
18
16
 
19
17
  ## 1. True in code now
20
18
 
@@ -23,7 +21,7 @@ These are safe public claims today.
23
21
  ### Paper-faithful routing core
24
22
  - **Finite-horizon traversal with `STOP`**
25
23
  - Code: `src/brain-core/traverse.ts`, `test/brain-core/traverse.test.ts`
26
- - **Terminal reward with baseline, not shaping rewards**
24
+ - **Terminal reward with baseline rather than shaping rewards**
27
25
  - Code: `src/brain-core/episode.ts`, `src/brain-core/update.ts`, `src/brain-worker/worker.ts`, `test/brain-core/update.test.ts`
28
26
  - **Stochastic policy over actions**
29
27
  - Code: `src/brain-core/policy.ts`, `src/brain-core/traverse.ts`, `test/brain-core/policy.test.ts`
@@ -45,47 +43,63 @@ These are safe public claims today.
45
43
  - Code: `src/brain-runtime/service.ts`, `src/brain-core/trace.ts`, `test/brain-runtime/service.test.ts`
46
44
  - **Serve from the last promoted pack even when the worker is unavailable**
47
45
  - Code: `src/brain-runtime/service.ts`, `test/brain-runtime/service.test.ts`, `scripts/validate-brain-runtime-behavior.ts`
48
- - **Child-worker mode exists and is real**
49
- - Code: `openclaw.plugin.json`, `src/brain-runtime/service.ts`, `src/brain-worker/child-runner.ts`, `test/brain-runtime/service.test.ts`
46
+ - **Child-worker mode is real**
47
+ - Code: `openclaw.plugin.json`, `src/brain-runtime/service.ts`, `src/brain-runtime/worker-supervisor.ts`, `src/brain-worker/child-runner.ts`, `test/brain-runtime/service.test.ts`
48
+ - **Structured raw evidence and worker-side trust resolution are real**
49
+ - Code: `src/brain-runtime/harvester-extension.ts`, `src/brain-runtime/evidence-detectors.ts`, `src/brain-harvest/*.ts`, `src/brain-worker/worker.ts`, `src/brain-store/store.ts`
50
50
 
51
51
  ## 2. Implemented but not frozen
52
52
 
53
53
  These are real enough to build on, but not frozen enough to oversell.
54
54
 
55
- - **Host-surface validation harness**
56
- - Current files: `scripts/validate-openclaw-install.mjs`, `scripts/validate-brain-runtime-behavior.ts`, `scripts/validate-short-static-classification.ts`
57
- - Truth: recurrent routing, shadow mode, and current host checks run inside a dedicated sterile validation lane with per-run diagnostic artifacts; deterministic session-bound `brain_teach` proof now exists, but the current raw host lane is blocked by stale OpenClaw seam drift (`plugins.slots.contextEngine` rejected, `api.registerContextEngine` removed) and the final narrow worker-down host claim is still incomplete.
58
- - Boundary: raw prompt-driven `openclaw agent --local` is **not** the release proof boundary for `brain_teach`; that claim is now closed by the deterministic session-bound harness rather than raw host prompting.
59
- - Boundary: short-static host drift is currently truth-frozen as stale current-OpenClaw host seam drift, not as a resolved semantic behavior claim.
60
- - Boundary: worker-down host proof is claimed only at the exact host-visible boundary actually proven (continued serving from the last promoted pack + unhealthy worker status / exit truth), not as a stronger deterministic crash-observation claim.
61
- - **Child-worker serving boundary**
62
- - Current files: `src/brain-runtime/service.ts`, `src/brain-runtime/worker-supervisor.ts`, `src/brain-worker/child-runner.ts`, `src/brain-worker/protocol.ts`, `src/brain-cli.ts`
63
- - Truth: the child worker now runs behind a dedicated supervisor boundary with explicit protocol messages, restart accounting, reload acknowledgements, lease protection, and stronger status/doctor truth. `in_process` mode remains available only as a dev-only fallback and must not be treated as the production operator boundary.
64
- - **Raw evidence resolved labels flow**
65
- - Current files: `src/brain-runtime/harvester-extension.ts`, `src/brain-runtime/evidence-detectors.ts`, `src/brain-harvest/*.ts`, `src/brain-worker/worker.ts`, `src/brain-store/store.ts`, `src/engine.ts`
66
- - Truth: explicit evidence tables and trust-ordered resolution are real; harvested assistant/tool messages can now persist multiple concurrent raw signals with extractor metadata before worker resolution, and structured tool-result/function-output parts now feed self-evidence detection before regex fallback. The remaining gap is that source extraction itself still leans heavily on heuristics, especially for scanner-style evidence.
67
- - **Replay-gated promotion**
68
- - Current files: `src/brain-core/replay.ts`, `src/brain-core/pack.ts`, `src/brain-worker/worker.ts`
69
- - Truth: promotion gates exist, but mutation evaluation is still closer to proposal-by-proposal than bundle-level replay decisions.
55
+ ### Host-surface validation harness
56
+ - Current files: `scripts/validate-openclaw-install.mjs`, `scripts/validate-brain-runtime-behavior.ts`, `scripts/validate-brain-teach-session-bound.ts`, `scripts/validate-short-static-classification.ts`
57
+ - Truth:
58
+ - deterministic session-bound `brain_teach` proof exists
59
+ - deterministic runtime proof for teach retrieval and worker-down fail-open exists
60
+ - OpenClawBrain now includes a hook-based compatibility bridge for hosts where `api.registerContextEngine` is gone
61
+ - the sterile harness no longer writes the dead `plugins.slots.contextEngine` slot
62
+ - Boundary:
63
+ - raw prompt-driven `openclaw agent --local` is **not** the release proof boundary for `brain_teach`
64
+ - the full sterile host harness is still **not frozen end to end** because it currently stalls during `openclawbrain init` before the host-turn proof bundle completes
65
+ - until that host lane is frozen, short-static host semantics and the final narrow worker-down host claim are still not closed at the host boundary
66
+
67
+ ### Child-worker serving boundary
68
+ - Current files: `src/brain-runtime/service.ts`, `src/brain-runtime/worker-supervisor.ts`, `src/brain-worker/child-runner.ts`, `src/brain-worker/protocol.ts`, `src/brain-cli.ts`
69
+ - Truth: the child worker now runs behind a dedicated supervisor boundary with explicit protocol messages, restart accounting, reload acknowledgements, lease protection, and stronger status/doctor truth. `in_process` remains a dev-only fallback rather than the operator boundary.
70
+
71
+ ### Raw evidence → resolved labels flow
72
+ - Current files: `src/brain-runtime/harvester-extension.ts`, `src/brain-runtime/evidence-detectors.ts`, `src/brain-harvest/*.ts`, `src/brain-worker/worker.ts`, `src/brain-store/store.ts`, `src/engine.ts`
73
+ - Truth: multiple concurrent raw signals can be persisted before worker-side resolution; structured tool/function-output parts feed self-evidence detection; scanner guidance can bind to structured message parts; and same-trust scanner conflicts now prefer structured extractors over heuristic-only scanner signals.
74
+ - Boundary: source extraction still leans too heavily on heuristics outside the structured cases already covered.
75
+
76
+ ### Replay-gated promotion
77
+ - Current files: `src/brain-core/replay.ts`, `src/brain-core/pack.ts`, `src/brain-worker/worker.ts`
78
+ - Truth: promotion gates exist and matter.
79
+ - Boundary: mutation evaluation is still closer to proposal-level checks than the intended bundle-level replay contract.
80
+
81
+ ### Packaging and release boundary
82
+ - Current files: `package.json`, `README.md`, `docs/EVIDENCE.md`, future CI/release workflow surfaces
83
+ - Truth: the package publishes and the repo has a documented proof ladder.
84
+ - Boundary: release verification and package boundaries are still looser than the intended operator-grade release standard.
70
85
 
71
86
  ## 3. Not done yet
72
87
 
73
88
  These are still active work and must not be described as complete.
74
89
 
75
- - **Frozen host-surface proof for worker-down fail-open on the current host seam**
76
- - Primary files: `scripts/validate-openclaw-install.mjs`, `scripts/validate-brain-teach-session-bound.ts`, `scripts/validate-short-static-classification.ts`, `src/brain-runtime/tools.ts`, `src/brain-runtime/service.ts`
77
- - Required truth before this is marked done: keep deterministic session-bound `brain_teach` proof frozen, adapt the current OpenClaw host seam, and then land a narrow host worker-down claim that matches the actual artifact bundle.
78
- - **Resolved short-static-lookup host-surface semantics on the adapted current host seam**
79
- - Primary files: `src/brain-runtime/assembler-extension.ts`, `scripts/validate-openclaw-install.mjs`, `scripts/validate-short-static-classification.ts`
90
+ - **Frozen end-to-end host-surface proof on the current host seam**
91
+ - Required truth before done: the sterile host harness must complete again, and the resulting artifacts must freeze the actual current host claims rather than older seam failures.
80
92
  - **Bundle-based mutation evaluation with clear pass/fail explanations**
81
93
  - Primary files: `src/brain-core/mutator.ts`, `src/brain-worker/worker.ts`, `src/brain-store/store.ts`, `src/brain-store/migrations.ts`
82
- - **Frozen proof ladder with dated release artifacts**
83
- - Primary files: `docs/EVIDENCE.md`, `docs/evidence/`, `scripts/validate-openclaw-install.mjs`
94
+ - **CI-enforced proof ladder / release gates**
95
+ - Primary files: future workflow surfaces, `package.json`, `docs/EVIDENCE.md`
96
+ - **Clean npm/package boundary for outside operators**
97
+ - Primary files: `package.json`, release workflow, docs packaging boundary
84
98
  - **Green full-repo `npx tsc --noEmit`**
85
99
  - Primary files: `tsconfig.json`, `package.json`, SDK-boundary imports
86
- - **Boring install / recovery path for another operator**
87
- - Primary files: `README.md`, `docs/configuration.md`, `openclaw.plugin.json`, future release workflow/evidence files
100
+ - **Boring install / validation / recovery path for another operator**
101
+ - Primary files: `README.md`, `docs/configuration.md`, `openclaw.plugin.json`, validation scripts
88
102
 
89
103
  ## Safe public summary
90
104
 
91
- > OpenClawBrain v2 already has a paper-faithful routing core and a real live runtime path. What remains is the operational hardening, host-surface proof, mutation-bundle evaluation, and release-evidence layer.
105
+ > OpenClawBrain v2 already has a paper-faithful routing core and a real live runtime path. The remaining work is mainly host-surface proof, release engineering, bundle-level mutation evaluation, packaging hardening, and cleaner operator truth.
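
The "Raw evidence → resolved labels flow" entry above states that same-trust scanner conflicts now prefer structured extractors over heuristic-only scanner signals. As a minimal sketch of that rule only, and not the code in `src/brain-worker/worker.ts`, the tie-break could look like the following. Every name here (`RawSignal`, `ExtractorKind`, `resolveSignals`), the numeric `trust` field, and the example labels are hypothetical illustrations; the real schema and trust ordering are not shown in this diff.

```typescript
// Hypothetical shapes: none of these names come from the OpenClawBrain source.
type ExtractorKind = "structured" | "heuristic";

interface RawSignal {
  label: string;            // illustrative label value, e.g. "good" or "bad"
  trust: number;            // assumed ordering: a higher number is a more trusted source
  extractor: ExtractorKind; // how the signal was extracted from the message
}

// Pick one winning signal: highest trust first; on a trust tie,
// a structured extractor replaces a heuristic-only one.
// Otherwise the earliest signal seen is kept.
function resolveSignals(signals: RawSignal[]): RawSignal | undefined {
  let winner: RawSignal | undefined;
  for (const s of signals) {
    if (
      winner === undefined ||
      s.trust > winner.trust ||
      (s.trust === winner.trust &&
        s.extractor === "structured" &&
        winner.extractor === "heuristic")
    ) {
      winner = s;
    }
  }
  return winner;
}

// Two concurrent raw signals at the same trust level that disagree.
const resolved = resolveSignals([
  { label: "bad", trust: 1, extractor: "heuristic" },
  { label: "good", trust: 1, extractor: "structured" },
]);
console.log(resolved?.label);
```

In this sketch, persisting every raw signal and resolving later is what lets the tie-break change without re-harvesting, which matches the "multiple concurrent raw signals can be persisted before worker-side resolution" claim above. How the actual worker weighs trust levels is an assumption here.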