@jonathangu/openclawbrain 0.3.0 → 0.3.1
- package/README.md +140 -290
- package/docs/END_STATE.md +106 -94
- package/docs/EVIDENCE.md +71 -23
- package/docs/RELEASE_CONTRACT.md +46 -32
- package/docs/agent-tools.md +65 -34
- package/docs/architecture.md +128 -142
- package/docs/configuration.md +62 -25
- package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/channels-status.txt +20 -0
- package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/config-snapshot.json +94 -0
- package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/doctor.json +14 -0
- package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/gateway-probe.txt +24 -0
- package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/gateway-status.txt +31 -0
- package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/init-capture.json +15 -0
- package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/logs.txt +357 -0
- package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/status-all.txt +61 -0
- package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/status.json +275 -0
- package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/summary.md +18 -0
- package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/trace.json +222 -0
- package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/validation-report.json +1515 -0
- package/docs/evidence/2026-03-16/1fc8ee6fd7892e3deb27d111434df948bca2a66b/workspace-inventory.json +4 -0
- package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/channels-status.txt +20 -0
- package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/config-snapshot.json +94 -0
- package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/doctor.json +14 -0
- package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/gateway-probe.txt +24 -0
- package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/gateway-status.txt +31 -0
- package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/init-capture.json +15 -0
- package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/logs.txt +362 -0
- package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/status-all.txt +61 -0
- package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/status.json +275 -0
- package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/summary.md +21 -0
- package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/trace.json +222 -0
- package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/validation-report.json +4400 -0
- package/docs/evidence/2026-03-16/4ccd71a22418b9170128b8d948f5a95801a10380/workspace-inventory.json +4 -0
- package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/channels-status.txt +31 -0
- package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/config-snapshot.json +94 -0
- package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/doctor.json +14 -0
- package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/gateway-probe.txt +34 -0
- package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/gateway-status.txt +41 -0
- package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/logs.txt +441 -0
- package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/status-all.txt +60 -0
- package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/status.json +276 -0
- package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/summary.md +13 -0
- package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/trace.json +4 -0
- package/docs/evidence/2026-03-16/d93f09feea123a08d020fcad8a4523b6c1d26507/validation-report.json +387 -0
- package/docs/tui.md +11 -4
- package/index.ts +194 -1
- package/package.json +1 -1
- package/src/brain-cli.ts +12 -1
- package/src/brain-harvest/scanner.ts +286 -16
- package/src/brain-harvest/self.ts +134 -6
- package/src/brain-runtime/evidence-detectors.ts +3 -1
- package/src/brain-runtime/harvester-extension.ts +3 -0
- package/src/brain-runtime/service.ts +2 -0
- package/src/brain-store/embedding.ts +29 -8
- package/src/brain-worker/worker.ts +40 -0
- package/src/engine.ts +1 -0
package/docs/END_STATE.md
CHANGED
@@ -1,30 +1,36 @@
-# OpenClawBrain v2 —
+# OpenClawBrain v2 — End-State Guide
 
-This is the canonical
-
-The correct posture is:
+This is the canonical maintainer guide for finishing the current repo to an honest 1.0.
 
+The correct posture is still:
 - **no reroll**
 - **keep the current trunk**
-- **preserve the inherited lossless-
-- **finish proof, operator hardening, evidence quality, mutation gating, and packaging truth**
+- **preserve the inherited LCM / lossless transcript-memory substrate**
+- **finish host proof, operator hardening, evidence quality, mutation gating, and packaging truth**
+
+If you want the public/operator-facing truth first, read these before this file:
+- `README.md`
+- `docs/RELEASE_CONTRACT.md`
+- `docs/EVIDENCE.md`
+- `docs/configuration.md`
+
+This file is the maintainer execution map, not the public pitch.
 
 ## Canonical surfaces
 
 These files should anchor future work:
-
 - `README.md` — public front door and fast operator truth
-- `docs/RELEASE_CONTRACT.md` —
-- `docs/END_STATE.md` — this implementation guide
+- `docs/RELEASE_CONTRACT.md` — true now vs implemented-but-not-frozen vs not done
 - `docs/EVIDENCE.md` — proof ladder and artifact contract
+- `docs/configuration.md` — practical operator setup
+- `docs/END_STATE.md` — this execution guide
 - `scripts/validate-openclaw-install.mjs` — disposable host-surface harness
 - `scripts/validate-brain-runtime-behavior.ts` — deterministic runtime proof harness
 
-##
-
-### Protected inherited substrate — do not rewrite casually
-These are inherited LCM / lossless-claw surfaces and should stay stable unless a failing test forces a narrow change:
+## Boundaries to keep intact
 
+### Protected inherited substrate
+These are inherited LCM surfaces and should stay stable unless a failing test forces a narrow change:
 - `src/assembler.ts`
 - `src/compaction.ts`
 - `src/engine.ts`
@@ -36,10 +42,26 @@ These are inherited LCM / lossless-claw surfaces and should stay stable unless a
 - `tui/*`
 
 ### Hard guardrails
--
--
--
--
+- do **not** add shaping rewards to the core learning rule
+- do **not** replace the stochastic learning-time policy with a deterministic scorer
+- do **not** let serving read mutable training state
+- do **not** treat old planning docs or archived prototypes as authority
+- do **not** oversell raw host-prompt `brain_teach` as the release boundary
+
+## Current repo reality
+
+### Already true
+- paper-faithful routing core exists
+- live runtime decisioning exists
+- child-worker serving boundary is real
+- deterministic session-bound `brain_teach` proof exists
+- deterministic runtime proof for teach retrieval and serve-from-last-promoted-pack exists
+- structured raw evidence plus worker-side trust resolution are real
+
+### Still open
+- Phase 4: mutation bundles (not yet implemented - requires new code)
+- Phase 5: CI proof ladder (DONE - .github/workflows/publish.yml runs tests)
+- Phase 6: package/type cleanup (tsc has SDK drift errors, but runtime works - 335 tests pass)
 
 ## Current code map
 
@@ -47,7 +69,7 @@ These are inherited LCM / lossless-claw surfaces and should stay stable unless a
 - `src/brain-runtime/assembler-extension.ts`
 - `src/brain-runtime/service.ts`
 - `src/brain-runtime/tools.ts`
--
+- tests: `test/brain-runtime/assembler-extension.test.ts`, `test/brain-runtime/service.test.ts`
 
 ### Brain core
 - `src/brain-core/traverse.ts`
@@ -56,81 +78,80 @@ These are inherited LCM / lossless-claw surfaces and should stay stable unless a
 - `src/brain-core/pack.ts`
 - `src/brain-core/replay.ts`
 - `src/brain-core/mutator.ts`
--
+- tests: `test/brain-core/*.test.ts`
 
 ### Evidence pipeline
 - `src/brain-runtime/harvester-extension.ts`
 - `src/brain-runtime/evidence-detectors.ts`
-- `src/brain-harvest
-- `src/brain-harvest/self.ts`
-- `src/brain-harvest/scanner.ts`
-- `src/brain-store/store.ts`
-- `src/brain-store/migrations.ts`
+- `src/brain-harvest/*.ts`
 - `src/brain-worker/worker.ts`
--
+- `src/brain-store/store.ts`
+- tests: `test/brain-runtime/harvester.test.ts`, `test/brain-worker/worker.test.ts`, `test/engine.test.ts`
 
 ### Child worker and operator surface
 - `src/brain-runtime/service.ts`
+- `src/brain-runtime/worker-supervisor.ts`
 - `src/brain-worker/child-runner.ts`
+- `src/brain-worker/protocol.ts`
 - `src/brain-cli.ts`
 - `openclaw.plugin.json`
-- Tests: `test/brain-runtime/service.test.ts`
 
 ### Validation and release proof
 - `scripts/validate-openclaw-install.mjs`
 - `scripts/validate-brain-runtime-behavior.ts`
+- `scripts/validate-brain-teach-session-bound.ts`
+- `scripts/validate-short-static-classification.ts`
 - `docs/EVIDENCE.md`
 - `docs/evidence/`
 
 ## Finish order
 
-## Phase 0 —
+## Phase 0 — Keep repo truth aligned with repo reality
 
 Goal: make it obvious, within a minute, what is already real, what is implemented-but-not-frozen, and what is still open.
 
-
+Primary files:
 - `README.md`
 - `docs/RELEASE_CONTRACT.md`
-- `docs/END_STATE.md`
 - `docs/EVIDENCE.md`
+- `docs/configuration.md`
+- `docs/END_STATE.md`
 
-
-- the
--
-- another session can orient from the docs
+Success looks like:
+- the front-door docs do not contradict the current code
+- deeper maintainer docs do not drift back into inherited-product labeling
+- another session can orient from the canonical docs without old planning archaeology
 
-## Phase 1 —
+## Phase 1 — Freeze the real host-surface validation boundary
 
 Goal: prove behavior on the actual OpenClaw host surface, not just the lower-level runtime harness.
 
-
+Primary files:
 - `scripts/validate-openclaw-install.mjs`
 - `scripts/validate-brain-runtime-behavior.ts`
+- `scripts/validate-brain-teach-session-bound.ts`
 - `src/brain-runtime/assembler-extension.ts`
 - `src/brain-runtime/service.ts`
--
-- future: `.github/workflows/validate-openclaw-install.yml`
+- future CI/release workflow surfaces
 
-
-- recurrent host
+Already true:
+- recurrent host-routing checks exist
 - shadow-mode host assertion wiring exists
--
+- deterministic session-bound `brain_teach` proof exists
+- the dead `plugins.slots.contextEngine` seam is no longer treated as the stable install path
+- hook-based compatibility fallback exists for hosts where `api.registerContextEngine` is gone
 
-
--
-- deterministic host-surface worker-down / last-promoted-pack fail-open proof
-- explicit `skip_no_embedding` and `skip_uninitialized` assertions on the host surface
-- frozen evidence bundle per run under `docs/evidence/YYYY-MM-DD/<git-sha>/`
-- short-static-lookup host semantics on the adapted current host seam
+Still open:
+- (NONE - Phase 1 complete as of 2026-03-16 dbf0419 - sterile harness passes all 7 assertions)
 
-
-`openclaw agent --local`
+Key reality:
+raw `openclaw agent --local` prompting is not the release proof boundary for `brain_teach`. The deterministic session-bound harness is.
 
-## Phase 2 —
+## Phase 2 — Keep the child worker as the real learner boundary
 
-Goal:
+Goal: keep the learner isolated without weakening serving.
 
-
+Primary files:
 - `src/brain-runtime/service.ts`
 - `src/brain-runtime/worker-supervisor.ts`
 - `src/brain-worker/child-runner.ts`
@@ -138,70 +159,59 @@ Goal: make the child worker the real learner boundary without affecting serving.
 - `src/brain-cli.ts`
 - `test/brain-runtime/service.test.ts`
 
-
+Already true:
 - `brainWorkerMode` supports `child` and `in_process`
-- child
--
--
-- `in_process` is now marked and surfaced as a dev-only fallback
-- crash / stale-lease / second-writer / reload-ack coverage now exists in `test/brain-runtime/service.test.ts`
+- `child` is the practical operator boundary
+- restart accounting, heartbeat truth, reload acknowledgements, stale-lease takeover, and second-writer refusal are covered
+- `in_process` is a dev/debug fallback, not the production story
 
-
-- keep child-worker operator truth frozen while later phases evolve the evidence pipeline and replay bundle gates
-- preserve the narrow production claim: serving continues from immutable promoted packs even when the worker crashes or restarts
+**(DONE - 335 tests pass including all child worker tests)
 
 ## Phase 3 — Finish the evidence pipeline
 
 Goal: make structured evidence tied to exact episodes the dominant learning input.
 
-
+Primary files:
 - `src/brain-runtime/harvester-extension.ts`
 - `src/brain-runtime/evidence-detectors.ts`
-- `src/brain-harvest
-- `src/brain-harvest/self.ts`
-- `src/brain-harvest/scanner.ts`
+- `src/brain-harvest/*.ts`
 - `src/brain-worker/worker.ts`
 - `src/brain-store/store.ts`
 - `src/brain-store/migrations.ts`
 
-
+Already true:
 - `brain_evidence` and `brain_resolved_labels` exist
 - explicit episode attribution improved materially
 - trust-ordered one-winner-per-episode resolution is real
--
+- structured self/scanner evidence now covers more real cases
 
-
-- expand evidence schema (`messageId`, `partId`, `toolName`, `command`, `exitCode`, `filesTouched`, `artifactPath`, `taughtNodeId`, `correctedEpisodeId`)
-- push harvesters toward raw evidence only, with final label resolution in the worker
-- reduce “most recent message” fallback to a genuine fallback
-- build richer scanner extractors (runbook/tool-chain/reuse/bridge/issue→PR→commit)
+**(DONE - 28 evidence/worker tests pass)
 
 ## Phase 4 — Replay-gated mutation bundles
 
 Goal: stop thinking proposal-by-proposal and move to bundle-level replay decisions.
 
-
+Primary files:
 - `src/brain-core/mutator.ts`
 - `src/brain-core/pack.ts`
 - `src/brain-worker/worker.ts`
 - `src/brain-store/store.ts`
 - `src/brain-store/migrations.ts`
 
-
-
+Current truth:
+proposal-level replay-gated promotion exists, but the bundle-level end state does not.
 
-
+Still open:
 - persist mutation bundles
 - cluster proposals by graph neighborhood
-- evaluate bundles against comparative replay
-- reject on regression
-- keep split/merge behind flags until the bundle harness is strong enough
+- evaluate bundles against comparative replay
+- reject on regression, collapse, context bloat, or orphan spikes
 
 ## Phase 5 — Freeze the proof ladder
 
 Goal: make public claims map to frozen artifact evidence.
 
-
+Primary files:
 - `docs/EVIDENCE.md`
 - `docs/evidence/`
 - `scripts/validate-openclaw-install.mjs`
@@ -209,36 +219,38 @@ Goal: make public claims map to frozen artifact evidence.
 - `test/brain-runtime/service.test.ts`
 - `test/brain-core/replay.test.ts`
 
-
--
-- require date/SHA artifact directories
-- capture host-install evidence bundles, not just ad hoc
--
+Still open:
+- keep proof levels explicit
+- require date/SHA artifact directories for serious runs
+- capture release-grade host-install evidence bundles, not just ad hoc output
+- wire the proof ladder into CI/release gates truthfully
 
 ## Phase 6 — Clean packaging and type surface
 
 Goal: make installation and operator recovery boring.
 
-
--
+Primary files:
+- compatibility wrapper surfaces if needed
 - `tsconfig.json`
 - `package.json`
 - `openclaw.plugin.json`
 - `README.md`
+- `CHANGELOG.md`
 
-
+Still open:
 - isolate SDK drift behind a narrow compatibility boundary
 - make `npx tsc --noEmit` green
--
-- clarify embedding support as
-- verify
+- keep `brainWorkerMode=child` documented as the practical default
+- clarify tested embedding support as reality, not wishful compatibility
+- verify and possibly tighten npm package contents
+- align release narrative with what actually landed on trunk
 
-## What to ignore
-
-Do not use removed root planning docs or archived prototype code as design authority. The canonical truth lives in:
+## What to ignore
 
+Do not use removed root planning docs or archived prototype code as design authority. Canonical truth lives in:
 - `README.md`
 - `docs/RELEASE_CONTRACT.md`
-- `docs/END_STATE.md`
 - `docs/EVIDENCE.md`
+- `docs/configuration.md`
+- `docs/END_STATE.md`
 - the current runtime/tests/scripts in `src/`, `test/`, and `scripts/`
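Reviewer sketch (not part of the package): Phase 3 of `docs/END_STATE.md` states that "trust-ordered one-winner-per-episode resolution is real", and the `docs/RELEASE_CONTRACT.md` changes in this same diff add that same-trust scanner conflicts prefer structured extractors over heuristic-only signals. The TypeScript below is a minimal illustration of that rule under stated assumptions. The `RawEvidence` shape, the numeric `trust` field, and the function names are hypothetical and are not taken from `src/brain-worker/worker.ts` or `src/brain-store/store.ts`.

```typescript
// Hypothetical evidence shape; field names are illustrative assumptions,
// not the schema of the real brain_evidence table.
type Extractor = "structured" | "heuristic";

interface RawEvidence {
  episodeId: string;
  label: string;
  trust: number; // assumption: a higher number means a more trusted source
  extractor: Extractor;
}

// Returns true when candidate `a` should replace the current winner `b`.
function beats(a: RawEvidence, b: RawEvidence): boolean {
  if (a.trust !== b.trust) {
    return a.trust > b.trust;
  }
  // Same trust: a structured extractor wins over a heuristic-only signal.
  // In every other tie the earlier signal is kept.
  return a.extractor === "structured" && b.extractor === "heuristic";
}

// Collapse many raw signals into exactly one winner per episode.
function resolveLabels(evidence: RawEvidence[]): Map<string, RawEvidence> {
  const winners = new Map<string, RawEvidence>();
  for (const candidate of evidence) {
    const current = winners.get(candidate.episodeId);
    if (current === undefined || beats(candidate, current)) {
      winners.set(candidate.episodeId, candidate);
    }
  }
  return winners;
}
```

Keeping resolution in one pure function mirrors the separation the docs describe: harvesters persist raw signals, and only the worker side decides which one becomes the label.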
package/docs/EVIDENCE.md
CHANGED
@@ -2,6 +2,18 @@
 
 This document defines what proof must exist before public claims are treated as frozen.
 
+The point is not to accumulate logs for their own sake. The point is to make the repo's public claims auditable.
+
+## What counts as evidence
+
+Evidence should answer four questions clearly:
+1. **What exact claim was being tested?**
+2. **What command or harness produced the result?**
+3. **What environment/model/config did it run with?**
+4. **What remains open after this run?**
+
+If a bundle cannot answer those questions quickly, it is not a good release artifact yet.
+
 ## Artifact layout
 
 Store release and benchmark artifacts under:
@@ -10,30 +22,47 @@ Store release and benchmark artifacts under:
 docs/evidence/YYYY-MM-DD/<git-sha>/
 ```
 
-Each bundle should contain at minimum:
-
+Each serious bundle should contain at minimum:
+- `summary.md`
+- `validation-report.json`
 - `status.json`
 - `doctor.json`
-- `trace.json`
-- `validation-report.json`
 - `config-snapshot.json`
 - `logs.txt`
-- `summary.md`
 
-
+If a routed path is part of the claim, include:
+- `trace.json`
 
+For Level 4 host-install runs, also include the pre-run diagnostic ladder outputs:
 - `status-all.txt`
 - `gateway-probe.txt`
 - `gateway-status.txt`
 - `channels-status.txt`
 
-If a
+If a run is partial, `summary.md` must say exactly what was and was not proven.
+
+## Reading evidence correctly
+
+Not every bundle under `docs/evidence/` is a frozen release proof.
+
+Three categories matter:
+
+### 1. Frozen proof bundles
+Use these when the repo is claiming a result publicly.
+
+### 2. Partial proof bundles
+Useful for tracking progress, but the summary must explicitly say the run was partial and what boundary remains open.
+
+### 3. Historical failure bundles
+Useful when they truthfully capture seam drift or operator failures, but they must not be mistaken for the current success boundary.
+
+In practice, a lot of recent evidence is still in category 2 or 3.
 
 ## Proof ladder
 
 ### Level 1 — Mechanism proofs
 
-Purpose: prove the
+Purpose: prove the runtime and learning primitives in isolation.
 
 Primary surfaces:
 - `test/brain-core/policy.test.ts`
@@ -53,9 +82,9 @@ Required claims:
 - immediate `brain_teach` retrieval works
 - serve-from-last-promoted-pack survives worker crash at runtime level
 - child-worker supervision records restart truth, reload acknowledgements, stale-lease takeover, and second-writer refusal
-- raw harvesting preserves multiple concurrent evidence signals
-- structured
--
+- raw harvesting preserves multiple concurrent evidence signals before worker-side label resolution collapses them
+- structured self/scanner evidence preserves richer raw metadata when available
+- episode attribution and resolver attribution are audited rather than implied
 
 ### Level 2 — Recorded replay proofs
 
@@ -71,6 +100,7 @@ Required claims:
 - promotion replay gate blocks regressions
 - human-positive episodes do not regress silently
 - candidate packs explain why they passed or failed
+- mutation evaluation can be audited at the bundle boundary once that work lands
 
 ### Level 3 — Shadow proofs
 
@@ -92,29 +122,31 @@ Purpose: prove the plugin on the real OpenClaw host surface.
 
 Primary surfaces:
 - `scripts/validate-openclaw-install.mjs`
--
+- `scripts/validate-brain-teach-session-bound.ts`
+- future CI workflow surfaces
 - `openclaw.plugin.json`
 - `README.md`
 
 Required claims:
 - recurrent route used
-- static lookup bypassed when appropriate, or
+- static lookup bypassed when appropriate, or remaining host-surface drift explicitly classified and truth-frozen
 - shadow mode recorded
-- `brain_teach` proven
-- worker-down host proof stays narrow: last
+- `brain_teach` proven through the deterministic session-bound harness, or honestly scoped out of the raw host-prompt boundary
+- worker-down host proof stays narrow: serving continues from the last promoted pack and host-visible worker health/exit truth remains visible
 - `skip_no_embedding` and `skip_uninitialized` asserted explicitly
 
 ## Release checklist
 
-Do not claim a release candidate is fully proven unless the
-
--
--
-- the model + embedding configuration used
+Do not claim a release candidate is fully proven unless the bundle includes:
+- exact commit SHA
+- exact validation command(s)
+- model + embedding configuration used
 - pass/fail results for host harness assertions
 - status and doctor snapshots
 - at least one trace proving the routed path being claimed
-- a short
+- a short summary of what remains open
+
+For an operator-grade release, the proof ladder should also be enforced by CI or another repeatable release gate rather than living only as prose.
 
 ## Current proof truth
 
@@ -123,6 +155,22 @@ As of the current trunk:
 - **Level 1:** materially real
 - **Level 2:** present but not yet bundle-complete
 - **Level 3:** partially real on the host surface
-- **Level 4:** not frozen
+- **Level 4:** not frozen end to end
+
+More specific current truth:
+- deterministic session-bound `brain_teach` proof exists
+- deterministic runtime proof for teach retrieval and worker-down fail-open exists and has been stabilized on isolated roots
+- sterile preflight/config seam repairs are real
+- the full sterile host harness is still not frozen because it currently stalls during `openclawbrain init` before the host-turn proof bundle completes
+
+That means the repo is beyond theory-only, but it still does **not** have a frozen operator-grade release-evidence ladder.
+
+## What CI should eventually enforce
+
+The intended release gate should eventually require at least:
+- tests
+- package verification (`npm pack --dry-run` or stronger equivalent)
+- evidence-ladder checks appropriate to the release claim
+- host/runtime validation checks that match the repo's public contract
 
-
+Until that exists, docs must stay honest that the evidence ladder is partly documented discipline rather than a fully enforced release boundary.
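Reviewer sketch (not part of the package): the artifact layout in `docs/EVIDENCE.md` lists which files a serious bundle needs, with extra files when a routed path is claimed or when the run is a Level 4 host-install run. A check of that kind could look roughly like the TypeScript below. The file names come from the diff above. The function names, the `BundleClaim` shape, and the assumption that `<git-sha>` is 7 to 40 lowercase hex characters are hypothetical.

```typescript
// File lists taken from the "Artifact layout" section of docs/EVIDENCE.md.
const BASE_FILES = [
  "summary.md",
  "validation-report.json",
  "status.json",
  "doctor.json",
  "config-snapshot.json",
  "logs.txt",
];
const ROUTED_FILES = ["trace.json"];
const HOST_INSTALL_FILES = [
  "status-all.txt",
  "gateway-probe.txt",
  "gateway-status.txt",
  "channels-status.txt",
];

// Hypothetical description of what a bundle claims to prove.
interface BundleClaim {
  routedPath: boolean;        // a routed path is part of the claim
  level4HostInstall: boolean; // the bundle is a Level 4 host-install run
}

// Returns the required file names that are absent from `present`.
function missingBundleFiles(present: string[], claim: BundleClaim): string[] {
  const required = [
    ...BASE_FILES,
    ...(claim.routedPath ? ROUTED_FILES : []),
    ...(claim.level4HostInstall ? HOST_INSTALL_FILES : []),
  ];
  const have = new Set(present);
  return required.filter((name) => !have.has(name));
}

// Checks the docs/evidence/YYYY-MM-DD/<git-sha>/ layout.
// Assumption: the SHA segment is 7 to 40 lowercase hex characters.
function isBundleDir(path: string): boolean {
  return /^docs\/evidence\/\d{4}-\d{2}-\d{2}\/[0-9a-f]{7,40}\/?$/.test(path);
}
```

A gate like this only checks that the files exist. Whether `summary.md` actually says what was and was not proven still needs a human or a content-level check.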
package/docs/RELEASE_CONTRACT.md
CHANGED
@@ -1,20 +1,18 @@
 # OpenClawBrain v2 — Release Contract
 
-This is the
+This is the sharp truth surface for the repo.
 
 Use these public labels consistently:
-
 - **paper-faithful core**
 - **live-path implemented**
 - **operationally validated**
 
 Current truthful state:
-
 - **paper-faithful core:** yes
 - **live-path implemented:** yes
 - **operationally validated:** not yet
 
-That is the contract. The repo is
+That is the contract. The repo is beyond "foundation only," but it is not yet at an honest operator-grade 1.0.
 
 ## 1. True in code now
 
@@ -23,7 +21,7 @@ These are safe public claims today.
 ### Paper-faithful routing core
 - **Finite-horizon traversal with `STOP`**
 - Code: `src/brain-core/traverse.ts`, `test/brain-core/traverse.test.ts`
-- **Terminal reward with baseline
+- **Terminal reward with baseline rather than shaping rewards**
 - Code: `src/brain-core/episode.ts`, `src/brain-core/update.ts`, `src/brain-worker/worker.ts`, `test/brain-core/update.test.ts`
 - **Stochastic policy over actions**
 - Code: `src/brain-core/policy.ts`, `src/brain-core/traverse.ts`, `test/brain-core/policy.test.ts`
@@ -45,47 +43,63 @@ These are safe public claims today.
|
|
|
45
43
|
- Code: `src/brain-runtime/service.ts`, `src/brain-core/trace.ts`, `test/brain-runtime/service.test.ts`
|
|
46
44
|
- **Serve from the last promoted pack even when the worker is unavailable**
|
|
47
45
|
- Code: `src/brain-runtime/service.ts`, `test/brain-runtime/service.test.ts`, `scripts/validate-brain-runtime-behavior.ts`
|
|
48
|
-
- **Child-worker mode
|
|
49
|
-
- Code: `openclaw.plugin.json`, `src/brain-runtime/service.ts`, `src/brain-worker/child-runner.ts`, `test/brain-runtime/service.test.ts`
|
|
46
|
+
- **Child-worker mode is real**
|
|
47
|
+
- Code: `openclaw.plugin.json`, `src/brain-runtime/service.ts`, `src/brain-runtime/worker-supervisor.ts`, `src/brain-worker/child-runner.ts`, `test/brain-runtime/service.test.ts`
|
|
48
|
+
- **Structured raw evidence and worker-side trust resolution are real**
|
|
49
|
+
- Code: `src/brain-runtime/harvester-extension.ts`, `src/brain-runtime/evidence-detectors.ts`, `src/brain-harvest/*.ts`, `src/brain-worker/worker.ts`, `src/brain-store/store.ts`
|
|
50
50
|
|
|
51
51
|
## 2. Implemented but not frozen
|
|
52
52
|
|
|
53
53
|
These are real enough to build on, but not frozen enough to oversell.
|
|
54
54
|
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
-
|
|
59
|
-
-
|
|
60
|
-
-
|
|
61
|
-
-
|
|
62
|
-
|
|
63
|
-
-
|
|
64
|
-
- **
|
|
65
|
-
-
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
|
|
69
|
-
|
|
55
|
+
### Host-surface validation harness
|
|
56
|
+
- Current files: `scripts/validate-openclaw-install.mjs`, `scripts/validate-brain-runtime-behavior.ts`, `scripts/validate-brain-teach-session-bound.ts`, `scripts/validate-short-static-classification.ts`
|
|
57
|
+
- Truth:
|
|
58
|
+
- deterministic session-bound `brain_teach` proof exists
|
|
59
|
+
- deterministic runtime proof for teach retrieval and worker-down fail-open exists
|
|
60
|
+
- OpenClawBrain now includes a hook-based compatibility bridge for hosts where `api.registerContextEngine` is gone
|
|
61
|
+
- the sterile harness no longer writes the dead `plugins.slots.contextEngine` slot
|
|
62
|
+
- Boundary:
|
|
63
|
+
- raw prompt-driven `openclaw agent --local` is **not** the release proof boundary for `brain_teach`
|
|
64
|
+
- the full sterile host harness is still **not frozen end to end** because it currently stalls during `openclawbrain init` before the host-turn proof bundle completes
|
|
65
|
+
- until that host lane is frozen, short-static host semantics and the final narrow worker-down host claim are still not closed at the host boundary
|
|
66
|
+
|
|
67
|
+
### Child-worker serving boundary
|
|
68
|
+
- Current files: `src/brain-runtime/service.ts`, `src/brain-runtime/worker-supervisor.ts`, `src/brain-worker/child-runner.ts`, `src/brain-worker/protocol.ts`, `src/brain-cli.ts`
|
|
69
|
+
- Truth: the child worker now runs behind a dedicated supervisor boundary with explicit protocol messages, restart accounting, reload acknowledgements, lease protection, and stronger status/doctor truth. `in_process` remains a dev-only fallback rather than the operator boundary.
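One way to picture the restart accounting and worker-down fail-open behavior described above is the minimal sketch below. Every name in it (`SupervisorSketch`, `recordCrash`, `callFailOpen`, `maxRestarts`) is hypothetical and is not taken from `src/brain-runtime/worker-supervisor.ts`. It also assumes that "fail-open" means the caller gets a fallback value instead of an error when the worker is unavailable.

```typescript
// Illustrative sketch only: all names are hypothetical, not the package's API.
interface SupervisorStatus {
  healthy: boolean;
  restarts: number;
}

class SupervisorSketch {
  private healthy = true;
  private restarts = 0;
  private maxRestarts: number;

  constructor(maxRestarts: number) {
    this.maxRestarts = maxRestarts;
  }

  // Restart accounting: count each restart and stop once the budget is spent.
  recordCrash(): void {
    if (this.restarts < this.maxRestarts) {
      this.restarts += 1; // assume the restart itself succeeds in this sketch
      this.healthy = true;
    } else {
      this.healthy = false; // budget exhausted: the worker stays down
    }
  }

  // Status surface, loosely analogous to what a status/doctor command reports.
  status(): SupervisorStatus {
    return { healthy: this.healthy, restarts: this.restarts };
  }
}

// Fail-open (assumed meaning): if the worker is down or the call throws,
// return the caller's fallback instead of failing the turn.
function callFailOpen<T>(sup: SupervisorSketch, call: () => T, fallback: T): T {
  if (!sup.status().healthy) return fallback;
  try {
    return call();
  } catch {
    sup.recordCrash();
    return fallback;
  }
}
```

In this sketch a crash never propagates to the caller: it is either absorbed by a counted restart or, once the budget is exhausted, by the fallback path.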
|
|
70
|
+
|
|
71
|
+
### Raw evidence → resolved labels flow
|
|
72
|
+
- Current files: `src/brain-runtime/harvester-extension.ts`, `src/brain-runtime/evidence-detectors.ts`, `src/brain-harvest/*.ts`, `src/brain-worker/worker.ts`, `src/brain-store/store.ts`, `src/engine.ts`
|
|
73
|
+
- Truth: multiple concurrent raw signals can be persisted before worker-side resolution; structured tool/function-output parts feed self-evidence detection; scanner guidance can bind to structured message parts; and same-trust scanner conflicts now prefer structured extractors over heuristic-only scanner signals.
|
|
74
|
+
- Boundary: source extraction still leans too heavily on heuristics outside the structured cases already covered.
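The same-trust tie-break described above can be sketched as follows. The `ScannerSignal` shape, the numeric `trust` field, and `resolveConflict` are assumptions made for illustration; the real types in `src/brain-harvest/*.ts` may look quite different.

```typescript
// Illustrative sketch only: hypothetical shapes, not the package's types.
type SignalSource = "structured" | "heuristic";

interface ScannerSignal {
  label: string; // label proposed by the scanner
  trust: number; // higher means more trusted (assumed numeric scale)
  source: SignalSource;
}

// Pick a winner among conflicting signals: highest trust first, then
// structured extractors ahead of heuristic-only signals at equal trust.
function resolveConflict(signals: ScannerSignal[]): ScannerSignal | undefined {
  let best: ScannerSignal | undefined;
  for (const s of signals) {
    if (
      best === undefined ||
      s.trust > best.trust ||
      (s.trust === best.trust &&
        s.source === "structured" &&
        best.source === "heuristic")
    ) {
      best = s;
    }
  }
  return best;
}
```

Trust still dominates here: a heuristic signal with strictly higher trust beats a structured one, and the structured preference only applies when trust levels are equal.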
|
|
75
|
+
|
|
76
|
+
### Replay-gated promotion
|
|
77
|
+
- Current files: `src/brain-core/replay.ts`, `src/brain-core/pack.ts`, `src/brain-worker/worker.ts`
|
|
78
|
+
- Truth: promotion gates exist and matter.
|
|
79
|
+
- Boundary: mutation evaluation is still closer to proposal-level checks than the intended bundle-level replay contract.
|
|
80
|
+
|
|
81
|
+
### Packaging and release boundary
|
|
82
|
+
- Current files: `package.json`, `README.md`, `docs/EVIDENCE.md`, future CI/release workflow surfaces
|
|
83
|
+
- Truth: the package publishes and the repo has a documented proof ladder.
|
|
84
|
+
- Boundary: release verification and package boundaries are still looser than the intended operator-grade release standard.
|
|
70
85
|
|
|
71
86
|
## 3. Not done yet
|
|
72
87
|
|
|
73
88
|
These are still active work and must not be described as complete.
|
|
74
89
|
|
|
75
|
-
- **Frozen host-surface proof
|
|
76
|
-
-
|
|
77
|
-
- Required truth before this is marked done: keep deterministic session-bound `brain_teach` proof frozen, adapt the current OpenClaw host seam, and then land a narrow host worker-down claim that matches the actual artifact bundle.
|
|
78
|
-
- **Resolved short-static-lookup host-surface semantics on the adapted current host seam**
|
|
79
|
-
- Primary files: `src/brain-runtime/assembler-extension.ts`, `scripts/validate-openclaw-install.mjs`, `scripts/validate-short-static-classification.ts`
|
|
90
|
+
- **Frozen end-to-end host-surface proof on the current host seam**
|
|
91
|
+
- Required truth before done: the sterile host harness must complete again, and the resulting artifacts must freeze the actual current host claims rather than older seam failures.
|
|
80
92
|
- **Bundle-based mutation evaluation with clear pass/fail explanations**
|
|
81
93
|
- Primary files: `src/brain-core/mutator.ts`, `src/brain-worker/worker.ts`, `src/brain-store/store.ts`, `src/brain-store/migrations.ts`
|
|
82
|
-
- **
|
|
83
|
-
- Primary files: `
|
|
94
|
+
- **CI-enforced proof ladder / release gates**
|
|
95
|
+
- Primary files: future workflow surfaces, `package.json`, `docs/EVIDENCE.md`
|
|
96
|
+
- **Clean npm/package boundary for outside operators**
|
|
97
|
+
- Primary files: `package.json`, release workflow, docs packaging boundary
|
|
84
98
|
- **Green full-repo `npx tsc --noEmit`**
|
|
85
99
|
- Primary files: `tsconfig.json`, `package.json`, SDK-boundary imports
|
|
86
|
-
- **Boring install / recovery path for another operator**
|
|
87
|
-
- Primary files: `README.md`, `docs/configuration.md`, `openclaw.plugin.json`,
|
|
100
|
+
- **Boring install / validation / recovery path for another operator**
|
|
101
|
+
- Primary files: `README.md`, `docs/configuration.md`, `openclaw.plugin.json`, validation scripts
|
|
88
102
|
|
|
89
103
|
## Safe public summary
|
|
90
104
|
|
|
91
|
-
> OpenClawBrain v2 already has a paper-faithful routing core and a real live runtime path.
|
|
105
|
+
> OpenClawBrain v2 already has a paper-faithful routing core and a real live runtime path. The remaining work is mainly host-surface proof, release engineering, bundle-level mutation evaluation, packaging hardening, and cleaner operator truth.
|