brainclaw 1.9.0 → 1.9.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (91)
  1. package/README.md +585 -499
  2. package/dist/brainclaw-vscode.vsix +0 -0
  3. package/dist/commands/harvest.js +1 -1
  4. package/dist/commands/hooks.js +73 -73
  5. package/dist/commands/init.js +1 -1
  6. package/dist/commands/install-hooks.js +78 -78
  7. package/dist/commands/mcp-read-handlers.js +57 -14
  8. package/dist/commands/mcp.js +79 -13
  9. package/dist/commands/switch.js +26 -5
  10. package/dist/commands/version.js +1 -1
  11. package/dist/core/agent-capability.js +19 -4
  12. package/dist/core/agent-files.js +119 -119
  13. package/dist/core/codev-prompts.js +38 -38
  14. package/dist/core/default-profiles/doctor.yaml +11 -11
  15. package/dist/core/default-profiles/janitor.yaml +11 -11
  16. package/dist/core/default-profiles/onboarder.yaml +11 -11
  17. package/dist/core/default-profiles/reviewer.yaml +13 -13
  18. package/dist/core/dispatcher.js +1 -1
  19. package/dist/core/entity-operations.js +29 -3
  20. package/dist/core/execution.js +1 -1
  21. package/dist/core/loops/verbs.js +0 -1
  22. package/dist/core/messaging.js +2 -2
  23. package/dist/core/protocol-skills.js +164 -164
  24. package/dist/core/runtime-signals.js +1 -1
  25. package/dist/core/search.js +19 -2
  26. package/dist/core/security-guard.js +207 -207
  27. package/dist/core/spawn-check.js +16 -2
  28. package/dist/core/staleness.js +1 -1
  29. package/dist/core/store-resolution.js +26 -7
  30. package/dist/core/worktree.js +18 -18
  31. package/dist/facts.js +3 -3
  32. package/dist/facts.json +2 -2
  33. package/docs/PROTOCOL.md +1 -1
  34. package/docs/adapters/openclaw.md +43 -43
  35. package/docs/architecture/project-refs.md +328 -328
  36. package/docs/cli.md +2093 -2093
  37. package/docs/concepts/coordination.md +52 -52
  38. package/docs/concepts/coordinator-runbook.md +129 -129
  39. package/docs/concepts/dispatch-lifecycle.md +245 -245
  40. package/docs/concepts/event-log-store.md +928 -928
  41. package/docs/concepts/ideation-loop.md +317 -317
  42. package/docs/concepts/loop-engine.md +520 -511
  43. package/docs/concepts/mcp-governance.md +268 -268
  44. package/docs/concepts/memory.md +84 -84
  45. package/docs/concepts/multi-agent-workflows.md +167 -167
  46. package/docs/concepts/observer-protocol.md +361 -361
  47. package/docs/concepts/plans-and-claims.md +217 -217
  48. package/docs/concepts/project-md-convention.md +35 -35
  49. package/docs/concepts/runtime-notes.md +38 -38
  50. package/docs/concepts/troubleshooting.md +254 -254
  51. package/docs/concepts/workspace-bootstrapping.md +142 -142
  52. package/docs/context-format-changelog.md +35 -35
  53. package/docs/context-format.md +48 -48
  54. package/docs/index.md +65 -65
  55. package/docs/integrations/agents.md +158 -158
  56. package/docs/integrations/claude-code.md +23 -23
  57. package/docs/integrations/cline.md +77 -77
  58. package/docs/integrations/continue.md +55 -55
  59. package/docs/integrations/copilot.md +68 -68
  60. package/docs/integrations/cursor.md +23 -23
  61. package/docs/integrations/kilocode.md +72 -72
  62. package/docs/integrations/mcp.md +377 -377
  63. package/docs/integrations/mistral-vibe.md +122 -122
  64. package/docs/integrations/openclaw.md +92 -92
  65. package/docs/integrations/opencode.md +84 -84
  66. package/docs/integrations/overview.md +115 -115
  67. package/docs/integrations/roo.md +71 -71
  68. package/docs/integrations/windsurf.md +77 -77
  69. package/docs/mcp-schema-changelog.md +360 -356
  70. package/docs/playbooks/integration/index.md +121 -121
  71. package/docs/playbooks/orchestration.md +37 -0
  72. package/docs/playbooks/productivity/index.md +99 -99
  73. package/docs/playbooks/team/index.md +117 -117
  74. package/docs/product/agent-first-model.md +184 -184
  75. package/docs/product/entity-model-audit.md +462 -462
  76. package/docs/product/positioning.md +86 -86
  77. package/docs/quickstart-existing-project.md +107 -107
  78. package/docs/quickstart.md +183 -183
  79. package/docs/release-maintenance.md +79 -79
  80. package/docs/reputation.md +52 -52
  81. package/docs/review.md +45 -45
  82. package/docs/security.md +212 -212
  83. package/docs/server-operations.md +118 -118
  84. package/docs/storage.md +106 -106
  85. package/package.json +80 -65
  86. package/docs/concepts/event-log-store-critique-A.md +0 -333
  87. package/docs/concepts/event-log-store-critique-B.md +0 -353
  88. package/docs/concepts/event-log-store-phase0-measurements.md +0 -58
  89. package/docs/concepts/event-log-store-proposal-A.md +0 -365
  90. package/docs/concepts/event-log-store-proposal-B.md +0 -404
  91. package/docs/concepts/identity-model-proposal.md +0 -371
package/docs/concepts/troubleshooting.md
@@ -1,254 +1,254 @@
1
- # Troubleshooting & recovery
2
-
3
- Runbook for the most common ways a brainclaw workspace gets into a degraded state during multi-agent coordination, and how to bring it back. Symptoms first, causes second, remediation third — pattern-matchable when you don't have time to read the whole page.
4
-
5
- This is **operator-facing**: it assumes you can run CLI commands. Agents you orchestrate don't read this page; you do, when something stalls.
6
-
7
- ## Quick-reference cheatsheet
8
-
9
- | Symptom | First-line check | First-line fix |
10
- |---|---|---|
11
- | Agent crashed, claim still active | `brainclaw claim list` | `brainclaw claim release <id>` (or `brainclaw stale resolve <id>`) |
12
- | Plan stuck `in_progress` for days | `brainclaw stale list` | `brainclaw stale resolve <plan_id>` (transitions to `dropped`) |
13
- | Dispatched worker finished without committing | `git -C <worktree> status` | manually `git add` + `git commit` in the worktree, then merge |
14
- | `Cannot find module 'mcp-worker.js'` | `brainclaw doctor` | `brainclaw doctor --repair` |
15
- | Octopus merge fails on parallel lanes | `git status` | merge lanes one-by-one, resolve conflicts, then proceed |
16
- | `.brainclaw/` schema looks corrupt | `brainclaw doctor --after-migration` | `brainclaw upgrade --rollback` (restores last backup) |
17
- | Inbox messages stuck / not delivered | `brainclaw inbox list` | `brainclaw inbox ack <id>` or check `bclaw_assignment_events` |
18
- | `bclaw_work` returns 25k-token error | n/a | already mitigated since v1.0.14 (compact mode default); pass `compact: true` if older clients |
19
- | Stale runtime notes flood `bclaw_context` | `brainclaw stale list` | `brainclaw stale resolve <id>` per noisy item |
20
-
21
- If your symptom isn't here, jump to the relevant section below or run `brainclaw doctor --json` and inspect the `checks` array.
22
-
23
- ---
24
-
25
- ## Stale claims after a crashed agent
26
-
27
- **Symptom**: an agent died (credit limit, terminal closed, network drop). Other agents see the scope as held and refuse to claim it.
28
-
29
- **Why**: claims are advisory locks with a TTL, but expiry is not enforced by a daemon — it surfaces only when something queries it. So a crashed agent's claim stays "active" until someone runs a check.
30
-
31
- **Fix**:
32
-
33
- ```bash
34
- # See what's stale (uses the staleness scoring from src/core/staleness.ts)
35
- brainclaw stale list
36
-
37
- # Release a specific stale claim
38
- brainclaw claim release <claim_id>
39
-
40
- # Or, for any stale entity (plan, handoff, candidate, runtime_note, claim),
41
- # trigger the canonical action:
42
- brainclaw stale resolve <id>
43
- ```
44
-
45
- `stale resolve` dispatches to the right transition per entity:
46
- - claim → release
47
- - plan → `bclaw_transition(entity="plan", to="dropped")`
48
- - handoff → `bclaw_transition(entity="handoff", to="closed")`
49
- - candidate → `bclaw_transition(entity="candidate", to="rejected")`
50
- - trap → `bclaw_transition(entity="trap", to="resolved")`
51
- - runtime_note → `bclaw_remove(entity="runtime_note", id=…)`
52
-
53
- **Prevention**: agents that respect the protocol call `bclaw_session_end(auto_release: true)` on exit, which releases all their claims. This is the recommended default in every dispatch brief.
54
-
55
- ---
56
-
57
- ## `bclaw_coordinate` refused with `dirty_working_tree`
58
-
59
- **Symptom**: an `assign` / `review` / `reroute` dispatch returns
60
- `dirty_working_tree` instead of spawning.
61
-
62
- **Why**: the worker spawns from a worktree branched at HEAD, so uncommitted
63
- edits in the source repo are invisible to it. The guard (trp#371) is
64
- scope-aware — it refuses only when the uncommitted files **overlap**, or
65
- cannot be proven disjoint from, the dispatch `scope`. `.brainclaw/` and
66
- `.git/` are always ignored, and `consult` / `ideate` / `summarize` are never
67
- guarded (they spawn no worktree). A scope that is not a resolvable file path
68
- (a plan-id, loop-ref, or prose) cannot be proven disjoint, so the guard stays
69
- conservative and refuses while the tree is dirty.
70
-
71
- **Fixes**:
72
-
73
- - Commit or stash the overlapping files, then re-dispatch (cleanest).
74
- - Pass `allow_dirty: true` to proceed anyway — the block becomes a warning
75
- that lists the overlapping files.
76
- - Pass a resolvable file `scope` (e.g. `src/foo.ts`) so the guard can prove the
77
- dirty files are out of scope.
78
- - Pass `ref: <commit|branch|tag>` to build the worktree from an explicit ref —
79
- uncommitted working-tree changes are then intentionally out of scope.
80
-
81
- ---
82
-
83
- ## Dispatched worker finished work but never committed
84
-
85
- **Symptom**: a sequence's lane shows the worker as "task_complete" in the run log, but `git -C <worktree-path> status` shows uncommitted changes.
86
-
87
- **Why**: some agents (notably codex when running in `--sandbox workspace-write`) sometimes finish editing without ever creating a git commit — they exit on `task_complete` from the prompt without the wrap-up step. The brief-ack file confirms the spawn *started*, not that it *committed*. See `trp#178`.
88
-
89
- **Fix** (manual harvest):
90
-
91
- ```bash
92
- # 1. Locate the worktree
93
- git worktree list | grep feat/pln_<lane_id>
94
-
95
- # 2. cd into it, inspect the work
96
- cd ~/.brainclaw/worktrees/<project-hash>/feat_pln_xxxx
97
- git status
98
- git diff --stat
99
-
100
- # 3. Stage + commit with a clear message that references the plan id
101
- git add <files>
102
- git commit -m "feat(<scope>): <summary> (pln#<id>)"
103
-
104
- # 4. Back on master, octopus-merge as usual
105
- cd <main repo>
106
- git merge --no-ff feat/pln_xxxx -m "merge: <description>"
107
- ```
108
-
109
- **Prevention**: every dispatch brief targeting agents prone to this pattern (notably codex) should include explicit commit instructions at the end, e.g. *"When done editing, stage your changes and create a commit with a clear message referencing the plan id (e.g. `feat(scope): summary (pln#XXX)`). Do not stop until the commit exists."*
110
-
111
- ---
112
-
113
- ## MCP runtime corrupted (mcp-worker.js missing)
114
-
115
- **Symptom**: `MCP error -32603: Cannot find module 'mcp-worker.js'` or the server logs `MCP runtime corrupted (mcp-worker.js missing)` on startup.
116
-
117
- **Why**: `dist/` was wiped or partially deleted. Common causes: a `git merge` that triggered worktree cleanup before pln#477 landed, an `npm run clean:dist` followed by an interrupted build, or filesystem-level corruption.
118
-
119
- **Fix**:
120
-
121
- ```bash
122
- brainclaw doctor --repair
123
- ```
124
-
125
- This rebuilds `dist/` from `src/` (TypeScript compile + copy default profiles) and validates by running `node dist/cli.js --version`. The repair also writes `dist/.brainclaw-build.json` so subsequent runs can do a stale-check (compare `src_hash` vs `dist_hash`).
126
-
127
- **If `--repair` fails**: it usually means `node_modules` is also damaged. Run a clean `npm install` first, then re-run `brainclaw doctor --repair`.
128
-
129
- **Note**: read-only MCP handlers stay available in-process even when the worker is missing (since pln#478) — so basic `bclaw_context` and `bclaw_find` calls still respond, but anything requiring the worker (most write operations) returns `runtime_corrupted` with a repair pointer.
130
-
131
- ---
132
-
133
- ## Octopus merge fails on parallel lanes
134
-
135
- **Symptom**: after a sequenced parallel dispatch finishes, you run `git merge --no-ff lane1 lane2 lane3 -m "merge: …"` and git refuses with conflict markers.
136
-
137
- **Why**: octopus merges only succeed when the lanes touch disjoint files. If two lanes wrote to the same file, octopus aborts and you must merge them sequentially.
138
-
139
- **Fix**:
140
-
141
- ```bash
142
- # Cancel the failed octopus
143
- git merge --abort
144
-
145
- # Merge lanes one at a time, resolving conflicts as needed
146
- git merge --no-ff lane1
147
- # (resolve any conflicts, commit)
148
- git merge --no-ff lane2
149
- # (resolve any conflicts, commit)
150
- git merge --no-ff lane3
151
- ```
152
-
153
- **Prevention**: when defining a sequence, choose lane scopes that minimize file overlap. Use `hard_after` dependencies for lanes that genuinely need to land in order. The dispatcher does not itself enforce disjoint scopes — that's the caller's responsibility when designing the sequence.
154
-
155
- ---
156
-
157
- ## `.brainclaw/` looks corrupted (schema drift, malformed JSON)
158
-
159
- **Symptom**: `bclaw_doctor` reports `state is invalid: <ZodError>` or files in `.brainclaw/memory/` fail to parse.
160
-
161
- **Why**: usually a half-written file from an interrupted write (process killed mid-write), a migration that didn't complete, or a manual edit that introduced syntax errors. `brainclaw upgrade --rollback` exists precisely for this case.
162
-
163
- **Fix**:
164
-
165
- ```bash
166
- # 1. Inspect what's wrong
167
- brainclaw doctor --after-migration
168
-
169
- # 2. If the most recent migration is the cause, roll back
170
- brainclaw upgrade --rollback
171
- # This restores the last backup at <store>.bak-<iso-ts>/ and parks the
172
- # current corrupted store at <store>.rollback-<iso-ts>/ for inspection.
173
-
174
- # 3. If a single file is corrupted (and rollback is too aggressive),
175
- # inspect the parked rollback dir and copy individual files back manually.
176
- ```
177
-
178
- **Prevention**: brainclaw takes a backup before every `upgrade` run (see `docs/concepts/upgrade-cli.md`). For non-upgrade scenarios, rely on git: `.brainclaw/` is git-versioned by default, so `git log` and `git checkout <prev>` recover any committed state.
179
-
180
- ---
181
-
182
- ## Plan stuck `in_progress`
183
-
184
- **Symptom**: a plan has been marked `in_progress` for days with no commits or claim activity.
185
-
186
- **Why**: the agent that started it crashed, was rerouted, or simply forgot to transition to `done` / `blocked` / `dropped`.
187
-
188
- **Fix**:
189
-
190
- ```bash
191
- # Survey
192
- brainclaw stale list # plan_in_progress flagged after 7 days by default
193
-
194
- # Decide based on context
195
- brainclaw stale resolve <plan_id> # → dropped (default for stale)
196
- # or, via canonical grammar, transition to a different terminal state:
197
- # bclaw_transition(entity="plan", id="<plan_id>", to="done")
198
- # bclaw_transition(entity="plan", id="<plan_id>", to="blocked")
199
- ```
200
-
201
- **Threshold tuning**: defaults live in `src/core/staleness.ts`. A config-driven override is on the roadmap (open follow-up); for now you adjust the source file if 7 days is too aggressive for your project.
202
-
203
- ---
204
-
205
- ## Inbox messages stuck / brief-ack never arrived
206
-
207
- **Symptom**: a dispatched assignment shows `running` indefinitely, and `bclaw_assignment_events` shows `run_running` but no further progress.
208
-
209
- **Why**: the spawned worker process either (a) crashed before reading its inbox, (b) read the inbox but couldn't acknowledge (e.g., MCP unavailable inside the spawned sandbox — common with codex `--sandbox workspace-write`), or (c) is genuinely still working but slow.
210
-
211
- **Diagnostic order**:
212
-
213
- ```bash
214
- # 1. Is the worker process still alive?
215
- ps -ef | grep <agent-binary> # codex, claude, copilot, …
216
- # Windows: Get-Process -Id <pid> # or `tasklist /FI "PID eq <pid>"`
217
-
218
- # 2. Did the brief-ack file land?
219
- ls .brainclaw/coordination/runtime/ack/<assignment_id>.ack
220
- # If yes → spawn started, worker is somewhere in its loop
221
- # If no → spawn never started or died before the wrap shell ran touch
222
-
223
- # 3. (pln#504) What did the worker actually say? stdout/stderr capture
224
- # Spawned workers now route their streams to per-assignment log files. If the
225
- # worker died silently, the error usually shows up here.
226
- cat .brainclaw/coordination/runtime/log/<assignment_id>.stdout.log
227
- cat .brainclaw/coordination/runtime/log/<assignment_id>.stderr.log
228
-
229
- # 4. Inspect the worktree for activity
230
- git -C <worktree> log --oneline -5
231
- git -C <worktree> status
232
-
233
- # 5. Check the run log
234
- brainclaw inbox list --agent <agent>
235
- # or via MCP: bclaw_assignment_events(assignmentId="<id>")
236
- ```
237
-
238
- **Fix paths**:
239
- - Worker dead, no ack → reroute via `bclaw_coordinate(intent="reroute", …)` to another agent
240
- - Worker dead, ack present, work uncommitted → manual harvest (see "Dispatched worker finished without committing" above)
241
- - Worker still alive but slow → wait, or `kill` and reroute
242
-
243
- **Brief-ack TTL** is configurable via `BRAINCLAW_HANDSHAKE_TIMEOUT_MS` (default 30s since pln#475+#476). Past that, the dispatcher times the spawn out and surfaces the failure in the assignment events log.
244
-
245
- ---
246
-
247
- ## See also
248
-
249
- - [`docs/concepts/dispatch-lifecycle.md`](dispatch-lifecycle.md) — the entity model + FSMs + observability decision tree underlying every diagnostic step on this page
250
- - [`docs/concepts/memory-staleness.md`](memory-staleness.md) — staleness signals and resolve flow in depth
251
- - [`docs/concepts/loop-engine.md`](loop-engine.md) — multi-turn loops (review-fix), recovery semantics for in-flight loops
252
- - [`docs/concepts/upgrade-cli.md`](upgrade-cli.md) — `brainclaw upgrade` design + rollback path
253
- - [`docs/cli.md`](../cli.md) — full command reference for `doctor`, `stale`, `claim`, `upgrade`, `inbox`, `worktree`
254
- - [`docs/concepts/multi-agent-workflows.md`](multi-agent-workflows.md) — happy-path coordination patterns (the inverse of this page)
1
+ # Troubleshooting & recovery
2
+
3
+ Runbook for the most common ways a brainclaw workspace gets into a degraded state during multi-agent coordination, and how to bring it back. Symptoms first, causes second, remediation third — pattern-matchable when you don't have time to read the whole page.
4
+
5
+ This is **operator-facing**: it assumes you can run CLI commands. Agents you orchestrate don't read this page; you do, when something stalls.
6
+
7
+ ## Quick-reference cheatsheet
8
+
9
+ | Symptom | First-line check | First-line fix |
10
+ |---|---|---|
11
+ | Agent crashed, claim still active | `brainclaw claim list` | `brainclaw claim release <id>` (or `brainclaw stale resolve <id>`) |
12
+ | Plan stuck `in_progress` for days | `brainclaw stale list` | `brainclaw stale resolve <plan_id>` (transitions to `dropped`) |
13
+ | Dispatched worker finished without committing | `git -C <worktree> status` | manually `git add` + `git commit` in the worktree, then merge |
14
+ | `Cannot find module 'mcp-worker.js'` | `brainclaw doctor` | `brainclaw doctor --repair` |
15
+ | Octopus merge fails on parallel lanes | `git status` | merge lanes one-by-one, resolve conflicts, then proceed |
16
+ | `.brainclaw/` schema looks corrupt | `brainclaw doctor --after-migration` | `brainclaw upgrade --rollback` (restores last backup) |
17
+ | Inbox messages stuck / not delivered | `brainclaw inbox list` | `brainclaw inbox ack <id>` or check `bclaw_assignment_events` |
18
+ | `bclaw_work` returns 25k-token error | n/a | already mitigated since v1.0.14 (compact mode default); pass `compact: true` if older clients |
19
+ | Stale runtime notes flood `bclaw_context` | `brainclaw stale list` | `brainclaw stale resolve <id>` per noisy item |
20
+
21
+ If your symptom isn't here, jump to the relevant section below or run `brainclaw doctor --json` and inspect the `checks` array.
22
+
23
+ ---
24
+
25
+ ## Stale claims after a crashed agent
26
+
27
+ **Symptom**: an agent died (credit limit, terminal closed, network drop). Other agents see the scope as held and refuse to claim it.
28
+
29
+ **Why**: claims are advisory locks with a TTL, but expiry is not enforced by a daemon — it surfaces only when something queries it. So a crashed agent's claim stays "active" until someone runs a check.
30
+
31
+ **Fix**:
32
+
33
+ ```bash
34
+ # See what's stale (uses the staleness scoring from src/core/staleness.ts)
35
+ brainclaw stale list
36
+
37
+ # Release a specific stale claim
38
+ brainclaw claim release <claim_id>
39
+
40
+ # Or, for any stale entity (plan, handoff, candidate, runtime_note, claim),
41
+ # trigger the canonical action:
42
+ brainclaw stale resolve <id>
43
+ ```
44
+
45
+ `stale resolve` dispatches to the right transition per entity:
46
+ - claim → release
47
+ - plan → `bclaw_transition(entity="plan", to="dropped")`
48
+ - handoff → `bclaw_transition(entity="handoff", to="closed")`
49
+ - candidate → `bclaw_transition(entity="candidate", to="rejected")`
50
+ - trap → `bclaw_transition(entity="trap", to="resolved")`
51
+ - runtime_note → `bclaw_remove(entity="runtime_note", id=…)`
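The per-entity dispatch listed above can be read as a simple lookup table. The following is a minimal sketch of that table, not code from the package: `stale_resolve_action` is a hypothetical helper name, and it only prints the documented action without calling brainclaw.

```bash
# Hypothetical helper (illustration only): print the action that
# `brainclaw stale resolve` is documented to take for an entity type.
stale_resolve_action() {
  case "$1" in
    claim)        echo 'release' ;;
    plan)         echo 'bclaw_transition(entity="plan", to="dropped")' ;;
    handoff)      echo 'bclaw_transition(entity="handoff", to="closed")' ;;
    candidate)    echo 'bclaw_transition(entity="candidate", to="rejected")' ;;
    trap)         echo 'bclaw_transition(entity="trap", to="resolved")' ;;
    runtime_note) echo 'bclaw_remove(entity="runtime_note", id=…)' ;;
    *)            echo "unknown entity: $1" >&2; return 1 ;;
  esac
}

stale_resolve_action plan   # prints: bclaw_transition(entity="plan", to="dropped")
```

An unknown entity type returns a non-zero status, so a wrapper script can stop instead of guessing a transition.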
52
+
53
+ **Prevention**: agents that respect the protocol call `bclaw_session_end(auto_release: true)` on exit, which releases all their claims. This is the recommended default in every dispatch brief.
54
+
55
+ ---
56
+
57
+ ## `bclaw_coordinate` refused with `dirty_working_tree`
58
+
59
+ **Symptom**: an `assign` / `review` / `reroute` dispatch returns
60
+ `dirty_working_tree` instead of spawning.
61
+
62
+ **Why**: the worker spawns from a worktree branched at HEAD, so uncommitted
63
+ edits in the source repo are invisible to it. The guard (trp#371) is
64
+ scope-aware — it refuses only when the uncommitted files **overlap**, or
65
+ cannot be proven disjoint from, the dispatch `scope`. `.brainclaw/` and
66
+ `.git/` are always ignored, and `consult` / `ideate` / `summarize` are never
67
+ guarded (they spawn no worktree). A scope that is not a resolvable file path
68
+ (a plan-id, loop-ref, or prose) cannot be proven disjoint, so the guard stays
69
+ conservative and refuses while the tree is dirty.
70
+
71
+ **Fixes**:
72
+
73
+ - Commit or stash the overlapping files, then re-dispatch (cleanest).
74
+ - Pass `allow_dirty: true` to proceed anyway — the block becomes a warning
75
+ that lists the overlapping files.
76
+ - Pass a resolvable file `scope` (e.g. `src/foo.ts`) so the guard can prove the
77
+ dirty files are out of scope.
78
+ - Pass `ref: <commit|branch|tag>` to build the worktree from an explicit ref —
79
+ uncommitted working-tree changes are then intentionally out of scope.
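Before dispatching, it can help to see which uncommitted files are likely to trip the guard. The sketch below is an approximation under stated assumptions, not the guard's actual implementation: `overlapping_files` is a hypothetical helper, the scope is assumed to be an already-resolved file or directory path, and "overlap" is assumed to mean "equal to the scope or located under it".

```bash
# Hypothetical helper (illustration only): read uncommitted paths on stdin
# and print those that overlap SCOPE. Paths under .brainclaw/ and .git/ are
# skipped, mirroring the "always ignored" rule described above.
overlapping_files() {
  scope=${1%/}                                  # tolerate a trailing slash
  while IFS= read -r path; do
    case "$path" in
      .brainclaw/*|.git/*) continue ;;          # always ignored
      "$scope"|"$scope"/*) printf '%s\n' "$path" ;;  # exact match or inside scope
    esac
  done
}

# Possible usage inside a repository (renamed or quoted paths in porcelain
# output would need extra handling):
#   git status --porcelain | cut -c4- | overlapping_files src/foo.ts
```

Empty output suggests the dirty files are outside the scope; any printed path is a candidate to commit or stash before re-dispatching.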
80
+
81
+ ---
82
+
83
+ ## Dispatched worker finished work but never committed
84
+
85
+ **Symptom**: a sequence's lane shows the worker as "task_complete" in the run log, but `git -C <worktree-path> status` shows uncommitted changes.
86
+
87
+ **Why**: some agents (notably codex when running in `--sandbox workspace-write`) sometimes finish editing without ever creating a git commit — they exit on `task_complete` from the prompt without the wrap-up step. The brief-ack file confirms the spawn *started*, not that it *committed*. See `trp#178`.
88
+
89
+ **Fix** (manual harvest):
90
+
91
+ ```bash
92
+ # 1. Locate the worktree
93
+ git worktree list | grep feat/pln_<lane_id>
94
+
95
+ # 2. cd into it, inspect the work
96
+ cd ~/.brainclaw/worktrees/<project-hash>/feat_pln_xxxx
97
+ git status
98
+ git diff --stat
99
+
100
+ # 3. Stage + commit with a clear message that references the plan id
101
+ git add <files>
102
+ git commit -m "feat(<scope>): <summary> (pln#<id>)"
103
+
104
+ # 4. Back on master, octopus-merge as usual
105
+ cd <main repo>
106
+ git merge --no-ff feat/pln_xxxx -m "merge: <description>"
107
+ ```
108
+
109
+ **Prevention**: every dispatch brief targeting agents prone to this pattern (notably codex) should include explicit commit instructions at the end, e.g. *"When done editing, stage your changes and create a commit with a clear message referencing the plan id (e.g. `feat(scope): summary (pln#XXX)`). Do not stop until the commit exists."*
110
+
111
+ ---
112
+
113
+ ## MCP runtime corrupted (mcp-worker.js missing)
114
+
115
+ **Symptom**: `MCP error -32603: Cannot find module 'mcp-worker.js'` or the server logs `MCP runtime corrupted (mcp-worker.js missing)` on startup.
116
+
117
+ **Why**: `dist/` was wiped or partially deleted. Common causes: a `git merge` that triggered worktree cleanup before pln#477 landed, an `npm run clean:dist` followed by an interrupted build, or filesystem-level corruption.
118
+
119
+ **Fix**:
120
+
121
+ ```bash
122
+ brainclaw doctor --repair
123
+ ```
124
+
125
+ This rebuilds `dist/` from `src/` (TypeScript compile + copy default profiles) and validates by running `node dist/cli.js --version`. The repair also writes `dist/.brainclaw-build.json` so subsequent runs can do a stale-check (compare `src_hash` vs `dist_hash`).
126
+
127
+ **If `--repair` fails**: it usually means `node_modules` is also damaged. Run a clean `npm install` first, then re-run `brainclaw doctor --repair`.
128
+
129
+ **Note**: read-only MCP handlers stay available in-process even when the worker is missing (since pln#478) — so basic `bclaw_context` and `bclaw_find` calls still respond, but anything requiring the worker (most write operations) returns `runtime_corrupted` with a repair pointer.
130
+
131
+ ---
132
+
133
+ ## Octopus merge fails on parallel lanes
134
+
135
+ **Symptom**: after a sequenced parallel dispatch finishes, you run `git merge --no-ff lane1 lane2 lane3 -m "merge: …"` and git refuses with conflict markers.
136
+
137
+ **Why**: octopus merges only succeed when the lanes touch disjoint files. If two lanes wrote to the same file, octopus aborts and you must merge them sequentially.
138
+
139
+ **Fix**:
140
+
141
+ ```bash
142
+ # Cancel the failed octopus
143
+ git merge --abort
144
+
145
+ # Merge lanes one at a time, resolving conflicts as needed
146
+ git merge --no-ff lane1
147
+ # (resolve any conflicts, commit)
148
+ git merge --no-ff lane2
149
+ # (resolve any conflicts, commit)
150
+ git merge --no-ff lane3
151
+ ```
152
+
153
+ **Prevention**: when defining a sequence, choose lane scopes that minimize file overlap. Use `hard_after` dependencies for lanes that genuinely need to land in order. The dispatcher does not itself enforce disjoint scopes — that's the caller's responsibility when designing the sequence.
154
+
155
+ ---
156
+
157
+ ## `.brainclaw/` looks corrupted (schema drift, malformed JSON)
158
+
159
+ **Symptom**: `bclaw_doctor` reports `state is invalid: <ZodError>` or files in `.brainclaw/memory/` fail to parse.
160
+
161
+ **Why**: usually a half-written file from an interrupted write (process killed mid-write), a migration that didn't complete, or a manual edit that introduced syntax errors. `brainclaw upgrade --rollback` exists precisely for this case.
162
+
163
+ **Fix**:
164
+
165
+ ```bash
166
+ # 1. Inspect what's wrong
167
+ brainclaw doctor --after-migration
168
+
169
+ # 2. If the most recent migration is the cause, roll back
170
+ brainclaw upgrade --rollback
171
+ # This restores the last backup at <store>.bak-<iso-ts>/ and parks the
172
+ # current corrupted store at <store>.rollback-<iso-ts>/ for inspection.
173
+
174
+ # 3. If a single file is corrupted (and rollback is too aggressive),
175
+ # inspect the parked rollback dir and copy individual files back manually.
176
+ ```
177
+
178
+ **Prevention**: brainclaw takes a backup before every `upgrade` run (see `docs/concepts/upgrade-cli.md`). For non-upgrade scenarios, rely on git: `.brainclaw/` is git-versioned by default, so `git log` and `git checkout <prev>` recover any committed state.
179
+
180
+ ---
181
+
182
+ ## Plan stuck `in_progress`
183
+
184
+ **Symptom**: a plan has been marked `in_progress` for days with no commits or claim activity.
185
+
186
+ **Why**: the agent that started it crashed, was rerouted, or simply forgot to transition to `done` / `blocked` / `dropped`.
187
+
188
+ **Fix**:
189
+
190
+ ```bash
191
+ # Survey
192
+ brainclaw stale list # plan_in_progress flagged after 7 days by default
193
+
194
+ # Decide based on context
195
+ brainclaw stale resolve <plan_id> # → dropped (default for stale)
196
+ # or, via canonical grammar, transition to a different terminal state:
197
+ # bclaw_transition(entity="plan", id="<plan_id>", to="done")
198
+ # bclaw_transition(entity="plan", id="<plan_id>", to="blocked")
199
+ ```
200
+
201
+ **Threshold tuning**: defaults live in `src/core/staleness.ts`. A config-driven override is on the roadmap (open follow-up); for now you adjust the source file if 7 days is too aggressive for your project.
202
+
203
+ ---
204
+
205
+ ## Inbox messages stuck / brief-ack never arrived
206
+
207
+ **Symptom**: a dispatched assignment shows `running` indefinitely, and `bclaw_assignment_events` shows `run_running` but no further progress.
208
+
209
+ **Why**: the spawned worker process either (a) crashed before reading its inbox, (b) read the inbox but couldn't acknowledge (e.g., MCP unavailable inside the spawned sandbox — common with codex `--sandbox workspace-write`), or (c) is genuinely still working but slow.
210
+
211
+ **Diagnostic order**:
212
+
213
+ ```bash
214
+ # 1. Is the worker process still alive?
215
+ ps -ef | grep <agent-binary> # codex, claude, copilot, …
216
+ # Windows: Get-Process -Id <pid> # or `tasklist /FI "PID eq <pid>"`
217
+
218
+ # 2. Did the brief-ack file land?
219
+ ls .brainclaw/coordination/runtime/ack/<assignment_id>.ack
220
+ # If yes → spawn started, worker is somewhere in its loop
221
+ # If no → spawn never started or died before the wrap shell ran touch
222
+
223
+ # 3. (pln#504) What did the worker actually say? stdout/stderr capture
224
+ # Spawned workers now route their streams to per-assignment log files. If the
225
+ # worker died silently, the error usually shows up here.
226
+ cat .brainclaw/coordination/runtime/log/<assignment_id>.stdout.log
227
+ cat .brainclaw/coordination/runtime/log/<assignment_id>.stderr.log
228
+
229
+ # 4. Inspect the worktree for activity
230
+ git -C <worktree> log --oneline -5
231
+ git -C <worktree> status
232
+
233
+ # 5. Check the run log
234
+ brainclaw inbox list --agent <agent>
235
+ # or via MCP: bclaw_assignment_events(assignmentId="<id>")
236
+ ```
237
+
238
+ **Fix paths**:
239
+ - Worker dead, no ack → reroute via `bclaw_coordinate(intent="reroute", …)` to another agent
240
+ - Worker dead, ack present, work uncommitted → manual harvest (see "Dispatched worker finished without committing" above)
241
+ - Worker still alive but slow → wait, or `kill` and reroute
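The three fix paths above amount to a small decision table over the diagnostic results. A minimal sketch, assuming the operator has already answered three yes/no questions by hand; `triage_worker` is a hypothetical helper name, and the fourth branch covers a combination the list above does not address.

```bash
# Hypothetical helper (illustration only): map diagnostic answers to the
# documented fix path. Arguments are "yes" or "no":
#   $1 worker process alive?   $2 brief-ack file present?   $3 uncommitted work?
triage_worker() {
  alive=$1
  ack=$2
  uncommitted=$3
  if [ "$alive" = yes ]; then
    echo 'wait, or kill and reroute'
  elif [ "$ack" = no ]; then
    echo 'reroute'
  elif [ "$uncommitted" = yes ]; then
    echo 'manual harvest'
  else
    echo 'not covered by the fix paths'   # dead, ack present, clean worktree
  fi
}

triage_worker no yes yes   # prints: manual harvest
```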
242
+
243
+ **Brief-ack TTL** is configurable via `BRAINCLAW_HANDSHAKE_TIMEOUT_MS` (default 30s since pln#475+#476). Past that, the dispatcher times the spawn out and surfaces the failure in the assignment events log.
244
+
245
+ ---
246
+
247
+ ## See also
248
+
249
+ - [`docs/concepts/dispatch-lifecycle.md`](dispatch-lifecycle.md) — the entity model + FSMs + observability decision tree underlying every diagnostic step on this page
250
+ - [`docs/concepts/memory-staleness.md`](memory-staleness.md) — staleness signals and resolve flow in depth
251
+ - [`docs/concepts/loop-engine.md`](loop-engine.md) — multi-turn loops (review-fix), recovery semantics for in-flight loops
252
+ - [`docs/concepts/upgrade-cli.md`](upgrade-cli.md) — `brainclaw upgrade` design + rollback path
253
+ - [`docs/cli.md`](../cli.md) — full command reference for `doctor`, `stale`, `claim`, `upgrade`, `inbox`, `worktree`
254
+ - [`docs/concepts/multi-agent-workflows.md`](multi-agent-workflows.md) — happy-path coordination patterns (the inverse of this page)