brainclaw 1.9.0 → 1.10.0
- package/README.md +631 -499
- package/dist/brainclaw-vscode.vsix +0 -0
- package/dist/cli.js +18 -1
- package/dist/commands/code-map.js +129 -0
- package/dist/commands/codev.js +7 -0
- package/dist/commands/harvest.js +1 -1
- package/dist/commands/hooks.js +73 -73
- package/dist/commands/init.js +1 -1
- package/dist/commands/install-hooks.js +78 -78
- package/dist/commands/mcp-read-handlers.js +57 -14
- package/dist/commands/mcp.js +200 -13
- package/dist/commands/run-profile.js +3 -2
- package/dist/commands/switch.js +125 -93
- package/dist/commands/version.js +1 -1
- package/dist/core/agent-capability.js +19 -4
- package/dist/core/agent-files.js +131 -119
- package/dist/core/code-map/backend.js +123 -0
- package/dist/core/code-map/core.js +81 -0
- package/dist/core/code-map/drafts.js +2 -0
- package/dist/core/code-map/extractor.js +29 -0
- package/dist/core/code-map/finalizer.js +191 -0
- package/dist/core/code-map/freshness.js +108 -0
- package/dist/core/code-map/ids.js +0 -0
- package/dist/core/code-map/importable.js +35 -0
- package/dist/core/code-map/indexes.js +197 -0
- package/dist/core/code-map/lang/java/imports.scm +17 -0
- package/dist/core/code-map/lang/java/index.js +254 -0
- package/dist/core/code-map/lang/java/tags.scm +48 -0
- package/dist/core/code-map/lang/php/imports.scm +21 -0
- package/dist/core/code-map/lang/php/index.js +251 -0
- package/dist/core/code-map/lang/php/tags.scm +44 -0
- package/dist/core/code-map/lang/provider.js +9 -0
- package/dist/core/code-map/lang/providers.js +24 -0
- package/dist/core/code-map/lang/python/imports.scm +90 -0
- package/dist/core/code-map/lang/python/index.js +364 -0
- package/dist/core/code-map/lang/python/tags.scm +81 -0
- package/dist/core/code-map/lang/query-runtime.js +374 -0
- package/dist/core/code-map/lang/registry.js +125 -0
- package/dist/core/code-map/lang/typescript/imports.scm +90 -0
- package/dist/core/code-map/lang/typescript/index.js +306 -0
- package/dist/core/code-map/lang/typescript/tags.js.scm +106 -0
- package/dist/core/code-map/lang/typescript/tags.scm +151 -0
- package/dist/core/code-map/lock.js +210 -0
- package/dist/core/code-map/materialized.js +51 -0
- package/dist/core/code-map/memory-reader.js +59 -0
- package/dist/core/code-map/paths.js +53 -0
- package/dist/core/code-map/query.js +568 -0
- package/dist/core/code-map/refresh.js +0 -0
- package/dist/core/code-map/resolve.js +177 -0
- package/dist/core/code-map/store.js +206 -0
- package/dist/core/code-map/types.js +288 -0
- package/dist/core/code-map/vocabulary.js +57 -0
- package/dist/core/code-map/wasm-loader.js +294 -0
- package/dist/core/code-map/work-section.js +206 -0
- package/dist/core/codev-prompts.js +38 -38
- package/dist/core/codev-rounds.js +4 -0
- package/dist/core/default-profiles/doctor.yaml +11 -11
- package/dist/core/default-profiles/janitor.yaml +11 -11
- package/dist/core/default-profiles/onboarder.yaml +11 -11
- package/dist/core/default-profiles/reviewer.yaml +13 -13
- package/dist/core/dispatcher.js +1 -1
- package/dist/core/entity-operations.js +29 -3
- package/dist/core/execution-adapters.js +11 -10
- package/dist/core/execution-profile.js +58 -0
- package/dist/core/execution.js +1 -1
- package/dist/core/facade-schema.js +9 -0
- package/dist/core/instruction-templates.js +2 -0
- package/dist/core/loops/verbs.js +0 -1
- package/dist/core/mcp-command-resolution.js +3 -1
- package/dist/core/messaging.js +2 -2
- package/dist/core/protocol-skills.js +164 -164
- package/dist/core/runtime-signals.js +1 -1
- package/dist/core/search.js +19 -2
- package/dist/core/security-guard.js +207 -207
- package/dist/core/spawn-check.js +16 -2
- package/dist/core/staleness.js +1 -1
- package/dist/core/store-resolution.js +67 -11
- package/dist/core/worktree.js +18 -18
- package/dist/facts.js +9 -5
- package/dist/facts.json +8 -4
- package/dist/vendor/web-tree-sitter/tree-sitter.js +3980 -0
- package/dist/vendor/web-tree-sitter/tree-sitter.wasm +0 -0
- package/dist/wasm/tree-sitter-java.wasm +0 -0
- package/dist/wasm/tree-sitter-javascript.wasm +0 -0
- package/dist/wasm/tree-sitter-php.wasm +0 -0
- package/dist/wasm/tree-sitter-python.wasm +0 -0
- package/dist/wasm/tree-sitter-tsx.wasm +0 -0
- package/dist/wasm/tree-sitter-typescript.wasm +0 -0
- package/dist/wasm/tree-sitter.wasm +0 -0
- package/docs/PROTOCOL.md +1 -1
- package/docs/adapters/openclaw.md +43 -43
- package/docs/architecture/project-refs.md +328 -328
- package/docs/cli.md +2131 -2093
- package/docs/code-map.md +198 -0
- package/docs/concepts/coordination.md +52 -52
- package/docs/concepts/coordinator-runbook.md +129 -129
- package/docs/concepts/dispatch-lifecycle.md +245 -245
- package/docs/concepts/event-log-store.md +928 -928
- package/docs/concepts/ideation-loop.md +317 -317
- package/docs/concepts/loop-engine.md +520 -511
- package/docs/concepts/mcp-governance.md +268 -268
- package/docs/concepts/memory.md +84 -84
- package/docs/concepts/multi-agent-workflows.md +167 -167
- package/docs/concepts/observer-protocol.md +361 -361
- package/docs/concepts/plans-and-claims.md +217 -217
- package/docs/concepts/project-md-convention.md +35 -35
- package/docs/concepts/runtime-notes.md +38 -38
- package/docs/concepts/troubleshooting.md +254 -254
- package/docs/concepts/workspace-bootstrapping.md +142 -142
- package/docs/context-format-changelog.md +35 -35
- package/docs/context-format.md +48 -48
- package/docs/index.md +65 -65
- package/docs/integrations/agents.md +158 -158
- package/docs/integrations/claude-code.md +23 -23
- package/docs/integrations/cline.md +77 -77
- package/docs/integrations/continue.md +55 -55
- package/docs/integrations/copilot.md +68 -68
- package/docs/integrations/cursor.md +23 -23
- package/docs/integrations/kilocode.md +72 -72
- package/docs/integrations/mcp.md +385 -378
- package/docs/integrations/mistral-vibe.md +122 -122
- package/docs/integrations/openclaw.md +92 -92
- package/docs/integrations/opencode.md +84 -84
- package/docs/integrations/overview.md +115 -115
- package/docs/integrations/roo.md +71 -71
- package/docs/integrations/windsurf.md +77 -77
- package/docs/mcp-schema-changelog.md +364 -356
- package/docs/playbooks/integration/index.md +121 -121
- package/docs/playbooks/orchestration.md +37 -0
- package/docs/playbooks/productivity/index.md +99 -99
- package/docs/playbooks/team/index.md +117 -117
- package/docs/product/agent-first-model.md +184 -184
- package/docs/product/entity-model-audit.md +462 -462
- package/docs/product/positioning.md +86 -86
- package/docs/quickstart-existing-project.md +107 -107
- package/docs/quickstart.md +183 -183
- package/docs/release-maintenance.md +79 -79
- package/docs/reputation.md +52 -52
- package/docs/review.md +45 -45
- package/docs/security.md +212 -212
- package/docs/server-operations.md +118 -118
- package/docs/storage.md +106 -106
- package/package.json +86 -66
- package/docs/concepts/event-log-store-critique-A.md +0 -333
- package/docs/concepts/event-log-store-critique-B.md +0 -353
- package/docs/concepts/event-log-store-phase0-measurements.md +0 -58
- package/docs/concepts/event-log-store-proposal-A.md +0 -365
- package/docs/concepts/event-log-store-proposal-B.md +0 -404
- package/docs/concepts/identity-model-proposal.md +0 -371
@@ -1,254 +1,254 @@
-# Troubleshooting & recovery
-
-Runbook for the most common ways a brainclaw workspace gets into a degraded state during multi-agent coordination, and how to bring it back. Symptoms first, causes second, remediation third — pattern-matchable when you don't have time to read the whole page.
-
-This is **operator-facing**: it assumes you can run CLI commands. Agents you orchestrate don't read this page; you do, when something stalls.
-
-## Quick-reference cheatsheet
-
-| Symptom | First-line check | First-line fix |
-|---|---|---|
-| Agent crashed, claim still active | `brainclaw claim list` | `brainclaw claim release <id>` (or `brainclaw stale resolve <id>`) |
-| Plan stuck `in_progress` for days | `brainclaw stale list` | `brainclaw stale resolve <plan_id>` (transitions to `dropped`) |
-| Dispatched worker finished without committing | `git -C <worktree> status` | manually `git add` + `git commit` in the worktree, then merge |
-| `Cannot find module 'mcp-worker.js'` | `brainclaw doctor` | `brainclaw doctor --repair` |
-| Octopus merge fails on parallel lanes | `git status` | merge lanes one-by-one, resolve conflicts, then proceed |
-| `.brainclaw/` schema looks corrupt | `brainclaw doctor --after-migration` | `brainclaw upgrade --rollback` (restores last backup) |
-| Inbox messages stuck / not delivered | `brainclaw inbox list` | `brainclaw inbox ack <id>` or check `bclaw_assignment_events` |
-| `bclaw_work` returns 25k-token error | n/a | already mitigated since v1.0.14 (compact mode default); pass `compact: true` if older clients |
-| Stale runtime notes flood `bclaw_context` | `brainclaw stale list` | `brainclaw stale resolve <id>` per noisy item |
-
-If your symptom isn't here, jump to the relevant section below or run `brainclaw doctor --json` and inspect the `checks` array.
-
----
-
-## Stale claims after a crashed agent
-
-**Symptom**: an agent died (credit limit, terminal closed, network drop). Other agents see the scope as held and refuse to claim it.
-
-**Why**: claims are advisory locks with a TTL, but expiry is not enforced by a daemon — it surfaces only when something queries it. So a crashed agent's claim stays "active" until someone runs a check.
-
-**Fix**:
-
-```bash
-# See what's stale (uses the staleness scoring from src/core/staleness.ts)
-brainclaw stale list
-
-# Release a specific stale claim
-brainclaw claim release <claim_id>
-
-# Or, for any stale entity (plan, handoff, candidate, runtime_note, claim),
-# trigger the canonical action:
-brainclaw stale resolve <id>
-```
-
-`stale resolve` dispatches to the right transition per entity:
-- claim → release
-- plan → `bclaw_transition(entity="plan", to="dropped")`
-- handoff → `bclaw_transition(entity="handoff", to="closed")`
-- candidate → `bclaw_transition(entity="candidate", to="rejected")`
-- trap → `bclaw_transition(entity="trap", to="resolved")`
-- runtime_note → `bclaw_remove(entity="runtime_note", id=…)`
-
-**Prevention**: agents that respect the protocol call `bclaw_session_end(auto_release: true)` on exit, which releases all their claims. This is the recommended default in every dispatch brief.
-
----
-
-## `bclaw_coordinate` refused with `dirty_working_tree`
-
-**Symptom**: an `assign` / `review` / `reroute` dispatch returns
-`dirty_working_tree` instead of spawning.
-
-**Why**: the worker spawns from a worktree branched at HEAD, so uncommitted
-edits in the source repo are invisible to it. The guard (trp#371) is
-scope-aware — it refuses only when the uncommitted files **overlap**, or
-cannot be proven disjoint from, the dispatch `scope`. `.brainclaw/` and
-`.git/` are always ignored, and `consult` / `ideate` / `summarize` are never
-guarded (they spawn no worktree). A scope that is not a resolvable file path
-(a plan-id, loop-ref, or prose) cannot be proven disjoint, so the guard stays
-conservative and refuses while the tree is dirty.
-
-**Fixes**:
-
-- Commit or stash the overlapping files, then re-dispatch (cleanest).
-- Pass `allow_dirty: true` to proceed anyway — the block becomes a warning
-that lists the overlapping files.
-- Pass a resolvable file `scope` (e.g. `src/foo.ts`) so the guard can prove the
-dirty files are out of scope.
-- Pass `ref: <commit|branch|tag>` to build the worktree from an explicit ref —
-uncommitted working-tree changes are then intentionally out of scope.
-
----
-
-## Dispatched worker finished work but never committed
-
-**Symptom**: a sequence's lane shows the worker as "task_complete" in the run log, but `git -C <worktree-path> status` shows uncommitted changes.
-
-**Why**: some agents (notably codex when running in `--sandbox workspace-write`) sometimes finish editing without ever creating a git commit — they exit on `task_complete` from the prompt without the wrap-up step. The brief-ack file confirms the spawn *started*, not that it *committed*. See `trp#178`.
-
-**Fix** (manual harvest):
-
-```bash
-# 1. Locate the worktree
-git worktree list | grep feat/pln_<lane_id>
-
-# 2. cd into it, inspect the work
-cd ~/.brainclaw/worktrees/<project-hash>/feat_pln_xxxx
-git status
-git diff --stat
-
-# 3. Stage + commit with a clear message that references the plan id
-git add <files>
-git commit -m "feat(<scope>): <summary> (pln#<id>)"
-
-# 4. Back on master, octopus-merge as usual
-cd <main repo>
-git merge --no-ff feat/pln_xxxx -m "merge: <description>"
-```
-
-**Prevention**: every dispatch brief targeting agents prone to this pattern (notably codex) should include explicit commit instructions at the end, e.g. *"When done editing, stage your changes and create a commit with a clear message referencing the plan id (e.g. `feat(scope): summary (pln#XXX)`). Do not stop until the commit exists."*
-
----
-
-## MCP runtime corrupted (mcp-worker.js missing)
-
-**Symptom**: `MCP error -32603: Cannot find module 'mcp-worker.js'` or the server logs `MCP runtime corrupted (mcp-worker.js missing)` on startup.
-
-**Why**: `dist/` was wiped or partially deleted. Common causes: a `git merge` that triggered worktree cleanup before pln#477 landed, an `npm run clean:dist` followed by an interrupted build, or filesystem-level corruption.
-
-**Fix**:
-
-```bash
-brainclaw doctor --repair
-```
-
-This rebuilds `dist/` from `src/` (TypeScript compile + copy default profiles) and validates by running `node dist/cli.js --version`. The repair also writes `dist/.brainclaw-build.json` so subsequent runs can do a stale-check (compare `src_hash` vs `dist_hash`).
-
-**If `--repair` fails**: it usually means `node_modules` is also damaged. Run a clean `npm install` first, then re-run `brainclaw doctor --repair`.
-
-**Note**: read-only MCP handlers stay available in-process even when the worker is missing (since pln#478) — so basic `bclaw_context` and `bclaw_find` calls still respond, but anything requiring the worker (most write operations) returns `runtime_corrupted` with a repair pointer.
-
----
-
-## Octopus merge fails on parallel lanes
-
-**Symptom**: after a sequenced parallel dispatch finishes, you run `git merge --no-ff lane1 lane2 lane3 -m "merge: …"` and git refuses with conflict markers.
-
-**Why**: octopus merges only succeed when the lanes touch disjoint files. If two lanes wrote to the same file, octopus aborts and you must merge them sequentially.
-
-**Fix**:
-
-```bash
-# Cancel the failed octopus
-git merge --abort
-
-# Merge lanes one at a time, resolving conflicts as needed
-git merge --no-ff lane1
-# (resolve any conflicts, commit)
-git merge --no-ff lane2
-# (resolve any conflicts, commit)
-git merge --no-ff lane3
-```
-
-**Prevention**: when defining a sequence, choose lane scopes that minimize file overlap. Use `hard_after` dependencies for lanes that genuinely need to land in order. The dispatcher does not itself enforce disjoint scopes — that's the caller's responsibility when designing the sequence.
-
----
-
-## `.brainclaw/` looks corrupted (schema drift, malformed JSON)
-
-**Symptom**: `bclaw_doctor` reports `state is invalid: <ZodError>` or files in `.brainclaw/memory/` fail to parse.
-
-**Why**: usually a half-written file from an interrupted write (process killed mid-write), a migration that didn't complete, or a manual edit that introduced syntax errors. `brainclaw upgrade --rollback` exists precisely for this case.
-
-**Fix**:
-
-```bash
-# 1. Inspect what's wrong
-brainclaw doctor --after-migration
-
-# 2. If the most recent migration is the cause, roll back
-brainclaw upgrade --rollback
-# This restores the last backup at <store>.bak-<iso-ts>/ and parks the
-# current corrupted store at <store>.rollback-<iso-ts>/ for inspection.
-
-# 3. If a single file is corrupted (and rollback is too aggressive),
-# inspect the parked rollback dir and copy individual files back manually.
-```
-
-**Prevention**: brainclaw takes a backup before every `upgrade` run (see `docs/concepts/upgrade-cli.md`). For non-upgrade scenarios, rely on git: `.brainclaw/` is git-versioned by default, so `git log` and `git checkout <prev>` recover any committed state.
-
----
-
-## Plan stuck `in_progress`
-
-**Symptom**: a plan has been marked `in_progress` for days with no commits or claim activity.
-
-**Why**: the agent that started it crashed, was rerouted, or simply forgot to transition to `done` / `blocked` / `dropped`.
-
-**Fix**:
-
-```bash
-# Survey
-brainclaw stale list # plan_in_progress flagged after 7 days by default
-
-# Decide based on context
-brainclaw stale resolve <plan_id> # → dropped (default for stale)
-# or, via canonical grammar, transition to a different terminal state:
-# bclaw_transition(entity="plan", id="<plan_id>", to="done")
-# bclaw_transition(entity="plan", id="<plan_id>", to="blocked")
-```
-
-**Threshold tuning**: defaults live in `src/core/staleness.ts`. A config-driven override is on the roadmap (open follow-up); for now you adjust the source file if 7 days is too aggressive for your project.
-
----
-
-## Inbox messages stuck / brief-ack never arrived
-
-**Symptom**: a dispatched assignment shows `running` indefinitely, and `bclaw_assignment_events` shows `run_running` but no further progress.
-
-**Why**: the spawned worker process either (a) crashed before reading its inbox, (b) read the inbox but couldn't acknowledge (e.g., MCP unavailable inside the spawned sandbox — common with codex `--sandbox workspace-write`), or (c) is genuinely still working but slow.
-
-**Diagnostic order**:
-
-```bash
-# 1. Is the worker process still alive?
-ps -ef | grep <agent-binary> # codex, claude, copilot, …
-# Windows: Get-Process -Id <pid> # or `tasklist /FI "PID eq <pid>"`
-
-# 2. Did the brief-ack file land?
-ls .brainclaw/coordination/runtime/ack/<assignment_id>.ack
-# If yes → spawn started, worker is somewhere in its loop
-# If no → spawn never started or died before the wrap shell ran touch
-
-# 3. (pln#504) What did the worker actually say? stdout/stderr capture
-# Spawned workers now route their streams to per-assignment log files. If the
-# worker died silently, the error usually shows up here.
-cat .brainclaw/coordination/runtime/log/<assignment_id>.stdout.log
-cat .brainclaw/coordination/runtime/log/<assignment_id>.stderr.log
-
-# 4. Inspect the worktree for activity
-git -C <worktree> log --oneline -5
-git -C <worktree> status
-
-# 5. Check the run log
-brainclaw inbox list --agent <agent>
-# or via MCP: bclaw_assignment_events(assignmentId="<id>")
-```
-
-**Fix paths**:
-- Worker dead, no ack → reroute via `bclaw_coordinate(intent="reroute", …)` to another agent
-- Worker dead, ack present, work uncommitted → manual harvest (see "Dispatched worker finished without committing" above)
-- Worker still alive but slow → wait, or `kill` and reroute
-
-**Brief-ack TTL** is configurable via `BRAINCLAW_HANDSHAKE_TIMEOUT_MS` (default 30s since pln#475+#476). Past that, the dispatcher times the spawn out and surfaces the failure in the assignment events log.
-
----
-
-## See also
-
-- [`docs/concepts/dispatch-lifecycle.md`](dispatch-lifecycle.md) — the entity model + FSMs + observability decision tree underlying every diagnostic step on this page
-- [`docs/concepts/memory-staleness.md`](memory-staleness.md) — staleness signals and resolve flow in depth
-- [`docs/concepts/loop-engine.md`](loop-engine.md) — multi-turn loops (review-fix), recovery semantics for in-flight loops
-- [`docs/concepts/upgrade-cli.md`](upgrade-cli.md) — `brainclaw upgrade` design + rollback path
-- [`docs/cli.md`](../cli.md) — full command reference for `doctor`, `stale`, `claim`, `upgrade`, `inbox`, `worktree`
-- [`docs/concepts/multi-agent-workflows.md`](multi-agent-workflows.md) — happy-path coordination patterns (the inverse of this page)
+# Troubleshooting & recovery
+
+Runbook for the most common ways a brainclaw workspace gets into a degraded state during multi-agent coordination, and how to bring it back. Symptoms first, causes second, remediation third — pattern-matchable when you don't have time to read the whole page.
+
+This is **operator-facing**: it assumes you can run CLI commands. Agents you orchestrate don't read this page; you do, when something stalls.
+
+## Quick-reference cheatsheet
+
+| Symptom | First-line check | First-line fix |
+|---|---|---|
+| Agent crashed, claim still active | `brainclaw claim list` | `brainclaw claim release <id>` (or `brainclaw stale resolve <id>`) |
+| Plan stuck `in_progress` for days | `brainclaw stale list` | `brainclaw stale resolve <plan_id>` (transitions to `dropped`) |
+| Dispatched worker finished without committing | `git -C <worktree> status` | manually `git add` + `git commit` in the worktree, then merge |
+| `Cannot find module 'mcp-worker.js'` | `brainclaw doctor` | `brainclaw doctor --repair` |
+| Octopus merge fails on parallel lanes | `git status` | merge lanes one-by-one, resolve conflicts, then proceed |
+| `.brainclaw/` schema looks corrupt | `brainclaw doctor --after-migration` | `brainclaw upgrade --rollback` (restores last backup) |
+| Inbox messages stuck / not delivered | `brainclaw inbox list` | `brainclaw inbox ack <id>` or check `bclaw_assignment_events` |
+| `bclaw_work` returns 25k-token error | n/a | already mitigated since v1.0.14 (compact mode default); pass `compact: true` if older clients |
+| Stale runtime notes flood `bclaw_context` | `brainclaw stale list` | `brainclaw stale resolve <id>` per noisy item |
+
+If your symptom isn't here, jump to the relevant section below or run `brainclaw doctor --json` and inspect the `checks` array.
+
+---
+
+## Stale claims after a crashed agent
+
+**Symptom**: an agent died (credit limit, terminal closed, network drop). Other agents see the scope as held and refuse to claim it.
+
+**Why**: claims are advisory locks with a TTL, but expiry is not enforced by a daemon — it surfaces only when something queries it. So a crashed agent's claim stays "active" until someone runs a check.
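Reviewer sketch of the lazy-expiry behaviour described in the **Why** paragraph. The `Claim` shape, its field names, and both helper functions are hypothetical illustrations; brainclaw's real staleness scoring lives in `src/core/staleness.ts`, which this diff does not show. The point is only that, without a background sweeper, an expired claim is detected at query time and not before.

```typescript
// Hypothetical claim record; field names are assumptions for illustration only.
interface Claim {
  id: string;
  scope: string;
  acquiredAtMs: number; // epoch milliseconds when the claim was taken
  ttlMs: number;        // advisory time-to-live
}

// Expiry is computed on demand: nothing runs in the background to flip a flag.
function isExpired(claim: Claim, nowMs: number): boolean {
  return nowMs - claim.acquiredAtMs >= claim.ttlMs;
}

// A "stale list"-style query: expired claims only surface when someone asks.
function listStaleClaims(claims: Claim[], nowMs: number): Claim[] {
  return claims.filter((c) => isExpired(c, nowMs));
}

const claims: Claim[] = [
  { id: "clm_a", scope: "src/foo.ts", acquiredAtMs: 0, ttlMs: 60000 },
  { id: "clm_b", scope: "src/bar.ts", acquiredAtMs: 50000, ttlMs: 60000 },
];

// At t = 90 s the first claim has outlived its TTL, the second has not.
const staleIds = listStaleClaims(claims, 90000).map((c) => c.id);
console.log(staleIds);
```

Until some caller runs the equivalent of `listStaleClaims`, both claims look equally "active", which matches the symptom above.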
+
+**Fix**:
+
+```bash
+# See what's stale (uses the staleness scoring from src/core/staleness.ts)
+brainclaw stale list
+
+# Release a specific stale claim
+brainclaw claim release <claim_id>
+
+# Or, for any stale entity (plan, handoff, candidate, runtime_note, claim),
+# trigger the canonical action:
+brainclaw stale resolve <id>
+```
+
+`stale resolve` dispatches to the right transition per entity:
+- claim → release
+- plan → `bclaw_transition(entity="plan", to="dropped")`
+- handoff → `bclaw_transition(entity="handoff", to="closed")`
+- candidate → `bclaw_transition(entity="candidate", to="rejected")`
+- trap → `bclaw_transition(entity="trap", to="resolved")`
+- runtime_note → `bclaw_remove(entity="runtime_note", id=…)`
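The per-entity dispatch listed above can be pictured as a lookup table. This is a minimal sketch, not the shipped implementation: the `Resolution` type, the `RESOLUTIONS` table, and `describeResolution` are hypothetical names, and the function only formats the call that the list says would be made (it does not invoke any brainclaw API).

```typescript
type StaleEntity = "claim" | "plan" | "handoff" | "candidate" | "trap" | "runtime_note";

// Hypothetical description of what `stale resolve` does for each entity kind.
type Resolution =
  | { action: "release" }
  | { action: "transition"; to: string }
  | { action: "remove" };

// Target states are copied from the list above; the table itself is illustrative.
const RESOLUTIONS: Record<StaleEntity, Resolution> = {
  claim: { action: "release" },
  plan: { action: "transition", to: "dropped" },
  handoff: { action: "transition", to: "closed" },
  candidate: { action: "transition", to: "rejected" },
  trap: { action: "transition", to: "resolved" },
  runtime_note: { action: "remove" },
};

// Render the canonical action for one stale entity as a readable string.
function describeResolution(entity: StaleEntity, id: string): string {
  const r = RESOLUTIONS[entity];
  switch (r.action) {
    case "release":
      return `release claim ${id}`;
    case "transition":
      return `bclaw_transition(entity="${entity}", id="${id}", to="${r.to}")`;
    case "remove":
      return `bclaw_remove(entity="${entity}", id="${id}")`;
  }
}

console.log(describeResolution("plan", "pln_1"));
console.log(describeResolution("runtime_note", "rn_7"));
```

Keeping the mapping in one table makes it easy to see that every stale entity kind has exactly one default resolution.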
+
+**Prevention**: agents that respect the protocol call `bclaw_session_end(auto_release: true)` on exit, which releases all their claims. This is the recommended default in every dispatch brief.
+
+---
+
+## `bclaw_coordinate` refused with `dirty_working_tree`
+
+**Symptom**: an `assign` / `review` / `reroute` dispatch returns
+`dirty_working_tree` instead of spawning.
+
+**Why**: the worker spawns from a worktree branched at HEAD, so uncommitted
+edits in the source repo are invisible to it. The guard (trp#371) is
+scope-aware — it refuses only when the uncommitted files **overlap**, or
+cannot be proven disjoint from, the dispatch `scope`. `.brainclaw/` and
+`.git/` are always ignored, and `consult` / `ideate` / `summarize` are never
+guarded (they spawn no worktree). A scope that is not a resolvable file path
+(a plan-id, loop-ref, or prose) cannot be proven disjoint, so the guard stays
+conservative and refuses while the tree is dirty.
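Reviewer sketch of the guard rules in the **Why** paragraph, assuming repo-relative paths with forward slashes. `checkDirtyTree`, `GuardResult`, and the convention that `scopePaths === null` means "the scope could not be resolved to file paths" are all hypothetical; the actual guard (trp#371) may resolve scopes and match paths differently.

```typescript
type Intent = "assign" | "review" | "reroute" | "consult" | "ideate" | "summarize";

// Intents that spawn no worktree are never guarded (per the paragraph above).
const UNGUARDED: Intent[] = ["consult", "ideate", "summarize"];
// Paths under these prefixes are always ignored by the guard.
const IGNORED_PREFIXES = [".brainclaw/", ".git/"];

interface GuardResult {
  refuse: boolean;
  overlapping: string[]; // dirty files that block the dispatch
}

// scopePaths === null models a scope that is not a resolvable file path
// (a plan-id, loop-ref, or prose). That encoding is an assumption of this sketch.
function checkDirtyTree(intent: Intent, dirtyFiles: string[], scopePaths: string[] | null): GuardResult {
  if (UNGUARDED.indexOf(intent) !== -1) return { refuse: false, overlapping: [] };

  const relevant = dirtyFiles.filter((f) => !IGNORED_PREFIXES.some((p) => f.startsWith(p)));
  if (relevant.length === 0) return { refuse: false, overlapping: [] };

  // Cannot prove disjointness, so stay conservative and refuse.
  if (scopePaths === null) return { refuse: true, overlapping: relevant };

  // A dirty file overlaps when it is a scope path or lives under a scope directory.
  const overlapping = relevant.filter((f) =>
    scopePaths.some((s) => {
      const dir = s.replace(/\/$/, "");
      return f === dir || f.startsWith(dir + "/");
    })
  );
  return { refuse: overlapping.length > 0, overlapping };
}

console.log(checkDirtyTree("assign", ["src/foo.ts", ".brainclaw/x.json"], ["src/bar.ts"]));
console.log(checkDirtyTree("review", ["README.md"], null));
```

The first call is allowed because the only relevant dirty file is outside the scope; the second is refused because an unresolvable scope cannot be proven disjoint from a dirty tree.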
+
+**Fixes**:
+
+- Commit or stash the overlapping files, then re-dispatch (cleanest).
+- Pass `allow_dirty: true` to proceed anyway — the block becomes a warning
+that lists the overlapping files.
+- Pass a resolvable file `scope` (e.g. `src/foo.ts`) so the guard can prove the
+dirty files are out of scope.
+- Pass `ref: <commit|branch|tag>` to build the worktree from an explicit ref —
+uncommitted working-tree changes are then intentionally out of scope.
+
+---
+
+## Dispatched worker finished work but never committed
+
+**Symptom**: a sequence's lane shows the worker as "task_complete" in the run log, but `git -C <worktree-path> status` shows uncommitted changes.
+
+**Why**: some agents (notably codex when running in `--sandbox workspace-write`) sometimes finish editing without ever creating a git commit — they exit on `task_complete` from the prompt without the wrap-up step. The brief-ack file confirms the spawn *started*, not that it *committed*. See `trp#178`.
+
+**Fix** (manual harvest):
+
+```bash
+# 1. Locate the worktree
+git worktree list | grep feat/pln_<lane_id>
+
+# 2. cd into it, inspect the work
+cd ~/.brainclaw/worktrees/<project-hash>/feat_pln_xxxx
+git status
+git diff --stat
+
+# 3. Stage + commit with a clear message that references the plan id
+git add <files>
+git commit -m "feat(<scope>): <summary> (pln#<id>)"
+
+# 4. Back on master, octopus-merge as usual
+cd <main repo>
+git merge --no-ff feat/pln_xxxx -m "merge: <description>"
+```
+
+**Prevention**: every dispatch brief targeting agents prone to this pattern (notably codex) should include explicit commit instructions at the end, e.g. *"When done editing, stage your changes and create a commit with a clear message referencing the plan id (e.g. `feat(scope): summary (pln#XXX)`). Do not stop until the commit exists."*
+
+---
+
+## MCP runtime corrupted (mcp-worker.js missing)
+
+**Symptom**: `MCP error -32603: Cannot find module 'mcp-worker.js'` or the server logs `MCP runtime corrupted (mcp-worker.js missing)` on startup.
+
+**Why**: `dist/` was wiped or partially deleted. Common causes: a `git merge` that triggered worktree cleanup before pln#477 landed, an `npm run clean:dist` followed by an interrupted build, or filesystem-level corruption.
+
+**Fix**:
+
+```bash
+brainclaw doctor --repair
+```
+
+This rebuilds `dist/` from `src/` (TypeScript compile + copy default profiles) and validates by running `node dist/cli.js --version`. The repair also writes `dist/.brainclaw-build.json` so subsequent runs can do a stale-check (compare `src_hash` vs `dist_hash`).
+
+**If `--repair` fails**: it usually means `node_modules` is also damaged. Run a clean `npm install` first, then re-run `brainclaw doctor --repair`.
+
+**Note**: read-only MCP handlers stay available in-process even when the worker is missing (since pln#478) — so basic `bclaw_context` and `bclaw_find` calls still respond, but anything requiring the worker (most write operations) returns `runtime_corrupted` with a repair pointer.
+
+---
+
+## Octopus merge fails on parallel lanes
+
+**Symptom**: after a sequenced parallel dispatch finishes, you run `git merge --no-ff lane1 lane2 lane3 -m "merge: …"` and git refuses with conflict markers.
+
+**Why**: octopus merges only succeed when the lanes touch disjoint files. If two lanes wrote to the same file, octopus aborts and you must merge them sequentially.
+
+**Fix**:
+
+```bash
+# Cancel the failed octopus
+git merge --abort
+
+# Merge lanes one at a time, resolving conflicts as needed
+git merge --no-ff lane1
+# (resolve any conflicts, commit)
+git merge --no-ff lane2
+# (resolve any conflicts, commit)
+git merge --no-ff lane3
+```
+
+**Prevention**: when defining a sequence, choose lane scopes that minimize file overlap. Use `hard_after` dependencies for lanes that genuinely need to land in order. The dispatcher does not itself enforce disjoint scopes — that's the caller's responsibility when designing the sequence.
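Since the dispatcher does not enforce disjoint scopes, a caller could run a pre-check before attempting the octopus merge. The sketch below is a reviewer illustration, not part of brainclaw: `findLaneOverlaps` is a hypothetical helper, and it assumes the per-lane file lists were gathered beforehand, for example with `git diff --name-only <base>...<lane>`. It is deliberately conservative, flagging any file touched by more than one lane whether or not the edits would actually conflict.

```typescript
// Given lane name -> changed files, return file -> lanes for every file
// touched by two or more lanes. An empty result means the lanes are disjoint.
function findLaneOverlaps(lanes: Record<string, string[]>): Record<string, string[]> {
  const owners: Record<string, string[]> = {};

  Object.keys(lanes).forEach((lane) => {
    lanes[lane].forEach((file) => {
      const list = owners[file] || (owners[file] = []);
      // Skip duplicates so a file listed twice by the same lane is not a false overlap.
      if (list.indexOf(lane) === -1) list.push(lane);
    });
  });

  const overlaps: Record<string, string[]> = {};
  Object.keys(owners).forEach((file) => {
    if (owners[file].length > 1) overlaps[file] = owners[file];
  });
  return overlaps;
}

const overlaps = findLaneOverlaps({
  lane1: ["src/a.ts", "src/b.ts"],
  lane2: ["src/b.ts"],
  lane3: ["src/c.ts", "src/c.ts"],
});
console.log(overlaps);
```

If the result is non-empty, merging the listed lanes one at a time (as in the fix above) avoids a failed octopus attempt.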

---

## `.brainclaw/` looks corrupted (schema drift, malformed JSON)

**Symptom**: `bclaw_doctor` reports `state is invalid: <ZodError>` or files in `.brainclaw/memory/` fail to parse.

**Why**: usually a half-written file from an interrupted write (process killed mid-write), a migration that didn't complete, or a manual edit that introduced syntax errors. `brainclaw upgrade --rollback` exists precisely for this case.

**Fix**:

```bash
# 1. Inspect what's wrong
brainclaw doctor --after-migration

# 2. If the most recent migration is the cause, roll back
brainclaw upgrade --rollback
# This restores the last backup at <store>.bak-<iso-ts>/ and parks the
# current corrupted store at <store>.rollback-<iso-ts>/ for inspection.

# 3. If a single file is corrupted (and rollback is too aggressive),
# inspect the parked rollback dir and copy individual files back manually.
```

**Prevention**: brainclaw takes a backup before every `upgrade` run (see `docs/concepts/upgrade-cli.md`). For non-upgrade scenarios, rely on git: `.brainclaw/` is git-versioned by default, so `git log` and `git checkout <prev>` recover any committed state.
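
For the git-based recovery path, the sketch below shows one way to find an earlier version of a single store file and restore only that file. It relies on standard `git log` and `git checkout <commit> -- <path>`; the helper names and the example path are hypothetical, and it assumes the file was committed in a good state at some point.

```bash
# Hypothetical helpers built on standard git commands.

# List the commits that touched a path, newest first, so you can pick a
# known-good one.
history_for() {
  git log --oneline -- "$1"
}

# Restore a single path from the chosen commit into the index and working
# tree, leaving every other file untouched.
restore_from() {
  local commit="$1" target="$2"
  git checkout "$commit" -- "$target"
}

# Example (the path is illustrative only):
# history_for .brainclaw/memory/example.json
# restore_from <commit> .brainclaw/memory/example.json
```

Restoring one path this way is narrower than `brainclaw upgrade --rollback`, which swaps the whole store. Run `brainclaw doctor` afterwards to confirm the state validates again.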

---

## Plan stuck `in_progress`

**Symptom**: a plan has been marked `in_progress` for days with no commits or claim activity.

**Why**: the agent that started it crashed, was rerouted, or simply forgot to transition to `done` / `blocked` / `dropped`.

**Fix**:

```bash
# Survey
brainclaw stale list # plan_in_progress flagged after 7 days by default

# Decide based on context
brainclaw stale resolve <plan_id> # → dropped (default for stale)
# or, via canonical grammar, transition to a different terminal state:
# bclaw_transition(entity="plan", id="<plan_id>", to="done")
# bclaw_transition(entity="plan", id="<plan_id>", to="blocked")
```

**Threshold tuning**: defaults live in `src/core/staleness.ts`. A config-driven override is on the roadmap (open follow-up); for now, adjust the source file if 7 days is too aggressive for your project.

---

## Inbox messages stuck / brief-ack never arrived

**Symptom**: a dispatched assignment shows `running` indefinitely, and `bclaw_assignment_events` shows `run_running` but no further progress.

**Why**: the spawned worker process either (a) crashed before reading its inbox, (b) read the inbox but couldn't acknowledge (e.g., MCP unavailable inside the spawned sandbox, which is common with codex `--sandbox workspace-write`), or (c) is genuinely still working but slow.

**Diagnostic order**:

```bash
# 1. Is the worker process still alive?
ps -ef | grep <agent-binary> # codex, claude, copilot, …
# Windows: Get-Process -Id <pid> # or `tasklist /FI "PID eq <pid>"`

# 2. Did the brief-ack file land?
ls .brainclaw/coordination/runtime/ack/<assignment_id>.ack
# If yes → spawn started, worker is somewhere in its loop
# If no → spawn never started or died before the wrap shell ran touch

# 3. (pln#504) What did the worker actually say? stdout/stderr capture
# Spawned workers now route their streams to per-assignment log files. If the
# worker died silently, the error usually shows up here.
cat .brainclaw/coordination/runtime/log/<assignment_id>.stdout.log
cat .brainclaw/coordination/runtime/log/<assignment_id>.stderr.log

# 4. Inspect the worktree for activity
git -C <worktree> log --oneline -5
git -C <worktree> status

# 5. Check the run log
brainclaw inbox list --agent <agent>
# or via MCP: bclaw_assignment_events(assignmentId="<id>")
```

**Fix paths**:
- Worker dead, no ack → reroute via `bclaw_coordinate(intent="reroute", …)` to another agent
- Worker dead, ack present, work uncommitted → manual harvest (see "Dispatched worker finished without committing" above)
- Worker still alive but slow → wait, or `kill` and reroute
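
The first two diagnostic checks and the fix paths above can be combined into one small triage function. This is a minimal sketch for POSIX shells, not a brainclaw command: `triage_assignment` and its messages are hypothetical, and only the ack path comes from this page.

```bash
# Hypothetical triage helper mirroring the fix paths above. It checks only
# process liveness and the brief-ack file, nothing else.
# Usage: triage_assignment <assignment_id> <pid> [repo_root]
triage_assignment() {
  local id="$1" pid="$2" root="${3:-.}"
  local ack="$root/.brainclaw/coordination/runtime/ack/$id.ack"

  # kill -0 sends no signal; it only tests whether the PID exists and can
  # be signalled by the current user.
  if kill -0 "$pid" 2>/dev/null; then
    echo "alive: wait, or kill and reroute"
  elif [ -e "$ack" ]; then
    echo "dead, ack present: inspect the worktree, harvest manually if work is uncommitted"
  else
    echo "dead, no ack: reroute to another agent"
  fi
}
```

The function only suggests a path. The stdout/stderr logs and the worktree state from steps 3 and 4 still decide whether there is anything worth harvesting.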

**Brief-ack TTL** is configurable via `BRAINCLAW_HANDSHAKE_TIMEOUT_MS` (default 30s since pln#475+#476). Past that, the dispatcher times the spawn out and surfaces the failure in the assignment events log.
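
If slow-starting workers routinely miss the 30 s window, one option is to raise the timeout in the shell that launches the dispatch. A minimal sketch, assuming the dispatcher reads the variable from its environment when it spawns; the 60 s value is arbitrary and purely illustrative.

```bash
# Assumption: the dispatcher picks this up from its environment at spawn time.
# The value is in milliseconds: 60000 ms = 60 s.
export BRAINCLAW_HANDSHAKE_TIMEOUT_MS=60000
```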

---

## See also

- [`docs/concepts/dispatch-lifecycle.md`](dispatch-lifecycle.md): the entity model + FSMs + observability decision tree underlying every diagnostic step on this page
- [`docs/concepts/memory-staleness.md`](memory-staleness.md): staleness signals and resolve flow in depth
- [`docs/concepts/loop-engine.md`](loop-engine.md): multi-turn loops (review-fix), recovery semantics for in-flight loops
- [`docs/concepts/upgrade-cli.md`](upgrade-cli.md): `brainclaw upgrade` design + rollback path
- [`docs/cli.md`](../cli.md): full command reference for `doctor`, `stale`, `claim`, `upgrade`, `inbox`, `worktree`
- [`docs/concepts/multi-agent-workflows.md`](multi-agent-workflows.md): happy-path coordination patterns (the inverse of this page)