brainclaw 1.9.0 → 1.9.1
- package/README.md +585 -499
- package/dist/brainclaw-vscode.vsix +0 -0
- package/dist/commands/harvest.js +1 -1
- package/dist/commands/hooks.js +73 -73
- package/dist/commands/init.js +1 -1
- package/dist/commands/install-hooks.js +78 -78
- package/dist/commands/mcp-read-handlers.js +57 -14
- package/dist/commands/mcp.js +79 -13
- package/dist/commands/switch.js +26 -5
- package/dist/commands/version.js +1 -1
- package/dist/core/agent-capability.js +19 -4
- package/dist/core/agent-files.js +119 -119
- package/dist/core/codev-prompts.js +38 -38
- package/dist/core/default-profiles/doctor.yaml +11 -11
- package/dist/core/default-profiles/janitor.yaml +11 -11
- package/dist/core/default-profiles/onboarder.yaml +11 -11
- package/dist/core/default-profiles/reviewer.yaml +13 -13
- package/dist/core/dispatcher.js +1 -1
- package/dist/core/entity-operations.js +29 -3
- package/dist/core/execution.js +1 -1
- package/dist/core/loops/verbs.js +0 -1
- package/dist/core/messaging.js +2 -2
- package/dist/core/protocol-skills.js +164 -164
- package/dist/core/runtime-signals.js +1 -1
- package/dist/core/search.js +19 -2
- package/dist/core/security-guard.js +207 -207
- package/dist/core/spawn-check.js +16 -2
- package/dist/core/staleness.js +1 -1
- package/dist/core/store-resolution.js +26 -7
- package/dist/core/worktree.js +18 -18
- package/dist/facts.js +3 -3
- package/dist/facts.json +2 -2
- package/docs/PROTOCOL.md +1 -1
- package/docs/adapters/openclaw.md +43 -43
- package/docs/architecture/project-refs.md +328 -328
- package/docs/cli.md +2093 -2093
- package/docs/concepts/coordination.md +52 -52
- package/docs/concepts/coordinator-runbook.md +129 -129
- package/docs/concepts/dispatch-lifecycle.md +245 -245
- package/docs/concepts/event-log-store.md +928 -928
- package/docs/concepts/ideation-loop.md +317 -317
- package/docs/concepts/loop-engine.md +520 -511
- package/docs/concepts/mcp-governance.md +268 -268
- package/docs/concepts/memory.md +84 -84
- package/docs/concepts/multi-agent-workflows.md +167 -167
- package/docs/concepts/observer-protocol.md +361 -361
- package/docs/concepts/plans-and-claims.md +217 -217
- package/docs/concepts/project-md-convention.md +35 -35
- package/docs/concepts/runtime-notes.md +38 -38
- package/docs/concepts/troubleshooting.md +254 -254
- package/docs/concepts/workspace-bootstrapping.md +142 -142
- package/docs/context-format-changelog.md +35 -35
- package/docs/context-format.md +48 -48
- package/docs/index.md +65 -65
- package/docs/integrations/agents.md +158 -158
- package/docs/integrations/claude-code.md +23 -23
- package/docs/integrations/cline.md +77 -77
- package/docs/integrations/continue.md +55 -55
- package/docs/integrations/copilot.md +68 -68
- package/docs/integrations/cursor.md +23 -23
- package/docs/integrations/kilocode.md +72 -72
- package/docs/integrations/mcp.md +377 -377
- package/docs/integrations/mistral-vibe.md +122 -122
- package/docs/integrations/openclaw.md +92 -92
- package/docs/integrations/opencode.md +84 -84
- package/docs/integrations/overview.md +115 -115
- package/docs/integrations/roo.md +71 -71
- package/docs/integrations/windsurf.md +77 -77
- package/docs/mcp-schema-changelog.md +360 -356
- package/docs/playbooks/integration/index.md +121 -121
- package/docs/playbooks/orchestration.md +37 -0
- package/docs/playbooks/productivity/index.md +99 -99
- package/docs/playbooks/team/index.md +117 -117
- package/docs/product/agent-first-model.md +184 -184
- package/docs/product/entity-model-audit.md +462 -462
- package/docs/product/positioning.md +86 -86
- package/docs/quickstart-existing-project.md +107 -107
- package/docs/quickstart.md +183 -183
- package/docs/release-maintenance.md +79 -79
- package/docs/reputation.md +52 -52
- package/docs/review.md +45 -45
- package/docs/security.md +212 -212
- package/docs/server-operations.md +118 -118
- package/docs/storage.md +106 -106
- package/package.json +80 -65
- package/docs/concepts/event-log-store-critique-A.md +0 -333
- package/docs/concepts/event-log-store-critique-B.md +0 -353
- package/docs/concepts/event-log-store-phase0-measurements.md +0 -58
- package/docs/concepts/event-log-store-proposal-A.md +0 -365
- package/docs/concepts/event-log-store-proposal-B.md +0 -404
- package/docs/concepts/identity-model-proposal.md +0 -371
|
@@ -1,245 +1,245 @@
|
|
|
1
|
-
# Dispatch lifecycle
|
|
2
|
-
|
|
3
|
-
When brainclaw routes work to another agent — `bclaw_coordinate(intent="assign"|"review"|"consult")`, `bclaw_dispatch(intent="execute")`, or a multi-turn `bclaw_loop` — it spins up **up to six related entities** plus an on-disk **brief-ack sentinel** and (since pln#504) **per-assignment stdout/stderr log files**. Knowing what each one means lets you tell at a glance whether a dispatch is alive, dead, or merely slow.
|
|
4
|
-
|
|
5
|
-
This doc is the consolidated reference. It complements:
|
|
6
|
-
- [multi-agent-workflows.md](multi-agent-workflows.md) — happy-path coordination patterns
|
|
7
|
-
- [troubleshooting.md](troubleshooting.md) — symptom-driven diagnostic playbooks
|
|
8
|
-
- [loop-engine.md](loop-engine.md) — multi-turn loop protocol details
|
|
9
|
-
- [../integrations/codex.md](../integrations/codex.md), [../integrations/claude-code.md](../integrations/claude-code.md), etc. — per-agent spawn semantics
|
|
10
|
-
|
|
11
|
-
---
|
|
12
|
-
|
|
13
|
-
## The six entities
|
|
14
|
-
|
|
15
|
-
A single `bclaw_coordinate(intent="review", open_loop=true, targetAgents=[codex])` creates:
|
|
16
|
-
|
|
17
|
-
```
|
|
18
|
-
┌─────────────────┐
|
|
19
|
-
│ candidate │ cnd_… (review payload)
|
|
20
|
-
└────────┬────────┘
|
|
21
|
-
│ references
|
|
22
|
-
┌───────────────────┼──────────────────┐
|
|
23
|
-
▼ ▼ ▼
|
|
24
|
-
┌──────────┐ ┌─────────────┐ ┌──────────┐
|
|
25
|
-
│ loop │ ◄────►│ assignment │ │ message │
|
|
26
|
-
│ lop_… │ │ asgn_… │ │ msg_… │
|
|
27
|
-
└──────────┘ └──────┬──────┘ └──────────┘
|
|
28
|
-
│
|
|
29
|
-
│ owned-by
|
|
30
|
-
▼
|
|
31
|
-
┌──────────────┐
|
|
32
|
-
│ claim │ clm_… (worktree lock)
|
|
33
|
-
└──────┬───────┘
|
|
34
|
-
│ triggers
|
|
35
|
-
▼
|
|
36
|
-
┌──────────────┐
|
|
37
|
-
│ agent_run │ run_… (the OS-level spawn)
|
|
38
|
-
└──────┬───────┘
|
|
39
|
-
│
|
|
40
|
-
┌───────────────────┼─────────────────┐
|
|
41
|
-
▼ ▼ ▼
|
|
42
|
-
┌──────────┐ ┌─────────────┐ ┌────────────┐
|
|
43
|
-
│ ack file │ │ stdout log │ │ stderr log │
|
|
44
|
-
│ .ack │ │ .stdout.log │ │ .stderr.log│
|
|
45
|
-
└──────────┘ └─────────────┘ └────────────┘
|
|
46
|
-
(pln#476) (pln#504) (pln#504)
|
|
47
|
-
```
|
|
48
|
-
|
|
49
|
-
| Entity | Prefix | Created by | Owner | Purpose |
|
|
50
|
-
|---|---|---|---|---|
|
|
51
|
-
| `candidate` | `cnd_` | the coordinate facade (review/ideate) | the dispatcher agent | Review payload that the loop references. Stays after the loop closes. |
|
|
52
|
-
| `loop` | `lop_` | `bclaw_coordinate(open_loop=true)` or `bclaw_loop(intent="open")` | the dispatcher | Multi-turn thread of structured work. Has its own FSM. |
|
|
53
|
-
| `assignment` | `asgn_` | dispatcher when targeting an agent | the **target** agent | Lifecycle event for that agent's turn. The only entity whose FSM tracks the WORKER's progress. |
|
|
54
|
-
| `message` | `msg_` | dispatcher | the dispatcher | The brief delivered to the target's inbox. |
|
|
55
|
-
| `claim` | `clm_` | dispatcher (or `bclaw_claim` directly) | the target agent | Worktree advisory lock. Released when the work is done or the agent gives up. |
|
|
56
|
-
| `agent_run` | `run_` | the CLI execution adapter, only when an OS-level spawn actually happens | the target agent | OS-level subprocess record. Status FSM tracks the LIFETIME of the process — but only the parts brainclaw can observe (see [§Liveness limits](#liveness-limits) below). |
|
|
57
|
-
|
|
58
|
-
Plus two filesystem-only artefacts created by the worker shell wrapper:
|
|
59
|
-
|
|
60
|
-
- **Brief-ack sentinel**: `.brainclaw/coordination/runtime/ack/<assignment_id>.ack` — touched by the spawn wrapper BEFORE the agent binary runs (pln#476). Proves the spawn shell got far enough to execute `touch`. Does NOT prove the agent binary itself succeeded.
|
|
61
|
-
- **stdout/stderr logs** (pln#504): `.brainclaw/coordination/runtime/log/<assignment_id>.{stdout,stderr}.log` — opened by the parent before the spawn, the child inherits dup'd fds and writes its streams there. This is the only window onto what a sandboxed worker actually said before dying.
|
|
62
|
-
|
|
63
|
-
---
|
|
64
|
-
|
|
65
|
-
## FSM cheatsheet
|
|
66
|
-
|
|
67
|
-
### `loop.status`
|
|
68
|
-
|
|
69
|
-
```
|
|
70
|
-
open ──▶ paused ──▶ open (pause / resume)
|
|
71
|
-
│
|
|
72
|
-
├──▶ completed (stop_condition met)
|
|
73
|
-
├──▶ cancelled (manual close — use when the loop dies abnormally)
|
|
74
|
-
└──▶ blocked (external blocker; intent to resume later)
|
|
75
|
-
```
|
|
76
|
-
|
|
77
|
-
`bclaw_loop(intent="close")` accepts **only** `completed | cancelled | blocked` as `status`. **Not `failed`** — map crashed/dead loops to `cancelled` with a `reason`.
|
|
78
|
-
|
|
79
|
-
### `assignment.status`
|
|
80
|
-
|
|
81
|
-
```
|
|
82
|
-
created ──▶ offered ──▶ accepted ──▶ started ──▶ completed
|
|
83
|
-
│ │ │ │
|
|
84
|
-
│ │ │ └──▶ failed (worker self-reported)
|
|
85
|
-
│ │ │ └──▶ blocked (worker needs supervisor)
|
|
86
|
-
│ │ │ └──▶ cancelled (rerouted away)
|
|
87
|
-
│ │ └──▶ acceptance_ttl expired (default 15min) → cancelled
|
|
88
|
-
│ └──▶ heartbeat_ttl expired (default 30min while running) → cancelled
|
|
89
|
-
└──▶ removed by `bclaw_assignment_admin` (rare)
|
|
90
|
-
```
|
|
91
|
-
|
|
92
|
-
Transitions past `offered` require the assigned agent itself (or `bclaw_assignment_admin`). A coordinator that doesn't own the assignment **cannot** update it, even when it created it; the canonical rejection is `Agent X cannot update assignment owned by Y`.
|
|
93
|
-
|
|
94
|
-
### `agent_run.status`
|
|
95
|
-
|
|
96
|
-
```
|
|
97
|
-
launching ──▶ running ──▶ completed
|
|
98
|
-
│ ──▶ failed (non-zero exit, worker reported)
|
|
99
|
-
│ ──▶ interrupted (TTL/heartbeat expiry, see below)
|
|
100
|
-
│
|
|
101
|
-
└──▶ failed (spawn returned no pid, brief-ack timeout)
|
|
102
|
-
```
|
|
103
|
-
|
|
104
|
-
**Liveness limits** {#liveness-limits}: `last_event_at` is bumped only when the worker writes a lifecycle event (via MCP or via the wrap shell). A worker that crashes before its first output keeps `status=running` and `last_event_at == launched_at` until reconciled. Since pln#503 phase 3.2, **any read of `agent_run` via `bclaw_find` / `bclaw_get` triggers a lazy reconciliation pass**: open runs past the 60s grace window get their pid checked, and dead workers transition to `failed` (`status_reason='silent_termination_no_evidence'`) once past the 30min stale threshold.
|
|
105
|
-
|
|
106
|
-
For a single consolidated check (run + assignment + claim + loop + pid + log tails + verdict in one response), use **`bclaw_dispatch_status(target_id)`** (pln#503 phase 3.1).
|
|
107
|
-
|
|
108
|
-
### `claim.status`
|
|
109
|
-
|
|
110
|
-
```
|
|
111
|
-
active ──▶ released
|
|
112
|
-
│
|
|
113
|
-
└──▶ adopted (another session inherited the claim, e.g. reconnect)
|
|
114
|
-
```
|
|
115
|
-
|
|
116
|
-
Releasing a claim does NOT cancel its assignment / agent_run / loop — those are independent entities. You generally need to clean up all of them together when aborting a dispatch.
|
|
117
|
-
|
|
118
|
-
---
|
|
119
|
-
|
|
120
|
-
## Observability decision tree
|
|
121
|
-
|
|
122
|
-
You called `bclaw_coordinate(intent="review", open_loop=true, …)` and got back `execution_status: "delivered_and_started"`. What does that actually mean?
|
|
123
|
-
|
|
124
|
-
**Fast path** (recommended since pln#503 phase 3.1): call `bclaw_dispatch_status(target_id="<asgn_…>")` and read its `diagnosis.health` + `diagnosis.recommended_next_action`. The tool consolidates the steps below into a single response — entity fan-out, pid liveness, log tails, verdict, recommended next action.
|
|
125
|
-
|
|
126
|
-
**Long path** (for understanding or when the tool isn't available):
|
|
127
|
-
|
|
128
|
-
```
|
|
129
|
-
1. execution_status = "delivered_and_started"
|
|
130
|
-
├──▶ Means: the spawn wrapper touched the brief-ack sentinel
|
|
131
|
-
└──▶ Does NOT mean: the worker is doing useful work
|
|
132
|
-
|
|
133
|
-
2. Verify the spawn is alive — check the agent_run record
|
|
134
|
-
bclaw_find(entity="agent_run", filter={assignment_id: "<asgn>"})
|
|
135
|
-
├──▶ status="running" AND pid alive on OS AND last_event_at < 5min ago → healthy
|
|
136
|
-
├──▶ status="running" AND pid alive AND last_event_at == launched_at → stalled (worker never produced output)
|
|
137
|
-
├──▶ status="running" AND pid dead → silently died (see logs)
|
|
138
|
-
└──▶ status="completed" / "failed" / "interrupted" → terminal, read status_reason
|
|
139
|
-
|
|
140
|
-
3. If silent, read the logs (pln#504)
|
|
141
|
-
cat .brainclaw/coordination/runtime/log/<asgn>.stderr.log
|
|
142
|
-
cat .brainclaw/coordination/runtime/log/<asgn>.stdout.log
|
|
143
|
-
├──▶ Contains an error → root cause found
|
|
144
|
-
└──▶ Empty → worker died before any write OR launched without log capture (legacy path)
|
|
145
|
-
|
|
146
|
-
4. If the worker is alive but doing nothing useful for 15+ min
|
|
147
|
-
→ most likely sandbox / MCP / capability mismatch with the brief
|
|
148
|
-
→ see ../integrations/<agent>.md "Caveats" for per-agent gotchas
|
|
149
|
-
```
|
|
150
|
-
|
|
151
|
-
---
|
|
152
|
-
|
|
153
|
-
## Worktree-as-contract harvest
|
|
154
|
-
|
|
155
|
-
Some dispatched workers cannot self-commit or call MCP. For example, a sandboxed Codex run may have `dispatchCanCommit=false` because its writable root is the linked worktree, while `.git` lives outside that root. In that case the worker contract is intentionally small:
|
|
156
|
-
|
|
157
|
-
1. Edit files inside the dispatched worktree.
|
|
158
|
-
2. Write `LANE-RESULT.json` at the worktree root.
|
|
159
|
-
|
|
160
|
-
The worker does not need to commit, call `bclaw_assignment_update`, or release the claim itself. The worktree is the contract.
|
|
161
|
-
|
|
162
|
-
When the coordinator runs `brainclaw harvest <assignment_id> --integrate`, brainclaw reads the worker's `LANE-RESULT.json`, commits the linked worktree diff on the worker's behalf onto the lane branch, then completes the assignment and releases the claim, including the normal plan-status cascade.
|
|
163
|
-
|
|
164
|
-
The on-behalf commit is guarded by the linked-worktree check (`isLinkedWorktree`): integration only targets the worktree associated with the assignment, never the main repository. This keeps sandboxed-worker harvesting from turning into an accidental main-repo commit path.
|
|
165
|
-
|
|
166
|
-
Integration is strictly additive and opt-in. Plain `brainclaw harvest <assignment_id>` remains report-only; it reads and reports the lane result without committing or mutating assignment / claim state. The on-behalf commit and lifecycle completion happen only when the coordinator passes `--integrate`.
|
|
167
|
-
|
|
168
|
-
---
|
|
169
|
-
|
|
170
|
-
## Diagnostic playbook
|
|
171
|
-
|
|
172
|
-
When a dispatch hangs, work top-down through these checks. For the symptom-driven variant see [troubleshooting.md#inbox-messages-stuck--brief-ack-never-arrived](troubleshooting.md#inbox-messages-stuck--brief-ack-never-arrived).
|
|
173
|
-
|
|
174
|
-
### Quick triage (≤5s)
|
|
175
|
-
|
|
176
|
-
```bash
|
|
177
|
-
# Single call covers process liveness + ack + log tails + entity state + verdict
|
|
178
|
-
bclaw_dispatch_status(target_id="<asgn>") # or clm_/lop_/run_
|
|
179
|
-
```
|
|
180
|
-
|
|
181
|
-
Read `diagnosis.health` (`healthy` | `stalled` | `silent_death` | `terminal` | `not_dispatched` | `unknown`) and `diagnosis.recommended_next_action` — usually that's all you need.
|
|
182
|
-
|
|
183
|
-
### Manual triage (≤30s — when `bclaw_dispatch_status` isn't available)
|
|
184
|
-
|
|
185
|
-
```bash
|
|
186
|
-
# 1. Is the OS-level process alive?
|
|
187
|
-
Get-Process -Id <pid> # Windows
|
|
188
|
-
ps -p <pid> # POSIX
|
|
189
|
-
|
|
190
|
-
# 2. Did the spawn wrapper actually run?
|
|
191
|
-
ls .brainclaw/coordination/runtime/ack/<asgn>.ack
|
|
192
|
-
|
|
193
|
-
# 3. What did the worker say? (pln#504)
|
|
194
|
-
cat .brainclaw/coordination/runtime/log/<asgn>.stderr.log
|
|
195
|
-
cat .brainclaw/coordination/runtime/log/<asgn>.stdout.log
|
|
196
|
-
```
|
|
197
|
-
|
|
198
|
-
### Deeper (1-5min)
|
|
199
|
-
|
|
200
|
-
```bash
|
|
201
|
-
# Full entity state — same fan-out bclaw_dispatch_status does for you
|
|
202
|
-
bclaw_get(entity="assignment", id="<asgn>") # owner, ttls, status_reason
|
|
203
|
-
bclaw_get(entity="agent_run", id="<run>") # pid, started_at, last_event_at
|
|
204
|
-
bclaw_get(entity="claim", id="<clm>") # worktree, agent
|
|
205
|
-
bclaw_get(entity="loop", id="<lop>") # current_phase, slot states
|
|
206
|
-
|
|
207
|
-
# Worktree activity
|
|
208
|
-
git -C <worktree> log --oneline -5 # any new commits?
|
|
209
|
-
git -C <worktree> status # uncommitted work?
|
|
210
|
-
ls <worktree>/REVIEW_FINDINGS.md # for review loops
|
|
211
|
-
```
|
|
212
|
-
|
|
213
|
-
### Abort a dispatch cleanly
|
|
214
|
-
|
|
215
|
-
A dead dispatch needs four cleanup steps (no single facade does all of them today):
|
|
216
|
-
|
|
217
|
-
```text
|
|
218
|
-
1. Stop-Process -Id <pid> # if pid still alive (POSIX: kill <pid>)
|
|
219
|
-
2. bclaw_loop(intent="close", loop_id="<lop>", status="cancelled", reason="...")
|
|
220
|
-
3. bclaw_release_claim(id="<clm>")
|
|
221
|
-
4. (optional) bclaw_assignment_admin or leave assignment as `offered`
|
|
222
|
-
— only the owning agent can transition assignment.status, and a
|
|
223
|
-
released claim already makes it effectively orphaned
|
|
224
|
-
```
|
|
225
|
-
|
|
226
|
-
---
|
|
227
|
-
|
|
228
|
-
## Per-agent spawn semantics
|
|
229
|
-
|
|
230
|
-
Spawn behaviour varies by agent. The capability profile in `src/core/agent-capability.ts` describes each agent's prompt delivery, sandbox model, and MCP availability. Per-agent caveats:
|
|
231
|
-
|
|
232
|
-
- [codex.md](../integrations/codex.md#caveats) — `--sandbox workspace-write` required; spawned codex may not have brainclaw MCP wired; stdin_pipe prompt delivery; brief-ack required for headless dispatch detection.
|
|
233
|
-
- [claude-code.md](../integrations/claude-code.md) — interactive vs `-p` headless modes; tools whitelist.
|
|
234
|
-
- [copilot.md](../integrations/copilot.md), [windsurf.md](../integrations/windsurf.md), [cline.md](../integrations/cline.md), [opencode.md](../integrations/opencode.md), [roo.md](../integrations/roo.md), [kilocode.md](../integrations/kilocode.md), [continue.md](../integrations/continue.md) — per-agent specifics.
|
|
235
|
-
- [mistral-vibe.md](../integrations/mistral-vibe.md) — EU/GDPR self-hosted option.
|
|
236
|
-
|
|
237
|
-
---
|
|
238
|
-
|
|
239
|
-
## See also
|
|
240
|
-
|
|
241
|
-
- [troubleshooting.md](troubleshooting.md) — symptom-driven diagnostic playbooks
|
|
242
|
-
- [loop-engine.md](loop-engine.md) — multi-turn loop protocol, locks, advance gates
|
|
243
|
-
- [multi-agent-workflows.md](multi-agent-workflows.md) — high-level coordination scenarios
|
|
244
|
-
- [../integrations/overview.md](../integrations/overview.md) — index of supported agents
|
|
245
|
-
- [../integrations/mcp.md](../integrations/mcp.md) — full MCP tool catalog
|
|
1
|
+
# Dispatch lifecycle
|
|
2
|
+
|
|
3
|
+
When brainclaw routes work to another agent — `bclaw_coordinate(intent="assign"|"review"|"consult")`, `bclaw_dispatch(intent="execute")`, or a multi-turn `bclaw_loop` — it spins up **up to six related entities** plus an on-disk **brief-ack sentinel** and (since pln#504) **per-assignment stdout/stderr log files**. Knowing what each one means lets you tell at a glance whether a dispatch is alive, dead, or merely slow.
|
|
4
|
+
|
|
5
|
+
This doc is the consolidated reference. It complements:
|
|
6
|
+
- [multi-agent-workflows.md](multi-agent-workflows.md) — happy-path coordination patterns
|
|
7
|
+
- [troubleshooting.md](troubleshooting.md) — symptom-driven diagnostic playbooks
|
|
8
|
+
- [loop-engine.md](loop-engine.md) — multi-turn loop protocol details
|
|
9
|
+
- [../integrations/codex.md](../integrations/codex.md), [../integrations/claude-code.md](../integrations/claude-code.md), etc. — per-agent spawn semantics
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## The six entities
|
|
14
|
+
|
|
15
|
+
A single `bclaw_coordinate(intent="review", open_loop=true, targetAgents=[codex])` creates:
|
|
16
|
+
|
|
17
|
+
```
|
|
18
|
+
┌─────────────────┐
|
|
19
|
+
│ candidate │ cnd_… (review payload)
|
|
20
|
+
└────────┬────────┘
|
|
21
|
+
│ references
|
|
22
|
+
┌───────────────────┼──────────────────┐
|
|
23
|
+
▼ ▼ ▼
|
|
24
|
+
┌──────────┐ ┌─────────────┐ ┌──────────┐
|
|
25
|
+
│ loop │ ◄────►│ assignment │ │ message │
|
|
26
|
+
│ lop_… │ │ asgn_… │ │ msg_… │
|
|
27
|
+
└──────────┘ └──────┬──────┘ └──────────┘
|
|
28
|
+
│
|
|
29
|
+
│ owned-by
|
|
30
|
+
▼
|
|
31
|
+
┌──────────────┐
|
|
32
|
+
│ claim │ clm_… (worktree lock)
|
|
33
|
+
└──────┬───────┘
|
|
34
|
+
│ triggers
|
|
35
|
+
▼
|
|
36
|
+
┌──────────────┐
|
|
37
|
+
│ agent_run │ run_… (the OS-level spawn)
|
|
38
|
+
└──────┬───────┘
|
|
39
|
+
│
|
|
40
|
+
┌───────────────────┼─────────────────┐
|
|
41
|
+
▼ ▼ ▼
|
|
42
|
+
┌──────────┐ ┌─────────────┐ ┌────────────┐
|
|
43
|
+
│ ack file │ │ stdout log │ │ stderr log │
|
|
44
|
+
│ .ack │ │ .stdout.log │ │ .stderr.log│
|
|
45
|
+
└──────────┘ └─────────────┘ └────────────┘
|
|
46
|
+
(pln#476) (pln#504) (pln#504)
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
| Entity | Prefix | Created by | Owner | Purpose |
|
|
50
|
+
|---|---|---|---|---|
|
|
51
|
+
| `candidate` | `cnd_` | the coordinate facade (review/ideate) | the dispatcher agent | Review payload that the loop references. Stays after the loop closes. |
|
|
52
|
+
| `loop` | `lop_` | `bclaw_coordinate(open_loop=true)` or `bclaw_loop(intent="open")` | the dispatcher | Multi-turn thread of structured work. Has its own FSM. |
|
|
53
|
+
| `assignment` | `asgn_` | dispatcher when targeting an agent | the **target** agent | Lifecycle event for that agent's turn. The only entity whose FSM tracks the WORKER's progress. |
|
|
54
|
+
| `message` | `msg_` | dispatcher | the dispatcher | The brief delivered to the target's inbox. |
|
|
55
|
+
| `claim` | `clm_` | dispatcher (or `bclaw_claim` directly) | the target agent | Worktree advisory lock. Released when the work is done or the agent gives up. |
|
|
56
|
+
| `agent_run` | `run_` | the CLI execution adapter, only when an OS-level spawn actually happens | the target agent | OS-level subprocess record. Status FSM tracks the LIFETIME of the process — but only the parts brainclaw can observe (see [§Liveness limits](#liveness-limits) below). |
|
|
57
|
+
|
|
58
|
+
Plus two filesystem-only artefacts created by the worker shell wrapper:
|
|
59
|
+
|
|
60
|
+
- **Brief-ack sentinel**: `.brainclaw/coordination/runtime/ack/<assignment_id>.ack` — touched by the spawn wrapper BEFORE the agent binary runs (pln#476). Proves the spawn shell got far enough to execute `touch`. Does NOT prove the agent binary itself succeeded.
|
|
61
|
+
- **stdout/stderr logs** (pln#504): `.brainclaw/coordination/runtime/log/<assignment_id>.{stdout,stderr}.log` — opened by the parent before the spawn, the child inherits dup'd fds and writes its streams there. This is the only window onto what a sandboxed worker actually said before dying.
|
|
62
|
+
|
|
63
|
+
---
|
|
64
|
+
|
|
65
|
+
## FSM cheatsheet
|
|
66
|
+
|
|
67
|
+
### `loop.status`
|
|
68
|
+
|
|
69
|
+
```
|
|
70
|
+
open ──▶ paused ──▶ open (pause / resume)
|
|
71
|
+
│
|
|
72
|
+
├──▶ completed (stop_condition met)
|
|
73
|
+
├──▶ cancelled (manual close — use when the loop dies abnormally)
|
|
74
|
+
└──▶ blocked (external blocker; intent to resume later)
|
|
75
|
+
```
|
|
76
|
+
|
|
77
|
+
`bclaw_loop(intent="close")` accepts **only** `completed | cancelled | blocked` as `status`. **Not `failed`** — map crashed/dead loops to `cancelled` with a `reason`.
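
The mapping rule above can be sketched as a small helper. This is an illustrative sketch only: `closeArgs`, `Outcome`, and the default reason string are hypothetical names, and only the `intent`, `loop_id`, `status`, and `reason` parameters come from this document.

```typescript
// Statuses that bclaw_loop(intent="close") accepts, per the paragraph above.
type CloseStatus = "completed" | "cancelled" | "blocked";

// Hypothetical caller-side outcome; "failed" is not accepted by the tool.
type Outcome = CloseStatus | "failed";

interface CloseArgs {
  intent: "close";
  loop_id: string;
  status: CloseStatus;
  reason?: string;
}

// Build close arguments, mapping a crashed/dead loop to "cancelled" with a reason.
function closeArgs(loopId: string, outcome: Outcome, reason?: string): CloseArgs {
  const status: CloseStatus = outcome === "failed" ? "cancelled" : outcome;
  const args: CloseArgs = { intent: "close", loop_id: loopId, status };
  if (outcome === "failed") {
    // Assumed default text; any caller-supplied reason takes precedence.
    args.reason = reason ?? "loop died abnormally";
  } else if (reason !== undefined) {
    args.reason = reason;
  }
  return args;
}
```

Keeping the mapping in one place means a caller never sends `status="failed"` by accident, and a dead loop always carries a `reason`.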
|
|
78
|
+
|
|
79
|
+
### `assignment.status`
|
|
80
|
+
|
|
81
|
+
```
|
|
82
|
+
created ──▶ offered ──▶ accepted ──▶ started ──▶ completed
|
|
83
|
+
│ │ │ │
|
|
84
|
+
│ │ │ └──▶ failed (worker self-reported)
|
|
85
|
+
│ │ │ └──▶ blocked (worker needs supervisor)
|
|
86
|
+
│ │ │ └──▶ cancelled (rerouted away)
|
|
87
|
+
│ │ └──▶ acceptance_ttl expired (default 15min) → cancelled
|
|
88
|
+
│ └──▶ heartbeat_ttl expired (default 30min while running) → cancelled
|
|
89
|
+
└──▶ removed by `bclaw_assignment_admin` (rare)
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
Transitions past `offered` require the assigned agent itself (or `bclaw_assignment_admin`). A coordinator that doesn't own the assignment **cannot** update it, even when it created it; the canonical rejection is `Agent X cannot update assignment owned by Y`.
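
The ownership rule can be pictured as a guard that runs before any status change. The sketch below is hypothetical: the `Assignment` shape, `checkAssignmentUpdate`, and the `viaAdmin` flag are assumptions made for illustration, not brainclaw's actual schema or code. Only the rejection message format is taken from the text above.

```typescript
// Hypothetical shape; field names are assumptions, not brainclaw's schema.
interface Assignment {
  id: string;
  owner: string; // the target agent, per the entity table
  status: string;
}

type UpdateCheck = { ok: true } | { ok: false; error: string };

// Only the owning (assigned) agent may update; the admin path bypasses the check.
function checkAssignmentUpdate(
  actor: string,
  assignment: Assignment,
  viaAdmin: boolean = false,
): UpdateCheck {
  if (viaAdmin || actor === assignment.owner) {
    return { ok: true };
  }
  return {
    ok: false,
    error: `Agent ${actor} cannot update assignment owned by ${assignment.owner}`,
  };
}
```

For example, `checkAssignmentUpdate("coordinator", { id: "asgn_1", owner: "codex", status: "offered" })` yields the rejection, while the same call with `viaAdmin` set to `true` passes.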
|
|
93
|
+
|
|
94
|
+
### `agent_run.status`
|
|
95
|
+
|
|
96
|
+
```
|
|
97
|
+
launching ──▶ running ──▶ completed
|
|
98
|
+
│ ──▶ failed (non-zero exit, worker reported)
|
|
99
|
+
│ ──▶ interrupted (TTL/heartbeat expiry, see below)
|
|
100
|
+
│
|
|
101
|
+
└──▶ failed (spawn returned no pid, brief-ack timeout)
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
**Liveness limits** {#liveness-limits}: `last_event_at` is bumped only when the worker writes a lifecycle event (via MCP or via the wrap shell). A worker that crashes before its first output keeps `status=running` and `last_event_at == launched_at` until reconciled. Since pln#503 phase 3.2, **any read of `agent_run` via `bclaw_find` / `bclaw_get` triggers a lazy reconciliation pass**: open runs past the 60s grace window get their pid checked, and dead workers transition to `failed` (`status_reason='silent_termination_no_evidence'`) once past the 30min stale threshold.
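
A minimal sketch of the lazy reconciliation rule, under stated assumptions: the field names, the `reconcile` function, treating `launching` and `running` as "open", and measuring staleness from `lastEventAt` are all guesses made for illustration. The 60s grace window, the 30min stale threshold, and the `status_reason` value come from the paragraph above. The pid check is injected as a boolean so the logic stays pure.

```typescript
type RunStatus = "launching" | "running" | "completed" | "failed" | "interrupted";

// Hypothetical record shape; timestamps are epoch milliseconds.
interface AgentRun {
  status: RunStatus;
  launchedAt: number;
  lastEventAt: number;
  statusReason?: string;
}

const GRACE_MS = 60 * 1000;      // 60s grace window
const STALE_MS = 30 * 60 * 1000; // 30min stale threshold

// Returns the run unchanged unless it is open, past grace, dead, and stale.
function reconcile(run: AgentRun, now: number, pidAlive: boolean): AgentRun {
  const open = run.status === "launching" || run.status === "running";
  if (!open) return run;                               // terminal runs are never touched
  if (now - run.launchedAt <= GRACE_MS) return run;    // too young to judge
  if (pidAlive) return run;                            // process still exists
  if (now - run.lastEventAt <= STALE_MS) return run;   // dead, but not yet stale
  return { ...run, status: "failed", statusReason: "silent_termination_no_evidence" };
}
```

Under these assumptions, a worker that crashed right after launch stays `running` on reads during the first 30 minutes and flips to `failed` on the first read after that.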
|
|
105
|
+
|
|
106
|
+
For a single consolidated check (run + assignment + claim + loop + pid + log tails + verdict in one response), use **`bclaw_dispatch_status(target_id)`** (pln#503 phase 3.1).
|
|
107
|
+
|
|
108
|
+
### `claim.status`
|
|
109
|
+
|
|
110
|
+
```
|
|
111
|
+
active ──▶ released
|
|
112
|
+
│
|
|
113
|
+
└──▶ adopted (another session inherited the claim, e.g. reconnect)
|
|
114
|
+
```
|
|
115
|
+
|
|
116
|
+
Releasing a claim does NOT cancel its assignment / agent_run / loop — those are independent entities. You generally need to clean up all of them together when aborting a dispatch.
|
|
117
|
+
|
|
118
|
+
---
|
|
119
|
+
|
|
120
|
+
## Observability decision tree
|
|
121
|
+
|
|
122
|
+
You called `bclaw_coordinate(intent="review", open_loop=true, …)` and got back `execution_status: "delivered_and_started"`. What does that actually mean?
|
|
123
|
+
|
|
124
|
+
**Fast path** (recommended since pln#503 phase 3.1): call `bclaw_dispatch_status(target_id="<asgn_…>")` and read its `diagnosis.health` + `diagnosis.recommended_next_action`. The tool consolidates the steps below into a single response — entity fan-out, pid liveness, log tails, verdict, recommended next action.
|
|
125
|
+
|
|
126
|
+
**Long path** (for understanding or when the tool isn't available):
|
|
127
|
+
|
|
128
|
+
```
|
|
129
|
+
1. execution_status = "delivered_and_started"
|
|
130
|
+
├──▶ Means: the spawn wrapper touched the brief-ack sentinel
|
|
131
|
+
└──▶ Does NOT mean: the worker is doing useful work
|
|
132
|
+
|
|
133
|
+
2. Verify the spawn is alive — check the agent_run record
|
|
134
|
+
bclaw_find(entity="agent_run", filter={assignment_id: "<asgn>"})
|
|
135
|
+
├──▶ status="running" AND pid alive on OS AND last_event_at < 5min ago → healthy
|
|
136
|
+
├──▶ status="running" AND pid alive AND last_event_at == launched_at → stalled (worker never produced output)
|
|
137
|
+
├──▶ status="running" AND pid dead → silently died (see logs)
|
|
138
|
+
└──▶ status="completed" / "failed" / "interrupted" → terminal, read status_reason
|
|
139
|
+
|
|
140
|
+
3. If silent, read the logs (pln#504)
|
|
141
|
+
cat .brainclaw/coordination/runtime/log/<asgn>.stderr.log
|
|
142
|
+
cat .brainclaw/coordination/runtime/log/<asgn>.stdout.log
|
|
143
|
+
├──▶ Contains an error → root cause found
|
|
144
|
+
└──▶ Empty → worker died before any write OR launched without log capture (legacy path)
|
|
145
|
+
|
|
146
|
+
4. If the worker is alive but doing nothing useful for 15+ min
|
|
147
|
+
→ most likely sandbox / MCP / capability mismatch with the brief
|
|
148
|
+
→ see ../integrations/<agent>.md "Caveats" for per-agent gotchas
|
|
149
|
+
```
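
Step 2 of the long path can be written down as a small classifier. This is a sketch, not the logic inside `bclaw_dispatch_status`: the `RunSnapshot` shape and `classify` are hypothetical, and mapping the tree's branches onto the `diagnosis.health` labels is an assumption. Branches are checked in the same order as the tree, so a freshly launched run with no events counts as healthy for its first five minutes.

```typescript
type Health = "healthy" | "stalled" | "silent_death" | "terminal" | "unknown";

// Hypothetical snapshot; timestamps are epoch milliseconds.
interface RunSnapshot {
  status: string;
  pidAlive: boolean;
  launchedAt: number;
  lastEventAt: number;
}

const FRESH_MS = 5 * 60 * 1000; // "last_event_at < 5min ago"

function classify(run: RunSnapshot, now: number): Health {
  if (run.status === "completed" || run.status === "failed" || run.status === "interrupted") {
    return "terminal"; // read status_reason next
  }
  if (run.status !== "running") return "unknown";
  if (!run.pidAlive) return "silent_death";                // see the logs
  if (now - run.lastEventAt < FRESH_MS) return "healthy";
  if (run.lastEventAt === run.launchedAt) return "stalled"; // never produced output
  return "unknown"; // alive with old events: falls to step 4 of the tree
}
```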
|
|
150
|
+
|
|
151
|
+
---
|
|
152
|
+
|
|
153
|
+
## Worktree-as-contract harvest
|
|
154
|
+
|
|
155
|
+
Some dispatched workers cannot self-commit or call MCP. For example, a sandboxed Codex run may have `dispatchCanCommit=false` because its writable root is the linked worktree, while `.git` lives outside that root. In that case the worker contract is intentionally small:
|
|
156
|
+
|
|
157
|
+
1. Edit files inside the dispatched worktree.
|
|
158
|
+
2. Write `LANE-RESULT.json` at the worktree root.
|
|
159
|
+
|
|
160
|
+
The worker does not need to commit, call `bclaw_assignment_update`, or release the claim itself. The worktree is the contract.
|
|
161
|
+
|
|
162
|
+
When the coordinator runs `brainclaw harvest <assignment_id> --integrate`, brainclaw reads the worker's `LANE-RESULT.json`, commits the linked worktree diff on the worker's behalf onto the lane branch, then completes the assignment and releases the claim, including the normal plan-status cascade.
|
|
163
|
+
|
|
164
|
+
The on-behalf commit is guarded by the linked-worktree check (`isLinkedWorktree`): integration only targets the worktree associated with the assignment, never the main repository. This keeps sandboxed-worker harvesting from turning into an accidental main-repo commit path.
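
One way to picture such a guard, which is not necessarily how `isLinkedWorktree` is implemented: in Git's main working tree, the git directory and the common git directory are the same path, while a linked worktree has its own git directory under `<common>/worktrees/<name>`. The helper names below are hypothetical, and both inputs are assumed to be absolute paths, for example as reported by `git rev-parse --git-dir` and `git rev-parse --git-common-dir`.

```typescript
// Drop trailing path separators so "/repo/.git/" and "/repo/.git" compare equal.
function stripTrailingSeparators(p: string): string {
  return p.length > 1 ? p.replace(/[\\/]+$/, "") : p;
}

// True when the git dir differs from the common dir, i.e. a linked worktree.
// Assumes both arguments are absolute paths for the same checkout.
function looksLikeLinkedWorktree(gitDir: string, commonDir: string): boolean {
  return stripTrailingSeparators(gitDir) !== stripTrailingSeparators(commonDir);
}

// Hypothetical guard: refuse an on-behalf commit outside a linked worktree.
function assertIntegrationTarget(gitDir: string, commonDir: string): void {
  if (!looksLikeLinkedWorktree(gitDir, commonDir)) {
    throw new Error("refusing to integrate: target is the main repository");
  }
}
```

Failing closed here matches the intent stated above: a harvest pointed at the main repository is rejected instead of committed.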
|
|
165
|
+
|
|
166
|
+
Integration is strictly additive and opt-in. Plain `brainclaw harvest <assignment_id>` remains report-only; it reads and reports the lane result without committing or mutating assignment / claim state. The on-behalf commit and lifecycle completion happen only when the coordinator passes `--integrate`.
|
|
167
|
+
|
|
168
|
+
---
|
|
169
|
+
|
|
170
|
+
## Diagnostic playbook
|
|
171
|
+
|
|
172
|
+
When a dispatch hangs, work top-down through these checks. For the symptom-driven variant see [troubleshooting.md#inbox-messages-stuck--brief-ack-never-arrived](troubleshooting.md#inbox-messages-stuck--brief-ack-never-arrived).
|
|
173
|
+
|
|
174
|
+
### Quick triage (≤5s)
|
|
175
|
+
|
|
176
|
+
```bash
|
|
177
|
+
# Single call covers process liveness + ack + log tails + entity state + verdict
|
|
178
|
+
bclaw_dispatch_status(target_id="<asgn>") # or clm_/lop_/run_
|
|
179
|
+
```
|
|
180
|
+
|
|
181
|
+
Read `diagnosis.health` (`healthy` | `stalled` | `silent_death` | `terminal` | `not_dispatched` | `unknown`) and `diagnosis.recommended_next_action` — usually that's all you need.
|
|
182
|
+
|
|
183
|
+
### Manual triage (≤30s — when `bclaw_dispatch_status` isn't available)
|
|
184
|
+
|
|
185
|
+
```bash
|
|
186
|
+
# 1. Is the OS-level process alive?
|
|
187
|
+
Get-Process -Id <pid> # Windows
|
|
188
|
+
ps -p <pid> # POSIX
|
|
189
|
+
|
|
190
|
+
# 2. Did the spawn wrapper actually run?
|
|
191
|
+
ls .brainclaw/coordination/runtime/ack/<asgn>.ack
|
|
192
|
+
|
|
193
|
+
# 3. What did the worker say? (pln#504)
|
|
194
|
+
cat .brainclaw/coordination/runtime/log/<asgn>.stderr.log
|
|
195
|
+
cat .brainclaw/coordination/runtime/log/<asgn>.stdout.log
|
|
196
|
+
```
|
|
197
|
+
|
|
198
|
+
### Deeper (1-5min)
|
|
199
|
+
|
|
200
|
+
```bash
|
|
201
|
+
# Full entity state — same fan-out bclaw_dispatch_status does for you
|
|
202
|
+
bclaw_get(entity="assignment", id="<asgn>") # owner, ttls, status_reason
|
|
203
|
+
bclaw_get(entity="agent_run", id="<run>") # pid, started_at, last_event_at
|
|
204
|
+
bclaw_get(entity="claim", id="<clm>") # worktree, agent
|
|
205
|
+
bclaw_get(entity="loop", id="<lop>") # current_phase, slot states
|
|
206
|
+
|
|
207
|
+
# Worktree activity
|
|
208
|
+
git -C <worktree> log --oneline -5 # any new commits?
|
|
209
|
+
git -C <worktree> status # uncommitted work?
|
|
210
|
+
ls <worktree>/REVIEW_FINDINGS.md # for review loops
|
|
211
|
+
```
|
|
212
|
+
|
|
213
|
+
### Abort a dispatch cleanly
|
|
214
|
+
|
|
215
|
+
A dead dispatch needs four cleanup steps (no single facade does all of them today):
|
|
216
|
+
|
|
217
|
+
```text
|
|
218
|
+
1. Stop-Process -Id <pid> # if pid still alive (POSIX: kill <pid>)
|
|
219
|
+
2. bclaw_loop(intent="close", loop_id="<lop>", status="cancelled", reason="...")
|
|
220
|
+
3. bclaw_release_claim(id="<clm>")
|
|
221
|
+
4. (optional) bclaw_assignment_admin or leave assignment as `offered`
|
|
222
|
+
— only the owning agent can transition assignment.status, and a
|
|
223
|
+
released claim already makes it effectively orphaned
|
|
224
|
+
```
|
|
225
|
+
|
|
226
|
+
---
|
|
227
|
+
|
|
228
|
+
## Per-agent spawn semantics
|
|
229
|
+
|
|
230
|
+
Spawn behaviour varies by agent. The capability profile in `src/core/agent-capability.ts` describes each agent's prompt delivery, sandbox model, and MCP availability. Per-agent caveats:
|
|
231
|
+
|
|
232
|
+
- [codex.md](../integrations/codex.md#caveats) — `--sandbox workspace-write` required; spawned codex may not have brainclaw MCP wired; stdin_pipe prompt delivery; brief-ack required for headless dispatch detection.
|
|
233
|
+
- [claude-code.md](../integrations/claude-code.md) — interactive vs `-p` headless modes; tools whitelist.
|
|
234
|
+
- [copilot.md](../integrations/copilot.md), [windsurf.md](../integrations/windsurf.md), [cline.md](../integrations/cline.md), [opencode.md](../integrations/opencode.md), [roo.md](../integrations/roo.md), [kilocode.md](../integrations/kilocode.md), [continue.md](../integrations/continue.md) — per-agent specifics.
|
|
235
|
+
- [mistral-vibe.md](../integrations/mistral-vibe.md) — EU/GDPR self-hosted option.
|
|
236
|
+
|
|
237
|
+
---
|
|
238
|
+
|
|
239
|
+
## See also
|
|
240
|
+
|
|
241
|
+
- [troubleshooting.md](troubleshooting.md) — symptom-driven diagnostic playbooks
|
|
242
|
+
- [loop-engine.md](loop-engine.md) — multi-turn loop protocol, locks, advance gates
|
|
243
|
+
- [multi-agent-workflows.md](multi-agent-workflows.md) — high-level coordination scenarios
|
|
244
|
+
- [../integrations/overview.md](../integrations/overview.md) — index of supported agents
|
|
245
|
+
- [../integrations/mcp.md](../integrations/mcp.md) — full MCP tool catalog
|