brainclaw 1.7.5 → 1.9.0

Files changed (143)
  1. package/README.md +28 -11
  2. package/dist/brainclaw-vscode.vsix +0 -0
  3. package/dist/cli.js +139 -13
  4. package/dist/commands/add-step.js +1 -1
  5. package/dist/commands/bootstrap.js +2 -26
  6. package/dist/commands/check-security-mcp.js +50 -33
  7. package/dist/commands/check-security.js +86 -43
  8. package/dist/commands/claim.js +22 -21
  9. package/dist/commands/confirm.js +26 -0
  10. package/dist/commands/context-diff.js +1 -1
  11. package/dist/commands/dispatch-watch.js +142 -0
  12. package/dist/commands/doctor.js +113 -2
  13. package/dist/commands/estimation-report.js +115 -16
  14. package/dist/commands/harvest.js +502 -16
  15. package/dist/commands/init.js +123 -21
  16. package/dist/commands/loops-handlers.js +4 -0
  17. package/dist/commands/mcp-read-handlers.js +198 -29
  18. package/dist/commands/mcp.js +615 -92
  19. package/dist/commands/memory.js +21 -17
  20. package/dist/commands/migrate.js +81 -17
  21. package/dist/commands/prune.js +78 -4
  22. package/dist/commands/reflect.js +26 -20
  23. package/dist/commands/register-agent.js +57 -1
  24. package/dist/commands/repair.js +20 -0
  25. package/dist/commands/session-end.js +15 -6
  26. package/dist/commands/session-start.js +18 -1
  27. package/dist/commands/setup-security.js +39 -18
  28. package/dist/commands/setup.js +26 -27
  29. package/dist/commands/stale.js +16 -2
  30. package/dist/commands/uninstall.js +126 -34
  31. package/dist/commands/update-step.js +6 -0
  32. package/dist/commands/worktree.js +60 -0
  33. package/dist/core/actions.js +12 -3
  34. package/dist/core/agent-capability.js +11 -13
  35. package/dist/core/agent-files.js +844 -547
  36. package/dist/core/agent-integrations.js +0 -3
  37. package/dist/core/agent-inventory.js +67 -0
  38. package/dist/core/agent-registry.js +163 -29
  39. package/dist/core/agentrun-reconciler.js +33 -2
  40. package/dist/core/agentruns.js +7 -1
  41. package/dist/core/ai-agent-detection.js +31 -44
  42. package/dist/core/archival.js +15 -9
  43. package/dist/core/assignment-reconciler.js +56 -0
  44. package/dist/core/assignment-sweeper.js +127 -4
  45. package/dist/core/assignments.js +69 -11
  46. package/dist/core/bootstrap.js +233 -67
  47. package/dist/core/brainclaw-version.js +22 -0
  48. package/dist/core/candidates.js +21 -1
  49. package/dist/core/claims.js +313 -150
  50. package/dist/core/config.js +6 -1
  51. package/dist/core/context-diff.js +148 -20
  52. package/dist/core/context.js +129 -8
  53. package/dist/core/coordination.js +22 -3
  54. package/dist/core/dispatch-status.js +109 -5
  55. package/dist/core/dispatcher.js +65 -11
  56. package/dist/core/entity-operations.js +45 -24
  57. package/dist/core/entity-registry.js +31 -5
  58. package/dist/core/event-log.js +138 -21
  59. package/dist/core/events/checkpoint.js +258 -0
  60. package/dist/core/events/genesis.js +220 -0
  61. package/dist/core/events/journal.js +507 -0
  62. package/dist/core/events/materialize.js +126 -0
  63. package/dist/core/events/registry-post-image.js +110 -0
  64. package/dist/core/events/verify.js +109 -0
  65. package/dist/core/execution-adapters.js +23 -0
  66. package/dist/core/execution.js +25 -0
  67. package/dist/core/facade-schema.js +48 -0
  68. package/dist/core/gc-semantic.js +130 -5
  69. package/dist/core/handoff-snapshot.js +68 -0
  70. package/dist/core/ids.js +19 -8
  71. package/dist/core/instruction-templates.js +34 -115
  72. package/dist/core/io.js +39 -3
  73. package/dist/core/json-store.js +10 -1
  74. package/dist/core/lock.js +153 -28
  75. package/dist/core/loops/bootstrap-acquire.js +25 -1
  76. package/dist/core/loops/facade-schema.js +2 -0
  77. package/dist/core/loops/hooks/survey-signals-baseline.js +36 -0
  78. package/dist/core/loops/index.js +1 -0
  79. package/dist/core/loops/presets/bootstrap.js +7 -0
  80. package/dist/core/loops/store.js +17 -0
  81. package/dist/core/loops/verbs.js +24 -1
  82. package/dist/core/markdown.js +8 -76
  83. package/dist/core/mcp-command-resolution.js +245 -0
  84. package/dist/core/memory-compactor.js +5 -3
  85. package/dist/core/memory-lifecycle.js +282 -0
  86. package/dist/core/merge-risk.js +150 -0
  87. package/dist/core/messaging.js +8 -1
  88. package/dist/core/migration.js +11 -1
  89. package/dist/core/observer-mode.js +26 -0
  90. package/dist/core/operations/memory-mutation.js +90 -65
  91. package/dist/core/operations/plan.js +27 -1
  92. package/dist/core/protocol-skills.js +210 -0
  93. package/dist/core/reflection-safety.js +6 -7
  94. package/dist/core/reputation.js +84 -2
  95. package/dist/core/runtime-signals.js +71 -9
  96. package/dist/core/runtime.js +84 -1
  97. package/dist/core/schema.js +125 -0
  98. package/dist/core/security-detectors.js +125 -0
  99. package/dist/core/security-extract.js +189 -0
  100. package/dist/core/security-guard.js +107 -29
  101. package/dist/core/security-packages.js +121 -0
  102. package/dist/core/security-scoring.js +76 -9
  103. package/dist/core/security.js +34 -2
  104. package/dist/core/sequence.js +11 -2
  105. package/dist/core/setup-flow.js +141 -13
  106. package/dist/core/spawn-check.js +110 -4
  107. package/dist/core/staleness.js +109 -1
  108. package/dist/core/state.js +250 -54
  109. package/dist/core/store-resolution.js +19 -5
  110. package/dist/core/worktree.js +169 -7
  111. package/dist/facts.js +8 -8
  112. package/dist/facts.json +7 -7
  113. package/docs/PROTOCOL.md +223 -0
  114. package/docs/cli.md +11 -10
  115. package/docs/concepts/coordinator-runbook.md +129 -0
  116. package/docs/concepts/dispatch-lifecycle.md +17 -0
  117. package/docs/concepts/event-log-store-critique-A.md +333 -0
  118. package/docs/concepts/event-log-store-critique-B.md +353 -0
  119. package/docs/concepts/event-log-store-phase0-measurements.md +58 -0
  120. package/docs/concepts/event-log-store-proposal-A.md +365 -0
  121. package/docs/concepts/event-log-store-proposal-B.md +404 -0
  122. package/docs/concepts/event-log-store.md +928 -0
  123. package/docs/concepts/identity-model-proposal.md +371 -0
  124. package/docs/concepts/memory.md +5 -4
  125. package/docs/concepts/observer-protocol.md +361 -0
  126. package/docs/concepts/parallel-merge-protocol.md +71 -0
  127. package/docs/concepts/plans-and-claims.md +43 -0
  128. package/docs/concepts/skills.md +78 -0
  129. package/docs/concepts/workspace-bootstrapping.md +61 -0
  130. package/docs/integrations/agents.md +4 -4
  131. package/docs/integrations/cline.md +10 -11
  132. package/docs/integrations/codex.md +2 -2
  133. package/docs/integrations/continue.md +5 -5
  134. package/docs/integrations/copilot.md +14 -12
  135. package/docs/integrations/openclaw.md +7 -6
  136. package/docs/integrations/overview.md +7 -7
  137. package/docs/integrations/roo.md +3 -3
  138. package/docs/integrations/windsurf.md +6 -6
  139. package/docs/mcp-schema-changelog.md +51 -20
  140. package/docs/quickstart.md +48 -47
  141. package/docs/security.md +174 -15
  142. package/docs/storage.md +4 -2
  143. package/package.json +8 -6
@@ -0,0 +1,333 @@
1
+ # Event-Log Store — Cross-Critique by Slot A (round 2)
2
+
3
+ > Ideation artifact for lop_3bf55b9492e0d96c (pln_2290bc70 / pln#543 step 1).
4
+ > Slot A attacking proposal B, the shared spine, and adjudicating divergences.
5
+ > Convergence between A and B is treated as suspect, not as validation.
6
+
7
+ ## 1. Attacks on Proposal B
8
+
9
+ ### 1.1 "Two mutating writers: impossible by construction" — FALSE, and B has no detection
10
+
11
+ B §5.3 claims two mutating writers are "impossible by construction" because the
12
+ store lock serializes them. The lock can still be **broken on presumed owner
13
+ death**, and presumed death is fallible:
14
+
15
+ - **Pid-liveness false negative**: on Windows, a permissions error or a
16
+ transient process-query failure can make a live owner look dead → breaker
17
+ takes the lock while the owner is mid-mutation. Two writers.
18
+ - **Pid reuse false positive** (inverse): dead owner's pid recycled to an
19
+ unrelated process → lock looks held forever → availability stall, and the
20
+ eventual manual/timeout break lands while operators are improvising.
21
+
22
+ B records `writer`/`pid` in the envelope but defines **no reader rule for a
23
+ duplicate seq**. Worse, B's `HEAD.json` makes the failure compound: two
24
+ writers both read `next_seq = N`, both append seq N, both rewrite `HEAD.json`
25
+ via temp+rename. Last rename wins, so if the loser had appended more records
26
+ its larger bump is lost and a **third** writer can reuse an already-issued seq. Seq uniqueness silently degrades with no anomaly
27
+ surfaced anywhere.
28
+
29
+ A's `(seq, writer-nonce)` identity + "duplicate seq from different writers =
30
+ detected anomaly, apply in file order, doctor warning" is the minimum viable
31
+ answer. The synthesis additionally needs a **writer-side tail validation**: on
32
+ lock acquisition, before first append, read the last record of the active
33
+ segment and assert `next_seq > last_seq_in_file`; if not, self-heal
34
+ `next_seq = last_seq_in_file + 1` and emit a `seq_repair` event. That closes
35
+ the HEAD-regression hole that neither proposal closes.
36
+
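The tail validation described above can be sketched as a pure function. This is a minimal illustration, not the package's implementation: `validateNextSeq`, `TailCheck`, and the fields of the `seq_repair` event are hypothetical names, and reading the last record of the active segment is assumed to happen elsewhere, under the lock.

```typescript
// Hypothetical event shape; the critique names only the `seq_repair` kind.
interface SeqRepairEvent {
  kind: "seq_repair";
  claimed_next_seq: number;
  last_seq_in_file: number;
  repaired_next_seq: number;
}

interface TailCheck {
  nextSeq: number;
  repair: SeqRepairEvent | null;
}

// claimedNextSeq comes from the meta/HEAD file; lastSeqInFile is the seq of the
// last record in the active segment, or null when the segment has no record yet.
function validateNextSeq(claimedNextSeq: number, lastSeqInFile: number | null): TailCheck {
  if (lastSeqInFile === null || claimedNextSeq > lastSeqInFile) {
    // Invariant holds: next_seq > last_seq_in_file.
    return { nextSeq: claimedNextSeq, repair: null };
  }
  // HEAD regressed: self-heal upward and report it so the caller can journal it.
  const repaired = lastSeqInFile + 1;
  return {
    nextSeq: repaired,
    repair: {
      kind: "seq_repair",
      claimed_next_seq: claimedNextSeq,
      last_seq_in_file: lastSeqInFile,
      repaired_next_seq: repaired,
    },
  };
}
```

A writer would call this once after acquiring the lock and append the returned `repair` event, if any, before its first real append.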
37
+ Note `writer` must be **pid + start-nonce** (A's shape), not agent name + pid
38
+ (B's shape): pid reuse makes bare pid an unreliable writer identity over a
39
+ journal's lifetime.
40
+
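A reader-side rule for duplicate seqs could look like the sketch below. It assumes each record carries `seq` and a `writer` string combining pid and start-nonce; the function name and the string format are illustrative only. Records are still applied in file order, and the function only reports which seqs should raise a doctor warning.

```typescript
interface SequencedRecord {
  seq: number;
  writer: string; // assumed format: "<pid>:<start-nonce>"
}

// Returns each seq that appears under more than one writer identity,
// in the order the conflict is first seen while scanning in file order.
function findDuplicateSeqs(records: SequencedRecord[]): number[] {
  const firstWriter: Record<number, string> = {};
  const anomalies: number[] = [];
  for (const rec of records) {
    const seen: string | undefined = firstWriter[rec.seq];
    if (seen === undefined) {
      firstWriter[rec.seq] = rec.writer;
    } else if (seen !== rec.writer && anomalies.indexOf(rec.seq) === -1) {
      anomalies.push(rec.seq);
    }
  }
  return anomalies;
}
```

With a bare pid as writer identity, a recycled pid would make two distinct writers look identical here, which is why the start-nonce matters.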
41
+ ### 1.2 Lockless observability appends race the segment roll — B's immutability claim is unenforced
42
+
43
+ B §5.1 lets observability events append **without the lock** (`seq: null`).
44
+ B §3.1 rolls segments by creating a new file and updating `HEAD.json` —
45
+ **never renaming** the old active segment. Combine them: a lockless appender
46
+ that resolved the active segment path before the roll (or holds an open fd,
47
+ which is the natural way to implement an appender) keeps appending **into the
48
+ just-sealed segment**. "Rolled segments are immutable" is therefore not an
49
+ invariant; it is a hope that every writer notices the roll. Consequences:
50
+
51
+ - Checkpoint-based archival (B §3.2) can park a segment that is still
52
+ receiving writes — silent event loss into `archive/`.
53
+ - Segment-name-encodes-first-seq stays true, but "segment content is frozen
54
+ after roll" — which cursors, federation pulls, and doctor verification all
55
+ implicitly assume — is false.
56
+
57
+ A has no lockless append path, so A doesn't have this bug, at the cost of
58
+ notification appends contending for the lock. The fix is cheap: **all appends
59
+ take the lock** (mutation frequency is human-action-scale; B itself concedes
60
+ "they can cheaply take the lock"). If notification traffic ever measurably
61
+ contends, split the streams (B's own Q6) — but do not ship an unlocked write
62
+ path into a file whose immutability the whole design leans on.
63
+
64
+ ### 1.3 In-journal checkpoints pollute every cursor and inflate seq space
65
+
66
+ B §3.2 emits checkpoints as ordinary journal records: one snapshot event per
67
+ live entity + a terminator. Three problems:
68
+
69
+ 1. **Cursor spam.** Cursors are seq watermarks; after a checkpoint, every
70
+ notification consumer "sees" N phantom snapshot events it must parse and
71
+ filter. B never says checkpoint records are excluded from
72
+ `readUnseenEvents` — and if they are excluded, that's a special-case rule
73
+ contradicting "checkpoint is appended like any event."
74
+ 2. **Crash mid-checkpoint** leaves a headless run (snapshots, no terminator).
75
+ B's terminator implies the recovery rule (use last *complete* checkpoint)
76
+ but never states it, and a rebuild scanning backward must now distinguish
77
+ "real entity event" from "stale partial-checkpoint snapshot" — they are
78
+ schema-identical.
79
+ 3. **Lock hold time.** O(live entities) appends + fsync under the store lock;
80
+ at 20-agent scale this stretches toward the lock-refresh/expiry windows the
81
+ sprint-1 hardening just tuned.
82
+
83
+ A's out-of-band checkpoint manifest (`checkpoints/ckpt-<seq>.json`) has none
84
+ of these: cursors never see it, partial write = orphan file (harmless,
85
+ meta-written-last), and the journal stays purely a stream of real events.
86
+ A's Q3 must resolve to **self-contained** post-images: the "referencing"
87
+ variant (hashes of projection files) couples checkpoint validity to projection
88
+ integrity — exactly the dependency direction a rebuild-from-truth artifact
89
+ must not have.
90
+
91
+ ### 1.4 Checkpoint-gated archival without checkpoint verification
92
+
93
+ B §3.2 moves superseded segments to `archive/` once a checkpoint covers them.
94
+ If that checkpoint later turns out corrupt (torn during write, disk fault),
95
+ the archived segments are suddenly **not** redundant — and recovery now
96
+ depends on operators realizing the archive must be un-parked. Neither proposal
97
+ states the guard, so the synthesis must: **verify a checkpoint by full
98
+ re-parse (and schema-validate) before any segment it supersedes is archived.**
99
+ Park-don't-delete makes this survivable either way, but survivable-by-forensics
100
+ is not the bar; the bar is no-human-needed convergence.
101
+
102
+ ### 1.5 Torn-tail handling: B's reader rules conflict with themselves
103
+
104
+ B §2.2 rule 3: torn tail → "skip without warning." But after the next append,
105
+ the leading-`\n` framing converts that torn tail into a **mid-file malformed
106
+ line**, which rule 2 says doctor must flag forever. So a routine, benign crash
107
+ permanently raises a doctor warning on a healthy store — alarm fatigue, which
108
+ trp_d5595086 teaches is how real corruption later slips through. Fix in §3.3
109
+ below (a `journal_note` event marks the fragment as adjudicated; doctor counts
110
+ adjudicated fragments separately from unexplained corruption).
111
+
112
+ Also under-specified in B: a torn write can, in the worst case, end exactly at
113
+ the record's final `}` with only the trailing `\n` missing — a line that
114
+ **parses validly** yet was never confirmed (crash before fsync). B's rule 3
115
+ skips it, which is the correct call, but B never argues why it's correct
116
+ (answer: journal-first + fsync-before-projection means an unconfirmed tail can
117
+ always be dropped; the caller was never told "ok"). The synthesis should state
118
+ this argument, because the rule looks wrong without it.
119
+
120
+ ### 1.6 O_APPEND seatbelt has a size ceiling nobody enforces
121
+
122
+ Both proposals say "single write of a few KB doesn't interleave." True for
123
+ small records on local FS — but single-`write()` append atomicity degrades for
124
+ multi-page writes (>4KB is where guarantees get murky across FS
125
+ implementations, and NTFS makes no formal promise at any size). B's own size
126
+ math admits long plan bodies can grow; B's `payload_ref` escape hatch is
127
+ deferred until "a real entity exceeds ~64 KB" — meaning large records **will
128
+ ship before the mitigation exists**, silently exiting the envelope where the
129
+ seatbelt works. Since the lock is the primary guarantee, this only bites in
130
+ the lock-steal window — but that's precisely the window the seatbelt exists
131
+ for. Synthesis: enforce a **max-record-size check at write time** (warn at
132
+ 64 KB, hard-fail at 256 KB with a pointer to payload_ref), so the day the
133
+ ceiling matters, it fails loud at the writer, not subtly at a reader.
134
+
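The write-time size check proposed above is small enough to sketch in full. The thresholds come from the text; the function name, the return values, and treating the thresholds as inclusive are assumptions. The caller is expected to pass the encoded byte length of the record, not its character count.

```typescript
const WARN_BYTES = 64 * 1024;  // warn threshold from the critique
const FAIL_BYTES = 256 * 1024; // hard-fail threshold from the critique

type SizeVerdict = "ok" | "warn";

// recordBytes is the encoded size of the full record, framing included.
function checkRecordSize(recordBytes: number): SizeVerdict {
  if (recordBytes >= FAIL_BYTES) {
    // Fail loud at the writer and point at the escape hatch.
    throw new Error(
      `journal record is ${recordBytes} bytes (limit ${FAIL_BYTES}); use payload_ref`
    );
  }
  return recordBytes >= WARN_BYTES ? "warn" : "ok";
}
```

On `"warn"` the writer would still append, but bump a doctor counter so oversized records are visible before they reach the hard limit.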
135
+ ### 1.7 Network drives: B warns, neither proposal decides
136
+
137
+ B's "doctor warns when the store sits on a network mount" is the right
138
+ instinct but under-specified: detection (Windows UNC paths and mapped drives,
139
+ `fs.statfs` only exists from Node 18.15 / 19.6 onward) is nontrivial, and the consequence
140
+ ("journal correctness guaranteed on local FS only") is buried in a caveat.
141
+ The synthesis should promote it to a documented support boundary + best-effort
142
+ UNC/mapped-drive detection in doctor, and accept that detection is heuristic.
143
+
144
+ ### 1.8 MCP fresh-path: two meta files create a torn-state window and double the reads
145
+
146
+ B reads `HEAD.json` + `projections.json` (two files, two atomic renames at
147
+ write time). The write order (watermark last) keeps `applied_seq ≤ next_seq`
148
+ — fine — but two files means two reads per MCP call and two rename syscalls
149
+ per mutation for state that is always consumed together. A's single
150
+ `journal/meta.json` carrying `next_seq` + per-family `last_applied_seq` is
151
+ strictly cheaper and removes the ordering reasoning entirely. Keep B's
152
+ property that the meta file is a **rebuildable cache** (reconstructible from
153
+ segment listing + tail read), which A never claimed and should.
154
+
155
+ ### 1.9 Clock skew / federation — B survives, one nit
156
+
157
+ B is clean here (seq orders, ts decorates, `(origin_store_id, seq)` dedups).
158
+ One nit: stamping `origin_store_id` once per **segment header record** means a
159
+ federation pull must transfer whole segments or carry the header along with
160
+ any slice; fine for the rsync/dumb-bus model, but the spec should say slices
161
+ must be header-prefixed. Cosmetic, not structural.
162
+
163
+ ## 2. Attacks on the SHARED spine
164
+
165
+ Convergence on snapshots + global-seq-under-lock + segments + lazy projections
166
+ is exactly where groupthink would hide (both slots are the same model — the
167
+ single-agent-method memory applies). Two scenarios where the spine is wrong:
168
+
169
+ ### 2.1 Full-snapshot-per-event is wrong for high-frequency partial updates on fat entities
170
+
171
+ The poison combination is **large entity × high update frequency**: an
172
+ agent_run receiving progress/heartbeat updates every few seconds, or a plan
173
+ with a long body where every step-completion rewrites the whole doc. A 100 KB
174
+ plan updated 50× = 5 MB of journal for one entity in one session; each record
175
+ individually exceeds the O_APPEND atomicity comfort zone (§1.6) **and** makes
176
+ the per-mutation fsync slower, all to record a one-field change. The spine's
177
+ defense ("entities are 1–10 KB") is an observation about today's store, not
178
+ an invariant of the schema.
179
+
180
+ **Falsifier (measurable now, before phase 1):** from the dogfood store,
181
+ compute per-`item_type` p95 entity size × per-entity event frequency from the
182
+ 17k v1 events. If any type's (size × freq) implies segment rolls faster than
183
+ ~weekly, or any single record would exceed 64 KB, then snapshots-everywhere is
184
+ the wrong call for that type and `payload_ref` (or per-type field-delta) must
185
+ ship in phase 1, not deferred. This measurement is cheap and should be a
186
+ phase-0 deliverable.
187
+
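One way to phrase the falsifier as a computation is sketched below. It is an interpretation, not a specified procedure: the 10 MB segment size is taken from §3.2, the p95 uses the nearest-rank method, frequency is simplified to events per week for the whole type, and all names are hypothetical.

```typescript
const SEGMENT_BYTES = 10 * 1024 * 1024; // segment size cited in §3.2
const MAX_RECORD_BYTES = 64 * 1024;     // single-record threshold from the falsifier

interface TypeSample {
  item_type: string;
  entityBytes: number[]; // observed snapshot sizes for this type
  eventsPerWeek: number; // events of this type per week, all entities combined
}

// Nearest-rank 95th percentile; callers must pass a non-empty array.
function p95(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil(0.95 * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// True when full snapshots look like the wrong call for this type:
// either one record is too large, or the type alone would fill a
// segment in less than a week.
function needsPayloadRef(sample: TypeSample): boolean {
  if (sample.entityBytes.length === 0) return false;
  const largest = Math.max.apply(null, sample.entityBytes);
  const weeklyBytes = p95(sample.entityBytes) * sample.eventsPerWeek;
  return largest > MAX_RECORD_BYTES || weeklyBytes > SEGMENT_BYTES;
}
```

The 100 KB plan from the paragraph above trips the first condition on record size alone, before update frequency is even considered.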
188
+ ### 2.2 Global-seq-under-lock welds event capture to lock availability
189
+
190
+ Three concrete failure shapes:
191
+
192
+ 1. **Wedged lock** = no events at all (A) or seq-null limbo events (B). The
193
+ sprint-1 hardening reduced but did not eliminate stuck-owner scenarios.
194
+ 2. **Federation import** must take the lock to assign local seqs to every
195
+ imported batch; a 10k-event pull holds the lock long enough to starve local
196
+ agents at 20-agent scale. Needs chunked import with lock release between
197
+ chunks — neither proposal says so.
198
+ 3. **Sandboxed/worktree workers** (the facades-in-sandbox failure is on
199
+ record: sandboxed agents could not reach MCP) **cannot append at all**.
200
+ Their work produces zero journal events until a sync point — meaning the
201
+ journal is the truth of the *store*, not of the *system*, and nobody should
202
+ pretend otherwise.
203
+
204
+ **Falsifier:** the moment any roadmap item requires offline/sandboxed local
205
+ event capture (a worker journaling into its worktree for later merge),
206
+ global-seq-under-lock is the wrong primitive — that world needs per-writer
207
+ seqs + merge, i.e., the federation mechanism applied locally. Until then the
208
+ spine holds, but the spec must state this boundary explicitly so the
209
+ assumption is visible when it breaks.
210
+
211
+ ## 3. Divergence adjudication
212
+
213
+ ### 3.1 Diff-to-event synthesis at `persistStateUnlocked` (A) vs `diffToEvents` in `mutateState` (B)
214
+
215
+ Substantively the same mechanism at the same layer; the real defect is shared:
216
+ **diff inference destroys verb semantics.** A diff knows created/changed/
217
+ removed; it cannot know `claim` vs `update` vs `complete` — the EventAction
218
+ union's expressiveness, which notifications and federation signaling consume,
219
+ collapses to generic `update`. A waves at "verb sites migrate
220
+ opportunistically afterwards"; B defers to phase 3. Both lose semantics in the
221
+ interim.
222
+
223
+ **Verdict: third option beats both.** Diff-synthesis stays as the
224
+ **correctness backstop** (it guarantees no mutation escapes the journal —
225
+ that property is what makes the journal trustworthy), plus a per-mutation
226
+ **intent annotation**: verb sites declare `(action, item_type, item_id,
227
+ summary)` into the in-flight mutation context (they already call
228
+ `appendEvent` with exactly these fields today — the call sites exist, they
229
+ just need to write to the mutation context instead of the legacy stream); the
230
+ diff supplies the payload and emits any *unannotated* change as `update` +
231
+ doctor counter. Semantic fidelity from day one, correctness regardless.
232
+
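The third option can be illustrated with a toy diff over two in-memory snapshots. Everything here is a sketch under assumptions: the store is modeled as a plain id-to-entity map, equality is a crude `JSON.stringify` comparison, and the fallback actions `create` and `delete` for unannotated additions and removals are illustrative choices (the text only specifies `update` for unannotated changes).

```typescript
type Entity = { item_type: string; [field: string]: unknown };
type Store = Record<string, Entity>;

interface Intent { action: string; item_type: string; item_id: string; summary: string; }

interface JournalEvent {
  action: string;
  item_type: string;
  item_id: string;
  summary: string;
  payload: Entity | null; // full post-image, null for removals
}

function diffToEvents(before: Store, after: Store, intents: Intent[]) {
  const byId: Record<string, Intent> = {};
  for (const intent of intents) byId[intent.item_id] = intent;

  const ids = Object.keys(before).concat(
    Object.keys(after).filter((id) => !(id in before))
  );
  const events: JournalEvent[] = [];
  let unannotated = 0; // feeds the doctor counter

  for (const id of ids) {
    const prev: Entity | undefined = before[id];
    const next: Entity | undefined = after[id];
    // Crude equality; sensitive to key order, good enough for a sketch.
    if (prev && next && JSON.stringify(prev) === JSON.stringify(next)) continue;

    const intent: Intent | undefined = byId[id];
    if (intent === undefined) unannotated += 1;
    const fallback = next === undefined ? "delete" : prev === undefined ? "create" : "update";
    const source = (next ?? prev) as Entity;
    events.push({
      action: intent ? intent.action : fallback,
      item_type: source.item_type,
      item_id: id,
      summary: intent ? intent.summary : "",
      payload: next ?? null,
    });
  }
  return { events, unannotated };
}
```

The diff decides which entities get an event and supplies the payload; the annotation only names the verb, so a forgotten annotation costs semantics but never loses a mutation.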
233
+ ### 3.2 Cursors: `{segment_id, offset}` (A) vs `{last_seq}` (B)
234
+
235
+ **Verdict: B wins outright.** Seq watermarks are rotation-proof,
236
+ compaction-proof, trivially comparable, and survive any future re-layout.
237
+ A's byte offsets are broken **by A's own design**: A's quarantine repair moves
238
+ torn bytes out of the active segment — a rewrite that shifts every subsequent
239
+ offset, invalidating A's own cursors. Self-inflicted. The cost of seq cursors
240
+ (scan from segment start to find seq N, no per-line index) is bounded by the
241
+ 10 MB segment size and irrelevant in practice. B's `{gap: true}` +
242
+ checkpoint-summary degradation for ancient watermarks is also the right
243
+ notification semantics and should be kept verbatim.
244
+
245
+ ### 3.3 Torn-tail handling: quarantine-and-repair (A) vs leading-`\n` framing (B)
246
+
247
+ **Verdict: B's framing wins; A's repair must die.** A's writer-side repair
248
+ (move torn bytes to `quarantine/`, truncate) is a **read-modify-write of the
249
+ journal** — it breaks append-only (the one structural property everything
250
+ else leans on), breaks A's offset cursors (§3.2), and adds a crash window
251
+ *inside the repair itself*. B's leading-`\n` caps damage at one event with
252
+ zero mutation of existing bytes and costs one byte per record.
253
+
254
+ Take from A the **loudness**, repaired: when a writer (under lock, before
255
+ appending) detects a torn tail, it appends a `journal_note` event recording
256
+ the fragment's segment + byte range + content hash as *adjudicated*. Doctor
257
+ then distinguishes adjudicated fragments (expected crash residue, count only)
258
+ from unexplained mid-file corruption (alarm). This resolves B's
259
+ self-contradiction (§1.5) and keeps park-don't-delete: the fragment stays in
260
+ the file, annotated, forever.
261
+
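The damage-capping property of the framing can be seen in a small reader sketch. It assumes one JSON record per line with a leading and a trailing `\n`; `frameRecord` and `readJournal` are hypothetical names, and the `journal_note` adjudication step is left out.

```typescript
interface ReadResult {
  records: unknown[];
  malformedLines: number; // mid-file fragments, e.g. an old torn write
  tornTail: boolean;      // file ends in an unterminated line
}

// One byte of leading framing guarantees the record starts on a fresh line,
// even when the previous write was torn.
function frameRecord(record: object): string {
  return "\n" + JSON.stringify(record) + "\n";
}

function readJournal(text: string): ReadResult {
  const lines = text.split("\n");
  // Whatever follows the last newline was never fully written. It is dropped
  // even if it happens to parse, because the writer never confirmed it.
  const tail = lines.pop() ?? "";
  const result: ReadResult = { records: [], malformedLines: 0, tornTail: tail.length > 0 };
  for (const line of lines) {
    if (line.length === 0) continue; // blank lines produced by the framing
    try {
      result.records.push(JSON.parse(line));
    } catch {
      result.malformedLines += 1;
    }
  }
  return result;
}
```

A torn write followed by a later append shows up as exactly one malformed line, and the records on either side of it still parse; no existing byte is rewritten.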
262
+ ### 3.4 `doctor --verify-journal` (A) vs scattered doctor checks (B)
263
+
264
+ **Verdict: A wins; union with B's counters.** A's full rebuild-in-temp-dir +
265
+ diff-against-projections is the only check that validates the actual claim
266
+ ("journal is sufficient to reproduce state") and is the only credible phase-2
267
+ exit gate. B's incremental counters (skipped lines, torn tails, network-FS
268
+ warning) are cheap continuous telemetry and complement it. Synthesis ships
269
+ both: counters always-on, `--verify-journal` in CI on both OS families + as
270
+ the dual→primary promotion gate.
271
+
272
+ ### 3.5 Additional divergences spotted
273
+
274
+ | Divergence | A | B | Verdict |
275
+ |---|---|---|---|
276
+ | Segment naming/sealing | Range-named, **rename at seal** | First-seq-named at birth, **never rename** | **B.** A's rename-of-open-file is the exact Windows EPERM/EBUSY hazard A then mitigates with retry logic. B's name-at-birth needs no rename, no retry, no Windows caveat. Don't build the problem. |
277
+ | fsync default | `rotate` (seal+checkpoint only) | one fsync per `mutate()` | **B.** A's argument (fsync cost vs MCP cheapness) conflates paths: fsync is on the write path; MCP read cost is untouched. A journal-as-truth that confirms mutations the disk may not have is a contradiction in terms. Mutations are human-frequency; one fsync each is affordable. Keep the config escape hatch, drop the `rotate` default. |
278
+ | Envelope: `entity_rev` (A) vs `(writer_id, writer_seq)` (B) | per-entity monotonic rev | per-writer counter | **A.** `entity_rev` serves three masters (cheap dirty checks, optimistic concurrency, federation conflict detection); `writer_seq` serves only federation and is derivable later. Don't carry dead weight in every record. |
279
+ | Writer identity | pid + start-nonce | agent name + bare pid | **A** (pid reuse, §1.1). |
280
+ | Meta files | single `meta.json` | `HEAD.json` + `projections.json` | **A's single file, B's rebuildable-cache property** (§1.8). |
281
+ | Tombstone shape | `action:"delete", payload:null` | `deleted:true`, payload omitted | Either works; pick **`action:"delete"` + payload omitted** (no boolean redundant with the action union). Codex schema review should confirm. |
282
+ | Observability events | in-lock, payload-exempt | lockless, `seq:null` | **A** (roll race, §1.2). All appends take the lock; revisit only with contention data. |
283
+ | Checkpoint placement | out-of-band manifest | in-journal event run | **A** (§1.3), self-contained, + verify-before-archive (§1.4). |
284
+
285
+ ## 4. VERDICT
286
+
287
+ ### The 5 decisions the synthesis MUST take
288
+
289
+ 1. **Segment lifecycle = B's:** first-seq-named at creation, never renamed,
290
+ immutable after roll; **all appends under the store lock** (no lockless
291
+ path — closes the roll race §1.2); cursors are **seq watermarks** (B),
292
+ with `{gap:true}` checkpoint-summary degradation.
293
+ 2. **Torn-tail protocol = B's framing + A's loudness, no rewrites:**
294
+ leading+trailing `\n` per record; torn fragments stay in place forever;
295
+ writer appends an adjudicating `journal_note` under lock; doctor separates
296
+ adjudicated residue from unexplained corruption. Max-record-size enforced
297
+ at write (warn 64 KB / fail 256 KB).
298
+ 3. **Checkpoints = A's:** out-of-band, self-contained manifests in
299
+ `checkpoints/`, fsync'd, meta-written-last; **verified by full re-parse
300
+ before any superseded segment is archived**; never referenced-by-hash.
301
+ 4. **Two-writer honesty = A's, hardened:** writer identity = pid+start-nonce;
302
+ duplicate `(seq)` from distinct writers is a *detected anomaly* (file
303
+ order wins, doctor warns); on lock acquisition, writer validates
304
+ `next_seq` against the actual segment tail and self-heals upward
305
+ (`seq_repair` event). The phrase "impossible by construction" is banned
306
+ from the synthesis.
307
+ 5. **Write path = journal-first, fsync-per-mutate (B), diff-synthesis as
308
+ backstop + verb-site intent annotation (third option, §3.1);**
309
+ single rebuildable `meta.json`; `entity_rev` in the envelope;
310
+ `doctor --verify-journal` as the dual→primary promotion gate, doctor
311
+ counters always-on.
312
+
313
+ ### Open questions, severity-ranked
314
+
315
+ | # | Sev | Owner | Question |
316
+ |---|---|---|---|
317
+ | 1 | HIGH | **Juan** | Journal-in-git policy (B's Q5). Recommendation: **gitignore segments inside the store repo; commit checkpoints only.** Committing segments bloats history (10 MB blobs) and — fatal — if journal files ever live on diverging branches/worktrees that merge, seq uniqueness dies in a union merge. Needs a product decision because it touches the "git-diffable identity" constraint's interpretation. |
318
+ | 2 | HIGH | **Codex** | Envelope schema review: payload-required-per-EventAction mapping (A's Q1), tombstone shape (§3.5), checkpoint manifest schema, `journal_note`/`seq_repair` event kinds. The action union → payload contract has hole potential and is exactly schema-review territory. |
319
+ | 3 | MED | **Juan** | Stale-read policy under lock contention (B's Q3): serve stale-with-annotation everywhere except claim verbs (read-through-journal)? Liveness-vs-consistency product call; affects dispatch correctness at 20-agent scale. |
320
+ | 4 | MED | **Codex + measurement** | Snapshot-size falsifier (§2.1): run the p95-size × frequency analysis on the dogfood store in phase 0. If poison combination exists, `payload_ref` enters phase 1 and the record schema changes — decide before, not after, the format ships. |
321
+ | 5 | LOW | **Juan** | Federation conflict semantics (A-Q2/B-Q4) and gc/archive thresholds (A-Q5). Genuinely deferrable: the journal design is agnostic; both only need answering when federation lands. |
322
+
323
+ ### Abstract
324
+
325
+ B's spine survives; its perimeter doesn't: the "two writers impossible" claim
326
+ is false and undetected, the lockless append path races segment roll into
327
+ "immutable" files, and in-journal checkpoints pollute every cursor. A's
328
+ quarantine repair is self-defeating (rewrites the journal, breaks its own
329
+ offset cursors) and must be replaced by B's framing. Synthesis: B's segment
330
+ lifecycle + cursors + fsync, A's checkpoints + two-writer detection +
331
+ verify-journal gate, plus three rules neither proposal had — verify
332
+ checkpoints before archival, enforce max record size, validate next_seq
333
+ against the segment tail on lock acquisition.