brainclaw 1.9.0 → 1.9.1
- package/README.md +585 -499
- package/dist/brainclaw-vscode.vsix +0 -0
- package/dist/commands/harvest.js +1 -1
- package/dist/commands/hooks.js +73 -73
- package/dist/commands/init.js +1 -1
- package/dist/commands/install-hooks.js +78 -78
- package/dist/commands/mcp-read-handlers.js +57 -14
- package/dist/commands/mcp.js +79 -13
- package/dist/commands/switch.js +26 -5
- package/dist/commands/version.js +1 -1
- package/dist/core/agent-capability.js +19 -4
- package/dist/core/agent-files.js +119 -119
- package/dist/core/codev-prompts.js +38 -38
- package/dist/core/default-profiles/doctor.yaml +11 -11
- package/dist/core/default-profiles/janitor.yaml +11 -11
- package/dist/core/default-profiles/onboarder.yaml +11 -11
- package/dist/core/default-profiles/reviewer.yaml +13 -13
- package/dist/core/dispatcher.js +1 -1
- package/dist/core/entity-operations.js +29 -3
- package/dist/core/execution.js +1 -1
- package/dist/core/loops/verbs.js +0 -1
- package/dist/core/messaging.js +2 -2
- package/dist/core/protocol-skills.js +164 -164
- package/dist/core/runtime-signals.js +1 -1
- package/dist/core/search.js +19 -2
- package/dist/core/security-guard.js +207 -207
- package/dist/core/spawn-check.js +16 -2
- package/dist/core/staleness.js +1 -1
- package/dist/core/store-resolution.js +26 -7
- package/dist/core/worktree.js +18 -18
- package/dist/facts.js +3 -3
- package/dist/facts.json +2 -2
- package/docs/PROTOCOL.md +1 -1
- package/docs/adapters/openclaw.md +43 -43
- package/docs/architecture/project-refs.md +328 -328
- package/docs/cli.md +2093 -2093
- package/docs/concepts/coordination.md +52 -52
- package/docs/concepts/coordinator-runbook.md +129 -129
- package/docs/concepts/dispatch-lifecycle.md +245 -245
- package/docs/concepts/event-log-store.md +928 -928
- package/docs/concepts/ideation-loop.md +317 -317
- package/docs/concepts/loop-engine.md +520 -511
- package/docs/concepts/mcp-governance.md +268 -268
- package/docs/concepts/memory.md +84 -84
- package/docs/concepts/multi-agent-workflows.md +167 -167
- package/docs/concepts/observer-protocol.md +361 -361
- package/docs/concepts/plans-and-claims.md +217 -217
- package/docs/concepts/project-md-convention.md +35 -35
- package/docs/concepts/runtime-notes.md +38 -38
- package/docs/concepts/troubleshooting.md +254 -254
- package/docs/concepts/workspace-bootstrapping.md +142 -142
- package/docs/context-format-changelog.md +35 -35
- package/docs/context-format.md +48 -48
- package/docs/index.md +65 -65
- package/docs/integrations/agents.md +158 -158
- package/docs/integrations/claude-code.md +23 -23
- package/docs/integrations/cline.md +77 -77
- package/docs/integrations/continue.md +55 -55
- package/docs/integrations/copilot.md +68 -68
- package/docs/integrations/cursor.md +23 -23
- package/docs/integrations/kilocode.md +72 -72
- package/docs/integrations/mcp.md +377 -377
- package/docs/integrations/mistral-vibe.md +122 -122
- package/docs/integrations/openclaw.md +92 -92
- package/docs/integrations/opencode.md +84 -84
- package/docs/integrations/overview.md +115 -115
- package/docs/integrations/roo.md +71 -71
- package/docs/integrations/windsurf.md +77 -77
- package/docs/mcp-schema-changelog.md +360 -356
- package/docs/playbooks/integration/index.md +121 -121
- package/docs/playbooks/orchestration.md +37 -0
- package/docs/playbooks/productivity/index.md +99 -99
- package/docs/playbooks/team/index.md +117 -117
- package/docs/product/agent-first-model.md +184 -184
- package/docs/product/entity-model-audit.md +462 -462
- package/docs/product/positioning.md +86 -86
- package/docs/quickstart-existing-project.md +107 -107
- package/docs/quickstart.md +183 -183
- package/docs/release-maintenance.md +79 -79
- package/docs/reputation.md +52 -52
- package/docs/review.md +45 -45
- package/docs/security.md +212 -212
- package/docs/server-operations.md +118 -118
- package/docs/storage.md +106 -106
- package/package.json +80 -65
- package/docs/concepts/event-log-store-critique-A.md +0 -333
- package/docs/concepts/event-log-store-critique-B.md +0 -353
- package/docs/concepts/event-log-store-phase0-measurements.md +0 -58
- package/docs/concepts/event-log-store-proposal-A.md +0 -365
- package/docs/concepts/event-log-store-proposal-B.md +0 -404
- package/docs/concepts/identity-model-proposal.md +0 -371
@@ -1,333 +0,0 @@
# Event-Log Store — Cross-Critique by Slot A (round 2)

> Ideation artifact for lop_3bf55b9492e0d96c (pln_2290bc70 / pln#543 step 1).
> Slot A attacking proposal B, the shared spine, and adjudicating divergences.
> Convergence between A and B is treated as suspect, not as validation.

## 1. Attacks on Proposal B

### 1.1 "Two mutating writers: impossible by construction" — FALSE, and B has no detection

B §5.3 claims two mutating writers are "impossible by construction" because the
store lock serializes them. The lock can still be **broken on presumed owner
death**, and presumed death is fallible:

- **Pid-liveness false negative**: on Windows, a permissions error or a
  transient process-query failure can make a live owner look dead → breaker
  takes the lock while the owner is mid-mutation. Two writers.
- **Pid reuse false positive** (inverse): dead owner's pid recycled to an
  unrelated process → lock looks held forever → availability stall, and the
  eventual manual/timeout break lands while operators are improvising.

B records `writer`/`pid` in the envelope but defines **no reader rule for a
duplicate seq**. Worse, B's `HEAD.json` makes the failure compound: two
writers both read `next_seq = N`, both append seq N, both rewrite `HEAD.json`
via temp+rename — last rename wins, the loser's bump is lost, so a **third**
writer can reuse N again. Seq uniqueness silently degrades with no anomaly
surfaced anywhere.

A's `(seq, writer-nonce)` identity + "duplicate seq from different writers =
detected anomaly, apply in file order, doctor warning" is the minimum viable
answer. The synthesis additionally needs a **writer-side tail validation**: on
lock acquisition, before first append, read the last record of the active
segment and assert `next_seq > last_seq_in_file`; if not, self-heal
`next_seq = last_seq_in_file + 1` and emit a `seq_repair` event. That closes
the HEAD-regression hole that neither proposal closes.

Note `writer` must be **pid + start-nonce** (A's shape), not agent name + pid
(B's shape): pid reuse makes bare pid an unreliable writer identity over a
journal's lifetime.
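
The writer-side tail validation described above can be sketched as two pure functions. This is an illustrative sketch only: the function names, the `seq_repair` field names, and the assumption that each journal line is a JSON object with an integer `seq` are hypothetical and are not taken from brainclaw's code.

```javascript
// Hypothetical sketch of tail validation on lock acquisition.
// Assumes one JSON record per line, each carrying an integer `seq`.

// Return the highest-positioned parseable seq in a segment, or null if none.
function lastSeqOfSegment(segmentText) {
  const lines = segmentText.split("\n");
  for (let i = lines.length - 1; i >= 0; i--) {
    if (lines[i] === "") continue;
    try {
      const record = JSON.parse(lines[i]);
      if (record && Number.isInteger(record.seq)) return record.seq;
    } catch {
      // Torn or malformed line: keep scanning backward.
    }
  }
  return null;
}

// Assert next_seq > last_seq_in_file; otherwise heal upward and
// describe the repair so the caller can append a `seq_repair` event.
function validateNextSeq(nextSeq, lastSeqInFile) {
  if (lastSeqInFile === null || nextSeq > lastSeqInFile) {
    return { nextSeq, repair: null };
  }
  const healed = lastSeqInFile + 1;
  return {
    nextSeq: healed,
    repair: {
      kind: "seq_repair",
      stale_next_seq: nextSeq,
      last_seq_in_file: lastSeqInFile,
      healed_next_seq: healed,
    },
  };
}
```

Counting a parseable but unconfirmed tail record toward `last_seq_in_file` is deliberately conservative here: healing `next_seq` past it can skip a number but cannot reuse one.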

### 1.2 Lockless observability appends race the segment roll — B's immutability claim is unenforced

B §5.1 lets observability events append **without the lock** (`seq: null`).
B §3.1 rolls segments by creating a new file and updating `HEAD.json` —
**never renaming** the old active segment. Combine them: a lockless appender
that resolved the active segment path before the roll (or holds an open fd,
which is the natural way to implement an appender) keeps appending **into the
just-sealed segment**. "Rolled segments are immutable" is therefore not an
invariant; it is a hope that every writer notices the roll. Consequences:

- Checkpoint-based archival (B §3.2) can park a segment that is still
  receiving writes — silent event loss into `archive/`.
- Segment-name-encodes-first-seq stays true, but "segment content is frozen
  after roll" — which cursors, federation pulls, and doctor verification all
  implicitly assume — is false.

A has no lockless append path, so A doesn't have this bug, at the cost of
notification appends contending for the lock. The fix is cheap: **all appends
take the lock** (mutation frequency is human-action-scale; B itself concedes
"they can cheaply take the lock"). If notification traffic ever measurably
contends, split the streams (B's own Q6) — but do not ship an unlocked write
path into a file whose immutability the whole design leans on.

### 1.3 In-journal checkpoints pollute every cursor and inflate seq space

B §3.2 emits checkpoints as ordinary journal records: one snapshot event per
live entity + a terminator. Three problems:

1. **Cursor spam.** Cursors are seq watermarks; after a checkpoint, every
   notification consumer "sees" N phantom snapshot events it must parse and
   filter. B never says checkpoint records are excluded from
   `readUnseenEvents` — and if they are excluded, that's a special-case rule
   contradicting "checkpoint is appended like any event."
2. **Crash mid-checkpoint** leaves a headless run (snapshots, no terminator).
   B's terminator implies the recovery rule (use last *complete* checkpoint)
   but never states it, and a rebuild scanning backward must now distinguish
   "real entity event" from "stale partial-checkpoint snapshot" — they are
   schema-identical.
3. **Lock hold time.** O(live entities) appends + fsync under the store lock;
   at 20-agent scale this stretches toward the lock-refresh/expiry windows the
   sprint-1 hardening just tuned.

A's out-of-band checkpoint manifest (`checkpoints/ckpt-<seq>.json`) has none
of these: cursors never see it, partial write = orphan file (harmless,
meta-written-last), and the journal stays purely a stream of real events.
A's Q3 must resolve to **self-contained** post-images: the "referencing"
variant (hashes of projection files) couples checkpoint validity to projection
integrity — exactly the dependency direction a rebuild-from-truth artifact
must not have.

### 1.4 Checkpoint-gated archival without checkpoint verification

B §3.2 moves superseded segments to `archive/` once a checkpoint covers them.
If that checkpoint later turns out corrupt (torn during write, disk fault),
the archived segments are suddenly **not** redundant — and recovery now
depends on operators realizing the archive must be un-parked. Neither proposal
states the guard, so the synthesis must: **verify a checkpoint by full
re-parse (and schema-validate) before any segment it supersedes is archived.**
Park-don't-delete makes this survivable either way, but survivable-by-forensics
is not the bar; the bar is no-human-needed convergence.
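
A minimal sketch of the verify-before-archive guard follows. The manifest shape (`covers_seq`, `entities`) and both function names are assumptions made for illustration; neither proposal's actual checkpoint schema is shown in this critique. The sketch also assumes segments are identified by their first seq and that seqs are contiguous across segments.

```javascript
// Hypothetical checkpoint manifest: { covers_seq: <int>, entities: { <id>: {...} } }.

// Full re-parse plus a minimal schema check. Any failure means "do not trust".
function verifyCheckpoint(manifestText) {
  let doc;
  try {
    doc = JSON.parse(manifestText);
  } catch {
    return { ok: false, reason: "unparseable" };
  }
  if (doc === null || typeof doc !== "object" || Array.isArray(doc)) {
    return { ok: false, reason: "not an object" };
  }
  if (!Number.isInteger(doc.covers_seq) || doc.covers_seq < 0) {
    return { ok: false, reason: "bad covers_seq" };
  }
  if (doc.entities === null || typeof doc.entities !== "object" || Array.isArray(doc.entities)) {
    return { ok: false, reason: "bad entities" };
  }
  return { ok: true, coversSeq: doc.covers_seq };
}

// Return the first-seq names of segments that may be archived.
// The newest (active) segment is never returned.
function segmentsSafeToArchive(segmentFirstSeqs, manifestText) {
  const verdict = verifyCheckpoint(manifestText);
  if (!verdict.ok) return []; // unverified checkpoint supersedes nothing
  const sorted = [...segmentFirstSeqs].sort((a, b) => a - b);
  const safe = [];
  for (let i = 0; i + 1 < sorted.length; i++) {
    const lastSeqInSegment = sorted[i + 1] - 1;
    if (lastSeqInSegment <= verdict.coversSeq) safe.push(sorted[i]);
  }
  return safe;
}
```

A torn manifest yields an empty list, so the failure mode is "nothing gets archived" rather than "segments are parked behind a checkpoint nobody can read".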

### 1.5 Torn-tail handling: B's reader rules conflict with themselves

B §2.2 rule 3: torn tail → "skip without warning." But after the next append,
the leading-`\n` framing converts that torn tail into a **mid-file malformed
line**, which rule 2 says doctor must flag forever. So a routine, benign crash
permanently raises a doctor warning on a healthy store — alarm fatigue, which
trp_d5595086 teaches is how real corruption later slips through. Fix in §3.3
below (a `journal_note` event marks the fragment as adjudicated; doctor counts
adjudicated fragments separately from unexplained corruption).

Also under-specified in B: a torn write can, in the worst case, end exactly at
the record's final `}` with only the trailing `\n` missing — a line that
**parses validly** yet was never confirmed (crash before fsync). B's rule 3
skips it, which is the correct call, but B never argues why it's correct
(answer: journal-first + fsync-before-projection means an unconfirmed tail can
always be dropped; the caller was never told "ok"). The synthesis should state
this argument, because the rule looks wrong without it.

### 1.6 O_APPEND seatbelt has a size ceiling nobody enforces

Both proposals say "single write of a few KB doesn't interleave." True for
small records on local FS — but single-`write()` append atomicity degrades for
multi-page writes (>4 KB is where guarantees get murky across FS
implementations, and NTFS makes no formal promise at any size). B's own size
math admits long plan bodies can grow; B's `payload_ref` escape hatch is
deferred until "a real entity exceeds ~64 KB" — meaning large records **will
ship before the mitigation exists**, silently exiting the envelope where the
seatbelt works. Since the lock is the primary guarantee, this only bites in
the lock-steal window — but that's precisely the window the seatbelt exists
for. Synthesis: enforce a **max-record-size check at write time** (warn at
64 KB, hard-fail at 256 KB with a pointer to payload_ref), so the day the
ceiling matters, it fails loud at the writer, not subtly at a reader.
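
The write-time size check could look roughly like the sketch below. The thresholds come from the text above; the function name, the return shape, and the choice to measure the framed UTF-8 line (leading and trailing `\n` included) are assumptions for illustration.

```javascript
// Hypothetical write-time guard: warn at 64 KB, hard-fail at 256 KB.
const WARN_BYTES = 64 * 1024;
const FAIL_BYTES = 256 * 1024;

function checkRecordSize(record) {
  // Measure the bytes that would actually be appended, not the string length.
  const line = "\n" + JSON.stringify(record) + "\n";
  const bytes = Buffer.byteLength(line, "utf8");
  if (bytes >= FAIL_BYTES) {
    throw new RangeError(
      `journal record is ${bytes} bytes (limit ${FAIL_BYTES}); move the body behind payload_ref`
    );
  }
  return { line, bytes, warn: bytes >= WARN_BYTES };
}
```

A caller would log when `warn` is true and append `line` in a single write; the throw happens before any bytes reach the segment.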

### 1.7 Network drives: B warns, neither proposal decides

B's "doctor warns when the store sits on a network mount" is the right
instinct but under-specified: detection (Windows UNC paths and mapped drives;
`fs.statfs` only landed in Node 18.15/19.6 and its `type` field is
platform-specific) is nontrivial, and the consequence
("journal correctness guaranteed on local FS only") is buried in a caveat.
The synthesis should promote it to a documented support boundary + best-effort
UNC/mapped-drive detection in doctor, and accept that detection is heuristic.

### 1.8 MCP fresh-path: two meta files create a torn-state window and double the reads

B reads `HEAD.json` + `projections.json` (two files, two atomic renames at
write time). The write order (watermark last) keeps `applied_seq ≤ next_seq`
— fine — but two files means two reads per MCP call and two rename syscalls
per mutation for state that is always consumed together. A's single
`journal/meta.json` carrying `next_seq` + per-family `last_applied_seq` is
strictly cheaper and removes the ordering reasoning entirely. Keep B's
property that the meta file is a **rebuildable cache** (reconstructible from
segment listing + tail read), which A never claimed and should.

### 1.9 Clock skew / federation — B survives, one nit

B is clean here (seq orders, ts decorates, `(origin_store_id, seq)` dedups).
One nit: stamping `origin_store_id` once per **segment header record** means a
federation pull must transfer whole segments or carry the header along with
any slice; fine for the rsync/dumb-bus model, but the spec should say slices
must be header-prefixed. Cosmetic, not structural.

## 2. Attacks on the SHARED spine

Convergence on snapshots + global-seq-under-lock + segments + lazy projections
is exactly where groupthink would hide (both slots are the same model — the
single-agent-method memory applies). Two scenarios where the spine is wrong:

### 2.1 Full-snapshot-per-event is wrong for high-frequency partial updates on fat entities

The poison combination is **large entity × high update frequency**: an
agent_run receiving progress/heartbeat updates every few seconds, or a plan
with a long body where every step-completion rewrites the whole doc. A 100 KB
plan updated 50× = 5 MB of journal for one entity in one session; each record
individually exceeds the O_APPEND atomicity comfort zone (§1.6) **and** makes
the per-mutation fsync slower, all to record a one-field change. The spine's
defense ("entities are 1–10 KB") is an observation about today's store, not
an invariant of the schema.

**Falsifier (measurable now, before phase 1):** from the dogfood store,
compute per-`item_type` p95 entity size × per-entity event frequency from the
17k v1 events. If any type's (size × freq) implies segment rolls faster than
~weekly, or any single record would exceed 64 KB, then snapshots-everywhere is
the wrong call for that type and `payload_ref` (or per-type field-delta) must
ship in phase 1, not deferred. This measurement is cheap and should be a
phase-0 deliverable.
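
The falsifier measurement can be sketched as a small aggregation. The input shape (`item_type`, `item_id`, `bytes` per event) is an assumption about how the v1 events would be pre-processed, the function names are hypothetical, and the nearest-rank p95 is one reasonable choice rather than a prescribed method.

```javascript
// Hypothetical phase-0 analysis: per-type p95 snapshot size x per-entity event frequency.
const RECORD_LIMIT_BYTES = 64 * 1024;

// Nearest-rank percentile over an ascending-sorted, non-empty array.
function p95(sortedAsc) {
  return sortedAsc[Math.ceil(0.95 * sortedAsc.length) - 1];
}

// events: [{ item_type, item_id, bytes }], where bytes is the full-snapshot size.
function snapshotPressure(events) {
  const byType = new Map();
  for (const ev of events) {
    let bucket = byType.get(ev.item_type);
    if (!bucket) {
      bucket = { sizes: [], entities: new Set() };
      byType.set(ev.item_type, bucket);
    }
    bucket.sizes.push(ev.bytes);
    bucket.entities.add(ev.item_id);
  }
  const report = {};
  for (const [type, bucket] of byType) {
    const sizes = [...bucket.sizes].sort((a, b) => a - b);
    const p95Bytes = p95(sizes);
    const maxBytes = sizes[sizes.length - 1];
    const eventsPerEntity = sizes.length / bucket.entities.size;
    report[type] = {
      p95Bytes,
      maxBytes,
      eventsPerEntity,
      journalBytesPerEntity: p95Bytes * eventsPerEntity,
      exceedsRecordLimit: maxBytes > RECORD_LIMIT_BYTES,
    };
  }
  return report;
}
```

Fed the example from the text (one 100 KB plan updated 50 times), `journalBytesPerEntity` comes out at 50 × 100 KB, which is the roughly 5 MB figure quoted above. Turning that into a segment-roll interval additionally needs the observation window and the segment size, which this sketch leaves to the caller.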

### 2.2 Global-seq-under-lock welds event capture to lock availability

Three concrete failure shapes:

1. **Wedged lock** = no events at all (A) or seq-null limbo events (B). The
   sprint-1 hardening reduced but did not eliminate stuck-owner scenarios.
2. **Federation import** must take the lock to assign local seqs to every
   imported batch; a 10k-event pull holds the lock long enough to starve local
   agents at 20-agent scale. Needs chunked import with lock release between
   chunks — neither proposal says so.
3. **Sandboxed/worktree workers** (the facades-in-sandbox failure is on
   record: sandboxed agents could not reach MCP) **cannot append at all**.
   Their work produces zero journal events until a sync point — meaning the
   journal is the truth of the *store*, not of the *system*, and nobody should
   pretend otherwise.

**Falsifier:** the moment any roadmap item requires offline/sandboxed local
event capture (a worker journaling into its worktree for later merge),
global-seq-under-lock is the wrong primitive — that world needs per-writer
seqs + merge, i.e., the federation mechanism applied locally. Until then the
spine holds, but the spec must state this boundary explicitly so the
assumption is visible when it breaks.

## 3. Divergence adjudication

### 3.1 Diff-to-event synthesis at `persistStateUnlocked` (A) vs `diffToEvents` in `mutateState` (B)

Substantively the same mechanism at the same layer; the real defect is shared:
**diff inference destroys verb semantics.** A diff knows created/changed/
removed; it cannot know `claim` vs `update` vs `complete` — the EventAction
union's expressiveness, which notifications and federation signaling consume,
collapses to generic `update`. A waves at "verb sites migrate
opportunistically afterwards"; B defers to phase 3. Both lose semantics in the
interim.

**Verdict: third option beats both.** Diff-synthesis stays as the
**correctness backstop** (it guarantees no mutation escapes the journal —
that property is what makes the journal trustworthy), plus a per-mutation
**intent annotation**: verb sites declare `(action, item_type, item_id,
summary)` into the in-flight mutation context (they already call
`appendEvent` with exactly these fields today — the call sites exist, they
just need to write to the mutation context instead of the legacy stream); the
diff supplies the payload and emits any *unannotated* change as `update` +
doctor counter. Semantic fidelity from day one, correctness regardless.
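
The "third option" can be sketched as a diff pass that consults declared intents. Everything here is illustrative: the state shape (plain objects keyed by entity id, each entity carrying `item_type`), the function name, and the `JSON.stringify` comparison are assumptions, and unannotated creations and deletions are labeled `create`/`delete` rather than `update` because the diff can tell those apart.

```javascript
// Hypothetical sketch: diff-synthesis as backstop, intent annotations for verb fidelity.
// before/after: { [item_id]: entity }, intents: [{ action, item_type, item_id, summary }].
function synthesizeEvents(before, after, intents) {
  const intentById = new Map(intents.map((intent) => [intent.item_id, intent]));
  const ids = new Set([...Object.keys(before), ...Object.keys(after)]);
  const events = [];
  let unannotated = 0; // would feed a doctor counter

  for (const id of ids) {
    const prev = before[id];
    const next = after[id];
    // Naive structural comparison; key order matters, which is fine for a sketch.
    if (JSON.stringify(prev) === JSON.stringify(next)) continue;

    const inferred = prev === undefined ? "create" : next === undefined ? "delete" : "update";
    const intent = intentById.get(id);
    if (!intent) unannotated += 1;

    const event = {
      action: intent ? intent.action : inferred,
      item_type: (next ?? prev).item_type,
      item_id: id,
      summary: intent ? intent.summary : `unannotated ${inferred}`,
    };
    if (next !== undefined) event.payload = next; // the diff supplies the payload
    events.push(event);
  }
  return { events, unannotated };
}
```

Every changed entity produces an event whether or not a verb site annotated it, which is the backstop property; the annotation only upgrades the `action` and `summary`.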

### 3.2 Cursors: `{segment_id, offset}` (A) vs `{last_seq}` (B)

**Verdict: B wins outright.** Seq watermarks are rotation-proof,
compaction-proof, trivially comparable, and survive any future re-layout.
A's byte offsets are broken **by A's own design**: A's quarantine repair moves
torn bytes out of the active segment — a rewrite that shifts every subsequent
offset, invalidating A's own cursors. Self-inflicted. The cost of seq cursors
(scan from segment start to find seq N, no per-line index) is bounded by the
10 MB segment size and irrelevant in practice. B's `{gap: true}` +
checkpoint-summary degradation for ancient watermarks is also the right
notification semantics and should be kept verbatim.
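
A seq-watermark cursor read might look like the sketch below. The `{last_seq}` cursor and the `gap` flag come from the text; the `oldestRetainedSeq` parameter, the return shape, and this particular signature for `readUnseenEvents` are assumptions, and the checkpoint summary that would accompany a gap is left out.

```javascript
// Hypothetical sketch of a seq-watermark cursor read.
// records: parsed journal records still available to the reader.
// oldestRetainedSeq: lowest seq not yet archived (an assumed input).
function readUnseenEvents(records, cursor, oldestRetainedSeq) {
  // If the first event the consumer has not seen is already archived,
  // the consumer missed events and should fall back to a checkpoint summary.
  const gap = cursor.last_seq + 1 < oldestRetainedSeq;
  const events = records
    .filter((record) => Number.isInteger(record.seq) && record.seq > cursor.last_seq)
    .sort((a, b) => a.seq - b.seq);
  const lastSeq = events.length > 0 ? events[events.length - 1].seq : cursor.last_seq;
  return { gap, events, cursor: { last_seq: lastSeq } };
}
```

Nothing in the cursor refers to a file or byte position, so rolling or archiving segments does not invalidate it.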

### 3.3 Torn-tail handling: quarantine-and-repair (A) vs leading-`\n` framing (B)

**Verdict: B's framing wins; A's repair must die.** A's writer-side repair
(move torn bytes to `quarantine/`, truncate) is a **read-modify-write of the
journal** — it breaks append-only (the one structural property everything
else leans on), breaks A's offset cursors (§3.2), and adds a crash window
*inside the repair itself*. B's leading-`\n` caps damage at one event with
zero mutation of existing bytes and costs one byte per record.

Take from A the **loudness**, repaired: when a writer (under lock, before
appending) detects a torn tail, it appends a `journal_note` event recording
the fragment's segment + byte range + content hash as *adjudicated*. Doctor
then distinguishes adjudicated fragments (expected crash residue, count only)
from unexplained mid-file corruption (alarm). This resolves B's
self-contradiction (§1.5) and keeps park-don't-delete: the fragment stays in
the file, annotated, forever.
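
The framing and the adjudication rule can be sketched together as a segment scan. This is a simplified illustration: the `journal_note` here carries only a `fragment_hash` (the text also calls for segment and byte range), and the function names and return shape are hypothetical.

```javascript
// Hypothetical sketch: leading+trailing "\n" framing and adjudicated-fragment counting.
const { createHash } = require("node:crypto");

const sha256 = (text) => createHash("sha256").update(text, "utf8").digest("hex");

// One byte of leading "\n" ensures a torn predecessor cannot swallow this record.
function frame(record) {
  return "\n" + JSON.stringify(record) + "\n";
}

function scanSegment(text) {
  const chunks = text.split("\n");
  const records = [];
  const malformed = [];
  let tornTail = false;

  chunks.forEach((chunk, index) => {
    if (chunk === "") return;
    // A non-empty final chunk means the text does not end in "\n":
    // an unconfirmed tail, dropped even if it happens to parse.
    if (index === chunks.length - 1) {
      tornTail = true;
      return;
    }
    try {
      records.push(JSON.parse(chunk));
    } catch {
      malformed.push(chunk);
    }
  });

  const adjudicatedHashes = new Set(
    records.filter((r) => r && r.kind === "journal_note").map((r) => r.fragment_hash)
  );
  const adjudicated = malformed.filter((fragment) => adjudicatedHashes.has(sha256(fragment))).length;
  return { records, tornTail, adjudicated, unexplained: malformed.length - adjudicated };
}
```

In this sketch a doctor check would alarm only when `unexplained` is non-zero, and merely count `adjudicated` fragments.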

### 3.4 `doctor --verify-journal` (A) vs scattered doctor checks (B)

**Verdict: A wins; union with B's counters.** A's full rebuild-in-temp-dir +
diff-against-projections is the only check that validates the actual claim
("journal is sufficient to reproduce state") and is the only credible phase-2
exit gate. B's incremental counters (skipped lines, torn tails, network-FS
warning) are cheap continuous telemetry and complement it. Synthesis ships
both: counters always-on, `--verify-journal` in CI on both OS families + as
the dual→primary promotion gate.

### 3.5 Additional divergences spotted

| Divergence | A | B | Verdict |
|---|---|---|---|
| Segment naming/sealing | Range-named, **rename at seal** | First-seq-named at birth, **never rename** | **B.** A's rename-of-open-file is the exact Windows EPERM/EBUSY hazard A then mitigates with retry logic. B's name-at-birth needs no rename, no retry, no Windows caveat. Don't build the problem. |
| fsync default | `rotate` (seal+checkpoint only) | one fsync per `mutate()` | **B.** A's argument (fsync cost vs MCP cheapness) conflates paths: fsync is on the write path; MCP read cost is untouched. A journal-as-truth that confirms mutations the disk may not have is a contradiction in terms. Mutations are human-frequency; one fsync each is affordable. Keep the config escape hatch, drop the `rotate` default. |
| Envelope: `entity_rev` (A) vs `(writer_id, writer_seq)` (B) | per-entity monotonic rev | per-writer counter | **A.** `entity_rev` serves three masters (cheap dirty checks, optimistic concurrency, federation conflict detection); `writer_seq` serves only federation and is derivable later. Don't carry dead weight in every record. |
| Writer identity | pid + start-nonce | agent name + bare pid | **A** (pid reuse, §1.1). |
| Meta files | single `meta.json` | `HEAD.json` + `projections.json` | **A's single file, B's rebuildable-cache property** (§1.8). |
| Tombstone shape | `action:"delete", payload:null` | `deleted:true`, payload omitted | Either works; pick **`action:"delete"` + payload omitted** (no boolean redundant with the action union). Codex schema review should confirm. |
| Observability events | in-lock, payload-exempt | lockless, `seq:null` | **A** (roll race, §1.2). All appends take the lock; revisit only with contention data. |
| Checkpoint placement | out-of-band manifest | in-journal event run | **A** (§1.3), self-contained, + verify-before-archive (§1.4). |

## 4. VERDICT

### The 5 decisions the synthesis MUST take

1. **Segment lifecycle = B's:** first-seq-named at creation, never renamed,
   immutable after roll; **all appends under the store lock** (no lockless
   path — closes the roll race §1.2); cursors are **seq watermarks** (B),
   with `{gap:true}` checkpoint-summary degradation.
2. **Torn-tail protocol = B's framing + A's loudness, no rewrites:**
   leading+trailing `\n` per record; torn fragments stay in place forever;
   writer appends an adjudicating `journal_note` under lock; doctor separates
   adjudicated residue from unexplained corruption. Max-record-size enforced
   at write (warn 64 KB / fail 256 KB).
3. **Checkpoints = A's:** out-of-band, self-contained manifests in
   `checkpoints/`, fsync'd, meta-written-last; **verified by full re-parse
   before any superseded segment is archived**; never referenced-by-hash.
4. **Two-writer honesty = A's, hardened:** writer identity = pid+start-nonce;
   duplicate `(seq)` from distinct writers is a *detected anomaly* (file
   order wins, doctor warns); on lock acquisition, writer validates
   `next_seq` against the actual segment tail and self-heals upward
   (`seq_repair` event). The phrase "impossible by construction" is banned
   from the synthesis.
5. **Write path = journal-first, fsync-per-mutate (B), diff-synthesis as
   backstop + verb-site intent annotation (third option, §3.1);**
   single rebuildable `meta.json`; `entity_rev` in the envelope;
   `doctor --verify-journal` as the dual→primary promotion gate, doctor
   counters always-on.

### Open questions, severity-ranked

| # | Sev | Owner | Question |
|---|---|---|---|
| 1 | HIGH | **Juan** | Journal-in-git policy (B's Q5). Recommendation: **gitignore segments inside the store repo; commit checkpoints only.** Committing segments bloats history (10 MB blobs) and — fatal — if journal files ever live on diverging branches/worktrees that merge, seq uniqueness dies in a union merge. Needs a product decision because it touches the "git-diffable identity" constraint's interpretation. |
| 2 | HIGH | **Codex** | Envelope schema review: payload-required-per-EventAction mapping (A's Q1), tombstone shape (§3.5), checkpoint manifest schema, `journal_note`/`seq_repair` event kinds. The action union → payload contract has hole potential and is exactly schema-review territory. |
| 3 | MED | **Juan** | Stale-read policy under lock contention (B's Q3): serve stale-with-annotation everywhere except claim verbs (read-through-journal)? Liveness-vs-consistency product call; affects dispatch correctness at 20-agent scale. |
| 4 | MED | **Codex + measurement** | Snapshot-size falsifier (§2.1): run the p95-size × frequency analysis on the dogfood store in phase 0. If poison combination exists, `payload_ref` enters phase 1 and the record schema changes — decide before, not after, the format ships. |
| 5 | LOW | **Juan** | Federation conflict semantics (A-Q2/B-Q4) and gc/archive thresholds (A-Q5). Genuinely deferrable: the journal design is agnostic; both only need answering when federation lands. |

### Abstract

B's spine survives; its perimeter doesn't: the "two writers impossible" claim
is false and undetected, the lockless append path races segment roll into
"immutable" files, and in-journal checkpoints pollute every cursor. A's
quarantine repair is self-defeating (rewrites the journal, breaks its own
offset cursors) and must be replaced by B's framing. Synthesis: B's segment
lifecycle + cursors + fsync, A's checkpoints + two-writer detection +
verify-journal gate, plus three rules neither proposal had — verify
checkpoints before archival, enforce max record size, validate next_seq
against the segment tail on lock acquisition.