brainclaw 1.8.0 → 1.9.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +12 -11
- package/dist/brainclaw-vscode.vsix +0 -0
- package/dist/cli.js +138 -13
- package/dist/commands/add-step.js +1 -1
- package/dist/commands/bootstrap.js +2 -26
- package/dist/commands/check-security-mcp.js +50 -33
- package/dist/commands/check-security.js +86 -43
- package/dist/commands/claim.js +22 -21
- package/dist/commands/confirm.js +26 -0
- package/dist/commands/context-diff.js +1 -1
- package/dist/commands/dispatch-watch.js +142 -0
- package/dist/commands/doctor.js +113 -2
- package/dist/commands/estimation-report.js +115 -16
- package/dist/commands/harvest.js +285 -22
- package/dist/commands/init.js +123 -21
- package/dist/commands/loops-handlers.js +4 -0
- package/dist/commands/mcp-read-handlers.js +198 -29
- package/dist/commands/mcp.js +588 -92
- package/dist/commands/memory.js +21 -17
- package/dist/commands/migrate.js +81 -17
- package/dist/commands/prune.js +78 -4
- package/dist/commands/reflect.js +26 -20
- package/dist/commands/register-agent.js +57 -1
- package/dist/commands/repair.js +20 -0
- package/dist/commands/session-end.js +15 -6
- package/dist/commands/session-start.js +18 -1
- package/dist/commands/setup-security.js +39 -18
- package/dist/commands/setup.js +26 -27
- package/dist/commands/stale.js +16 -2
- package/dist/commands/uninstall.js +126 -34
- package/dist/commands/update-step.js +6 -0
- package/dist/commands/worktree.js +60 -0
- package/dist/core/actions.js +12 -3
- package/dist/core/agent-capability.js +11 -13
- package/dist/core/agent-files.js +844 -547
- package/dist/core/agent-integrations.js +0 -3
- package/dist/core/agent-inventory.js +67 -0
- package/dist/core/agent-registry.js +163 -29
- package/dist/core/agentrun-reconciler.js +33 -2
- package/dist/core/agentruns.js +7 -1
- package/dist/core/ai-agent-detection.js +31 -44
- package/dist/core/archival.js +15 -9
- package/dist/core/assignment-reconciler.js +56 -0
- package/dist/core/assignment-sweeper.js +127 -4
- package/dist/core/assignments.js +69 -11
- package/dist/core/bootstrap.js +233 -67
- package/dist/core/brainclaw-version.js +22 -0
- package/dist/core/candidates.js +21 -1
- package/dist/core/claims.js +313 -150
- package/dist/core/config.js +6 -1
- package/dist/core/context-diff.js +148 -20
- package/dist/core/context.js +129 -8
- package/dist/core/coordination.js +22 -3
- package/dist/core/dispatch-status.js +79 -5
- package/dist/core/dispatcher.js +64 -11
- package/dist/core/entity-operations.js +45 -24
- package/dist/core/entity-registry.js +31 -5
- package/dist/core/event-log.js +138 -21
- package/dist/core/events/checkpoint.js +258 -0
- package/dist/core/events/genesis.js +220 -0
- package/dist/core/events/journal.js +507 -0
- package/dist/core/events/materialize.js +126 -0
- package/dist/core/events/registry-post-image.js +110 -0
- package/dist/core/events/verify.js +109 -0
- package/dist/core/execution-adapters.js +23 -0
- package/dist/core/facade-schema.js +38 -0
- package/dist/core/gc-semantic.js +130 -5
- package/dist/core/handoff-snapshot.js +68 -0
- package/dist/core/ids.js +19 -8
- package/dist/core/instruction-templates.js +34 -115
- package/dist/core/io.js +39 -3
- package/dist/core/json-store.js +10 -1
- package/dist/core/lock.js +153 -28
- package/dist/core/loops/bootstrap-acquire.js +25 -1
- package/dist/core/loops/facade-schema.js +2 -0
- package/dist/core/loops/hooks/survey-signals-baseline.js +36 -0
- package/dist/core/loops/index.js +1 -0
- package/dist/core/loops/presets/bootstrap.js +7 -0
- package/dist/core/loops/store.js +17 -0
- package/dist/core/loops/verbs.js +24 -1
- package/dist/core/markdown.js +8 -76
- package/dist/core/mcp-command-resolution.js +245 -0
- package/dist/core/memory-compactor.js +5 -3
- package/dist/core/memory-lifecycle.js +282 -0
- package/dist/core/merge-risk.js +150 -0
- package/dist/core/messaging.js +8 -1
- package/dist/core/migration.js +11 -1
- package/dist/core/observer-mode.js +26 -0
- package/dist/core/operations/memory-mutation.js +90 -65
- package/dist/core/operations/plan.js +27 -1
- package/dist/core/protocol-skills.js +210 -0
- package/dist/core/reflection-safety.js +6 -7
- package/dist/core/reputation.js +84 -2
- package/dist/core/runtime-signals.js +71 -9
- package/dist/core/runtime.js +84 -1
- package/dist/core/schema.js +114 -0
- package/dist/core/security-detectors.js +125 -0
- package/dist/core/security-extract.js +189 -0
- package/dist/core/security-guard.js +107 -29
- package/dist/core/security-packages.js +121 -0
- package/dist/core/security-scoring.js +76 -9
- package/dist/core/security.js +34 -2
- package/dist/core/sequence.js +11 -2
- package/dist/core/setup-flow.js +141 -13
- package/dist/core/staleness.js +72 -1
- package/dist/core/state.js +250 -54
- package/dist/core/store-resolution.js +19 -5
- package/dist/core/worktree.js +72 -8
- package/dist/facts.js +8 -8
- package/dist/facts.json +7 -7
- package/docs/PROTOCOL.md +223 -0
- package/docs/cli.md +11 -10
- package/docs/concepts/coordinator-runbook.md +129 -0
- package/docs/concepts/event-log-store-critique-A.md +333 -0
- package/docs/concepts/event-log-store-critique-B.md +353 -0
- package/docs/concepts/event-log-store-phase0-measurements.md +58 -0
- package/docs/concepts/event-log-store-proposal-A.md +365 -0
- package/docs/concepts/event-log-store-proposal-B.md +404 -0
- package/docs/concepts/event-log-store.md +928 -0
- package/docs/concepts/identity-model-proposal.md +371 -0
- package/docs/concepts/memory.md +5 -4
- package/docs/concepts/observer-protocol.md +361 -0
- package/docs/concepts/parallel-merge-protocol.md +71 -0
- package/docs/concepts/plans-and-claims.md +43 -0
- package/docs/concepts/skills.md +78 -0
- package/docs/concepts/workspace-bootstrapping.md +61 -0
- package/docs/integrations/agents.md +4 -4
- package/docs/integrations/cline.md +10 -11
- package/docs/integrations/codex.md +2 -2
- package/docs/integrations/continue.md +5 -5
- package/docs/integrations/copilot.md +14 -12
- package/docs/integrations/openclaw.md +7 -6
- package/docs/integrations/overview.md +7 -7
- package/docs/integrations/roo.md +3 -3
- package/docs/integrations/windsurf.md +6 -6
- package/docs/mcp-schema-changelog.md +29 -2
- package/docs/quickstart.md +48 -47
- package/docs/security.md +174 -15
- package/docs/storage.md +4 -2
- package/package.json +8 -6
@@ -0,0 +1,928 @@
# Event-Log Store — Converged Design Spec

> Synthesis (round 3) of ideation loop lop_3bf55b9492e0d96c, pln#543 step 1.
> Distills proposal-A, proposal-B, and both cross-critiques. Where the two
> round-2 VERDICT blocks agree, this spec follows them; where they diverge,
> one option is chosen and the loser recorded in Appendix A. Status: SPEC,
> product calls ARBITRATED (Juan, 2026-06-10 — see §6); C1–C4 resolved by
> the symmetric schema review of 2026-06-10 (§2.1.1–2.1.4, §2.2, §2.10,
> §2.11); residue R1–R4 + new product call J6 in §6, pending the Codex
> second pass.

## 1. Motivation

The 2026-06-10 review (zone 1) found that `src/core/event-log.ts` cannot
serve as the store's source of truth in its current form: appends are
swallowed on error (`appendFileSync` inside a catch-all — a journal that may
silently drop writes is not a journal), events carry no payload (state is not
reconstructible from the log), rotation at 10MB renames the file away and
**deletes all reader cursors** (silent re-notification loss, history
unreachable), and the only ordering key is a wall-clock timestamp (unreliable
across agent shells, WSL, containers). Meanwhile every state mutation already
serializes through the hardened store lock, and loops already run a
payload-carrying journal (`loops/<id>/events.jsonl`) — the substrate and the
precedent both exist. This spec evolves the event log into a write-ahead
journal of full-entity snapshots, organized as immutable segments plus
out-of-band checkpoints, with the existing per-entity JSON directories
demoted to lazily reconciled projections (the pln#496 pattern).

## 2. Design

### 2.1 Event record format

Each record is one JSONL line, zod-validated, envelope version `v: 2`
(existing events are retroactively v1):

```jsonc
{
  "v": 2,
  "seq": 18342, // store-global monotonic, assigned under lock
  "ts": "2026-06-10T14:03:22.114Z", // informational ONLY — never an ordering key
  "writer": "w_31416-9f3c2a", // pid + start-nonce (NOT agent name, NOT bare pid)
  "agent": "claude-code",
  "agent_id": "agt_...", // optional
  "user": "jberdah", // optional
  "action": "update", // EventAction union (see payload rule below)
  "item_type": "plan",
  "item_id": "pln_2290bc70",
  "entity_rev": 7, // per-entity monotonic revision
  "summary": "step 1 completed", // human-facing, optional
  "payload": { /* full post-image, schema-valid entity doc */ }
}
```

Normative rules:

- **Payload = full entity snapshot** (post-image), never a diff. Required
  iff the action mutates a persisted entity; lifecycle/observability actions
  (`session_start`, notification verbs) are payload-free. The normative
  action-class → payload-requirement mapping is §2.1.1.
- **Tombstones**: `action: "delete"`, payload omitted. No redundant
  `deleted` boolean. Per-item_type semantics in §2.1.3.
- **`(seq, writer)` is the normative event identity.** Bare `seq` is an
  address, valid only where the lock guarantees held (see §2.2 anomaly
  handling). Federation idempotency keys, dedup, and the dup-seq reducer all
  use the pair.
- **`entity_rev`** is a per-entity monotonic revision bumped on every event
  for that id, carried in the envelope (entity schemas untouched). It powers
  projection dirty-checks, the never-regress guard (§2.7), optimistic
  concurrency for future API writes, and is the local half of federation
  conflict detection.
- **New event kinds** introduced by this spec: `checkpoint_ref` (§2.4),
  `journal_note` (§2.6), `seq_repair` (§2.2), `backfill` (§4), and
  `federation_apply` (declared now, emitted only once federation ships).
  Normative schemas in §2.1.2.
- **Writer identity** is pid + per-process random start-nonce. Pid reuse
  makes bare pid unreliable over a journal's lifetime; agent name is
  metadata, not identity.
- **Max record size, enforced at write time**: payloads > 64 KB are
  externalized via `payload_ref` (§2.10); the *envelope line* hard-fails at
  256 KB (with `payload_ref` no legitimate record approaches it). The cap is
  the tripwire that tells us when the snapshot-everywhere assumption expires
  (see falsifier, §2.8 — it fired on handoffs in phase 0, hence §2.10).
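
The envelope and the two write-time caps can be sketched in TypeScript as below. This is a minimal illustration, not brainclaw's implementation: `EventEnvelopeV2`, `checkRecordSize`, the `SizeDecision` values, the `payload_ref` field shape, and reading "64 KB" / "256 KB" as 64 × 1024 / 256 × 1024 UTF-8 bytes are all assumptions. The real code validates with zod rather than a hand-written type.

```typescript
// Hypothetical v2 envelope type mirroring the jsonc example above.
interface EventEnvelopeV2 {
  v: 2;
  seq: number; // store-global, assigned under the store lock
  ts: string; // informational only; never used for ordering
  writer: string; // pid + per-process start-nonce, e.g. "w_31416-9f3c2a"
  agent: string;
  agent_id?: string;
  user?: string;
  action: string;
  item_type: string;
  item_id?: string;
  entity_rev?: number;
  summary?: string;
  payload?: unknown;
  payload_ref?: string; // assumed shape; the real field is specified in §2.10
}

const PAYLOAD_INLINE_MAX = 64 * 1024; // assumption: 64 KB = 65,536 UTF-8 bytes
const ENVELOPE_LINE_MAX = 256 * 1024; // assumption: 256 KB = 262,144 UTF-8 bytes

// Byte length of a string once encoded as UTF-8 (JSONL lines are UTF-8).
function utf8Length(s: string): number {
  let n = 0;
  for (const ch of s) {
    const cp = ch.codePointAt(0)!;
    n += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return n;
}

type SizeDecision = "inline" | "externalize" | "reject";

// Decide at write time whether a record may carry its payload inline.
// After externalizing, the caller re-checks the rewritten envelope.
function checkRecordSize(rec: EventEnvelopeV2): SizeDecision {
  if (rec.payload !== undefined && utf8Length(JSON.stringify(rec.payload)) > PAYLOAD_INLINE_MAX) {
    return "externalize";
  }
  return utf8Length(JSON.stringify(rec)) > ENVELOPE_LINE_MAX ? "reject" : "inline";
}
```

Measuring the serialized line rather than the in-memory object keeps the check aligned with what actually lands in the segment file.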

#### 2.1.1 Action taxonomy → payload requirement (C1 resolution)

The v2 `action` field extends today's `EventAction` union
(`src/core/event-log.ts`, 34 members) with five journal actions
(`checkpoint_ref`, `journal_note`, `seq_repair`, and `federation_apply`,
which are journal-meta class, plus `backfill`, which is entity-state class
per §2.1.2) and three progress-split verbs (`run_progress`,
`assignment_amended`, `run_amended` — see the heartbeat/durable split
below), and classifies every action into exactly one of five classes.
`EventItemType` gains `journal` (journal-meta records) and, at registry
unification (§4 phase 3), `loop`.

**Encoding (R1 resolution, 2026-06-12):** the class is **never
serialized** — `action` remains the only discriminant on the wire. Code
carries one `ACTION_CLASS_BY_ACTION` table typed
`satisfies Record<EventAction, ActionClass>` so adding any new action
without classifying it is a compile error, and the zod schema is a
discriminated union keyed on `action` (zod ≥ 4 accepts enum values per
branch — 4.4.3 is the installed runtime dep). The phase-gated payload
requiredness of registry-lifecycle (OPTIONAL until phase 1.5) is a
runtime refinement selected by journal mode
(`off | dual | primary | registryPrimary`), not a frozen schema variant —
the wire format never changes across the cutover, only the validator
strictness. A serialized `class` field was rejected: a derived persisted
field is drift waiting to happen (same failure family as trp#180).

| Class | Actions | `payload` | `item_id` | `entity_rev` |
|---|---|---|---|---|
| entity-state | `create`, `update`, `accept`, `reject`, `claim`, `release_claim`, `rollback`, `upgrade`, `backfill` | REQUIRED — versioned post-image (§2.1.4) or `payload_ref` (§2.10) | REQUIRED | REQUIRED, bumped |
| tombstone | `delete` | FORBIDDEN | REQUIRED | REQUIRED, bumped |
| journal-meta | `checkpoint_ref`, `journal_note`, `seq_repair`, `federation_apply` | REQUIRED — meta-schema per action (§2.1.2), never an entity post-image | FORBIDDEN (`item_type: "journal"`) | absent |
| observability | `session_start`, `session_end`, `assignment_offered`*, `assignment_progress`†, `run_progress`† | FORBIDDEN | optional | absent |
| registry-lifecycle | `assignment_created/accepted/started/completed/cancelled/failed/blocked/timed_out/expired/retrying/rerouted`, `assignment_amended`†, `run_amended`†, all other `run_*` (`run_running` = the transition into running, emitted once†) | OPTIONAL until registry families go journal-primary (J4 phase 1.5); REQUIRED post-image from then on | REQUIRED | absent until phase 1.5, then REQUIRED, bumped |

\* `assignment_offered` is a status transition of the assignment doc and
moves to registry-lifecycle at phase 1.5; until then it is notification-only.

† See the heartbeat/durable-progress split below.
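
A minimal sketch of the R1 encoding follows: one class table checked with `satisfies`, plus per-class field rules derived from it at validation time. The `EventAction` union here is a deliberately truncated, hypothetical subset (the real union has 34 members plus the v2 additions). `rulesFor`, `violations`, and the `Requirement` values are illustrative names, and mapping "phase 1.5" to the `registryPrimary` journal mode is an assumption. `payload_ref` is ignored for brevity.

```typescript
type ActionClass = "entity-state" | "tombstone" | "journal-meta" | "observability" | "registry-lifecycle";

// Hypothetical subset of the v2 EventAction union, for illustration only.
type EventAction =
  | "create" | "update" | "backfill"
  | "delete"
  | "checkpoint_ref" | "journal_note" | "seq_repair" | "federation_apply"
  | "session_start" | "assignment_progress" | "run_progress"
  | "assignment_completed" | "assignment_amended" | "run_amended";

// `satisfies` rejects both a missing row and an unknown key at compile time,
// while `as const` keeps each value as its literal class.
const ACTION_CLASS_BY_ACTION = {
  create: "entity-state",
  update: "entity-state",
  backfill: "entity-state",
  delete: "tombstone",
  checkpoint_ref: "journal-meta",
  journal_note: "journal-meta",
  seq_repair: "journal-meta",
  federation_apply: "journal-meta",
  session_start: "observability",
  assignment_progress: "observability",
  run_progress: "observability",
  assignment_completed: "registry-lifecycle",
  assignment_amended: "registry-lifecycle",
  run_amended: "registry-lifecycle",
} as const satisfies Record<EventAction, ActionClass>;

type JournalMode = "off" | "dual" | "primary" | "registryPrimary";
type Requirement = "required" | "forbidden" | "optional";
interface Rules { payload: Requirement; itemId: Requirement; entityRev: Requirement; }

// Runtime refinement: only registry-lifecycle strictness depends on the mode.
function rulesFor(action: EventAction, mode: JournalMode): Rules {
  const cls: ActionClass = ACTION_CLASS_BY_ACTION[action];
  switch (cls) {
    case "entity-state":
      return { payload: "required", itemId: "required", entityRev: "required" };
    case "tombstone":
      return { payload: "forbidden", itemId: "required", entityRev: "required" };
    case "journal-meta":
      return { payload: "required", itemId: "forbidden", entityRev: "forbidden" };
    case "observability":
      return { payload: "forbidden", itemId: "optional", entityRev: "forbidden" };
    case "registry-lifecycle":
      return mode === "registryPrimary"
        ? { payload: "required", itemId: "required", entityRev: "required" }
        : { payload: "optional", itemId: "required", entityRev: "forbidden" };
  }
}

interface Candidate { action: EventAction; item_id?: string; entity_rev?: number; payload?: unknown; }

// Collect rule violations for one record; an empty list means it may be written.
function violations(rec: Candidate, mode: JournalMode): string[] {
  const r = rulesFor(rec.action, mode);
  const out: string[] = [];
  const check = (field: string, present: boolean, req: Requirement): void => {
    if (req === "required" && !present) out.push(`${field} missing`);
    if (req === "forbidden" && present) out.push(`${field} forbidden`);
  };
  check("payload", rec.payload !== undefined, r.payload);
  check("item_id", rec.item_id !== undefined, r.itemId);
  check("entity_rev", rec.entity_rev !== undefined, r.entityRev);
  return out;
}
```

Because the class is derived from `action` on every read, there is no persisted `class` value that could drift from the table.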

Holes found by the adversarial enumeration, resolved as follows:

- **`item_id` is optional in today's `MemoryEvent`.** v2 makes it REQUIRED
  for entity-state, tombstone, and registry-lifecycle records — a
  payload-carrying or rev-bumping record without an addressable entity is
  unreplayable and rejected at write time.
- **Heartbeat vs durable progress (pre-P0 codex review, 2026-06-12).**
  Today's code conflates the two on a single verb:
  `recordAssignmentProgress` persists `status_reason` and **appends to
  `artifacts`** (durable, accumulating state) on the same path that bumps
  `last_heartbeat_at`, then emits `assignment_progress`
  (`assignments.ts:296-301`); `recordAgentRunProgress` likewise mutates
  `session_id`/`status_reason`/`artifacts` and re-emits `run_running` on
  every tick (`agentruns.ts:318-325`). If those verbs stayed
  heartbeat-class/no-replay, journal-primary replay would silently drop
  artifacts reported mid-run. Normative invariant: **any event reflecting
  a durable-field mutation must be replayable by the phase its family goes
  journal-primary.** Resolution, effective at phase 1.5 (no write-path
  code change before then):
  - `assignment_progress` and `run_progress` (new) are **pure ticks** —
    observability-class, payload FORBIDDEN, touching only ephemeral-class
    fields (`last_heartbeat_at`, `updated_at`, `last_event_at`); excluded
    from replay, exactly as §2.8 masks them.
  - `run_running` re-scopes to the status **transition into** running —
    registry-lifecycle, emitted once per transition, never as a tick.
  - A progress call that carries durable mutations (`status_reason`,
    `artifacts`, `session_id` binding) emits `assignment_amended` /
    `run_amended` instead — registry-lifecycle, REQUIRED post-image,
    `entity_rev` bumped. The write path splits on argument presence;
    existing tests that assert artifacts-on-progress move to the amended
    verbs with the phase 1.5 migration.
- **Whole-store operations** (`rollback`, `upgrade` — today emitted once
  with `item_type: "state"`): in v2 the diff choke point (§2.8) emits them
  *per entity* (entity-state class, post-image each), plus one
  `journal_note` kind `store_marker` recording the store-level operation
  for audit. The coarse `item_type: "state"` event class disappears; the
  `state` item_type survives only inside `store_marker` notes.
- **Compactor archival vs deletion**: archival removal emits `delete` with
  `summary: "archived"` (the archived copy lives outside live dirs and is
  not journal-visible). Restore emits `create` continuing the entity_rev
  counter. **`entity_rev` is per item_id and never resets**, including
  across delete→recreate — required by federation LWW (§2.11).
- **Sessions stay observability-class.** `current_session` /
  `session_snapshot` docs remain projection-only (ephemeral-class, like
  heartbeats); if sessions ever need replay they move to entity-state, but
  nothing today consumes a replayed session.
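
The phase 1.5 write-path split "on argument presence" can be sketched as follows. `ProgressArgs`, `progressVerb`, `runRunningNeeded`, and treating an empty `artifacts` array as "no durable mutation" are illustrative assumptions, not the shipped signatures of `recordAssignmentProgress` or `recordAgentRunProgress`.

```typescript
type Family = "assignment" | "run";

// Hypothetical arguments a progress call may carry.
interface ProgressArgs {
  status_reason?: string;
  artifacts?: string[];
  session_id?: string; // binding a session is a durable mutation
}

type ProgressVerb = "assignment_progress" | "run_progress" | "assignment_amended" | "run_amended";

// A pure tick touches only ephemeral fields and stays observability-class;
// anything durable routes to the *_amended verb, which carries a post-image.
function progressVerb(family: Family, args: ProgressArgs): ProgressVerb {
  const durable =
    args.status_reason !== undefined ||
    (args.artifacts !== undefined && args.artifacts.length > 0) ||
    args.session_id !== undefined;
  if (family === "assignment") return durable ? "assignment_amended" : "assignment_progress";
  return durable ? "run_amended" : "run_progress";
}

// run_running is emitted only on the status transition into "running",
// never on a tick that merely observes the run is still running.
function runRunningNeeded(prevStatus: string, nextStatus: string): boolean {
  return prevStatus !== "running" && nextStatus === "running";
}
```

Splitting on the arguments rather than on the caller keeps the replay invariant local: any call that changes a durable field is guaranteed to produce a replayable record.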

#### 2.1.2 Journal-meta record schemas (C1 resolution)

All journal-meta records use `item_type: "journal"`, omit `item_id` and
`entity_rev`, and carry a payload discriminated as follows:

```jsonc
// checkpoint_ref — appended AFTER manifest fsync (§2.4)
{ "action": "checkpoint_ref", "item_type": "journal", "payload": {
    "file": "ckpt-00018000.json", // name under checkpoints/
    "sha256": "…", // hash of the manifest bytes
    "head_seq": 18342, // last seq the manifest covers
    "entities": 913, // live entity count
    "bytes": 481332,
    "blobs": ["…"] // payload_ref closure (§2.10); [] if none
} }

// journal_note — discriminated by payload.kind
{ "action": "journal_note", "item_type": "journal", "payload": {
    "kind": "torn_tail_adjudicated",
    "segment": "seg-00018000.jsonl",
    "byte_start": 104832, "byte_end": 105219,
    "sha256": "…" } } // hash of the adjudicated fragment
{ "payload": { "kind": "genesis", // phase-1 migration marker (§4)
    "migrated_from": "v1", "v1_events_parked": 17727,
    "backfill_count": 913, "tool_version": "…" } }
{ "payload": { "kind": "redaction", // J1 audit trail — doctor redact
    "segments": ["seg-00000001.jsonl"],
    "redacted": [{ "seq": 1234, "writer": "w_…" }],
    "reason": "…", "by": "…" } }
{ "payload": { "kind": "store_marker", // whole-store ops (§2.1.1)
    "op": "rollback", // "rollback" | "upgrade"
    "detail": "…" } }

// seq_repair — tail-validation correction (§2.2)
{ "action": "seq_repair", "item_type": "journal", "payload": {
    "meta_next_seq": 18301, // stale value found in meta.json
    "tail_seq": 18342, // observed at the active-segment tail
    "repaired_next_seq": 18343 } }

// federation_apply — local record of an applied remote slice
// (required by identity-model-proposal §"local apply"; declared NOW so
// the frozen v2 union needs no post-freeze extension — emitted only once
// federation ships, inert until then)
{ "action": "federation_apply", "item_type": "journal", "payload": {
    "origin_id": "org_a1b2…", "origin_epoch": 3,
    "seq_range": [120, 184], // remote seqs covered by the slice
    "applied": 64, // records materialized locally
    "conflicts": 1, // LWW losers surfaced as candidates (§2.11)
    "slice_sha256": "…" } } // hash of the ingested slice bytes
```

`backfill` is **entity-state class**, not journal-meta: normal envelope
with `item_type`/`item_id`/`entity_rev`/payload. Genesis (§4 phase 1) = one
`journal_note` kind `genesis` followed by one `backfill` per live entity
with `entity_rev: 1`, all under a single lock hold. Doctor-initiated
re-syncs reuse `backfill` with the entity's current rev + 1.
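
Genesis as described above can be sketched as a pure function that builds the records appended during the single lock hold. `LiveEntity`, `GenesisRecord`, `buildGenesis`, and the `"bclaw-migrate"` agent label are hypothetical. Only the record shapes (the `genesis` note fields, `backfill` with `entity_rev: 1`) follow §2.1.2, and the caller is assumed to already hold the lock and own `next_seq`.

```typescript
interface LiveEntity { item_type: string; item_id: string; doc: unknown; }

interface GenesisRecord {
  v: 2;
  seq: number;
  ts: string;
  writer: string;
  agent: string;
  action: "journal_note" | "backfill";
  item_type: string;
  item_id?: string;
  entity_rev?: number;
  payload: unknown;
}

// One genesis note, then one backfill per live entity, on consecutive seqs.
function buildGenesis(
  entities: LiveEntity[],
  firstSeq: number,
  writer: string,
  v1EventsParked: number,
  toolVersion: string,
  now: string,
): GenesisRecord[] {
  const common = { v: 2 as const, ts: now, writer, agent: "bclaw-migrate" }; // hypothetical agent label
  const records: GenesisRecord[] = [{
    ...common,
    seq: firstSeq,
    action: "journal_note",
    item_type: "journal", // journal-meta: no item_id, no entity_rev
    payload: {
      kind: "genesis",
      migrated_from: "v1",
      v1_events_parked: v1EventsParked,
      backfill_count: entities.length,
      tool_version: toolVersion,
    },
  }];
  entities.forEach((e, i) => {
    records.push({
      ...common,
      seq: firstSeq + 1 + i,
      action: "backfill", // entity-state class, full post-image
      item_type: e.item_type,
      item_id: e.item_id,
      entity_rev: 1,
      payload: e.doc,
    });
  });
  return records;
}
```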

#### 2.1.3 Tombstone semantics per item_type (C1 resolution)

- Payload FORBIDDEN; `item_id` REQUIRED; `entity_rev` bumped. The rev
  counter survives deletion (§2.1.1 — never resets per item_id).
- Projection unlink happens only when a tombstone is applied (§2.8). The
  never-unlink-unparseable guard **wins over the tombstone**: an
  unparseable file is preserved and the divergence is a *persistent,
  counted* doctor item — divergence-by-design, distinct from corruption.
- Singleton item types (`state`, `session`) never tombstone —
  schema-forbidden; an encountered one is a doctor error.
- Claims: lifecycle release is `release_claim` (entity-state, post-image
  with `status: released`); `delete` on a claim appears only from prune.
- Archival is `delete` + `summary: "archived"` (§2.1.1), not a distinct
  action.
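
The unlink rule and the guard that overrides it reduce to a small decision function. This is a sketch under assumptions: `TombstoneOutcome`, `onTombstone`, the `already-absent` outcome, and the boolean inputs are hypothetical names. How the projection layer actually detects an unparseable file is not specified here.

```typescript
type TombstoneOutcome =
  | { kind: "unlink" }
  | { kind: "already-absent" }
  | { kind: "preserve-divergent"; reason: string }
  | { kind: "doctor-error"; reason: string };

const SINGLETON_TYPES = new Set(["state", "session"]);

// Decide what applying a `delete` record does to the projection file.
function onTombstone(itemType: string, fileExists: boolean, fileParses: boolean): TombstoneOutcome {
  if (SINGLETON_TYPES.has(itemType)) {
    // Schema-forbidden: singletons never tombstone.
    return { kind: "doctor-error", reason: `singleton ${itemType} cannot be tombstoned` };
  }
  if (!fileExists) return { kind: "already-absent" };
  if (!fileParses) {
    // never-unlink-unparseable wins: keep the bytes, record a persistent doctor item
    return { kind: "preserve-divergent", reason: "unparseable projection kept" };
  }
  return { kind: "unlink" };
}
```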

#### 2.1.4 Payload schema versioning (C2 resolution)

Decision: **version-in-payload + migration-on-replay, reusing the existing
versioned-document registry** (`src/core/migration.ts`). No new envelope
field.

- Every entity payload — and every checkpoint post-image — is persisted
  exactly as its projection file is today: the document carries
  `schema_version` and is registered in the migration registry keyed by
  `VersionedDocumentType`.
- Replay runs each payload through the same detect → stepwise-migrate →
  zod-validate path that projections already use (`loadVersionedJsonFile`
  semantics). One mechanism, one registry, one set of migration tests —
  the journal adds zero new versioning machinery.
- The envelope's `v: 2` governs ONLY envelope shape (seq/writer/action
  fields). Envelope and payload version independently.
- **Migration-retention invariant (normative):** journal immutability makes
  migration paths load-bearing — a stepwise migration may never be deleted
  while any non-archived segment, or either of the **two** newest verified
  checkpoints, contains a payload at the pre-migration version. (Two, not
  one: the §2.4 fallback chain replays from the second-newest checkpoint,
  so the version floor is the state of the *second-newest* checkpoint.)
  Checkpoints rewrite post-images at current schema versions when written,
  so each checkpoint advances the floor; in the common case replay spans
  only the post-checkpoint tail (weeks of records, ≤ 1–2 schema versions).
  Archived segments may outlive migration paths: doctor warns
  "archive predates migration floor" rather than promising eternal
  replayability of archives.
- **Replay validation failure** (unknown version / migration throws / zod
  fails): skip + count + doctor (the §2.6 mid-file rule). If the failed
  record is the entity's *newest*, the projection keeps its current content
  (never-regress, §2.7) and the entity is flagged divergent — rebuild never
  silently regresses to the previous snapshot.
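
Migration-on-replay with the never-regress fallback can be sketched as below. `VersionedDoc`, the two-step `MIGRATIONS` table, `migrateToCurrent`, `replayEntity`, and the plain-object `validate` function standing in for zod are all hypothetical. brainclaw's registry in `src/core/migration.ts` is keyed by `VersionedDocumentType`, which this sketch collapses to a single document type.

```typescript
interface VersionedDoc { schema_version: number; [k: string]: unknown; }

// Hypothetical stepwise migrations for one document type: key n migrates n -> n + 1.
type Migration = (doc: VersionedDoc) => VersionedDoc;
const CURRENT_VERSION = 3;
const MIGRATIONS: Record<number, Migration> = {
  1: (d) => ({ ...d, schema_version: 2, tags: [] }),
  2: (d) => ({ ...d, schema_version: 3, status: d.status ?? "open" }),
};

// detect -> stepwise-migrate; a missing step (e.g. deleted too early) throws.
function migrateToCurrent(doc: VersionedDoc): VersionedDoc {
  let cur = doc;
  while (cur.schema_version < CURRENT_VERSION) {
    const step = MIGRATIONS[cur.schema_version];
    if (!step) throw new Error(`no migration from v${cur.schema_version}`);
    cur = step(cur);
  }
  if (cur.schema_version !== CURRENT_VERSION) throw new Error(`unknown version v${cur.schema_version}`);
  return cur;
}

// Stand-in for zod validation of the current schema.
function validate(doc: VersionedDoc): boolean {
  return typeof doc.title === "string" && Array.isArray(doc.tags);
}

interface ReplayResult { doc: VersionedDoc | undefined; skipped: number; divergent: boolean; }

// Replay one entity's payloads oldest-first. Failures are skipped and counted;
// if the newest one fails, keep the current projection instead of regressing.
function replayEntity(payloads: VersionedDoc[], currentProjection: VersionedDoc | undefined): ReplayResult {
  let doc: VersionedDoc | undefined;
  let skipped = 0;
  let newestFailed = false;
  for (let i = 0; i < payloads.length; i++) {
    try {
      const migrated = migrateToCurrent(payloads[i]);
      if (!validate(migrated)) throw new Error("validation failed");
      doc = migrated;
    } catch {
      skipped++; // a doctor counter in the real design
      if (i === payloads.length - 1) newestFailed = true;
    }
  }
  if (newestFailed) return { doc: currentProjection, skipped, divergent: true };
  return { doc, skipped, divergent: false };
}
```

A failure on an older payload is harmless here because later snapshots supersede it wholesale. Only a failed newest record needs the divergence flag.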

Alternatives rejected (recorded in Appendix A): per-record envelope
schema-version (redundant — payloads self-describe); segment-rewrite
migration (violates immutability and J1's audited-rewrite-only rule).

### 2.2 Seq and ordering

- `seq` is store-global, monotonic, persisted as `next_seq` in
  `events/meta.json`, incremented **under the store mutation lock**. Every
  append — including observability events — takes the lock and gets a seq.
  There is no lockless append path and no `seq: null` record class (a
  lockless path races segment roll, and seq-less records are unaddressable
  by seq-watermark cursors).
- **Timestamps never order anything.** `ts` is for humans and notification
  summaries.
- **Tail validation at lock acquisition (normative):** before its first
  append, a writer reads the last record of the active segment and sets
  `next_seq = max(meta.next_seq, tail_seq + 1)`. If meta was behind, it
  appends a `seq_repair` event recording the correction. This re-derives
  truth from the journal (meta is a cache) and caps seq collisions to the
  single in-flight race write.
- **Two writers are NOT impossible.** The lock can be broken on presumed
  owner death, and presumed death is fallible (pid-liveness false negatives
  on Windows, pid reuse). A duplicate `seq` from distinct writers is a
  **detected anomaly**: the reducer applies both records in file order
  (snapshot payloads make double-apply convergent — the later line wins
  wholesale), and doctor emits a warning. Detection via `(seq, writer)`;
  containment via tail validation above. The journal's two-writer story is
  only as rare as lock.ts's steal rate; the spec depends on lock.ts
  identifying owners by pid + random token (verified against today's
  `lockIsOwnedByCurrentProcess` — token-based, pid reuse alone cannot forge
  ownership).
- **Dup-seq reducer semantics (normative, C1 resolution).** Replay
  processes records strictly in (segment, file-line) order — never sorted
  by seq. Collision cases:
  1. Identical `(seq, writer)`, identical payload bytes → idempotent
     duplicate (e.g. ambiguous-retry residue): second occurrence skipped,
     doctor counter.
  2. Identical `(seq, writer)`, different content → doctor **ERROR**
     (should be impossible — a writer never reuses its own seq); both
     applied in file order, later wins, entity flagged.
  3. Same `seq`, different writers → the lock-steal anomaly above: both
     applied in file order, doctor **WARNING**.
  4. `entity_rev` ties produced by case 3 on the same entity: the later
     record in file order wins wholesale; the never-regress guard (§2.7)
     treats equal-rev-different-writer as a doctor-flagged overwrite, not
     a regression.
  5. `entity_rev` *gaps* during replay (expected prev+1, observed larger):
     doctor warning (possible lost event) — snapshot payloads self-heal
     state, the counter records that history is incomplete.
- **Scope boundary (stated so the assumption is visible when it breaks):**
  global-seq-under-lock welds event capture to lock availability. Sandboxed
  or worktree workers that cannot reach the store produce zero journal
  events until a sync point — the journal is the truth of the *store*, not
  of the *system*. The moment any roadmap item requires offline local event
  capture with later merge, this primitive is falsified and per-writer
  seqs + merge (the federation mechanism applied locally) become necessary.
  Until then, global seq costs zero new coordination and stays.
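
Tail validation and the five dup-seq cases can be sketched as one replay loop. Everything here is hypothetical scaffolding: `tailValidate`, `replay`, `Finding`, and the use of the full serialized line as the "identical payload bytes" test are assumptions. Records are assumed to arrive already in (segment, file-line) order, and the §2.7 never-regress guard is reduced to the equal-rev check.

```typescript
interface Rec {
  seq: number;
  writer: string;
  action: string;
  item_id?: string;
  entity_rev?: number;
  payload?: unknown;
}

// Tail validation: meta.json is only a cache, so the journal tail wins.
// Returns the seq to use next and, when meta was behind, a seq_repair payload.
function tailValidate(metaNextSeq: number, tailSeq: number | undefined) {
  const nextSeq = Math.max(metaNextSeq, (tailSeq ?? 0) + 1);
  const repair = nextSeq !== metaNextSeq
    ? { meta_next_seq: metaNextSeq, tail_seq: tailSeq, repaired_next_seq: nextSeq }
    : undefined;
  return { nextSeq, repair };
}

type Severity = "counter" | "warning" | "error";
interface Finding { severity: Severity; seq: number; note: string; }
interface EntityState { rev: number; writer: string; payload: unknown; deleted: boolean; }

function replay(lines: string[]) {
  const entities = new Map<string, EntityState>();
  const bySeq = new Map<number, Map<string, string>>(); // seq -> writer -> raw line
  const findings: Finding[] = [];
  const flagged = new Set<string>();

  for (const line of lines) {
    const rec = JSON.parse(line) as Rec;
    const writers = bySeq.get(rec.seq) ?? new Map<string, string>();
    const prior = writers.get(rec.writer);
    if (prior === line) {
      findings.push({ severity: "counter", seq: rec.seq, note: "idempotent duplicate skipped" }); // case 1
      continue;
    }
    if (prior !== undefined) {
      findings.push({ severity: "error", seq: rec.seq, note: "same (seq, writer), different content" }); // case 2
      if (rec.item_id !== undefined) flagged.add(rec.item_id);
    } else if (writers.size > 0) {
      findings.push({ severity: "warning", seq: rec.seq, note: "same seq from another writer" }); // case 3
    }
    writers.set(rec.writer, line);
    bySeq.set(rec.seq, writers);

    if (rec.item_id === undefined || rec.entity_rev === undefined) continue; // meta / observability
    const cur = entities.get(rec.item_id);
    if (cur && rec.entity_rev === cur.rev && rec.writer !== cur.writer) {
      findings.push({ severity: "warning", seq: rec.seq, note: `rev tie on ${rec.item_id}` }); // case 4
      flagged.add(rec.item_id);
    } else if (cur && rec.entity_rev > cur.rev + 1) {
      findings.push({ severity: "warning", seq: rec.seq, note: `rev gap on ${rec.item_id}` }); // case 5
    }
    // Snapshot payloads: the later line in file order wins wholesale.
    entities.set(rec.item_id, {
      rev: rec.entity_rev,
      writer: rec.writer,
      payload: rec.payload,
      deleted: rec.action === "delete",
    });
  }
  return { entities, findings, flagged };
}
```

Keying the collision map by `(seq, writer)` is what lets cases 1 to 3 be told apart. A bare-seq map could not distinguish a harmless retry from a lock steal.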

### 2.3 Segments and sealing

Layout:

```
.brainclaw/events/
  meta.json              # next_seq + per-family last_applied_seq — rebuildable cache
  seg-00000001.jsonl     # immutable once rolled; name = first seq it contains
  seg-00018000.jsonl     # active segment (newest = append target)
  checkpoints/
    ckpt-00018000.json   # self-contained state manifest (out-of-band, §2.4)
  quarantine/            # doctor-parked bytes only (offline repair, §2.6)
  archive/
    events.v1.jsonl      # parked legacy notification log (never deleted)
```

- Segments are **named by their first seq at birth and never renamed**.
  The active segment is simply the newest one. No rename means no Windows
  EPERM/EBUSY hazard, no retry protocol, no cursor invalidation. Locating
  seq N = directory listing + binary search by filename; no index file.
- Roll when the active segment ≥ 10 MB: under the lock, write a checkpoint
  (§2.4), create the next segment, update `meta.json`. Rolled segments are
  immutable — an invariant that holds because **all** appenders take the
  lock and resolve the active segment inside it.
- `meta.json` is a single small file (one read covers staleness checks for
  everything), rewritten atomically (temp+rename), and is a **rebuildable
  cache**: if missing or corrupt it is reconstructed from the segment
  listing plus a tail read of the last segment.
- **Retention**: sealed segments are never auto-deleted. `gc` may move
  segments superseded by a *verified* checkpoint to `archive/`
  (park-don't-delete), but never past the **second-newest verified
  checkpoint** — the previous checkpoint must remain replayable as the
  fallback chain (§2.4).
- **Support boundary**: journal correctness is guaranteed on local
  filesystems only (NTFS, ext4, APFS). O_APPEND atomicity does not hold on
  SMB/NFS. `bclaw doctor` performs best-effort (heuristic) detection of UNC
  paths and mapped network drives and warns; the boundary is documented,
  not silently assumed.
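
Locating a seq by filename and computing the gc archive floor can be sketched over a directory listing passed in as an array. `firstSeqOf`, `segmentStarts`, `segmentFor`, and `archivable` are hypothetical helpers, and the eight-digit zero-padded name pattern is inferred from the layout example. The choice to archive nothing until two verified checkpoints exist is this sketch's reading of the fallback-chain rule.

```typescript
const SEG_RE = /^seg-(\d{8})\.jsonl$/;

// First seq encoded in a segment filename, or undefined for other files.
function firstSeqOf(name: string): number | undefined {
  const m = SEG_RE.exec(name);
  return m ? Number(m[1]) : undefined;
}

// Sorted first-seqs of all segments in a directory listing.
function segmentStarts(listing: string[]): number[] {
  return listing
    .map(firstSeqOf)
    .filter((n): n is number => n !== undefined)
    .sort((a, b) => a - b);
}

// Binary search: the segment holding seq N has the largest first seq <= N.
function segmentFor(seq: number, starts: number[]): number | undefined {
  let lo = 0;
  let hi = starts.length - 1;
  let found: number | undefined;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (starts[mid] <= seq) {
      found = starts[mid];
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

// gc floor: a sealed segment may move to archive/ only if every seq it holds is
// covered by the second-newest verified checkpoint. The active segment never moves.
function archivable(starts: number[], verifiedHeadSeqs: number[]): number[] {
  const heads = [...verifiedHeadSeqs].sort((a, b) => b - a);
  if (heads.length < 2) return []; // no fallback checkpoint yet
  const floor = heads[1];
  const out: number[] = [];
  for (let i = 0; i + 1 < starts.length; i++) {
    const lastSeq = starts[i + 1] - 1; // a sealed segment ends just before the next begins
    if (lastSeq <= floor) out.push(starts[i]);
  }
  return out;
}
```

Because names never change, a sealed segment's last seq is derivable from its successor's name, so neither lookup nor gc needs to open a segment.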

### 2.4 Checkpoints

- A checkpoint is an **out-of-band, self-contained** manifest
  `checkpoints/ckpt-<seq>.json`: full post-images of every live entity at
  head seq ("self-contained" = manifest + its blob closure once
  `payload_ref` exists — §2.10). It never stores hashes that reference
  projection files: a checkpoint whose validity depends on projection
  integrity is useless in exactly the scenarios it exists for.
- Written under the lock at segment roll (and on `bclaw doctor --compact`):
  write manifest → fsync → append a `checkpoint_ref` event to the journal
  carrying the checkpoint's **sha256** → update meta last. A crash leaves
  at worst an orphan manifest with no ref (harmless) — cursors never see
  checkpoint content, the seq space is not inflated, and rebuild needs no
  terminator-scanning.
- **Verify before archive (normative):** a checkpoint must be fully
  re-parsed and schema-validated before any segment it supersedes moves to
  `archive/`. On checksum or parse failure at rebuild time, fall back to
  the previous checkpoint and replay more segments (guaranteed available by
  the two-checkpoint gc floor).
- Rebuild cost is bounded: latest verified checkpoint + replay of segments
  after it (≤ ~10 MB tail in the common case).
|
|
391
|
+
|
|
392
|
+
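Rebuild-time checkpoint selection with fallback can be sketched as a newest-first walk over the `checkpoint_ref` digests. In this minimal sketch the manifest shape, the `load` and `hash` callbacks, and the structural check that stands in for schema validation are all hypothetical.

```typescript
// Hypothetical shapes, for illustration only.
interface CheckpointRef {
  seq: number; // head seq the manifest captures
  sha256: string; // digest carried by the checkpoint_ref journal event
}

interface Manifest {
  seq: number;
  entities: Record<string, unknown>; // full post-images keyed by id
}

type Loader = (seq: number) => string | undefined; // raw manifest text, if present
type Hasher = (text: string) => string; // e.g. a sha256 hex digest

/**
 * Walk checkpoint_refs newest-first and return the first manifest whose digest
 * matches and which parses; replay then starts at seq + 1.
 */
function chooseCheckpoint(
  refs: CheckpointRef[],
  load: Loader,
  hash: Hasher,
): { manifest: Manifest; replayFrom: number } | null {
  const newestFirst = [...refs].sort((a, b) => b.seq - a.seq);
  for (const ref of newestFirst) {
    const text = load(ref.seq);
    if (text === undefined || hash(text) !== ref.sha256) continue; // missing or checksum failure
    let parsed: unknown;
    try {
      parsed = JSON.parse(text);
    } catch {
      continue; // parse failure: fall back to the previous checkpoint
    }
    const m = parsed as Partial<Manifest>;
    // Structural stand-in for real schema validation.
    if (typeof m.seq !== "number" || m.seq !== ref.seq) continue;
    if (typeof m.entities !== "object" || m.entities === null) continue;
    return { manifest: m as Manifest, replayFrom: m.seq + 1 };
  }
  return null; // no usable checkpoint: the caller must decide how to rebuild
}
```
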
### 2.5 Cursors

- `AgentCursor` = `{last_seq, last_read}` — a **seq watermark**. Rotation,
  compaction, archival, and any future segment surgery cannot invalidate a
  watermark. Byte offsets are dead (they die under any file mutation,
  including the offline repairs in §2.6).
- `readUnseenEvents(agent)` = binary-search the segment filenames for the
  one containing `last_seq + 1`, then stream forward across segments.
- If the watermark predates the oldest non-archived segment, the reader
  gets `{gap: true}` plus a summary built from the latest checkpoint —
  notifications degrade gracefully; state rebuild never depended on them.
- **Cursor key and self-exclusion.** Cursors are keyed today by agent
  *name*; the identity-model proposal re-keys them name → actor instance
  (its migration step 3 — one-time rename, cursors are caches). v2
  self-exclusion compares the record's `writer` (or actor id), never the
  display name: three same-name claude-code instances sharing one cursor
  and consuming each other's notifications was an observed incident
  (2026-06-10).

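Finding the start point for `readUnseenEvents` means handling the gap case first and then binary-searching the sorted segment start seqs. This sketch assumes each non-archived segment's first seq can be read from its filename (the naming scheme is not given here). `locateUnseen` is an illustrative name.

```typescript
// Hypothetical: each non-archived segment is identified by the first seq it holds.
interface SegmentInfo {
  name: string;
  firstSeq: number;
}

type Locate =
  | { gap: true } // watermark predates retained history: summarize from checkpoint
  | { gap: false; startIndex: number; fromSeq: number };

/** `segments` must be sorted by firstSeq ascending. */
function locateUnseen(lastSeq: number, segments: SegmentInfo[]): Locate {
  const target = lastSeq + 1;
  if (segments.length === 0) return { gap: false, startIndex: 0, fromSeq: target };
  if (target < segments[0].firstSeq) return { gap: true };
  // Binary search for the last segment whose firstSeq <= target.
  let lo = 0;
  let hi = segments.length - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (segments[mid].firstSeq <= target) lo = mid;
    else hi = mid - 1;
  }
  return { gap: false, startIndex: lo, fromSeq: target };
}
```

The reader then streams from `segments[startIndex]`, skipping records below `fromSeq`. Archival only changes which segments are listed, never the watermark, which is why seq cursors survive it.
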
### 2.6 Append protocol, framing, torn tails

- One record = **one single-buffer write** (`"\n" + JSON + "\n"`) to an fd
  opened append-only (O_APPEND / FILE_APPEND_DATA). The lock is the primary
  concurrency guarantee; single-write atomicity on local FS is the seatbelt
  for the lock-steal window.
- The **leading `\n`** caps torn-write damage at exactly one event: if the
  previous append tore (no trailing newline), our leading newline
  terminates the fragment as its own malformed line instead of letting our
  valid record be absorbed into it.
- **Short-write check**: `bytesWritten !== buffer.length` ⇒ throw inside
  `mutate()`; the mutation fails loudly before any projection write.
- **Append failures are loud.** The current error swallowing is removed for
  v2 state events: a failed journal append is a failed mutation.
- Reader rules (normative):
  1. Split on `\n`, skip empty lines.
  2. A mid-file line failing parse or schema validation: skip, count,
     surface via doctor — never silently (trp_d5595086).
  3. A torn **tail** (final line, unparseable or missing trailing `\n`) is
     expected crash residue: skip it. This is correct even when the torn
     line *parses* validly — journal-first + fsync-before-projection (§2.7)
     means an unconfirmed tail can always be dropped, because the caller
     was never told "ok".
- **No hot-path rewrites, ever.** The journal is append-only; nothing
  truncates or moves bytes during normal operation. When a writer (under
  lock, before appending) detects a torn tail, it appends a `journal_note`
  event recording the fragment's segment, byte range, and content hash as
  *adjudicated*. Doctor counts adjudicated fragments separately from
  unexplained mid-file corruption — benign crash residue never raises a
  permanent alarm (alarm fatigue is how real corruption later slips
  through). Physical excision of damaged bytes into `quarantine/` exists
  only as an **offline doctor repair** (doctor holds the lock, no
  concurrent appender, parks bytes, never deletes).

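The framing and reader rules fit in a few dozen lines. This sketch works on an in-memory segment string, not file descriptors. `frameRecord`, `assertFullWrite`, and `readSegment` are hypothetical names. The check that only requires a numeric `seq` stands in for the real schema validation.

```typescript
type JournalRecord = { seq: number; [k: string]: unknown };

/** One record = one buffer: leading "\n", JSON, trailing "\n". */
function frameRecord(record: JournalRecord): Uint8Array {
  return new TextEncoder().encode("\n" + JSON.stringify(record) + "\n");
}

/** Short-write check: a partial write fails the mutation loudly. */
function assertFullWrite(bytesWritten: number, buffer: Uint8Array): void {
  if (bytesWritten !== buffer.length) {
    throw new Error(`short write: ${bytesWritten}/${buffer.length} bytes`);
  }
}

// Stand-in for the real schema validation.
function isValidRecord(value: unknown): value is JournalRecord {
  return typeof value === "object" && value !== null && typeof (value as { seq?: unknown }).seq === "number";
}

interface ReadResult {
  records: JournalRecord[];
  malformedMidFile: number; // rule 2: surfaced via doctor, never silent
  tornTail: boolean; // rule 3: expected crash residue, skipped
}

function readSegment(text: string): ReadResult {
  const lines = text.split("\n");
  const endsClean = text.endsWith("\n");
  // Index of the final non-empty line: the only candidate for a torn tail.
  let last = lines.length - 1;
  while (last >= 0 && lines[last] === "") last--;

  const result: ReadResult = { records: [], malformedMidFile: 0, tornTail: false };
  for (let i = 0; i <= last; i++) {
    const line = lines[i];
    if (line === "") continue; // rule 1: skip empty lines
    const isTail = i === last;
    if (isTail && !endsClean) {
      result.tornTail = true; // rule 3: drop even if it would parse
      continue;
    }
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      parsed = undefined;
    }
    if (isValidRecord(parsed)) result.records.push(parsed);
    else if (isTail) result.tornTail = true; // rule 3: unparseable final line
    else result.malformedMidFile++; // rule 2: skip + count
  }
  return result;
}
```

Rule 3 is applied before parsing. A final line without its trailing `\n` is dropped even when it is valid JSON, because the writer never reported success for it.
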
### 2.7 Durability (fsync) and the journal-first invariant

- **Write order inside `mutate()` (the single most important invariant):**

  ```
  append v2 event(s) → fsync journal fd → write projection files → bump watermark in meta
  ```

  Program-order journal-first is fiction without a barrier: the OS may
  persist later projection writes before earlier journal appends, yielding
  a projection *from the future* that the journal cannot explain — which a
  reconciler would then wrongly regress (silent data loss).
- **Default: one `fsync` per `mutate()` call** — after the last append,
  before any projection write. Mutations are human-action frequency, not
  hot-loop; one fsync each is affordable on NTFS. Config escape hatch
  `store.journal.fsync: "mutation" | "never"`; **CI and tests run the prod
  default** (fidelity over speed, per the test-env-contamination history).
- **Never-regress guard (defense in depth — fsync can be configured off):**
  the reconciler refuses to overwrite a projection with replayed state that
  is *older* (lower `entity_rev`) than what the projection holds; a
  regressing mismatch is a doctor error, not a write.

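The write order and the never-regress guard can be sketched with the I/O steps injected, which also makes the order testable. `JournalIO`, `persistMutation`, and `reconcileDecision` are hypothetical. The equal-rev `skip` case comes from the ephemeral-field rule in §2.8.

```typescript
// Hypothetical interface: the spec fixes the order of these steps, not their names.
interface JournalIO {
  append(events: string[]): void;
  fsync(): void;
  writeProjections(ids: string[]): void;
  bumpWatermark(seq: number): void;
}

type FsyncMode = "mutation" | "never"; // mirrors store.journal.fsync

function persistMutation(
  io: JournalIO,
  events: string[],
  ids: string[],
  headSeq: number,
  fsyncMode: FsyncMode = "mutation",
): void {
  io.append(events); // 1. journal first
  if (fsyncMode === "mutation") io.fsync(); // 2. the barrier: one fsync per mutate()
  io.writeProjections(ids); // 3. projections can now only lag the journal
  io.bumpWatermark(headSeq); // 4. meta last
}

type ReconcileDecision = "apply" | "skip" | "regress-error";

/** Never-regress guard keyed on entity_rev. */
function reconcileDecision(projectionRev: number, replayedRev: number): ReconcileDecision {
  if (replayedRev > projectionRev) return "apply"; // fill the gap forward
  if (replayedRev === projectionRev) return "skip"; // equal rev: never overwrite (§2.8)
  return "regress-error"; // older replayed state: doctor error, not a write
}
```

With `"never"` the program order is unchanged but the barrier is gone. That is why the guard has to be an independent second line rather than a consequence of the ordering.
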
### 2.8 Projections and event emission

- Projections are exactly today's per-entity JSON files — atomic,
  pretty-printed, git-diffable. They remain the store's human-readable and
  MCP-cheap representation.
- **Staleness check is O(1)**: read `meta.json`, compare per-family
  `last_applied_seq` to `next_seq - 1`. Equal (the overwhelmingly common
  case) → serve projection files directly; the MCP worker-per-call fresh
  path adds one small file read. Behind → acquire the lock, replay only the
  gap onto the projection files, bump the watermark, serve. pln#496 lazy
  reconcile; no daemon.
- **Lock contended** → serve the stale projection annotated `stale: true`
  rather than block; whoever wins the lock heals once (no thundering herd
  of identical reconciles). Whether claim-class entities may be served
  stale is a Juan call (§6).
- **Emission = diff synthesis at the persist choke point, permanently,
  plus verb-site intent annotation.** `persistStateUnlocked` computes an
  id-level diff (created / changed / removed) against the loaded state and
  synthesizes snapshot events — a single choke point provably consistent
  with what was persisted, immune to call-site drift. To preserve verb
  semantics (`claim` vs `update` vs `complete` — consumed by notifications
  and federation signaling), verb sites declare
  `(action, item_type, item_id, summary)` into the in-flight mutation
  context (today's ~30 `appendEvent` call sites already pass exactly these
  fields; they redirect to the context instead of the legacy stream); the
  diff supplies the payload and emits any *unannotated* change as `update`
  plus a doctor counter. There is **no migration to explicit call-site
  event emission** — explicit emission is justified only for registries
  that never pass through `State` (assignments/runs/loops), and those reuse
  the same append+project primitive.
- **Deletion authority** (journal-primary mode): a projection file is
  unlinked only when a tombstone for its id is applied. "Absent from
  in-memory state" stops being a deletion signal — closing the
  trp_d5595086 bug class structurally. The never-unlink-unparseable guard
  carries over on the projection side. The **same id-level diff** that
  synthesizes events is what drives the unlink — one diff, two consumers,
  cannot disagree (today's `deleteMissing` path and event emission are
  separate code; v2 fuses them). The coarse `agent: "system"` /
  `item_type: "state"` ping today's `persistStateUnlocked` appends is
  replaced by the per-entity diff events.
- **Heartbeat-class churn is never journaled.** Refresh/liveness field
  updates (claim `expires_at` extensions, run `last_heartbeat_at`,
  assignment `last_progress`, lock metadata) are ephemeral —
  projection/registry layer only. Only lifecycle *transitions* (claimed,
  released, completed, failed) are events. Without this rule, 20 agents ×
  one heartbeat every 30 s (2,880 per agent per day) is 57,600 snapshots
  a day; at 2 KB each that is over 100 MB/day of journal for zero
  information.
- **Ephemeral-field masking (normative consequence).** Ephemeral fields
  mutate projections without journal events or `entity_rev` bumps, so a
  projection can legitimately differ from replayed state at *equal* rev.
  Therefore: the reconciler never overwrites a projection at equal rev (it
  only fills gaps forward), `doctor --verify-journal` masks the ephemeral
  field set per item_type before diffing, and the diff synthesizer above
  emits **no event** for ephemeral-only changes. The ephemeral field set
  is declared once per schema — a single source consumed by all three.
- **Falsifier (phase 0 deliverable):** from the dogfood store's 17k v1
  events, compute per-item_type p95 entity size × event frequency;
  instrument event bytes by action class during the dual-mode sprint. If
  any non-heartbeat class exceeds ~50% of journal bytes, or any record
  would exceed 64 KB, that type needs `payload_ref` or a delta format in
  phase 1, not deferred.

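The sketch below shows one id-level diff that feeds both event synthesis and unlinks, with verb annotations and ephemeral masking applied. It assumes flat entity objects and uses a hypothetical `tombstone` action name for removals. It omits the never-unlink-unparseable guard. `synthesize` and its parameter names are illustrative.

```typescript
type Entity = Record<string, unknown>;

interface SynthEvent {
  item_id: string;
  action: string;
  kind: "created" | "changed" | "removed";
}

interface DiffOutput {
  events: SynthEvent[];
  unlink: string[]; // projection files to remove, driven by the same diff
  unannotated: number; // doctor counter
}

/** Compare two entities with the ephemeral field set masked out. */
function sameIgnoring(a: Entity, b: Entity, ephemeral: Set<string>): boolean {
  const strip = (e: Entity) =>
    JSON.stringify(Object.keys(e).filter((k) => !ephemeral.has(k)).sort().map((k) => [k, e[k]]));
  return strip(a) === strip(b);
}

function synthesize(
  before: Map<string, Entity>,
  after: Map<string, Entity>,
  annotations: Map<string, string>, // item_id -> declared verb (claim, complete, ...)
  ephemeral: Set<string>,
): DiffOutput {
  const out: DiffOutput = { events: [], unlink: [], unannotated: 0 };
  const verbFor = (id: string): string => {
    const verb = annotations.get(id);
    if (verb !== undefined) return verb;
    out.unannotated++;
    return "update"; // unannotated change: emitted as update and counted
  };
  for (const [id, next] of after) {
    const prev = before.get(id);
    if (prev === undefined) {
      out.events.push({ item_id: id, action: verbFor(id), kind: "created" });
    } else if (!sameIgnoring(prev, next, ephemeral)) {
      out.events.push({ item_id: id, action: verbFor(id), kind: "changed" });
    } // ephemeral-only change: no event at all
  }
  for (const id of before.keys()) {
    if (!after.has(id)) {
      out.events.push({ item_id: id, action: "tombstone", kind: "removed" });
      out.unlink.push(id); // the unlink comes from the same diff as the event
    }
  }
  return out;
}
```

Events and unlinks come from the same loop, so the two consumers of the diff cannot disagree. That is the fusion of the `deleteMissing` path and event emission described above.
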
### 2.9 Locking interplay

- The journal lives **inside** the existing `mutate()` critical section. No
  new lock protocol. Seq assignment, appends, fsync, projection writes, and
  the watermark bump all happen under the one store lock, journal-first.
- Lock-steal residual (a breaker briefly coexisting with a
  stale-but-alive owner) is handled by detection + containment (§2.2), not
  denied. The phrase "impossible by construction" is banned.
- Lock-hold growth (fsync + reconciling readers) is instrumented, not
  assumed away: the phase-1 dual sprint records lock wait-time
  distribution. **Falsifier:** p95 lock wait > ~200 ms under normal
  multi-agent load falsifies global-seq-under-lock and forces the
  per-writer-journal redesign (§2.2 boundary). Note for the instrumented
  baseline: today's `persistStateUnlocked` already runs a **git commit**
  (`commitMemoryChange`) inside the critical section — the pre-existing
  dominant lock-hold term. The fsync the journal adds is marginal against
  it; attribute wait-time per phase (append/fsync/projection/git) so the
  falsifier indicts the right component.
- Federation imports must chunk: a 10k-event pull takes and releases the
  lock per chunk rather than starving local agents.

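A chunked federation import is a loop that re-enters the critical section once per chunk. In this sketch `withLock` and `applyChunk` are hypothetical callbacks standing in for the store lock and the append+project primitive. The chunk size is left to the caller because the spec does not fix one.

```typescript
function importInChunks<T>(
  events: T[],
  chunkSize: number,
  withLock: (fn: () => void) => void, // hypothetical: runs fn inside the store lock
  applyChunk: (chunk: T[]) => void, // hypothetical: append + project one chunk
): number {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error("chunkSize must be a positive integer");
  }
  let chunks = 0;
  for (let i = 0; i < events.length; i += chunkSize) {
    const chunk = events.slice(i, i + chunkSize);
    // Each chunk is its own critical section, so local agents can interleave.
    withLock(() => applyChunk(chunk));
    chunks++;
  }
  return chunks;
}
```

The lock is released between chunks, so the worst-case wait a local agent sees is bounded by one chunk rather than the whole pull.
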
### 2.10 Oversized payloads — `payload_ref` and the handoff diet (C3 resolution)

The phase-0 measurement (`event-log-store-phase0-measurements.md`) fired
the §2.8 falsifier: handoff entities are p50 109,700 B / p95 225,157 B —
roughly 1.7–3.4× the 64 KB threshold, so over it already at p50 — while
every other item_type sits at p95 ≤ 7.5 KB. Per C3's own rule this enters
**phase 1**; the record format ships with it. Two composable mechanisms,
**both adopted**:

1. **Handoff diet (primary fix).** The dominant bytes are the inline
   `snapshot.diff` (same root cause as the 41 MB
   `handoffs/compacted.jsonl`). Externalize `snapshot.diff` from the
   handoff document to a content-addressed attachment under
   `events/blobs/` referenced by hash; the handoff entity returns to the
   2–8 KB class every other entity lives in. One move fixes the journal
   record size, checkpoint size, J2's git posture, and the legacy
   compacted.jsonl pathology. The schema change rides the existing
   migration registry (§2.1.4). Product call J6 (§6) confirms portability
   implications.
2. **`payload_ref` (permanent safety net).** If a serialized payload
   exceeds 64 KB, the writer stores it at
   `events/blobs/<sha256[0:2]>/<sha256>` (content-addressed, write-once)
   and the record carries `payload_ref: { sha256, bytes }` *instead of*
   `payload`. Readers resolve transparently; a missing or hash-mismatched
   blob is a doctor **ERROR** for that entity — never silent.
   - **Blob-before-ref ordering (normative):** the blob is written and
     fsync'd *before* the journal append that references it — the §2.7
     barrier extended one link left. A crash between the two leaves an
     orphan blob (harmless, gc-able), never a dangling ref.
   - **Checkpoint closure:** checkpoints store oversized post-images as
     the same `payload_ref` (manifests stay small); the
     `checkpoint_ref.payload.blobs` list (§2.1.2) enumerates the closure.
     "Self-contained" (§2.4) is redefined as *manifest + blob closure*;
     verify-before-archive verifies the manifest hash AND presence + hash
     of every blob in the closure.
   - **Blob gc:** park-don't-delete. A blob moves to `archive/blobs/` only
     when referenced by zero records in non-archived segments AND by
     neither of the two newest verified checkpoints' closures — the §2.3
     floor extended verbatim.
   - **Redaction closure (J1 × `payload_ref`, normative — resolves the
     blocking half of R2, 2026-06-12):** `doctor redact` of a record whose
     payload lives in a blob must also **delete the blob** (true erasure —
     the one exception to park-don't-delete; an erasure request is not
     satisfied by parking) AND regenerate any checkpoint whose closure
     references it *before* the redaction completes — manifest rewritten
     minus the redacted post-image, re-verified, the stale checkpoint
     parked. The redaction `journal_note` (§2.1.2) lists rewritten
     checkpoints alongside segments. Invariant: after `doctor redact`
     returns, no live segment, no `archive/blobs/` entry, and no
     checkpoint closure can yield the redacted bytes. The *federation*
     half of R2 (peer re-presenting a pre-redaction record or checkpoint)
     stays open in §6 — it cannot be closed before the federation
     transport exists.
   - **Git (J2 boundary):** `events/blobs/` is gitignored like segments.
     With the diet in place no live entity ships an oversized payload, so
     bare-clone restorability from projections + checkpoints holds in
     practice; doctor flags any checkpoint whose closure references a
     gitignored blob as not-clone-restorable. This becomes a real product
     trade-off only if J6 rejects the diet.

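The sketch below covers the `payload_ref` threshold, the blob path layout, and the blob gc predicate, with the digest function injected. It assumes "64 KB" means 64 × 1024 bytes of UTF-8 JSON. `choosePayloadField`, `blobPath`, and `blobParkable` are illustrative names. Blob writing, fsync, and redaction are left out.

```typescript
const PAYLOAD_REF_THRESHOLD = 64 * 1024; // assumption: 64 KB = 65,536 bytes

type PayloadField =
  | { payload: unknown }
  | { payload_ref: { sha256: string; bytes: number }; blobPath: string };

/** events/blobs/<sha256[0:2]>/<sha256> */
function blobPath(sha256Hex: string): string {
  return `events/blobs/${sha256Hex.slice(0, 2)}/${sha256Hex}`;
}

function choosePayloadField(payload: unknown, digest: (s: string) => string): PayloadField {
  const serialized = JSON.stringify(payload);
  const bytes = new TextEncoder().encode(serialized).length;
  if (bytes <= PAYLOAD_REF_THRESHOLD) return { payload };
  const sha256 = digest(serialized);
  // The caller must write and fsync the blob BEFORE appending the record.
  return { payload_ref: { sha256, bytes }, blobPath: blobPath(sha256) };
}

/** Park a blob only if nothing live and neither of the two newest verified closures reference it. */
function blobParkable(
  sha256: string,
  liveRefs: Set<string>, // payload_refs in non-archived segments
  newestTwoClosures: Set<string>[], // blob lists of the two newest verified checkpoints
): boolean {
  return !liveRefs.has(sha256) && newestTwoClosures.every((closure) => !closure.has(sha256));
}
```
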
Residual falsifier follow-up: `runtime_note`/`session` event *count* (10k
of 17.7k v1 events) is volume, not bytes — both classes are payload-free
in v2 (observability), so they contribute line overhead only (~2–3 MB at
historical rates) and do not threaten the weekly-roll target. No per-class
retention knob needed ahead of J5.

### 2.11 Federation conflict primitive (C4 resolution)

Cross-checked against `identity-model-proposal.md` (origin-partitioned
write authority; scalar `entity_rev` + origin tag;
`(origin_id, origin_epoch, seq)`-headed slices — the epoch handles
restore-from-backup, see the proposal). Both symmetric reviews attacked the
same concurrent-edit hole independently and produced two complementary
detection mechanisms; this section reconciles them (coordinator synthesis
2026-06-11, flagged for Codex adjudication in §6 R-C4).

- **Execution entities** (claims, runs, locks, assignments): single-writer
  per origin; other origins materialize read-only. Authority partition
  means no concurrent-write conflict exists; the scalar is trivially
  sufficient. (The advisory cross-machine claim race is *arbitration*, not
  journal conflict — deferred to the cloud dispatcher per the proposal.)
- **Memory entities**: LWW ordered by the total order
  `(entity_rev, origin_id)` — rev first, origin_id lexicographic as the
  deterministic tiebreak. **No wall clock anywhere** (the "LWW by what
  clock?" answer: by revision counter + origin id, never time). Convergent:
  every origin applying the same record set reaches the same head.
- **The attack (resolution ≠ detection):** origin A edits entity e
  rev 7→8→9; origin B, offline, edits e 7→8. B's slice reaches A after A
  is at rev 9. *Resolution* is correct (9 > 8, deterministic LWW). But
  *detection* — the proposal's "conflicts surface as candidates, never
  silent overwrite" — cannot be decided from head comparison: B's rev-8 is
  concurrent with A's lineage, not an ancestor of it, and a bare scalar
  head cannot distinguish "stale copy of what I already incorporated" from
  "divergent edit with a lower rev".
- **Adopted detection rules (two, complementary — reconciled with the
  identity proposal's hardened model):**
  1. **PRIMARY — `base_rev` fast-forward check** (from the identity
     proposal, post-review): every *exported* memory-entity record carries
     `base_rev`, the rev the write was based on. Receiver rule: incoming is
     a clean fast-forward iff `incoming.base_rev >= current.rev`; otherwise
     the write was concurrent → LWW materializes the winner AND a conflict
     candidate carries both post-images. One integer per exported record,
     decided from the record alone — **independent of local history
     retention**, so it survives gc/compaction and works on a fresh
     materialize.
  2. **DEFENSE-IN-DEPTH — `(rev, origin)` journal collision at replay**:
     import replays the incoming slice through the reducer; an incoming
     record whose `(item_id, entity_rev)` already exists locally **with a
     different origin** is a conflict (the §2.2 dup-detection generalized
     from `(seq, writer)`). Catches legacy/foreign slices lacking
     `base_rev` and cross-checks rule 1, at zero envelope cost — but only
     reaches back to the gc floor.

  In the attack above, both rules fire: B's record has `base_rev 7 <
  current rev 9` (rule 1) and B's (e, 8) collides with A's journaled
  (e, 8) (rule 2) → candidate surfaced while LWW keeps A's rev 9.
- **Residual miss-window, now narrow:** only a record that *lacks*
  `base_rev` (legacy exporter) AND whose colliding rev is archived past
  the gc floor escapes surfacing — convergence still never breaks.
  The per-origin high-watermark map (a bounded vector clock, size =
  origin count, typically ≤ 3) remains the **named upgrade path** if dogfooding
  shows missed candidates; it slots into import metadata without touching
  the envelope, which stays origin-agnostic (origin appears only in
  exported slice headers, per the proposal's migration step 2).
- **Cross-requirement flowing back to the identity proposal:**
  `entity_rev` must never reset per item_id — tombstone → recreate
  continues the counter (§2.1.1/§2.1.3) — otherwise (rev, origin)
  collisions become false positives after delete→recreate races.

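The two detection rules and the LWW order can be checked against the attack scenario in a small sketch. Field names beyond `entity_rev`, `origin_id`, and `base_rev` are hypothetical, as are `lwwWinner` and `importMemoryRecord`. The tiebreak direction (larger `origin_id` wins at equal rev) is an assumption, since the text only says lexicographic.

```typescript
interface MemoryRecord {
  item_id: string;
  entity_rev: number;
  origin_id: string;
  base_rev?: number; // absent on legacy/foreign exports
}

/** Total order (entity_rev, origin_id); no wall clock involved. */
function lwwWinner(a: MemoryRecord, b: MemoryRecord): MemoryRecord {
  if (a.entity_rev !== b.entity_rev) return a.entity_rev > b.entity_rev ? a : b;
  return a.origin_id >= b.origin_id ? a : b; // assumed tiebreak direction
}

interface ImportVerdict {
  head: MemoryRecord; // materialized winner
  conflictCandidate: boolean; // surfaced with both post-images
  reasons: string[];
}

function importMemoryRecord(
  current: MemoryRecord,
  incoming: MemoryRecord,
  localJournal: MemoryRecord[], // retained history back to the gc floor
): ImportVerdict {
  const reasons: string[] = [];
  // Rule 1 (primary): clean fast-forward iff incoming.base_rev >= current.rev.
  if (incoming.base_rev !== undefined && incoming.base_rev < current.entity_rev) {
    reasons.push("base_rev-behind");
  }
  // Rule 2 (defense in depth): same (item_id, entity_rev) journaled by another origin.
  const collides = localJournal.some(
    (r) => r.item_id === incoming.item_id && r.entity_rev === incoming.entity_rev && r.origin_id !== incoming.origin_id,
  );
  if (collides) reasons.push("rev-origin-collision");
  return { head: lwwWinner(current, incoming), conflictCandidate: reasons.length > 0, reasons };
}
```

Resolution and detection are computed independently. LWW always picks a head, and the conflict flag only decides whether a candidate is surfaced next to it.
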
## 3. Failure-mode matrix

| # | Scenario (round-2 attack) | Mitigation in this spec |
|---|---|---|
| 1 | Crash mid-append (torn tail) | Leading-`\n` framing caps loss at 1 event; reader skips tail; next writer appends adjudicating `journal_note`; doctor counts adjudicated residue separately from corruption (§2.6) |
| 2 | Torn line that parses validly | Dropped anyway: journal-first + fsync means an unconfirmed tail was never acknowledged to the caller (§2.6 rule 3) |
| 3 | Crash between append and projection write | Projection stale, never ahead (fsync barrier §2.7); lazy reconcile heals forward on next read |
| 4 | Projection from the future (no-fsync reorder) | One fsync per mutate before projection writes; never-regress guard keyed on `entity_rev` as second line (§2.7) |
| 5 | Two writers in the lock-steal window | O_APPEND seatbelt prevents byte interleaving; duplicate seq detected via `(seq, writer)`, applied in file order (snapshot double-apply is convergent), doctor warns (§2.2) |
| 6 | Seq counter corruption outliving the race (both writers rewrite meta, loser's bump lost, third writer reuses seq) | Tail validation at lock acquisition: `next_seq = max(meta, tail+1)` + `seq_repair` event; meta is a rebuildable cache, the journal is truth (§2.2) |
| 7 | Lockless appender writes into a just-rolled "immutable" segment | No lockless path exists; all appends take the lock and resolve the active segment inside it (§2.2, §2.3) |
| 8 | Crash mid-checkpoint | Out-of-band manifest; worst case orphan file with no `checkpoint_ref` (harmless); meta written last (§2.4) |
| 9 | Corrupt checkpoint discovered after segments archived | Verify-by-full-re-parse before archival; sha256 in `checkpoint_ref`; previous-checkpoint fallback; gc floor = second-newest verified checkpoint (§2.4) |
| 10 | Oversized record exits the O_APPEND atomicity envelope | Payloads > 64 KB externalized via `payload_ref` (§2.10); envelope line hard-fails at 256 KB (§2.1) |
| 11 | Partial `write()` (signal, ENOSPC, quota) | Short-write check ⇒ loud mutation failure before projections (§2.6) |
| 12 | Rotation/sealing during concurrent read | Segments never renamed; active segment is just the newest file; seq watermarks survive any layout change (§2.3, §2.5) |
| 13 | Cursor predates archived history | `{gap: true}` + checkpoint-built summary; graceful notification degradation (§2.5) |
| 14 | Clock skew / ts collision | Irrelevant — ts never orders (§2.2) |
| 15 | 100k-event store cold read | Fresh path O(1) check + projection read; stale path replays only the gap; rebuild bounded by latest checkpoint (§2.4, §5) |
| 16 | `meta.json` corrupt/lost | Rebuilt from segment listing + tail read — it is a cache, not truth (§2.3) |
| 17 | Heartbeat churn floods segments (20-agent scale) | Heartbeat-class updates excluded from the journal by rule; volume falsifier instrumented (§2.8) |
| 18 | Store on a network mount | Documented local-FS-only support boundary; doctor warns heuristically (§2.3) |
| 19 | Wedged lock = no event capture; sandboxed workers can't append | Stated scope boundary: journal is truth of the store, not the system; offline capture falsifies the primitive and triggers the per-writer redesign (§2.2) |
| 20 | Mid-file malformed line (should be impossible under lock) | Skip + count + doctor alarm (unexplained-corruption class), never silent (§2.6) |
| 21 | Crash between blob write and referencing append | Blob-before-ref ordering: worst case an orphan blob (harmless, gc-able), never a dangling `payload_ref` (§2.10) |
| 22 | `payload_ref` blob missing or hash-mismatched at read | Doctor ERROR for that entity, read fails loudly — never silent (§2.10) |
| 23 | Replay diff flags ephemeral-only field drift as divergence | Ephemeral field set masked per item_type in verify-journal and the reconciler; equal-rev projections never overwritten (§2.8) |

## 4. Migration plan

Flag: `store.journal_v2: off | dual | primary` (default `off`). Each phase
ships dark behind the flag; this repo's own store (~17k v1 events of real
multi-agent traffic) is the canary. A `.brainclaw/` backup is taken at every
phase flip (upgrade-style, park-don't-delete).

- **Phase 0 — format, no behavior change.** Land the v2 record schema (zod),
  segment reader/writer, meta cache, doctor counters,
  max-record-size enforcement, and the **snapshot-size falsifier
  measurement** (§2.8). v1 `events.jsonl` untouched.
- **Phase 1 — `dual`: journal-first dual-write.** One-shot
  `bclaw migrate journal`: back up the store; emit a **genesis backfill** —
  one `backfill` snapshot event per current entity, built from the
  projection files (the only truth we have; the 17k payload-less v1 events
  are not translatable — parked to `events/archive/events.v1.jsonl`,
  readable forever for forensics); initialize meta. `persistStateUnlocked`
  reorders to append → fsync → existing file writes → watermark.
  Notifications switch to seq-watermark cursors. State dirs remain
  authoritative. Phase 1 also lands `payload_ref` + the handoff diet
  (§2.10) — the phase-0 falsifier fired on handoffs, so the record format
  ships with both.
  **Rollback:** set `off` — projection files were written on every mutation
  in exactly today's format; park `events/`; zero data transformation in
  either direction.
- **Phase 2 — verification (promotion gate).**
  `bclaw doctor --verify-journal` rebuilds state from
  checkpoint + journal in a temp dir and diffs against live projections —
  the only check that validates the actual claim ("the journal is
  sufficient to reproduce state"). Runs in CI on **both OS families**,
  alongside: kill-9 storm tests (crash between append and projection must
  always converge), a two-process append stress test (N children × K
  events; assert no interleaved bytes, no lost `(seq, writer)` pairs), and
  the tail-validation test. Doctor counters (skipped lines, torn tails,
  adjudicated fragments, unannotated-diff emissions, network-FS warning)
  run always-on as continuous telemetry. **Exit criterion:** zero
  divergence across a full dogfooding sprint of real multi-agent traffic,
  including dispatch worktree churn; lock wait-time distribution recorded
  (§2.9 falsifier).
- **Phase 3 — `primary`.** Reads serve projections via lazy reconcile;
  deletion authority moves to tombstones; `mutateState` callers unchanged.
  Then per-entity ops: single-entity mutations append + patch one
  projection file without full-store load/rewrite; registries
  (assignments/runs/loops) unify on the same append+project primitive
  (entry phase is a Juan sequencing call, §6). **Rollback:** projections
  are at all times a complete materialized state in legacy format — flip
  to `dual` or `off`, re-arm legacy delete semantics, no data
  transformation.

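Below is a lookup-table sketch of what the flag changes for each mutation. The step names are illustrative, not the implementation's. The fsync filter mirrors the `store.journal.fsync` escape hatch from §2.7.

```typescript
type JournalMode = "off" | "dual" | "primary"; // store.journal_v2
type FsyncConfig = "mutation" | "never"; // store.journal.fsync

// Illustrative step names; not the implementation's identifiers.
const BASE_STEPS: Record<JournalMode, readonly string[]> = {
  // Today's behavior: projection files are the only write.
  off: ["write-projections", "legacy-delete-missing"],
  // Journal-first dual write; state dirs stay authoritative.
  dual: ["append", "fsync", "write-projections", "legacy-delete-missing", "bump-watermark"],
  // Deletion authority moves to tombstones.
  primary: ["append", "fsync", "write-projections", "unlink-on-tombstone", "bump-watermark"],
};

function stepsFor(mode: JournalMode, fsync: FsyncConfig = "mutation"): string[] {
  // "never" drops the barrier but keeps journal-first program order.
  return BASE_STEPS[mode].filter((step) => fsync === "mutation" || step !== "fsync");
}
```

Every mode writes projections in today's format. That is why each rollback above is a flag flip rather than a data transformation.
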
### Phase 2 gate status (pln#565, 2026-06-12)

The promotion gate is now **mechanically checkable** via one command and an
automated hardening suite. Status of each Phase-2 exit criterion:

| Criterion | Status | Evidence |
| --- | --- | --- |
| Journal reproduces projections (the core claim) | ✅ | `brainclaw doctor --verify-journal` — rebuilds from journal, diffs vs live projections, exit 1 on drift. GREEN on this repo's store (mode=dual). |
| Tail validation / torn-tail adjudication | ✅ | `journal-v2.test` (torn tail → `torn_tail_adjudicated`, stale meta → `seq_repair`). |
| Two-process append stress | ✅ | `journal-concurrency.test` — N processes × K appends, gap-free 1..N*K seq, N distinct writers, zero torn/lost. |
| Kill-9 storm convergence (append path) | ✅ | `journal-concurrency.test` — SIGKILL mid-append storm: journal stays readable, seqs never duplicate, post-storm append re-derives a non-colliding seq, state still materializes. |
| **Persist crash-ordering — journal before projections (I2)** | ✅ | pln#566 F1 (codex review): persist now PLANs → emits+fsyncs the journal → APPLIES projection writes, so a crash can only leave the journal ahead (recoverable), never projections ahead. Proven by `journal-crash-ordering.test` via deterministic fault injection on the real `mutateState` pipeline. (Earlier the kill-9 test exercised `forceAppendJournalRecords` directly, not the mutation pipeline — that gap is now closed.) |
| Migration + rollback tooling | ✅ | genesis backfill + `rollbackJournal` (park `events/`, projections untouched). |
| Dual-OS CI | ✅ | `.github/workflows/ci.yml` matrix `[ubuntu, windows]`. |
| **Zero divergence across a real multi-agent sprint** | ✅ | seq#47 (2026-06-12): 4 parallel claude-code lanes + dispatch worktree churn + 4 merges → `verify-journal` zero drift throughout. |
| Lock wait-time distribution (§2.9 falsifier) | ◐ | Lock serialization proven under contention by `journal-concurrency.test`; explicit p50/p95 telemetry via doctor counters is the one remaining instrumentation item — lands with the cutover (it touches the mutate hot path). |

**Verdict:** the correctness gate is GREEN. The only residual is wait-time
*telemetry* (not a correctness blocker). The primary cutover (Phase 3) is a
Juan sequencing call (§6) and a distinct implementation workstream
(tombstones + per-entity append/patch), not gated on more verification.

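The stress-test invariants (gap-free `1..N*K` seqs, N distinct writers, no lost `(seq, writer)` pairs) can be written as a checker over the collected records. This sketch uses a hypothetical record shape and an illustrative name, `checkStressRun`. It reads "no lost pairs" as every writer contributing exactly K records.

```typescript
interface StressRecord {
  seq: number;
  writer: string;
}

/** Returns a list of problems; an empty list means the run passed. */
function checkStressRun(records: StressRecord[], n: number, k: number): string[] {
  const problems: string[] = [];
  const total = n * k;
  if (records.length !== total) problems.push(`expected ${total} records, got ${records.length}`);
  const seqs = new Set(records.map((r) => r.seq));
  if (seqs.size !== records.length) problems.push("duplicate seq");
  for (let s = 1; s <= total; s++) {
    if (!seqs.has(s)) {
      problems.push(`missing seq ${s}`);
      break; // the first gap is enough to fail the run
    }
  }
  const perWriter = new Map<string, number>();
  for (const r of records) perWriter.set(r.writer, (perWriter.get(r.writer) ?? 0) + 1);
  if (perWriter.size !== n) problems.push(`expected ${n} writers, got ${perWriter.size}`);
  for (const [writer, count] of perWriter) {
    if (count !== k) problems.push(`writer ${writer} has ${count} records, expected ${k}`);
  }
  return problems;
}
```
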
## 5. Perf targets (measured, not asserted)
|
|
777
|
+
|
|
778
|
+
- `bclaw_work` cold read < 1 s on a 100k-event store.
- Single-entity op cost independent of store size: O(1) append + O(1)
  projection write + O(gap) reconcile.
- MCP worker-per-call overhead delta < 50 ms vs. today (fresh path = one
  extra small meta read).
- One fsync per `mutate()`; lock p95 wait < 200 ms under normal multi-agent
  load (falsifier threshold, §2.9).
- Segment roll ≈ every 2–3 weeks at current write rates (after heartbeat
  exclusion); checkpoint cost O(live entities) under lock.

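The lock-wait falsifier in the fourth target reduces to a percentile check over wait samples. A minimal sketch, assuming nearest-rank percentiles over per-`mutate()` wait times in milliseconds; `percentile`, `LOCK_P95_BUDGET_MS`, and `lockFalsifierFires` are hypothetical names, not existing doctor counters.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samplesMs: readonly number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Multiply before dividing so integer inputs avoid floating-point rounding surprises.
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Falsifier threshold from the perf target above (hypothetical constant name).
const LOCK_P95_BUDGET_MS = 200;

// The target is "p95 < 200 ms", so the falsifier fires at p95 >= 200 ms.
function lockFalsifierFires(samplesMs: readonly number[]): boolean {
  return percentile(samplesMs, 95) >= LOCK_P95_BUDGET_MS;
}
```

Nearest-rank is one of several percentile definitions; an interpolating definition would report slightly different p95 values on small sample sets.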
## 6. OPEN QUESTIONS

Severity-ranked. Every open question from round 2 not resolved by this spec
is carried here.

### [JUAN — product calls] — RESOLVED 2026-06-10

| # | Sev | Decision |
|---|---|---|
| J1 | HIGH | **`doctor redact` ships in v1.** Immutability is "immutable except via audited `doctor redact`": tooled segment rewrite, audit-trailed, seq watermarks survive it. Rationale: the EU/GDPR positioning cannot answer "impossible" to an erasure request. (Write-time secret-detection may complement it later; it does not replace redaction.) |
| J2 | HIGH | **Projections + checkpoints in git; segments and meta gitignored.** The store's git-diffable identity = the per-entity projections (diff/merge as today) plus checkpoints (single-file snapshots a human can adjudicate in a merge, making a bare git clone restorable without segments). No segment blobs in history; the branched-seq merge problem never enters git. |
| J3 | MED | **Read-through for claim-class entities.** Claims and active assignments read the journal tail even under contention — consistency before liveness for the coordination primitive (no double-work is the product promise). Tail-read cost is paid only on this hot-critical path; memory-class entities keep stale-annotated reads (§2.8). |
| J4 | MED | **Registry enters in a dedicated Phase 1.5.** Phase 1 = memory entities (low volume, proven reversibility); registry lifecycle transitions migrate once the journal is hardened in real use. Matches the off/dual/primary posture: the dispatch lifecycle is the product's credibility — it is not migrated first. |
| J5 | LOW | **Defer fine gc/archive thresholds.** The normative two-verified-checkpoint floor stands alone until federation defines its consumer; count/age knobs are a trivial additive change later. |

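J3's split read path can be sketched as follows, assuming each projection records the journal seq it reflects and that the journal tail is available as ascending post-image records. `EntityClass`, `readEntity`, and the `stale` flag are illustrative names, not the shipped API. Claim-class reads fold the tail (consistency before liveness); memory-class reads return the projection and only annotate staleness.

```typescript
type EntityClass = "claim" | "memory";

interface Projection { value: string; seq: number } // seq = last journal record applied
interface TailRecord { seq: number; entityId: string; postImage: string }
interface ReadResult { value: string; seq: number; stale: boolean }

function readEntity(
  cls: EntityClass,
  entityId: string,
  projection: Projection,
  journalTail: readonly TailRecord[], // ascending by seq
): ReadResult {
  if (cls === "claim") {
    // Read-through: fold every newer record for this entity before answering.
    let { value, seq } = projection;
    for (const r of journalTail) {
      if (r.entityId === entityId && r.seq > seq) {
        value = r.postImage;
        seq = r.seq;
      }
    }
    return { value, seq, stale: false };
  }
  // Memory-class: serve the projection as-is, flagged if the journal has moved past it.
  const behind = journalTail.some((r) => r.entityId === entityId && r.seq > projection.seq);
  return { value: projection.value, seq: projection.seq, stale: behind };
}
```

The tail scan is paid only on the claim branch, which is the cost J3 accepts for the coordination primitive.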
### [JUAN — new product call raised by C3]

| # | Sev | Question |
|---|---|---|
| J6 | MED | **Handoff diet (§2.10):** externalize `snapshot.diff` from handoff documents to content-addressed blob attachments. Affects handoff export/import and federation transfer (the blob closure must travel with the document). Recommended: **accept** — it also fixes the 41 MB `compacted.jsonl` class and keeps J2's bare-clone restorability intact. |

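A minimal sketch of the J6 handoff diet, assuming SHA-256 content addressing via Node's `node:crypto` and an in-memory blob store. `HandoffDoc`, `BlobStore`, `externalizeDiff`, `blobClosure`, the `diff_ref` field, and the `sha256:` ref format are hypothetical. The ordering is the point: the blob is stored before the slimmed document that references it is produced, so a crash in between can leave an orphan blob but never a dangling ref.

```typescript
import { createHash } from "node:crypto";

// Hypothetical handoff shape: `diff` is the heavy inline field, `diff_ref` its replacement.
interface HandoffDoc {
  id: string;
  summary: string;
  snapshot: { diff?: string; diff_ref?: string };
}

// In-memory stand-in for a content-addressed blob directory.
class BlobStore {
  private readonly blobs = new Map<string, string>();

  put(content: string): string {
    const ref = "sha256:" + createHash("sha256").update(content, "utf8").digest("hex");
    if (!this.blobs.has(ref)) this.blobs.set(ref, content); // same bytes, same ref: idempotent
    return ref;
  }

  get(ref: string): string | undefined {
    return this.blobs.get(ref);
  }
}

// Blob-before-ref: the blob exists before any document points at it.
function externalizeDiff(doc: HandoffDoc, blobs: BlobStore): HandoffDoc {
  const { diff, ...rest } = doc.snapshot;
  if (diff === undefined) return doc;
  const ref = blobs.put(diff); // step 1: store the blob
  return { ...doc, snapshot: { ...rest, diff_ref: ref } }; // step 2: emit the slim document
}

// Blobs that must travel with the document on export or federation transfer.
function blobClosure(doc: HandoffDoc): string[] {
  return doc.snapshot.diff_ref !== undefined ? [doc.snapshot.diff_ref] : [];
}
```

Because refs are content-addressed, identical diffs deduplicate to one blob, and `blobClosure` is the set that would have to accompany an exported or federated handoff.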
### [CODEX — schema/invariant review] — RESOLVED 2026-06-10 (symmetric pass)

| # | Sev | Resolution |
|---|---|---|
| C1 | HIGH | Resolved in §2.1.1 (action taxonomy, 5 classes, holes closed: required `item_id`, `assignment_progress` heartbeat-class, store-ops per-entity + `store_marker`, archival-vs-delete, rev-never-resets), §2.1.2 (journal-meta schemas incl. genesis + J1 redaction audit note), §2.1.3 (tombstones per item_type), §2.2 (dup-seq reducer, 5 normative cases). |
| C2 | HIGH | Resolved in §2.1.4: version-in-payload + migration-on-replay reusing the existing `migration.ts` versioned-document registry; migration-retention invariant pinned to the *second-newest* checkpoint; alternatives in Appendix A. |
| C3 | MED | Falsifier FIRED on handoffs (phase-0 measurements). Resolved in §2.10: handoff diet (primary) + `payload_ref` (safety net), blob-before-ref ordering, checkpoint blob closure, gc floor extension, J2 git posture. Residual product call → J6. |
| C4 | MED | Resolved in §2.11 against `identity-model-proposal.md`: scalar `(entity_rev, origin_id)` survives — convergence intact; conflict *surfacing* via (rev, origin) journal collision; documented miss-window past the gc floor, with the per-origin watermark as the named upgrade path. |

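C2's version-in-payload approach can be sketched as a replay-time upgrade chain. Assumptions: each payload carries `schema_version`, registry key N maps to an N→N+1 step, and the field changes shown (`name` renamed to `title`, a defaulted `tags` array) are invented for illustration; the real registry is the one in `migration.ts`. Replay upgrades copies in memory and never rewrites segment bytes.

```typescript
// Payloads self-describe their schema version.
interface Payload {
  schema_version: number;
  [field: string]: unknown;
}

type Step = (p: Payload) => Payload;

// Hypothetical registry: key N upgrades a payload from version N to N + 1.
const MIGRATIONS: ReadonlyMap<number, Step> = new Map<number, Step>([
  [1, (p) => {
    const { name: legacyName, ...rest } = p; // illustrative v2 change: `name` renamed to `title`
    return { ...rest, title: legacyName, schema_version: 2 };
  }],
  [2, (p) => ({ ...p, tags: Array.isArray(p["tags"]) ? p["tags"] : [], schema_version: 3 })],
]);

const CURRENT_VERSION = 3;

// Pure replay-time migration: returns an upgraded copy and never mutates its input.
function migrateOnReplay(p: Payload): Payload {
  if (p.schema_version > CURRENT_VERSION) {
    throw new Error(`payload schema_version ${p.schema_version} is newer than this build`);
  }
  let cur = p;
  while (cur.schema_version < CURRENT_VERSION) {
    const step = MIGRATIONS.get(cur.schema_version);
    if (!step) throw new Error(`no migration registered from schema_version ${cur.schema_version}`);
    cur = step(cur);
  }
  return cur;
}
```

A missing registry entry fails loudly at replay time, which is why the retention invariant pins how long old migrations must stay registered.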
### [CODEX pre-P0 review] — RESOLVED 2026-06-12 (claude-code, codex out of credits)

Codex's final pass before P0 implementation surfaced 5 findings; all were
verified against the code and resolved in this revision:

| # | Sev | Resolution |
|---|---|---|
| F1 | MED/HIGH | `assignment_progress` carried durable state (`status_reason`, `artifacts`) on the heartbeat path — un-replayable as specced. Resolved in §2.1.1: heartbeat/durable split (`assignment_progress`/`run_progress` = pure ticks; `assignment_amended`/`run_amended` = registry-lifecycle with post-image), effective phase 1.5. |
| F2 | MED | Same ambiguity on `run_running` (re-emitted per tick). Resolved with F1 — one decision: `run_running` = transition-only; ticks move to `run_progress`. |
| F3 | MED | `federation_apply` required by the identity proposal but absent from the taxonomy. Resolved: declared as journal-meta NOW (§2.1.1 table + §2.1.2 schema), inert until federation ships — avoids a post-freeze union extension. |
| F4 | MED | Redaction × payload_ref/checkpoints under-specified. Blocking half resolved in §2.10 (redaction closure: blob deletion + checkpoint regeneration, normative invariant); federation re-import half stays open as R2 below. |
| F5 | LOW | Spec said 32 `EventAction` members; code has 34. Corrected in §2.1.1. |

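The F1/F2 resolution can be shown with a replay reducer. Assumptions: heartbeat-class actions (here only `run_progress`) are skipped by durable replay, while `run_running` and `run_amended` carry a post-image; the `RunState` shape, `replayDurable`, and `lastActivity` are hypothetical. A durable field that rides only on a tick is silently dropped by replay, which is exactly the un-replayable state F1 describes.

```typescript
// Run actions named in F1/F2 (a subset; the real taxonomy is larger).
type RunAction = "run_running" | "run_progress" | "run_amended";

// Hypothetical durable state shape.
interface RunState {
  status: string;
  status_reason?: string;
  artifacts: string[];
}

interface RunEvent {
  action: RunAction;
  seq: number;
  at: string; // record timestamp
  postImage?: RunState; // present on transition and amend records
}

// Heartbeat-class: pure ticks that only move liveness.
const HEARTBEAT_ACTIONS: ReadonlySet<RunAction> = new Set<RunAction>(["run_progress"]);

// Durable replay masks ticks entirely.
function replayDurable(events: readonly RunEvent[]): RunState | undefined {
  let state: RunState | undefined;
  for (const e of events) {
    if (HEARTBEAT_ACTIONS.has(e.action)) continue; // never contributes durable state
    if (e.postImage) state = e.postImage; // transitions and amends carry the full post-image
  }
  return state;
}

// Liveness is derived separately, from the newest record of any class.
function lastActivity(events: readonly RunEvent[]): string | undefined {
  return events.length > 0 ? events[events.length - 1].at : undefined;
}
```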
### [CODEX residue — needs a second model's schema instincts]

| # | Sev | Question |
|---|---|---|
| R1 | MED | ~~Zod encoding of §2.1.1~~ **RESOLVED 2026-06-12** (codex recommendation, claude-code verified zod 4.4.3 installed): `action` stays the only discriminant — no serialized `class` field (derived-field drift, trp#180 family). `ACTION_CLASS_BY_ACTION` table `satisfies Record<EventAction, ActionClass>` for compile-time exhaustiveness; zod discriminatedUnion on `action` enums per class; phase-gated payload requiredness = runtime refinement by journal mode, not schema variants. See §2.1.1. |
| R2 | MED | **Redaction × cursors × federation — federation half only** (blob/checkpoint closure resolved in §2.10). Does seq-watermark survival hold for a cursor positioned *inside* a redacted range? And the re-import hole: a federation peer that pulled a record pre-redaction can re-present it — `(seq, writer)` dedup would *reject* the redacted copy (good) but the peer's checkpoint may still embed the payload. The redaction note likely needs to propagate as a federation signal; decide with the federation transport (cannot be closed before it exists). |
| R3 | LOW | **Ephemeral field set enumeration (§2.8).** Adversarial sweep of the real zod schemas for fields beyond `last_heartbeat_at` / claim `expires_at` / `last_progress` that mutate without semantic change (counters, denormalized caches?) — the masking set must be complete or verify-journal cries wolf. |
| R4 | LOW | **C4 miss-window sizing (§2.11).** Gc-floor window (weeks) vs realistic offline-origin durations; should the per-origin watermark ship in federation v1 regardless of dogfood evidence? |
| R-C4 | MED | **Dual conflict-detection adjudication (§2.11, reconciliation 2026-06-11).** The two symmetric reviews independently produced `base_rev` fast-forward (identity proposal) and `(rev, origin)` journal collision (this spec); the coordinator kept BOTH (primary + defense-in-depth). Adjudicate: is the redundancy worth the dual maintenance, or should one become normative? Note `base_rev` is the only one that survives gc and fresh materializes. |

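R1's encoding can be sketched in plain TypeScript (4.9+ for `satisfies`), leaving out the zod layer. Assumptions: the `EventAction` members are a hypothetical subset of the real 34, the class names and their assignments are illustrative stand-ins for the five classes of §2.1.1, and the rule in `checkPayload` is invented to show a runtime refinement by journal mode. Adding an action to the union without classifying it is a compile error, and the class is always derived from `action`, never serialized.

```typescript
// Hypothetical subset of EventAction; the real union has 34 members.
type EventAction =
  | "update"
  | "delete"
  | "assignment_progress"
  | "assignment_amended"
  | "run_progress"
  | "federation_apply";

// Illustrative class names standing in for the five classes of §2.1.1.
type ActionClass = "memory" | "registry-lifecycle" | "heartbeat" | "store-op" | "journal-meta";

// `satisfies` requires one entry per EventAction (missing or unknown keys are
// compile errors) without widening the literal class values.
const ACTION_CLASS_BY_ACTION = {
  update: "memory",
  delete: "memory",
  assignment_progress: "heartbeat",
  assignment_amended: "registry-lifecycle",
  run_progress: "heartbeat",
  federation_apply: "journal-meta",
} satisfies Record<EventAction, ActionClass>;

// The class is derived from `action`, never stored alongside it.
function classOf(action: EventAction): ActionClass {
  return ACTION_CLASS_BY_ACTION[action];
}

// Wire guard: `action` is the only discriminant read from a record.
function isEventAction(x: string): x is EventAction {
  return Object.prototype.hasOwnProperty.call(ACTION_CLASS_BY_ACTION, x);
}

type JournalMode = "off" | "dual" | "primary";

// Hypothetical phase-gated rule, checked at runtime rather than as a schema variant:
// outside "off" mode, every non-heartbeat action must carry a payload.
function checkPayload(action: EventAction, mode: JournalMode, payload: unknown): void {
  if (mode !== "off" && classOf(action) !== "heartbeat" && payload === undefined) {
    throw new Error(`${action} requires a payload in ${mode} mode`);
  }
}
```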
## Appendix A — Rejected alternatives

- **Diff/patch payloads (RFC 6902 or field-deltas).** Every event becomes
  load-bearing: one torn line poisons all later state for that entity, and
  zero-dep means hand-rolling a patch engine. Snapshots are idempotent,
  self-healing, and compaction-trivial. (Both proposals; unanimous.)
- **A's rename-based sealing (`active.jsonl` → range name).** Contradicts
  its own cursor format (offsets dangle after rename), and rename-of-open-file
  is the exact Windows EPERM/EBUSY hazard it then needs retry logic
  for. Segments are born with their permanent first-seq name.
- **A's byte-offset cursors `{segment_id, offset}`.** Die under rename,
  under quarantine truncation, and under any future segment surgery
  (including J1 redaction). Seq watermarks survive all of it.
- **A's writer-inline torn-tail quarantine (truncate + move bytes).** A
  read-modify-write of the journal on the hot path: breaks append-only,
  races the very lock-steal window the seatbelt exists for, and can
  quarantine a live in-flight write. Demoted to offline doctor repair.
- **A's `fsync: rotate` default (no fsync per mutation).** Program-order
  journal-first without a barrier permits projections from the future and
  silent reconciler regression — the trp_d5595086 class. One fsync per
  mutate is affordable at human-action mutation rates.
- **B's in-journal checkpoint event runs (+ terminator).** Pollutes every
  seq-watermark cursor with O(entities) phantom events, leaves headless
  runs on crash that are schema-identical to real events, and stretches
  lock hold time. Out-of-band manifests have none of these.
- **A's "referencing" checkpoint variant (hashes of projection files).**
  Circular: a rebuild-from-truth artifact whose validity depends on
  projection integrity is useless precisely when projections are suspect.
  Killed without further study.
- **B's lockless observability appends (`seq: null`).** Races segment roll
  into "immutable" files, and seq-less records are unaddressable by B's own
  seq-watermark cursors. All appends take the lock; revisit only if
  instrumentation shows notification contention.
- **B's `(writer_id, writer_seq)` per-writer counter in the envelope.**
  Serves only federation and is derivable later; `entity_rev` serves three
  local masters today. Dropped as dead weight.
- **B's `deleted: true` tombstone boolean.** Redundant with
  `action: "delete"`; one source of truth in the envelope.
- **B's two meta files (`HEAD.json` + `projections.json`).** Two reads per
  MCP call, two renames per mutation, plus cross-file ordering reasoning,
  for state that is always consumed together. Single `meta.json`, keeping B's
  rebuildable-cache property.
- **Migration to explicit verb-site event emission (A's end-state).** ~30
  call sites each become a chance to forget, double-emit, or
  emit-without-persisting. The diff choke point is provably consistent
  with what was persisted; verb semantics are preserved by intent
  annotation instead. Conversely, **pure diff with no annotation** (B's
  letter) was also rejected: it collapses the EventAction union to generic
  `update`, losing the semantics that notifications and federation
  signaling consume.
- **Splitting the notification stream from the state journal now (B Q6).** A
  single shared journal is simpler — one reader, one cursor type, one
  ordering; split only if volume instrumentation demands it.
- **Separate journal per entity (vs. one per store).** Global order comes
  free with one journal; per-entity journals reintroduce cross-entity
  ordering as a problem. (Proposal A §0; never contested.)
- **Per-record envelope schema-version for payloads (C2 alternative).**
  Redundant: payloads already self-describe via `schema_version` + the
  migration registry; a second version field in the envelope creates two
  sources of truth that can disagree.
- **Migration-by-segment-rewrite (C2 alternative).** Rewriting old
  segments to the current payload schema violates append-only immutability
  and J1's audited-rewrite-only rule; replay-time migration is
  pure-functional and leaves bytes untouched.
- **Hash in every envelope, inline payloads included (C3 variant).**
  Inline payloads are already line-framed and zod-validated; mandatory
  hashing buys federation dedup nothing (dedup keys on `(seq, writer)`)
  at a per-mutation CPU cost. The hash lives where it is load-bearing:
  `payload_ref` and `checkpoint_ref`.
- **A vector-clock component in the envelope (C4 alternative).**
  Origin-partitioned write authority makes convergence scalar-safe (§2.11);
  the only thing a vector adds is complete conflict *surfacing* across
  gc-floor-sized offline windows. Deferred to import metadata (per-origin
  watermark) — the envelope stays origin-agnostic.

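Several arguments in this appendix (snapshot idempotence, torn lines staying local, seq-watermark cursors surviving segment surgery) can be illustrated together. A sketch, assuming every record is one JSON line carrying a global `seq`, a per-entity `rev`, and a full post-image snapshot; `Rec`, `parseSegment`, `foldSnapshots`, and `readAfter` are hypothetical names.

```typescript
// Hypothetical record: one JSON line per event, full post-image snapshot.
interface Rec {
  seq: number;
  entityId: string;
  rev: number;
  snapshot: Record<string, unknown>;
}

// Parse one segment's text; torn or malformed lines are counted, not fatal.
function parseSegment(text: string): { records: Rec[]; skipped: number } {
  const records: Rec[] = [];
  let skipped = 0;
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const r = JSON.parse(line) as Rec;
      if (typeof r.seq !== "number" || typeof r.rev !== "number" || typeof r.entityId !== "string") {
        throw new Error("bad shape");
      }
      records.push(r);
    } catch {
      skipped++; // a visible skip counter instead of silent loss
    }
  }
  return { records, skipped };
}

// Snapshot fold: highest rev per entity wins, so replaying duplicates is a no-op
// and state never regresses to a lower rev.
function foldSnapshots(records: readonly Rec[]): Map<string, Rec> {
  const state = new Map<string, Rec>();
  for (const r of records) {
    const prev = state.get(r.entityId);
    if (!prev || r.rev > prev.rev) state.set(r.entityId, r);
  }
  return state;
}

// Seq-watermark cursor: "everything after seq N", independent of segment names or offsets.
function readAfter(segments: readonly Rec[][], cursorSeq: number): Rec[] {
  const out: Rec[] = [];
  for (const seg of segments) {
    for (const r of seg) if (r.seq > cursorSeq) out.push(r);
  }
  return out.sort((a, b) => a.seq - b.seq);
}
```

Because the cursor is a seq rather than a `{segment_id, offset}` pair, regrouping the same records into differently named or differently sized segments does not change what `readAfter` returns.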
## Appendix B — Memory citations (union of rounds 1–2)

trp_d5595086 (silent-loss-via-swallow → loud appends, doctor-visible skips,
tombstone deletion authority, never-regress guard);
feedback_lazy_reconcile_pattern / pln#496 (read-path reconciliation, no
daemon); trp_e85e9fbe (dual-platform CI gates, Windows/POSIX divergence
discipline); trp_26e9634b (missing-store failure mode); trp_09988deb
(upgrade-style backups); feedback_no_init_force + park-don't-delete house
rule (retention, quarantine, archives, rollback);
federation_architecture_decisions + cross_project_signaling_vs_execution
(Pull-and-Materialize substrate, signaling-only foreign writes, no daemon);
feedback_bisect_state_before_code (doctor counters over silent skips);
feedback_ideation_loop_single_agent_method (multi-instance multi-round
method that produced this spec).