@lcv-ideas-software/cross-review 4.0.8 → 4.1.1

package/CHANGELOG.md CHANGED
@@ -7,6 +7,210 @@ standard `v00.00.00`; npm package versions remain SemVer.
 
  ## [Unreleased]
 
+ ## [v04.01.01] — 2026-05-17
+
+ **Patch — release the hard-gate cleanup as a published package.** The previous
+ hard-gate cleanup was synchronized without a package-version bump; this patch
+ formalizes the change as npm package `4.1.1`, preserving the rule that every
+ patch shipped to `main` receives a publishable SemVer increment.
+
+ ### Fixed
+
+ - Removed the dead global ESLint waiver for
+   `@typescript-eslint/no-explicit-any`; strict enforcement already passes on the
+   current source tree.
+ - Restored README coverage under Prettier by removing the README masks from
+   `.prettierignore` and formatting the file instead of hiding the drift.
+ - Added smoke coverage that prevents future linter/formatter masking of
+   `README.md`, `src/**`, and `scripts/**`, and pins the TypeScript unused-var
+   rule as an error.
+ - Made `runtime-smoke` polling terminal-outcome aware and increased the polling
+   deadline to 60 seconds so slow-but-converged stub sessions are not reported as
+   timeouts.
+ - Replaced two CodeQL `js/file-system-race` patterns with atomic/file-descriptor
+   based flows: session metadata placeholder creation now relies directly on
+   `writeFileSync(..., { flag: "wx" })`, and the migration race harness snapshots
+   lock state through `openSync` + `fstatSync` on the opened descriptor.
+ - Added a scoped StepSecurity suppression for generated `dist/**` artifacts in
+   the publish workflow's pre-publish build job, then resolved the existing
+   actionable generated-file detections.
+
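The two file-system-race fixes listed above can be illustrated with a minimal sketch. The helper names `createPlaceholder` and `snapshotMtimeMs` are hypothetical and are not taken from the package; only the `node:fs` calls (`writeFileSync` with `flag: "wx"`, `openSync`, `fstatSync`) come from the changelog entry.

```typescript
import { closeSync, fstatSync, mkdtempSync, openSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: create a placeholder only if the path does not exist yet.
// The "wx" flag makes creation fail with EEXIST instead of overwriting, so no
// separate existence check can race with another writer.
function createPlaceholder(path: string, body: string): boolean {
  try {
    writeFileSync(path, body, { flag: "wx" });
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err;
  }
}

// Hypothetical helper: stat through an open descriptor, so the metadata describes
// the file that was actually opened even if the path is replaced afterwards.
function snapshotMtimeMs(path: string): number {
  const fd = openSync(path, "r");
  try {
    return fstatSync(fd).mtimeMs;
  } finally {
    closeSync(fd);
  }
}

// Usage against a throwaway directory.
const dir = mkdtempSync(join(tmpdir(), "wx-sketch-"));
const file = join(dir, "meta.json");
console.log(createPlaceholder(file, "{}")); // first caller creates the file
console.log(createPlaceholder(file, "{}")); // second caller is refused (EEXIST)
console.log(snapshotMtimeMs(file) > 0);
```

The point of both helpers is that the check and the action happen on the same kernel object, which is the property CodeQL's `js/file-system-race` query looks for.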
+ ## [v04.01.00] — 2026-05-17
+
+ **Minor — security hardening of session-store concurrency, write-path
+ DoS surface, and credential redaction.** This release closes three
+ high-impact findings from an in-depth security audit of the v4.0.8
+ codebase. The public MCP tool surface is unchanged; the SessionStore
+ class methods that mutate state become async (cascading `await` to
+ ~80 internal call sites). Operators consuming the public MCP tools
+ see no API change.
+
+ ### Fixed
+
+ - **F1 — Session-lock TOCTOU race (multi-process).** Pre-v4.1.0
+   acquired `<session_dir>/.lock` by creating the file empty and then
+   writing PID metadata in a separate syscall. Across multiple host
+   processes sharing the same `data_dir`, a second process could
+   observe the empty lock between the two syscalls, fail to JSON-parse
+   it, remove it, create its own, and enter the critical section in
+   parallel with the first holder — corrupting `meta.json`.
+   `withSessionLock` now uses `proper-lockfile`'s `fs.mkdir`-based
+   atomic locking (the lockfile path is a directory, not a regular
+   file). The lock comes into existence in a single syscall, so no
+   empty-window race is possible on either NTFS or POSIX. Lock-holder
+   freshness is now signaled by an mtime touched every 5 s, and a lock
+   is treated as stale after 120 s (the prior PID-aliveness check
+   risked collisions when a PID was recycled after a restart).
+   `clearStaleInFlight` and `abortStaleSessions` switched from the
+   manual PID read to `lockfile.check(...)`.
+ - **F2 — `redactPrivateKeyBlocks` leaked unterminated PRIVATE KEY
+   payloads.** Pre-v4.1.0, when the input contained
+   `-----BEGIN PRIVATE KEY-----` without a matching
+   `-----END PRIVATE KEY-----` (e.g. a log truncated mid-key), the
+   function returned the original input unredacted, so the partial key
+   reached `events.ndjson` and persistent logs. v4.1.0 redacts from the
+   first BEGIN marker to end-of-string when no matching END is found,
+   emitting a single `[REDACTED]` token for the unterminated tail.
+ - **F3 — `writeJson` retry busy-wait blocked the Node.js event loop
+   for up to 310 ms under Windows AV stress.** Pre-v4.1.0 used
+   `while (Date.now() - start < wait) {}` between `renameSync`
+   retries, burning a single core at 100% and starving the event loop
+   (SSE token streams, MCP stdio reads, timers, other sessions) for
+   the cumulative wait. `writeJson` is now `async`; the backoff awaits
+   a Promise-based timer (`await new Promise(r => setTimeout(r, wait))`)
+   so the event loop processes other tasks during the wait. The same
+   CPU-burning busy-wait existed in `src/core/cache-manifest.ts`
+   `writeJsonAtomic` and is removed in the same release; the F5
+   anti-drift pin (below) now scans every `src/**/*.ts` to prevent
+   recurrence.
+
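The F2 behavior (redact a terminated block, and redact to end-of-string when the END marker is missing) can be sketched as follows. This is not the package's `redactPrivateKeyBlocks`; the function name, the marker regexes, and the choice to also match prefixed variants such as `RSA PRIVATE KEY` are assumptions made for illustration.

```typescript
// Assumption: markers may carry an algorithm prefix ("RSA ", "EC ", ...).
const BEGIN_MARKER = /-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----/;
const END_MARKER = /-----END (?:[A-Z0-9]+ )*PRIVATE KEY-----/;
const REDACTED = "[REDACTED]";

// Hypothetical sketch of the v4.1.0 rule described in F2.
export function redactPrivateKeyBlocksSketch(input: string): string {
  let out = "";
  let rest = input;
  for (;;) {
    const begin = BEGIN_MARKER.exec(rest);
    if (begin === null) return out + rest; // no (more) key material: pass through
    out += rest.slice(0, begin.index) + REDACTED;
    const afterBegin = rest.slice(begin.index + begin[0].length);
    const end = END_MARKER.exec(afterBegin);
    if (end === null) return out; // unterminated: drop everything up to end-of-string
    rest = afterBegin.slice(end.index + end[0].length);
  }
}
```

The important branch is the `end === null` case: the pre-v4.1.0 behavior described above would have returned the input unchanged at that point, whereas the sketch discards the whole tail.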
+ ### Changed (cascade)
+
+ - `writeJson(file, data)` → `async writeJson(file, data): Promise<void>`.
+ - `withSessionLock<T>(sessionId, fn): T` →
+   `async withSessionLock<T>(sessionId, fn: () => T | Promise<T>): Promise<T>`.
+ - `private sleepSync` removed (no callers after the lock refactor).
+ - `cache-manifest.ts` exports become async:
+   `appendCacheManifestEntry(...)` and `writeCacheManifest(...)` now
+   return `Promise<void>`. Orchestrator's `recordCacheTelemetry` is
+   now async and awaited in the post-peer call path.
+ - The following SessionStore methods are now async (return
+   `Promise<T>`): `init`, `markInFlight`, `appendEvent`,
+   `saveGeneration`, `savePeerResult`, `savePeerFailure`, `appendRound`,
+   `markBudgetWarningEmitted`, `setCircularState`,
+   `setSessionTraceability`, `finalize`, `requestCancellation`,
+   `markCancelled`, `appendFallbackEvent`,
+   `appendEvidenceChecklistItems`,
+   `runEvidenceChecklistAddressDetection`,
+   `setEvidenceChecklistItemStatus`, `markEvidenceItemAddressedByJudge`,
+   `recoverInterruptedSessions`, `sessionDoctor`, `contestVerdict`,
+   `attachEvidence`, `escalateToOperator`, `sweepIdle`,
+   `clearStaleInFlight`, `abortStaleSessions`.
+ - New `SessionStore.flushPendingEvents()` — awaits all in-flight
+   fire-and-forget `appendEvent` promises. Used by sweeps and tests
+   that read `events.ndjson` right after the emit pipeline has
+   persisted events.
+ - New runtime dep: `proper-lockfile` ^4.1.2 (3 transitive deps,
+   small surface, used by npm internally; MIT licensed).
+ - New devDep: `@types/proper-lockfile` ^4.1.4.
+
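The `writeJson` change from a busy-wait to an awaited timer might look roughly like the sketch below. The helper names, the set of error codes treated as transient, and the doubling schedule starting at 10 ms are all assumptions; the changelog states only the 310 ms worst case, and five doubling waits from 10 ms happen to sum to that figure.

```typescript
import { renameSync } from "node:fs";

// Assumed schedule: 10, 20, 40, ... ms. Not confirmed by the changelog.
export function backoffDelaysMs(retries: number, baseMs = 10): number[] {
  return Array.from({ length: retries }, (_, i) => baseMs * 2 ** i);
}

// Promise-based delay: yields to the event loop instead of spinning a core.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: retry a rename, awaiting between attempts.
export async function renameWithBackoff(from: string, to: string, retries = 5): Promise<void> {
  const delays = backoffDelaysMs(retries);
  for (let attempt = 0; ; attempt++) {
    try {
      renameSync(from, to);
      return;
    } catch (err) {
      const code = (err as NodeJS.ErrnoException).code;
      // Assumption: these codes are what antivirus or indexer interference produces.
      const transient = code === "EPERM" || code === "EBUSY" || code === "EACCES";
      const wait = delays[attempt];
      if (!transient || wait === undefined) throw err; // give up: not transient or out of retries
      await sleep(wait);
    }
  }
}
```

Because the function now returns a promise, every caller has to `await` it, which is what drives the cascade of signature changes listed above.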
+ ### Tests
+
+ - `redact_unterminated_private_key_test` (v4.1.0 / F4): empirical
+   regression for the unterminated PRIVATE KEY redaction. Asserts
+   `[REDACTED]` emitted, partial key body absent, passthrough for
+   no-key inputs.
+ - `writeJson_async_no_busy_wait_test` (v4.1.0 / F5): pins source
+   invariants on `src/core/session-store.ts` — `writeJson` must be
+   declared `async function writeJson`, must use a Promise-based
+   async delay — AND walks every `.ts` under `src/` asserting that
+   no executable code contains `while (Date.now() - start < wait) {}`
+   or `Atomics.wait(...)`. The pin's expanded scope was driven by the
+   R1 cross-review feedback (cache-manifest.ts had an identical
+   busy-wait that the original session-store-only grep missed).
+ - `session_lock_proper_lockfile_test` (v4.1.0 / F6): pins
+   `from "proper-lockfile"` import, `lockfile.lock(` call,
+   `async withSessionLock`, the absence of the pre-v4.1.0
+   `fs.openSync(..., "wx")` lock-acquire pattern, AND the fail-closed
+   legacy-file policy — source must contain the `detected a
+   pre-v4.1.0 lock file` remediation string and MUST NOT contain
+   `fs.rmSync(lockfilePath, ...)` (no auto-remove). The expanded
+   contract was driven by codex catches R1..R4.
+
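A source-invariant pin of the kind described for F5 could be sketched as a scan over every `.ts` file that ignores comments. The function names and regexes below are hypothetical, and the comment stripper is deliberately naive (it would also strip comment-like text inside string literals); the package's actual test may work differently.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Naive comment removal so that commented-out examples do not trip the pin.
function stripComments(source: string): string {
  return source.replace(/\/\*[\s\S]*?\*\//g, "").replace(/\/\/.*$/gm, "");
}

const BUSY_WAIT = /while\s*\(\s*Date\.now\(\)\s*-\s*\w+\s*<\s*\w+\s*\)\s*\{\s*\}/;
const ATOMICS_WAIT = /Atomics\.wait\s*\(/;

// True when executable code contains either blocking-wait pattern.
export function hasBlockingWait(source: string): boolean {
  const code = stripComments(source);
  return BUSY_WAIT.test(code) || ATOMICS_WAIT.test(code);
}

// Recursively collect .ts files under a root directory.
export function listTsFiles(root: string): string[] {
  const found: string[] = [];
  for (const entry of readdirSync(root, { withFileTypes: true })) {
    const full = join(root, entry.name);
    if (entry.isDirectory()) found.push(...listTsFiles(full));
    else if (entry.name.endsWith(".ts")) found.push(full);
  }
  return found;
}

// Files that violate the pin; a test would assert this list is empty for "src".
export function findOffenders(root: string): string[] {
  return listTsFiles(root).filter((file) => hasBlockingWait(readFileSync(file, "utf8")));
}
```

Scanning the whole tree rather than one file is what catches a duplicate of the pattern in a second module, which is the gap the R1 feedback reportedly exposed.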
+ ### Migration
+
+ - Pre-v4.1.0 created `.lock` as a regular file containing
+   `{pid, ts}` JSON. v4.1.0's lock claims `.lock` as a directory, so a
+   leftover legacy regular file would block every subsequent lock
+   acquisition. **v4.1.0 NEVER auto-removes a legacy regular `.lock`
+   file.** A four-round cross-review (codex catches R1, R2, R3, R4)
+   demonstrated that every auto-clean strategy could split-brain
+   under live cross-version v4.0/v4.1 operation:
+   - R1: unconditional removal split-brained with a live legacy
+     holder.
+   - R2: removal when `pidAlive && legacyMtimeStale` failed because
+     legacy locks do not heartbeat (mtime frozen at acquisition; a
+     v4.0.x process inside a multi-minute peer call has BOTH a live
+     pid AND a >120 s old mtime).
+   - R3: fail-closed on `pidAlive` (regardless of mtime) still raced
+     two concurrent v4.1.0 migrators against a v4.0.x.
+   - R4: a v4.1↔v4.1 migration mutex still left the cross-version
+     race — v4.0.x's own stale-removal-and-recreate path does not
+     honor any v4.1 mutex, so v4.0.x could remove a stale `.lock` and
+     create its own live one between v4.1's inspect and v4.1's
+     path-based `rmSync`, and v4.1 would then delete v4.0.x's new
+     live lock, producing a split-brain.
+
+   **v4.1.0 fails closed.** When `withSessionLock` observes a regular
+   file at the lock path, it throws a clear remediation error to the
+   caller: "cross-review v4.1.0 detected a pre-v4.1.0 lock file at
+   `<path>`. Live cross-version migration is not supported (would
+   split-brain with any concurrent v4.0.x process). To migrate
+   safely: (1) stop all cross-review processes / close all MCP hosts
+   that loaded the server, (2) remove the legacy lock file, (3)
+   restart."
+
+   **Operator remediation (one-time at v4.0.x → v4.1.0 upgrade):**
+   1. Close every MCP host running cross-review (Claude Code, Codex,
+      Gemini Code Assist, etc.).
+   2. Remove all legacy lock files. POSIX one-liner:
+      `find ~/.cross-review/data/sessions -name .lock -type f -delete`.
+      Windows PowerShell:
+      `Get-ChildItem -Path ~/.cross-review/data/sessions -Recurse -Filter .lock -File | Remove-Item`.
+   3. Restart the MCP hosts. They now spawn v4.1.0 cross-review which
+      manages locks as mkdir-atomic directories; the issue cannot
+      recur.
+
+   Trade-off: an extra one-time operator step at upgrade. The
+   alternative (best-effort auto-clean) was demonstrated unsafe
+   across four cross-review rounds. Operator burden of a single
+   `find` command is far less than the cost of any split-brain
+   corruption.
+
+ - Public MCP tool surface (`session_init`, `ask_peers`,
+   `run_until_unanimous`, etc.) is unchanged — all the async cascade
+   is internal.
+
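The combination of directory-based locking and the fail-closed legacy policy described in this section can be sketched with plain `node:fs` calls. This is a simplified illustration, not the package's `withSessionLock` and not `proper-lockfile`'s API: the helper name `tryAcquireDirLock` is hypothetical, the error text is abbreviated, and heartbeat and staleness handling are omitted.

```typescript
import { lstatSync, mkdirSync, mkdtempSync, rmdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: returns a release function, or null if the lock is held.
function tryAcquireDirLock(lockPath: string): (() => void) | null {
  // Fail closed on a legacy regular file: report it, never delete it.
  const existing = lstatSync(lockPath, { throwIfNoEntry: false });
  if (existing !== undefined && existing.isFile()) {
    throw new Error(
      `detected a pre-v4.1.0 lock file at ${lockPath}: stop all processes, remove it, restart`,
    );
  }
  try {
    mkdirSync(lockPath); // atomic: exactly one concurrent caller succeeds
  } catch (err) {
    // EEXIST covers both a live directory lock and a file that appeared after
    // the lstat above; in either case the caller backs off without removing anything.
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return null;
    throw err;
  }
  return () => rmdirSync(lockPath);
}

// Usage against a throwaway directory.
const root = mkdtempSync(join(tmpdir(), "lock-sketch-"));
const lockPath = join(root, ".lock");
const release = tryAcquireDirLock(lockPath);
console.log(release !== null); // first caller holds the lock
console.log(tryAcquireDirLock(lockPath) === null); // contender is refused
release?.();
writeFileSync(lockPath, JSON.stringify({ pid: 1, ts: 0 })); // plant a legacy-style file
try {
  tryAcquireDirLock(lockPath);
} catch (err) {
  console.log((err as Error).message); // remediation error; the file is left in place
}
```

Note that the sketch never calls any remove function on a path it did not create itself, which is the property the R1 to R4 rounds converged on.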
+ ### Empirical validation
+
+ - `scripts/race-reproducer.mjs`: 4 procs × 5 rounds = 20/20 and
+   8 procs × 10 rounds = 80/80 persisted with no losses under
+   multi-process contention against the shared `data_dir`.
+ - `scripts/race-legacy-holder.mjs`: five scenarios cover the legacy
+   matrix (live-pid+fresh-mtime, live-pid+stale-mtime, dead-pid,
+   empty-fresh-mtime, empty-stale-mtime). Every shape gets the
+   fail-closed remediation error; v4.1.0 never enters the critical
+   section (CS), never removes the legacy file, never mutates
+   `meta.json`.
+ - `scripts/race-migration-toctou.mjs`: 3-process race orchestrated
+   to V41_A → LEGACY → V41_B. V41_A sees the planted stale dead-pid
+   file → fail-closed. LEGACY (v4.0.x simulator) clears the stale
+   file via v4.0.x's own `openSync(..., "wx")` loop, claims `.lock`
+   with its live pid, holds an 8 s synthetic CS. V41_B sees LEGACY's
+   new live file → fail-closed. At end-of-CS, LEGACY's lockfile is
+   present and its content still names LEGACY's pid (no v4.1.0
+   deleted/replaced it) — empirically demonstrating that the
+   fail-closed policy prevents the codex R3+R4 inspect+remove TOCTOU
+   under live cross-version operation.
+
  ## [v04.00.08] — 2026-05-16
 
  **Patch — eliminate the `js/file-access-to-http` CodeQL false positive