@lcv-ideas-software/cross-review 4.0.8 → 4.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -7,6 +7,182 @@ standard `v00.00.00`; npm package versions remain SemVer.
 
  ## [Unreleased]
 
+ ## [v04.01.00] — 2026-05-17
+
+ **Minor — security hardening of session-store concurrency, write-path
+ DoS surface, and credential redaction.** This release closes three
+ high-impact findings from an in-depth security audit of the v4.0.8
+ codebase. The public MCP tool surface is unchanged; the SessionStore
+ class methods that mutate state become async (cascading `await` to
+ ~80 internal call sites). Operators consuming the public MCP tools
+ see no API change.
+
+ ### Fixed
+
+ - **F1 — Session-lock TOCTOU race (multi-process).** Pre-v4.1.0
+ acquired `<session_dir>/.lock` by creating the file empty and then
+ writing PID metadata in a separate syscall. Across multiple host
+ processes sharing the same `data_dir`, a second process could
+ observe the empty lock between the two syscalls, fail to JSON-parse
+ it, remove it, create its own, and enter the critical section in
+ parallel with the first holder — corrupting `meta.json`.
+ `withSessionLock` now uses `proper-lockfile`'s `fs.mkdir`-based
+ atomic locking (the lockfile path is a directory, not a regular
+ file). The lock comes into existence in a single syscall with no
+ empty-window race possible across NTFS and POSIX. Lock-holder
+ freshness is now signaled by mtime touched every 5 s and detected
+ as stale after 120 s (the prior PID-aliveness check had collision
+ risk after PID-recycling restart). `clearStaleInFlight` and
+ `abortStaleSessions` switched from the manual PID read to
+ `lockfile.check(...)`.
+ - **F2 — `redactPrivateKeyBlocks` leaked unterminated PRIVATE KEY
+ payloads.** Pre-v4.1.0, when the input contained
+ `-----BEGIN PRIVATE KEY-----` without a matching
+ `-----END PRIVATE KEY-----` (e.g. a log truncated mid-key), the
+ function returned the original input unredacted — the partial key
+ reached events.ndjson + persistent logs. v4.1.0 redacts from the
+ first BEGIN marker to end-of-string when no matching END is found,
+ emitting a single `[REDACTED]` token for the unterminated tail.
+ - **F3 — `writeJson` retry busy-wait blocked the Node.js event loop
+ for up to 310 ms under Windows AV stress.** Pre-v4.1.0 used
+ `while (Date.now() - start < wait) {}` between `renameSync`
+ retries, burning a single core at 100% and starving the event loop
+ (SSE token streams, MCP stdio reads, timers, other sessions) for
+ the cumulative wait. `writeJson` is now `async`; the backoff awaits
+ a Promise-based timer (`await new Promise(r => setTimeout(r, wait))`)
+ so the event loop processes other tasks during the wait. The same
+ CPU-burning busy-wait existed in `src/core/cache-manifest.ts`
+ `writeJsonAtomic` and is removed in the same release; the F5
+ anti-drift pin (below) now scans every `src/**/*.ts` to prevent
+ recurrence.
+
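The F2 behavior described above can be illustrated with a minimal sketch. This is not the package's `redactPrivateKeyBlocks` source: the function name `redactPrivateKeyBlocksSketch`, the literal marker matching, and the choice to emit the same `[REDACTED]` token for terminated blocks are all assumptions made for illustration.

```typescript
// Hypothetical sketch of the F2 rule; names and details are assumptions.
const BEGIN_MARKER = "-----BEGIN PRIVATE KEY-----";
const END_MARKER = "-----END PRIVATE KEY-----";

function redactPrivateKeyBlocksSketch(input: string): string {
  let out = "";
  let rest = input;
  for (;;) {
    const begin = rest.indexOf(BEGIN_MARKER);
    if (begin === -1) {
      // No (further) key material: pass the remainder through untouched.
      return out + rest;
    }
    out += rest.slice(0, begin) + "[REDACTED]";
    const afterBegin = rest.slice(begin + BEGIN_MARKER.length);
    const end = afterBegin.indexOf(END_MARKER);
    if (end === -1) {
      // Unterminated block: everything up to end-of-string is dropped,
      // so a log truncated mid-key cannot leak the partial key body.
      return out;
    }
    // Terminated block: continue scanning after the END marker.
    rest = afterBegin.slice(end + END_MARKER.length);
  }
}
```

With this sketch, an input that ends mid-key produces a single `[REDACTED]` token in place of the tail, and an input with no BEGIN marker is returned unchanged, which matches the three assertions listed for `redact_unterminated_private_key_test`.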
+ ### Changed (cascade)
+
+ - `writeJson(file, data)` → `async writeJson(file, data): Promise<void>`.
+ - `withSessionLock<T>(sessionId, fn): T` →
+ `async withSessionLock<T>(sessionId, fn: () => T | Promise<T>): Promise<T>`.
+ - `private sleepSync` removed (no callers after the lock refactor).
+ - `cache-manifest.ts` exports become async:
+ `appendCacheManifestEntry(...)` and `writeCacheManifest(...)` now
+ return `Promise<void>`. Orchestrator's `recordCacheTelemetry` is
+ now async + awaited in the post-peer call path.
+ - The following SessionStore methods are now async (return
+ `Promise<T>`): `init`, `markInFlight`, `appendEvent`,
+ `saveGeneration`, `savePeerResult`, `savePeerFailure`, `appendRound`,
+ `markBudgetWarningEmitted`, `setCircularState`,
+ `setSessionTraceability`, `finalize`, `requestCancellation`,
+ `markCancelled`, `appendFallbackEvent`,
+ `appendEvidenceChecklistItems`,
+ `runEvidenceChecklistAddressDetection`,
+ `setEvidenceChecklistItemStatus`, `markEvidenceItemAddressedByJudge`,
+ `recoverInterruptedSessions`, `sessionDoctor`, `contestVerdict`,
+ `attachEvidence`, `escalateToOperator`, `sweepIdle`,
+ `clearStaleInFlight`, `abortStaleSessions`.
+ - New `SessionStore.flushPendingEvents()` — awaits all in-flight
+ fire-and-forget `appendEvent` promises. Used by sweeps + tests
+ that read events.ndjson right after the emit pipeline persisted.
+ - New runtime dep: `proper-lockfile` ^4.1.2 (3 transitive deps,
+ small surface, used by npm internally; MIT licensed).
+ - New devDep: `@types/proper-lockfile` ^4.1.4.
+
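The move from a busy-wait to an awaited timer in `writeJson` can be sketched as a generic retry helper. The helper name `retryWithBackoff`, its signature, and the delay schedule are assumptions; the changelog states only the 310 ms cumulative ceiling, not the individual delays (a doubling schedule of 10, 20, 40, 80, 160 ms would sum to 310 ms, but that is a guess).

```typescript
// Hypothetical sketch: retry a synchronous operation, yielding to the
// event loop between attempts instead of spinning on Date.now().
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  attempt: () => T,
  delaysMs: readonly number[],
): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return attempt();
    } catch (err) {
      // Out of retries: surface the last error to the caller.
      if (i >= delaysMs.length) throw err;
      // Timers, stdio reads, and other sessions keep running here.
      await sleep(delaysMs[i]);
    }
  }
}
```

A caller would wrap the failing step, for example `await retryWithBackoff(() => fs.renameSync(tmp, target), [10, 20, 40, 80, 160])`, where `tmp` and `target` are placeholder paths. Because the helper returns a promise, every caller up the chain has to `await` it, which is the reason for the async cascade listed above.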
+ ### Tests
+
+ - `redact_unterminated_private_key_test` (v4.1.0 / F4): empirical
+ regression for the unterminated PRIVATE KEY redaction. Asserts
+ `[REDACTED]` emitted, partial key body absent, passthrough for
+ no-key inputs.
+ - `writeJson_async_no_busy_wait_test` (v4.1.0 / F5): pins source
+ invariants on `src/core/session-store.ts` — `writeJson` must be
+ declared `async function writeJson`, must use a Promise-based
+ async delay — AND walks every `.ts` under `src/` asserting that
+ no executable code contains `while (Date.now() - start < wait) {}`
+ or `Atomics.wait(...)`. The pin's expanded scope was driven by the
+ R1 cross-review feedback (cache-manifest.ts had an identical
+ busy-wait that the original session-store-only grep missed).
+ - `session_lock_proper_lockfile_test` (v4.1.0 / F6): pins
+ `from "proper-lockfile"` import, `lockfile.lock(` call,
+ `async withSessionLock`, the absence of the pre-v4.1.0
+ `fs.openSync(..., "wx")` lock-acquire pattern, AND the fail-closed
+ legacy-file policy — source must contain the `detected a
+ pre-v4.1.0 lock file` remediation string and MUST NOT contain
+ `fs.rmSync(lockfilePath, ...)` (no auto-remove). The expanded
+ contract was driven by codex catches R1..R4.
+
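The source-scanning pin described for F5 can be approximated with a small pure function. This is a sketch under assumptions: the names `FORBIDDEN`, `stripComments`, and `findBusyWaits` are hypothetical, the comment stripping is naive (it would also cut a `//` that appears inside a string literal), and the directory walk over `src/` is left out so that only the matching step is shown.

```typescript
// Hypothetical sketch of an anti-drift source scan; not the real test.
const FORBIDDEN: readonly RegExp[] = [
  // Empty-bodied spin loop of the form: while (Date.now() - start < wait) {}
  /while\s*\(\s*Date\.now\(\)\s*-\s*\w+\s*<\s*\w+\s*\)\s*\{\s*\}/,
  // Blocking wait on a shared buffer.
  /Atomics\.wait\s*\(/,
];

// Drop block and line comments so that only executable code is checked.
function stripComments(source: string): string {
  return source
    .replace(/\/\*[\s\S]*?\*\//g, "")
    .replace(/\/\/.*$/gm, "");
}

// Returns the source text of every forbidden pattern found in the file.
function findBusyWaits(source: string): string[] {
  const code = stripComments(source);
  return FORBIDDEN.filter((re) => re.test(code)).map((re) => re.source);
}
```

A test built on this would read each `.ts` file under `src/` and assert that `findBusyWaits(contents)` is empty. Scanning every file rather than one is what would have caught the duplicate busy-wait in `cache-manifest.ts`.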
+ ### Migration
+
+ - Pre-v4.1.0 created `.lock` as a regular file containing
+ `{pid, ts}` JSON. v4.1.0's lock claims `.lock` as a directory, so a
+ leftover legacy regular file would block every subsequent lock
+ acquisition. **v4.1.0 NEVER auto-removes a legacy regular `.lock`
+ file.** A four-round cross-review (codex catches R1, R2, R3, R4)
+ demonstrated that every auto-clean strategy could split-brain
+ under live cross-version v4.0/v4.1 operation:
+ - R1: unconditional removal split-brained with a live legacy
+ holder.
+ - R2: removal when `pidAlive && legacyMtimeStale` failed because
+ legacy locks do not heartbeat (mtime frozen at acquisition; a
+ v4.0.x process inside a multi-minute peer call has BOTH a live
+ pid AND a >120 s old mtime).
+ - R3: fail-closed on `pidAlive` (regardless of mtime) still raced
+ two concurrent v4.1.0 migrators against a v4.0.x.
+ - R4: a v4.1↔v4.1 migration mutex still left the cross-version
+ race — v4.0.x's own stale-removal-and-recreate path does not
+ honor any v4.1 mutex, so v4.0.x could remove a stale `.lock` and
+ create its own live one between v4.1's inspect and v4.1's
+ path-based `rmSync`, and v4.1 would then delete v4.0.x's new
+ live lock = split-brain.
+
+ **v4.1.0 fails closed.** When `withSessionLock` observes a regular
+ file at the lock path, it throws a clear remediation error to the
+ caller: "cross-review v4.1.0 detected a pre-v4.1.0 lock file at
+ `<path>`. Live cross-version migration is not supported (would
+ split-brain with any concurrent v4.0.x process). To migrate
+ safely: (1) stop all cross-review processes / close all MCP hosts
+ that loaded the server, (2) remove the legacy lock file, (3)
+ restart."
+
+ **Operator remediation (one-time at v4.0.x → v4.1.0 upgrade):**
+ 1. Close every MCP host running cross-review (Claude Code, Codex,
+ Gemini Code Assist, etc.).
+ 2. Remove all legacy lock files. POSIX one-liner:
+ `find ~/.cross-review/data/sessions -name .lock -type f -delete`.
+ Windows PowerShell:
+ `Get-ChildItem -Path ~/.cross-review/data/sessions -Recurse -Filter .lock -File | Remove-Item`.
+ 3. Restart the MCP hosts. They now spawn v4.1.0 cross-review which
+ manages locks as mkdir-atomic directories; the issue cannot
+ recur.
+
+ Trade-off: an extra one-time operator step at upgrade. The
+ alternative (best-effort auto-clean) was demonstrated unsafe
+ across four cross-review rounds. Operator burden of a single
+ `find` command is far less than the cost of any split-brain
+ corruption.
+
+ - Public MCP tool surface (`session_init`, `ask_peers`,
+ `run_until_unanimous`, etc.) is unchanged — all the async cascade
+ is internal.
+
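The fail-closed policy above can be sketched with Node's `fs` module alone. This is a simplified illustration, not the `proper-lockfile` implementation and not the package's `withSessionLock`: the helper name `acquireDirLock` is hypothetical, the error text is abbreviated, and the mtime heartbeat and stale detection are omitted.

```typescript
import * as fs from "node:fs";

// Hypothetical sketch: take the lock by creating a directory, and refuse
// to touch a regular file found at the same path.
function acquireDirLock(lockPath: string): () => void {
  try {
    // mkdir either creates the directory or fails; there is no window in
    // which the lock exists but is empty or half-written.
    fs.mkdirSync(lockPath);
  } catch (err) {
    if ((err as { code?: string }).code !== "EEXIST") throw err;
    // Something already sits at the path. lstatSync may itself throw
    // ENOENT if the holder released in between; the caller can retry.
    if (!fs.lstatSync(lockPath).isDirectory()) {
      // Legacy regular file: fail closed and never remove it here.
      throw new Error(
        `detected a pre-v4.1.0 lock file at ${lockPath}: ` +
          "stop all processes, remove the file, then restart",
      );
    }
    throw new Error(`lock already held at ${lockPath}`);
  }
  // The returned function releases the lock by removing the directory.
  return () => fs.rmdirSync(lockPath);
}
```

The sketch contains no code path that deletes whatever it finds at `lockPath`, which is the property the R1 to R4 rounds argue for: any inspect-then-remove sequence can race a v4.0.x process that recreates the file in between.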
+ ### Empirical validation
+
+ - `scripts/race-reproducer.mjs`: 4 procs × 5 rounds = 20/20 and
+ 8 procs × 10 rounds = 80/80 persisted with no losses under
+ multi-process contention against the shared `data_dir`.
+ - `scripts/race-legacy-holder.mjs`: five scenarios cover the legacy
+ matrix (live-pid+fresh-mtime, live-pid+stale-mtime, dead-pid,
+ empty-fresh-mtime, empty-stale-mtime). Every shape gets the
+ fail-closed remediation error; v4.1.0 never enters the CS, never
+ removes the legacy file, never mutates `meta.json`.
+ - `scripts/race-migration-toctou.mjs`: 3-process race orchestrated
+ to V41_A → LEGACY → V41_B. V41_A sees the planted stale dead-pid
+ file → fail-closed. LEGACY (v4.0.x simulator) clears the stale
+ file via v4.0.x's own openSync(wx) loop, claims `.lock` with its
+ live pid, holds an 8 s synthetic CS. V41_B sees LEGACY's new live
+ file → fail-closed. At end-of-CS, LEGACY's lockfile is present and
+ its content still names LEGACY's pid (no v4.1.0 deleted/replaced
+ it) — empirically demonstrating that the fail-closed policy
+ prevents the codex R3+R4 inspect+remove TOCTOU under live
+ cross-version operation.
+
  ## [v04.00.08] — 2026-05-16
 
  **Patch — eliminate the `js/file-access-to-http` CodeQL false positive
package/README.md CHANGED
@@ -21,7 +21,7 @@ npm install -g @lcv-ideas-software/cross-review
  npm install -g @lcv-ideas-software/cross-review --registry=https://npm.pkg.github.com
  ```
 
- **Status.** Stable. Current release: **v04.00.08** (npm package `4.0.8`). See
+ **Status.** Stable. Current release: **v04.01.00** (npm package `4.1.0`). See
  [CHANGELOG.md](./CHANGELOG.md) for the release history.
 
  > **Project renamed 2026-05-15.** This project was previously published as
@@ -36,6 +36,7 @@ The version history at a glance:
 
  | Release | Scope |
  |---|---|
+ | **`v04.01.00`** | **Minor — security hardening of session-store concurrency, write-path DoS surface, and credential redaction.** Closes three high-impact findings from an in-depth security audit of v4.0.8: (F1) `withSessionLock` switched from `fs.openSync(.., "wx")` + separate write to `proper-lockfile`'s `fs.mkdir`-based atomic locking, eliminating the multi-process TOCTOU race window where two host processes sharing the same `data_dir` could both enter the critical section and corrupt `meta.json`. (F2) `redactPrivateKeyBlocks` now redacts unterminated `-----BEGIN PRIVATE KEY-----` blocks to end-of-string instead of returning the original input unredacted — pre-v4.1.0 leaked partial keys to events.ndjson when logs were truncated mid-key. (F3) `writeJson`'s `renameSync` retry no longer busy-waits with `while (Date.now() - start < wait)` (which blocked the event loop for up to 310 ms under Windows AV stress); it now awaits a Promise-based timer so the event loop remains responsive during backoff. The cascading internal refactor (~22 SessionStore methods became async, ~80 internal call sites added `await`) preserves the public MCP tool surface unchanged. New runtime dep: `proper-lockfile` ^4.1.2. |
  | **`v04.00.08`** | **Patch — eliminate the recurring `js/file-access-to-http` CodeQL false positive at the source.** `scripts/verify-registry-dist.mjs` no longer reads `package.json` from disk; package name and version come from `PACKAGE_NAME` / `PACKAGE_VERSION` env vars (with `npm_package_name` / `npm_package_version` auto-injected by npm as a transparent fallback when invoked via `npm run release:verify-registry`). Both inputs are required; missing values throw a clear error before any network call. Removing the `fs.readFileSync` → outbound-fetch flow stops future CodeQL analyses from re-filing the same alert on every release. |
  | **`v04.00.07`** | **Patch — bounded npm registry fetch in the post-publish verifier.** `scripts/verify-registry-dist.mjs` now passes `signal: AbortSignal.timeout(30_000)` to the `https://registry.npmjs.org/<package>/<version>` `fetch` call so a slow or unreachable registry surfaces as a deterministic abort instead of hanging the publish workflow until its 60-minute ceiling. Timeouts map to an explicit `"npm registry lookup for <spec> timed out after 30000 ms"` error; the validated fields (`dist.shasum`, `dist.integrity`, `dist.tarball`) and the script CLI/env contract are unchanged. |
  | **`v04.00.06`** | **Patch — Windows-safe registry verifier.** `scripts/verify-registry-dist.mjs` now queries `https://registry.npmjs.org` directly instead of spawning `npm.cmd`, closing the Windows Node hardening failure (`spawnSync npm.cmd EINVAL`) while preserving the post-publish validation of registry `dist.shasum`, `dist.integrity`, and `dist.tarball`. |