npm - slash-do - Versions diffs - 2.12.0 → 2.13.0 - Mend

slash-do 2.12.0 → 2.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11) hide show

package/commands/do/review.md +36 -14
package/commands/do/scan.md +32 -17
package/install.sh +2 -1
package/lib/code-review-checklist.md +81 -10
package/lib/review-cross-file-contract.md +186 -0
package/lib/review-cross-file-tracing.md +17 -138
package/lib/review-security-audit.md +1 -1
package/lib/review-surface-quality.md +103 -0
package/lib/review-surface-scan.md +58 -71
package/package.json +1 -1
package/uninstall.sh +2 -1

package/lib/review-surface-quality.md ADDED Viewed

@@ -0,0 +1,103 @@
+# Surface Quality Review Agent
+## Mandate
+You review code for quality, conventions, intent vs implementation drift, and AI-generated code patterns visible within a single file. You catch issues a runtime-focused review would miss: stale documentation, dead config, missing tests, supply chain hygiene, style violations. You do NOT trace call chains across files (the cross-file agents handle that) and you do NOT audit runtime correctness (the surface scan agent handles that).
+## Approach
+Apply the checklist as a prompt for attention, not an exhaustive specification. Quality issues compound: a single dead-code path, untested helper, or stale comment is forgivable; the cumulative drag on maintainability is what matters. Lean toward flagging when in doubt — quality findings are usually cheap to fix while still in review.
+## Reading Strategy
+For each changed file, read the **ENTIRE file** (not just diff hunks). Compare claims (comments, docstrings, test names, status messages) against what the surrounding code actually does. Note items that look like AI-generated boilerplate (single-call abstractions, defensive code for impossible cases, placeholder cleanup callbacks) and verify each is justified.
+## Principles to Evaluate
+**YAGNI** — Flag abstractions, config options, parameters, or extension points that serve no current use case. Unnecessary wrapper functions, premature generalization (factory producing one type), unused feature flags.
+**Naming** — Functions and variables should communicate intent without reading the implementation. Booleans should read as predicates (`isReady`, `hasAccess`), not ambiguous nouns.
+**DRY** — Duplicated config/constants/helpers across modules drift over time. Extract to shared locations even when the duplication is small.
+## Checklist
+### Intent vs implementation (single-file)
+- Labels, comments, status messages describing behavior the code doesn't implement. Also covers factual doc drift: file paths/extensions (`foo.js` referenced when the file is `foo.jsx`), item counts ("13 widgets" when there are 15), default entity names ("Default" vs actual "Everything"), and route/response-shape comments that don't match what the handler returns. Verify every factual claim in a comment or JSDoc against the code it references
+- Inline code examples or command templates that aren't syntactically valid
+- Sequential numbering with gaps or jumps after edits
+- Template/workflow variables referenced but never assigned — trace each placeholder to a definition
+- Constraints described in preamble not enforced by conditions in procedural steps
+- Duplicate or contradictory items in sequential lists
+- Completion markers or success flags written before the operation they attest to
+- Existence checks (directory exists, file exists) used as proof of correctness — file can exist with invalid contents
+- Lookups checking only one scope when multiple exist (local branches but not remote)
+- Tracking/checkpoint files defaulting to empty on parse failure — fail-open guards
+- Registering references to resources without verifying resource exists
+- Composed instructions, prompts, system messages, or rule sets that vary by mode/role/context — unconditional clauses can contradict mode-specific directives (e.g., "always cite sources inline" combined with a `draft` mode that asks for "no preamble, no commentary"). Build the composition conditionally — include each block only for modes that want it — or define an explicit precedence so contradictions are predictable
+- Audit, lint, anomaly, and analytics tools that re-derive a domain concept (current cycle, active session, completed period) using a SIMPLIFIED heuristic must match the authoritative production logic — or they produce false positives/negatives. When production keeps a cycle active until `sellRatio < 1.0`, the audit can't use a 50% threshold. Either invoke the authoritative function/predicate, OR snapshot-test the audit against production-state fixtures. Same for suppression rules — gating "ignore this anomaly" on a stale signal hides real issues
+### UX integrity (single-component)
+- Unsaved changes / dirty state silently discarded when the user switches context in a multi-record editor or closes a sheet — data loss. Dirty-check on switch (inline confirm), auto-save drafts, or disable the switch control while dirty. `beforeunload` does not cover in-app context switches
+- Array index used as React `key={i}` on a list that's sliced (`logs.slice(-40)`), reordered, filtered, or has items dropped from either end shifts keys as items move, causing React to reuse DOM nodes for different entries — flicker, lost focus, stale tooltips, broken animations, selection bleed across rows. Use a stable identifier from the payload (`id`, `timestamp + event`, content hash)
+### Code complexity & PR scope
+- Functions with >10 branches or >15 cyclomatic complexity — refactor (extract early-returns, lookup tables, helper functions)
+- Overly broad changes that should be split into separate PRs (mixed refactor + feature, multiple unrelated concerns) — flag as a process smell so reviewers can request a split
+### AI-generated code quality
+- New abstractions, wrapper functions, helper files serving only one call site — inline instead
+- Feature flags, config options, extension points with only one possible value
+- Commit messages claiming a fix while the bug remains
+- Placeholder comments (`// TODO`, `// FIXME`) or stubs presented as complete
+- Unnecessary defensive code for scenarios that provably cannot occur
+- Cleanup callbacks (useEffect return, finalizer, dispose, signal handler) containing only comments are misleading — implement the cleanup or remove the callback entirely
+### Configuration & hardcoding
+- Hardcoded values when config/env var exists; dead config fields; unused function parameters
+- Duplicated config/constants/helpers across modules — extract to shared module. Watch for behavioral inconsistencies between copies
+- CI pipelines without lockfile pinning or version constraints
+- Production code paths with no structured logging at entry/exit
+- Error logs missing reproduction context (request ID, input params)
+- Async flows without correlation ID propagation
+### Supply chain & dependencies
+- Lockfile committed and CI uses `--frozen-lockfile`; no drift from manifest
+- `npm audit` / `cargo audit` / `pip-audit` — no unaddressed HIGH/CRITICAL vulns
+- No `postinstall` scripts from untrusted packages executing arbitrary code
+- Overly permissive version ranges (`*`, `>=`) on deps with breaking-change history
+### Test coverage
+- New logic/schemas/services without tests when similar existing code has tests
+- New error paths untestable because services throw generic errors
+- Tests re-implementing logic under test instead of importing real exports — pass even when real code regresses. Tests asserting by inspecting source code strings rather than calling functions
+- Tests depending on real wall-clock time or external dependencies (system `git`, `gh`, `python`, etc.) — environment-dependent flakiness; mock the subprocess interface (`child_process.spawn`) instead of relying on the binary being installed
+- Missing tests for trust-boundary enforcement
+- Tests exercising code paths the integration layer doesn't expose — pass against mocks but untriggerable in production
+- Test mock state leaking between tests — "clear" resets invocation counts but not configured behavior; use "reset" variants
+- Response/status assertions written as loose ranges (`status >= 400`, `status < 500`, `ok: false`) — a regression that turns a 400 validation failure into a 500 still passes. Assert the specific expected status so tests distinguish validation from server failure
+- Tests gated by `if (process.platform !== 'darwin') return` (or POSIX-only filesystem tricks like `chmod`-based permission failures) silently skip on CI runners with different platforms — the new code becomes effectively untested. Factor platform-specific behavior into pure functions, mock `fs/promises` directly to throw deterministically, or run multi-platform CI. `vi.spyOn(process, 'platform', 'get')` is brittle because `process.platform` is a value property — use `Object.defineProperty(process, 'platform', { value: '<os>', configurable: true })` and restore the original descriptor in cleanup
+- Tests that allocate temp directories (`mkdtempSync`, `mkdir`), spawn long-lived child processes, or write artifacts must clean up in `afterEach`/`finally` (e.g., `rmSync(dir, { recursive: true, force: true })`). Without cleanup, the OS temp dir accumulates over many test runs; concurrent test orderings can collide on shared paths
+- Tests that mutate global state inside the test body (`vi.useFakeTimers()`/`jest.useFakeTimers()`, monkey-patches, `Object.defineProperty(process, 'platform', ...)`, env-var overrides, `mock.module()` setup, frozen `Date.now`, intercepted `console.log`) and only restore at the END of the happy path leak the mutation into the next test when an assertion throws midway — a flaky cascade where one failure causes unrelated tests to misbehave. Restore in a `try/finally` block inside the test body, OR move setup/teardown into `beforeEach`/`afterEach` for the describe block so the framework guarantees cleanup regardless of assertion outcome
+- Tests whose name or description claims a behavior they don't actually assert (`'forwards lastImageFile'` that only checks `prompt` and `mode`) lie about the contract — the test passes even when the named behavior regresses. Either rename the test to match what's asserted or add the missing assertion
+- Tests asserting a specific validation rejection (negative number, oversized string, invalid format) must provide ALL OTHER required fields in valid form — otherwise the 400 the test sees comes from a different validation path (UUID failure, missing required field) and the intended rule is never exercised. Use a valid fixture for unrelated fields so the rejection is attributable to the field under test
+- Path assertions in tests using forward-slash literal substrings (`expect(p).toContain('/some/sub/path')`) fail on Windows where `path.join` produces backslashes. Use `path.join()` in expectations (`expect(p).toContain(join('some', 'sub', 'path'))`), assert the suffix via `endsWith(join(...))`, or use a separator-agnostic regex (`/some[\/\\]sub[\/\\]path/`) — otherwise the test is silently macOS/Linux-only despite running on a Windows CI matrix
+- Refactors that consolidate logic into a SINGLE helper that becomes the source of truth for a critical shape (`buildXPayload`, `serializeY`, `formatZ`) — this helper deserves explicit unit-test coverage on its output fields/shape, because its bugs propagate to every consumer (engine, IPC, route fallback, websocket emitter) without a localized symptom. After extracting a shared builder, add field-by-field assertions on the helper's output and at least one test per call site that exercises end-to-end emission
+### Automated pipeline discipline
+- Internal code review must run before creating PRs — never go straight from "tests pass" to PR
+- Copilot review must complete before merging
+- Automated agent output must be reviewed against project conventions
+### Style & conventions
+- Naming and patterns inconsistent with rest of codebase
+- New content not matching existing indentation, bullet style, heading levels. Within a single structured file (changelog, README, TOML config), section headers must be unique — duplicate `## Fixed` blocks or repeated table sections are a merge artifact that splits content downstream tools expect to find under one header. Consolidate
+- Shell instructions with destructive operations not verifying preconditions first
+## Output Format
+For each finding:
+```
+file:line — [CRITICAL|IMPROVEMENT|UNCERTAIN] description
+Evidence: `quoted code line(s)`
+```
+Only report verified findings with quoted code evidence. If you cannot quote specific code for a finding, mark as [UNCERTAIN].

package/lib/review-surface-scan.md CHANGED Viewed

@@ -1,7 +1,7 @@
 # Surface Scan Review Agent
 ## Mandate
-You review code for per-file correctness: bugs, quality issues, and convention violations visible within a single file. You do NOT trace call chains or data flows across files — another agent handles cross-file analysis.
+You review code for per-file RUNTIME CORRECTNESS: bugs that crash, fail, mishandle data, or produce wrong output at runtime, plus domain-specific runtime patterns. You do NOT trace call chains across files (the cross-file agents handle that) and you do NOT audit code quality / conventions / tests / documentation drift (the surface quality agent handles that).
 ## Approach
@@ -10,11 +10,17 @@ Apply the checklist as a prompt for attention, not an exhaustive specification.
 ## Reading Strategy
 For each changed file, read the **ENTIRE file** (not just diff hunks). New code interacting incorrectly with existing code in the same file is a common bug source. Review one file at a time.
-## Principles to Evaluate
+## Out of Scope (handled by sibling agents)
-**YAGNI** — Flag abstractions, config options, parameters, or extension points that serve no current use case. Unnecessary wrapper functions, premature generalization (factory producing one type), unused feature flags.
+The following concerns are explicitly NOT this agent's job — do not flag them here:
-**Naming** — Functions and variables should communicate intent without reading the implementation. Booleans should read as predicates (`isReady`, `hasAccess`), not ambiguous nouns.
+- **Code quality / conventions / naming / YAGNI / dead code** — handled by the Surface Quality agent
+- **Tests / documentation drift / commit hygiene** — handled by the Surface Quality agent
+- **Cross-file call chains, state lifecycle, concurrency** — handled by the Cross-File Tracing agent
+- **Schemas / validation / error classification / architectural pattern adherence** — handled by the Cross-File Contract agent
+- **Secrets, auth, supply-chain, prompt-injection** — handled by the Security Audit agent
+If a concern fits one of the categories above, drop it from your output rather than co-flagging — that overlap is exactly what this split was intended to remove.
 ## Checklist
@@ -22,7 +28,6 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 **Hygiene**
 - Leftover debug code (`console.log`, `debugger`, TODO/FIXME/HACK), hardcoded secrets/credentials, uncommittable files (.env, node_modules, build artifacts, runtime-generated data/reports)
-- Overly broad changes that should be split into separate PRs
 **Imports & references**
 - Every symbol used is imported (missing → runtime crash); no unused imports. Also check references to framework utilities (CSS class names, directive names, component props) — a non-existent utility class or prop name silently does nothing
@@ -37,11 +42,24 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 - Parallel arrays coupled by index position — use objects/maps keyed by stable identifier
 - Shared mutable references: module-level defaults mutated across calls (use `structuredClone()`); `useCallback`/`useMemo` referencing later `const` (temporal dead zone); spread followed by unconditional assignment clobbering spread values
 - React state invariants (uniqueness, cap/floor, monotonicity) checked against render-time value before `setX(...)` — rapid events or concurrent updates race the check. Move into the functional updater: `setX(prev => prev.includes(id) ? prev : [...prev, id])`
+- Bound-derived state values (current time, scroll position, focused index, selected page) that are bounded by another piece of state (total duration, list length, page count) must be clamped when the bound shrinks. Removing a clip / deleting an item / tightening trims can leave the derived value past its new upper bound, and downstream readers (preview seekers, `arr[index]`, `findClipAt(t)`) return garbage / past-end results / undefined elements. Add a `useEffect` keyed on the bound that clamps the derived value (and pauses playback / clears selection if appropriate)
 - `useEffect` depending on state it writes — the write retriggers the effect (infinite loop / request storm). Split into two effects, drop the self-written value from deps, or use a functional setter that doesn't require the current value in deps
-- Functions with >10 branches or >15 cyclomatic complexity — refactor
 - String accumulation via `+=` inside high-frequency loops (streaming frames, chunked I/O, per-event handlers) is O(n²) for long outputs and triggers React re-renders on growing payloads. Collect chunks into an array and `join('')` at the end while still emitting per-chunk events
+- Sorting an entire collection just to find a single max/min/first-matching element is O(n log n) when a single pass is O(n) — `[...items].sort(byDate)[0]` over thousands of items burns CPU on every render. Use a linear scan that tracks the current best (`reduce`, manual `for` loop with comparator). Especially impactful when invoked per-render or per-component (collection card, list row)
 - Server-side string formatting (`toLocaleString`, `toLocaleDateString`, currency/number formatters) that depends on locale/timezone defaults produces non-deterministic outputs across deployments. For data flowing into prompts, logs, or persisted records, format with explicit `Intl.DateTimeFormat({ timeZone: ... })` or ISO strings; reserve locale-aware formatting for user-visible UI layers
 - Required-at-use-time config values (model name, API key, endpoint URL, default selection) that may be null/undefined in source data must be validated at the boundary before invoking the downstream API. Otherwise the API responds with an opaque error far from the user's intent. Emit a clear, actionable error identifying the missing field
+- Optional-chain incompleteness through dereference chains: `obj?.a.b.c` only guards `obj` — every subsequent `.b` / `.c` still throws if `a` or `b` is null. Extend optional chaining through every dereference (`obj?.a?.b?.c`), or destructure with defaults at the boundary. Common in helpers wrapping CLI output (`res?.stdout.split(...)` crashes when stdout is missing)
+- Temp filenames derived from `Date.now()` collide when concurrent operations start in the same millisecond — corrupted output, mid-flight `unlink` of another request's file. Use `randomUUID()` (combined with `process.pid` for cross-process isolation), `fs.mkdtemp` for per-request scratch dirs, or a shared atomic counter. Same caveat for any filename pattern using a clock reading or a non-unique sequence
+- Cross-platform `fs.rename` "destination exists" handling that destroys the user's output on failure. POSIX `rename` atomically replaces the destination; Windows `fs.rename` rejects when the destination exists. Naive Windows fix (`unlink(dest); rename(tmp, dest)`) is dangerous: if `rename` fails (file lock, AV scan, transient permissions), the original is gone and only the temp file remains. Use a backup-aside rollback: rename original to `<dest>.bak.<uuid>`, rename temp into place, unlink the .bak on success; on failure, rename .bak back. Treat `ENOENT` on the initial backup rename as "no original existed." Same caveat for any optimization-style replace (faststart remux, sidecar updates) — and never silently swallow rename errors: if the rename throws, unlink the temp file in the catch (so failed optimization doesn't slowly leak `.fs.tmp` / `.bak` files) and log a warning so operators know the optimization didn't apply
+- Cache miss for falsy successful values: `if (cache[key]) ... else cache[key] = compute(...)` re-computes whenever the cached value is `''`, `0`, or `false` — falsy successful results are never hit. Use `if (key in cache)` (or `Object.hasOwn(cache, key)`) so the *presence* of the key, not its truthiness, controls the cache hit. Same caveat applies to `Map.get(key) ?? compute()` when the cached value can be falsy
+- Browser storage APIs (`sessionStorage`/`localStorage` `setItem`/`getItem`, IndexedDB) and `JSON.parse` on stored values can throw — Safari private mode, quota exceeded, disabled cookies, corrupted data, or older schema all surface as runtime exceptions during render or effects. Wrap reads/writes in try/catch and validate the parsed shape (string/object/required fields) before consuming; storage failure must not crash the page or trip the nearest error boundary
+- LLM tool-call / palette / agent-invoked function parameters commonly arrive as JSON-typed strings even when the schema declares `number`/`boolean` (the calling LLM serialized them as strings). Pass-through patterns like `width: width || undefined` forward `'1024'` as a string into downstream APIs. Coerce explicitly (`Number(v)`, `Number.isInteger`, range guards) and return a structured error on invalid coercion — bypassing route-layer Zod schemas means the tool's own handler must enforce the same contract
+- Boolean flags surfaced to clients (`feature.enabled`, `mode.active`) must reflect whether the feature is CONFIGURED, not whether transient runtime state happens to be present right now. Deriving `enabled` from "are there active items" causes the flag to flip between cycles/fills/sessions even though the feature is still on. Source from config; report runtime presence in a separate field (`hasActiveItems`, `currentCount`)
+- Producer emits `null` or `0` as an "unknown" sentinel — consumers compute against it as if it were real (`Date.now() - placedAt` with `placedAt: null` becomes epoch → "decades old" age display; `qty * lastPrice` with `lastPrice: 0` zeroes out P&L/APY). Either omit the field, use an explicit `unknown: true` / `stale: true` marker, or document and verify each consumer guards with `!= null` before computing
+- Tracking sets/arrays/maps that only GROW during process lifetime — settled-id sets, seen-event fingerprints, processed-task ids — leak monotonically with workload. Long-lived services (workers, daemons, market-data feeds) accumulate over weeks and OOM. Cap with size limits + LRU eviction, periodic age-based pruning, OR scope to a smaller window (per-cycle, per-day) when feasible
+- State machines that handle only terminal events (`FILLED`, `COMPLETED`, `SETTLED`) ignore intermediate progress fields (`filledSize`, partial completion, in-progress counters) the producer exposes between start and terminal. Consumers that want live progress need the intermediate values too — process them or document the lag
+- File-existence checks for "must be a regular file" — `existsSync(path)` returns true for directories, symlinks, and special files. When downstream code needs a regular file (image, config, manifest, ffmpeg input), use `statSync(path, { throwIfNoEntry: false })?.isFile()` and reject `.`, `..`, and empty basenames before passing to subprocess args / `readFile`. Wrap `statSync` in try/catch when invoked from request validation paths so transient permission/FS errors surface as clean 4xx, not unhandled 500s
+- Cache-validity checks using `existsSync` alone return zero-byte / partial / corrupted files from prior failed attempts as valid cache hits. When the cache is a file written by a subprocess that may crash mid-write (ffmpeg frames, downloaded artifacts, generated thumbnails), validate the cached file is non-empty (`statSync(path).size > 0`) and consider a magic-byte / format check; on extraction failures, unlink the partial output before returning so the next call re-runs cleanly. Same caveat for "fall through to re-extraction on stat error" comments — wrap the `statSync` in try/catch and treat any error (EACCES, transient I/O, ENOENT) as cache miss, not as a fatal abort
 **API route basics**
 - Route params passed to services without format validation; path containment using string prefix without separator boundary (use `path.relative()`). When sibling endpoints validate body/query fields, the path param must be validated with the same schema — skipping param validation on one endpoint turns schema violations into 404/500 instead of 400
@@ -49,9 +67,13 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 - Stored or external URLs rendered as clickable links without protocol validation — allowlist `http:`/`https:`
 - Request schema fields for large string/binary payloads (base64, file content, free text) without per-field size limits — a total body-size limit alone doesn't prevent individual oversized fields from consuming excessive memory or exceeding downstream service limits; add per-field `max(N)` constraints with clear error messages
 - Character-class regex validators (`^[a-z0-9-]+$`) claiming to enforce a structured format (slug, kebab-case, reverse-DNS) — they accept leading/trailing separators (`-foo`, `foo-`) and repeated separators (`a--b`). Require alnum boundaries or use a parser
-- `z.string().min(1)` without `.trim()` accepts whitespace-only values for user-visible names — use `z.string().trim().min(1)` when the field represents a human-readable identifier
+- Schema-validator chain ordering — checks run in chain order, so `.trim()` AFTER `.min(1)` (`z.string().min(1).max(200).trim()`) means whitespace-only strings pass the non-empty guard and get trimmed to `''` afterward, defeating the intent. Always order as `z.string().trim().min(1)` for human-readable names AND for nullable foreign-key IDs (`folderId`, `parentId`, `workId`): `z.string().max(N).nullable()` without `.trim().min(1)` accepts `''` and persists it as the ID — service layers treat `''` as falsy (skipping existence checks) but still write it, breaking lookups and grouping invariants. Tighten to `z.string().trim().min(1).max(N).nullable()` (or preprocess `''` → `null` at the schema boundary). Handler-side trim normalization (`patch.title = patch.title.trim()`) must additionally reject post-trim emptiness — a PATCH with `"   "` trims to `''` and persists a blank value even when create requires non-empty
+- HTTP query parameters can arrive as arrays for repeated keys (`?id=a&id=b`) in most server frameworks (Express, Fastify, etc.) — a route that types `req.query.id` as a string and passes it through unchecked may treat the array as `undefined` (silently dropping the filter and returning UNFILTERED data, a quiet authorization/leakage bug) or pass the array shape downstream. Validate explicitly at the route boundary: accept only `typeof === 'string'`, take the first element when arrays are intended, or reject with 400. Apply via a Zod query schema covering every typed query field
 - Object spread of a potentially-null/undefined boundary value (`{ ...req.body, id }`) — throws a TypeError and surfaces as 500 instead of 4xx. Use `{ ...(req.body ?? {}), id }` at request/boundary entry points
+- HTTP header values that are case-insensitive by spec (`Content-Type`, `Accept`, `Authorization` scheme, `Transfer-Encoding`) must be lowercased before comparison — `startsWith('multipart/form-data')` against `Multipart/Form-Data; boundary=...` returns false and skips the middleware. Parse parameterized values structurally (`mimeType.split(';')[0].trim().toLowerCase()`) rather than substring-matching the raw header
 - Schemas accepting paired range fields (`startDate`/`endDate`, `min`/`max`, `from`/`to`) without a cross-field refinement (`zod .refine()`) — accepts inconsistent ranges (start > end). Define deterministic rules when only one bound is supplied
+- PATCH/update endpoints accepting an empty body (or only metadata fields like `expectedUpdatedAt`) pass validation, write-through bumps `updatedAt`, and produce needless write churn + optimistic-concurrency conflicts. Add a `.refine()` requiring at least one mutating field, OR have the service no-op when the patch is empty
+- Truthy-only guards on concurrency tokens / etags / version stamps (`if (expectedUpdatedAt && ...)`) skip the conflict check whenever the client sends an empty string, allowing overwrites of newer data. Use `!= null` (or explicit string validation that rejects empty/whitespace) so "provided but empty" surfaces as a 400, not a silent bypass
 - Outbound payloads (snippets, previews, cached excerpts) stored in persisted records or sent over streaming protocols without size caps — applies the same per-field max constraint as inbound payloads, enforced at capture time so display-layer truncation isn't the only defense
 **Async & state (single-file patterns)**
@@ -67,6 +89,11 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 - Caches that store negative/error results (`null`, "not found", probe failure) without a TTL or invalidation hook — when a user installs the missing dependency mid-runtime (ffmpeg, python venv, model file), the cache reports "still missing" until process restart. Cache only successful lookups, OR use a short TTL for negatives, OR re-probe on demand when the cached value is the negative sentinel
 - Late-connecting clients to long-running async jobs (SSE, WebSocket subscribe-by-id) receive nothing if they connect after the terminal `complete`/`error` broadcast — the server emitted once and moved on. Persist the most-recent (or terminal) payload on the job and emit it immediately on attach, OR document that subscribers must connect before kicking off the job and update any "late connectors will get the final state" comments accordingly
 - Server returning an empty success payload (`200` with `{ images: [] }`, `{ items: null }`, etc.) when an awaited operation succeeded but the artifact fetch failed — clients treat empty as "no work to show" and never surface the underlying error. After awaiting completion, a missing/unreadable artifact is an internal error: return non-2xx with a structured error, never an empty 200
+- DOM event handlers (`onMouseLeave`, `onBlur`, `onPointerLeave`, key handlers) registered via JSX close over render-time state — so `if (pressing) endPress()` inside the handler reads the value from the closure, not the live state, and may see stale `false` even when the user is still holding. Use refs (`pressingRef.current`) for liveness flags read inside DOM handlers, OR call cleanup methods unconditionally and have them no-op if not active
+- Synchronous re-entrancy guards via React state — `if (saving) return; setSaving(true); await save()` is unsafe because `setState` is async and rapid repeated triggers (key-repeat Cmd/Ctrl+S, double-click, debounce overlap, two paths to the same handler) all observe `saving === false` and fire concurrent operations. Use a synchronous mutable ref (`savingRef.current = true` set BEFORE awaiting; reset in `finally`) for the gate, and reserve React state purely for rendering. Same pattern applies to any "in-flight" boolean used as a mutex
+- Controlled-input rehydration from a parent prop — components that hold local state derived from `prop` and only sync via `useEffect(() => setLocal(prop.field), [prop.id])` MISS prop changes within the same parent identity (switching draft versions on the same work, switching tabs on the same record, server normalizing the field). Track every identity discriminator that should trigger rehydration (`[work.id, work.activeDraftVersionId]`), AND sync after server normalization — when a save returns a normalized value (trimmed, lowercased, server-canonical), update local state from the response, not just from optimistic input. Also: a `<select value={prop.status}>` controlled by the prop with no local state can snap back to the old value on re-render before the PATCH resolves — keep a local state mirror and update it in `onChange`
+- Sorted-list mutations via in-place patch (`items.map(x => x.id === id ? merged : x)`) leave the list visually sorted by the OLD order when the mutation changes the sort key (`updatedAt`, priority, score) — a recently-saved item stays mid-list instead of moving to the top. After mutating the sort key, re-sort the array (or splice the item to its new position). Same for filter predicates that may now exclude the item, and for selection sets that may need pruning when the mutation changes the entity's group/tab
+- Reload-on-update fallback that replaces richer state with partial input — handlers like `if (opts.reload) { fresh = await refetch(); setState(fresh) } else { setState(updated) }` that fall back to `setState(updated)` ON REFETCH FAILURE blank fields the previous state had (full body, comments, derived metadata, joined relations) when the input lacks them. Either keep the previous state on refetch failure, retry, or surface the error explicitly — never replace richer state with the partial input that triggered the reload as a "recovery" path
 **Streaming response handlers (server-side)**
 - Long-lived streaming handlers (SSE, chunked HTTP) must register a client-disconnect listener (`req.on('close')`/`req.on('aborted')`), set an `aborted` flag, and check it before subsequent writes — otherwise the server keeps emitting frames and incurring `write-after-end` errors after the client navigates away
@@ -74,14 +101,20 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 - Per-request timeouts on streaming responses must remain active for the full duration of stream consumption, not cleared on initial fetch resolution — a stalled upstream that keeps the connection open hangs the consumer indefinitely
 - Honor write backpressure: check the boolean return of `write()` and await `'drain'` when it returns false; otherwise a slow client causes unbounded server-side buffering
 - When attaching paired listeners for backpressure or completion (`drain` + `close`), the cleanup handler must remove ALL of them — asymmetric removal accumulates listeners across slow-client cycles
+- EventEmitter / socket cleanup using inline anonymous handlers (`socket.on('foo', () => { ... })`) cannot be removed precisely later — assign the handler to a named const (`const handleFoo = () => {...}; socket.on('foo', handleFoo); return () => socket.off('foo', handleFoo);`). On shared emitters/sockets/buses, NEVER call `.off(event)` without the specific handler — that removes ALL listeners on the emitter, breaking sibling subscribers (e.g., other components listening to the same socket event)
+- State-reset methods (`clear()`, `reset()`, `dispose()`, mode-switch handlers) must release every related artifact, not just the data they track: timers (`clearTimeout`, `clearInterval`), refs (`pressingRef.current = false`), audio playback (`stopTone()`), pending press state, animation frames, and any boolean flags blocking re-entry. A `clear()` that resets only the dataset leaves dangling timers firing on stale state, audio playing forever, and a stuck `pressingRef` that prevents the next interaction
 - Named lifecycle events on streams (`error`, `done`, `complete`) must be MUTUALLY EXCLUSIVE — after emitting `error`, do NOT also emit `done`. Otherwise clients parsing the last event treat failed runs as completed
 **Error handling (single-file)**
 - Swallowed errors (empty `.catch(() => {})`); error handlers that exit cleanly (`exit 0`, `return`) without user-visible output; handlers replacing detailed failure info with generic messages
 - Error discrimination by string matching (`err.message.includes('not found')`, regex on error text) — localization, refactors, or wrapper rewrites silently change HTTP status / retry behavior. Use explicit error codes or typed classes
 - Route handlers mapping any exception from a service into a single HTTP status (e.g., `catch { throw new NotFoundError() }`) — hides real server errors (file I/O, parse, write failures) as domain 404s. Map only known codes/classes; let unknown errors surface as 500
+- Catch-all fallback that synthesizes a SUCCESS-shaped payload indistinguishable from a legitimate quiet state — e.g., catching any IPC/RPC failure and returning `{ success: true, status: { isRunning: false } }` so the dashboard renders "engine cleanly stopped" whether the engine is actually stopped, timing out, or crashed. Operators can't distinguish "expected idle" from "outage". Either preserve a distinct health/mode value the UI recognizes as "outage" AND ensure every consumer maps it, OR translate the underlying error to a specific HTTP status. Don't gate the fallback on broad rejection types — distinguish timeout, connection-refused, and method-not-found
+- State-machine transitions gated on an external call (settle a tracked order, mark a job complete) must verify success: a thrown error, an empty array where data was expected, or a null where a record was expected MUST keep the state in `pending`, NOT advance it. A swallowed external-call failure followed by `state = 'settled'` loses ledger entries and erases work that was never confirmed. Distinguish "external call definitively reports terminal state" from "external call failed to populate the data we needed"
 - Error wrappers that re-throw with only `{ status }` and drop `code`/`context`/`cause` — downstream consumers see generic `INTERNAL_ERROR` instead of the specific code. Preserve structured detail when wrapping
 - Errors thrown from middleware/parser modules (multipart, body parsers, validators) without `err.status` set are normalized to HTTP 500 by the framework's default error handler — but they typically represent client payload issues (bad multipart, payload too large, file type rejected, missing boundary). Set `err.status = 400` (or 413 for size limits, etc.) and a stable `err.code` (`PAYLOAD_TOO_LARGE`, `INVALID_MULTIPART`, `VALIDATION_ERROR`) at the throw site, OR throw a typed `ServerError`/`ApiError`, so clients distinguish their bad input from real server failures
+- Error message templates that interpolate possibly-empty values (`${stderr.split('\n')[0]}`, `${err.code}`, first-line excerpts) produce useless output (`X failed: .`, `X failed: undefined`) when the upstream returns blank/whitespace/missing. Trim the source, fall back to a default (`err.message`, `'unknown error'`) when empty, and skip trailing punctuation that piles on already-punctuated content (`needsPeriod = !/[.!?]$/.test(text)`)
+- JSDoc / comment claims of absolute behavior (`Never throws`, `Always returns`, `Synchronous`, `Idempotent`, `Pure`) must be verified against the implementation. A "Never throws" helper that calls `mkdirSync`/`readFileSync`/`atomicWrite` will throw on permissions/disk-full and crash callers that built error handling around the documented contract. Either soften the doc to specify which failure modes are handled (e.g., "expected provisioning failures surface as `{ ok: false }`; unexpected FS errors may still throw") OR wrap the throwing operations in try/catch and return a structured failure
 - Outbound `fetch()` / HTTP calls in setup, install, or update scripts (`scripts/*.js`, `setup.sh` invoked tools, post-install hooks) without an `AbortController` per-request timeout — a hung server (accepts connection, never responds) blocks the parent shell process indefinitely, breaking "fail-soft" guarantees that the parent script depends on. Use the same timeout helper the rest of the codebase uses for outbound HTTP, and treat timeout as a skip with a clean exit code
 ### Domain-Specific (check only when file type matches)
@@ -106,7 +139,7 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 **Lazy initialization** _[dynamic imports, lazy singletons, bootstrap]_
 - Cached state getters returning null before initialization
-- Module-level side effects (file reads, SDK init) without error handling
+- Module-level side effects (file reads, SDK init) without error handling. Loaders for user-editable config (`media-models.json`, settings, registries) invoked at import-time must wrap BOTH the read (`readFileSync` can throw on permissions, broken symlinks, transient I/O) AND the parse (`JSON.parse` throws on malformed edits) in try/catch — an uncaught exception during import aborts server boot. Beyond IO/parse: parsed-but-malformed data (missing top-level keys, wrong types like `image: {}` instead of `image: []`) crashes downstream consumers (`reg.video.macos` is undefined → TypeError). Normalize the parsed object: deep-merge with defaults, coerce array-typed fields with `Array.isArray(...)`, coerce object-typed fields with a `isPlainObject` check, and fall back to defaults with a clear log message when shapes don't match
 - File writes assuming parent directory exists
 - Bootstrap code importing dependencies it's meant to install — restructure so install precedes resolution
 - Re-exporting from heavy modules defeats lazy loading
@@ -123,6 +156,10 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 - Detached processes with piped stdio — SIGPIPE on parent exit. Use `'ignore'`
 - Subprocess output buffered without size limits — unbounded memory growth
 - Platform-specific: hardcoded shell interpreters; `path.join()` backslashes breaking ESM imports — use `pathToFileURL()`
+- Subprocess spawns of binaries with platform-or-distro-dependent names (`pwsh` vs Windows PowerShell `powershell.exe`, `python3` vs `python`, `gh` vs absent on minimal containers, `tailscale` vs path-bundled) must probe (`which`/`where`) and fall back to alternates rather than assuming the modern/preferred name is installed — many Windows boxes ship Windows PowerShell only, distro-stripped Linux containers may lack `python3`. Detect once at startup or per-call; emit a clear actionable error if no candidate is found
+- Platform guards using inverse logic (`if (!IS_WIN && ...)`) when the real condition is "macOS only" run platform-specific binaries on Linux/BSD/Unix where they don't exist (`caffeinate` is macOS-only, `pmset` is macOS-only, `nvidia-smi` requires NVIDIA hardware). Use the positive predicate (`process.platform === 'darwin'`, `IS_MAC`) — every "not Windows" check is wrong on every other non-Windows OS. The same applies to "not macOS" checks meant for Windows-only behavior
+- Setup/install scripts that switch between multiple packages providing the same import path (renamed packages, fork swaps, deprecated→replacement) must explicitly uninstall the predecessor before installing the new one (`pip uninstall -y <old>` before `pip install <new>`). Otherwise both remain installed and Python's import resolution becomes ambiguous (or the wrong one wins on path order). Idempotent: suppress errors from the uninstall step so reruns don't fail when the old package is already absent
+- Case-sensitive string operations on filesystem paths break on case-insensitive filesystems (Windows always, macOS HFS+/APFS by default). Deriving a sibling binary path via case-sensitive `replace(/ffmpeg$/, 'ffprobe')` against a real filesystem path that may be `FFMPEG.EXE` produces an unchanged path → caller spawns the wrong binary and `existsSync` confirms it exists. Use case-insensitive regex flags (`/ffmpeg$/i`) and/or derive the sibling via `path.parse()` + `path.format()` so the result is structurally guaranteed to be the intended binary, not a string that happens to match a pattern
 - Naive whitespace splitting of command strings breaks quoted arguments — use proper argv parser
 - Subprocess output parsed from single stream (stdout or stderr) to detect conditions — check both streams and exit code
 - Readiness/health probes that rely solely on subprocess exit code without inspecting output — many CLIs (`psql`, `curl`, `kubectl`) exit 0 for empty results, missing schema, or auth-only handshake. Capture stdout and verify it contains the expected marker. For tools that read user-level config (`.psqlrc`, `~/.curlrc`), pass flags that ignore those files (`-X`, `--no-rcfile`) so the probe behaves the same in every environment
@@ -131,6 +168,9 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 - Arguments passed via process argv have OS-imposed length limits (notoriously low on Windows, ~32KB). For variable-length payloads (prompts, JSON blobs, file contents), pipe via stdin instead of constructing a long argv. If argv must be used, enforce a strict cap and fail with a clear message before spawning
 - PowerShell `$LASTEXITCODE` propagates from any external call and is read by the script's final exit. A step claiming to be "fail-soft" (e.g., a non-essential post-install hook) that runs an external command without explicitly resetting `$LASTEXITCODE = 0` (or wrapping in try/catch with `$global:LASTEXITCODE = 0`) leaks a non-zero exit code from the soft step into the parent script's overall exit status — breaking the fail-soft contract that callers depend on
+**Multi-input pipelines** _[ffmpeg concat, audio mixers, image compositors, set unions]_
+- Pipelines that join multiple input streams (ffmpeg `concat`, audio mixers, image compositors, SQL `UNION ALL`) require matching parameters across ALL inputs (sample rate, channel layout, sample format, frame rate, dimensions, column types). When one branch normalizes (`aresample=48000,aformat=fltp`, dim coercion) and a sibling branch (silent track from `anullsrc`, fallback empty stream, generated placeholder) does not, the join fails at runtime ("Input link parameters do not match") or produces uneven output. Apply the SAME normalization across regular AND fallback/generated branches; if the pipeline is variable-rate, force a canonical rate (`fps`, `settb`, `aresample`) on every input
 **Search & navigation** _[search, deep-linking]_
 - Search results linking to generic list pages instead of deep-linking to specific record
 - Search code hardcoding one backend when system supports multiple
@@ -140,6 +180,10 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 **Accessibility** _[UI components, interactive elements]_
 - Interactive elements missing accessible names, roles, or ARIA states — including labels removed or replaced with non-descriptive placeholders in conditional/compact rendering modes. ARIA attributes should match established patterns used elsewhere for the same widget type (disclosure, menu, dialog)
+- Icon-only buttons relying on `title` for their accessible name fail across screen readers — `title` is announced inconsistently or not at all, and not surfaced to assistive tech in many contexts. Add an explicit `aria-label` to the button and mark the icon `aria-hidden="true"` since it's decorative
+- Form submission via Enter calls `onSubmit` regardless of the submit button's disabled state — submit handlers must replicate every guard the disabled state enforces (`notConnected`, missing prerequisite, validation failure, in-flight). Either set the inner button to `type="button"` (so Enter doesn't submit) and submit only via the explicit handler, or duplicate every disabled-condition into the submit handler's early-return guard
+- `<input type="number">` (or any text input) controlled with format-on-render (`value={n.toFixed(2)}`, `value={String(parsed)}`) prevents users from typing intermediate states ("", "0.", "-", "1e") because every keystroke snaps the value back to a formatted number. Store the raw string in local state and only parse/clamp on `onBlur` or Enter, OR use `defaultValue` with an uncontrolled input and read on commit — controlled + format-on-render is broken for free-form numeric entry
+- Form inputs that mutate the source-of-truth prop (`onChange={e => project.name = e.target.value}`, `onChange={e => setProject({ ...project, name: e.target.value })}`) on every keystroke break dirty-checks that compare against the same prop on `onBlur` (`if (trimmed === project.name) return`) — the guard is always true, so the save never fires. Use a separate draft state OR compare against the last-saved/server-confirmed value (e.g., a ref capturing the value at mount), and trigger the save when the draft differs from the saved snapshot
 - ARIA roles applied without the keyboard interactions they imply — `role="menu"`/`menuitem*` expects roving focus, arrow-key navigation, Escape scoped to the menu, and focus management; `role="listbox"` expects Home/End/typeahead; `role="dialog"` expects focus trap + return focus. Either implement the full interaction pattern or drop to a simpler one (native `<button>` + disclosure)
 - Nested inputs handling `Escape`/`Enter`/`ArrowUp`/`ArrowDown` inside a modal/form that also handles the key at the ancestor — the event bubbles and the ancestor fires too (closes modal, submits form). Call `e.stopPropagation()` (and usually `preventDefault()`) in the inner handler
 - Custom toggles from non-semantic elements instead of native inputs
@@ -148,75 +192,18 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 **UI performance** _[UI components with streaming, scroll, or frequent updates]_
 - Event handlers or `useEffect` callbacks firing on every high-frequency event (streaming deltas, scroll, resize, keydown) without throttle, debounce, or `requestAnimationFrame` batching — causes jank and excessive re-renders. Batch with rAF or a time-based limiter when the handler doesn't need to run on every tick
+- Global event-listener `useEffect`s (`addEventListener('keydown'/'mousemove'/'resize')`) whose dependency array includes a rapidly-mutating object or callback — the listener detaches and re-attaches on every change, churning DOM and racing in-flight events. The cleanup also runs on every re-attach, so any timers/refs the cleanup clears (`flushTimerRef`, `wordTimerRef`, audio nodes) get reset mid-interaction. Stabilize: keep changing values in a `ref` read inside a stable handler, and depend only on truly listener-relevant inputs (`enabled`, route key) — not on the live filter object or per-press state
+- Background polling (`setInterval(asyncFn, N)`, recurring `fetch` chains) must (a) suppress per-iteration error toasts via the codebase's `silent` flag (or equivalent) — transient failures otherwise spawn toast storms; (b) guard against overlapping in-flight requests with a ref boolean or convert to a `setTimeout`-after-resolve loop. `setInterval(asyncFn, N)` produces overlapping requests when N < response time, leading to out-of-order state updates and resource pile-up
+- Background polling whose data appears in long-lived UI surfaces (HUD counts, headers, dashboards) must subscribe to live event streams (sockets, pub/sub) rather than re-polling on a fixed cadence — one-shot fetches at mount go stale immediately, and polling alone misses bursty changes between intervals
-**Wire-protocol parsers** _[SSE/NDJSON/line-delimited frame parsers, multipart, etc.]_
-- Wire-protocol parsers must (a) handle the spec's full set of separators (e.g., both `\n\n` and `\r\n\r\n` for SSE; multiple `data:` lines joined with `\n`); (b) flush remaining buffered content on EOF — otherwise the last frame is dropped when upstream closes mid-frame; (c) wrap per-frame deserialization (`JSON.parse`) so a single malformed frame doesn't terminate the entire stream
+**Wire-protocol parsers** _[SSE/NDJSON/line-delimited frame parsers, multipart, source-code/config parsers, etc.]_
+- Wire-protocol parsers must (a) handle the spec's full set of separators (e.g., both `\n\n` and `\r\n\r\n` for SSE; multiple `data:` lines joined with `\n`); (b) flush remaining buffered content on EOF — otherwise the last frame is dropped when upstream closes mid-frame; (c) wrap per-frame deserialization (`JSON.parse`) so a single malformed frame doesn't terminate the entire stream. Per-token regexes that scan stream chunks (line-extracting `data: foo` from raw chunks) miss matches when the producer splits a single line across chunks — buffer with a rolling buffer or use `readline` to parse complete lines before applying token regexes
+- Hand-rolled parsers for source-code or config formats (env-block extraction, brace matching, key extraction) must handle the language's full token grammar: nested braces with depth counting (`\{([^}]*)\}` truncates at first `}`, missing nested object spreads / ternaries), ALL string delimiters including backtick template literals (not just single/double quotes), escape detection by **counting consecutive backslashes** (odd = escaped, even = not — `\\\\"` is a real quote, not an escape), and optional quoting on keys (`PORT: 3000`, `'PORT': 3000`, `"PORT": 3000`). Prefer the language's own AST parser (Babel, esbuild, ts-morph, YAML/TOML libs) when available; flag every regex that simplifies the source grammar
 - Stateful parsers (multipart, MIME, framed protocols) must verify they reached the terminal state on `req.on('end')` / EOF — calling `finish()` while still in `STATE_HEADERS`/`STATE_BODY` accepts truncated input as success, silently corrupting partially-written uploads or persisting half-written state. Track the terminal-state transition (e.g., `STATE_DONE` after the closing `--boundary--`) and return a 400 error otherwise (and clean up any partial files written)
 - Per-part state in stateful parsers must be reset at part boundaries — fields like `currentFileMimetype`, accumulated headers, decoder state, and offsets that aren't cleared at the start of each new part will leak the previous part's value (e.g., a file part with no `Content-Type` inherits the previous part's mimetype). Reset per-part state at the top of the part-start handler
 - Refactoring a streaming parser to "buffer-then-process" (calling `readAllBytes()` / `Buffer.concat(chunks)` / `await req.text()` before parsing) defeats the streaming contract and re-introduces an OOM/DoS vector for large uploads — verify the new implementation still respects each caller's `maxSize`/body cap WHILE reading (stop collecting once bytes exceed the cap), or restore true streaming. Watch for header comments still claiming "streams" / "never buffers entire body in memory" after such refactors — they become a documentation lie
 - Library wrappers advertising a multer/express-style contract `(req, file, cb)` must pass the real `req` (not `null`) through to filters/hooks; treating the `cb` as synchronous breaks any caller that supplied an async filter (callback fires later, but the wrapper already read pre-callback state). Either enforce synchronous filters with a clear error and document, or `await` a Promise-wrapped callback before continuing
-### Always Check — Quality & Conventions
-**Intent vs implementation (single-file)**
-- Labels, comments, status messages describing behavior the code doesn't implement. Also covers factual doc drift: file paths/extensions (`foo.js` referenced when the file is `foo.jsx`), item counts ("13 widgets" when there are 15), default entity names ("Default" vs actual "Everything"), and route/response-shape comments that don't match what the handler returns. Verify every factual claim in a comment or JSDoc against the code it references
-- Inline code examples or command templates that aren't syntactically valid
-- Sequential numbering with gaps or jumps after edits
-- Template/workflow variables referenced but never assigned — trace each placeholder to a definition
-- Constraints described in preamble not enforced by conditions in procedural steps
-- Duplicate or contradictory items in sequential lists
-- Completion markers or success flags written before the operation they attest to
-- Existence checks (directory exists, file exists) used as proof of correctness — file can exist with invalid contents
-- Lookups checking only one scope when multiple exist (local branches but not remote)
-- Tracking/checkpoint files defaulting to empty on parse failure — fail-open guards
-- Registering references to resources without verifying resource exists
-- Composed instructions, prompts, system messages, or rule sets that vary by mode/role/context — unconditional clauses can contradict mode-specific directives (e.g., "always cite sources inline" combined with a `draft` mode that asks for "no preamble, no commentary"). Build the composition conditionally — include each block only for modes that want it — or define an explicit precedence so contradictions are predictable
-**UX integrity (single-component)**
-- Unsaved changes / dirty state silently discarded when the user switches context in a multi-record editor or closes a sheet — data loss. Dirty-check on switch (inline confirm), auto-save drafts, or disable the switch control while dirty. `beforeunload` does not cover in-app context switches
-**AI-generated code quality**
-- New abstractions, wrapper functions, helper files serving only one call site — inline instead
-- Feature flags, config options, extension points with only one possible value
-- Commit messages claiming a fix while the bug remains
-- Placeholder comments (`// TODO`, `// FIXME`) or stubs presented as complete
-- Unnecessary defensive code for scenarios that provably cannot occur
-- Cleanup callbacks (useEffect return, finalizer, dispose, signal handler) containing only comments are misleading — implement the cleanup or remove the callback entirely
-**Configuration & hardcoding**
-- Hardcoded values when config/env var exists; dead config fields; unused function parameters
-- Duplicated config/constants/helpers across modules — extract to shared module. Watch for behavioral inconsistencies between copies
-- CI pipelines without lockfile pinning or version constraints
-- Production code paths with no structured logging at entry/exit
-- Error logs missing reproduction context (request ID, input params)
-- Async flows without correlation ID propagation
-**Supply chain & dependencies**
-- Lockfile committed and CI uses `--frozen-lockfile`; no drift from manifest
-- `npm audit` / `cargo audit` / `pip-audit` — no unaddressed HIGH/CRITICAL vulns
-- No `postinstall` scripts from untrusted packages executing arbitrary code
-- Overly permissive version ranges (`*`, `>=`) on deps with breaking-change history
-**Test coverage**
-- New logic/schemas/services without tests when similar existing code has tests
-- New error paths untestable because services throw generic errors
-- Tests re-implementing logic under test instead of importing real exports — pass even when real code regresses. Tests asserting by inspecting source code strings rather than calling functions
-- Tests depending on real wall-clock time or external dependencies
-- Missing tests for trust-boundary enforcement
-- Tests exercising code paths the integration layer doesn't expose — pass against mocks but untriggerable in production
-- Test mock state leaking between tests — "clear" resets invocation counts but not configured behavior; use "reset" variants
-- Response/status assertions written as loose ranges (`status >= 400`, `status < 500`, `ok: false`) — a regression that turns a 400 validation failure into a 500 still passes. Assert the specific expected status so tests distinguish validation from server failure
-**Automated pipeline discipline**
-- Internal code review must run before creating PRs — never go straight from "tests pass" to PR
-- Copilot review must complete before merging
-- Automated agent output must be reviewed against project conventions
-**Style & conventions**
-- Naming and patterns inconsistent with rest of codebase
-- New content not matching existing indentation, bullet style, heading levels. Within a single structured file (changelog, README, TOML config), section headers must be unique — duplicate `## Fixed` blocks or repeated table sections are a merge artifact that splits content downstream tools expect to find under one header. Consolidate
-- Shell instructions with destructive operations not verifying preconditions first
 ## Output Format
 For each finding:

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "slash-do",
-  "version": "2.12.0",
+  "version": "2.13.0",
   "description": "Curated slash commands for AI coding assistants — Claude Code, OpenCode, Gemini CLI, and Codex",
   "author": "Adam Eivy <adam@eivy.com>",
   "license": "MIT",

package/uninstall.sh CHANGED Viewed

@@ -33,7 +33,8 @@ OLD_COMMANDS=(cam good makegoals makegood optimize-md)
 LIBS=(
   code-review-checklist copilot-review-loop graphql-escaping
   remediation-agent-template swift-review-checklist swift-gotchas
-  review-surface-scan review-security-audit review-cross-file-tracing
+  review-surface-scan review-surface-quality review-security-audit
+  review-cross-file-tracing review-cross-file-contract
 )
 HOOKS=(slashdo-check-update slashdo-statusline)