npm - slash-do - Versions diffs - 2.10.0 → 2.11.0 - Mend

slash-do 2.10.0 → 2.11.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12) hide show

package/README.md +2 -1
package/commands/do/help.md +1 -0
package/commands/do/pr-better.md +94 -0
package/commands/do/review.md +7 -1
package/commands/do/rpr.md +1 -1
package/install.sh +1 -1
package/lib/code-review-checklist.md +50 -14
package/lib/copilot-review-loop.md +10 -10
package/lib/review-cross-file-tracing.md +22 -2
package/lib/review-security-audit.md +3 -0
package/lib/review-surface-scan.md +50 -6
package/package.json +1 -1

package/README.md CHANGED Viewed

@@ -24,7 +24,7 @@
 <p align="center">
   <img src="https://img.shields.io/npm/v/slash-do?style=flat-square&color=blue" alt="npm version" />
   <img src="https://img.shields.io/badge/environments-4-green?style=flat-square" alt="environments" />
-  <img src="https://img.shields.io/badge/commands-14-orange?style=flat-square" alt="commands" />
+  <img src="https://img.shields.io/badge/commands-15-orange?style=flat-square" alt="commands" />
   <img src="https://img.shields.io/badge/license-MIT-lightgrey?style=flat-square" alt="license" />
 </p>
@@ -56,6 +56,7 @@ All commands live under the `do:` namespace:
 |:---|:---|
 | `/do:push` | Commit and push all work with changelog |
 | `/do:pr` | Open a PR with self-review and Copilot review loop |
+| `/do:pr-better` | Run a full do:better audit on the current branch, commit fixes directly, then open a single PR |
 | `/do:fpr` | Fork PR -- push to fork, PR against upstream |
 | `/do:rpr` | Resolve PR review feedback with parallel agents |
 | `/do:release` | Create a release PR with version bump and changelog |

package/commands/do/help.md CHANGED Viewed

@@ -20,6 +20,7 @@ List all available `/do:*` commands with their descriptions.
 | `/do:help` | List all available slashdo commands |
 | `/do:omd` | Audit and optimize markdown files (CLAUDE.md, README.md, etc.) against best practices |
 | `/do:pr` | Commit, push, and open a PR against the repo's default branch |
+| `/do:pr-better` | Run a full do:better audit on the current branch, commit fixes directly, then open a single PR |
 | `/do:push` | Commit and push all work with changelog |
 | `/do:release` | Create a release PR using the project's documented release workflow |
 | `/do:replan` | Review and clean up PLAN.md, extract docs from completed work |

package/commands/do/pr-better.md ADDED Viewed

@@ -0,0 +1,94 @@
+---
+description: Run a full do:better audit/remediation on the current branch, commit fixes directly to it, then open a single PR with do:pr
+argument-hint: "[--interactive] [path filter or focus areas]"
+---
+# PR-Better — Better Audit + Single PR
+Run the full `do:better` DevSecOps audit and remediation, but **commit all fixes directly to the current branch** instead of creating per-category PRs. Then hand off to `do:pr` so the entire result ships as one cohesive PR (with self-review and the Copilot review loop).
+This is the right command when:
+- You want the full `do:better` quality bar on a feature branch you're about to ship
+- You want a single PR (not 8 per-category PRs) so reviewers see one cohesive change
+- You're already on a feature branch and want all audit fixes folded into the same PR as your feature work
+## Argument Forwarding
+Pass `$ARGUMENTS` through to `do:better` verbatim, with these constraints applied automatically:
+- **`--scan-only` is incompatible** — if the user passes it, refuse and explain that pr-better must remediate
+- **`--no-merge` is incompatible** — pr-better always produces a PR, but as one combined PR via `do:pr`
+- All other flags (`--interactive`, path filter, focus areas) pass through unchanged
+## Pre-flight
+1. Run `git branch --show-current` and `gh repo view --json defaultBranchRef -q '.defaultBranchRef.name'`
+2. If the current branch is the default branch, halt and tell the user: pr-better needs a feature branch — either create one first or run `/do:better` directly to produce per-category PRs from default
+3. Run `git status --porcelain` — if dirty, the do:better Phase 3a stash will handle it, but warn the user that uncommitted changes will be stashed and restored after the audit
+## Phase A: Run do:better (constrained to "Commit directly")
+Execute the full `do:better` workflow defined in `~/.claude/commands/do/better.md` with these mandatory deviations:
+### Phase 0 → 4a: unchanged
+Run discovery, audit (all 8 agents), plan generation, worktree setup, foundation utilities, parallel remediation, build/test verification, and internal code review exactly as specified in `do:better`.
+### Phase 4b: Force the "Commit directly" path
+When you reach Phase 4b's decision point, **do not present the AskUserQuestion** and **do not proceed to per-category PR creation**. Instead, always take the "Commit directly" path:
+- Ensure all worktree changes are committed in `better/{DATE}`
+- Return to `{REPO_DIR}`, check out `{CURRENT_BRANCH}`, and merge `better/{DATE}` into it:
+  ```bash
+  cd {REPO_DIR}
+  git checkout {CURRENT_BRANCH}
+  if git merge better/{DATE}; then
+    git worktree remove {WORKTREE_DIR}
+    git branch -D better/{DATE}
+  else
+    echo "Merge conflict — resolve in {REPO_DIR}, then run:"
+    echo "  git worktree remove {WORKTREE_DIR}"
+    echo "  git branch -D better/{DATE}"
+    # halt and surface the conflict to the user
+  fi
+  ```
+- If a merge conflict surfaces, stop here and ask the user to resolve before continuing to Phase B
+- Restore the stash if Phase 3a stashed changes (`git stash pop`)
+### Phase 4c: Test Enhancement
+Run Phase 4c (test enhancement) **before** the merge-back if the worktree is still active, so the new/fixed tests ship in the same merge. The Phase 4c.3 `FILE_OWNER_MAP` update is unnecessary in pr-better (we're not building per-category branches), so skip that step.
+### Phases 5, 6, 7: SKIP entirely
+Do not create category branches. Do not bump the version (the user's PR is responsible for any version bump via the project's normal release flow). Do not run the Copilot review loop here — `do:pr` will run it once on the combined PR. Do not delete branches you didn't create.
+The only Phase 7-equivalent housekeeping that applies:
+- Update PLAN.md to mark completed findings as `[x]` and note any skipped findings with reasons. Stage these changes so the next phase commits them as part of the PR.
+- Print the final summary table from Phase 7 (with PR fields blank — they'll be filled by Phase B).
+## Phase B: Run do:pr
+After Phase A leaves all fixes committed on `{CURRENT_BRANCH}`, hand off to the workflow defined in `~/.claude/commands/do/pr.md`:
+1. **Detect branches** — already done in pre-flight, reuse those values
+2. **Commit and push** — commit any remaining staged changes (e.g., the PLAN.md update), then `git pull --rebase --autostash && git push -u origin {current_branch}`
+3. **Local Code Review (REQUIRED GATE)** — run the full review gate from `do:pr`. The do:better Phase 4b internal review covered the worktree diff against the default branch, but the do:pr review gate also covers any prior commits on the feature branch that predate this run. Do not skip it.
+4. **Open the PR** — create a single PR with a description that summarizes both:
+   - The original feature work on the branch (from prior commits)
+   - The do:better audit findings now folded in (categories, counts, severity)
+5. **Copilot review loop** — runs once on the combined PR via the standard `do:pr` flow
+## Final Report
+Print:
+- The do:better summary table (categories, findings fixed/skipped) — with the PR column populated by the single PR URL
+- Test enhancement stats (vacuous fixed, weak strengthened, new cases, new files)
+- The PR URL
+- Final review status (clean / comments addressed / left open)
+## Notes
+- This is **not** equivalent to running `/do:better --no-merge` then `/do:pr`. `--no-merge` still creates per-category branches and PRs (it just skips the Copilot/merge step). pr-better never creates per-category branches at all.
+- Conflict-avoidance machinery from do:better Phase 5 (FILE_OWNER_MAP, backward-compatible re-exports across PRs) is unnecessary here — everything lands on one branch in one merge.
+- The user's existing feature commits remain intact; do:better remediation is added as additional commits on top.
+- If do:better finds zero actionable CRITICAL/HIGH/MEDIUM findings, skip Phase A's Phase 3-4 entirely and run only Phase B (`do:pr`) — the audit served as a quality gate even when nothing needed fixing.

package/commands/do/review.md CHANGED Viewed

@@ -22,7 +22,13 @@ CLAUDE.md is already loaded into your context. Use its rules (code style, error
 Before dispatching agents, understand what this change set claims to do:
 1. Read commit messages (`git log {base}...HEAD --oneline`)
-2. Note the claims — verify after agents return whether the code actually delivers them.
+2. Read PLAN.md, .changelog/NEXT.md (or equivalent), and the PR description for capability claims, test counts, and "deep-links to X" / "feature Y now works" assertions
+3. Note the claims — verify after agents return whether the code actually delivers them. Concrete drift to flag:
+   - Test counts in PLAN/changelog vs `find . -name '*.test.*' -exec grep -c '^\(it\|test\)(' {} +` (or project equivalent)
+   - "Deep-links to record X" claims vs whether the destination route handler actually consumes the encoded parameter
+   - "Auto-prune after N days" / "scans only the page returned" claims vs the listing implementation
+   - Comments in code claiming behavior the surrounding code doesn't perform
+   - Field names quoted in docs (request body shape, event payload shape) vs what the code actually reads/emits
 ## Dispatch Review Agents

package/commands/do/rpr.md CHANGED Viewed

@@ -92,7 +92,7 @@ Poll using GraphQL to check for a new review with a `submittedAt` timestamp afte
 gh api graphql -f query='{ repository(owner: "OWNER", name: "REPO") { pullRequest(number: PR_NUM) { reviews(last: 3) { nodes { state body author { login } submittedAt } } reviewThreads(first: 100) { nodes { id isResolved comments(first: 3) { nodes { body path line author { login } } } } } } } }'
 ```
-**Dynamic poll timing**: Before your first poll, check how long the most recent Copilot review on this PR took by comparing consecutive Copilot review `submittedAt` timestamps (or PR creation time for the first review). Use that duration as your expected wait. If no prior review exists, default to 5 minutes. Use **progressive poll intervals**: 15s, 15s, 30s, 30s, then 60s thereafter — small diffs often complete in under a minute, so early frequent checks avoid wasting time. Set max wait to **2x the expected duration** (minimum 5 minutes, maximum 20 minutes). Copilot reviews can take **10-15 minutes** for large diffs — do NOT give up early.
+**Dynamic poll timing**: Before your first poll, check how long the most recent Copilot review on this PR took by comparing consecutive Copilot review `submittedAt` timestamps (or PR creation time for the first review). Use that duration as your expected wait. If no prior review exists, default to 2 minutes. Use **progressive poll intervals**: 10s, 10s, 15s, 15s, then 30s thereafter — small diffs often complete in under a minute, so early frequent checks avoid wasting time. Set max wait to **2x the expected duration** (minimum 2 minutes, maximum 10 minutes). Copilot reviews typically complete in **2-5 minutes**; large diffs may take longer — do NOT give up early.
 The review is complete when a new `copilot-pull-request-reviewer` review node appears. If no review appears after max wait: **Default mode**: auto-skip and continue. **Interactive mode (`--interactive`)**: ask the user whether to continue waiting, re-request, or skip.

package/install.sh CHANGED Viewed

@@ -47,7 +47,7 @@ banner() {
 COMMANDS=(
   better better-swift fpr goals help omd
-  pr push release replan review rpr update
+  pr pr-better push release replan review rpr update
 )

package/lib/code-review-checklist.md CHANGED Viewed

@@ -22,12 +22,18 @@
    - Functions that index into arrays without guarding empty arrays; aggregate operations (`every`, `some`, `reduce`) on potentially-empty collections returning vacuously true/default values that mask misconfiguration or missing data; state/variables declared but never updated or only partially wired up
    - Parallel arrays or tuples coupled by index position (e.g., a names array, a promises array, and a destructuring assignment that must stay aligned) — insertion or reordering in one silently misaligns all others. Use objects/maps keyed by a stable identifier instead
    - Shared mutable references — module-level defaults passed by reference mutate across calls (use `structuredClone()`/spread); `useCallback`/`useMemo` referencing a later `const` (temporal dead zone); object spread followed by unconditional assignment that clobbers spread values
+   - UI-framework state invariants (uniqueness, monotonicity, cap/floor) checked against the render-time value before calling the updater — rapid events or concurrent updates then violate the invariant. Move the check inside the functional updater (e.g., `setX(prev => prev.includes(id) ? prev : [...prev, id])`) so it runs against the latest state
+   - Reactive effects / observers that depend on a value they themselves write — the write retriggers the effect, producing an infinite loop or network storm. Split into separate effects (one guarded by a "not yet initialized" condition, one that refreshes on an explicit trigger), or drop the self-written value from the dependency array and update via a functional setter
    - Functions with >10 branches or >15 cyclomatic complexity — refactor into smaller units
+   - String accumulation via `+=` inside high-frequency loops (streaming frames, chunked I/O, per-event handlers) becomes O(n²) for long outputs and triggers React re-renders on growing payloads. Collect chunks into an array and `join('')` once at the end while still emitting per-chunk events to consumers
+   - Server-side string formatting (`toLocaleString`, `toLocaleDateString`, currency/number formatters) that depends on locale/timezone defaults produces non-deterministic outputs across deployments. For data that flows into prompts, logs, persisted records, or cross-system messages, format with explicit `Intl.DateTimeFormat({ timeZone: ... })` or ISO strings; reserve locale-aware formatting for user-visible UI layers where the user's locale is the explicit input
+   - Required-at-use-time config values (model name, API key, endpoint URL, default selection) that may be null/undefined in the source data must be validated at the boundary before invoking the downstream API or initialization. Otherwise the upstream API responds with an opaque error far from the user's actual intent. Emit a clear, actionable error (or fail fast at startup) when a required value is missing, identifying the specific field
    **API & URL safety**
    - User-supplied or system-generated values interpolated into URL paths, shell commands, file paths, subprocess arguments, or dynamically evaluated code (eval, CDP evaluate, new Function, template strings executed in browser/page context) without encoding/escaping — use `encodeURIComponent()` for URLs, regex allowlists for execution boundaries, and `JSON.stringify()` for values embedded in evaluated code strings. Generated identifiers used as URL path segments must be safe for your router/storage (no `/`, `?`, `#`; consider allowlisting characters and/or applying `encodeURIComponent()`). Identifiers derived from human-readable names (slugs) used for namespaced resources (git branches, directories) need a unique suffix (ID, hash) to prevent collisions between entities with the same or similar names
-   - Route params passed to services without format validation; path containment checks using string prefix without path separator boundary (use `path.relative()`)
-   - Request schema fields for large string/binary payloads (base64, file content, free text) without per-field size limits — a total body-size limit alone doesn't prevent individual oversized fields from consuming excessive memory or exceeding downstream service limits. Add per-field `max(N)` constraints with clear error messages
+   - Route params passed to services without format validation; path containment checks using string prefix without path separator boundary (use `path.relative()`). When a route validates body/query fields, the corresponding path parameter must be validated with the same schema — skipping param validation on sibling endpoints (e.g., `PUT /:id` validates body.id but `DELETE /:id` lets the raw param fall through) causes inconsistent error classes (400 vs 404 vs 500) for the same invalid input
+   - Character-class regex validators (e.g., `^[a-z0-9-]+$`) claiming to enforce a structured format (slug, kebab-case, reverse-DNS, semver) — they accept leading/trailing separators (`-foo`, `foo-`), repeated separators (`a--b`), and empty segments. Require boundary characters (`^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$`) or use a dedicated parser when the claim is a structured format rather than a character set
+   - Request schema fields for large string/binary payloads (base64, file content, free text) without per-field size limits — a total body-size limit alone doesn't prevent individual oversized fields from consuming excessive memory or exceeding downstream service limits. Add per-field `max(N)` constraints with clear error messages. The same cap applies to OUTBOUND payloads stored in persisted records or sent over streaming protocols (snippets, previews, cached excerpts) — capture-time size enforcement prevents on-disk growth and SSE/WebSocket payload bloat that pure display-time truncation cannot
    - Parameterized/wildcard routes registered before specific named routes — the generic route captures requests meant for the specific endpoint (e.g., `/:id` registered before `/drafts` matches `/drafts` as `id="drafts"`). Verify route registration order or use path prefixes to disambiguate
    - Stored or external URLs rendered as clickable links (`href`, `src`, `window.open`) without protocol validation — `javascript:`, `data:`, and `vbscript:` URLs execute in the user's browser. Allowlist `http:`/`https:` (and `mailto:` if needed) before rendering; for all other schemes, render as plain text or strip the value
    - Server-side HTTP requests using user-configurable or externally-stored URLs without protocol allowlisting (http/https only) and host/network restrictions — the server becomes an SSRF proxy for reaching internal network services, cloud metadata endpoints, or localhost-bound APIs. Validate scheme and restrict to expected hosts or external-only ranges before any server-side fetch. Also check redirect handling: auto-following redirects (`redirect: 'follow'`) bypasses initial host validation when a public URL redirects to an internal IP. Disable auto-follow and revalidate each hop, or resolve DNS and block private/loopback/link-local ranges before connecting — public hostnames can resolve to internal IPs via DNS rebinding
@@ -35,7 +41,7 @@
    **Trust boundaries & data exposure**
    - API responses returning full objects with sensitive fields — destructure and omit across ALL response paths (GET, PUT, POST, error, socket); comments/docs claiming data isn't exposed while the code path does expose it
-   - Server trusting client-provided computed/derived values (scores, totals, correctness flags, file metadata like MIME type and size) when the server can recompute or verify them — strip and recompute server-side; for file uploads, validate content type via magic bytes and size via actual buffer length rather than trusting client-supplied headers
+   - Server trusting client-provided computed/derived values (scores, totals, correctness flags, file metadata like MIME type and size) when the server can recompute or verify them — strip and recompute server-side; for file uploads, validate content type via magic bytes and size via actual buffer length rather than trusting client-supplied headers. The same principle applies to trust in persisted state: flags read from flat-file/JSON/DB records that control authorization or deletion protection (`builtIn`, `protected`, `role`, `owner`) must be derived from a trusted source (code constants, session identity) on every read — otherwise hand-editing the file or tampered sync can flip the flag and bypass the protection
    - New endpoints mounted under restricted paths (admin, internal) missing authorization verification — compare with sibling endpoints in the same route group to ensure the same access gate (role check, scope validation) is applied consistently. When new capabilities require additional OAuth scopes or API permissions, verify the scope-upgrade check covers all required scopes — a check that only tests for one scope will miss newly added scopes, causing downstream API calls to fail with insufficient permissions
    - User-controlled objects merged via `Object.assign`/spread without sanitizing keys — `__proto__`, `constructor`, and `prototype` keys enable prototype pollution. Use `Object.create(null)` for the target, whitelist allowed keys, and use `hasOwnProperty` (not `in`) to check membership. Also verify the merge can't override reserved/internal fields the system depends on
    - Push events (WebSocket, SSE, pub/sub) emitted without scoping to the originating user or session — sensitive payloads (user content, tokens, progress data, images) leak to all connected clients in multi-user environments. Scope events to the requesting session via room/channel isolation or include a correlation ID the client provides at request time; verify consumers filter events by correlation ID before updating UI state
@@ -43,7 +49,7 @@
 ## Tier 2 — Check When Relevant (Data Integrity, Async, Error Handling)
    **Async & state consistency** _[applies when: code uses async/await, Promises, or UI state]_
-   - Optimistic state changes (view switches, navigation, success callbacks) before async completion — if the operation fails or is cancelled, the UI is stuck with no rollback. Check return values/errors before calling success callbacks. Handle both failure and cancellation paths. Watch for `.catch(() => null)` followed by unconditional success code (toast, state update) — the catch silences the error but the success path still runs. Either let errors propagate naturally or check the return value before proceeding
+   - Optimistic state changes (view switches, navigation, success callbacks) before async completion — if the operation fails or is cancelled, the UI is stuck with no rollback. Check return values/errors before calling success callbacks. Handle both failure and cancellation paths. Watch for `.catch(() => null)` followed by unconditional success code (toast, state update) — the catch silences the error but the success path still runs. Either let errors propagate naturally or check the return value before proceeding. Also covers optimistic placeholder IDs ('pending', 'temp_*', client-generated UUIDs): they must NOT be echoed back to the server in subsequent requests — the server validates against its real ID format and rejects them as 400s. Either disable controls bound to optimistic IDs until the server returns a real one, OR omit the field from outgoing payloads when the local value still matches the optimistic shape. Settings whose persistence model is per-record (per-conversation, per-document, per-project) must be persisted on every mutation, not just held in local component state — otherwise refresh resets to the persisted value and server-side history shows different content than the user used. Decide explicitly whether the field is per-record, per-session, or per-action — and persist accordingly
    - Multiple coupled state variables updated independently — actions that change one must update all related fields; debounced/cancelable operations must reset loading state on every exit path (cleared, stale, failed, aborted). Reference/selection sets that point to items in a data collection must be pruned when items are removed and invalidated when the collection is reloaded, filtered, paginated, or sorted — stale references send nonexistent IDs to downstream operations. Operations triggered from a confirmation dialog must re-validate preconditions (selection non-empty, items still exist) at execution time — the underlying data may change between dialog display and user confirmation. Component state initialized from props via `useState(prop)` only captures the initial value — if the prop updates asynchronously (data fetch, parent re-render), the local state goes stale. Sync with an effect when the user is not actively editing, or lift state to avoid the copy
    - Error notification at multiple layers (shared API client + component-level) — verify exactly one layer owns user-facing error messages. For periodic polling, also check that error notifications are throttled or deduplicated (only fire on state transitions like success→error, not on every failed iteration) and that failure doesn't make the UI section disappear entirely (component returning null when data is null/errored) — render an error or stale-data state instead of absence
    - Optimistic updates using full-collection snapshots for rollback — a second in-flight action gets clobbered. Use per-item rollback and functional state updaters after async gaps; sync optimistic changes to parent via callback or trigger refetch on remount. When appending items to a list optimistically, guard against duplicates (check existence before append) — concurrent or repeated operations can insert the same item multiple times
@@ -59,11 +65,23 @@
    - Side effects during React render (setState, navigation, mutations outside useEffect)
    - Interactive UI elements (buttons, inputs, drag-and-drop targets, keyboard shortcuts) that remain enabled while an async operation owns their related state — a second trigger while the first is in-flight produces concurrent state mutations or duplicate operations. All entry points for the same action must be disabled together while it is pending
    - Optimistic UI messages that substitute placeholder text when the actual payload sent to the server differs — the user sees one thing, the server stores another. Use the same fallback text in both the optimistic render and the outgoing payload, or surface the actual payload text in the UI
+   - Async functions invoked from synchronous event handlers (`onClick`, `onKeyDown`, command dispatchers) or effects without handling rejection at the call site — even when a shared request helper toasts the error, the unhandled rejection pollutes the console, leaves UI state inconsistent (optimistic mutation not reverted, palette/modal stuck open, navigation skipped), and hides failures from downstream event-loop instrumentation. Wrap the `await` in try/catch, attach `.catch(...)`, or use `void promise.catch(...)` — and only run success-path side effects (close, navigate, clear dirty) inside the success branch. After awaiting an async operation that may be cancelled or whose owning component may unmount, check `signal.aborted` (or a `mountedRef`) before subsequent state writes — otherwise React warns about updates on unmounted trees and stale state leaks through
+   - Single shared error-state variable (one `error` setter) reused by multiple independent async flows — one flow's success path clears the other flow's displayed error and vice versa. Split errors by domain (`dataError`, `layoutsError`), scope errors per operation, or only overwrite the specific error the current flow owns
+   - A page renders based on multiple independent async loads but the loading flag reflects only one of them — a slow or failed secondary load renders a blank page with no loading indicator and no error. Either include every render-gating fetch in the loading state (or a per-fetch status map) or provide explicit empty/error states for the secondary data
+   - Unsaved changes / dirty state discarded without warning when the user switches context (selecting another record in a multi-record editor, navigating away, closing a sheet) — silent data loss. Dirty-check on context change (inline confirm), auto-save drafts per record, or gate the switch control until unsaved state is committed. The `beforeunload` prompt alone does not cover in-app context switches
+   - Actions triggered from one surface (command palette, global menu, external event) that mutate data another already-mounted page fetched on mount — re-navigating to the same route does NOT remount the page (routers treat it as a no-op), so the visible state stays stale even though the mutation succeeded server-side. Propagate the change via a shared store, a pub/sub event whose name is a shared constant, a refetch-on-focus/visibility hook, or a key-based remount — and verify the mounted page actually subscribes
+   - Long-lived streaming response handlers (SSE, chunked HTTP, WebSocket) on the server must register a client-disconnect listener (`req.on('close')`, `req.on('aborted')`) and propagate cancellation through the FULL processing chain (retrieval, fetches, subprocesses) — partial threading wastes work after disconnect, leaves `write-after-end` errors, and inflates resource use. Per-request timeouts on streaming responses must remain active for the full duration of stream consumption, not cleared on initial fetch resolution — a stalled upstream that keeps the connection open hangs the consumer indefinitely. Honor write backpressure: check the boolean return of `write()` and await `'drain'` when it returns false. When attaching paired listeners for backpressure or completion (`drain` + `close`), the cleanup handler must remove ALL of them — asymmetric removal accumulates listeners across slow-client cycles. After flushing streaming headers, framework error middleware (asyncHandler, exception filters) cannot send a JSON error — wrap post-handshake logic in try/finally that translates errors into a terminal `event: error` SSE frame and ends the response gracefully (`if (!res.writableEnded && !res.destroyed) res.end()`)
+   - Client-side AbortController for an in-flight streaming/long-lived operation must be invoked when its owning UI context tears down OR navigates AWAY from that operation. When the cleanup is keyed to a route param, navigation events emitted BY the in-flight stream itself (e.g., redirecting to a permalink after the server returns an id) trigger the cleanup and abort the very operation that caused the navigation. Track the streaming operation's identity in a ref and abort only when navigating away from THAT identity, not on every param change. Mirror the abort by cancelling the stream reader (`reader.cancel()`) in a `finally` block and ignoring late events whose ID doesn't match the now-current operation
+   - Streaming UIs must preserve the deltas already received when the stream emits a terminal `error` event mid-stream (commit them as a partial result with an error indicator). Clearing the streaming buffer on error discards visible content the user already saw and breaks recovery flows. Pair this with mutually exclusive terminal events (see Error handling)
    **Error handling** _[applies when: code has try/catch, .catch, error responses, or external calls]_
    - Service functions throwing generic `Error` for client-caused conditions — bubbles as 500 instead of 400/404. Use typed error classes with explicit status codes; ensure consistent error responses across similar endpoints — when multiple endpoints make the same access-control decision (e.g., "resource exists but caller lacks access"), they must return the same HTTP status (typically 404 to avoid leaking existence). Include expected concurrency/conditional failures (transaction cancellations, optimistic lock conflicts) — catch and translate to 409/retry rather than letting them surface as 500
+   - Error discrimination by string matching (`err.message.includes('not found')`, regex on error text) or by coupling to localizable/developer-facing messages — refactors, localization, or wrapper rewrites silently change behavior (HTTP status, retry policy, user message). Use explicit error codes or typed classes and branch on them
+   - Route handlers (or equivalent response mappers) that convert any exception from a service call into a single specific status — e.g., `catch (err) { throw new NotFoundError() }` — mask real server errors (file I/O, JSON parse, atomic-write failures) as domain 404s and hide outages. Map only known error classes/codes; let unknown errors surface as 500 so real bugs aren't suppressed
+   - Error wrappers that re-throw with only a subset of the original fields (e.g., `new ServerError(err.message, { status })` that drops `code`, `context`, `cause`) — downstream consumers see a generic `INTERNAL_ERROR` instead of the specific code they branch on. Preserve structured detail across wrapping: propagate `code`, `context`, `cause`, and any fields clients or logs depend on
    - Swallowed errors (empty `.catch(() => {})`), handlers that replace detailed failure info with generic messages, and error/catch handlers that exit cleanly (`exit 0`, `return`) without any user-visible output — surface a notification, propagate original context, and make failures look like failures. This includes cross-layer error propagation: if the server returns structured error detail (field-level validation messages, `details[]` arrays, error codes), the client should surface actionable detail rather than discarding the structure for a generic string. Includes external service wrappers that return `null`/empty for all non-success responses — collapsing configuration errors (missing API key), auth failures (403), rate limits (429), and server errors (5xx) into a single "not found" return masks outages and misconfiguration as normal "no match" results. Distinguish retriable from non-retriable failures and surface infrastructure errors loudly
-   - SSE or streaming handlers that call `end()`/`close()` on mid-stream errors without emitting an error event — the client observes a clean stream termination and treats partial content as complete. Emit a structured `event: error` block before closing so clients can detect and surface the failure
+   - SSE or streaming handlers that call `end()`/`close()` on mid-stream errors without emitting an error event — the client observes a clean stream termination and treats partial content as complete. Emit a structured `event: error` block before closing so clients can detect and surface the failure. Conversely, named lifecycle events on streams (`error`, `done`, `complete`) must be MUTUALLY EXCLUSIVE — after emitting `error`, do NOT also emit `done`, or include explicit success/error info in the terminal frame. Otherwise clients parsing the last event treat failed runs as completed
+   - Raw `fetch()` failures (TypeError "Failed to fetch", DNS errors, ECONNREFUSED) at API client boundaries must be translated to a consistent user-friendly message matching the project's established transport-error utility — otherwise users see cryptic browser errors and can't distinguish "server unreachable" from "request rejected." Preserve `AbortError` so callers can still distinguish cancellation from failure
    - SSE or event dispatchers handling named event types but ignoring the protocol's default/unnamed event — SSE streams emitting `data:` without `event:` produce type `'message'` (the SSE spec default), which a handler processing only named types silently discards. Verify the default event type is either handled or explicitly excluded
    - Caller/callee disagreement on success/decision semantics — a function that resolves with `{ success: false }` while callers use `.catch()` for error handling means failures are treated as successes. Verify that the contract between producer and consumer is consistent: if callers branch on rejection, the function must reject on failure; if callers branch on a status field, the function must never reject. This extends beyond success/failure to any evaluation result — if a gate/check function returns `{ shouldRun: false }` on errors while the runtime treats errors as fail-open (run anyway), the API surface and runtime disagree on skip semantics. Also covers argument shape contracts — passing a wrapped object (`{ items: [...] }`) when the callee expects a bare array (or vice versa), or supplying arguments at the wrong parameter position, causes silent no-ops or partial processing; verify argument shapes match by reading the callee's signature, not assuming from naming conventions. Also check EventEmitter async handlers — `async` callbacks on `'close'`/`'error'` events create unhandled rejections because EventEmitter doesn't await handler promises; wrap in try/catch
    - Destructive operations in retry/cleanup paths assumed to succeed without their own error handling — if cleanup fails, retry logic crashes instead of reporting the intended failure
@@ -82,12 +100,13 @@
    - API versioning: breaking changes to public endpoints without version bump or deprecation path
    - Backward-incompatible response shape changes without client migration plan
    - Backward compatibility breaking changes — renamed/removed config keys, changed file formats, altered DB schemas, modified event payloads, renamed URL routes/paths, or restructured persisted data (localStorage, files, database rows) without a migration path or fallback that reads the old format. For route/URL renames, add redirects from old paths to preserve bookmarks and external links. Trace all consumers of the changed contract (other services, CLI versions, stored data) and verify they still work or have an upgrade path. For schema changes, require a migration script; for config/format changes, support both old and new formats during a transition period or provide a one-time converter
-   - One-time migrations or initializations triggered on load/startup without a completion guard (version stamp, flag, or condition that excludes already-migrated data) — re-execute on every startup, causing unnecessary writes, duplicate processing, or state churn. Ensure the migration condition excludes records/configs that have already been migrated
+   - One-time migrations or initializations triggered on load/startup without a completion guard (version stamp, flag, or condition that excludes already-migrated data) — re-execute on every startup, causing unnecessary writes, duplicate processing, or state churn. Ensure the migration condition excludes records/configs that have already been migrated. Also covers setup/provisioning scripts invoked from hot paths (`npm start`, dev script, container entrypoint, app boot) that mutate credentials, privileges, or installed-package state (`ALTER USER`, password resets, brew installs, file ownership changes) — gate the heavy work behind a cheap readiness check so reruns are no-ops, OR refactor the script so each step is itself idempotent and detects already-applied state
    - Data migrations that silently change runtime behavior — converting records from one type/schedule/state to another must preserve the original execution semantics (frequency, enabled state, trigger behavior). Migrating an on-demand/manual entity to a scheduled one causes unexpected automated execution; migrating a fine-grained interval to a coarser default changes frequency. Unsupported source values (cron expressions, custom intervals) must be flagged or preserved, not silently dropped to defaults
    - Update/patch endpoints with explicit field allowlists (destructured picks, permitted-key arrays) — when the data model gains new configurable fields, the allowlist must be updated or the new fields are silently dropped on save. Trace from model definition to the update handler's field extraction to verify coverage
-   - New endpoints/schemas should match validation patterns of existing similar endpoints — field limits, required fields, types, error handling. If validation exists on one endpoint for a param, the same param on other endpoints needs the same validation. API documentation schemas (OpenAPI, JSON Schema) must be structurally complete — array types require `items` definitions, required fields must be listed, and the documented shape must match what the implementation actually returns
+   - New endpoints/schemas should match validation patterns of existing similar endpoints — field limits, required fields, types, error handling. If validation exists on one endpoint for a param, the same param on other endpoints needs the same validation — including path params, query params, and body fields on sibling endpoints (create/update/delete/activate). The same field should be accepted, rejected, and trimmed identically everywhere it appears, regardless of which endpoint consumes it. `z.string().min(1)` without `.trim()` accepts whitespace-only values — prefer `z.string().trim().min(1)` for user-visible names. API documentation schemas (OpenAPI, JSON Schema) must be structurally complete — array types require `items` definitions, required fields must be listed, and the documented shape must match what the implementation actually returns
    - Client-side input validation limits (max count, file size, string length, combined totals) must be consistent with — and ideally tighter than — server-side enforcement. When the client allows combinations the server rejects (e.g., 8 × 10MB files vs a 50MB JSON body limit), users hit confusing 400/413 errors. Trace all enforcement boundaries (UI, API schema, body parser, downstream service) and verify they form a coherent envelope
    - Sample config files, README examples, and documentation that reference config keys or structure must match what the implementation actually reads. Trace example keys against the config loader — stale examples teach operators to configure values the system ignores (or vice versa)
+   - Subprocess invocations must inherit the same configuration source as the parent — if the parent reads from `.env`/config files but the child only sees `process.env`, exporting those values explicitly via the `env` option is required. Otherwise a probe uses customized credentials/ports while the underlying setup runs with built-in defaults, creating an "inconsistency loop" where the probe always fails and provisioning re-applies defaults that overwrite user customization. Pass the resolved values through; do not assume children re-parse the same config files
    - Config values whose format can be validated at initialization time (URLs, port numbers, auth schemes) but are only validated at first use — misconfiguration surfaces as a cryptic runtime error deep in the call stack. Validate format and range of security-relevant config values during initialization and surface a specific diagnostic identifying the bad field
    - URL joining utilities that force paths absolute-from-origin discarding the base URL's pathname — `baseUrl=http://host/proxy` + `/v1/api` silently produces `http://host/v1/api` instead of `http://host/proxy/v1/api`. Verify URL construction utilities preserve pathname segments from the base URL, or document and enforce that base URLs must be origin-only
    - Summary/aggregation endpoints that compute counts or previews via a different query path, filter set, or data source than the detail views they link to — users see inconsistent numbers between the dashboard and the destination page. Trace the computation logic in both paths and verify they apply the same filters, exclusions, and ordering guarantees (or document the intentional difference)
@@ -97,6 +116,13 @@
    - Schema fields accepting values downstream code can't handle; Zod/schema stripping fields the service reads (silent `undefined`); config values persisted but silently ignored by the implementation — trace each field through schema → service → consumer. Also check for parameters accepted and validated in the schema but never consumed by the implementation — dead API surface that misleads callers into believing they're configuring behavior that's silently ignored; remove unused parameters or wire them through to the implementation. Update schemas derived from create schemas (e.g., `.partial()`) must also make nested object fields optional — shallow partial on a deeply-required schema rejects valid partial updates. Additionally, `.deepPartial()` or `.partial()` on schemas with `.default()` values will apply those defaults on update, silently overwriting existing persisted values with defaults — create explicit update schemas without defaults instead
    - Multi-part UI features (e.g., table header + rows) whose rendering is gated on different prop/condition subsets — if the header checks prop A while rows check prop B, partial provision causes structural misalignment (column count mismatch, orphaned interactive elements without handlers). Derive a single enablement boolean from the complete prop set and use it consistently across all participating components
    - Entity creation without case-insensitive uniqueness checks — names differing only in case (e.g., "MyAgent" vs "myagent") cause collisions in case-insensitive contexts (file paths, git branches, URLs). Normalize to lowercase before comparing
+   - Arrays of IDs (widget ids, tag ids, member ids) persisted, returned by API, or rendered with `key={x}` without container-level deduplication — element-level validation (type check, length cap) is not enough. Duplicates cause React key collisions, inflated operation counts, and inconsistent UI updates. Enforce uniqueness via schema refinement (`zod.refine(arr => new Set(arr).size === arr.length)`), dedupe during ingestion, AND dedupe during read-path sanitization so hand-edited or legacy data can't reintroduce collisions. Apply the same logic to arrays of records keyed by id at the container level (first-wins, not just element-level shape checks)
+   - Data loaded from files or persistent stores sanitized less strictly than the API accepts on write — hand-edited, migrated, or corrupted persisted state can introduce values (oversized names, non-kebab ids, duplicate entries, bypassed limits) the API would reject, producing oversized responses, unreachable records (client renders but API rejects on mutate), or invariant violations. Apply the same length caps, regex, uniqueness, and type guards in read-path sanitization as in request-schema validation; drop or truncate out-of-range values rather than passing them through
+   - Authority / privilege flags (builtIn, protected, owner, role, immutable) read directly from persisted records without re-derivation — flipping the flag in a hand-edited file or a tampered sync source grants unintended privileges (e.g., changing `builtIn: true → false` lets "protected" records be deleted). Derive authority from code (a constant set of built-in ids, session identity, server-side role lookup), not from the persisted representation
+   - Cross-module constants kept in sync by comment ("must stay in sync with X", copy-pasted regex, duplicated event names or size limits) — the comment is not enforcement and drift is a silent failure. Any constant shared across module boundaries (client↔server, route↔service, component↔component, producer↔consumer) must be a single exported value imported at both sides: event names, regex patterns, numeric limits, path segments, feature-flag keys, error codes
+   - Generator/validator structural invariants — when a generator produces values with structural guarantees (sortability via fixed-width prefix, embedded checksum, encoded version), the validator regex (and any client-side mirror) must enforce the SAME shape. Broader regexes accept inputs the generator never emits, breaking invariants the rest of the system relies on (e.g., lexical sort == chronological sort breaks once a base36 timestamp grows by a digit). Verify generator + server validator + client mirror form a closed loop. Test fixtures should use IDs/payloads that match generator output, not contrived literals — fixtures with off-spec shapes hide regressions where the production format changes
+   - Schemas accepting paired/range fields (`startDate`/`endDate`, `min`/`max`, `from`/`to`) must add a cross-field refinement (`zod .refine()`, JSON Schema `dependentSchemas`) enforcing the relationship (start ≤ end). Without it, the schema accepts inconsistent ranges that the implementation may silently swap, ignore, or interpret ambiguously. Define deterministic rules when only one bound is supplied
+   - Pure persistence/utility modules importing from orchestration/service modules just to access a constant pulls the entire downstream import graph as a transitive dependency — increases test cost, slows imports, and creates side effects on load. Move shared constants (enums, regex, size caps, valid-mode sets) into small dedicated modules so storage and utility code can be reused without dragging the service stack in
    - Code reading properties from API responses, framework-provided objects, or internal abstraction layers using field names the source doesn't populate or forward — silent `undefined`. Verify property names and nesting depth match the actual response shape (e.g., `response.items` vs `response.data.items`, `obj.placeId` vs `obj.id`, flat fields vs nested sub-objects). When building a new consumer against an existing API, check the producer's actual response — not assumed conventions. When branching on fields from a wrapped third-party API, confirm the wrapper actually requests and forwards those fields (e.g., optional response attributes that require explicit opt-in). Also verify call sites pass inputs in the format the called function actually accepts — framework constructors with non-obvious positional argument order, loaders with format-specific variants (content paths vs script paths, asset objects vs class references), and accessor APIs with distinct method-vs-property semantics. Fallback branches in multi-format dispatchers commonly use the wrong function for the input type
    - Data model fields that have different names depending on the creation/write path (e.g., `createdAt` vs `created`) — code referencing only one naming convention silently misses records created through other paths. Trace all write paths to discover the actual field names in use. When new logic (access control, UI display, queries) checks only a newly introduced field, verify it falls back to any legacy field that existing records still use — otherwise records created before the migration are silently excluded or inaccessible. Also check entity identity keys: if code looks up or matches entities using a computed key (e.g., `e.id || e.externalId`), all code paths that perform the same lookup must use the same key computation — one path using `e.id` while another uses `e.id || e.externalId` causes mismatches for entities missing the primary key
    - Entity type changes without invariant revalidation — when an entity has a discriminator field (type, kind, category) and the user changes it, all type-specific invariants must be enforced on the new type AND type-specific fields from the old type must be cleared or revalidated. A job changing from `shell` to `agent` without clearing `command`, or changing to `shell` without requiring `command`, leaves the entity in an invalid hybrid state that fails at runtime or resurfaces stale data
@@ -104,7 +130,7 @@
    - Operations scoped to a specific entity subtype that don't verify the entity's type discriminator before processing — an endpoint or function designed for one account/entity type that accepts any entity by ID can corrupt state or produce wrong results when called with the wrong type. Add an explicit type guard and return a structured error
    - Inconsistent "missing value" semantics across layers — one layer treats `null`/`undefined` as missing while another also treats empty strings or whitespace-only strings as missing. Query filters, update expressions, and UI predicates that disagree on what constitutes "missing" cause records to be skipped by one path but processed by another. Define a single `isMissing` predicate and use it consistently, or normalize empty/whitespace values to `null` at write time. Also applies to comparison/detection logic: coercing an absent field to a sentinel (`?? 0`, default parameters) makes the logic treat "unsupported" as a real value — guard with an explicit presence check before comparing. Watch for validation/sanitization functions that return `null` for invalid input when `null` also means "clear/delete" downstream — malformed input silently destroys existing data. Distinguish "invalid, reject the request" from "explicitly clear this field". Also applies to normalization (trailing slashes, case, whitespace): if one path normalizes a value before comparison but the write path stores it un-normalized, comparisons against the stored value produce incorrect results — normalize at write time or normalize both sides consistently
    - Validation functions that delegate to runtime-behavior computations (next schedule occurrence, URL reachability, resource resolution) — conflating "no result within search window" or "temporarily unavailable" with "invalid input" rejects valid configurations. Validate syntax and structure independently of runtime feasibility
-   - Numeric values from strings used without `NaN`/type guards — `NaN` comparisons silently pass bounds checks. Clamp query params to safe lower bounds
+   - Numeric values from strings used without `NaN`/type guards — `NaN` comparisons silently pass bounds checks; `NaN` flowing into subprocess args (`-p NaN`) or formatted strings produces opaque downstream failures. Clamp query params to safe lower bounds. For env-var parsing in particular, the input commonly contains stray whitespace or inline `# comment` text — use `Number.parseInt(String(value).trim(), 10)` and gate with `Number.isFinite(parsed)` before falling back to the default
    - Hand-rolled regex validators for well-known formats (IP addresses, email, URLs, dates, semver) that accept invalid inputs or reject valid ones — use platform/standard library parsers instead (e.g., `net.isIP()`, `URL` constructor, `semver.valid()`) which handle edge cases the regex misses
    - UI elements hidden from navigation but still accessible via direct URL — enforce restrictions at the route level
    - Summary counters/accumulators that miss edge cases (removals, branch coverage, underflow on decrements — guard against going negative with lower-bound conditions); counters incremented before confirming the operation actually changed state — rejected, skipped, or no-op iterations inflate success counts. Batch operations that report overall success while silently logging per-item failures — callers see success but partial work was done; collect and return per-item failures in the response. Silent operations in verbose sequences where all branches should print status
@@ -116,6 +142,7 @@
    - Writes that replace an entire composite attribute (array, map, JSON blob) when the field is populated by multiple sources — the write discards data from other sources. Use a separate attribute, merge with the existing value, or use list/set append operations
    - Functions with early returns for "no primary fields to update" that silently skip secondary operations (relationship updates, link writes)
    - Functions that acquire shared state (locks, flags, markers) with exit paths that skip cleanup — leaves the system permanently locked. Trace all exit paths including error branches
+   - Modules that own a persistence schema (write to disk/DB with a known shape) should validate at the persistence boundary, not assume the API/route layer will catch everything. Direct callers (internal scripts, tests, programmatic batch jobs, future endpoints) bypass route validation and corrupt on-disk state. Keep enum/range/required checks layered: the route validates input format, the storage layer validates the persistence contract, and at minimum reject invalid enum values before writing
    **Input handling** _[applies when: code accepts user/external input]_
    - Trimming values where whitespace is significant (API keys, tokens, passwords, base64) — only trim identifiers/names
@@ -137,7 +164,7 @@
    **Sync & replication** _[applies when: code uses pagination, batch APIs, or data sync]_
    - Upsert/`ON CONFLICT UPDATE` updating only a subset of exported fields — replicas diverge. Document deliberately omitted fields
-   - Pagination using `COUNT(*)` (full table scan) instead of `limit + 1`; endpoints missing `next` token input/output; hard-capped limits silently truncating results. When a data store applies query limits before filter expressions, a fixed multiplier on the limit still under-fetches — loop with continuation tokens until the target count of post-filter results is collected
+   - Pagination using `COUNT(*)` (full table scan) instead of `limit + 1`; endpoints missing `next` token input/output; hard-capped limits silently truncating results. When a data store applies query limits before filter expressions, a fixed multiplier on the limit still under-fetches — loop with continuation tokens until the target count of post-filter results is collected. Periodic maintenance (cleanup, expiry, dedup) bolted onto a paginated read path runs only for items returned in that page — entries beyond the page boundary are never processed. Move maintenance to a background sweep, run a separate unbounded pass, OR use cheap metadata (mtime, size) for the maintenance pass while only doing expensive reads for the page actually returned. Maintenance gates that depend on parsed metadata fields will skip records where parsing returns a sentinel (0, null, "") — those records become permanent. Treat parse failure as "expired" (or fall back to filesystem mtime), or surface unparseable records for manual review
    - Pagination cursors derived from the last *scanned* item rather than the last *returned* item — if accumulated results are trimmed (e.g., sliced to a page size), the cursor advances past items that were fetched but never delivered, causing permanent skips
    - Batch/paginated API calls (database batch gets, external service calls) that don't handle partial results — unprocessed items, continuation tokens, or rate-limited responses silently dropped. Add retry loops with backoff for unprocessed items
    - Retry loops without backoff or max-attempt limits — tight loops under throttling extend latency indefinitely. Use bounded retries with exponential backoff/jitter
@@ -148,6 +175,7 @@
    - File writes that assume the parent directory exists — on fresh installs or after directory cleanup, the write fails with ENOENT. Ensure the directory exists before writing (or create it on demand)
    - Bootstrap/resilience code that imports the dependencies it's meant to install — restructure so installation precedes resolution
    - Re-exporting from heavy modules defeats lazy loading — use lightweight shared modules
+   - New global APIs (`AbortSignal.any`, `Promise.withResolvers`, `structuredClone`) used in code that runs on older runtimes must be feature-detected. When the codebase already provides a fallback utility for the same API (e.g., a `withTimeout` helper that polyfills `AbortSignal.any`), reuse it instead of calling the unguarded global — drift between the safe path and a new direct call reintroduces the runtime error the fallback was created to avoid
    **Data format portability** _[applies when: code crosses serialization boundaries — JSON, DB, IPC]_
    - Values crossing serialization boundaries may change format (arrays in JSON vs string literals in DB) — convert consistently. Datetime values are especially fragile: mixing UTC string operations (`toISOString().split('T')[0]`) with local-time `Date` methods (`setDate`/`getDate`) shifts results across timezone boundaries; appending `'Z'` to bare datetimes without verifying the source timezone converts non-UTC values incorrectly. Keep all datetime arithmetic in one consistent timezone (preferably UTC with explicit UTC methods) or use a timezone-aware library
@@ -157,34 +185,38 @@
    **Shell & portability** _[applies when: code spawns subprocesses, uses shell scripts, or builds CLI tools]_
    - Subprocess calls under `set -e` abort on failure; non-critical writes fail on broken pipes — use `|| true` for non-critical output
-   - Interactive prompts (`read -p`, `Read-Host`, `prompt()`) in scripts that may run non-interactively (CI, cron, automation) — guard with TTY detection (`[ -t 0 ]`, `[Environment]::UserInteractive`) and default to a safe value or skip when stdin is not a terminal
+   - Interactive prompts (`read -p`, `Read-Host`, `prompt()`) in scripts that may run non-interactively (CI, cron, automation) — guard with TTY detection (`[ -t 0 ]`, `[Environment]::UserInteractive`) and default to a safe value or skip when stdin is not a terminal. Also handle EOF (Ctrl-D, closed stdin) explicitly: under `set -e`, a `read` returning non-zero on EOF will abort the entire script mid-flow — use `read ... || true` or check the return value and apply a safe default. Prompt input validation should accept the full set of expected answers (e.g., `y`/`yes`/`n`/`no` case-insensitive), not just the default — treating any non-default input as consent surprises users who type "no" expecting it to mean no
    - Detached child processes with piped stdio — parent exit causes SIGPIPE. Redirect to log files or use `'ignore'`
    - Subprocess output buffered in memory without size limits — a noisy or stuck child process can cause unbounded memory growth. Cap in-memory buffers and truncate or stream to disk for long-running commands
    - Platform-specific assumptions — hardcoded shell interpreters, `path.join()` backslashes breaking ESM imports. Use `pathToFileURL()` for dynamic imports
    - Naive whitespace splitting of command strings (`str.split(/\s+/)`) breaks quoted arguments — use a proper argv parser or explicitly disallow quoted/multi-word arguments when validating shell commands
    - Subprocess output parsed from a single stream (stdout or stderr) to detect conditions (conflicts, errors, specific states) — the information may appear in the other stream or vary by tool version/config. Check both stdout and stderr, and verify the exit code, to reliably detect the condition
    - Shell expansions (brace `{a,b}`, glob `*`, tilde `~`, variable `$VAR`) suppressed by quoting context — single quotes prevent all expansion, so patterns like `--include='*.{ts,js}'` pass the literal braces to the command instead of expanding. Use multiple flags, unquoted brace expansion (bash-only), or other command-specific syntax when expansion is required
+   - Arguments passed via process argv have OS-imposed length limits (notoriously low on Windows, ~32KB; ~128KB-2MB on most Unix). For variable-length payloads (prompts, JSON blobs, file contents) or anything potentially large, pipe via stdin instead of constructing a long argv. If argv must be used, enforce a strict cap and fail with a clear message before spawning
    **Search & navigation** _[applies when: code implements search results or deep-linking]_
    - Search results linking to generic list pages instead of deep-linking to the specific record
    - Search/query code hardcoding one backend's implementation when the system supports multiple — verify option/parameter names are mapped between backends
+   - Deep-link URL contracts between sender and receiver — a URL with query parameters (`?id=...`, `?date=...`) or path segments is a contract: the receiving page/route MUST consume those parameters and use them to scroll/select/filter. Otherwise the sender misleads users (chip claims to "deep-link to event" but the route ignores the parameter and lands on the section list). Verify deep-link destinations actually route to the intended record/state by reading the receiver's route handler / page code, not just the producer's intent. If the receiver doesn't yet support the parameter, either drop it (and adjust docs/changelog claims) or wire it through end-to-end
    **Destructive UI operations** _[applies when: code adds delete, reset, revoke, or other destructive actions]_
    - Destructive actions (delete, reset, revoke) in the UI without a confirmation step — compare with how similar destructive operations elsewhere in the codebase handle confirmation
    **Streaming & real-time protocols** _[applies when: code implements SSE, WebSocket, ReadableStream, or event-driven APIs]_
-   - Wire protocol parsers that assume a simplified subset of the spec (e.g., `\n`-only line endings when the spec allows `\r\n`) — test against boundary representations the spec permits, not just the happy path
+   - Wire protocol parsers that assume a simplified subset of the spec (e.g., `\n`-only line endings when the spec allows `\r\n`) — test against boundary representations the spec permits, not just the happy path. Parsers must (a) handle the spec's full set of separators (e.g., both `\n\n` and `\r\n\r\n` for SSE; multiple `data:` lines joined with `\n`); (b) flush remaining buffered content on EOF — otherwise the last frame is dropped when upstream closes mid-frame; (c) wrap per-frame deserialization (`JSON.parse`) so a single malformed frame doesn't terminate the entire stream
    - Event handlers or effects that fire on every high-frequency event (streaming deltas, scroll, resize, keydown) without throttle, debounce, or `requestAnimationFrame` batching — causes jank and excessive re-renders
    **Accessibility** _[applies when: code modifies UI components or interactive elements]_
-   - Interactive elements missing accessible names, roles, or ARIA states — including labels lost or replaced with non-descriptive placeholders in conditional/compact rendering modes. Disabled interactions should have `aria-disabled`
+   - Interactive elements missing accessible names, roles, or ARIA states — including labels lost or replaced with non-descriptive placeholders in conditional/compact rendering modes. Disabled interactions should have `aria-disabled`. Verify the ARIA attribute set matches established patterns used elsewhere in the codebase for the same widget type (disclosure, menu, dialog) — inconsistency degrades assistive-tech support and creates confusion for keyboard users
+   - ARIA roles applied without the keyboard interactions the role contract requires — `role="menu"`/`menuitem*` expects roving focus, arrow-key navigation, Escape scoped to the menu, and focus management on open/close; `role="listbox"`/`option` expects Home/End/typeahead; `role="dialog"` expects focus trap + return focus. Shipping the role without the behavior strands screen-reader and keyboard users on a control that looks interactive to AT but doesn't respond. Either implement the full interaction pattern or drop to a simpler one (native `<button>` + disclosure) that doesn't promise more than the code delivers
    - Custom toggle/switch UI built from non-semantic elements instead of native inputs
    - Overlay or absolutely-positioned layers with broad `pointer-events-auto` that intercept clicks/hover intended for elements beneath — use `pointer-events-none` on decorative overlays and enable events only on small interactive affordances. Conversely, `pointer-events-none` on a parent kills hover/click handlers on children — verify both directions when layering positioned elements
+   - Nested inputs / controls handling keyboard events (`Escape`, `Enter`, `ArrowUp`/`Down`) inside a modal or form that also handles the same key at the ancestor level — the event bubbles and the ancestor fires too (closing the modal, submitting the form, collapsing the disclosure). Call `e.stopPropagation()` (and usually `e.preventDefault()`) in the inner handler when it handles the key, or scope the ancestor handler to only fire when the event target is outside the inner control
 ## Tier 4 — Always Check (Quality, Conventions, AI-Generated Code)
    **Intent vs implementation**
-   - Labels, comments, status messages, or documentation that describe behavior the code doesn't implement — e.g., a map named "renamed" that only deletes, an action labeled "migrated" that never creates the target, or UI actions offered for entity states where the transition is invalid (e.g., a "Reject" button on already-rejected items)
+   - Labels, comments, status messages, or documentation that describe behavior the code doesn't implement — e.g., a map named "renamed" that only deletes, an action labeled "migrated" that never creates the target, or UI actions offered for entity states where the transition is invalid (e.g., a "Reject" button on already-rejected items). Also covers doc drift on concrete facts: file paths or extensions (`foo.js` referenced when the file is `foo.jsx`), item counts ("13 widgets" when there are 15), documented default entity names ("Default" vs the actual "Everything"), and route/response-shape comments that say `{ activeId }` when the handler returns `{ activeId, items }`. When reviewing, verify every factual claim in a comment or doc against the code it references
    - Inline code examples, command templates, and query snippets that aren't syntactically valid as written — template placeholders must use a consistent format, queries must use correct syntax for their language (e.g., single `{}` in GraphQL, not `{{}}`)
    - Cross-references between files (identifiers, parameter names, format conventions, version numbers, operational thresholds) that disagree — when one reference changes, trace all other files that reference the same entity and update them. This includes internal identifiers (route paths, file names, component names) that should be renamed when the concept they represent is renamed — a nav label saying "Rejected" pointing to `/admin/flagged` or a component named `FlaggedList` rendering rejected items creates maintenance confusion. For releases, verify version consistency across all versioned artifacts (package manifests, lockfiles, API specs, changelogs, PR metadata). Also applies to field-set enumerations: when an operation targets a set of entity fields, every predicate, filter expression, scan criteria, API doc, and UI conditional that enumerates those fields must stay in sync — an independently maintained list that omits a field causes silent skips or false positives
    - Template/workflow variables referenced (`{VAR_NAME}`) but never assigned — trace each placeholder to a definition step; undefined variables cause silent failures or confusing instructions. Also check for colliding identifiers (two distinct concepts mapped to the same slug, key, or name)
@@ -195,9 +227,11 @@
    - Sequential numbering (section numbers, step numbers) with gaps or jumps after edits — verify continuity
    - Completion markers, success flags, or status files written before the operation they attest to finishes — consumers see false success if the operation fails after the write
    - Existence checks (directory exists, file exists, module resolves) used as proof of correct/complete installation — a directory can exist but be empty, a file can exist with invalid contents. Verify the specific resource the consumer needs
+   - Readiness/health probes that rely solely on subprocess exit code without inspecting output — many CLIs (`psql`, `curl`, `kubectl`) exit 0 for empty results, missing schema, auth-only handshake, or "command accepted" when the actual condition isn't met. Capture stdout (`execFileSync(...).toString()`) and verify it contains the expected marker (a row count, a status code, a schema-table name). For tools that read user-level config (`.psqlrc`, `~/.curlrc`), pass flags that ignore those files (`-X`, `--no-rcfile`) so the probe behaves the same in every environment
    - Lookups that check only one scope when multiple exist — e.g., checking local git branches but not remote, checking in-memory cache but not persistent store. Trace all locations where the resource could exist and check each
    - Tracking/checkpoint files that default to empty on parse failure — causes full re-execution. Fail loudly instead. More broadly, safety/guard checks that catch errors and default to "safe to proceed" (fail-open) rather than treating errors as "unsafe, abort" (fail-closed) — a guard that silently succeeds on error provides no protection when it's needed most
    - Registering references to resources without verifying the resource exists — dangling references after failed operations
+   - Composed instructions, prompts, system messages, or rule sets that vary by mode/role/context — unconditional clauses can contradict mode-specific directives (e.g., "always cite sources inline" combined with a "draft" mode that asks for "no preamble, no commentary"). Build the composition conditionally — include each instruction block only for the modes that actually want it — or define an explicit precedence so contradictions are predictable
    **Automated pipeline discipline**
    - Internal code review must run on all automated remediation changes BEFORE creating PRs — never go straight from "tests pass" to PR creation
@@ -210,6 +244,7 @@
    - Commit messages or comments claiming a fix while the underlying bug remains — verify each claimed fix actually addresses the root cause, not just the symptom
    - Functions containing placeholder comments (`// TODO`, `// FIXME`, `// implement later`) or stub implementations presented as complete
    - Unnecessary defensive code: error handling for scenarios that provably cannot occur given the call site, fallbacks for internal functions that always return valid data
+   - Cleanup callbacks (useEffect return, finalizer, dispose, signal handler) containing only comments are misleading and signal missing or relocated behavior. Either implement the cleanup or remove the callback entirely; don't ship a cleanup function that does nothing
    **Configuration & hardcoding**
    - Hardcoded values when a config field or env var already exists; dead config fields nothing consumes; unused function parameters creating false API contracts; resource names (table names, queue names, bucket names) hardcoded without accounting for environment prefixes — lookups on response objects using the wrong key silently return undefined
@@ -234,8 +269,9 @@
    - Tests that exercise code paths depending on features the integration layer doesn't expose — they pass against mocks but the behavior can't trigger in production. Verify mocked responses match what the real dependency actually returns
    - Test mock state leaking between tests — mock setup APIs that configure return values often persist across tests even after clearing call history, because "clear" resets invocation counts but not configured behavior (use "reset" variants that restore original implementations). Conversely, per-call sequential mock responses couple tests to internal call count — prefer stable return values for behavior tests, sequential mocks only when verifying call order
    - Tests that pass but don't cover the changed code paths — passing unrelated tests is not validation
+   - Response/status assertions written as loose ranges (`status >= 400`, `status < 500`, `ok: false`) — a regression that turns a 400 validation failure into a 500 internal error still passes. Assert the specific expected status (or at least `< 500` for "must be a client error") so the test distinguishes validation from server failure. Same principle for assertions like "contains any error" or "thrown anything" — assert the specific error type/code the code is contracted to produce
    **Style & conventions**
    - Naming and patterns consistent with the rest of the codebase
-   - Formatting consistency within each file — new content must match existing indentation, bullet style, heading levels, and structure. For structured files that follow a convention across sibling files (changelogs, config files, migration files), verify new entries use the same section headers, field names, and ordering as existing siblings
+   - Formatting consistency within each file — new content must match existing indentation, bullet style, heading levels, and structure. For structured files that follow a convention across sibling files (changelogs, config files, migration files), verify new entries use the same section headers, field names, and ordering as existing siblings. Within a single structured file, section headers must be unique — two `## Fixed` blocks in the same changelog (or two `[features]` tables in the same TOML) are a merge artifact that splits content downstream tools expect to find under one header. Consolidate duplicates into a single section
    - Shell/workflow instructions with destructive operations (branch deletion, file removal, force operations) must verify preconditions first — e.g., ensure you're not on a branch being deleted, confirm the target exists, and don't suppress stderr from commands where failures indicate real problems (auth errors, network issues)

package/lib/copilot-review-loop.md CHANGED Viewed

@@ -19,16 +19,16 @@ whether to continue or stop — never loop indefinitely without confirmation.
 TIMEOUT SCHEDULE:
 When running parallel PR reviews (do:better), use shorter waits to avoid
 blocking other PRs:
-- Iteration 1: max wait 5 minutes
-- Iteration 2: max wait 4 minutes
-- Iteration 3: max wait 3 minutes
-- Iteration 4: max wait 2 minutes
-- Iteration 5+: max wait 1 minute
+- Iteration 1: max wait 3 minutes
+- Iteration 2: max wait 2 minutes
+- Iteration 3: max wait 90 seconds
+- Iteration 4: max wait 60 seconds
+- Iteration 5+: max wait 45 seconds
 When running a single-PR review (do:pr, do:release), use dynamic timing:
 check the previous Copilot review duration on this PR and wait up to 2x
-that (minimum 5 minutes, maximum 20 minutes). Copilot reviews can take
-10-15 minutes for large diffs.
-Poll interval: 30 seconds for all iterations.
+that (minimum 2 minutes, maximum 10 minutes). Copilot reviews typically
+complete in 2-5 minutes; large diffs may take longer.
+Poll interval: 15 seconds for all iterations.
 Run the following loop until Copilot returns zero new comments:
@@ -51,8 +51,8 @@ Run the following loop until Copilot returns zero new comments:
    - For parallel PR reviews (do:better): use the DECREASING TIMEOUT for
      the current iteration number
    - For single-PR reviews (do:pr, do:release): use dynamic timing based on
-     the previous Copilot review duration on this PR (2x that, min 5 min,
-     max 20 min)
+     the previous Copilot review duration on this PR (2x that, min 2 min,
+     max 10 min)
    - Error detection: if the review body contains "Copilot encountered an
      error" or "unable to review this pull request", re-request (step 1)
      and resume polling. Max 3 error retries before reporting failure.

package/lib/review-cross-file-tracing.md CHANGED Viewed

@@ -43,6 +43,9 @@ Apply the checklist as a prompt for attention, not an exhaustive specification.
 - Side effects during React render
 - Interactive UI elements (buttons, inputs, drag-and-drop targets, keyboard shortcuts) that remain enabled while an async operation owns their related state — a second trigger while the first is in-flight produces concurrent state mutations or duplicate operations. All entry points for the same operation must be disabled together while the operation is pending
 - Optimistic UI messages that substitute placeholder text when the actual payload sent to the server differs — the conversation history and server-side record will show different content than what the user saw. Use the same fallback text in both the optimistic render and the outgoing payload, or surface the actual payload text
+- Optimistic placeholder IDs ('pending', 'temp_*', client-generated UUIDs) echoed back to the server in subsequent requests — server validates against its real ID format and rejects them as 400s. Trace from optimistic insertion → controls bound to the optimistic record (pin, promote, delete, follow-up) → outgoing request payloads. Disable controls until the server returns a real ID, OR omit the field from outgoing payloads when the local value still matches the optimistic shape
+- Client-side AbortController for an in-flight streaming/long-lived operation must abort when the owning UI context tears down OR navigates AWAY from THAT operation. When cleanup is keyed to a route param, navigation events emitted BY the in-flight stream itself (e.g., redirecting to a permalink after the server returns an id) trigger the cleanup and abort the very operation that caused the navigation. Track the streaming operation's identity in a ref and abort only when navigating away from THAT identity, not on every param change. Mirror by cancelling the stream reader (`reader.cancel()`) in `finally` and ignoring late events whose ID doesn't match the now-current operation
+- Settings whose persistence model is per-record (per-conversation, per-document, per-project) held only in local component state — refresh resets to the persisted value while the server-side history shows different content. Trace: UI mode/setting state → outgoing payload → persistence schema → reload path. Persist on every mutation OR derive UI from the last-persisted record
 ### Error Handling
@@ -52,7 +55,9 @@ Apply the checklist as a prompt for attention, not an exhaustive specification.
 - Destructive ops in retry/cleanup paths without own error handling
 - External service calls without configurable timeouts
 - Missing fallback for unavailable downstream services
-- SSE or streaming handlers that call `end()`/`close()` on mid-stream errors without emitting an error event — the client observes a clean stream termination and treats partial content as complete. Emit a structured `event: error` block before closing so clients can detect and surface the failure
+- SSE or streaming handlers that call `end()`/`close()` on mid-stream errors without emitting an error event — the client observes a clean stream termination and treats partial content as complete. Emit a structured `event: error` block before closing so clients can detect and surface the failure. Conversely, named lifecycle events (`error`, `done`, `complete`) must be MUTUALLY EXCLUSIVE — after emitting `error`, do NOT also emit `done`, or include explicit success/error info in the terminal frame. Trace every exit path of the stream generator and the route's loop body to verify only one terminal event fires
+- Streaming server handlers whose abort signal is wired into ONLY the final consumer (e.g., the LLM provider fetch) but not into upstream retrieval / embedding / subprocess work — disconnects don't actually stop the expensive earlier work. Trace the AbortSignal from `req.on('close')` through every async leg of the pipeline and verify each takes a `signal` parameter and propagates it
+- Raw `fetch()` failures (TypeError "Failed to fetch", DNS errors, ECONNREFUSED) at API client boundaries must be translated to a consistent message matching the project's established transport-error utility. Trace each new API client function against existing siblings (`apiCore.request`, `apiOpenClaw.streamMessage`) and verify the same wrapper is used; preserve `AbortError` so callers distinguish cancellation from failure
 - SSE or event dispatchers that handle named event types but ignore the protocol's default/unnamed event — SSE streams that emit `data:` without `event:` produce type `'message'` (the SSE default), which a handler processing only named types will silently discard. Verify the default event type is either handled or explicitly excluded
 - Route handlers that call a status/health probe before delegating to the main service when the service already handles the "not configured"/"unreachable" case — the pre-probe adds an extra upstream round-trip on every request and can fail even when the intended operation would succeed. Let the service be the authoritative source of truth and map its structured errors to the appropriate HTTP status at the route boundary
@@ -74,9 +79,10 @@ Apply the checklist as a prompt for attention, not an exhaustive specification.
 - One-time migrations without completion guard — re-execute every startup
 - Data migrations silently changing runtime behavior — preserve execution semantics. Unsupported source values must be flagged, not defaulted
 - Update endpoints with field allowlists not covering new model fields
-- New endpoints not matching validation patterns of existing similar ones. API doc schemas must be structurally complete
+- New endpoints not matching validation patterns of existing similar ones. The same field (id, name) accepted by multiple endpoints must be validated identically everywhere — path params, query, body, on sibling endpoints (create/update/delete/activate). Skipping param validation on one sibling turns violations into 404/500 instead of 400. `z.string().min(1)` without `.trim()` accepts whitespace-only names. API doc schemas must be structurally complete
 - Client-side input validation limits (max count, file size, string length, combined totals) must be consistent with — and ideally tighter than — server-side enforcement. When the client allows combinations the server rejects (e.g., 8 × 10MB files vs a 50MB JSON body limit), users hit confusing 400/413 errors. Trace all enforcement boundaries (UI, API schema, body parser, downstream service) and verify they form a coherent envelope
 - Sample config files, README examples, and documentation that reference config keys or structure must match what the implementation actually reads. Trace example keys against the config loader — stale examples teach operators to configure values the system ignores (or vice versa)
+- Subprocess invocations must inherit the same configuration source as the parent — if the parent reads from `.env`/config files but the child only sees `process.env`, exporting those values explicitly via the `env` option is required. Trace from config loader → invocation site → subprocess script. Otherwise a probe uses customized credentials/ports while the underlying setup runs with defaults, creating an "inconsistency loop" where the probe always fails and provisioning re-applies defaults that overwrite user customization
 - Config values whose format can be validated at initialization time (URLs, port numbers, auth schemes) but are only validated at first use — misconfiguration surfaces as a cryptic runtime error deep in the call stack. Validate format and range of security-relevant config values during initialization and surface a specific diagnostic identifying the bad field
 - URL joining utilities that force paths absolute-from-origin (stripping the base URL's pathname) — `baseUrl=http://host/proxy` + `/v1/api` silently produces `http://host/v1/api` instead of `http://host/proxy/v1/api`. Verify URL construction utilities preserve pathname segments from the base URL, or document and enforce that base URLs must be origin-only
 - Summary/aggregation endpoints using different filters/sources than detail views they link to
@@ -84,8 +90,13 @@ Apply the checklist as a prompt for attention, not an exhaustive specification.
 - Validation functions introduced for a field: trace ALL write paths. New branches must apply same validation as siblings
 - Stored config merged with shallow spread — nested objects lose new default keys on upgrade. Use deep merge
 - Schema fields accepting values downstream can't handle. Validated params never consumed (dead API surface). `.partial()` on nested schemas: verify nested objects also partial. `.partial()` with `.default()` silently overwrites persisted values on update
+- Generator/validator structural invariant — when a generator produces values with structural guarantees (sortability via fixed-width prefix, embedded checksum, encoded version), the validator regex (and any client-side mirror) must enforce the SAME shape. Broader regexes accept inputs the generator never emits, breaking invariants the rest of the system relies on (e.g., lexical sort == chronological sort breaks once a base36 timestamp grows by a digit). Trace generator → server validator → client mirror as a closed loop. Test fixtures should use IDs/payloads that match generator output, not contrived literals
+- Schemas accepting paired range fields (`startDate`/`endDate`, `min`/`max`, `from`/`to`) without a cross-field refinement (`zod .refine()`) — accepts inconsistent ranges (start > end). Trace the schema definition, route validation, and downstream consumer to confirm the range relationship is enforced somewhere (preferably at the schema)
+- Required-at-use-time config values (model name, API key, endpoint URL, default selection) that may be null/undefined in the source data must be validated at the boundary before invoking the downstream API. Trace from config source → loading layer → use site, and verify nullable fields are guarded with a clear, actionable error before the downstream call. Otherwise the downstream API responds with an opaque error far from the user's intent
 - Multi-part UI gated on different prop subsets — derive single enablement boolean
 - Entity creation without case-insensitive uniqueness
+- Arrays of IDs (widget ids, tag ids, member ids) persisted, returned by API, or rendered with `key={x}` without container-level dedup — element-level validation (type, length) isn't enough. Enforce uniqueness via schema refinement (`zod.refine(arr => new Set(arr).size === arr.length)`), dedupe on ingest, AND dedupe during read-path sanitization so hand-edited / legacy data can't reintroduce collisions. Apply the same first-wins dedup to arrays of records keyed by id at the container level
+- Data loaded from files or persistent stores sanitized less strictly than the API accepts on write — hand-edited, migrated, or corrupted persisted state can introduce values (oversized names, non-kebab ids, duplicate entries) the API rejects on mutate, producing oversized responses, unreachable records, or invariant violations. Apply the same length caps, regex, uniqueness, and type guards in read-path sanitization as in request validation
 - Code reading response properties that don't exist — verify field names, nesting, actual response shape. Wrappers that don't request/forward needed fields. Call sites using wrong function variant for input format or wrong positional argument order
 - Data model fields with different names per write path. Entity identity keys inconsistent across lookup paths
 - Entity type changes without revalidating type-specific invariants and clearing old-type fields
@@ -115,6 +126,11 @@ Apply the checklist as a prompt for attention, not an exhaustive specification.
 - New external service calls must use established mock/test infrastructure
 - New UI consumers against existing APIs: verify every field name, nesting, identifier, response envelope matches actual producer response
 - Discovery/catalog endpoints: trace enumerated set against consumer's supported inputs
+- Cross-module constants kept in sync by comment ("must stay in sync with X", duplicated regex, duplicated event name, duplicated size limit) — the comment is not enforcement and drift is a silent failure. Event names, regex patterns, numeric limits, path segments, and feature-flag keys shared across modules (client↔server, route↔service, component↔component, producer↔consumer) must be a single exported constant imported by both. Flag any instance where a comment notes "keep in sync" without the actual shared module
+- New global APIs (`AbortSignal.any`, `Promise.withResolvers`, `structuredClone`) used directly when the codebase already has a fallback utility for the same API — search for an existing wrapper (`fetchWithTimeout`, `withSignal`, polyfill helpers) before adding a new direct call. Drift between the safe path and a new direct call reintroduces the runtime error the fallback was created to avoid
+- Pure persistence/utility modules importing from orchestration/service modules just to access a constant pulls the entire downstream import graph as a transitive dependency. Trace each `import` in storage / utility files; if the imported symbol is a constant (enum, regex, size cap, valid-mode set), suggest moving it to a small dedicated shared module
+- Actions triggered from one surface (command palette, global menu, external event) that mutate data another already-mounted page/component fetched on mount — re-navigating to the same route doesn't remount (routers no-op it), so the visible state stays stale while the server updates. Propagate change via shared store, a pub/sub event whose name is a shared constant, focus/visibility refetch, or key-based remount — and verify the mounted page actually subscribes on its side
+- Modules that own a persistence schema (write to disk/DB with a known shape) should validate at the persistence boundary, not assume the API/route layer will catch everything. Trace from route validator → service call → persistence write — verify enum/range/required checks exist at the storage layer for fields the schema cares about. Direct callers (internal scripts, tests, programmatic batch jobs) bypass route validation otherwise
 **Cleanup/teardown side effects**
 - Cleanup functions with implicit mutations (auto-merge, auto-commit, cascade writes) — verify abort on prerequisite failure
@@ -168,6 +184,10 @@ Apply the checklist as a prompt for attention, not an exhaustive specification.
 **Batch/paginated consumption**
 - Batch API callers handle partial results, continuation tokens, rate limits with backoff. Resource names account for environment prefixes
+- Periodic maintenance (cleanup, expiry, dedup) bolted onto a paginated read path runs only for items returned in that page — entries beyond the boundary are never processed. Trace from list endpoint → maintenance/sweep code → the iteration that bounds it. Move maintenance to a background sweep, run a separate unbounded pass, OR use cheap metadata (mtime, size) for the maintenance pass while only doing expensive reads for the page actually returned. Maintenance gates that depend on parsed metadata fields will skip records where parsing returns a sentinel (0, null, "") — those records become permanent
+**Deep-link URL contract (sender ↔ receiver)**
+- A URL with query parameters (`?id=...`, `?date=...`) or path segments is a contract: the receiving page/route MUST consume those parameters and use them to scroll/select/filter. Trace each new deep-link href to the destination route handler / page component and verify it reads and acts on every parameter the sender includes. If the receiver doesn't yet support the parameter, either drop it (and adjust docs/changelog claims) or wire it through end-to-end
 **Data model vs access pattern**
 - Claims of ordering ("recent", "top") verified against key/index design — random UUIDs require full scans

package/lib/review-security-audit.md CHANGED Viewed

@@ -26,6 +26,7 @@ For each changed file:
 - API responses returning full objects with sensitive fields — destructure and omit across ALL paths (GET, PUT, POST, error, socket). Comments claiming data isn't exposed while the code does expose it
 - Server trusting client-provided computed/derived values (scores, totals, correctness flags, file metadata like MIME type and size) — strip and recompute server-side. Validate uploads via magic bytes and buffer length, not headers
+- Server trusting persisted-state flags (builtIn, protected, role, owner, immutable) read from flat-file/JSON/DB records to make authorization or deletion decisions — hand-editing the file or tampered sync can flip the flag and bypass protection. Derive authority on every read from a trusted source: a code-level constant set of built-in ids, session identity, or a server-side role lookup. The persisted representation can cache the flag for display, but must not be the source of truth for security decisions
 - New endpoints under restricted paths (admin, internal) missing authorization — compare with sibling endpoints for same access gate (role check, scope validation). New OAuth scopes must be checked comprehensively — a check testing only one scope misses newly added scopes
 - User-controlled objects merged via `Object.assign`/spread without sanitizing keys — `__proto__`, `constructor`, `prototype` enable prototype pollution. Use `Object.create(null)`, whitelist keys, use `hasOwnProperty` not `in`
 - Push events (WebSocket, SSE, pub/sub) emitted without scoping to originating user/session — sensitive payloads leak to all connected clients. Scope via room/channel isolation or server-side correlation ID
@@ -56,7 +57,9 @@ For each changed file:
 **Sanitization/validation coverage**
 - If a new validation function is introduced for a field: trace ALL write paths (create, update, import, sync, bulk) — partial application means invalid data re-enters through unguarded paths
 - If a "raw" or bypass write path is added: compare normalization against what the read/parse path assumes — data through raw path must be valid on reload
+- Read-path sanitization of persisted data must enforce the SAME bounds as the API schema (length caps, uniqueness, regex, per-item type guards) — hand-edited or migrated data can otherwise introduce values the API rejects on mutate, producing oversized responses, unreachable records (client renders but API rejects), or invariant violations. Drop or truncate out-of-range values rather than passing them through
 - If a new dispatch branch is added within a multi-type handler: verify equivalent validation as sibling branches
+- Modules that own a persistence schema (write to disk/DB with a known shape) must validate at the persistence boundary — not only at the API/route layer. Direct callers (internal scripts, tests, programmatic batch jobs, future endpoints) bypass route validation and corrupt on-disk state. At minimum, reject invalid enum values, missing required fields, and out-of-range values before writing — so the storage layer enforces its own contract independent of who calls it
 **Security-sensitive configuration parsing**
 - Env vars/config affecting security (proxy trust, rate limits, CORS, token expiry): verify type and range enforcement. `Number()` accepts floats, negatives, empty-string-as-zero — use `parseInt` + `Number.isInteger` + range checks with logged safe defaults

package/lib/review-surface-scan.md CHANGED Viewed

@@ -30,22 +30,52 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 **Runtime correctness**
 - Null/undefined access without guards; off-by-one errors; spread of null (is `{}`), spread of non-objects (string → indexed chars, array → numeric keys) — guard with plain-object check before spreading
 - External/user data (parsed JSON, API responses, file reads) used without structural validation — guard parse failures, missing properties, wrong types, null elements. Optional enrichment failures should not abort the main operation
-- Type coercion: `Number('')` is `0` not empty; `0` is falsy in truthy checks; `NaN` comparisons always false; `"10" < "2"` (lexicographic). Deserialized booleans: `"false"` is truthy — use `=== 'true'`. `isinstance(x, int)` accepts `bool` in Python; `typeof NaN === 'number'` in JS
+- Type coercion: `Number('')` is `0` not empty; `0` is falsy in truthy checks; `NaN` comparisons always false; `"10" < "2"` (lexicographic). Deserialized booleans: `"false"` is truthy — use `=== 'true'`. `isinstance(x, int)` accepts `bool` in Python; `typeof NaN === 'number'` in JS. For env-var numeric parsing in particular, use `Number.parseInt(String(value).trim(), 10)` and gate with `Number.isFinite(parsed)` — `NaN` flowing into subprocess args (`-p NaN`) or formatted strings produces opaque downstream failures (whitespace/inline-comment values are common culprits)
 - Config/option defaults applied with `||` or falsy guards — intentional `0`, `false`, or `''` values are treated as "unset" and replaced by the default. Use `?? default` or explicit `=== undefined` checks when zero, false, or empty string are valid configuration values
 - Functions returning different types depending on input conditions (string in one branch, object in another) — callers must branch on return type; prefer a consistent return shape
 - Indexing empty arrays; `every`/`some`/`reduce` on empty collections returning vacuously true; declared-but-never-updated state/variables
 - Parallel arrays coupled by index position — use objects/maps keyed by stable identifier
 - Shared mutable references: module-level defaults mutated across calls (use `structuredClone()`); `useCallback`/`useMemo` referencing later `const` (temporal dead zone); spread followed by unconditional assignment clobbering spread values
+- React state invariants (uniqueness, cap/floor, monotonicity) checked against render-time value before `setX(...)` — rapid events or concurrent updates race the check. Move into the functional updater: `setX(prev => prev.includes(id) ? prev : [...prev, id])`
+- `useEffect` depending on state it writes — the write retriggers the effect (infinite loop / request storm). Split into two effects, drop the self-written value from deps, or use a functional setter that doesn't require the current value in deps
 - Functions with >10 branches or >15 cyclomatic complexity — refactor
+- String accumulation via `+=` inside high-frequency loops (streaming frames, chunked I/O, per-event handlers) is O(n²) for long outputs and triggers React re-renders on growing payloads. Collect chunks into an array and `join('')` at the end while still emitting per-chunk events
+- Server-side string formatting (`toLocaleString`, `toLocaleDateString`, currency/number formatters) that depends on locale/timezone defaults produces non-deterministic outputs across deployments. For data flowing into prompts, logs, or persisted records, format with explicit `Intl.DateTimeFormat({ timeZone: ... })` or ISO strings; reserve locale-aware formatting for user-visible UI layers
+- Required-at-use-time config values (model name, API key, endpoint URL, default selection) that may be null/undefined in source data must be validated at the boundary before invoking the downstream API. Otherwise the API responds with an opaque error far from the user's intent. Emit a clear, actionable error identifying the missing field
 **API route basics**
-- Route params passed to services without format validation; path containment using string prefix without separator boundary (use `path.relative()`)
+- Route params passed to services without format validation; path containment using string prefix without separator boundary (use `path.relative()`). When sibling endpoints validate body/query fields, the path param must be validated with the same schema — skipping param validation on one endpoint turns schema violations into 404/500 instead of 400
 - Parameterized/wildcard routes registered before specific named routes (`/:id` before `/drafts` matches `/drafts` as `id="drafts"`)
 - Stored or external URLs rendered as clickable links without protocol validation — allowlist `http:`/`https:`
 - Request schema fields for large string/binary payloads (base64, file content, free text) without per-field size limits — a total body-size limit alone doesn't prevent individual oversized fields from consuming excessive memory or exceeding downstream service limits; add per-field `max(N)` constraints with clear error messages
+- Character-class regex validators (`^[a-z0-9-]+$`) claiming to enforce a structured format (slug, kebab-case, reverse-DNS) — they accept leading/trailing separators (`-foo`, `foo-`) and repeated separators (`a--b`). Require alnum boundaries or use a parser
+- `z.string().min(1)` without `.trim()` accepts whitespace-only values for user-visible names — use `z.string().trim().min(1)` when the field represents a human-readable identifier
+- Object spread of a potentially-null/undefined boundary value (`{ ...req.body, id }`) — throws a TypeError and surfaces as 500 instead of 4xx. Use `{ ...(req.body ?? {}), id }` at request/boundary entry points
+- Schemas accepting paired range fields (`startDate`/`endDate`, `min`/`max`, `from`/`to`) without a cross-field refinement (`zod .refine()`) — accepts inconsistent ranges (start > end). Define deterministic rules when only one bound is supplied
+- Outbound payloads (snippets, previews, cached excerpts) stored in persisted records or sent over streaming protocols without size caps — applies the same per-field max constraint as inbound payloads, enforced at capture time so display-layer truncation isn't the only defense
+**Async & state (single-file patterns)**
+- Optimistic UI state (selection, active flag, list membership) updated before an async call and never reverted on failure — the user sees success, the server remains on the old state. Capture the previous value, await in try/catch, and reset on rejection. Optimistic placeholder IDs ('pending', 'temp_*', client-generated UUIDs) must NOT be echoed back to the server in subsequent requests — the server validates against its real ID format and rejects them as 400s. Disable controls bound to optimistic IDs until the server returns a real one, OR omit the field from outgoing payloads when the local value still matches the optimistic shape
+- Settings whose persistence model is per-record (per-conversation, per-document, per-project) must be persisted on every mutation, not just held in local component state — otherwise refresh resets to the persisted value. Decide explicitly whether the field is per-record, per-session, or per-action — and persist accordingly
+- Async functions invoked from sync event handlers (`onClick`, `onKeyDown`), effects, or dispatchers without rejection handling at the call site — even when a shared `request()` toasts the error, the unhandled rejection leaves UI stuck (modal doesn't close, palette doesn't navigate, dirty state doesn't clear). Wrap in try/catch, attach `.catch(...)`, or use `void p.catch(...)`; only run close/navigate/success in the resolve branch. After awaiting, check `signal.aborted` (or a `mountedRef`) before subsequent state writes — otherwise React warns about updates on unmounted trees and stale state leaks through
+- Single shared error-state variable reused by multiple independent async flows — one flow's success path clears the other flow's displayed error. Split errors by domain or only overwrite the specific error you own
+- Loading flag covers only the primary fetch — a slow or failed secondary fetch renders a blank page with no indicator. Include every render-gating load in the loading state or provide explicit empty/error states
+- Streaming UIs that clear the streaming buffer when a terminal `error` event arrives discard deltas the user already saw. Preserve accumulated content as a partial result with an error indicator so users don't lose what was visible
+- Raw `fetch()` failures (TypeError "Failed to fetch", DNS errors, ECONNREFUSED) at API client boundaries must be translated to a consistent user-friendly message matching the project's established transport-error utility — preserve `AbortError` so callers can still distinguish cancellation from failure
+**Streaming response handlers (server-side)**
+- Long-lived streaming handlers (SSE, chunked HTTP) must register a client-disconnect listener (`req.on('close')`/`req.on('aborted')`), set an `aborted` flag, and check it before subsequent writes — otherwise the server keeps emitting frames and incurring `write-after-end` errors after the client navigates away
+- After flushing streaming headers, framework error middleware (asyncHandler, exception filters) cannot send a JSON error — wrap post-handshake logic in try/finally that translates errors into a terminal `event: error` SSE frame and ends the response gracefully (`if (!res.writableEnded && !res.destroyed) res.end()`)
+- Per-request timeouts on streaming responses must remain active for the full duration of stream consumption, not cleared on initial fetch resolution — a stalled upstream that keeps the connection open hangs the consumer indefinitely
+- Honor write backpressure: check the boolean return of `write()` and await `'drain'` when it returns false; otherwise a slow client causes unbounded server-side buffering
+- When attaching paired listeners for backpressure or completion (`drain` + `close`), the cleanup handler must remove ALL of them — asymmetric removal accumulates listeners across slow-client cycles
+- Named lifecycle events on streams (`error`, `done`, `complete`) must be MUTUALLY EXCLUSIVE — after emitting `error`, do NOT also emit `done`. Otherwise clients parsing the last event treat failed runs as completed
 **Error handling (single-file)**
 - Swallowed errors (empty `.catch(() => {})`); error handlers that exit cleanly (`exit 0`, `return`) without user-visible output; handlers replacing detailed failure info with generic messages
+- Error discrimination by string matching (`err.message.includes('not found')`, regex on error text) — localization, refactors, or wrapper rewrites silently change HTTP status / retry behavior. Use explicit error codes or typed classes
+- Route handlers mapping any exception from a service into a single HTTP status (e.g., `catch { throw new NotFoundError() }`) — hides real server errors (file I/O, parse, write failures) as domain 404s. Map only known codes/classes; let unknown errors surface as 500
+- Error wrappers that re-throw with only `{ status }` and drop `code`/`context`/`cause` — downstream consumers see generic `INTERNAL_ERROR` instead of the specific code. Preserve structured detail when wrapping
 ### Domain-Specific (check only when file type matches)
@@ -82,13 +112,16 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 **Shell & portability** _[subprocesses, shell scripts, CLI tools]_
 - `set -e` aborting on non-critical failures; broken pipes on non-critical writes — use `|| true`
-- Interactive prompts in non-interactive contexts (CI, cron) — guard with TTY detection
+- Interactive prompts in non-interactive contexts (CI, cron) — guard with TTY detection (`[ -t 0 ]`). Also handle EOF (Ctrl-D, closed stdin) explicitly under `set -e` — a `read` returning non-zero on EOF aborts the script. Use `read ... || true` and check the return; default to a safe value. Validate the full set of expected answers (e.g., `y`/`yes`/`n`/`no` case-insensitive) — treating any non-default input as consent surprises users
 - Detached processes with piped stdio — SIGPIPE on parent exit. Use `'ignore'`
 - Subprocess output buffered without size limits — unbounded memory growth
 - Platform-specific: hardcoded shell interpreters; `path.join()` backslashes breaking ESM imports — use `pathToFileURL()`
 - Naive whitespace splitting of command strings breaks quoted arguments — use proper argv parser
 - Subprocess output parsed from single stream (stdout or stderr) to detect conditions — check both streams and exit code
+- Readiness/health probes that rely solely on subprocess exit code without inspecting output — many CLIs (`psql`, `curl`, `kubectl`) exit 0 for empty results, missing schema, or auth-only handshake. Capture stdout and verify it contains the expected marker. For tools that read user-level config (`.psqlrc`, `~/.curlrc`), pass flags that ignore those files (`-X`, `--no-rcfile`) so the probe behaves the same in every environment
+- Setup/provisioning scripts invoked from hot paths (`npm start`, dev script, container entrypoint) that mutate credentials, privileges, or installed-package state (`ALTER USER`, password resets, brew installs) on every invocation — gate the heavy work behind a cheap readiness check, OR refactor each step to be idempotent and detect already-applied state
 - Shell expansions suppressed by quoting — single quotes prevent all expansion
+- Arguments passed via process argv have OS-imposed length limits (notoriously low on Windows, ~32KB). For variable-length payloads (prompts, JSON blobs, file contents), pipe via stdin instead of constructing a long argv. If argv must be used, enforce a strict cap and fail with a clear message before spawning
 **Search & navigation** _[search, deep-linking]_
 - Search results linking to generic list pages instead of deep-linking to specific record
@@ -98,17 +131,22 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 - Destructive actions without confirmation step
 **Accessibility** _[UI components, interactive elements]_
-- Interactive elements missing accessible names, roles, or ARIA states — including labels removed or replaced with non-descriptive placeholders in conditional/compact rendering modes
+- Interactive elements missing accessible names, roles, or ARIA states — including labels removed or replaced with non-descriptive placeholders in conditional/compact rendering modes. ARIA attributes should match established patterns used elsewhere for the same widget type (disclosure, menu, dialog)
+- ARIA roles applied without the keyboard interactions they imply — `role="menu"`/`menuitem*` expects roving focus, arrow-key navigation, Escape scoped to the menu, and focus management; `role="listbox"` expects Home/End/typeahead; `role="dialog"` expects focus trap + return focus. Either implement the full interaction pattern or drop to a simpler one (native `<button>` + disclosure)
+- Nested inputs handling `Escape`/`Enter`/`ArrowUp`/`ArrowDown` inside a modal/form that also handles the key at the ancestor — the event bubbles and the ancestor fires too (closes modal, submits form). Call `e.stopPropagation()` (and usually `preventDefault()`) in the inner handler
 - Custom toggles from non-semantic elements instead of native inputs
 - Overlay layers with `pointer-events-auto` intercepting clicks beneath; `pointer-events-none` on parent killing child hover handlers
 **UI performance** _[UI components with streaming, scroll, or frequent updates]_
 - Event handlers or `useEffect` callbacks firing on every high-frequency event (streaming deltas, scroll, resize, keydown) without throttle, debounce, or `requestAnimationFrame` batching — causes jank and excessive re-renders. Batch with rAF or a time-based limiter when the handler doesn't need to run on every tick
+**Wire-protocol parsers** _[SSE/NDJSON/line-delimited frame parsers]_
+- Wire-protocol parsers must (a) handle the spec's full set of separators (e.g., both `\n\n` and `\r\n\r\n` for SSE; multiple `data:` lines joined with `\n`); (b) flush remaining buffered content on EOF — otherwise the last frame is dropped when upstream closes mid-frame; (c) wrap per-frame deserialization (`JSON.parse`) so a single malformed frame doesn't terminate the entire stream
 ### Always Check — Quality & Conventions
 **Intent vs implementation (single-file)**
-- Labels, comments, status messages describing behavior the code doesn't implement
+- Labels, comments, status messages describing behavior the code doesn't implement. Also covers factual doc drift: file paths/extensions (`foo.js` referenced when the file is `foo.jsx`), item counts ("13 widgets" when there are 15), default entity names ("Default" vs actual "Everything"), and route/response-shape comments that don't match what the handler returns. Verify every factual claim in a comment or JSDoc against the code it references
 - Inline code examples or command templates that aren't syntactically valid
 - Sequential numbering with gaps or jumps after edits
 - Template/workflow variables referenced but never assigned — trace each placeholder to a definition
@@ -119,6 +157,10 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 - Lookups checking only one scope when multiple exist (local branches but not remote)
 - Tracking/checkpoint files defaulting to empty on parse failure — fail-open guards
 - Registering references to resources without verifying resource exists
+- Composed instructions, prompts, system messages, or rule sets that vary by mode/role/context — unconditional clauses can contradict mode-specific directives (e.g., "always cite sources inline" combined with a `draft` mode that asks for "no preamble, no commentary"). Build the composition conditionally — include each block only for modes that want it — or define an explicit precedence so contradictions are predictable
+**UX integrity (single-component)**
+- Unsaved changes / dirty state silently discarded when the user switches context in a multi-record editor or closes a sheet — data loss. Dirty-check on switch (inline confirm), auto-save drafts, or disable the switch control while dirty. `beforeunload` does not cover in-app context switches
 **AI-generated code quality**
 - New abstractions, wrapper functions, helper files serving only one call site — inline instead
@@ -126,6 +168,7 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 - Commit messages claiming a fix while the bug remains
 - Placeholder comments (`// TODO`, `// FIXME`) or stubs presented as complete
 - Unnecessary defensive code for scenarios that provably cannot occur
+- Cleanup callbacks (useEffect return, finalizer, dispose, signal handler) containing only comments are misleading — implement the cleanup or remove the callback entirely
 **Configuration & hardcoding**
 - Hardcoded values when config/env var exists; dead config fields; unused function parameters
@@ -149,6 +192,7 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 - Missing tests for trust-boundary enforcement
 - Tests exercising code paths the integration layer doesn't expose — pass against mocks but untriggerable in production
 - Test mock state leaking between tests — "clear" resets invocation counts but not configured behavior; use "reset" variants
+- Response/status assertions written as loose ranges (`status >= 400`, `status < 500`, `ok: false`) — a regression that turns a 400 validation failure into a 500 still passes. Assert the specific expected status so tests distinguish validation from server failure
 **Automated pipeline discipline**
 - Internal code review must run before creating PRs — never go straight from "tests pass" to PR
@@ -157,7 +201,7 @@ For each changed file, read the **ENTIRE file** (not just diff hunks). New code
 **Style & conventions**
 - Naming and patterns inconsistent with rest of codebase
-- New content not matching existing indentation, bullet style, heading levels
+- New content not matching existing indentation, bullet style, heading levels. Within a single structured file (changelog, README, TOML config), section headers must be unique — duplicate `## Fixed` blocks or repeated table sections are a merge artifact that splits content downstream tools expect to find under one header. Consolidate
 - Shell instructions with destructive operations not verifying preconditions first
 ## Output Format

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "slash-do",
-  "version": "2.10.0",
+  "version": "2.11.0",
   "description": "Curated slash commands for AI coding assistants — Claude Code, OpenCode, Gemini CLI, and Codex",
   "author": "Adam Eivy <adam@eivy.com>",
   "license": "MIT",