npm - slash-do - Versions diffs - 2.3.0 → 2.5.0 - Mend

slash-do 2.3.0 → 2.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9) hide show

package/commands/do/review.md +46 -232
package/commands/do/rpr.md +36 -4
package/install.sh +1 -0
package/lib/code-review-checklist.md +6 -3
package/lib/review-cross-file-tracing.md +227 -0
package/lib/review-security-audit.md +76 -0
package/lib/review-surface-scan.md +161 -0
package/package.json +1 -1
package/uninstall.sh +1 -0

package/commands/do/review.md CHANGED Viewed

@@ -15,253 +15,75 @@ If there are no changes, inform the user and stop.
 ## Apply Project Conventions
-CLAUDE.md is already loaded into your context. Use its rules (code style, error handling, logging, security model, scope exclusions) as overrides to generic best practices throughout this review. For example, if CLAUDE.md says "no auth needed — internal tool", do not flag missing authentication.
-<review_instructions>
+CLAUDE.md is already loaded into your context. Use its rules (code style, error handling, logging, security model, scope exclusions) as overrides to generic best practices throughout this review. Pass relevant convention overrides to each agent so they don't flag things the project intentionally allows (e.g., "no auth needed — internal tool").
 ## PR-Level Coherence Check
-Before reviewing individual files, understand what this change set claims to do:
+Before dispatching agents, understand what this change set claims to do:
 1. Read commit messages (`git log {base}...HEAD --oneline`)
-2. After reviewing all files, verify: does the changed code actually deliver what the commits claim? Flag any claims not backed by code (e.g., "adds rate limiting" but only adds a comment).
-## Large PR Strategy
-If the diff touches more than 15 files, split the review into batches:
-1. Group files by module/directory
-2. Review each batch, printing findings as you go
-3. Delegate files beyond the first 15 to a subagent if context is getting full
-## Deep File Review
-For **each changed file** in the diff, read the **entire file** (not just diff hunks). Reviewing only the diff misses context bugs where new code interacts incorrectly with existing code.
-### Understand the Code Flow
-Before checking individual files against the checklist, **map the flow of changed code across all files**. This means:
-1. **Trace call chains** — for each new or modified function/method, identify every caller and callee across the changed files. Read those files too if needed. You cannot evaluate whether code is duplicated or well-structured without knowing how it connects.
-2. **Identify shared data paths** — trace data from entry point (route handler, event listener, CLI arg) through transforms, storage, and output. Understand what each layer is responsible for.
-3. **Map responsibilities** — for each changed module/file, state its single responsibility in one sentence. If you can't, it may be doing too much.
-### Evaluate Software Engineering Principles
-With the flow understood, evaluate the changed code against these principles:
-**DRY (Don't Repeat Yourself)**
-- Look for logic duplicated across changed files or between changed and existing code. Grep for similar function signatures, repeated conditional blocks, or copy-pasted patterns with minor variations.
-- If two functions do nearly the same thing with small differences, they should likely share a common implementation with the differences parameterized.
-- Duplicated validation, error formatting, or data transformation are common violations.
-**YAGNI (You Ain't Gonna Need It)**
-- Flag abstractions, config options, parameters, or extension points that serve no current use case. Code should solve the problem at hand, not hypothetical future problems.
-- Unnecessary wrapper functions, premature generalization (e.g., a factory that produces one type), and unused feature flags are common violations.
-**SOLID Principles**
-- **Single Responsibility** — each module/function should have one reason to change. If a function handles both business logic and I/O formatting, flag it.
-- **Open/Closed** — new behavior should be addable without modifying existing working code where practical (e.g., strategy patterns, plugin hooks).
-- **Liskov Substitution** — if subclasses or interface implementations exist, verify they are fully substitutable without breaking callers.
-- **Interface Segregation** — callers should not depend on methods they don't use. Large config objects or option bags passed through many layers are a smell.
-- **Dependency Inversion** — high-level modules should not import low-level implementation details directly when an abstraction boundary would be cleaner.
-**Separation of Concerns**
-- Business logic should not be tangled with transport (HTTP, WebSocket), storage (SQL, file I/O), or presentation (HTML, JSON formatting).
-- If a route handler contains business rules beyond simple delegation, flag it.
-**Naming & Readability**
-- Function and variable names should communicate intent. If you need to read the implementation to understand what a name means, it's poorly named.
-- Boolean variables/params should read as predicates (`isReady`, `hasAccess`), not ambiguous nouns.
-For this review, only flag principle and design violations that are **concrete and actionable** in the code changed by this PR. However, if you discover a clear, real bug or correctness issue — even in code not directly modified here — call it out and help ensure it gets fixed (in this PR or a follow-up). Never dismiss serious problems as "out of scope" or "not modified in this PR."
-</review_instructions>
-<checklist>
-### Per-File Checklist
-Check every file against this checklist. The checklist is organized into tiers — always check Tiers 1 and 4, and check Tiers 2-3 only when the relevance filter matches the file:
-!`cat ~/.claude/lib/code-review-checklist.md`
-</checklist>
-<deep_checks>
-### Additional deep checks (read surrounding code to verify):
-**Cross-file consistency**
-- If a new function/endpoint follows a pattern from an existing similar one, verify ALL aspects match (validation, error codes, response shape, cleanup). Partial copying is the #1 source of review feedback.
-- New API client functions should use the same encoding/escaping as existing ones (e.g., if other endpoints use `encodeURIComponent`, new ones must too)
-- If the PR adds a new endpoint, trace where existing endpoints are registered and verify the new one is wired in all runtime adapters (serverless handler map, framework route file, API gateway config, local dev server) — a route registered in one adapter but missing from another will silently 404 in the missing runtime
-- If the PR adds a new call to an external service that has established mock/test infrastructure (mock mode flags, test helpers, dev stubs), verify the new call uses the same patterns — bypassing them makes the new code path untestable in offline/dev environments and inconsistent with existing integrations
-- If the PR adds a new UI component or client-side consumer against an existing API endpoint, read the actual endpoint handler or response shape — verify every field name, nesting level, identifier property, and response envelope path used in the consumer matches what the producer returns. This is the #1 source of "renders empty" bugs in new views built against existing APIs
-- If the PR adds or modifies a discovery/catalog endpoint that enumerates available capabilities (actions, node types, valid options) for a downstream consumer API, trace the full enumerated set against the consumer's actual supported inputs: verify every advertised item can be consumed without error, every consumer-supported item is discoverable, and any identifier transformations (naming conventions, case conversions, key format changes) between discovery output and consumer input preserve the format the consumer expects — mismatches produce runtime errors that no amount of unit testing will catch because the two sides are tested independently
-**Push/real-time event scoping**
-- If the PR adds or modifies WebSocket, SSE, or pub/sub event emission, trace the event scope: does the event reach only the originating session/user, or is it broadcast to all connected clients? Check payloads for sensitive content (user inputs, images, tokens) that should not leak across sessions. If the consumer filters by a correlation ID, verify the producer includes one and that the ID is generated server-side or validated against the session
-**Cleanup/teardown side effect audit**
-- If the PR adds cleanup, teardown, or garbage-collection functions, trace whether the cleanup performs implicit state mutations (auto-merge into main, auto-commit of unreviewed changes, cascade writes to shared state). Verify the cleanup aborts safely if a prerequisite step fails (e.g., saving dirty state before deletion) rather than proceeding with data loss
+2. Note the claims — verify after agents return whether the code actually delivers them.
-**Specification/standard conformance**
-- If the PR implements or extends a parser for a well-known format (cron expressions, date formats, URLs, semver, MIME types), verify boundary handling matches the specification — especially field-specific ranges (month starts at 1, not 0), normalization conventions (cron DOW 0 and 7 both mean Sunday), and step/range semantics that differ per field type
+## Dispatch Review Agents
-**Temporal context consistency**
-- If the PR adds timezone-aware logic alongside existing non-timezone-aware comparisons in the same code flow (e.g., a weekday gate using UTC while cron matching uses user timezone), check that all temporal comparisons in the flow use the same timezone context — mixed contexts cause operations to trigger on the wrong local day/hour
+Read the three agent instruction files, then spawn **all three in parallel** using the Agent tool. Each agent reviews ALL changed files independently.
-**Status/health endpoint freshness**
-- If the PR adds or modifies a status or health-check endpoint, trace whether it returns live probe results or cached data. Cached health checks mask real-time failures — a cache keyed by URL that survives URL reconfiguration reports stale status. Verify health endpoints bypass caches or use sufficiently short TTLs
+<surface_scan_agent>
-**Boolean/type fidelity through serialization boundaries**
-- If the PR persists boolean flags to text-based storage (markdown metadata, flat files, query strings, form data), trace the round-trip: write path → storage format → read/parse path → consumption site. Boolean `false` serialized as the string `"false"` is truthy in JavaScript — verify all consumption sites use strict equality or a dedicated coercion function, and that the same coercion is applied consistently
+### 1. Surface Scan Agent
-**Cross-layer invariant enforcement**
-- If the PR introduces or modifies an invariant relationship between configuration flags (e.g., "flag A implies flag B"), trace enforcement through every layer: UI toggle handlers, form submission payloads, API validation schemas, server default-application functions, and persistence round-trip. If any layer allows the invariant to be violated, cascading defaults produce contradictory state
+Catches per-file bugs: runtime crashes, hygiene, domain-specific issues, quality, and convention violations.
-**Error path completeness**
-- Trace each error path end-to-end: does the error reach the user with a helpful message and correct HTTP status? Or does it get swallowed, logged silently, or surface as a generic 500?
-- For multi-step operations (sync to N repos, batch updates): are per-item failures tracked separately from overall success? Does the status reflect partial failure accurately?
+!`cat ~/.claude/lib/review-surface-scan.md`
-**Concurrency under user interaction**
-- If a component performs optimistic updates with async operations, simulate what happens when the user triggers a second action while the first is in-flight — trace whether rollback/success handlers can clobber concurrent state changes or close over stale snapshots
+</surface_scan_agent>
-**State ownership across component boundaries**
-- If a child component maintains local state derived from a parent's data (e.g., optimistic UI copies), trace the ownership boundary: does the child propagate changes back to the parent? What happens on unmount/remount — does the parent's stale cache resurface?
+<security_agent>
-**Data flow audit**
-- For sensitive data (secrets, tokens): trace the value from input → storage → retrieval → response. Verify it is never leaked in ANY response path (GET, PUT, POST, error responses, socket events)
-- For user input → URL/command interpolation: verify encoding/escaping at every boundary
+### 2. Security Audit Agent
-**Access scope changes**
-- If the PR widens access to an endpoint or resource (admin→public, internal→external), trace all shared dependencies the endpoint uses (rate limiters, queues, connection pools, external service quotas) and assess whether they were sized for the previous access level — in-memory/process-local limiters don't enforce limits across horizontally scaled instances
-- If the PR adds endpoints under a restricted route group (admin, internal, scoped), read sibling endpoints in the same route group and verify the new endpoint applies the same authorization gate — missing gates on admin-mounted endpoints are consistently the most dangerous review finding
+Catches trust boundary violations, injection, SSRF, data exposure, and access control gaps.
-**Guard-before-cache ordering**
-- If a handler performs a pre-flight guard check (rate limit, quota, feature flag) before a cache lookup or short-circuit path, verify the guard doesn't block operations that would be served from cache without touching the guarded resource — restructure so cache hits bypass the guard
+!`cat ~/.claude/lib/review-security-audit.md`
-**Sanitization/validation/normalization coverage**
-- If the PR introduces a new validation or sanitization function for a data field, trace every code path that writes to that field (create, update, import, sync, rename, raw/bulk persist) — verify they all use the same sanitization. Partial application is the #1 way invalid data re-enters through an unguarded path
-- If the PR adds a "raw" or bypass write path (e.g., `raw: true` flag, bulk import, migration backfill), compare the normalization it applies against what the standard read/parse path assumes — ID prefixes, required defaults, shape invariants. Data that passes through the raw path must still be valid when reloaded through the normal path
-- If the PR adds a new dispatch branch within a multi-type handler (e.g., coercing a new data shape, handling a new entity subtype), trace sibling branches and verify the new one applies equivalent validation, type-checking, and error-handling constraints — new branches commonly bypass validation that existing branches enforce because the author focuses on happy-path behavior
+</security_agent>
-**Bootstrap/initialization ordering**
-- If the PR adds resilience or self-healing code (dependency installers, auto-repair, migration runners), trace the execution order: does the main code path resolve or import the dependencies BEFORE the resilience code runs? If so, the bootstrapper never executes when it's needed most — restructure so verification/installation precedes resolution
+<cross_file_agent>
-**Self-rescheduling callback resilience**
-- If the PR adds a one-shot timer or deferred callback that re-registers itself for the next cycle, trace what happens when the callback body throws before re-registration — an unhandled error permanently stops the schedule. Verify re-registration is in a finally block or occurs before the main logic
+### 3. Cross-File Tracing Agent
-**Periodic operation skip behavior**
-- If the PR adds skip/gate conditions to periodic operations (scheduled jobs, pollers), trace whether a skip still advances the scheduling state (lastRun, nextFireTime). A skipped execution with null/stale timing state causes immediate re-trigger loops
+Catches contract mismatches, broken call chains, stale state propagation, lifecycle gaps, and architectural violations.
-**Lock/flag exit-path completeness**
-- If a function sets a shared flag or lock (in-progress, mutex, status marker), trace every exit path — early returns, error catches, platform-specific guards, and normal completion — to verify the flag is cleared. A missed path leaves the system permanently locked
+!`cat ~/.claude/lib/review-cross-file-tracing.md`
-**Operation-marker ordering**
-- If the PR writes completion markers, success flags, or status files, verify they are written AFTER the operation they attest to, not before. If the operation can fail after the marker write, consumers see false success. Also check that marker-dependent startup logic validates the marker's contents rather than treating presence as unconditional success
+</cross_file_agent>
-**Real-time event vs response timing**
-- If a handler emits push notifications (WebSocket, SSE, pub/sub) AND returns an HTTP response, verify clients won't receive push events before the response that gives them context to interpret those events — especially when the response contains IDs or version numbers the event consumer needs
+### How to dispatch
-**Paired lifecycle event completeness**
-- If a function emits a "started" or "begin" event, trace every exit path (success, error, early return, no-op branches for specific entity types) and verify each emits the corresponding "completed" or "failed" event. A missing completion event leaves consumers (UI spinners, progress indicators, orchestrators waiting for all-done) in a permanently stale state. Pay special attention to branches that short-circuit for specific entity subtypes — they often return early without the completion event because the author only considered the primary code path
+For each agent, construct its prompt by combining:
+1. The agent's instruction content (from the sections above)
+2. Project convention overrides from CLAUDE.md that affect the review
+3. The list of changed files from the diff stat
+4. Instruction: "Read each changed file in full (not just diff hunks). Apply your checklist. Return structured findings."
-**Entity identity key consistency**
-- If the PR uses a computed identity key to look up, match, or index entities (e.g., `e.id || e.externalId`, `item.slug ?? item.name`), trace all other code paths that perform the same entity lookup and verify they use the identical key computation. Inconsistent key strategies cause mismatches — one path stores data under key A while another reads under key B, leading to phantom missing records or incorrect counts
+Spawn all three agents simultaneously. Each returns its findings independently.
-**Intent vs implementation (meta-cognitive pass)**
-- For each label, comment, docstring, status message, or inline instruction that describes behavior, verify the code actually implements that behavior. A detection mechanism must query the data it claims to detect; a migration must create the target, not just delete the source
-- If the PR contains inline code examples, command templates, or query snippets, verify they are syntactically valid for their language — run a mental parse of each example. Watch for template placeholder format inconsistencies within and across files
-- If the PR modifies a value (identifier, parameter name, format convention, threshold, timeout) that is referenced in other files, trace all cross-references and verify they agree. This includes: reviewer usernames, API names, placeholder formats, GraphQL field names, operational constants
-- If the PR adds or reorders sequential steps/instructions, verify the ordering matches execution dependencies — readers following steps in order must not perform an action before its prerequisite
+### Large PR handling
-**Transactional write integrity**
-- If the PR performs multi-item writes (database transactions, batch operations), verify each write includes condition expressions that prevent stale-read races (TOCTOU) — an unconditioned write after a read can upsert deleted records, double-count aggregates, or drive counters negative. Trace the gap between read and write for each operation. Also verify that update/modify operations won't silently create records when the target key doesn't exist — database update operations often have implicit upsert semantics (e.g., DynamoDB UpdateItem, MongoDB update with upsert) that create partial records for invalid IDs; add existence condition expressions when the operation should only modify existing records
-- If the PR catches transaction/conditional failures, verify the error is translated to a client-appropriate status (409, 404) rather than bubbling as 500 — expected concurrency failures are not server errors
+If the diff touches more than 20 files, tell each agent to batch files by directory and process groups sequentially within their parallel run. The orchestrator does not manage batching.
-**Batch/paginated API consumption**
-- If the PR calls batch or paginated external APIs (database batch gets, paginated queries, bulk service calls), verify the caller handles partial results — unprocessed items, continuation tokens, and rate-limited responses must be retried or surfaced, not silently dropped. Check that retry loops include backoff and attempt limits
-- If the PR references resource names from API responses (table names, queue names), verify lookups account for environment-prefixed names rather than hardcoding bare names
+## Collect & Deduplicate
-**Data model vs access pattern alignment**
-- If the PR adds queries that claim ordering (e.g., "recent", "top"), verify the underlying key/index design actually supports that ordering natively — random UUIDs and non-time-sortable keys require full scans and in-memory sorting, which degrades at scale
+After all three agents return:
-**Deletion/lifecycle cleanup and aggregate reset completeness**
-- If the PR adds a delete or destroy function, trace all resources created during the entity's lifecycle (data directories, git branches, child records, temporary files, worktrees) and verify each is cleaned up on deletion. Compare with existing delete functions in the codebase for completeness patterns
-- If the PR adds a state transition that resets an aggregate value (counter, score, flag count), trace all individual records that contribute to that aggregate and verify they are also cleared, archived, or versioned — a reset counter with stale contributing records causes inconsistency and blocks duplicate-prevention checks on re-entry
-**Update schema depth**
-- If the PR derives an update/patch schema from a create schema (e.g., `.partial()`, `Partial<T>`), verify that nested objects also become partial — shallow partial on deeply-required schemas rejects valid partial updates where the caller only wants to change one nested field
-**Mutation return value freshness**
-- If a function mutates an entity and returns it, verify the returned object reflects the post-mutation state, not a pre-read snapshot. Also check whether dependent scheduling/evaluation state (backoff, timers, status flags) is reset when a "force" or "trigger" operation is invoked
-**Responsibility relocation audit**
-- If the PR moves a responsibility from one module to another (e.g., a database write from a handler to middleware, a computation from client to server), trace all code at the old location that depended on the timing, return value, or side effects of the moved operation — guards, response fields, in-memory state updates, and downstream scheduling that assumed co-located execution. Verify the new execution point preserves these contracts or that dependents are updated. Check for dead code left behind at the old location
-**Read-after-write consistency**
-- If the PR writes to a data store and then immediately queries that store (especially scans, aggregations, or replica reads), check whether the store's consistency model guarantees visibility of the write. If not, flag the read as potentially stale and suggest computing from in-memory state, using consistent-read options, or adding a delay/caveat
-**Security-sensitive configuration parsing**
-- If the PR reads environment variables or config values that affect security behavior (proxy trust depth, rate limit thresholds, CORS origins, token expiry), verify the parsing enforces the expected type and range — e.g., integer-only via `parseInt` with `Number.isInteger` check, non-negative bounds, and a logged fallback to a safe default on invalid input. `Number()` on arbitrary strings accepts floats, negatives, and empty-string-as-zero, all of which can silently weaken security controls
-**Multi-source data aggregation**
-- If the PR aggregates items from multiple sources into a single collection (merging accounts, combining API results, flattening caches), verify each item retains its source identifier through the aggregation — downstream operations that need to route back to the correct source (updates, deletes, detail views) will silently break or operate on the wrong source if the origin is lost
-**Field-set enumeration consistency**
-- If the PR adds an operation that targets a set of entity fields (enrichment, validation, migration, sync), trace every other location that independently enumerates those fields — UI predicates, scan/query filters, API documentation, response shapes, and test assertions. Each must cover the same field set; a missed field causes silent skips or false UI state. Prefer deriving enumerations from a single source of truth (constant array, schema keys) over maintaining independent lists
-**Abstraction layer fidelity**
-- If the PR calls a third-party API through an internal wrapper/abstraction layer, trace whether the wrapper requests and forwards all fields the handler depends on — third-party APIs often have optional response attributes that require explicit opt-in (e.g., cancellation reasons, extended metadata). Code branching on fields the wrapper doesn't forward will silently receive `undefined` and take the wrong path. Also verify that test mocks match what the real wrapper returns, not what the underlying API could theoretically return
-- If the PR passes multiple parameters through a wrapper/abstraction layer to an underlying API, check whether any parameter combinations are mutually exclusive in the underlying API (e.g., projection expressions + count-only select modes) — the wrapper should strip conflicting parameters rather than forwarding all unconditionally, which causes validation errors at the underlying layer
-- If the PR calls framework or library functions with discriminated input formats (e.g., content paths vs script paths, different loader functions per format), trace each call site to verify the function variant used actually handles the input format being passed — especially fallback/default branches in multi-format dispatchers, where the fallback commonly uses the wrong function. Also verify positional argument order matches the called function's parameter order (not assumed from variable names) and that the object type passed matches what the API expects (e.g., asset object vs class reference, property access vs method call)
-**Parameter consumption tracing**
-- If the PR adds a function with validated input parameters (schema validation, input decorators, type annotations), trace each validated parameter through to where it's actually consumed in the implementation. Parameters that pass validation but are never read create dead API surface — callers believe they're configuring behavior that's silently ignored. Either wire the parameter through or remove it from the public API
-**Summary/aggregation endpoint consistency**
-- If the PR adds a summary or dashboard endpoint that aggregates counts/previews across multiple data sources, trace each category's computation logic against the corresponding detail view it links to — verify they apply the same filters (e.g., orphan exclusion, status filtering), the same ordering guarantees (sort keys that actually exist on the queried index), and that navigation links propagate the aggregated context (e.g., `?status=pending`) so the destination page matches what the summary promised
-**Data model / status lifecycle changes**
-- If the PR changes the set of valid statuses, enum values, or entity lifecycle states, sweep all dependent artifacts: API doc summaries and enum declarations, UI filter/tab options, conditional rendering branches (which actions to show per state), integration guide examples, route names derived from old status names, and test assertions. Each artifact that references the old value set must be updated — partial updates leave stale filters, invalid actions, and misleading documentation
-- If the PR renames a concept (e.g., "flagged" → "rejected"), trace all manifestations beyond user-facing labels: route paths, component/file names, variable names, CSS classes, and test descriptions. Internal identifiers using the old name create confusion even when the UI is correct
-**Type-discriminated entity validation**
-- If the PR modifies entities with a discriminator field (type, kind, category), trace all code paths that change the discriminator value — not just the update handler, but also migration paths, bulk operations, and UI type-switchers. Verify downstream branching logic (execution, rendering, validation) handles all type transitions without falling through to the wrong handler
-**Migration/initialization idempotency**
-- If the PR adds a startup-time migration or one-time initialization, verify it is idempotent — what happens when it runs a second time? Check that the migration condition excludes already-migrated records and that completion is recorded (version stamp, flag) to prevent re-entry on every load
-**Data migration semantic preservation**
-- If the PR includes a data migration, trace each migrated field's behavioral meaning before and after. Focus on: migration-time validation matching runtime validation (don't persist values that will fail at execution), concurrency protection (migrations triggered by read paths race with concurrent requests), and unsupported source values being flagged rather than silently defaulted
-**Formatting & structural consistency**
-- If the PR adds content to an existing file (list items, sections, config entries), verify the new content matches the file's existing indentation, bullet style, heading levels, and structure — rendering inconsistencies are the most common Copilot review finding
-**Dependent operation ordering**
-- If a handler orchestrates multiple operations (primary write + side effects like rewards, uploads, notifications), trace the dependency graph — verify side effects only execute after the primary operation confirms success. Watch for `Promise.all` grouping operations that should be sequential because one depends on the other's outcome, and for resource allocation (file uploads, external API calls) happening before a gate operation (lock acquisition, uniqueness check, validation) that may reject the request
-- If the PR handles file uploads or binary data, verify the server validates content independently of client-supplied metadata — check MIME type via magic bytes, size via buffer length, and content validity via actual parsing rather than trusting request headers or middleware-provided fields
-**Bulk vs single-item operation parity**
-- If the PR modifies a single-item CRUD operation (create, update, delete) to handle new fields or apply new logic, trace the corresponding bulk/batch operation for the same entity — it often has its own independent implementation that won't pick up the change. Verify both paths handle the same fields, apply the same validation, and preserve the same secondary data
-**Bulk operation selection lifecycle**
-- If the PR adds operations that act on a user-selected subset of items (bulk actions, batch operations), trace the complete lifecycle of the selection state: when is it cleared (data refresh, item deletion), when is it not cleared but should be (filter/sort/page changes), and whether the operation re-validates the selection at execution time (especially after confirmation dialogs where the underlying data may change between display and confirmation)
-**Config value provenance for auto-upgrade**
-- If the PR adds auto-upgrade logic that replaces config values with newer defaults (prompt versions, schema migrations, template updates), verify the code can distinguish "user customized this value" from "this is the previous default." Without provenance tracking (version stamps, customization flags, or comparison against known previous defaults), auto-upgrade will overwrite intentional user customizations or skip legitimate upgrades
-**Query key / stored key precision alignment**
-- If the PR adds queries that construct lookup keys with a different precision, encoding, or format than what the write path persists, the query will silently return zero matches. Trace the key construction in both write and read paths and verify they produce compatible values
-</deep_checks>
-<verify_findings>
+1. **Merge** all findings into a single list, tagged by source agent
+2. **Deduplicate**: if two agents flagged the same `file:line` with overlapping descriptions, keep the most detailed version and note both agents found it
+3. **PR coherence**: verify commits deliver what they claim — flag discrepancies as IMPROVEMENT findings
+4. **CLAUDE.md filter**: remove findings that conflict with explicit project conventions
 ## Verify Findings
-For each issue found, ground it in evidence before classifying:
+For each finding, ground it in evidence before classifying:
 1. **Quote the specific code line(s)** that demonstrate the issue
 2. **Explain why it's a problem** in one sentence given the surrounding context
 3. If the fix involves async/state changes, **trace the execution path** to confirm the issue is real
@@ -269,16 +91,12 @@ For each issue found, ground it in evidence before classifying:
 After verifying all findings, run the project's build and test commands to confirm no false positives.
-</verify_findings>
-<fix_and_report>
-## Fix Issues Found
+## Fix Issues
-For each verified issue:
+For each verified finding:
 1. Classify severity: **CRITICAL** (runtime crash, data leak, security) vs **IMPROVEMENT** (consistency, robustness, conventions)
 2. Fix all CRITICAL issues immediately
-3. For IMPROVEMENT issues, fix them too — the goal is to eliminate Copilot review round-trips
+3. For IMPROVEMENT issues, fix them too — the goal is to eliminate review round-trips
 4. After fixes, run the project's test suite and build command (per project conventions already in context)
 5. Verify the test suite covers the changed code paths — passing unrelated tests is not validation
 6. Commit fixes: `refactor: address code review findings`
@@ -290,13 +108,15 @@ Print a summary table of what was reviewed and found:
 ```
 ## Review Summary
-| Category | Files Checked | Issues Found | Fixed |
-|----------|--------------|-------------|-------|
-| Hygiene  | N            | N           | N     |
-| ...      | ...          | ...         | ...   |
+| Agent | Files Checked | Issues Found | Fixed |
+|-------|--------------|-------------|-------|
+| Surface Scan | N | N | N |
+| Security Audit | N | N | N |
+| Cross-File Tracing | N | N | N |
+| **Total** | **N** | **N** | **N** |
 ### Issues Fixed
-- file:line — description of fix
+- file:line — description of fix (agent: Surface/Security/Cross-File)
 ### Accepted As-Is (with rationale)
 - file:line — description and why it's acceptable
@@ -304,10 +124,6 @@ Print a summary table of what was reviewed and found:
 If no issues were found, confirm the code is clean and ready for PR.
-</fix_and_report>
-<pr_comment_policy>
 ## PR Comment Policy
 After the review and any fixes, determine whether to post review comments on the PR/MR:
@@ -318,5 +134,3 @@ After the review and any fixes, determine whether to post review comments on the
 4. **If the PR was opened by someone else**, post a review comment on the PR summarizing the findings using `gh pr review {number} --comment --body "..."`. Include the issues found, fixes applied, and any remaining items that need the author's attention.
 This avoids noisy self-comments on your own PRs while still providing feedback to other contributors.
-</pr_comment_policy>

package/commands/do/rpr.md CHANGED Viewed

@@ -15,7 +15,16 @@ Address the latest code review feedback on the current branch's pull request usi
 1. **Get the current PR and determine repo ownership**: Use `gh pr view --json number,url,reviewDecision,reviews,headRefName,baseRefName` to find the PR for this branch. Parse owner/name from `gh repo view --json owner,name`. Also check the PR's base repository owner — if the PR targets an upstream repo you don't own (i.e., a fork-to-upstream PR), note this as `is_fork_pr=true`. You can detect this by comparing the PR URL's owner against your authenticated user (`gh api user --jq .login`).
-2. **Request Copilot code review** (only if `is_fork_pr=false`): Follow the "Requesting GitHub Copilot Code Review" section below to request a review, then poll until the review is complete before proceeding. **Skip this step entirely for fork-to-upstream PRs** — you don't have permission to request reviewers on repos you don't own.
+2. **Check for existing code review** (only if `is_fork_pr=false`): Before requesting a new review, check if there's already a completed Copilot review or a pending Copilot review in progress. Query the PR's review requests and recent reviews:
+   ```bash
+   gh api graphql -f query='{ repository(owner: "OWNER", name: "REPO") { pullRequest(number: PR_NUM) { reviewRequests(first: 10) { nodes { requestedReviewer { ... on Bot { login } } } } reviews(last: 20) { nodes { state body author { login } submittedAt } } } } }'
+   ```
+   - **If at least one completed Copilot review exists** (a review in `reviews.nodes` authored by `copilot-pull-request-reviewer`): Skip requesting a new review — proceed directly to step 3 to fetch and address the existing feedback threads.
+   - **If a Copilot review is currently pending** (Copilot appears in `reviewRequests.nodes[].requestedReviewer` as `copilot-pull-request-reviewer`): Treat the review as in progress. Poll for completion using the "Poll for review completion" section below, and consider it complete once a new Copilot review appears in `reviews.nodes` with a `submittedAt` timestamp later than the latest Copilot review timestamp you observed before starting to poll. Then proceed to step 3.
+   - **If no Copilot review exists and no Copilot review is currently requested**: Request a new Copilot review per the "Requesting GitHub Copilot Code Review" section below, poll until complete, then proceed.
+   - **Skip this step entirely for fork-to-upstream PRs** — you don't have permission to request reviewers on repos you don't own.
+   **While waiting for review**: During any review polling wait, check CI status in parallel (see "CI Health Check During Review Polling" section below). If CI is failing, fix the failures, commit, and push before the review completes. This avoids wasting a review cycle on code that won't pass CI anyway.
 3. **Fetch review comments**: Use `gh api graphql` with stdin JSON to get all unresolved review threads. **CRITICAL: Do NOT use `$variables` in GraphQL queries — shell expansion consumes `$` signs.** Always inline values and pipe JSON via stdin:
    ```bash
@@ -47,16 +56,18 @@ Address the latest code review feedback on the current branch's pull request usi
    - Stage all changed files and commit with a descriptive message summarizing what was addressed. Do not include co-author info.
    - Push to the branch.
-8. **Resolve conversations**: For each addressed thread, resolve it via GraphQL mutation using stdin JSON. Track resolution count against the total from step 3. **Never use `$variables` in the query — inline the thread ID directly**:
+7. **Resolve conversations**: For each addressed thread, resolve it via GraphQL mutation using stdin JSON. Track resolution count against the total from step 3. **Never use `$variables` in the query — inline the thread ID directly**:
    ```bash
    echo '{"query":"mutation { resolveReviewThread(input: {threadId: \"THREAD_ID_HERE\"}) { thread { id isResolved } } }"}' | gh api graphql --input -
    ```
-9. **Request another Copilot review** (only if `is_fork_pr=false`): After pushing fixes, request a fresh Copilot code review and repeat from step 3 until the review passes clean. **Skip for fork-to-upstream PRs.**
+8. **Request another Copilot review** (only if `is_fork_pr=false`): After pushing fixes, request a fresh Copilot code review and repeat from step 3 until the review passes clean. **Skip for fork-to-upstream PRs.**
+   **While waiting for review**: Check CI status in parallel during polling (see "CI Health Check During Review Polling" section below). Fix any CI failures before the review completes.
    **Repeated-comment dedup**: When fetching threads after a new Copilot review round, compare each new unresolved thread's comment body and file/line against threads from the previous round that were intentionally left unresolved (replied to as non-issues or disagreements). If all new unresolved threads are repeats of previously-dismissed feedback, treat the review as clean (no new actionable comments) and exit the loop.
-10. **Report summary**: Print a table of all threads addressed with file, line, and a brief description of the fix. Include a final count line: "Resolved X/Y threads." If any threads remain unresolved, list them with reasons (unclear feedback, disagreement, requires user input).
+9. **Report summary**: Print a table of all threads addressed with file, line, and a brief description of the fix. Include a final count line: "Resolved X/Y threads." If any threads remain unresolved, list them with reasons (unclear feedback, disagreement, requires user input).
 !`cat ~/.claude/lib/graphql-escaping.md`
@@ -86,6 +97,27 @@ The review is complete when a new `copilot-pull-request-reviewer` review node ap
 **Error detection**: After a review appears, check its `body` for error text such as "Copilot encountered an error" or "unable to review this pull request". If found, this is NOT a successful review — log a warning, re-request the review (same API call above), and resume polling. Allow up to 3 error retries. After 3 failures: **Default mode**: auto-skip and continue. **Interactive mode (`--interactive`)**: ask the user whether to continue or skip.
+## CI Health Check During Review Polling
+While polling for a Copilot review to complete, use the wait time productively by checking CI status:
+1. **Check for running/completed checks** on the current commit:
+   ```bash
+   gh pr checks --json name,state,conclusion,detailsUrl
+   ```
+2. **If any check has failed**: Extract the run ID from the failed check's `detailsUrl` and fetch logs:
+   ```bash
+   RUN_ID="$(gh pr checks --json name,conclusion,detailsUrl \
+     --jq '.[] | select(.conclusion=="FAILURE") | .detailsUrl | capture("/runs/(?<id>[0-9]+)") | .id' \
+     | head -n1)"
+   gh run view "$RUN_ID" --log-failed
+   ```
+   Fix the failure, run tests locally to confirm, commit, and push. The Copilot review request will automatically apply to the new commit on most repos — if not, re-request after the push.
+3. **If checks are still pending**: No action needed — continue polling for the review. Check CI status again on subsequent poll iterations.
+4. **If all checks pass**: No action needed — continue polling for the review.
+This ensures CI failures are caught and fixed early rather than discovered after a full review cycle.
 ## Notes
 - Only resolve threads where you've actually addressed the feedback

package/install.sh CHANGED Viewed

@@ -56,6 +56,7 @@ OLD_COMMANDS=(cam good makegoals makegood optimize-md)
 LIBS=(
   code-review-checklist copilot-review-loop graphql-escaping
   remediation-agent-template swift-review-checklist
+  review-surface-scan review-security-audit review-cross-file-tracing
 )
 HOOKS=(slashdo-check-update slashdo-statusline)

package/lib/code-review-checklist.md CHANGED Viewed

@@ -7,7 +7,7 @@
 ## Tier 1 — Always Check (Runtime Crashes, Security, Hygiene)
    **Hygiene**
-   - Leftover debug code (`console.log`, `debugger`, TODO/FIXME/HACK), hardcoded secrets/credentials, and uncommittable files (.env, node_modules, build artifacts)
+   - Leftover debug code (`console.log`, `debugger`, TODO/FIXME/HACK), hardcoded secrets/credentials, and uncommittable files (.env, node_modules, build artifacts, runtime-generated data/reports)
    - Overly broad changes that should be split into separate PRs
    **Imports & references**
@@ -27,7 +27,7 @@
    - Route params passed to services without format validation; path containment checks using string prefix without path separator boundary (use `path.relative()`)
    - Parameterized/wildcard routes registered before specific named routes — the generic route captures requests meant for the specific endpoint (e.g., `/:id` registered before `/drafts` matches `/drafts` as `id="drafts"`). Verify route registration order or use path prefixes to disambiguate
    - Stored or external URLs rendered as clickable links (`href`, `src`, `window.open`) without protocol validation — `javascript:`, `data:`, and `vbscript:` URLs execute in the user's browser. Allowlist `http:`/`https:` (and `mailto:` if needed) before rendering; for all other schemes, render as plain text or strip the value
-   - Server-side HTTP requests using user-configurable or externally-stored URLs without protocol allowlisting (http/https only) and host/network restrictions — the server becomes an SSRF proxy for reaching internal network services, cloud metadata endpoints, or localhost-bound APIs. Validate scheme and restrict to expected hosts or external-only ranges before any server-side fetch
+   - Server-side HTTP requests using user-configurable or externally-stored URLs without protocol allowlisting (http/https only) and host/network restrictions — the server becomes an SSRF proxy for reaching internal network services, cloud metadata endpoints, or localhost-bound APIs. Validate scheme and restrict to expected hosts or external-only ranges before any server-side fetch. Also check redirect handling: auto-following redirects (`redirect: 'follow'`) bypasses initial host validation when a public URL redirects to an internal IP. Disable auto-follow and revalidate each hop, or resolve DNS and block private/loopback/link-local ranges before connecting — public hostnames can resolve to internal IPs via DNS rebinding
    - Error/fallback responses that hardcode security headers instead of using centralized policy — error paths bypass security tightening
    **Trust boundaries & data exposure**
@@ -65,7 +65,7 @@
    **Resource management** _[applies when: code uses event listeners, timers, subscriptions, or useEffect]_
    - Event listeners, socket handlers, subscriptions, timers, and useEffect side effects are cleaned up on unmount/teardown
-   - Deletion/destroy and state-reset functions that clean up or reset the primary resource but leave orphaned or inconsistent secondary resources (data directories, git branches, child records, temporary files, per-user flag/vote items) — trace all resources created during the entity's lifecycle and verify each is removed on delete. For state transitions that reset aggregate values (counters, scores, flags), also clear or version the individual records that contributed to those aggregates — otherwise the aggregate and its sources disagree, and duplicate-prevention checks block legitimate re-entry. Also check cleanup operations that perform implicit state mutations (auto-merge, auto-commit, cascade writes) as part of teardown — these can introduce unreviewed changes or silently modify shared state. Verify cleanup fails safely when a prerequisite step (e.g., saving dirty state) fails rather than proceeding with data loss
+   - Deletion/destroy and state-reset functions that clean up or reset the primary resource but leave orphaned or inconsistent secondary resources (data directories, git branches, child records, temporary files, per-user flag/vote items) — trace all resources created during the entity's lifecycle and verify each is removed on delete. Also check the inverse: preservation guards that prevent cleanup under overly broad conditions (e.g., preserving a branch "in case there were commits" when the ahead count is zero) — over-conservative guards leak resources over time. For state transitions that reset aggregate values (counters, scores, flags), also clear or version the individual records that contributed to those aggregates — otherwise the aggregate and its sources disagree, and duplicate-prevention checks block legitimate re-entry. Also check cleanup operations that perform implicit state mutations (auto-merge, auto-commit, cascade writes) as part of teardown — these can introduce unreviewed changes or silently modify shared state. Verify cleanup fails safely when a prerequisite step (e.g., saving dirty state) fails rather than proceeding with data loss
    - Initialization functions (schedulers, pollers, listeners) that don't guard against multiple calls — creates duplicate instances. Check for existing instances before reinitializing
    - Self-rescheduling callbacks (one-shot timers, deferred job handlers) where the next cycle is registered inside the callback body — an unhandled error before the re-registration call permanently stops the schedule. Wrap the callback body in try/finally with re-registration in the finally block, or register the next cycle before executing the current one
@@ -92,6 +92,7 @@
    - Inconsistent "missing value" semantics across layers — one layer treats `null`/`undefined` as missing while another also treats empty strings or whitespace-only strings as missing. Query filters, update expressions, and UI predicates that disagree on what constitutes "missing" cause records to be skipped by one path but processed by another. Define a single `isMissing` predicate and use it consistently, or normalize empty/whitespace values to `null` at write time. Also applies to comparison/detection logic: coercing an absent field to a sentinel (`?? 0`, default parameters) makes the logic treat "unsupported" as a real value — guard with an explicit presence check before comparing. Watch for validation/sanitization functions that return `null` for invalid input when `null` also means "clear/delete" downstream — malformed input silently destroys existing data. Distinguish "invalid, reject the request" from "explicitly clear this field". Also applies to normalization (trailing slashes, case, whitespace): if one path normalizes a value before comparison but the write path stores it un-normalized, comparisons against the stored value produce incorrect results — normalize at write time or normalize both sides consistently
    - Validation functions that delegate to runtime-behavior computations (next schedule occurrence, URL reachability, resource resolution) — conflating "no result within search window" or "temporarily unavailable" with "invalid input" rejects valid configurations. Validate syntax and structure independently of runtime feasibility
    - Numeric values from strings used without `NaN`/type guards — `NaN` comparisons silently pass bounds checks. Clamp query params to safe lower bounds
+   - Hand-rolled regex validators for well-known formats (IP addresses, email, URLs, dates, semver) that accept invalid inputs or reject valid ones — use platform/standard library parsers instead (e.g., `net.isIP()`, `URL` constructor, `semver.valid()`) which handle edge cases the regex misses
    - UI elements hidden from navigation but still accessible via direct URL — enforce restrictions at the route level
    - Summary counters/accumulators that miss edge cases (removals, branch coverage, underflow on decrements — guard against going negative with lower-bound conditions); counters incremented before confirming the operation actually changed state — rejected, skipped, or no-op iterations inflate success counts. Batch operations that report overall success while silently logging per-item failures — callers see success but partial work was done; collect and return per-item failures in the response. Silent operations in verbose sequences where all branches should print status
@@ -148,6 +149,7 @@
    - Subprocess output buffered in memory without size limits — a noisy or stuck child process can cause unbounded memory growth. Cap in-memory buffers and truncate or stream to disk for long-running commands
    - Platform-specific assumptions — hardcoded shell interpreters, `path.join()` backslashes breaking ESM imports. Use `pathToFileURL()` for dynamic imports
    - Naive whitespace splitting of command strings (`str.split(/\s+/)`) breaks quoted arguments — use a proper argv parser or explicitly disallow quoted/multi-word arguments when validating shell commands
+   - Subprocess output parsed from a single stream (stdout or stderr) to detect conditions (conflicts, errors, specific states) — the information may appear in the other stream or vary by tool version/config. Check both stdout and stderr, and verify the exit code, to reliably detect the condition
    - Shell expansions (brace `{a,b}`, glob `*`, tilde `~`, variable `$VAR`) suppressed by quoting context — single quotes prevent all expansion, so patterns like `--include='*.{ts,js}'` pass the literal braces to the command instead of expanding. Use multiple flags, unquoted brace expansion (bash-only), or other command-specific syntax when expansion is required
    **Search & navigation** _[applies when: code implements search results or deep-linking]_
@@ -160,6 +162,7 @@
    **Accessibility** _[applies when: code modifies UI components or interactive elements]_
    - Interactive elements missing accessible names, roles, or ARIA states — including disabled interactions without `aria-disabled`
    - Custom toggle/switch UI built from non-semantic elements instead of native inputs
+   - Overlay or absolutely-positioned layers with broad `pointer-events-auto` that intercept clicks/hover intended for elements beneath — use `pointer-events-none` on decorative overlays and enable events only on small interactive affordances. Conversely, `pointer-events-none` on a parent kills hover/click handlers on children — verify both directions when layering positioned elements
 ## Tier 4 — Always Check (Quality, Conventions, AI-Generated Code)

package/lib/review-cross-file-tracing.md ADDED Viewed

@@ -0,0 +1,227 @@
+# Cross-File Tracing Review Agent
+## Mandate
+You review code by tracing data and control flow ACROSS files. You catch issues invisible in single-file review: mismatched contracts, broken call chains, stale state propagation, lifecycle gaps, and architectural violations.
+## Reading Strategy
+1. Read ALL changed files to understand each module's responsibility
+2. Trace call chains: for each new/modified function, identify callers and callees across files. Read unchanged files when needed to verify contracts
+3. Map data from entry points (route handlers, event listeners) through transforms, storage, and output
+4. For each module, state its single responsibility — if you can't, flag it
+## Principles to Evaluate
+**DRY** — Logic duplicated across changed files or between changed and existing code. Similar function signatures, copy-pasted patterns with minor variations. Two functions doing nearly the same thing should share implementation.
+**SOLID**
+- Single Responsibility: each module/function has one reason to change. Route handlers with business rules beyond delegation violate this
+- Open/Closed: new behavior addable without modifying working code
+- Interface Segregation: callers don't depend on methods they don't use
+- Dependency Inversion: high-level modules don't import low-level details directly
+## Checklist
+### Async & State Consistency
+- Optimistic state changes before async completion — if operation fails, UI stuck. `.catch(() => null)` followed by unconditional success code — the catch silences but success path still runs
+- Multiple coupled state variables updated independently — changes to one must update all related fields. Debounced/cancelable ops must reset loading on every exit. Selection sets must be pruned when items are removed, invalidated on reload/filter/sort/page. Ops from confirmation dialogs must re-validate at execution time. `useState(prop)` only captures initial value — sync with effect when prop updates async
+- Error notification at multiple layers — verify exactly one layer owns user-facing messages. Periodic polling: throttle notifications to state transitions (success→error), don't make UI disappear on error
+- Optimistic updates with full-collection rollback snapshots — second in-flight action clobbered. Use per-item rollback and functional updaters. Guard against duplicate appends
+- State updates guarded by truthiness (`if (arr?.length)`) preventing clearing when source returns empty — distinguish "no response" from "empty response"
+- Periodic operations with skip conditions not advancing timing state (lastRun, nextFireTime) — null/stale lastRun causes re-trigger loops. Check initial baseline: epoch makes items immediately due, "now" may prevent them from ever becoming due
+- Cached values keyed without all discriminators (URL, tenant, config version) — context changes serve stale data. Health endpoints returning cached results mask real-time failures
+- Mutation functions returning pre-mutation state — dependent scheduling/evaluation uses stale values
+- Fire-and-forget writes: in-memory not updated (stale response) or updated unconditionally (claims unpersisted state). Side effects (rewards, notifications, uploads) before confirmed primary write. Monotonic counters advancing before write risks running ahead on failure
+- Error/early-exit paths returning default status metadata (hasMore, pagination) or emitting events unconditionally — false success. Paired lifecycle events: every "started" exit path must emit "completed"/"failed" — watch short-circuit branches
+- Missing `await` on async ops in error/cleanup paths that must complete before function returns
+- `Promise.all` without error handling — partial load. `Promise.allSettled` without logging rejection reasons before mapping fallbacks
+- Sequential processing where one throw aborts remaining — wrap per-item in try/catch
+- Side effects during React render
+### Error Handling
+- Service functions throwing generic Error for client conditions — 500 instead of 400/404. Consistent access-control responses across endpoints. Concurrency failures → 409 not 500
+- Swallowed errors; generic messages replacing detail; external service wrappers returning null for all failures (collapsing config errors, auth, rate limits, 5xx into "not found")
+- Caller/callee disagreement: `{ success: false }` vs `.catch()`; gate returning `{ shouldRun: false }` on error vs fail-open runtime; argument shape mismatches (wrapped object vs bare array, wrong positional order); async EventEmitter handlers creating unhandled rejections
+- Destructive ops in retry/cleanup paths without own error handling
+- External service calls without configurable timeouts
+- Missing fallback for unavailable downstream services
+### Resource Management
+- Event listeners, sockets, subscriptions, timers, useEffect cleaned up on teardown
+- Delete/destroy leaving orphaned secondary resources (data dirs, branches, child records, temp files). Over-broad preservation guards preventing cleanup when nothing worth preserving (branch preserved with 0 commits ahead). Cleanup with implicit mutations (auto-merge, auto-commit) — abort on prerequisite failure
+- Initialization functions without guard against multiple calls — creates duplicates
+- Self-rescheduling callbacks where error before re-registration permanently stops schedule — use try/finally
+### Validation & Consistency
+- Breaking changes to public API without version bump or deprecation path
+- Backward-incompatible changes (renamed config keys, file formats, schemas, event payloads, routes, persisted data) without migration or fallback. Route renames need redirects
+- One-time migrations without completion guard — re-execute every startup
+- Data migrations silently changing runtime behavior — preserve execution semantics. Unsupported source values must be flagged, not defaulted
+- Update endpoints with field allowlists not covering new model fields
+- New endpoints not matching validation patterns of existing similar ones. API doc schemas must be structurally complete
+- Summary/aggregation endpoints using different filters/sources than detail views they link to
+- Discovery endpoints must validate against consumer's actual supported set. Identifier transformations between producer and consumer must preserve expected format
+- Validation functions introduced for a field: trace ALL write paths. New branches must apply same validation as siblings
+- Stored config merged with shallow spread — nested objects lose new default keys on upgrade. Use deep merge
+- Schema fields accepting values downstream can't handle. Validated params never consumed (dead API surface). `.partial()` on nested schemas: verify nested objects also partial. `.partial()` with `.default()` silently overwrites persisted values on update
+- Multi-part UI gated on different prop subsets — derive single enablement boolean
+- Entity creation without case-insensitive uniqueness
+- Code reading response properties that don't exist — verify field names, nesting, actual response shape. Wrappers that don't request/forward needed fields. Call sites using wrong function variant for input format or wrong positional argument order
+- Data model fields with different names per write path. Entity identity keys inconsistent across lookup paths
+- Entity type changes without revalidating type-specific invariants and clearing old-type fields
+- Config flag invariants (A implies B) not enforced across all layers: UI toggles, API validation, server defaults, persistence
+- Operations scoped to entity subtype without verifying discriminator — wrong type corrupts state
+- Inconsistent "missing value" semantics (null vs empty string vs whitespace) across layers. Validation returning null when null means "clear" downstream. Normalization applied inconsistently between write and comparison paths
+- Validation delegating to runtime computation — conflating "no result in window" with "invalid input"
+- Numeric strings without NaN/type guards. Hand-rolled regex for well-known formats — use platform parsers
+- UI hidden from nav but accessible via direct URL
+- Summary counters missing edge cases; counters incremented before confirming state change; batch ops reporting success while logging per-item failures
+### Concurrency & Data Integrity
+- Shared mutable state without locking; read-modify-write interleaving — use conditional writes/optimistic concurrency
+- Read-only paths triggering lazy init with write side effects — unprotected concurrent writes
+- Multi-table writes without transaction — partial state on error
+- Writes replacing entire composite attribute populated by multiple sources — discards other sources' data
+- Early returns for "no primary fields" skipping secondary operations
+- Shared flags/locks with exit paths that skip cleanup — permanent lock
+### Cross-File Deep Checks
+**Cross-file consistency**
+- New functions following existing patterns must match ALL aspects (validation, error codes, response shape, cleanup). Partial copying is #1 review feedback source
+- New API client functions must use same encoding/escaping as existing ones
+- New endpoints must be wired in all runtime adapters (serverless, framework routes, gateway)
+- New external service calls must use established mock/test infrastructure
+- New UI consumers against existing APIs: verify every field name, nesting, identifier, response envelope matches actual producer response
+- Discovery/catalog endpoints: trace enumerated set against consumer's supported inputs
+**Cleanup/teardown side effects**
+- Cleanup functions with implicit mutations (auto-merge, auto-commit, cascade writes) — verify abort on prerequisite failure
+**Specification conformance**
+- Parsers for well-known formats (cron, dates, URLs, semver): verify boundary handling matches spec — field ranges, normalization, step/range semantics
+**Temporal context**
+- Timezone-aware logic alongside non-timezone-aware in same flow — mixed contexts trigger on wrong day/hour
+**Boolean/type fidelity through serialization**
+- Boolean flags persisted to text (markdown metadata, query strings, flat files): trace write → storage → read → consumption. `"false"` is truthy — verify strict equality at all consumption sites
+**Cross-layer invariant enforcement**
+- Config flag invariants (A implies B): trace through UI toggles, form submission, route validation, server defaults, persistence round-trip
+**Error path completeness**
+- Each error reaches user with helpful message and correct HTTP status. Multi-step operations track per-item failures separately from overall success
+**Concurrency under user interaction**
+- Optimistic updates with async: second action while first in-flight — rollback/success handlers can clobber concurrent state or close over stale snapshots
+**State ownership across boundaries**
+- Child component local state from parent data: trace ownership, propagation back to parent, unmount/remount stale cache
+**Bootstrap/initialization ordering**
+- Resilience code (installers, auto-repair, migrations) importing dependencies before installing them — restructure so install precedes resolution
+**Lock/flag exit-path completeness**
+- Shared flags/locks: trace every exit path (early returns, catches, platform guards, normal completion) for clearing
+**Operation-marker ordering**
+- Completion markers, success flags written AFTER the operation, not before. Marker-dependent startup validates contents, not just presence
+**Real-time event vs response timing**
+- Push events (WS, SSE) before HTTP response that gives clients context to interpret them (IDs, version numbers)
+**Paired lifecycle event completeness**
+- "Started" event → every exit path (success, error, early return, no-op branches for specific entity types) emits "completed"/"failed"
+**Entity identity key consistency**
+- Computed lookup keys (e.g., `e.id || e.externalId`): trace all paths using same computation — inconsistent keys cause mismatches
+**Intent vs implementation (cross-file)**
+- Cross-references between files (identifiers, param names, format conventions, versions, thresholds) that disagree — trace all references when one changes. Internal identifiers renamed when concept renamed
+- Modified values referenced in other files: trace all cross-references
+- Responsibility relocated from one module to another: trace all dependents at old location (guards, return values, state updates). Remove dead code at old location
+**Transactional write integrity**
+- Multi-item writes: condition expressions preventing stale-read races (TOCTOU). Update ops that silently create records for invalid IDs (DynamoDB UpdateItem, MongoDB upsert) — add existence conditions. Caught conditional failures → 409 not 500
+**Batch/paginated consumption**
+- Batch API callers handle partial results, continuation tokens, rate limits with backoff. Resource names account for environment prefixes
+**Data model vs access pattern**
+- Claims of ordering ("recent", "top") verified against key/index design — random UUIDs require full scans
+**Deletion/lifecycle cleanup**
+- Delete functions: trace all lifecycle resources. State resets: clear individual contributing records — stale records block re-entry
+**Update schema depth**
+- Update schemas from create (`.partial()`): nested objects must also be partial
+**Mutation return value freshness**
+- Returned entity reflects post-mutation state. Force/trigger operations reset dependent scheduling state
+**Read-after-write consistency**
+- Writes then immediate scans/aggregations: check store's consistency model. Compute from in-memory state or use consistent-read options
+**Multi-source data aggregation**
+- Items from multiple sources: retain source identifier through aggregation for downstream routing
+**Field-set enumeration consistency**
+- Operations targeting field sets: trace every other enumeration (UI predicates, filters, docs, tests) — prefer single source of truth
+**Abstraction layer fidelity**
+- Wrappers requesting all fields handlers depend on — third-party APIs often require opt-in. Mutually exclusive params: strip conflicts. Framework function variants match input format. Positional args match called function's parameter order
+**Parameter consumption tracing**
+- Validated params: trace to actual consumption. Unread params create dead API surface — wire through or remove
+**Summary/aggregation consistency**
+- Dashboard counts vs detail views: same filters, ordering. Navigation links propagate aggregated context
+**Data model / status lifecycle**
+- Changed statuses/enums: sweep API docs, UI filters, conditional rendering, routes, tests. Renamed concepts: trace all manifestations (routes, components, variables, CSS, tests)
+**Type-discriminated entities**
+- Discriminator changes: trace all code paths (migration, bulk, UI type-switchers) — verify downstream branching handles all transitions
+**Migration idempotency**
+- Startup migrations: verify second run is no-op. Condition excludes already-migrated records
+**Data migration semantics**
+- Migrated fields preserve behavioral meaning. Concurrency protection for read-triggered migrations. Unsupported source values flagged not defaulted
+**Dependent operation ordering**
+- Side effects only after primary operation confirms success. `Promise.all` grouping sequential deps. Resource allocation before gate operations (locks, validation)
+**Bulk vs single-item parity**
+- Single-item CRUD changes: trace corresponding bulk operation — verify same fields, validation, secondary data
+**Bulk selection lifecycle**
+- Selection cleared on data refresh/deletion. Not cleared but should be on filter/sort/page change. Re-validate at execution time after confirmation dialog
+**Config auto-upgrade provenance**
+- Auto-upgrade logic: distinguish user customization from previous default — without provenance, overwrites intentional customizations
+**Query key / stored key alignment**
+- Lookup key precision/encoding/format matching write path — mismatches return zero matches
+**Subprocess condition detection**
+- Subprocess output parsed to detect conditions: check both stdout and stderr plus exit code — location varies by tool version
+**Formatting consistency**
+- New content matches file's existing indentation, bullets, headings, structure
+## Output Format
+For each finding:
+```
+file:line — [CRITICAL|IMPROVEMENT|UNCERTAIN] description
+Cross-file trace: file_a:line → file_b:line (what flows between them)
+Evidence: `quoted code from each file`
+```
+Only report verified findings with cross-file evidence. If the trace is uncertain, mark [UNCERTAIN].

package/lib/review-security-audit.md ADDED Viewed

@@ -0,0 +1,76 @@
+# Security Audit Review Agent
+## Mandate
+You review code with an adversarial mindset. Find trust boundary violations, injection vectors, data exposure, and access control gaps. Focus on security concerns that a general code reviewer would deprioritize.
+## Reading Strategy
+For each changed file:
+1. Read the **ENTIRE file** (not just diff hunks)
+2. Identify all trust boundaries: client → server, user → system, external → internal
+3. Trace user/external input from entry to consumption within the file
+4. For cross-file security flows (input entering one file, consumed unsafely in another), flag as [NEEDS-TRACE] so the cross-file agent can verify
+## Checklist
+### Injection & URL Safety
+- User/system values interpolated into URL paths, shell commands, file paths, subprocess args, or dynamically evaluated code (eval, CDP evaluate, new Function, template strings in page context) without encoding/escaping — use `encodeURIComponent()` for URLs, regex allowlists for execution boundaries, `JSON.stringify()` for eval'd code. Generated identifiers in URL segments must be safe (no `/`, `?`, `#`). Slugs for namespaced resources (branches, directories) need unique suffix to prevent collisions
+- Server-side HTTP requests using user-configurable or externally-stored URLs without protocol allowlisting (http/https) and host restrictions — SSRF to internal services, metadata endpoints, localhost APIs. Check redirect handling: auto-follow (`redirect: 'follow'`) bypasses initial validation when redirecting to internal IPs. Resolve DNS and block private/loopback/link-local ranges — public hostnames can resolve to internal IPs via DNS rebinding
+- Error/fallback responses hardcoding security headers instead of using centralized policy — error paths bypass tightening
+### Trust Boundaries & Data Exposure
+- API responses returning full objects with sensitive fields — destructure and omit across ALL paths (GET, PUT, POST, error, socket). Comments claiming data isn't exposed while the code does expose it
+- Server trusting client-provided computed/derived values (scores, totals, correctness flags, file metadata like MIME type and size) — strip and recompute server-side. Validate uploads via magic bytes and buffer length, not headers
+- New endpoints under restricted paths (admin, internal) missing authorization — compare with sibling endpoints for same access gate (role check, scope validation). New OAuth scopes must be checked comprehensively — a check testing only one scope misses newly added scopes
+- User-controlled objects merged via `Object.assign`/spread without sanitizing keys — `__proto__`, `constructor`, `prototype` enable prototype pollution. Use `Object.create(null)`, whitelist keys, use `hasOwnProperty` not `in`
+- Push events (WebSocket, SSE, pub/sub) emitted without scoping to originating user/session — sensitive payloads leak to all connected clients. Scope via room/channel isolation or server-side correlation ID
+### Input Handling
+- Trimming values where whitespace is significant (API keys, tokens, passwords, base64) — only trim identifiers/names
+- Endpoints accepting unbounded arrays without upper limits — enforce max size. Validate element types/format, deduplicate to prevent inflated counts/repeated side effects. Internal operations fanning out unbounded parallel I/O risk EMFILE — use concurrency limiters
+- Security/sanitization functions handling only one input format when data arrives in multiple formats (JSON, shell env, URL-encoded, headers) — sensitive data leaks through unhandled format
+### Hand-rolled Validators
+- Hand-rolled regex for well-known formats (IP addresses, email, URLs, dates, semver) that accept invalid inputs — use platform parsers (`net.isIP()`, `URL` constructor, `semver.valid()`)
+### Security Deep Checks
+**Push/real-time event scoping**
+- If the PR adds or modifies WebSocket, SSE, or pub/sub events: does the event reach only the originating session, or all clients? Check payloads for sensitive content. Verify correlation IDs are server-generated or validated against session
+**Access scope changes**
+- If the PR widens access (admin → public, internal → external): trace shared dependencies (rate limiters, queues, pools) — were they sized for the previous access level? Process-local limiters don't enforce across instances
+- New endpoints under restricted route groups: verify same authorization gate as siblings — missing gates on admin-mounted endpoints are the most dangerous finding
+**Data flow audit**
+- For secrets/tokens: trace input → storage → retrieval → response. Verify never leaked in ANY response path
+- For user input → URL/command interpolation: verify encoding/escaping at every boundary
+**Sanitization/validation coverage**
+- If a new validation function is introduced for a field: trace ALL write paths (create, update, import, sync, bulk) — partial application means invalid data re-enters through unguarded paths
+- If a "raw" or bypass write path is added: compare normalization against what the read/parse path assumes — data through raw path must be valid on reload
+- If a new dispatch branch is added within a multi-type handler: verify equivalent validation as sibling branches
+**Security-sensitive configuration parsing**
+- Env vars/config affecting security (proxy trust, rate limits, CORS, token expiry): verify type and range enforcement. `Number()` accepts floats, negatives, empty-string-as-zero — use `parseInt` + `Number.isInteger` + range checks with logged safe defaults
+**Guard-before-cache ordering**
+- Pre-flight guards (rate limit, quota, feature flag) before cache lookup: verify the guard doesn't block operations served from cache without touching the guarded resource
+**Server-side fetch lifecycle**
+- Server-side HTTP requests to user/external URLs: trace initial validation → DNS resolution → connection → redirect handling. Host/IP restrictions must be enforced on each redirect hop and after DNS resolution
+## Output Format
+For each finding:
+```
+file:line — [CRITICAL|IMPROVEMENT|NEEDS-TRACE] description
+Evidence: `quoted code line(s)`
+Attack scenario: brief exploitation description
+```
+Security findings default to CRITICAL unless exploitation requires unlikely preconditions.
+Use [NEEDS-TRACE] for cross-file security flows that require the cross-file agent to verify.

package/lib/review-surface-scan.md ADDED Viewed

@@ -0,0 +1,161 @@
+# Surface Scan Review Agent
+## Mandate
+You review code for per-file correctness: bugs, quality issues, and convention violations visible within a single file. You do NOT trace call chains or data flows across files — another agent handles cross-file analysis.
+## Reading Strategy
+For each changed file, read the **ENTIRE file** (not just diff hunks). New code interacting incorrectly with existing code in the same file is a common bug source. Review one file at a time.
+## Principles to Evaluate
+**YAGNI** — Flag abstractions, config options, parameters, or extension points that serve no current use case. Unnecessary wrapper functions, premature generalization (factory producing one type), unused feature flags.
+**Naming** — Functions and variables should communicate intent without reading the implementation. Booleans should read as predicates (`isReady`, `hasAccess`), not ambiguous nouns.
+## Checklist
+### Always Check — Runtime & Hygiene
+**Hygiene**
+- Leftover debug code (`console.log`, `debugger`, TODO/FIXME/HACK), hardcoded secrets/credentials, uncommittable files (.env, node_modules, build artifacts, runtime-generated data/reports)
+- Overly broad changes that should be split into separate PRs
+**Imports & references**
+- Every symbol used is imported (missing → runtime crash); no unused imports. Also check references to framework utilities (CSS class names, directive names, component props) — a non-existent utility class or prop name silently does nothing
+**Runtime correctness**
+- Null/undefined access without guards; off-by-one errors; spread of null (is `{}`), spread of non-objects (string → indexed chars, array → numeric keys) — guard with plain-object check before spreading
+- External/user data (parsed JSON, API responses, file reads) used without structural validation — guard parse failures, missing properties, wrong types, null elements. Optional enrichment failures should not abort the main operation
+- Type coercion: `Number('')` is `0` not empty; `0` is falsy in truthy checks; `NaN` comparisons always false; `"10" < "2"` (lexicographic). Deserialized booleans: `"false"` is truthy — use `=== 'true'`. `isinstance(x, int)` accepts `bool` in Python; `typeof NaN === 'number'` in JS
+- Indexing empty arrays; `every`/`some`/`reduce` on empty collections returning vacuously true; declared-but-never-updated state/variables
+- Parallel arrays coupled by index position — use objects/maps keyed by stable identifier
+- Shared mutable references: module-level defaults mutated across calls (use `structuredClone()`); `useCallback`/`useMemo` referencing later `const` (temporal dead zone); spread followed by unconditional assignment clobbering spread values
+- Functions with >10 branches or >15 cyclomatic complexity — refactor
+**API route basics**
+- Route params passed to services without format validation; path containment using string prefix without separator boundary (use `path.relative()`)
+- Parameterized/wildcard routes registered before specific named routes (`/:id` before `/drafts` matches `/drafts` as `id="drafts"`)
+- Stored or external URLs rendered as clickable links without protocol validation — allowlist `http:`/`https:`
+**Error handling (single-file)**
+- Swallowed errors (empty `.catch(() => {})`); error handlers that exit cleanly (`exit 0`, `return`) without user-visible output; handlers replacing detailed failure info with generic messages
+### Domain-Specific (check only when file type matches)
+**SQL & database** _[SQL, ORM, migration files]_
+- Parameterized query placeholder indices vs parameter array positions
+- DB triggers clobbering explicit values; auto-increment only on INSERT not UPDATE
+- Full-text search with strict parsers (`to_tsquery`) on user input — use `plainto_tsquery`
+- Dead queries (results never read); N+1 patterns; O(n²) on growing data
+- Performance optimizations (early exits, capped limits) that silently reduce correctness
+- `CREATE TABLE IF NOT EXISTS` as sole migration — won't add columns. Use `ALTER TABLE ... ADD COLUMN IF NOT EXISTS`
+- Functions/extensions requiring unchecked database versions
+- Migrations locking tables (ADD COLUMN with default, CREATE INDEX without CONCURRENTLY)
+- Missing rollback/down migration
+**Sync & replication** _[pagination, batch APIs, data sync]_
+- Upsert/`ON CONFLICT UPDATE` updating only subset of exported fields — replicas diverge
+- Pagination: `COUNT(*)` (full scan) instead of `limit + 1`; missing `next` token; hard-capped limits truncating silently; store applying limits before filters requiring loop with continuation tokens
+- Pagination cursors from last scanned vs last returned item — trimmed results cause permanent skips
+- Batch API calls not handling partial results — unprocessed items, continuation tokens dropped
+- Retry loops without backoff or max attempts
+**Lazy initialization** _[dynamic imports, lazy singletons, bootstrap]_
+- Cached state getters returning null before initialization
+- Module-level side effects (file reads, SDK init) without error handling
+- File writes assuming parent directory exists
+- Bootstrap code importing dependencies it's meant to install — restructure so install precedes resolution
+- Re-exporting from heavy modules defeats lazy loading
+**Data format portability** _[JSON, DB, IPC, serialization boundaries]_
+- Values changing format across boundaries (arrays in JSON vs strings in DB). Datetime: mixing UTC string ops with local Date methods shifts across timezones; appending 'Z' without verifying source timezone
+- Reads immediately after writes to eventually consistent stores
+- BIGINT → JS Number precision loss past `MAX_SAFE_INTEGER` — use strings or BigInt
+- Key/index design not supporting required query patterns (random UUIDs claiming "recent" ordering)
+**Shell & portability** _[subprocesses, shell scripts, CLI tools]_
+- `set -e` aborting on non-critical failures; broken pipes on non-critical writes — use `|| true`
+- Interactive prompts in non-interactive contexts (CI, cron) — guard with TTY detection
+- Detached processes with piped stdio — SIGPIPE on parent exit. Use `'ignore'`
+- Subprocess output buffered without size limits — unbounded memory growth
+- Platform-specific: hardcoded shell interpreters; `path.join()` backslashes breaking ESM imports — use `pathToFileURL()`
+- Naive whitespace splitting of command strings breaks quoted arguments — use proper argv parser
+- Subprocess output parsed from single stream (stdout or stderr) to detect conditions — check both streams and exit code
+- Shell expansions suppressed by quoting — single quotes prevent all expansion
+**Search & navigation** _[search, deep-linking]_
+- Search results linking to generic list pages instead of deep-linking to specific record
+- Search code hardcoding one backend when system supports multiple
+**Destructive UI** _[delete, reset, revoke actions]_
+- Destructive actions without confirmation step
+**Accessibility** _[UI components, interactive elements]_
+- Interactive elements missing accessible names, roles, or ARIA states
+- Custom toggles from non-semantic elements instead of native inputs
+- Overlay layers with `pointer-events-auto` intercepting clicks beneath; `pointer-events-none` on parent killing child hover handlers
+### Always Check — Quality & Conventions
+**Intent vs implementation (single-file)**
+- Labels, comments, status messages describing behavior the code doesn't implement
+- Inline code examples or command templates that aren't syntactically valid
+- Sequential numbering with gaps or jumps after edits
+- Template/workflow variables referenced but never assigned — trace each placeholder to a definition
+- Constraints described in preamble not enforced by conditions in procedural steps
+- Duplicate or contradictory items in sequential lists
+- Completion markers or success flags written before the operation they attest to
+- Existence checks (directory exists, file exists) used as proof of correctness — file can exist with invalid contents
+- Lookups checking only one scope when multiple exist (local branches but not remote)
+- Tracking/checkpoint files defaulting to empty on parse failure — fail-open guards
+- Registering references to resources without verifying resource exists
+**AI-generated code quality**
+- New abstractions, wrapper functions, helper files serving only one call site — inline instead
+- Feature flags, config options, extension points with only one possible value
+- Commit messages claiming a fix while the bug remains
+- Placeholder comments (`// TODO`, `// FIXME`) or stubs presented as complete
+- Unnecessary defensive code for scenarios that provably cannot occur
+**Configuration & hardcoding**
+- Hardcoded values when config/env var exists; dead config fields; unused function parameters
+- Duplicated config/constants/helpers across modules — extract to shared module. Watch for behavioral inconsistencies between copies
+- CI pipelines without lockfile pinning or version constraints
+- Production code paths with no structured logging at entry/exit
+- Error logs missing reproduction context (request ID, input params)
+- Async flows without correlation ID propagation
+**Supply chain & dependencies**
+- Lockfile committed and CI uses `--frozen-lockfile`; no drift from manifest
+- `npm audit` / `cargo audit` / `pip-audit` — no unaddressed HIGH/CRITICAL vulns
+- No `postinstall` scripts from untrusted packages executing arbitrary code
+- Overly permissive version ranges (`*`, `>=`) on deps with breaking-change history
+**Test coverage**
+- New logic/schemas/services without tests when similar existing code has tests
+- New error paths untestable because services throw generic errors
+- Tests re-implementing logic under test instead of importing real exports — pass even when real code regresses. Tests asserting by inspecting source code strings rather than calling functions
+- Tests depending on real wall-clock time or external dependencies
+- Missing tests for trust-boundary enforcement
+- Tests exercising code paths the integration layer doesn't expose — pass against mocks but untriggerable in production
+- Test mock state leaking between tests — "clear" resets invocation counts but not configured behavior; use "reset" variants
+**Automated pipeline discipline**
+- Internal code review must run before creating PRs — never go straight from "tests pass" to PR
+- Copilot review must complete before merging
+- Automated agent output must be reviewed against project conventions
+**Style & conventions**
+- Naming and patterns inconsistent with rest of codebase
+- New content not matching existing indentation, bullet style, heading levels
+- Shell instructions with destructive operations not verifying preconditions first
+## Output Format
+For each finding:
+```
+file:line — [CRITICAL|IMPROVEMENT|UNCERTAIN] description
+Evidence: `quoted code line(s)`
+```
+Only report verified findings with quoted code evidence. If you cannot quote specific code for a finding, mark as [UNCERTAIN].

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "slash-do",
-  "version": "2.3.0",
+  "version": "2.5.0",
   "description": "Curated slash commands for AI coding assistants — Claude Code, OpenCode, Gemini CLI, and Codex",
   "author": "Adam Eivy <adam@eivy.com>",
   "license": "MIT",

package/uninstall.sh CHANGED Viewed

@@ -33,6 +33,7 @@ OLD_COMMANDS=(cam good makegoals makegood optimize-md)
 LIBS=(
   code-review-checklist copilot-review-loop graphql-escaping
   remediation-agent-template swift-review-checklist
+  review-surface-scan review-security-audit review-cross-file-tracing
 )
 HOOKS=(slashdo-check-update slashdo-statusline)