slash-do 1.5.1 → 1.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,5 +1,5 @@
  ---
- description: Unified DevSecOps audit, remediation, per-category PRs, CI verification, and Copilot review loop with worktree isolation
+ description: Unified DevSecOps audit, remediation, test enhancement, per-category PRs, CI verification, and Copilot review loop with worktree isolation
  argument-hint: "[--scan-only] [--no-merge] [path filter or focus areas]"
  ---
 
@@ -58,6 +58,9 @@ When compacting during this workflow, always preserve:
  - All PR numbers and URLs created so far
  - `BUILD_CMD`, `TEST_CMD`, `PROJECT_TYPE`, `WORKTREE_DIR` values
  - `VCS_HOST`, `CLI_TOOL`, `DEFAULT_BRANCH`, `CURRENT_BRANCH`
+ - `PHASE_4C_START_SHA` (needed for FILE_OWNER_MAP update in Phase 4c.3)
+ - `VACUOUS_TESTS_FIXED`, `WEAK_TESTS_STRENGTHENED`, `NEW_TEST_CASES`, `NEW_TEST_FILES`
+ - `CREATED_CATEGORY_SLUGS` (list of branch slugs created in Phase 5)
 
 
  ## Phase 0: Discovery & Setup
@@ -173,9 +176,31 @@ Skip step 4 if steps 1-3 reveal the code is correct.
  - **Database migrations**: exclusive-lock ALTER TABLE on large tables, CREATE INDEX without CONCURRENTLY, missing down migrations or untested rollback paths
  - General: framework-specific security issues, language-specific gotchas, domain-specific compliance, environment variable hygiene (missing `.env.example`, required env vars not validated at startup, secrets in config files that should be in env)
 
- 7. **Test Coverage**
- Uses Batch 1 findings as context to prioritize:
- Focus: missing test files for critical modules, untested edge cases, tests that only cover happy paths, mocked dependencies that hide real bugs, areas with high complexity (identified by agents 1-5) but no tests, test files that don't actually assert anything meaningful
+ 7. **Test Quality & Coverage**
+ Uses Batch 1 findings as context to prioritize.
+ Focus areas:
+
+ **Coverage gaps:**
+ - Missing test files for critical modules, untested edge cases, tests that only cover happy paths
+ - Areas with high complexity (identified by agents 1-5) but no tests
+ - Remediation changes from agents 1-6 that lack corresponding test coverage
+
+ **Vacuous tests (tests that don't actually test anything):**
+ - Tests that assert on mocked return values instead of real behavior (testing the mock, not the code)
+ - Tests that only check truthiness (`assert.ok(result)`) when they should verify specific values or shapes
+ - Tests with assertions that can never fail (e.g., asserting a hardcoded value equals itself, asserting `typeof x === 'object'` on a literal `{}`)
+ - Tests that re-implement the logic under test instead of importing the real function — these pass even when real code regresses
+ - `it('should work', ...)` tests with no meaningful assertion or with assertions commented out
+ - Tests that mock the module they're testing (testing mock behavior, not real behavior)
+
+ **Weak test patterns:**
+ - Tests that verify implementation details (internal state, private methods, call counts) instead of observable behavior
+ - Tests where all assertions pass even if the function under test returns `null`/`undefined`/empty — verify by mentally substituting a no-op and checking if the test would still pass
+ - Integration tests that mock so aggressively they become unit tests of glue code
+ - Tests missing negative cases (invalid input, error paths, boundary conditions)
+ - Tests with shared mutable state between cases (`beforeEach` that doesn't reset, module-level variables)
+
+ Report each finding with a severity prefix `**[CRITICAL]**`, `**[HIGH]**`, `**[MEDIUM]**`, or `**[LOW]**` followed immediately by a quality prefix `[VACUOUS]`, `[WEAK]`, or `[MISSING]` (for example, `**[HIGH][VACUOUS]**`) to distinguish quality issues from coverage gaps while keeping the format consistent with other agents. Include the specific test name and file:line for existing test issues.
 
  Wait for ALL agents to complete before proceeding.
 
@@ -220,10 +245,18 @@ For each file touched by multiple categories, document why it was assigned to on
  ### Architecture & SOLID
  ### Bugs, Performance & Error Handling
  ### Stack-Specific
- ### Test Coverage (tracked, not auto-remediated)
+ ### Test Quality & Coverage
  ```
 
- 6. Print a summary table:
+ 6. Print a summary table (short labels → full category → branch slug):
+ - Security → Security & Secrets → `security`
+ - Code Quality → Code Quality & Style → `code-quality`
+ - DRY & YAGNI → DRY & YAGNI → `dry`
+ - Architecture → Architecture & SOLID → `architecture`
+ - Bugs & Perf → Bugs, Performance & Error Handling → `bugs-perf`
+ - Stack-Specific → Stack-Specific → `stack-specific`
+ - Tests → Test Quality & Coverage → `tests`
+
  ```
  | Category | CRITICAL | HIGH | MEDIUM | LOW | Total |
  |-------------------|----------|------|--------|-----|-------|
@@ -233,7 +266,7 @@ For each file touched by multiple categories, document why it was assigned to on
  | Architecture | ... | ... | ... | ... | ... |
  | Bugs & Perf | ... | ... | ... | ... | ... |
  | Stack-Specific | ... | ... | ... | ... | ... |
- | Test Coverage | ... | ... | ... | ... | ... |
+ | Tests | ... | ... | ... | ... | ... |
  | TOTAL | ... | ... | ... | ... | ... |
  ```
 
@@ -241,7 +274,7 @@ For each file touched by multiple categories, document why it was assigned to on
 
  ## Phase 3: Worktree Remediation
 
- Only proceed with CRITICAL, HIGH, and MEDIUM findings. LOW and Test Coverage findings remain tracked in PLAN.md but are not auto-remediated.
+ Only proceed with CRITICAL, HIGH, and MEDIUM findings for code remediation. LOW findings remain tracked in PLAN.md but are not auto-remediated. Test Quality & Coverage findings are handled separately in Phase 4c.
 
  ### 3a: Setup
 
@@ -349,21 +382,126 @@ Before creating PRs, run a deep code review on all remediation changes to catch
  ```
  5. If "Show diff" selected, print the diff and re-ask. If "Abort", stop and print the worktree path.
 
+ ## Phase 4c: Test Enhancement
+
+ After internal code review passes, evaluate and enhance the project's test suite. This phase acts on Agent 7's findings AND ensures all remediation work from Phase 3 has proper test coverage.
+
+ ### 4c.0: Record Start SHA
+
+ Before any test enhancement commits, capture the current HEAD so Phase 4c changes can be diffed later:
+ ```bash
+ cd {WORKTREE_DIR}
+ PHASE_4C_START_SHA="$(git rev-parse HEAD)"
+ ```
+
+ ### 4c.1: Test Audit Triage
+
+ Review Agent 7 findings from Phase 1 and categorize them:
+
+ 1. **`[VACUOUS]` findings** — tests that exist but don't test real behavior. These are the highest priority because they create a false sense of safety.
+ 2. **`[WEAK]` findings** — tests that partially cover behavior but miss important cases. Strengthen with additional assertions and edge cases.
+ 3. **`[MISSING]` findings** — no tests exist for critical paths. Write new test files or add test cases to existing files.
+
+ Additionally, scan all remediation changes from Phase 3:
+ - For each file modified by remediation agents, check if corresponding tests exist
+ - If tests exist, verify they cover the specific behavior that was fixed/changed
+ - If no tests exist for a remediated module, flag for new test creation
+
+ ### 4c.2: Test Enhancement Execution
+
+ Spawn a general-purpose agent (using `REMEDIATION_MODEL`) in the worktree to fix and write tests. Populate the template placeholders below from Phase 4c.1 triage output: `{VACUOUS_AND_WEAK_FINDINGS}` from `[VACUOUS]`/`[WEAK]` findings, `{MISSING_FINDINGS}` from `[MISSING]` findings, and `{REMEDIATED_FILES_WITHOUT_TESTS}` from the remediation-change scan. The agent instructions:
+
+ ```
+ You are a test enhancement agent working in {WORKTREE_DIR}.
+ Project type: {PROJECT_TYPE}. Test command: {TEST_CMD}.
+
+ Your job is to fix weak/vacuous tests and write missing tests that verify REAL BEHAVIOR.
+
+ ## Rules for writing good tests
+
+ 1. **Test observable behavior, not implementation.** Assert on return values, side effects (files written, state changed), and error messages — never on internal variable names, call counts, or private method invocations.
+
+ 2. **Every assertion must be falsifiable.** For each assertion you write, mentally substitute a broken implementation (returns null, returns wrong value, throws instead of succeeding, succeeds instead of throwing). If your assertion would still pass, it's vacuous — rewrite it.
+
+ 3. **Prefer real modules over mocks.** Only mock at system boundaries (filesystem, network, time). If you must mock, assert on the arguments passed TO the mock, not on its return value.
+
+ 4. **Test the edges.** Each test function needs at minimum:
+ - Happy path with specific expected output
+ - Empty/null/undefined input
+ - Invalid input that should error
+ - Boundary values (0, -1, MAX, empty string vs null)
+
+ 5. **Use concrete expected values.** `assert.equal(result, 'expected string')` not `assert.ok(result)`. `assert.deepEqual(output, { key: 'value' })` not `assert.ok(typeof output === 'object')`.
+
+ 6. **One behavior per test.** Each `it()` block tests exactly one scenario. The test name describes the scenario and expected outcome.
+
+ 7. **No shared mutable state.** Each test must be independently runnable. Use `beforeEach` to create fresh fixtures. Never rely on test execution order.
+
+ ## Task list
+
+ Fix these vacuous/weak tests:
+ {VACUOUS_AND_WEAK_FINDINGS}
+
+ Write tests for these gaps:
+ {MISSING_FINDINGS}
+
+ Write tests for these remediated files:
+ {REMEDIATED_FILES_WITHOUT_TESTS}
+
+ ## Verification
+
+ After writing/fixing each test file:
+ 1. Run `{TEST_CMD}` to verify all tests pass
+ 2. For each NEW test, verify that it fails when the behavior under test is wrong:
+ - Stage your test changes so they are protected: `git add path/to/test_file*`
+ - Confirm your staged diff only includes the intended test changes: `git diff --cached`
+ - Confirm there are no other unstaged changes in the worktree: `git diff` is clean
+ - Apply a small, obvious, and **uncommitted** change to the code under test (e.g., return a constant, flip a conditional)
+ - Run `{TEST_CMD}` and confirm the new test FAILS
+ - Immediately restore only the temporary code change (do **not** touch the staged tests), for example:
+ - `git restore path/to/code_under_test` **or**
+ - `git checkout HEAD -- path/to/code_under_test`
+ - Confirm the worktree has no remaining unstaged changes (`git diff` shows no changes) and that your staged test changes are still present (`git diff --cached`)
+ This is the key quality gate — a test that does not fail when the code is broken is worthless.
+ 3. After confirming the temporary code change is reverted and only the intended test changes are staged, commit the passing tests: `test: {description of what's tested}`
+ ```
+
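The fail-when-broken gate in the instructions above can be sketched as a shell sequence (a sketch only — the file paths and the `sed` breakage edit are illustrative, and `$TEST_CMD` stands in for the project's real test command):

```shell
# Illustrative sequence; paths are hypothetical and TEST_CMD is a placeholder.
git add test/slugify.test.js        # stage the new test so it is protected
git diff --cached                   # staged diff shows only the intended test changes
git diff --exit-code                # worktree otherwise clean before the mutation

# Temporary, uncommitted breakage of the code under test:
sed -i 's/return normalized/return "broken"/' src/slugify.js

if $TEST_CMD; then
  echo "WARNING: new test still passes against broken code -- it is vacuous" >&2
fi

git restore src/slugify.js          # revert only the code mutation; staged tests untouched
git diff --exit-code                # confirm the mutation is gone
git diff --cached                   # confirm the staged test changes survived
```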
+ ### 4c.3: Verification
+
+ After the test agent completes:
+
+ 1. Run the full test suite:
+ ```bash
+ cd {WORKTREE_DIR} && {TEST_CMD}
+ ```
+ 2. If tests fail, fix in a new commit
+ 3. Count new/fixed tests and record four variables:
+ - `VACUOUS_TESTS_FIXED` — number of vacuous tests fixed
+ - `WEAK_TESTS_STRENGTHENED` — number of weak tests strengthened
+ - `NEW_TEST_CASES` — number of new test cases added
+ - `NEW_TEST_FILES` — number of new test files created
+ 4. **Update `FILE_OWNER_MAP`** — Phase 4c may have created or modified test files that were not in the Phase 2 map. Before Phase 5 assembles branches:
+ - List all files changed by Phase 4c commits: `git diff --name-only "$PHASE_4C_START_SHA"..HEAD`
+ - For each file not already in `FILE_OWNER_MAP`, assign it to the `tests` category
+ - For each file already owned by another category, leave it in that category (co-located test changes ship with the code they test — the `tests` branch only contains standalone test files not owned by other categories)
+
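Step 4's map update can be sketched as follows (a sketch only — `owner_of` and `assign_owner` are hypothetical helpers standing in for however `FILE_OWNER_MAP` is actually represented):

```shell
# Hypothetical helpers: owner_of prints a file's category or exits non-zero;
# assign_owner records a file-to-category assignment.
for f in $(git diff --name-only "$PHASE_4C_START_SHA"..HEAD); do
  if ! owner_of "$f" >/dev/null 2>&1; then
    assign_owner "$f" tests       # unowned Phase 4c files go to the tests category
  fi                              # files owned by another category keep that owner
done
```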
  ## Phase 5: Per-Category PR Creation
 
  Instead of one mega PR, create **separate branches and PRs for each category**. This enables independent review, targeted CI, and granular merge decisions.
 
  ### 5a: Build the Category Branches
 
- Using the `FILE_OWNER_MAP` from Phase 2, create one branch per category:
+ Using the `FILE_OWNER_MAP` from Phase 2 (updated in Phase 4c.3), create one branch per category.
+
+ Initialize `CREATED_CATEGORY_SLUGS=""` (empty space-delimited string). After each category branch is successfully created and pushed below, append its slug: `CREATED_CATEGORY_SLUGS="$CREATED_CATEGORY_SLUGS {CATEGORY_SLUG}"`. Phase 7 uses this as the set of candidate branches for cleanup; when deleting branches, either run cleanup only after all desired merges are complete or explicitly verify that each branch in `CREATED_CATEGORY_SLUGS` has been merged before deleting it.
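The slug tracking described above amounts to simple string accumulation; a minimal runnable sketch (the branch creation and push steps are elided):

```shell
CREATED_CATEGORY_SLUGS=""
# ...after the security branch is created and pushed successfully:
CREATED_CATEGORY_SLUGS="$CREATED_CATEGORY_SLUGS security"
# ...after the tests branch is created and pushed successfully:
CREATED_CATEGORY_SLUGS="$CREATED_CATEGORY_SLUGS tests"

# Phase 7 later iterates the accumulated list:
for slug in $CREATED_CATEGORY_SLUGS; do
  echo "cleanup candidate: better/$slug"
done
```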
 
  For each category that has findings:
  1. Switch to `{DEFAULT_BRANCH}`: `git checkout {DEFAULT_BRANCH}`
  2. Create a category branch: `git checkout -b better/{CATEGORY_SLUG}`
- - Use slugs: `security`, `code-quality`, `dry`, `arch-bugs`, `stack-specific`
+ - Use slugs: `security`, `code-quality`, `dry`, `architecture`, `bugs-perf`, `stack-specific`, `tests`
  3. For each file assigned to this category in `FILE_OWNER_MAP`:
- - **Modified files**: `git checkout origin/better/{DATE} -- {file_path}`
- - **New files (Added)**: `git checkout origin/better/{DATE} -- {file_path}`
+ - **Modified files**: `git checkout better/{DATE} -- {file_path}`
+ - **New files (Added)**: `git checkout better/{DATE} -- {file_path}`
  - **Deleted files**: `git rm {file_path}`
  4. Commit all staged changes with a descriptive message:
  ```bash
@@ -475,7 +613,7 @@ After creating all PRs, verify CI passes on each one:
 
  ## Phase 6: Copilot Review Loop (GitHub only)
 
- Maximum 5 iterations per PR to prevent infinite loops.
+ Loop until Copilot returns zero new comments; there is no fixed per-PR iteration limit. Instead, each sub-agent enforces a 10-iteration guardrail: at iteration 10 it stops and returns a "guardrail" status, and the parent agent asks the user whether to continue or stop.
 
  **Sub-agent delegation** (prevents context exhaustion): delegate each PR's review loop to a **separate general-purpose sub-agent** via the Agent tool. Launch sub-agents in parallel (one per PR). Each sub-agent runs the full loop (request → wait → check → fix → re-request) autonomously and returns only the final status.
 
@@ -493,9 +631,9 @@ Launch all PR sub-agents in parallel. Wait for all to complete.
 
  For each sub-agent result:
  - **clean**: mark PR as ready to merge
- - **timeout**: ask the user whether to continue waiting, re-request, or skip
- - **max-iterations-reached**: inform the user "Reached max review iterations (5) on PR #{number}. Remaining issues may need manual review."
+ - **timeout**: inform the user "Copilot review timed out on PR #{number}." and ask whether to continue waiting, re-request, or skip
  - **error**: inform the user and ask whether to retry or skip
+ - **guardrail**: the sub-agent hit the 10-iteration limit; ask the user whether to continue with more iterations or stop
 
  ### 6.3: Merge Gate (MANDATORY)
 
@@ -546,16 +684,17 @@ If merge fails (e.g., branch protection, merge conflicts from a prior PR):
  ```bash
  git worktree remove {WORKTREE_DIR}
  ```
- 2. Delete local AND remote branches (only if merged):
+ 2. Delete the local staging branch and per-category branches (local + remote). Use the tracked list of branches from Phase 5 rather than a fixed list:
  ```bash
- git branch -d better/{DATE}
- git branch -d better/security better/code-quality better/dry better/arch-bugs better/stack-specific
+ git checkout {DEFAULT_BRANCH}
+ git branch -D better/{DATE}
+ # CREATED_CATEGORY_SLUGS is a space-delimited string, e.g. "security code-quality tests"
+ for slug in $CREATED_CATEGORY_SLUGS; do
+ git branch -d "better/$slug" || echo "warning: local branch better/$slug not found or not fully merged — skipping (use -D to force)"
+ git push origin --delete "better/$slug" || echo "warning: remote branch better/$slug not found or already deleted"
+ done
  ```
- ```bash
- git push origin --delete better/{DATE}
- git push origin --delete better/security better/code-quality better/dry better/arch-bugs better/stack-specific
- ```
- Ignore errors from `--delete` if a branch doesn't exist remotely.
+ `-D` (force delete) is used only for the staging branch `better/{DATE}` because it is intentionally unmerged — its file contents are cherry-picked into category branches. Category branches use `-d` (safe delete) so that unmerged work is not accidentally lost; if a category branch was not merged, the warning will surface it. The guards prevent errors from interrupting cleanup.
  3. Restore stashed changes (if stashed in Phase 3a):
  ```bash
  git stash pop
@@ -575,8 +714,14 @@ If merge fails (e.g., branch protection, merge conflicts from a prior PR):
  | Architecture | ... | ... | ... | #number | pass | approved |
  | Bugs & Perf | ... | ... | ... | #number | pass | approved |
  | Stack-Specific | ... | ... | ... | #number | pass | approved |
- | Test Coverage | ... | (tracked only) | ... | | |
+ | Tests | ... | ... | ... | #number | pass | approved |
  | TOTAL | ... | ... | ... | N PRs | | |
+
+ Test Enhancement Stats:
+ - Vacuous tests fixed: {VACUOUS_TESTS_FIXED}
+ - Weak tests strengthened: {WEAK_TESTS_STRENGTHENED}
+ - New test cases added: {NEW_TEST_CASES}
+ - New test files created: {NEW_TEST_FILES}
  ```
 
  ## Error Recovery
@@ -602,6 +747,7 @@ If merge fails (e.g., branch protection, merge conflicts from a prior PR):
  - Each file appears in exactly ONE PR (file ownership map) to prevent merge conflicts between PRs
  - When extracting modules, always add backward-compatible re-exports in the original module to prevent cross-PR breakage
  - Version bump happens exactly once on the first category branch based on aggregate commit analysis
- - Only CRITICAL, HIGH, and MEDIUM findings are auto-remediated; LOW and Test Coverage remain tracked in PLAN.md
+ - Only CRITICAL, HIGH, and MEDIUM findings are auto-remediated for code categories; LOW findings remain tracked in PLAN.md
+ - Test Quality & Coverage findings are remediated in Phase 4c with a dedicated test enhancement agent that verifies tests fail when code is broken
  - GitLab projects skip the Copilot review loop entirely (Phase 6) and stop after MR creation
  - CI must pass on each PR before requesting Copilot review or merging
@@ -94,6 +94,9 @@ Check every file against this checklist. The checklist is organized into tiers
  **Cross-file consistency**
  - If a new function/endpoint follows a pattern from an existing similar one, verify ALL aspects match (validation, error codes, response shape, cleanup). Partial copying is the #1 source of review feedback.
  - New API client functions should use the same encoding/escaping as existing ones (e.g., if other endpoints use `encodeURIComponent`, new ones must too)
+ - If the PR adds a new endpoint, trace where existing endpoints are registered and verify the new one is wired in all runtime adapters (serverless handler map, framework route file, API gateway config, local dev server) — a route registered in one adapter but missing from another will silently 404 in the missing runtime
+ - If the PR adds a new call to an external service that has established mock/test infrastructure (mock mode flags, test helpers, dev stubs), verify the new call uses the same patterns — bypassing them makes the new code path untestable in offline/dev environments and inconsistent with existing integrations
+ - If the PR adds a new UI component or client-side consumer against an existing API endpoint, read the actual endpoint handler or response shape — verify every field name, nesting level, identifier property, and response envelope path used in the consumer matches what the producer returns. This is the #1 source of "renders empty" bugs in new views built against existing APIs
 
  **Error path completeness**
  - Trace each error path end-to-end: does the error reach the user with a helpful message and correct HTTP status? Or does it get swallowed, logged silently, or surface as a generic 500?
@@ -148,8 +151,9 @@ Check every file against this checklist. The checklist is organized into tiers
  **Data model vs access pattern alignment**
  - If the PR adds queries that claim ordering (e.g., "recent", "top"), verify the underlying key/index design actually supports that ordering natively — random UUIDs and non-time-sortable keys require full scans and in-memory sorting, which degrades at scale
 
- **Deletion/lifecycle cleanup completeness**
+ **Deletion/lifecycle cleanup and aggregate reset completeness**
  - If the PR adds a delete or destroy function, trace all resources created during the entity's lifecycle (data directories, git branches, child records, temporary files, worktrees) and verify each is cleaned up on deletion. Compare with existing delete functions in the codebase for completeness patterns
+ - If the PR adds a state transition that resets an aggregate value (counter, score, flag count), trace all individual records that contribute to that aggregate and verify they are also cleared, archived, or versioned — a reset counter with stale contributing records causes inconsistency and blocks duplicate-prevention checks on re-entry
 
  **Update schema depth**
  - If the PR derives an update/patch schema from a create schema (e.g., `.partial()`, `Partial<T>`), verify that nested objects also become partial — shallow partial on deeply-required schemas rejects valid partial updates where the caller only wants to change one nested field
@@ -163,9 +167,28 @@ Check every file against this checklist. The checklist is organized into tiers
  **Read-after-write consistency**
  - If the PR writes to a data store and then immediately queries that store (especially scans, aggregations, or replica reads), check whether the store's consistency model guarantees visibility of the write. If not, flag the read as potentially stale and suggest computing from in-memory state, using consistent-read options, or adding a delay/caveat
 
+ **Security-sensitive configuration parsing**
+ - If the PR reads environment variables or config values that affect security behavior (proxy trust depth, rate limit thresholds, CORS origins, token expiry), verify the parsing enforces the expected type and range — e.g., integer-only via `parseInt` with `Number.isInteger` check, non-negative bounds, and a logged fallback to a safe default on invalid input. `Number()` on arbitrary strings accepts floats, negatives, and empty-string-as-zero, all of which can silently weaken security controls
+
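The integer-only parse described in the bullet above can be sketched like this (a sketch only — the function and variable names are illustrative, not from the package):

```javascript
// Parse a security-sensitive integer config value with type and range enforcement.
function parseTrustedProxyDepth(raw, fallback = 1) {
  const n = Number.parseInt(raw, 10);
  // Reject NaN, negatives, and inputs that parseInt silently truncated (e.g. "2.5" -> 2).
  if (!Number.isInteger(n) || n < 0 || String(n) !== String(raw).trim()) {
    console.warn(`invalid proxy depth ${JSON.stringify(raw)}; using ${fallback}`);
    return fallback;
  }
  return n;
}

parseTrustedProxyDepth('2');    // accepted
parseTrustedProxyDepth('2.5');  // rejected: parseInt yields 2 but "2" !== "2.5"
parseTrustedProxyDepth('');     // rejected: NaN
parseTrustedProxyDepth('-1');   // rejected: negative
```

By contrast, `Number('2.5')` is `2.5` and `Number('')` is `0`, which is why the bullet warns against bare `Number()` here.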
+ **Multi-source data aggregation**
+ - If the PR aggregates items from multiple sources into a single collection (merging accounts, combining API results, flattening caches), verify each item retains its source identifier through the aggregation — downstream operations that need to route back to the correct source (updates, deletes, detail views) will silently break or operate on the wrong source if the origin is lost
+
+ **Field-set enumeration consistency**
+ - If the PR adds an operation that targets a set of entity fields (enrichment, validation, migration, sync), trace every other location that independently enumerates those fields — UI predicates, scan/query filters, API documentation, response shapes, and test assertions. Each must cover the same field set; a missed field causes silent skips or false UI state. Prefer deriving enumerations from a single source of truth (constant array, schema keys) over maintaining independent lists
+
+ **Abstraction layer fidelity**
+ - If the PR calls a third-party API through an internal wrapper/abstraction layer, trace whether the wrapper requests and forwards all fields the handler depends on — third-party APIs often have optional response attributes that require explicit opt-in (e.g., cancellation reasons, extended metadata). Code branching on fields the wrapper doesn't forward will silently receive `undefined` and take the wrong path. Also verify that test mocks match what the real wrapper returns, not what the underlying API could theoretically return
+
+ **Data model / status lifecycle changes**
+ - If the PR changes the set of valid statuses, enum values, or entity lifecycle states, sweep all dependent artifacts: API doc summaries and enum declarations, UI filter/tab options, conditional rendering branches (which actions to show per state), integration guide examples, route names derived from old status names, and test assertions. Each artifact that references the old value set must be updated — partial updates leave stale filters, invalid actions, and misleading documentation
+ - If the PR renames a concept (e.g., "flagged" → "rejected"), trace all manifestations beyond user-facing labels: route paths, component/file names, variable names, CSS classes, and test descriptions. Internal identifiers using the old name create confusion even when the UI is correct
+
  **Formatting & structural consistency**
  - If the PR adds content to an existing file (list items, sections, config entries), verify the new content matches the file's existing indentation, bullet style, heading levels, and structure — rendering inconsistencies are the most common Copilot review finding
 
+ **Query key / stored key precision alignment**
+ - If the PR adds queries that construct lookup keys with a different precision, encoding, or format than what the write path persists, the query will silently return zero matches. Trace the key construction in both write and read paths and verify they produce compatible values
+
  </deep_checks>
 
  <verify_findings>
@@ -17,13 +17,15 @@
  - Null/undefined access without guards, off-by-one errors, object spread of potentially-null values (spread of null is `{}`, silently discarding state)
  - Data from external/user sources (parsed JSON, API responses, file reads) used without structural validation — guard against parse failures, missing properties, wrong types, and null elements before accessing nested values. When parsed data is optional enrichment, isolate failures so they don't abort the main operation
  - Type coercion edge cases — `Number('')` is `0` not empty, `0` is falsy in truthy checks, `NaN` comparisons are always false; string comparison operators (`<`, `>`, `localeCompare`) do lexicographic, not semantic, ordering (e.g., `"10" < "2"`). Use explicit type checks (`Number.isFinite()`, `!= null`) and dedicated libraries (e.g., semver for versions) instead of truthy guards or lexicographic ordering when zero/empty are valid values or semantic ordering matters
- - Functions that index into arrays without guarding empty arrays; state/variables declared but never updated or only partially wired up
+ - Functions that index into arrays without guarding empty arrays; aggregate operations (`every`, `some`, `reduce`) on potentially-empty collections returning vacuously true/default values that mask misconfiguration or missing data; state/variables declared but never updated or only partially wired up
  - Shared mutable references — module-level defaults passed by reference mutate across calls (use `structuredClone()`/spread); `useCallback`/`useMemo` referencing a later `const` (temporal dead zone); object spread followed by unconditional assignment that clobbers spread values
  - Functions with >10 branches or >15 cyclomatic complexity — refactor into smaller units
 
  **API & URL safety**
  - User-supplied or system-generated values interpolated into URL paths, shell commands, file paths, or subprocess arguments without encoding/validation — use `encodeURIComponent()` for URLs, regex allowlists for execution boundaries. Generated identifiers used as URL path segments must be safe for your router/storage (no `/`, `?`, `#`; consider allowlisting characters and/or applying `encodeURIComponent()`). Identifiers derived from human-readable names (slugs) used for namespaced resources (git branches, directories) need a unique suffix (ID, hash) to prevent collisions between entities with the same or similar names
  - Route params passed to services without format validation; path containment checks using string prefix without path separator boundary (use `path.relative()`)
+ - Parameterized/wildcard routes registered before specific named routes — the generic route captures requests meant for the specific endpoint (e.g., `/:id` registered before `/drafts` matches `/drafts` as `id="drafts"`). Verify route registration order or use path prefixes to disambiguate
+ - Stored or external URLs rendered as clickable links (`href`, `src`, `window.open`) without protocol validation — `javascript:`, `data:`, and `vbscript:` URLs execute in the user's browser. Allowlist `http:`/`https:` (and `mailto:` if needed) before rendering; for all other schemes, render as plain text or strip the value
  - Error/fallback responses that hardcode security headers instead of using centralized policy — error paths bypass security tightening
 
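A minimal sketch of the protocol allowlist described above (the function name is illustrative; note this simplification also rejects relative URLs, which a real implementation may want to resolve against a base first):

```javascript
const SAFE_PROTOCOLS = new Set(['http:', 'https:', 'mailto:']);

// Return a safe href, or null when the caller should render plain text instead.
function safeHref(raw) {
  try {
    const url = new URL(raw);                 // throws on malformed or relative input
    return SAFE_PROTOCOLS.has(url.protocol) ? url.href : null;
  } catch {
    return null;
  }
}

safeHref('https://example.com/x');    // allowed
safeHref('javascript:alert(1)');      // null — would execute in the user's browser
safeHref('data:text/html,<b>x</b>');  // null — data: URLs can carry executable content
```

The `URL` parser lowercases the scheme, so `JAVASCRIPT:alert(1)` is caught as well.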
  **Trust boundaries & data exposure**
@@ -40,32 +42,36 @@
40
42
  - Optimistic updates using full-collection snapshots for rollback — a second in-flight action gets clobbered. Use per-item rollback and functional state updaters after async gaps; sync optimistic changes to parent via callback or trigger refetch on remount
41
43
  - State updates guarded by truthiness of the new value (`if (arr?.length)`) — prevents clearing state when the source legitimately returns empty. Distinguish "no response" from "empty response"
42
44
  - Mutation/trigger functions that return or propagate stale pre-mutation state — if a function activates, updates, or resets an entity, the returned value and any dependent scheduling/evaluation state (backoff timers, "last run" timestamps, status flags) must reflect the post-mutation state, not a snapshot read before the mutation
43
- - Fire-and-forget or async writes where the in-memory object is not updated (response returns stale data) or is updated unconditionally regardless of write success (response claims state that was never persisted) — update in-memory state conditionally on write outcome, or document the tradeoff explicitly
+ - Fire-and-forget or async writes where the in-memory object is not updated (response returns stale data) or is updated unconditionally regardless of write success (response claims state that was never persisted) — update in-memory state conditionally on write outcome, or document the tradeoff explicitly. Also applies to responses and business-logic decisions (threshold triggers, status transitions) derived from pre-transaction reads — concurrent writers all read the same stale value, so thresholds may be crossed without triggering the transition. Compute from post-write state or use conditional expressions that evaluate the stored value. For monotonic counters (sequence numbers, cursors) that must stay in lockstep with append-only storage, advancing before the write risks the counter running ahead on failure; not advancing after a partial write risks reuse — reserve the range before writing and commit only on success
+ - Error/early-exit paths that return status metadata (pagination flags, truncation indicators, hasMore, completion markers) or emit events (WebSocket, SSE, pub/sub) with default/initial values instead of reflecting actual accumulated state — downstream consumers make incorrect decisions (e.g., treating a failed sync as successful because the completion event was emitted unconditionally). Set metadata flags and event payloads based on actual outcome, not just the final request's exit path
  - Missing `await` on async operations in error/cleanup paths — fire-and-forget cleanup (e.g., aborting a failed operation, rolling back partial state) that must complete before the function returns or the caller proceeds
  - `Promise.all` without error handling — partial load with unhandled rejection. Wrap with fallback/error state
+ - Sequential processing of items (loops over external operations, batch mutations) where one item throwing aborts all remaining items — wrap per-item operations in try/catch with logging so partial progress is preserved and failures are isolated
  - Side effects during React render (setState, navigation, mutations outside useEffect)
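The per-item failure isolation described above can be sketched as follows (a sketch; the function name and result shape are illustrative):

```typescript
// Process each item independently: one failing item is recorded,
// not allowed to abort the remaining items in the batch.
async function processAll<T, R>(
  items: T[],
  op: (item: T) => Promise<R>,
): Promise<{ ok: R[]; failed: { item: T; error: unknown }[] }> {
  const ok: R[] = [];
  const failed: { item: T; error: unknown }[] = [];
  for (const item of items) {
    try {
      ok.push(await op(item));
    } catch (error) {
      failed.push({ item, error }); // isolate and record; partial progress survives
    }
  }
  return { ok, failed };
}
```

Callers can then decide whether a non-empty `failed` list is a warning or a hard error, instead of losing the whole batch on the first throw.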
 
  **Error handling** _[applies when: code has try/catch, .catch, error responses, or external calls]_
- - Service functions throwing generic `Error` for client-caused conditions — bubbles as 500 instead of 400/404. Use typed error classes with explicit status codes; ensure consistent error responses across similar endpoints. Include expected concurrency/conditional failures (transaction cancellations, optimistic lock conflicts) — catch and translate to 409/retry rather than letting them surface as 500
- - Swallowed errors (empty `.catch(() => {})`), handlers that replace detailed failure info with generic messages, and error/catch handlers that exit cleanly (`exit 0`, `return`) without any user-visible output — surface a notification, propagate original context, and make failures look like failures
+ - Service functions throwing generic `Error` for client-caused conditions — bubbles as 500 instead of 400/404. Use typed error classes with explicit status codes; ensure consistent error responses across similar endpoints — when multiple endpoints make the same access-control decision (e.g., "resource exists but caller lacks access"), they must return the same HTTP status (typically 404 to avoid leaking existence). Include expected concurrency/conditional failures (transaction cancellations, optimistic lock conflicts) — catch and translate to 409/retry rather than letting them surface as 500
+ - Swallowed errors (empty `.catch(() => {})`), handlers that replace detailed failure info with generic messages, and error/catch handlers that exit cleanly (`exit 0`, `return`) without any user-visible output — surface a notification, propagate original context, and make failures look like failures. Includes external service wrappers that return `null`/empty for all non-success responses — collapsing configuration errors (missing API key), auth failures (403), rate limits (429), and server errors (5xx) into a single "not found" return masks outages and misconfiguration as normal "no match" results. Distinguish retriable from non-retriable failures and surface infrastructure errors loudly
  - Destructive operations in retry/cleanup paths assumed to succeed without their own error handling — if cleanup fails, retry logic crashes instead of reporting the intended failure
  - External service calls without configurable timeouts — a hung downstream service blocks the caller indefinitely
  - Missing fallback behavior when downstream services are unavailable (see also: retry without backoff in "Sync & replication")
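The typed-error translation above might look like this (a sketch; the class names and response shape are illustrative, not a prescribed API):

```typescript
// Typed errors carry an explicit HTTP status so client-caused failures
// map to 4xx instead of bubbling up as generic 500s.
class HttpError extends Error {
  constructor(public readonly status: number, message: string) {
    super(message);
    this.name = new.target.name;
  }
}
class NotFoundError extends HttpError {
  constructor(resource: string) { super(404, `${resource} not found`); }
}
class ConflictError extends HttpError {
  // Expected concurrency failures (optimistic lock conflicts) map to 409.
  constructor(message: string) { super(409, message); }
}

// Central translation point: typed errors keep their status;
// anything unexpected stays a 500 and never leaks internals.
function toResponse(err: unknown): { status: number; message: string } {
  if (err instanceof HttpError) return { status: err.status, message: err.message };
  return { status: 500, message: "Internal error" };
}
```

Routing every handler's catch block through one `toResponse`-style function is also what keeps similar endpoints returning consistent statuses.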
 
  **Resource management** _[applies when: code uses event listeners, timers, subscriptions, or useEffect]_
  - Event listeners, socket handlers, subscriptions, timers, and useEffect side effects are cleaned up on unmount/teardown
- - Deletion/destroy functions that clean up the primary resource but leave orphaned secondary resources (data directories, git branches, child records, temporary files) — trace all resources created during the entity's lifecycle and verify each is removed on delete
+ - Deletion/destroy and state-reset functions that clean up or reset the primary resource but leave orphaned or inconsistent secondary resources (data directories, git branches, child records, temporary files, per-user flag/vote items) — trace all resources created during the entity's lifecycle and verify each is removed on delete. For state transitions that reset aggregate values (counters, scores, flags), also clear or version the individual records that contributed to those aggregates — otherwise the aggregate and its sources disagree, and duplicate-prevention checks block legitimate re-entry
  - Initialization functions (schedulers, pollers, listeners) that don't guard against multiple calls — creates duplicate instances. Check for existing instances before reinitializing
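The initialization guard can be sketched like this (a sketch, using a timer as a stand-in for any poller or scheduler):

```typescript
let pollTimer: ReturnType<typeof setInterval> | null = null;

// Refuses to create a second instance if one is already running.
function startPoller(tick: () => void, ms: number): boolean {
  if (pollTimer !== null) return false; // already initialized: no duplicate timers
  pollTimer = setInterval(tick, ms);
  return true;
}

function stopPoller(): void {
  if (pollTimer !== null) {
    clearInterval(pollTimer);
    pollTimer = null;
  }
}
```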
 
  **Validation & consistency** _[applies when: code handles user input, schemas, or API contracts]_
  - API versioning: breaking changes to public endpoints without version bump or deprecation path
  - Backward-incompatible response shape changes without client migration plan
+ - Backward-incompatible contract changes — renamed/removed config keys, changed file formats, altered DB schemas, modified event payloads, renamed URL routes/paths, or restructured persisted data (localStorage, files, database rows) without a migration path or fallback that reads the old format. For route/URL renames, add redirects from old paths to preserve bookmarks and external links. Trace all consumers of the changed contract (other services, CLI versions, stored data) and verify they still work or have an upgrade path. For schema changes, require a migration script; for config/format changes, support both old and new formats during a transition period or provide a one-time converter
  - New endpoints/schemas should match validation patterns of existing similar endpoints — field limits, required fields, types, error handling. If validation exists on one endpoint for a param, the same param on other endpoints needs the same validation
  - When a validation/sanitization function is introduced for a field, trace ALL write paths (create, update, sync, import) — partial application means invalid values re-enter through the unguarded path
  - Schema fields accepting values downstream code can't handle; Zod/schema stripping fields the service reads (silent `undefined`); config values persisted but silently ignored by the implementation — trace each field through schema → service → consumer. Update schemas derived from create schemas (e.g., `.partial()`) must also make nested object fields optional — shallow partial on a deeply-required schema rejects valid partial updates. Additionally, `.deepPartial()` or `.partial()` on schemas with `.default()` values will apply those defaults on update, silently overwriting existing persisted values with defaults — create explicit update schemas without defaults instead
  - Entity creation without case-insensitive uniqueness checks — names differing only in case (e.g., "MyAgent" vs "myagent") cause collisions in case-insensitive contexts (file paths, git branches, URLs). Normalize to lowercase before comparing
- - Handlers reading properties from framework-provided objects using field names the framework doesn't populate — silent `undefined`. Verify property names match the caller's contract
- - Data model fields that have different names depending on the creation/write path (e.g., `createdAt` vs `created`) — code referencing only one naming convention silently misses records created through other paths. Trace all write paths to discover the actual field names in use
+ - Code reading properties from API responses, framework-provided objects, or internal abstraction layers using field names the source doesn't populate or forward — silent `undefined`. Verify property names and nesting depth match the actual response shape (e.g., `response.items` vs `response.data.items`, `obj.placeId` vs `obj.id`, flat fields vs nested sub-objects). When building a new consumer against an existing API, check the producer's actual response — not assumed conventions. When branching on fields from a wrapped third-party API, confirm the wrapper actually requests and forwards those fields (e.g., optional response attributes that require explicit opt-in)
+ - Data model fields that have different names depending on the creation/write path (e.g., `createdAt` vs `created`) — code referencing only one naming convention silently misses records created through other paths. Trace all write paths to discover the actual field names in use. When new logic (access control, UI display, queries) checks only a newly introduced field, verify it falls back to any legacy field that existing records still use — otherwise records created before the migration are silently excluded or inaccessible
+ - Inconsistent "missing value" semantics across layers — one layer treats `null`/`undefined` as missing while another also treats empty strings or whitespace-only strings as missing. Query filters, update expressions, and UI predicates that disagree on what constitutes "missing" cause records to be skipped by one path but processed by another. Define a single `isMissing` predicate and use it consistently, or normalize empty/whitespace values to `null` at write time. Also applies to comparison/detection logic: coercing an absent field to a sentinel (`?? 0`, default parameters) makes the logic treat "unsupported" as a real value — guard with an explicit presence check before comparing
  - Numeric values from strings used without `NaN`/type guards — `NaN` comparisons silently pass bounds checks. Clamp query params to safe lower bounds
  - UI elements hidden from navigation but still accessible via direct URL — enforce restrictions at the route level
  - Summary counters/accumulators that miss edge cases (removals, branch coverage, underflow on decrements — guard against going negative with lower-bound conditions); silent operations in verbose sequences where all branches should print status
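The shared "missing value" predicate from the list above can be sketched as (a sketch; the names are illustrative):

```typescript
// Single definition of "missing": null, undefined, or whitespace-only string.
// Query filters, update guards, and UI predicates should all call this.
function isMissing(value: unknown): boolean {
  if (value === null || value === undefined) return true;
  if (typeof value === "string" && value.trim() === "") return true;
  return false;
}

// Companion write-time normalization so storage never mixes "" and null.
function normalizeOptional(value: string | null | undefined): string | null {
  return isMissing(value) ? null : (value as string);
}
```

Note that `0` and `false` are deliberately treated as present, which is also why sentinel coercions like `?? 0` need an explicit presence check first.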
@@ -88,6 +94,7 @@
  - Database triggers clobbering explicitly-provided values; auto-incrementing columns that only increment on INSERT, not UPDATE
  - Full-text search with strict parsers (`to_tsquery`) on user input — use `websearch_to_tsquery` or `plainto_tsquery`
  - Dead queries (results never read), N+1 patterns inside transactions, O(n²) algorithms on growing data
+ - Performance optimizations in query/search loops (early exits, capped per-item limits, break-on-first-match) that silently reduce correctness — verify the optimization preserves the same result set as the unoptimized path, especially for dedup/nearest-match queries where stopping early can miss closer or more appropriate results
  - `CREATE TABLE IF NOT EXISTS` as sole migration strategy — won't add columns/indexes on upgrade. Use `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` or a migration framework
  - Functions/extensions requiring specific database versions without verification
  - Migrations that lock tables for extended periods (ADD COLUMN with default on large tables, CREATE INDEX without CONCURRENTLY) — use concurrent operations or batched backfills
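The early-exit correctness point above can be illustrated with a nearest-match lookup (a sketch; the data shapes are illustrative):

```typescript
type Candidate = { id: string; distance: number };

// Buggy "optimization": returns the FIRST acceptable candidate and stops scanning.
function firstUnderThreshold(cands: Candidate[], max: number): Candidate | null {
  for (const c of cands) if (c.distance <= max) return c; // may miss a closer match
  return null;
}

// Correct version: scans the full set and keeps the closest acceptable match.
function closestUnderThreshold(cands: Candidate[], max: number): Candidate | null {
  let best: Candidate | null = null;
  for (const c of cands) {
    if (c.distance <= max && (best === null || c.distance < best.distance)) best = c;
  }
  return best;
}
```

With candidates at distances 9 and 2 and a threshold of 10, the early-exit version returns the farther one. Early exits are only safe when the result set provably matches the unoptimized scan.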
@@ -95,7 +102,8 @@
 
  **Sync & replication** _[applies when: code uses pagination, batch APIs, or data sync]_
  - Upsert/`ON CONFLICT UPDATE` updating only a subset of exported fields — replicas diverge. Document deliberately omitted fields
- - Pagination using `COUNT(*)` (full table scan) instead of `limit + 1`; endpoints missing `next` token input/output; hard-capped limits silently truncating results
+ - Pagination using `COUNT(*)` (full table scan) instead of `limit + 1`; endpoints missing `next` token input/output; hard-capped limits silently truncating results. When a data store applies query limits before filter expressions, a fixed multiplier on the limit still under-fetches — loop with continuation tokens until the target count of post-filter results is collected
+ - Pagination cursors derived from the last *scanned* item rather than the last *returned* item — if accumulated results are trimmed (e.g., sliced to a page size), the cursor advances past items that were fetched but never delivered, causing permanent skips
  - Batch/paginated API calls (database batch gets, external service calls) that don't handle partial results — unprocessed items, continuation tokens, or rate-limited responses silently dropped. Add retry loops with backoff for unprocessed items
  - Retry loops without backoff or max-attempt limits — tight loops under throttling extend latency indefinitely. Use bounded retries with exponential backoff/jitter
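The bounded-retry guidance above can be sketched as (a sketch; the base delay, cap, and attempt limit are illustrative defaults):

```typescript
// Full-jitter exponential backoff: uniform delay in [0, min(cap, base * 2^attempt)).
// Kept pure so the bounds are testable without sleeping.
function backoffDelayMs(attempt: number, baseMs = 100, capMs = 10_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * exp;
}

// Bounded retries: never loops forever under throttling.
async function withRetry<T>(op: () => Promise<T>, maxAttempts = 5): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastErr = err;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
  throw lastErr; // surface the last failure instead of retrying indefinitely
}
```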
 
@@ -109,7 +117,7 @@
  - Values crossing serialization boundaries may change format (arrays in JSON vs string literals in DB) — convert consistently
  - Reads issued immediately after writes to an eventually consistent store (database scans, replica reads, cache refreshes) may return stale data — use consistent-read options, compute from in-memory state after confirmed writes, or document the eventual-consistency window
  - BIGINT values parsed into JavaScript `Number` — precision lost past `MAX_SAFE_INTEGER`. Use strings or `BigInt`
- - Data model key/index design that doesn't support required query access patterns — e.g., claiming "recent" ordering but using non-time-sortable keys (random UUIDs, user IDs). Verify sort keys and indexes can serve the queries the code performs without full-partition scans and in-memory sorting
+ - Data model key/index design that doesn't support required query access patterns — e.g., claiming "recent" ordering but using non-time-sortable keys (random UUIDs, user IDs). Verify sort keys and indexes can serve the queries the code performs without full-partition scans and in-memory sorting. When a new write path creates or associates an entity through a different attribute than the primary index (e.g., adding co-owners to an array field when the discovery index queries a single-owner scalar field), verify existing listing/discovery queries can surface the new association — otherwise the new data is persisted but undiscoverable
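The BIGINT precision hazard above is easy to demonstrate (the id value here is just `Number.MAX_SAFE_INTEGER + 2`, a plausible 64-bit database id):

```typescript
const rawId = "9007199254740993"; // a BIGINT id past Number.MAX_SAFE_INTEGER

const asNumber = Number(rawId); // silently rounds to the nearest representable double
const asBigInt = BigInt(rawId); // exact

const numberRoundTrip = String(asNumber);    // no longer equals rawId
const bigintRoundTrip = asBigInt.toString(); // still equals rawId
```

Carrying such ids as strings end to end (or as `BigInt` where arithmetic is needed) avoids the silent corruption.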
 
  **Shell & portability** _[applies when: code spawns subprocesses, uses shell scripts, or builds CLI tools]_
  - Subprocess calls under `set -e` abort on failure; non-critical writes fail on broken pipes — use `|| true` for non-critical output
@@ -130,11 +138,14 @@
  ## Tier 4 — Always Check (Quality, Conventions, AI-Generated Code)
 
  **Intent vs implementation**
- - Labels, comments, status messages, or documentation that describe behavior the code doesn't implement — e.g., a map named "renamed" that only deletes, or an action labeled "migrated" that never creates the target
+ - Labels, comments, status messages, or documentation that describe behavior the code doesn't implement — e.g., a map named "renamed" that only deletes, an action labeled "migrated" that never creates the target, or UI actions offered for entity states where the transition is invalid (e.g., a "Reject" button on already-rejected items)
  - Inline code examples, command templates, and query snippets that aren't syntactically valid as written — template placeholders must use a consistent format, queries must use correct syntax for their language (e.g., single `{}` in GraphQL, not `{{}}`)
- - Cross-references between files (identifiers, parameter names, format conventions, operational thresholds) that disagree — when one reference changes, trace all other files that reference the same entity and update them
+ - Cross-references between files (identifiers, parameter names, format conventions, version numbers, operational thresholds) that disagree — when one reference changes, trace all other files that reference the same entity and update them. This includes internal identifiers (route paths, file names, component names) that should be renamed when the concept they represent is renamed — a nav label saying "Rejected" pointing to `/admin/flagged` or a component named `FlaggedList` rendering rejected items creates maintenance confusion. For releases, verify version consistency across all versioned artifacts (package manifests, lockfiles, API specs, changelogs, PR metadata). Also applies to field-set enumerations: when an operation targets a set of entity fields, every predicate, filter expression, scan criteria, API doc, and UI conditional that enumerates those fields must stay in sync — an independently maintained list that omits a field causes silent skips or false positives
+ - Template/workflow variables referenced (`{VAR_NAME}`) but never assigned — trace each placeholder to a definition step; undefined variables cause silent failures or confusing instructions. Also check for colliding identifiers (two distinct concepts mapped to the same slug, key, or name)
  - Responsibility relocated from one module to another (e.g., writes moved from handler to middleware) without updating all consumers that depended on the old location's timing, return value, or side effects — trace callers that relied on the synchronous or co-located behavior and verify they still work with the new execution point. Remove dead code left behind at the old location
  - Sequential instructions or steps whose ordering doesn't match the required execution order — readers following in order will perform actions at the wrong time (e.g., "record X" in step 2 when X must be captured before step 1's action)
+ - Constraints, limits, or guardrails described in a preamble or summary that are not enforced by an explicit condition in the procedural steps below — the description promises safety but the steps don't implement it. Add an explicit check/exit condition tied to the stated constraint
+ - Duplicate or contradictory items in sequential lists — copy/paste producing two entries for the same case with conflicting instructions. Deduplicate and reconcile
  - Sequential numbering (section numbers, step numbers) with gaps or jumps after edits — verify continuity
  - Completion markers, success flags, or status files written before the operation they attest to finishes — consumers see false success if the operation fails after the write
  - Existence checks (directory exists, file exists, module resolves) used as proof of correct/complete installation — a directory can exist but be empty, a file can exist with invalid contents. Verify the specific resource the consumer needs
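The undefined-placeholder check from the list above can be automated with a small scan (a sketch; the `{VAR_NAME}` pattern matches this document's own template convention):

```typescript
// Returns placeholders referenced in a template but absent from the set of
// defined variables, so undefined references fail loudly instead of silently.
function undefinedPlaceholders(template: string, defined: Set<string>): string[] {
  const missing = new Set<string>();
  for (const m of template.matchAll(/\{([A-Z0-9_]+)\}/g)) {
    if (!defined.has(m[1])) missing.add(m[1]);
  }
  return [...missing];
}
```

Running this over each workflow file against the variables its setup phase assigns catches undefined references; the inverse check catches variables assigned but never used.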
@@ -174,8 +185,11 @@
  - Tests re-implementing logic under test instead of importing real exports — pass even when real code regresses
  - Tests depending on real wall-clock time or external dependencies when testing logic — use fake timers and mocks
  - Missing tests for trust-boundary enforcement — submit tampered values, verify server ignores them
+ - Tests that exercise code paths depending on features the integration layer doesn't expose — they pass against mocks but the behavior can't trigger in production. Verify mocked responses match what the real dependency actually returns
+ - Test mock state leaking between tests — mock setup APIs that configure return values often persist across tests even after clearing call history, because "clear" resets invocation counts but not configured behavior (use "reset" variants that restore original implementations). Conversely, per-call sequential mock responses couple tests to internal call count — prefer stable return values for behavior tests, sequential mocks only when verifying call order
  - Tests that pass but don't cover the changed code paths — passing unrelated tests is not validation
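The clear-vs-reset distinction above can be shown without any test framework (a sketch; Jest's `mockClear`/`mockReset` draw the same line):

```typescript
// Minimal mock: "clear" wipes call history only; "reset" also restores the
// original behavior, which is what prevents state leaking between tests.
function makeMock<R>(defaultImpl: () => R) {
  let impl = defaultImpl;
  const calls: unknown[][] = [];
  const fn = (...args: unknown[]): R => {
    calls.push(args);
    return impl();
  };
  return {
    fn,
    calls,
    configure(next: () => R) { impl = next; },
    clear() { calls.length = 0; },                     // history only: behavior leaks
    reset() { calls.length = 0; impl = defaultImpl; }, // history AND behavior
  };
}
```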
 
  **Style & conventions**
  - Naming and patterns consistent with the rest of the codebase
- - Formatting consistency within each file — new content must match existing indentation, bullet style, heading levels, and structure
+ - Formatting consistency within each file — new content must match existing indentation, bullet style, heading levels, and structure. For structured files that follow a convention across sibling files (changelogs, config files, migration files), verify new entries use the same section headers, field names, and ordering as existing siblings
+ - Shell/workflow instructions with destructive operations (branch deletion, file removal, force operations) must verify preconditions first — e.g., ensure you're not on a branch being deleted, confirm the target exists, and don't suppress stderr from commands where failures indicate real problems (auth errors, network issues)
@@ -12,7 +12,9 @@ You are a Copilot review loop agent.
  PR: {PR_NUMBER} in {OWNER}/{REPO}
  Branch: {BRANCH_NAME}
  Build command: {BUILD_CMD}
- Max iterations: 5
+ Max iterations: unlimited (loop until Copilot returns 0 comments)
+ Safety guardrail: after 10 iterations, report back and ask the user
+ whether to continue or stop — never loop indefinitely without confirmation.
 
  TIMEOUT SCHEDULE:
  When running parallel PR reviews (do:better), use shorter waits to avoid
@@ -28,8 +30,7 @@ that (minimum 5 minutes, maximum 20 minutes). Copilot reviews can take
  10-15 minutes for large diffs.
  Poll interval: 30 seconds for all iterations.
 
- Run the following loop until Copilot returns zero new comments or you hit
- the max iteration limit:
+ Run the following loop until Copilot returns zero new comments:
 
  1. CAPTURE the latest Copilot review submittedAt timestamp (so you can
  detect when a NEW review arrives):
@@ -75,10 +76,14 @@ the max iteration limit:
  - Resolve the thread via GraphQL mutation using stdin JSON piping:
  echo '{"query":"mutation { resolveReviewThread(input: {threadId: \"{THREAD_ID}\"}) { thread { id isResolved } } }"}' | gh api graphql --input -
  - After all threads resolved, push all commits to remote
- - Increment iteration counter and go back to step 1
+ - Increment iteration counter
+ - If iteration counter reaches 10, stop the loop and report back with
+ status "guardrail" — the parent agent will ask the user whether to
+ continue or stop
+ - Otherwise, go back to step 1
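The loop's control flow, including the guardrail exit, can be sketched as (a sketch; `fetchNewCommentCount` is a hypothetical stand-in for the capture/poll steps above):

```typescript
type LoopStatus = "clean" | "guardrail";

// Iterate until Copilot returns zero new comments, but stop at the
// guardrail and report back rather than looping without confirmation.
async function reviewLoop(
  fetchNewCommentCount: () => Promise<number>,
  guardrail = 10,
): Promise<{ status: LoopStatus; iterations: number }> {
  for (let iteration = 1; iteration <= guardrail; iteration++) {
    const comments = await fetchNewCommentCount();
    if (comments === 0) return { status: "clean", iterations: iteration };
    // ...address comments, resolve threads, push commits here...
  }
  return { status: "guardrail", iterations: guardrail };
}
```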
 
  When done, report back:
- - Final status: clean / max-iterations-reached / timeout / error
+ - Final status: clean / timeout / error / guardrail
  - Total iterations completed
  - List of commits made (if any)
  - Any unresolved threads remaining
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "slash-do",
- "version": "1.5.1",
+ "version": "1.6.1",
  "description": "Curated slash commands for AI coding assistants — Claude Code, OpenCode, Gemini CLI, and Codex",
  "author": "Adam Eivy <adam@eivy.com>",
  "license": "MIT",