npm - slash-do - Versions diffs - 1.4.0 → 1.4.2 - Mend

slash-do 1.4.0 → 1.4.2


Files changed (6)

package/commands/do/better.md +92 -52
package/commands/do/pr.md +0 -2
package/commands/do/review.md +24 -0
package/lib/code-review-checklist.md +82 -105
package/lib/copilot-review-loop.md +82 -41
package/package.json +1 -1

package/commands/do/better.md CHANGED

@@ -465,6 +465,8 @@ After creating all PRs, verify CI passes on each one:
 Maximum 5 iterations per PR to prevent infinite loops.
+**IMPORTANT — Sub-agent delegation**: To prevent context exhaustion on long review cycles with multiple PRs, delegate each PR's review loop to a **separate general-purpose sub-agent** via the Agent tool. Launch sub-agents in parallel (one per PR). Each sub-agent runs the full loop (request → wait → check → fix → re-request) autonomously and returns only the final status.
 ### 6.0: Verify browser authentication
 If `BROWSER_AUTHENTICATED` is not true (e.g., Phase 0e was skipped or failed):
@@ -472,71 +474,109 @@ If `BROWSER_AUTHENTICATED` is not true (e.g., Phase 0e was skipped or failed):
 2. Check for user avatar/menu
 3. If not logged in: navigate to `https://github.com/login`, inform the user **"Please log in to GitHub in the browser. I'll wait for you to confirm."**, and use `AskUserQuestion` to wait
-### 6.1: Request Copilot reviews on all PRs
-For each PR:
+### 6.1: Determine review request method
-**Try the API first:**
+**Try the API first** on any one PR:
 ```bash
-gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/requested_reviewers --method POST --input - <<< '{"reviewers":["copilot-pull-request-reviewer"]}'
+gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/requested_reviewers \
+  -f 'reviewers[]=copilot-pull-request-reviewer[bot]'
 ```
-If this returns 422 ("not a collaborator"), **fall back to Playwright** for each PR:
-1. Navigate to `{PR_URL}`
-2. Click the "Reviewers" gear button in the PR sidebar
-3. In the dropdown menu, click the Copilot `menuitemradio` option (not the "Request" button which may be obscured by the dropdown header)
-4. Verify the sidebar shows "Awaiting requested review from Copilot"
+If this returns 422 ("not a collaborator"), record `REVIEW_METHOD=playwright`. Otherwise record `REVIEW_METHOD=api`.
-### 6.2: Poll for review completion
+### 6.2: Launch parallel sub-agents (one per PR)
-**Dynamic poll timing**: Before your first poll, check how long the most recent Copilot review on this PR took by comparing consecutive Copilot review `submittedAt` timestamps (or PR creation time for the first review). Use that duration as your expected wait. If no prior review exists, default to 5 minutes. Set poll interval to 60 seconds and max wait to **2x the expected duration** (minimum 5 minutes, maximum 20 minutes). Copilot reviews can take **10-15 minutes** for large diffs — do NOT give up early.
+For each PR, spawn a general-purpose sub-agent with:
-For each PR, poll using GraphQL to check for a new Copilot review:
-```bash
-echo '{"query":"{ repository(owner: \"OWNER\", name: \"REPO\") { pullRequest(number: PR_NUM) { reviews(last: 5) { nodes { state body author { login } submittedAt } } reviewThreads(first: 100) { nodes { id isResolved comments(first: 3) { nodes { body path line author { login } } } } } } } }"}' | gh api graphql --input -
 ```
+You are a Copilot review loop agent for PR {PR_NUMBER}.
-The review is complete when a new `copilot-pull-request-reviewer[bot]` review appears with a `submittedAt` after your request. If no review appears after max wait, **ask the user** whether to continue waiting, re-request, or skip.
-**Error detection**: After a review appears, check its `body` for error text such as "Copilot encountered an error" or "unable to review this pull request". If found, this is NOT a successful review — log a warning, re-request the review (step 6.1), and resume polling from 6.2. Allow up to 3 error retries per PR before asking the user whether to continue or skip.
-### 6.3: Check for unresolved threads
-For each reviewed PR, fetch review threads via GraphQL using stdin JSON (**never use `$variables`** — shell expansion consumes `$` signs):
-```bash
-echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviewThreads(first: 100) { nodes { id isResolved comments(first: 10) { nodes { body path line author { login } } } } } } } }"}' | gh api graphql --input -
+Repository: {OWNER}/{REPO}
+Branch: better/{CATEGORY_SLUG}
+Build command: {BUILD_CMD}
+Review request method: {REVIEW_METHOD}
+Max iterations: 5
+DECREASING TIMEOUT SCHEDULE (shorter than single-PR review since multiple
+PRs are reviewed in parallel — see do:rpr for single-PR dynamic timing):
+- Iteration 1: max wait 5 minutes
+- Iteration 2: max wait 4 minutes
+- Iteration 3: max wait 3 minutes
+- Iteration 4: max wait 2 minutes
+- Iteration 5+: max wait 1 minute
+Poll interval: 30 seconds for all iterations.
+Run the following loop until Copilot returns zero new comments or you hit
+the max iteration limit:
+1. CAPTURE the latest Copilot review timestamp, then REQUEST a new review:
+   - First, capture the latest Copilot review timestamp via GraphQL:
+     echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviews(last: 20) { nodes { author { login } submittedAt } } } } }"}' | gh api graphql --input -
+   - Find the most recent submittedAt where author.login is
+     copilot-pull-request-reviewer[bot] and record as LAST_COPILOT_SUBMITTED_AT.
+   - If no prior Copilot review exists, record LAST_COPILOT_SUBMITTED_AT=NONE
+     and treat the next Copilot review as NEW regardless of timestamp.
+   - Then REQUEST:
+     If REVIEW_METHOD is "api":
+       gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/requested_reviewers \
+         -f 'reviewers[]=copilot-pull-request-reviewer[bot]'
+     If REVIEW_METHOD is "playwright":
+       Navigate to the PR URL, click the "Reviewers" gear button, click the
+       Copilot menuitemradio option, verify sidebar shows "Awaiting requested
+       review from Copilot"
+2. WAIT for the review (BLOCKING):
+   - Poll using stdin JSON piping (avoid shell-escaping issues):
+     echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviews(last: 5) { totalCount nodes { state body author { login } submittedAt } } reviewThreads(first: 100) { nodes { id isResolved comments(first: 3) { nodes { body path line author { login } } } } } } } }"}' | gh api graphql --input -
+   - Complete when a new copilot-pull-request-reviewer[bot] review appears
+     with submittedAt after LAST_COPILOT_SUBMITTED_AT captured in step 1
+     (or, if LAST_COPILOT_SUBMITTED_AT=NONE, when the first
+     copilot-pull-request-reviewer[bot] review for this loop appears)
+   - Use the DECREASING TIMEOUT for the current iteration number
+   - Error detection: if review body contains "Copilot encountered an error"
+     or "unable to review", re-request and resume. Max 3 error retries.
+   - If no review after max wait, report timeout and exit
+3. CHECK for unresolved threads:
+   Fetch threads via stdin JSON piping:
+     echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviewThreads(first: 100) { nodes { id isResolved comments(first: 10) { nodes { body path line author { login } } } } } } } }"}' | gh api graphql --input -
+   - Verify review was successful (no error text in body)
+   - If zero comments / no unresolved threads: report success and exit
+   - If unresolved threads exist: proceed to step 4
+4. FIX all unresolved threads:
+   For each unresolved thread:
+   - Read the referenced file and understand the feedback
+   - Evaluate: valid feedback → make the fix; informational/false positive →
+     resolve without changes
+   - If fixing:
+     git checkout better/{CATEGORY_SLUG}
+     # make changes
+     git add <specific files>
+     git commit -m "address Copilot review feedback"
+     git push
+   - Resolve thread via stdin JSON piping:
+     echo '{"query":"mutation { resolveReviewThread(input: {threadId: \"{THREAD_ID}\"}) { thread { id isResolved } } }"}' | gh api graphql --input -
+   - After all threads resolved, increment iteration and go back to step 1
+When done, report back:
+- Final status: clean / max-iterations-reached / timeout / error
+- Total iterations completed
+- List of commits made (if any)
+- Any unresolved threads remaining
 ```
-Save to `/tmp/better_threads_{PR_NUMBER}.json` for parsing.
-- If **no unresolved threads** → mark PR as ready to merge
-- If **unresolved threads exist** → proceed to 6.4 (fix)
-### 6.4: Fix unresolved threads
+Launch all PR sub-agents in parallel. Wait for all to complete.
-For each unresolved thread on each PR:
-1. Read the referenced file and understand the feedback
-2. Evaluate: is the feedback valid? Some Copilot comments are informational or about pre-existing patterns.
-   - **Valid feedback**: make the code fix
-   - **Informational/false positive**: resolve the thread without changes
-3. If fixing:
-   ```bash
-   git checkout better/{CATEGORY_SLUG}
-   # make changes
-   git add <specific files>
-   git commit -m "address review: {summary of change}"
-   git push
-   ```
-4. Resolve the thread via GraphQL mutation:
-   ```bash
-   echo '{"query":"mutation { resolveReviewThread(input: {threadId: \"{THREAD_ID}\"}) { thread { id isResolved } } }"}' | gh api graphql --input -
-   ```
+### 6.3: Handle sub-agent results
-After all threads are resolved on a PR:
-1. Increment that PR's `REVIEW_ITERATION`
-2. If `REVIEW_ITERATION >= 5`: inform the user "Reached max review iterations (5) on PR #{number}. Remaining issues may need manual review."
-3. Otherwise: re-request Copilot review on that PR and repeat from 6.2
+For each sub-agent result:
+- **clean**: mark PR as ready to merge
+- **timeout**: ask the user whether to continue waiting, re-request, or skip
+- **max-iterations-reached**: inform the user "Reached max review iterations (5) on PR #{number}. Remaining issues may need manual review."
+- **error**: inform the user and ask whether to retry or skip
-### 6.5: Merge
+### 6.4: Merge
 For each PR that has passed CI and review (in dependency order if applicable):
 ```bash
@@ -599,7 +639,7 @@ If merge fails (e.g., branch protection, merge conflicts from a prior PR):
 - **Push failure**: `git pull --rebase --autostash` then retry push
 - **CI failure on PR**: investigate logs, fix in a new commit, push, re-check (max 3 attempts per PR)
 - **Cross-PR dependency breakage**: add backward-compatible re-exports or move shared files to the PR that creates them
-- **Copilot timeout** (review not received within 3 min): inform user, offer to merge without review approval or wait longer
+- **Copilot timeout** (review not received within decreasing timeout window): inform user, offer to merge without review approval or wait longer
 - **Copilot review loop exceeds 5 iterations per PR**: stop iterating on that PR, inform user, proceed to merge
 - **Existing worktree found at startup**: ask user — resume (reuse worktree) or cleanup (remove and start fresh)
 - **No findings above LOW**: skip Phases 3-7, print "No actionable findings" with the LOW summary
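
The sub-agent prompt added in this file describes two rules in prose: a decreasing timeout schedule, and a "new review" check against `LAST_COPILOT_SUBMITTED_AT`. The following is a minimal TypeScript sketch of both rules, not code from the package. The helper names (`maxWaitMinutes`, `latestCopilotSubmittedAt`, `findNewCopilotReview`) and the `Review` type are hypothetical; the type only mirrors the `author { login }` and `submittedAt` fields requested by the GraphQL query in the prompt, and the login string is copied from the prompt text rather than verified against the API.

```typescript
// Hypothetical sketch; names and types are illustrative, not from slash-do.

// One node from the `reviews` GraphQL query shown in the prompt.
interface Review {
  author: { login: string } | null;
  submittedAt: string; // ISO 8601 timestamp
}

// Login exactly as written in the prompt text (assumed, not verified here).
const COPILOT_LOGIN = "copilot-pull-request-reviewer[bot]";

// Decreasing timeout schedule: 5, 4, 3, 2 minutes, then 1 minute for iteration 5+.
function maxWaitMinutes(iteration: number): number {
  return Math.max(1, 6 - Math.max(1, iteration));
}

// Most recent Copilot submittedAt, or null when no prior Copilot review exists
// (the prompt records that case as LAST_COPILOT_SUBMITTED_AT=NONE).
function latestCopilotSubmittedAt(reviews: Review[]): string | null {
  let latest: string | null = null;
  for (const r of reviews) {
    if (r.author?.login !== COPILOT_LOGIN) continue;
    if (latest === null || Date.parse(r.submittedAt) > Date.parse(latest)) {
      latest = r.submittedAt;
    }
  }
  return latest;
}

// A review counts as new when it is from Copilot and was submitted after the
// captured timestamp; with no prior review, any Copilot review is new.
function findNewCopilotReview(
  reviews: Review[],
  lastSubmittedAt: string | null,
): Review | undefined {
  return reviews.find(
    (r) =>
      r.author?.login === COPILOT_LOGIN &&
      (lastSubmittedAt === null ||
        Date.parse(r.submittedAt) > Date.parse(lastSubmittedAt)),
  );
}
```

Capturing the timestamp before requesting the review (step 1 of the prompt) is what makes the comparison in `findNewCopilotReview` safe: a review that lands between the request and the first poll still compares as newer than the captured value.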

package/commands/do/pr.md CHANGED

@@ -34,8 +34,6 @@ Before creating the PR, perform a thorough self-review. Read each changed file
 - Create a PR from `{current_branch}` to `{default_branch}`
 - Create a rich PR description
-**IMPORTANT**: During each fix cycle in the Copilot review loop below, after fixing all review comments and before pushing, also bump the patch version (`npm version patch --no-git-tag-version` or equivalent) and commit the version bump.
 !`cat ~/.claude/lib/copilot-review-loop.md`
 **Report the final status** to the user including PR URL and review outcome.

package/commands/do/review.md CHANGED

@@ -91,6 +91,30 @@ Check every file against this checklist:
 **Guard-before-cache ordering**
 - If a handler performs a pre-flight guard check (rate limit, quota, feature flag) before a cache lookup or short-circuit path, verify the guard doesn't block operations that would be served from cache without touching the guarded resource — restructure so cache hits bypass the guard
+**Sanitization/validation coverage**
+- If the PR introduces a new validation or sanitization function for a data field, trace every code path that writes to that field (create, update, import, sync, rename) — verify they all use the same sanitization. Partial application is the #1 way invalid data re-enters through an unguarded path
+**Bootstrap/initialization ordering**
+- If the PR adds resilience or self-healing code (dependency installers, auto-repair, migration runners), trace the execution order: does the main code path resolve or import the dependencies BEFORE the resilience code runs? If so, the bootstrapper never executes when it's needed most — restructure so verification/installation precedes resolution
+**Lock/flag exit-path completeness**
+- If a function sets a shared flag or lock (in-progress, mutex, status marker), trace every exit path — early returns, error catches, platform-specific guards, and normal completion — to verify the flag is cleared. A missed path leaves the system permanently locked
+**Operation-marker ordering**
+- If the PR writes completion markers, success flags, or status files, verify they are written AFTER the operation they attest to, not before. If the operation can fail after the marker write, consumers see false success. Also check that marker-dependent startup logic validates the marker's contents rather than treating presence as unconditional success
+**Real-time event vs response timing**
+- If a handler emits push notifications (WebSocket, SSE, pub/sub) AND returns an HTTP response, verify clients won't receive push events before the response that gives them context to interpret those events — especially when the response contains IDs or version numbers the event consumer needs
+**Intent vs implementation (meta-cognitive pass)**
+- For each label, comment, docstring, status message, or inline instruction that describes behavior, verify the code actually implements that behavior. A detection mechanism must query the data it claims to detect; a migration must create the target, not just delete the source
+- If the PR contains inline code examples, command templates, or query snippets, verify they are syntactically valid for their language — run a mental parse of each example. Watch for template placeholder format inconsistencies within and across files
+- If the PR modifies a value (identifier, parameter name, format convention, threshold, timeout) that is referenced in other files, trace all cross-references and verify they agree. This includes: reviewer usernames, API names, placeholder formats, GraphQL field names, operational constants
+- If the PR adds or reorders sequential steps/instructions, verify the ordering matches execution dependencies — readers following steps in order must not perform an action before its prerequisite
+**Formatting & structural consistency**
+- If the PR adds content to an existing file (list items, sections, config entries), verify the new content matches the file's existing indentation, bullet style, heading levels, and structure — rendering inconsistencies are the most common Copilot review finding
 ## Fix Issues Found
 For each issue found:
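
Two of the checklist items added in this hunk, "Lock/flag exit-path completeness" and "Operation-marker ordering", describe patterns that are easier to see in code. The following is a minimal TypeScript sketch under stated assumptions, not code from the package. Every name in it (`repairInProgress`, `runExclusive`, `markers`, `migrateThenMark`) is hypothetical, and the `Map` is an in-memory stand-in for marker files on disk.

```typescript
// Hypothetical sketch; all names below are illustrative, not from slash-do.

let repairInProgress = false; // shared in-progress flag

function isRepairInProgress(): boolean {
  return repairInProgress;
}

// Lock/flag exit-path completeness: the `finally` block clears the flag on
// every exit path of `work` (normal return or thrown error).
function runExclusive<T>(work: () => T): T | undefined {
  if (repairInProgress) {
    return undefined; // another caller owns the flag, so do not clear it here
  }
  repairInProgress = true;
  try {
    return work();
  } finally {
    repairInProgress = false;
  }
}

// Operation-marker ordering: the marker is written only after the operation
// it attests to has finished.
const markers = new Map<string, string>(); // stand-in for marker files

function migrateThenMark(name: string, migrate: () => void): void {
  migrate(); // may throw; in that case no marker is written
  markers.set(name, "done");
}
```

Placing the flag reset in `finally` rather than repeating it before each `return` means a newly added early return cannot reintroduce the "permanently locked" failure the checklist warns about.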

package/lib/code-review-checklist.md CHANGED

@@ -1,144 +1,121 @@
    **Hygiene**
-   - Leftover debug code (`console.log`, `debugger`, TODO/FIXME/HACK comments)
-   - Hardcoded secrets, API keys, or credentials
-   - Files that shouldn't be committed (.env, node_modules, build artifacts)
+   - Leftover debug code (`console.log`, `debugger`, TODO/FIXME/HACK), hardcoded secrets/credentials, and uncommittable files (.env, node_modules, build artifacts)
    - Overly broad changes that should be split into separate PRs
    **Imports & references**
-   - Every symbol used in the file is imported (missing imports → runtime crash)
-   - No unused imports introduced by the changes
+   - Every symbol used is imported (missing → runtime crash); no unused imports introduced
    **Runtime correctness**
-   - State/variables that are declared but never updated or only partially wired up (e.g. a state setter that's never called)
+   - Null/undefined access without guards, off-by-one errors, object spread of potentially-null values (spread of null is `{}`, silently discarding state)
+   - Data from external/user sources (parsed JSON, API responses, file reads) used without structural validation — guard against parse failures, missing properties, wrong types, and null elements before accessing nested values. When parsed data is optional enrichment, isolate failures so they don't abort the main operation
+   - Type coercion edge cases — `Number('')` is `0` not empty, `0` is falsy in truthy checks, `NaN` comparisons are always false; string comparison operators (`<`, `>`, `localeCompare`) do lexicographic, not semantic, ordering (e.g., `"10" < "2"`). Use explicit type checks (`Number.isFinite()`, `!= null`) and dedicated libraries (e.g., semver for versions) instead of truthy guards or lexicographic ordering when zero/empty are valid values or semantic ordering matters
+   - Functions that index into arrays without guarding empty arrays; state/variables declared but never updated or only partially wired up
+   - Shared mutable references — module-level defaults passed by reference mutate across calls (use `structuredClone()`/spread); `useCallback`/`useMemo` referencing a later `const` (temporal dead zone); object spread followed by unconditional assignment that clobbers spread values
    - Side effects during React render (setState, navigation, mutations outside useEffect)
-   - Off-by-one errors, null/undefined access without guards
-   - `JSON.parse` on user-editable or external files (config, settings, cache, package metadata) without error handling — corrupted files will crash the process. When the parsed data is optional enrichment (e.g., version info, display metadata), isolate the failure so it doesn't abort the main operation
-   - Accessing properties/methods on parsed JSON objects without verifying expected structure (e.g., `obj.arr.push()` when `arr` might not be an array)
-   - Iterating arrays from external/user-editable sources without guarding each element — a `null` or wrong-type entry throws `TypeError` when treated as an object
-   - Version/string comparisons using `!==` when semantic ordering matters — use proper semver comparison for version checks
-   - `Number('')` produces `0`, not empty — cleared numeric inputs must map to `undefined`/`null`, not `0`, which silently fails validation or sets wrong values
-   - Truthy checks on numeric values where `0` is valid (e.g., `days || 365` treats `0` as falsy) — use `!= null` or explicit undefined checks instead
-   - Functions that index into arrays (`arr[Math.floor(Math.random() * arr.length)]`) without guarding empty arrays — produces `undefined`/`NaN` when `arr.length === 0`
-   - Module-level default/config objects passed by reference to consumers — shared mutation across calls. Use `structuredClone()` or spread when handing out defaults
-   - `useCallback`/`useMemo` referencing a `const` declared later in the same function body — triggers temporal dead zone `ReferenceError`. Ensure dependency declarations appear before their dependents
-   - Object spread/merge followed by unconditional field assignment that clobbers spread values — e.g., `{...input.details, notes: notes || null}` silently overwrites `input.details.notes` even when `notes` is undefined. Only set fields when the overriding value is explicitly provided
-   **Async & UI state consistency**
-   - Optimistic UI state changes (view switches, navigation, success callbacks) before an async operation completes — if the operation fails or is cancelled (drag cancel, upload abort, form dismiss), the UI is stuck in the wrong state with no rollback. Handle both failure and cancellation paths to reset intermediate state
-   - `Promise.all` without try/catch — if any request rejects, the UI ends up partially loaded with an unhandled rejection. Wrap in try/catch with fallback/error state so the view remains usable
-   - Success callbacks (`onSaved()`, `onComplete()`) called unconditionally after an async call — check the return value or catch errors before calling the callback
-   - Debounced/cancelable async operations that don't reset loading state on all code paths (input cleared, stale response arrives, request fails) — loading spinners get stuck and stale results display. Use AbortController or request IDs to discard outdated responses and clear loading in every exit path (including early returns)
-   - Multiple UI state variables representing coupled data (coordinates + display name, selected item + dependent list) updated independently — actions that change one must update all related fields to prevent display/data mismatch
-   - Error notification at multiple layers (shared API client that auto-displays errors + component-level error handling) — verify exactly one layer is responsible for user-facing error messages to avoid duplicate toasts/alerts. Suppress lower-layer notifications when the caller handles its own error display
-   - Optimistic state updates using full-collection snapshots for rollback — if a second user action starts while the first is in-flight, rollback restores the snapshot and clobbers the second action's changes. Use per-item rollback and functional state updaters (`setState(prev => ...)`) after async gaps to avoid stale closures
-   - Child components maintaining local copies of parent-provided data for optimistic updates without propagating changes back — on unmount/remount the parent's stale cache is re-rendered. Sync optimistic changes to the parent via callback alongside local state, or trigger a data refetch on remount
+   **Async & state consistency**
+   - Optimistic state changes (view switches, navigation, success callbacks) before async completion — if the operation fails or is cancelled, the UI is stuck with no rollback. Check return values/errors before calling success callbacks. Handle both failure and cancellation paths
+   - Multiple coupled state variables updated independently — actions that change one must update all related fields; debounced/cancelable operations must reset loading state on every exit path (cleared, stale, failed, aborted)
+   - Error notification at multiple layers (shared API client + component-level) — verify exactly one layer owns user-facing error messages
+   - Optimistic updates using full-collection snapshots for rollback — a second in-flight action gets clobbered. Use per-item rollback and functional state updaters after async gaps; sync optimistic changes to parent via callback or trigger refetch on remount
+   - State updates guarded by truthiness of the new value (`if (arr?.length)`) — prevents clearing state when the source legitimately returns empty. Distinguish "no response" from "empty response"
+   - `Promise.all` without error handling — partial load with unhandled rejection. Wrap with fallback/error state
    **Resource management**
-   - Event listeners, socket handlers, subscriptions, and timers are cleaned up on unmount/teardown
-   - useEffect cleanup functions remove everything the effect sets up
+   - Event listeners, socket handlers, subscriptions, timers, and useEffect side effects are cleaned up on unmount/teardown
+   - Initialization functions (schedulers, pollers, listeners) that don't guard against multiple calls — creates duplicate instances. Check for existing instances before reinitializing
-   **HTTP status codes & error classification**
-   - Service functions that throw generic `Error` for client-caused conditions (not found, invalid input) — these bubble as 500 when they should be 400/404. Use typed error classes with explicit status codes
-   - Consistent error responses across similar endpoints — if one validates, all should
+   **Error handling**
+   - Service functions throwing generic `Error` for client-caused conditions — bubbles as 500 instead of 400/404. Use typed error classes with explicit status codes; ensure consistent error responses across similar endpoints
+   - Swallowed errors (empty `.catch(() => {})`), handlers that replace detailed failure info with generic messages, and error/catch handlers that exit cleanly (`exit 0`, `return`) without any user-visible output — surface a notification, propagate original context, and make failures look like failures
+   - Destructive operations in retry/cleanup paths assumed to succeed without their own error handling — if cleanup fails, retry logic crashes instead of reporting the intended failure
    **API & URL safety**
-   - User-supplied values interpolated into URL paths must use `encodeURIComponent()` — even if the UI restricts input, the API should be safe independently
-   - Route params (`:name`, `:id`) passed to services without validation — add format checks (regex, length limits) at the route level
-   - Data from external APIs or upstream services interpolated into shell commands, file paths, or subprocess arguments without validation — enforce expected format (e.g., regex allowlist) before passing to execution boundaries
-   - Path containment checks using string prefix comparison (`resolvedPath.startsWith(baseDir)`) without a path separator boundary — `baseDir + "evil/..."` passes the check. Use `path.relative()` (reject if starts with `..`) or append `path.sep` to the base
-   - Error/fallback responses that hardcode security headers (CORS, CSP) instead of using the centralized policy — error paths bypass security tightening applied to happy paths. Always reuse shared header middleware/constants
-   **Data exposure**
-   - API responses returning full objects that contain sensitive fields (secrets, tokens, passwords) — destructure and omit before sending. Check ALL response paths (GET, PUT, POST) not just one
-   - Comments/docs claiming data is never exposed while the code path does expose it
+   - User-supplied values interpolated into URL paths, shell commands, file paths, or subprocess arguments without encoding/validation — use `encodeURIComponent()` for URLs, regex allowlists for execution boundaries
+   - Route params passed to services without format validation; path containment checks using string prefix without path separator boundary (use `path.relative()`)
+   - Error/fallback responses that hardcode security headers instead of using centralized policy — error paths bypass security tightening
-   **Client/server trust boundary**
-   - Server trusting client-provided computed/derived values (scores, totals, correctness flags) when the server has the data to recompute them — strip client-provided scoring/summary fields and recompute server-side
-   - Validation schemas requiring clients to submit fields the server should own (e.g., `expected` answers, `correct` flags) — make these optional/omitted in submissions and derive them server-side
-   - API responses leaking answer keys or expected values that the client will later submit back — either strip before responding or use server-side nonce/seed verification
+   **Trust boundaries & data exposure**
+   - API responses returning full objects with sensitive fields — destructure and omit across ALL response paths (GET, PUT, POST, error, socket); comments/docs claiming data isn't exposed while the code path does expose it
+   - Server trusting client-provided computed/derived values (scores, totals, correctness flags) when the server can recompute them — strip and recompute server-side; don't require clients to submit fields the server should own
    **Input handling**
-   - Trimming values where whitespace is significant (API keys, tokens, passwords, base64) — only trim identifiers/names, not secret values
-   - Swallowed errors (empty `.catch(() => {})`) that hide failures from users — at minimum surface a notification on failure
-   - Endpoints that accept unbounded arrays/collections without an upper limit — large payloads can exceed request timeouts, exhaust memory, or create DoS vectors. Enforce a max size and return 400 when exceeded, or move large operations to background jobs
+   - Trimming values where whitespace is significant (API keys, tokens, passwords, base64) — only trim identifiers/names
+   - Endpoints accepting unbounded arrays/collections without upper limits — enforce max size or move to background jobs
    **Validation & consistency**
-   - New endpoints/schemas match validation standards of similar existing endpoints (check for field limits, required fields, types)
-   - New API routes have the same error handling patterns as existing routes
-   - If validation exists on one endpoint for a param, the same param on other endpoints needs the same validation
-   - Schema fields that accept values the rest of the system can't handle (e.g., a field accepts any string but downstream code requires a specific format)
-   - Zod/schema stripping fields the service actually reads — when Zod uses `.strict()` or strips unknown keys, any field the service reads from the validated object must be declared in the schema, otherwise it's silently `undefined`
-   - Config values accepted by the API and persisted but silently ignored by the implementation — trace each config field through schema → service → generator/consumer to verify it's actually used (e.g., a `startRange` saved to config but the generator hardcodes a range)
-   - Handlers/functions that read properties from framework-provided objects (request, event, context) using a field name the framework doesn't populate — results in silent `undefined`. Verify the property name matches the caller's contract, not just the handler's assumption
-   - Numeric query params (`limit`, `offset`, `page`) parsed from strings without lower-bound clamping — `parseInt` can produce 0, negative, or `NaN` values that cause SQL errors or unexpected behavior. Always clamp to safe bounds (e.g., `Math.max(1, ...)`)
-   - Summary counters/accumulators that miss edge cases — if an item is removed, is the count updated? Are all branches counted?
-   - Silent operations in verbose sequences — when a series of operations each prints a status line, ensure all branches print consistent output
-   - UI elements hidden from navigation (filtered tabs, conditional menu items) but still accessible via direct URL — enforce access restrictions at the route/handler level, not just visibility
-   - Labels, comments, or status messages that describe behavior the code doesn't implement — e.g., a map named "renamed" that only deletes, or an action labeled "migrated" that never creates the target
-   - Registering references (config entries, settings pointers) to files or resources without verifying the resource actually exists — a failed download or missing file leaves dangling references that break later operations
-   - Error/catch handlers that exit cleanly (`exit 0`, `return`) without any user-visible output — makes failures look like successes; always print a skip/warning message explaining why the operation was skipped
+   - New endpoints/schemas should match validation patterns of existing similar endpoints — field limits, required fields, types, error handling. If validation exists on one endpoint for a param, the same param on other endpoints needs the same validation
+   - When a validation/sanitization function is introduced for a field, trace ALL write paths (create, update, sync, import) — partial application means invalid values re-enter through the unguarded path
+   - Schema fields accepting values downstream code can't handle; Zod/schema stripping fields the service reads (silent `undefined`); config values persisted but silently ignored by the implementation — trace each field through schema → service → consumer
+   - Handlers reading properties from framework-provided objects using field names the framework doesn't populate — silent `undefined`. Verify property names match the caller's contract
+   - Numeric values from strings used without `NaN`/type guards — `NaN` comparisons silently pass bounds checks. Clamp query params to safe lower bounds
+   - UI elements hidden from navigation but still accessible via direct URL — enforce restrictions at the route level
+   - Summary counters/accumulators that miss edge cases (removals, branch coverage); silent operations in verbose sequences where all branches should print status
+   **Intent vs implementation**
+   - Labels, comments, status messages, or documentation that describe behavior the code doesn't implement — e.g., a map named "renamed" that only deletes, or an action labeled "migrated" that never creates the target
+   - Inline code examples, command templates, and query snippets that aren't syntactically valid as written — template placeholders must use a consistent format, queries must use correct syntax for their language (e.g., single `{}` in GraphQL, not `{{}}`)
+   - Cross-references between files (identifiers, parameter names, format conventions, operational thresholds) that disagree — when one reference changes, trace all other files that reference the same entity and update them
+   - Sequential instructions or steps whose ordering doesn't match the required execution order — readers following in order will perform actions at the wrong time (e.g., "record X" in step 2 when X must be captured before step 1's action)
+   - Sequential numbering (section numbers, step numbers) with gaps or jumps after edits — verify continuity
+   - Completion markers, success flags, or status files written before the operation they attest to finishes — consumers see false success if the operation fails after the write
+   - Existence checks (directory exists, file exists, module resolves) used as proof of correct/complete installation — a directory can exist but be empty, a file can exist with invalid contents. Verify the specific resource the consumer needs
+   - Tracking/checkpoint files that default to empty on parse failure — causes full re-execution. Fail loudly instead
+   - Registering references to resources without verifying the resource exists — dangling references after failed operations
    **Concurrency & data integrity**
-   - Shared mutable state (files, in-memory caches) accessed by concurrent requests without locking or atomic writes
-   - Multi-step read-modify-write cycles on files or databases that can interleave with other requests
-   - Multi-table writes (e.g., parent row + relationship/link rows) without a transaction — FK violations or errors after the first insert leave partial state. Wrap all related writes in a single transaction
-   - Functions with early returns for "no primary fields to update" that silently skip secondary operations (relationship updates, link table writes) — ensure early-return guards don't bypass logic that should run independently of primary field changes
+   - Shared mutable state accessed by concurrent requests without locking or atomic writes; multi-step read-modify-write cycles that can interleave
+   - Multi-table writes without a transaction — FK violations or errors leave partial state
+   - Functions with early returns for "no primary fields to update" that silently skip secondary operations (relationship updates, link writes)
+   - Functions that acquire shared state (locks, flags, markers) with exit paths that skip cleanup — leaves the system permanently locked. Trace all exit paths including error branches
    **Search & navigation**
-   - Search results that link to generic list pages instead of deep-linking to the specific record — include the record type and ID in the URL
-   - Search or query code that hardcodes one backend's implementation when the system supports multiple backends — use the active backend's capabilities so results aren't stale after a backend switch. Also check that option/parameter names are mapped between backends (e.g., `ftsWeight` vs `bm25Weight`) so configuration isn't silently ignored
+   - Search results linking to generic list pages instead of deep-linking to the specific record
+   - Search/query code hardcoding one backend's implementation when the system supports multiple — verify option/parameter names are mapped between backends
    **Sync & replication**
-   - Upsert/`ON CONFLICT UPDATE` clauses that only update a subset of the fields exported by the corresponding "get changes" query — omitted fields cause replicas to diverge. Deliberately omit only fields that should stay local (e.g., access stats), and document the decision
-   - Pagination using `COUNT(*)` to compute `hasMore` — this forces a full table scan on large tables. Use the `limit + 1` pattern: fetch one extra row to detect more pages, return only `limit` rows
-   - Pagination endpoints that return a `next` token but don't accept one as input (or vice versa) — clients can't retrieve pages beyond the first. Also check that hard-capped query limits (e.g., `Limit: 100`) don't silently truncate results when offset exceeds the cap
+   - Upsert/`ON CONFLICT UPDATE` updating only a subset of exported fields — replicas diverge. Document deliberately omitted fields
+   - Pagination using `COUNT(*)` (full table scan) instead of `limit + 1`; endpoints missing `next` token input/output; hard-capped limits silently truncating results
    **SQL & database**
-   - Parameterized query placeholder indices (`$1`, `$2`, ...) must match the actual parameter array positions — especially when multiple queries share a param builder or when the index is computed dynamically
-   - Database triggers (e.g., `BEFORE UPDATE` setting `updated_at = NOW()`) that clobber explicitly-provided values — verify triggers don't interfere with replication/sync that sets fields to remote timestamps
-   - Auto-incrementing columns (`BIGSERIAL`, `SERIAL`) only auto-increment on INSERT, not UPDATE — if change-tracking relies on a sequence column, the UPDATE path must explicitly call `nextval()` to bump it
-   - Database functions that require specific extensions or minimum versions — verify the deployment target supports them and the init script enables the extension
-   - Full-text search with strict query parsers (`to_tsquery`) directly on user input — punctuation, quotes, and operators cause SQL errors. Use `websearch_to_tsquery` or `plainto_tsquery` for user-facing search
-   - Query results assigned to variables but never read — remove dead queries to avoid unnecessary database load
-   - N+1 query patterns inside transactions (SELECT + INSERT/UPDATE per row) — use batched upserts (`INSERT ... ON CONFLICT ... DO UPDATE`) to reduce round-trips and lock time
-   - `CREATE TABLE IF NOT EXISTS` used as the sole schema migration strategy — it won't add new columns, indexes, or triggers to existing tables on upgrade. Use `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` or a migration framework for schema evolution
-   - O(n²) algorithms (self-joins, all-pairs comparisons, nested loops over full tables) triggered per-request on data that grows over time — these become prohibitive at scale. Add caps, use indexed lookups, or move to background jobs
+   - Parameterized query placeholder indices must match parameter array positions — especially with shared param builders or computed indices
+   - Database triggers clobbering explicitly-provided values; auto-incrementing columns that only increment on INSERT, not UPDATE
+   - Full-text search with strict parsers (`to_tsquery`) on user input — use `websearch_to_tsquery` or `plainto_tsquery`
+   - Dead queries (results never read), N+1 patterns inside transactions, O(n²) algorithms on growing data
+   - `CREATE TABLE IF NOT EXISTS` as sole migration strategy — won't add columns/indexes on upgrade. Use `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` or a migration framework
+   - Functions/extensions requiring specific database versions without verification
    **Lazy initialization & module loading**
-   - Cached state getters that return `null`/`undefined` before the module is initialized — code that checks the cached value before triggering initialization will get incorrect results. Provide an async initializer or ensure-style function
-   - Re-exporting constants from heavy modules defeats lazy loading — define shared constants in a lightweight module or inline them
-   - Module-level side effects (file reads, JSON.parse, SDK client init) that run on import without error handling — a corrupted file or missing credential crashes the entire process before any request is served. Wrap module-level init in try/catch and degrade gracefully
+   - Cached state getters returning null before initialization — provide async initializer or ensure-style function
+   - Module-level side effects (file reads, SDK init) without error handling — corrupted files crash the process on import
+   - Bootstrap/resilience code that imports the dependencies it's meant to install — restructure so installation precedes resolution
+   - Re-exporting from heavy modules defeats lazy loading — use lightweight shared modules
    **Data format portability**
-   - Values that cross serialization boundaries (JSON API → database, peer sync) may change format — e.g., arrays in JSON vs specialized string literals in the database. Convert consistently before writing to the target
-   - Database BIGINT/BIGSERIAL values parsed into JavaScript `Number` via `parseInt` or `Number()` — precision is lost past `Number.MAX_SAFE_INTEGER`, silently corrupting IDs, sequence cursors, or pagination tokens. Use string representation or `BigInt` for large integer columns
-   **Shell script safety**
-   - Subprocess calls in shell scripts under `set -e` — if the subprocess fails, the script aborts. Also check non-critical writes (e.g., `echo` to stdout) which fail on broken pipes and trigger exit — use `|| true` for non-critical output
-   - Detached/background child processes spawned with piped stdio — if the parent exits (restart, crash), pipes close and writes cause SIGPIPE. Redirect stdio to log files or use `'ignore'` for children that must outlive the parent
-   - When the same data structure is manipulated in both application code and shell-inline scripts, apply identical guards in both places
+   - Values crossing serialization boundaries may change format (arrays in JSON vs string literals in DB) — convert consistently
+   - BIGINT values parsed into JavaScript `Number` — precision lost past `MAX_SAFE_INTEGER`. Use strings or `BigInt`
-   **Cross-platform compatibility**
-   - Platform-specific execution assumptions — hardcoded shell interpreters (`bash`, `sh`), `path.join()` producing backslashes that break ESM `import()` or URL-based APIs on Windows, platform-gated scripts without fallback or clear error. Use `pathToFileURL()` for dynamic imports, check `process.platform` for shell dispatch
+   **Shell & portability**
+   - Subprocess calls under `set -e` abort on failure; non-critical writes fail on broken pipes — use `|| true` for non-critical output
+   - Detached child processes with piped stdio — parent exit causes SIGPIPE. Redirect to log files or use `'ignore'`
+   - Platform-specific assumptions — hardcoded shell interpreters, `path.join()` backslashes breaking ESM imports. Use `pathToFileURL()` for dynamic imports
    **Test coverage**
-   - New validation schemas, service functions, or business logic added without corresponding tests — especially when the project already has a test suite covering similar existing code
-   - New error paths (404, 400) that are untestable because the service throws generic errors instead of typed/status-coded ones
-   - Tests that re-implement the logic under test instead of importing real exports — these pass even when the real code regresses. Import and call the actual functions
-   - Missing tests for trust-boundary enforcement — if the server strips/recomputes client-provided fields, add a test that submits tampered values and verifies the server ignores them
-   - Tests that depend on real wall-clock time (`setTimeout`, `Date.now`, network delays) for rate limiters, debounce, or scheduling — slow under normal conditions and flaky under CI load. Use fake timers or time mocking
+   - New logic/schemas/services without corresponding tests when similar existing code has tests
+   - New error paths untestable because services throw generic errors instead of typed ones
+   - Tests re-implementing logic under test instead of importing real exports — pass even when real code regresses
+   - Tests depending on real wall-clock time or external dependencies when testing logic — use fake timers and mocks
+   - Missing tests for trust-boundary enforcement — submit tampered values, verify server ignores them
    **Accessibility**
-   - Interactive elements (buttons, toggles, custom controls) missing accessible names, roles, or ARIA states — including programmatically disabled interactions that don't reflect the disabled state visually or via `aria-disabled` (e.g., drag handles that appear interactive but are inert during async operations)
-   - Custom toggle/switch UI built from `<button>` or `<div>` instead of native inputs with appropriate labeling
+   - Interactive elements missing accessible names, roles, or ARIA states — including disabled interactions without `aria-disabled`
+   - Custom toggle/switch UI built from non-semantic elements instead of native inputs
    **Configuration & hardcoding**
-   - Hardcoded values (usernames, org names, limits) when a config field or env var already exists for that purpose
-   - Dead config fields that nothing reads — either wire them up or remove them
-   - Function parameters that are accepted but never used — creates a false API contract; remove unused params or implement the intended behavior
-   - Duplicated config/constants/utility helpers across modules — extract to a single shared module to prevent drift (watch for circular imports when choosing the shared location)
-   - CI pipelines that install dependencies without lockfile pinning (`npm install` instead of `npm ci`) or that ad-hoc install packages without version constraints — creates non-deterministic builds that can break unpredictably
+   - Hardcoded values when a config field or env var already exists; dead config fields nothing consumes; unused function parameters creating false API contracts
+   - Duplicated config/constants/utilities across modules — extract to shared module to prevent drift
+   - CI pipelines installing without lockfile pinning or version constraints — non-deterministic builds
    **Style & conventions**
    - Naming and patterns consistent with the rest of the codebase
-   - Missing error handling at system boundaries (user input, external APIs)
+   - Formatting consistency within each file — new content must match existing indentation, bullet style, heading levels, and structure
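
The checklist item above on numeric values parsed from strings can be illustrated with a minimal sketch. The helper name `clampInt` and the bounds and defaults used here are hypothetical and do not come from slash-do; they only show one way to combine a `NaN` guard with clamping.

```javascript
// Hypothetical helper, not part of slash-do: parse a numeric query param,
// fall back to a default when it is not a number, then clamp to safe bounds.
function clampInt(raw, { min, max, fallback }) {
  const parsed = Number.parseInt(String(raw), 10);
  // NaN fails every comparison, so a naive `parsed < min` check is false
  // and the bad value slips through. Guard for it explicitly.
  if (Number.isNaN(parsed)) return fallback;
  return Math.min(max, Math.max(min, parsed));
}

// Illustrative values only.
const limit = clampInt("abc", { min: 1, max: 100, fallback: 20 });
const offset = clampInt("-5", { min: 0, max: Number.MAX_SAFE_INTEGER, fallback: 0 });
```

Applying the same helper on every endpoint that accepts `limit` or `offset` also addresses the earlier item about keeping validation consistent across endpoints.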

package/lib/copilot-review-loop.md CHANGED

@@ -1,46 +1,87 @@
 ## Copilot Code Review Loop
-After the PR is created, run the Copilot review-and-fix loop:
-1. **Request a Copilot review via API**
-   ```bash
-   gh api repos/OWNER/REPO/pulls/PR_NUM/requested_reviewers -f 'reviewers[]=copilot-pull-request-reviewer[bot]'
-   ```
-   **CRITICAL**: The reviewer name MUST include the `[bot]` suffix. Without it, the API returns a 422 "not a collaborator" error.
-   - For **public repos**: Copilot review may trigger automatically on PR creation — check if a review already exists before requesting
-   - If no Copilot reviewer is configured at all, inform the user and skip this loop
-2. **Wait for the review to complete (BLOCKING — do not skip or proceed early)**
-   - Record the current review count and latest `submittedAt` timestamp before waiting
-   - Poll using `gh api graphql` to check the `reviews` array for a NEW review node (compare `submittedAt` timestamps or count):
-     ```bash
-     gh api graphql -f query='{ repository(owner: "OWNER", name: "REPO") { pullRequest(number: PR_NUM) { reviews(last: 3) { nodes { state body author { login } submittedAt } } reviewThreads(first: 100) { nodes { id isResolved comments(first: 3) { nodes { body path line author { login } } } } } } } }'
-     ```
-   - The review is complete when a new Copilot review node appears with a `submittedAt` after your latest push
-   - **Error detection**: After a review appears, check the review `body` for error text such as "Copilot encountered an error" or "unable to review this pull request". If the review body contains this error, it is NOT a successful review — re-request the review (step 1) and resume polling. Log a warning so the user knows a retry occurred. Apply a maximum of 3 error retries before asking the user whether to continue waiting or skip.
-   - **Do NOT proceed until the re-requested review has actually posted** — "Awaiting requested review" means it is still in progress
-   - **Dynamic poll timing**: Before your first poll, check how long the most recent Copilot review on this PR took by comparing its `submittedAt` to the previous review's `submittedAt` (or to the PR creation time if it was the first review). Use that duration as your expected wait time. If no prior review exists, default to 5 minutes. Set poll interval to 60 seconds and max wait to **2x the expected duration** (minimum 5 minutes, maximum 20 minutes).
-   - Copilot reviews can take **10-15 minutes** for large diffs — do NOT give up early
-   - If no review appears after the max wait time, **ask the user** whether to continue waiting, re-request the review, or skip — **never proceed without user approval when the review loop fails**
-   - If the review request silently disappears (reviewRequests becomes empty without a review being posted), re-request the review once and resume polling
-3. **Check for unresolved comments**
-   - Filter review threads for `isResolved: false`
-   - **First, verify the review was successful**: check that the latest Copilot review body does NOT contain "Copilot encountered an error" or "unable to review". If it does, this is an error response — go back to step 1 (re-request) instead of proceeding. This check is critical because error reviews have no comments and no unresolved threads, making them look identical to a clean review.
-   - Also count the total comments in the latest review (check the review body for "generated N comments")
-   - If the latest review has **zero comments** (body says "generated 0 comments" or no unresolved threads exist): the PR is clean — exit the loop
-   - If **there are unresolved comments**: proceed to fix them (step 4)
-4. **Fix all unresolved review comments**
+After the PR is created, run the Copilot review-and-fix loop.
+**IMPORTANT — Sub-agent delegation**: To prevent context exhaustion on long review cycles, delegate the entire review loop to a **general-purpose sub-agent** via the Agent tool. The sub-agent runs the full loop (request → wait → check → fix → re-request) autonomously and returns only the final status. This keeps the parent agent's context clean.
+### Sub-agent prompt template:
+```
+You are a Copilot review loop agent.
+PR: {PR_NUMBER} in {OWNER}/{REPO}
+Branch: {BRANCH_NAME}
+Build command: {BUILD_CMD}
+Max iterations: 5
+TIMEOUT SCHEDULE:
+When running parallel PR reviews (do:better), use shorter waits to avoid
+blocking other PRs:
+- Iteration 1: max wait 5 minutes
+- Iteration 2: max wait 4 minutes
+- Iteration 3: max wait 3 minutes
+- Iteration 4: max wait 2 minutes
+- Iteration 5+: max wait 1 minute
+When running a single-PR review (do:pr, do:release), use dynamic timing:
+check the previous Copilot review duration on this PR and wait up to 2x
+that (minimum 5 minutes, maximum 20 minutes). Copilot reviews can take
+10-15 minutes for large diffs.
+Poll interval: 30 seconds for all iterations.
+Run the following loop until Copilot returns zero new comments or you hit
+the max iteration limit:
+1. CAPTURE the latest Copilot review submittedAt timestamp (so you can
+   detect when a NEW review arrives):
+   echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviews(last: 5) { nodes { author { login } submittedAt } } } } }"}' | gh api graphql --input -
+   Record the most recent submittedAt from copilot-pull-request-reviewer[bot].
+   Then REQUEST a Copilot review:
+   gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/requested_reviewers \
+     -f 'reviewers[]=copilot-pull-request-reviewer[bot]'
+   CRITICAL: The reviewer name MUST include the [bot] suffix.
+   - For public repos: check if a review already exists before requesting
+   - If no Copilot reviewer is configured, report back and exit
+2. WAIT for the review to complete (BLOCKING):
+   - Poll using stdin JSON piping to avoid shell-escaping issues:
+     echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviews(last: 5) { totalCount nodes { state body author { login } submittedAt } } reviewThreads(first: 100) { nodes { id isResolved comments(first: 3) { nodes { body path line author { login } } } } } } } }"}' | gh api graphql --input -
+   - The review is complete when a new Copilot review node appears with a
+     submittedAt after the timestamp captured in step 1
+   - For parallel PR reviews (do:better): use the DECREASING TIMEOUT for
+     the current iteration number
+   - For single-PR reviews (do:pr, do:release): use dynamic timing based on
+     the previous Copilot review duration on this PR (2x that, min 5 min,
+     max 20 min)
+   - Error detection: if the review body contains "Copilot encountered an
+     error" or "unable to review this pull request", re-request (step 1)
+     and resume polling. Max 3 error retries before reporting failure.
+   - If no review appears after max wait, report the timeout — the parent
+     agent will ask the user what to do
+3. CHECK for unresolved comments:
+   - Filter review threads for isResolved: false
+   - First verify the review was successful: check that the latest Copilot
+     review body does NOT contain error text. If it does, go back to step 1.
+   - If zero comments (body says "generated 0 comments" or no unresolved
+     threads): PR is clean — report success and exit
+   - If unresolved comments exist: proceed to step 4
+4. FIX all unresolved review comments:
    For each unresolved thread:
    - Read the referenced file and understand the feedback
    - Make the code fix
-   - Run the build (`npm run build` or the project's build command)
-   - If build passes, commit with message `address review: <summary of changes>`
-   - Resolve the thread via GraphQL mutation:
-     ```bash
-     gh api graphql -f query='mutation { resolveReviewThread(input: {threadId: "THREAD_ID"}) { thread { id isResolved } } }'
-     ```
-   - After all threads are resolved, push all commits to remote
-   - **Re-request a Copilot review** via API (same command as step 1)
-   - **Go back to step 2** (wait for new review) — this loop MUST repeat until Copilot returns a review with zero new comments. Never proceed after only one round of fixes.
+   - Run the build command
+   - If build passes, commit: address review: <summary>
+   - Resolve the thread via GraphQL mutation using stdin JSON piping:
+     echo '{"query":"mutation { resolveReviewThread(input: {threadId: \"{THREAD_ID}\"}) { thread { id isResolved } } }"}' | gh api graphql --input -
+   - After all threads resolved, push all commits to remote
+   - Increment iteration counter and go back to step 1
+When done, report back:
+- Final status: clean / max-iterations-reached / timeout / error
+- Total iterations completed
+- List of commits made (if any)
+- Any unresolved threads remaining
+```
+Launch the sub-agent and wait for its result. If the sub-agent reports a timeout or error, **ask the user** whether to continue waiting, re-request the review, or skip — never proceed without user approval when the review loop fails.
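
The completion and error checks in steps 2 and 3 of the template can be sketched as a pure function over the poll response. This is an illustration under stated assumptions, not code from slash-do: `classifyPoll` is a hypothetical name, `nodes` is assumed to be the `reviews.nodes` array returned by the GraphQL query, and matching the author by login prefix is an assumption, since the login may or may not carry the `[bot]` suffix in that response.

```javascript
// Hypothetical helper, not part of slash-do.
// Error strings are the ones named in the prompt template.
const ERROR_MARKERS = [
  "Copilot encountered an error",
  "unable to review this pull request",
];

// nodes: assumed shape [{ author: { login }, submittedAt, body }, ...]
// baselineSubmittedAt: timestamp captured in step 1, or null if none existed.
function classifyPoll(nodes, baselineSubmittedAt) {
  const baseline = baselineSubmittedAt ? Date.parse(baselineSubmittedAt) : -Infinity;
  const fresh = nodes
    // Assumption: match by prefix so both login spellings are accepted.
    .filter((n) => n.author && n.author.login.startsWith("copilot-pull-request-reviewer"))
    // Only reviews submitted after the captured baseline count as new.
    .filter((n) => Date.parse(n.submittedAt) > baseline)
    .sort((a, b) => Date.parse(b.submittedAt) - Date.parse(a.submittedAt));

  if (fresh.length === 0) return { status: "pending" };

  const latest = fresh[0];
  const body = latest.body || "";
  // An error review has no comments, so it must be told apart from a clean one.
  if (ERROR_MARKERS.some((marker) => body.includes(marker))) {
    return { status: "error", review: latest };
  }
  return { status: "complete", review: latest };
}
```

A caller would re-request the review on `"error"`, keep polling on `"pending"`, and move on to the unresolved-thread check on `"complete"`.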

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "slash-do",
-  "version": "1.4.0",
+  "version": "1.4.2",
   "description": "Curated slash commands for AI coding assistants — Claude Code, OpenCode, Gemini CLI, and Codex",
   "author": "Adam Eivy <adam@eivy.com>",
   "license": "MIT",