slash-do 1.4.3 → 1.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -49,6 +49,17 @@ When the resolved model is `opus`, **omit** the `model` parameter on the Agent/T
 
  Opus reduces false positives in audit (judgment-heavy). Sonnet is the floor for code-writing agents (remediation). Haiku works for fast first-pass pattern scanning but may produce more false positives — remediation agents (Sonnet+) validate before fixing.
 
+ ## Compaction Guidance
+
+ When compacting during this workflow, always preserve:
+ - The `FILE_OWNER_MAP` (complete, not summarized)
+ - All CRITICAL/HIGH findings with file:line references
+ - The current phase number and what phases remain
+ - All PR numbers and URLs created so far
+ - `BUILD_CMD`, `TEST_CMD`, `PROJECT_TYPE`, `WORKTREE_DIR` values
+ - `VCS_HOST`, `CLI_TOOL`, `DEFAULT_BRANCH`, `CURRENT_BRANCH`
+
+
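For example, a compacted summary satisfying this list might carry a state block like the following (all values illustrative, not from a real run):

```
FILE_OWNER_MAP: {"server/index.js": "security", "db/query.js": "arch-bugs"}   # complete, verbatim
Findings: CRITICAL server/index.js:42 hardcoded API key; HIGH db/query.js:17 SQL injection
Phase: 3 (remaining: 4, 4b, 5, 6, 7)
PRs: none created yet
BUILD_CMD: npm run build | TEST_CMD: npm test | PROJECT_TYPE: node | WORKTREE_DIR: ../better-20250101
VCS_HOST: github | CLI_TOOL: gh | DEFAULT_BRANCH: main | CURRENT_BRANCH: main
```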
  ## Phase 0: Discovery & Setup
 
  Detect the project environment before any scanning or remediation.
@@ -88,15 +99,8 @@ Record as `BUILD_CMD` and `TEST_CMD`.
  - Check for `.changelog/` directory → `HAS_CHANGELOG`
  - Check for existing `../better-*` worktrees: `git worktree list`. If found, inform the user and ask whether to resume (use existing worktree) or clean up (remove it and start fresh)
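The worktree check above can be sketched as follows (the `better-` naming and porcelain parsing are assumptions; a captured sample stands in for live `git` output so the sketch runs standalone):

```shell
# Parse `git worktree list --porcelain`-style output for better-* worktrees.
# `sample` is illustrative data, not a real repository listing.
sample='worktree /repo
branch refs/heads/main

worktree /repo-parent/better-20250101
branch refs/heads/better/20250101'

found=$(printf '%s\n' "$sample" | awk '$1 == "worktree" && $2 ~ /better-/ { print $2 }')
if [ -n "$found" ]; then
  echo "existing worktree found: $found"  # ask the user: resume or clean up?
fi
```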
 
- ### 0e: Browser Authentication (GitHub only)
- If `VCS_HOST` is `github`, proactively verify browser authentication for the Copilot review loop later:
- 1. Navigate to the repo URL using `browser_navigate` via Playwright MCP
- 2. Take a snapshot and check for user avatar/menu indicating logged-in state
- 3. If NOT logged in: navigate to `https://github.com/login`, inform the user **"Please log in to GitHub in the browser. I'll wait for you to complete authentication."**, and use `AskUserQuestion` to wait for the user to confirm they've logged in
- 4. Do NOT close the browser — it stays open for the entire session
- 5. Record `BROWSER_AUTHENTICATED = true` once confirmed
 
- This ensures the browser is ready before we need it in Phase 6, avoiding interruptions mid-flow.
+ <audit_instructions>
 
  ## Phase 1: Unified Audit
 
@@ -107,7 +111,7 @@ Launch 7 Explore agents in two batches. Each agent must report findings in this
  - **[CRITICAL/HIGH/MEDIUM/LOW]** `file:line` - Description. Suggested fix: ... Complexity: Simple/Medium/Complex
  ```
 
- **IMPORTANT: Context requirement for audit agents.** When flagging an issue, agents MUST read at least 30 lines of surrounding context to confirm the issue is real. Common false positives to watch for:
+ **Context requirement.** Before flagging, read at least 30 lines of surrounding context to confirm the issue is real. Common false positives to watch for:
  - A Promise `.then()` chain that appears "unawaited" but IS collected into an array and awaited via `Promise.all` downstream
  - A value that appears "unvalidated" but IS checked by a guard clause earlier in the function or by the caller
  - A pattern that looks like an anti-pattern in isolation but IS idiomatic for the specific framework or library being used
@@ -115,6 +119,17 @@ Launch 7 Explore agents in two batches. Each agent must report findings in this
 
  If the surrounding context shows the code is correct, do NOT flag it.
 
+ If uncertain whether something is a genuine issue, report it as **[UNCERTAIN]** with your reasoning. The consolidation phase will evaluate these separately. Fewer confident findings is better than padding with questionable ones.
+
+ <approach>
+ For each potential finding:
+ 1. Read the file and 30+ lines of surrounding context
+ 2. Quote the specific code that demonstrates the issue
+ 3. Explain why it's a problem given the context
+ 4. Only then classify severity and suggest a fix
+ Skip step 4 if steps 1-3 reveal the code is correct.
+ </approach>
+
  ### Batch 1 (5 parallel Explore agents via Task tool):
 
  **Model**: Pass `AUDIT_MODEL` as the `model` parameter on each agent. If `AUDIT_MODEL` is `opus`, omit the parameter to inherit from session.
@@ -122,6 +137,8 @@ If the surrounding context shows the code is correct, do NOT flag it.
  1. **Security & Secrets**
  Sources: authentication checks, credential exposure, infrastructure security, input validation, dependency health
  Focus: hardcoded credentials, API keys, exposed secrets, authentication bypasses, disabled security checks, PII exposure, injection vulnerabilities (SQL/command/path traversal), insecure CORS configurations, missing auth checks, unsanitized user input in file paths or queries, known CVEs in dependencies (check `npm audit` / `cargo audit` / `pip-audit` / `go vuln` output), abandoned or unmaintained dependencies, overly permissive dependency version ranges
+ OWASP Top 10 framing: broken auth (session fixation, credential stuffing), security misconfiguration (default creds, debug mode in prod), SSRF (user-controlled URLs in server fetch without allowlist), mass assignment (request bodies bound to models without field allowlist)
+ Supply chain: lockfile committed + frozen installs in CI, no untrusted postinstall scripts
 
  2. **Code Quality & Style**
  Sources: code brittleness, convention violations, test workarounds, logging & observability
@@ -134,10 +151,13 @@ If the surrounding context shows the code is correct, do NOT flag it.
  4. **Architecture & SOLID**
  Sources: structural violations, coupling analysis, modularity, API contract quality
  Focus: Single Responsibility violations (god files >500 lines, functions >50 lines doing multiple things), tight coupling between modules, circular dependencies, mixed concerns in single files, dependency inversion violations, classes/modules with too many responsibilities (>20 public methods), deep nesting (>4 levels), long parameter lists, modules reaching into other modules' internals, inconsistent API error response shapes across endpoints, list endpoints missing pagination, missing rate limiting on public endpoints, inconsistent request/response envelope patterns
+ API contract consistency: breaking response shape changes without versioning, inconsistent error envelopes across endpoints, missing deprecation headers on sunset endpoints
 
  5. **Bugs, Performance & Error Handling**
  Sources: runtime safety, resource management, async correctness, performance, race conditions
  Focus: missing `await` on async calls, unhandled promise rejections, null/undefined access without guards, off-by-one errors, incorrect comparison operators, mutation of shared state, resource leaks (unbounded caches/maps, unclosed connections/streams), `process.exit()` in library code, async routes without error forwarding, missing AbortController on data fetching, N+1 query patterns (loading related records inside loops), O(n²) or worse algorithms in hot paths, unbounded result sets (missing LIMIT/pagination on DB queries), missing database indexes on frequently queried columns, race conditions (TOCTOU, double-submit without idempotency keys, concurrent writes to shared state without locks, stale-read-then-write patterns), missing connection pooling or pool exhaustion
+ Resilience: external calls without timeouts, missing fallback for unavailable downstream services, retry without backoff ceiling/jitter, missing health check endpoints
+ Observability: production paths without structured logging, error logs missing reproduction context (request ID, input params), async flows without correlation IDs
 
  ### Batch 2 (2 agents after Batch 1 completes):
 
@@ -150,6 +170,7 @@ If the surrounding context shows the code is correct, do NOT flag it.
  - **Python**: mutable default arguments, bare except clauses, missing type hints on public APIs, sync I/O in async contexts
  - **Go**: unchecked errors, goroutine leaks, defer in loops, context propagation gaps
  - **Web projects (any stack)**: accessibility issues — missing alt text on images, broken keyboard navigation, missing ARIA labels on interactive elements, insufficient color contrast, form inputs without associated labels
+ - **Database migrations**: exclusive-lock ALTER TABLE on large tables, CREATE INDEX without CONCURRENTLY, missing down migrations or untested rollback paths
  - General: framework-specific security issues, language-specific gotchas, domain-specific compliance, environment variable hygiene (missing `.env.example`, required env vars not validated at startup, secrets in config files that should be in env)
 
  7. **Test Coverage**
@@ -158,12 +179,16 @@ If the surrounding context shows the code is correct, do NOT flag it.
 
  Wait for ALL agents to complete before proceeding.
 
+ </audit_instructions>
+
+ <plan_and_remediate>
+
  ## Phase 2: Plan Generation
 
  1. Read the existing `PLAN.md` (create if it doesn't exist)
  2. Consolidate all findings from Phase 1, deduplicating across agents (same file:line flagged by multiple agents → keep the most specific description)
  3. Identify **shared utility extractions** — patterns duplicated 3+ times that should become reusable functions. Group these as "Foundation" work for Phase 3b.
- 4. **Build the file ownership map** (CRITICAL for Phase 5):
+ 4. **Build the file ownership map** (required by Phase 5 for conflict-free PRs):
  - For each finding, record which file(s) it touches
  - Assign each file to exactly ONE category (its primary category)
  - If a file is touched by multiple categories, assign it to the category with the highest-severity finding for that file
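The deduplication rule from step 2 can be sketched as a plain text pass, assuming a hypothetical `file:line|description` record format (the sample findings are illustrative):

```shell
# Keep one record per file:line, preferring the longest (most specific) description.
cat > /tmp/findings.txt <<'EOF'
server/index.js:42|Hardcoded key
server/index.js:42|Hardcoded API key committed in source; move to an env var
db/query.js:17|SQL injection via string concatenation
EOF

deduped=$(sort -t'|' -k1,1 /tmp/findings.txt | awk -F'|' '
  $1 != key { if (key != "") print key "|" best; key = $1; best = $2; next }
  length($2) > length(best) { best = $2 }
  END { if (key != "") print key "|" best }')
printf '%s\n' "$deduped"
```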
@@ -266,58 +291,17 @@ If no shared utilities were identified, skip this step.
  4. Spawn up to 5 general-purpose agents as teammates. **Pass `REMEDIATION_MODEL` as the `model` parameter on each agent.** If `REMEDIATION_MODEL` is `opus`, omit the parameter to inherit from session.
 
  ### Agent instructions template:
- ```
- You are {agent-name} on team better-{DATE}.
-
- Your task: Fix all {CATEGORY} findings from the Good audit.
- Working directory: {WORKTREE_DIR} (this is a git worktree — all work happens here)
-
- Project type: {PROJECT_TYPE}
- Build command: {BUILD_CMD}
- Test command: {TEST_CMD}
-
- Foundation utilities available (if created):
- {list of utility files with brief descriptions}
-
- Findings to address:
- {filtered list of CRITICAL/HIGH/MEDIUM findings for this category}
-
- FINDING VALIDATION — verify before fixing:
- - Before fixing each finding, READ the file and at least 30 lines of surrounding
- context to confirm the issue is genuine.
- - Check whether the flagged code is already correct (e.g., a Promise chain that
- IS properly awaited downstream, a value that IS validated earlier in the function,
- a pattern that IS idiomatic for the framework).
- - If the existing code is already correct, SKIP the fix and report it as a
- false positive with a brief explanation of why the original code is fine.
- - Do not make changes that are semantically equivalent to the original code
- (e.g., wrapping a .then() chain in an async IIFE adds noise without fixing anything).
-
- COMMIT STRATEGY — commit early and often:
- - After completing each logical group of related fixes, stage those files
- and commit immediately with a descriptive conventional commit message.
- - Each commit should be independently valid (build should pass).
- - Run {BUILD_CMD} in {WORKTREE_DIR} before each commit to verify.
- - Use `git -C {WORKTREE_DIR} add <specific files>` — never `git add -A` or `git add .`
- - Use `git -C {WORKTREE_DIR} commit -m "prefix: description"`
- - Use conventional commit prefixes: fix:, refactor:, feat:, security:
- - Do NOT include co-author or generated-by annotations in commits.
- - Do NOT bump the version — that happens once at the end.
-
- After all fixes:
- - Ensure all changes are committed (no uncommitted work)
- - Mark your task as completed via TaskUpdate
- - Report: commits made, files modified, findings addressed, any skipped issues
-
- CONFLICT AVOIDANCE:
- - Only modify files listed in your assigned findings
- - If you need to modify a file assigned to another agent, skip that change and report it
- ```
+
+ !`cat ~/.claude/lib/remediation-agent-template.md`
 
  ### Conflict avoidance:
  - Review all findings before task assignment. If two categories touch the same file, assign both sets of findings to the same agent.
  - Security agent gets priority on validation logic; DRY agent gets priority on import consolidation.
 
+ </plan_and_remediate>
+
+ <verification_and_pr>
+
  ## Phase 4: Verification
 
  After all agents complete:
@@ -337,6 +321,34 @@ After all agents complete:
  4. Shut down all agents via `SendMessage` with `type: "shutdown_request"`
  5. Clean up team via `TeamDelete`
 
+ ## Phase 4b: Internal Code Review
+
+ Before creating PRs, run a deep code review on all remediation changes to catch issues that automated agents may have introduced.
+
+ 1. Generate the diff of all changes in the worktree:
+ ```bash
+ cd {WORKTREE_DIR} && git diff {DEFAULT_BRANCH}...HEAD
+ ```
+ 2. Review the diff against the code review checklist:
+ ```
+ !`cat ~/.claude/lib/code-review-checklist.md`
+ ```
+ 3. For each issue found:
+ - Fix in a new commit: `fix: {description of review finding}`
+ - Re-run `{BUILD_CMD}` and `{TEST_CMD}` to verify
+ 4. Present a summary of review findings and fixes to the user via `AskUserQuestion`:
+ ```
+ AskUserQuestion([{
+ question: "Code review complete. {N} issues found and fixed. {list}. Proceed to PR creation?",
+ options: [
+ { label: "Proceed", description: "Create per-category PRs" },
+ { label: "Show diff", description: "Show the full diff for manual review before proceeding" },
+ { label: "Abort", description: "Stop here — I'll review manually" }
+ ]
+ }])
+ ```
+ 5. If "Show diff" selected, print the diff and re-ask. If "Abort", stop and print the worktree path.
+
  ## Phase 5: Per-Category PR Creation
 
  Instead of one mega PR, create **separate branches and PRs for each category**. This enables independent review, targeted CI, and granular merge decisions.
@@ -359,9 +371,9 @@ For each category that has findings:
  ```
  5. Push the branch: `git push -u origin better/{CATEGORY_SLUG}`
 
- **CRITICAL: File isolation rule** — each file must appear in exactly ONE branch. If a file has changes from multiple categories (e.g., `server/index.js` with both security and stack-specific changes), assign the whole file to one category based on the file ownership map. Do not split file-level changes across PRs.
+ **File isolation rule** (one file per branch) — each file must appear in exactly ONE branch. If a file has changes from multiple categories (e.g., `server/index.js` with both security and stack-specific changes), assign the whole file to one category based on the file ownership map. Do not split file-level changes across PRs.
 
- **CRITICAL: Cross-PR dependency check** — after building all branches, verify each branch builds independently:
+ **Cross-PR dependency check** — verify each branch builds independently:
  ```bash
  git checkout better/{CATEGORY_SLUG} && {BUILD_CMD}
  ```
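A dry-run sketch of running this check across every category branch (the slug list is illustrative and `{BUILD_CMD}` remains a placeholder; swap in the categories that actually have findings):

```shell
# Print the independent-build verification command for each category branch.
cmds=$(for slug in security code-quality dry arch-bugs stack-specific; do
  printf 'git checkout better/%s && {BUILD_CMD}\n' "$slug"
done)
printf '%s\n' "$cmds"
```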
@@ -465,110 +477,19 @@ After creating all PRs, verify CI passes on each one:
 
  Maximum 5 iterations per PR to prevent infinite loops.
 
- **IMPORTANT — Sub-agent delegation**: To prevent context exhaustion on long review cycles with multiple PRs, delegate each PR's review loop to a **separate general-purpose sub-agent** via the Agent tool. Launch sub-agents in parallel (one per PR). Each sub-agent runs the full loop (request → wait → check → fix → re-request) autonomously and returns only the final status.
-
- ### 6.0: Verify browser authentication
-
- If `BROWSER_AUTHENTICATED` is not true (e.g., Phase 0e was skipped or failed):
- 1. Navigate to the first PR URL using `browser_navigate`
- 2. Check for user avatar/menu
- 3. If not logged in: navigate to `https://github.com/login`, inform the user **"Please log in to GitHub in the browser. I'll wait for you to confirm."**, and use `AskUserQuestion` to wait
-
- ### 6.1: Determine review request method
+ **Sub-agent delegation** (prevents context exhaustion): delegate each PR's review loop to a **separate general-purpose sub-agent** via the Agent tool. Launch sub-agents in parallel (one per PR). Each sub-agent runs the full loop (request → wait → check → fix → re-request) autonomously and returns only the final status.
 
- **Try the API first** on any one PR:
- ```bash
- gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/requested_reviewers \
- -f 'reviewers[]=copilot-pull-request-reviewer[bot]'
- ```
-
- If this returns 422 ("not a collaborator"), record `REVIEW_METHOD=playwright`. Otherwise record `REVIEW_METHOD=api`.
+ ### 6.1: Launch parallel sub-agents (one per PR)
 
- ### 6.2: Launch parallel sub-agents (one per PR)
+ For each PR, spawn a general-purpose sub-agent using the shared review loop template:
 
- For each PR, spawn a general-purpose sub-agent with:
+ !`cat ~/.claude/lib/copilot-review-loop.md`
 
- ```
- You are a Copilot review loop agent for PR {PR_NUMBER}.
-
- Repository: {OWNER}/{REPO}
- Branch: better/{CATEGORY_SLUG}
- Build command: {BUILD_CMD}
- Review request method: {REVIEW_METHOD}
- Max iterations: 5
-
- DECREASING TIMEOUT SCHEDULE (shorter than single-PR review since multiple
- PRs are reviewed in parallel — see do:rpr for single-PR dynamic timing):
- - Iteration 1: max wait 5 minutes
- - Iteration 2: max wait 4 minutes
- - Iteration 3: max wait 3 minutes
- - Iteration 4: max wait 2 minutes
- - Iteration 5+: max wait 1 minute
- Poll interval: 30 seconds for all iterations.
-
- Run the following loop until Copilot returns zero new comments or you hit
- the max iteration limit:
-
- 1. CAPTURE the latest Copilot review timestamp, then REQUEST a new review:
- - First, capture the latest Copilot review timestamp via GraphQL:
- echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviews(last: 20) { nodes { author { login } submittedAt } } } } }"}' | gh api graphql --input -
- - Find the most recent submittedAt where author.login is
- copilot-pull-request-reviewer[bot] and record as LAST_COPILOT_SUBMITTED_AT.
- - If no prior Copilot review exists, record LAST_COPILOT_SUBMITTED_AT=NONE
- and treat the next Copilot review as NEW regardless of timestamp.
- - Then REQUEST:
- If REVIEW_METHOD is "api":
- gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/requested_reviewers \
- -f 'reviewers[]=copilot-pull-request-reviewer[bot]'
- If REVIEW_METHOD is "playwright":
- Navigate to the PR URL, click the "Reviewers" gear button, click the
- Copilot menuitemradio option, verify sidebar shows "Awaiting requested
- review from Copilot"
-
- 2. WAIT for the review (BLOCKING):
- - Poll using stdin JSON piping (avoid shell-escaping issues):
- echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviews(last: 5) { totalCount nodes { state body author { login } submittedAt } } reviewThreads(first: 100) { nodes { id isResolved comments(first: 3) { nodes { body path line author { login } } } } } } } }"}' | gh api graphql --input -
- - Complete when a new copilot-pull-request-reviewer[bot] review appears
- with submittedAt after LAST_COPILOT_SUBMITTED_AT captured in step 1
- (or, if LAST_COPILOT_SUBMITTED_AT=NONE, when the first
- copilot-pull-request-reviewer[bot] review for this loop appears)
- - Use the DECREASING TIMEOUT for the current iteration number
- - Error detection: if review body contains "Copilot encountered an error"
- or "unable to review", re-request and resume. Max 3 error retries.
- - If no review after max wait, report timeout and exit
-
- 3. CHECK for unresolved threads:
- Fetch threads via stdin JSON piping:
- echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviewThreads(first: 100) { nodes { id isResolved comments(first: 10) { nodes { body path line author { login } } } } } } } }"}' | gh api graphql --input -
- - Verify review was successful (no error text in body)
- - If zero comments / no unresolved threads: report success and exit
- - If unresolved threads exist: proceed to step 4
-
- 4. FIX all unresolved threads:
- For each unresolved thread:
- - Read the referenced file and understand the feedback
- - Evaluate: valid feedback → make the fix; informational/false positive →
- resolve without changes
- - If fixing:
- git checkout better/{CATEGORY_SLUG}
- # make changes
- git add <specific files>
- git commit -m "address Copilot review feedback"
- git push
- - Resolve thread via stdin JSON piping:
- echo '{"query":"mutation { resolveReviewThread(input: {threadId: \"{THREAD_ID}\"}) { thread { id isResolved } } }"}' | gh api graphql --input -
- - After all threads resolved, increment iteration and go back to step 1
-
- When done, report back:
- - Final status: clean / max-iterations-reached / timeout / error
- - Total iterations completed
- - List of commits made (if any)
- - Any unresolved threads remaining
- ```
+ Pass each sub-agent the PR-specific variables: `{PR_NUMBER}`, `{OWNER}/{REPO}`, `better/{CATEGORY_SLUG}`, and `{BUILD_CMD}`.
 
  Launch all PR sub-agents in parallel. Wait for all to complete.
 
- ### 6.3: Handle sub-agent results
+ ### 6.2: Handle sub-agent results
 
  For each sub-agent result:
  - **clean**: mark PR as ready to merge
@@ -576,9 +497,28 @@ For each sub-agent result:
  - **max-iterations-reached**: inform the user "Reached max review iterations (5) on PR #{number}. Remaining issues may need manual review."
  - **error**: inform the user and ask whether to retry or skip
 
+ ### 6.3: Merge Gate (MANDATORY)
+
+ **Do NOT merge any PR until Copilot review has completed (approved or commented) on ALL PRs, or the user explicitly approves skipping.**
+
+ Present the review status summary to the user via `AskUserQuestion`:
+ ```
+ AskUserQuestion([{
+ question: "Copilot review status:\n{for each PR: #number - status (approved/comments/pending/timeout)}\n\nHow would you like to proceed?",
+ options: [
+ { label: "Merge approved PRs", description: "Merge only PRs with passing review" },
+ { label: "Merge all", description: "Merge all PRs regardless of review status" },
+ { label: "Wait", description: "Wait longer for pending reviews" },
+ { label: "Don't merge", description: "Leave PRs open for manual review" }
+ ]
+ }])
+ ```
+
+ Only proceed with merging based on the user's selection. Never auto-merge without user confirmation.
+
  ### 6.4: Merge
 
- For each PR that has passed CI and review (in dependency order if applicable):
+ For each PR approved for merge (in dependency order if applicable):
  ```bash
  gh pr merge {PR_NUMBER} --merge
  ```
@@ -598,17 +538,24 @@ If merge fails (e.g., branch protection, merge conflicts from a prior PR):
  Then re-run CI check before merging.
  - If branch protection: inform the user and suggest manual merge
 
+ </verification_and_pr>
+
  ## Phase 7: Cleanup
 
  1. Remove the worktree:
  ```bash
  git worktree remove {WORKTREE_DIR}
  ```
- 2. Delete local branches (only if merged):
+ 2. Delete local AND remote branches (only if merged):
  ```bash
  git branch -d better/{DATE}
  git branch -d better/security better/code-quality better/dry better/arch-bugs better/stack-specific
  ```
+ ```bash
+ git push origin --delete better/{DATE}
+ git push origin --delete better/security better/code-quality better/dry better/arch-bugs better/stack-specific
+ ```
+ Ignore errors from `--delete` if a branch doesn't exist remotely.
  3. Restore stashed changes (if stashed in Phase 3a):
  ```bash
  git stash pop
@@ -643,7 +590,6 @@ If merge fails (e.g., branch protection, merge conflicts from a prior PR):
  - **Copilot review loop exceeds 5 iterations per PR**: stop iterating on that PR, inform user, proceed to merge
  - **Existing worktree found at startup**: ask user — resume (reuse worktree) or cleanup (remove and start fresh)
  - **No findings above LOW**: skip Phases 3-7, print "No actionable findings" with the LOW summary
- - **Browser not authenticated**: use `AskUserQuestion` to ask the user to log in — never skip this or close the browser
  - **Merge conflict after prior PR merged**: rebase the branch onto the updated default branch, push with `--force-with-lease`, re-run CI
 
  !`cat ~/.claude/lib/graphql-escaping.md`
@@ -59,18 +59,39 @@ Before committing, ensure the fork is up to date with upstream:
  git push -u origin {CURRENT_BRANCH}
  ```
 
- ## Local Code Review (before opening PR)
+ ## Local Code Review (REQUIRED GATE)
+
+ Fork PRs go to upstream maintainers who can't easily ask for changes — getting it right the first time matters more here than on internal PRs.
+
+ <review_gate>
 
  1. Fetch upstream default branch for accurate diff:
  ```bash
  git fetch upstream {UPSTREAM_DEFAULT_BRANCH}
  ```
- 2. Run `git diff upstream/{UPSTREAM_DEFAULT_BRANCH}...{CURRENT_BRANCH}` to see the full diff against upstream
- 3. **For each changed file**, read the full file (not just the diff hunks) and check:
+ 2. Run `git diff upstream/{UPSTREAM_DEFAULT_BRANCH}...{CURRENT_BRANCH}` to get the list of changed files
+ 3. For every changed file:
+ a. Read the entire file using the Read tool (not just diff hunks)
+ b. Check it against the tiered checklist below (always check Tiers 1+4; check Tiers 2-3 when relevance filters match)
+ c. For each finding, quote the specific code line and explain why it's a problem
+ 4. After reviewing all files, verify: does the code actually deliver what the commits claim?
+ 5. Print a review summary table (see do:review for format)
+ 6. Fix any issues, recommit, and push before proceeding
+ 7. Only after printing the review summary may you proceed to "Open the PR"
+
+ If the diff touches more than 15 files, delegate later batches to a subagent to keep context clean.
+
+ </review_gate>
+
+ Checklist to apply to each file:
 
  !`cat ~/.claude/lib/code-review-checklist.md`
- 4. If issues are found, fix them, recommit, and push before proceeding
- 5. Summarize the review findings so the user can see what was checked
+
+ Verification checklist (confirm before proceeding):
+ - [ ] Read every changed file in full (not just diffs)
+ - [ ] Checked each file against the relevant checklist tiers
+ - [ ] Quoted specific code for each finding
+ - [ ] Printed a review summary table with findings
 
  ## Check for Upstream Contributing Guidelines
 
package/commands/do/pr.md CHANGED
@@ -17,17 +17,36 @@ Print: `PR flow: {current_branch} → {default_branch}`
  - Keep commit message concise and do not use co-author information
  - Push the branch to remote: `git pull --rebase --autostash && git push -u origin {current_branch}`
 
- ## Local Code Review (before opening PR)
+ ## Local Code Review (REQUIRED GATE)
 
- Before creating the PR, perform a thorough self-review. Read each changed file, not just the diff, to understand how the changes behave at runtime.
+ This review catches bugs that Copilot misses: incomplete pattern copying is the #1 source of post-merge review feedback. Skipping it costs more time in review cycles than it saves.
 
- 1. Run `git diff {default_branch}...{current_branch}` to see the full diff
- 2. **For each changed file**, read the full file (not just the diff hunks) and check:
+ <review_gate>
+
+ 1. Read commit messages to understand what this change claims to do
+ 2. Run `git diff {default_branch}...{current_branch}` to get the list of changed files
+ 3. For every changed file:
+ a. Read the entire file using the Read tool (not just diff hunks)
+ b. Check it against the tiered checklist below (always check Tiers 1+4; check Tiers 2-3 when relevance filters match)
+ c. For each finding, quote the specific code line and explain why it's a problem
+ 4. After reviewing all files, verify: does the code actually deliver what the commits claim?
+ 5. Print a review summary table (see do:review for format)
+ 6. Fix any issues, run tests, and verify tests cover the changed code paths
+ 7. Only after printing the review summary may you proceed to "Open the PR"
+
+ If the diff touches more than 15 files, delegate later batches to a subagent to keep context clean.
+
+ </review_gate>
+
+ Checklist to apply to each file:
 
  !`cat ~/.claude/lib/code-review-checklist.md`
 
- 3. If issues are found, fix them and amend/recommit before proceeding
- 4. Summarize the review findings (even if clean) so the user can see what was checked
+ Verification checklist (confirm before proceeding):
+ - [ ] Read every changed file in full (not just diffs)
+ - [ ] Checked each file against the relevant checklist tiers
+ - [ ] Quoted specific code for each finding
+ - [ ] Printed a review summary table with findings
 
  ## Open the PR
 
@@ -57,17 +57,36 @@ If ambiguous, ask the user to confirm before proceeding.
 
  4. **Commit the release**: Stage `package.json`, `package-lock.json`, and the changelog file. Commit with message `chore: release v{new_version}`
 
- ## Local Code Review (before opening PR)
+ ## Local Code Review (REQUIRED GATE)
 
- Perform a thorough self-review. Read each changed file, not just the diff, to understand how the changes behave at runtime.
62
+ A release without a deep code review ships bugs to users. This review is the last line of defense the full diff since the last release often contains interactions that individual PR reviews missed.
 
- 1. Run `git diff {target}...{source}` to see the full diff
- 2. **For each changed file**, read the full file (not just the diff hunks) and check:
+ <review_gate>
+
+ 1. Read all commit messages since last release to understand the scope
+ 2. Run `git diff {target}...{source}` to get the list of changed files
+ 3. For every changed file:
+ a. Read the entire file using the Read tool (not just diff hunks)
+ b. Check it against the tiered checklist below (always check Tiers 1+4; check Tiers 2-3 when relevance filters match)
+ c. For each finding, quote the specific code line and explain why it's a problem
+ 4. After reviewing all files, verify: does the aggregate change set deliver what the release claims?
+ 5. Print a review summary table (see do:review for format)
+ 6. Fix any issues, run tests, verify tests cover the changed code paths, commit and push
+ 7. Only after printing the review summary may you proceed to "Open the Release PR"
+
+ If the diff touches more than 15 files, delegate later batches to a subagent to keep context clean.
+
+ </review_gate>
+
+ Checklist to apply to each file:
 
  !`cat ~/.claude/lib/code-review-checklist.md`
 
- 3. If issues are found, fix them, commit, and push before proceeding
- 4. Summarize the review findings (even if clean) so the user can see what was checked
85
+ Verification — confirm before proceeding:
86
+ - [ ] Read every changed file in full (not just diffs)
87
+ - [ ] Checked each file against the relevant checklist tiers
88
+ - [ ] Quoted specific code for each finding
89
+ - [ ] Printed a review summary table with findings
71
90
 
72
91
  ## Open the Release PR
73
92
 
@@ -17,6 +17,22 @@ If there are no changes, inform the user and stop.
17
17
 
18
18
  CLAUDE.md is already loaded into your context. Use its rules (code style, error handling, logging, security model, scope exclusions) as overrides to generic best practices throughout this review. For example, if CLAUDE.md says "no auth needed — internal tool", do not flag missing authentication.
19
19
 
20
+ <review_instructions>
21
+
22
+ ## PR-Level Coherence Check
23
+
24
+ Before reviewing individual files, understand what this change set claims to do:
25
+
26
+ 1. Read commit messages (`git log {base}...HEAD --oneline`)
27
+ 2. After reviewing all files, verify: does the changed code actually deliver what the commits claim? Flag any claims not backed by code (e.g., "adds rate limiting" but only adds a comment).
28
+
29
+ ## Large PR Strategy
30
+
31
+ If the diff touches more than 15 files, split the review into batches:
32
+ 1. Group files by module/directory
33
+ 2. Review each batch, printing findings as you go
34
+ 3. Delegate files beyond the first 15 to a subagent if context is getting full
35
+
20
36
  ## Deep File Review
21
37
 
22
38
  For **each changed file** in the diff, read the **entire file** (not just diff hunks). Reviewing only the diff misses context bugs where new code interacts incorrectly with existing code.
@@ -59,12 +75,20 @@ With the flow understood, evaluate the changed code against these principles:
59
75
 
60
76
  Only flag principle violations that are **concrete and actionable** in the changed code. Do not flag pre-existing design issues in untouched code unless the changes make them worse.
61
77
 
78
+ </review_instructions>
79
+
80
+ <checklist>
81
+
62
82
  ### Per-File Checklist
63
83
 
64
- Check every file against this checklist:
84
+ Check every file against this checklist. The checklist is organized into tiers — always check Tiers 1 and 4, and check Tiers 2-3 only when the relevance filter matches the file:
65
85
 
66
86
  !`cat ~/.claude/lib/code-review-checklist.md`
67
87
 
88
+ </checklist>
89
+
90
+ <deep_checks>
91
+
68
92
  ### Additional deep checks (read surrounding code to verify):
69
93
 
70
94
  **Cross-file consistency**
@@ -87,6 +111,7 @@ Check every file against this checklist:
87
111
 
88
112
  **Access scope changes**
89
113
  - If the PR widens access to an endpoint or resource (admin→public, internal→external), trace all shared dependencies the endpoint uses (rate limiters, queues, connection pools, external service quotas) and assess whether they were sized for the previous access level — in-memory/process-local limiters don't enforce limits across horizontally scaled instances
114
+ - If the PR adds endpoints under a restricted route group (admin, internal, scoped), read sibling endpoints in the same route group and verify the new endpoint applies the same authorization gate — missing gates on admin-mounted endpoints are consistently the most dangerous review finding
90
115
 
91
116
  **Guard-before-cache ordering**
92
117
  - If a handler performs a pre-flight guard check (rate limit, quota, feature flag) before a cache lookup or short-circuit path, verify the guard doesn't block operations that would be served from cache without touching the guarded resource — restructure so cache hits bypass the guard
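The guard-before-cache restructuring above can be sketched in a few lines of illustrative JavaScript (the limiter and handler names are hypothetical, not from this package):

```javascript
const cache = new Map();
let tokens = 1; // deliberately tiny rate-limit budget for illustration

function takeToken() {
  if (tokens <= 0) return false;
  tokens -= 1;
  return true;
}

// Correct ordering: cache hits return before the guard is consulted,
// so the rate limit only protects the expensive computation.
function handle(key, compute) {
  if (cache.has(key)) return cache.get(key);
  if (!takeToken()) throw new Error('rate limited');
  const value = compute();
  cache.set(key, value);
  return value;
}
```

With the guard placed first instead, a cache hit would still burn a token even though the guarded resource is never touched.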
@@ -123,17 +148,51 @@ Check every file against this checklist:
123
148
  **Data model vs access pattern alignment**
124
149
  - If the PR adds queries that claim ordering (e.g., "recent", "top"), verify the underlying key/index design actually supports that ordering natively — random UUIDs and non-time-sortable keys require full scans and in-memory sorting, which degrades at scale
125
150
 
151
+ **Deletion/lifecycle cleanup completeness**
152
+ - If the PR adds a delete or destroy function, trace all resources created during the entity's lifecycle (data directories, git branches, child records, temporary files, worktrees) and verify each is cleaned up on deletion. Compare with existing delete functions in the codebase for completeness patterns
153
+
154
+ **Update schema depth**
155
+ - If the PR derives an update/patch schema from a create schema (e.g., `.partial()`, `Partial<T>`), verify that nested objects also become partial — shallow partial on deeply-required schemas rejects valid partial updates where the caller only wants to change one nested field
156
+
157
+ **Mutation return value freshness**
158
+ - If a function mutates an entity and returns it, verify the returned object reflects the post-mutation state, not a pre-read snapshot. Also check whether dependent scheduling/evaluation state (backoff, timers, status flags) is reset when a "force" or "trigger" operation is invoked
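A minimal JavaScript sketch of the stale-snapshot trap above (entity and field names are hypothetical):

```javascript
const db = new Map([['m-1', { id: 'm-1', active: false, nextRunAt: 9999 }]]);

// Anti-pattern: returns a snapshot read before the mutation
function staleTrigger(id) {
  const snapshot = { ...db.get(id) };
  db.get(id).active = true;
  return snapshot; // caller sees active: false
}

// Fix: mutate first, reset dependent scheduling state, return the live entity
function freshTrigger(id) {
  const entity = db.get(id);
  entity.active = true;
  entity.nextRunAt = 0; // a "force" operation should also reset backoff/timers
  return entity;
}
```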
159
+
160
+ **Responsibility relocation audit**
161
+ - If the PR moves a responsibility from one module to another (e.g., a database write from a handler to middleware, a computation from client to server), trace all code at the old location that depended on the timing, return value, or side effects of the moved operation — guards, response fields, in-memory state updates, and downstream scheduling that assumed co-located execution. Verify the new execution point preserves these contracts or that dependents are updated. Check for dead code left behind at the old location
162
+
163
+ **Read-after-write consistency**
164
+ - If the PR writes to a data store and then immediately queries that store (especially scans, aggregations, or replica reads), check whether the store's consistency model guarantees visibility of the write. If not, flag the read as potentially stale and suggest computing from in-memory state, using consistent-read options, or adding a delay/caveat
165
+
126
166
  **Formatting & structural consistency**
127
167
  - If the PR adds content to an existing file (list items, sections, config entries), verify the new content matches the file's existing indentation, bullet style, heading levels, and structure — rendering inconsistencies are the most common Copilot review finding
128
168
 
169
+ </deep_checks>
170
+
171
+ <verify_findings>
172
+
173
+ ## Verify Findings
174
+
175
+ For each issue found, ground it in evidence before classifying:
176
+ 1. **Quote the specific code line(s)** that demonstrate the issue
177
+ 2. **Explain why it's a problem** in one sentence given the surrounding context
178
+ 3. If the fix involves async/state changes, **trace the execution path** to confirm the issue is real
179
+ 4. If you cannot quote specific code for a finding, downgrade it to **[UNCERTAIN]**
180
+
181
+ After verifying all findings, run the project's build and test commands to confirm no false positives.
182
+
183
+ </verify_findings>
184
+
185
+ <fix_and_report>
186
+
129
187
  ## Fix Issues Found
130
188
 
131
- For each issue found:
189
+ For each verified issue:
132
190
  1. Classify severity: **CRITICAL** (runtime crash, data leak, security) vs **IMPROVEMENT** (consistency, robustness, conventions)
133
191
  2. Fix all CRITICAL issues immediately
134
192
  3. For IMPROVEMENT issues, fix them too — the goal is to eliminate Copilot review round-trips
135
193
  4. After fixes, run the project's test suite and build command (per project conventions already in context)
136
- 5. Commit fixes: `refactor: address code review findings`
194
+ 5. Verify the test suite covers the changed code paths — passing unrelated tests is not validation
195
+ 6. Commit fixes: `refactor: address code review findings`
137
196
 
138
197
  ## Report
139
198
 
@@ -155,3 +214,5 @@ Print a summary table of what was reviewed and found:
155
214
  ```
156
215
 
157
216
  If no issues were found, confirm the code is clean and ready for PR.
217
+
218
+ </fix_and_report>
package/install.sh CHANGED
@@ -36,6 +36,7 @@ OLD_COMMANDS=(cam good makegoals makegood optimize-md)
36
36
 
37
37
  LIBS=(
38
38
  code-review-checklist copilot-review-loop graphql-escaping
39
+ remediation-agent-template
39
40
  )
40
41
 
41
42
  HOOKS=(slashdo-check-update slashdo-statusline)
@@ -1,3 +1,11 @@
1
+ <!--
2
+ Triage: Check Tiers 1 and 4 for every file. Check Tier 2/3 only when
3
+ the relevance filter matches the changed code. This prevents important
4
+ checks from being lost in a long list.
5
+ -->
6
+
7
+ ## Tier 1 — Always Check (Runtime Crashes, Security, Hygiene)
8
+
1
9
  **Hygiene**
2
10
  - Leftover debug code (`console.log`, `debugger`, TODO/FIXME/HACK), hardcoded secrets/credentials, and uncommittable files (.env, node_modules, build artifacts)
3
11
  - Overly broad changes that should be split into separate PRs
@@ -11,113 +19,162 @@
11
19
  - Type coercion edge cases — `Number('')` is `0` not empty, `0` is falsy in truthy checks, `NaN` comparisons are always false; string comparison operators (`<`, `>`, `localeCompare`) do lexicographic, not semantic, ordering (e.g., `"10" < "2"`). Use explicit type checks (`Number.isFinite()`, `!= null`) and dedicated libraries (e.g., semver for versions) instead of truthy guards or lexicographic ordering when zero/empty are valid values or semantic ordering matters
12
20
  - Functions that index into arrays without guarding empty arrays; state/variables declared but never updated or only partially wired up
13
21
  - Shared mutable references — module-level defaults passed by reference mutate across calls (use `structuredClone()`/spread); `useCallback`/`useMemo` referencing a later `const` (temporal dead zone); object spread followed by unconditional assignment that clobbers spread values
14
- - Side effects during React render (setState, navigation, mutations outside useEffect)
22
+ - Functions with >10 branches or >15 cyclomatic complexity — refactor into smaller units
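The coercion and ordering traps in the bullets above are easy to demonstrate with plain JavaScript (this is standard language behavior, not package code):

```javascript
const port = Number('');       // 0, not NaN — empty string coerces to zero
const isSet = Boolean(port);   // false, even though 0 can be a valid value
const wrongOrder = '10' < '2'; // true — string comparison is lexicographic

// Explicit checks avoid both traps:
function isValidPort(value) {
  const n = Number(value);
  return Number.isFinite(n) && n >= 0 && n <= 65535;
}
const numericOrder = Number('10') < Number('2'); // false — compared as numbers
```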
23
+
24
+ **API & URL safety**
25
+ - User-supplied or system-generated values interpolated into URL paths, shell commands, file paths, or subprocess arguments without encoding/validation — use `encodeURIComponent()` for URLs, regex allowlists for execution boundaries. Generated identifiers used as URL path segments must be safe for your router/storage (no `/`, `?`, `#`; consider allowlisting characters and/or applying `encodeURIComponent()`). Identifiers derived from human-readable names (slugs) used for namespaced resources (git branches, directories) need a unique suffix (ID, hash) to prevent collisions between entities with the same or similar names
26
+ - Route params passed to services without format validation; path containment checks using string prefix without path separator boundary (use `path.relative()`)
27
+ - Error/fallback responses that hardcode security headers instead of using centralized policy — error paths bypass security tightening
28
+
29
+ **Trust boundaries & data exposure**
30
+ - API responses returning full objects with sensitive fields — destructure and omit across ALL response paths (GET, PUT, POST, error, socket); comments/docs claiming data isn't exposed while the code path does expose it
31
+ - Server trusting client-provided computed/derived values (scores, totals, correctness flags) when the server can recompute them — strip and recompute server-side; don't require clients to submit fields the server should own
32
+ - New endpoints mounted under restricted paths (admin, internal) missing authorization verification — compare with sibling endpoints in the same route group to ensure the same access gate (role check, scope validation) is applied consistently
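The destructure-and-omit pattern from the first bullet above, sketched with a hypothetical user record:

```javascript
const user = {
  id: 7,
  email: 'dev@example.com',
  passwordHash: '$2b$10$abc', // must never leave the server
  apiToken: 'tok_secret',     // must never leave the server
};

// Destructure away sensitive fields; return only the rest on EVERY response path
function toPublicUser({ passwordHash, apiToken, ...publicFields }) {
  return publicFields;
}

const body = toPublicUser(user); // safe to serialize into a response
```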
15
33
 
16
- **Async & state consistency**
17
- - Optimistic state changes (view switches, navigation, success callbacks) before async completion — if the operation fails or is cancelled, the UI is stuck with no rollback. Check return values/errors before calling success callbacks. Handle both failure and cancellation paths
34
+ ## Tier 2 — Check When Relevant (Data Integrity, Async, Error Handling)
35
+
36
+ **Async & state consistency** _[applies when: code uses async/await, Promises, or UI state]_
37
+ - Optimistic state changes (view switches, navigation, success callbacks) before async completion — if the operation fails or is cancelled, the UI is stuck with no rollback. Check return values/errors before calling success callbacks. Handle both failure and cancellation paths. Watch for `.catch(() => null)` followed by unconditional success code (toast, state update) — the catch silences the error but the success path still runs. Either let errors propagate naturally or check the return value before proceeding
18
38
  - Multiple coupled state variables updated independently — actions that change one must update all related fields; debounced/cancelable operations must reset loading state on every exit path (cleared, stale, failed, aborted)
19
39
  - Error notification at multiple layers (shared API client + component-level) — verify exactly one layer owns user-facing error messages
20
40
  - Optimistic updates using full-collection snapshots for rollback — a second in-flight action gets clobbered. Use per-item rollback and functional state updaters after async gaps; sync optimistic changes to parent via callback or trigger refetch on remount
21
41
  - State updates guarded by truthiness of the new value (`if (arr?.length)`) — prevents clearing state when the source legitimately returns empty. Distinguish "no response" from "empty response"
42
+ - Mutation/trigger functions that return or propagate stale pre-mutation state — if a function activates, updates, or resets an entity, the returned value and any dependent scheduling/evaluation state (backoff timers, "last run" timestamps, status flags) must reflect the post-mutation state, not a snapshot read before the mutation
43
+ - Fire-and-forget or async writes where the in-memory object is not updated (response returns stale data) or is updated unconditionally regardless of write success (response claims state that was never persisted) — update in-memory state conditionally on write outcome, or document the tradeoff explicitly
44
+ - Missing `await` on async operations in error/cleanup paths — fire-and-forget cleanup (e.g., aborting a failed operation, rolling back partial state) that must complete before the function returns or the caller proceeds
22
45
  - `Promise.all` without error handling — partial load with unhandled rejection. Wrap with fallback/error state
46
+ - Side effects during React render (setState, navigation, mutations outside useEffect)
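The `.catch(() => null)` trap called out above, in miniature (the handler names are hypothetical):

```javascript
function saveSettings() {
  return Promise.reject(new Error('network down')); // simulated failure
}

// Anti-pattern: the catch silences the error, but the success path still runs
async function badHandler() {
  const result = await saveSettings().catch(() => null);
  return `Saved revision ${result.revision}`; // TypeError: result is null
}

// Fix: check the value before proceeding on the success path
async function goodHandler() {
  const result = await saveSettings().catch(() => null);
  if (result === null) return 'save failed';
  return `Saved revision ${result.revision}`;
}
```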
23
47
 
24
- **Resource management**
25
- - Event listeners, socket handlers, subscriptions, timers, and useEffect side effects are cleaned up on unmount/teardown
26
- - Initialization functions (schedulers, pollers, listeners) that don't guard against multiple calls — creates duplicate instances. Check for existing instances before reinitializing
27
-
28
- **Error handling**
48
+ **Error handling** _[applies when: code has try/catch, .catch, error responses, or external calls]_
29
49
  - Service functions throwing generic `Error` for client-caused conditions — bubbles as 500 instead of 400/404. Use typed error classes with explicit status codes; ensure consistent error responses across similar endpoints. Include expected concurrency/conditional failures (transaction cancellations, optimistic lock conflicts) — catch and translate to 409/retry rather than letting them surface as 500
30
50
  - Swallowed errors (empty `.catch(() => {})`), handlers that replace detailed failure info with generic messages, and error/catch handlers that exit cleanly (`exit 0`, `return`) without any user-visible output — surface a notification, propagate original context, and make failures look like failures
31
51
  - Destructive operations in retry/cleanup paths assumed to succeed without their own error handling — if cleanup fails, retry logic crashes instead of reporting the intended failure
52
+ - External service calls without configurable timeouts — a hung downstream service blocks the caller indefinitely
53
+ - Missing fallback behavior when downstream services are unavailable (see also: retry without backoff in "Sync & replication")
32
54
 
33
- **API & URL safety**
34
- - User-supplied or system-generated values interpolated into URL paths, shell commands, file paths, or subprocess arguments without encoding/validation — use `encodeURIComponent()` for URLs, regex allowlists for execution boundaries. Generated identifiers used as URL path segments must be safe for your router/storage (no `/`, `?`, `#`; consider allowlisting characters and/or applying `encodeURIComponent()`)
35
- - Route params passed to services without format validation; path containment checks using string prefix without path separator boundary (use `path.relative()`)
36
- - Error/fallback responses that hardcode security headers instead of using centralized policy — error paths bypass security tightening
37
-
38
- **Trust boundaries & data exposure**
39
- - API responses returning full objects with sensitive fields — destructure and omit across ALL response paths (GET, PUT, POST, error, socket); comments/docs claiming data isn't exposed while the code path does expose it
40
- - Server trusting client-provided computed/derived values (scores, totals, correctness flags) when the server can recompute them — strip and recompute server-side; don't require clients to submit fields the server should own
41
-
42
- **Input handling**
43
- - Trimming values where whitespace is significant (API keys, tokens, passwords, base64) — only trim identifiers/names
44
- - Endpoints accepting unbounded arrays/collections without upper limits — enforce max size or move to background jobs
55
+ **Resource management** _[applies when: code uses event listeners, timers, subscriptions, or useEffect]_
56
+ - Event listeners, socket handlers, subscriptions, timers, and useEffect side effects are cleaned up on unmount/teardown
57
+ - Deletion/destroy functions that clean up the primary resource but leave orphaned secondary resources (data directories, git branches, child records, temporary files) — trace all resources created during the entity's lifecycle and verify each is removed on delete
58
+ - Initialization functions (schedulers, pollers, listeners) that don't guard against multiple calls — creates duplicate instances. Check for existing instances before reinitializing
45
59
 
46
- **Validation & consistency**
60
+ **Validation & consistency** _[applies when: code handles user input, schemas, or API contracts]_
61
+ - API versioning: breaking changes to public endpoints without version bump or deprecation path
62
+ - Backward-incompatible response shape changes without client migration plan
47
63
  - New endpoints/schemas should match validation patterns of existing similar endpoints — field limits, required fields, types, error handling. If validation exists on one endpoint for a param, the same param on other endpoints needs the same validation
48
64
  - When a validation/sanitization function is introduced for a field, trace ALL write paths (create, update, sync, import) — partial application means invalid values re-enter through the unguarded path
49
- - Schema fields accepting values downstream code can't handle; Zod/schema stripping fields the service reads (silent `undefined`); config values persisted but silently ignored by the implementation — trace each field through schema → service → consumer
65
+ - Schema fields accepting values downstream code can't handle; Zod/schema stripping fields the service reads (silent `undefined`); config values persisted but silently ignored by the implementation — trace each field through schema → service → consumer. Update schemas derived from create schemas (e.g., `.partial()`) must also make nested object fields optional — shallow partial on a deeply-required schema rejects valid partial updates. Additionally, `.deepPartial()` or `.partial()` on schemas with `.default()` values will apply those defaults on update, silently overwriting existing persisted values with defaults — create explicit update schemas without defaults instead
66
+ - Entity creation without case-insensitive uniqueness checks — names differing only in case (e.g., "MyAgent" vs "myagent") cause collisions in case-insensitive contexts (file paths, git branches, URLs). Normalize to lowercase before comparing
50
67
  - Handlers reading properties from framework-provided objects using field names the framework doesn't populate — silent `undefined`. Verify property names match the caller's contract
68
+ - Data model fields that have different names depending on the creation/write path (e.g., `createdAt` vs `created`) — code referencing only one naming convention silently misses records created through other paths. Trace all write paths to discover the actual field names in use
51
69
  - Numeric values from strings used without `NaN`/type guards — `NaN` comparisons silently pass bounds checks. Clamp query params to safe lower bounds
52
70
  - UI elements hidden from navigation but still accessible via direct URL — enforce restrictions at the route level
53
71
  - Summary counters/accumulators that miss edge cases (removals, branch coverage, underflow on decrements — guard against going negative with lower-bound conditions); silent operations in verbose sequences where all branches should print status
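The case-insensitive uniqueness check from this section can be sketched as a one-liner (illustrative only; names are hypothetical):

```javascript
const existing = ['MyAgent', 'backup-runner'];

// Normalize to lowercase before comparing — "myagent" must collide with "MyAgent"
function isNameTaken(name, names) {
  const normalized = name.toLowerCase();
  return names.some((n) => n.toLowerCase() === normalized);
}
```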
54
72
 
55
- **Intent vs implementation**
56
- - Labels, comments, status messages, or documentation that describe behavior the code doesn't implement — e.g., a map named "renamed" that only deletes, or an action labeled "migrated" that never creates the target
57
- - Inline code examples, command templates, and query snippets that aren't syntactically valid as written — template placeholders must use a consistent format, queries must use correct syntax for their language (e.g., single `{}` in GraphQL, not `{{}}`)
58
- - Cross-references between files (identifiers, parameter names, format conventions, operational thresholds) that disagree — when one reference changes, trace all other files that reference the same entity and update them
59
- - Sequential instructions or steps whose ordering doesn't match the required execution order — readers following in order will perform actions at the wrong time (e.g., "record X" in step 2 when X must be captured before step 1's action)
60
- - Sequential numbering (section numbers, step numbers) with gaps or jumps after edits — verify continuity
61
- - Completion markers, success flags, or status files written before the operation they attest to finishes — consumers see false success if the operation fails after the write
62
- - Existence checks (directory exists, file exists, module resolves) used as proof of correct/complete installation — a directory can exist but be empty, a file can exist with invalid contents. Verify the specific resource the consumer needs
63
- - Tracking/checkpoint files that default to empty on parse failure — causes full re-execution. Fail loudly instead
64
- - Registering references to resources without verifying the resource exists — dangling references after failed operations
65
-
66
- **Concurrency & data integrity**
73
+ **Concurrency & data integrity** _[applies when: code has shared state, database writes, or multi-step mutations]_
67
74
  - Shared mutable state accessed by concurrent requests without locking or atomic writes; multi-step read-modify-write cycles that can interleave — use conditional writes/optimistic concurrency (e.g., condition expressions, version checks) to close the gap between read and write; if the conditional write fails, surface a retryable error instead of letting it bubble as a 500
68
75
  - Multi-table writes without a transaction — FK violations or errors leave partial state
76
+ - Writes that replace an entire composite attribute (array, map, JSON blob) when the field is populated by multiple sources — the write discards data from other sources. Use a separate attribute, merge with the existing value, or use list/set append operations
69
77
  - Functions with early returns for "no primary fields to update" that silently skip secondary operations (relationship updates, link writes)
70
78
  - Functions that acquire shared state (locks, flags, markers) with exit paths that skip cleanup — leaves the system permanently locked. Trace all exit paths including error branches
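A toy in-memory version of the conditional-write pattern above (real stores use condition expressions or version columns; this Map stand-in only illustrates the shape):

```javascript
const store = new Map([['job-1', { version: 1, status: 'queued' }]]);

function conditionalUpdate(id, expectedVersion, patch) {
  const row = store.get(id);
  if (!row || row.version !== expectedVersion) {
    const err = new Error('version conflict'); // surface as 409/retry, not 500
    err.retryable = true;
    throw err;
  }
  const next = { ...row, ...patch, version: expectedVersion + 1 };
  store.set(id, next);
  return next;
}
```

On conflict the caller re-reads the row and retries, rather than letting the error bubble as a server fault.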
71
79
 
72
- **Search & navigation**
73
- - Search results linking to generic list pages instead of deep-linking to the specific record
74
- - Search/query code hardcoding one backend's implementation when the system supports multiple — verify option/parameter names are mapped between backends
80
+ **Input handling** _[applies when: code accepts user/external input]_
81
+ - Trimming values where whitespace is significant (API keys, tokens, passwords, base64) — only trim identifiers/names
82
+ - Endpoints accepting unbounded arrays/collections without upper limits — enforce max size or move to background jobs
75
83
 
76
- **Sync & replication**
77
- - Upsert/`ON CONFLICT UPDATE` updating only a subset of exported fields — replicas diverge. Document deliberately omitted fields
78
- - Pagination using `COUNT(*)` (full table scan) instead of `limit + 1`; endpoints missing `next` token input/output; hard-capped limits silently truncating results
79
- - Batch/paginated API calls (database batch gets, external service calls) that don't handle partial results — unprocessed items, continuation tokens, or rate-limited responses silently dropped. Add retry loops with backoff for unprocessed items
80
- - Retry loops without backoff or max-attempt limits — tight loops under throttling extend latency indefinitely. Use bounded retries with exponential backoff/jitter
84
+ ## Tier 3 — Domain-Specific (Check Only When File Type Matches)
81
85
 
82
- **SQL & database**
86
+ **SQL & database** _[applies when: code contains SQL, ORM queries, or migration files]_
83
87
  - Parameterized query placeholder indices must match parameter array positions — especially with shared param builders or computed indices
84
88
  - Database triggers clobbering explicitly-provided values; auto-incrementing columns that only increment on INSERT, not UPDATE
85
89
  - Full-text search with strict parsers (`to_tsquery`) on user input — use `websearch_to_tsquery` or `plainto_tsquery`
86
90
  - Dead queries (results never read), N+1 patterns inside transactions, O(n²) algorithms on growing data
87
91
  - `CREATE TABLE IF NOT EXISTS` as sole migration strategy — won't add columns/indexes on upgrade. Use `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` or a migration framework
88
92
  - Functions/extensions requiring specific database versions without verification
93
+ - Migrations that lock tables for extended periods (ADD COLUMN with default on large tables, CREATE INDEX without CONCURRENTLY) — use concurrent operations or batched backfills
94
+ - Missing rollback/down migration or untested rollback path
89
95
 
90
- **Lazy initialization & module loading**
96
+ **Sync & replication** _[applies when: code uses pagination, batch APIs, or data sync]_
97
+ - Upsert/`ON CONFLICT UPDATE` updating only a subset of exported fields — replicas diverge. Document deliberately omitted fields
98
+ - Pagination using `COUNT(*)` (full table scan) instead of `limit + 1`; endpoints missing `next` token input/output; hard-capped limits silently truncating results
99
+ - Batch/paginated API calls (database batch gets, external service calls) that don't handle partial results — unprocessed items, continuation tokens, or rate-limited responses silently dropped. Add retry loops with backoff for unprocessed items
100
+ - Retry loops without backoff or max-attempt limits — tight loops under throttling extend latency indefinitely. Use bounded retries with exponential backoff/jitter
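One way to sketch the bounded retry with exponential backoff and jitter described above (an illustrative helper, not part of this package):

```javascript
// Delays are in milliseconds; maxAttempts bounds the loop so throttling
// never degenerates into an infinite tight loop.
async function withRetry(fn, { maxAttempts = 3, baseMs = 10 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // give up loudly
      const delay = baseMs * 2 ** (attempt - 1) + Math.random() * baseMs;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

The same wrapper applies to batch APIs that return unprocessed items: re-submit only the leftovers on each attempt.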
101
+
102
+ **Lazy initialization & module loading** _[applies when: code uses dynamic imports, lazy singletons, or bootstrap sequences]_
91
103
  - Cached state getters returning null before initialization — provide async initializer or ensure-style function
92
104
  - Module-level side effects (file reads, SDK init) without error handling — corrupted files crash the process on import
93
105
  - Bootstrap/resilience code that imports the dependencies it's meant to install — restructure so installation precedes resolution
94
106
  - Re-exporting from heavy modules defeats lazy loading — use lightweight shared modules
95
107
 
96
- **Data format portability**
108
+ **Data format portability** _[applies when: code crosses serialization boundaries — JSON, DB, IPC]_
97
109
  - Values crossing serialization boundaries may change format (arrays in JSON vs string literals in DB) — convert consistently
110
+ - Reads issued immediately after writes to an eventually consistent store (database scans, replica reads, cache refreshes) may return stale data — use consistent-read options, compute from in-memory state after confirmed writes, or document the eventual-consistency window
98
111
  - BIGINT values parsed into JavaScript `Number` — precision lost past `MAX_SAFE_INTEGER`. Use strings or `BigInt`
99
112
  - Data model key/index design that doesn't support required query access patterns — e.g., claiming "recent" ordering but using non-time-sortable keys (random UUIDs, user IDs). Verify sort keys and indexes can serve the queries the code performs without full-partition scans and in-memory sorting
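The BIGINT precision loss above is easy to reproduce in plain JavaScript (standard language behavior):

```javascript
// 2**53 + 1 cannot be represented exactly as a JS Number
const asNumber = JSON.parse('{"id": 9007199254740993}').id;
// The parsed Number has been rounded to the nearest representable value:
const rounded = asNumber === 9007199254740992; // last digit silently changed

// Keep the value as a string, or use BigInt, to preserve precision
const asBigInt = BigInt('9007199254740993');
```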
100
113
 
101
- **Shell & portability**
114
+ **Shell & portability** _[applies when: code spawns subprocesses, uses shell scripts, or builds CLI tools]_
102
115
  - Subprocess calls under `set -e` abort on failure; non-critical writes fail on broken pipes — use `|| true` for non-critical output
103
116
  - Detached child processes with piped stdio — parent exit causes SIGPIPE. Redirect to log files or use `'ignore'`
104
117
  - Platform-specific assumptions — hardcoded shell interpreters, `path.join()` backslashes breaking ESM imports. Use `pathToFileURL()` for dynamic imports
105
118
 
106
- **Test coverage**
107
- - New logic/schemas/services without corresponding tests when similar existing code has tests
108
- - New error paths untestable because services throw generic errors instead of typed ones
109
- - Tests re-implementing logic under test instead of importing real exports — pass even when real code regresses
110
- - Tests depending on real wall-clock time or external dependencies when testing logic — use fake timers and mocks
111
- - Missing tests for trust-boundary enforcement — submit tampered values, verify server ignores them
119
+ **Search & navigation** _[applies when: code implements search results or deep-linking]_
120
+ - Search results linking to generic list pages instead of deep-linking to the specific record
121
+ - Search/query code hardcoding one backend's implementation when the system supports multiple — verify option/parameter names are mapped between backends
122
+
123
+ **Destructive UI operations** _[applies when: code adds delete, reset, revoke, or other destructive actions]_
124
+ - Destructive actions (delete, reset, revoke) in the UI without a confirmation step — compare with how similar destructive operations elsewhere in the codebase handle confirmation
112
125
 
113
- **Accessibility**
126
+ **Accessibility** _[applies when: code modifies UI components or interactive elements]_
114
127
  - Interactive elements missing accessible names, roles, or ARIA states — including disabled interactions without `aria-disabled`
115
128
  - Custom toggle/switch UI built from non-semantic elements instead of native inputs
116
129
 
130
+ ## Tier 4 — Always Check (Quality, Conventions, AI-Generated Code)
131
+
132
+ **Intent vs implementation**
133
+ - Labels, comments, status messages, or documentation that describe behavior the code doesn't implement — e.g., a map named "renamed" that only deletes, or an action labeled "migrated" that never creates the target
134
+ - Inline code examples, command templates, and query snippets that aren't syntactically valid as written — template placeholders must use a consistent format, queries must use correct syntax for their language (e.g., single `{}` in GraphQL, not `{{}}`)
135
+ - Cross-references between files (identifiers, parameter names, format conventions, operational thresholds) that disagree — when one reference changes, trace all other files that reference the same entity and update them
136
+ - Responsibility relocated from one module to another (e.g., writes moved from handler to middleware) without updating all consumers that depended on the old location's timing, return value, or side effects — trace callers that relied on the synchronous or co-located behavior and verify they still work with the new execution point. Remove dead code left behind at the old location
137
+ - Sequential instructions or steps whose ordering doesn't match the required execution order — readers following in order will perform actions at the wrong time (e.g., "record X" in step 2 when X must be captured before step 1's action)
138
+ - Sequential numbering (section numbers, step numbers) with gaps or jumps after edits — verify continuity
139
+ - Completion markers, success flags, or status files written before the operation they attest to finishes — consumers see false success if the operation fails after the write
140
+ - Existence checks (directory exists, file exists, module resolves) used as proof of correct/complete installation — a directory can exist but be empty, a file can exist with invalid contents. Verify the specific resource the consumer needs
141
+ - Lookups that check only one scope when multiple exist — e.g., checking local git branches but not remote, checking in-memory cache but not persistent store. Trace all locations where the resource could exist and check each
142
+ - Tracking/checkpoint files that default to empty on parse failure — causes full re-execution. Fail loudly instead
143
+ - Registering references to resources without verifying the resource exists — dangling references after failed operations
144
+
145
+ **Automated pipeline discipline**
146
+ - Internal code review must run on all automated remediation changes BEFORE creating PRs — never go straight from "tests pass" to PR creation
147
+ - Copilot review must complete (approved or commented) on all PRs before merging — never merge while reviews are still pending unless the user explicitly approves
148
+ - Automated agents may introduce subtle issues that pass tests but violate project conventions — review agent output against CLAUDE.md conventions
149
+
150
+ **AI-generated code quality** _(Claude 4.6 specific failure modes)_
151
+ - Over-engineering: new abstractions, wrapper functions, helper files, or utility modules that serve only one call site — inline the logic instead
152
+ - Feature flags, configuration options, or extension points with only one possible value or consumer
153
+ - Commit messages or comments claiming a fix while the underlying bug remains — verify each claimed fix actually addresses the root cause, not just the symptom
154
+ - Functions containing placeholder comments (`// TODO`, `// FIXME`, `// implement later`) or stub implementations presented as complete
155
+ - Unnecessary defensive code: error handling for scenarios that provably cannot occur given the call site, fallbacks for internal functions that always return valid data
156
+
117
157
  **Configuration & hardcoding**
118
158
  - Hardcoded values when a config field or env var already exists; dead config fields nothing consumes; unused function parameters creating false API contracts; resource names (table names, queue names, bucket names) hardcoded without accounting for environment prefixes — lookups on response objects using the wrong key silently return undefined
119
- - Duplicated config/constants/utilities across modules — extract to shared module to prevent drift
159
+ - Duplicated config/constants/utilities/helper functions across modules — extract to shared module to prevent drift. Watch for behavioral inconsistencies between copies (e.g., one returns `'unknown'` for null while another returns `'never'`)
120
160
  - CI pipelines installing without lockfile pinning or version constraints — non-deterministic builds
161
+
+ **Observability**
+ - Production code paths with no structured logging at entry/exit points
162
+ - Error logs missing reproduction context (request ID, input parameters)
163
+ - Async flows without correlation ID propagation
164
+
165
+ **Supply chain & dependency health**
166
+ - Lockfile missing, CI installing without `--frozen-lockfile`, or lockfile drifted from the manifest
167
+ - `npm audit` / `cargo audit` / `pip-audit` reporting unaddressed HIGH/CRITICAL vulnerabilities
168
+ - `postinstall` scripts from untrusted packages executing arbitrary code without review
169
+ - Overly permissive version ranges (`*`, `>=`) on deps with known breaking-change history
170
+
171
+ **Test coverage**
172
+ - New logic/schemas/services without corresponding tests when similar existing code has tests
173
+ - New error paths untestable because services throw generic errors instead of typed ones
174
+ - Tests re-implementing logic under test instead of importing real exports — pass even when real code regresses
175
+ - Tests depending on real wall-clock time or external dependencies when testing logic — use fake timers and mocks
176
+ - Missing tests for trust-boundary enforcement — submit tampered values, verify server ignores them
177
+ - Tests that pass but don't cover the changed code paths — passing unrelated tests is not validation
121
178
 
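The wall-clock bullet above can be illustrated with clock injection (`makeRateLimiter` is a hypothetical example; projects with a test framework may prefer its fake timers instead):

```javascript
// Accept the clock as a parameter so tests can drive time deterministically
// instead of sleeping against real wall-clock time.
function makeRateLimiter(windowMs, limit, now = Date.now) {
  let windowStart = now();
  let count = 0;
  return function allow() {
    if (now() - windowStart >= windowMs) {
      windowStart = now();
      count = 0;
    }
    count += 1;
    return count <= limit;
  };
}

// In tests, a fake clock replaces real time:
let t = 0;
const allow = makeRateLimiter(1000, 2, () => t);
console.log(allow(), allow(), allow()); // true true false
t += 1000; // "wait" one second instantly
console.log(allow()); // true: window reset
```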
122
179
  **Style & conventions**
123
180
  - Naming and patterns consistent with the rest of the codebase
@@ -0,0 +1,61 @@
1
+ ## Remediation Agent Template
2
+
3
+ Use this template when spawning remediation agents in Phase 3c. Replace all `{PLACEHOLDERS}` with actual values.
4
+
5
+ ```
6
+ <context>
7
+ Project type: {PROJECT_TYPE}
8
+ Build command: {BUILD_CMD}
9
+ Test command: {TEST_CMD}
10
+ Working directory: {WORKTREE_DIR} (this is a git worktree — all work happens here)
11
+ Foundation utilities available (if created):
12
+ {FOUNDATION_UTILS}
13
+ </context>
14
+
15
+ <findings>
16
+ {FINDINGS}
17
+ </findings>
18
+
19
+ <instructions>
20
+ You are {AGENT_NAME} on team better-{DATE}.
21
+
22
+ Your task: Fix all {CATEGORY} findings listed above.
23
+
24
+ FINDING VALIDATION — verify before fixing:
25
+ - Before fixing each finding, READ the file and at least 30 lines of surrounding
26
+ context to confirm the issue is genuine.
27
+ - Check whether the flagged code is already correct (e.g., a Promise chain that
28
+ IS properly awaited downstream, a value that IS validated earlier in the function,
29
+ a pattern that IS idiomatic for the framework).
30
+ - If the existing code is already correct, SKIP the fix and report it as a
31
+ false positive with a brief explanation of why the original code is fine.
32
+ - Do not make changes that are semantically equivalent to the original code
33
+ (e.g., wrapping a .then() chain in an async IIFE adds noise without fixing anything).
34
+ </instructions>
35
+
36
+ <guardrails>
37
+ - Only use APIs/functions verified to exist by reading source files. If a fix
38
+ requires an API you haven't confirmed, read the module's exports first.
39
+ - Fix with minimum change required. Do not introduce new abstractions or helpers
40
+ unless the finding specifically calls for it. A one-line fix beats a refactored module.
41
+ - If a git/build/file-read command fails, retry once after verifying the working
42
+ directory and path. If it fails again, report the error and move to the next finding.
43
+ </guardrails>
44
+
45
+ <commit_strategy>
46
+ Goal: each commit builds independently and contains one logical group of
47
+ related fixes. Use conventional prefixes (fix:, refactor:, feat:, security:).
48
+ Stage specific files only (`git -C {WORKTREE_DIR} add <specific files>` — never
49
+ `git add -A` or `git add .`). Run {BUILD_CMD} in {WORKTREE_DIR} before committing.
50
+ No co-author annotations or version bumps.
51
+ </commit_strategy>
52
+
53
+ CONFLICT AVOIDANCE:
54
+ - Only modify files listed in your assigned findings
55
+ - If you need to modify a file assigned to another agent, skip that change and report it
56
+
57
+ After all fixes:
58
+ - Ensure all changes are committed (no uncommitted work)
59
+ - Mark your task as completed via TaskUpdate
60
+ - Report: commits made, files modified, findings addressed, any skipped issues
61
+ ```
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "slash-do",
3
- "version": "1.4.3",
3
+ "version": "1.5.1",
4
4
  "description": "Curated slash commands for AI coding assistants — Claude Code, OpenCode, Gemini CLI, and Codex",
5
5
  "author": "Adam Eivy <adam@eivy.com>",
6
6
  "license": "MIT",
package/uninstall.sh CHANGED
@@ -32,6 +32,7 @@ OLD_COMMANDS=(cam good makegoals makegood optimize-md)
32
32
 
33
33
  LIBS=(
34
34
  code-review-checklist copilot-review-loop graphql-escaping
35
+ remediation-agent-template
35
36
  )
36
37
 
37
38
  HOOKS=(slashdo-check-update slashdo-statusline)