slash-do 1.4.3 → 1.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -49,6 +49,17 @@ When the resolved model is `opus`, **omit** the `model` parameter on the Agent/T
 
  Opus reduces false positives in audit (judgment-heavy). Sonnet is the floor for code-writing agents (remediation). Haiku works for fast first-pass pattern scanning but may produce more false positives — remediation agents (Sonnet+) validate before fixing.
 
+ ## Compaction Guidance
+
+ When compacting during this workflow, always preserve:
+ - The `FILE_OWNER_MAP` (complete, not summarized)
+ - All CRITICAL/HIGH findings with file:line references
+ - The current phase number and what phases remain
+ - All PR numbers and URLs created so far
+ - `BUILD_CMD`, `TEST_CMD`, `PROJECT_TYPE`, `WORKTREE_DIR` values
+ - `VCS_HOST`, `CLI_TOOL`, `DEFAULT_BRANCH`, `CURRENT_BRANCH`
+
+
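For example, a compacted summary satisfying this list might carry a state block like the following (all values illustrative, not from a real run):

```
FILE_OWNER_MAP: {"server/index.js": "security", "db/query.js": "arch-bugs"}   # complete, verbatim
Findings: CRITICAL server/index.js:42 hardcoded API key; HIGH db/query.js:17 SQL injection
Phase: 3 (remaining: 4, 4b, 5, 6, 7)
PRs: none created yet
BUILD_CMD: npm run build | TEST_CMD: npm test | PROJECT_TYPE: node | WORKTREE_DIR: ../better-20250101
VCS_HOST: github | CLI_TOOL: gh | DEFAULT_BRANCH: main | CURRENT_BRANCH: main
```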
  ## Phase 0: Discovery & Setup
 
  Detect the project environment before any scanning or remediation.
@@ -88,15 +99,8 @@ Record as `BUILD_CMD` and `TEST_CMD`.
  - Check for `.changelog/` directory → `HAS_CHANGELOG`
  - Check for existing `../better-*` worktrees: `git worktree list`. If found, inform the user and ask whether to resume (use existing worktree) or clean up (remove it and start fresh)
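The worktree check above can be sketched as follows (the `better-` naming and porcelain parsing are assumptions; a captured sample stands in for live `git` output so the sketch runs standalone):

```shell
# Parse `git worktree list --porcelain`-style output for better-* worktrees.
# `sample` is illustrative data, not a real repository listing.
sample='worktree /repo
branch refs/heads/main

worktree /repo-parent/better-20250101
branch refs/heads/better/20250101'

found=$(printf '%s\n' "$sample" | awk '$1 == "worktree" && $2 ~ /better-/ { print $2 }')
if [ -n "$found" ]; then
  echo "existing worktree found: $found"  # ask the user: resume or clean up?
fi
```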
 
- ### 0e: Browser Authentication (GitHub only)
- If `VCS_HOST` is `github`, proactively verify browser authentication for the Copilot review loop later:
- 1. Navigate to the repo URL using `browser_navigate` via Playwright MCP
- 2. Take a snapshot and check for user avatar/menu indicating logged-in state
- 3. If NOT logged in: navigate to `https://github.com/login`, inform the user **"Please log in to GitHub in the browser. I'll wait for you to complete authentication."**, and use `AskUserQuestion` to wait for the user to confirm they've logged in
- 4. Do NOT close the browser — it stays open for the entire session
- 5. Record `BROWSER_AUTHENTICATED = true` once confirmed
 
- This ensures the browser is ready before we need it in Phase 6, avoiding interruptions mid-flow.
+ <audit_instructions>
 
  ## Phase 1: Unified Audit
 
@@ -107,7 +111,7 @@ Launch 7 Explore agents in two batches. Each agent must report findings in this
  - **[CRITICAL/HIGH/MEDIUM/LOW]** `file:line` - Description. Suggested fix: ... Complexity: Simple/Medium/Complex
  ```
 
- **IMPORTANT: Context requirement for audit agents.** When flagging an issue, agents MUST read at least 30 lines of surrounding context to confirm the issue is real. Common false positives to watch for:
+ **Context requirement.** Before flagging, read at least 30 lines of surrounding context to confirm the issue is real. Common false positives to watch for:
  - A Promise `.then()` chain that appears "unawaited" but IS collected into an array and awaited via `Promise.all` downstream
  - A value that appears "unvalidated" but IS checked by a guard clause earlier in the function or by the caller
  - A pattern that looks like an anti-pattern in isolation but IS idiomatic for the specific framework or library being used
@@ -115,6 +119,17 @@ Launch 7 Explore agents in two batches. Each agent must report findings in this
 
  If the surrounding context shows the code is correct, do NOT flag it.
 
+ If uncertain whether something is a genuine issue, report it as **[UNCERTAIN]** with your reasoning. The consolidation phase will evaluate these separately. Fewer confident findings is better than padding with questionable ones.
+
+ <approach>
+ For each potential finding:
+ 1. Read the file and 30+ lines of surrounding context
+ 2. Quote the specific code that demonstrates the issue
+ 3. Explain why it's a problem given the context
+ 4. Only then classify severity and suggest a fix
+ Skip step 4 if steps 1-3 reveal the code is correct.
+ </approach>
+
  ### Batch 1 (5 parallel Explore agents via Task tool):
 
  **Model**: Pass `AUDIT_MODEL` as the `model` parameter on each agent. If `AUDIT_MODEL` is `opus`, omit the parameter to inherit from session.
@@ -122,6 +137,8 @@ If the surrounding context shows the code is correct, do NOT flag it.
  1. **Security & Secrets**
  Sources: authentication checks, credential exposure, infrastructure security, input validation, dependency health
  Focus: hardcoded credentials, API keys, exposed secrets, authentication bypasses, disabled security checks, PII exposure, injection vulnerabilities (SQL/command/path traversal), insecure CORS configurations, missing auth checks, unsanitized user input in file paths or queries, known CVEs in dependencies (check `npm audit` / `cargo audit` / `pip-audit` / `go vuln` output), abandoned or unmaintained dependencies, overly permissive dependency version ranges
+ OWASP Top 10 framing: broken auth (session fixation, credential stuffing), security misconfiguration (default creds, debug mode in prod), SSRF (user-controlled URLs in server fetch without allowlist), mass assignment (request bodies bound to models without field allowlist)
+ Supply chain: lockfile committed + frozen installs in CI, no untrusted postinstall scripts
 
  2. **Code Quality & Style**
  Sources: code brittleness, convention violations, test workarounds, logging & observability
@@ -134,10 +151,13 @@ If the surrounding context shows the code is correct, do NOT flag it.
  4. **Architecture & SOLID**
  Sources: structural violations, coupling analysis, modularity, API contract quality
  Focus: Single Responsibility violations (god files >500 lines, functions >50 lines doing multiple things), tight coupling between modules, circular dependencies, mixed concerns in single files, dependency inversion violations, classes/modules with too many responsibilities (>20 public methods), deep nesting (>4 levels), long parameter lists, modules reaching into other modules' internals, inconsistent API error response shapes across endpoints, list endpoints missing pagination, missing rate limiting on public endpoints, inconsistent request/response envelope patterns
+ API contract consistency: breaking response shape changes without versioning, inconsistent error envelopes across endpoints, missing deprecation headers on sunset endpoints
 
  5. **Bugs, Performance & Error Handling**
  Sources: runtime safety, resource management, async correctness, performance, race conditions
  Focus: missing `await` on async calls, unhandled promise rejections, null/undefined access without guards, off-by-one errors, incorrect comparison operators, mutation of shared state, resource leaks (unbounded caches/maps, unclosed connections/streams), `process.exit()` in library code, async routes without error forwarding, missing AbortController on data fetching, N+1 query patterns (loading related records inside loops), O(n²) or worse algorithms in hot paths, unbounded result sets (missing LIMIT/pagination on DB queries), missing database indexes on frequently queried columns, race conditions (TOCTOU, double-submit without idempotency keys, concurrent writes to shared state without locks, stale-read-then-write patterns), missing connection pooling or pool exhaustion
+ Resilience: external calls without timeouts, missing fallback for unavailable downstream services, retry without backoff ceiling/jitter, missing health check endpoints
+ Observability: production paths without structured logging, error logs missing reproduction context (request ID, input params), async flows without correlation IDs
 
  ### Batch 2 (2 agents after Batch 1 completes):
 
@@ -150,6 +170,7 @@ If the surrounding context shows the code is correct, do NOT flag it.
  - **Python**: mutable default arguments, bare except clauses, missing type hints on public APIs, sync I/O in async contexts
  - **Go**: unchecked errors, goroutine leaks, defer in loops, context propagation gaps
  - **Web projects (any stack)**: accessibility issues — missing alt text on images, broken keyboard navigation, missing ARIA labels on interactive elements, insufficient color contrast, form inputs without associated labels
+ - **Database migrations**: exclusive-lock ALTER TABLE on large tables, CREATE INDEX without CONCURRENTLY, missing down migrations or untested rollback paths
  - General: framework-specific security issues, language-specific gotchas, domain-specific compliance, environment variable hygiene (missing `.env.example`, required env vars not validated at startup, secrets in config files that should be in env)
 
  7. **Test Coverage**
@@ -158,12 +179,16 @@ If the surrounding context shows the code is correct, do NOT flag it.
 
  Wait for ALL agents to complete before proceeding.
 
+ </audit_instructions>
+
+ <plan_and_remediate>
+
  ## Phase 2: Plan Generation
 
  1. Read the existing `PLAN.md` (create if it doesn't exist)
  2. Consolidate all findings from Phase 1, deduplicating across agents (same file:line flagged by multiple agents → keep the most specific description)
  3. Identify **shared utility extractions** — patterns duplicated 3+ times that should become reusable functions. Group these as "Foundation" work for Phase 3b.
- 4. **Build the file ownership map** (CRITICAL for Phase 5):
+ 4. **Build the file ownership map** (required by Phase 5 for conflict-free PRs):
  - For each finding, record which file(s) it touches
  - Assign each file to exactly ONE category (its primary category)
  - If a file is touched by multiple categories, assign it to the category with the highest-severity finding for that file
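The deduplication rule from step 2 can be sketched as a plain text pass, assuming a hypothetical `file:line|description` record format (the sample findings are illustrative):

```shell
# Keep one record per file:line, preferring the longest (most specific) description.
cat > /tmp/findings.txt <<'EOF'
server/index.js:42|Hardcoded key
server/index.js:42|Hardcoded API key committed in source; move to an env var
db/query.js:17|SQL injection via string concatenation
EOF

deduped=$(sort -t'|' -k1,1 /tmp/findings.txt | awk -F'|' '
  $1 != key { if (key != "") print key "|" best; key = $1; best = $2; next }
  length($2) > length(best) { best = $2 }
  END { if (key != "") print key "|" best }')
printf '%s\n' "$deduped"
```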
@@ -266,58 +291,17 @@ If no shared utilities were identified, skip this step.
  4. Spawn up to 5 general-purpose agents as teammates. **Pass `REMEDIATION_MODEL` as the `model` parameter on each agent.** If `REMEDIATION_MODEL` is `opus`, omit the parameter to inherit from session.
 
  ### Agent instructions template:
- ```
- You are {agent-name} on team better-{DATE}.
-
- Your task: Fix all {CATEGORY} findings from the Good audit.
- Working directory: {WORKTREE_DIR} (this is a git worktree — all work happens here)
-
- Project type: {PROJECT_TYPE}
- Build command: {BUILD_CMD}
- Test command: {TEST_CMD}
-
- Foundation utilities available (if created):
- {list of utility files with brief descriptions}
-
- Findings to address:
- {filtered list of CRITICAL/HIGH/MEDIUM findings for this category}
-
- FINDING VALIDATION — verify before fixing:
- - Before fixing each finding, READ the file and at least 30 lines of surrounding
- context to confirm the issue is genuine.
- - Check whether the flagged code is already correct (e.g., a Promise chain that
- IS properly awaited downstream, a value that IS validated earlier in the function,
- a pattern that IS idiomatic for the framework).
- - If the existing code is already correct, SKIP the fix and report it as a
- false positive with a brief explanation of why the original code is fine.
- - Do not make changes that are semantically equivalent to the original code
- (e.g., wrapping a .then() chain in an async IIFE adds noise without fixing anything).
-
- COMMIT STRATEGY — commit early and often:
- - After completing each logical group of related fixes, stage those files
- and commit immediately with a descriptive conventional commit message.
- - Each commit should be independently valid (build should pass).
- - Run {BUILD_CMD} in {WORKTREE_DIR} before each commit to verify.
- - Use `git -C {WORKTREE_DIR} add <specific files>` — never `git add -A` or `git add .`
- - Use `git -C {WORKTREE_DIR} commit -m "prefix: description"`
- - Use conventional commit prefixes: fix:, refactor:, feat:, security:
- - Do NOT include co-author or generated-by annotations in commits.
- - Do NOT bump the version — that happens once at the end.
-
- After all fixes:
- - Ensure all changes are committed (no uncommitted work)
- - Mark your task as completed via TaskUpdate
- - Report: commits made, files modified, findings addressed, any skipped issues
-
- CONFLICT AVOIDANCE:
- - Only modify files listed in your assigned findings
- - If you need to modify a file assigned to another agent, skip that change and report it
- ```
+
+ !`cat ~/.claude/lib/remediation-agent-template.md`
 
  ### Conflict avoidance:
  - Review all findings before task assignment. If two categories touch the same file, assign both sets of findings to the same agent.
  - Security agent gets priority on validation logic; DRY agent gets priority on import consolidation.
 
+ </plan_and_remediate>
+
+ <verification_and_pr>
+
  ## Phase 4: Verification
 
  After all agents complete:
@@ -337,6 +321,34 @@ After all agents complete:
  4. Shut down all agents via `SendMessage` with `type: "shutdown_request"`
  5. Clean up team via `TeamDelete`
 
+ ## Phase 4b: Internal Code Review
+
+ Before creating PRs, run a deep code review on all remediation changes to catch issues that automated agents may have introduced.
+
+ 1. Generate the diff of all changes in the worktree:
+ ```bash
+ cd {WORKTREE_DIR} && git diff {DEFAULT_BRANCH}...HEAD
+ ```
+ 2. Review the diff against the code review checklist:
+ ```
+ !`cat ~/.claude/lib/code-review-checklist.md`
+ ```
+ 3. For each issue found:
+ - Fix in a new commit: `fix: {description of review finding}`
+ - Re-run `{BUILD_CMD}` and `{TEST_CMD}` to verify
+ 4. Present a summary of review findings and fixes to the user via `AskUserQuestion`:
+ ```
+ AskUserQuestion([{
+ question: "Code review complete. {N} issues found and fixed. {list}. Proceed to PR creation?",
+ options: [
+ { label: "Proceed", description: "Create per-category PRs" },
+ { label: "Show diff", description: "Show the full diff for manual review before proceeding" },
+ { label: "Abort", description: "Stop here — I'll review manually" }
+ ]
+ }])
+ ```
+ 5. If "Show diff" selected, print the diff and re-ask. If "Abort", stop and print the worktree path.
+
  ## Phase 5: Per-Category PR Creation
 
  Instead of one mega PR, create **separate branches and PRs for each category**. This enables independent review, targeted CI, and granular merge decisions.
@@ -359,9 +371,9 @@ For each category that has findings:
  ```
  5. Push the branch: `git push -u origin better/{CATEGORY_SLUG}`
 
- **CRITICAL: File isolation rule** — each file must appear in exactly ONE branch. If a file has changes from multiple categories (e.g., `server/index.js` with both security and stack-specific changes), assign the whole file to one category based on the file ownership map. Do not split file-level changes across PRs.
+ **File isolation rule** (one file per branch) — each file must appear in exactly ONE branch. If a file has changes from multiple categories (e.g., `server/index.js` with both security and stack-specific changes), assign the whole file to one category based on the file ownership map. Do not split file-level changes across PRs.
 
- **CRITICAL: Cross-PR dependency check** — after building all branches, verify each branch builds independently:
+ **Cross-PR dependency check** — verify each branch builds independently:
  ```bash
  git checkout better/{CATEGORY_SLUG} && {BUILD_CMD}
  ```
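A dry-run sketch of running this check across every category branch (the slug list is illustrative and `{BUILD_CMD}` remains a placeholder; swap in the categories that actually have findings):

```shell
# Print the independent-build verification command for each category branch.
cmds=$(for slug in security code-quality dry arch-bugs stack-specific; do
  printf 'git checkout better/%s && {BUILD_CMD}\n' "$slug"
done)
printf '%s\n' "$cmds"
```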
@@ -465,110 +477,19 @@ After creating all PRs, verify CI passes on each one:
 
  Maximum 5 iterations per PR to prevent infinite loops.
 
- **IMPORTANT — Sub-agent delegation**: To prevent context exhaustion on long review cycles with multiple PRs, delegate each PR's review loop to a **separate general-purpose sub-agent** via the Agent tool. Launch sub-agents in parallel (one per PR). Each sub-agent runs the full loop (request → wait → check → fix → re-request) autonomously and returns only the final status.
-
- ### 6.0: Verify browser authentication
-
- If `BROWSER_AUTHENTICATED` is not true (e.g., Phase 0e was skipped or failed):
- 1. Navigate to the first PR URL using `browser_navigate`
- 2. Check for user avatar/menu
- 3. If not logged in: navigate to `https://github.com/login`, inform the user **"Please log in to GitHub in the browser. I'll wait for you to confirm."**, and use `AskUserQuestion` to wait
-
- ### 6.1: Determine review request method
+ **Sub-agent delegation** (prevents context exhaustion): delegate each PR's review loop to a **separate general-purpose sub-agent** via the Agent tool. Launch sub-agents in parallel (one per PR). Each sub-agent runs the full loop (request → wait → check → fix → re-request) autonomously and returns only the final status.
 
- **Try the API first** on any one PR:
- ```bash
- gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/requested_reviewers \
- -f 'reviewers[]=copilot-pull-request-reviewer[bot]'
- ```
-
- If this returns 422 ("not a collaborator"), record `REVIEW_METHOD=playwright`. Otherwise record `REVIEW_METHOD=api`.
+ ### 6.1: Launch parallel sub-agents (one per PR)
 
- ### 6.2: Launch parallel sub-agents (one per PR)
+ For each PR, spawn a general-purpose sub-agent using the shared review loop template:
 
- For each PR, spawn a general-purpose sub-agent with:
+ !`cat ~/.claude/lib/copilot-review-loop.md`
 
- ```
- You are a Copilot review loop agent for PR {PR_NUMBER}.
-
- Repository: {OWNER}/{REPO}
- Branch: better/{CATEGORY_SLUG}
- Build command: {BUILD_CMD}
- Review request method: {REVIEW_METHOD}
- Max iterations: 5
-
- DECREASING TIMEOUT SCHEDULE (shorter than single-PR review since multiple
- PRs are reviewed in parallel — see do:rpr for single-PR dynamic timing):
- - Iteration 1: max wait 5 minutes
- - Iteration 2: max wait 4 minutes
- - Iteration 3: max wait 3 minutes
- - Iteration 4: max wait 2 minutes
- - Iteration 5+: max wait 1 minute
- Poll interval: 30 seconds for all iterations.
-
- Run the following loop until Copilot returns zero new comments or you hit
- the max iteration limit:
-
- 1. CAPTURE the latest Copilot review timestamp, then REQUEST a new review:
- - First, capture the latest Copilot review timestamp via GraphQL:
- echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviews(last: 20) { nodes { author { login } submittedAt } } } } }"}' | gh api graphql --input -
- - Find the most recent submittedAt where author.login is
- copilot-pull-request-reviewer[bot] and record as LAST_COPILOT_SUBMITTED_AT.
- - If no prior Copilot review exists, record LAST_COPILOT_SUBMITTED_AT=NONE
- and treat the next Copilot review as NEW regardless of timestamp.
- - Then REQUEST:
- If REVIEW_METHOD is "api":
- gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/requested_reviewers \
- -f 'reviewers[]=copilot-pull-request-reviewer[bot]'
- If REVIEW_METHOD is "playwright":
- Navigate to the PR URL, click the "Reviewers" gear button, click the
- Copilot menuitemradio option, verify sidebar shows "Awaiting requested
- review from Copilot"
-
- 2. WAIT for the review (BLOCKING):
- - Poll using stdin JSON piping (avoid shell-escaping issues):
- echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviews(last: 5) { totalCount nodes { state body author { login } submittedAt } } reviewThreads(first: 100) { nodes { id isResolved comments(first: 3) { nodes { body path line author { login } } } } } } } }"}' | gh api graphql --input -
- - Complete when a new copilot-pull-request-reviewer[bot] review appears
- with submittedAt after LAST_COPILOT_SUBMITTED_AT captured in step 1
- (or, if LAST_COPILOT_SUBMITTED_AT=NONE, when the first
- copilot-pull-request-reviewer[bot] review for this loop appears)
- - Use the DECREASING TIMEOUT for the current iteration number
- - Error detection: if review body contains "Copilot encountered an error"
- or "unable to review", re-request and resume. Max 3 error retries.
- - If no review after max wait, report timeout and exit
-
- 3. CHECK for unresolved threads:
- Fetch threads via stdin JSON piping:
- echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviewThreads(first: 100) { nodes { id isResolved comments(first: 10) { nodes { body path line author { login } } } } } } } }"}' | gh api graphql --input -
- - Verify review was successful (no error text in body)
- - If zero comments / no unresolved threads: report success and exit
- - If unresolved threads exist: proceed to step 4
-
- 4. FIX all unresolved threads:
- For each unresolved thread:
- - Read the referenced file and understand the feedback
- - Evaluate: valid feedback → make the fix; informational/false positive →
- resolve without changes
- - If fixing:
- git checkout better/{CATEGORY_SLUG}
- # make changes
- git add <specific files>
- git commit -m "address Copilot review feedback"
- git push
- - Resolve thread via stdin JSON piping:
- echo '{"query":"mutation { resolveReviewThread(input: {threadId: \"{THREAD_ID}\"}) { thread { id isResolved } } }"}' | gh api graphql --input -
- - After all threads resolved, increment iteration and go back to step 1
-
- When done, report back:
- - Final status: clean / max-iterations-reached / timeout / error
- - Total iterations completed
- - List of commits made (if any)
- - Any unresolved threads remaining
- ```
+ Pass each sub-agent the PR-specific variables: `{PR_NUMBER}`, `{OWNER}/{REPO}`, `better/{CATEGORY_SLUG}`, and `{BUILD_CMD}`.
 
  Launch all PR sub-agents in parallel. Wait for all to complete.
 
- ### 6.3: Handle sub-agent results
+ ### 6.2: Handle sub-agent results
 
  For each sub-agent result:
  - **clean**: mark PR as ready to merge
@@ -576,9 +497,28 @@ For each sub-agent result:
  - **max-iterations-reached**: inform the user "Reached max review iterations (5) on PR #{number}. Remaining issues may need manual review."
  - **error**: inform the user and ask whether to retry or skip
 
+ ### 6.3: Merge Gate (MANDATORY)
+
+ **Do NOT merge any PR until Copilot review has completed (approved or commented) on ALL PRs, or the user explicitly approves skipping.**
+
+ Present the review status summary to the user via `AskUserQuestion`:
+ ```
+ AskUserQuestion([{
+ question: "Copilot review status:\n{for each PR: #number - status (approved/comments/pending/timeout)}\n\nHow would you like to proceed?",
+ options: [
+ { label: "Merge approved PRs", description: "Merge only PRs with passing review" },
+ { label: "Merge all", description: "Merge all PRs regardless of review status" },
+ { label: "Wait", description: "Wait longer for pending reviews" },
+ { label: "Don't merge", description: "Leave PRs open for manual review" }
+ ]
+ }])
+ ```
+
+ Only proceed with merging based on the user's selection. Never auto-merge without user confirmation.
+
  ### 6.4: Merge
 
- For each PR that has passed CI and review (in dependency order if applicable):
+ For each PR approved for merge (in dependency order if applicable):
  ```bash
  gh pr merge {PR_NUMBER} --merge
  ```
@@ -598,17 +538,24 @@ If merge fails (e.g., branch protection, merge conflicts from a prior PR):
  Then re-run CI check before merging.
  - If branch protection: inform the user and suggest manual merge
 
+ </verification_and_pr>
+
  ## Phase 7: Cleanup
 
  1. Remove the worktree:
  ```bash
  git worktree remove {WORKTREE_DIR}
  ```
- 2. Delete local branches (only if merged):
+ 2. Delete local AND remote branches (only if merged):
  ```bash
  git branch -d better/{DATE}
  git branch -d better/security better/code-quality better/dry better/arch-bugs better/stack-specific
  ```
+ ```bash
+ git push origin --delete better/{DATE}
+ git push origin --delete better/security better/code-quality better/dry better/arch-bugs better/stack-specific
+ ```
+ Ignore errors from `--delete` if a branch doesn't exist remotely.
  3. Restore stashed changes (if stashed in Phase 3a):
  ```bash
  git stash pop
@@ -643,7 +590,6 @@ If merge fails (e.g., branch protection, merge conflicts from a prior PR):
  - **Copilot review loop exceeds 5 iterations per PR**: stop iterating on that PR, inform user, proceed to merge
  - **Existing worktree found at startup**: ask user — resume (reuse worktree) or cleanup (remove and start fresh)
  - **No findings above LOW**: skip Phases 3-7, print "No actionable findings" with the LOW summary
- - **Browser not authenticated**: use `AskUserQuestion` to ask the user to log in — never skip this or close the browser
  - **Merge conflict after prior PR merged**: rebase the branch onto the updated default branch, push with `--force-with-lease`, re-run CI
 
  !`cat ~/.claude/lib/graphql-escaping.md`
@@ -59,18 +59,39 @@ Before committing, ensure the fork is up to date with upstream:
  git push -u origin {CURRENT_BRANCH}
  ```
 
- ## Local Code Review (before opening PR)
+ ## Local Code Review (REQUIRED GATE)
+
+ Fork PRs go to upstream maintainers who can't easily ask for changes — getting it right the first time matters more here than on internal PRs.
+
+ <review_gate>
 
  1. Fetch upstream default branch for accurate diff:
  ```bash
  git fetch upstream {UPSTREAM_DEFAULT_BRANCH}
  ```
- 2. Run `git diff upstream/{UPSTREAM_DEFAULT_BRANCH}...{CURRENT_BRANCH}` to see the full diff against upstream
- 3. **For each changed file**, read the full file (not just the diff hunks) and check:
+ 2. Run `git diff upstream/{UPSTREAM_DEFAULT_BRANCH}...{CURRENT_BRANCH}` to get the list of changed files
+ 3. For every changed file:
+ a. Read the entire file using the Read tool (not just diff hunks)
+ b. Check it against the tiered checklist below (always check Tiers 1+4; check Tiers 2-3 when relevance filters match)
+ c. For each finding, quote the specific code line and explain why it's a problem
+ 4. After reviewing all files, verify: does the code actually deliver what the commits claim?
+ 5. Print a review summary table (see do:review for format)
+ 6. Fix any issues, recommit, and push before proceeding
+ 7. Only after printing the review summary may you proceed to "Open the PR"
+
+ If the diff touches more than 15 files, delegate later batches to a subagent to keep context clean.
+
+ </review_gate>
+
+ Checklist to apply to each file:
 
  !`cat ~/.claude/lib/code-review-checklist.md`
- 4. If issues are found, fix them, recommit, and push before proceeding
- 5. Summarize the review findings so the user can see what was checked
+
+ Verification checklist (confirm before proceeding):
+ - [ ] Read every changed file in full (not just diffs)
+ - [ ] Checked each file against the relevant checklist tiers
+ - [ ] Quoted specific code for each finding
+ - [ ] Printed a review summary table with findings
 
  ## Check for Upstream Contributing Guidelines
 
package/commands/do/pr.md CHANGED
@@ -17,17 +17,36 @@ Print: `PR flow: {current_branch} → {default_branch}`
  - Keep commit message concise and do not use co-author information
  - Push the branch to remote: `git pull --rebase --autostash && git push -u origin {current_branch}`
 
- ## Local Code Review (before opening PR)
+ ## Local Code Review (REQUIRED GATE)
 
- Before creating the PR, perform a thorough self-review. Read each changed file, not just the diff, to understand how the changes behave at runtime.
+ This review catches bugs that Copilot misses: incomplete pattern copying is the #1 source of post-merge review feedback. Skipping it costs more time in review cycles than it saves.
 
- 1. Run `git diff {default_branch}...{current_branch}` to see the full diff
- 2. **For each changed file**, read the full file (not just the diff hunks) and check:
+ <review_gate>
+
+ 1. Read commit messages to understand what this change claims to do
+ 2. Run `git diff {default_branch}...{current_branch}` to get the list of changed files
+ 3. For every changed file:
+ a. Read the entire file using the Read tool (not just diff hunks)
+ b. Check it against the tiered checklist below (always check Tiers 1+4; check Tiers 2-3 when relevance filters match)
+ c. For each finding, quote the specific code line and explain why it's a problem
+ 4. After reviewing all files, verify: does the code actually deliver what the commits claim?
+ 5. Print a review summary table (see do:review for format)
+ 6. Fix any issues, run tests, and verify tests cover the changed code paths
+ 7. Only after printing the review summary may you proceed to "Open the PR"
+
+ If the diff touches more than 15 files, delegate later batches to a subagent to keep context clean.
+
+ </review_gate>
+
+ Checklist to apply to each file:
 
  !`cat ~/.claude/lib/code-review-checklist.md`
 
- 3. If issues are found, fix them and amend/recommit before proceeding
- 4. Summarize the review findings (even if clean) so the user can see what was checked
+ Verification checklist (confirm before proceeding):
+ - [ ] Read every changed file in full (not just diffs)
+ - [ ] Checked each file against the relevant checklist tiers
+ - [ ] Quoted specific code for each finding
+ - [ ] Printed a review summary table with findings
 
  ## Open the PR
 
@@ -57,17 +57,36 @@ If ambiguous, ask the user to confirm before proceeding.
 
  4. **Commit the release**: Stage `package.json`, `package-lock.json`, and the changelog file. Commit with message `chore: release v{new_version}`
 
- ## Local Code Review (before opening PR)
+ ## Local Code Review (REQUIRED GATE)
 
- Perform a thorough self-review. Read each changed file, not just the diff, to understand how the changes behave at runtime.
62
+ A release without a deep code review ships bugs to users. This review is the last line of defense the full diff since the last release often contains interactions that individual PR reviews missed.
 
- 1. Run `git diff {target}...{source}` to see the full diff
- 2. **For each changed file**, read the full file (not just the diff hunks) and check:
+ <review_gate>
+
+ 1. Read all commit messages since last release to understand the scope
+ 2. Run `git diff {target}...{source}` to get the list of changed files
+ 3. For every changed file:
+ a. Read the entire file using the Read tool (not just diff hunks)
+ b. Check it against the tiered checklist below (always check Tiers 1+4; check Tiers 2-3 when relevance filters match)
+ c. For each finding, quote the specific code line and explain why it's a problem
+ 4. After reviewing all files, verify: does the aggregate change set deliver what the release claims?
+ 5. Print a review summary table (see do:review for format)
+ 6. Fix any issues, run tests, verify tests cover the changed code paths, commit and push
+ 7. Only after printing the review summary may you proceed to "Open the Release PR"
+
+ If the diff touches more than 15 files, delegate later batches to a subagent to keep context clean.
+
+ </review_gate>
+
+ Checklist to apply to each file:
 
  !`cat ~/.claude/lib/code-review-checklist.md`
 
- 3. If issues are found, fix them, commit, and push before proceeding
- 4. Summarize the review findings (even if clean) so the user can see what was checked
85
+ Verification — confirm before proceeding:
86
+ - [ ] Read every changed file in full (not just diffs)
87
+ - [ ] Checked each file against the relevant checklist tiers
88
+ - [ ] Quoted specific code for each finding
89
+ - [ ] Printed a review summary table with findings
71
90
 
72
91
  ## Open the Release PR
73
92
 
@@ -17,6 +17,22 @@ If there are no changes, inform the user and stop.
17
17
 
18
18
  CLAUDE.md is already loaded into your context. Use its rules (code style, error handling, logging, security model, scope exclusions) as overrides to generic best practices throughout this review. For example, if CLAUDE.md says "no auth needed — internal tool", do not flag missing authentication.
19
19
 
20
+ <review_instructions>
21
+
22
+ ## PR-Level Coherence Check
23
+
24
+ Before reviewing individual files, understand what this change set claims to do:
25
+
26
+ 1. Read commit messages (`git log {base}...HEAD --oneline`)
27
+ 2. After reviewing all files, verify: does the changed code actually deliver what the commits claim? Flag any claims not backed by code (e.g., "adds rate limiting" but only adds a comment).
28
+
29
+ ## Large PR Strategy
30
+
31
+ If the diff touches more than 15 files, split the review into batches:
32
+ 1. Group files by module/directory
33
+ 2. Review each batch, printing findings as you go
34
+ 3. Delegate files beyond the first 15 to a subagent if context is getting full
35
+
20
36
  ## Deep File Review
21
37
 
22
38
  For **each changed file** in the diff, read the **entire file** (not just diff hunks). Reviewing only the diff misses context bugs where new code interacts incorrectly with existing code.
@@ -59,12 +75,20 @@ With the flow understood, evaluate the changed code against these principles:
59
75
 
60
76
  Only flag principle violations that are **concrete and actionable** in the changed code. Do not flag pre-existing design issues in untouched code unless the changes make them worse.
61
77
 
78
+ </review_instructions>
79
+
80
+ <checklist>
81
+
62
82
  ### Per-File Checklist
63
83
 
64
- Check every file against this checklist:
84
+ Check every file against this checklist. The checklist is organized into tiers — always check Tiers 1 and 4, and check Tiers 2-3 only when the relevance filter matches the file:
65
85
 
66
86
  !`cat ~/.claude/lib/code-review-checklist.md`
67
87
 
88
+ </checklist>
89
+
90
+ <deep_checks>
91
+
68
92
  ### Additional deep checks (read surrounding code to verify):
69
93
 
70
94
  **Cross-file consistency**
@@ -87,6 +111,7 @@ Check every file against this checklist:
87
111
 
88
112
  **Access scope changes**
89
113
  - If the PR widens access to an endpoint or resource (admin→public, internal→external), trace all shared dependencies the endpoint uses (rate limiters, queues, connection pools, external service quotas) and assess whether they were sized for the previous access level — in-memory/process-local limiters don't enforce limits across horizontally scaled instances
114
+ - If the PR adds endpoints under a restricted route group (admin, internal, scoped), read sibling endpoints in the same route group and verify the new endpoint applies the same authorization gate — missing gates on admin-mounted endpoints are consistently the most dangerous review finding
90
115
 
91
116
  **Guard-before-cache ordering**
92
117
  - If a handler performs a pre-flight guard check (rate limit, quota, feature flag) before a cache lookup or short-circuit path, verify the guard doesn't block operations that would be served from cache without touching the guarded resource — restructure so cache hits bypass the guard
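The guard-before-cache restructuring above can be sketched in a few lines of illustrative JavaScript (the limiter and handler names are hypothetical, not from this package):

```javascript
const cache = new Map();
let tokens = 1; // deliberately tiny rate-limit budget for illustration

function takeToken() {
  if (tokens <= 0) return false;
  tokens -= 1;
  return true;
}

// Correct ordering: cache hits return before the guard is consulted,
// so the rate limit only protects the expensive computation.
function handle(key, compute) {
  if (cache.has(key)) return cache.get(key);
  if (!takeToken()) throw new Error('rate limited');
  const value = compute();
  cache.set(key, value);
  return value;
}
```

With the guard placed first instead, a cache hit would still burn a token even though the guarded resource is never touched.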
@@ -123,17 +148,51 @@ Check every file against this checklist:
123
148
  **Data model vs access pattern alignment**
124
149
  - If the PR adds queries that claim ordering (e.g., "recent", "top"), verify the underlying key/index design actually supports that ordering natively — random UUIDs and non-time-sortable keys require full scans and in-memory sorting, which degrades at scale
125
150
 
151
+ **Deletion/lifecycle cleanup completeness**
152
+ - If the PR adds a delete or destroy function, trace all resources created during the entity's lifecycle (data directories, git branches, child records, temporary files, worktrees) and verify each is cleaned up on deletion. Compare with existing delete functions in the codebase for completeness patterns
153
+
154
+ **Update schema depth**
155
+ - If the PR derives an update/patch schema from a create schema (e.g., `.partial()`, `Partial<T>`), verify that nested objects also become partial — shallow partial on deeply-required schemas rejects valid partial updates where the caller only wants to change one nested field
156
+
157
+ **Mutation return value freshness**
158
+ - If a function mutates an entity and returns it, verify the returned object reflects the post-mutation state, not a pre-read snapshot. Also check whether dependent scheduling/evaluation state (backoff, timers, status flags) is reset when a "force" or "trigger" operation is invoked
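A minimal JavaScript sketch of the stale-snapshot trap above (entity and field names are hypothetical):

```javascript
const db = new Map([['m-1', { id: 'm-1', active: false, nextRunAt: 9999 }]]);

// Anti-pattern: returns a snapshot read before the mutation
function staleTrigger(id) {
  const snapshot = { ...db.get(id) };
  db.get(id).active = true;
  return snapshot; // caller sees active: false
}

// Fix: mutate first, reset dependent scheduling state, return the live entity
function freshTrigger(id) {
  const entity = db.get(id);
  entity.active = true;
  entity.nextRunAt = 0; // a "force" operation should also reset backoff/timers
  return entity;
}
```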
159
+
160
+ **Responsibility relocation audit**
161
+ - If the PR moves a responsibility from one module to another (e.g., a database write from a handler to middleware, a computation from client to server), trace all code at the old location that depended on the timing, return value, or side effects of the moved operation — guards, response fields, in-memory state updates, and downstream scheduling that assumed co-located execution. Verify the new execution point preserves these contracts or that dependents are updated. Check for dead code left behind at the old location
162
+
163
+ **Read-after-write consistency**
164
+ - If the PR writes to a data store and then immediately queries that store (especially scans, aggregations, or replica reads), check whether the store's consistency model guarantees visibility of the write. If not, flag the read as potentially stale and suggest computing from in-memory state, using consistent-read options, or adding a delay/caveat
165
+
126
166
  **Formatting & structural consistency**
127
167
  - If the PR adds content to an existing file (list items, sections, config entries), verify the new content matches the file's existing indentation, bullet style, heading levels, and structure — rendering inconsistencies are the most common Copilot review finding
128
168
 
169
+ </deep_checks>
170
+
171
+ <verify_findings>
172
+
173
+ ## Verify Findings
174
+
175
+ For each issue found, ground it in evidence before classifying:
176
+ 1. **Quote the specific code line(s)** that demonstrate the issue
177
+ 2. **Explain why it's a problem** in one sentence given the surrounding context
178
+ 3. If the fix involves async/state changes, **trace the execution path** to confirm the issue is real
179
+ 4. If you cannot quote specific code for a finding, downgrade it to **[UNCERTAIN]**
180
+
181
+ After verifying all findings, run the project's build and test commands to confirm no false positives.
182
+
183
+ </verify_findings>
184
+
185
+ <fix_and_report>
186
+
129
187
  ## Fix Issues Found
130
188
 
131
- For each issue found:
189
+ For each verified issue:
132
190
  1. Classify severity: **CRITICAL** (runtime crash, data leak, security) vs **IMPROVEMENT** (consistency, robustness, conventions)
133
191
  2. Fix all CRITICAL issues immediately
134
192
  3. For IMPROVEMENT issues, fix them too — the goal is to eliminate Copilot review round-trips
135
193
  4. After fixes, run the project's test suite and build command (per project conventions already in context)
136
- 5. Commit fixes: `refactor: address code review findings`
194
+ 5. Verify the test suite covers the changed code paths — passing unrelated tests is not validation
195
+ 6. Commit fixes: `refactor: address code review findings`
137
196
 
138
197
  ## Report
139
198
 
@@ -155,3 +214,5 @@ Print a summary table of what was reviewed and found:
155
214
  ```
156
215
 
157
216
  If no issues were found, confirm the code is clean and ready for PR.
217
+
218
+ </fix_and_report>
package/install.sh CHANGED
@@ -36,6 +36,7 @@ OLD_COMMANDS=(cam good makegoals makegood optimize-md)
36
36
 
37
37
  LIBS=(
38
38
  code-review-checklist copilot-review-loop graphql-escaping
39
+ remediation-agent-template
39
40
  )
40
41
 
41
42
  HOOKS=(slashdo-check-update slashdo-statusline)
@@ -1,3 +1,11 @@
1
+ <!--
2
+ Triage: Check Tiers 1 and 4 for every file. Check Tier 2/3 only when
3
+ the relevance filter matches the changed code. This prevents important
4
+ checks from being lost in a long list.
5
+ -->
6
+
7
+ ## Tier 1 — Always Check (Runtime Crashes, Security, Hygiene)
8
+
1
9
  **Hygiene**
2
10
  - Leftover debug code (`console.log`, `debugger`, TODO/FIXME/HACK), hardcoded secrets/credentials, and uncommittable files (.env, node_modules, build artifacts)
3
11
  - Overly broad changes that should be split into separate PRs
@@ -11,113 +19,162 @@
11
19
  - Type coercion edge cases — `Number('')` is `0` not empty, `0` is falsy in truthy checks, `NaN` comparisons are always false; string comparison operators (`<`, `>`, `localeCompare`) do lexicographic, not semantic, ordering (e.g., `"10" < "2"`). Use explicit type checks (`Number.isFinite()`, `!= null`) and dedicated libraries (e.g., semver for versions) instead of truthy guards or lexicographic ordering when zero/empty are valid values or semantic ordering matters
12
20
  - Functions that index into arrays without guarding empty arrays; state/variables declared but never updated or only partially wired up
13
21
  - Shared mutable references — module-level defaults passed by reference mutate across calls (use `structuredClone()`/spread); `useCallback`/`useMemo` referencing a later `const` (temporal dead zone); object spread followed by unconditional assignment that clobbers spread values
14
- - Side effects during React render (setState, navigation, mutations outside useEffect)
22
+ - Functions with >10 branches or >15 cyclomatic complexity — refactor into smaller units
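The coercion and ordering traps in the bullets above are easy to demonstrate with plain JavaScript (this is standard language behavior, not package code):

```javascript
const port = Number('');       // 0, not NaN — empty string coerces to zero
const isSet = Boolean(port);   // false, even though 0 can be a valid value
const wrongOrder = '10' < '2'; // true — string comparison is lexicographic

// Explicit checks avoid both traps:
function isValidPort(value) {
  const n = Number(value);
  return Number.isFinite(n) && n >= 0 && n <= 65535;
}
const numericOrder = Number('10') < Number('2'); // false — compared as numbers
```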
23
+
24
+ **API & URL safety**
25
+ - User-supplied or system-generated values interpolated into URL paths, shell commands, file paths, or subprocess arguments without encoding/validation — use `encodeURIComponent()` for URLs, regex allowlists for execution boundaries. Generated identifiers used as URL path segments must be safe for your router/storage (no `/`, `?`, `#`; consider allowlisting characters and/or applying `encodeURIComponent()`). Identifiers derived from human-readable names (slugs) used for namespaced resources (git branches, directories) need a unique suffix (ID, hash) to prevent collisions between entities with the same or similar names
26
+ - Route params passed to services without format validation; path containment checks using string prefix without path separator boundary (use `path.relative()`)
27
+ - Error/fallback responses that hardcode security headers instead of using centralized policy — error paths bypass security tightening
28
+
29
+ **Trust boundaries & data exposure**
30
+ - API responses returning full objects with sensitive fields — destructure and omit across ALL response paths (GET, PUT, POST, error, socket); comments/docs claiming data isn't exposed while the code path does expose it
31
+ - Server trusting client-provided computed/derived values (scores, totals, correctness flags) when the server can recompute them — strip and recompute server-side; don't require clients to submit fields the server should own
32
+ - New endpoints mounted under restricted paths (admin, internal) missing authorization verification — compare with sibling endpoints in the same route group to ensure the same access gate (role check, scope validation) is applied consistently
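The destructure-and-omit pattern from the first bullet above, sketched with a hypothetical user record:

```javascript
const user = {
  id: 7,
  email: 'dev@example.com',
  passwordHash: '$2b$10$abc', // must never leave the server
  apiToken: 'tok_secret',     // must never leave the server
};

// Destructure away sensitive fields; return only the rest on EVERY response path
function toPublicUser({ passwordHash, apiToken, ...publicFields }) {
  return publicFields;
}

const body = toPublicUser(user); // safe to serialize into a response
```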
15
33
 
16
- **Async & state consistency**
17
- - Optimistic state changes (view switches, navigation, success callbacks) before async completion — if the operation fails or is cancelled, the UI is stuck with no rollback. Check return values/errors before calling success callbacks. Handle both failure and cancellation paths
34
+ ## Tier 2 — Check When Relevant (Data Integrity, Async, Error Handling)
35
+
36
+ **Async & state consistency** _[applies when: code uses async/await, Promises, or UI state]_
37
+ - Optimistic state changes (view switches, navigation, success callbacks) before async completion — if the operation fails or is cancelled, the UI is stuck with no rollback. Check return values/errors before calling success callbacks. Handle both failure and cancellation paths. Watch for `.catch(() => null)` followed by unconditional success code (toast, state update) — the catch silences the error but the success path still runs. Either let errors propagate naturally or check the return value before proceeding
18
38
  - Multiple coupled state variables updated independently — actions that change one must update all related fields; debounced/cancelable operations must reset loading state on every exit path (cleared, stale, failed, aborted)
19
39
  - Error notification at multiple layers (shared API client + component-level) — verify exactly one layer owns user-facing error messages
20
40
  - Optimistic updates using full-collection snapshots for rollback — a second in-flight action gets clobbered. Use per-item rollback and functional state updaters after async gaps; sync optimistic changes to parent via callback or trigger refetch on remount
21
41
  - State updates guarded by truthiness of the new value (`if (arr?.length)`) — prevents clearing state when the source legitimately returns empty. Distinguish "no response" from "empty response"
42
+ - Mutation/trigger functions that return or propagate stale pre-mutation state — if a function activates, updates, or resets an entity, the returned value and any dependent scheduling/evaluation state (backoff timers, "last run" timestamps, status flags) must reflect the post-mutation state, not a snapshot read before the mutation
43
+ - Fire-and-forget or async writes where the in-memory object is not updated (response returns stale data) or is updated unconditionally regardless of write success (response claims state that was never persisted) — update in-memory state conditionally on write outcome, or document the tradeoff explicitly
44
+ - Missing `await` on async operations in error/cleanup paths — fire-and-forget cleanup (e.g., aborting a failed operation, rolling back partial state) that must complete before the function returns or the caller proceeds
22
45
  - `Promise.all` without error handling — partial load with unhandled rejection. Wrap with fallback/error state
46
+ - Side effects during React render (setState, navigation, mutations outside useEffect)
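The `.catch(() => null)` trap called out above, in miniature (the handler names are hypothetical):

```javascript
function saveSettings() {
  return Promise.reject(new Error('network down')); // simulated failure
}

// Anti-pattern: the catch silences the error, but the success path still runs
async function badHandler() {
  const result = await saveSettings().catch(() => null);
  return `Saved revision ${result.revision}`; // TypeError: result is null
}

// Fix: check the value before proceeding on the success path
async function goodHandler() {
  const result = await saveSettings().catch(() => null);
  if (result === null) return 'save failed';
  return `Saved revision ${result.revision}`;
}
```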
23
47
 
24
- **Resource management**
25
- - Event listeners, socket handlers, subscriptions, timers, and useEffect side effects are cleaned up on unmount/teardown
26
- - Initialization functions (schedulers, pollers, listeners) that don't guard against multiple calls — creates duplicate instances. Check for existing instances before reinitializing
27
-
28
- **Error handling**
48
+ **Error handling** _[applies when: code has try/catch, .catch, error responses, or external calls]_
29
49
  - Service functions throwing generic `Error` for client-caused conditions — bubbles as 500 instead of 400/404. Use typed error classes with explicit status codes; ensure consistent error responses across similar endpoints. Include expected concurrency/conditional failures (transaction cancellations, optimistic lock conflicts) — catch and translate to 409/retry rather than letting them surface as 500
30
50
  - Swallowed errors (empty `.catch(() => {})`), handlers that replace detailed failure info with generic messages, and error/catch handlers that exit cleanly (`exit 0`, `return`) without any user-visible output — surface a notification, propagate original context, and make failures look like failures
31
51
  - Destructive operations in retry/cleanup paths assumed to succeed without their own error handling — if cleanup fails, retry logic crashes instead of reporting the intended failure
52
+ - External service calls without configurable timeouts — a hung downstream service blocks the caller indefinitely
53
+ - Missing fallback behavior when downstream services are unavailable (see also: retry without backoff in "Sync & replication")
32
54
 
33
- **API & URL safety**
34
- - User-supplied or system-generated values interpolated into URL paths, shell commands, file paths, or subprocess arguments without encoding/validation — use `encodeURIComponent()` for URLs, regex allowlists for execution boundaries. Generated identifiers used as URL path segments must be safe for your router/storage (no `/`, `?`, `#`; consider allowlisting characters and/or applying `encodeURIComponent()`)
35
- - Route params passed to services without format validation; path containment checks using string prefix without path separator boundary (use `path.relative()`)
36
- - Error/fallback responses that hardcode security headers instead of using centralized policy — error paths bypass security tightening
37
-
38
- **Trust boundaries & data exposure**
39
- - API responses returning full objects with sensitive fields — destructure and omit across ALL response paths (GET, PUT, POST, error, socket); comments/docs claiming data isn't exposed while the code path does expose it
40
- - Server trusting client-provided computed/derived values (scores, totals, correctness flags) when the server can recompute them — strip and recompute server-side; don't require clients to submit fields the server should own
41
-
42
- **Input handling**
43
- - Trimming values where whitespace is significant (API keys, tokens, passwords, base64) — only trim identifiers/names
44
- - Endpoints accepting unbounded arrays/collections without upper limits — enforce max size or move to background jobs
55
+ **Resource management** _[applies when: code uses event listeners, timers, subscriptions, or useEffect]_
56
+ - Event listeners, socket handlers, subscriptions, timers, and useEffect side effects are cleaned up on unmount/teardown
57
+ - Deletion/destroy functions that clean up the primary resource but leave orphaned secondary resources (data directories, git branches, child records, temporary files) — trace all resources created during the entity's lifecycle and verify each is removed on delete
58
+ - Initialization functions (schedulers, pollers, listeners) that don't guard against multiple calls — creates duplicate instances. Check for existing instances before reinitializing
45
59
 
46
- **Validation & consistency**
60
+ **Validation & consistency** _[applies when: code handles user input, schemas, or API contracts]_
61
+ - API versioning: breaking changes to public endpoints without version bump or deprecation path
62
+ - Backward-incompatible response shape changes without client migration plan
47
63
  - New endpoints/schemas should match validation patterns of existing similar endpoints — field limits, required fields, types, error handling. If validation exists on one endpoint for a param, the same param on other endpoints needs the same validation
48
64
  - When a validation/sanitization function is introduced for a field, trace ALL write paths (create, update, sync, import) — partial application means invalid values re-enter through the unguarded path
49
- - Schema fields accepting values downstream code can't handle; Zod/schema stripping fields the service reads (silent `undefined`); config values persisted but silently ignored by the implementation — trace each field through schema → service → consumer
65
+ - Schema fields accepting values downstream code can't handle; Zod/schema stripping fields the service reads (silent `undefined`); config values persisted but silently ignored by the implementation — trace each field through schema → service → consumer. Update schemas derived from create schemas (e.g., `.partial()`) must also make nested object fields optional — shallow partial on a deeply-required schema rejects valid partial updates. Additionally, `.deepPartial()` or `.partial()` on schemas with `.default()` values will apply those defaults on update, silently overwriting existing persisted values with defaults — create explicit update schemas without defaults instead
66
+ - Entity creation without case-insensitive uniqueness checks — names differing only in case (e.g., "MyAgent" vs "myagent") cause collisions in case-insensitive contexts (file paths, git branches, URLs). Normalize to lowercase before comparing
50
67
  - Handlers reading properties from framework-provided objects using field names the framework doesn't populate — silent `undefined`. Verify property names match the caller's contract
68
+ - Data model fields that have different names depending on the creation/write path (e.g., `createdAt` vs `created`) — code referencing only one naming convention silently misses records created through other paths. Trace all write paths to discover the actual field names in use
51
69
  - Numeric values from strings used without `NaN`/type guards — `NaN` comparisons silently pass bounds checks. Clamp query params to safe lower bounds
52
70
  - UI elements hidden from navigation but still accessible via direct URL — enforce restrictions at the route level
53
71
  - Summary counters/accumulators that miss edge cases (removals, branch coverage, underflow on decrements — guard against going negative with lower-bound conditions); silent operations in verbose sequences where all branches should print status
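The case-insensitive uniqueness check from this section can be sketched as a one-liner (illustrative only; names are hypothetical):

```javascript
const existing = ['MyAgent', 'backup-runner'];

// Normalize to lowercase before comparing — "myagent" must collide with "MyAgent"
function isNameTaken(name, names) {
  const normalized = name.toLowerCase();
  return names.some((n) => n.toLowerCase() === normalized);
}
```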
54
72
 
55
- **Intent vs implementation**
56
- - Labels, comments, status messages, or documentation that describe behavior the code doesn't implement — e.g., a map named "renamed" that only deletes, or an action labeled "migrated" that never creates the target
57
- - Inline code examples, command templates, and query snippets that aren't syntactically valid as written — template placeholders must use a consistent format, queries must use correct syntax for their language (e.g., single `{}` in GraphQL, not `{{}}`)
58
- - Cross-references between files (identifiers, parameter names, format conventions, operational thresholds) that disagree — when one reference changes, trace all other files that reference the same entity and update them
59
- - Sequential instructions or steps whose ordering doesn't match the required execution order — readers following in order will perform actions at the wrong time (e.g., "record X" in step 2 when X must be captured before step 1's action)
60
- - Sequential numbering (section numbers, step numbers) with gaps or jumps after edits — verify continuity
61
- - Completion markers, success flags, or status files written before the operation they attest to finishes — consumers see false success if the operation fails after the write
62
- - Existence checks (directory exists, file exists, module resolves) used as proof of correct/complete installation — a directory can exist but be empty, a file can exist with invalid contents. Verify the specific resource the consumer needs
63
- - Tracking/checkpoint files that default to empty on parse failure — causes full re-execution. Fail loudly instead
64
- - Registering references to resources without verifying the resource exists — dangling references after failed operations
65
-
66
- **Concurrency & data integrity**
73
+ **Concurrency & data integrity** _[applies when: code has shared state, database writes, or multi-step mutations]_
67
74
  - Shared mutable state accessed by concurrent requests without locking or atomic writes; multi-step read-modify-write cycles that can interleave — use conditional writes/optimistic concurrency (e.g., condition expressions, version checks) to close the gap between read and write; if the conditional write fails, surface a retryable error instead of letting it bubble as a 500
68
75
  - Multi-table writes without a transaction — FK violations or errors leave partial state
76
+ - Writes that replace an entire composite attribute (array, map, JSON blob) when the field is populated by multiple sources — the write discards data from other sources. Use a separate attribute, merge with the existing value, or use list/set append operations
69
77
  - Functions with early returns for "no primary fields to update" that silently skip secondary operations (relationship updates, link writes)
70
78
  - Functions that acquire shared state (locks, flags, markers) with exit paths that skip cleanup — leaves the system permanently locked. Trace all exit paths including error branches
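A toy in-memory version of the conditional-write pattern above (real stores use condition expressions or version columns; this Map stand-in only illustrates the shape):

```javascript
const store = new Map([['job-1', { version: 1, status: 'queued' }]]);

function conditionalUpdate(id, expectedVersion, patch) {
  const row = store.get(id);
  if (!row || row.version !== expectedVersion) {
    const err = new Error('version conflict'); // surface as 409/retry, not 500
    err.retryable = true;
    throw err;
  }
  const next = { ...row, ...patch, version: expectedVersion + 1 };
  store.set(id, next);
  return next;
}
```

On conflict the caller re-reads the row and retries, rather than letting the error bubble as a server fault.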
71
79
 
72
- **Search & navigation**
73
- - Search results linking to generic list pages instead of deep-linking to the specific record
74
- - Search/query code hardcoding one backend's implementation when the system supports multiple — verify option/parameter names are mapped between backends
80
+ **Input handling** _[applies when: code accepts user/external input]_
81
+ - Trimming values where whitespace is significant (API keys, tokens, passwords, base64) — only trim identifiers/names
82
+ - Endpoints accepting unbounded arrays/collections without upper limits — enforce max size or move to background jobs
75
83
 
76
- **Sync & replication**
77
- - Upsert/`ON CONFLICT UPDATE` updating only a subset of exported fields — replicas diverge. Document deliberately omitted fields
78
- - Pagination using `COUNT(*)` (full table scan) instead of `limit + 1`; endpoints missing `next` token input/output; hard-capped limits silently truncating results
79
- - Batch/paginated API calls (database batch gets, external service calls) that don't handle partial results — unprocessed items, continuation tokens, or rate-limited responses silently dropped. Add retry loops with backoff for unprocessed items
80
- - Retry loops without backoff or max-attempt limits — tight loops under throttling extend latency indefinitely. Use bounded retries with exponential backoff/jitter
84
+ ## Tier 3 — Domain-Specific (Check Only When File Type Matches)
81
85
 
82
- **SQL & database**
86
+ **SQL & database** _[applies when: code contains SQL, ORM queries, or migration files]_
83
87
  - Parameterized query placeholder indices must match parameter array positions — especially with shared param builders or computed indices
84
88
  - Database triggers clobbering explicitly-provided values; auto-incrementing columns that only increment on INSERT, not UPDATE
85
89
  - Full-text search with strict parsers (`to_tsquery`) on user input — use `websearch_to_tsquery` or `plainto_tsquery`
86
90
  - Dead queries (results never read), N+1 patterns inside transactions, O(n²) algorithms on growing data
87
91
  - `CREATE TABLE IF NOT EXISTS` as sole migration strategy — won't add columns/indexes on upgrade. Use `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` or a migration framework
88
92
  - Functions/extensions requiring specific database versions without verification
93
+ - Migrations that lock tables for extended periods (ADD COLUMN with default on large tables, CREATE INDEX without CONCURRENTLY) — use concurrent operations or batched backfills
94
+ - Missing rollback/down migration or untested rollback path
89
95
 
90
- **Lazy initialization & module loading**
96
+ **Sync & replication** _[applies when: code uses pagination, batch APIs, or data sync]_
97
+ - Upsert/`ON CONFLICT UPDATE` updating only a subset of exported fields — replicas diverge. Document deliberately omitted fields
98
+ - Pagination using `COUNT(*)` (full table scan) instead of `limit + 1`; endpoints missing `next` token input/output; hard-capped limits silently truncating results
99
+ - Batch/paginated API calls (database batch gets, external service calls) that don't handle partial results — unprocessed items, continuation tokens, or rate-limited responses silently dropped. Add retry loops with backoff for unprocessed items
100
+ - Retry loops without backoff or max-attempt limits — tight loops under throttling extend latency indefinitely. Use bounded retries with exponential backoff/jitter
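One way to sketch the bounded retry with exponential backoff and jitter described above (an illustrative helper, not part of this package):

```javascript
// Delays are in milliseconds; maxAttempts bounds the loop so throttling
// never degenerates into an infinite tight loop.
async function withRetry(fn, { maxAttempts = 3, baseMs = 10 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // give up loudly
      const delay = baseMs * 2 ** (attempt - 1) + Math.random() * baseMs;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

The same wrapper applies to batch APIs that return unprocessed items: re-submit only the leftovers on each attempt.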
101
+
102
+ **Lazy initialization & module loading** _[applies when: code uses dynamic imports, lazy singletons, or bootstrap sequences]_
91
103
  - Cached state getters returning null before initialization — provide async initializer or ensure-style function
92
104
  - Module-level side effects (file reads, SDK init) without error handling — corrupted files crash the process on import
93
105
  - Bootstrap/resilience code that imports the dependencies it's meant to install — restructure so installation precedes resolution
94
106
  - Re-exporting from heavy modules defeats lazy loading — use lightweight shared modules
95
107
 
96
- **Data format portability**
108
+ **Data format portability** _[applies when: code crosses serialization boundaries — JSON, DB, IPC]_
97
109
  - Values crossing serialization boundaries may change format (arrays in JSON vs string literals in DB) — convert consistently
110
+ - Reads issued immediately after writes to an eventually consistent store (database scans, replica reads, cache refreshes) may return stale data — use consistent-read options, compute from in-memory state after confirmed writes, or document the eventual-consistency window
98
111
  - BIGINT values parsed into JavaScript `Number` — precision lost past `MAX_SAFE_INTEGER`. Use strings or `BigInt`
99
112
  - Data model key/index design that doesn't support required query access patterns — e.g., claiming "recent" ordering but using non-time-sortable keys (random UUIDs, user IDs). Verify sort keys and indexes can serve the queries the code performs without full-partition scans and in-memory sorting
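The BIGINT precision loss above is easy to reproduce in plain JavaScript (standard language behavior):

```javascript
// 2**53 + 1 cannot be represented exactly as a JS Number
const asNumber = JSON.parse('{"id": 9007199254740993}').id;
// The parsed Number has been rounded to the nearest representable value:
const rounded = asNumber === 9007199254740992; // last digit silently changed

// Keep the value as a string, or use BigInt, to preserve precision
const asBigInt = BigInt('9007199254740993');
```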
100
113
 
101
- **Shell & portability**
114
+ **Shell & portability** _[applies when: code spawns subprocesses, uses shell scripts, or builds CLI tools]_
102
115
  - Subprocess calls under `set -e` abort on failure; non-critical writes fail on broken pipes — use `|| true` for non-critical output
103
116
  - Detached child processes with piped stdio — parent exit causes SIGPIPE. Redirect to log files or use `'ignore'`
104
117
  - Platform-specific assumptions — hardcoded shell interpreters, `path.join()` backslashes breaking ESM imports. Use `pathToFileURL()` for dynamic imports
105
118
 
106
- **Test coverage**
107
- - New logic/schemas/services without corresponding tests when similar existing code has tests
108
- - New error paths untestable because services throw generic errors instead of typed ones
109
- - Tests re-implementing logic under test instead of importing real exports — pass even when real code regresses
110
- - Tests depending on real wall-clock time or external dependencies when testing logic — use fake timers and mocks
111
- - Missing tests for trust-boundary enforcement — submit tampered values, verify server ignores them
119
+ **Search & navigation** _[applies when: code implements search results or deep-linking]_
120
+ - Search results linking to generic list pages instead of deep-linking to the specific record
121
+ - Search/query code hardcoding one backend's implementation when the system supports multiple — verify option/parameter names are mapped between backends
122
+
123
+ **Destructive UI operations** _[applies when: code adds delete, reset, revoke, or other destructive actions]_
124
+ - Destructive actions (delete, reset, revoke) in the UI without a confirmation step — compare with how similar destructive operations elsewhere in the codebase handle confirmation
112
125
 
113
- **Accessibility**
126
+ **Accessibility** _[applies when: code modifies UI components or interactive elements]_
114
127
  - Interactive elements missing accessible names, roles, or ARIA states — including disabled interactions without `aria-disabled`
115
128
  - Custom toggle/switch UI built from non-semantic elements instead of native inputs
116
129
 
130
+ ## Tier 4 — Always Check (Quality, Conventions, AI-Generated Code)
131
+
132
+ **Intent vs implementation**
133
+ - Labels, comments, status messages, or documentation that describe behavior the code doesn't implement — e.g., a map named "renamed" that only deletes, or an action labeled "migrated" that never creates the target
134
+ - Inline code examples, command templates, and query snippets that aren't syntactically valid as written — template placeholders must use a consistent format, queries must use correct syntax for their language (e.g., single `{}` in GraphQL, not `{{}}`)
135
+ - Cross-references between files (identifiers, parameter names, format conventions, operational thresholds) that disagree — when one reference changes, trace all other files that reference the same entity and update them
136
+ - Responsibility relocated from one module to another (e.g., writes moved from handler to middleware) without updating all consumers that depended on the old location's timing, return value, or side effects — trace callers that relied on the synchronous or co-located behavior and verify they still work with the new execution point. Remove dead code left behind at the old location
137
+ - Sequential instructions or steps whose ordering doesn't match the required execution order — readers following in order will perform actions at the wrong time (e.g., "record X" in step 2 when X must be captured before step 1's action)
138
+ - Sequential numbering (section numbers, step numbers) with gaps or jumps after edits — verify continuity
139
+ - Completion markers, success flags, or status files written before the operation they attest to finishes — consumers see false success if the operation fails after the write
140
+ - Existence checks (directory exists, file exists, module resolves) used as proof of correct/complete installation — a directory can exist but be empty, a file can exist with invalid contents. Verify the specific resource the consumer needs
141
+ - Lookups that check only one scope when multiple exist — e.g., checking local git branches but not remote, checking in-memory cache but not persistent store. Trace all locations where the resource could exist and check each
142
+ - Tracking/checkpoint files that default to empty on parse failure — causes full re-execution. Fail loudly instead
143
+ - Registering references to resources without verifying the resource exists — dangling references after failed operations
144
+
145
+ **Automated pipeline discipline**
146
+ - Internal code review must run on all automated remediation changes BEFORE creating PRs — never go straight from "tests pass" to PR creation
147
+ - Copilot review must complete (approved or commented) on all PRs before merging — never merge while reviews are still pending unless the user explicitly approves
148
+ - Automated agents may introduce subtle issues that pass tests but violate project conventions — review agent output against CLAUDE.md conventions
149
+
150
+ **AI-generated code quality** _(Claude 4.6 specific failure modes)_
151
+ - Over-engineering: new abstractions, wrapper functions, helper files, or utility modules that serve only one call site — inline the logic instead
152
+ - Feature flags, configuration options, or extension points with only one possible value or consumer
153
+ - Commit messages or comments claiming a fix while the underlying bug remains — verify each claimed fix actually addresses the root cause, not just the symptom
154
+ - Functions containing placeholder comments (`// TODO`, `// FIXME`, `// implement later`) or stub implementations presented as complete
155
+ - Unnecessary defensive code: error handling for scenarios that provably cannot occur given the call site, fallbacks for internal functions that always return valid data
156
+
117
157
  **Configuration & hardcoding**
118
158
  - Hardcoded values when a config field or env var already exists; dead config fields nothing consumes; unused function parameters creating false API contracts; resource names (table names, queue names, bucket names) hardcoded without accounting for environment prefixes — lookups on response objects using the wrong key silently return undefined
119
- - Duplicated config/constants/utilities across modules — extract to shared module to prevent drift
159
+ - Duplicated config/constants/utilities/helper functions across modules — extract to shared module to prevent drift. Watch for behavioral inconsistencies between copies (e.g., one returns `'unknown'` for null while another returns `'never'`)
120
160
  - CI pipelines installing without lockfile pinning or version constraints — non-deterministic builds
161
+
+ **Observability**
+ - Production code paths with no structured logging at entry/exit points
162
+ - Error logs missing reproduction context (request ID, input parameters)
163
+ - Async flows without correlation ID propagation
164
+
165
+ **Supply chain & dependency health**
166
+ - Lockfile missing, CI installing without `--frozen-lockfile`, or lockfile drifted from the manifest
167
+ - `npm audit` / `cargo audit` / `pip-audit` reporting unaddressed HIGH/CRITICAL vulnerabilities
168
+ - `postinstall` scripts from untrusted packages executing arbitrary code without review
169
+ - Overly permissive version ranges (`*`, `>=`) on deps with known breaking-change history
170
+
171
+ **Test coverage**
172
+ - New logic/schemas/services without corresponding tests when similar existing code has tests
173
+ - New error paths untestable because services throw generic errors instead of typed ones
174
+ - Tests re-implementing logic under test instead of importing real exports — pass even when real code regresses
175
+ - Tests depending on real wall-clock time or external dependencies when testing logic — use fake timers and mocks
176
+ - Missing tests for trust-boundary enforcement — submit tampered values, verify server ignores them
177
+ - Tests that pass but don't cover the changed code paths — passing unrelated tests is not validation
121
178
 
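The wall-clock bullet above can be illustrated with clock injection (`makeRateLimiter` is a hypothetical example; projects with a test framework may prefer its fake timers instead):

```javascript
// Accept the clock as a parameter so tests can drive time deterministically
// instead of sleeping against real wall-clock time.
function makeRateLimiter(windowMs, limit, now = Date.now) {
  let windowStart = now();
  let count = 0;
  return function allow() {
    if (now() - windowStart >= windowMs) {
      windowStart = now();
      count = 0;
    }
    count += 1;
    return count <= limit;
  };
}

// In tests, a fake clock replaces real time:
let t = 0;
const allow = makeRateLimiter(1000, 2, () => t);
console.log(allow(), allow(), allow()); // true true false
t += 1000; // "wait" one second instantly
console.log(allow()); // true: window reset
```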
122
179
  **Style & conventions**
123
180
  - Naming and patterns consistent with the rest of the codebase
@@ -0,0 +1,61 @@
1
+ ## Remediation Agent Template
2
+
3
+ Use this template when spawning remediation agents in Phase 3c. Replace all `{PLACEHOLDERS}` with actual values.
4
+
5
+ ```
6
+ <context>
7
+ Project type: {PROJECT_TYPE}
8
+ Build command: {BUILD_CMD}
9
+ Test command: {TEST_CMD}
10
+ Working directory: {WORKTREE_DIR} (this is a git worktree — all work happens here)
11
+ Foundation utilities available (if created):
12
+ {FOUNDATION_UTILS}
13
+ </context>
14
+
15
+ <findings>
16
+ {FINDINGS}
17
+ </findings>
18
+
19
+ <instructions>
20
+ You are {AGENT_NAME} on team better-{DATE}.
21
+
22
+ Your task: Fix all {CATEGORY} findings listed above.
23
+
24
+ FINDING VALIDATION — verify before fixing:
25
+ - Before fixing each finding, READ the file and at least 30 lines of surrounding
26
+ context to confirm the issue is genuine.
27
+ - Check whether the flagged code is already correct (e.g., a Promise chain that
28
+ IS properly awaited downstream, a value that IS validated earlier in the function,
29
+ a pattern that IS idiomatic for the framework).
30
+ - If the existing code is already correct, SKIP the fix and report it as a
31
+ false positive with a brief explanation of why the original code is fine.
32
+ - Do not make changes that are semantically equivalent to the original code
33
+ (e.g., wrapping a .then() chain in an async IIFE adds noise without fixing anything).
34
+ </instructions>
35
+
36
+ <guardrails>
37
+ - Only use APIs/functions verified to exist by reading source files. If a fix
38
+ requires an API you haven't confirmed, read the module's exports first.
39
+ - Fix with minimum change required. Do not introduce new abstractions or helpers
40
+ unless the finding specifically calls for it. A one-line fix beats a refactored module.
41
+ - If a git/build/file-read command fails, retry once after verifying the working
42
+ directory and path. If it fails again, report the error and move to the next finding.
43
+ </guardrails>
44
+
45
+ <commit_strategy>
46
+ Goal: each commit builds independently and contains one logical group of
47
+ related fixes. Use conventional prefixes (fix:, refactor:, feat:, security:).
48
+ Stage specific files only (`git -C {WORKTREE_DIR} add <specific files>` — never
49
+ `git add -A` or `git add .`). Run {BUILD_CMD} in {WORKTREE_DIR} before committing.
50
+ No co-author annotations or version bumps.
51
+ </commit_strategy>
52
+
53
+ CONFLICT AVOIDANCE:
54
+ - Only modify files listed in your assigned findings
55
+ - If you need to modify a file assigned to another agent, skip that change and report it
56
+
57
+ After all fixes:
58
+ - Ensure all changes are committed (no uncommitted work)
59
+ - Mark your task as completed via TaskUpdate
60
+ - Report: commits made, files modified, findings addressed, any skipped issues
61
+ ```
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "slash-do",
3
- "version": "1.4.3",
3
+ "version": "1.5.1",
4
4
  "description": "Curated slash commands for AI coding assistants — Claude Code, OpenCode, Gemini CLI, and Codex",
5
5
  "author": "Adam Eivy <adam@eivy.com>",
6
6
  "license": "MIT",
package/uninstall.sh CHANGED
@@ -32,6 +32,7 @@ OLD_COMMANDS=(cam good makegoals makegood optimize-md)
32
32
 
33
33
  LIBS=(
34
34
  code-review-checklist copilot-review-loop graphql-escaping
35
+ remediation-agent-template
35
36
  )
36
37
 
37
38
  HOOKS=(slashdo-check-update slashdo-statusline)