slash-do 1.4.2 → 1.5.0
- package/README.md +4 -0
- package/commands/do/better.md +50 -130
- package/commands/do/fpr.md +26 -5
- package/commands/do/pr.md +25 -6
- package/commands/do/release.md +25 -6
- package/commands/do/review.md +75 -3
- package/install.sh +1 -0
- package/lib/code-review-checklist.md +111 -56
- package/lib/remediation-agent-template.md +61 -0
- package/package.json +1 -1
- package/uninstall.sh +1 -0
package/README.md
CHANGED
@@ -30,6 +30,10 @@
 
 ---
 
+## Philosophy
+
+slashdo commands emphasize **high-quality software engineering over token conservation**. While efforts are made to use agents, models, and prompts efficiently, these tools work hard to ensure your software meets high-quality standards — and will use the tokens necessary to meet that end. Expect thorough reviews, multi-agent scans, and verification loops rather than shortcuts.
+
 ## Quick Start
 
 **With npm/npx:**
package/commands/do/better.md
CHANGED
@@ -49,6 +49,16 @@ When the resolved model is `opus`, **omit** the `model` parameter on the Agent/T
 
 Opus reduces false positives in audit (judgment-heavy). Sonnet is the floor for code-writing agents (remediation). Haiku works for fast first-pass pattern scanning but may produce more false positives — remediation agents (Sonnet+) validate before fixing.
 
+## Compaction Guidance
+
+When compacting during this workflow, always preserve:
+- The `FILE_OWNER_MAP` (complete, not summarized)
+- All CRITICAL/HIGH findings with file:line references
+- The current phase number and what phases remain
+- All PR numbers and URLs created so far
+- `BUILD_CMD`, `TEST_CMD`, `PROJECT_TYPE`, `WORKTREE_DIR` values
+- `VCS_HOST`, `CLI_TOOL`, `DEFAULT_BRANCH`, `CURRENT_BRANCH`
+
 ## Phase 0: Discovery & Setup
 
 Detect the project environment before any scanning or remediation.
@@ -98,6 +108,8 @@ If `VCS_HOST` is `github`, proactively verify browser authentication for the Cop
 
 This ensures the browser is ready before we need it in Phase 6, avoiding interruptions mid-flow.
 
+<audit_instructions>
+
 ## Phase 1: Unified Audit
 
 Project conventions are already in your context. Pass relevant conventions to each agent.
@@ -107,7 +119,7 @@ Launch 7 Explore agents in two batches. Each agent must report findings in this
|
|
|
107
119
|
- **[CRITICAL/HIGH/MEDIUM/LOW]** `file:line` - Description. Suggested fix: ... Complexity: Simple/Medium/Complex
|
|
108
120
|
```
|
|
109
121
|
|
|
110
|
-
**
|
|
122
|
+
**Context requirement.** Before flagging, read at least 30 lines of surrounding context to confirm the issue is real. Common false positives to watch for:
|
|
111
123
|
- A Promise `.then()` chain that appears "unawaited" but IS collected into an array and awaited via `Promise.all` downstream
|
|
112
124
|
- A value that appears "unvalidated" but IS checked by a guard clause earlier in the function or by the caller
|
|
113
125
|
- A pattern that looks like an anti-pattern in isolation but IS idiomatic for the specific framework or library being used
|
|
@@ -115,6 +127,17 @@ Launch 7 Explore agents in two batches. Each agent must report findings in this
|
|
|
115
127
|
|
|
116
128
|
If the surrounding context shows the code is correct, do NOT flag it.
|
|
117
129
|
|
|
130
|
+
If uncertain whether something is a genuine issue, report it as **[UNCERTAIN]** with your reasoning. The consolidation phase will evaluate these separately. Fewer confident findings is better than padding with questionable ones.
|
|
131
|
+
|
|
132
|
+
<approach>
|
|
133
|
+
For each potential finding:
|
|
134
|
+
1. Read the file and 30+ lines of surrounding context
|
|
135
|
+
2. Quote the specific code that demonstrates the issue
|
|
136
|
+
3. Explain why it's a problem given the context
|
|
137
|
+
4. Only then classify severity and suggest a fix
|
|
138
|
+
Skip step 4 if steps 1-3 reveal the code is correct.
|
|
139
|
+
</approach>
|
|
140
|
+
|
|
118
141
|
### Batch 1 (5 parallel Explore agents via Task tool):
|
|
119
142
|
|
|
120
143
|
**Model**: Pass `AUDIT_MODEL` as the `model` parameter on each agent. If `AUDIT_MODEL` is `opus`, omit the parameter to inherit from session.
|
|
@@ -122,6 +145,8 @@ If the surrounding context shows the code is correct, do NOT flag it.
|
|
|
122
145
|
1. **Security & Secrets**
|
|
123
146
|
Sources: authentication checks, credential exposure, infrastructure security, input validation, dependency health
|
|
124
147
|
Focus: hardcoded credentials, API keys, exposed secrets, authentication bypasses, disabled security checks, PII exposure, injection vulnerabilities (SQL/command/path traversal), insecure CORS configurations, missing auth checks, unsanitized user input in file paths or queries, known CVEs in dependencies (check `npm audit` / `cargo audit` / `pip-audit` / `go vuln` output), abandoned or unmaintained dependencies, overly permissive dependency version ranges
|
|
148
|
+
OWASP Top 10 framing: broken auth (session fixation, credential stuffing), security misconfiguration (default creds, debug mode in prod), SSRF (user-controlled URLs in server fetch without allowlist), mass assignment (request bodies bound to models without field allowlist)
|
|
149
|
+
Supply chain: lockfile committed + frozen installs in CI, no untrusted postinstall scripts
|
|
125
150
|
|
|
126
151
|
2. **Code Quality & Style**
|
|
127
152
|
Sources: code brittleness, convention violations, test workarounds, logging & observability
|
|
@@ -134,10 +159,13 @@ If the surrounding context shows the code is correct, do NOT flag it.
|
|
|
134
159
|
4. **Architecture & SOLID**
|
|
135
160
|
Sources: structural violations, coupling analysis, modularity, API contract quality
|
|
136
161
|
Focus: Single Responsibility violations (god files >500 lines, functions >50 lines doing multiple things), tight coupling between modules, circular dependencies, mixed concerns in single files, dependency inversion violations, classes/modules with too many responsibilities (>20 public methods), deep nesting (>4 levels), long parameter lists, modules reaching into other modules' internals, inconsistent API error response shapes across endpoints, list endpoints missing pagination, missing rate limiting on public endpoints, inconsistent request/response envelope patterns
|
|
162
|
+
API contract consistency: breaking response shape changes without versioning, inconsistent error envelopes across endpoints, missing deprecation headers on sunset endpoints
|
|
137
163
|
|
|
138
164
|
5. **Bugs, Performance & Error Handling**
|
|
139
165
|
Sources: runtime safety, resource management, async correctness, performance, race conditions
|
|
140
166
|
Focus: missing `await` on async calls, unhandled promise rejections, null/undefined access without guards, off-by-one errors, incorrect comparison operators, mutation of shared state, resource leaks (unbounded caches/maps, unclosed connections/streams), `process.exit()` in library code, async routes without error forwarding, missing AbortController on data fetching, N+1 query patterns (loading related records inside loops), O(n²) or worse algorithms in hot paths, unbounded result sets (missing LIMIT/pagination on DB queries), missing database indexes on frequently queried columns, race conditions (TOCTOU, double-submit without idempotency keys, concurrent writes to shared state without locks, stale-read-then-write patterns), missing connection pooling or pool exhaustion
|
|
167
|
+
Resilience: external calls without timeouts, missing fallback for unavailable downstream services, retry without backoff ceiling/jitter, missing health check endpoints
|
|
168
|
+
Observability: production paths without structured logging, error logs missing reproduction context (request ID, input params), async flows without correlation IDs
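The resilience item above lists "retry without backoff ceiling/jitter" as a finding. As a rough illustration of what a compliant retry might look like, here is a minimal Python sketch. The names `backoff_delay` and `call_with_retry`, the default delays, and the injectable `sleep` parameter are all assumptions made for this example and are not part of slash-do.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.5, ceiling: float = 8.0) -> float:
    """Full-jitter delay: uniform in [0, min(ceiling, base * 2**attempt)]."""
    return random.uniform(0.0, min(ceiling, base * (2 ** attempt)))


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with capped, jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the failure to the caller
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

The ceiling bounds the worst-case wait and the jitter keeps many clients from retrying in lockstep. A real caller would likely narrow the caught exception type and pair this with a per-call timeout.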
|
|
141
169
|
|
|
142
170
|
### Batch 2 (2 agents after Batch 1 completes):
|
|
143
171
|
|
|
@@ -150,6 +178,7 @@ If the surrounding context shows the code is correct, do NOT flag it.
|
|
|
150
178
|
- **Python**: mutable default arguments, bare except clauses, missing type hints on public APIs, sync I/O in async contexts
|
|
151
179
|
- **Go**: unchecked errors, goroutine leaks, defer in loops, context propagation gaps
|
|
152
180
|
- **Web projects (any stack)**: accessibility issues — missing alt text on images, broken keyboard navigation, missing ARIA labels on interactive elements, insufficient color contrast, form inputs without associated labels
|
|
181
|
+
- **Database migrations**: exclusive-lock ALTER TABLE on large tables, CREATE INDEX without CONCURRENTLY, missing down migrations or untested rollback paths
|
|
153
182
|
- General: framework-specific security issues, language-specific gotchas, domain-specific compliance, environment variable hygiene (missing `.env.example`, required env vars not validated at startup, secrets in config files that should be in env)
|
|
154
183
|
|
|
155
184
|
7. **Test Coverage**
|
|
@@ -158,12 +187,16 @@ If the surrounding context shows the code is correct, do NOT flag it.
|
|
|
158
187
|
|
|
159
188
|
Wait for ALL agents to complete before proceeding.
|
|
160
189
|
|
|
190
|
+
</audit_instructions>
|
|
191
|
+
|
|
192
|
+
<plan_and_remediate>
|
|
193
|
+
|
|
161
194
|
## Phase 2: Plan Generation
|
|
162
195
|
|
|
163
196
|
1. Read the existing `PLAN.md` (create if it doesn't exist)
|
|
164
197
|
2. Consolidate all findings from Phase 1, deduplicating across agents (same file:line flagged by multiple agents → keep the most specific description)
|
|
165
198
|
3. Identify **shared utility extractions** — patterns duplicated 3+ times that should become reusable functions. Group these as "Foundation" work for Phase 3b.
|
|
166
|
-
4. **Build the file ownership map** (
|
|
199
|
+
4. **Build the file ownership map** (required by Phase 5 for conflict-free PRs):
|
|
167
200
|
- For each finding, record which file(s) it touches
|
|
168
201
|
- Assign each file to exactly ONE category (its primary category)
|
|
169
202
|
- If a file is touched by multiple categories, assign it to the category with the highest-severity finding for that file
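The ownership rule in step 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Finding` shape, the severity ranking, one path per finding, and the tie-breaking rule (first category seen wins) are hypothetical and are not specified by the source.

```python
from typing import NamedTuple

SEVERITY_RANK = {"CRITICAL": 4, "HIGH": 3, "MEDIUM": 2, "LOW": 1}


class Finding(NamedTuple):
    category: str
    severity: str
    path: str  # simplification: one file per finding


def build_file_owner_map(findings: list[Finding]) -> dict[str, str]:
    """Map each file to exactly one category: the one with its highest-severity finding."""
    best: dict[str, tuple[int, str]] = {}
    for f in findings:
        rank = SEVERITY_RANK[f.severity]
        # Strictly greater: on a tie, the first category seen keeps the file
        # (an assumption; the source does not define tie-breaking).
        if f.path not in best or rank > best[f.path][0]:
            best[f.path] = (rank, f.category)
    return {path: category for path, (_, category) in best.items()}
```

Because every path ends up with a single owner, two categories can never both claim the same file when branches are cut later.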
|
|
@@ -266,58 +299,17 @@ If no shared utilities were identified, skip this step.
|
|
|
266
299
|
4. Spawn up to 5 general-purpose agents as teammates. **Pass `REMEDIATION_MODEL` as the `model` parameter on each agent.** If `REMEDIATION_MODEL` is `opus`, omit the parameter to inherit from session.
|
|
267
300
|
|
|
268
301
|
### Agent instructions template:
|
|
269
|
-
|
|
270
|
-
|
|
271
|
-
|
|
272
|
-
Your task: Fix all {CATEGORY} findings from the Good audit.
|
|
273
|
-
Working directory: {WORKTREE_DIR} (this is a git worktree — all work happens here)
|
|
274
|
-
|
|
275
|
-
Project type: {PROJECT_TYPE}
|
|
276
|
-
Build command: {BUILD_CMD}
|
|
277
|
-
Test command: {TEST_CMD}
|
|
278
|
-
|
|
279
|
-
Foundation utilities available (if created):
|
|
280
|
-
{list of utility files with brief descriptions}
|
|
281
|
-
|
|
282
|
-
Findings to address:
|
|
283
|
-
{filtered list of CRITICAL/HIGH/MEDIUM findings for this category}
|
|
284
|
-
|
|
285
|
-
FINDING VALIDATION — verify before fixing:
|
|
286
|
-
- Before fixing each finding, READ the file and at least 30 lines of surrounding
|
|
287
|
-
context to confirm the issue is genuine.
|
|
288
|
-
- Check whether the flagged code is already correct (e.g., a Promise chain that
|
|
289
|
-
IS properly awaited downstream, a value that IS validated earlier in the function,
|
|
290
|
-
a pattern that IS idiomatic for the framework).
|
|
291
|
-
- If the existing code is already correct, SKIP the fix and report it as a
|
|
292
|
-
false positive with a brief explanation of why the original code is fine.
|
|
293
|
-
- Do not make changes that are semantically equivalent to the original code
|
|
294
|
-
(e.g., wrapping a .then() chain in an async IIFE adds noise without fixing anything).
|
|
295
|
-
|
|
296
|
-
COMMIT STRATEGY — commit early and often:
|
|
297
|
-
- After completing each logical group of related fixes, stage those files
|
|
298
|
-
and commit immediately with a descriptive conventional commit message.
|
|
299
|
-
- Each commit should be independently valid (build should pass).
|
|
300
|
-
- Run {BUILD_CMD} in {WORKTREE_DIR} before each commit to verify.
|
|
301
|
-
- Use `git -C {WORKTREE_DIR} add <specific files>` — never `git add -A` or `git add .`
|
|
302
|
-
- Use `git -C {WORKTREE_DIR} commit -m "prefix: description"`
|
|
303
|
-
- Use conventional commit prefixes: fix:, refactor:, feat:, security:
|
|
304
|
-
- Do NOT include co-author or generated-by annotations in commits.
|
|
305
|
-
- Do NOT bump the version — that happens once at the end.
|
|
306
|
-
|
|
307
|
-
After all fixes:
|
|
308
|
-
- Ensure all changes are committed (no uncommitted work)
|
|
309
|
-
- Mark your task as completed via TaskUpdate
|
|
310
|
-
- Report: commits made, files modified, findings addressed, any skipped issues
|
|
311
|
-
|
|
312
|
-
CONFLICT AVOIDANCE:
|
|
313
|
-
- Only modify files listed in your assigned findings
|
|
314
|
-
- If you need to modify a file assigned to another agent, skip that change and report it
|
|
315
|
-
```
|
|
302
|
+
|
|
303
|
+
!`cat ~/.claude/lib/remediation-agent-template.md`
|
|
316
304
|
|
|
317
305
|
### Conflict avoidance:
|
|
318
306
|
- Review all findings before task assignment. If two categories touch the same file, assign both sets of findings to the same agent.
|
|
319
307
|
- Security agent gets priority on validation logic; DRY agent gets priority on import consolidation.
|
|
320
308
|
|
|
309
|
+
</plan_and_remediate>
|
|
310
|
+
|
|
311
|
+
<verification_and_pr>
|
|
312
|
+
|
|
321
313
|
## Phase 4: Verification
|
|
322
314
|
|
|
323
315
|
After all agents complete:
|
|
@@ -359,9 +351,9 @@ For each category that has findings:
|
|
|
359
351
|
```
|
|
360
352
|
5. Push the branch: `git push -u origin better/{CATEGORY_SLUG}`
|
|
361
353
|
|
|
362
|
-
**
|
|
354
|
+
**File isolation rule** (one file per branch) — each file must appear in exactly ONE branch. If a file has changes from multiple categories (e.g., `server/index.js` with both security and stack-specific changes), assign the whole file to one category based on the file ownership map. Do not split file-level changes across PRs.
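One way to check the file isolation rule mechanically is to look for any path that shows up in more than one branch. The Python sketch below assumes the caller has already collected the changed files per branch (for example from `git diff --name-only` against the default branch). The function name and the input shape are hypothetical.

```python
from collections import defaultdict


def isolation_violations(branch_files: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return every file that appears in more than one branch, with those branches."""
    seen: dict[str, list[str]] = defaultdict(list)
    for branch in sorted(branch_files):
        for path in branch_files[branch]:
            seen[path].append(branch)
    return {path: branches for path, branches in seen.items() if len(branches) > 1}
```

An empty result means each file lives in exactly one branch. Any entry names a file that should be reassigned wholesale to a single category.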
|
|
363
355
|
|
|
364
|
-
**
|
|
356
|
+
**Cross-PR dependency check** — verify each branch builds independently:
|
|
365
357
|
```bash
|
|
366
358
|
git checkout better/{CATEGORY_SLUG} && {BUILD_CMD}
|
|
367
359
|
```
|
|
@@ -465,7 +457,7 @@ After creating all PRs, verify CI passes on each one:
|
|
|
465
457
|
|
|
466
458
|
Maximum 5 iterations per PR to prevent infinite loops.
|
|
467
459
|
|
|
468
|
-
**
|
|
460
|
+
**Sub-agent delegation** (prevents context exhaustion): delegate each PR's review loop to a **separate general-purpose sub-agent** via the Agent tool. Launch sub-agents in parallel (one per PR). Each sub-agent runs the full loop (request → wait → check → fix → re-request) autonomously and returns only the final status.
|
|
469
461
|
|
|
470
462
|
### 6.0: Verify browser authentication
|
|
471
463
|
|
|
@@ -486,85 +478,11 @@ If this returns 422 ("not a collaborator"), record `REVIEW_METHOD=playwright`. O
|
|
|
486
478
|
|
|
487
479
|
### 6.2: Launch parallel sub-agents (one per PR)
|
|
488
480
|
|
|
489
|
-
For each PR, spawn a general-purpose sub-agent
|
|
481
|
+
For each PR, spawn a general-purpose sub-agent using the shared review loop template:
|
|
490
482
|
|
|
491
|
-
|
|
492
|
-
|
|
493
|
-
|
|
494
|
-
Repository: {OWNER}/{REPO}
|
|
495
|
-
Branch: better/{CATEGORY_SLUG}
|
|
496
|
-
Build command: {BUILD_CMD}
|
|
497
|
-
Review request method: {REVIEW_METHOD}
|
|
498
|
-
Max iterations: 5
|
|
499
|
-
|
|
500
|
-
DECREASING TIMEOUT SCHEDULE (shorter than single-PR review since multiple
|
|
501
|
-
PRs are reviewed in parallel — see do:rpr for single-PR dynamic timing):
|
|
502
|
-
- Iteration 1: max wait 5 minutes
|
|
503
|
-
- Iteration 2: max wait 4 minutes
|
|
504
|
-
- Iteration 3: max wait 3 minutes
|
|
505
|
-
- Iteration 4: max wait 2 minutes
|
|
506
|
-
- Iteration 5+: max wait 1 minute
|
|
507
|
-
Poll interval: 30 seconds for all iterations.
|
|
508
|
-
|
|
509
|
-
Run the following loop until Copilot returns zero new comments or you hit
|
|
510
|
-
the max iteration limit:
|
|
511
|
-
|
|
512
|
-
1. CAPTURE the latest Copilot review timestamp, then REQUEST a new review:
|
|
513
|
-
- First, capture the latest Copilot review timestamp via GraphQL:
|
|
514
|
-
echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviews(last: 20) { nodes { author { login } submittedAt } } } } }"}' | gh api graphql --input -
|
|
515
|
-
- Find the most recent submittedAt where author.login is
|
|
516
|
-
copilot-pull-request-reviewer[bot] and record as LAST_COPILOT_SUBMITTED_AT.
|
|
517
|
-
- If no prior Copilot review exists, record LAST_COPILOT_SUBMITTED_AT=NONE
|
|
518
|
-
and treat the next Copilot review as NEW regardless of timestamp.
|
|
519
|
-
- Then REQUEST:
|
|
520
|
-
If REVIEW_METHOD is "api":
|
|
521
|
-
gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/requested_reviewers \
|
|
522
|
-
-f 'reviewers[]=copilot-pull-request-reviewer[bot]'
|
|
523
|
-
If REVIEW_METHOD is "playwright":
|
|
524
|
-
Navigate to the PR URL, click the "Reviewers" gear button, click the
|
|
525
|
-
Copilot menuitemradio option, verify sidebar shows "Awaiting requested
|
|
526
|
-
review from Copilot"
|
|
527
|
-
|
|
528
|
-
2. WAIT for the review (BLOCKING):
|
|
529
|
-
- Poll using stdin JSON piping (avoid shell-escaping issues):
|
|
530
|
-
echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviews(last: 5) { totalCount nodes { state body author { login } submittedAt } } reviewThreads(first: 100) { nodes { id isResolved comments(first: 3) { nodes { body path line author { login } } } } } } } }"}' | gh api graphql --input -
|
|
531
|
-
- Complete when a new copilot-pull-request-reviewer[bot] review appears
|
|
532
|
-
with submittedAt after LAST_COPILOT_SUBMITTED_AT captured in step 1
|
|
533
|
-
(or, if LAST_COPILOT_SUBMITTED_AT=NONE, when the first
|
|
534
|
-
copilot-pull-request-reviewer[bot] review for this loop appears)
|
|
535
|
-
- Use the DECREASING TIMEOUT for the current iteration number
|
|
536
|
-
- Error detection: if review body contains "Copilot encountered an error"
|
|
537
|
-
or "unable to review", re-request and resume. Max 3 error retries.
|
|
538
|
-
- If no review after max wait, report timeout and exit
|
|
539
|
-
|
|
540
|
-
3. CHECK for unresolved threads:
|
|
541
|
-
Fetch threads via stdin JSON piping:
|
|
542
|
-
echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviewThreads(first: 100) { nodes { id isResolved comments(first: 10) { nodes { body path line author { login } } } } } } } }"}' | gh api graphql --input -
|
|
543
|
-
- Verify review was successful (no error text in body)
|
|
544
|
-
- If zero comments / no unresolved threads: report success and exit
|
|
545
|
-
- If unresolved threads exist: proceed to step 4
|
|
546
|
-
|
|
547
|
-
4. FIX all unresolved threads:
|
|
548
|
-
For each unresolved thread:
|
|
549
|
-
- Read the referenced file and understand the feedback
|
|
550
|
-
- Evaluate: valid feedback → make the fix; informational/false positive →
|
|
551
|
-
resolve without changes
|
|
552
|
-
- If fixing:
|
|
553
|
-
git checkout better/{CATEGORY_SLUG}
|
|
554
|
-
# make changes
|
|
555
|
-
git add <specific files>
|
|
556
|
-
git commit -m "address Copilot review feedback"
|
|
557
|
-
git push
|
|
558
|
-
- Resolve thread via stdin JSON piping:
|
|
559
|
-
echo '{"query":"mutation { resolveReviewThread(input: {threadId: \"{THREAD_ID}\"}) { thread { id isResolved } } }"}' | gh api graphql --input -
|
|
560
|
-
- After all threads resolved, increment iteration and go back to step 1
|
|
561
|
-
|
|
562
|
-
When done, report back:
|
|
563
|
-
- Final status: clean / max-iterations-reached / timeout / error
|
|
564
|
-
- Total iterations completed
|
|
565
|
-
- List of commits made (if any)
|
|
566
|
-
- Any unresolved threads remaining
|
|
567
|
-
```
|
|
483
|
+
!`cat ~/.claude/lib/copilot-review-loop.md`
|
|
484
|
+
|
|
485
|
+
Pass each sub-agent the PR-specific variables: `{PR_NUMBER}`, `{OWNER}/{REPO}`, `better/{CATEGORY_SLUG}`, `{BUILD_CMD}`, and `{REVIEW_METHOD}`.
|
|
568
486
|
|
|
569
487
|
Launch all PR sub-agents in parallel. Wait for all to complete.
|
|
570
488
|
|
|
@@ -598,6 +516,8 @@ If merge fails (e.g., branch protection, merge conflicts from a prior PR):
|
|
|
598
516
|
Then re-run CI check before merging.
|
|
599
517
|
- If branch protection: inform the user and suggest manual merge
|
|
600
518
|
|
|
519
|
+
</verification_and_pr>
|
|
520
|
+
|
|
601
521
|
## Phase 7: Cleanup
|
|
602
522
|
|
|
603
523
|
1. Remove the worktree:
|
package/commands/do/fpr.md
CHANGED
@@ -59,18 +59,39 @@ Before committing, ensure the fork is up to date with upstream:
 git push -u origin {CURRENT_BRANCH}
 ```
 
-## Local Code Review (
+## Local Code Review (REQUIRED GATE)
+
+Fork PRs go to upstream maintainers who can't easily ask for changes — getting it right the first time matters more here than on internal PRs.
+
+<review_gate>
 
 1. Fetch upstream default branch for accurate diff:
 ```bash
 git fetch upstream {UPSTREAM_DEFAULT_BRANCH}
 ```
-2. Run `git diff upstream/{UPSTREAM_DEFAULT_BRANCH}...{CURRENT_BRANCH}` to
-3.
+2. Run `git diff upstream/{UPSTREAM_DEFAULT_BRANCH}...{CURRENT_BRANCH}` to get the list of changed files
+3. For every changed file:
+a. Read the entire file using the Read tool (not just diff hunks)
+b. Check it against the tiered checklist below (always check Tiers 1+4; check Tiers 2-3 when relevance filters match)
+c. For each finding, quote the specific code line and explain why it's a problem
+4. After reviewing all files, verify: does the code actually deliver what the commits claim?
+5. Print a review summary table (see do:review for format)
+6. Fix any issues, recommit, and push before proceeding
+7. Only after printing the review summary may you proceed to "Open the PR"
+
+If the diff touches more than 15 files, delegate later batches to a subagent to keep context clean.
+
+</review_gate>
+
+Checklist to apply to each file:
 
 !`cat ~/.claude/lib/code-review-checklist.md`
-
-
+
+Verification — confirm before proceeding:
+- [ ] Read every changed file in full (not just diffs)
+- [ ] Checked each file against the relevant checklist tiers
+- [ ] Quoted specific code for each finding
+- [ ] Printed a review summary table with findings
 
 ## Check for Upstream Contributing Guidelines
 
package/commands/do/pr.md
CHANGED
@@ -17,17 +17,36 @@ Print: `PR flow: {current_branch} → {default_branch}`
 - Keep commit message concise and do not use co-author information
 - Push the branch to remote: `git pull --rebase --autostash && git push -u origin {current_branch}`
 
-## Local Code Review (
+## Local Code Review (REQUIRED GATE)
 
-
+This review catches bugs that Copilot misses — incomplete pattern copying is the #1 source of post-merge review feedback. Skipping costs more time in review cycles than it saves.
 
-
-
+<review_gate>
+
+1. Read commit messages to understand what this change claims to do
+2. Run `git diff {default_branch}...{current_branch}` to get the list of changed files
+3. For every changed file:
+a. Read the entire file using the Read tool (not just diff hunks)
+b. Check it against the tiered checklist below (always check Tiers 1+4; check Tiers 2-3 when relevance filters match)
+c. For each finding, quote the specific code line and explain why it's a problem
+4. After reviewing all files, verify: does the code actually deliver what the commits claim?
+5. Print a review summary table (see do:review for format)
+6. Fix any issues, run tests, and verify tests cover the changed code paths
+7. Only after printing the review summary may you proceed to "Open the PR"
+
+If the diff touches more than 15 files, delegate later batches to a subagent to keep context clean.
+
+</review_gate>
+
+Checklist to apply to each file:
 
 !`cat ~/.claude/lib/code-review-checklist.md`
 
-
-
+Verification — confirm before proceeding:
+- [ ] Read every changed file in full (not just diffs)
+- [ ] Checked each file against the relevant checklist tiers
+- [ ] Quoted specific code for each finding
+- [ ] Printed a review summary table with findings
 
 ## Open the PR
 
package/commands/do/release.md
CHANGED
@@ -57,17 +57,36 @@ If ambiguous, ask the user to confirm before proceeding.
 
 4. **Commit the release**: Stage `package.json`, `package-lock.json`, and the changelog file. Commit with message `chore: release v{new_version}`
 
-## Local Code Review (
+## Local Code Review (REQUIRED GATE)
 
-
+A release without a deep code review ships bugs to users. This review is the last line of defense — the full diff since the last release often contains interactions that individual PR reviews missed.
 
-
-
+<review_gate>
+
+1. Read all commit messages since last release to understand the scope
+2. Run `git diff {target}...{source}` to get the list of changed files
+3. For every changed file:
+a. Read the entire file using the Read tool (not just diff hunks)
+b. Check it against the tiered checklist below (always check Tiers 1+4; check Tiers 2-3 when relevance filters match)
+c. For each finding, quote the specific code line and explain why it's a problem
+4. After reviewing all files, verify: does the aggregate change set deliver what the release claims?
+5. Print a review summary table (see do:review for format)
+6. Fix any issues, run tests, verify tests cover the changed code paths, commit and push
+7. Only after printing the review summary may you proceed to "Open the Release PR"
+
+If the diff touches more than 15 files, delegate later batches to a subagent to keep context clean.
+
+</review_gate>
+
+Checklist to apply to each file:
 
 !`cat ~/.claude/lib/code-review-checklist.md`
 
-
-
+Verification — confirm before proceeding:
+- [ ] Read every changed file in full (not just diffs)
+- [ ] Checked each file against the relevant checklist tiers
+- [ ] Quoted specific code for each finding
+- [ ] Printed a review summary table with findings
 
 ## Open the Release PR
 
package/commands/do/review.md
CHANGED
|
@@ -17,6 +17,22 @@ If there are no changes, inform the user and stop.
|
|
|
17
17
|
|
|
18
18
|
CLAUDE.md is already loaded into your context. Use its rules (code style, error handling, logging, security model, scope exclusions) as overrides to generic best practices throughout this review. For example, if CLAUDE.md says "no auth needed — internal tool", do not flag missing authentication.
|
|
19
19
|
|
|
20
|
+
<review_instructions>
|
|
21
|
+
|
|
22
|
+
## PR-Level Coherence Check
|
|
23
|
+
|
|
24
|
+
Before reviewing individual files, understand what this change set claims to do:
|
|
25
|
+
|
|
26
|
+
1. Read commit messages (`git log {base}...HEAD --oneline`)
|
|
27
|
+
2. After reviewing all files, verify: does the changed code actually deliver what the commits claim? Flag any claims not backed by code (e.g., "adds rate limiting" but only adds a comment).
|
|
28
|
+
|
|
29
|
+
## Large PR Strategy
|
|
30
|
+
|
|
31
|
+
If the diff touches more than 15 files, split the review into batches:
|
|
32
|
+
1. Group files by module/directory
|
|
33
|
+
2. Review each batch, printing findings as you go
|
|
34
|
+
3. Delegate files beyond the first 15 to a subagent if context is getting full
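The batching steps above could be approximated as follows. This Python sketch uses the 15-file threshold from the source. Treating "module/directory" as the immediate parent directory of each path is an assumption for illustration, and `batch_changed_files` is a hypothetical helper name.

```python
from collections import defaultdict
from pathlib import PurePosixPath

BATCH_THRESHOLD = 15  # from the source: split when the diff touches more than 15 files


def batch_changed_files(paths: list[str]) -> list[list[str]]:
    """Return one batch for small diffs, otherwise one batch per parent directory."""
    if len(paths) <= BATCH_THRESHOLD:
        return [sorted(paths)]
    by_dir: dict[str, list[str]] = defaultdict(list)
    for p in paths:
        # Assumption: the immediate parent directory stands in for "module".
        by_dir[str(PurePosixPath(p).parent)].append(p)
    return [sorted(by_dir[d]) for d in sorted(by_dir)]
```

Each returned batch can then be reviewed in turn, with later batches handed to a subagent if context is filling up.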
|
|
35
|
+
|
|
20
36
|
## Deep File Review
|
|
21
37
|
|
|
22
38
|
For **each changed file** in the diff, read the **entire file** (not just diff hunks). Reviewing only the diff misses context bugs where new code interacts incorrectly with existing code.
|
|
@@ -59,12 +75,20 @@ With the flow understood, evaluate the changed code against these principles:
|
|
|
59
75
|
|
|
60
76
|
Only flag principle violations that are **concrete and actionable** in the changed code. Do not flag pre-existing design issues in untouched code unless the changes make them worse.
|
|
61
77
|
|
|
78
|
+
</review_instructions>
|
|
79
|
+
|
|
80
|
+
<checklist>
|
|
81
|
+
|
|
62
82
|
### Per-File Checklist
|
|
63
83
|
|
|
64
|
-
Check every file against this checklist:
|
|
84
|
+
Check every file against this checklist. The checklist is organized into tiers — always check Tiers 1 and 4, and check Tiers 2-3 only when the relevance filter matches the file:
|
|
65
85
|
|
|
66
86
|
!`cat ~/.claude/lib/code-review-checklist.md`
|
|
67
87
|
|
|
88
|
+
</checklist>
|
|
89
|
+
|
|
90
|
+
<deep_checks>
|
|
91
|
+
|
|
68
92
|
### Additional deep checks (read surrounding code to verify):
|
|
69
93
|
|
|
70
94
|
**Cross-file consistency**
|
|
@@ -87,6 +111,7 @@ Check every file against this checklist:
|
|
|
87
111
|
|
|
88
112
|
**Access scope changes**
|
|
89
113
|
- If the PR widens access to an endpoint or resource (admin→public, internal→external), trace all shared dependencies the endpoint uses (rate limiters, queues, connection pools, external service quotas) and assess whether they were sized for the previous access level — in-memory/process-local limiters don't enforce limits across horizontally scaled instances
|
|
114
|
+
- If the PR adds endpoints under a restricted route group (admin, internal, scoped), read sibling endpoints in the same route group and verify the new endpoint applies the same authorization gate — missing gates on admin-mounted endpoints are consistently the most dangerous review finding
|
|
90
115
|
|
|
91
116
|
**Guard-before-cache ordering**
|
|
92
117
|
- If a handler performs a pre-flight guard check (rate limit, quota, feature flag) before a cache lookup or short-circuit path, verify the guard doesn't block operations that would be served from cache without touching the guarded resource — restructure so cache hits bypass the guard
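To make the guard-before-cache ordering concrete, here is a small Python sketch in which the cache lookup runs before the quota check, so a cache hit never consumes or trips the guard. The `Handler` class, the counter used as a stand-in rate limiter, and the `RateLimitExceeded` exception are all hypothetical.

```python
from typing import Callable


class RateLimitExceeded(Exception):
    """Raised when the (hypothetical) quota is exhausted."""


class Handler:
    def __init__(self, fetch: Callable[[str], str], quota: int) -> None:
        self._fetch = fetch        # the guarded, expensive resource
        self._remaining = quota    # stand-in for a real rate limiter
        self._cache: dict[str, str] = {}

    def get(self, key: str) -> str:
        # Cache lookup first: a hit never touches the guarded resource,
        # so it should neither consume nor be blocked by the quota.
        if key in self._cache:
            return self._cache[key]
        # Guard only the path that actually calls the resource.
        if self._remaining <= 0:
            raise RateLimitExceeded(key)
        self._remaining -= 1
        value = self._fetch(key)
        self._cache[key] = value
        return value
```

With the two checks swapped, a client that has exhausted its quota would be refused even for values already sitting in the cache, which is the bug this check looks for.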
|
|
@@ -112,17 +137,62 @@ Check every file against this checklist:
|
|
|
112
137
|
- If the PR modifies a value (identifier, parameter name, format convention, threshold, timeout) that is referenced in other files, trace all cross-references and verify they agree. This includes: reviewer usernames, API names, placeholder formats, GraphQL field names, operational constants
|
|
113
138
|
- If the PR adds or reorders sequential steps/instructions, verify the ordering matches execution dependencies — readers following steps in order must not perform an action before its prerequisite
|
|
114
139
|
|
|
140
|
+
**Transactional write integrity**
|
|
141
|
+
- If the PR performs multi-item writes (database transactions, batch operations), verify each write includes condition expressions that prevent stale-read races (TOCTOU) — an unconditioned write after a read can upsert deleted records, double-count aggregates, or drive counters negative. Trace the gap between read and write for each operation
|
|
142
|
+
- If the PR catches transaction/conditional failures, verify the error is translated to a client-appropriate status (409, 404) rather than bubbling as 500 — expected concurrency failures are not server errors
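
The two checks above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory table with a `version` field standing in for a real store's condition expressions; `Row`, `WriteResult`, and `conditionalDebit` are illustrative names, not part of slash-do.

```typescript
// Hypothetical in-memory table; a real store would expose its own conditional-write API.
interface Row { id: string; balance: number; version: number }

type WriteResult =
  | { ok: true; row: Row }
  | { ok: false; status: 404 | 409; reason: string };

const table: Record<string, Row | undefined> = {};

// Apply the write only if the row still exists at the version the caller read.
function conditionalDebit(id: string, amount: number, expectedVersion: number): WriteResult {
  const current = table[id];
  if (current === undefined) {
    // Deleted between read and write: report 404 rather than upserting a ghost row.
    return { ok: false, status: 404, reason: "row no longer exists" };
  }
  if (current.version !== expectedVersion) {
    // Another writer got there first: an expected concurrency failure, not a server error.
    return { ok: false, status: 409, reason: "stale read, retry with fresh data" };
  }
  if (current.balance < amount) {
    // Lower-bound condition keeps the value from going negative.
    return { ok: false, status: 409, reason: "insufficient balance" };
  }
  const next: Row = { id, balance: current.balance - amount, version: current.version + 1 };
  table[id] = next;
  return { ok: true, row: next };
}

table["acct-1"] = { id: "acct-1", balance: 10, version: 1 };
const first = conditionalDebit("acct-1", 7, 1); // succeeds, version becomes 2
const stale = conditionalDebit("acct-1", 7, 1); // same stale version: rejected with 409
delete table["acct-1"];
const gone = conditionalDebit("acct-1", 1, 2);  // row deleted: rejected with 404
```

Returning a tagged result here is one possible design; throwing a typed error that the handler maps to 409 or 404 would serve the same purpose.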
|
|
143
|
+
|
|
144
|
+
**Batch/paginated API consumption**
|
|
145
|
+
- If the PR calls batch or paginated external APIs (database batch gets, paginated queries, bulk service calls), verify the caller handles partial results — unprocessed items, continuation tokens, and rate-limited responses must be retried or surfaced, not silently dropped. Check that retry loops include backoff and attempt limits
|
|
146
|
+
- If the PR references resource names from API responses (table names, queue names), verify lookups account for environment-prefixed names rather than hardcoding bare names
|
|
147
|
+
|
|
148
|
+
**Data model vs access pattern alignment**
|
|
149
|
+
- If the PR adds queries that claim ordering (e.g., "recent", "top"), verify the underlying key/index design actually supports that ordering natively — random UUIDs and non-time-sortable keys require full scans and in-memory sorting, which degrades at scale
|
|
150
|
+
|
|
151
|
+
**Deletion/lifecycle cleanup completeness**
|
|
152
|
+
- If the PR adds a delete or destroy function, trace all resources created during the entity's lifecycle (data directories, git branches, child records, temporary files, worktrees) and verify each is cleaned up on deletion. Compare with existing delete functions in the codebase for completeness patterns
|
|
153
|
+
|
|
154
|
+
**Update schema depth**
|
|
155
|
+
- If the PR derives an update/patch schema from a create schema (e.g., `.partial()`, `Partial<T>`), verify that nested objects also become partial — shallow partial on deeply-required schemas rejects valid partial updates where the caller only wants to change one nested field
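
As a rough illustration of the shallow-versus-deep distinction, the sketch below uses a hand-written `DeepPartial` type rather than any particular schema library. The `AgentConfig` shape and `applyPatch` helper are assumptions made up for the example.

```typescript
// Illustrative create shape; the field names are assumptions, not from the project.
interface AgentConfig {
  name: string;
  schedule: { intervalMinutes: number; timezone: string };
}

// Partial<AgentConfig> still requires both schedule fields whenever schedule is present.
// DeepPartial makes nested object fields optional all the way down.
type DeepPartial<T> = T extends object ? { [K in keyof T]?: DeepPartial<T[K]> } : T;

// Merge a nested patch into the existing entity without dropping untouched fields.
function applyPatch(existing: AgentConfig, patch: DeepPartial<AgentConfig>): AgentConfig {
  return {
    name: patch.name ?? existing.name,
    schedule: {
      intervalMinutes: patch.schedule?.intervalMinutes ?? existing.schedule.intervalMinutes,
      timezone: patch.schedule?.timezone ?? existing.schedule.timezone,
    },
  };
}

const saved: AgentConfig = { name: "nightly", schedule: { intervalMinutes: 60, timezone: "UTC" } };
// Only one nested field changes; a shallow Partial type would reject this patch.
const updated = applyPatch(saved, { schedule: { intervalMinutes: 15 } });
```

The patch never supplies defaults, so fields the caller did not send keep their persisted values.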
|
|
156
|
+
|
|
157
|
+
**Mutation return value freshness**
|
|
158
|
+
- If a function mutates an entity and returns it, verify the returned object reflects the post-mutation state, not a pre-read snapshot. Also check whether dependent scheduling/evaluation state (backoff, timers, status flags) is reset when a "force" or "trigger" operation is invoked
|
|
159
|
+
|
|
160
|
+
**Responsibility relocation audit**
|
|
161
|
+
- If the PR moves a responsibility from one module to another (e.g., a database write from a handler to middleware, a computation from client to server), trace all code at the old location that depended on the timing, return value, or side effects of the moved operation — guards, response fields, in-memory state updates, and downstream scheduling that assumed co-located execution. Verify the new execution point preserves these contracts or that dependents are updated. Check for dead code left behind at the old location
|
|
162
|
+
|
|
163
|
+
**Read-after-write consistency**
|
|
164
|
+
- If the PR writes to a data store and then immediately queries that store (especially scans, aggregations, or replica reads), check whether the store's consistency model guarantees visibility of the write. If not, flag the read as potentially stale and suggest computing from in-memory state, using consistent-read options, or adding a delay/caveat
|
|
165
|
+
|
|
115
166
|
**Formatting & structural consistency**
|
|
116
167
|
- If the PR adds content to an existing file (list items, sections, config entries), verify the new content matches the file's existing indentation, bullet style, heading levels, and structure — rendering inconsistencies are the most common Copilot review finding
|
|
117
168
|
|
|
169
|
+
</deep_checks>
|
|
170
|
+
|
|
171
|
+
<verify_findings>
|
|
172
|
+
|
|
173
|
+
## Verify Findings
|
|
174
|
+
|
|
175
|
+
For each issue found, ground it in evidence before classifying:
|
|
176
|
+
1. **Quote the specific code line(s)** that demonstrate the issue
|
|
177
|
+
2. **Explain why it's a problem** in one sentence given the surrounding context
|
|
178
|
+
3. If the fix involves async/state changes, **trace the execution path** to confirm the issue is real
|
|
179
|
+
4. If you cannot quote specific code for a finding, downgrade it to **[UNCERTAIN]**
|
|
180
|
+
|
|
181
|
+
After verifying all findings, run the project's build and test commands to confirm no false positives.
|
|
182
|
+
|
|
183
|
+
</verify_findings>
|
|
184
|
+
|
|
185
|
+
<fix_and_report>
|
|
186
|
+
|
|
118
187
|
## Fix Issues Found
|
|
119
188
|
|
|
120
|
-
For each issue
|
|
189
|
+
For each verified issue:
|
|
121
190
|
1. Classify severity: **CRITICAL** (runtime crash, data leak, security) vs **IMPROVEMENT** (consistency, robustness, conventions)
|
|
122
191
|
2. Fix all CRITICAL issues immediately
|
|
123
192
|
3. For IMPROVEMENT issues, fix them too — the goal is to eliminate Copilot review round-trips
|
|
124
193
|
4. After fixes, run the project's test suite and build command (per project conventions already in context)
|
|
125
|
-
5.
|
|
194
|
+
5. Verify the test suite covers the changed code paths — passing unrelated tests is not validation
|
|
195
|
+
6. Commit fixes: `refactor: address code review findings`
|
|
126
196
|
|
|
127
197
|
## Report
|
|
128
198
|
|
|
@@ -144,3 +214,5 @@ Print a summary table of what was reviewed and found:
|
|
|
144
214
|
```
|
|
145
215
|
|
|
146
216
|
If no issues were found, confirm the code is clean and ready for PR.
|
|
217
|
+
|
|
218
|
+
</fix_and_report>
|
package/install.sh
CHANGED
|
@@ -1,3 +1,11 @@
|
|
|
1
|
+
<!--
|
|
2
|
+
Triage: Check Tiers 1 and 4 for every file. Check Tier 2/3 only when
|
|
3
|
+
the relevance filter matches the changed code. This prevents important
|
|
4
|
+
checks from being lost in a long list.
|
|
5
|
+
-->
|
|
6
|
+
|
|
7
|
+
## Tier 1 — Always Check (Runtime Crashes, Security, Hygiene)
|
|
8
|
+
|
|
1
9
|
**Hygiene**
|
|
2
10
|
- Leftover debug code (`console.log`, `debugger`, TODO/FIXME/HACK), hardcoded secrets/credentials, and uncommittable files (.env, node_modules, build artifacts)
|
|
3
11
|
- Overly broad changes that should be split into separate PRs
|
|
@@ -11,110 +19,157 @@
|
|
|
11
19
|
- Type coercion edge cases — `Number('')` is `0` not empty, `0` is falsy in truthy checks, `NaN` comparisons are always false; string comparison operators (`<`, `>`, `localeCompare`) do lexicographic, not semantic, ordering (e.g., `"10" < "2"`). Use explicit type checks (`Number.isFinite()`, `!= null`) and dedicated libraries (e.g., semver for versions) instead of truthy guards or lexicographic ordering when zero/empty are valid values or semantic ordering matters
|
|
12
20
|
- Functions that index into arrays without guarding empty arrays; state/variables declared but never updated or only partially wired up
|
|
13
21
|
- Shared mutable references — module-level defaults passed by reference mutate across calls (use `structuredClone()`/spread); `useCallback`/`useMemo` referencing a later `const` (temporal dead zone); object spread followed by unconditional assignment that clobbers spread values
|
|
14
|
-
-
|
|
22
|
+
- Functions with >10 branches or >15 cyclomatic complexity — refactor into smaller units
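
The type-coercion item above is easier to spot with a concrete case. This is a small sketch, assuming a numeric query parameter and dotted numeric version strings; `parseLimit` and `compareVersions` are hypothetical helpers, and the version comparison is deliberately simplified compared with a real semver library.

```typescript
// Parse a query-string limit; returns the fallback for missing, empty, or non-numeric input.
function parseLimit(raw: string | undefined, fallback: number, max: number): number {
  if (raw === undefined || raw.trim() === "") return fallback; // Number("") would be 0
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) return fallback;               // rejects NaN and Infinity
  return Math.min(Math.max(Math.trunc(parsed), 0), max);       // 0 stays a valid value
}

// Compare dotted numeric versions segment by segment instead of lexicographically.
// Simplified: ignores pre-release tags, which a dedicated semver library handles.
function compareVersions(a: string, b: string): number {
  const left = a.split(".").map(Number);
  const right = b.split(".").map(Number);
  for (let i = 0; i < Math.max(left.length, right.length); i++) {
    const diff = (left[i] ?? 0) - (right[i] ?? 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}
```

With plain string comparison, `"1.10.0" < "1.2.0"` evaluates to `true`, whereas `compareVersions("1.10.0", "1.2.0")` reports the first version as newer.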
|
|
23
|
+
|
|
24
|
+
**API & URL safety**
|
|
25
|
+
- User-supplied or system-generated values interpolated into URL paths, shell commands, file paths, or subprocess arguments without encoding/validation — use `encodeURIComponent()` for URLs, regex allowlists for execution boundaries. Generated identifiers used as URL path segments must be safe for your router/storage (no `/`, `?`, `#`; consider allowlisting characters and/or applying `encodeURIComponent()`). Identifiers derived from human-readable names (slugs) used for namespaced resources (git branches, directories) need a unique suffix (ID, hash) to prevent collisions between entities with the same or similar names
|
|
26
|
+
- Route params passed to services without format validation; path containment checks using string prefix without path separator boundary (use `path.relative()`)
|
|
27
|
+
- Error/fallback responses that hardcode security headers instead of using centralized policy — error paths bypass security tightening
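
A minimal sketch of the encoding and slug guidance above follows. The allowlist pattern, the route shape, and the `branchSlug` helper are assumptions for illustration; the right character set depends on the router and storage in use.

```typescript
// Allowlist for generated identifiers used as path segments (assumed policy, adjust per router).
const SAFE_SEGMENT = /^[a-z0-9][a-z0-9-]{0,63}$/;

// Build a URL path from an untrusted value; encodeURIComponent escapes "/", "?", and "#".
function userPath(userId: string): string {
  return `/users/${encodeURIComponent(userId)}`;
}

// Derive a slug from a display name and append a unique suffix so similar names cannot collide.
function branchSlug(displayName: string, uniqueId: string): string {
  const base = displayName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  const slug = `${base || "item"}-${uniqueId}`;
  if (!SAFE_SEGMENT.test(slug)) {
    throw new Error(`unsafe slug: ${slug}`);
  }
  return slug;
}
```

Two entities named "My Agent!" and "my agent" normalize to the same base, so the suffix is what keeps their branches or directories apart.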
|
|
28
|
+
|
|
29
|
+
**Trust boundaries & data exposure**
|
|
30
|
+
- API responses returning full objects with sensitive fields — destructure and omit across ALL response paths (GET, PUT, POST, error, socket); comments/docs claiming data isn't exposed while the code path does expose it
|
|
31
|
+
- Server trusting client-provided computed/derived values (scores, totals, correctness flags) when the server can recompute them — strip and recompute server-side; don't require clients to submit fields the server should own
|
|
32
|
+
- New endpoints mounted under restricted paths (admin, internal) missing authorization verification — compare with sibling endpoints in the same route group to ensure the same access gate (role check, scope validation) is applied consistently
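
The first two items above might look like the following in practice. This is a sketch under assumed shapes: `UserRecord`, `QuizSubmission`, and both helpers are hypothetical and only show the pattern of one shared serializer plus server-side recomputation.

```typescript
interface UserRecord { id: string; email: string; passwordHash: string; apiToken: string }
type PublicUser = Pick<UserRecord, "id" | "email">;

// Single serializer intended for every response path (GET, PUT, POST, error, socket).
function toPublicUser(user: UserRecord): PublicUser {
  return { id: user.id, email: user.email };
}

interface QuizSubmission { answers: number[]; score?: number }

// Ignore any client-supplied score and recompute it from the answer key.
function gradeSubmission(submission: QuizSubmission, answerKey: number[]): number {
  return answerKey.reduce(
    (total, correct, index) => total + (submission.answers[index] === correct ? 1 : 0),
    0,
  );
}
```

Building the public object field by field, rather than deleting sensitive keys from a copy, means a newly added sensitive column is excluded by default.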
|
|
33
|
+
|
|
34
|
+
## Tier 2 — Check When Relevant (Data Integrity, Async, Error Handling)
|
|
15
35
|
|
|
16
|
-
**Async & state consistency**
|
|
17
|
-
- Optimistic state changes (view switches, navigation, success callbacks) before async completion — if the operation fails or is cancelled, the UI is stuck with no rollback. Check return values/errors before calling success callbacks. Handle both failure and cancellation paths
|
|
36
|
+
**Async & state consistency** _[applies when: code uses async/await, Promises, or UI state]_
|
|
37
|
+
- Optimistic state changes (view switches, navigation, success callbacks) before async completion — if the operation fails or is cancelled, the UI is stuck with no rollback. Check return values/errors before calling success callbacks. Handle both failure and cancellation paths. Watch for `.catch(() => null)` followed by unconditional success code (toast, state update) — the catch silences the error but the success path still runs. Either let errors propagate naturally or check the return value before proceeding
|
|
18
38
|
- Multiple coupled state variables updated independently — actions that change one must update all related fields; debounced/cancelable operations must reset loading state on every exit path (cleared, stale, failed, aborted)
|
|
19
39
|
- Error notification at multiple layers (shared API client + component-level) — verify exactly one layer owns user-facing error messages
|
|
20
40
|
- Optimistic updates using full-collection snapshots for rollback — a second in-flight action gets clobbered. Use per-item rollback and functional state updaters after async gaps; sync optimistic changes to parent via callback or trigger refetch on remount
|
|
21
41
|
- State updates guarded by truthiness of the new value (`if (arr?.length)`) — prevents clearing state when the source legitimately returns empty. Distinguish "no response" from "empty response"
|
|
42
|
+
- Mutation/trigger functions that return or propagate stale pre-mutation state — if a function activates, updates, or resets an entity, the returned value and any dependent scheduling/evaluation state (backoff timers, "last run" timestamps, status flags) must reflect the post-mutation state, not a snapshot read before the mutation
|
|
43
|
+
- Fire-and-forget or async writes where the in-memory object is not updated (response returns stale data) or is updated unconditionally regardless of write success (response claims state that was never persisted) — update in-memory state conditionally on write outcome, or document the tradeoff explicitly
|
|
44
|
+
- Missing `await` on async operations in error/cleanup paths — fire-and-forget cleanup (e.g., aborting a failed operation, rolling back partial state) that must complete before the function returns or the caller proceeds
|
|
22
45
|
- `Promise.all` without error handling — partial load with unhandled rejection. Wrap with fallback/error state
|
|
46
|
+
- Side effects during React render (setState, navigation, mutations outside useEffect)
|
|
23
47
|
|
|
24
|
-
**
|
|
25
|
-
-
|
|
26
|
-
- Initialization functions (schedulers, pollers, listeners) that don't guard against multiple calls — creates duplicate instances. Check for existing instances before reinitializing
|
|
27
|
-
|
|
28
|
-
**Error handling**
|
|
29
|
-
- Service functions throwing generic `Error` for client-caused conditions — bubbles as 500 instead of 400/404. Use typed error classes with explicit status codes; ensure consistent error responses across similar endpoints
|
|
48
|
+
**Error handling** _[applies when: code has try/catch, .catch, error responses, or external calls]_
|
|
49
|
+
- Service functions throwing generic `Error` for client-caused conditions — bubbles as 500 instead of 400/404. Use typed error classes with explicit status codes; ensure consistent error responses across similar endpoints. Include expected concurrency/conditional failures (transaction cancellations, optimistic lock conflicts) — catch and translate to 409/retry rather than letting them surface as 500
|
|
30
50
|
- Swallowed errors (empty `.catch(() => {})`), handlers that replace detailed failure info with generic messages, and error/catch handlers that exit cleanly (`exit 0`, `return`) without any user-visible output — surface a notification, propagate original context, and make failures look like failures
|
|
31
51
|
- Destructive operations in retry/cleanup paths assumed to succeed without their own error handling — if cleanup fails, retry logic crashes instead of reporting the intended failure
|
|
52
|
+
- External service calls without configurable timeouts — a hung downstream service blocks the caller indefinitely
|
|
53
|
+
- Missing fallback behavior when downstream services are unavailable (see also: retry without backoff in "Sync & replication")
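
One way to keep client-caused conditions from surfacing as 500s is a typed error with an explicit status, roughly as sketched below. `HttpError`, `notFound`, `conflict`, and `toResponse` are illustrative names and do not come from any specific framework.

```typescript
// Error carrying an explicit HTTP status so handlers do not default everything to 500.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.name = "HttpError";
    this.status = status;
  }
}

const notFound = (what: string) => new HttpError(404, `${what} not found`);
const conflict = (why: string) => new HttpError(409, why);

// Structural check; also works when instanceof is unreliable for transpiled Error subclasses.
function isHttpError(err: unknown): err is HttpError {
  return err instanceof Error && typeof (err as HttpError).status === "number";
}

// Translate a thrown value into a response shape; unknown errors stay 500.
// The original error should still be logged server-side before the generic body is returned.
function toResponse(err: unknown): { status: number; body: { error: string } } {
  if (isHttpError(err)) {
    return { status: err.status, body: { error: err.message } };
  }
  return { status: 500, body: { error: "internal error" } };
}
```

A service would throw `conflict("version mismatch")` for an optimistic-lock failure, and the handler would pass whatever it catches through `toResponse`.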
|
|
32
54
|
|
|
33
|
-
**
|
|
34
|
-
-
|
|
35
|
-
-
|
|
36
|
-
-
|
|
37
|
-
|
|
38
|
-
**Trust boundaries & data exposure**
|
|
39
|
-
- API responses returning full objects with sensitive fields — destructure and omit across ALL response paths (GET, PUT, POST, error, socket); comments/docs claiming data isn't exposed while the code path does expose it
|
|
40
|
-
- Server trusting client-provided computed/derived values (scores, totals, correctness flags) when the server can recompute them — strip and recompute server-side; don't require clients to submit fields the server should own
|
|
41
|
-
|
|
42
|
-
**Input handling**
|
|
43
|
-
- Trimming values where whitespace is significant (API keys, tokens, passwords, base64) — only trim identifiers/names
|
|
44
|
-
- Endpoints accepting unbounded arrays/collections without upper limits — enforce max size or move to background jobs
|
|
55
|
+
**Resource management** _[applies when: code uses event listeners, timers, subscriptions, or useEffect]_
|
|
56
|
+
- Event listeners, socket handlers, subscriptions, timers, and useEffect side effects are cleaned up on unmount/teardown
|
|
57
|
+
- Deletion/destroy functions that clean up the primary resource but leave orphaned secondary resources (data directories, git branches, child records, temporary files) — trace all resources created during the entity's lifecycle and verify each is removed on delete
|
|
58
|
+
- Initialization functions (schedulers, pollers, listeners) that don't guard against multiple calls — creates duplicate instances. Check for existing instances before reinitializing
|
|
45
59
|
|
|
46
|
-
**Validation & consistency**
|
|
60
|
+
**Validation & consistency** _[applies when: code handles user input, schemas, or API contracts]_
|
|
61
|
+
- API versioning: breaking changes to public endpoints without version bump or deprecation path
|
|
62
|
+
- Backward-incompatible response shape changes without client migration plan
|
|
47
63
|
- New endpoints/schemas should match validation patterns of existing similar endpoints — field limits, required fields, types, error handling. If validation exists on one endpoint for a param, the same param on other endpoints needs the same validation
|
|
48
64
|
- When a validation/sanitization function is introduced for a field, trace ALL write paths (create, update, sync, import) — partial application means invalid values re-enter through the unguarded path
|
|
49
|
-
- Schema fields accepting values downstream code can't handle; Zod/schema stripping fields the service reads (silent `undefined`); config values persisted but silently ignored by the implementation — trace each field through schema → service → consumer
|
|
65
|
+
- Schema fields accepting values downstream code can't handle; Zod/schema stripping fields the service reads (silent `undefined`); config values persisted but silently ignored by the implementation — trace each field through schema → service → consumer. Update schemas derived from create schemas (e.g., `.partial()`) must also make nested object fields optional — shallow partial on a deeply-required schema rejects valid partial updates. Additionally, `.deepPartial()` or `.partial()` on schemas with `.default()` values will apply those defaults on update, silently overwriting existing persisted values with defaults — create explicit update schemas without defaults instead
|
|
66
|
+
- Entity creation without case-insensitive uniqueness checks — names differing only in case (e.g., "MyAgent" vs "myagent") cause collisions in case-insensitive contexts (file paths, git branches, URLs). Normalize to lowercase before comparing
|
|
50
67
|
- Handlers reading properties from framework-provided objects using field names the framework doesn't populate — silent `undefined`. Verify property names match the caller's contract
|
|
68
|
+
- Data model fields that have different names depending on the creation/write path (e.g., `createdAt` vs `created`) — code referencing only one naming convention silently misses records created through other paths. Trace all write paths to discover the actual field names in use
|
|
51
69
|
- Numeric values from strings used without `NaN`/type guards — `NaN` comparisons silently pass bounds checks. Clamp query params to safe lower bounds
|
|
52
70
|
- UI elements hidden from navigation but still accessible via direct URL — enforce restrictions at the route level
|
|
53
|
-
- Summary counters/accumulators that miss edge cases (removals, branch coverage); silent operations in verbose sequences where all branches should print status
|
|
54
|
-
|
|
55
|
-
**Intent vs implementation**
|
|
56
|
-
- Labels, comments, status messages, or documentation that describe behavior the code doesn't implement — e.g., a map named "renamed" that only deletes, or an action labeled "migrated" that never creates the target
|
|
57
|
-
- Inline code examples, command templates, and query snippets that aren't syntactically valid as written — template placeholders must use a consistent format, queries must use correct syntax for their language (e.g., single `{}` in GraphQL, not `{{}}`)
|
|
58
|
-
- Cross-references between files (identifiers, parameter names, format conventions, operational thresholds) that disagree — when one reference changes, trace all other files that reference the same entity and update them
|
|
59
|
-
- Sequential instructions or steps whose ordering doesn't match the required execution order — readers following in order will perform actions at the wrong time (e.g., "record X" in step 2 when X must be captured before step 1's action)
|
|
60
|
-
- Sequential numbering (section numbers, step numbers) with gaps or jumps after edits — verify continuity
|
|
61
|
-
- Completion markers, success flags, or status files written before the operation they attest to finishes — consumers see false success if the operation fails after the write
|
|
62
|
-
- Existence checks (directory exists, file exists, module resolves) used as proof of correct/complete installation — a directory can exist but be empty, a file can exist with invalid contents. Verify the specific resource the consumer needs
|
|
63
|
-
- Tracking/checkpoint files that default to empty on parse failure — causes full re-execution. Fail loudly instead
|
|
64
|
-
- Registering references to resources without verifying the resource exists — dangling references after failed operations
|
|
71
|
+
- Summary counters/accumulators that miss edge cases (removals, branch coverage, underflow on decrements — guard against going negative with lower-bound conditions); silent operations in verbose sequences where all branches should print status
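
Two of the items above, case-insensitive uniqueness and counter underflow, are small enough to sketch directly. The helper names are hypothetical, and lowercasing is assumed to be an adequate normalization for the names involved.

```typescript
// Normalize names before comparing so "MyAgent" and "myagent" count as the same entity.
function normalizeName(value: string): string {
  return value.trim().toLowerCase();
}

// True when the candidate collides with any existing name, ignoring case.
function isNameTaken(existingNames: string[], candidate: string): boolean {
  const wanted = normalizeName(candidate);
  return existingNames.some((existing) => normalizeName(existing) === wanted);
}

// Decrement a summary counter without letting it drop below zero.
function decrementCounter(current: number, by = 1): number {
  return Math.max(0, current - by);
}
```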
|
|
65
72
|
|
|
66
|
-
**Concurrency & data integrity**
|
|
67
|
-
- Shared mutable state accessed by concurrent requests without locking or atomic writes; multi-step read-modify-write cycles that can interleave
|
|
73
|
+
**Concurrency & data integrity** _[applies when: code has shared state, database writes, or multi-step mutations]_
|
|
74
|
+
- Shared mutable state accessed by concurrent requests without locking or atomic writes; multi-step read-modify-write cycles that can interleave — use conditional writes/optimistic concurrency (e.g., condition expressions, version checks) to close the gap between read and write; if the conditional write fails, surface a retryable error instead of letting it bubble as a 500
|
|
68
75
|
- Multi-table writes without a transaction — FK violations or errors leave partial state
|
|
76
|
+
- Writes that replace an entire composite attribute (array, map, JSON blob) when the field is populated by multiple sources — the write discards data from other sources. Use a separate attribute, merge with the existing value, or use list/set append operations
|
|
69
77
|
- Functions with early returns for "no primary fields to update" that silently skip secondary operations (relationship updates, link writes)
|
|
70
78
|
- Functions that acquire shared state (locks, flags, markers) with exit paths that skip cleanup — leaves the system permanently locked. Trace all exit paths including error branches
|
|
71
79
|
|
|
72
|
-
**
|
|
73
|
-
-
|
|
74
|
-
-
|
|
80
|
+
**Input handling** _[applies when: code accepts user/external input]_
|
|
81
|
+
- Trimming values where whitespace is significant (API keys, tokens, passwords, base64) — only trim identifiers/names
|
|
82
|
+
- Endpoints accepting unbounded arrays/collections without upper limits — enforce max size or move to background jobs
|
|
75
83
|
|
|
76
|
-
|
|
77
|
-
- Upsert/`ON CONFLICT UPDATE` updating only a subset of exported fields — replicas diverge. Document deliberately omitted fields
|
|
78
|
-
- Pagination using `COUNT(*)` (full table scan) instead of `limit + 1`; endpoints missing `next` token input/output; hard-capped limits silently truncating results
|
|
84
|
+
## Tier 3 — Domain-Specific (Check Only When File Type Matches)
|
|
79
85
|
|
|
80
|
-
**SQL & database**
|
|
86
|
+
**SQL & database** _[applies when: code contains SQL, ORM queries, or migration files]_
|
|
81
87
|
- Parameterized query placeholder indices must match parameter array positions — especially with shared param builders or computed indices
|
|
82
88
|
- Database triggers clobbering explicitly-provided values; auto-incrementing columns that only increment on INSERT, not UPDATE
|
|
83
89
|
- Full-text search with strict parsers (`to_tsquery`) on user input — use `websearch_to_tsquery` or `plainto_tsquery`
|
|
84
90
|
- Dead queries (results never read), N+1 patterns inside transactions, O(n²) algorithms on growing data
|
|
85
91
|
- `CREATE TABLE IF NOT EXISTS` as sole migration strategy — won't add columns/indexes on upgrade. Use `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` or a migration framework
|
|
86
92
|
- Functions/extensions requiring specific database versions without verification
|
|
93
|
+
- Migrations that lock tables for extended periods (ADD COLUMN with default on large tables, CREATE INDEX without CONCURRENTLY) — use concurrent operations or batched backfills
|
|
94
|
+
- Missing rollback/down migration or untested rollback path
|
|
87
95
|
|
|
88
|
-
**
|
|
96
|
+
**Sync & replication** _[applies when: code uses pagination, batch APIs, or data sync]_
|
|
97
|
+
- Upsert/`ON CONFLICT UPDATE` updating only a subset of exported fields — replicas diverge. Document deliberately omitted fields
|
|
98
|
+
- Pagination using `COUNT(*)` (full table scan) instead of `limit + 1`; endpoints missing `next` token input/output; hard-capped limits silently truncating results
|
|
99
|
+
- Batch/paginated API calls (database batch gets, external service calls) that don't handle partial results — unprocessed items, continuation tokens, or rate-limited responses silently dropped. Add retry loops with backoff for unprocessed items
|
|
100
|
+
- Retry loops without backoff or max-attempt limits — tight loops under throttling extend latency indefinitely. Use bounded retries with exponential backoff/jitter
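
The last two items above combine naturally into one loop. The sketch below is synchronous to stay short, which is an assumption: `send` and `wait` are hypothetical callbacks, and production code would more likely await a timer and an async client call.

```typescript
// Exponential backoff with full jitter: a random delay in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

interface BatchResult<T> { processed: T[]; unprocessed: T[] }

// Resend unprocessed items until none remain or attempts run out.
function drainBatch<T>(
  items: T[],
  send: (batch: T[]) => BatchResult<T>,
  wait: (ms: number) => void,
  maxAttempts = 5,
): BatchResult<T> {
  const processed: T[] = [];
  let pending = items;
  for (let attempt = 0; attempt < maxAttempts && pending.length > 0; attempt++) {
    // Back off before every retry, but not before the first attempt.
    if (attempt > 0) wait(backoffDelayMs(attempt - 1, 100, 5_000));
    const result = send(pending);
    processed.push(...result.processed);
    pending = result.unprocessed;
  }
  // Leftovers are returned to the caller instead of being silently dropped.
  return { processed, unprocessed: pending };
}
```

Returning the unprocessed remainder lets the caller decide whether to fail the request or queue the items for later.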
|
|
101
|
+
|
|
102
|
+
**Lazy initialization & module loading** _[applies when: code uses dynamic imports, lazy singletons, or bootstrap sequences]_
|
|
89
103
|
- Cached state getters returning null before initialization — provide async initializer or ensure-style function
|
|
90
104
|
- Module-level side effects (file reads, SDK init) without error handling — corrupted files crash the process on import
|
|
91
105
|
- Bootstrap/resilience code that imports the dependencies it's meant to install — restructure so installation precedes resolution
|
|
92
106
|
- Re-exporting from heavy modules defeats lazy loading — use lightweight shared modules
|
|
93
107
|
|
|
94
|
-
**Data format portability**
|
|
108
|
+
**Data format portability** _[applies when: code crosses serialization boundaries — JSON, DB, IPC]_
|
|
95
109
|
- Values crossing serialization boundaries may change format (arrays in JSON vs string literals in DB) — convert consistently
|
|
110
|
+
- Reads issued immediately after writes to an eventually consistent store (database scans, replica reads, cache refreshes) may return stale data — use consistent-read options, compute from in-memory state after confirmed writes, or document the eventual-consistency window
|
|
96
111
|
- BIGINT values parsed into JavaScript `Number` — precision lost past `MAX_SAFE_INTEGER`. Use strings or `BigInt`
|
|
112
|
+
- Data model key/index design that doesn't support required query access patterns — e.g., claiming "recent" ordering but using non-time-sortable keys (random UUIDs, user IDs). Verify sort keys and indexes can serve the queries the code performs without full-partition scans and in-memory sorting
|
|
97
113
|
|
|
98
|
-
**Shell & portability**
|
|
114
|
+
**Shell & portability** _[applies when: code spawns subprocesses, uses shell scripts, or builds CLI tools]_
|
|
99
115
|
- Subprocess calls under `set -e` abort on failure; non-critical writes fail on broken pipes — use `|| true` for non-critical output
|
|
100
116
|
- Detached child processes with piped stdio — parent exit causes SIGPIPE. Redirect to log files or use `'ignore'`
|
|
101
117
|
- Platform-specific assumptions — hardcoded shell interpreters, `path.join()` backslashes breaking ESM imports. Use `pathToFileURL()` for dynamic imports
|
|
102
118
|
|
|
103
|
-
**
|
|
104
|
-
-
|
|
105
|
-
-
|
|
106
|
-
|
|
107
|
-
|
|
108
|
-
-
|
|
119
|
+
**Search & navigation** _[applies when: code implements search results or deep-linking]_
|
|
120
|
+
- Search results linking to generic list pages instead of deep-linking to the specific record
|
|
121
|
+
- Search/query code hardcoding one backend's implementation when the system supports multiple — verify option/parameter names are mapped between backends
|
|
122
|
+
|
|
123
|
+
**Destructive UI operations** _[applies when: code adds delete, reset, revoke, or other destructive actions]_
|
|
124
|
+
- Destructive actions (delete, reset, revoke) in the UI without a confirmation step — compare with how similar destructive operations elsewhere in the codebase handle confirmation
|
|
109
125
|
|
|
110
|
-
**Accessibility**
|
|
126
|
+
**Accessibility** _[applies when: code modifies UI components or interactive elements]_
|
|
111
127
|
- Interactive elements missing accessible names, roles, or ARIA states — including disabled interactions without `aria-disabled`
|
|
112
128
|
- Custom toggle/switch UI built from non-semantic elements instead of native inputs
|
|
113
129
|
|
|
130
|
+
## Tier 4 — Always Check (Quality, Conventions, AI-Generated Code)
|
|
131
|
+
|
|
132
|
+
**Intent vs implementation**
|
|
133
|
+
- Labels, comments, status messages, or documentation that describe behavior the code doesn't implement — e.g., a map named "renamed" that only deletes, or an action labeled "migrated" that never creates the target
|
|
134
|
+
- Inline code examples, command templates, and query snippets that aren't syntactically valid as written — template placeholders must use a consistent format, queries must use correct syntax for their language (e.g., single `{}` in GraphQL, not `{{}}`)
|
|
135
|
+
- Cross-references between files (identifiers, parameter names, format conventions, operational thresholds) that disagree — when one reference changes, trace all other files that reference the same entity and update them
|
|
136
|
+
- Responsibility relocated from one module to another (e.g., writes moved from handler to middleware) without updating all consumers that depended on the old location's timing, return value, or side effects — trace callers that relied on the synchronous or co-located behavior and verify they still work with the new execution point. Remove dead code left behind at the old location
|
|
137
|
+
- Sequential instructions or steps whose ordering doesn't match the required execution order — readers following in order will perform actions at the wrong time (e.g., "record X" in step 2 when X must be captured before step 1's action)
|
|
138
|
+
- Sequential numbering (section numbers, step numbers) with gaps or jumps after edits — verify continuity
|
|
139
|
+
- Completion markers, success flags, or status files written before the operation they attest to finishes — consumers see false success if the operation fails after the write
|
|
140
|
+
- Existence checks (directory exists, file exists, module resolves) used as proof of correct/complete installation — a directory can exist but be empty, a file can exist with invalid contents. Verify the specific resource the consumer needs
|
|
141
|
+
- Lookups that check only one scope when multiple exist — e.g., checking local git branches but not remote, checking in-memory cache but not persistent store. Trace all locations where the resource could exist and check each
|
|
142
|
+
- Tracking/checkpoint files that default to empty on parse failure — causes full re-execution. Fail loudly instead
|
|
143
|
+
- Registering references to resources without verifying the resource exists — dangling references after failed operations
|
|
144
|
+
|
|
145
|
+
**AI-generated code quality** _(Claude 4.6 specific failure modes)_
|
|
146
|
+
- Over-engineering: new abstractions, wrapper functions, helper files, or utility modules that serve only one call site — inline the logic instead
|
|
147
|
+
- Feature flags, configuration options, or extension points with only one possible value or consumer
|
|
148
|
+
- Commit messages or comments claiming a fix while the underlying bug remains — verify each claimed fix actually addresses the root cause, not just the symptom
|
|
149
|
+
- Functions containing placeholder comments (`// TODO`, `// FIXME`, `// implement later`) or stub implementations presented as complete
|
|
150
|
+
- Unnecessary defensive code: error handling for scenarios that provably cannot occur given the call site, fallbacks for internal functions that always return valid data
|
|
151
|
+
|
|
114
152
|
**Configuration & hardcoding**
|
|
115
|
-
- Hardcoded values when a config field or env var already exists; dead config fields nothing consumes; unused function parameters creating false API contracts
|
|
116
|
-
- Duplicated config/constants/utilities across modules — extract to shared module to prevent drift
|
|
153
|
+
- Hardcoded values when a config field or env var already exists; dead config fields nothing consumes; unused function parameters creating false API contracts; resource names (table names, queue names, bucket names) hardcoded without accounting for environment prefixes, so lookups keyed by the unprefixed name on response objects silently return undefined
|
|
154
|
+
- Duplicated config/constants/utilities/helper functions across modules — extract to shared module to prevent drift. Watch for behavioral inconsistencies between copies (e.g., one returns `'unknown'` for null while another returns `'never'`)
|
|
117
155
|
- CI pipelines installing without lockfile pinning or version constraints — non-deterministic builds
|
|
156
|
+
- Production code paths with no structured logging at entry/exit points
|
|
157
|
+
- Error logs missing reproduction context (request ID, input parameters)
|
|
158
|
+
- Async flows without correlation ID propagation
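
As one way to picture correlation ID propagation through an async flow, here is a minimal sketch using Python's `contextvars`. The names `correlation_id`, `CorrelationFilter`, `fetch_profile`, and `handle_request` are hypothetical and chosen only for illustration; a real service would set the ID from an incoming request header or similar.

```python
import asyncio
import contextvars
import logging

# Hypothetical context variable holding the current request's correlation ID.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Copy the current correlation ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


async def fetch_profile(user_id: int) -> str:
    await asyncio.sleep(0)  # stand-in for real I/O
    # The ID set by the caller is still visible after the await.
    return f"{correlation_id.get()}:profile:{user_id}"


async def handle_request(request_id: str, user_id: int) -> str:
    correlation_id.set(request_id)
    return await fetch_profile(user_id)


async def main() -> list[str]:
    # gather() wraps each coroutine in a task with its own context copy,
    # so the two request IDs do not leak into each other.
    return await asyncio.gather(
        handle_request("req-1", 10),
        handle_request("req-2", 20),
    )
```

Attaching `CorrelationFilter` to a handler and including `%(correlation_id)s` in its format string would then put the ID on every log line without threading it through function arguments.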
|
|
159
|
+
|
|
160
|
+
**Supply chain & dependency health**
|
|
161
|
+
- Lockfile committed and CI installs with `--frozen-lockfile` or the package manager's equivalent (e.g., `npm ci`); no lockfile drift from manifest
|
|
162
|
+
- `npm audit` / `cargo audit` / `pip-audit` has no unaddressed HIGH/CRITICAL vulnerabilities
|
|
163
|
+
- No `postinstall` scripts from untrusted packages executing arbitrary code without review
|
|
164
|
+
- Overly permissive version ranges (`*`, `>=`) on deps with known breaking-change history
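
The version-range item can be approximated with a small scan of a `package.json` manifest. This is a rough heuristic sketch, not a semver parser: `permissive_ranges` is a hypothetical helper, and it only flags a bare `*` or a `>=` range with no upper bound.

```python
import json


def permissive_ranges(manifest_text: str) -> dict[str, str]:
    """Return dependencies whose range is `*` or an unbounded `>=`."""
    manifest = json.loads(manifest_text)
    flagged: dict[str, str] = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            spec = spec.strip()
            # A `>=` range with an explicit `<` upper bound is left alone.
            unbounded = spec.startswith(">=") and "<" not in spec
            if spec == "*" or unbounded:
                flagged[name] = spec
    return flagged
```

Whether a flagged range is actually a problem still depends on the dependency's breaking-change history, which a scan like this cannot judge.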
|
|
165
|
+
|
|
166
|
+
**Test coverage**
|
|
167
|
+
- New logic/schemas/services without corresponding tests when similar existing code has tests
|
|
168
|
+
- New error paths untestable because services throw generic errors instead of typed ones
|
|
169
|
+
- Tests re-implementing logic under test instead of importing real exports — pass even when real code regresses
|
|
170
|
+
- Tests depending on real wall-clock time or external dependencies when testing logic — use fake timers and mocks
|
|
171
|
+
- Missing tests for trust-boundary enforcement — submit tampered values, verify server ignores them
|
|
172
|
+
- Tests that pass but don't cover the changed code paths — passing unrelated tests is not validation
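
For the wall-clock item, one common approach is to inject the clock so tests can control time. The sketch below assumes a hypothetical `is_expired` function and a hand-rolled `FakeClock`; a project using a test framework with built-in fake timers would use those instead.

```python
import time
from typing import Callable


def is_expired(created_at: float, ttl_seconds: float,
               now: Callable[[], float] = time.time) -> bool:
    """Return True once ttl_seconds have elapsed since created_at."""
    return now() - created_at >= ttl_seconds


class FakeClock:
    """Deterministic clock that only moves when the test advances it."""

    def __init__(self, start: float = 0.0) -> None:
        self.current = start

    def __call__(self) -> float:
        return self.current

    def advance(self, seconds: float) -> None:
        self.current += seconds


def test_expiry_boundary() -> None:
    clock = FakeClock(start=1_000.0)
    created = clock()
    assert not is_expired(created, ttl_seconds=60, now=clock)
    clock.advance(59)
    assert not is_expired(created, ttl_seconds=60, now=clock)
    clock.advance(1)  # exactly at the TTL boundary
    assert is_expired(created, ttl_seconds=60, now=clock)
```

The test calls the real `is_expired` rather than re-implementing its comparison, so a regression in the boundary condition would make it fail.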
|
|
118
173
|
|
|
119
174
|
**Style & conventions**
|
|
120
175
|
- Naming and patterns consistent with the rest of the codebase
|
|
@@ -0,0 +1,61 @@
|
|
|
1
|
+
## Remediation Agent Template
|
|
2
|
+
|
|
3
|
+
Use this template when spawning remediation agents in Phase 3c. Replace all `{PLACEHOLDERS}` with actual values.
|
|
4
|
+
|
|
5
|
+
```
|
|
6
|
+
<context>
|
|
7
|
+
Project type: {PROJECT_TYPE}
|
|
8
|
+
Build command: {BUILD_CMD}
|
|
9
|
+
Test command: {TEST_CMD}
|
|
10
|
+
Working directory: {WORKTREE_DIR} (this is a git worktree — all work happens here)
|
|
11
|
+
Foundation utilities available (if created):
|
|
12
|
+
{FOUNDATION_UTILS}
|
|
13
|
+
</context>
|
|
14
|
+
|
|
15
|
+
<findings>
|
|
16
|
+
{FINDINGS}
|
|
17
|
+
</findings>
|
|
18
|
+
|
|
19
|
+
<instructions>
|
|
20
|
+
You are {AGENT_NAME} on team better-{DATE}.
|
|
21
|
+
|
|
22
|
+
Your task: Fix all {CATEGORY} findings listed above.
|
|
23
|
+
|
|
24
|
+
FINDING VALIDATION — verify before fixing:
|
|
25
|
+
- Before fixing each finding, READ the file and at least 30 lines of surrounding
|
|
26
|
+
context to confirm the issue is genuine.
|
|
27
|
+
- Check whether the flagged code is already correct (e.g., a Promise chain that
|
|
28
|
+
IS properly awaited downstream, a value that IS validated earlier in the function,
|
|
29
|
+
a pattern that IS idiomatic for the framework).
|
|
30
|
+
- If the existing code is already correct, SKIP the fix and report it as a
|
|
31
|
+
false positive with a brief explanation of why the original code is fine.
|
|
32
|
+
- Do not make changes that are semantically equivalent to the original code
|
|
33
|
+
(e.g., wrapping a .then() chain in an async IIFE adds noise without fixing anything).
|
|
34
|
+
</instructions>
|
|
35
|
+
|
|
36
|
+
<guardrails>
|
|
37
|
+
- Only use APIs/functions verified to exist by reading source files. If a fix
|
|
38
|
+
requires an API you haven't confirmed, read the module's exports first.
|
|
39
|
+
- Fix with the minimum change required. Do not introduce new abstractions or helpers
|
|
40
|
+
unless the finding specifically calls for it. A one-line fix beats a refactored module.
|
|
41
|
+
- If a git/build/file-read command fails, retry once after verifying the working
|
|
42
|
+
directory and path. If it fails again, report the error and move to the next finding.
|
|
43
|
+
</guardrails>
|
|
44
|
+
|
|
45
|
+
<commit_strategy>
|
|
46
|
+
Goal: each commit builds independently and contains one logical group of
|
|
47
|
+
related fixes. Use conventional prefixes (fix:, refactor:, feat:, security:).
|
|
48
|
+
Stage specific files only (`git -C {WORKTREE_DIR} add <specific files>` — never
|
|
49
|
+
`git add -A` or `git add .`). Run {BUILD_CMD} in {WORKTREE_DIR} before committing.
|
|
50
|
+
No co-author annotations or version bumps.
|
|
51
|
+
</commit_strategy>
|
|
52
|
+
|
|
53
|
+
CONFLICT AVOIDANCE:
|
|
54
|
+
- Only modify files listed in your assigned findings
|
|
55
|
+
- If you need to modify a file assigned to another agent, skip that change and report it
|
|
56
|
+
|
|
57
|
+
After all fixes:
|
|
58
|
+
- Ensure all changes are committed (no uncommitted work)
|
|
59
|
+
- Mark your task as completed via TaskUpdate
|
|
60
|
+
- Report: commits made, files modified, findings addressed, any skipped issues
|
|
61
|
+
```
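
Since the template requires replacing all `{PLACEHOLDERS}`, a caller may want to fail fast when one is left unfilled. The following is a minimal sketch under that assumption; `fill_template` and the `PLACEHOLDER` pattern are hypothetical and not part of the package. It uses a regex rather than `str.format` so that other braces or angle-bracket text in the template are left untouched.

```python
import re

# Matches {UPPER_SNAKE_CASE} tokens such as {BUILD_CMD} or {WORKTREE_DIR}.
PLACEHOLDER = re.compile(r"\{([A-Z][A-Z0-9_]*)\}")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Substitute every {NAME} placeholder, failing if any value is missing."""
    needed = {match.group(1) for match in PLACEHOLDER.finditer(template)}
    missing = sorted(needed - set(values))
    if missing:
        raise KeyError(f"no value for placeholders: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)
```

A call such as `fill_template(text, {"AGENT_NAME": "agent-1", ...})` would then raise before any agent is spawned with a literal `{FINDINGS}` in its prompt.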
|
package/package.json
CHANGED