slash-do 1.4.0 → 1.4.2
- package/commands/do/better.md +92 -52
- package/commands/do/pr.md +0 -2
- package/commands/do/review.md +24 -0
- package/lib/code-review-checklist.md +82 -105
- package/lib/copilot-review-loop.md +82 -41
- package/package.json +1 -1
package/commands/do/better.md
CHANGED
|
@@ -465,6 +465,8 @@ After creating all PRs, verify CI passes on each one:
|
|
|
465
465
|
|
|
466
466
|
Maximum 5 iterations per PR to prevent infinite loops.
|
|
467
467
|
|
|
468
|
+
**IMPORTANT — Sub-agent delegation**: To prevent context exhaustion on long review cycles with multiple PRs, delegate each PR's review loop to a **separate general-purpose sub-agent** via the Agent tool. Launch sub-agents in parallel (one per PR). Each sub-agent runs the full loop (request → wait → check → fix → re-request) autonomously and returns only the final status.
|
|
469
|
+
|
|
468
470
|
### 6.0: Verify browser authentication
|
|
469
471
|
|
|
470
472
|
If `BROWSER_AUTHENTICATED` is not true (e.g., Phase 0e was skipped or failed):
|
|
@@ -472,71 +474,109 @@ If `BROWSER_AUTHENTICATED` is not true (e.g., Phase 0e was skipped or failed):
|
|
|
472
474
|
2. Check for user avatar/menu
|
|
473
475
|
3. If not logged in: navigate to `https://github.com/login`, inform the user **"Please log in to GitHub in the browser. I'll wait for you to confirm."**, and use `AskUserQuestion` to wait
|
|
474
476
|
|
|
475
|
-
### 6.1:
|
|
476
|
-
|
|
477
|
-
For each PR:
|
|
477
|
+
### 6.1: Determine review request method
|
|
478
478
|
|
|
479
|
-
**Try the API first
|
|
479
|
+
**Try the API first** on any one PR:
|
|
480
480
|
```bash
|
|
481
|
-
gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/requested_reviewers
|
|
481
|
+
gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/requested_reviewers \
|
|
482
|
+
-f 'reviewers[]=copilot-pull-request-reviewer[bot]'
|
|
482
483
|
```
|
|
483
484
|
|
|
484
|
-
If this returns 422 ("not a collaborator"),
|
|
485
|
-
1. Navigate to `{PR_URL}`
|
|
486
|
-
2. Click the "Reviewers" gear button in the PR sidebar
|
|
487
|
-
3. In the dropdown menu, click the Copilot `menuitemradio` option (not the "Request" button which may be obscured by the dropdown header)
|
|
488
|
-
4. Verify the sidebar shows "Awaiting requested review from Copilot"
|
|
485
|
+
If this returns 422 ("not a collaborator"), record `REVIEW_METHOD=playwright`. Otherwise record `REVIEW_METHOD=api`.
|
|
489
486
|
|
|
490
|
-
### 6.2:
|
|
487
|
+
### 6.2: Launch parallel sub-agents (one per PR)
|
|
491
488
|
|
|
492
|
-
|
|
489
|
+
For each PR, spawn a general-purpose sub-agent with:
|
|
493
490
|
|
|
494
|
-
For each PR, poll using GraphQL to check for a new Copilot review:
|
|
495
|
-
```bash
|
|
496
|
-
echo '{"query":"{ repository(owner: \"OWNER\", name: \"REPO\") { pullRequest(number: PR_NUM) { reviews(last: 5) { nodes { state body author { login } submittedAt } } reviewThreads(first: 100) { nodes { id isResolved comments(first: 3) { nodes { body path line author { login } } } } } } } }"}' | gh api graphql --input -
|
|
497
491
|
```
|
|
492
|
+
You are a Copilot review loop agent for PR {PR_NUMBER}.
|
|
498
493
|
|
|
499
|
-
|
|
500
|
-
|
|
501
|
-
|
|
502
|
-
|
|
503
|
-
|
|
504
|
-
|
|
505
|
-
|
|
506
|
-
|
|
507
|
-
|
|
494
|
+
Repository: {OWNER}/{REPO}
|
|
495
|
+
Branch: better/{CATEGORY_SLUG}
|
|
496
|
+
Build command: {BUILD_CMD}
|
|
497
|
+
Review request method: {REVIEW_METHOD}
|
|
498
|
+
Max iterations: 5
|
|
499
|
+
|
|
500
|
+
DECREASING TIMEOUT SCHEDULE (shorter than single-PR review since multiple
|
|
501
|
+
PRs are reviewed in parallel — see do:rpr for single-PR dynamic timing):
|
|
502
|
+
- Iteration 1: max wait 5 minutes
|
|
503
|
+
- Iteration 2: max wait 4 minutes
|
|
504
|
+
- Iteration 3: max wait 3 minutes
|
|
505
|
+
- Iteration 4: max wait 2 minutes
|
|
506
|
+
- Iteration 5+: max wait 1 minute
|
|
507
|
+
Poll interval: 30 seconds for all iterations.
|
|
508
|
+
|
|
509
|
+
Run the following loop until Copilot returns zero new comments or you hit
|
|
510
|
+
the max iteration limit:
|
|
511
|
+
|
|
512
|
+
1. CAPTURE the latest Copilot review timestamp, then REQUEST a new review:
|
|
513
|
+
- First, capture the latest Copilot review timestamp via GraphQL:
|
|
514
|
+
echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviews(last: 20) { nodes { author { login } submittedAt } } } } }"}' | gh api graphql --input -
|
|
515
|
+
- Find the most recent submittedAt where author.login is
|
|
516
|
+
copilot-pull-request-reviewer[bot] and record as LAST_COPILOT_SUBMITTED_AT.
|
|
517
|
+
- If no prior Copilot review exists, record LAST_COPILOT_SUBMITTED_AT=NONE
|
|
518
|
+
and treat the next Copilot review as NEW regardless of timestamp.
|
|
519
|
+
- Then REQUEST:
|
|
520
|
+
If REVIEW_METHOD is "api":
|
|
521
|
+
gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/requested_reviewers \
|
|
522
|
+
-f 'reviewers[]=copilot-pull-request-reviewer[bot]'
|
|
523
|
+
If REVIEW_METHOD is "playwright":
|
|
524
|
+
Navigate to the PR URL, click the "Reviewers" gear button, click the
|
|
525
|
+
Copilot menuitemradio option, verify sidebar shows "Awaiting requested
|
|
526
|
+
review from Copilot"
|
|
527
|
+
|
|
528
|
+
2. WAIT for the review (BLOCKING):
|
|
529
|
+
- Poll using stdin JSON piping (avoid shell-escaping issues):
|
|
530
|
+
echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviews(last: 5) { totalCount nodes { state body author { login } submittedAt } } reviewThreads(first: 100) { nodes { id isResolved comments(first: 3) { nodes { body path line author { login } } } } } } } }"}' | gh api graphql --input -
|
|
531
|
+
- Complete when a new copilot-pull-request-reviewer[bot] review appears
|
|
532
|
+
with submittedAt after LAST_COPILOT_SUBMITTED_AT captured in step 1
|
|
533
|
+
(or, if LAST_COPILOT_SUBMITTED_AT=NONE, when the first
|
|
534
|
+
copilot-pull-request-reviewer[bot] review for this loop appears)
|
|
535
|
+
- Use the DECREASING TIMEOUT for the current iteration number
|
|
536
|
+
- Error detection: if review body contains "Copilot encountered an error"
|
|
537
|
+
or "unable to review", re-request and resume. Max 3 error retries.
|
|
538
|
+
- If no review after max wait, report timeout and exit
|
|
539
|
+
|
|
540
|
+
3. CHECK for unresolved threads:
|
|
541
|
+
Fetch threads via stdin JSON piping:
|
|
542
|
+
echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviewThreads(first: 100) { nodes { id isResolved comments(first: 10) { nodes { body path line author { login } } } } } } } }"}' | gh api graphql --input -
|
|
543
|
+
- Verify review was successful (no error text in body)
|
|
544
|
+
- If zero comments / no unresolved threads: report success and exit
|
|
545
|
+
- If unresolved threads exist: proceed to step 4
|
|
546
|
+
|
|
547
|
+
4. FIX all unresolved threads:
|
|
548
|
+
For each unresolved thread:
|
|
549
|
+
- Read the referenced file and understand the feedback
|
|
550
|
+
- Evaluate: valid feedback → make the fix; informational/false positive →
|
|
551
|
+
resolve without changes
|
|
552
|
+
- If fixing:
|
|
553
|
+
git checkout better/{CATEGORY_SLUG}
|
|
554
|
+
# make changes
|
|
555
|
+
git add <specific files>
|
|
556
|
+
git commit -m "address Copilot review feedback"
|
|
557
|
+
git push
|
|
558
|
+
- Resolve thread via stdin JSON piping:
|
|
559
|
+
echo '{"query":"mutation { resolveReviewThread(input: {threadId: \"{THREAD_ID}\"}) { thread { id isResolved } } }"}' | gh api graphql --input -
|
|
560
|
+
- After all threads resolved, increment iteration and go back to step 1
|
|
561
|
+
|
|
562
|
+
When done, report back:
|
|
563
|
+
- Final status: clean / max-iterations-reached / timeout / error
|
|
564
|
+
- Total iterations completed
|
|
565
|
+
- List of commits made (if any)
|
|
566
|
+
- Any unresolved threads remaining
|
|
508
567
|
```
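Steps 1 and 2 of the prompt above depend on comparing `submittedAt` timestamps. A minimal sketch of that comparison, assuming the `reviews.nodes` array has already been parsed out of the GraphQL response. The helper names are hypothetical, and the reviewer login is copied verbatim from the prompt text:

```javascript
// Reviewer login as written in the sub-agent prompt (not verified against GitHub).
const COPILOT_LOGIN = "copilot-pull-request-reviewer[bot]";

// Returns the most recent submittedAt among Copilot reviews, or "NONE".
function latestCopilotSubmittedAt(reviewNodes) {
  const times = reviewNodes
    .filter((r) => r.author && r.author.login === COPILOT_LOGIN && r.submittedAt)
    .map((r) => r.submittedAt)
    .sort((a, b) => Date.parse(a) - Date.parse(b));
  return times.length > 0 ? times[times.length - 1] : "NONE";
}

// True once a Copilot review newer than the captured baseline exists.
function hasNewCopilotReview(reviewNodes, lastSubmittedAt) {
  const latest = latestCopilotSubmittedAt(reviewNodes);
  if (latest === "NONE") return false; // nothing from Copilot yet
  if (lastSubmittedAt === "NONE") return true; // first review counts as new
  return Date.parse(latest) > Date.parse(lastSubmittedAt);
}
```

Capturing the baseline before the request, as step 1 requires, is what keeps an older review from being mistaken for the new one.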
|
|
509
|
-
Save to `/tmp/better_threads_{PR_NUMBER}.json` for parsing.
|
|
510
|
-
|
|
511
|
-
- If **no unresolved threads** → mark PR as ready to merge
|
|
512
|
-
- If **unresolved threads exist** → proceed to 6.4 (fix)
|
|
513
568
|
|
|
514
|
-
|
|
569
|
+
Launch all PR sub-agents in parallel. Wait for all to complete.
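The decreasing timeout schedule in the sub-agent prompt reduces to a small piece of arithmetic. The sketch below is one way to express it; the function names are hypothetical, while the 30-second interval and the 5/4/3/2/1-minute windows come from the prompt:

```javascript
const POLL_INTERVAL_SECONDS = 30;

// Max wait in minutes for a 1-based iteration: 5, 4, 3, 2, then 1 for 5 and up.
function maxWaitMinutes(iteration) {
  if (!Number.isInteger(iteration) || iteration < 1) {
    throw new RangeError("iteration must be a positive integer");
  }
  return Math.max(1, 6 - iteration);
}

// Number of polls that fit into the window at the fixed interval.
function maxPolls(iteration) {
  return (maxWaitMinutes(iteration) * 60) / POLL_INTERVAL_SECONDS;
}

console.log(maxWaitMinutes(1), maxPolls(1)); // 5 10
console.log(maxWaitMinutes(5), maxPolls(5)); // 1 2
```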
|
|
515
570
|
|
|
516
|
-
|
|
517
|
-
1. Read the referenced file and understand the feedback
|
|
518
|
-
2. Evaluate: is the feedback valid? Some Copilot comments are informational or about pre-existing patterns.
|
|
519
|
-
- **Valid feedback**: make the code fix
|
|
520
|
-
- **Informational/false positive**: resolve the thread without changes
|
|
521
|
-
3. If fixing:
|
|
522
|
-
```bash
|
|
523
|
-
git checkout better/{CATEGORY_SLUG}
|
|
524
|
-
# make changes
|
|
525
|
-
git add <specific files>
|
|
526
|
-
git commit -m "address review: {summary of change}"
|
|
527
|
-
git push
|
|
528
|
-
```
|
|
529
|
-
4. Resolve the thread via GraphQL mutation:
|
|
530
|
-
```bash
|
|
531
|
-
echo '{"query":"mutation { resolveReviewThread(input: {threadId: \"{THREAD_ID}\"}) { thread { id isResolved } } }"}' | gh api graphql --input -
|
|
532
|
-
```
|
|
571
|
+
### 6.3: Handle sub-agent results
|
|
533
572
|
|
|
534
|
-
|
|
535
|
-
|
|
536
|
-
|
|
537
|
-
|
|
573
|
+
For each sub-agent result:
|
|
574
|
+
- **clean**: mark PR as ready to merge
|
|
575
|
+
- **timeout**: ask the user whether to continue waiting, re-request, or skip
|
|
576
|
+
- **max-iterations-reached**: inform the user "Reached max review iterations (5) on PR #{number}. Remaining issues may need manual review."
|
|
577
|
+
- **error**: inform the user and ask whether to retry or skip
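The four outcomes listed above can be read as a dispatch table. A rough sketch, assuming each sub-agent returns an object with a `status` string and a `prNumber`; that shape and the action labels are hypothetical and not part of the package:

```javascript
// Maps a sub-agent's final status to the orchestrator's next step.
function nextAction(result) {
  switch (result.status) {
    case "clean":
      return { action: "mark-ready-to-merge", askUser: false };
    case "timeout":
      return { action: "ask-continue-rerequest-or-skip", askUser: true };
    case "max-iterations-reached":
      return {
        action: "inform-user",
        askUser: false,
        message: `Reached max review iterations (5) on PR #${result.prNumber}. Remaining issues may need manual review.`,
      };
    case "error":
      return { action: "ask-retry-or-skip", askUser: true };
    default:
      // An unknown status should fail loudly rather than be treated as clean.
      throw new Error(`Unknown sub-agent status: ${result.status}`);
  }
}
```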
|
|
538
578
|
|
|
539
|
-
### 6.
|
|
579
|
+
### 6.4: Merge
|
|
540
580
|
|
|
541
581
|
For each PR that has passed CI and review (in dependency order if applicable):
|
|
542
582
|
```bash
|
|
@@ -599,7 +639,7 @@ If merge fails (e.g., branch protection, merge conflicts from a prior PR):
|
|
|
599
639
|
- **Push failure**: `git pull --rebase --autostash` then retry push
|
|
600
640
|
- **CI failure on PR**: investigate logs, fix in a new commit, push, re-check (max 3 attempts per PR)
|
|
601
641
|
- **Cross-PR dependency breakage**: add backward-compatible re-exports or move shared files to the PR that creates them
|
|
602
|
-
- **Copilot timeout** (review not received within
|
|
642
|
+
- **Copilot timeout** (review not received within decreasing timeout window): inform user, offer to merge without review approval or wait longer
|
|
603
643
|
- **Copilot review loop exceeds 5 iterations per PR**: stop iterating on that PR, inform user, proceed to merge
|
|
604
644
|
- **Existing worktree found at startup**: ask user — resume (reuse worktree) or cleanup (remove and start fresh)
|
|
605
645
|
- **No findings above LOW**: skip Phases 3-7, print "No actionable findings" with the LOW summary
|
package/commands/do/pr.md
CHANGED
|
@@ -34,8 +34,6 @@ Before creating the PR, perform a thorough self-review. Read each changed file
|
|
|
34
34
|
- Create a PR from `{current_branch}` to `{default_branch}`
|
|
35
35
|
- Create a rich PR description
|
|
36
36
|
|
|
37
|
-
**IMPORTANT**: During each fix cycle in the Copilot review loop below, after fixing all review comments and before pushing, also bump the patch version (`npm version patch --no-git-tag-version` or equivalent) and commit the version bump.
|
|
38
|
-
|
|
39
37
|
!`cat ~/.claude/lib/copilot-review-loop.md`
|
|
40
38
|
|
|
41
39
|
**Report the final status** to the user including PR URL and review outcome.
|
package/commands/do/review.md
CHANGED
|
@@ -91,6 +91,30 @@ Check every file against this checklist:
|
|
|
91
91
|
**Guard-before-cache ordering**
|
|
92
92
|
- If a handler performs a pre-flight guard check (rate limit, quota, feature flag) before a cache lookup or short-circuit path, verify the guard doesn't block operations that would be served from cache without touching the guarded resource — restructure so cache hits bypass the guard
|
|
93
93
|
|
|
94
|
+
**Sanitization/validation coverage**
|
|
95
|
+
- If the PR introduces a new validation or sanitization function for a data field, trace every code path that writes to that field (create, update, import, sync, rename) — verify they all use the same sanitization. Partial application is the #1 way invalid data re-enters through an unguarded path
|
|
96
|
+
|
|
97
|
+
**Bootstrap/initialization ordering**
|
|
98
|
+
- If the PR adds resilience or self-healing code (dependency installers, auto-repair, migration runners), trace the execution order: does the main code path resolve or import the dependencies BEFORE the resilience code runs? If so, the bootstrapper never executes when it's needed most — restructure so verification/installation precedes resolution
|
|
99
|
+
|
|
100
|
+
**Lock/flag exit-path completeness**
|
|
101
|
+
- If a function sets a shared flag or lock (in-progress, mutex, status marker), trace every exit path — early returns, error catches, platform-specific guards, and normal completion — to verify the flag is cleared. A missed path leaves the system permanently locked
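The exit-path rule above is commonly satisfied with `try`/`finally`, which runs on normal completion, early return, and thrown errors alike. A minimal sketch with a hypothetical in-memory flag (a real system might hold the flag in a file or database row):

```javascript
// Hypothetical shared flag guarding a non-reentrant operation.
const state = { syncInProgress: false };

function runExclusive(task) {
  if (state.syncInProgress) {
    return { ran: false }; // someone else holds the flag; it is not ours to clear
  }
  state.syncInProgress = true;
  try {
    return { ran: true, value: task() };
  } finally {
    state.syncInProgress = false; // cleared on return and on throw
  }
}
```

Note that the early return for the already-locked case sits before the flag is set, so it is the only exit path that may skip the cleanup.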
|
|
102
|
+
|
|
103
|
+
**Operation-marker ordering**
|
|
104
|
+
- If the PR writes completion markers, success flags, or status files, verify they are written AFTER the operation they attest to, not before. If the operation can fail after the marker write, consumers see false success. Also check that marker-dependent startup logic validates the marker's contents rather than treating presence as unconditional success
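To illustrate the marker-ordering rule, here is a sketch in which the marker is written only after the operation returns, and the reader validates the marker's contents instead of its mere presence. The marker file name and JSON shape are assumptions made for the example:

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

const MARKER = ".install-complete"; // hypothetical marker file name

function installWithMarker(dir, install) {
  const marker = path.join(dir, MARKER);
  fs.rmSync(marker, { force: true }); // a stale marker must not outlive a new attempt
  const version = install(); // may throw; no marker is written in that case
  fs.writeFileSync(marker, JSON.stringify({ version }));
}

// Validate the marker's contents instead of treating presence as success.
function isInstalled(dir, expectedVersion) {
  try {
    const text = fs.readFileSync(path.join(dir, MARKER), "utf8");
    return JSON.parse(text).version === expectedVersion;
  } catch {
    return false; // missing or corrupt marker means "not installed"
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "marker-demo-"));
installWithMarker(dir, () => "1.4.2");
console.log(isInstalled(dir, "1.4.2")); // true
```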
|
|
105
|
+
|
|
106
|
+
**Real-time event vs response timing**
|
|
107
|
+
- If a handler emits push notifications (WebSocket, SSE, pub/sub) AND returns an HTTP response, verify clients won't receive push events before the response that gives them context to interpret those events — especially when the response contains IDs or version numbers the event consumer needs
|
|
108
|
+
|
|
109
|
+
**Intent vs implementation (meta-cognitive pass)**
|
|
110
|
+
- For each label, comment, docstring, status message, or inline instruction that describes behavior, verify the code actually implements that behavior. A detection mechanism must query the data it claims to detect; a migration must create the target, not just delete the source
|
|
111
|
+
- If the PR contains inline code examples, command templates, or query snippets, verify they are syntactically valid for their language — run a mental parse of each example. Watch for template placeholder format inconsistencies within and across files
|
|
112
|
+
- If the PR modifies a value (identifier, parameter name, format convention, threshold, timeout) that is referenced in other files, trace all cross-references and verify they agree. This includes: reviewer usernames, API names, placeholder formats, GraphQL field names, operational constants
|
|
113
|
+
- If the PR adds or reorders sequential steps/instructions, verify the ordering matches execution dependencies — readers following steps in order must not perform an action before its prerequisite
|
|
114
|
+
|
|
115
|
+
**Formatting & structural consistency**
|
|
116
|
+
- If the PR adds content to an existing file (list items, sections, config entries), verify the new content matches the file's existing indentation, bullet style, heading levels, and structure — rendering inconsistencies are the most common Copilot review finding
|
|
117
|
+
|
|
94
118
|
## Fix Issues Found
|
|
95
119
|
|
|
96
120
|
For each issue found:
|
package/lib/code-review-checklist.md
CHANGED
|
@@ -1,144 +1,121 @@
|
|
|
1
1
|
**Hygiene**
|
|
2
|
-
- Leftover debug code (`console.log`, `debugger`, TODO/FIXME/HACK
|
|
3
|
-
- Hardcoded secrets, API keys, or credentials
|
|
4
|
-
- Files that shouldn't be committed (.env, node_modules, build artifacts)
|
|
2
|
+
- Leftover debug code (`console.log`, `debugger`, TODO/FIXME/HACK), hardcoded secrets/credentials, and uncommittable files (.env, node_modules, build artifacts)
|
|
5
3
|
- Overly broad changes that should be split into separate PRs
|
|
6
4
|
|
|
7
5
|
**Imports & references**
|
|
8
|
-
- Every symbol used
|
|
9
|
-
- No unused imports introduced by the changes
|
|
6
|
+
- Every symbol used is imported (missing → runtime crash); no unused imports introduced
|
|
10
7
|
|
|
11
8
|
**Runtime correctness**
|
|
12
|
-
-
|
|
9
|
+
- Null/undefined access without guards, off-by-one errors, object spread of potentially-null values (spread of null is `{}`, silently discarding state)
|
|
10
|
+
- Data from external/user sources (parsed JSON, API responses, file reads) used without structural validation — guard against parse failures, missing properties, wrong types, and null elements before accessing nested values. When parsed data is optional enrichment, isolate failures so they don't abort the main operation
|
|
11
|
+
- Type coercion edge cases — `Number('')` is `0` not empty, `0` is falsy in truthy checks, `NaN` comparisons are always false; string comparison operators (`<`, `>`, `localeCompare`) do lexicographic, not semantic, ordering (e.g., `"10" < "2"`). Use explicit type checks (`Number.isFinite()`, `!= null`) and dedicated libraries (e.g., semver for versions) instead of truthy guards or lexicographic ordering when zero/empty are valid values or semantic ordering matters
|
|
12
|
+
- Functions that index into arrays without guarding empty arrays; state/variables declared but never updated or only partially wired up
|
|
13
|
+
- Shared mutable references — module-level defaults passed by reference mutate across calls (use `structuredClone()`/spread); `useCallback`/`useMemo` referencing a later `const` (temporal dead zone); object spread followed by unconditional assignment that clobbers spread values
|
|
13
14
|
- Side effects during React render (setState, navigation, mutations outside useEffect)
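The coercion cases named in this section are easy to confirm in Node. The two helpers below are illustrative only: `parseCount` and `compareVersions` are hypothetical names, and the version comparison is a simplified stand-in for a real semver library (no pre-release or build metadata handling):

```javascript
// Built-in behavior the checklist warns about.
console.log(Number(""));        // 0
console.log("10" < "2");        // true
console.log(NaN > 5, NaN <= 5); // false false
console.log({ ...null });       // {}

// Accept 0 as a valid value; reject "", null, undefined, and non-numeric input.
function parseCount(raw) {
  if (raw == null || String(raw).trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) ? n : undefined;
}

// Compare dotted numeric versions segment by segment.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}
```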
|
|
14
|
-
|
|
15
|
-
|
|
16
|
-
-
|
|
17
|
-
-
|
|
18
|
-
-
|
|
19
|
-
-
|
|
20
|
-
-
|
|
21
|
-
-
|
|
22
|
-
- Module-level default/config objects passed by reference to consumers — shared mutation across calls. Use `structuredClone()` or spread when handing out defaults
|
|
23
|
-
- `useCallback`/`useMemo` referencing a `const` declared later in the same function body — triggers temporal dead zone `ReferenceError`. Ensure dependency declarations appear before their dependents
|
|
24
|
-
- Object spread/merge followed by unconditional field assignment that clobbers spread values — e.g., `{...input.details, notes: notes || null}` silently overwrites `input.details.notes` even when `notes` is undefined. Only set fields when the overriding value is explicitly provided
|
|
25
|
-
|
|
26
|
-
**Async & UI state consistency**
|
|
27
|
-
- Optimistic UI state changes (view switches, navigation, success callbacks) before an async operation completes — if the operation fails or is cancelled (drag cancel, upload abort, form dismiss), the UI is stuck in the wrong state with no rollback. Handle both failure and cancellation paths to reset intermediate state
|
|
28
|
-
- `Promise.all` without try/catch — if any request rejects, the UI ends up partially loaded with an unhandled rejection. Wrap in try/catch with fallback/error state so the view remains usable
|
|
29
|
-
- Success callbacks (`onSaved()`, `onComplete()`) called unconditionally after an async call — check the return value or catch errors before calling the callback
|
|
30
|
-
- Debounced/cancelable async operations that don't reset loading state on all code paths (input cleared, stale response arrives, request fails) — loading spinners get stuck and stale results display. Use AbortController or request IDs to discard outdated responses and clear loading in every exit path (including early returns)
|
|
31
|
-
- Multiple UI state variables representing coupled data (coordinates + display name, selected item + dependent list) updated independently — actions that change one must update all related fields to prevent display/data mismatch
|
|
32
|
-
- Error notification at multiple layers (shared API client that auto-displays errors + component-level error handling) — verify exactly one layer is responsible for user-facing error messages to avoid duplicate toasts/alerts. Suppress lower-layer notifications when the caller handles its own error display
|
|
33
|
-
- Optimistic state updates using full-collection snapshots for rollback — if a second user action starts while the first is in-flight, rollback restores the snapshot and clobbers the second action's changes. Use per-item rollback and functional state updaters (`setState(prev => ...)`) after async gaps to avoid stale closures
|
|
34
|
-
- Child components maintaining local copies of parent-provided data for optimistic updates without propagating changes back — on unmount/remount the parent's stale cache is re-rendered. Sync optimistic changes to the parent via callback alongside local state, or trigger a data refetch on remount
|
|
15
|
+
|
|
16
|
+
**Async & state consistency**
|
|
17
|
+
- Optimistic state changes (view switches, navigation, success callbacks) before async completion — if the operation fails or is cancelled, the UI is stuck with no rollback. Check return values/errors before calling success callbacks. Handle both failure and cancellation paths
|
|
18
|
+
- Multiple coupled state variables updated independently — actions that change one must update all related fields; debounced/cancelable operations must reset loading state on every exit path (cleared, stale, failed, aborted)
|
|
19
|
+
- Error notification at multiple layers (shared API client + component-level) — verify exactly one layer owns user-facing error messages
|
|
20
|
+
- Optimistic updates using full-collection snapshots for rollback — a second in-flight action gets clobbered. Use per-item rollback and functional state updaters after async gaps; sync optimistic changes to parent via callback or trigger refetch on remount
|
|
21
|
+
- State updates guarded by truthiness of the new value (`if (arr?.length)`) — prevents clearing state when the source legitimately returns empty. Distinguish "no response" from "empty response"
|
|
22
|
+
- `Promise.all` without error handling — partial load with unhandled rejection. Wrap with fallback/error state
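The truthiness-guard item in this section is the easiest to miss in review, so a small contrast may help. In this sketch `response` is assumed to be `undefined` when nothing came back and an array (possibly empty) when the source answered; both function names are hypothetical:

```javascript
// Buggy: an empty array is falsy under ?.length, so the old list is never cleared.
function nextItemsBuggy(prev, response) {
  if (response?.length) return response;
  return prev;
}

// Fixed: distinguish "no response" from "empty response".
function nextItems(prev, response) {
  if (Array.isArray(response)) return response; // [] legitimately clears the list
  return prev; // no response: keep what we have
}
```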
|
|
35
23
|
|
|
36
24
|
**Resource management**
|
|
37
|
-
- Event listeners, socket handlers, subscriptions, and
|
|
38
|
-
-
|
|
25
|
+
- Event listeners, socket handlers, subscriptions, timers, and useEffect side effects are cleaned up on unmount/teardown
|
|
26
|
+
- Initialization functions (schedulers, pollers, listeners) that don't guard against multiple calls — creates duplicate instances. Check for existing instances before reinitializing
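Both resource-management items can be seen in one small sketch: an initializer that reuses an existing instance instead of creating a duplicate, and a `stop()` that releases the timer. `startPoller` is a hypothetical name chosen for the example:

```javascript
let poller = null; // module-level handle to the single running instance

function startPoller(tick, intervalMs = 1000) {
  if (poller !== null) return poller; // already running: reuse, do not duplicate
  const timer = setInterval(tick, intervalMs);
  poller = {
    stop() {
      clearInterval(timer); // release the timer on teardown
      poller = null; // allow a later restart
    },
  };
  return poller;
}
```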
|
|
39
27
|
|
|
40
|
-
**
|
|
41
|
-
- Service functions
|
|
42
|
-
-
|
|
28
|
+
**Error handling**
|
|
29
|
+
- Service functions throwing generic `Error` for client-caused conditions — bubbles as 500 instead of 400/404. Use typed error classes with explicit status codes; ensure consistent error responses across similar endpoints
|
|
30
|
+
- Swallowed errors (empty `.catch(() => {})`), handlers that replace detailed failure info with generic messages, and error/catch handlers that exit cleanly (`exit 0`, `return`) without any user-visible output — surface a notification, propagate original context, and make failures look like failures
|
|
31
|
+
- Destructive operations in retry/cleanup paths assumed to succeed without their own error handling — if cleanup fails, retry logic crashes instead of reporting the intended failure
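One common way to implement the typed-error item is a small class hierarchy that carries the status code. The class and function names below are hypothetical; the point is only that client-caused conditions stop surfacing as 500s:

```javascript
class HttpError extends Error {
  constructor(status, message) {
    super(message);
    this.name = this.constructor.name;
    this.status = status;
  }
}

class ValidationError extends HttpError {
  constructor(message) {
    super(400, message);
  }
}

class NotFoundError extends HttpError {
  constructor(message) {
    super(404, message);
  }
}

// Handler glue: typed errors keep their status; anything else is a 500.
function toResponse(err) {
  const status = err instanceof HttpError ? err.status : 500;
  return { status, body: { error: err.message } };
}
```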
|
|
43
32
|
|
|
44
33
|
**API & URL safety**
|
|
45
|
-
- User-supplied values interpolated into URL paths
|
|
46
|
-
- Route params
|
|
47
|
-
-
|
|
48
|
-
- Path containment checks using string prefix comparison (`resolvedPath.startsWith(baseDir)`) without a path separator boundary — `baseDir + "evil/..."` passes the check. Use `path.relative()` (reject if starts with `..`) or append `path.sep` to the base
|
|
49
|
-
- Error/fallback responses that hardcode security headers (CORS, CSP) instead of using the centralized policy — error paths bypass security tightening applied to happy paths. Always reuse shared header middleware/constants
|
|
50
|
-
|
|
51
|
-
**Data exposure**
|
|
52
|
-
- API responses returning full objects that contain sensitive fields (secrets, tokens, passwords) — destructure and omit before sending. Check ALL response paths (GET, PUT, POST) not just one
|
|
53
|
-
- Comments/docs claiming data is never exposed while the code path does expose it
|
|
34
|
+
- User-supplied values interpolated into URL paths, shell commands, file paths, or subprocess arguments without encoding/validation — use `encodeURIComponent()` for URLs, regex allowlists for execution boundaries
|
|
35
|
+
- Route params passed to services without format validation; path containment checks using string prefix without path separator boundary (use `path.relative()`)
|
|
36
|
+
- Error/fallback responses that hardcode security headers instead of using centralized policy — error paths bypass security tightening
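The first two items in this section can be sketched with Node built-ins only. `userUrl` and `isInside` are hypothetical helper names, and the example paths are invented; `path.relative()` is used for containment so that a sibling directory sharing the base's prefix is not accepted:

```javascript
const path = require("node:path");

// Encode a user-supplied ID before placing it in a URL path segment.
function userUrl(baseUrl, userId) {
  return `${baseUrl}/users/${encodeURIComponent(userId)}`;
}

// True when `candidate` resolves to baseDir itself or to something inside it.
function isInside(baseDir, candidate) {
  const base = path.resolve(baseDir);
  const rel = path.relative(base, path.resolve(base, candidate));
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  return !escapes;
}

console.log(userUrl("https://api.example.com", "../admin"));
// https://api.example.com/users/..%2Fadmin
```

A plain `resolved.startsWith(base)` check would accept `/srv/data-evil/x` for a base of `/srv/data`, which is the failure the checklist item describes.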
|
|
54
37
|
|
|
55
|
-
**
|
|
56
|
-
-
|
|
57
|
-
-
|
|
58
|
-
- API responses leaking answer keys or expected values that the client will later submit back — either strip before responding or use server-side nonce/seed verification
|
|
38
|
+
**Trust boundaries & data exposure**
|
|
39
|
+
- API responses returning full objects with sensitive fields — destructure and omit across ALL response paths (GET, PUT, POST, error, socket); comments/docs claiming data isn't exposed while the code path does expose it
|
|
40
|
+
- Server trusting client-provided computed/derived values (scores, totals, correctness flags) when the server can recompute them — strip and recompute server-side; don't require clients to submit fields the server should own
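A short sketch of both trust-boundary items, under assumed data shapes: the field names (`passwordHash`, `apiToken`, `answers`, `score`) are invented for illustration and do not come from the package:

```javascript
// Strip sensitive fields before a record leaves the server.
function toPublicUser(user) {
  const { passwordHash, apiToken, ...safe } = user;
  return safe;
}

// Recompute the score from the server-held answer key; any client-sent
// `score` field on the submission is deliberately ignored.
function gradeSubmission(answerKey, submission) {
  let score = 0;
  for (const [questionId, expected] of Object.entries(answerKey)) {
    if (submission.answers?.[questionId] === expected) score += 1;
  }
  return { score, total: Object.keys(answerKey).length };
}
```

Routing every response path (including error and socket payloads) through one serializer such as `toPublicUser` is what makes the "ALL response paths" requirement checkable.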
|
|
59
41
|
|
|
60
42
|
**Input handling**
|
|
61
|
-
- Trimming values where whitespace is significant (API keys, tokens, passwords, base64) — only trim identifiers/names
|
|
62
|
-
-
|
|
63
|
-
- Endpoints that accept unbounded arrays/collections without an upper limit — large payloads can exceed request timeouts, exhaust memory, or create DoS vectors. Enforce a max size and return 400 when exceeded, or move large operations to background jobs
|
|
43
|
+
- Trimming values where whitespace is significant (API keys, tokens, passwords, base64) — only trim identifiers/names
|
|
44
|
+
- Endpoints accepting unbounded arrays/collections without upper limits — enforce max size or move to background jobs
|
|
64
45
|
|
|
65
46
|
**Validation & consistency**
|
|
66
|
-
- New endpoints/schemas match validation
|
|
67
|
-
-
|
|
68
|
-
-
|
|
69
|
-
-
|
|
70
|
-
-
|
|
71
|
-
-
|
|
72
|
-
-
|
|
73
|
-
|
|
74
|
-
|
|
75
|
-
-
|
|
76
|
-
-
|
|
77
|
-
-
|
|
78
|
-
-
|
|
79
|
-
-
|
|
47
|
+
- New endpoints/schemas should match validation patterns of existing similar endpoints — field limits, required fields, types, error handling. If validation exists on one endpoint for a param, the same param on other endpoints needs the same validation
|
|
48
|
+
- When a validation/sanitization function is introduced for a field, trace ALL write paths (create, update, sync, import) — partial application means invalid values re-enter through the unguarded path
|
|
49
|
+
- Schema fields accepting values downstream code can't handle; Zod/schema stripping fields the service reads (silent `undefined`); config values persisted but silently ignored by the implementation — trace each field through schema → service → consumer
|
|
50
|
+
- Handlers reading properties from framework-provided objects using field names the framework doesn't populate — silent `undefined`. Verify property names match the caller's contract
|
|
51
|
+
- Numeric values from strings used without `NaN`/type guards — `NaN` comparisons silently pass bounds checks. Clamp query params to safe lower bounds
|
|
52
|
+
- UI elements hidden from navigation but still accessible via direct URL — enforce restrictions at the route level
|
|
53
|
+
- Summary counters/accumulators that miss edge cases (removals, branch coverage); silent operations in verbose sequences where all branches should print status
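The `NaN` item deserves a concrete case: a naive `if (n > max) reject()` check lets `NaN` through because the comparison is false. A sketch of a clamping parser follows; the function name and default bounds are hypothetical:

```javascript
// Parse a `limit` query param, falling back on bad input and clamping to [min, max].
function parseLimit(raw, { fallback = 20, min = 1, max = 100 } = {}) {
  const n = Number.parseInt(raw, 10);
  if (!Number.isFinite(n)) return fallback; // "abc", "", and undefined land here
  return Math.min(max, Math.max(min, n));
}

console.log(NaN > 100);          // false
console.log(parseLimit("abc"));  // 20
console.log(parseLimit("500"));  // 100
```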
|
|
54
|
+
|
|
55
|
+
**Intent vs implementation**
|
|
56
|
+
- Labels, comments, status messages, or documentation that describe behavior the code doesn't implement — e.g., a map named "renamed" that only deletes, or an action labeled "migrated" that never creates the target
|
|
57
|
+
- Inline code examples, command templates, and query snippets that aren't syntactically valid as written — template placeholders must use a consistent format, queries must use correct syntax for their language (e.g., single `{}` in GraphQL, not `{{}}`)
|
|
58
|
+
- Cross-references between files (identifiers, parameter names, format conventions, operational thresholds) that disagree — when one reference changes, trace all other files that reference the same entity and update them
|
|
59
|
+
- Sequential instructions or steps whose ordering doesn't match the required execution order — readers following in order will perform actions at the wrong time (e.g., "record X" in step 2 when X must be captured before step 1's action)
|
|
60
|
+
- Sequential numbering (section numbers, step numbers) with gaps or jumps after edits — verify continuity
|
|
61
|
+
- Completion markers, success flags, or status files written before the operation they attest to finishes — consumers see false success if the operation fails after the write
|
|
62
|
+
- Existence checks (directory exists, file exists, module resolves) used as proof of correct/complete installation — a directory can exist but be empty, a file can exist with invalid contents. Verify the specific resource the consumer needs
|
|
63
|
+
- Tracking/checkpoint files that default to empty on parse failure — causes full re-execution. Fail loudly instead
|
|
64
|
+
- Registering references to resources without verifying the resource exists — dangling references after failed operations
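For the checkpoint item, the key distinction is between "no checkpoint yet" and "checkpoint unreadable". A sketch, assuming a hypothetical checkpoint format of a JSON array of completed step names, with `null` standing for a file that does not exist:

```javascript
// Returns the list of completed steps, or throws if the checkpoint is damaged.
function parseCheckpoint(text) {
  if (text === null) return []; // no file yet: a genuine first run
  let data;
  try {
    data = JSON.parse(text);
  } catch (err) {
    // Defaulting to [] here would silently re-run every step.
    throw new Error(`Checkpoint is corrupt: ${err.message}`);
  }
  if (!Array.isArray(data)) {
    throw new Error("Checkpoint is corrupt: expected a JSON array");
  }
  return data;
}
```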
|
|
80
65
|
|
|
81
66
|
**Concurrency & data integrity**
|
|
82
|
-
- Shared mutable state
|
|
83
|
-
- Multi-
|
|
84
|
-
-
|
|
85
|
-
- Functions
|
|
67
|
+
- Shared mutable state accessed by concurrent requests without locking or atomic writes; multi-step read-modify-write cycles that can interleave
|
|
68
|
+
- Multi-table writes without a transaction — FK violations or errors leave partial state
|
|
69
|
+
- Functions with early returns for "no primary fields to update" that silently skip secondary operations (relationship updates, link writes)
|
|
70
|
+
- Functions that acquire shared state (locks, flags, markers) with exit paths that skip cleanup — leaves the system permanently locked. Trace all exit paths including error branches
|
|
86
71
|
|
|
87
72
|
**Search & navigation**
|
|
88
|
-
- Search results
|
|
89
|
-
- Search
|
|
73
|
+
- Search results linking to generic list pages instead of deep-linking to the specific record
|
|
74
|
+
- Search/query code hardcoding one backend's implementation when the system supports multiple — verify option/parameter names are mapped between backends
|
|
90
75
|
|
|
91
76
|
**Sync & replication**
|
|
92
|
-
- Upsert/`ON CONFLICT UPDATE`
|
|
93
|
-
- Pagination using `COUNT(*)`
|
|
94
|
-
- Pagination endpoints that return a `next` token but don't accept one as input (or vice versa) — clients can't retrieve pages beyond the first. Also check that hard-capped query limits (e.g., `Limit: 100`) don't silently truncate results when offset exceeds the cap
|
|
77
|
+
- Upsert/`ON CONFLICT UPDATE` updating only a subset of exported fields — replicas diverge. Document deliberately omitted fields
|
|
78
|
+
- Pagination using `COUNT(*)` (full table scan) instead of `limit + 1`; endpoints missing `next` token input/output; hard-capped limits silently truncating results
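The `limit + 1` technique mentioned above fetches one extra row to learn whether another page exists, without counting the table. A sketch in which `fetchRows(offset, count)` is a hypothetical data-access callback:

```javascript
// Returns one page plus the offset of the next page (or null when done).
function getPage(fetchRows, offset, limit) {
  const rows = fetchRows(offset, limit + 1); // one extra row signals "more exist"
  const hasMore = rows.length > limit;
  return {
    items: rows.slice(0, limit), // never expose the probe row
    next: hasMore ? offset + limit : null,
  };
}

// Usage with an in-memory stand-in for the database.
const data = [1, 2, 3, 4, 5];
const fetchRows = (offset, count) => data.slice(offset, offset + count);
console.log(getPage(fetchRows, 0, 2)); // { items: [ 1, 2 ], next: 2 }
```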
|
|
95
79
|
|
|
96
80
|
**SQL & database**
|
|
97
|
-
- Parameterized query placeholder indices
|
|
98
|
-
- Database triggers
|
|
99
|
-
-
|
|
100
|
-
-
|
|
101
|
-
-
|
|
102
|
-
-
|
|
103
|
-
- N+1 query patterns inside transactions (SELECT + INSERT/UPDATE per row) — use batched upserts (`INSERT ... ON CONFLICT ... DO UPDATE`) to reduce round-trips and lock time
|
|
104
|
-
- `CREATE TABLE IF NOT EXISTS` used as the sole schema migration strategy — it won't add new columns, indexes, or triggers to existing tables on upgrade. Use `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` or a migration framework for schema evolution
|
|
105
|
-
- O(n²) algorithms (self-joins, all-pairs comparisons, nested loops over full tables) triggered per-request on data that grows over time — these become prohibitive at scale. Add caps, use indexed lookups, or move to background jobs
|
|
81
|
+
- Parameterized query placeholder indices must match parameter array positions — especially with shared param builders or computed indices
|
|
82
|
+
- Database triggers clobbering explicitly-provided values; auto-incrementing columns that only increment on INSERT, not UPDATE
|
|
83
|
+
- Full-text search with strict parsers (`to_tsquery`) on user input — use `websearch_to_tsquery` or `plainto_tsquery`
|
|
84
|
+
- Dead queries (results never read), N+1 patterns inside transactions, O(n²) algorithms on growing data
|
|
85
|
+
- `CREATE TABLE IF NOT EXISTS` as sole migration strategy — won't add columns/indexes on upgrade. Use `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` or a migration framework
|
|
86
|
+
- Functions/extensions requiring specific database versions without verification
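For the placeholder-index item at the top of this list, one way to keep indices and parameter positions aligned is to derive each index from the parameter list itself. The sketch below assumes PostgreSQL-style `$n` placeholders; `build_where` is a hypothetical helper, and column names are assumed to come from a trusted allowlist, never from user input.

```python
def build_where(filters, start_index=1):
    """Build a WHERE clause whose $n placeholders match the params list order."""
    clauses = []
    params = []
    for column, value in filters.items():
        params.append(value)
        # Derive the index from the list length so the two cannot drift apart.
        clauses.append(f"{column} = ${start_index + len(params) - 1}")
    sql = " AND ".join(clauses) if clauses else "TRUE"
    return sql, params
```

The `start_index` argument covers the shared-builder case: when a caller has already bound two parameters, passing `start_index=3` keeps the combined query consistent.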
|
|
106
87
|
|
|
107
88
|
**Lazy initialization & module loading**
|
|
108
|
-
- Cached state getters
|
|
109
|
-
-
|
|
110
|
-
-
|
|
89
|
+
- Cached state getters returning null before initialization — provide async initializer or ensure-style function
|
|
90
|
+
- Module-level side effects (file reads, SDK init) without error handling — corrupted files crash the process on import
|
|
91
|
+
- Bootstrap/resilience code that imports the dependencies it's meant to install — restructure so installation precedes resolution
|
|
92
|
+
- Re-exporting from heavy modules defeats lazy loading — use lightweight shared modules
|
|
111
93
|
|
|
112
94
|
**Data format portability**
|
|
113
|
-
- Values
|
|
114
|
-
-
|
|
115
|
-
|
|
116
|
-
**Shell script safety**
|
|
117
|
-
- Subprocess calls in shell scripts under `set -e` — if the subprocess fails, the script aborts. Also check non-critical writes (e.g., `echo` to stdout) which fail on broken pipes and trigger exit — use `|| true` for non-critical output
|
|
118
|
-
- Detached/background child processes spawned with piped stdio — if the parent exits (restart, crash), pipes close and writes cause SIGPIPE. Redirect stdio to log files or use `'ignore'` for children that must outlive the parent
|
|
119
|
-
- When the same data structure is manipulated in both application code and shell-inline scripts, apply identical guards in both places
|
|
95
|
+
- Values crossing serialization boundaries may change format (arrays in JSON vs string literals in DB) — convert consistently
|
|
96
|
+
- BIGINT values parsed into JavaScript `Number` — precision lost past `MAX_SAFE_INTEGER`. Use strings or `BigInt`
|
|
120
97
|
|
|
121
|
-
**
|
|
122
|
-
-
|
|
98
|
+
**Shell & portability**
|
|
99
|
+
- Subprocess calls under `set -e` abort the script on failure; non-critical writes (e.g., `echo` to stdout) fail on broken pipes and trigger exit. Use `|| true` for non-critical output
|
|
100
|
+
- Detached child processes with piped stdio — parent exit causes SIGPIPE. Redirect to log files or use `'ignore'`
|
|
101
|
+
- Platform-specific assumptions: hardcoded shell interpreters, `path.join()` producing backslashes on Windows that break ESM imports. Use `pathToFileURL()` for dynamic imports
|
|
123
102
|
|
|
124
103
|
**Test coverage**
|
|
125
|
-
- New
|
|
126
|
-
- New error paths
|
|
127
|
-
- Tests
|
|
128
|
-
-
|
|
129
|
-
-
|
|
104
|
+
- New logic/schemas/services without corresponding tests when similar existing code has tests
|
|
105
|
+
- New error paths untestable because services throw generic errors instead of typed ones
|
|
106
|
+
- Tests re-implementing logic under test instead of importing real exports — pass even when real code regresses
|
|
107
|
+
- Tests depending on real wall-clock time or external dependencies when testing logic — use fake timers and mocks
|
|
108
|
+
- Missing tests for trust-boundary enforcement — submit tampered values, verify server ignores them
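The wall-clock item above can be illustrated with an injectable clock. This is a minimal sketch, not the package's test setup: `Cooldown` and `FakeClock` are hypothetical names, and the same idea applies to a test framework's built-in fake timers.

```python
import time
from typing import Callable, Optional


class Cooldown:
    """Allows an action at most once per `seconds`, using an injectable clock."""

    def __init__(self, seconds: float, clock: Callable[[], float] = time.monotonic):
        self._seconds = seconds
        self._clock = clock
        self._last: Optional[float] = None

    def ready(self) -> bool:
        now = self._clock()
        if self._last is None or now - self._last >= self._seconds:
            self._last = now
            return True
        return False


class FakeClock:
    """Test double: time only moves when the test calls advance()."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds
```

A test can then build `Cooldown(30, clock=FakeClock())` and advance the fake clock by 30 seconds instantly, instead of sleeping.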
|
|
130
109
|
|
|
131
110
|
**Accessibility**
|
|
132
|
-
- Interactive elements
|
|
133
|
-
- Custom toggle/switch UI built from
|
|
111
|
+
- Interactive elements missing accessible names, roles, or ARIA states — including disabled interactions without `aria-disabled`
|
|
112
|
+
- Custom toggle/switch UI built from non-semantic elements instead of native inputs
|
|
134
113
|
|
|
135
114
|
**Configuration & hardcoding**
|
|
136
|
-
- Hardcoded values
|
|
137
|
-
-
|
|
138
|
-
-
|
|
139
|
-
- Duplicated config/constants/utility helpers across modules — extract to a single shared module to prevent drift (watch for circular imports when choosing the shared location)
|
|
140
|
-
- CI pipelines that install dependencies without lockfile pinning (`npm install` instead of `npm ci`) or that ad-hoc install packages without version constraints — creates non-deterministic builds that can break unpredictably
|
|
115
|
+
- Hardcoded values when a config field or env var already exists; dead config fields nothing consumes; unused function parameters creating false API contracts
|
|
116
|
+
- Duplicated config/constants/utilities across modules — extract to shared module to prevent drift
|
|
117
|
+
- CI pipelines installing without lockfile pinning or version constraints — non-deterministic builds
|
|
141
118
|
|
|
142
119
|
**Style & conventions**
|
|
143
120
|
- Naming and patterns consistent with the rest of the codebase
|
|
144
|
-
-
|
|
121
|
+
- Formatting consistency within each file — new content must match existing indentation, bullet style, heading levels, and structure
|
package/lib/copilot-review-loop.md
CHANGED
|
@@ -1,46 +1,87 @@
|
|
|
1
1
|
## Copilot Code Review Loop
|
|
2
2
|
|
|
3
|
-
After the PR is created, run the Copilot review-and-fix loop
|
|
3
|
+
After the PR is created, run the Copilot review-and-fix loop.
|
|
4
|
+
|
|
5
|
+
**IMPORTANT — Sub-agent delegation**: To prevent context exhaustion on long review cycles, delegate the entire review loop to a **general-purpose sub-agent** via the Agent tool. The sub-agent runs the full loop (request → wait → check → fix → re-request) autonomously and returns only the final status. This keeps the parent agent's context clean.
|
|
6
|
+
|
|
7
|
+
### Sub-agent prompt template:
|
|
8
|
+
|
|
9
|
+
```
|
|
10
|
+
You are a Copilot review loop agent.
|
|
11
|
+
|
|
12
|
+
PR: {PR_NUMBER} in {OWNER}/{REPO}
|
|
13
|
+
Branch: {BRANCH_NAME}
|
|
14
|
+
Build command: {BUILD_CMD}
|
|
15
|
+
Max iterations: 5
|
|
16
|
+
|
|
17
|
+
TIMEOUT SCHEDULE:
|
|
18
|
+
When running parallel PR reviews (do:better), use shorter waits to avoid
|
|
19
|
+
blocking other PRs:
|
|
20
|
+
- Iteration 1: max wait 5 minutes
|
|
21
|
+
- Iteration 2: max wait 4 minutes
|
|
22
|
+
- Iteration 3: max wait 3 minutes
|
|
23
|
+
- Iteration 4: max wait 2 minutes
|
|
24
|
+
- Iteration 5+: max wait 1 minute
|
|
25
|
+
When running a single-PR review (do:pr, do:release), use dynamic timing:
|
|
26
|
+
check the previous Copilot review duration on this PR and wait up to 2x
|
|
27
|
+
that (minimum 5 minutes, maximum 20 minutes). Copilot reviews can take
|
|
28
|
+
10-15 minutes for large diffs.
|
|
29
|
+
Poll interval: 30 seconds for all iterations.
|
|
30
|
+
|
|
31
|
+
Run the following loop until Copilot returns zero new comments or you hit
|
|
32
|
+
the max iteration limit:
|
|
33
|
+
|
|
34
|
+
1. CAPTURE the latest Copilot review submittedAt timestamp (so you can
|
|
35
|
+
detect when a NEW review arrives):
|
|
36
|
+
echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviews(last: 5) { nodes { author { login } submittedAt } } } } }"}' | gh api graphql --input -
|
|
37
|
+
Record the most recent submittedAt from copilot-pull-request-reviewer[bot].
|
|
38
|
+
Then REQUEST a Copilot review:
|
|
39
|
+
gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/requested_reviewers \
|
|
40
|
+
-f 'reviewers[]=copilot-pull-request-reviewer[bot]'
|
|
41
|
+
CRITICAL: The reviewer name MUST include the [bot] suffix.
|
|
42
|
+
- For public repos: check if a review already exists before requesting
|
|
43
|
+
- If no Copilot reviewer is configured, report back and exit
|
|
44
|
+
|
|
45
|
+
2. WAIT for the review to complete (BLOCKING):
|
|
46
|
+
- Poll using stdin JSON piping to avoid shell-escaping issues:
|
|
47
|
+
echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviews(last: 5) { totalCount nodes { state body author { login } submittedAt } } reviewThreads(first: 100) { nodes { id isResolved comments(first: 3) { nodes { body path line author { login } } } } } } } }"}' | gh api graphql --input -
|
|
48
|
+
- The review is complete when a new Copilot review node appears with a
|
|
49
|
+
submittedAt after the timestamp captured in step 1
|
|
50
|
+
- For parallel PR reviews (do:better): use the TIMEOUT SCHEDULE wait for
|
|
51
|
+
the current iteration number
|
|
52
|
+
- For single-PR reviews (do:pr, do:release): use dynamic timing based on
|
|
53
|
+
the previous Copilot review duration on this PR (2x that, min 5 min,
|
|
54
|
+
max 20 min)
|
|
55
|
+
- Error detection: if the review body contains "Copilot encountered an
|
|
56
|
+
error" or "unable to review this pull request", re-request (step 1)
|
|
57
|
+
and resume polling. Max 3 error retries before reporting failure.
|
|
58
|
+
- If no review appears after max wait, report the timeout — the parent
|
|
59
|
+
agent will ask the user what to do
|
|
60
|
+
|
|
61
|
+
3. CHECK for unresolved comments:
|
|
62
|
+
- Filter review threads for isResolved: false
|
|
63
|
+
- First verify the review was successful: check that the latest Copilot
|
|
64
|
+
review body does NOT contain error text. If it does, go back to step 1.
|
|
65
|
+
- If zero comments (body says "generated 0 comments" or no unresolved
|
|
66
|
+
threads): PR is clean — report success and exit
|
|
67
|
+
- If unresolved comments exist: proceed to step 4
|
|
68
|
+
|
|
69
|
+
4. FIX all unresolved review comments:
|
|
35
70
|
For each unresolved thread:
|
|
36
71
|
- Read the referenced file and understand the feedback
|
|
37
72
|
- Make the code fix
|
|
38
|
-
- Run the build
|
|
39
|
-
- If build passes, commit
|
|
40
|
-
- Resolve the thread via GraphQL mutation:
|
|
73
|
+
- Run the build command
|
|
74
|
+
- If the build passes, commit with the message: address review: <summary>
|
|
75
|
+
- Resolve the thread via GraphQL mutation using stdin JSON piping:
|
|
76
|
+
echo '{"query":"mutation { resolveReviewThread(input: {threadId: \"{THREAD_ID}\"}) { thread { id isResolved } } }"}' | gh api graphql --input -
|
|
77
|
+
- After all threads resolved, push all commits to remote
|
|
78
|
+
- Increment iteration counter and go back to step 1
|
|
79
|
+
|
|
80
|
+
When done, report back:
|
|
81
|
+
- Final status: clean / max-iterations-reached / timeout / error
|
|
82
|
+
- Total iterations completed
|
|
83
|
+
- List of commits made (if any)
|
|
84
|
+
- Any unresolved threads remaining
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
Launch the sub-agent and wait for its result. If the sub-agent reports a timeout or error, **ask the user** whether to continue waiting, re-request the review, or skip — never proceed without user approval when the review loop fails.
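The timeout schedule in the prompt template can be expressed as a small function. This is a sketch under stated assumptions, not part of the package: `max_wait_seconds` is a hypothetical name, and the fallback to the 5-minute minimum when no previous review duration is known is an assumption, since the template does not cover that case.

```python
from typing import Optional

# Parallel mode (do:better): decreasing waits; iteration 5+ uses 1 minute.
PARALLEL_WAIT_MINUTES = {1: 5, 2: 4, 3: 3, 4: 2}
POLL_INTERVAL_SECONDS = 30
MIN_SINGLE_WAIT = 5 * 60
MAX_SINGLE_WAIT = 20 * 60


def max_wait_seconds(
    iteration: int,
    parallel: bool,
    previous_review_seconds: Optional[float] = None,
) -> int:
    """Return the maximum wait for one review iteration, in seconds."""
    if iteration < 1:
        raise ValueError("iteration starts at 1")
    if parallel:
        return PARALLEL_WAIT_MINUTES.get(iteration, 1) * 60
    if previous_review_seconds is None:
        # Assumption: with no earlier review to measure, use the minimum.
        return MIN_SINGLE_WAIT
    # Single-PR mode (do:pr, do:release): 2x the previous duration, clamped.
    doubled = 2 * previous_review_seconds
    return int(min(max(doubled, MIN_SINGLE_WAIT), MAX_SINGLE_WAIT))
```

With a 30-second poll interval, the result divided by `POLL_INTERVAL_SECONDS` gives the number of polls before a timeout is reported.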
|
package/package.json
CHANGED