slash-do 1.8.0 → 2.0.0

@@ -1,13 +1,18 @@
  ---
  description: SwiftUI DevSecOps audit, remediation, test enhancement, per-category PRs, CI verification, and Copilot review loop with worktree isolation — optimized for multi-platform Swift/SwiftUI apps (iOS, macOS, watchOS, tvOS, visionOS)
- argument-hint: "[--scan-only] [--no-merge] [path filter or focus areas]"
+ argument-hint: "[--interactive] [--scan-only] [--no-merge] [path filter or focus areas]"
  ---
 
  # Better Swift — Unified DevSecOps Pipeline for SwiftUI Apps
 
  Run the full DevSecOps lifecycle optimized for Swift/SwiftUI multi-platform projects: audit the codebase with 7 deduplicated agents, consolidate findings, remediate in an isolated worktree, create **separate PRs per category** with SemVer bump, verify CI, run Copilot review loops, and merge.
 
+ **Default mode: fully autonomous.** Uses the Balanced model profile, proceeds through all phases without prompting, and auto-merges PRs with clean reviews.
+
+ **`--interactive` mode:** Pauses for model profile selection, review findings approval, guardrail decisions, and merge confirmation.
+
  Parse `$ARGUMENTS` for:
+ - **`--interactive`**: pause at each decision point for user approval
  - **`--scan-only`**: run Phase 0 + 1 + 2 only (audit and plan), skip remediation
  - **`--no-merge`**: run through PR creation (Phase 5), skip Copilot review and merge
  - **Path filter**: limit scanning scope to specific directories or files
@@ -15,7 +20,13 @@ Parse `$ARGUMENTS` for:
 
  ## Configuration
 
- Before starting the pipeline, present the user with configuration options using `AskUserQuestion`:
+ ### Default Mode (autonomous)
+
+ Use the **Balanced** model profile automatically (`AUDIT_MODEL=sonnet`, `REMEDIATION_MODEL=sonnet`).
+
+ ### Interactive Mode (`--interactive`)
+
+ Present the user with configuration options using `AskUserQuestion`:
 
  ```
  AskUserQuestion([
@@ -56,7 +67,7 @@ When compacting during this workflow, always preserve:
  - All CRITICAL/HIGH findings with file:line references
  - The current phase number and what phases remain
  - All PR numbers and URLs created so far
- - `BUILD_CMD`, `TEST_CMD`, `PROJECT_TYPE`, `WORKTREE_DIR` values
+ - `BUILD_CMD`, `TEST_CMD`, `PROJECT_TYPE`, `WORKTREE_DIR`, `REPO_DIR` values
  - `VCS_HOST`, `CLI_TOOL`, `DEFAULT_BRANCH`, `CURRENT_BRANCH`
  - `PLATFORMS` (list of supported platforms: iOS, macOS, etc.)
  - `DEPLOYMENT_TARGETS` (minimum OS versions per platform)
@@ -154,6 +165,7 @@ If the project has a `Makefile` or `fastlane/Fastfile`, check for custom build/t
  Record as `BUILD_CMD` and `TEST_CMD`.
 
  ### 0d: State Snapshot
+ - Record `REPO_DIR` via `git rev-parse --show-toplevel`
  - Record `CURRENT_BRANCH` via `git rev-parse --abbrev-ref HEAD`
  - Record `DEFAULT_BRANCH` via `gh repo view --json defaultBranchRef --jq '.defaultBranchRef.name'` (or `glab` equivalent)
  - Record `IS_DIRTY` via `git status --porcelain`
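The 0d snapshot added above can be exercised as a throwaway shell sketch. The fallback from `gh` to the current branch is an assumption so the sketch runs offline; the command file itself expects an authenticated `gh` or `glab`.

```shell
# Throwaway-repo sketch of the 0d state snapshot; assumes only git is installed.
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git -c user.name=t -c user.email=t@t commit --allow-empty -q -m init

REPO_DIR=$(git rev-parse --show-toplevel)
CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD)
# gh requires auth; fall back to the current branch so the sketch runs offline
DEFAULT_BRANCH=$(gh repo view --json defaultBranchRef --jq '.defaultBranchRef.name' 2>/dev/null || echo "$CURRENT_BRANCH")
IS_DIRTY=$(git status --porcelain)
echo "repo=$REPO_DIR branch=$CURRENT_BRANCH default=$DEFAULT_BRANCH dirty=${IS_DIRTY:-none}"
```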
@@ -561,18 +573,43 @@ Before creating PRs, run a deep code review on all remediation changes to catch
  3. For each issue found:
  - Fix in a new commit: `fix: {description of review finding}`
  - Re-run `{BUILD_CMD}` and `{TEST_CMD}` on ALL platforms to verify
- 4. Present a summary of review findings and fixes to the user via `AskUserQuestion`:
+ 4. **Default mode**: Print a brief summary of findings and fixes, then proceed to PR creation automatically.
+ **Interactive mode (`--interactive`)**: Present a summary to the user via `AskUserQuestion`:
  ```
  AskUserQuestion([{
  question: "Code review complete. {N} issues found and fixed. {list}. All {PLATFORMS} platforms build and test successfully. Proceed to PR creation?",
  options: [
  { label: "Proceed", description: "Create per-category PRs" },
+ { label: "Commit directly", description: "Merge worktree changes into {CURRENT_BRANCH} — no PRs, no review loops" },
  { label: "Show diff", description: "Show the full diff for manual review before proceeding" },
  { label: "Abort", description: "Stop here — I'll review manually" }
  ]
  }])
  ```
- 5. If "Show diff" selected, print the diff and re-ask. If "Abort", stop and print the worktree path.
+ 5. (Interactive only) If "Show diff" selected, print the diff and re-ask. If "Abort", stop and print the worktree path.
+ 6. If "Commit directly" selected:
+ - All remediation and review fixes are already committed incrementally in the worktree branch `better-swift/{DATE}`. If any uncommitted changes remain, stage and commit them now:
+ ```bash
+ cd {WORKTREE_DIR}
+ git diff --quiet && git diff --cached --quiet || {
+ git add <list of remaining changed files>
+ git commit -m "fix: better-swift audit remediation — remaining changes"
+ }
+ ```
+ - Return to the main repo checkout, merge the worktree branch, and clean up on success:
+ ```bash
+ cd {REPO_DIR}
+ git checkout {CURRENT_BRANCH}
+ if git merge better-swift/{DATE}; then
+ git worktree remove {WORKTREE_DIR}
+ git branch -D better-swift/{DATE}
+ else
+ echo "Merge conflict — resolve in {REPO_DIR}, then run:"
+ echo " git worktree remove {WORKTREE_DIR}"
+ echo " git branch -D better-swift/{DATE}"
+ fi
+ ```
+ - Restore stash if needed (`git stash pop`), update PLAN.md, print final summary, then **stop** — this completes the workflow (Phases 5, 6, and 7 are skipped entirely since no PRs or category branches were created)
 
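The "Commit directly" flow above can be rehearsed end to end in a scratch repo. Names such as `better-swift/demo` and `.worktree` are illustrative stand-ins for the `{DATE}` and `{WORKTREE_DIR}` placeholders, not values the command uses.

```shell
# Scratch-repo rehearsal of the "Commit directly" path: commit in the worktree,
# fast-forward merge into the original branch, then clean up the worktree and branch.
set -eu
REPO_DIR=$(mktemp -d)
cd "$REPO_DIR"
git init -q .
git -c user.name=t -c user.email=t@t commit --allow-empty -q -m init
CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD)

WORKTREE_DIR=$REPO_DIR/.worktree                           # stands in for {WORKTREE_DIR}
git worktree add -q -b better-swift/demo "$WORKTREE_DIR"   # "demo" stands in for {DATE}
(
  cd "$WORKTREE_DIR"
  echo "// remediated" > Remediation.swift
  git add Remediation.swift
  git -c user.name=t -c user.email=t@t commit -q -m "fix: better-swift audit remediation"
)

git checkout -q "$CURRENT_BRANCH"
if git merge -q better-swift/demo; then
  git worktree remove "$WORKTREE_DIR"
  git branch -q -D better-swift/demo
fi
```

Because the original branch has not moved, the merge is a fast-forward; the conflict branch of the command file only triggers when the main checkout advanced during remediation.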
  ## Phase 4c: Test Enhancement
 
@@ -825,7 +862,7 @@ After creating all PRs, verify CI passes on each one:
 
  ## Phase 6: Copilot Review Loop (GitHub only)
 
- Loop until Copilot returns zero new comments (no fixed iteration limit). Sub-agents enforce a 10-iteration guardrail: at iteration 10 the sub-agent stops and returns a "guardrail" status, prompting the parent agent to ask the user whether to continue or stop.
+ Loop until Copilot returns zero new comments (no fixed iteration limit). Sub-agents enforce a 10-iteration guardrail: at iteration 10 the sub-agent stops and returns a "guardrail" status. **Default mode**: auto-stop at the guardrail. **Interactive mode (`--interactive`)**: prompt the parent agent to ask the user whether to continue or stop.
 
  **Sub-agent delegation** (prevents context exhaustion): delegate each PR's review loop to a **separate general-purpose sub-agent** via the Agent tool. Launch sub-agents in parallel (one per PR). Each sub-agent runs the full loop (request → wait → check → fix → re-request) autonomously and returns only the final status.
 
@@ -845,13 +882,19 @@ Launch all PR sub-agents in parallel. Wait for all to complete.
 
  For each sub-agent result:
  - **clean**: mark PR as ready to merge
- - **timeout**: inform the user "Copilot review timed out on PR #{number}." and ask whether to continue waiting, re-request, or skip
- - **error**: inform the user and ask whether to retry or skip
- - **guardrail**: the sub-agent hit the 10-iteration limit; ask the user whether to continue with more iterations or stop
+ - **timeout**: **Default mode**: skip the timed-out PR and continue. **Interactive mode**: inform the user and ask whether to continue waiting, re-request, or skip
+ - **error**: **Default mode**: retry up to 3 times, then skip. **Interactive mode**: inform the user and ask whether to retry or skip
+ - **guardrail**: the sub-agent hit the 10-iteration limit. **Default mode**: auto-stop and mark as best-effort. **Interactive mode**: ask the user whether to continue with more iterations or stop
 
  ### 6.3: Merge Gate (MANDATORY)
 
- **Do NOT merge any PR until Copilot review has completed (approved or commented) on ALL PRs, or the user explicitly approves skipping.**
+ **Do NOT merge any PR until its own Copilot review has completed (approved or commented with zero unresolved issues).**
+
+ ### Default Mode (autonomous)
+
+ Print the review status summary, then auto-merge all PRs whose reviews completed cleanly. PRs that timed out, hit guardrails, or still have unresolved comments are left open for manual review. Print which PRs were merged and which were left open.
+
+ ### Interactive Mode (`--interactive`)
 
  Present the review status summary to the user via `AskUserQuestion`:
  ```
@@ -866,7 +909,7 @@ AskUserQuestion([{
  }])
  ```
 
- Only proceed with merging based on the user's selection. Never auto-merge without user confirmation.
+ Only proceed with merging based on the user's selection.
 
  ### 6.4: Merge
 
@@ -1,13 +1,18 @@
  ---
  description: Unified DevSecOps audit, remediation, test enhancement, per-category PRs, CI verification, and Copilot review loop with worktree isolation
- argument-hint: "[--scan-only] [--no-merge] [path filter or focus areas]"
+ argument-hint: "[--interactive] [--scan-only] [--no-merge] [path filter or focus areas]"
  ---
 
  # Better — Unified DevSecOps Pipeline
 
  Run the full DevSecOps lifecycle: audit the codebase with 7 deduplicated agents, consolidate findings, remediate in an isolated worktree, create **separate PRs per category** with SemVer bump, verify CI, run Copilot review loops, and merge.
 
+ **Default mode: fully autonomous.** Uses the Balanced model profile, proceeds through all phases without prompting, and auto-merges PRs with clean reviews.
+
+ **`--interactive` mode:** Pauses for model profile selection, review findings approval, guardrail decisions, and merge confirmation.
+
  Parse `$ARGUMENTS` for:
+ - **`--interactive`**: pause at each decision point for user approval
  - **`--scan-only`**: run Phase 0 + 1 + 2 only (audit and plan), skip remediation
  - **`--no-merge`**: run through PR creation (Phase 5), skip Copilot review and merge
  - **Path filter**: limit scanning scope to specific directories or files
@@ -15,7 +20,13 @@ Parse `$ARGUMENTS` for:
 
  ## Configuration
 
- Before starting the pipeline, present the user with configuration options using `AskUserQuestion`:
+ ### Default Mode (autonomous)
+
+ Use the **Balanced** model profile automatically (`AUDIT_MODEL=sonnet`, `REMEDIATION_MODEL=sonnet`).
+
+ ### Interactive Mode (`--interactive`)
+
+ Present the user with configuration options using `AskUserQuestion`:
 
  ```
  AskUserQuestion([
@@ -56,7 +67,7 @@ When compacting during this workflow, always preserve:
  - All CRITICAL/HIGH findings with file:line references
  - The current phase number and what phases remain
  - All PR numbers and URLs created so far
- - `BUILD_CMD`, `TEST_CMD`, `PROJECT_TYPE`, `WORKTREE_DIR` values
+ - `BUILD_CMD`, `TEST_CMD`, `PROJECT_TYPE`, `WORKTREE_DIR`, `REPO_DIR` values
  - `VCS_HOST`, `CLI_TOOL`, `DEFAULT_BRANCH`, `CURRENT_BRANCH`
  - `PHASE_4C_START_SHA` (needed for FILE_OWNER_MAP update in Phase 4c.3)
  - `VACUOUS_TESTS_FIXED`, `WEAK_TESTS_STRENGTHENED`, `NEW_TEST_CASES`, `NEW_TEST_FILES`
@@ -96,6 +107,7 @@ Derive build and test commands from the project type:
  Record as `BUILD_CMD` and `TEST_CMD`.
 
  ### 0d: State Snapshot
+ - Record `REPO_DIR` via `git rev-parse --show-toplevel`
  - Record `CURRENT_BRANCH` via `git rev-parse --abbrev-ref HEAD`
  - Record `DEFAULT_BRANCH` via `gh repo view --json defaultBranchRef --jq '.defaultBranchRef.name'` (or `glab` equivalent)
  - Record `IS_DIRTY` via `git status --porcelain`
@@ -369,18 +381,43 @@ Before creating PRs, run a deep code review on all remediation changes to catch
  3. For each issue found:
  - Fix in a new commit: `fix: {description of review finding}`
  - Re-run `{BUILD_CMD}` and `{TEST_CMD}` to verify
- 4. Present a summary of review findings and fixes to the user via `AskUserQuestion`:
+ 4. **Default mode**: Print a brief summary of findings and fixes, then proceed to PR creation automatically.
+ **Interactive mode (`--interactive`)**: Present a summary to the user via `AskUserQuestion`:
  ```
  AskUserQuestion([{
  question: "Code review complete. {N} issues found and fixed. {list}. Proceed to PR creation?",
  options: [
  { label: "Proceed", description: "Create per-category PRs" },
+ { label: "Commit directly", description: "Merge worktree changes into {CURRENT_BRANCH} — no PRs, no review loops" },
  { label: "Show diff", description: "Show the full diff for manual review before proceeding" },
  { label: "Abort", description: "Stop here — I'll review manually" }
  ]
  }])
  ```
- 5. If "Show diff" selected, print the diff and re-ask. If "Abort", stop and print the worktree path.
+ 5. (Interactive only) If "Show diff" selected, print the diff and re-ask. If "Abort", stop and print the worktree path.
+ 6. If "Commit directly" selected:
+ - All remediation and review fixes are already committed incrementally in the worktree branch `better/{DATE}`. If any uncommitted changes remain, stage and commit them now:
+ ```bash
+ cd {WORKTREE_DIR}
+ git diff --quiet && git diff --cached --quiet || {
+ git add <list of remaining changed files>
+ git commit -m "fix: better audit remediation — remaining changes"
+ }
+ ```
+ - Return to the main repo checkout, merge the worktree branch, and clean up on success:
+ ```bash
+ cd {REPO_DIR}
+ git checkout {CURRENT_BRANCH}
+ if git merge better/{DATE}; then
+ git worktree remove {WORKTREE_DIR}
+ git branch -D better/{DATE}
+ else
+ echo "Merge conflict — resolve in {REPO_DIR}, then run:"
+ echo " git worktree remove {WORKTREE_DIR}"
+ echo " git branch -D better/{DATE}"
+ fi
+ ```
+ - Restore stash if needed (`git stash pop`), update PLAN.md, print final summary, then **stop** — this completes the workflow (Phases 5, 6, and 7 are skipped entirely since no PRs or category branches were created)
 
  ## Phase 4c: Test Enhancement
 
@@ -613,7 +650,7 @@ After creating all PRs, verify CI passes on each one:
 
  ## Phase 6: Copilot Review Loop (GitHub only)
 
- Loop until Copilot returns zero new comments (no fixed iteration limit). Sub-agents enforce a 10-iteration guardrail: at iteration 10 the sub-agent stops and returns a "guardrail" status, prompting the parent agent to ask the user whether to continue or stop.
+ Loop until Copilot returns zero new comments (no fixed iteration limit). Sub-agents enforce a 10-iteration guardrail: at iteration 10 the sub-agent stops and returns a "guardrail" status. **Default mode**: auto-stop at the guardrail. **Interactive mode (`--interactive`)**: prompt the parent agent to ask the user whether to continue or stop.
 
  **Sub-agent delegation** (prevents context exhaustion): delegate each PR's review loop to a **separate general-purpose sub-agent** via the Agent tool. Launch sub-agents in parallel (one per PR). Each sub-agent runs the full loop (request → wait → check → fix → re-request) autonomously and returns only the final status.
 
@@ -631,13 +668,19 @@ Launch all PR sub-agents in parallel. Wait for all to complete.
 
  For each sub-agent result:
  - **clean**: mark PR as ready to merge
- - **timeout**: inform the user "Copilot review timed out on PR #{number}." and ask whether to continue waiting, re-request, or skip
- - **error**: inform the user and ask whether to retry or skip
- - **guardrail**: the sub-agent hit the 10-iteration limit; ask the user whether to continue with more iterations or stop
+ - **timeout**: **Default mode**: skip the timed-out PR and continue. **Interactive mode**: inform the user and ask whether to continue waiting, re-request, or skip
+ - **error**: **Default mode**: retry up to 3 times, then skip. **Interactive mode**: inform the user and ask whether to retry or skip
+ - **guardrail**: the sub-agent hit the 10-iteration limit. **Default mode**: auto-stop and mark as best-effort. **Interactive mode**: ask the user whether to continue with more iterations or stop
 
  ### 6.3: Merge Gate (MANDATORY)
 
- **Do NOT merge any PR until Copilot review has completed (approved or commented) on ALL PRs, or the user explicitly approves skipping.**
+ **Do NOT merge any PR until its own Copilot review has completed (approved or commented with zero unresolved issues).**
+
+ ### Default Mode (autonomous)
+
+ Print the review status summary, then auto-merge all PRs whose reviews completed cleanly. PRs that timed out, hit guardrails, or still have unresolved comments are left open for manual review. Print which PRs were merged and which were left open.
+
+ ### Interactive Mode (`--interactive`)
 
  Present the review status summary to the user via `AskUserQuestion`:
  ```
@@ -652,7 +695,7 @@ AskUserQuestion([{
  }])
  ```
 
- Only proceed with merging based on the user's selection. Never auto-merge without user confirmation.
+ Only proceed with merging based on the user's selection.
 
  ### 6.4: Merge
 
@@ -1,13 +1,18 @@
  ---
- description: Scan codebase to infer project goals, clarify with user, and generate GOALS.md
- argument-hint: "[--refresh] [focus hint, e.g. 'just the CLI']"
+ description: Scan codebase to infer project goals and generate GOALS.md (default: fully autonomous; use --interactive to review with user)
+ argument-hint: "[--interactive] [--refresh] [focus hint, e.g. 'just the CLI']"
  ---
 
  # Goals — Generate a GOALS.md from Codebase Analysis
 
- Scan the codebase to infer the project's goals, purpose, and direction, then collaborate with the user to produce a comprehensive `GOALS.md` at the repo root.
+ Scan the codebase to infer the project's goals, purpose, and direction, then generate a comprehensive `GOALS.md` at the repo root.
+
+ **Default mode: fully autonomous.** Scans the codebase, synthesizes goals, and writes GOALS.md without prompting. HIGH and MEDIUM confidence goals are included; LOW confidence goals are included but marked as inferred.
+
+ **`--interactive` mode:** Pauses after synthesis to validate purpose, prioritize goals, confirm non-goals, and refine wording with the user.
 
  Parse `$ARGUMENTS` for:
+ - **`--interactive`**: pause after synthesis for user validation and refinement
  - **`--refresh`**: re-scan and update an existing GOALS.md rather than creating from scratch
  - **Focus hints**: e.g., "focus on API goals", "just the CLI"
 
@@ -94,29 +99,35 @@ For each goal, assign a confidence level:
  - **MEDIUM** — strongly implied by patterns, architecture, or recent work
  - **LOW** — inferred/speculative, needs user confirmation
 
- ## Phase 3: User Clarification
+ ## Phase 3: Validation
+
+ ### Default Mode (autonomous)
+
+ Skip user clarification. Include all HIGH and MEDIUM confidence goals directly. Include LOW confidence goals but mark them with `(inferred)` so the user can review after generation. Proceed directly to Phase 4.
+
+ ### Interactive Mode (`--interactive`)
 
  Present the draft to the user and ask targeted questions to resolve uncertainty. Use `AskUserQuestion` for each area that needs input.
 
- ### 3a: Purpose Validation
+ #### 3a: Purpose Validation
  Show the inferred one-paragraph purpose statement. Ask if it's accurate or needs refinement.
 
- ### 3b: Goal Prioritization
+ #### 3b: Goal Prioritization
  Present the inferred goals list. For each LOW or MEDIUM confidence goal, ask the user:
  - Is this actually a goal?
  - How would you rephrase it?
  - What priority is it (primary, secondary, stretch)?
 
- ### 3c: Missing Goals
+ #### 3c: Missing Goals
  Ask: "Are there any goals I missed that aren't yet reflected in the codebase?" Present 2-3 suggested possibilities based on common patterns for this type of project, to prompt the user's thinking.
 
- ### 3d: Non-Goals Validation
+ #### 3d: Non-Goals Validation
  Present the inferred non-goals. Ask: "Are these accurate? Anything to add or remove?"
 
- ### 3e: Target Users
+ #### 3e: Target Users
  Present the inferred target user description. Ask if it's accurate.
 
- ### 3f: Success Criteria (optional)
+ #### 3f: Success Criteria (optional)
  Ask: "Would you like to define measurable success criteria for any of these goals?" Offer examples relevant to the project type (e.g., "support N concurrent users", "< Xms response time", "100% test coverage on core module").
 
  ## Phase 4: Document Generation
@@ -190,9 +201,9 @@ If `--refresh` was passed and `GOALS.md` already exists:
  1. Read the existing `GOALS.md`
  2. Compare existing goals against current codebase state
  3. Identify goals whose status has changed (new progress, completed, abandoned)
- 4. Present changes to the user for confirmation
- 5. Update the document in-place, preserving user-written content where possible
- 6. If any checkbox task lists are found in the existing GOALS.md, flag them and offer to move them to PLAN.md
+ 4. **Default mode**: Update the document in-place automatically, preserving user-written content where possible. Print a summary of what changed.
+ **Interactive mode (`--interactive`)**: Present changes to the user for confirmation before updating.
+ 5. If any checkbox task lists are found in the existing GOALS.md, move them to PLAN.md automatically (default) or offer to move them (interactive)
 
  ## Phase 5: Finalize
 
@@ -211,8 +222,8 @@ If `--refresh` was passed and `GOALS.md` already exists:
  ## Notes
 
  - This command is project-agnostic — it reads whatever project signals exist
- - The goal is collaboration: scan first, then refine with the user — never assume
- - LOW confidence inferences should always be validated with the user before inclusion
+ - In default mode, scan and generate autonomously; in interactive mode, collaborate with the user
+ - LOW confidence inferences are included as `(inferred)` in default mode; validated with the user in interactive mode
  - Preserve the user's voice — if they provide rephrased goals, use their wording verbatim
  - If the project is brand new with minimal code, lean more heavily on user input and less on codebase inference
  - If `gh` CLI is not authenticated, skip issue/PR scanning gracefully — don't halt
@@ -1,7 +1,12 @@
  ---
  description: Create a release PR using the project's documented release workflow
+ argument-hint: "[--interactive]"
  ---
 
+ **Default mode: fully autonomous.** Auto-detects branches, determines version bump from commits, runs review, creates and merges the release PR without prompting.
+
+ **`--interactive` mode:** Pauses for branch confirmation, version approval, and merge confirmation.
+
  ## Detect Release Workflow
 
  Before doing anything, determine the project's source and target branches for releases. Do NOT hardcode branch names. Instead, discover them:
@@ -11,7 +16,7 @@ Before doing anything, determine the project's source and target branches for re
  - **GitHub Actions workflows** — check `.github/workflows/release.yml` (or similar) for `on: push: branches:` to find the branch that triggers the release pipeline
  - **Project conventions** (already in context) — look for git workflow sections, branch descriptions, or release instructions
  - **Versioning docs** — check `docs/VERSIONING.md`, `CONTRIBUTING.md`, or `RELEASING.md`
- - **Branch convention** — if a `release` branch exists, the target is `release`; otherwise ask the user
+ - **Branch convention** — if a `release` branch exists, the target is `release`; otherwise create it from the last release tag (see step 3 below). In `--interactive` mode, ask the user to confirm
  3. **Ensure the target branch exists** — if not, create it from the last release tag:
  ```bash
  git branch release $(git describe --tags --abbrev=0)
@@ -21,7 +26,7 @@ Before doing anything, determine the project's source and target branches for re
 
  Print the detected workflow: `Detected release flow: {source} → {target}`
 
- If ambiguous, ask the user to confirm before proceeding.
+ **Default mode**: If ambiguous, use the most likely branch (prefer `release` if it exists). If the target branch does not exist, create it from the last release tag (see step 3 above). If detection still yields `target == source`, abort with an error — a release PR cannot merge a branch into itself. **Interactive mode (`--interactive`)**: Ask the user to confirm before proceeding.
 
  **Important**: The PR direction is `{source}` → `{target}` (e.g., `main` → `release`). This gives Copilot the full diff of all changes since the last release for review. Do NOT create a branch from source and PR back into it — that only shows the version bump commit.
 
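The default-mode fallback added above can be sketched as a tiny helper. `detect_target` is hypothetical and takes the branch list as arguments, where the real command would inspect the repo:

```shell
# Hypothetical target picker: prefer an existing release branch, otherwise fall
# back to the source, which the caller must then reject (target == source).
detect_target() {
  local source=$1; shift
  local b
  for b in "$@"; do                 # "$@" is the list of existing branches
    if [ "$b" = release ]; then echo release; return; fi
  done
  echo "$source"
}

TARGET=$(detect_target main main dev)
if [ "$TARGET" = main ]; then
  echo "error: target == source; a release PR cannot merge a branch into itself" >&2
fi
```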
@@ -40,7 +45,7 @@ If ambiguous, ask the user to confirm before proceeding.
  - `feat:` → **minor** bump
  - `fix:`, `chore:`, `docs:`, `refactor:`, `perf:`, `style:`, `test:`, `ci:` → **patch** bump
  - Use the **highest applicable level** across all commits
- - Present the proposed version to the user for confirmation
+ - **Default mode**: Use the determined version automatically. **Interactive mode (`--interactive`)**: Present the proposed version to the user for confirmation
 
  2. **Bump version**: Run `npm version <major|minor|patch> --no-git-tag-version` to update `package.json` and `package-lock.json`
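The "highest applicable level" rule above can be sketched as a small helper. `bump_level` is hypothetical and reads commit subjects from stdin rather than from git to stay self-contained; the `!:` marker for breaking changes is an assumption borrowed from Conventional Commits, not spelled out in the command file.

```shell
# Hedged sketch: compute the highest applicable bump across commit subjects.
# Anything that is not breaking or feat falls through to the default "patch".
bump_level() {
  local level=patch subject
  while IFS= read -r subject; do
    case $subject in
      *'!:'*)          level=major ;;                         # e.g. "feat!:" or "fix(core)!:"
      feat:*|feat\(*)  [ "$level" = major ] || level=minor ;; # never downgrade from major
    esac
  done
  echo "$level"
}

printf '%s\n' "feat: add goals command" "fix: handle empty plan" | bump_level   # prints "minor"
```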
 
@@ -82,7 +87,7 @@ Checklist to apply to each file:
82
87
 
83
88
  !`cat ~/.claude/lib/code-review-checklist.md`
84
89
 
85
- Verification — confirm before proceeding:
90
+ Verification — self-check before proceeding (no user prompt needed):
86
91
  - [ ] Read every changed file in full (not just diffs)
87
92
  - [ ] Checked each file against the relevant checklist tiers
88
93
  - [ ] Quoted specific code for each finding
@@ -1,212 +1,206 @@
  ---
- description: Review and clean up PLAN.md, extract docs from completed work
+ description: Automated audit/triage of PLAN.md — archive completed items to DONE.md, suggest new work, keep PLAN.md lean
+ argument-hint: "[--interactive]"
  ---
 
  # Replan Command
 
- You are tasked with reviewing and updating the PLAN.md file to keep it clean, current, and action-oriented.
+ Automatically audit PLAN.md against the codebase, prune completed/stale items, archive what's done, suggest new work, and leave PLAN.md lean and actionable.
 
- **This is an interactive process.** Do NOT assume items are still pending or still relevant. Verify with the user.
+ **Default mode: fully autonomous.** Scans the codebase, archives done items, removes stale items, adds suggested new items, and commits — no user interaction.
+
+ **`--interactive` mode:** Pauses after evidence gathering to present findings and get user approval before making changes.
+
+ **Philosophy:** PLAN.md should be short enough to paste into a prompt. Completed items belong in a done log, not cluttering the active plan.
 
  ## Boundary Rule: PLAN.md vs GOALS.md
 
  **PLAN.md is tactical. GOALS.md is strategic.**
 
- PLAN.md answers: *What are we building next? What's the backlog? What's done?*
+ PLAN.md answers: *What are we building next? What's the backlog?*
  GOALS.md answers: *Why does this project exist? What does success look like? What will we never do?*
 
- **PLAN.md owns:**
- - Checkbox task lists (`- [ ] Add feature X`)
- - Implementation details, subtasks, and technical steps
- - Known issues and testing gaps
- - Prioritized next-action lists
- - Completed work archive
- - Documentation index
-
- **PLAN.md must NOT duplicate:**
+ **PLAN.md must NOT contain:**
  - Mission statements, core tenets, or non-goals (those belong in GOALS.md)
- - Milestone definitions written as outcome prose (GOALS.md territory)
+ - Completed items (those belong in `DONE.md`)
+ - Detailed documentation (that belongs in `docs/`)
 
- **Cross-reference:** PLAN.md should link to GOALS.md for strategic context, and GOALS.md should link back to PLAN.md for tactical details.
+ ## Phase 1: Automated Evidence Gathering
 
- ## Your Responsibilities
+ Launch these agents in parallel — no user interaction needed.
 
- ### 1. Gather Evidence
-
- Before touching PLAN.md, gather signals about what's actually happened since the plan was last updated. Run these in parallel:
-
- **Agent 1: Git History**
- ```bash
- git log --oneline -30
- ```
- Look for commits that may have completed items listed in PLAN.md.
+ **Agent 1: Git History Analysis**
+ - `git log --oneline -50` — identify commits that completed plan items
+ - `git log --since="2 weeks ago" --oneline` — surface recent work not yet reflected in the plan
+ - Cross-reference commit messages against pending PLAN.md items to auto-detect completions
 
- **Agent 2: Codebase Scan**
- Search for evidence that "pending" items may already be implemented:
- - Grep for function names, component names, or feature keywords mentioned in pending items
+ **Agent 2: Codebase Verification**
+ - For each pending item in PLAN.md, grep for function names, component names, or feature keywords
  - Check test files for coverage of features listed as untested
  - Look at recently modified files for signs of completed work
+ - Build a confidence score per item: `confirmed-done`, `likely-done`, `still-pending`, `stale`
+
+ **Agent 3: Opportunity Scanner**
+ - Scan for TODOs, FIXMEs, HACKs in the codebase that aren't in PLAN.md
+ - Look for test coverage gaps (files with no corresponding test)
+ - Check for outdated dependencies (`npm outdated`, `cargo outdated`, etc. as appropriate)
+ - Review GOALS.md (if it exists) for strategic goals not yet represented in the plan
+ - Identify code quality opportunities (large files, complex functions, missing error handling)
+ - Formulate 1-3 suggested new items
 
- **Agent 3: GOALS.md Boundary Check**
+ **Agent 4: GOALS.md Boundary Check**
  If `GOALS.md` exists:
- - Read it and check for checkbox task lists or implementation details that leaked in
+ - Check for checkbox task lists or implementation details that leaked in
  - Note any items that should be absorbed into PLAN.md
 
55
- ### 2. Interactive Item Review
56
+ ## Phase 2: Auto-Triage
56
57
 
57
- **This is the most important step. Do NOT skip it.**
58
+ Using agent results, classify every PLAN.md item:
58
59
 
59
- Walk through PLAN.md with the user, section by section. For each section that has pending items, present your findings and ask the user to confirm status.
60
+ | Status | Criteria | Action |
61
+ |--------|----------|--------|
62
+ | `confirmed-done` | Git commit + code exists + tests pass | Archive to DONE.md |
63
+ | `likely-done` | Strong evidence but not 100% certain | Archive to DONE.md |
64
+ | `stale` | No commits, no code, no recent discussion; item is >30 days old with zero progress | Remove from PLAN.md |
65
+ | `still-pending` | No evidence of completion | Keep in PLAN.md |
60
66
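The triage table above can be sketched as a pure function. This is an illustrative sketch only: the signal names (`hasCommit`, `codeExists`, `testsPass`, `recentActivity`, `ageDays`) are assumptions about what the audit agents might report, not a defined schema.

```javascript
// Hypothetical classifier mirroring the triage table. Each input field is
// an assumed signal gathered by the Phase 1 agents.
function triage(item) {
  // Git commit + code exists + tests pass => confirmed-done
  if (item.hasCommit && item.codeExists && item.testsPass) return 'confirmed-done';
  // Strong but incomplete evidence => likely-done
  if (item.hasCommit || item.codeExists) return 'likely-done';
  // No activity and older than 30 days => stale
  if (!item.recentActivity && item.ageDays > 30) return 'stale';
  // Default: keep it in PLAN.md
  return 'still-pending';
}
```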
 
61
- **For each group of related pending items**, use `AskUserQuestion` to verify. Batch related items together (don't ask one-by-one for 20 items). For example:
67
+ ## Phase 3: Apply Changes (or Checkpoint if Interactive)
62
68
 
63
- ```
64
- I found these items still marked as pending under "Testing Gaps":
65
- - [ ] Server route unit tests
66
- - [ ] Aggregate calculation tests
67
- - [ ] Visual regression tests for charts
69
+ ### Default Mode (autonomous)
68
70
 
69
- Git history shows commits for "add server route tests" on Feb 15.
70
- I also found test files at packages/server/test/routes/.
71
+ Apply all changes immediately without prompting:
71
72
 
72
- Which of these are actually done?
73
+ 1. Archive `confirmed-done` and `likely-done` items to DONE.md
74
+ 2. Remove `stale` items from PLAN.md
75
+ 3. Add suggested new items to the appropriate PLAN.md section
76
+ 4. Absorb any tactical items found in GOALS.md
77
+ 5. Print a brief summary of what was done:
78
+
79
+ ```
80
+ Replan complete:
81
+ - Archived {N} completed items to DONE.md
82
+ - Removed {S} stale items
83
+ - Added {P} new suggested items
84
+ - {any GOALS.md boundary fixes}
73
85
  ```
74
86
 
75
- **How to batch the review:**
76
- - Group items by section (Next Up, Remaining Work, Future, etc.)
77
- - Present each group with any evidence you found (git commits, files that exist, grep matches)
78
- - Ask the user to confirm: which are done, which are still needed, which should be removed or rephrased
79
- - Use multiSelect questions when asking about multiple items (let the user check off what's done)
80
- - If a section has no evidence of changes, still ask briefly: "These items under [section] — still accurate, or any updates?"
87
+ ### Interactive Mode (`--interactive`)
88
+
89
+ Present ONE consolidated summary to the user:
81
90
 
82
- **For known issues**, ask whether they're still reproducible or have been fixed.
91
+ ```
92
+ AskUserQuestion([{
93
+ question: "Replan audit complete. Here's what I found:\n\n**Auto-archiving to DONE.md** ({N} items):\n{list of confirmed-done items}\n\n**Likely done — archive?** ({M} items):\n{list with evidence}\n\n**Flagged as stale** ({S} items):\n{list with last-activity dates}\n\n**New suggestions** ({P} items):\n{numbered list of proposed new items with rationale}\n\nHow should I proceed?",
94
+ multiSelect: true,
95
+ options: [
96
+ { label: "Archive confirmed-done", description: "Move {N} confirmed items to DONE.md" },
97
+ { label: "Archive likely-done too", description: "Also move {M} likely-done items to DONE.md" },
98
+ { label: "Remove stale items", description: "Delete {S} stale items from PLAN.md" },
99
+ { label: "Add suggested items", description: "Add {P} new items to PLAN.md" }
100
+ ]
101
+ }])
102
+ ```
83
103
 
84
- **For "Next Actions" / priority ordering**, ask if the priorities still reflect the user's current thinking.
104
+ **Exclusive options** (present only if the user asks, as a separate follow-up):
105
+ - "Show me the details" — print full evidence, then re-ask the above
106
+ - "Just clean up formatting" — only reformat PLAN.md, skip all archive/remove/add actions
85
107
 
86
- ### 3. Extract Documentation from Completed Work
87
109
 
88
- For each completed item with substantial documentation:
89
- - Determine the appropriate docs location (create docs/ directory if needed)
90
- - Extract the detailed documentation sections
91
- - Move them to appropriate docs files with proper formatting
92
- - Follow existing documentation patterns if they exist
110
+ For suggested new items: if the user selects "Add suggested items", present each suggestion individually so they can accept, reject, or modify each one.
93
111
 
94
- **Common docs files to consider:**
95
- - `docs/ARCHITECTURE.md` - System design, data flow, architecture
96
- - `docs/API.md` - API endpoints, schemas, events
97
- - `docs/TROUBLESHOOTING.md` - Common issues and solutions
98
- - `docs/features/*.md` - Individual feature documentation
99
- - `README.md` - User-facing documentation
112
+ ## Phase 4: Archive to DONE.md
100
113
 
101
- ### 4. Clean Up PLAN.md
114
+ `DONE.md` lives at project root. It's the append-only log of completed work.
102
115
 
103
- Using the verified information from the interactive review:
104
- - Mark confirmed-completed items as [x] and move to archive
105
- - Remove items the user confirmed are no longer relevant
106
- - Update wording for items the user rephrased
107
- - Replace detailed documentation with brief summaries + doc links
108
- - Remove redundant or outdated information
116
+ ### Format
109
117
 
110
- **Example transformation:**
111
118
  ```markdown
112
- Before:
113
- - [x] Feature X: Authentication System
119
+ # Done Log
120
+
121
+ Completed items archived from PLAN.md. For release notes, see `.changelogs/`.
114
122
 
115
- ### Architecture
116
- - **Auth Service**: Core authentication logic
117
- - **JWT Tokens**: Token generation and validation
118
- [... 50 more lines of detailed docs ...]
123
+ ## 2026-03-16
119
124
 
120
- After:
121
- - [x] Feature X: Authentication System - JWT-based auth with session management. See [Authentication](./docs/features/authentication.md)
125
+ - Implemented feature X — added auth middleware and JWT validation
126
+ - Fixed bug Y null check on user profile load
127
+ - Refactored Z — extracted shared utilities from monolithic handler
128
+
129
+ ## 2026-03-10
130
+
131
+ - Added CI pipeline for staging deploys
132
+ - Added test coverage for API routes
122
133
  ```
123
134
 
124
- ### 5. Update Documentation Index
125
- - Ensure PLAN.md references all relevant docs files
126
- - Add any new docs files you created
127
- - Verify all links are correct
128
- - Add a Documentation section if it doesn't exist
135
+ ### Rules
129
136
 
130
- ### 6. Rewrite Next Actions
137
+ - Group by date (newest first)
138
+ - One line per item — concise description of what was done, not the original checkbox text
139
+ - If the completed item had substantial documentation (>20 lines), extract it to `docs/` and add a link: `- Feature X — see [docs/features/x.md](./docs/features/x.md)`
140
+ - Do NOT duplicate changelog entries — DONE.md captures plan-item completion, changelogs capture release-level changes
131
141
 
132
- Based on the interactive review, rebuild the "Next Actions" section:
133
- - Ask the user: "Based on what's left, what are your top 3-5 priorities right now?"
134
- - Present a suggested ordering based on what you learned, but let the user override
135
- - Make action items specific and actionable
142
+ ## Phase 5: Rebuild PLAN.md
136
143
 
137
- ### 7. Absorb GOALS.md Violations
144
+ Rewrite PLAN.md to be lean and actionable:
138
145
 
139
- If you found checkbox items or tactical details in GOALS.md during step 1:
140
- - Show the user what you found
141
- - Offer to move them into the appropriate PLAN.md section
142
- - Update GOALS.md to remove the tactical items (replace with outcome prose or remove entirely)
146
+ ### Target Structure
143
147
 
144
- ### 8. Commit Your Changes
145
- After reorganizing (if in a git repository):
146
- - Commit changes with a clear message like:
147
- ```
148
- docs: reorganize PLAN.md and extract completed work to docs
148
+ ```markdown
149
+ # Development Plan
149
150
 
150
- - Moved completed feature docs to docs/features/
151
- - Updated PLAN.md to focus on next actions
152
- - Added Next Actions section
153
- ```
151
+ For project mission and milestones, see [GOALS.md](./GOALS.md).
152
+ For completed work, see [DONE.md](./DONE.md).
154
153
 
155
- ## Guidelines
154
+ ## Next Up
156
155
 
157
- - **Verify, don't assume**: The whole point of this command is to sync PLAN.md with reality. Never mark items as done or still-pending without checking.
158
- - **Be thorough**: Read all completed items and assess documentation value
159
- - **Be surgical**: Only move substantial documentation (>20 lines), keep brief summaries in PLAN
160
- - **Be organized**: Group related content in docs files with clear headings
161
- - **Be consistent**: Match the style and format of existing docs files
162
- - **Be helpful**: Make it easy to find information by adding clear references
163
- - **Respect boundaries**: Tactical items in PLAN.md, strategic items in GOALS.md
164
- - **Batch intelligently**: Don't ask 20 individual questions — group related items and ask about sections at a time. Aim for 3-6 interactive checkpoints, not 20.
156
+ 1. **Item A**: Brief actionable description
157
+ 2. **Item B**: Brief actionable description
158
+ 3. **Item C**: Brief actionable description
165
159
 
166
- ## Example Output Structure
160
+ ## Backlog
167
161
 
168
- After running `/replan`, the PLAN.md should have:
169
- ```markdown
170
- # Project Name - Development Plan
162
+ - [ ] Item D: Description
163
+ - [ ] Item E: Description
171
164
 
172
- The tactical backlog. For mission and milestones, see [GOALS.md](./GOALS.md).
165
+ ## Future / Ideas
173
166
 
174
- ## Documentation
175
- - [Architecture Overview](./docs/ARCHITECTURE.md)
176
- - [API Reference](./docs/API.md)
167
+ - Item F: One-line description
168
+ - Item G: One-line description
169
+ ```
177
170
 
178
- ## Next Up
179
- - [ ] Feature C: Brief description with subtasks
171
+ ### Guidelines
180
172
 
181
- ## Remaining Work
182
- ### Known Issues
183
- - ...
184
- ### Testing Gaps
185
- - [ ] ...
173
+ - **"Next Up" is ordered** — numbered list, max 5 items, these are the immediate priorities
174
+ - **"Backlog" is unordered** — checkbox items that are planned but not prioritized
175
+ - **"Future / Ideas" has no checkboxes** — these are possibilities, not commitments
176
+ - **No completed items** — they're in DONE.md
177
+ - **No detailed docs** — link to `docs/` files instead
178
+ - **No section if it's empty** — don't include "Backlog" with zero items
186
179
 
187
- ## Future (v2.0+)
188
- - [ ] Feature D: Brief description of planned work
180
+ ## Phase 6: Absorb GOALS.md Violations
189
181
 
190
- ## Next Actions
182
+ If tactical items (checkboxes, implementation details) were found in GOALS.md:
183
+ - Move them into the appropriate PLAN.md section
184
+ - Update GOALS.md to remove tactical content
191
185
 
192
- 1. **Task 1**: Brief description of what needs to be done
193
- 2. **Task 2**: Brief description of next task
194
- 3. **Task 3**: Brief description of another task
186
+ ## Phase 7: Commit
195
187
 
196
- ## Completed Work (Archive)
197
- <details>
198
- <summary>v0.x Features</summary>
199
- - [x] Feature A - See [Feature A Docs](./docs/features/feature-a.md)
200
- - [x] Feature B - See [Feature B Docs](./docs/features/feature-b.md)
201
- </details>
188
+ Stage and commit all files modified during this replan:
189
+ ```bash
190
+ git add PLAN.md
191
+ # Stage optional files only if they exist and were modified
192
+ git add DONE.md 2>/dev/null || true
193
+ git add GOALS.md 2>/dev/null || true
194
+ git add docs/ 2>/dev/null || true
195
+ git commit -m "docs: replan — archive {N} completed items, update priorities"
202
196
  ```
203
197
 
198
+ Do NOT push unless explicitly asked.
199
+
204
200
  ## Notes
205
201
 
206
- - Don't delete information - move it to appropriate docs files
207
- - Keep related information consolidated in single docs files
208
- - Create feature-specific docs in docs/features/ for complex systems
209
- - Preserve all historical information but organize it better
210
- - If no PLAN.md exists, inform the user rather than creating one
211
- - Adapt to the existing structure and conventions of the project
212
- - If GOALS.md has task lists that belong in PLAN.md, migrate them
202
+ - If no PLAN.md exists, inform the user and offer to create one from codebase analysis
203
+ - The opportunity scanner suggestion is the key differentiator — every replan should surface at least one new idea
204
+ - DONE.md is append-only — never delete entries from it
205
+ - Keep PLAN.md under ~50 lines whenever possible — it should be scannable in seconds
206
+ - Adapt to existing project structure and conventions
@@ -73,7 +73,7 @@ With the flow understood, evaluate the changed code against these principles:
73
73
  - Function and variable names should communicate intent. If you need to read the implementation to understand what a name means, it's poorly named.
74
74
  - Boolean variables/params should read as predicates (`isReady`, `hasAccess`), not ambiguous nouns.
75
75
 
76
- Only flag principle violations that are **concrete and actionable** in the changed code. Do not flag pre-existing design issues in untouched code unless the changes make them worse.
76
+ For this review, only flag principle and design violations that are **concrete and actionable** in the code changed by this PR. However, if you discover a clear, real bug or correctness issue — even in code not directly modified here — call it out and help ensure it gets fixed (in this PR or a follow-up). Never dismiss serious problems as "out of scope" or "not modified in this PR."
77
77
 
78
78
  </review_instructions>
79
79
 
@@ -98,6 +98,27 @@ Check every file against this checklist. The checklist is organized into tiers
98
98
  - If the PR adds a new call to an external service that has established mock/test infrastructure (mock mode flags, test helpers, dev stubs), verify the new call uses the same patterns — bypassing them makes the new code path untestable in offline/dev environments and inconsistent with existing integrations
99
99
  - If the PR adds a new UI component or client-side consumer against an existing API endpoint, read the actual endpoint handler or response shape — verify every field name, nesting level, identifier property, and response envelope path used in the consumer matches what the producer returns. This is the #1 source of "renders empty" bugs in new views built against existing APIs
100
100
 
101
+ **Push/real-time event scoping**
102
+ - If the PR adds or modifies WebSocket, SSE, or pub/sub event emission, trace the event scope: does the event reach only the originating session/user, or is it broadcast to all connected clients? Check payloads for sensitive content (user inputs, images, tokens) that should not leak across sessions. If the consumer filters by a correlation ID, verify the producer includes one and that the ID is generated server-side or validated against the session
103
+
104
+ **Cleanup/teardown side effect audit**
105
+ - If the PR adds cleanup, teardown, or garbage-collection functions, trace whether the cleanup performs implicit state mutations (auto-merge into main, auto-commit of unreviewed changes, cascade writes to shared state). Verify the cleanup aborts safely if a prerequisite step fails (e.g., saving dirty state before deletion) rather than proceeding with data loss
106
+
107
+ **Specification/standard conformance**
108
+ - If the PR implements or extends a parser for a well-known format (cron expressions, date formats, URLs, semver, MIME types), verify boundary handling matches the specification — especially field-specific ranges (month starts at 1, not 0), normalization conventions (cron DOW 0 and 7 both mean Sunday), and step/range semantics that differ per field type
109
+
110
+ **Temporal context consistency**
111
+ - If the PR adds timezone-aware logic alongside existing non-timezone-aware comparisons in the same code flow (e.g., a weekday gate using UTC while cron matching uses user timezone), check that all temporal comparisons in the flow use the same timezone context — mixed contexts cause operations to trigger on the wrong local day/hour
112
+
113
+ **Status/health endpoint freshness**
114
+ - If the PR adds or modifies a status or health-check endpoint, trace whether it returns live probe results or cached data. Cached health checks mask real-time failures — a cache keyed by URL that survives URL reconfiguration reports stale status. Verify health endpoints bypass caches or use sufficiently short TTLs
115
+
116
+ **Boolean/type fidelity through serialization boundaries**
117
+ - If the PR persists boolean flags to text-based storage (markdown metadata, flat files, query strings, form data), trace the round-trip: write path → storage format → read/parse path → consumption site. Boolean `false` serialized as the string `"false"` is truthy in JavaScript — verify all consumption sites use strict equality or a dedicated coercion function, and that the same coercion is applied consistently
118
+
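The round-trip pitfall above is easy to demonstrate. `coerceBool` is an illustrative helper name (an assumption, not a real API); the point is that one dedicated coercion function is applied at every consumption site.

```javascript
// Write path: a boolean serialized into text-based storage becomes a string.
const stored = String(false); // "false"

// Buggy read path: any non-empty string is truthy, including "false".
const buggyEnabled = stored ? true : false; // true — wrong

// Safe read path: strict equality against the known representations.
function coerceBool(value) {
  return value === true || value === 'true';
}
const enabled = coerceBool(stored); // false — correct
```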
119
+ **Cross-layer invariant enforcement**
120
+ - If the PR introduces or modifies an invariant relationship between configuration flags (e.g., "flag A implies flag B"), trace enforcement through every layer: UI toggle handlers, form submission payloads, API validation schemas, server default-application functions, and persistence round-trip. If any layer allows the invariant to be violated, cascading defaults produce contradictory state
121
+
101
122
  **Error path completeness**
102
123
  - Trace each error path end-to-end: does the error reach the user with a helpful message and correct HTTP status? Or does it get swallowed, logged silently, or surface as a generic 500?
103
124
  - For multi-step operations (sync to N repos, batch updates): are per-item failures tracked separately from overall success? Does the status reflect partial failure accurately?
@@ -1,7 +1,12 @@
1
1
  ---
2
2
  description: Resolve PR review feedback with parallel agents
3
+ argument-hint: "[--interactive]"
3
4
  ---
4
5
 
6
+ **Default mode: fully autonomous.** Fetches review feedback, fixes issues, pushes, resolves threads, and loops Copilot reviews without prompting. Auto-skips on timeout/errors after retries.
7
+
8
+ **`--interactive` mode:** Pauses on Copilot review timeout and repeated errors to ask the user how to proceed.
9
+
5
10
  # Resolve PR Review Feedback
6
11
 
7
12
  Address the latest code review feedback on the current branch's pull request using parallel sub-agents.
@@ -18,6 +23,8 @@ Address the latest code review feedback on the current branch's pull request usi
18
23
  ```
19
24
  Save results to `/tmp/pr_threads.json` for parsing.
20
25
 
26
+ **Thread-count tracking**: Count and report total unresolved threads upfront (e.g., "Found 7 unresolved review threads"). After resolution, report how many were addressed vs. remaining (e.g., "Resolved 5/7 threads, 2 left unaddressed"). This prevents partial sessions from going unnoticed across context resets.
27
+
21
28
  4. **Spawn parallel sub-agents to address feedback**:
22
29
  - For small PRs (1-3 unresolved threads), handle fixes inline instead of spawning agents
23
30
  - For larger PRs, spawn one `Agent` call (general-purpose type) per review thread (or group closely related threads on the same file into one agent)
@@ -40,14 +47,14 @@ Address the latest code review feedback on the current branch's pull request usi
40
47
  - Stage all changed files and commit with a descriptive message summarizing what was addressed. Do not include co-author info.
41
48
  - Push to the branch.
42
49
 
43
- 8. **Resolve conversations**: For each addressed thread, resolve it via GraphQL mutation using stdin JSON. **Never use `$variables` in the query — inline the thread ID directly**:
50
+ 8. **Resolve conversations**: For each addressed thread, resolve it via GraphQL mutation using stdin JSON. Track resolution count against the total from step 3. **Never use `$variables` in the query — inline the thread ID directly**:
44
51
  ```bash
45
52
  echo '{"query":"mutation { resolveReviewThread(input: {threadId: \"THREAD_ID_HERE\"}) { thread { id isResolved } } }"}' | gh api graphql --input -
46
53
  ```
47
54
 
48
55
  9. **Request another Copilot review** (only if `is_fork_pr=false`): After pushing fixes, request a fresh Copilot code review and repeat from step 3 until the review passes clean. **Skip for fork-to-upstream PRs.**
49
56
 
50
- 10. **Report summary**: Print a table of all threads addressed with file, line, and a brief description of the fix.
57
+ 10. **Report summary**: Print a table of all threads addressed with file, line, and a brief description of the fix. Include a final count line: "Resolved X/Y threads." If any threads remain unresolved, list them with reasons (unclear feedback, disagreement, requires user input).
51
58
 
52
59
  !`cat ~/.claude/lib/graphql-escaping.md`
53
60
 
@@ -73,12 +80,13 @@ gh api graphql -f query='{ repository(owner: "OWNER", name: "REPO") { pullReques
73
80
 
74
81
  **Dynamic poll timing**: Before your first poll, check how long the most recent Copilot review on this PR took by comparing consecutive Copilot review `submittedAt` timestamps (or PR creation time for the first review). Use that duration as your expected wait. If no prior review exists, default to 5 minutes. Set poll interval to 60 seconds and max wait to **2x the expected duration** (minimum 5 minutes, maximum 20 minutes). Copilot reviews can take **10-15 minutes** for large diffs — do NOT give up early.
75
82
 
76
- The review is complete when a new `copilot-pull-request-reviewer` review node appears. If no review appears after max wait, **ask the user** whether to continue waiting, re-request, or skip.
83
+ The review is complete when a new `copilot-pull-request-reviewer` review node appears. If no review appears after max wait: **Default mode**: auto-skip and continue. **Interactive mode (`--interactive`)**: ask the user whether to continue waiting, re-request, or skip.
77
84
 
78
- **Error detection**: After a review appears, check its `body` for error text such as "Copilot encountered an error" or "unable to review this pull request". If found, this is NOT a successful review — log a warning, re-request the review (same API call above), and resume polling. Allow up to 3 error retries before asking the user whether to continue or skip.
85
+ **Error detection**: After a review appears, check its `body` for error text such as "Copilot encountered an error" or "unable to review this pull request". If found, this is NOT a successful review — log a warning, re-request the review (same API call above), and resume polling. Allow up to 3 error retries. After 3 failures: **Default mode**: auto-skip and continue. **Interactive mode (`--interactive`)**: ask the user whether to continue or skip.
79
86
 
80
87
  ## Notes
81
88
 
82
89
  - Only resolve threads where you've actually addressed the feedback
83
90
  - If feedback is unclear or incorrect, leave a reply comment instead of resolving
84
91
  - Always run tests before committing — never push code with known failures
92
+ - **Never dismiss findings as "out of scope" or "not modified in this PR."** If a review identifies a real issue, fix it — regardless of whether the current PR touched that code. Evaluate every finding on its merits. Don't leave trash on the floor.
@@ -16,7 +16,7 @@
16
16
  **Runtime correctness**
17
17
  - Null/undefined access without guards, off-by-one errors, object spread of potentially-null values (spread of null is `{}`, silently discarding state) or non-object values (spreading a string produces indexed character keys, spreading an array produces numeric keys) — guard with a plain-object check before spreading
18
18
  - Data from external/user sources (parsed JSON, API responses, file reads) used without structural validation — guard against parse failures, missing properties, wrong types, and null elements before accessing nested values. When parsed data is optional enrichment, isolate failures so they don't abort the main operation
19
- - Type coercion edge cases — `Number('')` is `0` not empty, `0` is falsy in truthy checks, `NaN` comparisons are always false; string comparison operators (`<`, `>`, `localeCompare`) do lexicographic, not semantic, ordering (e.g., `"10" < "2"`). Use explicit type checks (`Number.isFinite()`, `!= null`) and dedicated libraries (e.g., semver for versions) instead of truthy guards or lexicographic ordering when zero/empty are valid values or semantic ordering matters
19
+ - Type coercion edge cases — `Number('')` is `0` not empty, `0` is falsy in truthy checks, `NaN` comparisons are always false; string comparison operators (`<`, `>`, `localeCompare`) do lexicographic, not semantic, ordering (e.g., `"10" < "2"`). Use explicit type checks (`Number.isFinite()`, `!= null`) and dedicated libraries (e.g., semver for versions) instead of truthy guards or lexicographic ordering when zero/empty are valid values or semantic ordering matters. Boolean values round-tripping through text serialization (markdown metadata, query strings, form data, flat-file config) become strings — `"false"` is truthy in JavaScript, so truthiness checks on deserialized booleans silently treat explicit `false` as `true`. Use strict equality (`=== true`, `=== 'true'`) or a dedicated coercion function; ensure the same coercion is applied at every consumption site
20
20
  - Functions that index into arrays without guarding empty arrays; aggregate operations (`every`, `some`, `reduce`) on potentially-empty collections returning vacuously true/default values that mask misconfiguration or missing data; state/variables declared but never updated or only partially wired up
21
21
  - Parallel arrays or tuples coupled by index position (e.g., a names array, a promises array, and a destructuring assignment that must stay aligned) — insertion or reordering in one silently misaligns all others. Use objects/maps keyed by a stable identifier instead
22
22
  - Shared mutable references — module-level defaults passed by reference mutate across calls (use `structuredClone()`/spread); `useCallback`/`useMemo` referencing a later `const` (temporal dead zone); object spread followed by unconditional assignment that clobbers spread values
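The coercion edge cases listed above can be verified directly; the `port` example is illustrative.

```javascript
// Number('') silently coerces to 0 — an empty input looks like a valid zero.
const coerced = Number(''); // 0, not NaN

// 0 is falsy, so a truthy guard discards a legitimate zero value.
const port = 0;
const badGuard = port ? port : 8080;           // 8080 — drops the explicit 0
const goodGuard = port != null ? port : 8080;  // 0 — keeps it

// String comparison is lexicographic, not numeric.
const lexicographic = '10' < '2';              // true — '1' sorts before '2'
const numeric = Number('10') < Number('2');    // false — what was meant
```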
@@ -27,13 +27,15 @@
27
27
  - Route params passed to services without format validation; path containment checks using string prefix without path separator boundary (use `path.relative()`)
28
28
  - Parameterized/wildcard routes registered before specific named routes — the generic route captures requests meant for the specific endpoint (e.g., `/:id` registered before `/drafts` matches `/drafts` as `id="drafts"`). Verify route registration order or use path prefixes to disambiguate
29
29
  - Stored or external URLs rendered as clickable links (`href`, `src`, `window.open`) without protocol validation — `javascript:`, `data:`, and `vbscript:` URLs execute in the user's browser. Allowlist `http:`/`https:` (and `mailto:` if needed) before rendering; for all other schemes, render as plain text or strip the value
30
+ - Server-side HTTP requests using user-configurable or externally-stored URLs without protocol allowlisting (http/https only) and host/network restrictions — the server becomes an SSRF proxy for reaching internal network services, cloud metadata endpoints, or localhost-bound APIs. Validate scheme and restrict to expected hosts or external-only ranges before any server-side fetch
30
31
  - Error/fallback responses that hardcode security headers instead of using centralized policy — error paths bypass security tightening
31
32
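The SSRF check above can be sketched with the WHATWG `URL` parser. This is a minimal illustration, not a complete defense: the blocked-host list is an assumption, and production code should also resolve DNS and reject private and link-local ranges.

```javascript
// Validate a user-supplied URL before any server-side fetch.
function isSafeOutboundUrl(raw) {
  let url;
  try {
    url = new URL(raw);
  } catch {
    return false; // unparseable input is rejected, not guessed at
  }
  // Protocol allowlist: http/https only.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  // Illustrative host blocklist: loopback and cloud metadata endpoint.
  const blockedHosts = ['localhost', '127.0.0.1', '169.254.169.254'];
  return !blockedHosts.includes(url.hostname);
}
```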
 
32
33
  **Trust boundaries & data exposure**
33
34
  - API responses returning full objects with sensitive fields — destructure and omit across ALL response paths (GET, PUT, POST, error, socket); comments/docs claiming data isn't exposed while the code path does expose it
34
35
  - Server trusting client-provided computed/derived values (scores, totals, correctness flags, file metadata like MIME type and size) when the server can recompute or verify them — strip and recompute server-side; for file uploads, validate content type via magic bytes and size via actual buffer length rather than trusting client-supplied headers
35
- - New endpoints mounted under restricted paths (admin, internal) missing authorization verification — compare with sibling endpoints in the same route group to ensure the same access gate (role check, scope validation) is applied consistently
36
+ - New endpoints mounted under restricted paths (admin, internal) missing authorization verification — compare with sibling endpoints in the same route group to ensure the same access gate (role check, scope validation) is applied consistently. When new capabilities require additional OAuth scopes or API permissions, verify the scope-upgrade check covers all required scopes — a check that only tests for one scope will miss newly added scopes, causing downstream API calls to fail with insufficient permissions
36
37
  - User-controlled objects merged via `Object.assign`/spread without sanitizing keys — `__proto__`, `constructor`, and `prototype` keys enable prototype pollution. Use `Object.create(null)` for the target, whitelist allowed keys, and use `hasOwnProperty` (not `in`) to check membership. Also verify the merge can't override reserved/internal fields the system depends on
38
+ - Push events (WebSocket, SSE, pub/sub) emitted without scoping to the originating user or session — sensitive payloads (user content, tokens, progress data, images) leak to all connected clients in multi-user environments. Scope events to the requesting session via room/channel isolation or include a correlation ID the client provides at request time; verify consumers filter events by correlation ID before updating UI state
37
39
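The prototype-pollution-resistant merge described above can be sketched as follows; the allowed-key list is illustrative.

```javascript
// Whitelist-based merge: only copy approved keys, onto a null-prototype
// target, checking ownership with hasOwnProperty (not `in`).
function safeMerge(allowedKeys, userInput) {
  const target = Object.create(null); // no prototype to pollute
  for (const key of allowedKeys) {
    if (Object.prototype.hasOwnProperty.call(userInput, key)) {
      target[key] = userInput[key];
    }
  }
  return target;
}

// JSON.parse creates an own "__proto__" property rather than mutating the
// prototype, but a naive Object.assign/spread of this object would pollute.
const hostile = JSON.parse('{"__proto__": {"admin": true}, "name": "a"}');
const merged = safeMerge(['name'], hostile);
// merged.name === 'a'; merged.admin and ({}).admin stay undefined
```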
 
38
40
  ## Tier 2 — Check When Relevant (Data Integrity, Async, Error Handling)
39
41
 
@@ -43,7 +45,8 @@
43
45
  - Error notification at multiple layers (shared API client + component-level) — verify exactly one layer owns user-facing error messages. For periodic polling, also check that error notifications are throttled or deduplicated (only fire on state transitions like success→error, not on every failed iteration) and that failure doesn't make the UI section disappear entirely (component returning null when data is null/errored) — render an error or stale-data state instead of absence
44
46
  - Optimistic updates using full-collection snapshots for rollback — a second in-flight action gets clobbered. Use per-item rollback and functional state updaters after async gaps; sync optimistic changes to parent via callback or trigger refetch on remount. When appending items to a list optimistically, guard against duplicates (check existence before append) — concurrent or repeated operations can insert the same item multiple times
45
47
  - State updates guarded by truthiness of the new value (`if (arr?.length)`) — prevents clearing state when the source legitimately returns empty. Distinguish "no response" from "empty response"
46
- - Periodic/scheduled operations with skip conditions (gates, precondition checks, "nothing to do" early exits) that don't advance timing state (lastRun, nextFireTime) on skip — null or stale lastRun causes immediate re-trigger in a tight loop. Record the skip as an execution or compute the next fire time from now, not from the missing lastRun
48
+ - Periodic/scheduled operations with skip conditions (gates, precondition checks, "nothing to do" early exits) that don't advance timing state (lastRun, nextFireTime) on skip — null or stale lastRun causes immediate re-trigger in a tight loop. Record the skip as an execution or compute the next fire time from now, not from the missing lastRun. Also check the initial baseline for never-run items: using epoch (0) or distant-past as the default "last run" makes schedule-based items appear immediately due on first evaluation, while using "now" may cause them to never become due — choose a baseline that correctly represents "first occurrence after activation"
+ - Cached values keyed without all relevant discriminators (base URL, tenant ID, environment, configuration version) — context changes (URL reconfiguration, tenant switch) serve stale cached data from the previous context. Health/status endpoints that return cached results instead of live probes mask real-time failures, reporting "connected" when the service is unreachable. Key caches by their full context and bypass or invalidate caches for availability checks
  - Mutation/trigger functions that return or propagate stale pre-mutation state — if a function activates, updates, or resets an entity, the returned value and any dependent scheduling/evaluation state (backoff timers, "last run" timestamps, status flags) must reflect the post-mutation state, not a snapshot read before the mutation
  - Fire-and-forget or async writes where the in-memory object is not updated (response returns stale data) or is updated unconditionally regardless of write success (response claims state that was never persisted) — update in-memory state conditionally on write outcome, or document the tradeoff explicitly. Also applies to responses and business-logic decisions (threshold triggers, status transitions) derived from pre-transaction reads — concurrent writers all read the same stale value, so thresholds may be crossed without triggering the transition. Compute from post-write state or use conditional expressions that evaluate the stored value. For monotonic counters (sequence numbers, cursors) that must stay in lockstep with append-only storage, advancing before the write risks the counter running ahead on failure; not advancing after a partial write risks reuse — reserve the range before writing and commit only on success. Also check for dependent side effects (rewards, notifications, secondary uploads, resource allocation) executing in parallel with or before the primary write they depend on — if the primary write fails or is rejected (lock contention, dedup, validation), the side effects are irrecoverable (orphaned uploads, unearned rewards, phantom notifications). Gate side effects on confirmed primary write success
  - Error/early-exit paths that return status metadata (pagination flags, truncation indicators, hasMore, completion markers) or emit events (WebSocket, SSE, pub/sub) with default/initial values instead of reflecting actual accumulated state — downstream consumers make incorrect decisions (e.g., treating a failed sync as successful because the completion event was emitted unconditionally). Set metadata flags and event payloads based on actual outcome, not just the final request's exit path. Also check paired lifecycle events (started/completed/failed): if a function emits a "started" event, every exit path — including early returns and no-op branches — must emit the corresponding "completed" or "failed" event, or clients waiting for completion will hang or show stale state
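The timing-state bullets above lend themselves to a small sketch. This is a hypothetical JavaScript illustration (the `job` shape, `nextFireBuggy`, and `nextFireFixed` are invented names, not part of this command), showing why a null `lastRun` makes an item immediately due:

```javascript
// Anti-pattern: deriving the next fire time from lastRun, which may be
// null (never run) or stale. In JS, null coerces to 0, i.e. the epoch,
// so the item looks decades overdue and re-triggers in a tight loop.
function nextFireBuggy(job, intervalMs) {
  return job.lastRun + intervalMs;
}

// Fix: treat "never run" as "first occurrence after activation", and on
// a skipped run advance from now rather than from the missing lastRun.
function nextFireFixed(job, now, intervalMs) {
  const base = job.lastRun == null ? now : Math.max(job.lastRun, now);
  return base + intervalMs;
}

const now = 1_000_000;
const buggy = nextFireBuggy({ lastRun: null }, 60_000);      // 60000: already "due"
const fixed = nextFireFixed({ lastRun: null }, now, 60_000); // 1060000: due in one interval
const skipped = nextFireFixed({ lastRun: now - 5_000 }, now, 60_000);
```

Whether `Math.max` or a plain `now + intervalMs` is right depends on the scheduler's catch-up semantics; the point is only that the baseline must never be the missing `lastRun`.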
@@ -62,7 +65,7 @@
 
  **Resource management** _[applies when: code uses event listeners, timers, subscriptions, or useEffect]_
  - Event listeners, socket handlers, subscriptions, timers, and useEffect side effects are cleaned up on unmount/teardown
- - Deletion/destroy and state-reset functions that clean up or reset the primary resource but leave orphaned or inconsistent secondary resources (data directories, git branches, child records, temporary files, per-user flag/vote items) — trace all resources created during the entity's lifecycle and verify each is removed on delete. For state transitions that reset aggregate values (counters, scores, flags), also clear or version the individual records that contributed to those aggregates — otherwise the aggregate and its sources disagree, and duplicate-prevention checks block legitimate re-entry
+ - Deletion/destroy and state-reset functions that clean up or reset the primary resource but leave orphaned or inconsistent secondary resources (data directories, git branches, child records, temporary files, per-user flag/vote items) — trace all resources created during the entity's lifecycle and verify each is removed on delete. For state transitions that reset aggregate values (counters, scores, flags), also clear or version the individual records that contributed to those aggregates — otherwise the aggregate and its sources disagree, and duplicate-prevention checks block legitimate re-entry. Also check cleanup operations that perform implicit state mutations (auto-merge, auto-commit, cascade writes) as part of teardown — these can introduce unreviewed changes or silently modify shared state. Verify cleanup fails safely when a prerequisite step (e.g., saving dirty state) fails rather than proceeding with data loss
  - Initialization functions (schedulers, pollers, listeners) that don't guard against multiple calls — creates duplicate instances. Check for existing instances before reinitializing
  - Self-rescheduling callbacks (one-shot timers, deferred job handlers) where the next cycle is registered inside the callback body — an unhandled error before the re-registration call permanently stops the schedule. Wrap the callback body in try/finally with re-registration in the finally block, or register the next cycle before executing the current one
 
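The self-rescheduling bullet above can be sketched as follows. `makeScheduler` and the injectable `setTimer` parameter are hypothetical constructions for illustration, assuming a one-shot timer API like `setTimeout`:

```javascript
// Re-registration lives in a finally block, so an error thrown by runJob
// cannot permanently stop the schedule.
function makeScheduler(runJob, intervalMs, setTimer = setTimeout) {
  let stopped = false;
  function tick() {
    try {
      runJob();
    } finally {
      if (!stopped) setTimer(tick, intervalMs); // always re-register
    }
  }
  setTimer(tick, intervalMs);
  return () => { stopped = true; }; // teardown hook (see the cleanup bullets)
}
```

The alternative the bullet mentions, registering the next cycle before executing the current one, trades timing accuracy for the same crash safety.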
@@ -82,11 +85,13 @@
  - Code reading properties from API responses, framework-provided objects, or internal abstraction layers using field names the source doesn't populate or forward — silent `undefined`. Verify property names and nesting depth match the actual response shape (e.g., `response.items` vs `response.data.items`, `obj.placeId` vs `obj.id`, flat fields vs nested sub-objects). When building a new consumer against an existing API, check the producer's actual response — not assumed conventions. When branching on fields from a wrapped third-party API, confirm the wrapper actually requests and forwards those fields (e.g., optional response attributes that require explicit opt-in)
  - Data model fields that have different names depending on the creation/write path (e.g., `createdAt` vs `created`) — code referencing only one naming convention silently misses records created through other paths. Trace all write paths to discover the actual field names in use. When new logic (access control, UI display, queries) checks only a newly introduced field, verify it falls back to any legacy field that existing records still use — otherwise records created before the migration are silently excluded or inaccessible. Also check entity identity keys: if code looks up or matches entities using a computed key (e.g., `e.id || e.externalId`), all code paths that perform the same lookup must use the same key computation — one path using `e.id` while another uses `e.id || e.externalId` causes mismatches for entities missing the primary key
  - Entity type changes without invariant revalidation — when an entity has a discriminator field (type, kind, category) and the user changes it, all type-specific invariants must be enforced on the new type AND type-specific fields from the old type must be cleared or revalidated. A job changing from `shell` to `agent` without clearing `command`, or changing to `shell` without requiring `command`, leaves the entity in an invalid hybrid state that fails at runtime or resurfaces stale data
+ - Invariant relationships between configuration flags (flag A implies flag B) not enforced across all layers — UI toggle handlers, API validation schemas, server default-application functions, and serialization/deserialization must all preserve the invariant. If any layer allows setting A=true with B=false (or vice versa), cascading defaults and toggle logic produce contradictory state. Trace the invariant through: UI state handlers, form submission, route validation, service defaults, and persistence round-trip
  - Operations scoped to a specific entity subtype that don't verify the entity's type discriminator before processing — an endpoint or function designed for one account/entity type that accepts any entity by ID can corrupt state or produce wrong results when called with the wrong type. Add an explicit type guard and return a structured error
- - Inconsistent "missing value" semantics across layers — one layer treats `null`/`undefined` as missing while another also treats empty strings or whitespace-only strings as missing. Query filters, update expressions, and UI predicates that disagree on what constitutes "missing" cause records to be skipped by one path but processed by another. Define a single `isMissing` predicate and use it consistently, or normalize empty/whitespace values to `null` at write time. Also applies to comparison/detection logic: coercing an absent field to a sentinel (`?? 0`, default parameters) makes the logic treat "unsupported" as a real value — guard with an explicit presence check before comparing. Watch for validation/sanitization functions that return `null` for invalid input when `null` also means "clear/delete" downstream — malformed input silently destroys existing data. Distinguish "invalid, reject the request" from "explicitly clear this field"
+ - Inconsistent "missing value" semantics across layers — one layer treats `null`/`undefined` as missing while another also treats empty strings or whitespace-only strings as missing. Query filters, update expressions, and UI predicates that disagree on what constitutes "missing" cause records to be skipped by one path but processed by another. Define a single `isMissing` predicate and use it consistently, or normalize empty/whitespace values to `null` at write time. Also applies to comparison/detection logic: coercing an absent field to a sentinel (`?? 0`, default parameters) makes the logic treat "unsupported" as a real value — guard with an explicit presence check before comparing. Watch for validation/sanitization functions that return `null` for invalid input when `null` also means "clear/delete" downstream — malformed input silently destroys existing data. Distinguish "invalid, reject the request" from "explicitly clear this field". Also applies to normalization (trailing slashes, case, whitespace): if one path normalizes a value before comparison but the write path stores it un-normalized, comparisons against the stored value produce incorrect results — normalize at write time or normalize both sides consistently
+ - Validation functions that delegate to runtime-behavior computations (next schedule occurrence, URL reachability, resource resolution) — conflating "no result within search window" or "temporarily unavailable" with "invalid input" rejects valid configurations. Validate syntax and structure independently of runtime feasibility
  - Numeric values from strings used without `NaN`/type guards — `NaN` comparisons silently pass bounds checks. Clamp query params to safe lower bounds
  - UI elements hidden from navigation but still accessible via direct URL — enforce restrictions at the route level
- - Summary counters/accumulators that miss edge cases (removals, branch coverage, underflow on decrements — guard against going negative with lower-bound conditions); silent operations in verbose sequences where all branches should print status
+ - Summary counters/accumulators that miss edge cases (removals, branch coverage, underflow on decrements — guard against going negative with lower-bound conditions); counters incremented before confirming the operation actually changed state — rejected, skipped, or no-op iterations inflate success counts. Silent operations in verbose sequences where all branches should print status
 
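A minimal sketch of the single shared predicate the "missing value" bullet above recommends; treating whitespace-only strings as missing is an assumption about the codebase, and `supportsVersion` is an invented example consumer:

```javascript
// One definition of "missing", used by every layer (queries, updates, UI).
function isMissing(value) {
  return value == null || (typeof value === 'string' && value.trim() === '');
}

// Explicit presence check instead of coercing absence to a sentinel (`?? 0`):
// an absent capability means "unsupported", not "version 0".
function supportsVersion(reported, required) {
  if (isMissing(reported)) return false;
  return Number(reported) >= required;
}
```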
  **Concurrency & data integrity** _[applies when: code has shared state, database writes, or multi-step mutations]_
  - Shared mutable state accessed by concurrent requests without locking or atomic writes; multi-step read-modify-write cycles that can interleave — use conditional writes/optimistic concurrency (e.g., condition expressions, version checks) to close the gap between read and write; if the conditional write fails, surface a retryable error instead of letting it bubble as a 500
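The read-modify-write bullet above maps to the classic version-check pattern. A sketch against an in-memory map standing in for a store with conditional writes (the names and error class are hypothetical):

```javascript
// Thrown instead of a generic error so callers can retry rather than 500.
class RetryableConflict extends Error {}

const store = new Map(); // key -> { value, version }

function conditionalWrite(key, expectedVersion, newValue) {
  const current = store.get(key);
  const currentVersion = current ? current.version : 0;
  if (currentVersion !== expectedVersion) {
    throw new RetryableConflict(`version mismatch for ${key}`);
  }
  store.set(key, { value: newValue, version: currentVersion + 1 });
}
```

In a real database this compare-and-set must be a single conditional operation (e.g. a condition expression), not the two separate steps of this single-threaded sketch.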
@@ -98,7 +103,7 @@
 
  **Input handling** _[applies when: code accepts user/external input]_
  - Trimming values where whitespace is significant (API keys, tokens, passwords, base64) — only trim identifiers/names
- - Endpoints accepting unbounded arrays/collections without upper limits — enforce max size or move to background jobs
+ - Endpoints accepting unbounded arrays/collections without upper limits — enforce max size or move to background jobs. Also check internal operations that fan out unbounded parallel I/O (e.g., `Promise.all(files.map(readFile))`) — large collections risk EMFILE (too many open file descriptors) or memory exhaustion. Use a concurrency limiter or batch processing for collections that can grow without bound
  - Security/sanitization functions (redaction, escaping, validation) that only handle one input format — if data can arrive in multiple formats (JSON `"KEY": "value"`, shell `KEY=value`, URL-encoded, headers), the function must cover all formats present in the system or sensitive data leaks through the unhandled format
 
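The unbounded fan-out bullet above can be addressed with a small concurrency limiter. This is a sketch (a production codebase would more likely reach for a library such as p-limit), with `worker` as the per-item async task:

```javascript
// Runs at most `limit` workers at a time; results keep input order.
async function mapLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // shared cursor; safe because JS is single-threaded
  async function lane() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}
```

Unlike `Promise.all(files.map(readFile))`, the number of in-flight operations never exceeds `limit`, which bounds open file descriptors and memory.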
  ## Tier 3 — Domain-Specific (Check Only When File Type Matches)
@@ -124,6 +129,7 @@
  **Lazy initialization & module loading** _[applies when: code uses dynamic imports, lazy singletons, or bootstrap sequences]_
  - Cached state getters returning null before initialization — provide async initializer or ensure-style function
  - Module-level side effects (file reads, SDK init) without error handling — corrupted files crash the process on import
+ - File writes that assume the parent directory exists — on fresh installs or after directory cleanup, the write fails with ENOENT. Ensure the directory exists before writing (or create it on demand)
  - Bootstrap/resilience code that imports the dependencies it's meant to install — restructure so installation precedes resolution
  - Re-exporting from heavy modules defeats lazy loading — use lightweight shared modules
 
@@ -167,7 +173,7 @@
  - Completion markers, success flags, or status files written before the operation they attest to finishes — consumers see false success if the operation fails after the write
  - Existence checks (directory exists, file exists, module resolves) used as proof of correct/complete installation — a directory can exist but be empty, a file can exist with invalid contents. Verify the specific resource the consumer needs
  - Lookups that check only one scope when multiple exist — e.g., checking local git branches but not remote, checking in-memory cache but not persistent store. Trace all locations where the resource could exist and check each
- - Tracking/checkpoint files that default to empty on parse failure — causes full re-execution. Fail loudly instead
+ - Tracking/checkpoint files that default to empty on parse failure — causes full re-execution. Fail loudly instead. More broadly, safety/guard checks that catch errors and default to "safe to proceed" (fail-open) rather than treating errors as "unsafe, abort" (fail-closed) — a guard that silently succeeds on error provides no protection when it's needed most
  - Registering references to resources without verifying the resource exists — dangling references after failed operations
 
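The fail-open versus fail-closed distinction above, sketched for a checkpoint loader; `readFileFn` is injected so the failure modes are easy to see, and the checkpoint shape is assumed:

```javascript
function loadCheckpoint(readFileFn) {
  let raw;
  try {
    raw = readFileFn();
  } catch (err) {
    if (err.code === 'ENOENT') return { completed: [] }; // genuinely a first run
    throw err; // unreadable file: fail loudly, not open
  }
  try {
    return JSON.parse(raw);
  } catch {
    // Fail closed: a corrupt checkpoint must abort, not trigger full re-execution.
    throw new Error('checkpoint file is corrupt; refusing to default to empty');
  }
}
```

Only a confirmed-absent file means "first run"; every other error is treated as "unsafe, abort".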
  **Automated pipeline discipline**
@@ -56,8 +56,8 @@ Run the following loop until Copilot returns zero new comments:
  - Error detection: if the review body contains "Copilot encountered an
  error" or "unable to review this pull request", re-request (step 1)
  and resume polling. Max 3 error retries before reporting failure.
- - If no review appears after max wait, report the timeout — the parent
- agent will ask the user what to do
+ - If no review appears after max wait, report the timeout.
+ **Default mode**: skip and continue. **Interactive mode (`--interactive`)**: ask the user what to do
 
  3. CHECK for unresolved comments:
  - Filter review threads for isResolved: false
@@ -70,6 +70,7 @@ Run the following loop until Copilot returns zero new comments:
  4. FIX all unresolved review comments:
  For each unresolved thread:
  - Read the referenced file and understand the feedback
+ - Evaluate whether the finding is a real issue — if it is, fix it regardless of whether the current PR modified that code. Never dismiss findings as "out of scope" or "pre-existing."
  - Make the code fix
  - Run the build command
  - If build passes, commit: address review: <summary>
@@ -78,8 +79,8 @@ Run the following loop until Copilot returns zero new comments:
  - After all threads resolved, push all commits to remote
  - Increment iteration counter
  - If iteration counter reaches 10, stop the loop and report back with
- status "guardrail" the parent agent will ask the user whether to
- continue or stop
+ status "guardrail". **Default mode**: auto-stop and mark as best-effort.
+ **Interactive mode (`--interactive`)**: ask the user whether to continue or stop
  - Otherwise, go back to step 1
 
  When done, report back:
@@ -89,4 +90,8 @@ When done, report back:
  - Any unresolved threads remaining
  ```
 
- Launch the sub-agent and wait for its result. If the sub-agent reports a timeout or error, **ask the user** whether to continue waiting, re-request the review, or skip — never proceed without user approval when the review loop fails.
+ Launch the sub-agent and wait for its result.
+
+ **Default mode**: If the sub-agent reports a timeout or error, skip the timed-out review and continue autonomously.
+
+ **Interactive mode (`--interactive`)**: If the sub-agent reports a timeout or error, ask the user whether to continue waiting, re-request the review, or skip.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "slash-do",
- "version": "1.8.0",
+ "version": "2.0.0",
  "description": "Curated slash commands for AI coding assistants — Claude Code, OpenCode, Gemini CLI, and Codex",
  "author": "Adam Eivy <adam@eivy.com>",
  "license": "MIT",