slash-do 0.1.1

@@ -0,0 +1,587 @@
1
+ ---
2
+ description: Unified DevSecOps audit, remediation, per-category PRs, CI verification, and Copilot review loop with worktree isolation
3
+ argument-hint: "[--scan-only] [--no-merge] [path filter or focus areas]"
4
+ ---
5
+
6
+ # MakeGood — Unified DevSecOps Pipeline
7
+
8
+ Run the full DevSecOps lifecycle: audit the codebase with 7 deduplicated agents, consolidate findings, remediate in an isolated worktree, create **separate PRs per category** with SemVer bump, verify CI, run Copilot review loops, and merge.
9
+
10
+ Parse `$ARGUMENTS` for:
11
+ - **`--scan-only`**: run Phase 0 + 1 + 2 only (audit and plan), skip remediation
12
+ - **`--no-merge`**: run through PR creation (Phase 5), skip Copilot review and merge
13
+ - **Path filter**: limit scanning scope to specific directories or files
14
+ - **Focus areas**: e.g., "security only", "DRY and bugs"
15
+
16
+ ## Phase 0: Discovery & Setup
17
+
18
+ Detect the project environment before any scanning or remediation.
19
+
20
+ ### 0a: VCS Host Detection
21
+ Run `gh auth status` to check whether the GitHub CLI is authenticated. If it fails, run `glab auth status` to check the GitLab CLI.
22
+ - Set `VCS_HOST` to `github` or `gitlab`
23
+ - Set `CLI_TOOL` to `gh` or `glab`
24
+ - If neither is authenticated, warn the user and halt
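The detection order above can be sketched as a small shell function. This is only an illustration: the function name `detect_vcs_host` and the "host tool" output format are assumptions, not part of the package.

```bash
# Sketch: pick the first authenticated CLI.
# Prints "github gh" or "gitlab glab"; returns non-zero when neither works.
# (detect_vcs_host is a hypothetical helper name.)
detect_vcs_host() {
  if gh auth status >/dev/null 2>&1; then
    echo "github gh"
  elif glab auth status >/dev/null 2>&1; then
    echo "gitlab glab"
  else
    echo "Neither gh nor glab is authenticated" >&2
    return 1
  fi
}
```

A caller could split the two printed words into `VCS_HOST` and `CLI_TOOL`, and halt when the function returns non-zero.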
25
+
26
+ ### 0b: Project Type Detection
27
+ Check for project manifests to determine the tech stack:
28
+ - `package.json` → Node.js (check for `next`, `react`, `vue`, `express`, etc.)
29
+ - `Cargo.toml` → Rust
30
+ - `pyproject.toml` / `requirements.txt` → Python
31
+ - `go.mod` → Go
32
+ - `pom.xml` / `build.gradle` → Java/Kotlin
33
+ - `Gemfile` → Ruby
34
+ - `*.csproj` / `*.sln` → .NET
35
+
36
+ Record the detected stack as `PROJECT_TYPE` for agent context.
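A minimal sketch of the manifest check, assuming the first matching manifest wins when a repository contains several. The function name and the short labels (`node`, `rust`, `jvm`, and so on) are hypothetical; the source only says to record the result as `PROJECT_TYPE`.

```bash
# Sketch: map the manifests listed above to a short stack label.
# First match wins; that precedence is an assumption, not stated in the source.
detect_project_type() {
  dir="${1:-.}"
  [ -f "$dir/package.json" ] && { echo "node"; return 0; }
  [ -f "$dir/Cargo.toml" ] && { echo "rust"; return 0; }
  if [ -f "$dir/pyproject.toml" ] || [ -f "$dir/requirements.txt" ]; then
    echo "python"; return 0
  fi
  [ -f "$dir/go.mod" ] && { echo "go"; return 0; }
  if [ -f "$dir/pom.xml" ] || [ -f "$dir/build.gradle" ]; then
    echo "jvm"; return 0
  fi
  [ -f "$dir/Gemfile" ] && { echo "ruby"; return 0; }
  # Globs stay literal when nothing matches, so test each candidate explicitly.
  for f in "$dir"/*.csproj "$dir"/*.sln; do
    [ -e "$f" ] && { echo "dotnet"; return 0; }
  done
  echo "unknown"
}
```

For example, `PROJECT_TYPE="$(detect_project_type .)"` would record the label for the current directory. Detecting frameworks such as `next` or `react` inside `package.json` is left out of this sketch.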
37
+
38
+ ### 0c: Build & Test Command Detection
39
+ Derive build and test commands from the project type:
40
+ - Node.js: check `package.json` scripts for `build`, `test`, `typecheck`, `lint`
41
+ - Rust: `cargo build`, `cargo test`
42
+ - Python: `pytest`, `python -m pytest`
43
+ - Go: `go build ./...`, `go test ./...`
44
+ - If ambiguous, check CLAUDE.md for documented commands
45
+
46
+ Record as `BUILD_CMD` and `TEST_CMD`.
47
+
48
+ ### 0d: State Snapshot
49
+ - Record `CURRENT_BRANCH` via `git rev-parse --abbrev-ref HEAD`
50
+ - Record `DEFAULT_BRANCH` via `gh repo view --json defaultBranchRef --jq '.defaultBranchRef.name'` (or `glab` equivalent)
51
+ - Record `IS_DIRTY` via `git status --porcelain`
52
+ - Check for `.changelog/` directory → `HAS_CHANGELOG`
53
+ - Check for existing `../makegood-*` worktrees: `git worktree list`. If found, inform the user and ask whether to resume (use existing worktree) or clean up (remove it and start fresh)
54
+
55
+ ### 0e: Browser Authentication (GitHub only)
56
+ If `VCS_HOST` is `github`, proactively verify browser authentication for the Copilot review loop later:
57
+ 1. Navigate to the repo URL using `browser_navigate` via Playwright MCP
58
+ 2. Take a snapshot and check for user avatar/menu indicating logged-in state
59
+ 3. If NOT logged in: navigate to `https://github.com/login`, inform the user **"Please log in to GitHub in the browser. I'll wait for you to complete authentication."**, and use `AskUserQuestion` to wait for the user to confirm they've logged in
60
+ 4. Do NOT close the browser — it stays open for the entire session
61
+ 5. Record `BROWSER_AUTHENTICATED = true` once confirmed
62
+
63
+ This ensures the browser is ready before we need it in Phase 6, avoiding interruptions mid-flow.
64
+
65
+ ## Phase 1: Unified Audit
66
+
67
+ Read the project's CLAUDE.md files first to understand conventions. Pass relevant conventions to each agent.
68
+
69
+ Launch 7 Explore agents in two batches. Each agent must report findings in this format:
70
+ ```
71
+ - **[CRITICAL/HIGH/MEDIUM/LOW]** `file:line` - Description. Suggested fix: ... Complexity: Simple/Medium/Complex
72
+ ```
73
+
74
+ **IMPORTANT: Context requirement for audit agents.** When flagging an issue, agents MUST read at least 30 lines of surrounding context to confirm the issue is real. Common false positives to watch for:
75
+ - A Promise `.then()` chain that appears "unawaited" but IS collected into an array and awaited via `Promise.all` downstream
76
+ - A value that appears "unvalidated" but IS checked by a guard clause earlier in the function or by the caller
77
+ - A pattern that looks like an anti-pattern in isolation but IS idiomatic for the specific framework or library being used
78
+ - An `async` function called without `await` that IS intentionally fire-and-forget (the return value is unused by design)
79
+
80
+ If the surrounding context shows the code is correct, do NOT flag it.
81
+
82
+ ### Batch 1 (5 parallel Explore agents via Task tool):
83
+
84
+ 1. **Security & Secrets**
85
+ Sources: authentication checks, credential exposure, infrastructure security, input validation, dependency health
86
+ Focus: hardcoded credentials, API keys, exposed secrets, authentication bypasses, disabled security checks, PII exposure, injection vulnerabilities (SQL/command/path traversal), insecure CORS configurations, missing auth checks, unsanitized user input in file paths or queries, known CVEs in dependencies (check `npm audit` / `cargo audit` / `pip-audit` / `govulncheck` output), abandoned or unmaintained dependencies, overly permissive dependency version ranges
87
+
88
+ 2. **Code Quality & Style**
89
+ Sources: code brittleness, convention violations, test workarounds, logging & observability
90
+ Focus: magic numbers, brittle conditionals, hardcoded execution paths, test-specific hacks, narrow implementations that pass specific cases but lack generality, dead/unreachable code, unused imports/variables, violations of CLAUDE.md conventions (try/catch usage, window.alert/confirm, class-based code where functional preferred), anti-patterns specific to the detected tech stack, inconsistent or missing structured logging (raw `console.log`/`print` in production code instead of a logger), missing log levels or correlation IDs, swallowed errors (empty catch blocks, `.catch(() => {})`, bare `except: pass`), missing request/response logging at API boundaries
91
+
92
+ 3. **DRY & YAGNI**
93
+ Sources: duplication patterns, speculative abstractions
94
+ Focus: duplicate code blocks, copy-paste patterns, redundant implementations, repeated inline logic (count duplications per pattern, e.g., "DATA_DIR declared 20+ times"), speculative abstractions, unused features, over-engineered solutions, premature optimization, YAGNI violations
95
+
96
+ 4. **Architecture & SOLID**
97
+ Sources: structural violations, coupling analysis, modularity, API contract quality
98
+ Focus: Single Responsibility violations (god files >500 lines, functions >50 lines doing multiple things), tight coupling between modules, circular dependencies, mixed concerns in single files, dependency inversion violations, classes/modules with too many responsibilities (>20 public methods), deep nesting (>4 levels), long parameter lists, modules reaching into other modules' internals, inconsistent API error response shapes across endpoints, list endpoints missing pagination, missing rate limiting on public endpoints, inconsistent request/response envelope patterns
99
+
100
+ 5. **Bugs, Performance & Error Handling**
101
+ Sources: runtime safety, resource management, async correctness, performance, race conditions
102
+ Focus: missing `await` on async calls, unhandled promise rejections, null/undefined access without guards, off-by-one errors, incorrect comparison operators, mutation of shared state, resource leaks (unbounded caches/maps, unclosed connections/streams), `process.exit()` in library code, async routes without error forwarding, missing AbortController on data fetching, N+1 query patterns (loading related records inside loops), O(n²) or worse algorithms in hot paths, unbounded result sets (missing LIMIT/pagination on DB queries), missing database indexes on frequently queried columns, race conditions (TOCTOU, double-submit without idempotency keys, concurrent writes to shared state without locks, stale-read-then-write patterns), missing connection pooling or pool exhaustion
103
+
104
+ ### Batch 2 (2 agents after Batch 1 completes):
105
+
106
+ 6. **Stack-Specific**
107
+ Dynamically focus based on `PROJECT_TYPE` detected in Phase 0:
108
+ - **Node/React**: missing cleanup in useEffect, stale closures, unstable deps arrays, duplicate hooks across components, re-created functions inside render, missing AbortController, bundle size concerns (large imports that could be tree-shaken or lazy-loaded)
109
+ - **Rust**: unsafe blocks, lifetime issues, unwrap() in non-test code, clippy warnings
110
+ - **Python**: mutable default arguments, bare except clauses, missing type hints on public APIs, sync I/O in async contexts
111
+ - **Go**: unchecked errors, goroutine leaks, defer in loops, context propagation gaps
112
+ - **Web projects (any stack)**: accessibility issues — missing alt text on images, broken keyboard navigation, missing ARIA labels on interactive elements, insufficient color contrast, form inputs without associated labels
113
+ - General: framework-specific security issues, language-specific gotchas, domain-specific compliance, environment variable hygiene (missing `.env.example`, required env vars not validated at startup, secrets in config files that should be in env)
114
+
115
+ 7. **Test Coverage**
116
+ Uses Batch 1 findings as context to prioritize:
117
+ Focus: missing test files for critical modules, untested edge cases, tests that only cover happy paths, mocked dependencies that hide real bugs, areas with high complexity (identified by agents 1-5) but no tests, test files that don't actually assert anything meaningful
118
+
119
+ Wait for ALL agents to complete before proceeding.
120
+
121
+ ## Phase 2: Plan Generation
122
+
123
+ 1. Read the existing `PLAN.md` (create if it doesn't exist)
124
+ 2. Consolidate all findings from Phase 1, deduplicating across agents (same file:line flagged by multiple agents → keep the most specific description)
125
+ 3. Identify **shared utility extractions** — patterns duplicated 3+ times that should become reusable functions. Group these as "Foundation" work for Phase 3b.
126
+ 4. **Build the file ownership map** (CRITICAL for Phase 5):
127
+ - For each finding, record which file(s) it touches
128
+ - Assign each file to exactly ONE category (its primary category)
129
+ - If a file is touched by multiple categories, assign it to the category with the highest-severity finding for that file
130
+ - Record the mapping as `FILE_OWNER_MAP` — this ensures no two PRs modify the same file
131
+ - If a module extraction creates a new file (e.g., extracting `mediaConvert.js` from `dbCrud.js`), add a backward-compatible re-export in the original file so other PRs don't break
132
+ 5. Add a new section to PLAN.md: `## MakeGood Audit - {YYYY-MM-DD}`
133
+
134
+ ```markdown
135
+ ## MakeGood Audit - {date}
136
+
137
+ Summary: {N} findings across {M} files. {X} shared utilities to extract.
138
+
139
+ ### Foundation — Shared Utilities
140
+ For each utility: name, purpose, files it replaces, signature sketch.
141
+
142
+ ### File Ownership Map
143
+ | File | Primary Category | Reason |
144
+ For each file touched by multiple categories, document why it was assigned to one.
145
+
146
+ ### Security & Secrets
147
+ - [ ] **[CRITICAL]** `file:line` - Description — Fix: ... (Complexity: Simple/Medium/Complex)
148
+
149
+ ### Code Quality
150
+ - [ ] **[HIGH]** `file:line` - Description — Fix: ...
151
+
152
+ ### DRY & YAGNI
153
+ - [ ] **[MEDIUM]** `file:line` - Description — Fix: ...
154
+
155
+ ### Architecture & SOLID
156
+ ### Bugs, Performance & Error Handling
157
+ ### Stack-Specific
158
+ ### Test Coverage (tracked, not auto-remediated)
159
+ ```
160
+
161
+ 6. Print a summary table:
162
+ ```
163
+ | Category | CRITICAL | HIGH | MEDIUM | LOW | Total |
164
+ |-------------------|----------|------|--------|-----|-------|
165
+ | Security | ... | ... | ... | ... | ... |
166
+ | Code Quality | ... | ... | ... | ... | ... |
167
+ | DRY & YAGNI | ... | ... | ... | ... | ... |
168
+ | Architecture | ... | ... | ... | ... | ... |
169
+ | Bugs & Perf | ... | ... | ... | ... | ... |
170
+ | Stack-Specific | ... | ... | ... | ... | ... |
171
+ | Test Coverage | ... | ... | ... | ... | ... |
172
+ | TOTAL | ... | ... | ... | ... | ... |
173
+ ```
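The ownership rule from step 4 (each file goes to the category with its highest-severity finding) can be sketched with `awk`. The input format here is an assumption made for illustration: one finding per line as `SEVERITY<TAB>category<TAB>file`. Ties are resolved in favor of the finding seen first, which the source does not specify.

```bash
# Sketch: build a file -> category map from tab-separated findings.
# stdin:  SEVERITY<TAB>category<TAB>file   (hypothetical serialization)
# stdout: file<TAB>category, sorted by file
build_file_owner_map() {
  awk -F '\t' '
    BEGIN { rank["CRITICAL"] = 4; rank["HIGH"] = 3; rank["MEDIUM"] = 2; rank["LOW"] = 1 }
    {
      sev = $1; category = $2; file = $3
      # Keep the category of the highest-ranked finding for each file.
      if (!(file in best) || rank[sev] > best[file]) {
        best[file] = rank[sev]
        owner[file] = category
      }
    }
    END { for (f in owner) printf "%s\t%s\n", f, owner[f] }
  ' | sort
}
```

Because every file appears exactly once in the output, the result satisfies the constraint that no two PRs modify the same file.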
174
+
175
+ **GATE: If `--scan-only` was passed, STOP HERE.** Print the summary and exit.
176
+
177
+ ## Phase 3: Worktree Remediation
178
+
179
+ Only proceed with CRITICAL, HIGH, and MEDIUM findings. LOW and Test Coverage findings remain tracked in PLAN.md but are not auto-remediated.
180
+
181
+ ### 3a: Setup
182
+
183
+ 1. If `IS_DIRTY` is true: `git stash --include-untracked -m "makegood: pre-scan stash"`
184
+ 2. Set `DATE` to today's date in YYYY-MM-DD format
185
+ 3. Create the worktree:
186
+ ```bash
187
+ git worktree add ../makegood-{DATE} -b makegood/{DATE}
188
+ ```
189
+ 4. Set `WORKTREE_DIR` to `../makegood-{DATE}`
190
+
191
+ ### 3b: Foundation Utilities
192
+
193
+ This phase is done by the team lead (you) directly — NOT delegated to agents — because all subsequent agents depend on these files existing and compiling.
194
+
195
+ 1. Create each shared utility file identified in Phase 2's "Foundation" section
196
+ 2. When extracting functions from an existing module, **add a backward-compatible re-export** in the original module:
197
+ ```js
198
+ // Re-export for backward compatibility (extracted to newModule.js)
199
+ export { extractedFunction } from "./newModule.js";
200
+ ```
201
+ This prevents cross-PR import breakage when different PRs modify different files.
202
+ 3. Run `{BUILD_CMD}` in the worktree to verify compilation:
203
+ ```bash
204
+ cd {WORKTREE_DIR} && {BUILD_CMD}
205
+ ```
206
+ 4. If build fails, fix issues before proceeding
207
+ 5. Commit in the worktree:
208
+ ```bash
209
+ git -C {WORKTREE_DIR} add <specific files>
210
+ git -C {WORKTREE_DIR} commit -m "refactor: add shared utilities for {purpose}"
211
+ ```
212
+
213
+ If no shared utilities were identified, skip this step.
214
+
215
+ ### 3c: Parallel Remediation
216
+
217
+ 1. Use `TeamCreate` with name `makegood-{DATE}`
218
+ 2. Use `TaskCreate` for each category that has CRITICAL, HIGH, or MEDIUM findings. Possible categories:
219
+ - Security & Secrets
220
+ - Code Quality & Style
221
+ - DRY & YAGNI
222
+ - Architecture & SOLID
223
+ - Bugs, Performance & Error Handling
224
+ - Stack-Specific
225
+ 3. Only create tasks for categories that have actionable findings
226
+ 4. Spawn up to 5 general-purpose agents as teammates
227
+
228
+ ### Agent instructions template:
229
+ ```
230
+ You are {agent-name} on team makegood-{DATE}.
231
+
232
+ Your task: Fix all {CATEGORY} findings from the MakeGood audit.
233
+ Working directory: {WORKTREE_DIR} (this is a git worktree — all work happens here)
234
+
235
+ Project type: {PROJECT_TYPE}
236
+ Build command: {BUILD_CMD}
237
+ Test command: {TEST_CMD}
238
+
239
+ Foundation utilities available (if created):
240
+ {list of utility files with brief descriptions}
241
+
242
+ Findings to address:
243
+ {filtered list of CRITICAL/HIGH/MEDIUM findings for this category}
244
+
245
+ FINDING VALIDATION — verify before fixing:
246
+ - Before fixing each finding, READ the file and at least 30 lines of surrounding
247
+ context to confirm the issue is genuine.
248
+ - Check whether the flagged code is already correct (e.g., a Promise chain that
249
+ IS properly awaited downstream, a value that IS validated earlier in the function,
250
+ a pattern that IS idiomatic for the framework).
251
+ - If the existing code is already correct, SKIP the fix and report it as a
252
+ false positive with a brief explanation of why the original code is fine.
253
+ - Do not make changes that are semantically equivalent to the original code
254
+ (e.g., wrapping a .then() chain in an async IIFE adds noise without fixing anything).
255
+
256
+ COMMIT STRATEGY — commit early and often:
257
+ - After completing each logical group of related fixes, stage those files
258
+ and commit immediately with a descriptive conventional commit message.
259
+ - Each commit should be independently valid (build should pass).
260
+ - Run {BUILD_CMD} in {WORKTREE_DIR} before each commit to verify.
261
+ - Use `git -C {WORKTREE_DIR} add <specific files>` — never `git add -A` or `git add .`
262
+ - Use `git -C {WORKTREE_DIR} commit -m "prefix: description"`
263
+ - Use conventional commit prefixes: fix:, refactor:, feat:, security:
264
+ - Do NOT include co-author or generated-by annotations in commits.
265
+ - Do NOT bump the version — that happens once at the end.
266
+
267
+ After all fixes:
268
+ - Ensure all changes are committed (no uncommitted work)
269
+ - Mark your task as completed via TaskUpdate
270
+ - Report: commits made, files modified, findings addressed, any skipped issues
271
+
272
+ CONFLICT AVOIDANCE:
273
+ - Only modify files listed in your assigned findings
274
+ - If you need to modify a file assigned to another agent, skip that change and report it
275
+ ```
276
+
277
+ ### Conflict avoidance:
278
+ - Review all findings before task assignment. If two categories touch the same file, assign both sets of findings to the same agent.
279
+ - Security agent gets priority on validation logic; DRY agent gets priority on import consolidation.
280
+
281
+ ## Phase 4: Verification
282
+
283
+ After all agents complete:
284
+
285
+ 1. Run the full build in the worktree:
286
+ ```bash
287
+ cd {WORKTREE_DIR} && {BUILD_CMD}
288
+ ```
289
+ 2. Run tests in the worktree:
290
+ ```bash
291
+ cd {WORKTREE_DIR} && {TEST_CMD}
292
+ ```
293
+ 3. If build or tests fail:
294
+ - Identify which commits caused the failure via `git bisect` or manual review
295
+ - Attempt to fix in a new commit: `fix: resolve build/test failure from {category} changes`
296
+ - If unfixable, revert the problematic commit(s): `git -C {WORKTREE_DIR} revert <sha>` and note which findings were skipped
297
+ 4. Shut down all agents via `SendMessage` with `type: "shutdown_request"`
298
+ 5. Clean up team via `TeamDelete`
299
+
300
+ ## Phase 5: Per-Category PR Creation
301
+
302
+ Instead of one mega PR, create **separate branches and PRs for each category**. This enables independent review, targeted CI, and granular merge decisions.
303
+
304
+ ### 5a: Build the Category Branches
305
+
306
+ Using the `FILE_OWNER_MAP` from Phase 2, create one branch per category:
307
+
308
+ For each category that has findings:
309
+ 1. Switch to `{DEFAULT_BRANCH}`: `git checkout {DEFAULT_BRANCH}`
310
+ 2. Create a category branch: `git checkout -b makegood/{CATEGORY_SLUG}`
311
+ - Use slugs: `security`, `code-quality`, `dry`, `arch-bugs`, `stack-specific` (`arch-bugs` covers both Architecture & SOLID and Bugs, Performance & Error Handling)
312
+ 3. For each file assigned to this category in `FILE_OWNER_MAP`:
313
+ - **Modified files**: `git checkout makegood/{DATE} -- {file_path}`
314
+ - **New files (Added)**: `git checkout makegood/{DATE} -- {file_path}`
315
+ - **Deleted files**: `git rm {file_path}`
316
+ 4. Commit all staged changes with a descriptive message:
317
+ ```bash
318
+ git commit -m "{prefix}: {category summary}"
319
+ ```
320
+ 5. Push the branch: `git push -u origin makegood/{CATEGORY_SLUG}`
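Step 3 can be sketched as a loop over the ownership map. The sketch assumes the map is available as tab-separated `file<TAB>category` lines (a hypothetical serialization of `FILE_OWNER_MAP`) and that the second argument names the ref holding the remediation commits. It covers modified and added files only; deleted files would still need `git rm`.

```bash
# Sketch: check out every file owned by one category from the remediation ref.
# stdin: file<TAB>category lines; $1: category slug; $2: source ref.
checkout_category_files() {
  category="$1"
  source_ref="$2"
  tab="$(printf '\t')"
  while IFS="$tab" read -r file owner; do
    if [ "$owner" = "$category" ]; then
      git checkout "$source_ref" -- "$file"
    fi
  done
}
```

Run on a freshly created category branch, this stages only the files that belong to that category, which is what keeps the PRs disjoint.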
321
+
322
+ **CRITICAL: File isolation rule** — each file must appear in exactly ONE branch. If a file has changes from multiple categories (e.g., `server/index.js` with both security and stack-specific changes), assign the whole file to one category based on the file ownership map. Do not split file-level changes across PRs.
323
+
324
+ **CRITICAL: Cross-PR dependency check** — after building all branches, verify each branch builds independently:
325
+ ```bash
326
+ git checkout makegood/{CATEGORY_SLUG} && {BUILD_CMD}
327
+ ```
328
+ If a branch fails because it imports from a new module created in another branch:
329
+ - Add a backward-compatible re-export in the original module (in the branch that has the original module)
330
+ - Or move the new module file to the branch that needs it
331
+ - Or revert the import change to use the original module path
332
+
333
+ ### 5b: Version Bump
334
+
335
+ Only if ALL category branches pass build:
336
+ 1. Pick the first category branch (e.g., `makegood/security`) for the version bump
337
+ 2. Analyze all commits across ALL category branches to determine the aggregate SemVer bump:
338
+ - Any `breaking:` or `BREAKING CHANGE` → **major**
339
+ - Any `feat:` → **minor**
340
+ - Otherwise (fix:, refactor:, security:, chore:) → **patch**
341
+ 3. Bump the version on that branch:
342
+ ```bash
343
+ git checkout makegood/{FIRST_CATEGORY}
344
+ npm version {LEVEL} --no-git-tag-version
345
+ git add package.json package-lock.json
346
+ git commit -m "chore: bump version to {NEW_VERSION}"
347
+ git push
348
+ ```
349
+ 4. If `HAS_CHANGELOG`, update changelog and include in the commit.
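The aggregate bump rule in step 2 could be computed along these lines. The helper name is hypothetical, and the sketch only recognizes the literal prefixes named above (`breaking:`, `feat:`) plus the `BREAKING CHANGE` marker; scoped forms such as `feat(api):` are not handled.

```bash
# Sketch: derive the SemVer level from commit message lines on stdin.
# major if any line starts with "breaking:" or contains "BREAKING CHANGE",
# minor if any line starts with "feat:", otherwise patch.
semver_bump_level() {
  level="patch"
  while IFS= read -r line; do
    case "$line" in
      breaking:*|*"BREAKING CHANGE"*) echo "major"; return 0 ;;
      feat:*) level="minor" ;;
    esac
  done
  echo "$level"
}
```

Feeding it `git log --format=%B` output for every category branch (relative to the default branch) would yield a single level to pass to `npm version`. Using `%B` rather than `%s` matters because `BREAKING CHANGE` usually appears in the commit body, not the subject.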
350
+
351
+ ### 5c: Create PRs
352
+
353
+ For each category branch, create a PR:
354
+
355
+ **GitHub:**
356
+ ```bash
357
+ gh pr create --head makegood/{CATEGORY_SLUG} --base {DEFAULT_BRANCH} \
358
+ --title "{prefix}: {short description}" \
359
+ --body "$(cat <<'EOF'
360
+ ## MakeGood Audit — {Category Name}
361
+
362
+ ### Summary
363
+ {count} findings addressed across {files} files.
364
+
365
+ ### Changes
366
+ {bulleted list of changes with severity levels}
367
+
368
+ ### Files Modified
369
+ {list of files}
370
+
371
+ ### Merge Order
372
+ {dependency info if applicable, e.g., "Depends on Security PR for shared helper exports" or "Independent — can be merged in any order"}
373
+ EOF
374
+ )"
375
+ ```
376
+
377
+ **GitLab:**
378
+ ```bash
379
+ glab mr create --source-branch makegood/{CATEGORY_SLUG} --target-branch {DEFAULT_BRANCH} \
380
+ --title "{prefix}: {short description}" --description "..."
381
+ ```
382
+
383
+ Record all `PR_NUMBERS` and `PR_URLS` in a map: `{category: {number, url}}`.
384
+
385
+ **GATE: If `--no-merge` was passed, STOP HERE.** Print all PR URLs and summary.
386
+
387
+ **GATE: If `VCS_HOST` is `gitlab`, STOP HERE.** Print all MR URLs and summary. GitLab does not support the Copilot review loop.
388
+
389
+ ### 5d: CI Verification
390
+
391
+ After creating all PRs, verify CI passes on each one:
392
+
393
+ 1. Wait 30 seconds for CI to start
394
+ 2. For each PR, poll CI status:
395
+ ```bash
396
+ gh pr checks {PR_NUMBER}
397
+ ```
398
+ Poll every 30 seconds, max 10 minutes per PR.
399
+
400
+ 3. If CI **passes** on all PRs → proceed to Phase 6
401
+
402
+ 4. If CI **fails** on any PR:
403
+ a. Fetch the failure logs:
404
+ ```bash
405
+ gh run view {RUN_ID} --job {JOB_ID} --log-failed
406
+ ```
407
+ b. Analyze the failure — common causes:
408
+ - **Missing imports**: a file imports from a module in another PR's branch. Fix by adding a backward-compatible re-export or reverting the import.
409
+ - **Missing exports**: a module removed an export that other code still references. Fix by adding a re-export.
410
+ - **Test failures**: a test depends on code changed in the PR. Fix the test or the code.
411
+ c. Switch to the failing branch:
412
+ ```bash
413
+ git checkout makegood/{CATEGORY_SLUG}
414
+ ```
415
+ d. Make the fix, commit, and push:
416
+ ```bash
417
+ git add <specific files>
418
+ git commit -m "fix: resolve CI failure - {description}"
419
+ git push
420
+ ```
421
+ e. Re-poll CI until it passes or max retries (3) are exhausted
422
+ f. If CI still fails after 3 fix attempts, inform the user and continue with other PRs
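The polling in step 2 might be wrapped as below. This is a sketch under one notable assumption: that the installed `gh` reports pending checks through exit code 8 from `gh pr checks`, as recent versions document. The function name and return codes are hypothetical.

```bash
# Sketch: poll `gh pr checks` until it passes, fails, or times out.
# Returns 0 = all checks passed, 1 = a check failed, 2 = timed out while pending.
# Assumes `gh pr checks` exits with 8 while checks are still pending.
poll_pr_checks() {
  pr_number="$1"
  interval="${2:-30}"    # seconds between polls
  max_wait="${3:-600}"   # give up after this many seconds (10 minutes)
  elapsed=0
  while [ "$elapsed" -lt "$max_wait" ]; do
    check_rc=0
    gh pr checks "$pr_number" >/dev/null 2>&1 || check_rc=$?
    case "$check_rc" in
      0) return 0 ;;   # everything green
      8) ;;            # still pending, keep waiting
      *) return 1 ;;   # at least one check failed (or gh itself errored)
    esac
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 2
}
```

A return value of 1 would lead into the failure-handling steps above, while 2 is a timeout that the user should probably be told about.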
423
+
424
+ ## Phase 6: Copilot Review Loop (GitHub only)
425
+
426
+ Maximum 5 iterations per PR to prevent infinite loops.
427
+
428
+ ### 6.0: Verify browser authentication
429
+
430
+ If `BROWSER_AUTHENTICATED` is not true (e.g., Phase 0e was skipped or failed):
431
+ 1. Navigate to the first PR URL using `browser_navigate`
432
+ 2. Check for user avatar/menu
433
+ 3. If not logged in: navigate to `https://github.com/login`, inform the user **"Please log in to GitHub in the browser. I'll wait for you to confirm."**, and use `AskUserQuestion` to wait
434
+ 4. Do NOT close the browser at any point during this phase
435
+
436
+ ### 6.1: Request Copilot reviews on all PRs
437
+
438
+ For each PR:
439
+
440
+ **Try the API first:**
441
+ ```bash
442
+ gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/requested_reviewers --method POST --input - <<< '{"reviewers":["copilot-pull-request-reviewer"]}'
443
+ ```
444
+
445
+ If this returns 422 ("not a collaborator"), **fall back to Playwright** for each PR:
446
+ 1. Navigate to `{PR_URL}`
447
+ 2. Click the "Reviewers" gear button in the PR sidebar
448
+ 3. In the dropdown menu, click the Copilot `menuitemradio` option (not the "Request" button which may be obscured by the dropdown header)
449
+ 4. Verify the sidebar shows "Awaiting requested review from Copilot"
450
+
451
+ ### 6.2: Poll for review completion
452
+
453
+ For each PR, poll every 15 seconds, max 3 minutes (12 polls):
454
+ ```bash
455
+ gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/reviews --jq '.[] | "\(.user.login): \(.state)"'
456
+ ```
457
+
458
+ Also check for inline comments:
459
+ ```bash
460
+ gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/comments --jq '.[] | "\(.user.login) [\(.path):\(.line)]: \(.body[:120])"'
461
+ ```
462
+
463
+ The review is complete when a `copilot[bot]` or `copilot-pull-request-reviewer[bot]` review appears.
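The completion condition can be checked mechanically. The sketch below assumes its input is the `login: STATE` lines produced by the reviews query above; the helper name is hypothetical.

```bash
# Sketch: succeed when a Copilot review appears in "login: STATE" lines on stdin.
copilot_review_present() {
  grep -Eq '^(copilot|copilot-pull-request-reviewer)\[bot\]: '
}
```

Inside the polling loop, the reviews query would be piped into `copilot_review_present`, and the loop would stop as soon as it succeeds or the 12 polls are used up.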
464
+
465
+ ### 6.3: Check for unresolved threads
466
+
467
+ For each reviewed PR, fetch review threads via GraphQL using stdin JSON (**never use `$variables`** — shell expansion consumes `$` signs):
468
+ ```bash
469
+ echo '{"query":"{ repository(owner: \"{OWNER}\", name: \"{REPO}\") { pullRequest(number: {PR_NUMBER}) { reviewThreads(first: 100) { nodes { id isResolved comments(first: 10) { nodes { body path line author { login } } } } } } } }"}' | gh api graphql --input -
470
+ ```
471
+ Save to `/tmp/makegood_threads_{PR_NUMBER}.json` for parsing.
472
+
473
+ - If **no unresolved threads** → mark PR as ready to merge
474
+ - If **unresolved threads exist** → proceed to 6.4 (fix)
475
+
476
+ ### 6.4: Fix unresolved threads
477
+
478
+ For each unresolved thread on each PR:
479
+ 1. Read the referenced file and understand the feedback
480
+ 2. Evaluate: is the feedback valid? Some Copilot comments are informational or about pre-existing patterns.
481
+ - **Valid feedback**: make the code fix
482
+ - **Informational/false positive**: resolve the thread without changes
483
+ 3. If fixing:
484
+ ```bash
485
+ git checkout makegood/{CATEGORY_SLUG}
486
+ # make changes
487
+ git add <specific files>
488
+ git commit -m "address review: {summary of change}"
489
+ git push
490
+ ```
491
+ 4. Resolve the thread via GraphQL mutation:
492
+ ```bash
493
+ echo '{"query":"mutation { resolveReviewThread(input: {threadId: \"{THREAD_ID}\"}) { thread { id isResolved } } }"}' | gh api graphql --input -
494
+ ```
495
+
496
+ After all threads are resolved on a PR:
497
+ 1. Increment that PR's `REVIEW_ITERATION`
498
+ 2. If `REVIEW_ITERATION >= 5`: inform the user "Reached max review iterations (5) on PR #{number}. Remaining issues may need manual review."
499
+ 3. Otherwise: re-request Copilot review on that PR and repeat from 6.2
500
+
501
+ ### 6.5: Merge
502
+
503
+ For each PR that has passed CI and review (in dependency order if applicable):
504
+ ```bash
505
+ gh pr merge {PR_NUMBER} --merge
506
+ ```
507
+
508
+ Verify each merge:
509
+ ```bash
510
+ gh pr view {PR_NUMBER} --json state,mergedAt
511
+ ```
512
+
513
+ If merge fails (e.g., branch protection, merge conflicts from a prior PR):
514
+ - If merge conflict: rebase the branch and retry
515
+ ```bash
516
+ git checkout makegood/{CATEGORY_SLUG}
517
+ git pull --rebase origin {DEFAULT_BRANCH}
518
+ git push --force-with-lease
519
+ ```
520
+ Then re-run CI check before merging.
521
+ - If branch protection: inform the user and suggest manual merge
522
+
523
+ ## Phase 7: Cleanup
524
+
525
+ 1. Remove the worktree:
526
+ ```bash
527
+ git worktree remove {WORKTREE_DIR}
528
+ ```
529
+ 2. Delete local branches (only if merged):
530
+ ```bash
531
+ git branch -d makegood/{DATE}
532
+ git branch -d makegood/security makegood/code-quality makegood/dry makegood/arch-bugs makegood/stack-specific
533
+ ```
534
+ 3. Restore stashed changes (if stashed in Phase 3a):
535
+ ```bash
536
+ git stash pop
537
+ ```
538
+ 4. Update PLAN.md:
539
+ - Mark completed findings with `[x]`
540
+ - Add PR links to each category section header
541
+ - Note any skipped findings with reasons
542
+ 5. Print the final summary table:
543
+
544
+ ```
545
+ | Category | Findings | Fixed | Skipped | PR | CI | Review |
546
+ |--------------------|----------|-------|---------|----------|--------|----------|
547
+ | Security & Secrets | ... | ... | ... | #number | pass | approved |
548
+ | Code Quality | ... | ... | ... | #number | pass | approved |
549
+ | DRY & YAGNI | ... | ... | ... | #number | pass | approved |
550
+ | Architecture | ... | ... | ... | #number | pass | approved |
551
+ | Bugs & Perf | ... | ... | ... | #number | pass | approved |
552
+ | Stack-Specific | ... | ... | ... | #number | pass | approved |
553
+ | Test Coverage | ... | (tracked only) | ... | | |
554
+ | TOTAL | ... | ... | ... | N PRs | | |
555
+ ```
556
+
557
+ ## Error Recovery
558
+
559
+ - **Agent failure**: continue with remaining agents, note gaps in the summary
560
+ - **Build failure in worktree**: attempt fix in a new commit; if unfixable, revert problematic commits and ask the user
561
+ - **Push failure**: `git pull --rebase --autostash` then retry push
562
+ - **CI failure on PR**: investigate logs, fix in a new commit, push, re-check (max 3 attempts per PR)
563
+ - **Cross-PR dependency breakage**: add backward-compatible re-exports or move shared files to the PR that creates them
564
+ - **Copilot timeout** (review not received within 3 min): inform user, offer to merge without review approval or wait longer
565
+ - **Copilot review loop exceeds 5 iterations per PR**: stop iterating on that PR, inform user, proceed to merge
566
+ - **Existing worktree found at startup**: ask user — resume (reuse worktree) or cleanup (remove and start fresh)
567
+ - **`gh auth status` / `glab auth status` failure**: halt and tell user to authenticate first
568
+ - **No findings above LOW**: skip Phases 3-7, print "No actionable findings" with the LOW summary
569
+ - **Browser not authenticated**: use `AskUserQuestion` to ask the user to log in — never skip this or close the browser
570
+ - **Merge conflict after prior PR merged**: rebase the branch onto the updated default branch, push with `--force-with-lease`, re-run CI
571
+
572
+ !`cat ~/.claude/lib/graphql-escaping.md`
573
+
574
+ ## Notes
575
+
576
+ - This command is project-agnostic: it reads CLAUDE.md for project-specific conventions and auto-detects the tech stack
577
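+ - Remediation (Phases 3-4) happens in an isolated worktree; the category branches in Phase 5 onward are checked out in the main working directory, which is why uncommitted changes are stashed in Phase 3a and restored in Phase 7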
578
+ - **One PR per category** — each category gets its own branch and PR for independent review and merge
579
+ - Each file appears in exactly ONE PR (file ownership map) to prevent merge conflicts between PRs
580
+ - When extracting modules, always add backward-compatible re-exports in the original module to prevent cross-PR breakage
581
+ - Version bump happens exactly once on the first category branch based on aggregate commit analysis
582
+ - Only CRITICAL, HIGH, and MEDIUM findings are auto-remediated; LOW and Test Coverage remain tracked in PLAN.md
583
+ - Do not include co-author or generated-by info in any commits, PRs, or output
584
+ - GitLab projects skip the Copilot review loop entirely (Phase 6) and stop after MR creation
585
+ - Always run `gh auth status` (or `glab auth status`) before any authenticated operation
586
+ - The Playwright browser should be opened early (Phase 0e) and kept open throughout — never close it prematurely
587
+ - CI must pass on each PR before requesting Copilot review or merging