slash-do 1.5.0 → 1.6.0
- package/commands/do/better.md +212 -45
- package/lib/code-review-checklist.md +7 -0
- package/package.json +1 -1
package/commands/do/better.md
CHANGED
|
@@ -1,5 +1,5 @@
|
|
|
1
1
|
---
|
|
2
|
-
description: Unified DevSecOps audit, remediation, per-category PRs, CI verification, and Copilot review loop with worktree isolation
|
|
2
|
+
description: Unified DevSecOps audit, remediation, test enhancement, per-category PRs, CI verification, and Copilot review loop with worktree isolation
|
|
3
3
|
argument-hint: "[--scan-only] [--no-merge] [path filter or focus areas]"
|
|
4
4
|
---
|
|
5
5
|
|
|
@@ -58,6 +58,10 @@ When compacting during this workflow, always preserve:
|
|
|
58
58
|
- All PR numbers and URLs created so far
|
|
59
59
|
- `BUILD_CMD`, `TEST_CMD`, `PROJECT_TYPE`, `WORKTREE_DIR` values
|
|
60
60
|
- `VCS_HOST`, `CLI_TOOL`, `DEFAULT_BRANCH`, `CURRENT_BRANCH`
|
|
61
|
+
- `PHASE_4C_START_SHA` (needed for FILE_OWNER_MAP update in Phase 4c.3)
|
|
62
|
+
- `VACUOUS_TESTS_FIXED`, `WEAK_TESTS_STRENGTHENED`, `NEW_TEST_CASES`, `NEW_TEST_FILES`
|
|
63
|
+
- `CREATED_CATEGORY_SLUGS` (list of branch slugs created in Phase 5)
|
|
64
|
+
|
|
61
65
|
|
|
62
66
|
## Phase 0: Discovery & Setup
|
|
63
67
|
|
|
@@ -98,15 +102,6 @@ Record as `BUILD_CMD` and `TEST_CMD`.
|
|
|
98
102
|
- Check for `.changelog/` directory → `HAS_CHANGELOG`
|
|
99
103
|
- Check for existing `../better-*` worktrees: `git worktree list`. If found, inform the user and ask whether to resume (use existing worktree) or clean up (remove it and start fresh)
|
|
100
104
|
|
|
101
|
-
### 0e: Browser Authentication (GitHub only)
|
|
102
|
-
If `VCS_HOST` is `github`, proactively verify browser authentication for the Copilot review loop later:
|
|
103
|
-
1. Navigate to the repo URL using `browser_navigate` via Playwright MCP
|
|
104
|
-
2. Take a snapshot and check for user avatar/menu indicating logged-in state
|
|
105
|
-
3. If NOT logged in: navigate to `https://github.com/login`, inform the user **"Please log in to GitHub in the browser. I'll wait for you to complete authentication."**, and use `AskUserQuestion` to wait for the user to confirm they've logged in
|
|
106
|
-
4. Do NOT close the browser — it stays open for the entire session
|
|
107
|
-
5. Record `BROWSER_AUTHENTICATED = true` once confirmed
|
|
108
|
-
|
|
109
|
-
This ensures the browser is ready before we need it in Phase 6, avoiding interruptions mid-flow.
|
|
110
105
|
|
|
111
106
|
<audit_instructions>
|
|
112
107
|
|
|
@@ -181,9 +176,31 @@ Skip step 4 if steps 1-3 reveal the code is correct.
|
|
|
181
176
|
- **Database migrations**: exclusive-lock ALTER TABLE on large tables, CREATE INDEX without CONCURRENTLY, missing down migrations or untested rollback paths
|
|
182
177
|
- General: framework-specific security issues, language-specific gotchas, domain-specific compliance, environment variable hygiene (missing `.env.example`, required env vars not validated at startup, secrets in config files that should be in env)
|
|
183
178
|
|
|
184
|
-
7. **Test Coverage**
|
|
185
|
-
Uses Batch 1 findings as context to prioritize
|
|
186
|
-
Focus
|
|
179
|
+
7. **Test Quality & Coverage**
|
|
180
|
+
Uses Batch 1 findings as context to prioritize.
|
|
181
|
+
Focus areas:
|
|
182
|
+
|
|
183
|
+
**Coverage gaps:**
|
|
184
|
+
- Missing test files for critical modules, untested edge cases, tests that only cover happy paths
|
|
185
|
+
- Areas with high complexity (identified by agents 1-5) but no tests
|
|
186
|
+
- Remediation changes from agents 1-6 that lack corresponding test coverage
|
|
187
|
+
|
|
188
|
+
**Vacuous tests (tests that don't actually test anything):**
|
|
189
|
+
- Tests that assert on mocked return values instead of real behavior (testing the mock, not the code)
|
|
190
|
+
- Tests that only check truthiness (`assert.ok(result)`) when they should verify specific values or shapes
|
|
191
|
+
- Tests with assertions that can never fail (e.g., asserting a hardcoded value equals itself, asserting `typeof x === 'object'` on a literal `{}`)
|
|
192
|
+
- Tests that re-implement the logic under test instead of importing the real function — these pass even when real code regresses
|
|
193
|
+
- `it('should work', ...)` tests with no meaningful assertion or with assertions commented out
|
|
194
|
+
- Tests that mock the module they're testing (testing mock behavior, not real behavior)
|
|
195
|
+
|
|
196
|
+
**Weak test patterns:**
|
|
197
|
+
- Tests that verify implementation details (internal state, private methods, call counts) instead of observable behavior
|
|
198
|
+
- Tests where all assertions pass even if the function under test returns `null`/`undefined`/empty — verify by mentally substituting a no-op and checking if the test would still pass
|
|
199
|
+
- Integration tests that mock so aggressively they become unit tests of glue code
|
|
200
|
+
- Tests missing negative cases (invalid input, error paths, boundary conditions)
|
|
201
|
+
- Tests with shared mutable state between cases (`beforeEach` that doesn't reset, module-level variables)
|
|
202
|
+
|
|
203
|
+
Report each finding with a severity prefix `**[CRITICAL]**`, `**[HIGH]**`, `**[MEDIUM]**`, or `**[LOW]**` followed immediately by a quality prefix `[VACUOUS]`, `[WEAK]`, or `[MISSING]` (for example, `**[HIGH][VACUOUS]**`) to distinguish quality issues from coverage gaps while keeping the format consistent with other agents. Include the specific test name and file:line for existing test issues.
|
|
187
204
|
|
|
188
205
|
Wait for ALL agents to complete before proceeding.
|
|
189
206
|
|
|
@@ -228,10 +245,18 @@ For each file touched by multiple categories, document why it was assigned to on
|
|
|
228
245
|
### Architecture & SOLID
|
|
229
246
|
### Bugs, Performance & Error Handling
|
|
230
247
|
### Stack-Specific
|
|
231
|
-
### Test
|
|
248
|
+
### Test Quality & Coverage
|
|
232
249
|
```
|
|
233
250
|
|
|
234
|
-
6. Print a summary table:
|
|
251
|
+
6. Print a summary table (short labels → full category → branch slug):
|
|
252
|
+
- Security → Security & Secrets → `security`
|
|
253
|
+
- Code Quality → Code Quality & Style → `code-quality`
|
|
254
|
+
- DRY & YAGNI → DRY & YAGNI → `dry`
|
|
255
|
+
- Architecture → Architecture & SOLID → `architecture`
|
|
256
|
+
- Bugs & Perf → Bugs, Performance & Error Handling → `bugs-perf`
|
|
257
|
+
- Stack-Specific → Stack-Specific → `stack-specific`
|
|
258
|
+
- Tests → Test Quality & Coverage → `tests`
|
|
259
|
+
|
|
235
260
|
```
|
|
236
261
|
| Category | CRITICAL | HIGH | MEDIUM | LOW | Total |
|
|
237
262
|
|-------------------|----------|------|--------|-----|-------|
|
|
@@ -241,7 +266,7 @@ For each file touched by multiple categories, document why it was assigned to on
|
|
|
241
266
|
| Architecture | ... | ... | ... | ... | ... |
|
|
242
267
|
| Bugs & Perf | ... | ... | ... | ... | ... |
|
|
243
268
|
| Stack-Specific | ... | ... | ... | ... | ... |
|
|
244
|
-
|
|
|
269
|
+
| Tests | ... | ... | ... | ... | ... |
|
|
245
270
|
| TOTAL | ... | ... | ... | ... | ... |
|
|
246
271
|
```
|
|
247
272
|
|
|
@@ -249,7 +274,7 @@ For each file touched by multiple categories, document why it was assigned to on
|
|
|
249
274
|
|
|
250
275
|
## Phase 3: Worktree Remediation
|
|
251
276
|
|
|
252
|
-
Only proceed with CRITICAL, HIGH, and MEDIUM findings
|
|
277
|
+
Only proceed with CRITICAL, HIGH, and MEDIUM findings for code remediation. LOW findings remain tracked in PLAN.md but are not auto-remediated. Test Quality & Coverage findings are handled separately in Phase 4c.
|
|
253
278
|
|
|
254
279
|
### 3a: Setup
|
|
255
280
|
|
|
@@ -329,18 +354,147 @@ After all agents complete:
|
|
|
329
354
|
4. Shut down all agents via `SendMessage` with `type: "shutdown_request"`
|
|
330
355
|
5. Clean up team via `TeamDelete`
|
|
331
356
|
|
|
357
|
+
## Phase 4b: Internal Code Review
|
|
358
|
+
|
|
359
|
+
Before creating PRs, run a deep code review on all remediation changes to catch issues that automated agents may have introduced.
|
|
360
|
+
|
|
361
|
+
1. Generate the diff of all changes in the worktree:
|
|
362
|
+
```bash
|
|
363
|
+
cd {WORKTREE_DIR} && git diff {DEFAULT_BRANCH}...HEAD
|
|
364
|
+
```
|
|
365
|
+
2. Review the diff against the code review checklist:
|
|
366
|
+
```
|
|
367
|
+
!`cat ~/.claude/lib/code-review-checklist.md`
|
|
368
|
+
```
|
|
369
|
+
3. For each issue found:
|
|
370
|
+
- Fix in a new commit: `fix: {description of review finding}`
|
|
371
|
+
- Re-run `{BUILD_CMD}` and `{TEST_CMD}` to verify
|
|
372
|
+
4. Present a summary of review findings and fixes to the user via `AskUserQuestion`:
|
|
373
|
+
```
|
|
374
|
+
AskUserQuestion([{
|
|
375
|
+
question: "Code review complete. {N} issues found and fixed. {list}. Proceed to PR creation?",
|
|
376
|
+
options: [
|
|
377
|
+
{ label: "Proceed", description: "Create per-category PRs" },
|
|
378
|
+
{ label: "Show diff", description: "Show the full diff for manual review before proceeding" },
|
|
379
|
+
{ label: "Abort", description: "Stop here — I'll review manually" }
|
|
380
|
+
]
|
|
381
|
+
}])
|
|
382
|
+
```
|
|
383
|
+
5. If "Show diff" selected, print the diff and re-ask. If "Abort", stop and print the worktree path.
|
|
384
|
+
|
|
385
|
+
## Phase 4c: Test Enhancement
|
|
386
|
+
|
|
387
|
+
After internal code review passes, evaluate and enhance the project's test suite. This phase acts on Agent 7's findings AND ensures all remediation work from Phase 3 has proper test coverage.
|
|
388
|
+
|
|
389
|
+
### 4c.0: Record Start SHA
|
|
390
|
+
|
|
391
|
+
Before any test enhancement commits, capture the current HEAD so Phase 4c changes can be diffed later:
|
|
392
|
+
```bash
|
|
393
|
+
cd {WORKTREE_DIR}
|
|
394
|
+
PHASE_4C_START_SHA="$(git rev-parse HEAD)"
|
|
395
|
+
```
|
|
396
|
+
|
|
397
|
+
### 4c.1: Test Audit Triage
|
|
398
|
+
|
|
399
|
+
Review Agent 7 findings from Phase 1 and categorize them:
|
|
400
|
+
|
|
401
|
+
1. **`[VACUOUS]` findings** — tests that exist but don't test real behavior. These are the highest priority because they create a false sense of safety.
|
|
402
|
+
2. **`[WEAK]` findings** — tests that partially cover behavior but miss important cases. Strengthen with additional assertions and edge cases.
|
|
403
|
+
3. **`[MISSING]` findings** — no tests exist for critical paths. Write new test files or add test cases to existing files.
|
|
404
|
+
|
|
405
|
+
Additionally, scan all remediation changes from Phase 3:
|
|
406
|
+
- For each file modified by remediation agents, check if corresponding tests exist
|
|
407
|
+
- If tests exist, verify they cover the specific behavior that was fixed/changed
|
|
408
|
+
- If no tests exist for a remediated module, flag for new test creation
|
|
409
|
+
|
|
410
|
+
### 4c.2: Test Enhancement Execution
|
|
411
|
+
|
|
412
|
+
Spawn a general-purpose agent (using `REMEDIATION_MODEL`) in the worktree to fix and write tests. Populate the template placeholders below from Phase 4c.1 triage output: `{VACUOUS_AND_WEAK_FINDINGS}` from `[VACUOUS]`/`[WEAK]` findings, `{MISSING_FINDINGS}` from `[MISSING]` findings, and `{REMEDIATED_FILES_WITHOUT_TESTS}` from the remediation-change scan. The agent instructions:
|
|
413
|
+
|
|
414
|
+
```
|
|
415
|
+
You are a test enhancement agent working in {WORKTREE_DIR}.
|
|
416
|
+
Project type: {PROJECT_TYPE}. Test command: {TEST_CMD}.
|
|
417
|
+
|
|
418
|
+
Your job is to fix weak/vacuous tests and write missing tests that verify REAL BEHAVIOR.
|
|
419
|
+
|
|
420
|
+
## Rules for writing good tests
|
|
421
|
+
|
|
422
|
+
1. **Test observable behavior, not implementation.** Assert on return values, side effects (files written, state changed), and error messages — never on internal variable names, call counts, or private method invocations.
|
|
423
|
+
|
|
424
|
+
2. **Every assertion must be falsifiable.** For each assertion you write, mentally substitute a broken implementation (returns null, returns wrong value, throws instead of succeeding, succeeds instead of throwing). If your assertion would still pass, it's vacuous — rewrite it.
|
|
425
|
+
|
|
426
|
+
3. **Prefer real modules over mocks.** Only mock at system boundaries (filesystem, network, time). If you must mock, assert on the arguments passed TO the mock, not on its return value.
|
|
427
|
+
|
|
428
|
+
4. **Test the edges.** Each test function needs at minimum:
|
|
429
|
+
- Happy path with specific expected output
|
|
430
|
+
- Empty/null/undefined input
|
|
431
|
+
- Invalid input that should error
|
|
432
|
+
- Boundary values (0, -1, MAX, empty string vs null)
|
|
433
|
+
|
|
434
|
+
5. **Use concrete expected values.** `assert.equal(result, 'expected string')` not `assert.ok(result)`. `assert.deepEqual(output, { key: 'value' })` not `assert.ok(typeof output === 'object')`.
|
|
435
|
+
|
|
436
|
+
6. **One behavior per test.** Each `it()` block tests exactly one scenario. The test name describes the scenario and expected outcome.
|
|
437
|
+
|
|
438
|
+
7. **No shared mutable state.** Each test must be independently runnable. Use `beforeEach` to create fresh fixtures. Never rely on test execution order.
|
|
439
|
+
|
|
440
|
+
## Task list
|
|
441
|
+
|
|
442
|
+
Fix these vacuous/weak tests:
|
|
443
|
+
{VACUOUS_AND_WEAK_FINDINGS}
|
|
444
|
+
|
|
445
|
+
Write tests for these gaps:
|
|
446
|
+
{MISSING_FINDINGS}
|
|
447
|
+
|
|
448
|
+
Write tests for these remediated files:
|
|
449
|
+
{REMEDIATED_FILES_WITHOUT_TESTS}
|
|
450
|
+
|
|
451
|
+
## Verification
|
|
452
|
+
|
|
453
|
+
After writing/fixing each test file:
|
|
454
|
+
1. Run `{TEST_CMD}` to verify all tests pass
|
|
455
|
+
2. For each NEW test, verify that it fails when the behavior under test is wrong:
|
|
456
|
+
- Ensure you have no unstaged changes (`git diff` is clean)
|
|
457
|
+
- Apply a small, obvious, and **uncommitted** change to the code under test (e.g., return a constant, flip a conditional)
|
|
458
|
+
- Run `{TEST_CMD}` and confirm the new test FAILS
|
|
459
|
+
- Immediately restore the code: `git checkout -- {file_path}`
|
|
460
|
+
- Confirm the worktree is clean again (`git diff` shows no changes)
|
|
461
|
+
This is the key quality gate — a test that does not fail when the code is broken is worthless.
|
|
462
|
+
3. After confirming the code is restored and the worktree is clean, commit passing tests: `test: {description of what's tested}`
|
|
463
|
+
```
|
|
464
|
+
|
|
465
|
+
### 4c.3: Verification
|
|
466
|
+
|
|
467
|
+
After the test agent completes:
|
|
468
|
+
|
|
469
|
+
1. Run the full test suite:
|
|
470
|
+
```bash
|
|
471
|
+
cd {WORKTREE_DIR} && {TEST_CMD}
|
|
472
|
+
```
|
|
473
|
+
2. If tests fail, fix in a new commit
|
|
474
|
+
3. Count new/fixed tests and record four variables:
|
|
475
|
+
- `VACUOUS_TESTS_FIXED` — number of vacuous tests fixed
|
|
476
|
+
- `WEAK_TESTS_STRENGTHENED` — number of weak tests strengthened
|
|
477
|
+
- `NEW_TEST_CASES` — number of new test cases added
|
|
478
|
+
- `NEW_TEST_FILES` — number of new test files created
|
|
479
|
+
4. **Update `FILE_OWNER_MAP`** — Phase 4c may have created or modified test files that were not in the Phase 2 map. Before Phase 5 assembles branches:
|
|
480
|
+
- List all files changed by Phase 4c commits: `git diff --name-only "$PHASE_4C_START_SHA"..HEAD`
|
|
481
|
+
- For each file not already in `FILE_OWNER_MAP`, assign it to the `tests` category
|
|
482
|
+
- For each file already owned by another category, leave it in that category (co-located test changes ship with the code they test — the `tests` branch only contains standalone test files not owned by other categories)
|
|
483
|
+
|
|
332
484
|
## Phase 5: Per-Category PR Creation
|
|
333
485
|
|
|
334
486
|
Instead of one mega PR, create **separate branches and PRs for each category**. This enables independent review, targeted CI, and granular merge decisions.
|
|
335
487
|
|
|
336
488
|
### 5a: Build the Category Branches
|
|
337
489
|
|
|
338
|
-
Using the `FILE_OWNER_MAP` from Phase 2, create one branch per category
|
|
490
|
+
Using the `FILE_OWNER_MAP` from Phase 2 (updated in Phase 4c.3), create one branch per category.
|
|
491
|
+
|
|
492
|
+
Initialize `CREATED_CATEGORY_SLUGS=""` (empty space-delimited string). After each category branch is successfully created and pushed below, append its slug: `CREATED_CATEGORY_SLUGS="$CREATED_CATEGORY_SLUGS {CATEGORY_SLUG}"`. Phase 7 uses this for cleanup.
|
|
339
493
|
|
|
340
494
|
For each category that has findings:
|
|
341
495
|
1. Switch to `{DEFAULT_BRANCH}`: `git checkout {DEFAULT_BRANCH}`
|
|
342
496
|
2. Create a category branch: `git checkout -b better/{CATEGORY_SLUG}`
|
|
343
|
-
- Use slugs: `security`, `code-quality`, `dry`, `
|
|
497
|
+
- Use slugs: `security`, `code-quality`, `dry`, `architecture`, `bugs-perf`, `stack-specific`, `tests`
|
|
344
498
|
3. For each file assigned to this category in `FILE_OWNER_MAP`:
|
|
345
499
|
- **Modified files**: `git checkout origin/better/{DATE} -- {file_path}`
|
|
346
500
|
- **New files (Added)**: `git checkout origin/better/{DATE} -- {file_path}`
|
|
@@ -459,34 +613,17 @@ Maximum 5 iterations per PR to prevent infinite loops.
|
|
|
459
613
|
|
|
460
614
|
**Sub-agent delegation** (prevents context exhaustion): delegate each PR's review loop to a **separate general-purpose sub-agent** via the Agent tool. Launch sub-agents in parallel (one per PR). Each sub-agent runs the full loop (request → wait → check → fix → re-request) autonomously and returns only the final status.
|
|
461
615
|
|
|
462
|
-
### 6.
|
|
463
|
-
|
|
464
|
-
If `BROWSER_AUTHENTICATED` is not true (e.g., Phase 0e was skipped or failed):
|
|
465
|
-
1. Navigate to the first PR URL using `browser_navigate`
|
|
466
|
-
2. Check for user avatar/menu
|
|
467
|
-
3. If not logged in: navigate to `https://github.com/login`, inform the user **"Please log in to GitHub in the browser. I'll wait for you to confirm."**, and use `AskUserQuestion` to wait
|
|
468
|
-
|
|
469
|
-
### 6.1: Determine review request method
|
|
470
|
-
|
|
471
|
-
**Try the API first** on any one PR:
|
|
472
|
-
```bash
|
|
473
|
-
gh api repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/requested_reviewers \
|
|
474
|
-
-f 'reviewers[]=copilot-pull-request-reviewer[bot]'
|
|
475
|
-
```
|
|
476
|
-
|
|
477
|
-
If this returns 422 ("not a collaborator"), record `REVIEW_METHOD=playwright`. Otherwise record `REVIEW_METHOD=api`.
|
|
478
|
-
|
|
479
|
-
### 6.2: Launch parallel sub-agents (one per PR)
|
|
616
|
+
### 6.1: Launch parallel sub-agents (one per PR)
|
|
480
617
|
|
|
481
618
|
For each PR, spawn a general-purpose sub-agent using the shared review loop template:
|
|
482
619
|
|
|
483
620
|
!`cat ~/.claude/lib/copilot-review-loop.md`
|
|
484
621
|
|
|
485
|
-
Pass each sub-agent the PR-specific variables: `{PR_NUMBER}`, `{OWNER}/{REPO}`, `better/{CATEGORY_SLUG}`,
|
|
622
|
+
Pass each sub-agent the PR-specific variables: `{PR_NUMBER}`, `{OWNER}/{REPO}`, `better/{CATEGORY_SLUG}`, and `{BUILD_CMD}`.
|
|
486
623
|
|
|
487
624
|
Launch all PR sub-agents in parallel. Wait for all to complete.
|
|
488
625
|
|
|
489
|
-
### 6.
|
|
626
|
+
### 6.2: Handle sub-agent results
|
|
490
627
|
|
|
491
628
|
For each sub-agent result:
|
|
492
629
|
- **clean**: mark PR as ready to merge
|
|
@@ -494,9 +631,28 @@ For each sub-agent result:
|
|
|
494
631
|
- **max-iterations-reached**: inform the user "Reached max review iterations (5) on PR #{number}. Remaining issues may need manual review."
|
|
495
632
|
- **error**: inform the user and ask whether to retry or skip
|
|
496
633
|
|
|
634
|
+
### 6.3: Merge Gate (MANDATORY)
|
|
635
|
+
|
|
636
|
+
**Do NOT merge any PR until Copilot review has completed (approved or commented) on ALL PRs, or the user explicitly approves skipping.**
|
|
637
|
+
|
|
638
|
+
Present the review status summary to the user via `AskUserQuestion`:
|
|
639
|
+
```
|
|
640
|
+
AskUserQuestion([{
|
|
641
|
+
question: "Copilot review status:\n{for each PR: #number - status (approved/comments/pending/timeout)}\n\nHow would you like to proceed?",
|
|
642
|
+
options: [
|
|
643
|
+
{ label: "Merge approved PRs", description: "Merge only PRs with passing review" },
|
|
644
|
+
{ label: "Merge all", description: "Merge all PRs regardless of review status" },
|
|
645
|
+
{ label: "Wait", description: "Wait longer for pending reviews" },
|
|
646
|
+
{ label: "Don't merge", description: "Leave PRs open for manual review" }
|
|
647
|
+
]
|
|
648
|
+
}])
|
|
649
|
+
```
|
|
650
|
+
|
|
651
|
+
Only proceed with merging based on the user's selection. Never auto-merge without user confirmation.
|
|
652
|
+
|
|
497
653
|
### 6.4: Merge
|
|
498
654
|
|
|
499
|
-
For each PR
|
|
655
|
+
For each PR approved for merge (in dependency order if applicable):
|
|
500
656
|
```bash
|
|
501
657
|
gh pr merge {PR_NUMBER} --merge
|
|
502
658
|
```
|
|
@@ -524,11 +680,16 @@ If merge fails (e.g., branch protection, merge conflicts from a prior PR):
|
|
|
524
680
|
```bash
|
|
525
681
|
git worktree remove {WORKTREE_DIR}
|
|
526
682
|
```
|
|
527
|
-
2. Delete local branches (only
|
|
683
|
+
2. Delete local AND remote branches (only categories that were created and merged). Use the tracked list of branches from Phase 5 rather than a fixed list:
|
|
528
684
|
```bash
|
|
529
685
|
git branch -d better/{DATE}
|
|
530
|
-
|
|
686
|
+
# CREATED_CATEGORY_SLUGS is a space-delimited string, e.g. "security code-quality tests"
|
|
687
|
+
for slug in $CREATED_CATEGORY_SLUGS; do
|
|
688
|
+
git branch -d "better/$slug" || echo "warning: local branch better/$slug not found or not fully merged"
|
|
689
|
+
git push origin --delete "better/$slug" 2>/dev/null || echo "warning: remote branch better/$slug not found"
|
|
690
|
+
done
|
|
531
691
|
```
|
|
692
|
+
The guards prevent errors from interrupting cleanup. Warnings are printed so leftover branches are visible.
|
|
532
693
|
3. Restore stashed changes (if stashed in Phase 3a):
|
|
533
694
|
```bash
|
|
534
695
|
git stash pop
|
|
@@ -548,8 +709,14 @@ If merge fails (e.g., branch protection, merge conflicts from a prior PR):
|
|
|
548
709
|
| Architecture | ... | ... | ... | #number | pass | approved |
|
|
549
710
|
| Bugs & Perf | ... | ... | ... | #number | pass | approved |
|
|
550
711
|
| Stack-Specific | ... | ... | ... | #number | pass | approved |
|
|
551
|
-
|
|
|
712
|
+
| Tests | ... | ... | ... | #number | pass | approved |
|
|
552
713
|
| TOTAL | ... | ... | ... | N PRs | | |
|
|
714
|
+
|
|
715
|
+
Test Enhancement Stats:
|
|
716
|
+
- Vacuous tests fixed: {VACUOUS_TESTS_FIXED}
|
|
717
|
+
- Weak tests strengthened: {WEAK_TESTS_STRENGTHENED}
|
|
718
|
+
- New test cases added: {NEW_TEST_CASES}
|
|
719
|
+
- New test files created: {NEW_TEST_FILES}
|
|
553
720
|
```
|
|
554
721
|
|
|
555
722
|
## Error Recovery
|
|
@@ -563,7 +730,6 @@ If merge fails (e.g., branch protection, merge conflicts from a prior PR):
|
|
|
563
730
|
- **Copilot review loop exceeds 5 iterations per PR**: stop iterating on that PR, inform user, proceed to merge
|
|
564
731
|
- **Existing worktree found at startup**: ask user — resume (reuse worktree) or cleanup (remove and start fresh)
|
|
565
732
|
- **No findings above LOW**: skip Phases 3-7, print "No actionable findings" with the LOW summary
|
|
566
|
-
- **Browser not authenticated**: use `AskUserQuestion` to ask the user to log in — never skip this or close the browser
|
|
567
733
|
- **Merge conflict after prior PR merged**: rebase the branch onto the updated default branch, push with `--force-with-lease`, re-run CI
|
|
568
734
|
|
|
569
735
|
!`cat ~/.claude/lib/graphql-escaping.md`
|
|
@@ -576,6 +742,7 @@ If merge fails (e.g., branch protection, merge conflicts from a prior PR):
|
|
|
576
742
|
- Each file appears in exactly ONE PR (file ownership map) to prevent merge conflicts between PRs
|
|
577
743
|
- When extracting modules, always add backward-compatible re-exports in the original module to prevent cross-PR breakage
|
|
578
744
|
- Version bump happens exactly once on the first category branch based on aggregate commit analysis
|
|
579
|
-
- Only CRITICAL, HIGH, and MEDIUM findings are auto-remediated; LOW
|
|
745
|
+
- Only CRITICAL, HIGH, and MEDIUM findings are auto-remediated for code categories; LOW findings remain tracked in PLAN.md
|
|
746
|
+
- Test Quality & Coverage findings are remediated in Phase 4c with a dedicated test enhancement agent that verifies tests fail when code is broken
|
|
580
747
|
- GitLab projects skip the Copilot review loop entirely (Phase 6) and stop after MR creation
|
|
581
748
|
- CI must pass on each PR before requesting Copilot review or merging
|
package/lib/code-review-checklist.md
CHANGED
|
@@ -60,6 +60,7 @@
|
|
|
60
60
|
**Validation & consistency** _[applies when: code handles user input, schemas, or API contracts]_
|
|
61
61
|
- API versioning: breaking changes to public endpoints without version bump or deprecation path
|
|
62
62
|
- Backward-incompatible response shape changes without client migration plan
|
|
63
|
+
- Backward compatibility breaking changes — renamed/removed config keys, changed file formats, altered DB schemas, modified event payloads, or restructured persisted data (localStorage, files, database rows) without a migration path or fallback that reads the old format. Trace all consumers of the changed contract (other services, CLI versions, stored data) and verify they still work or have an upgrade path. For schema changes, require a migration script; for config/format changes, support both old and new formats during a transition period or provide a one-time converter
|
|
63
64
|
- New endpoints/schemas should match validation patterns of existing similar endpoints — field limits, required fields, types, error handling. If validation exists on one endpoint for a param, the same param on other endpoints needs the same validation
|
|
64
65
|
- When a validation/sanitization function is introduced for a field, trace ALL write paths (create, update, sync, import) — partial application means invalid values re-enter through the unguarded path
|
|
65
66
|
- Schema fields accepting values downstream code can't handle; Zod/schema stripping fields the service reads (silent `undefined`); config values persisted but silently ignored by the implementation — trace each field through schema → service → consumer. Update schemas derived from create schemas (e.g., `.partial()`) must also make nested object fields optional — shallow partial on a deeply-required schema rejects valid partial updates. Additionally, `.deepPartial()` or `.partial()` on schemas with `.default()` values will apply those defaults on update, silently overwriting existing persisted values with defaults — create explicit update schemas without defaults instead
|
|
@@ -133,6 +134,7 @@
|
|
|
133
134
|
- Labels, comments, status messages, or documentation that describe behavior the code doesn't implement — e.g., a map named "renamed" that only deletes, or an action labeled "migrated" that never creates the target
|
|
134
135
|
- Inline code examples, command templates, and query snippets that aren't syntactically valid as written — template placeholders must use a consistent format, queries must use correct syntax for their language (e.g., single `{}` in GraphQL, not `{{}}`)
|
|
135
136
|
- Cross-references between files (identifiers, parameter names, format conventions, operational thresholds) that disagree — when one reference changes, trace all other files that reference the same entity and update them
|
|
137
|
+
- Template/workflow variables referenced (`{VAR_NAME}`) but never assigned — trace each placeholder to a definition step; undefined variables cause silent failures or confusing instructions. Also check for colliding identifiers (two distinct concepts mapped to the same slug, key, or name)
|
|
136
138
|
- Responsibility relocated from one module to another (e.g., writes moved from handler to middleware) without updating all consumers that depended on the old location's timing, return value, or side effects — trace callers that relied on the synchronous or co-located behavior and verify they still work with the new execution point. Remove dead code left behind at the old location
|
|
137
139
|
- Sequential instructions or steps whose ordering doesn't match the required execution order — readers following in order will perform actions at the wrong time (e.g., "record X" in step 2 when X must be captured before step 1's action)
|
|
138
140
|
- Sequential numbering (section numbers, step numbers) with gaps or jumps after edits — verify continuity
|
|
@@ -142,6 +144,11 @@
|
|
|
142
144
|
- Tracking/checkpoint files that default to empty on parse failure — causes full re-execution. Fail loudly instead
|
|
143
145
|
- Registering references to resources without verifying the resource exists — dangling references after failed operations
|
|
144
146
|
|
|
147
|
+
**Automated pipeline discipline**
|
|
148
|
+
- Internal code review must run on all automated remediation changes BEFORE creating PRs — never go straight from "tests pass" to PR creation
|
|
149
|
+
- Copilot review must complete (approved or commented) on all PRs before merging — never merge while reviews are still pending unless the user explicitly approves
|
|
150
|
+
- Automated agents may introduce subtle issues that pass tests but violate project conventions — review agent output against CLAUDE.md conventions
|
|
151
|
+
|
|
145
152
|
**AI-generated code quality** _(Claude 4.6 specific failure modes)_
|
|
146
153
|
- Over-engineering: new abstractions, wrapper functions, helper files, or utility modules that serve only one call site — inline the logic instead
|
|
147
154
|
- Feature flags, configuration options, or extension points with only one possible value or consumer
|
package/package.json
CHANGED