agentic-sdlc-wizard 1.21.0 → 1.23.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +43 -0
- package/CLAUDE_CODE_SDLC_WIZARD.md +122 -10
- package/README.md +1 -1
- package/cli/init.js +5 -4
- package/cli/templates/hooks/instructions-loaded-check.sh +12 -0
- package/cli/templates/skills/feedback/SKILL.md +92 -0
- package/cli/templates/skills/sdlc/SKILL.md +277 -118
- package/cli/templates/skills/setup/SKILL.md +11 -0
- package/cli/templates/skills/update/SKILL.md +4 -3
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -4,6 +4,49 @@ All notable changes to the SDLC Wizard.
 
 > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
 
+## [1.23.0] - 2026-04-01
+
+### Added
+- Update notification hook — `instructions-loaded-check.sh` checks npm for newer wizard version each session. Non-blocking, graceful on network failure. One-liner: "SDLC Wizard update available: X → Y (run /update-wizard)" (#64)
+- Cross-model review standardization — mission-first handoff (mission/success/failure fields), preflight self-review doc, verification checklist, adversarial framing, domain template guidance, convergence reduced to 2-3 rounds. Audited 4 repos + 14 external repos + 7 papers (#72, #56)
+- Release Planning Gate — section in SDLC skill. Before implementing release items: list all, plan each at 95% confidence, identify blocks, present plans as batch. Prove It Gate strengthened with absorption check (#73)
+- 6 quality tests for update notification (fake npm in PATH, version comparison, failure modes)
+- 12 quality tests for cross-model review, context position, release planning
+- Testing Diamond boundary table — explicit E2E (UI/browser ~5%) vs Integration (API/no UI ~90%) vs Unit (pure logic ~5%) in SKILL.md and wizard doc (#65)
+- Skill frontmatter docs — expanded to full table covering `paths:`, `context: fork`, `effort:`, `disable-model-invocation:`, `argument-hint:` (#69)
+- `--bare` mode documentation in SKILL.md — complete wizard bypass warning for scripted headless calls (#70)
+- 6 quality tests for #65/#69/#70
+- "NEVER AUTO-MERGE" enforcement gate in CI Shepherd section — same weight as "ALL TESTS MUST PASS." Full shepherd sequence documented as mandatory (post-mortem from PR #145 incident)
+- Post-Mortem pattern — when process fails, feed it back: Incident → Root Cause → New Rule → Test → Ship. "Every mistake becomes a rule"
+- 4 quality tests for enforcement gate + post-mortem
+
+### Fixed
+- Dead-code pipe in `test_prove_it_absorption()` — `grep -qi | grep -qi` was a no-op (P1 from PR #145 CI review)
+
+### Changed
+- Moved "ALL TESTS MUST PASS" from 61% depth to 11% depth in SDLC skill (Lost in the Middle fix) (#57)
+- Prove It Gate now requires absorption check — "can this be a section in an existing skill?" — before proposing new skills/components
+- Wizard "E2E vs Manual Testing" section replaced with "E2E vs Integration — The Critical Boundary" (#65)
+- Wizard "Skill Effort Frontmatter" section expanded to "Skill Frontmatter Fields" with full field reference (#69)
+
+## [1.22.0] - 2026-04-01
+
+### Added
+- Plan Auto-Approval Gate — skip plan approval when confidence >= 95% AND task is single-file/trivial. Still announces approach, just doesn't wait. "When in doubt, wait for approval" (#53)
+- Debugging Workflow section — systematic Reproduce → Isolate → Root Cause → Fix → Regression Test methodology. `git bisect` for regressions, environment-specific debugging guidance (#55)
+- `/feedback` skill — privacy-first community contribution loop. Bug reports, feature requests, pattern sharing, SDLC improvements. Never scans without explicit consent. Creates GH issues on wizard repo (#37)
+- BRANDING.md detection in setup wizard — scans for brand/, logos/, style-guide.md, brand-voice.md. Conditional generation only when branding assets found (#44)
+- N-Reviewer CI Pipeline guidance — address each reviewer independently, resolve conflicts, max 3 iterations per reviewer (#32)
+- Custom Subagents documentation — `.claude/agents/` pattern for sdlc-reviewer, ci-debug, test-writer agents. Skills vs agents comparison (#45)
+- CLI distributes `/feedback` skill (9 template files, was 8)
+- Improved CLI install restart messaging — `--continue` promoted as primary option for preserving conversation history
+- 20 new tests across all 6 roadmap items
+
+### Changed
+- SDLC skill: added Auto-Approval, Debugging Workflow, Multiple Reviewers, Custom Subagents sections
+- Setup skill: added branding asset detection (Step 1) and BRANDING.md generation (Step 8.5)
+- Wizard doc: added Plan Auto-Approval, Debugging Workflow, N-Reviewer Pipeline, Custom Subagents, BRANDING.md template
+
 ## [1.21.0] - 2026-03-31
 
 ### Added
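To make the dead-code-pipe fix in the Fixed section concrete, here is a minimal shell sketch (file name and patterns are hypothetical): `grep -q` prints nothing on a match, so a second `grep -q` reading from the pipe always sees empty input, and the pipeline's exit status is always failure.

```shell
# Hypothetical file used to illustrate the bug class.
printf 'Absorption check: can this be a section in an existing skill?\n' > /tmp/skill_demo.md

# BROKEN: `grep -q` prints nothing on match, so the second grep always
# reads empty input and fails; the pipeline's status is always nonzero.
if grep -qi 'absorption' /tmp/skill_demo.md | grep -qi 'section'; then
  echo "pipe: match"
else
  echo "pipe: no match (dead code)"
fi

# FIXED: run two independent greps against the file, chained with &&.
if grep -qi 'absorption' /tmp/skill_demo.md && grep -qi 'section' /tmp/skill_demo.md; then
  echo "fixed: match"
fi
```

The broken form always takes the `else` branch even though both patterns are present in the file, which is exactly why the test was dead code.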
package/CLAUDE_CODE_SDLC_WIZARD.md
CHANGED

@@ -307,9 +307,24 @@ New built-in commands available to use alongside the wizard:
 
 **Tip**: `/simplify` pairs well with the self-review phase. Run it after implementation as an additional quality check.
 
-### Skill
+### Skill Frontmatter Fields (v2.1.80+)
 
-Skills
+Skills support these frontmatter fields:
+
+| Field | Purpose | Example |
+|-------|---------|---------|
+| `name` | Skill name (matches `/command`) | `name: sdlc` |
+| `description` | Trigger description for auto-invocation | `description: Full SDLC workflow...` |
+| `effort` | Set reasoning effort level | `effort: high` |
+| `paths` | Restrict skill to specific file patterns | `paths: ["src/**/*.ts", "tests/**"]` |
+| `context` | Context mode (`fork` = isolated subagent) | `context: fork` |
+| `argument-hint` | Hint for `$ARGUMENTS` placeholder | `argument-hint: [task description]` |
+| `disable-model-invocation` | Prevent skill from being auto-invoked by model | `disable-model-invocation: true` |
+
+**Key fields explained:**
+- **`effort: high`** — The wizard's `/sdlc` skill uses this to ensure Claude gives full attention. `max` is available but costs significantly more tokens.
+- **`paths:`** — Limits when a skill activates based on files being worked on. Useful for language-specific or directory-specific skills.
+- **`context: fork`** — Runs the skill in an isolated subagent context. The subagent gets its own context window, so it won't pollute the main conversation. Useful for review skills or analysis that should run independently.
 
 ### InstructionsLoaded Hook (v2.1.69+)
 
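Putting the table together: a hypothetical review skill that only activates on TypeScript files and runs in its own forked context might declare the following frontmatter. This is an illustrative sketch; the field names come from the table above, but the values and the skill itself are invented, not shipped by the wizard.

```markdown
---
name: ts-review
description: Review TypeScript changes for type-safety issues.
effort: high
context: fork
paths: ["src/**/*.ts"]
argument-hint: [files or PR to review]
disable-model-invocation: false
---
```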
@@ -418,6 +433,8 @@ After planning, you get a free `/compact` - Claude's plan is preserved in the su
 
 4. You run `/compact` → frees context, plan preserved in summary
 5. Claude implements with clean context
 
+**Plan Auto-Approval:** For HIGH confidence (95%+) tasks that are single-file or trivial (config tweak, small bug fix, string change) with no new patterns — skip plan approval and go straight to TDD. Claude still announces the approach but doesn't wait for approval. When in doubt, wait.
+
 ### 2. Confidence Levels Prevent Disasters
 
 Claude MUST state confidence before implementing:
@@ -473,10 +490,11 @@ Here's the "Testing Diamond" approach (recommended for AI agents):
 
 - **Confidence**: If integration tests pass, production usually works
 - **AI-friendly**: Give Claude concrete pass/fail feedback on real behavior
 
-**E2E vs
-- **E2E (
-- **
-- **
+**E2E vs Integration — The Critical Boundary:**
+- **E2E**: Tests that go through the user's actual UI/browser (Playwright, Cypress). ~5% of suite.
+- **Integration**: Tests that hit real systems via API without UI — real DB, real cache, real services. ~90% of suite.
+- **Unit**: Pure logic only — no DB, no API, no filesystem. ~5% of suite.
+- **The rule**: If your test doesn't open a browser or render a UI, it's not E2E — it's integration. Mislabeling leads to overinvestment in slow browser tests.
 
 **But your team decides:**
 
@@ -1773,6 +1791,19 @@ TodoWrite([
 
 **Before TDD, MUST ask:** "Docs updated. Run `/compact` before implementation?"
 
+### Auto-Approval: Skip Plan Approval Step
+
+If ALL of these are true, skip plan approval and go straight to TDD:
+- Confidence is **HIGH (95%+)** — you know exactly what to do
+- Task is **single-file or trivial** (config tweak, small bug fix, string change)
+- No new patterns introduced
+- No architectural decisions
+
+When auto-approving, still announce your approach — just don't wait for approval:
+> "Confidence HIGH (95%). Single-file change. Proceeding directly to TDD."
+
+**When in doubt, wait for approval.** Auto-approval is for clear-cut cases only.
+
 ## Confidence Check (REQUIRED)
 
 Before presenting approach, STATE your confidence:
@@ -1927,6 +1958,28 @@ Before any release/publish, add these to `review_instructions`:
 
 Evidence: v1.20.0 cross-model review caught CHANGELOG section loss and stale wizard version examples that passed all tests and self-review.
 
+### Multiple Reviewers (N-Reviewer Pipeline)
+
+When multiple reviewers comment on a PR (Claude, Codex, human reviewers), address each reviewer independently:
+
+1. **Read all reviews** — collect feedback from every active reviewer
+2. **Respond per-reviewer** — each reviewer has different blind spots. Address each one's findings separately
+3. **Resolve conflicts** — if reviewers disagree, pick the stronger argument, note why
+4. **Iterate until all approve** — don't merge until every active reviewer is satisfied
+5. **Max 3 iterations per reviewer** — escalate to user if a reviewer keeps finding new things
+
+The value of multiple reviewers: different models/humans catch different issues. No single reviewer is sufficient for high-stakes changes.
+
+### Custom Subagents (`.claude/agents/`)
+
+Claude Code supports custom subagents in `.claude/agents/`. These run as independent subprocesses focused on a single task:
+
+- **`sdlc-reviewer`** — SDLC compliance review (planning, TDD, self-review checks)
+- **`ci-debug`** — CI failure diagnosis (reads logs, identifies root cause)
+- **`test-writer`** — Quality test writing following TESTING.md philosophies
+
+**Skills vs agents:** Skills guide Claude's behavior for a task type. Agents are independent subprocesses that run autonomously and return results. Use agents when you want parallel work or a fresh context window.
+
 ## Test Review (Harder Than Implementation)
 
 During self-review, critique tests HARDER than app code:
@@ -2004,6 +2057,24 @@ Sometimes the flakiness is genuinely in CI infrastructure (runner environment, G
 
 - **Keep quality gates strict** — the actual pass/fail decision must NOT have `continue-on-error`
 - **Separate "fail the build" from "nice to have"** — a missing PR comment is not a regression
 
+## Debugging Workflow (Systematic Investigation)
+
+When something breaks and the cause isn't obvious, follow this systematic debugging workflow:
+
+```
+Reproduce → Isolate → Root Cause → Fix → Regression Test
+```
+
+1. **Reproduce** — Can you make it fail consistently? If intermittent, stress-test (run N times). If you can't reproduce it, you can't fix it
+2. **Isolate** — Narrow the scope. Which file? Which function? Which input? Use binary search: comment out half the code, does it still fail?
+3. **Root cause** — Don't fix symptoms. Ask "why?" until you hit the actual cause. "It crashes on line 42" is a symptom. "Null pointer because the API returns undefined when rate-limited" is a root cause
+4. **Fix** — Fix the root cause, not the symptom
+5. **Regression test** — Write a test that fails without your fix and passes with it (TDD GREEN)
+
+**For regressions** (it worked before, now it doesn't): Use `git bisect` to find the exact breaking commit. `git bisect start`, `git bisect bad` (current), `git bisect good <known-good-commit>`. Narrows to the breaking commit in O(log n) steps.
+
+**Environment-specific bugs** (works locally, fails in CI/staging/prod): Check environment differences (env vars, OS version, dependency versions, file permissions). Reproduce the environment locally if possible. Add logging at the failure point — don't guess, observe.
+
 ## CI Feedback Loop — Local Shepherd (After Commit)
 
 **This is the "local shepherd" — your CI fix mechanism.** It runs in your active session with full context.
@@ -2346,7 +2417,7 @@ If deployment fails or post-deploy verification catches issues:
 
 **SDLC.md:**
 ```markdown
-<!-- SDLC Wizard Version: 1.
+<!-- SDLC Wizard Version: 1.23.0 -->
 <!-- Setup Date: [DATE] -->
 <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: [PRs or Solo] -->
@@ -2455,6 +2526,36 @@ Reference: `components/ui/` or Storybook
 
 **If you have external design system:** Point to Storybook/Figma URL instead of duplicating.
 
+### BRANDING.md (If Branding Assets Detected)
+
+**Only generated if branding-related files are found:** BRANDING.md, brand/, logos/, style-guide.md, brand-voice.md, tone-of-voice.*, or UI/content-heavy project patterns.
+
+```markdown
+# Brand Guidelines
+
+## Brand Voice & Tone
+- [Detected from brand-voice.md or style guide, or ask user]
+- Formal/casual/technical/friendly
+- Target audience description
+
+## Naming Conventions
+- Product name: [official name, capitalization]
+- Feature names: [naming pattern]
+- Technical terminology: [glossary of project-specific terms]
+
+## Visual Identity
+- Logo usage: [reference to logo files or guidelines]
+- Color palette: [reference to DESIGN_SYSTEM.md if exists]
+- Typography: [font choices and usage]
+
+## Content Style
+- [Any content writing guidelines]
+- [Error message tone]
+- [User-facing copy standards]
+```
+
+**Why BRANDING.md?** Claude writing user-facing copy, error messages, or documentation needs to know the brand voice. Without this, output tone is inconsistent. Skip for backend-only or internal-tool projects.
+
 ---
 
 ## Step 10: Verify Setup (Claude Does This Automatically)
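The detection pass described above ("only generated if branding-related files are found") can be sketched in a few lines of shell. This is a minimal illustration, not the wizard's actual implementation; the marker list comes from the section above, while the temp-directory setup is hypothetical.

```shell
# Hypothetical project root with two of the trigger files present.
root=$(mktemp -d)
mkdir -p "$root/brand"
touch "$root/style-guide.md"

# Collect whichever branding markers exist in the project root.
found=""
for marker in BRANDING.md brand logos style-guide.md brand-voice.md; do
  [ -e "$root/$marker" ] && found="$found $marker"
done

if [ -n "$found" ]; then
  echo "branding assets detected:$found -> generate BRANDING.md"
else
  echo "no branding assets -> skip BRANDING.md"
fi
```

With `brand/` and `style-guide.md` present, the generation branch fires; an empty project root takes the skip branch, matching the "skip for backend-only projects" rule.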
@@ -2712,6 +2813,7 @@ Docs drift when code changes but docs don't. The SDLC skill's planning phase det
 
 - During planning, Claude reads feature docs for the area being changed
 - If the code change contradicts what the doc says, Claude updates the doc
 - The "After Session" step routes learnings to the right doc
+- Plan files get closed out — if the session's work came from a plan, it gets deleted or marked complete so future sessions aren't misled
 - Stale docs cause low confidence — if Claude struggles, the doc may need updating
 
 **CLAUDE.md health:** Run `/claude-md-improver` periodically (quarterly or after major changes). It audits CLAUDE.md specifically — structure, clarity, completeness (6 criteria, 100-point rubric). It does NOT cover feature docs, TESTING.md, or ADRs — the SDLC workflow handles those.
@@ -2929,7 +3031,7 @@ Use an independent AI model from a different company as a code reviewer. The aut
 
 **The Protocol:**
 
 1. Create a `.reviews/` directory in your project
-2. After Claude completes its SDLC loop (self-review passes), write a handoff file:
+2. After Claude completes its SDLC loop (self-review passes), write a preflight doc (what you already checked), then a mission-first handoff file:
 
 ```jsonc
 // .reviews/handoff.json

@@ -2937,12 +3039,22 @@ Use an independent AI model from a different company as a code reviewer. The aut
   "review_id": "feature-xyz-001",
   "status": "PENDING_REVIEW",
   "round": 1,
+  "mission": "What changed and why — context for the reviewer",
+  "success": "What 'correctly reviewed' looks like",
+  "failure": "What gets missed if the reviewer is superficial",
   "files_changed": ["src/auth.ts", "tests/auth.test.ts"],
-  "
+  "verification_checklist": [
+    "(a) Verify input validation at auth.ts:45",
+    "(b) Verify test covers null-token edge case"
+  ],
+  "review_instructions": "Focus on security and edge cases. Assume bugs may be present until proven otherwise.",
+  "preflight_path": ".reviews/preflight-feature-xyz-001.md",
   "artifact_path": ".reviews/feature-xyz-001/"
 }
 ```
 
+The `mission/success/failure` fields give the reviewer context. Without them, you get generic "looks good" feedback. With them, reviewers dig into source files and verify specific claims. The `verification_checklist` tells the reviewer exactly what to verify — not "review this" but specific items with file:line references.
+
 3. Run the independent reviewer (Round 1 — full review). These commands use your Codex default model — configure it to the latest, most capable model available:
 
 ```bash
@@ -3217,7 +3329,7 @@ Walk through updates? (y/n)
 
 Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
 
 ```markdown
-<!-- SDLC Wizard Version: 1.
+<!-- SDLC Wizard Version: 1.23.0 -->
 <!-- Setup Date: 2026-01-24 -->
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: PRs -->
package/README.md
CHANGED
@@ -186,7 +186,7 @@ This isn't the only Claude Code SDLC tool. Here's an honest comparison:
 
 |--------|------------|----------------------|-------------|
 | **Focus** | SDLC enforcement + measurement | Agent performance optimization | Plugin marketplace |
 | **Hooks** | 3 (SDLC, TDD, instructions) | 12+ (dev blocker, prettier, etc.) | Webhook watcher |
-| **Skills** |
+| **Skills** | 4 (/sdlc, /setup, /update, /feedback) | 80+ domain-specific | 13 slash commands |
 | **Evaluation** | 95% CI, CUSUM, SDP, Tier 1/2 | Configuration testing | skilltest framework |
 | **CI Shepherd** | Local CI fix loop | No | No |
 | **Auto-updates** | Weekly CC + community scan | No | No |
package/cli/init.js
CHANGED
@@ -22,6 +22,7 @@ const FILES = [
   { src: 'skills/sdlc/SKILL.md', dest: '.claude/skills/sdlc/SKILL.md' },
   { src: 'skills/setup/SKILL.md', dest: '.claude/skills/setup/SKILL.md' },
   { src: 'skills/update/SKILL.md', dest: '.claude/skills/update/SKILL.md' },
+  { src: 'skills/feedback/SKILL.md', dest: '.claude/skills/feedback/SKILL.md' },
 ];
 
 const WIZARD_HOOK_MARKERS = FILES

@@ -224,12 +225,12 @@ function init(targetDir, { force = false, dryRun = false } = {}) {
   console.log(`
 ${GREEN}SDLC Wizard installed successfully!${RESET}
 
-${YELLOW}
-
-
+${YELLOW}Restart Claude Code${RESET} to load new hooks and skills:
+  ${CYAN}/exit${RESET} then ${CYAN}claude --continue${RESET} (keeps conversation history)
+  ${CYAN}/exit${RESET} then ${CYAN}claude${RESET} (fresh start)
 
 Next steps:
-1.
+1. Restart Claude Code (see above)
 2. Tell Claude anything — setup auto-invokes when SDLC files are missing
 3. Claude reads the wizard doc and creates CLAUDE.md, SDLC.md, TESTING.md, ARCHITECTURE.md
package/cli/templates/hooks/instructions-loaded-check.sh
CHANGED

@@ -20,4 +20,16 @@ if [ -n "$MISSING" ]; then
   echo "Invoke Skill tool, skill=\"setup-wizard\" to generate them."
 fi
 
+# Version update check (non-blocking, best-effort)
+SDLC_MD="$PROJECT_DIR/SDLC.md"
+if [ -f "$SDLC_MD" ]; then
+  INSTALLED_VERSION=$(grep -o 'SDLC Wizard Version: [0-9.]*' "$SDLC_MD" | head -1 | sed 's/SDLC Wizard Version: //')
+  if [ -n "$INSTALLED_VERSION" ] && command -v npm > /dev/null 2>&1; then
+    LATEST_VERSION=$(npm view agentic-sdlc-wizard version 2>/dev/null) || true
+    if [ -n "$LATEST_VERSION" ] && [ "$LATEST_VERSION" != "$INSTALLED_VERSION" ]; then
+      echo "SDLC Wizard update available: ${INSTALLED_VERSION} → ${LATEST_VERSION} (run /update-wizard)"
+    fi
+  fi
+fi
+
 exit 0
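The hook's version-extraction logic can be tried in isolation against a fake SDLC.md. The snippet below reuses the same `grep`/`sed` pipeline as the hook; the hard-coded `LATEST` is a stand-in for the `npm view agentic-sdlc-wizard version` network call.

```shell
# Fake SDLC.md carrying the metadata comment the hook greps for.
work=$(mktemp -d)
cat > "$work/SDLC.md" <<'EOF'
<!-- SDLC Wizard Version: 1.21.0 -->
<!-- Setup Date: 2026-01-24 -->
EOF

# Same extraction pipeline as the hook.
INSTALLED=$(grep -o 'SDLC Wizard Version: [0-9.]*' "$work/SDLC.md" | head -1 | sed 's/SDLC Wizard Version: //')
LATEST="1.23.0"   # stand-in for: npm view agentic-sdlc-wizard version

if [ -n "$LATEST" ] && [ "$LATEST" != "$INSTALLED" ]; then
  echo "SDLC Wizard update available: ${INSTALLED} → ${LATEST} (run /update-wizard)"
fi
```

Note that the `!=` comparison fires on any difference, including a local version ahead of the registry; for an advisory, non-blocking one-liner that trade-off is acceptable.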
package/cli/templates/skills/feedback/SKILL.md
ADDED

@@ -0,0 +1,92 @@
+---
+name: feedback
+description: Submit feedback, bug reports, feature requests, or share SDLC patterns you've discovered. Privacy-first — always asks before scanning.
+argument-hint: [optional: bug | feature | pattern | improvement]
+effort: medium
+---
+# Feedback — Community Contribution Loop
+
+## Task
+$ARGUMENTS
+
+## Purpose
+
+Help users contribute back to the SDLC wizard: bug reports, feature requests, pattern sharing, and SDLC improvements. Privacy-first — never scan without explicit permission.
+
+## Privacy & Permission (MANDATORY)
+
+**NEVER scan the user's repo without explicit consent.** Always ask first:
+
+> "I can scan your SDLC setup to identify what you've customized vs wizard defaults. This helps me create a more specific report. May I scan? (Only file names and SDLC config are read — no source code, secrets, or business logic.)"
+
+**What IS scanned (with permission):**
+- SDLC.md, TESTING.md, CLAUDE.md structure (not content details)
+- Hook file names and which hooks are active
+- Skill names and which skills exist
+- .claude/settings.json hook configuration (not allowedTools or secrets)
+
+**What is NEVER scanned:**
+- Source code files
+- .env files, secrets, credentials
+- Business logic or proprietary code
+- Git history or commit messages
+
+## Feedback Types
+
+### Bug Report
+1. Ask user to describe the issue
+2. With permission, check which wizard version is installed (`SDLC.md` metadata)
+3. Check if hooks are properly configured
+4. Create a GitHub issue with reproduction steps
+
+### Feature Request
+1. Ask user what they want
+2. With permission, check if a similar capability already exists in their setup
+3. Create a GitHub issue with the request and context
+
+### Pattern Sharing
+1. Ask user what pattern they've discovered (custom hook, modified philosophy, test approach)
+2. With permission, diff their SDLC setup against wizard defaults to identify customizations
+3. Ask: "Which of these customizations worked well for you?"
+4. Create a GitHub issue describing the pattern and evidence it works
+
+### SDLC Improvement
+1. Ask what could be better about the SDLC workflow
+2. With permission, check which SDLC steps they use most/least
+3. Create a GitHub issue with the improvement suggestion
+
+## Creating the Issue
+
+Use `gh issue create` on the wizard repo:
+
+```bash
+gh issue create \
+  --repo BaseInfinity/agentic-ai-sdlc-wizard \
+  --title "[feedback-type]: Brief description" \
+  --body "$(cat <<'EOF'
+## Feedback Type
+bug / feature / pattern / improvement
+
+## Description
+[User's description]
+
+## Context
+- Wizard version: [from SDLC.md metadata]
+- Setup type: [detected stack if permission granted]
+
+## Evidence (if pattern sharing)
+[What the user customized and why it worked]
+
+---
+Submitted via `/feedback` skill
+EOF
+)"
+```
+
+## Rules
+
+- **Privacy first** — always ask before scanning anything
+- **Opt-in only** — if user declines scan, still create the issue with whatever they tell you manually
+- **No source code** — never include source code snippets in issues
+- **Be specific** — vague issues waste maintainer time. Ask clarifying questions
+- **Check for duplicates** — `gh issue list --repo BaseInfinity/agentic-ai-sdlc-wizard --search "keywords"` before creating
package/cli/templates/skills/sdlc/SKILL.md
CHANGED

@@ -48,7 +48,8 @@ TodoWrite([
   { content: "Post-deploy verification (if deploy task — see Deployment Tasks)", status: "pending", activeForm: "Verifying deployment" },
   // FINAL
   { content: "Present summary: changes, tests, CI status", status: "pending", activeForm: "Presenting final summary" },
-  { content: "Capture learnings (if any — update TESTING.md, CLAUDE.md, or feature docs)", status: "pending", activeForm: "Capturing session learnings" }
+  { content: "Capture learnings (if any — update TESTING.md, CLAUDE.md, or feature docs)", status: "pending", activeForm: "Capturing session learnings" },
+  { content: "Close out plan files: if task came from a plan, mark complete or delete", status: "pending", activeForm: "Closing plan artifacts" }
 ])
 ```
 
@@ -72,6 +73,42 @@ Your work is scored on these criteria. **Critical** criteria are must-pass.
 
 Critical miss on `tdd_red` or `self_review` = process failure regardless of total score.
 
+## Test Failure Recovery (SDET Philosophy)
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│  ALL TESTS MUST PASS. NO EXCEPTIONS.                                │
+│                                                                     │
+│  This is not negotiable. This is not flexible. This is absolute.    │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+**Not acceptable:**
+- "Those were already failing" → Fix them first
+- "Not related to my changes" → Doesn't matter, fix it
+- "It's flaky" → Flaky = bug, investigate
+
+**Treat test code like app code.** Test failures are bugs. Investigate them the way a 15-year SDET would - with thought and care, not by brushing them aside.
+
+If tests fail:
+1. Identify which test(s) failed
+2. Diagnose WHY - this is the important part:
+   - Your code broke it? Fix your code (regression)
+   - Test is for deleted code? Delete the test
+   - Test has wrong assertions? Fix the test
+   - Test is "flaky"? Investigate - flakiness is just another word for bug
+3. Fix appropriately (fix code, fix test, or delete dead test)
+4. Run specific test individually first
+5. Then run ALL tests
+6. Still failing? ASK USER - don't spin your wheels
+
+**Flaky tests are bugs, not mysteries:**
+- Sometimes the bug is in app code (race condition, timing issue)
+- Sometimes the bug is in test code (shared state, not parallel-safe)
+- Sometimes the bug is in test environment (cleanup not proper)
+
+Debug it. Find root cause. Fix it properly. Tests ARE code.
+
 ## New Pattern & Test Design Scrutiny (PLANNING)
 
 **New design patterns require human approval:**
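The "It's flaky → investigate" rule starts with measurement: run the suspect test N times and count failures before theorizing. A minimal sketch of that stress loop is below; the inner `sh -c "exit ..."` command is a deterministic stand-in (it fails whenever the iteration count is a multiple of 5) that you would replace with your real test command.

```shell
# Stress a suspect test N times and count failures.
# The inner command is a deterministic stand-in for a flaky test:
# swap in your real runner, e.g. `npm test -- --filter my_flaky_test`.
N=20
fails=0
i=1
while [ "$i" -le "$N" ]; do
  if ! sh -c "exit \$(( $i % 5 == 0 ))"; then
    fails=$((fails + 1))
  fi
  i=$((i + 1))
done
echo "failures: $fails / $N"
```

A nonzero count is your reproduction: from here the Debugging Workflow applies (isolate, root-cause, fix, regression test), rather than rerunning until green.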
@@ -89,11 +126,12 @@ Critical miss on `tdd_red` or `self_review` = process failure regardless of tota
 
 **Adding a new skill, hook, workflow, or component? PROVE IT FIRST:**
 
-1. **
-2. **
-3. **If
-4. **
-5. **
+1. **Absorption check:** Can this be added as a section in an existing skill instead of a new component? Default is YES — new skills/hooks need strong justification. Releasing is SDLC, not a separate skill. Debugging is SDLC, not a separate skill. Keep it lean
+2. **Research:** Does something equivalent already exist (native CC, third-party plugin, existing skill)?
+3. **If YES:** Why is yours better? Show evidence (A/B test, quality comparison, gap analysis)
+4. **If NO:** What gap does this fill? Is the gap real or theoretical?
+5. **Quality tests:** New additions MUST have tests that prove OUTPUT QUALITY, not just existence
+6. **Less is more:** Every addition is maintenance burden. Default answer is NO unless proven YES
 
 **Existence tests are NOT quality tests:**
 - BAD: "ci-analyzer skill file exists" — proves nothing about quality
@@ -110,6 +148,19 @@ Critical miss on `tdd_red` or `self_review` = process failure regardless of tota
 
 2. **Transition** (after approval): Update feature docs
 3. **Implementation**: TDD RED -> GREEN -> PASS
 
+### Auto-Approval: Skip Plan Approval Step
+
+If ALL of these are true, skip plan approval and go straight to TDD:
+- Confidence is **HIGH (95%+)** — you know exactly what to do
+- Task is **single-file or trivial** (config tweak, small bug fix, string change)
+- No new patterns introduced
+- No architectural decisions
+
+When auto-approving, still announce your approach — just don't wait for approval:
+> "Confidence HIGH (95%). Single-file change. Proceeding directly to TDD."
+
+**When in doubt, wait for approval.** Auto-approval is for clear-cut cases only.
+
 ## Confidence Check (REQUIRED)
 
 Before presenting approach, STATE your confidence:
@@ -118,9 +169,9 @@ Before presenting approach, STATE your confidence:
 |-------|---------|--------|--------|
 | HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval | `high` (default) |
 | MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties | `high` (default) |
-| LOW (<60%) | Not sure | ASK USER
-| FAILED 2x | Something's wrong |
-| CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help | Try `/effort max` |
+| LOW (<60%) | Not sure | Do more research or try cross-model research (Codex) to get to 95%. If still LOW after research, ASK USER | Consider `/effort max` |
+| FAILED 2x | Something's wrong | Try cross-model research (Codex) for a fresh perspective. If still stuck, STOP and ASK USER | Try `/effort max` |
+| CONFUSED | Can't diagnose why something is failing | Try cross-model research (Codex). If still confused, STOP. Describe what you tried, ask for help | Try `/effort max` |
 
 ## Self-Review Loop (CRITICAL)
 
@@ -153,36 +204,76 @@ PLANNING -> DOCS -> TDD RED -> TDD GREEN -> Tests Pass -> Self-Review
 
 **Prerequisites:** Codex CLI installed (`npm i -g @openai/codex`), OpenAI API key set.
 
+**The core insight:** The review PROTOCOL is universal across domains. Only the review INSTRUCTIONS change. Code review is the default template below. For non-code domains (research, persuasion, medical content), adapt the `review_instructions` and `verification_checklist` fields while keeping the same handoff/dialogue/convergence loop.
 
+### Step 0: Write Preflight Self-Review Doc
+
+Before submitting to an external reviewer, document what YOU already checked. This is proven to reduce reviewer findings to 0-1 per round (evidence: anticheat repo preflight discipline).
+
+Write `.reviews/preflight-{review_id}.md`:
+```markdown
+## Preflight Self-Review: {feature}
+- [ ] Self-review via /code-review passed
+- [ ] All tests passing
+- [ ] Checked for: [specific concerns for this change]
+- [ ] Verified: [what you manually confirmed]
+- [ ] Known limitations: [what you couldn't verify]
+```
+
+### Step 1: Write Mission-First Handoff
+
+After self-review and preflight pass, write `.reviews/handoff.json`:
+```jsonc
+{
+  "review_id": "feature-xyz-001",
+  "status": "PENDING_REVIEW",
+  "round": 1,
+  "mission": "What changed and why — 2-3 sentences of context",
+  "success": "What 'correctly reviewed' looks like — the reviewer's goal",
+  "failure": "What gets missed if the reviewer is superficial",
+  "files_changed": ["src/auth.ts", "tests/auth.test.ts"],
+  "fixes_applied": [],
+  "previous_score": null,
+  "verification_checklist": [
+    "(a) Verify input validation at auth.ts:45 handles empty strings",
+    "(b) Verify test covers the null-token edge case",
+    "(c) Check no hardcoded secrets in diff"
+  ],
+  "review_instructions": "Focus on security and edge cases. Be strict — assume bugs may be present until proven otherwise.",
+  "preflight_path": ".reviews/preflight-feature-xyz-001.md",
+  "artifact_path": ".reviews/feature-xyz-001/"
+}
+```
 
+**Key fields explained:**
+- `mission/success/failure` — Gives the reviewer context. Without this, you get generic "looks good" feedback. With it, reviewers read raw source files and verify specific claims (proven across 4 repos)
+- `verification_checklist` — Specific things to verify with file:line references. NOT "review for correctness" — that's too vague. Each item is independently verifiable
+- `preflight_path` — Shows the reviewer what you already checked, so they focus on what you might have missed
+
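A handoff missing its mission-first fields earns generic feedback, so a cheap structural check before invoking the reviewer pays for itself. A sketch (assumes `python3`; the check itself is not part of the wizard):

```shell
# A deliberately incomplete handoff, standing in for .reviews/handoff.json.
cat > /tmp/handoff.json <<'EOF'
{"review_id": "feature-xyz-001", "status": "PENDING_REVIEW", "round": 1,
 "mission": "What changed and why", "success": "Reviewer goal", "failure": "Risk if superficial"}
EOF

# Flag any missing mission-first or checklist fields before submitting.
python3 - <<'EOF'
import json
required = ("mission", "success", "failure", "verification_checklist", "review_instructions")
handoff = json.load(open("/tmp/handoff.json"))
missing = [k for k in required if k not in handoff]
print("missing:", ", ".join(missing) if missing else "none")
EOF
```

Here it reports the absent `verification_checklist` and `review_instructions`, exactly the fields whose absence degrades review quality.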
+### Step 2: Run the Independent Reviewer
+
+```bash
+codex exec \
+  -c 'model_reasoning_effort="xhigh"' \
+  -s danger-full-access \
+  -o .reviews/latest-review.md \
+  "You are an independent code reviewer performing a certification audit. \
+  Read .reviews/handoff.json for full context — mission, success/failure \
+  conditions, and verification checklist. \
+  Verify each checklist item with evidence (file:line, grep results, test output). \
+  Output each finding with: ID (1, 2, ...), severity (P0/P1/P2), evidence, \
+  and a 'certify condition' (what specific change resolves it). \
+  Re-verify any prior-round passes still hold. \
+  End with: score (1-10), CERTIFIED or NOT CERTIFIED."
+```
+
+**Always use `xhigh` reasoning effort.** Lower settings miss subtle errors (wrong-generation references, stale pricing, cross-file inconsistencies).
+
+If CERTIFIED → proceed to CI. If NOT CERTIFIED → go to dialogue loop.
+
+### Step 3: Dialogue Loop
 
+Respond per-finding — don't silently fix everything:
 
 1. Write `.reviews/response.json`:
 ```jsonc
@@ -192,16 +283,16 @@ When the reviewer finds issues, respond per-finding instead of silently fixing e
 "responding_to": ".reviews/latest-review.md",
 "responses": [
 { "finding": "1", "action": "FIXED", "summary": "Added missing validation" },
-{ "finding": "2", "action": "DISPUTED", "justification": "
+{ "finding": "2", "action": "DISPUTED", "justification": "Intentional — see CODE_REVIEW_EXCEPTIONS.md" },
 { "finding": "3", "action": "ACCEPTED", "summary": "Will add test coverage" }
 ]
 }
 ```
-- **FIXED**: "I fixed this. Here
-- **DISPUTED**: "This is intentional/incorrect. Here
-- **ACCEPTED**: "You
+- **FIXED**: "I fixed this. Here's what changed." Reviewer verifies against certify condition.
+- **DISPUTED**: "This is intentional/incorrect. Here's why." Reviewer accepts or rejects with reasoning.
+- **ACCEPTED**: "You're right. Fixing now." (Same as FIXED, batched.)
 
-2. Update `handoff.json`
+2. Update `handoff.json`: increment `round`, set `"status": "PENDING_RECHECK"`, add `fixes_applied` list with numbered items and file:line references, update `previous_score`.
 
 3. Run targeted recheck (NOT a full re-review):
 ```bash
@@ -209,60 +300,88 @@ When the reviewer finds issues, respond per-finding instead of silently fixing e
 -c 'model_reasoning_effort="xhigh"' \
 -s danger-full-access \
 -o .reviews/latest-review.md \
-"
-FIXED → verify the fix against the original certify condition. \
-DISPUTED → evaluate the justification (ACCEPT if sound, REJECT if not). \
+"TARGETED RECHECK — not a full re-review. Read .reviews/handoff.json \
+for previous_review path and response.json for the author's responses. \
+For each finding: FIXED → verify against original certify condition. \
+DISPUTED → evaluate justification (ACCEPT if sound, REJECT with reasoning). \
 ACCEPTED → verify it was applied. \
 Do NOT raise new findings unless P0 (critical/security). \
 New observations go in 'Notes for next review' (non-blocking). \
+Re-verify all prior passes still hold. \
+End with: score (1-10), CERTIFIED or NOT CERTIFIED."
 ```
 
-4. If CERTIFIED → done. If NOT CERTIFIED (rejected disputes or failed fixes) → fix rejected items and repeat.
-
 ### Convergence
 
+**2 rounds is the sweet spot. 3 max.** Research across 14 repos and 7 papers confirms additional rounds beyond 3 produce <5% position shift.
+
+Max 2 recheck rounds (3 total including initial review). If still NOT CERTIFIED after round 3, escalate to the user with a summary of open findings.
 
 ```
-Reviewer: TARGETED RECHECK (previous findings only)
-All resolved? → YES → CERTIFIED
-NO → fix rejected items, repeat
-(max 3 rechecks, then escalate to user)
+Preflight → handoff.json (round 1) → FULL REVIEW
+        |
+CERTIFIED? → YES → CI
+        |
+    NO (scored findings)
+        |
+response.json (FIXED/DISPUTED/ACCEPTED)
+        |
+handoff.json (round 2+) → TARGETED RECHECK
+        |
+CERTIFIED? → YES → CI
+        |
+    NO → one more round, then escalate
 ```
 
 **Tool-agnostic:** The value is adversarial diversity (different model, different blind spots), not the specific tool. Any competing AI reviewer works.
 
+### Anti-Patterns to Avoid
+
+- **"Find at least N problems"** — Incentivizes false positives. Use adversarial framing ("assume bugs may be present") instead
+- **"Review this"** — Too vague, gets generic feedback. Use mission + verification checklist
+- **Numeric 1-10 scales without criteria** — Unreliable. Decompose into specific checklist items
+- **Letting reviewer see author's reasoning** — Causes anchoring bias. Let them form independent opinion from code
 
 ### Release Review Focus
 
-Before any release/publish, add these to `
+Before any release/publish, add these to `verification_checklist`:
 - **CHANGELOG consistency** — all sections present, no lost entries during consolidation
 - **Version parity** — package.json, SDLC.md, CHANGELOG, wizard metadata all match
 - **Stale examples** — hardcoded version strings in docs match current release
 - **Docs accuracy** — README, ARCHITECTURE.md reflect current feature set
 - **CLI-distributed file parity** — live skills, hooks, settings match CLI templates
 
+### Multiple Reviewers (N-Reviewer Pipeline)
+
+When multiple reviewers comment on a PR (Claude PR review, Codex, human reviewers), address each reviewer independently:
+
+1. **Read all reviews** — `gh api repos/OWNER/REPO/pulls/PR/comments` to get every reviewer's feedback
+2. **Respond per-reviewer** — Each reviewer has different blind spots and priorities. Address each one's findings separately
+3. **Resolve conflicts** — If reviewers disagree, pick the stronger argument and note why
+4. **Iterate until all approve** — Don't merge until every active reviewer is satisfied
+5. **Max 3 iterations per reviewer** — If a reviewer keeps finding new things, escalate to the user
+
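Step 1 above can be scripted. A sketch of the grouping half, with the API fetch replaced by sample data shaped like the `pulls/comments` response (`.user.login` and `.body` are real fields of that API; in practice the JSON comes from `gh api repos/OWNER/REPO/pulls/<PR>/comments --paginate`):

```shell
# Sample data standing in for the gh api response.
cat > /tmp/comments.json <<'EOF'
[{"user": {"login": "codex-bot"}, "body": "P1: dead code path"},
 {"user": {"login": "claude-review"}, "body": "nit: rename helper"},
 {"user": {"login": "codex-bot"}, "body": "P2: missing null-token test"}]
EOF

# One finding queue per reviewer, so each can be addressed independently.
python3 - <<'EOF'
import json
from collections import Counter
comments = json.load(open("/tmp/comments.json"))
counts = Counter(c["user"]["login"] for c in comments)
for login, n in sorted(counts.items()):
    print(f"{login}: {n} comment(s)")
EOF
```

Output here is one line per reviewer (`claude-review: 1 comment(s)`, `codex-bot: 2 comment(s)`), which maps directly onto the respond-per-reviewer step.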
+### Adapting for Non-Code Domains
+
+The handoff format and dialogue loop work for ANY domain. Only `review_instructions` and `verification_checklist` change:
+
+| Domain | Instructions Focus | Checklist Example |
+|--------|-------------------|-------------------|
+| **Code (default)** | Security, logic bugs, test coverage | "Verify input validation at file:line" |
+| **Research/Docs** | Factual accuracy, source verification, overclaims | "Verify $736-$804 appears in both docs, no stale $695-$723 remains" |
+| **Persuasion** | Audience psychology, tone, trust | "If you were [audience], what's the moment you'd stop reading?" |
+
+For non-code: add `"audience"` and `"stakes"` fields to handoff.json. For code, these are implied (audience = other developers, stakes = production impact).
+
+### Custom Subagents (`.claude/agents/`)
+
+Claude Code supports custom subagents in `.claude/agents/`:
+
+- **`sdlc-reviewer`** — SDLC compliance review (planning, TDD, self-review checks)
+- **`ci-debug`** — CI failure diagnosis (reads logs, identifies root cause, suggests fix)
+- **`test-writer`** — Quality tests following TESTING.md philosophies
+
+**Skills** guide Claude's behavior. **Agents** run autonomously and return results. Use agents for parallel work or fresh context windows.
 
 ## Test Review (Harder Than Implementation)
 
@@ -276,6 +395,16 @@ During self-review, critique tests HARDER than app code:
 
 **Tests are the foundation.** Bad tests = false confidence = production bugs.
 
+### Testing Diamond — Know Your Layers
+
+| Layer | What It Tests | % of Suite | Key Trait |
+|-------|--------------|------------|-----------|
+| **E2E** | Full user flow through UI/browser (Playwright, Cypress) | ~5% | Slow, brittle, but proves the real thing works |
+| **Integration** | Real systems via API without UI — real DB, real cache, real services | ~90% | **Best bang for buck.** Fast, stable, high confidence |
+| **Unit** | Pure logic only — no DB, no API, no filesystem | ~5% | Fast but limited scope |
+
+**The critical boundary:** E2E tests go through the user's actual UI/browser. Integration tests hit real systems via API but without UI. If your test doesn't open a browser or render a UI, it's not E2E — it's integration. This distinction matters because mislabeling integration tests as E2E leads to overinvestment in slow browser tests when fast API-level tests would suffice.
+
 ### Minimal Mocking Philosophy
 
 | What | Mock? | Why |
@@ -327,41 +456,34 @@ If you notice something else that should be fixed:
 
 **Why this matters:** AI agents can drift into "helpful" changes that weren't requested. This creates unexpected diffs, breaks unrelated things, and makes code review harder.
 
-##
+## Debugging Workflow (Systematic Investigation)
+
+When something breaks and the cause isn't obvious, follow this systematic debugging workflow:
 
 ```
-│ ALL TESTS MUST PASS. NO EXCEPTIONS. │
-│ │
-│ This is not negotiable. This is not flexible. This is absolute. │
-└─────────────────────────────────────────────────────────────────────┘
+Reproduce → Isolate → Root Cause → Fix → Regression Test
 ```
 
-**Treat test code like app code.** Test failures are bugs. Investigate them the way a 15-year SDET would - with thought and care, not by brushing them aside.
+1. **Reproduce** — Can you make it fail consistently? If intermittent, stress-test (run N times). If you can't reproduce it, you can't fix it
+2. **Isolate** — Narrow the scope. Which file? Which function? Which input? Use binary search: comment out half the code, does it still fail?
+3. **Root cause** — Don't fix symptoms. Ask "why?" until you hit the actual cause. "It crashes on line 42" is a symptom. "Null pointer because the API returns undefined when rate-limited" is a root cause
+4. **Fix** — Fix the root cause, not the symptom. Write the fix
+5. **Regression test** — Write a test that fails without your fix and passes with it (TDD GREEN)
 
-- Test is for deleted code? Delete the test
-- Test has wrong assertions? Fix the test
-- Test is "flaky"? Investigate - flakiness is just another word for bug
-3. Fix appropriately (fix code, fix test, or delete dead test)
-4. Run specific test individually first
-5. Then run ALL tests
-6. Still failing? ASK USER - don't spin your wheels
+**For regressions** (it worked before, now it doesn't):
+- Use `git bisect` to find the exact commit that broke it
+- `git bisect start`, `git bisect bad` (current), `git bisect good <known-good-commit>`
+- Bisect narrows to the breaking commit in O(log n) steps
 
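The bisect bullets above can be exercised end to end on a toy repo. A sketch (all names are illustrative; in real use, `git bisect run` takes your own test command):

```shell
# Toy repo: 6 commits, a "bug" introduced at commit 4.
cd "$(mktemp -d)"
git init -q .
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3 4 5 6; do
  if [ "$i" -ge 4 ]; then echo "bug" > app.txt; else echo "ok" > app.txt; fi
  git add app.txt
  git commit -qm "commit $i"
done

# Bad = HEAD, good = first commit; the test command exits 0 while healthy.
first=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first" >/dev/null
git bisect run grep -q ok app.txt >/dev/null
bad_subject=$(git show -s --format=%s refs/bisect/bad)
echo "first bad: $bad_subject"
git bisect reset >/dev/null
```

Here the probe converges on `commit 4` in a handful of checkouts instead of a linear walk over all six commits.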
+**Environment-specific bugs** (works locally, fails in CI/staging/prod):
+- Check environment differences: env vars, OS version, dependency versions, file permissions
+- Reproduce the environment locally if possible (Docker, env vars)
+- Add logging at the failure point — don't guess, observe
 
+**When to stop and ask:**
+- After 2 failed fix attempts → STOP and ASK USER
+- If the bug is in code you don't understand → read first, then fix
+- If reproducing requires access you don't have → ASK USER
 
 ## CI Feedback Loop — Local Shepherd (After Commit)
 
@@ -383,25 +505,25 @@ Local tests pass -> Commit -> Push -> Watch CI
 STOP and ASK USER
 ```
 
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│ NEVER AUTO-MERGE. NO EXCEPTIONS.                                    │
+│                                                                     │
+│ Do NOT run `gh pr merge --auto`. Ever.                              │
+│ Auto-merge fires before you can read review feedback.               │
+│ The shepherd loop IS the process. Skipping it = shipping bugs.      │
+└─────────────────────────────────────────────────────────────────────┘
+```
 
+**The full shepherd sequence — every step is mandatory:**
+1. Push changes to remote
+2. Watch CI: `gh pr checks --watch`
+3. If CI fails → read logs (`gh run view <RUN_ID> --log-failed`), fix, push again (max 2 attempts)
+4. If CI passes → read ALL review comments: `gh api repos/OWNER/REPO/pulls/PR/comments`
+5. Fix valid suggestions, push, iterate until clean
+6. Only then: explicit merge with `gh pr merge --squash`
 
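The six-step sequence above is a bounded loop, which a sketch makes explicit. `ci_check` and `apply_fix` are stubs standing in for `gh pr checks --watch` and your diagnose-fix-push step; the `gh` commands in the comments are the real ones:

```shell
# Bounded shepherd loop. ci_check stands in for `gh pr checks --watch`;
# apply_fix for: read `gh run view <RUN_ID> --log-failed`, fix, push.
ci_state="red"
ci_check() { [ "$ci_state" = "green" ]; }
apply_fix() { ci_state="green"; }   # pretend the first fix lands

attempts=0
max_attempts=2
outcome="merge"
until ci_check; do
  attempts=$((attempts + 1))
  if [ "$attempts" -gt "$max_attempts" ]; then
    outcome="ask_user"   # max 2 fix attempts, then STOP and ASK USER
    break
  fi
  apply_fix
done
echo "attempts=$attempts outcome=$outcome"
# Only after reading ALL review comments (gh api repos/OWNER/REPO/pulls/<PR>/comments):
#   gh pr merge --squash     # explicit merge; never `gh pr merge --auto`
```

The explicit merge stays outside the loop on purpose: nothing in the automation path is allowed to merge for you.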
-gh run view <RUN_ID> --log-failed
-```
-3. If CI fails:
-- Read failure logs: `gh run view <RUN_ID> --log-failed`
-- Diagnose root cause (same philosophy as local test failures)
-- Fix and push again
-4. Max 2 fix attempts - if still failing, ASK USER
-5. If CI passes - proceed to present final summary
+**Why this is non-negotiable:** PR #145 auto-merged a release before review feedback was read. CI reviewer found a P1 dead-code bug that shipped to main. The fix required a follow-up commit. Auto-merge cost more time than the shepherd loop would have taken.
 
 **Context GC (compact during idle):** While waiting for CI (typically 3-5 min), suggest `/compact` if the conversation is long. Think of it like a time-based garbage collector — idle time + high memory pressure = good time to collect. Don't suggest on short conversations.
 
@@ -451,6 +573,8 @@ CI passes -> Read review suggestions
 - Auto-compact fires at ~95% capacity — no manual management needed
 - After committing a PR, `/clear` before starting the next feature
 
+**`--bare` mode (v2.1.81+):** `claude -p "prompt" --bare` skips ALL hooks, skills, LSP, and plugins. This is a complete wizard bypass — no SDLC enforcement, no TDD checks, no planning hooks. Use only for scripted headless calls (CI pipelines, automation) where you explicitly don't want wizard enforcement. Never use `--bare` for normal development work.
+
 ## DRY Principle
 
 **Before coding:** "What patterns exist I can reuse?"
@@ -475,6 +599,21 @@
 
 **If no DESIGN_SYSTEM.md exists:** Skip these checks (project has no documented design system).
 
+## Release Planning (If Task Involves a Release)
+
+**When to check:** Task mentions "release", "publish", "version bump", "npm publish", or multiple items being shipped together.
+**When to skip:** Single feature implementation, bug fix, or anything that isn't a release.
+
+Before implementing any release items:
+
+1. **List all items** — Read ROADMAP.md (or equivalent), identify every item planned for this release
+2. **Plan each at 95% confidence** — For each item: what files change, what tests prove it works, what's the blast radius. If confidence < 95% on any item, flag it
+3. **Identify blocks** — Which items depend on others? What must go first?
+4. **Present all plans together** — User reviews the complete batch, not one at a time. This catches conflicts, sequencing issues, and scope creep before any code is written
+5. **User approves, then implement** — Full SDLC per item (TDD RED → GREEN → self-review), in the prioritized order
+
+**Why batch planning works:** Ad-hoc one-at-a-time implementation leads to unvalidated additions and scope creep. Batch planning catches problems early — if you can't plan it at 95%, you're not ready to ship it.
+
 ## Deployment Tasks (If Task Involves Deploy)
 
 **When to check:** Task mentions "deploy", "release", "push to prod", "staging", etc.
@@ -540,6 +679,26 @@ If this session revealed insights, update the right place:
 - **Feature-specific quirks** → Feature docs (`*_PLAN.md`, `*_DOCS.md`)
 - **Architecture decisions** → `docs/decisions/` (ADR format) or `ARCHITECTURE.md`
 - **General project context** → `CLAUDE.md` (or `/revise-claude-md`)
+- **Plan files** → If this session's work came from a plan file, delete it or mark it complete. Stale plans mislead future sessions into thinking work is still pending
+
+## Post-Mortem: When Process Fails, Feed It Back
+
+**Every process failure becomes an enforcement rule.** When you skip a step and it causes a problem, don't just fix the symptom — add a gate so it can't happen again.
+
+```
+Incident → Root Cause → New Rule → Test That Proves the Rule → Ship
+```
+
+**How to post-mortem a process failure:**
+1. **What happened?** — Describe the incident (what went wrong, what was the impact)
+2. **Root cause** — Not "I forgot" — what structurally allowed the skip? Was it guidance (easy to ignore) instead of a gate (impossible to skip)?
+3. **New rule** — Turn the failure into an enforcement rule in the SDLC skill
+4. **Test** — Write a test that proves the rule exists (TDD — the rule is code too)
+5. **Evidence** — Reference the incident so future readers understand WHY the rule exists
+
+**Example (real incident):** PR #145 auto-merged before CI review was read. Root cause: auto-merge was enabled by default, no enforcement gate existed. New rule: "NEVER AUTO-MERGE" block added to CI Shepherd section with the same weight as "ALL TESTS MUST PASS." Test: `test_never_auto_merge_gate` verifies the block exists.
+
+**Industry pattern:** "Every mistake becomes a rule" — the best SDLC systems are built from accumulated incident learnings, not theoretical best practices.
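Step 4's "test that proves the rule" can be as small as a content gate. A sketch; the path is a stand-in and the required string mirrors the example incident above:

```shell
# Stand-in for the live skill file carrying the enforcement block.
mkdir -p /tmp/gate-demo
printf 'NEVER AUTO-MERGE. NO EXCEPTIONS.\n' > /tmp/gate-demo/SKILL.md

# test_never_auto_merge_gate (sketch): fail loudly if the block ever disappears.
if grep -q "NEVER AUTO-MERGE. NO EXCEPTIONS." /tmp/gate-demo/SKILL.md; then
  echo "gate present: pass"
else
  echo "gate present: FAIL"
  exit 1
fi
```

A grep gate is deliberately dumb: it cannot be satisfied by a paraphrase, which is exactly what you want for rules born from incidents.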
 
 ---
 
@@ -37,6 +37,7 @@ Scan the project root for:
 - Feature docs: *_PLAN.md, *_DOCS.md, *_SPEC.md, docs/
 - Deployment: Dockerfile, vercel.json, fly.toml, netlify.toml, Procfile, k8s/
 - Design system: tailwind.config.*, .storybook/, theme files, CSS custom properties
+- Branding assets: BRANDING.md, brand/, logos/, style-guide.md, brand-voice.md, tone-of-voice.*
 - Existing docs: README.md, CLAUDE.md, ARCHITECTURE.md
 - Scripts in package.json (lint, test, build, typecheck, etc.)
 - Database config files (prisma/, drizzle.config.*, knexfile.*, .env with DB_*)
@@ -151,6 +152,16 @@ Only if design system artifacts were found in Step 1:
 
 Skip this step if no UI/design system detected.
 
+### Step 8.5: Generate BRANDING.md (If Branding Detected)
+
+Only if branding-related assets were found in Step 1 (brand/, logos/, style-guide.md, brand-voice.md, existing BRANDING.md, or a UI/content-heavy project detected), generate `BRANDING.md` covering:
+- Brand voice and tone guidelines
+- Naming conventions (product names, feature names, terminology)
+- Visual identity summary (logo usage, color palette references)
+- Content style guide (if the project has user-facing copy)
+
+Skip this step if no branding assets or UI/content patterns detected.
+
 ### Step 9: Configure Tool Permissions
 
 Based on detected stack, suggest `allowedTools` entries for `.claude/settings.json`:
@@ -45,13 +45,14 @@ Extract the latest version from the first `## [X.X.X]` line.
 Parse all CHANGELOG entries between the user's installed version and the latest. Present a clear summary:
 
 ```
-Installed: 1.
-Latest: 1.
+Installed: 1.21.0
+Latest: 1.23.0
 
 What changed:
+- [1.23.0] Update notification hook, ...
+- [1.22.0] Plan auto-approval, debugging workflow, /feedback skill, BRANDING.md detection, ...
 - [1.21.0] Confidence-driven setup, prove-it gate, cross-model release review, ...
 - [1.20.0] Version-pinned CC update gate, Tier 1 flakiness fix, flaky test guidance, ...
-- [1.19.0] CI shepherd model, token efficiency, feature doc enforcement, ...
 ```
 
 **If versions match:** Say "You're up to date! (version X.X.X)" and stop.
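The installed-vs-latest comparison needs to be semver-aware, since plain string comparison ranks "1.9.0" above "1.21.0". A sketch using `sort -V` (GNU coreutils version sort); the message format follows the update hook's one-liner:

```shell
installed="1.21.0"
latest="1.23.0"
# sort -V orders version strings component-wise; the newest lands last.
newest=$(printf '%s\n%s\n' "$installed" "$latest" | sort -V | tail -n 1)
if [ "$newest" = "$latest" ] && [ "$installed" != "$latest" ]; then
  echo "SDLC Wizard update available: $installed → $latest (run /update-wizard)"
else
  echo "You're up to date! (version $installed)"
fi
```

With the values above it prints the update one-liner; with equal versions it falls through to the up-to-date message.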
|