agentic-sdlc-wizard 1.18.0 → 1.21.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -4,6 +4,43 @@ All notable changes to the SDLC Wizard.
4
4
 
5
5
  > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
6
6
 
7
+ ## [1.21.0] - 2026-03-31
8
+
9
+ ### Added
10
+ - Confidence-driven setup wizard — kills the fixed 18 questions. Scans repo, builds confidence per data point, only asks what it can't infer. Dynamic question count (0-2 for well-configured projects, 10+ for bare repos). 95% aggregate confidence threshold (#52)
11
+ - CI Shepherd opt-in question in setup wizard (#48 partial)
12
+ - Cross-model release review recommendation — releases/publishes as explicit trigger, Release Review Checklist with v1.20.0 evidence (#49)
13
+ - Prove It Gate enforcement in SDLC skill — prevents unvalidated additions with quality test requirements (#50)
14
+ - 6 confidence-driven setup tests, 10 prove-it-gate tests, 6 release review tests
15
+
16
+ ### Removed
17
+ - ci-analyzer skill — violated Prove It philosophy (existence-only tests, no quality validation, overlap with `/claude-automation-recommender`) (#50)
18
+ - ci-self-heal.yml deprecated — local shepherd is the primary CI fix mechanism
19
+
20
+ ### Changed
21
+ - Wizard doc: Q-numbered questions → data point descriptions with detection hints
22
+ - Setup skill: 12 steps (was 11) with new "Build Confidence Map" step
23
+ - CLI distributes 8 template files (was 9, removed ci-analyzer)
24
+
25
+ ## [1.20.0] - 2026-03-31
26
+
27
+ ### Added
28
+ - CC version-pinned update gate — E2E tests run actual new CC version, not bundled binary (#46)
29
+ - Tier 1 E2E flakiness fix — regression threshold 1.5→3.0, absorbs ±2-3 point LLM variance (#47)
30
+ - Flaky test prevention guidance with external reference in wizard, SKILL.md
31
+ - 2 release consistency tests (package.json ↔ CHANGELOG ↔ SDLC.md version parity)
32
+
33
+ ## [1.19.0] - 2026-03-31
34
+
35
+ ### Added
36
+ - CI Local Shepherd Model — two-tier CI fix model (shepherd primary, bot fallback), SHA-based suppression (#36)
37
+ - Gap Analysis vs `/claude-automation-recommender` — complementary tools positioning (#35)
38
+ - `/clear` vs `/compact` context management guidance (#38)
39
+ - Token efficiency auditing — `/cost`, `--max-budget-usd`, OpenTelemetry (#42)
40
+ - Blank repo support — verified clean install, 10 new E2E tests (#31)
41
+ - Feature documentation enforcement — ADR guidance, `claude-md-improver`, doc sync in SDLC (#43)
42
+ - Setup skill description trimmed to 199 chars (v2.1.86 caps at 250)
43
+
7
44
  ## [1.18.0] - 2026-03-30
8
45
 
9
46
  ### Added
@@ -37,7 +37,7 @@ As Claude Code improves, the wizard absorbs those improvements and removes its o
37
37
  **But here's the key:** This isn't a one-size-fits-all answer. It's a starting point that helps you find YOUR answer. Every project is different. The self-evaluating loop (plan → build → test → review → improve) needs to be tuned to your codebase, your team, your standards. The wizard gives you the framework — you shape it into something bespoke.
38
38
 
39
39
  **The living system:**
40
- - CI self-heal captures friction signals as GitHub issues for pattern analysis
40
+ - The local shepherd captures friction signals during active sessions
41
41
  - You approve changes to the process
42
42
  - Both sides learn over time
43
43
  - The system improves the system (recursive improvement)
@@ -93,8 +93,10 @@ This prevents both false positives (crying wolf) and false negatives (missing re
93
93
 
94
94
  **How We Apply This:**
95
95
  - Weekly workflow tests new Claude Code versions before recommending upgrade
96
+ - Version-pinned gate: installs the specific CC version and passes it via `path_to_claude_code_executable` so E2E actually runs the new binary (see the sketch after this list)
96
97
  - Phase A: Does new CC version break SDLC enforcement?
97
98
  - Phase B: Do changelog-suggested improvements actually help?
99
+ - Green CI = safe to upgrade. Red = stay on current version until fixed
98
100
  - Results shown in PR with statistical confidence
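A minimal sketch of the gate mechanics referenced in the list above. The temp install prefix, env-var plumbing, and npm package name are assumptions; only the `path_to_claude_code_executable` name comes from this doc:

```typescript
import { execSync } from "node:child_process";

// Illustrative only: how a gate might install the candidate CC version in
// isolation and point the E2E harness at that binary, not the bundled one.
const candidate = process.env.CC_CANDIDATE_VERSION ?? "2.1.70"; // assumed env var

execSync(`npm install --prefix /tmp/cc-gate @anthropic-ai/claude-code@${candidate}`, {
  stdio: "inherit",
});

execSync("npm run test:e2e", {
  stdio: "inherit",
  env: {
    ...process.env,
    path_to_claude_code_executable: "/tmp/cc-gate/node_modules/.bin/claude",
  },
});
```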
99
101
 
100
102
  ---
@@ -209,6 +211,8 @@ When Anthropic provides official plugins or tools that handle something:
209
211
  | **Claude Code v2.1.69+** | Required for InstructionsLoaded hook, skill directory variable, and Tasks system |
210
212
  | **Git repository** | Files should be committed for team sharing |
211
213
 
214
+ **Blank repos (no CLAUDE.md, no code):** The wizard works on empty repos. Run `npx agentic-sdlc-wizard init` — it installs hooks, skills, and the wizard doc. On first session, the hooks detect missing SDLC files and redirect to `/setup-wizard`, which generates CLAUDE.md, SDLC.md, TESTING.md, and ARCHITECTURE.md interactively. You do NOT need to run Claude's built-in `/init` first — the setup wizard handles everything.
215
+
212
216
  ---
213
217
 
214
218
  ## Recommended Effort Level
@@ -352,6 +356,14 @@ This applies to everything: native Claude Code commands vs custom skills, framew
352
356
 
353
357
  **For the wizard's CI/CD:** When the weekly-update workflow detects a new Claude Code feature that overlaps with a wizard feature, the CI should automatically run E2E with both versions and recommend KEEP CUSTOM / SWITCH TO NATIVE / TIE.
354
358
 
359
+ **This applies to YOUR OWN additions too — not just native vs custom:**
360
+ - Adding a new skill? Prove it fills a gap nothing else covers. Write quality tests.
361
+ - Adding a new hook? Prove it improves scores or catches real issues.
362
+ - Adding a new workflow? Prove the automation ROI exceeds maintenance cost.
363
+ - Existence tests ("file exists", "has frontmatter") are NOT proof. They prove the file was created, not that it works.
364
+
365
+ **Evidence:** ci-analyzer skill was added in v1.20.0 with 4 existence-only tests, zero quality validation, and overlap with the third-party `/claude-automation-recommender`. It was deleted in the next release. This gap led to the Prove It Gate enforcement in the SDLC skill.
366
+
355
367
  ---
356
368
 
357
369
  ## What You're Setting Up
@@ -644,6 +656,33 @@ Security review depth should match your project's risk profile. During wizard se
644
656
 
645
657
  ---
646
658
 
659
+ ## Context Management: `/clear` vs `/compact`
660
+
661
+ Two tools for managing context — use the right one:
662
+
663
+ | | `/compact` | `/clear` |
664
+ |---|---|---|
665
+ | **What it does** | Summarizes conversation, frees space | Resets conversation entirely |
666
+ | **When to use** | Continuing same task, need more room | Switching to an unrelated task |
667
+ | **Preserves** | Summary of decisions + progress | Nothing (fresh start) |
668
+ | **CLAUDE.md** | Re-loaded from disk | Re-loaded from disk |
669
+ | **Hooks/skills/settings** | Unaffected | Unaffected |
670
+ | **Task list** | Persists | Cleared |
671
+
672
+ **Rules of thumb:**
673
+ - `/compact` between planning and implementation (plan preserved in summary)
674
+ - `/clear` between unrelated tasks (stale context wastes tokens and misleads Claude)
675
+ - `/clear` after 2+ failed corrections on the same issue (context is polluted with bad approaches — start fresh with a better prompt)
676
+ - After committing a PR, `/clear` before starting the next feature
677
+
678
+ **Auto-compact** fires automatically at ~95% context capacity. You don't need to manage this manually — Claude Code handles it. The SDLC skill suggests `/compact` during CI idle time as a "context GC" opportunity.
679
+
680
+ **What survives `/compact`:** Key decisions, code changes, task state (as a summary). What can be lost: detailed early-conversation instructions not in CLAUDE.md, specific file contents read long ago.
681
+
682
+ **Best practice:** Put persistent instructions in CLAUDE.md (survives both `/compact` and `/clear`), not in conversation.
683
+
684
+ ---
685
+
647
686
  ## Example Workflow (End-to-End)
648
687
 
649
688
  Here's what a typical task looks like with this system:
@@ -912,21 +951,25 @@ The wizard creates TDD-specific automations that official plugins don't provide:
912
951
 
913
952
  ### Step 0.3: Additional Recommendations (Optional)
914
953
 
915
- After SDLC setup is complete, run `claude-code-setup` for additional recommendations:
954
+ After SDLC setup is complete, run `/claude-automation-recommender` for stack-specific tooling:
916
955
 
917
956
  ```
918
- "Based on your codebase, recommend additional automations"
957
+ /claude-automation-recommender
919
958
  ```
920
959
 
921
- This may suggest:
922
- - MCP Servers (context7 for docs, Playwright for frontend)
923
- - Additional hooks (auto-format if Prettier configured)
924
- - Subagents (security-reviewer if auth code detected)
960
+ **The wizard is an enforcement engine** — it installs working hooks, skills, and process guardrails that run automatically. **The recommender is a suggestion engine** — it analyzes your codebase and suggests additional automations you might want. They're complementary:
925
961
 
926
- **Claude prompts for each:**
927
- > "[Detected: Prettier config] Want to add auto-format hook? (y/n)"
962
+ | Category | Wizard Ships | Recommender Suggests |
963
+ |----------|-------------|---------------------|
964
+ | SDLC process (TDD, planning, review) | Enforced via hooks + skills | Not covered |
965
+ | CI workflows (PR review) | Templates + docs | Not covered |
966
+ | MCP servers (context7, Playwright, DB) | Not covered | Per-stack suggestions |
967
+ | Auto-formatting hooks (Prettier, ESLint) | Not covered | Per-stack suggestions |
968
+ | Type-checking hooks (tsc, mypy) | Not covered | Per-stack suggestions |
969
+ | Subagent templates (code-reviewer, etc.) | Cross-model review only | 8 templates |
970
+ | Plugin recommendations (LSPs, etc.) | Not covered | Per-stack suggestions |
928
971
 
929
- These are additive—they don't replace our TDD hooks.
972
+ The recommender's suggestions are additive; they don't replace the wizard's TDD hooks or SDLC enforcement.
930
973
 
931
974
  ### Git Workflow Preference
932
975
 
@@ -991,39 +1034,44 @@ Feature branches still recommended for solo devs (keeps main clean, easy rollbac
991
1034
 
992
1035
  **Back-and-forth:** User questions live in PR comments. Bot's response is always the latest sticky comment. Clean and organized.
993
1036
 
994
- **CI monitoring question:**
995
- > "Should Claude monitor CI checks after pushing and auto-diagnose failures? (y/n)"
1037
+ **CI shepherd opt-in (only if CI detected during auto-scan):**
1038
+ > "Enable CI shepherd role? Claude will actively watch CI, auto-fix failures, and iterate on review feedback. (y/n)"
996
1039
 
997
- - **Yes** → Enable CI feedback loop in SDLC skill, add `gh` CLI to allowedTools
998
- - **No** → Skip CI monitoring steps (Claude still runs local tests, just doesn't watch CI)
1040
+ - **Yes** → Enable full shepherd loop: CI fix loop + review feedback loop. Ask detail questions below
1041
+ - **No** → Skip CI shepherd entirely (Claude still runs local tests, just doesn't interact with CI after pushing)
999
1042
 
1000
- **What this does:**
1001
- 1. After pushing, Claude runs `gh pr checks` to watch CI status
1002
- 2. If checks fail, Claude reads logs via `gh run view --log-failed`
1003
- 3. Claude diagnoses the failure and proposes a fix
1004
- 4. Max 2 fix attempts, then asks user
1005
- 5. Job isn't done until CI is green
1043
+ **What the CI shepherd does:**
1044
+ 1. **CI fix loop:** After pushing, Claude watches CI via `gh pr checks`, reads failure logs, diagnoses and fixes, pushes again (max 2 attempts; see the sketch after this list)
1045
+ 2. **Review feedback loop:** After CI passes, Claude reads automated review comments, implements valid suggestions, pushes and re-reviews (max 3 iterations)
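A minimal sketch of the fix loop, assuming an authenticated `gh` CLI. The real shepherd runs inside your active Claude session with full context, not as a standalone script, so treat this as illustration only:

```typescript
import { execSync } from "node:child_process";

const MAX_FIX_ATTEMPTS = 2; // matches the shepherd's limit above

const sh = (cmd: string) => execSync(cmd, { encoding: "utf8", stdio: "pipe" });

for (let attempt = 1; attempt <= MAX_FIX_ATTEMPTS; attempt++) {
  try {
    sh("gh pr checks --watch"); // blocks until checks finish; throws on failure
    console.log("CI green: hand off to the review feedback loop");
    break;
  } catch {
    const logs = sh("gh run view --log-failed"); // failing-job logs, as the doc describes
    console.log(`Attempt ${attempt}: diagnosing from CI logs (${logs.length} bytes)`);
    // ...diagnose root cause, apply a fix, commit, push, then re-watch CI...
    if (attempt === MAX_FIX_ATTEMPTS) console.log("Still red after max attempts: ask the user");
  }
}
```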
1006
1046
 
1007
- **Recommendation:** Yes if you have CI configured. This closes the loop between
1008
- "local tests pass" and "PR is actually ready to merge."
1047
+ **Recommendation:** Yes if you have CI configured. The shepherd closes the loop between "local tests pass" and "PR is actually ready to merge."
1009
1048
 
1010
1049
  **Requirements:**
1011
1050
  - `gh` CLI installed and authenticated
1012
1051
  - CI/CD configured (GitHub Actions, etc.)
1013
1052
  - If no CI yet: skip, add later when you set up CI
1014
1053
 
1054
+ **Stored in SDLC.md metadata as:**
1055
+ ```
1056
+ <!-- CI Shepherd: enabled -->
1057
+ ```
1058
+
1059
+ **Detail questions (only if CI shepherd is enabled):**
1060
+
1061
+ **CI monitoring detail:**
1062
+ > "Should Claude monitor CI checks after pushing and auto-diagnose failures? (y/n)"
1063
+
1064
+ - **Yes** → Enable CI feedback loop in SDLC skill, add `gh` CLI to allowedTools
1065
+ - **No** → Skip CI monitoring steps (Claude still runs local tests, just doesn't watch CI)
1066
+
1015
1067
  **CI review feedback question (only if CI monitoring is enabled):**
1016
1068
  > "What level of automated review response do you want?"
1017
1069
 
1018
- | Level | Name | What autofix handles | Est. API cost |
1019
- |-------|------|---------------------|---------------|
1020
- | **L1** | `ci-only` | CI failures only (broken tests, lint) | ~$0.50/fix |
1021
- | **L2** | `criticals` (default) | + Critical review findings (must-fix) | ~$1/fix |
1022
- | **L3** | `all-findings` | + Every suggestion the reviewer flags | ~$2/fix |
1023
-
1024
- > **Cost note:** Higher levels mean more autofix iterations (each ~$0.50).
1025
- > L3 typically adds 1-2 extra iterations per PR but produces cleaner code.
1026
- > You can change this anytime by editing `AUTOFIX_LEVEL` in your ci-autofix workflow.
1070
+ | Level | Name | What the shepherd handles |
1071
+ |-------|------|--------------------------|
1072
+ | **L1** | `ci-only` | CI failures only (broken tests, lint) |
1073
+ | **L2** | `criticals` (default) | + Critical review findings (must-fix) |
1074
+ | **L3** | `all-findings` | + Every suggestion the reviewer flags |
1027
1075
 
1028
1076
  **What this does:**
1029
1077
  1. After CI passes, Claude reads the automated code review comments
@@ -1198,9 +1246,11 @@ Recommendation: Your current tests rely heavily on mocks.
1198
1246
 
1199
1247
  ---
1200
1248
 
1201
- ## Step 1: Confirm or Customize
1249
+ ## Step 1: Build Confidence Map and Fill Gaps
1202
1250
 
1203
- Claude presents what it found. You confirm or override:
1251
+ Claude assigns a state to each configuration data point based on scan results. **RESOLVED (detected)** items are presented for bulk confirmation. **RESOLVED (inferred)** items are presented with inferred values for the user to verify. **UNRESOLVED** items become questions. **The number of questions is dynamic — it depends on how much the scan resolves.** Stop asking when ALL data points are resolved (detected, inferred+confirmed, or answered by user).
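A rough TypeScript sketch of this resolution logic (type and function names are illustrative, not the wizard's internals; the 0.95 threshold is the 95% aggregate confidence threshold from the changelog):

```typescript
// Illustrative types and logic for the confidence map.
type Resolution =
  | { state: "resolved-detected"; value: string }                      // scanned directly
  | { state: "resolved-inferred"; value: string; confidence: number }  // verify with user
  | { state: "unresolved" };                                           // becomes a question

interface DataPoint {
  id: string; // e.g. "test-framework", "lint-command"
  detect(): Resolution;
}

const CONFIDENCE_THRESHOLD = 0.95;

function planQuestions(points: DataPoint[]): string[] {
  const questions: string[] = [];
  for (const p of points) {
    const r = p.detect();
    if (r.state === "unresolved") {
      questions.push(p.id);
    } else if (r.state === "resolved-inferred" && r.confidence < CONFIDENCE_THRESHOLD) {
      questions.push(p.id); // low-confidence inference: ask instead of assuming
    }
  }
  return questions; // 0-2 on well-configured repos, 10+ on bare ones
}
```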
1252
+
1253
+ Claude presents what it found, organized by resolution state:
1204
1254
 
1205
1255
  ### Project Structure (Auto-Detected)
1206
1256
 
@@ -1209,13 +1259,13 @@ Claude presents what it found. You confirm or override:
1209
1259
  Override? (leave blank to accept): _______________
1210
1260
  ```
1211
1261
 
1212
- **Q2: Where do your tests live?**
1262
+ **Test directory** (detect from tests/, __tests__/, spec/, test file patterns)
1213
1263
  ```
1214
1264
  Examples: tests/, __tests__/, src/**/*.test.ts, spec/
1215
1265
  Your answer: _______________
1216
1266
  ```
1217
1267
 
1218
- **Q3: What's your test framework?**
1268
+ **Test framework** (detect from jest.config, vitest.config, pytest.ini, etc.)
1219
1269
  ```
1220
1270
  Options: Jest, Vitest, Playwright, Cypress, pytest, Go testing, other
1221
1271
  Your answer: _______________
@@ -1223,31 +1273,31 @@ Your answer: _______________
1223
1273
 
1224
1274
  ### Commands
1225
1275
 
1226
- **Q4: What runs your linter?**
1276
+ **Lint command** (detect from package.json scripts, Makefile, config files)
1227
1277
  ```
1228
1278
  Examples: npm run lint, pnpm lint, eslint ., biome check
1229
1279
  Your answer: _______________
1230
1280
  ```
1231
1281
 
1232
- **Q5: What runs type checking?**
1282
+ **Type-check command** (detect from tsconfig.json, mypy.ini, etc.)
1233
1283
  ```
1234
1284
  Examples: npm run typecheck, tsc --noEmit, mypy, none
1235
1285
  Your answer: _______________
1236
1286
  ```
1237
1287
 
1238
- **Q6: What runs all tests?**
1288
+ **Run all tests command** (detect from package.json "test" script, Makefile)
1239
1289
  ```
1240
1290
  Examples: npm run test, pnpm test, pytest, go test ./...
1241
1291
  Your answer: _______________
1242
1292
  ```
1243
1293
 
1244
- **Q7: What runs a specific test file?**
1294
+ **Run single test file command** (infer from framework: jest → jest path, pytest → pytest path)
1245
1295
  ```
1246
1296
  Examples: npm run test -- path/to/test.ts, pytest path/to/test.py
1247
1297
  Your answer: _______________
1248
1298
  ```
1249
1299
 
1250
- **Q8: What builds for production?**
1300
+ **Production build command** (detect from package.json "build" script, Makefile)
1251
1301
  ```
1252
1302
  Examples: npm run build, pnpm build, go build, cargo build
1253
1303
  Your answer: _______________
@@ -1255,7 +1305,7 @@ Your answer: _______________
1255
1305
 
1256
1306
  ### Deployment
1257
1307
 
1258
- **Q8.5: How do you deploy? (auto-detected, confirm or override)**
1308
+ **Deployment setup** (auto-detected from Dockerfile, vercel.json, fly.toml, deploy scripts)
1259
1309
  ```
1260
1310
  Detected: [e.g., Vercel, GitHub Actions, Docker, none]
1261
1311
 
@@ -1278,19 +1328,19 @@ Your answer: _______________
1278
1328
 
1279
1329
  ### Infrastructure
1280
1330
 
1281
- **Q9: What database(s) do you use?**
1331
+ **Database(s)** (detect from prisma/, .env DB vars, docker-compose services)
1282
1332
  ```
1283
1333
  Examples: PostgreSQL, MySQL, SQLite, MongoDB, none
1284
1334
  Your answer: _______________
1285
1335
  ```
1286
1336
 
1287
- **Q10: Do you use caching (Redis, etc.)?**
1337
+ **Caching layer** (detect from .env REDIS vars, docker-compose redis service)
1288
1338
  ```
1289
1339
  Examples: Redis, Memcached, none
1290
1340
  Your answer: _______________
1291
1341
  ```
1292
1342
 
1293
- **Q11: How long do your tests take?**
1343
+ **Test duration** (estimate from test file count, CI run times if available)
1294
1344
  ```
1295
1345
  Examples: <1 minute, 1-5 minutes, 5+ minutes
1296
1346
  Your answer: _______________
@@ -1298,7 +1348,7 @@ Your answer: _______________
1298
1348
 
1299
1349
  ### Output Preferences
1300
1350
 
1301
- **Q12: How much detail in Claude's responses?**
1351
+ **Response detail level** (cannot detect; always ask if no preference found)
1302
1352
  ```
1303
1353
  Options:
1304
1354
  - Small - Minimal output, just essentials (experienced users)
@@ -1316,7 +1366,7 @@ Stored in `.claude/settings.json` as `"verbosity": "small|medium|large"`.
1316
1366
 
1317
1367
  ### Testing Philosophy
1318
1368
 
1319
- **Q13: What's your testing approach?**
1369
+ **Testing approach** (infer from existing test patterns — test-first files, coverage config)
1320
1370
  ```
1321
1371
  Options:
1322
1372
  - Strict TDD (test first always)
@@ -1327,7 +1377,7 @@ Options:
1327
1377
  Your answer: _______________
1328
1378
  ```
1329
1379
 
1330
- **Q14: What types of tests do you want?**
1380
+ **Test types** (detect from existing test file patterns: *.test.*, *.spec.*, e2e/, integration/)
1331
1381
  ```
1332
1382
  (Check all that apply)
1333
1383
  [ ] Unit tests (pure logic, isolated)
@@ -1337,7 +1387,7 @@ Your answer: _______________
1337
1387
  [ ] Other: _______________
1338
1388
  ```
1339
1389
 
1340
- **Q15: Your mocking philosophy?**
1390
+ **Mocking philosophy** (detect from jest.mock, unittest.mock usage patterns)
1341
1391
  ```
1342
1392
  Options:
1343
1393
  - Minimal mocking (real DB, mock external APIs only)
@@ -1352,7 +1402,7 @@ Your answer: _______________
1352
1402
  **If test framework detected (Jest, pytest, Go, etc.):**
1353
1403
 
1354
1404
  ```
1355
- Q16: Code Coverage (Optional)
1405
+ Code Coverage (Optional)
1356
1406
 
1357
1407
  Detected: [test framework] with coverage configuration
1358
1408
 
@@ -1373,7 +1423,7 @@ Your answer: _______________
1373
1423
  **If no test framework detected (docs/AI-heavy project):**
1374
1424
 
1375
1425
  ```
1376
- Q16: Code Coverage (Optional)
1426
+ Code Coverage (Optional)
1377
1427
 
1378
1428
  No test framework detected (documentation/AI-heavy project).
1379
1429
 
@@ -1393,19 +1443,19 @@ Your answer: _______________
1393
1443
 
1394
1444
  ---
1395
1445
 
1396
- ### Using Your Answers
1446
+ ### How Configuration Data Points Map to Files
1397
1447
 
1398
- Your answers map to these files:
1448
+ Each resolved data point (whether detected or confirmed by the user) maps to generated files:
1399
1449
 
1400
- | Question | Used In |
1401
- |----------|---------|
1402
- | Q1 (source dir) | `tdd-pretool-check.sh` - pattern match |
1403
- | Q2 (test dir) | `TESTING.md` - documentation |
1404
- | Q3 (test framework) | `TESTING.md` - documentation |
1405
- | Q4-Q8 (commands) | `CLAUDE.md` - Commands section |
1406
- | Q9-Q10 (infra) | `CLAUDE.md` - Architecture section, `TESTING.md` - mock decisions |
1407
- | Q11 (test duration) | `SDLC skill` - wait time note |
1408
- | Q12 (E2E) | `TESTING.md` - testing diamond top |
1450
+ | Data Point | Used In |
1451
+ |-----------|---------|
1452
+ | Source directory | `tdd-pretool-check.sh` - pattern match (sketch below) |
1453
+ | Test directory | `TESTING.md` - documentation |
1454
+ | Test framework | `TESTING.md` - documentation |
1455
+ | Commands (lint, typecheck, test, build) | `CLAUDE.md` - Commands section |
1456
+ | Infrastructure (DB, cache) | `CLAUDE.md` - Architecture section, `TESTING.md` - mock decisions |
1457
+ | Test duration | `SDLC skill` - wait time note |
1458
+ | Test types (E2E) | `TESTING.md` - testing diamond top |
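To make the first row concrete: a hypothetical TypeScript port of the pattern-match idea. The wizard actually ships this logic as `tdd-pretool-check.sh`, and the hook payload shape shown here is an assumption:

```typescript
// Hypothetical port: check whether a pre-tool edit targets source code
// without a test file in play, using the resolved "Source directory" value.
import { readFileSync } from "node:fs";

const SOURCE_DIR = "src/"; // filled in from the resolved source-directory data point

const payload = JSON.parse(readFileSync(0, "utf8")); // hook input arrives on stdin
const filePath: string = payload?.tool_input?.file_path ?? "";

if (filePath.startsWith(SOURCE_DIR) && !/\.(test|spec)\./.test(filePath)) {
  // Source edit with no obvious test file: nudge toward TDD RED first
  console.log("Reminder: write a failing test first (TDD RED) before editing source.");
}
```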
1409
1459
 
1410
1460
  ---
1411
1461
 
@@ -1654,13 +1704,14 @@ TodoWrite([
1654
1704
  { content: "Find and read relevant documentation", status: "in_progress", activeForm: "Reading docs" },
1655
1705
  { content: "Assess doc health - flag issues (ask before cleaning)", status: "pending", activeForm: "Checking doc health" },
1656
1706
  { content: "DRY scan: What patterns exist to reuse?", status: "pending", activeForm: "Scanning for reusable patterns" },
1707
+ { content: "Prove It Gate: adding new component? Research alternatives, prove quality with tests", status: "pending", activeForm: "Checking prove-it gate" },
1657
1708
  { content: "Blast radius: What depends on code I'm changing?", status: "pending", activeForm: "Checking dependencies" },
1658
1709
  { content: "Restate task in own words - verify understanding", status: "pending", activeForm: "Verifying understanding" },
1659
1710
  { content: "Scrutinize test design - right things tested? Follow TESTING.md?", status: "pending", activeForm: "Reviewing test approach" },
1660
1711
  { content: "Present approach + STATE CONFIDENCE LEVEL", status: "pending", activeForm: "Presenting approach" },
1661
1712
  { content: "Signal ready - user exits plan mode", status: "pending", activeForm: "Awaiting plan approval" },
1662
1713
  // TRANSITION PHASE (After plan mode, before compact)
1663
- { content: "Update feature docs with discovered gotchas", status: "pending", activeForm: "Updating feature docs" },
1714
+ { content: "Doc sync: update feature docs if code change contradicts or extends documented behavior", status: "pending", activeForm: "Syncing feature docs" },
1664
1715
  { content: "Request /compact before TDD", status: "pending", activeForm: "Requesting compact" },
1665
1716
  // IMPLEMENTATION PHASE (After compact)
1666
1717
  { content: "TDD RED: Write failing test FIRST", status: "pending", activeForm: "Writing failing test" },
@@ -1695,13 +1746,29 @@ TodoWrite([
1695
1746
  - Does test approach follow TESTING.md philosophies?
1696
1747
  - If introducing new test patterns, same scrutiny as code patterns
1697
1748
 
1749
+ ## Prove It Gate (REQUIRED for New Additions)
1750
+
1751
+ **Adding a new skill, hook, workflow, or component? PROVE IT FIRST:**
1752
+
1753
+ 1. **Research:** Does something equivalent already exist (native CC, third-party plugin, existing skill)?
1754
+ 2. **If YES:** Why is yours better? Show evidence (A/B test, quality comparison, gap analysis)
1755
+ 3. **If NO:** What gap does this fill? Is the gap real or theoretical?
1756
+ 4. **Quality tests:** New additions MUST have tests that prove OUTPUT QUALITY, not just existence
1757
+ 5. **Less is more:** Every addition is maintenance burden. Default answer is NO unless proven YES
1758
+
1759
+ **Existence tests are NOT quality tests:**
1760
+ - BAD: "ci-analyzer skill file exists" — proves nothing about quality
1761
+ - GOOD: "ci-analyzer recommends lint-first when test-before-lint detected" — proves behavior
1762
+
1763
+ **If you can't write a quality test for it, you can't prove it works, so don't add it.**
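A minimal sketch of the BAD/GOOD difference above, assuming Vitest and a hypothetical `runSkill` helper that executes a skill against a fixture repo:

```typescript
import { describe, it, expect } from "vitest";
import { existsSync } from "node:fs";
import { runSkill } from "./helpers"; // hypothetical fixture-runner helper

describe("existence vs quality", () => {
  // Existence test: proves only that the file was created
  it("skill file exists", () => {
    expect(existsSync(".claude/skills/ci-analyzer/SKILL.md")).toBe(true);
  });

  // Quality test: proves behavior on a known-bad fixture
  it("recommends lint-first when tests run before lint", async () => {
    const report = await runSkill("ci-analyzer", { fixture: "repos/test-before-lint" });
    expect(report.recommendations.some((r: string) => /lint.*before.*test/i.test(r))).toBe(true);
  });
});
```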
1764
+
1698
1765
  ## Plan Mode Integration
1699
1766
 
1700
1767
  **Use plan mode for:** Multi-file changes, new features, LOW confidence, bugs needing investigation.
1701
1768
 
1702
1769
  **Workflow:**
1703
1770
  1. **Plan Mode** (editing blocked): Research → Write plan file → Present approach + confidence
1704
- 2. **Transition** (after approval): Update feature docs → Request /compact
1771
+ 2. **Transition** (after approval): Doc sync (update feature docs if code contradicts/extends them) → Request /compact
1705
1772
  3. **Implementation** (after compact): TDD RED → GREEN → PASS
1706
1773
 
1707
1774
  **Before TDD, MUST ask:** "Docs updated. Run `/compact` before implementation?"
@@ -1744,7 +1811,7 @@ PLANNING → DOCS → TDD RED → TDD GREEN → Tests Pass → Self-Review
1744
1811
 
1745
1812
  ## Cross-Model Review (If Configured)
1746
1813
 
1747
- **When to run:** High-stakes changes (auth, payments, data handling), complex refactors, research-heavy work.
1814
+ **When to run:** High-stakes changes (auth, payments, data handling), releases/publishes (version bumps, CHANGELOG, npm publish), complex refactors, research-heavy work.
1748
1815
  **When to skip:** Trivial changes (typo fixes, config tweaks), time-sensitive hotfixes, risk < review cost.
1749
1816
 
1750
1817
  **Prerequisites:** Codex CLI installed (`npm i -g @openai/codex`), OpenAI API key set.
@@ -1849,6 +1916,17 @@ Self-review passes → handoff.json (round 1, PENDING_REVIEW)
1849
1916
 
1850
1917
  **Full protocol:** See the "Cross-Model Review Loop (Optional)" section below for key flags and reasoning effort guidance.
1851
1918
 
1919
+ ### Release Review Focus
1920
+
1921
+ Before any release/publish, add these to `review_instructions`:
1922
+ - **CHANGELOG consistency** — all sections present, no lost entries during consolidation
1923
+ - **Version parity** — package.json, SDLC.md, CHANGELOG, wizard metadata all match
1924
+ - **Stale examples** — hardcoded version strings in docs match current release
1925
+ - **Docs accuracy** — README, ARCHITECTURE.md reflect current feature set
1926
+ - **CLI-distributed file parity** — live skills, hooks, settings match CLI templates
1927
+
1928
+ Evidence: v1.20.0 cross-model review caught CHANGELOG section loss and stale wizard version examples that passed all tests and self-review.
1929
+
1852
1930
  ## Test Review (Harder Than Implementation)
1853
1931
 
1854
1932
  During self-review, critique tests HARDER than app code:
@@ -1898,7 +1976,7 @@ Debug it. Find root cause. Fix it properly. Tests ARE code.
1898
1976
 
1899
1977
  ## Flaky Test Prevention
1900
1978
 
1901
- **Flaky tests are bugs. Period.** They erode trust in the test suite, slow down teams, and mask real regressions.
1979
+ **Flaky tests are bugs. Period.** They erode trust in the test suite, slow down teams, and mask real regressions. For a deep dive, see: [How do you Address and Prevent Flaky Tests?](https://softwareautomation.notion.site/How-do-you-Address-and-Prevent-Flaky-Tests-23c539e19b3c46eeb655642b95237dc0)
1902
1980
 
1903
1981
  ### Principles
1904
1982
 
@@ -1926,7 +2004,9 @@ Sometimes the flakiness is genuinely in CI infrastructure (runner environment, G
1926
2004
  - **Keep quality gates strict** — the actual pass/fail decision must NOT have `continue-on-error`
1927
2005
  - **Separate "fail the build" from "nice to have"** — a missing PR comment is not a regression
1928
2006
 
1929
- ## CI Feedback Loop (After Commit)
2007
+ ## CI Feedback Loop — Local Shepherd (After Commit)
2008
+
2009
+ **This is the "local shepherd" — your CI fix mechanism.** It runs in your active session with full context.
1930
2010
 
1931
2011
  **The SDLC doesn't end at local tests.** CI must pass too.
1932
2012
 
@@ -1972,7 +2052,7 @@ Local tests pass -> Commit -> Push -> Watch CI
1972
2052
  - Flaky? Investigate - flakiness is a bug
1973
2053
  - Stuck? ASK USER
1974
2054
 
1975
- ## CI Review Feedback Loop (After CI Passes)
2055
+ ## CI Review Feedback Loop — Local Shepherd (After CI Passes)
1976
2056
 
1977
2057
  **CI passing isn't the end.** If CI includes a code reviewer, read and address its suggestions.
1978
2058
 
@@ -2102,7 +2182,7 @@ Create `CLAUDE.md` in your project root. This is your project-specific configura
2102
2182
 
2103
2183
  ## Commands
2104
2184
 
2105
- <!-- CUSTOMIZE: Replace with your actual commands from Q4-Q8 -->
2185
+ <!-- CUSTOMIZE: Replace with your actual detected/confirmed commands -->
2106
2186
 
2107
2187
  - Build: `[your build command]`
2108
2188
  - Run dev: `[your dev command]`
@@ -2189,7 +2269,7 @@ These are your full reference docs. Start with stubs and expand over time:
2189
2269
 
2190
2270
  ## Environments
2191
2271
 
2192
- <!-- Claude auto-populates this from Q8.5 deployment detection -->
2272
+ <!-- Claude auto-populates this from deployment detection -->
2193
2273
 
2194
2274
  | Environment | URL | Deploy Command | Trigger |
2195
2275
  |-------------|-----|----------------|---------|
@@ -2236,7 +2316,7 @@ If deployment fails or post-deploy verification catches issues:
2236
2316
 
2237
2317
  | Environment | Rollback Command | Notes |
2238
2318
  |-------------|------------------|-------|
2239
- | Preview | [auto-expires or redeploy] | Usually self-heals |
2319
+ | Preview | [auto-expires or redeploy] | Ephemeral; redeploy to fix |
2240
2320
  | Staging | `[your rollback command]` | [notes] |
2241
2321
  | Production | `[your rollback command]` | [critical - document clearly] |
2242
2322
 
@@ -2266,7 +2346,7 @@ If deployment fails or post-deploy verification catches issues:
2266
2346
 
2267
2347
  **SDLC.md:**
2268
2348
  ```markdown
2269
- <!-- SDLC Wizard Version: 1.18.0 -->
2349
+ <!-- SDLC Wizard Version: 1.21.0 -->
2270
2350
  <!-- Setup Date: [DATE] -->
2271
2351
  <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
2272
2352
  <!-- Git Workflow: [PRs or Solo] -->
@@ -2577,9 +2657,17 @@ Want me to file these? (yes/no/not now)
2577
2657
 
2578
2658
  ## Going Further
2579
2659
 
2580
- ### Create Feature Plan Docs
2660
+ ### Feature Documentation
2581
2661
 
2582
- For each major feature, create `FEATURE_NAME_PLAN.md`:
2662
+ Keep feature docs alongside code. Three patterns, use what fits:
2663
+
2664
+ | Pattern | When to Use | Example |
2665
+ |---------|-------------|---------|
2666
+ | `*_PLAN.md` / `*_DOCS.md` | Per-feature living docs | `AUTH_DOCS.md`, `PAYMENTS_PLAN.md` |
2667
+ | `docs/decisions/NNN-title.md` (ADR) | Architecture decisions that need rationale | `docs/decisions/001-use-postgres.md` |
2668
+ | `docs/features/name.md` | Feature docs in a `docs/` directory | `docs/features/auth.md` |
2669
+
2670
+ **Feature doc template:**
2583
2671
 
2584
2672
  ```markdown
2585
2673
  # Feature Name
@@ -2597,7 +2685,36 @@ Things that can trip you up.
2597
2685
  What's planned but not done.
2598
2686
  ```
2599
2687
 
2600
- Claude will read these during planning and update them with discoveries.
2688
+ **ADR (Architecture Decision Record) template** for decisions that need context:
2689
+
2690
+ ```markdown
2691
+ # ADR-NNN: Decision Title
2692
+
2693
+ ## Status
2694
+ Accepted | Superseded by ADR-NNN | Deprecated
2695
+
2696
+ ## Context
2697
+ What is the problem? What forces are at play?
2698
+
2699
+ ## Decision
2700
+ What did we decide and why?
2701
+
2702
+ ## Consequences
2703
+ What are the trade-offs? What becomes easier/harder?
2704
+ ```
2705
+
2706
+ Store ADRs in `docs/decisions/`. Number sequentially. Claude reads these during planning to understand why things are built the way they are.
2707
+
2708
+ **Keeping docs in sync with code:**
2709
+
2710
+ Docs drift when code changes but docs don't. The SDLC skill's planning phase detects this:
2711
+
2712
+ - During planning, Claude reads feature docs for the area being changed
2713
+ - If the code change contradicts what the doc says, Claude updates the doc
2714
+ - The "After Session" step routes learnings to the right doc
2715
+ - Stale docs cause low confidence — if Claude struggles, the doc may need updating
2716
+
2717
+ **CLAUDE.md health:** Run `/claude-md-improver` periodically (quarterly or after major changes). It audits CLAUDE.md specifically — structure, clarity, completeness (6 criteria, 100-point rubric). It does NOT cover feature docs, TESTING.md, or ADRs — the SDLC workflow handles those.
2601
2718
 
2602
2719
  ### Expand TESTING.md
2603
2720
 
@@ -2621,6 +2738,10 @@ Add project-specific guidance to skills:
2621
2738
  - Preferred patterns
2622
2739
  - Architecture decisions
2623
2740
 
2741
+ ### Complementary Tools
2742
+
2743
+ The wizard handles SDLC process enforcement. For stack-specific tooling, run `/claude-automation-recommender` — it suggests MCP servers, formatting hooks, type-checking hooks, subagent templates, and plugins based on your detected tech stack. See [Step 0.3](#step-03-additional-recommendations-optional) for the full comparison.
2744
+
2624
2745
  ---
2625
2746
 
2626
2747
  ## Testing AI Apps: What's Different
@@ -2688,6 +2809,49 @@ _Sources: [Confident AI](https://www.confident-ai.com/blog/llm-testing-in-2024-t
2688
2809
 
2689
2810
  ---
2690
2811
 
2812
+ ## Token Efficiency
2813
+
2814
+ Practical techniques to reduce token consumption without sacrificing quality.
2815
+
2816
+ ### Monitor Costs
2817
+
2818
+ | Tool | What It Shows | When to Use |
2819
+ |------|---------------|-------------|
2820
+ | `/cost` | Session total: USD, API time, code changes | After a session to review spend |
2821
+ | `/context` | What's consuming context window space | When hitting context limits |
2822
+ | Status line | Real-time `cost.total_cost_usd` + token counts | Continuous monitoring (sketch below) |
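For the status-line row, a minimal script sketch. This assumes Claude Code pipes session JSON, including `cost.total_cost_usd`, to whatever command you configure as your status line:

```typescript
// Minimal status-line command: read session JSON from stdin, print spend.
import { readFileSync } from "node:fs";

const session = JSON.parse(readFileSync(0, "utf8"));
const usd: number = session?.cost?.total_cost_usd ?? 0;
console.log(`$${usd.toFixed(2)} this session`);
```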
2823
+
2824
+ ### Reduce Consumption
2825
+
2826
+ | Technique | Savings | How |
2827
+ |-----------|---------|-----|
2828
+ | `/compact` between phases | ~40-60% context | Plan → compact → implement (plan preserved) |
2829
+ | `/clear` between tasks | 100% context reset | No stale context from prior work |
2830
+ | Delegate verbose ops to subagents | Separate context | `Agent` tool returns summary, not full output |
2831
+ | Use skills for on-demand knowledge | Smaller base context | Skills load only when invoked |
2832
+ | Scope investigations narrowly | Fewer tokens read | "investigate auth module" > "investigate codebase" |
2833
+ | `--effort low` for simple tasks | ~50% thinking tokens | Simple renames, config changes |
2834
+
2835
+ ### CI Cost Control
2836
+
2837
+ Add `--max-budget-usd` to CI workflows as a safety net:
2838
+
2839
+ ```yaml
2840
+ claude_args: "--max-budget-usd 5.00 --max-turns 30"
2841
+ ```
2842
+
2843
+ | Flag | Purpose |
2844
+ |------|---------|
2845
+ | `--max-budget-usd` | Hard dollar cap per CI invocation |
2846
+ | `--max-turns` | Limit agentic turns (prevents infinite loops) |
2847
+ | `--effort` | `low`/`medium`/`high` controls thinking depth |
2848
+
2849
+ ### Advanced: OpenTelemetry
2850
+
2851
+ For organization-wide cost tracking, enable `CLAUDE_CODE_ENABLE_TELEMETRY=1`. This exports per-request `cost_usd`, `input_tokens`, `output_tokens` to any OTLP-compatible backend (Datadog, Honeycomb, Prometheus).
2852
+
2853
+ ---
2854
+
2691
2855
  ## CI/CD Gotchas
2692
2856
 
2693
2857
  Common pitfalls when automating AI-assisted development workflows.
@@ -2749,85 +2913,6 @@ Claude: [fetches via gh api, discusses with you interactively]
2749
2913
 
2750
2914
  This is optional - skip if you prefer fresh reviews only.
2751
2915
 
2752
- ### CI Auto-Fix Loop (Optional)
2753
-
2754
- Automatically fix CI failures and PR review findings. Claude reads the error context, fixes the code, commits, and re-triggers CI. Loops until CI passes AND review has no findings at your chosen level, or max retries hit.
2755
-
2756
- **The Loop:**
2757
- ```
2758
- Push to PR
2759
- |
2760
- v
2761
- CI runs ──► FAIL ──► ci-autofix: Claude reads logs, fixes, commits [autofix 1/3] ──► re-trigger
2762
- |
2763
- └── PASS ──► PR Review ──► has findings at your level? ──► ci-autofix: fixes all ──► re-trigger
2764
- |
2765
- └── APPROVE, no findings ──► DONE
2766
- ```
2767
-
2768
- **Safety measures:**
2769
- - Never runs on main branch
2770
- - Max retries (default 3, configurable via `MAX_AUTOFIX_RETRIES`)
2771
- - `AUTOFIX_LEVEL` controls what findings to act on (`ci-only`, `criticals`, `all-findings`)
2772
- - Restricted Claude tools (no git, no npm)
2773
- - Self-modification ban (can't edit its own workflow file)
2774
- - `[autofix N/M]` commit tags for audit trail
2775
- - Sticky PR comments show status
2776
-
2777
- **Setup:**
2778
- 1. Create `.github/workflows/ci-autofix.yml`:
2779
-
2780
- ```yaml
2781
- name: CI Auto-Fix
2782
-
2783
- on:
2784
- workflow_run:
2785
- workflows: ["CI", "PR Code Review"]
2786
- types: [completed]
2787
-
2788
- permissions:
2789
- contents: write
2790
- pull-requests: write
2791
-
2792
- env:
2793
- MAX_AUTOFIX_RETRIES: 3
2794
- AUTOFIX_LEVEL: criticals # ci-only | criticals | all-findings
2795
-
2796
- jobs:
2797
- autofix:
2798
- runs-on: ubuntu-latest
2799
- if: |
2800
- github.event.workflow_run.head_branch != 'main' &&
2801
- github.event.workflow_run.event == 'pull_request' &&
2802
- (
2803
- (github.event.workflow_run.name == 'CI' && github.event.workflow_run.conclusion == 'failure') ||
2804
- (github.event.workflow_run.name == 'PR Code Review' && github.event.workflow_run.conclusion == 'success')
2805
- )
2806
- steps:
2807
- # Count previous [autofix] commits to enforce max retries
2808
- # Download CI failure logs or fetch review comment
2809
- # Check findings at your AUTOFIX_LEVEL (criticals + suggestions)
2810
- # Run Claude to fix ALL findings with restricted tools
2811
- # Commit [autofix N/M], push, re-trigger CI
2812
- # Post sticky PR comment with status
2813
- ```
2814
-
2815
- 2. Add `workflow_dispatch:` trigger to your CI workflow (so autofix can re-trigger it)
2816
- 3. Optionally configure a GitHub App for token generation (avoids `workflow_run` default-branch constraint)
2817
-
2818
- **Token approaches:**
2819
-
2820
- | Approach | When | Pros |
2821
- |----------|------|------|
2822
- | GITHUB_TOKEN + `gh workflow run` | Default | No extra setup |
2823
- | GitHub App token | `CI_AUTOFIX_APP_ID` secret exists | Push triggers `synchronize` naturally |
2824
-
2825
- **Note:** `workflow_run` only fires for workflows on the default branch. The ci-autofix workflow is dormant until first merged to main.
2826
-
2827
- > **Template vs. this repo:** The template above uses `ci-autofix.yml` with `criticals` as a safe default for new projects. The wizard's own repo has evolved this into `ci-self-heal.yml` with `all-findings` — a more aggressive configuration we dogfood internally. Both naming conventions work; the behavior is identical.
2828
-
2829
- ---
2830
-
2831
2916
  ### Cross-Model Review Loop (Optional)
2832
2917
 
2833
2918
  Use an independent AI model from a different company as a code reviewer. The author can't grade their own homework — a model with different training data and different biases catches blind spots the authoring model misses.
@@ -2966,6 +3051,7 @@ Claude writes code → self-review passes → handoff.json (round 1)
2966
3051
 
2967
3052
  **When to use this:**
2968
3053
  - High-stakes changes (auth, payments, data handling)
3054
+ - **Releases and publishes** (version bumps, CHANGELOG, npm publish) — see Release Review Checklist below
2969
3055
  - Research-heavy work where accuracy matters more than speed
2970
3056
  - Complex refactors touching many files
2971
3057
  - Any time you want higher confidence before merging
@@ -2975,6 +3061,30 @@ Claude writes code → self-review passes → handoff.json (round 1)
2975
3061
  - Time-sensitive hotfixes
2976
3062
  - Changes where the review cost exceeds the risk
2977
3063
 
3064
+ #### Release Review Checklist
3065
+
3066
+ Before any release or npm publish, add these focus areas to the cross-model `review_instructions`:
3067
+
3068
+ **Why:** Self-review and automated tests regularly miss release-specific inconsistencies. Evidence: v1.20.0 cross-model review caught 2 real issues (CHANGELOG section lost during consolidation, stale hardcoded version examples) that passed all tests and self-review.
3069
+
3070
+ | Check | What to Look For | Example Failure |
3071
+ |-------|-------------------|-----------------|
3072
+ | CHANGELOG consistency | All sections present, no lost entries during consolidation | v1.19.0 section dropped when merging into v1.20.0 |
3073
+ | Version parity | package.json, SDLC.md, CHANGELOG, wizard metadata all match | SDLC.md says 1.19.0 but package.json says 1.20.0 |
3074
+ | Stale examples | Hardcoded version strings in docs/wizard match current release | Wizard examples showing v1.15.0 when publishing v1.20.0 |
3075
+ | Docs accuracy | README, ARCHITECTURE.md reflect current feature set | "8 workflows" when there are actually 7 |
3076
+ | CLI-distributed file parity | Live skills, hooks, settings match CLI templates | SKILL.md edited but cli/templates/ not updated |
3077
+
3078
+ **Example `review_instructions` for releases:**
3079
+ ```
3080
+ Review for release consistency: CHANGELOG completeness (no lost sections),
3081
+ version parity across package.json/SDLC.md/CHANGELOG/wizard metadata,
3082
+ stale hardcoded versions in examples, docs accuracy vs actual features,
3083
+ CLI-distributed file parity (skills, hooks, settings).
3084
+ ```
3085
+
3086
+ **This complements automated tests, not replaces them.** Tests catch exact version mismatches (e.g., `test_package_version_matches_changelog`). Cross-model review catches semantic issues tests cannot — a section silently dropped, examples using outdated but syntactically valid versions, docs describing features that no longer exist.
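For the exact-match side, a sketch of the kind of parity test referenced above, using the Node test runner (file locations are assumptions):

```typescript
import { test } from "node:test";
import assert from "node:assert/strict";
import { readFileSync } from "node:fs";

test("package.json version has a CHANGELOG heading", () => {
  const { version } = JSON.parse(readFileSync("package.json", "utf8"));
  const changelog = readFileSync("CHANGELOG.md", "utf8");
  const heading = new RegExp(`^## \\[${version.replace(/\./g, "\\.")}\\]`, "m");
  assert.match(changelog, heading);
});

test("SDLC.md wizard metadata matches package.json", () => {
  const { version } = JSON.parse(readFileSync("package.json", "utf8"));
  const sdlc = readFileSync("SDLC.md", "utf8");
  assert.ok(sdlc.includes(`<!-- SDLC Wizard Version: ${version} -->`));
});
```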
3087
+
2978
3088
  ---
2979
3089
 
2980
3090
  ## User Understanding and Periodic Feedback
@@ -3079,21 +3189,19 @@ Claude reads the CHANGELOG to show you what's new **before** applying anything.
3079
3189
  ```
3080
3190
  Claude: "Fetching CHANGELOG to check for updates..."
3081
3191
 
3082
- Your version: 1.8.0
3083
- Latest version: 1.13.0
3192
+ Your version: X.Y.0
3193
+ Latest version: X.Z.0
3084
3194
 
3085
- What's new since 1.8.0:
3086
- - v1.13.0: Self-update improvements, optional CI notification
3087
- - v1.12.0: Full system audit, apply step fixes
3088
- - v1.11.0: Stale output cleanup, error handling
3089
- - v1.10.0: "Prove It's Better" CI automation
3090
- - v1.9.0: Workflow consolidation (6 → 5 workflows)
3195
+ What's new since X.Y.0:
3196
+ - vX.Z.0: Latest features and improvements
3197
+ - vX.Y+1.0: Previous version changes
3198
+ (... entries from CHANGELOG between your version and latest ...)
3091
3199
 
3092
3200
  Now checking your setup against latest wizard...
3093
3201
 
3094
3202
  ✓ Hooks - up to date
3095
3203
  ✓ Skills - content differs (update available)
3096
- ✗ step-update-notify - NOT DONE (new in v1.13.0, optional)
3204
+ ✗ step-update-notify - NOT DONE (new in vX.Z.0, optional)
3097
3205
 
3098
3206
  Summary:
3099
3207
  - 1 file update available (SDLC skill)
@@ -3109,7 +3217,7 @@ Walk through updates? (y/n)
3109
3217
  Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
3110
3218
 
3111
3219
  ```markdown
3112
- <!-- SDLC Wizard Version: 1.18.0 -->
3220
+ <!-- SDLC Wizard Version: 1.21.0 -->
3113
3221
  <!-- Setup Date: 2026-01-24 -->
3114
3222
  <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
3115
3223
  <!-- Git Workflow: PRs -->
package/README.md CHANGED
@@ -83,7 +83,7 @@ Layer 1: PHILOSOPHY
83
83
  | **SDP normalization** | Separates "the model had a bad day" from "our SDLC broke" by cross-referencing external benchmarks |
84
84
 | **CUSUM drift detection** | Catches gradual quality decay over time — borrowed from manufacturing quality control (sketch below) |
85
85
  | **Pre-tool TDD hooks** | Before source edits, a hook reminds Claude to write tests first. CI scoring checks whether it actually followed TDD |
86
- | **Self-evolving loop** | Weekly/monthly external research + CI friction signals from self-heal — you approve, the system gets better |
86
+ | **Self-evolving loop** | Weekly/monthly external research + local CI shepherd loop — you approve, the system gets better |
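For intuition on the CUSUM row above, a minimal sketch (target, slack, and threshold values are illustrative, not the wizard's tuned parameters):

```typescript
// One-sided CUSUM on E2E scores: accumulate small shortfalls below target
// until they cross a decision threshold.
function cusumDriftIndex(scores: number[], target = 90, slack = 1, threshold = 5): number | null {
  let s = 0;
  for (let i = 0; i < scores.length; i++) {
    s = Math.max(0, s + (target - scores[i] - slack)); // grows only on shortfalls
    if (s > threshold) return i; // the run where gradual decay becomes a signal
  }
  return null; // no drift detected
}
```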
87
87
 
88
88
  ## How It Works
89
89
 
@@ -186,14 +186,14 @@ This isn't the only Claude Code SDLC tool. Here's an honest comparison:
186
186
  |--------|------------|----------------------|-------------|
187
187
  | **Focus** | SDLC enforcement + measurement | Agent performance optimization | Plugin marketplace |
188
188
  | **Hooks** | 3 (SDLC, TDD, instructions) | 12+ (dev blocker, prettier, etc.) | Webhook watcher |
189
- | **Skills** | 2 (/sdlc, /setup) | 80+ domain-specific | 13 slash commands |
189
+ | **Skills** | 3 (/sdlc, /setup, /update) | 80+ domain-specific | 13 slash commands |
190
190
  | **Evaluation** | 95% CI, CUSUM, SDP, Tier 1/2 | Configuration testing | skilltest framework |
191
- | **Self-healing** | CI auto-fix + re-trigger | No | No |
191
+ | **CI Shepherd** | Local CI fix loop | No | No |
192
192
  | **Auto-updates** | Weekly CC + community scan | No | No |
193
193
  | **Install** | `npx agentic-sdlc-wizard init` | npm install | npm install |
194
194
  | **Philosophy** | Lightweight, prove-it-or-delete | Scale and optimization | Documentation-first |
195
195
 
196
- **Our unique strengths:** Statistical rigor (CUSUM + 95% CI), SDP scoring (model quality vs SDLC compliance), self-healing CI, Prove-It A/B pipeline, comprehensive automated test suite, dogfooding enforcement.
196
+ **Our unique strengths:** Statistical rigor (CUSUM + 95% CI), SDP scoring (model quality vs SDLC compliance), CI shepherd loop, Prove-It A/B pipeline, comprehensive automated test suite, dogfooding enforcement.
197
197
 
198
198
  **Where others are stronger:** everything-claude-code has broader language/framework coverage. claude-sdlc has webhook-driven automation. Both have npm distribution.
199
199
 
@@ -204,7 +204,7 @@ This isn't the only Claude Code SDLC tool. Here's an honest comparison:
204
204
  | Document | What It Covers |
205
205
  |----------|---------------|
206
206
  | [ARCHITECTURE.md](ARCHITECTURE.md) | System design, 5-layer diagram, data flows, file structure |
207
- | [CI_CD.md](CI_CD.md) | All 5 workflows, E2E scoring, tier system, SDP, integrity checks |
207
+ | [CI_CD.md](CI_CD.md) | All 4 workflows, E2E scoring, tier system, SDP, integrity checks |
208
208
  | [SDLC.md](SDLC.md) | Version tracking, enforcement rules, SDLC configuration |
209
209
  | [TESTING.md](TESTING.md) | Testing philosophy, test diamond, TDD approach |
210
210
  | [CHANGELOG.md](CHANGELOG.md) | Version history, what changed and when |
@@ -19,6 +19,7 @@ TodoWrite([
19
19
  { content: "Find and read relevant documentation", status: "in_progress", activeForm: "Reading docs" },
20
20
  { content: "Assess doc health - flag issues (ask before cleaning)", status: "pending", activeForm: "Checking doc health" },
21
21
  { content: "DRY scan: What patterns exist to reuse? New pattern = get approval", status: "pending", activeForm: "Scanning for reusable patterns" },
22
+ { content: "Prove It Gate: adding new component? Research alternatives, prove quality with tests", status: "pending", activeForm: "Checking prove-it gate" },
22
23
  { content: "Blast radius: What depends on code I'm changing?", status: "pending", activeForm: "Checking dependencies" },
23
24
  { content: "Design system check (if UI change)", status: "pending", activeForm: "Checking design system" },
24
25
  { content: "Restate task in own words - verify understanding", status: "pending", activeForm: "Verifying understanding" },
@@ -26,7 +27,7 @@ TodoWrite([
26
27
  { content: "Present approach + STATE CONFIDENCE LEVEL", status: "pending", activeForm: "Presenting approach" },
27
28
  { content: "Signal ready - user exits plan mode", status: "pending", activeForm: "Awaiting plan approval" },
28
29
  // TRANSITION PHASE (After plan mode)
29
- { content: "Update feature docs with discovered gotchas", status: "pending", activeForm: "Updating feature docs" },
30
+ { content: "Doc sync: update feature docs if code change contradicts or extends documented behavior", status: "pending", activeForm: "Syncing feature docs" },
30
31
  // IMPLEMENTATION PHASE
31
32
  { content: "TDD RED: Write failing test FIRST", status: "pending", activeForm: "Writing failing test" },
32
33
  { content: "TDD GREEN: Implement, verify test passes", status: "pending", activeForm: "Implementing feature" },
@@ -84,6 +85,22 @@ Critical miss on `tdd_red` or `self_review` = process failure regardless of tota
84
85
  - Does test approach follow TESTING.md philosophies?
85
86
  - If introducing new test patterns, same scrutiny as code patterns
86
87
 
88
+ ## Prove It Gate (REQUIRED for New Additions)
89
+
90
+ **Adding a new skill, hook, workflow, or component? PROVE IT FIRST:**
91
+
92
+ 1. **Research:** Does something equivalent already exist (native CC, third-party plugin, existing skill)?
93
+ 2. **If YES:** Why is yours better? Show evidence (A/B test, quality comparison, gap analysis)
94
+ 3. **If NO:** What gap does this fill? Is the gap real or theoretical?
95
+ 4. **Quality tests:** New additions MUST have tests that prove OUTPUT QUALITY, not just existence
96
+ 5. **Less is more:** Every addition is maintenance burden. Default answer is NO unless proven YES
97
+
98
+ **Existence tests are NOT quality tests:**
99
+ - BAD: "ci-analyzer skill file exists" — proves nothing about quality
100
+ - GOOD: "ci-analyzer recommends lint-first when test-before-lint detected" — proves behavior
101
+
102
+ **If you can't write a quality test for it, you can't prove it works, so don't add it.**
103
+
87
104
  ## Plan Mode Integration
88
105
 
89
106
  **Use plan mode for:** Multi-file changes, new features, LOW confidence, bugs needing investigation.
@@ -131,7 +148,7 @@ PLANNING -> DOCS -> TDD RED -> TDD GREEN -> Tests Pass -> Self-Review
131
148
 
132
149
  ## Cross-Model Review (If Configured)
133
150
 
134
- **When to run:** High-stakes changes (auth, payments, data handling), complex refactors, research-heavy work.
151
+ **When to run:** High-stakes changes (auth, payments, data handling), releases/publishes (version bumps, CHANGELOG, npm publish), complex refactors, research-heavy work.
135
152
  **When to skip:** Trivial changes (typo fixes, config tweaks), time-sensitive hotfixes, risk < review cost.
136
153
 
137
154
  **Prerequisites:** Codex CLI installed (`npm i -g @openai/codex`), OpenAI API key set.
@@ -236,6 +253,17 @@ Self-review passes → handoff.json (round 1, PENDING_REVIEW)
236
253
 
237
254
  **Full protocol:** See the wizard's "Cross-Model Review Loop (Optional)" section for key flags and reasoning effort guidance.
238
255
 
256
+ ### Release Review Focus
257
+
258
+ Before any release/publish, add these to `review_instructions`:
259
+ - **CHANGELOG consistency** — all sections present, no lost entries during consolidation
260
+ - **Version parity** — package.json, SDLC.md, CHANGELOG, wizard metadata all match
261
+ - **Stale examples** — hardcoded version strings in docs match current release
262
+ - **Docs accuracy** — README, ARCHITECTURE.md reflect current feature set
263
+ - **CLI-distributed file parity** — live skills, hooks, settings match CLI templates
264
+
265
+ Evidence: v1.20.0 cross-model review caught CHANGELOG section loss and stale wizard version examples that passed all tests and self-review. Tests catch version mismatches; cross-model review catches semantic issues tests cannot.
266
+
239
267
  ## Test Review (Harder Than Implementation)
240
268
 
241
269
  During self-review, critique tests HARDER than app code:
@@ -280,6 +308,8 @@ Everything else needs integration tests.
280
308
 
281
309
  ## Flaky Test Recovery
282
310
 
311
+ **Flaky tests are bugs. Period.** See: [How do you Address and Prevent Flaky Tests?](https://softwareautomation.notion.site/How-do-you-Address-and-Prevent-Flaky-Tests-23c539e19b3c46eeb655642b95237dc0)
312
+
283
313
  When a test fails intermittently:
284
314
  1. **Don't dismiss it** — "flaky" means "bug we haven't found yet"
285
315
  2. **Identify the layer** — test code? app code? environment?
@@ -333,7 +363,9 @@ If tests fail:
333
363
 
334
364
  Debug it. Find root cause. Fix it properly. Tests ARE code.
335
365
 
336
- ## CI Feedback Loop (After Commit)
366
+ ## CI Feedback Loop — Local Shepherd (After Commit)
367
+
368
+ **This is the "local shepherd" — the CI fix mechanism.** It runs in your active session with full context.
337
369
 
338
370
  **The SDLC doesn't end at local tests.** CI must pass too.
339
371
 
@@ -379,7 +411,7 @@ Local tests pass -> Commit -> Push -> Watch CI
379
411
  - Flaky? Investigate - flakiness is a bug
380
412
  - Stuck? ASK USER
381
413
 
382
- ## CI Review Feedback Loop (After CI Passes)
414
+ ## CI Review Feedback Loop — Local Shepherd (After CI Passes)
383
415
 
384
416
  **CI passing isn't the end.** If CI includes a code reviewer, read and address its suggestions.
385
417
 
@@ -411,6 +443,14 @@ CI passes -> Read review suggestions
411
443
  - **Ask first**: Present suggestions to user, let them decide which to implement
412
444
  - **Skip review feedback**: Ignore CI review suggestions, only fix CI failures
413
445
 
446
+ ## Context Management
447
+
448
+ - `/compact` between planning and implementation (plan preserved in summary)
449
+ - `/clear` between unrelated tasks (stale context wastes tokens and misleads)
450
+ - `/clear` after 2+ failed corrections (context polluted — start fresh with better prompt)
451
+ - Auto-compact fires at ~95% capacity — no manual management needed
452
+ - After committing a PR, `/clear` before starting the next feature
453
+
414
454
  ## DRY Principle
415
455
 
416
456
  **Before coding:** "What patterns exist I can reuse?"
@@ -480,11 +520,25 @@ CI passes -> Read review suggestions
 
  **THE RULE:** Delete old code first. If it breaks, fix it properly.
 
+ ## Documentation Sync (During Planning)
+
+ When a code change affects a documented feature, update the doc in the same PR:
+
+ 1. **During planning**, read feature docs for the area being changed (`*_PLAN.md`, `*_DOCS.md`, `docs/features/`, `docs/decisions/`)
+ 2. If your code change contradicts what the doc says → update the doc
+ 3. If your code change extends behavior the doc describes → add to the doc
+ 4. If no feature doc exists and the change is substantial → note it in the summary (don't create one unprompted)
+
+ **Doc staleness signals:** Low confidence in an area often means the docs are stale, missing, or misleading. If you struggle during planning, check whether the docs match the actual code.
+
+ **CLAUDE.md health:** `/claude-md-improver` audits CLAUDE.md structure and completeness. Run it periodically. It does NOT cover feature docs — the SDLC workflow handles those.
+
  ## After Session (Capture Learnings)
 
  If this session revealed insights, update the right place:
  - **Testing patterns, gotchas** → `TESTING.md`
- - **Feature-specific quirks** → Feature docs (`*_PLAN.md`)
+ - **Feature-specific quirks** → Feature docs (`*_PLAN.md`, `*_DOCS.md`)
+ - **Architecture decisions** → `docs/decisions/` (ADR format) or `ARCHITECTURE.md`
  - **General project context** → `CLAUDE.md` (or `/revise-claude-md`)
 
  ---
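
The wizard does not automate this lookup; as a purely hypothetical helper, planning could start by grepping the doc locations listed above for the paths being changed:

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical helper: list docs that mention any changed file, so planning
// can check them for staleness. Doc locations follow the conventions above.
function docsTouchingChange(changedFiles: string[]): string[] {
  const hits = new Set<string>();
  for (const file of changedFiles) {
    try {
      const out = execFileSync(
        "git",
        ["grep", "-lF", file, "--", "docs/", "*_PLAN.md", "*_DOCS.md"],
        { encoding: "utf8" },
      );
      for (const doc of out.split("\n").filter(Boolean)) hits.add(doc);
    } catch {
      // git grep exits non-zero when nothing matches; that just means no hits.
    }
  }
  return [...hits];
}
```
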
@@ -1,17 +1,19 @@
  ---
  name: setup-wizard
- description: Interactive project setup wizard that scans the codebase, asks all 16 configuration questions, generates SDLC files (CLAUDE.md, SDLC.md, TESTING.md, ARCHITECTURE.md), and verifies the installation. Use this skill when setting up the SDLC wizard for the first time or re-running setup.
+ description: Setup wizard scans codebase, builds confidence per data point, only asks what it can't figure out, generates SDLC files. Use for first-time setup or re-running setup.
  argument-hint: [optional: regenerate | verify-only]
  effort: high
  ---
- # Setup Wizard - Interactive Project Configuration
+ # Setup Wizard - Confidence-Driven Project Configuration
 
  ## Task
  $ARGUMENTS
 
  ## Purpose
 
- You are an interactive setup wizard. Your job is to scan the project, ask the user ALL configuration questions, and generate the SDLC files. DO NOT skip questions. DO NOT make assumptions. The user's answers drive the output.
+ You are a confidence-driven setup wizard. Your job is to scan the project, infer as much as possible, and only ask the user about what you can't figure out. The number of questions is DYNAMIC: it depends on how much you can detect. Stop asking when all configuration data points are resolved (detected, confirmed, or answered).
+
+ **DO NOT ask a fixed list of questions. DO NOT ask what you already know.**
 
  ## MANDATORY FIRST ACTION: Read the Wizard Doc
 
@@ -36,56 +38,70 @@ Scan the project root for:
  - Deployment: Dockerfile, vercel.json, fly.toml, netlify.toml, Procfile, k8s/
  - Design system: tailwind.config.*, .storybook/, theme files, CSS custom properties
  - Existing docs: README.md, CLAUDE.md, ARCHITECTURE.md
+ - Scripts in package.json (lint, test, build, typecheck, etc.)
+ - Database config files (prisma/, drizzle.config.*, knexfile.*, .env with DB_*)
+ - Cache config (redis.conf, .env with REDIS_*)
+
+ ### Step 2: Build Confidence Map
+
+ For each configuration data point, assign a confidence level based on scan results:
+
+ **Configuration Data Points:**
 
- Present findings to the user in a clear summary with detected values.
+ | Category | Data Point | How to Detect |
+ |----------|-----------|---------------|
+ | Structure | Source directory | Look for src/, app/, lib/, etc. |
+ | Structure | Test directory | Look for tests/, __tests__/, spec/ |
+ | Structure | Test framework | Config files (jest.config, vitest.config, pytest.ini) |
+ | Commands | Lint command | package.json scripts, Makefile, config files |
+ | Commands | Type-check command | tsconfig.json → tsc, mypy.ini → mypy |
+ | Commands | Run all tests | package.json "test" script, Makefile |
+ | Commands | Run single test file | Infer from framework (jest → jest path, pytest → pytest path) |
+ | Commands | Production build | package.json "build" script, Makefile |
+ | Commands | Deployment setup | Dockerfile, vercel.json, fly.toml, deploy scripts |
+ | Infra | Database(s) | prisma/, .env DB vars, docker-compose services |
+ | Infra | Caching layer | .env REDIS vars, docker-compose redis service |
+ | Infra | Test duration | Count test files, check CI run times if available |
+ | Preferences | Response detail level | Cannot detect — ALWAYS ASK |
+ | Preferences | Testing approach | Cannot detect intent from existing code — ALWAYS ASK |
+ | Preferences | Mocking philosophy | Cannot detect intent from existing code — ALWAYS ASK |
+ | Testing | Test types | What test files exist (*.test.*, *.spec.*, e2e/, integration/) |
+ | Coverage | Coverage config | nyc, c8, coverage.py config, CI coverage steps |
+ | CI | CI shepherd opt-in | Only if CI detected — ALWAYS ASK |
 
- ### Step 2: Ask ALL 17 Questions
+ **Each data point has one of three states:**
+ - **RESOLVED (detected):** Found concrete evidence — config file, script, directory exists. No question needed, just confirm.
+ - **RESOLVED (inferred):** Found indirect evidence — naming patterns, related config. Present inference, let user confirm or correct.
+ - **UNRESOLVED:** No evidence found — must ask user directly.
 
- Ask every question. Pre-fill detected values but let the user confirm or override.
+ **Preference data points** (response detail, testing approach, mocking philosophy, CI shepherd) are ALWAYS UNRESOLVED regardless of what code patterns exist. Current code patterns show what IS, not what the user WANTS going forward.
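
One way to picture the data model behind the confidence map (an illustrative sketch; these type and function names are not from the wizard's implementation):

```typescript
import { existsSync } from "node:fs";

// Each data point carries its resolution state plus the evidence behind it.
type Resolution =
  | { state: "detected"; value: string; evidence: string } // concrete config found
  | { state: "inferred"; value: string; reasoning: string } // present for confirmation
  | { state: "unresolved" }; // becomes a question

interface DataPoint {
  category: "Structure" | "Commands" | "Infra" | "Preferences" | "Testing" | "Coverage" | "CI";
  name: string;
  resolution: Resolution;
}

// Example resolver: test framework, detected from config files per the table.
function resolveTestFramework(): Resolution {
  if (existsSync("vitest.config.ts")) {
    return { state: "detected", value: "vitest", evidence: "vitest.config.ts" };
  }
  if (existsSync("jest.config.js")) {
    return { state: "detected", value: "jest", evidence: "jest.config.js" };
  }
  return { state: "unresolved" };
}

// The questions to ask are exactly the data points still unresolved.
const questionsFor = (points: DataPoint[]) =>
  points.filter((p) => p.resolution.state === "unresolved");
```
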
 
- **Project Structure:**
- 1. Source directory (detected or ask)
- 2. Test directory (detected or ask)
- 3. Test framework (detected or ask)
+ ### Step 3: Present Findings and Fill Gaps
 
- **Commands:**
- 4. Lint command
- 5. Type-check command
- 6. Run all tests command
- 7. Run single test file command
- 8. Production build command
- 9. Deployment setup (detected environments, confirm or customize)
+ Present ALL detected values organized by state to the user.
 
- **Infrastructure:**
- 10. Database(s) used
- 11. Caching layer (Redis, etc.)
- 12. Test duration (<1 min, 1-5 min, 5+ min)
+ **For RESOLVED (detected) items:** Show what was found, let user bulk-confirm with a single "Looks good" or override specific items.
 
- **Output Preferences:**
- 13. Response detail level (small/medium/large)
+ **For RESOLVED (inferred) items:** Show what was inferred with reasoning, ask user to confirm or correct.
 
- **Testing Philosophy:**
- 14. Testing approach (strict TDD, test-after, mixed, minimal, none yet)
- 15. Test types wanted (unit, integration, E2E, API)
- 16. Mocking philosophy (minimal, heavy, no mocking)
+ **For UNRESOLVED items:** Ask the user directly — these are your questions.
 
- **Coverage:**
- 17. Code coverage preferences (enforce threshold, report only, AI suggestions, skip)
+ **The ready rule:** You are ready to generate files when ALL data points are resolved (detected, inferred+confirmed, or answered by user). The number of questions you ask depends entirely on how many data points remain unresolved after scanning. A well-configured project might need 3-4 questions (just preferences). A bare repo might need 10+. There is no fixed count.
 
- DO NOT proceed to file generation until ALL 17 questions have answers.
+ DO NOT proceed to file generation until all data points are resolved.
 
- ### Step 3: Generate CLAUDE.md
+ ### Step 4: Generate CLAUDE.md
 
- Using the user's answers, generate `CLAUDE.md` with:
+ Using detected + confirmed values, generate `CLAUDE.md` with:
  - Project overview (from scan results)
- - Commands table (Q4-Q8 answers)
+ - Commands table (detected/confirmed commands)
  - Code style section (from detected linters/formatters)
  - Architecture summary (from scan)
- - Special notes (from Q9-Q11)
+ - Special notes (infra, deployment)
 
  Reference: See "Step 3" in `CLAUDE_CODE_SDLC_WIZARD.md` for the full template.
 
- ### Step 4: Generate SDLC.md
+ ### Step 5: Generate SDLC.md
 
  Generate `SDLC.md` with the full SDLC checklist customized to the project:
  - Plan mode guidance
@@ -98,35 +114,35 @@ Include metadata comments:
  ```
  <!-- SDLC Wizard Version: [version from CLAUDE_CODE_SDLC_WIZARD.md] -->
  <!-- Setup Date: [today's date] -->
- <!-- Completed Steps: 0.4, 1-10 -->
+ <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
  ```
 
  Reference: See "Step 4" in `CLAUDE_CODE_SDLC_WIZARD.md` for the full template.
 
- ### Step 5: Generate TESTING.md
+ ### Step 6: Generate TESTING.md
 
- Generate `TESTING.md` based on Q13-Q16 answers:
+ Generate `TESTING.md` based on detected/confirmed testing data:
  - Testing Diamond visualization
  - Test types and their purposes
- - Mocking rules (from Q15)
- - Test file organization (from Q2, Q3)
- - Coverage config (from Q16)
+ - Mocking rules (from detected patterns or user input)
+ - Test file organization (from detected structure)
+ - Coverage config (from detected config or user input)
  - Framework-specific patterns
 
  Reference: See "Step 5" in `CLAUDE_CODE_SDLC_WIZARD.md` for the full template.
 
- ### Step 6: Generate ARCHITECTURE.md
+ ### Step 7: Generate ARCHITECTURE.md
 
  Generate `ARCHITECTURE.md` with:
  - System overview diagram (from scan)
  - Component descriptions
- - Environments table (from Q8.5)
+ - Environments table (from detected deployment config)
  - Deployment checklist
  - Key technical decisions
 
  Reference: See "Step 6" in `CLAUDE_CODE_SDLC_WIZARD.md` for the full template.
 
- ### Step 7: Generate DESIGN_SYSTEM.md (If UI Detected)
+ ### Step 8: Generate DESIGN_SYSTEM.md (If UI Detected)
 
  Only if design system artifacts were found in Step 1:
  - Extract colors, fonts, spacing from config
@@ -135,7 +151,7 @@ Only if design system artifacts were found in Step 1:
 
  Skip this step if no UI/design system detected.
 
- ### Step 8: Configure Tool Permissions
+ ### Step 9: Configure Tool Permissions
 
  Based on detected stack, suggest `allowedTools` entries for `.claude/settings.json`:
  - Package manager commands (npm, pnpm, yarn, cargo, go, pip, etc.)
@@ -144,11 +160,11 @@ Based on detected stack, suggest `allowedTools` entries for `.claude/settings.js
 
  Present suggestions and let the user confirm.
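
As an illustration, a detected npm + git stack might yield suggestions like this sketch (the `Bash(...)` strings follow Claude Code's tool-permission pattern syntax; the detected values and entry list are assumptions, not the wizard's actual output):

```typescript
// Turn detected stack facts into suggested allowedTools entries for review.
const detected = { packageManager: "npm", usesGit: true }; // illustrative scan result

const allowedTools: string[] = [
  `Bash(${detected.packageManager} run lint)`,
  `Bash(${detected.packageManager} run build)`,
  `Bash(${detected.packageManager} test:*)`,
  ...(detected.usesGit
    ? [
        "Bash(git status)",
        "Bash(git diff:*)",
        "Bash(git log:*)",
        "Bash(git add:*)",
        "Bash(git commit:*)",
      ]
    : []),
];

// Printed for the user to confirm before it lands in .claude/settings.json.
console.log(JSON.stringify({ allowedTools }, null, 2));
```
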
 
- ### Step 9: Customize Hooks
+ ### Step 10: Customize Hooks
 
- Update `tdd-pretool-check.sh` with the actual source directory from Q1 (replace generic `/src/` pattern).
+ Update `tdd-pretool-check.sh` with the actual source directory (replace generic `/src/` pattern).
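
The substitution itself is a one-liner; a sketch, assuming the template hook contains a literal `/src/` pattern and lives under `.claude/hooks/` (both assumptions):

```typescript
import { readFileSync, writeFileSync } from "node:fs";

// Point the TDD hook at the project's real source directory by replacing
// the generic /src/ pattern. Hook path and "app" directory are assumptions.
function customizeHook(hookPath: string, sourceDir: string): void {
  const original = readFileSync(hookPath, "utf8");
  writeFileSync(hookPath, original.replaceAll("/src/", `/${sourceDir}/`));
}

customizeHook(".claude/hooks/tdd-pretool-check.sh", "app");
```
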
 
- ### Step 10: Verify Setup
+ ### Step 11: Verify Setup
 
  Run verification checks:
  1. All generated files exist and are non-empty
@@ -159,18 +175,24 @@ Run verification checks:
 
  Report any issues found.
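
The first check is easy to make concrete; a sketch over the four files the generation steps above always produce (DESIGN_SYSTEM.md omitted since it is conditional):

```typescript
import { statSync } from "node:fs";

// Verification check 1: generated files exist and are non-empty.
const generated = ["CLAUDE.md", "SDLC.md", "TESTING.md", "ARCHITECTURE.md"];

const problems = generated.filter((file) => {
  try {
    return statSync(file).size === 0; // exists but empty
  } catch {
    return true; // missing entirely
  }
});

if (problems.length > 0) {
  console.error(`Verification failed for: ${problems.join(", ")}`);
} else {
  console.log("All generated files exist and are non-empty.");
}
```
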
 
- ### Step 11: Instruct Restart
+ ### Step 12: Instruct Restart and Next Steps
 
  Tell the user:
  > Setup complete. Hooks and settings load at session start.
  > **Exit Claude Code and restart it** for the new configuration to take effect.
  > On restart, the SDLC hook will fire and you'll see the checklist in every response.
+ >
+ > **Optional next step:**
+ > - Run `/claude-automation-recommender` for stack-specific tooling suggestions (MCP servers, formatting hooks, type-checking hooks, plugins)
+ >
+ > The recommender is complementary to the SDLC wizard — it adds tooling recommendations, not process enforcement.
 
  ## Rules
 
- - NEVER skip a question. If the user says "I don't know", record that and move on.
- - NEVER assume answers. If auto-scan can't detect something, ASK.
- - ALWAYS show detected values and let the user confirm or override.
+ - NEVER ask what you already know from scanning. If you found it, confirm it — don't ask it.
+ - NEVER use a fixed question count. The number of questions is dynamic based on scan results.
+ - ALWAYS show detected values organized by resolution state and let the user confirm or override.
  - ALWAYS generate metadata comments in SDLC.md (version, date, steps).
+ - If most data points are resolved after scanning, present findings for bulk confirmation — don't force individual questions.
  - If the user passes `regenerate` as an argument, skip Q&A and regenerate files from existing SDLC.md metadata.
- - If the user passes `verify-only` as an argument, skip to Step 10 (verify) only.
+ - If the user passes `verify-only` as an argument, skip to Step 11 (verify) only.
@@ -45,13 +45,13 @@ Extract the latest version from the first `## [X.X.X]` line.
  Parse all CHANGELOG entries between the user's installed version and the latest. Present a clear summary:
 
  ```
- Installed: 1.15.0
- Latest: 1.18.0
+ Installed: 1.19.0
+ Latest: 1.21.0
 
  What changed:
- - [1.18.0] Added /update-wizard skill, ...
- - [1.17.0] Consolidated /testing into /sdlc, ...
- - [1.16.0] Cross-model review protocol, ...
+ - [1.21.0] Confidence-driven setup, prove-it gate, cross-model release review, ...
+ - [1.20.0] Version-pinned CC update gate, Tier 1 flakiness fix, flaky test guidance, ...
+ - [1.19.0] CI shepherd model, token efficiency, feature doc enforcement, ...
  ```
 
  **If versions match:** Say "You're up to date! (version X.X.X)" and stop.
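
The parse step amounts to splitting the CHANGELOG on `## [X.Y.Z]` headings and keeping entries newer than the installed version. A sketch, assuming plain MAJOR.MINOR.PATCH versions as this changelog uses:

```typescript
// Collect CHANGELOG entries strictly newer than the installed version.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

function entriesSince(
  changelog: string,
  installed: string,
): { version: string; body: string }[] {
  // With a capture group, split() alternates: [preamble, version, body, ...].
  const sections = changelog.split(/^## \[(\d+\.\d+\.\d+)\].*$/m);
  const entries: { version: string; body: string }[] = [];
  for (let i = 1; i < sections.length; i += 2) {
    if (compareVersions(sections[i], installed) > 0) {
      entries.push({ version: sections[i], body: (sections[i + 1] ?? "").trim() });
    }
  }
  return entries;
}
```
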
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "agentic-sdlc-wizard",
- "version": "1.18.0",
+ "version": "1.21.0",
  "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
  "bin": {
  "sdlc-wizard": "./cli/bin/sdlc-wizard.js"