agentic-sdlc-wizard 1.26.0 → 1.28.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/marketplace.json +1 -1
- package/.claude-plugin/plugin.json +1 -1
- package/CHANGELOG.md +32 -0
- package/CLAUDE_CODE_SDLC_WIZARD.md +257 -12
- package/README.md +2 -2
- package/package.json +1 -1
- package/skills/setup/SKILL.md +21 -11
- package/skills/update/SKILL.md +3 -1
package/CHANGELOG.md
CHANGED
@@ -4,6 +4,38 @@ All notable changes to the SDLC Wizard.
 
 > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
 
+## [1.28.0] - 2026-04-06
+
+### Added
+- Autocompact benchmarking methodology — first rigorous framework for testing `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` thresholds (#92, PR #158)
+- `AUTOCOMPACT_BENCHMARK.md`: experimental design, canary fact mechanism, cost estimation, limitations
+- `tests/benchmarks/run-benchmark.sh`: parameterized harness with `--dry-run`, threshold validation, multi-turn session via `--resume`
+- `tests/benchmarks/analyze-results.sh`: statistical comparison tables using `stats.sh`
+- 3 task files (short/medium/long) with canary fact injection for context preservation measurement
+- `canary-facts.json`: 5 domain-independent facts for binary recall scoring with negation detection
+- `.github/workflows/benchmark-autocompact.yml`: `workflow_dispatch` with matrix strategy across thresholds
+- 26 quality tests proving methodology rigor, harness behavior, and research standards
+
+### Changed
+- README bio: reflects full-stack founding engineer background (not just SDET/QA)
+- Wizard doc autocompact section references benchmarking methodology
+- Workflow count updated (5→6) across README and CI
+
+## [1.27.0] - 2026-04-05
+
+### Added
+- Domain-adaptive testing diamond — setup wizard auto-detects project domain (firmware/data-science/CLI/web) and generates domain-specific TESTING.md with appropriate testing layers (#79, PR #157)
+- Firmware/Embedded: HIL/SIL/Config Validation/Unit (no browser, no DB)
+- Data Science: Model Evaluation/Pipeline Integration/Data Validation/Unit
+- CLI Tool: CLI Integration/Behavior/Unit (no browser)
+- Web/API: unchanged default (E2E/Integration/Unit)
+- Domain detection patterns in wizard doc scan tree and setup skill Step 1/2/6
+- 3 new test fixtures: firmware-embedded, data-science, cli-tool (partially satisfies #78)
+- 25 domain detection quality tests
+
+### Fixed
+- Setup skill cross-references: Step 4/5 now correctly reference wizard doc Steps 8/9 (caught by CI PR review)
+
 ## [1.26.0] - 2026-04-05
 
 ### Added
package/CLAUDE_CODE_SDLC_WIZARD.md
CHANGED
@@ -519,6 +519,82 @@ Here's the "Testing Diamond" approach (recommended for AI agents):
 - **Unit**: Pure logic only — no DB, no API, no filesystem. ~5% of suite.
 - **The rule**: If your test doesn't open a browser or render a UI, it's not E2E — it's integration. Mislabeling leads to overinvestment in slow browser tests.
 
+#### Domain-Adaptive Testing Layers
+
+The Testing Diamond above is the Web/API default. Other project domains have fundamentally different testing layers. The setup wizard auto-detects your domain and generates the appropriate TESTING.md.
+
+**Domain Detection Patterns:**
+
+| Domain | File/Dir Indicators |
+|--------|-------------------|
+| **Firmware/Embedded** | Makefile with `flash`/`burn` targets, `.cfg` device configs, `/sys/` or `/dev/tty` references, `.c`/`.h` source, `platformio.ini`, `CMakeLists.txt` with embedded targets |
+| **Data Science** | `.ipynb` notebooks, `requirements.txt` with pandas/sklearn/tensorflow/torch, `data/` or `datasets/` dir, `models/` dir, Jupyter config |
+| **CLI Tool** | `package.json` with `"bin"` field (no React/Vue/Angular), `bin/` dir, `src/cli.*`, no `src/components/` |
+| **Web/API (default)** | Everything else — web frameworks, `src/components/`, Playwright/Cypress config, DB config. Fallback when no other domain matches |
+
+**Firmware/Embedded Testing Layers:**
+
+```
+   /\        ← Few HIL (Hardware-in-the-Loop: real device, flash + boot verify)
+  /  \
+ /    \
+/------\
+|      |     ← MANY SIL (Software-in-the-Loop: emulated hardware, QEMU, device sims)
+|      |
+\------/
+ \    /      ← Config Validation (device config parsing, constraint checks)
+  \  /
+   \/        ← Few Unit (parsers, formatters, math)
+```
+
+- **HIL (~5%)**: Hardware-in-the-Loop — flash to real device, verify boot, test hardware interfaces
+- **SIL (~60%)**: Software-in-the-Loop — emulated hardware via QEMU or device simulators. Best bang for buck
+- **Config Validation (~25%)**: Device config (.cfg) parsing, cross-device constraint checks, valid value ranges
+- **Unit (~10%)**: Pure logic only — parsers, formatters, math functions
+- **Mocking**: Mock hardware interfaces (`/dev/tty*`, GPIO), NEVER mock config parsers
+- NO browser tests, NO database mocking
+
+**Data Science Testing Layers:**
+
+```
+   /\        ← Few Model Evaluation (accuracy/precision/recall on holdout sets)
+  /  \
+ /    \
+/------\
+|      |     ← MANY Pipeline Integration (end-to-end with test datasets)
+|      |
+\------/
+ \    /      ← Data Validation (schema checks, distribution drift, missing values)
+  \  /
+   \/        ← Few Unit (pure transformations, feature engineering)
+```
+
+- **Model Evaluation (~10%)**: Accuracy, precision, recall, F1 on holdout test sets. Catches model degradation
+- **Pipeline Integration (~60%)**: End-to-end pipeline runs with test datasets. Best bang for buck
+- **Data Validation (~20%)**: Schema checks, distribution drift detection, missing value handling, type enforcement
+- **Unit (~10%)**: Pure transformations, feature engineering functions, data cleaning logic
+- **Mocking**: Mock external data sources (APIs, S3), NEVER mock data transformations
+- NO browser tests, NO traditional API endpoint testing
+
+**CLI Tool Testing Layers:**
+
+```
+/------\
+|      |     ← MANY CLI Integration (full invocations, real args, real filesystem)
+|      |
+|      |
+\------/
+ \    /      ← Behavior (exit codes, stdout/stderr content, file creation)
+  \  /
+   \/        ← Few Unit (arg parsing, formatters, pure logic)
+```
+
+- **CLI Integration (~80%)**: Full CLI invocations with real arguments and real filesystem. Best bang for buck
+- **Behavior (~10%)**: Exit codes, stdout/stderr output validation, file creation/modification verification
+- **Unit (~10%)**: Argument parsing, output formatters, pure logic
+- **Mocking**: Mock network calls, NEVER mock filesystem operations
+- NO browser tests, usually NO database
+
 **But your team decides:**
 
 | Question | Your Choice |
@@ -724,7 +800,7 @@ Two tools for managing context — use the right one:
 
 ### Autocompact Tuning
 
-Override the default auto-compact threshold with environment variables. These are community-discovered settings referenced in upstream issues ([#34332](https://github.com/anthropics/claude-code/issues/34332), [#42375](https://github.com/anthropics/claude-code/issues/42375)) — not yet officially documented by Anthropic
+Override the default auto-compact threshold with environment variables. These are community-discovered settings referenced in upstream issues ([#34332](https://github.com/anthropics/claude-code/issues/34332), [#42375](https://github.com/anthropics/claude-code/issues/42375)) — not yet officially documented by Anthropic. For a rigorous benchmarking methodology to validate these thresholds, see [AUTOCOMPACT_BENCHMARK.md](AUTOCOMPACT_BENCHMARK.md).
 
 | Variable | What It Does | Default |
 |----------|-------------|---------|
@@ -752,6 +828,10 @@ export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75
 
 **Note:** These env vars may change as Claude Code evolves. Check [Claude Code settings docs](https://docs.anthropic.com/en/docs/claude-code/settings) for the latest supported configuration.
 
+### Benchmarking Methodology
+
+The thresholds above are community consensus — not empirically validated. For rigorous benchmarking of autocompact thresholds (measuring task quality, context preservation, and cost at each setting), see [AUTOCOMPACT_BENCHMARK.md](AUTOCOMPACT_BENCHMARK.md). It provides a controlled experimental methodology with a novel "canary fact" mechanism for measuring context preservation post-compaction.
+
 ### 1M vs 200K Context Window
 
 Claude Code supports both 200K and 1M context windows. Choose based on your task:
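The hunk above documents `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` as a percentage-valued override set via the shell. A minimal sketch of exporting it with a range guard, in the spirit of the harness's "threshold validation" (the `validate_pct` helper and the 1-99 bounds are illustrative assumptions, not part of the package):

```shell
#!/bin/sh
# Illustrative guard: reject non-numeric or out-of-range values before
# exporting. The env var name comes from the doc; the bounds are assumptions.
validate_pct() {
  case "$1" in
    ''|*[!0-9]*) echo "not a number: $1" >&2; return 1 ;;
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 99 ]
}

if validate_pct "${1:-75}"; then
  export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE="${1:-75}"
  echo "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=$CLAUDE_AUTOCOMPACT_PCT_OVERRIDE"
else
  echo "usage: $0 <1-99>" >&2
  exit 2
fi
```

Sourcing (rather than executing) the script is what makes the exported value visible to a subsequently launched `claude` session.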
@@ -1235,12 +1315,34 @@ Claude scans for:
 │   ├── docker-compose.yml → Bash(docker *)
 │   └── .github/workflows/ → Bash(gh *)
 │
-
-
-
-
-
-
+├── Design system (for UI projects):
+│   ├── tailwind.config.* → Extract colors, fonts, spacing from theme
+│   ├── CSS with --var-name → Extract custom property palette
+│   ├── .storybook/ → Reference as design source of truth
+│   ├── MUI/Chakra theme files → Reference theming docs + overrides
+│   └── /assets/, /images/ → Document asset locations
+│
+└── Project domain (for domain-adaptive TESTING.md):
+    ├── Firmware/Embedded:
+    │   ├── Makefile with flash/burn targets
+    │   ├── .cfg device config files
+    │   ├── /sys/ or /dev/tty references in scripts
+    │   ├── .c/.h source files without web frameworks
+    │   ├── platformio.ini, CMakeLists.txt
+    │   └── No package.json with web frameworks
+    ├── Data Science:
+    │   ├── .ipynb notebook files
+    │   ├── requirements.txt with pandas/sklearn/tensorflow/torch
+    │   ├── data/ or datasets/ directory
+    │   ├── models/ directory
+    │   └── No Express/FastAPI/Rails web framework
+    ├── CLI Tool:
+    │   ├── package.json with "bin" field (no React/Vue/Angular deps)
+    │   ├── bin/ directory with executable scripts
+    │   ├── src/cli.* entry point
+    │   └── No src/components/, no browser test config
+    └── Web/API (default):
+        └── Everything else — fallback when no other domain matches
 ```
 
 **If Claude can't detect something, it asks.** Never assumes.
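The domain branch added to the scan tree above boils down to checking distinctive files and falling back to Web/API. A hypothetical shell sketch of that precedence (the `detect_domain` helper, its check order, and the specific file probes are illustrative only; the actual scan is performed by Claude during setup, not by a script):

```shell
#!/bin/sh
# Hypothetical domain detector mirroring the scan tree: probe for strong
# domain signals, fall back to web-api when nothing matches.
detect_domain() {
  dir="$1"
  if ls "$dir"/*.ipynb >/dev/null 2>&1 || [ -d "$dir/datasets" ]; then
    echo "data-science"                       # notebooks or datasets/ dir
  elif [ -f "$dir/platformio.ini" ] || grep -qs '^flash:' "$dir/Makefile"; then
    echo "firmware-embedded"                  # embedded build config or flash target
  elif grep -qs '"bin"' "$dir/package.json" && [ ! -d "$dir/src/components" ]; then
    echo "cli-tool"                           # bin entry without UI components
  else
    echo "web-api"                            # default fallback
  fi
}
```

The doc's rule is "one domain per project — dominant signal wins"; a real detector would weigh multiple indicators rather than take the first hit as this sketch does.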
@@ -1553,6 +1655,7 @@ Each resolved data point (whether detected or confirmed by the user) maps to gen
 | Infrastructure (DB, cache) | `CLAUDE.md` - Architecture section, `TESTING.md` - mock decisions |
 | Test duration | `SDLC skill` - wait time note |
 | Test types (E2E) | `TESTING.md` - testing diamond top |
+| Project domain (firmware/data-science/CLI/web) | `TESTING.md` - domain-adaptive testing layers and mocking rules |
 
 ---
 
@@ -2497,7 +2600,7 @@ If deployment fails or post-deploy verification catches issues:
 
 **SDLC.md:**
 ```markdown
-<!-- SDLC Wizard Version: 1.
+<!-- SDLC Wizard Version: 1.28.0 -->
 <!-- Setup Date: [DATE] -->
 <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: [PRs or Solo] -->
@@ -2525,26 +2628,168 @@ See `.claude/skills/sdlc/SKILL.md` for the enforced checklist.
 - Survives file edits
 - Travels with the repo
 
-**TESTING.md:**
+**TESTING.md (domain-adaptive — generate the template matching the detected domain):**
+
+**Web/API (default):**
 ```markdown
 # Testing Guidelines
 
-
+## Testing Diamond
+
+Integration tests are best bang for buck. Mocks can "pass" while production fails.
+
+| Layer | What It Tests | % of Suite |
+|-------|--------------|------------|
+| E2E | Full user flow through browser (Playwright, Cypress) | ~5% |
+| Integration | Real DB, real cache, API-level — no UI | ~90% |
+| Unit | Pure logic — no DB, no API, no filesystem | ~5% |
 
 ## Test Commands
 
 - All tests: `[your command]`
 - Specific test: `[your command]`
 
+## Mocking Rules
+
+| Dependency | Mock? | Why |
+|------------|-------|-----|
+| Database | NEVER | Use test DB or in-memory |
+| Cache | NEVER | Use isolated test instance |
+| External APIs | YES | Real calls = flaky + expensive |
+| Time/Date | YES | Determinism |
+
 ## Fixtures
 
-Location: `[
+Location: `[tests/fixtures/ or test-data/]`
 
 ## Lessons Learned
 
 <!-- Add testing gotchas as you discover them -->
 ```
 
+**Firmware/Embedded (if detected):**
+```markdown
+# Testing Guidelines
+
+## Testing Layers (Firmware)
+
+SIL tests are best bang for buck. Real hardware tests are slow but prove the real thing works.
+
+| Layer | What It Tests | % of Suite |
+|-------|--------------|------------|
+| HIL | Hardware-in-the-Loop — real device, flash + boot verify | ~5% |
+| SIL | Software-in-the-Loop — emulated hardware (QEMU, device sims) | ~60% |
+| Config Validation | Device config parsing, constraint checks, valid ranges | ~25% |
+| Unit | Pure logic — parsers, formatters, math | ~10% |
+
+## Test Commands
+
+- All tests: `[your command, e.g., make test]`
+- Flash + verify: `[your flash command]`
+- Config validation: `[your config check command]`
+
+## Mocking Rules
+
+| Dependency | Mock? | Why |
+|------------|-------|-----|
+| Hardware interfaces (/dev/tty*, GPIO) | YES | Real hardware not always available |
+| Config parsers | NEVER | Config bugs brick devices |
+| Filesystem (/sys/, /proc/) | YES in CI | Real paths only exist on target |
+| Serial protocols | YES | Use loopback or emulator |
+
+## Device Matrix
+
+| Device | Config File | Status |
+|--------|------------|--------|
+| [device-a] | configs/device-a.cfg | [tested/untested] |
+
+## Lessons Learned
+
+<!-- Add firmware testing gotchas as you discover them -->
+```
+
+**Data Science (if detected):**
+```markdown
+# Testing Guidelines
+
+## Testing Layers (Data Science)
+
+Pipeline integration tests are best bang for buck. Model evaluation catches degradation.
+
+| Layer | What It Tests | % of Suite |
+|-------|--------------|------------|
+| Model Evaluation | Accuracy/precision/recall/F1 on holdout sets | ~10% |
+| Pipeline Integration | End-to-end pipeline runs with test datasets | ~60% |
+| Data Validation | Schema checks, distribution drift, missing values | ~20% |
+| Unit | Pure transformations, feature engineering | ~10% |
+
+## Test Commands
+
+- All tests: `[your command, e.g., pytest]`
+- Model evaluation: `[your eval command]`
+- Data validation: `[your validation command]`
+
+## Mocking Rules
+
+| Dependency | Mock? | Why |
+|------------|-------|-----|
+| External data sources (APIs, S3) | YES | Real calls = flaky + expensive |
+| Data transformations | NEVER | Transform bugs corrupt pipelines |
+| Model training | PARTIAL | Use small test datasets for speed |
+| Database/warehouse | YES in unit | Use test fixtures for integration |
+
+## Test Datasets
+
+Location: `[tests/data/ or tests/fixtures/]`
+- Keep test datasets small but representative
+- Include edge cases: missing values, wrong types, outliers
+
+## Lessons Learned
+
+<!-- Add data science testing gotchas as you discover them -->
+```
+
+**CLI Tool (if detected):**
+```markdown
+# Testing Guidelines
+
+## Testing Layers (CLI)
+
+CLI integration tests are best bang for buck. Test real invocations with real arguments.
+
+| Layer | What It Tests | % of Suite |
+|-------|--------------|------------|
+| CLI Integration | Full invocations with real args, real filesystem | ~80% |
+| Behavior | Exit codes, stdout/stderr content, file creation | ~10% |
+| Unit | Arg parsing, formatters, pure logic | ~10% |
+
+## Test Commands
+
+- All tests: `[your command]`
+- Specific test: `[your command]`
+
+## Mocking Rules
+
+| Dependency | Mock? | Why |
+|------------|-------|-----|
+| Filesystem | NEVER | CLI tools live on the filesystem |
+| Network calls | YES | Real calls = flaky |
+| Stdin/stdout | CAPTURE | Use child_process or subprocess |
+| Environment vars | SET per test | Determinism |
+
+## Behavior Contract
+
+| Input | Expected Exit Code | Expected Output |
+|-------|-------------------|----------------|
+| `--help` | 0 | Usage text |
+| (no args) | 1 | Error message |
+| `--version` | 0 | Version string |
+
+## Lessons Learned
+
+<!-- Add CLI testing gotchas as you discover them -->
+```
+
 ---
 
 **DESIGN_SYSTEM.md (if UI detected):**
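The CLI template's Behavior Contract table maps inputs to expected exit codes and output. A minimal sketch of checking such a contract from shell (the `mycli` stub and `check_contract` helper are hypothetical stand-ins for a real binary, added here only to illustrate the idea):

```shell
#!/bin/sh
# Stub CLI standing in for a real binary under test; behavior mirrors the
# Behavior Contract table (--help, --version, error on no args).
mycli() {
  case "$1" in
    --help)    echo "Usage: mycli [options]"; return 0 ;;
    --version) echo "1.28.0"; return 0 ;;
    *)         echo "error: missing argument" >&2; return 1 ;;
  esac
}

# check_contract EXPECTED_EXIT EXPECTED_SUBSTRING COMMAND [ARGS...]
# Runs the command, capturing combined output, and verifies both the exit
# code and that the output contains the expected substring.
check_contract() {
  want_code="$1"; want_out="$2"; shift 2
  out=$("$@" 2>&1); code=$?
  [ "$code" -eq "$want_code" ] && case "$out" in *"$want_out"*) return 0 ;; esac
  return 1
}

check_contract 0 "Usage"  mycli --help
check_contract 0 "1.28.0" mycli --version
check_contract 1 "error"  mycli
```

Each contract row becomes one `check_contract` call, so the table and the test file stay in one-to-one correspondence.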
@@ -3414,7 +3659,7 @@ Walk through updates? (y/n)
 Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
 
 ```markdown
-<!-- SDLC Wizard Version: 1.
+<!-- SDLC Wizard Version: 1.28.0 -->
 <!-- Setup Date: 2026-01-24 -->
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: PRs -->
package/README.md
CHANGED
@@ -2,7 +2,7 @@
 
 A **self-evolving Software Development Life Cycle (SDLC) enforcement system for AI coding agents**. Makes Claude plan before coding, test before shipping, and ask when uncertain. Measures itself getting better over time.
 
-**Built on 15+ years of
+**Built on 15+ years of software engineering and founding engineering experience** — battle-tested patterns from real production systems, baked into an AI agent that follows tried-and-true software quality practices so you don't have to enforce them manually.
 
 ## Install
 
@@ -229,7 +229,7 @@ This isn't the only Claude Code SDLC tool. Here's an honest comparison:
 | Document | What It Covers |
 |----------|---------------|
 | [ARCHITECTURE.md](ARCHITECTURE.md) | System design, 5-layer diagram, data flows, file structure |
-| [CI_CD.md](CI_CD.md) | All
+| [CI_CD.md](CI_CD.md) | All 6 workflows, E2E scoring, tier system, SDP, integrity checks |
 | [SDLC.md](SDLC.md) | Version tracking, enforcement rules, SDLC configuration |
 | [TESTING.md](TESTING.md) | Testing philosophy, test diamond, TDD approach |
 | [CHANGELOG.md](CHANGELOG.md) | Version history, what changed and when |
package/package.json
CHANGED
package/skills/setup/SKILL.md
CHANGED
@@ -42,6 +42,11 @@ Scan the project root for:
 - Scripts in package.json (lint, test, build, typecheck, etc.)
 - Database config files (prisma/, drizzle.config.*, knexfile.*, .env with DB_*)
 - Cache config (redis.conf, .env with REDIS_*)
+- Domain indicators (for domain-adaptive TESTING.md):
+  - Firmware/Embedded: Makefile with flash/burn targets, .cfg device configs, /sys/ or /dev/tty references, .c/.h source, platformio.ini
+  - Data Science: .ipynb notebooks, requirements.txt with pandas/sklearn/tensorflow/torch, data/ or datasets/ dir, models/ dir
+  - CLI Tool: package.json with "bin" field (no React/Vue/Angular), bin/ dir, src/cli.*, no src/components/
+  - Web/API: default — everything else (web frameworks, src/components/, Playwright/Cypress config)
 
 ### Step 2: Build Confidence Map
 
@@ -69,6 +74,7 @@ For each configuration data point, assign a confidence level based on scan results:
 | Testing | Test types | What test files exist (*.test.*, *.spec.*, e2e/, integration/) |
 | Coverage | Coverage config | nyc, c8, coverage.py config, CI coverage steps |
 | CI | CI shepherd opt-in | Only if CI detected — ALWAYS ASK |
+| Domain | Project domain | Auto-detect from domain indicators above (firmware/data-science/CLI/web). Web/API is the default fallback. One domain per project — dominant signal wins |
 
 **Each data point has one of three states:**
 - **RESOLVED (detected):** Found concrete evidence — config file, script, directory exists. No question needed, just confirm.
@@ -100,7 +106,7 @@ Using detected + confirmed values, generate `CLAUDE.md` with:
 - Architecture summary (from scan)
 - Special notes (infra, deployment)
 
-Reference: See "Step
+Reference: See "Step 8" in `CLAUDE_CODE_SDLC_WIZARD.md` for the full template.
 
 ### Step 5: Generate SDLC.md
 
@@ -118,19 +124,23 @@ Include metadata comments:
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 ```
 
-Reference: See "Step
+Reference: See "Step 9" in `CLAUDE_CODE_SDLC_WIZARD.md` for the full template.
 
-### Step 6: Generate TESTING.md
+### Step 6: Generate TESTING.md (Domain-Adaptive)
 
-Generate `TESTING.md`
-- Testing Diamond
--
--
--
-- Coverage config (from detected config or user input)
-- Framework-specific patterns
+Generate `TESTING.md` using the domain-specific template matching the detected project domain:
+- **Web/API (default)**: Standard Testing Diamond (E2E/Integration/Unit)
+- **Firmware/Embedded**: HIL/SIL/Config Validation/Unit layers
+- **Data Science**: Model Evaluation/Pipeline Integration/Data Validation/Unit layers
+- **CLI Tool**: CLI Integration/Behavior/Unit layers
 
-
+Each domain template includes:
+- Domain-appropriate testing layer visualization and percentages
+- Domain-specific mocking rules (what to mock, what NEVER to mock)
+- Test commands and fixture locations
+- Domain-specific sections (Device Matrix for firmware, Test Datasets for data science, Behavior Contract for CLI)
+
+Reference: See "Step 9" in `CLAUDE_CODE_SDLC_WIZARD.md` for the full domain-conditional templates.
 
 ### Step 7: Generate ARCHITECTURE.md
 
package/skills/update/SKILL.md
CHANGED
@@ -46,9 +46,11 @@ Parse all CHANGELOG entries between the user's installed version and the latest.
 
 ```
 Installed: 1.24.0
-Latest: 1.
+Latest: 1.28.0
 
 What changed:
+- [1.28.0] Autocompact benchmarking methodology, canary fact mechanism, benchmark harness, ...
+- [1.27.0] Domain-adaptive testing diamond, 3 domain fixtures, 25 quality tests, ...
 - [1.26.0] Codex SDLC Adapter plan, claw-code/OmO/OmX research, CC feature discovery verified, ...
 - [1.25.0] Plugin format, 6 distribution channels (curl, Homebrew, gh, GitHub Releases), ...
 - [1.24.0] Hook if conditionals, autocompact tuning + 1M/200K guidance, tdd_red fix, ...
```