agentic-sdlc-wizard 1.26.0 → 1.28.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -13,7 +13,7 @@
  "name": "sdlc-wizard",
  "source": ".",
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
- "version": "1.26.0",
+ "version": "1.28.0",
  "author": {
  "name": "Stefan Ayala"
  },
@@ -1,6 +1,6 @@
  {
  "name": "sdlc-wizard",
- "version": "1.26.0",
+ "version": "1.28.0",
  "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
  "author": {
  "name": "Stefan Ayala",
package/CHANGELOG.md CHANGED
@@ -4,6 +4,38 @@ All notable changes to the SDLC Wizard.

  > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.

+ ## [1.28.0] - 2026-04-06
+
+ ### Added
+ - Autocompact benchmarking methodology — first rigorous framework for testing `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` thresholds (#92, PR #158)
+   - `AUTOCOMPACT_BENCHMARK.md`: experimental design, canary fact mechanism, cost estimation, limitations
+   - `tests/benchmarks/run-benchmark.sh`: parameterized harness with `--dry-run`, threshold validation, multi-turn session via `--resume`
+   - `tests/benchmarks/analyze-results.sh`: statistical comparison tables using `stats.sh`
+   - 3 task files (short/medium/long) with canary fact injection for context preservation measurement
+   - `canary-facts.json`: 5 domain-independent facts for binary recall scoring with negation detection
+   - `.github/workflows/benchmark-autocompact.yml`: `workflow_dispatch` with matrix strategy across thresholds
+   - 26 quality tests proving methodology rigor, harness behavior, and research standards
+
+ ### Changed
+ - README bio: reflects full-stack founding engineer background (not just SDET/QA)
+ - Wizard doc autocompact section references benchmarking methodology
+ - Workflow count updated (5→6) across README and CI
+
+ ## [1.27.0] - 2026-04-05
+
+ ### Added
+ - Domain-adaptive testing diamond — setup wizard auto-detects project domain (firmware/data-science/CLI/web) and generates domain-specific TESTING.md with appropriate testing layers (#79, PR #157)
+   - Firmware/Embedded: HIL/SIL/Config Validation/Unit (no browser, no DB)
+   - Data Science: Model Evaluation/Pipeline Integration/Data Validation/Unit
+   - CLI Tool: CLI Integration/Behavior/Unit (no browser)
+   - Web/API: unchanged default (E2E/Integration/Unit)
+ - Domain detection patterns in wizard doc scan tree and setup skill Step 1/2/6
+ - 3 new test fixtures: firmware-embedded, data-science, cli-tool (partially satisfies #78)
+ - 25 domain detection quality tests
+
+ ### Fixed
+ - Setup skill cross-references: Step 4/5 now correctly reference wizard doc Steps 8/9 (caught by CI PR review)
+
  ## [1.26.0] - 2026-04-05

  ### Added
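The 1.28.0 entry above mentions "binary recall scoring with negation detection" for canary facts. As a rough illustration only: the function name, the grep-based negation heuristic, and the fact wording below are all assumptions, not the package's actual scorer (which ships with the benchmark harness).

```shell
# Hypothetical sketch of binary canary-fact recall scoring with a naive
# negation check. Not the package's code: score_canary, the heuristic, and
# the example fact are assumptions for illustration.
score_canary() {
  transcript="$1"; fact="$2"
  if ! grep -qi "$fact" "$transcript"; then
    echo 0    # fact absent: lost after compaction
  elif grep -i "$fact" "$transcript" \
      | grep -qiE "(^|[[:space:]])(not|no|never)[[:space:]].*$fact"; then
    echo 0    # fact mentioned but negated: scores as a miss
  else
    echo 1    # binary recall: fact preserved
  fi
}
```

Usage would be one call per canary fact against the post-compaction transcript, summing the 0/1 results into a recall score.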
@@ -519,6 +519,82 @@ Here's the "Testing Diamond" approach (recommended for AI agents):
  - **Unit**: Pure logic only — no DB, no API, no filesystem. ~5% of suite.
  - **The rule**: If your test doesn't open a browser or render a UI, it's not E2E — it's integration. Mislabeling leads to overinvestment in slow browser tests.

+ #### Domain-Adaptive Testing Layers
+
+ The Testing Diamond above is the Web/API default. Other project domains have fundamentally different testing layers. The setup wizard auto-detects your domain and generates the appropriate TESTING.md.
+
+ **Domain Detection Patterns:**
+
+ | Domain | File/Dir Indicators |
+ |--------|-------------------|
+ | **Firmware/Embedded** | Makefile with `flash`/`burn` targets, `.cfg` device configs, `/sys/` or `/dev/tty` references, `.c`/`.h` source, `platformio.ini`, `CMakeLists.txt` with embedded targets |
+ | **Data Science** | `.ipynb` notebooks, `requirements.txt` with pandas/sklearn/tensorflow/torch, `data/` or `datasets/` dir, `models/` dir, Jupyter config |
+ | **CLI Tool** | `package.json` with `"bin"` field (no React/Vue/Angular), `bin/` dir, `src/cli.*`, no `src/components/` |
+ | **Web/API (default)** | Everything else — web frameworks, `src/components/`, Playwright/Cypress config, DB config. Fallback when no other domain matches |
+
+ **Firmware/Embedded Testing Layers:**
+
+ ```
+     /\       ← Few HIL (Hardware-in-the-Loop: real device, flash + boot verify)
+    /  \
+   /    \
+  /------\
+  |      |   ← MANY SIL (Software-in-the-Loop: emulated hardware, QEMU, device sims)
+  |      |
+  \------/
+   \    /    ← Config Validation (device config parsing, constraint checks)
+    \  /
+     \/      ← Few Unit (parsers, formatters, math)
+ ```
+
+ - **HIL (~5%)**: Hardware-in-the-Loop — flash to real device, verify boot, test hardware interfaces
+ - **SIL (~60%)**: Software-in-the-Loop — emulated hardware via QEMU or device simulators. Best bang for buck
+ - **Config Validation (~25%)**: Device config (.cfg) parsing, cross-device constraint checks, valid value ranges
+ - **Unit (~10%)**: Pure logic only — parsers, formatters, math functions
+ - **Mocking**: Mock hardware interfaces (`/dev/tty*`, GPIO), NEVER mock config parsers
+ - NO browser tests, NO database mocking
+
+ **Data Science Testing Layers:**
+
+ ```
+     /\       ← Few Model Evaluation (accuracy/precision/recall on holdout sets)
+    /  \
+   /    \
+  /------\
+  |      |   ← MANY Pipeline Integration (end-to-end with test datasets)
+  |      |
+  \------/
+   \    /    ← Data Validation (schema checks, distribution drift, missing values)
+    \  /
+     \/      ← Few Unit (pure transformations, feature engineering)
+ ```
+
+ - **Model Evaluation (~10%)**: Accuracy, precision, recall, F1 on holdout test sets. Catches model degradation
+ - **Pipeline Integration (~60%)**: End-to-end pipeline runs with test datasets. Best bang for buck
+ - **Data Validation (~20%)**: Schema checks, distribution drift detection, missing value handling, type enforcement
+ - **Unit (~10%)**: Pure transformations, feature engineering functions, data cleaning logic
+ - **Mocking**: Mock external data sources (APIs, S3), NEVER mock data transformations
+ - NO browser tests, NO traditional API endpoint testing
+
+ **CLI Tool Testing Layers:**
+
+ ```
+  /------\
+  |      |   ← MANY CLI Integration (full invocations, real args, real filesystem)
+  |      |
+  |      |
+  \------/
+   \    /    ← Behavior (exit codes, stdout/stderr content, file creation)
+    \  /
+     \/      ← Few Unit (arg parsing, formatters, pure logic)
+ ```
+
+ - **CLI Integration (~80%)**: Full CLI invocations with real arguments and real filesystem. Best bang for buck
+ - **Behavior (~10%)**: Exit codes, stdout/stderr output validation, file creation/modification verification
+ - **Unit (~10%)**: Argument parsing, output formatters, pure logic
+ - **Mocking**: Mock network calls, NEVER mock filesystem operations
+ - NO browser tests, usually NO database
+
  **But your team decides:**

  | Question | Your Choice |
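The detection table above implies a precedence check over filesystem indicators. A minimal sketch follows; the function name, exact globs, and ordering are assumptions, and the wizard's real scanner covers more signals (CMakeLists targets, Jupyter config, DB config, browser test config):

```shell
# Hypothetical sketch of the domain-detection precedence implied by the
# indicator table. detect_domain and its checks are assumptions, not the
# wizard's actual scan logic.
detect_domain() {
  d="$1"
  if grep -qE '^(flash|burn):' "$d/Makefile" 2>/dev/null || [ -f "$d/platformio.ini" ]; then
    echo firmware-embedded
  elif ls "$d"/*.ipynb >/dev/null 2>&1 \
    || grep -qE 'pandas|sklearn|tensorflow|torch' "$d/requirements.txt" 2>/dev/null; then
    echo data-science
  elif grep -q '"bin"' "$d/package.json" 2>/dev/null \
    && ! grep -qE '"(react|vue|@angular)' "$d/package.json" 2>/dev/null; then
    echo cli-tool
  else
    echo web-api    # default fallback when no other domain matches
  fi
}
```

The fallback-to-Web/API ordering matters: a Node CLI repo also "looks like" a web project, so the more specific domains are checked first.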
@@ -724,7 +800,7 @@ Two tools for managing context — use the right one:

  ### Autocompact Tuning

- Override the default auto-compact threshold with environment variables. These are community-discovered settings referenced in upstream issues ([#34332](https://github.com/anthropics/claude-code/issues/34332), [#42375](https://github.com/anthropics/claude-code/issues/42375)) — not yet officially documented by Anthropic:
+ Override the default auto-compact threshold with environment variables. These are community-discovered settings referenced in upstream issues ([#34332](https://github.com/anthropics/claude-code/issues/34332), [#42375](https://github.com/anthropics/claude-code/issues/42375)) — not yet officially documented by Anthropic. For a rigorous benchmarking methodology to validate these thresholds, see [AUTOCOMPACT_BENCHMARK.md](AUTOCOMPACT_BENCHMARK.md).

  | Variable | What It Does | Default |
  |----------|-------------|---------|
@@ -752,6 +828,10 @@ export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75

  **Note:** These env vars may change as Claude Code evolves. Check [Claude Code settings docs](https://docs.anthropic.com/en/docs/claude-code/settings) for the latest supported configuration.

+ ### Benchmarking Methodology
+
+ The thresholds above are community consensus — not empirically validated. For rigorous benchmarking of autocompact thresholds (measuring task quality, context preservation, and cost at each setting), see [AUTOCOMPACT_BENCHMARK.md](AUTOCOMPACT_BENCHMARK.md). It provides a controlled experimental methodology with a novel "canary fact" mechanism for measuring context preservation post-compaction.
+
  ### 1M vs 200K Context Window

  Claude Code supports both 200K and 1M context windows. Choose based on your task:
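Since the autocompact override discussed in the hunk above is a percentage, it is worth sanity-checking a candidate value before exporting it. This is a minimal sketch: the helper name is an assumption, and whether Claude Code itself rejects out-of-range values is not documented.

```shell
# Minimal sketch: validate a candidate threshold before exporting it.
# valid_pct is a hypothetical helper, not part of the wizard or Claude Code;
# 1-100 is simply the natural bound for a percentage override.
valid_pct() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;            # empty or non-numeric
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 100 ]    # plausible percentage range
}

valid_pct 75 && export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75
```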
@@ -1235,12 +1315,34 @@ Claude scans for:
  │   ├── docker-compose.yml → Bash(docker *)
  │   └── .github/workflows/ → Bash(gh *)

- └── Design system (for UI projects):
-     ├── tailwind.config.* → Extract colors, fonts, spacing from theme
-     ├── CSS with --var-name → Extract custom property palette
-     ├── .storybook/ → Reference as design source of truth
-     ├── MUI/Chakra theme files → Reference theming docs + overrides
-     └── /assets/, /images/ → Document asset locations
+ ├── Design system (for UI projects):
+ │   ├── tailwind.config.* → Extract colors, fonts, spacing from theme
+ │   ├── CSS with --var-name → Extract custom property palette
+ │   ├── .storybook/ → Reference as design source of truth
+ │   ├── MUI/Chakra theme files → Reference theming docs + overrides
+ │   └── /assets/, /images/ → Document asset locations
+
+ └── Project domain (for domain-adaptive TESTING.md):
+     ├── Firmware/Embedded:
+     │   ├── Makefile with flash/burn targets
+     │   ├── .cfg device config files
+     │   ├── /sys/ or /dev/tty references in scripts
+     │   ├── .c/.h source files without web frameworks
+     │   ├── platformio.ini, CMakeLists.txt
+     │   └── No package.json with web frameworks
+     ├── Data Science:
+     │   ├── .ipynb notebook files
+     │   ├── requirements.txt with pandas/sklearn/tensorflow/torch
+     │   ├── data/ or datasets/ directory
+     │   ├── models/ directory
+     │   └── No Express/FastAPI/Rails web framework
+     ├── CLI Tool:
+     │   ├── package.json with "bin" field (no React/Vue/Angular deps)
+     │   ├── bin/ directory with executable scripts
+     │   ├── src/cli.* entry point
+     │   └── No src/components/, no browser test config
+     └── Web/API (default):
+         └── Everything else — fallback when no other domain matches
  ```

  **If Claude can't detect something, it asks.** Never assumes.
@@ -1553,6 +1655,7 @@ Each resolved data point (whether detected or confirmed by the user) maps to gen
  | Infrastructure (DB, cache) | `CLAUDE.md` - Architecture section, `TESTING.md` - mock decisions |
  | Test duration | `SDLC skill` - wait time note |
  | Test types (E2E) | `TESTING.md` - testing diamond top |
+ | Project domain (firmware/data-science/CLI/web) | `TESTING.md` - domain-adaptive testing layers and mocking rules |

  ---

@@ -2497,7 +2600,7 @@ If deployment fails or post-deploy verification catches issues:

  **SDLC.md:**
  ```markdown
- <!-- SDLC Wizard Version: 1.26.0 -->
+ <!-- SDLC Wizard Version: 1.28.0 -->
  <!-- Setup Date: [DATE] -->
  <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
  <!-- Git Workflow: [PRs or Solo] -->
@@ -2525,26 +2628,168 @@ See `.claude/skills/sdlc/SKILL.md` for the enforced checklist.
  - Survives file edits
  - Travels with the repo

- **TESTING.md:**
+ **TESTING.md (domain-adaptive — generate the template matching the detected domain):**
+
+ **Web/API (default):**
  ```markdown
  # Testing Guidelines

- See `TESTING.md` for TDD philosophy.
+ ## Testing Diamond
+
+ Integration tests are best bang for buck. Mocks can "pass" while production fails.
+
+ | Layer | What It Tests | % of Suite |
+ |-------|--------------|------------|
+ | E2E | Full user flow through browser (Playwright, Cypress) | ~5% |
+ | Integration | Real DB, real cache, API-level — no UI | ~90% |
+ | Unit | Pure logic — no DB, no API, no filesystem | ~5% |

  ## Test Commands

  - All tests: `[your command]`
  - Specific test: `[your command]`

+ ## Mocking Rules
+
+ | Dependency | Mock? | Why |
+ |------------|-------|-----|
+ | Database | NEVER | Use test DB or in-memory |
+ | Cache | NEVER | Use isolated test instance |
+ | External APIs | YES | Real calls = flaky + expensive |
+ | Time/Date | YES | Determinism |
+
  ## Fixtures

- Location: `[Claude will discover or ask - e.g., tests/fixtures/, test-data/]`
+ Location: `[tests/fixtures/ or test-data/]`

  ## Lessons Learned

  <!-- Add testing gotchas as you discover them -->
  ```

+ **Firmware/Embedded (if detected):**
+ ```markdown
+ # Testing Guidelines
+
+ ## Testing Layers (Firmware)
+
+ SIL tests are best bang for buck. Real hardware tests are slow but prove the real thing works.
+
+ | Layer | What It Tests | % of Suite |
+ |-------|--------------|------------|
+ | HIL | Hardware-in-the-Loop — real device, flash + boot verify | ~5% |
+ | SIL | Software-in-the-Loop — emulated hardware (QEMU, device sims) | ~60% |
+ | Config Validation | Device config parsing, constraint checks, valid ranges | ~25% |
+ | Unit | Pure logic — parsers, formatters, math | ~10% |
+
+ ## Test Commands
+
+ - All tests: `[your command, e.g., make test]`
+ - Flash + verify: `[your flash command]`
+ - Config validation: `[your config check command]`
+
+ ## Mocking Rules
+
+ | Dependency | Mock? | Why |
+ |------------|-------|-----|
+ | Hardware interfaces (/dev/tty*, GPIO) | YES | Real hardware not always available |
+ | Config parsers | NEVER | Config bugs brick devices |
+ | Filesystem (/sys/, /proc/) | YES in CI | Real paths only exist on target |
+ | Serial protocols | YES | Use loopback or emulator |
+
+ ## Device Matrix
+
+ | Device | Config File | Status |
+ |--------|------------|--------|
+ | [device-a] | configs/device-a.cfg | [tested/untested] |
+
+ ## Lessons Learned
+
+ <!-- Add firmware testing gotchas as you discover them -->
+ ```
+
+ **Data Science (if detected):**
+ ```markdown
+ # Testing Guidelines
+
+ ## Testing Layers (Data Science)
+
+ Pipeline integration tests are best bang for buck. Model evaluation catches degradation.
+
+ | Layer | What It Tests | % of Suite |
+ |-------|--------------|------------|
+ | Model Evaluation | Accuracy/precision/recall/F1 on holdout sets | ~10% |
+ | Pipeline Integration | End-to-end pipeline runs with test datasets | ~60% |
+ | Data Validation | Schema checks, distribution drift, missing values | ~20% |
+ | Unit | Pure transformations, feature engineering | ~10% |
+
+ ## Test Commands
+
+ - All tests: `[your command, e.g., pytest]`
+ - Model evaluation: `[your eval command]`
+ - Data validation: `[your validation command]`
+
+ ## Mocking Rules
+
+ | Dependency | Mock? | Why |
+ |------------|-------|-----|
+ | External data sources (APIs, S3) | YES | Real calls = flaky + expensive |
+ | Data transformations | NEVER | Transform bugs corrupt pipelines |
+ | Model training | PARTIAL | Use small test datasets for speed |
+ | Database/warehouse | YES in unit | Use test fixtures for integration |
+
+ ## Test Datasets
+
+ Location: `[tests/data/ or tests/fixtures/]`
+ - Keep test datasets small but representative
+ - Include edge cases: missing values, wrong types, outliers
+
+ ## Lessons Learned
+
+ <!-- Add data science testing gotchas as you discover them -->
+ ```
+
+ **CLI Tool (if detected):**
+ ```markdown
+ # Testing Guidelines
+
+ ## Testing Layers (CLI)
+
+ CLI integration tests are best bang for buck. Test real invocations with real arguments.
+
+ | Layer | What It Tests | % of Suite |
+ |-------|--------------|------------|
+ | CLI Integration | Full invocations with real args, real filesystem | ~80% |
+ | Behavior | Exit codes, stdout/stderr content, file creation | ~10% |
+ | Unit | Arg parsing, formatters, pure logic | ~10% |
+
+ ## Test Commands
+
+ - All tests: `[your command]`
+ - Specific test: `[your command]`
+
+ ## Mocking Rules
+
+ | Dependency | Mock? | Why |
+ |------------|-------|-----|
+ | Filesystem | NEVER | CLI tools live on the filesystem |
+ | Network calls | YES | Real calls = flaky |
+ | Stdin/stdout | CAPTURE | Use child_process or subprocess |
+ | Environment vars | SET per test | Determinism |
+
+ ## Behavior Contract
+
+ | Input | Expected Exit Code | Expected Output |
+ |-------|-------------------|----------------|
+ | `--help` | 0 | Usage text |
+ | (no args) | 1 | Error message |
+ | `--version` | 0 | Version string |
+
+ ## Lessons Learned
+
+ <!-- Add CLI testing gotchas as you discover them -->
+ ```
+
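The Behavior Contract table in the CLI template above can be exercised with a tiny helper. A sketch under stated assumptions: `assert_exit` is a hypothetical helper, and `mycli` is a placeholder binary name, neither ships with the package.

```shell
# Hypothetical helper for checking a CLI behavior contract: run a command,
# compare its exit status to the contracted value. assert_exit and the
# example CLI name are assumptions, not package code.
assert_exit() {
  expected="$1"; shift
  "$@" >/dev/null 2>&1
  [ "$?" -eq "$expected" ]
}

# e.g. assert_exit 0 mycli --help      # usage text, exit 0
#      assert_exit 1 mycli             # no args: error, exit 1
#      assert_exit 0 mycli --version   # version string, exit 0
```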
  ---

  **DESIGN_SYSTEM.md (if UI detected):**
@@ -3414,7 +3659,7 @@ Walk through updates? (y/n)

  Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):

  ```markdown
- <!-- SDLC Wizard Version: 1.26.0 -->
+ <!-- SDLC Wizard Version: 1.28.0 -->
  <!-- Setup Date: 2026-01-24 -->
  <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
  <!-- Git Workflow: PRs -->
package/README.md CHANGED
@@ -2,7 +2,7 @@

  A **self-evolving Software Development Life Cycle (SDLC) enforcement system for AI coding agents**. Makes Claude plan before coding, test before shipping, and ask when uncertain. Measures itself getting better over time.

- **Built on 15+ years of SDET and QA engineering experience** — battle-tested patterns from real production systems, baked into an AI agent that follows tried-and-true software quality practices so you don't have to enforce them manually.
+ **Built on 15+ years of software engineering and founding engineering experience** — battle-tested patterns from real production systems, baked into an AI agent that follows tried-and-true software quality practices so you don't have to enforce them manually.

  ## Install

@@ -229,7 +229,7 @@ This isn't the only Claude Code SDLC tool. Here's an honest comparison:
  | Document | What It Covers |
  |----------|---------------|
  | [ARCHITECTURE.md](ARCHITECTURE.md) | System design, 5-layer diagram, data flows, file structure |
- | [CI_CD.md](CI_CD.md) | All 5 workflows, E2E scoring, tier system, SDP, integrity checks |
+ | [CI_CD.md](CI_CD.md) | All 6 workflows, E2E scoring, tier system, SDP, integrity checks |
  | [SDLC.md](SDLC.md) | Version tracking, enforcement rules, SDLC configuration |
  | [TESTING.md](TESTING.md) | Testing philosophy, test diamond, TDD approach |
  | [CHANGELOG.md](CHANGELOG.md) | Version history, what changed and when |
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "agentic-sdlc-wizard",
- "version": "1.26.0",
+ "version": "1.28.0",
  "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
  "bin": {
  "sdlc-wizard": "./cli/bin/sdlc-wizard.js"
@@ -42,6 +42,11 @@ Scan the project root for:
  - Scripts in package.json (lint, test, build, typecheck, etc.)
  - Database config files (prisma/, drizzle.config.*, knexfile.*, .env with DB_*)
  - Cache config (redis.conf, .env with REDIS_*)
+ - Domain indicators (for domain-adaptive TESTING.md):
+   - Firmware/Embedded: Makefile with flash/burn targets, .cfg device configs, /sys/ or /dev/tty references, .c/.h source, platformio.ini
+   - Data Science: .ipynb notebooks, requirements.txt with pandas/sklearn/tensorflow/torch, data/ or datasets/ dir, models/ dir
+   - CLI Tool: package.json with "bin" field (no React/Vue/Angular), bin/ dir, src/cli.*, no src/components/
+   - Web/API: default — everything else (web frameworks, src/components/, Playwright/Cypress config)

  ### Step 2: Build Confidence Map

@@ -69,6 +74,7 @@ For each configuration data point, assign a confidence level based on scan resul
  | Testing | Test types | What test files exist (*.test.*, *.spec.*, e2e/, integration/) |
  | Coverage | Coverage config | nyc, c8, coverage.py config, CI coverage steps |
  | CI | CI shepherd opt-in | Only if CI detected — ALWAYS ASK |
+ | Domain | Project domain | Auto-detect from domain indicators above (firmware/data-science/CLI/web). Web/API is the default fallback. One domain per project — dominant signal wins |

  **Each data point has one of three states:**
  - **RESOLVED (detected):** Found concrete evidence — config file, script, directory exists. No question needed, just confirm.
@@ -100,7 +106,7 @@ Using detected + confirmed values, generate `CLAUDE.md` with:
  - Architecture summary (from scan)
  - Special notes (infra, deployment)

- Reference: See "Step 3" in `CLAUDE_CODE_SDLC_WIZARD.md` for the full template.
+ Reference: See "Step 8" in `CLAUDE_CODE_SDLC_WIZARD.md` for the full template.

  ### Step 5: Generate SDLC.md

@@ -118,19 +124,23 @@ Include metadata comments:
  <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
  ```

- Reference: See "Step 4" in `CLAUDE_CODE_SDLC_WIZARD.md` for the full template.
+ Reference: See "Step 9" in `CLAUDE_CODE_SDLC_WIZARD.md` for the full template.

- ### Step 6: Generate TESTING.md
+ ### Step 6: Generate TESTING.md (Domain-Adaptive)

- Generate `TESTING.md` based on detected/confirmed testing data:
- - Testing Diamond visualization
- - Test types and their purposes
- - Mocking rules (from detected patterns or user input)
- - Test file organization (from detected structure)
- - Coverage config (from detected config or user input)
- - Framework-specific patterns
+ Generate `TESTING.md` using the domain-specific template matching the detected project domain:
+ - **Web/API (default)**: Standard Testing Diamond (E2E/Integration/Unit)
+ - **Firmware/Embedded**: HIL/SIL/Config Validation/Unit layers
+ - **Data Science**: Model Evaluation/Pipeline Integration/Data Validation/Unit layers
+ - **CLI Tool**: CLI Integration/Behavior/Unit layers

- Reference: See "Step 5" in `CLAUDE_CODE_SDLC_WIZARD.md` for the full template.
+ Each domain template includes:
+ - Domain-appropriate testing layer visualization and percentages
+ - Domain-specific mocking rules (what to mock, what NEVER to mock)
+ - Test commands and fixture locations
+ - Domain-specific sections (Device Matrix for firmware, Test Datasets for data science, Behavior Contract for CLI)
+
+ Reference: See "Step 9" in `CLAUDE_CODE_SDLC_WIZARD.md` for the full domain-conditional templates.

  ### Step 7: Generate ARCHITECTURE.md
 
@@ -46,9 +46,11 @@ Parse all CHANGELOG entries between the user's installed version and the latest.

  ```
  Installed: 1.24.0
- Latest: 1.26.0
+ Latest: 1.28.0

  What changed:
+ - [1.28.0] Autocompact benchmarking methodology, canary fact mechanism, benchmark harness, ...
+ - [1.27.0] Domain-adaptive testing diamond, 3 domain fixtures, 25 quality tests, ...
  - [1.26.0] Codex SDLC Adapter plan, claw-code/OmO/OmX research, CC feature discovery verified, ...
  - [1.25.0] Plugin format, 6 distribution channels (curl, Homebrew, gh, GitHub Releases), ...
  - [1.24.0] Hook if conditionals, autocompact tuning + 1M/200K guidance, tdd_red fix, ...
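Collecting the "What changed" entries above requires deciding which changelog versions are newer than the installed one. A sketch of such a comparison using `sort -V` (a GNU coreutils/busybox extension); the function name is an assumption and the wizard's actual comparison logic may differ:

```shell
# Hypothetical version comparison for picking changelog entries newer than
# the installed version. Relies on sort -V (GNU coreutils / busybox);
# version_gt is an assumed helper name, not the wizard's implementation.
version_gt() {
  # True when $1 > $2: they differ, and $2 sorts first in version order.
  [ "$1" != "$2" ] \
    && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}
```

With this, each entry version would be kept when `version_gt "$entry" "$installed"` holds and the entry is not newer than the latest release.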