agentic-sdlc-wizard 1.25.0 → 1.27.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/marketplace.json +1 -1
- package/.claude-plugin/plugin.json +1 -1
- package/CHANGELOG.md +27 -0
- package/CLAUDE_CODE_SDLC_WIZARD.md +252 -11
- package/package.json +1 -1
- package/skills/setup/SKILL.md +21 -11
- package/skills/update/SKILL.md +4 -3
package/CHANGELOG.md
CHANGED
@@ -4,6 +4,33 @@ All notable changes to the SDLC Wizard.
 
 > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
 
+## [1.27.0] - 2026-04-05
+
+### Added
+- Domain-adaptive testing diamond — setup wizard auto-detects project domain (firmware/data-science/CLI/web) and generates domain-specific TESTING.md with appropriate testing layers (#79, PR #157)
+  - Firmware/Embedded: HIL/SIL/Config Validation/Unit (no browser, no DB)
+  - Data Science: Model Evaluation/Pipeline Integration/Data Validation/Unit
+  - CLI Tool: CLI Integration/Behavior/Unit (no browser)
+  - Web/API: unchanged default (E2E/Integration/Unit)
+- Domain detection patterns in wizard doc scan tree and setup skill Steps 1/2/6
+- 3 new test fixtures: firmware-embedded, data-science, cli-tool (partially satisfies #78)
+- 25 domain detection quality tests
+
+### Fixed
+- Setup skill cross-references: Steps 4/5 now correctly reference wizard doc Steps 8/9 (caught by CI PR review)
+
+## [1.26.0] - 2026-04-05
+
+### Added
+- Codex SDLC Adapter plan — certified (9/10) through 5-round cross-model review. `BaseInfinity/codex-sdlc-wizard` repo created with plan + README. Upstream sync architecture designed (weekly GH Action monitors sdlc-wizard releases). Hooks: PreToolUse `^Bash$` for git commit/push blocking (HARD — stronger than CC), AGENTS.md for TDD guidance (SOFT), UserPromptSubmit for SDLC baseline. ~70% CC parity (#91)
+- Research: claw-code, OmO, OmX pattern analysis — 16 candidate patterns identified from 3 open-source AI agent projects (claw-code 168K stars, OmO 48K, OmX 16K). Key findings: GreenContract graduated test levels, $ralph bounded persistence loop, planning gate enforcement, planner/executor separation. All candidates require Prove It Gate before adoption. Codex certified 8/10 round 3 (#58)
+- Automated CC Feature Discovery verified working — weekly-update.yml already implements this via analyze-release.md (#85)
+
+### Changed
+- Roadmap: #79 (Domain-Adaptive Testing) and #92 (Autocompact Benchmarking) queued for next release
+- Research doc: `RESEARCH_58_CLAW_OMO_OMX.md` added as reference (candidates list, not commitments)
+- Codex adapter plan: `CODEX_ADAPTER_PLAN.md` added with full specs (hooks, scripts, tests, install flow)
+
 ## [1.25.0] - 2026-04-04
 
 ### Added
package/CLAUDE_CODE_SDLC_WIZARD.md
CHANGED

@@ -519,6 +519,82 @@ Here's the "Testing Diamond" approach (recommended for AI agents):
 - **Unit**: Pure logic only — no DB, no API, no filesystem. ~5% of suite.
 - **The rule**: If your test doesn't open a browser or render a UI, it's not E2E — it's integration. Mislabeling leads to overinvestment in slow browser tests.
 
+#### Domain-Adaptive Testing Layers
+
+The Testing Diamond above is the Web/API default. Other project domains have fundamentally different testing layers. The setup wizard auto-detects your domain and generates the appropriate TESTING.md.
+
+**Domain Detection Patterns:**
+
+| Domain | File/Dir Indicators |
+|--------|-------------------|
+| **Firmware/Embedded** | Makefile with `flash`/`burn` targets, `.cfg` device configs, `/sys/` or `/dev/tty` references, `.c`/`.h` source, `platformio.ini`, `CMakeLists.txt` with embedded targets |
+| **Data Science** | `.ipynb` notebooks, `requirements.txt` with pandas/sklearn/tensorflow/torch, `data/` or `datasets/` dir, `models/` dir, Jupyter config |
+| **CLI Tool** | `package.json` with `"bin"` field (no React/Vue/Angular), `bin/` dir, `src/cli.*`, no `src/components/` |
+| **Web/API (default)** | Everything else — web frameworks, `src/components/`, Playwright/Cypress config, DB config. Fallback when no other domain matches |
+
+**Firmware/Embedded Testing Layers:**
+
+```
+    /\        ← Few HIL (Hardware-in-the-Loop: real device, flash + boot verify)
+   /  \
+  /    \
+ /------\
+ |      |   ← MANY SIL (Software-in-the-Loop: emulated hardware, QEMU, device sims)
+ |      |
+ \------/
+  \    /    ← Config Validation (device config parsing, constraint checks)
+   \  /
+    \/      ← Few Unit (parsers, formatters, math)
+```
+
+- **HIL (~5%)**: Hardware-in-the-Loop — flash to real device, verify boot, test hardware interfaces
+- **SIL (~60%)**: Software-in-the-Loop — emulated hardware via QEMU or device simulators. Best bang for buck
+- **Config Validation (~25%)**: Device config (.cfg) parsing, cross-device constraint checks, valid value ranges
+- **Unit (~10%)**: Pure logic only — parsers, formatters, math functions
+- **Mocking**: Mock hardware interfaces (`/dev/tty*`, GPIO), NEVER mock config parsers
+- NO browser tests, NO database mocking
+
+**Data Science Testing Layers:**
+
+```
+    /\        ← Few Model Evaluation (accuracy/precision/recall on holdout sets)
+   /  \
+  /    \
+ /------\
+ |      |   ← MANY Pipeline Integration (end-to-end with test datasets)
+ |      |
+ \------/
+  \    /    ← Data Validation (schema checks, distribution drift, missing values)
+   \  /
+    \/      ← Few Unit (pure transformations, feature engineering)
+```
+
+- **Model Evaluation (~10%)**: Accuracy, precision, recall, F1 on holdout test sets. Catches model degradation
+- **Pipeline Integration (~60%)**: End-to-end pipeline runs with test datasets. Best bang for buck
+- **Data Validation (~20%)**: Schema checks, distribution drift detection, missing value handling, type enforcement
+- **Unit (~10%)**: Pure transformations, feature engineering functions, data cleaning logic
+- **Mocking**: Mock external data sources (APIs, S3), NEVER mock data transformations
+- NO browser tests, NO traditional API endpoint testing
+
+**CLI Tool Testing Layers:**
+
+```
+ /------\
+ |      |   ← MANY CLI Integration (full invocations, real args, real filesystem)
+ |      |
+ |      |
+ \------/
+  \    /    ← Behavior (exit codes, stdout/stderr content, file creation)
+   \  /
+    \/      ← Few Unit (arg parsing, formatters, pure logic)
+```
+
+- **CLI Integration (~80%)**: Full CLI invocations with real arguments and real filesystem. Best bang for buck
+- **Behavior (~10%)**: Exit codes, stdout/stderr output validation, file creation/modification verification
+- **Unit (~10%)**: Argument parsing, output formatters, pure logic
+- **Mocking**: Mock network calls, NEVER mock filesystem operations
+- NO browser tests, usually NO database
+
 **But your team decides:**
 
 | Question | Your Choice |
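The Config Validation layer described above can be exercised entirely without hardware. As a rough illustration (this is not code from the package; the `uart` section, `baud` key, and valid-baud set are invented for the example), such a test parses a real `.cfg` string with Python's `configparser` and asserts value constraints, mocking nothing:

```python
import configparser

# Hypothetical constraint for illustration: UART baud rate must be standard.
VALID_BAUD = {9600, 19200, 38400, 57600, 115200}

def validate_device_config(text: str) -> list:
    """Parse device-config text and return a list of constraint violations."""
    parser = configparser.ConfigParser()
    parser.read_string(text)  # the real parser runs — never mocked
    errors = []
    baud = parser.getint("uart", "baud", fallback=115200)
    if baud not in VALID_BAUD:
        errors.append(f"uart.baud out of range: {baud}")
    return errors

# A valid config passes; a non-standard baud rate is flagged.
good_errors = validate_device_config("[uart]\nbaud = 115200\n")
bad_errors = validate_device_config("[uart]\nbaud = 12345\n")
```

Because the parser itself is exercised, a bug in config handling fails the test instead of being hidden behind a mock, which is the point of the "NEVER mock config parsers" rule.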
@@ -1235,12 +1311,34 @@ Claude scans for:
 │   ├── docker-compose.yml → Bash(docker *)
 │   └── .github/workflows/ → Bash(gh *)
 │
-
-
-
-
-
-
+├── Design system (for UI projects):
+│   ├── tailwind.config.* → Extract colors, fonts, spacing from theme
+│   ├── CSS with --var-name → Extract custom property palette
+│   ├── .storybook/ → Reference as design source of truth
+│   ├── MUI/Chakra theme files → Reference theming docs + overrides
+│   └── /assets/, /images/ → Document asset locations
+│
+└── Project domain (for domain-adaptive TESTING.md):
+    ├── Firmware/Embedded:
+    │   ├── Makefile with flash/burn targets
+    │   ├── .cfg device config files
+    │   ├── /sys/ or /dev/tty references in scripts
+    │   ├── .c/.h source files without web frameworks
+    │   ├── platformio.ini, CMakeLists.txt
+    │   └── No package.json with web frameworks
+    ├── Data Science:
+    │   ├── .ipynb notebook files
+    │   ├── requirements.txt with pandas/sklearn/tensorflow/torch
+    │   ├── data/ or datasets/ directory
+    │   ├── models/ directory
+    │   └── No Express/FastAPI/Rails web framework
+    ├── CLI Tool:
+    │   ├── package.json with "bin" field (no React/Vue/Angular deps)
+    │   ├── bin/ directory with executable scripts
+    │   ├── src/cli.* entry point
+    │   └── No src/components/, no browser test config
+    └── Web/API (default):
+        └── Everything else — fallback when no other domain matches
 ```
 
 **If Claude can't detect something, it asks.** Never assumes.
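The detection tree above amounts to a priority check: more specific domains win, Web/API is the fallback. A minimal sketch (not the wizard's actual logic, and checking only a few of the listed indicators) could look like:

```python
from pathlib import Path
import tempfile

def detect_domain(root: Path) -> str:
    """One domain per project; specific signals win over the web/api default."""
    if (root / "platformio.ini").is_file() or any(root.glob("*.cfg")):
        return "firmware-embedded"
    if any(root.glob("*.ipynb")) or (root / "datasets").is_dir():
        return "data-science"
    pkg = root / "package.json"
    if pkg.is_file() and '"bin"' in pkg.read_text():
        return "cli-tool"
    return "web-api"  # fallback when no other domain matches

# Demo against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    default_domain = detect_domain(root)       # nothing detected yet
    (root / "platformio.ini").write_text("")
    firmware_domain = detect_domain(root)      # firmware indicator present
```

The ordering matters: a repo with both `platformio.ini` and a `package.json` should resolve to firmware, which is why the check runs top-down rather than scoring all domains independently.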
@@ -1553,6 +1651,7 @@ Each resolved data point (whether detected or confirmed by the user) maps to generated content:
 | Infrastructure (DB, cache) | `CLAUDE.md` - Architecture section, `TESTING.md` - mock decisions |
 | Test duration | `SDLC skill` - wait time note |
 | Test types (E2E) | `TESTING.md` - testing diamond top |
+| Project domain (firmware/data-science/CLI/web) | `TESTING.md` - domain-adaptive testing layers and mocking rules |
 
 ---
 
@@ -2497,7 +2596,7 @@ If deployment fails or post-deploy verification catches issues:
 
 **SDLC.md:**
 ```markdown
-<!-- SDLC Wizard Version: 1.
+<!-- SDLC Wizard Version: 1.27.0 -->
 <!-- Setup Date: [DATE] -->
 <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: [PRs or Solo] -->
@@ -2525,26 +2624,168 @@ See `.claude/skills/sdlc/SKILL.md` for the enforced checklist.
 - Survives file edits
 - Travels with the repo
 
-**TESTING.md:**
+**TESTING.md (domain-adaptive — generate the template matching the detected domain):**
+
+**Web/API (default):**
 ```markdown
 # Testing Guidelines
 
-
+## Testing Diamond
+
+Integration tests are best bang for buck. Mocks can "pass" while production fails.
+
+| Layer | What It Tests | % of Suite |
+|-------|--------------|------------|
+| E2E | Full user flow through browser (Playwright, Cypress) | ~5% |
+| Integration | Real DB, real cache, API-level — no UI | ~90% |
+| Unit | Pure logic — no DB, no API, no filesystem | ~5% |
 
 ## Test Commands
 
 - All tests: `[your command]`
 - Specific test: `[your command]`
 
+## Mocking Rules
+
+| Dependency | Mock? | Why |
+|------------|-------|-----|
+| Database | NEVER | Use test DB or in-memory |
+| Cache | NEVER | Use isolated test instance |
+| External APIs | YES | Real calls = flaky + expensive |
+| Time/Date | YES | Determinism |
+
 ## Fixtures
 
-Location: `[
+Location: `[tests/fixtures/ or test-data/]`
 
 ## Lessons Learned
 
 <!-- Add testing gotchas as you discover them -->
 ```
 
+**Firmware/Embedded (if detected):**
+```markdown
+# Testing Guidelines
+
+## Testing Layers (Firmware)
+
+SIL tests are best bang for buck. Real hardware tests are slow but prove the real thing works.
+
+| Layer | What It Tests | % of Suite |
+|-------|--------------|------------|
+| HIL | Hardware-in-the-Loop — real device, flash + boot verify | ~5% |
+| SIL | Software-in-the-Loop — emulated hardware (QEMU, device sims) | ~60% |
+| Config Validation | Device config parsing, constraint checks, valid ranges | ~25% |
+| Unit | Pure logic — parsers, formatters, math | ~10% |
+
+## Test Commands
+
+- All tests: `[your command, e.g., make test]`
+- Flash + verify: `[your flash command]`
+- Config validation: `[your config check command]`
+
+## Mocking Rules
+
+| Dependency | Mock? | Why |
+|------------|-------|-----|
+| Hardware interfaces (/dev/tty*, GPIO) | YES | Real hardware not always available |
+| Config parsers | NEVER | Config bugs brick devices |
+| Filesystem (/sys/, /proc/) | YES in CI | Real paths only exist on target |
+| Serial protocols | YES | Use loopback or emulator |
+
+## Device Matrix
+
+| Device | Config File | Status |
+|--------|------------|--------|
+| [device-a] | configs/device-a.cfg | [tested/untested] |
+
+## Lessons Learned
+
+<!-- Add firmware testing gotchas as you discover them -->
+```
+
+**Data Science (if detected):**
+```markdown
+# Testing Guidelines
+
+## Testing Layers (Data Science)
+
+Pipeline integration tests are best bang for buck. Model evaluation catches degradation.
+
+| Layer | What It Tests | % of Suite |
+|-------|--------------|------------|
+| Model Evaluation | Accuracy/precision/recall/F1 on holdout sets | ~10% |
+| Pipeline Integration | End-to-end pipeline runs with test datasets | ~60% |
+| Data Validation | Schema checks, distribution drift, missing values | ~20% |
+| Unit | Pure transformations, feature engineering | ~10% |
+
+## Test Commands
+
+- All tests: `[your command, e.g., pytest]`
+- Model evaluation: `[your eval command]`
+- Data validation: `[your validation command]`
+
+## Mocking Rules
+
+| Dependency | Mock? | Why |
+|------------|-------|-----|
+| External data sources (APIs, S3) | YES | Real calls = flaky + expensive |
+| Data transformations | NEVER | Transform bugs corrupt pipelines |
+| Model training | PARTIAL | Use small test datasets for speed |
+| Database/warehouse | YES in unit | Use test fixtures for integration |
+
+## Test Datasets
+
+Location: `[tests/data/ or tests/fixtures/]`
+- Keep test datasets small but representative
+- Include edge cases: missing values, wrong types, outliers
+
+## Lessons Learned
+
+<!-- Add data science testing gotchas as you discover them -->
+```
+
+**CLI Tool (if detected):**
+```markdown
+# Testing Guidelines
+
+## Testing Layers (CLI)
+
+CLI integration tests are best bang for buck. Test real invocations with real arguments.
+
+| Layer | What It Tests | % of Suite |
+|-------|--------------|------------|
+| CLI Integration | Full invocations with real args, real filesystem | ~80% |
+| Behavior | Exit codes, stdout/stderr content, file creation | ~10% |
+| Unit | Arg parsing, formatters, pure logic | ~10% |
+
+## Test Commands
+
+- All tests: `[your command]`
+- Specific test: `[your command]`
+
+## Mocking Rules
+
+| Dependency | Mock? | Why |
+|------------|-------|-----|
+| Filesystem | NEVER | CLI tools live on the filesystem |
+| Network calls | YES | Real calls = flaky |
+| Stdin/stdout | CAPTURE | Use child_process or subprocess |
+| Environment vars | SET per test | Determinism |
+
+## Behavior Contract
+
+| Input | Expected Exit Code | Expected Output |
+|-------|-------------------|----------------|
+| `--help` | 0 | Usage text |
+| (no args) | 1 | Error message |
+| `--version` | 0 | Version string |
+
+## Lessons Learned
+
+<!-- Add CLI testing gotchas as you discover them -->
+```
+
 ---
 
 **DESIGN_SYSTEM.md (if UI detected):**
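A Behavior Contract like the one in the CLI template above translates directly into integration tests: spawn the real binary and assert on exit code and output. A hedged sketch, using the Python interpreter's own `--version` flag as a stand-in for a hypothetical tool's binary:

```python
import subprocess
import sys

def run_cli(*args: str) -> subprocess.CompletedProcess:
    # Stand-in binary: the Python interpreter plays the role of "your CLI".
    # In a real suite this would be the path to your tool's executable.
    return subprocess.run([sys.executable, *args], capture_output=True, text=True)

version = run_cli("--version")
exit_code = version.returncode   # contract row: `--version` exits 0
output = version.stdout          # contract row: prints a version string
```

Nothing is mocked here: the process, its arguments, and its streams are all real, which is what puts this test in the ~80% CLI Integration band rather than the Unit band.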
@@ -3414,7 +3655,7 @@ Walk through updates? (y/n)
 Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
 
 ```markdown
-<!-- SDLC Wizard Version: 1.
+<!-- SDLC Wizard Version: 1.27.0 -->
 <!-- Setup Date: 2026-01-24 -->
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: PRs -->
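Because the state lives in plain HTML comments, it stays invisible when the markdown renders but is trivial to read back. An illustrative one-liner (not the wizard's actual implementation):

```python
import re

# Sample metadata block in the format shown above.
SDLC_MD = """<!-- SDLC Wizard Version: 1.27.0 -->
<!-- Setup Date: 2026-01-24 -->
<!-- Git Workflow: PRs -->
"""

def read_metadata(text: str) -> dict:
    """Collect `<!-- Key: Value -->` comments into a dict."""
    return dict(re.findall(r"<!-- ([^:]+): (.+?) -->", text))

meta = read_metadata(SDLC_MD)
```

Comment-based state survives file edits because editors and renderers leave HTML comments alone, and it travels with the repo since it is committed like any other line.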
package/package.json
CHANGED
package/skills/setup/SKILL.md
CHANGED
@@ -42,6 +42,11 @@ Scan the project root for:
 - Scripts in package.json (lint, test, build, typecheck, etc.)
 - Database config files (prisma/, drizzle.config.*, knexfile.*, .env with DB_*)
 - Cache config (redis.conf, .env with REDIS_*)
+- Domain indicators (for domain-adaptive TESTING.md):
+  - Firmware/Embedded: Makefile with flash/burn targets, .cfg device configs, /sys/ or /dev/tty references, .c/.h source, platformio.ini
+  - Data Science: .ipynb notebooks, requirements.txt with pandas/sklearn/tensorflow/torch, data/ or datasets/ dir, models/ dir
+  - CLI Tool: package.json with "bin" field (no React/Vue/Angular), bin/ dir, src/cli.*, no src/components/
+  - Web/API: default — everything else (web frameworks, src/components/, Playwright/Cypress config)
 
 ### Step 2: Build Confidence Map
 
@@ -69,6 +74,7 @@ For each configuration data point, assign a confidence level based on scan results:
 | Testing | Test types | What test files exist (*.test.*, *.spec.*, e2e/, integration/) |
 | Coverage | Coverage config | nyc, c8, coverage.py config, CI coverage steps |
 | CI | CI shepherd opt-in | Only if CI detected — ALWAYS ASK |
+| Domain | Project domain | Auto-detect from domain indicators above (firmware/data-science/CLI/web). Web/API is the default fallback. One domain per project — dominant signal wins |
 
 **Each data point has one of three states:**
 - **RESOLVED (detected):** Found concrete evidence — config file, script, directory exists. No question needed, just confirm.
@@ -100,7 +106,7 @@ Using detected + confirmed values, generate `CLAUDE.md` with:
 - Architecture summary (from scan)
 - Special notes (infra, deployment)
 
-Reference: See "Step
+Reference: See "Step 8" in `CLAUDE_CODE_SDLC_WIZARD.md` for the full template.
 
 ### Step 5: Generate SDLC.md
 
@@ -118,19 +124,23 @@ Include metadata comments:
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 ```
 
-Reference: See "Step
+Reference: See "Step 9" in `CLAUDE_CODE_SDLC_WIZARD.md` for the full template.
 
-### Step 6: Generate TESTING.md
+### Step 6: Generate TESTING.md (Domain-Adaptive)
 
-Generate `TESTING.md`
-- Testing Diamond
--
--
--
-- Coverage config (from detected config or user input)
-- Framework-specific patterns
+Generate `TESTING.md` using the domain-specific template matching the detected project domain:
+- **Web/API (default)**: Standard Testing Diamond (E2E/Integration/Unit)
+- **Firmware/Embedded**: HIL/SIL/Config Validation/Unit layers
+- **Data Science**: Model Evaluation/Pipeline Integration/Data Validation/Unit layers
+- **CLI Tool**: CLI Integration/Behavior/Unit layers
 
-
+Each domain template includes:
+- Domain-appropriate testing layer visualization and percentages
+- Domain-specific mocking rules (what to mock, what NEVER to mock)
+- Test commands and fixture locations
+- Domain-specific sections (Device Matrix for firmware, Test Datasets for data science, Behavior Contract for CLI)
+
+Reference: See "Step 9" in `CLAUDE_CODE_SDLC_WIZARD.md` for the full domain-conditional templates.
 
 ### Step 7: Generate ARCHITECTURE.md
 
package/skills/update/SKILL.md
CHANGED
@@ -45,13 +45,14 @@ Extract the latest version from the first `## [X.X.X]` line.
 Parse all CHANGELOG entries between the user's installed version and the latest. Present a clear summary:
 
 ```
-Installed: 1.
-Latest: 1.
+Installed: 1.24.0
+Latest: 1.27.0
 
 What changed:
+- [1.27.0] Domain-adaptive testing diamond, 3 domain fixtures, 25 quality tests, ...
+- [1.26.0] Codex SDLC Adapter plan, claw-code/OmO/OmX research, CC feature discovery verified, ...
 - [1.25.0] Plugin format, 6 distribution channels (curl, Homebrew, gh, GitHub Releases), ...
 - [1.24.0] Hook if conditionals, autocompact tuning + 1M/200K guidance, tdd_red fix, ...
-- [1.23.0] Update notification hook, cross-model review standardization, ...
 ```
 
 **If versions match:** Say "You're up to date! (version X.X.X)" and stop.