agentic-sdlc-wizard 1.31.0 → 1.33.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/marketplace.json +1 -1
- package/.claude-plugin/plugin.json +1 -1
- package/CHANGELOG.md +53 -0
- package/CLAUDE_CODE_SDLC_WIZARD.md +61 -27
- package/README.md +4 -0
- package/cli/bin/sdlc-wizard.js +3 -1
- package/cli/init.js +44 -1
- package/cli/templates/settings.json +12 -1
- package/hooks/hooks.json +10 -0
- package/hooks/instructions-loaded-check.sh +32 -0
- package/hooks/model-effort-check.sh +43 -0
- package/package.json +1 -1
- package/skills/sdlc/SKILL.md +15 -0
- package/skills/setup/SKILL.md +6 -5
- package/skills/update/SKILL.md +3 -1
package/CHANGELOG.md
CHANGED
@@ -4,6 +4,59 @@ All notable changes to the SDLC Wizard.

 > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.

+## [1.33.0] - 2026-04-17
+
+### Added
+- `opus[1m]` as the SDLC wizard default model (#182)
+  - CLI template ships `"model": "opus[1m]"` + `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` (tuned for 1M — compacts at ~300K)
+  - `cli/init.js` `mergeSettings` merges top-level `model` on fresh installs and when absent; respects user's explicit choice; `--force` overwrites
+  - Wizard doc "1M vs 200K Context Window" section flipped to recommend `opus[1m]` as default; pricing framed as "verify current rates at docs.anthropic.com" (no stale tier-specific claims)
+  - `/sdlc` skill: new "Recommended Model" section between auto-approval and Confidence Check
+  - `/setup` skill Step 9.5: 1M default, 200K fallback (inverted from before)
+  - SDLC.md baseline bumped to v2.1.111+ (Opus 4.7 minimum)
+  - Session-start hooks now recommend `opus[1m]` alias (matches the `/model` command users run)
+  - 9 new tests (5 CLI model merge, 4 doc consistency); 6 existing autocompact tests updated to expect `30`, fixtures bumped to `50` in tests 37/38 to preserve the no-overwrite proof
+  - Codex xhigh 2-round review: 9/10 CERTIFIED
+- Dual-channel install drift guardrails (#181)
+  - `cli/init.js` detects plugin install paths (`~/.claude/plugins-local/sdlc-wizard-wrap/`, `~/.claude/plugins/cache/sdlc-wizard-local/`) and blocks init with a typed `err.pluginPaths` error; `--force` bypasses
+  - `instructions-loaded-check.sh` non-blocking nudge when both CLI skills and Claude plugin are present in the same project
+  - HOME isolation in test files (`mktemp -d` + `trap` cleanup) prevents dev-machine HOME from leaking into assertions
+  - `path.isAbsolute(home)` guard in `detectPluginInstall` — empty/relative HOME no longer causes false-positive block
+  - `run_init_split` test helper captures stdout/stderr separately with explicit exit code
+  - 9 new CLI tests, 5 new hook tests; Codex xhigh 4-round review: 9/10 CERTIFIED
+- Model/effort upgrade detection at session start (#179, #180)
+  - SessionStart hook nudges when configured `effortLevel` is below `xhigh` recommendation
+  - Reads `.claude/settings.local.json` → `.claude/settings.json` → `$HOME/.claude/settings.json` precedence
+  - Non-blocking (`exit 0`); asks Claude to compare recommended model against its own system prompt
+  - `claude-opus-4-6` defaults bumped to `claude-opus-4-7` in `pr-review.yml`, `evaluate.sh`, `sdp-score.sh`, `pairwise-compare.sh`
+  - Hook added to `SDLC.md` hooks table + CLI distributes `model-effort-check.sh`
+
+### Fixed
+- `cli/bin/sdlc-wizard.js` double-print: plugin-detect errors now suppress the outer `"Error:"` prefix since detection streams its own colored guidance block (#181)
+
+## [1.32.0] - 2026-04-16
+
+### Added
+- Opus 4.7 support in benchmark workflow (#178)
+  - `claude-opus-4-7` added to model choices, `effort` input (high/xhigh/max)
+  - `--effort` passed via `claude_args`, effort recorded in artifacts + summaries
+  - Hard-fail when xhigh used with non-4.7 models (inputs resolved before shell)
+  - Artifact names include effort level to prevent collision
+  - Default: opus-4-7 + xhigh (matches CC's new default)
+  - 3 new tests (39 total model-comparison tests)
+- `xhigh` effort level documented in wizard (#178)
+  - New effort table: high → xhigh (recommended for coding) → max
+  - Opus 4.7 changes: stricter effort adherence, budget_tokens deprecated, 64k+ max_tokens guidance
+- Benchmark ceiling effect audit documented in wizard
+  - Cross-model audit (Codex GPT-5.4, xhigh) rated benchmark 2/10 NOT CERTIFIED
+  - 4 P0 findings: fake trials, answer key leaked, no independent verification, binary rubric
+  - 3 concrete fixes documented (remove coaching, add correctness scoring, real trials)
+  - External benchmark comparison (SWE-Bench, Aider methodology)
+- Automation Station community Discord link in README
+
+### Fixed
+- Orphaned `skills/gdlc/` causing test-doc-consistency failures (deleted)
+
 ## [1.31.0] - 2026-04-14

 ### Added
package/CLAUDE_CODE_SDLC_WIZARD.md
CHANGED
@@ -99,6 +99,27 @@ This prevents both false positives (crying wolf) and false negatives (missing re
 - Green CI = safe to upgrade. Red = stay on current version until fixed
 - Results shown in PR with statistical confidence

+### Benchmark Ceiling Effect (Known Issue — April 2026)
+
+**Our E2E benchmark currently has zero discriminating power.** Both Opus 4.6 and 4.7 scored perfect 10/10 on the `add-feature` scenario (3 trials each, `high` effort). A cross-model audit (Codex GPT-5.4, xhigh reasoning) rated the benchmark methodology **2/10, NOT CERTIFIED** and identified 4 P0 critical issues:
+
+| Finding | Severity | Problem |
+|---------|----------|---------|
+| **Fake trials** | P0 | The workflow runs the simulation ONCE, then re-scores the same output N times. "Trials" measure judge jitter, not model variance |
+| **Answer key leaked** | P0 | The simulation prompt tells the model exactly what's scored ("You MUST use TodoWrite... scored by automated checks"). This tests obedience to rubric, not SDLC judgment |
+| **No independent verification** | P0 | "Tests pass" is self-reported from the transcript. The evaluator never re-runs `npm test` on the final code |
+| **Binary rubric** | P0 | Every criterion is YES/NO. The evaluator is explicitly designed for "near-zero variance." On an easy coached task, scores collapse to 10/10 |
+
+**Three concrete fixes to break the ceiling:**
+
+1. **Remove rubric leakage** — Don't tell the model what's scored in the simulation prompt. Let the wizard hooks and docs drive behavior naturally. Score hidden behaviors from traces, not coached compliance
+2. **Make correctness the majority of the score** — After simulation, run an external verifier: re-run `npm test` on the modified fixture, add hidden tests the model didn't know about, inspect the actual diff. Replace transcript-only `clean_code` with diff-based quality checks
+3. **Real trials on calibrated scenarios** — Each trial must be a fresh end-to-end simulation run on a fresh checkout. Select scenarios by pilot difficulty so top models don't all saturate (similar to Aider's hard-subset methodology). The current single-coached-toy-run approach is measuring nothing
+
+**What external benchmarks do differently:** SWE-Bench gives a real issue plus a full repo snapshot, applies the agent's patch, and runs the repo's actual tests to score `% resolved`. Aider's polyglot benchmark was explicitly rebuilt because the old one saturated — it uses 225 harder tasks chosen to preserve headroom. Our benchmark lacks real task difficulty calibration, independent execution-based correctness, multi-task breadth, and headroom management.
+
+**Status:** This is tracked as item #96 (E2E score audit) on the roadmap. Until fixed, the benchmark measures process compliance coaching, not model quality differentiation.
+
 ---

 ## Philosophy: Sensible Defaults, Smart Customization
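Fixes 2 and 3 in the benchmark section above amount to a change in harness shape. The following is a minimal sketch under stated assumptions, not the wizard's workflow: `runSimulation` and `runHiddenTests` are hypothetical stand-ins for a fresh end-to-end run and an external verifier, and the 0.6 correctness weight is an arbitrary choice (any weight above 0.5 makes correctness the majority of the score).

```javascript
// Hypothetical benchmark harness shape (not the wizard's actual workflow).
// Assumption: any CORRECTNESS_WEIGHT above 0.5 makes correctness the majority.
const CORRECTNESS_WEIGHT = 0.6;

// Score one trial on a 0-10 scale: process checks plus externally verified correctness.
function scoreTrial({ processPoints, processMax, hiddenPassed, hiddenTotal }) {
  const processRatio = processMax > 0 ? processPoints / processMax : 0;
  const correctness = hiddenTotal > 0 ? hiddenPassed / hiddenTotal : 0;
  return 10 * (CORRECTNESS_WEIGHT * correctness + (1 - CORRECTNESS_WEIGHT) * processRatio);
}

// Each trial calls runSimulation again, instead of re-scoring one output N times.
function runTrials(n, runSimulation, runHiddenTests) {
  const scores = [];
  for (let trial = 0; trial < n; trial++) {
    const run = runSimulation(trial);    // hypothetical: fresh checkout + full simulation
    const verdict = runHiddenTests(run); // hypothetical: harness re-runs tests on the final code
    scores.push(scoreTrial({ ...run.process, ...verdict }));
  }
  return scores;
}

// Usage with stubs: perfect process compliance, but only 1 of 4 hidden tests passes.
const stubSimulation = () => ({ process: { processPoints: 5, processMax: 5 } });
const stubHiddenTests = () => ({ hiddenPassed: 1, hiddenTotal: 4 });
console.log(runTrials(3, stubSimulation, stubHiddenTests).map((s) => s.toFixed(1)));
```

With these stubs every trial scores about 5.5 out of 10 despite perfect process compliance, which is the kind of headroom a coached, binary, transcript-only rubric cannot show.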
@@ -221,12 +242,20 @@ Claude Code's **effort level** controls how much thinking the model does before

 | Level | When to Use | How to Set |
 |-------|-------------|------------|
-| `high` |
-| `
+| `high` | Standard SDLC work. Features, bug fixes, refactoring, tests, reviews | `effort: high` in skill frontmatter (already set) |
+| `xhigh` | **Recommended default for coding and agentic work (Opus 4.7+).** Long-running tasks, repeated tool calls, deep exploration. Claude Code defaults to this on Opus 4.7 | `/effort xhigh` or set in skill frontmatter |
+| `max` | LOW confidence, FAILED 2x, architecture decisions, complex debugging, cross-model reviews. Reserve for genuinely frontier problems — on most workloads `max` adds cost for small quality gains | `/effort max` (session only — resets next session) |

-**
+**Effort level changes in Opus 4.7 (April 2026):**
+- **`xhigh` is new** — sits between `high` and `max`, designed for coding and agentic work (30+ minute tasks with token budgets in the millions)
+- **Claude Code now defaults to `xhigh`** on Opus 4.7 for all plans
+- **Opus 4.7 respects effort levels more strictly** than 4.6 — at lower levels it scopes work tighter instead of going above and beyond. If you see shallow reasoning, raise effort rather than prompting around it
+- **`budget_tokens` is deprecated** on Opus 4.7 — use adaptive thinking with effort instead
+- When running at `xhigh` or `max`, set a large `max_tokens` (64k+) so the model has room to think across subagents and tool calls

-
+**Why `high` was the previous default:** Claude Code uses **adaptive thinking** to dynamically allocate reasoning budget per turn. On Pro and Max plans, the default effort level was **medium (85)**, which causes the model to under-allocate reasoning on complex multi-step tasks — leading to shallow analysis, missed edge cases, and "lazy" outputs. This was [confirmed by Anthropic engineer Boris Cherny](https://github.com/anthropics/claude-code/issues/42796) and is documented at [code.claude.com](https://code.claude.com/docs/en/model-config). API, Team, and Enterprise plans default to high effort and are not affected.
+
+The `/sdlc` skill sets `effort: high` in its frontmatter, overriding the medium default on every SDLC invocation. Consider upgrading to `effort: xhigh` on Opus 4.7+ for deeper reasoning on complex tasks.

 **Nuclear option — disable adaptive thinking entirely:** Set `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` in your environment or settings.json `env` block. This forces a fixed reasoning budget per turn instead of letting the model dynamically allocate. Use this if you observe persistent quality issues even with `effort: high`. See [Claude Code model config docs](https://code.claude.com/docs/en/model-config) for details.

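The effort table above implies an ordering, `high` → `xhigh` → `max`. The session-start hooks added in this release nudge whenever the configured level differs from `xhigh` (they compare with `!=`), so a user on `max` would also be told to "upgrade" to `xhigh`. A minimal sketch of an ordered comparison that only nudges when the level is strictly below the recommendation; the rank list is an assumption (`low` and `medium` are placed below `high`), and unknown values never trigger a nudge.

```javascript
// Assumed effort ordering, lowest to highest. `high`, `xhigh`, and `max` come from
// the effort table; `low` and `medium` are assumed to rank below `high`.
const EFFORT_RANK = ['low', 'medium', 'high', 'xhigh', 'max'];
const RECOMMENDED_EFFORT = 'xhigh';

// True only when `effort` is a known level strictly below the recommendation.
function shouldNudge(effort, recommended = RECOMMENDED_EFFORT) {
  const have = EFFORT_RANK.indexOf(effort);
  const want = EFFORT_RANK.indexOf(recommended);
  return have !== -1 && want !== -1 && have < want;
}

for (const level of ['medium', 'high', 'xhigh', 'max', 'turbo']) {
  // A plain `level !== 'xhigh'` check would also flag `max` and `turbo`.
  console.log(level, shouldNudge(level));
}
```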
@@ -826,30 +855,31 @@ Override the default auto-compact threshold with environment variables. These ar
 | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` | Trigger compaction at this % of context capacity (1-100) | ~95% |
 | `CLAUDE_CODE_AUTO_COMPACT_WINDOW` | Override context capacity in tokens (useful for 1M models) | Model default |

-**Recommended:** The SDLC Wizard CLI sets `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=
+**Recommended:** The SDLC Wizard CLI sets `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` and `"model": "opus[1m]"` in `.claude/settings.json` by default (tuned for the 1M context window — compacts at ~300K). To customize, edit `.claude/settings.json`:

 ```json
 {
+  "model": "opus[1m]",
   "env": {
-    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "
+    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "30"
   }
 }
 ```

-Alternatively, set via shell profile (`~/.bashrc`, `~/.zshrc`) or per-project `.envrc`:
+If you switch back to the 200K model (`opus`), raise the override to `75` — otherwise 30% of 200K = 60K compacts too early. Alternatively, set via shell profile (`~/.bashrc`, `~/.zshrc`) or per-project `.envrc`:

 ```bash
-export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=
+export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30
 ```

 **Community-recommended thresholds by use case:**

 | Use Case | AUTOCOMPACT % | Why |
 |----------|--------------|-----|
-
-
+| **SDLC default (`opus[1m]`)** | **30%** | **Fires at ~300K on 1M — right balance for plan + TDD + review sessions** |
+| General development (200K `opus`) | 75% | Leaves room for implementation after planning |
+| Complex refactors (200K `opus`) | 80% | Slightly more context before compaction |
 | CI pipelines | 60% | Short tasks, compact early to stay fast |
-| 1M context model | 30% | See "1M vs 200K" below — 95% on 1M wastes budget |
 | Short tasks | 60-70% | Less context needed, compact early |

 **Important:** Values above the default ~95% threshold have no effect — you can only trigger compaction *earlier*, not later. Noise (progress ticks, thinking blocks, stale reads) makes up 50-70% of session tokens, so threshold tuning matters less than noise reduction (scoped reads, subagents, `/compact` between phases).
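The percentages in the autocompact tables convert to token counts as `window × percent / 100`, where the window is the model's capacity or `CLAUDE_CODE_AUTO_COMPACT_WINDOW` if set. A small sketch of that arithmetic, assuming the ~95% default acts as a ceiling (per the note that higher values have no effect). It only checks the override math; it does not model Claude Code's observed default behaviour on the 1M window, and `compactTriggerTokens` is a hypothetical name.

```javascript
// Approximate default threshold from the env-var table; treated here as a ceiling
// because overrides can only make compaction fire earlier, not later.
const DEFAULT_PCT = 95;

// Token count at which compaction would fire under the simple window * pct model.
function compactTriggerTokens(windowTokens, overridePct = DEFAULT_PCT) {
  if (!Number.isFinite(windowTokens) || windowTokens <= 0) {
    throw new RangeError('windowTokens must be a positive number');
  }
  if (!Number.isFinite(overridePct) || overridePct < 1 || overridePct > 100) {
    throw new RangeError('overridePct must be between 1 and 100');
  }
  const effectivePct = Math.min(overridePct, DEFAULT_PCT);
  return Math.round((windowTokens * effectivePct) / 100);
}

console.log(compactTriggerTokens(1_000_000, 30)); // 300000 (opus[1m] + 30%)
console.log(compactTriggerTokens(200_000, 30));   // 60000  (opus + 30%: too early)
console.log(compactTriggerTokens(200_000, 75));   // 150000 (opus + 75%)
```

This is why the 200K fallback pairs `opus` with `75` rather than keeping `30`.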
@@ -862,27 +892,31 @@ The thresholds above are community consensus — not empirically validated. For

 ### 1M vs 200K Context Window

-Claude Code supports both 200K and 1M context windows.
+Claude Code supports both 200K and 1M context windows. **Default to `opus[1m]` for SDLC work** — the 1M headroom is free until you actually use it.

-| | 200K Context | 1M Context |
+| | 200K Context (`opus`) | 1M Context (`opus[1m]`) **← default** |
 |---|---|---|
-| **Best for** |
-| **Typical usage** | 50-80K tokens per task | 200K+
-| **Cost** |
-| **Auto-compact** | Default 95% works well |
+| **Best for** | Short one-off tasks | Normal SDLC cycles + multi-feature work |
+| **Typical usage** | 50-80K tokens per task | 50-80K typical, up to 200K+ for complex workflows |
+| **Cost** | Standard pricing | Anthropic currently lists the 1M window at standard pricing across the full context for supported Opus/Sonnet models — **verify current rates at [docs.anthropic.com/pricing](https://docs.anthropic.com/)** before assuming no premium |
+| **Auto-compact** | Default 95% works well | Fires at ~76K by default ([issue #34332](https://github.com/anthropics/claude-code/issues/34332)) — **tune to 30%** |
 | **Suggested override** | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75` | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` or `CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000` |

-**
+**Why `opus[1m]` as default:**
+- **Long SDLC sessions accumulate context fast** — plan → TDD → review → CI shepherd on a single feature regularly crosses 100K tokens
+- **Safety margin against autocompact loss** — cheaper to have headroom than to re-read files after a forced compact
+- **At time of writing, Anthropic lists 1M context at standard pricing for supported Opus/Sonnet models.** Verify current rates for your plan before relying on this — see [docs.anthropic.com/pricing](https://docs.anthropic.com/)
+
+**Set it up:** `/model opus[1m]` in your session, or set `"model": "opus[1m]"` in `.claude/settings.json`. The CLI template ships with this default. Requires Claude Code v2.1.111+ for Opus 4.7.

-**
--
--
--
-- Complex debugging sessions that need full history
+**Fall back to `opus` (200K) when:**
+- Your plan or organization charges higher rates for long-context prompts (check your billing)
+- You're doing genuinely short one-off tasks and want slightly faster responses
+- Your team has cost controls that flag >200K prompts

-**Cost awareness:**
+**Cost awareness:** Larger windows let you consume more tokens in one session, and total cost always scales with tokens consumed regardless of tier. Use `/cost` to monitor — a 900K-token session is meaningfully more expensive than an 80K one even at standard rates.

-**
+**Autocompact pairing (important):** If you default to `opus[1m]`, also set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` — otherwise CC's default autocompact fires at ~76K and destroys the headroom you're paying for. The CLI template sets this automatically.

 ---

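The 1M vs 200K guidance reduces to a small decision: default to `opus[1m]` with a 30% override, and fall back to `opus` with 75% when any of the listed conditions holds. A sketch of that rule as a function returning the `.claude/settings.json` fragment; `recommendedContextSettings` and its option names are hypothetical, not part of the wizard.

```javascript
// Hypothetical helper encoding the "Fall back to opus (200K) when" list.
// Each flag corresponds to one bullet; none of them is a real wizard option.
function recommendedContextSettings({
  longContextPremium = false,   // plan/org charges more for long-context prompts
  shortOneOffTask = false,      // genuinely short task
  costControlsFlag200K = false, // team tooling flags >200K prompts
} = {}) {
  const fallBackTo200K = longContextPremium || shortOneOffTask || costControlsFlag200K;
  // The model and autocompact override always change together, as the doc advises.
  return fallBackTo200K
    ? { model: 'opus', env: { CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: '75' } }
    : { model: 'opus[1m]', env: { CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: '30' } };
}

console.log(JSON.stringify(recommendedContextSettings(), null, 2));
console.log(recommendedContextSettings({ longContextPremium: true }).model); // opus
```

Keeping the model and the override in one return value avoids the mismatch the doc warns about, such as `opus` left at 30% (compacting at 60K).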
@@ -2628,7 +2662,7 @@ If deployment fails or post-deploy verification catches issues:

 **SDLC.md:**
 ```markdown
-<!-- SDLC Wizard Version: 1.
+<!-- SDLC Wizard Version: 1.33.0 -->
 <!-- Setup Date: [DATE] -->
 <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: [PRs or Solo] -->
@@ -3687,7 +3721,7 @@ Walk through updates? (y/n)
 Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):

 ```markdown
-<!-- SDLC Wizard Version: 1.
+<!-- SDLC Wizard Version: 1.33.0 -->
 <!-- Setup Date: 2026-01-24 -->
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: PRs -->
package/README.md
CHANGED
@@ -235,6 +235,10 @@ This isn't the only Claude Code SDLC tool. Here's an honest comparison:
 | [CHANGELOG.md](CHANGELOG.md) | Version history, what changed and when |
 | [CONTRIBUTING.md](CONTRIBUTING.md) | How to contribute, evaluation methodology |

+## Community
+
+Come join **[Automation Station](https://discord.com/invite/fGPEF7GHrF)** — a community Discord packed with software engineers bringing 40+ years of combined experience across every area of the stack (frontend, backend, infra, embedded, data, QA, DevOps, you name it). Share patterns, ask questions, compare notes on AI agents, automation, and SDLC tooling.
+
 ## Contributing

 PRs welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for evaluation methodology and testing.
package/cli/bin/sdlc-wizard.js
CHANGED
@@ -42,7 +42,9 @@ if (command === 'init') {
     init(process.cwd(), flags);
     process.exit(0);
   } catch (err) {
-
+    // Plugin-detect errors already streamed a colored guidance block to stderr
+    // from init() — skip the redundant "Error:" prefix line.
+    if (!err.pluginPaths) console.error(`Error: ${err.message}`);
     process.exit(1);
   }
 } else if (command === 'check') {
package/cli/init.js
CHANGED
@@ -23,6 +23,7 @@ const FILES = [
   { src: 'hooks/sdlc-prompt-check.sh', dest: '.claude/hooks/sdlc-prompt-check.sh', executable: true, base: REPO_ROOT },
   { src: 'hooks/tdd-pretool-check.sh', dest: '.claude/hooks/tdd-pretool-check.sh', executable: true, base: REPO_ROOT },
   { src: 'hooks/instructions-loaded-check.sh', dest: '.claude/hooks/instructions-loaded-check.sh', executable: true, base: REPO_ROOT },
+  { src: 'hooks/model-effort-check.sh', dest: '.claude/hooks/model-effort-check.sh', executable: true, base: REPO_ROOT },
   { src: 'skills/sdlc/SKILL.md', dest: '.claude/skills/sdlc/SKILL.md', base: REPO_ROOT },
   { src: 'skills/setup/SKILL.md', dest: '.claude/skills/setup/SKILL.md', base: REPO_ROOT },
   { src: 'skills/update/SKILL.md', dest: '.claude/skills/update/SKILL.md', base: REPO_ROOT },
@@ -35,6 +36,23 @@ const WIZARD_HOOK_MARKERS = FILES

 const GITIGNORE_ENTRIES = ['.claude/plans/', '.claude/settings.local.json'];

+// Paths where the Claude plugin form of this wizard installs.
+// If present, running `npx init` creates duplicate /update-wizard (#181).
+const PLUGIN_INSTALL_PATHS = [
+  '.claude/plugins-local/sdlc-wizard-wrap',
+  '.claude/plugins/cache/sdlc-wizard-local',
+];
+
+function detectPluginInstall(homeDir) {
+  const home = homeDir || os.homedir();
+  // Guard empty / non-absolute HOME: without this, path.join('', '.claude/...')
+  // produces a project-relative path and init falsely blocks on local dirs.
+  if (!home || !path.isAbsolute(home)) return [];
+  return PLUGIN_INSTALL_PATHS
+    .map((rel) => path.join(home, rel))
+    .filter((p) => fs.existsSync(p));
+}
+
 // Paths from previous versions that should be removed on upgrade
 const OBSOLETE_PATHS = [
   '.claude/skills/testing', // consolidated into /sdlc in v1.17.0
@@ -52,6 +70,13 @@ function mergeSettings(existingPath, templatePath, force) {
   const existing = JSON.parse(fs.readFileSync(existingPath, 'utf8'));
   const template = JSON.parse(fs.readFileSync(templatePath, 'utf8'));

+  // Merge top-level model field (only set if missing, unless --force).
+  // Respects user's explicit model choice; adds the wizard default on fresh
+  // installs and for users upgrading from a pre-model template.
+  if (template.model && (!('model' in existing) || force)) {
+    existing.model = template.model;
+  }
+
   // Merge env field
   if (template.env) {
     if (!existing.env || typeof existing.env !== 'object' || Array.isArray(existing.env)) {
@@ -201,6 +226,24 @@ function printOps(ops) {
 }

 function init(targetDir, { force = false, dryRun = false } = {}) {
+  if (!dryRun && !force) {
+    const pluginPaths = detectPluginInstall();
+    if (pluginPaths.length > 0) {
+      console.error(`\n${YELLOW}Claude plugin install detected:${RESET}`);
+      for (const p of pluginPaths) console.error(` ${p}`);
+      console.error('\nInstalling via npm on top of the plugin creates duplicate /update-wizard commands.');
+      console.error('Pick one channel:');
+      console.error(` - Keep plugin: exit and use ${CYAN}/plugin update sdlc-wizard${RESET}`);
+      console.error(` - Switch to CLI: remove plugin dir above, then rerun ${CYAN}init${RESET}`);
+      console.error(` - Keep both: rerun with ${CYAN}--force${RESET} (duplicates expected)\n`);
+      const err = new Error(
+        `Plugin install detected at: ${pluginPaths.join(', ')}. Use --force to bypass.`
+      );
+      err.pluginPaths = pluginPaths;
+      throw err;
+    }
+  }
+
   const ops = planOperations(targetDir, { force });

   if (dryRun) {
@@ -408,4 +451,4 @@ function checkMarketplacePaths() {
   return results;
 }

-module.exports = { init, check, planOperations, GITIGNORE_ENTRIES };
+module.exports = { init, check, planOperations, detectPluginInstall, GITIGNORE_ENTRIES };
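The guard added to `init()` and the matching check in `bin/sdlc-wizard.js` follow a typed-error pattern: detection attaches `pluginPaths` to the thrown `Error`, and the CLI entry point skips its generic `Error:` line when that property is present. A self-contained sketch of the flow; `guardInit` is a hypothetical name, and unlike the shipped `detectPluginInstall` this version takes HOME as a required argument so the example works against a throwaway directory instead of the real home.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Same relative paths as PLUGIN_INSTALL_PATHS in cli/init.js.
const PLUGIN_INSTALL_PATHS = [
  '.claude/plugins-local/sdlc-wizard-wrap',
  '.claude/plugins/cache/sdlc-wizard-local',
];

// Empty or relative HOME means "nothing detected", mirroring the isAbsolute guard.
function detectPluginInstall(home) {
  if (!home || !path.isAbsolute(home)) return [];
  return PLUGIN_INSTALL_PATHS
    .map((rel) => path.join(home, rel))
    .filter((p) => fs.existsSync(p));
}

// Hypothetical stand-in for the top of init(): throw a typed error unless forced.
function guardInit(home, { force = false } = {}) {
  if (force) return; // --force skips detection entirely, as in init()
  const pluginPaths = detectPluginInstall(home);
  if (pluginPaths.length > 0) {
    const err = new Error(`Plugin install detected at: ${pluginPaths.join(', ')}. Use --force to bypass.`);
    err.pluginPaths = pluginPaths; // typed marker the CLI entry point checks
    throw err;
  }
}

// Isolated HOME so the developer's real ~/.claude never affects the result.
const fakeHome = fs.mkdtempSync(path.join(os.tmpdir(), 'sdlc-home-'));
try {
  fs.mkdirSync(path.join(fakeHome, PLUGIN_INSTALL_PATHS[0]), { recursive: true });
  try {
    guardInit(fakeHome);
  } catch (err) {
    // CLI-side handling: only untyped errors get the generic "Error:" line.
    if (!err.pluginPaths) console.error(`Error: ${err.message}`);
    else console.log(`blocked by ${err.pluginPaths.length} plugin path(s)`);
  }
  guardInit(fakeHome, { force: true }); // does not throw
  console.log(detectPluginInstall('relative/home').length); // 0
} finally {
  fs.rmSync(fakeHome, { recursive: true, force: true });
}
```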
package/cli/templates/settings.json
CHANGED
@@ -1,6 +1,7 @@
 {
+  "model": "opus[1m]",
   "env": {
-    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "
+    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "30"
   },
   "hooks": {
     "UserPromptSubmit": [
@@ -34,6 +35,16 @@
           }
         ]
       }
+    ],
+    "SessionStart": [
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/model-effort-check.sh"
+          }
+        ]
+      }
     ]
   }
 }
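The template's new top-level `model` only reaches existing installs through the merge rule added to `mergeSettings()`. A pure-function sketch of that rule; `mergeModel` is a hypothetical name, and it returns a copy instead of mutating `existing` in place as the real code does.

```javascript
// Copy template.model only when the user has no "model" key, or when force is set.
function mergeModel(existing, template, force = false) {
  const merged = { ...existing };
  if (template.model && (!('model' in merged) || force)) {
    merged.model = template.model;
  }
  return merged;
}

const template = { model: 'opus[1m]', env: { CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: '30' } };

console.log(mergeModel({}, template).model);                        // opus[1m] (fresh install)
console.log(mergeModel({ model: 'sonnet' }, template).model);       // sonnet (user choice kept)
console.log(mergeModel({ model: 'sonnet' }, template, true).model); // opus[1m] (--force)
```

Because the check uses the `in` operator, an explicit `"model": ""` or `"model": null` also counts as a user choice and is left alone unless `--force` is passed.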
package/hooks/hooks.json
CHANGED
package/hooks/instructions-loaded-check.sh
CHANGED
@@ -66,6 +66,38 @@ if command -v codex > /dev/null 2>&1 && [ -d "$PROJECT_DIR/.reviews" ]; then
   fi
 fi

+# Model/effort upgrade check (non-blocking, best-effort)
+RECOMMENDED_MODEL="opus[1m]"
+RECOMMENDED_EFFORT="xhigh"
+if command -v jq > /dev/null 2>&1; then
+  EFFORT=""
+  PROJ="${CLAUDE_PROJECT_DIR:-$PROJECT_DIR}"
+  for f in "$PROJ/.claude/settings.local.json" "$PROJ/.claude/settings.json" "$HOME/.claude/settings.json"; do
+    if [ -f "$f" ]; then
+      val=$(jq -r '.effortLevel // empty' "$f" 2>/dev/null)
+      if [ -n "$val" ]; then EFFORT="$val"; break; fi
+    fi
+  done
+  if [ -n "$EFFORT" ] && [ "$EFFORT" != "$RECOMMENDED_EFFORT" ]; then
+    echo "Upgrade available: effort $EFFORT → $RECOMMENDED_EFFORT (run: /effort $RECOMMENDED_EFFORT)"
+    echo "Recommended model: $RECOMMENDED_MODEL (run: /model $RECOMMENDED_MODEL)"
+  fi
+fi
+
+# Dual-channel install check (#181) — nudge when CLI skills + Claude plugin both present
+if [ -d "$PROJECT_DIR/.claude/skills/update" ]; then
+  for plugin_path in "$HOME/.claude/plugins-local/sdlc-wizard-wrap" "$HOME/.claude/plugins/cache/sdlc-wizard-local"; do
+    if [ -d "$plugin_path" ]; then
+      echo "WARNING: dual-install detected — CLI skills in .claude/skills/ AND Claude plugin at:"
+      echo " $plugin_path"
+      echo " Duplicate /update-wizard commands come from running both channels. Pick one:"
+      echo " - Keep plugin: remove .claude/skills/ from this project"
+      echo " - Keep CLI: /plugin uninstall sdlc-wizard (or remove plugin dir)"
+      break
+    fi
+  done
+fi
+
 # Claude Code version check (non-blocking, best-effort)
 if command -v claude > /dev/null 2>&1 && command -v npm > /dev/null 2>&1; then
   CC_LOCAL=$(claude --version 2>/dev/null | grep -o '[0-9][0-9.]*' | head -1) || true
package/hooks/model-effort-check.sh
ADDED
@@ -0,0 +1,43 @@
+#!/bin/bash
+# SessionStart hook — nudges user when effort level is below recommended
+# and tells Claude the recommended model so it can compare against its own
+# CC does NOT expose the model to hooks, so model nudge relies on Claude
+# seeing this output and comparing against its system prompt
+# Non-blocking: always exits 0
+
+RECOMMENDED_MODEL="opus[1m]"
+RECOMMENDED_EFFORT="xhigh"
+
+# Drain stdin (SessionStart sends JSON but model field isn't in it)
+cat > /dev/null
+
+if ! command -v jq > /dev/null 2>&1; then
+  exit 0
+fi
+
+effort=""
+project_dir="${CLAUDE_PROJECT_DIR:-.}"
+for f in "$project_dir/.claude/settings.local.json" "$project_dir/.claude/settings.json" "$HOME/.claude/settings.json"; do
+  if [ -f "$f" ]; then
+    val=$(jq -r '.effortLevel // empty' "$f" 2>/dev/null)
+    if [ -n "$val" ]; then
+      effort="$val"
+      break
+    fi
+  fi
+done
+
+nudge=""
+
+if [ -n "$effort" ] && [ "$effort" != "$RECOMMENDED_EFFORT" ]; then
+  nudge="effort: $effort → $RECOMMENDED_EFFORT (run: /effort $RECOMMENDED_EFFORT)"
+fi
+
+if [ -n "$nudge" ]; then
+  echo "Upgrade available:"
+  echo " $nudge"
+  echo " recommended model: $RECOMMENDED_MODEL (run: /model $RECOMMENDED_MODEL)"
+  echo " (Claude: compare recommended model against your current model — nudge user if different)"
+fi
+
+exit 0
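A Node port of the settings lookup in `model-effort-check.sh` can make the precedence easier to see in isolation: the first file with a non-empty `effortLevel` wins. It is a sketch, not shipped code: `resolveEffortLevel` and `effortNudge` are hypothetical names, and unreadable or invalid JSON files are skipped the way `jq ... 2>/dev/null` makes the script skip them.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const RECOMMENDED_EFFORT = 'xhigh';

// settings.local.json -> settings.json -> ~/.claude/settings.json; first non-empty wins.
function resolveEffortLevel(projectDir = '.', homeDir = os.homedir()) {
  const candidates = [
    path.join(projectDir, '.claude', 'settings.local.json'),
    path.join(projectDir, '.claude', 'settings.json'),
    path.join(homeDir, '.claude', 'settings.json'),
  ];
  for (const file of candidates) {
    let settings;
    try {
      settings = JSON.parse(fs.readFileSync(file, 'utf8'));
    } catch {
      continue; // missing or unparsable file: skip it
    }
    const value = settings && settings.effortLevel;
    if (typeof value === 'string' && value !== '') return value;
  }
  return ''; // nothing configured: the hook stays silent
}

// Same test as the script: nudge when a level is set and differs from the recommendation.
function effortNudge(effort) {
  return effort !== '' && effort !== RECOMMENDED_EFFORT
    ? `effort: ${effort} → ${RECOMMENDED_EFFORT} (run: /effort ${RECOMMENDED_EFFORT})`
    : '';
}

// Usage against throwaway project and home directories.
const proj = fs.mkdtempSync(path.join(os.tmpdir(), 'proj-'));
const home = fs.mkdtempSync(path.join(os.tmpdir(), 'home-'));
try {
  fs.mkdirSync(path.join(proj, '.claude'));
  fs.mkdirSync(path.join(home, '.claude'));
  fs.writeFileSync(path.join(proj, '.claude', 'settings.json'), JSON.stringify({ effortLevel: 'high' }));
  fs.writeFileSync(path.join(home, '.claude', 'settings.json'), JSON.stringify({ effortLevel: 'xhigh' }));
  const effort = resolveEffortLevel(proj, home); // project file wins over ~/.claude
  console.log(effort);              // high
  console.log(effortNudge(effort)); // effort: high → xhigh (run: /effort xhigh)
} finally {
  fs.rmSync(proj, { recursive: true, force: true });
  fs.rmSync(home, { recursive: true, force: true });
}
```

As in the script, a configured `max` also yields a nudge, because the comparison is inequality rather than rank.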
package/package.json
CHANGED
package/skills/sdlc/SKILL.md
CHANGED
@@ -161,6 +161,21 @@ When auto-approving, still announce your approach — just don't wait for approv

 **When in doubt, wait for approval.** Auto-approval is for clear-cut cases only.

+## Recommended Model
+
+**Default: `opus[1m]` (Opus 4.7 with 1M context window).** Run `/model opus[1m]` at the start of any non-trivial SDLC session.
+
+**Why:**
+- SDLC sessions (plan → TDD → review → CI shepherd) accumulate context fast — plans, test output, diffs, review artifacts. 200K fills up before you're done.
+- Forced auto-compact mid-task loses your working state. Extra headroom is cheaper than re-reading files.
+- At time of writing, Anthropic lists 1M context at standard pricing for supported Opus/Sonnet models — verify current rates for your plan before relying on this.
+
+**Requires Claude Code v2.1.111+** for Opus 4.7.
+
+**Pair with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30`.** Without it, CC's default auto-compact on 1M fires at ~76K and defeats the purpose. The wizard's `cli/templates/settings.json` sets both defaults on install.
+
+**Fall back to `opus` (200K) only when:** your plan charges a premium for long-context prompts, the task is genuinely short (<30K), or team cost controls flag >200K prompts. See the "1M vs 200K Context Window" section in `CLAUDE_CODE_SDLC_WIZARD.md` for details.
+
 ## Confidence Check (REQUIRED)

 Before presenting approach, STATE your confidence:
package/skills/setup/SKILL.md
CHANGED
@@ -183,16 +183,17 @@ Present suggestions and let the user confirm.

 ### Step 9.5: Context Window Configuration

-The CLI
+The CLI ships `cli/templates/settings.json` with `"model": "opus[1m]"` and `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` (tuned for the 1M context window — compacts at ~300K). This is the SDLC wizard default. Confirm the installed values, or fall back to 200K if the user prefers:

-- **
-- **
+- **1M default (`opus[1m]`):** Confirm `"model": "opus[1m]"` at top level and `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: "30"` under `env`. Requires Claude Code v2.1.111+ for Opus 4.7.
+- **200K fallback (`opus`):** Edit `.claude/settings.json` — change `"model"` to `"opus"` and raise `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` to `"75"` (otherwise `30%` of 200K compacts too early at 60K).

-To
+To fall back to 200K, edit `.claude/settings.json`:
 ```json
 {
+  "model": "opus",
   "env": {
-    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "
+    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "75"
   }
 }
 ```
package/skills/update/SKILL.md
CHANGED
@@ -46,9 +46,11 @@ Parse all CHANGELOG entries between the user's installed version and the latest.

 ```
 Installed: 1.24.0
-Latest: 1.
+Latest: 1.33.0

 What changed:
+- [1.33.0] opus[1m] as SDLC default, dual-channel install drift guardrails, model/effort session-start nudge, ...
+- [1.32.0] Opus 4.7 + xhigh support, model/effort upgrade detection, benchmark ceiling audit, ...
 - [1.31.0] Hook false-positive fix for non-SDLC dirs, ephemeral marketplace path warning, ...
 - [1.30.0] Firmware fixture, model A/B comparison workflow, CC degradation detection, ...
 - [1.29.0] Node 24 compliance, autocompact in settings.json, effectiveness scoreboard, ...
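The update skill's first step, collecting every CHANGELOG entry between the installed and latest versions, can be sketched as below. Assumptions: headings follow this CHANGELOG's `## [X.Y.Z] - DATE` form and versions carry no pre-release suffix; `entriesBetween` and `compareVersions` are hypothetical helpers, not part of the skill. Versions are compared numerically per component, so `1.10.0` correctly sorts above `1.9.0`, which a plain string comparison would get wrong.

```javascript
// Split "1.33.0" into [1, 33, 0].
function parseVersion(v) {
  return v.split('.').map(Number);
}

// Negative if a < b, zero if equal, positive if a > b (MAJOR.MINOR.PATCH only).
function compareVersions(a, b) {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Versions with installed < v <= latest, in CHANGELOG order (newest first).
function entriesBetween(changelog, installed, latest) {
  const heading = /^## \[(\d+\.\d+\.\d+)\]/;
  const versions = [];
  for (const line of changelog.split('\n')) {
    const match = heading.exec(line);
    if (match) versions.push(match[1]);
  }
  return versions.filter(
    (v) => compareVersions(v, installed) > 0 && compareVersions(v, latest) <= 0
  );
}

const sample = [
  '## [1.33.0] - 2026-04-17',
  '### Added',
  '## [1.32.0] - 2026-04-16',
  '## [1.31.0] - 2026-04-14',
].join('\n');

console.log(entriesBetween(sample, '1.31.0', '1.33.0')); // [ '1.33.0', '1.32.0' ]
```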