agentic-sdlc-wizard 1.15.0 → 1.18.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +26 -0
- package/CLAUDE_CODE_SDLC_WIZARD.md +231 -156
- package/README.md +51 -35
- package/cli/bin/sdlc-wizard.js +14 -3
- package/cli/init.js +202 -10
- package/cli/templates/hooks/instructions-loaded-check.sh +1 -1
- package/cli/templates/hooks/sdlc-prompt-check.sh +16 -5
- package/cli/templates/skills/sdlc/SKILL.md +159 -30
- package/cli/templates/skills/setup/SKILL.md +176 -0
- package/cli/templates/skills/update/SKILL.md +141 -0
- package/package.json +1 -1
- package/cli/templates/skills/testing/SKILL.md +0 -97
package/CHANGELOG.md
CHANGED
|
@@ -4,6 +4,32 @@ All notable changes to the SDLC Wizard.
|
|
|
4
4
|
|
|
5
5
|
> **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
|
|
6
6
|
|
|
7
|
+
## [1.18.0] - 2026-03-30
|
|
8
|
+
|
|
9
|
+
### Added
|
|
10
|
+
- `/update-wizard` skill — guided update with changelog diff, per-file comparison, selective adoption
|
|
11
|
+
- `step-update-wizard` in wizard step registry
|
|
12
|
+
- CLI distributes `skills/update/SKILL.md` (now 8 managed files)
|
|
13
|
+
- `/update-wizard` reference in wizard "How to Update" section
|
|
14
|
+
|
|
15
|
+
## [1.17.0] - 2026-03-30
|
|
16
|
+
|
|
17
|
+
### Fixed
|
|
18
|
+
- Setup skill now force-reads entire wizard doc before proceeding (was just "Reference")
|
|
19
|
+
- README no longer tells users to manually invoke setup — hooks auto-invoke
|
|
20
|
+
- 3 new tests for setup auto-invoke behavior
|
|
21
|
+
|
|
22
|
+
### Changed
|
|
23
|
+
- Testing consolidation: `/testing` skill merged into `/sdlc` (#28)
|
|
24
|
+
|
|
25
|
+
## [1.16.0] - 2026-03-29
|
|
26
|
+
|
|
27
|
+
### Added
|
|
28
|
+
- Cross-model review dialogue protocol — structured FIXED/DISPUTED/ACCEPTED negotiation loop (#40)
|
|
29
|
+
- P0/P1/P2 severity rubric in PR review prompt (#34)
|
|
30
|
+
- Effort level recommendations in wizard
|
|
31
|
+
- 5 enforcement gap fixes in TodoWrite checklist (#39)
|
|
32
|
+
|
|
7
33
|
## [1.15.0] - 2026-03-25
|
|
8
34
|
|
|
9
35
|
### Added
|
|
@@ -211,6 +211,35 @@ When Anthropic provides official plugins or tools that handle something:
|
|
|
211
211
|
|
|
212
212
|
---
|
|
213
213
|
|
|
214
|
+
## Recommended Effort Level
|
|
215
|
+
|
|
216
|
+
Claude Code's **effort level** controls how much thinking the model does before responding. Higher effort means deeper reasoning at the cost of more tokens.
|
|
217
|
+
|
|
218
|
+
| Level | When to Use | How to Set |
|
|
219
|
+
|-------|-------------|------------|
|
|
220
|
+
| `high` | **Default for all SDLC work.** Features, bug fixes, refactoring, tests, reviews | `effort: high` in skill frontmatter (already set) |
|
|
221
|
+
| `max` | LOW confidence, FAILED 2x, architecture decisions, complex debugging, cross-model reviews | `/effort max` (session only — resets next session) |
|
|
222
|
+
|
|
223
|
+
**Why `high` is the default:** The `/sdlc` skill sets `effort: high` in its frontmatter, so every SDLC invocation automatically uses high effort. This gives thorough reasoning without the significantly higher token cost of `max`.
|
|
224
|
+
|
|
225
|
+
**When to escalate to `max`:**
|
|
226
|
+
- You hit LOW confidence on your approach — deeper thinking may find clarity
|
|
227
|
+
- You've failed the same thing twice — something non-obvious is wrong
|
|
228
|
+
- Architecture decisions with wide blast radius
|
|
229
|
+
- Complex multi-system debugging where you need to hold many constraints
|
|
230
|
+
- Cross-model review analysis (reading and evaluating external reviewer findings)
|
|
231
|
+
|
|
232
|
+
**How it works:**
|
|
233
|
+
- `/effort max` changes effort for the current session only (resets next session)
|
|
234
|
+
- `effort: high` in SKILL.md frontmatter persists — every `/sdlc` invocation uses `high`
|
|
235
|
+
- You can also type `ultrathink` in any prompt for a single high-effort turn
|
|
236
|
+
|
|
237
|
+
**Cost note:** `max` uses significantly more tokens than `high`. Use it when the problem justifies it, not as a default.
|
|
238
|
+
|
|
239
|
+
> See also: the **Effort** column in the [Confidence Check table](#confidence-check-required) below for per-confidence-level guidance on when to escalate to `max`.
|
|
240
|
+
|
|
241
|
+
---
|
|
242
|
+
|
|
214
243
|
## Claude Code Feature Updates
|
|
215
244
|
|
|
216
245
|
> **Keep your SDLC current**: Claude Code evolves. This section documents features that enhance the SDLC workflow. Check [Claude Code releases](https://github.com/anthropics/claude-code/releases) periodically.
|
|
@@ -250,7 +279,7 @@ $ARGUMENTS
|
|
|
250
279
|
|
|
251
280
|
**Usage examples**:
|
|
252
281
|
- `/sdlc fix the login validation bug` → `$ARGUMENTS` = "fix the login validation bug"
|
|
253
|
-
- `/
|
|
282
|
+
- `/sdlc write tests for UserService` → `$ARGUMENTS` = "write tests for UserService"
|
|
254
283
|
|
|
255
284
|
**Note**: Skills still auto-invoke via hooks. This is optional polish for manual invocation.
|
|
256
285
|
|
|
@@ -276,7 +305,7 @@ New built-in commands available to use alongside the wizard:
|
|
|
276
305
|
|
|
277
306
|
### Skill Effort Frontmatter (v2.1.80+)
|
|
278
307
|
|
|
279
|
-
Skills can now set an `effort` level in frontmatter. The wizard's `/sdlc`
|
|
308
|
+
Skills can now set an `effort` level in frontmatter. The wizard's `/sdlc` skill uses `effort: high` to ensure Claude gives full attention to SDLC tasks.
|
|
280
309
|
|
|
281
310
|
### InstructionsLoaded Hook (v2.1.69+)
|
|
282
311
|
|
|
@@ -1376,7 +1405,7 @@ Your answers map to these files:
|
|
|
1376
1405
|
| Q4-Q8 (commands) | `CLAUDE.md` - Commands section |
|
|
1377
1406
|
| Q9-Q10 (infra) | `CLAUDE.md` - Architecture section, `TESTING.md` - mock decisions |
|
|
1378
1407
|
| Q11 (test duration) | `SDLC skill` - wait time note |
|
|
1379
|
-
| Q12 (E2E) | `
|
|
1408
|
+
| Q12 (E2E) | `TESTING.md` - testing diamond top |
|
|
1380
1409
|
|
|
1381
1410
|
---
|
|
1382
1411
|
|
|
@@ -1387,7 +1416,6 @@ Create these directories in your project root:
|
|
|
1387
1416
|
```bash
|
|
1388
1417
|
mkdir -p .claude/hooks
|
|
1389
1418
|
mkdir -p .claude/skills/sdlc
|
|
1390
|
-
mkdir -p .claude/skills/testing
|
|
1391
1419
|
```
|
|
1392
1420
|
|
|
1393
1421
|
**Commit to Git:** Yes! These files should be committed so your whole team gets the same SDLC enforcement. When teammates pull, they get the hooks and skills automatically.
|
|
@@ -1506,9 +1534,8 @@ The `allowedTools` array is auto-generated based on your stack detected in Step
|
|
|
1506
1534
|
The light hook outputs text that **instructs Claude** to invoke skills:
|
|
1507
1535
|
|
|
1508
1536
|
```
|
|
1509
|
-
AUTO-INVOKE
|
|
1510
|
-
- implement/fix/refactor/feature/bug/build → Invoke: Skill tool, skill="sdlc"
|
|
1511
|
-
- test/TDD/write test (standalone) → Invoke: Skill tool, skill="testing"
|
|
1537
|
+
AUTO-INVOKE SKILL (Claude MUST do this FIRST):
|
|
1538
|
+
- implement/fix/refactor/feature/bug/build/test/TDD → Invoke: Skill tool, skill="sdlc"
|
|
1512
1539
|
```
|
|
1513
1540
|
|
|
1514
1541
|
**This is text-based, not programmatic.** Claude reads this instruction and follows it. When Claude sees your message is an implementation task, it invokes the sdlc skill using the Skill tool. This loads the full SDLC guidance into context.
|
|
@@ -1530,7 +1557,7 @@ Create `.claude/hooks/sdlc-prompt-check.sh`:
|
|
|
1530
1557
|
```bash
|
|
1531
1558
|
#!/bin/bash
|
|
1532
1559
|
# Light SDLC hook - baseline reminder every prompt (~100 tokens)
|
|
1533
|
-
# Full guidance in
|
|
1560
|
+
# Full guidance in skill: .claude/skills/sdlc/
|
|
1534
1561
|
|
|
1535
1562
|
cat << 'EOF'
|
|
1536
1563
|
SDLC BASELINE:
|
|
@@ -1540,10 +1567,8 @@ SDLC BASELINE:
|
|
|
1540
1567
|
4. FAILED 2x? STOP and ASK USER
|
|
1541
1568
|
5. 🛑 ALL TESTS MUST PASS BEFORE COMMIT - NO EXCEPTIONS
|
|
1542
1569
|
|
|
1543
|
-
AUTO-INVOKE
|
|
1544
|
-
- implement/fix/refactor/feature/bug/build → Invoke: Skill tool, skill="sdlc"
|
|
1545
|
-
- test/TDD/write test (standalone) → Invoke: Skill tool, skill="testing"
|
|
1546
|
-
- If BOTH match (e.g., "fix the test") → sdlc takes precedence (includes TDD)
|
|
1570
|
+
AUTO-INVOKE SKILL (Claude MUST do this FIRST):
|
|
1571
|
+
- implement/fix/refactor/feature/bug/build/test/TDD → Invoke: Skill tool, skill="sdlc"
|
|
1547
1572
|
- DON'T invoke for: questions, explanations, reading/exploring code, simple queries
|
|
1548
1573
|
- DON'T wait for user to type /sdlc - AUTO-INVOKE based on task type
|
|
1549
1574
|
|
|
@@ -1685,13 +1710,13 @@ TodoWrite([
|
|
|
1685
1710
|
|
|
1686
1711
|
Before presenting approach, STATE your confidence:
|
|
1687
1712
|
|
|
1688
|
-
| Level | Meaning | Action |
|
|
1689
|
-
|
|
1690
|
-
| HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval |
|
|
1691
|
-
| MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties |
|
|
1692
|
-
| LOW (<60%) | Not sure | ASK USER before proceeding |
|
|
1693
|
-
| FAILED 2x | Something's wrong | STOP. ASK USER immediately |
|
|
1694
|
-
| CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help |
|
|
1713
|
+
| Level | Meaning | Action | Effort |
|
|
1714
|
+
|-------|---------|--------|--------|
|
|
1715
|
+
| HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval | `high` (default) |
|
|
1716
|
+
| MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties | `high` (default) |
|
|
1717
|
+
| LOW (<60%) | Not sure | ASK USER before proceeding | Consider `/effort max` |
|
|
1718
|
+
| FAILED 2x | Something's wrong | STOP. ASK USER immediately | Try `/effort max` |
|
|
1719
|
+
| CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help | Try `/effort max` |
|
|
1695
1720
|
|
|
1696
1721
|
## Self-Review Loop (CRITICAL)
|
|
1697
1722
|
|
|
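The Confidence Check table in this hunk can be read as a simple lookup from confidence level to action and effort. The sketch below is illustrative only: `Guidance`, `classify`, and `guidance_for` are hypothetical names that are not part of the wizard, and treating "Consider `/effort max`" as a plain `max` recommendation is a simplifying assumption. The thresholds and action text are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guidance:
    action: str
    effort: str  # "high" is the skill default; "max" means "consider escalating"


# Rows copied from the Confidence Check table; the dict itself is hypothetical.
GUIDANCE = {
    "HIGH": Guidance("Present approach, proceed after approval", "high"),
    "MEDIUM": Guidance("Present approach, highlight uncertainties", "high"),
    "LOW": Guidance("ASK USER before proceeding", "max"),
    "FAILED 2x": Guidance("STOP. ASK USER immediately", "max"),
    "CONFUSED": Guidance("STOP. Describe what you tried, ask for help", "max"),
}


def classify(confidence_pct: int, failures: int = 0) -> str:
    """Map a confidence percentage and failure count to a table row."""
    if failures >= 2:
        return "FAILED 2x"
    if confidence_pct >= 90:
        return "HIGH"
    if confidence_pct >= 60:
        return "MEDIUM"
    return "LOW"


def guidance_for(confidence_pct: int, failures: int = 0) -> Guidance:
    """Return the action and recommended effort for the given state."""
    return GUIDANCE[classify(confidence_pct, failures)]
```

For example, `guidance_for(40)` returns the LOW row, whose effort field suggests escalating with `/effort max` for the current session.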
@@ -1724,38 +1749,100 @@ PLANNING → DOCS → TDD RED → TDD GREEN → Tests Pass → Self-Review
|
|
|
1724
1749
|
|
|
1725
1750
|
**Prerequisites:** Codex CLI installed (`npm i -g @openai/codex`), OpenAI API key set.
|
|
1726
1751
|
|
|
1727
|
-
|
|
1752
|
+
### Round 1: Initial Review
|
|
1753
|
+
|
|
1728
1754
|
1. After self-review passes, write `.reviews/handoff.json`:
|
|
1729
1755
|
```jsonc
|
|
1730
1756
|
{
|
|
1731
1757
|
"review_id": "feature-xyz-001",
|
|
1732
1758
|
"status": "PENDING_REVIEW",
|
|
1759
|
+
"round": 1,
|
|
1733
1760
|
"files_changed": ["src/auth.ts", "tests/auth.test.ts"],
|
|
1734
1761
|
"review_instructions": "Review for security, edge cases, and correctness",
|
|
1735
1762
|
"artifact_path": ".reviews/feature-xyz-001/"
|
|
1736
1763
|
}
|
|
1737
1764
|
```
|
|
1738
|
-
2.
|
|
1765
|
+
2. Run the independent reviewer:
|
|
1739
1766
|
```bash
|
|
1740
1767
|
codex exec \
|
|
1741
1768
|
-c 'model_reasoning_effort="xhigh"' \
|
|
1742
1769
|
-s danger-full-access \
|
|
1743
1770
|
-o .reviews/latest-review.md \
|
|
1744
1771
|
"You are an independent code reviewer. Read .reviews/handoff.json, \
|
|
1745
|
-
review the listed files
|
|
1772
|
+
review the listed files. Output each finding with: an ID (1, 2, ...), \
|
|
1773
|
+
severity (P0/P1/P2), description, and a 'certify condition' stating \
|
|
1774
|
+
what specific change would resolve it. \
|
|
1775
|
+
End with CERTIFIED or NOT CERTIFIED."
|
|
1776
|
+
```
|
|
1777
|
+
3. If CERTIFIED → proceed to CI. If NOT CERTIFIED → go to Round 2.
|
|
1778
|
+
|
|
1779
|
+
### Round 2+: Dialogue Loop
|
|
1780
|
+
|
|
1781
|
+
When the reviewer finds issues, respond per-finding instead of silently fixing everything:
|
|
1782
|
+
|
|
1783
|
+
1. Write `.reviews/response.json`:
|
|
1784
|
+
```jsonc
|
|
1785
|
+
{
|
|
1786
|
+
"review_id": "feature-xyz-001",
|
|
1787
|
+
"round": 2,
|
|
1788
|
+
"responding_to": ".reviews/latest-review.md",
|
|
1789
|
+
"responses": [
|
|
1790
|
+
{ "finding": "1", "action": "FIXED", "summary": "Added missing validation" },
|
|
1791
|
+
{ "finding": "2", "action": "DISPUTED", "justification": "This is intentional — see CODE_REVIEW_EXCEPTIONS.md" },
|
|
1792
|
+
{ "finding": "3", "action": "ACCEPTED", "summary": "Will add test coverage" }
|
|
1793
|
+
]
|
|
1794
|
+
}
|
|
1795
|
+
```
|
|
1796
|
+
- **FIXED**: "I fixed this. Here is what changed." Reviewer verifies.
|
|
1797
|
+
- **DISPUTED**: "This is intentional/incorrect. Here is why." Reviewer accepts or rejects.
|
|
1798
|
+
- **ACCEPTED**: "You are right. Fixing now." (Same outcome as FIXED, used when batching fixes.)
|
|
1799
|
+
|
|
1800
|
+
2. Update `handoff.json` with `"status": "PENDING_RECHECK"`, increment `round`, add `"response_path"` and `"previous_review"` fields.
|
|
1801
|
+
|
|
1802
|
+
3. Run targeted recheck (NOT a full re-review):
|
|
1803
|
+
```bash
|
|
1804
|
+
codex exec \
|
|
1805
|
+
-c 'model_reasoning_effort="xhigh"' \
|
|
1806
|
+
-s danger-full-access \
|
|
1807
|
+
-o .reviews/latest-review.md \
|
|
1808
|
+
"You are doing a TARGETED RECHECK. First read .reviews/handoff.json \
|
|
1809
|
+
to find the previous_review path — read that file for the original \
|
|
1810
|
+
findings and certify conditions. Then read .reviews/response.json \
|
|
1811
|
+
for the author's responses. For each: \
|
|
1812
|
+
FIXED → verify the fix against the original certify condition. \
|
|
1813
|
+
DISPUTED → evaluate the justification (ACCEPT if sound, REJECT if not). \
|
|
1814
|
+
ACCEPTED → verify it was applied. \
|
|
1815
|
+
Do NOT raise new findings unless P0 (critical/security). \
|
|
1816
|
+
New observations go in 'Notes for next review' (non-blocking). \
|
|
1746
1817
|
End with CERTIFIED or NOT CERTIFIED."
|
|
1747
1818
|
```
|
|
1748
|
-
|
|
1819
|
+
|
|
1820
|
+
4. If CERTIFIED → done. If NOT CERTIFIED (rejected disputes or failed fixes) → fix rejected items and repeat.
|
|
1821
|
+
|
|
1822
|
+
### Convergence
|
|
1823
|
+
|
|
1824
|
+
Max 3 recheck rounds (4 total including initial review). If still NOT CERTIFIED after round 4, escalate to the user with a summary of open findings. Don't spin indefinitely.
|
|
1749
1825
|
|
|
1750
1826
|
```
|
|
1751
|
-
Self-review passes →
|
|
1752
|
-
|
|
1753
|
-
|
|
1754
|
-
|
|
1755
|
-
|
|
1756
|
-
|
|
1757
|
-
|
|
1758
|
-
|
|
1827
|
+
Self-review passes → handoff.json (round 1, PENDING_REVIEW)
|
|
1828
|
+
|
|
|
1829
|
+
Reviewer: FULL REVIEW (structured findings)
|
|
1830
|
+
|
|
|
1831
|
+
CERTIFIED? → YES → CI feedback loop
|
|
1832
|
+
|
|
|
1833
|
+
NO (findings with IDs + certify conditions)
|
|
1834
|
+
|
|
|
1835
|
+
Claude writes response.json:
|
|
1836
|
+
FIXED / DISPUTED / ACCEPTED per finding
|
|
1837
|
+
|
|
|
1838
|
+
handoff.json (round 2+, PENDING_RECHECK)
|
|
1839
|
+
|
|
|
1840
|
+
Reviewer: TARGETED RECHECK (previous findings only)
|
|
1841
|
+
|
|
|
1842
|
+
All resolved? → YES → CERTIFIED
|
|
1843
|
+
|
|
|
1844
|
+
NO → fix rejected items, repeat
|
|
1845
|
+
(max 3 rechecks, then escalate to user)
|
|
1759
1846
|
```
|
|
1760
1847
|
|
|
1761
1848
|
**Tool-agnostic:** The value is adversarial diversity (different model, different blind spots), not the specific tool. Any competing AI reviewer works.
|
|
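The dialogue loop in this hunk boils down to two checks: each response uses one of the three actions, and the loop stops after four rounds in total. The sketch below shows one way to express that in Python. It is not part of the wizard: `validate_response` and `next_step` are hypothetical helpers, and requiring a `justification` on every DISPUTED entry is an assumption inferred from the example `response.json`.

```python
from typing import List

MAX_ROUNDS = 4  # 1 initial review + 3 rechecks, per the Convergence section
VALID_ACTIONS = {"FIXED", "DISPUTED", "ACCEPTED"}


def validate_response(response: dict) -> List[str]:
    """Return a list of problems found in a response.json-shaped dict."""
    problems = []
    for item in response.get("responses", []):
        finding = item.get("finding")
        action = item.get("action")
        if action not in VALID_ACTIONS:
            problems.append(f"finding {finding}: unknown action {action!r}")
        elif action == "DISPUTED" and not item.get("justification"):
            # Assumption: a dispute without reasoning cannot be evaluated.
            problems.append(f"finding {finding}: DISPUTED needs a justification")
    return problems


def next_step(round_number: int, certified: bool) -> str:
    """Decide what follows a review round: done, recheck, or escalate."""
    if certified:
        return "done"
    if round_number >= MAX_ROUNDS:
        return "escalate"  # hand open findings to the user
    return "recheck"
```

With these rules, a NOT CERTIFIED verdict in rounds 1 to 3 leads to another targeted recheck, and the same verdict in round 4 escalates to the user.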
@@ -1937,111 +2024,6 @@ CI passes -> Read review suggestions
|
|
|
1937
2024
|
|
|
1938
2025
|
---
|
|
1939
2026
|
|
|
1940
|
-
## Step 7: Create Testing Skill
|
|
1941
|
-
|
|
1942
|
-
Create `.claude/skills/testing/SKILL.md`:
|
|
1943
|
-
|
|
1944
|
-
````markdown
|
|
1945
|
-
---
|
|
1946
|
-
name: testing
|
|
1947
|
-
description: TDD and testing philosophy for writing tests, test-driven development, integration tests, and unit tests. Use this skill when writing tests, doing TDD, or debugging test issues.
|
|
1948
|
-
argument-hint: [test type] [target]
|
|
1949
|
-
---
|
|
1950
|
-
# Testing Skill - TDD & Testing Philosophy
|
|
1951
|
-
|
|
1952
|
-
## Task
|
|
1953
|
-
$ARGUMENTS
|
|
1954
|
-
|
|
1955
|
-
## Testing Diamond (CRITICAL)
|
|
1956
|
-
|
|
1957
|
-
```
|
|
1958
|
-
/\ ← Few E2E (automated or manual sign-off at end)
|
|
1959
|
-
/ \
|
|
1960
|
-
/ \
|
|
1961
|
-
/------\
|
|
1962
|
-
| | ← MANY Integration (real DB, real cache - BEST BANG FOR BUCK)
|
|
1963
|
-
| |
|
|
1964
|
-
\------/
|
|
1965
|
-
\ /
|
|
1966
|
-
\ /
|
|
1967
|
-
\/ ← Few Unit (pure logic only)
|
|
1968
|
-
```
|
|
1969
|
-
|
|
1970
|
-
**Why Integration Tests are Best Bang for Buck:**
|
|
1971
|
-
- **Speed**: Fast enough to run on every change
|
|
1972
|
-
- **Stability**: Touch real code, not mocks that lie
|
|
1973
|
-
- **Confidence**: If they pass, production usually works
|
|
1974
|
-
- **Real bugs**: Integration tests with real DB catch real bugs
|
|
1975
|
-
- Unit tests with mocks can "pass" while production fails
|
|
1976
|
-
|
|
1977
|
-
## Minimal Mocking Philosophy
|
|
1978
|
-
|
|
1979
|
-
| What | Mock? | Why |
|
|
1980
|
-
|------|-------|-----|
|
|
1981
|
-
| Database | ❌ NEVER | Use test DB or in-memory |
|
|
1982
|
-
| Cache | ❌ NEVER | Use isolated test instance |
|
|
1983
|
-
| External APIs | ✅ YES | Real calls = flaky + expensive |
|
|
1984
|
-
| Time/Date | ✅ YES | Determinism |
|
|
1985
|
-
|
|
1986
|
-
**Mocks MUST come from REAL captured data:**
|
|
1987
|
-
- Capture real API response
|
|
1988
|
-
- Save to your fixtures directory (Claude will discover where yours is, e.g., `tests/fixtures/`, `test-data/`, etc.)
|
|
1989
|
-
- Import in tests
|
|
1990
|
-
- Never guess mock shapes!
|
|
1991
|
-
|
|
1992
|
-
## TDD Tests Must PROVE
|
|
1993
|
-
|
|
1994
|
-
| Phase | What It Proves |
|
|
1995
|
-
|-------|----------------|
|
|
1996
|
-
| RED | Test FAILS → Bug exists or feature missing |
|
|
1997
|
-
| GREEN | Test PASSES → Fix works or feature implemented |
|
|
1998
|
-
| Forever | Regression protection |
|
|
1999
|
-
|
|
2000
|
-
**WRONG approach:**
|
|
2001
|
-
```
|
|
2002
|
-
// ❌ Writing test that passes with current (buggy) code
|
|
2003
|
-
assert currentBuggyBehavior == currentBuggyBehavior // pseudocode
|
|
2004
|
-
```
|
|
2005
|
-
|
|
2006
|
-
**CORRECT approach:**
|
|
2007
|
-
```
|
|
2008
|
-
// ✅ Writing test that FAILS with buggy code, PASSES with fix
|
|
2009
|
-
assert result.status == 'success' // pseudocode - adapt to your framework
|
|
2010
|
-
assert result.data != null
|
|
2011
|
-
```
|
|
2012
|
-
|
|
2013
|
-
## Unit Tests = Pure Logic ONLY
|
|
2014
|
-
|
|
2015
|
-
A function qualifies for unit testing ONLY if:
|
|
2016
|
-
- ✅ No database calls
|
|
2017
|
-
- ✅ No external API calls
|
|
2018
|
-
- ✅ No file system access
|
|
2019
|
-
- ✅ No cache calls
|
|
2020
|
-
- ✅ Input → Output transformation only
|
|
2021
|
-
|
|
2022
|
-
Everything else needs integration tests.
|
|
2023
|
-
|
|
2024
|
-
## When Stuck on Tests
|
|
2025
|
-
|
|
2026
|
-
1. Add console.logs → Check output
|
|
2027
|
-
2. Run single test in isolation
|
|
2028
|
-
3. Check fixtures match real API
|
|
2029
|
-
4. **STILL stuck?** ASK USER
|
|
2030
|
-
|
|
2031
|
-
## After Session (Capture Learnings)
|
|
2032
|
-
|
|
2033
|
-
If this session revealed testing insights, update the right place:
|
|
2034
|
-
- **Testing patterns, gotchas** → `TESTING.md`
|
|
2035
|
-
- **Feature-specific test quirks** → Feature docs (`*_PLAN.md`)
|
|
2036
|
-
- **General project context** → `CLAUDE.md` (or `/revise-claude-md`)
|
|
2037
|
-
|
|
2038
|
-
---
|
|
2039
|
-
|
|
2040
|
-
**Full reference:** TESTING.md
|
|
2041
|
-
````
|
|
2042
|
-
|
|
2043
|
-
---
|
|
2044
|
-
|
|
2045
2027
|
### Visual Regression Testing (Experimental - Niche Use Cases Only)
|
|
2046
2028
|
|
|
2047
2029
|
**Most apps don't need this.** Standard E2E testing (Playwright, Cypress) covers 99% of UI testing needs.
|
|
@@ -2230,9 +2212,27 @@ These are your full reference docs. Start with stubs and expand over time:
|
|
|
2230
2212
|
|
|
2231
2213
|
**Claude follows this automatically.** When task involves "deploy to prod" and confidence is LOW, Claude will ask before proceeding.
|
|
2232
2214
|
|
|
2215
|
+
## Post-Deploy Verification
|
|
2216
|
+
|
|
2217
|
+
**After deploying to ANY environment, verify it's working:**
|
|
2218
|
+
|
|
2219
|
+
| Environment | Health Check | Log Command | Smoke Test |
|
|
2220
|
+
|-------------|-------------|-------------|------------|
|
|
2221
|
+
| Local Dev | `curl http://localhost:3000/health` | `[your dev log command]` | `npm run test:smoke` |
|
|
2222
|
+
| Staging | `curl https://staging.example.com/health` | `[your staging log command]` | `[your staging smoke test]` |
|
|
2223
|
+
| Production | `curl https://example.com/health` | `[your prod log command, e.g., kubectl logs]` | `[your prod smoke test]` |
|
|
2224
|
+
|
|
2225
|
+
**Monitoring after production deploy:**
|
|
2226
|
+
1. Watch error rates for 15 minutes (dashboard: `[your monitoring URL]`)
|
|
2227
|
+
2. Check application logs for new errors: `[your log command]`
|
|
2228
|
+
3. Run smoke tests against production: `[your smoke test command]`
|
|
2229
|
+
4. If issues found → rollback first, THEN start new SDLC loop to fix
|
|
2230
|
+
|
|
2231
|
+
**Claude follows this automatically.** After a deploy task, Claude runs through the Post-Deploy Verification table for the target environment. If any check fails, Claude suggests rollback and a new fix cycle.
|
|
2232
|
+
|
|
2233
2233
|
## Rollback
|
|
2234
2234
|
|
|
2235
|
-
If deployment fails or
|
|
2235
|
+
If deployment fails or post-deploy verification catches issues:
|
|
2236
2236
|
|
|
2237
2237
|
| Environment | Rollback Command | Notes |
|
|
2238
2238
|
|-------------|------------------|-------|
|
|
@@ -2266,7 +2266,7 @@ If deployment fails or causes issues:
|
|
|
2266
2266
|
|
|
2267
2267
|
**SDLC.md:**
|
|
2268
2268
|
```markdown
|
|
2269
|
-
<!-- SDLC Wizard Version: 1.
|
|
2269
|
+
<!-- SDLC Wizard Version: 1.18.0 -->
|
|
2270
2270
|
<!-- Setup Date: [DATE] -->
|
|
2271
2271
|
<!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
|
|
2272
2272
|
<!-- Git Workflow: [PRs or Solo] -->
|
|
@@ -2298,7 +2298,7 @@ See `.claude/skills/sdlc/SKILL.md` for the enforced checklist.
|
|
|
2298
2298
|
```markdown
|
|
2299
2299
|
# Testing Guidelines
|
|
2300
2300
|
|
|
2301
|
-
See
|
|
2301
|
+
See `TESTING.md` for TDD philosophy.
|
|
2302
2302
|
|
|
2303
2303
|
## Test Commands
|
|
2304
2304
|
|
|
@@ -2394,7 +2394,6 @@ Verification Results:
|
|
|
2394
2394
|
├── .claude/hooks/tdd-pretool-check.sh ✓ executable
|
|
2395
2395
|
├── .claude/settings.json ✓ valid JSON
|
|
2396
2396
|
├── .claude/skills/sdlc/SKILL.md ✓ frontmatter OK
|
|
2397
|
-
├── .claude/skills/testing/SKILL.md ✓ frontmatter OK
|
|
2398
2397
|
├── CLAUDE.md ✓ exists
|
|
2399
2398
|
├── SDLC.md ✓ exists
|
|
2400
2399
|
└── TESTING.md ✓ exists
|
|
@@ -2422,14 +2421,14 @@ All checks passed! Setup complete.
|
|
|
2422
2421
|
|------|-----------------|
|
|
2423
2422
|
| "What files handle auth?" | Answers without invoking skills |
|
|
2424
2423
|
| "Add a logout button" | Auto-invokes sdlc skill, uses TodoWrite |
|
|
2425
|
-
| "Write tests for login" | Auto-invokes
|
|
2424
|
+
| "Write tests for login" | Auto-invokes sdlc skill |
|
|
2426
2425
|
|
|
2427
2426
|
**What happens automatically:**
|
|
2428
2427
|
|
|
2429
2428
|
| You Do | System Does |
|
|
2430
2429
|
|--------|-------------|
|
|
2431
2430
|
| Ask to implement something | SDLC skill auto-invokes, TodoWrite starts |
|
|
2432
|
-
| Ask to write tests |
|
|
2431
|
+
| Ask to write tests | SDLC skill auto-invokes |
|
|
2433
2432
|
| Claude tries to edit code | TDD reminder fires |
|
|
2434
2433
|
| Task completes | Compliance check runs |
|
|
2435
2434
|
|
|
@@ -2494,7 +2493,7 @@ All checks passed! Setup complete.
|
|
|
2494
2493
|
|--------|---------|
|
|
2495
2494
|
| Free context after planning | `/compact` |
|
|
2496
2495
|
| Enter planning mode | Claude suggests or `/plan` |
|
|
2497
|
-
| Run specific skill | `/sdlc`
|
|
2496
|
+
| Run specific skill | `/sdlc` |
|
|
2498
2497
|
|
|
2499
2498
|
---
|
|
2500
2499
|
|
|
@@ -2525,7 +2524,7 @@ You've successfully set up the system when:
|
|
|
2525
2524
|
|
|
2526
2525
|
- [ ] Light hook fires every prompt (you see SDLC BASELINE in responses)
|
|
2527
2526
|
- [ ] Claude auto-invokes sdlc skill for implementation tasks
|
|
2528
|
-
- [ ] Claude auto-invokes
|
|
2527
|
+
- [ ] Claude auto-invokes sdlc skill for all tasks
|
|
2529
2528
|
- [ ] Claude uses TodoWrite to track progress
|
|
2530
2529
|
- [ ] Claude states confidence levels
|
|
2531
2530
|
- [ ] Claude asks for clarification when LOW confidence
|
|
@@ -2852,13 +2851,14 @@ Use an independent AI model from a different company as a code reviewer. The aut
|
|
|
2852
2851
|
{
|
|
2853
2852
|
"review_id": "feature-xyz-001",
|
|
2854
2853
|
"status": "PENDING_REVIEW",
|
|
2854
|
+
"round": 1,
|
|
2855
2855
|
"files_changed": ["src/auth.ts", "tests/auth.test.ts"],
|
|
2856
2856
|
"review_instructions": "Review for security, edge cases, and correctness",
|
|
2857
2857
|
"artifact_path": ".reviews/feature-xyz-001/"
|
|
2858
2858
|
}
|
|
2859
2859
|
```
|
|
2860
2860
|
|
|
2861
|
-
3. Run the independent reviewer:
|
|
2861
|
+
3. Run the independent reviewer (Round 1 — full review). These commands use your Codex default model — configure it to the latest, most capable model available:
|
|
2862
2862
|
|
|
2863
2863
|
```bash
|
|
2864
2864
|
codex exec \
|
|
@@ -2866,23 +2866,95 @@ codex exec \
|
|
|
2866
2866
|
-s danger-full-access \
|
|
2867
2867
|
-o .reviews/latest-review.md \
|
|
2868
2868
|
"You are an independent code reviewer. Read .reviews/handoff.json, \
|
|
2869
|
-
review the listed files
|
|
2869
|
+
review the listed files. Output each finding with: an ID (1, 2, ...), \
|
|
2870
|
+
severity (P0/P1/P2), description, and a 'certify condition' stating \
|
|
2871
|
+
what specific change would resolve it. \
|
|
2870
2872
|
End with CERTIFIED or NOT CERTIFIED."
|
|
2871
2873
|
```
|
|
2872
2874
|
|
|
2873
|
-
|
|
2875
|
+
4. If CERTIFIED → done. If NOT CERTIFIED → enter the dialogue loop.
|
|
2876
|
+
|
|
2877
|
+
**The Dialogue Loop (Round 2+):**
|
|
2878
|
+
|
|
2879
|
+
Instead of silently fixing everything and resubmitting for another full review, respond to each finding:
|
|
2880
|
+
|
|
2881
|
+
```jsonc
|
|
2882
|
+
// .reviews/response.json
|
|
2883
|
+
{
|
|
2884
|
+
"review_id": "feature-xyz-001",
|
|
2885
|
+
"round": 2,
|
|
2886
|
+
"responding_to": ".reviews/latest-review.md",
|
|
2887
|
+
"responses": [
|
|
2888
|
+
{
|
|
2889
|
+
"finding": "1",
|
|
2890
|
+
"action": "FIXED",
|
|
2891
|
+
"summary": "Added missing mocking table to SKILL.md",
|
|
2892
|
+
"evidence": "git diff shows table at SKILL.md:195-210"
|
|
2893
|
+
},
|
|
2894
|
+
{
|
|
2895
|
+
"finding": "2",
|
|
2896
|
+
"action": "DISPUTED",
|
|
2897
|
+
"justification": "The upgrade path cleanup runs in init.js:205. Verified with test-cli.sh test 29.",
|
|
2898
|
+
"evidence": "tests/test-cli.sh:583-600"
|
|
2899
|
+
},
|
|
2900
|
+
{
|
|
2901
|
+
"finding": "3",
|
|
2902
|
+
"action": "ACCEPTED",
|
|
2903
|
+
"summary": "Will add EVAL_PROMPT_VERSION bump"
|
|
2904
|
+
}
|
|
2905
|
+
]
|
|
2906
|
+
}
|
|
2907
|
+
```
|
|
2908
|
+
|
|
2909
|
+
Three response types:
|
|
2910
|
+
- **FIXED**: "I fixed this. Here is what changed." Reviewer verifies the fix.
|
|
2911
|
+
- **DISPUTED**: "This is intentional/incorrect. Here is why." Reviewer accepts or rejects the reasoning.
|
|
2912
|
+
- **ACCEPTED**: "You are right. Fixing now." (Same outcome as FIXED, used when batching fixes.)
|
|
2913
|
+
|
|
2914
|
+
Then update `handoff.json` to `"status": "PENDING_RECHECK"`, increment `round`, add `"response_path"` and `"previous_review"` fields. Run a targeted recheck:
|
|
2915
|
+
|
|
2916
|
+
```bash
|
|
2917
|
+
codex exec \
|
|
2918
|
+
-c 'model_reasoning_effort="xhigh"' \
|
|
2919
|
+
-s danger-full-access \
|
|
2920
|
+
-o .reviews/latest-review.md \
|
|
2921
|
+
"You are doing a TARGETED RECHECK. First read .reviews/handoff.json \
|
|
2922
|
+
to find the previous_review path — read that file for the original \
|
|
2923
|
+
findings and certify conditions. Then read .reviews/response.json \
|
|
2924
|
+
for the author's responses. For each: \
|
|
2925
|
+
FIXED → verify the fix against the original certify condition. \
|
|
2926
|
+
DISPUTED → evaluate the justification (ACCEPT if sound, REJECT if not). \
|
|
2927
|
+
ACCEPTED → verify it was applied. \
|
|
2928
|
+
Do NOT raise new findings unless P0 (critical/security). \
|
|
2929
|
+
New observations go in 'Notes for next review' (non-blocking). \
|
|
2930
|
+
End with CERTIFIED or NOT CERTIFIED."
|
|
2874
2931
|
```
|
|
2875
|
-
|
|
2932
|
+
|
|
2933
|
+
**The key constraint:** Rechecks are scoped to previous findings only. The reviewer cannot block certification with new P1 or P2 observations discovered during recheck; only a new P0 (critical/security) finding may be raised. This prevents scope creep and ensures convergence.
|
|
2934
|
+
|
|
2935
|
+
**Convergence:** Max 3 recheck rounds (4 total including initial review). If still NOT CERTIFIED after round 4, escalate to the user with a summary of all open findings. Don't spin indefinitely.
|
|
2936
|
+
|
|
2937
|
+
```
|
|
2938
|
+
Claude writes code → self-review passes → handoff.json (round 1)
|
|
2876
2939
|
↑ |
|
|
2877
2940
|
| v
|
|
2878
|
-
|
|
|
2941
|
+
| Reviewer: FULL REVIEW
|
|
2942
|
+
| (structured findings with IDs)
|
|
2879
2943
|
| |
|
|
2880
2944
|
| CERTIFIED? -+→ YES → Done
|
|
2881
2945
|
| |
|
|
2882
2946
|
| +→ NO (findings)
|
|
2883
2947
|
| |
|
|
2884
|
-
|
|
2885
|
-
|
|
2948
|
+
| Claude writes response.json:
|
|
2949
|
+
| FIXED / DISPUTED / ACCEPTED
|
|
2950
|
+
| |
|
|
2951
|
+
| Reviewer: TARGETED RECHECK
|
|
2952
|
+
| (previous findings only, no new P1/P2)
|
|
2953
|
+
| |
|
|
2954
|
+
| All resolved? → YES → CERTIFIED
|
|
2955
|
+
| |
|
|
2956
|
+
└────────── Fix rejected items ←───────────┘
|
|
2957
|
+
(max 3 rechecks, then escalate to user)
|
|
2886
2958
|
```
|
|
2887
2959
|
|
|
2888
2960
|
**Key flags:**
|
|
@@ -2952,10 +3024,13 @@ If Claude repeatedly struggles in a codebase area:
|
|
|
2952
3024
|
|
|
2953
3025
|
### How to Update
|
|
2954
3026
|
|
|
2955
|
-
|
|
3027
|
+
Use the `/update-wizard` skill for a guided, selective update experience:
|
|
3028
|
+
> `/update-wizard` — full guided update (shows changelog, per-file diff, selective adoption)
|
|
3029
|
+
> `/update-wizard check-only` — just show what changed, don't apply anything
|
|
3030
|
+
> `/update-wizard force-all` — apply all updates without per-file approval
|
|
3031
|
+
|
|
3032
|
+
Or ask Claude directly:
|
|
2956
3033
|
> "Check for SDLC wizard updates"
|
|
2957
|
-
> "Run me through the SDLC wizard"
|
|
2958
|
-
> "What am I missing from the latest wizard?"
|
|
2959
3034
|
> "Update my SDLC setup"
|
|
2960
3035
|
|
|
2961
3036
|
**All of these do the same thing:** Claude checks what's new, shows you, and walks you through only what's missing.
|
|
@@ -3034,7 +3109,7 @@ Walk through updates? (y/n)
|
|
|
3034
3109
|
Store wizard state in `SDLC.md` as metadata comments (invisible to readers, parseable by Claude):
|
|
3035
3110
|
|
|
3036
3111
|
```markdown
|
|
3037
|
-
<!-- SDLC Wizard Version: 1.
|
|
3112
|
+
<!-- SDLC Wizard Version: 1.18.0 -->
|
|
3038
3113
|
<!-- Setup Date: 2026-01-24 -->
|
|
3039
3114
|
<!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
|
|
3040
3115
|
<!-- Git Workflow: PRs -->
|
|
@@ -3067,12 +3142,12 @@ Every wizard step has a unique ID for tracking:
|
|
|
3067
3142
|
| `step-4` | Light hook | 1.0.0 |
|
|
3068
3143
|
| `step-5` | TDD hook | 1.0.0 |
|
|
3069
3144
|
| `step-6` | SDLC skill | 1.0.0 |
|
|
3070
|
-
| `step-7` | Testing skill | 1.0.0 |
|
|
3071
3145
|
| `step-8` | CLAUDE.md | 1.0.0 |
|
|
3072
3146
|
| `step-9` | SDLC/TESTING/ARCH docs | 1.0.0 |
|
|
3073
3147
|
| `question-git-workflow` | Git workflow preference | 1.2.0 |
|
|
3074
3148
|
| `step-update-notify` | Optional: CI update notification | 1.13.0 |
|
|
3075
3149
|
| `step-cross-model-review` | Optional: Cross-model review loop | 1.16.0 |
|
|
3150
|
+
| `step-update-wizard` | `/update-wizard` smart update skill | 1.18.0 |
|
|
3076
3151
|
|
|
3077
3152
|
When checking for updates, Claude compares user's completed steps against this registry.
|
|
3078
3153
|
|
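The comparison of completed steps against the step registry can be pictured as a small parsing exercise over the metadata comments in `SDLC.md`. The sketch below is a rough illustration, not how the wizard works internally (the wizard relies on Claude reading the file). `parse_metadata` and `missing_steps` are hypothetical names, and the regular expression assumes each comment sits on one line in the `<!-- Key: value -->` form shown in this diff.

```python
import re

# Matches single-line metadata comments such as:
#   <!-- SDLC Wizard Version: 1.18.0 -->
METADATA_RE = re.compile(r"<!--\s*([^:\n]+?):\s*(.*?)\s*-->")


def parse_metadata(text: str) -> dict:
    """Collect `Key: value` pairs from HTML comments in the given text."""
    return {key.strip(): value for key, value in METADATA_RE.findall(text)}


def missing_steps(text: str, registry: list) -> list:
    """Return registry step IDs absent from the 'Completed Steps' comment."""
    meta = parse_metadata(text)
    raw = meta.get("Completed Steps", "")
    completed = {step.strip() for step in raw.split(",") if step.strip()}
    return [step for step in registry if step not in completed]
```

Given an `SDLC.md` that lists `step-1` and `step-2` as completed, calling `missing_steps` with a registry that also contains `step-update-wizard` returns just that one ID, which is the step an update would offer.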