agentic-sdlc-wizard 1.15.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,363 @@
+ ---
+ name: sdlc
+ description: Full SDLC workflow for implementing features, fixing bugs, refactoring code, and creating new functionality. Use this skill when implementing, fixing, refactoring, adding features, or building new code.
+ argument-hint: [task description]
+ effort: high
+ ---
+ # SDLC Skill - Full Development Workflow
+
+ ## Task
+ $ARGUMENTS
+
+ ## Full SDLC Checklist
+
+ Your FIRST action must be TodoWrite with these steps:
+
+ ```
+ TodoWrite([
+   // PLANNING PHASE (Plan Mode for non-trivial tasks)
+   { content: "Find and read relevant documentation", status: "in_progress", activeForm: "Reading docs" },
+   { content: "Assess doc health - flag issues (ask before cleaning)", status: "pending", activeForm: "Checking doc health" },
+   { content: "DRY scan: What patterns exist to reuse?", status: "pending", activeForm: "Scanning for reusable patterns" },
+   { content: "Blast radius: What depends on code I'm changing?", status: "pending", activeForm: "Checking dependencies" },
+   { content: "Design system check (if UI change)", status: "pending", activeForm: "Checking design system" },
+   { content: "Restate task in own words - verify understanding", status: "pending", activeForm: "Verifying understanding" },
+   { content: "Scrutinize test design - right things tested? Follow TESTING.md?", status: "pending", activeForm: "Reviewing test approach" },
+   { content: "Present approach + STATE CONFIDENCE LEVEL", status: "pending", activeForm: "Presenting approach" },
+   { content: "Signal ready - user exits plan mode", status: "pending", activeForm: "Awaiting plan approval" },
+   // TRANSITION PHASE (After plan mode, before compact)
+   { content: "Update feature docs with discovered gotchas", status: "pending", activeForm: "Updating feature docs" },
+   { content: "Request /compact before TDD", status: "pending", activeForm: "Requesting compact" },
+   // IMPLEMENTATION PHASE (After compact)
+   { content: "TDD RED: Write failing test FIRST", status: "pending", activeForm: "Writing failing test" },
+   { content: "TDD GREEN: Implement, verify test passes", status: "pending", activeForm: "Implementing feature" },
+   { content: "Run lint/typecheck", status: "pending", activeForm: "Running lint and typecheck" },
+   { content: "Run ALL tests", status: "pending", activeForm: "Running all tests" },
+   { content: "Production build check", status: "pending", activeForm: "Verifying production build" },
+   // REVIEW PHASE
+   { content: "DRY check: Is logic duplicated elsewhere?", status: "pending", activeForm: "Checking for duplication" },
+   { content: "Visual consistency check (if UI change)", status: "pending", activeForm: "Checking visual consistency" },
+   { content: "Self-review: run /code-review", status: "pending", activeForm: "Running code review" },
+   { content: "Security review (if warranted)", status: "pending", activeForm: "Checking security implications" },
+   { content: "Cross-model review (if configured — see below)", status: "pending", activeForm: "Running cross-model review" },
+   // CI FEEDBACK LOOP (if CI monitoring enabled in setup - skip if no CI)
+   { content: "Commit and push to remote", status: "pending", activeForm: "Pushing to remote" },
+   { content: "Watch CI - fix failures, iterate until green (max 2x)", status: "pending", activeForm: "Watching CI" },
+   { content: "Read CI review - implement valid suggestions, iterate until clean", status: "pending", activeForm: "Addressing CI review feedback" },
+   // FINAL
+   { content: "Present summary: changes, tests, CI status", status: "pending", activeForm: "Presenting final summary" }
+ ])
+ ```
+
+ ## New Pattern & Test Design Scrutiny (PLANNING)
+
+ **New design patterns require human approval:**
+ 1. Search first - do similar patterns exist in the codebase?
+ 2. If YES and they're good - use them as a building block
+ 3. If YES but they're bad - propose an improvement, get approval
+ 4. If NO (new pattern) - explain why it's needed, get explicit approval
+
+ **Test design scrutiny during planning:**
+ - Are we testing the right things?
+ - Does the test approach follow TESTING.md philosophies?
+ - If introducing new test patterns, apply the same scrutiny as code patterns
+
+ ## Plan Mode Integration
+
+ **Use plan mode for:** Multi-file changes, new features, LOW confidence, bugs needing investigation.
+
+ **Workflow:**
+ 1. **Plan Mode** (editing blocked): Research -> Write plan file -> Present approach + confidence
+ 2. **Transition** (after approval): Update feature docs -> Request /compact
+ 3. **Implementation** (after compact): TDD RED -> GREEN -> PASS
+
+ **Before TDD, MUST ask:** "Docs updated. Run `/compact` before implementation?"
+
+ ## Confidence Check (REQUIRED)
+
+ Before presenting your approach, STATE your confidence:
+
+ | Level | Meaning | Action |
+ |-------|---------|--------|
+ | HIGH (90%+) | Know exactly what to do | Present approach, proceed after approval |
+ | MEDIUM (60-89%) | Solid approach, some uncertainty | Present approach, highlight uncertainties |
+ | LOW (<60%) | Not sure | ASK USER before proceeding |
+ | FAILED 2x | Something's wrong | STOP. ASK USER immediately |
+ | CONFUSED | Can't diagnose why something is failing | STOP. Describe what you tried, ask for help |
+
+ ## Self-Review Loop (CRITICAL)
+
+ ```
+ PLANNING -> DOCS -> TDD RED -> TDD GREEN -> Tests Pass -> Self-Review
+    ^                                                           |
+    |                                                           v
+    |                                                     Issues found?
+    |                                                     |-- NO -> Present to user
+    |                                                     +-- YES
+    |                                                           v
+    +-------------------------------------- Ask user: fix in new plan?
+ ```
+
+ **The loop goes back to PLANNING, not TDD RED.** When self-review finds issues:
+ 1. Ask user: "Found issues. Want to create a plan to fix?"
+ 2. If yes -> back to PLANNING phase with new plan doc
+ 3. Then -> docs update -> TDD -> review (proper SDLC loop)
+
+ **How to self-review:**
+ 1. Run `/code-review` to review your changes
+ 2. It launches parallel agents (CLAUDE.md compliance, bug detection, logic & security)
+ 3. Issues at confidence >= 80 are real findings — go back to PLANNING to fix
+ 4. Issues below 80 are likely false positives — skip unless obviously valid
+ 5. Address issues by going back through the proper SDLC loop
+
+ ## Cross-Model Review (If Configured)
+
+ **When to run:** High-stakes changes (auth, payments, data handling), complex refactors, research-heavy work.
+ **When to skip:** Trivial changes (typo fixes, config tweaks), time-sensitive hotfixes, risk < review cost.
+
+ **Prerequisites:** Codex CLI installed (`npm i -g @openai/codex`), OpenAI API key set.
+
+ **Steps:**
+ 1. After self-review passes, write `.reviews/handoff.json`:
+    ```jsonc
+    {
+      "review_id": "feature-xyz-001",
+      "status": "PENDING_REVIEW",
+      "files_changed": ["src/auth.ts", "tests/auth.test.ts"],
+      "review_instructions": "Review for security, edge cases, and correctness",
+      "artifact_path": ".reviews/feature-xyz-001/"
+    }
+    ```
+ 2. Tell the user to run the independent reviewer:
+    ```bash
+    codex exec \
+      -c 'model_reasoning_effort="xhigh"' \
+      -s danger-full-access \
+      -o .reviews/latest-review.md \
+      "You are an independent code reviewer. Read .reviews/handoff.json, \
+      review the listed files, and write your findings to the artifact_path. \
+      End with CERTIFIED or NOT CERTIFIED."
+    ```
+ 3. Read `.reviews/latest-review.md` — if CERTIFIED, proceed to CI. If NOT CERTIFIED, fix findings and repeat from step 1.
+
+ ```
+ Self-review passes → write handoff.json → user runs codex exec
+         ^                                          |
+         |                                     CERTIFIED? → YES → CI feedback loop
+         |                                          |
+         |                                          → NO (findings)
+         |                                          |
+         └──────── Fix findings ←───────────────────┘
+                   (repeat until CERTIFIED, or ask user)
+ ```
+
+ **Tool-agnostic:** The value is adversarial diversity (different model, different blind spots), not the specific tool. Any competing AI reviewer works.
+
+ **Full protocol:** See the wizard's "Cross-Model Review Loop (Optional)" section for key flags and reasoning effort guidance.
+
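The verdict check in step 3 has one subtlety worth sketching: "NOT CERTIFIED" contains "CERTIFIED" as a substring, so a naive match misreads rejections as approvals. A minimal, hypothetical helper (not part of the wizard) that checks the negative form first:

```typescript
// parse-verdict.ts - hypothetical helper for reading the reviewer's verdict.
// Check "NOT CERTIFIED" before "CERTIFIED": the latter is a substring of the
// former, so a bare includes("CERTIFIED") check would misread a rejection.
type Verdict = "CERTIFIED" | "NOT_CERTIFIED" | "UNKNOWN";

function parseVerdict(reviewText: string): Verdict {
  const text = reviewText.toUpperCase();
  if (text.includes("NOT CERTIFIED")) return "NOT_CERTIFIED";
  if (text.includes("CERTIFIED")) return "CERTIFIED";
  return "UNKNOWN"; // malformed review: surface to the user instead of guessing
}
```

The UNKNOWN branch matters: a review with no verdict should go back to the user, not silently count as a pass.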
+ ## Test Review (Harder Than Implementation)
+
+ During self-review, critique tests HARDER than app code:
+ 1. **Testing the right things?** - Not just that tests pass
+ 2. **Tests prove correctness?** - Or just verify current behavior?
+ 3. **Follow our philosophies (TESTING.md)?**
+    - Testing Diamond (integration-heavy)?
+    - Minimal mocking (real DB, mock external APIs only)?
+    - Real fixtures from captured data?
+
+ **Tests are the foundation.** Bad tests = false confidence = production bugs.
+
+ ## Flaky Test Recovery
+
+ When a test fails intermittently:
+ 1. **Don't dismiss it** — "flaky" means "bug we haven't found yet"
+ 2. **Identify the layer** — test code? app code? environment?
+ 3. **Stress-test** — run the suspect test N times to reproduce reliably
+ 4. **Fix root cause** — don't just retry-and-pray
+ 5. **If CI infrastructure** — make cosmetic steps non-blocking, keep quality gates strict
+
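The stress-test step above can be sketched as a small harness (hypothetical and framework-agnostic; real runners often have built-in repeat flags):

```typescript
// stress.ts - run a suspect test body N times and count failures (sketch).
// A test that fails even once in N runs is a bug to diagnose, not retry away.
async function stressTest(
  testBody: () => void | Promise<void>,
  runs: number
): Promise<{ passes: number; failures: number }> {
  let passes = 0;
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    try {
      await testBody(); // fresh invocation each run surfaces shared-state bugs
      passes++;
    } catch {
      failures++;
    }
  }
  return { passes, failures };
}
```

A non-zero failure count tells you the failure reproduces; the next step is isolating whether the fault is in test code, app code, or environment.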
+ ## Scope Guard (Stay in Your Lane)
+
+ **Only make changes directly related to the task.**
+
+ If you notice something else that should be fixed:
+ - NOTE it in your summary ("I noticed X could be improved")
+ - DON'T fix it unless asked
+
+ **Why this matters:** AI agents can drift into "helpful" changes that weren't requested. This creates unexpected diffs, breaks unrelated things, and makes code review harder.
+
+ ## Test Failure Recovery (SDET Philosophy)
+
+ ```
+ ┌───────────────────────────────────────────────────────────────────┐
+ │  ALL TESTS MUST PASS. NO EXCEPTIONS.                              │
+ │                                                                   │
+ │  This is not negotiable. This is not flexible. This is absolute.  │
+ └───────────────────────────────────────────────────────────────────┘
+ ```
+
+ **Not acceptable:**
+ - "Those were already failing" → Fix them first
+ - "Not related to my changes" → Doesn't matter, fix it
+ - "It's flaky" → Flaky = bug, investigate
+
+ **Treat test code like app code.** Test failures are bugs. Investigate them the way a 15-year SDET would - with thought and care, not by brushing them aside.
+
+ If tests fail:
+ 1. Identify which test(s) failed
+ 2. Diagnose WHY - this is the important part:
+    - Your code broke it? Fix your code (regression)
+    - Test is for deleted code? Delete the test
+    - Test has wrong assertions? Fix the test
+    - Test is "flaky"? Investigate - flakiness is just another word for bug
+ 3. Fix appropriately (fix the code, fix the test, or delete the dead test)
+ 4. Run the specific test individually first
+ 5. Then run ALL tests
+ 6. Still failing? ASK USER - don't spin your wheels
+
+ **Flaky tests are bugs, not mysteries:**
+ - Sometimes the bug is in app code (race condition, timing issue)
+ - Sometimes the bug is in test code (shared state, not parallel-safe)
+ - Sometimes the bug is in the test environment (improper cleanup)
+
+ Debug it. Find the root cause. Fix it properly. Tests ARE code.
+
+ ## CI Feedback Loop (After Commit)
+
+ **The SDLC doesn't end at local tests.** CI must pass too.
+
+ ```
+ Local tests pass -> Commit -> Push -> Watch CI
+                                          |
+                            CI passes? -+-> YES -> Present for review
+                                        |
+                                        +-> NO -> Fix -> Push -> Watch CI
+                                                   |
+                                             (max 2 attempts)
+                                                   |
+                                             Still failing?
+                                                   |
+                                           STOP and ASK USER
+ ```
+
+ **How to watch CI:**
+ 1. Push changes to remote
+ 2. Check CI status:
+    ```bash
+    # Watch checks in real-time (blocks until complete)
+    gh pr checks --watch
+
+    # Or check status without blocking
+    gh pr checks
+
+    # View specific failed run logs
+    gh run view <RUN_ID> --log-failed
+    ```
+ 3. If CI fails:
+    - Read failure logs: `gh run view <RUN_ID> --log-failed`
+    - Diagnose the root cause (same philosophy as local test failures)
+    - Fix and push again
+ 4. Max 2 fix attempts - if still failing, ASK USER
+ 5. If CI passes - proceed to present the final summary
+
+ **Context GC (compact during idle):** While waiting for CI (typically 3-5 min), suggest `/compact` if the conversation is long. Think of it like a time-based garbage collector — idle time + high memory pressure = good time to collect. Don't suggest it on short conversations.
+
+ **CI failures follow the same rules as test failures:**
+ - Your code broke it? Fix your code
+ - CI config issue? Fix the config
+ - Flaky? Investigate - flakiness is a bug
+ - Stuck? ASK USER
+
+ ## CI Review Feedback Loop (After CI Passes)
+
+ **CI passing isn't the end.** If CI includes a code reviewer, read and address its suggestions.
+
+ ```
+ CI passes -> Read review suggestions
+                      |
+  Valid improvements? -+-> YES -> Implement -> Run tests -> Push
+                       |                            |
+                       |                   Review again (iterate)
+                       |
+                       +-> NO (just opinions/style) -> Skip, note why
+                       |
+                       +-> None -> Done, present to user
+ ```
+
+ **How to evaluate suggestions:**
+ 1. Read all CI review comments: `gh api repos/OWNER/REPO/pulls/PR/comments`
+ 2. For each suggestion, ask: **"Is this a real improvement or just an opinion?"**
+    - **Real improvement:** Fixes a bug, improves performance, adds missing error handling, reduces duplication, improves test coverage → Implement it
+    - **Opinion/style:** Different but equivalent formatting, subjective naming preference, "you could also..." without clear benefit → Skip it
+ 3. Implement the valid ones, run tests locally, push
+ 4. CI re-reviews — repeat until no substantive suggestions remain
+ 5. Max 3 iterations — if the reviewer keeps finding new things, ASK USER
+
+ **The goal:** The user is only brought in at the very end, when both CI and the reviewer are satisfied. The code should be polished before human review.
+
+ **Customizable behavior** (set during wizard setup):
+ - **Auto-implement** (default): Implement valid suggestions autonomously, skip opinions
+ - **Ask first**: Present suggestions to the user, let them decide which to implement
+ - **Skip review feedback**: Ignore CI review suggestions, only fix CI failures
+
+ ## DRY Principle
+
+ **Before coding:** "What patterns exist that I can reuse?"
+ **After coding:** "Did I accidentally duplicate anything?"
+
+ ## Design System Check (If UI Change)
+
+ **When to check:** CSS/styling changes, new UI components, color/font usage.
+ **When to skip:** Backend-only changes, config/build changes, non-visual code.
+
+ **Planning phase - "Design system check":**
+ 1. Read DESIGN_SYSTEM.md if it exists
+ 2. Check if the change involves colors, fonts, spacing, or components
+ 3. Verify intended styles match design system tokens
+ 4. Flag if introducing new patterns not in the design system
+
+ **Review phase - "Visual consistency check":**
+ 1. Are colors from the design system palette?
+ 2. Are fonts/sizes from the typography scale?
+ 3. Are spacing values from the spacing scale?
+ 4. Do new components follow existing patterns?
+
+ **If no DESIGN_SYSTEM.md exists:** Skip these checks (the project has no documented design system).
+
+ ## Deployment Tasks (If Task Involves Deploy)
+
+ **When to check:** Task mentions "deploy", "release", "push to prod", "staging", etc.
+ **When to skip:** Code changes only, no deployment involved.
+
+ **Before any deployment:**
+ 1. Read ARCHITECTURE.md → Find the Environments table and Deployment Checklist
+ 2. Verify which environment is the target (dev/staging/prod)
+ 3. Follow the deployment checklist in ARCHITECTURE.md
+
+ **Confidence levels for deployment:**
+
+ | Target | Required Confidence | If Lower |
+ |--------|---------------------|----------|
+ | Dev/Preview | MEDIUM or higher | Proceed with caution |
+ | Staging | MEDIUM or higher | Proceed, note uncertainties |
+ | **Production** | **HIGH only** | **ASK USER before deploying** |
+
+ **Production deployment requires:**
+ - All tests passing
+ - Production build succeeding
+ - Changes tested in staging/preview first
+ - HIGH confidence (90%+)
+ - If ANY doubt → ASK USER first
+
+ **If ARCHITECTURE.md has no Environments section:** Ask the user "How do you deploy to [target]?" before proceeding.
+
+ ## DELETE Legacy Code
+
+ - Legacy code? DELETE IT
+ - Backwards compatibility? NO - DELETE IT
+ - "Just in case" fallbacks? DELETE IT
+
+ **THE RULE:** Delete old code first. If it breaks, fix it properly.
+
+ ---
+
+ **Full reference:** SDLC.md
@@ -0,0 +1,97 @@
+ ---
+ name: testing
+ description: TDD and testing philosophy for writing tests, test-driven development, integration tests, and unit tests. Use this skill when writing tests, doing TDD, or debugging test issues.
+ argument-hint: [test type] [target]
+ effort: high
+ ---
+ # Testing Skill - TDD & Testing Philosophy
+
+ ## Task
+ $ARGUMENTS
+
+ ## Testing Diamond (CRITICAL)
+
+ ```
+     /\         <- Few E2E (automated or manual sign-off at end)
+    /  \
+   /    \
+  /------\
+ |        |    <- MANY Integration (real DB, real cache - BEST BANG FOR BUCK)
+ |        |
+  \------/
+   \    /
+    \  /
+     \/        <- Few Unit (pure logic only)
+ ```
+
+ **Why Integration Tests are Best Bang for Buck:**
+ - **Speed**: Fast enough to run on every change
+ - **Stability**: Touch real code, not mocks that lie
+ - **Confidence**: If they pass, production usually works
+ - **Real bugs**: Integration tests with a real DB catch real bugs
+   - Unit tests with mocks can "pass" while production fails
+
+ ## Minimal Mocking Philosophy
+
+ | What | Mock? | Why |
+ |------|-------|-----|
+ | Database | NEVER | Use test DB or in-memory |
+ | Cache | NEVER | Use isolated test instance |
+ | External APIs | YES | Real calls = flaky + expensive |
+ | Time/Date | YES | Determinism |
+
+ **Mocks MUST come from REAL captured data:**
+ - Capture the real API response
+ - Save it to your fixtures directory (Claude will discover where yours is, e.g., `tests/fixtures/`, `test-data/`, etc.)
+ - Import it in tests
+ - Never guess mock shapes!
+
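A sketch of what capture-and-save might look like. The `tests/fixtures` path and the helper names are assumptions for illustration, not part of the skill; use whatever fixtures directory the project already has:

```typescript
// capture-fixture.ts - save a real API response as a test fixture (sketch).
// Run the capture once against the real API; tests then import the saved
// JSON instead of guessing the response shape.
import { writeFileSync, readFileSync, mkdirSync } from "node:fs";
import { join } from "node:path";

const FIXTURES_DIR = "tests/fixtures"; // assumption: adjust to your project

export function saveFixture(name: string, payload: unknown): void {
  mkdirSync(FIXTURES_DIR, { recursive: true });
  writeFileSync(
    join(FIXTURES_DIR, `${name}.json`),
    JSON.stringify(payload, null, 2)
  );
}

export function loadFixture<T>(name: string): T {
  return JSON.parse(
    readFileSync(join(FIXTURES_DIR, `${name}.json`), "utf8")
  ) as T;
}
```

Because the fixture is captured data, a schema drift in the real API shows up as a stale file to re-capture, not as a hand-written mock that silently lies.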
+ ## TDD Tests Must PROVE
+
+ | Phase | What It Proves |
+ |-------|----------------|
+ | RED | Test FAILS -> Bug exists or feature missing |
+ | GREEN | Test PASSES -> Fix works or feature implemented |
+ | Forever | Regression protection |
+
+ **WRONG approach:**
+ ```
+ // Writing a test that passes with the current (buggy) code
+ assert currentBuggyBehavior == currentBuggyBehavior // pseudocode
+ ```
+
+ **CORRECT approach:**
+ ```
+ // Writing a test that FAILS with the buggy code, PASSES with the fix
+ assert result.status == 'success' // pseudocode - adapt to your framework
+ assert result.data != null
+ ```
+
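As a concrete, hypothetical illustration of the correct approach: the assertion encodes the desired behavior, so it fails against the buggy implementation (RED) and passes once the fix lands (GREEN). Both functions below are invented for the example:

```typescript
// Hypothetical bug: parsing a currency string like "$12.50" yields NaN.

function parseAmountBuggy(input: string): number {
  return Number(input); // Number("$12.50") is NaN
}

function parseAmountFixed(input: string): number {
  return Number(input.replace(/^[^0-9.-]+/, "")); // strip non-numeric prefix first
}

// The test asserts the DESIRED behavior, not whatever the code currently does:
console.assert(Number.isNaN(parseAmountBuggy("$12.50"))); // RED: bug demonstrated
console.assert(parseAmountFixed("$12.50") === 12.5);      // GREEN: fix proven
```

The same assertion text runs in both phases; only the implementation changes, which is what makes the test a proof rather than a snapshot of current behavior.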
+ ## Unit Tests = Pure Logic ONLY
+
+ A function qualifies for unit testing ONLY if:
+ - No database calls
+ - No external API calls
+ - No file system access
+ - No cache calls
+ - Input -> Output transformation only
+
+ Everything else needs integration tests.
+
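For instance, a pure transformation like this hypothetical `slugify` qualifies: same input, same output, no I/O, so plain unit tests are cheap and deterministic:

```typescript
// Pure function: input -> output, no DB, no network, no filesystem, no cache.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into "-"
    .replace(/^-+|-+$/g, "");    // trim leading/trailing dashes
}

console.assert(slugify("Hello, World!") === "hello-world");
```

The moment such a function gains a DB lookup or an HTTP call, it drops out of the unit tier and belongs in an integration test with the real dependency.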
+ ## When Stuck on Tests
+
+ 1. Add console.logs -> Check output
+ 2. Run the single test in isolation
+ 3. Check fixtures match the real API
+ 4. **STILL stuck?** ASK USER
+
+ ## After Session (Capture Learnings)
+
+ If this session revealed testing insights, update the right place:
+ - **Testing patterns, gotchas** -> `TESTING.md`
+ - **Feature-specific test quirks** -> Feature docs (`*_PLAN.md`)
+ - **General project context** -> `CLAUDE.md` (or `/revise-claude-md`)
+
+ ---
+
+ **Full reference:** TESTING.md
package/package.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "name": "agentic-sdlc-wizard",
+   "version": "1.15.0",
+   "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
+   "bin": {
+     "sdlc-wizard": "./cli/bin/sdlc-wizard.js"
+   },
+   "files": [
+     "cli/",
+     "CLAUDE_CODE_SDLC_WIZARD.md",
+     "CHANGELOG.md"
+   ],
+   "keywords": [
+     "claude-code",
+     "sdlc",
+     "tdd",
+     "ai-agent",
+     "developer-tools",
+     "code-quality"
+   ],
+   "license": "MIT",
+   "repository": {
+     "type": "git",
+     "url": "https://github.com/BaseInfinity/agentic-ai-sdlc-wizard"
+   },
+   "engines": {
+     "node": ">=18.0.0"
+   }
+ }
+ }