sdlc-framework 3.0.0 → 3.1.0

package/README.md CHANGED
@@ -136,9 +136,10 @@ Follows: reproduce → isolate → root-cause → fix → verify.
136
136
 
137
137
  - Asks clarification questions (no guessing)
138
138
  - Presents options with trade-offs and recommendations
139
- - Defines BDD acceptance criteria (Given/When/Then)
139
+ - Defines BDD acceptance criteria (Given/When/Then) with **explicit verification types** — each AC declares whether it's tested via Playwright MCP, curl, Bash, test runner, or DB queries
140
140
  - Decomposes tasks with dependency graph for parallel execution
141
- - **Spec integrity review** — checks completeness, consistency, feasibility
141
+ - **Codebase gap analysis** — reads existing code the spec touches, flags missing error handling, edge cases, validation gaps, and integration risks before implementation starts
142
+ - **Spec integrity review** — checks completeness, consistency, feasibility, verification types, and gap analysis (7 checks)
142
143
  - **User approval gate** — spec must be explicitly approved before implementation starts
143
144
 
144
145
  ### IMPLEMENT — Build it
@@ -150,11 +151,14 @@ Follows: reproduce → isolate → root-cause → fix → verify.
150
151
 
151
152
  ### VERIFY — Test it
152
153
 
153
- - Translates acceptance criteria to automated tests
154
- - UI: Playwright MCP (navigate, click, assert)
155
- - API: curl requests with assertions
156
- - Logic: test suite execution
157
- - Reports pass/fail per AC with evidence
154
+ - Reads the **declared verification type** from each AC — no keyword guessing
155
+ - **UI_INTERACTION**: Playwright MCP (navigate, click, snapshot, screenshot, fill_form)
156
+ - **API_ENDPOINT**: curl requests with status code and body assertions
157
+ - **CLI_BEHAVIOR**: Bash command execution with stdout/stderr/exit code capture
158
+ - **BUSINESS_LOGIC**: project test runner (bun test, npm test, vitest, jest, pytest)
159
+ - **DATA_INTEGRITY**: database queries via CLI
160
+ - ACs missing a Type field are marked FAIL — never skipped, never guessed
161
+ - Reports pass/fail per AC with concrete evidence (screenshots, responses, test output)
158
162
 
159
163
  ### REVIEW — Check it
160
164
 
@@ -325,9 +329,15 @@ npx sdlc-framework@latest
325
329
 
326
330
  ---
327
331
 
328
- ## What's New in v3.0.0
332
+ ## What's New in v3.1.0
329
333
 
330
- **Fewer commands, faster flow, zero idle time.**
334
+ **Playwright actually works. Specs catch gaps before implementation.**
335
+
336
+ - **Explicit verification types on every AC** — Each acceptance criterion now declares a `Type` field (`UI_INTERACTION`, `API_ENDPOINT`, `CLI_BEHAVIOR`, `BUSINESS_LOGIC`, `DATA_INTEGRITY`). The verify phase reads this field directly instead of guessing from keywords. Playwright MCP is now reliably invoked for UI acceptance criteria.
337
+ - **Codebase gap analysis** — New mandatory step in `/sdlc:spec` reads existing code the spec touches and flags missing error handling, edge cases, validation gaps, integration risks, and untested paths. Gaps become new ACs and task updates before implementation starts.
338
+ - **7 integrity checks** (up from 5) — Spec integrity review now validates verification type correctness and gap analysis completion.
339
+
340
+ ### Previous: v3.0.0
331
341
 
332
342
  - **Research merged into Discuss** — `/sdlc:discuss` now spawns research subagents inline when unknowns are detected. No separate `/sdlc:research` command needed. One entry point for all pre-spec discovery.
333
343
  - **Transition absorbed into Close** — `/sdlc:close` handles phase transitions inline when the last plan completes. Phase completeness verification, PROJECT.md updates, git commits — all automatic. No separate `/sdlc:transition` command.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "sdlc-framework",
3
- "version": "3.0.0",
3
+ "version": "3.1.0",
4
4
  "description": "Structured Development Lifecycle - A closed-loop AI-assisted development framework for Claude Code",
5
5
  "bin": {
6
6
  "sdlc-framework": "bin/install.js"
@@ -78,16 +78,18 @@ Step-by-step:
78
78
  - Option C (do nothing / defer): description, risk of deferring.
79
79
  Ask the user to choose before proceeding.
80
80
 
81
- 5. **Define acceptance criteria in BDD format**
82
- Write Given/When/Then scenarios for each user-facing behavior:
81
+ 5. **Define acceptance criteria in BDD format with verification type**
82
+ Write Given/When/Then scenarios for each user-facing behavior. Every AC MUST include a `Type` field that declares how the verify phase will test it:
83
83
  ```
84
84
  AC-1: [Short title]
85
+ Type: [UI_INTERACTION | API_ENDPOINT | CLI_BEHAVIOR | BUSINESS_LOGIC | DATA_INTEGRITY]
85
86
  Given [precondition]
86
87
  When [action]
87
88
  Then [expected outcome]
88
89
  ```
90
+ The Type field is mandatory: it tells `/sdlc:verify` whether to use Playwright MCP, curl, Bash, the test runner, or DB queries. Without it, the AC is marked FAIL during verification; it is never skipped or guessed.
89
91
  Include happy path, error cases, and edge cases.
90
- Present acceptance criteria to the user and ask: "Do these cover all cases? Anything to add or change?"
92
+ Present acceptance criteria with their types to the user and ask: "Do these cover all cases? Are the verification types correct?"
91
93
 
92
94
  6. **Break into tasks with dependency analysis**
93
95
  - List every discrete task needed to implement the spec.
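The AC layout introduced in step 5 above (an `AC-N:` header, a `Type:` line, then Given/When/Then) is regular enough to parse mechanically. The following is a minimal Python sketch, assuming one field per line and the five Type values listed in that step. `parse_ac` and `AcceptanceCriterion` are hypothetical names used only for illustration; they are not part of sdlc-framework.

```python
import re
from dataclasses import dataclass
from typing import Optional

# The five Type values named in the spec step.
VALID_TYPES = {
    "UI_INTERACTION", "API_ENDPOINT", "CLI_BEHAVIOR",
    "BUSINESS_LOGIC", "DATA_INTEGRITY",
}

@dataclass
class AcceptanceCriterion:
    ac_id: str
    title: str
    type: Optional[str]  # None when the Type line is missing or not a valid value
    given: str
    when: str
    then: str

_HEADER = re.compile(r"^(AC-\d+):\s*(.+)$")
_FIELD = re.compile(r"^(Type:|Given|When|Then)\s+(.+)$", re.IGNORECASE)

def parse_ac(block: str) -> AcceptanceCriterion:
    """Parse one AC block (hypothetical helper, assumes one field per line)."""
    lines = [line.strip() for line in block.strip().splitlines()]
    header = _HEADER.match(lines[0]) if lines else None
    if header is None:
        raise ValueError("expected a first line of the form 'AC-N: title'")
    fields = {}
    for line in lines[1:]:
        match = _FIELD.match(line)
        if match:
            key = match.group(1).rstrip(":").lower()
            fields[key] = match.group(2).strip()
    declared = fields.get("type")
    return AcceptanceCriterion(
        ac_id=header.group(1),
        title=header.group(2),
        type=declared if declared in VALID_TYPES else None,
        given=fields.get("given", ""),
        when=fields.get("when", ""),
        then=fields.get("then", ""),
    )
```

A caller can treat `type is None` as the signal that the AC fails the Type requirement, without trying to infer a type from the Given/When/Then text.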
@@ -106,7 +108,16 @@ Step-by-step:
106
108
  - What is OUT of scope.
107
109
  - What is DEFERRED to a future plan.
108
110
 
109
- 8. **Write SPEC.md**
111
+ 8. **Codebase gap analysis — MANDATORY**
112
+ Read every file listed in the task breakdown. Review existing code for:
113
+ - Missing error handling that the spec should cover.
114
+ - Untested code paths that need new ACs.
115
+ - Edge cases the existing code handles but the spec does not mention.
116
+ - Integration risks (callers of modified functions, shared state, exports).
117
+ - Validation gaps at system boundaries.
118
+ Present a gap report to the user. Add new ACs (with verification_type) and update tasks for every gap found.
119
+
120
+ 9. **Write SPEC.md**
110
121
  Create .sdlc/phases/{phase}/{plan}-SPEC.md with:
111
122
  ```
112
123
  # Spec: <title>
@@ -123,42 +134,48 @@ Step-by-step:
123
134
  ## Engineering Constraints (from LAWS.md)
124
135
  ```
125
136
 
126
- 9. **Spec integrity review — MANDATORY, DO NOT SKIP**
127
- You MUST print a full integrity review table with ✓/✗ for EACH check:
128
- - CHECK 1 — COMPLETENESS: every task has name, action, files, verification, done criteria, complexity. Every AC has numbered GIVEN/WHEN/THEN with specific values.
129
- - CHECK 2 — CONSISTENCY: no orphan tasks (every task → AC), no orphan ACs (every AC → task), no boundary violations, no DAG cycles, no shared-file parallel writes.
130
- - CHECK 3 — CONTRADICTIONS: no conflicting ACs for same input, no conflicting task actions on same function, no task contradicting boundary, no AC contradicting PROJECT.md constraints.
131
- - CHECK 4 — FEASIBILITY: task count 2-5, estimated change under ~300 lines, all referenced files exist.
132
- - CHECK 5 — DEPENDENCY GRAPH: every task in exactly one wave, ordering matches dependencies, independent tasks parallelized, dependent tasks sequenced.
133
- - Print the full review table with all results. Fix any failures before proceeding.
134
-
135
- 10. **User approval gate** (BLOCKING cannot proceed without approval)
137
+ 10. **Spec integrity review — MANDATORY, DO NOT SKIP**
138
+ You MUST print a full integrity review table with ✓/✗ for EACH check:
139
+ - CHECK 1 — COMPLETENESS: every task has name, action, files, verification, done criteria, complexity. Every AC has numbered Type/GIVEN/WHEN/THEN with specific values.
140
+ - CHECK 2 — CONSISTENCY: no orphan tasks (every task → AC), no orphan ACs (every AC → task), no boundary violations, no DAG cycles, no shared-file parallel writes.
141
+ - CHECK 3 — CONTRADICTIONS: no conflicting ACs for same input, no conflicting task actions on same function, no task contradicting boundary, no AC contradicting PROJECT.md constraints.
142
+ - CHECK 4 — FEASIBILITY: task count 2-5, estimated change under ~300 lines, all referenced files exist.
143
+ - CHECK 5 — DEPENDENCY GRAPH: every task in exactly one wave, ordering matches dependencies, independent tasks parallelized, dependent tasks sequenced.
144
+ - CHECK 6 — VERIFICATION TYPES: every AC has a valid Type field (UI_INTERACTION, API_ENDPOINT, CLI_BEHAVIOR, BUSINESS_LOGIC, DATA_INTEGRITY). Type matches AC content.
145
+ - CHECK 7 — GAP ANALYSIS: codebase gap analysis was performed, gaps were reported, and findings were incorporated into ACs/tasks.
146
+ - Print the full review table with all results. Fix any failures before proceeding.
147
+
148
+ 11. **User approval gate** (BLOCKING — cannot proceed without approval)
136
149
  - Present full spec summary AND integrity review results to user
137
150
  - User options: APPROVE (proceed), REVISE (change and re-review), REJECT (discard)
138
151
  - If REVISE: apply changes, re-run ALL integrity checks, re-present
139
152
  - If REJECT: delete spec, stop
140
153
  - If APPROVE: proceed to update state
141
154
 
142
- 11. **Update STATE.md**
155
+ 12. **Update STATE.md**
143
156
  - Set `current_plan: <plan-id>`
144
157
  - Set `spec_path: .sdlc/phases/{phase}/{plan}-SPEC.md`
145
158
  - Set `loop_position: SPEC ✓`
146
159
  - Set `next_required_action: /sdlc:impl`
147
160
 
148
- 12. **Output confirmation**
149
- - Print spec summary: objective, number of ACs, number of tasks, execution waves.
161
+ 13. **Output confirmation**
162
+ - Print spec summary: objective, number of ACs (with verification type breakdown), number of tasks, execution waves.
150
163
  - End with auto-advance directive to /sdlc:impl
151
164
  </process>
152
165
 
153
166
  <success_criteria>
154
167
  - [ ] At least 2 clarification questions asked before writing the spec
155
168
  - [ ] Every design decision presented with trade-offs and user confirmation
156
- - [ ] Acceptance criteria written in BDD format (Given/When/Then)
169
+ - [ ] Acceptance criteria written in BDD format (Given/When/Then) with mandatory Type field
170
+ - [ ] Every AC has a verification_type: UI_INTERACTION, API_ENDPOINT, CLI_BEHAVIOR, BUSINESS_LOGIC, or DATA_INTEGRITY
171
+ - [ ] Verification types match AC content (UI ACs test browser state, API ACs test endpoints, etc.)
157
172
  - [ ] Acceptance criteria cover happy path, error cases, and edge cases
158
173
  - [ ] Tasks broken into dependency graph with execution waves
159
174
  - [ ] Each task identifies files to create/modify and complexity
160
175
  - [ ] Boundaries section explicitly states in-scope, out-of-scope, and deferred items
161
- - [ ] Spec integrity review passed (completeness, consistency, feasibility)
176
+ - [ ] Codebase gap analysis performed — existing code reviewed for missing logic, edge cases, and integration risks
177
+ - [ ] Gap analysis findings incorporated into ACs and tasks (no silent gaps)
178
+ - [ ] Spec integrity review passed (completeness, consistency, feasibility, verification types, gap analysis)
162
179
  - [ ] User explicitly approved the spec (APPROVE response received)
163
180
  - [ ] SPEC.md created at .sdlc/phases/{phase}/{plan}-SPEC.md
164
181
  - [ ] STATE.md updated with current_plan, spec_path, loop_position: SPEC ✓
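Parts of the integrity review above (CHECK 2's orphan and cycle rules, CHECK 5's wave assignment) are plain graph checks. The sketch below shows one way to express them in Python with the standard-library `graphlib` module (Python 3.9+). The task dictionary shape (`covers`, `depends_on`) and the function names are assumptions made for this example, not the framework's actual data model.

```python
from graphlib import CycleError, TopologicalSorter

def check_consistency(tasks: dict, ac_ids: set) -> list:
    """Return failure messages for orphan tasks, orphan ACs, and DAG cycles.

    tasks maps a task id to {"covers": set of AC ids, "depends_on": set of task ids}.
    This shape is hypothetical and chosen only for the example.
    """
    failures = []
    covered = set()
    for task_id, task in tasks.items():
        if not task["covers"]:
            failures.append(f"orphan task: {task_id}")
        covered |= task["covers"]
    for ac_id in sorted(ac_ids - covered):
        failures.append(f"orphan AC: {ac_id}")
    graph = {task_id: task["depends_on"] for task_id, task in tasks.items()}
    try:
        TopologicalSorter(graph).prepare()  # raises CycleError on a cycle
    except CycleError:
        failures.append("dependency cycle")
    return failures

def execution_waves(tasks: dict) -> list:
    """Group tasks into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(
        {task_id: task["depends_on"] for task_id, task in tasks.items()}
    )
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Because every task is emitted by `get_ready()` exactly once, each task lands in exactly one wave, which is the property CHECK 5 asks for.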
@@ -49,18 +49,18 @@ Step-by-step:
49
49
  - Read SPEC.md. Extract all acceptance criteria (AC-1, AC-2, ...).
50
50
  - Read IMPL.md for the list of modified files and build status.
51
51
 
52
- 2. **Classify each acceptance criterion by verification type**
53
- For each AC, determine the appropriate test strategy:
54
-
55
- | Work Type | Verification Method | Tools Used |
56
- |-----------|-------------------|------------|
57
- | UI/Frontend | Playwright MCP browser automation | browser_navigate, browser_click, browser_snapshot, browser_fill_form |
58
- | REST API | HTTP requests via curl or fetch | Bash (curl commands) |
59
- | GraphQL API | GraphQL queries via HTTP | Bash (curl commands) |
60
- | CLI tool | Shell command execution | Bash |
61
- | Business logic | Test runner (jest, vitest, pytest, etc.) | Bash (npm test, etc.) |
62
- | Database | Query verification | Bash (database CLI tools) |
63
- | File output | File content inspection | Read, Grep |
52
+ 2. **Read verification types from SPEC.md — DO NOT GUESS**
53
+ Each AC in the SPEC.md has a declared `Type` field. Read it directly. Do NOT infer or guess the type from keywords.
54
+
55
+ | AC Type (from SPEC.md) | Verification Tool | Actions |
56
+ |------------------------|------------------|---------|
57
+ | UI_INTERACTION | Playwright MCP | browser_navigate, browser_snapshot, browser_click, browser_fill_form, browser_take_screenshot |
58
+ | API_ENDPOINT | Bash curl | HTTP requests, response parsing |
59
+ | CLI_BEHAVIOR | Bash | Shell commands, stdout/stderr/exit code capture |
60
+ | BUSINESS_LOGIC | Test runner | bun test, npm test, vitest, jest, pytest |
61
+ | DATA_INTEGRITY | DB CLI | Database queries via Bash |
62
+
63
+ If any AC is missing its Type field, mark it FAIL immediately — do not skip it or guess.
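The table above amounts to a lookup keyed by the declared Type. A minimal Python sketch of that dispatch follows, assuming each AC has already been reduced to an id and an optional Type string. `plan_verification` is a hypothetical name, and treating an unrecognized Type as FAIL is an assumption of this example (the text above only specifies the missing-Type case).

```python
from typing import Optional

# Type -> verification tool, copied from the table in this step.
VERIFICATION_TOOLS = {
    "UI_INTERACTION": "Playwright MCP",
    "API_ENDPOINT": "Bash curl",
    "CLI_BEHAVIOR": "Bash",
    "BUSINESS_LOGIC": "Test runner",
    "DATA_INTEGRITY": "DB CLI",
}

def plan_verification(ac_id: str, declared_type: Optional[str]) -> dict:
    """Pick the tool for one AC from its declared Type, never from keywords."""
    if not declared_type:
        # Missing Type: fail immediately instead of skipping or guessing.
        return {"ac": ac_id, "status": "FAIL", "reason": "No Type field."}
    tool = VERIFICATION_TOOLS.get(declared_type)
    if tool is None:
        # Assumption of this sketch: an unknown value is also a failure.
        return {"ac": ac_id, "status": "FAIL",
                "reason": f"Unknown Type: {declared_type}"}
    return {"ac": ac_id, "status": "PENDING", "tool": tool}
```

The function never inspects the Given/When/Then text, which keeps the choice of tool a pure function of what the spec declared.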
64
64
 
65
65
  3. **Execute UI verifications (Playwright MCP)**
66
66
  For each UI acceptance criterion:
@@ -142,8 +142,13 @@ Step-by-step:
142
142
 
143
143
  <success_criteria>
144
144
  - [ ] Every acceptance criterion from SPEC.md has a corresponding verification
145
- - [ ] Correct verification type selected per AC (UI uses Playwright, API uses curl, etc.)
146
- - [ ] UI tests use Playwright MCP tools (navigate, snapshot, click, fill_form, screenshot)
145
+ - [ ] Verification type READ from each AC's Type field, not guessed from keywords
146
+ - [ ] ACs with Type: UI_INTERACTION verified using Playwright MCP tools (navigate, snapshot, click, fill_form, screenshot)
147
+ - [ ] ACs with Type: API_ENDPOINT verified using curl commands
148
+ - [ ] ACs with Type: CLI_BEHAVIOR verified using Bash command execution
149
+ - [ ] ACs with Type: BUSINESS_LOGIC verified using the project test runner
150
+ - [ ] ACs with Type: DATA_INTEGRITY verified using database queries
151
+ - [ ] ACs missing a Type field marked FAIL — not skipped, not guessed
147
152
  - [ ] Each verification produces concrete evidence (screenshots, response bodies, test output)
148
153
  - [ ] VERIFY-<plan-id>.md created with per-AC pass/fail results
149
154
  - [ ] Failed ACs include: expected vs actual, root cause analysis, suggested fix, files to modify
@@ -141,39 +141,91 @@
141
141
  </step>
142
142
 
143
143
  <step name="write_acceptance_criteria" priority="fifth">
144
- For each piece of observable behavior, write a BDD acceptance criterion:
144
+ For each piece of observable behavior, write a BDD acceptance criterion.
145
145
 
146
+ ╔══════════════════════════════════════════════════════════════════════╗
147
+ ║ EVERY AC MUST INCLUDE A verification_type FIELD. ║
148
+ ║ This field is NOT optional. It is NOT inferred. It is DECLARED. ║
149
+ ║ The verify phase reads this field to select the correct tool. ║
150
+ ║ Without it, the AC is NOT verified and is marked FAIL. ║
151
+ ╚══════════════════════════════════════════════════════════════════════╝
152
+
153
+ FORMAT — every AC must follow this exact structure:
146
154
  ```
147
155
  AC-{N}: {short description}
156
+ Type: {verification_type}
148
157
  GIVEN {initial state or precondition}
149
158
  WHEN {action or trigger}
150
159
  THEN {expected outcome}
151
160
  ```
152
161
 
162
+ VERIFICATION TYPE — choose exactly ONE per AC:
163
+
164
+ | Type | When to Use | Verify Phase Tool |
165
+ |------|-------------|-------------------|
166
+ | UI_INTERACTION | The AC involves a browser, page, form, button, visual element, navigation, rendering, modal, dropdown, or any user-facing screen | Playwright MCP (browser_navigate, browser_snapshot, browser_click, browser_fill_form, browser_take_screenshot) |
167
+ | API_ENDPOINT | The AC involves an HTTP endpoint, request/response, status code, JSON body, REST or GraphQL call | Bash curl commands |
168
+ | CLI_BEHAVIOR | The AC involves a terminal command, stdout/stderr, exit code, file system output | Bash command execution |
169
+ | BUSINESS_LOGIC | The AC involves a function return value, calculation, transformation, validation rule, or internal behavior with no UI/API surface | Project test runner (bun test, npm test, vitest, jest, pytest) |
170
+ | DATA_INTEGRITY | The AC involves a database record, migration, schema constraint, or data consistency | Database CLI queries via Bash |
171
+
172
+ DECISION RULES for choosing the type:
173
+ 1. If the user interacts with a browser to trigger the behavior → UI_INTERACTION
174
+ 2. If the behavior is triggered by an HTTP request (even if there is also a UI) → choose based on WHAT the AC is testing:
175
+ - Testing the visual result in the browser → UI_INTERACTION
176
+ - Testing the API contract (status code, response body) → API_ENDPOINT
177
+ 3. If the AC tests internal logic that has no user-facing surface → BUSINESS_LOGIC
178
+ 4. If the AC tests a CLI tool or script → CLI_BEHAVIOR
179
+ 5. If the AC tests data persistence or schema → DATA_INTEGRITY
180
+ 6. When in doubt, prefer UI_INTERACTION — Playwright is the primary UAT tool
181
+
153
182
  Rules for writing good ACs:
154
183
  - Each AC tests ONE behavior (not multiple things)
155
184
  - The GIVEN sets up a specific, reproducible state
156
185
  - The WHEN is a single action (not a sequence)
157
186
  - The THEN is observable and verifiable (not "works correctly" — specify WHAT is correct)
158
187
  - Include at least one AC for the "happy path" and one for an error/edge case
188
+ - The verification_type MUST match the nature of the test — do NOT default everything to BUSINESS_LOGIC
159
189
 
160
- Example (good):
190
+ Example (good — UI):
161
191
  ```
162
192
  AC-1: User login with valid credentials
193
+ Type: UI_INTERACTION
163
194
  GIVEN a registered user with email "test@example.com" and password "ValidPass123"
164
195
  WHEN the user submits the login form with those credentials
165
- THEN a JWT token is returned with status 200 and the response body contains { "token": "<jwt>" }
196
+ THEN the browser redirects to /dashboard and displays "Welcome back" in the header
197
+ ```
198
+
199
+ Example (good — API):
200
+ ```
201
+ AC-2: Login API returns JWT token
202
+ Type: API_ENDPOINT
203
+ GIVEN a registered user with email "test@example.com" and password "ValidPass123"
204
+ WHEN a POST request is sent to /api/auth/login with { "email": "test@example.com", "password": "ValidPass123" }
205
+ THEN the response has status 200 and body contains { "token": "<jwt>" }
206
+ ```
207
+
208
+ Example (good — CLI):
209
+ ```
210
+ AC-3: Install command copies files to target
211
+ Type: CLI_BEHAVIOR
212
+ GIVEN an empty target directory at /tmp/test-install
213
+ WHEN the user runs "node bin/install.js --local --target /tmp/test-install"
214
+ THEN the directory /tmp/test-install/commands/sdlc/ contains 12 command files and exit code is 0
166
215
  ```
167
216
 
168
- Example (bad — too vague):
217
+ Example (bad — missing Type):
169
218
  ```
170
219
  AC-1: Login works
171
220
  GIVEN a user
172
221
  WHEN they log in
173
222
  THEN it works
174
223
  ```
224
+ This AC is INVALID: no Type field, vague GIVEN/WHEN/THEN, untestable.
175
225
 
176
- WHY: The verify phase translates ACs directly into test actions. Vague ACs produce vague tests that pass even when the code is broken. Specific ACs catch real bugs.
226
+ Present ACs to user with their types and ask: "Do these cover all cases? Are the verification types correct?"
227
+
228
+ WHY: The verify phase reads the Type field to select Playwright MCP, curl, Bash, or the test runner. Without an explicit type, the verify phase guesses based on keywords — and guesses wrong. This field is the contract between spec and verify. It MUST be present on every AC.
177
229
  </step>
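The decision rules in the step above can be read as a small function. The Python sketch below is one possible encoding, assuming the spec author has already answered two questions: what triggers the behavior, and (for HTTP-triggered behavior) what the AC asserts on. The parameter values (`"browser"`, `"http"`, `"cli"`, `"internal"`, `"data"`, `"contract"`, `"visual"`) are hypothetical labels invented for this example.

```python
def choose_type(trigger: str, asserts_on: str = "") -> str:
    """Map the decision rules to a Type value (illustrative encoding only)."""
    if trigger == "browser":
        # Rule 1: the user interacts with a browser.
        return "UI_INTERACTION"
    if trigger == "http":
        # Rule 2: decide by what the AC is testing.
        if asserts_on == "contract":
            return "API_ENDPOINT"
        # Visual result in the browser, or unclear: falls back to rule 6.
        return "UI_INTERACTION"
    by_trigger = {
        "internal": "BUSINESS_LOGIC",  # rule 3
        "cli": "CLI_BEHAVIOR",         # rule 4
        "data": "DATA_INTEGRITY",      # rule 5
    }
    # Rule 6: when in doubt, prefer UI_INTERACTION.
    return by_trigger.get(trigger, "UI_INTERACTION")
```

For the login examples in that step, `choose_type("browser")` corresponds to AC-1 and `choose_type("http", "contract")` to AC-2.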
178
230
 
179
231
  <step name="define_boundaries" priority="sixth">
@@ -196,7 +248,114 @@
196
248
  WHY: Without boundaries, sub-agents during implementation will "helpfully" refactor nearby code, breaking things outside the spec scope. Boundaries are guardrails.
197
249
  </step>
198
250
 
199
- <step name="write_spec_file" priority="seventh">
251
+ <step name="codebase_gap_analysis" priority="seventh">
252
+ ╔══════════════════════════════════════════════════════════════════════╗
253
+ ║ THIS STEP IS MANDATORY. DO NOT SKIP. DO NOT ABBREVIATE. ║
254
+ ║ Review the EXISTING code that this spec touches BEFORE writing ║
255
+ ║ the spec file. Gaps found here become new ACs or task updates. ║
256
+ ╚══════════════════════════════════════════════════════════════════════╝
257
+
258
+ PURPOSE: Catch missing logic, incomplete requirements, and hidden dependencies
259
+ by reading the actual code BEFORE implementation begins. A spec written without
260
+ reading the code it modifies will miss error handling, edge cases, validation,
261
+ and integration points that only the code reveals.
262
+
263
+ PROCEDURE:
264
+
265
+ 1. IDENTIFY FILES TO REVIEW:
266
+ Collect every file path listed in the tasks from step "task_decomposition":
267
+ - Files to modify (existing code that will change)
268
+ - Files adjacent to modified files (same directory — they may share types, imports, or patterns)
269
+ - Test files for modified files (if they exist)
270
+ Do NOT review files listed in Boundaries/DO NOT CHANGE.
271
+
272
+ 2. READ EACH FILE and check for these gap categories:
273
+
274
+ A. MISSING ERROR HANDLING:
275
+ - Does the code handle null/undefined returns from functions the spec will call?
276
+ - Are there try/catch blocks around operations that can fail (network, file I/O, DB)?
277
+ - Does the code validate input at system boundaries (controllers, API handlers)?
278
+ - Are there error paths that the spec's ACs do not cover?
279
+ → If gaps found: create new ACs with Type: BUSINESS_LOGIC or Type: API_ENDPOINT for each uncovered error path.
280
+
281
+ B. INCOMPLETE EDGE CASES:
282
+ - Empty arrays, empty strings, zero values — does the code handle them?
283
+ - Concurrent access — can two users hit this code path simultaneously?
284
+ - Partial failures — what happens if step 2 of 3 fails?
285
+ - Boundary values — maximum lengths, integer overflow, special characters?
286
+ → If gaps found: add edge case ACs or expand THEN clauses in existing ACs.
287
+
288
+ C. MISSING VALIDATION:
289
+ - Are there function parameters that accept any value but should be constrained?
290
+ - Are there user inputs that reach business logic without sanitization?
291
+ - Are there type assertions or casts that assume a shape without checking?
292
+ → If gaps found: add validation tasks or expand existing tasks to include validation.
293
+
294
+ D. INTEGRATION RISKS:
295
+ - Does the modified code export types/functions used by OTHER files not in the spec?
296
+ - Will changing a function signature break its callers?
297
+ - Are there shared state or singleton patterns that could cause side effects?
298
+ - Does the code depend on environment variables, config, or feature flags not mentioned in the spec?
299
+ → If gaps found: add integration ACs or expand Boundaries to protect callers.
300
+
301
+ E. UNTESTED PATHS:
302
+ - Read existing test files for the modified code.
303
+ - Are there public methods with zero test coverage?
304
+ - Are there conditional branches (if/else, switch) where only one branch is tested?
305
+ - Does the spec add new behavior that existing tests do not cover?
306
+ → If gaps found: add BUSINESS_LOGIC ACs for untested paths, or add test tasks.
307
+
308
+ F. PATTERN VIOLATIONS:
309
+ - Does the existing code follow patterns (error handling, naming, structure) that the spec's tasks should also follow?
310
+ - Are there established abstractions (base classes, utilities, factories) that the spec should reuse instead of reinventing?
311
+ → If gaps found: update tasks to reference existing patterns. Add to Required Patterns section.
312
+
313
+ 3. COMPILE GAP REPORT:
314
+ Present findings to the user in this format:
315
+ ```
316
+ ══════════════════════════════════════════════
317
+ CODEBASE GAP ANALYSIS
318
+ ══════════════════════════════════════════════
319
+ Files reviewed: {N}
320
+
321
+ Gaps found: {N}
322
+
323
+ GAP-1: {category} — {file}:{line range}
324
+ Problem: {what is missing or incomplete}
325
+ Impact: {what could go wrong during implementation or in production}
326
+ Recommendation: {add AC / expand task / add to boundaries}
327
+
328
+ GAP-2: ...
329
+
330
+ New ACs to add: {N}
331
+ Tasks to update: {N}
332
+ Boundaries to add: {N}
333
+
334
+ No gaps found in: {list categories with no issues}
335
+ ══════════════════════════════════════════════
336
+ ```
337
+
338
+ 4. APPLY FIXES:
339
+ For each gap:
340
+ - If it requires a new AC → add the AC with the correct verification_type to the AC list
341
+ - If it requires expanding an existing task → update the task's action and done criteria
342
+ - If it reveals an out-of-scope dependency → add to Boundaries
343
+ - If it reveals a required pattern → add to Required Patterns
344
+
345
+ Do NOT silently absorb gaps. Every gap MUST appear in the report and result in a spec change.
346
+
347
+ 5. RE-PRESENT UPDATED ACs AND TASKS:
348
+ After applying fixes, show the user the updated AC list and any changed tasks.
349
+ Ask: "I found {N} gaps in the existing code. Here are the additions. Do these look right?"
350
+
351
+ WHY: Specs written in a vacuum miss what the code actually does. A login spec might
352
+ forget rate limiting because the spec author did not read the auth middleware. A form
353
+ spec might miss accessibility because the spec author did not see the existing aria
354
+ patterns. This step forces the spec to account for reality — not just intent. It is
355
+ the difference between a spec that sounds right and a spec that IS right.
356
+ </step>
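The "no silent gaps" rule in the step above can be enforced when the report totals are computed. A minimal Python sketch follows, assuming each gap is a dictionary whose `recommendation` is one of the three actions named in the report template. The dictionary shape and `summarize_gaps` are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Recommendation -> summary label, both taken from the report template.
ACTIONS = {
    "add AC": "New ACs to add",
    "expand task": "Tasks to update",
    "add to boundaries": "Boundaries to add",
}

def summarize_gaps(gaps: list) -> dict:
    """Count spec changes per action; reject gaps that lead to no change."""
    unresolved = [g["id"] for g in gaps if g.get("recommendation") not in ACTIONS]
    if unresolved:
        # Every gap must result in a spec change, so this is an error.
        raise ValueError(f"gaps without a spec change: {unresolved}")
    counts = Counter(g["recommendation"] for g in gaps)
    return {label: counts[action] for action, label in ACTIONS.items()}
```

Raising instead of dropping the entry means a gap cannot disappear between the report and the updated AC list.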
357
+
358
+ <step name="write_spec_file" priority="eighth">
200
359
  Write the spec to: .sdlc/phases/{phase-dir}/{plan-number}-SPEC.md
201
360
 
202
361
  Format:
@@ -245,12 +404,16 @@
245
404
  ## Acceptance Criteria
246
405
 
247
406
  AC-1: {description}
407
+ Type: {UI_INTERACTION | API_ENDPOINT | CLI_BEHAVIOR | BUSINESS_LOGIC | DATA_INTEGRITY}
248
408
  GIVEN {precondition}
249
409
  WHEN {action}
250
410
  THEN {outcome}
251
411
 
252
412
  AC-2: ...
253
413
 
414
+ ## Codebase Gap Analysis
415
+ {Summary of gaps found in existing code, with GAP-N references to ACs/tasks they generated}
416
+
254
417
  ## Boundaries
255
418
  - DO NOT modify: {files/modules}
256
419
  - DO NOT implement: {out-of-scope features}
@@ -267,7 +430,7 @@
267
430
  WHY: The YAML frontmatter is machine-readable — the impl phase parses it to build the execution plan. The markdown body is human-readable — developers can review it. Both are needed.
268
431
  </step>
269
432
 
270
- <step name="spec_integrity_review" priority="eighth">
433
+ <step name="spec_integrity_review" priority="ninth">
271
434
  ╔══════════════════════════════════════════════════════════════════════╗
272
435
  ║ THIS STEP IS MANDATORY. DO NOT SKIP. DO NOT ABBREVIATE. ║
273
436
  ║ You MUST print the full integrity review table before proceeding. ║
@@ -286,6 +449,7 @@
286
449
  - [ ] complexity (LOW/MEDIUM/HIGH)
287
450
  For each AC, verify it has ALL of these (print each):
288
451
  - [ ] AC number (AC-1, AC-2, etc.)
452
+ - [ ] Type field with valid verification_type (UI_INTERACTION, API_ENDPOINT, CLI_BEHAVIOR, BUSINESS_LOGIC, or DATA_INTEGRITY)
289
453
  - [ ] GIVEN with specific precondition (not "given the system is running")
290
454
  - [ ] WHEN with specific action (not "when the user does something")
291
455
  - [ ] THEN with specific observable outcome (not "then it works correctly")
@@ -321,11 +485,26 @@
321
485
  - [ ] Independent tasks (no shared files, no shared types) are correctly parallelized
322
486
  - [ ] Dependent tasks (shared files, shared types/interfaces) are correctly sequenced
323
487
 
488
+ CHECK 6 — VERIFICATION TYPE VALIDITY:
489
+ - [ ] Every AC has a Type field (no AC without a Type)
490
+ - [ ] Every Type is one of: UI_INTERACTION, API_ENDPOINT, CLI_BEHAVIOR, BUSINESS_LOGIC, DATA_INTEGRITY
491
+ - [ ] Type matches the AC content (UI ACs reference browser/page/form elements, API ACs reference endpoints/status codes, etc.)
492
+ - [ ] ACs with Type: UI_INTERACTION have THEN clauses that reference observable browser state (page content, URL, visible elements) — not internal state
493
+ - [ ] ACs with Type: API_ENDPOINT have WHEN clauses that specify HTTP method and endpoint path
494
+ - [ ] ACs with Type: CLI_BEHAVIOR have WHEN clauses that specify the exact command to run
495
+
496
+ CHECK 7 — CODEBASE GAP ANALYSIS COMPLETED:
497
+ - [ ] The codebase_gap_analysis step was executed (not skipped)
498
+ - [ ] Files listed in tasks were read and reviewed for gaps
499
+ - [ ] Gap report was presented to user
500
+ - [ ] Any gaps found resulted in new ACs or updated tasks (no silent absorption)
501
+ - [ ] Codebase Gap Analysis section exists in the spec file
502
+
324
503
  IF ANY CHECK ITEM FAILS:
325
504
  1. List all failures with specific descriptions
326
505
  2. Propose a fix for each failure
327
506
  3. Apply all fixes to the SPEC.md
328
- 4. Re-run ALL five checks on the updated spec
507
+ 4. Re-run ALL seven checks on the updated spec
329
508
  5. Repeat until all checks pass
330
509
 
331
510
  PRINT the full integrity review report:
@@ -333,17 +512,29 @@
333
512
  ══════════════════════════════════════════════
334
513
  SPEC INTEGRITY REVIEW
335
514
  ══════════════════════════════════════════════
336
- Completeness: {✓ or ✗} — {details}
337
- Consistency: {✓ or ✗} — {details}
338
- Contradictions: {✓ or ✗} — {details or "none found"}
339
- Feasibility: {✓ or ✗} — {details}
340
- Dependency Graph: {✓ or ✗} — {details}
515
+ Completeness: {✓ or ✗} — {details}
516
+ Consistency: {✓ or ✗} — {details}
517
+ Contradictions: {✓ or ✗} — {details or "none found"}
518
+ Feasibility: {✓ or ✗} — {details}
519
+ Dependency Graph: {✓ or ✗} — {details}
520
+ Verification Types: {✓ or ✗} — {details}
521
+ Gap Analysis: {✓ or ✗} — {details}
341
522
 
342
523
  Tasks: {N} fully defined
343
- ACs: {N} with specific Given/When/Then
524
+ ACs: {N} with specific Given/When/Then and verification_type
344
525
  Waves: {N} parallel execution groups
345
526
  Estimated scope: ~{N} lines across {N} files
346
527
 
528
+ Verification type breakdown:
529
+ - UI_INTERACTION: {N} ACs (Playwright MCP)
530
+ - API_ENDPOINT: {N} ACs (curl)
531
+ - CLI_BEHAVIOR: {N} ACs (Bash)
532
+ - BUSINESS_LOGIC: {N} ACs (test runner)
533
+ - DATA_INTEGRITY: {N} ACs (DB queries)
534
+
535
+ Codebase gaps found: {N}
536
+ {if any: list each gap and the AC/task it generated}
537
+
347
538
  Contradictions found: {N}
348
539
  {if any: list each contradiction and how it was resolved}
349
540
 
@@ -352,10 +543,10 @@
352
543
  ══════════════════════════════════════════════
353
544
  ```
354
545
 
- WHY: A spec with contradictions produces agents that build conflicting code. A spec with orphan ACs means untested features. A spec with invalid dependencies means agents wait forever or overwrite each other. This review catches all of these. It costs 30 seconds now and saves hours of debugging later.
+ WHY: A spec with contradictions produces agents that build conflicting code. A spec with orphan ACs means untested features. A spec with invalid dependencies means agents wait forever or overwrite each other. A spec without verification types means the verify phase guesses and skips Playwright. A spec without gap analysis means missing logic ships undetected. This review catches all of these.
  </step>
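The "Verification type breakdown" in the review summary is a per-type tally of the declared AC types. A rough Python sketch of that tally follows; it is illustrative only and not code from sdlc-framework. The `breakdown` helper and its input (a plain list of declared type strings) are hypothetical; only the five type names and tool labels come from the summary template.

```python
from collections import Counter

# Type -> tool label, as listed in the review summary template.
TOOLS = {
    "UI_INTERACTION": "Playwright MCP",
    "API_ENDPOINT": "curl",
    "CLI_BEHAVIOR": "Bash",
    "BUSINESS_LOGIC": "test runner",
    "DATA_INTEGRITY": "DB queries",
}


def breakdown(declared_types):
    """Hypothetical helper: count ACs per declared verification type."""
    counts = Counter(declared_types)
    # Every known type gets a line, even when no AC uses it.
    return [f"- {t}: {counts.get(t, 0)} ACs ({tool})" for t, tool in TOOLS.items()]


lines = breakdown(["UI_INTERACTION", "API_ENDPOINT", "UI_INTERACTION"])
print("\n".join(lines))
```

Listing all five types, including those with a zero count, makes it easy to spot a spec that declares no UI or data checks at all.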
 
- <step name="user_approval_gate" priority="ninth">
+ <step name="user_approval_gate" priority="tenth">
  THIS IS A BLOCKING GATE. The spec does NOT proceed without explicit user approval.
 
  Present the complete spec summary to the user:
@@ -409,7 +600,7 @@
  WHY: Implementation is expensive. Building the wrong thing wastes hours. A 30-second review of the spec catches misunderstandings before they become code. The user MUST see and approve the plan before sub-agents start writing code.
  </step>
 
- <step name="update_state" priority="tenth">
+ <step name="update_state" priority="eleventh">
  ONLY REACHED AFTER USER APPROVES THE SPEC.
 
  Update .sdlc/STATE.md:
@@ -45,38 +45,43 @@
  WHY: Without acceptance criteria, there is nothing to verify. The spec must define what "done" looks like.
  </step>
 
- <step name="classify_acceptance_criteria" priority="third">
- For each AC, classify its verification type based on the GIVEN/WHEN/THEN content:
-
- TYPE: UI_INTERACTION
- Indicators: mentions page, form, button, click, navigate, display, render, modal, input field, dropdown
- Method: Playwright MCP (browser_navigate, browser_snapshot, browser_click, browser_fill_form, etc.)
+ <step name="read_verification_types" priority="third">
+ ╔══════════════════════════════════════════════════════════════════════╗
+ ║ DO NOT GUESS OR INFER THE VERIFICATION TYPE. ║
+ ║ READ the Type field from each AC in the SPEC.md. ║
+ ║ The spec phase declared the type. The verify phase USES it. ║
+ ║ If an AC is missing its Type field, mark it FAIL: "No Type field." ║
+ ╚══════════════════════════════════════════════════════════════════════╝
 
- TYPE: API_ENDPOINT
- Indicators: mentions endpoint, request, response, status code, JSON body, header, GET/POST/PUT/DELETE
- Method: Bash curl commands
+ For each AC extracted in the previous step, read its declared Type field.
 
- TYPE: CLI_BEHAVIOR
- Indicators: mentions command, terminal, output, exit code, stderr, stdout, flag, argument
- Method: Bash command execution
+ VALID TYPES AND THEIR VERIFICATION TOOLS:
 
- TYPE: BUSINESS_LOGIC
- Indicators: mentions function, method, returns, throws, calculates, transforms, validates
- Method: Run project test suite (npm test, bun test, etc.) or execute specific test files
+ | Type | Verification Tool | Action |
+ |------|------------------|--------|
+ | UI_INTERACTION | Playwright MCP | Use browser_navigate, browser_snapshot, browser_click, browser_fill_form, browser_take_screenshot |
+ | API_ENDPOINT | Bash curl | Construct and execute HTTP requests, parse response |
+ | CLI_BEHAVIOR | Bash | Run shell commands, capture stdout/stderr/exit code |
+ | BUSINESS_LOGIC | Test runner | Run bun test, npm test, vitest, jest, or pytest |
+ | DATA_INTEGRITY | DB CLI | Execute database queries via Bash |
 
- TYPE: DATA_INTEGRITY
- Indicators: mentions database, record, row, column, migration, schema, constraint
- Method: Database queries via Bash or ORM commands
+ MISSING TYPE HANDLING:
+ If an AC does NOT have a Type field:
+ 1. Record: "FAIL AC-{N} is missing its Type field. The spec must declare a verification_type for every AC."
+ 2. Do NOT attempt to guess the type from keywords.
+ 3. Do NOT skip the AC silently.
+ 4. Include this failure in the verification report.
+ 5. The overall verification FAILS if any AC is missing its Type.
 
- Present classification to proceed:
+ Present the type mapping before executing:
  ```
- AC Classification:
- AC-1: {description} → UI_INTERACTION (Playwright)
- AC-2: {description} → API_ENDPOINT (curl)
- AC-3: {description} → BUSINESS_LOGIC (test suite)
+ Verification Plan:
+ AC-1: {description} → {Type} (tool: {Playwright MCP | curl | Bash | test runner | DB CLI})
+ AC-2: {description} → {Type} (tool: {Playwright MCP | curl | Bash | test runner | DB CLI})
+ AC-3: {description} → {Type} (tool: {Playwright MCP | curl | Bash | test runner | DB CLI})
  ```
 
- WHY: Different AC types need different verification tools. Using curl for a UI test or Playwright for a unit test wastes time and gives unreliable results.
+ WHY: The spec phase explicitly declared how each AC should be verified. Reading the declared type eliminates guessing, prevents misclassification, and ensures Playwright MCP is actually invoked for UI acceptance criteria. This is the contract between spec and verify: the spec says WHAT tool to use, the verify phase USES that tool.
  </step>
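The read-then-dispatch rule in this step can be sketched in a few lines of Python. This is an illustrative sketch only, not code from sdlc-framework: the `AcceptanceCriterion` dataclass, the `plan_verification` function, and the in-memory form of an AC are hypothetical. Only the five type names, their tools, and the wording of the missing-Type failure come from the step text. How an unrecognized type should be treated is not stated in the step; the sketch assumes it also fails.

```python
from dataclasses import dataclass
from typing import Optional

# Type -> tool mapping, copied from the table in the step above.
VERIFICATION_TOOLS = {
    "UI_INTERACTION": "Playwright MCP",
    "API_ENDPOINT": "curl",
    "CLI_BEHAVIOR": "Bash",
    "BUSINESS_LOGIC": "test runner",
    "DATA_INTEGRITY": "DB CLI",
}


@dataclass
class AcceptanceCriterion:
    """Hypothetical in-memory form of one AC parsed from SPEC.md."""
    number: int
    description: str
    declared_type: Optional[str] = None  # the AC's Type field, if present


def plan_verification(acs):
    """Return (plan_lines, failures); the type is read, never inferred."""
    plan, failures = [], []
    for ac in acs:
        if not ac.declared_type:
            # Missing Type: record a failure instead of guessing or skipping.
            failures.append(
                f"FAIL AC-{ac.number} is missing its Type field. "
                "The spec must declare a verification_type for every AC."
            )
            continue
        tool = VERIFICATION_TOOLS.get(ac.declared_type)
        if tool is None:
            # Assumption: unknown types are treated as failures as well.
            failures.append(f"FAIL AC-{ac.number} has unknown Type {ac.declared_type!r}.")
            continue
        plan.append(f"AC-{ac.number}: {ac.description} → {ac.declared_type} (tool: {tool})")
    return plan, failures


acs = [
    AcceptanceCriterion(1, "login form shows an error", "UI_INTERACTION"),
    AcceptanceCriterion(2, "health endpoint responds", "API_ENDPOINT"),
    AcceptanceCriterion(3, "totals are rounded"),  # no Type field declared
]
plan, failures = plan_verification(acs)
overall_passed = not failures  # any missing Type fails the whole verification
```

Keeping the mapping in a single lookup table means the verify side has no keyword heuristics to drift out of sync with the spec side.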
 
  <step name="verify_ui_interactions" priority="fourth">