claude-devkit-cli 1.2.4 → 1.3.0

@@ -1,141 +1,501 @@
1
- Generate spec + test plan from description or existing spec.
1
+ Generate spec with acceptance scenarios from description or existing spec.
2
2
 
3
3
  ## Determine mode
4
4
 
5
5
  Examine `$ARGUMENTS`:
6
6
 
7
- - **Mode A — Spec exists:** Argument is a file path → read spec, generate test plan.
8
- - **Mode B — No spec:** Argument is a description → create spec + test plan.
9
- - **Mode C — Update:** Argument mentions "update" or existing path → read existing, update surgically.
7
+ - **Mode A — New spec:** Argument is a feature description AND directory
8
+ `docs/specs/<feature>/` does not exist → create new spec.
9
+ - **Mode B — Add scenarios:** Argument is a path to an existing spec AND spec does not
10
+ contain `## Stories` section with AS-NNN IDs → read spec, add acceptance scenarios.
11
+ - **Mode C — Update:** Argument is a path to an existing spec AND spec already contains
12
+ `## Stories` section with AS-NNN IDs → update flow (see Mode C section below).
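The three mode rules above can be sketched in Python as a small decision function. This is a minimal sketch under stated assumptions, not the package's implementation: `determine_mode`, `has_stories_with_as`, and the `feature_slug` parameter are hypothetical names, and the caller is assumed to have already derived a kebab-case feature name from the description. The rules do not say what happens when a description names a feature whose directory already exists, so the sketch raises an error there.

```python
import os
import re
from pathlib import Path

AS_ID = re.compile(r"\bAS-\d+\b")
STORIES_HEADING = re.compile(r"^## Stories\s*$", re.MULTILINE)


def has_stories_with_as(spec_text: str) -> bool:
    """True when a `## Stories` heading is followed by at least one AS-NNN ID."""
    heading = STORIES_HEADING.search(spec_text)
    return bool(heading and AS_ID.search(spec_text, heading.end()))


def determine_mode(argument: str, feature_slug: str, specs_root: Path) -> str:
    """Return "A", "B", or "C" following the three mode rules (hypothetical helper)."""
    # os.path.isfile returns False for free-text descriptions instead of raising.
    if os.path.isfile(argument):
        text = Path(argument).read_text(encoding="utf-8")
        return "C" if has_stories_with_as(text) else "B"
    if not (specs_root / feature_slug).exists():
        return "A"
    # Assumption: this case is not covered by the rules, so refuse to guess.
    raise ValueError(f"{feature_slug!r} already has a spec directory; pass its path instead")
```

Checking for the `## Stories` heading before looking for AS IDs keeps a stray `AS-001` mention in an overview from being mistaken for an existing scenario list.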
13
+
14
+ ---
15
+
16
+ ## Directory Structure
17
+
18
+ ```
19
+ docs/specs/
20
+ <feature>/
21
+ <feature>.md # current state — always read this file
22
+ snapshots/ # version history
23
+ <YYYY-MM-DD>.md
24
+ <YYYY-MM-DD>-<REF>.md
25
+ ```
26
+
27
+ - `<feature>.md` is the single source of truth. All spec reads start from this file.
28
+ - `snapshots/` contains full copies at points in time. Immutable — never edit a snapshot.
29
+ - When a feature is split, sub-specs live in the same directory:
30
+ ```
31
+ docs/specs/billing/
32
+ billing.md # root spec or overview
33
+ billing-checkout.md # sub-spec
34
+ billing-refund.md # sub-spec
35
+ snapshots/
36
+ ```
10
37
 
11
38
  ---
12
39
 
13
40
  ## Phase 0: Codebase Awareness
14
41
 
15
- Before writing anything:
16
- 1. Scan existing code in the feature area — what files, functions, types already exist?
17
- 2. Check `docs/specs/` — is there already a spec for this or a related feature?
18
- 3. Check `docs/test-plans/` — any overlap with existing plans?
19
- 4. Identify project patterns — test framework, naming conventions, directory structure.
42
+ Before writing anything, run this checklist:
43
+
44
+ | # | Action | How |
45
+ |---|--------|-----|
46
+ | P0-1 | **Keyword scan** | Grep the codebase for 3-5 keywords from the feature description. Note matching files, functions, types. |
47
+ | P0-2 | **Related specs** | List `docs/specs/` directories. Read the main spec of any related feature. Is there overlap? |
48
+ | P0-3 | **Dependency scan** | In the feature area, check imports/dependencies. What modules does this code touch? |
49
+ | P0-4 | **Reusable utilities** | Look for existing helpers, validators, formatters, shared types that the new feature could reuse. List candidates. |
50
+ | P0-5 | **Project patterns** | Identify test framework, naming conventions, directory structure from existing code. |
51
+ | P0-6 | **Change Log** | If the feature exists, read its Change Log to understand evolution. |
52
+ | P0-7 | **Knowledge graph** | If `codebase-memory-mcp` is available, use `search_code`, `get_architecture`, and `trace_call_path` to discover related code, understand architecture context, and trace dependencies — faster and more thorough than manual grep. |
53
+
54
+ If `codebase-memory-mcp` MCP server is connected, prefer it for P0-1, P0-3, P0-4 — it provides indexed search, architecture overview, and call path tracing that are more reliable than ad-hoc grep.
55
+
56
+ Record findings as bullet points — carry them into Phase 2 (Data Model, Constraints) and Phase 3 (ambiguity check).
20
57
 
21
58
  Don't plan in a vacuum. A spec that ignores existing code creates conflicts.
22
59
 
23
60
  ---
24
61
 
25
- ## Phase 1: Draft the Spec (Mode B only)
62
+ ## Phase 1: Scope & Split Assessment
63
+
64
+ Before writing the spec, assess size.
65
+
66
+ **Input:** Feature description from user (Mode A) or current spec (Mode B).
67
+ Mode C does not run Phase 1 — it uses its own flow (see Mode C section).
68
+
69
+ **Split rules:**
70
+
71
+ | # | Condition | Action |
72
+ |---|-----------|--------|
73
+ | T1 | Feature has >7 expected stories | MUST split |
74
+ | T2 | Feature has >20 expected AS | MUST split |
75
+ | T3 | Stories belong to different domains (e.g. payment + notification) | SHOULD split |
76
+ | T4 | A story can ship independently without depending on other stories | SHOULD split |
77
+ | T5 | Stories share a data model or state machine | DO NOT split |
78
+ | T6 | Splitting would duplicate >50% of context (entities, constraints) | DO NOT split |
79
+
80
+ "MUST" = mandatory split, inform user.
81
+ "SHOULD" = suggest split, ask user.
82
+ "DO NOT" = keep together, unless user requests split.
83
+
84
+ If splitting:
85
+ - Create the feature directory, place sub-specs in the same directory.
86
+ - Each sub-spec must be self-contained (own overview, relevant data model, constraints).
87
+ - No sub-spec should depend on another sub-spec to be understood.
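The split rules T1-T6 can be read as a small decision table. The sketch below is one possible encoding in Python, assuming hypothetical names (`ScopeEstimate`, `split_verdict`) and boolean inputs that a human or model has already judged. The rules do not state which verdict wins when several apply, so the precedence used here (MUST, then DO NOT, then SHOULD) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ScopeEstimate:
    stories: int                      # expected story count (T1)
    scenarios: int                    # expected AS count (T2)
    mixed_domains: bool = False       # T3
    independent_story: bool = False   # T4
    shared_model: bool = False        # T5
    duplicated_context: float = 0.0   # T6: fraction of context a split would duplicate


def split_verdict(e: ScopeEstimate) -> tuple[str, list[str]]:
    """Return (verdict, triggered rule IDs). Precedence between verdicts is assumed."""
    def hits(rules):
        return [rule for rule, hit in rules if hit]

    must = hits([("T1", e.stories > 7), ("T2", e.scenarios > 20)])
    if must:
        return "MUST", must
    keep = hits([("T5", e.shared_model), ("T6", e.duplicated_context > 0.5)])
    if keep:
        return "DO NOT", keep
    should = hits([("T3", e.mixed_domains), ("T4", e.independent_story)])
    if should:
        return "SHOULD", should
    return "NONE", []
```

Returning the triggered rule IDs alongside the verdict makes it easy to tell the user why a split is being required or suggested.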
88
+
89
+ ---
90
+
91
+ ## Phase 2: Draft the Spec (Mode A + B)
92
+
93
+ **Mode A:** Create a new spec at `docs/specs/<feature>/<feature>.md` using the template below. Include stories + acceptance scenarios.
94
+
95
+ **Mode B:** Read existing spec, add `## Stories` section with AS following depth rules.
96
+
97
+ ### Spec Template
98
+
99
+ ```markdown
100
+ # Spec: <Feature Name>
101
+
102
+ **Created:** <$(date +%Y-%m-%d)>
103
+ **Last updated:** <$(date +%Y-%m-%d)>
104
+ **Status:** Draft | Active | Deprecated
105
+ **Snapshot limit:** <N, optional — default 5>
106
+
107
+ ## Overview
108
+ [what, why, who — 2-3 sentences]
109
+
110
+ ## Data Model
111
+ [entities, attributes, relationships — if applicable]
112
+
113
+ ## Stories
114
+
115
+ ### S-001: <Story name> (P0)
116
+
117
+ **Description:** [user story]
118
+ **Source:** [optional: ticket/issue ref]
119
+
120
+ **Acceptance Scenarios:**
121
+
122
+ AS-001: <short description>
123
+ - **Given:** [state]
124
+ - **When:** [action]
125
+ - **Then:** [expected]
126
+ - **Data:** [test data]
127
+
128
+ AS-002: <short description>
129
+ - **Given:** [error state]
130
+ - **When:** [action]
131
+ - **Then:** [error handling]
132
+ - **Data:** [edge case data]
133
+
134
+ ### S-002: <Story name> (P1)
135
+
136
+ **Description:** [user story]
137
+ **Source:** [optional]
138
+
139
+ **Acceptance Scenarios:**
140
+
141
+ AS-003: <short description>
142
+ - **Given:** [state]
143
+ - **When:** [action]
144
+ - **Then:** [expected]
145
+
146
+ ### S-003: <Story name> (P2)
147
+
148
+ **Description:** [user story]
149
+
150
+ **Acceptance Scenarios:**
151
+
152
+ AS-004: <short description>
153
+ - [flow description + expected behavior]
154
+
155
+ ## Constraints & Invariants
156
+ [rules that must ALWAYS hold]
157
+
158
+ ## Change Log
159
+
160
+ | Date | Change | Ref |
161
+ |------|--------|-----|
162
+ | <$(date +%Y-%m-%d)> | Initial creation | -- |
163
+ ```
164
+
165
+ ### Acceptance Scenario Depth
166
+
167
+ | Story priority | AS must contain | AS optional |
168
+ |---------------|----------------|-------------|
169
+ | P0 | Given + When + Then + Data + Setup | -- |
170
+ | P1 | Given + When + Then | Data, Setup |
171
+ | P2 | 1-2 line flow description + expected | Separate Given/When/Then |
172
+
173
+ **AS rules:**
174
+ - Every P0 story must have at least 1 happy path AS + 1 error path AS.
175
+ - Every P1 story must have at least 1 happy path AS.
176
+ - Every P2 story must have at least 1 AS.
177
+ - No orphan AS — every AS belongs to exactly 1 story.
178
+
179
+ Match depth to complexity. Simple CRUD = 3 stories. Complex auth = full template.
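The depth table can be checked mechanically once each AS is parsed into its labeled fields. A minimal sketch, assuming a hypothetical `missing_fields` helper that receives the story priority and the field labels found on one scenario:

```python
# Required AS fields per story priority, taken from the depth table.
REQUIRED_FIELDS = {
    "P0": {"Given", "When", "Then", "Data", "Setup"},
    "P1": {"Given", "When", "Then"},
    "P2": set(),  # a 1-2 line flow description is enough
}


def missing_fields(priority: str, present: list[str]) -> list[str]:
    """Return the required field labels an AS lacks for its story's priority."""
    return sorted(REQUIRED_FIELDS[priority] - set(present))
```

An empty result means the scenario meets the minimum depth for its story; the minimum number of happy path and error path scenarios per story is a separate check.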
180
+
181
+ ### Writing Instructions
182
+
183
+ When generating stories and acceptance scenarios:
184
+
185
+ **DO:**
186
+ - Write AS that test one specific behavior each. If it fails, the developer knows exactly what broke.
187
+ - Use concrete values in Given/When/Then — `Given: user with balance $50` not `Given: user with some balance`.
188
+ - Name edge cases explicitly — `AS-005: Payment with insufficient funds` not `AS-005: Payment error`.
189
+ - Each AS should be independent — no AS depends on another running first.
190
+ - Include the boundary — `Given: cart with 0 items` and `Given: cart with 999 items`, not just `Given: cart with items`.
191
+
192
+ **DO NOT produce:**
193
+ - Vague AS: "Test that the feature works" — every AS must specify Given, When, Then (or a concrete flow for P2).
194
+ - Excessive AS: 30+ scenarios for simple CRUD — over-testing wastes time and creates maintenance burden.
195
+ - Implementation-testing AS: "Test that the database query uses an index" — test behavior, not internals.
196
+ - Duplicate AS: two scenarios verifying the same behavior with trivially different inputs.
197
+ - Framework-testing AS: "Test that the router handles the path" — test YOUR logic, not the framework.
198
+
199
+ ### Spec Section Guidelines
200
+
201
+ Include only sections that apply. Skip sections that add no value:
202
+ - **Data Model** — skip if feature has no persistent data or entities.
203
+ - **Constraints & Invariants** — skip if no rules must always hold.
204
+ - Don't generate filler for sections that don't apply.
26
205
 
27
- Create at `docs/specs/<feature-name>.md`. Include these sections (skip any that don't apply):
206
+ ### Consistency Check (after drafting)
28
207
 
29
- - **Overview** — what, why, who. 2-3 sentences.
30
- - **Data Model** — entities, attributes, relationships (table format)
31
- - **Use Cases** — UC-NNN with actor, preconditions, flow, postconditions, error cases. Each use case contains:
32
- - **FR-NNN** (Functional Requirements) — specific behaviors the system must exhibit
33
- - **SC-NNN** (Success Criteria) — measurable non-functional targets (performance, limits)
34
- - **State Machine** — states and valid transitions (if applicable)
35
- - **Settings/Configuration** — configurable behavior and defaults
36
- - **Constraints & Invariants** — rules that must ALWAYS hold
37
- - **Error Handling** — how errors surface to users and are logged
38
- - **Security Considerations** — auth, authorization, data sensitivity
208
+ Before showing the draft, verify:
39
209
 
40
- Match depth to complexity. Simple CRUD = 1 paragraph overview + 3 use cases. Complex auth system = full template. Don't generate filler for sections that don't apply.
210
+ | # | Check | On failure → |
211
+ |---|-------|-------------|
212
+ | CC1 | Every story has at least 1 AS | Add missing AS |
213
+ | CC2 | Every AS belongs to exactly 1 story | Assign orphan AS or delete |
214
+ | CC3 | P0 stories have error path AS | Add error AS if missing |
215
+ | CC4 | No 2 AS test the same behavior | Merge or delete duplicate |
216
+ | CC5 | Constraints have AS verifying them | Add AS for uncovered constraints |
217
+ | CC6 | Story count ≤7, AS count ≤20 | Go back to Phase 1 and split |
41
218
 
42
- Show the draft to the user. Wait for confirmation before generating the test plan.
219
+ All checks must pass before showing the draft to the user.
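Checks CC1-CC6 lend themselves to a single pass over a parsed spec. The following is a rough Python sketch, not the tool's own code. `Scenario`, `Spec`, and `consistency_failures` are hypothetical names, and the `kind` and `behavior` fields are assumed to have been filled in by whoever parsed the spec, since the spec format itself does not tag scenarios as happy or error paths.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scenario:
    id: str
    story: Optional[str]   # owning story ID, None for an orphan
    kind: str = "happy"    # "happy" or "error" (assumed tag)
    behavior: str = ""     # normalized one-line behavior, used for CC4


@dataclass
class Spec:
    stories: dict[str, str]                                          # story ID -> priority
    scenarios: list[Scenario]
    constraints: dict[str, list[str]] = field(default_factory=dict)  # constraint -> verifying AS IDs


def consistency_failures(spec: Spec) -> list[str]:
    """Run CC1-CC6 and return one message per failed check."""
    failures = []
    by_story = {sid: [] for sid in spec.stories}
    seen_behavior = {}
    for sc in spec.scenarios:
        if sc.story in by_story:
            by_story[sc.story].append(sc)
        else:
            failures.append(f"CC2: {sc.id} belongs to no story")
        if sc.behavior:
            if sc.behavior in seen_behavior:
                failures.append(f"CC4: {sc.id} duplicates {seen_behavior[sc.behavior]}")
            else:
                seen_behavior[sc.behavior] = sc.id
    for sid, priority in spec.stories.items():
        owned = by_story[sid]
        if not owned:
            failures.append(f"CC1: {sid} has no AS")
        elif priority == "P0" and not any(s.kind == "error" for s in owned):
            failures.append(f"CC3: {sid} has no error path AS")
    for name, verifying in spec.constraints.items():
        if not verifying:
            failures.append(f"CC5: constraint {name!r} has no AS")
    if len(spec.stories) > 7 or len(spec.scenarios) > 20:
        failures.append("CC6: spec exceeds 7 stories or 20 AS")
    return failures
```

An empty list corresponds to "all checks pass"; any entry means the draft should be fixed before it is shown.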
220
+
221
+ Show the draft to the user. Wait for confirmation before proceeding.
43
222
 
44
223
  ---
45
224
 
46
- ## Phase 2: Clarify Ambiguities
225
+ ## Phase 3: Clarify Ambiguities
47
226
 
48
- Before generating the test plan, scan the spec for gaps. A test plan built on a vague spec produces vague tests.
227
+ Before finalizing, scan the spec for gaps. Check BOTH the spec content AND the acceptance scenarios:
49
228
 
50
229
  | Lens | What to look for |
51
230
  |------|-----------------|
52
- | **Behavioral gaps** | Missing user actions, undefined system responses, incomplete flows |
53
- | **Data & persistence** | Undefined entities, missing relationships, unclear storage/lifecycle |
54
- | **Auth & access** | Who can do what is unclear, missing role definitions |
55
- | **Non-functional** | Vague adjectives without metrics ("fast", "secure", "scalable") — add SC-NNN with numbers |
56
- | **Integration** | Third-party API assumptions, unstated dependencies, SLA gaps |
57
- | **Concurrency & edge cases** | Multi-user scenarios, boundary conditions, error paths not addressed |
231
+ | Behavioral gaps | Missing user actions, undefined system responses, incomplete flows. Which stories lack an error path AS? |
232
+ | Data & persistence | Undefined entities, missing relationships, unclear storage/lifecycle |
233
+ | Auth & access | Who can do what is unclear, missing role definitions |
234
+ | Non-functional | Vague adjectives without metrics ("fast", "secure", "scalable") — add SC-NNN with concrete numbers |
235
+ | Integration | Third-party API assumptions, unstated dependencies, SLA gaps |
236
+ | Concurrency & edge cases | Multi-user scenarios, boundary conditions, error paths not addressed |
237
+ | AS completeness | Which AS is missing Given or Then? |
238
+ | AS overlap | Do 2 AS test the same behavior? |
239
+ | Story orphans | Which story has no AS? |
240
+ | Priority consistency | P0 story with only 1 happy path AS? |
241
+ | Constraint coverage | Which constraint has no AS verifying it? |
58
242
 
59
243
  Identify the top 3-5 ambiguities (most impactful first). For each, ask the user a targeted question with 2-4 concrete options and a recommendation.
60
244
 
61
- If the spec is clear and complete, 0 questions is valid. Don't manufacture ambiguity.
245
+ If 0 questions remain, you MUST state why, not just "spec is clear." Cite at minimum:
246
+ - **Edge cases checked:** which boundary conditions were considered and found covered.
247
+ - **Error paths checked:** which failure modes were verified to have AS.
248
+ - **Integration points checked:** which external dependencies were reviewed.
249
+ - One-line verdict per lens from the table above that had no findings.
250
+
251
+ Example: *"0 questions. Edge cases: cart-empty and cart-max covered by AS-003/AS-004. Error paths: payment failure covered by AS-006. Auth: single-role feature, no ambiguity. No third-party integrations."*
252
+
253
+ Don't manufacture ambiguity — but don't skip the justification either.
254
+
255
+ **Present all questions (or the 0-question justification) to the user. Wait for answers before continuing.**
62
256
 
63
257
  Write clarifications back into the spec under `## Clarifications — <date>`.
64
- Then proceed to test plan generation.
258
+ Update any affected stories or AS to reflect the user's answers.
259
+ Then proceed to summary.
65
260
 
66
261
  ---
67
262
 
68
- ## Phase 3: Generate the Test Plan
263
+ ## Phase 4: Summary
69
264
 
70
- Read the spec. For each section, extract:
71
- 1. Use cases → at least 1 test (happy path) + 1 test (error path) each
72
- 2. State transitions → test valid AND invalid transitions
73
- 3. Constraints → test they hold under edge conditions
74
- 4. Settings → test default AND non-default values
75
- 5. Cross-cutting concerns (auth, validation) → integration-level tests
265
+ Show:
266
+ - Story counts (P0/P1/P2)
267
+ - AS count
268
+ - Directory structure created
269
+ - Implementation order: which stories to implement first (by priority + dependency)
270
+ - Next steps: "Implement stories in order. Use `/mf-test` to verify each story. For complex specs, run `/mf-challenge` first."
76
271
 
77
- Prioritize by risk: data loss/security = P0, error handling = P1, cosmetic/rare = P2.
272
+ ---
78
273
 
79
- ### Output
274
+ ## Mode C: Update Flow
80
275
 
81
- Write to `docs/test-plans/<feature-name>.md`:
276
+ > **⛔ CRITICAL — MANDATORY ORDER:**
277
+ > Snapshot MUST be created **BEFORE** updating the spec.
278
+ > If you update the spec first then create a snapshot → the snapshot contains the new content, old version is lost.
279
+ > Correct order: C2 (classify) → C3 (snapshot) → C4 (report) → C5 (apply changes).
280
+ > NEVER reverse the order of C3 and C5.
82
281
 
83
- ```markdown
84
- # Test Plan: <Feature Name>
282
+ ### C0: Read current state
283
+
284
+ Read `<feature>.md`. This is the current truth.
285
+
286
+ ### C1: Identify changes
287
+
288
+ Compare the requested changes against the current spec. List:
289
+ - Stories: added / modified / removed / unchanged
290
+ - AS: added / modified / removed / unchanged
291
+ - Constraints: added / modified / removed / unchanged
292
+
293
+ ### C2: Classification
294
+
295
+ Walk through table M1-M6. If ANY condition is true → Major.
296
+
297
+ | # | Condition | Example |
298
+ |---|-----------|---------|
299
+ | M1 | New story added | Adding S-004: Subscription |
300
+ | M2 | Story removed | Removing S-002: Invoice |
301
+ | M3 | Story priority changed | S-002 from P1 → P0 |
302
+ | M4 | Story's main flow changed (Given or When changed) | AS-003 Given changes state, or When changes action |
303
+ | M5 | Expected behavior changed (Then changed) for a P0 story | AS-001 Then changes result |
304
+ | M6 | Constraint/invariant added or removed | Adding "balance must not be negative" |
305
+
306
+ Minor = NONE of M1-M6 apply. Examples: typo fix, rewording without meaning change, adding/editing Data fields, formatting, adding Source ref.
85
307
 
86
- **Spec:** docs/specs/<feature-name>.md
87
- **Generated:** <$(date +%Y-%m-%d)>
308
+ **Major → create snapshot before updating.**
309
+ **Minor → no snapshot. Update directly.**
88
310
 
89
- ## Test Cases
311
+ > **⛔ MUST check ALL 6 conditions M1-M6.** Do not stop early.
312
+ > Common mistake: check M1 = false, M2 = false → conclude Minor without checking M3-M6.
313
+ > Correct: walk through M1 to M6 completely. If ANY is true → Major.
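The "check all six, no early exit" rule can be made explicit in code. This is a minimal sketch, assuming a hypothetical `ChangeSet` record whose flags were filled in during C1; deciding whether each flag is true remains the judgment-free table lookup described above.

```python
from dataclasses import dataclass


@dataclass
class ChangeSet:
    story_added: bool = False                  # M1
    story_removed: bool = False                # M2
    priority_changed: bool = False             # M3
    given_or_when_changed: bool = False        # M4
    p0_then_changed: bool = False              # M5
    constraint_added_or_removed: bool = False  # M6


def classify(change: ChangeSet) -> tuple[str, list[str]]:
    """Evaluate every condition M1-M6 and return (classification, triggered IDs)."""
    checks = [
        ("M1", change.story_added),
        ("M2", change.story_removed),
        ("M3", change.priority_changed),
        ("M4", change.given_or_when_changed),
        ("M5", change.p0_then_changed),
        ("M6", change.constraint_added_or_removed),
    ]
    # The comprehension visits all six entries, so no condition is skipped.
    triggered = [rule for rule, hit in checks if hit]
    return ("Major" if triggered else "Minor"), triggered
```

The list of triggered IDs is what the snapshot header's `Reason` field would record.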
90
314
 
91
- | ID | Priority | Type | UC | FR/SC | Description | Expected |
92
- |----|----------|------|----|-------|-------------|----------|
93
- | TC-001 | P0 | unit | UC-001 | FR-001 | Valid login returns token | 200 + JWT |
94
- | TC-002 | P0 | unit | UC-001 | FR-002 | Wrong password returns 401 | 401 + error msg |
315
+ ### C3: Snapshot (if Major)
95
316
 
96
- ## Implementation Order
97
- 1. TC-001, TC-002 (no dependencies — start here)
98
- 2. TC-003+ (depend on setup from earlier tests)
317
+ If Major → create snapshot:
99
318
 
100
- ## Coverage Notes
101
- - Highest risk areas: ...
102
- - Existing code needing modification: [file paths]
319
+ **Step 1:** Copy file using shell command (bit-perfect, not through LLM):
320
+
321
+ ```bash
322
+ mkdir -p docs/specs/<feature>/snapshots
323
+ cp docs/specs/<feature>/<feature>.md docs/specs/<feature>/snapshots/<YYYY-MM-DD>.md
103
324
  ```
104
325
 
105
- **Priority:** P0 = must have (blocks release), P1 = should have, P2 = nice to have.
106
- **Type:** `unit`, `integration`, `e2e`, `snapshot`, `performance`
326
+ If ref available: `cp ... snapshots/<YYYY-MM-DD>-<REF>.md`
327
+ If same-day snapshot exists: `cp ... snapshots/<YYYY-MM-DD>-2.md`
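The naming rules for Step 1 can be sketched as follows. The document prescribes the shell `cp` command; this Python version swaps in `shutil.copyfile`, which also copies file contents byte for byte, purely for illustration. `snapshot_name` and `create_snapshot` are hypothetical helpers, and two behaviors are assumptions the text does not spell out: a third same-day snapshot gets `-3`, and a same-day collision with a ref appends the counter after the ref.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def snapshot_name(existing: set[str], day: date, ref: Optional[str] = None) -> str:
    """Pick <YYYY-MM-DD>[-<REF>].md, adding -2, -3, ... if the name is taken."""
    base = day.isoformat() + (f"-{ref}" if ref else "")
    name = f"{base}.md"
    counter = 2
    while name in existing:
        name = f"{base}-{counter}.md"
        counter += 1
    return name


def create_snapshot(spec: Path, day: date, ref: Optional[str] = None) -> Path:
    """Copy the current spec into snapshots/ before any update is applied."""
    snap_dir = spec.parent / "snapshots"
    snap_dir.mkdir(parents=True, exist_ok=True)
    taken = {p.name for p in snap_dir.iterdir()}
    target = snap_dir / snapshot_name(taken, day, ref)
    shutil.copyfile(spec, target)  # raw byte copy, no text decoding involved
    return target
```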
328
+
329
+ **Step 2:** Prepend header to the snapshot file (using Edit):
107
330
 
108
- ### What NOT to produce
109
- - "Test that the feature works" — too vague
110
- - 50+ test cases for simple CRUD — over-testing
111
- - Testing implementation details — brittle
112
- - Duplicate tests verifying same behavior
331
+ ```markdown
332
+ # Snapshot: <Feature Name>
333
+ **Date:** <YYYY-MM-DD>
334
+ **Ref:** <ticket/issue if available, "--" otherwise>
335
+ **Reason:** <M1|M2|M3|M4|M5|M6 — list which conditions triggered>
113
336
 
114
337
  ---
115
338
 
116
- ## Phase 4: Summary
339
+ ```
340
+
341
+ Header is added BEFORE the copied content. Do not modify any other content in the snapshot.
342
+
343
+ > **⛔ Why `cp` instead of LLM copy:** Specs require 100% accuracy. LLM text copy risks
344
+ > dropping lines, altering formatting, truncating long content. `cp` is bit-perfect.
117
345
 
118
- Show: test case counts (P0/P1/P2), implementation order, estimated scope.
119
- Next steps: "Use `/mf-test` after each chunk. For complex plans, run `/mf-challenge` first."
346
+ **Step 3:** Rotate snapshots. Check the spec frontmatter for `Snapshot limit: N`. If absent, default to **5**.
347
+ After creating a new snapshot, if `snapshots/` contains more files than the limit:
348
+ - Sort by timestamp in filename.
349
+ - Delete oldest files until count equals the limit.
350
+ - Only delete snapshot files. Log deletion in Change Log: `"Snapshot <filename> rotated out"`.
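Rotation can be sketched in a few lines. This is an illustrative Python sketch with hypothetical helper names (`snapshot_limit`, `rotate_snapshots`). It assumes the limit appears in the spec as a `**Snapshot limit:** N` line, as in the spec template, and it orders files by filename stem, which sorts by the leading `YYYY-MM-DD`; the order among several snapshots from the same day is an assumption the text leaves open.

```python
import re
from pathlib import Path

LIMIT_LINE = re.compile(r"^\*\*Snapshot limit:\*\*\s*(\d+)", re.MULTILINE)


def snapshot_limit(spec_text: str, default: int = 5) -> int:
    """Read the per-spec limit, falling back to the documented default of 5."""
    match = LIMIT_LINE.search(spec_text)
    return int(match.group(1)) if match else default


def rotate_snapshots(snap_dir: Path, limit: int = 5) -> list[str]:
    """Delete the oldest snapshot files beyond `limit`; return their names."""
    snapshots = sorted(
        (p for p in snap_dir.glob("*.md") if p.is_file()),
        key=lambda p: p.stem,  # stem starts with the date, so this is oldest first
    )
    excess = snapshots[: max(0, len(snapshots) - limit)]
    for path in excess:
        path.unlink()
    return [p.name for p in excess]
```

The returned names are what would go into the Change Log as `"Snapshot <filename> rotated out"` rows.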
351
+
352
+ If Minor: skip C3 entirely.
353
+
354
+ **Snapshots are immutable.** Never edit a created snapshot. Wrong snapshot → create a new one, delete the wrong one.
355
+
356
+ **mf-plan creates snapshots. Developers do not create them manually.** Developers do not decide, intervene, or skip.
357
+
358
+ ### C4: Change report
359
+
360
+ Display to the user:
361
+
362
+ ```markdown
363
+ ## Change Report: <feature>
364
+ **Classification:** Major / Minor
365
+ **Snapshot:** Created <filename> / Not needed
366
+
367
+ ### Changes
368
+ | Item | Action | Detail |
369
+ |------|--------|--------|
370
+ | S-002 | Priority change | P1 → P0 |
371
+ | AS-003 | Updated | Then changed |
372
+ | S-004 | Added | Subscription (P1) |
373
+
374
+ ### Unchanged
375
+ S-001, S-003
376
+ ```
377
+
378
+ > **⛔ MUST wait for user confirmation before applying.**
379
+ > Do not show the report and apply in the same step.
380
+ > User has the right to reject or modify the change report.
381
+
382
+ ### C5: Apply changes
383
+
384
+ - Update the spec directly.
385
+ - Update `Last updated`.
386
+ - Write to Change Log.
387
+ - New AS use the next sequential ID (never reuse deleted IDs).
388
+ - New AS follow the same Writing Instructions as Phase 2 (concrete values, one behavior per AS, no vague/duplicate/implementation-testing scenarios).
389
+
390
+ > **⛔ Change Log MUST be updated at this step.**
391
+ > Common mistake: update the spec, forget to write to Change Log.
392
+ > Every C5 execution → Change Log MUST have a new row. No exceptions.
393
+ > (Exception: non-semantic changes — C7 — do not write to Change Log.)
394
+
395
+ ### C6: Consistency check
396
+
397
+ After updating, verify:
398
+
399
+ | # | Check | On failure → |
400
+ |---|-------|-------------|
401
+ | CC1 | Every story has at least 1 AS | Add missing AS |
402
+ | CC2 | Every AS belongs to exactly 1 story | Assign orphan AS or delete |
403
+ | CC3 | P0 stories have error path AS | Add error AS if missing |
404
+ | CC4 | No 2 AS test the same behavior | Suggest merge or delete duplicate |
405
+ | CC5 | Constraints have AS verifying them | Add AS for uncovered constraints |
406
+ | CC6 | Story count ≤7, AS count ≤20 | Suggest splitting spec (Phase 1) |
407
+
408
+ > **⛔ Consistency check is NOT optional.**
409
+ > Run CC1-CC6 after EVERY update (Major and Minor).
410
+ > Common mistake: finish update, looks fine → skip consistency check.
411
+ > CC6 (size check) is especially easy to skip — MUST check after every story/AS addition.
412
+
413
+ If any check fails → fix or report to user. NEVER skip.
414
+
415
+ ### C7: Non-semantic changes
416
+
417
+ If the change is only typo, formatting, or wording that does NOT change behavior:
418
+ - Edit directly, do not run C2-C6.
419
+ - Do not write to Change Log.
420
+ - Do not create snapshot.
421
+
422
+ Criteria for "non-semantic": Given, When, Then, priority, constraint **DO NOT** change in meaning.
423
+
424
+ > **⛔ When in doubt whether "non-semantic or behavioral?" → treat as behavioral.**
425
+ > Common mistake: LLM classifies a Then change as "rewording" to avoid snapshot overhead.
426
+ > Test: if a developer reads the AS before and after the change and would write different code → it is behavioral.
427
+
428
+ ### C8: Archival (all stories removed)
429
+
430
+ If a Mode C update results in ALL stories being removed from a spec:
431
+
432
+ 1. Create a snapshot per C3 (this is a Major change — M2 applies).
433
+ 2. Move the entire feature directory to `docs/specs/_archived/`:
434
+ ```bash
435
+ mkdir -p docs/specs/_archived
436
+ mv docs/specs/<feature> docs/specs/_archived/$(date +%Y-%m-%d)-<feature>
437
+ ```
438
+ 3. The archived directory retains all snapshots and the final spec state.
439
+ 4. Log in Change Log before archiving: `"Feature archived — all stories removed"`.
440
+
441
+ Archived specs are read-only. To resurrect a feature, copy from `_archived/` back to `docs/specs/` and run `/mf-plan` in Mode A.
442
+
443
+ ---
120
444
 
121
445
  ## Naming Convention
122
446
 
123
- Spec and test plan MUST share the same filename:
124
447
  ```
125
- docs/specs/<feature-name>.md ← kebab-case, 2-3 words
126
- docs/test-plans/<feature-name>.md ← same name
448
+ docs/specs/<feature>/ ← kebab-case, 2-3 words
449
+ <feature>.md ← same name as directory
450
+ snapshots/
451
+ YYYY-MM-DD.md
452
+ YYYY-MM-DD-<REF>.md
127
453
  ```
128
- - Use feature name, not module name: `user-auth.md` not `AuthService.md`
454
+
455
+ - Feature name, not module name: `user-auth/` not `AuthService/`
456
+ - Sub-specs when splitting: `<feature>-<sub>.md` in the same directory
129
457
  - No prefix/suffix: `user-auth.md` not `spec-user-auth.md`
130
458
 
131
- **Requirement IDs** — sequential per spec:
132
- - `UC-001` Use Case, `FR-001` Functional Requirement, `SC-001` Success Criteria, `TC-001` Test Case
133
- - Every TC must reference at least one FR or SC for traceability.
459
+ **ID rules:**
460
+ - `S-NNN` Story — sequential per spec, starting from S-001
461
+ - `AS-NNN` Acceptance Scenario — sequential per spec, across all stories, starting from AS-001
462
+ - `FR-NNN` Functional Requirement — if needed
463
+ - `SC-NNN` Success Criteria — if needed
464
+ - Deleted IDs must never be reused
465
+ - **Sub-spec numbering is local.** Each sub-spec starts its own S-001, AS-001 sequence.
466
+ Sub-specs are self-contained (Phase 1 rule), so IDs need not be globally unique.
467
+ - **Cross-references between sub-specs** use the sub-spec name as prefix:
468
+ `billing-refund:AS-002` refers to AS-002 in `billing-refund.md`.
469
+ Avoid cross-references where possible — if you need many, the split may be wrong.
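The ID rules above reduce to two small operations: allocating the next ID without reusing a deleted one, and parsing a cross-reference. A minimal Python sketch follows, with hypothetical helper names (`next_id`, `parse_cross_ref`). It assumes the caller can supply every ID ever assigned in the spec, including deleted ones (for example by collecting them from the Change Log or snapshots), since the current spec alone cannot reveal deleted IDs.

```python
import re

CROSS_REF = re.compile(r"^(?P<sub_spec>[a-z0-9]+(?:-[a-z0-9]+)*):(?P<id>(?:AS|S)-\d+)$")


def next_id(prefix: str, ever_assigned: set[str]) -> str:
    """Return the ID after the highest one ever assigned, so deleted IDs are never reused."""
    numbers = [
        int(item.split("-")[1])
        for item in ever_assigned
        if item.startswith(prefix + "-")
    ]
    return f"{prefix}-{max(numbers, default=0) + 1:03d}"


def parse_cross_ref(ref: str) -> tuple[str, str]:
    """Split `billing-refund:AS-002` into ("billing-refund", "AS-002")."""
    match = CROSS_REF.match(ref)
    if not match:
        raise ValueError(f"not a sub-spec cross-reference: {ref!r}")
    return match["sub_spec"], match["id"]
```

Taking the maximum rather than the first free number is what prevents gaps left by deletions from being filled in.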
470
+
471
+ ---
134
472
 
135
473
  ## Rules
136
- 1. **Spec-first.** Test plan derives from spec, never from code.
137
- 2. **Codebase-aware.** Don't plan features that already exist.
138
- 3. **Actionable.** Every test case must be unambiguous enough to implement directly.
139
- 4. **Proportional.** Simple feature = simple plan. Don't over-engineer CRUD.
140
- 5. **Traceable.** Every test links to a use case. No orphan tests.
141
- 6. **Consistent names.** Spec and test plan always share the same filename.
474
+
475
+ 1. **Spec-first.** Code serves the spec, not the other way around.
476
+ 2. **Single file = current truth.** `<feature>.md` always reflects the current state.
477
+ 3. **Codebase-aware.** Don't plan features that already exist.
478
+ 4. **Actionable.** Every AS must be clear enough to implement directly.
479
+ 5. **Proportional.** Simple feature = simple spec. Don't over-engineer CRUD.
480
+ 6. **Traceable.** Every AS belongs to 1 story. No orphan AS.
481
+ 7. **Bounded.** Spec exceeding 7 stories or 20 AS must be split.
482
+ 8. **Snapshot = mf-plan's job.** Developers do not create, delete, or edit snapshots.
483
+ 9. **Classification = checklist.** Major/Minor decided by table M1-M6, not judgment.
484
+ 10. **ID immutable.** Assigned IDs never change, never get reused.
485
+
486
+ ---
487
+
488
+ ## Traps — Common Mistakes That MUST Be Avoided
489
+
490
+ | # | Trap | Consequence | Rule violated |
491
+ |---|------|-------------|--------------|
492
+ | TRAP-1 | Update spec BEFORE creating snapshot | Snapshot contains new content, old version lost | Order C3→C5 |
493
+ | TRAP-2 | Check M1-M2 then stop, skip M3-M6 | Major change classified as Minor, snapshot missed | Classification M1-M6 |
494
+ | TRAP-3 | Skip consistency check (CC1-CC6) | Story without AS, P0 missing error path, spec bloat undetected | C6 Consistency |
495
+ | TRAP-4 | Classify behavioral change as "non-semantic" | Important change not snapshotted, not logged | C7 Non-semantic |
496
+ | TRAP-5 | Apply changes without waiting for user confirmation | User loses control, wrong changes can't be rolled back | C4 Change report |
497
+ | TRAP-6 | Update spec but forget to write Change Log | Change history lost, no one knows what happened | C5 Apply |
498
+ | TRAP-7 | Reuse deleted ID (S-003 deleted then assigned to new story) | Confusion with old references in code, commits, conversations | ID rules |
499
+ | TRAP-8 | LLM copies spec content instead of using `cp` for snapshot | Lines dropped, formatting altered, truncation — inaccurate snapshot | C3 Snapshot |
500
+ | TRAP-9 | Skip Phase 1 (Scope & Split) for large features | Spec bloats >7 stories, hard to maintain, hard to review | Phase 1 |
501
+ | TRAP-10 | Write P2-depth AS for a P0 story | P0 story lacks Given/When/Then/Data, developer can't implement | AS Depth |
@@ -7,7 +7,7 @@ Pre-merge code review — security, correctness, spec alignment.
7
7
  BASE=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||') || BASE="main"
8
8
  git log --oneline "$BASE"...HEAD
9
9
  ```
10
- 2. Check for spec in `docs/specs/` and test plan in `docs/test-plans/` — review against INTENT.
10
+ 2. Check for spec in `docs/specs/<feature>/<feature>.md` — review against INTENT.
11
11
  3. Read the diff: `git diff "$BASE"...HEAD`
12
12
 
13
13
  If `$ARGUMENTS` provided → scope to those files only.
@@ -50,11 +50,15 @@ Spend 60% of analysis on the primary focus. Cover all categories, but proportion
50
50
  - **Null safety:** Optionals used without guards? `object!.property` without nil check?
51
51
 
52
52
  ### Spec-Test Alignment (Medium)
53
- - Source changed but no spec update in `docs/specs/`? → flag
53
+ - Source changed but no spec update in `docs/specs/<feature>/`? → flag
54
54
  - Source changed but no test update? → flag
55
- - Spec changed but tests not updated? → flag
55
+ - Spec changed but acceptance scenarios or tests not updated? → flag
56
56
  - Code removed but dead tests remain? → flag
57
57
  - Spec contains vague requirements without metrics ("fast", "secure", "easy", "scalable")? → flag with suggestion to add SC-NNN with concrete numbers
58
+ - **AS-to-test name check:** Read the spec's `## Stories` section. For each AS-NNN, check if a test file contains a test named or described with that AS ID or its short description. Flag:
59
+ - AS in spec with no matching test → "AS-NNN: \<description\> has no corresponding test"
60
+ - Test referencing an AS-NNN that no longer exists in the spec → "Test references removed AS-NNN"
61
+ Keep this lightweight — match on AS-NNN identifiers and story name substrings, not semantic analysis.
58
62
 
59
63
  ### Code Quality (Medium)
60
64
  - Dead code: removed functions still imported elsewhere?
@@ -1,4 +1,4 @@
1
- Write tests from test plan, compile, run, fix until green.
1
+ Write tests from spec acceptance scenarios, compile, run, fix until green.
2
2
 
3
3
  ## Phase 0: Build Context
4
4
 
@@ -10,8 +10,7 @@ Write tests from test plan, compile, run, fix until green.
10
10
  If `$ARGUMENTS` provided → scope to that file or feature only.
11
11
  If no changes → "No source changes found. Specify a file or feature."
12
12
 
13
- 2. **Read the test plan** in `docs/test-plans/` if it exists — this is your roadmap.
14
- 3. **Read the spec** in `docs/specs/` if it exists — understand the INTENT behind the code.
13
+ 2. **Read the spec** at `docs/specs/<feature>/<feature>.md` — the `## Stories` section with acceptance scenarios is your roadmap. The `## Overview` and `## Constraints` sections tell you the INTENT behind the code.
15
14
  4. **Read existing tests** for the changed files — find patterns, fixtures, naming conventions. Don't duplicate.
16
15
 
17
16
  ---
@@ -86,14 +85,25 @@ If tests fail:
86
85
 
87
86
  ```
88
87
  Tests: X added, Y modified, Z unchanged
89
- Result: All passing ✓
88
+ Result: All passing ✓ / N failing ✗
90
89
  Coverage: [critical uncovered paths if any]
91
90
  Files: [test files touched]
92
- Plan: [TC-001 ✓, TC-002 ✓, TC-005 new]
91
+ Stories: [AS-001 ✓, AS-002 ✓, AS-005 new]
93
92
  ```
94
93
 
95
- If behavior changed: "Consider updating the spec in docs/specs/."
94
+ If behavior changed: "Consider updating the spec in docs/specs/<feature>/<feature>.md."
95
+
96
+ ### Spec Gap Detection
97
+
98
+ If a test fails due to an edge case, error path, or boundary condition that is NOT covered by any existing AS in the spec:
99
+
100
+ 1. State explicitly: **"This failure suggests a missing acceptance scenario."**
101
+ 2. Describe the gap: what behavior was tested, which story it belongs to, why no AS covers it.
102
+ 3. Prompt: **"Run `/mf-plan <spec-path> 'Add AS for <description>'` to add the missing scenario."**
103
+
104
+ Do not silently fix the test and move on. A test that has no corresponding AS means the spec is incomplete — the spec must be updated first.
96
105
 
97
106
  ## Rules
98
107
  1. **Behavior over implementation.** Test what code DOES, not how.
99
108
  2. **Independent tests.** Each test sets up its own state, cleans up after.
109
+ 3. **Spec stays upstream.** If a test reveals a spec gap, update the spec before adding the test.