claude-devkit-cli 1.2.5 → 1.3.1

@@ -1,141 +1,568 @@
1
- Generate spec + test plan from description or existing spec.
1
+ Generate spec with acceptance scenarios from description or existing spec.
2
+
3
+ ## Question Format
4
+
5
+ When presenting a question to the user with multiple options, ALWAYS use this structured format:
6
+
7
+ ```
8
+ Q<N>: <Plain-language problem statement — what needs deciding and why>
9
+
10
+ A) <option> — <1-line rationale>
11
+ Fit: X/10 | Trade-off: <what you gain vs. lose>
12
+
13
+ B) <option> — <1-line rationale>
14
+ Fit: X/10 | Trade-off: <what you gain vs. lose>
15
+
16
+ C) <option> — <1-line rationale> (if applicable)
17
+ Fit: X/10 | Trade-off: <what you gain vs. lose>
18
+
19
+ RECOMMENDATION: [A/B/C] — <one-sentence reason>
20
+ ```
21
+
22
+ **Fit scoring calibration:**
23
+ - **9-10:** Covers the requirement fully, no meaningful downside.
24
+ - **7-8:** Solid choice, minor trade-offs that are acceptable for most projects.
25
+ - **5-6:** Workable but defers significant decisions or adds friction.
26
+ - **3-4:** Shortcut — gets past the question but creates debt.
27
+ - **1-2:** Placeholder only, must be revisited.
28
+
29
+ Rules:
30
+ - 2-4 options per question. Never more than 4.
31
+ - Every option must have a Fit score AND a Trade-off. No score without rationale.
32
+ - RECOMMENDATION is mandatory. Pick one. State why.
33
+ - If two options score within 1 point, flag it: "Close call — A and B are both strong. Leaning A because [reason]."
34
+ - Present all questions at once (not one-by-one) unless the answer to Q1 changes what Q2 should be.
35
+
36
+ ---
2
37
 
3
38
  ## Determine mode
4
39
 
5
40
  Examine `$ARGUMENTS`:
6
41
 
7
- - **Mode A — Spec exists:** Argument is a file path → read spec, generate test plan.
8
- - **Mode B — No spec:** Argument is a description → create spec + test plan.
9
- - **Mode C — Update:** Argument mentions "update" or existing path → read existing, update surgically.
42
+ - **Mode A — New spec:** Argument is a feature description AND directory
43
+ `docs/specs/<feature>/` does not exist → create new spec.
44
+ - **Mode B — Add scenarios:** Argument is a path to an existing spec AND spec does not
45
+ contain `## Stories` section with AS-NNN IDs → read spec, add acceptance scenarios.
46
+ - **Mode C — Update:** Argument is a path to an existing spec AND spec already contains
47
+ `## Stories` section with AS-NNN IDs → update flow (see Mode C section below).
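Review note: the three mode rules in the new version can be read as a small decision function. The sketch below is only an illustration in Python. The function names, the `spec_text` and `feature_dir_exists` parameters, and the three-digit reading of `AS-NNN` are assumptions, not part of the package. The rules do not say what happens when the argument is a description but the feature directory already exists, so the sketch reports that case instead of guessing.

```python
import re
from typing import Optional

# Assumption: "AS-NNN" means "AS-" followed by exactly three digits.
AS_ID = re.compile(r"\bAS-\d{3}\b")


def has_stories_with_as(spec_text: str) -> bool:
    """True if the spec has a '## Stories' heading and at least one AS-NNN ID."""
    has_heading = any(line.strip() == "## Stories" for line in spec_text.splitlines())
    return has_heading and AS_ID.search(spec_text) is not None


def determine_mode(spec_text: Optional[str], feature_dir_exists: bool) -> str:
    """Pick a mode from the rules above.

    spec_text is the content of the existing spec the argument points to,
    or None when the argument is a feature description (hypothetical inputs).
    """
    if spec_text is None:
        # Mode A requires that docs/specs/<feature>/ does not exist yet.
        return "unspecified" if feature_dir_exists else "A"
    return "C" if has_stories_with_as(spec_text) else "B"
```

Returning `"unspecified"` rather than silently falling back to Mode A keeps the uncovered case visible to whoever runs the command.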
48
+
49
+ ---
50
+
51
+ ## Directory Structure
52
+
53
+ ```
54
+ docs/specs/
55
+ <feature>/
56
+ <feature>.md # current state — always read this file
57
+ snapshots/ # version history
58
+ <YYYY-MM-DD>.md
59
+ <YYYY-MM-DD>-<REF>.md
60
+ ```
61
+
62
+ - `<feature>.md` is the single source of truth. All spec reads start from this file.
63
+ - `snapshots/` contains full copies at points in time. Immutable — never edit a snapshot.
64
+ - When a feature is split, sub-specs live in the same directory:
65
+ ```
66
+ docs/specs/billing/
67
+ billing.md # root spec or overview
68
+ billing-checkout.md # sub-spec
69
+ billing-refund.md # sub-spec
70
+ snapshots/
71
+ ```
10
72
 
11
73
  ---
12
74
 
13
75
  ## Phase 0: Codebase Awareness
14
76
 
15
- Before writing anything:
16
- 1. Scan existing code in the feature area — what files, functions, types already exist?
17
- 2. Check `docs/specs/` — is there already a spec for this or a related feature?
18
- 3. Check `docs/test-plans/` — any overlap with existing plans?
19
- 4. Identify project patterns — test framework, naming conventions, directory structure.
77
+ Before writing anything, run this checklist:
78
+
79
+ | # | Action | How |
80
+ |---|--------|-----|
81
+ | P0-1 | **Keyword scan** | Grep the codebase for 3-5 keywords from the feature description. Note matching files, functions, types. |
82
+ | P0-2 | **Related specs** | List `docs/specs/` directories. Read the main spec of any related feature. Is there overlap? |
83
+ | P0-3 | **Dependency scan** | In the feature area, check imports/dependencies. What modules does this code touch? |
84
+ | P0-4 | **Reusable utilities** | Look for existing helpers, validators, formatters, shared types that the new feature could reuse. List candidates. |
85
+ | P0-5 | **Project patterns** | Identify test framework, naming conventions, directory structure from existing code. |
86
+ | P0-6 | **Change Log** | If the feature exists, read its Change Log to understand evolution. |
87
+ | P0-7 | **Knowledge graph** | If `codebase-memory-mcp` is available, use `search_code`, `get_architecture`, and `trace_call_path` to discover related code, understand architecture context, and trace dependencies — faster and more thorough than manual grep. |
88
+
89
+ If `codebase-memory-mcp` MCP server is connected, prefer it for P0-1, P0-3, P0-4 — it provides indexed search, architecture overview, and call path tracing that are more reliable than ad-hoc grep.
90
+
91
+ Record findings as bullet points — carry them into Phase 2 (Data Model, Constraints) and Phase 3 (ambiguity check).
20
92
 
21
93
  Don't plan in a vacuum. A spec that ignores existing code creates conflicts.
22
94
 
23
95
  ---
24
96
 
25
- ## Phase 1: Draft the Spec (Mode B only)
97
+ ## Phase 1: Scope & Split Assessment
98
+
99
+ Before writing the spec, assess size.
100
+
101
+ **Input:** Feature description from user (Mode A) or current spec (Mode B).
102
+ Mode C does not run Phase 1 — it uses its own flow (see Mode C section).
103
+
104
+ **Split rules:**
105
+
106
+ | # | Condition | Action |
107
+ |---|-----------|--------|
108
+ | T1 | Feature has >7 expected stories | MUST split |
109
+ | T2 | Feature has >20 expected AS | MUST split |
110
+ | T3 | Stories belong to different domains (e.g. payment + notification) | SHOULD split |
111
+ | T4 | A story can ship independently without depending on other stories | SHOULD split |
112
+ | T5 | Stories share a data model or state machine | DO NOT split |
113
+ | T6 | Splitting would duplicate >50% of context (entities, constraints) | DO NOT split |
114
+
115
+ "MUST" = mandatory split, inform user.
116
+ "SHOULD" = suggest split, present using **Question Format** with split vs. keep-together as options.
117
+ "DO NOT" = keep together, unless user requests split.
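Review note: the T1-T6 table can be evaluated mechanically once the inputs are estimated. The Python sketch below simply lists every rule that fires. All parameter names are hypothetical, and the table does not say which rule wins when a MUST or SHOULD rule fires together with a DO NOT rule, so the sketch deliberately leaves that resolution to the user.

```python
def triggered_split_rules(
    expected_stories: int,
    expected_as: int,
    multiple_domains: bool,
    story_ships_independently: bool,
    shared_data_model: bool,
    duplicated_context_ratio: float,
) -> list:
    """Return (rule, action) pairs for every T1-T6 condition that holds."""
    rules = []
    if expected_stories > 7:
        rules.append(("T1", "MUST split"))
    if expected_as > 20:
        rules.append(("T2", "MUST split"))
    if multiple_domains:
        rules.append(("T3", "SHOULD split"))
    if story_ships_independently:
        rules.append(("T4", "SHOULD split"))
    if shared_data_model:
        rules.append(("T5", "DO NOT split"))
    if duplicated_context_ratio > 0.5:
        rules.append(("T6", "DO NOT split"))
    return rules
```

For example, a feature with 8 expected stories that all share one state machine triggers both T1 and T5, which is exactly the kind of conflict worth surfacing as a question.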
118
+
119
+ If splitting:
120
+ - Create the feature directory, place sub-specs in the same directory.
121
+ - Each sub-spec must be self-contained (own overview, relevant data model, constraints).
122
+ - No sub-spec should depend on another sub-spec to be understood.
123
+
124
+ ---
125
+
126
+ ## Phase 2: Draft the Spec (Mode A + B)
127
+
128
+ **Mode A:** Create a new spec at `docs/specs/<feature>/<feature>.md` using the template below. Include stories + acceptance scenarios.
129
+
130
+ **Mode B:** Read existing spec, add `## Stories` section with AS following depth rules.
131
+
132
+ ### Spec Template
133
+
134
+ ```markdown
135
+ # Spec: <Feature Name>
136
+
137
+ **Created:** <$(date +%Y-%m-%d)>
138
+ **Last updated:** <$(date +%Y-%m-%d)>
139
+ **Status:** Draft | Active | Deprecated
140
+ **Snapshot limit:** <N, optional — default 5>
141
+
142
+ ## Overview
143
+ [what, why, who — 2-3 sentences]
144
+
145
+ ## Data Model
146
+ [entities, attributes, relationships — if applicable]
147
+
148
+ ## Stories
149
+
150
+ ### S-001: <Story name> (P0)
151
+
152
+ **Description:** [user story]
153
+ **Source:** [optional: ticket/issue ref]
154
+
155
+ **Acceptance Scenarios:**
156
+
157
+ AS-001: <short description>
158
+ - **Given:** [state]
159
+ - **When:** [action]
160
+ - **Then:** [expected]
161
+ - **Data:** [test data]
162
+
163
+ AS-002: <short description>
164
+ - **Given:** [error state]
165
+ - **When:** [action]
166
+ - **Then:** [error handling]
167
+ - **Data:** [edge case data]
168
+
169
+ ### S-002: <Story name> (P1)
170
+
171
+ **Description:** [user story]
172
+ **Source:** [optional]
173
+
174
+ **Acceptance Scenarios:**
175
+
176
+ AS-003: <short description>
177
+ - **Given:** [state]
178
+ - **When:** [action]
179
+ - **Then:** [expected]
180
+
181
+ ### S-003: <Story name> (P2)
182
+
183
+ **Description:** [user story]
184
+
185
+ **Acceptance Scenarios:**
186
+
187
+ AS-004: <short description>
188
+ - [flow description + expected behavior]
189
+
190
+ ## Constraints & Invariants
191
+ [rules that must ALWAYS hold]
192
+
193
+ ## Change Log
194
+
195
+ | Date | Change | Ref |
196
+ |------|--------|-----|
197
+ | <$(date +%Y-%m-%d)> | Initial creation | -- |
198
+ ```
199
+
200
+ ### Acceptance Scenario Depth
201
+
202
+ | Story priority | AS must contain | AS optional |
203
+ |---------------|----------------|-------------|
204
+ | P0 | Given + When + Then + Data + Setup | -- |
205
+ | P1 | Given + When + Then | Data, Setup |
206
+ | P2 | 1-2 line flow description + expected | Separate Given/When/Then |
207
+
208
+ **AS rules:**
209
+ - Every P0 story must have at least 1 happy path AS + 1 error path AS.
210
+ - Every P1 story must have at least 1 happy path AS.
211
+ - Every P2 story must have at least 1 AS.
212
+ - No orphan AS — every AS belongs to exactly 1 story.
213
+
214
+ Match depth to complexity. Simple CRUD = 3 stories. Complex auth = full template.
215
+
216
+ ### Writing Instructions
217
+
218
+ When generating stories and acceptance scenarios:
219
+
220
+ **DO:**
221
+ - Write AS that test one specific behavior each. If it fails, the developer knows exactly what broke.
222
+ - Use concrete values in Given/When/Then — `Given: user with balance $50` not `Given: user with some balance`.
223
+ - Name edge cases explicitly — `AS-005: Payment with insufficient funds` not `AS-005: Payment error`.
224
+ - Each AS should be independent — no AS depends on another running first.
225
+ - Include the boundary — `Given: cart with 0 items` and `Given: cart with 999 items`, not just `Given: cart with items`.
226
+
227
+ **DO NOT produce:**
228
+ - Vague AS: "Test that the feature works" — every AS must specify Given, When, Then (or a concrete flow for P2).
229
+ - Excessive AS: 30+ scenarios for simple CRUD — over-testing wastes time and creates maintenance burden.
230
+ - Implementation-testing AS: "Test that the database query uses an index" — test behavior, not internals.
231
+ - Duplicate AS: two scenarios verifying the same behavior with trivially different inputs.
232
+ - Framework-testing AS: "Test that the router handles the path" — test YOUR logic, not the framework.
233
+
234
+ ### Spec Section Guidelines
235
+
236
+ Include only sections that apply. Skip sections that add no value:
237
+ - **Data Model** — skip if feature has no persistent data or entities.
238
+ - **Constraints & Invariants** — skip if no rules must always hold.
239
+ - Don't generate filler for sections that don't apply.
240
+
241
+ ### Consistency Check (after drafting)
26
242
 
27
- Create at `docs/specs/<feature-name>.md`. Include these sections (skip any that don't apply):
243
+ Before showing the draft, verify:
28
244
 
29
- - **Overview** — what, why, who. 2-3 sentences.
30
- - **Data Model** — entities, attributes, relationships (table format)
31
- - **Use Cases** — UC-NNN with actor, preconditions, flow, postconditions, error cases. Each use case contains:
32
- - **FR-NNN** (Functional Requirements) — specific behaviors the system must exhibit
33
- - **SC-NNN** (Success Criteria) — measurable non-functional targets (performance, limits)
34
- - **State Machine** — states and valid transitions (if applicable)
35
- - **Settings/Configuration** — configurable behavior and defaults
36
- - **Constraints & Invariants** — rules that must ALWAYS hold
37
- - **Error Handling** — how errors surface to users and are logged
38
- - **Security Considerations** — auth, authorization, data sensitivity
245
+ | # | Check | On failure → |
246
+ |---|-------|-------------|
247
+ | CC1 | Every story has at least 1 AS | Add missing AS |
248
+ | CC2 | Every AS belongs to exactly 1 story | Assign orphan AS or delete |
249
+ | CC3 | P0 stories have error path AS | Add error AS if missing |
250
+ | CC4 | No 2 AS test the same behavior | Merge or delete duplicate |
251
+ | CC5 | Constraints have AS verifying them | Add AS for uncovered constraints |
252
+ | CC6 | Story count ≤7, AS count ≤20 | Go back to Phase 1 and split |
39
253
 
40
- Match depth to complexity. Simple CRUD = 1 paragraph overview + 3 use cases. Complex auth system = full template. Don't generate filler for sections that don't apply.
254
+ All checks must pass before showing the draft to the user.
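Review note: some of the CC checks are purely structural and could be verified by a script. The Python sketch below covers CC1, CC3, and CC6 only. CC4 and CC5 need a judgment about behavior and are left out, and CC2 holds by construction because each scenario is nested under one story. The dictionary layout and the `"happy"` / `"error"` labels are assumptions made for illustration.

```python
def structural_checks(stories: list) -> list:
    """Return failure messages for CC1, CC3 and CC6.

    Each story is a dict like (hypothetical shape):
      {"id": "S-001", "priority": "P0",
       "scenarios": [{"id": "AS-001", "kind": "happy"}]}
    """
    failures = []
    total_as = 0
    for story in stories:
        scenarios = story["scenarios"]
        total_as += len(scenarios)
        # CC1: every story has at least 1 AS.
        if not scenarios:
            failures.append(f"CC1: {story['id']} has no AS")
        # CC3: P0 stories have an error path AS.
        if story["priority"] == "P0" and not any(s["kind"] == "error" for s in scenarios):
            failures.append(f"CC3: {story['id']} has no error path AS")
    # CC6: story count <= 7 and AS count <= 20.
    if len(stories) > 7 or total_as > 20:
        failures.append("CC6: spec exceeds 7 stories or 20 AS")
    return failures
```

An empty result means only that these three checks pass. The remaining checks still have to be walked through by hand.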
41
255
 
42
- Show the draft to the user. Wait for confirmation before generating the test plan.
256
+ Show the draft to the user. Wait for confirmation before proceeding.
43
257
 
44
258
  ---
45
259
 
46
- ## Phase 2: Clarify Ambiguities
260
+ ## Phase 3: Clarify Ambiguities
47
261
 
48
- Before generating the test plan, scan the spec for gaps. A test plan built on a vague spec produces vague tests.
262
+ Before finalizing, scan the spec for gaps. Check BOTH the spec content AND the acceptance scenarios:
49
263
 
50
264
  | Lens | What to look for |
51
265
  |------|-----------------|
52
- | **Behavioral gaps** | Missing user actions, undefined system responses, incomplete flows |
53
- | **Data & persistence** | Undefined entities, missing relationships, unclear storage/lifecycle |
54
- | **Auth & access** | Who can do what is unclear, missing role definitions |
55
- | **Non-functional** | Vague adjectives without metrics ("fast", "secure", "scalable") — add SC-NNN with numbers |
56
- | **Integration** | Third-party API assumptions, unstated dependencies, SLA gaps |
57
- | **Concurrency & edge cases** | Multi-user scenarios, boundary conditions, error paths not addressed |
266
+ | Behavioral gaps | Missing user actions, undefined system responses, incomplete flows. Which stories lack an error path AS? |
267
+ | Data & persistence | Undefined entities, missing relationships, unclear storage/lifecycle |
268
+ | Auth & access | Who can do what is unclear, missing role definitions |
269
+ | Non-functional | Vague adjectives without metrics ("fast", "secure", "scalable") — add SC-NNN with concrete numbers |
270
+ | Integration | Third-party API assumptions, unstated dependencies, SLA gaps |
271
+ | Concurrency & edge cases | Multi-user scenarios, boundary conditions, error paths not addressed |
272
+ | AS completeness | Which AS is missing Given or Then? |
273
+ | AS overlap | Do 2 AS test the same behavior? |
274
+ | Story orphans | Which story has no AS? |
275
+ | Priority consistency | P0 story with only 1 happy path AS? |
276
+ | Constraint coverage | Which constraint has no AS verifying it? |
277
+
278
+ Identify the top 3-5 ambiguities (most impactful first). Present each using the **Question Format** (see top of this file). Example:
58
279
 
59
- Identify the top 3-5 ambiguities (most impactful first). For each, ask the user a targeted question with 2-4 concrete options and a recommendation.
280
+ ```
281
+ Q1: Auth strategy not specified — spec mentions "logged-in users" but no auth mechanism.
282
+
283
+ A) Session-based auth (cookie) — traditional, simple server-side
284
+ Fit: 8/10 | Trade-off: simple setup vs. harder to scale across services
285
+
286
+ B) JWT (stateless tokens) — API-friendly, no server session
287
+ Fit: 7/10 | Trade-off: scalable vs. token revocation complexity
288
+
289
+ C) Defer — add auth story later when auth requirements are clearer
290
+ Fit: 5/10 | Trade-off: unblocks now vs. may require spec rewrite later
291
+
292
+ RECOMMENDATION: A — single-service app, session auth is simplest path.
293
+ ```
294
+
295
+ If 0 questions remain, you MUST state why — not just "spec is clear." Cite at minimum:
296
+ - **Edge cases checked:** which boundary conditions were considered and found covered.
297
+ - **Error paths checked:** which failure modes were verified to have AS.
298
+ - **Integration points checked:** which external dependencies were reviewed.
299
+ - One-line verdict per lens from the table above that had no findings.
300
+
301
+ Example: *"0 questions. Edge cases: cart-empty and cart-max covered by AS-003/AS-004. Error paths: payment failure covered by AS-006. Auth: single-role feature, no ambiguity. No third-party integrations."*
302
+
303
+ Don't manufacture ambiguity — but don't skip the justification either.
60
304
 
61
- If the spec is clear and complete, 0 questions is valid. Don't manufacture ambiguity.
305
+ **Present all questions (or the 0-question justification) to the user. Wait for answers before continuing.**
62
306
 
63
307
  Write clarifications back into the spec under `## Clarifications — <date>`.
64
- Then proceed to test plan generation.
308
+ Update any affected stories or AS to reflect the user's answers.
309
+ Then proceed to summary.
65
310
 
66
311
  ---
67
312
 
68
- ## Phase 3: Generate the Test Plan
313
+ ## Phase 4: Summary
314
+
315
+ Show:
316
+ - Story counts (P0/P1/P2)
317
+ - AS count
318
+ - Directory structure created
319
+ - Implementation order: which stories to implement first (by priority + dependency)
320
+ - Next steps: "Implement stories in order. Use `/mf-test` to verify each story. For complex specs, run `/mf-challenge` first."
69
321
 
70
- Read the spec. For each section, extract:
71
- 1. Use cases → at least 1 test (happy path) + 1 test (error path) each
72
- 2. State transitions → test valid AND invalid transitions
73
- 3. Constraints → test they hold under edge conditions
74
- 4. Settings → test default AND non-default values
75
- 5. Cross-cutting concerns (auth, validation) → integration-level tests
322
+ ---
76
323
 
77
- Prioritize by risk: data loss/security = P0, error handling = P1, cosmetic/rare = P2.
324
+ ## Mode C: Update Flow
78
325
 
79
- ### Output
326
+ > **⛔ CRITICAL — MANDATORY ORDER:**
327
+ > Snapshot MUST be created **BEFORE** updating the spec.
328
+ > If you update the spec first then create a snapshot → the snapshot contains the new content, old version is lost.
329
+ > Correct order: C2 (classify) → C3 (snapshot) → C4 (report) → C5 (apply changes).
330
+ > NEVER reverse the order of C3 and C5.
80
331
 
81
- Write to `docs/test-plans/<feature-name>.md`:
332
+ ### C0: Read current state
82
333
 
83
- ```markdown
84
- # Test Plan: <Feature Name>
334
+ Read `<feature>.md`. This is the current truth.
335
+
336
+ ### C1: Identify changes
337
+
338
+ Compare the requested changes against the current spec. List:
339
+ - Stories: added / modified / removed / unchanged
340
+ - AS: added / modified / removed / unchanged
341
+ - Constraints: added / modified / removed / unchanged
342
+
343
+ ### C2: Classification
344
+
345
+ Walk through table M1-M6. If ANY condition is true → Major.
85
346
 
86
- **Spec:** docs/specs/<feature-name>.md
87
- **Generated:** <$(date +%Y-%m-%d)>
347
+ | # | Condition | Example |
348
+ |---|-----------|---------|
349
+ | M1 | New story added | Adding S-004: Subscription |
350
+ | M2 | Story removed | Removing S-002: Invoice |
351
+ | M3 | Story priority changed | S-002 from P1 → P0 |
352
+ | M4 | Story's main flow changed (Given or When changed) | AS-003 Given changes state, or When changes action |
353
+ | M5 | Expected behavior changed (Then changed) for a P0 story | AS-001 Then changes result |
354
+ | M6 | Constraint/invariant added or removed | Adding "balance must not be negative" |
88
355
 
89
- ## Test Cases
356
+ Minor = NONE of M1-M6 apply. Examples: typo fix, rewording without meaning change, adding/editing Data fields, formatting, adding Source ref.
90
357
 
91
- | ID | Priority | Type | UC | FR/SC | Description | Expected |
92
- |----|----------|------|----|-------|-------------|----------|
93
- | TC-001 | P0 | unit | UC-001 | FR-001 | Valid login returns token | 200 + JWT |
94
- | TC-002 | P0 | unit | UC-001 | FR-002 | Wrong password returns 401 | 401 + error msg |
358
+ **Major → create snapshot before updating.**
359
+ **Minor → no snapshot. Update directly.**
95
360
 
96
- ## Implementation Order
97
- 1. TC-001, TC-002 (no dependencies — start here)
98
- 2. TC-003+ (depend on setup from earlier tests)
361
+ > **⛔ MUST check ALL 6 conditions M1-M6.** Do not stop early.
362
+ > Common mistake: check M1 = false, M2 = false → conclude Minor without checking M3-M6.
363
+ > Correct: walk through M1 to M6 completely. If ANY is true → Major.
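Review note: the "check all six, do not stop early" rule maps naturally onto code that evaluates every condition and reports which ones fired. The following Python sketch assumes a hypothetical `ChangeSet` record with one boolean per condition. Deciding the value of each boolean is still the reviewer's job.

```python
from dataclasses import dataclass


@dataclass
class ChangeSet:
    """One flag per row of the M1-M6 table (field names are illustrative)."""
    story_added: bool = False                  # M1
    story_removed: bool = False                # M2
    priority_changed: bool = False             # M3
    given_or_when_changed: bool = False        # M4
    p0_then_changed: bool = False              # M5
    constraint_added_or_removed: bool = False  # M6


def classify(changes: ChangeSet) -> tuple:
    """Evaluate all six conditions and return (classification, triggered)."""
    checks = [
        ("M1", changes.story_added),
        ("M2", changes.story_removed),
        ("M3", changes.priority_changed),
        ("M4", changes.given_or_when_changed),
        ("M5", changes.p0_then_changed),
        ("M6", changes.constraint_added_or_removed),
    ]
    # No early exit: every condition is inspected before deciding.
    triggered = [name for name, hit in checks if hit]
    return ("Major" if triggered else "Minor", triggered)
```

The list of triggered conditions is what the snapshot header's `Reason` field asks for, so keeping it alongside the verdict avoids re-deriving it later.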
99
364
 
100
- ## Coverage Notes
101
- - Highest risk areas: ...
102
- - Existing code needing modification: [file paths]
365
+ ### C3: Snapshot (if Major)
366
+
367
+ If Major → create snapshot:
368
+
369
+ **Step 1:** Copy file using shell command (bit-perfect, not through LLM):
370
+
371
+ ```bash
372
+ mkdir -p docs/specs/<feature>/snapshots
373
+ cp docs/specs/<feature>/<feature>.md docs/specs/<feature>/snapshots/<YYYY-MM-DD>.md
103
374
  ```
104
375
 
105
- **Priority:** P0 = must have (blocks release), P1 = should have, P2 = nice to have.
106
- **Type:** `unit`, `integration`, `e2e`, `snapshot`, `performance`
376
+ If ref available: `cp ... snapshots/<YYYY-MM-DD>-<REF>.md`
377
+ If same-day snapshot exists: `cp ... snapshots/<YYYY-MM-DD>-2.md`
107
378
 
108
- ### What NOT to produce
109
- - "Test that the feature works" — too vague
110
- - 50+ test cases for simple CRUD — over-testing
111
- - Testing implementation details — brittle
112
- - Duplicate tests verifying same behavior
379
+ **Step 2:** Prepend header to the snapshot file (using Edit):
380
+
381
+ ```markdown
382
+ # Snapshot: <Feature Name>
383
+ **Date:** <YYYY-MM-DD>
384
+ **Ref:** <ticket/issue if available, "--" otherwise>
385
+ **Reason:** <M1|M2|M3|M4|M5|M6 — list which conditions triggered>
113
386
 
114
387
  ---
115
388
 
116
- ## Phase 4: Summary
389
+ ```
390
+
391
+ Header is added BEFORE the copied content. Do not modify any other content in the snapshot.
392
+
393
+ > **⛔ Why `cp` instead of LLM copy:** Specs require 100% accuracy. LLM text copy risks
394
+ > dropping lines, altering formatting, truncating long content. `cp` is bit-perfect.
395
+
396
+ **Step 3:** Rotate snapshots. Check the spec frontmatter for `Snapshot limit: N`. If absent, default to **5**.
397
+ After creating a new snapshot, if `snapshots/` contains more files than the limit:
398
+ - Sort by timestamp in filename.
399
+ - Delete oldest files until count equals the limit.
400
+ - Only delete snapshot files. Log deletion in Change Log: `"Snapshot <filename> rotated out"`.
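Review note: the rotation step can be sketched as a pure function that picks which files to delete. This Python example is an illustration under two assumptions: snapshot filenames start with a `YYYY-MM-DD` date, and same-day snapshots are passed in creation order, since the filename alone does not establish their order. Files without a date prefix are ignored, in line with "only delete snapshot files".

```python
import re

# Snapshot names are expected to start with YYYY-MM-DD.
DATE_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2})")


def snapshots_to_delete(filenames: list, limit: int = 5) -> list:
    """Return the oldest snapshot filenames so that `limit` files remain."""
    dated = []
    for name in filenames:
        match = DATE_PREFIX.match(name)
        if match:
            dated.append((match.group(1), name))
    # sorted() is stable, so same-day files keep the caller's order.
    dated = sorted(dated, key=lambda pair: pair[0])
    excess = len(dated) - limit
    if excess <= 0:
        return []
    return [name for _, name in dated[:excess]]
```

The function only computes the list. Removing the files and writing the "rotated out" rows to the Change Log would be separate steps.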
401
+
402
+ If Minor: skip C3 entirely.
403
+
404
+ **Snapshots are immutable.** Never edit a created snapshot. Wrong snapshot → create a new one, delete the wrong one.
405
+
406
+ **mf-plan creates snapshots. Developers do not create them manually.** Developers do not decide, intervene, or skip.
407
+
408
+ ### C4: Change report
409
+
410
+ Display to the user:
411
+
412
+ ```markdown
413
+ ## Change Report: <feature>
414
+ **Classification:** Major / Minor
415
+ **Snapshot:** Created <filename> / Not needed
416
+
417
+ ### Changes
418
+ | Item | Action | Detail |
419
+ |------|--------|--------|
420
+ | S-002 | Priority change | P1 → P0 |
421
+ | AS-003 | Updated | Then changed |
422
+ | S-004 | Added | Subscription (P1) |
423
+
424
+ ### Unchanged
425
+ S-001, S-003
426
+ ```
427
+
428
+ Present the decision using **Question Format**:
429
+
430
+ ```
431
+ Q1: Apply these changes to <feature> spec?
432
+
433
+ A) Apply all — accept the full change report as shown
434
+ Fit: N/10 | Trade-off: fast vs. no per-item control
435
+
436
+ B) Review each — walk through changes one by one, accept/reject/modify
437
+ Fit: N/10 | Trade-off: precise control vs. slower
438
+
439
+ C) Reject all — discard and start over
440
+ Fit: N/10 | Trade-off: clean slate vs. loses work
441
+
442
+ RECOMMENDATION: [A or B] — <reason based on change count and complexity>
443
+ ```
444
+
445
+ > **⛔ MUST wait for user confirmation before applying.**
446
+ > Do not show the report and apply in the same step.
447
+ > User has the right to reject or modify the change report.
448
+
449
+ ### C5: Apply changes
450
+
451
+ - Update the spec directly.
452
+ - Update `Last updated`.
453
+ - Write to Change Log.
454
+ - New AS use the next sequential ID (never reuse deleted IDs).
455
+ - New AS follow the same Writing Instructions as Phase 2 (concrete values, one behavior per AS, no vague/duplicate/implementation-testing scenarios).
456
+
457
+ > **⛔ Change Log MUST be updated at this step.**
458
+ > Common mistake: update the spec, forget to write to Change Log.
459
+ > Every C5 execution → Change Log MUST have a new row. No exceptions.
460
+ > (Exception: non-semantic changes — C7 — do not write to Change Log.)
461
+
462
+ ### C6: Consistency check
463
+
464
+ After updating, verify:
465
+
466
+ | # | Check | On failure → |
467
+ |---|-------|-------------|
468
+ | CC1 | Every story has at least 1 AS | Add missing AS |
469
+ | CC2 | Every AS belongs to exactly 1 story | Assign orphan AS or delete |
470
+ | CC3 | P0 stories have error path AS | Add error AS if missing |
471
+ | CC4 | No 2 AS test the same behavior | Suggest merge or delete duplicate |
472
+ | CC5 | Constraints have AS verifying them | Add AS for uncovered constraints |
473
+ | CC6 | Story count ≤7, AS count ≤20 | Suggest splitting spec (Phase 1) |
474
+
475
+ > **⛔ Consistency check is NOT optional.**
476
+ > Run CC1-CC6 after EVERY update (Major and Minor).
477
+ > Common mistake: finish update, looks fine → skip consistency check.
478
+ > CC6 (size check) is especially easy to skip — MUST check after every story/AS addition.
117
479
 
118
- Show: test case counts (P0/P1/P2), implementation order, estimated scope.
119
- Next steps: "Use `/mf-test` after each chunk. For complex plans, run `/mf-challenge` first."
480
+ If any check fails → fix or report to user. NEVER skip.
481
+
482
+ ### C7: Non-semantic changes
483
+
484
+ If the change is only typo, formatting, or wording that does NOT change behavior:
485
+ - Edit directly, do not run C2-C6.
486
+ - Do not write to Change Log.
487
+ - Do not create snapshot.
488
+
489
+ Criteria for "non-semantic": Given, When, Then, priority, constraint **DO NOT** change in meaning.
490
+
491
+ > **⛔ When in doubt whether "non-semantic or behavioral?" → treat as behavioral.**
492
+ > Common mistake: LLM classifies a Then change as "rewording" to avoid snapshot overhead.
493
+ > Test: if a developer reads the AS before and after the change and would write different code → it is behavioral.
494
+
495
+ ### C8: Archival (all stories removed)
496
+
497
+ If a Mode C update results in ALL stories being removed from a spec:
498
+
499
+ 1. Create a snapshot per C3 (this is a Major change — M2 applies).
500
+ 2. Move the entire feature directory to `docs/specs/_archived/`:
501
+ ```bash
502
+ mkdir -p docs/specs/_archived
503
+ mv docs/specs/<feature> docs/specs/_archived/$(date +%Y-%m-%d)-<feature>
504
+ ```
505
+ 3. The archived directory retains all snapshots and the final spec state.
506
+ 4. Log in Change Log before archiving: `"Feature archived — all stories removed"`.
507
+
508
+ Archived specs are read-only. To resurrect a feature, copy from `_archived/` back to `docs/specs/` and run `/mf-plan` in Mode A.
509
+
510
+ ---
120
511
 
121
512
  ## Naming Convention
122
513
 
123
- Spec and test plan MUST share the same filename:
124
514
  ```
125
- docs/specs/<feature-name>.md ← kebab-case, 2-3 words
126
- docs/test-plans/<feature-name>.md ← same name
515
+ docs/specs/<feature>/ ← kebab-case, 2-3 words
516
+ <feature>.md ← same name as directory
517
+ snapshots/
518
+ YYYY-MM-DD.md
519
+ YYYY-MM-DD-<REF>.md
127
520
  ```
128
- - Use feature name, not module name: `user-auth.md` not `AuthService.md`
521
+
522
+ - Feature name, not module name: `user-auth/` not `AuthService/`
523
+ - Sub-specs when splitting: `<feature>-<sub>.md` in the same directory
129
524
  - No prefix/suffix: `user-auth.md` not `spec-user-auth.md`
130
525
 
131
- **Requirement IDs** — sequential per spec:
132
- - `UC-001` Use Case, `FR-001` Functional Requirement, `SC-001` Success Criteria, `TC-001` Test Case
133
- - Every TC must reference at least one FR or SC for traceability.
526
+ **ID rules:**
527
+ - `S-NNN` Story — sequential per spec, starting from S-001
528
+ - `AS-NNN` Acceptance Scenario — sequential per spec, across all stories, starting from AS-001
529
+ - `FR-NNN` Functional Requirement — if needed
530
+ - `SC-NNN` Success Criteria — if needed
531
+ - Deleted IDs must never be reused
532
+ - **Sub-spec numbering is local.** Each sub-spec starts its own S-001, AS-001 sequence.
533
+ Sub-specs are self-contained (Phase 1 rule), so IDs need not be globally unique.
534
+ - **Cross-references between sub-specs** use the sub-spec name as prefix:
535
+ `billing-refund:AS-002` refers to AS-002 in `billing-refund.md`.
536
+ Avoid cross-references where possible — if you need many, the split may be wrong.
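Review note: "sequential" combined with "deleted IDs must never be reused" implies that the next ID comes from the highest number ever assigned, not from the IDs currently in the file. A minimal Python sketch follows. The `next_id` helper is hypothetical, and it assumes the caller can supply every ID ever assigned, including deleted ones (for instance gathered from snapshots or the Change Log).

```python
import re


def next_id(prefix: str, ever_assigned: list) -> str:
    """Return the next ID for `prefix` ("S" or "AS"), zero-padded to 3 digits.

    ever_assigned must include deleted IDs, otherwise a deleted
    number could be handed out again.
    """
    pattern = re.compile(rf"{re.escape(prefix)}-(\d+)")
    numbers = []
    for item in ever_assigned:
        match = pattern.fullmatch(item)
        if match:
            numbers.append(int(match.group(1)))
    highest = max(numbers, default=0)
    return f"{prefix}-{highest + 1:03d}"
```

Gaps are never filled: if AS-003 was deleted and AS-004 exists, the next scenario is AS-005.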
537
+
538
+ ---
134
539
 
135
540
  ## Rules
136
- 1. **Spec-first.** Test plan derives from spec, never from code.
137
- 2. **Codebase-aware.** Don't plan features that already exist.
138
- 3. **Actionable.** Every test case must be unambiguous enough to implement directly.
139
- 4. **Proportional.** Simple feature = simple plan. Don't over-engineer CRUD.
140
- 5. **Traceable.** Every test links to a use case. No orphan tests.
141
- 6. **Consistent names.** Spec and test plan always share the same filename.
541
+
542
+ 1. **Spec-first.** Code serves the spec, not the other way around.
543
+ 2. **Single file = current truth.** `<feature>.md` always reflects the current state.
544
+ 3. **Codebase-aware.** Don't plan features that already exist.
545
+ 4. **Actionable.** Every AS must be clear enough to implement directly.
546
+ 5. **Proportional.** Simple feature = simple spec. Don't over-engineer CRUD.
547
+ 6. **Traceable.** Every AS belongs to 1 story. No orphan AS.
548
+ 7. **Bounded.** Spec exceeding 7 stories or 20 AS must be split.
549
+ 8. **Snapshot = mf-plan's job.** Developers do not create, delete, or edit snapshots.
550
+ 9. **Classification = checklist.** Major/Minor decided by table M1-M6, not judgment.
551
+ 10. **ID immutable.** Assigned IDs never change, never get reused.
552
+
553
+ ---
554
+
555
+ ## Traps — Common Mistakes That MUST Be Avoided
556
+
557
+ | # | Trap | Consequence | Rule violated |
558
+ |---|------|-------------|--------------|
559
+ | TRAP-1 | Update spec BEFORE creating snapshot | Snapshot contains new content, old version lost | Order C3→C5 |
560
+ | TRAP-2 | Check M1-M2 then stop, skip M3-M6 | Major change classified as Minor, snapshot missed | Classification M1-M6 |
561
+ | TRAP-3 | Skip consistency check (CC1-CC6) | Story without AS, P0 missing error path, spec bloat undetected | C6 Consistency |
562
+ | TRAP-4 | Classify behavioral change as "non-semantic" | Important change not snapshotted, not logged | C7 Non-semantic |
563
+ | TRAP-5 | Apply changes without waiting for user confirmation | User loses control, wrong changes can't be rolled back | C4 Change report |
564
+ | TRAP-6 | Update spec but forget to write Change Log | Change history lost, no one knows what happened | C5 Apply |
565
+ | TRAP-7 | Reuse deleted ID (S-003 deleted then assigned to new story) | Confusion with old references in code, commits, conversations | ID rules |
566
+ | TRAP-8 | LLM copies spec content instead of using `cp` for snapshot | Lines dropped, formatting altered, truncation — inaccurate snapshot | C3 Snapshot |
567
+ | TRAP-9 | Skip Phase 1 (Scope & Split) for large features | Spec bloats >7 stories, hard to maintain, hard to review | Phase 1 |
568
+ | TRAP-10 | Write P2-depth AS for a P0 story | P0 story lacks Given/When/Then/Data, developer can't implement | AS Depth |