npm - claude-devkit-cli - Versions diffs - 1.2.5 → 1.3.0 - Mend

claude-devkit-cli 1.2.5 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9) hide show

package/README.md +122 -95
package/package.json +1 -1
package/templates/.claude/CLAUDE.md +10 -10
package/templates/.claude/commands/mf-challenge.md +6 -8
package/templates/.claude/commands/mf-fix.md +2 -2
package/templates/.claude/commands/mf-plan.md +441 -81
package/templates/.claude/commands/mf-review.md +7 -3
package/templates/.claude/commands/mf-test.md +16 -6
package/templates/docs/WORKFLOW.md +33 -34

package/templates/.claude/commands/mf-plan.md CHANGED Viewed

@@ -1,141 +1,501 @@
-Generate spec + test plan from description or existing spec.
+Generate spec with acceptance scenarios from description or existing spec.
 ## Determine mode
 Examine `$ARGUMENTS`:
-- **Mode A — Spec exists:** Argument is a file path → read spec, generate test plan.
-- **Mode B — No spec:** Argument is a description → create spec + test plan.
-- **Mode C — Update:** Argument mentions "update" or existing path → read existing, update surgically.
+- **Mode A — New spec:** Argument is a feature description AND directory
+  `docs/specs/<feature>/` does not exist → create new spec.
+- **Mode B — Add scenarios:** Argument is a path to an existing spec AND spec does not
+  contain `## Stories` section with AS-NNN IDs → read spec, add acceptance scenarios.
+- **Mode C — Update:** Argument is a path to an existing spec AND spec already contains
+  `## Stories` section with AS-NNN IDs → update flow (see Mode C section below).
+---
+## Directory Structure
+```
+docs/specs/
+  <feature>/
+    <feature>.md                    # current state — always read this file
+    snapshots/                      # version history
+      <YYYY-MM-DD>.md
+      <YYYY-MM-DD>-<REF>.md
+```
+- `<feature>.md` is the single source of truth. All spec reads start from this file.
+- `snapshots/` contains full copies at points in time. Immutable — never edit a snapshot.
+- When a feature is split, sub-specs live in the same directory:
+  ```
+  docs/specs/billing/
+    billing.md                # root spec or overview
+    billing-checkout.md       # sub-spec
+    billing-refund.md         # sub-spec
+    snapshots/
+  ```
 ---
 ## Phase 0: Codebase Awareness
-Before writing anything:
-1. Scan existing code in the feature area — what files, functions, types already exist?
-2. Check `docs/specs/` — is there already a spec for this or a related feature?
-3. Check `docs/test-plans/` — any overlap with existing plans?
-4. Identify project patterns — test framework, naming conventions, directory structure.
+Before writing anything, run this checklist:
+| # | Action | How |
+|---|--------|-----|
+| P0-1 | **Keyword scan** | Grep the codebase for 3-5 keywords from the feature description. Note matching files, functions, types. |
+| P0-2 | **Related specs** | List `docs/specs/` directories. Read the main spec of any related feature. Is there overlap? |
+| P0-3 | **Dependency scan** | In the feature area, check imports/dependencies. What modules does this code touch? |
+| P0-4 | **Reusable utilities** | Look for existing helpers, validators, formatters, shared types that the new feature could reuse. List candidates. |
+| P0-5 | **Project patterns** | Identify test framework, naming conventions, directory structure from existing code. |
+| P0-6 | **Change Log** | If the feature exists, read its Change Log to understand evolution. |
+| P0-7 | **Knowledge graph** | If `codebase-memory-mcp` is available, use `search_code`, `get_architecture`, and `trace_call_path` to discover related code, understand architecture context, and trace dependencies — faster and more thorough than manual grep. |
+If `codebase-memory-mcp` MCP server is connected, prefer it for P0-1, P0-3, P0-4 — it provides indexed search, architecture overview, and call path tracing that are more reliable than ad-hoc grep.
+Record findings as bullet points — carry them into Phase 2 (Data Model, Constraints) and Phase 3 (ambiguity check).
 Don't plan in a vacuum. A spec that ignores existing code creates conflicts.
 ---
-## Phase 1: Draft the Spec (Mode B only)
+## Phase 1: Scope & Split Assessment
+Before writing the spec, assess size.
+**Input:** Feature description from user (Mode A) or current spec (Mode B).
+Mode C does not run Phase 1 — it uses its own flow (see Mode C section).
+**Split rules:**
+| # | Condition | Action |
+|---|-----------|--------|
+| T1 | Feature has >7 expected stories | MUST split |
+| T2 | Feature has >20 expected AS | MUST split |
+| T3 | Stories belong to different domains (e.g. payment + notification) | SHOULD split |
+| T4 | A story can ship independently without depending on other stories | SHOULD split |
+| T5 | Stories share a data model or state machine | DO NOT split |
+| T6 | Splitting would duplicate >50% of context (entities, constraints) | DO NOT split |
+"MUST" = mandatory split, inform user.
+"SHOULD" = suggest split, ask user.
+"DO NOT" = keep together, unless user requests split.
+If splitting:
+- Create the feature directory, place sub-specs in the same directory.
+- Each sub-spec must be self-contained (own overview, relevant data model, constraints).
+- No sub-spec should depend on another sub-spec to be understood.
+---
+## Phase 2: Draft the Spec (Mode A + B)
+**Mode A:** Create a new spec at `docs/specs/<feature>/<feature>.md` using the template below. Include stories + acceptance scenarios.
+**Mode B:** Read existing spec, add `## Stories` section with AS following depth rules.
+### Spec Template
+```markdown
+# Spec: <Feature Name>
+**Created:** <$(date +%Y-%m-%d)>
+**Last updated:** <$(date +%Y-%m-%d)>
+**Status:** Draft | Active | Deprecated
+**Snapshot limit:** <N, optional — default 5>
+## Overview
+[what, why, who — 2-3 sentences]
+## Data Model
+[entities, attributes, relationships — if applicable]
+## Stories
+### S-001: <Story name> (P0)
+**Description:** [user story]
+**Source:** [optional: ticket/issue ref]
+**Acceptance Scenarios:**
+AS-001: <short description>
+- **Given:** [state]
+- **When:** [action]
+- **Then:** [expected]
+- **Data:** [test data]
+AS-002: <short description>
+- **Given:** [error state]
+- **When:** [action]
+- **Then:** [error handling]
+- **Data:** [edge case data]
+### S-002: <Story name> (P1)
+**Description:** [user story]
+**Source:** [optional]
+**Acceptance Scenarios:**
+AS-003: <short description>
+- **Given:** [state]
+- **When:** [action]
+- **Then:** [expected]
+### S-003: <Story name> (P2)
+**Description:** [user story]
+**Acceptance Scenarios:**
+AS-004: <short description>
+- [flow description + expected behavior]
+## Constraints & Invariants
+[rules that must ALWAYS hold]
+## Change Log
+| Date | Change | Ref |
+|------|--------|-----|
+| <$(date +%Y-%m-%d)> | Initial creation | -- |
+```
+### Acceptance Scenario Depth
+| Story priority | AS must contain | AS optional |
+|---------------|----------------|-------------|
+| P0 | Given + When + Then + Data + Setup | -- |
+| P1 | Given + When + Then | Data, Setup |
+| P2 | 1-2 line flow description + expected | Separate Given/When/Then |
+**AS rules:**
+- Every P0 story must have at least 1 happy path AS + 1 error path AS.
+- Every P1 story must have at least 1 happy path AS.
+- Every P2 story must have at least 1 AS.
+- No orphan AS — every AS belongs to exactly 1 story.
+Match depth to complexity. Simple CRUD = 3 stories. Complex auth = full template.
+### Writing Instructions
+When generating stories and acceptance scenarios:
+**DO:**
+- Write AS that test one specific behavior each. If it fails, the developer knows exactly what broke.
+- Use concrete values in Given/When/Then — `Given: user with balance $50` not `Given: user with some balance`.
+- Name edge cases explicitly — `AS-005: Payment with insufficient funds` not `AS-005: Payment error`.
+- Each AS should be independent — no AS depends on another running first.
+- Include the boundary — `Given: cart with 0 items` and `Given: cart with 999 items`, not just `Given: cart with items`.
+**DO NOT produce:**
+- Vague AS: "Test that the feature works" — every AS must specify Given, When, Then (or a concrete flow for P2).
+- Excessive AS: 30+ scenarios for simple CRUD — over-testing wastes time and creates maintenance burden.
+- Implementation-testing AS: "Test that the database query uses an index" — test behavior, not internals.
+- Duplicate AS: two scenarios verifying the same behavior with trivially different inputs.
+- Framework-testing AS: "Test that the router handles the path" — test YOUR logic, not the framework.
+### Spec Section Guidelines
+Include only sections that apply. Skip sections that add no value:
+- **Data Model** — skip if feature has no persistent data or entities.
+- **Constraints & Invariants** — skip if no rules must always hold.
+- Don't generate filler for sections that don't apply.
-Create at `docs/specs/<feature-name>.md`. Include these sections (skip any that don't apply):
+### Consistency Check (after drafting)
-- **Overview** — what, why, who. 2-3 sentences.
-- **Data Model** — entities, attributes, relationships (table format)
-- **Use Cases** — UC-NNN with actor, preconditions, flow, postconditions, error cases. Each use case contains:
-  - **FR-NNN** (Functional Requirements) — specific behaviors the system must exhibit
-  - **SC-NNN** (Success Criteria) — measurable non-functional targets (performance, limits)
-- **State Machine** — states and valid transitions (if applicable)
-- **Settings/Configuration** — configurable behavior and defaults
-- **Constraints & Invariants** — rules that must ALWAYS hold
-- **Error Handling** — how errors surface to users and are logged
-- **Security Considerations** — auth, authorization, data sensitivity
+Before showing the draft, verify:
-Match depth to complexity. Simple CRUD = 1 paragraph overview + 3 use cases. Complex auth system = full template. Don't generate filler for sections that don't apply.
+| # | Check | On failure → |
+|---|-------|-------------|
+| CC1 | Every story has at least 1 AS | Add missing AS |
+| CC2 | Every AS belongs to exactly 1 story | Assign orphan AS or delete |
+| CC3 | P0 stories have error path AS | Add error AS if missing |
+| CC4 | No 2 AS test the same behavior | Merge or delete duplicate |
+| CC5 | Constraints have AS verifying them | Add AS for uncovered constraints |
+| CC6 | Story count ≤7, AS count ≤20 | Go back to Phase 1 and split |
-Show the draft to the user. Wait for confirmation before generating the test plan.
+All checks must pass before showing the draft to the user.
+Show the draft to the user. Wait for confirmation before proceeding.
 ---
-## Phase 2: Clarify Ambiguities
+## Phase 3: Clarify Ambiguities
-Before generating the test plan, scan the spec for gaps. A test plan built on a vague spec produces vague tests.
+Before finalizing, scan the spec for gaps. Check BOTH the spec content AND the acceptance scenarios:
 | Lens | What to look for |
 |------|-----------------|
-| **Behavioral gaps** | Missing user actions, undefined system responses, incomplete flows |
-| **Data & persistence** | Undefined entities, missing relationships, unclear storage/lifecycle |
-| **Auth & access** | Who can do what is unclear, missing role definitions |
-| **Non-functional** | Vague adjectives without metrics ("fast", "secure", "scalable") — add SC-NNN with numbers |
-| **Integration** | Third-party API assumptions, unstated dependencies, SLA gaps |
-| **Concurrency & edge cases** | Multi-user scenarios, boundary conditions, error paths not addressed |
+| Behavioral gaps | Missing user actions, undefined system responses, incomplete flows. Which stories lack an error path AS? |
+| Data & persistence | Undefined entities, missing relationships, unclear storage/lifecycle |
+| Auth & access | Who can do what is unclear, missing role definitions |
+| Non-functional | Vague adjectives without metrics ("fast", "secure", "scalable") — add SC-NNN with concrete numbers |
+| Integration | Third-party API assumptions, unstated dependencies, SLA gaps |
+| Concurrency & edge cases | Multi-user scenarios, boundary conditions, error paths not addressed |
+| AS completeness | Which AS is missing Given or Then? |
+| AS overlap | Do 2 AS test the same behavior? |
+| Story orphans | Which story has no AS? |
+| Priority consistency | P0 story with only 1 happy path AS? |
+| Constraint coverage | Which constraint has no AS verifying it? |
 Identify the top 3-5 ambiguities (most impactful first). For each, ask the user a targeted question with 2-4 concrete options and a recommendation.
-If the spec is clear and complete, 0 questions is valid. Don't manufacture ambiguity.
+If 0 questions remain, you MUST state why — not just "spec is clear." Cite at minimum:
+- **Edge cases checked:** which boundary conditions were considered and found covered.
+- **Error paths checked:** which failure modes were verified to have AS.
+- **Integration points checked:** which external dependencies were reviewed.
+- One-line verdict per lens from the table above that had no findings.
+Example: *"0 questions. Edge cases: cart-empty and cart-max covered by AS-003/AS-004. Error paths: payment failure covered by AS-006. Auth: single-role feature, no ambiguity. No third-party integrations."*
+Don't manufacture ambiguity — but don't skip the justification either.
+**Present all questions (or the 0-question justification) to the user. Wait for answers before continuing.**
 Write clarifications back into the spec under `## Clarifications — <date>`.
-Then proceed to test plan generation.
+Update any affected stories or AS to reflect the user's answers.
+Then proceed to summary.
 ---
-## Phase 3: Generate the Test Plan
+## Phase 4: Summary
-Read the spec. For each section, extract:
-1. Use cases → at least 1 test (happy path) + 1 test (error path) each
-2. State transitions → test valid AND invalid transitions
-3. Constraints → test they hold under edge conditions
-4. Settings → test default AND non-default values
-5. Cross-cutting concerns (auth, validation) → integration-level tests
+Show:
+- Story counts (P0/P1/P2)
+- AS count
+- Directory structure created
+- Implementation order: which stories to implement first (by priority + dependency)
+- Next steps: "Implement stories in order. Use `/mf-test` to verify each story. For complex specs, run `/mf-challenge` first."
-Prioritize by risk: data loss/security = P0, error handling = P1, cosmetic/rare = P2.
+---
-### Output
+## Mode C: Update Flow
-Write to `docs/test-plans/<feature-name>.md`:
+> **⛔ CRITICAL — MANDATORY ORDER:**
+> Snapshot MUST be created **BEFORE** updating the spec.
+> If you update the spec first then create a snapshot → the snapshot contains the new content, old version is lost.
+> Correct order: C2 (classify) → C3 (snapshot) → C4 (report) → C5 (apply changes).
+> NEVER reverse the order of C3 and C5.
-```markdown
-# Test Plan: <Feature Name>
+### C0: Read current state
+Read `<feature>.md`. This is the current truth.
+### C1: Identify changes
+Compare the requested changes against the current spec. List:
+- Stories: added / modified / removed / unchanged
+- AS: added / modified / removed / unchanged
+- Constraints: added / modified / removed / unchanged
+### C2: Classification
+Walk through table M1-M6. If ANY condition is true → Major.
+| # | Condition | Example |
+|---|-----------|---------|
+| M1 | New story added | Adding S-004: Subscription |
+| M2 | Story removed | Removing S-002: Invoice |
+| M3 | Story priority changed | S-002 from P1 → P0 |
+| M4 | Story's main flow changed (Given or When changed) | AS-003 Given changes state, or When changes action |
+| M5 | Expected behavior changed (Then changed) for a P0 story | AS-001 Then changes result |
+| M6 | Constraint/invariant added or removed | Adding "balance must not be negative" |
+Minor = NONE of M1-M6 apply. Examples: typo fix, rewording without meaning change, adding/editing Data fields, formatting, adding Source ref.
-**Spec:** docs/specs/<feature-name>.md
-**Generated:** <$(date +%Y-%m-%d)>
+**Major → create snapshot before updating.**
+**Minor → no snapshot. Update directly.**
-## Test Cases
+> **⛔ MUST check ALL 6 conditions M1-M6.** Do not stop early.
+> Common mistake: check M1 = false, M2 = false → conclude Minor without checking M3-M6.
+> Correct: walk through M1 to M6 completely. If ANY is true → Major.
-| ID | Priority | Type | UC | FR/SC | Description | Expected |
-|----|----------|------|----|-------|-------------|----------|
-| TC-001 | P0 | unit | UC-001 | FR-001 | Valid login returns token | 200 + JWT |
-| TC-002 | P0 | unit | UC-001 | FR-002 | Wrong password returns 401 | 401 + error msg |
+### C3: Snapshot (if Major)
-## Implementation Order
-1. TC-001, TC-002 (no dependencies — start here)
-2. TC-003+ (depend on setup from earlier tests)
+If Major → create snapshot:
-## Coverage Notes
-- Highest risk areas: ...
-- Existing code needing modification: [file paths]
+**Step 1:** Copy file using shell command (bit-perfect, not through LLM):
+```bash
+mkdir -p docs/specs/<feature>/snapshots
+cp docs/specs/<feature>/<feature>.md docs/specs/<feature>/snapshots/<YYYY-MM-DD>.md
 ```
-**Priority:** P0 = must have (blocks release), P1 = should have, P2 = nice to have.
-**Type:** `unit`, `integration`, `e2e`, `snapshot`, `performance`
+If ref available: `cp ... snapshots/<YYYY-MM-DD>-<REF>.md`
+If same-day snapshot exists: `cp ... snapshots/<YYYY-MM-DD>-2.md`
+**Step 2:** Prepend header to the snapshot file (using Edit):
-### What NOT to produce
-- "Test that the feature works" — too vague
-- 50+ test cases for simple CRUD — over-testing
-- Testing implementation details — brittle
-- Duplicate tests verifying same behavior
+```markdown
+# Snapshot: <Feature Name>
+**Date:** <YYYY-MM-DD>
+**Ref:** <ticket/issue if available, "--" otherwise>
+**Reason:** <M1|M2|M3|M4|M5|M6 — list which conditions triggered>
 ---
-## Phase 4: Summary
+```
+Header is added BEFORE the copied content. Do not modify any other content in the snapshot.
+> **⛔ Why `cp` instead of LLM copy:** Specs require 101% accuracy. LLM text copy risks
+> dropping lines, altering formatting, truncating long content. `cp` is bit-perfect.
-Show: test case counts (P0/P1/P2), implementation order, estimated scope.
-Next steps: "Use `/mf-test` after each chunk. For complex plans, run `/mf-challenge` first."
+**Step 3:** Rotate snapshots. Check the spec frontmatter for `Snapshot limit: N`. If absent, default to **5**.
+After creating a new snapshot, if `snapshots/` contains more files than the limit:
+- Sort by timestamp in filename.
+- Delete oldest files until count equals the limit.
+- Only delete snapshot files. Log deletion in Change Log: `"Snapshot <filename> rotated out"`.
+If Minor: skip C3 entirely.
+**Snapshots are immutable.** Never edit a created snapshot. Wrong snapshot → create a new one, delete the wrong one.
+**mf-plan creates snapshots. Developers do not create them manually.** Developers do not decide, intervene, or skip.
+### C4: Change report
+Display to the user:
+```markdown
+## Change Report: <feature>
+**Classification:** Major / Minor
+**Snapshot:** Created <filename> / Not needed
+### Changes
+| Item | Action | Detail |
+|------|--------|--------|
+| S-002 | Priority change | P1 → P0 |
+| AS-003 | Updated | Then changed |
+| S-004 | Added | Subscription (P1) |
+### Unchanged
+S-001, S-003
+```
+> **⛔ MUST wait for user confirmation before applying.**
+> Do not show the report and apply in the same step.
+> User has the right to reject or modify the change report.
+### C5: Apply changes
+- Update the spec directly.
+- Update `Last updated`.
+- Write to Change Log.
+- New AS use the next sequential ID (never reuse deleted IDs).
+- New AS follow the same Writing Instructions as Phase 2 (concrete values, one behavior per AS, no vague/duplicate/implementation-testing scenarios).
+> **⛔ Change Log MUST be updated at this step.**
+> Common mistake: update the spec, forget to write to Change Log.
+> Every C5 execution → Change Log MUST have a new row. No exceptions.
+> (Exception: non-semantic changes — C7 — do not write to Change Log.)
+### C6: Consistency check
+After updating, verify:
+| # | Check | On failure → |
+|---|-------|-------------|
+| CC1 | Every story has at least 1 AS | Add missing AS |
+| CC2 | Every AS belongs to exactly 1 story | Assign orphan AS or delete |
+| CC3 | P0 stories have error path AS | Add error AS if missing |
+| CC4 | No 2 AS test the same behavior | Suggest merge or delete duplicate |
+| CC5 | Constraints have AS verifying them | Add AS for uncovered constraints |
+| CC6 | Story count ≤7, AS count ≤20 | Suggest splitting spec (Phase 1) |
+> **⛔ Consistency check is NOT optional.**
+> Run CC1-CC6 after EVERY update (Major and Minor).
+> Common mistake: finish update, looks fine → skip consistency check.
+> CC6 (size check) is especially easy to skip — MUST check after every story/AS addition.
+If any check fails → fix or report to user. NEVER skip.
+### C7: Non-semantic changes
+If the change is only typo, formatting, or wording that does NOT change behavior:
+- Edit directly, do not run C2-C6.
+- Do not write to Change Log.
+- Do not create snapshot.
+Criteria for "non-semantic": Given, When, Then, priority, constraint **DO NOT** change in meaning.
+> **⛔ When in doubt whether "non-semantic or behavioral?" → treat as behavioral.**
+> Common mistake: LLM classifies a Then change as "rewording" to avoid snapshot overhead.
+> Test: if a developer reads the AS before and after the change and would write different code → it is behavioral.
+### C8: Archival (all stories removed)
+If a Mode C update results in ALL stories being removed from a spec:
+1. Create a snapshot per C3 (this is a Major change — M2 applies).
+2. Move the entire feature directory to `docs/specs/_archived/`:
+   ```bash
+   mkdir -p docs/specs/_archived
+   mv docs/specs/<feature> docs/specs/_archived/$(date +%Y-%m-%d)-<feature>
+   ```
+3. The archived directory retains all snapshots and the final spec state.
+4. Log in Change Log before archiving: `"Feature archived — all stories removed"`.
+Archived specs are read-only. To resurrect a feature, copy from `_archived/` back to `docs/specs/` and run `/mf-plan` in Mode A.
+---
 ## Naming Convention
-Spec and test plan MUST share the same filename:
 ```
-docs/specs/<feature-name>.md       ← kebab-case, 2-3 words
-docs/test-plans/<feature-name>.md  ← same name
+docs/specs/<feature>/              ← kebab-case, 2-3 words
+  <feature>.md                     ← same name as directory
+  snapshots/
+    YYYY-MM-DD.md
+    YYYY-MM-DD-<REF>.md
 ```
-- Use feature name, not module name: `user-auth.md` not `AuthService.md`
+- Feature name, not module name: `user-auth/` not `AuthService/`
+- Sub-specs when splitting: `<feature>-<sub>.md` in the same directory
 - No prefix/suffix: `user-auth.md` not `spec-user-auth.md`
-**Requirement IDs** — sequential per spec:
-- `UC-001` Use Case, `FR-001` Functional Requirement, `SC-001` Success Criteria, `TC-001` Test Case
-- Every TC must reference at least one FR or SC for traceability.
+**ID rules:**
+- `S-NNN` Story — sequential per spec, starting from S-001
+- `AS-NNN` Acceptance Scenario — sequential per spec, across all stories, starting from AS-001
+- `FR-NNN` Functional Requirement — if needed
+- `SC-NNN` Success Criteria — if needed
+- Deleted IDs must never be reused
+- **Sub-spec numbering is local.** Each sub-spec starts its own S-001, AS-001 sequence.
+  Sub-specs are self-contained (Phase 1 rule), so IDs need not be globally unique.
+- **Cross-references between sub-specs** use the sub-spec name as prefix:
+  `billing-refund:AS-002` refers to AS-002 in `billing-refund.md`.
+  Avoid cross-references where possible — if you need many, the split may be wrong.
+---
 ## Rules
-1. **Spec-first.** Test plan derives from spec, never from code.
-2. **Codebase-aware.** Don't plan features that already exist.
-3. **Actionable.** Every test case must be unambiguous enough to implement directly.
-4. **Proportional.** Simple feature = simple plan. Don't over-engineer CRUD.
-5. **Traceable.** Every test links to a use case. No orphan tests.
-6. **Consistent names.** Spec and test plan always share the same filename.
+1. **Spec-first.** Code serves the spec, not the other way around.
+2. **Single file = current truth.** `<feature>.md` always reflects the current state.
+3. **Codebase-aware.** Don't plan features that already exist.
+4. **Actionable.** Every AS must be clear enough to implement directly.
+5. **Proportional.** Simple feature = simple spec. Don't over-engineer CRUD.
+6. **Traceable.** Every AS belongs to 1 story. No orphan AS.
+7. **Bounded.** Spec exceeding 7 stories or 20 AS must be split.
+8. **Snapshot = mf-plan's job.** Developers do not create, delete, or edit snapshots.
+9. **Classification = checklist.** Major/Minor decided by table M1-M6, not judgment.
+10. **ID immutable.** Assigned IDs never change, never get reused.
+---
+## Traps — Common Mistakes That MUST Be Avoided
+| # | Trap | Consequence | Rule violated |
+|---|------|-------------|--------------|
+| TRAP-1 | Update spec BEFORE creating snapshot | Snapshot contains new content, old version lost | Order C3→C5 |
+| TRAP-2 | Check M1-M2 then stop, skip M3-M6 | Major change classified as Minor, snapshot missed | Classification M1-M6 |
+| TRAP-3 | Skip consistency check (CC1-CC6) | Story without AS, P0 missing error path, spec bloat undetected | C6 Consistency |
+| TRAP-4 | Classify behavioral change as "non-semantic" | Important change not snapshotted, not logged | C7 Non-semantic |
+| TRAP-5 | Apply changes without waiting for user confirmation | User loses control, wrong changes can't be rolled back | C4 Change report |
+| TRAP-6 | Update spec but forget to write Change Log | Change history lost, no one knows what happened | C5 Apply |
+| TRAP-7 | Reuse deleted ID (S-003 deleted then assigned to new story) | Confusion with old references in code, commits, conversations | ID rules |
+| TRAP-8 | LLM copies spec content instead of using `cp` for snapshot | Lines dropped, formatting altered, truncation — inaccurate snapshot | C3 Snapshot |
+| TRAP-9 | Skip Phase 1 (Scope & Split) for large features | Spec bloats >7 stories, hard to maintain, hard to review | Phase 1 |
+| TRAP-10 | Write P2-depth AS for a P0 story | P0 story lacks Given/When/Then/Data, developer can't implement | AS Depth |

package/templates/.claude/commands/mf-review.md CHANGED Viewed

@@ -7,7 +7,7 @@ Pre-merge code review — security, correctness, spec alignment.
    BASE=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||') || BASE="main"
    git log --oneline "$BASE"...HEAD
    ```
-2. Check for spec in `docs/specs/` and test plan in `docs/test-plans/` — review against INTENT.
+2. Check for spec in `docs/specs/<feature>/<feature>.md` — review against INTENT.
 3. Read the diff: `git diff "$BASE"...HEAD`
 If `$ARGUMENTS` provided → scope to those files only.
@@ -50,11 +50,15 @@ Spend 60% of analysis on the primary focus. Cover all categories, but proportion
 - **Null safety:** Optionals used without guards? `object!.property` without nil check?
 ### Spec-Test Alignment (Medium)
-- Source changed but no spec update in `docs/specs/`? → flag
+- Source changed but no spec update in `docs/specs/<feature>/`? → flag
 - Source changed but no test update? → flag
-- Spec changed but tests not updated? → flag
+- Spec changed but acceptance scenarios or tests not updated? → flag
 - Code removed but dead tests remain? → flag
 - Spec contains vague requirements without metrics ("fast", "secure", "easy", "scalable")? → flag with suggestion to add SC-NNN with concrete numbers
+- **AS-to-test name check:** Read the spec's `## Stories` section. For each AS-NNN, check if a test file contains a test named or described with that AS ID or its short description. Flag:
+  - AS in spec with no matching test → "AS-NNN: \<description\> has no corresponding test"
+  - Test referencing an AS-NNN that no longer exists in the spec → "Test references removed AS-NNN"
+  Keep this lightweight — match on AS-NNN identifiers and story name substrings, not semantic analysis.
 ### Code Quality (Medium)
 - Dead code: removed functions still imported elsewhere?

package/templates/.claude/commands/mf-test.md CHANGED Viewed

@@ -1,4 +1,4 @@
-Write tests from test plan, compile, run, fix until green.
+Write tests from spec acceptance scenarios, compile, run, fix until green.
 ## Phase 0: Build Context
@@ -10,8 +10,7 @@ Write tests from test plan, compile, run, fix until green.
    If `$ARGUMENTS` provided → scope to that file or feature only.
    If no changes → "No source changes found. Specify a file or feature."
-2. **Read the test plan** in `docs/test-plans/` if it exists — this is your roadmap.
-3. **Read the spec** in `docs/specs/` if it exists — understand the INTENT behind the code.
+2. **Read the spec** at `docs/specs/<feature>/<feature>.md` — the `## Stories` section with acceptance scenarios is your roadmap. The `## Overview` and `## Constraints` sections tell you the INTENT behind the code.
 4. **Read existing tests** for the changed files — find patterns, fixtures, naming conventions. Don't duplicate.
 ---
@@ -86,14 +85,25 @@ If tests fail:
 ```
 Tests: X added, Y modified, Z unchanged
-Result: All passing ✓
+Result: All passing ✓ / N failing ✗
 Coverage: [critical uncovered paths if any]
 Files: [test files touched]
-Plan: [TC-001 ✓, TC-002 ✓, TC-005 new]
+Stories: [AS-001 ✓, AS-002 ✓, AS-005 new]
 ```
-If behavior changed: "Consider updating the spec in docs/specs/."
+If behavior changed: "Consider updating the spec in docs/specs/<feature>/<feature>.md."
+### Spec Gap Detection
+If a test fails due to an edge case, error path, or boundary condition that is NOT covered by any existing AS in the spec:
+1. State explicitly: **"This failure suggests a missing acceptance scenario."**
+2. Describe the gap: what behavior was tested, which story it belongs to, why no AS covers it.
+3. Prompt: **"Run `/mf-plan <spec-path> 'Add AS for <description>'` to add the missing scenario."**
+Do not silently fix the test and move on. A test that has no corresponding AS means the spec is incomplete — the spec must be updated first.
 ## Rules
 1. **Behavior over implementation.** Test what code DOES, not how.
 2. **Independent tests.** Each test sets up its own state, cleans up after.
+3. **Spec stays upstream.** If a test reveals a spec gap, update the spec before adding the test.