npm - claude-devkit-cli - Versions diffs - 1.3.3 → 1.3.5 - Mend

claude-devkit-cli 1.3.3 → 1.3.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9) hide show

package/README.md +17 -17
package/package.json +1 -1
package/templates/.claude/CLAUDE.md +2 -2
package/templates/.claude/commands/{mf-test.md → mf-build.md} +19 -3
package/templates/.claude/commands/mf-challenge.md +38 -27
package/templates/.claude/commands/mf-commit.md +35 -8
package/templates/.claude/commands/mf-fix.md +18 -1
package/templates/.claude/commands/mf-plan.md +55 -48
package/templates/docs/WORKFLOW.md +5 -5

package/README.md CHANGED Viewed

@@ -66,7 +66,7 @@ claude
 /mf-plan "describe your feature here"
 # 4. Write code, then test
-/mf-test
+/mf-build
 # 5. Review before merging
 /mf-review
@@ -138,7 +138,7 @@ your-project/
 │   └── commands/
 │       ├── mf-plan.md         ← /mf-plan command
 │       ├── mf-challenge.md    ← /mf-challenge command
-│       ├── mf-test.md         ← /mf-test command
+│       ├── mf-build.md        ← /mf-build command
 │       ├── mf-fix.md          ← /mf-fix command
 │       ├── mf-review.md       ← /mf-review command
 │       └── mf-commit.md       ← /mf-commit command
@@ -202,7 +202,7 @@ This removes hooks, commands, settings, and build-test.sh. It preserves `CLAUDE.
    → Generates spec with acceptance scenarios at docs/specs/<feature>/<feature>.md.
 2. Implement code in chunks.
-   After each chunk: /mf-test
+   After each chunk: /mf-build
    Repeat until green.
 3. /mf-review (before merge)
@@ -225,7 +225,7 @@ This removes hooks, commands, settings, and build-test.sh. It preserves `CLAUDE.
    Do NOT manually edit the spec before running /mf-plan.
 2. Implement the code change.
-   /mf-test
+   /mf-build
    Fix until green.
 3. /mf-review → /mf-commit
@@ -392,21 +392,21 @@ docs/specs/<feature>/
 **Token cost:** 15-30k (uses parallel subagents, doesn't bloat main context)
-### /mf-test — Write + Run Tests
+### /mf-build — TDD Delivery Loop
 **Usage:**
 ```
-/mf-test                              # test all changes vs base branch
-/mf-test src/api/users.ts             # test specific file
-/mf-test "user authentication"        # test specific feature
+/mf-build                              # build all changes vs base branch
+/mf-build src/api/users.ts             # build specific file
+/mf-build "user authentication"        # build specific feature
 ```
 **How it works:**
 1. **Phase 0: Build Context** — Finds changed files vs base branch, reads the spec (acceptance scenarios in `## Stories` section are the roadmap), reads existing tests for patterns, fixtures, and naming conventions. Doesn't duplicate what already exists.
-2. **Phase 1: Write Tests** — Creates or updates tests based on acceptance scenarios. Each test covers one concept, is independent, deterministic (no random, no time-dependent, no external calls), and has a clear name.
+2. **Phase 1: Write Failing Tests** — Creates tests from acceptance scenarios (RED). Each test covers one AS, is independent, deterministic (no random, no time-dependent, no external calls), and has a clear name.
 3. **Phase 2: Compile First** — Runs typecheck/compile before executing tests. Catches syntax errors early.
-4. **Phase 3: Run Tests** — Executes the test suite.
+4. **Phase 3: Implement + Run** — Implements story code, executes tests, drives to GREEN story by story.
 5. **Phase 4: Fix Loop** — If tests fail, fixes **test code only** (max 3 attempts, then hard stop and report). If tests expect X but code does Y, asks you whether to fix production code or adjust the test.
 6. **Phase 5: Report** — Summary with test counts, results, coverage, and files touched.
@@ -858,7 +858,7 @@ Then use: `/deploy staging`
 | Activity | Tokens | Frequency |
 |----------|--------|-----------|
-| `/mf-test` (incremental, 1-3 files) | 5–10k | Every code chunk |
+| `/mf-build` (incremental, 1-3 files) | 5–10k | Every code chunk |
 | `/mf-fix` (single bug) | 3–5k | As needed |
 | `/mf-commit` | 2–4k | Every commit |
 | `/mf-review` (diff-based) | 10–20k | Before merge |
@@ -868,9 +868,9 @@ Then use: `/deploy staging`
 ### Minimizing Token Usage
-- **Test incrementally.** `/mf-test` after each small chunk uses 5-10k. Waiting until everything is done then running `/mf-test` on a large diff uses 50k+.
-- **Use filters.** `/mf-test src/auth/login.ts` is cheaper than `/mf-test` on the whole project.
-- **Skip `/mf-plan` for tiny changes.** Under 5 lines with no behavior change? Just `/mf-test` and `/mf-commit`.
+- **Test incrementally.** `/mf-build` after each small chunk uses 5-10k. Waiting until everything is done then running `/mf-build` on a large diff uses 50k+.
+- **Use filters.** `/mf-build src/auth/login.ts` is cheaper than `/mf-build` on the whole project.
+- **Skip `/mf-plan` for tiny changes.** Under 5 lines with no behavior change? Just `/mf-build` and `/mf-commit`.
 - **Use `/mf-review` only before merge.** Not after every commit.
 ---
@@ -898,7 +898,7 @@ Then use: `/deploy staging`
 ### Wrong base branch
-**Symptom:** `/mf-test` or `/mf-review` compares against wrong branch.
+**Symptom:** `/mf-build` or `/mf-review` compares against wrong branch.
 **Check:**
 ```bash
@@ -928,13 +928,13 @@ export FILE_GUARD_EXCLUDE="*.generated.swift,*.pb.go,*.min.js,*.snap"
 ## 12. FAQ
 **Q: Do I need specs for every tiny change?**
-A: No. Changes under 5 lines with no behavior change can skip the spec. Just `/mf-test` and `/mf-commit`. The spec-first rule is for meaningful behavior changes.
+A: No. Changes under 5 lines with no behavior change can skip the spec. Just `/mf-build` and `/mf-commit`. The spec-first rule is for meaningful behavior changes.
 **Q: Can I use mocks in tests?**
 A: Only for external services you can't run locally (third-party APIs, email services). Never mock your own code or database just to make tests pass faster.
 **Q: What if Claude writes a test that tests the wrong thing?**
-A: This usually means the spec is ambiguous. Clarify the spec first, then re-run `/mf-test`. Good specs produce good tests.
+A: This usually means the spec is ambiguous. Clarify the spec first, then re-run `/mf-build`. Good specs produce good tests.
 **Q: Can I use this with other AI coding tools?**
 A: The commands and hooks are Claude Code-specific. The specs, workflow, and `build-test.sh` work with any tool or manual workflow.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "claude-devkit-cli",
-  "version": "1.3.3",
+  "version": "1.3.5",
   "description": "CLI toolkit for spec-first development with Claude Code — hooks, commands, guards, and test runners",
   "bin": {
     "claude-devkit": "./bin/devkit.js",

package/templates/.claude/CLAUDE.md CHANGED Viewed

@@ -13,8 +13,8 @@ Every change follows this cycle: **SPEC (with acceptance scenarios) → CODE + T
 | Trigger | Commands | Details |
 |---------|----------|---------|
-| New feature | `/mf-plan` → `/mf-challenge` (optional) → code in chunks → `/mf-test` each chunk | Start with spec or description |
-| Update feature | `/mf-plan <spec-path> "changes"` → code → `/mf-test` | Do NOT manually edit spec before /mf-plan |
+| New feature | `/mf-plan` → `/mf-challenge` (optional) → code in chunks → `/mf-build` each chunk | Start with spec or description |
+| Update feature | `/mf-plan <spec-path> "changes"` → code → `/mf-build` | Do NOT manually edit spec before /mf-plan |
 | Bug fix | `/mf-fix "description"` | Test-first: write failing test → fix → green |
 | Remove feature | `/mf-plan <spec-path> "remove stories"` → delete code + tests → build pass | /mf-plan handles snapshot before removal |
 | Pre-merge check | `/mf-review` | Diff-based quality gate |

package/templates/.claude/commands/{mf-test.md → mf-build.md} RENAMED Viewed

@@ -1,4 +1,4 @@
-Write tests from spec acceptance scenarios, compile, run, fix until green.
+TDD delivery loop — write failing tests from spec AS, implement story by story, drive to GREEN.
 ## Phase 0: Build Context
@@ -70,7 +70,23 @@ If `scripts/build-test.sh` doesn't exist, detect and run directly:
 If tests fail:
 1. Read error output. Is the test wrong or the production code wrong?
-2. If production code seems wrong → **ASK the user:** "Test expects X but code does Y. Fix production code or adjust test?"
+2. If production code seems wrong → use `AskUserQuestion`:
+```json
+{
+  "questions": [
+    {
+      "question": "Test expects <X> but code does <Y>. Which is correct?",
+      "header": "Test vs Code Mismatch",
+      "multiSelect": false,
+      "options": [
+        {"label": "Fix production code — the test is correct"},
+        {"label": "Adjust the test — the code behavior is intentional"}
+      ]
+    }
+  ]
+}
+```
 3. Fix test code only. Re-run. Max 3 attempts, then stop and report.
 **NEVER:**
@@ -99,7 +115,7 @@ If a test fails due to an edge case, error path, or boundary condition that is N
 1. State explicitly: **"This failure suggests a missing acceptance scenario."**
 2. Describe the gap: what behavior was tested, which story it belongs to, why no AS covers it.
-3. Prompt: **"Run `/mf-plan <spec-path> 'Add AS for <description>'` to add the missing scenario."**
+3. Prompt: **"Run `/mf-plan <spec-path> 'Add AS for <description>'` to add the missing scenario, then re-run `/mf-build`."**
 Do not silently fix the test and move on. A test that has no corresponding AS means the spec is incomplete — the spec must be updated first.

package/templates/.claude/commands/mf-challenge.md CHANGED Viewed

@@ -173,32 +173,43 @@ Include 1-sentence rationale for each disposition. Be honest — don't reject va
 Show adjudicated findings using the reviewer output format plus Disposition and Rationale fields.
-Then present the decision:
-```
-How to proceed with N accepted findings?
-  A) Apply all accepted — bulk-apply all fixes at once
-     Fit: N/10  |  Trade-off: fast vs. no per-finding control
-  B) Review each — walk through one by one, accept/reject/modify
-     Fit: N/10  |  Trade-off: precise control vs. slower
-  RECOMMENDATION: [A or B] — <reason based on finding count and severity>
-```
-Score Fit based on context: if most findings are High/Critical, recommend B (review each). If mostly Medium with clear fixes, recommend A.
-If user picks B: for each finding, present:
-```
-Finding [C-1]: <title>
-  A) Accept — apply the suggested fix
-  B) Modify — accept with changes (describe your modification)
-  C) Reject — skip this finding
-  RECOMMENDATION: [A/B/C] — <based on your adjudication>
+Then present the decision using the `AskUserQuestion` tool:
+```json
+{
+  "questions": [
+    {
+      "question": "How to proceed with N accepted findings? RECOMMENDATION: Choose A if mostly Medium fixes, B if any Critical/High findings.",
+      "header": "Apply Findings",
+      "multiSelect": false,
+      "options": [
+        {"label": "A) Apply all accepted — bulk-apply all fixes at once | Trade-off: fast vs. no per-finding control"},
+        {"label": "B) Review each — walk through one by one, accept/reject/modify | Trade-off: precise control vs. slower"}
+      ]
+    }
+  ]
+}
+```
+Score: if most findings are High/Critical, recommend B. If mostly Medium with clear fixes, recommend A.
+If user picks B: for each finding, use `AskUserQuestion`:
+```json
+{
+  "questions": [
+    {
+      "question": "Finding [C-1]: <title>\n<flaw summary>\nRECOMMENDATION: Choose A — <adjudication rationale>.",
+      "header": "Finding C-1",
+      "multiSelect": false,
+      "options": [
+        {"label": "A) Accept — apply the suggested fix"},
+        {"label": "B) Modify — accept with changes (describe your modification)"},
+        {"label": "C) Reject — skip this finding"}
+      ]
+    }
+  ]
+}
 ```
 ## Phase 7: Apply
@@ -215,7 +226,7 @@ Reviewers: N lenses
 Findings: X total → Y accepted, Z rejected
 Severity: N Critical, N High, N Medium
 Files modified: [list]
-Next: /mf-test to implement, or /mf-plan to regenerate if major changes.
+Next: /mf-build to implement, or /mf-plan to regenerate if major changes.
 ```
 If a reviewer returns > 7 findings, take only top 7 by severity. If a reviewer fails, proceed with remaining reviewers.

package/templates/.claude/commands/mf-commit.md CHANGED Viewed

@@ -22,9 +22,41 @@ echo "=== DEBUG ===" && \
 **Secrets (hard block):** If count > 0, show matched lines and STOP. Do not commit.
-**Debug code (soft warn):** If count > 0, show matched lines. Proceed only after user confirms they're intentional.
+**Debug code (soft warn):** If count > 0, show matched lines. Use `AskUserQuestion` to confirm:
+```json
+{
+  "questions": [
+    {
+      "question": "Found <N> debug statements (console.log, debugger, etc.) in the diff. Are these intentional?",
+      "header": "Debug Code",
+      "multiSelect": false,
+      "options": [
+        {"label": "Yes, intentional — proceed with commit"},
+        {"label": "No, remove them first"}
+      ]
+    }
+  ]
+}
+```
-**Large diff:** If > 10 files or > 300 lines, note: "Large commit — consider splitting for easier review." Continue unless user says to split.
+**Large diff:** If > 10 files or > 300 lines, use `AskUserQuestion` to confirm:
+```json
+{
+  "questions": [
+    {
+      "question": "Large commit detected (<N> files, <M> lines). Large commits are harder to review and revert.",
+      "header": "Large Commit",
+      "multiSelect": false,
+      "options": [
+        {"label": "Proceed — commit everything as one"},
+        {"label": "Split — I'll stage specific files myself"}
+      ]
+    }
+  ]
+}
+```
 ---
@@ -67,12 +99,7 @@ Never stage: `.env`, credentials, build artifacts, generated files, binaries > 1
 ## Step 5 — Commit
 ```bash
-git commit -m "$(cat <<'EOF'
-type(scope): description
-Co-Authored-By: Claude <noreply@anthropic.com>
-EOF
-)"
+git commit -m "type(scope): description"
 ```
 **Do NOT push** unless user explicitly asks.

package/templates/.claude/commands/mf-fix.md CHANGED Viewed

@@ -29,7 +29,24 @@ bash scripts/build-test.sh --filter "<test name>"
 ```
 - **FAILS** → reproduced. Continue.
-- **PASSES** → hypothesis may be wrong. Ask: "Test passes — need different repro steps or environment details."
+- **PASSES** → hypothesis may be wrong. Use `AskUserQuestion`:
+```json
+{
+  "questions": [
+    {
+      "question": "The test passes with current code — the bug isn't reproduced yet. How to proceed?",
+      "header": "Test Passes Unexpectedly",
+      "multiSelect": false,
+      "options": [
+        {"label": "Provide different repro steps or environment details"},
+        {"label": "The bug may be environment-specific — describe the setup"},
+        {"label": "Skip test-first for this bug — fix directly"}
+      ]
+    }
+  ]
+}
+```
 ---

package/templates/.claude/commands/mf-plan.md CHANGED Viewed

@@ -2,21 +2,24 @@ Generate spec with acceptance scenarios from description or existing spec.
 ## Question Format
-When presenting a question to the user with multiple options, ALWAYS use this structured format:
-```
-Q<N>: <Plain-language problem statement — what needs deciding and why>
-  A) <option> — <1-line rationale>
-     Fit: X/10  |  Trade-off: <what you gain vs. lose>
-  B) <option> — <1-line rationale>
-     Fit: X/10  |  Trade-off: <what you gain vs. lose>
-  C) <option> — <1-line rationale> (if applicable)
-     Fit: X/10  |  Trade-off: <what you gain vs. lose>
-  RECOMMENDATION: [A/B/C] — <one-sentence reason>
+When presenting questions to the user with multiple options, use the `AskUserQuestion` tool.
+**Schema:**
+```json
+{
+  "questions": [
+    {
+      "question": "<plain-language problem statement — what needs deciding and why. Include RECOMMENDATION: Choose [X] because [one-line reason]>",
+      "header": "<short label>",
+      "multiSelect": false,
+      "options": [
+        {"label": "A) <option> — <1-line rationale> | Fit: X/10 | Trade-off: <gain vs. lose>"},
+        {"label": "B) <option> — <1-line rationale> | Fit: X/10 | Trade-off: <gain vs. lose>"},
+        {"label": "C) <option> — <1-line rationale> | Fit: X/10 | Trade-off: <gain vs. lose>"}
+      ]
+    }
+  ]
+}
 ```
 **Fit scoring calibration:**
@@ -29,9 +32,9 @@ Q<N>: <Plain-language problem statement — what needs deciding and why>
 Rules:
 - 2-4 options per question. Never more than 4.
 - Every option must have a Fit score AND a Trade-off. No score without rationale.
-- RECOMMENDATION is mandatory. Pick one. State why.
+- RECOMMENDATION is mandatory in the question text. Pick one. State why.
 - If two options score within 1 point, flag it: "Close call — A and B are both strong. Leaning A because [reason]."
-- Present all questions at once (not one-by-one) unless the answer to Q1 changes what Q2 should be.
+- Pass all questions in a single `AskUserQuestion` call (not one-by-one) unless the answer to Q1 changes what Q2 should be.
 ---
@@ -275,21 +278,23 @@ Before finalizing, scan the spec for gaps. Check BOTH the spec content AND the a
 | Priority consistency | P0 story with only 1 happy path AS? |
 | Constraint coverage | Which constraint has no AS verifying it? |
-Identify the top 3-5 ambiguities (most impactful first). Present each using the **Question Format** (see top of this file). Example:
-```
-Q1: Auth strategy not specified — spec mentions "logged-in users" but no auth mechanism.
-  A) Session-based auth (cookie) — traditional, simple server-side
-     Fit: 8/10  |  Trade-off: simple setup vs. harder to scale across services
-  B) JWT (stateless tokens) — API-friendly, no server session
-     Fit: 7/10  |  Trade-off: scalable vs. token revocation complexity
-  C) Defer — add auth story later when auth requirements are clearer
-     Fit: 5/10  |  Trade-off: unblocks now vs. may require spec rewrite later
-  RECOMMENDATION: A — single-service app, session auth is simplest path.
+Identify the top 3-5 ambiguities (most impactful first). Present all at once using the `AskUserQuestion` tool (see Question Format at top). Example call:
+```json
+{
+  "questions": [
+    {
+      "question": "Auth strategy not specified — spec mentions 'logged-in users' but no auth mechanism. RECOMMENDATION: Choose A — single-service app, session auth is simplest path.",
+      "header": "Auth Strategy",
+      "multiSelect": false,
+      "options": [
+        {"label": "A) Session-based auth (cookie) — traditional, simple server-side | Fit: 8/10 | Trade-off: simple setup vs. harder to scale across services"},
+        {"label": "B) JWT (stateless tokens) — API-friendly, no server session | Fit: 7/10 | Trade-off: scalable vs. token revocation complexity"},
+        {"label": "C) Defer — add auth story later when auth requirements are clearer | Fit: 5/10 | Trade-off: unblocks now vs. may require spec rewrite later"}
+      ]
+    }
+  ]
+}
 ```
 If 0 questions remain, you MUST state why — not just "spec is clear." Cite at minimum:
@@ -317,7 +322,7 @@ Show:
 - AS count
 - Directory structure created
 - Implementation order: which stories to implement first (by priority + dependency)
-- Next steps: "Implement stories in order. Use `/mf-test` to verify each story. For complex specs, run `/mf-challenge` first."
+- Next steps: "Implement stories in order. Use `/mf-build` to verify each story. For complex specs, run `/mf-challenge` first."
 ---
@@ -425,21 +430,23 @@ Display to the user:
 S-001, S-003
 ```
-Present the decision using **Question Format**:
-```
-Q1: Apply these changes to <feature> spec?
-  A) Apply all — accept the full change report as shown
-     Fit: N/10  |  Trade-off: fast vs. no per-item control
-  B) Review each — walk through changes one by one, accept/reject/modify
-     Fit: N/10  |  Trade-off: precise control vs. slower
-  C) Reject all — discard and start over
-     Fit: N/10  |  Trade-off: clean slate vs. loses work
-  RECOMMENDATION: [A or B] — <reason based on change count and complexity>
+Present the decision using the `AskUserQuestion` tool:
+```json
+{
+  "questions": [
+    {
+      "question": "Apply these changes to <feature> spec? RECOMMENDATION: Choose A — <reason based on change count and complexity>.",
+      "header": "Apply Changes",
+      "multiSelect": false,
+      "options": [
+        {"label": "A) Apply all — accept the full change report as shown | Fit: N/10 | Trade-off: fast vs. no per-item control"},
+        {"label": "B) Review each — walk through changes one by one, accept/reject/modify | Fit: N/10 | Trade-off: precise control vs. slower"},
+        {"label": "C) Reject all — discard and start over | Fit: N/10 | Trade-off: clean slate vs. loses work"}
+      ]
+    }
+  ]
+}
 ```
 > **⛔ MUST wait for user confirmation before applying.**

package/templates/docs/WORKFLOW.md CHANGED Viewed

@@ -22,7 +22,7 @@ Step 2 → (Optional) /mf-challenge docs/specs/<feature>/<feature>.md
           Skip for simple CRUD or small features.
 Step 3 → Implement in chunks. After each chunk:
-          /mf-test
+          /mf-build
           Repeat until chunk is green.
 Step 4 → /mf-review (before merge)
@@ -41,7 +41,7 @@ Step 1 → /mf-plan docs/specs/<feature>/<feature>.md "description of changes"
           snapshot first, then applies changes. Manual edits bypass snapshot protection.
 Step 2 → Implement code changes.
-          /mf-test
+          /mf-build
           Fix until green.
 Step 4 → /mf-review → /mf-commit
@@ -99,7 +99,7 @@ Is this a brand new feature (no existing spec or code)?
     │       │   └─ No → Update Feature workflow. Start with /mf-plan.
     │       │
     │       └─ Is the change very small (< 5 lines, behavior unchanged)?
-    │           └─ Yes → Skip spec update. Just /mf-test and /mf-commit.
+    │           └─ Yes → Skip spec update. Just /mf-build and /mf-commit.
 ```
 ---
@@ -170,7 +170,7 @@ Files to delete: [list]
 | Workflow | Estimated Tokens | When |
 |----------|-----------------|------|
-| `/mf-test` (incremental) | 5–10k | Daily, after each code chunk |
+| `/mf-build` (incremental) | 5–10k | Daily, after each code chunk |
 | `/mf-fix` (single bug) | 3–5k | As bugs arise |
 | `/mf-commit` | 2–4k | Each commit |
 | `/mf-review` (diff-based) | 10–20k | Before merge |
@@ -178,7 +178,7 @@ Files to delete: [list]
 | `/mf-challenge` (adversarial) | 15–30k | After /mf-plan, for complex features |
 | Full audit (manual) | 100k+ | Before release, quarterly |
-**Rule of thumb:** Daily work uses templates + `/mf-test` → low token cost.
+**Rule of thumb:** Daily work uses templates + `/mf-build` → low token cost.
 Save `/mf-plan` and full audits for significant milestones.
 ---