claude-devkit-cli 1.1.0 → 1.1.1

package/README.md ADDED
@@ -0,0 +1,931 @@
1
+ # claude-devkit-cli
2
+
3
+ A lightweight, spec-first development toolkit for [Claude Code](https://claude.ai/code). It enforces the cycle **spec → test plan → code + tests → build pass** through custom commands, automatic hooks, and a universal test runner.
4
+
5
+ **Works with:** Swift, TypeScript/JavaScript, Python, Rust, Go, Java/Kotlin, C#, Ruby.
6
+ **Dependencies:** No additional packages. Requires only the Claude Code CLI, Node.js, Git, and Bash.
7
+
8
+ ---
9
+
10
+ ## Table of Contents
11
+
12
+ 1. [Philosophy](#1-philosophy)
13
+ 2. [Quick Start](#2-quick-start)
14
+ 3. [Setup](#3-setup)
15
+ 4. [Daily Workflows](#4-daily-workflows)
16
+ 5. [Commands Reference](#5-commands-reference)
17
+ 6. [Automatic Guards (Hooks)](#6-automatic-guards-hooks)
18
+ 7. [Build Test Script](#7-build-test-script)
19
+ 8. [Specs & Test Plans Format](#8-specs--test-plans-format)
20
+ 9. [Customization](#9-customization)
21
+ 10. [Token Cost Guide](#10-token-cost-guide)
22
+ 11. [Troubleshooting](#11-troubleshooting)
23
+ 12. [FAQ](#12-faq)
24
+
25
+ ---
26
+
27
+ ## 1. Philosophy
28
+
29
+ ### The Core Cycle
30
+
31
+ ```
32
+ SPEC → TEST PLAN → CODE + TESTS → BUILD PASS
33
+ ```
34
+
35
+ Every code change — feature, fix, or removal — follows this cycle. The spec is the source of truth. If code contradicts the spec, the code is wrong.
36
+
37
+ ### Why Spec-First?
38
+
39
+ - **Prevents drift.** Without a spec, code becomes the only documentation and diverges from intent over time.
40
+ - **Tests have purpose.** Test plans derived from specs test behavior, not implementation details. This means tests survive refactoring.
41
+ - **AI writes better code.** When Claude Code has a spec to reference, it generates more accurate implementations and more meaningful tests.
42
+ - **Reviews are grounded.** Reviewers can check code against the spec rather than guessing at intent.
43
+
44
+ ### Principles
45
+
46
+ 1. **Specs are source of truth** — Code changes require spec updates first.
47
+ 2. **Incremental, not big-bang** — Test after each code chunk, not after everything is done.
48
+ 3. **Tests travel with code** — Every PR includes production code + tests + spec updates.
49
+ 4. **Build pass is the gate** — Nothing merges with failing tests.
50
+ 5. **Everything in the repo** — Specs, plans, tests, and code are version-controlled and reviewable.
51
+
52
+ ---
53
+
54
+ ## 2. Quick Start
55
+
56
+ **Time needed: 5 minutes.**
57
+
58
+ ```bash
59
+ # 1. Install dev-kit into your project
60
+ npx claude-devkit-cli init .
61
+
62
+ # 2. Open your project in Claude Code
63
+ claude
64
+
65
+ # 3. Create your first spec and test plan
66
+ /plan "describe your feature here"
67
+
68
+ # 4. Write code, then test
69
+ /test
70
+
71
+ # 5. Review before merging
72
+ /review
73
+
74
+ # 6. Commit
75
+ /commit
76
+ ```
77
+
78
+ That's it. The CLI auto-detects your project type and configures everything.
79
+
80
+ ---
81
+
82
+ ## 3. Setup
83
+
84
+ ### Prerequisites
85
+
86
+ | Tool | Required | Why |
87
+ |------|----------|-----|
88
+ | **Claude Code CLI** | Yes | Runs the commands and hooks |
89
+ | **Git** | Yes | Change detection, commit workflow |
90
+ | **Node.js** (18+) | Yes | File guard hook, JSON parsing |
91
+ | **Bash** (4+) | Yes | Path guard hook, build-test script |
92
+ | **Language toolchain** | Yes | Whatever your project uses (Swift, npm, pytest, etc.) |
93
+
94
+ ### Installation
95
+
96
+ **Option A: One-command install** (recommended)
97
+
98
+ ```bash
99
+ npx claude-devkit-cli init .
100
+ ```
101
+
102
+ **Option B: Global install**
103
+
104
+ ```bash
105
+ npm install -g claude-devkit-cli
106
+
107
+ # Then, in any project:
108
+ cd my-project
109
+ claude-devkit init .
110
+ ```
111
+
112
+ **Option C: Force re-install** (overwrites existing files)
113
+
114
+ ```bash
115
+ npx claude-devkit-cli init --force .
116
+ ```
117
+
118
+ **Option D: Selective install** (only specific components)
119
+
120
+ ```bash
121
+ npx claude-devkit-cli init --only hooks,commands .
122
+ ```
123
+
124
+ ### What Gets Installed
125
+
126
+ ```
127
+ your-project/
128
+ ├── .claude/
129
+ │ ├── CLAUDE.md ← Project rules hub
130
+ │ ├── settings.json ← Hook wiring
131
+ │ ├── hooks/
132
+ │ │ ├── file-guard.js ← Warns on large files
133
+ │ │ ├── path-guard.sh ← Blocks wasteful Bash paths
134
+ │ │ ├── glob-guard.js ← Blocks broad glob patterns
135
+ │ │ ├── comment-guard.js ← Blocks placeholder comments
136
+ │ │ ├── sensitive-guard.sh ← Blocks access to secrets
137
+ │ │ └── self-review.sh ← Quality checklist on stop
138
+ │ └── commands/
139
+ │ ├── plan.md ← /plan command
140
+ │ ├── challenge.md ← /challenge command
141
+ │ ├── test.md ← /test command
142
+ │ ├── fix.md ← /fix command
143
+ │ ├── review.md ← /review command
144
+ │ └── commit.md ← /commit command
145
+ ├── scripts/
146
+ │ └── build-test.sh ← Universal test runner
147
+ └── docs/
148
+ ├── specs/ ← Your specs go here
149
+ ├── test-plans/ ← Generated test plans
150
+ └── WORKFLOW.md ← Process reference
151
+ ```
152
+
153
+ ### Post-Install Configuration
154
+
155
+ The CLI auto-detects your project type and fills in `CLAUDE.md`. Verify it's correct:
156
+
157
+ ```bash
158
+ cat .claude/CLAUDE.md
159
+ ```
160
+
161
+ Look for the **Project Info** section. Ensure language, test framework, and directories are correct. Edit manually if needed.
162
+
163
+ ### Upgrade
164
+
165
+ ```bash
166
+ npx claude-devkit-cli upgrade
167
+ ```
168
+
169
+ The upgrade updates kit files but preserves any you have customized. Use `--force` to overwrite everything.
170
+
171
+ ```bash
172
+ # Check if update is available
173
+ npx claude-devkit-cli check
174
+
175
+ # See what changed
176
+ npx claude-devkit-cli diff
177
+
178
+ # View installed files and status
179
+ npx claude-devkit-cli list
180
+ ```
181
+
182
+ ### Uninstall
183
+
184
+ ```bash
185
+ npx claude-devkit-cli remove
186
+ ```
187
+
188
+ This removes hooks, commands, settings, and `build-test.sh`. It preserves `CLAUDE.md` (which you may have customized) and `docs/` (which contains your specs and plans).
189
+
190
+ ---
191
+
192
+ ## 4. Daily Workflows
193
+
194
+ ### New Feature
195
+
196
+ > When: Building something new — no existing code or spec.
197
+
198
+ ```
199
+ 1. /plan "description of the feature"
200
+ → Generates spec + test plan. Review both.
201
+
202
+ 2. Implement code in chunks.
203
+ After each chunk: /test
204
+ Repeat until green.
205
+
206
+ 3. /review (before merge)
207
+
208
+ 4. /commit
209
+ ```
210
+
211
+ **Example:**
212
+ ```
213
+ /plan "User authentication with email/password login, password reset via email, and session management with 24h expiry"
214
+ ```
215
+
216
+ ### Update Existing Feature
217
+
218
+ > When: Changing behavior of something that already exists.
219
+
220
+ ```
221
+ 1. Edit the spec first: docs/specs/<feature>.md
222
+
223
+ 2. /plan docs/specs/<feature>.md
224
+ → Updates the test plan with new/modified/removed cases.
225
+
226
+ 3. Implement the code change.
227
+ /test
228
+ Fix until green.
229
+
230
+ 4. /review → /commit
231
+ ```
232
+
233
+ ### Bug Fix
234
+
235
+ > When: Something is broken.
236
+
237
+ ```
238
+ 1. /fix "description of the bug"
239
+ → Writes failing test → fixes code → runs full suite.
240
+
241
+ 2. /commit
242
+ ```
243
+
244
+ **Example:**
245
+ ```
246
+ /fix "Search returns no results when query contains apostrophes like O'Brien"
247
+ ```
248
+
249
+ ### Remove Feature
250
+
251
+ > When: Deleting code, removing deprecated functionality.
252
+
253
+ ```
254
+ 1. Mark spec sections as removed in docs/specs/<feature>.md
255
+
256
+ 2. Delete production code + related tests.
257
+
258
+ 3. bash scripts/build-test.sh (run full suite)
259
+ Fix cascading breaks.
260
+
261
+ 4. /commit
262
+ ```
263
+
264
+ ---
265
+
266
+ ## 5. Commands Reference
267
+
268
+ ### /plan — Generate Spec + Test Plan
269
+
270
+ **Usage:**
271
+ ```
272
+ /plan docs/specs/auth.md # Mode A: from existing spec
273
+ /plan "user authentication with OAuth2" # Mode B: from description
274
+ /plan docs/specs/auth.md (after spec edit) # Mode C: update existing plan
275
+ ```
276
+
277
+ **Modes:**
278
+ - **Mode A** — Reads an existing spec, generates a test plan.
279
+ - **Mode B** — Drafts a spec from your description, asks for confirmation, then generates the test plan.
280
+ - **Mode C** — Updates an existing spec + test plan when requirements change.
281
+
282
+ **How it works:**
283
+
284
+ 1. **Phase 0: Codebase Awareness** — Scans existing code, `docs/specs/`, `docs/test-plans/`, and project patterns before planning. Prevents plans that conflict with existing implementations.
285
+ 2. **Phase 1: Write/Update Spec** — Generates a structured spec with sections: Overview, Data Model, Use Cases, State Machine, Constraints & Invariants, Error Handling, Security Considerations. Depth is proportional to complexity — simple CRUD gets 1 paragraph + 3 use cases, complex auth gets the full template.
286
+ 3. **Phase 2: Clarify Ambiguities** — Systematically finds gaps across behavioral, data, auth, non-functional, integration, and concurrency dimensions. Asks 3-5 targeted questions with 2-4 options each before proceeding. If the spec is clear and complete, 0 questions is valid.
287
+ 4. **Phase 3: Generate Test Plan** — Derives test cases from the spec (never from code).
288
+
289
+ **Traceability IDs:** Every requirement gets a traceable ID:
290
+ - `UC-NNN` — Use Cases
291
+ - `FR-NNN` — Functional Requirements
292
+ - `SC-NNN` — Security Constraints
293
+ - `TC-NNN` — Test Cases (each must reference at least one FR or SC)
294
+
295
+ **Test plan table format:**
296
+
297
+ | ID | Priority | Type | UC | FR/SC | Description | Expected |
298
+ |----|----------|------|----|-------|-------------|----------|
299
+ | TC-001 | P0 | unit | UC-001 | FR-001 | ... | ... |
300
+
301
+ Priorities: **P0** (must have), **P1** (should have), **P2** (nice to have).
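The traceability rule above (each `TC-NNN` row must reference at least one `FR` or `SC`) can be checked mechanically. The sketch below is not part of the kit: `parseTestPlanRows` and `findUntracedCases` are hypothetical helper names, and the code assumes the seven-column table layout shown above with a leading and trailing pipe on every row.

```javascript
// Hypothetical helpers, not shipped with claude-devkit-cli.
// Assumed layout: ID | Priority | Type | UC | FR/SC | Description | Expected

function parseTestPlanRows(markdown) {
  return markdown
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /^\|\s*TC-\d{3}\s*\|/.test(line))
    .map((line) => {
      // Drop the leading and trailing pipe, then split into trimmed cells.
      const cells = line.slice(1, -1).split("|").map((cell) => cell.trim());
      const [id, priority, type, useCase, requirement, description, expected] = cells;
      return { id, priority, type, useCase, requirement, description, expected };
    });
}

function findUntracedCases(rows) {
  // A row is traced when its FR/SC cell holds at least one FR-NNN or SC-NNN ID.
  return rows
    .filter((row) => !/\b(FR|SC)-\d{3}\b/.test(row.requirement))
    .map((row) => row.id);
}

const plan = `
| ID | Priority | Type | UC | FR/SC | Description | Expected |
|----|----------|------|----|-------|-------------|----------|
| TC-001 | P0 | unit | UC-001 | FR-001 | Valid login returns token | 200 + JWT |
| TC-002 | P1 | unit | UC-001 | | Missing requirement link | n/a |
`;

console.log(findUntracedCases(parseTestPlanRows(plan)));
```

A check like this could run in CI next to the test suite so that a test plan with unlinked cases fails review early.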
302
+
303
+ **Output:**
304
+ - Spec: `docs/specs/<feature>.md`
305
+ - Test plan: `docs/test-plans/<feature>.md`
306
+
307
+ ### /challenge — Adversarial Plan Review
308
+
309
+ **Usage:**
310
+ ```
311
+ /challenge docs/test-plans/auth.md # challenge a test plan
312
+ /challenge docs/specs/auth.md # challenge a spec
313
+ /challenge "user authentication" # challenge by feature name
314
+ ```
315
+
316
+ **How it works (7 phases):**
317
+
318
+ 1. **Read & Map** — Reads the plan/spec and maps: decisions made, assumptions (stated AND implied), dependencies, scope boundaries, risk acknowledgments, spec-plan consistency.
319
+ 2. **Scale Reviewers** — Assesses complexity and selects reviewers:
320
+
321
+ | Complexity | Signals | Reviewers |
322
+ |------------|---------|-----------|
323
+ | Simple | 1 spec section, <20 test cases, no auth/data | 2 |
324
+ | Standard | Multiple sections, auth or data involved | 3 |
325
+ | Complex | Multiple integrations, concurrency, migrations, 6+ phases | 4 |
326
+
327
+ 3. **Spawn Reviewers** — Launches parallel subagents, each with an adversarial lens:
328
+
329
+ - **Security Adversary**
330
+ - OWASP Top 10
331
+ - Injection vectors
332
+ - Auth/authz bypass
333
+ - Crypto issues
334
+ - Data exposure
335
+ - Supply chain risks
336
+
337
+ - **Failure Mode Analyst** — *"Everything that can go wrong, will — simultaneously, at 3 AM, during peak traffic"*
338
+ - Partial failures
339
+ - Concurrency & race conditions
340
+ - Cascading failures
341
+ - Recovery paths
342
+ - Idempotency
343
+ - Observability gaps
344
+
345
+ - **Assumption Destroyer** — *"'It should work' is not evidence"*
346
+ - Unverified claims
347
+ - Scale assumptions
348
+ - Environment differences
349
+ - Integration contracts
350
+ - Data shape assumptions
351
+ - Timing dependencies
352
+ - Hidden dependencies
353
+
354
+ - **Scope & YAGNI Critic** — *"The best code is no code. The best feature is the one you didn't build"*
355
+ - Over-engineering
356
+ - Premature abstraction
357
+ - Missing MVP cuts
358
+ - Gold plating
359
+ - Simpler alternatives
360
+
361
+ 4. **Deduplicate & Rate** — Collects all findings, removes duplicates, rates severity using a Likelihood x Impact matrix. Caps at 15 findings: keeps all Critical, top High by specificity, notes how many Medium were dropped. Each reviewer is limited to top 7 findings.
362
+
363
+ 5. **Adjudicate** — Evaluates each finding: Accept (valid flaw, plan should change) or Reject (false positive, acceptable risk, already handled). 1-sentence rationale for each.
364
+
365
+ 6. **User Choice** — Two modes: "Apply all accepted" (fast) or "Review each" (walk through one by one).
366
+
367
+ 7. **Apply** — Surgical edits only to accepted findings. Doesn't rewrite surrounding sections.
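The reviewer scaling in phase 2 can be read as a small decision function. This is a sketch only: the signal field names and the precedence (any Complex signal wins, then Simple, otherwise Standard) are assumptions, since the README gives the table but not the exact rule.

```javascript
// Hypothetical sketch of the "Scale Reviewers" table.
// Field names on `signals` are invented for illustration.
function reviewerCount(signals) {
  const {
    specSections,      // number of spec sections the plan touches
    testCases,         // number of test cases in the plan
    touchesAuthOrData, // true when auth or data handling is involved
    integrations,      // number of external integrations
    hasConcurrency,
    hasMigrations,
    phases,            // number of implementation phases
  } = signals;

  const complex = integrations > 1 || hasConcurrency || hasMigrations || phases >= 6;
  if (complex) return 4;

  const simple = specSections === 1 && testCases < 20 && !touchesAuthOrData;
  if (simple) return 2;

  return 3; // everything in between is treated as Standard
}

console.log(
  reviewerCount({
    specSections: 3, testCases: 25, touchesAuthOrData: true,
    integrations: 1, hasConcurrency: false, hasMigrations: false, phases: 2,
  })
);
```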
368
+
369
+ **Finding format:** Each finding includes Title, Severity, Location, Flaw description, Evidence (direct quote from the plan), step-by-step Failure scenario, and Suggested fix.
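One way to picture the deduplicate-and-cap step from phase 4 is shown below. It is an interpretation, not the command's actual logic: the `specificity` score, the duplicate key (title plus location), and the choice to fill leftover slots with Medium findings are all assumptions.

```javascript
const MAX_FINDINGS = 15;

// Hypothetical finding shape: { title, location, severity, specificity }.
function capFindings(findings) {
  // Remove duplicates reported by more than one reviewer (same title and location).
  const seen = new Set();
  const unique = findings.filter((finding) => {
    const key = `${finding.title}@@${finding.location}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });

  const bySpecificity = (a, b) => b.specificity - a.specificity;
  const critical = unique.filter((f) => f.severity === "Critical");
  const high = unique.filter((f) => f.severity === "High").sort(bySpecificity);
  const medium = unique.filter((f) => f.severity === "Medium").sort(bySpecificity);

  // All Critical findings are kept, even if that alone exceeds the cap.
  const kept = [...critical];
  for (const finding of [...high, ...medium]) {
    if (kept.length >= MAX_FINDINGS) break;
    kept.push(finding);
  }

  const keptMedium = kept.filter((f) => f.severity === "Medium").length;
  return { kept, droppedMedium: medium.length - keptMedium };
}
```

Returning `droppedMedium` alongside the kept list mirrors the README's requirement to note how many Medium findings were dropped.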
370
+
371
+ **6 non-negotiable rules:**
372
+ 1. Spawn reviewers in parallel (not sequential)
373
+ 2. Reviewers read files directly, not summarized content
374
+ 3. Be hostile — no praise, no softening
375
+ 4. Every finding must quote the plan directly as evidence
376
+ 5. Quality over quantity — 3 honest findings > 15 padded ones
377
+ 6. Skip style/formatting — substance only
378
+
379
+ **When to use:**
380
+ - After `/plan`, before coding — for complex features
381
+ - Features involving auth, payments, data pipelines, multi-service integration
382
+ - NOT needed for simple CRUD, small bug fixes, or trivial features
383
+
384
+ **Token cost:** 15-30k (uses parallel subagents, doesn't bloat main context)
385
+
386
+ ### /test — Write + Run Tests
387
+
388
+ **Usage:**
389
+ ```
390
+ /test # test all changes vs base branch
391
+ /test src/api/users.ts # test specific file
392
+ /test "user authentication" # test specific feature
393
+ ```
394
+
395
+ **How it works:**
396
+
397
+ 1. **Phase 0: Build Context** — Finds changed files vs base branch, reads matching test plan/spec, reads existing tests for patterns, fixtures, and naming conventions. Doesn't duplicate what already exists.
398
+ 2. **Phase 1: Write Tests** — Creates or updates tests based on the test plan. Each test covers one concept, is independent, deterministic (no random, no time-dependent, no external calls), and has a clear name.
399
+ 3. **Phase 2: Compile First** — Runs typecheck/compile before executing tests. Catches syntax errors early.
400
+ 4. **Phase 3: Run Tests** — Executes the test suite.
401
+ 5. **Phase 4: Fix Loop** — If tests fail, fixes **test code only** (max 3 attempts, then hard stop and report). If tests expect X but code does Y, asks you whether to fix production code or adjust the test.
402
+ 6. **Phase 5: Report** — Summary with test counts, results, coverage, and files touched.
403
+
404
+ **Rules:**
405
+ - Never changes production code without asking first
406
+ - Never deletes or weakens existing tests
407
+ - Never adds `skip`/`xit`/`@disabled` to hide failures
408
+ - Max 3 fix attempts — then stops and reports the issue
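The bounded fix loop described in Phase 4 can be sketched as follows. `runTests` and `fixTestCode` are hypothetical caller-supplied callbacks; the real command is driven by Claude rather than by code like this, so treat it as an illustration of the stop condition only.

```javascript
const MAX_FIX_ATTEMPTS = 3;

// runTests() returns { passed: boolean }; fixTestCode(result, attempt) edits test code only.
function fixLoop(runTests, fixTestCode) {
  let result = runTests();
  let attempts = 0;

  while (!result.passed && attempts < MAX_FIX_ATTEMPTS) {
    attempts += 1;
    fixTestCode(result, attempts); // never touches production code
    result = runTests();
  }

  // stopped === true means the loop gave up and the failure must be reported.
  return { passed: result.passed, attempts, stopped: !result.passed };
}
```

The hard stop keeps a persistently failing test from consuming tokens indefinitely; after the third attempt the failure is handed back to the user.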
409
+
410
+ **What NOT to test:** Private/internal methods, framework behavior, trivial getters/setters, implementation details.
411
+
412
+ ### /fix — Test-First Bug Fix
413
+
414
+ **Usage:**
415
+ ```
416
+ /fix "description of the bug"
417
+ ```
418
+
419
+ **How it works:**
420
+
421
+ 1. **Phase 0: Investigate** — Parses the bug report, locates relevant code, checks git history (`git log` + `git blame`), forms a hypothesis with evidence: *"I believe the bug is caused by [X] in [file:function] because [evidence]."* If the bug is in a dependency/config/data (not your code), reports that before proceeding.
422
+ 2. **Phase 1: Write Failing Test** — Creates a regression test that reproduces the bug. Test includes a comment: `// Regression: <bug description> — <expected> vs <actual>`.
423
+ 3. **Phase 2: Confirm Failure** — Runs the test to verify it fails for the right reason.
424
+ 4. **Phase 3: Fix** — Minimal change to production code. If other tests break, the fix is wrong — never weakens existing tests.
425
+ 5. **Phase 4: Root Cause Analysis** — Documents: Symptom, Root cause, Gap (why wasn't this caught earlier?), Prevention (suggests one: type constraint, validation, lint rule, spec update, or test plan update). Non-optional for serious bugs; for trivial bugs, the fix summary is enough.
426
+ 6. **Phase 5: Full Suite** — Runs all tests to catch regressions.
427
+
428
+ **Multiple bugs:** Triages by severity, fixes one at a time, commits each separately.
429
+
430
+ ### /review — Pre-Merge Quality Gate
431
+
432
+ **Usage:**
433
+ ```
434
+ /review # review all changes vs base branch
435
+ /review src/auth/ # review specific directory
436
+ ```
437
+
438
+ **How it works:**
439
+
440
+ 1. **Phase 0: Understand Intent** — Reads commit messages and checks for related spec/test plan. Understands *why* the change was made before reviewing *how*.
441
+ 2. **Phase 1: Smart Focus** — Auto-detects what to focus on based on the diff:
442
+
443
+ | Diff contains | Focus on |
444
+ |---------------|----------|
445
+ | Auth/session code | Security, token handling, permission checks |
446
+ | SQL/queries | Injection, parameterization, N+1 queries |
447
+ | API endpoints | Input validation, error responses, rate limiting |
448
+ | `.env`/config | Secrets exposure, environment handling |
449
+ | Tests only | Test quality, coverage gaps, flaky patterns |
450
+ | Payment/billing | Financial accuracy, idempotency, audit trails |
451
+
452
+ 3. **Phase 2: Review** — Checks security, correctness, null safety, spec-test alignment, and code quality. Spends 60% of analysis on the primary focus area. Looks for specific patterns: `${var}` in queries, `.innerHTML`, template literals in SQL, optionals without guards.
453
+ 4. **Phase 3: Report** — Structured report with severity tiers (Critical/High/Medium/Low).
454
+
455
+ **Proportional review:** A 5-line doc change gets a light review. A 500-line auth rewrite gets file-by-file deep analysis. Diffs over 500 lines get a note suggesting that the commit be split.
456
+
457
+ **Verdicts:** APPROVE / REQUEST CHANGES / NEEDS DISCUSSION (three options, not binary).
458
+
459
+ **Rules:**
460
+ - At least 1 positive note — reinforces good patterns, not just problems
461
+ - Never auto-fixes code — report only
462
+ - Checks spec-test alignment: if code changed, did the spec and tests change too? Vague requirements without metrics ("fast", "secure") are flagged with a suggestion to add concrete numbers
463
+
464
+ ### /commit — Smart Git Commit
465
+
466
+ **Usage:**
467
+ ```
468
+ /commit
469
+ ```
470
+
471
+ **How it works:**
472
+
473
+ 1. **Analyze** — Scans `git status`, diff stats, and file contents in one pass.
474
+ 2. **Scan for secrets** — Matches patterns: `api_key`, `token`, `password`, `secret`, `private_key`, `credential`, `auth_token`. **Hard block** — stops immediately if found, non-negotiable.
475
+ 3. **Scan for debug code** — Matches: `console.log`, `debugger`, `print()`, `TODO:remove`, `HACK:`, `FIXME:temp`, `binding.pry`, `var_dump`. **Soft warn** — proceeds if you confirm.
476
+ 4. **Stage files** — Stages specific files by name. Never uses `git add -A`.
477
+ 5. **Generate message** — Conventional format: `type(scope): description`. Imperative tense ("add" not "added"), no period, WHAT+WHY not HOW.
478
+ 6. **Commit** — Does NOT push (safe default). Ask Claude explicitly to push.
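The two scans in steps 2 and 3 can be approximated with plain substring matching over the added lines of a diff. This is a sketch under assumptions: the pattern lists are copied from the steps above, but how `/commit` actually matches them is not documented here, and `scanDiff` is a hypothetical name.

```javascript
const SECRET_PATTERNS = ["api_key", "token", "password", "secret", "private_key", "credential", "auth_token"];
// "print(" is used instead of "print()" so that calls with arguments also match.
const DEBUG_PATTERNS = ["console.log", "debugger", "print(", "TODO:remove", "HACK:", "FIXME:temp", "binding.pry", "var_dump"];

function scanDiff(diffText) {
  // Only inspect added lines; skip the "+++ b/file" headers.
  const added = diffText
    .split("\n")
    .filter((line) => line.startsWith("+") && !line.startsWith("+++"));

  const hits = (patterns, caseInsensitive) =>
    added.filter((line) => {
      const haystack = caseInsensitive ? line.toLowerCase() : line;
      return patterns.some((pattern) => haystack.includes(pattern));
    });

  const secrets = hits(SECRET_PATTERNS, true);
  const debug = hits(DEBUG_PATTERNS, false);

  return {
    secrets,
    debug,
    // Secrets are a hard block; debug code only needs confirmation.
    action: secrets.length > 0 ? "block" : debug.length > 0 ? "warn" : "ok",
  };
}
```

Substring matching is deliberately noisy (a line containing `tokenize` would also trip the secret scan), which fits a hard block that prefers false positives over leaked credentials.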
479
+
480
+ **Large diff warning:** If >10 files OR >300 lines changed, suggests splitting into smaller commits for easier review.
481
+
482
+ **Never stages:** `.env`, credentials, build artifacts, generated files, binaries >1MB.
483
+
484
+ **Breaking changes:** If the diff removes or renames a public function, export, or API endpoint, the commit uses the `feat!` or `fix!` type or adds a `BREAKING CHANGE:` footer.
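The message format and the large-diff thresholds above are simple enough to express as checks. The sketch below assumes the common Conventional Commits type list (the kit's own list may differ) and uses hypothetical function names.

```javascript
// Assumed type list; only `feat` and `fix` are named explicitly in this README.
const HEADER = /^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([a-z0-9-]+\))?(!)?: (.+)$/;

function checkCommitHeader(header) {
  const match = HEADER.exec(header);
  if (!match) return { valid: false, reason: "expected type(scope): description" };
  if (match[4].endsWith(".")) return { valid: false, reason: "no trailing period" };
  return { valid: true, breaking: match[3] === "!" };
}

// Thresholds from the large diff warning: more than 10 files OR more than 300 lines.
function shouldSuggestSplit(filesChanged, linesChanged) {
  return filesChanged > 10 || linesChanged > 300;
}

console.log(checkCommitHeader("feat(auth): add session expiry"));
console.log(shouldSuggestSplit(4, 420));
```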
485
+
486
+ ---
487
+
488
+ ## 6. Automatic Guards (Hooks)
489
+
490
+ Hooks run automatically — you don't invoke them. They provide passive protection.
491
+
492
+ ### File Guard (`file-guard.js`)
493
+
494
+ **Trigger:** After every Write or Edit operation.
495
+ **Action:** If the modified file exceeds 200 lines, injects a warning suggesting modularization.
496
+ **Blocking:** No — warns only, does not prevent the edit.
497
+
498
+ **Configuration:**
499
+ ```bash
500
+ # Change the line threshold (default: 200)
501
+ export FILE_GUARD_THRESHOLD=300
502
+
503
+ # Exclude files from checking (comma-separated globs)
504
+ export FILE_GUARD_EXCLUDE="*.generated.swift,*.pb.go,*.min.js"
505
+ ```
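To make the threshold and exclude settings concrete, here is a minimal sketch of the check a file guard of this kind might perform. It is not the shipped `file-guard.js`: the function names are hypothetical, only `*` is supported in globs, and excludes are matched against the file's base name.

```javascript
// Converts a simple glob such as "*.min.js" into a RegExp (only "*" is supported).
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

// Returns a warning string, or null when the file is excluded or within the threshold.
function fileGuardWarning(filePath, content, env = {}) {
  const threshold = Number.parseInt(env.FILE_GUARD_THRESHOLD ?? "200", 10) || 200;
  const excludes = (env.FILE_GUARD_EXCLUDE ?? "")
    .split(",")
    .map((glob) => glob.trim())
    .filter(Boolean);

  const baseName = filePath.split("/").pop();
  if (excludes.some((glob) => globToRegExp(glob).test(baseName))) return null;

  const lines = content.split("\n");
  if (lines[lines.length - 1] === "") lines.pop(); // ignore the trailing newline

  if (lines.length <= threshold) return null;
  return `${filePath} has ${lines.length} lines (threshold ${threshold}); consider splitting it into modules.`;
}
```

In a real hook the third argument would be `process.env`; taking it as a parameter keeps the function easy to test.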
506
+
507
+ ### Path Guard (`path-guard.sh`)
508
+
509
+ **Trigger:** Before every Bash command.
510
+ **Action:** Blocks commands that reference large directories (node_modules, build artifacts, etc.).
511
+ **Blocking:** Yes — prevents the command from running.
512
+
513
+ **Default blocked paths:**
514
+ `node_modules`, `__pycache__`, `.git/objects`, `dist/`, `build/`, `.next/`, `vendor/`, `Pods/`, `.build/`, `DerivedData/`, `.gradle/`, `target/debug`, `target/release`, `.nuget`, `.cache`
515
+
516
+ **Configuration:**
517
+ ```bash
518
+ # Add project-specific blocked paths (pipe-separated)
519
+ export PATH_GUARD_EXTRA="\.terraform|\.vagrant|\.docker"
520
+ ```
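The blocking decision can be sketched as one regular expression built from the default list plus `PATH_GUARD_EXTRA`. The shipped hook is a Bash script, so this JavaScript version is only an illustration; `pathGuardExitCode` is a hypothetical name and the matching is a plain substring-style regex test.

```javascript
// Default blocked paths from the list above, written as regex fragments.
const DEFAULT_BLOCKED = [
  "node_modules", "__pycache__", "\\.git/objects", "dist/", "build/", "\\.next/",
  "vendor/", "Pods/", "\\.build/", "DerivedData/", "\\.gradle/",
  "target/debug", "target/release", "\\.nuget", "\\.cache",
];

// Returns 2 (blocked) or 0 (allowed), matching the exit codes used when testing hooks manually.
function pathGuardExitCode(command, extraPatterns = "") {
  const parts = [...DEFAULT_BLOCKED];
  if (extraPatterns) parts.push(extraPatterns); // already pipe-separated
  const blocked = new RegExp(parts.join("|"));
  return blocked.test(command) ? 2 : 0;
}

console.log(pathGuardExitCode("ls node_modules"));
console.log(pathGuardExitCode("ls src"));
```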
521
+
522
+ ### Glob Guard (`glob-guard.js`)
523
+
524
+ **Trigger:** Before every Glob (file search) operation.
525
+ **Action:** Blocks overly broad glob patterns at project root that would return thousands of files and fill the context window.
526
+ **Blocking:** Yes — prevents the glob and suggests scoped alternatives.
527
+
528
+ **What it blocks:**
529
+ - `**/*.ts` at project root (use `src/**/*.ts` instead)
530
+ - `**/*` at project root (use `src/**/*` instead)
531
+ - `*` or `**` at project root
532
+ - Any recursive glob without a specific directory prefix
533
+
534
+ **What it allows:**
535
+ - `src/**/*.ts` — scoped to a specific directory
536
+ - `tests/**/*.test.js` — scoped to tests
537
+ - `**/*.ts` when run from inside a scoped directory (e.g., `path: "src"`)
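The block and allow lists above suggest a rule along these lines. This is a sketch, not the shipped `glob-guard.js`: `isBroadGlob` is a hypothetical name, and treating an empty path, `.`, or `./` as the project root is an assumption.

```javascript
// Returns true when the pattern should be blocked.
function isBroadGlob(pattern, searchPath = "") {
  // A scoped search path (anything other than the project root) makes the pattern acceptable.
  const atRoot = searchPath === "" || searchPath === "." || searchPath === "./";
  if (!atRoot) return false;

  if (pattern === "*" || pattern === "**") return true;

  // Recursive patterns need a literal directory before the first wildcard segment.
  const firstSegment = pattern.split("/")[0];
  const hasDirectoryPrefix = firstSegment !== "" && !/[*?[{]/.test(firstSegment);
  return pattern.includes("**") && !hasDirectoryPrefix;
}

console.log(isBroadGlob("**/*.ts"));        // blocked at project root
console.log(isBroadGlob("src/**/*.ts"));    // allowed: scoped by prefix
console.log(isBroadGlob("**/*.ts", "src")); // allowed: scoped by search path
```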
538
+
539
+ ### Comment Guard (`comment-guard.js`)
540
+
541
+ **Trigger:** After every Edit operation.
542
+ **Action:** Detects when real code is replaced with placeholder comments like `// ... existing code ...` or `// rest of implementation`. This is a common LLM laziness pattern.
543
+ **Blocking:** Yes — rejects the edit and tells Claude to preserve the original code.
544
+
545
+ **What it catches:**
546
+ - `// ... existing code ...`, `// ... rest of implementation`
547
+ - `// [previous code remains]`, `// unchanged`
548
+ - `/* ... */` replacing real code
549
+ - `# ... existing ...` (Python placeholders)
550
+ - `// TODO: implement` replacing real code
551
+ - Any edit where real code is replaced with a much shorter comment-only block
552
+
553
+ **What it allows:**
554
+ - Editing comments (old content was already comments)
555
+ - Adding comments alongside code (new content has both)
556
+ - Normal code replacements
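A rough version of this detection compares the old and new strings of an edit. The heuristic below is an assumption about how such a guard could work (the placeholder phrases come from the list above); it is not the logic of the shipped `comment-guard.js`.

```javascript
const PLACEHOLDER = /(\.\.\.|existing code|rest of implementation|previous code remains|unchanged|TODO: implement)/i;

function isCommentLine(line) {
  const trimmed = line.trim();
  return trimmed.startsWith("//") || trimmed.startsWith("#") || trimmed.startsWith("/*");
}

function nonEmptyLines(text) {
  return text.split("\n").filter((line) => line.trim() !== "");
}

// True when real code is replaced by a comment-only block containing a placeholder phrase.
function replacesCodeWithPlaceholder(oldString, newString) {
  const oldLines = nonEmptyLines(oldString);
  const newLines = nonEmptyLines(newString);

  const oldHadCode = oldLines.some((line) => !isCommentLine(line));
  const newIsCommentOnly = newLines.length > 0 && newLines.every((line) => isCommentLine(line));
  const newHasPlaceholder = newLines.some((line) => PLACEHOLDER.test(line));

  return oldHadCode && newIsCommentOnly && newHasPlaceholder;
}

console.log(replacesCodeWithPlaceholder("function hello() {\n  return world;\n}", "// ... existing code ..."));
```

Requiring all three conditions is what lets comment edits and mixed code-plus-comment edits through.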
557
+
558
+ ### Sensitive Guard (`sensitive-guard.sh`)
559
+
560
+ **Trigger:** Before every Read, Write, Edit, and Bash command.
561
+ **Action:** Protects files containing secrets: `.env`, private keys, credentials, tokens.
562
+ **Blocking:** Read/Write/Edit → **blocks** (exit 2). Bash commands → **warns only** (allows access).
563
+
564
+ The Bash warn-only behavior enables an approval flow: Claude asks the user for permission and, if approved, can read the file through a Bash command such as `cat .env`.
565
+
566
+ **Protected files:**
567
+ - `.env`, `.env.local`, `.env.production`, etc. (but NOT `.env.example`)
568
+ - Private keys: `*.pem`, `*.key`, `*.p12`, `*.pfx`, `*.jks`
569
+ - SSH keys: `id_rsa`, `id_ecdsa`, `id_ed25519`
570
+ - Cloud credentials: `serviceAccountKey.json`, `firebase-adminsdk*`
571
+ - Token files: `.npmrc`, `.pypirc`, `.netrc`
572
+ - Any file matching `*credential*`, `*secret*`, `*private_key*`
573
+
574
+ **Supports `.agentignore`:** Create a `.agentignore` file (or `.aiignore`, `.cursorignore`) in the project root with gitignore-style patterns to add project-specific protections.
575
+
576
+ **Configuration:**
577
+ ```bash
578
+ # Add extra patterns (pipe-separated regex)
579
+ export SENSITIVE_GUARD_EXTRA="\.vault|.*_token\.json"
580
+ ```
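The protected-file list and the block-versus-warn split can be sketched as follows. The shipped hook is a Bash script and also reads `.agentignore`, which this illustration omits; the function names are hypothetical, and for simplicity the Bash case takes a file path rather than parsing a full command.

```javascript
// Patterns derived from the "Protected files" list; tested against the file's base name.
const SENSITIVE = [
  /^\.env(\..+)?$/,
  /\.(pem|key|p12|pfx|jks)$/,
  /^id_(rsa|ecdsa|ed25519)$/,
  /^serviceAccountKey\.json$/,
  /^firebase-adminsdk/,
  /^\.(npmrc|pypirc|netrc)$/,
  /credential|secret|private_key/,
];

function isSensitive(filePath, extraPattern = "") {
  const name = filePath.split("/").pop();
  if (name === ".env.example") return false; // explicitly allowed
  if (SENSITIVE.some((pattern) => pattern.test(name))) return true;
  // extraPattern mirrors SENSITIVE_GUARD_EXTRA (pipe-separated regex).
  return extraPattern !== "" && new RegExp(extraPattern).test(filePath);
}

// Read/Write/Edit are blocked (exit 2); Bash only warns (exit 0).
function sensitiveGuardExitCode(toolName, filePath, extraPattern = "") {
  if (!isSensitive(filePath, extraPattern)) return 0;
  return toolName === "Bash" ? 0 : 2;
}

console.log(sensitiveGuardExitCode("Read", ".env"));
console.log(sensitiveGuardExitCode("Read", ".env.example"));
```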
581
+
582
+ ### Self-Review (`self-review.sh`)
583
+
584
+ **Trigger:** When Claude is about to stop (Stop event).
585
+ **Action:** Injects a self-review checklist reminding Claude to verify quality before finishing.
586
+ **Blocking:** No — just a reminder.
587
+
588
+ **Questions asked:**
589
+ 1. Did you leave any TODO/FIXME that should be resolved now?
590
+ 2. Did you create mock/fake implementations just to pass tests?
591
+ 3. Did you replace real code with placeholder comments?
592
+ 4. Do all changed files compile and typecheck cleanly?
593
+ 5. Did you run the full test suite, not just the new tests?
594
+ 6. Are there any files you modified but forgot to include in the summary?
595
+
596
+ **Configuration:**
597
+ ```bash
598
+ # Disable self-review
599
+ export SELF_REVIEW_ENABLED=false
600
+ ```
601
+
602
+ ### Testing Hooks Manually
603
+
604
+ You can test hooks by piping mock JSON payloads:
605
+
606
+ ```bash
607
+ # ── Path Guard ──
608
+ # Should exit 2 (blocked)
609
+ echo '{"tool_input":{"command":"ls node_modules"}}' | bash .claude/hooks/path-guard.sh
610
+ echo $? # expect: 2
611
+
612
+ # Should exit 0 (allowed)
613
+ echo '{"tool_input":{"command":"ls src"}}' | bash .claude/hooks/path-guard.sh
614
+ echo $? # expect: 0
615
+
616
+ # ── File Guard ──
617
+ seq 1 250 > /tmp/test-large.txt
618
+ echo '{"tool_input":{"file_path":"/tmp/test-large.txt"}}' | node .claude/hooks/file-guard.js
619
+ # Should output JSON with additionalContext warning
620
+
621
+ # ── Comment Guard ──
622
+ # Should exit 2 (blocked — replacing code with placeholder)
623
+ echo '{"tool_input":{"old_string":"function hello() {\n return world;\n}","new_string":"// ... existing code ..."}}' | node .claude/hooks/comment-guard.js
624
+ echo $? # expect: 2
625
+
626
+ # Should exit 0 (allowed — replacing code with code)
627
+ echo '{"tool_input":{"old_string":"return a;","new_string":"return b;"}}' | node .claude/hooks/comment-guard.js
628
+ echo $? # expect: 0
629
+
630
+ # ── Sensitive Guard ──
631
+ # Should exit 2 (blocked)
632
+ echo '{"tool_input":{"file_path":".env"}}' | bash .claude/hooks/sensitive-guard.sh
633
+ echo $? # expect: 2
634
+
635
+ # Should exit 0 (allowed)
636
+ echo '{"tool_input":{"file_path":".env.example"}}' | bash .claude/hooks/sensitive-guard.sh
637
+ echo $? # expect: 0
638
+
639
+ # Should exit 0 (warn only — bash commands are allowed for approved access)
640
+ echo '{"tool_input":{"command":"cat .env.local"}}' | bash .claude/hooks/sensitive-guard.sh
641
+ echo $? # expect: 0 (with warning on stderr)
642
+
643
+ # ── Glob Guard ──
644
+ # Should exit 2 (blocked — broad pattern at root)
645
+ echo '{"tool_input":{"pattern":"**/*.ts"}}' | node .claude/hooks/glob-guard.js
646
+ echo $? # expect: 2
647
+
648
+ # Should exit 0 (allowed — scoped pattern)
649
+ echo '{"tool_input":{"pattern":"src/**/*.ts"}}' | node .claude/hooks/glob-guard.js
650
+ echo $? # expect: 0
651
+ ```
652
+
653
+ ---
654
+
655
+ ## 7. Build Test Script
656
+
657
+ ### Usage
658
+
659
+ ```bash
660
+ bash scripts/build-test.sh # run all tests
661
+ bash scripts/build-test.sh --filter "Auth" # filter by pattern
662
+ bash scripts/build-test.sh --list # show detected project type
663
+ bash scripts/build-test.sh --ci # machine-readable output
664
+ bash scripts/build-test.sh --help # show usage
665
+ ```
666
+
667
+ ### Supported Languages
668
+
669
+ | Language | Detected By | Test Command |
670
+ |----------|-------------|-------------|
671
+ | Swift (SPM) | `Package.swift` | `swift test` |
672
+ | Swift (Xcode) | `*.xcworkspace` / `*.xcodeproj` | `xcodebuild test` |
673
+ | Node (Vitest) | `vitest.config.*` or vitest in `package.json` | `npx vitest run` |
674
+ | Node (Jest) | `jest.config.*` or jest in `package.json` | `npx jest` |
675
+ | Python (pytest) | `pyproject.toml`, `setup.py`, `pytest.ini` | `python3 -m pytest` |
676
+ | Rust | `Cargo.toml` | `cargo test` |
677
+ | Go | `go.mod` | `go test -race ./...` |
678
+ | Java (Gradle) | `build.gradle` / `build.gradle.kts` | `./gradlew test` |
679
+ | Java (Maven) | `pom.xml` | `mvn test` |
680
+ | C# (.NET) | `*.sln` / `*.csproj` | `dotnet test` |
681
+ | Ruby (RSpec) | `Gemfile` with rspec | `bundle exec rspec` |
682
+ | Ruby (Minitest) | `Gemfile` without rspec | `bundle exec rake test` |
683
+
684
+ Detection order: first match wins. The script also detects package managers (pnpm, bun) for Node projects.
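The first-match-wins rule can be illustrated with an ordered list of detectors. The real script is Bash; this JavaScript sketch assumes the detection order follows the table above, which the README does not state explicitly, and it models the project as a hypothetical map from file name to file content.

```javascript
const anyName = (files, pattern) => Object.keys(files).some((name) => pattern.test(name));

// [language, test command, matcher]; order matters because the first match wins.
const DETECTORS = [
  ["Swift (SPM)", "swift test", (f) => "Package.swift" in f],
  ["Swift (Xcode)", "xcodebuild test", (f) => anyName(f, /\.(xcworkspace|xcodeproj)$/)],
  ["Node (Vitest)", "npx vitest run", (f) => anyName(f, /^vitest\.config\./) || (f["package.json"] ?? "").includes("vitest")],
  ["Node (Jest)", "npx jest", (f) => anyName(f, /^jest\.config\./) || (f["package.json"] ?? "").includes("jest")],
  ["Python (pytest)", "python3 -m pytest", (f) => ["pyproject.toml", "setup.py", "pytest.ini"].some((n) => n in f)],
  ["Rust", "cargo test", (f) => "Cargo.toml" in f],
  ["Go", "go test -race ./...", (f) => "go.mod" in f],
  ["Java (Gradle)", "./gradlew test", (f) => "build.gradle" in f || "build.gradle.kts" in f],
  ["Java (Maven)", "mvn test", (f) => "pom.xml" in f],
  ["C# (.NET)", "dotnet test", (f) => anyName(f, /\.(sln|csproj)$/)],
  ["Ruby (RSpec)", "bundle exec rspec", (f) => (f["Gemfile"] ?? "").includes("rspec")],
  ["Ruby (Minitest)", "bundle exec rake test", (f) => "Gemfile" in f],
];

function detectProject(files) {
  for (const [name, command, matches] of DETECTORS) {
    if (matches(files)) return { name, command };
  }
  return null; // no project detected
}

console.log(detectProject({ "go.mod": "", "Cargo.toml": "" }));
```

With both `Cargo.toml` and `go.mod` present, the earlier Rust detector wins under this assumed order, which is the kind of surprise `--list` helps diagnose.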
685
+
686
+ ### Exit Codes
687
+
688
+ | Code | Meaning |
689
+ |------|---------|
690
+ | 0 | All tests passed |
691
+ | 1 | Tests failed |
692
+ | 2 | No project detected or missing tooling |
693
+
694
+ ### CI Integration
695
+
696
+ ```yaml
697
+ # GitHub Actions example
698
+ - name: Run tests
699
+ run: bash scripts/build-test.sh --ci
700
+ ```
701
+
702
+ ### Adding a New Language
703
+
704
+ Edit `scripts/build-test.sh`:
705
+ 1. Add a `detect_<language>()` function
706
+ 2. Add it to the `DETECTORS` array
707
+ 3. The function should set `LANG_NAME` and `TEST_CMD`
708
+
709
+ ---
710
+
711
+ ## 8. Specs & Test Plans Format
712
+
713
+ ### Business Logic Spec Template
714
+
715
+ Create specs at `docs/specs/<feature-name>.md`:
716
+
717
+ ```markdown
718
+ # Spec: <Feature Name>
719
+
720
+ ## Overview
721
+ What this feature does, why it exists, who uses it. 2-3 sentences.
722
+
723
+ ## Data Model
724
+ Entities, attributes, relationships (table format).
725
+
726
+ ## Use Cases
727
+
728
+ ### UC-001: <Name>
729
+ - **Actor:** User / System
730
+ - **Preconditions:** ...
731
+ - **Flow:** 1. ... 2. ...
732
+ - **Postconditions:** ...
733
+ - **Error cases:** ...
734
+
735
+ ## Settings / Configuration
736
+ Configurable behavior and defaults.
737
+
738
+ ## Constraints & Invariants
739
+ Rules that must always hold.
740
+
741
+ ## Error Handling
742
+ How errors surface to users and are logged.
743
+
744
+ ## Security Considerations
745
+ Auth, authorization, data sensitivity.
746
+ ```
747
+
748
+ Skip sections that don't apply. Match depth to feature complexity.
749
+
750
+ ### Test Plan Format
751
+
752
+ Generated by `/plan` at `docs/test-plans/<feature-name>.md`:
753
+
754
+ ```markdown
755
+ # Test Plan: <Feature Name>
756
+
757
+ **Spec:** docs/specs/<feature-name>.md
758
+ **Generated:** 2026-04-01
759
+
760
+ ## Test Cases
761
+
762
+ | ID | Priority | Type | UC | FR/SC | Description | Expected |
763
+ |----|----------|------|----|-------|-------------|----------|
764
+ | TC-001 | P0 | unit | UC-001 | FR-001 | Valid login returns token | 200 + JWT |
765
+ | TC-002 | P0 | unit | UC-001 | FR-002 | Wrong password returns 401 | 401 + error msg |
766
+ | TC-003 | P1 | integration | UC-002 | FR-003 | Session expires after 24h | Auto-logout |
767
+
768
+ ## Coverage Notes
769
+ - Highest risk areas: ...
770
+ - Edge cases needing attention: ...
771
+ ```
772
+
773
+ ### Naming Conventions
774
+
775
+ | Item | Convention | Example |
776
+ |------|-----------|---------|
777
+ | Spec file | `<feature-name>.md` in `docs/specs/` | `user-auth.md` |
778
+ | Test plan | Same name as spec, in `docs/test-plans/` | `user-auth.md` |
779
+ | Test case ID | `TC-NNN` sequential | `TC-001`, `TC-042` |
780
+ | Priority | `P0` (critical), `P1` (important), `P2` (nice-to-have) | — |
781
+ | Type | `unit`, `integration`, `e2e`, `snapshot`, `performance` | — |
782
+
783
+ ---
784
+
785
+ ## 9. Customization
786
+
787
+ ### Environment Variables
788
+
789
+ | Variable | Default | Description |
790
+ |----------|---------|-------------|
791
+ | `FILE_GUARD_THRESHOLD` | `200` | Max lines before file guard warns |
792
+ | `FILE_GUARD_EXCLUDE` | _(empty)_ | Comma-separated globs to skip (e.g. `*.generated.swift`) |
793
+ | `PATH_GUARD_EXTRA` | _(empty)_ | Additional pipe-separated patterns to block (e.g. `\.terraform`) |
794
+ | `SENSITIVE_GUARD_EXTRA` | _(empty)_ | Additional pipe-separated patterns for sensitive files (e.g. `\.vault`) |
795
+ | `SELF_REVIEW_ENABLED` | `true` | Set to `false` to disable the self-review checklist on Stop |
796
+
797
+ Set these in your shell profile or project `.envrc` (if using direnv).
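A minimal sketch of such a file, using only the variables from the table above. The values are illustrative assumptions, not recommended settings.

```shell
# .envrc (or shell profile): example values only.

# Warn once a file exceeds 300 lines instead of the default 200.
export FILE_GUARD_THRESHOLD=300

# Skip generated files; globs are comma-separated.
export FILE_GUARD_EXCLUDE="*.generated.swift,*.pb.go"

# Extra patterns, taken from the examples in the table above.
# Single quotes keep the backslash literal.
export PATH_GUARD_EXTRA='\.terraform'
export SENSITIVE_GUARD_EXTRA='\.vault'

# Turn off the self-review checklist on Stop.
export SELF_REVIEW_ENABLED=false
```

With direnv, the file takes effect only after `direnv allow` has been run in the project directory.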
798
+
799
+ ### Extending CLAUDE.md
800
+
801
+ Add project-specific rules to `.claude/CLAUDE.md`:
802
+
803
+ ```markdown
804
+ ## Project-Specific Rules
805
+
806
+ - All API endpoints must have OpenAPI annotations
807
+ - Database migrations must be reversible
808
+ - UI components must support dark mode
809
+ - All strings must be localized via i18n keys
810
+ ```
811
+
812
+ ### Adding Custom Commands
813
+
814
+ Create new `.md` files in `.claude/commands/`:
815
+
816
+ ```markdown
817
+ # .claude/commands/deploy.md
818
+
819
+ Run the deployment pipeline:
820
+ 1. /review
821
+ 2. /commit
822
+ 3. Run: bash scripts/deploy.sh $ARGUMENTS
823
+ 4. Verify deployment health: curl -f https://api.example.com/health
824
+ ```
825
+
826
+ Then use: `/deploy staging`
827
+
828
+ ---
829
+
830
+ ## 10. Token Cost Guide
831
+
832
+ | Activity | Tokens | Frequency |
833
+ |----------|--------|-----------|
834
+ | `/test` (incremental, 1-3 files) | 5–10k | Every code chunk |
835
+ | `/fix` (single bug) | 3–5k | As needed |
836
+ | `/commit` | 2–4k | Every commit |
837
+ | `/review` (diff-based) | 10–20k | Before merge |
838
+ | `/plan` (new feature) | 20–40k | Start of feature |
839
+ | `/challenge` (adversarial review) | 15–30k | After /plan, complex features |
840
+ | Full audit (manual prompt) | 100k+ | Before release |
841
+
842
+ ### Minimizing Token Usage
843
+
844
+ - **Test incrementally.** Running `/test` after each small chunk uses 5–10k tokens. Waiting until everything is done and then running `/test` on one large diff uses 50k+.
845
+ - **Use filters.** `/test src/auth/login.ts` is cheaper than `/test` on the whole project.
846
+ - **Skip `/plan` for tiny changes.** Under 5 lines with no behavior change? Just `/test` and `/commit`.
847
+ - **Use `/review` only before merge.** Not after every commit.
848
+
849
+ ---
850
+
851
+ ## 11. Troubleshooting
852
+
853
+ ### Hook not firing
854
+
855
+ **Symptom:** File guard or path guard doesn't trigger.
856
+
857
+ **Check:**
858
+ 1. Is `settings.json` valid? `node -e "JSON.parse(require('fs').readFileSync('.claude/settings.json','utf-8'))"`
859
+ 2. Are hooks executable? `ls -la .claude/hooks/`
860
+ 3. Is Node.js available? `node --version`
861
+ 4. Is `$CLAUDE_PROJECT_DIR` set? Check in Claude Code with: `echo $CLAUDE_PROJECT_DIR`
862
+
863
+ ### Tests not detected
864
+
865
+ **Symptom:** `build-test.sh` says "No supported project detected."
866
+
867
+ **Check:**
868
+ 1. Are you in the project root? `pwd`
869
+ 2. Does the project marker file exist? (e.g., `package.json`, `Cargo.toml`)
870
+ 3. Run `bash scripts/build-test.sh --list` for diagnostic output.
871
+
872
+ ### Wrong base branch
873
+
874
+ **Symptom:** `/test` or `/review` compares against the wrong branch.
875
+
876
+ **Check:**
877
+ ```bash
878
+ git symbolic-ref refs/remotes/origin/HEAD
879
+ ```
880
+
881
+ If this is wrong or missing:
882
+ ```bash
883
+ git remote set-head origin <your-main-branch>
884
+ ```
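If your own scripts need the same information, the check above can be wrapped with a fallback. This is a sketch under stated assumptions: `detect_base_branch` and `DEFAULT_BASE_BRANCH` are hypothetical names, and it is not a description of how `/test` or `/review` resolve the branch internally.

```shell
# Hypothetical helper: print the remote's default branch name, or a
# fallback when origin/HEAD is not set (or this is not a Git repository).
detect_base_branch() {
  local ref
  if ref="$(git symbolic-ref --quiet --short refs/remotes/origin/HEAD 2>/dev/null)"; then
    # --short yields e.g. "origin/main"; strip the remote prefix.
    printf '%s\n' "${ref#origin/}"
  else
    # DEFAULT_BASE_BRANCH is an assumed override; "main" is an assumed default.
    printf '%s\n' "${DEFAULT_BASE_BRANCH:-main}"
  fi
}
```

When the fallback is being used, `git remote set-head origin --auto` asks the remote for its default branch and records it locally, which avoids typing the branch name by hand.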
885
+
886
+ ### Path guard blocking a legitimate command
887
+
888
+ **Symptom:** Claude can't run a command you need.
889
+
890
+ **Fix:** The path guard blocks broad patterns. If you need to access `build/` for a specific reason, run the command directly in your terminal (not through Claude Code).
891
+
892
+ ### File guard warning on generated files
893
+
894
+ **Fix:** Set the exclude pattern:
895
+ ```bash
896
+ export FILE_GUARD_EXCLUDE="*.generated.swift,*.pb.go,*.min.js,*.snap"
897
+ ```
898
+
899
+ ---
900
+
901
+ ## 12. FAQ
902
+
903
+ **Q: Do I need specs for every tiny change?**
904
+ A: No. Changes under 5 lines with no behavior change can skip the spec. Just `/test` and `/commit`. The spec-first rule is for meaningful behavior changes.
905
+
906
+ **Q: Can I use mocks in tests?**
907
+ A: Only for external services you can't run locally (third-party APIs, email services). Never mock your own code or database just to make tests pass faster.
908
+
909
+ **Q: What if Claude writes a test that tests the wrong thing?**
910
+ A: This usually means the spec is ambiguous. Clarify the spec first, then re-run `/test`. Good specs produce good tests.
911
+
912
+ **Q: Can I use this with other AI coding tools?**
913
+ A: The commands and hooks are Claude Code-specific. The specs, test plans, workflow, and `build-test.sh` work with any tool or manual workflow.
914
+
915
+ **Q: When should I use `/challenge`?**
916
+ A: After `/plan`, for complex features involving authentication, payments, data pipelines, or multi-service integration. It spawns parallel hostile reviewers that find security holes, failure modes, and false assumptions BEFORE you write code. Skip it for simple CRUD or small features — the overhead isn't worth it.
917
+
918
+ **Q: How do I do a full coverage audit?**
919
+ A: This is intentionally not a command (it's expensive and rare). When needed, prompt Claude directly: "Audit test coverage for feature X against docs/test-plans/X.md. Identify gaps and write missing tests."
920
+
921
+ **Q: What if my project uses multiple languages?**
922
+ A: `build-test.sh` uses the first project type it detects, so a mixed-language repository is built and tested as a single language. For monorepos, run it from each sub-project directory or customize the script.
923
+
924
+ **Q: Can I add more commands?**
925
+ A: Yes. Drop a `.md` file in `.claude/commands/` and it becomes available as a slash command. See [Customization](#9-customization).
926
+
927
+ **Q: How do I update the kit in existing projects?**
928
+ A: Run `npx claude-devkit-cli upgrade`. It automatically detects which files you've customized and only updates unchanged files. Use `--force` to overwrite everything.
929
+
930
+ **Q: I installed with the old setup.sh — how do I migrate?**
931
+ A: Run `npx claude-devkit-cli init --adopt .` to generate a manifest from your existing files without overwriting anything. Future upgrades will then work normally.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "claude-devkit-cli",
3
- "version": "1.1.0",
3
+ "version": "1.1.1",
4
4
  "description": "CLI toolkit for spec-first development with Claude Code — hooks, commands, guards, and test runners",
5
5
  "bin": {
6
6
  "claude-devkit": "./bin/devkit.js",
@@ -28,8 +28,8 @@
28
28
  "guards"
29
29
  ],
30
30
  "scripts": {
31
- "prepublishOnly": "rm -rf templates && cp -r ../kit templates",
32
- "postpublish": "rm -rf templates",
31
+ "prepublishOnly": "rm -rf templates && cp -r ../kit templates && cp ../README.md README.md",
32
+ "postpublish": "rm -rf templates README.md",
33
33
  "dev": "node bin/devkit.js"
34
34
  },
35
35
  "license": "MIT",
@@ -13,7 +13,7 @@ Don't jump to code. Understand the bug first:
13
13
  3. **Check history.** `git log --oneline -5 -- <file>` and `git blame -L <range> <file>` — who changed this last and why?
14
14
  4. **Form a hypothesis:** "I believe the bug is caused by [X] in [file:function] because [evidence]."
15
15
 
16
- If the bug is in a dependency/config/data (not our code), say so before proceeding.
16
+ If the bug is in a dependency/config/data (not project code), say so before proceeding.
17
17
 
18
18
  ---
19
19