specpipe 1.0.0 → 1.0.2

Files changed (46)
  1. package/README.md +116 -1220
  2. package/package.json +3 -2
  3. package/src/cli.js +16 -6
  4. package/src/commands/diff.js +1 -1
  5. package/src/commands/init-agents.js +40 -20
  6. package/src/commands/init-global.js +88 -33
  7. package/src/commands/init-interactive.js +71 -0
  8. package/src/commands/init.js +61 -22
  9. package/src/commands/remove.js +159 -49
  10. package/src/commands/upgrade.js +21 -56
  11. package/src/lib/agent-guards.js +34 -78
  12. package/src/lib/agent-install.js +38 -25
  13. package/src/lib/agents.js +53 -11
  14. package/src/lib/claude-global.js +50 -77
  15. package/src/lib/hooks.js +203 -0
  16. package/src/lib/installer.js +73 -61
  17. package/src/lib/reconcile.js +13 -8
  18. package/templates/{.claude/hooks → hooks}/file-guard.js +26 -21
  19. package/templates/hooks/specpipe-read-guard.sh +94 -21
  20. package/templates/hooks/specpipe-shell-guard.sh +121 -29
  21. package/templates/rules/specpipe-rules.md +77 -0
  22. package/templates/skills/sp-build/SKILL.md +101 -1
  23. package/templates/skills/sp-build-behavior-matrix/SKILL.md +876 -0
  24. package/templates/skills/sp-challenge/SKILL.md +34 -0
  25. package/templates/skills/sp-challenge-behavior-matrix/SKILL.md +289 -0
  26. package/templates/skills/sp-explore/SKILL.md +132 -0
  27. package/templates/skills/sp-explore-behavior-matrix/SKILL.md +862 -0
  28. package/templates/skills/sp-fix/SKILL.md +73 -1
  29. package/templates/skills/sp-fix-behavior-matrix/SKILL.md +338 -0
  30. package/templates/skills/sp-investigate/SKILL.md +70 -0
  31. package/templates/skills/sp-investigate-behavior-matrix/SKILL.md +718 -0
  32. package/templates/skills/sp-plan/SKILL.md +90 -0
  33. package/templates/skills/sp-plan-behavior-matrix/SKILL.md +1037 -0
  34. package/templates/skills/sp-review/SKILL.md +29 -3
  35. package/templates/skills/sp-review-behavior-matrix/SKILL.md +294 -0
  36. package/templates/.claude/CLAUDE.md +0 -79
  37. package/templates/.claude/hooks/path-guard.sh +0 -118
  38. package/templates/.claude/hooks/self-review.sh +0 -27
  39. package/templates/.claude/hooks/sensitive-guard.sh +0 -227
  40. package/templates/.claude/settings.json +0 -68
  41. package/templates/docs/WORKFLOW.md +0 -325
  42. package/templates/docs/specs/.gitkeep +0 -0
  43. package/templates/rules/specpipe-guards.md +0 -40
  44. package/templates/scripts/test-hooks.sh +0 -66
  45. package/templates/{.claude/hooks → hooks}/comment-guard.js +0 -0
  46. package/templates/{.claude/hooks → hooks}/glob-guard.js +0 -0
package/README.md CHANGED
@@ -1,1319 +1,215 @@
1
1
  <p align="center">
2
- <img src="docs/cover.svg" alt="Specpipe — spec-first multi-agent dev toolkit" width="100%">
2
+ <img src="docs/cover.svg" alt="Specpipe — spec-first toolkit for AI coding agents" width="100%">
3
3
  </p>
4
4
 
5
5
  <h1 align="center">Specpipe</h1>
6
6
 
7
7
  <p align="center">
8
- A lightweight, spec-first development toolkit for agentic AI coding agents.
8
+ <b>One spec-first workflow, installed into every AI coding agent you use.</b>
9
9
  </p>
10
10
 
11
- It enforces the cycle **spec (with acceptance scenarios) → code + tests → build pass** through skills, always-on guardrails, and a universal test runner.
12
-
13
- **Agents:** [Claude Code](https://claude.ai/code) (full hook enforcement) plus Codex, Cursor, Antigravity, OpenClaw, and Hermes (skills + advisory guard rules). Install for one or all: `specpipe init --agents <list>|all`. See [docs/multi-agent.md](docs/multi-agent.md).
14
- **Works with:** Swift, TypeScript/JavaScript, Python, Rust, Go, Java/Kotlin, C#, Ruby.
15
- **Dependencies:** None (requires only a supported agent CLI, Node.js, Git, and Bash).
16
- **Optional:** [GraphAtlas](https://github.com/microvn/graphatlas) MCP server for graph-based code intelligence — six skills use it automatically when present and fall back to `grep` when it isn't. See [§3 Setup](#3-setup).
17
-
18
- ---
19
-
20
- ## Table of Contents
21
-
22
- 1. [Philosophy](#1-philosophy)
23
- 2. [Quick Start](#2-quick-start)
24
- 3. [Setup](#3-setup)
25
- 4. [Daily Workflows](#4-daily-workflows)
26
- 5. [Commands Reference](#5-commands-reference)
27
- 6. [Automatic Guards (Hooks)](#6-automatic-guards-hooks)
28
- 7. [Spec Format](#7-spec-format)
29
- 8. [Customization](#8-customization)
30
- 9. [Token Cost Guide](#9-token-cost-guide)
31
- 10. [Troubleshooting](#10-troubleshooting)
32
- 11. [FAQ](#11-faq)
33
-
34
- ---
35
-
36
- ## 1. Philosophy
11
+ <p align="center">
12
+ <a href="https://specpipe.vercel.app"><b>Live demo&nbsp;→</b></a>
13
+ </p>
37
14
 
38
- ### The Core Cycle
15
+ Specpipe installs a disciplined loop — **spec → code + tests → build pass** — as native skills, guard hooks, and project rules. You author it once; it lands in whichever agent you run (Claude Code, Codex, Cursor, Antigravity, OpenClaw, Hermes). Switch agents, keep the same `/sp-*` commands.
39
16
 
40
- ```
41
- SPEC (with acceptance scenarios) → CODE + TESTS → BUILD PASS
17
+ ```bash
18
+ npx specpipe init # interactive picker: choose agents, skills, guards
42
19
  ```
43
20
 
44
- Every code change — feature, fix, or removal — follows this cycle. The spec is the source of truth. Acceptance scenarios (Given/When/Then) are embedded directly in the spec — no separate test plan file. If code contradicts the spec, the code is wrong.
21
+ ---
45
22
 
46
- ### Why Spec-First?
23
+ ## Why
47
24
 
48
- - **Prevents drift.** Acceptance scenarios live inside the spec: no separate test plan to fall out of sync.
49
- - **Tests have purpose.** Scenarios derived from specs test behavior, not implementation details. This means tests survive refactoring.
50
- - **AI writes better code.** When an agent has a spec with concrete Given/When/Then scenarios, it generates more accurate implementations and more meaningful tests.
51
- - **Reviews are grounded.** Reviewers can check code against the spec rather than guessing at intent.
25
+ AI agents write code fast and drift fast: they invent requirements, test the wrong things, and quietly break what worked. Specpipe fixes the loop, not the model:
52
26
 
53
- ### Principles
27
+ - **The spec is the source of truth.** Acceptance scenarios (Given/When/Then) live inside the spec; if code contradicts it, the code is wrong. No separate test plan to fall out of sync.
28
+ - **Tests check behavior, not guesses.** They come from the spec, so they survive refactors.
29
+ - **Guardrails are enforced, not suggested.** Hooks stop an agent from reading secrets or crawling `node_modules` *before* it happens.
30
+ - **Learn it once.** The same commands and discipline work in every agent.
54
31
 
55
- 1. **Specs are source of truth** — Code changes require spec updates first.
56
- 2. **Incremental, not big-bang** — Test after each code chunk, not after everything is done.
57
- 3. **Tests travel with code** — Every PR includes production code + tests + spec updates.
58
- 4. **Build pass is the gate** — Nothing merges with failing tests.
59
- 5. **Everything in the repo** — Specs, plans, tests, and code are version-controlled and reviewable.
32
+ You don't have to be a developer to feel it: describe a feature in plain language, the agent turns it into a spec you can read, builds against it, and proves it with tests — every step reviewable.
60
33
 
61
34
  ---
62
35
 
63
- ## 2. Quick Start
64
-
65
- **Time needed: 5 minutes.** Below is a realistic transcript — user input, what each skill actually asks, what it actually outputs. Nothing embellished.
36
+ ## Quick start
66
37
 
67
38
  ```bash
68
- npx specpipe init . # install for Claude Code (default)
69
- # or target other agents:
70
- npx specpipe init . --agents cursor,codex
71
- npx specpipe init . --agents all
72
-
73
- claude # open your agent (claude, codex, cursor, …)
39
+ npx specpipe init # interactive: scope (project/global), agents, skills, guards
40
+ npx specpipe init -y # non-interactive defaults: this project, Claude, all skills + guards
74
41
  ```
75
42
 
76
- > The transcript below is on Claude Code. The same skills install for Codex, Cursor,
77
- > Antigravity, OpenClaw, and Hermes — see [§ Supported agents](#supported-agents).
43
+ Then open your agent (`claude`, `codex`, `cursor`, …) and run the loop. The transcript below is on Claude Code with a Swift project; the same commands and output work in every agent.
78
44
 
79
- ### Step 1 — Spec the feature (`/sp-plan`)
45
+ **1 — Spec it (`/sp-plan`)**
80
46
 
81
47
  ```text
82
48
  You: /sp-plan "add tag filter to the todo list screen"
83
49
 
84
- Claude → running /sp-plan
50
+ Agent → running /sp-plan
85
51
  Phase 0: Codebase scan — found docs/specs/todo-list/todo-list.md (exists)
86
- Phase 1: Scope Challenge — TodoList already supports text search; tag
87
- filter can reuse the same query pipeline. Flagging: reuse vs rebuild.
88
-
89
- AskUserQuestion: "Extend existing search, or add a separate tag filter UI?"
90
- A) Extend search — one input, parsed (text + "tag:work") [lowest cost]
91
- B) Separate tag chips — chips below the search box [clearer UX]
92
- C) Both — chips drive the same query [most work]
52
+ Phase 1: Scope Challenge — TodoList already has text search; the tag filter
53
+ can reuse the same query pipeline. Flagging: reuse vs rebuild.
93
54
 
55
+ "Extend existing search, or add a separate tag filter UI?"
56
+ A) Extend search — one input, parsed (text + "tag:work") [lowest cost]
57
+ B) Separate tag chips — chips below the search box [clearer UX]
58
+ C) Both — chips drive the same query [most work]
94
59
  You: B
95
60
 
96
- Claude: Drafting spec…
97
- docs/specs/todo-list/todo-list.md (Mode C update, snapshot created)
98
-
99
- ## Stories (added)
61
+ Agent: Drafting spec… docs/specs/todo-list/todo-list.md (Mode C update, snapshot created)
100
62
  ### S-007: Filter todos by tag (P0)
101
63
  AS-014: Given todos tagged [work, home], when user taps "work" chip,
102
64
  then only work-tagged todos show.
103
- AS-015: Given no todos match the selected tags, when filter applied,
104
- then empty state shows "No todos match these tags".
105
-
106
- What Already Exists: TodoRepository.query() handles predicates — reuse.
65
+ AS-015: Given no todos match, when filter applied, then empty state shows.
107
66
  Not in Scope: saving favorite tag combinations (deferred).
108
67
  ```
109
68
 
110
- > **Complex spec?** Run `/sp-challenge docs/specs/<feature>/<feature>.md` before coding.
111
- > It spawns parallel hostile reviewers (Security Adversary, Failure Mode Analyst,
112
- > Assumption Destroyer, Scope & YAGNI Critic) that quote the spec and propose surgical
113
- > edits — e.g. "AS-014 doesn't define tag case-sensitivity → add constraint or AS".
114
- > Skip for simple CRUD like this tag filter.
115
-
116
- ### Step 2 — Implement + build (`/sp-build`)
69
+ **2 — Build it (`/sp-build`)** — TDD loop: coverage map → write the missing test → build green.
117
70
 
118
71
  ```text
119
- You: (edit TodoListView.swift, TagFilterChip.swift) — now /sp-build
120
-
121
- Claude → running /sp-build
122
72
  Phase 1.5: Coverage Map
123
- [+] TodoListView.swift
124
- └── applyTagFilter(selected:)
125
- ├── [★★ TESTED] empty selection → returns all (new test)
126
- ├── [★★ TESTED] single tag match (new test)
127
- ├── [GAP] no-match empty state (AS-015)
128
- └── [★★ TESTED] multi-tag AND semantics (new test)
129
-
130
- Writing test for [GAP]… building… running.
131
- ✓ 14 tests passed (3 new, 0 regressions)
73
+ applyTagFilter(selected:)
74
+ ├── [★ TESTED] empty selection → returns all
75
+ ├── [GAP] no-match empty state (AS-015)
76
+ └── [TESTED] multi-tag AND semantics
77
+ Writing test for [GAP]… building… ✓ 14 tests passed (3 new, 0 regressions)
132
78
  ```
133
79
 
134
- ### Step 3 — Bug found? (`/sp-fix`)
80
+ **3 — Fix a bug (`/sp-fix`)** — failing test first, then the one-line fix, then green.
135
81
 
136
82
  ```text
137
83
  You: /sp-fix "selecting two tags shows OR results, should be AND"
138
-
139
- Claude → running /sp-fix
140
- Phase 0: Bug Path Diagram
141
- applyTagFilter(selected: ["work","urgent"])
142
- └── [BUG] predicate uses .contains(anyOf:) — should be .contains(allOf:)
143
-
144
- Phase 1: writing failing test (AS-014 variant: 2 tags, AND)… FAILS ✓
145
- Phase 2: changing .anyOf → .allOf in TodoRepository.swift:88 (1 line)
146
- Phase 3: test now PASSES, full suite green.
147
-
148
- DEBUG REPORT
149
- Root cause: wrong set operator, regression from S-005 search refactor
150
- Prevention: add AS for multi-tag AND to spec (auto-signaled)
84
+ [BUG] predicate uses .contains(anyOf:) — should be .contains(allOf:)
85
+ failing test → change .anyOf → .allOf (TodoRepository.swift:88) → suite green
151
86
  ```
152
87
 
153
- ### Step 4 — Review + commit (`/sp-review` → `/sp-commit`)
88
+ **4 — Review + commit (`/sp-review` → `/sp-commit`)** — failure-mode grid, then a secret-scanned conventional commit.
154
89
 
155
90
  ```text
156
- You: /sp-review
157
-
158
- Claude → Smart Focus: UI state + query predicate
159
- Failure Mode Grid: all paths have tests ✓ error states covered ✓
160
- Finding (confidence 8/10): selected-tags state not persisted across
161
- navigation. Not in current spec — logged under "Not in Scope".
162
91
  Verdict: APPROVE
163
-
164
- You: /sp-commit
165
-
166
- Claude → secret scan: clean. debug-code scan: clean.
167
- Staging 4 files. Conventional message:
168
- feat(todo): filter list by selected tags with AND semantics
92
+ feat(todo): filter list by selected tags with AND semantics
169
93
  ✓ commit a1b2c3d (not pushed — run `git push` when ready)
170
94
  ```
171
95
 
172
- > **Complex bug?** Insert `/sp-investigate "<bug>"` before `/sp-fix`. It's read-only,
173
- > writes `docs/investigate/<slug>-<date>.md` with hypotheses + blast radius, then
174
- > `/sp-fix` auto-picks it up. Skip for trivial bugs.
175
-
176
- That's the 5 minutes. The CLI auto-detected your project (Swift + XCTest here) — no config touched.
96
+ The CLI auto-detected the stack (Swift + XCTest) — no config touched. For a risky spec, run `/sp-challenge` between steps 1 and 2; it spawns hostile reviewers that quote the spec and propose surgical edits.
177
97
 
178
98
  ---
179
99
 
180
- ## 3. Setup
181
-
182
- ### Prerequisites
183
-
184
- | Tool | Required | Why |
185
- |------|----------|-----|
186
- | **A supported agent CLI** | Yes | Runs the skills — Claude Code, Codex, Cursor, Antigravity, OpenClaw, or Hermes |
187
- | **Git** | Yes | Change detection, commit workflow |
188
- | **Node.js** (18+) | Yes | File guard hook, JSON parsing |
189
- | **Bash** (4+) | Yes | Path guard hook, shell-based hooks |
190
- | **Language toolchain** | Yes | Whatever your project uses (Swift, npm, pytest, etc.) |
191
- | **[GraphAtlas](https://github.com/microvn/graphatlas)** | Optional | Graph-based code intelligence — skills prefer it over `grep` when connected (see below) |
192
-
193
- ### Installation
194
-
195
- **Option A: One-command install** (recommended)
196
-
197
- ```bash
198
- npx specpipe init .
199
- ```
200
-
201
- **Option B: Global install**
202
-
203
- ```bash
204
- npm install -g specpipe
205
-
206
- # Then, in any project:
207
- cd my-project
208
- specpipe init .
209
- ```
210
-
211
- **Option C: Global skills install** (available in all projects without running `init` again)
212
-
213
- ```bash
214
- specpipe init --global
215
- # or after per-project init, answer "yes" to the global prompt
216
- ```
217
-
218
- Skills installed globally at `~/.claude/skills/` are available in every project. Per-project `.claude/skills/` always takes precedence over global — so projects can still override individual skills.
219
-
220
- **Option D: Force re-install** (overwrites existing files)
221
-
222
- ```bash
223
- npx specpipe init --force .
224
- ```
225
-
226
- **Option D: Selective install** (only specific components)
100
+ ## Supported agents
227
101
 
228
- ```bash
229
- npx specpipe init --only hooks,skills .
230
- ```
102
+ A skill is authored once and **emitted into each agent's native format** — the markdown body is identical; only the location, frontmatter, and hook config change.
231
103
 
232
- **Option E: Multi-agent install** (one agent, several, or all)
104
+ | Agent | Skills | Rules | Enforced guards |
105
+ |-------|--------|-------|-----------------|
106
+ | **Claude Code** | `.claude/skills/` | `.claude/CLAUDE.md` | `.claude/settings.json` — all five |
107
+ | **Codex CLI** | `.agents/skills/` | `AGENTS.md` | `.codex/hooks.json` — shell |
108
+ | **Cursor** | `.cursor/skills/` | `.cursor/rules/*.mdc` | `.cursor/hooks.json` — shell + read + file |
109
+ | **Antigravity** | `.agents/skills/` | `.agents/rules/` | `.agents/hooks.json` — shell |
110
+ | **OpenClaw** | `skills/` | `SPECPIPE-RULES.md` | advisory rules |
111
+ | **Hermes** | `~/.hermes/skills/` (global only) | `SPECPIPE-RULES.md` | advisory rules |
233
112
 
234
- ```bash
235
- npx specpipe init --agents cursor . # one
236
- npx specpipe init --agents claude,codex . # several
237
- npx specpipe init --agents all . # every supported agent
238
- ```
113
+ Guards run as **blocking hooks** wherever the agent exposes a pre-tool-call hook — they deny a tool call before it runs. Each agent enforces only the guards its hook system supports; most hook Claude-specific tool events:
239
114
 
240
- ### Supported agents
115
+ | Guard | Stops | Claude | Cursor | Codex | Antigravity |
116
+ |---|---|:--:|:--:|:--:|:--:|
117
+ | **shell** | crawling `node_modules`/build dirs, reading `.env`/keys in a command | ✓ | ✓ | ✓ | ✓ |
118
+ | **read** | the agent reading a secret file | ✓ | ✓ | — | — |
119
+ | **file** | *(advisory)* warns when a source file grows too large | ✓ | ✓ | — | — |
120
+ | **comment / glob** | placeholder-comment edits, repo-wide broad globs | ✓ | — | — | — |
241
121
 
242
- The skills are authored once and emitted into each agent's native format on install.
243
- The markdown body is identical across agents; only the file location, name, and
244
- frontmatter change. Guardrails are **enforced via blocking hooks** for Claude, Codex,
245
- and Cursor (they can deny a tool call); Antigravity, OpenClaw, and Hermes get the same
246
- guard intent as **always-on advisory rules**.
122
+ A guard an agent can't hook still reaches it as an **advisory rule** in that agent's rules file, so the intent travels everywhere; OpenClaw and Hermes (no blocking hooks) get all guards that way. Skills that lean on Claude-only tools degrade gracefully with a short "running outside Claude Code" note. Details: [docs/multi-agent.md](docs/multi-agent.md).
247
123
 
248
- | Agent | Install location | Guardrails |
249
- |-------|------------------|-----------|
250
- | **Claude Code** | `.claude/skills/sp-*/SKILL.md` + `.claude/hooks/` | Hook-enforced |
251
- | **Codex CLI** | `.agents/skills/sp-*/SKILL.md` | **enforced** `.codex/hooks.json` + `AGENTS.md` |
252
- | **Cursor** | `.cursor/skills/sp-*/SKILL.md` | **enforced** `.cursor/hooks.json` + `.cursor/rules/` |
253
- | **Antigravity** | `.agents/skills/sp-*/SKILL.md` | `.agent/rules/` (advisory) |
254
- | **OpenClaw** | `skills/sp-*/SKILL.md` | `SPECPIPE-GUARDS.md` (advisory) |
255
- | **Hermes** | `optional-skills/specpipe/sp-*/SKILL.md` | `SPECPIPE-GUARDS.md` (advisory) |
256
-
257
- Skills that use Claude-only tools (`AskUserQuestion`, subagents) get a "Running outside
258
- Claude Code" note appended for the other agents, so they degrade gracefully. The specs
259
- and workflow themselves are tool-agnostic. Full details: [docs/multi-agent.md](docs/multi-agent.md).
124
+ ---
260
125
 
261
- ### What Gets Installed
126
+ ## What gets installed
262
127
 
263
- The tree below is the **Claude Code** layout (`--agents claude`, the default). Other
264
- agents install the same skills into their own locations — see [Supported agents](#supported-agents).
128
+ Each agent gets its own paths; the Claude layout, as an example:
265
129
 
266
130
  ```
267
131
  your-project/
268
- ├── .specpipe/
269
- │ └── manifest.json ← install manifest (tracks files per agent; used by upgrade/remove)
132
+ ├── .specpipe/manifest.json ← tracks every installed file per agent (drives upgrade/remove)
270
133
  ├── .claude/
271
- │ ├── CLAUDE.md ← Project rules hub
272
- │ ├── settings.json ← Hook wiring
273
- │ ├── hooks/
274
- │ │ ├── file-guard.js ← Warns on large files
275
- │ │ ├── path-guard.sh ← Blocks wasteful Bash paths
276
- │ │ ├── glob-guard.js ← Blocks broad glob patterns
277
- │ │ ├── comment-guard.js ← Blocks placeholder comments
278
- │ │ ├── sensitive-guard.sh ← Blocks access to secrets
279
- │ │ └── self-review.sh ← Quality checklist on stop
280
- │ └── skills/
281
- │ ├── sp-explore/SKILL.md ← /sp-explore skill
282
- │ ├── sp-scaffold/ ← /sp-scaffold skill (greenfield bootstrap)
283
- │ │ ├── SKILL.md
284
- │ │ └── references/ ← ARCHITECTURE/DESIGN templates, ADR template,
285
- │ │ │ stack-profiles/ seeds (copy to ~/.claude or
286
- │ │ │ ./.claude to customize — bundled copy is overwritten on upgrade)
287
- │ │ ├── ARCHITECTURE.md.tmpl
288
- │ │ ├── DESIGN.md.tmpl
289
- │ │ ├── adr/NNNN-template.md
290
- │ │ └── stack-profiles/react.md
291
- │ ├── sp-plan/SKILL.md ← /sp-plan skill
292
- │ ├── sp-challenge/SKILL.md ← /sp-challenge skill
293
- │ ├── sp-build/SKILL.md ← /sp-build skill
294
- │ ├── sp-investigate/SKILL.md ← /sp-investigate skill (optional, read-only)
295
- │ ├── sp-fix/SKILL.md ← /sp-fix skill
296
- │ ├── sp-review/SKILL.md ← /sp-review skill
297
- │ ├── sp-commit/SKILL.md ← /sp-commit skill
298
- │ ├── sp-spec-render/ ← /sp-spec-render skill (spec HTML view, user-invoked)
299
- │ │ ├── SKILL.md
300
- │ │ ├── template.html
301
- │ │ ├── components.md
302
- │ │ └── examples/
303
- │ ├── sp-md-render/ ← /sp-md-render skill (generic markdown HTML view)
304
- │ │ ├── SKILL.md
305
- │ │ ├── template.html
306
- │ │ └── components.md
307
- │ ├── sp-voices/SKILL.md ← /sp-voices skill (multi-LLM review)
308
- │ └── sp-humanize/SKILL.md ← /sp-humanize skill (rephrase to human voice)
309
- └── docs/
310
- ├── specs/ ← Your specs (folder-per-feature)
311
- │ └── <feature>/
312
- │ ├── <feature>.md ← Spec with acceptance scenarios
313
- │ └── snapshots/ ← Version history (managed by /sp-plan)
314
- └── WORKFLOW.md ← Process reference
315
- ```
316
-
317
- ### Optional: GraphAtlas Code Intelligence
318
-
319
- The `sp-*` skills work out of the box with `grep`. But when [GraphAtlas](https://github.com/microvn/graphatlas) (GA) is connected as an MCP server, six skills — `/sp-explore`, `/sp-plan`, `/sp-build`, `/sp-fix`, `/sp-review`, `/sp-investigate` — prefer it over `grep` for code discovery, call-graph tracing, and blast-radius analysis.
320
-
321
- **Why it helps:** `grep` can't tell a call site from a string literal, doesn't see polymorphic dispatch, and won't follow re-exports. An agent that edits one function but misses its callers, test files, and overrides in other modules ships a bug. GA indexes the repo once into a local graph with typed `CALL` / `IMPORT` / `OVERRIDE` edges, then answers structural questions deterministically in milliseconds with a small token footprint. It runs 100% locally — no LLM, no embeddings, no telemetry.
322
-
323
- **How the skills use it:** each skill runs a one-time probe (`ga_architecture`) at the start. If GA responds, it leans on tools like `ga_impact` (blast radius + affected tests), `ga_callers` / `ga_callees` (call graph), `ga_symbols` (definition lookup), and `ga_rename_safety`. If GA is absent — or the index is stale — the skill falls back to `grep`/`glob` automatically. Nothing breaks; you only lose the precision.
324
-
325
- **Setup:** GA is a separate tool, not bundled with this kit. Install and register it as an MCP server following the instructions at [github.com/microvn/graphatlas](https://github.com/microvn/graphatlas). Once registered, the skills detect it on their own — no changes to this kit's config needed.
326
-
327
- ### Post-Install Configuration
328
-
329
- The CLI auto-detects your project type and fills in `CLAUDE.md`. Verify it's correct:
330
-
331
- ```bash
332
- cat .claude/CLAUDE.md
333
- ```
334
-
335
- Look for the **Project Info** section. Ensure language, test framework, and directories are correct. Edit manually if needed.
336
-
337
- ### Upgrade
338
-
339
- ```bash
340
- npx specpipe upgrade
341
- ```
342
-
343
- Smart upgrade — updates kit files but preserves any you've customized. Use `--force` to overwrite everything.
344
-
345
- ```bash
346
- # Check if update is available
347
- npx specpipe check
348
-
349
- # See what changed
350
- npx specpipe diff
351
-
352
- # View installed files and status
353
- npx specpipe list
354
- ```
355
-
356
- ### Uninstall
357
-
358
- ```bash
359
- npx specpipe remove
360
- ```
361
-
362
- This removes hooks, skills, and settings. It preserves `CLAUDE.md` (which you may have customized) and `docs/` (which contains your specs).
363
-
364
- ---
365
-
366
- ## 4. Daily Workflows
367
-
368
- ### New Project (Greenfield)
369
-
370
- > When: Brand-new project — no codebase yet (empty repo, no package manager / `src/`).
371
-
372
- ```
373
- 1. /sp-explore "what you're building"
374
- → Detects greenfield, also decides app-type + stack (researched, current),
375
- emits a Bootstrap Brief in docs/explore/<feature>.md.
376
-
377
- 2. /sp-scaffold
378
- → Generator-first runnable skeleton (core/ + one pattern-demonstrating module +
379
- tests), smoke-gated (install→build→start GREEN), + ARCHITECTURE.md / ADRs.
380
- Hands off only when it RUNS.
381
-
382
- 3. /sp-plan → /sp-build → normal New Feature flow, now on a runnable base.
383
- ```
384
-
385
- ### Explore Before Planning
386
-
387
- > When: Requirements are unclear, you're debating between approaches, or it's a brownfield feature with existing code to understand first.
388
-
389
- ```
390
- 1. /sp-explore "feature description"
391
- → Asks questions as a Client Technical Lead — one topic at a time.
392
- → Clarifies: why, behavior, boundaries, business rules, edge cases, permissions, UI.
393
- → Output: docs/explore/<feature>.md
394
-
395
- 2. /sp-plan "feature description"
396
- → Auto-detects docs/explore/<feature>.md, skips redundant discovery.
397
- → Continue with the normal New Feature flow.
398
- ```
399
-
400
- **Example:**
401
- ```
402
- /sp-explore "cancel order request"
403
- ```
404
-
405
- ### New Feature
406
-
407
- > When: Building something new — no existing code or spec.
408
-
409
- ```
410
- 1. /sp-plan "description of the feature"
411
- → Generates spec with acceptance scenarios at docs/specs/<feature>/<feature>.md.
412
-
413
- 2. Implement code in chunks.
414
- After each chunk: /sp-build
415
- Repeat until green.
416
-
417
- 3. /sp-review (before merge)
418
-
419
- 4. /sp-commit
420
- ```
421
-
422
- **Example:**
423
- ```
424
- /sp-plan "User authentication with email/password login, password reset via email, and session management with 24h expiry"
425
- ```
426
-
427
- ### Update Existing Feature
428
-
429
- > When: Changing behavior of something that already exists.
430
-
431
- ```
432
- 1. /sp-plan docs/specs/<feature>/<feature>.md "description of changes"
433
- → Mode C handles everything: snapshot → classification → change report → apply.
434
- Do NOT manually edit the spec before running /sp-plan.
435
-
436
- 2. Implement the code change.
437
- /sp-build
438
- Fix until green.
439
-
440
- 3. /sp-review → /sp-commit
441
- ```
442
-
443
- ### Bug Fix
444
-
445
- > When: Something is broken.
446
-
447
- ```
448
- 0. (OPTIONAL) /sp-investigate "description of the bug"
449
- → Use for complex bugs, outages, data corruption, or when the cause is unclear.
450
- → Read-only: hypothesis + blast radius + evidence, no code changes.
451
- → Writes docs/investigate/<slug>-<date>.md for /sp-fix to consume.
452
- → Skip for trivial/obvious bugs — go straight to /sp-fix.
453
-
454
- 1. /sp-fix "description of the bug" (or /sp-fix docs/investigate/<slug>-<date>.md)
455
- → Writes failing test → fixes code → runs full suite.
456
-
457
- 2. /sp-commit
458
- ```
459
-
460
- **Example:**
461
- ```
462
- /sp-fix "Search returns no results when query contains apostrophes like O'Brien"
134
+ │ ├── CLAUDE.md ← rules hub: spec-first cycle + guardrails + auto-detected stack
135
+ │ ├── settings.json ← hook wiring
136
+ │ ├── hooks/ ← shell, read, comment, glob, file guards
137
+ │ └── skills/sp-*/ ← the 13 skills
138
+ └── docs/specs/<feature>/ ← your specs + snapshots (created by the skills)
463
139
  ```
464
140
 
465
- ### Remove Feature
466
-
467
- > When: Deleting code, removing deprecated functionality.
468
-
469
- ```
470
- 1. /sp-plan docs/specs/<feature>/<feature>.md "remove stories S-XXX"
471
- → Mode C creates a snapshot (removing stories = Major), then marks as removed.
141
+ Other agents add their own dirs (`.agents/`, `.cursor/`, `.codex/`) and a shared `AGENTS.md`. `remove` cleans it all up; `remove --agents <list>` drops one agent and keeps shared files the others need. Your `CLAUDE.md` content and `docs/` are always preserved.
472
142
 
473
- 2. Delete production code + related tests.
474
-
475
- 3. Run the full test suite (your project's native test command).
476
- Fix cascading breaks.
477
-
478
- 4. /sp-commit
479
- ```
143
+ > **GraphAtlas (optional):** with [GraphAtlas](https://github.com/microvn/graphatlas) connected as an MCP server, six skills prefer it over `grep` for call-graph and blast-radius analysis — 100% local, no LLM. Skills fall back to `grep` when it's absent; nothing breaks.
480
144
 
481
145
  ---
482
146
 
483
- ## 5. Commands Reference
147
+ ## Commands
484
148
 
485
- ### /sp-explore — Feature Discovery as Client Technical Lead
486
-
487
- **Usage:**
488
- ```
489
- /sp-explore "cancel order request"
490
- /sp-explore "user notification preferences"
491
- ```
149
+ Thirteen slash commands. Full per-skill behaviour (phases, rules, outputs) lives in **[docs/commands.md](docs/commands.md)**.
492
150
 
493
- **When to use:** Requirements are unclear, you're debating between approaches, or you want to clarify a feature deeply before committing to a spec. Runs before `/sp-plan`.
494
-
495
- **How it works:**
496
-
497
- 1. **Phase 0: Codebase scan** — Silently checks for existing code, related specs, and existing explore docs before asking anything.
498
- 2. **Phase 1: Why, not what** — Asks what problem requires this feature, who faces it, and how they handle it today. Prevents building the wrong thing.
499
- 3. **Phase 2: Desired behavior** — Walks through the flow step by step, identifies trigger and final result, checks for multi-role approval chains.
500
- 4. **Phase 2.5: UI/UX expectation** — Clarifies interface type (table, form, wizard, dashboard). Offers sensible defaults when the client is unsure. Suggests simpler approaches when expectations are complex.
501
- 5. **Phase 3: Boundaries** — Impact on existing screens, data changes, migration needs, out of scope, permissions.
502
- 6. **Phase 3.5: Scope optimization** — Identifies what can ship fast vs what can defer to phase 2.
503
- 7. **Phase 4: Business rules & validation** — Conditions, formulas (with real numbers), input validation, notifications, time constraints, concurrency.
504
- 8. **Phase 5: Edge cases** — Empty states, error messages, double submit, network loss, limits, sensitive data, domain-specific cases (payment double-charge, booking overbooking, etc.).
505
- 9. **Phase 6: Scenario confirmation** — Presents concrete happy path + unhappy paths with fake data. Confirms with user before proceeding.
506
- 10. **Phase 7: Handoff summary** — Compiles everything into a structured doc, confirms with user, writes to `docs/explore/<feature>.md`.
507
-
508
- **Output:** `docs/explore/<feature>.md` — auto-detected by `/sp-plan`, which skips redundant discovery and maps explore findings directly to spec sections.
509
-
510
- **Token cost:** 10–20k
151
+ | Command | What it does | Tokens |
152
+ |---------|--------------|--------|
153
+ | [`/sp-explore`](docs/commands.md#sp-explore--feature-discovery-as-client-technical-lead) | Feature discovery as a Client Technical Lead — read-only Q&A before planning | 10–20k |
154
+ | [`/sp-scaffold`](docs/commands.md#sp-scaffold--greenfield-project-bootstrap) | Greenfield bootstrap to a runnable, smoke-gated skeleton | 15–40k + build |
155
+ | [`/sp-plan`](docs/commands.md#sp-plan--generate-spec-with-acceptance-scenarios) | Generate / update a spec with acceptance scenarios (Given/When/Then) | 20–40k |
156
+ | [`/sp-challenge`](docs/commands.md#sp-challenge--adversarial-plan-review) | Adversarial spec review by parallel hostile reviewers | 15–30k |
157
+ | [`/sp-build`](docs/commands.md#sp-build--tdd-delivery-loop) | TDD delivery loop: coverage map → tests → build → green | 5–10k |
158
+ | [`/sp-investigate`](docs/commands.md#sp-investigate--read-only-root-cause-investigation-optional) | Read-only root-cause investigation (optional, before fix) | 8–15k |
159
+ | [`/sp-fix`](docs/commands.md#sp-fix--test-first-bug-fix) | Test-first bug fix: failing test → minimal fix → green | 3–5k |
160
+ | [`/sp-review`](docs/commands.md#sp-review--pre-merge-quality-gate) | Pre-merge quality gate with smart focus + failure-mode grid | 10–20k |
161
+ | [`/sp-voices`](docs/commands.md#sp-voices--multi-llm-review-optional) | Multi-LLM review panel (optional second opinion) | 10–30k + API |
162
+ | [`/sp-commit`](docs/commands.md#sp-commit--smart-git-commit) | Smart conventional commit with secret + debug-code scan | 2–4k |
163
+ | [`/sp-spec-render`](docs/commands.md#sp-spec-render--render-spec-as-html-view) | Render a spec as a standalone HTML view | 3–8k |
164
+ | [`/sp-md-render`](docs/commands.md#sp-md-render--render-any-markdown-as-html-view) | Render any long-form markdown as an HTML view | 3–8k |
165
+ | [`/sp-humanize`](docs/commands.md#sp-humanize--rephrase-to-human-voice) | Rephrase a plan/draft into natural, send-ready text | 2–6k |
511
166
 
512
167
  ---
513
168
 
514
- ### /sp-scaffold — Greenfield Project Bootstrap
515
-
516
- **Usage:**
517
- ```
518
- /sp-scaffold # bootstrap from the Bootstrap Brief in docs/explore/
519
- /sp-scaffold "Next.js + Nest pnpm monorepo" # standalone: gather app-type/stack itself
520
- ```
521
-
522
- **When to use:** A brand-new project with no runnable codebase yet. Runs between `/sp-explore` (greenfield branch) and `/sp-plan`: `sp-explore → sp-scaffold → sp-plan → sp-build`. Skip if a runnable project already exists — go straight to `/sp-plan`. `/sp-build`'s Foundation Gate refuses to start the TDD loop until this has produced a runnable harness.
523
-
524
- **How it works:**
169
+ ## Workflows
525
170
 
526
- 1. **Precondition** - confirms greenfield; resumes a partial repo without clobbering user files.
527
- 2. **App-type + stack** — taken from the Bootstrap Brief (or asked); never silently defaulted; **current versions researched**, not recalled from training memory. Optional layered stack profiles (`./.claude/` > `~/.claude/` > kit seed) supply opinionated defaults; the Brief always wins.
528
- 3. **Skeleton (generator-first)** — official `create-*` CLIs give real pinned deps (defends against hallucinated/typosquatted packages); monorepos orchestrated root-first; imposes `core/` + `modules/` + co-located tests; seeds ONE module that **demonstrates the architecture pattern** (the template every feature copies).
529
- 4. **Smoke gate (non-negotiable)** — `install → build → start/smoke` must be GREEN, with ≥1 real passing test (this resolves `TEST_CMD` for `/sp-build`). Not green → BLOCKED; never a half-scaffold.
530
- 5. **Docs** — fills `ARCHITECTURE.md` (codemap + invariants), one ADR per major stack choice, optional `DESIGN.md`.
531
- 6. **Hygiene & handoff** — secret scan, `.gitignore`, `.env.example`; reports the resolved `TEST_CMD`.
171
+ The four-step loop above is the **new feature** flow. Variants:
532
172
 
533
- **Output:** a runnable walking skeleton + canonical docs. Thin by design: features come later via `/sp-plan` → `/sp-build`.
534
-
535
- **Token cost:** 15–40k + real install/build time (heavier than other skills: it runs generators and builds).
173
+ - **Greenfield** (empty repo): `/sp-explore` (decides app-type + stack) → `/sp-scaffold` (runnable, smoke-gated skeleton) → the feature loop.
174
+ - **Update a feature:** `/sp-plan docs/specs/<feature>/<feature>.md "what's changing"` — never hand-edit the spec; Mode C snapshots → diffs → applies. Then `/sp-build` → `/sp-review` → `/sp-commit`.
175
+ - **Bug fix:** `/sp-fix "the bug"` (failing test → minimal fix → green). For a murky bug, run `/sp-investigate` first: read-only hypothesis + blast radius.
176
+ - **Fuzzy requirements:** `/sp-explore "feature"` runs a one-topic-at-a-time Q&A and `/sp-plan` picks up its notes automatically.
536
177
 
537
178
  ---
538
179
 
539
- ### /sp-plan — Generate Spec with Acceptance Scenarios
540
-
541
- **Usage:**
542
- ```
543
- /sp-plan "user authentication with OAuth2" # Mode A: new spec from description
544
- /sp-plan docs/specs/auth/auth.md # Mode B: add scenarios to existing spec
545
- /sp-plan docs/specs/auth/auth.md "add password reset flow" # Mode C: update existing spec
546
- ```
547
-
548
- **Modes:**
549
- - **Mode A** — Creates a new spec with stories and acceptance scenarios from your description.
550
- - **Mode B** — Reads an existing spec that has no acceptance scenarios yet, adds them.
551
- - **Mode C** — Updates an existing spec: creates a snapshot before Major changes, shows a change report, waits for confirmation, then applies.
552
-
553
- **How it works:**
554
-
555
- 1. **Phase 0: Codebase Awareness** — Scans existing code, `docs/specs/`, and project patterns before planning. Prevents specs that conflict with existing implementations.
556
- 2. **Phase 1: Scope & Split + Scope Challenge** — Evaluates feature size (>7 stories or >20 AS → must split). When a feature is large, applies **Sizing & Phasing**: Phase 1 (minimum viable — smallest slice with value), Phase 2 (core experience — happy path), Phase 3 (edge cases, polish), Phase 4 (optimization, monitoring) — each phase mergeable independently. Also runs a **Scope Challenge** before drafting: checks for existing code that already solves sub-problems (reuse vs rebuild), flags complexity smells (8+ files or 2+ new classes/services), searches for framework built-ins, checks for distribution needs (new artifact → CI/CD in scope?), and applies the Completeness Principle (complete version costs only `CC: ≤15m` more → recommend it directly).
557
- 3. **Phase 2: Draft Spec** — Generates a structured spec with stories and acceptance scenarios (Given/When/Then). Depth scales by priority: P0 gets full GWT + test data, P1 gets GWT, P2 gets 1-2 line descriptions. Runs consistency checks (CC1-CC6) before showing draft.
558
- 4. **Phase 3: Clarify Ambiguities** — Systematically finds gaps across behavioral, data, auth, non-functional, integration, and concurrency dimensions. Questions include `(human: ~X / CC: ~Y)` effort scales and `Completeness: X/10` scores for each option.
559
- 5. **Phase 4: Summary** — Shows story counts, AS counts, implementation order, next steps. Every spec also gets a **"What Already Exists"** section (existing code that partially solves the problem) and a **"Not in Scope"** section (deferred work with rationale — prevents work from silently dropping).
560
-
561
- **Mode C (Update) adds:**
562
- - **Classification** — Walks through M1-M6 checklist to determine Major vs Minor change.
563
- - **Snapshot** — Major changes trigger an automatic snapshot (`cp`, bit-perfect) before editing.
564
- - **Change report** — Shows what will change, waits for user confirmation.
565
- - **Consistency check** — Runs CC1-CC6 after every update.
566
-
567
- **Traceability IDs:**
568
- - `S-NNN` — Stories (with priority P0/P1/P2)
569
- - `AS-NNN` — Acceptance Scenarios (Given/When/Then, embedded in stories)
570
- - `FR-NNN` — Functional Requirements (if needed)
571
- - `SC-NNN` — Success Criteria (if needed)
572
- - IDs are immutable — deleted IDs are never reused.
573
-
574
- **Directory structure:**
575
- ```
576
- docs/specs/<feature>/
577
-   <feature>.md            # single source of truth: always read this file
578
-   snapshots/              # version history (managed by sp-plan, not developers)
579
-     YYYY-MM-DD.md
580
-     YYYY-MM-DD-<REF>.md
581
- ```
582
-
583
- **Output:**
584
- - Spec with acceptance scenarios: `docs/specs/<feature>/<feature>.md`
585
- - (Optional) Scannable HTML view: `docs/specs/<feature>/<feature>.html` — generated by running `/sp-spec-render <feature>` after `/sp-plan`. `/sp-plan` suggests the command at the end of Phase 4 and Mode C but does not invoke it. Source `.md` remains canonical; HTML is regenerable.
586
-
587
- ### /sp-spec-render — Render Spec as HTML View
588
-
589
- **Usage:**
590
- ```
591
- /sp-spec-render <feature> # render by feature slug
592
- /sp-spec-render docs/specs/auth/auth.md # render specific spec
593
- /sp-spec-render docs/specs/billing/ # render spec dir
594
- /sp-spec-render --all # bulk re-render all specs
595
- /sp-spec-render # list + prompt
596
- ```
597
-
598
- **When to use:** Decoupled from `/sp-plan` — you invoke it explicitly when you want the HTML view. `/sp-plan` writes the spec markdown and ends; it suggests `/sp-spec-render` at the end of Phase 4 and Mode C but never calls it automatically. Run it:
599
- - After `/sp-plan` to generate the initial HTML view (sidebar TOC, story cards, collapsible AS)
600
- - After a Mode C update to refresh a now-stale `.html`
601
- - After fixing a typo directly in `<feature>.md` (no spec semantics changed, but HTML is stale)
602
- - For specs written before this skill existed
603
- - Bulk (`--all`) after changing `template.html` or `components.md`
604
-
605
- **How it works:**
606
-
607
- 1. Reads `docs/specs/<feature>/<feature>.md` (+ sub-specs if multi-spec).
608
- 2. Reads `template.html` + `components.md` (cached, not regenerated each call).
609
- 3. Parses spec: frontmatter, stories with priority badges, acceptance scenarios (Given/When/Then), constraints, change log, snapshots.
610
- 4. Builds the HTML buffer in-memory using component snippets — copy verbatim, fill content. AI never writes CSS or component markup from scratch.
611
- 5. Writes `<feature>.html` next to `<feature>.md` in one Write call.
612
-
613
- **Output features (the rendered HTML):**
614
-
615
- - Sticky top bar: doc type + feature name + version + last-updated + counts (specs / stories / AS) + status pill (Active/Draft/Deprecated)
616
- - Mandatory TL;DR card immediately after the title
617
- - Sidebar TOC with scroll-spy + search filter, grouped by sub-spec (multi-spec) or by section (single)
618
- - Story cards with priority badge (P0/P1/P2) + AS count badge
619
- - AS as collapsible details (first AS of each story open by default), with Given/When/Then grid
620
- - Constraint callouts (warning style), grouped per sub-spec for large specs
621
- - Change Log and Snapshots collapsed by default
622
- - Dark/light/auto theme toggle (system preference honored)
623
- - Print stylesheet (sidebar hidden, all details expanded, page-break-aware)
624
- - Self-contained: zero external dependencies, no CDN, opens offline
625
-
626
- **Source remains truth:**
627
- - `.md` is canonical. Edit `.md` via `/sp-plan`; regenerate `.html` via this skill.
628
- - Never hand-edit the `.html`. Re-rendering is idempotent — run `/sp-spec-render` any time you want the HTML to catch up with the `.md`.
629
-
630
- **Token cost:** 3–8k (template + components cached; output ≈ source markdown × 1.2 — no CSS/JS in output token stream).
631
-
632
- ### /sp-md-render — Render Any Markdown as HTML View
633
-
634
- Generic counterpart to `/sp-spec-render`. Same template/component architecture, but for arbitrary long-form markdown with no fixed schema — investigation reports, explore docs, RFCs, retros, design notes, READMEs.
635
-
636
- **Usage:**
637
- ```
638
- /sp-md-render docs/investigate/payment-bug-2026-05-16.md # render next to source
639
- /sp-md-render <file.md> --out report.html # custom output path
640
- /sp-md-render docs/notes/ # list + prompt
641
- /sp-md-render # prompt for path
642
- ```
643
-
644
- **When to use:** Any non-spec markdown you want as a scannable, shareable single HTML file. It refuses spec files (heading `### S-NNN:`) and points you to `/sp-spec-render` instead.
645
-
646
- **How it works:** Reads source + `template.html` + `components.md`, then uses an *analyzer pattern* (not fixed parsing) — each markdown chunk is mapped to the best component: numbered actions → step cards, GFM admonitions → callouts, ` ```mermaid ` → diagrams, pros/cons → compare cards, long appendices → collapsible. Builds the buffer in-memory, writes once.
647
-
648
- **Output features:** sidebar TOC + scroll-spy + search, anchored headings with copy-link, code blocks with copy button + language label, Mermaid diagrams (CDN), 4-variant callouts (note/tip/warn/danger), step cards, compare cards, task lists, footnotes, figure+caption, dark/light/auto theme, scroll progress bar, mobile drawer, print stylesheet. Self-contained (only Mermaid loads from CDN).
649
-
650
- **Token cost:** 3–8k (template + components cached; output ≈ source markdown × 1.2 — no CSS/JS in output token stream).
651
-
652
- ### /sp-challenge — Adversarial Plan Review
653
-
654
- **Usage:**
655
- ```
656
- /sp-challenge docs/specs/auth/auth.md # challenge a spec
657
- /sp-challenge "user authentication" # challenge by feature name
658
- ```
659
-
660
- **How it works (7 phases):**
661
-
662
- 1. **Read & Map** — Reads the spec (including acceptance scenarios) and maps: decisions made, assumptions (stated AND implied), dependencies, scope boundaries, risk acknowledgments, story-AS consistency.
663
- 2. **Scale Reviewers** — Assesses complexity and selects reviewers:
664
-
665
- | Complexity | Signals | Reviewers |
666
- |------------|---------|-----------|
667
- | Simple | 1 spec section, <20 acceptance scenarios, no auth/data | 2 |
668
- | Standard | Multiple sections, auth or data involved | 3 |
669
- | Complex | Multiple integrations, concurrency, migrations, 6+ phases | 4 |
670
-
671
- 3. **Spawn Reviewers** — Launches parallel subagents, each with an adversarial lens:
672
-
673
- - **Security Adversary**
674
- - OWASP Top 10
675
- - Injection vectors
676
- - Auth/authz bypass
677
- - Crypto issues
678
- - Data exposure
679
- - Supply chain risks
680
-
681
- - **Failure Mode Analyst** — *"Everything that can go wrong, will — simultaneously, at 3 AM, during peak traffic"*
682
- - Partial failures
683
- - Concurrency & race conditions
684
- - Cascading failures
685
- - Recovery paths
686
- - Idempotency
687
- - Observability gaps
688
-
689
- - **Assumption Destroyer** — *"'It should work' is not evidence"*
690
- - Unverified claims
691
- - Scale assumptions
692
- - Environment differences
693
- - Integration contracts
694
- - Data shape assumptions
695
- - Timing dependencies
696
- - Hidden dependencies
697
-
698
- - **Scope & YAGNI Critic** — *"The best code is no code. The best feature is the one you didn't build"*
699
- - Over-engineering
700
- - Premature abstraction
701
- - Missing MVP cuts
702
- - Gold plating
703
- - Simpler alternatives
704
-
705
- 4. **Deduplicate & Rate** — Collects all findings, removes duplicates, rates severity using a Likelihood x Impact matrix. Caps at 15 findings: keeps all Critical, top High by specificity, notes how many Medium were dropped. Each reviewer is limited to top 7 findings.
706
-
707
- 5. **Adjudicate** — Evaluates each finding: Accept (valid flaw, plan should change) or Reject (false positive, acceptable risk, already handled). 1-sentence rationale for each.
708
-
709
- 6. **User Choice** — Two modes: "Apply all accepted" (fast) or "Review each" (walk through one by one).
710
-
711
- 7. **Apply** — Surgical edits only to accepted findings. Doesn't rewrite surrounding sections.
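
The capping rule in step 4 can be sketched roughly as follows. This is an illustration only, not the skill's actual logic: the 1-3 likelihood and impact scales, the score thresholds in `rate`, the `specificity` field, and the function names are all assumptions made for the example. The source only states that severity comes from a Likelihood x Impact matrix and that the cap is 15.

```javascript
// Hypothetical sketch of "Deduplicate & Rate" capping. Scales and thresholds
// below are assumptions; the README does not specify the matrix cells.
function rate(likelihood, impact) {
  const score = likelihood * impact; // both assumed to be 1 (low) .. 3 (high)
  if (score >= 6) return "Critical";
  if (score >= 3) return "High";
  return "Medium";
}

function capFindings(findings, cap = 15) {
  const rated = findings.map((f) => ({ ...f, severity: rate(f.likelihood, f.impact) }));
  const critical = rated.filter((f) => f.severity === "Critical");
  // High findings compete for the remaining slots, most specific first.
  const high = rated
    .filter((f) => f.severity === "High")
    .sort((a, b) => b.specificity - a.specificity);
  const medium = rated.filter((f) => f.severity === "Medium");

  const kept = [...critical]; // all Critical findings are kept, even past the cap
  for (const finding of [...high, ...medium]) {
    if (kept.length < cap) kept.push(finding);
  }
  const keptSet = new Set(kept);
  const droppedMedium = medium.filter((f) => !keptSet.has(f)).length;
  return { kept, droppedMedium };
}
```

With 2 Critical, 10 High, and 8 Medium findings, this sketch keeps 15 and reports 5 dropped Medium findings.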
712
-
713
- **Finding format:** Each finding includes Title, Severity, **Confidence score** (9-10 = verified; 7-8 = strong match; 5-6 = note caveat; ≤4 = omit unless Critical), Location, Flaw description, Evidence (direct quote from the plan), step-by-step Failure scenario, and Suggested fix.
714
-
715
- **6 non-negotiable rules:**
716
- 1. Spawn reviewers in parallel (not sequential)
717
- 2. Reviewers read files directly, not summarized content
718
- 3. Be hostile — no praise, no softening
719
- 4. Every finding must quote the plan directly as evidence
720
- 5. Quality over quantity — 3 honest findings > 15 padded ones
721
- 6. Skip style/formatting — substance only
722
-
723
- **When to use:**
724
- - After `/sp-plan`, before coding — for complex features
725
- - Features involving auth, payments, data pipelines, multi-service integration
726
- - NOT needed for simple CRUD, small bug fixes, or trivial features
727
-
728
- **Token cost:** 15-30k (uses parallel subagents, doesn't bloat main context)
180
+ ## CLI reference
729
181
 
730
- ### /sp-build — TDD Delivery Loop
731
-
732
- **Usage:**
733
- ```
734
- /sp-build # build all changes vs base branch
735
- /sp-build src/api/users.ts # build specific file
736
- /sp-build "user authentication" # build specific feature
737
- ```
738
-
739
- **How it works:**
740
-
741
- 1. **Phase 0: Build Context** — Finds changed files vs base branch, reads the spec (acceptance scenarios in `## Stories` section are the roadmap), checks `docs/specs/<feature>/.build-progress` to resume from a previous interrupted session, reads existing tests for patterns, fixtures, and naming conventions. Doesn't duplicate what already exists.
742
- 2. **Phase 1: Decide What to Test** — Determines test scope from acceptance scenarios. Applies the **Completeness Principle**: AI writes tests ~50x faster than humans, so if full coverage costs `CC: ≤15m`, it writes complete tests without asking. Always checks 8 mandatory edge case categories: null/undefined, empty arrays/strings, invalid types, boundary values (min/max), error paths (network failures, DB errors), race conditions, large data (10k+ items), and special characters (Unicode, SQL chars).
743
- 3. **Phase 1.5: Coverage Map** — Before writing a single test, traces every code path (if/else, switch, guard, try/catch) AND user flows (double-click, stale session, navigate away mid-op). Draws an ASCII diagram marking each path as `[★★★ TESTED]`, `[★★ TESTED]`, `[★ TESTED]`, or `[GAP]`. Gaps marked `[GAP] [→E2E]` need E2E tests; `[GAP] [→EVAL]` need evals — when flagged, defines capability + regression evals before implementing and reports pass@1/pass@3. **Regression rule:** if the diff changes existing behavior with no covering test, a regression test is a CRITICAL requirement — no asking, no skipping.
744
- 4. **Phase 2: Write Tests** — Writes tests for every `[GAP]` identified in the Coverage Map. Before moving to Phase 3, verifies: all public functions have unit tests, all API endpoints have integration tests, edge cases covered, error paths tested, tests independent, assertions specific.
745
- 5. **Phase 3: Build and Run** — Compiles/typechecks first, then runs tests.
746
- 6. **Phase 4: Fix Loop** — If tests fail, fixes **test code only** (max 3 attempts, then hard stop and report). If tests expect X but code does Y, asks whether to fix production code or adjust the test — with effort scales `(human: ~X / CC: ~Y)`.
747
- 7. **Phase 5: Report** — Summary with test counts, results, coverage, files touched, and any E2E/eval gaps to follow up on.
748
-
749
- **Rules:**
750
- - Never changes production code without asking first
751
- - Never deletes or weakens existing tests
752
- - Never adds `skip`/`xit`/`@disabled` to hide failures
753
- - Max 3 fix attempts — then stops and reports the issue
754
-
755
- **What NOT to test:** Private/internal methods, framework behavior, trivial getters/setters, implementation details.
756
-
757
- ### /sp-investigate — Read-Only Root Cause Investigation (Optional)
758
-
759
- **Usage:**
760
- ```
761
- /sp-investigate "production 500s after deploy on /api/orders"
762
- /sp-investigate "intermittent data corruption in nightly sync"
763
- ```
764
-
765
- **When to use:** OPTIONAL branch before `/sp-fix`. Use for complex bugs, production outages, data corruption, unclear regressions, or when the user wants a diagnosis report without any code change. Skip for trivial/obvious bugs — go straight to `/sp-fix`.
766
-
767
- **What it does NOT do:** Never edits source code, tests, or config. The only write it performs is the investigation report at `docs/investigate/<slug>-<date>.md`.
768
-
769
- **How it works (adaptive depth, auto-scales):**
770
-
771
- 1. **Phase 1: Understand the Report** — Extract symptom, expected, actual from `$ARGUMENTS`. Asks ONE clarifying question via AskUserQuestion if required fields are missing.
772
- 2. **Phase 2: Locate** — Entry-point search (error/stack/function/feature), recurring-bug check (3+ fix commits on same pattern → architectural smell), data-flow trace, git history (regression signal).
773
- 3. **Phase 3: Pattern Match** — 12 known bug patterns (nil propagation, race, state corruption, off-by-one, type coercion, stale cache, config drift, silent error swallow, ordering/timing, resource leak, merge conflict, API contract). Skipped if Phase 2 already produced a HIGH-confidence hypothesis.
774
- 4. **Phase 4: Form Hypothesis** — Specific, testable, falsifiable. Location + mechanism + causal chain + disproof condition + confidence (HIGH/MEDIUM/LOW). 3-strike rule: if 3 hypotheses all stay below MEDIUM → escalate via AskUserQuestion.
775
- 5. **Phase 5: Map Blast Radius** — Investigation scope, bug path diagram (skipped if ISOLATED), impact scope (direct/indirect/data/user-facing), similar-risk scan (5-min timebox).
776
- 6. **Phase 6: Recommend Next Steps** — CRITICAL/HIGH/MEDIUM actions, test strategy, fix approach (minimal / targeted refactor / architectural).
777
- 7. **Output** — Writes structured Investigation Report to `docs/investigate/<slug>-<date>.md`. Signals `/sp-fix <file>` for handoff.
778
-
779
- **Status values:** `ROOT_CAUSE_FOUND | PROBABLE_CAUSE | INSUFFICIENT_EVIDENCE | BLOCKED`
780
-
781
- **Iron Law:** Follow evidence, never start with a theory. Every claim references file:line or git commit. INSUFFICIENT_EVIDENCE is a valid outcome — don't inflate confidence to ship a report.
782
-
783
- **Token cost:** 8–15k
784
-
785
- ---
786
-
787
- ### /sp-fix — Test-First Bug Fix
788
-
789
- **Usage:**
790
- ```
791
- /sp-fix "description of the bug"
792
- ```
793
-
794
- **How it works:**
795
-
796
- 1. **Phase 0: Investigate** — Parses the bug report, locates relevant code, checks git history, and forms a root cause hypothesis. Then draws a **Bug Path Diagram** (same `[GAP]`/`[★★ TESTED]` format as `/sp-build`) for the buggy function — if no specific `[GAP]` path can be identified, the hypothesis isn't specific enough yet.
797
- 2. **Phase 1: Write Failing Test** — **Regression rule first:** if the bug exists because the diff changed existing behavior with no test covering that path, a regression test is a CRITICAL requirement. Creates a test that reproduces the bug and **MUST fail** with current code.
798
- 3. **Phase 2: Fix** — Minimal change only. Blast radius check: if fix touches >5 files, stops and asks before editing.
799
- 4. **Phase 3: Verify** — Bug test must pass; full suite must show no new regressions.
800
- 5. **Phase 4: Root Cause Analysis** — Documents: Symptom, Root cause, Gap (why wasn't this caught earlier?), Prevention (one of: type constraint, validation, lint rule, spec update). Non-optional for serious bugs.
801
- 6. **Phase 5: Report** — Structured debug report with hypothesis, fix, evidence, and regression test reference.
802
-
803
- **Multiple bugs:** Triages by severity, fixes one at a time, commits each separately.
804
-
805
- ### /sp-review — Pre-Merge Quality Gate
806
-
807
- **Usage:**
808
- ```
809
- /sp-review # review all changes vs base branch
810
- /sp-review src/auth/ # review specific directory
811
- ```
812
-
813
- **How it works:**
814
-
815
- 1. **Phase 0: Understand Intent** — Reads commit messages, checks for related spec, expands blast radius. Also notes **what already exists**: flags if the diff rebuilds something that already exists in the codebase.
816
- 2. **Phase 1: Smart Focus** — Auto-detects what to focus on based on the diff (auth → security, SQL → injection, payments → idempotency, etc.). Spends 60% of analysis on the primary focus.
817
- 3. **Phase 2: Review** — Security, correctness, **API/Backend patterns** (unvalidated input, missing rate limiting, missing timeouts, missing CORS, error message leakage), spec-test alignment, code quality (including **diagram maintenance**: stale ASCII diagrams in comments are flagged), performance, a **Failure Mode Grid** for each new codepath (3 dimensions: test covers it? error handling exists? user sees a clear error or silent failure? — all 3 missing = Critical gap), and an **AI-generated code addendum** when reviewing AI-written changes (behavioral regressions, trust boundaries, architecture drift, model cost escalation).
818
- 4. **Phase 3: Report** — Structured report. Every finding includes a **confidence score** `(confidence: N/10)`: 9-10 = verified in code; 7-8 = strong pattern match; 5-6 = possible false positive; <5 = appendix only. Includes a **"Not in scope"** section listing deferred work with rationale.
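
The confidence bands in Phase 3 map naturally onto a small lookup. The sketch below is illustrative: the function names and the shape of a finding object (`{ confidence }`) are assumptions, while the band boundaries come from the list above.

```javascript
// Illustrative mapping from a finding's confidence score (0-10) to the
// handling described in Phase 3. Names and object shape are hypothetical.
function classifyConfidence(score) {
  if (!Number.isInteger(score) || score < 0 || score > 10) {
    throw new RangeError(`confidence must be an integer from 0 to 10, got ${score}`);
  }
  if (score >= 9) return "verified in code";
  if (score >= 7) return "strong pattern match";
  if (score >= 5) return "possible false positive";
  return "appendix only";
}

// Split findings into the main report body and the appendix (<5).
function splitReport(findings) {
  return {
    main: findings.filter((f) => f.confidence >= 5),
    appendix: findings.filter((f) => f.confidence < 5),
  };
}
```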
819
-
820
- **Proportional review:** A 5-line doc change gets a light review. A 500-line auth rewrite gets file-by-file deep analysis.
821
-
822
- **Verdicts:** APPROVE / REQUEST CHANGES / NEEDS DISCUSSION.
823
-
824
- **Rules:**
825
- - At least 1 positive note — reinforces good patterns, not just problems
826
- - Never auto-fixes code — report only
827
- - Checks spec-test alignment: code changed → spec/acceptance scenarios/tests also changed?
828
-
829
- ### /sp-commit — Smart Git Commit
830
-
831
- **Usage:**
832
- ```
833
- /sp-commit
834
- ```
835
-
836
- **How it works:**
837
-
838
- 1. **Analyze** — Scans `git status`, diff stats, and file contents in one pass.
839
- 2. **Scan for secrets** — Matches patterns: `api_key`, `token`, `password`, `secret`, `private_key`, `credential`, `auth_token`. **Hard block** — stops immediately if found, non-negotiable.
840
- 3. **Scan for debug code** — Matches: `console.log`, `debugger`, `print()`, `TODO:remove`, `HACK:`, `FIXME:temp`, `binding.pry`, `var_dump`. **Soft warn** — proceeds if you confirm.
841
- 4. **Stage files** — Stages specific files by name. Never uses `git add -A`.
842
- 5. **Generate message** — Conventional format: `type(scope): description`. Imperative tense ("add" not "added"), no period, WHAT+WHY not HOW.
843
- 6. **Commit** — Does NOT push (safe default). Ask Claude explicitly to push.
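
Steps 2 and 3 amount to two pattern scans with different consequences. A minimal sketch, assuming plain case-insensitive substring matching on added lines (the skill's real matching rules are not shown here, and `scanDiff` is a hypothetical name):

```javascript
// Hypothetical sketch of the secret (hard block) and debug-code (soft warn)
// scans. Substring matching is an assumption and is deliberately naive:
// it will also flag harmless identifiers such as "tokenize".
const SECRET_MARKERS = ["api_key", "token", "password", "secret", "private_key", "credential", "auth_token"];
const DEBUG_MARKERS = ["console.log", "debugger", "print(", "TODO:remove", "HACK:", "FIXME:temp", "binding.pry", "var_dump"];

function scanDiff(addedLines) {
  const secrets = [];
  const debug = [];
  addedLines.forEach((text, index) => {
    const lower = text.toLowerCase();
    if (SECRET_MARKERS.some((marker) => lower.includes(marker))) {
      secrets.push({ line: index + 1, text });
    }
    if (DEBUG_MARKERS.some((marker) => text.includes(marker))) {
      debug.push({ line: index + 1, text });
    }
  });
  // A secret hit always wins: "block" stops the commit, "warn" asks first.
  const verdict = secrets.length > 0 ? "block" : debug.length > 0 ? "warn" : "ok";
  return { verdict, secrets, debug };
}
```

A caller would stop on `"block"`, ask for confirmation on `"warn"`, and proceed to staging on `"ok"`.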
844
-
845
- **Large diff warning:** If >10 files OR >300 lines changed, suggests splitting into smaller commits for easier review.
846
-
847
- **Never stages:** `.env`, credentials, build artifacts, generated files, binaries >1MB.
848
-
849
- **Breaking changes:** If the diff removes/renames a public function, export, or API endpoint, uses `feat!` or `fix!` type, or adds a `BREAKING CHANGE:` footer.
850
-
851
- ### /sp-voices — Multi-LLM Review (Optional)
852
-
853
- **Usage:**
854
- ```
855
- /sp-voices # review current diff with multi-LLM panel
856
- /sp-voices docs/specs/auth/auth.md # review a spec
857
- /sp-voices src/payment/ # review specific files
858
- ```
859
-
860
- **When to use:** Optional second opinion *after* `/sp-review` for high-stakes changes (auth, payment, data pipelines), when `/sp-review` returns mixed-confidence findings (most at 5–7), or any time you want cross-model verification before merge. Skip for routine refactors and small CRUD.
861
-
862
- **How it works:**
863
-
864
- 1. **Detect available LLMs** — Checks for OpenAI / Codex CLI / Gemini / Perplexity / Anthropic API / Ollama in priority order. Falls back to a self-spawned Claude sub-agent if no external LLM is available, with the limitation flagged in the report.
865
- 2. **Construct open-ended review prompts** — Same material to every voice with a light bias nudge (correctness / security / design). No structured templates, no severity scale forced on reviewers — they think freely; *we* structure the synthesis.
866
- 3. **Call voices in parallel** — 2–3 voices typically; temperature 0.3; graceful degradation if any voice fails.
867
- 4. **Synthesize** — Parses free-form responses into findings, classifies severity/category ourselves, identifies CONSENSUS (2+ voices agree → REINFORCED), UNIQUE findings (single voice → flag for verification), and DISAGREEMENTS (voices contradict → present both sides; tiebreaker for HIGH+).
868
- 5. **Output report** — Critical/High findings, disagreements, voice breakdown table, agreement rate (100% may indicate shared blind spot), blind spots (categories with 0 findings).
869
-
870
- **Decision points** (all use `AskUserQuestion`): review type ambiguous, voice panel size for large reviews, voice unavailable, critical consensus finding, disagreement resolution, follow-up cost > $0.10, report destination.
871
-
872
- **Rules:** Same material, different lenses. Don't resolve disagreements: present both sides, human decides. Consensus ≠ correct (flag if agreement rate is 100%). Findings must be specific (`auth.ts:47` not "code could be improved").
873
-
874
- **Token cost:** 10–30k host + external API cost (Budget: ~$0.01–0.05; Standard: ~$0.05–0.20; Premium: ~$0.20–0.50 per review).
875
-
876
- ---
877
-
878
- ### /sp-humanize — Rephrase to Human Voice
879
-
880
- **Usage:**
881
- ```
882
- /sp-humanize <paste plan/notes/draft> # infer format + audience from context
883
- /sp-humanize reply jira <notes> # target a specific format
884
- /sp-humanize draft a customer email <notes> # switch audience, hide implementation
885
- ```
886
-
887
- **When to use:** You have a plan, bullet notes, or AI-generated draft and want it rewritten into natural, send-ready text — a PR description, release note, slack announcement, postmortem, customer reply, LinkedIn post, or plain email. Not part of the spec-first dev cycle. Skip for pure translation, summarization, or generating content from zero.
888
-
889
- **How it works:**
890
-
891
- 1. **Infer target format** — From explicit instruction → session context → input shape → fallback to tight plain text. No fixed whitelist; uncommon or hybrid formats follow their own conventions.
892
- 2. **Infer audience** — Engineering, customer, executive, public, or mixed. Same content, phrasing shifts by reader (technical terms for engineers, outcome-focused for customers).
893
- 3. **Preserve facts** — Numbers, names, error codes, file paths, commands, URLs, commitments, and decisions are never paraphrased. Certainty is never softened ("will ship Monday" ≠ "hope to ship Monday").
894
- 4. **Strip AI tone** — Removes em-dash overuse, banned buzzwords (EN + VI), hollow openings/closings, fake enthusiasm, and "rule of three" pile-ups. Varies sentence rhythm.
895
- 5. **Return send-ready text** — The final version directly, no preamble, no explanation of edits.
896
-
897
- **Language:** Follows the session's dominant language. Mixed Vietnamese-English is fine — technical terms stay untranslated.
898
-
899
- **Token cost:** 2–6k, no external API.
900
-
901
- ---
902
-
903
- ## 6. Automatic Guards (Hooks)
904
-
905
- Hooks run automatically — you don't invoke them. They provide passive protection.
906
-
907
- ### File Guard (`file-guard.js`)
908
-
909
- **Trigger:** After every Write or Edit operation.
910
- **Action:** If a modified **source code file** exceeds 350 lines, injects a warning suggesting modularization. Docs, configs, and templates are intentionally excluded — they are naturally long.
911
- **Blocking:** No — warns only, does not prevent the edit.
912
-
913
- **Checked extensions:** `.ts`, `.tsx`, `.js`, `.jsx`, `.py`, `.php`, `.rb`, `.rs`, `.go`, `.swift`, `.kt`, `.java`, `.cs`, `.cpp`, `.c`, `.dart`, `.vue`, `.svelte`, `.astro`, and more.
914
- **Not checked:** `.md`, `.json`, `.yaml`, `.toml`, `.html`, `.css`, `.sh`, and other non-source files.
915
-
916
- **Configuration:**
917
- ```bash
918
- # Change the line threshold (default: 350)
919
- export FILE_GUARD_THRESHOLD=500
920
-
921
- # Exclude files from checking (comma-separated globs)
922
- export FILE_GUARD_EXCLUDE="*.generated.swift,*.pb.go,*.min.js"
923
- ```
924
-
925
- ### Path Guard (`path-guard.sh`)
926
-
927
- **Trigger:** Before every Bash command.
928
- **Action:** Blocks commands that reference large directories (node_modules, build artifacts, etc.).
929
- **Blocking:** Yes — prevents the command from running.
930
-
931
- **Default blocked paths:**
932
- `node_modules`, `__pycache__`, `.git/objects`, `dist/`, `build/`, `.next/`, `vendor/`, `Pods/`, `.build/`, `DerivedData/`, `.gradle/`, `target/debug`, `target/release`, `.nuget`, `.cache`
933
-
934
- **Configuration:**
935
182
  ```bash
936
- # Add project-specific blocked paths (pipe-separated)
937
- export PATH_GUARD_EXTRA="\.terraform|\.vagrant|\.docker"
938
- ```
939
-
940
- ### Glob Guard (`glob-guard.js`)
941
-
942
- **Trigger:** Before every Glob (file search) operation.
943
- **Action:** Blocks overly broad glob patterns at project root that would return thousands of files and fill the context window.
944
- **Blocking:** Yes — prevents the glob and suggests scoped alternatives.
945
-
946
- **What it blocks:**
947
- - `**/*.ts` at project root (use `src/**/*.ts` instead)
948
- - `**/*` at project root (use `src/**/*` instead)
949
- - `*` or `**` at project root
950
- - Any recursive glob without a specific directory prefix
951
-
952
- **What it allows:**
953
- - `src/**/*.ts` — scoped to a specific directory
954
- - `tests/**/*.test.js` — scoped to tests
955
- - `**/*.ts` when run from inside a scoped directory (e.g., `path: "src"`)
956
-
957
- ### Comment Guard (`comment-guard.js`)
958
-
959
- **Trigger:** After every Edit operation.
960
- **Action:** Detects when real code is replaced with placeholder comments like `// ... existing code ...` or `// rest of implementation`. This is a common LLM laziness pattern.
961
- **Blocking:** Yes — rejects the edit and tells Claude to preserve the original code.
183
+ npx specpipe init . --agents cursor,codex # install for specific agents (a list, or `all`)
184
+ npx specpipe init . --skills core # skip optional render/humanize skills (or a comma list)
185
+ npx specpipe init . --hooks none # skills only, no guardrails (or --hooks shell,read)
186
+ npx specpipe init --global --agents claude,codex # install skills globally for chosen agents
962
187
 
963
- **What it catches:**
964
- - `// ... existing code ...`, `// ... rest of implementation`
965
- - `// [previous code remains]`, `// unchanged`
966
- - `/* ... */` replacing real code
967
- - `# ... existing ...` (Python placeholders)
968
- - `// TODO: implement` replacing real code
969
- - Any edit where real code is replaced with a much shorter comment-only block
970
-
971
- **What it allows:**
972
- - Editing comments (old content was already comments)
973
- - Adding comments alongside code (new content has both)
974
- - Normal code replacements
975
-
976
- ### Sensitive Guard (`sensitive-guard.sh`)
977
-
978
- **Trigger:** Before every Read, Write, Edit, and Bash command.
979
- **Action:** Protects files containing secrets: `.env`, private keys, credentials, tokens.
980
- **Blocking:** Read/Write/Edit → **blocks** (exit 2). Bash commands → **warns only** (allows access).
981
-
982
- The Bash warn-only behavior enables an approval flow: Claude asks the user for permission, and if approved, can use `bash cat .env` to read the file.
983
-
984
- **Protected files:**
985
- - `.env`, `.env.local`, `.env.production`, etc. (but NOT `.env.example`)
986
- - Private keys: `*.pem`, `*.key`, `*.p12`, `*.pfx`, `*.jks`
987
- - SSH keys: `id_rsa`, `id_ecdsa`, `id_ed25519`
988
- - Cloud credentials: `serviceAccountKey.json`, `firebase-adminsdk*`
989
- - Token files: `.npmrc`, `.pypirc`, `.netrc`
990
- - Any file matching `*credential*`, `*secret*`, `*private_key*`
991
-
992
- **Supports `.agentignore`:** Create a `.agentignore` file (or `.aiignore`, `.cursorignore`) in the project root with gitignore-style patterns to add project-specific protections.
993
-
994
- **Configuration:**
995
- ```bash
996
- # Add extra patterns (pipe-separated regex)
997
- export SENSITIVE_GUARD_EXTRA="\.vault|.*_token\.json"
188
+ npx specpipe check | diff | list # update available? · what changed? · installed status
189
+ npx specpipe upgrade # smart upgrade, preserves files you customized (--force overwrites)
190
+ npx specpipe remove [--agents <list>] [--dry-run] # uninstall (keeps your CLAUDE.md content + docs/)
998
191
  ```
999
192
 
1000
- ### Self-Review (`self-review.sh`)
1001
-
1002
- **Trigger:** When Claude is about to stop (Stop event).
1003
- **Action:** Injects a self-review checklist reminding Claude to verify quality before finishing.
1004
- **Blocking:** No — just a reminder.
193
+ **Requirements:** a supported agent CLI, Git, Node.js 18+, Bash 4+, and your project's own toolchain. No dependencies are added to your project.
1005
194
 
1006
- **Questions asked:**
1007
- 1. Did you leave any TODO/FIXME that should be resolved now?
1008
- 2. Did you create mock/fake implementations just to pass tests?
1009
- 3. Did you replace real code with placeholder comments?
1010
- 4. Do all changed files compile and typecheck cleanly?
1011
- 5. Did you run the full test suite, not just the new tests?
1012
- 6. Are there any files you modified but forgot to include in the summary?
1013
-
1014
- **Configuration:**
1015
- ```bash
1016
- # Disable self-review
1017
- export SELF_REVIEW_ENABLED=false
1018
- ```
195
+ **Global install** puts each agent's skills in its user-level dir (`~/.claude/skills/`, `~/.codex/skills/`, `~/.cursor/skills/`, `~/.gemini/antigravity-cli/skills/`, `~/.openclaw/skills/`, `~/.hermes/skills/`), so every project is covered. Per-project skills take precedence; Hermes is global-only; global hooks are Claude-only. The lifecycle remembers your skill selection — `upgrade` won't resurrect ones you deselected.
1019
196
 
1020
- ### Testing Hooks Manually
1021
-
1022
- You can test hooks by piping mock JSON payloads:
1023
-
1024
- ```bash
1025
- # ── Path Guard ──
1026
- # Should exit 2 (blocked)
1027
- echo '{"tool_input":{"command":"ls node_modules"}}' | bash .claude/hooks/path-guard.sh
1028
- echo $? # expect: 2
1029
-
1030
- # Should exit 0 (allowed)
1031
- echo '{"tool_input":{"command":"ls src"}}' | bash .claude/hooks/path-guard.sh
1032
- echo $? # expect: 0
1033
-
1034
- # ── File Guard ──
1035
- seq 1 250 > /tmp/test-large.txt
1036
- echo '{"tool_input":{"file_path":"/tmp/test-large.txt"}}' | node .claude/hooks/file-guard.js
1037
- # Should output JSON with additionalContext warning
1038
-
1039
- # ── Comment Guard ──
1040
- # Should exit 2 (blocked — replacing code with placeholder)
1041
- echo '{"tool_input":{"old_string":"function hello() {\n return world;\n}","new_string":"// ... existing code ..."}}' | node .claude/hooks/comment-guard.js
1042
- echo $? # expect: 2
1043
-
1044
- # Should exit 0 (allowed — replacing code with code)
1045
- echo '{"tool_input":{"old_string":"return a;","new_string":"return b;"}}' | node .claude/hooks/comment-guard.js
1046
- echo $? # expect: 0
1047
-
1048
- # ── Sensitive Guard ──
1049
- # Should exit 2 (blocked)
1050
- echo '{"tool_input":{"file_path":".env"}}' | bash .claude/hooks/sensitive-guard.sh
1051
- echo $? # expect: 2
1052
-
1053
- # Should exit 0 (allowed)
1054
- echo '{"tool_input":{"file_path":".env.example"}}' | bash .claude/hooks/sensitive-guard.sh
1055
- echo $? # expect: 0
1056
-
1057
- # Should exit 0 (warn only — bash commands are allowed for approved access)
1058
- echo '{"tool_input":{"command":"cat .env.local"}}' | bash .claude/hooks/sensitive-guard.sh
1059
- echo $? # expect: 0 (with warning on stderr)
1060
-
1061
- # ── Glob Guard ──
1062
- # Should exit 2 (blocked — broad pattern at root)
1063
- echo '{"tool_input":{"pattern":"**/*.ts"}}' | node .claude/hooks/glob-guard.js
1064
- echo $? # expect: 2
1065
-
1066
- # Should exit 0 (allowed — scoped pattern)
1067
- echo '{"tool_input":{"pattern":"src/**/*.ts"}}' | node .claude/hooks/glob-guard.js
1068
- echo $? # expect: 0
1069
- ```
197
+ After install, check the **Project Info** in `.claude/CLAUDE.md` and fix anything auto-detection missed. Per-skill behavior is tunable via env vars — see [docs/customization.md](docs/customization.md).
1070
198
 
1071
199
  ---
1072
200
 
1073
- ## 7. Spec Format
1074
-
1075
- ### Spec Template
1076
-
1077
- Create specs at `docs/specs/<feature>/<feature>.md`:
1078
-
1079
- ```markdown
1080
- # Spec: <Feature Name>
1081
-
1082
- **Created:** 2026-04-02
1083
- **Last updated:** 2026-04-02
1084
- **Status:** Draft | Active | Deprecated
1085
-
1086
- ## Overview
1087
- What this feature does, why it exists, who uses it. 2-3 sentences.
1088
-
1089
- ## Data Model
1090
- Entities, attributes, relationships (if applicable).
1091
-
1092
- ## Stories
1093
-
1094
- ### S-001: <Story name> (P0)
1095
-
1096
- **Description:** [user story]
1097
- **Source:** [optional: ticket/issue ref]
1098
-
1099
- **Acceptance Scenarios:**
1100
-
1101
- AS-001: <short description>
1102
- - **Given:** [state]
1103
- - **When:** [action]
1104
- - **Then:** [expected]
1105
- - **Data:** [test data]
1106
-
1107
- AS-002: <short description>
1108
- - **Given:** [error state]
1109
- - **When:** [action]
1110
- - **Then:** [error handling]
201
+ ## Docs
1111
202
 
1112
- ### S-002: <Story name> (P1)
1113
-
1114
- AS-003: <short description>
1115
- - **Given:** [state]
1116
- - **When:** [action]
1117
- - **Then:** [expected]
1118
-
1119
- ### S-003: <Story name> (P2)
1120
-
1121
- AS-004: <short description>
1122
- - [flow description + expected behavior]
1123
-
1124
- ## Constraints & Invariants
1125
- Rules that must always hold.
1126
-
1127
- ## Change Log
1128
-
1129
- | Date | Change | Ref |
1130
- |------|--------|-----|
1131
- | 2026-04-02 | Initial creation | -- |
1132
- ```
1133
-
1134
- Skip sections that don't apply. Match depth to feature complexity.
1135
-
1136
- **Acceptance Scenario depth by priority:**
1137
- - **P0:** Full Given + When + Then + Data + Setup. At least 1 happy path + 1 error path.
1138
- - **P1:** Given + When + Then. At least 1 happy path.
1139
- - **P2:** 1-2 line flow description. At least 1 scenario.
1140
-
1141
- ### Snapshots (Version History)
1142
-
1143
- When `/sp-plan` Mode C detects a Major change (new story, removed story, priority change, flow change, behavior change for P0, or constraint change), it automatically creates a snapshot before updating:
1144
-
1145
- ```
1146
- docs/specs/<feature>/snapshots/
1147
- 2026-04-02.md ← full copy at that point in time
1148
- 2026-04-05-BILL-101.md ← with ticket reference
1149
- ```
1150
-
1151
- Snapshots are immutable, managed by sp-plan (not developers), and capped at 5 most recent.
1152
-
1153
- ### Naming Conventions
1154
- | Item | Convention | Example |
1155
- |------|-----------|---------|
1156
- | Spec directory | `docs/specs/<feature>/` | `docs/specs/user-auth/` |
1157
- | Spec file | `<feature>.md` in feature directory | `user-auth.md` |
1158
- | Story ID | `S-NNN` sequential per spec | `S-001`, `S-005` |
1159
- | Scenario ID | `AS-NNN` sequential across all stories | `AS-001`, `AS-042` |
1160
- | Priority | `P0` (critical), `P1` (important), `P2` (nice-to-have) — per story | — |
1161
- | Snapshot | `YYYY-MM-DD.md` or `YYYY-MM-DD-<REF>.md` in `snapshots/` | `2026-04-02.md` |
203
+ | Doc | What's in it |
204
+ |-----|--------------|
205
+ | [multi-agent.md](docs/multi-agent.md) | How one skill emits into every agent's native format (verified path/format matrix) |
206
+ | [commands.md](docs/commands.md) | Full per-skill reference — phases, rules, outputs, token cost |
207
+ | [hooks.md](docs/hooks.md) | The guards — triggers, what each blocks, config, manual testing |
208
+ | [spec-format.md](docs/spec-format.md) | Spec template, AS depth by priority, snapshots, naming |
209
+ | [customization.md](docs/customization.md) | Environment variables, extending rules, custom skills |
210
+ | [troubleshooting.md](docs/troubleshooting.md) · [faq.md](docs/faq.md) | Hooks not firing, tests not detected, specs for tiny changes, … |
211
+ | [architecture.md](docs/architecture.md) · [adding-an-agent.md](docs/adding-an-agent.md) | CLI internals; how to add a new agent |
1162
212
 
1163
213
  ---
1164
214
 
1165
- ## 8. Customization
1166
-
1167
- ### Environment Variables
1168
-
1169
- | Variable | Default | Description |
1170
- |----------|---------|-------------|
1171
- | `FILE_GUARD_THRESHOLD` | `200` | Max lines before file guard warns |
1172
- | `FILE_GUARD_EXCLUDE` | _(empty)_ | Comma-separated globs to skip (e.g. `*.generated.swift`) |
1173
- | `PATH_GUARD_EXTRA` | _(empty)_ | Additional pipe-separated patterns to block (e.g. `\.terraform`) |
1174
- | `SENSITIVE_GUARD_EXTRA` | _(empty)_ | Additional pipe-separated patterns for sensitive files (e.g. `\.vault`) |
1175
- | `SELF_REVIEW_ENABLED` | `true` | Set to `false` to disable the self-review checklist on Stop |
1176
-
1177
- Set these in your shell profile or project `.envrc` (if using direnv).
1178
-
1179
- ### Extending CLAUDE.md
1180
-
1181
- Add project-specific rules to `.claude/CLAUDE.md`:
1182
-
1183
- ```markdown
1184
- ## Project-Specific Rules
1185
-
1186
- - All API endpoints must have OpenAPI annotations
1187
- - Database migrations must be reversible
1188
- - UI components must support dark mode
1189
- - All strings must be localized via i18n keys
1190
- ```
1191
-
1192
- ### Adding Custom Skills
1193
-
1194
- Create new skills in `.claude/skills/<name>/SKILL.md`:
1195
-
1196
- ```markdown
1197
- # .claude/skills/deploy/SKILL.md
1198
-
1199
- Run the deployment pipeline:
1200
- 1. /sp-review
1201
- 2. /sp-commit
1202
- 3. Run: bash scripts/deploy.sh $ARGUMENTS
1203
- 4. Verify deployment health: curl -f https://api.example.com/health
1204
- ```
1205
-
1206
- Then use: `/deploy staging`
1207
-
1208
- ---
1209
-
1210
- ## 9. Token Cost Guide
1211
-
1212
- | Activity | Tokens | Frequency |
1213
- |----------|--------|-----------|
1214
- | `/sp-scaffold` (greenfield bootstrap) | 15–40k + install/build time | Once per new project, before the first spec |
1215
- | `/sp-build` (incremental, 1-3 files) | 5–10k | Every code chunk |
1216
- | `/sp-investigate` (complex bug) | 8–15k | OPTIONAL before /sp-fix — complex/outage only |
1217
- | `/sp-fix` (single bug) | 3–5k | As needed |
1218
- | `/sp-commit` | 2–4k | Every commit |
1219
- | `/sp-review` (diff-based) | 10–20k | Before merge |
1220
- | `/sp-plan` (new feature) | 20–40k | Start of feature |
1221
- | `/sp-challenge` (adversarial review) | 15–30k | After /sp-plan, complex features |
1222
- | `/sp-spec-render` (HTML view) | 3–8k | User-invoked after /sp-plan when HTML view wanted, or to refresh stale `.html` |
1223
- | `/sp-md-render` (HTML view, any md) | 3–8k | User-invoked for non-spec markdown — investigation, explore, RFC, retro, README |
1224
- | `/sp-voices` (multi-LLM review) | 10–30k + external API cost (~$0.01–0.50) | Optional — after /sp-review for high-stakes changes |
1225
- | Full audit (manual prompt) | 100k+ | Before release |
1226
-
1227
- ### Minimizing Token Usage
1228
-
1229
- - **Test incrementally.** `/sp-build` after each small chunk uses 5-10k. Waiting until everything is done then running `/sp-build` on a large diff uses 50k+.
1230
- - **Use filters.** `/sp-build src/auth/login.ts` is cheaper than `/sp-build` on the whole project.
1231
- - **Skip `/sp-plan` for tiny changes.** Under 5 lines with no behavior change? Just `/sp-build` and `/sp-commit`.
1232
- - **Use `/sp-review` only before merge.** Not after every commit.
1233
-
1234
- ---
1235
-
1236
- ## 10. Troubleshooting
1237
-
1238
- ### Hook not firing
1239
-
1240
- **Symptom:** File guard or path guard doesn't trigger.
1241
-
1242
- **Check:**
1243
- 1. Is `settings.json` valid? `node -e "JSON.parse(require('fs').readFileSync('.claude/settings.json','utf-8'))"`
1244
- 2. Are hooks executable? `ls -la .claude/hooks/`
1245
- 3. Is Node.js available? `node --version`
1246
- 4. Is `$CLAUDE_PROJECT_DIR` set? Check in Claude Code with: `echo $CLAUDE_PROJECT_DIR`
1247
-
1248
- ### Tests not detected
1249
-
1250
- **Symptom:** `/sp-build` or `/sp-fix` can't figure out how to run the tests.
1251
-
1252
- **Check:**
1253
- 1. Are you in the project root? `pwd`
1254
- 2. Does the project marker file exist? (e.g., `package.json`, `Cargo.toml`, `pyproject.toml`)
1255
- 3. If your test command is non-standard, set it explicitly in `.claude/CLAUDE.md` under **Testing** so the skills use it.
1256
-
1257
- ### Wrong base branch
1258
-
1259
- **Symptom:** `/sp-build` or `/sp-review` compares against wrong branch.
1260
-
1261
- **Check:**
1262
- ```bash
1263
- git symbolic-ref refs/remotes/origin/HEAD
1264
- ```
1265
-
1266
- If this is wrong or missing:
1267
- ```bash
1268
- git remote set-head origin <your-main-branch>
1269
- ```
1270
-
1271
- ### Path guard blocking a legitimate command
1272
-
1273
- **Symptom:** Claude can't run a command you need.
1274
-
1275
- **Fix:** The path guard blocks broad patterns. If you need to access `build/` for a specific reason, run the command directly in your terminal (not through Claude Code).
1276
-
1277
- ### File guard warning on generated files
1278
-
1279
- **Fix:** Set the exclude pattern:
1280
- ```bash
1281
- export FILE_GUARD_EXCLUDE="*.generated.swift,*.pb.go,*.min.js,*.snap"
1282
- ```
1283
-
1284
- ---
1285
-
1286
- ## 11. FAQ
1287
-
1288
- **Q: Do I need specs for every tiny change?**
1289
- A: No. Changes under 5 lines with no behavior change can skip the spec. Just `/sp-build` and `/sp-commit`. The spec-first rule is for meaningful behavior changes.
1290
-
1291
- **Q: Can I use mocks in tests?**
1292
- A: Only for external services you can't run locally (third-party APIs, email services). Never mock your own code or database just to make tests pass faster.
1293
-
1294
- **Q: What if Claude writes a test that tests the wrong thing?**
1295
- A: This usually means the spec is ambiguous. Clarify the spec first, then re-run `/sp-build`. Good specs produce good tests.
1296
-
1297
- **Q: Can I use this with other AI coding tools?**
1298
- A: Yes. `specpipe init --agents <list>|all` installs the skills for Codex, Cursor, Antigravity, OpenClaw, and Hermes, each in its native format. Guards are hook-*enforced* for Claude, Codex, and Cursor (`.codex/hooks.json` / `.cursor/hooks.json` can block tool calls); Antigravity, OpenClaw, and Hermes get them as always-on advisory rules. The specs and workflow are tool-agnostic. See [docs/multi-agent.md](docs/multi-agent.md).
1299
-
1300
- **Q: When should I use `/sp-challenge`?**
1301
- A: After `/sp-plan`, for complex features involving authentication, payments, data pipelines, or multi-service integration. It spawns parallel hostile reviewers that find security holes, failure modes, and false assumptions BEFORE you write code. Skip it for simple CRUD or small features — the overhead isn't worth it.
1302
-
1303
- **Q: How do I do a full coverage audit?**
1304
- A: This is intentionally not a command (it's expensive and rare). When needed, prompt Claude directly: "Audit test coverage for feature X against docs/specs/X/X.md acceptance scenarios. Identify gaps and write missing tests."
1305
-
1306
- **Q: What if my project uses multiple languages?**
1307
- A: The skills auto-detect the test command from the first project marker they find. For monorepos, run `/sp-build` from each sub-project directory, or pin the test command per project in `.claude/CLAUDE.md` under **Testing**.
1308
-
1309
- **Q: Can I add more skills?**
1310
- A: Yes. Create a directory `.claude/skills/<name>/SKILL.md` and it becomes available as a slash command. See [Customization](#8-customization).
1311
-
1312
- **Q: How do I update the kit in existing projects?**
1313
- A: Run `npx specpipe upgrade`. It automatically detects which files you've customized and only updates unchanged files. Use `--force` to overwrite everything.
1314
-
1315
- **Q: What's the HTML view next to my spec, and how do I generate it?**
1316
- A: It's a scannable view of the spec — sidebar TOC, story cards, collapsible AS, dark/light theme. Reading a 1000-line spec markdown in an editor is painful; the HTML is what a tired human can actually skim. Generate or refresh it by running `/sp-spec-render <feature>` — `/sp-plan` does not create it automatically, it just suggests the command at the end. `.md` remains the source of truth (AI and `/sp-build` read it, git diffs work normally). `.html` is a regenerable artifact — never edit it by hand, let `/sp-spec-render` rebuild it. You can email/Slack the HTML to PMs/stakeholders who don't want to clone the repo.
1317
-
1318
- **Q: I installed with the old setup.sh — how do I migrate?**
1319
- A: Run `npx specpipe init --adopt .` to generate a manifest from your existing files without overwriting anything. Future upgrades will then work normally.
215
+ Issues and PRs welcome — see [CONTRIBUTING.md](CONTRIBUTING.md). Security: [SECURITY.md](SECURITY.md). License: [MIT](LICENSE) © Microvn