create-issflow 1.0.2 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (42)
  1. package/README.md +61 -56
  2. package/bin/cli.js +269 -259
  3. package/package.json +32 -28
  4. package/template/.claude/agents/debugger.md +47 -47
  5. package/template/.claude/agents/e2e-runner.md +66 -66
  6. package/template/.claude/agents/implementer.md +79 -75
  7. package/template/.claude/agents/planner.md +93 -71
  8. package/template/.claude/agents/researcher.md +103 -103
  9. package/template/.claude/agents/synthesizer.md +78 -72
  10. package/template/.claude/agents/test-author.md +70 -70
  11. package/template/.claude/commands/change-request.md +53 -0
  12. package/template/.claude/commands/log-decision.md +33 -33
  13. package/template/.claude/commands/log-issue.md +28 -28
  14. package/template/.claude/commands/overview.md +114 -99
  15. package/template/.claude/commands/phase.md +230 -202
  16. package/template/.claude/commands/propose.md +71 -0
  17. package/template/.claude/commands/quick.md +30 -30
  18. package/template/.claude/commands/replan.md +68 -63
  19. package/template/.claude/commands/store-wisdom.md +195 -195
  20. package/template/.claude/commands/synthesize.md +26 -26
  21. package/template/.claude/commands/unstuck.md +40 -40
  22. package/template/.claude/hooks/pre-compact.js +42 -0
  23. package/template/.claude/hooks/session-start.js +137 -0
  24. package/template/.claude/hooks/subagent-stop.js +18 -0
  25. package/template/.claude/istartsoft-flow/METHODOLOGY.md +403 -229
  26. package/template/.claude/skills/caveman/SKILL.md +39 -39
  27. package/template/.claude/skills/code-standards/SKILL.md +61 -0
  28. package/template/.claude/skills/code-standards/references/architecture.md +61 -0
  29. package/template/.claude/skills/code-standards/references/naming.md +60 -0
  30. package/template/.claude/skills/grill-me/SKILL.md +31 -10
  31. package/template/.claude/skills/karpathy-guidelines/SKILL.md +34 -34
  32. package/template/.claude/skills/security/SKILL.md +70 -0
  33. package/template/.claude/skills/security/references/pentest-checklist.md +46 -0
  34. package/template/.claude/skills/security/references/secure-coding.md +50 -0
  35. package/template/.claude/skills/security/references/standards.md +60 -0
  36. package/template/.claude/skills/security/references/threat-modeling.md +36 -0
  37. package/template/.claude/skills/ux-design/SKILL.md +113 -99
  38. package/template/.claude/skills/ux-design/{wireframe-template.md → references/wireframe-template.md} +95 -95
  39. package/template/.claude/templates/proposal.html +126 -0
  40. package/template/.claude/hooks/pre-compact.sh +0 -25
  41. package/template/.claude/hooks/session-start.sh +0 -120
  42. package/template/.claude/hooks/subagent-stop.sh +0 -11
package/template/.claude/istartsoft-flow/METHODOLOGY.md +403 -229
@@ -1,229 +1,403 @@
- # iStartSoftFlow — portable agent methodology (single source of truth)
-
- > **The iStartSoft execution loop.** Namespaced under `.claude/istartsoft-flow/`
- > so it coexists with a repo's own agent-instruction files (`CLAUDE.md`,
- > `AGENTS.md`, `GEMINI.md`, …) — it does NOT replace them. The kit is
- > **stack-agnostic and tool-agnostic**: it pins a *process*, not a *stack*.
- > Declare your stack (language, framework, infra, auth, test + E2E runner,
- > planning source) once in `docs/OVERVIEW.md`; every rule below references *your
- > declared stack* and hardcodes none. If infra is **managed** (a PaaS + a managed
- > datastore), Phase 0 (infra) is N/A and phases begin at the first vertical slice;
- > otherwise Phase 0 provisions infra first. Planning source of truth stays in your
- > PRD / architecture / stories (e.g. BMAD / iSSM); iStartSoftFlow is the execution
- > loop layered on top.
-
- <!-- ISTARTSOFTFLOW-AGENTS-SENTINEL-v2.0 -->
- > **SENTINEL.** The HTML comment above (`ISTARTSOFTFLOW-AGENTS-SENTINEL-v2.0`) is a
- > load-bearing marker. The installer (`create-issflow`) and tooling grep for it to
- > confirm this file resolved on disk and was not clobbered. Do not remove or rename it.
-
- > **What this file is.** The complete, tool-agnostic methodology for the iStartSoftFlow
- > workflow: the loop, the roles, the procedures, the rituals, and the hard rules.
- > This is the ONE place every rule lives. Claude Code, and any tool that reads the
- > open `AGENTS.md` standard, get the full methodology from here.
-
- > **Anti-drift invariant (load-bearing).** Every rule lives in exactly ONE place:
- > this file. `CLAUDE.md` restates NO rule — it only maps roles to Claude-native
- > files and says "native mechanism X performs ritual Y automatically." A rule and
- > its automation may never contradict. Duplication between files is an
- > architectural defect, not a convenience.
-
- Caveman ULTRA mode always on. Apply the `karpathy-guidelines` skill (engineering
- discipline) on every coding and debugging task. Apply the `ux-design` skill (the UX
- cookbook + wireframe baseline) on every UI-facing task.
-
- -----
-
- ## The loop
-
- `design-research -> grill (×2) -> plan -> implement -> test -> deploy`, one
- VERTICAL SLICE per phase. Implement AND test a phase before the next. The last
- phase always includes deployment.
-
- A phase runs in one of two orders, chosen at RESEARCH time by the TDD
- APPLICABILITY check (see Procedures → `phase`):
-
- - **TDD phase** (`TDD_PHASE=true`):
- `RESEARCH -> SCAFFOLD -> RED -> GREEN -> TEST(e2e) -> FIX -> CLOSE`
- - **Non-TDD phase** (pure infra / config / doc — `TDD_PHASE=false`):
- `RESEARCH -> IMPLEMENT -> TEST -> FIX -> CLOSE`
-
- `TDD_PHASE` = "the phase adds or changes a public callable surface (an endpoint,
- exported function/class, CLI command, or message contract) that is assertable
- from the acceptance spec." Size is NOT the criterion. On ambiguity, default
- `TDD_PHASE=true` AND state the classification + reason so a human can override to
- non-TDD before SCAFFOLD fires.
-
- -----
-
- ## Roles (fresh-context workers)
-
- Each role is a fresh-context worker mapped to a named Claude Code subagent in
- `.claude/agents/*.md`. A worker dumps its noise to a file and returns only a
- terse summary + path. Workers cannot address the user — only the orchestrator
- can. Escalation is at most two hops.
-
- - **researcher** — two modes. DESIGN: domain/constraint research before planning
- (service limits, API contracts, architectural constraints, cost surprises).
- IMPL: per-phase codebase + service investigation. Checks the shared KB snapshot
- first (step 0). Writes findings to `docs/research/`; returns terse summary + path.
- - **planner** — research → vertical-slice `docs/PLAN.md`. Phase 0 (infra) leads
- when there is infra to provision; with managed infra it is N/A and the plan
- begins at the first slice. The last phase always contains the deploy task.
- - **implementer** — builds ONE phase. Two MODES for TDD phases (SCAFFOLD: stubs
- only; FILL: logic to green) plus a legacy full-build mode for non-TDD phases.
- Writes code, never tests. Maintains `docs/ENDPOINTS.md` each phase.
- - **test-author** — writes tests BLIND (never reads implementation logic). On TDD
- phases it is dispatched BEFORE logic exists (RED-first), so blindness is
- structural, not honor-system. Writes a MOCK suite + a REAL API suite.
- - **e2e-runner** — writes/runs functional browser E2E (your declared E2E runner,
- e.g. Playwright) BLIND. Reads only the acceptance spec + `docs/ENDPOINTS.md`,
- never the implementation.
- - **debugger** — debugs in an ISOLATED context. Writes a trace to
- `docs/research/debug-<slug>.md`; returns a summary.
- - **synthesizer** — compresses `docs/STATE.md` / `docs/ISSUES.md`, prunes
- snapshots. On the final phase, also updates `README.md` + `docs/OVERVIEW.md`.
-
- The orchestrator ROUTES. It does not implement or debug.
-
- -----
-
- ## Procedures (the slash-command set)
-
- Named procedures, each with a canonical body in `.claude/commands/<name>.md`.
-
- - **overview** — bootstrap a project: design-research → grill r1 → design-research
- → re-grill r2 → `OVERVIEW.md` → planner → `PLAN.md`.
- - **phase [n]** — run one phase end-to-end with the circuit breaker. Chooses the
- TDD or non-TDD order at RESEARCH. CLOSE runs the regression guard + ENDPOINTS
- coverage gate.
- - **quick [change]** — small, obvious, non-phase change; no agent chain. Stays
- non-TDD. Runs the mock regression corpus after the change.
- - **unstuck** — deep re-research after a circuit breaker (human-triggered).
- - **synthesize** — compress STATE.md, dedup ISSUES.md, prune snapshots. Run
- before a context reset.
- - **replan** — revise `PLAN.md` (add/cut/split/merge/reorder pending phases) and
- reconcile the regression corpus in step.
- - **log-issue** — append an error to `ISSUES.md` with root cause + failed attempts.
- - **log-decision** — record an architectural change in `docs/DESIGN_LOG.md`.
- - **store-wisdom** — promote resolved issues + research to the shared KB.
-
- -----
-
- ## Rituals (model-run fallback for hooks)
-
- Where the host tool can run lifecycle hooks (Claude Code), these rituals are
- AUTOMATED by hook scripts and must NOT be run by hand. Where the tool cannot
- inject context, the model performs them itself.
-
- ### SESSION-OPEN (start / clear / compact-resume)
-
- At the start of every session, before any other work, surface:
- 1. git state (branch, uncommitted count, last 3 commits).
- 2. `docs/STATE.md` — the current position. READ THIS FIRST.
- 3. open items in `docs/ISSUES.md`.
- 4. `docs/research/INDEX.md` (research map) + infra/auth status.
- 5. shared KB: pull latest + load `docs/.kb-snapshot.md` if `.claude/kb-config.json`
- exists.
- 6. a one-line reminder of the hard rules below.
-
- ### COMPRESS (before a context compaction)
-
- Snapshot the live position to `docs/.snapshots/` so a post-compact session can
- recover: current phase, next action, open blocker.
-
- -----
-
- ## Hard rules (1–10)
-
- 1. Before debugging ANY error: grep `docs/ISSUES.md` AND `docs/research/INDEX.md`.
- The SESSION-OPEN ritual surfaces ISSUES.md — there is no excuse to miss it.
- Before debugging an auth/infra error, check the infra + auth status surfaced
- at SESSION-OPEN first.
- 2. Debug attempt cap = 3: WARN the user at attempt 2; the FIRST hard-stop at 3
- STOPS and asks the user. No 4th in-place attempt.
- 3. Every resolved error -> logged to `docs/ISSUES.md` with root cause + failed
- attempts.
- 4. End of phase -> synthesize -> context reset -> next phase.
- 5. **PHASE GATE** = the current-phase REAL API suite passes AND (frontend phase)
- the E2E suite passes AND the accumulated mock regression corpus stays green AND
- every `docs/ENDPOINTS.md` entry has at least one test in `tests/regression/`.
- The final phase additionally runs the full REAL regression corpus. A green
- mock suite alone can never close a phase.
- 6. Tests are written by `test-author`, which never sees the implementation logic
- (unbiased). On TDD phases the suite is written before the logic (RED-first).
- `STACK NOT READY` / `FLAKE` do not spend the debug budget. Only `LOGIC FAIL`
- reaches the debugger.
- 7. E2E auth = a dedicated test account driven by a PROGRAMMATIC session
- (an API login or a saved/reused auth state), never by scripting a
- third-party OAuth/login UI.
- 8. Architectural change (new/removed agent, hook, command, or a changed workflow
- rule)? -> run `log-decision` before closing.
- 9. **UI conforms to the frame.** Every UI-facing change is validated against the
- `ux-design` cookbook (design tokens, spacing scale, a11y/WCAG AA, component +
- state inventory, breakpoints) AND stays inside the wireframe baseline. Drift
- outside the wireframe frame is a defect, not a creative liberty. A frontend
- phase cannot CLOSE until the UX cookbook check passes.
- 10. **No-rationalization (scoped).** Do not downgrade a TDD phase to non-TDD to
- dodge the RED gate, and do not route phase-worthy work through `quick` to
- dodge it. (Scoped deliberately to these two seams; this is not a broad
- "never make excuses" rule.)
-
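Rule 5's PHASE GATE is a single conjunction, which makes it easy to check mechanically. A minimal Python sketch, assuming suite results are already reduced to booleans and endpoints to identifier strings; `phase_gate` and its parameter names are hypothetical, not part of the kit:

```python
from typing import AbstractSet, Optional


def phase_gate(real_api_passed: bool,
               frontend_phase: bool,
               e2e_passed: Optional[bool],
               mock_corpus_green: bool,
               endpoints: AbstractSet[str],
               endpoints_with_regression_test: AbstractSet[str],
               final_phase: bool = False,
               real_corpus_passed: Optional[bool] = None) -> bool:
    """Rule 5: every clause must hold; a green mock suite alone never closes."""
    # The REAL API suite and the accumulated mock corpus are always required.
    if not real_api_passed or not mock_corpus_green:
        return False
    # E2E only applies to frontend phases, and then it must have passed.
    if frontend_phase and e2e_passed is not True:
        return False
    # Each docs/ENDPOINTS.md entry needs at least one test in tests/regression/.
    if not endpoints <= endpoints_with_regression_test:
        return False
    # The final phase additionally needs the full REAL regression corpus.
    if final_phase and real_corpus_passed is not True:
        return False
    return True
```

Using set inclusion for the coverage clause means an endpoint added to the catalogue without a regression test fails the gate on its own, independently of how the suites ran.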
- -----
-
- ## Shared KB (optional)
-
- If `.claude/kb-config.json` exists, the SESSION-OPEN ritual pulls the KB and loads
- a snapshot to `docs/.kb-snapshot.md`. The researcher checks the snapshot (step 0)
- before any web search. Run `store-wisdom` to promote resolved issues + research to
- the KB. The kit works normally without a KB.
-
- -----
-
- ## File contract
-
- - `docs/STATE.md` — current position. Small. Rewritten, not appended.
- - `docs/ISSUES.md` — error log. Deduped by synthesizer.
- - `docs/PLAN.md` — the phase plan. The last phase has the deploy task.
- - `docs/HISTORY.md` — one line per finished phase.
- - `docs/DESIGN_LOG.md` — kit architectural rationale (§5.x decision log).
- - `docs/OVERVIEW.md` — project scope. Written after the double-grill in `overview`.
- E2E target.
- - `docs/ENDPOINTS.md` — API/service endpoint catalogue. Maintained by implementer
- each phase. Drives the CLOSE coverage gate.
- - `docs/research/` — full research + debug files. `INDEX.md` is the searchable map.
- `design-<slug>.md` (design research), `<slug>.md` (impl research),
- `debug-<slug>.md` (debugger traces).
- - `docs/.snapshots/` — pre-compact recovery markers (auto-pruned, gitignored).
- Holds no secrets.
- - your E2E stack — runner config + any ephemeral test services (e.g. `e2e/`,
- `playwright.config.ts`, `scripts/e2e-stack.sh`, `docker-compose.test.yml`).
- Names are conventions; use whatever your declared stack ships.
- - `tests/phase-<n>/` — phase-local test suites.
- - `tests/regression/` — cross-phase contract tests (the regression corpus). Run by
- `scripts/regression.sh` (default mock; `--real` runs the real corpus).
- - `.claude/skills/ux-design/` — the UX cookbook + wireframe baseline (read on
- demand for any UI work).
- - `.claude/kb-config.json` — shared KB path + remote (optional).
- - `docs/.kb-snapshot.md` — KB INDEX loaded this session (auto-generated, gitignored).
-
- -----
-
- ## Capability matrix (which host gets what)
-
- The kit is single-source (`.claude/` + this file). `create-issflow --tool=<host>`
- writes the right adapter; unsupported features degrade to model-run rituals, never
- silently vanish. The portable assets (agents, commands, skills, methodology) are
- the same everywhere — only the *wiring* differs.
-
- | Host | Entry file | Commands | Subagents | Lifecycle hooks | Shared KB |
- |------|-----------|----------|-----------|-----------------|-----------|
- | **Claude Code** (reference) | `AGENTS.md` + `.claude/` | `.claude/commands/` | native | SessionStart · PreCompact · SubagentStop (with context injection) | yes |
- | **Codex CLI** | `AGENTS.md` (native) | `.claude/commands/` (read as prompts) | read as reference | model-run | yes |
- | **Cursor** | `.cursor/rules/` + `AGENTS.md` | `.cursor/commands/` | reads `.claude/agents/` | `.cursor/hooks.json` (sessionStart · subagentStop) | yes |
- | **Gemini CLI** | `GEMINI.md` + `AGENTS.md` | `.claude/commands/` (read as prompts) | read as reference | model-run | yes |
- | **Aider** | `.aider.conf.yml` `AGENTS.md` | read as reference | model-run | model-run | yes |
- | **Any AGENTS.md host** | `AGENTS.md` | read as reference | model-run | model-run | yes |
-
- "model-run" = the host can't automate the ritual, so the model performs it by hand
- (SESSION-OPEN at the top of each session; the COMPRESS snapshot before a reset).
+ # iStartSoftFlow — portable agent methodology (single source of truth)
+
+ > **The iStartSoft execution loop.** Namespaced under `.claude/istartsoft-flow/`
+ > so it coexists with a repo's own agent-instruction files (`CLAUDE.md`,
+ > `AGENTS.md`, `GEMINI.md`, …) — it does NOT replace them. The kit is
+ > **stack-agnostic and tool-agnostic**: it pins a *process*, not a *stack*.
+ > Declare your stack (language, framework, infra, auth, test + E2E runner,
+ > planning source) once in `docs/OVERVIEW.md`; every rule below references *your
+ > declared stack* and hardcodes none. If infra is **managed** (a PaaS + a managed
+ > datastore), Phase 0 (infra) is N/A and phases begin at the first vertical slice;
+ > otherwise Phase 0 provisions infra first. Planning source of truth stays in your
+ > PRD / architecture / stories (e.g. BMAD / iSSM); iStartSoftFlow is the execution
+ > loop layered on top.
+
+ <!-- ISTARTSOFTFLOW-AGENTS-SENTINEL-v2.0 -->
+ > **SENTINEL.** The HTML comment above (`ISTARTSOFTFLOW-AGENTS-SENTINEL-v2.0`) is a
+ > load-bearing marker. The installer (`create-issflow`) and tooling grep for it to
+ > confirm this file resolved on disk and was not clobbered. Do not remove or rename it.
+
+ > **What this file is.** The complete, tool-agnostic methodology for the iStartSoftFlow
+ > workflow: the loop, the roles, the procedures, the rituals, and the hard rules.
+ > This is the ONE place every rule lives. Claude Code, and any tool that reads the
+ > open `AGENTS.md` standard, get the full methodology from here.
+
+ > **Anti-drift invariant (load-bearing).** Every rule lives in exactly ONE place:
+ > this file. `CLAUDE.md` restates NO rule — it only maps roles to Claude-native
+ > files and says "native mechanism X performs ritual Y automatically." A rule and
+ > its automation may never contradict. Duplication between files is an
+ > architectural defect, not a convenience.
+
+ Caveman ULTRA mode always on. Apply the `karpathy-guidelines` skill (engineering
+ discipline) on every coding and debugging task. Apply the `ux-design` skill (the UX
+ cookbook + wireframe baseline) on every UI-facing task. Apply the `security` skill
+ (the Secure SDLC cookbook) at design (threat model), while coding (secure coding),
+ and before any deploy. Apply the `code-standards` skill (naming per the language's
+ own idiom + the declared architecture) on every coding task.
+
+ -----
+
+ ## The loop
+
+ `design-research -> grill (×2) -> plan -> implement -> test -> deploy`, one
+ VERTICAL SLICE per phase. Implement AND test a phase before the next. The last
+ phase always includes deployment.
+
+ A phase runs in one of two orders, chosen at RESEARCH time by the TDD
+ APPLICABILITY check (see Procedures → `phase`):
+
+ - **TDD phase** (`TDD_PHASE=true`):
+ `RESEARCH -> SCAFFOLD -> RED -> GREEN -> TEST(e2e) -> FIX -> CLOSE`
+ - **Non-TDD phase** (pure infra / config / doc — `TDD_PHASE=false`):
+ `RESEARCH -> IMPLEMENT -> TEST -> FIX -> CLOSE`
+
+ `TDD_PHASE` = "the phase adds or changes a public callable surface (an endpoint,
+ exported function/class, CLI command, or message contract) that is assertable
+ from the acceptance spec." Size is NOT the criterion. On ambiguity, default
+ `TDD_PHASE=true` AND state the classification + reason so a human can override to
+ non-TDD before SCAFFOLD fires.
+
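The TDD APPLICABILITY check is a small decision rule, and the step order follows from it. A minimal Python sketch, assuming the two inputs (does the phase touch a public callable surface; is it assertable from the acceptance spec) are already known or marked unknown with `None`; `classify_phase` and `phase_order` are hypothetical names, not part of the kit:

```python
from dataclasses import dataclass
from typing import List, Optional

# Step orders as written in the methodology.
TDD_ORDER = ["RESEARCH", "SCAFFOLD", "RED", "GREEN", "TEST(e2e)", "FIX", "CLOSE"]
NON_TDD_ORDER = ["RESEARCH", "IMPLEMENT", "TEST", "FIX", "CLOSE"]


@dataclass
class Classification:
    tdd_phase: bool
    reason: str  # stated so a human can override before SCAFFOLD fires


def classify_phase(public_surface: Optional[bool],
                   assertable: Optional[bool]) -> Classification:
    """Hypothetical TDD APPLICABILITY check; None means 'unclear'.

    Phase size is deliberately not an input: size is NOT the criterion.
    """
    if public_surface is False or assertable is False:
        return Classification(False, "no assertable public callable surface")
    if public_surface and assertable:
        return Classification(True, "adds/changes an assertable public surface")
    # Ambiguous: default to TDD and record why.
    return Classification(True, "ambiguous; defaulting to TDD_PHASE=true")


def phase_order(c: Classification) -> List[str]:
    # Return a copy so callers cannot mutate the shared constants.
    return list(TDD_ORDER if c.tdd_phase else NON_TDD_ORDER)
```

With this shape, `classify_phase(True, None)` yields `tdd_phase=True` plus the ambiguity reason, so the RED gate is the default and only an explicit human override moves a phase to the non-TDD order.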
+ -----
+
+ ## Project lifecycle (real-world delivery)
+
+ The loop above is the BUILD engine. Around it runs a full client-delivery lifecycle
+ — every stage produces an artifact and is logged, so a project has a complete trail
+ from idea to closeout:
+
+ 1. **Discover** — idea → requirements, captured by `/overview` (the double-grill).
+ 2. **PRD** — crystallised requirements in `docs/PRD.md` (or your BMAD/iSSM stories).
+ 3. **Stack & architecture** — decided in `/overview` design-research → `OVERVIEW.md`.
+ 4. **Plan** — `/overview`'s `planner` → `docs/PLAN.md` (the vertical-slice phases).
+ The plan exists before the proposal, because the proposal estimates *these* phases.
+ 5. **Proposal & estimate (OPTIONAL — depends on the job)** — for client / quoted
+ work, `/propose` reads OVERVIEW + PLAN → `docs/PROPOSAL.md` + a rendered
+ `docs/proposal.html`: scope, phase breakdown, effort + cost estimate, timeline,
+ assumptions, with a **client sign-off gate** before build. Internal / personal
+ projects skip straight from plan to build.
+ 6. **Build** — the loop, one phase at a time (`/phase`, AUTO dev loop).
+ 7. **Change mid-flight** — `/change-request`: impact analysis + re-estimate + a logged
+ change order (`docs/CHANGES.md`) + sign-off, then `/replan`. Scope and cost never
+ change silently.
+ 8. **Deploy** — in the final phase.
+ 9. **Closeout** — `/synthesize` (final pass) → a project summary: what was built, key
+ decisions, every change order, and the final cost vs the original estimate.
+
+ **Logging is continuous and total.** Every stage writes to a durable artifact:
+ requirements (PRD / OVERVIEW), commercial (PROPOSAL / CHANGES), execution
+ (PLAN / HISTORY), decisions (DESIGN_LOG), errors (ISSUES), research (research/).
+ Nothing important lives only in chat — it is on disk, so the project can always be
+ reconstructed and summarised.
+
+ **Commercial gates are always interactive** (both modes): the proposal sign-off and
+ every change-order approval pause for the human. AUTO governs the *dev loop between*
+ those gates, never the money decisions.
+
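Two commercial rules of the lifecycle (the proposal stage is optional, and scope or cost only moves through a signed-off change order) can be modelled in a few lines. This is an illustrative Python sketch, not the kit's implementation; `lifecycle`, `ChangeOrder`, and `approved_budget` are hypothetical names, and money is kept in integer cents by assumption:

```python
from dataclasses import dataclass
from typing import List


def lifecycle(client_work: bool) -> List[str]:
    """Ordered delivery stages; the proposal stage exists only for quoted work."""
    stages = ["discover", "prd", "stack_architecture", "plan"]
    if client_work:
        # The proposal follows the plan because it estimates the plan's phases.
        stages += ["proposal", "client_sign_off"]
    stages += ["build", "deploy", "closeout"]
    return stages


@dataclass
class ChangeOrder:
    description: str
    delta_cents: int
    signed_off: bool = False  # stays False until the human approves


def approved_budget(estimate_cents: int, orders: List[ChangeOrder]) -> int:
    """Original estimate plus signed-off change orders only.

    An unsigned change order never moves the budget, so scope and cost
    cannot change silently.
    """
    return estimate_cents + sum(o.delta_cents for o in orders if o.signed_off)
```

Keeping the original estimate separate from the approved budget leaves both numbers available for the closeout comparison.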
+ -----
+
+ ## BMAD integration (planning front-end)
+
+ iStartSoftFlow is the EXECUTION loop; **BMAD-METHOD** is an optional PLANNING
+ front-end. They compose — BMAD plans, iStartSoftFlow builds — with no duplication:
+
+ | BMAD (plan) | feeds | iStartSoftFlow (execute) |
+ |-------------|---------|--------------------------|
+ | Analyst / PM / Architect / PO agents | → | `/overview` grill + `researcher` + `planner` |
+ | PRD + Architecture | → | `docs/OVERVIEW.md` (+ `docs/PRD.md`) |
+ | sharded epics / story files | → | `docs/PLAN.md` phases (1 story → 1 phase) |
+ | SM "story with embedded context" | → | the phase **context package** (rationale + architecture + impl notes + qa focus + sharp acceptance) |
+ | Dev → QA | → | `implementer` → `test-author` + the phase gates (TDD · UX · security · code-standards) |
+
+ Principles (lean, no bloat):
+ - **Don't duplicate agents.** BMAD's planning roles map onto our grill + planner +
+ researcher — we ship no copies of them. The **iSSM MCP** already holds the BMAD
+ artifacts (PRD / architecture / stories) and feeds `/overview`.
+ - **Adopt the signature pattern — context-engineered phases.** Each `PLAN.md` phase
+ is a self-contained story: it embeds the rationale, the architecture it touches,
+ implementation constraints, and QA focus, so the implementer / test-author need no
+ extra digging (see `planner`). This is BMAD's biggest win and it's cheap to adopt.
+ - **Scale-adaptive.** Small change → `/quick` (BMAD "lightweight"); a real slice →
+ `/phase` (BMAD "heavyweight"). Pick the smaller that fits.
+ - **Optional sharding for big plans.** A large `PLAN.md` may be split into
+ `docs/plan/<epic>.md` shards so a phase loads only its slice — finer-grained token
+ economy on top of the per-phase reset. Opt-in.
+
+ BMAD-METHOD is MIT and installed separately (`npx bmad-method install`); use it for
+ the planning phase when a project needs that rigor, then drive delivery here.
+
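A context-engineered phase is essentially a record whose fields must all be filled before work starts. A minimal Python sketch of such a record, assuming the five parts named in the mapping table (rationale, architecture, implementation notes, QA focus, sharp acceptance); the `PhaseContext` class and its markdown layout are hypothetical, not the `planner` agent's actual output format:

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class PhaseContext:
    """One PLAN.md phase as a self-contained story (illustrative shape)."""
    title: str
    rationale: str
    architecture: str
    impl_notes: str
    qa_focus: str
    acceptance: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        # Any empty field means the implementer / test-author would have to dig.
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def to_markdown(self) -> str:
        lines = [
            f"## {self.title}",
            f"**Rationale:** {self.rationale}",
            f"**Architecture touched:** {self.architecture}",
            f"**Implementation notes:** {self.impl_notes}",
            f"**QA focus:** {self.qa_focus}",
            "**Acceptance:**",
        ]
        lines += [f"- [ ] {item}" for item in self.acceptance]
        return "\n".join(lines)
```

A `missing()` check of this kind is one cheap way to refuse a phase whose story is not yet self-contained.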
+ -----
+
+ ## Roles (fresh-context workers)
+
+ Each role is a fresh-context worker mapped to a named Claude Code subagent in
+ `.claude/agents/*.md`. A worker dumps its noise to a file and returns only a
+ terse summary + path. Workers cannot address the user — only the orchestrator
+ can. Escalation is at most two hops.
+
+ - **researcher** — two modes. DESIGN: domain/constraint research before planning
+ (service limits, API contracts, architectural constraints, cost surprises).
+ IMPL: per-phase codebase + service investigation. Checks the shared KB snapshot
+ first (step 0). Writes findings to `docs/research/`; returns terse summary + path.
+ - **planner** — research → vertical-slice `docs/PLAN.md`. Phase 0 (infra) leads
+ when there is infra to provision; with managed infra it is N/A and the plan
+ begins at the first slice. The last phase always contains the deploy task.
+ - **implementer** — builds ONE phase. Two MODES for TDD phases (SCAFFOLD: stubs
+ only; FILL: logic to green) plus a legacy full-build mode for non-TDD phases.
+ Writes code, never tests. Maintains `docs/ENDPOINTS.md` each phase.
+ - **test-author** — writes tests BLIND (never reads implementation logic). On TDD
+ phases it is dispatched BEFORE logic exists (RED-first), so blindness is
+ structural, not honor-system. Writes a MOCK suite + a REAL API suite.
+ - **e2e-runner** — writes/runs functional browser E2E (your declared E2E runner,
+ e.g. Playwright) BLIND. Reads only the acceptance spec + `docs/ENDPOINTS.md`,
+ never the implementation.
+ - **debugger** — debugs in an ISOLATED context. Writes a trace to
+ `docs/research/debug-<slug>.md`; returns a summary.
+ - **synthesizer** — compresses `docs/STATE.md` / `docs/ISSUES.md`, prunes
+ snapshots. On the final phase, also updates `README.md` + `docs/OVERVIEW.md`.
+
+ The orchestrator ROUTES. It does not implement or debug.
+
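The worker contract (write the noise to a file, hand back only a terse summary plus the path) is what keeps the orchestrator's context small. A minimal Python sketch, assuming a worker's raw output is a string and its first non-empty line serves as the summary; `finish_worker` and `WorkerResult` are hypothetical helpers, not the subagent mechanism itself:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class WorkerResult:
    summary: str  # the only text the orchestrator reads
    path: Path    # where the full dump lives on disk


def finish_worker(out_dir: Path, slug: str, raw_output: str,
                  prefix: str = "", max_chars: int = 200) -> WorkerResult:
    """Write the full output to disk and return a terse summary + path.

    prefix="debug-" mirrors the debugger's docs/research/debug-<slug>.md naming.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{prefix}{slug}.md"
    path.write_text(raw_output, encoding="utf-8")
    # First non-empty line, truncated, stands in for the summary.
    first_line = next((ln.strip() for ln in raw_output.splitlines() if ln.strip()), "")
    return WorkerResult(first_line[:max_chars], path)
```

However long the trace is, the orchestrator only pays for `summary`; the dump stays greppable under `docs/research/`.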
+ -----
+
+ ## Procedures (the slash-command set)
+
+ Named procedures, each with a canonical body in `.claude/commands/<name>.md`.
+
+ - **overview** — bootstrap a project: design-research → grill r1 → design-research
+ → re-grill r2 → `OVERVIEW.md` → planner → `PLAN.md`.
+ - **propose** — turn approved requirements + stack into `PROPOSAL.md` (scope, phase
+ breakdown, effort + cost estimate, assumptions) with a client sign-off gate.
+ - **change-request** — a mid-project scope change: impact analysis + re-estimate +
+ a logged change order (`CHANGES.md`) + sign-off, then `replan`.
+ - **phase [n]** — run one phase end-to-end with the circuit breaker. Chooses the
+ TDD or non-TDD order at RESEARCH. CLOSE runs the regression guard + ENDPOINTS
+ coverage gate.
+ - **quick [change]** — small, obvious, non-phase change; no agent chain. Stays
+ non-TDD. Runs the mock regression corpus after the change.
+ - **unstuck** — deep re-research after a circuit breaker (auto-run once in AUTO on
+ first stuck; human-triggered in GUIDED).
+ - **synthesize** — compress STATE.md, dedup ISSUES.md, prune snapshots. Run
+ before a context reset.
+ - **replan** — revise `PLAN.md` (add/cut/split/merge/reorder pending phases) and
+ reconcile the regression corpus in step.
+ - **log-issue** — append an error to `ISSUES.md` with root cause + failed attempts.
+ - **log-decision** — record an architectural change in `docs/DESIGN_LOG.md`.
+ - **store-wisdom** — promote resolved issues + research to the shared KB.
+
+ -----
+
+ ## Rituals (model-run fallback for hooks)
+
+ Where the host tool can run lifecycle hooks (Claude Code), these rituals are
+ AUTOMATED by hook scripts and must NOT be run by hand. Where the tool cannot
+ inject context, the model performs them itself.
+
+ ### SESSION-OPEN (start / clear / compact-resume)
+
+ At the start of every session, before any other work, surface:
+ 1. git state (branch, uncommitted count, last 3 commits).
+ 2. `docs/STATE.md` — the current position. READ THIS FIRST.
+ 3. open items in `docs/ISSUES.md`.
+ 4. `docs/research/INDEX.md` (research map) + infra/auth status.
+ 5. shared KB: pull latest + load `docs/.kb-snapshot.md` if `.claude/kb-config.json`
+ exists.
+ 6. a one-line reminder of the hard rules below.
+
+ ### COMPRESS (before a context compaction)
+
+ Snapshot the live position to `docs/.snapshots/` so a post-compact session can
+ recover: current phase, next action, open blocker.
+
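The COMPRESS ritual only needs three fields to survive a compaction. A minimal Python sketch of writing and reading such a snapshot, assuming one JSON file per snapshot named by a nanosecond timestamp and keep-the-newest-N pruning; the file naming, `keep=5`, and both function names are hypothetical (the kit's own `pre-compact.js` hook may work differently):

```python
import json
import time
from pathlib import Path
from typing import Optional


def write_snapshot(root: Path, phase: str, next_action: str,
                   blocker: Optional[str], keep: int = 5) -> Path:
    """Record the live position under docs/.snapshots/ and prune old ones (keep >= 1)."""
    snap_dir = root / "docs" / ".snapshots"
    snap_dir.mkdir(parents=True, exist_ok=True)
    existing = sorted(snap_dir.glob("*.json"))
    stamp = time.time_ns()
    if existing:
        # Stay strictly increasing even on a coarse clock.
        stamp = max(stamp, int(existing[-1].stem) + 1)
    path = snap_dir / f"{stamp}.json"
    # Position only: a snapshot holds no secrets.
    payload = {"phase": phase, "next_action": next_action, "blocker": blocker}
    path.write_text(json.dumps(payload), encoding="utf-8")
    # Same-width numeric names sort lexically in time order.
    for old in sorted(snap_dir.glob("*.json"))[:-keep]:
        old.unlink()
    return path


def latest_snapshot(root: Path) -> Optional[dict]:
    snap_dir = root / "docs" / ".snapshots"
    if not snap_dir.is_dir():
        return None
    snaps = sorted(snap_dir.glob("*.json"))
    return json.loads(snaps[-1].read_text(encoding="utf-8")) if snaps else None
```

A post-compact session would call `latest_snapshot` first and resume from `next_action`.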
+ ### Token economy (always)
+
+ The cheapest token is the one never loaded. The kit is built to minimise context:
+
+ - **Phase boundary is the primary reset.** `/synthesize -> /clear` ends every
+ phase so the next one starts with a small, fresh context instead of carrying
+ the whole history forward.
+ - **Lazy, not always-on.** This methodology + the skills load on demand; only the
+ SessionStart hook output is paid every session, and it injects just the live
+ STATE + *open* issues (resolved ones stay on disk for grep, not re-paid in tokens).
+ - **Subagents isolate the noise.** Research, debugging, log/test output run in a
+ worker's own context and return a terse summary — the orchestrator never pays
+ for the raw dump.
+ - **Soft context budget.** The phase boundary should keep you well under the model
+ window. If a single phase grows past ~50% of the window (≈ 200k on a 1M-context
+ model), treat it as a signal the slice is too big: `/synthesize -> /clear` or
+ split the phase. Don't coast to auto-compact. This is guidance, not a hard gate —
+ the number scales with the host model's window, it is not fixed at 200k.
+
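The soft budget is plain arithmetic on the host window. A small Python sketch, assuming the threshold is expressed as an integer percent of the window. Taken literally, 50% of a 1M-token window is 500k tokens, so the ≈200k figure above corresponds to roughly 20% of such a window; the sketch therefore keeps the percent as a parameter instead of hardcoding either number. `soft_budget` and `should_reset` are hypothetical helpers:

```python
def soft_budget(window_tokens: int, percent: int = 50) -> int:
    """Token count past which a single phase is treated as too big."""
    # Integer arithmetic avoids float rounding on large windows.
    return window_tokens * percent // 100


def should_reset(used_tokens: int, window_tokens: int, percent: int = 50) -> bool:
    """True means: /synthesize -> /clear, or split the phase, before auto-compact."""
    return used_tokens > soft_budget(window_tokens, percent)
```

For example, `soft_budget(1_000_000)` returns `500000`, `soft_budget(1_000_000, 20)` returns `200000`, and `soft_budget(200_000)` returns `100000`.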
+ -----
+
+ ## Autonomy
+
+ The kit runs in one of two modes, declared in `docs/OVERVIEW.md` (default: **AUTO**):
+
+ **Planning always asks; development doesn't.** Asking is cheap and decisive while
+ *planning* — so `/overview` (the double-grill) and plan approval stay interactive in
+ both modes. AUTO governs only the **development loop** (implement → test → debug →
+ close): there, interruptions are expensive, so it follows the plan instead of asking.
+
+ - **AUTO (default) — during DEVELOPMENT, follow the plan, don't interrupt.** Once a
+ plan exists, the dev loop prefers DECIDING over asking. Resolve any in-process
+ choice from (1) the PLAN/OVERVIEW/spec, (2) the codebase, (3) a sensible default +
+ the worker's recommendation — then RECORD it (`docs/DESIGN_LOG.md` or STATE) and
+ CONTINUE. Do not stop mid-build to ask.
+ - **GUIDED — ask at each fork in dev too.** The original behaviour: the development
+ loop also surfaces choices and waits. Use when exploring an unfamiliar codebase.
+
+ **Decision protocol (AUTO, dev loop).** Incomplete acceptance spec → fill the gap
+ with the most reasonable interpretation, log it as an Assumption, continue.
+ Ambiguous TDD classification → apply the default (`TDD_PHASE=true`), log the reason,
+ continue. A worker that would have asked the user instead writes its question + its
+ own best answer to STATE and proceeds on that answer. (If the gap is in the PLAN
+ itself — not just an implementation detail — that's a planning question: surface it.)
+
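The AUTO resolution order reads naturally as a first-match lookup over three sources, with every decision recorded. A minimal Python sketch, assuming each source has already produced an answer or `None`; `resolve_choice` and the log-line format are hypothetical, and a real run would append to `docs/DESIGN_LOG.md` or STATE rather than to an in-memory list:

```python
from typing import List, Optional


def resolve_choice(question: str,
                   plan_answer: Optional[str],
                   codebase_answer: Optional[str],
                   default_answer: str,
                   decision_log: List[str]) -> str:
    """AUTO dev-loop rule: decide from (1) PLAN/OVERVIEW/spec, (2) the codebase,
    (3) a sensible default + the worker's recommendation; record it; continue."""
    for source, answer in (("PLAN/OVERVIEW/spec", plan_answer),
                           ("codebase", codebase_answer)):
        if answer is not None:
            decision_log.append(f"{question} -> {answer} (source: {source})")
            return answer
    # Nothing more specific exists: fall back to the default and still record it.
    decision_log.append(f"{question} -> {default_answer} (source: default + recommendation)")
    return default_answer
```

Because the default is mandatory, the function always returns an answer, which matches the rule that AUTO never stops mid-build to ask.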
256
+ **Batched escalation (AUTO).** Blockers never halt the whole run. On first stuck,
257
+ auto-run `/unstuck` (deep re-research) — capped at ONCE per phase, since it is
258
+ token-expensive. Still stuck → park the blocked slice (mark `BLOCKED` in PLAN), move
259
+ to the next independent slice, and surface ONE consolidated report of all parked
260
+ blockers + logged assumptions at the phase boundary / end of run. The human reviews
261
+ THERE (the `/re` checkpoint), not mid-flow.
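The escalation rules can be sketched as a loop over the phase's slices. The sketch assumes a hypothetical slice shape (`{ id, dependsOn }`) and injected `runSlice` / `runUnstuck` callbacks. It also assumes that a slice depending on a parked slice is itself marked BLOCKED, which is one reading of "move to the next independent slice".

```javascript
// Hypothetical sketch of batched escalation inside one AUTO phase.
// runSlice(slice) -> "done" | "stuck"; runUnstuck(slice) stands in for the
// token-expensive /unstuck re-research. Slice shape: { id, dependsOn? }.
function runPhase(slices, runSlice, runUnstuck) {
  let unstuckUsed = false; // /unstuck is capped at ONCE per phase
  const blocked = new Set(); // ids marked BLOCKED in PLAN
  const report = []; // surfaced ONCE, at the phase boundary
  const done = [];
  for (const slice of slices) {
    // Only independent slices proceed; dependents of a parked slice wait too.
    if ((slice.dependsOn || []).some((id) => blocked.has(id))) {
      blocked.add(slice.id);
      report.push({ id: slice.id, reason: "depends on a BLOCKED slice" });
      continue;
    }
    let outcome = runSlice(slice);
    if (outcome === "stuck" && !unstuckUsed) {
      unstuckUsed = true;
      runUnstuck(slice);
      outcome = runSlice(slice); // one retry after deep re-research
    }
    if (outcome === "stuck") {
      blocked.add(slice.id); // park it and move on; never halt the run
      report.push({ id: slice.id, reason: "stuck" });
      continue;
    }
    done.push(slice.id);
  }
  return { done, report, unstuckUsed };
}
```

The returned `report` is the single consolidated list a human reviews at the checkpoint. Nothing is surfaced mid-loop.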
262
+
263
+ **Hard stops (BOTH modes — these always pause for a human).** Autonomy is for
264
+ *development*, not for risk. Stop and get sign-off ONLY for:
265
+ 1. Irreversible or outbound actions — deploy to prod, data deletion/migration,
266
+ `git push`, publish, spending money, sending external messages.
267
+ 2. Security-sensitive changes — auth, secrets, permissions, data exposure.
268
+ 3. A spec that is internally CONTRADICTORY (merely incomplete is NOT a stop — fill
269
+ + log instead).
270
+ 4. The debug budget is spent AND no independent slice remains to make progress.
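The four hard stops fit into one predicate that is checked regardless of mode. The action and area labels below are hypothetical names chosen for the sketch. The kit itself expresses these stops as prose rules for the agents, not as an API.

```javascript
// Hypothetical sketch: the four hard stops as one predicate that applies in
// BOTH modes. Action and area names are illustrative labels, not a kit API.
const OUTBOUND = new Set([
  "deploy-prod", "delete-data", "migrate-data", "git-push",
  "publish", "spend-money", "send-external-message",
]);
const SECURITY_AREAS = new Set(["auth", "secrets", "permissions", "data-exposure"]);

// Returns the reason to pause for a human, or null to carry on.
function hardStop(ctx) {
  if (OUTBOUND.has(ctx.action)) return "irreversible or outbound action"; // 1
  if ((ctx.touches || []).some((a) => SECURITY_AREAS.has(a))) {
    return "security-sensitive change"; // 2
  }
  // 3: only a CONTRADICTORY spec stops; an incomplete one is filled + logged.
  if (ctx.spec === "contradictory") return "contradictory spec";
  if (ctx.debugBudgetSpent && ctx.independentSlicesLeft === 0) {
    return "debug budget spent, no independent slice left"; // 4
  }
  return null;
}
```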
271
+
272
+ AUTO removes *questions*, not *discipline*: tests, the phase gate, issue logging,
273
+ and the regression corpus all still run. The point is an efficient, end-to-end
274
+ development run that follows the spec and logs every problem so it never recurs.
275
+
276
+ -----
277
+
278
+ ## Hard rules (1–12)
279
+
280
+ 1. Before debugging ANY error: grep `docs/ISSUES.md` AND `docs/research/INDEX.md`.
281
+ The SESSION-OPEN ritual surfaces ISSUES.md — there is no excuse to miss it.
282
+ Before debugging an auth/infra error, check the infra + auth status surfaced
283
+ at SESSION-OPEN first.
284
+ 2. Debug attempt cap = 3: WARN at attempt 2. At 3, stop the in-place attempts (no
285
+ 4th). GUIDED → ask the user. AUTO → log the issue (root cause + failed
286
+ attempts), park the slice, and continue per the batched-escalation protocol.
287
+ The cap protects efficiency (no flailing / token burn) in both modes.
288
+ 3. Every resolved error -> logged to `docs/ISSUES.md` with root cause + failed
289
+ attempts.
290
+ 4. End of phase -> synthesize -> context reset -> next phase.
291
+ 5. **PHASE GATE** = the current-phase REAL API suite passes AND (frontend phase)
292
+ the E2E suite passes AND the accumulated mock regression corpus stays green AND
293
+ every `docs/ENDPOINTS.md` entry has at least one test in `tests/regression/`.
294
+ The final phase additionally runs the full REAL regression corpus. A green
295
+ mock suite alone can never close a phase.
296
+ 6. Tests are written by `test-author`, which never sees the implementation logic
297
+ (unbiased). On TDD phases the suite is written before the logic (RED-first).
298
+ `STACK NOT READY` / `FLAKE` do not spend the debug budget. Only `LOGIC FAIL`
299
+ reaches the debugger.
300
+ 7. E2E auth = a dedicated test account driven by a PROGRAMMATIC session
301
+ (an API login or a saved/reused auth state), never by scripting a
302
+ third-party OAuth/login UI.
303
+ 8. Architectural change (new/removed agent, hook, command, or a changed workflow
304
+ rule)? -> run `log-decision` before closing.
305
+ 9. **UI conforms to the frame.** ALL web / UI work applies the `ux-design` skill and
306
+ is verified EVERY time — no exception. Each change is validated against the
307
+ cookbook (design tokens, spacing, a11y/WCAG AA, **icons = a real SVG set, NEVER
308
+ emoji**, component + state inventory, breakpoints) AND stays inside the wireframe
309
+ baseline. Drift outside the frame is a defect, not a creative liberty. New visual
310
+ direction (something the wireframe doesn't cover) is confirmed with the user
311
+ before building. A frontend phase cannot CLOSE until the UX cookbook check passes.
312
+ 10. **No-rationalization (scoped).** Do not downgrade a TDD phase to non-TDD to
313
+ dodge the RED gate, and do not route phase-worthy work through `quick` to
314
+ dodge it. (Scoped deliberately to these two seams; this is not a broad
315
+ "never make excuses" rule.)
316
+ 11. **Secure SDLC (security at every stage).** Security runs through the whole loop
317
+ via the `security` skill, not just at the end:
318
+ - **design** — threat-model any phase that touches a trust boundary (STRIDE);
319
+ set the ASVS level; write abuse cases as negative acceptance criteria.
320
+ - **implement** — follow the secure-coding rules (OWASP Top 10 2025).
321
+ - **build (every phase CLOSE)** — secrets scan + SCA (dependency CVEs) + SAST
322
+ must be clean; any open HIGH/CRITICAL finding BLOCKS the close.
323
+ - **pre-deploy** — run the pentest checklist (WSTG) + a security review of the
324
+ diff; sign artifacts (SLSA L2+).
325
+ - **operate** — vulnerability management: keep an SBOM, monitor for new CVEs.
326
+ Deploying to prod with open high/critical findings is a hard-stop (human
327
+ sign-off — see Autonomy). Grounded in OWASP / ASVS / WSTG / ISO 27001 / ISO 25010.
328
+ 12. **Code-standards gate.** Every coding phase: the formatter + linter are clean
329
+ (the language's standard tools), names follow the language's OWN idiom, and the
330
+ code conforms to the declared architecture (Feature-Based by default) — checked
331
+ at CLOSE. Lint/format errors or idiom violations BLOCK the close. (`code-standards`.)
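Rules 2 and 6 together define how the debug budget is spent, and they can be sketched as two small functions. The failure-kind strings come from rule 6. The function names and the returned action labels are hypothetical. "WARN at attempt 2" is read here as a warning issued when attempt 2 starts.

```javascript
// Hypothetical sketch of rules 2 and 6; labels are illustrative.
const DEBUG_CAP = 3;

// Rule 6: only a LOGIC FAIL spends the debug budget and reaches the debugger.
function spendsDebugBudget(failureKind) {
  switch (failureKind) {
    case "LOGIC FAIL":
      return true;
    case "STACK NOT READY":
    case "FLAKE":
      return false; // handled outside the debug budget
    default:
      throw new Error(`unknown failure kind: ${failureKind}`);
  }
}

// Rule 2: what to do before in-place debug attempt n (1-based).
function beforeAttempt(n, mode) {
  if (n < 2) return "attempt";
  if (n === 2) return "warn-and-attempt"; // WARN at attempt 2
  if (n === DEBUG_CAP) return "final-attempt"; // attempt 3 is the last
  // No 4th attempt. GUIDED asks the user; AUTO logs the issue (root cause +
  // failed attempts), parks the slice and continues.
  return mode === "GUIDED" ? "ask-user" : "log-issue-and-park-slice";
}
```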
332
+
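The CLOSE-time conditions from rules 5, 9, 11 (build stage) and 12 can be collected into a single list of blockers, where an empty list means the phase may close. The shape of the result object `r` is an assumption made for this sketch. In the kit these checks are run by the agents and tools named in the rules.

```javascript
// Hypothetical sketch of the checks that must all hold before a phase can
// CLOSE (rules 5, 9, 11 and 12). The shape of `r` is an assumption.
function closeBlockers(r) {
  const blockers = [];
  // Rule 5: a green mock suite alone can never close a phase.
  if (!r.realApiSuitePassed) blockers.push("real API suite");
  if (r.isFrontendPhase && !r.e2ePassed) blockers.push("E2E suite");
  if (!r.mockRegressionGreen) blockers.push("mock regression corpus");
  const uncovered = r.endpoints.filter((ep) => !(r.regressionTests[ep] > 0));
  if (uncovered.length) blockers.push(`no regression test: ${uncovered.join(", ")}`);
  if (r.isFinalPhase && !r.realRegressionPassed) blockers.push("real regression corpus");
  // Rule 9: a frontend phase also needs the UX cookbook check.
  if (r.isFrontendPhase && !r.uxCheckPassed) blockers.push("UX cookbook check");
  // Rule 11: any open HIGH/CRITICAL finding (secrets scan, SCA, SAST) blocks.
  const severe = r.securityFindings.filter(
    (f) => f.open && (f.severity === "HIGH" || f.severity === "CRITICAL"),
  );
  if (severe.length) blockers.push(`${severe.length} open HIGH/CRITICAL finding(s)`);
  // Rule 12: formatter + linter must be clean.
  if (!r.lintClean || !r.formatClean) blockers.push("lint/format");
  return blockers; // empty array = the phase may close
}
```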
333
+ -----
334
+
335
+ ## Shared KB (optional)
336
+
337
+ If `.claude/kb-config.json` exists, the SESSION-OPEN ritual pulls the KB and writes
338
+ a snapshot to `docs/.kb-snapshot.md`. The researcher checks the snapshot (step 0)
339
+ before any web search. Run `store-wisdom` to promote resolved issues + research to
340
+ the KB. The kit works normally without a KB.
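The researcher's step 0 amounts to "snapshot first, web second". This minimal sketch assumes the snapshot is passed in as plain text (the contents of `docs/.kb-snapshot.md`, or `null` when no KB is configured). It also assumes a naive all-terms substring match, because the kit does not specify how the snapshot is searched.

```javascript
// Hypothetical sketch of the researcher's step 0: consult the KB snapshot
// before any web search. snapshotText is the text of docs/.kb-snapshot.md,
// or null when no KB is configured (the kit then works as normal).
function researchStep0(snapshotText, query) {
  if (!snapshotText) return { source: "web", hits: [] };
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  // Naive match (an assumption): a line is a hit if it contains every term.
  const hits = snapshotText
    .split("\n")
    .filter((line) => terms.every((t) => line.toLowerCase().includes(t)));
  return hits.length ? { source: "kb", hits } : { source: "web", hits: [] };
}
```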
341
+
342
+ -----
343
+
344
+ ## File contract
345
+
346
+ - `docs/PRD.md` — crystallised product requirements (or your BMAD/iSSM stories).
347
+ - `docs/PROPOSAL.md` — scope + phase breakdown + effort/cost estimate + assumptions
348
+ + sign-off. Versioned; the commercial baseline (source of truth).
349
+ - `docs/proposal.html` — the client-facing proposal, rendered from PROPOSAL.md via
350
+ `.claude/templates/proposal.html`, in the project language; print-ready (PDF).
351
+ - `.claude/templates/` — client-facing document templates (proposal.html, …) that the
352
+ commands render into `docs/`.
353
+ - `docs/CHANGES.md` — change-order log (append-only): each scope change with its
354
+ impact, effort/cost delta, new total, and approval status. The commercial audit trail.
355
+ - `docs/STATE.md` — current position. Small. Rewritten, not appended.
356
+ - `docs/ISSUES.md` — error log. Deduped by synthesizer.
357
+ - `docs/PLAN.md` — the phase plan. The last phase has the deploy task.
358
+ - `docs/HISTORY.md` — one line per finished phase.
359
+ - `docs/DESIGN_LOG.md` — kit architectural rationale (§5.x decision log).
360
+ - `docs/OVERVIEW.md` — project scope. Written after the double-grill in `overview`.
361
+ Also serves as the E2E target.
362
+ - `docs/ENDPOINTS.md` — API/service endpoint catalogue. Maintained by the implementer
363
+ each phase. Drives the CLOSE coverage gate.
364
+ - `docs/research/` — full research + debug files. `INDEX.md` is the searchable map.
365
+ `design-<slug>.md` (design research), `<slug>.md` (impl research),
366
+ `debug-<slug>.md` (debugger traces).
367
+ - `docs/.snapshots/` — pre-compact recovery markers (auto-pruned, gitignored).
368
+ Holds no secrets.
369
+ - your E2E stack — runner config + any ephemeral test services (e.g. `e2e/`,
370
+ `playwright.config.ts`, `scripts/e2e-stack.sh`, `docker-compose.test.yml`).
371
+ Names are conventions; use whatever your declared stack ships.
372
+ - `tests/phase-<n>/` — phase-local test suites.
373
+ - `tests/regression/` — cross-phase contract tests (the regression corpus). Run by
374
+ `scripts/regression.sh` (default mock; `--real` runs the real corpus).
375
+ - `.claude/skills/ux-design/` — the UX cookbook + wireframe baseline (read on
376
+ demand for any UI work).
377
+ - `.claude/skills/security/` — the Secure SDLC cookbook + threat-modeling /
378
+ secure-coding / pentest / standards references (read on demand for security work).
379
+ - `.claude/skills/code-standards/` — naming-per-language + architecture cookbook
380
+ (read on demand for any coding / scaffolding / structure decision).
381
+ - `.claude/kb-config.json` — shared KB path + remote (optional).
382
+ - `docs/.kb-snapshot.md` — KB INDEX loaded this session (auto-generated, gitignored).
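Two of these files follow opposite write disciplines: `docs/STATE.md` is rewritten, and `docs/CHANGES.md` is append-only with a running total. The sketch below treats file contents as strings. The entry layout (`new total: …`) and the helper names are hypothetical. It also assumes every change order moves the total, whatever its approval status. A project might instead count only approved orders.

```javascript
// Hypothetical sketch of two write disciplines from the file contract.
// STATE is REWRITTEN: only the current position survives.
function rewriteState(_previous, position) {
  return `# STATE\n\n${position}\n`; // previous content is discarded on purpose
}

// CHANGES is APPEND-ONLY: each order records its delta and the new total,
// computed from the last recorded total (or the proposal baseline).
function appendChangeOrder(changesText, baselineTotal, order) {
  const totals = [...changesText.matchAll(/new total: (-?\d+(?:\.\d+)?)/g)];
  const current = totals.length ? Number(totals[totals.length - 1][1]) : baselineTotal;
  const newTotal = current + order.delta;
  const entry =
    `- ${order.title}: impact ${order.impact}; delta ${order.delta}; ` +
    `new total: ${newTotal}; status: ${order.status}\n`;
  return changesText + entry; // earlier lines are never edited
}
```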
383
+
384
+ -----
385
+
386
+ ## Capability matrix (which host gets what)
387
+
388
+ The kit is single-source (`.claude/` + this file). `create-issflow --tool=<host>`
389
+ writes the right adapter; unsupported features degrade to model-run rituals and never
390
+ silently vanish. The portable assets (agents, commands, skills, methodology) are
391
+ the same everywhere — only the *wiring* differs.
392
+
393
+ | Host | Entry file | Commands | Subagents | Lifecycle hooks | Shared KB |
394
+ |------|-----------|----------|-----------|-----------------|-----------|
395
+ | **Claude Code** (reference) | `AGENTS.md` + `.claude/` | `.claude/commands/` | native | SessionStart · PreCompact · SubagentStop (with context injection) | yes |
396
+ | **Codex CLI** | `AGENTS.md` (native) | `.claude/commands/` (read as prompts) | read as reference | model-run | yes |
397
+ | **Cursor** | `.cursor/rules/` + `AGENTS.md` | `.cursor/commands/` | reads `.claude/agents/` | `.cursor/hooks.json` (sessionStart · subagentStop) | yes |
398
+ | **Gemini CLI** | `GEMINI.md` + `AGENTS.md` | `.claude/commands/` (read as prompts) | read as reference | model-run | yes |
399
+ | **Aider** | `.aider.conf.yml` → `AGENTS.md` | read as reference | model-run | model-run | yes |
400
+ | **Any AGENTS.md host** | `AGENTS.md` | read as reference | model-run | model-run | yes |
401
+
402
+ "model-run" = the host can't automate the ritual, so the model performs it by hand
403
+ (SESSION-OPEN at the top of each session; the COMPRESS snapshot before a reset).
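The "degrade, never vanish" rule can be sketched as a lookup with a model-run fallback. The host keys are hypothetical stand-ins, not create-issflow's actual `--tool` values. Mapping the COMPRESS snapshot to the PreCompact hook is an inference from the hook list in the table. Because Cursor lists only sessionStart and subagentStop, COMPRESS falls back to model-run there.

```javascript
// Hypothetical sketch: which rituals a host automates with a lifecycle hook,
// per the capability table. Host keys are illustrative, not create-issflow's
// actual --tool values. Anything unhooked degrades to "model-run".
const HOOKED_RITUALS = {
  "claude-code": new Set(["SESSION-OPEN", "COMPRESS"]), // SessionStart + PreCompact
  cursor: new Set(["SESSION-OPEN"]), // sessionStart only; no pre-compact hook listed
};
const RITUALS = ["SESSION-OPEN", "COMPRESS"];

function ritualWiring(host) {
  // Codex, Gemini, Aider and any other AGENTS.md host have no hooks here.
  const hooked = Object.prototype.hasOwnProperty.call(HOOKED_RITUALS, host)
    ? HOOKED_RITUALS[host]
    : new Set();
  return Object.fromEntries(
    RITUALS.map((ritual) => [ritual, hooked.has(ritual) ? "hook" : "model-run"]),
  );
}
```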