@curdx/flow 2.0.0-beta.6 → 2.0.0-beta.7

@@ -6,7 +6,7 @@
6
6
  },
7
7
  "metadata": {
8
8
  "description": "Claude Code Discipline Layer — spec-driven workflow + goal-backward verification + Karpathy 4 principles enforced via gates. Stops Claude from faking \"done\" on non-trivial features.",
9
- "version": "2.0.0-beta.6"
9
+ "version": "2.0.0-beta.7"
10
10
  },
11
11
  "plugins": [
12
12
  {
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "curdx-flow",
3
- "version": "2.0.0-beta.6",
3
+ "version": "2.0.0-beta.7",
4
4
  "description": "Claude Code Discipline Layer — spec-driven workflow + goal-backward verification + Karpathy 4 principles enforced via gates. Stops Claude from faking \"done\" on non-trivial features.",
5
5
  "author": {
6
6
  "name": "wdx",
@@ -64,29 +64,16 @@ Based on input type:
64
64
 
65
65
  ### Step 2: Round 1 — Breadth Scan
66
66
 
67
- For each of the 6 categories, use sequential-thinking **one by one**:
67
+ Walk through the applicable categories below. **Skip categories that don't apply** (e.g. no UI → UX is N/A; no auth → Security only if that absence is itself material) and note them as `N/A: <reason>` in your report. Use sequential-thinking proportional to the surface each category presents — 1 thought for a trivial check, more for genuinely complex surfaces.
68
68
 
69
- ```
70
- Round 1: Architecture layer
71
- Think: Are these decisions right? Will we regret them later? Any implicit coupling?
72
-
73
- Round 2: Implementation layer
74
- Think: Code quality? Error handling? Boundaries?
75
-
76
- Round 3: Testing layer
77
- Think: Coverage? Over-mocked? Falsely green?
69
+ - **Architecture**: Are decisions right? Will we regret them in 6 months? Any implicit coupling?
70
+ - **Implementation**: Code quality? Error handling? Boundaries?
71
+ - **Testing**: Coverage? Over-mocked? Falsely green?
72
+ - **Security**: Injection? Privilege escalation? Leakage? Auth bypass?
73
+ - **Maintainability**: Naming? Structure? Can the next maintainer understand?
74
+ - **UX** (if UI / API contract is involved): Error messages clear? Loading? Accessibility?
78
75
 
79
- Round 4: Security layer
80
- Think: Injection? Privilege escalation? Leakage? Auth bypass?
81
-
82
- Round 5: Maintainability layer
83
- Think: Naming? Structure? Can the next maintainer understand?
84
-
85
- Round 6: UX layer (if UI / API contract is involved)
86
- Think: Are error messages clear? Loading? Accessibility?
87
- ```
88
-
89
- **Key point**: every round must **specifically point out what was examined** (file:line), not vague thinking.
76
+ **Key point**: whenever you examine a category, cite what you looked at (file:line or design-doc section), not vague thinking.
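For illustration only (paths and findings hypothetical), a Round 1 note with the required specificity might read:

```markdown
- Architecture: examined AD-01~03 in design.md; no implicit coupling found
- Implementation: error handling in src/sync/queue.ts:40-75 swallows the retry error (finding)
- UX: N/A (CLI-only change, no UI or API contract)
```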
90
77
 
91
78
  ### Step 3: Judgment
92
79
 
@@ -108,24 +95,11 @@ else:
108
95
 
109
96
  ### Step 4: Round 2 — Deep Drill
110
97
 
111
- For areas where Round 1 said "looks fine", use sequential-thinking for another 6 rounds:
98
+ For the "looks fine" areas from Round 1, use sequential-thinking proportional to the residual uncertainty. Three lenses to rotate through (stop when the drill honestly surfaces nothing new, don't force all three):
112
99
 
113
- ```
114
- Rounds 1-2: Trust but verify
115
- - Round 1 I said the architecture is fine. Really?
116
- - Did I only look at the surface?
117
- - What pitfalls have similar projects (e.g., open-source comparisons) hit?
118
-
119
- Rounds 3-4: Counterfactual thinking
120
- - What happens if this system is stress-tested by an adversarial user?
121
- - As code evolves in 6 months, will this decision become a bottleneck?
122
- - What about 10x/100x load?
123
-
124
- Rounds 5-6: Boundaries and implicits
125
- - What "default behaviors" are in the code but unstated?
126
- - Has the dependency library had any famous CVEs?
127
- - What does this design assume users won't do? What if they do?
128
- ```
100
+ - **Trust but verify**: did I only look at the surface? What pitfalls have similar open-source projects hit?
101
+ - **Counterfactual**: under adversarial stress? In 6 months as the codebase evolves? At 10x / 100x load?
102
+ - **Boundaries and implicits**: what "default behaviors" are unstated? Any CVE history in the dependency? What does the design assume users won't do?
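As a sketch (file and load figure hypothetical), a deep-drill note that upgrades a "looks fine" area into a finding might read:

```markdown
- Counterfactual: at 10x load the per-request config re-parse in src/config.ts:42 dominates latency; candidate P2 finding
```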
129
103
 
130
104
  ### Step 5: Fallback If Still Zero Findings
131
105
 
@@ -134,7 +108,7 @@ If Round 2 still yields no findings, you must output a **proof report**:
134
108
  ```markdown
135
109
  ## Adversarial Review — No Sufficient Findings (Proof Report)
136
110
 
137
- In 2 rounds × 6 dimensions = 12 rounds of sequential-thinking, I checked:
111
+ Across Round 1 (breadth) and Round 2 (depth), I checked the following applicable dimensions (N/A ones listed separately):
138
112
 
139
113
  ### Architecture (specifically examined)
140
114
  - AD-01~05 in design.md
@@ -252,7 +252,7 @@ If the user agrees, suggest a set of tasks to append to tasks.md:
252
252
 
253
253
  ## Forbidden
254
254
 
255
- - ✗ Skipping any of the 7 categories (even if the project is not internationalized, at least state "I18n not applicable, reason: X")
255
+ - ✗ Silently skipping a category. N/A is fine, but every category that doesn't apply must be named with a one-line reason (e.g. "I18n: N/A — single-locale MVP")
256
256
  - ✗ Listing scenarios only from imagination (must grep the code + compare tests)
257
257
  - ✗ Not using sequential-thinking
258
258
  - ✗ Gap list without priority ordering
@@ -260,7 +260,7 @@ If the user agrees, suggest a set of tasks to append to tasks.md:
260
260
 
261
261
  ## Quality Self-Check
262
262
 
263
- - [ ] All 7 categories covered?
263
+ - [ ] Every applicable category examined, with N/A reasons recorded for the rest?
264
264
  - [ ] Each gap has category + location + scenario + risk + recommended test code?
265
265
  - [ ] Priority ordering is clear?
266
266
  - [ ] Findings proportional to real edge-case surface (zero is OK if all categories honestly N/A)
@@ -138,7 +138,7 @@ For each of the following sources, every item must be covered by tasks:
138
138
  **CRITICAL (see L8 of the preamble — long-artifact handling):**
139
139
  - Your FIRST action in this step must be a `Write` tool call with the full `tasks.md` content. Do NOT paste the file content as assistant text before writing.
140
140
  - Do NOT preview the tasks list in the response. The file itself is the deliverable.
141
- - If `tasks.md` would be >200 lines, split into `tasks-phase-1.md` … `tasks-phase-5.md` and make `tasks.md` a short index linking to them.
141
+ - If a single `Write` call would approach the sub-agent output-token budget (judge by section density, not line count — see preamble L8), split into `tasks-phase-<n>.md` files and make `tasks.md` a short index linking to them.
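A minimal sketch of such an index (phase file names hypothetical):

```markdown
# Tasks: <spec-name> (index)

- [tasks-phase-1.md](tasks-phase-1.md): Phase 1, Make It Work
- [tasks-phase-2.md](tasks-phase-2.md): Phases 2+3, Refactor and Test
- [tasks-phase-3.md](tasks-phase-3.md): Phases 4+5, Gates and PR
```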
142
142
 
143
143
  Based on `${CLAUDE_PLUGIN_ROOT}/templates/tasks.md.tmpl`. Must include a **coverage audit table** at the end (from Step 5).
144
144
 
@@ -189,7 +189,7 @@ else:
189
189
 
190
190
  **CRITICAL (see L8 of the preamble):** your FIRST action in this step must be a `Write` tool call with the **complete report content**. Do NOT paste the report as assistant text before writing. After the write succeeds, respond with a ≤ 5-line summary only (path, verdict, blocker count, next step). Do not re-paste the report.
191
191
 
192
- If the report would exceed ~200 lines, split into `review-report.md` (short index + verdict) and `review-details.md` (full findings) — two `Write` calls.
192
+ If a single `Write` call would approach the sub-agent output-token budget (judge by section density, not line count), split into `review-report.md` (short index + verdict) and `review-details.md` (full findings) — two `Write` calls. See preamble L8.
193
193
 
194
194
  Full structure (use this as the content passed to `Write`, not as preview text):
195
195
 
@@ -174,7 +174,7 @@ For each match, check:
174
174
 
175
175
  **CRITICAL (see L8 of the preamble):** your FIRST action in this step must be a `Write` tool call with the **complete report content**. Do NOT paste the report as assistant text before writing — doing so doubles output tokens and causes truncation inside the `Write` call. After the write succeeds, respond with a ≤ 5-line summary only (path, verdict counts, next step). Do not re-paste the report.
176
176
 
177
- If the report would exceed ~200 lines, split into `verification-report.md` (short index + verdict) and `verification-details.md` (full findings table) — two `Write` calls.
177
+ If a single `Write` call would approach the sub-agent output-token budget (judge by section density, not line count), split into `verification-report.md` (short index + verdict) and `verification-details.md` (full findings table) — two `Write` calls. See preamble L8.
178
178
 
179
179
  Required structure (use this as the content passed to `Write`, not as preview text):
180
180
 
package/commands/fast.md CHANGED
@@ -123,6 +123,6 @@ Choosing the right scenario matters more than forcing the flow.
123
123
  ## Forbidden
124
124
 
125
125
  - ✗ Committing without running verification
126
- - ✗ Changes touching more than 5 files (means it is no longer fast — run the full flow)
126
+ - ✗ Changes touching many unrelated files or modules (means it is no longer fast — run the full flow)
127
127
  - ✗ Writing library APIs from memory
128
128
  - ✗ Skipping the Step 2 5-question clarification (even when "obvious," explicit statement still has value)
@@ -330,7 +330,7 @@ Prerequisites:
330
330
 
331
331
  ## Step 6: Progress Feedback
332
332
 
333
- Every 5 tasks or every wave, print status:
333
+ At each wave boundary (or periodically during long linear runs), print status:
334
334
 
335
335
  ```
336
336
  ═════ Progress ═════
@@ -16,8 +16,8 @@ Distinct from `/curdx-flow:verify`:
16
16
  | Flag | Default | Purpose |
17
17
  |------|---------|---------|
18
18
  | `--stage=<1\|2\|both>` | `both` | Stage 1 = spec compliance only. Stage 2 = code quality only. `both` = sequential. |
19
- | `--adversarial` | off | Add an adversarial review pass (6 dimensions × 2 sequential-thinking rounds). Zero-findings forbidden. |
20
- | `--edge-case` | off | Add edge-case hunting across the 7 categories. Produces a test-gap checklist. |
19
+ | `--adversarial` | off | Add an adversarial review pass across applicable categories (zero findings requires proof-of-checking, not fabrication). |
20
+ | `--edge-case` | off | Add edge-case hunting across applicable categories. Produces a test-gap checklist. |
21
21
 
22
22
  ## Preflight
23
23
 
@@ -65,7 +65,7 @@ Output: Stage-2 section of the report.
65
65
  ## Optional: adversarial review
66
66
 
67
67
  If `--adversarial`:
68
- Dispatch `flow-adversary`. It runs 6 dimensions × 2 rounds of `sequential-thinking`:
68
+ Dispatch `flow-adversary`. It scans the applicable categories (Architecture / Implementation / Testing / Security / Maintainability / UX — skip N/A with reason) using `sequential-thinking` proportional to the residual uncertainty, probing:
69
69
  1. What's missing?
70
70
  2. What's overengineered?
71
71
  3. What would break first in production?
@@ -73,12 +73,12 @@ Dispatch `flow-adversary`. It runs 6 dimensions × 2 rounds of `sequential-think
73
73
  5. What decision locks us out of a future option?
74
74
  6. What would a skeptical reviewer reject?
75
75
 
76
- **Zero findings are forbidden** — if the agent reports "all good", re-dispatch with stronger skepticism. Per `@${CLAUDE_PLUGIN_ROOT}/gates/adversarial-review-gate.md`.
76
+ **Zero findings requires proof-of-checking, not fabrication** — honest "clean" verdicts are fine if the agent lists what it examined. Per `@${CLAUDE_PLUGIN_ROOT}/gates/adversarial-review-gate.md`.
77
77
 
78
78
  ## Optional: edge-case hunting
79
79
 
80
80
  If `--edge-case`:
81
- Dispatch `flow-edge-hunter` across the 7 categories:
81
+ Dispatch `flow-edge-hunter` across the applicable categories (skip N/A with one-line reason):
82
82
  1. Boundary values (0, MAX, empty, one-over-limit)
83
83
  2. Concurrency / race conditions
84
84
  3. Network failure / partial failure
package/commands/spec.md CHANGED
@@ -82,7 +82,7 @@ Output: `requirements.md` with user stories (US-NN), acceptance criteria (AC-N.N
82
82
 
83
83
  ### design → `flow-architect`
84
84
  Inputs: `research.md` + `requirements.md`.
85
- Output: `design.md` with architecture decisions (AD-NN), component boundaries, data models, error-path design, mermaid diagrams. Must use `sequential-thinking` MCP (≥8 thoughts).
85
+ Output: `design.md` with architecture decisions (AD-NN), component boundaries, data models, error-path design, mermaid diagrams (when they clarify). Uses `sequential-thinking` MCP proportional to the genuine tradeoff surface.
86
86
 
87
87
  ### tasks → `flow-planner`
88
88
  Inputs: all three prior files + `.flow/PROJECT.md` tech stack.
@@ -87,7 +87,7 @@ Input: object under review (code range / spec / PR diff)
87
87
 
88
88
  Round 1 (agent self-analysis):
89
89
  - Use sequential-thinking proportional to the surface being probed
90
- - Scan all 6 categories
90
+ - Scan each applicable category; mark N/A ones with reason
91
91
  - Output findings list
92
92
 
93
93
  Decision:
@@ -190,10 +190,10 @@ Fix loop:
190
190
 
191
191
  ## Failure Recovery
192
192
 
193
- If after 2 rounds there are still < 3 findings:
193
+ If after Round 2 the honest verdict is still zero findings, emit a proof-of-checking report (do NOT fabricate to hit a quota — there is no quota):
194
194
 
195
195
  ```markdown
196
- ## Adversarial Review — Insufficient Findings
196
+ ## Adversarial Review — Proof of Checking (zero findings)
197
197
 
198
198
  I have examined the following dimensions across 2 rounds of analysis:
199
199
 
@@ -210,7 +210,7 @@ Attach a DevEx checklist at PR time:
210
210
 
211
211
  ## Scoring
212
212
 
213
- Each dimension 0-10 points:
213
+ Score each **applicable** dimension 0-10 (N/A dimensions are excluded from the total):
214
214
 
215
215
  ```
216
216
  10 = best practice
@@ -220,8 +220,7 @@ Each dimension 0-10 points:
220
220
  0 = serious issue
221
221
  ```
222
222
 
223
- Total 40+ / 80 = pass (warning, non-blocking).
224
- Total < 40 = blocked, improvement required.
223
+ Emit the per-dimension scores with evidence. The gate itself does not block on a numeric threshold; it surfaces the weaknesses for the user (or the reviewing agent) to decide whether any of them rise to a blocker. A single 0/10 on a material dimension is a blocker regardless of the total.
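For illustration only (findings and paths hypothetical), the emitted scores might look like:

```
Architecture: 8/10 (evidence: AD-02 couples the cache to the session store, design.md §3.2)
Testing: 6/10 (evidence: error path in src/api/upload.ts:88 has no test)
UX: N/A (no user-facing surface in this change)
```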
225
224
 
226
225
  ---
227
226
 
@@ -223,13 +223,14 @@ return "linear"
223
223
 
224
224
  ## Failure Handling (common to all strategies)
225
225
 
226
- `flow-executor` agent's 5-round retry mechanism:
226
+ `flow-executor` agent's retry ladder — each step escalates only when the prior is honestly exhausted, not on a fixed count:
227
227
 
228
228
  ```
229
- Rounds 1-2: agent retries autonomously (edit code, rerun Verify)
230
- Round 3: sequential-thinking root-cause analysis 5 rounds
231
- Round 4: read related source + trace data flow
232
- Round 5: report TASK_FAILED
229
+ Step A: autonomous retry (edit + rerun Verify) — only for shallow failures
230
+ Step B: sequential-thinking root-cause analysis proportional to the hypothesis space
231
+ Step C: read related source + trace data flow
232
+ Step D: if ≥3 retries fail with no new hypothesis, stop and challenge the architecture (see preamble L3)
233
+ Step E: report TASK_FAILED
233
234
  ```
234
235
 
235
236
  ### Extra protections for Stop-Hook strategy
@@ -57,7 +57,7 @@ What's wasted isn't code — it's context tokens and decision fatigue from churn
57
57
  **Key behaviors** (flow-researcher agent):
58
58
  1. Read `.flow/PROJECT.md` and `.flow/CONTEXT.md` to understand project background
59
59
  2. Call `mcp__claude_mem__search` to retrieve relevant historical experience
60
- 3. Use sequential-thinking for 5-8 rounds of problem understanding
60
+ 3. Use sequential-thinking proportional to the unknowns (1 thought for a trivial prototype, many for a novel domain)
61
61
  4. Scan the codebase for reusable modules
62
62
  5. Use `mcp__context7__*` to look up latest docs for relevant libraries
63
63
  6. When necessary, WebSearch for the latest technical trends
@@ -99,11 +99,12 @@ What's wasted isn't code — it's context tokens and decision fatigue from churn
99
99
 
100
100
  **Key behaviors** (flow-architect agent):
101
101
  1. Read `research.md` + `requirements.md`
102
- 2. **Must use sequential-thinking for at least 8 rounds**:
103
- - Rounds 1-2: constraints
104
- - Rounds 3-5: comparison of options A/B
105
- - Rounds 6-7: selection + trade-offs
106
- - Round 8: rebut yourself
102
+ 2. **Use sequential-thinking proportional to the tradeoff surface** — the phases below are orientation, not a quota:
103
+ - Constraints (from NFR / tech stack)
104
+ - Option comparison (only when alternatives genuinely compete)
105
+ - Selection + accepted tradeoff
106
+ - Self-rebuttal
107
+ A well-known stack pick may finish in 1 thought; a distributed-system design may run many. Do not pad.
107
108
  3. Assign an `AD-NN` ID to each architectural decision
108
109
  4. Draw a data flow diagram (mermaid)
109
110
  5. Define component interfaces + error paths
@@ -125,7 +126,7 @@ What's wasted isn't code — it's context tokens and decision fatigue from churn
125
126
  3. Each task has 5 fields: `Do` / `Files` / `Done-when` / `Verify` / `Commit`
126
127
  4. **Multi-source coverage audit**: for each FR / AC / AD / decision, confirm there is a covering task (no omissions)
127
128
  5. Mark `[P]` (parallel-safe) and `[VERIFY]` (checkpoint)
128
- 6. Simple decomposition doesn't need sequential-thinking, but reflect on coverage every 5 tasks
129
+ 6. Simple decomposition doesn't need sequential-thinking; run a coverage audit at the end (every FR/AC/AD has a task)
129
130
 
130
131
  **Deliverable**: `tasks.md`
131
132
 
@@ -113,17 +113,18 @@ Stage 2 applies all enabled Gates (from `.flow/config.json`):
113
113
 
114
114
  #### 2.5 (enterprise) Adversarial review (adversarial-review-gate)
115
115
 
116
- - ≥ 3 categories of issues found?
116
+ - Every applicable category examined (N/A documented for the rest)?
117
+ - Findings proportional to real issues (zero is OK with a proof-of-checking report)?
117
118
  - Each finding has evidence + recommendation?
118
119
 
119
120
  #### 2.6 (enterprise) Edge cases (edge-case-gate)
120
121
 
121
- - Did all 7 major categories pass?
122
+ - Each applicable edge-case category addressed (N/A noted for the rest)?
122
123
  - Gap list has priorities?
123
124
 
124
125
  ### Stage 2 verdict
125
126
 
126
- - **EXCELLENT**: all enabled Gates pass, adversarial findings < 3 (high-quality code)
127
+ - **EXCELLENT**: all enabled Gates pass, adversarial review clean or only low-severity findings
127
128
  - **GOOD**: all enabled Gates pass, but some warnings
128
129
  - **NEEDS_IMPROVEMENT**: Gate violations (blocking)
129
130
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@curdx/flow",
3
- "version": "2.0.0-beta.6",
3
+ "version": "2.0.0-beta.7",
4
4
  "description": "CLI installer for CurDX-Flow — AI engineering workflow meta-framework for Claude Code",
5
5
  "type": "module",
6
6
  "bin": {
@@ -9,155 +9,75 @@ depends_on: requirements.md
9
9
 
10
10
  # Technical Design: {{SPEC_NAME}}
11
11
 
12
- > Conclusions from the flow-architect agent. Sequential-thinking is invoked proportional to the genuine tradeoff surface of this design — the thinking chain does not appear here, only the conclusions.
13
- > This document freezes the technical choices. Subsequent tasks / implementation strictly follow this design.
12
+ > Conclusions from flow-architect. Sequential-thinking is invoked proportional to the genuine tradeoff surface — the chain lives in the thinking tool, not this document.
13
+ >
14
+ > **Fill only the sections that carry real design information for this feature.** Well-known stack assemblies legitimately compress to a stack list + data model + a few real ADs. Delete sections whose honest answer would be "N/A" or "standard for this stack". A forced 13-section template is the bloat pattern this is designed to prevent.
14
15
 
15
16
  ---
16
17
 
17
18
  ## Design Overview (one paragraph)
18
19
 
19
- <!-- One-sentence summary of the architecture -->
20
+ <!-- One sentence summary of the approach. -->
20
21
 
21
22
  ## Architecture Decisions
22
23
 
23
- <!-- Each major decision gets an ID and is written to the decisions array in .flow/STATE.md -->
24
+ <!-- Each real decision gets an AD-NN. If a decision is "obvious, no alternative worth listing," use one line and move on. -->
24
25
 
25
26
  ### AD-01: ...
26
- - **Decision**: Use X instead of Y
27
+ - **Decision**: Use X
27
28
  - **Rationale**: ...
28
- - **Trade-off**: Accepted [downside] in exchange for [upside]
29
- - **sequentialthinking rounds**: rounds 3-5
30
-
31
- ### AD-02: ...
32
-
33
- ## System Architecture Diagram
34
-
35
- ```mermaid
36
- flowchart TB
37
- <!-- actual data flow generated by flow-architect -->
38
- User[User] --> API[API Gateway]
39
- API --> Auth[Auth Service]
40
- Auth --> DB[(Database)]
41
- ```
29
+ - **Trade-off**: ... (omit if there is no genuine tradeoff)
42
30
 
43
31
  ## Component Design
44
32
 
45
- <!-- Each component is independently testable. Interfaces are explicit. -->
33
+ <!-- Each component: responsibility, input type, output type, dependencies, error path. Skip if the feature is a single module with no internal boundaries worth naming. -->
46
34
 
47
- ### Component: {{COMP_NAME_1}}
35
+ ### Component: {{COMP_NAME}}
48
36
  - **Responsibility**: ...
49
- - **Input**:
50
- ```ts
51
- interface Input {
52
- field: Type;
53
- }
54
- ```
55
- - **Output**:
56
- ```ts
57
- interface Output {
58
- field: Type;
59
- }
60
- ```
61
- - **Dependencies**: Component X, Library Y
62
- - **Errors**:
63
- - `ErrorCode.X` — when ... happens
64
- - `ErrorCode.Y` — when ... happens
65
-
66
- ### Component: {{COMP_NAME_2}}
67
- <!-- ... -->
68
-
69
- ## Data Model
70
-
71
- <!-- Database schema / data structures -->
72
-
73
- ### Entity: ...
74
- ```sql
75
- CREATE TABLE ... (
76
- id UUID PRIMARY KEY,
77
- ...
78
- );
79
- ```
37
+ - **Input**: `interface Input { ... }`
38
+ - **Output**: `interface Output { ... }`
39
+ - **Dependencies**: ...
40
+ - **Errors**: ...
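A hedged illustration of a filled component block (all names hypothetical, not part of the template):

```markdown
### Component: NoteExporter
- **Responsibility**: serialize a workspace's notes to JSON
- **Input**: `interface ExportInput { workspaceId: string }`
- **Output**: `interface ExportOutput { notes: Note[] }`
- **Dependencies**: NoteStore
- **Errors**: STORE_UNAVAILABLE (persistence layer down)
```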
80
41
 
81
- ### Or TypeScript types:
82
- ```ts
83
- interface Entity {
84
- id: string;
85
- ...
86
- }
87
- ```
42
+ ## Data Model (if the feature touches persistence or structured data)
88
43
 
89
- ## State Machine (if applicable)
44
+ <!-- SQL schema, TypeScript types, or API payload shape. Delete if the feature has no meaningful data shape. -->
45
+
46
+ ## Architecture Diagram (include only when it clarifies; prose often suffices)
90
47
 
91
48
  ```mermaid
92
- stateDiagram-v2
93
- [*] --> Pending
94
- Pending --> Active: approve
95
- Pending --> Rejected: reject
96
- Active --> Completed: finish
49
+ flowchart TB
50
+ ...
97
51
  ```
98
52
 
99
- ## Error Path Design
53
+ ## State Machine (include only if the feature has non-trivial state transitions)
100
54
 
101
- <!-- Full flow on failure -->
55
+ ## Error Path Design (include when error behavior is not obvious)
102
56
 
103
- | Scenario | Upstream Behavior | System Response | User-visible |
104
- |-----|--------|---------|---------|
105
- | DB connection lost | retry 3 times | return 503 | "Temporarily unavailable, retry in 1 minute" |
106
- | Rate limit hit | none | return 429 | "Too many requests, retry in 60 seconds" |
57
+ | Scenario | System Response | User-visible |
58
+ |-----|---------|---------|
59
+ | ... | ... | ... |
107
60
 
108
- ## API Contract
109
-
110
- <!-- If this is an API project -->
61
+ ## API Contract (include only if this feature exposes or changes an API)
111
62
 
112
63
  ```yaml
113
- POST /api/v1/...
114
- Request:
115
- body:
116
- field: string
117
- Response:
118
- 200:
119
- body:
120
- field: string
121
- 400:
122
- body:
123
- error: string
64
+ ...
124
65
  ```
125
66
 
126
- ## Test Matrix
67
+ ## Test Matrix (brief — one line per layer)
127
68
 
128
69
  | Layer | Coverage | Tool |
129
70
  |---|-----|------|
130
- | Unit | All pure functions | vitest |
131
- | Integration | Between components | vitest + supertest |
132
- | E2E | Complete user flows | playwright / chrome-devtools MCP |
133
-
134
- ### Key Test Scenarios
135
- 1. Happy path: ...
136
- 2. Edge case 1: ...
137
- 3. Error recovery: ...
138
-
139
- ## Suggested Implementation Order
140
-
141
- <!-- Reference for decomposition in the tasks phase -->
142
-
143
- 1. Build skeleton first (Component A → empty implementation)
144
- 2. Then wire up the real logic (core logic of Component A)
145
- 3. Connect DB (persistence for Component A)
146
- 4. Then do Component B ...
147
-
148
- ## Risks and Mitigations
71
+ | ... | ... | ... |
149
72
 
150
- | Risk | Level | Mitigation |
151
- |-----|-----|------|
152
- | ... | medium | ... |
73
+ ## Risks and Mitigations (include only if risks exist that aren't obvious from the ADs)
153
74
 
154
75
  ## Defer to Implementation
155
76
 
156
- <!-- Decisions not worth spending time on in the design phase -->
77
+ <!-- Decisions explicitly deferred to when the executor writes the code. -->
157
78
 
158
- - Logging library choice → reuse project's existing one during implementation
159
- - Caching strategy → no caching initially, adjust based on data after launch
79
+ - ...
160
80
 
161
81
  ---
162
82
 
163
- _Generated by flow-architect agent on {{CREATED_DATE}}. After user reviews and approves AD-01~N, proceed to the tasks phase._
83
+ _Generated by flow-architect on {{CREATED_DATE}}._
@@ -9,86 +9,68 @@ depends_on: research.md
9
9
 
10
10
  # Requirements Spec: {{SPEC_NAME}}
11
11
 
12
- > **Recommended direction from the research phase**: {{RESEARCH_CONCLUSION}}
12
+ > **Recommended direction from research**: {{RESEARCH_CONCLUSION}}
13
13
  >
14
- > This phase: translate "technically feasible" into "concrete behaviors users benefit from".
14
+ > **Fill only the sections that carry real information for this feature.** Delete or collapse any section whose honest content would be "N/A" or "same as usual". Padding sections with "TBD" is worse than omitting them.
15
15
 
16
16
  ---
17
17
 
18
18
  ## User Stories
19
19
 
20
- <!-- Each story follows the format: As X, I want Y, so that Z -->
21
-
22
20
  ### US-01: ...
23
- **As** [user role],
24
- **I want** [capability],
25
- **so that** [business value].
21
+ **As** [user role], **I want** [capability], **so that** [business value].
26
22
 
27
23
  **Acceptance criteria**:
28
24
  - AC-1.1: [verifiable behavior]
29
- - AC-1.2: [verifiable behavior]
30
- - AC-1.3: [edge case handling]
25
+ - AC-1.2: ...
31
26
 
32
- ### US-02: ...
33
- <!-- ... -->
27
+ <!-- Add more US-NN blocks only if the feature genuinely has multiple independent user flows. -->
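A hedged example of the compressed single-line story format (feature hypothetical):

```markdown
### US-01: Export notes
**As** a workspace admin, **I want** a one-click JSON export of all notes, **so that** nightly backups need no manual step.

**Acceptance criteria**:
- AC-1.1: `GET /api/export` returns every note the workspace owns
- AC-1.2: exporting an empty workspace returns `[]`, not an error
```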
34
28
 
35
29
  ## Functional Requirements
36
30
 
37
- <!-- FR-NN format. Each FR must be a verifiable statement of "the system must X". -->
38
-
39
31
  - **FR-01**: The system must ...
40
- - **FR-02**: The system must ...
41
- - **FR-03**: ...
32
+ - **FR-02**: ...
42
33
 
43
34
  ## Non-Functional Requirements
44
35
 
45
- ### Performance
46
- - **NFR-P-01**: [e.g. P95 response time < 200ms]
47
- - **NFR-P-02**: ...
36
+ <!--
37
+ Include ONLY the NFR categories that this feature is actually constrained by.
38
+ For a small internal CRUD feature, "Performance / Security / Maintainability / Compatibility" as a four-bucket grid is usually padding.
39
+ Delete categories that have no real requirement, or collapse into one line: "NFR: standard for this stack, no special constraints."
40
+ -->
48
41
 
49
- ### Security
50
- - **NFR-S-01**: ...
51
- - **NFR-S-02**: ...
42
+ ### Performance (if applicable)
43
+ - **NFR-P-01**: ...
52
44
 
53
- ### Maintainability
54
- - **NFR-M-01**: ...
45
+ ### Security (if applicable)
46
+ - **NFR-S-01**: ...
55
47
 
56
- ### Compatibility
57
- - **NFR-C-01**: ...
48
+ <!-- Delete Maintainability / Compatibility sections unless they carry a real constraint. -->
58
49
 
59
50
  ## Edge Cases and Error Handling
60
51
 
61
- <!-- Must be explicit: what happens on failure? how are abnormal inputs handled? -->
52
+ <!-- Include rows only for scenarios that actually apply. -->
62
53
 
63
54
  | Scenario | Expected behavior |
64
55
  |-----|--------|
65
- | Network disconnected | ... |
66
- | Database exception | ... |
67
- | Invalid input | ... |
68
- | Concurrent conflict | ... |
56
+ | ... | ... |
69
57
 
70
58
  ## Out of Scope
71
59
 
72
- <!-- Karpathy principle 2: simplicity first. Explicitly list "not this time" to prevent scope creep. -->
73
-
74
- - ✗ Feature A — deferred to the next version
75
- - ✗ Feature B — out of budget
76
- - ✗ Feature C — needs its own spec
60
+ - ...
77
61
 
78
- ## Success Metrics
62
+ ## Success Metrics (if the feature has measurable outcomes)
79
63
 
80
- <!-- Must be quantifiable -->
64
+ <!-- Delete this section for internal tools or refactors with no user-visible metric. -->
81
65
 
82
- - Metric 1: [e.g. user signup completion rate > 80%]
83
- - Metric 2: [e.g. complaint rate < 1%]
66
+ - Metric 1: ...
84
67
 
85
68
  ## Open Questions
86
69
 
87
- <!-- Questions that need user answers -->
70
+ <!-- Include only if there are genuinely unresolved questions. Delete when empty. -->
88
71
 
89
- 1. **Question 1**: ...
90
- 2. **Question 2**: ...
72
+ 1. ...
91
73
 
92
74
  ---
93
75
 
94
- _Generated by flow-product-designer agent on {{CREATED_DATE}}. After user review, proceed to the design phase._
76
+ _Generated by flow-product-designer on {{CREATED_DATE}}._
@@ -10,105 +10,74 @@ status: in_progress
10
10
 
11
11
  > **Goal**: {{SPEC_GOAL}}
12
12
  >
13
- > Output of this phase. Subsequent requirements / design / tasks are all based on the conclusions of this document.
13
+ > **Fill only the sections that carry real information.** For a well-understood feature on a known stack, research legitimately compresses to: goal, one recommended direction, known constraints. Delete sections whose honest content would be "N/A" or "first time, nothing to fetch". Padding this document with "TBD" is worse than omitting sections.
14
14
 
15
15
  ---
16
16
 
17
- ## Prior Experience (from claude-mem)
18
-
19
- <!--
20
- flow-researcher first calls mcp__claude_mem__search to retrieve relevant history.
21
- If there are relevant observations, summarize them here; if not, write "(first research on this topic)".
22
- -->
17
+ ## Prior Experience (from claude-mem, if relevant)
23
18
 
24
19
  {{CLAUDE_MEM_FINDINGS}}
25
20
 
26
- ## Problem Understanding
21
+ <!-- Delete this section if there are no relevant prior observations. -->
27
22
 
28
- <!-- Translate the user's goal into technical language. Explicitly list assumptions. -->
23
+ ## Problem Understanding
29
24
 
30
25
  ### Core Problem
31
- <!-- One-line description of what we are solving -->
26
+ <!-- One sentence. What are we solving? -->
32
27
 
33
28
  ### Explicit Assumptions
34
- <!-- Karpathy principle 1: think before coding. List all assumptions for the user to confirm -->
29
+ <!-- Only real assumptions that matter. Don't list "assumption: we will write code." -->
30
+
35
31
  - Assumption 1: ...
36
- - Assumption 2: ...
37
32
 
38
33
  ### Known Constraints
39
- - Tech stack:
40
- - Budget / time:
41
- - Team capability:
42
- - Compliance requirements:
43
-
44
- ## Technical Solution Space
34
+ <!-- Include only the constraints that actually shape the solution. -->
45
35
 
46
- <!-- List 2-3 possible approaches with their pros and cons. Pick one in the design phase. -->
36
+ - Tech stack: ...
37
+ - Time budget: ...
38
+ - (Compliance, team capability, etc — only if they constrain this feature)
47
39
 
48
- ### Option A: ...
49
- - **Pros**:
50
- - **Cons**:
51
- - **Complexity**: low / medium / high
52
- - **Docs (context7 queries)**:
53
- - `library-name@version`: ...
40
+ ## Technical Solution Space
54
41
 
55
- ### Option B: ...
56
- - **Pros**:
57
- - **Cons**:
58
- - **Complexity**: low / medium / high
42
+ <!--
43
+ If one approach is clearly the right call for this stack, write only that approach with its rationale.
44
+ Include alternative options ONLY when there is a genuine tradeoff a thoughtful engineer might disagree on.
45
+ Do not invent Option B and Option C just to fill the template.
46
+ -->
59
47
 
60
- ### Option C (optional): ...
48
+ ### Recommended Approach: ...
49
+ - **Why**: ...
50
+ - **Complexity**: ...
51
+ - **Key APIs verified via context7**: ...
61
52
 
62
- ## Existing Code Analysis
53
+ ### Alternative: ... (include only if a real alternative exists)
63
54
 
64
- <!-- Codebase scan results. Which existing modules can be reused? Which need to be new? -->
55
+ ## Existing Code Analysis (include only if the codebase has relevant prior work)
65
56
 
66
57
  ### Reusable Modules
67
- - `path/to/existing-module.ts` — ...
68
-
69
- ### Modules to Create
70
- - `path/to/new-module.ts` — ...
71
-
72
- ### Modules to Modify
73
- - `path/to/modify.ts` — ...
74
-
75
- ## Latest Documentation Summary (context7)
76
-
77
- <!-- Latest APIs / best practices found by flow-researcher via mcp__context7__* -->
78
-
79
- ### {{LIBRARY_1}}
80
- - Version:
81
- - Relevant APIs:
82
- - Gotchas / changes:
83
-
84
- ### {{LIBRARY_2}}
85
- - ...
86
-
87
- ## Feasibility Assessment
58
+ - `path/to/module` — ...
88
59
 
89
- <!-- Explicitly answer: can this be done? how hard is it? -->
60
+ ### New Modules Required
61
+ - `path/to/new` — ...
90
62
 
91
- - **Feasibility**: feasible / ⚠ risky / ✗ not recommended
92
- - **Estimated complexity**: 1-10
93
- - **Main risks**:
94
- - Risk 1: ...
95
- - Risk 2: ...
63
+ ## Latest Documentation Summary
96
64
 
97
- ## Recommended Direction
65
+ <!-- Only include libraries whose API is version-sensitive AND used by this feature. Do not cite every library in the stack. -->
98
66
 
99
- <!-- Research conclusion: which option is recommended and why. If multiple options need discussion, explain here. -->
67
+ ### {{LIBRARY}}
68
+ - Version: ...
69
+ - Relevant APIs: ...
70
+ - Gotchas: ...
100
71
 
101
- **Recommendation**: Option ?
102
- **Rationale**:
103
- **To confirm in the design phase**:
72
+ ## Feasibility
104
73
 
105
- ## Open Questions
74
+ - **Verdict**: feasible / risky / not recommended
75
+ - **Main risks**: (only if real risks exist)
106
76
 
107
- <!-- Questions the research phase couldn't answer, to be deferred to later phases or asked of the user -->
77
+ ## Open Questions (delete if none)
108
78
 
109
79
  1. ...
110
- 2. ...
111
80
 
112
81
  ---
113
82
 
114
- _Generated by flow-researcher agent on {{CREATED_DATE}}. Subsequent phases continue from this document._
83
+ _Generated by flow-researcher on {{CREATED_DATE}}._
@@ -5,137 +5,80 @@ created: {{CREATED_DATE}}
5
5
  version: 1.0
6
6
  status: in_progress
7
7
  depends_on: design.md
8
- task_size: fine
9
8
  ---
10
9
 
11
10
  # Task Breakdown: {{SPEC_NAME}}
12
11
 
13
- > POC-First 5 Phases: **work → refactor → test → quality gates → PR lifecycle**.
12
+ > POC-First is an **orientation, not a mandate**. Use the phases below as an organizing idea and **delete phases that don't apply to this feature**. A bug-fix may be one task. A prototype may skip Phase 2 (refactor) and Phase 5 (PR lifecycle). A library may skip the PR lifecycle entirely. Forcing all five phases for a small feature is the padding pattern this template is designed to prevent.
14
13
  >
15
- > Each task includes: `Do`, `Files`, `Done-when`, `Verify`, `Commit`. Verifiable via automation.
14
+ > Each task includes whichever of `Do`, `Files`, `Done-when`, `Verify`, `Commit` the executor needs to finish it in a single sub-agent dispatch. Verify must be an automated command (no "manual test").
16
15
 
17
16
  ---
18
17
 
19
18
  ## Marker Rules
20
19
 
21
20
  - `[ ]` TODO / `[x]` done
22
- - `[P]` parallel-safe (can be dispatched in parallel within the same wave)
23
- - `[VERIFY]` quality checkpoint (run by the flow-verifier agent)
21
+ - `[P]` parallel-safe (dispatch in parallel within the same wave)
22
+ - `[VERIFY]` quality checkpoint (flow-verifier agent)
24
23
  - `[SEQUENTIAL]` must be serial (breaks the parallel group)
25
24
 
26
25
  ---
27
26
 
28
27
  ## Phase 1: Make It Work (POC)
29
28
 
30
- > Goal: get it running end-to-end. Hardcoding is acceptable; skip tests.
29
+ > Goal: end-to-end runnable. Hardcoding is acceptable; skip tests here.
31
30
 
32
- - [ ] **1.1** [P] Initialize module skeleton
33
- - **Do**: create `src/{{MODULE}}/` directory, add `index.ts`, `types.ts`
34
- - **Files**: `src/{{MODULE}}/index.ts`, `src/{{MODULE}}/types.ts`
35
- - **Done when**: directory exists, `import {} from './{{MODULE}}'` does not error
36
- - **Verify**: `npx tsc --noEmit`
37
- - **Commit**: `feat({{MODULE}}): initialize module skeleton`
38
- - _Requirements_: FR-01
31
+ <!-- Add only the tasks this feature genuinely needs. Do not invent skeleton tasks to "round out" the phase. -->
39
32
 
40
- - [ ] **1.2** [P] ...
33
+ - [ ] **1.1** ...
41
34
  - **Do**: ...
42
35
  - **Files**: ...
43
36
  - **Done when**: ...
44
37
  - **Verify**: ...
45
38
  - **Commit**: ...
46
- - _Requirements_: ...
47
- - _Design_: AD-01
39
+ - _Requirements_: FR-NN
48
40
 
49
- - [ ] **1.3** [VERIFY] End-to-end POC verification
50
- - **Do**: run the happy path manually, confirm the core scenario works
51
- - **Verify**: `curl http://localhost:3000/... | jq`
52
- - **Done when**: returns expected data (edge cases may still be wrong)
41
+ - [ ] **1.X** [VERIFY] End-to-end POC verification
42
+ - **Verify**: `<command>`
43
+ - **Done when**: happy path returns the expected result
53
44
 
54
- ## Phase 2: Refactoring
45
+ ## Phase 2: Refactoring (delete if the POC is already clean)
55
46
 
56
- > Goal: clean up the code structure. Behavior unchanged.
57
-
58
- - [ ] **2.1** Extract duplicated logic
59
- - **Do**: ...
60
- - **Verify**: `npx tsc --noEmit && git diff --stat`
61
- - **Commit**: `refactor({{MODULE}}): extract common logic`
62
-
63
- - [ ] **2.2** [VERIFY] Refactor does not break behavior
64
- - **Verify**: rerun the manual test from Phase 1
65
- - **Done when**: all outputs match
47
+ > Include only if the POC has genuine duplication or structural mud that warrants cleanup. Skip for tiny features.
66
48
 
67
49
  ## Phase 3: Testing (TDD red / green / yellow)
68
50
 
69
- > Rule: tests first. Let the test fail first (RED), then implement (GREEN), then clean up (YELLOW).
70
-
71
- - [ ] **3.1** [RED] Write failing tests — unit
72
- - **Do**: write unit tests for core functions
73
- - **Files**: `src/{{MODULE}}/*.test.ts`
74
- - **Verify**: `npm test -- src/{{MODULE}}` — expected to fail
75
- - **Commit**: `test({{MODULE}}): red - add unit tests for core logic`
76
-
77
- - [ ] **3.2** [GREEN] Make tests pass
78
- - **Do**: fix the implementation so the tests from 3.1 pass
79
- - **Verify**: `npm test -- src/{{MODULE}}` — all green
80
- - **Commit**: `feat({{MODULE}}): green - satisfy unit tests`
81
-
82
- - [ ] **3.3** [YELLOW] Refactor and clean up
83
- - **Do**: clean up the implementation, tests still pass
84
- - **Commit**: `refactor({{MODULE}}): yellow - clean implementation`
51
+ > Rule: tests first. Red → Green → Yellow. **Collapse red+green into one task when the test and implementation are trivially paired**; split only when the test genuinely precedes a nontrivial implementation.
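A sketch of a collapsed red+green task (module and command hypothetical):

```markdown
- [ ] **3.1** [RED→GREEN] Parser rejects empty input
  - **Do**: add a failing test for `parse("")`, then implement the guard until it passes
  - **Verify**: `npm test -- parser`
  - **Commit**: `test(parser): cover empty-input guard`
```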
85
52
 
86
- - [ ] **3.4** [RED GREEN YELLOW] Integration tests
87
- - <!-- Repeat the TDD cycle -->
53
+ - [ ] **3.X** [RED→GREEN→YELLOW] ...
88
54
 
89
- - [ ] **3.5** [VERIFY] Coverage check
90
- - **Verify**: `npm test -- --coverage` core logic > 80%
55
+ - [ ] **3.X+1** [VERIFY] Coverage check
56
+ - **Verify**: coverage on the changed surface ≥ project standard
91
57
 
92
58
  ## Phase 4: Quality Gates
93
59
 
94
- > Full local checks. Last gate before CI.
95
-
96
- - [ ] **4.1** TypeScript strict check
97
- - **Verify**: `npx tsc --strict --noEmit` — 0 errors
98
- - **Commit**: `chore({{MODULE}}): tsc strict passes`
99
-
100
- - [ ] **4.2** Lint
101
- - **Verify**: `npx eslint src/{{MODULE}}` — 0 errors, 0 warnings
102
-
103
- - [ ] **4.3** All tests pass
104
- - **Verify**: `npm test` — all green
105
-
106
- - [ ] **4.4** [VERIFY] Final health check
107
- - **Do**: flow-verifier agent performs goal-driven reverse verification
108
- - **Done when**: every FR-XX and AC-X.Y has a corresponding automated verification
109
-
110
- ## Phase 5: PR Lifecycle
60
+ > Include only the checks this project actually runs. `npx eslint` is dead weight if the project uses biome. `tsc --strict` is dead weight for a JS project.
111
61
 
112
- - [ ] **5.1** Generate PR
113
- - **Do**: `/flow-ship` creates the PR
114
- - **Done when**: PR URL returned, description is clear
62
+ - [ ] **4.X** [VERIFY] Final health check
63
+ - **Do**: flow-verifier performs goal-driven reverse verification
64
+ - **Done when**: every FR/AC has an automated check
115
65
 
116
- - [ ] **5.2** Respond to review feedback
117
- - **Do**: iterate until approved
118
- - **Verify**: CI all green
66
+ ## Phase 5: PR Lifecycle (delete for local-only work, scripts, internal tools without a PR flow)
119
67
 
120
- - [ ] **5.3** Merge
121
- - **Do**: `/flow-land`
122
- - **Verify**: the main branch contains all commits for this spec
68
+ - [ ] **5.X** Ship / Land
123
69
 
124
70
  ---
125
71
 
126
72
  ## Coverage Audit
127
73
 
128
- <!-- Final step for flow-planner: confirm every FR / AC / AD / D has a corresponding task -->
74
+ <!-- flow-planner fills this in. Every FR / AC / AD / D must map to a task, or explicitly defer with reason. -->
129
75
 
130
76
  | Requirement ID | Task(s) | Status |
131
77
  |--------|---------|------|
132
- | FR-01 | 1.2, 3.1 | ✓ |
133
- | FR-02 | ... | ⚠ uncovered — needs adding |
134
- | AD-01 | 1.1 | ✓ |
135
- | D-05 (STATE.md) | ... | ✓ |
78
+ | FR-01 | ... | ✓ |
136
79
 
137
- **Uncovered items must be handled**: add a task or document the deferral reason in STATE.md.
80
+ **Uncovered items must be handled**: add a task, or document the deferral reason in STATE.md.
138
81
 
139
82
  ---
140
83
 
141
- _Generated by flow-planner agent on {{CREATED_DATE}}. N tasks total, estimated X hours._
84
+ _Generated by flow-planner on {{CREATED_DATE}}._