@curdx/flow 2.0.0-beta.6 → 2.0.0-beta.7

@@ -6,7 +6,7 @@
6
6
  },
7
7
  "metadata": {
8
8
  "description": "Claude Code Discipline Layer — spec-driven workflow + goal-backward verification + Karpathy 4 principles enforced via gates. Stops Claude from faking \"done\" on non-trivial features.",
9
- "version": "2.0.0-beta.6"
9
+ "version": "2.0.0-beta.7"
10
10
  },
11
11
  "plugins": [
12
12
  {
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "curdx-flow",
3
- "version": "2.0.0-beta.6",
3
+ "version": "2.0.0-beta.7",
4
4
  "description": "Claude Code Discipline Layer — spec-driven workflow + goal-backward verification + Karpathy 4 principles enforced via gates. Stops Claude from faking \"done\" on non-trivial features.",
5
5
  "author": {
6
6
  "name": "wdx",
@@ -64,29 +64,16 @@ Based on input type:
64
64
 
65
65
  ### Step 2: Round 1 — Breadth Scan
66
66
 
67
- For each of the 6 categories, use sequential-thinking **one by one**:
67
+ Walk through the applicable categories below. **Skip categories that don't apply** (e.g. no UI → UX is N/A; no auth → Security only if that absence is itself material) and note them as `N/A: <reason>` in your report. Use sequential-thinking proportional to the surface each category presents — 1 thought for a trivial check, more for genuinely complex surfaces.
68
68
 
69
- ```
70
- Round 1: Architecture layer
71
- Think: Are these decisions right? Will we regret them later? Any implicit coupling?
72
-
73
- Round 2: Implementation layer
74
- Think: Code quality? Error handling? Boundaries?
75
-
76
- Round 3: Testing layer
77
- Think: Coverage? Over-mocked? Falsely green?
69
+ - **Architecture**: Are decisions right? Will we regret them in 6 months? Any implicit coupling?
70
+ - **Implementation**: Code quality? Error handling? Boundaries?
71
+ - **Testing**: Coverage? Over-mocked? Falsely green?
72
+ - **Security**: Injection? Privilege escalation? Leakage? Auth bypass?
73
+ - **Maintainability**: Naming? Structure? Can the next maintainer understand?
74
+ - **UX** (if UI / API contract is involved): Error messages clear? Loading? Accessibility?
78
75
 
79
- Round 4: Security layer
80
- Think: Injection? Privilege escalation? Leakage? Auth bypass?
81
-
82
- Round 5: Maintainability layer
83
- Think: Naming? Structure? Can the next maintainer understand?
84
-
85
- Round 6: UX layer (if UI / API contract is involved)
86
- Think: Are error messages clear? Loading? Accessibility?
87
- ```
88
-
89
- **Key point**: every round must **specifically point out what was examined** (file:line), not vague thinking.
76
+ **Key point**: whenever you examine a category, cite what you looked at (file:line or design-doc section), not vague thinking.
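For illustration only (paths and findings hypothetical), a Round 1 note with the required specificity might read:

```markdown
- Architecture: examined AD-01~03 in design.md; no implicit coupling found
- Implementation: error handling in src/sync/queue.ts:40-75 swallows the retry error (finding)
- UX: N/A (CLI-only change, no UI or API contract)
```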
90
77
 
91
78
  ### Step 3: Judgment
92
79
 
@@ -108,24 +95,11 @@ else:
108
95
 
109
96
  ### Step 4: Round 2 — Deep Drill
110
97
 
111
- For areas where Round 1 said "looks fine", use sequential-thinking for another 6 rounds:
98
+ For the "looks fine" areas from Round 1, use sequential-thinking proportional to the residual uncertainty. Three lenses to rotate through (stop when the drill honestly surfaces nothing new, don't force all three):
112
99
 
113
- ```
114
- Rounds 1-2: Trust but verify
115
- - Round 1 I said the architecture is fine. Really?
116
- - Did I only look at the surface?
117
- - What pitfalls have similar projects (e.g., open-source comparisons) hit?
118
-
119
- Rounds 3-4: Counterfactual thinking
120
- - What happens if this system is stress-tested by an adversarial user?
121
- - As code evolves in 6 months, will this decision become a bottleneck?
122
- - What about 10x/100x load?
123
-
124
- Rounds 5-6: Boundaries and implicits
125
- - What "default behaviors" are in the code but unstated?
126
- - Has the dependency library had any famous CVEs?
127
- - What does this design assume users won't do? What if they do?
128
- ```
100
+ - **Trust but verify**: did I only look at the surface? What pitfalls have similar open-source projects hit?
101
+ - **Counterfactual**: under adversarial stress? In 6 months as the codebase evolves? At 10x / 100x load?
102
+ - **Boundaries and implicits**: what "default behaviors" are unstated? Any CVE history in the dependency? What does the design assume users won't do?
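As a sketch (file and load figure hypothetical), a deep-drill note that upgrades a "looks fine" area into a finding might read:

```markdown
- Counterfactual: at 10x load the per-request config re-parse in src/config.ts:42 dominates latency; candidate P2 finding
```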
129
103
 
130
104
  ### Step 5: Fallback If Still Zero Findings
131
105
 
@@ -134,7 +108,7 @@ If Round 2 still yields no findings, you must output a **proof report**:
134
108
  ```markdown
135
109
  ## Adversarial Review — No Sufficient Findings (Proof Report)
136
110
 
137
- In 2 rounds × 6 dimensions = 12 rounds of sequential-thinking, I checked:
111
+ Across Round 1 (breadth) and Round 2 (depth), I checked the following applicable dimensions (N/A ones listed separately):
138
112
 
139
113
  ### Architecture (specifically examined)
140
114
  - AD-01~05 in design.md
@@ -252,7 +252,7 @@ If the user agrees, suggest a set of tasks to append to tasks.md:
252
252
 
253
253
  ## Forbidden
254
254
 
255
- - ✗ Skipping any of the 7 categories (even if the project is not internationalized, at least state "I18n not applicable, reason: X")
255
+ - ✗ Silently skipping a category. N/A is fine, but every category that doesn't apply must be named with a one-line reason (e.g. "I18n: N/A — single-locale MVP")
256
256
  - ✗ Listing scenarios only from imagination (must grep the code + compare tests)
257
257
  - ✗ Not using sequential-thinking
258
258
  - ✗ Gap list without priority ordering
@@ -260,7 +260,7 @@ If the user agrees, suggest a set of tasks to append to tasks.md:
260
260
 
261
261
  ## Quality Self-Check
262
262
 
263
- - [ ] All 7 categories covered?
263
+ - [ ] Every applicable category examined, with N/A reasons recorded for the rest?
264
264
  - [ ] Each gap has category + location + scenario + risk + recommended test code?
265
265
  - [ ] Priority ordering is clear?
266
266
  - [ ] Findings proportional to real edge-case surface (zero is OK if all categories honestly N/A)
@@ -138,7 +138,7 @@ For each of the following sources, every item must be covered by tasks:
138
138
  **CRITICAL (see L8 of the preamble — long-artifact handling):**
139
139
  - Your FIRST action in this step must be a `Write` tool call with the full `tasks.md` content. Do NOT paste the file content as assistant text before writing.
140
140
  - Do NOT preview the tasks list in the response. The file itself is the deliverable.
141
- - If `tasks.md` would be >200 lines, split into `tasks-phase-1.md` … `tasks-phase-5.md` and make `tasks.md` a short index linking to them.
141
+ - If a single `Write` call would approach the sub-agent output-token budget (judge by section density, not line count — see preamble L8), split into `tasks-phase-<n>.md` files and make `tasks.md` a short index linking to them.
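A minimal sketch of such an index (phase file names hypothetical):

```markdown
# Tasks: <spec-name> (index)

- [tasks-phase-1.md](tasks-phase-1.md): Phase 1, Make It Work
- [tasks-phase-2.md](tasks-phase-2.md): Phases 2+3, Refactor and Test
- [tasks-phase-3.md](tasks-phase-3.md): Phases 4+5, Gates and PR
```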
142
142
 
143
143
  Based on `${CLAUDE_PLUGIN_ROOT}/templates/tasks.md.tmpl`. Must include a **coverage audit table** at the end (from Step 5).
144
144
 
@@ -189,7 +189,7 @@ else:
189
189
 
190
190
  **CRITICAL (see L8 of the preamble):** your FIRST action in this step must be a `Write` tool call with the **complete report content**. Do NOT paste the report as assistant text before writing. After the write succeeds, respond with a ≤ 5-line summary only (path, verdict, blocker count, next step). Do not re-paste the report.
191
191
 
192
- If the report would exceed ~200 lines, split into `review-report.md` (short index + verdict) and `review-details.md` (full findings) — two `Write` calls.
192
+ If a single `Write` call would approach the sub-agent output-token budget (judge by section density, not line count), split into `review-report.md` (short index + verdict) and `review-details.md` (full findings) — two `Write` calls. See preamble L8.
193
193
 
194
194
  Full structure (use this as the content passed to `Write`, not as preview text):
195
195
 
@@ -174,7 +174,7 @@ For each match, check:
174
174
 
175
175
  **CRITICAL (see L8 of the preamble):** your FIRST action in this step must be a `Write` tool call with the **complete report content**. Do NOT paste the report as assistant text before writing — doing so doubles output tokens and causes truncation inside the `Write` call. After the write succeeds, respond with a ≤ 5-line summary only (path, verdict counts, next step). Do not re-paste the report.
176
176
 
177
- If the report would exceed ~200 lines, split into `verification-report.md` (short index + verdict) and `verification-details.md` (full findings table) — two `Write` calls.
177
+ If a single `Write` call would approach the sub-agent output-token budget (judge by section density, not line count), split into `verification-report.md` (short index + verdict) and `verification-details.md` (full findings table) — two `Write` calls. See preamble L8.
178
178
 
179
179
  Required structure (use this as the content passed to `Write`, not as preview text):
180
180
 
package/commands/fast.md CHANGED
@@ -123,6 +123,6 @@ Choosing the right scenario matters more than forcing the flow.
123
123
  ## Forbidden
124
124
 
125
125
  - ✗ Committing without running verification
126
- - ✗ Changes touching more than 5 files (means it is no longer fast — run the full flow)
126
+ - ✗ Changes touching many unrelated files or modules (means it is no longer fast — run the full flow)
127
127
  - ✗ Writing library APIs from memory
128
128
  - ✗ Skipping the Step 2 5-question clarification (even when "obvious," explicit statement still has value)
@@ -330,7 +330,7 @@ Prerequisites:
330
330
 
331
331
  ## Step 6: Progress Feedback
332
332
 
333
- Every 5 tasks or every wave, print status:
333
+ At each wave boundary (or periodically during long linear runs), print status:
334
334
 
335
335
  ```
336
336
  ═════ Progress ═════
@@ -16,8 +16,8 @@ Distinct from `/curdx-flow:verify`:
16
16
  | Flag | Default | Purpose |
17
17
  |------|---------|---------|
18
18
  | `--stage=<1\|2\|both>` | `both` | Stage 1 = spec compliance only. Stage 2 = code quality only. `both` = sequential. |
19
- | `--adversarial` | off | Add an adversarial review pass (6 dimensions × 2 sequential-thinking rounds). Zero-findings forbidden. |
20
- | `--edge-case` | off | Add edge-case hunting across the 7 categories. Produces a test-gap checklist. |
19
+ | `--adversarial` | off | Add an adversarial review pass across applicable categories (zero findings requires proof-of-checking, not fabrication). |
20
+ | `--edge-case` | off | Add edge-case hunting across applicable categories. Produces a test-gap checklist. |
21
21
 
22
22
  ## Preflight
23
23
 
@@ -65,7 +65,7 @@ Output: Stage-2 section of the report.
65
65
  ## Optional: adversarial review
66
66
 
67
67
  If `--adversarial`:
68
- Dispatch `flow-adversary`. It runs 6 dimensions × 2 rounds of `sequential-thinking`:
68
+ Dispatch `flow-adversary`. It scans the applicable categories (Architecture / Implementation / Testing / Security / Maintainability / UX — skip N/A with reason) using `sequential-thinking` proportional to the residual uncertainty, probing:
69
69
  1. What's missing?
70
70
  2. What's overengineered?
71
71
  3. What would break first in production?
@@ -73,12 +73,12 @@ Dispatch `flow-adversary`. It runs 6 dimensions × 2 rounds of `sequential-think
73
73
  5. What decision locks us out of a future option?
74
74
  6. What would a skeptical reviewer reject?
75
75
 
76
- **Zero findings are forbidden** — if the agent reports "all good", re-dispatch with stronger skepticism. Per `@${CLAUDE_PLUGIN_ROOT}/gates/adversarial-review-gate.md`.
76
+ **Zero findings requires proof-of-checking, not fabrication** — honest "clean" verdicts are fine if the agent lists what it examined. Per `@${CLAUDE_PLUGIN_ROOT}/gates/adversarial-review-gate.md`.
77
77
 
78
78
  ## Optional: edge-case hunting
79
79
 
80
80
  If `--edge-case`:
81
- Dispatch `flow-edge-hunter` across the 7 categories:
81
+ Dispatch `flow-edge-hunter` across the applicable categories (skip N/A with one-line reason):
82
82
  1. Boundary values (0, MAX, empty, one-over-limit)
83
83
  2. Concurrency / race conditions
84
84
  3. Network failure / partial failure
package/commands/spec.md CHANGED
@@ -82,7 +82,7 @@ Output: `requirements.md` with user stories (US-NN), acceptance criteria (AC-N.N
82
82
 
83
83
  ### design → `flow-architect`
84
84
  Inputs: `research.md` + `requirements.md`.
85
- Output: `design.md` with architecture decisions (AD-NN), component boundaries, data models, error-path design, mermaid diagrams. Must use `sequential-thinking` MCP (≥8 thoughts).
85
+ Output: `design.md` with architecture decisions (AD-NN), component boundaries, data models, error-path design, mermaid diagrams (when they clarify). Uses `sequential-thinking` MCP proportional to the genuine tradeoff surface.
86
86
 
87
87
  ### tasks → `flow-planner`
88
88
  Inputs: all three prior files + `.flow/PROJECT.md` tech stack.
@@ -87,7 +87,7 @@ Input: object under review (code range / spec / PR diff)
87
87
 
88
88
  Round 1 (agent self-analysis):
89
89
  - Use sequential-thinking proportional to the surface being probed
90
- - Scan all 6 categories
90
+ - Scan each applicable category; mark N/A ones with reason
91
91
  - Output findings list
92
92
 
93
93
  Decision:
@@ -190,10 +190,10 @@ Fix loop:
190
190
 
191
191
  ## Failure Recovery
192
192
 
193
- If after 2 rounds there are still < 3 findings:
193
+ If after Round 2 the honest verdict is still zero findings, emit a proof-of-checking report (do NOT fabricate to hit a quota — there is no quota):
194
194
 
195
195
  ```markdown
196
- ## Adversarial Review — Insufficient Findings
196
+ ## Adversarial Review — Proof of Checking (zero findings)
197
197
 
198
198
  I have examined the following dimensions across 2 rounds of analysis:
199
199
 
@@ -210,7 +210,7 @@ Attach a DevEx checklist at PR time:
210
210
 
211
211
  ## Scoring
212
212
 
213
- Each dimension 0-10 points:
213
+ Score each **applicable** dimension 0-10 (N/A dimensions are excluded from the total):
214
214
 
215
215
  ```
216
216
  10 = best practice
@@ -220,8 +220,7 @@ Each dimension 0-10 points:
220
220
  0 = serious issue
221
221
  ```
222
222
 
223
- Total 40+ / 80 = pass (warning, non-blocking).
224
- Total < 40 = blocked, improvement required.
223
+ Emit the per-dimension scores with evidence. The gate itself does not block on a numeric threshold; it surfaces the weaknesses for the user (or the reviewing agent) to decide whether any of them rise to a blocker. A single 0/10 on a material dimension is a blocker regardless of the total.
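For illustration only (findings and paths hypothetical), the emitted scores might look like:

```
Architecture: 8/10 (evidence: AD-02 couples the cache to the session store, design.md §3.2)
Testing: 6/10 (evidence: error path in src/api/upload.ts:88 has no test)
UX: N/A (no user-facing surface in this change)
```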
225
224
 
226
225
  ---
227
226
 
@@ -223,13 +223,14 @@ return "linear"
223
223
 
224
224
  ## Failure Handling (common to all strategies)
225
225
 
226
- `flow-executor` agent's 5-round retry mechanism:
226
+ `flow-executor` agent's retry ladder — each step escalates only when the prior is honestly exhausted, not on a fixed count:
227
227
 
228
228
  ```
229
- Rounds 1-2: agent retries autonomously (edit code, rerun Verify)
230
- Round 3: sequential-thinking root-cause analysis 5 rounds
231
- Round 4: read related source + trace data flow
232
- Round 5: report TASK_FAILED
229
+ Step A: autonomous retry (edit + rerun Verify) — only for shallow failures
230
+ Step B: sequential-thinking root-cause analysis proportional to the hypothesis space
231
+ Step C: read related source + trace data flow
232
+ Step D: if ≥3 retries fail with no new hypothesis, stop and challenge the architecture (see preamble L3)
233
+ Step E: report TASK_FAILED
233
234
  ```
234
235
 
235
236
  ### Extra protections for Stop-Hook strategy
@@ -57,7 +57,7 @@ What's wasted isn't code — it's context tokens and decision fatigue from churn
57
57
  **Key behaviors** (flow-researcher agent):
58
58
  1. Read `.flow/PROJECT.md` and `.flow/CONTEXT.md` to understand project background
59
59
  2. Call `mcp__claude_mem__search` to retrieve relevant historical experience
60
- 3. Use sequential-thinking for 5-8 rounds of problem understanding
60
+ 3. Use sequential-thinking proportional to the unknowns (1 thought for a trivial prototype, many for a novel domain)
61
61
  4. Scan the codebase for reusable modules
62
62
  5. Use `mcp__context7__*` to look up latest docs for relevant libraries
63
63
  6. When necessary, WebSearch for the latest technical trends
@@ -99,11 +99,12 @@ What's wasted isn't code — it's context tokens and decision fatigue from churn
99
99
 
100
100
  **Key behaviors** (flow-architect agent):
101
101
  1. Read `research.md` + `requirements.md`
102
- 2. **Must use sequential-thinking for at least 8 rounds**:
103
- - Rounds 1-2: constraints
104
- - Rounds 3-5: comparison of options A/B
105
- - Rounds 6-7: selection + trade-offs
106
- - Round 8: rebut yourself
102
+ 2. **Use sequential-thinking proportional to the tradeoff surface** — the phases below are orientation, not a quota:
103
+ - Constraints (from NFR / tech stack)
104
+ - Option comparison (only when alternatives genuinely compete)
105
+ - Selection + accepted tradeoff
106
+ - Self-rebuttal
107
+ A well-known stack pick may finish in 1 thought; a distributed-system design may run many. Do not pad.
107
108
  3. Assign an `AD-NN` ID to each architectural decision
108
109
  4. Draw a data flow diagram (mermaid)
109
110
  5. Define component interfaces + error paths
@@ -125,7 +126,7 @@ What's wasted isn't code — it's context tokens and decision fatigue from churn
125
126
  3. Each task has 5 fields: `Do` / `Files` / `Done-when` / `Verify` / `Commit`
126
127
  4. **Multi-source coverage audit**: for each FR / AC / AD / decision, confirm there is a covering task (no omissions)
127
128
  5. Mark `[P]` (parallel-safe) and `[VERIFY]` (checkpoint)
128
- 6. Simple decomposition doesn't need sequential-thinking, but reflect on coverage every 5 tasks
129
+ 6. Simple decomposition doesn't need sequential-thinking; run a coverage audit at the end (every FR/AC/AD has a task)
129
130
 
130
131
  **Deliverable**: `tasks.md`
131
132
 
@@ -113,17 +113,18 @@ Stage 2 applies all enabled Gates (from `.flow/config.json`):
113
113
 
114
114
  #### 2.5 (enterprise) Adversarial review (adversarial-review-gate)
115
115
 
116
- - ≥ 3 categories of issues found?
116
+ - Every applicable category examined (N/A documented for the rest)?
117
+ - Findings proportional to real issues (zero is OK with a proof-of-checking report)?
117
118
  - Each finding has evidence + recommendation?
118
119
 
119
120
  #### 2.6 (enterprise) Edge cases (edge-case-gate)
120
121
 
121
- - Did all 7 major categories pass?
122
+ - Each applicable edge-case category addressed (N/A noted for the rest)?
122
123
  - Gap list has priorities?
123
124
 
124
125
  ### Stage 2 verdict
125
126
 
126
- - **EXCELLENT**: all enabled Gates pass, adversarial findings < 3 (high-quality code)
127
+ - **EXCELLENT**: all enabled Gates pass, adversarial review clean or only low-severity findings
127
128
  - **GOOD**: all enabled Gates pass, but some warnings
128
129
  - **NEEDS_IMPROVEMENT**: Gate violations (blocking)
129
130
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@curdx/flow",
3
- "version": "2.0.0-beta.6",
3
+ "version": "2.0.0-beta.7",
4
4
  "description": "CLI installer for CurDX-Flow — AI engineering workflow meta-framework for Claude Code",
5
5
  "type": "module",
6
6
  "bin": {
@@ -9,155 +9,75 @@ depends_on: requirements.md
9
9
 
10
10
  # Technical Design: {{SPEC_NAME}}
11
11
 
12
- > Conclusions from the flow-architect agent. Sequential-thinking is invoked proportional to the genuine tradeoff surface of this design — the thinking chain does not appear here, only the conclusions.
13
- > This document freezes the technical choices. Subsequent tasks / implementation strictly follow this design.
12
+ > Conclusions from flow-architect. Sequential-thinking is invoked proportional to the genuine tradeoff surface — the chain lives in the thinking tool, not this document.
13
+ >
14
+ > **Fill only the sections that carry real design information for this feature.** Well-known stack assemblies legitimately compress to a stack list + data model + a few real ADs. Delete sections whose honest answer would be "N/A" or "standard for this stack". A forced 13-section template is the bloat pattern this is designed to prevent.
14
15
 
15
16
  ---
16
17
 
17
18
  ## Design Overview (one paragraph)
18
19
 
19
- <!-- One-sentence summary of the architecture -->
20
+ <!-- One sentence summary of the approach. -->
20
21
 
21
22
  ## Architecture Decisions
22
23
 
23
- <!-- Each major decision gets an ID and is written to the decisions array in .flow/STATE.md -->
24
+ <!-- Each real decision gets an AD-NN. If a decision is "obvious, no alternative worth listing," use one line and move on. -->
24
25
 
25
26
  ### AD-01: ...
26
- - **Decision**: Use X instead of Y
27
+ - **Decision**: Use X
27
28
  - **Rationale**: ...
28
- - **Trade-off**: Accepted [downside] in exchange for [upside]
29
- - **sequentialthinking rounds**: rounds 3-5
30
-
31
- ### AD-02: ...
32
-
33
- ## System Architecture Diagram
34
-
35
- ```mermaid
36
- flowchart TB
37
- <!-- actual data flow generated by flow-architect -->
38
- User[User] --> API[API Gateway]
39
- API --> Auth[Auth Service]
40
- Auth --> DB[(Database)]
41
- ```
29
+ - **Trade-off**: ... (omit if there is no genuine tradeoff)
42
30
 
43
31
  ## Component Design
44
32
 
45
- <!-- Each component is independently testable. Interfaces are explicit. -->
33
+ <!-- Each component: responsibility, input type, output type, dependencies, error path. Skip if the feature is a single module with no internal boundaries worth naming. -->
46
34
 
47
- ### Component: {{COMP_NAME_1}}
35
+ ### Component: {{COMP_NAME}}
48
36
  - **Responsibility**: ...
49
- - **Input**:
50
- ```ts
51
- interface Input {
52
- field: Type;
53
- }
54
- ```
55
- - **Output**:
56
- ```ts
57
- interface Output {
58
- field: Type;
59
- }
60
- ```
61
- - **Dependencies**: Component X, Library Y
62
- - **Errors**:
63
- - `ErrorCode.X` — when ... happens
64
- - `ErrorCode.Y` — when ... happens
65
-
66
- ### Component: {{COMP_NAME_2}}
67
- <!-- ... -->
68
-
69
- ## Data Model
70
-
71
- <!-- Database schema / data structures -->
72
-
73
- ### Entity: ...
74
- ```sql
75
- CREATE TABLE ... (
76
- id UUID PRIMARY KEY,
77
- ...
78
- );
79
- ```
37
+ - **Input**: `interface Input { ... }`
38
+ - **Output**: `interface Output { ... }`
39
+ - **Dependencies**: ...
40
+ - **Errors**: ...
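A hedged illustration of a filled component block (all names hypothetical, not part of the template):

```markdown
### Component: NoteExporter
- **Responsibility**: serialize a workspace's notes to JSON
- **Input**: `interface ExportInput { workspaceId: string }`
- **Output**: `interface ExportOutput { notes: Note[] }`
- **Dependencies**: NoteStore
- **Errors**: STORE_UNAVAILABLE (persistence layer down)
```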
80
41
 
81
- ### Or TypeScript types:
82
- ```ts
83
- interface Entity {
84
- id: string;
85
- ...
86
- }
87
- ```
42
+ ## Data Model (if the feature touches persistence or structured data)
88
43
 
89
- ## State Machine (if applicable)
44
+ <!-- SQL schema, TypeScript types, or API payload shape. Delete if the feature has no meaningful data shape. -->
45
+
46
+ ## Architecture Diagram (include only when it clarifies; prose often suffices)
90
47
 
91
48
  ```mermaid
92
- stateDiagram-v2
93
- [*] --> Pending
94
- Pending --> Active: approve
95
- Pending --> Rejected: reject
96
- Active --> Completed: finish
49
+ flowchart TB
50
+ ...
97
51
  ```
98
52
 
99
- ## Error Path Design
53
+ ## State Machine (include only if the feature has non-trivial state transitions)
100
54
 
101
- <!-- Full flow on failure -->
55
+ ## Error Path Design (include when error behavior is not obvious)
102
56
 
103
- | Scenario | Upstream Behavior | System Response | User-visible |
104
- |-----|--------|---------|---------|
105
- | DB connection lost | retry 3 times | return 503 | "Temporarily unavailable, retry in 1 minute" |
106
- | Rate limit hit | none | return 429 | "Too many requests, retry in 60 seconds" |
57
+ | Scenario | System Response | User-visible |
58
+ |-----|---------|---------|
59
+ | ... | ... | ... |
107
60
 
108
- ## API Contract
109
-
110
- <!-- If this is an API project -->
61
+ ## API Contract (include only if this feature exposes or changes an API)
111
62
 
112
63
  ```yaml
113
- POST /api/v1/...
114
- Request:
115
- body:
116
- field: string
117
- Response:
118
- 200:
119
- body:
120
- field: string
121
- 400:
122
- body:
123
- error: string
64
+ ...
124
65
  ```
125
66
 
126
- ## Test Matrix
67
+ ## Test Matrix (brief — one line per layer)
127
68
 
128
69
  | Layer | Coverage | Tool |
129
70
  |---|-----|------|
130
- | Unit | All pure functions | vitest |
131
- | Integration | Between components | vitest + supertest |
132
- | E2E | Complete user flows | playwright / chrome-devtools MCP |
133
-
134
- ### Key Test Scenarios
135
- 1. Happy path: ...
136
- 2. Edge case 1: ...
137
- 3. Error recovery: ...
138
-
139
- ## Suggested Implementation Order
140
-
141
- <!-- Reference for decomposition in the tasks phase -->
142
-
143
- 1. Build skeleton first (Component A → empty implementation)
144
- 2. Then wire up the real logic (core logic of Component A)
145
- 3. Connect DB (persistence for Component A)
146
- 4. Then do Component B ...
147
-
148
- ## Risks and Mitigations
71
+ | ... | ... | ... |
149
72
 
150
- | Risk | Level | Mitigation |
151
- |-----|-----|------|
152
- | ... | medium | ... |
73
+ ## Risks and Mitigations (include only if risks exist that aren't obvious from the ADs)
153
74
 
154
75
  ## Defer to Implementation
155
76
 
156
- <!-- Decisions not worth spending time on in the design phase -->
77
+ <!-- Decisions explicitly deferred to when the executor writes the code. -->
157
78
 
158
- - Logging library choice → reuse project's existing one during implementation
159
- - Caching strategy → no caching initially, adjust based on data after launch
79
+ - ...
160
80
 
161
81
  ---
162
82
 
163
- _Generated by flow-architect agent on {{CREATED_DATE}}. After user reviews and approves AD-01~N, proceed to the tasks phase._
83
+ _Generated by flow-architect on {{CREATED_DATE}}._
@@ -9,86 +9,68 @@ depends_on: research.md
9
9
 
10
10
  # Requirements Spec: {{SPEC_NAME}}
11
11
 
12
- > **Recommended direction from the research phase**: {{RESEARCH_CONCLUSION}}
12
+ > **Recommended direction from research**: {{RESEARCH_CONCLUSION}}
13
13
  >
14
- > This phase: translate "technically feasible" into "concrete behaviors users benefit from".
14
+ > **Fill only the sections that carry real information for this feature.** Delete or collapse any section whose honest content would be "N/A" or "same as usual". Padding sections with "TBD" is worse than omitting them.
15
15
 
16
16
  ---
17
17
 
18
18
  ## User Stories
19
19
 
20
- <!-- Each story follows the format: As X, I want Y, so that Z -->
21
-
22
20
  ### US-01: ...
23
- **As** [user role],
24
- **I want** [capability],
25
- **so that** [business value].
21
+ **As** [user role], **I want** [capability], **so that** [business value].
26
22
 
27
23
  **Acceptance criteria**:
28
24
  - AC-1.1: [verifiable behavior]
29
- - AC-1.2: [verifiable behavior]
30
- - AC-1.3: [edge case handling]
25
+ - AC-1.2: ...
31
26
 
32
- ### US-02: ...
33
- <!-- ... -->
27
+ <!-- Add more US-NN blocks only if the feature genuinely has multiple independent user flows. -->
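A hedged example of the compressed single-line story format (feature hypothetical):

```markdown
### US-01: Export notes
**As** a workspace admin, **I want** a one-click JSON export of all notes, **so that** nightly backups need no manual step.

**Acceptance criteria**:
- AC-1.1: `GET /api/export` returns every note the workspace owns
- AC-1.2: exporting an empty workspace returns `[]`, not an error
```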
34
28
 
35
29
  ## Functional Requirements
36
30
 
37
- <!-- FR-NN format. Each FR must be a verifiable statement of "the system must X". -->
38
-
39
31
  - **FR-01**: The system must ...
40
- - **FR-02**: The system must ...
41
- - **FR-03**: ...
32
+ - **FR-02**: ...
42
33
 
43
34
  ## Non-Functional Requirements
44
35
 
45
- ### Performance
46
- - **NFR-P-01**: [e.g. P95 response time < 200ms]
47
- - **NFR-P-02**: ...
36
+ <!--
37
+ Include ONLY the NFR categories that this feature is actually constrained by.
38
+ For a small internal CRUD feature, "Performance / Security / Maintainability / Compatibility" as a four-bucket grid is usually padding.
39
+ Delete categories that have no real requirement, or collapse into one line: "NFR: standard for this stack, no special constraints."
40
+ -->
48
41
 
49
- ### Security
50
- - **NFR-S-01**: ...
51
- - **NFR-S-02**: ...
42
+ ### Performance (if applicable)
43
+ - **NFR-P-01**: ...
52
44
 
53
- ### Maintainability
54
- - **NFR-M-01**: ...
45
+ ### Security (if applicable)
46
+ - **NFR-S-01**: ...
55
47
 
56
- ### Compatibility
57
- - **NFR-C-01**: ...
48
+ <!-- Delete Maintainability / Compatibility sections unless they carry a real constraint. -->
58
49
 
59
50
  ## Edge Cases and Error Handling
60
51
 
61
- <!-- Must be explicit: what happens on failure? how are abnormal inputs handled? -->
52
+ <!-- Include rows only for scenarios that actually apply. -->
62
53
 
63
54
  | Scenario | Expected behavior |
64
55
  |-----|--------|
65
- | Network disconnected | ... |
66
- | Database exception | ... |
67
- | Invalid input | ... |
68
- | Concurrent conflict | ... |
56
+ | ... | ... |
69
57
 
70
58
  ## Out of Scope
71
59
 
72
- <!-- Karpathy principle 2: simplicity first. Explicitly list "not this time" to prevent scope creep. -->
73
-
74
- - ✗ Feature A — deferred to the next version
75
- - ✗ Feature B — out of budget
76
- - ✗ Feature C — needs its own spec
60
+ - ...
77
61
 
78
- ## Success Metrics
62
+ ## Success Metrics (if the feature has measurable outcomes)
79
63
 
80
- <!-- Must be quantifiable -->
64
+ <!-- Delete this section for internal tools or refactors with no user-visible metric. -->
81
65
 
82
- - Metric 1: [e.g. user signup completion rate > 80%]
83
- - Metric 2: [e.g. complaint rate < 1%]
66
+ - Metric 1: ...
84
67
 
85
68
  ## Open Questions
86
69
 
87
- <!-- Questions that need user answers -->
70
+ <!-- Include only if there are genuinely unresolved questions. Delete when empty. -->
88
71
 
89
- 1. **Question 1**: ...
90
- 2. **Question 2**: ...
72
+ 1. ...
91
73
 
92
74
  ---
93
75
 
94
- _Generated by flow-product-designer agent on {{CREATED_DATE}}. After user review, proceed to the design phase._
76
+ _Generated by flow-product-designer on {{CREATED_DATE}}._
@@ -10,105 +10,74 @@ status: in_progress
10
10
 
11
11
  > **Goal**: {{SPEC_GOAL}}
12
12
  >
13
- > Output of this phase. Subsequent requirements / design / tasks are all based on the conclusions of this document.
13
+ > **Fill only the sections that carry real information.** For a well-understood feature on a known stack, research legitimately compresses to: goal, one recommended direction, known constraints. Delete sections whose honest content would be "N/A" or "first time, nothing to fetch". Padding this document with "TBD" is worse than omitting sections.
14
14
 
15
15
  ---
16
16
 
17
- ## Prior Experience (from claude-mem)
18
-
19
- <!--
20
- flow-researcher first calls mcp__claude_mem__search to retrieve relevant history.
21
- If there are relevant observations, summarize them here; if not, write "(first research on this topic)".
22
- -->
17
+ ## Prior Experience (from claude-mem, if relevant)
23
18
 
24
19
  {{CLAUDE_MEM_FINDINGS}}
25
20
 
26
- ## Problem Understanding
21
+ <!-- Delete this section if there are no relevant prior observations. -->
27
22
 
28
- <!-- Translate the user's goal into technical language. Explicitly list assumptions. -->
23
+ ## Problem Understanding
29
24
 
30
25
  ### Core Problem
31
- <!-- One-line description of what we are solving -->
26
+ <!-- One sentence. What are we solving? -->
32
27
 
33
28
  ### Explicit Assumptions
34
- <!-- Karpathy principle 1: think before coding. List all assumptions for the user to confirm -->
29
+ <!-- Only real assumptions that matter. Don't list "assumption: we will write code." -->
30
+
35
31
  - Assumption 1: ...
36
- - Assumption 2: ...
37
32
 
38
33
  ### Known Constraints
39
- - Tech stack:
40
- - Budget / time:
41
- - Team capability:
42
- - Compliance requirements:
43
-
44
- ## Technical Solution Space
34
+ <!-- Include only the constraints that actually shape the solution. -->
45
35
 
46
- <!-- List 2-3 possible approaches with their pros and cons. Pick one in the design phase. -->
36
+ - Tech stack: ...
37
+ - Time budget: ...
38
+ - (Compliance, team capability, etc — only if they constrain this feature)
47
39
 
48
- ### Option A: ...
49
- - **Pros**:
50
- - **Cons**:
51
- - **Complexity**: low / medium / high
52
- - **Docs (context7 queries)**:
53
- - `library-name@version`: ...
40
+ ## Technical Solution Space
54
41
 
55
- ### Option B: ...
56
- - **Pros**:
57
- - **Cons**:
58
- - **Complexity**: low / medium / high
42
+ <!--
43
+ If one approach is clearly the right call for this stack, write only that approach with its rationale.
44
+ Include alternative options ONLY when there is a genuine tradeoff a thoughtful engineer might disagree on.
45
+ Do not invent Option B and Option C just to fill the template.
46
+ -->
59
47
 
60
- ### Option C (optional): ...
48
+ ### Recommended Approach: ...
49
+ - **Why**: ...
50
+ - **Complexity**: ...
51
+ - **Key APIs verified via context7**: ...
61
52
 
62
- ## Existing Code Analysis
53
+ ### Alternative: ... (include only if a real alternative exists)
63
54
 
64
- <!-- Codebase scan results. Which existing modules can be reused? Which need to be new? -->
55
+ ## Existing Code Analysis (include only if the codebase has relevant prior work)
65
56
 
66
57
  ### Reusable Modules
67
- - `path/to/existing-module.ts` — ...
68
-
69
- ### Modules to Create
70
- - `path/to/new-module.ts` — ...
71
-
72
- ### Modules to Modify
73
- - `path/to/modify.ts` — ...
74
-
75
- ## Latest Documentation Summary (context7)
76
-
77
- <!-- Latest APIs / best practices found by flow-researcher via mcp__context7__* -->
78
-
79
- ### {{LIBRARY_1}}
80
- - Version:
81
- - Relevant APIs:
82
- - Gotchas / changes:
83
-
84
- ### {{LIBRARY_2}}
85
- - ...
86
-
87
- ## Feasibility Assessment
58
+ - `path/to/module` — ...
88
59
 
89
- <!-- Explicitly answer: can this be done? how hard is it? -->
60
+ ### New Modules Required
61
+ - `path/to/new` — ...
90
62
 
91
- - **Feasibility**: feasible / ⚠ risky / ✗ not recommended
92
- - **Estimated complexity**: 1-10
93
- - **Main risks**:
94
- - Risk 1: ...
95
- - Risk 2: ...
63
+ ## Latest Documentation Summary
96
64
 
97
- ## Recommended Direction
65
+ <!-- Only include libraries whose API is version-sensitive AND used by this feature. Do not cite every library in the stack. -->
98
66
 
99
- <!-- Research conclusion: which option is recommended and why. If multiple options need discussion, explain here. -->
67
+ ### {{LIBRARY}}
68
+ - Version: ...
69
+ - Relevant APIs: ...
70
+ - Gotchas: ...
100
71
 
101
- **Recommendation**: Option ?
102
- **Rationale**:
103
- **To confirm in the design phase**:
72
+ ## Feasibility
104
73
 
105
- ## Open Questions
74
+ - **Verdict**: feasible / risky / not recommended
75
+ - **Main risks**: (only if real risks exist)
106
76
 
107
- <!-- Questions the research phase couldn't answer, to be deferred to later phases or asked of the user -->
77
+ ## Open Questions (delete if none)
108
78
 
109
79
  1. ...
110
- 2. ...
111
80
 
112
81
  ---
113
82
 
114
- _Generated by flow-researcher agent on {{CREATED_DATE}}. Subsequent phases continue from this document._
83
+ _Generated by flow-researcher on {{CREATED_DATE}}._
@@ -5,137 +5,80 @@ created: {{CREATED_DATE}}
5
5
  version: 1.0
6
6
  status: in_progress
7
7
  depends_on: design.md
8
- task_size: fine
9
8
  ---
10
9
 
11
10
  # Task Breakdown: {{SPEC_NAME}}
12
11
 
13
- > POC-First 5 Phases: **work → refactor → test → quality gates → PR lifecycle**.
12
+ > POC-First is an **orientation, not a mandate**. Use the phases below as an organizing idea and **delete phases that don't apply to this feature**. A bug-fix may be one task. A prototype may skip Phase 2 (refactor) and Phase 5 (PR lifecycle). A library may skip the PR lifecycle entirely. Forcing all five phases for a small feature is the padding pattern this template is designed to prevent.
14
13
  >
15
- > Each task includes: `Do`, `Files`, `Done-when`, `Verify`, `Commit`. Verifiable via automation.
14
+ > Each task includes whichever of `Do`, `Files`, `Done-when`, `Verify`, `Commit` the executor needs to finish it in a single sub-agent dispatch. Verify must be an automated command (no "manual test").
16
15
 
17
16
  ---
18
17
 
19
18
  ## Marker Rules
20
19
 
21
20
  - `[ ]` TODO / `[x]` done
22
- - `[P]` parallel-safe (can be dispatched in parallel within the same wave)
23
- - `[VERIFY]` quality checkpoint (run by the flow-verifier agent)
21
+ - `[P]` parallel-safe (dispatch in parallel within the same wave)
22
+ - `[VERIFY]` quality checkpoint (flow-verifier agent)
24
23
  - `[SEQUENTIAL]` must be serial (breaks the parallel group)
25
24
 
26
25
  ---
27
26
 
28
27
  ## Phase 1: Make It Work (POC)
29
28
 
30
- > Goal: get it running end-to-end. Hardcoding is acceptable; skip tests.
29
+ > Goal: end-to-end runnable. Hardcoding is acceptable; skip tests here.
31
30
 
32
- - [ ] **1.1** [P] Initialize module skeleton
33
- - **Do**: create `src/{{MODULE}}/` directory, add `index.ts`, `types.ts`
34
- - **Files**: `src/{{MODULE}}/index.ts`, `src/{{MODULE}}/types.ts`
35
- - **Done when**: directory exists, `import {} from './{{MODULE}}'` does not error
36
- - **Verify**: `npx tsc --noEmit`
37
- - **Commit**: `feat({{MODULE}}): initialize module skeleton`
38
- - _Requirements_: FR-01
31
+ <!-- Add only the tasks this feature genuinely needs. Do not invent skeleton tasks to "round out" the phase. -->
39
32
 
40
- - [ ] **1.2** [P] ...
33
+ - [ ] **1.1** ...
41
34
  - **Do**: ...
42
35
  - **Files**: ...
43
36
  - **Done when**: ...
44
37
  - **Verify**: ...
45
38
  - **Commit**: ...
46
- - _Requirements_: ...
47
- - _Design_: AD-01
39
+ - _Requirements_: FR-NN
48
40
 
49
- - [ ] **1.3** [VERIFY] End-to-end POC verification
50
- - **Do**: run the happy path manually, confirm the core scenario works
51
- - **Verify**: `curl http://localhost:3000/... | jq`
52
- - **Done when**: returns expected data (edge cases may still be wrong)
41
+ - [ ] **1.X** [VERIFY] End-to-end POC verification
42
+ - **Verify**: `<command>`
43
+ - **Done when**: happy path returns the expected result
53
44
 
54
- ## Phase 2: Refactoring
45
+ ## Phase 2: Refactoring (delete if the POC is already clean)
55
46
 
56
- > Goal: clean up the code structure. Behavior unchanged.
57
-
58
- - [ ] **2.1** Extract duplicated logic
59
- - **Do**: ...
60
- - **Verify**: `npx tsc --noEmit && git diff --stat`
61
- - **Commit**: `refactor({{MODULE}}): extract common logic`
62
-
63
- - [ ] **2.2** [VERIFY] Refactor does not break behavior
64
- - **Verify**: rerun the manual test from Phase 1
65
- - **Done when**: all outputs match
47
+ > Include only if the POC has genuine duplication or structural mud that warrants cleanup. Skip for tiny features.
66
48
 
67
49
  ## Phase 3: Testing (TDD red / green / yellow)
68
50
 
69
- > Rule: tests first. Let the test fail first (RED), then implement (GREEN), then clean up (YELLOW).
70
-
71
- - [ ] **3.1** [RED] Write failing tests — unit
72
- - **Do**: write unit tests for core functions
73
- - **Files**: `src/{{MODULE}}/*.test.ts`
74
- - **Verify**: `npm test -- src/{{MODULE}}` — expected to fail
75
- - **Commit**: `test({{MODULE}}): red - add unit tests for core logic`
76
-
77
- - [ ] **3.2** [GREEN] Make tests pass
78
- - **Do**: fix the implementation so the tests from 3.1 pass
79
- - **Verify**: `npm test -- src/{{MODULE}}` — all green
80
- - **Commit**: `feat({{MODULE}}): green - satisfy unit tests`
81
-
82
- - [ ] **3.3** [YELLOW] Refactor and clean up
83
- - **Do**: clean up the implementation, tests still pass
84
- - **Commit**: `refactor({{MODULE}}): yellow - clean implementation`
51
+ > Rule: tests first. Red → Green → Yellow. **Collapse red+green into one task when the test and implementation are trivially paired**; split only when the test genuinely precedes a nontrivial implementation.
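A sketch of a collapsed red+green task (module and command hypothetical):

```markdown
- [ ] **3.1** [RED→GREEN] Parser rejects empty input
  - **Do**: add a failing test for `parse("")`, then implement the guard until it passes
  - **Verify**: `npm test -- parser`
  - **Commit**: `test(parser): cover empty-input guard`
```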
85
52
 
86
- - [ ] **3.4** [RED GREEN YELLOW] Integration tests
87
- - <!-- Repeat the TDD cycle -->
53
+ - [ ] **3.X** [RED→GREEN→YELLOW] ...
88
54
 
89
- - [ ] **3.5** [VERIFY] Coverage check
90
- - **Verify**: `npm test -- --coverage` core logic > 80%
55
+ - [ ] **3.X+1** [VERIFY] Coverage check
56
+ - **Verify**: coverage on the changed surface ≥ project standard
91
57
 
92
58
  ## Phase 4: Quality Gates
93
59
 
94
- > Full local checks. Last gate before CI.
95
-
96
- - [ ] **4.1** TypeScript strict check
97
- - **Verify**: `npx tsc --strict --noEmit` — 0 errors
98
- - **Commit**: `chore({{MODULE}}): tsc strict passes`
99
-
100
- - [ ] **4.2** Lint
101
- - **Verify**: `npx eslint src/{{MODULE}}` — 0 errors, 0 warnings
102
-
103
- - [ ] **4.3** All tests pass
104
- - **Verify**: `npm test` — all green
105
-
106
- - [ ] **4.4** [VERIFY] Final health check
107
- - **Do**: flow-verifier agent performs goal-driven reverse verification
108
- - **Done when**: every FR-XX and AC-X.Y has a corresponding automated verification
109
-
110
- ## Phase 5: PR Lifecycle
60
+ > Include only the checks this project actually runs. `npx eslint` is dead weight if the project uses biome. `tsc --strict` is dead weight for a JS project.
111
61
 
112
- - [ ] **5.1** Generate PR
113
- - **Do**: `/flow-ship` creates the PR
114
- - **Done when**: PR URL returned, description is clear
62
+ - [ ] **4.X** [VERIFY] Final health check
63
+ - **Do**: flow-verifier performs goal-driven reverse verification
64
+ - **Done when**: every FR/AC has an automated check
115
65
 
116
- - [ ] **5.2** Respond to review feedback
117
- - **Do**: iterate until approved
118
- - **Verify**: CI all green
66
+ ## Phase 5: PR Lifecycle (delete for local-only work, scripts, internal tools without a PR flow)
119
67
 
120
- - [ ] **5.3** Merge
121
- - **Do**: `/flow-land`
122
- - **Verify**: the main branch contains all commits for this spec
68
+ - [ ] **5.X** Ship / Land
123
69
 
124
70
  ---
125
71
 
126
72
  ## Coverage Audit
127
73
 
128
- <!-- Final step for flow-planner: confirm every FR / AC / AD / D has a corresponding task -->
74
+ <!-- flow-planner fills this in. Every FR / AC / AD / D must map to a task, or explicitly defer with reason. -->
129
75
 
130
76
  | Requirement ID | Task(s) | Status |
131
77
  |--------|---------|------|
132
- | FR-01 | 1.2, 3.1 | ✓ |
133
- | FR-02 | ... | ⚠ uncovered — needs adding |
134
- | AD-01 | 1.1 | ✓ |
135
- | D-05 (STATE.md) | ... | ✓ |
78
+ | FR-01 | ... | ✓ |
136
79
 
137
- **Uncovered items must be handled**: add a task or document the deferral reason in STATE.md.
80
+ **Uncovered items must be handled**: add a task, or document the deferral reason in STATE.md.
138
81
 
139
82
  ---
140
83
 
141
- _Generated by flow-planner agent on {{CREATED_DATE}}. N tasks total, estimated X hours._
84
+ _Generated by flow-planner on {{CREATED_DATE}}._