@curdx/flow 1.1.11 → 2.0.0-beta.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (96) hide show
  1. package/.claude-plugin/marketplace.json +3 -3
  2. package/.claude-plugin/plugin.json +4 -11
  3. package/CHANGELOG.md +99 -0
  4. package/README.md +74 -102
  5. package/README.zh.md +2 -2
  6. package/agent-preamble/preamble.md +81 -11
  7. package/agents/flow-adversary.md +41 -56
  8. package/agents/flow-architect.md +24 -11
  9. package/agents/flow-debugger.md +2 -2
  10. package/agents/flow-edge-hunter.md +20 -6
  11. package/agents/flow-executor.md +3 -3
  12. package/agents/flow-planner.md +51 -48
  13. package/agents/flow-product-designer.md +15 -2
  14. package/agents/flow-qa-engineer.md +4 -4
  15. package/agents/flow-researcher.md +18 -3
  16. package/agents/flow-reviewer.md +5 -1
  17. package/agents/flow-security-auditor.md +2 -2
  18. package/agents/flow-triage-analyst.md +4 -4
  19. package/agents/flow-ui-researcher.md +7 -7
  20. package/agents/flow-ux-designer.md +3 -3
  21. package/agents/flow-verifier.md +47 -14
  22. package/bin/curdx-flow.js +13 -1
  23. package/cli/doctor.js +28 -13
  24. package/cli/install.js +62 -36
  25. package/cli/protocols.js +63 -10
  26. package/cli/registry.js +73 -0
  27. package/cli/uninstall.js +9 -11
  28. package/cli/upgrade.js +6 -10
  29. package/cli/utils.js +104 -56
  30. package/commands/debug.md +10 -10
  31. package/commands/fast.md +1 -1
  32. package/commands/help.md +109 -87
  33. package/commands/implement.md +7 -7
  34. package/commands/init.md +18 -7
  35. package/commands/review.md +114 -130
  36. package/commands/spec.md +131 -89
  37. package/commands/start.md +130 -153
  38. package/commands/verify.md +110 -92
  39. package/gates/adversarial-review-gate.md +20 -20
  40. package/gates/coverage-audit-gate.md +1 -1
  41. package/gates/devex-gate.md +5 -6
  42. package/gates/edge-case-gate.md +2 -2
  43. package/gates/security-gate.md +3 -3
  44. package/hooks/hooks.json +0 -11
  45. package/hooks/scripts/quick-mode-guard.sh +12 -9
  46. package/hooks/scripts/session-start.sh +2 -2
  47. package/hooks/scripts/stop-watcher.sh +25 -15
  48. package/knowledge/epic-decomposition.md +2 -2
  49. package/knowledge/execution-strategies.md +10 -9
  50. package/knowledge/planning-reviews.md +6 -6
  51. package/knowledge/spec-driven-development.md +11 -10
  52. package/knowledge/two-stage-review.md +6 -5
  53. package/knowledge/wave-execution.md +5 -5
  54. package/package.json +4 -2
  55. package/skills/brownfield-index/SKILL.md +62 -0
  56. package/skills/browser-qa/SKILL.md +50 -0
  57. package/skills/epic/SKILL.md +68 -0
  58. package/skills/security-audit/SKILL.md +50 -0
  59. package/skills/ui-sketch/SKILL.md +49 -0
  60. package/templates/config.json.tmpl +1 -1
  61. package/templates/design.md.tmpl +32 -112
  62. package/templates/requirements.md.tmpl +25 -43
  63. package/templates/research.md.tmpl +37 -68
  64. package/templates/tasks.md.tmpl +27 -84
  65. package/agents/persona-amelia.md +0 -128
  66. package/agents/persona-david.md +0 -141
  67. package/agents/persona-emma.md +0 -179
  68. package/agents/persona-john.md +0 -105
  69. package/agents/persona-mary.md +0 -95
  70. package/agents/persona-oliver.md +0 -136
  71. package/agents/persona-rachel.md +0 -126
  72. package/agents/persona-serena.md +0 -175
  73. package/agents/persona-winston.md +0 -117
  74. package/commands/audit.md +0 -170
  75. package/commands/autoplan.md +0 -184
  76. package/commands/design.md +0 -155
  77. package/commands/discuss.md +0 -162
  78. package/commands/doctor.md +0 -124
  79. package/commands/index.md +0 -261
  80. package/commands/install-deps.md +0 -128
  81. package/commands/party.md +0 -241
  82. package/commands/plan-ceo.md +0 -117
  83. package/commands/plan-design.md +0 -107
  84. package/commands/plan-dx.md +0 -104
  85. package/commands/plan-eng.md +0 -108
  86. package/commands/qa.md +0 -118
  87. package/commands/requirements.md +0 -146
  88. package/commands/research.md +0 -141
  89. package/commands/security.md +0 -109
  90. package/commands/sketch.md +0 -118
  91. package/commands/spike.md +0 -181
  92. package/commands/status.md +0 -139
  93. package/commands/switch.md +0 -95
  94. package/commands/tasks.md +0 -189
  95. package/commands/triage.md +0 -160
  96. package/hooks/scripts/fail-tracker.sh +0 -31
package/commands/start.md CHANGED
@@ -1,189 +1,166 @@
1
1
  ---
2
2
  name: start
3
- description: intelligent entry point — create a new spec or resume an existing one, entering the corresponding phase
4
- argument-hint: "<spec-name> \"<goal>\""
5
- allowed-tools: [Read, Write, Bash, AskUserQuestion]
3
+ description: Smart entry point — create a new spec, resume an existing one, or switch between specs. Replaces v1's /start + /switch.
4
+ argument-hint: "[<spec-name>] [\"<one-line goal>\"] [--resume] [--list] [--mode=<fast|standard|enterprise>]"
5
+ allowed-tools: [Read, Write, Bash, AskUserQuestion, Task]
6
6
  ---
7
7
 
8
- # Start a Spec
8
+ # Start or Resume a Feature Spec
9
9
 
10
- CurDX-Flow's main entry point. Based on context, automatically:
11
- - New spec → create directory + templates + enter research phase
12
- - Existing spec → resume from the last phase
10
+ Entry point for every feature. Works in four modes depending on flags and existing state.
13
11
 
14
- ## Step 1: Preflight Check
12
+ ## Invocation patterns
15
13
 
16
- ```bash
17
- # Must be initialized
18
- if [ ! -d ".flow" ]; then
19
- echo "❌ Current directory is not a CurDX-Flow project. Run /curdx-flow:init first"
20
- exit 1
21
- fi
22
- ```
14
+ | Pattern | Behavior |
15
+ |---------|----------|
16
+ | `/curdx-flow:start my-feature "Add JWT auth to REST API"` | Create a fresh spec named `my-feature`, set it active, seed draft requirements. |
17
+ | `/curdx-flow:start my-feature` (name exists) | Switch active spec to `my-feature` (same as v1 `/switch`). |
18
+ | `/curdx-flow:start --resume` | Resume the last active spec from `.flow/.active-spec`. |
19
+ | `/curdx-flow:start --list` | List all specs with their phase and last-updated time, prompt to pick. |
20
+ | `/curdx-flow:start` (no args) | Interactive: ask the user whether to create new, resume recent, or list all. |
21
+ | Add `--mode=<fast\|standard\|enterprise>` | Set the execution mode for this spec (stored in `.state.json`). |
23
22
 
24
- ## Step 2: Parse Arguments
23
+ ## Preflight
25
24
 
26
25
  ```bash
27
- ARGS="$ARGUMENTS"
28
-
29
- # Extract spec-name and goal
30
- # Format: flow-start my-feature "Add JWT auth to API"
31
- # or: flow-start my-feature (resume existing spec)
32
- SPEC_NAME=$(echo "$ARGS" | awk '{print $1}')
33
- GOAL=$(echo "$ARGS" | sed -E "s/^[a-z0-9-]+\s*//" | sed 's/^["\x27]//;s/["\x27]$//')
34
-
35
- if [ -z "$SPEC_NAME" ]; then
36
- echo "Usage: /curdx-flow:start <spec-name> \"<goal>\""
37
- echo "Example: /curdx-flow:start auth-system \"Add JWT authentication to REST API\""
38
- exit 1
39
- fi
40
-
41
- # Validate that spec-name is kebab-case
42
- if ! echo "$SPEC_NAME" | grep -Eq '^[a-z][a-z0-9-]*$'; then
43
- echo "❌ spec-name must be kebab-case (start with lowercase letter, only letters, digits, hyphens)"
44
- exit 1
45
- fi
26
+ # Require a .flow project
27
+ [ ! -d ".flow" ] && {
28
+ echo "✗ Not a CurDX-Flow project. Run /curdx-flow:init first.";
29
+ exit 1;
30
+ }
46
31
  ```
47
32
 
48
- ## Step 3: Determine New vs Resume
33
+ ## Flag parsing
49
34
 
50
- ```bash
51
- SPEC_DIR=".flow/specs/$SPEC_NAME"
52
-
53
- if [ -d "$SPEC_DIR" ]; then
54
- # Resume mode
55
- MODE="resume"
56
- echo "🔄 Resuming spec: $SPEC_NAME"
57
- else
58
- # New mode
59
- MODE="new"
60
- echo "🆕 New spec: $SPEC_NAME"
61
-
62
- if [ -z "$GOAL" ]; then
63
- # If no goal given, ask the user
64
- echo "Please provide a one-sentence goal, then continue."
65
- exit 0
66
- fi
67
- fi
68
- ```
35
+ **Do not shell-split `$ARGUMENTS`.** It is a user-supplied string that may
36
+ contain quoted substrings with spaces, `$`-signs, or embedded quotes.
37
+ `xargs`, naive `awk`, and `sed`-based quote stripping all mis-parse at
38
+ least one of those cases (e.g. `my-feature "Fix user's login bug"` breaks
39
+ `xargs: unmatched quote`). Parse the string as a model task instead:
69
40
 
70
- ## Step 4a: New Path
41
+ 1. **Flags** (order-independent, each is self-delimited):
42
+ - `--resume` / `--list` — boolean presence
43
+ - `--mode=<fast|standard|enterprise>` — value after `=`
44
+ Detect each with a single regex over the full `$ARGUMENTS` string and
45
+ remove the matched span from your working copy. Flags not in the list
46
+ above are errors — surface them to the user.
71
47
 
72
- If `MODE=new`:
48
+ 2. **Positional args** (after flags removed):
49
+ - First whitespace-separated token → `SPEC_NAME` (kebab-case `[a-z0-9-]+`).
50
+ - Remainder of the string, trimmed and with one layer of outer `"..."`
51
+ or `'...'` quotes stripped → `GOAL`. Preserve inner quotes as-is.
73
52
 
74
- ```bash
75
- # Create directory
76
- mkdir -p "$SPEC_DIR"
53
+ 3. If `SPEC_NAME` does not match `^[a-z0-9][a-z0-9-]*$` (per
54
+ `schemas/spec-state.schema.json`), stop and ask the user to pick a
55
+ valid kebab-case name.
56
+
57
+ Mode must be `fast`, `standard`, or `enterprise`. Invalid → default to
58
+ `standard` with a warning.
77
59
 
78
- # Generate .state.json
79
- TODAY=$(date +%Y-%m-%d)
80
- CONFIG_MODE=$(python3 -c "import json; print(json.load(open('.flow/config.json')).get('mode','standard'))" 2>/dev/null || echo "standard")
60
+ Example inputs and their parse:
81
61
 
82
- cat > "$SPEC_DIR/.state.json" <<EOF
62
+ | `$ARGUMENTS` | SPEC_NAME | GOAL | flags |
63
+ |-------------------------------------------------|--------------|-------------------------------|---------------|
64
+ | `my-feature "Add JWT auth"` | `my-feature` | `Add JWT auth` | — |
65
+ | `my-feature --mode=fast "Add JWT auth"` | `my-feature` | `Add JWT auth` | mode=fast |
66
+ | `my-feature "Fix user's login bug"` | `my-feature` | `Fix user's login bug` | — |
67
+ | `--list` | — | — | list=true |
68
+ | `--resume` | — | — | resume=true |
69
+
70
+ ## Branch logic
71
+
72
+ ### Branch A: `--list`
73
+ Enumerate every directory under `.flow/specs/`, read each `.state.json` for `phase` and `updated` (per `schemas/spec-state.schema.json`), print a numbered list, then `AskUserQuestion` to pick one. Picking sets `.flow/.active-spec` and exits.
74
+
75
+ ### Branch B: `--resume` (no name)
76
+ Read `.flow/.active-spec`. If it points to a valid spec dir, report its current phase and next suggested command (`/curdx-flow:spec` if incomplete, `/curdx-flow:implement` if tasks ready). If `.active-spec` is empty or stale, fall back to Branch A.
77
+
78
+ ### Branch C: `SPEC_NAME` provided, spec exists
79
+ Switch `.flow/.active-spec` to `SPEC_NAME`. Confirm with the user if they intended to switch (not overwrite). Report current phase.
80
+
81
+ ### Branch D: `SPEC_NAME` provided, spec does NOT exist
82
+ Create a new spec:
83
+
84
+ ```bash
85
+ mkdir -p ".flow/specs/$SPEC_NAME"
86
+ # NOTE: field names MUST match schemas/spec-state.schema.json:
87
+ # - spec_name (not "spec")
88
+ # - created (date, not "created_at")
89
+ # - updated (date-time, not "updated_at")
90
+ # - phase must be one of the enum values; the initial phase is "research"
91
+ # (there is no "created" phase — that was schema drift pre-beta.9)
92
+ # - version is required
93
+ cat > ".flow/specs/$SPEC_NAME/.state.json" <<JSON
83
94
  {
84
95
  "version": "1.0",
85
96
  "spec_name": "$SPEC_NAME",
86
97
  "goal": "$GOAL",
87
- "mode": "$CONFIG_MODE",
88
- "strategy": "auto",
98
+ "mode": "$FLAG_MODE",
89
99
  "phase": "research",
90
- "phase_status": {
91
- "research": "not_started",
92
- "requirements": "not_started",
93
- "design": "not_started",
94
- "tasks": "not_started"
95
- },
96
- "decisions": [],
97
- "created": "$TODAY",
98
- "updated": "$TODAY"
100
+ "phase_status": {},
101
+ "strategy": "auto",
102
+ "execute_state": {},
103
+ "created": "$(date -u +%Y-%m-%d)",
104
+ "updated": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
99
105
  }
100
- EOF
101
-
102
- # Generate progress.md (from template)
103
- python3 <<PYEOF
104
- from pathlib import Path
105
- tmpl = Path("${CLAUDE_PLUGIN_ROOT}/templates/progress.md.tmpl").read_text()
106
- content = tmpl.replace("{{SPEC_NAME}}", "$SPEC_NAME").replace("{{CREATED_DATE}}", "$TODAY")
107
- Path("$SPEC_DIR/.progress.md").write_text(content)
108
- PYEOF
109
-
110
- # Set as active spec
111
- echo "$SPEC_NAME" > ".flow/.active-spec"
112
-
113
- echo "✓ Spec directory created: $SPEC_DIR"
114
- echo " Goal: $GOAL"
115
- echo ""
116
- echo "Next step (automatically entering research):"
117
- echo " /curdx-flow:research"
106
+ JSON
107
+ echo "$SPEC_NAME" > .flow/.active-spec
118
108
  ```
119
109
 
120
- ## Step 4b: Resume Path
110
+ If `GOAL` is empty, `AskUserQuestion` to gather it before writing `.state.json`.
121
111
 
122
- If `MODE=resume`:
112
+ Then seed a minimal `.progress.md`:
123
113
 
124
114
  ```bash
125
- # Read current phase
126
- STATE_FILE="$SPEC_DIR/.state.json"
127
- if [ ! -f "$STATE_FILE" ]; then
128
- echo "✗ Spec directory exists but .state.json is missing. State may be corrupted."
129
- # Ask user
130
- # AskUserQuestion: "Rebuild state file?" (yes/no)
131
- exit 0
132
- fi
133
-
134
- # Set as active
135
- echo "$SPEC_NAME" > ".flow/.active-spec"
136
-
137
- # Show current progress
138
- python3 <<PYEOF
139
- import json
140
- s = json.load(open("$STATE_FILE"))
141
- print(f"✓ Spec activated: $SPEC_NAME")
142
- print(f" Goal: {s.get('goal','(undefined)')}")
143
- print(f" Current phase: {s['phase']}")
144
- print(f" Progress:")
145
-
146
- ph_emoji = {"completed": "✓", "in_progress": "●", "not_started": "○", "failed": "✗", "skipped": "—"}
147
- for phase, status in s.get('phase_status',{}).items():
148
- emoji = ph_emoji.get(status, "?")
149
- print(f" {emoji} {phase:<15} {status}")
150
-
151
- print()
152
-
153
- # Recommend next step
154
- phase_order = ["research", "requirements", "design", "tasks", "execute", "verify", "ship"]
155
- for p in phase_order:
156
- st = s.get('phase_status',{}).get(p, 'not_started')
157
- if st in ("not_started", "in_progress"):
158
- print(f"Next step: /curdx-flow:{p}")
159
- break
160
- else:
161
- print("All phases complete! Use /curdx-flow:status to see details.")
162
- PYEOF
115
+ cat > ".flow/specs/$SPEC_NAME/.progress.md" <<MD
116
+ # Progress Log — $SPEC_NAME
117
+
118
+ **Goal**: $GOAL
119
+ **Mode**: $FLAG_MODE
120
+ **Created**: $(date -u +%Y-%m-%d)
121
+
122
+ ## Decisions
123
+ (populated during /curdx-flow:spec)
124
+
125
+ ## Learnings
126
+ (populated during /curdx-flow:implement)
127
+ MD
163
128
  ```
164
129
 
165
- ## Step 5: Summary Prompt
130
+ ### Branch E: no args, no flags
131
+ ```
132
+ AskUserQuestion:
133
+ "No spec name given. What would you like to do?"
134
+ Options:
135
+ - Create a new spec (prompts for name + goal)
136
+ - Resume the last active spec
137
+ - List all specs
138
+ ```
139
+
140
+ Route to the matching branch.
141
+
142
+ ## Mode semantics
143
+
144
+ The `mode` field in `.state.json` drives behavior in later commands:
145
+
146
+ | Mode | `/curdx-flow:spec` default | `/curdx-flow:implement` default | Gates applied |
147
+ |------|---------------------------|--------------------------------|---------------|
148
+ | `fast` | skipped (use `/curdx-flow:fast` instead) | linear strategy | karpathy + verification |
149
+ | `standard` | full 4 phases | auto strategy | + tdd + coverage-audit |
150
+ | `enterprise` | full 4 phases + `--review=all` | auto strategy + stricter gates | + adversarial + edge-case + security |
151
+
152
+ ## Post-create reporting
166
153
 
167
154
  ```
168
- ═══════════════════════════════════════════════
169
- ✓ Spec ready: $SPEC_NAME
170
-
171
- Workflow:
172
- 1. /curdx-flow:research ← research (deep exploration)
173
- 2. /curdx-flow:requirements ← requirements (user stories + acceptance criteria)
174
- 3. /curdx-flow:design ← design (architectural decisions, freeze choices)
175
- 4. /curdx-flow:tasks ← task decomposition (auto-verifiable tasks)
176
-
177
- One-shot version: /curdx-flow:spec ← run research/req/design/tasks in sequence
178
-
179
- Fast modes:
180
- /curdx-flow:fast ← skip spec and implement directly
181
- /curdx-flow:sketch ← UI prototype exploration
182
- ═══════════════════════════════════════════════
155
+ ✓ Spec ready: <name>
156
+ Goal: <goal>
157
+ Mode: <mode>
158
+ Path: .flow/specs/<name>/
159
+
160
+ Next: /curdx-flow:spec
183
161
  ```
184
162
 
185
- ## Error Recovery
163
+ ## References
186
164
 
187
- - `.flow/` does not exist → prompt `/curdx-flow:init`
188
- - spec-name invalid → provide kebab-case example
189
- - Spec directory corrupted → ask user whether to delete-and-rebuild or repair
165
+ - State schema: `@${CLAUDE_PLUGIN_ROOT}/schemas/spec-state.schema.json`
166
+ - Mode semantics: `@${CLAUDE_PLUGIN_ROOT}/knowledge/execution-strategies.md`
@@ -1,124 +1,142 @@
1
1
  ---
2
2
  name: verify
3
- description: goal reverse verification — trace back from FR/AC/AD to check whether the code actually implements them, detect stubs / fake completion. Dispatches flow-verifier.
4
- argument-hint: "[spec-name]"
3
+ description: Goal-backward verification — trace from every FR / AC / AD in the spec to the code and tests, detect stubs and fake completions. The differentiator command. Optionally adds multi-source coverage audit with --strict.
4
+ argument-hint: "[--strict]"
5
5
  allowed-tools: [Read, Bash, Task, Grep, Glob]
6
6
  ---
7
7
 
8
- # Flow Verify — Goal Reverse Verification
8
+ # Goal-Backward Verification
9
9
 
10
- Dispatch the `flow-verifier` agent to confirm, starting from the spec, that the code truly implements the requirements; do not trust any "done" claim.
10
+ This is the **differentiator command**: it scans the implementation against the spec's own requirements and catches the most common Claude failure mode — claiming "done" while actual code is a stub or fake completion.
11
11
 
12
- ## When to Use
12
+ ## Flags
13
13
 
14
- - After `/curdx-flow:implement` completes
15
- - Final gate before a PR
16
- - When you suspect a feature is a fake implementation (stub/TODO)
14
+ | Flag | Default | Purpose |
15
+ |------|---------|---------|
16
+ | `--strict` | off | Also run multi-source coverage audit (FR / AC / AD / Research conclusions / D-NN decisions) — replaces v1's `/audit` command. |
17
17
 
18
- ## Step 1: Parse + Preflight Check
18
+ ## Preflight
19
19
 
20
20
  ```bash
21
- SPEC_NAME="${ARGUMENTS:-$(cat .flow/.active-spec 2>/dev/null)}"
22
- [ -z "$SPEC_NAME" ] && { echo "❌ No active spec. Run /curdx-flow:switch or /curdx-flow:start first"; exit 1; }
21
+ [ ! -d ".flow" ] && { echo "✗ Not a CurDX-Flow project."; exit 1; }
23
22
 
24
- DIR=".flow/specs/$SPEC_NAME"
25
- for f in requirements.md design.md; do
26
- [ ! -f "$DIR/$f" ] && { echo "❌ Missing $f. Complete /curdx-flow:requirements /curdx-flow:design first"; exit 1; }
23
+ SPEC_NAME=$(cat .flow/.active-spec 2>/dev/null)
24
+ [ -z "$SPEC_NAME" ] && { echo "✗ No active spec. Run /curdx-flow:start first."; exit 1; }
25
+
26
+ SPEC_DIR=".flow/specs/$SPEC_NAME"
27
+ for f in requirements.md design.md tasks.md; do
28
+ [ ! -f "$SPEC_DIR/$f" ] && {
29
+ echo "✗ $SPEC_DIR/$f missing. Run /curdx-flow:spec first.";
30
+ exit 1;
31
+ }
27
32
  done
33
+
34
+ FLAG_STRICT=$(echo "$ARGUMENTS" | grep -q -- '--strict' && echo 1 || echo 0)
28
35
  ```
29
36
 
30
- ## Step 2: Determine Scope
37
+ ## Workflow
31
38
 
32
- ```bash
33
- # Read the commit range for the execute phase
34
- # From .state.json or git reflog
35
-
36
- LAST_EXEC_START=$(python3 -c "
37
- import json
38
- s = json.load(open('$DIR/.state.json'))
39
- # Custom field or inferred from git
40
- print(s.get('execute_state', {}).get('start_commit', ''))
41
- ")
42
-
43
- # If unavailable, use main..HEAD
44
- RANGE="${LAST_EXEC_START:-main}..HEAD"
45
- echo "Verification scope: $RANGE"
46
- ```
39
+ ### Step 1: Dispatch `flow-verifier`
47
40
 
48
- ## Step 3: Dispatch flow-verifier
41
+ Delegate to the `flow-verifier` agent with:
42
+ - `requirements.md` (source of FR-NN, AC-N.N, US-NN)
43
+ - `design.md` (source of AD-NN)
44
+ - `tasks.md` (source of T-N.M and their Verify commands)
45
+ - Repository root (to scan code + tests)
49
46
 
50
- ```
51
- Task:
52
- subagent_type: general-purpose
53
- description: "verify $SPEC_NAME"
54
- prompt: |
55
- You are the flow-verifier agent. Full definition:
56
- ${CLAUDE_PLUGIN_ROOT}/agents/flow-verifier.md
57
-
58
- Must read:
59
- - .flow/specs/$SPEC_NAME/requirements.md
60
- - .flow/specs/$SPEC_NAME/design.md
61
- - .flow/specs/$SPEC_NAME/tasks.md
62
- - .flow/specs/$SPEC_NAME/.state.json
63
- - .flow/STATE.md
64
-
65
- Verification scope: commits $RANGE
66
-
67
- Tasks:
68
- 1. Extract every FR / AC / AD / Component / error-path assertion
69
- 2. Find evidence for each assertion (code + test + actual run)
70
- 3. Scan for stub patterns (TODO / Not implemented / return {})
71
- 4. Generate verification-report.md
72
- 5. Update phase_status.verify in .state.json
73
-
74
- Output file:
75
- .flow/specs/$SPEC_NAME/verification-report.md
76
-
77
- Return a brief:
78
- - Fully verified / partially verified / not verified counts
79
- - Number of fake implementations
80
- - List of blockers
81
- - Suggested next step
82
- ```
47
+ The agent performs goal-backward tracing:
48
+ 1. For each **FR-NN**: search for code that implements it. Classify as `IMPLEMENTED` / `STUB` / `MISSING` / `UNCERTAIN`.
49
+ 2. For each **AC-N.N**: search for a test that exercises it. Classify as `TESTED` / `UNTESTED`.
50
+ 3. For each **AD-NN**: check that the design decision is reflected in code structure / interfaces.
51
+ 4. Detect suspicious patterns:
52
+ - `throw new Error("not implemented")` / `TODO` / `NotImplementedError`
53
+ - `return null` / `return {}` in places that should produce real output
54
+ - test files with only `it.skip(...)` or no assertions
55
+ - code that returns mocked fixtures instead of calling real collaborators
56
+
57
+ ### Step 2: Run the Verify commands from `tasks.md`
58
+
59
+ For every task listed in `tasks.md`, run its declared `Verify:` command and record pass/fail. This is the most objective check — if the task said `npm test -- auth.spec.ts`, run exactly that.
60
+
61
+ ### Step 3: (Strict only) Multi-source coverage audit
62
+
63
+ If `--strict`:
64
+ - Cross-check every FR / AC / AD / decision against the implementation
65
+ - Cross-check every research conclusion: was the recommended library / approach actually used?
66
+ - Cross-check every D-NN decision in `.flow/STATE.md` that references this spec
67
+
68
+ ### Step 4: Produce `verification-report.md`
83
69
 
84
- ## Step 4: Read Report + Decide
70
+ **Landing check**: sub-agent responses can be truncated by the model's output-length limit. After dispatching `flow-verifier`, verify the report actually landed:
85
71
 
86
72
  ```bash
87
- REPORT="$DIR/verification-report.md"
88
- [ ! -f "$REPORT" ] && { echo " verifier did not produce a report"; exit 1; }
89
-
90
- # Parse the verdict from the report
91
- VERIFIED=$(grep -c "^\- ✓" "$REPORT" || echo 0)
92
- PARTIAL=$(grep -c "^\- ⚠" "$REPORT" || echo 0)
93
- MISSING=$(grep -c "^\- " "$REPORT" || echo 0)
94
- STUBS=$(grep -c "^\- 🚨" "$REPORT" || echo 0)
73
+ REPORT=".flow/specs/$SPEC_NAME/verification-report.md"
74
+ if [ ! -f "$REPORT" ] || [ "$(wc -c < "$REPORT" 2>/dev/null | tr -d ' ')" -lt 300 ]; then
75
+ echo "⚠ Report missing or truncated. Re-dispatching flow-verifier with a terse 'write the report now' prompt."
76
+ # Re-dispatch pattern:
77
+ # "Your only job right now is to Write the verification-report.md using the
78
+ # findings you already gathered. Do not re-scan. Do not narrate. Write
79
+ # the file and stop."
80
+ fi
95
81
  ```
96
82
 
97
- ## Step 5: Output to User
83
+ Write to `.flow/specs/$SPEC_NAME/verification-report.md`:
98
84
 
85
+ ```markdown
86
+ # Verification Report — <spec-name>
87
+
88
+ **Generated**: <ISO8601>
89
+ **Mode**: strict | normal
90
+
91
+ ## Summary
92
+ - FR coverage: N/M implemented (K stubs, L missing)
93
+ - AC coverage: N/M tested
94
+ - AD coverage: N/M reflected
95
+ - Verify commands: N/M passing
96
+
97
+ ## Findings
98
+
99
+ ### Missing implementations
100
+ - FR-02: <description>. No matching code found in <paths searched>.
101
+
102
+ ### Stubs / fake completions
103
+ - FR-05: Implemented in `src/auth.ts:42` but body is `throw new Error("not implemented")`.
104
+
105
+ ### Untested acceptance criteria
106
+ - AC-1.3: No test asserts token refresh after 15 min expiry.
107
+
108
+ ### Failing Verify commands
109
+ - T-2.1: `npm test auth.spec.ts` → 3 failed
110
+
111
+ ## Verdict
112
+ - [ ] PASS — all items covered and passing
113
+ - [X] PARTIAL — <n> findings must be addressed before shipping
114
+ - [ ] MISSING — substantive implementation gaps
99
115
  ```
100
- ✓ Verify complete: $SPEC_NAME
101
116
 
102
- Stats:
103
- ✓ Fully verified: $VERIFIED
104
- Partially verified: $PARTIAL
105
- Not verified: $MISSING
106
- 🚨 Fake implementations: $STUBS
117
+ ### Step 5: Apply `verification-gate`
118
+
119
+ Hard rule from `@${CLAUDE_PLUGIN_ROOT}/gates/verification-gate.md`:
120
+ - Any `STUB` or `MISSING` finding on a non-deferred FR blocks completion.
121
+ - Any failing Verify command blocks completion.
122
+ - Waive only with an explicit user D-NN decision logged to `.flow/STATE.md`.
107
123
 
108
- Report: .flow/specs/$SPEC_NAME/verification-report.md
124
+ ## Reporting
125
+
126
+ ```
127
+ ✓ Verification complete
128
+ FR coverage: 8/10 implemented (1 stub, 1 missing)
129
+ AC coverage: 9/12 tested
130
+ Verify commands: 14/15 passing
131
+ Verdict: PARTIAL
109
132
 
110
- Verdict:
111
- $([ $MISSING -gt 0 ] && echo '❌ BLOCKED — unimplemented FR/AC/AD exist, return to /curdx-flow:implement to fill in')
112
- $([ $STUBS -gt 0 ] && echo '❌ BLOCKED — fake implementations found')
113
- $([ $MISSING -eq 0 ] && [ $STUBS -eq 0 ] && echo '✓ PASS — can proceed to /curdx-flow:review')
133
+ Findings written to: .flow/specs/<name>/verification-report.md
114
134
 
115
- Next step:
116
- $([ $MISSING -gt 0 ] && echo 'fix blockers → /curdx-flow:implement --task=<new task>')
117
- $([ $MISSING -eq 0 ] && echo '/curdx-flow:review — enter code quality review')
135
+ Next: address findings, then re-run /curdx-flow:verify, or run /curdx-flow:review.
118
136
  ```
119
137
 
120
- ## Error Recovery
138
+ ## References
121
139
 
122
- - verifier times out → reduce spec scope (verify only specific FRs), then rerun
123
- - Some Verify commands are unavailable (e.g. require a DB connection) → mark "needs manual verification" in the report
124
- - verifier output is vague prompt to look at specific sections of the report
140
+ - `flow-verifier` agent: `@${CLAUDE_PLUGIN_ROOT}/agents/flow-verifier.md`
141
+ - `verification-gate`: `@${CLAUDE_PLUGIN_ROOT}/gates/verification-gate.md`
142
+ - `coverage-audit-gate` (used in strict mode): `@${CLAUDE_PLUGIN_ROOT}/gates/coverage-audit-gate.md`
@@ -33,19 +33,19 @@ A reviewer agent's output of "everything looks fine, no issues found" is an **in
33
33
  - "Looks good" is usually confirmation bias (the agent only checked the obvious)
34
34
  - AI tends to please the user ("great job!") — fight this tendency
35
35
 
36
- **Forced actions**:
37
- 1. If the agent outputs "no issues", automatically trigger a second round
38
- 2. The second round requires the agent to perform deeper analysis via sequential-thinking
39
- 3. If both rounds yield no findings, the agent must **prove** it checked:
40
- - List the dimensions examined (at least 5)
41
- - For each dimension, give the specific code/file locations inspected
42
- - Provide counterfactual hypotheses of "what it would look like if there were a problem"
36
+ **Forced actions when the agent reports "no issues"**:
37
+ 1. Automatically trigger a second round framed as "what would a senior skeptic reject in this PR?"
38
+ 2. If both rounds still honestly yield no findings, the agent must emit a **proof-of-checking report**:
39
+ - Every category it examined (with "N/A" for categories that don't apply)
40
+ - For each examined category, the specific code/file locations inspected
41
+ - Counterfactual hypotheses of "what this would look like if there were a problem" and why that signature is absent
42
+ 3. Fabricating findings to avoid the proof-of-checking step is a violation of L3 red line #2 (fact-driven). Better to emit "clean verdict with proof" than invent issues.
43
43
 
44
44
  ---
45
45
 
46
- ### Rule 2: Findings in at Least 3 Categories
46
+ ### Rule 2: Coverage proportional to feature scope
47
47
 
48
- A complete adversarial review must cover (find issues in at least 3 of these categories):
48
+ A complete adversarial review covers every category that applies to the feature, marks the rest as N/A with reason. Number of findings per category is proportional to real issues, not a quota:
49
49
 
50
50
  1. **Architecture layer**: Are decisions sound? Future-extensible? Lock-in risks?
51
51
  2. **Implementation layer**: Code quality? Error handling? Performance?
@@ -86,22 +86,22 @@ Not allowed:
86
86
  Input: object under review (code range / spec / PR diff)
87
87
 
88
88
  Round 1 (agent self-analysis):
89
- - Use sequential-thinking 6 rounds
90
- - Scan all 6 categories
89
+ - Use sequential-thinking proportional to the surface being probed
90
+ - Scan each applicable category; mark N/A ones with reason
91
91
  - Output findings list
92
92
 
93
93
  Decision:
94
- - Findings 3? → output report
95
- - Findings < 3? → force Round 2
94
+ - Any real findings? → output report with findings
95
+ - Zero findings after honest Round 1? → force Round 2 framed as skeptic
96
96
 
97
97
  Round 2 (deep analysis):
98
- - sequential-thinking for another 6 rounds
98
+ - sequential-thinking proportional to residual uncertainty
99
99
  - Focus on "seemingly no issues" parts (trust but verify)
100
- - May introduce external perspectives (read issues from similar projects)
100
+ - Optionally introduce external perspectives (read issues from similar projects)
101
101
 
102
102
  Decision:
103
- - Still < 3? → agent must explicitly prove it checked
104
- - Otherwise → output report
103
+ - Still zero findings? → agent must emit proof-of-checking report (NOT invent findings)
104
+ - Findings exist? → output report
105
105
 
106
106
  Output: review-report.md
107
107
  ```
@@ -190,10 +190,10 @@ Fix loop:
190
190
 
191
191
  ## Failure Recovery
192
192
 
193
- If after 2 rounds there are still < 3 findings:
193
+ If after Round 2 the honest verdict is still zero findings, emit a proof-of-checking report (do NOT fabricate to hit a quota — there is no quota):
194
194
 
195
195
  ```markdown
196
- ## Adversarial Review — Insufficient Findings
196
+ ## Adversarial Review — Proof of Checking (zero findings)
197
197
 
198
198
  I have examined the following dimensions across 2 rounds of analysis:
199
199
 
@@ -210,7 +210,7 @@ I have examined the following dimensions across 2 rounds of analysis:
210
210
 
211
211
  Recommendations:
212
212
  - Human review (at least walk through the diff once)
213
- - Consider /curdx-flow:qa for real browser/integration testing (Phase 5+)
213
+ - Consider the `browser-qa` skill for real browser/integration testing (Phase 5+)
214
214
  - Wait until deployment to staging to observe
215
215
  ```
216
216
 
@@ -17,7 +17,7 @@ depends_on: []
17
17
 
18
18
  - End of the tasks phase (last step of flow-planner)
19
19
  - Before the execution phase completes (when /curdx-flow:verify runs)
20
- - Explicitly requested by /curdx-flow:audit
20
+ - Explicitly requested by /curdx-flow:verify --strict
21
21
 
22
22
  ---
23
23