@curdx/flow 1.1.4 → 1.1.6
- package/.claude-plugin/marketplace.json +25 -0
- package/.claude-plugin/plugin.json +43 -0
- package/CHANGELOG.md +279 -0
- package/agent-preamble/preamble.md +214 -0
- package/agents/flow-adversary.md +216 -0
- package/agents/flow-architect.md +190 -0
- package/agents/flow-debugger.md +325 -0
- package/agents/flow-edge-hunter.md +273 -0
- package/agents/flow-executor.md +246 -0
- package/agents/flow-planner.md +204 -0
- package/agents/flow-product-designer.md +146 -0
- package/agents/flow-qa-engineer.md +276 -0
- package/agents/flow-researcher.md +155 -0
- package/agents/flow-reviewer.md +280 -0
- package/agents/flow-security-auditor.md +398 -0
- package/agents/flow-triage-analyst.md +290 -0
- package/agents/flow-ui-researcher.md +227 -0
- package/agents/flow-ux-designer.md +247 -0
- package/agents/flow-verifier.md +283 -0
- package/agents/persona-amelia.md +128 -0
- package/agents/persona-david.md +141 -0
- package/agents/persona-emma.md +179 -0
- package/agents/persona-john.md +105 -0
- package/agents/persona-mary.md +95 -0
- package/agents/persona-oliver.md +136 -0
- package/agents/persona-rachel.md +126 -0
- package/agents/persona-serena.md +175 -0
- package/agents/persona-winston.md +117 -0
- package/bin/curdx-flow.js +5 -2
- package/cli/install.js +44 -5
- package/commands/audit.md +170 -0
- package/commands/autoplan.md +184 -0
- package/commands/debug.md +199 -0
- package/commands/design.md +155 -0
- package/commands/discuss.md +162 -0
- package/commands/doctor.md +124 -0
- package/commands/fast.md +128 -0
- package/commands/help.md +119 -0
- package/commands/implement.md +381 -0
- package/commands/index.md +261 -0
- package/commands/init.md +105 -0
- package/commands/install-deps.md +128 -0
- package/commands/party.md +241 -0
- package/commands/plan-ceo.md +117 -0
- package/commands/plan-design.md +107 -0
- package/commands/plan-dx.md +104 -0
- package/commands/plan-eng.md +108 -0
- package/commands/qa.md +118 -0
- package/commands/requirements.md +146 -0
- package/commands/research.md +141 -0
- package/commands/review.md +168 -0
- package/commands/security.md +109 -0
- package/commands/sketch.md +118 -0
- package/commands/spec.md +135 -0
- package/commands/spike.md +181 -0
- package/commands/start.md +189 -0
- package/commands/status.md +139 -0
- package/commands/switch.md +95 -0
- package/commands/tasks.md +189 -0
- package/commands/triage.md +160 -0
- package/commands/verify.md +124 -0
- package/gates/adversarial-review-gate.md +219 -0
- package/gates/coverage-audit-gate.md +184 -0
- package/gates/devex-gate.md +255 -0
- package/gates/edge-case-gate.md +194 -0
- package/gates/karpathy-gate.md +130 -0
- package/gates/security-gate.md +218 -0
- package/gates/tdd-gate.md +188 -0
- package/gates/verification-gate.md +183 -0
- package/hooks/hooks.json +56 -0
- package/hooks/scripts/fail-tracker.sh +31 -0
- package/hooks/scripts/inject-karpathy.sh +52 -0
- package/hooks/scripts/quick-mode-guard.sh +64 -0
- package/hooks/scripts/session-start.sh +76 -0
- package/hooks/scripts/stop-watcher.sh +166 -0
- package/knowledge/atomic-commits.md +262 -0
- package/knowledge/epic-decomposition.md +307 -0
- package/knowledge/execution-strategies.md +278 -0
- package/knowledge/karpathy-guidelines.md +219 -0
- package/knowledge/planning-reviews.md +211 -0
- package/knowledge/poc-first-workflow.md +227 -0
- package/knowledge/spec-driven-development.md +183 -0
- package/knowledge/systematic-debugging.md +384 -0
- package/knowledge/two-stage-review.md +233 -0
- package/knowledge/wave-execution.md +387 -0
- package/package.json +14 -3
- package/schemas/config.schema.json +100 -0
- package/schemas/spec-frontmatter.schema.json +42 -0
- package/schemas/spec-state.schema.json +117 -0
|
@@ -0,0 +1,160 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: triage
|
|
3
|
+
description: Epic decomposition — slice a big goal vertically by user value, generate dependency graph + multiple sub-specs. Dispatches flow-triage-analyst.
|
|
4
|
+
argument-hint: "\"<epic goal>\" [--specs=<N>]"
|
|
5
|
+
allowed-tools: [Read, Write, Bash, Task, WebSearch]
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# Flow Triage — Epic Decomposition
|
|
9
|
+
|
|
10
|
+
@${CLAUDE_PLUGIN_ROOT}/knowledge/epic-decomposition.md
|
|
11
|
+
|
|
12
|
+
Break down a big goal (needs 4+ sub-specs to complete) into an Epic of vertical slices.
|
|
13
|
+
|
|
14
|
+
## When to Use
|
|
15
|
+
|
|
16
|
+
- Goal is clearly more than 2 weeks of work
|
|
17
|
+
- Involves multiple "independently usable" features
|
|
18
|
+
- At `/curdx-flow:start`, you realize the goal is too big — switch to `/curdx-flow:triage`
|
|
19
|
+
|
|
20
|
+
## When Not to Use
|
|
21
|
+
|
|
22
|
+
- Goal fits in 1-2 weeks → use `/curdx-flow:start`
|
|
23
|
+
- Emergency fix → use `/curdx-flow:fast`
|
|
24
|
+
- Exploratory validation → use `/curdx-flow:spike`
|
|
25
|
+
|
|
26
|
+
## Step 1: Preflight Check
|
|
27
|
+
|
|
28
|
+
```bash
|
|
29
|
+
[ ! -d ".flow" ] && { echo "❌ Not a CurDX-Flow project"; exit 1; }
|
|
30
|
+
|
|
31
|
+
GOAL="$ARGUMENTS"
|
|
32
|
+
# Extract --specs argument
|
|
33
|
+
TARGET_SPECS=""
|
|
34
|
+
case "$GOAL" in
|
|
35
|
+
*--specs=*)
|
|
36
|
+
TARGET_SPECS=$(echo "$GOAL" | grep -oE -- '--specs=[0-9]+' | cut -d= -f2)
|
|
37
|
+
GOAL=$(echo "$GOAL" | sed 's/--specs=[0-9]*//g' | xargs)
|
|
38
|
+
;;
|
|
39
|
+
esac
|
|
40
|
+
|
|
41
|
+
[ -z "$GOAL" ] && { echo "Usage: /curdx-flow:triage \"<epic goal>\""; exit 1; }
|
|
42
|
+
```
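
The argument handling in the preflight check can also be sketched in Python for readers who want to test it in isolation. This is a minimal sketch, not part of the package: the function name `split_triage_arguments` is hypothetical, and unlike the `xargs` step above it only collapses whitespace and leaves quotes untouched.

```python
import re
from typing import Optional, Tuple


def split_triage_arguments(raw: str) -> Tuple[str, Optional[int]]:
    """Split the raw argument string into (goal, target sub-spec count).

    Hypothetical helper mirroring the shell logic: the first --specs=<N>
    supplies the count, and every --specs=... token is removed from the goal.
    """
    match = re.search(r"--specs=(\d+)", raw)
    target = int(match.group(1)) if match else None
    goal = re.sub(r"--specs=\d*", "", raw)
    # Collapse runs of whitespace left behind by the removed flag.
    goal = " ".join(goal.split())
    return goal, target
```

An empty goal after stripping the flag corresponds to the usage error printed by the shell version.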
|
|
43
|
+
|
|
44
|
+
## Step 2: Generate Epic Name
|
|
45
|
+
|
|
46
|
+
Derive a kebab-case name from the goal, or ask the user:
|
|
47
|
+
|
|
48
|
+
```python
|
|
49
|
+
# Rough inference
|
|
50
|
+
slug = re.sub(r'\s+', '-', goal.lower())
|
|
51
|
+
slug = re.sub(r'[^a-z0-9-]', '', slug)[:40]
|
|
52
|
+
# e.g.: "add payment system" → "add-payment-system"
|
|
53
|
+
# May need AskUserQuestion to confirm
|
|
54
|
+
```
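
The rough inference above leaves out the import and the input variable. A self-contained version might look like the following sketch; the function name `derive_epic_slug` and the extra hyphen clean-up are assumptions added for illustration, not behavior confirmed by the package.

```python
import re


def derive_epic_slug(goal: str, max_len: int = 40) -> str:
    """Derive a kebab-case epic name from a free-form goal (hypothetical helper)."""
    slug = re.sub(r"\s+", "-", goal.strip().lower())
    slug = re.sub(r"[^a-z0-9-]", "", slug)
    # Assumption: collapse doubled hyphens left where punctuation was removed.
    slug = re.sub(r"-{2,}", "-", slug)
    # Truncate, then trim hyphens so the name never starts or ends with one.
    return slug[:max_len].strip("-")
```

The result is only a candidate name; confirming it with the user is still the safer path when the goal is long or ambiguous.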
|
|
55
|
+
|
|
56
|
+
## Step 3: Create Epic Directory
|
|
57
|
+
|
|
58
|
+
```bash
|
|
59
|
+
EPIC_DIR=".flow/_epics/$EPIC_NAME"
|
|
60
|
+
mkdir -p "$EPIC_DIR"
|
|
61
|
+
```
|
|
62
|
+
|
|
63
|
+
## Step 4: Dispatch flow-triage-analyst
|
|
64
|
+
|
|
65
|
+
```
|
|
66
|
+
Task:
|
|
67
|
+
subagent_type: general-purpose
|
|
68
|
+
description: "Epic decomposition: $EPIC_NAME"
|
|
69
|
+
prompt: |
|
|
70
|
+
You are the flow-triage-analyst agent. Full definition:
|
|
71
|
+
${CLAUDE_PLUGIN_ROOT}/agents/flow-triage-analyst.md
|
|
72
|
+
|
|
73
|
+
Knowledge base (must read):
|
|
74
|
+
${CLAUDE_PLUGIN_ROOT}/knowledge/epic-decomposition.md
|
|
75
|
+
|
|
76
|
+
Input:
|
|
77
|
+
- Epic goal: "$GOAL"
|
|
78
|
+
- Epic name: $EPIC_NAME
|
|
79
|
+
- Suggested sub-spec count: ${TARGET_SPECS:-auto (4-8)}
|
|
80
|
+
- Project context: .flow/PROJECT.md + .flow/CONTEXT.md + .flow/STATE.md
|
|
81
|
+
|
|
82
|
+
Mandatory workflow:
|
|
83
|
+
1. sequential-thinking >= 5 rounds to understand the goal
|
|
84
|
+
2. context7 to validate the key technologies involved
|
|
85
|
+
3. claude-mem to retrieve history
|
|
86
|
+
4. sequential-thinking 5+ rounds to brainstorm decomposition
|
|
87
|
+
5. Vertical slice by user value (not by technical layer)
|
|
88
|
+
6. Define shared interfaces (freeze)
|
|
89
|
+
7. Identify dependencies (hard/soft/parallel)
|
|
90
|
+
8. Generate epic.md + sub-spec skeletons
|
|
91
|
+
|
|
92
|
+
Output files:
|
|
93
|
+
- .flow/_epics/$EPIC_NAME/epic.md (Epic master document)
|
|
94
|
+
- .flow/_epics/$EPIC_NAME/.epic-state.json
|
|
95
|
+
- .flow/specs/<sub-1>/.state.json
|
|
96
|
+
- .flow/specs/<sub-2>/.state.json
|
|
97
|
+
- ...(skeleton for each sub-spec)
|
|
98
|
+
|
|
99
|
+
Success criteria:
|
|
100
|
+
- Sub-spec count: 4-8
|
|
101
|
+
- Each sub-spec has independent user value
|
|
102
|
+
- Dependency graph is clear (mermaid)
|
|
103
|
+
- Shared interfaces frozen (TypeScript types)
|
|
104
|
+
- Out of Scope is explicit
|
|
105
|
+
|
|
106
|
+
When done, return a brief:
|
|
107
|
+
- Sub-spec list (name + one-sentence description)
|
|
108
|
+
- Dependency graph
|
|
109
|
+
- Recommended execution order
|
|
110
|
+
- Estimated total duration
|
|
111
|
+
```
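
The brief asks for a dependency graph and a recommended execution order. One way to derive the order from hard dependencies is to group sub-specs into waves that can run in parallel. The sketch below assumes the graph is available as a plain dict; the function name `execution_waves` and the sub-spec names in the usage note are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List


def execution_waves(hard_deps: Dict[str, Iterable[str]]) -> List[List[str]]:
    """Group sub-specs into waves; each wave depends only on earlier waves.

    hard_deps maps a sub-spec name to the sub-specs it hard-depends on.
    A dependency cycle makes prepare() raise graphlib.CycleError.
    """
    sorter = TopologicalSorter(hard_deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable, readable order
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For example, with `{"checkout": ["auth", "catalog"], "refunds": ["checkout"]}` the first wave holds `auth` and `catalog`, which is the "parallel if independent" case in the recommended order.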
|
|
112
|
+
|
|
113
|
+
## Step 5: Validate Output
|
|
114
|
+
|
|
115
|
+
```bash
|
|
116
|
+
EPIC_FILE="$EPIC_DIR/epic.md"
|
|
117
|
+
[ ! -f "$EPIC_FILE" ] && { echo "❌ Epic document not generated"; exit 1; }
|
|
118
|
+
|
|
119
|
+
# Count sub-specs
|
|
120
|
+
SUB_COUNT=$(grep -c "^### Spec [0-9]" "$EPIC_FILE" || true)
|
|
121
|
+
[ $SUB_COUNT -lt 3 ] && echo "⚠ Fewer than 3 sub-specs; may not be decomposed enough"
|
|
122
|
+
[ $SUB_COUNT -gt 10 ] && echo "⚠ More than 10 sub-specs; granularity may be too fine"
|
|
123
|
+
|
|
124
|
+
# Check mermaid graph
|
|
125
|
+
grep -q "mermaid" "$EPIC_FILE" || echo "✗ No mermaid dependency graph found"
|
|
126
|
+
|
|
127
|
+
# Check shared interfaces
|
|
128
|
+
grep -qi "shared interface" "$EPIC_FILE" || echo "✗ Shared interfaces not defined"
|
|
129
|
+
```
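
The same checks can be expressed as a pure function over the text of epic.md, which makes them easy to unit test. This is a sketch under the assumption that sub-specs appear as `### Spec N` headings, as the shell check expects; the function name `validate_epic` is hypothetical.

```python
import re
from typing import List


def validate_epic(markdown: str) -> List[str]:
    """Return human-readable problems found in an epic document (empty if none)."""
    problems = []
    # One match per heading line, like grep -c on the same pattern.
    sub_count = len(re.findall(r"^### Spec \d", markdown, flags=re.MULTILINE))
    if sub_count < 3:
        problems.append("fewer than 3 sub-specs; may not be decomposed enough")
    if sub_count > 10:
        problems.append("more than 10 sub-specs; granularity may be too fine")
    if "mermaid" not in markdown:
        problems.append("no mermaid dependency graph found")
    # Assumption: match case-insensitively so a "Shared Interfaces" heading counts.
    if "shared interface" not in markdown.lower():
        problems.append("shared interfaces not defined")
    return problems
```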
|
|
130
|
+
|
|
131
|
+
## Step 6: Show Results to User
|
|
132
|
+
|
|
133
|
+
```
|
|
134
|
+
✓ Epic decomposition complete: $EPIC_NAME
|
|
135
|
+
|
|
136
|
+
Files:
|
|
137
|
+
.flow/_epics/$EPIC_NAME/epic.md
|
|
138
|
+
.flow/_epics/$EPIC_NAME/.epic-state.json
|
|
139
|
+
|
|
140
|
+
Sub-specs: $SUB_COUNT (skeletons created)
|
|
141
|
+
|
|
142
|
+
Dependency graph: see epic.md
|
|
143
|
+
|
|
144
|
+
Recommended execution order:
|
|
145
|
+
Week 1: /curdx-flow:switch <sub-1> → /curdx-flow:spec → /curdx-flow:implement
|
|
146
|
+
Week 2: /curdx-flow:switch <sub-2> (parallel with sub-3 if independent)
|
|
147
|
+
Week 3: /curdx-flow:switch <sub-4> (depends on sub-1 being done)
|
|
148
|
+
...
|
|
149
|
+
|
|
150
|
+
Next steps:
|
|
151
|
+
1. Review .flow/_epics/$EPIC_NAME/epic.md to confirm the decomposition is reasonable
|
|
152
|
+
2. First sub-spec: /curdx-flow:switch <sub-1-name>
|
|
153
|
+
3. Then proceed with the normal /curdx-flow:spec → /curdx-flow:implement flow
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
## Error Recovery
|
|
157
|
+
|
|
158
|
+
- triage-analyst fails → the goal may be too abstract; refine the description manually and rerun
|
|
159
|
+
- Too many/too few sub-specs → use `--specs=N` to force a count
|
|
160
|
+
- Incomplete interface definitions → go back to epic.md and add interfaces manually, or rerun the agent to fill them in
|
|
@@ -0,0 +1,124 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: verify
|
|
3
|
+
description: goal reverse verification — trace back from FR/AC/AD to check whether the code actually implements them, detect stubs / fake completion. Dispatches flow-verifier.
|
|
4
|
+
argument-hint: "[spec-name]"
|
|
5
|
+
allowed-tools: [Read, Bash, Task, Grep, Glob]
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# Flow Verify — Goal Reverse Verification
|
|
9
|
+
|
|
10
|
+
Dispatch the `flow-verifier` agent to confirm, starting from the spec, that the code truly implements the requirements; do not trust any "done" claim.
|
|
11
|
+
|
|
12
|
+
## When to Use
|
|
13
|
+
|
|
14
|
+
- After `/curdx-flow:implement` completes
|
|
15
|
+
- Final gate before a PR
|
|
16
|
+
- When you suspect a feature is a fake implementation (stub/TODO)
|
|
17
|
+
|
|
18
|
+
## Step 1: Parse + Preflight Check
|
|
19
|
+
|
|
20
|
+
```bash
|
|
21
|
+
SPEC_NAME="${ARGUMENTS:-$(cat .flow/.active-spec 2>/dev/null)}"
|
|
22
|
+
[ -z "$SPEC_NAME" ] && { echo "❌ No active spec. Run /curdx-flow:switch or /curdx-flow:start first"; exit 1; }
|
|
23
|
+
|
|
24
|
+
DIR=".flow/specs/$SPEC_NAME"
|
|
25
|
+
for f in requirements.md design.md; do
|
|
26
|
+
[ ! -f "$DIR/$f" ] && { echo "❌ Missing $f. Complete /curdx-flow:requirements /curdx-flow:design first"; exit 1; }
|
|
27
|
+
done
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
## Step 2: Determine Scope
|
|
31
|
+
|
|
32
|
+
```bash
|
|
33
|
+
# Read the commit range for the execute phase
|
|
34
|
+
# From .state.json or git reflog
|
|
35
|
+
|
|
36
|
+
LAST_EXEC_START=$(python3 -c "
|
|
37
|
+
import json
|
|
38
|
+
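try:
    s = json.load(open('$DIR/.state.json'))
except (OSError, ValueError):
    # Missing or malformed state file: fall through to the main..HEAD default
    s = {}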
|
|
39
|
+
# Custom field or inferred from git
|
|
40
|
+
print(s.get('execute_state', {}).get('start_commit', ''))
|
|
41
|
+
")
|
|
42
|
+
|
|
43
|
+
# If unavailable, use main..HEAD
|
|
44
|
+
RANGE="${LAST_EXEC_START:-main}..HEAD"
|
|
45
|
+
echo "Verification scope: $RANGE"
|
|
46
|
+
```
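
The fallback rule (use the recorded start commit if present, otherwise compare against `main`) is easier to reason about as a small pure function. The sketch below takes the raw JSON text rather than a path; the name `commit_range` is hypothetical, and the `execute_state.start_commit` field is the one the script above reads, described there as a custom field.

```python
import json
from typing import Optional


def commit_range(state_json: Optional[str], default_base: str = "main") -> str:
    """Build the git range to verify from the text of .state.json.

    Falls back to default_base when the text is missing, malformed,
    or carries no execute_state.start_commit value.
    """
    start = ""
    if state_json:
        try:
            state = json.loads(state_json)
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            state = {}
        if isinstance(state, dict):
            execute_state = state.get("execute_state")
            if isinstance(execute_state, dict):
                start = str(execute_state.get("start_commit") or "")
    return f"{start or default_base}..HEAD"
```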
|
|
47
|
+
|
|
48
|
+
## Step 3: Dispatch flow-verifier
|
|
49
|
+
|
|
50
|
+
```
|
|
51
|
+
Task:
|
|
52
|
+
subagent_type: general-purpose
|
|
53
|
+
description: "verify $SPEC_NAME"
|
|
54
|
+
prompt: |
|
|
55
|
+
You are the flow-verifier agent. Full definition:
|
|
56
|
+
${CLAUDE_PLUGIN_ROOT}/agents/flow-verifier.md
|
|
57
|
+
|
|
58
|
+
Must read:
|
|
59
|
+
- .flow/specs/$SPEC_NAME/requirements.md
|
|
60
|
+
- .flow/specs/$SPEC_NAME/design.md
|
|
61
|
+
- .flow/specs/$SPEC_NAME/tasks.md
|
|
62
|
+
- .flow/specs/$SPEC_NAME/.state.json
|
|
63
|
+
- .flow/STATE.md
|
|
64
|
+
|
|
65
|
+
Verification scope: commits $RANGE
|
|
66
|
+
|
|
67
|
+
Tasks:
|
|
68
|
+
1. Extract every FR / AC / AD / Component / error-path assertion
|
|
69
|
+
2. Find evidence for each assertion (code + test + actual run)
|
|
70
|
+
3. Scan for stub patterns (TODO / Not implemented / return {})
|
|
71
|
+
4. Generate verification-report.md
|
|
72
|
+
5. Update phase_status.verify in .state.json
|
|
73
|
+
|
|
74
|
+
Output file:
|
|
75
|
+
.flow/specs/$SPEC_NAME/verification-report.md
|
|
76
|
+
|
|
77
|
+
Return a brief:
|
|
78
|
+
- Fully verified / partially verified / not verified counts
|
|
79
|
+
- Number of fake implementations
|
|
80
|
+
- List of blockers
|
|
81
|
+
- Suggested next step
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
## Step 4: Read Report + Decide
|
|
85
|
+
|
|
86
|
+
```bash
|
|
87
|
+
REPORT="$DIR/verification-report.md"
|
|
88
|
+
[ ! -f "$REPORT" ] && { echo "❌ verifier did not produce a report"; exit 1; }
|
|
89
|
+
|
|
90
|
+
# Parse the verdict from the report
|
|
91
|
+
VERIFIED=$(grep -c "^\- ✓" "$REPORT" || true)
|
|
92
|
+
PARTIAL=$(grep -c "^\- ⚠" "$REPORT" || true)
|
|
93
|
+
MISSING=$(grep -c "^\- ✗" "$REPORT" || true)
|
|
94
|
+
STUBS=$(grep -c "^\- 🚨" "$REPORT" || true)
|
|
95
|
+
```
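
Counting verdict lines is simple enough to sketch in Python as well. The marker-to-name mapping below follows the four symbols used in this step; the function name `count_verdict_lines` is hypothetical, and the sketch assumes, as the grep patterns do, that each verdict is a top-level list item beginning with its marker.

```python
from typing import Dict

# Marker symbol -> counter name, following the four grep patterns in this step.
MARKERS = {"✓": "verified", "⚠": "partial", "✗": "missing", "🚨": "stubs"}


def count_verdict_lines(report: str) -> Dict[str, int]:
    """Count top-level list items in the report by their leading marker."""
    counts = {name: 0 for name in MARKERS.values()}
    for line in report.splitlines():
        for marker, name in MARKERS.items():
            # Only unindented items count, matching the ^ anchor in grep.
            if line.startswith(f"- {marker}"):
                counts[name] += 1
    return counts
```

A report with no matching lines yields zero for every counter, so the result is always safe to compare numerically.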
|
|
96
|
+
|
|
97
|
+
## Step 5: Output to User
|
|
98
|
+
|
|
99
|
+
```
|
|
100
|
+
✓ Verify complete: $SPEC_NAME
|
|
101
|
+
|
|
102
|
+
Stats:
|
|
103
|
+
✓ Fully verified: $VERIFIED
|
|
104
|
+
⚠ Partially verified: $PARTIAL
|
|
105
|
+
✗ Not verified: $MISSING
|
|
106
|
+
🚨 Fake implementations: $STUBS
|
|
107
|
+
|
|
108
|
+
Report: .flow/specs/$SPEC_NAME/verification-report.md
|
|
109
|
+
|
|
110
|
+
Verdict:
|
|
111
|
+
$([ $MISSING -gt 0 ] && echo '❌ BLOCKED — unimplemented FR/AC/AD exist, return to /curdx-flow:implement to fill in')
|
|
112
|
+
$([ $STUBS -gt 0 ] && echo '❌ BLOCKED — fake implementations found')
|
|
113
|
+
$([ $MISSING -eq 0 ] && [ $STUBS -eq 0 ] && echo '✓ PASS — can proceed to /curdx-flow:review')
|
|
114
|
+
|
|
115
|
+
Next step:
|
|
116
|
+
$({ [ $MISSING -gt 0 ] || [ $STUBS -gt 0 ]; } && echo 'fix blockers → /curdx-flow:implement --task=<new task>')
|
|
117
|
+
$([ $MISSING -eq 0 ] && [ $STUBS -eq 0 ] && echo '/curdx-flow:review — enter code quality review')
|
|
118
|
+
```
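
The verdict depends on two counters, and the next step should follow from the verdict rather than from either counter alone. A minimal sketch of that decision, with the hypothetical name `decide_verdict`:

```python
from typing import List, Tuple


def decide_verdict(missing: int, stubs: int) -> Tuple[str, List[str], str]:
    """Return (verdict, reasons, next command) from the report counters."""
    reasons = []
    if missing > 0:
        reasons.append("unimplemented FR/AC/AD exist")
    if stubs > 0:
        reasons.append("fake implementations found")
    if reasons:
        # Either kind of blocker sends the work back to implementation.
        return "BLOCKED", reasons, "/curdx-flow:implement"
    return "PASS", [], "/curdx-flow:review"
```

Partially verified items do not block in this sketch, which matches the verdict rules shown above; whether they should is a policy choice left to the project.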
|
|
119
|
+
|
|
120
|
+
## Error Recovery
|
|
121
|
+
|
|
122
|
+
- verifier times out → reduce spec scope (verify only specific FRs), then rerun
|
|
123
|
+
- Some Verify commands are unavailable (e.g. require a DB connection) → mark "needs manual verification" in the report
|
|
124
|
+
- verifier output is vague → prompt to look at specific sections of the report
|
|
@@ -0,0 +1,219 @@
|
|
|
1
|
+
---
|
|
2
|
+
gate: adversarial-review-gate
|
|
3
|
+
category: enterprise-mode
|
|
4
|
+
severity: warning
|
|
5
|
+
depends_on: []
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# Adversarial Review Gate
|
|
9
|
+
|
|
10
|
+
> Derived from BMAD-METHOD's "Adversarial Review Prompts".
|
|
11
|
+
>
|
|
12
|
+
> **Core**: The reviewer must find problems. "Zero findings" triggers re-analysis. This forcibly breaks confirmation bias.
|
|
13
|
+
|
|
14
|
+
---
|
|
15
|
+
|
|
16
|
+
## Trigger Timing
|
|
17
|
+
|
|
18
|
+
- /curdx-flow:review command
|
|
19
|
+
- Before Phase transitions (requirements → design, design → tasks)
|
|
20
|
+
- Before code merge (/curdx-flow:ship)
|
|
21
|
+
- Enabled by default in Enterprise mode
|
|
22
|
+
|
|
23
|
+
---
|
|
24
|
+
|
|
25
|
+
## Core Rules
|
|
26
|
+
|
|
27
|
+
### Rule 1: Zero Findings Forbidden
|
|
28
|
+
|
|
29
|
+
A reviewer agent's output of "everything looks fine, no issues found" is an **invalid conclusion**.
|
|
30
|
+
|
|
31
|
+
**Reasons**:
|
|
32
|
+
- Real systems inevitably contain trade-offs, boundaries, and risks
|
|
33
|
+
- "Looks good" is usually confirmation bias (the agent only checked the obvious)
|
|
34
|
+
- AI tends to please the user ("great job!") — fight this tendency
|
|
35
|
+
|
|
36
|
+
**Forced actions**:
|
|
37
|
+
1. If the agent outputs "no issues", automatically trigger a second round
|
|
38
|
+
2. The second round requires the agent to perform deeper analysis via sequential-thinking
|
|
39
|
+
3. If both rounds yield no findings, the agent must **prove** it checked:
|
|
40
|
+
- List the dimensions examined (at least 5)
|
|
41
|
+
- For each dimension, give the specific code/file locations inspected
|
|
42
|
+
- Provide counterfactual hypotheses of "what it would look like if there were a problem"
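
The three proof requirements can be checked mechanically before a "no findings" conclusion is accepted. The following is a sketch only; `DimensionCheck` and `proof_gaps` are hypothetical names, and the check cannot judge whether a counterfactual is any good, only that one was supplied.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DimensionCheck:
    dimension: str        # e.g. "Security"
    locations: List[str]  # code/file locations actually inspected
    counterfactual: str   # what a problem would have looked like


def proof_gaps(checks: List[DimensionCheck], minimum_dimensions: int = 5) -> List[str]:
    """List what is still missing from a zero-findings proof (empty if complete)."""
    gaps = []
    distinct = {c.dimension.strip().lower() for c in checks if c.dimension.strip()}
    if len(distinct) < minimum_dimensions:
        gaps.append(f"only {len(distinct)} dimensions listed; at least {minimum_dimensions} required")
    for check in checks:
        if not check.locations:
            gaps.append(f"{check.dimension}: no code/file locations given")
        if not check.counterfactual.strip():
            gaps.append(f"{check.dimension}: no counterfactual hypothesis given")
    return gaps
```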
|
|
43
|
+
|
|
44
|
+
---
|
|
45
|
+
|
|
46
|
+
### Rule 2: Findings in at Least 3 Categories
|
|
47
|
+
|
|
48
|
+
A complete adversarial review must cover (find issues in at least 3 of these categories):
|
|
49
|
+
|
|
50
|
+
1. **Architecture layer**: Are decisions sound? Future-extensible? Lock-in risks?
|
|
51
|
+
2. **Implementation layer**: Code quality? Error handling? Performance?
|
|
52
|
+
3. **Test layer**: Coverage? Edge cases? Over-mocking?
|
|
53
|
+
4. **Security layer**: Injection risks? Permission checks? Sensitive data leakage?
|
|
54
|
+
5. **Maintainability layer**: Documentation? Naming? Readability?
|
|
55
|
+
6. **User experience layer**: Error messages? Loading states? Accessibility?
|
|
56
|
+
|
|
57
|
+
---
|
|
58
|
+
|
|
59
|
+
### Rule 3: Findings Must Have Evidence + Recommendation
|
|
60
|
+
|
|
61
|
+
Format for each finding:
|
|
62
|
+
|
|
63
|
+
```markdown
|
|
64
|
+
### [Category] Issue Title
|
|
65
|
+
|
|
66
|
+
**Location**: src/auth/login.ts:42
|
|
67
|
+
|
|
68
|
+
**Observation**: <what was specifically seen>
|
|
69
|
+
|
|
70
|
+
**Risk**: <what problem will this cause? High/Medium/Low?>
|
|
71
|
+
|
|
72
|
+
**Evidence**: <code snippet / test output / scenario>
|
|
73
|
+
|
|
74
|
+
**Recommendation**: <specifically how to fix>
|
|
75
|
+
```
|
|
76
|
+
|
|
77
|
+
Not allowed:
|
|
78
|
+
- ✗ "The code could be better" (too vague)
|
|
79
|
+
- ✗ "I think... maybe..." (no specific basis)
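
A structured record makes the format enforceable: every field must be filled in and the risk must name a level. The sketch below is illustrative; the `Finding` class and its methods are hypothetical, and it can only catch empty fields, not vague wording.

```python
from dataclasses import dataclass, fields
from typing import List

RISK_LEVELS = ("High", "Medium", "Low")


@dataclass
class Finding:
    category: str
    title: str
    location: str
    observation: str
    risk: str
    evidence: str
    recommendation: str

    def problems(self) -> List[str]:
        """Report empty fields and a risk that does not start with a level."""
        issues = [f"{f.name} is empty" for f in fields(self) if not getattr(self, f.name).strip()]
        if self.risk.strip() and not self.risk.startswith(RISK_LEVELS):
            issues.append("risk must start with High, Medium, or Low")
        return issues

    def render(self) -> str:
        """Render the finding in the markdown layout shown above."""
        return "\n\n".join([
            f"### [{self.category}] {self.title}",
            f"**Location**: {self.location}",
            f"**Observation**: {self.observation}",
            f"**Risk**: {self.risk}",
            f"**Evidence**: {self.evidence}",
            f"**Recommendation**: {self.recommendation}",
        ])
```

A finding such as "The code could be better" with no location or evidence fails `problems()` on every remaining field, which is the rejection the rule asks for.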
|
|
80
|
+
|
|
81
|
+
---
|
|
82
|
+
|
|
83
|
+
## Execution Flow
|
|
84
|
+
|
|
85
|
+
```
|
|
86
|
+
Input: object under review (code range / spec / PR diff)
|
|
87
|
+
↓
|
|
88
|
+
Round 1 (agent self-analysis):
|
|
89
|
+
- Use sequential-thinking ≥ 6 rounds
|
|
90
|
+
- Scan all 6 categories
|
|
91
|
+
- Output findings list
|
|
92
|
+
↓
|
|
93
|
+
Decision:
|
|
94
|
+
- Findings ≥ 3? → output report
|
|
95
|
+
- Findings < 3? → force Round 2
|
|
96
|
+
↓
|
|
97
|
+
Round 2 (deep analysis):
|
|
98
|
+
- sequential-thinking for another 6 rounds
|
|
99
|
+
- Focus on "seemingly no issues" parts (trust but verify)
|
|
100
|
+
- May introduce external perspectives (read issues from similar projects)
|
|
101
|
+
↓
|
|
102
|
+
Decision:
|
|
103
|
+
- Still < 3? → agent must explicitly prove it checked
|
|
104
|
+
- Otherwise → output report
|
|
105
|
+
↓
|
|
106
|
+
Output: review-report.md
|
|
107
|
+
```
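
The two-round flow reduces to a short loop with a threshold. In the sketch below the review itself is injected as a callable so the control flow can be tested without an agent; `run_adversarial_review` and the `review_round` parameter are hypothetical names, and findings are plain strings for brevity.

```python
from typing import Callable, Dict, List


def run_adversarial_review(review_round: Callable[[int], List[str]],
                           min_findings: int = 3) -> Dict[str, object]:
    """Run round 1, force round 2 if findings fall short, and flag when proof is needed."""
    findings: List[str] = []
    rounds = 0
    for round_no in (1, 2):
        rounds = round_no
        for finding in review_round(round_no):
            if finding not in findings:  # do not double count repeats from round 2
                findings.append(finding)
        if len(findings) >= min_findings:
            break
    return {
        "rounds": rounds,
        "findings": findings,
        # Still short after two rounds: the agent must prove what it checked.
        "needs_proof": len(findings) < min_findings,
    }
```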
|
|
108
|
+
|
|
109
|
+
---
|
|
110
|
+
|
|
111
|
+
## Typical Finding Patterns (for agent reference)
|
|
112
|
+
|
|
113
|
+
### Architecture
|
|
114
|
+
- "New component X depends on Y, but Y is not mentioned in design.md → introduces implicit coupling"
|
|
115
|
+
- "AD-03 chooses JWT, but does not handle token revocation → logging out does not actually invalidate the token"
|
|
116
|
+
|
|
117
|
+
### Implementation
|
|
118
|
+
- "src/auth/login.ts:42 catches all exceptions and returns 500, but bcrypt timeout should return 429"
|
|
119
|
+
- "Verify in tasks.md is `npm test`, no specific test file specified → may have run unrelated tests"
|
|
120
|
+
|
|
121
|
+
### Testing
|
|
122
|
+
- "login.test.ts has 5 tests, but AC-1.3 (empty password) is not covered"
|
|
123
|
+
- "Tests use `jest.fn()` to mock DB; actual DB constraints are not tested → may fail in production"
|
|
124
|
+
|
|
125
|
+
### Security
|
|
126
|
+
- "Error message returns `User not found`, which lets an attacker enumerate which accounts exist"
|
|
127
|
+
- "JWT secret comes from process.env with no fallback → crashes when environment variable is missing"
|
|
128
|
+
|
|
129
|
+
---
|
|
130
|
+
|
|
131
|
+
## Output Format
|
|
132
|
+
|
|
133
|
+
```markdown
|
|
134
|
+
## Adversarial Review Report
|
|
135
|
+
|
|
136
|
+
Object under review: auth-system spec (commits abc..xyz)
|
|
137
|
+
Review time: 2026-04-19
|
|
138
|
+
Review rounds: 2 (Round 1 found 2, triggered Round 2)
|
|
139
|
+
|
|
140
|
+
## Total findings: 5
|
|
141
|
+
|
|
142
|
+
### [Architecture] Token revocation not designed
|
|
143
|
+
Location: design.md AD-01
|
|
144
|
+
Observation: stateless JWT was chosen, but requirements FR-04 requires "user logout immediately invalidates other sessions"
|
|
145
|
+
Risk: High — violates an explicit requirement
|
|
146
|
+
Evidence: AD-01 only says "use JWT", does not say how to revoke
|
|
147
|
+
Recommendation: Add AD-06 explaining the use of Redis blacklist + check-on-each-request
|
|
148
|
+
|
|
149
|
+
### [Implementation] bcrypt error handling missing
|
|
150
|
+
Location: src/auth/login.ts:58
|
|
151
|
+
Observation: await bcrypt.compare() has no try-catch
|
|
152
|
+
Risk: Medium — bcrypt crash will cause 500 exposing internal stack
|
|
153
|
+
Evidence: (code snippet omitted)
|
|
154
|
+
Recommendation: wrap in try-catch, return 401
|
|
155
|
+
|
|
156
|
+
### [Test] AC-1.3 not covered
|
|
157
|
+
Location: login.test.ts
|
|
158
|
+
Observation: requirements AC-1.3 requires "empty password returns 400", but there is no corresponding test
|
|
159
|
+
Risk: Medium — regression risk
|
|
160
|
+
Evidence: grep "empty password" → 0 matches
|
|
161
|
+
Recommendation: add test("rejects empty password", ...)
|
|
162
|
+
|
|
163
|
+
### [Security] User enumeration leak
|
|
164
|
+
Location: src/auth/login.ts:45-50
|
|
165
|
+
Observation: email not found → "User not found"; email found but password wrong → "Wrong password"
|
|
166
|
+
Risk: High — can be used to enumerate registered emails
|
|
167
|
+
Evidence: (code snippet)
|
|
168
|
+
Recommendation: unify error message to "Invalid credentials"
|
|
169
|
+
|
|
170
|
+
### [Maintainability] Inconsistent log format
|
|
171
|
+
Location: src/auth/login.ts:42 vs src/auth/logout.ts:18
|
|
172
|
+
Observation: one uses `console.log`, the other uses `logger.info`
|
|
173
|
+
Risk: Low — does not affect functionality, but future monitoring will be messy
|
|
174
|
+
Evidence: grep log → 2 patterns
|
|
175
|
+
Recommendation: unify to logger.info
|
|
176
|
+
|
|
177
|
+
## Summary
|
|
178
|
+
|
|
179
|
+
Blockers: 2 ([Architecture] token revocation, [Security] user enumeration)
|
|
180
|
+
Warnings: 3
|
|
181
|
+
|
|
182
|
+
Fix loop:
|
|
183
|
+
1. Dispatch flow-executor to fix security + architecture (high priority)
|
|
184
|
+
2. Add tests
|
|
185
|
+
3. Unify log
|
|
186
|
+
4. Re-run /curdx-flow:review for re-review
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
---
|
|
190
|
+
|
|
191
|
+
## Failure Recovery
|
|
192
|
+
|
|
193
|
+
If after 2 rounds there are still < 3 findings:
|
|
194
|
+
|
|
195
|
+
```markdown
|
|
196
|
+
## Adversarial Review — Insufficient Findings
|
|
197
|
+
|
|
198
|
+
I have examined the following dimensions across 2 rounds of analysis:
|
|
199
|
+
|
|
200
|
+
1. Architecture: checked AD-01~04 and corresponding implementations, no obvious issues found
|
|
201
|
+
2. Implementation: checked src/auth/*.ts totaling 342 lines, no obvious issues found
|
|
202
|
+
3. Tests: checked 15 tests, covered all ACs, no gaps found
|
|
203
|
+
4. Security: checked input validation, error messages, secret management, no issues found
|
|
204
|
+
5. Maintainability: naming and structure are consistent
|
|
205
|
+
|
|
206
|
+
⚠ Note: this does not mean there are truly no issues. It may be that:
|
|
207
|
+
- the object under review is indeed high quality
|
|
208
|
+
- or my review capability has blind spots (e.g., specific domains)
|
|
209
|
+
- or there are hidden issues that will only surface at runtime
|
|
210
|
+
|
|
211
|
+
Recommendations:
|
|
212
|
+
- Human review (at least walk through the diff once)
|
|
213
|
+
- Consider /curdx-flow:qa for real browser/integration testing (Phase 5+)
|
|
214
|
+
- Deploy to staging and observe behavior there
|
|
215
|
+
```
|
|
216
|
+
|
|
217
|
+
---
|
|
218
|
+
|
|
219
|
+
_Source: BMAD-METHOD's adversarial review prompts._
|
|
@@ -0,0 +1,184 @@
|
|
|
1
|
+
---
|
|
2
|
+
gate: coverage-audit-gate
|
|
3
|
+
category: standard-mode
|
|
4
|
+
severity: blocking
|
|
5
|
+
depends_on: []
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# Coverage Audit Gate — Multi-Source Coverage Audit
|
|
9
|
+
|
|
10
|
+
> Derived from get-shit-done's "Multi-Source Coverage Audit".
|
|
11
|
+
>
|
|
12
|
+
> Rule: every claim in the spec must be covered by implementation/tests. Uncovered = missed = future bug.
|
|
13
|
+
|
|
14
|
+
---
|
|
15
|
+
|
|
16
|
+
## Trigger Timing
|
|
17
|
+
|
|
18
|
+
- End of the tasks phase (last step of flow-planner)
|
|
19
|
+
- Before the execution phase completes (when /curdx-flow:verify runs)
|
|
20
|
+
- Explicitly requested by /curdx-flow:audit
|
|
21
|
+
|
|
22
|
+
---
|
|
23
|
+
|
|
24
|
+
## Audit Sources (4 categories)
|
|
25
|
+
|
|
26
|
+
### Source 1: Requirements (FR + AC)
|
|
27
|
+
|
|
28
|
+
**Source**: `requirements.md`
|
|
29
|
+
|
|
30
|
+
**Checks**:
|
|
31
|
+
- Does every FR-NN have a corresponding task in tasks.md?
|
|
32
|
+
- Does every AC-X.Y have a corresponding test in code?
|
|
33
|
+
|
|
34
|
+
**Missing handling**:
|
|
35
|
+
- Uncovered FR → block, must add task or exempt
|
|
36
|
+
- Uncovered AC → warning, recommend adding test
|
|
37
|
+
|
|
38
|
+
---
|
|
39
|
+
|
|
40
|
+
### Source 2: Design (AD + Components)
|
|
41
|
+
|
|
42
|
+
**Source**: `design.md`
|
|
43
|
+
|
|
44
|
+
**Checks**:
|
|
45
|
+
- Does every AD-NN have a corresponding implementation task in tasks.md?
|
|
46
|
+
- Does every Component have a skeleton task + core logic task in tasks.md?
|
|
47
|
+
- Does every error path have a corresponding scenario in tests?
|
|
48
|
+
|
|
49
|
+
**Missing handling**:
|
|
50
|
+
- Uncovered AD → block (an architecture decision not landed = design failure)
|
|
51
|
+
- Uncovered error path → warning
|
|
52
|
+
|
|
53
|
+
---
|
|
54
|
+
|
|
55
|
+
### Source 3: Research recommendations
|
|
56
|
+
|
|
57
|
+
**Source**: the "recommendations" section of `research.md`
|
|
58
|
+
|
|
59
|
+
**Checks**:
|
|
60
|
+
- Are the technical approaches recommended by research implemented in design.md?
|
|
61
|
+
- Are the pitfalls found by research avoided in the implementation?
|
|
62
|
+
|
|
63
|
+
**Missing handling**:
|
|
64
|
+
- Recommended direction not adopted → warning (design.md must explain why not)
|
|
65
|
+
- Pitfall not avoided → block
|
|
66
|
+
|
|
67
|
+
---
|
|
68
|
+
|
|
69
|
+
### Source 4: Project-level decisions (D-NN)
|
|
70
|
+
|
|
71
|
+
**Source**: the decisions array in `.flow/STATE.md`
|
|
72
|
+
|
|
73
|
+
**Checks**:
|
|
74
|
+
- Which D-NN are involved in this spec?
|
|
75
|
+
- Is each relevant D-NN referenced in design.md / tasks.md?
|
|
76
|
+
- Does the implementation comply with the decision?
|
|
77
|
+
|
|
78
|
+
**Missing handling**:
|
|
79
|
+
- Violates D-NN → block, must either return to the design phase to challenge the decision or modify the implementation
|
|
80
|
+
- Not referenced but actually compliant → warning (prompt to add reference)
|
|
81
|
+
|
|
82
|
+
---
|
|
83
|
+
|
|
84
|
+
## Execution Flow
|
|
85
|
+
|
|
86
|
+
```python
|
|
87
|
+
# pseudocode
|
|
88
|
+
def audit(spec_name):
|
|
89
|
+
req = parse_requirements(spec_name)
|
|
90
|
+
design = parse_design(spec_name)
|
|
91
|
+
research = parse_research(spec_name)
|
|
92
|
+
state = parse_global_state()
|
|
93
|
+
tasks = parse_tasks(spec_name)
|
|
94
|
+
commits = git_log_for_spec(spec_name)
|
|
95
|
+
|
|
96
|
+
missing = []
|
|
97
|
+
|
|
98
|
+
# Source 1
|
|
99
|
+
for fr in req.functional_requirements:
|
|
100
|
+
if not any(fr.id in t.requirements_ref for t in tasks):
|
|
101
|
+
missing.append(("FR", fr.id, "no task covers"))
|
|
102
|
+
|
|
103
|
+
for ac in req.acceptance_criteria:
|
|
104
|
+
if not any(ac.id in c.body for c in commits):
|
|
105
|
+
missing.append(("AC", ac.id, "no test covers"))
|
|
106
|
+
|
|
107
|
+
# Source 2
|
|
108
|
+
for ad in design.architecture_decisions:
|
|
109
|
+
if not any(ad.id in t.design_ref for t in tasks):
|
|
110
|
+
missing.append(("AD", ad.id, "no task implements"))
|
|
111
|
+
|
|
112
|
+
# Source 3
|
|
113
|
+
for rec in research.recommendations:
|
|
114
|
+
if rec not in design.chosen_approaches:
|
|
115
|
+
missing.append(("Research", rec.summary, "not adopted, no rationale"))
|
|
116
|
+
|
|
117
|
+
# Source 4
|
|
118
|
+
relevant_decisions = [d for d in state.decisions if spec_touches(d)]
|
|
119
|
+
for d in relevant_decisions:
|
|
120
|
+
if not any(d.id in text for text in [design.content, tasks.content]):
|
|
121
|
+
missing.append(("D", d.id, "not referenced"))
|
|
122
|
+
|
|
123
|
+
return missing
|
|
124
|
+
```
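
The pseudocode relies on parsers that are not shown. A runnable reduction of its first two sources, using in-memory data instead of parsed files, might look like this; the `Task` class and `uncovered` function are hypothetical stand-ins, and IDs are matched exactly rather than by substring.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple


@dataclass
class Task:
    task_id: str
    requirements_ref: Set[str] = field(default_factory=set)  # FR ids the task covers
    design_ref: Set[str] = field(default_factory=set)        # AD ids the task implements


def uncovered(fr_ids: Iterable[str], ad_ids: Iterable[str],
              tasks: List[Task]) -> List[Tuple[str, str, str]]:
    """Return (kind, id, reason) for every FR or AD that no task references."""
    covered_fr = set().union(*(t.requirements_ref for t in tasks))
    covered_ad = set().union(*(t.design_ref for t in tasks))
    missing = []
    for fr in fr_ids:
        if fr not in covered_fr:
            missing.append(("FR", fr, "no task covers"))
    for ad in ad_ids:
        if ad not in covered_ad:
            missing.append(("AD", ad, "no task implements"))
    return missing
```

Exact matching avoids a pitfall of substring checks, where `FR-1` would appear covered by a task that only references `FR-10`.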
|
|
125
|
+
|
|
126
|
+
---
|
|
127
|
+
|
|
128
|
+
## Violation Levels
|
|
129
|
+
|
|
130
|
+
| Missing Type | Level | Action |
|
|
131
|
+
|---------|------|------|
|
|
132
|
+
| FR uncovered | **block** | add task |
|
|
133
|
+
| AC uncovered | warning | add test (recommended) |
|
|
134
|
+
| AD uncovered | **block** | add implementation task |
|
|
135
|
+
| Component uncovered | **block** | add skeleton + logic tasks |
|
|
136
|
+
| Error path uncovered | warning | add error test |
|
|
137
|
+
| Research recommendation not adopted | warning | add rejection rationale to design.md |
|
|
138
|
+
| Project decision violated | **block** | challenge decision or fix implementation |
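
The table translates directly into a lookup that decides whether the gate passes. This is a sketch: the `LEVELS` mapping copies the rows above, while `summarize` and the choice to treat unknown types as blocking are assumptions made for illustration.

```python
from typing import Dict, List

# Missing type -> level, copied from the table above.
LEVELS = {
    "FR uncovered": "block",
    "AC uncovered": "warning",
    "AD uncovered": "block",
    "Component uncovered": "block",
    "Error path uncovered": "warning",
    "Research recommendation not adopted": "warning",
    "Project decision violated": "block",
}


def summarize(missing_types: List[str]) -> Dict[str, object]:
    """Count blockers and warnings; the gate passes only with zero blockers."""
    # Assumption: an unrecognized type is treated as blocking, the stricter choice.
    levels = [LEVELS.get(kind, "block") for kind in missing_types]
    blockers = levels.count("block")
    return {
        "blockers": blockers,
        "warnings": levels.count("warning"),
        "gate_passes": blockers == 0,
    }
```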
|
|
139
|
+
|
|
140
|
+
---
|
|
141
|
+
|
|
142
|
+
## Output Format
|
|
143
|
+
|
|
144
|
+
```markdown
|
|
145
|
+
## Coverage Audit Report
|
|
146
|
+
|
|
147
|
+
Spec: auth-system
|
|
148
|
+
Audit time: 2026-04-19
|
|
149
|
+
|
|
150
|
+
### Source 1: Requirements
|
|
151
|
+
- FR-01 login: ✓ tasks 1.1, 1.2
|
|
152
|
+
- FR-02 logout: ✓ tasks 2.1
|
|
153
|
+
- FR-03 refresh token: ✗ **uncovered**
|
|
154
|
+
- AC-1.1: ✓ test in tasks 3.1
|
|
155
|
+
- AC-1.2: ⚠ no corresponding test
|
|
156
|
+
- AC-2.1: ✓ test in tasks 3.2
|
|
157
|
+
|
|
158
|
+
### Source 2: Design
|
|
159
|
+
- AD-01 JWT vs Session: ✓ tasks 1.1
|
|
160
|
+
- AD-02 bcrypt cost 12: ✓ tasks 1.2
|
|
161
|
+
- AD-03 refresh rotation: ✗ **uncovered** (maps to FR-03)
|
|
162
|
+
- Component TokenManager: ✗ **uncovered**
|
|
163
|
+
- Error path: network fail: ⚠ no test
|
|
164
|
+
|
|
165
|
+
### Source 3: Research
|
|
166
|
+
- Recommendation: use Redis to store blacklist → ✓ adopted in design
|
|
167
|
+
- Pitfall: bcrypt cost > 15 will be slow → ✓ design limits ≤ 12
|
|
168
|
+
|
|
169
|
+
### Source 4: Decisions
|
|
170
|
+
- D-07 session storage → ✓ referenced by AD-01
|
|
171
|
+
- D-12 error log format → ⚠ not mentioned in design/tasks
|
|
172
|
+
|
|
173
|
+
### Summary
|
|
174
|
+
Blockers: 3 (FR-03, AD-03, Component TokenManager)
|
|
175
|
+
Warnings: 3
|
|
176
|
+
|
|
177
|
+
Fix recommendations:
|
|
178
|
+
1. Add tasks to cover FR-03 / AD-03 / TokenManager, or
|
|
179
|
+
2. Explicitly exempt in STATE.md (defer to next spec)
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
---
|
|
183
|
+
|
|
184
|
+
_Source: get-shit-done's multi-source coverage audit._
|