@curdx/flow 3.0.0 → 3.2.0
- package/CHANGELOG.md +33 -82
- package/LICENSE +1 -1
- package/README.md +28 -129
- package/dist/index.mjs +1165 -0
- package/package.json +33 -44
- package/.claude-plugin/marketplace.json +0 -48
- package/.claude-plugin/plugin.json +0 -52
- package/agent-preamble/preamble.md +0 -314
- package/agents/flow-adversary.md +0 -203
- package/agents/flow-architect.md +0 -198
- package/agents/flow-brownfield-analyst.md +0 -143
- package/agents/flow-debugger.md +0 -321
- package/agents/flow-edge-hunter.md +0 -289
- package/agents/flow-executor.md +0 -269
- package/agents/flow-orchestrator.md +0 -145
- package/agents/flow-planner.md +0 -247
- package/agents/flow-product-designer.md +0 -159
- package/agents/flow-qa-engineer.md +0 -282
- package/agents/flow-researcher.md +0 -166
- package/agents/flow-reviewer.md +0 -304
- package/agents/flow-security-auditor.md +0 -401
- package/agents/flow-triage-analyst.md +0 -272
- package/agents/flow-ui-researcher.md +0 -230
- package/agents/flow-ux-designer.md +0 -221
- package/agents/flow-verifier.md +0 -350
- package/bin/curdx-flow +0 -5
- package/bin/curdx-flow-state +0 -104
- package/bin/curdx-flow.js +0 -54
- package/cli/README.md +0 -104
- package/cli/doctor-workflow.js +0 -483
- package/cli/doctor.js +0 -73
- package/cli/help.js +0 -59
- package/cli/install-bundled-mcps.js +0 -37
- package/cli/install-companions.js +0 -19
- package/cli/install-context7-config.js +0 -80
- package/cli/install-curdx-plugin.js +0 -96
- package/cli/install-language.js +0 -35
- package/cli/install-next-steps.js +0 -29
- package/cli/install-options.js +0 -9
- package/cli/install-paths.js +0 -52
- package/cli/install-recommended-plugins.js +0 -104
- package/cli/install-required-plugins.js +0 -57
- package/cli/install-self-update.js +0 -62
- package/cli/install-workflow.js +0 -209
- package/cli/install.js +0 -101
- package/cli/lib/claude-commands.js +0 -41
- package/cli/lib/claude-ops.js +0 -47
- package/cli/lib/claude.js +0 -183
- package/cli/lib/config.js +0 -24
- package/cli/lib/doctor-claude-settings.js +0 -1186
- package/cli/lib/doctor-report.js +0 -978
- package/cli/lib/doctor-runtime-environment.js +0 -196
- package/cli/lib/frontmatter.js +0 -44
- package/cli/lib/json-schema.js +0 -57
- package/cli/lib/logging.js +0 -25
- package/cli/lib/process.js +0 -60
- package/cli/lib/prompts.js +0 -135
- package/cli/lib/runtime.js +0 -107
- package/cli/lib/semver.js +0 -109
- package/cli/lib/version.js +0 -12
- package/cli/protocols-body.md +0 -22
- package/cli/protocols.js +0 -162
- package/cli/registry.js +0 -123
- package/cli/router.js +0 -49
- package/cli/uninstall-actions.js +0 -360
- package/cli/uninstall-workflow.js +0 -146
- package/cli/uninstall.js +0 -42
- package/cli/upgrade-workflow.js +0 -80
- package/cli/upgrade.js +0 -91
- package/cli/utils.js +0 -40
- package/gates/adversarial-review-gate.md +0 -219
- package/gates/coverage-audit-gate.md +0 -182
- package/gates/devex-gate.md +0 -254
- package/gates/edge-case-gate.md +0 -194
- package/gates/karpathy-gate.md +0 -130
- package/gates/security-gate.md +0 -218
- package/gates/tdd-gate.md +0 -182
- package/gates/test-quality-gate.md +0 -59
- package/gates/verification-gate.md +0 -179
- package/hooks/hooks.json +0 -130
- package/hooks/scripts/common.sh +0 -237
- package/hooks/scripts/config-change-guard.sh +0 -94
- package/hooks/scripts/flow-context-watch.sh +0 -94
- package/hooks/scripts/inject-karpathy.sh +0 -53
- package/hooks/scripts/quick-mode-guard.sh +0 -69
- package/hooks/scripts/session-start.sh +0 -94
- package/hooks/scripts/session-title.sh +0 -87
- package/hooks/scripts/stop-watcher.sh +0 -231
- package/hooks/scripts/subagent-artifact-guard.sh +0 -92
- package/hooks/scripts/subagent-statusline.sh +0 -111
- package/hooks/scripts/task-lifecycle-guard.sh +0 -106
- package/hooks/scripts/teammate-idle-guard.sh +0 -83
- package/knowledge/artifact-output-discipline.md +0 -24
- package/knowledge/artifact-summary-contracts.md +0 -50
- package/knowledge/atomic-commits.md +0 -262
- package/knowledge/claude-code-runtime-contracts.md +0 -240
- package/knowledge/epic-decomposition.md +0 -307
- package/knowledge/execution-strategies.md +0 -303
- package/knowledge/karpathy-guidelines.md +0 -219
- package/knowledge/planning-reviews.md +0 -211
- package/knowledge/poc-first-workflow.md +0 -223
- package/knowledge/review-feedback-intake.md +0 -57
- package/knowledge/spec-driven-development.md +0 -180
- package/knowledge/systematic-debugging.md +0 -378
- package/knowledge/two-stage-review.md +0 -249
- package/knowledge/wave-execution.md +0 -403
- package/monitors/monitors.json +0 -8
- package/monitors/scripts/flow-state-monitor.sh +0 -102
- package/output-styles/curdx-evidence-first.md +0 -34
- package/output-styles/curdx-fast-mode.md +0 -42
- package/output-styles/curdx-spec-mode.md +0 -46
- package/schemas/agent-frontmatter.schema.json +0 -66
- package/schemas/config.schema.json +0 -134
- package/schemas/gate-frontmatter.schema.json +0 -30
- package/schemas/hooks.schema.json +0 -115
- package/schemas/output-style-frontmatter.schema.json +0 -22
- package/schemas/plugin-manifest.schema.json +0 -436
- package/schemas/plugin-settings.schema.json +0 -29
- package/schemas/skill-frontmatter.schema.json +0 -177
- package/schemas/spec-frontmatter.schema.json +0 -42
- package/schemas/spec-state.schema.json +0 -165
- package/settings.json +0 -8
- package/skills/brownfield-index/SKILL.md +0 -53
- package/skills/brownfield-index/references/applicability.md +0 -12
- package/skills/brownfield-index/references/handoff.md +0 -8
- package/skills/brownfield-index/references/index-contract.md +0 -10
- package/skills/browser-qa/SKILL.md +0 -39
- package/skills/browser-qa/references/handoff.md +0 -6
- package/skills/browser-qa/references/prerequisites.md +0 -10
- package/skills/browser-qa/references/qa-contract.md +0 -20
- package/skills/cancel/SKILL.md +0 -41
- package/skills/cancel/references/destructive-mode.md +0 -17
- package/skills/cancel/references/reporting.md +0 -18
- package/skills/cancel/references/state-recovery.md +0 -30
- package/skills/cancel/references/target-resolution.md +0 -7
- package/skills/debug/SKILL.md +0 -45
- package/skills/debug/references/context-gathering.md +0 -11
- package/skills/debug/references/failure-guard.md +0 -25
- package/skills/debug/references/intake.md +0 -12
- package/skills/debug/references/phase-workflow.md +0 -34
- package/skills/debug/references/reporting.md +0 -20
- package/skills/epic/SKILL.md +0 -39
- package/skills/epic/references/epic-artifacts.md +0 -20
- package/skills/epic/references/epic-intake.md +0 -9
- package/skills/epic/references/slice-handoff.md +0 -16
- package/skills/fast/SKILL.md +0 -62
- package/skills/fast/references/applicability.md +0 -25
- package/skills/fast/references/clarification.md +0 -20
- package/skills/fast/references/execution-contract.md +0 -56
- package/skills/help/SKILL.md +0 -55
- package/skills/help/references/dispatch.md +0 -20
- package/skills/help/references/overview.md +0 -39
- package/skills/help/references/troubleshoot.md +0 -47
- package/skills/help/references/workflow.md +0 -37
- package/skills/implement/SKILL.md +0 -104
- package/skills/implement/references/error-recovery.md +0 -36
- package/skills/implement/references/linear-execution.md +0 -43
- package/skills/implement/references/native-task-sync.md +0 -107
- package/skills/implement/references/preflight.md +0 -43
- package/skills/implement/references/progress-contract.md +0 -36
- package/skills/implement/references/state-init.md +0 -36
- package/skills/implement/references/stop-hook-execution.md +0 -50
- package/skills/implement/references/strategy-router.md +0 -38
- package/skills/implement/references/subagent-execution.md +0 -57
- package/skills/implement/references/wave-execution.md +0 -180
- package/skills/init/SKILL.md +0 -49
- package/skills/init/references/gitignore-and-health.md +0 -26
- package/skills/init/references/next-steps.md +0 -22
- package/skills/init/references/preflight.md +0 -15
- package/skills/init/references/scaffold-contract.md +0 -27
- package/skills/review/SKILL.md +0 -82
- package/skills/review/references/optional-passes.md +0 -48
- package/skills/review/references/preflight.md +0 -38
- package/skills/review/references/report-contract.md +0 -49
- package/skills/review/references/reporting.md +0 -20
- package/skills/review/references/stage-execution.md +0 -32
- package/skills/security-audit/SKILL.md +0 -47
- package/skills/security-audit/references/audit-contract.md +0 -21
- package/skills/security-audit/references/gate-handoff.md +0 -8
- package/skills/security-audit/references/scope-and-depth.md +0 -9
- package/skills/spec/SKILL.md +0 -100
- package/skills/spec/references/artifact-landing.md +0 -31
- package/skills/spec/references/phase-execution.md +0 -50
- package/skills/spec/references/planning-review.md +0 -31
- package/skills/spec/references/preflight-and-routing.md +0 -46
- package/skills/spec/references/reporting.md +0 -21
- package/skills/start/SKILL.md +0 -84
- package/skills/start/references/branch-routing.md +0 -51
- package/skills/start/references/mode-semantics.md +0 -12
- package/skills/start/references/preflight.md +0 -13
- package/skills/start/references/reporting.md +0 -20
- package/skills/start/references/state-seeding.md +0 -44
- package/skills/start/references/workflow-handoff.md +0 -26
- package/skills/status/SKILL.md +0 -41
- package/skills/status/references/gather-contract.md +0 -30
- package/skills/status/references/health-rules.md +0 -27
- package/skills/status/references/output-contract.md +0 -25
- package/skills/status/references/preflight.md +0 -10
- package/skills/status/references/recovery-hints.md +0 -18
- package/skills/ui-sketch/SKILL.md +0 -39
- package/skills/ui-sketch/references/brief-intake.md +0 -10
- package/skills/ui-sketch/references/iteration-handoff.md +0 -5
- package/skills/ui-sketch/references/variant-contract.md +0 -15
- package/skills/verify/SKILL.md +0 -56
- package/skills/verify/references/evidence-workflow.md +0 -39
- package/skills/verify/references/output-contract.md +0 -23
- package/skills/verify/references/preflight.md +0 -11
- package/skills/verify/references/report-handoff.md +0 -35
- package/skills/verify/references/strict-mode.md +0 -12
- package/templates/CONTEXT.md.tmpl +0 -53
- package/templates/PROJECT.md.tmpl +0 -59
- package/templates/ROADMAP.md.tmpl +0 -50
- package/templates/STATE.md.tmpl +0 -49
- package/templates/config.json.tmpl +0 -51
- package/templates/design.md.tmpl +0 -83
- package/templates/progress.md.tmpl +0 -77
- package/templates/requirements.md.tmpl +0 -76
- package/templates/research.md.tmpl +0 -83
- package/templates/tasks.md.tmpl +0 -107
|
@@ -1,219 +0,0 @@
|
|
|
1
|
-
# Karpathy Guidelines — Detailed Version
|
|
2
|
-
|
|
3
|
-
> Source: Andrej Karpathy's observations on X about LLM coding pitfalls. This is CurdX-Flow's L1 behavior baseline — all agents inherit from it.
|
|
4
|
-
>
|
|
5
|
-
> Agents reference this file via `@${CLAUDE_PLUGIN_ROOT}/knowledge/karpathy-guidelines.md`.
|
|
6
|
-
|
|
7
|
-
---
|
|
8
|
-
|
|
9
|
-
## Core Insight
|
|
10
|
-
|
|
11
|
-
LLMs have 4 systemic failure modes in coding. Each can be countered with a clear principle.
|
|
12
|
-
|
|
13
|
-
---
|
|
14
|
-
|
|
15
|
-
## Principle 1: Think Before Coding
|
|
16
|
-
|
|
17
|
-
### Failure mode
|
|
18
|
-
LLMs silently make wrong assumptions and execute, hiding confusion and failing to surface trade-offs.
|
|
19
|
-
|
|
20
|
-
### Counter rules
|
|
21
|
-
- **Explicitly state assumptions** — "I'm assuming X, Y, Z. Correct me if that's wrong."
|
|
22
|
-
- **Present multiple interpretations** — "export data" could mean CSV / JSON / SQL dump. List options for the user.
|
|
23
|
-
- **Push back when warranted** — tell the user when their idea is clearly off; don't follow blindly
|
|
24
|
-
- **Stop when confused** — ask rather than guess
|
|
25
|
-
|
|
26
|
-
### Good vs Bad
|
|
27
|
-
|
|
28
|
-
**❌ Bad**:
|
|
29
|
-
> User: "Export user data"
|
|
30
|
-
> Agent: *directly writes code to export all users as CSV*
|
|
31
|
-
|
|
32
|
-
**✓ Good**:
|
|
33
|
-
> User: "Export user data"
|
|
34
|
-
> Agent: "I need to confirm a few things:
|
|
35
|
-
> - Export **all** users or a filtered subset?
|
|
36
|
-
> - Format: CSV / JSON / SQL dump?
|
|
37
|
-
> - Which fields? Should sensitive fields (email, phone) be masked?
|
|
38
|
-
> - One-off script or scheduled job?"
|
|
39
|
-
|
|
40
|
-
### Triggers
|
|
41
|
-
|
|
42
|
-
- Task has multiple reasonable interpretations → **must** ask
|
|
43
|
-
- Operation affects data → **must** confirm
|
|
44
|
-
- Simple local utility script → reasonable assumptions are OK, but state them
|
|
45
|
-
|
|
46
|
-
---
|
|
47
|
-
|
|
48
|
-
## Principle 2: Simplicity First
|
|
49
|
-
|
|
50
|
-
### Failure mode
|
|
51
|
-
LLMs tend to "do more than asked", producing 1000-line complex solutions instead of 100-line simple ones. Typical symptoms are over-abstraction, premature optimization, and excessive defensiveness.
|
|
52
|
-
|
|
53
|
-
### Counter rules
|
|
54
|
-
- **Minimize code** — write only the minimum needed for the task
|
|
55
|
-
- **No features beyond the request** — if the user asks for `calculate_discount`, don't also implement `calculate_tax`
|
|
56
|
-
- **No single-use abstractions** — Strategy pattern for one strategy? Just write a function
|
|
57
|
-
- **No unnecessary flexibility** — don't leave hooks for "maybe future needs"
|
|
58
|
-
|
|
59
|
-
### Good vs Bad
|
|
60
|
-
|
|
61
|
-
**❌ Bad** (200 lines):
|
|
62
|
-
```python
from abc import ABC, abstractmethod


class DiscountStrategy(ABC):
    @abstractmethod
    def calculate(self, amount): ...

class PercentageDiscount(DiscountStrategy):
    def __init__(self, percent):
        self.percent = percent
    def calculate(self, amount):
        return amount * (1 - self.percent / 100)

class DiscountFactory:
    @staticmethod
    def create(type, **kwargs):
        if type == 'percentage':
            return PercentageDiscount(**kwargs)
        # ... more types ...

# Usage:
factory = DiscountFactory()
discount = factory.create('percentage', percent=10)
result = discount.calculate(100)
```
|
|
85
|
-
|
|
86
|
-
**✓ Good** (2 lines):
|
|
87
|
-
```python
def calculate_discount(amount, percent):
    return amount * (1 - percent / 100)
```
|
|
91
|
-
|
|
92
|
-
### When abstraction is allowed
|
|
93
|
-
- 3+ real repetitions (not "imagined maybe-future")
|
|
94
|
-
- Post-abstraction code is **shorter and clearer**, not more complex
|
|
95
|
-
- Tests become **easier** to write, not harder
|
|
96
|
-
|
|
97
|
-
### Famous quote
|
|
98
|
-
> "A little copying is better than a little dependency." — Rob Pike
|
|
99
|
-
|
|
100
|
-
---
|
|
101
|
-
|
|
102
|
-
## Principle 3: Surgical Changes
|
|
103
|
-
|
|
104
|
-
### Failure mode
|
|
105
|
-
While editing, AI "also" improves, refactors, reformats, and fixes lint warnings. It touches code it shouldn't.
|
|
106
|
-
|
|
107
|
-
### Counter rules
|
|
108
|
-
- **Only change required lines** — task says change A, only change A
|
|
109
|
-
- **Match the existing style** — project uses single quotes, you use single quotes
|
|
110
|
-
- **Don't delete pre-existing dead code** — unless user explicitly asks
|
|
111
|
-
- **Only clean up orphan code you created** — delete unused variables you added; leave everything else alone
|
|
112
|
-
|
|
113
|
-
### Good vs Bad
|
|
114
|
-
|
|
115
|
-
**Task**: "Fix the bug where email can be empty on user registration"
|
|
116
|
-
|
|
117
|
-
**❌ Bad** (10 lines changed):
|
|
118
|
-
```diff
- def register(email, password):
+ def register(email: str, password: str) -> User:
+     """Register a new user with email validation."""
+     if not email:
+         raise ValueError("Email is required")
+     if not EMAIL_REGEX.match(email):  # also added format validation
+         raise ValueError("Invalid email format")
+     if len(password) < 8:  # also added password length
+         raise ValueError("Password too short")
-     user = User(email=email, password=password)
+     user = User(email=email.lower().strip(), password=hash_password(password))  # also lowercased
      db.save(user)
      return user
```
|
|
133
|
-
|
|
134
|
-
**✓ Good** (2 lines changed):
|
|
135
|
-
```diff
  def register(email, password):
+     if not email:
+         raise ValueError("Email is required")
      user = User(email=email, password=password)
      db.save(user)
      return user
```
|
|
143
|
-
|
|
144
|
-
### Why it matters
|
|
145
|
-
- PR does one thing → easy to review, easy to revert, git blame stays accurate
|
|
146
|
-
- Side changes introduce **untested** variations
|
|
147
|
-
- Doing only what was asked → trust is built
|
|
148
|
-
|
|
149
|
-
### Only exceptions
|
|
150
|
-
- User explicitly says "clean up while you're at it"
|
|
151
|
-
- Orphans you yourself just created
|
|
152
|
-
|
|
153
|
-
---
|
|
154
|
-
|
|
155
|
-
## Principle 4: Goal-Driven Execution
|
|
156
|
-
|
|
157
|
-
### Failure mode
|
|
158
|
-
Vague instructions ("make it work", "fix authentication") lead to aimless changes. The LLM cannot autonomously decide when it's done.
|
|
159
|
-
|
|
160
|
-
### Counter rules
|
|
161
|
-
- **Define success criteria first** — before starting, make "done" explicit
|
|
162
|
-
- **Translate into verifiable goals** — not "fix auth", but "after password change, old sessions become invalid"
|
|
163
|
-
- **Test first** — write tests that express the goal; passing = done
|
|
164
|
-
- **Verifiable > explainable** — show evidence, not "should work"
|
|
165
|
-
|
|
166
|
-
### Good vs Bad
|
|
167
|
-
|
|
168
|
-
**Vague instruction**: "Fix the authentication system"
|
|
169
|
-
|
|
170
|
-
**❌ Bad**:
|
|
171
|
-
> Agent: *modifies authController.js, middleware.js, session.js, user.model.js*
|
|
172
|
-
> Agent: "I improved several aspects of the auth system."
|
|
173
|
-
> [Later discovered: bug still exists, none of the changes are relevant]
|
|
174
|
-
|
|
175
|
-
**✓ Good**:
|
|
176
|
-
> Agent: "'Auth system' is too broad. Let me ask: what is the specific symptom?
|
|
177
|
-
> - Login failing?
|
|
178
|
-
> - Sessions expiring too fast?
|
|
179
|
-
> - CSRF vulnerability?"
|
|
180
|
-
> User: "After password change, old sessions still work"
|
|
181
|
-
> Agent: "OK, success criteria are:
|
|
182
|
-
> 1. User changes password
|
|
183
|
-
> 2. All other sessions are invalidated immediately
|
|
184
|
-
> 3. Verify: with two browsers, A changes the password; refreshing B should redirect to login
|
|
185
|
-
>
|
|
186
|
-
> I'll write this integration test first (it will fail), then find the fix."
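The success criteria above can be expressed as a test before any fix is attempted. This is a minimal sketch, assuming a hypothetical in-memory `SessionStore`; the class and its methods are invented for illustration and do not come from CurdX-Flow or any real auth library.

```python
import secrets


class SessionStore:
    """Hypothetical in-memory session store, used only for illustration."""

    def __init__(self):
        self._sessions = {}  # token -> user_id

    def login(self, user_id):
        token = secrets.token_hex(16)
        self._sessions[token] = user_id
        return token

    def is_valid(self, token):
        return token in self._sessions

    def change_password(self, user_id, current_token):
        # Criterion 2: drop every other session of this user immediately.
        self._sessions = {
            token: uid
            for token, uid in self._sessions.items()
            if uid != user_id or token == current_token
        }


def test_password_change_invalidates_other_sessions():
    store = SessionStore()
    browser_a = store.login("alice")
    browser_b = store.login("alice")
    other_user = store.login("bob")

    store.change_password("alice", current_token=browser_a)

    assert store.is_valid(browser_a)      # the session that changed the password survives
    assert not store.is_valid(browser_b)  # criterion 3: B must log in again
    assert store.is_valid(other_user)     # unrelated users are untouched
```

If the body of `change_password` is emptied, the second assertion fails, which is the red state the agent wants to see before looking for the fix.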
|
|
187
|
-
|
|
188
|
-
### Forbidden vocabulary (without evidence)
|
|
189
|
-
|
|
190
|
-
- "should"
|
|
191
|
-
- "probably"
|
|
192
|
-
- "seems"
|
|
193
|
-
- "done"
|
|
194
|
-
- "fixed"
|
|
195
|
-
|
|
196
|
-
These may only be used **with supporting evidence**. Evidence = command output / test passing / curl response / screenshot.
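One way to make this rule checkable is a small filter over an agent's message. The sketch below is an assumption about how such a check could look, not part of CurdX-Flow; `unsupported_claims` and its `evidence` parameter are hypothetical names.

```python
import re

# Words from the list above that need supporting evidence.
FORBIDDEN = ("should", "probably", "seems", "done", "fixed")
_PATTERN = re.compile(r"\b(" + "|".join(FORBIDDEN) + r")\b", re.IGNORECASE)


def unsupported_claims(message, evidence):
    """Return the forbidden words used in `message` when no evidence is attached.

    `evidence` is a list of strings such as command output or test results;
    an empty list (or only blank strings) means the claim is unsupported.
    """
    if any(item.strip() for item in evidence):
        return []
    found = []
    for match in _PATTERN.finditer(message):
        word = match.group(1).lower()
        if word not in found:  # keep first-seen order, drop duplicates
            found.append(word)
    return found
```

For example, `unsupported_claims("This should be fixed now.", [])` returns `["should", "fixed"]`, while the same message with `["12 passed in 0.4s"]` as evidence returns an empty list. Word boundaries keep "undone" from matching "done".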
|
|
197
|
-
|
|
198
|
-
---
|
|
199
|
-
|
|
200
|
-
## Integration: How to Self-Check
|
|
201
|
-
|
|
202
|
-
Before emitting output, the agent asks 4 questions:
|
|
203
|
-
|
|
204
|
-
1. **Think**: Where did I make an unstated assumption?
|
|
205
|
-
2. **Simplicity**: Is there code beyond the task? How much can I delete?
|
|
206
|
-
3. **Surgery**: Did I touch code I shouldn't have?
|
|
207
|
-
4. **Goal**: What is my evidence for claiming "done"?
|
|
208
|
-
|
|
209
|
-
Only output when all 4 answers are "clear and evidenced".
|
|
210
|
-
|
|
211
|
-
---
|
|
212
|
-
|
|
213
|
-
## Trade-off Statement (Caution vs Speed)
|
|
214
|
-
|
|
215
|
-
These 4 principles **reduce speed to improve accuracy**. For trivial tasks (rename a variable, write a doc) you can pragmatically skip strict flow. For complex tasks or production-affecting changes, **enforce every one**.
|
|
216
|
-
|
|
217
|
-
---
|
|
218
|
-
|
|
219
|
-
_Source: Andrej Karpathy's observations on LLM coding pitfalls, distilled for CurdX-Flow._
|
|
@@ -1,211 +0,0 @@
|
|
|
1
|
-
# Planning Reviews — 4-Dimension Planning Review
|
|
2
|
-
|
|
3
|
-
> After design is complete and before tasks are generated, review the design from 4 independent angles. Originally from gstack.
|
|
4
|
-
>
|
|
5
|
-
> Agents reference this via `@${CLAUDE_PLUGIN_ROOT}/knowledge/planning-reviews.md`.
|
|
6
|
-
|
|
7
|
-
---
|
|
8
|
-
|
|
9
|
-
## Why
|
|
10
|
-
|
|
11
|
-
`design.md` is produced by flow-architect, but the architect perspective has blind spots:
|
|
12
|
-
- They look at "technical feasibility", not "business value"
|
|
13
|
-
- They look at "architectural elegance", not "user experience"
|
|
14
|
-
- They look at "current implementation", not "future maintenance"
|
|
15
|
-
|
|
16
|
-
The 4 Planning Reviews **examine the same design from different angles**:
|
|
17
|
-
|
|
18
|
-
```
design.md
    ↓ multi-perspective review
    ├── CEO Review       business: strategy, scope, ROI
    ├── Engineering      technical: architecture lock-in, risk
    ├── Design Review    UX: user usability, design system
    └── DevEx Review     maintenance: the next person who picks this up
```
|
|
26
|
-
|
|
27
|
-
Each review is dispatched independently (different agent / context) to avoid perspective convergence.
|
|
28
|
-
|
|
29
|
-
Finally, `/curdx-flow:spec --review=all` ties them together by running all 4 reviews in one pass.
|
|
30
|
-
|
|
31
|
-
---
|
|
32
|
-
|
|
33
|
-
## Review 1: CEO Review
|
|
34
|
-
|
|
35
|
-
### Angle: strategy / business
|
|
36
|
-
|
|
37
|
-
**Question**: Is this worth doing? Is the scope right? Could it undercut next quarter's strategy?
|
|
38
|
-
|
|
39
|
-
### Checklist
|
|
40
|
-
|
|
41
|
-
- [ ] **Scope is appropriate**: not too large (over-engineering), not too small (doesn't deliver value)
|
|
42
|
-
- [ ] **Timeline is reasonable**: relative to the overall roadmap
|
|
43
|
-
- [ ] **ROI is quantifiable**: the business value this feature brings
|
|
44
|
-
- [ ] **Opportunity cost**: what we give up by doing this
|
|
45
|
-
- [ ] **Strategic alignment**: does it support company OKRs / product roadmap
|
|
46
|
-
- [ ] **Stakeholders**: who benefits, who is impacted
|
|
47
|
-
|
|
48
|
-
### Typical findings
|
|
49
|
-
|
|
50
|
-
- "This design is complete, but the MVP needs only 30%. Suggest trimming."
|
|
51
|
-
- "This will take 3 months, but project XX is more urgent right now. Suggest deferring."
|
|
52
|
-
- "Scope is too small — users still won't convert after it ships. Suggest extending to include Y."
|
|
53
|
-
- "This decision locks us in for 2 years. Are you sure?"
|
|
54
|
-
|
|
55
|
-
### Dispatch
|
|
56
|
-
|
|
57
|
-
Dispatch `flow-architect` (switching perspective to CEO), or create a new agent `flow-ceo-reviewer`.
|
|
58
|
-
|
|
59
|
-
Phase 5 implementation: reuse `flow-architect` with a prompt instructing "switch to CEO perspective".
|
|
60
|
-
|
|
61
|
-
---
|
|
62
|
-
|
|
63
|
-
## Review 2: Engineering Review
|
|
64
|
-
|
|
65
|
-
### Angle: technical architecture
|
|
66
|
-
|
|
67
|
-
**Question**: Will this architecture work? Is it maintainable long-term? Are risks identified comprehensively?
|
|
68
|
-
|
|
69
|
-
### Checklist
|
|
70
|
-
|
|
71
|
-
- [ ] **Architecture lock-in**: each AD-NN has explicit trade-off notes
|
|
72
|
-
- [ ] **Scalable**: what happens at 10x users / data
|
|
73
|
-
- [ ] **Dependencies reasonable**: is the chosen library / service the right one
|
|
74
|
-
- [ ] **Data flow diagram clear**: mermaid reflects the real flow
|
|
75
|
-
- [ ] **Error paths covered**: not just the happy path
|
|
76
|
-
- [ ] **Test strategy explicit**: unit / integration / E2E ratio
|
|
77
|
-
- [ ] **Deployment feasible**: CI/CD / monitoring / rollback considered
|
|
78
|
-
|
|
79
|
-
### Typical findings
|
|
80
|
-
|
|
81
|
-
- "AD-03 picks Redis but doesn't specify the fallback on failure"
|
|
82
|
-
- "Data flow diagram shows 3 services hitting DB directly — recommend adding a layer"
|
|
83
|
-
- "Test strategy says 'add E2E' but doesn't specify which scenarios"
|
|
84
|
-
- "Dependency library X has known performance issues (see GitHub issue #123)"
|
|
85
|
-
|
|
86
|
-
### Dispatch
|
|
87
|
-
|
|
88
|
-
Runs `flow-architect` again, this time to **review** the design rather than generate it.
|
|
89
|
-
|
|
90
|
-
---
|
|
91
|
-
|
|
92
|
-
## Review 3: Design Review
|
|
93
|
-
|
|
94
|
-
### Angle: UI/UX / design system
|
|
95
|
-
|
|
96
|
-
**Question**: Can users use this? Is it visually consistent? Is accessibility sufficient?
|
|
97
|
-
|
|
98
|
-
### Checklist
|
|
99
|
-
|
|
100
|
-
- [ ] **User flow**: main scenario completes in ≤ 3 steps
|
|
101
|
-
- [ ] **Error states**: users know what to do on failure
|
|
102
|
-
- [ ] **Loading states**: long operations give feedback
|
|
103
|
-
- [ ] **Empty states**: no data does not mean a blank page
|
|
104
|
-
- [ ] **Accessibility**: color contrast / keyboard operation / screen reader
|
|
105
|
-
- [ ] **Design system**: uses existing tokens / component library, doesn't reinvent
|
|
106
|
-
- [ ] **Mobile adaptation**: usable at the narrowest viewport
|
|
107
|
-
- [ ] **Internationalization**: copy can be translated / RTL compatible
|
|
108
|
-
|
|
109
|
-
### Typical findings
|
|
110
|
-
|
|
111
|
-
- "On login failure, the user only sees an error toast that disappears — can't see the specific reason"
|
|
112
|
-
- "Input has no focus ring — keyboard users don't know where they are"
|
|
113
|
-
- "New Button doesn't use the project's Button component — visually inconsistent"
|
|
114
|
-
- "Mobile button is < 44pt — hard to tap"
|
|
115
|
-
|
|
116
|
-
### Dispatch
|
|
117
|
-
|
|
118
|
-
`flow-ux-designer` switches into review mode.
|
|
119
|
-
|
|
120
|
-
---
|
|
121
|
-
|
|
122
|
-
## Review 4: DevEx Review
|
|
123
|
-
|
|
124
|
-
### Angle: the next maintainer
|
|
125
|
-
|
|
126
|
-
**Question**: Can the person (maybe you) picking up this code 6 months from now get up to speed quickly?
|
|
127
|
-
|
|
128
|
-
### Checklist
|
|
129
|
-
|
|
130
|
-
(See the 8 dimensions in `gates/devex-gate.md`)
|
|
131
|
-
|
|
132
|
-
- [ ] Clear naming
|
|
133
|
-
- [ ] Intent comments
|
|
134
|
-
- [ ] File structure
|
|
135
|
-
- [ ] Error messages
|
|
136
|
-
- [ ] Easy setup
|
|
137
|
-
- [ ] Clear types
|
|
138
|
-
- [ ] Tests as documentation
|
|
139
|
-
- [ ] Fast dev loop
|
|
140
|
-
|
|
141
|
-
### Typical findings
|
|
142
|
-
|
|
143
|
-
- "Function named `doStuff(x, y)` — no idea what it does"
|
|
144
|
-
- "Test named `test('calls validateEmail')` — should describe behavior"
|
|
145
|
-
- "Setting up a new env requires 7 manual configuration steps — no docker / script"
|
|
146
|
-
- "Error 'Failed to process' — failed how?"
|
|
147
|
-
|
|
148
|
-
### Dispatch
|
|
149
|
-
|
|
150
|
-
A new agent `flow-devex-reviewer`, or reuse `flow-reviewer` by passing in the devex-gate.
|
|
151
|
-
|
|
152
|
-
Phase 5 implementation: reuse `flow-reviewer` + `@${CLAUDE_PLUGIN_ROOT}/gates/devex-gate.md`.
|
|
153
|
-
|
|
154
|
-
---
|
|
155
|
-
|
|
156
|
-
## /curdx-flow:spec --review=all — Run All 4 at Once
|
|
157
|
-
|
|
158
|
-
```bash
|
|
159
|
-
/curdx-flow:spec --review=all
|
|
160
|
-
```
|
|
161
|
-
|
|
162
|
-
Workflow:
|
|
163
|
-
1. In parallel (single-message multi-Agent), dispatch 4 reviews
|
|
164
|
-
2. Wait for all to return
|
|
165
|
-
3. Merge findings, sort by priority
|
|
166
|
-
4. Present to the user for decision
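The four workflow steps can be sketched in a few lines. This is an illustration under stated assumptions: threads stand in for the parallel agent dispatch, and `run_all_reviews`, the reviewer functions, and the severity labels are hypothetical names, not the plugin's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Lower number = higher priority when sorting merged findings.
SEVERITY_ORDER = {"blocker": 0, "warning": 1, "info": 2}


def run_all_reviews(reviewers, design_text):
    """Run every reviewer in parallel and return findings sorted by severity.

    `reviewers` maps a review name to a callable that takes the design text
    and returns a list of (severity, message) tuples.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(reviewers))) as pool:
        # Step 1: dispatch all reviews at once.
        futures = {name: pool.submit(fn, design_text) for name, fn in reviewers.items()}
        # Step 2: wait for all of them to return.
        results = {name: future.result() for name, future in futures.items()}

    # Step 3: merge findings and sort by priority (stable within a severity).
    merged = [
        (severity, name, message)
        for name, findings in results.items()
        for severity, message in findings
    ]
    merged.sort(key=lambda item: SEVERITY_ORDER[item[0]])
    return merged


# Stand-in reviewers; real ones would be agents, not local functions.
def ceo_review(design):
    return [("blocker", "Scope too large (suggest MVP 30%)")]


def engineering_review(design):
    return [
        ("warning", "Test strategy does not name E2E scenarios"),
        ("blocker", "AD-03 Redis fallback missing"),
    ]


if __name__ == "__main__":
    findings = run_all_reviews(
        {"CEO": ceo_review, "Engineering": engineering_review},
        "design.md contents",
    )
    # Step 4: present to the user for a decision.
    for severity, name, message in findings:
        print(f"[{severity}] [{name}] {message}")
```

Keeping each reviewer a separate callable mirrors the point made earlier: each review runs in its own context, and only the merged list is shared.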
|
|
167
|
-
|
|
168
|
-
Output:
|
|
169
|
-
```markdown
|
|
170
|
-
# Auto Plan Review: <spec-name>
|
|
171
|
-
|
|
172
|
-
## Summary
|
|
173
|
-
- CEO Review: 3 findings
|
|
174
|
-
- Engineering: 5 findings
|
|
175
|
-
- Design: 2 findings
|
|
176
|
-
- DevEx: 4 findings
|
|
177
|
-
|
|
178
|
-
## Blockers (fix first)
|
|
179
|
-
1. [Engineering] AD-03 Redis fallback missing
|
|
180
|
-
2. [CEO] Scope too large (suggest MVP 30%)
|
|
181
|
-
|
|
182
|
-
## Warnings
|
|
183
|
-
...
|
|
184
|
-
|
|
185
|
-
## Recommendations
|
|
186
|
-
1. Return to /curdx-flow:spec --phase=design to fix blockers
|
|
187
|
-
2. Record warnings in STATE.md, address in tasks phase
|
|
188
|
-
```
|
|
189
|
-
|
|
190
|
-
---
|
|
191
|
-
|
|
192
|
-
## When to Skip Planning Reviews
|
|
193
|
-
|
|
194
|
-
- **MVP / prototype**: when time-pressured, run /curdx-flow:spec --phase=tasks first and review after launch
|
|
195
|
-
- **Tiny changes**: a single file < 50 lines doesn't warrant a 4-dimension review
|
|
196
|
-
- **Similar work done before**: reuse prior review conclusions
|
|
197
|
-
|
|
198
|
-
For production-grade features, however, running all 4 reviews is strongly recommended.
|
|
199
|
-
|
|
200
|
-
---
|
|
201
|
-
|
|
202
|
-
## Difference from /curdx-flow:review
|
|
203
|
-
|
|
204
|
-
- **/curdx-flow:review**: review **after code is finished** — Stage 1 compliance + Stage 2 quality
|
|
205
|
-
- **/curdx-flow:spec --review**: review **before code starts** — targets design.md
|
|
206
|
-
|
|
207
|
-
The two don't overlap. Plan Review prevents "doing the wrong thing", Code Review ensures "the thing was done right".
|
|
208
|
-
|
|
209
|
-
---
|
|
210
|
-
|
|
211
|
-
_Source: gstack's 4 maps review system. CurdX-Flow simplifies to 4 independent commands + one aggregated command._
|
|
@@ -1,223 +0,0 @@
|
|
|
1
|
-
# POC-First Workflow — 5 Phases
|
|
2
|
-
|
|
3
|
-
> Step-by-step methodology for the execution phase: get it running → clean up → add tests → pass quality gates → hand off review-ready evidence.
|
|
4
|
-
>
|
|
5
|
-
> Agents reference this file via `@${CLAUDE_PLUGIN_ROOT}/knowledge/poc-first-workflow.md`.
|
|
6
|
-
|
|
7
|
-
---
|
|
8
|
-
|
|
9
|
-
## Core Idea
|
|
10
|
-
|
|
11
|
-
**Don't scaffold, test, and optimize in the same round.** Each phase focuses on one thing.
|
|
12
|
-
|
|
13
|
-
```
|
|
14
|
-
Phase 1: Make It Work → end-to-end running; hard-coding allowed
|
|
15
|
-
Phase 2: Refactoring → clean up structure, behavior unchanged
|
|
16
|
-
Phase 3: Testing (TDD) → red-green-yellow loop to backfill tests
|
|
17
|
-
Phase 4: Quality Gates → tsc + lint + test all green
|
|
18
|
-
Phase 5: Evidence Handoff → verify, review, prepare PR/release evidence
|
|
19
|
-
```
|
|
20
|
-
|
|
21
|
-
---
|
|
22
|
-
|
|
23
|
-
## Phase 1: Make It Work (POC)
|
|
24
|
-
|
|
25
|
-
### Goal
|
|
26
|
-
Get the whole flow **end-to-end running**. The user can see output / call the API / run the command.
|
|
27
|
-
|
|
28
|
-
### Allowed
|
|
29
|
-
- ✓ Hard-coded constants (TODO: make configurable)
|
|
30
|
-
- ✓ Ignore error handling (happy path only)
|
|
31
|
-
- ✓ Skip unit tests
|
|
32
|
-
- ✓ Duplicate code (don't DRY yet)
|
|
33
|
-
|
|
34
|
-
### Not allowed
|
|
35
|
-
- ✗ Claim "done" (this is only a POC)
|
|
36
|
-
- ✗ Skip end-to-end verification (must truly run)
|
|
37
|
-
|
|
38
|
-
### Done criteria
|
|
39
|
-
- Manual `curl` / click / call returns the expected result
|
|
40
|
-
- At least one happy path scenario works end-to-end
|
|
41
|
-
|
|
42
|
-
### Pitfalls
|
|
43
|
-
- **Over-hardcoded** → the refactor later becomes a rewrite. Compromise: use variables at key extension points.
|
|
44
|
-
- **Forgot the real implementation** → left `throw new NotImplemented`. A POC must **really** implement the core path.
|
|
45
|
-
|
|
46
|
-
---
|
|
47
|
-
|
|
48
|
-
## Phase 2: Refactoring
|
|
49
|
-
|
|
50
|
-
### Goal
|
|
51
|
-
Clean up the hasty code from Phase 1. **Behavior must not change.**
|
|
52
|
-
|
|
53
|
-
### Typical actions
|
|
54
|
-
- Extract repeated logic into functions
|
|
55
|
-
- Split "god functions" into smaller ones
|
|
56
|
-
- Name things sensibly (replace Phase 1 temporary names like `doStuff` with `validateUserInput`)
|
|
57
|
-
- Remove unnecessary intermediate variables
|
|
58
|
-
- Apply project style (indentation, quotes, naming)
|
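The rename mentioned above can be sketched as a before/after pair. The function bodies are hypothetical (only the names `doStuff` and `validateUserInput` come from the text), and the length limit of 10 is an arbitrary assumption. The point is that the same manual cases are re-run against both versions to confirm behavior did not change.

```typescript
// Phase 1 version: temporary name, repeated trim/length logic.
function doStuff(input: string): boolean {
  if (input.trim().length === 0) return false;
  if (input.trim().length > 10) return false;
  return true;
}

// Phase 2 version: descriptive name, repeated logic extracted, same behavior.
const MAX_INPUT_LENGTH = 10; // assumed limit, for illustration only

function validateUserInput(input: string): boolean {
  const length = input.trim().length;
  return length > 0 && length <= MAX_INPUT_LENGTH;
}

// Re-run the Phase 1 manual cases against both versions.
const cases = ["", "   ", "alice", "this input is far too long"];
const unchanged = cases.every((c) => doStuff(c) === validateUserInput(c));
console.log(unchanged); // true
```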
|
59
|
-
|
|
60
|
-
### Done criteria
|
|
61
|
-
- Readability visibly improves
|
|
62
|
-
- Phase 1's manual test still passes when re-run
|
|
63
|
-
- `git diff --stat` shows cleanup, not a rewrite
|
|
64
|
-
|
|
65
|
-
### Pitfalls
|
|
66
|
-
- **Slip in new features** → violates the surgical-changes principle. New features go in a new phase / new task.
|
|
67
|
-
- **Over-refactor** → 2 hours to polish 50 lines for "elegance". Good-enough is enough.
|
|
68
|
-
|
|
69
|
-
---
|
|
70
|
-
|
|
71
|
-
## Phase 3: Testing (TDD red-green-yellow)
|
|
72
|
-
|
|
73
|
-
### Goal
|
|
74
|
-
Backfill test coverage so that behavior stays stable and regressions are detected.
|
|
75
|
-
|
|
76
|
-
### TDD loop
|
|
77
|
-
|
|
78
|
-
```
|
|
79
|
-
RED → GREEN → YELLOW → (next test)
|
|
80
|
-
```
|
|
81
|
-
|
|
82
|
-
#### RED: write a failing test
|
|
83
|
-
```bash
|
|
84
|
-
# Expect the test to be red
|
|
85
|
-
npm test -- --run specific-test
|
|
86
|
-
# ✗ FAIL — ReferenceError: ... is not defined
|
|
87
|
-
```
|
|
88
|
-
|
|
89
|
-
**Key**: you must actually see a failure. If the test goes green immediately, something's wrong with the test (it's not actually testing the core behavior).
|
|
90
|
-
|
|
91
|
-
Commit: `test(scope): red - <describe the scenario being tested>`
|
|
92
|
-
|
|
93
|
-
#### GREEN: minimal implementation
|
|
94
|
-
Write the **least** code that makes the test pass. Forget elegance.
|
|
95
|
-
|
|
96
|
-
Commit: `feat(scope): green - <implementation that satisfies the test>`
|
|
97
|
-
|
|
98
|
-
#### YELLOW: refactor / clean up
|
|
99
|
-
Test already passes — now tidy the implementation.
|
|
100
|
-
|
|
101
|
-
Commit: `refactor(scope): yellow - <what was cleaned up>`
|
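One pass through the loop might look like the sketch below, assuming a hypothetical `slugify` function. To stay self-contained it uses a hand-rolled `check` helper as a stand-in for a test runner; in a real project the assertions would normally be vitest `test` / `expect` calls.

```typescript
// Stand-in for a test runner assertion (an assumption made for this sketch).
function check(name: string, actual: string, expected: string): boolean {
  const ok = actual === expected;
  console.log(`${ok ? "PASS" : "FAIL"} ${name}`);
  return ok;
}

// RED: the checks at the bottom were written first, while `slugify`
// did not exist yet, so the first run failed.
// GREEN: the least code that makes them pass.
// YELLOW: tidied into named steps with behavior unchanged.
function slugify(title: string): string {
  const words = title.trim().toLowerCase().split(/\s+/);
  return words.join("-");
}

const happyPath = check("joins words with dashes", slugify("Hello World"), "hello-world");
const edgeCase = check("collapses extra whitespace", slugify("  a   b "), "a-b");
```

Each of the three steps would get its own commit, following the `red` / `green` / `yellow` message patterns above.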
|
102
|
-
|
|
103
|
-
### Test layers
|
|
104
|
-
- **Unit**: pure functions, utility classes → `vitest`
|
|
105
|
-
- **Integration**: inter-component, DB interactions → `vitest` + `supertest`
|
|
106
|
-
- **E2E**: full user flow → `playwright` or `chrome-devtools` MCP
|
|
107
|
-
|
|
108
|
-
### Coverage targets
|
|
109
|
-
- Core business logic ≥ 80%
|
|
110
|
-
- Utility functions 100%
|
|
111
|
-
- UI components: snapshots + key interactions
|
|
112
|
-
- Error paths must have tests (not just happy path)
|
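The two numeric targets can be checked mechanically. The sketch below assumes a hypothetical `CoverageSummary` shape with percentages already extracted from a coverage report; it does not read any real tool's output format.

```typescript
// Hypothetical coverage summary: percentage covered per area.
interface CoverageSummary {
  coreBusinessLogic: number;
  utilities: number;
}

// Targets from the list above: core >= 80%, utilities 100%.
function coverageShortfalls(summary: CoverageSummary): string[] {
  const shortfalls: string[] = [];
  if (summary.coreBusinessLogic < 80) {
    shortfalls.push(`core business logic at ${summary.coreBusinessLogic}%, target is 80%`);
  }
  if (summary.utilities < 100) {
    shortfalls.push(`utilities at ${summary.utilities}%, target is 100%`);
  }
  return shortfalls;
}

// Reports one shortfall, for utilities.
console.log(coverageShortfalls({ coreBusinessLogic: 85, utilities: 97 }));
```

The UI-component and error-path targets are qualitative, so they still need a human or reviewer check.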
|
113
|
-
|
|
114
|
-
### Pitfalls
|
|
115
|
-
- **Code first, tests after** → this is post-hoc testing, not TDD. It easily misses edge cases.
|
|
116
|
-
- **Tests only cover happy path** → error handling goes uncovered, which is a common cause of production issues.
|
|
117
|
-
- **Too much mocking** → tests disconnected from real behavior. Keep primary evidence anchored to real integrations at system boundaries.
|
|
118
|
-
|
|
119
|
-
---
|
|
120
|
-
|
|
121
|
-
## Phase 4: Quality Gates
|
|
122
|
-
|
|
123
|
-
### Goal
|
|
124
|
-
Run the full set of checks locally so that CI is expected to be green.
|
|
125
|
-
|
|
126
|
-
### Standard checks
|
|
127
|
-
|
|
128
|
-
```bash
|
|
129
|
-
# 1. TypeScript strict mode
|
|
130
|
-
npx tsc --strict --noEmit
|
|
131
|
-
|
|
132
|
-
# 2. Lint
|
|
133
|
-
npx eslint src/
|
|
134
|
-
|
|
135
|
-
# 3. All tests
|
|
136
|
-
npm test
|
|
137
|
-
|
|
138
|
-
# 4. Coverage (optional)
|
|
139
|
-
npm test -- --coverage
|
|
140
|
-
```
|
|
141
|
-
|
|
142
|
-
### Done criteria
|
|
143
|
-
Each item produces **0 errors, 0 warnings** (warnings are not tolerated because they accumulate once allowed).
|
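The "0 errors, 0 warnings" rule can be expressed as a small aggregation. This is a sketch under the assumption that each gate's counts have already been parsed from tool output into a hypothetical `GateResult` record; the parsing itself is not shown.

```typescript
// Hypothetical result of one gate (tsc, eslint, tests).
interface GateResult {
  name: string;
  errors: number;
  warnings: number;
}

// A gate passes only with zero errors and zero warnings.
function failedGates(results: GateResult[]): string[] {
  return results
    .filter((r) => r.errors > 0 || r.warnings > 0)
    .map((r) => r.name);
}

const gateResults: GateResult[] = [
  { name: "tsc", errors: 0, warnings: 0 },
  { name: "eslint", errors: 0, warnings: 2 },
  { name: "test", errors: 0, warnings: 0 },
];

// eslint is reported as failed because of its two warnings.
console.log(failedGates(gateResults));
```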
|
144
|
-
|
|
145
|
-
### Additional checks (project-dependent)
|
|
146
|
-
- Security scan (npm audit / OWASP)
|
|
147
|
-
- Bundle size (bundlewatch)
|
|
148
|
-
- Performance baseline (benchmark)
|
|
149
|
-
|
|
150
|
-
### Pitfalls
|
|
151
|
-
- **`// eslint-disable` everywhere** → this hides problems, doesn't solve them. Only use with strong justification and an explanatory comment.
|
|
152
|
-
- **Skip quality gate and open PR** → CI goes red, wastes reviewer time. Run locally first, then PR.
|
|
153
|
-
|
|
154
|
-
---
|
|
155
|
-
|
|
156
|
-
## Phase 5: Evidence Handoff
|
|
157
|
-
|
|
158
|
-
### Goal
|
|
159
|
-
Feature is verified, reviewed, and ready for a human PR/release decision.
|
|
160
|
-
|
|
161
|
-
### Steps
|
|
162
|
-
|
|
163
|
-
1. **Tidy git history**
|
|
164
|
-
- One commit per task (atomic)
|
|
165
|
-
- Commit message follows conventional format
|
|
166
|
-
- Squash if there are too many WIP commits
|
|
167
|
-
|
|
168
|
-
2. **Prepare PR/release evidence**
|
|
169
|
-
- Clear title (< 70 chars)
|
|
170
|
-
- Summary 3-5 lines covering why & what
|
|
171
|
-
- Include a test plan (checklist)
|
|
172
|
-
- Link spec / issue
|
|
173
|
-
|
|
174
|
-
3. **Respond to review**
|
|
175
|
-
- Reply to every comment ("fixed" or "won't fix because X")
|
|
176
|
-
- Request re-review after fixes — don't change silently
|
|
177
|
-
- Push back politely when you disagree, with evidence
|
|
178
|
-
|
|
179
|
-
4. **Keep CI green**
|
|
180
|
-
- Wait for CI green after every push
|
|
181
|
-
- Fix red immediately — don't pile up
|
|
182
|
-
|
|
183
|
-
5. **Hand off**
|
|
184
|
-
- Squash vs merge vs rebase: per project convention
|
|
185
|
-
- Use the host project's normal PR, merge, and release process
|
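The title rules in the steps above (conventional format, under 70 characters) are easy to check before opening a PR. The sketch below assumes a small subset of conventional-commit types; the list of types and the name `titleProblems` are illustrative and should be adapted to the host project's convention.

```typescript
// Assumed subset of conventional-commit types; the scope in parentheses is optional.
const CONVENTIONAL = /^(feat|fix|refactor|test|docs|chore)(\([^)]+\))?: \S.*$/;
const MAX_TITLE_LENGTH = 70; // titles must be shorter than this

function titleProblems(title: string): string[] {
  const problems: string[] = [];
  if (!CONVENTIONAL.test(title)) problems.push("not in conventional format");
  if (title.length >= MAX_TITLE_LENGTH) problems.push("70 characters or longer");
  return problems;
}

console.log(titleProblems("feat(auth): add token refresh")); // no problems
console.log(titleProblems("updated some stuff")); // not in conventional format
```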
|
186
|
-
|
|
187
|
-
### Pitfalls
|
|
188
|
-
- **PR too large** → reviewer gives up. Split anything > 500 changed lines.
|
|
189
|
-
- **Empty PR description** → reviewer has no idea what they're looking at. At minimum provide a Summary.
|
|
190
|
-
- **Force-push during review** → reviewer loses context. Use new commits; squash at merge time.
|
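The 500-line guideline can be checked from the summary line that `git diff --shortstat` prints. A minimal sketch, assuming that line has already been captured as a string; the function names are hypothetical.

```typescript
// Parses a line like " 3 files changed, 120 insertions(+), 45 deletions(-)".
// Either count may be absent when a diff has only insertions or only deletions.
function changedLines(shortstat: string): number {
  const insertions = /(\d+) insertions?\(\+\)/.exec(shortstat);
  const deletions = /(\d+) deletions?\(-\)/.exec(shortstat);
  return Number(insertions?.[1] ?? 0) + Number(deletions?.[1] ?? 0);
}

const PR_SPLIT_THRESHOLD = 500; // from the pitfall above

function shouldSplit(shortstat: string): boolean {
  return changedLines(shortstat) > PR_SPLIT_THRESHOLD;
}

console.log(shouldSplit(" 12 files changed, 480 insertions(+), 75 deletions(-)")); // true
console.log(shouldSplit(" 1 file changed, 1 insertion(+)")); // false
```

Counting insertions plus deletions is one reasonable reading of "changed lines"; a project may prefer to exclude generated files or lockfiles.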
|
191
|
-
|
|
192
|
-
---
|
|
193
|
-
|
|
194
|
-
## When to Skip Phases
|
|
195
|
-
|
|
196
|
-
### Sketch mode (prototype exploration)
|
|
197
|
-
- Phase 1 (POC) is enough
|
|
198
|
-
- Skip Refactoring / Testing / Quality Gates
|
|
199
|
-
|
|
200
|
-
### Fast mode (one-off task)
|
|
201
|
-
- Only Phase 1 + Phase 5 (handoff evidence stays lightweight)
|
|
202
|
-
- Suitable for: fixing a typo, adding a log, changing a constant
|
|
203
|
-
|
|
204
|
-
### Standard mode (default)
|
|
205
|
-
- All 5 phases
|
|
206
|
-
|
|
207
|
-
### Enterprise mode
|
|
208
|
-
- 5 phases + multi-agent review (flow-adversary / flow-edge-hunter) + Security gate
|
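The four modes amount to a lookup from mode to the phases that run. The sketch below encodes that table with hypothetical names. Two assumptions are made: sketch mode is treated as Phase 1 only (Phase 5 is not mentioned for it), and the enterprise additions are recorded as a flag rather than modelled.

```typescript
type Phase = 1 | 2 | 3 | 4 | 5;
type Mode = "sketch" | "fast" | "standard" | "enterprise";

// Phases per mode, as listed above.
// extraReview stands for the multi-agent review and security gate in enterprise mode.
const PHASES_BY_MODE: Record<Mode, { phases: Phase[]; extraReview: boolean }> = {
  sketch: { phases: [1], extraReview: false },
  fast: { phases: [1, 5], extraReview: false },
  standard: { phases: [1, 2, 3, 4, 5], extraReview: false },
  enterprise: { phases: [1, 2, 3, 4, 5], extraReview: true },
};

function runsPhase(mode: Mode, phase: Phase): boolean {
  return PHASES_BY_MODE[mode].phases.some((p) => p === phase);
}

console.log(runsPhase("fast", 3)); // false
console.log(runsPhase("standard", 3)); // true
```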
|
209
|
-
|
|
210
|
-
---
|
|
211
|
-
|
|
212
|
-
## Relationship to the TDD Iron Rule
|
|
213
|
-
|
|
214
|
-
**TDD iron rule**: "NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST"
|
|
215
|
-
|
|
216
|
-
**POC-First's compromise**: Phase 1 allows skipping tests because the purpose of POC is to validate the idea, not to deliver.
|
|
217
|
-
|
|
218
|
-
**Reconciliation**:
|
|
219
|
-
- POC phase: allow getting things running first (feasibility check)
|
|
220
|
-
- From Phase 2 onward: return to strict TDD
|
|
221
|
-
- Newly added **production code**: must be covered by Phase 3 TDD loop
|
|
222
|
-
|
|
223
|
-
The two don't conflict: POC is not production code; production code starts at Phase 2.
|