@curdx/flow 3.0.0 → 3.2.0

Files changed (219)
  1. package/CHANGELOG.md +33 -82
  2. package/LICENSE +1 -1
  3. package/README.md +28 -129
  4. package/dist/index.mjs +1165 -0
  5. package/package.json +33 -44
  6. package/.claude-plugin/marketplace.json +0 -48
  7. package/.claude-plugin/plugin.json +0 -52
  8. package/agent-preamble/preamble.md +0 -314
  9. package/agents/flow-adversary.md +0 -203
  10. package/agents/flow-architect.md +0 -198
  11. package/agents/flow-brownfield-analyst.md +0 -143
  12. package/agents/flow-debugger.md +0 -321
  13. package/agents/flow-edge-hunter.md +0 -289
  14. package/agents/flow-executor.md +0 -269
  15. package/agents/flow-orchestrator.md +0 -145
  16. package/agents/flow-planner.md +0 -247
  17. package/agents/flow-product-designer.md +0 -159
  18. package/agents/flow-qa-engineer.md +0 -282
  19. package/agents/flow-researcher.md +0 -166
  20. package/agents/flow-reviewer.md +0 -304
  21. package/agents/flow-security-auditor.md +0 -401
  22. package/agents/flow-triage-analyst.md +0 -272
  23. package/agents/flow-ui-researcher.md +0 -230
  24. package/agents/flow-ux-designer.md +0 -221
  25. package/agents/flow-verifier.md +0 -350
  26. package/bin/curdx-flow +0 -5
  27. package/bin/curdx-flow-state +0 -104
  28. package/bin/curdx-flow.js +0 -54
  29. package/cli/README.md +0 -104
  30. package/cli/doctor-workflow.js +0 -483
  31. package/cli/doctor.js +0 -73
  32. package/cli/help.js +0 -59
  33. package/cli/install-bundled-mcps.js +0 -37
  34. package/cli/install-companions.js +0 -19
  35. package/cli/install-context7-config.js +0 -80
  36. package/cli/install-curdx-plugin.js +0 -96
  37. package/cli/install-language.js +0 -35
  38. package/cli/install-next-steps.js +0 -29
  39. package/cli/install-options.js +0 -9
  40. package/cli/install-paths.js +0 -52
  41. package/cli/install-recommended-plugins.js +0 -104
  42. package/cli/install-required-plugins.js +0 -57
  43. package/cli/install-self-update.js +0 -62
  44. package/cli/install-workflow.js +0 -209
  45. package/cli/install.js +0 -101
  46. package/cli/lib/claude-commands.js +0 -41
  47. package/cli/lib/claude-ops.js +0 -47
  48. package/cli/lib/claude.js +0 -183
  49. package/cli/lib/config.js +0 -24
  50. package/cli/lib/doctor-claude-settings.js +0 -1186
  51. package/cli/lib/doctor-report.js +0 -978
  52. package/cli/lib/doctor-runtime-environment.js +0 -196
  53. package/cli/lib/frontmatter.js +0 -44
  54. package/cli/lib/json-schema.js +0 -57
  55. package/cli/lib/logging.js +0 -25
  56. package/cli/lib/process.js +0 -60
  57. package/cli/lib/prompts.js +0 -135
  58. package/cli/lib/runtime.js +0 -107
  59. package/cli/lib/semver.js +0 -109
  60. package/cli/lib/version.js +0 -12
  61. package/cli/protocols-body.md +0 -22
  62. package/cli/protocols.js +0 -162
  63. package/cli/registry.js +0 -123
  64. package/cli/router.js +0 -49
  65. package/cli/uninstall-actions.js +0 -360
  66. package/cli/uninstall-workflow.js +0 -146
  67. package/cli/uninstall.js +0 -42
  68. package/cli/upgrade-workflow.js +0 -80
  69. package/cli/upgrade.js +0 -91
  70. package/cli/utils.js +0 -40
  71. package/gates/adversarial-review-gate.md +0 -219
  72. package/gates/coverage-audit-gate.md +0 -182
  73. package/gates/devex-gate.md +0 -254
  74. package/gates/edge-case-gate.md +0 -194
  75. package/gates/karpathy-gate.md +0 -130
  76. package/gates/security-gate.md +0 -218
  77. package/gates/tdd-gate.md +0 -182
  78. package/gates/test-quality-gate.md +0 -59
  79. package/gates/verification-gate.md +0 -179
  80. package/hooks/hooks.json +0 -130
  81. package/hooks/scripts/common.sh +0 -237
  82. package/hooks/scripts/config-change-guard.sh +0 -94
  83. package/hooks/scripts/flow-context-watch.sh +0 -94
  84. package/hooks/scripts/inject-karpathy.sh +0 -53
  85. package/hooks/scripts/quick-mode-guard.sh +0 -69
  86. package/hooks/scripts/session-start.sh +0 -94
  87. package/hooks/scripts/session-title.sh +0 -87
  88. package/hooks/scripts/stop-watcher.sh +0 -231
  89. package/hooks/scripts/subagent-artifact-guard.sh +0 -92
  90. package/hooks/scripts/subagent-statusline.sh +0 -111
  91. package/hooks/scripts/task-lifecycle-guard.sh +0 -106
  92. package/hooks/scripts/teammate-idle-guard.sh +0 -83
  93. package/knowledge/artifact-output-discipline.md +0 -24
  94. package/knowledge/artifact-summary-contracts.md +0 -50
  95. package/knowledge/atomic-commits.md +0 -262
  96. package/knowledge/claude-code-runtime-contracts.md +0 -240
  97. package/knowledge/epic-decomposition.md +0 -307
  98. package/knowledge/execution-strategies.md +0 -303
  99. package/knowledge/karpathy-guidelines.md +0 -219
  100. package/knowledge/planning-reviews.md +0 -211
  101. package/knowledge/poc-first-workflow.md +0 -223
  102. package/knowledge/review-feedback-intake.md +0 -57
  103. package/knowledge/spec-driven-development.md +0 -180
  104. package/knowledge/systematic-debugging.md +0 -378
  105. package/knowledge/two-stage-review.md +0 -249
  106. package/knowledge/wave-execution.md +0 -403
  107. package/monitors/monitors.json +0 -8
  108. package/monitors/scripts/flow-state-monitor.sh +0 -102
  109. package/output-styles/curdx-evidence-first.md +0 -34
  110. package/output-styles/curdx-fast-mode.md +0 -42
  111. package/output-styles/curdx-spec-mode.md +0 -46
  112. package/schemas/agent-frontmatter.schema.json +0 -66
  113. package/schemas/config.schema.json +0 -134
  114. package/schemas/gate-frontmatter.schema.json +0 -30
  115. package/schemas/hooks.schema.json +0 -115
  116. package/schemas/output-style-frontmatter.schema.json +0 -22
  117. package/schemas/plugin-manifest.schema.json +0 -436
  118. package/schemas/plugin-settings.schema.json +0 -29
  119. package/schemas/skill-frontmatter.schema.json +0 -177
  120. package/schemas/spec-frontmatter.schema.json +0 -42
  121. package/schemas/spec-state.schema.json +0 -165
  122. package/settings.json +0 -8
  123. package/skills/brownfield-index/SKILL.md +0 -53
  124. package/skills/brownfield-index/references/applicability.md +0 -12
  125. package/skills/brownfield-index/references/handoff.md +0 -8
  126. package/skills/brownfield-index/references/index-contract.md +0 -10
  127. package/skills/browser-qa/SKILL.md +0 -39
  128. package/skills/browser-qa/references/handoff.md +0 -6
  129. package/skills/browser-qa/references/prerequisites.md +0 -10
  130. package/skills/browser-qa/references/qa-contract.md +0 -20
  131. package/skills/cancel/SKILL.md +0 -41
  132. package/skills/cancel/references/destructive-mode.md +0 -17
  133. package/skills/cancel/references/reporting.md +0 -18
  134. package/skills/cancel/references/state-recovery.md +0 -30
  135. package/skills/cancel/references/target-resolution.md +0 -7
  136. package/skills/debug/SKILL.md +0 -45
  137. package/skills/debug/references/context-gathering.md +0 -11
  138. package/skills/debug/references/failure-guard.md +0 -25
  139. package/skills/debug/references/intake.md +0 -12
  140. package/skills/debug/references/phase-workflow.md +0 -34
  141. package/skills/debug/references/reporting.md +0 -20
  142. package/skills/epic/SKILL.md +0 -39
  143. package/skills/epic/references/epic-artifacts.md +0 -20
  144. package/skills/epic/references/epic-intake.md +0 -9
  145. package/skills/epic/references/slice-handoff.md +0 -16
  146. package/skills/fast/SKILL.md +0 -62
  147. package/skills/fast/references/applicability.md +0 -25
  148. package/skills/fast/references/clarification.md +0 -20
  149. package/skills/fast/references/execution-contract.md +0 -56
  150. package/skills/help/SKILL.md +0 -55
  151. package/skills/help/references/dispatch.md +0 -20
  152. package/skills/help/references/overview.md +0 -39
  153. package/skills/help/references/troubleshoot.md +0 -47
  154. package/skills/help/references/workflow.md +0 -37
  155. package/skills/implement/SKILL.md +0 -104
  156. package/skills/implement/references/error-recovery.md +0 -36
  157. package/skills/implement/references/linear-execution.md +0 -43
  158. package/skills/implement/references/native-task-sync.md +0 -107
  159. package/skills/implement/references/preflight.md +0 -43
  160. package/skills/implement/references/progress-contract.md +0 -36
  161. package/skills/implement/references/state-init.md +0 -36
  162. package/skills/implement/references/stop-hook-execution.md +0 -50
  163. package/skills/implement/references/strategy-router.md +0 -38
  164. package/skills/implement/references/subagent-execution.md +0 -57
  165. package/skills/implement/references/wave-execution.md +0 -180
  166. package/skills/init/SKILL.md +0 -49
  167. package/skills/init/references/gitignore-and-health.md +0 -26
  168. package/skills/init/references/next-steps.md +0 -22
  169. package/skills/init/references/preflight.md +0 -15
  170. package/skills/init/references/scaffold-contract.md +0 -27
  171. package/skills/review/SKILL.md +0 -82
  172. package/skills/review/references/optional-passes.md +0 -48
  173. package/skills/review/references/preflight.md +0 -38
  174. package/skills/review/references/report-contract.md +0 -49
  175. package/skills/review/references/reporting.md +0 -20
  176. package/skills/review/references/stage-execution.md +0 -32
  177. package/skills/security-audit/SKILL.md +0 -47
  178. package/skills/security-audit/references/audit-contract.md +0 -21
  179. package/skills/security-audit/references/gate-handoff.md +0 -8
  180. package/skills/security-audit/references/scope-and-depth.md +0 -9
  181. package/skills/spec/SKILL.md +0 -100
  182. package/skills/spec/references/artifact-landing.md +0 -31
  183. package/skills/spec/references/phase-execution.md +0 -50
  184. package/skills/spec/references/planning-review.md +0 -31
  185. package/skills/spec/references/preflight-and-routing.md +0 -46
  186. package/skills/spec/references/reporting.md +0 -21
  187. package/skills/start/SKILL.md +0 -84
  188. package/skills/start/references/branch-routing.md +0 -51
  189. package/skills/start/references/mode-semantics.md +0 -12
  190. package/skills/start/references/preflight.md +0 -13
  191. package/skills/start/references/reporting.md +0 -20
  192. package/skills/start/references/state-seeding.md +0 -44
  193. package/skills/start/references/workflow-handoff.md +0 -26
  194. package/skills/status/SKILL.md +0 -41
  195. package/skills/status/references/gather-contract.md +0 -30
  196. package/skills/status/references/health-rules.md +0 -27
  197. package/skills/status/references/output-contract.md +0 -25
  198. package/skills/status/references/preflight.md +0 -10
  199. package/skills/status/references/recovery-hints.md +0 -18
  200. package/skills/ui-sketch/SKILL.md +0 -39
  201. package/skills/ui-sketch/references/brief-intake.md +0 -10
  202. package/skills/ui-sketch/references/iteration-handoff.md +0 -5
  203. package/skills/ui-sketch/references/variant-contract.md +0 -15
  204. package/skills/verify/SKILL.md +0 -56
  205. package/skills/verify/references/evidence-workflow.md +0 -39
  206. package/skills/verify/references/output-contract.md +0 -23
  207. package/skills/verify/references/preflight.md +0 -11
  208. package/skills/verify/references/report-handoff.md +0 -35
  209. package/skills/verify/references/strict-mode.md +0 -12
  210. package/templates/CONTEXT.md.tmpl +0 -53
  211. package/templates/PROJECT.md.tmpl +0 -59
  212. package/templates/ROADMAP.md.tmpl +0 -50
  213. package/templates/STATE.md.tmpl +0 -49
  214. package/templates/config.json.tmpl +0 -51
  215. package/templates/design.md.tmpl +0 -83
  216. package/templates/progress.md.tmpl +0 -77
  217. package/templates/requirements.md.tmpl +0 -76
  218. package/templates/research.md.tmpl +0 -83
  219. package/templates/tasks.md.tmpl +0 -107
package/knowledge/karpathy-guidelines.md
@@ -1,219 +0,0 @@
1
- # Karpathy Guidelines — Detailed Version
2
-
3
- > Source: Andrej Karpathy's observations on X about LLM coding pitfalls. This is CurdX-Flow's L1 behavior baseline — all agents inherit from it.
4
- >
5
- > Agents reference this file via `@${CLAUDE_PLUGIN_ROOT}/knowledge/karpathy-guidelines.md`.
6
-
7
- ---
8
-
9
- ## Core Insight
10
-
11
- LLMs have 4 systemic failure modes in coding. Each can be countered with a clear principle.
12
-
13
- ---
14
-
15
- ## Principle 1: Think Before Coding
16
-
17
- ### Failure mode
18
- LLMs silently make wrong assumptions and execute, hiding confusion and failing to surface trade-offs.
19
-
20
- ### Counter rules
21
- - **Explicitly state assumptions** — "I'm assuming X, Y, Z. Correct me if that's wrong."
22
- - **Present multiple interpretations** — "export data" could mean CSV / JSON / SQL dump. List options for the user.
23
- - **Push back when warranted** — tell the user when their idea is clearly off, don't blindly follow
24
- - **Stop when confused** — ask rather than guess
25
-
26
- ### Good vs Bad
27
-
28
- **❌ Bad**:
29
- > User: "Export user data"
30
- > Agent: *directly writes code to export all users as CSV*
31
-
32
- **✓ Good**:
33
- > User: "Export user data"
34
- > Agent: "I need to confirm a few things:
35
- > - Export **all** users or a filtered subset?
36
- > - Format: CSV / JSON / SQL dump?
37
- > - Which fields? Should sensitive fields (email, phone) be masked?
38
- > - One-off script or scheduled job?"
39
-
40
- ### Triggers
41
-
42
- - Task has multiple reasonable interpretations → **must** ask
43
- - Operation affects data → **must** confirm
44
- - Simple local utility script → reasonable assumptions are OK, but state them
45
-
46
- ---
47
-
48
- ## Principle 2: Simplicity First
49
-
50
- ### Failure mode
51
- LLMs tend to "do more than asked", producing 1000-line complex solutions instead of 100-line simple ones. Over-abstraction, premature optimization, excessive defensiveness.
52
-
53
- ### Counter rules
54
- - **Minimize code** — write only the minimum needed for the task
55
- - **No features beyond the request** — if the user asks for `calculate_discount`, don't also implement `calculate_tax`
56
- - **No single-use abstractions** — Strategy pattern for one strategy? Just write a function
57
- - **No unnecessary flexibility** — don't leave hooks for "maybe future needs"
58
-
59
- ### Good vs Bad
60
-
61
- **❌ Bad** (200 lines):
62
- ```python
63
- class DiscountStrategy(ABC):
64
-     @abstractmethod
65
-     def calculate(self, amount): ...
66
-
67
- class PercentageDiscount(DiscountStrategy):
68
-     def __init__(self, percent):
69
-         self.percent = percent
70
-     def calculate(self, amount):
71
-         return amount * (1 - self.percent / 100)
72
-
73
- class DiscountFactory:
74
-     @staticmethod
75
-     def create(type, **kwargs):
76
-         if type == 'percentage':
77
-             return PercentageDiscount(**kwargs)
78
-         # ... more types ...
79
-
80
- # Usage:
81
- factory = DiscountFactory()
82
- discount = factory.create('percentage', percent=10)
83
- result = discount.calculate(100)
84
- ```
85
-
86
- **✓ Good** (1 line):
87
- ```python
88
- def calculate_discount(amount, percent):
89
-     return amount * (1 - percent / 100)
90
- ```
91
-
92
- ### When abstraction is allowed
93
- - 3+ real repetitions (not "imagined maybe-future")
94
- - Post-abstraction code is **shorter and clearer**, not more complex
95
- - Tests become **easier** to write, not harder
96
-
97
- ### Famous quote
98
- > "A little copying is better than a little dependency." — Rob Pike
99
-
100
- ---
101
-
102
- ## Principle 3: Surgical Changes
103
-
104
- ### Failure mode
105
- While editing, AI "also" improves, refactors, reformats, and fixes lint warnings. It touches code it shouldn't.
106
-
107
- ### Counter rules
108
- - **Only change required lines** — task says change A, only change A
109
- - **Match the existing style** — project uses single quotes, you use single quotes
110
- - **Don't delete pre-existing dead code** — unless user explicitly asks
111
- - **Only clean up orphan code you created** — variables you added that are unused, delete; leave others alone
112
-
113
- ### Good vs Bad
114
-
115
- **Task**: "Fix the bug where email can be empty on user registration"
116
-
117
- **❌ Bad** (10 lines changed):
118
- ```diff
119
- - def register(email, password):
120
- + def register(email: str, password: str) -> User:
121
- +     """Register a new user with email validation."""
122
- +     if not email:
123
- +         raise ValueError("Email is required")
124
- +     if not EMAIL_REGEX.match(email): # also added format validation
125
- +         raise ValueError("Invalid email format")
126
- +     if len(password) < 8: # also added password length
127
- +         raise ValueError("Password too short")
128
- -     user = User(email=email, password=password)
129
- +     user = User(email=email.lower().strip(), password=hash_password(password)) # also lowercased
130
-       db.save(user)
131
-       return user
132
- ```
133
-
134
- **✓ Good** (2 lines changed):
135
- ```diff
136
-   def register(email, password):
137
- +     if not email:
138
- +         raise ValueError("Email is required")
139
-       user = User(email=email, password=password)
140
-       db.save(user)
141
-       return user
142
- ```
143
-
144
- ### Why it matters
145
- - PR does one thing → easy to review, easy to revert, git blame stays accurate
146
- - Side changes introduce **untested** variations
147
- - Doing only what was asked → trust is built
148
-
149
- ### Only exceptions
150
- - User explicitly says "clean up while you're at it"
151
- - Orphans you yourself just created
152
-
153
- ---
154
-
155
- ## Principle 4: Goal-Driven Execution
156
-
157
- ### Failure mode
158
- Vague instructions ("make it work", "fix authentication") lead to aimless changes. The LLM cannot autonomously decide when it's done.
159
-
160
- ### Counter rules
161
- - **Define success criteria first** — before starting, make "done" explicit
162
- - **Translate into verifiable goals** — not "fix auth", but "after password change, old sessions become invalid"
163
- - **Test first** — write tests that express the goal; passing = done
164
- - **Verifiable > explainable** — show evidence, not "should work"
165
-
166
- ### Good vs Bad
167
-
168
- **Vague instruction**: "Fix the authentication system"
169
-
170
- **❌ Bad**:
171
- > Agent: *modifies authController.js, middleware.js, session.js, user.model.js*
172
- > Agent: "I improved several aspects of the auth system."
173
- > [Later discovered: bug still exists, none of the changes are relevant]
174
-
175
- **✓ Good**:
176
- > Agent: "'Auth system' is too broad. Let me ask: what is the specific symptom?
177
- > - Login failing?
178
- > - Sessions expiring too fast?
179
- > - CSRF vulnerability?"
180
- > User: "After password change, old sessions still work"
181
- > Agent: "OK, success criteria are:
182
- > 1. User changes password
183
- > 2. All other sessions are invalidated immediately
184
- > 3. Verify: two browsers, A changes password, B refresh should redirect to login
185
- >
186
- > I'll write this integration test first (it will fail), then find the fix."
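The success criteria in the example above can be written down as a test before any fix is attempted. A minimal sketch, assuming a hypothetical in-memory `SessionStore` (an illustration only, not part of CurdX-Flow or any real library); a real project would run the same assertions against its actual session backend:

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class SessionStore:
    """Hypothetical in-memory session store, used only for illustration."""

    _sessions: dict = field(default_factory=dict)  # token -> user name

    def login(self, user: str) -> str:
        token = secrets.token_hex(8)
        self._sessions[token] = user
        return token

    def is_valid(self, token: str) -> bool:
        return token in self._sessions

    def change_password(self, user: str, current_token: str) -> None:
        # Drop every session of this user except the one making the change.
        self._sessions = {
            token: owner
            for token, owner in self._sessions.items()
            if owner != user or token == current_token
        }


def test_password_change_invalidates_other_sessions() -> None:
    store = SessionStore()
    browser_a = store.login("alice")
    browser_b = store.login("alice")

    store.change_password("alice", current_token=browser_a)

    assert store.is_valid(browser_a)      # the session that changed the password survives
    assert not store.is_valid(browser_b)  # every other session is rejected
```

Written against the unfixed code, the second assertion is the one expected to fail; the task counts as complete only once it passes.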
187
-
188
- ### Forbidden vocabulary (without evidence)
189
-
190
- - "should"
191
- - "probably"
192
- - "seems"
193
- - "done"
194
- - "fixed"
195
-
196
- These may only be used **with supporting evidence**. Evidence = command output / test passing / curl response / screenshot.
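One way to make this rule mechanical is a small check over an agent's report text. The sketch below is an assumption about how such a check might look, not something the package ships; the function name `unsupported_claims` and the representation of evidence as a list of strings are hypothetical:

```python
import re

# Words the guideline lists as forbidden unless evidence accompanies them.
FORBIDDEN = ("should", "probably", "seems", "done", "fixed")
_PATTERN = re.compile(r"\b(" + "|".join(FORBIDDEN) + r")\b", re.IGNORECASE)


def unsupported_claims(message: str, evidence: list) -> list:
    """Return the forbidden words used in `message` when no evidence is attached."""
    if evidence:
        return []
    found = {match.group(1).lower() for match in _PATTERN.finditer(message)}
    # Report in the guideline's own order so output is deterministic.
    return [word for word in FORBIDDEN if word in found]
```

A claim such as "This should be fixed now." with an empty evidence list is flagged for both "should" and "fixed"; the same sentence accompanied by test output is not.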
197
-
198
- ---
199
-
200
- ## Integration: How to Self-Check
201
-
202
- Before emitting output, the agent asks 4 questions:
203
-
204
- 1. **Think**: Where did I make an unstated assumption?
205
- 2. **Simplicity**: Is there code beyond the task? How much can I delete?
206
- 3. **Surgery**: Did I touch code I shouldn't have?
207
- 4. **Goal**: What is my evidence for claiming "done"?
208
-
209
- Only output when all 4 answers are "clear and evidenced".
210
-
211
- ---
212
-
213
- ## Trade-off Statement (Caution vs Speed)
214
-
215
- These 4 principles **reduce speed to improve accuracy**. For trivial tasks (rename a variable, write a doc) you can pragmatically skip strict flow. For complex tasks or production-affecting changes, **enforce every one**.
216
-
217
- ---
218
-
219
- _Source: Andrej Karpathy's observations on LLM coding pitfalls, distilled for CurdX-Flow._
package/knowledge/planning-reviews.md
@@ -1,211 +0,0 @@
1
- # Planning Reviews — 4-Dimension Planning Review
2
-
3
- > After design is complete and before tasks are generated, review the design from 4 independent angles. Originally from gstack.
4
- >
5
- > Agents reference this via `@${CLAUDE_PLUGIN_ROOT}/knowledge/planning-reviews.md`.
6
-
7
- ---
8
-
9
- ## Why
10
-
11
- `design.md` is produced by flow-architect, but the architect perspective has blind spots:
12
- - They look at "technical feasibility", not "business value"
13
- - They look at "architectural elegance", not "user experience"
14
- - They look at "current implementation", not "future maintenance"
15
-
16
- The 4 Planning Reviews **examine the same design from different angles**:
17
-
18
- ```
19
- design.md
20
- ↓ multi-perspective review
21
- ├── CEO Review business: strategy, scope, ROI
22
- ├── Engineering technical: architecture lock-in, risk
23
- ├── Design Review UX: user usability, design system
24
- └── DevEx Review maintenance: the next person who picks this up
25
- ```
26
-
27
- Each review is dispatched independently (different agent / context) to avoid perspective convergence.
28
-
29
- Finally `/curdx-flow:spec --review=all` ties them together: runs all 4 reviews in one pass.
30
-
31
- ---
32
-
33
- ## Review 1: CEO Review
34
-
35
- ### Angle: strategy / business
36
-
37
- **Question**: Is this worth doing? Is the scope right? Could it undercut next quarter's strategy?
38
-
39
- ### Checklist
40
-
41
- - [ ] **Scope is appropriate**: not too large (over-engineering), not too small (doesn't deliver value)
42
- - [ ] **Timeline is reasonable**: relative to the overall roadmap
43
- - [ ] **ROI is quantifiable**: the business value this feature brings
44
- - [ ] **Opportunity cost**: what gets postponed or dropped if we do this
45
- - [ ] **Strategic alignment**: does it support company OKRs / product roadmap
46
- - [ ] **Stakeholders**: who benefits, who is impacted
47
-
48
- ### Typical findings
49
-
50
- - "This design is complete, but the MVP needs only 30%. Suggest trimming."
51
- - "This will take 3 months, but project XX is more urgent right now. Suggest deferring."
52
- - "Scope is too small — users still won't convert after it ships. Suggest extending to include Y."
53
- - "This decision locks us in for 2 years. Are you sure?"
54
-
55
- ### Dispatch
56
-
57
- Dispatch `flow-architect` (switching perspective to CEO), or create a new agent `flow-ceo-reviewer`.
58
-
59
- Phase 5 implementation: reuse `flow-architect` with a prompt instructing "switch to CEO perspective".
60
-
61
- ---
62
-
63
- ## Review 2: Engineering Review
64
-
65
- ### Angle: technical architecture
66
-
67
- **Question**: Will this architecture work? Is it maintainable long-term? Are risks identified comprehensively?
68
-
69
- ### Checklist
70
-
71
- - [ ] **Architecture lock-in**: each AD-NN has explicit trade-off notes
72
- - [ ] **Scalable**: what happens at 10x users / data
73
- - [ ] **Dependencies reasonable**: is the chosen library / service the right one
74
- - [ ] **Data flow diagram clear**: mermaid reflects the real flow
75
- - [ ] **Error paths covered**: not just the happy path
76
- - [ ] **Test strategy explicit**: unit / integration / E2E ratio
77
- - [ ] **Deployment feasible**: CI/CD / monitoring / rollback considered
78
-
79
- ### Typical findings
80
-
81
- - "AD-03 picks Redis but doesn't specify the fallback on failure"
82
- - "Data flow diagram shows 3 services hitting DB directly — recommend adding a layer"
83
- - "Test strategy says 'add E2E' but doesn't specify which scenarios"
84
- - "Dependency library X has known performance issues (see GitHub issue #123)"
85
-
86
- ### Dispatch
87
-
88
- Essentially runs `flow-architect` again, this time to **review** the design rather than generate it.
89
-
90
- ---
91
-
92
- ## Review 3: Design Review
93
-
94
- ### Angle: UI/UX / design system
95
-
96
- **Question**: Can users use this? Is it visually consistent? Is accessibility sufficient?
97
-
98
- ### Checklist
99
-
100
- - [ ] **User flow**: main scenario completes in ≤ 3 steps
101
- - [ ] **Error states**: users know what to do on failure
102
- - [ ] **Loading states**: long operations give feedback
103
- - [ ] **Empty states**: no data does not mean a blank page
104
- - [ ] **Accessibility**: color contrast / keyboard operation / screen reader
105
- - [ ] **Design system**: uses existing tokens / component library, doesn't reinvent
106
- - [ ] **Mobile adaptation**: usable at the narrowest viewport
107
- - [ ] **Internationalization**: copy can be translated / RTL compatible
108
-
109
- ### Typical findings
110
-
111
- - "On login failure, the user only sees an error toast that disappears — can't see the specific reason"
112
- - "Input has no focus ring — keyboard users don't know where they are"
113
- - "New Button doesn't use the project's Button component — visually inconsistent"
114
- - "Mobile button is < 44pt — hard to tap"
115
-
116
- ### Dispatch
117
-
118
- `flow-ux-designer` switches into review mode.
119
-
120
- ---
121
-
122
- ## Review 4: DevEx Review
123
-
124
- ### Angle: the next maintainer
125
-
126
- **Question**: Can the person (maybe you) picking up this code 6 months from now get up to speed quickly?
127
-
128
- ### Checklist
129
-
130
- (See the 8 dimensions in `gates/devex-gate.md`)
131
-
132
- - [ ] Clear naming
133
- - [ ] Intent comments
134
- - [ ] File structure
135
- - [ ] Error messages
136
- - [ ] Easy setup
137
- - [ ] Clear types
138
- - [ ] Tests as documentation
139
- - [ ] Fast dev loop
140
-
141
- ### Typical findings
142
-
143
- - "Function named `doStuff(x, y)` — no idea what it does"
144
- - "Test named `test('calls validateEmail')` — should describe behavior"
145
- - "Setting up a new env requires 7 manual configuration steps — no docker / script"
146
- - "Error 'Failed to process' — failed how?"
147
-
148
- ### Dispatch
149
-
150
- A new agent `flow-devex-reviewer`, or reuse `flow-reviewer` by passing in the devex-gate.
151
-
152
- Phase 5 implementation: reuse `flow-reviewer` + `@${CLAUDE_PLUGIN_ROOT}/gates/devex-gate.md`.
153
-
154
- ---
155
-
156
- ## /curdx-flow:spec --review=all — Run All 4 at Once
157
-
158
- ```bash
159
- /curdx-flow:spec --review=all
160
- ```
161
-
162
- Workflow:
163
- 1. In parallel (single-message multi-Agent), dispatch 4 reviews
164
- 2. Wait for all to return
165
- 3. Merge findings, sort by priority
166
- 4. Present to the user for decision
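Step 3 of this workflow (merge findings, sort by priority) can be sketched as follows. The `Finding` shape and the two severity levels are assumptions taken from the sample output in this file; the document does not specify the actual data model:

```python
from dataclasses import dataclass

# Assumed priority order, mirroring the "Blockers" and "Warnings" headings.
SEVERITY_ORDER = {"blocker": 0, "warning": 1}


@dataclass(frozen=True)
class Finding:
    review: str    # e.g. "CEO", "Engineering", "Design", "DevEx"
    severity: str  # a key of SEVERITY_ORDER
    message: str


def merge_findings(results: dict) -> list:
    """Flatten per-review findings and sort them so blockers come first."""
    merged = [finding for findings in results.values() for finding in findings]
    # sorted() is stable, so findings of equal severity keep their review order.
    return sorted(merged, key=lambda finding: SEVERITY_ORDER[finding.severity])
```

Because the sort is stable, findings within one severity level stay grouped in the order the reviews were collected.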
167
-
168
- Output:
169
- ```markdown
170
- # Auto Plan Review: <spec-name>
171
-
172
- ## Summary
173
- - CEO Review: 3 findings
174
- - Engineering: 5 findings
175
- - Design: 2 findings
176
- - DevEx: 4 findings
177
-
178
- ## Blockers (fix first)
179
- 1. [Engineering] AD-03 Redis fallback missing
180
- 2. [CEO] Scope too large (suggest MVP 30%)
181
-
182
- ## Warnings
183
- ...
184
-
185
- ## Recommendations
186
- 1. Return to /curdx-flow:spec --phase=design to fix blockers
187
- 2. Record warnings in STATE.md, address in tasks phase
188
- ```
189
-
190
- ---
191
-
192
- ## When to Skip Planning Reviews
193
-
194
- - **MVP / prototype**: time-pressured, run /curdx-flow:spec --phase=tasks first, review after launch
195
- - **Tiny changes**: a single file < 50 lines doesn't warrant a 4-dimension review
196
- - **Similar work done before**: reuse prior review conclusions
197
-
198
- But for production-grade features, running all four reviews is strongly recommended.
199
-
200
- ---
201
-
202
- ## Difference from /curdx-flow:review
203
-
204
- - **/curdx-flow:review**: review **after code is finished** — Stage 1 compliance + Stage 2 quality
205
- - **/curdx-flow:spec --review**: review **before code starts** — targets design.md
206
-
207
- The two don't overlap. Plan Review prevents "doing the wrong thing", Code Review ensures "the thing was done right".
208
-
209
- ---
210
-
211
- _Source: gstack's 4 maps review system. CurdX-Flow simplifies to 4 independent commands + one aggregated command._
package/knowledge/poc-first-workflow.md
@@ -1,223 +0,0 @@
1
- # POC-First Workflow — 5 Phases
2
-
3
- > Step-by-step methodology for the execution phase: get it running → clean up → add tests → pass quality gates → hand off review-ready evidence.
4
- >
5
- > Agents reference this file via `@${CLAUDE_PLUGIN_ROOT}/knowledge/poc-first-workflow.md`.
6
-
7
- ---
8
-
9
- ## Core Idea
10
-
11
- **Don't scaffold, test, and optimize in the same round.** Each phase focuses on one thing.
12
-
13
- ```
14
- Phase 1: Make It Work → end-to-end running; hard-coding allowed
15
- Phase 2: Refactoring → clean up structure, behavior unchanged
16
- Phase 3: Testing (TDD) → red-green-yellow loop to backfill tests
17
- Phase 4: Quality Gates → tsc + lint + test all green
18
- Phase 5: Evidence Handoff → verify, review, prepare PR/release evidence
19
- ```
20
-
21
- ---
22
-
23
- ## Phase 1: Make It Work (POC)
24
-
25
- ### Goal
26
- Get the whole flow **end-to-end running**. The user can see output / call the API / run the command.
27
-
28
- ### Allowed
29
- - ✓ Hard-coded constants (TODO: make configurable)
30
- - ✓ Ignore error handling (happy path only)
31
- - ✓ Skip unit tests
32
- - ✓ Duplicate code (don't DRY yet)
33
-
34
- ### Not allowed
35
- - ✗ Claim "done" (this is only a POC)
36
- - ✗ Skip end-to-end verification (must truly run)
37
-
38
- ### Done criteria
39
- - Manual `curl` / click / call returns the expected result
40
- - At least one happy path scenario works end-to-end
41
-
42
- ### Pitfalls
43
- - **Over-hardcoded** → the refactor later becomes a rewrite. Compromise: use variables at key extension points.
44
- - **Forgot the real implementation** → left `throw new NotImplemented`. A POC must **really** implement the core path.
45
-
46
- ---
47
-
48
- ## Phase 2: Refactoring
49
-
50
- ### Goal
51
- Clean up the hasty code from Phase 1. **Behavior must not change.**
52
-
53
- ### Typical actions
54
- - Extract repeated logic into functions
55
- - Split "god functions" into smaller ones
56
- - Name things sensibly (replace Phase 1 temporary names like `doStuff` with `validateUserInput`)
57
- - Remove unnecessary intermediate variables
58
- - Apply project style (indentation, quotes, naming)
59
-
60
- ### Done criteria
61
- - Readability visibly improves
62
- - Phase 1's manual test still passes when re-run
63
- - `git diff --stat` shows cleanup, not a rewrite
64
-
65
- ### Pitfalls
66
- - **Slip in new features** → violates surgical-changes principle. New features go in a new phase / new task.
67
- - **Over-refactor** → 2 hours to polish 50 lines for "elegance". Good-enough is enough.
68
-
69
- ---
70
-
71
- ## Phase 3: Testing (TDD red-green-yellow)
72
-
73
- ### Goal
74
- Backfill test coverage, ensuring behavior stability and regression detection.
75
-
76
- ### TDD loop
77
-
78
- ```
79
- RED → GREEN → YELLOW → (next test)
80
- ```
81
-
82
- #### RED: write a failing test
83
- ```bash
84
- # Expect the test to be red
85
- npm test -- --run specific-test
86
- # ✗ FAIL — ReferenceError: ... is not defined
87
- ```
88
-
89
- **Key**: you must actually see a failure. If the test goes green immediately, something's wrong with the test (it's not actually testing the core behavior).
90
-
91
- Commit: `test(scope): red - <describe the scenario being tested>`
92
-
93
- #### GREEN: minimal implementation
94
- Write the **least** code that makes the test pass. Forget elegance.
95
-
96
- Commit: `feat(scope): green - <implementation that satisfies the test>`
97
-
98
- #### YELLOW: refactor / clean up
99
- Test already passes — now tidy the implementation.
100
-
101
- Commit: `refactor(scope): yellow - <what was cleaned up>`
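The three commit formats above pair a commit type with a TDD phase (`test` with red, `feat` with green, `refactor` with yellow). A minimal sketch of a checker for that convention follows; the function name `tdd_phase` is hypothetical and the regular expression is an assumption about how strictly the format should be read:

```python
import re
from typing import Optional

# Which commit type each TDD phase is expected to use, per the formats above.
PHASE_TYPES = {"red": "test", "green": "feat", "yellow": "refactor"}

_COMMIT = re.compile(
    r"^(?P<type>[a-z]+)\((?P<scope>[^)]+)\): "
    r"(?P<phase>red|green|yellow) - (?P<summary>.+)$"
)


def tdd_phase(subject: str) -> Optional[str]:
    """Return the TDD phase of a commit subject, or None if it does not follow the format."""
    match = _COMMIT.match(subject)
    if match is None:
        return None
    if PHASE_TYPES[match.group("phase")] != match.group("type"):
        return None  # e.g. "feat(...): red - ..." mixes type and phase
    return match.group("phase")
```

Run over a branch's commit subjects, the resulting sequence makes it easy to see whether each green commit was preceded by a red one.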
102
-
103
- ### Test layers
104
- - **Unit**: pure functions, utility classes → `vitest`
105
- - **Integration**: inter-component, DB interactions → `vitest` + `supertest`
106
- - **E2E**: full user flow → `playwright` or `chrome-devtools` MCP
107
-
108
- ### Coverage targets
109
- - Core business logic ≥ 80%
110
- - Utility functions 100%
111
- - UI components: snapshots + key interactions
112
- - Error paths must have tests (not just happy path)
113
-
114
- ### Pitfalls
115
- - **Code first, tests after** → not TDD, this is post-hoc testing. Easily misses edge cases.
116
- - **Tests only cover the happy path** → no error-handling coverage, a common cause of production issues.
117
- - **Too much mocking** → tests disconnected from real behavior. Keep primary evidence anchored to real integrations at system boundaries.
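To make the happy-path pitfall concrete, the sketch below exercises both the success case and several error cases of one small function. `parsePort` and `throwsRangeError` are hypothetical names invented for this example. In a real suite these would be ordinary test cases in the project's runner.

```typescript
// Hypothetical function under test: parse a TCP port from a string.
function parsePort(raw: string): number {
  const port = Number(raw);
  // Math.floor(port) === port is false for NaN and for fractional values.
  const isWholeNumber = Math.floor(port) === port;
  if (!isWholeNumber || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${raw}`);
  }
  return port;
}

// Helper: true only if the action throws a RangeError.
function throwsRangeError(action: () => unknown): boolean {
  try {
    action();
    return false;
  } catch (error) {
    return error instanceof RangeError;
  }
}

// Happy path: one valid input.
const happyPathOk = parsePort("8080") === 8080;

// Error paths: empty, non-numeric, out of range, fractional.
const invalidInputs = ["", "abc", "0", "70000", "80.5"];
const errorPathsOk = invalidInputs.every((raw) =>
  throwsRangeError(() => parsePort(raw))
);

console.log({ happyPathOk, errorPathsOk });
```

Here the error cases outnumber the happy case five to one, which is common for input-handling code.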
118
-
119
- ---
120
-
121
- ## Phase 4: Quality Gates
122
-
123
- ### Goal
124
- Full local checks to ensure CI will be green.
125
-
126
- ### Standard checks
127
-
128
- ```bash
129
- # 1. TypeScript strict mode
130
- npx tsc --strict --noEmit
131
-
132
- # 2. Lint
133
- npx eslint src/ --max-warnings 0
134
-
135
- # 3. All tests
136
- npm test
137
-
138
- # 4. Coverage (optional)
139
- npm test -- --coverage
140
- ```
141
-
142
- ### Done criteria
143
- Each item produces **0 errors, 0 warnings** (warnings are not tolerated since they proliferate).
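The zero-errors, zero-warnings rule can be expressed as a small predicate. This is only a sketch: the `GateResult` shape and the sample numbers are hypothetical, and a real setup would fill them in from the actual output of each tool.

```typescript
// Hypothetical summary of one check's outcome; not any real tool's API.
interface GateResult {
  name: string;
  errors: number;
  warnings: number;
}

// A gate passes only with zero errors AND zero warnings.
function gatePasses(result: GateResult): boolean {
  return result.errors === 0 && result.warnings === 0;
}

// Names of all failing gates; an empty array means the phase is done.
function failingGates(results: GateResult[]): string[] {
  return results.filter((r) => !gatePasses(r)).map((r) => r.name);
}

// Made-up sample data for illustration.
const results: GateResult[] = [
  { name: "tsc", errors: 0, warnings: 0 },
  { name: "eslint", errors: 0, warnings: 3 },
  { name: "tests", errors: 0, warnings: 0 },
];

const failing = failingGates(results);
console.log(failing.length === 0 ? "ready for PR" : `blocked by: ${failing.join(", ")}`);
```

With the sample data, the lint gate blocks the handoff even though it reports no errors, which is the point of treating warnings as failures.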
144
-
145
- ### Additional checks (project-dependent)
146
- - Security scan (npm audit / OWASP)
147
- - Bundle size (bundlewatch)
148
- - Performance baseline (benchmark)
149
-
150
- ### Pitfalls
151
- - **`// eslint-disable` everywhere** → this hides problems, doesn't solve them. Only use with strong justification and an explanatory comment.
152
- - **Skip the quality gate and open a PR** → CI goes red and wastes reviewer time. Run the checks locally first, then open the PR.
153
-
154
- ---
155
-
156
- ## Phase 5: Evidence Handoff
157
-
158
- ### Goal
159
- Feature is verified, reviewed, and ready for a human PR/release decision.
160
-
161
- ### Steps
162
-
163
- 1. **Tidy git history**
164
- - One commit per task (atomic)
165
- - Commit message follows conventional format
166
- - Squash if there are too many WIP commits
167
-
168
- 2. **Prepare PR/release evidence**
169
- - Clear title (< 70 chars)
170
- - Summary 3-5 lines covering why & what
171
- - Include a test plan (checklist)
172
- - Link spec / issue
173
-
174
- 3. **Respond to review**
175
- - Reply to every comment ("fixed" or "won't fix because X")
176
- - Request re-review after fixes — don't change silently
177
- - Push back politely when you disagree, with evidence
178
-
179
- 4. **Keep CI green**
180
- - Wait for CI green after every push
181
- - Fix red immediately — don't pile up
182
-
183
- 5. **Hand off**
184
- - Squash vs merge vs rebase: per project convention
185
- - Use the host project's normal PR, merge, and release process
186
-
187
- ### Pitfalls
188
- - **PR too large** → reviewer gives up. Split anything > 500 changed lines.
189
- - **Empty PR description** → reviewer has no idea what they're looking at. At minimum provide a Summary.
190
- - **Force-push during review** → reviewer loses context. Use new commits; squash at merge time.
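The numeric guidelines in this phase (title under 70 characters, a 3-5 line summary, a test plan, at most 500 changed lines) lend themselves to a pre-submit check. The sketch below is an assumption about how one might encode them. `PullRequestDraft` and `reviewReadinessProblems` are hypothetical names, not part of any tool mentioned here.

```typescript
// Hypothetical description of a PR before it is opened.
interface PullRequestDraft {
  title: string;
  summary: string;
  hasTestPlan: boolean;
  changedLines: number;
}

// Returns a list of problems; an empty list means the draft meets the guidelines.
function reviewReadinessProblems(pr: PullRequestDraft): string[] {
  const problems: string[] = [];
  const summaryLines = pr.summary
    .split("\n")
    .filter((line) => line.trim().length > 0);

  if (pr.title.trim().length === 0 || pr.title.length >= 70) {
    problems.push("title must be non-empty and under 70 characters");
  }
  if (summaryLines.length < 3 || summaryLines.length > 5) {
    problems.push("summary should be 3-5 lines covering why and what");
  }
  if (!pr.hasTestPlan) {
    problems.push("test plan is missing");
  }
  if (pr.changedLines > 500) {
    problems.push("more than 500 changed lines; split the PR");
  }
  return problems;
}

// Made-up draft: well described, but too large.
const draft: PullRequestDraft = {
  title: "feat(auth): add password reset flow",
  summary: "Why: users cannot recover accounts.\nWhat: reset endpoint and email.\nRisk: low.",
  hasTestPlan: true,
  changedLines: 640,
};

const problems = reviewReadinessProblems(draft);
console.log(problems);
```

For this draft the only reported problem is the size limit, so the fix is to split the change, not to reword the description.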
191
-
192
- ---
193
-
194
- ## When to Skip Phases
195
-
196
- ### Sketch mode (prototype exploration)
197
- - Phase 1 (POC) is enough
198
- - Skip Refactoring / Testing / Quality Gates
199
-
200
- ### Fast mode (one-off task)
201
- - Only Phase 1 + Phase 5 (handoff evidence stays lightweight)
202
- - Suitable for: fixing a typo, adding a log, changing a constant
203
-
204
- ### Standard mode (default)
205
- - All 5 phases
206
-
207
- ### Enterprise mode
208
- - 5 phases + multi-agent review (flow-adversary / flow-edge-hunter) + Security gate
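The mode-to-phase mapping above can be written down as a lookup table. This is a minimal sketch. The type and constant names are hypothetical, and the table mirrors the four modes exactly as listed.

```typescript
type Mode = "sketch" | "fast" | "standard" | "enterprise";

// Which of the five phases each mode runs.
const PHASES_BY_MODE: Record<Mode, number[]> = {
  sketch: [1],
  fast: [1, 5],
  standard: [1, 2, 3, 4, 5],
  enterprise: [1, 2, 3, 4, 5],
};

// Extra gates that apply on top of the phases (enterprise mode only).
const EXTRA_GATES_BY_MODE: Record<Mode, string[]> = {
  sketch: [],
  fast: [],
  standard: [],
  enterprise: ["multi-agent review", "security gate"],
};

function shouldRunPhase(mode: Mode, phase: number): boolean {
  return PHASES_BY_MODE[mode].indexOf(phase) !== -1;
}

console.log(shouldRunPhase("fast", 3)); // false
console.log(shouldRunPhase("standard", 3)); // true
console.log(EXTRA_GATES_BY_MODE.enterprise.join(", "));
```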
209
-
210
- ---
211
-
212
- ## Relationship to the TDD Iron Rule
213
-
214
- **TDD iron rule**: "NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST"
215
-
216
- **POC-First's compromise**: Phase 1 allows skipping tests because the purpose of POC is to validate the idea, not to deliver.
217
-
218
- **Reconciliation**:
219
- - POC phase: allow getting things running first (feasibility check)
220
- - From Phase 2 onward: revert to strict TDD
221
- - Newly added **production code**: must be covered by Phase 3 TDD loop
222
-
223
- The two don't conflict: POC is not production code; production code starts at Phase 2.