@curdx/flow 2.0.0-beta.4 → 2.0.0-beta.5
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
},
|
|
7
7
|
"metadata": {
|
|
8
8
|
"description": "Claude Code Discipline Layer — spec-driven workflow + goal-backward verification + Karpathy 4 principles enforced via gates. Stops Claude from faking \"done\" on non-trivial features.",
|
|
9
|
-
"version": "2.0.0-beta.
|
|
9
|
+
"version": "2.0.0-beta.5"
|
|
10
10
|
},
|
|
11
11
|
"plugins": [
|
|
12
12
|
{
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "curdx-flow",
|
|
3
|
-
"version": "2.0.0-beta.
|
|
3
|
+
"version": "2.0.0-beta.5",
|
|
4
4
|
"description": "Claude Code Discipline Layer — spec-driven workflow + goal-backward verification + Karpathy 4 principles enforced via gates. Stops Claude from faking \"done\" on non-trivial features.",
|
|
5
5
|
"author": {
|
|
6
6
|
"name": "wdx",
|
|
@@ -44,29 +44,46 @@
|
|
|
44
44
|
|
|
45
45
|
### Documentation lookup → context7 MCP
|
|
46
46
|
|
|
47
|
-
|
|
47
|
+
Query `context7` when EITHER is true:
|
|
48
|
+
- The library API is version-sensitive (recent breaking change, typed API in a new version, deprecated method you're considering).
|
|
49
|
+
- You are genuinely uncertain (can't recall the method signature, can't recall whether a feature exists in the installed version).
|
|
48
50
|
|
|
49
51
|
```
|
|
50
52
|
1. mcp__context7__resolve-library-id("react") → resolve library ID
|
|
51
53
|
2. mcp__context7__query-docs(libraryId, query) → query latest docs
|
|
52
54
|
```
|
|
53
55
|
|
|
54
|
-
|
|
56
|
+
Do NOT query context7 for:
|
|
57
|
+
- Universally stable APIs you can write from memory (Vue 3 `ref`, React `useState`, Express `app.get`, SQL `SELECT`).
|
|
58
|
+
- Syntax you would paste into a test file without thinking.
|
|
59
|
+
- Every single library mention in a spec (the spec is planning, not implementation — defer the lookup to the executor when it actually calls the API).
|
|
55
60
|
|
|
56
|
-
**
|
|
57
|
-
|
|
61
|
+
**Rule of thumb**: if you would paste the code into production without double-checking, don't waste a context7 call checking it. If you would hesitate, query. Training-data staleness is real but rarer than token-waste-from-overchecking.
|
|
62
|
+
|
|
63
|
+
**Forbidden**: writing calls to a specific minor version of a library from memory when the code needs to run against that exact version and the API surface is known to have changed. Then you MUST query context7.
|
|
64
|
+
|
|
65
|
+
**Fallback**: when context7 MCP is unavailable, use WebSearch with a version number, and annotate the output with "⚠️ context7 unavailable — documentation may not be current".
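One way to read the lookup rules added in this hunk is as a small decision function. The sketch below is illustrative only: `LookupContext`, its field names, and `should_query_context7` are hypothetical and are not part of curdx-flow or the context7 MCP. Treating the "Forbidden" case as taking precedence over the spec-only exclusion is an assumption; the text does not state an ordering.

```python
from dataclasses import dataclass


@dataclass
class LookupContext:
    """Hypothetical summary of what is known before writing a library call."""
    version_sensitive: bool       # recent breaking change, new typed API, deprecation
    uncertain: bool               # cannot recall the signature or feature availability
    pinned_version_changed: bool  # code targets an exact version whose API changed
    planning_only: bool           # library is only mentioned in a spec, not yet called


def should_query_context7(ctx: LookupContext) -> bool:
    """Return True when the guidance above calls for a context7 lookup."""
    if ctx.pinned_version_changed:
        return True   # the "Forbidden" case: a lookup is mandatory (assumed precedence)
    if ctx.planning_only:
        return False  # defer to the executor that actually calls the API
    return ctx.version_sensitive or ctx.uncertain
```

A stable API written from memory corresponds to all four flags being false, so no lookup is made.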
|
|
58
66
|
|
|
59
67
|
---
|
|
60
68
|
|
|
61
69
|
### Structured thinking → sequential-thinking MCP
|
|
62
70
|
|
|
63
|
-
|
|
71
|
+
Use `sequential-thinking` proportional to **decision complexity**, not a fixed quota. The numbers below are **ceilings for genuinely hard cases**, not floors to hit:
|
|
72
|
+
|
|
73
|
+
| Task | Guideline |
|
|
74
|
+
|------|-----------|
|
|
75
|
+
| Planning a well-known CRUD feature | 1–3 thoughts is enough; don't pad |
|
|
76
|
+
| Planning a novel feature | up to 5 thoughts |
|
|
77
|
+
| Architecture for standard stack assembly | 1–3 thoughts |
|
|
78
|
+
| Architecture for novel design (distributed, new storage, unusual constraints) | up to 8 thoughts |
|
|
79
|
+
| Epic decomposition | up to 10 thoughts |
|
|
80
|
+
| Adversarial review of trivial change | 1 thought; if nothing to adversarially review, say so and stop |
|
|
81
|
+
| Adversarial review of complex change | up to 6 thoughts |
|
|
82
|
+
| Debugging after ≥ 2 failures on same hypothesis | 4–5 thoughts |
|
|
83
|
+
|
|
84
|
+
**Principle**: running 8 thoughts to pick between Vue and React for a Todo is waste. Running 1 thought to architect a distributed queue is irresponsible. Match effort to stakes.
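The "ceilings, not floors" reading of the table can be sketched as a clamp. The numbers are copied from the guideline table; the task keys and the `thoughts_to_run` helper are hypothetical labels introduced for illustration, and the "4–5 thoughts" debugging row is represented only by its upper bound.

```python
# Ceilings copied from the guideline table; the keys are hypothetical labels.
THOUGHT_CEILINGS = {
    "plan_crud": 3,
    "plan_novel": 5,
    "arch_standard": 3,
    "arch_novel": 8,
    "epic_decomposition": 10,
    "review_trivial": 1,
    "review_complex": 6,
    "debug_repeated_failure": 5,
}


def thoughts_to_run(task: str, thoughts_needed: int) -> int:
    """Clamp the thought count to the task's ceiling; never pad up to it."""
    ceiling = THOUGHT_CEILINGS[task]
    return max(1, min(thoughts_needed, ceiling))
```

Under this sketch, an epic that is settled after 4 thoughts stops at 4 rather than running to 10.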
|
|
64
85
|
|
|
65
|
-
|
|
66
|
-
- Architecture design (≥8 thoughts)
|
|
67
|
-
- Epic decomposition (≥10 thoughts)
|
|
68
|
-
- Adversarial review (≥6 thoughts)
|
|
69
|
-
- Complex bug root-cause analysis (≥5 thoughts)
|
|
86
|
+
Hard rule: do NOT emit empty thoughts ("Thought 4: let me also consider X… X is fine"). If you've reached the answer, stop.
|
|
70
87
|
|
|
71
88
|
```
|
|
72
89
|
mcp__sequential-thinking__sequentialthinking({
|
package/agents/flow-adversary.md
CHANGED
|
@@ -20,13 +20,24 @@ Review the target (spec or code) from an **attacker's perspective**. Your task i
|
|
|
20
20
|
|
|
21
21
|
## Hard Constraints
|
|
22
22
|
|
|
23
|
-
### Constraint 1:
|
|
23
|
+
### Constraint 1: "No findings" requires proof, not fabrication
|
|
24
24
|
|
|
25
|
-
If
|
|
25
|
+
If your honest analysis produces no findings, you do NOT invent problems. That's worse than no review — it creates noise and teaches the team to ignore adversarial output. Instead:
|
|
26
26
|
|
|
27
|
-
|
|
27
|
+
- Run a **second pass** with explicitly skeptical framing ("what would a senior engineer reject in this PR?").
|
|
28
|
+
- If the second pass also finds nothing, emit a short **proof-of-checking report**: list the categories you scanned, the specific files / line ranges you reviewed, and 2–3 counterfactual questions you asked. This is the honest "clean" verdict.
|
|
28
29
|
|
|
29
|
-
|
|
30
|
+
Fabricating findings to satisfy a quota violates L3 red line #2 (fact-driven). Don't.
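A proof-of-checking report could be modeled roughly as follows. This is a sketch under assumptions: the `ProofOfChecking` class, its field names, and the rendered wording are hypothetical, and the diff does not prescribe a concrete report format beyond the three listed ingredients.

```python
from dataclasses import dataclass


@dataclass
class ProofOfChecking:
    """Hypothetical shape for the 'clean' verdict described above."""
    categories_scanned: list[str]
    ranges_reviewed: list[str]           # file / line-range strings chosen by the reviewer
    counterfactual_questions: list[str]  # the text asks for 2-3 of these

    def render(self) -> str:
        if not 2 <= len(self.counterfactual_questions) <= 3:
            raise ValueError("expected 2-3 counterfactual questions")
        lines = ["No findings after two passes.", "Categories scanned:"]
        lines += [f"- {c}" for c in self.categories_scanned]
        lines.append("Reviewed:")
        lines += [f"- {r}" for r in self.ranges_reviewed]
        lines.append("Counterfactual questions asked:")
        lines += [f"- {q}" for q in self.counterfactual_questions]
        return "\n".join(lines)
```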
|
|
31
|
+
|
|
32
|
+
### Constraint 2: Coverage matches feature scope
|
|
33
|
+
|
|
34
|
+
The 6 standard categories are **Architecture / Implementation / Testing / Security / Maintainability / UX**. You do not need findings in 3+ categories to make the review "complete". You need findings proportional to the actual issues present.
|
|
35
|
+
|
|
36
|
+
- **Well-known CRUD feature** (Todo, blog): 0–3 findings is normal. Don't stretch.
|
|
37
|
+
- **Medium feature with some novel choices**: 3–8 findings typical.
|
|
38
|
+
- **Large / novel / production-grade**: 8–20+ findings reasonable.
|
|
39
|
+
|
|
40
|
+
Categories that don't apply to the feature (e.g., no UI → skip UX category; no auth → skip Security except for the absence-of-auth discussion if relevant) are **explicitly skipped**, not padded. Write one line: "Category N/A for this feature."
|
|
30
41
|
|
|
31
42
|
### Constraint 3: Every Finding Must Have Evidence + Recommendation
|
|
32
43
|
|
|
@@ -14,15 +14,29 @@ tools: [Read, Grep, Glob, Bash]
|
|
|
14
14
|
|
|
15
15
|
## Your Responsibility
|
|
16
16
|
|
|
17
|
-
Perform
|
|
17
|
+
Perform an edge-case scan across the 7 categories below, **skipping categories that do not apply to the feature**. Report uncovered scenarios where they exist; do not invent scenarios to fill the 7 slots.
|
|
18
18
|
|
|
19
19
|
Output: `.flow/specs/<name>/edge-cases.md`.
|
|
20
20
|
|
|
21
21
|
---
|
|
22
22
|
|
|
23
|
-
## 7-Category Taxonomy (
|
|
23
|
+
## 7-Category Taxonomy (apply selectively)
|
|
24
24
|
|
|
25
|
-
|
|
25
|
+
For each category, first ask: **does this category apply to the feature under review?**
|
|
26
|
+
|
|
27
|
+
- If NO → mark `N/A: <one-line reason>` and move to the next.
|
|
28
|
+
- If YES → use sequential-thinking proportional to the risk surface: 1 thought for simple cases (boundary on a string length), up to 3–5 thoughts for genuinely hard cases (distributed concurrency, timezone-sensitive scheduling).
|
|
29
|
+
|
|
30
|
+
Example for a localhost single-user Todo app:
|
|
31
|
+
- Boundary values: APPLIES (empty title, 500-char title, negative id)
|
|
32
|
+
- Nullish: APPLIES (missing optional field)
|
|
33
|
+
- Concurrency / race: **N/A — single-user, single process**
|
|
34
|
+
- Network failure: APPLIES but narrow (one fetch; retry-free is acceptable for MVP)
|
|
35
|
+
- Malformed input: APPLIES (Zod boundary cases)
|
|
36
|
+
- Permission / auth: **N/A — no auth**
|
|
37
|
+
- Performance / resource exhaustion: **N/A — bounded list, local SQLite**
|
|
38
|
+
|
|
39
|
+
Padding every category with fabricated risks creates noise and buries the real edge cases.
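The Todo example above can be expressed as data, with the applicability check done before any scanning. The category names and N/A reasons are taken from the example; the dictionary layout and the two helper functions are hypothetical.

```python
from typing import Optional

# Applicability for the localhost single-user Todo example.
# None means the category applies; a string is the one-line N/A reason.
TODO_APP_SCAN: dict[str, Optional[str]] = {
    "Boundary values": None,
    "Nullish": None,
    "Concurrency / race": "single-user, single process",
    "Network failure": None,
    "Malformed input": None,
    "Permission / auth": "no auth",
    "Performance / resource exhaustion": "bounded list, local SQLite",
}


def categories_to_scan(scan: dict[str, Optional[str]]) -> list[str]:
    """Return only the categories that apply, in their original order."""
    return [name for name, reason in scan.items() if reason is None]


def na_lines(scan: dict[str, Optional[str]]) -> list[str]:
    """Render skipped categories in the `N/A: <one-line reason>` form."""
    return [f"{name}: N/A: {reason}" for name, reason in scan.items() if reason is not None]
```

Only the four applicable categories would then receive sequential-thinking effort; the other three are recorded with a reason and nothing more.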
|
|
26
40
|
|
|
27
41
|
### 1. Boundary Values
|
|
28
42
|
|
package/agents/flow-planner.md
CHANGED
|
@@ -27,18 +27,21 @@ Output:
|
|
|
27
27
|
|
|
28
28
|
## Mandatory Workflow (6 steps)
|
|
29
29
|
|
|
30
|
-
### Step 1: Load Prerequisites + Environment Probe
|
|
30
|
+
### Step 1: Load Prerequisites + Environment Probe (conditional)
|
|
31
|
+
|
|
32
|
+
Always read the spec inputs (`research.md`, `requirements.md`, `design.md`, `.flow/CONTEXT.md`).
|
|
33
|
+
|
|
34
|
+
For the environment probe, **check existence first — do not read files that don't exist**:
|
|
31
35
|
|
|
32
36
|
```
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
tsconfig.json → TypeScript strictness
|
|
37
|
-
.eslintrc.* → lint rules
|
|
38
|
-
vitest.config.* → test framework
|
|
37
|
+
For each of: package.json, tsconfig.json, .eslintrc.*, vitest.config.*
|
|
38
|
+
if Glob finds it → Read it to capture concrete test/lint/build commands
|
|
39
|
+
else → skip silently (this is a greenfield project or a non-JS stack)
|
|
39
40
|
```
|
|
40
41
|
|
|
41
|
-
|
|
42
|
+
For greenfield projects (no `package.json` yet), use the tech stack declared in `design.md` to infer commands. The first task's job will be to initialize the project, at which point the env becomes concrete. Do not fabricate `npm test` commands if there's no package.json yet — instead write the task as "initialize package.json and install vitest; `Verify`: `npm test --silent` produces 'no tests found'".
|
|
43
|
+
|
|
44
|
+
**Use actually detected commands** in each task's `Verify` field. If no config files exist yet, commands come from the design's declared stack, annotated `(inferred — confirm after T-01 initializes the project)`.
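The check-existence-first probe might look like the sketch below. It uses `pathlib` globbing as a stand-in for the agent's Glob and Read tools, which is an assumption; `probe_environment` and `is_greenfield` are hypothetical names, and only the four file patterns come from the text.

```python
from pathlib import Path

# Patterns taken from the probe list above.
PROBE_PATTERNS = ["package.json", "tsconfig.json", ".eslintrc.*", "vitest.config.*"]


def probe_environment(root: Path) -> dict[str, str]:
    """Read only the config files that exist; missing ones are skipped silently."""
    found: dict[str, str] = {}
    for pattern in PROBE_PATTERNS:
        for path in sorted(root.glob(pattern)):
            if path.is_file():
                found[path.name] = path.read_text(encoding="utf-8")
    return found


def is_greenfield(found: dict[str, str]) -> bool:
    """No package.json detected: commands must be inferred from design.md."""
    return "package.json" not in found
```

When `is_greenfield` is true, the planner would fall back to the stack declared in `design.md` and mark the resulting commands as inferred, as described above.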
|
|
42
45
|
|
|
43
46
|
### Step 2: Break Down by POC-First 5 Phases
|
|
44
47
|
|