@curdx/flow 1.1.4 → 1.1.6
- package/.claude-plugin/marketplace.json +25 -0
- package/.claude-plugin/plugin.json +43 -0
- package/CHANGELOG.md +279 -0
- package/agent-preamble/preamble.md +214 -0
- package/agents/flow-adversary.md +216 -0
- package/agents/flow-architect.md +190 -0
- package/agents/flow-debugger.md +325 -0
- package/agents/flow-edge-hunter.md +273 -0
- package/agents/flow-executor.md +246 -0
- package/agents/flow-planner.md +204 -0
- package/agents/flow-product-designer.md +146 -0
- package/agents/flow-qa-engineer.md +276 -0
- package/agents/flow-researcher.md +155 -0
- package/agents/flow-reviewer.md +280 -0
- package/agents/flow-security-auditor.md +398 -0
- package/agents/flow-triage-analyst.md +290 -0
- package/agents/flow-ui-researcher.md +227 -0
- package/agents/flow-ux-designer.md +247 -0
- package/agents/flow-verifier.md +283 -0
- package/agents/persona-amelia.md +128 -0
- package/agents/persona-david.md +141 -0
- package/agents/persona-emma.md +179 -0
- package/agents/persona-john.md +105 -0
- package/agents/persona-mary.md +95 -0
- package/agents/persona-oliver.md +136 -0
- package/agents/persona-rachel.md +126 -0
- package/agents/persona-serena.md +175 -0
- package/agents/persona-winston.md +117 -0
- package/bin/curdx-flow.js +5 -2
- package/cli/install.js +44 -5
- package/commands/audit.md +170 -0
- package/commands/autoplan.md +184 -0
- package/commands/debug.md +199 -0
- package/commands/design.md +155 -0
- package/commands/discuss.md +162 -0
- package/commands/doctor.md +124 -0
- package/commands/fast.md +128 -0
- package/commands/help.md +119 -0
- package/commands/implement.md +381 -0
- package/commands/index.md +261 -0
- package/commands/init.md +105 -0
- package/commands/install-deps.md +128 -0
- package/commands/party.md +241 -0
- package/commands/plan-ceo.md +117 -0
- package/commands/plan-design.md +107 -0
- package/commands/plan-dx.md +104 -0
- package/commands/plan-eng.md +108 -0
- package/commands/qa.md +118 -0
- package/commands/requirements.md +146 -0
- package/commands/research.md +141 -0
- package/commands/review.md +168 -0
- package/commands/security.md +109 -0
- package/commands/sketch.md +118 -0
- package/commands/spec.md +135 -0
- package/commands/spike.md +181 -0
- package/commands/start.md +189 -0
- package/commands/status.md +139 -0
- package/commands/switch.md +95 -0
- package/commands/tasks.md +189 -0
- package/commands/triage.md +160 -0
- package/commands/verify.md +124 -0
- package/gates/adversarial-review-gate.md +219 -0
- package/gates/coverage-audit-gate.md +184 -0
- package/gates/devex-gate.md +255 -0
- package/gates/edge-case-gate.md +194 -0
- package/gates/karpathy-gate.md +130 -0
- package/gates/security-gate.md +218 -0
- package/gates/tdd-gate.md +188 -0
- package/gates/verification-gate.md +183 -0
- package/hooks/hooks.json +56 -0
- package/hooks/scripts/fail-tracker.sh +31 -0
- package/hooks/scripts/inject-karpathy.sh +52 -0
- package/hooks/scripts/quick-mode-guard.sh +64 -0
- package/hooks/scripts/session-start.sh +76 -0
- package/hooks/scripts/stop-watcher.sh +166 -0
- package/knowledge/atomic-commits.md +262 -0
- package/knowledge/epic-decomposition.md +307 -0
- package/knowledge/execution-strategies.md +278 -0
- package/knowledge/karpathy-guidelines.md +219 -0
- package/knowledge/planning-reviews.md +211 -0
- package/knowledge/poc-first-workflow.md +227 -0
- package/knowledge/spec-driven-development.md +183 -0
- package/knowledge/systematic-debugging.md +384 -0
- package/knowledge/two-stage-review.md +233 -0
- package/knowledge/wave-execution.md +387 -0
- package/package.json +14 -3
- package/schemas/config.schema.json +100 -0
- package/schemas/spec-frontmatter.schema.json +42 -0
- package/schemas/spec-state.schema.json +117 -0
@@ -0,0 +1,255 @@
+---
+gate: devex-gate
+category: enterprise-mode
+severity: warning
+depends_on: []
+---
+
+# DevEx Gate — Developer Experience Review
+
+> Optional in Enterprise mode. Reviews whether code is friendly to **the next maintainer**.
+
+---
+
+## Trigger Timing
+
+- When `/curdx-flow:plan-dx` runs (design phase)
+- When `/curdx-flow:review --devex` runs (code phase)
+- Enabled by default in open-source / multi-person collaboration scenarios
+
+---
+
+## Core Question
+
+**"Six months from now, can I (or my colleague) quickly take over this code?"**
+
+Ignoring this → code becomes legacy → maintenance cost keeps rising.
+
+---
+
+## 8 Dimensions
+
+### DX-01: Clear Naming
+
+Naming = the most important documentation.
+
+❌ Bad:
+```typescript
+function doStuff(x, y) { ... }
+const d = new Date()
+let flag = true
+```
+
+✓ Good:
+```typescript
+function validateEmailFormat(email: string): boolean { ... }
+const currentTimestamp = new Date()
+let isAuthenticationPending = true
+```
+
+**Checks**:
+- Abbreviations (`usr`, `pwd`, `addr`) should be expanded (unless domain-standard like `API`, `URL`)
+- Booleans use `is/has/can/should` prefix
+- Functions start with verbs
+- Variables use nouns
+- Single letters only in loops/lambdas (`i`, `x`)
+
+---
+
+### DX-02: Intent Comments
+
+Comments explain **why**, not **what**.
+
+❌ Bad (comment adds no value):
+```typescript
+// get the user
+const user = await getUser(id)
+
+// increment i by 1
+i++
+```
+
+✓ Good (comment explains why):
+```typescript
+// bcrypt.compare takes about the same time for any hash of the same cost; run it even when the user does not exist to prevent timing attacks
+const hash = user?.passwordHash ?? FAKE_HASH
+await bcrypt.compare(inputPwd, hash)
+```
+
+**Checks**:
+- Are there low-value comments (`// set x to 1`)?
+- Are there magic numbers / odd practices lacking explanation?
+- Do public APIs have doc comments?
+
+---
+
+### DX-03: Discoverable File Structure
+
+```
+src/auth/
+├── index.ts        ← export entry
+├── login.ts        ← business logic
+├── login.test.ts   ← tests (colocated)
+├── types.ts        ← types
+└── __mocks__/      ← mocks (if any)
+```
+
+**Checks**:
+- Are new files placed in the right location (follow existing patterns)?
+- Naming conventions are consistent (don't mix `login.ts` with `LoginService.ts`)
+- Depth does not exceed 4 levels
+
+---
+
+### DX-04: Useful Error Messages
+
+❌ Bad:
+```
+Error: Failed
+Error: Validation error
+```
+
+✓ Good:
+```
+Error: Failed to send email to user@example.com (SMTP 554 - rejected)
+Error: email must match pattern /^[^@]+@[^@]+\.[^@]+$/, got ""
+```
+
+**Checks**:
+- Error messages include **where it went wrong**, **what went wrong**, and **what the user/developer can do**
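
The three parts named in the DX-04 check can be captured in a small helper. This is a minimal sketch, not part of @curdx/flow: the `ActionableErrorParts` shape, the `formatActionableError` function, and the `sendWelcomeEmail` label are hypothetical names chosen for illustration.

```typescript
// Hypothetical shape: one field per part the DX-04 check asks for.
interface ActionableErrorParts {
  where: string; // operation or component that failed
  what: string;  // concrete failure, including the offending value
  fix: string;   // what the user or developer can do next
}

// Builds a single-line message that carries all three parts.
function formatActionableError(parts: ActionableErrorParts): string {
  return `${parts.where}: ${parts.what}. ${parts.fix}`;
}

// Example built from the "Good" message above (the fix text is an assumption).
const emailError = new Error(
  formatActionableError({
    where: "sendWelcomeEmail",
    what: "SMTP 554 rejected recipient user@example.com",
    fix: "Check that the address exists, then retry",
  })
);
```

Keeping the parts as separate fields makes it hard to forget one of them at the call site, which is the failure the check is looking for.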
+
+---
+
+### DX-05: Easy Setup
+
+After a newcomer clones the repo:
+
+```bash
+git clone ...
+cd project
+<single command to run>
+```
+
+The project should start with one command line (e.g. `npm install && npm run dev`).
+
+**Checks**:
+- README has a "Getting Started" section
+- No hidden dependencies ("must install PostgreSQL and create a user first")
+- Or setup is a single docker-compose command
+
+---
+
+### DX-06: Clear Types
+
+TypeScript / Python with types:
+```typescript
+// ❌ any hell
+function process(data: any): any { ... }
+
+// ✓ explicit types
+function processUser(user: User): ProcessedUser { ... }
+```
+
+**Checks**:
+- `any` usage (the less the better)
+- `unknown` for boundary inputs
+- Interfaces / Types have meaningful names
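
The "`unknown` for boundary inputs" check can be illustrated with a type guard. This is a sketch under assumptions: the `User` fields and the `isUser` and `parseUser` helpers are hypothetical and are not taken from the package.

```typescript
// Hypothetical domain type used only for this illustration.
interface User {
  id: number;
  email: string;
}

// Type guard: narrows an untrusted value to User after runtime checks.
function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate["id"] === "number" && typeof candidate["email"] === "string";
}

// Boundary function: treats parsed JSON as unknown, then returns a typed value or throws.
function parseUser(json: string): User {
  const parsed: unknown = JSON.parse(json);
  if (!isUser(parsed)) {
    throw new Error(`parseUser: expected {id: number, email: string}, got ${json}`);
  }
  return parsed;
}
```

Code past the boundary then works with `User` only, so `any` never needs to appear in function signatures.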
+
+---
+
+### DX-07: Tests as Documentation
+
+```typescript
+describe("login endpoint", () => {
+  test("rejects empty email with 400", async () => { ... })
+  test("rejects invalid format email with 400", async () => { ... })
+  test("rejects unknown email with 401", async () => { ... })
+  test("rejects wrong password with 401", async () => { ... })
+  test("accepts valid credentials with 200 + JWT", async () => { ... })
+})
+```
+
+Reading these test names = reading API behavior documentation.
+
+**Checks**:
+- Test names describe **behavior** ("returns 400 for empty email"), not implementation ("calls validateEmail")
+- Tests cover happy + edge + error paths
+- Each test is independent (does not depend on order)
+
+---
+
+### DX-08: Fast Dev Loop
+
+- Is build time acceptable? (< 30s to rebuild after a one-line change)
+- Is test time acceptable? (< 60s to run relevant tests)
+- Does HMR / live reload work?
+
+**Checks**:
+- tsc --watch / bundler configuration is reasonable
+- Test runner supports --watch
+- Non-critical tests are optional (e.g. E2E in CI, not on every commit)
+
+---
+
+## Checking Methods
+
+### Agent Automatic
+
+When `flow-ux-designer` / `flow-reviewer` applies this gate, use sequential-thinking ≥ 4 rounds to scan the 8 dimensions.
+
+### Human Review
+
+Attach a DevEx checklist at PR time:
+- [ ] Clear naming (reviewed at least 3 times)
+- [ ] Critical comments exist
+- [ ] Consistent structure
+- [ ] Actionable error messages
+- [ ] Tests as docs
+
+---
+
+## Scoring
+
+Each dimension 0-10 points:
+
+```
+10 = best practice
+8 = good
+5 = pass (production-usable)
+3 = needs improvement
+0 = serious issue
+```
+
+Total 40+ / 80 = pass (warning, non-blocking).
+Total < 40 = blocked, improvement required.
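
The scoring rule amounts to a range check, a sum, and a threshold. The sketch below is illustrative only: `evaluateDevEx`, `GateResult`, and `PASS_THRESHOLD` are hypothetical names, and the sample values are the ones used in this gate's example report.

```typescript
// The eight dimension ids defined by this gate.
const DIMENSIONS = [
  "DX-01", "DX-02", "DX-03", "DX-04",
  "DX-05", "DX-06", "DX-07", "DX-08",
] as const;

type DimensionId = (typeof DIMENSIONS)[number];
type Scores = Record<DimensionId, number>;

interface GateResult {
  total: number;
  verdict: "pass" | "blocked";
}

const PASS_THRESHOLD = 40; // 40 or more out of 80 passes

// Validates each score against the 0-10 scale, sums them, and applies the threshold.
function evaluateDevEx(scores: Scores): GateResult {
  let total = 0;
  for (const id of DIMENSIONS) {
    const score = scores[id];
    if (!(score >= 0 && score <= 10)) {
      throw new RangeError(`${id}: score ${score} is outside 0-10`);
    }
    total += score;
  }
  return { total, verdict: total >= PASS_THRESHOLD ? "pass" : "blocked" };
}

// Scores from the example report: 7 + 8 + 9 + 5 + 8 + 7 + 6 + 9 = 59.
const sampleResult = evaluateDevEx({
  "DX-01": 7, "DX-02": 8, "DX-03": 9, "DX-04": 5,
  "DX-05": 8, "DX-06": 7, "DX-07": 6, "DX-08": 9,
});
```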
+
+---
+
+## Output Format
+
+```markdown
+## DevEx Gate Report
+
+Scores:
+DX-01 naming: 7/10 — 2 abbreviations (usr, pwd)
+DX-02 comments: 8/10 — magic number 42 not explained
+DX-03 structure: 9/10 — consistent
+DX-04 errors: 5/10 — 2 uninformative "Failed"
+DX-05 setup: 8/10 — README complete
+DX-06 types: 7/10 — 3 instances of any
+DX-07 tests: 6/10 — test names too tied to implementation details
+DX-08 dev loop: 9/10 — HMR works well
+
+Total: 59/80 (pass)
+
+Improvement recommendations:
+1. Replace usr/pwd with user/password
+2. Comment magic number 42 (reason for timeout=42s)
+3. Change error message "Failed" → specific reason
+4. Several any usages can be typed explicitly
+5. Rewrite test names with "behavior" descriptions
+```
+
+---
+
+_source: years of development experience + gstack's DX review philosophy._
@@ -0,0 +1,194 @@
+---
+gate: edge-case-gate
+category: enterprise-mode
+severity: warning
+depends_on: []
+---
+
+# Edge Case Gate — Edge Case Hunter
+
+> Derived from BMAD-METHOD's "Edge Case Hunter".
+>
+> **Core**: specifically hunts for **non-happy-path** scenarios. User stories describe the happy path; the real world is full of edge cases.
+
+---
+
+## Trigger Timing
+
+- After the requirements phase ends (to supplement edge conditions)
+- After the design phase (to check error-path completeness)
+- After tests are written (to check whether only the happy path is covered)
+- When explicitly requested via `/curdx-flow:audit`
+
+---
+
+## 7 Categories
+
+Systematically inspect the object under review (function / component / API):
+
+### 1. Boundary Values
+
+- 0, -1, 1
+- INT_MAX, INT_MIN
+- Empty array `[]`, single-element array `[x]`, large array `[x...10000]`
+- Empty string `""`, single character `"a"`, extra-long string
+- First element / last element / middle element
+
+### 2. Nullish
+
+- `null`
+- `undefined`
+- Empty object `{}`
+- Object with missing fields (key does not exist in JSON)
+- Whether default parameters are actually applied
+
+### 3. Concurrency
+
+- Two requests arriving simultaneously
+- Write conflicts (optimistic / pessimistic lock)
+- Read-modify-write race
+- Cache invalidation timing
+- Distributed locks
+
+### 4. Error Recovery
+
+- Network outage → retry strategy?
+- DB unavailable → circuit breaker?
+- Disk full → degrade?
+- Permission revoked → graceful exit?
+- Dependency service 500 → fallback?
+
+### 5. Security
+
+- SQL/Command/XSS injection
+- Unauthorized access (use A's token to access B's resource)
+- Sensitive data leakage (logs / error messages / response)
+- Rate limiting bypass
+- CSRF / session fixation
+- Timing attack
+
+### 6. Internationalization (I18n)
+
+- Unicode (emoji, CJK, combining characters)
+- RTL (Arabic)
+- Time zones (UTC vs local, DST jumps)
+- Number formats (decimal point vs comma)
+- Sorting (locale-aware collation)
+
+### 7. Performance
+
+- N+1 queries
+- Slow queries (missing indexes)
+- Large responses (MB/GB scale)
+- Memory leaks (event listeners, closures)
+- Deadlocks / long-running transactions
+
+---
+
+## Required Question Checklist
+
+For each category, the agent must answer (via sequential-thinking):
+
+```
+Q1. What inputs/scenarios will this feature encounter for [category]?
+Q2. If the input is [extreme value], what will the current implementation do?
+Q3. Is there a test covering this scenario?
+Q4. If no test, what test should be added to cover it?
+```
+
+---
+
+## Execution Flow
+
+```
+Input: object under review (function / component / API) + requirements + tests
+  ↓
+For each category (1-7):
+  1. Use sequential-thinking to list at least 3 possible edge scenarios
+  2. Check whether each scenario has corresponding coverage in tests
+  3. Add uncovered ones to the "gap list"
+  ↓
+Output: edge-cases.md
+```
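
The bookkeeping in the flow above (steps 2 and 3, plus the "at least 3 scenarios" rule from step 1) can be sketched as plain filtering. The `EdgeScenario` record and both function names are hypothetical; listing the scenarios themselves remains the agent's job.

```typescript
// Hypothetical record for one edge scenario listed during the hunt.
interface EdgeScenario {
  category: string;    // e.g. "Boundary Values"
  description: string; // e.g. "email longer than 255 chars"
  coveredBy?: string;  // name of the covering test, if one exists
}

// Step 3: scenarios without a covering test form the gap list.
function buildGapList(scenarios: EdgeScenario[]): EdgeScenario[] {
  return scenarios.filter((scenario) => scenario.coveredBy === undefined);
}

// Step 1 asks for at least 3 scenarios per category; report categories that fall short.
function underExploredCategories(categories: string[], scenarios: EdgeScenario[]): string[] {
  return categories.filter(
    (category) => scenarios.filter((s) => s.category === category).length < 3
  );
}

// Illustrative data only.
const listedScenarios: EdgeScenario[] = [
  { category: "Boundary Values", description: "empty email", coveredBy: "rejects empty email with 400" },
  { category: "Boundary Values", description: "email longer than 255 chars" },
  { category: "Boundary Values", description: "NUL character in password" },
  { category: "Nullish", description: "email is undefined" },
];
const gaps = buildGapList(listedScenarios);
const shortCategories = underExploredCategories(["Boundary Values", "Nullish"], listedScenarios);
```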
+
+---
+
+## Output Format
+
+```markdown
+## Edge Case Hunt Report
+
+Object under review: src/auth/login.ts + login.test.ts
+
+## Covered (✓)
+
+- Valid email + password → 200 + JWT
+- Invalid email format → 400
+- Non-existent user → 401
+- Wrong password → 401
+
+## Gap List (✗)
+
+### 1. Boundary Values
+- ✗ Extra-long email (>255) may cause DB errors
+  - Recommendation: test("rejects email >255 chars", ...)
+- ✗ Password containing NUL character (bcrypt has a historical issue)
+  - Recommendation: test("handles NUL in password safely", ...)
+
+### 2. Nullish
+- ✗ email is empty string vs undefined
+  - Currently: both return 400 (via schema validation), but no test
+  - Recommendation: explicit test for both cases
+
+### 3. Concurrency
+- ✗ Same user with 2 concurrent logins
+  - Risk: token generation uniqueness?
+  - Recommendation: test("handles concurrent logins", async () => Promise.all([...]))
+
+### 4. Error Recovery
+- ✗ bcrypt.compare() timeout
+  - Currently: no timeout, will wait indefinitely
+  - Recommendation: add Promise.race + timeout test
+
+### 5. Security
+- ⚠ Error message leak (user enumeration)
+  - Already reported in adversarial review
+- ✗ Timing attack: response time difference between email existing vs not
+  - Recommendation: run bcrypt.compare() in both cases, test response time difference < 10ms
+
+### 6. Internationalization
+- ✗ Unicode email (RFC 6531)
+  - Currently: regex may reject legitimate Unicode emails
+  - Recommendation: test("accepts unicode email like ñ@example.com")
+
+### 7. Performance
+- ⚠ bcrypt cost 12 response time (~100ms) not tested
+  - Recommendation: benchmark test, expect < 200ms P99
+
+## Summary
+
+Covered: 4 scenarios
+Gaps: 9 scenarios
+Priority ranking: 3 (concurrency) > 4 (timeout) > 5 (timing attack) > others
+
+Fix recommendations:
+- High priority: add 4 tests (concurrency, timeout, timing attack, unicode email)
+- Medium priority: add edge-case-tests.test.ts to unify edge-case test management
+```
+
+---
+
+## Difference from Adversarial Review
+
+| Dimension | adversarial | edge-case |
+|------|-------------|-----------|
+| Goal | find **any** issue | find **edge-case** issues |
+| Scope | all dimensions (architecture/implementation/...) | inputs / scenarios |
+| Style | "attacker perspective" | "extreme case search" |
+| Output | issue list + fix recommendations | gap list + test recommendations |
+
+The two are complementary. Enterprise mode recommends enabling both.
+
+---
+
+_Source: BMAD-METHOD's edge-case-hunter._
@@ -0,0 +1,130 @@
+---
+gate: karpathy-gate
+category: always-on
+severity: blocking
+depends_on: []
+---
+
+# Karpathy Gate — Thinking Baseline Check
+
+> **Always enabled**. This is the code-level enforcement of L1. Violations block immediately.
+
+This gate maps to Karpathy's 4 principles. All flow-executor and flow-reviewer agents must enforce it.
+
+---
+
+## Trigger Timing
+
+- Before code is written (pre-check)
+- Before commit (re-check)
+- When `/curdx-flow:review` runs (full review)
+
+---
+
+## 4 Checks
+
+### G1. Think Before Coding
+
+**Violation patterns**:
+- ✗ Code embodies unstated assumptions (e.g. default encoding, default pagination count, default permission scope)
+- ✗ User goal has multiple interpretations but the agent picked one without saying so
+- ✗ Business-relevant changes (data export, permission modification) were not confirmed with the user
+
+**Check method**:
+1. Read commit message + change scope
+2. Look in `.progress.md` for "assumption:" entries
+3. If a key assumption is not explicit, mark as violation
+
+**Auto-fix**: not possible. Report to the user.
+
+---
+
+### G2. Simplicity First
+
+**Violation patterns**:
+- ✗ Introduces an abstraction with only one usage point (Strategy / Factory / Observer used in one place)
+- ✗ Code goes beyond task requirements (user asked for `A`, implemented `A` + `B` + `C`)
+- ✗ Over-defensive (error handling for cases that obviously won't happen)
+- ✗ Premature parameterization (hooks left "in case we need it later")
+- ✗ Tests changed to "always pass" to accommodate implementation
+
+**Check method**:
+1. Cross-reference with the FR list in requirements.md
+2. Check whether the commit's diff scope exceeds the FR description
+3. Scan new classes / interfaces / factories; only reasonable if used in > 1 place
+
+**Auto-fix**:
+- Dispatch flow-adversary agent to review, flag redundant code
+- Auto-deletion not allowed (may have reasons); list items and let the user decide
+
+---
+
+### G3. Surgical Changes
+
+**Violation patterns**:
+- ✗ Task only modifies `auth/login.ts`, but the commit contains changes in `utils/`
+- ✗ Task is to add a feature, but the commit contains "incidental" refactoring
+- ✗ Changed comments, quotes, or indentation unrelated to the task
+- ✗ Deleted pre-existing (not self-caused) "dead code"
+
+**Check method**:
+1. Read the Files field in tasks.md
+2. Compare to the commit's changed files
+3. If there is a difference (commit changed files not in Files), mark as violation
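
The G3 comparison is a set difference between two file lists. The sketch below assumes both lists are already available as arrays of paths (the changed files could come, for example, from `git diff --name-only`); `findOutOfScopeFiles` is a hypothetical helper name.

```typescript
// Returns files changed by the commit that the task's Files field does not list.
function findOutOfScopeFiles(taskFiles: string[], changedFiles: string[]): string[] {
  const allowed = new Set(taskFiles);
  return changedFiles.filter((file) => !allowed.has(file));
}

// Illustrative paths only: the task lists one file, the commit touches two.
const outOfScope = findOutOfScopeFiles(
  ["src/auth/login.ts"],
  ["src/auth/login.ts", "src/utils/date.ts"]
);
```

A non-empty result corresponds to the G3 violation described above. The sketch compares exact paths, so a Files entry that names a directory would need extra handling.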
+
+**Auto-fix**:
+- Dispatch flow-executor to extract the "incidental changes" into a separate commit
+- Or roll back and redo
+
+---
+
+### G4. Goal-Driven Execution
+
+**Violation patterns**:
+- ✗ Commit message contains "should", "probably", "seems", "fixed" without verification evidence
+- ✗ The `Verify` field is skipped (claiming complete without running)
+- ✗ Tests were deleted instead of fixed (turning green into gray)
+- ✗ Claims "done" but AC-X.Y still cannot be verified via curl
+
+**Check method**:
+1. Grep commit messages for forbidden words
+2. Check .progress.md for Verify output records
+3. For each AC, confirm that an automated verification path can be found
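
Step 1 of the G4 check can be approximated with a whole-word, case-insensitive match. This is a sketch under assumptions: the word list is taken from the violation pattern above, `findUnverifiedClaims` is a hypothetical name, and a hit only marks a message for follow-up because the sketch does not look for verification evidence.

```typescript
// Words that suggest an unverified claim, per the G4 violation pattern.
const UNVERIFIED_WORDS = ["should", "probably", "seems", "fixed"];

// Returns the listed words that appear as whole words in the commit message.
function findUnverifiedClaims(commitMessage: string): string[] {
  return UNVERIFIED_WORDS.filter((word) =>
    new RegExp(`\\b${word}\\b`, "i").test(commitMessage)
  );
}

// Illustrative commit message.
const flaggedWords = findUnverifiedClaims("Fixed login bug, should work now");
```

The `\b` word boundaries keep words such as "prefixed" from triggering a false match on "fixed".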
+
+**Auto-fix**:
+- Trigger flow-verifier to run reverse verification
+- If AC is not met, send back for rework (dispatch flow-executor to fix)
+
+---
+
+## Violation Levels
+
+| Violation | Level | Block? |
+|------|------|-------|
+| G1 (unstated assumption) | Medium | warning, require user confirmation |
+| G2 (over-engineering) | Medium | warning + suggest simplification |
+| G3 (surgical failure) | High | **block**, must split the commit |
+| G4 (no evidence) | High | **block**, must run verification |
+
+---
+
+## Output Format
+
+```markdown
+## Karpathy Gate Check Result
+
+[G1] Think Before Coding: ✓ pass (3 explicit assumption records)
+[G2] Simplicity First: ⚠ warning — src/auth/login-strategy.ts has a single-use Strategy pattern
+[G3] Surgical Changes: ✗ violated — commit abc123 contains accidental changes in utils/
+[G4] Goal-Driven: ✓ pass (all ACs have verification records)
+
+Blockers: 1
+Warnings: 1
+
+Fix recommendations:
+G3: git reset HEAD~1, split commit abc123 into 2 atomic commits
+```
+
+---
+
+_Scope: the preamble.md of every agent has these rules built in; this file contains the detailed rules for concrete checks._