deepflow 0.1.46 → 0.1.47
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/src/commands/df/execute.md +30 -7
- package/src/commands/df/verify.md +102 -17
- package/templates/config-template.yaml +14 -0
package/package.json
CHANGED
package/src/commands/df/execute.md
CHANGED
@@ -99,8 +99,16 @@ task: T3
 status: success|failed
 commit: abc1234
 summary: "one line"
+tests_ran: true|false
+test_command: "npm test"
+test_exit_code: 0
+test_output_tail: |
+  PASS src/upload.test.ts
+  Tests: 12 passed, 12 total
 ```
 
+New fields: `tests_ran` (bool), `test_command` (string), `test_exit_code` (int), `test_output_tail` (last 20 lines of output).
+
 **Spike result file** `.deepflow/results/{task_id}.yaml` (additional fields):
 ```yaml
 task: T1
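The extended result-file schema above can be checked mechanically. A minimal sketch in Python (hypothetical helper, not part of deepflow; only the field names are taken from the schema):

```python
# Hypothetical validator for the extended task-result schema shown above.
# REQUIRED_FIELDS maps each schema field to its expected Python type.
REQUIRED_FIELDS = {
    "task": str, "status": str, "commit": str, "summary": str,
    "tests_ran": bool, "test_command": str,
    "test_exit_code": int, "test_output_tail": str,
}

def missing_fields(result: dict) -> list[str]:
    """Return names of fields that are absent or have the wrong type."""
    return [
        name for name, typ in REQUIRED_FIELDS.items()
        if not isinstance(result.get(name), typ)
    ]

result = {
    "task": "T3", "status": "success", "commit": "abc1234",
    "summary": "one line", "tests_ran": True,
    "test_command": "npm test", "test_exit_code": 0,
    "test_output_tail": "PASS src/upload.test.ts\nTests: 12 passed, 12 total",
}
print(missing_fields(result))  # → []
```

An orchestrator could run such a check before trusting a result file; here it is only an illustration of what "ALL fields including test evidence" means.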
@@ -400,8 +408,18 @@ Example: To edit src/foo.ts, use:
 
 Do NOT write files to the main project directory.
 
-
-
+Steps:
+1. Implement the task
+2. Detect test command: check for package.json (npm test), pyproject.toml (pytest),
+   Cargo.toml (cargo test), go.mod (go test ./...), or Makefile (make test)
+3. Run tests if test infrastructure exists:
+   - Run the detected test command
+   - If tests fail: fix the code and re-run until passing
+   - Do NOT commit with failing tests
+4. If NO test infrastructure: set tests_ran: false in result file
+5. Commit as feat({spec}): {description}
+6. Write result file with ALL fields including test evidence (see schema):
+   {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
 
 **STOP after writing the result file. Do NOT:**
 - Merge branches or cherry-pick commits
@@ -427,6 +445,7 @@ Steps:
 3. Write experiment as --active.md (verifier determines final status)
 4. Commit: spike({spec}): validate {hypothesis}
 5. Write result to .deepflow/results/{task_id}.yaml (see spike result schema)
+6. If test infrastructure exists, also run tests and include evidence in result file
 
 Rules:
 - `met: true` ONLY if actual satisfies target
@@ -491,11 +510,15 @@ After spawning wave agents, your turn ENDS. Completion notifications drive the l
 
 **Per notification:**
 1. Read result file for the completed agent
-2.
-
-
-
-
+2. Validate test evidence:
+   - `tests_ran: true` + `test_exit_code: 0` → trust result
+   - `tests_ran: true` + `test_exit_code: non-zero` → status MUST be failed (flag mismatch if agent said success)
+   - `tests_ran: false` + `status: success` → flag: "⚠ Tx: success but no tests ran"
+3. TaskUpdate(taskId: native_id, status: "completed") — auto-unblocks dependent tasks
+4. Update PLAN.md: `[ ]` → `[x]` + commit hash (as before)
+5. Report: "✓ T1: success (abc123) [12 tests passed]" or "⚠ T1: success (abc123) [no tests]"
+6. If NOT all wave agents done → end turn, wait
+7. If ALL wave agents done → use TaskList to find newly unblocked tasks, check context, spawn next wave or finish
 
 **Between waves:** Check context %. If ≥50%, checkpoint and exit.
 
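The three evidence-validation rules in step 2 above can be sketched as a small classifier. Illustrative Python only; `validate_evidence` and its return labels are hypothetical, not deepflow's API:

```python
def validate_evidence(result: dict) -> str:
    """Apply the three test-evidence rules described above to a result dict."""
    ran = result.get("tests_ran", False)
    code = result.get("test_exit_code")
    status = result.get("status")
    if ran and code == 0:
        return "trust"          # rule 1: tests ran and passed
    if ran and code != 0:
        # rule 2: failing tests mean status MUST be failed;
        # an agent claiming success is flagged as a mismatch.
        return "mismatch" if status == "success" else "failed"
    if status == "success":
        return "warn: success but no tests ran"   # rule 3
    return "failed"
```

The "warn" branch corresponds to the "⚠ Tx: success but no tests ran" flag in the report format above.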
package/src/commands/df/verify.md
CHANGED
@@ -40,16 +40,86 @@ Load:
 
 If no done-* specs: report counts, suggest `--doing`.
 
+### 1.5. DETECT PROJECT COMMANDS
+
+Detect build and test commands by inspecting project files in the worktree.
+
+**Config override always wins.** If `.deepflow/config.yaml` has `quality.test_command` or `quality.build_command`, use those.
+
+**Auto-detection (first match wins):**
+
+| File | Build | Test |
+|------|-------|------|
+| `package.json` with `scripts.build` | `npm run build` | `npm test` (if scripts.test is not default placeholder) |
+| `pyproject.toml` or `setup.py` | — | `pytest` |
+| `Cargo.toml` | `cargo build` | `cargo test` |
+| `go.mod` | `go build ./...` | `go test ./...` |
+| `Makefile` with `test` target | `make build` (if target exists) | `make test` |
+
+**Output:**
+- Commands found: `Build: npm run build | Test: npm test`
+- Nothing found: `⚠ No build/test commands detected. L0/L4 skipped. Set quality.test_command in .deepflow/config.yaml`
+
 ### 2. VERIFY EACH SPEC
 
+**L0: Build check** (if build command detected)
+
+Run the build command in the worktree:
+- Exit code 0 → L0 pass, continue to L1-L3
+- Exit code non-zero → L0 FAIL
+  - Report: "✗ L0: Build failed" with last 30 lines of output
+  - Add fix task: "Fix build errors" to PLAN.md
+  - Do NOT proceed to L1-L4 (no point checking if code doesn't build)
+
+**L1-L3: Static analysis** (via Explore agents)
+
 Check requirements, acceptance criteria, and quality (stubs/TODOs).
 Mark each: ✓ satisfied | ✗ missing | ⚠ partial
 
+**L4: Test execution** (if test command detected)
+
+Run AFTER L0 passes and L1-L3 complete. Run even if L1-L3 found issues — test failures reveal additional problems.
+
+- Run test command in the worktree (timeout from config, default 5 min)
+- Exit code 0 → L4 pass
+- Exit code non-zero → L4 FAIL
+  - Capture last 50 lines of output
+  - Report: "✗ L4: Tests failed (N of M)" with relevant output
+  - Add fix task: "Fix failing tests" with test output in description
+
+**Flaky test handling** (if `quality.test_retry_on_fail: true` in config):
+- If tests fail, re-run ONCE
+- Second run passes → L4 pass with note: "⚠ L4: Passed on retry (possible flaky test)"
+- Second run fails → genuine failure
+
 ### 3. GENERATE REPORT
 
-Report per spec
+Report per spec with L0/L4 status, requirements count, acceptance count, quality issues.
 
-**
+**Format on success:**
+```
+done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✓ (12 tests) | 0 quality issues
+```
+
+**Format on failure:**
+```
+done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✗ (3 failed) | 0 quality issues
+
+Issues:
+✗ L4: 3 test failures
+  FAIL src/upload.test.ts > should validate file type
+  FAIL src/upload.test.ts > should reject oversized files
+
+Fix tasks added to PLAN.md:
+  T10: Fix 3 failing tests in upload module
+```
+
+**Gate conditions (ALL must pass to merge):**
+- L0: Build passes (or no build command detected)
+- L1-L3: All requirements satisfied, no stubs, properly wired
+- L4: Tests pass (or no test command detected)
+
+**If all gates pass:** Proceed to Post-Verification merge.
 
 **If issues found:** Add fix tasks to PLAN.md in the worktree and register as native tasks, then loop back to execute:
 
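The L4 run with the flaky-test retry described above can be sketched in Python. This is illustrative only (deepflow's orchestrator runs these commands via Bash); `run_tests` and its return shape are hypothetical:

```python
import subprocess

def run_tests(command: list[str], retry_on_fail: bool = True,
              timeout: int = 300) -> tuple[bool, bool]:
    """Run the detected test command; optionally re-run ONCE on failure.

    Returns (passed, passed_on_retry). passed_on_retry=True corresponds
    to the "⚠ L4: Passed on retry (possible flaky test)" note above.
    """
    first = subprocess.run(command, capture_output=True, timeout=timeout)
    if first.returncode == 0:
        return True, False
    if not retry_on_fail:
        return False, False
    # Flaky handling: one retry; a second failure is a genuine failure.
    second = subprocess.run(command, capture_output=True, timeout=timeout)
    return second.returncode == 0, second.returncode == 0
```

The 300-second default mirrors the `quality.test_timeout` default in the config template.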
@@ -67,15 +137,19 @@ Report per spec: requirements count, acceptance count, quality issues.
 4. Output report + next step:
 
 ```
-done-upload.md: 4/4 reqs ✓, 3/5 acceptance
+done-upload.md: L0 ✓ | 4/4 reqs ✓, 3/5 acceptance ✗ | L4 ✗ (2 failed) | 1 quality issue
 
 Issues:
 ✗ AC-3: YAML parsing missing for consolation
+✗ L4: 2 test failures
+  FAIL src/upload.test.ts > should validate file type
+  FAIL src/upload.test.ts > should reject oversized files
 ⚠ Quality: TODO in parse_config()
 
 Fix tasks added to PLAN.md:
 T10: Add YAML parsing for consolation section
-T11:
+T11: Fix 2 failing tests in upload module
+T12: Remove TODO in parse_config()
 
 Run /df:execute --continue to fix in the same worktree.
 ```
@@ -105,14 +179,16 @@ Files: ...
 
 ## Verification Levels
 
-| Level | Check | Method |
-
-
-
-
-
+| Level | Check | Method | Runner |
+|-------|-------|--------|--------|
+| L0: Builds | Code compiles/builds | Run build command | Orchestrator (Bash) |
+| L1: Exists | File/function exists | Glob/Grep | Explore agents |
+| L2: Substantive | Real code, not stub | Read + analyze | Explore agents |
+| L3: Wired | Integrated into system | Trace imports/calls | Explore agents |
+| L4: Tested | Tests pass | Run test command | Orchestrator (Bash) |
 
-Default:
+**Default: L0 through L4.** L0 and L4 are skipped ONLY if no build/test command is detected (see step 1.5).
+L0 and L4 run directly via Bash — Explore agents cannot execute commands.
 
 ## Rules
 - **Never use TaskOutput** — Returns full transcripts that explode context
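The ALL-gates merge rule implied by the table above (a skipped L0 or L4 still allows merging, since skipping only happens when no command was detected) can be sketched as a hypothetical helper:

```python
def gates_pass(l0: str, l1_l3_clean: bool, l4: str) -> bool:
    """ALL-gates check. l0/l4 are "pass", "fail", or "skipped"
    (skipped = no build/test command detected, which does not block merge).
    Illustrative sketch only, not deepflow's implementation."""
    return l0 != "fail" and l1_l3_clean and l4 != "fail"
```

Only a `True` result would lead to the Post-Verification merge; any `False` means fix tasks go to PLAN.md instead.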
@@ -147,10 +223,12 @@ Scale: 1-2 agents per spec, cap 10.
 ```
 /df:verify
 
-
-
+Build: npm run build | Test: npm test
+
+done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✓ (12 tests) | 0 quality issues
+done-auth.md: L0 ✓ | 2/2 reqs ✓, 3/3 acceptance ✓ | L4 ✓ (8 tests) | 0 quality issues
 
-✓ All
+✓ All gates passed
 
 ✓ Merged df/upload to main
 ✓ Cleaned up worktree and branch
@@ -163,22 +241,29 @@ Learnings captured:
 ```
 /df:verify --doing
 
-
+Build: npm run build | Test: npm test
+
+doing-upload.md: L0 ✓ | 4/4 reqs ✓, 3/5 acceptance ✗ | L4 ✗ (3 failed) | 1 quality issue
 
 Issues:
 ✗ AC-3: YAML parsing missing for consolation
+✗ L4: 3 test failures
+  FAIL src/upload.test.ts > should validate file type
+  FAIL src/upload.test.ts > should reject oversized files
+  FAIL src/upload.test.ts > should handle empty input
 ⚠ Quality: TODO in parse_config()
 
 Fix tasks added to PLAN.md:
 T10: Add YAML parsing for consolation section
-T11:
+T11: Fix 3 failing tests in upload module
+T12: Remove TODO in parse_config()
 
 Run /df:execute --continue to fix in the same worktree.
 ```
 
 ## Post-Verification: Worktree Merge & Cleanup
 
-**Only runs when ALL
+**Only runs when ALL gates pass** (L0 build, L1-L3 static analysis, L4 tests). If any gate fails, fix tasks were added to PLAN.md instead (see step 3).
 
 ### 1. DISCOVER WORKTREE
 
package/templates/config-template.yaml
CHANGED
@@ -61,3 +61,17 @@ worktree:
 
 # Keep worktree after failed execution for debugging
 cleanup_on_fail: false
+
+# Quality gates for /df:verify
+quality:
+  # Override auto-detected build command (e.g., "npm run build", "cargo build")
+  build_command: ""
+
+  # Override auto-detected test command (e.g., "npm test", "pytest", "go test ./...")
+  test_command: ""
+
+  # Test timeout in seconds (default: 300 = 5 minutes)
+  test_timeout: 300
+
+  # Retry flaky tests once before failing (default: true)
+  test_retry_on_fail: true