deepflow 0.1.46 → 0.1.47

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "deepflow",
-  "version": "0.1.46",
+  "version": "0.1.47",
   "description": "Stay in flow state - lightweight spec-driven task orchestration for Claude Code",
   "keywords": [
     "claude",
@@ -99,8 +99,16 @@ task: T3
 status: success|failed
 commit: abc1234
 summary: "one line"
+tests_ran: true|false
+test_command: "npm test"
+test_exit_code: 0
+test_output_tail: |
+  PASS src/upload.test.ts
+  Tests: 12 passed, 12 total
 ```
 
+New fields: `tests_ran` (bool), `test_command` (string), `test_exit_code` (int), `test_output_tail` (last 20 lines of output).
+
 **Spike result file** `.deepflow/results/{task_id}.yaml` (additional fields):
 ```yaml
 task: T1
@@ -400,8 +408,18 @@ Example: To edit src/foo.ts, use:
 
 Do NOT write files to the main project directory.
 
-Implement, test, commit as feat({spec}): {description}.
-Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
+Steps:
+1. Implement the task
+2. Detect test command: check for package.json (npm test), pyproject.toml (pytest),
+   Cargo.toml (cargo test), go.mod (go test ./...), or Makefile (make test)
+3. Run tests if test infrastructure exists:
+   - Run the detected test command
+   - If tests fail: fix the code and re-run until passing
+   - Do NOT commit with failing tests
+4. If NO test infrastructure: set tests_ran: false in result file
+5. Commit as feat({spec}): {description}
+6. Write result file with ALL fields including test evidence (see schema):
+   {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
 
 **STOP after writing the result file. Do NOT:**
 - Merge branches or cherry-pick commits
@@ -427,6 +445,7 @@ Steps:
 3. Write experiment as --active.md (verifier determines final status)
 4. Commit: spike({spec}): validate {hypothesis}
 5. Write result to .deepflow/results/{task_id}.yaml (see spike result schema)
+6. If test infrastructure exists, also run tests and include evidence in result file
 
 Rules:
 - `met: true` ONLY if actual satisfies target
@@ -491,11 +510,15 @@ After spawning wave agents, your turn ENDS. Completion notifications drive the loop.
 
 **Per notification:**
 1. Read result file for the completed agent
-2. TaskUpdate(taskId: native_id, status: "completed") — auto-unblocks dependent tasks
-3. Update PLAN.md: `[ ]` → `[x]` + commit hash (as before)
-4. Report ONE line: "✓ Tx: status (commit)"
-5. If NOT all wave agents done → end turn, wait
-6. If ALL wave agents done → use TaskList to find newly unblocked tasks, check context, spawn next wave or finish
+2. Validate test evidence:
+   - `tests_ran: true` + `test_exit_code: 0` → trust result
+   - `tests_ran: true` + `test_exit_code: non-zero` → status MUST be failed (flag mismatch if agent said success)
+   - `tests_ran: false` + `status: success` → flag: "⚠ Tx: success but no tests ran"
+3. TaskUpdate(taskId: native_id, status: "completed") — auto-unblocks dependent tasks
+4. Update PLAN.md: `[ ]` → `[x]` + commit hash (as before)
+5. Report: "✓ T1: success (abc123) [12 tests passed]" or "⚠ T1: success (abc123) [no tests]"
+6. If NOT all wave agents done → end turn, wait
+7. If ALL wave agents done → use TaskList to find newly unblocked tasks, check context, spawn next wave or finish
 
 **Between waves:** Check context %. If ≥50%, checkpoint and exit.
 
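The evidence-validation rules above reduce to a small decision table. A hedged sketch — the function name and return shape are illustrative, not the orchestrator's actual API:

```python
def validate_evidence(result: dict) -> tuple[str, str]:
    """Apply the per-notification evidence rules to a parsed result file.

    Returns (final_status, report_line). Hypothetical helper for illustration.
    """
    task = result.get("task", "T?")
    status = result.get("status", "failed")
    if result.get("tests_ran"):
        if result.get("test_exit_code") == 0:
            return status, f"✓ {task}: {status}"  # evidence supports the claim
        # non-zero exit code: override an over-optimistic "success"
        if status == "success":
            return "failed", f"✗ {task}: claimed success but tests failed"
        return "failed", f"✗ {task}: failed"
    if status == "success":
        return status, f"⚠ {task}: success but no tests ran"
    return status, f"✗ {task}: {status}"
```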
@@ -40,16 +40,86 @@ Load:
 
 If no done-* specs: report counts, suggest `--doing`.
 
+### 1.5. DETECT PROJECT COMMANDS
+
+Detect build and test commands by inspecting project files in the worktree.
+
+**Config override always wins.** If `.deepflow/config.yaml` has `quality.test_command` or `quality.build_command`, use those.
+
+**Auto-detection (first match wins):**
+
+| File | Build | Test |
+|------|-------|------|
+| `package.json` with `scripts.build` | `npm run build` | `npm test` (if `scripts.test` is not the default placeholder) |
+| `pyproject.toml` or `setup.py` | — | `pytest` |
+| `Cargo.toml` | `cargo build` | `cargo test` |
+| `go.mod` | `go build ./...` | `go test ./...` |
+| `Makefile` with `test` target | `make build` (if target exists) | `make test` |
+
+**Output:**
+- Commands found: `Build: npm run build | Test: npm test`
+- Nothing found: `⚠ No build/test commands detected. L0/L4 skipped. Set quality.test_command in .deepflow/config.yaml`
+
 ### 2. VERIFY EACH SPEC
 
+**L0: Build check** (if build command detected)
+
+Run the build command in the worktree:
+- Exit code 0 → L0 pass, continue to L1-L3
+- Exit code non-zero → L0 FAIL
+  - Report: "✗ L0: Build failed" with last 30 lines of output
+  - Add fix task: "Fix build errors" to PLAN.md
+  - Do NOT proceed to L1-L4 (no point checking if code doesn't build)
+
+**L1-L3: Static analysis** (via Explore agents)
+
 Check requirements, acceptance criteria, and quality (stubs/TODOs).
 Mark each: ✓ satisfied | ✗ missing | ⚠ partial
 
+**L4: Test execution** (if test command detected)
+
+Run AFTER L0 passes and L1-L3 complete. Run even if L1-L3 found issues — test failures reveal additional problems.
+
+- Run test command in the worktree (timeout from config, default 5 min)
+- Exit code 0 → L4 pass
+- Exit code non-zero → L4 FAIL
+  - Capture last 50 lines of output
+  - Report: "✗ L4: Tests failed (N of M)" with relevant output
+  - Add fix task: "Fix failing tests" with test output in description
+
+**Flaky test handling** (if `quality.test_retry_on_fail: true` in config):
+- If tests fail, re-run ONCE
+- Second run passes → L4 pass with note: "⚠ L4: Passed on retry (possible flaky test)"
+- Second run fails → genuine failure
+
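The run-with-timeout and single-retry behaviour described above can be sketched as follows; names and the return shape are illustrative, not the actual orchestrator code:

```python
import subprocess

def run_l4(test_command: str, timeout: int = 300,
           retry_on_fail: bool = True) -> tuple[bool, bool, str]:
    """Run the L4 gate. Returns (passed, passed_on_retry, output_tail)."""
    def run_once() -> tuple[int, str]:
        try:
            proc = subprocess.run(test_command, shell=True, capture_output=True,
                                  text=True, timeout=timeout)
            out = proc.stdout + proc.stderr
            return proc.returncode, "\n".join(out.splitlines()[-50:])
        except subprocess.TimeoutExpired:
            return 124, f"timed out after {timeout}s"

    code, tail = run_once()
    if code == 0:
        return True, False, tail
    if retry_on_fail:
        # one retry catches flaky tests; a second failure is genuine
        code, tail = run_once()
        if code == 0:
            return True, True, tail  # report "⚠ L4: Passed on retry"
    return False, False, tail
```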
 ### 3. GENERATE REPORT
 
-Report per spec: requirements count, acceptance count, quality issues.
+Report per spec with L0/L4 status, requirements count, acceptance count, quality issues.
 
-**If all pass:** Proceed to Post-Verification merge.
+**Format on success:**
+```
+done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✓ (12 tests) | 0 quality issues
+```
+
+**Format on failure:**
+```
+done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✗ (3 failed) | 0 quality issues
+
+Issues:
+✗ L4: 3 test failures
+  FAIL src/upload.test.ts > should validate file type
+  FAIL src/upload.test.ts > should reject oversized files
+
+Fix tasks added to PLAN.md:
+T10: Fix 3 failing tests in upload module
+```
+
+**Gate conditions (ALL must pass to merge):**
+- L0: Build passes (or no build command detected)
+- L1-L3: All requirements satisfied, no stubs, properly wired
+- L4: Tests pass (or no test command detected)
+
+**If all gates pass:** Proceed to Post-Verification merge.
 
 **If issues found:** Add fix tasks to PLAN.md in the worktree and register as native tasks, then loop back to execute:
 
@@ -67,15 +137,19 @@ Report per spec: requirements count, acceptance count, quality issues.
 4. Output report + next step:
 
 ```
-done-upload.md: 4/4 reqs ✓, 3/5 acceptance ✗, 1 quality issue
+done-upload.md: L0 ✓ | 4/4 reqs ✓, 3/5 acceptance ✗ | L4 ✗ (2 failed) | 1 quality issue
 
 Issues:
 ✗ AC-3: YAML parsing missing for consolation
+✗ L4: 2 test failures
+  FAIL src/upload.test.ts > should validate file type
+  FAIL src/upload.test.ts > should reject oversized files
 ⚠ Quality: TODO in parse_config()
 
 Fix tasks added to PLAN.md:
 T10: Add YAML parsing for consolation section
-T11: Remove TODO in parse_config()
+T11: Fix 2 failing tests in upload module
+T12: Remove TODO in parse_config()
 
 Run /df:execute --continue to fix in the same worktree.
 ```
@@ -105,14 +179,16 @@ Files: ...
 
 ## Verification Levels
 
-| Level | Check | Method |
-|-------|-------|--------|
-| L1: Exists | File/function exists | Glob/Grep |
-| L2: Substantive | Real code, not stub | Read + analyze |
-| L3: Wired | Integrated into system | Trace imports/calls |
-| L4: Tested | Has passing tests | Run tests |
+| Level | Check | Method | Runner |
+|-------|-------|--------|--------|
+| L0: Builds | Code compiles/builds | Run build command | Orchestrator (Bash) |
+| L1: Exists | File/function exists | Glob/Grep | Explore agents |
+| L2: Substantive | Real code, not stub | Read + analyze | Explore agents |
+| L3: Wired | Integrated into system | Trace imports/calls | Explore agents |
+| L4: Tested | Tests pass | Run test command | Orchestrator (Bash) |
 
-Default: L1-L3 (L4 optional, can be slow)
+**Default: L0 through L4.** L0 and L4 are skipped ONLY if no build/test command is detected (see step 1.5).
+L0 and L4 run directly via Bash — Explore agents cannot execute commands.
 
 ## Rules
 - **Never use TaskOutput** — Returns full transcripts that explode context
@@ -147,10 +223,12 @@ Scale: 1-2 agents per spec, cap 10.
 ```
 /df:verify
 
-done-upload.md: 4/4 reqs ✓, 5/5 acceptance ✓, clean
-done-auth.md: 2/2 reqs ✓, 3/3 acceptance ✓, clean
+Build: npm run build | Test: npm test
+
+done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✓ (12 tests) | 0 quality issues
+done-auth.md: L0 ✓ | 2/2 reqs ✓, 3/3 acceptance ✓ | L4 ✓ (8 tests) | 0 quality issues
 
-✓ All specs verified
+✓ All gates passed
 
 ✓ Merged df/upload to main
 ✓ Cleaned up worktree and branch
@@ -163,22 +241,29 @@ Learnings captured:
 ```
 /df:verify --doing
 
-doing-upload.md: 4/4 reqs ✓, 3/5 acceptance ✗, 1 quality issue
+Build: npm run build | Test: npm test
+
+doing-upload.md: L0 ✓ | 4/4 reqs ✓, 3/5 acceptance ✗ | L4 ✗ (3 failed) | 1 quality issue
 
 Issues:
 ✗ AC-3: YAML parsing missing for consolation
+✗ L4: 3 test failures
+  FAIL src/upload.test.ts > should validate file type
+  FAIL src/upload.test.ts > should reject oversized files
+  FAIL src/upload.test.ts > should handle empty input
 ⚠ Quality: TODO in parse_config()
 
 Fix tasks added to PLAN.md:
 T10: Add YAML parsing for consolation section
-T11: Remove TODO in parse_config()
+T11: Fix 3 failing tests in upload module
+T12: Remove TODO in parse_config()
 
 Run /df:execute --continue to fix in the same worktree.
 ```
 
 ## Post-Verification: Worktree Merge & Cleanup
 
-**Only runs when ALL specs pass verification.** If issues were found, fix tasks were added to PLAN.md instead (see step 3).
+**Only runs when ALL gates pass** (L0 build, L1-L3 static analysis, L4 tests). If any gate fails, fix tasks were added to PLAN.md instead (see step 3).
 
 ### 1. DISCOVER WORKTREE
@@ -61,3 +61,17 @@ worktree:
 
 # Keep worktree after failed execution for debugging
 cleanup_on_fail: false
+
+# Quality gates for /df:verify
+quality:
+  # Override auto-detected build command (e.g., "npm run build", "cargo build")
+  build_command: ""
+
+  # Override auto-detected test command (e.g., "npm test", "pytest", "go test ./...")
+  test_command: ""
+
+  # Test timeout in seconds (default: 300 = 5 minutes)
+  test_timeout: 300
+
+  # Retry flaky tests once before failing (default: true)
+  test_retry_on_fail: true