deepflow 0.1.46 → 0.1.47

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "deepflow",
-  "version": "0.1.46",
+  "version": "0.1.47",
   "description": "Stay in flow state - lightweight spec-driven task orchestration for Claude Code",
   "keywords": [
     "claude",
@@ -99,8 +99,16 @@ task: T3
 status: success|failed
 commit: abc1234
 summary: "one line"
+tests_ran: true|false
+test_command: "npm test"
+test_exit_code: 0
+test_output_tail: |
+  PASS src/upload.test.ts
+  Tests: 12 passed, 12 total
 ```
 
+New fields: `tests_ran` (bool), `test_command` (string), `test_exit_code` (int), `test_output_tail` (last 20 lines of output).
+
 **Spike result file** `.deepflow/results/{task_id}.yaml` (additional fields):
 ```yaml
 task: T1
@@ -400,8 +408,18 @@ Example: To edit src/foo.ts, use:
 
 Do NOT write files to the main project directory.
 
-Implement, test, commit as feat({spec}): {description}.
-Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
+Steps:
+1. Implement the task
+2. Detect test command: check for package.json (npm test), pyproject.toml (pytest),
+   Cargo.toml (cargo test), go.mod (go test ./...), or Makefile (make test)
+3. Run tests if test infrastructure exists:
+   - Run the detected test command
+   - If tests fail: fix the code and re-run until passing
+   - Do NOT commit with failing tests
+4. If NO test infrastructure: set tests_ran: false in result file
+5. Commit as feat({spec}): {description}
+6. Write result file with ALL fields including test evidence (see schema):
+   {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
 
 **STOP after writing the result file. Do NOT:**
 - Merge branches or cherry-pick commits
@@ -427,6 +445,7 @@ Steps:
 3. Write experiment as --active.md (verifier determines final status)
 4. Commit: spike({spec}): validate {hypothesis}
 5. Write result to .deepflow/results/{task_id}.yaml (see spike result schema)
+6. If test infrastructure exists, also run tests and include evidence in result file
 
 Rules:
 - `met: true` ONLY if actual satisfies target
@@ -491,11 +510,15 @@ After spawning wave agents, your turn ENDS. Completion notifications drive the loop.
 
 **Per notification:**
 1. Read result file for the completed agent
-2. TaskUpdate(taskId: native_id, status: "completed") — auto-unblocks dependent tasks
-3. Update PLAN.md: `[ ]` → `[x]` + commit hash (as before)
-4. Report ONE line: "✓ Tx: status (commit)"
-5. If NOT all wave agents done → end turn, wait
-6. If ALL wave agents done → use TaskList to find newly unblocked tasks, check context, spawn next wave or finish
+2. Validate test evidence:
+   - `tests_ran: true` + `test_exit_code: 0` → trust result
+   - `tests_ran: true` + `test_exit_code: non-zero` → status MUST be failed (flag mismatch if agent said success)
+   - `tests_ran: false` + `status: success` → flag: "⚠ Tx: success but no tests ran"
+3. TaskUpdate(taskId: native_id, status: "completed") — auto-unblocks dependent tasks
+4. Update PLAN.md: `[ ]` → `[x]` + commit hash (as before)
+5. Report: "✓ T1: success (abc123) [12 tests passed]" or "⚠ T1: success (abc123) [no tests]"
+6. If NOT all wave agents done → end turn, wait
+7. If ALL wave agents done → use TaskList to find newly unblocked tasks, check context, spawn next wave or finish
 
 **Between waves:** Check context %. If ≥50%, checkpoint and exit.
 
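The evidence-validation rules above reduce to a small decision table. A hedged sketch — the function name and return shape are illustrative, not the orchestrator's actual API:

```python
def validate_evidence(result: dict) -> tuple[str, str]:
    """Apply the per-notification evidence rules to a parsed result file.

    Returns (final_status, report_line). Hypothetical helper for illustration.
    """
    task = result.get("task", "T?")
    status = result.get("status", "failed")
    if result.get("tests_ran"):
        if result.get("test_exit_code") == 0:
            return status, f"✓ {task}: {status}"  # evidence supports the claim
        # non-zero exit code: override an over-optimistic "success"
        if status == "success":
            return "failed", f"✗ {task}: claimed success but tests failed"
        return "failed", f"✗ {task}: failed"
    if status == "success":
        return status, f"⚠ {task}: success but no tests ran"
    return status, f"✗ {task}: {status}"
```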
@@ -40,16 +40,86 @@ Load:
 
 If no done-* specs: report counts, suggest `--doing`.
 
+### 1.5. DETECT PROJECT COMMANDS
+
+Detect build and test commands by inspecting project files in the worktree.
+
+**Config override always wins.** If `.deepflow/config.yaml` has `quality.test_command` or `quality.build_command`, use those.
+
+**Auto-detection (first match wins):**
+
+| File | Build | Test |
+|------|-------|------|
+| `package.json` with `scripts.build` | `npm run build` | `npm test` (if `scripts.test` is not the default placeholder) |
+| `pyproject.toml` or `setup.py` | — | `pytest` |
+| `Cargo.toml` | `cargo build` | `cargo test` |
+| `go.mod` | `go build ./...` | `go test ./...` |
+| `Makefile` with `test` target | `make build` (if target exists) | `make test` |
+
+**Output:**
+- Commands found: `Build: npm run build | Test: npm test`
+- Nothing found: `⚠ No build/test commands detected. L0/L4 skipped. Set quality.test_command in .deepflow/config.yaml`
+
 ### 2. VERIFY EACH SPEC
 
+**L0: Build check** (if build command detected)
+
+Run the build command in the worktree:
+- Exit code 0 → L0 pass, continue to L1-L3
+- Exit code non-zero → L0 FAIL
+  - Report: "✗ L0: Build failed" with last 30 lines of output
+  - Add fix task: "Fix build errors" to PLAN.md
+  - Do NOT proceed to L1-L4 (no point checking if code doesn't build)
+
+**L1-L3: Static analysis** (via Explore agents)
+
 Check requirements, acceptance criteria, and quality (stubs/TODOs).
 Mark each: ✓ satisfied | ✗ missing | ⚠ partial
 
+**L4: Test execution** (if test command detected)
+
+Run AFTER L0 passes and L1-L3 complete. Run even if L1-L3 found issues — test failures reveal additional problems.
+
+- Run test command in the worktree (timeout from config, default 5 min)
+- Exit code 0 → L4 pass
+- Exit code non-zero → L4 FAIL
+  - Capture last 50 lines of output
+  - Report: "✗ L4: Tests failed (N of M)" with relevant output
+  - Add fix task: "Fix failing tests" with test output in description
+
+**Flaky test handling** (if `quality.test_retry_on_fail: true` in config):
+- If tests fail, re-run ONCE
+- Second run passes → L4 pass with note: "⚠ L4: Passed on retry (possible flaky test)"
+- Second run fails → genuine failure
+
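The run-with-timeout and single-retry behaviour described above can be sketched as follows; names and the return shape are illustrative, not the actual orchestrator code:

```python
import subprocess

def run_l4(test_command: str, timeout: int = 300,
           retry_on_fail: bool = True) -> tuple[bool, bool, str]:
    """Run the L4 gate. Returns (passed, passed_on_retry, output_tail)."""
    def run_once() -> tuple[int, str]:
        try:
            proc = subprocess.run(test_command, shell=True, capture_output=True,
                                  text=True, timeout=timeout)
            out = proc.stdout + proc.stderr
            return proc.returncode, "\n".join(out.splitlines()[-50:])
        except subprocess.TimeoutExpired:
            return 124, f"timed out after {timeout}s"

    code, tail = run_once()
    if code == 0:
        return True, False, tail
    if retry_on_fail:
        # one retry catches flaky tests; a second failure is genuine
        code, tail = run_once()
        if code == 0:
            return True, True, tail  # report "⚠ L4: Passed on retry"
    return False, False, tail
```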
 ### 3. GENERATE REPORT
 
-Report per spec: requirements count, acceptance count, quality issues.
+Report per spec with L0/L4 status, requirements count, acceptance count, quality issues.
 
-**If all pass:** Proceed to Post-Verification merge.
+**Format on success:**
+```
+done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✓ (12 tests) | 0 quality issues
+```
+
+**Format on failure:**
+```
+done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✗ (3 failed) | 0 quality issues
+
+Issues:
+✗ L4: 3 test failures
+  FAIL src/upload.test.ts > should validate file type
+  FAIL src/upload.test.ts > should reject oversized files
+
+Fix tasks added to PLAN.md:
+T10: Fix 3 failing tests in upload module
+```
+
+**Gate conditions (ALL must pass to merge):**
+- L0: Build passes (or no build command detected)
+- L1-L3: All requirements satisfied, no stubs, properly wired
+- L4: Tests pass (or no test command detected)
+
+**If all gates pass:** Proceed to Post-Verification merge.
 
 **If issues found:** Add fix tasks to PLAN.md in the worktree and register as native tasks, then loop back to execute:
 
@@ -67,15 +137,19 @@ Report per spec: requirements count, acceptance count, quality issues.
 4. Output report + next step:
 
 ```
-done-upload.md: 4/4 reqs ✓, 3/5 acceptance ✗, 1 quality issue
+done-upload.md: L0 ✓ | 4/4 reqs ✓, 3/5 acceptance ✗ | L4 ✗ (2 failed) | 1 quality issue
 
 Issues:
 ✗ AC-3: YAML parsing missing for consolation
+✗ L4: 2 test failures
+  FAIL src/upload.test.ts > should validate file type
+  FAIL src/upload.test.ts > should reject oversized files
 ⚠ Quality: TODO in parse_config()
 
 Fix tasks added to PLAN.md:
 T10: Add YAML parsing for consolation section
-T11: Remove TODO in parse_config()
+T11: Fix 2 failing tests in upload module
+T12: Remove TODO in parse_config()
 
 Run /df:execute --continue to fix in the same worktree.
 ```
@@ -105,14 +179,16 @@ Files: ...
 
 ## Verification Levels
 
-| Level | Check | Method |
-|-------|-------|--------|
-| L1: Exists | File/function exists | Glob/Grep |
-| L2: Substantive | Real code, not stub | Read + analyze |
-| L3: Wired | Integrated into system | Trace imports/calls |
-| L4: Tested | Has passing tests | Run tests |
+| Level | Check | Method | Runner |
+|-------|-------|--------|--------|
+| L0: Builds | Code compiles/builds | Run build command | Orchestrator (Bash) |
+| L1: Exists | File/function exists | Glob/Grep | Explore agents |
+| L2: Substantive | Real code, not stub | Read + analyze | Explore agents |
+| L3: Wired | Integrated into system | Trace imports/calls | Explore agents |
+| L4: Tested | Tests pass | Run test command | Orchestrator (Bash) |
 
-Default: L1-L3 (L4 optional, can be slow)
+**Default: L0 through L4.** L0 and L4 are skipped ONLY if no build/test command is detected (see step 1.5).
+L0 and L4 run directly via Bash — Explore agents cannot execute commands.
 
 ## Rules
 - **Never use TaskOutput** — Returns full transcripts that explode context
@@ -147,10 +223,12 @@ Scale: 1-2 agents per spec, cap 10.
 ```
 /df:verify
 
-done-upload.md: 4/4 reqs ✓, 5/5 acceptance ✓, clean
-done-auth.md: 2/2 reqs ✓, 3/3 acceptance ✓, clean
+Build: npm run build | Test: npm test
+
+done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✓ (12 tests) | 0 quality issues
+done-auth.md: L0 ✓ | 2/2 reqs ✓, 3/3 acceptance ✓ | L4 ✓ (8 tests) | 0 quality issues
 
-✓ All specs verified
+✓ All gates passed
 
 ✓ Merged df/upload to main
 ✓ Cleaned up worktree and branch
@@ -163,22 +241,29 @@ Learnings captured:
 ```
 /df:verify --doing
 
-doing-upload.md: 4/4 reqs ✓, 3/5 acceptance ✗, 1 quality issue
+Build: npm run build | Test: npm test
+
+doing-upload.md: L0 ✓ | 4/4 reqs ✓, 3/5 acceptance ✗ | L4 ✗ (3 failed) | 1 quality issue
 
 Issues:
 ✗ AC-3: YAML parsing missing for consolation
+✗ L4: 3 test failures
+  FAIL src/upload.test.ts > should validate file type
+  FAIL src/upload.test.ts > should reject oversized files
+  FAIL src/upload.test.ts > should handle empty input
 ⚠ Quality: TODO in parse_config()
 
 Fix tasks added to PLAN.md:
 T10: Add YAML parsing for consolation section
-T11: Remove TODO in parse_config()
+T11: Fix 3 failing tests in upload module
+T12: Remove TODO in parse_config()
 
 Run /df:execute --continue to fix in the same worktree.
 ```
 
 ## Post-Verification: Worktree Merge & Cleanup
 
-**Only runs when ALL specs pass verification.** If issues were found, fix tasks were added to PLAN.md instead (see step 3).
+**Only runs when ALL gates pass** (L0 build, L1-L3 static analysis, L4 tests). If any gate fails, fix tasks were added to PLAN.md instead (see step 3).
 
 ### 1. DISCOVER WORKTREE
@@ -61,3 +61,17 @@ worktree:
 
 # Keep worktree after failed execution for debugging
 cleanup_on_fail: false
+
+# Quality gates for /df:verify
+quality:
+  # Override auto-detected build command (e.g., "npm run build", "cargo build")
+  build_command: ""
+
+  # Override auto-detected test command (e.g., "npm test", "pytest", "go test ./...")
+  test_command: ""
+
+  # Test timeout in seconds (default: 300 = 5 minutes)
+  test_timeout: 300
+
+  # Retry flaky tests once before failing (default: true)
+  test_retry_on_fail: true