deepflow 0.1.45 → 0.1.47

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "deepflow",
- "version": "0.1.45",
+ "version": "0.1.47",
  "description": "Stay in flow state - lightweight spec-driven task orchestration for Claude Code",
  "keywords": [
  "claude",
@@ -99,8 +99,16 @@ task: T3
  status: success|failed
  commit: abc1234
  summary: "one line"
+ tests_ran: true|false
+ test_command: "npm test"
+ test_exit_code: 0
+ test_output_tail: |
+   PASS src/upload.test.ts
+   Tests: 12 passed, 12 total
  ```

+ New fields: `tests_ran` (bool), `test_command` (string), `test_exit_code` (int), `test_output_tail` (last 20 lines of output).
+
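Illustrating the schema above: a result file with test evidence could be rendered like this (a sketch; the field set comes from the schema, the helper name and signature are hypothetical):

```python
def render_result(task, status, commit, summary, tests_ran,
                  test_command=None, test_exit_code=None, test_output=""):
    """Render a per-task result YAML, keeping only the last 20 output lines."""
    lines = [
        f"task: {task}",
        f"status: {status}",
        f"commit: {commit}",
        f'summary: "{summary}"',
        f"tests_ran: {str(tests_ran).lower()}",
    ]
    if tests_ran:
        lines.append(f'test_command: "{test_command}"')
        lines.append(f"test_exit_code: {test_exit_code}")
        lines.append("test_output_tail: |")
        # indent tail lines so the YAML block scalar stays valid
        lines += ["  " + l for l in test_output.splitlines()[-20:]]
    return "\n".join(lines) + "\n"
```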
  **Spike result file** `.deepflow/results/{task_id}.yaml` (additional fields):
  ```yaml
  task: T1
@@ -137,8 +145,10 @@ experiment_file: ".deepflow/experiments/upload--streaming--failed.md"
  }
  ```

+ Note: `completed_tasks` is kept for backward compatibility but is now derivable from PLAN.md `[x]` entries. The native task system (TaskList) is the primary source for runtime task status.
+
  **On checkpoint:** Complete wave → update PLAN.md → save to worktree → exit.
- **Resume:** `--continue` loads checkpoint, verifies worktree, skips completed tasks.
+ **Resume:** `--continue` loads checkpoint, verifies worktree, skips completed tasks. Native tasks are re-registered for remaining `[ ]` items only.

  ## Behavior

@@ -188,6 +198,30 @@ Load: PLAN.md (required), specs/doing-*.md, .deepflow/config.yaml
  If missing: "No PLAN.md found. Run /df:plan first."
  ```

+ ### 2.5. REGISTER NATIVE TASKS
+
+ Parse PLAN.md and create native tasks for tracking, dependency management, and UI spinners.
+
+ **For each uncompleted task (`[ ]`) in PLAN.md:**
+
+ ```
+ 1. TaskCreate:
+    - subject: "{task_id}: {description}" (e.g. "T1: Create upload endpoint")
+    - description: Full task block from PLAN.md (files, blocked by, type, etc.)
+    - activeForm: "{gerund form of description}" (e.g. "Creating upload endpoint")
+
+ 2. Store mapping: PLAN.md task_id (T1) → native task ID
+ ```
+
+ **After all tasks created, set up dependencies:**
+
+ ```
+ For each task with "Blocked by: T{n}, T{m}":
+   TaskUpdate(taskId: native_id, addBlockedBy: [native_id_of_Tn, native_id_of_Tm])
+ ```
+
+ **On `--continue`:** Only create tasks for remaining `[ ]` items (skip `[x]` completed).
+
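The parsing half of this step can be sketched in Python. `TaskCreate`/`TaskUpdate` are the native task tools, so only the PLAN.md side is shown; the `- [ ] **Tn**:` / `Blocked by:` format is assumed from the examples elsewhere in this diff:

```python
import re

TASK_RE = re.compile(r"^- \[( |x)\] \*\*(T\d+)\*\*: (.+)$")
BLOCKED_RE = re.compile(r"Blocked by: (.+)")

def parse_plan(plan_text):
    """Return uncompleted tasks as {task_id: {"description": ..., "blocked_by": [...]}}."""
    tasks, current = {}, None
    for line in plan_text.splitlines():
        m = TASK_RE.match(line.strip())
        if m:
            done, task_id, desc = m.groups()
            current = None
            if done == " ":  # only register `[ ]` items, skip `[x]` completed
                current = task_id
                tasks[task_id] = {"description": desc, "blocked_by": []}
            continue
        if current:  # detail line under the current task block
            b = BLOCKED_RE.search(line)
            if b:
                tasks[current]["blocked_by"] = [t.strip() for t in b.group(1).split(",")]
    return tasks
```

Each returned entry would then drive one TaskCreate call, and `blocked_by` the follow-up TaskUpdate calls.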
  ### 3. CHECK FOR UNPLANNED SPECS

  Warn if `specs/*.md` (excluding doing-/done-) exist. Non-blocking.
@@ -244,12 +278,30 @@ Topic extraction:

  ### 5. IDENTIFY READY TASKS

- Ready = `[ ]` + all `blocked_by` complete + experiment validated (if applicable) + not in checkpoint.
+ Use TaskList to find ready tasks (replaces manual PLAN.md parsing):
+
+ ```
+ Ready = TaskList results where:
+   - status: "pending"
+   - blockedBy: empty (auto-unblocked by native dependency system)
+ ```
+
+ **Cross-check with experiment validation** (for spike-blocked tasks):
+ - If task depends on spike AND experiment not `--passed.md` → still blocked
+ - TaskUpdate to add spike as blocker if not already set
+
+ Ready = TaskList pending + empty blockedBy + experiment validated (if applicable).
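The readiness rule condenses to a small predicate. A sketch — the task dict shape (`status`, `blockedBy`, `experiment`) is an illustrative assumption, not the native tool's actual schema:

```python
def is_ready(task):
    """Pending, unblocked, and spike experiment (if any) renamed to --passed.md."""
    if task["status"] != "pending":
        return False
    if task["blockedBy"]:  # native dependency system empties this on unblock
        return False
    exp = task.get("experiment")  # e.g. "upload--streaming--passed.md"
    if exp is not None and not exp.endswith("--passed.md"):
        return False
    return True
```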

  ### 6. SPAWN AGENTS

  Context ≥50%: checkpoint and exit.

+ **Before spawning each agent**, mark its native task as in_progress:
+ ```
+ TaskUpdate(taskId: native_id, status: "in_progress")
+ ```
+ This activates the UI spinner showing the task's activeForm (e.g. "Creating upload endpoint").
+
  **CRITICAL: Spawn ALL ready tasks in a SINGLE response with MULTIPLE Task tool calls.**

  DO NOT spawn one task, wait, then spawn another. Instead, call Task tool multiple times in the SAME message block. This enables true parallelism.
@@ -319,8 +371,15 @@ Then rename experiment:

  **Gate:**
  ```
- VERIFIED_PASS → Unblock, log "✓ Spike {task_id} verified"
- VERIFIED_FAIL → Block, log "✗ Spike {task_id} failed verification"
+ VERIFIED_PASS →
+   TaskUpdate(taskId: spike_native_id, status: "completed")
+   # Native system auto-unblocks dependent tasks
+   Log "✓ Spike {task_id} verified"
+
+ VERIFIED_FAIL →
+   # Spike task stays pending, dependents remain blocked
+   # No TaskUpdate needed — native system keeps them blocked
+   Log "✗ Spike {task_id} failed verification"
  If override: log "⚠ Agent incorrectly marked as passed"
  ```

@@ -349,8 +408,18 @@ Example: To edit src/foo.ts, use:

  Do NOT write files to the main project directory.

- Implement, test, commit as feat({spec}): {description}.
- Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
+ Steps:
+ 1. Implement the task
+ 2. Detect test command: check for package.json (npm test), pyproject.toml (pytest),
+    Cargo.toml (cargo test), go.mod (go test ./...), or Makefile (make test)
+ 3. Run tests if test infrastructure exists:
+    - Run the detected test command
+    - If tests fail: fix the code and re-run until passing
+    - Do NOT commit with failing tests
+ 4. If NO test infrastructure: set tests_ran: false in result file
+ 5. Commit as feat({spec}): {description}
+ 6. Write result file with ALL fields including test evidence (see schema):
+    {worktree_absolute_path}/.deepflow/results/{task_id}.yaml

  **STOP after writing the result file. Do NOT:**
  - Merge branches or cherry-pick commits
@@ -376,6 +445,7 @@ Steps:
  3. Write experiment as --active.md (verifier determines final status)
  4. Commit: spike({spec}): validate {hypothesis}
  5. Write result to .deepflow/results/{task_id}.yaml (see spike result schema)
+ 6. If test infrastructure exists, also run tests and include evidence in result file

  Rules:
  - `met: true` ONLY if actual satisfies target
@@ -390,6 +460,12 @@ Rules:

  When a task fails and cannot be auto-fixed:

+ **Native task update:**
+ ```
+ TaskUpdate(taskId: native_id, status: "pending")  # Reset to pending, not deleted
+ ```
+ This keeps the task visible for retry. Dependent tasks remain blocked.
+
  **Behavior:**
  1. Leave worktree intact at `{worktree_path}`
  2. Keep checkpoint.json for potential resume
@@ -434,9 +510,15 @@ After spawning wave agents, your turn ENDS. Completion notifications drive the l

  **Per notification:**
  1. Read result file for the completed agent
- 2. Report ONE line: "✓ Tx: status (commit)"
- 3. If NOT all wave agents done → end turn, wait
- 4. If ALL wave agents done → check context, update PLAN.md, spawn next wave or finish
+ 2. Validate test evidence:
+    - `tests_ran: true` + `test_exit_code: 0` → trust result
+    - `tests_ran: true` + `test_exit_code: non-zero` → status MUST be failed (flag mismatch if agent said success)
+    - `tests_ran: false` + `status: success` → flag: "⚠ Tx: success but no tests ran"
+ 3. TaskUpdate(taskId: native_id, status: "completed") — auto-unblocks dependent tasks
+ 4. Update PLAN.md: `[ ]` → `[x]` + commit hash (as before)
+ 5. Report: "✓ T1: success (abc123) [12 tests passed]" or "⚠ T1: success (abc123) [no tests]"
+ 6. If NOT all wave agents done → end turn, wait
+ 7. If ALL wave agents done → use TaskList to find newly unblocked tasks, check context, spawn next wave or finish
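The evidence cross-check in step 2 amounts to three rules; a sketch, with field names taken from the result schema in this diff and return strings illustrative only:

```python
def check_evidence(result):
    """Cross-check an agent's claimed status against its test evidence."""
    tid = result["task"]
    if result.get("tests_ran"):
        if result.get("test_exit_code", 1) != 0:
            # non-zero exit code overrides whatever the agent claimed
            if result["status"] == "success":
                return "failed", f"⚠ {tid}: agent said success but tests failed"
            return "failed", f"✗ {tid}: failed"
        return result["status"], f"✓ {tid}: {result['status']}"
    if result["status"] == "success":
        return "success", f"⚠ {tid}: success but no tests ran"
    return result["status"], f"✗ {tid}: {result['status']}"
```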

  **Between waves:** Check context %. If ≥50%, checkpoint and exit.

@@ -456,18 +538,41 @@ After spawning wave agents, your turn ENDS. Completion notifications drive the l

  ```
  /df:execute (context: 12%)
- Spawning Wave 1: T1, T2, T3 parallel...
+
+ Loading PLAN.md...
+ T1: Create upload endpoint (ready)
+ T2: Add S3 service (blocked by T1)
+ T3: Add auth guard (blocked by T1)
+
+ Registering native tasks...
+ TaskCreate → T1 (native: task-001)
+ TaskCreate → T2 (native: task-002)
+ TaskCreate → T3 (native: task-003)
+ TaskUpdate(task-002, addBlockedBy: [task-001])
+ TaskUpdate(task-003, addBlockedBy: [task-001])
+
+ Spawning Wave 1: T1
+ TaskUpdate(task-001, status: "in_progress") ← spinner: "Creating upload endpoint"

  [Agent "T1" completed]
- T1: success (abc1234)
+ TaskUpdate(task-001, status: "completed") ← auto-unblocks task-002, task-003
+ ✓ T1: success (abc1234)
+
+ TaskList → task-002, task-003 now ready (blockedBy empty)
+
+ Spawning Wave 2: T2, T3 parallel
+ TaskUpdate(task-002, status: "in_progress")
+ TaskUpdate(task-003, status: "in_progress")

  [Agent "T2" completed]
- T2: success (def5678)
+ TaskUpdate(task-002, status: "completed")
+ ✓ T2: success (def5678)

  [Agent "T3" completed]
- T3: success (ghi9012)
+ TaskUpdate(task-003, status: "completed")
+ ✓ T3: success (ghi9012)

- Wave 1 complete (3/3). Context: 35%
+ Wave 2 complete (2/2). Context: 35%

  ✓ doing-upload → done-upload
  ✓ Complete: 3/3 tasks
@@ -480,27 +585,43 @@ Next: Run /df:verify to verify specs and merge to main

  ```
  /df:execute (context: 10%)

+ Loading PLAN.md...
+ Registering native tasks...
+ TaskCreate → T1 [SPIKE] (native: task-001)
+ TaskCreate → T2 (native: task-002)
+ TaskCreate → T3 (native: task-003)
+ TaskUpdate(task-002, addBlockedBy: [task-001])
+ TaskUpdate(task-003, addBlockedBy: [task-001])
+
  Checking experiment status...
  T1 [SPIKE]: No experiment yet, spike executable
  T2: Blocked by T1 (spike not validated)
  T3: Blocked by T1 (spike not validated)

- Spawning Wave 1: T1 [SPIKE]...
+ Spawning Wave 1: T1 [SPIKE]
+ TaskUpdate(task-001, status: "in_progress")

  [Agent "T1 SPIKE" completed]
  ✓ T1: complete, verifying...

  Verifying T1...
  ✓ Spike T1 verified (throughput 8500 >= 7000)
+ TaskUpdate(task-001, status: "completed") ← auto-unblocks task-002, task-003
  → upload--streaming--passed.md

- Spawning Wave 2: T2, T3 parallel...
+ TaskList → task-002, task-003 now ready
+
+ Spawning Wave 2: T2, T3 parallel
+ TaskUpdate(task-002, status: "in_progress")
+ TaskUpdate(task-003, status: "in_progress")

  [Agent "T2" completed]
- T2: success (def5678)
+ TaskUpdate(task-002, status: "completed")
+ ✓ T2: success (def5678)

  [Agent "T3" completed]
- T3: success (ghi9012)
+ TaskUpdate(task-003, status: "completed")
+ ✓ T3: success (ghi9012)

  Wave 2 complete (2/2). Context: 40%

@@ -515,11 +636,16 @@ Next: Run /df:verify to verify specs and merge to main

  ```
  /df:execute (context: 10%)

+ Registering native tasks...
+ TaskCreate → T1 [SPIKE], T2, T3 (with dependencies)
+
  Wave 1: T1 [SPIKE] (context: 15%)
+ TaskUpdate(task-001, status: "in_progress")
  T1: complete, verifying...

  Verifying T1...
  ✗ Spike T1 failed verification (throughput 1500 < 7000)
+ # Spike stays pending — dependents remain blocked
  → upload--streaming--failed.md

  ⚠ Spike T1 invalidated hypothesis
@@ -533,12 +659,17 @@ Next: Run /df:plan to generate new hypothesis spike

  ```
  /df:execute (context: 10%)

+ Registering native tasks...
+ TaskCreate → T1 [SPIKE], T2, T3 (with dependencies)
+
  Wave 1: T1 [SPIKE] (context: 15%)
+ TaskUpdate(task-001, status: "in_progress")
  T1: complete (agent said: success), verifying...

  Verifying T1...
  ✗ Spike T1 failed verification (throughput 1500 < 7000)
  ⚠ Agent incorrectly marked as passed — overriding to FAILED
+ TaskUpdate(task-001, status: "pending") ← reset, dependents stay blocked
  → upload--streaming--failed.md

  ⚠ Spike T1 invalidated hypothesis
@@ -40,35 +40,116 @@ Load:

  If no done-* specs: report counts, suggest `--doing`.

+ ### 1.5. DETECT PROJECT COMMANDS
+
+ Detect build and test commands by inspecting project files in the worktree.
+
+ **Config override always wins.** If `.deepflow/config.yaml` has `quality.test_command` or `quality.build_command`, use those.
+
+ **Auto-detection (first match wins):**
+
+ | File | Build | Test |
+ |------|-------|------|
+ | `package.json` with `scripts.build` | `npm run build` | `npm test` (if scripts.test is not default placeholder) |
+ | `pyproject.toml` or `setup.py` | — | `pytest` |
+ | `Cargo.toml` | `cargo build` | `cargo test` |
+ | `go.mod` | `go build ./...` | `go test ./...` |
+ | `Makefile` with `test` target | `make build` (if target exists) | `make test` |
+
+ **Output:**
+ - Commands found: `Build: npm run build | Test: npm test`
+ - Nothing found: `⚠ No build/test commands detected. L0/L4 skipped. Set quality.test_command in .deepflow/config.yaml`
+
  ### 2. VERIFY EACH SPEC

+ **L0: Build check** (if build command detected)
+
+ Run the build command in the worktree:
+ - Exit code 0 → L0 pass, continue to L1-L3
+ - Exit code non-zero → L0 FAIL
+   - Report: "✗ L0: Build failed" with last 30 lines of output
+   - Add fix task: "Fix build errors" to PLAN.md
+   - Do NOT proceed to L1-L4 (no point checking if code doesn't build)
+
+ **L1-L3: Static analysis** (via Explore agents)
+
  Check requirements, acceptance criteria, and quality (stubs/TODOs).
  Mark each: ✓ satisfied | ✗ missing | ⚠ partial

+ **L4: Test execution** (if test command detected)
+
+ Run AFTER L0 passes and L1-L3 complete. Run even if L1-L3 found issues — test failures reveal additional problems.
+
+ - Run test command in the worktree (timeout from config, default 5 min)
+ - Exit code 0 → L4 pass
+ - Exit code non-zero → L4 FAIL
+ - Capture last 50 lines of output
+ - Report: "✗ L4: Tests failed (N of M)" with relevant output
+ - Add fix task: "Fix failing tests" with test output in description
+
+ **Flaky test handling** (if `quality.test_retry_on_fail: true` in config):
+ - If tests fail, re-run ONCE
+ - Second run passes → L4 pass with note: "⚠ L4: Passed on retry (possible flaky test)"
+ - Second run fails → genuine failure
+
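The retry-once rule can be captured with a runner callable (a sketch; in practice `run` would wrap the detected test command via Bash with the configured timeout):

```python
def run_with_retry(run, retry_on_fail=True):
    """run() returns a test exit code; re-run once on failure if configured.

    Returns (passed, note)."""
    if run() == 0:
        return True, None
    if not retry_on_fail:
        return False, None
    # one retry: a pass here is flagged as possibly flaky
    if run() == 0:
        return True, "⚠ L4: Passed on retry (possible flaky test)"
    return False, None
```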
  ### 3. GENERATE REPORT

- Report per spec: requirements count, acceptance count, quality issues.
+ Report per spec with L0/L4 status, requirements count, acceptance count, quality issues.

- **If all pass:** Proceed to Post-Verification merge.
+ **Format on success:**
+ ```
+ done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✓ (12 tests) | 0 quality issues
+ ```

- **If issues found:** Add fix tasks to PLAN.md in the worktree and loop back to execute:
+ **Format on failure:**
+ ```
+ done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✗ (3 failed) | 0 quality issues
+
+ Issues:
+ ✗ L4: 3 test failures
+   FAIL src/upload.test.ts > should validate file type
+   FAIL src/upload.test.ts > should reject oversized files
+
+ Fix tasks added to PLAN.md:
+ T10: Fix 3 failing tests in upload module
+ ```
+
+ **Gate conditions (ALL must pass to merge):**
+ - L0: Build passes (or no build command detected)
+ - L1-L3: All requirements satisfied, no stubs, properly wired
+ - L4: Tests pass (or no test command detected)
+
+ **If all gates pass:** Proceed to Post-Verification merge.
+
+ **If issues found:** Add fix tasks to PLAN.md in the worktree and register as native tasks, then loop back to execute:

  1. Discover worktree (same logic as Post-Verification step 1)
  2. Write new fix tasks to `{worktree_path}/PLAN.md` under the existing spec section
     - Task IDs continue from last (e.g. if T9 was last, fixes start at T10)
     - Format: `- [ ] **T10**: Fix {description}` with `Files:` and details
- 3. Output report + next step:
+ 3. Register fix tasks as native tasks for immediate tracking:
+    ```
+    For each fix task added:
+      TaskCreate(subject: "T10: Fix {description}", description: "...", activeForm: "Fixing {description}")
+      TaskUpdate(addBlockedBy: [...]) if dependencies exist
+    ```
+    This allows `/df:execute --continue` to find fix tasks via TaskList immediately.
+ 4. Output report + next step:

  ```
- done-upload.md: 4/4 reqs ✓, 3/5 acceptance ✗, 1 quality issue
+ done-upload.md: L0 ✓ | 4/4 reqs ✓, 3/5 acceptance ✗ | L4 ✗ (2 failed) | 1 quality issue

  Issues:
  ✗ AC-3: YAML parsing missing for consolation
+ ✗ L4: 2 test failures
+   FAIL src/upload.test.ts > should validate file type
+   FAIL src/upload.test.ts > should reject oversized files
  ⚠ Quality: TODO in parse_config()

  Fix tasks added to PLAN.md:
  T10: Add YAML parsing for consolation section
- T11: Remove TODO in parse_config()
+ T11: Fix 2 failing tests in upload module
+ T12: Remove TODO in parse_config()

  Run /df:execute --continue to fix in the same worktree.
  ```
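The "IDs continue from last" rule (T9 was last → fixes start at T10) amounts to a max-scan over PLAN.md; a sketch with a hypothetical helper name:

```python
import re

def next_task_id(plan_text):
    """Find the highest T{n} in PLAN.md and return the next free ID (T1 if none)."""
    ids = [int(m) for m in re.findall(r"\*\*T(\d+)\*\*", plan_text)]
    return f"T{max(ids) + 1}" if ids else "T1"
```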
@@ -98,14 +179,16 @@ Files: ...

  ## Verification Levels

- | Level | Check | Method |
- |-------|-------|--------|
- | L1: Exists | File/function exists | Glob/Grep |
- | L2: Substantive | Real code, not stub | Read + analyze |
- | L3: Wired | Integrated into system | Trace imports/calls |
- | L4: Tested | Has passing tests | Run tests |
+ | Level | Check | Method | Runner |
+ |-------|-------|--------|--------|
+ | L0: Builds | Code compiles/builds | Run build command | Orchestrator (Bash) |
+ | L1: Exists | File/function exists | Glob/Grep | Explore agents |
+ | L2: Substantive | Real code, not stub | Read + analyze | Explore agents |
+ | L3: Wired | Integrated into system | Trace imports/calls | Explore agents |
+ | L4: Tested | Tests pass | Run test command | Orchestrator (Bash) |

- Default: L1-L3 (L4 optional, can be slow)
+ **Default: L0 through L4.** L0 and L4 are skipped ONLY if no build/test command is detected (see step 1.5).
+ L0 and L4 run directly via Bash — Explore agents cannot execute commands.

  ## Rules
  - **Never use TaskOutput** — Returns full transcripts that explode context
@@ -140,10 +223,12 @@ Scale: 1-2 agents per spec, cap 10.
  ```
  /df:verify

- done-upload.md: 4/4 reqs ✓, 5/5 acceptance ✓, clean
- done-auth.md: 2/2 reqs ✓, 3/3 acceptance ✓, clean
+ Build: npm run build | Test: npm test
+
+ done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✓ (12 tests) | 0 quality issues
+ done-auth.md: L0 ✓ | 2/2 reqs ✓, 3/3 acceptance ✓ | L4 ✓ (8 tests) | 0 quality issues

- ✓ All specs verified
+ ✓ All gates passed

  ✓ Merged df/upload to main
  ✓ Cleaned up worktree and branch
@@ -156,22 +241,29 @@ Learnings captured:
  ```
  /df:verify --doing

- doing-upload.md: 4/4 reqs ✓, 3/5 acceptance ✗, 1 quality issue
+ Build: npm run build | Test: npm test
+
+ doing-upload.md: L0 ✓ | 4/4 reqs ✓, 3/5 acceptance ✗ | L4 ✗ (3 failed) | 1 quality issue

  Issues:
  ✗ AC-3: YAML parsing missing for consolation
+ ✗ L4: 3 test failures
+   FAIL src/upload.test.ts > should validate file type
+   FAIL src/upload.test.ts > should reject oversized files
+   FAIL src/upload.test.ts > should handle empty input
  ⚠ Quality: TODO in parse_config()

  Fix tasks added to PLAN.md:
  T10: Add YAML parsing for consolation section
- T11: Remove TODO in parse_config()
+ T11: Fix 3 failing tests in upload module
+ T12: Remove TODO in parse_config()

  Run /df:execute --continue to fix in the same worktree.
  ```

  ## Post-Verification: Worktree Merge & Cleanup

- **Only runs when ALL specs pass verification.** If issues were found, fix tasks were added to PLAN.md instead (see step 3).
+ **Only runs when ALL gates pass** (L0 build, L1-L3 static analysis, L4 tests). If any gate fails, fix tasks were added to PLAN.md instead (see step 3).

  ### 1. DISCOVER WORKTREE

@@ -61,3 +61,17 @@ worktree:

  # Keep worktree after failed execution for debugging
  cleanup_on_fail: false
+
+ # Quality gates for /df:verify
+ quality:
+   # Override auto-detected build command (e.g., "npm run build", "cargo build")
+   build_command: ""
+
+   # Override auto-detected test command (e.g., "npm test", "pytest", "go test ./...")
+   test_command: ""
+
+   # Test timeout in seconds (default: 300 = 5 minutes)
+   test_timeout: 300
+
+   # Retry flaky tests once before failing (default: true)
+   test_retry_on_fail: true