deepflow 0.1.46 → 0.1.48

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "deepflow",
- "version": "0.1.46",
+ "version": "0.1.48",
  "description": "Stay in flow state - lightweight spec-driven task orchestration for Claude Code",
  "keywords": [
  "claude",
@@ -4,7 +4,7 @@

  You coordinate reasoner agents to debate a problem from multiple perspectives, then synthesize their arguments into a structured document.

- **NEVER:** Read source files, use Glob/Grep directly, run git, use TaskOutput, use `run_in_background`, use Explore agents
+ **NEVER:** Read source files, use Glob/Grep directly, run git, use TaskOutput, use `run_in_background`, use Explore agents, use EnterPlanMode, use ExitPlanMode

  **ONLY:** Spawn reasoner agents (non-background), write debate file, respond conversationally

@@ -233,6 +233,17 @@ Open decisions:
  Next: Run /df:spec {name} to formalize into a specification
  ```

+ ### 6. CAPTURE DECISIONS
+
+ Extract up to 4 candidates from consensus/resolved tensions. Ask user via `AskUserQuestion(multiSelect=True)` with options like `{ label: "[APPROACH] {decision}", description: "{rationale}" }`.
+
+ For confirmed decisions, append to `.deepflow/decisions.md` (create if absent) using format:
+ ```
+ ### {YYYY-MM-DD} — debate
+ - [{TAG}] {decision text} — {rationale}
+ ```
+ Tags: [APPROACH] directional choices · [PROVISIONAL] tentative · [ASSUMPTION] unverified premises. If a new decision contradicts an existing one, note the conflict inline.
+
  ---

  ## Rules
@@ -4,7 +4,7 @@

  You are a Socratic questioner. Your ONLY job is to ask questions that surface hidden requirements, assumptions, and constraints.

- **NEVER:** Read source files, use Glob/Grep, spawn agents, create files, run git, use TaskOutput, use Task tool
+ **NEVER:** Read source files, use Glob/Grep, spawn agents, create files (except `.deepflow/decisions.md`), run git, use TaskOutput, use Task tool, use EnterPlanMode, use ExitPlanMode

  **ONLY:** Ask questions using `AskUserQuestion` tool, respond conversationally

@@ -90,6 +90,22 @@ Example questions:
  - Keep your responses short between questions — don't lecture
  - Acknowledge answers briefly before asking the next question

+ ### Decision Capture
+ When the user signals they are ready to move on, before presenting next-step options, extract up to 4 candidate decisions from the session (meaningful choices about approach, scope, or constraints). Present via `AskUserQuestion` with `multiSelect: true`, e.g.:
+
+ ```json
+ {"questions": [{"question": "Which decisions should be recorded?", "header": "Decisions", "multiSelect": true,
+ "options": [{"label": "[APPROACH] Use event sourcing", "description": "Matches audit requirements"}]}]}
+ ```
+
+ For each confirmed decision, append to `.deepflow/decisions.md` (create if missing):
+ ```
+ ### {YYYY-MM-DD} — discover
+ - [APPROACH] Decision text — rationale
+ ```
+
+ Tags: `[APPROACH]` firm choice · `[PROVISIONAL]` revisit later · `[ASSUMPTION]` unverified belief.
+
  ### When the User Wants to Move On
  When the user signals they want to advance (e.g., "I think that's enough", "let's move on", "ready for next step"):

@@ -4,9 +4,9 @@

  You are a coordinator. Spawn agents, wait for results, update PLAN.md. Never implement code yourself.

- **NEVER:** Read source files, edit code, run tests, run git commands (except status), use TaskOutput
+ **NEVER:** Read source files, edit code, run tests, run git commands (except status), use TaskOutput, use EnterPlanMode, use ExitPlanMode

- **ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, read `.deepflow/results/*.yaml` on completion notifications, update PLAN.md
+ **ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, read `.deepflow/results/*.yaml` on completion notifications, update PLAN.md, write `.deepflow/decisions.md` in the main tree

  ---

@@ -99,8 +99,16 @@ task: T3
  status: success|failed
  commit: abc1234
  summary: "one line"
+ tests_ran: true|false
+ test_command: "npm test"
+ test_exit_code: 0
+ test_output_tail: |
+   PASS src/upload.test.ts
+   Tests: 12 passed, 12 total
  ```

+ New fields: `tests_ran` (bool), `test_command` (string), `test_exit_code` (int), `test_output_tail` (last 20 lines of output).
+
  **Spike result file** `.deepflow/results/{task_id}.yaml` (additional fields):
  ```yaml
  task: T1
@@ -400,8 +408,18 @@ Example: To edit src/foo.ts, use:

  Do NOT write files to the main project directory.

- Implement, test, commit as feat({spec}): {description}.
- Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
+ Steps:
+ 1. Implement the task
+ 2. Detect test command: check for package.json (npm test), pyproject.toml (pytest),
+    Cargo.toml (cargo test), go.mod (go test ./...), or Makefile (make test)
+ 3. Run tests if test infrastructure exists:
+    - Run the detected test command
+    - If tests fail: fix the code and re-run until passing
+    - Do NOT commit with failing tests
+ 4. If NO test infrastructure: set tests_ran: false in result file
+ 5. Commit as feat({spec}): {description}
+ 6. Write result file with ALL fields including test evidence (see schema):
+    {worktree_absolute_path}/.deepflow/results/{task_id}.yaml

  **STOP after writing the result file. Do NOT:**
  - Merge branches or cherry-pick commits
@@ -427,6 +445,7 @@ Steps:
  3. Write experiment as --active.md (verifier determines final status)
  4. Commit: spike({spec}): validate {hypothesis}
  5. Write result to .deepflow/results/{task_id}.yaml (see spike result schema)
+ 6. If test infrastructure exists, also run tests and include evidence in result file

  Rules:
  - `met: true` ONLY if actual satisfies target
@@ -491,16 +510,36 @@ After spawning wave agents, your turn ENDS. Completion notifications drive the l

  **Per notification:**
  1. Read result file for the completed agent
- 2. TaskUpdate(taskId: native_id, status: "completed") — auto-unblocks dependent tasks
- 3. Update PLAN.md: `[ ]` → `[x]` + commit hash (as before)
- 4. Report ONE line: "✓ Tx: status (commit)"
- 5. If NOT all wave agents done → end turn, wait
- 6. If ALL wave agents done → use TaskList to find newly unblocked tasks, check context, spawn next wave or finish
+ 2. Validate test evidence:
+    - `tests_ran: true` + `test_exit_code: 0` → trust result
+    - `tests_ran: true` + `test_exit_code: non-zero` → status MUST be failed (flag mismatch if agent said success)
+    - `tests_ran: false` + `status: success` → flag: "⚠ Tx: success but no tests ran"
+ 3. TaskUpdate(taskId: native_id, status: "completed") — auto-unblocks dependent tasks
+ 4. Update PLAN.md: `[ ]` → `[x]` + commit hash (as before)
+ 5. Report: "✓ T1: success (abc123) [12 tests passed]" or "⚠ T1: success (abc123) [no tests]"
+ 6. If NOT all wave agents done → end turn, wait
+ 7. If ALL wave agents done → use TaskList to find newly unblocked tasks, check context, spawn next wave or finish

  **Between waves:** Check context %. If ≥50%, checkpoint and exit.

  **Repeat** until: all done, all blocked, or context ≥50% (checkpoint).

+ ### 11. CAPTURE DECISIONS
+
+ After all tasks complete (or all blocked), extract up to 4 candidate decisions from the session (implementation patterns, deviations from plan, key assumptions made).
+
+ Present via AskUserQuestion with multiSelect: true. Labels: `[TAG] decision text`. Descriptions: rationale.
+
+ For each confirmed decision, append to **main tree** `.deepflow/decisions.md` (create if missing):
+ ```
+ ### {YYYY-MM-DD} — execute
+ - [APPROACH] Parallel agent spawn for independent tasks — confirmed no file conflicts
+ ```
+
+ Main tree path: use the repo root (parent of `.deepflow/worktrees/`), NOT the worktree.
+
+ Max 4 candidates per prompt. Tags: [APPROACH], [PROVISIONAL], [ASSUMPTION].
+
  ## Rules

  | Rule | Detail |
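The evidence-validation rules in step 2 above amount to a small classifier over the result-file fields. A sketch under the assumption that the YAML has already been parsed into a dict; the return labels are illustrative, not part of the schema:

```python
def classify_result(result: dict) -> str:
    """Map (tests_ran, test_exit_code, status) to how the orchestrator reports it."""
    tests_ran = result.get("tests_ran", False)
    exit_code = result.get("test_exit_code")
    status = result.get("status")
    if tests_ran and exit_code == 0:
        return "trusted"                  # tests ran and passed: trust the result
    if tests_ran:
        # Non-zero exit: status MUST be failed; flag if the agent claimed success.
        return "mismatch" if status == "success" else "failed"
    if status == "success":
        return "success-no-tests"         # "⚠ Tx: success but no tests ran"
    return "failed"
```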
@@ -0,0 +1,206 @@
+ # /df:note — Capture Decisions from Free Conversations
+
+ ## Orchestrator Role
+
+ You scan prior conversation context for candidate decisions, present them for user confirmation, and persist confirmed decisions to `.deepflow/decisions.md`.
+
+ **NEVER:** Spawn agents, use Task tool, use Glob/Grep on source code, run git, use TaskOutput, use EnterPlanMode, use ExitPlanMode
+
+ **ONLY:** Read `.deepflow/decisions.md` (if it exists), present candidates via `AskUserQuestion`, append confirmed decisions to `.deepflow/decisions.md`
+
+ ---
+
+ ## Purpose
+
+ Capture decisions that emerged during free conversations outside of deepflow commands. Surfaces candidate decisions from the current conversation, lets the user confirm or discard each, and persists confirmed ones to the shared decisions log.
+
+ ## Usage
+
+ ```
+ /df:note
+ ```
+
+ No arguments required. Operates on the current conversation context.
+
+ ---
+
+ ## Behavior
+
+ ### 1. EXTRACT CANDIDATES
+
+ Scan the prior conversation messages for candidate decisions. A decision is any resolved choice, adopted approach, or stated assumption that affects how the work is done. Look for:
+
+ - **Approaches chosen**: "we'll use X instead of Y", "let's go with X"
+ - **Provisional choices**: "for now we'll use X", "assuming X until we know more"
+ - **Stated assumptions**: "assuming X is true", "treating X as given"
+ - **Constraints accepted**: "we won't do X", "X is out of scope"
+ - **Naming or structural choices**: "we'll call it X", "X goes in the Y layer"
+
+ Extract **at most 4 candidates** from the conversation. Prioritize the most consequential or recent ones.
+
+ For each candidate, determine:
+ - **Tag**: one of `[APPROACH]`, `[PROVISIONAL]`, or `[ASSUMPTION]`
+   - `[APPROACH]` — a deliberate design or implementation choice
+   - `[PROVISIONAL]` — works for now, expected to revisit
+   - `[ASSUMPTION]` — treating something as true without full validation
+ - **Decision text**: one concise line describing the choice
+ - **Rationale**: one sentence explaining why this was chosen
+
+ If fewer than 2 clear candidates are found, say so briefly and exit without calling `AskUserQuestion`.
+
+ ### 2. CHECK FOR CONTRADICTIONS
+
+ Read `.deepflow/decisions.md` if it exists. For each candidate, check whether it contradicts a prior entry in the file.
+
+ If a contradiction is found:
+ - Keep the prior entry — never delete or modify it
+ - Amend the candidate's rationale to reference the prior decision: `was "X", now "Y" because Z`
+
+ ### 3. PRESENT VIA AskUserQuestion
+
+ Present candidates as a multi-select question with at most 4 options (tool limit).
+
+ ```json
+ {
+   "questions": [
+     {
+       "question": "These decisions were detected in your conversation. Which should be saved to .deepflow/decisions.md?",
+       "header": "Save notes?",
+       "multiSelect": true,
+       "options": [
+         {
+           "label": "[APPROACH] <decision text>",
+           "description": "<rationale>"
+         },
+         {
+           "label": "[PROVISIONAL] <decision text>",
+           "description": "<rationale>"
+         }
+       ]
+     }
+   ]
+ }
+ ```
+
+ Each option's `label` is the tag + decision text. Each `description` is the rationale (one sentence).
+
+ ### 4. APPEND CONFIRMED DECISIONS
+
+ For each option the user selects:
+
+ 1. If `.deepflow/decisions.md` does not exist, create it with just the header:
+    ```
+    # Decisions
+    ```
+
+ 2. Append a new dated section using today's date in `YYYY-MM-DD` format and source `note`:
+
+    ```markdown
+    ### 2026-02-22 — note
+    - [APPROACH] Use event sourcing over CRUD — append-only log matches audit requirements
+    - [PROVISIONAL] Batch size = 50 — works for 4-game dataset, revisit at scale
+    ```
+
+ 3. If multiple decisions are confirmed in one invocation, group them under a single dated section.
+
+ 4. Never modify or delete any prior entries.
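The append logic above can be sketched in a few lines. `append_decisions` is a hypothetical helper (the command itself performs these steps via file tools); the on-disk format follows the steps above:

```python
from datetime import date
from pathlib import Path

def append_decisions(path: Path, source: str,
                     decisions: list[tuple[str, str, str]]) -> None:
    """Append one dated section of (tag, text, rationale) rows; prior entries untouched."""
    if not path.exists():
        path.write_text("# Decisions\n")   # first use initializes the header
    section = [f"\n### {date.today():%Y-%m-%d} — {source}\n"]
    section += [f"- [{tag}] {text} — {why}\n" for tag, text, why in decisions]
    with path.open("a") as f:              # append-only, per rule 4
        f.writelines(section)
```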
+
+ ### 5. CONFIRM
+
+ After writing, report to the user:
+
+ ```
+ Saved N decision(s) to .deepflow/decisions.md
+ ```
+
+ If the user selected nothing, respond:
+
+ ```
+ No decisions saved.
+ ```
+
+ ---
+
+ ## Decision Format
+
+ ```
+ ### YYYY-MM-DD — note
+ - [TAG] Decision text — rationale
+ ```
+
+ **Tags:**
+ - `[APPROACH]` — deliberate design or implementation choice
+ - `[PROVISIONAL]` — works for now, will revisit at scale or with more information
+ - `[ASSUMPTION]` — treating something as true without full confirmation
+
+ **Contradiction handling:** Never delete prior entries. When a new decision contradicts an older one, include a reference in the rationale: `was "X", now "Y" because Z`.
+
+ ---
+
+ ## Rules
+
+ - **Maximum 4 candidates** per invocation (tool limit for AskUserQuestion options)
+ - **multiSelect: true** — user can confirm any subset, including all or none
+ - **Never invent decisions** — only extract what was genuinely discussed and resolved in the conversation
+ - **Never modify prior entries** in `.deepflow/decisions.md`
+ - **Create the file if absent** — first use initializes it with `# Decisions` header
+ - **Source is always `note`** — this distinguishes from decisions captured by other df commands
+ - **Date is today's date** — use the current date in `YYYY-MM-DD` format
+ - **One AskUserQuestion call** — present all candidates in a single call, not sequentially
+
+ ---
+
+ ## Example
+
+ ```
+ USER: /df:note
+
+ CLAUDE: Let me scan our conversation for candidate decisions.
+
+ [Reads .deepflow/decisions.md — finds prior entry:
+  "### 2026-02-10 — spec
+  - [APPROACH] Use PostgreSQL for primary storage — team already knows it"]
+
+ [Extracts from conversation:
+  1. [APPROACH] Use Redis for session state — avoids DB round-trips on hot path
+  2. [PROVISIONAL] Skip auth for internal APIs — only 2 internal services for now
+  3. [ASSUMPTION] Peak load is 500 req/s — based on current traffic, unvalidated
+ ]
+
+ [No contradictions found with prior entries]
+
+ [Uses AskUserQuestion]
+ {
+   "questions": [
+     {
+       "question": "These decisions were detected in your conversation. Which should be saved to .deepflow/decisions.md?",
+       "header": "Save notes?",
+       "multiSelect": true,
+       "options": [
+         {
+           "label": "[APPROACH] Use Redis for session state",
+           "description": "Avoids DB round-trips on hot path"
+         },
+         {
+           "label": "[PROVISIONAL] Skip auth for internal APIs",
+           "description": "Only 2 internal services for now, revisit when we open to more consumers"
+         },
+         {
+           "label": "[ASSUMPTION] Peak load is 500 req/s",
+           "description": "Based on current traffic patterns, not load-tested"
+         }
+       ]
+     }
+   ]
+ }
+
+ USER: [Selects: Use Redis for session state, Peak load is 500 req/s]
+
+ CLAUDE: [Appends to .deepflow/decisions.md:]
+
+ ### 2026-02-22 — note
+ - [APPROACH] Use Redis for session state — avoids DB round-trips on hot path
+ - [ASSUMPTION] Peak load is 500 req/s — based on current traffic patterns, not load-tested
+
+ Saved 2 decision(s) to .deepflow/decisions.md
+ ```
@@ -3,6 +3,8 @@
  ## Purpose
  Compare specs against codebase and past experiments. Generate prioritized tasks.

+ **NEVER:** use EnterPlanMode, use ExitPlanMode — this command IS the planning phase; native plan mode conflicts with it
+
  ## Usage
  ```
  /df:plan # Plan all new specs
@@ -220,6 +222,17 @@ Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validatio

  `✓ Plan generated — {n} specs, {n} tasks. Run /df:execute`

+ ### 11. CAPTURE DECISIONS
+
+ Extract up to 4 candidate decisions (approaches chosen, spike strategies, prioritization rationale). Present via AskUserQuestion with `multiSelect: true`. Each option: `label: "[TAG] <decision>"`, `description: "<rationale>"`. Tags: `[APPROACH]`, `[PROVISIONAL]`, `[ASSUMPTION]`.
+
+ Append confirmed decisions to `.deepflow/decisions.md` (create if missing):
+ ```
+ ### {YYYY-MM-DD} — plan
+ - [TAG] Decision text — rationale summary
+ ```
+ If a decision contradicts a prior entry, add: `(supersedes: <prior text>)`
+
  ## Rules
  - **Never use TaskOutput** — Returns full transcripts that explode context
  - **Never use run_in_background for Explore agents** — Causes late notifications that pollute output
@@ -0,0 +1,130 @@
+ # /df:resume — Session Continuity Briefing
+
+ ## Orchestrator Role
+
+ You are a context synthesizer. Your ONLY job is to read project state from multiple sources and produce a concise, structured briefing so developers can resume work after a break.
+
+ **NEVER:** Write files, create files, modify files, append to files, run git with write operations, use AskUserQuestion, spawn agents, use TaskOutput, use EnterPlanMode, use ExitPlanMode
+
+ **ONLY:** Read files (Bash read-only git commands, Read tool, Glob, Grep), write briefing to stdout
+
+ ---
+
+ ## Purpose
+
+ Synthesize project state into a 200-500 word briefing covering what happened, what decisions are live, and what to do next. Pure read-only — writes nothing.
+
+ ## Usage
+
+ ```
+ /df:resume
+ ```
+
+ ## Behavior
+
+ ### 1. GATHER SOURCES
+
+ Read these sources in parallel (all reads, no writes):
+
+ | Source | Command/Path | Purpose |
+ |--------|-------------|---------|
+ | Git timeline | `git log --oneline -20` | What changed and when |
+ | Decisions | `.deepflow/decisions.md` | Current [APPROACH], [PROVISIONAL], [ASSUMPTION] entries |
+ | Plan | `PLAN.md` | Task status (checked vs unchecked) |
+ | Spec headers | `specs/doing-*.md` (first 20 lines each) | What features are in-flight |
+ | Experiments | `.deepflow/experiments/` (file listing + names) | Validated and failed approaches |
+
+ **Token budget:** Read only what's needed — ~2500 tokens total across all sources.
+
+ If a source does not exist, skip it silently (do not error or warn).
+
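The gather step above amounts to optional, silent reads. A minimal sketch, assuming the paths from the table and a plain dict as the return shape (both are illustrative):

```python
from pathlib import Path

def gather_sources(root: Path) -> dict[str, str]:
    """Read each optional source; a missing file is skipped silently."""
    candidates = {
        "decisions": root / ".deepflow" / "decisions.md",
        "plan": root / "PLAN.md",
    }
    # No error or warning for absent sources, per the rule above.
    return {name: p.read_text() for name, p in candidates.items() if p.exists()}
```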
+ ### 2. SYNTHESIZE BRIEFING
+
+ Produce a 200-500 word briefing with exactly three sections:
+
+ ---
+
+ **## Timeline**
+
+ Summarize what happened and when, derived from `git log --oneline -20` and spec/PLAN.md state. Describe the arc of work: what was completed, what is in-flight, notable milestones. Reference dates or commit messages where informative. Aim for 3-6 sentences.
+
+ **## Live Decisions**
+
+ List all current `[APPROACH]`, `[PROVISIONAL]`, and `[ASSUMPTION]` entries from `.deepflow/decisions.md`. Present each as a bullet with its tag, the decision text, and a brief rationale if available.
+
+ If `.deepflow/decisions.md` does not exist or is empty: state "No decisions recorded yet."
+
+ Do not filter or editorialize — report all live decision entries as found. If a decision has been contradicted (a newer entry supersedes it), show only the newest entry for that topic.
+
+ **## Next Steps**
+
+ Derive next steps from `PLAN.md` task status:
+ - List unchecked tasks (`- [ ]`) that are not blocked
+ - Note blocked tasks and what they are blocked by
+ - If no PLAN.md exists: suggest running `/df:plan` to generate tasks
+
+ Prioritize: unblocked tasks first, then blocked tasks with their blockers, then any tasks with no explicit status.
+
+ ---
+
+ ### 3. OUTPUT
+
+ Print the briefing directly to stdout. No file writes. No file creation.
+
+ Format:
+
+ ```
+ ## Timeline
+ [3-6 sentences on what happened]
+
+ ## Live Decisions
+ - [APPROACH] ...
+ - [PROVISIONAL] ...
+ - [ASSUMPTION] ...
+
+ ## Next Steps
+ - T1: ... (ready)
+ - T2: ... (blocked by T1)
+ ```
+
+ Word count target: 200-500 words. Do not pad. Do not truncate important information to hit limits.
+
+ ---
+
+ ## Rules
+
+ - **NEVER write any file** — not decisions.md, not PLAN.md, not any new file
+ - **NEVER use AskUserQuestion** — this command is read-only, no interaction
+ - **NEVER spawn agents** — read directly using Bash (git log) and Read tool
+ - **NEVER use TaskOutput** — returns full transcripts that explode context
+ - **NEVER use EnterPlanMode or ExitPlanMode**
+ - Read sources in a single pass — do not loop or re-read
+ - If a source file is missing, skip it and note it only if relevant
+ - Contradicted decisions: show newest entry per topic only
+ - Token budget: stay within ~2500 tokens of input to produce ~500 words of output
+
+ ## Example
+
+ ```
+ USER: /df:resume
+
+ CLAUDE:
+
+ ## Timeline
+ Work began on the session-continuity feature on 2026-02-20, with the spec
+ formalized after a discover and debate phase. The plan was generated with
+ 5 tasks across two areas: decision capture in existing commands and two new
+ commands (note, resume). Tasks T1-T3 are complete (decisions.md format,
+ discover capture, plan capture). T4 (execute capture) and T5 (resume command)
+ remain in progress.
+
+ ## Live Decisions
+ - [APPROACH] Store decisions in .deepflow/decisions.md as append-only markdown — chosen over database to keep diffs readable and avoid new dependencies
+ - [PROVISIONAL] Max 4 candidates per AskUserQuestion call — matches tool limit, revisit if UX feels too chunked
+ - [ASSUMPTION] Worktree execute writes to main tree .deepflow/ path — valid as long as main tree is always the parent
+
+ ## Next Steps
+ - T4: Add decision capture to /df:execute (ready — unblocked)
+ - T5: Create /df:resume command (ready — unblocked)
+ - T6: Add decision capture to /df:verify (blocked by T4)
+ ```
@@ -4,7 +4,7 @@

  You coordinate agents and ask questions. You never search code directly.

- **NEVER:** Read source files, use Glob/Grep directly, run git, use TaskOutput
+ **NEVER:** Read source files, use Glob/Grep directly, run git, use TaskOutput, use EnterPlanMode, use ExitPlanMode

  **ONLY:** Spawn agents (non-background), ask user questions, write spec file

@@ -176,6 +176,18 @@ Acceptance criteria: {count}
  Next: Run /df:plan to generate tasks
  ```

+ ### 6. CAPTURE DECISIONS
+
+ Extract up to 4 candidate decisions (requirements chosen, constraints accepted). Use `AskUserQuestion` with `multiSelect: true`:
+ - `label`: `[APPROACH|PROVISIONAL|ASSUMPTION] <decision>`
+ - `description`: rationale
+
+ Append each confirmed selection to `.deepflow/decisions.md` (create if absent):
+ ```
+ ### {YYYY-MM-DD} — spec
+ - [TAG] <decision> — <rationale>
+ ```
+
  ## Rules
  - **Orchestrator never searches** — Spawn agents for all codebase exploration
  - Do NOT generate spec if critical gaps remain
@@ -3,6 +3,8 @@
  ## Purpose
  Check that implemented code satisfies spec requirements and acceptance criteria.

+ **NEVER:** use EnterPlanMode, use ExitPlanMode
+
  ## Usage
  ```
  /df:verify # Verify all done-* specs
@@ -40,16 +42,86 @@ Load:

  If no done-* specs: report counts, suggest `--doing`.

+ ### 1.5. DETECT PROJECT COMMANDS
+
+ Detect build and test commands by inspecting project files in the worktree.
+
+ **Config override always wins.** If `.deepflow/config.yaml` has `quality.test_command` or `quality.build_command`, use those.
+
+ **Auto-detection (first match wins):**
+
+ | File | Build | Test |
+ |------|-------|------|
+ | `package.json` with `scripts.build` | `npm run build` | `npm test` (if scripts.test is not default placeholder) |
+ | `pyproject.toml` or `setup.py` | — | `pytest` |
+ | `Cargo.toml` | `cargo build` | `cargo test` |
+ | `go.mod` | `go build ./...` | `go test ./...` |
+ | `Makefile` with `test` target | `make build` (if target exists) | `make test` |
+
+ **Output:**
+ - Commands found: `Build: npm run build | Test: npm test`
+ - Nothing found: `⚠ No build/test commands detected. L0/L4 skipped. Set quality.test_command in .deepflow/config.yaml`
+
  ### 2. VERIFY EACH SPEC

+ **L0: Build check** (if build command detected)
+
+ Run the build command in the worktree:
+ - Exit code 0 → L0 pass, continue to L1-L3
+ - Exit code non-zero → L0 FAIL
+ - Report: "✗ L0: Build failed" with last 30 lines of output
+ - Add fix task: "Fix build errors" to PLAN.md
+ - Do NOT proceed to L1-L4 (no point checking if code doesn't build)
+
+ **L1-L3: Static analysis** (via Explore agents)
+
  Check requirements, acceptance criteria, and quality (stubs/TODOs).
  Mark each: ✓ satisfied | ✗ missing | ⚠ partial

+ **L4: Test execution** (if test command detected)
+
+ Run AFTER L0 passes and L1-L3 complete. Run even if L1-L3 found issues — test failures reveal additional problems.
+
+ - Run test command in the worktree (timeout from config, default 5 min)
+ - Exit code 0 → L4 pass
+ - Exit code non-zero → L4 FAIL
+ - Capture last 50 lines of output
+ - Report: "✗ L4: Tests failed (N of M)" with relevant output
+ - Add fix task: "Fix failing tests" with test output in description
+
+ **Flaky test handling** (if `quality.test_retry_on_fail: true` in config):
+ - If tests fail, re-run ONCE
+ - Second run passes → L4 pass with note: "⚠ L4: Passed on retry (possible flaky test)"
+ - Second run fails → genuine failure
+
  ### 3. GENERATE REPORT

- Report per spec: requirements count, acceptance count, quality issues.
+ Report per spec with L0/L4 status, requirements count, acceptance count, quality issues.
+
+ **Format on success:**
+ ```
+ done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✓ (12 tests) | 0 quality issues
+ ```
+
+ **Format on failure:**
+ ```
+ done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✗ (3 failed) | 0 quality issues

- **If all pass:** Proceed to Post-Verification merge.
+ Issues:
+ ✗ L4: 3 test failures
+ FAIL src/upload.test.ts > should validate file type
+ FAIL src/upload.test.ts > should reject oversized files
+
+ Fix tasks added to PLAN.md:
+ T10: Fix 3 failing tests in upload module
+ ```
+
+ **Gate conditions (ALL must pass to merge):**
+ - L0: Build passes (or no build command detected)
+ - L1-L3: All requirements satisfied, no stubs, properly wired
+ - L4: Tests pass (or no test command detected)
+
+ **If all gates pass:** Proceed to Post-Verification merge.

  **If issues found:** Add fix tasks to PLAN.md in the worktree and register as native tasks, then loop back to execute:

@@ -67,15 +139,19 @@ Report per spec: requirements count, acceptance count, quality issues.
  4. Output report + next step:

  ```
- done-upload.md: 4/4 reqs ✓, 3/5 acceptance ✗, 1 quality issue
+ done-upload.md: L0 ✓ | 4/4 reqs ✓, 3/5 acceptance ✗ | L4 ✗ (2 failed) | 1 quality issue

  Issues:
  ✗ AC-3: YAML parsing missing for consolation
+ ✗ L4: 2 test failures
+ FAIL src/upload.test.ts > should validate file type
+ FAIL src/upload.test.ts > should reject oversized files
  ⚠ Quality: TODO in parse_config()

  Fix tasks added to PLAN.md:
  T10: Add YAML parsing for consolation section
- T11: Remove TODO in parse_config()
+ T11: Fix 2 failing tests in upload module
+ T12: Remove TODO in parse_config()

  Run /df:execute --continue to fix in the same worktree.
  ```
@@ -105,14 +181,16 @@ Files: ...

  ## Verification Levels

- | Level | Check | Method |
- |-------|-------|--------|
- | L1: Exists | File/function exists | Glob/Grep |
- | L2: Substantive | Real code, not stub | Read + analyze |
- | L3: Wired | Integrated into system | Trace imports/calls |
- | L4: Tested | Has passing tests | Run tests |
+ | Level | Check | Method | Runner |
+ |-------|-------|--------|--------|
+ | L0: Builds | Code compiles/builds | Run build command | Orchestrator (Bash) |
+ | L1: Exists | File/function exists | Glob/Grep | Explore agents |
+ | L2: Substantive | Real code, not stub | Read + analyze | Explore agents |
+ | L3: Wired | Integrated into system | Trace imports/calls | Explore agents |
+ | L4: Tested | Tests pass | Run test command | Orchestrator (Bash) |

- Default: L1-L3 (L4 optional, can be slow)
+ **Default: L0 through L4.** L0 and L4 are skipped ONLY if no build/test command is detected (see step 1.5).
+ L0 and L4 run directly via Bash — Explore agents cannot execute commands.

  ## Rules
  - **Never use TaskOutput** — Returns full transcripts that explode context
@@ -147,10 +225,12 @@ Scale: 1-2 agents per spec, cap 10.
  ```
  /df:verify

- done-upload.md: 4/4 reqs ✓, 5/5 acceptance ✓, clean
- done-auth.md: 2/2 reqs ✓, 3/3 acceptance ✓, clean
+ Build: npm run build | Test: npm test
+
+ done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✓ (12 tests) | 0 quality issues
+ done-auth.md: L0 ✓ | 2/2 reqs ✓, 3/3 acceptance ✓ | L4 ✓ (8 tests) | 0 quality issues

- ✓ All specs verified
+ ✓ All gates passed

  ✓ Merged df/upload to main
  ✓ Cleaned up worktree and branch
@@ -163,22 +243,29 @@ Learnings captured:
  ```
  /df:verify --doing

- doing-upload.md: 4/4 reqs ✓, 3/5 acceptance ✗, 1 quality issue
+ Build: npm run build | Test: npm test
+
+ doing-upload.md: L0 ✓ | 4/4 reqs ✓, 3/5 acceptance ✗ | L4 ✗ (3 failed) | 1 quality issue

  Issues:
  ✗ AC-3: YAML parsing missing for consolation
+ ✗ L4: 3 test failures
+ FAIL src/upload.test.ts > should validate file type
+ FAIL src/upload.test.ts > should reject oversized files
+ FAIL src/upload.test.ts > should handle empty input
  ⚠ Quality: TODO in parse_config()

  Fix tasks added to PLAN.md:
  T10: Add YAML parsing for consolation section
- T11: Remove TODO in parse_config()
+ T11: Fix 3 failing tests in upload module
+ T12: Remove TODO in parse_config()

  Run /df:execute --continue to fix in the same worktree.
  ```

  ## Post-Verification: Worktree Merge & Cleanup

- **Only runs when ALL specs pass verification.** If issues were found, fix tasks were added to PLAN.md instead (see step 3).
+ **Only runs when ALL gates pass** (L0 build, L1-L3 static analysis, L4 tests). If any gate fails, fix tasks were added to PLAN.md instead (see step 3).

  ### 1. DISCOVER WORKTREE

@@ -240,3 +327,17 @@ rm -f .deepflow/checkpoint.json

  Workflow complete! Ready for next feature: /df:spec <name>
  ```
+
+ ### 4. CAPTURE DECISIONS (success path only)
+
+ Extract up to 4 candidate decisions (quality findings, patterns validated, lessons learned). Present via AskUserQuestion with `multiSelect: true`; tags: `[APPROACH]`, `[PROVISIONAL]`, `[ASSUMPTION]`.
+
+ ```
+ AskUserQuestion(question: "Which decisions to record?", multiSelect: true,
+   options: [{ label: "[APPROACH] <decision>", description: "<rationale>" }, ...])
+ ```
+
+ For each confirmed decision, append to `.deepflow/decisions.md` (create if missing):
+ `### {YYYY-MM-DD} — verify` / `- [TAG] {decision text} — {rationale}`
+
+ Skip if user confirms none or declines.
@@ -61,3 +61,17 @@ worktree:

  # Keep worktree after failed execution for debugging
  cleanup_on_fail: false
+
+ # Quality gates for /df:verify
+ quality:
+   # Override auto-detected build command (e.g., "npm run build", "cargo build")
+   build_command: ""
+
+   # Override auto-detected test command (e.g., "npm test", "pytest", "go test ./...")
+   test_command: ""
+
+   # Test timeout in seconds (default: 300 = 5 minutes)
+   test_timeout: 300
+
+   # Retry flaky tests once before failing (default: true)
+   test_retry_on_fail: true