deepflow 0.1.46 → 0.1.48

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "deepflow",
- "version": "0.1.46",
+ "version": "0.1.48",
  "description": "Stay in flow state - lightweight spec-driven task orchestration for Claude Code",
  "keywords": [
  "claude",
@@ -4,7 +4,7 @@

  You coordinate reasoner agents to debate a problem from multiple perspectives, then synthesize their arguments into a structured document.

- **NEVER:** Read source files, use Glob/Grep directly, run git, use TaskOutput, use `run_in_background`, use Explore agents
+ **NEVER:** Read source files, use Glob/Grep directly, run git, use TaskOutput, use `run_in_background`, use Explore agents, use EnterPlanMode, use ExitPlanMode

  **ONLY:** Spawn reasoner agents (non-background), write debate file, respond conversationally

@@ -233,6 +233,17 @@ Open decisions:
  Next: Run /df:spec {name} to formalize into a specification
  ```

+ ### 6. CAPTURE DECISIONS
+
+ Extract up to 4 candidates from consensus/resolved tensions. Ask user via `AskUserQuestion(multiSelect=True)` with options like `{ label: "[APPROACH] {decision}", description: "{rationale}" }`.
+
+ For confirmed decisions, append to `.deepflow/decisions.md` (create if absent) using format:
+ ```
+ ### {YYYY-MM-DD} — debate
+ - [{TAG}] {decision text} — {rationale}
+ ```
+ Tags: [APPROACH] directional choices · [PROVISIONAL] tentative · [ASSUMPTION] unverified premises. If a new decision contradicts an existing one, note the conflict inline.
+
  ---

  ## Rules
@@ -4,7 +4,7 @@

  You are a Socratic questioner. Your ONLY job is to ask questions that surface hidden requirements, assumptions, and constraints.

- **NEVER:** Read source files, use Glob/Grep, spawn agents, create files, run git, use TaskOutput, use Task tool
+ **NEVER:** Read source files, use Glob/Grep, spawn agents, create files (except `.deepflow/decisions.md`), run git, use TaskOutput, use Task tool, use EnterPlanMode, use ExitPlanMode

  **ONLY:** Ask questions using `AskUserQuestion` tool, respond conversationally

@@ -90,6 +90,22 @@ Example questions:
  - Keep your responses short between questions — don't lecture
  - Acknowledge answers briefly before asking the next question

+ ### Decision Capture
+ When the user signals they are ready to move on, before presenting next-step options, extract up to 4 candidate decisions from the session (meaningful choices about approach, scope, or constraints). Present via `AskUserQuestion` with `multiSelect: true`, e.g.:
+
+ ```json
+ {"questions": [{"question": "Which decisions should be recorded?", "header": "Decisions", "multiSelect": true,
+ "options": [{"label": "[APPROACH] Use event sourcing", "description": "Matches audit requirements"}]}]}
+ ```
+
+ For each confirmed decision, append to `.deepflow/decisions.md` (create if missing):
+ ```
+ ### {YYYY-MM-DD} — discover
+ - [APPROACH] Decision text — rationale
+ ```
+
+ Tags: `[APPROACH]` firm choice · `[PROVISIONAL]` revisit later · `[ASSUMPTION]` unverified belief.
+
  ### When the User Wants to Move On
  When the user signals they want to advance (e.g., "I think that's enough", "let's move on", "ready for next step"):

@@ -4,9 +4,9 @@

  You are a coordinator. Spawn agents, wait for results, update PLAN.md. Never implement code yourself.

- **NEVER:** Read source files, edit code, run tests, run git commands (except status), use TaskOutput
+ **NEVER:** Read source files, edit code, run tests, run git commands (except status), use TaskOutput, use EnterPlanMode, use ExitPlanMode

- **ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, read `.deepflow/results/*.yaml` on completion notifications, update PLAN.md
+ **ONLY:** Read PLAN.md, read specs/doing-*.md, spawn background agents, read `.deepflow/results/*.yaml` on completion notifications, update PLAN.md, write `.deepflow/decisions.md` in the main tree

  ---

@@ -99,8 +99,16 @@ task: T3
  status: success|failed
  commit: abc1234
  summary: "one line"
+ tests_ran: true|false
+ test_command: "npm test"
+ test_exit_code: 0
+ test_output_tail: |
+   PASS src/upload.test.ts
+   Tests: 12 passed, 12 total
  ```

+ New fields: `tests_ran` (bool), `test_command` (string), `test_exit_code` (int), `test_output_tail` (last 20 lines of output).
+
  **Spike result file** `.deepflow/results/{task_id}.yaml` (additional fields):
  ```yaml
  task: T1
@@ -400,8 +408,18 @@ Example: To edit src/foo.ts, use:

  Do NOT write files to the main project directory.

- Implement, test, commit as feat({spec}): {description}.
- Write result to {worktree_absolute_path}/.deepflow/results/{task_id}.yaml
+ Steps:
+ 1. Implement the task
+ 2. Detect test command: check for package.json (npm test), pyproject.toml (pytest),
+    Cargo.toml (cargo test), go.mod (go test ./...), or Makefile (make test)
+ 3. Run tests if test infrastructure exists:
+    - Run the detected test command
+    - If tests fail: fix the code and re-run until passing
+    - Do NOT commit with failing tests
+ 4. If NO test infrastructure: set tests_ran: false in result file
+ 5. Commit as feat({spec}): {description}
+ 6. Write result file with ALL fields including test evidence (see schema):
+    {worktree_absolute_path}/.deepflow/results/{task_id}.yaml

  **STOP after writing the result file. Do NOT:**
  - Merge branches or cherry-pick commits
@@ -427,6 +445,7 @@ Steps:
  3. Write experiment as --active.md (verifier determines final status)
  4. Commit: spike({spec}): validate {hypothesis}
  5. Write result to .deepflow/results/{task_id}.yaml (see spike result schema)
+ 6. If test infrastructure exists, also run tests and include evidence in result file

  Rules:
  - `met: true` ONLY if actual satisfies target
@@ -491,16 +510,36 @@ After spawning wave agents, your turn ENDS. Completion notifications drive the l

  **Per notification:**
  1. Read result file for the completed agent
- 2. TaskUpdate(taskId: native_id, status: "completed") — auto-unblocks dependent tasks
- 3. Update PLAN.md: `[ ]` → `[x]` + commit hash (as before)
- 4. Report ONE line: "✓ Tx: status (commit)"
- 5. If NOT all wave agents done → end turn, wait
- 6. If ALL wave agents done → use TaskList to find newly unblocked tasks, check context, spawn next wave or finish
+ 2. Validate test evidence:
+    - `tests_ran: true` + `test_exit_code: 0` → trust result
+    - `tests_ran: true` + `test_exit_code: non-zero` → status MUST be failed (flag mismatch if agent said success)
+    - `tests_ran: false` + `status: success` → flag: "⚠ Tx: success but no tests ran"
+ 3. TaskUpdate(taskId: native_id, status: "completed") — auto-unblocks dependent tasks
+ 4. Update PLAN.md: `[ ]` → `[x]` + commit hash (as before)
+ 5. Report: "✓ T1: success (abc123) [12 tests passed]" or "⚠ T1: success (abc123) [no tests]"
+ 6. If NOT all wave agents done → end turn, wait
+ 7. If ALL wave agents done → use TaskList to find newly unblocked tasks, check context, spawn next wave or finish

  **Between waves:** Check context %. If ≥50%, checkpoint and exit.

  **Repeat** until: all done, all blocked, or context ≥50% (checkpoint).

+ ### 11. CAPTURE DECISIONS
+
+ After all tasks complete (or all blocked), extract up to 4 candidate decisions from the session (implementation patterns, deviations from plan, key assumptions made).
+
+ Present via AskUserQuestion with multiSelect: true. Labels: `[TAG] decision text`. Descriptions: rationale.
+
+ For each confirmed decision, append to **main tree** `.deepflow/decisions.md` (create if missing):
+ ```
+ ### {YYYY-MM-DD} — execute
+ - [APPROACH] Parallel agent spawn for independent tasks — confirmed no file conflicts
+ ```
+
+ Main tree path: use the repo root (parent of `.deepflow/worktrees/`), NOT the worktree.
+
+ Max 4 candidates per prompt. Tags: [APPROACH], [PROVISIONAL], [ASSUMPTION].
+
  ## Rules

  | Rule | Detail |
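The evidence-validation rules in step 2 above amount to a small classifier over the result-file fields. A sketch under the assumption that the YAML has already been parsed into a dict; the return labels are illustrative, not part of the schema:

```python
def classify_result(result: dict) -> str:
    """Map (tests_ran, test_exit_code, status) to how the orchestrator reports it."""
    tests_ran = result.get("tests_ran", False)
    exit_code = result.get("test_exit_code")
    status = result.get("status")
    if tests_ran and exit_code == 0:
        return "trusted"                  # tests ran and passed: trust the result
    if tests_ran:
        # Non-zero exit: status MUST be failed; flag if the agent claimed success.
        return "mismatch" if status == "success" else "failed"
    if status == "success":
        return "success-no-tests"         # "⚠ Tx: success but no tests ran"
    return "failed"
```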
@@ -0,0 +1,206 @@
+ # /df:note — Capture Decisions from Free Conversations
+
+ ## Orchestrator Role
+
+ You scan prior conversation context for candidate decisions, present them for user confirmation, and persist confirmed decisions to `.deepflow/decisions.md`.
+
+ **NEVER:** Spawn agents, use Task tool, use Glob/Grep on source code, run git, use TaskOutput, use EnterPlanMode, use ExitPlanMode
+
+ **ONLY:** Read `.deepflow/decisions.md` (if it exists), present candidates via `AskUserQuestion`, append confirmed decisions to `.deepflow/decisions.md`
+
+ ---
+
+ ## Purpose
+
+ Capture decisions that emerged during free conversations outside of deepflow commands. Surfaces candidate decisions from the current conversation, lets the user confirm or discard each, and persists confirmed ones to the shared decisions log.
+
+ ## Usage
+
+ ```
+ /df:note
+ ```
+
+ No arguments required. Operates on the current conversation context.
+
+ ---
+
+ ## Behavior
+
+ ### 1. EXTRACT CANDIDATES
+
+ Scan the prior conversation messages for candidate decisions. A decision is any resolved choice, adopted approach, or stated assumption that affects how the work is done. Look for:
+
+ - **Approaches chosen**: "we'll use X instead of Y", "let's go with X"
+ - **Provisional choices**: "for now we'll use X", "assuming X until we know more"
+ - **Stated assumptions**: "assuming X is true", "treating X as given"
+ - **Constraints accepted**: "we won't do X", "X is out of scope"
+ - **Naming or structural choices**: "we'll call it X", "X goes in the Y layer"
+
+ Extract **at most 4 candidates** from the conversation. Prioritize the most consequential or recent ones.
+
+ For each candidate, determine:
+ - **Tag**: one of `[APPROACH]`, `[PROVISIONAL]`, or `[ASSUMPTION]`
+   - `[APPROACH]` — a deliberate design or implementation choice
+   - `[PROVISIONAL]` — works for now, expected to revisit
+   - `[ASSUMPTION]` — treating something as true without full validation
+ - **Decision text**: one concise line describing the choice
+ - **Rationale**: one sentence explaining why this was chosen
+
+ If fewer than 2 clear candidates are found, say so briefly and exit without calling `AskUserQuestion`.
+
+ ### 2. CHECK FOR CONTRADICTIONS
+
+ Read `.deepflow/decisions.md` if it exists. For each candidate, check whether it contradicts a prior entry in the file.
+
+ If a contradiction is found:
+ - Keep the prior entry — never delete or modify it
+ - Amend the candidate's rationale to reference the prior decision: `was "X", now "Y" because Z`
+
+ ### 3. PRESENT VIA AskUserQuestion
+
+ Present candidates as a multi-select question with at most 4 options (tool limit).
+
+ ```json
+ {
+   "questions": [
+     {
+       "question": "These decisions were detected in your conversation. Which should be saved to .deepflow/decisions.md?",
+       "header": "Save notes?",
+       "multiSelect": true,
+       "options": [
+         {
+           "label": "[APPROACH] <decision text>",
+           "description": "<rationale>"
+         },
+         {
+           "label": "[PROVISIONAL] <decision text>",
+           "description": "<rationale>"
+         }
+       ]
+     }
+   ]
+ }
+ ```
+
+ Each option's `label` is the tag + decision text. Each `description` is the rationale (one sentence).
+
+ ### 4. APPEND CONFIRMED DECISIONS
+
+ For each option the user selects:
+
+ 1. If `.deepflow/decisions.md` does not exist, create it with just the header:
+    ```
+    # Decisions
+    ```
+
+ 2. Append a new dated section using today's date in `YYYY-MM-DD` format and source `note`:
+
+    ```markdown
+    ### 2026-02-22 — note
+    - [APPROACH] Use event sourcing over CRUD — append-only log matches audit requirements
+    - [PROVISIONAL] Batch size = 50 — works for 4-game dataset, revisit at scale
+    ```
+
+ 3. If multiple decisions are confirmed in one invocation, group them under a single dated section.
+
+ 4. Never modify or delete any prior entries.
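The append logic above can be sketched in a few lines. `append_decisions` is a hypothetical helper (the command itself performs these steps via file tools); the on-disk format follows the steps above:

```python
from datetime import date
from pathlib import Path

def append_decisions(path: Path, source: str,
                     decisions: list[tuple[str, str, str]]) -> None:
    """Append one dated section of (tag, text, rationale) rows; prior entries untouched."""
    if not path.exists():
        path.write_text("# Decisions\n")   # first use initializes the header
    section = [f"\n### {date.today():%Y-%m-%d} — {source}\n"]
    section += [f"- [{tag}] {text} — {why}\n" for tag, text, why in decisions]
    with path.open("a") as f:              # append-only, per rule 4
        f.writelines(section)
```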
+
+ ### 5. CONFIRM
+
+ After writing, report to the user:
+
+ ```
+ Saved N decision(s) to .deepflow/decisions.md
+ ```
+
+ If the user selected nothing, respond:
+
+ ```
+ No decisions saved.
+ ```
+
+ ---
+
+ ## Decision Format
+
+ ```
+ ### YYYY-MM-DD — note
+ - [TAG] Decision text — rationale
+ ```
+
+ **Tags:**
+ - `[APPROACH]` — deliberate design or implementation choice
+ - `[PROVISIONAL]` — works for now, will revisit at scale or with more information
+ - `[ASSUMPTION]` — treating something as true without full confirmation
+
+ **Contradiction handling:** Never delete prior entries. When a new decision contradicts an older one, include a reference in the rationale: `was "X", now "Y" because Z`.
+
+ ---
+
+ ## Rules
+
+ - **Maximum 4 candidates** per invocation (tool limit for AskUserQuestion options)
+ - **multiSelect: true** — user can confirm any subset, including all or none
+ - **Never invent decisions** — only extract what was genuinely discussed and resolved in the conversation
+ - **Never modify prior entries** in `.deepflow/decisions.md`
+ - **Create the file if absent** — first use initializes it with `# Decisions` header
+ - **Source is always `note`** — this distinguishes from decisions captured by other df commands
+ - **Date is today's date** — use the current date in `YYYY-MM-DD` format
+ - **One AskUserQuestion call** — present all candidates in a single call, not sequentially
+
+ ---
+
+ ## Example
+
+ ```
+ USER: /df:note
+
+ CLAUDE: Let me scan our conversation for candidate decisions.
+
+ [Reads .deepflow/decisions.md — finds prior entry:
+  "### 2026-02-10 — spec
+  - [APPROACH] Use PostgreSQL for primary storage — team already knows it"]
+
+ [Extracts from conversation:
+  1. [APPROACH] Use Redis for session state — avoids DB round-trips on hot path
+  2. [PROVISIONAL] Skip auth for internal APIs — only 2 internal services for now
+  3. [ASSUMPTION] Peak load is 500 req/s — based on current traffic, unvalidated
+ ]
+
+ [No contradictions found with prior entries]
+
+ [Uses AskUserQuestion]
+ {
+   "questions": [
+     {
+       "question": "These decisions were detected in your conversation. Which should be saved to .deepflow/decisions.md?",
+       "header": "Save notes?",
+       "multiSelect": true,
+       "options": [
+         {
+           "label": "[APPROACH] Use Redis for session state",
+           "description": "Avoids DB round-trips on hot path"
+         },
+         {
+           "label": "[PROVISIONAL] Skip auth for internal APIs",
+           "description": "Only 2 internal services for now, revisit when we open to more consumers"
+         },
+         {
+           "label": "[ASSUMPTION] Peak load is 500 req/s",
+           "description": "Based on current traffic patterns, not load-tested"
+         }
+       ]
+     }
+   ]
+ }
+
+ USER: [Selects: Use Redis for session state, Peak load is 500 req/s]
+
+ CLAUDE: [Appends to .deepflow/decisions.md:]
+
+ ### 2026-02-22 — note
+ - [APPROACH] Use Redis for session state — avoids DB round-trips on hot path
+ - [ASSUMPTION] Peak load is 500 req/s — based on current traffic patterns, not load-tested
+
+ Saved 2 decision(s) to .deepflow/decisions.md
+ ```
@@ -3,6 +3,8 @@
  ## Purpose
  Compare specs against codebase and past experiments. Generate prioritized tasks.

+ **NEVER:** use EnterPlanMode, use ExitPlanMode — this command IS the planning phase; native plan mode conflicts with it
+
  ## Usage
  ```
  /df:plan # Plan all new specs
@@ -220,6 +222,17 @@ Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validatio

  `✓ Plan generated — {n} specs, {n} tasks. Run /df:execute`

+ ### 11. CAPTURE DECISIONS
+
+ Extract up to 4 candidate decisions (approaches chosen, spike strategies, prioritization rationale). Present via AskUserQuestion with `multiSelect: true`. Each option: `label: "[TAG] <decision>"`, `description: "<rationale>"`. Tags: `[APPROACH]`, `[PROVISIONAL]`, `[ASSUMPTION]`.
+
+ Append confirmed decisions to `.deepflow/decisions.md` (create if missing):
+ ```
+ ### {YYYY-MM-DD} — plan
+ - [TAG] Decision text — rationale summary
+ ```
+ If a decision contradicts a prior entry, add: `(supersedes: <prior text>)`
+
  ## Rules
  - **Never use TaskOutput** — Returns full transcripts that explode context
  - **Never use run_in_background for Explore agents** — Causes late notifications that pollute output
@@ -0,0 +1,130 @@
+ # /df:resume — Session Continuity Briefing
+
+ ## Orchestrator Role
+
+ You are a context synthesizer. Your ONLY job is to read project state from multiple sources and produce a concise, structured briefing so developers can resume work after a break.
+
+ **NEVER:** Write files, create files, modify files, append to files, run git with write operations, use AskUserQuestion, spawn agents, use TaskOutput, use EnterPlanMode, use ExitPlanMode
+
+ **ONLY:** Read files (Bash read-only git commands, Read tool, Glob, Grep), write briefing to stdout
+
+ ---
+
+ ## Purpose
+
+ Synthesize project state into a 200-500 word briefing covering what happened, what decisions are live, and what to do next. Pure read-only — writes nothing.
+
+ ## Usage
+
+ ```
+ /df:resume
+ ```
+
+ ## Behavior
+
+ ### 1. GATHER SOURCES
+
+ Read these sources in parallel (all reads, no writes):
+
+ | Source | Command/Path | Purpose |
+ |--------|-------------|---------|
+ | Git timeline | `git log --oneline -20` | What changed and when |
+ | Decisions | `.deepflow/decisions.md` | Current [APPROACH], [PROVISIONAL], [ASSUMPTION] entries |
+ | Plan | `PLAN.md` | Task status (checked vs unchecked) |
+ | Spec headers | `specs/doing-*.md` (first 20 lines each) | What features are in-flight |
+ | Experiments | `.deepflow/experiments/` (file listing + names) | Validated and failed approaches |
+
+ **Token budget:** Read only what's needed — ~2500 tokens total across all sources.
+
+ If a source does not exist, skip it silently (do not error or warn).
+
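The gather step above amounts to optional, silent reads. A minimal sketch, assuming the paths from the table and a plain dict as the return shape (both are illustrative):

```python
from pathlib import Path

def gather_sources(root: Path) -> dict[str, str]:
    """Read each optional source; a missing file is skipped silently."""
    candidates = {
        "decisions": root / ".deepflow" / "decisions.md",
        "plan": root / "PLAN.md",
    }
    # No error or warning for absent sources, per the rule above.
    return {name: p.read_text() for name, p in candidates.items() if p.exists()}
```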
+ ### 2. SYNTHESIZE BRIEFING
+
+ Produce a 200-500 word briefing with exactly three sections:
+
+ ---
+
+ **## Timeline**
+
+ Summarize what happened and when, derived from `git log --oneline -20` and spec/PLAN.md state. Describe the arc of work: what was completed, what is in-flight, notable milestones. Reference dates or commit messages where informative. Aim for 3-6 sentences.
+
+ **## Live Decisions**
+
+ List all current `[APPROACH]`, `[PROVISIONAL]`, and `[ASSUMPTION]` entries from `.deepflow/decisions.md`. Present each as a bullet with its tag, the decision text, and a brief rationale if available.
+
+ If `.deepflow/decisions.md` does not exist or is empty: state "No decisions recorded yet."
+
+ Do not filter or editorialize — report all live decision entries as found. If a decision has been contradicted (a newer entry supersedes it), show only the newest entry for that topic.
+
+ **## Next Steps**
+
+ Derive next steps from `PLAN.md` task status:
+ - List unchecked tasks (`- [ ]`) that are not blocked
+ - Note blocked tasks and what they are blocked by
+ - If no PLAN.md exists: suggest running `/df:plan` to generate tasks
+
+ Prioritize: unblocked tasks first, then blocked tasks with their blockers, then any tasks with no explicit status.
+
+ ---
+
+ ### 3. OUTPUT
+
+ Print the briefing directly to stdout. No file writes. No file creation.
+
+ Format:
+
+ ```
+ ## Timeline
+ [3-6 sentences on what happened]
+
+ ## Live Decisions
+ - [APPROACH] ...
+ - [PROVISIONAL] ...
+ - [ASSUMPTION] ...
+
+ ## Next Steps
+ - T1: ... (ready)
+ - T2: ... (blocked by T1)
+ ```
+
+ Word count target: 200-500 words. Do not pad. Do not truncate important information to hit limits.
+
+ ---
+
+ ## Rules
+
+ - **NEVER write any file** — not decisions.md, not PLAN.md, not any new file
+ - **NEVER use AskUserQuestion** — this command is read-only, no interaction
+ - **NEVER spawn agents** — read directly using Bash (git log) and Read tool
+ - **NEVER use TaskOutput** — returns full transcripts that explode context
+ - **NEVER use EnterPlanMode or ExitPlanMode**
+ - Read sources in a single pass — do not loop or re-read
+ - If a source file is missing, skip it and note it only if relevant
+ - Contradicted decisions: show newest entry per topic only
+ - Token budget: stay within ~2500 tokens of input to produce ~500 words of output
+
+ ## Example
+
+ ```
+ USER: /df:resume
+
+ CLAUDE:
+
+ ## Timeline
+ Work began on the session-continuity feature on 2026-02-20, with the spec
+ formalized after a discover and debate phase. The plan was generated with
+ 5 tasks across two areas: decision capture in existing commands and two new
+ commands (note, resume). Tasks T1-T3 are complete (decisions.md format,
+ discover capture, plan capture). T4 (execute capture) and T5 (resume command)
+ remain in progress.
+
+ ## Live Decisions
+ - [APPROACH] Store decisions in .deepflow/decisions.md as append-only markdown — chosen over database to keep diffs readable and avoid new dependencies
+ - [PROVISIONAL] Max 4 candidates per AskUserQuestion call — matches tool limit, revisit if UX feels too chunked
+ - [ASSUMPTION] Worktree execute writes to main tree .deepflow/ path — valid as long as main tree is always the parent
+
+ ## Next Steps
+ - T4: Add decision capture to /df:execute (ready — unblocked)
+ - T5: Create /df:resume command (ready — unblocked)
+ - T6: Add decision capture to /df:verify (blocked by T4)
+ ```
@@ -4,7 +4,7 @@

  You coordinate agents and ask questions. You never search code directly.

- **NEVER:** Read source files, use Glob/Grep directly, run git, use TaskOutput
+ **NEVER:** Read source files, use Glob/Grep directly, run git, use TaskOutput, use EnterPlanMode, use ExitPlanMode

  **ONLY:** Spawn agents (non-background), ask user questions, write spec file

@@ -176,6 +176,18 @@ Acceptance criteria: {count}
  Next: Run /df:plan to generate tasks
  ```

+ ### 6. CAPTURE DECISIONS
+
+ Extract up to 4 candidate decisions (requirements chosen, constraints accepted). Use `AskUserQuestion` with `multiSelect: true`:
+ - `label`: `[APPROACH|PROVISIONAL|ASSUMPTION] <decision>`
+ - `description`: rationale
+
+ Append each confirmed selection to `.deepflow/decisions.md` (create if absent):
+ ```
+ ### {YYYY-MM-DD} — spec
+ - [TAG] <decision> — <rationale>
+ ```
+
  ## Rules
  - **Orchestrator never searches** — Spawn agents for all codebase exploration
  - Do NOT generate spec if critical gaps remain
@@ -3,6 +3,8 @@
  ## Purpose
  Check that implemented code satisfies spec requirements and acceptance criteria.

+ **NEVER:** use EnterPlanMode, use ExitPlanMode
+
  ## Usage
  ```
  /df:verify # Verify all done-* specs
@@ -40,16 +42,86 @@ Load:

  If no done-* specs: report counts, suggest `--doing`.

+ ### 1.5. DETECT PROJECT COMMANDS
+
+ Detect build and test commands by inspecting project files in the worktree.
+
+ **Config override always wins.** If `.deepflow/config.yaml` has `quality.test_command` or `quality.build_command`, use those.
+
+ **Auto-detection (first match wins):**
+
+ | File | Build | Test |
+ |------|-------|------|
+ | `package.json` with `scripts.build` | `npm run build` | `npm test` (if scripts.test is not default placeholder) |
+ | `pyproject.toml` or `setup.py` | — | `pytest` |
+ | `Cargo.toml` | `cargo build` | `cargo test` |
+ | `go.mod` | `go build ./...` | `go test ./...` |
+ | `Makefile` with `test` target | `make build` (if target exists) | `make test` |
+
+ **Output:**
+ - Commands found: `Build: npm run build | Test: npm test`
+ - Nothing found: `⚠ No build/test commands detected. L0/L4 skipped. Set quality.test_command in .deepflow/config.yaml`
+
  ### 2. VERIFY EACH SPEC

+ **L0: Build check** (if build command detected)
+
+ Run the build command in the worktree:
+ - Exit code 0 → L0 pass, continue to L1-L3
+ - Exit code non-zero → L0 FAIL
+ - Report: "✗ L0: Build failed" with last 30 lines of output
+ - Add fix task: "Fix build errors" to PLAN.md
+ - Do NOT proceed to L1-L4 (no point checking if code doesn't build)
+
+ **L1-L3: Static analysis** (via Explore agents)
+
  Check requirements, acceptance criteria, and quality (stubs/TODOs).
  Mark each: ✓ satisfied | ✗ missing | ⚠ partial

+ **L4: Test execution** (if test command detected)
+
+ Run AFTER L0 passes and L1-L3 complete. Run even if L1-L3 found issues — test failures reveal additional problems.
+
+ - Run test command in the worktree (timeout from config, default 5 min)
+ - Exit code 0 → L4 pass
+ - Exit code non-zero → L4 FAIL
+ - Capture last 50 lines of output
+ - Report: "✗ L4: Tests failed (N of M)" with relevant output
+ - Add fix task: "Fix failing tests" with test output in description
+
+ **Flaky test handling** (if `quality.test_retry_on_fail: true` in config):
+ - If tests fail, re-run ONCE
+ - Second run passes → L4 pass with note: "⚠ L4: Passed on retry (possible flaky test)"
+ - Second run fails → genuine failure
+
  ### 3. GENERATE REPORT

- Report per spec: requirements count, acceptance count, quality issues.
+ Report per spec with L0/L4 status, requirements count, acceptance count, quality issues.
+
+ **Format on success:**
+ ```
+ done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✓ (12 tests) | 0 quality issues
+ ```
+
+ **Format on failure:**
+ ```
+ done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✗ (3 failed) | 0 quality issues

- **If all pass:** Proceed to Post-Verification merge.
+ Issues:
+ ✗ L4: 3 test failures
+ FAIL src/upload.test.ts > should validate file type
+ FAIL src/upload.test.ts > should reject oversized files
+
+ Fix tasks added to PLAN.md:
+ T10: Fix 3 failing tests in upload module
+ ```
+
+ **Gate conditions (ALL must pass to merge):**
+ - L0: Build passes (or no build command detected)
+ - L1-L3: All requirements satisfied, no stubs, properly wired
+ - L4: Tests pass (or no test command detected)
+
+ **If all gates pass:** Proceed to Post-Verification merge.

  **If issues found:** Add fix tasks to PLAN.md in the worktree and register as native tasks, then loop back to execute:

@@ -67,15 +139,19 @@ Report per spec: requirements count, acceptance count, quality issues.
  4. Output report + next step:

  ```
- done-upload.md: 4/4 reqs ✓, 3/5 acceptance ✗, 1 quality issue
+ done-upload.md: L0 ✓ | 4/4 reqs ✓, 3/5 acceptance ✗ | L4 ✗ (2 failed) | 1 quality issue

  Issues:
  ✗ AC-3: YAML parsing missing for consolation
+ ✗ L4: 2 test failures
+ FAIL src/upload.test.ts > should validate file type
+ FAIL src/upload.test.ts > should reject oversized files
  ⚠ Quality: TODO in parse_config()

  Fix tasks added to PLAN.md:
  T10: Add YAML parsing for consolation section
- T11: Remove TODO in parse_config()
+ T11: Fix 2 failing tests in upload module
+ T12: Remove TODO in parse_config()

  Run /df:execute --continue to fix in the same worktree.
  ```
@@ -105,14 +181,16 @@ Files: ...

  ## Verification Levels

- | Level | Check | Method |
- |-------|-------|--------|
- | L1: Exists | File/function exists | Glob/Grep |
- | L2: Substantive | Real code, not stub | Read + analyze |
- | L3: Wired | Integrated into system | Trace imports/calls |
- | L4: Tested | Has passing tests | Run tests |
+ | Level | Check | Method | Runner |
+ |-------|-------|--------|--------|
+ | L0: Builds | Code compiles/builds | Run build command | Orchestrator (Bash) |
+ | L1: Exists | File/function exists | Glob/Grep | Explore agents |
+ | L2: Substantive | Real code, not stub | Read + analyze | Explore agents |
+ | L3: Wired | Integrated into system | Trace imports/calls | Explore agents |
+ | L4: Tested | Tests pass | Run test command | Orchestrator (Bash) |

- Default: L1-L3 (L4 optional, can be slow)
+ **Default: L0 through L4.** L0 and L4 are skipped ONLY if no build/test command is detected (see step 1.5).
+ L0 and L4 run directly via Bash — Explore agents cannot execute commands.

  ## Rules
  - **Never use TaskOutput** — Returns full transcripts that explode context
@@ -147,10 +225,12 @@ Scale: 1-2 agents per spec, cap 10.
  ```
  /df:verify

- done-upload.md: 4/4 reqs ✓, 5/5 acceptance ✓, clean
- done-auth.md: 2/2 reqs ✓, 3/3 acceptance ✓, clean
+ Build: npm run build | Test: npm test
+
+ done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance ✓ | L4 ✓ (12 tests) | 0 quality issues
+ done-auth.md: L0 ✓ | 2/2 reqs ✓, 3/3 acceptance ✓ | L4 ✓ (8 tests) | 0 quality issues

- ✓ All specs verified
+ ✓ All gates passed

  ✓ Merged df/upload to main
  ✓ Cleaned up worktree and branch
@@ -163,22 +243,29 @@ Learnings captured:
  ```
  /df:verify --doing

- doing-upload.md: 4/4 reqs ✓, 3/5 acceptance ✗, 1 quality issue
+ Build: npm run build | Test: npm test
+
+ doing-upload.md: L0 ✓ | 4/4 reqs ✓, 3/5 acceptance ✗ | L4 ✗ (3 failed) | 1 quality issue

  Issues:
  ✗ AC-3: YAML parsing missing for consolation
+ ✗ L4: 3 test failures
+ FAIL src/upload.test.ts > should validate file type
+ FAIL src/upload.test.ts > should reject oversized files
+ FAIL src/upload.test.ts > should handle empty input
  ⚠ Quality: TODO in parse_config()

  Fix tasks added to PLAN.md:
  T10: Add YAML parsing for consolation section
- T11: Remove TODO in parse_config()
+ T11: Fix 3 failing tests in upload module
+ T12: Remove TODO in parse_config()

  Run /df:execute --continue to fix in the same worktree.
  ```

  ## Post-Verification: Worktree Merge & Cleanup

- **Only runs when ALL specs pass verification.** If issues were found, fix tasks were added to PLAN.md instead (see step 3).
+ **Only runs when ALL gates pass** (L0 build, L1-L3 static analysis, L4 tests). If any gate fails, fix tasks were added to PLAN.md instead (see step 3).

  ### 1. DISCOVER WORKTREE

@@ -240,3 +327,17 @@ rm -f .deepflow/checkpoint.json

  Workflow complete! Ready for next feature: /df:spec <name>
  ```
+
+ ### 4. CAPTURE DECISIONS (success path only)
+
+ Extract up to 4 candidate decisions (quality findings, patterns validated, lessons learned). Present via AskUserQuestion with `multiSelect: true`; tags: `[APPROACH]`, `[PROVISIONAL]`, `[ASSUMPTION]`.
+
+ ```
+ AskUserQuestion(question: "Which decisions to record?", multiSelect: true,
+   options: [{ label: "[APPROACH] <decision>", description: "<rationale>" }, ...])
+ ```
+
+ For each confirmed decision, append to `.deepflow/decisions.md` (create if missing):
+ `### {YYYY-MM-DD} — verify` / `- [TAG] {decision text} — {rationale}`
+
+ Skip if user confirms none or declines.
@@ -61,3 +61,17 @@ worktree:

  # Keep worktree after failed execution for debugging
  cleanup_on_fail: false
+
+ # Quality gates for /df:verify
+ quality:
+   # Override auto-detected build command (e.g., "npm run build", "cargo build")
+   build_command: ""
+
+   # Override auto-detected test command (e.g., "npm test", "pytest", "go test ./...")
+   test_command: ""
+
+   # Test timeout in seconds (default: 300 = 5 minutes)
+   test_timeout: 300
+
+   # Retry flaky tests once before failing (default: true)
+   test_retry_on_fail: true