deepflow 0.1.67 → 0.1.69

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -77,20 +77,21 @@ You write the specs, then walk away. The AI runs the full pipeline — hypothesi
  ```bash
  # You define WHAT (the specs), the AI figures out HOW, overnight
 
+ # Requires Agent Teams (experimental feature)
+ export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
+
  deepflow auto # process all specs in specs/
- deepflow auto --parallel=3 # 3 approaches in parallel
- deepflow auto --hypotheses=4 # 4 hypotheses per cycle
- deepflow auto --max-cycles=5 # cap retry cycles
  ```
 
  **What the AI does alone:**
- 1. Discovers specs (auto-promotes plain specs to `doing-*`)
- 2. Generates N hypotheses for how to implement each spec
- 3. Runs parallel spikes in isolated worktrees (one per hypothesis)
- 4. Implements the passing approaches
- 5. Adversarial selection: a fresh AI context compares approaches by artifacts only (never reads code), picks the best or rejects all
- 6. If rejected: generates new hypotheses, retries (up to max-cycles)
- 7. On convergence: verifies, merges to main
+ 1. Discovers specs, respects `depends_on` ordering
+ 2. Pre-checks whether each spec is already satisfied (skips if so)
+ 3. Generates N hypotheses for how to implement each spec
+ 4. Runs parallel spikes in isolated worktrees (one per hypothesis)
+ 5. Implements the passing approaches
+ 6. Adversarial selection: a fresh AI context compares approaches by artifacts only (never reads code), picks the best or rejects all
+ 7. If rejected: generates new hypotheses, retries (up to max-cycles)
+ 8. On convergence: verifies (L0-L4 gates), creates PR, merges to main
 
  **What you do:** Write specs (via interactive mode or manually) in `specs/`, run `deepflow auto`, read the morning report at `.deepflow/auto-report.md`. No need to run `/df:plan` first — auto mode promotes plain specs to `doing-*` automatically.
 
@@ -102,7 +103,8 @@ $ claude
  > /df:spec auth # creates specs/auth.md
  > /exit
 
- # In your terminal — run auto mode and walk away
+ # In your terminal — enable agent teams and run auto mode
+ $ export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
  $ deepflow auto
 
  # Next morning — check what happened
@@ -165,9 +167,11 @@ $ git log --oneline
 
  ```
  deepflow auto
- | Discover specs (auto-promote plain specs to doing-*)
+ | Discover specs (auto-promote, topological sort by depends_on)
  | For each doing-* spec:
  |
+ | Pre-check (Haiku: already satisfied? skip)
+ | v
  | Validate spec (malformed? skip)
  | v
  | Generate N hypotheses
@@ -177,7 +181,7 @@ deepflow auto
  | | Fail? -> record experiment, discard
  | v
  | Adversarial selection (fresh context, artifacts only)
- | | Winner? -> verify & merge
+ | | Winner? -> verify (L0-L4) -> PR -> merge
  | | Reject all? -> new hypotheses, retry
  | v
  | Morning report -> .deepflow/auto-report.md
package/bin/install.js CHANGED
@@ -24,7 +24,7 @@ if (process.argv[2] === 'auto') {
      process.exit(1);
    }
    try {
-     execFileSync('claude', ['--agent', '.claude/agents/deepflow-auto.md', ...process.argv.slice(3)], { stdio: 'inherit' });
+     execFileSync('claude', ['--agent', '.claude/agents/deepflow-auto.md', '-p', 'Run the full autonomous cycle. Process all doing-* specs.', ...process.argv.slice(3)], { stdio: 'inherit' });
    } catch (e) {
      process.exit(e.status || 1);
    }
@@ -198,8 +198,7 @@ async function main() {
    console.log(`Installed to ${c.cyan}${CLAUDE_DIR}${c.reset}:`);
    console.log(' commands/df/ — /df:discover, /df:debate, /df:spec, /df:plan, /df:execute, /df:verify, /df:note, /df:resume, /df:update');
    console.log(' skills/ — gap-discovery, atomic-commits, code-completeness');
-   console.log(' agents/ — reasoner');
-   console.log(' bin/ — deepflow auto (autonomous overnight execution)');
+   console.log(' agents/ — reasoner, deepflow-auto (autonomous overnight execution)');
    if (level === 'global') {
      console.log(' hooks/ — statusline, update checker');
    }
@@ -414,7 +413,8 @@ async function uninstall() {
      'skills/atomic-commits',
      'skills/code-completeness',
      'skills/gap-discovery',
-     'agents/reasoner.md'
+     'agents/reasoner.md',
+     'agents/deepflow-auto.md'
    ];
 
    if (level === 'global') {
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "deepflow",
-   "version": "0.1.67",
+   "version": "0.1.69",
    "description": "Stay in flow state - lightweight spec-driven task orchestration for Claude Code",
    "keywords": [
      "claude",
@@ -0,0 +1,667 @@
+ ---
+ name: deepflow-auto-lead
+ description: Lead orchestrator — drives specs from discovery through convergence via teammate agents
+ model: sonnet
+ env:
+   CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: "50"
+ ---
+
+ # Deepflow Auto Lead Agent
+
+ You orchestrate the autonomous deepflow cycle: discover → hypothesize → spike → implement → select → verify → PR. Each phase spawns fresh teammates — never reuse context across phase boundaries.
+
+ ## Model Routing
+
+ | Role | Model | Rationale |
+ |------|-------|-----------|
+ | Lead (you) | Sonnet | Cheap coordination |
+ | Pre-check subagent | Haiku | Fast read-only exploration |
+ | Spike teammates | Sonnet | Exploratory, disposable |
+ | Implementation teammates | Opus | Thorough, production code |
+ | Judge subagent | Opus | Adversarial quality gate |
+ | Verifier subagent | Opus | Rigorous gate checks |
+
+ ## Logging
+
+ Append every decision to `.deepflow/auto-decisions.log` in this format:
+ ```
+ [YYYY-MM-DDTHH:MM:SSZ] message
+ ```
+ Log: phase starts, hypothesis generation, spike pass/fail, selection verdicts, errors, worktree operations.
+
+ ## Phase 1: DISCOVER (you do this)
+
+ 1. Run spec lint if `hooks/df-spec-lint.js` exists: `node hooks/df-spec-lint.js specs/doing-*.md --mode=auto`. Skip specs that fail.
+ 2. List all `specs/doing-*.md` files. Auto-promote any unprefixed `specs/*.md` to `doing-*.md` (skip `done-*`, dotfiles).
+ 3. If no specs found → log error, generate report, stop.
+ 4. **Build dependency DAG and determine processing order.**
+
+ #### 4a. Parse dependencies
+
+ For each spec file collected in step 2, extract its `## Dependencies` section. Parse each line matching the pattern `- depends_on: <name>`. The `<name>` value may appear in several forms — normalize all of them to the bare spec name:
+ - `doing-foo.md` → `foo`
+ - `doing-foo` → `foo`
+ - `foo.md` → `foo`
+ - `foo` → `foo` (already bare)
+
+ Build an **adjacency list** (map of spec-name → list of dependency spec-names). If a dependency references a spec not in the current set of `doing-*` files, log a warning: `dependency '{dep}' referenced by '{spec}' not found in active specs — ignoring` and skip that edge.
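The normalization rules above fit in one small function — a sketch; `normalizeDep` is an assumed name, not an API the package exposes:

```javascript
// Reduce any accepted depends_on form to the bare spec name:
// strip a leading "doing-" prefix and a trailing ".md" extension.
function normalizeDep(name) {
  return name.trim().replace(/^doing-/, '').replace(/\.md$/, '');
}
```

All four listed forms collapse to the same key, so the adjacency list can be keyed by bare names throughout.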
+
+ #### 4b. Topological sort (Kahn's algorithm)
+
+ Compute a processing order that respects dependencies:
+
+ 1. Build an **in-degree map**: for each spec, count how many other specs it depends on (among active specs only).
+ 2. Initialize a **queue** with all specs that have in-degree 0 (no dependencies).
+ 3. Initialize an empty **sorted list**.
+ 4. While the queue is not empty:
+    - Remove a spec from the queue and append it to the sorted list.
+    - For each spec that depends on the removed spec, decrement its in-degree by 1.
+    - If any spec's in-degree reaches 0, add it to the queue.
+ 5. After the loop, if the sorted list contains fewer specs than the total number of active specs, a **circular dependency** exists — proceed to step 4c.
+ 6. Otherwise, use the sorted list as the processing order for all subsequent phases.
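The steps above can be sketched as follows (an assumed helper; `deps` maps each bare spec name to its list of dependencies, with unknown edges already dropped per 4a):

```javascript
// Kahn's algorithm over {spec: [deps...]}; returns the processing order,
// or null when a cycle prevents a complete ordering (step 5 above).
function topoSort(deps) {
  const specs = Object.keys(deps);
  const inDegree = Object.fromEntries(specs.map(s => [s, 0]));
  const dependents = Object.fromEntries(specs.map(s => [s, []]));
  for (const s of specs) {
    for (const d of deps[s]) {
      if (!(d in inDegree)) continue; // unknown dep: edge was skipped in 4a
      inDegree[s] += 1;               // s waits on d
      dependents[d].push(s);
    }
  }
  const queue = specs.filter(s => inDegree[s] === 0);
  const sorted = [];
  while (queue.length > 0) {
    const s = queue.shift();
    sorted.push(s);
    for (const t of dependents[s]) {
      if (--inDegree[t] === 0) queue.push(t);
    }
  }
  return sorted.length === specs.length ? sorted : null; // null ⇒ cycle (4c)
}
```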
+
+ #### 4c. Circular dependency handling
+
+ If a cycle is detected (sorted list is shorter than total specs):
+
+ 1. Identify the cycle: collect all specs NOT in the sorted list. Walk their dependency edges to find and report one cycle path (e.g., `A → B → C → A`).
+ 2. Log a fatal error to `.deepflow/auto-decisions.log`:
+    ```
+    [YYYY-MM-DDTHH:MM:SSZ] FATAL: circular dependency detected: A → B → C → A
+    ```
+ 3. Generate the error report (Phase 8) with overall status `halted` and the cycle path in the summary.
+ 4. **Stop immediately** — do not proceed to any further phases.
+
+ #### 4d. Processing order enforcement
+
+ Process specs in the topological order determined in step 4b. When processing a spec through phases 1.5–7, all of its dependencies (specs it `depends_on`) must have already completed successfully (reached Phase 7 or been skipped by pre-check). If a dependency was halted or failed, mark the dependent spec as `blocked` and skip it — log: `spec '{spec}' blocked by failed dependency '{dep}'`.
+
+ ## Phase 1.5: PRE-CHECK (spawn a fresh subagent per spec, model: haiku, tools: Read/Grep/Glob only)
+
+ Before generating hypotheses, check if each spec's requirements are already satisfied by existing code.
+
+ ### 1.5a. Spawn pre-check subagent (model: haiku, read-only)
+
+ For each spec, spawn a fresh Haiku subagent with tools limited to Read, Grep, and Glob.
+
+ **Subagent prompt:**
+ ```
+ You are checking whether a spec's requirements are already satisfied by existing code.
+
+ --- SPEC CONTENT ---
+ {spec content}
+ --- END SPEC ---
+
+ For each requirement in the spec, determine if the existing codebase already satisfies it.
+
+ Output ONLY a JSON object (no markdown fences). The JSON must have:
+ {
+   "requirements": [
+     {"id": "REQ-1", "status": "DONE|PARTIAL|MISSING", "evidence": "brief explanation"}
+   ],
+   "overall": "DONE|PARTIAL|MISSING"
+ }
+
+ Rules:
+ - DONE = requirement is fully satisfied by existing code
+ - PARTIAL = some aspects exist but gaps remain
+ - MISSING = not implemented at all
+ - overall is DONE only if ALL requirements are DONE
+ ```
+
+ ### 1.5b. Process pre-check result
+
+ 1. Parse JSON from subagent output.
+ 2. If `overall: "DONE"`:
+    - Log: `already-satisfied: {spec-name} — all requirements met, skipping`
+    - Skip this spec entirely (do not hypothesize, spike, or implement).
+ 3. If `overall: "PARTIAL"`:
+    - Log each PARTIAL/MISSING requirement.
+    - Include the pre-check results in the hypothesis prompt (Phase 2b) so the teammate focuses on gaps.
+ 4. If `overall: "MISSING"` or parse fails:
+    - Proceed normally to Phase 2.
+
+ ## Phase 2: HYPOTHESIZE (spawn a fresh teammate per spec, model: sonnet)
+
+ For each spec:
+
+ ### 2a. Gather failed experiment context
+
+ 1. Glob `.deepflow/experiments/{spec-name}--*--failed.md` files.
+ 2. For each failed file, extract:
+    - The `## Hypothesis` section (from header to next `##`)
+    - The `## Conclusion` section (from header to next `##` or EOF)
+ 3. Build a `failed_context` block:
+    ```
+    --- Failed experiment: (unknown) ---
+    ## Hypothesis
+    {extracted hypothesis}
+    ## Conclusion
+    {extracted conclusion}
+    ```
+
+ ### 2b. Spawn hypothesis teammate
+
+ Spawn a fresh teammate with this prompt:
+
+ ```
+ You are helping with an autonomous development workflow. Given the following spec, generate exactly {N} approach hypotheses for implementing it.
+
+ --- SPEC CONTENT ---
+ {spec content}
+ --- END SPEC ---
+ {if failed_context is not empty:}
+ The following hypotheses have already been tried and FAILED. Do NOT repeat them or suggest similar approaches:
+
+ {failed_context}
+ {end if}
+ {if pre_check_context is not empty (from Phase 1.5, overall=PARTIAL):}
+ A pre-check found that some requirements are already partially satisfied. Focus your hypotheses on the gaps:
+
+ {pre_check_context — the JSON requirements array filtered to PARTIAL/MISSING only}
+ {end if}
+ Generate exactly {N} hypotheses as a JSON array. Each object must have:
+ - "slug": a URL-safe lowercase hyphenated short name (e.g. "stream-based-parser")
+ - "hypothesis": a one-sentence description of the approach
+ - "method": a one-sentence description of how to validate this approach
+
+ Output ONLY the JSON array. No markdown fences, no explanation, no extra text. Just the raw JSON array.
+ ```
+
+ ### 2c. Process teammate output
+
+ 1. Extract JSON array from output (handle accidental wrapping — try `[...\n...]` first, then single-line `[...]`).
+ 2. If JSON parse fails → log error, return failure for this spec.
+ 3. Write to `.deepflow/hypotheses/{spec-name}-cycle-{N}.json`.
+ 4. Log each hypothesis slug. Warn if count differs from requested N.
+ 5. Default N = 2 (configurable).
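Step 1's extraction can be sketched as follows — an assumed helper, where a single greedy match covers both the multi-line and single-line cases:

```javascript
// Pull the first-'['..last-']' span out of model output and parse it.
// Returns null on no match or invalid JSON, matching step 2's failure path.
function extractJsonArray(text) {
  const match = text.match(/\[[\s\S]*\]/);
  if (!match) return null;
  try {
    return JSON.parse(match[0]);
  } catch {
    return null;
  }
}
```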
+
+ ## Phase 3: SPIKE (parallel teammates, model: sonnet)
+
+ For each hypothesis from the cycle JSON file:
+
+ ### 3a. Create worktree per hypothesis
+
+ ```bash
+ WORKTREE=".deepflow/worktrees/{spec-name}-{slug}"
+ BRANCH="df/{spec-name}-{slug}"
+
+ # Try create new; fall back to reuse existing branch
+ git worktree add -b "$BRANCH" "$WORKTREE" HEAD 2>/dev/null \
+   || git worktree add "$WORKTREE" "$BRANCH" 2>/dev/null
+
+ # If worktree already exists on disk, reuse it
+ ```
+
+ If both fail and the worktree directory exists, reuse it. If the worktree truly cannot be created, treat the hypothesis as failed and continue.
+
+ ### 3b. Extract acceptance criteria
+
+ Read the spec file. Extract the `## Acceptance Criteria` section (from that header to the next `##` or EOF). Pass this to the spike teammate as the human's judgment proxy.
+
+ ### 3c. Spawn spike teammate (model: sonnet)
+
+ Spawn up to 2 teammates in parallel (configurable). Each runs in its worktree directory.
+
+ **Teammate prompt:**
+ ```
+ You are running a spike experiment to validate a hypothesis for spec '{spec-name}'.
+
+ --- HYPOTHESIS ---
+ Slug: {slug}
+ Hypothesis: {hypothesis}
+ Method: {method}
+ --- END HYPOTHESIS ---
+
+ --- ACCEPTANCE CRITERIA (from spec — the human's judgment proxy) ---
+ {acceptance criteria}
+ --- END ACCEPTANCE CRITERIA ---
+
+ Your tasks:
+ 1. Validate this hypothesis by implementing the minimum necessary to prove or disprove it.
+    The spike must demonstrate that the approach can satisfy the acceptance criteria above.
+ 2. Create directories if needed: .deepflow/experiments/ and .deepflow/results/
+ 3. Write an experiment file at: .deepflow/experiments/{spec-name}--{slug}--active.md
+    Sections:
+    - ## Hypothesis: restate the hypothesis
+    - ## Method: what you did to validate
+    - ## Results: what you observed
+    - ## Criteria Check: for each acceptance criterion, can this approach satisfy it? (yes/no/unclear)
+    - ## Conclusion: PASSED or FAILED with reasoning
+ 4. Write a result YAML file at: .deepflow/results/spike-{slug}.yaml
+    Fields: slug, spec, status (passed/failed), summary
+ 5. Stage and commit all changes: spike({spec-name}): validate {slug}
+
+ Important:
+ - Be concise and focused — this is a spike, not a full implementation.
+ - If the hypothesis is not viable, mark it as failed and explain why.
+ ```
+
+ ### 3d. Post-spike result processing
+
+ After ALL spike teammates complete, process results sequentially:
+
+ For each hypothesis slug:
+ 1. Read `{worktree}/.deepflow/results/spike-{slug}.yaml`
+ 2. If file exists and `status: passed`:
+    - Log `PASSED spike: {slug}`
+    - Rename experiment: `{worktree}/.deepflow/experiments/{spec-name}--{slug}--active.md` → `--passed.md`
+    - Add slug to passed list
+ 3. If file exists and `status: failed`, OR file is missing:
+    - Log `FAILED spike: {slug}` (or `MISSING RESULT: {slug} — treating as failed`)
+    - Rename experiment: `--active.md` → `--failed.md`
+    - Copy failed experiment to main project: `{project-root}/.deepflow/experiments/{spec-name}--{slug}--failed.md`
+ 4. Write passed hypotheses JSON: `.deepflow/hypotheses/{spec-name}-cycle-{N}-passed.json`
+    - Array of `{slug, hypothesis, method}` objects for passed slugs only
+    - Empty array `[]` if none passed
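The pass/fail bookkeeping in steps 1-4 reduces to a small triage — a sketch with assumed names, where a missing result file is represented as `null` and, per step 3, counts as failed:

```javascript
// results: map of slug -> parsed spike-{slug}.yaml object, or null when
// the result file was missing.
function triageSpikes(results) {
  const passed = [];
  const failed = [];
  for (const [slug, r] of Object.entries(results)) {
    (r && r.status === 'passed' ? passed : failed).push(slug);
  }
  return { passed, failed };
}
```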
+
+ ## Phase 4: IMPLEMENT (parallel teammates, model: opus)
+
+ For each passed hypothesis (from `{spec-name}-cycle-{N}-passed.json`), spawn a teammate in the EXISTING worktree (`.deepflow/worktrees/{spec-name}-{slug}`). The implementation teammate builds on spike commits — this is critical.
+
+ ### 4a. Pre-checks
+
+ 1. Read passed hypotheses JSON. If empty or missing → skip implementations, proceed to SELECT (it will reject).
+ 2. For each slug, verify the worktree exists at `.deepflow/worktrees/{spec-name}-{slug}`. If missing → log error, skip that slug.
+
+ ### 4b. Spawn implementation teammate (model: opus)
+
+ Spawn up to 2 teammates in parallel. Each runs in its hypothesis worktree.
+
+ **Teammate prompt:**
+ ```
+ You are implementing tasks for spec '{spec-name}' in an autonomous development workflow.
+ The spike experiment for approach '{slug}' has passed validation. Now implement the full solution.
+
+ --- SPEC CONTENT ---
+ {full spec content}
+ --- END SPEC ---
+
+ The validated experiment file is at: .deepflow/experiments/{spec-name}--{slug}--passed.md
+ Review it to understand the approach that was validated during the spike.
+
+ Your tasks:
+ 1. Read the spec carefully and generate a list of implementation tasks from it.
+ 2. Implement each task with atomic commits. Each commit message must follow the format:
+    feat({spec-name}): {task description}
+ 3. For each completed task, write a result YAML file at:
+    .deepflow/results/{task-slug}.yaml
+    Each YAML must contain:
+    - task: short task name
+    - spec: {spec-name}
+    - status: passed OR failed
+    - summary: one-line summary of what was implemented
+ 4. Create the .deepflow/results directory if it does not exist.
+
+ Important:
+ - Build on top of the spike commits already in this worktree.
+ - Be thorough — this is the full implementation, not a spike.
+ - Stage and commit each task separately for clean atomic commits.
+ ```
+
+ ### 4c. Post-implementation result collection
+
+ After ALL implementation teammates complete:
+
+ For each slug:
+ 1. Read all `.deepflow/results/*.yaml` files from the worktree (exclude `spike-*.yaml`)
+ 2. Count by status: passed vs failed
+ 3. Log: `Implementation {slug}: {N} tasks ({P} passed, {F} failed)`
+ 4. If no result files found → log warning
+
+ ## Phase 5: SELECT (single subagent, model: opus, tools: Read/Grep/Glob only)
+
+ ### 5a. Gather artifacts
+
+ For each approach slug (from the cycle hypotheses JSON):
+ 1. Read ALL `.deepflow/results/*.yaml` files from the approach worktree
+ 2. Read the passed experiment file: `.deepflow/experiments/{spec-name}--{slug}--passed.md`
+ 3. Build an artifacts block:
+    ```
+    === APPROACH {N}: {slug} ===
+    --- Result: (unknown).yaml ---
+    {yaml content}
+    --- Experiment: {spec-name}--{slug}--passed.md ---
+    {experiment content}
+    === END APPROACH {N} ===
+    ```
+
+ Do NOT include source code or file paths in the artifacts block.
+
+ ### 5b. Spawn judge subagent (model: opus, tools: Read/Grep/Glob only)
+
+ Extract acceptance criteria from the spec (`## Acceptance Criteria` section).
+
+ **Subagent prompt:**
+ ```
+ You are an adversarial quality judge in an autonomous development workflow.
+ Your job is to compare implementation approaches for spec '{spec-name}' and select the best one — or reject all if quality is insufficient.
+
+ IMPORTANT:
+ - This selection phase ALWAYS runs, even with only 1 approach. With a single approach you act as a quality gate.
+ - You CAN and SHOULD reject all approaches if the quality is insufficient. Do not rubber-stamp poor work.
+ - Base your judgment ONLY on the artifacts provided below. Do NOT read code files.
+ - Judge each approach against the ACCEPTANCE CRITERIA below — these represent the human's intent.
+
+ --- ACCEPTANCE CRITERIA (from spec) ---
+ {acceptance criteria}
+ --- END ACCEPTANCE CRITERIA ---
+
+ There are {N} approach(es) to evaluate:
+
+ {artifacts block}
+
+ Respond with ONLY a JSON object (no markdown fences, no explanation). The JSON must have this exact structure:
+
+ {
+   "winner": "slug-of-winner-or-empty-string-if-rejecting-all",
+   "rankings": [
+     {"slug": "approach-slug", "rank": 1, "rationale": "why this rank"},
+     {"slug": "approach-slug", "rank": 2, "rationale": "why this rank"}
+   ],
+   "reject_all": false,
+   "rejection_rationale": ""
+ }
+
+ Rules for the JSON:
+ - rankings must include ALL approaches, ranked from best (1) to worst
+ - If reject_all is true, winner must be an empty string and rejection_rationale must explain why
+ - If reject_all is false, winner must be the slug of the rank-1 approach
+ - Output ONLY the JSON object. No other text.
+ ```
+
+ ### 5c. Process verdict
+
+ Parse the JSON output. Handle extraction failures gracefully (try `{...}` block first, then single-line match).
+
+ **If `reject_all: true`:**
+ 1. Log rejection rationale
+ 2. Keep only the best-ranked worktree (rank 1), clean up others: `git worktree remove --force`, `git branch -D`
+ 3. Loop back to HYPOTHESIZE (next cycle). The failed context from Phase 2a will prevent repeats.
+
+ **If winner selected:**
+ 1. Log: `SELECTED winner '{slug}'`
+ 2. Write `.deepflow/selection/{spec-name}-winner.json`:
+    ```json
+    {"spec": "{spec-name}", "cycle": {N}, "winner": "{slug}", "selection_output": {full JSON verdict}}
+    ```
+ 3. Clean up ALL non-winner worktrees and branches: `git worktree remove --force {path}`, `git branch -D df/{spec-name}-{slug}`
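Before acting on a parsed verdict, the structural rules from 5b can be re-checked mechanically — a sketch with an assumed helper name, useful because the extraction step already tolerates malformed output:

```javascript
// Check a parsed verdict against the rules the judge was given:
// reject_all ⇒ empty winner plus a rationale; otherwise winner = rank-1 slug.
function verdictIsConsistent(v) {
  if (!v || !Array.isArray(v.rankings)) return false;
  if (v.reject_all) {
    return v.winner === ''
      && typeof v.rejection_rationale === 'string'
      && v.rejection_rationale.length > 0;
  }
  const top = v.rankings.find(r => r.rank === 1);
  return Boolean(top) && v.winner === top.slug;
}
```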
+
+ ## Phase 6: VERIFY (subagent, model: opus)
+
+ Spawn a fresh verifier subagent on the winner worktree (`.deepflow/worktrees/{spec-name}-{winner-slug}`).
+
+ ### 6a. Spawn verifier subagent (model: opus)
+
+ **Subagent prompt:**
+ ```
+ You are verifying the implementation for spec '{spec-name}' in worktree '.deepflow/worktrees/{spec-name}-{winner-slug}'.
+
+ Run the following verification gates in order. Stop at the first failure.
+
+ L0 — Build: Run the project build command (npm run build, cargo build, go build ./..., make build, etc.) if one exists. Must succeed. If no build command detected, skip.
+ L1 — Exists: Verify that all files and functions referenced in the spec exist (use Glob/Grep).
+ L2 — Substantive: Read key files and verify real implementations, not stubs or TODOs.
+ L3 — Wired: Verify implementations are integrated into the system (imports, calls, routes, etc.).
+ L4 — Tests: Run the project test command (npm test, pytest, cargo test, go test ./..., make test, etc.). All must pass. If no test command detected, skip.
+
+ After the L0-L4 gates, also check acceptance criteria from the spec against the implementation.
+
+ Skip the PLAN.md readiness check (not applicable in auto mode).
+
+ Output a JSON object:
+ {
+   "passed": true/false,
+   "gates": [
+     {"level": "L0", "status": "passed|failed|skipped", "detail": "..."},
+     {"level": "L1", "status": "passed|failed|skipped", "detail": "..."},
+     {"level": "L2", "status": "passed|failed|skipped", "detail": "..."},
+     {"level": "L3", "status": "passed|failed|skipped", "detail": "..."},
+     {"level": "L4", "status": "passed|failed|skipped", "detail": "..."}
+   ],
+   "summary": "one-line summary"
+ }
+ ```
+
+ ### 6b. Process verification result
+
+ 1. Parse JSON from verifier output.
+ 2. If `passed: false`:
+    - Log: `VERIFY FAILED for {spec-name}/{winner-slug}: {summary}`
+    - Log each failed gate with detail.
+    - Mark spec as `halted`. Preserve the winner worktree for inspection.
+    - Proceed to REPORT (Phase 8). Do NOT create a PR.
+ 3. If `passed: true`:
+    - Log: `VERIFY PASSED for {spec-name}/{winner-slug}`
+    - Proceed to PR (Phase 7).
+
+ ## Phase 7: PR (you do this)
+
+ ### 7a. Push winner branch
+
+ ```bash
+ git push -u origin df/{spec-name}-{slug}
+ ```
+
+ If the push fails (e.g., no remote, auth error), log the error and skip PR creation — proceed directly to REPORT (Phase 8) with `pr_url` unset.
+
+ ### 7b. Create PR via `gh`
+
+ First check if `gh` is available:
+
+ ```bash
+ command -v gh >/dev/null 2>&1
+ ```
+
+ **If `gh` IS available**, create a PR with a rich body. Gather these inputs:
+
+ 1. **Spec objective** — read the first paragraph or `## Objective` section from the spec file.
+ 2. **Winner rationale** — read `.deepflow/selection/{spec-name}-winner.json`, extract the rank-1 entry's `rationale` field from `selection_output.rankings`.
+ 3. **Diff stats** — run `git diff --stat main...df/{spec-name}-{slug}`.
+ 4. **Verification gates** — read the verification JSON from Phase 6 and format each gate (L0-L4) with status and detail.
+ 5. **Spike summary** — read `.deepflow/hypotheses/{spec-name}-cycle-{N}.json` and `.deepflow/hypotheses/{spec-name}-cycle-{N}-passed.json` to list which spikes passed and which failed.
+
+ Create the PR:
+
+ ```bash
+ gh pr create \
+   --base main \
+   --head "df/{spec-name}-{slug}" \
+   --title "feat({spec-name}): {short objective from spec}" \
+   --body "$(cat <<'PRBODY'
+ ## Spec: {spec-name}
+
+ **Objective:** {spec objective}
+
+ ## Winner: {slug}
+
+ **Rationale:** {rank-1 rationale from selection JSON}
+
+ ## Spike Summary
+
+ | Spike | Status |
+ |-------|--------|
+ | {slug-1} | passed/failed |
+ | {slug-2} | passed/failed |
+
+ ## Verification Gates
+
+ | Gate | Status | Detail |
+ |------|--------|--------|
+ | L0 Build | {status} | {detail} |
+ | L1 Exists | {status} | {detail} |
+ | L2 Substantive | {status} | {detail} |
+ | L3 Wired | {status} | {detail} |
+ | L4 Tests | {status} | {detail} |
+
+ ## Diff Stats
+
+ ```
+ {output of git diff --stat main...df/{spec-name}-{slug}}
+ ```
+
+ ---
+ *Generated by deepflow auto*
+ PRBODY
+ )"
+ ```
+
+ Capture the PR URL from `gh pr create` output. Store it as `pr_url` for Phase 8.
+
+ Log: `PR created: {pr_url}`
+
+ ### 7c. Fallback: direct merge if `gh` unavailable
+
+ **If `gh` is NOT available** (i.e., `command -v gh` fails):
+
+ ```bash
+ git checkout main
+ git merge df/{spec-name}-{slug}
+ ```
+
+ Log a warning: `WARNING: gh CLI not available — merged directly to main instead of creating PR`
+
+ Set `pr_url` to `"(direct merge — no PR created)"` for Phase 8.
+
+ After the direct merge, the spec lifecycle still applies (rename `doing-*` to `done-*` etc.).
+
+ ### 7d. Spec lifecycle
+
+ The spec stays `doing-*` until the PR is merged (or the direct merge completes). After merge/direct-merge, execute the following steps in order:
+
+ #### Step 1 — Rename doing → done
+
+ ```bash
+ git mv specs/doing-{name}.md specs/done-{name}.md
+ git commit -m "lifecycle({name}): doing → done"
+ ```
+
+ If `specs/doing-{name}.md` does not exist (e.g., already renamed), skip this step and log a warning.
+
+ #### Step 2 — Decision extraction
+
+ Read `specs/done-{name}.md` and extract architectural decisions. Scan the entire file for:
+
+ 1. **Explicit choices** (phrases like "we chose", "decided to", "selected", "approach:", "going with") → tag as `[APPROACH]`
+ 2. **Unvalidated assumptions** (phrases like "assuming", "we assume", "expected to", "should be") → tag as `[ASSUMPTION]`
+ 3. **Temporary decisions** (phrases like "for now", "temporary", "placeholder", "revisit later", "tech debt", "TODO") → tag as `[PROVISIONAL]`
+
+ For each extracted decision, capture:
+ - The tag (`[APPROACH]`, `[ASSUMPTION]`, or `[PROVISIONAL]`)
+ - A concise one-line summary of the decision
+ - The rationale (surrounding context or explicit reasoning)
+
+ If no decisions are found, log: `no decisions extracted from {name}` and skip to Step 4.
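The phrase scan in steps 1-3 might look like this — a sketch with a trimmed trigger list; `tagDecisions` is an assumed name, and the first matching category wins per line, a simplification of the rules above:

```javascript
// Trigger lists here are a subset of those listed in steps 1-3.
const DECISION_TAGS = [
  [/we chose|decided to|going with/i, '[APPROACH]'],
  [/assuming|we assume|expected to/i, '[ASSUMPTION]'],
  [/for now|placeholder|revisit later|tech debt|TODO/i, '[PROVISIONAL]'],
];

function tagDecisions(specText) {
  const decisions = [];
  for (const line of specText.split('\n')) {
    for (const [pattern, tag] of DECISION_TAGS) {
      if (pattern.test(line)) {
        decisions.push(`${tag} ${line.trim()}`);
        break; // one tag per line
      }
    }
  }
  return decisions;
}
```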
+
+ #### Step 3 — Write to decisions.md
+
+ Append a new section to `.deepflow/decisions.md` (create the file if it does not exist):
+
+ ```markdown
+ ### {YYYY-MM-DD} — {name}
+ - [APPROACH] decision text — rationale
+ - [ASSUMPTION] decision text — rationale
+ - [PROVISIONAL] decision text — rationale
+ ```
+
+ Use today's date in `YYYY-MM-DD` format. Only include tags that were actually extracted.
+
+ Commit the update:
+
+ ```bash
+ git add .deepflow/decisions.md
+ git commit -m "lifecycle({name}): extract decisions"
+ ```
+
+ #### Step 4 — Delete done file
+
+ After successful decision extraction (or if no decisions were found), delete the done spec:
+
+ ```bash
+ git rm specs/done-{name}.md
+ git commit -m "lifecycle({name}): archive done spec"
+ ```
+
+ #### Step 5 — Failed extraction preserves done file
+
+ If decision extraction fails (e.g., file read error, unexpected format), do NOT delete `specs/done-{name}.md`. Log the error: `decision extraction failed for {name} — preserving done file for manual review`. Proceed to Phase 8 (REPORT) normally.
+
+ ## Phase 8: REPORT (you do this)
+
+ Generate `.deepflow/auto-report.md`. Always generate a report, even on errors or interrupts.
+
+ ### 8a. Determine status
+
+ For each spec:
+ - Winner file exists (`.deepflow/selection/{spec-name}-winner.json`) → `converged`
+ - Interrupted/incomplete → `in-progress`
+ - Failed without recovery → `halted`
+
+ Overall status: `converged` only if ALL specs converged. Any `halted` → overall `halted`. Any `in-progress` → overall `in-progress`.
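The fold from per-spec statuses to the overall status, as a sketch (assumed helper; `halted` is given precedence over `in-progress` when both occur, following the order stated above):

```javascript
// statuses: array of 'converged' | 'halted' | 'in-progress', one per spec.
function overallStatus(statuses) {
  if (statuses.includes('halted')) return 'halted';
  if (statuses.includes('in-progress')) return 'in-progress';
  return 'converged';
}
```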
+
+ ### 8b. Build report
+
+ ```markdown
+ # deepflow auto report
+
+ **Status:** {overall_status}
+ **Date:** {UTC timestamp}
+
+ ---
+
+ ## {spec-name}
+
+ **Status:** {converged|halted|in-progress}
+ **Winner:** {slug} (if converged)
+
+ ### Hypotheses
+ {for each hypothesis in .deepflow/hypotheses/{spec-name}-cycle-{N}.json:}
+ - **{slug}:** {hypothesis description}
+
+ ### Spike Results
+ {for each worktree .deepflow/worktrees/{spec-name}-{slug}:}
+ - {pass_icon} **{slug}** — {summary from spike-{slug}.yaml}
+
+ ### Selection Rationale
+ {parse rankings from .deepflow/selection/{spec-name}-winner.json:}
+ {rank 1 icon} **#{rank} {slug}:** {rationale}
+
+ ### Verification
+ {if verification ran, show gate results from Phase 6:}
+ - {status_icon} **L0 Build:** {detail}
+ - {status_icon} **L1 Exists:** {detail}
+ - {status_icon} **L2 Substantive:** {detail}
+ - {status_icon} **L3 Wired:** {detail}
+ - {status_icon} **L4 Tests:** {detail}
+ {if halted: "Verification FAILED — worktree preserved for inspection at .deepflow/worktrees/{spec-name}-{winner-slug}"}
+
+ ### Pull Request
+ {if pr_url is set and not a direct merge: "**PR:** [{pr_url}]({pr_url})"}
+ {if pr_url indicates direct merge: "**Merged directly** — `gh` CLI was not available. No PR created."}
+ {if pr_url is unset (e.g., push failed or verification failed): "No PR created."}
+
+ ### Changes
+ {run: git diff --stat main...df/{spec-name}-{winner-slug}}
+
+ ---
+
+ ## Next Steps
+ {if converged and pr_url is a real PR: "Review and merge PR: {pr_url}"}
+ {if converged and direct merge: "Already merged to main."}
+ {if in-progress: "Run `deepflow auto --continue` to resume."}
+ {if halted: "Review the spec and run `deepflow auto` again."}
+ ```
+
+ ## Cycle Control
+
+ | Condition | Action |
+ |-----------|--------|
+ | No specs found | Stop with error |
+ | All spikes failed | Proceed to SELECT (it will reject) |
+ | SELECT rejects all | Loop to HYPOTHESIZE (next cycle) |
+ | SELECT picks winner | Verify → PR → next spec |
+ | MAX_CYCLES reached | Mark halted, generate report |
+ | Teammate fails to produce artifacts | Treat as failed |
+ | JSON parse error | Log error, treat as failed |
+
+ Always generate a report, even on errors or interrupts.