deepflow 0.1.24 → 0.1.26

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "deepflow",
-   "version": "0.1.24",
+   "version": "0.1.26",
    "description": "Stay in flow state - lightweight spec-driven task orchestration for Claude Code",
    "keywords": [
      "claude",
@@ -24,8 +24,12 @@ Implement tasks from PLAN.md with parallel agents, atomic commits, and context-e
 
  ## Skills & Agents
  - Skill: `atomic-commits` — Clean commit protocol
- - Agent: `general-purpose` (Sonnet) — Task implementation
- - Agent: `reasoner` (Opus) Debugging failures
+
+ **Use Task tool to spawn agents:**
+ | Agent | subagent_type | model | Purpose |
+ |-------|---------------|-------|---------|
+ | Implementation | `general-purpose` | `sonnet` | Task implementation |
+ | Debugger | `reasoner` | `opus` | Debugging failures |
 
  ## Context-Aware Execution
 
@@ -82,20 +86,93 @@ If missing: "No PLAN.md found. Run /df:plan first."
 
  Warn if `specs/*.md` (excluding doing-/done-) exist. Non-blocking.
 
- ### 4. IDENTIFY READY TASKS
+ ### 4. CHECK EXPERIMENT STATUS (HYPOTHESIS VALIDATION)
+
+ **Before identifying ready tasks**, check experiment validation for full implementation tasks.
+
+ **Task Types:**
+ - **Spike tasks**: Have `[SPIKE]` in title OR `Type: spike` in description — always executable
+ - **Full implementation tasks**: Blocked by spike tasks — require validated experiment
+
+ **Validation Flow:**
+
+ ```
+ For each task in plan:
+   If task is spike task:
+     → Mark as executable (spikes are always allowed)
+   Else if task is blocked by a spike task (T{n}):
+     → Find related experiment file in .deepflow/experiments/
+     → Check experiment status:
+       - --passed.md exists → Unblock, proceed with implementation
+       - --failed.md exists → Keep blocked, warn user
+       - --active.md exists → Keep blocked, spike in progress
+       - No experiment → Keep blocked, spike not started
+ ```
+
+ **Experiment File Discovery:**
+
+ ```
+ Glob: .deepflow/experiments/{topic}--*--{status}.md
 
- Ready = `[ ]` + all `blocked_by` complete + not in checkpoint.
+ Topic extraction:
+ 1. From spike task: experiment file path in task description
+ 2. From spec name: doing-{topic} → {topic}
+ 3. Fuzzy match: normalize and match
+ ```
 
- ### 5. SPAWN AGENTS
+ **Status Handling:**
+
+ | Experiment Status | Task Status | Action |
+ |-------------------|-------------|--------|
+ | `--passed.md` | Ready | Execute full implementation |
+ | `--failed.md` | Blocked | Skip, warn: "Experiment failed, re-plan needed" |
+ | `--active.md` | Blocked | Skip, info: "Waiting for spike completion" |
+ | Not found | Blocked | Skip, info: "Spike task not executed yet" |
+
+ **Warning Output:**
+
+ ```
+ ⚠ T3 blocked: Experiment 'upload--streaming--failed.md' did not validate
+ → Run /df:plan to generate new hypothesis spike
+ ```
+
+ ### 5. IDENTIFY READY TASKS
+
+ Ready = `[ ]` + all `blocked_by` complete + experiment validated (if applicable) + not in checkpoint.
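The readiness rule can be expressed as a small predicate; a sketch only, where the task dict shape (`id`, `checked`, `blocked_by`, `spike`, `topic`) is a hypothetical representation of a PLAN.md entry.

```python
def is_ready(task: dict, done: set, validated_topics: set, checkpointed: set) -> bool:
    # Unchecked box, all blocked_by complete, experiment validated for
    # non-spike tasks, and not parked in a checkpoint.
    if task.get("checked") or task["id"] in checkpointed:
        return False
    if any(dep not in done for dep in task.get("blocked_by", [])):
        return False
    topic = task.get("topic")
    if not task.get("spike") and topic is not None and topic not in validated_topics:
        return False
    return True
```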
+
+ ### 6. SPAWN AGENTS
 
  Context ≥50%: checkpoint and exit.
 
- Spawn all ready tasks in ONE message (parallel). Same-file conflicts: sequential.
+ **Use Task tool to spawn all ready tasks in ONE message (parallel):**
+ ```
+ Task tool parameters for each task:
+ - subagent_type: "general-purpose"
+ - model: "sonnet"
+ - run_in_background: true
+ - prompt: "{task details from PLAN.md}"
+ ```
 
- On failure: spawn `reasoner`.
+ Same-file conflicts: spawn sequentially instead.
 
- ### 6. PER-TASK (agent prompt)
+ **Spike Task Execution:**
+ When spawning a spike task, the agent MUST:
+ 1. Execute the minimal validation method
+ 2. Record result in experiment file (update status: `--passed.md` or `--failed.md`)
+ 3. If passed: implementation tasks become unblocked
+ 4. If failed: record conclusion with "next hypothesis" for future planning
 
+ **On failure, use Task tool to spawn reasoner:**
+ ```
+ Task tool parameters:
+ - subagent_type: "reasoner"
+ - model: "opus"
+ - prompt: "Debug failure: {error details}"
+ ```
+
+ ### 7. PER-TASK (agent prompt)
+
+ **Standard Task:**
  ```
  {task_id}: {description from PLAN.md}
  Files: {target files}
@@ -105,14 +182,38 @@ Implement, test, commit as feat({spec}): {description}.
  Write result to .deepflow/results/{task_id}.yaml
  ```
 
- ### 7. COMPLETE SPECS
+ **Spike Task:**
+ ```
+ {task_id} [SPIKE]: {hypothesis}
+ Type: spike
+ Method: {minimal steps to validate}
+ Success criteria: {how to know it passed}
+ Time-box: {duration}
+ Experiment file: {.deepflow/experiments/{topic}--{hypothesis}--active.md}
+ Spec: {spec_name}
+
+ Execute the minimal validation:
+ 1. Follow the method steps exactly
+ 2. Measure against success criteria
+ 3. Update experiment file with result:
+    - If passed: rename to --passed.md, record findings
+    - If failed: rename to --failed.md, record conclusion with "next hypothesis"
+ 4. Commit as spike({spec}): validate {hypothesis}
+ 5. Write result to .deepflow/results/{task_id}.yaml
+
+ Result status:
+ - success = hypothesis validated (passed)
+ - failed = hypothesis invalidated (failed experiment, NOT agent error)
+ ```
+
+ ### 8. COMPLETE SPECS
 
  When all tasks done for a `doing-*` spec:
  1. Embed history in spec: `## Completed` section
  2. Rename: `doing-upload.md` → `done-upload.md`
  3. Remove section from PLAN.md
 
- ### 8. ITERATE
+ ### 9. ITERATE
 
  Repeat until: all done, all blocked, or checkpoint.
 
@@ -126,6 +227,8 @@ Repeat until: all done, all blocked, or checkpoint.
 
  ## Example
 
+ ### Standard Execution
+
  ```
  /df:execute (context: 12%)
 
@@ -140,7 +243,52 @@ Wave 2: T3 (context: 48%)
  ✓ Complete: 3/3 tasks
  ```
 
- With checkpoint:
+ ### Spike-First Execution
+
+ ```
+ /df:execute (context: 10%)
+
+ Checking experiment status...
+ T1 [SPIKE]: No experiment yet, spike executable
+ T2: Blocked by T1 (spike not validated)
+ T3: Blocked by T1 (spike not validated)
+
+ Wave 1: T1 [SPIKE] (context: 20%)
+ T1: success (abc1234) → upload--streaming--passed.md
+
+ Checking experiment status...
+ T2: Experiment passed, unblocked
+ T3: Experiment passed, unblocked
+
+ Wave 2: T2, T3 parallel (context: 45%)
+ T2: success (def5678)
+ T3: success (ghi9012)
+
+ ✓ doing-upload → done-upload
+ ✓ Complete: 3/3 tasks
+ ```
+
+ ### Spike Failed
+
+ ```
+ /df:execute (context: 10%)
+
+ Wave 1: T1 [SPIKE] (context: 20%)
+ T1: failed → upload--streaming--failed.md
+
+ Checking experiment status...
+ T2: ⚠ Blocked - Experiment failed
+ T3: ⚠ Blocked - Experiment failed
+
+ ⚠ Spike T1 invalidated hypothesis
+ Experiment: upload--streaming--failed.md
+ → Run /df:plan to generate new hypothesis spike
+
+ Complete: 1/3 tasks (2 blocked by failed experiment)
+ ```
+
+ ### With Checkpoint
+
  ```
  Wave 1 complete (context: 52%)
  Checkpoint saved. Run /df:execute --continue
@@ -42,21 +42,33 @@ Determine source_dir from config or default to src/
 
  If no new specs: report counts, suggest `/df:execute`.
 
- ### 2. CHECK PAST EXPERIMENTS
+ ### 2. CHECK PAST EXPERIMENTS (SPIKE-FIRST)
 
- Extract domains from spec (perf, auth, api, etc.), then:
+ **CRITICAL**: Check experiments BEFORE generating any tasks.
+
+ Extract topic from spec name (fuzzy match), then:
 
  ```
- Glob .deepflow/experiments/{domain}--*
+ Glob .deepflow/experiments/{topic}--*
  ```
 
+ **Experiment file naming:** `{topic}--{hypothesis}--{status}.md`
+ Statuses: `active`, `passed`, `failed`
+
  | Result | Action |
  |--------|--------|
- | `--failed.md` | Exclude approach, note why |
- | `--success.md` | Reference as pattern |
- | No matches | Continue (expected for new projects) |
+ | `--failed.md` exists | Extract "next hypothesis" from Conclusion section |
+ | `--passed.md` exists | Reference as validated pattern, can proceed to full implementation |
+ | `--active.md` exists | Wait for experiment completion before planning |
+ | No matches | New topic, needs initial spike |
+
+ **Spike-First Rule**:
+ - If `--failed.md` exists: Generate spike task to test the next hypothesis (from failed experiment's Conclusion)
+ - If no experiments exist: Generate spike task for the core hypothesis
+ - Full implementation tasks are BLOCKED until a spike validates the approach
+ - Only proceed to full task generation after `--passed.md` exists
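The spike-first rule reduces to a three-way decision on a topic's known experiment statuses. A minimal sketch; `plan_mode` and the returned action labels are hypothetical names, not part of the command spec.

```python
def plan_mode(statuses: set) -> str:
    # Map a topic's experiment statuses to the spike-first planning action.
    if "passed" in statuses:
        return "implement"  # validated: safe to generate the full task list
    if "active" in statuses:
        return "wait"       # spike in progress: hold off on planning
    return "spike"          # failed or no experiments: generate a spike task
```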
 
- **Naming:** `{domain}--{approach}--{result}.md`
+ See: `templates/experiment-template.md` for experiment format
 
  ### 3. DETECT PROJECT CONTEXT
 
@@ -69,7 +81,15 @@ Include patterns in task descriptions for agents to follow.
 
  ### 4. ANALYZE CODEBASE
 
- **Spawn Explore agents** (haiku, read-only) with dynamic count:
+ **Use Task tool to spawn Explore agents in parallel:**
+ ```
+ Task tool parameters:
+ - subagent_type: "Explore"
+ - model: "haiku"
+ - run_in_background: true (for parallel execution)
+ ```
+
+ Scale agent count based on codebase size:
 
  | File Count | Agents |
  |------------|--------|
@@ -78,6 +98,34 @@ Include patterns in task descriptions for agents to follow.
  | 100-500 | 25-40 |
  | 500+ | 50-100 (cap) |
 
+ **Explore Agent Prompt Structure:**
+ ```
+ Find: [specific question]
+ Return ONLY:
+ - File paths matching criteria
+ - One-line description per file
+ - Integration points (if asked)
+
+ DO NOT:
+ - Read or summarize spec files
+ - Make recommendations
+ - Propose solutions
+ - Generate tables or lengthy explanations
+
+ Max response: 500 tokens (configurable via .deepflow/config.yaml explore.max_tokens)
+ ```
+
+ **Explore Agent Scope Restrictions:**
+ - MUST only report factual findings:
+   - Files found
+   - Patterns/conventions observed
+   - Integration points
+ - MUST NOT:
+   - Make recommendations
+   - Propose architectures
+   - Read and summarize specs (that's orchestrator's job)
+   - Draw conclusions about what should be built
+
  **Use `code-completeness` skill patterns** to search for:
  - Implementations matching spec requirements
  - TODO, FIXME, HACK comments
@@ -86,7 +134,14 @@ Include patterns in task descriptions for agents to follow.
 
  ### 5. COMPARE & PRIORITIZE
 
- **Spawn `reasoner` agent** (Opus) for analysis:
+ **Use Task tool to spawn reasoner agent:**
+ ```
+ Task tool parameters:
+ - subagent_type: "reasoner"
+ - model: "opus"
+ ```
+
+ Reasoner performs analysis:
 
  | Status | Action |
  |--------|--------|
@@ -102,7 +157,36 @@ Include patterns in task descriptions for agents to follow.
  2. Impact — core features before enhancements
  3. Risk — unknowns early
 
- ### 6. VALIDATE HYPOTHESES
+ ### 6. GENERATE SPIKE TASKS (IF NEEDED)
+
+ **When to generate spike tasks:**
+ 1. Failed experiment exists → Test the next hypothesis
+ 2. No experiments exist → Test the core hypothesis
+ 3. Passed experiment exists → Skip to full implementation
+
+ **Spike Task Format:**
+ ```markdown
+ - [ ] **T1** [SPIKE]: Validate {hypothesis}
+   - Type: spike
+   - Hypothesis: {what we're testing}
+   - Method: {minimal steps to validate}
+   - Success criteria: {how to know it passed}
+   - Time-box: 30 min
+   - Files: .deepflow/experiments/{topic}--{hypothesis}--{status}.md
+   - Blocked by: none
+ ```
+
+ **Blocking Logic:**
+ - All implementation tasks MUST have `Blocked by: T{spike}` until spike passes
+ - After spike completes:
+   - If passed: Update experiment to `--passed.md`, unblock implementation tasks
+   - If failed: Update experiment to `--failed.md`, DO NOT generate implementation tasks
+
+ **Full Implementation Only After Spike:**
+ - Only generate full task list when spike validates the approach
+ - Never generate 10-task waterfall without validated hypothesis
+
+ ### 7. VALIDATE HYPOTHESES
 
  Test risky assumptions before finalizing plan.
 
@@ -111,24 +195,27 @@ Test risky assumptions before finalizing plan.
  **Process:**
  1. Prototype in scratchpad (not committed)
  2. Test assumption
- 3. If fails → Write `.deepflow/experiments/{domain}--{approach}--failed.md`
+ 3. If fails → Write `.deepflow/experiments/{topic}--{hypothesis}--failed.md`
  4. Adjust approach, document in task
 
  **Skip:** Well-known patterns, simple CRUD, clear docs exist
 
- ### 7. OUTPUT PLAN.md
+ ### 8. OUTPUT PLAN.md
 
  Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validation findings.
 
- ### 8. RENAME SPECS
+ ### 9. RENAME SPECS
 
  `mv specs/feature.md specs/doing-feature.md`
 
- ### 9. REPORT
+ ### 10. REPORT
 
  `✓ Plan generated — {n} specs, {n} tasks. Run /df:execute`
 
  ## Rules
+ - **Spike-first** — Generate spike task before full implementation if no `--passed.md` experiment exists
+ - **Block on spike** — Full implementation tasks MUST be blocked by spike validation
+ - **Learn from failures** — Extract "next hypothesis" from failed experiments, never repeat same approach
  - **Learn from history** — Check past experiments before proposing approaches
  - **Plan only** — Do NOT implement anything (except quick validation prototypes)
  - **Validate before commit** — Test risky assumptions with minimal experiments
@@ -139,13 +226,64 @@ Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validatio
 
  ## Agent Scaling
 
- | Agent | Base | Scale |
- |-------|------|-------|
- | Explore (search) | 10 | +1 per 20 files |
- | Reasoner (analyze) | 5 | +1 per 2 specs |
+ | Agent | Model | Base | Scale |
+ |-------|-------|------|-------|
+ | Explore (search) | haiku | 10 | +1 per 20 files |
+ | Reasoner (analyze) | opus | 5 | +1 per 2 specs |
+
+ **IMPORTANT**: Always use the `Task` tool with explicit `subagent_type` and `model` parameters. Do NOT use Glob/Grep/Read directly for codebase analysis - spawn agents instead.
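The Explore scaling row can be sketched as a formula; a sketch under assumptions only: it implements "base 10, +1 per 20 files, cap 100" literally, while the bucketed file-count table elsewhere in the doc may round these numbers differently.

```python
def explore_agent_count(file_count: int, base: int = 10,
                        per_files: int = 20, cap: int = 100) -> int:
    # Base of 10 Explore agents plus one per 20 files, capped at 100,
    # per the "+1 per 20 files" scaling row above.
    return min(base + file_count // per_files, cap)
```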
 
  ## Example
 
+ ### Spike-First (No Prior Experiments)
+
+ ```markdown
+ # Plan
+
+ ### doing-upload
+
+ - [ ] **T1** [SPIKE]: Validate streaming upload approach
+   - Type: spike
+   - Hypothesis: Streaming uploads will handle files >1GB without memory issues
+   - Method: Create minimal endpoint, upload 2GB file, measure memory
+   - Success criteria: Memory stays under 500MB during upload
+   - Time-box: 30 min
+   - Files: .deepflow/experiments/upload--streaming--active.md
+   - Blocked by: none
+
+ - [ ] **T2**: Create upload endpoint
+   - Files: src/api/upload.ts
+   - Blocked by: T1 (spike must pass)
+
+ - [ ] **T3**: Add S3 service with streaming
+   - Files: src/services/storage.ts
+   - Blocked by: T1 (spike must pass), T2
+ ```
+
+ ### Spike-First (After Failed Experiment)
+
+ ```markdown
+ # Plan
+
+ ### doing-upload
+
+ - [ ] **T1** [SPIKE]: Validate chunked upload with backpressure
+   - Type: spike
+   - Hypothesis: Adding backpressure control will prevent buffer overflow
+   - Method: Implement pause/resume on buffer threshold, test with 2GB file
+   - Success criteria: No memory spikes above 500MB
+   - Time-box: 30 min
+   - Files: .deepflow/experiments/upload--chunked-backpressure--active.md
+   - Blocked by: none
+   - Note: Previous approach failed (see upload--buffer-upload--failed.md)
+
+ - [ ] **T2**: Implement chunked upload endpoint
+   - Files: src/api/upload.ts
+   - Blocked by: T1 (spike must pass)
+ ```
+
+ ### After Spike Validates (Full Implementation)
+
  ```markdown
  # Plan
@@ -154,10 +292,10 @@ Append tasks grouped by `### doing-{spec-name}`. Include spec gaps and validatio
  - [ ] **T1**: Create upload endpoint
    - Files: src/api/upload.ts
    - Blocked by: none
+   - Note: Use streaming (validated in upload--streaming--passed.md)
 
  - [ ] **T2**: Add S3 service with streaming
    - Files: src/services/storage.ts
    - Blocked by: T1
-   - Note: Use streaming (see experiments/perf--chunked-upload--success.md)
-   - Avoid: Direct buffer upload failed for large files (experiments/perf--buffer-upload--failed.md)
+   - Avoid: Direct buffer upload failed (see upload--buffer-upload--failed.md)
  ```
@@ -20,14 +20,26 @@ Transform conversation context into a structured specification file.
 
  ## Skills & Agents
  - Skill: `gap-discovery` — Proactive requirement gap identification
- - Agent: `Explore` (haiku) — Codebase context gathering
- - Agent: `reasoner` (Opus) Synthesize findings into requirements
+
+ **Use Task tool to spawn agents:**
+ | Agent | subagent_type | model | Purpose |
+ |-------|---------------|-------|---------|
+ | Context | `Explore` | `haiku` | Codebase context gathering |
+ | Synthesizer | `reasoner` | `opus` | Synthesize findings into requirements |
 
  ## Behavior
 
  ### 1. GATHER CODEBASE CONTEXT
 
- **Spawn Explore agents** (haiku, read-only, parallel) to find:
+ **Use Task tool to spawn Explore agents in parallel:**
+ ```
+ Task tool parameters:
+ - subagent_type: "Explore"
+ - model: "haiku"
+ - run_in_background: true
+ ```
+
+ Find:
  - Related existing implementations
  - Code patterns and conventions
  - Integration points relevant to the feature
@@ -39,6 +51,34 @@ Transform conversation context into a structured specification file.
  | 20-100 | 5-8 |
  | 100+ | 10-15 |
 
+ **Explore Agent Prompt Structure:**
+ ```
+ Find: [specific question]
+ Return ONLY:
+ - File paths matching criteria
+ - One-line description per file
+ - Integration points (if asked)
+
+ DO NOT:
+ - Read or summarize spec files
+ - Make recommendations
+ - Propose solutions
+ - Generate tables or lengthy explanations
+
+ Max response: 500 tokens (configurable via .deepflow/config.yaml explore.max_tokens)
+ ```
+
+ **Explore Agent Scope Restrictions:**
+ - MUST only report factual findings:
+   - Files found
+   - Patterns/conventions observed
+   - Integration points
+ - MUST NOT:
+   - Make recommendations
+   - Propose architectures
+   - Read and summarize specs (that's orchestrator's job)
+   - Draw conclusions about what should be built
+
  ### 2. GAP CHECK
  Use the `gap-discovery` skill to analyze conversation + agent findings.
 
@@ -70,7 +110,14 @@ Max 4 questions per tool call. Wait for answers before proceeding.
 
  ### 3. SYNTHESIZE FINDINGS
 
- **Spawn `reasoner` agent** (Opus) to:
+ **Use Task tool to spawn reasoner agent:**
+ ```
+ Task tool parameters:
+ - subagent_type: "reasoner"
+ - model: "opus"
+ ```
+
+ The reasoner will:
  - Analyze codebase context from Explore agents
  - Identify constraints from existing architecture
  - Suggest requirements based on patterns found
@@ -130,10 +177,12 @@ Next: Run /df:plan to generate tasks
 
  ## Agent Scaling
 
- | Agent | Base | Purpose |
- |-------|------|---------|
- | Explore (haiku) | 3-5 | Find related code, patterns |
- | Reasoner (Opus) | 1 | Synthesize into requirements |
+ | Agent | subagent_type | model | Base | Purpose |
+ |-------|---------------|-------|------|---------|
+ | Explore | `Explore` | `haiku` | 3-5 | Find related code, patterns |
+ | Reasoner | `reasoner` | `opus` | 1 | Synthesize into requirements |
+
+ **IMPORTANT**: Always use the `Task` tool with explicit `subagent_type` and `model` parameters.
 
  ## Example
 
@@ -12,7 +12,11 @@ Check that implemented code satisfies spec requirements and acceptance criteria.
 
  ## Skills & Agents
  - Skill: `code-completeness` — Find incomplete implementations
- - Agent: `Explore` (Haiku) — Fast codebase scanning
+
+ **Use Task tool to spawn agents:**
+ | Agent | subagent_type | model | Purpose |
+ |-------|---------------|-------|---------|
+ | Scanner | `Explore` | `haiku` | Fast codebase scanning |
 
  ## Spec File States
 
@@ -87,7 +91,15 @@ Default: L1-L3 (L4 optional, can be slow)
 
  ## Agent Usage
 
- Spawn `Explore` agents (Haiku), 1-2 per spec, cap 10.
+ **Use Task tool to spawn Explore agents:**
+ ```
+ Task tool parameters:
+ - subagent_type: "Explore"
+ - model: "haiku"
+ - run_in_background: true (for parallel)
+ ```
+
+ Scale: 1-2 agents per spec, cap 10.
 
  ## Example
 
@@ -36,6 +36,9 @@ models:
    reason: opus # Complex decisions
    debug: opus # Problem solving
 
+ explore:
+   max_tokens: 500 # Controls Explore agent response length
+
  commits:
    format: "feat({spec}): {description}"
    atomic: true # One task = one commit
@@ -0,0 +1,74 @@
+ # Experiment: {hypothesis-slug}
+
+ > **Filename convention**: `{topic}--{hypothesis-slug}--{status}.md`
+ > Status: `active` | `passed` | `failed`
+
+ ## Topic
+
+ {Spec name or feature area this experiment relates to}
+
+ <!--
+ What problem or feature does this experiment address?
+ Link to relevant spec if applicable.
+ -->
+
+ ## Hypothesis
+
+ {What we believe will work and why}
+
+ <!--
+ Be specific and testable:
+ - "Using approach X will achieve Y because Z"
+ - "The bottleneck is in component A, not B"
+ - Should be falsifiable in a single experiment
+ -->
+
+ ## Method
+
+ {Minimal steps to validate the hypothesis}
+
+ <!--
+ Keep it minimal - fastest path to prove/disprove:
+ 1. Step one (e.g., "Create test file with X")
+ 2. Step two (e.g., "Run command Y")
+ 3. Step three (e.g., "Observe output Z")
+
+ Time-box: ideally under 30 minutes
+ -->
+
+ ## Result
+
+ **Status**: {pass | fail}
+
+ {Actual outcome with evidence}
+
+ <!--
+ Include concrete evidence:
+ - Error messages, output logs
+ - Metrics or measurements
+ - Screenshots if applicable
+ - What specifically happened vs. expected
+ -->
+
+ ## Conclusion
+
+ {What we learned from this experiment}
+
+ <!--
+ Answer these:
+ - Why did it pass/fail?
+ - What assumption was validated/invalidated?
+ - If failed: What's the next hypothesis? (don't repeat same approach)
+ - If passed: What's ready for implementation?
+ -->
+
+ ---
+
+ <!--
+ Experiment Guidelines:
+ - One hypothesis per experiment
+ - Failed experiments are valuable - they inform the next hypothesis
+ - Never repeat a failed approach without a new insight
+ - Keep experiments small and fast (under 30 min)
+ - Link related experiments in conclusions
+ -->
@@ -29,6 +29,22 @@ Generated: {timestamp}
    - Files: {files}
    - Blocked by: T1
 
+ ### Spike Task Example
+
+ When no experiments exist to validate an approach, start with a minimal validation spike:
+
+ - [ ] **T1** (spike): Validate [hypothesis] approach
+   - Files: [minimal files needed]
+   - Blocked by: none
+   - Blocks: T2, T3, T4 (full implementation)
+   - Description: Minimal test to verify [approach] works before full implementation
+
+ - [ ] **T2**: Implement [feature] based on spike results
+   - Files: [implementation files]
+   - Blocked by: T1 (spike)
+
+ Keep spikes small: one or two tasks that validate the approach before committing to full implementation.
+
  ---
 
@@ -38,4 +54,6 @@ Plan Guidelines:
  - Blocked by references task IDs (T1, T2, etc.)
  - Mark complete with [x] and commit hash
  - Example completed: [x] **T1**: Create API ✓ (abc1234)
+ - Spike tasks: If no experiments validate the approach, the first task should be a minimal validation spike
+ - Spike tasks block full implementation tasks until the hypothesis is validated
  -->