deepflow 0.1.71 → 0.1.73

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -125,6 +125,51 @@ Check spec health: verify REQ-AC alignment, requirement clarity, and completenes
 
  **Blocking Logic:** All implementation tasks MUST have `Blocked by: T{spike}` until spike passes. If spike fails: update to `--failed.md`, DO NOT generate implementation tasks.
 
+ #### Probe Diversity
+
+ When generating multiple spike probes for the same problem, diversity is required to avoid confirmation bias and enable discovery of unexpected solutions.
+
+ | Requirement | Rule |
+ |-------------|------|
+ | Contradictory | At least 2 probes must use opposing/contradictory approaches (e.g., streaming vs buffering, in-process vs external) |
+ | Naive | At least 1 probe must be a naive/simple approach without prior technical justification — enables exaptation (discovering unexpected solutions) |
+ | Parallel | All probes for the same problem run simultaneously, not sequentially |
+ | Scoped | Each probe is minimal — just enough to validate the hypothesis |
+ | Safe to fail | Each probe runs in its own worktree; failure has zero impact on main |
+
+ **Diversity validation step** — before outputting spike tasks, verify:
+ 1. Are there at least 2 probes with opposing assumptions? If not, add a contradictory probe.
+ 2. Is there at least 1 naive probe with no prior technical justification? If not, add one.
+ 3. Are all probes independent (no probe depends on another probe's result)?
+
+ **Example — 3 diverse probes for a caching problem:**
+
+ ```markdown
+ - [ ] **T1** [SPIKE]: Validate in-memory LRU cache
+   - Type: spike
+   - Role: Contradictory-A (in-process)
+   - Hypothesis: In-memory LRU cache reduces DB queries by ≥80%
+   - Method: Implement LRU with 1000-item cap, run load test
+   - Success criteria: DB query count drops ≥80% under 100 concurrent users
+   - Blocked by: none
+
+ - [ ] **T2** [SPIKE]: Validate Redis distributed cache
+   - Type: spike
+   - Role: Contradictory-B (external, opposing T1)
+   - Hypothesis: Redis cache scales across multiple instances
+   - Method: Add Redis client, cache top 10 queries, same load test
+   - Success criteria: DB queries drop ≥80%, works across 2 app instances
+   - Blocked by: none
+
+ - [ ] **T3** [SPIKE]: Validate query optimization without cache (naive)
+   - Type: spike
+   - Role: Naive (no prior justification — tests if caching is even necessary)
+   - Hypothesis: Indexes + query batching alone may be sufficient
+   - Method: Add missing indexes, batch N+1 queries, same load test — no cache
+   - Success criteria: DB queries drop ≥80% with zero cache infrastructure
+   - Blocked by: none
+ ```
+
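The "safe to fail" worktree isolation described above can be sketched as a small script (an illustration only, not part of the package diff; the probe names and the `.worktrees/` path are assumptions, and it runs in a throwaway temp repo):

```bash
# Illustration only (not deepflow's code): one git worktree per spike probe,
# so a failed probe never touches main. Probe names are hypothetical.
set -eu
repo=$(mktemp -d)
cd "${repo}"
git init -q -b main
git -c user.name=spike -c user.email=spike@example.com \
    commit -q --allow-empty -m "init"

for probe in t1-lru-cache t2-redis-cache t3-query-opt; do
  # A throwaway branch + worktree per probe, created off main
  git worktree add -q ".worktrees/${probe}" -b "spike/${probe}" main
done

git worktree list   # main checkout plus three probe worktrees
```

Deleting a probe's worktree and branch afterwards discards the failed experiment with no effect on main, which is what makes the probes safe to run in parallel.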
  ### 7. VALIDATE HYPOTHESES
 
  For unfamiliar APIs, ambiguous approaches, or performance-critical work: prototype in scratchpad (not committed). If assumption fails, write `.deepflow/experiments/{topic}--{hypothesis}--failed.md`. Skip for well-known patterns/simple CRUD.
@@ -1,7 +1,7 @@
  # /df:verify — Verify Specs Satisfied
 
  ## Purpose
- Check that implemented code satisfies spec requirements and acceptance criteria.
+ Check that implemented code satisfies spec requirements and acceptance criteria. All checks are machine-verifiable — no LLM agents are used.
 
  **NEVER:** use EnterPlanMode, use ExitPlanMode
 
@@ -12,16 +12,6 @@ Check that implemented code satisfies spec requirements and acceptance criteria.
  /df:verify --re-verify # Re-verify done-* specs (already merged)
  ```
 
- ## Skills & Agents
- - Skill: `code-completeness` — Find incomplete implementations
-
- **Use Task tool to spawn agents:**
- | Agent | subagent_type | model | Purpose |
- |-------|---------------|-------|---------|
- | Scanner | `Explore` | `haiku` | Fast codebase scanning |
-
- Follow `templates/explore-agent.md` for all Explore agent spawning. Scale: 1-2 agents per spec, cap 10.
-
  ## Spec File States
 
  ```
@@ -65,18 +55,75 @@ If no `doing-*` specs found: report counts, suggest `/df:execute`.
  **L0: Build check** (if build command detected)
 
  Run the build command in the worktree:
- - Exit code 0 → L0 pass, continue to L1-L3
+ - Exit code 0 → L0 pass, continue to L1-L2
  - Exit code non-zero → L0 FAIL: report "✗ L0: Build failed" with last 30 lines, add fix task to PLAN.md, stop (skip L1-L4)
 
- **L1-L3: Static analysis** (via Explore agents)
+ **L1: Files exist** (machine-verifiable, via git)
+
+ Check that planned files appear in the worktree diff:
+
+ ```bash
+ # Get files changed in worktree branch
+ CHANGED=$(cd ${WORKTREE_PATH} && git diff main...HEAD --name-only)
+
+ # Parse PLAN.md for spec's "Files:" entries
+ PLANNED=$(grep -A1 "Files:" PLAN.md | grep -v "Files:" | tr ',' '\n' | xargs)
+
+ # Each planned file must appear in diff
+ for file in ${PLANNED}; do
+   echo "${CHANGED}" | grep -q "${file}" || MISSING+=("${file}")
+ done
+ ```
+
+ - All planned files in diff → L1 pass
+ - Missing files → L1 FAIL: report "✗ L1: Files not in diff: {list}"
+
+ **L2: Coverage** (coverage tool)
 
- Check requirements, acceptance criteria, and quality (stubs/TODOs).
- Mark each: ✓ satisfied | ✗ missing | ⚠ partial
- Prefer LSP tools (goToDefinition, findReferences, workspaceSymbol) when available; fall back to Grep/Glob silently.
+ **Step 1: Detect coverage tool** (first match wins):
+
+ | File/Config | Coverage Tool | Command |
+ |-------------|--------------|---------|
+ | `package.json` with `c8` in devDeps | c8 (Node) | `npx c8 --reporter=json-summary npm test` |
+ | `package.json` with `nyc` in devDeps | nyc (Node) | `npx nyc --reporter=json-summary npm test` |
+ | `.nycrc` or `.nycrc.json` exists | nyc (Node) | `npx nyc --reporter=json-summary npm test` |
+ | `pyproject.toml` or `setup.cfg` with coverage config | coverage.py | `python -m coverage run -m pytest && python -m coverage json` |
+ | `Cargo.toml` + `cargo-tarpaulin` installed | tarpaulin (Rust) | `cargo tarpaulin --out json` |
+ | `go.mod` | go cover (Go) | `go test -coverprofile=coverage.out ./...` |
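The first-match-wins detection could be approximated in shell roughly as follows (a sketch under assumptions, not the package's implementation; the `grep` checks are cruder than real devDependencies parsing, and only a few table rows are covered):

```bash
# Sketch (not deepflow's code): first-match-wins coverage tool detection
# for a few rows of the table above. Prints the command to run, or fails
# when nothing is detected. The grep is a deliberate simplification.
detect_coverage_cmd() {
  if [ -f package.json ] && grep -q '"c8"' package.json; then
    echo "npx c8 --reporter=json-summary npm test"
  elif [ -f package.json ] && grep -q '"nyc"' package.json; then
    echo "npx nyc --reporter=json-summary npm test"
  elif [ -f .nycrc ] || [ -f .nycrc.json ]; then
    echo "npx nyc --reporter=json-summary npm test"
  elif [ -f go.mod ]; then
    echo "go test -coverprofile=coverage.out ./..."
  else
    return 1   # no tool detected: L2 passes with a warning instead
  fi
}
```

Because each branch returns as soon as it matches, the table's ordering doubles as the precedence order.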
+
+ **Step 2: No tool detected** → L2 passes with warning: "⚠ L2: No coverage tool detected, skipping coverage check"
+
+ **Step 3: Run coverage comparison** (when tool available):
+ ```bash
+ # Baseline: coverage on main branch (or from ratchet snapshot)
+ cd ${WORKTREE_PATH}
+ git stash # Temporarily remove changes
+ ${COVERAGE_COMMAND}
+ BASELINE=$(parse_coverage_percentage) # Extract total line coverage %
+ git stash pop
+
+ # Current: coverage with changes applied
+ ${COVERAGE_COMMAND}
+ CURRENT=$(parse_coverage_percentage)
+
+ # Compare
+ if [ "${CURRENT}" -lt "${BASELINE}" ]; then
+   echo "✗ L2: Coverage dropped ${BASELINE}% → ${CURRENT}%"
+ else
+   echo "✓ L2: Coverage ${CURRENT}% (baseline: ${BASELINE}%)"
+ fi
+ ```
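One caveat on the comparison step: `[ ... -lt ... ]` compares integers only, so fractional coverage percentages (e.g. 85.71, as most JSON-summary reporters emit) would make the test error out. A float-safe variant might look like this (illustrative sketch; `coverage_dropped` is a hypothetical helper, not part of the package):

```bash
# Sketch (not deepflow's code): POSIX [ -lt ] is integer-only, so fractional
# coverage percentages need awk (or similar) for the comparison.
coverage_dropped() {
  # usage: coverage_dropped BASELINE CURRENT -> succeeds if CURRENT < BASELINE
  awk -v base="$1" -v cur="$2" 'BEGIN { exit !(cur + 0 < base + 0) }'
}

if coverage_dropped "87.5" "86.9"; then
  echo "coverage dropped"
fi
```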
+
+ - Coverage same or improved → L2 pass
+ - Coverage dropped → L2 FAIL: report "✗ L2: Coverage dropped {baseline}% → {current}%", add fix task
+
+ **L3: Integration** (subsumed by L0 + L4)
+
+ Subsumed by L0 (build) + L4 (tests). If code isn't imported/wired, build fails or tests fail. No separate verification needed.
 
  **L4: Test execution** (if test command detected)
 
- Run AFTER L0 passes and L1-L3 complete. Run even if L1-L3 found issues.
+ Run AFTER L0 passes and L1-L2 complete. Run even if L1-L2 found issues.
 
  - Exit code 0 → L4 pass
  - Exit code non-zero → L4 FAIL: capture last 50 lines, report "✗ L4: Tests failed (N of M)", add fix task
@@ -88,31 +135,30 @@ Run AFTER L0 passes and L1-L3 complete. Run even if L1-L3 found issues.
 
  **Format on success:**
  ```
- done-upload.md: L0 ✓ | 4/4 reqs ✓, 5/5 acceptance | L4 ✓ (12 tests) | 0 quality issues
+ doing-upload.md: L0 ✓ | L1 (5/5 files) | L2 ⚠ (no coverage tool) | L3 — (subsumed) | L4 ✓ (12 tests) | 0 quality issues
  ```
 
  **Format on failure:**
  ```
- done-upload.md: L0 ✓ | 4/4 reqs ✓, 3/5 acceptance | L4 ✗ (3 failed) | 1 quality issue
+ doing-upload.md: L0 ✓ | L1 (3/5 files) | L2 ⚠ | L3 — | L4 ✗ (3 failed)
 
  Issues:
- AC-3: YAML parsing missing for consolation
+ L1: Missing files: src/api/upload.ts, src/services/storage.ts
  ✗ L4: 3 test failures
  FAIL src/upload.test.ts > should validate file type
  FAIL src/upload.test.ts > should reject oversized files
- ⚠ Quality: TODO in parse_config()
 
  Fix tasks added to PLAN.md:
- T10: Add YAML parsing for consolation section
+ T10: Implement missing upload endpoint and storage service
  T11: Fix 3 failing tests in upload module
- T12: Remove TODO in parse_config()
 
  Run /df:execute --continue to fix in the same worktree.
  ```
 
  **Gate conditions (ALL must pass to merge):**
  - L0: Build passes (or no build command detected)
- - L1-L3: All requirements satisfied, no stubs, properly wired
+ - L1: All planned files appear in diff
+ - L2: Coverage didn't drop (or no coverage tool detected)
  - L4: Tests pass (or no test command detected)
 
  **If all gates pass:** Proceed to Post-Verification merge.
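The gate conditions amount to a simple all-of check over the level results; a hedged sketch (`all_gates_pass` is a hypothetical helper, not the package's code, with "skip" standing for a legitimately absent build/test/coverage tool):

```bash
# Sketch (not deepflow's code): merge only when every gate is "pass" or
# "skip"; any other result (a FAIL at any level) blocks the merge.
all_gates_pass() {
  for result in "$@"; do
    case "${result}" in
      pass|skip) ;;        # acceptable outcomes
      *) return 1 ;;       # any failure blocks the merge
    esac
  done
}

all_gates_pass pass pass skip pass && echo "proceed to merge"
```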
@@ -142,17 +188,17 @@ Files: ...
  | Level | Check | Method | Runner |
  |-------|-------|--------|--------|
  | L0: Builds | Code compiles/builds | Run build command | Orchestrator (Bash) |
- | L1: Exists | File/function exists | Glob/Grep (prefer workspaceSymbol if available) | Explore agents |
- | L2: Substantive | Real code, not stub | Read + analyze | Explore agents |
- | L3: Wired | Integrated into system | Trace imports/calls (prefer findReferences if available) | Explore agents |
+ | L1: Files exist | Planned files in diff | `git diff --name-only` vs PLAN.md | Orchestrator (Bash) |
+ | L2: Coverage | Coverage didn't drop | Coverage tool (before/after) | Orchestrator (Bash) |
+ | L3: Integration | Build + tests pass | Subsumed by L0 + L4 | |
  | L4: Tested | Tests pass | Run test command | Orchestrator (Bash) |
 
- **Default: L0 through L4.** L0 and L4 skipped ONLY if no build/test command detected (see step 1.5). L0 and L4 run via Bash — Explore agents cannot execute commands.
+ **Default: L0 through L4.** L0 and L4 skipped ONLY if no build/test command detected (see step 1.5). All checks are machine-verifiable. No LLM agents are used.
 
  ## Rules
  - Verify against spec, not assumptions
  - Flag partial implementations
- - Report TODO/FIXME as quality issues
+ - All checks are machine-verifiable; no LLM judgment
  - Don't auto-fix — add fix tasks to PLAN.md, then `/df:execute --continue`
  - Capture learnings — Write experiments for significant approaches
 
@@ -232,4 +278,3 @@ Output:
 
  Workflow complete! Ready for next feature: /df:spec <name>
  ```
-