deepflow 0.1.72 → 0.1.73
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +80 -201
- package/package.json +7 -3
package/README.md
CHANGED

@@ -8,25 +8,36 @@
 ```
 
 <p align="center">
-  <strong>
+  <strong>Doing reveals what thinking can't predict</strong>
 </p>
 
 <p align="center">
   <a href="#quick-start">Quick Start</a> •
   <a href="#two-modes">Two Modes</a> •
-  <a href="#commands">Commands</a>
+  <a href="#commands">Commands</a> •
+  <a href="#what-deepflow-rejects">What It Rejects</a> •
+  <a href="#principles">Principles</a>
 </p>
 
 ---
 
-##
+## Why Deepflow
 
-
-
--
-
-- **
-- **
+**You can't foresee what you don't know to ask.** Doing reveals — at every layer.
+
+Most spec-driven frameworks start from a finished spec and execute a static plan. Deepflow treats the entire process as discovery: asking reveals hidden requirements, debating reveals blind spots, spiking reveals technical risks, implementing reveals edge cases. Each step makes the next one sharper.
+
+- **Asking reveals what assuming hides** — Before any code, Socratic questioning surfaces the requirements you didn't know you had. Four AI perspectives collide to expose tensions in your approach. The spec isn't written from what you think you know — it's written from what the conversation uncovered.
+- **Spec as living hypothesis** — Core intent stays fixed, details refine through implementation. "The spec becomes bulletproof because you built it, not before."
+- **Parallel probes reveal the best path** — Uncertain approaches spawn parallel spikes in isolated worktrees. The machine selects the winner (fewer regressions > better coverage > fewer files changed). Failed approaches stay recorded and never repeat.
+- **Metrics decide, not opinions** — No LLM judges another LLM. Build, tests, typecheck, lint are the only judges. After an agent commits, the orchestrator runs health checks. Pass = keep. Fail = revert + new hypothesis.
+- **The loop is the product** — Not "execute a plan" — "evolve the codebase toward the spec's goals through iterative cycles." Each cycle reveals what the previous one couldn't see.
+
+## What We Learned by Doing
+
+Deepflow started with adversarial selection: one AI evaluated another AI's code in a fresh context. The "doing reveals" philosophy applied to the system itself — we discovered that **LLM judging LLM produces gaming**: agents that estimated instead of measuring, simulated instead of implementing, presented shortcuts as deliverables.
+
+The fix: eliminate subjective judgment. Only objective metrics decide. Tests created by the agent itself are excluded from the baseline to prevent self-validation. We call this a **ratchet** — inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch): a mechanism where the metric can only improve, never regress. Each cycle ratchets quality forward.
 
 ## Quick Start
 
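The ratchet gate that the added README text describes (health checks as the only judge; pass keeps the commit, fail reverts it) can be sketched as a small shell function. This is a hypothetical illustration, not deepflow's shipped implementation; `true`/`false` stand in for real build/test/typecheck/lint commands.

```shell
# Hypothetical sketch of the ratchet: after an agent commits, run the
# objective health checks in order; pass keeps the commit, fail reverts it.
run_checks() {
  for check in "$@"; do
    $check || return 1   # first failing check fails the whole gate
  done
}

ratchet() {
  if run_checks "$@"; then
    echo "keep"    # commit stands; the quality baseline moves forward
  else
    echo "revert"  # orchestrator would revert and try a new hypothesis
  fi
}

ratchet true true true true   # all four checks pass -> keep
ratchet true false true true  # one check fails -> revert
```

Because the gate is a plain exit-code check, any command that reports success via exit status can serve as a judge, which is what keeps the mechanism free of LLM opinion.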
@@ -40,210 +51,83 @@ npx deepflow --uninstall
 
 ## Two Modes
 
-
-
-### Interactive Mode (human-in-the-loop)
+### Interactive (human-in-the-loop)
 
-You
+You explore the problem, shape the spec, and trigger execution — all inside a Claude Code session.
 
 ```bash
 claude
 
-# 1.
+# 1. Discover — understand the problem before solving it
 /df:discover image-upload
+# "Why do you need image upload? What exists today?
+# What file sizes? What formats? Where are images stored?
+# What does 'done' look like? What should this NOT do?"
 
-# 2. Debate
+# 2. Debate — stress-test the approach (optional)
 /df:debate upload-strategy
+# User Advocate: "Drag-and-drop is table stakes, not a feature"
+# Tech Skeptic: "Client-side resize before upload, or you'll hit memory limits"
+# Systems Thinker: "What happens when storage goes down mid-upload?"
+# LLM Efficiency: "Split this into two specs: upload + processing"
 
-# 3.
+# 3. Spec — now the conversation is rich enough to produce a solid spec
 /df:spec image-upload
 
-# 4
-/df:plan
-
-#
-/df:execute
-
-# 6. Verify and merge to main
-/df:verify
+# 4-6: the AI takes over
+/df:plan     # Compare spec to code, create tasks
+/df:execute  # Parallel agents in worktree, ratchet validates
+/df:verify   # Check spec satisfied, merge to main
 ```
 
 **What requires you:** Steps 1-3 (defining the problem and approving the spec). Steps 4-6 run autonomously but you trigger each one and can intervene.
 
-### Autonomous
-
-You write the specs, then walk away. The AI runs the full pipeline — hypothesis generation, parallel spikes, implementation, adversarial self-selection, verification — without any human intervention.
-
-```bash
-# You define WHAT (the specs), the AI figures out HOW, overnight
+### Autonomous (unattended)
 
-
-/df:auto   # process all specs in specs/
-```
+The human loop comes first — discover and debate are where intent gets shaped. You refine the problem, stress-test ideas, and produce a spec that captures what you actually need. That's the living contract. Then you hand it off.
 
-**What the AI does alone:**
-1. Pre-checks if spec is already satisfied (skips if so)
-2. Discovers specs, respects `depends_on` ordering
-3. Generates N hypotheses for how to implement each spec
-4. Runs parallel spikes in isolated worktrees (one per hypothesis)
-5. Implements the passing approaches
-6. Adversarial selection: a fresh AI context compares approaches by artifacts only (never reads code), picks the best or rejects all
-7. If rejected: generates new hypotheses, retries (up to max-cycles)
-8. On convergence: verifies (L0-L4 gates), creates PR, merges to main
-
-**What you do:** Write specs (via interactive mode or manually) in `specs/`, run `/df:auto` inside Claude Code, read the report at `.deepflow/auto-report.md`. No need to run `/df:plan` first — auto mode promotes plain specs to `doing-*` automatically.
-
-**How to use:**
 ```bash
-#
+# First: the human loop — discover, debate, refine until the spec is solid
 $ claude
 > /df:discover auth
-> /df:
+> /df:debate auth-strategy
+> /df:spec auth   # specs/auth.md — the handoff point
 > /exit
 
-#
+# Then: the AI loop — plan, execute, validate, merge
+$ claude
 > /df:auto
 
-# Next morning
+# Next morning
 $ cat .deepflow/auto-report.md
 $ git log --oneline
 ```
 
-**
+**What the AI does alone:**
+1. Runs `/df:plan` if no PLAN.md exists
+2. Snapshots pre-existing tests (ratchet baseline)
+3. Starts a loop (`/loop 1m /df:auto-cycle`) — fresh context each cycle
+4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint)
+5. Pass = commit stands. Fail = revert + retry next cycle
+6. Circuit breaker: halts after N consecutive reverts on same task
+7. When all tasks done: runs `/df:verify`, merges to main
+
+**Safety:** Never pushes to remote. Failed approaches recorded in `.deepflow/experiments/` and never repeated. Specs validated before processing.
 
-###
+### Two Loops, One Handoff
 
 ```
-
+HUMAN LOOP                           AI LOOP
 ─────────────────────────────────    ──────────────────────────────────
-
-
-
-
-Read morning report
+/df:discover — ask, surface gaps     /df:plan — compare spec to code
+/df:debate — stress-test approach    /df:execute — spike, implement
+/df:spec — produce living contract   /df:verify — health checks, merge
+↻ refine until solid                 ↻ retry until converged
 ─────────────────────────────────    ──────────────────────────────────
           specs/*.md is the handoff point
 ```
 
-
-
-```
-/df:discover <name>
-  | Socratic questioning (motivation, scope, constraints...)
-  v
-/df:debate <topic>   <- optional
-  | 4 perspectives: User Advocate, Tech Skeptic,
-  | Systems Thinker, LLM Efficiency
-  | Creates specs/.debate-{topic}.md
-  v
-/df:spec <name>
-  | Creates specs/{name}.md from conversation
-  | Validates structure before writing
-  v
-/df:plan
-  | Checks past experiments (learn from failures)
-  | Risky work? -> generates spike task first
-  | Creates PLAN.md with prioritized tasks
-  | Renames: feature.md -> doing-feature.md
-  v
-/df:execute
-  | Creates isolated worktree (main stays clean)
-  | Spike tasks run first, verified before continuing
-  | Parallel agents, file conflicts serialize
-  | Context-aware (>=50% -> checkpoint)
-  v
-/df:verify
-  | Checks requirements met
-  | Merges worktree to main, cleans up
-  | Extracts decisions -> .deepflow/decisions.md
-  | Deletes done-* spec after extraction
-```
-
-## The Flow (Autonomous)
-
-```
-/df:auto
-  | Discover specs (auto-promote, topological sort by depends_on)
-  | For each doing-* spec:
-  |
-  | Pre-check (Haiku: already satisfied? skip)
-  | v
-  | Validate spec (malformed? skip)
-  | v
-  | Generate N hypotheses
-  | v
-  | Parallel spikes (one worktree per hypothesis)
-  | | Pass? -> implement in same worktree
-  | | Fail? -> record experiment, discard
-  | v
-  | Adversarial selection (fresh context, artifacts only)
-  | | Winner? -> verify (L0-L4) -> PR -> merge
-  | | Reject all? -> new hypotheses, retry
-  | v
-  | Morning report -> .deepflow/auto-report.md
-```
-
-## Spec Lifecycle
-
-```
-specs/
-  feature.md        -> new, needs /df:plan
-  doing-feature.md  -> in progress (active contract between you and the AI)
-  done-feature.md   -> transient (decisions extracted, then deleted)
-```
-
-## Works With Any Project
-
-**Greenfield:** Everything is new, agents create from scratch.
-
-**Ongoing:** Detects existing patterns, follows conventions, integrates with current code.
-
-## Spike-First Planning
-
-For risky or uncertain work, `/df:plan` generates a **spike task** first:
-
-```
-Spike: Validate streaming upload handles 10MB+ files
-  | Run minimal experiment
-  | Pass? -> Unblock implementation tasks
-  | Fail? -> Record learning, generate new hypothesis
-```
-
-Experiments are tracked in `.deepflow/experiments/`. Failed approaches won't be repeated.
-
-## Worktree Isolation
-
-Execution happens in an isolated git worktree:
-- Main branch stays clean during execution
-- On failure, worktree preserved for debugging
-- Resume with `/df:execute --continue`
-- On success, `/df:verify` merges to main and cleans up
-
-## LSP Integration
-
-Deepflow automatically enables Claude Code's LSP tools during install, giving agents access to `goToDefinition`, `findReferences`, and `workspaceSymbol` for precise code navigation instead of grep-based searching.
-
-- **Global install:** sets `ENABLE_LSP_TOOL=1` in `~/.claude/settings.json`
-- **Project install:** sets it in `.claude/settings.local.json`
-- **Uninstall:** cleans up automatically
-
-Agents prefer LSP tools when available and fall back to Grep/Glob silently. You'll need a language server installed for your language (e.g. `typescript-language-server`, `pyright`, `rust-analyzer`, `gopls`).
-
-## Spec Validation
-
-Specs are validated before downstream consumption by `/df:spec`, `/df:plan`, and `/df:auto`:
-
-- **Hard invariants** (block on failure): required sections present, REQ-N prefixes, checkbox ACs, no duplicate IDs
-- **Advisory warnings** (warn interactively, block in auto mode): long specs, orphaned requirements, excessive technical notes
-
-Run manually: `node hooks/df-spec-lint.js specs/my-spec.md`
-
-## Context-Aware Execution
-
-Statusline shows context usage. At >=50%:
-- Waits for running agents
-- Checkpoints state
-- Resume with `/df:execute --continue`
+**Spec lifecycle:** `feature.md` (new) → `doing-feature.md` (in progress) → `done-feature.md` (decisions extracted, then deleted)
 
 ## Commands
 
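The hard invariants named in the removed Spec Validation section (required sections, REQ-N prefixes, checkbox acceptance criteria, no duplicate IDs) can be illustrated with a minimal lint sketch. This is a hypothetical re-implementation for explanation only, not the shipped `hooks/df-spec-lint.js`; the exact section headings checked here are assumptions.

```javascript
// Hypothetical sketch of spec linting: enforce the hard invariants the
// README lists. Section names ("## Requirements", "## Acceptance Criteria")
// are illustrative assumptions, not deepflow's real schema.
function lintSpec(markdown) {
  const errors = [];
  const lines = markdown.split("\n");

  // Hard invariant: required sections present
  for (const section of ["## Requirements", "## Acceptance Criteria"]) {
    if (!lines.some((l) => l.trim() === section)) {
      errors.push(`missing section: ${section}`);
    }
  }

  // Hard invariant: no duplicate REQ-N identifiers
  const ids = lines.flatMap((l) => l.match(/\bREQ-\d+\b/g) ?? []);
  for (const id of new Set(ids)) {
    if (ids.filter((x) => x === id).length > 1) {
      errors.push(`duplicate ID: ${id}`);
    }
  }

  // Hard invariant: acceptance criteria are markdown checkboxes
  const acStart = lines.indexOf("## Acceptance Criteria");
  if (acStart !== -1) {
    const acLines = lines.slice(acStart + 1).filter((l) => l.startsWith("- "));
    if (!acLines.every((l) => /^- \[[ x]\] /.test(l))) {
      errors.push("acceptance criteria must be checkboxes");
    }
  }

  return errors;
}

const spec = [
  "## Requirements",
  "- REQ-1 Upload images",
  "- REQ-1 Duplicate id",
  "## Acceptance Criteria",
  "- [ ] 10MB file uploads",
].join("\n");

console.log(lintSpec(spec)); // reports the duplicate REQ-1
```

In auto mode a non-empty error list would block the spec from processing, while interactive mode could surface the same list as warnings.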
@@ -259,7 +143,7 @@ Statusline shows context usage. At >=50%:
 | `/df:consolidate` | Deduplicate and clean up decisions.md |
 | `/df:resume` | Session continuity briefing |
 | `/df:update` | Update deepflow to latest |
-| `/df:auto` | Autonomous
+| `/df:auto` | Autonomous mode (plan → loop → verify, no human needed) |
 
 ## File Structure
 
@@ -273,39 +157,34 @@ your-project/
 +-- config.yaml              # project settings
 +-- decisions.md             # auto-extracted + ad-hoc decisions
 +-- auto-report.md           # morning report (autonomous mode)
-+-- auto-
-+-- last-consolidated.json   # consolidation timestamp
-+-- context.json             # context % tracking
++-- auto-memory.yaml         # cross-cycle learning
 +-- experiments/             # spike results (pass/fail)
 +-- worktrees/               # isolated execution
     +-- upload/              # one worktree per spec
 ```
 
-##
-
-Create `.deepflow/config.yaml`:
+## What Deepflow Rejects
 
-
-
-
-
+- **Predicting everything before doing** — You discover what you need by building it. TDD assumes you already know the correct behavior before coding. Deepflow assumes that **execution reveals** what planning can't anticipate.
+- **LLM judging LLM** — We started with adversarial selection (AI evaluating AI). We discovered gaming. We replaced it with objective metrics. Deepflow's own evolution proved the principle.
+- **Agents role-playing job titles** — Flat orchestrator + model routing. No PM agent, no QA agent, no Scrum Master agent.
+- **Automated research before understanding** — Conversation with you first. AI research comes after you've defined the problem.
+- **Ceremony** — 6 commands, one flow. Markdown, not schemas. No sprint planning, no story points, no retrospectives.
 
-
-execute:
-  max: 5   # max parallel agents
+## Principles
 
-
-
-
-
+1. **Discover before specifying, spike before implementing** — Ask, debate, probe — then commit
+2. **You define WHAT, AI figures out HOW** — Specs are the contract
+3. **Metrics decide, not opinions** — Build/test/typecheck/lint are the only judges
+4. **Confirm before assume** — Search the code before marking "missing"
+5. **Complete implementations** — No stubs, no placeholders
+6. **Atomic commits** — One task = one commit
+7. **Context-aware** — Checkpoint before limits, resume seamlessly
 
-##
+## More
 
-
-
-3. **Complete implementations** — No stubs, no placeholders
-4. **Atomic commits** — One task = one commit
-5. **Context-aware** — Checkpoint before limits
+- [Concepts](docs/concepts.md) — Philosophy and flow in depth
+- [Configuration](docs/configuration.md) — All options, models, parallelism
 
 ## License
 
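The spec lifecycle that the new README compresses into one line (`feature.md` → `doing-feature.md` → `done-feature.md`, then deletion) is just a sequence of file renames, which can be walked through by hand. The `upload` name and paths below are illustrative only; deepflow performs these renames itself via `/df:plan` and `/df:verify`.

```shell
# Hypothetical walk through the spec lifecycle the README describes.
dir=$(mktemp -d) && cd "$dir"
mkdir specs

echo "# Spec: upload" > specs/upload.md         # new spec, waiting for /df:plan
mv specs/upload.md specs/doing-upload.md        # /df:plan promotes it to in-progress
mv specs/doing-upload.md specs/done-upload.md   # /df:verify marks it done
# decisions would be extracted to .deepflow/decisions.md, then the spec deleted
rm specs/done-upload.md

ls specs   # empty: lifecycle complete
```

The rename-based states are what let auto mode discover work by globbing: plain names are pending, `doing-*` is active, and `done-*` files are transient.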
package/package.json
CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "deepflow",
-  "version": "0.1.
-  "description": "
+  "version": "0.1.73",
+  "description": "Doing reveals what thinking can't predict — spec-driven iterative development for Claude Code",
   "keywords": [
     "claude",
     "claude-code",
@@ -12,7 +12,11 @@
     "specs",
     "tasks",
     "automation",
-    "productivity"
+    "productivity",
+    "ratchet",
+    "autonomous",
+    "spikes",
+    "evolutionary"
   ],
   "author": "saidwafiq",
   "license": "MIT",