deepflow 0.1.72 → 0.1.73

Files changed (2)
  1. package/README.md +80 -201
  2. package/package.json +7 -3
package/README.md CHANGED
@@ -8,25 +8,36 @@
8
8
  ```
9
9
 
10
10
  <p align="center">
11
- <strong>Stay in flow state - spec-driven task orchestration for Claude Code</strong>
11
+ <strong>Doing reveals what thinking can't predict</strong>
12
12
  </p>
13
13
 
14
14
  <p align="center">
15
15
  <a href="#quick-start">Quick Start</a> •
16
16
  <a href="#two-modes">Two Modes</a> •
17
- <a href="#commands">Commands</a>
17
+ <a href="#commands">Commands</a> •
18
+ <a href="#what-deepflow-rejects">What It Rejects</a> •
19
+ <a href="#principles">Principles</a>
18
20
  </p>
19
21
 
20
22
  ---
21
23
 
22
- ## Philosophy
24
+ ## Why Deepflow
23
25
 
24
- - **Specs define intent**, tasks close reality gaps
25
- - **You decide WHAT to build** — the AI decides HOW
26
- - **Two modes:** interactive (human-in-the-loop) and autonomous (overnight, unattended)
27
- - **Spike-first planning** — Validate risky hypotheses before full implementation
28
- - **Worktree isolation** — Main branch stays clean during execution
29
- - **Atomic commits** for clean rollback
26
+ **You can't foresee what you don't know to ask.** Doing reveals — at every layer.
27
+
28
+ Most spec-driven frameworks start from a finished spec and execute a static plan. Deepflow treats the entire process as discovery: asking reveals hidden requirements, debating reveals blind spots, spiking reveals technical risks, implementing reveals edge cases. Each step makes the next one sharper.
29
+
30
+ - **Asking reveals what assuming hides** — Before any code, Socratic questioning surfaces the requirements you didn't know you had. Four AI perspectives collide to expose tensions in your approach. The spec isn't written from what you think you know — it's written from what the conversation uncovered.
31
+ - **Spec as living hypothesis** - Core intent stays fixed, details refine through implementation. "The spec becomes bulletproof because you built it, not before."
32
+ - **Parallel probes reveal the best path** — Uncertain approaches spawn parallel spikes in isolated worktrees. The machine selects the winner (fewer regressions > better coverage > fewer files changed). Failed approaches stay recorded and never repeat.
33
+ - **Metrics decide, not opinions** — No LLM judges another LLM. Build, tests, typecheck, lint are the only judges. After an agent commits, the orchestrator runs health checks. Pass = keep. Fail = revert + new hypothesis.
34
+ - **The loop is the product** — Not "execute a plan" — "evolve the codebase toward the spec's goals through iterative cycles." Each cycle reveals what the previous one couldn't see.
35
+
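The winner-selection ordering in the parallel-probes bullet (fewer regressions > better coverage > fewer files changed) reads like a lexicographic comparison. The following is a minimal Python sketch of that idea only. The `SpikeResult` type, its field names, and `pick_winner` are hypothetical, and this diff does not show how deepflow actually implements the selection.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SpikeResult:
    """Hypothetical summary of one spike; field names are assumptions."""
    name: str
    regressions: int      # pre-existing tests that now fail
    coverage: float       # fraction of code covered, 0.0 to 1.0
    files_changed: int


def selection_key(result: SpikeResult):
    # Tuples compare left to right, so earlier fields dominate later ones.
    # Coverage is negated because higher coverage should sort first.
    return (result.regressions, -result.coverage, result.files_changed)


def pick_winner(results: List[SpikeResult]) -> Optional[SpikeResult]:
    """Return the best spike under the assumed ordering, or None if empty."""
    if not results:
        return None
    return min(results, key=selection_key)
```

Because the key is a tuple, coverage only matters between spikes with the same regression count, and file count only breaks ties on both.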
36
+ ## What We Learned by Doing
37
+
38
+ Deepflow started with adversarial selection: one AI evaluated another AI's code in a fresh context. The "doing reveals" philosophy applied to the system itself — we discovered that **LLM judging LLM produces gaming**: agents that estimated instead of measuring, simulated instead of implementing, presented shortcuts as deliverables.
39
+
40
+ The fix: eliminate subjective judgment. Only objective metrics decide. Tests created by the agent itself are excluded from the baseline to prevent self-validation. We call this a **ratchet** — inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch): a mechanism where the metric can only improve, never regress. Each cycle ratchets quality forward.
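One way to picture the ratchet described above: snapshot the tests that existed before the agent ran, then accept a commit only if all of those tests still pass. The sketch below is an illustration under assumptions, not deepflow's code. The function names and the shape of `results` (test name mapped to pass/fail) are hypothetical, and treating a missing baseline test as a failure is my own assumption.

```python
from typing import Dict, Set


def ratchet_holds(baseline_tests: Set[str], results: Dict[str, bool]) -> bool:
    """True if every pre-existing test still passes.

    Tests absent from the baseline (for example, ones the agent wrote)
    are ignored, so an agent cannot validate its own work.
    A baseline test missing from the results counts as a failure (assumption).
    """
    return all(results.get(test, False) for test in baseline_tests)


def decide(baseline_tests: Set[str], results: Dict[str, bool]) -> str:
    """Map the health check to the README's rule: pass = keep, fail = revert."""
    return "keep" if ratchet_holds(baseline_tests, results) else "revert"
```

In this sketch a passing agent-written test never rescues a commit that breaks a baseline test.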
30
41
 
31
42
  ## Quick Start
32
43
 
@@ -40,210 +51,83 @@ npx deepflow --uninstall
40
51
 
41
52
  ## Two Modes
42
53
 
43
- deepflow has two modes of operation. Both start from the same artifact: a **spec**.
44
-
45
- ### Interactive Mode (human-in-the-loop)
54
+ ### Interactive (human-in-the-loop)
46
55
 
47
- You drive each step inside a Claude Code session. Good for when you want control over the process, are exploring a new domain, or want to iterate on the spec.
56
+ You explore the problem, shape the spec, and trigger execution, all inside a Claude Code session.
48
57
 
49
58
  ```bash
50
59
  claude
51
60
 
52
- # 1. Explore the problem space (conversation with you)
61
+ # 1. Discover — understand the problem before solving it
53
62
  /df:discover image-upload
63
+ # "Why do you need image upload? What exists today?
64
+ # What file sizes? What formats? Where are images stored?
65
+ # What does 'done' look like? What should this NOT do?"
54
66
 
55
- # 2. Debate tradeoffs (optional, 4 AI perspectives)
67
+ # 2. Debate - stress-test the approach (optional)
56
68
  /df:debate upload-strategy
69
+ # User Advocate: "Drag-and-drop is table stakes, not a feature"
70
+ # Tech Skeptic: "Client-side resize before upload, or you'll hit memory limits"
71
+ # Systems Thinker: "What happens when storage goes down mid-upload?"
72
+ # LLM Efficiency: "Split this into two specs: upload + processing"
57
73
 
58
- # 3. Generate spec from conversation
74
+ # 3. Spec - now the conversation is rich enough to produce a solid spec
59
75
  /df:spec image-upload
60
76
 
61
- # 4. Generate task plan from spec
62
- /df:plan
63
-
64
- # 5. Execute tasks (parallel agents, you watch)
65
- /df:execute
66
-
67
- # 6. Verify and merge to main
68
- /df:verify
77
+ # 4-6: the AI takes over
78
+ /df:plan # Compare spec to code, create tasks
79
+ /df:execute # Parallel agents in worktree, ratchet validates
80
+ /df:verify # Check spec satisfied, merge to main
69
81
  ```
70
82
 
71
83
  **What requires you:** Steps 1-3 (defining the problem and approving the spec). Steps 4-6 run autonomously but you trigger each one and can intervene.
72
84
 
73
- ### Autonomous Mode (unattended)
74
-
75
- You write the specs, then walk away. The AI runs the full pipeline — hypothesis generation, parallel spikes, implementation, adversarial self-selection, verification — without any human intervention.
76
-
77
- ```bash
78
- # You define WHAT (the specs), the AI figures out HOW, overnight
85
+ ### Autonomous (unattended)
79
86
 
80
- # Inside Claude Code (requires Agent Teams)
81
- /df:auto # process all specs in specs/
82
- ```
87
+ The human loop comes first: discover and debate are where intent gets shaped. You refine the problem, stress-test ideas, and produce a spec that captures what you actually need. That's the living contract. Then you hand it off.
83
88
 
84
- **What the AI does alone:**
85
- 1. Pre-checks if spec is already satisfied (skips if so)
86
- 2. Discovers specs, respects `depends_on` ordering
87
- 3. Generates N hypotheses for how to implement each spec
88
- 4. Runs parallel spikes in isolated worktrees (one per hypothesis)
89
- 5. Implements the passing approaches
90
- 6. Adversarial selection: a fresh AI context compares approaches by artifacts only (never reads code), picks the best or rejects all
91
- 7. If rejected: generates new hypotheses, retries (up to max-cycles)
92
- 8. On convergence: verifies (L0-L4 gates), creates PR, merges to main
93
-
94
- **What you do:** Write specs (via interactive mode or manually) in `specs/`, run `/df:auto` inside Claude Code, read the report at `.deepflow/auto-report.md`. No need to run `/df:plan` first — auto mode promotes plain specs to `doing-*` automatically.
95
-
96
- **How to use:**
97
89
  ```bash
98
- # In Claude Code - create and approve a spec
90
+ # First: the human loop - discover, debate, refine until the spec is solid
99
91
  $ claude
100
92
  > /df:discover auth
101
- > /df:spec auth # creates specs/auth.md
93
+ > /df:debate auth-strategy
94
+ > /df:spec auth # specs/auth.md — the handoff point
102
95
  > /exit
103
96
 
104
- # Inside Claude Code - run auto mode
97
+ # Then: the AI loop - plan, execute, validate, merge
98
+ $ claude
105
99
  > /df:auto
106
100
 
107
- # Next morning — check what happened
101
+ # Next morning
108
102
  $ cat .deepflow/auto-report.md
109
103
  $ git log --oneline
110
104
  ```
111
105
 
112
- **Safety:** Never pushes to remote. Failed approaches recorded in `.deepflow/experiments/` and never repeated. Specs validated before processing (malformed specs are skipped).
106
+ **What the AI does alone:**
107
+ 1. Runs `/df:plan` if no PLAN.md exists
108
+ 2. Snapshots pre-existing tests (ratchet baseline)
109
+ 3. Starts a loop (`/loop 1m /df:auto-cycle`) — fresh context each cycle
110
+ 4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint)
111
+ 5. Pass = commit stands. Fail = revert + retry next cycle
112
+ 6. Circuit breaker: halts after N consecutive reverts on same task
113
+ 7. When all tasks done: runs `/df:verify`, merges to main
114
+
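The cycle-and-circuit-breaker behaviour in steps 4 to 6 can be sketched as a small loop. This is a simplified illustration with assumptions: `run_cycles` and the `attempt` callback are hypothetical names, `attempt` stands in for "execute in worktree, then run health checks", and the default threshold of 3 is invented because the README only says "N".

```python
from typing import Callable, Dict, List


def run_cycles(
    tasks: List[str],
    attempt: Callable[[str], bool],
    max_consecutive_reverts: int = 3,  # the README's "N"; value is an assumption
) -> Dict[str, object]:
    """Process tasks in order; halt if one task keeps failing.

    attempt(task) returns True when health checks pass (commit stands)
    and False when they fail (commit is reverted, task retried next cycle).
    """
    pending = list(tasks)
    done: List[str] = []
    reverts = 0  # consecutive reverts on the current task

    while pending:
        task = pending[0]
        if attempt(task):
            done.append(task)
            pending.pop(0)
            reverts = 0
        else:
            reverts += 1
            if reverts >= max_consecutive_reverts:
                # Circuit breaker: stop instead of retrying forever.
                return {"status": "halted", "done": done, "stuck_on": task}

    return {"status": "complete", "done": done, "stuck_on": None}
```

Resetting the counter on success means the breaker only trips on consecutive failures of the same task, which matches the wording of step 6.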
115
+ **Safety:** Never pushes to remote. Failed approaches recorded in `.deepflow/experiments/` and never repeated. Specs validated before processing.
113
116
 
114
- ### The Boundary
117
+ ### Two Loops, One Handoff
115
118
 
116
119
  ```
117
- YOU (the human) AI (autonomous)
120
+ HUMAN LOOP AI LOOP
118
121
  ───────────────────────────────── ──────────────────────────────────
119
- Define the problem Generate hypotheses
120
- Write/approve the spec Spike, implement, compare
121
- Set constraints & acceptance Self-judge, verify against YOUR criteria
122
- criteria Merge or retry
123
- Read morning report
122
+ /df:discover ask, surface gaps /df:plan — compare spec to code
123
+ /df:debate stress-test approach /df:execute — spike, implement
124
+ /df:spec produce living contract /df:verify health checks, merge
125
+ refine until solid ↻ retry until converged
124
126
  ───────────────────────────────── ──────────────────────────────────
125
127
  specs/*.md is the handoff point
126
128
  ```
127
129
 
128
- ## The Flow (Interactive)
129
-
130
- ```
131
- /df:discover <name>
132
- | Socratic questioning (motivation, scope, constraints...)
133
- v
134
- /df:debate <topic> <- optional
135
- | 4 perspectives: User Advocate, Tech Skeptic,
136
- | Systems Thinker, LLM Efficiency
137
- | Creates specs/.debate-{topic}.md
138
- v
139
- /df:spec <name>
140
- | Creates specs/{name}.md from conversation
141
- | Validates structure before writing
142
- v
143
- /df:plan
144
- | Checks past experiments (learn from failures)
145
- | Risky work? -> generates spike task first
146
- | Creates PLAN.md with prioritized tasks
147
- | Renames: feature.md -> doing-feature.md
148
- v
149
- /df:execute
150
- | Creates isolated worktree (main stays clean)
151
- | Spike tasks run first, verified before continuing
152
- | Parallel agents, file conflicts serialize
153
- | Context-aware (>=50% -> checkpoint)
154
- v
155
- /df:verify
156
- | Checks requirements met
157
- | Merges worktree to main, cleans up
158
- | Extracts decisions -> .deepflow/decisions.md
159
- | Deletes done-* spec after extraction
160
- ```
161
-
162
- ## The Flow (Autonomous)
163
-
164
- ```
165
- /df:auto
166
- | Discover specs (auto-promote, topological sort by depends_on)
167
- | For each doing-* spec:
168
- |
169
- | Pre-check (Haiku: already satisfied? skip)
170
- | v
171
- | Validate spec (malformed? skip)
172
- | v
173
- | Generate N hypotheses
174
- | v
175
- | Parallel spikes (one worktree per hypothesis)
176
- | | Pass? -> implement in same worktree
177
- | | Fail? -> record experiment, discard
178
- | v
179
- | Adversarial selection (fresh context, artifacts only)
180
- | | Winner? -> verify (L0-L4) -> PR -> merge
181
- | | Reject all? -> new hypotheses, retry
182
- | v
183
- | Morning report -> .deepflow/auto-report.md
184
- ```
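The removed 0.1.72 flow mentions a topological sort of specs by `depends_on`. As a rough illustration of that ordering step, here is a Python sketch using the standard library's `graphlib`. The `order_specs` helper and the dictionary shape (spec name mapped to the specs it depends on) are assumptions, not deepflow's actual data model.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def order_specs(depends_on: Dict[str, List[str]]) -> List[str]:
    """Return spec names so that every spec comes after its dependencies.

    depends_on maps a spec name to the names it depends on.
    Raises graphlib.CycleError (a ValueError) if the dependencies form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    specs = {"upload": ["auth"], "gallery": ["upload", "auth"], "auth": []}
    print(order_specs(specs))
```

A cycle cannot be ordered, so in this sketch it surfaces as an exception rather than being skipped silently.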
185
-
186
- ## Spec Lifecycle
187
-
188
- ```
189
- specs/
190
- feature.md -> new, needs /df:plan
191
- doing-feature.md -> in progress (active contract between you and the AI)
192
- done-feature.md -> transient (decisions extracted, then deleted)
193
- ```
194
-
195
- ## Works With Any Project
196
-
197
- **Greenfield:** Everything is new, agents create from scratch.
198
-
199
- **Ongoing:** Detects existing patterns, follows conventions, integrates with current code.
200
-
201
- ## Spike-First Planning
202
-
203
- For risky or uncertain work, `/df:plan` generates a **spike task** first:
204
-
205
- ```
206
- Spike: Validate streaming upload handles 10MB+ files
207
- | Run minimal experiment
208
- | Pass? -> Unblock implementation tasks
209
- | Fail? -> Record learning, generate new hypothesis
210
- ```
211
-
212
- Experiments are tracked in `.deepflow/experiments/`. Failed approaches won't be repeated.
213
-
214
- ## Worktree Isolation
215
-
216
- Execution happens in an isolated git worktree:
217
- - Main branch stays clean during execution
218
- - On failure, worktree preserved for debugging
219
- - Resume with `/df:execute --continue`
220
- - On success, `/df:verify` merges to main and cleans up
221
-
222
- ## LSP Integration
223
-
224
- deepflow automatically enables Claude Code's LSP tools during install, giving agents access to `goToDefinition`, `findReferences`, and `workspaceSymbol` for precise code navigation instead of grep-based searching.
225
-
226
- - **Global install:** sets `ENABLE_LSP_TOOL=1` in `~/.claude/settings.json`
227
- - **Project install:** sets it in `.claude/settings.local.json`
228
- - **Uninstall:** cleans up automatically
229
-
230
- Agents prefer LSP tools when available and fall back to Grep/Glob silently. You'll need a language server installed for your language (e.g. `typescript-language-server`, `pyright`, `rust-analyzer`, `gopls`).
231
-
232
- ## Spec Validation
233
-
234
- Specs are validated before downstream consumption by `/df:spec`, `/df:plan`, and `/df:auto`:
235
-
236
- - **Hard invariants** (block on failure): required sections present, REQ-N prefixes, checkbox ACs, no duplicate IDs
237
- - **Advisory warnings** (warn interactively, block in auto mode): long specs, orphaned requirements, excessive technical notes
238
-
239
- Run manually: `node hooks/df-spec-lint.js specs/my-spec.md`
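To make one of the hard invariants concrete ("no duplicate IDs"), here is a small Python sketch of a duplicate check. It is not the logic of `hooks/df-spec-lint.js`, which this diff does not show. The assumed spec layout (requirements as list items that start with `REQ-N`, optionally bolded) and the `duplicate_req_ids` name are hypothetical.

```python
import re
from collections import Counter
from typing import List

# Assumed layout: "- REQ-1: ..." or "- **REQ-1**: ..." at the start of a list item.
REQ_ID = re.compile(r"^\s*[-*]\s+\**(REQ-\d+)\b")


def duplicate_req_ids(spec_text: str) -> List[str]:
    """Return the requirement IDs that are declared more than once, sorted."""
    ids = []
    for line in spec_text.splitlines():
        match = REQ_ID.match(line)
        if match:
            ids.append(match.group(1))
    counts = Counter(ids)
    return sorted(req_id for req_id, n in counts.items() if n > 1)
```

An empty result means the invariant holds for the assumed layout. Mentions of an ID in running prose are deliberately not counted.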
240
-
241
- ## Context-Aware Execution
242
-
243
- Statusline shows context usage. At >=50%:
244
- - Waits for running agents
245
- - Checkpoints state
246
- - Resume with `/df:execute --continue`
130
+ **Spec lifecycle:** `feature.md` (new) → `doing-feature.md` (in progress) → `done-feature.md` (decisions extracted, then deleted)
247
131
 
248
132
  ## Commands
249
133
 
@@ -259,7 +143,7 @@ Statusline shows context usage. At >=50%:
259
143
  | `/df:consolidate` | Deduplicate and clean up decisions.md |
260
144
  | `/df:resume` | Session continuity briefing |
261
145
  | `/df:update` | Update deepflow to latest |
262
- | `/df:auto` | Autonomous execution via /loop (no human needed) |
146
+ | `/df:auto` | Autonomous mode (plan → loop → verify, no human needed) |
263
147
 
264
148
  ## File Structure
265
149
 
@@ -273,39 +157,34 @@ your-project/
273
157
  +-- config.yaml # project settings
274
158
  +-- decisions.md # auto-extracted + ad-hoc decisions
275
159
  +-- auto-report.md # morning report (autonomous mode)
276
- +-- auto-decisions.log # AI decision log (autonomous mode)
277
- +-- last-consolidated.json # consolidation timestamp
278
- +-- context.json # context % tracking
160
+ +-- auto-memory.yaml # cross-cycle learning
279
161
  +-- experiments/ # spike results (pass/fail)
280
162
  +-- worktrees/ # isolated execution
281
163
  +-- upload/ # one worktree per spec
282
164
  ```
283
165
 
284
- ## Configuration
285
-
286
- Create `.deepflow/config.yaml`:
166
+ ## What Deepflow Rejects
287
167
 
288
- ```yaml
289
- project:
290
- source_dir: src/
291
- specs_dir: specs/
168
+ - **Predicting everything before doing** — You discover what you need by building it. TDD assumes you already know the correct behavior before coding. Deepflow assumes that **execution reveals** what planning can't anticipate.
169
+ - **LLM judging LLM** — We started with adversarial selection (AI evaluating AI). We discovered gaming. We replaced it with objective metrics. Deepflow's own evolution proved the principle.
170
+ - **Agents role-playing job titles** — Flat orchestrator + model routing. No PM agent, no QA agent, no Scrum Master agent.
171
+ - **Automated research before understanding** — Conversation with you first. AI research comes after you've defined the problem.
172
+ - **Ceremony** — 6 commands, one flow. Markdown, not schemas. No sprint planning, no story points, no retrospectives.
292
173
 
293
- parallelism:
294
- execute:
295
- max: 5 # max parallel agents
174
+ ## Principles
296
175
 
297
- worktree:
298
- cleanup_on_success: true
299
- cleanup_on_fail: false # preserve for debugging
300
- ```
176
+ 1. **Discover before specifying, spike before implementing** — Ask, debate, probe — then commit
177
+ 2. **You define WHAT, AI figures out HOW** — Specs are the contract
178
+ 3. **Metrics decide, not opinions** — Build/test/typecheck/lint are the only judges
179
+ 4. **Confirm before assume** — Search the code before marking "missing"
180
+ 5. **Complete implementations** — No stubs, no placeholders
181
+ 6. **Atomic commits** — One task = one commit
182
+ 7. **Context-aware** — Checkpoint before limits, resume seamlessly
301
183
 
302
- ## Principles
184
+ ## More
303
185
 
304
- 1. **You define WHAT, AI figures out HOW** — Specs are the contract
305
- 2. **Confirm before assume** - Search code before marking "missing"
306
- 3. **Complete implementations** — No stubs, no placeholders
307
- 4. **Atomic commits** — One task = one commit
308
- 5. **Context-aware** — Checkpoint before limits
186
+ - [Concepts](docs/concepts.md) - Philosophy and flow in depth
187
+ - [Configuration](docs/configuration.md) - All options, models, parallelism
309
188
 
310
189
  ## License
311
190
 
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "deepflow",
3
- "version": "0.1.72",
4
- "description": "Stay in flow state - lightweight spec-driven task orchestration for Claude Code",
3
+ "version": "0.1.73",
4
+ "description": "Doing reveals what thinking can't predict - spec-driven iterative development for Claude Code",
5
5
  "keywords": [
6
6
  "claude",
7
7
  "claude-code",
@@ -12,7 +12,11 @@
12
12
  "specs",
13
13
  "tasks",
14
14
  "automation",
15
- "productivity"
15
+ "productivity",
16
+ "ratchet",
17
+ "autonomous",
18
+ "spikes",
19
+ "evolutionary"
16
20
  ],
17
21
  "author": "saidwafiq",
18
22
  "license": "MIT",