pi-taskflow 0.0.10 → 0.0.12

package/README.md CHANGED
@@ -1,117 +1,160 @@
1
1
  <div align="center">
2
2
 
3
- <img src="./assets/hero.png" alt="pi-taskflow — declarative, multi-phase subagent workflows" width="880">
3
+ <img src="./assets/hero.png" alt="pi-taskflow — declarative DAG orchestration for Pi subagents: stateful, resumable, context-isolated" width="900">
4
4
 
5
5
  <p>
6
6
  <a href="https://www.npmjs.com/package/pi-taskflow"><img src="https://img.shields.io/npm/v/pi-taskflow?style=flat-square&color=B692FF&label=npm" alt="npm version"></a>
7
+ <a href="https://www.npmjs.com/package/pi-taskflow"><img src="https://img.shields.io/npm/dm/pi-taskflow?style=flat-square&color=6E8BFF&label=downloads" alt="npm downloads"></a>
7
8
  <a href="./LICENSE"><img src="https://img.shields.io/badge/license-MIT-43D9AD?style=flat-square" alt="MIT license"></a>
8
- <a href="https://pi.dev"><img src="https://img.shields.io/badge/for-Pi%20coding%20agent-6E8BFF?style=flat-square" alt="for the Pi coding agent"></a>
9
+ <a href="#whats-inside"><img src="https://img.shields.io/badge/runtime%20deps-0-43D9AD?style=flat-square" alt="zero runtime dependencies"></a>
10
+ <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-269-6E8BFF?style=flat-square" alt="269 tests"></a>
11
+ <a href="https://pi.dev"><img src="https://img.shields.io/badge/for-Pi%20coding%20agent-B692FF?style=flat-square" alt="for the Pi coding agent"></a>
9
12
  </p>
10
13
 
14
+ <p><strong>Declarative DAG orchestration for <a href="https://pi.dev">Pi</a> subagents.</strong><br/>
15
+ Fan out · gate · resume · save as a command — intermediate results stay out of your context.</p>
16
+
17
+ ```bash
18
+ pi install npm:pi-taskflow
19
+ ```
20
+
11
21
  </div>
12
22
 
13
- > Lightweight workflow orchestration for the [Pi coding agent](https://pi.dev).
23
+ ---
14
24
 
15
- **Orchestrate your Pi subagents. Not by prompting, but by declaring.**
25
+ **Subagents are fire-and-forget. Taskflows fire, fan out, pause, gate, resume, and save themselves as a command.**
16
26
 
17
- If you've used the built-in subagent tool's `task` / `tasks` / `chain`, you
18
- already know the shorthand — your runs just get tracked, resumable, and
19
- saveable as a one-word `/tf:<name>` command.
27
+ You already know the built-in subagent tool's `task` / `tasks` / `chain`. `pi-taskflow` speaks the *same* shorthand — so your existing delegations instantly become **tracked, resumable, and saveable as a one-word `/tf:<name>` command**. When you outgrow the shorthand, the full DSL gives you a real DAG: dynamic fan-out over dozens of items, conditional routing, quality gates, human approvals, retries, and a hard spend ceiling.
20
28
 
21
- ```bash
22
- pi install npm:pi-taskflow
23
- ```
29
+ And the whole time, **only the final phase reaches your conversation.** Every intermediate transcript stays in the runtime, never your context window.
30
+
31
+ ## Why this exists
24
32
 
25
- Fan out one subagent per item, route on results, retry the flaky ones, pause for
26
- human approval, cap the spend, and gate the output with an adversarial review —
27
- all from one declarative definition. Only the final report reaches your
28
- conversation; every intermediate transcript stays in the runtime.
33
+ Here's the wall you hit with raw subagents: you describe a multi-step plan in prose, the model re-derives it every single run, the intermediate transcripts flood your context, and the moment one model call fails you start over from zero. There's no reuse, no recovery, no structure.
29
34
 
30
- ## Why
35
+ `pi-taskflow` moves the plan **out of the prompt and into a declarative definition.** The runtime owns the DAG, the loops, the retries, and the intermediate state. You declare a pipeline once and run it a hundred times — by name.
31
36
 
32
- The built-in subagent tool is great for a single delegated task. But when a job
33
- needs many coordinated steps, fan-out over dozens of items, cross-checked review,
34
- or a repeatable pipeline, you want orchestration — without the intermediate
35
- transcripts eating your context window.
37
+ <div align="center">
38
+ <img src="./assets/context-isolation.png" alt="With raw subagents every transcript floods your context; with pi-taskflow transcripts stay in the runtime and only the final result returns" width="900">
39
+ </div>
36
40
 
37
- `pi-taskflow` moves the plan into a small declarative definition. The runtime
38
- holds the DAG, the loops, and the intermediate results; your context receives
39
- only the final phase's output.
41
+ > When a job needs twelve steps with branching fan-out and a review gate, you want orchestration — not lucky prompting.
40
42
 
41
- | | `subagent` tool | `pi-taskflow` |
43
+ | | subagent (built-in) | **pi-taskflow** |
42
44
  |---|---|---|
43
- | Who drives | the model, turn by turn | the runtime, from a definition |
44
- | Intermediate results | in your context window | in the runtime (not your context) |
45
- | Reusable | re-described each time | saved as `/tf:<name>` |
46
- | Scale | a few tasks | dynamic `map` fan-out |
47
- | Resumable | no | yes (cross-session, cached phases skip) |
48
- | Quality gates | no | `gate` phases with `VERDICT: BLOCK / PASS` |
49
- | Conditional routing | no | `when` guards + `join: any` OR-joins |
50
- | Fault tolerance | no | per-phase `retry` with backoff |
51
- | Human-in-the-loop | no | `approval` phases (approve / reject / edit) |
52
- | Cost control | no | run-wide `budget` (USD / token caps) |
53
- | Composition | no | `flow` phases run saved sub-flows |
54
- | Progress visibility | opaque while running | live DAG render with timing + cost |
55
- | Ergonomics | inline JSON each time | shorthand (`task`/`tasks`/`chain`) or DSL |
45
+ | **Who drives** | the model, turn by turn | the runtime, from a definition |
46
+ | **Topology** | chain / flat parallel | **DAG with layered concurrency + routing** |
47
+ | **Intermediate results** | in your context window | **in the runtime — not your context** |
48
+ | **Scale** | a handful of tasks | **dynamic `map` fan-out over dozens of items** |
49
+ | **Reusable** | re-described every time | **saved as `/tf:<name>`** |
50
+ | **Resumable** | | **✓ cross-session, cached phases auto-skip** |
51
+ | **Quality gates** | | **`gate` phases that halt on `VERDICT: BLOCK`** |
52
+ | **Conditional routing** | | **`when` guards + `join: any` OR-joins** |
53
+ | **Fault tolerance** | | **per-phase `retry` + auto-retry on transient errors** |
54
+ | **Human-in-the-loop** | | **`approval` phases (approve / reject / edit)** |
55
+ | **Cost control** | | **run-wide `budget` (USD / token caps)** |
56
+ | **Composition** | | **`flow` phases run saved sub-flows** |
57
+ | **Live progress** | opaque while running | **live DAG render with timing + cost** |
58
+ | **Ergonomics** | inline JSON each time | **shorthand (`task`/`tasks`/`chain`) *or* DSL** |
59
+
60
+ It doesn't replace the subagent tool. It gives your subagents a DAG, a memory, and a name.
61
+
62
+ ## Compared to other Pi extensions
63
+
64
+ The Pi ecosystem has a healthy crowd of delegation and orchestration extensions — each great at what it's for. Here's an honest map of where `pi-taskflow` sits (verified against each package's latest npm release, June 2026).
65
+
66
+ | Extension | Model | Custom DSL | DAG | Dynamic fan-out | Cross-session resume | Quality gate | Human approval | Save as command | Zero deps |
67
+ |---|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
68
+ | **pi-taskflow** | **declarative multi-phase taskflows** | **✓** | **✓** | **✓ `map`** | **✓** | **✓** | **✓** | **✓ `/tf:<name>`** | **✓** |
69
+ | [`pi-agents`](https://www.npmjs.com/package/pi-agents) | JSON workflow graph (`spawn`/`fork`/`join`/`loop`) | ✓ | ✓ | ✕ (static `fork`) | ✕ | ✕ | ✕ | ✕ | ✕ (1) |
70
+ | [`pi-subagents`](https://www.npmjs.com/package/pi-subagents) | single / parallel / chain delegation | ✕ | ✕ | ✕ | ✕ | ✕ | clarify only | named workflows | ✕ (3) |
71
+ | [`pi-crew`](https://www.npmjs.com/package/pi-crew) | multi-agent teams + git worktrees + async | partial | ✕ | ✕ | durable state | ✕ | ✕ | ✕ | ✕ (7) |
72
+ | [`pi-orchestrator`](https://www.npmjs.com/package/pi-orchestrator) | fixed plan→build→review→fix→test pipeline | ✕ | fixed | ✕ | ✕ | ✓ verdict | ✓ | ✕ | ✓ |
73
+ | [`pi-pipeline`](https://www.npmjs.com/package/pi-pipeline) | fixed SPEC→PLAN→TASKS→VERIFY | ✕ | dep graph | ✕ | session planning | ✓ | clarify | ✕ | – |
74
+ | [`pi-agent-flow`](https://www.npmjs.com/package/pi-agent-flow) | one-shot parallel specialist `fork` | ✕ | ✕ | ✕ | ✕ | ✕ | ✕ | ✕ | – |
75
+
76
+ **How to choose:**
77
+
78
+ - **`pi-agents`** is the closest cousin — also a JSON graph with isolated agents, budgets, and `fork`/`loop`/`join`. Reach for `pi-taskflow` when you need what its graph doesn't have: **dynamic `map` fan-out over a discovered list**, **cross-session resume** (continue a half-finished run hours later, cached phases skipped), **quality `gate`s** that halt on a verdict, **human `approval`** steps, and **saving the whole pipeline as a `/tf:<name>` command**.
79
+ - **`pi-subagents`** is the right tool for ad-hoc “use reviewer on this diff” delegation and background jobs. `pi-taskflow` is for when those delegations need to become a *repeatable, resumable pipeline*.
80
+ - **`pi-crew`** goes heavier — worktree isolation and durable async teams. If you want lightweight, declarative, and zero-dependency, that's this project.
81
+ - **`pi-orchestrator` / `pi-pipeline`** ship *opinionated, fixed* workflows (plan→build→… / spec-driven). `pi-taskflow` ships an *empty canvas*: you (or the model) declare the graph that fits the job.
82
+
83
+ > The honest one-liner: **nothing else in the ecosystem combines a declarative DAG, `map` fan-out, cross-session resume, gates, approvals, and save-as-command — with zero runtime dependencies.**
84
+
85
+ ## 30-second start
86
+
87
+ **1. Install** — one command:
56
88
 
57
- ## Show me
89
+ ```bash
90
+ pi install npm:pi-taskflow
91
+ ```
58
92
 
59
- Describe a pipeline once, then run it from a pi session by name:
93
+ **2. Run.** Just ask the model in a Pi session:
60
94
 
61
- > `/tf:summarize-files dir=src`
95
+ > *Run a chain: first explore the auth flow, then summarize the findings.*
62
96
 
63
- The runtime fans out one subagent per file, merges the summaries in a `reduce`
64
- phase, and returns only the final overview. Every intermediate transcript stays
65
- in the runtime — never in your context window. (Full definition in
66
- [Quickstart](#then-go-declarative) below.)
97
+ The model calls the `taskflow` tool automatically. You get live progress, per-step timing, token cost, and a saved run record — **same effort as the built-in tool, now tracked and resumable.**
67
98
 
68
- ## Quickstart
99
+ **3. Save** — say *"save it"* and you have `/tf:<name>` forever.
69
100
 
70
- ### Shorthand: same effort as `subagent`, but tracked & resumable
101
+ That's it. You can be running your first workflow before your coffee cools — without writing a single phase definition.
71
102
 
72
- **Single task** (one agent, one job):
103
+ ### The shorthand (same shape as the built-in tool)
73
104
 
74
105
  ```jsonc
106
+ // Single — one agent, one job
75
107
  { "task": "Summarize the architecture of src/", "agent": "explorer" }
76
- ```
77
108
 
78
- **Parallel tasks** — fire several at once, outputs merge:
79
-
80
- ```jsonc
109
+ // Parallel — fire several at once, outputs merge
81
110
  { "tasks": [
82
- { "task": "Audit auth in src/api", "agent": "analyst" },
111
+ { "task": "Audit auth in src/api", "agent": "analyst" },
83
112
  { "task": "Audit input validation in src/api", "agent": "analyst" }
84
113
  ] }
85
- ```
86
114
 
87
- **Chain** — sequential, each step sees the previous one's output:
88
-
89
- ```jsonc
115
+ // Chain — sequential; each step sees the previous output
90
116
  { "chain": [
91
117
  { "task": "List the public API of src/lib", "agent": "scout" },
92
118
  { "task": "Write docs for:\n{previous.output}", "agent": "writer" }
93
119
  ] }
94
120
  ```
95
121
 
96
- `agent` is optional (defaults to the first available agent). Add `name` to label
97
- the run and enable saving it as a reusable command.
122
+ `agent` is optional (defaults to the first discovered agent). Add a `name` to label the run and unlock saving it as a command.
123
+
124
+ ## Watch it run
125
+
126
+ This is not a mockup. **This is stdout from a real run** — the `self-improve` flow that writes and verifies its own test suites, caught mid-flight by a quality gate:
127
+
128
+ ```
129
+ ⊗ taskflow self-improve 6/7 · blocked · $0.095
130
+ ✓ discover agent deepseek-v4-flash 10t ↑38k ↓6.7k $0.011
131
+ ┌ ✓ write-runner-tests agent claude-sonnet-4-6 10t ↑13 ↓6.6k $0.020
132
+ ├ ✓ write-store-tests agent claude-sonnet-4-6 10t ↑11 ↓10k $0.018
133
+ ├ ✓ write-agents-tests agent claude-sonnet-4-6 10t ↑28 ↓13k $0.030
134
+ └ ✓ fix-stability agent claude-sonnet-4-6 10t ↑13 ↓3.9k $0.012
135
+ ✓ verify gate BLOCK 3 type errors in test files deepseek-v4-flash
136
+ ⊘ report reduce skipped · Gate blocked ↳ fix-stability
137
+ ```
98
138
 
99
- Try it inline by telling the model something like:
139
+ **The layout *is* the DAG.** No dashboard and no logs to grep; you read the progress bar and you understand the whole pipeline:
100
140
 
101
- > Run a chain: first explore the auth flow, then summarize findings.
141
+ - **Header** — `⊗` = blocked (a gate halted it); `6/7` phases processed; aggregate cost `$0.095`.
142
+ - **Status icons** — `✓` done · `◐` running · `✗` failed · `⊘` skipped · `○` pending.
143
+ - **Rail `┌ ├ └`** — phases in the same DAG layer, running concurrently. The four `write-*`/`fix-stability` tasks fan out from `discover`. A blank gutter = a single-phase layer.
144
+ - **`↳`** — a long, layer-skipping dependency. `report` depends on the adjacent `verify` *and* on `fix-stability` two layers back, so only that skip edge is annotated.
145
+ - **Gate** — `verify` emitted `VERDICT: BLOCK`, so the runtime skipped `report` and ended the run as `blocked`, surfacing the reason inline.
146
+ - **Detail** — per phase: model, token counts (`↑`in `↓`out), cost, timing. Fan-out phases also show sub-task progress (`3/15 2✗ 8▸`).
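To make the layer-and-rail layout concrete, here is a minimal Python sketch (not the package's own code) that groups phases into concurrent layers from their `dependsOn` lists and finds the layer-skipping edges that the render marks with `↳`. The helper names `layer_phases` and `skip_edges` are hypothetical, and the dependency edges are assumptions reconstructed from the render above; in particular, the exact `dependsOn` of `verify` is not shown in this README.

```python
def layer_phases(depends_on):
    """Group phase ids into layers: a phase sits one layer below its deepest dependency."""
    depth = {}

    def resolve(phase, stack=()):
        if phase in stack:
            raise ValueError(f"dependency cycle at {phase!r}")
        if phase not in depth:
            deps = depends_on[phase]
            # A phase with no dependencies gets depth 0.
            depth[phase] = 1 + max(
                (resolve(dep, stack + (phase,)) for dep in deps), default=-1
            )
        return depth[phase]

    for phase in depends_on:
        resolve(phase)
    if not depth:
        return []
    layers = [[] for _ in range(max(depth.values()) + 1)]
    for phase in depends_on:  # dict order keeps declaration order inside a layer
        layers[depth[phase]].append(phase)
    return layers


def skip_edges(depends_on, layers):
    """Return (phase, dependency) pairs that jump over at least one layer."""
    index = {phase: i for i, layer in enumerate(layers) for phase in layer}
    return [
        (phase, dep)
        for phase in depends_on
        for dep in depends_on[phase]
        if index[phase] - index[dep] > 1
    ]


# Assumed edges for the self-improve run shown above.
flow = {
    "discover": [],
    "write-runner-tests": ["discover"],
    "write-store-tests": ["discover"],
    "write-agents-tests": ["discover"],
    "fix-stability": ["discover"],
    "verify": ["write-runner-tests", "write-store-tests", "write-agents-tests"],  # assumption
    "report": ["verify", "fix-stability"],
}
layers = layer_phases(flow)
long_edges = skip_edges(flow, layers)
```

Under these assumed edges the four tasks after `discover` share one layer (the rail), and the only skip edge is `report` to `fix-stability`, which matches the single `↳` annotation in the render.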
102
147
 
103
- The model calls the `taskflow` tool; you get live progress, per-step timing,
104
- token cost, and a run record. Ask to `save` it and you get `/tf:<name>`.
148
+ ## Go declarative
105
149
 
106
- ### Then go declarative
150
+ The shorthand is your onramp. The DSL is where `pi-taskflow` earns its keep — dynamic fan-out, structured routing, and quality gates.
107
151
 
108
- When your pipeline outgrows the shorthand — when you need dynamic fan-out,
109
- intermediate JSON routing, or quality gates — graduate to the full DSL:
152
+ ### Fan out and reduce
110
153
 
111
154
  ```jsonc
112
155
  {
113
156
  "name": "summarize-files",
114
- "description": "Discover files, summarize each, produce a report",
157
+ "description": "Discover files, summarize each, produce one report",
115
158
  "args": { "dir": { "default": "." } },
116
159
  "concurrency": 8,
117
160
  "phases": [
@@ -119,34 +162,23 @@ intermediate JSON routing, or quality gates — graduate to the full DSL:
119
162
  "task": "List source files under {args.dir} (non-recursive).\nOutput ONLY a JSON array [{\"file\":\"\"}]. No prose.",
120
163
  "output": "json" },
121
164
  { "id": "summarize", "type": "map",
122
- "over": "{steps.discover.json}", "as": "item",
123
- "agent": "scout",
165
+ "over": "{steps.discover.json}", "as": "item", "agent": "scout",
124
166
  "task": "Read {item.file} and give a one-sentence summary.",
125
167
  "dependsOn": ["discover"] },
126
- { "id": "report", "type": "reduce", "from": ["summarize"],
127
- "agent": "writer",
168
+ { "id": "report", "type": "reduce", "from": ["summarize"], "agent": "writer",
128
169
  "task": "Combine into a short overview:\n{steps.summarize.output}",
129
170
  "dependsOn": ["summarize"], "final": true }
130
171
  ]
131
172
  }
132
173
  ```
133
174
 
134
- What this does:
135
-
136
- 1. **`discover`**: an agent lists every file in the directory and outputs a JSON array.
137
- 2. **`summarize`** — a `map` fans out, spawning one subagent per file in parallel
138
- (throttled to 8 concurrent). Each gets `{item.file}` bound to its file path.
139
- 3. **`report`** — a `reduce` merges all summaries into one clean overview.
140
-
141
- Intermediate outputs never enter your context. The runtime owns them. You get
142
- only the final report back.
175
+ 1. **`discover`** lists every file and emits a JSON array.
176
+ 2. **`summarize`** is a `map` — it fans out one subagent per file, throttled to 8 concurrent, with `{item.file}` bound to each path.
177
+ 3. **`report`** is a `reduce`: it merges every summary into one clean overview.
143
178
 
144
- Save it once → `/tf:summarize-files` forever.
179
+ The intermediate summaries never enter your context. The runtime owns them; you get the report. **Save it once → `/tf:summarize-files dir=src` forever.**
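The throttled fan-out of a `map` phase can be pictured with a short Python sketch. This is an illustration of the general technique (a semaphore capping concurrent workers), not the runtime's actual implementation; `fan_out` and `fake_summarize` are hypothetical names, and the fake worker stands in for a real subagent call.

```python
import asyncio


async def fan_out(items, worker, concurrency=8):
    """Run worker(item) for every item, with at most `concurrency` running at once."""
    gate = asyncio.Semaphore(concurrency)

    async def run_one(item):
        async with gate:
            return await worker(item)

    # gather preserves input order, so results line up with items.
    return await asyncio.gather(*(run_one(item) for item in items))


stats = {"active": 0, "peak": 0}


async def fake_summarize(item):
    """Stand-in for a subagent call; records how many workers overlap."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)
    stats["active"] -= 1
    return f"summary of {item['file']}"


files = [{"file": f"file{i}.ts"} for i in range(20)]
summaries = asyncio.run(fan_out(files, fake_summarize, concurrency=8))
```

Twenty items go in, twenty summaries come out in the same order, and the recorded peak never exceeds the `concurrency` cap.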
145
180
 
146
- ### Route, gate, and guard
147
-
148
- Phases also **branch, retry, pause for a human, and respect a budget** — still
149
- declaratively, no scripting:
181
+ ### Route, gate, retry, approve, and cap the spend
150
182
 
151
183
  ```jsonc
152
184
  {
@@ -168,59 +200,28 @@ declaratively, no scripting:
168
200
  }
169
201
  ```
170
202
 
171
- - **`when`** routes to `deep` *or* `quick` from the triage JSON; the other branch is skipped.
172
- - **`join: "any"`** lets `approve` run as soon as whichever branch fired completes.
203
+ - **`when`** routes to `deep` *or* `quick` from the triage JSON; the other branch is skipped.
204
+ - **`join: "any"`** lets `approve` fire the moment whichever branch ran completes (an OR-join).
173
205
  - **`retry`** re-runs a flaky patch with backoff; **`budget`** halts the whole run if it gets too expensive.
174
206
  - **`approval`** pauses for a human (approve / reject / edit) before the final `ship`.
175
207
 
176
- ## Watch it run
177
-
178
- This is the live progress render for a real run — the `self-improve` flow that
179
- writes and verifies its own test suites, caught here mid-block by a quality gate:
180
-
181
- ```
182
- ⊗ taskflow self-improve 6/7 · blocked · $0.095
183
- ✓ discover agent deepseek-v4-flash 10t ↑38k ↓6.7k $0.011
184
- ┌ ✓ write-runner-tests agent claude-sonnet-4-6 10t ↑13 ↓6.6k $0.020
185
- ├ ✓ write-store-tests agent claude-sonnet-4-6 10t ↑11 ↓10k $0.018
186
- ├ ✓ write-agents-tests agent claude-sonnet-4-6 10t ↑28 ↓13k $0.030
187
- └ ✓ fix-stability agent claude-sonnet-4-6 10t ↑13 ↓3.9k $0.012
188
- ✓ verify gate BLOCK 3 type errors in test files deepseek-v4-flash
189
- ⊘ report reduce skipped · Gate blocked ↳ fix-stability
190
- ```
191
-
192
- **How to read it — the layout *is* the DAG:**
193
-
194
- - **Header** — `⊗` means the flow is blocked (a gate halted it); `6/7` phases
195
- processed, aggregate cost `$0.095`.
196
- - **Status icons** — `✓` done, `◐` running, `✗` failed, `⊘` skipped, `○` pending.
197
- - **Rail `┌ ├ └`** — phases in the same DAG layer, running concurrently. The four
198
- `write-*`/`fix-stability` tasks all fan out from `discover`. A blank gutter is
199
- a single-phase layer.
200
- - **`↳`** — a long (layer-skipping) dependency. `report` depends on `verify` (the
201
- adjacent layer, implied by position) *and* `fix-stability` two layers back, so
202
- only that skip edge is annotated.
203
- - **Gate** — `verify` emitted `VERDICT: BLOCK`, so the runtime skipped `report`
204
- and ended the run as `blocked`, surfacing the reason.
205
- - **Detail** — per phase: model, token counts (`↑`in `↓`out), cost, and timing.
206
- Fan-out phases also show sub-task progress.
208
+ No scripting. No `eval`. Just data the runtime executes — safe enough to run LLM-generated definitions directly.
207
209
 
208
210
  ## Phase types
209
211
 
210
- | type | meaning | required fields |
211
- |------|---------|-----------------|
212
+ | type | what it does | required fields |
213
+ |------|--------------|-----------------|
212
214
  | `agent` | one subagent runs a single task | `task` |
213
215
  | `parallel` | run `branches[]` concurrently | `branches` (array of `{task, agent?}`) |
214
- | `map` | fan out over an array — one subagent per item, `{item}` bound | `over`, `task` |
216
+ | `map` | **fan out** over an array — one subagent per item, `{item}` bound | `over`, `task` |
215
217
  | `gate` | quality/review step that can **halt the flow** | `task` |
216
218
  | `reduce` | aggregate `from[]` phase outputs into one | `from`, `task` |
217
- | `approval` | **human-in-the-loop** pause — approve / reject / edit before continuing | — |
218
- | `flow` | run a **saved sub-flow** as one phase (composition/reuse) | `use` |
219
+ | `approval` | **human-in-the-loop** pause — approve / reject / edit | — |
220
+ | `flow` | run a **saved sub-flow** as one phase (composition) | `use` |
219
221
 
220
222
  ### Common phase fields
221
223
 
222
- Every phase needs a unique `id` and a `type` (defaults to `agent`). On top of the
223
- per-type fields above:
224
+ Every phase needs a unique `id` and a `type` (defaults to `agent`). On top of the per-type fields:
224
225
 
225
226
  | Field | Meaning |
226
227
  |---|---|
@@ -237,62 +238,35 @@ per-type fields above:
237
238
  | `optional` | A failure here does **not** abort the run |
238
239
  | `use` / `with` | (`flow`) saved sub-flow name + its args |
239
240
 
240
- Flow-level keys: `name`, `description`, `args`, `concurrency` (default 8),
241
- `agentScope`, and `budget: { maxUSD?, maxTokens? }`.
241
+ Flow-level keys: `name`, `description`, `args`, `concurrency` (default 8), `agentScope`, and `budget: { maxUSD?, maxTokens? }`.
242
242
 
243
243
  ### Control flow & reliability
244
244
 
245
- - **`when`** — skip a phase unless an expression is truthy. Supports `{refs}`,
246
- `== != < > <= >=`, `&& || !`, parentheses, and quoted strings/numbers, e.g.
247
- `"when": "{steps.triage.json.route} == deep"`. Pair with `join: "any"` on the
248
- merge phase to build real if/else routing. Parse errors **fail open**.
249
- - **`join: "any"`** is an OR-join: the phase runs as soon as *one* dependency
250
- completes (default `"all"` waits for every dep).
251
- - **`retry`**: `{ "max": 2, "backoffMs": 500, "factor": 2 }` retries a failing
252
- subagent with fixed (`factor:1`) or exponential backoff; usage is summed and
253
- the attempt count shows as `↻N` in the TUI.
254
- - **`approval`** — pause for a human (`select`: Approve / Reject / Edit). Reject
255
- halts the flow; Edit injects the typed note as the phase output for downstream
256
- steps. Non-interactive runs auto-approve.
257
- - **`flow`** — `{ "type": "flow", "use": "deep-research", "with": { "topic": "{item}" } }`
258
- runs a saved flow as a phase (recursion is detected and rejected).
259
- - **`budget`** — a run-wide `{maxUSD, maxTokens}` ceiling; once exceeded, pending
260
- phases are skipped (and in-flight fan-out stops spawning) and the run is
261
- `blocked`.
262
-
263
- ### `output` format
264
-
265
- - `output: "text"` (default) — the raw subagent output.
266
- - `output: "json"` — the subagent output is parsed as JSON and exposed via
267
- `{steps.ID.json}` / `{steps.ID.json.field}`. Set this on phases whose output
268
- a downstream `map` or `reduce` needs to consume as structured data.
269
-
270
- There is no `output: "file"`. For file-based output, have the agent write to
271
- disk with a `write` tool call.
245
+ - **`when`** — skip a phase unless an expression is truthy. Supports `{refs}`, `== != < > <= >=`, `&& || !`, parentheses, and quoted strings/numbers. Pair with `join: "any"` on the merge phase for real if/else routing. Parse errors **fail open**.
246
+ - **`join: "any"`** is an OR-join: the phase runs as soon as *one* dependency completes (default `"all"` waits for every dependency).
247
+ - **`retry`** — `{ "max": 2, "backoffMs": 500, "factor": 2 }` retries a failing subagent with fixed or exponential backoff; usage is summed and the attempt count shows as `↻N` in the TUI. Transient provider errors (rate-limit / 5xx / timeout) **auto-retry even without an explicit policy**; hard errors don't.
248
+ - **`approval`** pauses for a human (Approve / Reject / Edit). Reject halts the flow; Edit injects the typed note as the phase output for downstream steps. Non-interactive runs auto-approve.
249
+ - **`flow`** — `{ "type": "flow", "use": "deep-research", "with": { "topic": "{item}" } }` runs a saved flow as a phase (recursion is detected and rejected).
250
+ - **`budget`** — a run-wide `{maxUSD, maxTokens}` ceiling; once exceeded, pending phases skip and in-flight fan-out stops spawning, ending the run as `blocked`.
251
+ - **idle watchdog**: a subagent that goes silent for 5 minutes is treated as wedged and killed (SIGTERM → SIGKILL), so one hung child can never freeze the whole flow.
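As a rough illustration of the `retry` policy, the sketch below computes the wait before each retry. It assumes the n-th retry waits `backoffMs * factor**(n-1)` milliseconds and that `factor` defaults to 1 (fixed backoff). Both the formula and the default are assumptions for illustration, not taken from the package source, and `retry_delays` is a hypothetical helper.

```python
def retry_delays(policy):
    """Wait in ms before each retry for a policy like {"max": 2, "backoffMs": 500, "factor": 2}."""
    max_retries = policy.get("max", 0)
    base_ms = policy.get("backoffMs", 0)
    factor = policy.get("factor", 1)  # assumed default: fixed backoff
    return [base_ms * factor ** attempt for attempt in range(max_retries)]


exponential = retry_delays({"max": 2, "backoffMs": 500, "factor": 2})
fixed = retry_delays({"max": 3, "backoffMs": 500, "factor": 1})
```

Under that assumption, `factor: 2` doubles the wait on each retry, while `factor: 1` keeps it constant.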
272
252
 
273
253
  ### Gate phases (quality control)
274
254
 
275
- A `gate` runs an agent to review upstream output and can **block the rest
276
- of the workflow**. End the gate task's instructions by asking the agent to
277
- emit a verdict the runtime can read:
255
+ A `gate` runs an agent to review upstream output and can **block the rest of the workflow.** End the gate task by asking for a verdict the runtime can read:
278
256
 
279
- - a final line `VERDICT: PASS` or `VERDICT: BLOCK` (also accepts `OK`, `FAIL`,
280
- `STOP`, `REJECT`, `HALT`; the last occurrence wins), or
281
- - JSON like `{"continue": false, "reason": "missing auth checks"}` /
282
- `{"verdict": "block", "reason": "..."}`.
257
+ - a final line `VERDICT: PASS` or `VERDICT: BLOCK` (also accepts `OK`, `FAIL`, `STOP`, `REJECT`, `HALT` — last occurrence wins), or
258
+ - JSON like `{"continue": false, "reason": "missing auth checks"}` / `{"verdict": "block", "reason": "..."}`.
283
259
 
284
- On **BLOCK**, downstream phases are skipped and the run ends as `blocked` with
285
- the reason surfaced. **Ambiguous output fails open** (treated as PASS) — a gate
286
- never halts the flow by accident.
260
+ On **BLOCK**, downstream phases skip and the run ends as `blocked` with the reason surfaced. **Ambiguous output fails open** (treated as PASS) — a gate never halts your flow by accident.
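The verdict rules above (last occurrence wins, JSON form accepted, ambiguous output fails open) can be sketched in Python as follows. This is a minimal illustration, not the runtime's parser: `parse_verdict` is a hypothetical name, and the grouping of `OK` with PASS and of `FAIL`, `STOP`, `REJECT`, `HALT` with BLOCK is an assumption based on the words' meaning.

```python
import json
import re

BLOCK_WORDS = {"BLOCK", "FAIL", "STOP", "REJECT", "HALT"}  # assumed to halt the flow
PASS_WORDS = {"PASS", "OK"}                                # assumed to let it continue
VERDICT_RE = re.compile(r"VERDICT:\s*([A-Za-z]+)", re.IGNORECASE)


def parse_verdict(output):
    """Return ("pass" | "block", reason); anything ambiguous is treated as pass."""
    # JSON form: {"continue": false, ...} or {"verdict": "block", ...}
    try:
        data = json.loads(output.strip())
    except ValueError:
        data = None
    if isinstance(data, dict):
        reason = str(data.get("reason", ""))
        if data.get("continue") is False:
            return "block", reason
        if str(data.get("verdict", "")).upper() in BLOCK_WORDS:
            return "block", reason
        return "pass", reason

    # Text form: scan every VERDICT marker and let the last recognised one win.
    words = [word.upper() for word in VERDICT_RE.findall(output)]
    known = [word for word in words if word in BLOCK_WORDS | PASS_WORDS]
    if known and known[-1] in BLOCK_WORDS:
        return "block", ""
    return "pass", ""  # fail open


revised = parse_verdict("VERDICT: BLOCK\nOn reflection the tests pass.\nVERDICT: PASS")
blocked = parse_verdict('{"continue": false, "reason": "missing auth checks"}')
unclear = parse_verdict("The audit looks mostly fine.")
```

The fail-open default is the design choice worth noticing: a gate that emits no recognisable verdict lets the flow continue rather than halting it.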
287
261
 
288
262
  ```
289
- Review the audit results below. If any endpoint is missing auth, end with
263
+ Review the audit below. If any endpoint is missing auth, end with
290
264
  "VERDICT: BLOCK" and a one-line reason; otherwise end with "VERDICT: PASS".
291
265
 
292
266
  {steps.audit.output}
293
267
  ```
294
268
 
295
- ## Interpolation
269
+ ## Interpolation & expressions
296
270
 
297
271
  | placeholder | resolves to |
298
272
  |---|---|
@@ -302,9 +276,13 @@ Review the audit results below. If any endpoint is missing auth, end with
302
276
  | `{item}` / `{item.field}` | current item inside a `map` phase |
303
277
  | `{previous.output}` | the immediately-upstream phase output |
304
278
 
279
+ Condition grammar (for `when`): `== != < > <= >=`, `&& || !`, parentheses, quoted strings/numbers, and any `{...}` reference — e.g. `"when": "{steps.triage.json.route} == deep && {args.force} != true"`.
280
+
281
+ > Referencing `{steps.X}` that isn't declared in `dependsOn` is a **hard validation error** — the runtime catches the most common pipeline bug before a single agent runs.
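One way to picture that validation rule is a check that collects every `{steps.X...}` reference in a phase and compares it with `dependsOn`. The Python sketch below is hypothetical (`undeclared_step_refs` is not a function from the package), and it assumes that only the `task`, `over`, and `when` fields carry placeholders.

```python
import re

STEP_REF = re.compile(r"\{steps\.([A-Za-z0-9_-]+)")


def undeclared_step_refs(phase):
    """Return step ids referenced in a phase's templates but missing from dependsOn."""
    declared = set(phase.get("dependsOn", []))
    templates = [phase.get(key, "") for key in ("task", "over", "when")]  # assumed fields
    referenced = {step for text in templates for step in STEP_REF.findall(text)}
    return sorted(referenced - declared)


valid_phase = {
    "id": "summarize", "type": "map",
    "over": "{steps.discover.json}", "as": "item",
    "task": "Read {item.file} and give a one-sentence summary.",
    "dependsOn": ["discover"],
}
broken_phase = {
    "id": "report", "type": "reduce", "from": ["summarize"],
    "task": "Combine into a short overview:\n{steps.summarize.output}",
    "dependsOn": [],  # forgot to declare "summarize"
}
ok_refs = undeclared_step_refs(valid_phase)
missing_refs = undeclared_step_refs(broken_phase)
```

A non-empty result is what a validator would reject before any agent is spawned.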
282
+
305
283
  ## Commands
306
284
 
307
- Saved flows become CLI shortcuts. All commands work in the pi session:
285
+ Saved flows become CLI shortcuts. All commands run in the Pi session:
308
286
 
309
287
  | Command | What it does |
310
288
  |---|---|
@@ -313,20 +291,32 @@ Saved flows become CLI shortcuts. All commands work in the pi session:
313
291
  | `/tf show <name>` | Print a flow's definition |
314
292
  | `/tf runs` | Browse recent run history (interactive TUI) |
315
293
  | `/tf resume <runId>` | Continue a paused/failed run — cached phases skip automatically |
294
+ | `/tf init` | Generate default `modelRoles` config in `~/.pi/agent/settings.json` |
316
295
  | `/tf:<name> [args]` | Shortcut — runs the flow in one tap |
317
296
 
318
- Tool actions (used by the model): `run` (inline `define` or saved `name`),
319
- `save`, `resume`, `list`.
297
+ Tool actions (used by the model): `run` (inline `define` or saved `name`), `save`, `resume`, `list`.
298
+
299
+ ## Resume across sessions
300
+
301
+ A taskflow run isn't tied to your session. Every completed phase is written to disk, so a run that fails (or that you stop) can be continued later with `/tf resume <runId>` — **cached phases skip automatically** and only the remaining work spends tokens.
302
+
303
+ <div align="center">
304
+ <img src="./assets/resume.png" alt="A run fails midway in session 1; in session 2 /tf resume skips the cached phases and only re-runs the failed phase and what follows" width="900">
305
+ </div>
306
+
307
+ Resume is keyed on each phase's input hash — if an upstream output changed, dependent phases re-run; if nothing changed, they're reused. No competing Pi extension does this across sessions.
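The idea of keying reuse on an input hash can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the package's scheme: the hash inputs (the phase definition plus its upstream outputs), the SHA-256 choice, and the names `input_hash` and `should_rerun` are all hypothetical.

```python
import hashlib
import json


def input_hash(phase, upstream_outputs):
    """Stable digest of a phase definition plus the outputs it depends on."""
    payload = json.dumps({"phase": phase, "inputs": upstream_outputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def should_rerun(phase, upstream_outputs, cache):
    """True when no cached result exists for this exact input hash."""
    return input_hash(phase, upstream_outputs) not in cache


report = {"id": "report", "type": "reduce", "from": ["summarize"], "dependsOn": ["summarize"]}

# Session 1 finished `report` against the first set of summaries.
cache = {input_hash(report, {"summarize": "v1 summaries"}): "cached report"}

# Session 2: same upstream output reuses the cache, a changed one forces a re-run.
rerun_when_unchanged = should_rerun(report, {"summarize": "v1 summaries"}, cache)
rerun_when_changed = should_rerun(report, {"summarize": "v2 summaries"}, cache)
```

Because the digest covers the upstream outputs, a change anywhere upstream propagates naturally: every dependent phase sees a new hash and re-runs.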
320
308
 
321
309
  ## Storage
322
310
 
323
311
  ```
324
312
  .pi/taskflows/<name>.json # project-scoped definitions (commit to share)
325
313
  ~/.pi/agent/taskflows/<name>.json # user-scoped definitions
326
- .pi/taskflows/runs/<runId>.json # run state (resume); gitignore this
314
+ .pi/taskflows/runs/<runId>.json # run state for resume (gitignore this)
327
315
  ```
328
316
 
329
- Agent discovery scope (set via `agentScope` in the flow definition):
317
+ > Commit `.pi/taskflows/` and your whole team shares the pipelines — no config sync, no onboarding doc. Run state is written atomically and guarded by a zero-dependency file lock, so concurrent runs never corrupt the index.
318
+
319
+ Agent discovery scope (via `agentScope` in the flow definition):
330
320
 
331
321
  | value | discovers agents from |
332
322
  |---|---|
@@ -336,20 +326,108 @@ Agent discovery scope (set via `agentScope` in the flow definition):
336
326
 
337
327
  ## Agents
338
328
 
339
- Taskflow reuses your existing pi agent files (`~/.pi/agent/agents/*.md`,
340
- `.pi/agents/*.md`). Reference agents by `name` in a phase or shorthand.
329
+ Taskflow ships **18 built-in agents**, each a `.md` file with a tuned system prompt, thinking level, and tool set. You can reference them by `name` in any phase or shorthand, right after install. No setup required.
330
+
331
+ ### Built-in agent roster
332
+
333
+ | Agent | Role | Thinking | Default role |
334
+ |---|---|---:|---|
335
+ | `executor` | Implement planned code changes | high | `{{fast}}` |
336
+ | `executor-fast` | Trivial fixes (≤2 files, ≤50 lines) | off | `{{fast}}` |
337
+ | `executor-code` | Complex multi-file implementation | high | `{{strong}}` |
338
+ | `executor-ui` | Frontend / styling / visual changes | high | `{{vision}}` |
339
+ | `scout` | Fast codebase recon & file mapping | off | `{{fast}}` |
340
+ | `planner` | Implementation plan creation | high | `{{strong}}` |
341
+ | `analyst` | Requirements analysis, ambiguity detection | high | `{{thinker}}` |
342
+ | `critic` | Inline self-doubt during reasoning | xhigh | `{{thinker}}` |
343
+ | `reviewer` | General code / architecture review | high | `{{strong}}` |
344
+ | `risk-reviewer` | Backend / infra / DB / API risk | high | `{{reasoner}}` |
345
+ | `security-reviewer` | Security vulns, auth/crypto | xhigh | `{{reasoner}}` |
346
+ | `plan-arbiter` | Plan quality gate (complex tasks) | high | `{{arbiter}}` |
347
+ | `final-arbiter` | Tiebreaker when critics disagree | xhigh | `{{arbiter}}` |
348
+ | `test-engineer` | Design & implement tests | high | `{{fast}}` |
349
+ | `doc-writer` | Documentation authoring | off | `{{fast}}` |
350
+ | `recover` | Session recovery after compaction | low | `{{fast}}` |
351
+ | `verifier` | Run tests, validate outcomes | off | `{{fast}}` |
352
+ | `visual-explorer` | Figma design metadata analysis | high | `{{vision}}` |
353
+
354
+ Agents are layered: **built-in → user (`~/.pi/agent/agents/`) → project (`.pi/agents/`)**. A user or project agent with the same `name` overrides the built-in — so you can customize any agent without touching the package.
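The layering rule can be pictured as a name-keyed merge in which later layers win. The sketch below is illustrative only: `AgentDef`, `mergeAgentLayers`, and the sample entries are hypothetical names, and it assumes an override replaces the whole agent definition rather than merging field by field, which the README does not specify.

```typescript
// Hypothetical shape; real agent files carry more fields (tools, system prompt).
interface AgentDef {
  name: string;
  model?: string;
  thinking?: string;
  source: "built-in" | "user" | "project";
}

// Merge layers in order; a later layer replaces an earlier agent with the same name.
function mergeAgentLayers(...layers: AgentDef[][]): Map<string, AgentDef> {
  const merged = new Map<string, AgentDef>();
  for (const layer of layers) {
    for (const agent of layer) {
      merged.set(agent.name, agent);
    }
  }
  return merged;
}

const builtIn: AgentDef[] = [
  { name: "scout", model: "{{fast}}", thinking: "off", source: "built-in" },
  { name: "reviewer", model: "{{strong}}", thinking: "high", source: "built-in" },
];
const user: AgentDef[] = [];
const project: AgentDef[] = [
  { name: "reviewer", model: "{{thinker}}", thinking: "xhigh", source: "project" },
];

// Order matters: built-in first, then user, then project.
const agents = mergeAgentLayers(builtIn, user, project);
console.log(agents.get("reviewer")?.source); // project
console.log(agents.get("scout")?.source); // built-in
```

Here the project-level `reviewer` shadows the built-in one, while `scout` is untouched because no later layer defines it.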
355
+
356
+ ### Model roles
357
+
358
+ Each built-in agent's `model` field uses a **role placeholder** (e.g. `{{fast}}`) instead of a hardcoded provider string. This decouples *intent* from *implementation* — you map roles to models once, and every agent adapts.
359
+
360
+ | Role | Intent | Typical model |
361
+ |---|---|---|
362
+ | `{{fast}}` | Cheap & quick — high-volume, low-stakes | DeepSeek V4 Flash |
363
+ | `{{strong}}` | Balanced — planning, review, moderate complexity | MiMo v2.5 Pro |
364
+ | `{{thinker}}` | Deep analysis — requirements, critique | DeepSeek V4 Pro |
365
+ | `{{arbiter}}` | Final judgment — tiebreak, plan quality gates | Qwen 3.7 Max |
366
+ | `{{vision}}` | Multimodal — UI work, design reading | MiniMax M3 |
367
+ | `{{reasoner}}` | Cautious reasoning — security, risk | GLM 5.1 |
368
+
369
+ Without configuration, agents fall back to Pi's default model. To assign specific models:
370
+
371
+ ```bash
372
+ # Auto-generate ~/.pi/agent/settings.json with default role mappings
373
+ /tf init
374
+ ```
375
+
376
+ This writes:
341
377
 
342
- When running a phase, the runtime extracts the agent's `systemPrompt` from its
343
- `.md` frontmatter and passes it via `--append-system-prompt` (written to a temp
344
- file). Phase-level overrides for `model`, `thinking`, and `tools` are passed as
345
- `--model` / `--thinking` / `--tools` flags to the subagent invocation.
378
+ ```json
379
+ {
380
+ "modelRoles": {
381
+ "fast": "openrouter/deepseek/deepseek-v4-flash",
382
+ "strong": "openrouter/xiaomi/mimo-v2.5-pro",
383
+ "thinker": "openrouter/deepseek/deepseek-v4-pro",
384
+ "arbiter": "openrouter/qwen/qwen3.7-max",
385
+ "vision": "minimax/MiniMax-M3",
386
+ "reasoner": "z-ai/glm-5.1"
387
+ }
388
+ }
389
+ ```
346
390
 
347
- Settings from `~/.pi/agent/settings.json` (the `subagents.agentOverrides` map)
348
- are honored, letting you tweak model, thinking, or tools per agent across all flows.
391
+ Edit the values to match your available providers. You can also override individual agents via `subagents.agentOverrides` in the same file:
392
+
393
+ ```json
394
+ {
395
+ "modelRoles": { ... },
396
+ "subagents": {
397
+ "agentOverrides": {
398
+ "executor": { "model": "anthropic/claude-sonnet-4-20250514" },
399
+ "reviewer": { "thinking": "xhigh" }
400
+ }
401
+ }
402
+ }
403
+ ```
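One way to picture how a role placeholder and a per-agent override might combine is sketched below. This is not taskflow's actual resolver: `resolveModel` and `effectiveModel` are hypothetical helpers, and the precedence shown (an `agentOverrides` entry is applied first, then any placeholder is resolved, and an unmapped role yields `undefined` so the caller can fall back to Pi's default model) is an assumption.

```typescript
type ModelRoles = Record<string, string>;
type AgentOverrides = Record<string, { model?: string; thinking?: string }>;

// Replace a "{{role}}" placeholder with its mapped model.
// Returns undefined when the role is unmapped (caller falls back to the default model).
function resolveModel(model: string | undefined, roles: ModelRoles): string | undefined {
  if (model === undefined) return undefined;
  const match = /^\{\{(\w+)\}\}$/.exec(model.trim());
  if (!match) return model; // already a concrete provider string
  const role = match[1] ?? "";
  return roles[role];
}

// Assumed precedence: a per-agent override beats the agent file's own model field.
function effectiveModel(
  agentName: string,
  agentModel: string | undefined,
  roles: ModelRoles,
  overrides: AgentOverrides,
): string | undefined {
  const chosen = overrides[agentName]?.model ?? agentModel;
  return resolveModel(chosen, roles);
}

const roles: ModelRoles = { fast: "openrouter/deepseek/deepseek-v4-flash" };
const overrides: AgentOverrides = {
  executor: { model: "anthropic/claude-sonnet-4-20250514" },
};

console.log(effectiveModel("scout", "{{fast}}", roles, overrides)); // openrouter/deepseek/deepseek-v4-flash
console.log(effectiveModel("executor", "{{fast}}", roles, overrides)); // anthropic/claude-sonnet-4-20250514
console.log(effectiveModel("executor-ui", "{{vision}}", roles, overrides)); // undefined
```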
404
+
405
+ ### Custom agents
406
+
407
+ Drop a `.md` file into `~/.pi/agent/agents/` (user-level) or `.pi/agents/` (project-level, commit it) to add your own:
408
+
409
+ ```markdown
410
+ ---
411
+ name: my-linter
412
+
413
+ description: Run ESLint and report violations
414
+
415
+ tools: read, bash
416
+
417
+ model: "{{fast}}"
418
+
419
+ thinking: off
420
+ ---
421
+
422
+ You are a linting agent. Run `npx eslint --format json` on the
423
+ provided files. Report violations grouped by file. No fixes.
424
+ ```
425
+
426
+ Then reference it in any phase: `{ "agent": "my-linter", "task": "Lint src/" }`.
349
427
 
350
428
  ## Examples
351
429
 
352
- Ready-to-read definitions live in [`examples/`](./examples):
430
+ Ready-to-read definitions in [`examples/`](./examples):
353
431
 
354
432
  | File | Demonstrates |
355
433
  |---|---|
@@ -357,37 +435,33 @@ Ready-to-read definitions live in [`examples/`](./examples):
357
435
  | [`conditional-research.json`](./examples/conditional-research.json) | `when` routing + `join: any` + `gate` + `budget` |
358
436
  | [`guarded-refactor.json`](./examples/guarded-refactor.json) | `approval` (human-in-the-loop) + `retry` + `gate` |
359
437
 
360
- To use one, copy it into `.pi/taskflows/<name>.json` (or
361
- `~/.pi/agent/taskflows/`) and it registers as `/tf:<name>` — or just point the
362
- model at the definition.
438
+ Copy one into `.pi/taskflows/<name>.json` (or `~/.pi/agent/taskflows/`) and it registers as `/tf:<name>` — or just point the model at it.
439
+
440
+ ## What's inside
441
+
442
+ <div align="center">
443
+
444
+ **0 runtime dependencies** · **269 tests** · **7 phase types** · **cross-session resume** · **~4.4k LOC runtime**
445
+
446
+ </div>
447
+
448
+ - **Zero runtime dependencies.** No `dependencies` field — the runtime is built entirely on Node built-ins (`fs` / `path` / `os` / `child_process` / `crypto`). The file lock is `fs.openSync("wx")`, not a third-party library.
449
+ - **269 tests across 11 suites** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, gate verdicts, budget caps, retry/backoff, approval flows, sub-flow composition, callback isolation, and the idle watchdog — plus a live end-to-end test that spawns real subagents.
450
+ - **Hardened by design.** Path-traversal defense (lexical + `realpath`), runId validation, HTML/error sanitization, atomic writes, stale-lock stealing via `rename`, and an idle watchdog that kills wedged subagents.
451
+ - **Dogfooded.** Every new feature has to survive the project's own `self-improve` taskflow before it ships.
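For readers unfamiliar with the `"wx"` flag mentioned above, here is a minimal sketch of an exclusive-create file lock using only Node built-ins. It is not the package's implementation: `tryAcquireLock`, `releaseLock`, and the `index.lock` file name are hypothetical, and the stale-lock stealing the README describes is omitted.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Try to take the lock by creating the file exclusively.
// With "wx", open fails with EEXIST if the file already exists.
function tryAcquireLock(lockPath: string): boolean {
  try {
    const fd = fs.openSync(lockPath, "wx");
    fs.writeSync(fd, String(Date.now())); // record when the lock was taken
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return false;
    throw err; // any other error is unexpected
  }
}

// Release the lock by deleting the lock file.
function releaseLock(lockPath: string): void {
  fs.rmSync(lockPath, { force: true });
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "tf-lock-"));
const lockPath = path.join(dir, "index.lock");

const first = tryAcquireLock(lockPath); // lock is free
const second = tryAcquireLock(lockPath); // already held
releaseLock(lockPath);
const third = tryAcquireLock(lockPath); // free again after release
releaseLock(lockPath);
fs.rmSync(dir, { recursive: true, force: true });

console.log(first, second, third); // true false true
```

Because the existence check and the creation happen in a single system call, two processes racing for the same path cannot both succeed.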
452
+
453
+ If this saves you a context window, **drop a ⭐ on [GitHub](https://github.com/heggria/pi-taskflow)** — it genuinely helps.
363
454
 
364
455
  ## Status & limits
365
456
 
366
- - **v0.0.6** — control flow & reliability: conditional `when` guards, `join: any`
367
- OR-joins, declarative `retry`/backoff, `approval` (human-in-the-loop) phases,
368
- `flow` (saved sub-flow composition), and run-wide `budget` caps on top of the
369
- DSL + DAG runtime (`agent`/`parallel`/`map`/`gate`/`reduce`),
370
- inline + saved flows, cross-session resume, live progress, isolated context.
371
- Default `concurrency` is 8 (set on the flow; per-phase `concurrency` overrides
372
- for that phase).
373
- - A run executes as one streaming tool call (live progress while it runs).
374
- - `map` requires the upstream phase to emit a JSON array (`output: "json"`).
375
- - Gate verdicts are **fail-open**: if the agent output contains no recognizable
376
- verdict marker (`VERDICT: BLOCK/PASS/OK/FAIL/STOP/REJECT/HALT` or
377
- `{continue: false}` / `{verdict: "block"}`), the gate passes. This prevents
378
- an accidental missing verdict from blocking your workflow.
379
-
380
- ### What it doesn't do (yet)
381
-
382
- - **No detached background execution.** A run needs the pi session to stay open.
383
- True background execution (and event/cron triggers on top of it) is on the
384
- roadmap.
385
- - **No `output: "file"`.** Outputs are text/JSON only. Write files via agent
386
- tool calls if needed.
387
- - **`map` requires a JSON array.** The `over` field must resolve to
388
- `{steps.ID.json}` where the upstream phase emitted `output: "json"`. If the
389
- source is a plain text list, wrap it in a single-agent phase that outputs JSON.
390
- - **Cycles are rejected at validation.** The DAG must be acyclic.
457
+ **v0.0.11** — full control-flow & reliability layer (`when` guards, `join: any`, `retry`/backoff, `approval`, `flow` composition, `budget` caps, idle watchdog) on top of the DSL + DAG runtime (`agent`/`parallel`/`map`/`gate`/`reduce`), inline + saved flows, cross-session resume, live progress, and isolated context. A run executes as one streaming tool call.
458
+
459
+ Known boundaries (tracked and bounded, so no surprises mid-flow):
460
+
461
+ - **No detached background execution.** A run needs the Pi session open. True background execution (and event/cron triggers on top of it) is on the roadmap.
462
+ - **No `output: "file"`.** Outputs are text/JSON only; write files via an agent's `write` tool call.
463
+ - **`map` requires a JSON array.** The `over` field must resolve to a `{steps.ID.json}` array. Wrap a text list in a single-agent `output: "json"` phase first.
464
+ - **The DAG must be acyclic.** Cycles are rejected at validation.
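As an illustration of what rejecting cycles at validation can look like, the sketch below applies Kahn's algorithm to a list of phases. It is a generic technique, not the package's validator: `PhaseNode`, its `dependsOn` field, and `findCyclicPhases` are hypothetical names, and the sketch assumes every listed dependency refers to an existing phase.

```typescript
// Hypothetical minimal phase shape for illustration.
interface PhaseNode {
  id: string;
  dependsOn: string[];
}

// Kahn's algorithm: repeatedly remove phases whose dependencies are all satisfied.
// Phases that can never be removed sit on a cycle or depend on one.
function findCyclicPhases(phases: PhaseNode[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const p of phases) {
    indegree.set(p.id, p.dependsOn.length);
    for (const dep of p.dependsOn) {
      dependents.set(dep, [...(dependents.get(dep) ?? []), p.id]);
    }
  }

  const ready = phases.filter((p) => p.dependsOn.length === 0).map((p) => p.id);
  const done = new Set<string>();
  while (ready.length > 0) {
    const id = ready.pop() as string;
    done.add(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  return phases.map((p) => p.id).filter((id) => !done.has(id));
}

const acyclic: PhaseNode[] = [
  { id: "scan", dependsOn: [] },
  { id: "plan", dependsOn: ["scan"] },
  { id: "apply", dependsOn: ["scan", "plan"] },
];
const cyclic: PhaseNode[] = [
  { id: "x", dependsOn: ["z"] },
  { id: "y", dependsOn: ["x"] },
  { id: "z", dependsOn: ["y"] },
  { id: "w", dependsOn: [] },
];

console.log(findCyclicPhases(acyclic).length); // 0
console.log(findCyclicPhases(cyclic).join(", ")); // x, y, z
```

A validator built this way can report the offending phase ids in its error message instead of only stating that a cycle exists.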
391
465
 
392
466
  ## Development
393
467
 
@@ -395,16 +469,14 @@ model at the definition.
395
469
  npm install
396
470
  npm run typecheck
397
471
  npm test # unit tests — no network, no process spawning
398
-
399
- # real end-to-end (spawns live subagents; needs model access)
400
- npm run test:e2e
472
+ npm run test:e2e # real end-to-end (spawns live subagents; needs model access)
401
473
  ```
402
474
 
475
+ Runtime lives in `extensions/`, tests in `test/`, runnable examples in `examples/`, and the full design rationale in [`DESIGN.md`](./DESIGN.md).
476
+
403
477
  ## Contributing
404
478
 
405
- Contributions welcome! This is a young project; open an issue or PR on
406
- [GitHub](https://github.com/heggria/pi-taskflow). Tests live in `test/`, the
407
- runtime in `extensions/`.
479
+ Contributions are welcome: this is a young, fast-moving project. Open an issue or PR on [GitHub](https://github.com/heggria/pi-taskflow). Good first contributions: new example flows, phase-type ideas, and TUI polish.
408
480
 
409
481
  ## License
410
482