pi-taskflow 0.0.22 → 0.0.24

package/CHANGELOG.md CHANGED
@@ -2,6 +2,120 @@
2
2
 
3
3
  All notable changes to pi-taskflow are documented here. This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) format.
4
4
 
5
+ ## [0.0.24] — 2026-06-23
6
+
7
+ > Feature release: **`/tf compile`** — turn the declared DAG into a Mermaid
8
+ > diagram plus a verification overlay for 0 tokens. A picture of the plan, a
9
+ > structural audit of the plan, and a GitHub-pastable artifact — all from the
10
+ > same JSON.
11
+
12
+ ### Added
13
+ - **`compile` action** for the `taskflow` tool and the `/tf compile <name>`
14
+ command. Renders the flow as a Mermaid `flowchart`, overlays verification
15
+ issues onto the nodes (red = error, amber = warning, green border = final),
16
+ and emits a markdown document suitable for READMEs / issues / PRs.
17
+ - Distinct shapes for every phase kind: agent ▭, parallel/map/flow ⊐, reduce ▽,
18
+ gate ◇, approval ⏸, loop ↻, tournament ⬡. Guards become edge labels;
19
+ `join: "any"` becomes dotted edges.
20
+ - Reuses the existing `verifyTaskflow` graph analysis, so every dead-end,
21
+ unreachable node, gate-exhaustion, budget overflow, concurrency warning, and
22
+ guard contradiction is painted directly on the diagram.
23
+ - Zero runtime dependencies; the compiler is a pure function with no LLM calls.
24
+ - Tests: 670 → 702 (+32) in `test/compile.test.ts` — structural assertions on
25
+ the emitted Mermaid tokens (no third-party parser dependency; render-
26
+ correctness is validated by shape/edge/class assertions).
27
+
28
+ ### Fixed
29
+ - **Id collisions no longer merge nodes.** Two distinct phase ids that
30
+ sanitize to the same Mermaid token (e.g. `audit-each` and `audit_each`) are
31
+ now disambiguated with a `_2` suffix instead of collapsing into one node with
32
+ an accidental self-loop.
33
+ - **Markdown-injection hardening.** Free-form strings (flow name, description,
34
+ verification messages) are neutralized before interpolation, so a
35
+ multi-line / bracket-laden name can no longer break out of the H1 heading or
36
+ spawn a second blockquote.
37
+ - **`/tf compile <name>` now schema-validates first**, matching the tool action
38
+ — a malformed saved flow yields a clean error instead of a half-rendered
39
+ diagram. An optional `lr`/`td` suffix selects diagram direction.
40
+ - Backslashes are now escaped inside Mermaid labels.
41
+
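The id-collision fix above can be pictured with a small sketch. This is not the package's code: `sanitizeId`, `assignMermaidTokens`, and the exact sanitizing rule (anything outside `[A-Za-z0-9_]` becomes `_`) are assumptions made for illustration. Only the `_2` suffix behaviour comes from the changelog entry.

```typescript
// Hypothetical helpers; the real compiler's names and sanitizing rule may differ.
function sanitizeId(id: string): string {
  // Assumed rule: every character outside [A-Za-z0-9_] becomes an underscore.
  return id.replace(/[^A-Za-z0-9_]/g, "_");
}

function assignMermaidTokens(phaseIds: string[]): Map<string, string> {
  const used = new Set<string>();
  const tokens = new Map<string, string>();
  for (const id of phaseIds) {
    const base = sanitizeId(id);
    let token = base;
    let n = 2;
    // On a collision, append _2, _3, ... until the token is unique.
    while (used.has(token)) {
      token = `${base}_${n}`;
      n += 1;
    }
    used.add(token);
    tokens.set(id, token);
  }
  return tokens;
}

const tokens = assignMermaidTokens(["audit-each", "audit_each"]);
console.log(tokens.get("audit-each"), tokens.get("audit_each"));
```

Because each phase id keeps its own token, an edge between the two phases stays an edge between two nodes rather than turning into a self-loop.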
42
+ ## [0.0.23] — 2026-06-11
43
+
44
+ > Feature release: the **Shared Context Tree** — an opt-in mechanism that gives
45
+ > subagents a horizontal blackboard and a vertical supervision tree, so fan-out
46
+ > items can reuse expensive context instead of re-reading it, and a node can
47
+ > delegate work at runtime and have its children report back. Validated with six
48
+ > real end-to-end runs (real `pi`, real models) including a recursive org tree
49
+ > and a large 5-way audit that converges through a loop + gate.
50
+
51
+ ### Added
52
+ - **Shared Context Tree (opt-in).** Set `shareContext: true` on a phase (or
53
+ `contextSharing: true` at the flow level) to give its subagent four extra
54
+ tools backed by a per-run, file-based blackboard:
55
+ - `ctx_write(key, value)` / `ctx_read(key?)` — a **horizontal blackboard**: a
56
+ node publishes a finding; siblings/descendants reuse it (own > ancestors >
57
+ completed-others on key conflict; a running sibling's half-written findings
58
+ stay hidden). Stops fan-out items from re-reading the same files.
59
+ - `ctx_report(summary, structured?)` / `ctx_spawn(assignments[])` — a
60
+ **vertical supervision tree**: a node reports up, and delegates child work at
61
+ runtime; the runtime runs each child (isolated) after the node finishes and
62
+ folds their reports into the phase output.
63
+ - New module `extensions/context-store.ts` reuses the run store's atomic-write
64
+ + file-lock primitives (per-node findings files — no global lock contention).
65
+ - All bookkeeping is **fail-open** (it can never sink a phase); the blackboard
66
+ is size-bounded (256 KB/value, 256 keys/node), depth-capped (5), and cleaned
67
+ up with the run. Fully backward-compatible: flows that don't opt in are
68
+ byte-for-byte unaffected.
69
+ - **`ctx_spawn` accepts a sub-graph, not just flat tasks.** An assignment is now
70
+ either `{task, agent?}` **or** `{subflow, defaultAgent?}` where `subflow` is an
71
+ inline Taskflow (a dependency-bearing DAG with `map`/`gate`/`reduce`). The
72
+ spawned subflow reuses the same `validateTaskflow` + `verifyTaskflow` +
73
+ nested-`executeTaskflow` machinery as `flow{def}`; spawn-subflows and `flow{def}`
74
+ share **one** `MAX_DYNAMIC_NESTING` counter (a `def:spawn-*` `_stack` frame), and
75
+ spawned child token/cost usage is folded into the parent phase for honest budget
76
+ accounting. A bad subflow fails open with a diagnostic.
77
+ - **Tests: 608 → 670** (+62) across 33 files, incl. `context-store`,
78
+ `context-tree`, `spawn-xor`, `spawn-subflow`, `spawn-subflow-nesting`,
79
+ `workspace`, `workspace-isolation`.
80
+ - **Workspace isolation (`cwd` keywords).** A phase's `cwd` now accepts three
81
+ reserved keywords that make the runtime allocate an isolated working directory
82
+ for the phase's subagent and tear it down afterwards:
83
+ - `"temp"` — an ephemeral dir under the OS tmpdir, removed when the phase ends.
84
+ - `"dedicated"` — a persistent dir under the run state
85
+ (`runs/ws/<runId>/<phaseId>`), kept for inspection and deterministic per
86
+ phase so a **resume reuses the same dir**.
87
+ - `"worktree"` — a real `git worktree` on a throwaway branch off `HEAD`,
88
+ removed (`git worktree remove --force` + branch delete) when the phase ends;
89
+ for changes you want to diff / commit / discard in isolation.
90
+ - New module `extensions/workspace.ts` (zero deps: `fs.mkdtemp` + `git` via
91
+ `child_process`). **Fail-open**: a failed allocation degrades to the base
92
+ cwd (`worktree`→`temp` when not a git repo) and records a `warnings`
93
+ diagnostic — a phase never fails to run because of isolation. **Security**:
94
+ the keywords are rejected at validation in LLM-authored sub-flows
95
+ (`flow{def}` / `ctx_spawn` subflow) so generated plans cannot allocate
96
+ worktrees or temp dirs that mutate the repo. A literal path is passed
97
+ through unchanged (fully backward-compatible).
98
+
99
+ ### Fixed
100
+ - **`map` / `parallel` fan-out items that call `ctx_spawn` were silently
101
+ orphaned.** The post-run spawn-drain only covered single-agent/`gate`/`reduce`
102
+ phases (keyed on the base phase id), but fan-out items run with suffixed node
103
+ ids (`audit-0`…`audit-4`) and were never drained — their queued children never
104
+ ran (5 orphaned intents, 0 children, in a real e2e). Each fan-out item now
105
+ drains its own node and runs + folds its spawned children (reports + usage),
106
+ fail-open. Regression test added.
107
+ - **Workspace override no longer leaks across isolation boundaries** (found by
108
+ the pre-release adversarial review). `runInlineSubflow` and the gate
109
+ `onBlock:retry` upstream re-execution both spread `...deps` without clearing
110
+ the parent's `_cwdOverride`, so a spawned subflow / re-run upstream dep could
111
+ be force-pinned to the parent phase's isolated dir. Both now strip the
112
+ override (a spawned subflow still inherits the parent's dir as its *base* cwd,
113
+ consistent with `flow{def}`, but no longer ignores an inner phase's own cwd).
114
+ The triplicated `effCwd` formula was extracted into one `resolveEffCwd()`
115
+ helper (the divergence was the root cause). `runs/ws/` dedicated-workspace
116
+ dirs are now reclaimed by the terminal-run cleanup, and `rmrf()` gained a
117
+ path-containment guard (defense-in-depth).
118
+
5
119
  ## [0.0.22] — 2026-06-10
6
120
 
7
121
  > Dogfooding release. The `dogfood-full` self-audit taskflow (which itself
package/README.md CHANGED
@@ -8,24 +8,18 @@
8
8
  <a href="./LICENSE"><img src="https://img.shields.io/badge/license-MIT-43D9AD?style=flat-square" alt="MIT license"></a>
9
9
  <a href="#whats-inside"><img src="https://img.shields.io/badge/runtime%20deps-0-43D9AD?style=flat-square" alt="zero runtime dependencies"></a>
10
10
  <a href="https://github.com/heggria/pi-taskflow/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/heggria/pi-taskflow/ci.yml?branch=main&style=flat-square&label=CI" alt="CI status"></a>
11
- <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-608-6E8BFF?style=flat-square" alt="608 tests"></a>
11
+ <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-702-6E8BFF?style=flat-square" alt="702 tests"></a>
12
12
  <a href="#whats-inside"><img src="https://img.shields.io/badge/dogfooded-%E2%9C%93-43D9AD?style=flat-square" alt="dogfooded"></a>
13
13
  <a href="https://pi.dev"><img src="https://img.shields.io/badge/for-Pi%20coding%20agent-B692FF?style=flat-square" alt="for the Pi coding agent"></a>
14
14
  </p>
15
15
 
16
16
  <p align="center">
17
17
  <b>English</b> ·
18
- <a href="./README.zh-CN.md">简体中文</a> ·
19
- <a href="./docs/i18n/README.hi.md">हिन्दी</a> ·
20
- <a href="./docs/i18n/README.es.md">Español</a> ·
21
- <a href="./docs/i18n/README.ar.md">العربية</a> ·
22
- <a href="./docs/i18n/README.bn.md">বাংলা</a> ·
23
- <a href="./docs/i18n/README.pt.md">Português</a> ·
24
- <a href="./docs/i18n/README.ru.md">Русский</a>
18
+ <a href="./README.zh-CN.md">简体中文</a>
25
19
  </p>
26
20
 
27
21
  <p><strong>A declarative, verifiable <em>graph of tasks</em> for <a href="https://pi.dev">Pi</a> subagents.</strong><br/>
28
- Not a workflow you script — a DAG you declare. Fan out · gate · resume · save as a command — intermediate results stay out of your context.</p>
22
+ Not a workflow you script — a DAG you declare. Fan out · gate · loop · tournament · resume · save as a command — intermediate results stay out of your context.</p>
29
23
 
30
24
  ```bash
31
25
  pi install npm:pi-taskflow
@@ -37,7 +31,7 @@ pi install npm:pi-taskflow
37
31
 
38
32
  **A `workflow` flows. A `taskflow` is a *graph*.** Other orchestrators let the model *script* the work — imperative code that flows step by step, with the graph hidden inside control flow. `pi-taskflow` does the opposite: you **declare** the work as a graph of discrete, named **task** nodes connected by `dependsOn` edges — and the runtime *verifies that graph before it spends a single token.*
39
33
 
40
- You already know the built-in subagent tool's `task` / `tasks` / `chain`. `pi-taskflow` speaks the *same* shorthand — so your existing delegations instantly become **tracked, resumable, and saveable as a one-word `/tf:<name>` command**. When you outgrow the shorthand, the full DSL gives you a real DAG: dynamic fan-out over dozens of items, conditional routing, quality gates, human approvals, retries, and a hard spend ceiling.
34
+ You already know the built-in subagent tool's `task` / `tasks` / `chain`. `pi-taskflow` speaks the *same* shorthand — so your existing delegations instantly become **tracked, resumable, and saveable as a one-word `/tf:<name>` command**. When you outgrow the shorthand, the full DSL gives you a real DAG: dynamic fan-out over dozens of items, conditional routing, quality gates, human approvals, retries, loops, tournaments, and a hard spend ceiling.
41
35
 
42
36
  And the whole time, **only the final phase reaches your conversation.** Every intermediate transcript stays in the runtime, never your context window.
43
37
 
@@ -81,7 +75,9 @@ Here's the wall you hit with raw subagents: you describe a multi-step plan in pr
81
75
  | **Fault tolerance** | ✗ | **per-phase `retry` + auto-retry on transient errors** |
82
76
  | **Human-in-the-loop** | ✗ | **`approval` phases (approve / reject / edit)** |
83
77
  | **Cost control** | ✗ | **run-wide `budget` (USD / token caps)** |
84
- | **Composition** | ✗ | **`flow` phases run saved sub-flows** |
78
+ | **Composition** | ✗ | **`flow` phases run saved *or runtime-generated* sub-flows** |
79
+ | **Iterative loops** | ✗ | **`loop` phases — repeat until condition, convergence, or cap** |
80
+ | **Competitive selection** | ✗ | **`tournament` phases — N variants + judge** |
85
81
  | **Live progress** | opaque while running | **live DAG render with timing + cost** |
86
82
  | **Ergonomics** | inline JSON each time | **shorthand (`task`/`tasks`/`chain`) *or* DSL** |
87
83
 
@@ -99,13 +95,13 @@ The closest thing to `pi-taskflow` in spirit is the **dynamic / code-mode workfl
99
95
  | **See it** | ✗ the graph only exists as the code runs | **✓ the live progress render *is* the DAG** |
100
96
  | **Resume** | coarse (call-cache dedup) | **✓ phase-by-phase input-hash resume, cross-session** |
101
97
  | **Safe to LLM-generate** | risky — it's executable code | **✓ it's just data — no `eval`; and a runtime-generated sub-flow is *structurally validated* (cycles / dangling refs / duplicate ids) before it runs** |
102
- | **Expressivity ceiling** | **higher** — arbitrary control flow | bounded by the DSL, but `map`/`when`/`loop`/`gate` — plus **runtime-generated sub-flows (`flow {def}`)** for plan-then-execute and iterative replanning — cover most jobs |
98
+ | **Expressivity ceiling** | **higher** — arbitrary control flow | bounded by the DSL, but `map`/`when`/`loop`/`gate`/`tournament` — plus **runtime-generated sub-flows (`flow {def}`)** for plan-then-execute and iterative replanning — cover most jobs |
103
99
 
104
100
  We chose the **verifiable** side on purpose. The expressivity you give up is real; what you get back — a plan you can check, watch, replay, and safely let a model author — is what turns one-off prompting into durable orchestration.
105
101
 
106
102
  ## Compared to other Pi extensions
107
103
 
108
- The Pi ecosystem now has **20+ delegation, workflow, and orchestration extensions** — each great at what it's for. Here's an honest map of where `pi-taskflow` sits (verified against each package's latest npm release, June 2026). For the full breakdown — every package, strengths *and* weaknesses — see [`PI-ECOSYSTEM.md`](./PI-ECOSYSTEM.md). For the broader, non-Pi landscape (LangGraph, Temporal, CrewAI, Mastra…) see [`COMPETITORS.md`](./COMPETITORS.md).
104
+ The Pi ecosystem now has **20+ delegation, workflow, and orchestration extensions** — each great at what it's for. Here's an honest map of where `pi-taskflow` sits (verified against each package's latest npm release, June 2026). For the full breakdown — every package, strengths *and* weaknesses — see [`docs/internal/PI-ECOSYSTEM.md`](./docs/internal/PI-ECOSYSTEM.md). For the broader, non-Pi landscape (LangGraph, Temporal, CrewAI, Mastra…) see [`docs/internal/COMPETITORS.md`](./docs/internal/COMPETITORS.md).
109
105
 
110
106
  | Extension | Model | Custom DSL | DAG | Dynamic fan-out | Cross-session resume | Quality gate | Human approval | Save as command | Zero deps |
111
107
  |---|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
@@ -120,18 +116,18 @@ The Pi ecosystem now has **20+ delegation, workflow, and orchestration extension
120
116
  | [`pi-pipeline`](https://www.npmjs.com/package/pi-pipeline) | fixed SPEC→PLAN→TASKS→VERIFY | ✕ | fixed | ✕ | session planning | ✓ | clarify | ✕ | ✕ (2) |
121
117
  | [`pi-agent-flow`](https://www.npmjs.com/package/pi-agent-flow) | one-shot parallel specialist `fork` | yes | ✕ | ✕ | – | ✕ | ✕ | – | ✕ (2) |
122
118
 
123
- *(Representative slice of the 20+ — see [`PI-ECOSYSTEM.md`](./PI-ECOSYSTEM.md) for all of them, plus `@0xkobold/pi-orchestration`, `@melihmucuk/pi-crew`, `@mediadatafusion/pi-workflow-suite`, `gentle-pi`, `@dreki-gg/pi-subagent`, and more.)*
119
+ *(Representative slice of the 20+ — see [`docs/internal/PI-ECOSYSTEM.md`](./docs/internal/PI-ECOSYSTEM.md) for all of them, plus `@0xkobold/pi-orchestration`, `@melihmucuk/pi-crew`, `@mediadatafusion/pi-workflow-suite`, `gentle-pi`, `@dreki-gg/pi-subagent`, and more.)*
124
120
 
125
121
  **How to choose:**
126
122
 
127
123
  - **`@pi-agents/orchid`** is the most feature-complete orchestrator in the ecosystem (DAG + worktrees + Ralph loop + agent mailbox) — but its DSL is a *fixed* 9-phase pipeline, it carries runtime deps + jiti, and it's beta. Reach for `pi-taskflow` when you want to **define your own graph** (not adopt an opinionated one) with **zero dependencies** and a one-command install.
128
124
  - **`pi-crew` / `ultimate-pi`** go heavier — worktree isolation, durable async teams, multi-tier governance. If you want lightweight, declarative, and zero-dependency, that's this project.
129
125
  - **`@zhushanwen/pi-workflow`** is the closest in spirit and also zero-dep, but it's the **imperative** side of the split above: you author workflows as **JavaScript scripts** the model writes and runs. `pi-taskflow`'s **declarative JSON DAG** is the verifiable side — statically checkable, visualizable, safe to LLM-generate, and resumable at phase granularity rather than call-cache dedup.
130
- - **`@fiale-plus/pi-rogue-orchestration`** has a real **loop-until-done** (a feature `pi-taskflow` doesn't yet have). If your job is "keep going until the goal is met," it's worth a look; `pi-taskflow` is for *structured, branching* pipelines instead.
126
+ - **`@fiale-plus/pi-rogue-orchestration`** has a real **loop-until-done** (goal-driven iteration). `pi-taskflow` now ships its own `loop` phase (v0.0.13+) plus `tournament` for competitive selection — and unlike rogue-orchestration, `pi-taskflow` has a full DAG with gates, compositional sub-flows, and cross-session resume. For raw "keep going until the goal is met" with minimal structure, rogue-orchestration is still lighter; for structured, branching pipelines, `pi-taskflow` covers the same ground and more.
131
127
  - **`pi-subagents` / `@gotgenes/pi-subagents`** are the mature picks for ad-hoc "use reviewer on this diff" delegation and background jobs. `pi-taskflow` is for when those delegations need to become a *repeatable, resumable pipeline*.
132
128
  - **`pi-pipeline` / `pi-agent-flow`** ship *opinionated, fixed* flows. `pi-taskflow` ships an *empty canvas*: you (or the model) declare the graph that fits the job.
133
129
 
134
- > The honest one-liner: **`pi-taskflow` is the only Pi extension that gives you a *declarative, verifiable, resumable* DAG of task nodes — saved as a one-word command, with zero runtime dependencies and context isolation by design.** Where code-mode workflows let the model *script* the work, `pi-taskflow` lets it *declare a graph the runtime can prove correct before running.* The known gaps it's closing next: worktree isolation (see [`STRATEGY.md`](./STRATEGY.md)).
130
+ > The honest one-liner: **`pi-taskflow` is the only Pi extension that gives you a *declarative, verifiable, resumable* DAG of task nodes — saved as a one-word command, with zero runtime dependencies and context isolation by design.** Where code-mode workflows let the model *script* the work, `pi-taskflow` lets it *declare a graph the runtime can prove correct before running.* Recently shipped from the roadmap: the Shared Context Tree (blackboard + supervision) and worktree isolation (see [`docs/internal/STRATEGY.md`](./docs/internal/STRATEGY.md)).
135
131
 
136
132
  ## 30-second start
137
133
 
@@ -176,6 +172,16 @@ That's it. You can be running your first workflow before your coffee cools — w
176
172
 
177
173
  `agent` is optional (defaults to the first discovered agent). Add a `name` to label the run and unlock saving it as a command.
178
174
 
175
+ Shorthand modes also support per-step **context pre-reading** — pass `context` (file paths) and optionally `contextLimit` (max chars per file, default 8000) at the step level:
176
+
177
+ ```jsonc
178
+ // Chain with context files injected into each step
179
+ { "chain": [
180
+ { "task": "List the public API", "agent": "scout", "context": ["src/lib/**/*.ts"] },
181
+ { "task": "Write docs for:\n{previous.output}", "agent": "writer" }
182
+ ] }
183
+ ```
184
+
179
185
  ## Watch it run
180
186
 
181
187
  This is not a mockup. **This is stdout from a real run** — the `self-improve` flow that writes and verifies its own test suites, caught mid-flight by a quality gate:
@@ -262,6 +268,60 @@ The intermediate summaries never enter your context. The runtime owns them; you
262
268
 
263
269
  No scripting. No `eval`. Just data the runtime executes — safe enough to run LLM-generated definitions directly.
264
270
 
271
+ ### Loop until done
272
+
273
+ Some work is inherently iterative — refine a draft until a reviewer is satisfied, retry-and-improve until tests pass, converge on an answer:
274
+
275
+ ```jsonc
276
+ {
277
+ "id": "refine",
278
+ "type": "loop",
279
+ "task": "Improve this draft (iteration {loop.iteration}). Previous attempt:\n{loop.lastOutput}\n\nReturn JSON {\"draft\":\"…\",\"done\":true|false}.",
280
+ "until": "{steps.refine.json.done} == true",
281
+ "output": "json",
282
+ "maxIterations": 6,
283
+ "convergence": true
284
+ }
285
+ ```
286
+
287
+ See [Loop phases](#loop-until-done-loop) for the full reference.
288
+
289
+ ### Plan, then execute (runtime sub-flows)
290
+
291
+ A planner decides *at runtime* what work to spawn — each iteration's plan depends on the previous result:
292
+
293
+ ```jsonc
294
+ {
295
+ "name": "iterative-replan",
296
+ "phases": [
297
+ { "id": "plan", "type": "agent", "agent": "planner",
298
+ "task": "Given the current state, output a JSON taskflow definition (with phases[]).",
299
+ "output": "json" },
300
+ { "id": "execute", "type": "flow", "def": "{steps.plan.json}",
301
+ "dependsOn": ["plan"] }
302
+ ]
303
+ }
304
+ ```
305
+
306
+ The generated sub-flow is **validated** (no cycles, no dangling refs, no duplicate IDs) before a single token is spent. See [`examples/dynamic-plan-execute.json`](./examples/dynamic-plan-execute.json) and [`examples/iterative-replan.json`](./examples/iterative-replan.json).
307
+
308
+ ### Tournament (compete and judge)
309
+
310
+ For open-ended creative or subjective work, spawn several competing variants and let a judge pick the best:
311
+
312
+ ```jsonc
313
+ {
314
+ "id": "headline",
315
+ "type": "tournament",
316
+ "task": "Write a punchy headline for this launch post.",
317
+ "variants": 4,
318
+ "judge": "Pick the headline with the strongest hook and clearest promise.",
319
+ "mode": "best"
320
+ }
321
+ ```
322
+
323
+ See [Tournament phases](#tournament-tournament) for the full reference.
324
+
265
325
  ## Phase types
266
326
 
267
327
  | type | what it does | required fields |
@@ -289,23 +349,56 @@ Every phase needs a unique `id` and a `type` (defaults to `agent`). On top of th
289
349
  | `retry` | `{ max, backoffMs?, factor? }` — retry a failing subagent |
290
350
  | `output` | `"text"` (default) or `"json"` (exposes `{steps.ID.json}`) |
291
351
  | `model` / `thinking` / `tools` | Per-phase overrides for the subagent |
292
- | `cwd` | Working directory for the subagent |
352
+ | `cwd` | Working directory for the subagent. A literal path, or a reserved keyword for **workspace isolation** — `"temp"` (ephemeral dir, removed after), `"dedicated"` (persistent dir under the run state, kept), `"worktree"` (a git worktree on a throwaway branch, removed after). Fail-open; rejected in LLM-authored sub-flows. |
353
+ | `context` | File paths to pre-read and inject into the agent prompt |
354
+ | `contextLimit` | Max chars per context file (default 8000) |
293
355
  | `concurrency` | Fan-out cap for `map` / `parallel` (overrides the flow default) |
294
356
  | `final` | Marks the result-bearing phase (else the last phase wins) |
295
357
  | `optional` | A failure here does **not** abort the run |
296
- | `use` / `with` | (`flow`) saved sub-flow name + its args |
297
- | `def` | (`flow`) inline sub-flow **generated at runtime** — usually `"{steps.plan.json}"` (mutually exclusive with `use`) |
358
+ | `shareContext` | Opt this phase's subagent into the **Shared Context Tree** (see below). Set `contextSharing: true` at the flow level to enable it for every phase |
298
359
  | `cache` | `{ scope, ttl?, fingerprint? }` — cross-run memoization (see below) |
360
+ | `onBlock` | `"halt"` (default) or `"retry"` — what happens when a gate blocks |
361
+ | `eval` | Zero-token machine-checkable criteria that run *before* the LLM gate |
362
+
363
+ Flow-level keys: `name`, `description`, `args`, `concurrency` (default 8), `agentScope`, `contextSharing`, `strictInterpolation`, and `budget: { maxUSD?, maxTokens? }`.
364
+
365
+ ### Shared Context Tree (blackboard + supervision)
366
+
367
+ By default subagents are fully isolated — they share nothing and only return a
368
+ final string. Opt a phase in with `shareContext: true` (or `contextSharing: true`
369
+ flow-wide) to give its subagent four extra tools backed by a per-run, file-based
370
+ blackboard:
299
371
 
300
- Flow-level keys: `name`, `description`, `args`, `concurrency` (default 8), `agentScope`, and `budget: { maxUSD?, maxTokens? }`.
372
+ | tool | direction | use |
373
+ |------|-----------|-----|
374
+ | `ctx_write(key, value)` | horizontal | publish a finding so siblings/descendants reuse it (stop re-reading the same files) |
375
+ | `ctx_read(key?)` | horizontal | read findings visible to this node: its own + ancestors' + **completed** others' |
376
+ | `ctx_report(summary, structured?)` | vertical ↑ | report a result up to the parent |
377
+ | `ctx_spawn(assignments[])` | vertical ↓ | delegate child work at runtime; each assignment is a flat `{task}` **or** a `{subflow}` (a dependency-bearing DAG the runtime validates and runs nested). Child reports fold back into this phase's output |
378
+
379
+ The first two are a **horizontal blackboard** (siblings reuse expensive context);
380
+ the last two are a **vertical supervision tree** (a node delegates work and its
381
+ children report up). Everything is opt-in, fail-open, depth-capped (5 levels), size-bounded
382
+ (256KB per value, 256 keys per node, 16 spawn assignments max), and cleaned up
383
+ with the run — flows that don't opt in behave exactly as before.
384
+
385
+ ```jsonc
386
+ { "id": "survey", "type": "agent", "agent": "scout", "shareContext": true,
387
+ "task": "Map the API surface. ctx_write key 'endpoints' so the auditors don't re-scan." },
388
+ { "id": "audit", "type": "map", "over": "{steps.survey.json}", "shareContext": true,
389
+ "dependsOn": ["survey"], "agent": "analyst",
390
+ "task": "ctx_read 'endpoints' for shared context, then audit {item} for missing auth." }
391
+ ```
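The visibility rule for `ctx_read` (own findings, then ancestors', then **completed** others') can be sketched as a layered merge. The data shapes and the function name below are hypothetical, since the on-disk layout of the blackboard is not shown here. The precedence order on key conflicts (own > ancestors > completed-others) follows the changelog entry; the order among several ancestors is left unspecified.

```typescript
// Hypothetical shapes for illustration; not the real context-store format.
type NodeStatus = "running" | "completed";

interface NodeFindings {
  nodeId: string;
  status: NodeStatus;
  findings: Record<string, string>;
}

function visibleFindings(
  self: NodeFindings,
  ancestors: NodeFindings[],
  others: NodeFindings[],
): Record<string, string> {
  const view: Record<string, string> = {};
  // Lowest precedence: other nodes, and only once they have completed,
  // so a running sibling's half-written findings stay hidden.
  for (const node of others) {
    if (node.status === "completed") Object.assign(view, node.findings);
  }
  // Ancestors override other nodes on a key conflict.
  for (const node of ancestors) Object.assign(view, node.findings);
  // The node's own findings win every conflict.
  Object.assign(view, self.findings);
  return view;
}
```

Later `Object.assign` calls overwrite earlier ones, so the merge order is the precedence order read bottom-up.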
301
392
 
302
393
  ### Control flow & reliability
303
394
 
304
- - **`when`** — skip a phase unless an expression is truthy. Supports `{refs}`, `== != < > <= >=`, `&& || !`, parentheses, and quoted strings/numbers. Pair with `join: "any"` on the merge phase for real if/else routing. Parse errors **fail open**.
395
+ - **`when`** — skip a phase unless an expression is truthy. Supports `{refs}`, `== != < > <= >=`, `&& || !`, parentheses, and quoted strings/numbers. Pair with `join: "any"` on the merge phase for real if/else routing. Parse errors **fail open** (the phase runs — never silently dropped).
305
396
  - **`join: "any"`** — an OR-join: the phase runs as soon as *one* dependency completes (default `"all"` waits for all).
306
397
  - **`retry`** — `{ "max": 2, "backoffMs": 500, "factor": 2 }` retries a failing subagent with fixed or exponential backoff; usage is summed and the attempt count shows as `↻N` in the TUI. Transient provider errors (rate-limit / 5xx / timeout) **auto-retry even without an explicit policy**; hard errors don't.
307
- - **`approval`** — pause for a human (Approve / Reject / Edit). Reject halts the flow; Edit injects the typed note as the phase output for downstream steps. Non-interactive runs auto-reject (safety: approval gates are never bypassed).
308
- - **`flow`** — `{ "type": "flow", "use": "deep-research", "with": { "topic": "{item}" } }` runs a **saved** flow as a phase (recursion is detected and rejected). Or **generate the sub-flow at runtime**: `{ "type": "flow", "def": "{steps.plan.json}" }` resolves an upstream phase's JSON output into a sub-flow, **validates it (cycles / dangling refs / duplicate ids), then runs it** — the number and shape of the generated phases is decided at runtime, not authored in advance. A malformed plan fails *open* (the phase is skipped with a `defError`, the run continues). This is how a planner decides *at runtime* what work to spawn — the declarative answer to a code-mode `for` loop, with each generated plan checked before it spends a token. Pair it with `loop` for **data-dependent iterative replanning** (round N's plan depends on round N-1's result). See [`examples/dynamic-plan-execute.json`](./examples/dynamic-plan-execute.json) and [`examples/iterative-replan.json`](./examples/iterative-replan.json).
398
+ - **`onBlock`** — `"halt"` (default) stops the run when a gate blocks. `"retry"` re-runs the upstream phases instead of halting: a self-healing rework loop, guarded by the budget, the idle watchdog, and a nested recursion depth cap.
399
+ - **`eval`** — zero-token machine-checkable criteria that run *before* the LLM gate. If the eval check fails, the gate blocks without spawning an agent.
400
+ - **`approval`** — pause for a human (Approve / Reject / Edit). Reject halts the flow; Edit injects the typed note as the phase output for downstream steps. Non-interactive runs (detached / CI) **auto-reject** (safety: approval gates are never bypassed).
401
+ - **`flow`** — `{ "type": "flow", "use": "deep-research", "with": { "topic": "{item}" } }` runs a **saved** flow as a phase (recursion is detected and rejected). Or **generate the sub-flow at runtime**: `{ "type": "flow", "def": "{steps.plan.json}" }` resolves an upstream phase's JSON output into a sub-flow, **validates it (cycles / dangling refs / duplicate ids / dead-ends), then runs it** — the number and shape of the generated phases is decided at runtime, not authored in advance. A malformed plan fails *open* (the phase is skipped with a `defError`, the run continues). This is how a planner decides *at runtime* what work to spawn — the declarative answer to a code-mode `for` loop, with each generated plan checked before it spends a token. Security hardening for LLM-generated sub-flows: breadth caps (100 phases, 200 map items, 16 concurrency), `cwd` containment, budget clamped to `min(child, parent)`, nesting cap (5 levels), and prototype-pollution defense (deep-cloned, `__proto__`/`constructor`/`prototype` stripped). Pair it with `loop` for **data-dependent iterative replanning** (round N's plan depends on round N-1's result). See [`examples/dynamic-plan-execute.json`](./examples/dynamic-plan-execute.json) and [`examples/iterative-replan.json`](./examples/iterative-replan.json).
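Two of the hardening steps listed for LLM-generated sub-flows, the prototype-pollution defense and the budget clamp, are easy to picture in code. The sketch below is an illustration under assumptions: the function names, the `Json` type, and the handling of a missing cap (fall back to whichever side defines one) are not taken from the package.

```typescript
// Illustrative only; names and shapes are assumptions, not the package's API.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

const FORBIDDEN_KEYS = new Set(["__proto__", "constructor", "prototype"]);

// Deep-clone a parsed definition, dropping keys that could reach a prototype.
function cloneWithoutDangerousKeys(value: Json): Json {
  if (Array.isArray(value)) return value.map((item) => cloneWithoutDangerousKeys(item));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      if (!FORBIDDEN_KEYS.has(key)) out[key] = cloneWithoutDangerousKeys(child);
    }
    return out;
  }
  return value;
}

interface Budget {
  maxUSD?: number;
  maxTokens?: number;
}

function minDefined(a?: number, b?: number): number | undefined {
  if (a === undefined) return b;
  if (b === undefined) return a;
  return Math.min(a, b);
}

// A child sub-flow can never raise the ceiling it inherits from its parent.
function clampBudget(child: Budget, parent: Budget): Budget {
  const out: Budget = {};
  const usd = minDefined(child.maxUSD, parent.maxUSD);
  const tokens = minDefined(child.maxTokens, parent.maxTokens);
  if (usd !== undefined) out.maxUSD = usd;
  if (tokens !== undefined) out.maxTokens = tokens;
  return out;
}
```

Cloning before validation means the rest of the pipeline only ever sees a plain copy, never the object the model's output was parsed into.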
309
402
 
310
403
  ### Loop-until-done (`loop`)
311
404
 
@@ -348,8 +441,6 @@ For open-ended work, the best result often comes from generating several candida
348
441
  - **Judge** — after the fan-out, one judge agent sees every variant (numbered) plus your `judge` rubric and picks a winner via a `WINNER: <n>` line or `{"winner": n}`. An unreadable verdict **fails open** to variant 1; a failed judge falls back too — the work is never lost.
349
442
  - **`mode`** — `best` returns the winning variant **verbatim**; `aggregate` returns the judge's **synthesized** answer combining the strongest parts.
350
443
  - **Short-circuits:** if only one competitor survives, it wins with no judge call; if all fail, the phase fails. The TUI shows `⚑ N→#k`; usage sums variants + judge. Like `gate`, it's **excluded from `cross-run` cache**.
351
- - **`budget`** — a run-wide `{maxUSD, maxTokens}` ceiling; once exceeded, pending phases skip and in-flight fan-out stops spawning, ending the run as `blocked`.
352
- - **idle watchdog** — a subagent that goes silent for 5 minutes is treated as wedged and killed (SIGTERM → SIGKILL), so one hung child can never freeze the whole flow.
353
444
 
354
445
  ### Cross-run memoization (`cache`)
355
446
 
@@ -371,7 +462,7 @@ Every phase is already content-addressed: within a single run's **resume**, a ph
371
462
  - **`scope`** — `"run-only"` (default) is exactly the historical behavior (within-run resume only). `"cross-run"` opts the phase into the persistent store. `"off"` disables reuse entirely (even within a run), for debugging.
372
463
  - **Freshness is the whole game.** The cache key already includes the prompt, the `over` items, and any `context` files (pre-read into the task). `fingerprint` folds *implicit* inputs into the key so "the world changed" becomes a cache miss: `git:HEAD`, `glob:<pat>` (size+mtime), `glob!:<pat>` (content hash), `file:<path>`, `env:<NAME>`. `ttl` (`30m`/`6h`/`7d`) is a time backstop.
373
464
  - **Honest limit:** a subagent that reads a file it didn't declare in `context`/`fingerprint` can still serve a stale `cross-run` hit. That's why the default is `run-only` and why `gate`/`approval` phases are **forbidden** from `cross-run` (they must produce a fresh result each run). Opt in only for phases whose output is a function of declared inputs.
374
- - Cache lives in `.pi/taskflows/cache/` (gitignored). Clear it with `action: "cache-clear"`. Full rationale: [`docs/rfc-cross-run-memoization.md`](./docs/rfc-cross-run-memoization.md).
465
+ - Cache lives in `.pi/taskflows/cache/` (gitignored). Clear it with `action: "cache-clear"` on the tool. Full rationale: [`docs/internal/rfc-cross-run-memoization.md`](./docs/internal/rfc-cross-run-memoization.md).
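
As an illustration of the `ttl` backstop mentioned in the list above, a duration such as `30m`, `6h`, or `7d` could be turned into a freshness check like this. It is a sketch under assumptions: `parseTtl` and `isFresh` are hypothetical names, and only the three units shown in the README are handled.

```typescript
// Milliseconds per unit; assumption: only m/h/d are accepted in this sketch.
const UNIT_MS: Record<string, number> = { m: 60_000, h: 3_600_000, d: 86_400_000 };

// Parse "30m" / "6h" / "7d" into milliseconds, or return null if the string is not recognized.
function parseTtl(ttl: string): number | null {
  const match = /^(\d+)([mhd])$/.exec(ttl.trim());
  if (!match) return null;
  return Number(match[1]) * UNIT_MS[match[2]];
}

// A cache entry is fresh while its age does not exceed the TTL.
// An unparseable TTL is treated as "not fresh" so it degrades to a cache miss.
function isFresh(storedAtMs: number, ttl: string, nowMs: number): boolean {
  const ttlMs = parseTtl(ttl);
  return ttlMs !== null && nowMs - storedAtMs <= ttlMs;
}
```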
375
466
 
376
467
  ### Gate phases (quality control)
377
468
 
@@ -398,11 +489,16 @@ Review the audit below. If any endpoint is missing auth, end with
398
489
  | `{steps.ID.json}` | prior output parsed as JSON (or `{steps.ID.json.field}`) |
399
490
  | `{item}` / `{item.field}` | current item inside a `map` phase |
400
491
  | `{previous.output}` | the immediately-upstream phase output |
492
+ | `{loop.iteration}` | current iteration number inside a `loop` phase |
493
+ | `{loop.lastOutput}` | previous iteration's output inside a `loop` phase |
494
+ | `{loop.maxIterations}` | the iteration cap inside a `loop` phase |
401
495
 
402
496
  Condition grammar (for `when`): `== != < > <= >=`, `&& || !`, parentheses, quoted strings/numbers, and any `{...}` reference — e.g. `"when": "{steps.triage.json.route} == deep && {args.force} != true"`.
403
497
 
404
498
  > Referencing `{steps.X}` that isn't declared in `dependsOn` is a **hard validation error** — the runtime catches the most common pipeline bug before a single agent runs.
405
499
 
500
+ > Unresolved interpolation refs (e.g. `{args.typo}` or a missing `dependsOn`) are surfaced as **phase warnings** (`PhaseState.warnings`) in the run record and `/tf runs`, so placeholders are no longer silently left intact.
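
One way to picture the warning behaviour: an interpolator that leaves an unresolved placeholder in place but records it. This is a simplified sketch with a hypothetical `interpolate` function and result shape; the real resolver and the exact warning text are not shown in this diff.

```typescript
interface InterpolationResult {
  text: string;
  warnings: string[];
}

// Resolve "{path.to.value}" refs against a context object (hypothetical, simplified).
function interpolate(template: string, context: Record<string, unknown>): InterpolationResult {
  const warnings: string[] = [];
  const text = template.replace(/\{([A-Za-z_][\w.]*)\}/g, (placeholder: string, path: string) => {
    let current: unknown = context;
    for (const key of path.split(".")) {
      const isObject = current !== null && typeof current === "object";
      if (!isObject || !Object.prototype.hasOwnProperty.call(current, key)) {
        // Keep the placeholder, but record the problem instead of staying silent.
        warnings.push(`unresolved ref ${placeholder}`);
        return placeholder;
      }
      current = (current as Record<string, unknown>)[key];
    }
    return typeof current === "string" ? current : JSON.stringify(current);
  });
  return { text, warnings };
}
```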
501
+
406
502
  ## Commands
407
503
 
408
504
  Saved flows become CLI shortcuts. All commands run in the Pi session:
@@ -412,12 +508,31 @@ Saved flows become CLI shortcuts. All commands run in the Pi session:
412
508
  | `/tf list` | List all saved flows |
413
509
  | `/tf run <name> [args]` | Run a saved flow (e.g. `/tf run summarize-files dir=src`) |
414
510
  | `/tf show <name>` | Print a flow's definition |
415
- | `/tf runs` | Browse recent run history (interactive TUI) |
511
+ | `/tf compile <name> [lr\|td]` | **Render the flow as a Mermaid diagram + verification overlay** — 0 tokens, no LLM; paste into a README/issue/PR |
512
+ | `/tf runs` | Browse recent run history (interactive TUI — **live auto-refreshes** while any run is active) |
416
513
  | `/tf resume <runId>` | Continue a paused/failed run — cached phases skip automatically |
417
514
  | `/tf init` | **Interactively map model roles** to your enabled models (writes `~/.pi/agent/settings.json`) |
418
515
  | `/tf:<name> [args]` | Shortcut — runs the flow in one tap |
419
516
 
420
- Tool actions (used by the model): `run` (inline `define` or saved `name`), `save`, `resume`, `list`, `init`.
517
+ Tool actions (used by the model): `run` (inline `define` or saved `name`), `save`, `resume`, `list`, `agents`, `init`, `verify`, `compile`, `cache-clear`.
518
+
519
+ ## Background (detached) execution
520
+
521
+ Pass `detach: true` to run a taskflow in a detached child process — the tool returns immediately with the `runId` and the flow continues running even if the host session exits:
522
+
523
+ ```jsonc
524
+ {
525
+ "action": "run",
526
+ "name": "nightly-audit",
527
+ "detach": true
528
+ }
529
+ ```
530
+
531
+ - The child process reads serialized context, calls the orchestration engine, and persists terminal state to the store.
532
+ - Status is polled via `/tf runs` (which now **auto-refreshes live** while any run is active) or `action: "resume"`.
533
+ - Stale PID detection via signal-0 probe; the idle watchdog kills stalled children.
534
+ - **Approval phases auto-reject** in detached mode — human gates are never silently bypassed.
535
+ - `resume` works normally after a detached run completes or fails.
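
The signal-0 probe mentioned above is a standard Node.js idiom: `process.kill(pid, 0)` delivers no signal and only checks whether the target exists. A minimal sketch follows; `isProcessAlive` is a hypothetical name, and how the runtime reacts to a stale PID is not shown here.

```typescript
// Returns true if a process with this PID appears to exist.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0: existence/permission check only, nothing is delivered
    return true;
  } catch (err) {
    // EPERM: the process exists but we lack permission to signal it.
    // ESRCH (or anything else): treat the PID as stale.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}
```

A probe like this cannot tell whether the PID was reused by an unrelated process, which is presumably why it is paired with the idle watchdog rather than used alone.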
421
536
 
422
537
  ## Resume across sessions
423
538
 
@@ -432,12 +547,13 @@ Resume is keyed on each phase's input hash — if an upstream output changed, de
432
547
  ## Storage
433
548
 
434
549
  ```
435
- .pi/taskflows/<name>.json # project-scoped definitions (commit to share)
436
- ~/.pi/agent/taskflows/<name>.json # user-scoped definitions
550
+ .pi/taskflows/<name>.json # project-scope definitions (commit to share)
551
+ ~/.pi/agent/taskflows/<name>.json # user-scope definitions
437
552
  .pi/taskflows/runs/<flowName>/<runId>.json # run state for resume (gitignore this)
553
+ .pi/taskflows/cache/ # cross-run memoization cache (gitignored)
438
554
  ```
439
555
 
440
- > Commit `.pi/taskflows/` and your whole team shares the pipelines — no config sync, no onboarding doc. Run state is written atomically and guarded by a zero-dependency file lock, so concurrent runs never corrupt the index.
556
+ > Commit `.pi/taskflows/` and your whole team shares the pipelines — no config sync, no onboarding doc. Run state is written atomically via `writeFileAtomic()` (temp file + `renameSync`) and guarded by a zero-dependency file lock (`O_CREAT|O_EXCL` with stale-lock steal via atomic rename), so concurrent runs never corrupt the index.
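
The two primitives named above (temp file plus `renameSync`, and an exclusive-create lock) can be sketched with Node built-ins alone. This is an illustrative sketch, not the package's `writeFileAtomic()`: the function names, temp-file naming, and demo paths are assumptions, and the stale-lock steal is omitted.

```typescript
import { closeSync, mkdtempSync, openSync, readFileSync, renameSync, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Write to a temp file next to the target, then rename over it.
// A rename within one filesystem is atomic on POSIX, so readers never see a partial file.
function writeFileAtomicSketch(target: string, data: string): void {
  const tmp = `${target}.${process.pid}.${Date.now()}.tmp`;
  writeFileSync(tmp, data);
  renameSync(tmp, target);
}

// The "wx" flag maps to O_CREAT|O_EXCL: creation fails with EEXIST if the lock file already exists.
function tryLock(lockPath: string): boolean {
  try {
    closeSync(openSync(lockPath, "wx"));
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err;
  }
}

function unlock(lockPath: string): void {
  unlinkSync(lockPath);
}

// Usage: take the lock, write state atomically, release the lock.
const dir = mkdtempSync(join(tmpdir(), "tf-sketch-"));
const statePath = join(dir, "run.json");
const lockPath = join(dir, "run.lock");
const gotLock = tryLock(lockPath); // first attempt creates the lock file
const secondAttempt = tryLock(lockPath); // second attempt sees EEXIST
writeFileAtomicSketch(statePath, JSON.stringify({ status: "running" }));
unlock(lockPath);
const roundTrip = readFileSync(statePath, "utf8");
```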
441
557
 
442
558
  Agent discovery scope (via `agentScope` in the flow definition):
443
559
 
@@ -447,6 +563,8 @@ Agent discovery scope (via `agentScope` in the flow definition):
447
563
  | `"project"` | `.pi/agents/*.md` (walks up the tree) |
448
564
  | `"both"` | user + project; project wins on name collision |
449
565
 
566
+ Run cleanup is configurable via `maxKeptRuns` and `maxRunAgeDays` in settings.
567
+
450
568
  ## Agents
451
569
 
452
570
  Taskflow ships **18 built-in agents** — each a `.md` file with a tuned system prompt, thinking level, and tool set. You can reference them by `name` in any phase or shorthand, right after install. No setup required.
@@ -601,6 +719,8 @@ Ready-to-read definitions in [`examples/`](./examples):
601
719
  | [`summarize-files.json`](./examples/summarize-files.json) | discover → `map` fan-out → `reduce` |
602
720
  | [`conditional-research.json`](./examples/conditional-research.json) | `when` routing + `join: any` + `gate` + `budget` |
603
721
  | [`guarded-refactor.json`](./examples/guarded-refactor.json) | `approval` (human-in-the-loop) + `retry` + `gate` |
722
+ | [`dynamic-plan-execute.json`](./examples/dynamic-plan-execute.json) | `flow { def }` — plan then execute at runtime |
723
+ | [`iterative-replan.json`](./examples/iterative-replan.json) | `loop` + `flow { def }` — iterative replanning |
604
724
 
605
725
  Copy one into `.pi/taskflows/<name>.json` (or `~/.pi/agent/taskflows/`) and it registers as `/tf:<name>` — or just point the model at it.
606
726
 
@@ -608,13 +728,13 @@ Copy one into `.pi/taskflows/<name>.json` (or `~/.pi/agent/taskflows/`) and it r
608
728
 
609
729
  <div align="center">
610
730
 
611
- **0 runtime dependencies** · **608 tests** · **9 phase types** · **cross-session resume** · **cross-run memoization** · **~7.7k LOC runtime**
731
+ **0 runtime dependencies** · **702 tests** · **9 phase types** · **shared context tree** · **cross-session resume** · **cross-run memoization** · **detached execution** · **`compile` Mermaid renderer** · **~9k LOC runtime**
612
732
 
613
733
  </div>
614
734
 
615
735
  - **Zero runtime dependencies.** No `dependencies` field — the runtime is built entirely on Node built-ins (`fs` / `path` / `os` / `child_process` / `crypto`). The file lock is `fs.openSync("wx")`, not a third-party library.
616
- - **608 tests across 26 test files** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, live run-history refresh, callback isolation, the idle watchdog, model-role init config, and parseModelFromLabel with parenthesized-model-name regression.
617
- - **Hardened by design.** Path-traversal defense (lexical + `realpath`), runId validation, HTML/error sanitization, atomic writes, stale-lock stealing via `rename`, and an idle watchdog that kills wedged subagents.
736
+ - **702 tests across 34 test files** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, the shared context tree (blackboard reuse, supervision spawn, subflow validation/nesting), workspace isolation (temp/dedicated/worktree lifecycle, fail-open degrade, dynamic-flow rejection), dynamic sub-flow security hardening, detached execution (PID persistence, stale detection, crash→failed, resume after failure), live run-history refresh, callback isolation, the idle watchdog, model-role init config, `parseModelFromLabel` with parenthesized-model-name regression, multi-fence `safeParse` recovery, and the `compile` Mermaid renderer (id-collision disambiguation, markdown-injection hardening, and full verify-overlay category coverage).
737
+ - **Hardened by design.** Path-traversal defense (lexical + `realpath` containment check), runId validation, HTML/error sanitization, atomic writes, stale-lock stealing via `rename`, and an idle watchdog that kills wedged subagents (SIGTERM → SIGKILL after 5 minutes of silence). Dynamic sub-flows additionally get breadth caps, `cwd` containment, budget clamping, nesting depth caps, and prototype-pollution defense.
618
738
  - **Dogfooded.** Every new feature has to survive the project's own `self-improve` taskflow before it ships.
619
739
 
620
740
  ## 🍽️ We eat our own dog food
@@ -625,41 +745,50 @@ Our `self-improve` flow is a 10-phase DAG — it audits the codebase, patches de
625
745
 
626
746
  | Campaign | Scale | Phases | Outcome |
627
747
  |----------|-------|--------|---------|
628
- | [v0.0.8 dogfood](./docs/dogfooding-v0.0.8-report.md) | Full codebase audit → triage → fix → verify | 10 phases, 234 tests | 13 fixes, all pass |
629
- | [v0.0.6 self-audit](./docs/self-audit-report.md) | inventory → map audit → gate → approval → map fix → reduce | 9 phases | 11 critical defects fixed |
630
- | [Cross-run cache dogfood](./docs/rfc-cross-run-memoization.md) | Real runtime + on-disk store | Dedicated test harness | Cache correctness under adversarial fingerprints |
631
- | [Adversarial cross-review](./docs/brainstorm-adversarial-review-report.md) | Multi-agent adversarial review | `tournament` + `gate` | P0 cache-key fix shipped |
632
- | [Init redesign review](./docs/issue-necessity-review-report.md) | Necessity audit → parallel checks → verdict | 7 phases | Full redesign plan validated |
633
- | [Round 2 adversarial audit](./docs/internal/dogfooding-report.md) | Phase-by-phase DAG execution — 12 findings across runner/runtime/interpolate/verify | 14 phases | 10 fixes applied, 0 regressions |
748
+ | [v0.0.8 dogfood](./docs/internal/dogfooding-v0.0.8-report.md) | Full codebase audit → triage → fix → verify | 10 phases, 234 tests | 13 fixes, all pass |
749
+ | [v0.0.6 self-audit](./docs/internal/self-audit-report.md) | inventory → map audit → gate → approval → map fix → reduce | 9 phases | 11 critical defects fixed |
750
+ | [Cross-run cache dogfood](./docs/internal/rfc-cross-run-memoization.md) | Real runtime + on-disk store | Dedicated test harness | Cache correctness under adversarial fingerprints |
751
+ | [Adversarial cross-review](./docs/internal/brainstorm-adversarial-review-report.md) | Multi-agent adversarial review | `tournament` + `gate` | P0 cache-key fix shipped |
752
+ | [Init redesign review](./docs/internal/issue-necessity-review-report.md) | Necessity audit → parallel checks → verdict | 7 phases | Full redesign plan validated |
753
+ | [Round 2 adversarial audit](./docs/internal/dogfooding-report.md) | Integration layer + cross-module — 12 findings across runner/runtime/interpolate/verify | 14 phases | 10 fixes applied, 0 regressions |
634
754
  | [Round 3 adversarial audit](./docs/internal/dogfooding-report.md) | Integration layer + cross-module — 10 findings across index/agents/cache/render/runs-view | 9 phases | 10 fixes applied, 0 regressions |
755
+ | [v0.0.23 Shared Context Tree](./docs/internal/dogfooding-report.md) | End-to-end validation: org-tree spawn, 5-way audit via loop+gate | 6 e2e runs | Spawn-drain bug fixed, 50 new tests |
635
756
 
636
757
  > **Meta:** we used `pi-taskflow`'s `map` fan-out, `gate` verdicts, `approval` human-in-the-loop, `tournament` best-of-N, `loop` until-done, and `cross-run` cache — to build `pi-taskflow`.
637
758
 
638
759
  ## Status & limits
639
760
 
640
- **v0.0.20** — loop-until-done (`loop` phase: iterate to a condition, convergence, or cap), tournament (best-of-N with a judge), cross-run memoization (content-addressed cache with git/file/glob/env fingerprints and TTL), interactive `/tf init` with role-aware model pickers + diff preview + atomic merge-write, configurable built-in agents, 18 built-in agents with 6 model roles. Full control-flow & reliability layer (`when` guards, `join: any`, `retry`/backoff, `approval`, `flow` composition, `budget` caps, idle watchdog) on top of the DSL + DAG runtime (`agent`/`parallel`/`map`/`gate`/`reduce`). Inline + saved flows, cross-session resume, live progress, and isolated context. A run executes as one streaming tool call.
761
+ **v0.0.23** — **Shared Context Tree**: opt-in (`shareContext` / `contextSharing`) blackboard + supervision tools (`ctx_read`/`ctx_write` horizontal reuse, `ctx_report`/`ctx_spawn` vertical supervision); `ctx_spawn` accepts a flat task **or** a dependency-bearing `subflow` (a runtime-validated nested DAG), depth-capped on a unified nesting counter with budget accounting. **Workspace isolation**: a phase's `cwd` accepts reserved keywords `temp`/`dedicated`/`worktree` — the runtime allocates an isolated dir (or a git worktree on a throwaway branch) and tears it down after the phase, fail-open, rejected in LLM-authored sub-flows. Prior: loop-until-done (`loop`), tournament (best-of-N with a judge), cross-run memoization (content-addressed cache with git/file/glob/env fingerprints and TTL), interactive `/tf init`, configurable built-in agents, 18 built-in agents with 6 model roles. Full control-flow & reliability layer (`when` guards, `join: any`, `retry`/backoff, `approval`, `flow` composition, `budget` caps, `onBlock: "retry"`, `eval` machine gates, idle watchdog) on top of the DSL + DAG runtime (`agent`/`parallel`/`map`/`gate`/`reduce`). Inline + saved flows, cross-session resume, live progress, and isolated context. A run executes as one streaming tool call.
641
762
 
642
763
  Known boundaries (tracked, bounded — no surprises mid-flow):
643
764
 
644
- - **Detached background execution (new).** Add `detach: true` to `action: "run"` to spawn the flow in a detached child process. The tool returns immediately with the `runId`; the flow continues running even if the host session exits. Status is polled via the store (`/tf runs` or `action: "resume"`). Approval phases auto-reject in detached mode.
765
+ - **Shared context is opt-in.** Subagents share nothing unless a phase sets `shareContext` (or the flow sets `contextSharing`). The blackboard is per-run, file-based, size-bounded, and cleaned up with the run. Spawn nesting is capped at `MAX_DYNAMIC_NESTING` (5). A spawned flat task is not individually checkpointed; after a crash it re-runs on resume (spawned *subflows* resume their completed inner phases via the cache).
766
+ - **Workspace isolation is fail-open.** `cwd: "worktree"` requires the base cwd to be a git work tree; otherwise it degrades to a `temp` dir (with a warning). `temp`/`worktree` dirs are removed when the phase ends — a hard crash mid-phase may leave a stray dir (cleaned on the next run for `dedicated`; `temp`/`worktree` are under the OS tmpdir). The reserved keywords are honoured only in author-written flows.
645
767
  - **No `output: "file"`.** Outputs are text/JSON only — write files via an agent's `write` tool call.
646
768
  - **`map` requires a JSON array.** The `over` field must resolve to a `{steps.ID.json}` array. Wrap a text list in a single-agent `output: "json"` phase first.
647
769
  - **The DAG must be acyclic.** Cycles are rejected at validation.
770
+ - **Cross-run cache excludes `gate`, `approval`, `loop`, and `tournament`.** These must produce a fresh result each run.
771
+ - **Approval auto-rejects in detached mode.** This is a safety invariant — approval gates are never silently bypassed.
648
772
 
649
773
  ## Development
650
774
 
651
775
  ```bash
652
776
  npm install
653
- npm run typecheck
654
- npm test # unit tests — no network, no process spawning
655
- npm run test:e2e # real end-to-end (spawns live subagents; needs model access)
777
+ npm run typecheck # tsc --noEmit
778
+ npm test # unit tests — no network, no process spawning
779
+ npm run test:e2e # real end-to-end (spawns live subagents; needs model access)
780
+ npm run test:e2e-context
781
+ npm run test:e2e-context-value
782
+ npm run test:e2e-team
783
+ npm run test:e2e-spawn-subflow
784
+ npm run test:dogfood-cache
656
785
  ```
657
786
 
658
787
  Runtime lives in `extensions/`, tests in `test/`, and runnable examples in `examples/`.
659
788
 
660
789
  ## Contributing
661
790
 
662
- Contributions welcome — this is a young, fast-moving project. Open an issue or PR on [GitHub](https://github.com/heggria/pi-taskflow). Good first contributions: new example flows, phase-type ideas, and TUI polish.
791
+ Contributions welcome — this is a young, fast-moving project. Open an issue or PR on [GitHub](https://github.com/heggria/pi-taskflow). Good first contributions: new example flows, phase-type ideas, and TUI polish. See [`CONTRIBUTING.md`](./CONTRIBUTING.md) and [`AGENTS.md`](./AGENTS.md) for coding conventions and common task recipes.
663
792
 
664
793
  ## License
665
794
 
package/README.zh-CN.md CHANGED
@@ -8,7 +8,7 @@
8
8
  <a href="./LICENSE"><img src="https://img.shields.io/badge/license-MIT-43D9AD?style=flat-square" alt="MIT license"></a>
9
9
  <a href="#whats-inside"><img src="https://img.shields.io/badge/runtime%20deps-0-43D9AD?style=flat-square" alt="zero runtime dependencies"></a>
10
10
  <a href="https://github.com/heggria/pi-taskflow/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/heggria/pi-taskflow/ci.yml?branch=main&style=flat-square&label=CI" alt="CI status"></a>
11
- <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-535-6E8BFF?style=flat-square" alt="535 tests"></a>
11
+ <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-702-6E8BFF?style=flat-square" alt="702 tests"></a>
12
12
  <a href="#whats-inside"><img src="https://img.shields.io/badge/dogfooded-%E2%9C%93-43D9AD?style=flat-square" alt="dogfooded"></a>
13
13
  <a href="https://pi.dev"><img src="https://img.shields.io/badge/for-Pi%20coding%20agent-B692FF?style=flat-square" alt="for the Pi coding agent"></a>
14
14
  </p>