npm - pi-taskflow - Versions diffs - 0.0.17 → 0.0.18 - Mend

pi-taskflow 0.0.17 → 0.0.18

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/CHANGELOG.md +36 -0
package/README.md +57 -24
package/README.zh-CN.md +50 -17
package/examples/dynamic-plan-execute.json +34 -0
package/examples/iterative-replan.json +30 -0
package/extensions/index.ts +2 -2
package/extensions/runtime.ts +169 -11
package/extensions/schema.ts +56 -1
package/extensions/store.ts +5 -1
package/package.json +3 -4
package/skills/taskflow/SKILL.md +38 -4

package/CHANGELOG.md

@@ -2,6 +2,42 @@
 All notable changes to pi-taskflow are documented here. This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) format.
+## [0.0.18] — 2026-06-09
+### Added
+- **Runtime dynamic sub-flows — `flow { def }`.** A `flow` phase may now carry an inline `def` (mutually exclusive with `use`) that is resolved at runtime — typically from an upstream phase's JSON output (`"def": "{steps.plan.json}"`) — validated, verified, and executed as a nested sub-flow. This is the declarative answer to code-mode `for`/`if`: a planner decides *at runtime* what work to spawn, and every generated plan is structurally checked (cycles / dangling refs / duplicate ids / dead-ends) before it spends a token.
+  - Accepts a full Taskflow `{name,phases}`, a bare `phases` array, or `{phases:[...]}` (markdown ```json fences tolerated). Pure data — no `eval`.
+  - **Iterative replanning**: pair with `loop` so round N's plan depends on round N-1's *result* (not a one-shot fan-out).
+  - **Fail-open**: a malformed/invalid/unverifiable def never aborts the run — the phase resolves as a no-op with a `defError` diagnostic and upstream output is preserved. An empty `phases` array is a valid no-op.
+  - New examples: `examples/dynamic-plan-execute.json`, `examples/iterative-replan.json`.
+### Security
+- **Hardening for runtime-generated (untrusted) sub-flows**, enforced only when content is LLM-authored:
+  - Breadth caps: `MAX_DYNAMIC_PHASES` (100), `MAX_DYNAMIC_CONCURRENCY` (16, flow- and phase-level), `MAX_DYNAMIC_MAP_ITEMS` (200, fan-out truncated not blocked).
+  - `cwd` containment: a generated phase cannot escape the run directory.
+  - Budget clamp: a generated def's budget is clamped to `min(child, parent)` per dimension — it can only ever be tighter, never looser.
+  - Nesting cap: `MAX_DYNAMIC_NESTING` (5) bounds inline self-spawning depth.
+  - Prototype-pollution defense: inline defs are deep-cloned and `__proto__`/`constructor`/`prototype` own-keys are stripped.
+- Authored/saved flows (`use`) are unchanged and not subject to these dynamic caps.
+### Notes
+- 25 new tests (`test/flow-def.test.ts`); 560 total, zero regression. Design + two-round cross-adversarial review (engineering-risk / design-critic / architecture / security) recorded under `docs/internal/`.
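
The budget clamp in the Security list above ("clamped to `min(child, parent)` per dimension") can be sketched roughly as follows. This is a minimal illustration, not the package's code: the `Budget` shape is taken from the `budget: { maxUSD?, maxTokens? }` keys in the README, while the names `clampDimension` and `clampBudget`, and the handling of unset limits, are assumptions.

```typescript
// Budget dimensions as named in the README; everything else here is hypothetical.
interface Budget {
  maxUSD?: number;
  maxTokens?: number;
}

// Clamp a single dimension. Assumption: an unset child inherits the parent's
// limit, and an unset parent leaves the child's limit as it is.
function clampDimension(child?: number, parent?: number): number | undefined {
  if (parent === undefined) return child;
  if (child === undefined) return parent;
  return Math.min(child, parent);
}

// Hypothetical helper: the generated (child) budget can only get tighter.
function clampBudget(child: Budget, parent: Budget): Budget {
  return {
    maxUSD: clampDimension(child.maxUSD, parent.maxUSD),
    maxTokens: clampDimension(child.maxTokens, parent.maxTokens),
  };
}

// A generated def asks for $50, but the parent run only allows $5.
const clamped = clampBudget({ maxUSD: 50, maxTokens: 10000 }, { maxUSD: 5 });
console.log(clamped);
```

Clamping each dimension separately means a generated plan cannot trade a loose dollar cap for a tight token cap, or the reverse.
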
+## [0.0.17] — 2026-06-09
+### Fixed
+- **28 fixes from 3-round adversarial dogfooding across 11 files.**
+- **store.ts**: validateRunId path-traversal guard in saveRun, cleanupTerminalRuns race condition mtime guard, saveFlow file locking (prevents concurrent write loss), saveFlow unified sanitization via safeFlowDirName, SharedArrayBuffer hoisted to module scope, empty flow name rejection, conditional .pi/ creation hint.
+- **runner.ts**: signal kill detection (killedBySignal), idle timeout excluded from transient error retry, message cap (500) with truncation notice, stderr cap (64KB) with truncation notice.
+- **runtime.ts**: loop abort semantics (stop: "aborted"), failed phase interpolation (sensible placeholder instead of raw template), tournament judge budget/abort guard, retry factor asymmetry documentation.
+- **interpolate.ts**: tokenizer escaped quote handling (character-by-character loop), graceful dig() trailing path segment resolution.
+- **index.ts**: /tf save and /tf verify tab completion, JSON string define parsing in renderCall label, escaped quote handling in parseArgsString.
+- **agents.ts**: YAML tools type validation (reject non-string/array), atomic writeFileAtomic in syncBuiltinAgentsToProject.
+- **cache.ts**: 30s timeout on execFileSync git calls.
+- **verify.ts**: budget maxUSD overflow detection.
+- **render.ts**: consistent numerator/denominator in summarizeRun.
+- **runs-view.ts**: timeAgo negative timestamp guard, blocked status removed from isResumable.
 ## [0.0.16] — 2026-06-09
 ### Added

package/README.md

@@ -1,6 +1,6 @@
 <div align="center">
-<img src="./assets/hero.png" alt="pi-taskflow — declarative DAG orchestration for Pi subagents: stateful, resumable, context-isolated" width="900">
+<img src="./assets/hero.png" alt="pi-taskflow — a declarative, verifiable graph of task nodes for Pi subagents: stateful, resumable, context-isolated" width="900">
 <p>
   <a href="https://www.npmjs.com/package/pi-taskflow"><img src="https://img.shields.io/npm/v/pi-taskflow?style=flat-square&color=B692FF&label=npm" alt="npm version"></a>
@@ -8,7 +8,7 @@
   <a href="./LICENSE"><img src="https://img.shields.io/badge/license-MIT-43D9AD?style=flat-square" alt="MIT license"></a>
   <a href="#whats-inside"><img src="https://img.shields.io/badge/runtime%20deps-0-43D9AD?style=flat-square" alt="zero runtime dependencies"></a>
   <a href="https://github.com/heggria/pi-taskflow/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/heggria/pi-taskflow/ci.yml?branch=main&style=flat-square&label=CI" alt="CI status"></a>
-  <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-524-6E8BFF?style=flat-square" alt="524 tests"></a>
+  <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-535-6E8BFF?style=flat-square" alt="535 tests"></a>
   <a href="#whats-inside"><img src="https://img.shields.io/badge/dogfooded-%E2%9C%93-43D9AD?style=flat-square" alt="dogfooded"></a>
   <a href="https://pi.dev"><img src="https://img.shields.io/badge/for-Pi%20coding%20agent-B692FF?style=flat-square" alt="for the Pi coding agent"></a>
 </p>
@@ -16,16 +16,16 @@
 <p align="center">
   <b>English</b> ·
   <a href="./README.zh-CN.md">简体中文</a> ·
-  <a href="./README.hi.md">हिन्दी</a> ·
-  <a href="./README.es.md">Español</a> ·
-  <a href="./README.ar.md">العربية</a> ·
-  <a href="./README.bn.md">বাংলা</a> ·
-  <a href="./README.pt.md">Português</a> ·
-  <a href="./README.ru.md">Русский</a>
+  <a href="./docs/i18n/README.hi.md">हिन्दी</a> ·
+  <a href="./docs/i18n/README.es.md">Español</a> ·
+  <a href="./docs/i18n/README.ar.md">العربية</a> ·
+  <a href="./docs/i18n/README.bn.md">বাংলা</a> ·
+  <a href="./docs/i18n/README.pt.md">Português</a> ·
+  <a href="./docs/i18n/README.ru.md">Русский</a>
 </p>
-<p><strong>Declarative DAG orchestration for <a href="https://pi.dev">Pi</a> subagents.</strong><br/>
-Fan out · gate · resume · save as a command — intermediate results stay out of your context.</p>
+<p><strong>A declarative, verifiable <em>graph of tasks</em> for <a href="https://pi.dev">Pi</a> subagents.</strong><br/>
+Not a workflow you script — a DAG you declare. Fan out · gate · resume · save as a command — intermediate results stay out of your context.</p>
 ```bash
 pi install npm:pi-taskflow
@@ -35,23 +35,38 @@ pi install npm:pi-taskflow
 ---
-**Subagents are fire-and-forget. Taskflows fire, fan out, pause, gate, resume, and save themselves as a command.**
+**A `workflow` flows. A `taskflow` is a *graph*.** Other orchestrators let the model *script* the work — imperative code that flows step by step, with the graph hidden inside control flow. `pi-taskflow` does the opposite: you **declare** the work as a graph of discrete, named **task** nodes connected by `dependsOn` edges — and the runtime *verifies that graph before it spends a single token.*
 You already know the built-in subagent tool's `task` / `tasks` / `chain`. `pi-taskflow` speaks the *same* shorthand — so your existing delegations instantly become **tracked, resumable, and saveable as a one-word `/tf:<name>` command**. When you outgrow the shorthand, the full DSL gives you a real DAG: dynamic fan-out over dozens of items, conditional routing, quality gates, human approvals, retries, and a hard spend ceiling.
 And the whole time, **only the final phase reaches your conversation.** Every intermediate transcript stays in the runtime, never your context window.
+## Why "taskflow" and not "workflow"?
+The name is the thesis. In engineering, a **task** is a *discrete, declared unit of work* — the node of a task graph (the same `task` a build system, scheduler, or compiler wires into a DAG). **Work**, by contrast, is *fluid and unbounded* — the continuous, imperative act of doing.
+That distinction is exactly the design split in the Pi ecosystem:
+<div align="center">
+<img src="./assets/task-vs-work.png" alt="work is a fluid imperative script whose graph hides in control flow and can't be verified before it runs; a taskflow is a declarative graph of discrete task nodes that is statically verified before any token is spent" width="900">
+</div>
+- A **`workflow`** (the dynamic, code-mode kind) is the model writing an **imperative script** that *flows*: `await agent(...)`, an `if`, a `for`, another `await`. Expressive — it's Turing-complete — but the graph only exists *as the code runs*. You can't see it, diff it, or prove it terminates before you pay for it.
+- A **`taskflow`** moves the plan **out of code and into a declarative graph of `task` nodes.** Because the graph is *data*, the runtime can do what an imperative script structurally cannot: **statically verify it** (no cycles, no dead ends, no budget overflow, no dangling refs) before a single subagent spawns, **render it** (the live progress *is* the DAG), **resume it** phase-by-phase, and **save it** as a one-word command.
+> **The trade we make on purpose:** we give up the raw expressivity of arbitrary code to gain something an imperative script can't have — a graph that is **verifiable, observable, replayable, and safe to generate with an LLM.** When a job needs twelve steps with branching fan-out and a review gate, you want a graph you can *check* — not a script you *hope* runs right.
 ## Why this exists
-Here's the wall you hit with raw subagents: you describe a multi-step plan in prose, the model re-derives it every single run, the intermediate transcripts flood your context, and the moment one model call fails you start over from zero. There's no reuse, no recovery, no structure.
+Here's the wall you hit with raw subagents: you describe a multi-step plan in prose, the model re-derives it every single run, the intermediate transcripts flood your context, and the moment one model call fails you start over from zero. There's no reuse, no recovery, no structure — and no way to *check* the plan before it burns tokens.
-`pi-taskflow` moves the plan **out of the prompt and into a declarative definition.** The runtime owns the DAG, the loops, the retries, and the intermediate state. You declare a pipeline once and run it a hundred times — by name.
+`pi-taskflow` moves the plan **out of the prompt and into a declarative graph of task nodes.** The runtime owns the DAG, the loops, the retries, and the intermediate state. You declare a pipeline once and run it a hundred times — by name. Because the plan is data, not prose and not code, it can be **validated, visualized, and replayed.**
 <div align="center">
 <img src="./assets/context-isolation.png" alt="With raw subagents every transcript floods your context; with pi-taskflow transcripts stay in the runtime and only the final result returns" width="900">
 </div>
-> When a job needs twelve steps with branching fan-out and a review gate, you want orchestration — not lucky prompting.
+> Twelve steps, branching fan-out, a review gate, a spend cap — that's a graph, and you want to *see and check* it, not re-prompt it every run.
 | | subagent (built-in) | **pi-taskflow** |
 |---|---|---|
@@ -70,7 +85,23 @@ Here's the wall you hit with raw subagents: you describe a multi-step plan in pr
 | **Live progress** | opaque while running | **live DAG render with timing + cost** |
 | **Ergonomics** | inline JSON each time | **shorthand (`task`/`tasks`/`chain`) *or* DSL** |
-It doesn't replace the subagent tool. It gives your subagents a DAG, a memory, and a name.
+It doesn't replace the subagent tool. It gives your subagents a **graph**, a memory, and a name.
+## Declarative graph vs. imperative script
+The closest thing to `pi-taskflow` in spirit is the **dynamic / code-mode workflow** — where the model writes a JavaScript orchestration script. It's powerful and genuinely expressive. But it sits at the *opposite* end of one fundamental axis: **expressivity vs. verifiability.**
+| | dynamic `workflow` (code-mode) | **`pi-taskflow`** (declarative graph) |
+|---|---|---|
+| **The plan is** | imperative JS the model writes & runs | **declarative JSON data the runtime executes** |
+| **The graph** | implicit — hidden in `if`/`for`/`await` control flow | **explicit — `phases[]` + `dependsOn` edges, a first-class object** |
+| **Verify before running** | ✗ Turing-complete; can't prove it terminates | **✓ static checks: no cycles, dead-ends, budget overflow, dangling refs** |
+| **See it** | ✗ the graph only exists as the code runs | **✓ the live progress render *is* the DAG** |
+| **Resume** | coarse (call-cache dedup) | **✓ phase-by-phase input-hash resume, cross-session** |
+| **Safe to LLM-generate** | risky — it's executable code | **✓ it's just data — no `eval`; and a runtime-generated sub-flow is *structurally validated* (cycles / dangling refs / duplicate ids) before it runs** |
+| **Expressivity ceiling** | **higher** — arbitrary control flow | bounded by the DSL, but `map`/`when`/`loop`/`gate` — plus **runtime-generated sub-flows (`flow {def}`)** for plan-then-execute and iterative replanning — cover most jobs |
+We chose the **verifiable** side on purpose. The expressivity you give up is real; what you get back — a plan you can check, watch, replay, and safely let a model author — is what turns one-off prompting into durable orchestration.
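
To make the "static checks" row above concrete, here is a minimal sketch of checking a phase list for duplicate ids, dangling `dependsOn` refs, and cycles before anything runs. It is an illustration under assumptions, not the package's verifier: `PhaseNode` and `verifyGraph` are hypothetical names, cycle detection uses Kahn's algorithm as one possible technique, and the dead-end and budget checks mentioned in the README are left out.

```typescript
// Hypothetical minimal phase shape: only the fields the check needs.
interface PhaseNode {
  id: string;
  dependsOn?: string[];
}

// Returns human-readable problems; an empty array means the graph passed.
function verifyGraph(phases: PhaseNode[]): string[] {
  const problems: string[] = [];
  const ids = new Set<string>();
  const uniqueIds: string[] = [];
  for (const p of phases) {
    if (ids.has(p.id)) {
      problems.push(`duplicate id: ${p.id}`);
    } else {
      ids.add(p.id);
      uniqueIds.push(p.id);
    }
  }

  // Build in-degree counts and reverse edges, reporting refs to unknown ids.
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const id of uniqueIds) {
    indegree.set(id, 0);
    dependents.set(id, []);
  }
  for (const p of phases) {
    for (const dep of p.dependsOn ?? []) {
      if (!ids.has(dep)) {
        problems.push(`dangling ref: ${p.id} -> ${dep}`);
        continue;
      }
      indegree.set(p.id, (indegree.get(p.id) ?? 0) + 1);
      dependents.get(dep)!.push(p.id);
    }
  }

  // Kahn's algorithm: keep removing nodes whose dependencies are all satisfied.
  // Any node that is never removed sits on (or behind) a cycle.
  const ready = uniqueIds.filter((id) => indegree.get(id) === 0);
  let visited = 0;
  while (ready.length > 0) {
    const id = ready.pop()!;
    visited++;
    for (const next of dependents.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (visited < uniqueIds.length) problems.push("cycle detected");
  return problems;
}
```

Because the input is plain data, the whole check is a bounded graph traversal; nothing in the plan has to be executed to run it.
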
 ## Compared to other Pi extensions
@@ -95,12 +126,12 @@ The Pi ecosystem now has **20+ delegation, workflow, and orchestration extension
 - **`@pi-agents/orchid`** is the most feature-complete orchestrator in the ecosystem (DAG + worktrees + Ralph loop + agent mailbox) — but its DSL is a *fixed* 9-phase pipeline, it carries runtime deps + jiti, and it's beta. Reach for `pi-taskflow` when you want to **define your own graph** (not adopt an opinionated one) with **zero dependencies** and a one-command install.
 - **`pi-crew` / `ultimate-pi`** go heavier — worktree isolation, durable async teams, multi-tier governance. If you want lightweight, declarative, and zero-dependency, that's this project.
-- **`@zhushanwen/pi-workflow`** is the closest in spirit and also zero-dep, but you author workflows as **JavaScript scripts**. `pi-taskflow`'s **declarative JSON DSL** is safer and more auditable, and its **phase-level input-hash resume** is more granular than call-cache dedup.
+- **`@zhushanwen/pi-workflow`** is the closest in spirit and also zero-dep, but it's the **imperative** side of the split above: you author workflows as **JavaScript scripts** the model writes and runs. `pi-taskflow`'s **declarative JSON DAG** is the verifiable side — statically checkable, visualizable, safe to LLM-generate, and resumable at phase granularity rather than call-cache dedup.
 - **`@fiale-plus/pi-rogue-orchestration`** has a real **loop-until-done** (a feature `pi-taskflow` doesn't yet have). If your job is "keep going until the goal is met," it's worth a look; `pi-taskflow` is for *structured, branching* pipelines instead.
 - **`pi-subagents` / `@gotgenes/pi-subagents`** are the mature picks for ad-hoc "use reviewer on this diff" delegation and background jobs. `pi-taskflow` is for when those delegations need to become a *repeatable, resumable pipeline*.
 - **`pi-pipeline` / `pi-agent-flow`** ship *opinionated, fixed* flows. `pi-taskflow` ships an *empty canvas*: you (or the model) declare the graph that fits the job.
-> The honest one-liner: **`pi-taskflow` is the only Pi extension that gives you a declarative, resumable, DAG-shaped subagent pipeline you save as a one-word command — with zero runtime dependencies and context isolation by design.** The known gaps it's closing next: loop-until-done, worktree isolation, and non-blocking background runs (see [`STRATEGY.md`](./STRATEGY.md)).
+> The honest one-liner: **`pi-taskflow` is the only Pi extension that gives you a *declarative, verifiable, resumable* DAG of task nodes — saved as a one-word command, with zero runtime dependencies and context isolation by design.** Where code-mode workflows let the model *script* the work, `pi-taskflow` lets it *declare a graph the runtime can prove correct before running.* The known gaps it's closing next: loop-until-done, worktree isolation, and non-blocking background runs (see [`STRATEGY.md`](./STRATEGY.md)).
 ## 30-second start
@@ -241,7 +272,7 @@ No scripting. No `eval`. Just data the runtime executes — safe enough to run L
 | `gate` | quality/review step that can **halt the flow** | `task` |
 | `reduce` | aggregate `from[]` phase outputs into one | `from`, `task` |
 | `approval` | **human-in-the-loop** pause — approve / reject / edit | — |
-| `flow` | run a **saved sub-flow** as one phase (composition) | `use` |
+| `flow` | run a **sub-flow** as one phase — a **saved** flow (`use`) or a **runtime-generated** one (`def`) | `use` \| `def` |
 | `loop` | **iterate a task until done** — re-run a body until a condition, convergence, or a cap | `task`, `until` |
 | `tournament` | **N variants compete**, a judge picks the best (or aggregates) | `task` \| `branches` |
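
Since a `def` is LLM-authored JSON, the 0.0.18 changelog lists a prototype-pollution defense: inline defs are deep-cloned and `__proto__` / `constructor` / `prototype` own-keys are stripped. A minimal sketch of that idea, assuming plain JSON input; `sanitizeDef` and `FORBIDDEN_KEYS` are hypothetical names, not the package's API.

```typescript
const FORBIDDEN_KEYS = new Set(["__proto__", "constructor", "prototype"]);

// Deep-clone plain JSON data, dropping forbidden own-keys at every level.
function sanitizeDef(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sanitizeDef);
  if (value !== null && typeof value === "object") {
    const clean: Record<string, unknown> = {};
    for (const key of Object.keys(value)) {
      if (FORBIDDEN_KEYS.has(key)) continue; // never copy a pollution vector
      clean[key] = sanitizeDef((value as Record<string, unknown>)[key]);
    }
    return clean;
  }
  return value; // primitives are copied by value
}

// JSON.parse stores "__proto__" as an ordinary own property, which is
// exactly the case the clone has to strip.
const hostile: unknown = JSON.parse('{"phases":[{"id":"a","__proto__":{"polluted":true}}]}');
const safe = sanitizeDef(hostile) as { phases: Array<Record<string, unknown>> };
console.log(Object.keys(safe.phases[0]));
```

Cloning rather than mutating in place also keeps the upstream phase's original output untouched, which matches the changelog's note that upstream output is preserved.
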
@@ -263,6 +294,7 @@ Every phase needs a unique `id` and a `type` (defaults to `agent`). On top of th
 | `final` | Marks the result-bearing phase (else the last phase wins) |
 | `optional` | A failure here does **not** abort the run |
 | `use` / `with` | (`flow`) saved sub-flow name + its args |
+| `def` | (`flow`) inline sub-flow **generated at runtime** — usually `"{steps.plan.json}"` (mutually exclusive with `use`) |
 | `cache` | `{ scope, ttl?, fingerprint? }` — cross-run memoization (see below) |
 Flow-level keys: `name`, `description`, `args`, `concurrency` (default 8), `agentScope`, and `budget: { maxUSD?, maxTokens? }`.
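
The "mutually exclusive with `use`" rule for `def` can be expressed both at the type level and as a runtime check on untyped JSON. The sketch below is illustrative only: the type names and `checkFlowPhase` are hypothetical, and the real schema in `extensions/schema.ts` may be shaped differently.

```typescript
// Hypothetical types: `never` on the opposite key makes the compiler reject
// a phase that carries both `use` and `def`.
type SavedFlowPhase = {
  id: string;
  type: "flow";
  use: string;
  with?: Record<string, unknown>;
  def?: never;
};
type DynamicFlowPhase = {
  id: string;
  type: "flow";
  def: string | object;
  use?: never;
};
type FlowPhase = SavedFlowPhase | DynamicFlowPhase;

// Runtime counterpart for data that arrives as untyped JSON.
function checkFlowPhase(raw: Record<string, unknown>): string | null {
  const hasUse = raw.use !== undefined;
  const hasDef = raw.def !== undefined;
  if (hasUse && hasDef) return "flow phase: `use` and `def` are mutually exclusive";
  if (!hasUse && !hasDef) return "flow phase: one of `use` or `def` is required";
  return null;
}

// The two forms shown in the README, written as typed values.
const saved: FlowPhase = { id: "research", type: "flow", use: "deep-research", with: { topic: "{item}" } };
const dynamic: FlowPhase = { id: "execute", type: "flow", def: "{steps.plan.json}" };
console.log(saved.use, dynamic.def);
```

The type-level version only helps hand-written flows; a generated plan is data at runtime, so the explicit check is the one that matters for `def`.
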
@@ -273,7 +305,7 @@ Flow-level keys: `name`, `description`, `args`, `concurrency` (default 8), `agen
 - **`join: "any"`** — an OR-join: the phase runs as soon as *one* dependency completes (default `"all"` waits for all).
 - **`retry`** — `{ "max": 2, "backoffMs": 500, "factor": 2 }` retries a failing subagent with fixed or exponential backoff; usage is summed and the attempt count shows as `↻N` in the TUI. Transient provider errors (rate-limit / 5xx / timeout) **auto-retry even without an explicit policy**; hard errors don't.
 - **`approval`** — pause for a human (Approve / Reject / Edit). Reject halts the flow; Edit injects the typed note as the phase output for downstream steps. Non-interactive runs auto-approve.
-- **`flow`** — `{ "type": "flow", "use": "deep-research", "with": { "topic": "{item}" } }` runs a saved flow as a phase (recursion is detected and rejected).
+- **`flow`** — `{ "type": "flow", "use": "deep-research", "with": { "topic": "{item}" } }` runs a **saved** flow as a phase (recursion is detected and rejected). Or **generate the sub-flow at runtime**: `{ "type": "flow", "def": "{steps.plan.json}" }` resolves an upstream phase's JSON output into a sub-flow, **validates it (cycles / dangling refs / duplicate ids), then runs it** — the number and shape of the generated phases is decided at runtime, not authored in advance. A malformed plan fails *open* (the phase is skipped with a `defError`, the run continues). This is how a planner decides *at runtime* what work to spawn — the declarative answer to a code-mode `for` loop, with each generated plan checked before it spends a token. Pair it with `loop` for **data-dependent iterative replanning** (round N's plan depends on round N-1's result). See [`examples/dynamic-plan-execute.json`](./examples/dynamic-plan-execute.json) and [`examples/iterative-replan.json`](./examples/iterative-replan.json).
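
The fail-open behaviour described for `def` can be sketched like this: accept a bare `phases` array, a `{phases:[...]}` object, or a full `{name,phases}` flow, tolerate a markdown json fence, and return a no-op with a `defError` instead of throwing. This is a rough sketch under assumptions: `resolveDef`, `stripFence`, and the `ParsedDef` shape are hypothetical, and the real runtime additionally validates and verifies the phases.

```typescript
// Hypothetical result shape: an empty `phases` plus `defError` is the no-op case.
interface ParsedDef {
  phases: unknown[];
  defError?: string;
}

// Built programmatically so this example contains no literal fence marker.
const FENCE = "`".repeat(3);
const FENCED = new RegExp("^" + FENCE + "(?:json)?\\s*([\\s\\S]*?)\\s*" + FENCE + "$");

// Remove an optional markdown json fence wrapped around the payload.
function stripFence(text: string): string {
  const trimmed = text.trim();
  const match = trimmed.match(FENCED);
  return match ? match[1] : trimmed;
}

// Fail open: any malformed input yields a no-op with a diagnostic, never a throw.
function resolveDef(raw: string): ParsedDef {
  let value: unknown;
  try {
    value = JSON.parse(stripFence(raw));
  } catch (err) {
    return { phases: [], defError: `invalid JSON: ${(err as Error).message}` };
  }
  if (Array.isArray(value)) return { phases: value }; // bare phases array
  if (value !== null && typeof value === "object") {
    const phases = (value as { phases?: unknown }).phases;
    if (Array.isArray(phases)) return { phases }; // {phases} or {name, phases}
  }
  return { phases: [], defError: "expected a phases array or an object with `phases`" };
}

console.log(resolveDef('{"name":"plan","phases":[{"id":"a"}]}').phases.length);
```

Returning a diagnostic rather than throwing is what lets the surrounding run continue when a planner emits a malformed plan.
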
 ### Loop-until-done (`loop`)
@@ -536,7 +568,6 @@ The model can also configure roles via the `taskflow` tool:
 | `mode: "apply-defaults"` + `force: true` | Writes `RECOMMENDED_DEFAULTS` to `settings.json`, preserving stale keys. |
 | `mode: "interactive"` | Launches the full action menu + picker flow (requires a UI session). |
-> **v0.0.13 deprecation note:** If `mode` is omitted, the tool falls back to v0.0.12 behavior when `modelRoles` is empty (auto-writes defaults) with a `console.warn` deprecation notice. If `modelRoles` already exists, it behaves as `mode: "show"`. This bridge will be removed in v0.0.14.
 ### Custom agents
@@ -577,12 +608,12 @@ Copy one into `.pi/taskflows/<name>.json` (or `~/.pi/agent/taskflows/`) and it r
 <div align="center">
-**0 runtime dependencies** · **524 tests** · **9 phase types** · **cross-session resume** · **cross-run memoization** · **~4.9k LOC runtime**
+**0 runtime dependencies** · **535 tests** · **9 phase types** · **cross-session resume** · **cross-run memoization** · **~5.4k LOC runtime**
 </div>
 - **Zero runtime dependencies.** No `dependencies` field — the runtime is built entirely on Node built-ins (`fs` / `path` / `os` / `child_process` / `crypto`). The file lock is `fs.openSync("wx")`, not a third-party library.
-- **371 tests across 14 suites** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, callback isolation, the idle watchdog, model-role init config, and parseModelFromLabel with parenthesized-model-name regression — plus a live end-to-end test that spawns real subagents and a cross-run cache dogfood.
+- **535 tests across 21 test files** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, callback isolation, the idle watchdog, model-role init config, and parseModelFromLabel with parenthesized-model-name regression.
 - **Hardened by design.** Path-traversal defense (lexical + `realpath`), runId validation, HTML/error sanitization, atomic writes, stale-lock stealing via `rename`, and an idle watchdog that kills wedged subagents.
 - **Dogfooded.** Every new feature has to survive the project's own `self-improve` taskflow before it ships.
@@ -599,12 +630,14 @@ Our `self-improve` flow is a 10-phase DAG — it audits the codebase, patches de
 | [Cross-run cache dogfood](./docs/rfc-cross-run-memoization.md) | Real runtime + on-disk store | Dedicated test harness | Cache correctness under adversarial fingerprints |
 | [Adversarial cross-review](./docs/brainstorm-adversarial-review-report.md) | Multi-agent adversarial review | `tournament` + `gate` | P0 cache-key fix shipped |
 | [Init redesign review](./docs/issue-necessity-review-report.md) | Necessity audit → parallel checks → verdict | 7 phases | Full redesign plan validated |
+| [Round 2 adversarial audit](./docs/internal/dogfooding-report.md) | Phase-by-phase DAG execution — 12 findings across runner/runtime/interpolate/verify | 14 phases | 10 fixes applied, 0 regressions |
+| [Round 3 adversarial audit](./docs/internal/dogfooding-report.md) | Integration layer + cross-module — 10 findings across index/agents/cache/render/runs-view | 9 phases | 10 fixes applied, 0 regressions |
 > **Meta:** we used `pi-taskflow`'s `map` fan-out, `gate` verdicts, `approval` human-in-the-loop, `tournament` best-of-N, `loop` until-done, and `cross-run` cache — to build `pi-taskflow`.
 ## Status & limits
-**v0.0.16** — loop-until-done (`loop` phase: iterate to a condition, convergence, or cap), tournament (best-of-N with a judge), cross-run memoization (content-addressed cache with git/file/glob/env fingerprints and TTL), interactive `/tf init` with role-aware model pickers + diff preview + atomic merge-write, configurable built-in agents, 18 built-in agents with 6 model roles. Full control-flow & reliability layer (`when` guards, `join: any`, `retry`/backoff, `approval`, `flow` composition, `budget` caps, idle watchdog) on top of the DSL + DAG runtime (`agent`/`parallel`/`map`/`gate`/`reduce`). Inline + saved flows, cross-session resume, live progress, and isolated context. A run executes as one streaming tool call.
+**v0.0.17** — loop-until-done (`loop` phase: iterate to a condition, convergence, or cap), tournament (best-of-N with a judge), cross-run memoization (content-addressed cache with git/file/glob/env fingerprints and TTL), interactive `/tf init` with role-aware model pickers + diff preview + atomic merge-write, configurable built-in agents, 18 built-in agents with 6 model roles. Full control-flow & reliability layer (`when` guards, `join: any`, `retry`/backoff, `approval`, `flow` composition, `budget` caps, idle watchdog) on top of the DSL + DAG runtime (`agent`/`parallel`/`map`/`gate`/`reduce`). Inline + saved flows, cross-session resume, live progress, and isolated context. A run executes as one streaming tool call.
 Known boundaries (tracked, bounded — no surprises mid-flow):
@@ -622,7 +655,7 @@ npm test            # unit tests — no network, no process spawning
 npm run test:e2e    # real end-to-end (spawns live subagents; needs model access)
 ```
-Runtime lives in `extensions/`, tests in `test/`, runnable examples in `examples/`, and the full design rationale in [`DESIGN.md`](./DESIGN.md).
+Runtime lives in `extensions/`, tests in `test/`, and runnable examples in `examples/`.
 ## Contributing

package/README.zh-CN.md

@@ -1,6 +1,6 @@
 <div align="center">
-<img src="./assets/hero.png" alt="pi-taskflow — declarative DAG orchestration for Pi subagents: stateful, resumable, context-isolated" width="900">
+<img src="./assets/hero.png" alt="pi-taskflow: a declarative, verifiable graph of task nodes for Pi subagents (stateful, resumable, context-isolated)" width="900">
 <p>
   <a href="https://www.npmjs.com/package/pi-taskflow"><img src="https://img.shields.io/npm/v/pi-taskflow?style=flat-square&color=B692FF&label=npm" alt="npm version"></a>
@@ -8,7 +8,7 @@
   <a href="./LICENSE"><img src="https://img.shields.io/badge/license-MIT-43D9AD?style=flat-square" alt="MIT license"></a>
   <a href="#whats-inside"><img src="https://img.shields.io/badge/runtime%20deps-0-43D9AD?style=flat-square" alt="zero runtime dependencies"></a>
   <a href="https://github.com/heggria/pi-taskflow/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/heggria/pi-taskflow/ci.yml?branch=main&style=flat-square&label=CI" alt="CI status"></a>
-  <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-524-6E8BFF?style=flat-square" alt="524 tests"></a>
+  <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-535-6E8BFF?style=flat-square" alt="535 tests"></a>
   <a href="#whats-inside"><img src="https://img.shields.io/badge/dogfooded-%E2%9C%93-43D9AD?style=flat-square" alt="dogfooded"></a>
   <a href="https://pi.dev"><img src="https://img.shields.io/badge/for-Pi%20coding%20agent-B692FF?style=flat-square" alt="for the Pi coding agent"></a>
 </p>
@@ -26,8 +26,8 @@
   -->
 </p>
-<p><strong>A declarative DAG orchestration framework for <a href="https://pi.dev">Pi</a> subagents.</strong><br/>
-Fan out · gate · resume · save as a command: intermediate results always stay out of your context window.</p>
+<p><strong>A declarative, verifiable "task graph" for <a href="https://pi.dev">Pi</a> subagents.</strong><br/>
+Not a workflow you have to script, but a DAG you declare. Fan out · gate · resume · save as a command: intermediate results always stay out of your context window.</p>
 ```bash
 pi install npm:pi-taskflow
@@ -37,23 +37,38 @@ pi install npm:pi-taskflow
 ---
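-**Subagents are fire-and-forget. Taskflows fire, fan out, pause, gate, resume, and save themselves as a command.**
+**A `workflow` flows. A `taskflow` is a *graph*.** Other orchestration frameworks let the model *script* the work: imperative code that flows step by step, with the graph hidden inside control flow. `pi-taskflow` does the opposite: you **declare** the work as a graph of discrete, named **task nodes** connected by `dependsOn` edges, and the runtime *verifies that graph before it spends a single token.*
-You already know the `task` / `tasks` / `chain` of the built-in subagent tool. `pi-taskflow` uses **exactly the same shorthand**, so your existing delegations instantly become **tracked, resumable, and saveable as a single `/tf:<name>` command**. When you outgrow the shorthand, the full DSL gives you a real DAG: dynamic fan-out over dozens of items, conditional routing, quality gates, human approvals, retries, and a hard spend ceiling.
+You already know the built-in subagent tool's `task` / `tasks` / `chain`. `pi-taskflow` uses **exactly the same shorthand**, so your existing delegations instantly become **tracked, resumable, and saveable as a single `/tf:<name>` command**. When you outgrow the shorthand, the full DSL gives you a real DAG: dynamic fan-out over dozens of items, conditional routing, quality gates, human approvals, retries, and a hard spend ceiling.
 And the whole time, **only the final phase reaches your conversation.** Every intermediate transcript stays in the runtime and never enters your context window.
+## Why "taskflow" and not "workflow"?
+The name is the thesis. In engineering, a **task** is a *discrete, declared unit of work*: the node of a task graph (build systems, schedulers, and compilers all wire this kind of `task` into a DAG). **Work**, by contrast, is *fluid and unbounded*: the continuous, imperative process of getting things done.
+That distinction is exactly the design divide in the Pi ecosystem: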
+<div align="center">
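+<img src="./assets/task-vs-work.png" alt="work is a flowing imperative script whose graph hides in control flow and cannot be verified before it runs; a taskflow is a declarative graph of discrete task nodes that is statically verified before any token is spent" width="900">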
+</div>
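+- A **`workflow`** (the dynamic, code-mode kind) is the model writing an **imperative script** that *flows*: `await agent(...)`, an `if`, a `for`, another `await`. It is expressive (Turing-complete, in fact), but the graph only exists *while the code runs*. You can't see it, can't diff it, and can't prove it terminates before you pay for it.
+- A **`taskflow`** moves the plan **out of code and into a declarative graph of `task` nodes.** Because the graph is *data*, the runtime can do what an imperative script structurally cannot: **statically verify it** (no cycles, no dead ends, no budget overflow, no dangling refs) before any subagent is spawned, **render it** (the live progress *is* the DAG), **resume it phase by phase**, and **save it as a command**.
+> **The trade we make on purpose:** we give up the full expressivity of arbitrary code in exchange for something an imperative script can never have: a graph that is **verifiable, observable, replayable, and safe to hand to an LLM to generate.** When a job needs twelve steps with branching fan-out and a review gate, you want a graph you can *check*, not a script you can only *pray* runs right.
 ## Why this exists
-Here's the bottleneck you hit with raw subagents: you describe a multi-step plan in prose, the model re-derives it every time, the intermediate transcripts fill your context, and as soon as one model call fails you have to start over. No reuse, no recovery, no structure.
+Here's the bottleneck you hit with raw subagents: you describe a multi-step plan in prose, the model re-derives it every time, the intermediate transcripts fill your context, and as soon as one model call fails you have to start over. No reuse, no recovery, no structure, and no way to *check* the plan before it burns tokens.
-`pi-taskflow` moves the plan **out of the prompt and into a declarative definition.** The runtime owns the DAG, the loops, the retries, and the intermediate state. You declare a pipeline once and can run it by name a hundred times.
+`pi-taskflow` moves the plan **out of the prompt and into a declarative graph of task nodes.** The runtime owns the DAG, the loops, the retries, and the intermediate state. You declare a pipeline once and can run it by name a hundred times. Because the plan is data (not prose, and not code), it can be **validated, visualized, and replayed**.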
 <div align="center">
 <img src="./assets/context-isolation.png" alt="With raw subagents every transcript floods into your context; with pi-taskflow transcripts stay in the runtime and only the final result returns" width="900">
 </div>
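-> When a job needs twelve steps, with branching fan-out and a review gate, what you need is orchestration, not lucky prompting.
+> Twelve steps, branching fan-out, a review gate, a spend cap: that is a graph, and you want to *see and check* it, not re-prompt it on every run.
 | | subagent (built-in) | **pi-taskflow** |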
 |---|---|---|
@@ -72,7 +87,23 @@ pi install npm:pi-taskflow
 | **实时进度** | 运行时不可见 | **实时 DAG 渲染，附带耗时和成本** |
 | **易用性** | 每次内联 JSON | **简写语法（`task`/`tasks`/`chain`）*或* DSL** |
-它没有取代子代理工具。它给你的子代理赋予了 DAG、记忆和一个名字。
+它没有取代子代理工具。它给你的子代理赋予了一张**图**、一份记忆和一个名字。
+## 声明式图 vs 命令式脚本
+精神上最接近 `pi-taskflow` 的，是那种**动态 / code-mode 的 workflow**——模型写一段 JavaScript 编排脚本。它强大、且确实很有表达力。但它位于某个根本轴的*另一极*：**表达力 vs 可验证性。**
+| | 动态 `workflow`（code-mode） | **`pi-taskflow`**（声明式图） |
+|---|---|---|
+| **计划是什么** | 模型书写并运行的命令式 JS | **运行时执行的声明式 JSON 数据** |
+| **那张图** | 隐式——藏在 `if`/`for`/`await` 控制流里 | **显式——`phases[]` + `dependsOn` 边，一等对象** |
+| **运行前验证** | ✗ 图灵完备；无法证明会终止 | **✓ 静态检查：无环、无死端、不超预算、无悬空引用** |
+| **看到它** | ✗ 图只在代码跑起来时存在 | **✓ 实时进度渲染*本身就是* DAG** |
+| **恢复** | 粗粒度（调用缓存去重） | **✓ 逐阶段输入哈希恢复，跨会话** |
+| **能否安全交给 LLM 生成** | 有风险——它是可执行代码 | **✓ 它只是数据——无 `eval`、无任意执行** |
+| **表达力上限** | **更高**——任意控制流 | 受 DSL 限制（但 `map`/`when`/`loop`/`gate` 覆盖了大多数任务） |
+我们有意选了**可验证**的那一边。你放弃的表达力是真实的；但你换回的——一张能检查、能看、能重放、能安全交给模型书写的计划——才是把一次性提示变成持久编排的关键。
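 ## Comparison with other Pi extensions
@@ -97,12 +128,12 @@ The Pi ecosystem now has **20+ delegation, workflow, and orchestration extensions**, each
 - **`@pi-agents/orchid`** is the most feature-complete orchestrator in the ecosystem (DAG + worktree + Ralph loop + agent mailbox), but its DSL is a *fixed* 9-phase pipeline, it carries runtime dependencies + jiti, and it is in beta. Choose `pi-taskflow` when you want to **define your own graph shape** (rather than adopt someone else's fixed opinions) and you want **zero dependencies** and a one-command install.
 - **`pi-crew` / `ultimate-pi`** are heavier: worktree isolation, persistent async teams, multi-layer governance. If you want something lightweight, declarative, and zero-dependency, choose this project.
-- **`@zhushanwen/pi-workflow`** is the closest in spirit and is also zero-dependency, but you author workflows as **JavaScript scripts**. `pi-taskflow`'s **declarative JSON DSL** is safer and more auditable, and its **phase-level input-hash resume** is finer-grained than call-cache deduplication.
+- **`@zhushanwen/pi-workflow`** is the closest in spirit and is also zero-dependency, but it sits on the **imperative** side of the divide described above: you author workflows as **JavaScript scripts** that the model writes and runs. `pi-taskflow`'s **declarative JSON DAG** is the verifiable side: statically checkable, visualizable, safe for an LLM to generate, with resume granularity at the phase level rather than call-cache deduplication.
 - **`@fiale-plus/pi-rogue-orchestration`** has true **loop-until-done** (a feature `pi-taskflow` does not yet have). If your task is "keep going until the goal is met", it is worth a look; `pi-taskflow` is for *structured, branching* pipelines.
 - **`pi-subagents` / `@gotgenes/pi-subagents`** are the mature choice for ad-hoc "have the reviewer review this diff" delegation and background jobs. `pi-taskflow` is for when those delegations need to become *repeatable, resumable pipelines*.
 - **`pi-pipeline` / `pi-agent-flow`** offer *opinionated, fixed-structure* flows. `pi-taskflow` offers *a blank canvas*: you (or the model) declare the graph shape that fits the task.
-> The honest one-line summary: **`pi-taskflow` is the only Pi extension that lets you orchestrate subagent pipelines in a declarative, resumable, DAG-shaped form, saved as a one-word command, with zero runtime dependencies and context isolation.** Known gaps being closed: loop-until-done, worktree isolation, non-blocking background runs (see [`STRATEGY.md`](./STRATEGY.md)).
+> The honest one-line summary: **`pi-taskflow` is the only Pi extension that gives you a *declarative, verifiable, resumable* DAG of task nodes: saved as a one-word command, with zero runtime dependencies, and context-isolated by design.** A code-mode workflow has the model *write a script* to drive the work, whereas `pi-taskflow` has it *declare a graph that the runtime can prove correct before execution.* Known gaps being closed: loop-until-done, worktree isolation, non-blocking background runs (see [`STRATEGY.md`](./STRATEGY.md)).
 ## 30-second quick start
@@ -538,7 +569,7 @@ Taskflow ships with **18 built-in agents**, each one a `.md` file
 | `mode: "apply-defaults"` + `force: true` | Writes `RECOMMENDED_DEFAULTS` to `settings.json`, preserving old keys. |
 | `mode: "interactive"` | Launches the full action menu + picker flow (requires a UI session). |
-> **v0.0.13 deprecation notice:** if `mode` is omitted, the tool falls back to the v0.0.12 behavior (auto-writing defaults) when `modelRoles` is empty, with a `console.warn` deprecation notice. If `modelRoles` already exists, it behaves like `mode: "show"`. This bridge will be removed in v0.0.14.
 ### Custom agents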
@@ -579,12 +610,12 @@ provided files. Report violations grouped by file. No fixes.
 <div align="center">
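-**0 runtime dependencies** · **394 tests** · **10 phase types** · **cross-session resume** · **cross-run memoization** · **~4.9k LOC runtime**
+**0 runtime dependencies** · **535 tests** · **9 phase types** · **cross-session resume** · **cross-run memoization** · **~5.4k LOC runtime**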
 </div>
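 - **Zero runtime dependencies.** There is no `dependencies` field; the runtime is built entirely on Node built-in modules (`fs` / `path` / `os` / `child_process` / `crypto`). The file lock is `fs.openSync("wx")`, not a third-party library.
-- **371 tests across 14 test suites**, covering concurrency, atomic file locking (8-process contention regression test), path-traversal defense, cross-session resume, cross-run cache freshness (flow/reasoning/tool key isolation, fingerprint invalidation, TTL/LRU eviction), gate verdicts, budget caps, retry/fallback, approval flows, loop termination, tournament judging, sub-flow composition, callback isolation, the idle watchdog, model-role init configuration, and parseModelFromLabel with the parenthesized-model-name regression, plus live end-to-end tests (spawning real subagents) and a cross-run cache dogfood.
+- **535 tests across 21 test files**, covering concurrency, atomic file locking (8-process contention regression test), path-traversal defense, cross-session resume, cross-run cache freshness (flow/reasoning/tool key isolation, fingerprint invalidation, TTL/LRU eviction), gate verdicts, budget caps, retry/fallback, approval flows, loop termination, tournament judging, sub-flow composition, callback isolation, the idle watchdog, model-role init configuration, and parseModelFromLabel with the parenthesized-model-name regression.
 - **Hardened design.** Path-traversal defense (lexical + `realpath`), runId validation, HTML/error sanitization, atomic writes, stale-lock stealing via `rename`, and an idle watchdog that kills stuck subagents.
 - **Dogfooded.** Every new feature must survive the project's own `self-improve` taskflow before release.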
@@ -601,12 +632,14 @@ provided files. Report violations grouped by file. No fixes.
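 | [Cross-run cache dogfood](./docs/rfc-cross-run-memoization.md) | Real runtime + disk store | Dedicated test harness | Verified cache correctness under adversarial fingerprints |
 | [Adversarial cross-review](./docs/brainstorm-adversarial-review-report.md) | Multi-agent adversarial review | `tournament` + `gate` | Fixed the P0 cache-key issue and shipped |
 | [Init redesign review](./docs/issue-necessity-review-report.md) | Necessity audit → parallel checks → verdict | 7 phases | Full redesign plan validated |
+| [Round 2 adversarial audit](./docs/internal/dogfooding-report.md) | Phase-by-phase DAG execution: 12 findings across runner/runtime/interpolate/verify | 14 phases | 10 fixed, 0 regressions |
+| [Round 3 adversarial audit](./docs/internal/dogfooding-report.md) | Integration layer + cross-module: 10 findings across index/agents/cache/render/runs-view | 9 phases | 10 fixed, 0 regressions |
 > **Meta note:** we used `pi-taskflow`'s `map` fan-out, `gate` verdicts, `approval` human-in-the-loop, `tournament` best-of-N, `loop` loop-until-done, and `cross-run` caching to build `pi-taskflow`.
 ## Status and boundaries
-**v0.0.13**: loop-until-done (`loop` phase: iterate until a condition is met, convergence, or a cap), tournament (best-of-N with a judge), cross-run memoization (a content-addressed cache based on git/file/glob/env fingerprints and TTL), interactive `/tf init` (role-aware model picker + diff preview + atomic merge write), 18 built-in agents and 6 model roles. The full control-flow and reliability layer (`when` guards, `join: any`, `retry`/fallback, `approval`, `flow` composition, `budget` caps, idle watchdog) is built on the DSL + DAG runtime (`agent`/`parallel`/`map`/`gate`/`reduce`). Supports inline + saved flows, cross-session resume, live progress, and context isolation. A run executes as a single streaming tool call.
+**v0.0.17**: loop-until-done (`loop` phase: iterate until a condition is met, convergence, or a cap), tournament (best-of-N with a judge), cross-run memoization (a content-addressed cache based on git/file/glob/env fingerprints and TTL), interactive `/tf init` (role-aware model picker + diff preview + atomic merge write), 18 built-in agents and 6 model roles. The full control-flow and reliability layer (`when` guards, `join: any`, `retry`/fallback, `approval`, `flow` composition, `budget` caps, idle watchdog) is built on the DSL + DAG runtime (`agent`/`parallel`/`map`/`gate`/`reduce`). Supports inline + saved flows, cross-session resume, live progress, and context isolation. A run executes as a single streaming tool call.
 Known boundaries (tracked and bounded; no mid-flow surprises):
@@ -624,7 +657,7 @@ npm test            # unit tests: no network, no process spawning
 npm run test:e2e    # real end-to-end tests (spawns real subagents; requires model access)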
 ```
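-The runtime lives in `extensions/`, tests in `test/`, runnable examples in `examples/`, and the full design rationale is in [`DESIGN.md`](./DESIGN.md).
+The runtime lives in `extensions/`, tests in `test/`, and runnable examples in `examples/`.
 ## Contributing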

package/examples/dynamic-plan-execute.json ADDED

@@ -0,0 +1,34 @@
+{
+  "name": "dynamic-plan-execute",
+  "description": "Runtime plan-then-execute: a planner scans the codebase and EMITS a sub-flow (one audit phase per file). The flow phase resolves that JSON at runtime, validates it, and runs it as a nested sub-flow — the number and shape of audit phases is decided at runtime, not authored in advance. A gate then reports.",
+  "version": 1,
+  "args": {
+    "target": { "default": ".", "description": "Directory to scan and audit" }
+  },
+  "concurrency": 4,
+  "agentScope": "user",
+  "budget": { "maxUSD": 1.5 },
+  "phases": [
+    {
+      "id": "plan",
+      "type": "agent",
+      "agent": "planner",
+      "task": "Scan \"{args.target}\" and produce an audit plan. Output ONLY a JSON object of the form {\"name\":\"audit\",\"phases\":[ ... ]}. Emit ONE phase per source file worth auditing. Each phase must look like {\"id\":\"audit-<safe-name>\",\"type\":\"agent\",\"agent\":\"reviewer\",\"task\":\"Audit <path> for correctness, security, and dead code. Report findings.\"}. Give the LAST phase a \"final\": true and make it a reduce-style summary that depends on the others (\"type\":\"reduce\",\"from\":[<all audit ids>],\"agent\":\"reviewer\",\"task\":\"Summarize all audit findings into one report.\"). Use hyphens in ids, never underscores. Output JSON only — no prose, no markdown fence.",
+      "output": "json"
+    },
+    {
+      "id": "execute-plan",
+      "type": "flow",
+      "def": "{steps.plan.json}",
+      "dependsOn": ["plan"]
+    },
+    {
+      "id": "report-gate",
+      "type": "gate",
+      "agent": "reviewer",
+      "dependsOn": ["execute-plan"],
+      "task": "Here is the audit report:\n\n{steps.execute-plan.output}\n\nDecide whether the codebase is in acceptable shape. End with 'VERDICT: PASS' or 'VERDICT: BLOCK' and a one-line reason.",
+      "final": true
+    }
+  ]
+}
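The `execute-plan` phase above only runs the planner's JSON after it has been structurally checked. As a rough illustration of what "structurally checked" can mean, here is a minimal sketch of three of the named checks (duplicate ids, dangling refs, cycles). It is not the package's `validateTaskflow` / `verifyTaskflow`; the `SketchPhase` shape and the `checkPlan` helper are hypothetical and far simpler than the real schema.

```typescript
// Hypothetical, simplified phase shape (the real schema has many more fields).
interface SketchPhase {
  id: string;
  dependsOn?: string[];
}

// Returns a list of problems; an empty list means the plan passed these checks.
function checkPlan(phases: SketchPhase[]): string[] {
  const problems: string[] = [];
  const unmet = new Map<string, number>(); // id -> number of unscheduled deps
  const unique: SketchPhase[] = [];
  for (const p of phases) {
    if (unmet.has(p.id)) {
      problems.push(`duplicate id: ${p.id}`);
    } else {
      unmet.set(p.id, 0);
      unique.push(p);
    }
  }
  // Distinct dependency ids of a phase.
  const depsOf = (p: SketchPhase): string[] =>
    (p.dependsOn ?? []).filter((d, i, all) => all.indexOf(d) === i);
  for (const p of unique) {
    for (const d of depsOf(p)) {
      if (!unmet.has(d)) problems.push(`dangling ref: ${p.id} -> ${d}`);
      else unmet.set(p.id, (unmet.get(p.id) ?? 0) + 1);
    }
  }
  // Kahn's algorithm: if some phase can never be scheduled, there is a cycle.
  const ready = unique.filter((p) => unmet.get(p.id) === 0).map((p) => p.id);
  let scheduled = 0;
  while (ready.length > 0) {
    const done = ready.pop() as string;
    scheduled++;
    for (const p of unique) {
      if (depsOf(p).indexOf(done) !== -1) {
        const left = (unmet.get(p.id) ?? 0) - 1;
        unmet.set(p.id, left);
        if (left === 0) ready.push(p.id);
      }
    }
  }
  if (scheduled < unique.length) problems.push("cycle detected");
  return problems;
}

// A plan shaped like the one the planner is asked to emit.
const plan: SketchPhase[] = [
  { id: "audit-a" },
  { id: "audit-b" },
  { id: "summary", dependsOn: ["audit-a", "audit-b"] },
];
console.log(checkPlan(plan));
```

Because the plan is plain data, a check like this can run before any subagent is spawned, which is the property the example's description relies on.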

package/examples/iterative-replan.json ADDED

@@ -0,0 +1,30 @@
+{
+  "name": "iterative-replan",
+  "description": "Data-dependent iterative replanning: a loop where each iteration's plan depends on the PREVIOUS iteration's RESULT (not a one-shot fan-out). Each round the planner reads the prior round's findings and decides either to emit more work or to signal done. This is the declarative equivalent of an imperative `for` loop that reads a result and decides the next step. Each round's generated plan is validated before it runs.",
+  "version": 1,
+  "args": {
+    "goal": { "description": "The investigation / refinement objective" }
+  },
+  "concurrency": 4,
+  "agentScope": "user",
+  "budget": { "maxUSD": 2.0 },
+  "phases": [
+    {
+      "id": "investigate",
+      "type": "loop",
+      "agent": "explorer",
+      "maxIterations": 5,
+      "until": "{steps.investigate.json.done} == true",
+      "output": "json",
+      "task": "Goal: {args.goal}\n\nPrevious round's result (empty on the first round):\n{previous.output}\n\nDecide the next step. If the goal is satisfied, output ONLY {\"done\": true, \"summary\": \"<what you concluded>\"}. Otherwise output ONLY {\"done\": false, \"findings\": \"<what you learned this round>\", \"next\": \"<what to investigate next round>\"}. Each round's plan must build on the previous round's findings. JSON only — no prose."
+    },
+    {
+      "id": "final-report",
+      "type": "agent",
+      "agent": "reviewer",
+      "dependsOn": ["investigate"],
+      "task": "Write the final report for goal \"{args.goal}\" based on the converged investigation:\n\n{steps.investigate.output}",
+      "final": true
+    }
+  ]
+}
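The description above calls this flow the declarative equivalent of an imperative loop that reads a result and decides the next step. For comparison, a minimal sketch of that imperative loop follows. The `Step` callback is a hypothetical stand-in for one `explorer` agent call, and `investigate` is an illustrative helper, not part of the package; the `"until"` / `"maxIterations"` stop labels mirror two of the values in the `loop` field of `PhaseState`.

```typescript
// One round's JSON result; `done: true` signals that the goal is satisfied.
interface RoundResult {
  done: boolean;
  [key: string]: unknown;
}

// Hypothetical stand-in for one agent call: goal + previous output -> result.
type Step = (goal: string, previous: string) => RoundResult;

interface LoopOutcome {
  iterations: number;
  stop: "until" | "maxIterations";
  output: string;
}

function investigate(goal: string, step: Step, maxIterations = 5): LoopOutcome {
  let previous = ""; // empty on the first round, as in the task prompt
  let iterations = 0;
  while (iterations < maxIterations) {
    const result = step(goal, previous);
    iterations++;
    previous = JSON.stringify(result);
    // Corresponds to the `until` condition on the loop phase.
    if (result.done === true) return { iterations, stop: "until", output: previous };
  }
  return { iterations, stop: "maxIterations", output: previous };
}

// Usage: a fake step that converges on its third round.
let calls = 0;
const outcome = investigate("find the flaky test", () => {
  calls++;
  return calls < 3 ? { done: false, findings: `round ${calls}` } : { done: true, summary: "found" };
});
console.log(outcome.iterations, outcome.stop);
```

The declarative version expresses the same control flow as data (`maxIterations`, `until`, `{previous.output}`), so the runtime can cap and inspect it instead of trusting a script.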

package/extensions/index.ts CHANGED

@@ -1,5 +1,5 @@
 /**
- * pi-taskflow — lightweight workflow orchestration for the Pi coding agent.
+ * pi-taskflow — a declarative, verifiable graph of task nodes for the Pi coding agent.
  *
  * Registers:
  *   - tool `taskflow`        : run inline / saved flows, save, resume (LLM-callable)
@@ -325,7 +325,7 @@ export default function (pi: ExtensionAPI) {
 			"Interpolation: {args.X}, {steps.ID.output}, {steps.ID.json}, {item} (map), {previous.output}.",
 		].join(" "),
 		parameters: TaskflowParams,
-		promptSnippet: "Orchestrate subagents — single, parallel, chain, or DAG — with tracking, resume, and context isolation. Replaces the subagent tool.",
+		promptSnippet: "Declare a verifiable graph of subagent tasks (single, parallel, chain, or full DAG) — tracked, resumable, context-isolated. The runtime validates the graph before running. Replaces the subagent tool.",
 		promptGuidelines: [
 			"BEFORE FIRST USE: invoke skill_load('taskflow') to read the full skill documentation (DSL syntax, phase types, examples, best practices). This tool description is a condensed reference only — the skill is the authoritative guide.\n\nUse taskflow for ALL delegation — single tasks, parallel, chain, or full DAG orchestration. It fully replaces the subagent tool: every delegation is tracked with a runId, resumable across sessions, context-isolated (only final output returns), and saveable as /tf:<name>. Do NOT call the subagent tool directly; use taskflow shorthand (task/tasks/chain) for simple cases instead.",
 			"For complex multi-phase work (explore / 审计 / analyze the project, auditing endpoints, reviewing or migrating many files/modules, cross-checked research), use the full DSL with phases. For taskflow map phases, have the upstream phase emit a JSON array and set output:'json'.",

package/extensions/runtime.ts CHANGED

@@ -16,7 +16,8 @@ import type { AgentConfig } from "./agents.ts";
 import { coerceArray, evaluateCondition, interpolate, type InterpolationContext, safeParse, tryEvaluateCondition } from "./interpolate.ts";
 import { isFailed, isTransientError, type LiveUpdate, mapWithConcurrencyLimit, runAgentTask, type RunResult } from "./runner.ts";
 import { aggregateUsage, emptyUsage, type UsageStats } from "./usage.ts";
-import { type Budget, type CacheScope, dependenciesOf, finalPhase, LOOP_DEFAULT_MAX_ITERATIONS, LOOP_HARD_MAX_ITERATIONS, parseTtlMs, type Phase, resolveArgs, type Taskflow, topoLayers, TOURNAMENT_DEFAULT_VARIANTS, TOURNAMENT_HARD_MAX_VARIANTS, type TournamentMode } from "./schema.ts";
+import { type Budget, type CacheScope, dependenciesOf, finalPhase, LOOP_DEFAULT_MAX_ITERATIONS, LOOP_HARD_MAX_ITERATIONS, MAX_DYNAMIC_MAP_ITEMS, MAX_DYNAMIC_NESTING, parseTtlMs, type Phase, resolveArgs, type Taskflow, topoLayers, TOURNAMENT_DEFAULT_VARIANTS, TOURNAMENT_HARD_MAX_VARIANTS, type TournamentMode, validateTaskflow } from "./schema.ts";
+import { verifyTaskflow } from "./verify.ts";
 import { hashInput, newRunId, type PhaseState, type RunState } from "./store.ts";
 import { CacheStore, resolveFingerprint } from "./cache.ts";
@@ -142,6 +143,63 @@ function failPhase(id: string, error: string): PhaseState {
 	return { id, status: "failed", error, inputHash: hashInput(id, error), endedAt: Date.now(), usage: emptyUsage() };
 }
+/**
+ * Normalize an inline `flow.def` payload into a full Taskflow shape.
+ * Accepts: a full Taskflow ({name?,phases:[...]}), a bare phases array, or
+ * {phases:[...]}. Returns undefined if the shape is unrecognized. A recognized
+ * shape with ZERO phases is returned as-is (caller treats it as a no-op) so the
+ * empty-plan case is distinguishable from a malformed one.
+ *
+ * The payload is deep-cloned so the runtime never shares references with (or
+ * mutates) the upstream phase's parsed JSON. Top-level `__proto__` / `constructor` /
+ * `prototype` own-properties that a crafted JSON could carry are deleted from the clone.
+ */
+function normalizeInlineDef(parsed: unknown, phaseId: string): Taskflow | undefined {
+	let shaped: Taskflow | undefined;
+	if (Array.isArray(parsed)) {
+		shaped = { name: `${phaseId}-inline`, phases: parsed as Taskflow["phases"] };
+	} else if (parsed && typeof parsed === "object") {
+		const o = parsed as Record<string, unknown>;
+		if (Array.isArray(o.phases)) {
+			const name = typeof o.name === "string" && o.name.length > 0 ? (o.name as string) : `${phaseId}-inline`;
+			shaped = { ...(o as object), name, phases: o.phases as Taskflow["phases"] } as Taskflow;
+		}
+	}
+	if (!shaped) return undefined;
+	// Deep clone via JSON round-trip: severs shared references with upstream output.
+	// Note that an own "__proto__" key parsed from JSON is an ordinary enumerable
+	// data property and survives the round-trip, so the loop below explicitly
+	// deletes top-level `__proto__`/`constructor`/`prototype` own-keys that a
+	// crafted payload could carry.
+	try {
+		const clone = JSON.parse(JSON.stringify(shaped)) as Record<string, unknown>;
+		for (const k of ["__proto__", "constructor", "prototype"]) {
+			if (Object.prototype.hasOwnProperty.call(clone, k)) delete clone[k];
+		}
+		return clone as unknown as Taskflow;
+	} catch {
+		return undefined;
+	}
+}
+/**
+ * Clamp a runtime-generated sub-flow's budget so it can only ever be TIGHTER
+ * than the parent's, never looser. A generated def cannot raise the spend cap by
+ * declaring its own large budget. Each dimension becomes min(child, parent).
+ */
+function clampSubFlowBudget(sub: Taskflow, parentBudget: Budget | undefined): Taskflow {
+	if (!parentBudget) return sub;
+	const child = sub.budget;
+	const clamped: Budget = {
+		maxUSD: Math.min(child?.maxUSD ?? Infinity, parentBudget.maxUSD ?? Infinity),
+		maxTokens: Math.min(child?.maxTokens ?? Infinity, parentBudget.maxTokens ?? Infinity),
+	};
+	// Drop Infinity dimensions (no cap on that axis).
+	const budget: Budget = {};
+	if (Number.isFinite(clamped.maxUSD)) budget.maxUSD = clamped.maxUSD;
+	if (Number.isFinite(clamped.maxTokens)) budget.maxTokens = clamped.maxTokens;
+	return { ...sub, budget: budget.maxUSD === undefined && budget.maxTokens === undefined ? undefined : budget };
+}
 /** Aggregate run cost/tokens so far and test against the budget. */
 function overBudget(state: RunState): { over: boolean; reason: string } {
 	const budget: Budget | undefined = state.def.budget;
@@ -592,7 +650,15 @@ async function executePhase(
 	if (type === "map") {
 		const overResolved = interpolate(phase.over ?? "", ctx).text;
 		// `over` may itself be a placeholder that resolved to a JSON string.
-		const arr = coerceArray(safeParse(overResolved)) ?? coerceArray(directRef(phase.over ?? "", state));
+		let arr = coerceArray(safeParse(overResolved)) ?? coerceArray(directRef(phase.over ?? "", state));
+		// Breadth cap for untrusted dynamic sub-flows: a `def:` frame in the stack
+		// means we are inside a runtime-generated flow. Truncate giant fan-outs to
+		// bound subprocess blast radius (fail-open: keep the first N rather than abort).
+		let mapTruncated = false;
+		if (arr && (deps._stack ?? []).some((s) => s.startsWith("def:")) && arr.length > MAX_DYNAMIC_MAP_ITEMS) {
+			arr = arr.slice(0, MAX_DYNAMIC_MAP_ITEMS);
+			mapTruncated = true;
+		}
 		if (!arr) {
 			return {
 				id: phase.id,
@@ -617,6 +683,12 @@ async function executePhase(
 		const results = await runFanout(tasks);
 		const ps = mergePhaseState(phase.id, results, inputHash, parseJson);
+		if (mapTruncated) {
+			ps.warnings = [...(ps.warnings ?? []), `map fan-out truncated to MAX_DYNAMIC_MAP_ITEMS (${MAX_DYNAMIC_MAP_ITEMS}) inside a dynamic sub-flow`];
+			// NB: do NOT set ps.budgetTruncated — that field drives the run-level
+			// budget-blocked path and would mislabel the run as "budget exceeded".
+			// This is a safety fan-out cap, not a cost overrun; a warning is enough.
+		}
 		recordCache(cc, ps);
 		return ps;
 	}
@@ -660,14 +732,96 @@ async function executePhase(
 	if (type === "flow") {
 		const ctx = buildInterpolationContext(state, previousOutput);
-		const name = phase.use;
-		if (!name) return failPhase(phase.id, `flow phase '${phase.id}' requires 'use'`);
-		if (!deps.loadFlow) return failPhase(phase.id, `flow phase '${phase.id}': no sub-flow loader available`);
-		const subDef = deps.loadFlow(name);
-		if (!subDef) return failPhase(phase.id, `flow phase '${phase.id}': saved flow not found: '${name}'`);
+		const hasDef = (phase as { def?: unknown }).def !== undefined;
 		const stack = deps._stack ?? [];
-		if (name === state.flowName || stack.includes(name)) {
-			return failPhase(phase.id, `flow phase '${phase.id}': recursive sub-flow ${[...stack, state.flowName, name].join(" -> ")}`);
+		let subDef: Taskflow | undefined;
+		let name: string;
+		let recursionKey: string; // identity used for cache key + recursion guard
+		if (hasDef) {
+			// --- Inline `def`: resolve at runtime, validate, fail-OPEN on any error. ---
+			// Fail-open contract: a bad def NEVER aborts the run. The phase resolves
+			// as `done` with empty output and a `defError` diagnostic, and the
+			// upstream output is preserved for downstream phases. (Authors who want
+			// a bad plan to be a hard failure can add their own gate downstream.)
+			const defFailOpen = (diag: string): PhaseState => ({
+				id: phase.id,
+				status: "done",
+				output: "",
+				json: parseJson ? safeParse("") : undefined,
+				usage: emptyUsage(),
+				inputHash: hashInput(phase.id, `flow-def-error:${diag}`),
+				endedAt: Date.now(),
+				defError: diag,
+			});
+			// Nesting guard: each `flow{def}` adds a frame to _stack; cap inline depth.
+			const inlineDepth = stack.filter((s) => s.startsWith("def:")).length;
+			if (inlineDepth >= MAX_DYNAMIC_NESTING) {
+				return defFailOpen(`inline sub-flow nesting exceeded MAX_DYNAMIC_NESTING (${MAX_DYNAMIC_NESTING}): depth ${inlineDepth}`);
+			}
+			const rawDef = (phase as { def?: unknown }).def;
+			// String defs are interpolated then JSON-parsed; objects are used directly.
+			let parsed: unknown;
+			if (typeof rawDef === "string") {
+				const resolved = interpolate(rawDef, ctx).text;
+				parsed = safeParse(resolved);
+				if (parsed === undefined) {
+					return defFailOpen("inline def string did not parse as JSON");
+				}
+			} else {
+				parsed = rawDef;
+			}
+			// Accept a full Taskflow, a bare phases array, or {phases:[...]}; wrap the latter two.
+			const wrapped = normalizeInlineDef(parsed, phase.id);
+			if (!wrapped) {
+				return defFailOpen("inline def is not a Taskflow, phases array, or {phases:[...]}");
+			}
+			// Empty plan is a valid no-op (a planner deciding there is nothing to do):
+			// succeed with empty output instead of failing validation on zero phases.
+			if (wrapped.phases.length === 0) {
+				return {
+					id: phase.id,
+					status: "done",
+					output: "",
+					json: parseJson ? safeParse("") : undefined,
+					usage: emptyUsage(),
+					inputHash: hashInput(phase.id, "flow-def-empty"),
+					endedAt: Date.now(),
+				};
+			}
+			// Validate with `dynamic` hardening (breadth caps + cwd containment) since
+			// this content is LLM-authored / untrusted. cwd anchors containment checks.
+			const dynCwd = phase.cwd ?? deps.cwd;
+			const v = validateTaskflow(wrapped, { dynamic: true, cwd: dynCwd });
+			if (!v.ok) {
+				return defFailOpen(`inline def failed validation: ${v.errors.join("; ")}`);
+			}
+			// Static verification (dead-ends, unreachable, gate-exhaustion, budget,
+			// concurrency). Only error-severity issues block; warnings are advisory.
+			const ver = verifyTaskflow({ name: wrapped.name, phases: wrapped.phases as Phase[], budget: wrapped.budget, concurrency: wrapped.concurrency });
+			if (!ver.ok) {
+				const errs = ver.issues.filter((i) => i.severity === "error").map((i) => i.message);
+				return defFailOpen(`inline def failed verification: ${errs.join("; ")}`);
+			}
+			// Budget containment: a generated def may not raise the parent's cap. Clamp
+			// each dimension to min(child, parent) so it can only ever be tighter.
+			subDef = clampSubFlowBudget(wrapped, state.def.budget);
+			name = subDef.name;
+			recursionKey = `def:${name}`;
+		} else {
+			// --- Saved flow via `use` (unchanged behavior). ---
+			const useName = phase.use;
+			if (!useName) return failPhase(phase.id, `flow phase '${phase.id}' requires 'use' or 'def'`);
+			if (!deps.loadFlow) return failPhase(phase.id, `flow phase '${phase.id}': no sub-flow loader available`);
+			subDef = deps.loadFlow(useName);
+			if (!subDef) return failPhase(phase.id, `flow phase '${phase.id}': saved flow not found: '${useName}'`);
+			name = useName;
+			recursionKey = useName;
+		}
+		if (recursionKey === state.flowName || stack.includes(recursionKey)) {
+			return failPhase(phase.id, `flow phase '${phase.id}': recursive sub-flow ${[...stack, state.flowName, recursionKey].join(" -> ")}`);
 		}
 		// Resolve sub-flow args (interpolate string values), then apply declared defaults.
 		const provided: Record<string, unknown> = {};
@@ -675,7 +829,11 @@ async function executePhase(
 			provided[k] = typeof v === "string" ? interpolate(v, ctx).text : v;
 		}
 		const subArgs = resolveArgs(subDef, provided);
-		const inputHash = cacheKey(cc, [phase.id, `flow:${name}`, preRead, JSON.stringify(subArgs)]);
+		// For inline defs the cache identity must include the resolved def content so
+		// that a different generated plan yields a different key (and an identical plan
+		// hits cache). For saved flows the name is the identity (historical behavior).
+		const flowIdentity = hasDef ? `def:${JSON.stringify(subDef)}` : `flow:${name}`;
+		const inputHash = cacheKey(cc, [phase.id, flowIdentity, preRead, JSON.stringify(subArgs)]);
 		const cached = cachedPhase(cc, inputHash);
 		if (cached) return cached;
@@ -707,7 +865,7 @@ async function executePhase(
 			// flow's cwd (not the caller's cwd).
 			cwd: phase.cwd ?? deps.cwd,
 			runTask: subRunTask,
-			_stack: [...stack, state.flowName],
+			_stack: hasDef ? [...stack, state.flowName, recursionKey] : [...stack, state.flowName],
 			persist: undefined,
 			onProgress: () => {
 				if (live) {
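To make the budget-containment rule in this file concrete, the sketch below restates the `min(child, parent)` clamp from `clampSubFlowBudget` as a standalone function over bare budgets. The local `Budget` interface and the `clampBudget` name are assumptions for illustration; the package's version operates on a whole `Taskflow`.

```typescript
// Local stand-in for the package's Budget type (only the two fields used above).
interface Budget {
  maxUSD?: number;
  maxTokens?: number;
}

// Each dimension becomes min(child, parent); a missing cap counts as Infinity.
function clampBudget(child: Budget | undefined, parent: Budget | undefined): Budget | undefined {
  if (!parent) return child; // no parent cap: the child's budget is left as-is
  const usd = Math.min(child?.maxUSD ?? Infinity, parent.maxUSD ?? Infinity);
  const tokens = Math.min(child?.maxTokens ?? Infinity, parent.maxTokens ?? Infinity);
  const out: Budget = {};
  if (Number.isFinite(usd)) out.maxUSD = usd;
  if (Number.isFinite(tokens)) out.maxTokens = tokens;
  return out.maxUSD === undefined && out.maxTokens === undefined ? undefined : out;
}

// A generated plan that declares a large budget is clamped to the parent's cap.
console.log(clampBudget({ maxUSD: 50 }, { maxUSD: 1.5 }));
// A generated plan that declares a tighter budget keeps it.
console.log(clampBudget({ maxUSD: 0.5 }, { maxUSD: 1.5 }));
```

The effect is that a runtime-generated sub-flow can only narrow the spend cap, never widen it.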

package/extensions/schema.ts CHANGED

@@ -20,6 +20,19 @@ export type PhaseType = (typeof PHASE_TYPES)[number];
 export const LOOP_DEFAULT_MAX_ITERATIONS = 10;
 export const LOOP_HARD_MAX_ITERATIONS = 100;
+/** Max depth of runtime `flow { def }` sub-flow nesting (runaway guard for
+ *  LLM-generated sub-flows that themselves spawn more sub-flows). The existing
+ *  `_stack` recursion check guards saved-flow cycles; this bounds inline depth. */
+export const MAX_DYNAMIC_NESTING = 5;
+/** Breadth caps applied ONLY to runtime-generated (`flow { def }`) sub-flows,
+ *  whose content is LLM-authored and therefore untrusted. Authored/saved flows
+ *  are not subject to these (a human reviewed them). They bound DoS blast radius
+ *  from a model emitting a graph with thousands of phases / a giant fan-out. */
+export const MAX_DYNAMIC_PHASES = 100;
+export const MAX_DYNAMIC_MAP_ITEMS = 200;
+export const MAX_DYNAMIC_CONCURRENCY = 16;
 /** Tournament competitor bounds. */
 export const TOURNAMENT_DEFAULT_VARIANTS = 3;
 export const TOURNAMENT_HARD_MAX_VARIANTS = 20;
@@ -119,6 +132,12 @@ const PhaseSchema = Type.Object(
 		// sub-workflow (flow)
 		use: Type.Optional(Type.String({ description: "[flow] Name of a saved taskflow to run as this phase" })),
+		def: Type.Optional(
+			Type.Unknown({
+				description:
+					"[flow] Inline sub-flow definition, resolved at runtime. Mutually exclusive with 'use'. A string is interpolated (e.g. '{steps.plan.json}') then JSON-parsed; an object is used directly. The result must be a Taskflow ({name,phases}) or a bare phases array / {phases:[...]} (auto-wrapped). Validated + verified before execution; on any failure the phase fails-open (defError) without aborting the run.",
+			}),
+		),
 		with: Type.Optional(
 			Type.Record(Type.String(), Type.Unknown(), {
 				description: "[flow] Args passed to the sub-flow (string values support interpolation)",
@@ -388,6 +407,10 @@ export interface ValidationOptions {
 	cwd?: string;
 	/** Override the flow's own `strictInterpolation` flag for this validation call. */
 	strict?: boolean;
+	/** When true, this flow is a runtime-generated (`flow { def }`) sub-flow whose
+	 *  content is LLM-authored / untrusted. Enables hardening checks: breadth caps
+	 *  (phase count, map items, concurrency) and cwd containment under `cwd`. */
+	dynamic?: boolean;
 }
 export function validateTaskflow(def: unknown, opts: ValidationOptions = {}): ValidationResult {
@@ -406,6 +429,32 @@ export function validateTaskflow(def: unknown, opts: ValidationOptions = {}): Va
 		return { ok: false, errors, warnings };
 	}
+	// Hardening for runtime-generated (untrusted) sub-flows: bound breadth and
+	// contain filesystem access. These do NOT apply to authored/saved flows.
+	if (opts.dynamic) {
+		if (flow.phases.length > MAX_DYNAMIC_PHASES) {
+			errors.push(`Dynamic sub-flow has too many phases (${flow.phases.length}, max ${MAX_DYNAMIC_PHASES})`);
+		}
+		if (typeof flow.concurrency === "number" && flow.concurrency > MAX_DYNAMIC_CONCURRENCY) {
+			errors.push(`Dynamic sub-flow concurrency too high (${flow.concurrency}, max ${MAX_DYNAMIC_CONCURRENCY})`);
+		}
+		const root = opts.cwd ? path.resolve(opts.cwd) : undefined;
+		for (const p of flow.phases) {
+			if (!p || typeof p !== "object") continue;
+			// Per-phase concurrency override is also capped.
+			if (typeof p.concurrency === "number" && p.concurrency > MAX_DYNAMIC_CONCURRENCY) {
+				errors.push(`Dynamic sub-flow phase '${p.id}': concurrency too high (${p.concurrency}, max ${MAX_DYNAMIC_CONCURRENCY})`);
+			}
+			// cwd containment: a generated phase may not escape the run's cwd.
+			if (typeof p.cwd === "string" && root) {
+				const resolved = path.resolve(root, p.cwd);
+				if (resolved !== root && !resolved.startsWith(root + path.sep)) {
+					errors.push(`Dynamic sub-flow phase '${p.id}': cwd '${p.cwd}' escapes the run directory`);
+				}
+			}
+		}
+	}
 	const ids = new Set<string>();
 	for (const p of flow.phases) {
 		if (!p || typeof p !== "object") {
@@ -439,7 +488,13 @@ export function validateTaskflow(def: unknown, opts: ValidationOptions = {}): Va
 			if (!p.task) errors.push(`Phase '${p.id}' (reduce) requires 'task'`);
 		}
 		if (type === "flow") {
-			if (!p.use) errors.push(`Phase '${p.id}' (flow) requires 'use' (a saved flow name)`);
+			const hasUse = typeof p.use === "string" && p.use.length > 0;
+			const hasDef = (p as { def?: unknown }).def !== undefined;
+			if (!hasUse && !hasDef) {
+				errors.push(`Phase '${p.id}' (flow) requires 'use' (a saved flow name) or 'def' (an inline definition)`);
+			} else if (hasUse && hasDef) {
+				errors.push(`Phase '${p.id}' (flow): 'use' and 'def' are mutually exclusive — provide exactly one`);
+			}
 		}
 		if (type === "loop") {
 			if (!p.task) errors.push(`Phase '${p.id}' (loop) requires 'task' (the iteration body)`);
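The cwd containment check added in this file compares the resolved path against `root + path.sep` rather than against `root` alone. The standalone sketch below isolates that comparison to show why: a plain prefix test would accept a sibling directory such as `/repo-evil`. The `escapesRoot` helper name is hypothetical; the logic follows the `dynamic` branch of `validateTaskflow` shown above.

```typescript
import * as path from "node:path";

// True when `cwd`, resolved against `root`, lands outside of `root`.
function escapesRoot(root: string, cwd: string): boolean {
  const base = path.resolve(root);
  const resolved = path.resolve(base, cwd);
  // Appending the separator prevents "/repo-evil" from matching "/repo".
  return resolved !== base && !resolved.startsWith(base + path.sep);
}

console.log(escapesRoot("/repo", "packages/core")); // inside the run directory
console.log(escapesRoot("/repo", "../etc"));        // climbs out via ".."
console.log(escapesRoot("/repo", "/repo-evil"));    // sibling sharing a prefix
```

This check is lexical only: it does not resolve symlinks, so it complements rather than replaces a `realpath`-based defense.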

package/extensions/store.ts CHANGED

@@ -54,7 +54,8 @@ export interface PhaseState {
 	gate?: { verdict: "pass" | "block"; reason?: string };
 	/** Total subagent attempts incl. retries (when > calls, a retry happened). */
 	attempts?: number;
-	/** True when a map/parallel fan-out was cut short by the budget cap. */
+	/** True when a map/parallel fan-out was cut short by the budget cap. The dynamic
+	 *  sub-flow fan-out limit (MAX_DYNAMIC_MAP_ITEMS) is reported via `warnings` instead. */
 	budgetTruncated?: boolean;
 	/** Human-in-the-loop outcome (approval phases only). */
 	approval?: { decision: "approve" | "reject" | "edit"; note?: string; auto?: boolean };
@@ -62,6 +63,9 @@ export interface PhaseState {
 	loop?: { iterations: number; stop: "until" | "converged" | "maxIterations" | "failed" | "aborted" };
 	/** Tournament outcome (tournament phases only). */
 	tournament?: { variants: number; winner: number; mode: "best" | "aggregate"; reason?: string };
+	/** Set when a `flow { def }` inline sub-flow definition could not be resolved,
+	 *  parsed, validated, or verified. The phase fails-open: this records why. */
+	defError?: string;
 	/** Non-fatal diagnostic warnings accumulated during this phase (e.g.
 	 *  unresolved interpolation placeholders, suspicious templates). */
 	warnings?: string[];

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "pi-taskflow",
-  "version": "0.0.17",
-  "description": "Lightweight workflow orchestration for the Pi coding agent — declarative multi-phase taskflows with dynamic fan-out, isolated subagent context, resumable runs, and saveable commands.",
+  "version": "0.0.18",
+  "description": "A declarative, verifiable graph of task nodes for the Pi coding agent — not a workflow you script, but a DAG you declare: statically verified before it runs, with dynamic fan-out, gates, isolated subagent context, resumable runs, and saveable commands.",
   "keywords": [
     "pi-package",
     "pi",
@@ -33,12 +33,11 @@
     "README.md",
     "README.zh-CN.md",
     "CHANGELOG.md",
-    "DESIGN.md",
     "LICENSE"
   ],
   "scripts": {
     "typecheck": "tsc --noEmit",
-    "test": "PI_TASKFLOW_BUILTIN_AGENTS_DIR= node --experimental-strip-types --test test/interpolate.test.ts test/condition.test.ts test/schema.test.ts test/usage.test.ts test/runtime.test.ts test/features.test.ts test/runner.test.ts test/store.test.ts test/agents.test.ts test/init.test.ts test/render.test.ts test/desugar.test.ts test/cache.test.ts test/loop.test.ts test/tournament.test.ts test/verify.test.ts test/gate-eval.test.ts test/transient-error.test.ts test/runtime-branches.test.ts test/interpolate-extended.test.ts test/store-extended.test.ts",
+    "test": "PI_TASKFLOW_BUILTIN_AGENTS_DIR= node --experimental-strip-types --test test/interpolate.test.ts test/condition.test.ts test/schema.test.ts test/usage.test.ts test/runtime.test.ts test/features.test.ts test/runner.test.ts test/store.test.ts test/agents.test.ts test/init.test.ts test/render.test.ts test/desugar.test.ts test/cache.test.ts test/loop.test.ts test/tournament.test.ts test/verify.test.ts test/gate-eval.test.ts test/transient-error.test.ts test/runtime-branches.test.ts test/interpolate-extended.test.ts test/store-extended.test.ts test/flow-def.test.ts",
     "test:e2e": "PI_TASKFLOW_PI_BIN=pi node --experimental-strip-types test/e2e.mts",
     "test:dogfood-cache": "node --experimental-strip-types test/dogfood-cache.mts"
   },

package/skills/taskflow/SKILL.md CHANGED

@@ -87,7 +87,7 @@ Call the `taskflow` tool. To run a brand-new flow you write inline, pass
 | `gate` | quality/review step that can **halt the flow** (see below) |
 | `reduce` | aggregate `from[]` phases into one output |
 | `approval` | **human-in-the-loop** pause: ask a person to approve / reject / edit before continuing |
-| `flow` | run a **saved sub-flow** (by `use`) as a single phase — composition/reuse |
+| `flow` | run a **sub-flow** as one phase — **saved** (`use`) or **runtime-generated** (`def`) |
 ### Control-flow fields (any phase)
@@ -133,15 +133,49 @@ deciding. The (interpolated) `task` is the prompt shown.
 ### Sub-flows (composition)
-A `flow` phase runs another **saved** taskflow by name and bubbles up its final
-output. Pass args via `with` (string values interpolate). Recursion is detected
-and rejected.
+A `flow` phase runs another taskflow as a single phase and bubbles up its final
+output. Two sources, **mutually exclusive**:
+**Saved** (`use`) — run a previously saved flow by name. Pass args via `with`
+(string values interpolate). Recursion is detected and rejected.
 ```jsonc
 { "id": "research", "type": "flow", "use": "deep-research",
   "with": { "topic": "{item}" }, "dependsOn": ["plan"] }
 ```
+**Runtime-generated** (`def`) — resolve a sub-flow *at runtime*, usually from an
+upstream phase's JSON output. The runtime interpolates + JSON-parses the `def`,
+**validates it** (cycles / dangling refs / duplicate ids), then runs it as a
+nested sub-flow. This is how a planner decides *at runtime* what work to spawn —
+the declarative answer to code-mode `for`/`if` control flow, with each generated
+plan checked before it spends a token.
+```jsonc
+// 1) A planner emits a plan as JSON. 2) flow{def} runs it.
+{ "id": "plan", "type": "agent", "agent": "planner", "output": "json",
+  "task": "Scan the repo. Output ONLY JSON {\"name\":\"audit\",\"phases\":[...]} — one audit phase per file." },
+{ "id": "run", "type": "flow", "def": "{steps.plan.json}", "dependsOn": ["plan"], "final": true }
+```
+**LLM output contract for `def`:** the upstream phase must output a *full*
+Taskflow `{"name":"...","phases":[...]}`, a bare `phases` array, or
+`{"phases":[...]}` — pure JSON (a ```json fence is tolerated and stripped).
+Use hyphens in ids, never underscores. Sub-flow phases reference each other in
+their **own** `{steps.x.output}` namespace (no parent-id prefixing needed).
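The output contract above accepts three shapes plus an optional fence. A minimal sketch of how such a string could be normalized into one form is shown here; `normalizeDef`, `stripFence`, the `DefResult` type, and the `"dynamic"` fallback name are hypothetical names chosen for illustration and are not taken from pi-taskflow's `runtime.ts`.

```typescript
// Hypothetical sketch of the `def` output contract; not the package's actual code.
type Phase = { id: string; dependsOn?: string[]; [key: string]: unknown };
type Flow = { name: string; phases: Phase[] };
type DefResult = { ok: true; flow: Flow } | { ok: false; defError: string };

const FENCE = "`".repeat(3);

// Remove one surrounding markdown fence (with or without a "json" tag), if present.
function stripFence(text: string): string {
  const t = text.trim();
  if (t.length < FENCE.length * 2 || !t.startsWith(FENCE) || !t.endsWith(FENCE)) return t;
  const firstNewline = t.indexOf("\n");
  if (firstNewline === -1) return t;
  return t.slice(firstNewline + 1, t.length - FENCE.length).trim();
}

// Accept a full Taskflow, a bare phases array, or {phases:[...]}.
// Pure data handling: JSON.parse only, no eval.
function normalizeDef(raw: string, fallbackName = "dynamic"): DefResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stripFence(raw));
  } catch (err) {
    return { ok: false, defError: `def is not valid JSON: ${(err as Error).message}` };
  }
  const isObject = typeof parsed === "object" && parsed !== null && !Array.isArray(parsed);
  const obj = isObject ? (parsed as { name?: unknown; phases?: unknown }) : undefined;
  const phases = Array.isArray(parsed) ? parsed : obj?.phases;
  if (!Array.isArray(phases)) {
    return { ok: false, defError: "def must be a Taskflow, a phases array, or {phases:[...]}" };
  }
  const rawName = obj?.name;
  const name = typeof rawName === "string" ? rawName : fallbackName;
  // Note: individual phases are not checked here; that is a separate validation step.
  return { ok: true, flow: { name, phases: phases as Phase[] } };
}
```

Returning a tagged result instead of throwing keeps the caller free to treat a bad plan as a diagnostic rather than a crash, which matches the fail-open behavior the document describes.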
+**Fail-open & limits:** if the `def` doesn't parse, has the wrong shape, or fails
+validation, the phase fails *open* — it's marked failed with a `defError`, the
+upstream output is preserved, and the run continues (use `optional: true` on the
+flow phase so a bad plan never aborts the run). An **empty** `phases` array is a
+valid no-op (the planner decided there's nothing to do). Inline nesting is capped
+at `MAX_DYNAMIC_NESTING` (5) to bound runaway self-spawning.
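The structural checks named for `def` (duplicate ids, dangling refs, cycles) can be sketched as a single pass over the phase list. This is an illustrative sketch under assumptions: `validatePhases` and `PhaseNode` are hypothetical names, only `id` and `dependsOn` are inspected, and the real validator in the package may check more (or differently).

```typescript
// Hypothetical sketch of structural validation for a generated plan.
type PhaseNode = { id: string; dependsOn?: string[] };

// Returns a list of human-readable problems; an empty list means the plan passed.
function validatePhases(phases: PhaseNode[]): string[] {
  const errors: string[] = [];
  const byId = new Map<string, PhaseNode>();

  // 1) Duplicate ids.
  for (const phase of phases) {
    if (byId.has(phase.id)) errors.push(`duplicate id: ${phase.id}`);
    else byId.set(phase.id, phase);
  }

  // 2) Dangling refs: a dependsOn entry that names no phase in this plan.
  for (const phase of phases) {
    for (const dep of phase.dependsOn ?? []) {
      if (!byId.has(dep)) errors.push(`dangling ref: ${phase.id} -> ${dep}`);
    }
  }

  // 3) Cycles: depth-first search with "visiting" / "done" marks.
  const state = new Map<string, "visiting" | "done">();
  const hasCycleFrom = (id: string): boolean => {
    const mark = state.get(id);
    if (mark === "done") return false;
    if (mark === "visiting") return true; // reached a node already on the current path
    state.set(id, "visiting");
    for (const dep of byId.get(id)?.dependsOn ?? []) {
      if (byId.has(dep) && hasCycleFrom(dep)) return true;
    }
    state.set(id, "done");
    return false;
  };
  for (const id of byId.keys()) {
    if (hasCycleFrom(id)) {
      errors.push(`cycle involving: ${id}`);
      break; // one cycle report is enough to reject the plan
    }
  }
  return errors;
}
```

A caller following the fail-open rule would join a non-empty result into a `defError` string and skip execution, while an empty `phases` array yields no errors and simply has nothing to run.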
+**Iterative replanning** — pair `flow{def}` (or a JSON-emitting body) with `loop`
+so round N's plan depends on round N-1's **result** (not a one-shot fan-out):
+the declarative equivalent of `for (...) { read result; decide next }`. See
+`examples/dynamic-plan-execute.json` and `examples/iterative-replan.json`.
 ### Budget (cost / token caps)
 Add a run-wide ceiling at the top level. When accumulated cost/tokens exceed it,