pi-taskflow 0.0.22 → 0.0.23
- package/CHANGELOG.md +77 -0
- package/README.md +174 -46
- package/extensions/context-store.ts +447 -0
- package/extensions/index.ts +135 -0
- package/extensions/runner.ts +96 -3
- package/extensions/runtime.ts +310 -13
- package/extensions/schema.ts +34 -6
- package/extensions/store.ts +17 -4
- package/extensions/workspace.ts +206 -0
- package/package.json +6 -2
- package/skills/taskflow/SKILL.md +104 -0
package/CHANGELOG.md
CHANGED
|
@@ -2,6 +2,83 @@
|
|
|
2
2
|
|
|
3
3
|
All notable changes to pi-taskflow are documented here. This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) format.
|
|
4
4
|
|
|
5
|
+
## [0.0.23] — 2026-06-11
|
|
6
|
+
|
|
7
|
+
> Feature release: the **Shared Context Tree** — an opt-in mechanism that gives
|
|
8
|
+
> subagents a horizontal blackboard and a vertical supervision tree, so fan-out
|
|
9
|
+
> items can reuse expensive context instead of re-reading it, and a node can
|
|
10
|
+
> delegate work at runtime and have its children report back. Validated with six
|
|
11
|
+
> real end-to-end runs (real `pi`, real models) including a recursive org tree
|
|
12
|
+
> and a large 5-way audit that converges through a loop + gate.
|
|
13
|
+
|
|
14
|
+
### Added
|
|
15
|
+
- **Shared Context Tree (opt-in).** Set `shareContext: true` on a phase (or
|
|
16
|
+
`contextSharing: true` at the flow level) to give its subagent four extra
|
|
17
|
+
tools backed by a per-run, file-based blackboard:
|
|
18
|
+
- `ctx_write(key, value)` / `ctx_read(key?)` — a **horizontal blackboard**: a
|
|
19
|
+
node publishes a finding; siblings/descendants reuse it (own > ancestors >
|
|
20
|
+
completed-others on key conflict; a running sibling's half-written findings
|
|
21
|
+
stay hidden). Stops fan-out items from re-reading the same files.
|
|
22
|
+
- `ctx_report(summary, structured?)` / `ctx_spawn(assignments[])` — a
|
|
23
|
+
**vertical supervision tree**: a node reports up, and delegates child work at
|
|
24
|
+
runtime; the runtime runs each child (isolated) after the node finishes and
|
|
25
|
+
folds their reports into the phase output.
|
|
26
|
+
- New module `extensions/context-store.ts` reuses the run store's atomic-write
|
|
27
|
+
and file-lock primitives (per-node findings files, so no global lock contention).
|
|
28
|
+
- All bookkeeping is **fail-open** (it can never sink a phase); the blackboard
|
|
29
|
+
is size-bounded (256 KB/value, 256 keys/node), depth-capped (5), and cleaned
|
|
30
|
+
up with the run. Fully backward-compatible: flows that don't opt in are
|
|
31
|
+
byte-for-byte unaffected.
|
|
32
|
+
- **`ctx_spawn` accepts a sub-graph, not just flat tasks.** An assignment is now
|
|
33
|
+
either `{task, agent?}` **or** `{subflow, defaultAgent?}` where `subflow` is an
|
|
34
|
+
inline Taskflow (a dependency-bearing DAG with `map`/`gate`/`reduce`). The
|
|
35
|
+
spawned subflow reuses the same `validateTaskflow` + `verifyTaskflow` +
|
|
36
|
+
nested-`executeTaskflow` machinery as `flow{def}`; spawn-subflows and `flow{def}`
|
|
37
|
+
share **one** `MAX_DYNAMIC_NESTING` counter (a `def:spawn-*` `_stack` frame), and
|
|
38
|
+
spawned child token/cost usage is folded into the parent phase for honest budget
|
|
39
|
+
accounting. A bad subflow fails open with a diagnostic.
|
|
40
|
+
- **Tests: 608 → 670** (+62) across 33 files, incl. `context-store`,
|
|
41
|
+
`context-tree`, `spawn-xor`, `spawn-subflow`, `spawn-subflow-nesting`,
|
|
42
|
+
`workspace`, `workspace-isolation`.
|
|
43
|
+
- **Workspace isolation (`cwd` keywords).** A phase's `cwd` now accepts three
|
|
44
|
+
reserved keywords that make the runtime allocate an isolated working directory
|
|
45
|
+
for the phase's subagent and tear it down afterwards:
|
|
46
|
+
- `"temp"` — an ephemeral dir under the OS tmpdir, removed when the phase ends.
|
|
47
|
+
- `"dedicated"` — a persistent dir under the run state
|
|
48
|
+
(`runs/ws/<runId>/<phaseId>`), kept for inspection and deterministic per
|
|
49
|
+
phase so a **resume reuses the same dir**.
|
|
50
|
+
- `"worktree"` — a real `git worktree` on a throwaway branch off `HEAD`,
|
|
51
|
+
removed (`git worktree remove --force` + branch delete) when the phase ends;
|
|
52
|
+
for changes you want to diff / commit / discard in isolation.
|
|
53
|
+
- New module `extensions/workspace.ts` (zero deps: `fs.mkdtemp` + `git` via
|
|
54
|
+
`child_process`). **Fail-open**: a failed allocation degrades to the base
|
|
55
|
+
cwd (`worktree`→`temp` when not a git repo) and records a `warnings`
|
|
56
|
+
diagnostic — a phase never fails to run because of isolation. **Security**:
|
|
57
|
+
the keywords are rejected at validation in LLM-authored sub-flows
|
|
58
|
+
(`flow{def}` / `ctx_spawn` subflow) so generated plans cannot allocate
|
|
59
|
+
worktrees or temp dirs that mutate the repo. A literal path is passed
|
|
60
|
+
through unchanged (fully backward-compatible).
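
As a rough illustration of the `cwd` keyword rules described in the workspace-isolation entry above, the TypeScript sketch below classifies a `cwd` value and applies the `worktree`→`temp` downgrade. The names `planWorkspace`, `WorkspacePlan`, and `isCwdKeyword` are hypothetical and are not taken from `extensions/workspace.ts`; the real module also allocates and tears down the directory, which this sketch leaves out.

```typescript
// Hypothetical sketch: classify a phase's `cwd` and record fail-open warnings.
const CWD_KEYWORDS = ["temp", "dedicated", "worktree"] as const;
type CwdKeyword = (typeof CWD_KEYWORDS)[number];

interface WorkspacePlan {
  kind: CwdKeyword | "literal" | "base";
  path?: string; // only set for literal paths
  warnings: string[]; // diagnostics recorded instead of failing the phase
}

function isCwdKeyword(value: string): value is CwdKeyword {
  return (CWD_KEYWORDS as readonly string[]).indexOf(value) !== -1;
}

function planWorkspace(cwd: string | undefined, isGitRepo: boolean): WorkspacePlan {
  // No cwd given: run in the base cwd.
  if (cwd === undefined) return { kind: "base", warnings: [] };
  // A literal path is passed through unchanged.
  if (!isCwdKeyword(cwd)) return { kind: "literal", path: cwd, warnings: [] };
  // Fail open: a worktree outside a git repo degrades to a temp dir.
  if (cwd === "worktree" && !isGitRepo) {
    return { kind: "temp", warnings: ["worktree requested outside a git repo; using temp"] };
  }
  return { kind: cwd, warnings: [] };
}
```

The point of returning warnings rather than throwing is that isolation problems stay visible in the run record without ever stopping the phase.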
|
|
61
|
+
|
|
62
|
+
### Fixed
|
|
63
|
+
- **`map` / `parallel` fan-out items that call `ctx_spawn` were silently
|
|
64
|
+
orphaned.** The post-run spawn-drain only covered single-agent/`gate`/`reduce`
|
|
65
|
+
phases (keyed on the base phase id), but fan-out items run with suffixed node
|
|
66
|
+
ids (`audit-0`…`audit-4`) and were never drained — their queued children never
|
|
67
|
+
ran (5 orphaned intents, 0 children, in a real e2e). Each fan-out item now
|
|
68
|
+
drains its own node and runs + folds its spawned children (reports + usage),
|
|
69
|
+
fail-open. Regression test added.
|
|
70
|
+
- **Workspace override no longer leaks across isolation boundaries** (found by
|
|
71
|
+
the pre-release adversarial review). `runInlineSubflow` and the gate
|
|
72
|
+
`onBlock:retry` upstream re-execution both spread `...deps` without clearing
|
|
73
|
+
the parent's `_cwdOverride`, so a spawned subflow / re-run upstream dep could
|
|
74
|
+
be force-pinned to the parent phase's isolated dir. Both now strip the
|
|
75
|
+
override (a spawned subflow still inherits the parent's dir as its *base* cwd,
|
|
76
|
+
consistent with `flow{def}`, but no longer ignores an inner phase's own cwd).
|
|
77
|
+
The triplicated `effCwd` formula was extracted into one `resolveEffCwd()`
|
|
78
|
+
helper (the divergence was the root cause). `runs/ws/` dedicated-workspace
|
|
79
|
+
dirs are now reclaimed by the terminal-run cleanup, and `rmrf()` gained a
|
|
80
|
+
path-containment guard (defense-in-depth).
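
The path-containment guard mentioned above can be pictured with a small TypeScript sketch. It is an assumption about the general technique, not the package's actual `rmrf()`: the helper names `normalizeSegments`, `isContained`, and `guardedRemove` are invented for illustration, and the sketch only handles absolute POSIX-style paths. Production code would more likely build on `path.resolve` and `path.relative`.

```typescript
// Hypothetical sketch of a containment check before a recursive delete.
// Split an absolute POSIX-style path into segments, resolving "." and "..".
function normalizeSegments(p: string): string[] {
  const out: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      out.pop();
      continue;
    }
    out.push(seg);
  }
  return out;
}

// True only when `target` is strictly inside `root` (never the root itself).
function isContained(root: string, target: string): boolean {
  const r = normalizeSegments(root);
  const t = normalizeSegments(target);
  if (t.length <= r.length) return false;
  return r.every((seg, i) => seg === t[i]);
}

// Run `remove` only for contained targets; report whether it ran.
function guardedRemove(root: string, target: string, remove: (p: string) => void): boolean {
  if (!isContained(root, target)) return false;
  remove(target);
  return true;
}
```

Comparing whole segments rather than string prefixes avoids treating `/runs/wsx` as if it were inside `/runs/ws`.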
|
|
81
|
+
|
|
5
82
|
## [0.0.22] — 2026-06-10
|
|
6
83
|
|
|
7
84
|
> Dogfooding release. The `dogfood-full` self-audit taskflow (which itself
|
package/README.md
CHANGED
|
@@ -8,24 +8,18 @@
|
|
|
8
8
|
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-MIT-43D9AD?style=flat-square" alt="MIT license"></a>
|
|
9
9
|
<a href="#whats-inside"><img src="https://img.shields.io/badge/runtime%20deps-0-43D9AD?style=flat-square" alt="zero runtime dependencies"></a>
|
|
10
10
|
<a href="https://github.com/heggria/pi-taskflow/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/heggria/pi-taskflow/ci.yml?branch=main&style=flat-square&label=CI" alt="CI status"></a>
|
|
11
|
-
<a href="#whats-inside"><img src="https://img.shields.io/badge/tests-
|
|
11
|
+
<a href="#whats-inside"><img src="https://img.shields.io/badge/tests-670-6E8BFF?style=flat-square" alt="670 tests"></a>
|
|
12
12
|
<a href="#whats-inside"><img src="https://img.shields.io/badge/dogfooded-%E2%9C%93-43D9AD?style=flat-square" alt="dogfooded"></a>
|
|
13
13
|
<a href="https://pi.dev"><img src="https://img.shields.io/badge/for-Pi%20coding%20agent-B692FF?style=flat-square" alt="for the Pi coding agent"></a>
|
|
14
14
|
</p>
|
|
15
15
|
|
|
16
16
|
<p align="center">
|
|
17
17
|
<b>English</b> ·
|
|
18
|
-
<a href="./README.zh-CN.md">简体中文</a>
|
|
19
|
-
<a href="./docs/i18n/README.hi.md">हिन्दी</a> ·
|
|
20
|
-
<a href="./docs/i18n/README.es.md">Español</a> ·
|
|
21
|
-
<a href="./docs/i18n/README.ar.md">العربية</a> ·
|
|
22
|
-
<a href="./docs/i18n/README.bn.md">বাংলা</a> ·
|
|
23
|
-
<a href="./docs/i18n/README.pt.md">Português</a> ·
|
|
24
|
-
<a href="./docs/i18n/README.ru.md">Русский</a>
|
|
18
|
+
<a href="./README.zh-CN.md">简体中文</a>
|
|
25
19
|
</p>
|
|
26
20
|
|
|
27
21
|
<p><strong>A declarative, verifiable <em>graph of tasks</em> for <a href="https://pi.dev">Pi</a> subagents.</strong><br/>
|
|
28
|
-
Not a workflow you script — a DAG you declare. Fan out · gate · resume · save as a command — intermediate results stay out of your context.</p>
|
|
22
|
+
Not a workflow you script — a DAG you declare. Fan out · gate · loop · tournament · resume · save as a command — intermediate results stay out of your context.</p>
|
|
29
23
|
|
|
30
24
|
```bash
|
|
31
25
|
pi install npm:pi-taskflow
|
|
@@ -37,7 +31,7 @@ pi install npm:pi-taskflow
|
|
|
37
31
|
|
|
38
32
|
**A `workflow` flows. A `taskflow` is a *graph*.** Other orchestrators let the model *script* the work — imperative code that flows step by step, with the graph hidden inside control flow. `pi-taskflow` does the opposite: you **declare** the work as a graph of discrete, named **task** nodes connected by `dependsOn` edges — and the runtime *verifies that graph before it spends a single token.*
|
|
39
33
|
|
|
40
|
-
You already know the built-in subagent tool's `task` / `tasks` / `chain`. `pi-taskflow` speaks the *same* shorthand — so your existing delegations instantly become **tracked, resumable, and saveable as a one-word `/tf:<name>` command**. When you outgrow the shorthand, the full DSL gives you a real DAG: dynamic fan-out over dozens of items, conditional routing, quality gates, human approvals, retries, and a hard spend ceiling.
|
|
34
|
+
You already know the built-in subagent tool's `task` / `tasks` / `chain`. `pi-taskflow` speaks the *same* shorthand — so your existing delegations instantly become **tracked, resumable, and saveable as a one-word `/tf:<name>` command**. When you outgrow the shorthand, the full DSL gives you a real DAG: dynamic fan-out over dozens of items, conditional routing, quality gates, human approvals, retries, loops, tournaments, and a hard spend ceiling.
|
|
41
35
|
|
|
42
36
|
And the whole time, **only the final phase reaches your conversation.** Every intermediate transcript stays in the runtime, never your context window.
|
|
43
37
|
|
|
@@ -81,7 +75,9 @@ Here's the wall you hit with raw subagents: you describe a multi-step plan in pr
|
|
|
81
75
|
| **Fault tolerance** | ✗ | **per-phase `retry` + auto-retry on transient errors** |
|
|
82
76
|
| **Human-in-the-loop** | ✗ | **`approval` phases (approve / reject / edit)** |
|
|
83
77
|
| **Cost control** | ✗ | **run-wide `budget` (USD / token caps)** |
|
|
84
|
-
| **Composition** | ✗ | **`flow` phases run saved sub-flows** |
|
|
78
|
+
| **Composition** | ✗ | **`flow` phases run saved *or runtime-generated* sub-flows** |
|
|
79
|
+
| **Iterative loops** | ✗ | **`loop` phases — repeat until condition, convergence, or cap** |
|
|
80
|
+
| **Competitive selection** | ✗ | **`tournament` phases — N variants + judge** |
|
|
85
81
|
| **Live progress** | opaque while running | **live DAG render with timing + cost** |
|
|
86
82
|
| **Ergonomics** | inline JSON each time | **shorthand (`task`/`tasks`/`chain`) *or* DSL** |
|
|
87
83
|
|
|
@@ -99,13 +95,13 @@ The closest thing to `pi-taskflow` in spirit is the **dynamic / code-mode workfl
|
|
|
99
95
|
| **See it** | ✗ the graph only exists as the code runs | **✓ the live progress render *is* the DAG** |
|
|
100
96
|
| **Resume** | coarse (call-cache dedup) | **✓ phase-by-phase input-hash resume, cross-session** |
|
|
101
97
|
| **Safe to LLM-generate** | risky — it's executable code | **✓ it's just data — no `eval`; and a runtime-generated sub-flow is *structurally validated* (cycles / dangling refs / duplicate ids) before it runs** |
|
|
102
|
-
| **Expressivity ceiling** | **higher** — arbitrary control flow | bounded by the DSL, but `map`/`when`/`loop`/`gate` — plus **runtime-generated sub-flows (`flow {def}`)** for plan-then-execute and iterative replanning — cover most jobs |
|
|
98
|
+
| **Expressivity ceiling** | **higher** — arbitrary control flow | bounded by the DSL, but `map`/`when`/`loop`/`gate`/`tournament` — plus **runtime-generated sub-flows (`flow {def}`)** for plan-then-execute and iterative replanning — cover most jobs |
|
|
103
99
|
|
|
104
100
|
We chose the **verifiable** side on purpose. The expressivity you give up is real; what you get back — a plan you can check, watch, replay, and safely let a model author — is what turns one-off prompting into durable orchestration.
|
|
105
101
|
|
|
106
102
|
## Compared to other Pi extensions
|
|
107
103
|
|
|
108
|
-
The Pi ecosystem now has **20+ delegation, workflow, and orchestration extensions** — each great at what it's for. Here's an honest map of where `pi-taskflow` sits (verified against each package's latest npm release, June 2026). For the full breakdown — every package, strengths *and* weaknesses — see [`PI-ECOSYSTEM.md`](./PI-ECOSYSTEM.md). For the broader, non-Pi landscape (LangGraph, Temporal, CrewAI, Mastra…) see [`COMPETITORS.md`](./COMPETITORS.md).
|
|
104
|
+
The Pi ecosystem now has **20+ delegation, workflow, and orchestration extensions** — each great at what it's for. Here's an honest map of where `pi-taskflow` sits (verified against each package's latest npm release, June 2026). For the full breakdown — every package, strengths *and* weaknesses — see [`docs/internal/PI-ECOSYSTEM.md`](./docs/internal/PI-ECOSYSTEM.md). For the broader, non-Pi landscape (LangGraph, Temporal, CrewAI, Mastra…) see [`docs/internal/COMPETITORS.md`](./docs/internal/COMPETITORS.md).
|
|
109
105
|
|
|
110
106
|
| Extension | Model | Custom DSL | DAG | Dynamic fan-out | Cross-session resume | Quality gate | Human approval | Save as command | Zero deps |
|
|
111
107
|
|---|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|
|
@@ -120,18 +116,18 @@ The Pi ecosystem now has **20+ delegation, workflow, and orchestration extension
|
|
|
120
116
|
| [`pi-pipeline`](https://www.npmjs.com/package/pi-pipeline) | fixed SPEC→PLAN→TASKS→VERIFY | ✕ | fixed | ✕ | session planning | ✓ | clarify | ✕ | ✕ (2) |
|
|
121
117
|
| [`pi-agent-flow`](https://www.npmjs.com/package/pi-agent-flow) | one-shot parallel specialist `fork` | yes | ✕ | ✕ | – | ✕ | ✕ | – | ✕ (2) |
|
|
122
118
|
|
|
123
|
-
*(Representative slice of the 20+ — see [`PI-ECOSYSTEM.md`](./PI-ECOSYSTEM.md) for all of them, plus `@0xkobold/pi-orchestration`, `@melihmucuk/pi-crew`, `@mediadatafusion/pi-workflow-suite`, `gentle-pi`, `@dreki-gg/pi-subagent`, and more.)*
|
|
119
|
+
*(Representative slice of the 20+ — see [`docs/internal/PI-ECOSYSTEM.md`](./docs/internal/PI-ECOSYSTEM.md) for all of them, plus `@0xkobold/pi-orchestration`, `@melihmucuk/pi-crew`, `@mediadatafusion/pi-workflow-suite`, `gentle-pi`, `@dreki-gg/pi-subagent`, and more.)*
|
|
124
120
|
|
|
125
121
|
**How to choose:**
|
|
126
122
|
|
|
127
123
|
- **`@pi-agents/orchid`** is the most feature-complete orchestrator in the ecosystem (DAG + worktrees + Ralph loop + agent mailbox) — but its DSL is a *fixed* 9-phase pipeline, it carries runtime deps + jiti, and it's beta. Reach for `pi-taskflow` when you want to **define your own graph** (not adopt an opinionated one) with **zero dependencies** and a one-command install.
|
|
128
124
|
- **`pi-crew` / `ultimate-pi`** go heavier — worktree isolation, durable async teams, multi-tier governance. If you want lightweight, declarative, and zero-dependency, that's this project.
|
|
129
125
|
- **`@zhushanwen/pi-workflow`** is the closest in spirit and also zero-dep, but it's the **imperative** side of the split above: you author workflows as **JavaScript scripts** the model writes and runs. `pi-taskflow`'s **declarative JSON DAG** is the verifiable side — statically checkable, visualizable, safe to LLM-generate, and resumable at phase granularity rather than call-cache dedup.
|
|
130
|
-
- **`@fiale-plus/pi-rogue-orchestration`** has a real **loop-until-done** (
|
|
126
|
+
- **`@fiale-plus/pi-rogue-orchestration`** has a real **loop-until-done** (goal-driven iteration). `pi-taskflow` now ships its own `loop` phase (v0.0.13+) plus `tournament` for competitive selection — and unlike rogue-orchestration, `pi-taskflow` has a full DAG with gates, compositional sub-flows, and cross-session resume. For raw "keep going until the goal is met" with minimal structure, rogue-orchestration is still lighter; for structured, branching pipelines, `pi-taskflow` covers the same ground and more.
|
|
131
127
|
- **`pi-subagents` / `@gotgenes/pi-subagents`** are the mature picks for ad-hoc "use reviewer on this diff" delegation and background jobs. `pi-taskflow` is for when those delegations need to become a *repeatable, resumable pipeline*.
|
|
132
128
|
- **`pi-pipeline` / `pi-agent-flow`** ship *opinionated, fixed* flows. `pi-taskflow` ships an *empty canvas*: you (or the model) declare the graph that fits the job.
|
|
133
129
|
|
|
134
|
-
> The honest one-liner: **`pi-taskflow` is the only Pi extension that gives you a *declarative, verifiable, resumable* DAG of task nodes — saved as a one-word command, with zero runtime dependencies and context isolation by design.** Where code-mode workflows let the model *script* the work, `pi-taskflow` lets it *declare a graph the runtime can prove correct before running.*
|
|
130
|
+
> The honest one-liner: **`pi-taskflow` is the only Pi extension that gives you a *declarative, verifiable, resumable* DAG of task nodes — saved as a one-word command, with zero runtime dependencies and context isolation by design.** Where code-mode workflows let the model *script* the work, `pi-taskflow` lets it *declare a graph the runtime can prove correct before running.* Recently shipped from the roadmap: the Shared Context Tree (blackboard + supervision) and worktree isolation (see [`docs/internal/STRATEGY.md`](./docs/internal/STRATEGY.md)).
|
|
135
131
|
|
|
136
132
|
## 30-second start
|
|
137
133
|
|
|
@@ -176,6 +172,16 @@ That's it. You can be running your first workflow before your coffee cools — w
|
|
|
176
172
|
|
|
177
173
|
`agent` is optional (defaults to the first discovered agent). Add a `name` to label the run and unlock saving it as a command.
|
|
178
174
|
|
|
175
|
+
Shorthand modes also support per-step **context pre-reading** — pass `context` (file paths) and optionally `contextLimit` (max chars per file, default 8000) at the step level:
|
|
176
|
+
|
|
177
|
+
```jsonc
|
|
178
|
+
// Chain whose first step pre-reads context files
|
|
179
|
+
{ "chain": [
|
|
180
|
+
{ "task": "List the public API", "agent": "scout", "context": ["src/lib/**/*.ts"] },
|
|
181
|
+
{ "task": "Write docs for:\n{previous.output}", "agent": "writer" }
|
|
182
|
+
] }
|
|
183
|
+
```
|
|
184
|
+
|
|
179
185
|
## Watch it run
|
|
180
186
|
|
|
181
187
|
This is not a mockup. **This is stdout from a real run** — the `self-improve` flow that writes and verifies its own test suites, caught mid-flight by a quality gate:
|
|
@@ -262,6 +268,60 @@ The intermediate summaries never enter your context. The runtime owns them; you
|
|
|
262
268
|
|
|
263
269
|
No scripting. No `eval`. Just data the runtime executes — safe enough to run LLM-generated definitions directly.
|
|
264
270
|
|
|
271
|
+
### Loop until done
|
|
272
|
+
|
|
273
|
+
Some work is inherently iterative — refine a draft until a reviewer is satisfied, retry-and-improve until tests pass, converge on an answer:
|
|
274
|
+
|
|
275
|
+
```jsonc
|
|
276
|
+
{
|
|
277
|
+
"id": "refine",
|
|
278
|
+
"type": "loop",
|
|
279
|
+
"task": "Improve this draft (iteration {loop.iteration}). Previous attempt:\n{loop.lastOutput}\n\nReturn JSON {\"draft\":\"…\",\"done\":true|false}.",
|
|
280
|
+
"until": "{steps.refine.json.done} == true",
|
|
281
|
+
"output": "json",
|
|
282
|
+
"maxIterations": 6,
|
|
283
|
+
"convergence": true
|
|
284
|
+
}
|
|
285
|
+
```
|
|
286
|
+
|
|
287
|
+
See [Loop phases](#loop-until-done-loop) for the full reference.
|
|
288
|
+
|
|
289
|
+
### Plan, then execute (runtime sub-flows)
|
|
290
|
+
|
|
291
|
+
A planner decides *at runtime* what work to spawn — each iteration's plan depends on the previous result:
|
|
292
|
+
|
|
293
|
+
```jsonc
|
|
294
|
+
{
|
|
295
|
+
"name": "iterative-replan",
|
|
296
|
+
"phases": [
|
|
297
|
+
{ "id": "plan", "type": "agent", "agent": "planner",
|
|
298
|
+
"task": "Given the current state, output a JSON taskflow definition (with phases[]).",
|
|
299
|
+
"output": "json" },
|
|
300
|
+
{ "id": "execute", "type": "flow", "def": "{steps.plan.json}",
|
|
301
|
+
"dependsOn": ["plan"] }
|
|
302
|
+
]
|
|
303
|
+
}
|
|
304
|
+
```
|
|
305
|
+
|
|
306
|
+
The generated sub-flow is **validated** (no cycles, no dangling refs, no duplicate IDs) before a single token is spent. See [`examples/dynamic-plan-execute.json`](./examples/dynamic-plan-execute.json) and [`examples/iterative-replan.json`](./examples/iterative-replan.json).
|
|
307
|
+
|
|
308
|
+
### Tournament (compete and judge)
|
|
309
|
+
|
|
310
|
+
For open-ended creative or subjective work, spawn several competing variants and let a judge pick the best:
|
|
311
|
+
|
|
312
|
+
```jsonc
|
|
313
|
+
{
|
|
314
|
+
"id": "headline",
|
|
315
|
+
"type": "tournament",
|
|
316
|
+
"task": "Write a punchy headline for this launch post.",
|
|
317
|
+
"variants": 4,
|
|
318
|
+
"judge": "Pick the headline with the strongest hook and clearest promise.",
|
|
319
|
+
"mode": "best"
|
|
320
|
+
}
|
|
321
|
+
```
|
|
322
|
+
|
|
323
|
+
See [Tournament phases](#tournament-tournament) for the full reference.
|
|
324
|
+
|
|
265
325
|
## Phase types
|
|
266
326
|
|
|
267
327
|
| type | what it does | required fields |
|
|
@@ -289,23 +349,56 @@ Every phase needs a unique `id` and a `type` (defaults to `agent`). On top of th
|
|
|
289
349
|
| `retry` | `{ max, backoffMs?, factor? }` — retry a failing subagent |
|
|
290
350
|
| `output` | `"text"` (default) or `"json"` (exposes `{steps.ID.json}`) |
|
|
291
351
|
| `model` / `thinking` / `tools` | Per-phase overrides for the subagent |
|
|
292
|
-
| `cwd` | Working directory for the subagent |
|
|
352
|
+
| `cwd` | Working directory for the subagent. A literal path, or a reserved keyword for **workspace isolation** — `"temp"` (ephemeral dir, removed after), `"dedicated"` (persistent dir under the run state, kept), `"worktree"` (a git worktree on a throwaway branch, removed after). Fail-open; rejected in LLM-authored sub-flows. |
|
|
353
|
+
| `context` | File paths to pre-read and inject into the agent prompt |
|
|
354
|
+
| `contextLimit` | Max chars per context file (default 8000) |
|
|
293
355
|
| `concurrency` | Fan-out cap for `map` / `parallel` (overrides the flow default) |
|
|
294
356
|
| `final` | Marks the result-bearing phase (else the last phase wins) |
|
|
295
357
|
| `optional` | A failure here does **not** abort the run |
|
|
296
|
-
| `
|
|
297
|
-
| `def` | (`flow`) inline sub-flow **generated at runtime** — usually `"{steps.plan.json}"` (mutually exclusive with `use`) |
|
|
358
|
+
| `shareContext` | Opt this phase's subagent into the **Shared Context Tree** (see below). Set `contextSharing: true` at the flow level to enable it for every phase |
|
|
298
359
|
| `cache` | `{ scope, ttl?, fingerprint? }` — cross-run memoization (see below) |
|
|
360
|
+
| `onBlock` | `"halt"` (default) or `"retry"` — what happens when a gate blocks |
|
|
361
|
+
| `eval` | Zero-token machine-checkable criteria that run *before* the LLM gate |
|
|
362
|
+
|
|
363
|
+
Flow-level keys: `name`, `description`, `args`, `concurrency` (default 8), `agentScope`, `contextSharing`, `strictInterpolation`, and `budget: { maxUSD?, maxTokens? }`.
|
|
364
|
+
|
|
365
|
+
### Shared Context Tree (blackboard + supervision)
|
|
366
|
+
|
|
367
|
+
By default subagents are fully isolated — they share nothing and only return a
|
|
368
|
+
final string. Opt a phase in with `shareContext: true` (or `contextSharing: true`
|
|
369
|
+
flow-wide) to give its subagent four extra tools backed by a per-run, file-based
|
|
370
|
+
blackboard:
|
|
299
371
|
|
|
300
|
-
|
|
372
|
+
| tool | direction | use |
|
|
373
|
+
|------|-----------|-----|
|
|
374
|
+
| `ctx_write(key, value)` | horizontal | publish a finding so siblings/descendants reuse it (stop re-reading the same files) |
|
|
375
|
+
| `ctx_read(key?)` | horizontal | read findings visible to this node: its own + ancestors' + **completed** others' |
|
|
376
|
+
| `ctx_report(summary, structured?)` | vertical ↑ | report a result up to the parent |
|
|
377
|
+
| `ctx_spawn(assignments[])` | vertical ↓ | delegate child work at runtime; each assignment is a flat `{task}` **or** a `{subflow}` (a dependency-bearing DAG the runtime validates and runs nested). Child reports fold back into this phase's output |
|
|
378
|
+
|
|
379
|
+
The first two are a **horizontal blackboard** (siblings reuse expensive context);
|
|
380
|
+
the last two are a **vertical supervision tree** (a node delegates work and its
|
|
381
|
+
children report up). Everything is opt-in, fail-open, depth-capped (5 levels), size-bounded
|
|
382
|
+
(256 KB per value, 256 keys per node, 16 spawn assignments max), and cleaned up
|
|
383
|
+
with the run — flows that don't opt in behave exactly as before.
|
|
384
|
+
|
|
385
|
+
```jsonc
|
|
386
|
+
{ "id": "survey", "type": "agent", "agent": "scout", "shareContext": true,
|
|
387
|
+
"task": "Map the API surface. ctx_write key 'endpoints' so the auditors don't re-scan." },
|
|
388
|
+
{ "id": "audit", "type": "map", "over": "{steps.survey.json}", "shareContext": true,
|
|
389
|
+
"dependsOn": ["survey"], "agent": "analyst",
|
|
390
|
+
"task": "ctx_read 'endpoints' for shared context, then audit {item} for missing auth." }
|
|
391
|
+
```
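
To make the `ctx_read` visibility rule concrete, here is a minimal TypeScript sketch of the merge order stated in the changelog (own > ancestors > completed others, with a running sibling's findings hidden). The `NodeFindings` shape and the `visibleFindings` function are assumptions for illustration and do not mirror `extensions/context-store.ts`; in particular, the relative order among several ancestors is not specified in the source.

```typescript
// Hypothetical sketch of which findings a node sees through ctx_read.
interface NodeFindings {
  nodeId: string;
  status: "running" | "completed";
  findings: Record<string, string>;
}

function visibleFindings(
  self: NodeFindings,
  ancestors: NodeFindings[],
  others: NodeFindings[],
): Record<string, string> {
  let merged: Record<string, string> = {};
  // Lowest precedence first: other nodes, but only once they have completed.
  for (const other of others) {
    if (other.status === "completed") merged = { ...merged, ...other.findings };
  }
  // Ancestors override other nodes on a key conflict.
  for (const ancestor of ancestors) merged = { ...merged, ...ancestor.findings };
  // The node's own findings always win.
  return { ...merged, ...self.findings };
}
```

Hiding running nodes means a reader never sees a half-written finding, at the cost of not benefiting from a sibling until it finishes.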
|
|
301
392
|
|
|
302
393
|
### Control flow & reliability
|
|
303
394
|
|
|
304
|
-
- **`when`** — skip a phase unless an expression is truthy. Supports `{refs}`, `== != < > <= >=`, `&& || !`, parentheses, and quoted strings/numbers. Pair with `join: "any"` on the merge phase for real if/else routing. Parse errors **fail open
|
|
395
|
+
- **`when`** — skip a phase unless an expression is truthy. Supports `{refs}`, `== != < > <= >=`, `&& || !`, parentheses, and quoted strings/numbers. Pair with `join: "any"` on the merge phase for real if/else routing. Parse errors **fail open** (the phase runs — never silently dropped).
|
|
305
396
|
- **`join: "any"`** — an OR-join: the phase runs as soon as *one* dependency completes (default `"all"` waits for all).
|
|
306
397
|
- **`retry`** — `{ "max": 2, "backoffMs": 500, "factor": 2 }` retries a failing subagent with fixed or exponential backoff; usage is summed and the attempt count shows as `↻N` in the TUI. Transient provider errors (rate-limit / 5xx / timeout) **auto-retry even without an explicit policy**; hard errors don't.
|
|
307
|
-
- **`
|
|
308
|
-
- **`
|
|
398
|
+
- **`onBlock`** — `"halt"` (default) stops the run when a gate blocks. `"retry"` re-runs the upstream phases instead: a self-healing rework loop guarded by the budget, the idle watchdog, and a nested-recursion depth cap.
|
|
399
|
+
- **`eval`** — zero-token machine-checkable criteria that run *before* the LLM gate. If the eval check fails, the gate blocks without spawning an agent.
|
|
400
|
+
- **`approval`** — pause for a human (Approve / Reject / Edit). Reject halts the flow; Edit injects the typed note as the phase output for downstream steps. Non-interactive runs (detached / CI) **auto-reject** (safety: approval gates are never bypassed).
|
|
401
|
+
- **`flow`** — `{ "type": "flow", "use": "deep-research", "with": { "topic": "{item}" } }` runs a **saved** flow as a phase (recursion is detected and rejected). Or **generate the sub-flow at runtime**: `{ "type": "flow", "def": "{steps.plan.json}" }` resolves an upstream phase's JSON output into a sub-flow, **validates it (cycles / dangling refs / duplicate ids / dead-ends), then runs it** — the number and shape of the generated phases is decided at runtime, not authored in advance. A malformed plan fails *open* (the phase is skipped with a `defError`, the run continues). This is how a planner decides *at runtime* what work to spawn — the declarative answer to a code-mode `for` loop, with each generated plan checked before it spends a token. Security hardening for LLM-generated sub-flows: breadth caps (100 phases, 200 map items, 16 concurrency), `cwd` containment, budget clamped to `min(child, parent)`, nesting cap (5 levels), and prototype-pollution defense (deep-cloned, `__proto__`/`constructor`/`prototype` stripped). Pair it with `loop` for **data-dependent iterative replanning** (round N's plan depends on round N-1's result). See [`examples/dynamic-plan-execute.json`](./examples/dynamic-plan-execute.json) and [`examples/iterative-replan.json`](./examples/iterative-replan.json).
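
The `retry` policy in the list above (`{ "max": 2, "backoffMs": 500, "factor": 2 }`) can be read as a delay schedule. The sketch below is one plausible interpretation, not the runtime's confirmed formula: it assumes the delay before retry *n* is `backoffMs * factor^(n-1)`, that a missing `factor` means fixed backoff, and that a missing `backoffMs` means no delay. `RetryPolicy` and `retryDelays` are hypothetical names.

```typescript
// Hypothetical sketch: expand a retry policy into per-retry delays (ms).
interface RetryPolicy {
  max: number;
  backoffMs?: number;
  factor?: number;
}

function retryDelays(policy: RetryPolicy): number[] {
  const base = policy.backoffMs ?? 0; // assumed default: no delay
  const factor = policy.factor ?? 1; // assumed default: fixed backoff
  const delays: number[] = [];
  for (let retry = 0; retry < policy.max; retry++) {
    delays.push(base * Math.pow(factor, retry));
  }
  return delays;
}

// Under these assumptions the README's example policy yields [500, 1000].
const exampleDelays = retryDelays({ max: 2, backoffMs: 500, factor: 2 });
```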
|
|
309
402
|
|
|
310
403
|
### Loop-until-done (`loop`)
|
|
311
404
|
|
|
@@ -348,8 +441,6 @@ For open-ended work, the best result often comes from generating several candida
|
|
|
348
441
|
- **Judge** — after the fan-out, one judge agent sees every variant (numbered) plus your `judge` rubric and picks a winner via a `WINNER: <n>` line or `{"winner": n}`. An unreadable verdict **fails open** to variant 1; a failed judge falls back too — the work is never lost.
|
|
349
442
|
- **`mode`** — `best` returns the winning variant **verbatim**; `aggregate` returns the judge's **synthesized** answer combining the strongest parts.
|
|
350
443
|
- **Short-circuits:** if only one competitor survives, it wins with no judge call; if all fail, the phase fails. The TUI shows `⚑ N→#k`; usage sums variants + judge. Like `gate`, it's **excluded from `cross-run` cache**.
|
|
351
|
-
- **`budget`** — a run-wide `{maxUSD, maxTokens}` ceiling; once exceeded, pending phases skip and in-flight fan-out stops spawning, ending the run as `blocked`.
|
|
352
|
-
- **idle watchdog** — a subagent that goes silent for 5 minutes is treated as wedged and killed (SIGTERM → SIGKILL), so one hung child can never freeze the whole flow.
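
The judge rule in the tournament list above (a `WINNER: <n>` line or `{"winner": n}`, with an unreadable verdict failing open to variant 1) could be parsed roughly as follows. This is a sketch under stated assumptions, not the runtime's parser: `parseWinner` is a hypothetical name, and treating an out-of-range number as unreadable is my own choice.

```typescript
// Hypothetical sketch: extract the winning variant (1-based) from a judge verdict.
function parseWinner(verdict: string, variantCount: number): number {
  let candidate: number | undefined;
  const line = /^\s*WINNER:\s*(\d+)\s*$/m.exec(verdict);
  if (line !== null) {
    candidate = Number(line[1]);
  } else {
    try {
      const parsed: unknown = JSON.parse(verdict);
      if (typeof parsed === "object" && parsed !== null) {
        const winner = (parsed as { winner?: unknown }).winner;
        if (typeof winner === "number") candidate = winner;
      }
    } catch (_err) {
      // Not JSON either; fall through to the fail-open default.
    }
  }
  if (
    candidate !== undefined &&
    Math.floor(candidate) === candidate &&
    candidate >= 1 &&
    candidate <= variantCount
  ) {
    return candidate;
  }
  return 1; // fail open: the first variant wins, so the work is never lost
}
```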
|
|
353
444
|
|
|
354
445
|
### Cross-run memoization (`cache`)
|
|
355
446
|
|
|
@@ -371,7 +462,7 @@ Every phase is already content-addressed: within a single run's **resume**, a ph
|
|
|
371
462
|
- **`scope`** — `"run-only"` (default) is exactly the historical behavior (within-run resume only). `"cross-run"` opts the phase into the persistent store. `"off"` disables reuse entirely (even within a run), for debugging.
|
|
372
463
|
- **Freshness is the whole game.** The cache key already includes the prompt, the `over` items, and any `context` files (pre-read into the task). `fingerprint` folds *implicit* inputs into the key so "the world changed" becomes a cache miss: `git:HEAD`, `glob:<pat>` (size+mtime), `glob!:<pat>` (content hash), `file:<path>`, `env:<NAME>`. `ttl` (`30m`/`6h`/`7d`) is a time backstop.
|
|
373
464
|
- **Honest limit:** a subagent that reads a file it didn't declare in `context`/`fingerprint` can still serve a stale `cross-run` hit. That's why the default is `run-only` and why `gate`/`approval` phases are **forbidden** from `cross-run` (they must produce a fresh result each run). Opt in only for phases whose output is a function of declared inputs.
|
|
374
|
-
- Cache lives in `.pi/taskflows/cache/` (gitignored). Clear it with `action: "cache-clear"
|
|
465
|
+
- Cache lives in `.pi/taskflows/cache/` (gitignored). Clear it with `action: "cache-clear"` on the tool. Full rationale: [`docs/internal/rfc-cross-run-memoization.md`](./docs/internal/rfc-cross-run-memoization.md).
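
For the `ttl` backstop described in the list above (`30m`/`6h`/`7d`), a small TypeScript sketch of the freshness check might look like this. Only the three units shown in the README are handled. `parseTtl` and `isFresh` are hypothetical names, and treating an unparseable `ttl` as stale is an assumption rather than documented behavior.

```typescript
// Hypothetical sketch: turn a ttl such as "30m", "6h", or "7d" into milliseconds.
function parseTtl(ttl: string): number | undefined {
  const match = /^(\d+)([mhd])$/.exec(ttl);
  if (match === null) return undefined;
  const amount = Number(match[1]);
  const unit = match[2];
  const perUnit = unit === "m" ? 60000 : unit === "h" ? 3600000 : 86400000;
  return amount * perUnit;
}

// A cached entry is fresh while its age is within the ttl.
function isFresh(createdAtMs: number, ttl: string, nowMs: number): boolean {
  const limit = parseTtl(ttl);
  if (limit === undefined) return false; // assumed: unknown ttl counts as stale
  return nowMs - createdAtMs <= limit;
}
```

A time limit only bounds staleness; the `fingerprint` inputs remain the primary way to turn a changed world into a cache miss.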
|
|
375
466
|
|
|
376
467
|
### Gate phases (quality control)
|
|
377
468
|
|
|
@@ -398,11 +489,16 @@ Review the audit below. If any endpoint is missing auth, end with
|
|
|
398
489
|
| `{steps.ID.json}` | prior output parsed as JSON (or `{steps.ID.json.field}`) |
|
|
399
490
|
| `{item}` / `{item.field}` | current item inside a `map` phase |
|
|
400
491
|
| `{previous.output}` | the immediately-upstream phase output |
|
|
492
|
+
| `{loop.iteration}` | current iteration number inside a `loop` phase |
|
|
493
|
+
| `{loop.lastOutput}` | previous iteration's output inside a `loop` phase |
|
|
494
|
+
| `{loop.maxIterations}` | the iteration cap inside a `loop` phase |
|
|
401
495
|
|
|
402
496
|
Condition grammar (for `when`): `== != < > <= >=`, `&& || !`, parentheses, quoted strings/numbers, and any `{...}` reference — e.g. `"when": "{steps.triage.json.route} == deep && {args.force} != true"`.
|
|
403
497
|
|
|
404
498
|
> Referencing `{steps.X}` that isn't declared in `dependsOn` is a **hard validation error** — the runtime catches the most common pipeline bug before a single agent runs.
|
|
405
499
|
|
|
500
|
+
> Unresolved interpolation refs (e.g. `{args.typo}` or a step missing from `dependsOn`) are surfaced as **phase warnings** (`PhaseState.warnings`) in the run record and `/tf runs`, so a placeholder is never left intact silently.
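
The warning behavior can be sketched as an interpolation pass that records every reference it cannot resolve. This is a simplified illustration under stated assumptions: `interpolate`, `lookup`, `Scope`, and `Interpolated` are hypothetical names, the warning text is invented, and leaving the unresolved placeholder in the output is an assumption about the runtime rather than something the README confirms.

```typescript
// Hypothetical shapes for illustration; the real runtime types are not shown here.
type Scope = Record<string, unknown>;

interface Interpolated {
  text: string;
  warnings: string[];
}

// Walk a dotted path such as "steps.triage.json.route" through nested objects.
function lookup(scope: Scope, path: string): unknown {
  let current: unknown = scope;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object" || !(key in current)) {
      return undefined;
    }
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Replace each {dotted.ref}; collect a warning for every ref that does not resolve.
function interpolate(template: string, scope: Scope): Interpolated {
  const warnings: string[] = [];
  const text = template.replace(
    /\{([A-Za-z_][\w.]*)\}/g,
    (placeholder: string, path: string) => {
      const value = lookup(scope, path);
      if (value === undefined) {
        warnings.push(`unresolved reference ${placeholder}`);
        return placeholder; // assumption: kept in the text, but now reported
      }
      return typeof value === "string" ? value : JSON.stringify(value);
    },
  );
  return { text, warnings };
}
```

Returning the warnings next to the text, rather than throwing, matches the idea of a phase that still runs while its record carries the diagnostics.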
|
|
501
|
+
|
|
406
502
|
## Commands
|
|
407
503
|
|
|
408
504
|
Saved flows become CLI shortcuts. All commands run in the Pi session:
|
|
@@ -412,12 +508,30 @@ Saved flows become CLI shortcuts. All commands run in the Pi session:
|
|
|
412
508
|
| `/tf list` | List all saved flows |
|
|
413
509
|
| `/tf run <name> [args]` | Run a saved flow (e.g. `/tf run summarize-files dir=src`) |
|
|
414
510
|
| `/tf show <name>` | Print a flow's definition |
|
|
415
|
-
| `/tf runs` | Browse recent run history (interactive TUI) |
|
|
511
|
+
| `/tf runs` | Browse recent run history (interactive TUI; **auto-refreshes live** while any run is active) |
|
|
416
512
|
| `/tf resume <runId>` | Continue a paused/failed run — cached phases skip automatically |
|
|
417
513
|
| `/tf init` | **Interactively map model roles** to your enabled models (writes `~/.pi/agent/settings.json`) |
|
|
418
514
|
| `/tf:<name> [args]` | Shortcut — runs the flow in one tap |
|
|
419
515
|
|
|
420
|
-
Tool actions (used by the model): `run` (inline `define` or saved `name`), `save`, `resume`, `list`, `init`.
|
|
516
|
+
Tool actions (used by the model): `run` (inline `define` or saved `name`), `save`, `resume`, `list`, `agents`, `init`, `verify`, `cache-clear`.
|
|
517
|
+
|
|
518
|
+
## Background (detached) execution
|
|
519
|
+
|
|
520
|
+
Pass `detach: true` to run a taskflow in a detached child process — the tool returns immediately with the `runId` and the flow continues running even if the host session exits:
|
|
521
|
+
|
|
522
|
+
```jsonc
|
|
523
|
+
{
|
|
524
|
+
"action": "run",
|
|
525
|
+
"name": "nightly-audit",
|
|
526
|
+
"detach": true
|
|
527
|
+
}
|
|
528
|
+
```
|
|
529
|
+
|
|
530
|
+
- The child process reads serialized context, calls the orchestration engine, and persists terminal state to the store.
|
|
531
|
+
- Status is polled via `/tf runs` (which now **auto-refreshes live** while any run is active) or `action: "resume"`.
|
|
532
|
+
- Stale PIDs are detected with a signal-0 probe (a liveness check that delivers no signal), and the idle watchdog kills stalled children.
|
|
533
|
+
- **Approval phases auto-reject** in detached mode — human gates are never silently bypassed.
|
|
534
|
+
- `resume` works normally after a detached run completes or fails.
|
|
421
535
|
|
|
422
536
|
## Resume across sessions
|
|
423
537
|
|
|
@@ -432,12 +546,13 @@ Resume is keyed on each phase's input hash — if an upstream output changed, de
|
|
|
432
546
|
## Storage
|
|
433
547
|
|
|
434
548
|
```
|
|
435
|
-
.pi/taskflows/<name>.json
|
|
436
|
-
~/.pi/agent/taskflows/<name>.json
|
|
549
|
+
.pi/taskflows/<name>.json # project-scope definitions (commit to share)
|
|
550
|
+
~/.pi/agent/taskflows/<name>.json # user-scope definitions
|
|
437
551
|
.pi/taskflows/runs/<flowName>/<runId>.json # run state for resume (gitignore this)
|
|
552
|
+
.pi/taskflows/cache/ # cross-run memoization cache (gitignored)
|
|
438
553
|
```
|
|
439
554
|
|
|
440
|
-
> Commit `.pi/taskflows/` and your whole team shares the pipelines — no config sync, no onboarding doc. Run state is written atomically and guarded by a zero-dependency file lock, so concurrent runs never corrupt the index.
|
|
555
|
+
> Commit `.pi/taskflows/` and your whole team shares the pipelines — no config sync, no onboarding doc. Run state is written atomically via `writeFileAtomic()` (temp file + `renameSync`) and guarded by a zero-dependency file lock (`O_CREAT|O_EXCL` with stale-lock steal via atomic rename), so concurrent runs never corrupt the index.
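
The two primitives named above (temp file plus rename, and an exclusive-create lock) can be sketched with Node built-ins only. This is a minimal sketch, not the package's code: `writeFileAtomicSketch`, `tryLock`, and `unlock` are hypothetical names, the temp-file naming scheme is an assumption, and stale-lock stealing is deliberately left out.

```typescript
import * as fs from "node:fs";

// Write to a sibling temp file, then rename it over the target. A rename within
// one filesystem replaces the target in a single step, so readers see either the
// old file or the new one, never a partial write.
function writeFileAtomicSketch(target: string, data: string): void {
  const suffix = Math.random().toString(36).slice(2);
  const tmp = `${target}.${suffix}.tmp`; // assumption: naming scheme is illustrative
  fs.writeFileSync(tmp, data);
  fs.renameSync(tmp, target);
}

// Try to take the lock. The "wx" flag creates the file and fails with EEXIST
// if it already exists, so only one caller can win.
function tryLock(lockPath: string): boolean {
  try {
    const fd = fs.openSync(lockPath, "wx");
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return false;
    throw err; // any other failure is a real error, not contention
  }
}

// Release the lock; `force` makes a missing lock file a no-op.
function unlock(lockPath: string): void {
  fs.rmSync(lockPath, { force: true });
}
```

A caller would take the lock, call `writeFileAtomicSketch`, and release the lock in a `finally` block so a thrown error does not leave the lock file behind.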
|
|
441
556
|
|
|
442
557
|
Agent discovery scope (via `agentScope` in the flow definition):
|
|
443
558
|
|
|
@@ -447,6 +562,8 @@ Agent discovery scope (via `agentScope` in the flow definition):
|
|
|
447
562
|
| `"project"` | `.pi/agents/*.md` (walks up the tree) |
|
|
448
563
|
| `"both"` | user + project; project wins on name collision |
|
|
449
564
|
|
|
565
|
+
Run cleanup is configurable via `maxKeptRuns` and `maxRunAgeDays` in settings.
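
One way the two settings could combine is sketched below. The README names only `maxKeptRuns` and `maxRunAgeDays`; everything else here is an assumption for illustration, including the `RunRecord` shape, the `selectRunsToDelete` name, and the rule that a run is pruned when it violates either limit.

```typescript
// Hypothetical shapes; only the two setting names come from the README.
interface RunRecord {
  runId: string;
  finishedAtMs: number;
}

interface CleanupSettings {
  maxKeptRuns?: number;
  maxRunAgeDays?: number;
}

const DAY_MS = 86_400_000;

// Return the ids of runs to delete, newest runs considered first.
// Assumption: exceeding either limit is enough to prune a run.
function selectRunsToDelete(
  runs: RunRecord[],
  settings: CleanupSettings,
  nowMs: number,
): string[] {
  const newestFirst = [...runs].sort((a, b) => b.finishedAtMs - a.finishedAtMs);
  const doomed: string[] = [];
  newestFirst.forEach((run, index) => {
    const overCount =
      settings.maxKeptRuns !== undefined && index >= settings.maxKeptRuns;
    const tooOld =
      settings.maxRunAgeDays !== undefined &&
      nowMs - run.finishedAtMs > settings.maxRunAgeDays * DAY_MS;
    if (overCount || tooOld) doomed.push(run.runId);
  });
  return doomed;
}
```

Keeping the selection pure (no file deletion inside) separates the policy from the side effect, so the policy can be checked without touching the run store.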
|
|
566
|
+
|
|
450
567
|
## Agents
|
|
451
568
|
|
|
452
569
|
Taskflow ships **18 built-in agents** — each a `.md` file with a tuned system prompt, thinking level, and tool set. You can reference them by `name` in any phase or shorthand, right after install. No setup required.
|
|
@@ -601,6 +718,8 @@ Ready-to-read definitions in [`examples/`](./examples):
|
|
|
601
718
|
| [`summarize-files.json`](./examples/summarize-files.json) | discover → `map` fan-out → `reduce` |
|
|
602
719
|
| [`conditional-research.json`](./examples/conditional-research.json) | `when` routing + `join: any` + `gate` + `budget` |
|
|
603
720
|
| [`guarded-refactor.json`](./examples/guarded-refactor.json) | `approval` (human-in-the-loop) + `retry` + `gate` |
|
|
721
|
+
| [`dynamic-plan-execute.json`](./examples/dynamic-plan-execute.json) | `flow { def }` — plan then execute at runtime |
|
|
722
|
+
| [`iterative-replan.json`](./examples/iterative-replan.json) | `loop` + `flow { def }` — iterative replanning |
|
|
604
723
|
|
|
605
724
|
Copy one into `.pi/taskflows/<name>.json` (or `~/.pi/agent/taskflows/`) and it registers as `/tf:<name>` — or just point the model at it.
|
|
606
725
|
|
|
@@ -608,13 +727,13 @@ Copy one into `.pi/taskflows/<name>.json` (or `~/.pi/agent/taskflows/`) and it r
|
|
|
608
727
|
|
|
609
728
|
<div align="center">
|
|
610
729
|
|
|
611
|
-
**0 runtime dependencies** · **
|
|
730
|
+
**0 runtime dependencies** · **670 tests** · **9 phase types** · **shared context tree** · **cross-session resume** · **cross-run memoization** · **detached execution** · **~9k LOC runtime**
|
|
612
731
|
|
|
613
732
|
</div>
|
|
614
733
|
|
|
615
734
|
- **Zero runtime dependencies.** No `dependencies` field — the runtime is built entirely on Node built-ins (`fs` / `path` / `os` / `child_process` / `crypto`). The file lock is `fs.openSync("wx")`, not a third-party library.
|
|
616
|
-
- **
|
|
617
|
-
- **Hardened by design.** Path-traversal defense (lexical + `realpath`), runId validation, HTML/error sanitization, atomic writes, stale-lock stealing via `rename`, and an idle watchdog that kills wedged subagents.
|
|
735
|
+
- **670 tests across 33 test files** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, the shared context tree (blackboard reuse, supervision spawn, subflow validation/nesting), workspace isolation (temp/dedicated/worktree lifecycle, fail-open degrade, dynamic-flow rejection), dynamic sub-flow security hardening, detached execution (PID persistence, stale detection, crash→failed, resume after failure), live run-history refresh, callback isolation, the idle watchdog, model-role init config, parseModelFromLabel with parenthesized-model-name regression, and multi-fence `safeParse` recovery.
|
|
736
|
+
- **Hardened by design.** Path-traversal defense (lexical + `realpath` containment check), runId validation, HTML/error sanitization, atomic writes, stale-lock stealing via `rename`, and an idle watchdog that kills wedged subagents (SIGTERM → SIGKILL after 5 minutes of silence). Dynamic sub-flows additionally get breadth caps, `cwd` containment, budget clamping, nesting depth caps, and prototype-pollution defense.
|
|
618
737
|
- **Dogfooded.** Every new feature has to survive the project's own `self-improve` taskflow before it ships.
|
|
619
738
|
|
|
620
739
|
## 🍽️ We eat our own dog food
|
|
@@ -625,41 +744,50 @@ Our `self-improve` flow is a 10-phase DAG — it audits the codebase, patches de
|
|
|
625
744
|
|
|
626
745
|
| Campaign | Scale | Phases | Outcome |
|
|
627
746
|
|----------|-------|--------|---------|
|
|
628
|
-
| [v0.0.8 dogfood](./docs/dogfooding-v0.0.8-report.md) | Full codebase audit → triage → fix → verify | 10 phases, 234 tests | 13 fixes, all pass |
|
|
629
|
-
| [v0.0.6 self-audit](./docs/self-audit-report.md) | inventory → map audit → gate → approval → map fix → reduce | 9 phases | 11 critical defects fixed |
|
|
630
|
-
| [Cross-run cache dogfood](./docs/rfc-cross-run-memoization.md) | Real runtime + on-disk store | Dedicated test harness | Cache correctness under adversarial fingerprints |
|
|
631
|
-
| [Adversarial cross-review](./docs/brainstorm-adversarial-review-report.md) | Multi-agent adversarial review | `tournament` + `gate` | P0 cache-key fix shipped |
|
|
632
|
-
| [Init redesign review](./docs/issue-necessity-review-report.md) | Necessity audit → parallel checks → verdict | 7 phases | Full redesign plan validated |
|
|
633
|
-
| [Round 2 adversarial audit](./docs/internal/dogfooding-report.md) |
|
|
747
|
+
| [v0.0.8 dogfood](./docs/internal/dogfooding-v0.0.8-report.md) | Full codebase audit → triage → fix → verify | 10 phases, 234 tests | 13 fixes, all pass |
|
|
748
|
+
| [v0.0.6 self-audit](./docs/internal/self-audit-report.md) | inventory → map audit → gate → approval → map fix → reduce | 9 phases | 11 critical defects fixed |
|
|
749
|
+
| [Cross-run cache dogfood](./docs/internal/rfc-cross-run-memoization.md) | Real runtime + on-disk store | Dedicated test harness | Cache correctness under adversarial fingerprints |
|
|
750
|
+
| [Adversarial cross-review](./docs/internal/brainstorm-adversarial-review-report.md) | Multi-agent adversarial review | `tournament` + `gate` | P0 cache-key fix shipped |
|
|
751
|
+
| [Init redesign review](./docs/internal/issue-necessity-review-report.md) | Necessity audit → parallel checks → verdict | 7 phases | Full redesign plan validated |
|
|
752
|
+
| [Round 2 adversarial audit](./docs/internal/dogfooding-report.md) | Integration layer + cross-module — 12 findings across runner/runtime/interpolate/verify | 14 phases | 10 fixes applied, 0 regressions |
|
|
634
753
|
| [Round 3 adversarial audit](./docs/internal/dogfooding-report.md) | Integration layer + cross-module — 10 findings across index/agents/cache/render/runs-view | 9 phases | 10 fixes applied, 0 regressions |
|
|
754
|
+
| [v0.0.23 Shared Context Tree](./docs/internal/dogfooding-report.md) | End-to-end validation: org-tree spawn, 5-way audit via loop+gate | 6 e2e runs | Spawn-drain bug fixed, 50 new tests |
|
|
635
755
|
|
|
636
756
|
> **Meta:** we used `pi-taskflow`'s `map` fan-out, `gate` verdicts, `approval` human-in-the-loop, `tournament` best-of-N, `loop` until-done, and `cross-run` cache — to build `pi-taskflow`.
|
|
637
757
|
|
|
638
758
|
## Status & limits
|
|
639
759
|
|
|
640
|
-
**v0.0.
|
|
760
|
+
**v0.0.23** — **Shared Context Tree**: opt-in (`shareContext` / `contextSharing`) blackboard + supervision tools (`ctx_read`/`ctx_write` horizontal reuse, `ctx_report`/`ctx_spawn` vertical supervision); `ctx_spawn` accepts a flat task **or** a dependency-bearing `subflow` (a runtime-validated nested DAG), depth-capped on a unified nesting counter with budget accounting. **Workspace isolation**: a phase's `cwd` accepts reserved keywords `temp`/`dedicated`/`worktree` — the runtime allocates an isolated dir (or a git worktree on a throwaway branch) and tears it down after the phase, fail-open, rejected in LLM-authored sub-flows. Prior: loop-until-done (`loop`), tournament (best-of-N with a judge), cross-run memoization (content-addressed cache with git/file/glob/env fingerprints and TTL), interactive `/tf init`, configurable built-in agents, 18 built-in agents with 6 model roles. Full control-flow & reliability layer (`when` guards, `join: any`, `retry`/backoff, `approval`, `flow` composition, `budget` caps, `onBlock: "retry"`, `eval` machine gates, idle watchdog) on top of the DSL + DAG runtime (`agent`/`parallel`/`map`/`gate`/`reduce`). Inline + saved flows, cross-session resume, live progress, and isolated context. A run executes as one streaming tool call.
|
|
641
761
|
|
|
642
762
|
Known boundaries (tracked, bounded — no surprises mid-flow):
|
|
643
763
|
|
|
644
|
-
- **
|
|
764
|
+
- **Shared context is opt-in.** Subagents share nothing unless a phase sets `shareContext` (or the flow sets `contextSharing`). The blackboard is per-run, file-based, size-bounded, and cleaned up with the run. Spawn nesting is capped at `MAX_DYNAMIC_NESTING` (5). A spawned flat task is not individually checkpointed — on crash it re-runs on resume (spawned *subflows* resume their completed inner phases via the cache).
|
|
765
|
+
- **Workspace isolation is fail-open.** `cwd: "worktree"` requires the base cwd to be a git work tree; otherwise it degrades to a `temp` dir (with a warning). `temp`/`worktree` dirs are removed when the phase ends; a hard crash mid-phase may leave a stray dir (a stray `dedicated` dir is cleaned on the next run; stray `temp`/`worktree` dirs live under the OS tmpdir). The reserved keywords are honoured only in author-written flows.
|
|
645
766
|
- **No `output: "file"`.** Outputs are text/JSON only — write files via an agent's `write` tool call.
|
|
646
767
|
- **`map` requires a JSON array.** The `over` field must resolve to a `{steps.ID.json}` array. Wrap a text list in a single-agent `output: "json"` phase first.
|
|
647
768
|
- **The DAG must be acyclic.** Cycles are rejected at validation.
|
|
769
|
+
- **Cross-run cache excludes `gate`, `approval`, `loop`, and `tournament`.** These must produce a fresh result each run.
|
|
770
|
+
- **Approval auto-rejects in detached mode.** This is a safety invariant — approval gates are never silently bypassed.
|
|
648
771
|
|
|
649
772
|
## Development
|
|
650
773
|
|
|
651
774
|
```bash
|
|
652
775
|
npm install
|
|
653
|
-
npm run typecheck
|
|
654
|
-
npm test
|
|
655
|
-
npm run test:e2e
|
|
776
|
+
npm run typecheck # tsc --noEmit
|
|
777
|
+
npm test # unit tests — no network, no process spawning
|
|
778
|
+
npm run test:e2e # real end-to-end (spawns live subagents; needs model access)
|
|
779
|
+
npm run test:e2e-context
|
|
780
|
+
npm run test:e2e-context-value
|
|
781
|
+
npm run test:e2e-team
|
|
782
|
+
npm run test:e2e-spawn-subflow
|
|
783
|
+
npm run test:dogfood-cache
|
|
656
784
|
```
|
|
657
785
|
|
|
658
786
|
Runtime lives in `extensions/`, tests in `test/`, and runnable examples in `examples/`.
|
|
659
787
|
|
|
660
788
|
## Contributing
|
|
661
789
|
|
|
662
|
-
Contributions welcome — this is a young, fast-moving project. Open an issue or PR on [GitHub](https://github.com/heggria/pi-taskflow). Good first contributions: new example flows, phase-type ideas, and TUI polish.
|
|
790
|
+
Contributions welcome — this is a young, fast-moving project. Open an issue or PR on [GitHub](https://github.com/heggria/pi-taskflow). Good first contributions: new example flows, phase-type ideas, and TUI polish. See [`CONTRIBUTING.md`](./CONTRIBUTING.md) and [`AGENTS.md`](./AGENTS.md) for coding conventions and common task recipes.
|
|
663
791
|
|
|
664
792
|
## License
|
|
665
793
|
|