npm - pi-taskflow - Versions diffs - 0.0.11 → 0.0.13 - Mend

pi-taskflow 0.0.11 → 0.0.13

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (28)

package/README.md +274 -6
package/extensions/agents/analyst.md +30 -0
package/extensions/agents/critic.md +31 -0
package/extensions/agents/doc-writer.md +43 -0
package/extensions/agents/executor-code.md +36 -0
package/extensions/agents/executor-fast.md +26 -0
package/extensions/agents/executor-ui.md +35 -0
package/extensions/agents/executor.md +29 -0
package/extensions/agents/final-arbiter.md +29 -0
package/extensions/agents/plan-arbiter.md +35 -0
package/extensions/agents/planner.md +30 -0
package/extensions/agents/recover.md +28 -0
package/extensions/agents/reviewer.md +37 -0
package/extensions/agents/risk-reviewer.md +37 -0
package/extensions/agents/scout.md +51 -0
package/extensions/agents/security-reviewer.md +39 -0
package/extensions/agents/test-engineer.md +31 -0
package/extensions/agents/verifier.md +29 -0
package/extensions/agents/visual-explorer.md +32 -0
package/extensions/agents.ts +33 -2
package/extensions/cache.ts +263 -0
package/extensions/index.ts +201 -10
package/extensions/init.ts +607 -0
package/extensions/render.ts +39 -0
package/extensions/runtime.ts +342 -17
package/extensions/schema.ts +166 -1
package/extensions/store.ts +16 -2
package/package.json +4 -3

package/README.md CHANGED

@@ -7,7 +7,7 @@
   <a href="https://www.npmjs.com/package/pi-taskflow"><img src="https://img.shields.io/npm/dm/pi-taskflow?style=flat-square&color=6E8BFF&label=downloads" alt="npm downloads"></a>
   <a href="./LICENSE"><img src="https://img.shields.io/badge/license-MIT-43D9AD?style=flat-square" alt="MIT license"></a>
   <a href="#whats-inside"><img src="https://img.shields.io/badge/runtime%20deps-0-43D9AD?style=flat-square" alt="zero runtime dependencies"></a>
-  <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-265-6E8BFF?style=flat-square" alt="265 tests"></a>
+  <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-371-6E8BFF?style=flat-square" alt="371 tests"></a>
   <a href="https://pi.dev"><img src="https://img.shields.io/badge/for-Pi%20coding%20agent-B692FF?style=flat-square" alt="for the Pi coding agent"></a>
 </p>
@@ -34,6 +34,10 @@ Here's the wall you hit with raw subagents: you describe a multi-step plan in pr
 `pi-taskflow` moves the plan **out of the prompt and into a declarative definition.** The runtime owns the DAG, the loops, the retries, and the intermediate state. You declare a pipeline once and run it a hundred times — by name.
+<div align="center">
+<img src="./assets/context-isolation.png" alt="With raw subagents every transcript floods your context; with pi-taskflow transcripts stay in the runtime and only the final result returns" width="900">
+</div>
 > When a job needs twelve steps with branching fan-out and a review gate, you want orchestration — not lucky prompting.
 | | subagent (built-in) | **pi-taskflow** |
@@ -55,6 +59,36 @@ Here's the wall you hit with raw subagents: you describe a multi-step plan in pr
 It doesn't replace the subagent tool. It gives your subagents a DAG, a memory, and a name.
+## Compared to other Pi extensions
+The Pi ecosystem now has **20+ delegation, workflow, and orchestration extensions** — each great at what it's for. Here's an honest map of where `pi-taskflow` sits (verified against each package's latest npm release, June 2026). For the full breakdown — every package, strengths *and* weaknesses — see [`PI-ECOSYSTEM.md`](./PI-ECOSYSTEM.md). For the broader, non-Pi landscape (LangGraph, Temporal, CrewAI, Mastra…) see [`COMPETITORS.md`](./COMPETITORS.md).
+| Extension | Model | Custom DSL | DAG | Dynamic fan-out | Cross-session resume | Quality gate | Human approval | Save as command | Zero deps |
+|---|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| **pi-taskflow** | **declarative multi-phase taskflows** | **✓** | **✓** | **✓ `map`** | **✓ phase-hash** | **✓** | **✓** | **✓ `/tf:<name>`** | **✓** |
+| [`@pi-agents/orchid`](https://www.npmjs.com/package/@pi-agents/orchid) | opinionated 9-phase pipeline + Ralph loop | fixed | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✕ (2) |
+| [`pi-crew`](https://www.npmjs.com/package/pi-crew) | role teams + git worktrees + async | partial | ✓ | ✓ | ✓ | ✓ | ✓ | – | ✕ (7) |
+| [`ultimate-pi`](https://www.npmjs.com/package/ultimate-pi) | governed plan→execute→review harness | YAML contracts | ✓ (plan-time) | ✕ | ✓ | ✓ (3-tier) | ✓ | ✓ | ✕ (16) |
+| [`@zhushanwen/pi-workflow`](https://www.npmjs.com/package/@zhushanwen/pi-workflow) | JS scripts (`agent`/`parallel`/`pipeline`) | yes (JS) | ✕ (linear) | ✓ | ✓ | ✕ | ✕ | ✓ (call cache) | ✓ |
+| [`@fiale-plus/pi-rogue-orchestration`](https://www.npmjs.com/package/@fiale-plus/pi-rogue-orchestration) | timer loop + goal resolution | ✕ | ✕ | ✕ | ✓ | ✓ (goal-check) | ✕ | ✕ | ✓ |
+| [`pi-subagents`](https://www.npmjs.com/package/pi-subagents) | single / parallel / chain delegation | ✕ | ✕ | static | – | ✕ | clarify | named workflows | ✕ (3) |
+| [`@gotgenes/pi-subagents`](https://www.npmjs.com/package/@gotgenes/pi-subagents) | Claude-Code-style subagents + worktrees | ✕ | ✕ | ✕ | ✓ (by id) | ✕ | per-agent | ✕ | ✕ (1) |
+| [`pi-pipeline`](https://www.npmjs.com/package/pi-pipeline) | fixed SPEC→PLAN→TASKS→VERIFY | ✕ | fixed | ✕ | session planning | ✓ | clarify | ✕ | ✕ (2) |
+| [`pi-agent-flow`](https://www.npmjs.com/package/pi-agent-flow) | one-shot parallel specialist `fork` | yes | ✕ | ✕ | – | ✕ | ✕ | – | ✕ (2) |
+*(Representative slice of the 20+ — see [`PI-ECOSYSTEM.md`](./PI-ECOSYSTEM.md) for all of them, plus `@0xkobold/pi-orchestration`, `@melihmucuk/pi-crew`, `@mediadatafusion/pi-workflow-suite`, `gentle-pi`, `@dreki-gg/pi-subagent`, and more.)*
+**How to choose:**
+- **`@pi-agents/orchid`** is the most feature-complete orchestrator in the ecosystem (DAG + worktrees + Ralph loop + agent mailbox) — but its DSL is a *fixed* 9-phase pipeline, it carries runtime deps + jiti, and it's beta. Reach for `pi-taskflow` when you want to **define your own graph** (not adopt an opinionated one) with **zero dependencies** and a one-command install.
+- **`pi-crew` / `ultimate-pi`** go heavier — worktree isolation, durable async teams, multi-tier governance. If you want lightweight, declarative, and zero-dependency, that's this project.
+- **`@zhushanwen/pi-workflow`** is the closest in spirit and also zero-dep, but you author workflows as **JavaScript scripts**. `pi-taskflow`'s **declarative JSON DSL** is safer and more auditable, and its **phase-level input-hash resume** is more granular than call-cache dedup.
+- **`@fiale-plus/pi-rogue-orchestration`** has a real **loop-until-done** (a feature `pi-taskflow` doesn't yet have). If your job is "keep going until the goal is met," it's worth a look; `pi-taskflow` is for *structured, branching* pipelines instead.
+- **`pi-subagents` / `@gotgenes/pi-subagents`** are the mature picks for ad-hoc "use reviewer on this diff" delegation and background jobs. `pi-taskflow` is for when those delegations need to become a *repeatable, resumable pipeline*.
+- **`pi-pipeline` / `pi-agent-flow`** ship *opinionated, fixed* flows. `pi-taskflow` ships an *empty canvas*: you (or the model) declare the graph that fits the job.
+> The honest one-liner: **`pi-taskflow` is the only Pi extension that gives you a declarative, resumable, DAG-shaped subagent pipeline you save as a one-word command — with zero runtime dependencies and context isolation by design.** The known gaps it's closing next: loop-until-done, worktree isolation, and non-blocking background runs (see [`STRATEGY.md`](./STRATEGY.md)).
 ## 30-second start
 **1. Install** — one command:
@@ -63,6 +97,10 @@ It doesn't replace the subagent tool. It gives your subagents a DAG, a memory, a
 pi install npm:pi-taskflow
 ```
+> **Optional:** run `/tf init` once to map the 18 built-in agents' model roles
+> (`fast`, `strong`, `thinker`, …) to your own models — an interactive picker.
+> Skip it and agents just use Pi's default model. See [Model roles](#model-roles).
 **2. Run** — just ask the model in a Pi session:
 > *Run a chain: first explore the auth flow, then summarize the findings.*
@@ -191,6 +229,8 @@ No scripting. No `eval`. Just data the runtime executes — safe enough to run L
 | `reduce` | aggregate `from[]` phase outputs into one | `from`, `task` |
 | `approval` | **human-in-the-loop** pause — approve / reject / edit | — |
 | `flow` | run a **saved sub-flow** as one phase (composition) | `use` |
+| `loop` | **iterate a task until done** — re-run a body until a condition, convergence, or a cap | `task`, `until` |
+| `tournament` | **N variants compete**, a judge picks the best (or aggregates) | `task` \| `branches` |
 ### Common phase fields
@@ -210,6 +250,7 @@ Every phase needs a unique `id` and a `type` (defaults to `agent`). On top of th
 | `final` | Marks the result-bearing phase (else the last phase wins) |
 | `optional` | A failure here does **not** abort the run |
 | `use` / `with` | (`flow`) saved sub-flow name + its args |
+| `cache` | `{ scope, ttl?, fingerprint? }` — cross-run memoization (see below) |
 Flow-level keys: `name`, `description`, `args`, `concurrency` (default 8), `agentScope`, and `budget: { maxUSD?, maxTokens? }`.
@@ -220,9 +261,73 @@ Flow-level keys: `name`, `description`, `args`, `concurrency` (default 8), `agen
 - **`retry`** — `{ "max": 2, "backoffMs": 500, "factor": 2 }` retries a failing subagent with fixed or exponential backoff; usage is summed and the attempt count shows as `↻N` in the TUI. Transient provider errors (rate-limit / 5xx / timeout) **auto-retry even without an explicit policy**; hard errors don't.
 - **`approval`** — pause for a human (Approve / Reject / Edit). Reject halts the flow; Edit injects the typed note as the phase output for downstream steps. Non-interactive runs auto-approve.
 - **`flow`** — `{ "type": "flow", "use": "deep-research", "with": { "topic": "{item}" } }` runs a saved flow as a phase (recursion is detected and rejected).
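The `retry` bullet above describes fixed or exponential backoff. Assuming the first retry waits `backoffMs` and each later retry multiplies that by `factor`, the example policy would wait 500 ms and then 1000 ms. The TypeScript below is a minimal sketch of such a policy, not pi-taskflow's code. The names `RetryPolicy`, `backoffDelay`, `isTransient`, and `withRetry` are hypothetical, and so are the error-matching regex and the default of two retries for transient errors when no policy is given.

```typescript
// Hypothetical model of the `retry` policy; field names mirror the README example.
interface RetryPolicy {
  max: number;        // retries allowed after the first attempt (assumed meaning)
  backoffMs?: number; // delay before the first retry
  factor?: number;    // 1 = fixed delay, >1 = exponential growth
}

// Delay before retry number `retry` (1-based): backoffMs * factor^(retry - 1).
function backoffDelay(policy: RetryPolicy, retry: number): number {
  return (policy.backoffMs ?? 0) * Math.pow(policy.factor ?? 1, retry - 1);
}

// Assumed classifier for "transient" provider errors (rate limit / 5xx / timeout).
function isTransient(message: string): boolean {
  return /rate.?limit|\b429\b|\b5\d\d\b|timeout|timed out/i.test(message);
}

async function withRetry<T>(
  run: () => Promise<T>,
  policy?: RetryPolicy,
): Promise<{ value: T; attempts: number }> {
  // With no explicit policy, only transient errors are retried (assumed default below).
  const effective: RetryPolicy = policy ?? { max: 2, backoffMs: 500, factor: 2 };
  for (let attempt = 1; ; attempt++) {
    try {
      return { value: await run(), attempts: attempt };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      const retryable = policy !== undefined || isTransient(message);
      // attempt > max means all `max` retries have already been used.
      if (!retryable || attempt > effective.max) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(effective, attempt)));
    }
  }
}
```

Under this reading, `max: 2` allows at most three attempts in total. A hard error with no policy is rethrown immediately.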
+### Loop-until-done (`loop`)
+Some work is inherently iterative — refine a draft until a reviewer is satisfied, retry-and-improve until tests pass, converge on an answer. A `loop` phase re-runs one task body until a stop condition holds:
+```jsonc
+{
+  "id": "refine",
+  "type": "loop",
+  "task": "Improve this draft (iteration {loop.iteration}). Previous attempt:\n{loop.lastOutput}\n\nReturn JSON {\"draft\":\"…\",\"done\":true|false}.",
+  "until": "{steps.refine.json.done} == true",   // the iteration's own output is exposed here
+  "output": "json",
+  "maxIterations": 6,        // default 10, hard cap 100 — the loop ALWAYS terminates
+  "convergence": true        // default: stop early if an iteration's output is identical to the last
+}
+```
+- **Body locals** — the task can read `{loop.iteration}` (1-based), `{loop.lastOutput}` (the prior iteration's output), and `{loop.maxIterations}` to build on its own previous work; all three are also available to the `until` condition.
+- **`until`** — evaluated after each iteration with the iteration's output exposed as `{steps.<thisId>.output}` / `.json`. Same operators as `when`. The loop stops the moment it's truthy.
+- **Always terminates.** Four independent stops: `until` truthy, **convergence** (a fixed point — output identical to the previous iteration), **`maxIterations`** (hard-capped at 100), or a **failing iteration** (the phase fails with the partial output preserved). A malformed `until` **stops** the loop rather than spinning forever (fail-safe) and surfaces a warning on the phase.
+- The TUI shows `↻N` with the stop reason (`done` / `converged` / `max` / `failed`); usage is summed across iterations. Like `gate`/`approval`, `loop` is **excluded from `cross-run` cache** (each run must iterate fresh).
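The four stop conditions of a `loop` phase fit into a single bounded loop. This is a simplified, synchronous TypeScript illustration, not the package's `runtime.ts`. `runLoop`, `body`, and `until` are hypothetical stand-ins for subagent execution and the `when`-style condition evaluator. The stop reason recorded for a malformed `until` is also a guess, because the README only says the loop stops and surfaces a warning.

```typescript
type StopReason = "done" | "converged" | "max" | "failed";

interface LoopResult {
  output: string;
  iterations: number;
  reason: StopReason;
  warning?: string;
}

// Hypothetical, synchronous model of a `loop` phase. `body` runs one iteration
// (it throws to signal failure); `until` throws if the condition is malformed.
function runLoop(
  body: (iteration: number, lastOutput: string) => string,
  until: (output: string) => boolean,
  opts: { maxIterations?: number; convergence?: boolean } = {},
): LoopResult {
  // Default 10, clamped to [1, 100] so the loop always terminates.
  const max = Math.min(Math.max(opts.maxIterations ?? 10, 1), 100);
  const convergence = opts.convergence ?? true;
  let last = "";
  for (let i = 1; i <= max; i++) {
    let output: string;
    try {
      output = body(i, last);
    } catch {
      // A failing iteration fails the phase but keeps the previous output.
      return { output: last, iterations: i, reason: "failed" };
    }
    try {
      if (until(output)) return { output, iterations: i, reason: "done" };
    } catch (err) {
      // Fail-safe: stop instead of spinning. The reason label here is an assumption.
      return { output, iterations: i, reason: "done", warning: String(err) };
    }
    if (convergence && i > 1 && output === last) {
      return { output, iterations: i, reason: "converged" };
    }
    last = output;
  }
  return { output: last, iterations: max, reason: "max" };
}
```

Because every path through the body either returns or advances `i` toward a cap of at most 100, termination does not depend on the model ever saying it is done.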
+### Tournament (`tournament`)
+For open-ended work, the best result often comes from generating several candidates and picking the strongest — best-of-N with a judge, in one declarative phase:
+```jsonc
+{
+  "id": "headline",
+  "type": "tournament",
+  "task": "Write a punchy headline for this launch post.",
+  "variants": 4,                    // spawn 4 competitors of the SAME task (default 3, max 20)
+  "judge": "Pick the headline with the strongest hook and clearest promise.",
+  "judgeAgent": "reviewer",          // optional; defaults to the phase agent
+  "mode": "best"                     // "best" (default) | "aggregate"
+}
+```
+- **Competitors** — either `variants: N` copies of one `task` (diversity comes from model nondeterminism), or distinct `branches: [{task, agent?}, …]` when you want to pit *different approaches* against each other.
+- **Judge** — after the fan-out, one judge agent sees every variant (numbered) plus your `judge` rubric and picks a winner via a `WINNER: <n>` line or `{"winner": n}`. An unreadable verdict **fails open** to variant 1; a failed judge falls back too — the work is never lost.
+- **`mode`** — `best` returns the winning variant **verbatim**; `aggregate` returns the judge's **synthesized** answer combining the strongest parts.
+- **Short-circuits:** if only one competitor survives, it wins with no judge call; if all fail, the phase fails. The TUI shows `⚑ N→#k`; usage sums variants + judge. Like `gate`, it's **excluded from `cross-run` cache**.
 - **`budget`** — a run-wide `{maxUSD, maxTokens}` ceiling; once exceeded, pending phases skip and in-flight fan-out stops spawning, ending the run as `blocked`.
 - **idle watchdog** — a subagent that goes silent for 5 minutes is treated as wedged and killed (SIGTERM → SIGKILL), so one hung child can never freeze the whole flow.
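The tournament judge step described earlier comes down to parsing a verdict and falling back safely. This minimal TypeScript sketch assumes variants are numbered 1..N over the surviving competitors only. `parseWinner` and `pickWinner` are hypothetical names. The sketch covers `mode: "best"` only; `aggregate` would return the judge's own text instead.

```typescript
// Returns a 1-based winner index; unreadable or out-of-range verdicts fail open to 1.
function parseWinner(verdict: string, count: number): number {
  const valid = (n: unknown): n is number =>
    typeof n === "number" && Number.isInteger(n) && n >= 1 && n <= count;

  // Form 1: a line such as "WINNER: 3".
  const line = /^\s*WINNER:\s*(\d+)\s*$/im.exec(verdict);
  if (line && valid(Number(line[1]))) return Number(line[1]);

  // Form 2: a flat JSON object such as {"winner": 3} anywhere in the text.
  for (const candidate of verdict.match(/\{[^{}]*\}/g) ?? []) {
    try {
      const parsed: unknown = JSON.parse(candidate);
      const winner = (parsed as { winner?: unknown } | null)?.winner;
      if (valid(winner)) return winner;
    } catch {
      // Not valid JSON; keep looking.
    }
  }
  return 1;
}

// `variants[i]` is null when that competitor failed; `judge` returns the verdict
// text, or null when the judge call itself failed.
function pickWinner(
  variants: (string | null)[],
  judge: (numbered: string) => string | null,
): string | null {
  const survivors = variants.filter((v): v is string => v !== null);
  if (survivors.length === 0) return null;         // all failed: the phase fails
  if (survivors.length === 1) return survivors[0]; // lone survivor, no judge call
  const numbered = survivors.map((v, i) => `Variant ${i + 1}:\n${v}`).join("\n\n");
  const verdict = judge(numbered);
  const winner = verdict === null ? 1 : parseWinner(verdict, survivors.length);
  return survivors[winner - 1]; // mode "best": the winner verbatim
}
```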
+### Cross-run memoization (`cache`)
+Every phase is already content-addressed: within a single run's **resume**, a phase whose resolved inputs are unchanged is skipped. `cache` extends that reuse **across independent runs** — if any prior run computed a phase with an identical input hash, its result is reused for **$0.00**.
+```jsonc
+{
+  "id": "analyze-auth",
+  "task": "Summarize how the auth module works.",
+  "context": ["src/auth/**/*.ts"],
+  "cache": {
+    "scope": "cross-run",                 // "run-only" (default) | "cross-run" | "off"
+    "ttl": "6h",                          // optional max age before a hit is treated as a miss
+    "fingerprint": ["git:HEAD", "glob:src/auth/**/*.ts"]  // fold world-state into the key
+  }
+}
+```
+- **`scope`** — `"run-only"` (default) is exactly the historical behavior (within-run resume only). `"cross-run"` opts the phase into the persistent store. `"off"` disables reuse entirely (even within a run), for debugging.
+- **Freshness is the whole game.** The cache key already includes the prompt, the `over` items, and any `context` files (pre-read into the task). `fingerprint` folds *implicit* inputs into the key so "the world changed" becomes a cache miss: `git:HEAD`, `glob:<pat>` (size+mtime), `glob!:<pat>` (content hash), `file:<path>`, `env:<NAME>`. `ttl` (`30m`/`6h`/`7d`) is a time backstop.
+- **Honest limit:** a subagent that reads a file it didn't declare in `context`/`fingerprint` can still serve a stale `cross-run` hit. That's why the default is `run-only` and why `gate`/`approval` phases are **forbidden** from `cross-run` (they must produce a fresh result each run). Opt in only for phases whose output is a function of declared inputs.
+- Cache lives in `.pi/taskflows/cache/` (gitignored). Clear it with `action: "cache-clear"`. Full rationale: [`docs/rfc-cross-run-memoization.md`](./docs/rfc-cross-run-memoization.md).
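One way to picture the content-addressed key and the `ttl` backstop is to hash a canonical serialization of every declared input, with fingerprint tokens already resolved to values. The TypeScript below is a sketch that assumes SHA-256 via Node's `node:crypto` and the field set shown in `KeyInputs`. The README does not specify the package's actual hash function, key layout, or how tokens such as `git:HEAD` are resolved, so treat `cacheKey`, `parseTtlMs`, and `isFresh` as hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical key inputs; fingerprints arrive already resolved, e.g.
// { "git:HEAD": "<commit sha>", "env:NODE_ENV": "production" }.
interface KeyInputs {
  flow: string;
  phaseId: string;
  prompt: string;        // resolved task text, with `context` files pre-read into it
  model?: string;
  thinking?: string;
  tools?: string[];
  overItems?: unknown[];
  fingerprints?: Record<string, string>;
}

// Sort object keys recursively so logically equal inputs serialize identically.
function canonical(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonical);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    return Object.fromEntries(Object.keys(obj).sort().map((k) => [k, canonical(obj[k])]));
  }
  return value;
}

function cacheKey(inputs: KeyInputs): string {
  return createHash("sha256").update(JSON.stringify(canonical(inputs))).digest("hex");
}

// "30m" / "6h" / "7d" -> milliseconds.
function parseTtlMs(ttl: string): number {
  const match = /^(\d+)([mhd])$/.exec(ttl.trim());
  if (!match) throw new Error(`invalid ttl: ${ttl}`);
  const unitMs = { m: 60_000, h: 3_600_000, d: 86_400_000 }[match[2] as "m" | "h" | "d"];
  return Number(match[1]) * unitMs;
}

// A stored entry older than its ttl is treated as a miss.
function isFresh(createdAtMs: number, ttl: string | undefined, nowMs = Date.now()): boolean {
  return ttl === undefined || nowMs - createdAtMs <= parseTtlMs(ttl);
}
```

Any change to a resolved fingerprint value, such as a new `git:HEAD` commit, produces a different key, which is how "the world changed" turns into a cache miss.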
 ### Gate phases (quality control)
 A `gate` runs an agent to review upstream output and can **block the rest of the workflow.** End the gate task by asking for a verdict the runtime can read:
@@ -264,9 +369,20 @@ Saved flows become CLI shortcuts. All commands run in the Pi session:
 | `/tf show <name>` | Print a flow's definition |
 | `/tf runs` | Browse recent run history (interactive TUI) |
 | `/tf resume <runId>` | Continue a paused/failed run — cached phases skip automatically |
+| `/tf init` | **Interactively map model roles** to your enabled models (writes `~/.pi/agent/settings.json`) |
 | `/tf:<name> [args]` | Shortcut — runs the flow in one tap |
-Tool actions (used by the model): `run` (inline `define` or saved `name`), `save`, `resume`, `list`.
+Tool actions (used by the model): `run` (inline `define` or saved `name`), `save`, `resume`, `list`, `init`.
+## Resume across sessions
+A taskflow run isn't tied to your session. Every completed phase is written to disk, so a run that fails (or that you stop) can be continued later with `/tf resume <runId>` — **cached phases skip automatically** and only the remaining work spends tokens.
+<div align="center">
+<img src="./assets/resume.png" alt="A run fails midway in session 1; in session 2 /tf resume skips the cached phases and only re-runs the failed phase and what follows" width="900">
+</div>
+Resume is keyed on each phase's input hash — if an upstream output changed, dependent phases re-run; if nothing changed, they're reused. No competing Pi extension does this across sessions.
 ## Storage
@@ -288,7 +404,159 @@ Agent discovery scope (via `agentScope` in the flow definition):
 ## Agents
-Taskflow reuses your existing Pi agent files (`~/.pi/agent/agents/*.md`, `.pi/agents/*.md`) — reference them by `name` in any phase or shorthand. The runtime extracts each agent's `systemPrompt` from its `.md` frontmatter and passes it via `--append-system-prompt`; phase-level `model` / `thinking` / `tools` overrides map to the matching subagent flags. Settings from `~/.pi/agent/settings.json` (`subagents.agentOverrides`) are honored across all flows.
+Taskflow ships **18 built-in agents** — each a `.md` file with a tuned system prompt, thinking level, and tool set. You can reference them by `name` in any phase or shorthand, right after install. No setup required.
+### Built-in agent roster
+| Agent | Role | Thinking | Default role |
+|---|---|---:|---|
+| `executor` | Implement planned code changes | high | `{{fast}}` |
+| `executor-fast` | Trivial fixes (≤2 files, ≤50 lines) | off | `{{fast}}` |
+| `executor-code` | Complex multi-file implementation | high | `{{strong}}` |
+| `executor-ui` | Frontend / styling / visual changes | high | `{{vision}}` |
+| `scout` | Fast codebase recon & file mapping | off | `{{fast}}` |
+| `planner` | Implementation plan creation | high | `{{strong}}` |
+| `analyst` | Requirements analysis, ambiguity detection | high | `{{thinker}}` |
+| `critic` | Inline self-doubt during reasoning | xhigh | `{{thinker}}` |
+| `reviewer` | General code / architecture review | high | `{{strong}}` |
+| `risk-reviewer` | Backend / infra / DB / API risk | high | `{{reasoner}}` |
+| `security-reviewer` | Security vulns, auth/crypto | xhigh | `{{reasoner}}` |
+| `plan-arbiter` | Plan quality gate (complex tasks) | high | `{{arbiter}}` |
+| `final-arbiter` | Tiebreaker when critics disagree | xhigh | `{{arbiter}}` |
+| `test-engineer` | Design & implement tests | high | `{{fast}}` |
+| `doc-writer` | Documentation authoring | off | `{{fast}}` |
+| `recover` | Session recovery after compaction | low | `{{fast}}` |
+| `verifier` | Run tests, validate outcomes | off | `{{fast}}` |
+| `visual-explorer` | Figma design metadata analysis | high | `{{vision}}` |
+Agents are layered: **built-in → user (`~/.pi/agent/agents/`) → project (`.pi/agents/`)**. A user or project agent with the same `name` overrides the built-in — so you can customize any agent without touching the package.
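The layering rule amounts to a last-writer-wins merge keyed by `name`. This minimal TypeScript sketch assumes that a same-named agent replaces the earlier definition wholesale rather than merging individual fields; the README does not say which. `AgentDef` and `layerAgents` are hypothetical.

```typescript
interface AgentDef {
  name: string;
  source: "builtin" | "user" | "project";
  model?: string;
  systemPrompt?: string;
}

// Later layers win: built-in, then user (~/.pi/agent/agents/), then project (.pi/agents/).
function layerAgents(builtin: AgentDef[], user: AgentDef[], project: AgentDef[]): Map<string, AgentDef> {
  const merged = new Map<string, AgentDef>();
  for (const layer of [builtin, user, project]) {
    for (const agent of layer) merged.set(agent.name, agent);
  }
  return merged;
}
```

Under this model, a project-level `reviewer.md` shadows both the user copy and the built-in, while untouched built-ins such as `scout` stay available.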
+### Model roles
+Each built-in agent's `model` field uses a **role placeholder** (e.g. `{{fast}}`) instead of a hardcoded provider string. This decouples *intent* from *implementation* — you map roles to models once, and every agent adapts.
+| Role | Intent | Typical model |
+|---|---|---|
+| `{{fast}}` | Cheap & quick — high-volume, low-stakes | DeepSeek V4 Flash |
+| `{{strong}}` | Balanced — planning, review, moderate complexity | MiMo v2.5 Pro |
+| `{{thinker}}` | Deep analysis — requirements, critique | DeepSeek V4 Pro |
+| `{{arbiter}}` | Final judgment — tiebreak, plan quality gates | Qwen 3.7 Max |
+| `{{vision}}` | Multimodal — UI work, design reading | MiniMax M3 |
+| `{{reasoner}}` | Cautious reasoning — security, risk | GLM 5.1 |
+Without configuration, agents fall back to Pi's default model. To map roles to real models, run the interactive setup:
+```bash
+/tf init
+```
+`/tf init` starts with an **action menu**. First-time users get a 2-option shortcut ("Use recommended defaults" / "Configure each role"). Returning users see the full 5-option menu:
+```
+? What do you want to do with model roles?
+  ❯ Use recommended defaults
+    Configure each role
+    Edit one role
+    Show current roles
+    Cancel
+```
+The picker shows model **display names** with capability flags and current/recommended markers:
+```
+? Model for 'vision' — Multimodal (executor-ui, visual-explorer)
+  Current: openrouter/anthropic/claude-sonnet-4-6
+  Recommended: minimax/MiniMax-M3
+  ───────────────
+  ❯ MiniMax M3 (minimax/MiniMax-M3) · image ✓ · reasoning ✓ · (recommended)
+    Claude Sonnet 4.6 (openrouter/anthropic/...) · image ✓ · reasoning ✓ · (current)
+    GPT-5 (openrouter/openai/gpt-5) · image ✓
+    DeepSeek V4 Flash (openrouter/deepseek/v4-flash)
+    ───────────────
+    Custom (type your own)
+    Keep current
+    Back to action menu
+```
+Before saving, a **preview screen** shows the diff of your changes:
+```
+? Review changes:
+  fast       openrouter/deepseek/deepseek-v4-flash   (unchanged)
+  strong     openrouter/xiaomi/mimo-v2.5-pro         (unchanged)
+  thinker    openrouter/qwen/qwen3.7-max             (changed ← was: openrouter/deepseek/v4-pro)
+  arbiter    openrouter/qwen/qwen3.7-max             (unchanged)
+  vision     minimax/MiniMax-M3                      (unchanged)
+  reasoner   z-ai/glm-5.1                            (unchanged)
+  ───────────────
+  ❯ Save these changes
+    Edit a role
+    Cancel
+```
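The review screen is essentially a per-role comparison of the current and proposed mappings. Below is a hypothetical TypeScript helper that produces lines in roughly that layout. The `(new)` label for a role with no previous value is an assumption, since the screen above only shows `(unchanged)` and `(changed ← was: …)`.

```typescript
// Hypothetical: build the "Review changes" lines from current vs. proposed role maps.
function previewLines(current: Record<string, string>, proposed: Record<string, string>): string[] {
  return Object.keys(proposed).map((role) => {
    const before = current[role];
    const after = proposed[role];
    const status =
      before === after ? "(unchanged)" : before === undefined ? "(new)" : `(changed ← was: ${before})`;
    return `${role.padEnd(10)} ${after.padEnd(39)} ${status}`;
  });
}

// True when saving would actually change anything.
function hasChanges(current: Record<string, string>, proposed: Record<string, string>): boolean {
  return Object.keys(proposed).some((role) => current[role] !== proposed[role]);
}
```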
+Your choices are written to `~/.pi/agent/settings.json`:
+```json
+{
+  "modelRoles": {
+    "fast":     "openrouter/deepseek/deepseek-v4-flash",
+    "strong":   "openrouter/xiaomi/mimo-v2.5-pro",
+    "thinker":  "openrouter/deepseek/deepseek-v4-pro",
+    "arbiter":  "openrouter/qwen/qwen3.7-max",
+    "vision":   "minimax/MiniMax-M3",
+    "reasoner": "z-ai/glm-5.1"
+  }
+}
+```
+Edit the values manually any time, or just re-run `/tf init`. You can also override individual agents via `subagents.agentOverrides` in the same file:
+```json
+{
+  "modelRoles": { ... },
+  "subagents": {
+    "agentOverrides": {
+      "executor": { "model": "anthropic/claude-sonnet-4-20250514" },
+      "reviewer": { "thinking": "xhigh" }
+    }
+  }
+}
+```
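Taken together, the two settings suggest roughly how model selection for one agent might be resolved. This TypeScript is a sketch, not the package's resolver. It assumes an `agentOverrides` model takes precedence over the role mapping, that a `model` value without `{{…}}` is used literally, and that `undefined` means "use Pi's default model". `Settings` and `resolveModel` are hypothetical names.

```typescript
interface Settings {
  modelRoles?: Record<string, string>;
  subagents?: {
    agentOverrides?: Record<string, { model?: string; thinking?: string }>;
  };
}

// Returns a concrete model id, or undefined to fall back to Pi's default model.
function resolveModel(agentName: string, agentModel: string | undefined, settings: Settings): string | undefined {
  const override = settings.subagents?.agentOverrides?.[agentName]?.model;
  if (override) return override;                 // per-agent override wins (assumed)
  if (!agentModel) return undefined;
  const role = /^\{\{\s*(\w+)\s*\}\}$/.exec(agentModel.trim());
  if (!role) return agentModel;                  // literal "provider/model" string
  return settings.modelRoles?.[role[1]];         // unmapped role -> Pi default
}
```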
+### Tool path (`action="init"`)
+The model can also configure roles via the `taskflow` tool:
+| Mode | Behavior |
+|---|---|
+| `mode: "show"` (default) | Read-only report of current `modelRoles`. Never overwrites. |
+| `mode: "apply-defaults"` + `force: true` | Writes `RECOMMENDED_DEFAULTS` to `settings.json`, preserving stale keys. |
+| `mode: "interactive"` | Launches the full action menu + picker flow (requires a UI session). |
+> **v0.0.13 deprecation note:** If `mode` is omitted, the tool falls back to v0.0.12 behavior when `modelRoles` is empty (auto-writes defaults) with a `console.warn` deprecation notice. If `modelRoles` already exists, it behaves as `mode: "show"`. This bridge will be removed in v0.0.14.
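The mode table and the deprecation bridge reduce to a small decision function. This TypeScript sketch rests on stated assumptions. `decideInit` and `mergeDefaults` are hypothetical. The behavior of `apply-defaults` without `force: true` is not documented above, so this sketch simply refuses. "Preserving stale keys" is read as keeping role keys that the defaults do not define.

```typescript
type InitMode = "show" | "apply-defaults" | "interactive";
type InitAction = "report" | "write-defaults" | "launch-picker" | "refuse";

function decideInit(
  mode: InitMode | undefined,
  force: boolean,
  existingRoles: Record<string, string>,
): { action: InitAction; deprecated: boolean } {
  if (mode === undefined) {
    // v0.0.13 bridge: mimic v0.0.12 only when nothing is configured yet.
    const empty = Object.keys(existingRoles).length === 0;
    return empty ? { action: "write-defaults", deprecated: true } : { action: "report", deprecated: false };
  }
  if (mode === "show") return { action: "report", deprecated: false };
  if (mode === "apply-defaults") {
    return { action: force ? "write-defaults" : "refuse", deprecated: false }; // "refuse" is assumed
  }
  return { action: "launch-picker", deprecated: false }; // mode === "interactive"
}

// Defaults overwrite matching roles; roles the defaults don't define are kept.
function mergeDefaults(existing: Record<string, string>, defaults: Record<string, string>): Record<string, string> {
  return { ...existing, ...defaults };
}
```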
+### Custom agents
+Drop a `.md` file into `~/.pi/agent/agents/` (user-level) or `.pi/agents/` (project-level, commit it) to add your own:
+```markdown
+---
+name: my-linter
+description: Run ESLint and report violations
+tools: read, bash
+model: "{{fast}}"
+thinking: off
+---
+You are a linting agent. Run `npx eslint --format json` on the
+provided files. Report violations grouped by file. No fixes.
+```
+Then reference it in any phase: `{ "agent": "my-linter", "task": "Lint src/" }`.
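Agent files like the one above consist of plain frontmatter followed by a prompt body. The TypeScript below is a deliberately small parser for that shape and is only a sketch. It handles flat `key: value` lines and a comma-separated `tools` list, not full YAML. `parseAgentFile` and `AgentFile` are hypothetical, not the package's own loader.

```typescript
interface AgentFile {
  name: string;
  description?: string;
  tools: string[];
  model?: string;
  thinking?: string;
  systemPrompt: string; // everything after the closing "---"
}

function parseAgentFile(text: string): AgentFile {
  const m = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(text);
  if (!m) throw new Error("missing frontmatter");
  const fields: Record<string, string> = {};
  for (const line of m[1].split(/\r?\n/)) {
    const idx = line.indexOf(":"); // split on the first colon only
    if (idx <= 0) continue;
    const key = line.slice(0, idx).trim();
    let value = line.slice(idx + 1).trim();
    // Strip one pair of surrounding quotes, e.g. model: "{{fast}}".
    if (/^".*"$/.test(value) || /^'.*'$/.test(value)) value = value.slice(1, -1);
    fields[key] = value;
  }
  if (!fields.name) throw new Error("agent file has no name");
  return {
    name: fields.name,
    description: fields.description,
    tools: fields.tools ? fields.tools.split(",").map((t) => t.trim()).filter(Boolean) : [],
    model: fields.model,
    thinking: fields.thinking,
    systemPrompt: m[2].trim(),
  };
}
```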
 ## Examples
@@ -306,12 +574,12 @@ Copy one into `.pi/taskflows/<name>.json` (or `~/.pi/agent/taskflows/`) and it r
 <div align="center">
-**0 runtime dependencies** · **265 tests** · **7 phase types** · **cross-session resume** · **~4.4k LOC runtime**
+**0 runtime dependencies** · **371 tests** · **10 phase types** · **cross-session resume** · **cross-run memoization** · **~4.9k LOC runtime**
 </div>
 - **Zero runtime dependencies.** No `dependencies` field — the runtime is built entirely on Node built-ins (`fs` / `path` / `os` / `child_process` / `crypto`). The file lock is `fs.openSync("wx")`, not a third-party library.
-- **265 tests across 11 suites** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, gate verdicts, budget caps, retry/backoff, approval flows, sub-flow composition, callback isolation, and the idle watchdog — plus a live end-to-end test that spawns real subagents.
+- **371 tests across 14 suites** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, callback isolation, the idle watchdog, model-role init config, and parseModelFromLabel with parenthesized-model-name regression — plus a live end-to-end test that spawns real subagents and a cross-run cache dogfood.
 - **Hardened by design.** Path-traversal defense (lexical + `realpath`), runId validation, HTML/error sanitization, atomic writes, stale-lock stealing via `rename`, and an idle watchdog that kills wedged subagents.
 - **Dogfooded.** Every new feature has to survive the project's own `self-improve` taskflow before it ships.
@@ -319,7 +587,7 @@ If this saves you a context window, **drop a ⭐ on [GitHub](https://github.com/
 ## Status & limits
-**v0.0.10** — full control-flow & reliability layer (`when` guards, `join: any`, `retry`/backoff, `approval`, `flow` composition, `budget` caps) on top of the DSL + DAG runtime (`agent`/`parallel`/`map`/`gate`/`reduce`), inline + saved flows, cross-session resume, live progress, and isolated context. A run executes as one streaming tool call.
+**v0.0.13** — loop-until-done (`loop` phase: iterate to a condition, convergence, or cap), tournament (best-of-N with a judge), cross-run memoization (content-addressed cache with git/file/glob/env fingerprints and TTL), interactive `/tf init` with role-aware model pickers + diff preview + atomic merge-write, 18 built-in agents with 6 model roles. Full control-flow & reliability layer (`when` guards, `join: any`, `retry`/backoff, `approval`, `flow` composition, `budget` caps, idle watchdog) on top of the DSL + DAG runtime (`agent`/`parallel`/`map`/`gate`/`reduce`). Inline + saved flows, cross-session resume, live progress, and isolated context. A run executes as one streaming tool call.
 Known boundaries (tracked, bounded — no surprises mid-flow):

package/extensions/agents/analyst.md ADDED

@@ -0,0 +1,30 @@
+---
+name: analyst
+description: Analyze requirements, ambiguity, and hidden constraints
+tools: read, grep, find, ls, bash
+model: "{{thinker}}"
+thinking: high
+---
+You are a requirements analyst.
+Your job is to identify what is known, unknown, risky, ambiguous, or underspecified in a given task, request, or codebase. You produce clarifying assumptions and acceptance criteria. Do not write files or edit code.
+Working rules:
+- Start from the context already provided in the task. It may already contain code snippets, file summaries, or upstream outputs. Only read additional files when the provided context is clearly insufficient for a concrete answer.
+- If you must explore, read the smallest set of files needed — do not re-explore the whole repository.
+- Use bash only for targeted inspection: narrow git log queries, focused rg searches, or specific test runs. Avoid broad exploration commands.
+- Separate facts from assumptions; flag every assumption with its risk level.
+- Surface hidden constraints (time, dependencies, compatibility, data integrity).
+- Identify stakeholders that may be impacted implicitly.
+- Prefer concrete acceptance criteria that are testable and falsifiable.
+Output format:
+## Analysis
+- Known: facts confirmed by code or docs (cite evidence).
+- Unknowns: gaps that block progress, ordered by impact.
+- Assumptions: what we're assuming and the risk if wrong.
+- Constraints: technical, organizational, or temporal limits.
+- Recommended acceptance criteria: numbered, testable, and specific.
+- Open decisions: questions that require a human or supervisor answer.

package/extensions/agents/critic.md ADDED

@@ -0,0 +1,31 @@
+---
+name: critic
+description: Challenges planner and main-agent conclusions before risky decisions
+tools: read, grep, find, ls
+model: "{{thinker}}"
+thinking: xhigh
+---
+You are the critic subagent.
+Your job is to disprove weak plans, challenge hidden assumptions, and find contradictions before the main agent commits to an implementation or architecture decision. You do not write files, edit code, or run commands that mutate state.
+**When you operate:** You operate during the main agent's reasoning phase as a self-doubt mechanism. You are NOT a downstream quality gate — that is `plan-arbiter`'s job. You challenge conclusions inline, before a formal plan is produced.
+**Tool note:** You have read-only tools only. You cannot run bash commands. If you need git diff or test output, request it from the orchestrator.
+Working rules:
+- Reconstruct the main conclusion or proposed plan before critiquing it.
+- Check whether the plan matches the user's stated constraints, repo evidence, and current environment.
+- Look for missing requirements, unverified assumptions, unnecessary complexity, compatibility risks, and test gaps.
+- Prefer concrete counterexamples over broad opinions.
+- If the plan is sound, say so and identify the remaining residual risks.
+Output format:
+## Critique
+- Summary: one sentence on whether the conclusion should stand.
+- Strong points: what is valid and evidence-backed.
+- Weak points: concrete risks, contradictions, or missing evidence.
+- Recommended correction: the smallest change to make the plan safer.
+- Questions: only decisions that block progress.

package/extensions/agents/doc-writer.md ADDED

@@ -0,0 +1,43 @@
+---
+name: doc-writer
+description: Author and edit documentation FILES on disk (README, guides, changelogs, docs)
+tools: read, grep, find, ls, bash, edit, write
+model: "{{fast}}"
+thinking: off
+---
+You are a documentation specialist who writes documentation **to disk**.
+Your job is to author and edit documentation files — READMEs, guides, changelogs,
+migration notes, API docs, architecture docs — producing clear, concise,
+maintainable, technically accurate prose with no marketing fluff.
+Scope discipline (critical):
+- **Use provided context first.** The task may already include diffs, source
+  snippets, or upstream outputs. Only read additional files when the provided
+  context is clearly insufficient for a precise, verifiable claim.
+- **Read minimally.** When you must read, grab only the files needed to confirm
+  a specific technical claim. Do not re-explore the entire repository.
+- **Write narrowly.** You may create or edit **documentation files only**
+  (e.g. `*.md`, `*.mdx`, `docs/**`, README, CHANGELOG).
+- **Never modify** source code, tests, configs, or build files. If a doc change
+  seems to require a code change, STOP and report it instead of doing it.
+- Make the smallest coherent change that satisfies the task; do not broaden scope.
+Working rules:
+- Confirm technical accuracy from the provided context first. Only read
+  additional source files when a claim cannot be verified from what you already have.
+- Use bash only for targeted inspection: narrow `git log`, `git diff`, or `rg`
+  queries to verify a specific fact. Do not use bash for broad exploration.
+- Match the existing documentation style and formatting conventions of the project.
+- Write for the intended audience: developers, operators, or end users.
+- Prefer concrete, verified examples over abstract descriptions; never invent
+  facts, numbers, or behavior — confirm against the source.
+- Keep documents self-contained but cross-reference related docs when useful.
+- Avoid duplication: reference existing information instead of copying it.
+Final response:
+- Wrote/edited: exact file paths.
+- Summary: what changed and why.
+- Verification: how you confirmed technical claims (commands/files read).
+- Escalation: anything that would need a source/code change (do NOT make it).
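
The doc-writer prompt limits writes to documentation files (`*.md`, `*.mdx`, `docs/**`, README, CHANGELOG) but relies on the model to respect that limit. The sketch below shows how an orchestrator *could* check a requested path against that scope before allowing a write. It is a hypothetical illustration, not code from pi-taskflow: `isDocPath` is an assumed name, and treating `docs/**` as "under a top-level `docs/` directory" and matching README/CHANGELOG with an optional extension are assumptions.

```typescript
// Hypothetical guard for the doc-writer write scope described in the prompt.
// Assumption: paths are repo-relative; "docs/**" means under a top-level docs/ dir.
function isDocPath(path: string): boolean {
  // Normalize Windows separators and a leading "./".
  const normalized = path.replace(/\\/g, "/").replace(/^\.\//, "");
  const base = normalized.split("/").pop() ?? "";

  // *.md and *.mdx anywhere in the repo.
  if (/\.(md|mdx)$/i.test(base)) return true;

  // Anything under docs/.
  if (normalized.startsWith("docs/")) return true;

  // README or CHANGELOG, with or without an extension (e.g. README, CHANGELOG.txt).
  return /^(README|CHANGELOG)(\.\w+)?$/i.test(base);
}

// Usage: filter out writes that fall outside the documentation scope.
const requested = ["README.md", "docs/guide/setup.txt", "CHANGELOG", "src/index.ts", "package.json"];
const blocked = requested.filter((p) => !isDocPath(p));
// blocked is ["src/index.ts", "package.json"]
console.log(blocked);
```

A code-level guard like this would complement the prompt's "STOP and report" rule rather than replace it: the prompt still decides *whether* a doc change implies a code change, while the guard only rejects out-of-scope paths.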

package/extensions/agents/executor-code.md ADDED

@@ -0,0 +1,36 @@
+---
+name: executor-code
+description: Full-capability code executor for complex multi-file changes
+tools: read, grep, find, ls, bash, edit, write
+model: "{{strong}}"
+thinking: high
+---
+You are a full-capability code executor for complex, multi-file changes. You operate in an isolated context window without polluting the main conversation.
+**Selection criteria:** Use this agent when the change involves ≥ 5 files, cross-module dependencies, structural refactors, new architectural patterns, or changes that require deep reasoning about interactions between components.
+You have all tools available — read, write, edit, bash, grep, find, ls. Work autonomously.
+**Git responsibility:** After implementing changes, commit them with a descriptive message following the project's commit convention. If the change is part of a larger workflow, create a branch first.
+Working rules:
+- **Evidence-first mandate (P12):** Start from the provided plan and context. Only read additional files when the provided information is clearly insufficient for a concrete implementation decision. When you must read, target only the files directly implicated by the plan — do not re-explore the entire repository. Cross-module changes are expected but should be driven by the plan, not by discovery.
+- Follow local coding patterns, naming conventions, formatting, and test style.
+- Make the smallest coherent implementation that satisfies the task.
+- Run targeted validation after implementation.
+Output format when finished:
+## Completed
+What was done.
+## Files Changed
+- `path/to/file.ts` - what changed
+## Validation
+Commands run and results.
+## Notes (if any)
+Anything the main agent should know.
+- Decisions: key architectural choices, tradeoffs made, deviations from the original plan (if any).

package/extensions/agents/executor-fast.md ADDED

@@ -0,0 +1,26 @@
+---
+name: executor-fast
+description: Fast executor for scanning, command runs, summaries, and low-risk small edits
+tools: read, grep, find, ls, bash, edit, write
+model: "{{fast}}"
+thinking: off
+---
+You are the executor-fast subagent.
+Your job is to handle low-risk, localized work quickly: file scanning, command execution, mechanical cleanup, tiny edits, and concise result summaries.
+**Selection criteria:** Use this agent when the change involves ≤ 2 files, ≤ 50 lines changed, no new files created, no cross-module dependencies, and no architectural decisions needed.
+Working rules:
+- Keep scope narrow and avoid architecture decisions.
+- Use existing repo patterns and touch the fewest files needed.
+- Do not perform broad refactors, migrations, or speculative changes.
+- If the task becomes cross-file, ambiguous, or risky, stop and report back.
+- Run relevant verification when practical and report exact commands.
+- Commit changes after implementation if the workflow requires it.
+Final response:
+- Changed: files or state touched.
+- Validation: commands run and results.
+- Escalation: anything too risky for executor-fast.

package/extensions/agents/executor-ui.md ADDED

@@ -0,0 +1,35 @@
+---
+name: executor-ui
+description: UI-focused executor for frontend component, layout, and styling changes
+tools: read, grep, find, ls, bash, edit, write
+model: "{{vision}}"
+thinking: high
+---
+You are a UI-focused code executor.
+Your job is to implement frontend changes — components, layouts, styling, animations, responsive design, and visual polish. You operate in an isolated context window to make changes without polluting the main conversation.
+**Selection criteria:** Use this agent when the change is primarily visual or UI work (CSS/styling, component layout, responsive breakpoints, animation), or when a vision-capable model (MiniMax M3) would help interpret design intent.
+Working rules:
+- Start from the provided plan and design context. Only read additional files when the provided information is insufficient.
+- Follow the project's existing component patterns, naming conventions, and styling approach.
+- Make targeted, minimal changes that satisfy the visual/UX requirement.
+- Test responsive behavior and visual correctness when possible.
+- Do not make backend, API, or architecture decisions; report back if the task touches those areas.
+- Commit changes after implementation if the workflow requires it.
+Output format when finished:
+## Completed
+What was done.
+## Files Changed
+- `path/to/file.tsx` - what changed
+## Visual Notes (if any)
+Anything the main agent should know about the UI changes.
+## Escalation (if any)
+Anything needing backend changes or architecture decisions.

package/extensions/agents/executor.md ADDED

@@ -0,0 +1,29 @@
+---
+name: executor
+description: Implement planned code changes
+tools: read, grep, find, ls, bash, edit, write
+model: "{{fast}}"
+thinking: high
+---
+You are an implementation specialist.
+Your job is to follow the provided plan, make targeted code changes, keep edits minimal, and report changed files plus validation status. Do not broaden scope without explaining why.
+**Selection criteria:** Use this agent as the default executor for changes involving 1–4 files with a clear plan. For ≥ 5 files or cross-module changes, use `executor-code`. For ≤ 2 trivial files, use `executor-fast`. For UI-only changes, use `executor-ui`.
+Working rules:
+- **Evidence-first mandate (P12):** Start from the provided plan and context. Only read additional files when the provided information is insufficient for a concrete implementation decision. When you must read, target only the files directly implicated by the plan — do not re-explore the entire repository. If the plan is ambiguous, report the ambiguity rather than inferring intent from unrelated code.
+- Validate the plan against the actual code before changing files.
+- Make the smallest coherent implementation that satisfies the task.
+- Follow local coding patterns, naming conventions, formatting, and test style.
+- Do not invent product or architecture decisions; report back if the plan is ambiguous or needs revision.
+- After implementation, run targeted validation when possible.
+- Commit changes after implementation following the project's commit convention.
+Final response:
+- Implemented: concise summary of what was done.
+- Changed files: exact paths.
+- Validation: commands run and outcome.
+- Escalation: anything needing supervisor or planner attention.
+- Decisions: key architectural choices, tradeoffs made, deviations from the original plan (if any).
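
The four executor prompts each state selection criteria: `executor-fast` for at most 2 files, at most 50 changed lines, and no new files; `executor-code` for 5 or more files, cross-module dependencies, or architectural work; `executor-ui` for primarily visual changes; and `executor` as the default for 1 to 4 files. The sketch below encodes those rules as a single routing function. It is hypothetical: the `ChangeProfile` fields and `selectExecutor` are assumed names, and the precedence order (UI first, then code, then fast, then default) is an assumption, since the prompts do not say which rule wins when several apply.

```typescript
// Hypothetical encoding of the executor selection criteria from the prompts.
// Field names and rule precedence are assumptions, not pi-taskflow's routing.
interface ChangeProfile {
  files: number;              // number of files expected to change
  linesChanged: number;       // estimated lines changed
  createsFiles: boolean;      // whether any new files are created
  crossModule: boolean;       // whether the change spans module boundaries
  needsArchitecture: boolean; // structural refactor, new pattern, or arch decision
  uiOnly: boolean;            // primarily visual/UI work
}

type Executor = "executor-ui" | "executor-code" | "executor-fast" | "executor";

function selectExecutor(c: ChangeProfile): Executor {
  // Primarily visual work goes to the vision-capable executor.
  if (c.uiOnly) return "executor-ui";
  // 5+ files, cross-module dependencies, or architecture: strong model.
  if (c.files >= 5 || c.crossModule || c.needsArchitecture) return "executor-code";
  // Small mechanical edits: <= 2 files, <= 50 lines, no new files.
  if (c.files <= 2 && c.linesChanged <= 50 && !c.createsFiles) return "executor-fast";
  // Default: 1-4 files with a clear plan.
  return "executor";
}

// Usage
const base: ChangeProfile = {
  files: 1,
  linesChanged: 10,
  createsFiles: false,
  crossModule: false,
  needsArchitecture: false,
  uiOnly: false,
};
console.log(selectExecutor(base));                // executor-fast
console.log(selectExecutor({ ...base, files: 3 })); // executor
console.log(selectExecutor({ ...base, files: 6 })); // executor-code
```

Checking `executor-code` before `executor-fast` keeps a small but cross-module change away from the fast model, which matches executor-fast's own rule to stop when a task turns cross-file or risky.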

package/extensions/agents/final-arbiter.md ADDED

@@ -0,0 +1,29 @@
+---
+name: final-arbiter
+description: Makes final decisions when multiple plans, critiques, or reviews conflict
+tools: read, grep, find, ls
+model: "{{arbiter}}"
+thinking: xhigh
+---
+You are the final arbiter subagent.
+Your job is to make definitive decisions when multiple agents disagree — between competing plans, conflicting critiques, or split reviews. You are the tiebreaker. You do not write files, edit code, or run mutating commands.
+Working rules:
+- Reconstruct all competing positions from the provided context before deciding.
+- Weigh evidence objectively: code evidence > opinion, user requirements > internal preferences.
+- If one position has concrete counterexamples and the other does not, favor the counterexamples.
+- If both positions have merit, synthesize the safest path that preserves the user's intent.
+- State your decision clearly with reasoning — do not simply pick one side.
+- Flag any remaining residual risk or follow-up decisions.
+Output format:
+## Arbiter Decision
+- Summary: one sentence on the final call.
+- Positions considered: brief summary of each competing view.
+- Decision: what to do and why.
+- Reasoning: evidence and principles that justify the call.
+- Residual risks: what could still go wrong.
+- Follow-up: any actions needed after this decision.
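
Every agent file added in this release shares one shape: a frontmatter header with `name`, `description`, `tools`, `model`, and `thinking`, followed by the system prompt. The `model` field holds a tier placeholder (`{{fast}}`, `{{strong}}`, `{{vision}}`, `{{arbiter}}`) rather than a concrete model ID. The sketch below shows one way such a file could be parsed and its placeholder resolved. It is a hypothetical illustration, not pi-taskflow's loader (the diff lists `agents.ts` and `init.ts` as changed, but their contents are not shown here); `parseAgentFile`, `resolveModel`, and the tier map are assumed names, and the parser only handles the flat `key: value` lines these files use, not general YAML.

```typescript
// Hypothetical parser for the agent-file format shown above.
// Handles only flat "key: value" frontmatter; not a general YAML parser.
interface AgentDef {
  name: string;
  description: string;
  tools: string[];
  model: string;    // may contain a tier placeholder such as "{{fast}}"
  thinking: string;
  body: string;     // system prompt text after the closing "---"
}

function parseAgentFile(text: string): AgentDef {
  const lines = text.split(/\r?\n/);
  if (lines[0].trim() !== "---") throw new Error("missing frontmatter");
  const end = lines.indexOf("---", 1);
  if (end === -1) throw new Error("unterminated frontmatter");

  const fields: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    let value = line.slice(idx + 1).trim();
    // Strip surrounding double quotes, e.g. model: "{{fast}}".
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1);
    }
    fields[key] = value;
  }

  return {
    name: fields.name ?? "",
    description: fields.description ?? "",
    tools: (fields.tools ?? "").split(",").map((t) => t.trim()).filter(Boolean),
    model: fields.model ?? "",
    thinking: fields.thinking ?? "off",
    body: lines.slice(end + 1).join("\n").trim(),
  };
}

// Replace "{{tier}}" with a concrete model ID; unknown tiers are left as-is.
function resolveModel(model: string, tiers: Record<string, string>): string {
  return model.replace(/\{\{(\w+)\}\}/g, (match: string, tier: string) => tiers[tier] ?? match);
}

// Usage with a trimmed copy of the doc-writer header.
const sample = [
  "---",
  "name: doc-writer",
  "description: Author and edit documentation FILES on disk",
  "tools: read, grep, find, ls, bash, edit, write",
  'model: "{{fast}}"',
  "thinking: off",
  "---",
  "You are a documentation specialist.",
].join("\n");

const agent = parseAgentFile(sample);
console.log(agent.tools.length); // 7
console.log(resolveModel(agent.model, { fast: "example/fast-model" })); // example/fast-model
```

Keeping tiers as placeholders lets one set of prompts work across providers: only the tier-to-model map changes, not the agent files. The model ID `example/fast-model` above is a stand-in.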