pi-taskflow 0.0.6 → 0.0.7

package/README.md CHANGED
@@ -22,9 +22,10 @@ saveable as a one-word `/tf:<name>` command.
  pi install npm:pi-taskflow
  ```
 
- Fan out one subagent per item, gate the results with an adversarial review, and
- get back only the final report none of the intermediate transcripts ever touch
- your conversation.
+ Fan out one subagent per item, route on results, retry the flaky ones, pause for
+ human approval, cap the spend, and gate the output with an adversarial review
+ all from one declarative definition. Only the final report reaches your
+ conversation; every intermediate transcript stays in the runtime.
 
  ## Why
 
@@ -45,6 +46,11 @@ only the final phase's output.
  | Scale | a few tasks | dynamic `map` fan-out |
  | Resumable | no | yes (cross-session, cached phases skip) |
  | Quality gates | no | `gate` phases with `VERDICT: BLOCK / PASS` |
+ | Conditional routing | no | `when` guards + `join: any` OR-joins |
+ | Fault tolerance | no | per-phase `retry` with backoff |
+ | Human-in-the-loop | no | `approval` phases (approve / reject / edit) |
+ | Cost control | no | run-wide `budget` (USD / token caps) |
+ | Composition | no | `flow` phases run saved sub-flows |
  | Progress visibility | opaque while running | live DAG render with timing + cost |
  | Ergonomics | inline JSON each time | shorthand (`task`/`tasks`/`chain`) or DSL |
 
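The table above credits the new release with per-phase `retry` with backoff, and the field reference later in this README gives its shape as `{ max, backoffMs?, factor? }`. The diff does not show how delays are computed, so the sketch below relies on stated assumptions: `max` counts retries after the first attempt, the first retry waits `backoffMs`, and each later wait is multiplied by `factor` (a hypothetical default of 2). `backoffSchedule` and `withRetry` are illustrative names, not package exports.

```typescript
// Hypothetical mirror of the README's `retry` field: { max, backoffMs?, factor? }.
interface RetryPolicy {
  max: number; // assumed: retries allowed after the first attempt
  backoffMs?: number; // assumed: wait before the first retry
  factor?: number; // assumed: multiplier applied to each later wait
}

// Delays in milliseconds before each retry, under the assumptions above.
function backoffSchedule(policy: RetryPolicy): number[] {
  const base = policy.backoffMs ?? 0;
  const factor = policy.factor ?? 2; // hypothetical default
  const delays: number[] = [];
  for (let retry = 1; retry <= policy.max; retry++) {
    delays.push(base * factor ** (retry - 1));
  }
  return delays;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call `task` until it resolves, or rethrow the last error once retries run out.
async function withRetry<T>(task: () => Promise<T>, policy: RetryPolicy): Promise<T> {
  const delays = backoffSchedule(policy);
  let lastError: unknown;
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    if (attempt > 0) await sleep(delays[attempt - 1]);
    try {
      return await task();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

Under these assumptions, the triage example's `{ "max": 2, "backoffMs": 500 }` would wait 500 ms and then 1000 ms before its two retries.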
@@ -137,6 +143,36 @@ only the final report back.
 
  Save it once → `/tf:summarize-files` forever.
 
+ ### Route, gate, and guard
+
+ Phases also **branch, retry, pause for a human, and respect a budget** — still
+ declaratively, no scripting:
+
+ ```jsonc
+ {
+ "name": "triage-and-fix",
+ "budget": { "maxUSD": 1.5 },
+ "phases": [
+ { "id": "triage", "type": "agent", "agent": "analyst", "output": "json",
+ "task": "Classify the bug. Output ONLY {\"severity\":\"high\"} or {\"severity\":\"low\"}." },
+ { "id": "deep", "when": "{steps.triage.json.severity} == high", "dependsOn": ["triage"],
+ "agent": "executor_code", "task": "Root-cause and patch it.",
+ "retry": { "max": 2, "backoffMs": 500 } },
+ { "id": "quick", "when": "{steps.triage.json.severity} == low", "dependsOn": ["triage"],
+ "agent": "executor_fast", "task": "Apply the quick fix." },
+ { "id": "approve", "type": "approval", "join": "any", "dependsOn": ["deep", "quick"],
+ "task": "Review the fix before it ships." },
+ { "id": "ship", "type": "agent", "dependsOn": ["approve"],
+ "task": "Open a PR with the change.", "final": true }
+ ]
+ }
+ ```
+
+ - **`when`** routes to `deep` *or* `quick` from the triage JSON; the other branch is skipped.
+ - **`join: "any"`** lets `approve` run as soon as whichever branch fired completes.
+ - **`retry`** re-runs a flaky patch with backoff; **`budget`** halts the whole run if it gets too expensive.
+ - **`approval`** pauses for a human (approve / reject / edit) before the final `ship`.
+
  ## Watch it run
 
  This is the live progress render for a real run — the `self-improve` flow that
@@ -181,11 +217,28 @@ writes and verifies its own test suites, caught here mid-block by a quality gate
  | `approval` | **human-in-the-loop** pause — approve / reject / edit before continuing | — |
  | `flow` | run a **saved sub-flow** as one phase (composition/reuse) | `use` |
 
- Every phase needs `id`. Optional fields: `agent`, `dependsOn`, `output`,
- `model`, `thinking`, `tools`, `cwd`, `concurrency`, `final`, `optional`,
- `when` (conditional guard), `join` (`all`\|`any` dependency join), `retry`
- (`{max, backoffMs, factor}`), and `with` (args for a `flow` phase).
- Run-wide: `budget: {maxUSD, maxTokens}` halts the flow when exceeded.
+ ### Common phase fields
+
+ Every phase needs a unique `id` and a `type` (defaults to `agent`). On top of the
+ per-type fields above:
+
+ | Field | Meaning |
+ |---|---|
+ | `agent` | Agent to run (defaults to the first discovered agent) |
+ | `dependsOn` | Phase ids this phase waits for — builds the DAG |
+ | `join` | `"all"` (default) waits for every dep; `"any"` is an OR-join |
+ | `when` | Conditional guard — skip unless the expression is truthy |
+ | `retry` | `{ max, backoffMs?, factor? }` — retry a failing subagent |
+ | `output` | `"text"` (default) or `"json"` (exposes `{steps.ID.json}`) |
+ | `model` / `thinking` / `tools` | Per-phase overrides for the subagent |
+ | `cwd` | Working directory for the subagent |
+ | `concurrency` | Fan-out cap for `map` / `parallel` (overrides the flow default) |
+ | `final` | Marks the result-bearing phase (else the last phase wins) |
+ | `optional` | A failure here does **not** abort the run |
+ | `use` / `with` | (`flow`) saved sub-flow name + its args |
+
+ Flow-level keys: `name`, `description`, `args`, `concurrency` (default 8),
+ `agentScope`, and `budget: { maxUSD?, maxTokens? }`.
 
  ### Control flow & reliability
 
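The phase-field table above defines `join: "all"` as waiting for every dependency and `join: "any"` as an OR-join, and the triage example skips whichever branch its `when` guard rules out. The scheduler is not part of this diff, so what follows is only a minimal sketch of one plausible readiness rule over hypothetical status names. It leaves `when` evaluation aside and assumes that a failed or skipped dependency blocks an `all` join (ignoring `optional`).

```typescript
type Status = "pending" | "running" | "done" | "failed" | "skipped";
type Decision = "run" | "wait" | "skip";

const isTerminal = (s: Status) => s === "done" || s === "failed" || s === "skipped";

// Hypothetical readiness rule for a phase, given the statuses of its dependsOn list.
function joinDecision(join: "all" | "any", deps: Status[]): Decision {
  if (deps.length === 0) return "run";
  if (join === "any") {
    // OR-join: the first completed dependency unblocks the phase.
    if (deps.includes("done")) return "run";
    // Every branch ended without completing (e.g. all skipped by `when`).
    return deps.every(isTerminal) ? "skip" : "wait";
  }
  // AND-join: assumed to skip as soon as any dependency failed or was skipped.
  if (deps.some((s) => s === "failed" || s === "skipped")) return "skip";
  return deps.every((s) => s === "done") ? "run" : "wait";
}
```

For the triage flow, `deep` skipped plus `quick` done yields `run` for `approve` under `join: "any"` but `skip` under the default `all`. That difference is why the example needs the OR-join.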
@@ -294,6 +347,20 @@ file). Phase-level overrides for `model`, `thinking`, and `tools` are passed as
  Settings from `~/.pi/agent/settings.json` (the `subagents.agentOverrides` map)
  are honored, letting you tweak model, thinking, or tools per agent across all flows.
 
+ ## Examples
+
+ Ready-to-read definitions live in [`examples/`](./examples):
+
+ | File | Demonstrates |
+ |---|---|
+ | [`summarize-files.json`](./examples/summarize-files.json) | discover → `map` fan-out → `reduce` |
+ | [`conditional-research.json`](./examples/conditional-research.json) | `when` routing + `join: any` + `gate` + `budget` |
+ | [`guarded-refactor.json`](./examples/guarded-refactor.json) | `approval` (human-in-the-loop) + `retry` + `gate` |
+
+ To use one, copy it into `.pi/taskflows/<name>.json` (or
+ `~/.pi/agent/taskflows/`) and it registers as `/tf:<name>` — or just point the
+ model at the definition.
+
  ## Status & limits
 
  - **v0.0.6** — control flow & reliability: conditional `when` guards, `join: any`
@@ -327,13 +394,10 @@ are honored, letting you tweak model, thinking, or tools per agent across all fl
  ```bash
  npm install
  npm run typecheck
- node --experimental-strip-types --test test/interpolate.test.ts \
- test/condition.test.ts test/schema.test.ts test/usage.test.ts \
- test/runtime.test.ts test/features.test.ts test/runner.test.ts \
- test/store.test.ts test/agents.test.ts test/render.test.ts test/desugar.test.ts
+ npm test # unit tests — no network, no process spawning
 
  # real end-to-end (spawns live subagents; needs model access)
- PI_TASKFLOW_PI_BIN=pi node --experimental-strip-types test/e2e.mts
+ npm run test:e2e
  ```
 
  ## Contributing
@@ -301,19 +301,28 @@ export default function (pi: ExtensionAPI) {
  );
  },
  });
+ const warningText = v.warnings.length ? `\n\nWarnings:\n- ${v.warnings.join("\n- ")}` : "";
  return {
  content: [
- { type: "text", text: `Saved taskflow '${def.name}' → ${filePath}\nRun it with /tf:${def.name} or action=run.` },
+ { type: "text", text: `Saved taskflow '${def.name}' → ${filePath}\nRun it with /tf:${def.name} or action=run.${warningText}` },
  ],
  details: { action, message: filePath } satisfies TaskflowDetails,
  };
  }
 
  // run
- const v = validateTaskflow(def);
- if (!v.ok) return errorResult(action, `Invalid taskflow:\n- ${v.errors.join("\n- ")}`);
  const args = resolveArgs(def, params.args);
+ const v = validateTaskflow(def, { args, cwd: ctx.cwd });
+ if (!v.ok) return errorResult(action, `Invalid taskflow:\n- ${v.errors.join("\n- ")}`);
+ for (const w of v.warnings) {
+ console.warn(`[taskflow:${def.name}] ${w}`);
+ }
  const result = await runFlow(def, args, ctx, signal, onUpdate as any);
+ // Surface the validation warnings in the tool result so the model
+ // can acknowledge or fix them, and the user sees them in the chat.
+ if (v.warnings.length) {
+ result.finalOutput = `${result.finalOutput}\n\nWarnings:\n- ${v.warnings.join("\n- ")}`;
+ }
  return finalResult(action, result);
  },
 
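The save path and the run path in this hunk both build the same `\n\nWarnings:\n- ...` suffix inline. A shared helper is one way to keep the two in sync. `warningsSuffix` and `appendWarnings` below are hypothetical names, not code from the package.

```typescript
// Hypothetical helper producing the suffix the diff builds inline twice.
function warningsSuffix(warnings: readonly string[]): string {
  return warnings.length ? `\n\nWarnings:\n- ${warnings.join("\n- ")}` : "";
}

// Append the suffix to a tool message or a flow's final output.
function appendWarnings(text: string, warnings: readonly string[]): string {
  return text + warningsSuffix(warnings);
}
```

With it, the save path could use `warningsSuffix(v.warnings)` and the run path `appendWarnings(result.finalOutput, v.warnings)`.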
@@ -20,7 +20,7 @@ export interface InterpolationContext {
  locals?: Record<string, unknown>;
  }
 
- const PLACEHOLDER = /\{([a-zA-Z0-9_]+(?:\.[a-zA-Z0-9_]+)*)\}/g;
+ const PLACEHOLDER = /\{([a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)*)\}/g;
 
  export interface InterpolationResult {
  text: string;
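The only change to `PLACEHOLDER` is the added `-` in both character classes, so phase ids and JSON keys containing hyphens now interpolate. The comparison below runs both patterns from this hunk against a task string. The phase id `scan-repo` is made up for illustration.

```typescript
// PLACEHOLDER before and after this hunk (only the `-` in each class differs).
const OLD_PLACEHOLDER = /\{([a-zA-Z0-9_]+(?:\.[a-zA-Z0-9_]+)*)\}/g;
const NEW_PLACEHOLDER = /\{([a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)*)\}/g;

// Return the dotted reference captured by each match.
function placeholderRefs(pattern: RegExp, text: string): string[] {
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

const task = "Summarize {steps.scan-repo.output} for {steps.triage.json.severity} bugs.";
console.log(placeholderRefs(OLD_PLACEHOLDER, task).join(", "));
// steps.triage.json.severity
console.log(placeholderRefs(NEW_PLACEHOLDER, task).join(", "));
// steps.scan-repo.output, steps.triage.json.severity
```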
@@ -7,7 +7,7 @@
 
  import { getMarkdownTheme, type Theme } from "@earendil-works/pi-coding-agent";
  import { Container, Markdown, Spacer, Text } from "@earendil-works/pi-tui";
- import { formatTokens, type UsageStats } from "./usage.ts";
+ import { type UsageStats } from "./usage.ts";
  import type { PhaseState, RunState } from "./store.ts";
  import { dependenciesOf, type Phase, topoLayers } from "./schema.ts";
 
@@ -62,23 +62,16 @@ function miniBar(done: number, total: number, theme: Theme, width = 8): string {
  return theme.fg("accent", "━".repeat(filled)) + theme.fg("dim", "─".repeat(width - filled));
  }
 
- function compactUsage(usage: UsageStats | undefined, theme: Theme): string {
- if (!usage) return "";
- const parts: string[] = [];
- if (usage.turns) parts.push(theme.fg("dim", `${usage.turns}t`));
- if (usage.input) parts.push(theme.fg("dim", `↑${formatTokens(usage.input)}`));
- if (usage.output) parts.push(theme.fg("dim", `↓${formatTokens(usage.output)}`));
- if (usage.cost) parts.push(theme.fg("muted", `$${usage.cost.toFixed(3)}`));
- return parts.join(" ");
+ function agentRole(phase: Phase, ps: PhaseState | undefined, theme: Theme): string {
+ const role = phase.agent ?? phase.type ?? "agent";
+ const model = ps?.model ? shortModel(ps.model) : "";
+ if (!model) return theme.fg("accent", role);
+ return theme.fg("accent", role) + theme.fg("dim", `(${model})`);
  }
 
- function liveUsageStr(usage: UsageStats | undefined, theme: Theme): string {
- if (!usage) return "";
- const parts: string[] = [];
- if (usage.input) parts.push(theme.fg("dim", `↑${formatTokens(usage.input)}`));
- if (usage.output) parts.push(theme.fg("dim", `↓${formatTokens(usage.output)}`));
- if (usage.cost) parts.push(theme.fg("muted", `$${usage.cost.toFixed(3)}`));
- return parts.join(" ");
+ function costStr(usage: UsageStats | undefined, theme: Theme): string {
+ if (!usage?.cost) return "";
+ return theme.fg("muted", `$${usage.cost.toFixed(3)}`);
  }
 
  function aggregateCost(state: RunState): number {
@@ -118,7 +111,7 @@ function phaseDetail(phase: Phase, ps: PhaseState | undefined, theme: Theme): st
  if (ps.status === "skipped") {
  const reason = (ps.error ?? "upstream failed").replace(/\s+/g, " ");
  const snip = reason.length > 52 ? `${reason.slice(0, 52)}…` : reason;
- return theme.fg("muted", `skipped · ${snip}`);
+ return theme.fg("muted", `skipped · ${snip}`) + (ps.warnings?.length ? theme.fg("warning", ` ⚠${ps.warnings.length}`) : "");
  }
 
  const isFanout = type === "map" || type === "parallel" || type === "flow";
@@ -131,30 +124,34 @@ function phaseDetail(phase: Phase, ps: PhaseState | undefined, theme: Theme): st
  return (
  theme.fg("toolOutput", `${done - failed}/${total}`) +
  theme.fg("error", ` ${failed}✗`) +
- (snip ? theme.fg("error", ` ${snip}`) : "")
+ (snip ? theme.fg("error", ` ${snip}`) : "") +
+ (ps.warnings?.length ? theme.fg("warning", ` ⚠${ps.warnings.length}`) : "")
  );
  }
- return theme.fg("error", snip);
+ return theme.fg("error", snip) + (ps.warnings?.length ? theme.fg("warning", ` ⚠${ps.warnings.length}`) : "");
  }
 
  const t = phaseElapsed(ps);
  const time = t ? theme.fg("dim", elapsed(t)) : "";
 
  if (ps.status === "running") {
- const model = shortModel(ps.model);
- const tokens = liveUsageStr(ps.usage, theme);
+ const roleLabel = agentRole(phase, ps, theme);
+ const cost = costStr(ps.usage, theme);
  if (isFanout && ps.subProgress) {
  const { done, total, running, failed } = ps.subProgress;
  let s = `${miniBar(done, total, theme)} ${theme.fg("toolOutput", `${done}/${total}`)}`;
  if (running) s += theme.fg("dim", ` · ${running} run`);
  if (failed) s += theme.fg("error", ` · ${failed}✗`);
- if (tokens) s += ` ${tokens}`;
+ s += ` ${roleLabel}`;
+ if (cost) s += ` ${cost}`;
  if (time) s += ` ${time}`;
+ if (ps.warnings?.length) s += theme.fg("warning", ` ⚠${ps.warnings.length}`);
  return s;
  }
- let s = model ? theme.fg("accent", model) : theme.fg("warning", "running…");
- if (tokens) s += ` ${tokens}`;
+ let s = roleLabel;
+ if (cost) s += ` ${cost}`;
  if (time) s += ` ${time}`;
+ if (ps.warnings?.length) s += theme.fg("warning", ` ⚠${ps.warnings.length}`);
  return s;
  }
 
@@ -163,20 +160,22 @@ function phaseDetail(phase: Phase, ps: PhaseState | undefined, theme: Theme): st
  const { done = 0, total = 0, failed = 0 } = ps.subProgress ?? {};
  let s = theme.fg("success", `${total}✓`);
  if (failed) s = theme.fg("toolOutput", `${done - failed}/${total}`) + theme.fg("error", ` ${failed}✗`);
- const u = compactUsage(ps.usage, theme);
- if (u) s += ` ${u}`;
+ const cost = costStr(ps.usage, theme);
+ if (cost) s += ` ${cost}`;
  if (time) s += ` ${time}`;
+ if (ps.warnings?.length) s += theme.fg("warning", ` ⚠${ps.warnings.length}`);
  return s;
  }
  // single-agent done
- const model = shortModel(ps.model);
- const u = compactUsage(ps.usage, theme);
+ const roleLabel = agentRole(phase, ps, theme);
+ const cost = costStr(ps.usage, theme);
  if (ps.approval) {
  const d = ps.approval.decision;
  const color = d === "reject" ? "error" : d === "edit" ? "warning" : "success";
- let a = theme.fg(color as Parameters<typeof theme.fg>[0], theme.bold(d.toUpperCase()));
+ let a = theme.fg("warning", "⚠") + " " + theme.fg(color as Parameters<typeof theme.fg>[0], theme.bold(d.toUpperCase()));
  if (ps.approval.auto) a += theme.fg("dim", " auto");
  if (time) a += ` ${time}`;
+ if (ps.warnings?.length) a += theme.fg("warning", ` ⚠${ps.warnings.length}`);
  return a;
  }
  if (ps.gate) {
@@ -187,16 +186,18 @@ function phaseDetail(phase: Phase, ps: PhaseState | undefined, theme: Theme): st
  const r = ps.gate.reason.replace(/\s+/g, " ");
  g += theme.fg("dim", ` ${r.length > 44 ? `${r.slice(0, 44)}…` : r}`);
  }
- if (model) g += ` ${theme.fg("dim", model)}`;
+ const cost = costStr(ps.usage, theme);
+ if (cost) g += ` ${cost}`;
  if (time) g += ` ${time}`;
+ if (ps.warnings?.length) g += theme.fg("warning", ` ⚠${ps.warnings.length}`);
  return g;
  }
- let s = "";
- if (model) s += theme.fg("accent", model);
- if (u) s += (s ? " " : "") + u;
+ let s = roleLabel;
+ if (cost) s += ` ${cost}`;
  if (ps.attempts && ps.attempts > 1) s += theme.fg("warning", ` ↻${ps.attempts - 1}`);
  if (time) s += ` ${time}`;
- return s || theme.fg("dim", "done");
+ if (ps.warnings?.length) s += theme.fg("warning", ` ⚠${ps.warnings.length}`);
+ return s;
  }
 
  /** Header line: status glyph + name + compact totals. */
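Taken together, the render hunks drop the token counters. Instead they show the phase's role (the agent name, else the phase type) with the model in parentheses, keep only the cost, and append a `⚠N` badge when a phase carries warnings. A theme-free sketch of that label logic follows. The `...Like` interfaces and `shortModel` are assumptions, because neither the real `PhaseState` fields nor `shortModel` appear in this diff.

```typescript
// Minimal stand-ins for the fields the render code reads (assumed shapes).
interface UsageLike { cost?: number }
interface PhaseLike { agent?: string; type?: string }
interface PhaseStateLike { model?: string; usage?: UsageLike; warnings?: string[] }

// Hypothetical: the real shortModel() is not shown in this diff.
function shortModel(model: string): string {
  return model.split("/").pop() ?? model;
}

// Plain-text equivalent of agentRole(): role, plus "(model)" when known.
function roleLabel(phase: PhaseLike, ps?: PhaseStateLike): string {
  const role = phase.agent ?? phase.type ?? "agent";
  const model = ps?.model ? shortModel(ps.model) : "";
  return model ? `${role}(${model})` : role;
}

// Plain-text equivalent of costStr(): empty when there is no cost.
function costLabel(usage?: UsageLike): string {
  return usage?.cost ? `$${usage.cost.toFixed(3)}` : "";
}

// The " ⚠N" badge appended by every branch in the diff.
function warningBadge(ps?: PhaseStateLike): string {
  return ps?.warnings?.length ? ` ⚠${ps.warnings.length}` : "";
}

// Single-agent "done" line, without attempts or elapsed time.
function doneLine(phase: PhaseLike, ps: PhaseStateLike): string {
  let s = roleLabel(phase, ps);
  const cost = costLabel(ps.usage);
  if (cost) s += ` ${cost}`;
  return s + warningBadge(ps);
}
```

One visible consequence: a finished single-agent phase can no longer fall back to the old `done` label, because the role label is never empty.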
@@ -48,12 +48,67 @@ export function isFailed(r: RunResult): boolean {
  return r.exitCode !== 0 || r.stopReason === "error" || r.stopReason === "aborted";
  }
 
+ /** Placeholder written to a failed phase's `output` so downstream interpolation
+ * can detect "upstream failed" without being polluted by raw HTML/JSON. */
+ export const TRANSPORT_ERROR_PLACEHOLDER = "(upstream error: subagent failed; see error)";
+
+ /** Hard cap on the errorMessage field stored in PhaseState (≈ 4 KB). */
+ export const ERROR_MESSAGE_MAX_LEN = 4096;
+
+ /** Cheap HTML/JSON detector so we can summarize upstream garbage. */
+ export function looksLikeHtmlOrJson(s: string): boolean {
+ const t = s.trimStart();
+ if (!t) return false;
+ if (t.startsWith("<")) {
+ // HTML/XML/Cloudflare challenge pages
+ return /^<(?:!doctype\s+html|html|head|body|script|svg|div|iframe|span|p)\b/i.test(t);
+ }
+ if (t.startsWith("{")) {
+ // Truncated JSON. A genuine JSON envelope is fine to keep; an unwrapped
+ // {error: "..."} from an SDK is short. We only treat it as "garbage" if
+ // it parses and is huge — but that's caught by the size cap below.
+ return false;
+ }
+ return false;
+ }
+
+ /**
+ * Truncate and (when obviously HTML) summarize an errorMessage before it is
+ * persisted. Returns the cleaned string. Empty input returns empty.
+ */
+ export function sanitizeErrorMessage(raw: string | undefined): string {
+ if (!raw) return "";
+ const cleaned = raw.replace(/\s+/g, " ").trim();
+ if (!cleaned) return "";
+ // Decide the sanitization branch on the RAW length, not the whitespace-
+ // collapsed length — otherwise an HTML page padded with spaces would slip
+ // through the "looks like HTML" branch and be persisted as-is.
+ const rawLen = raw.length;
+ if (rawLen > ERROR_MESSAGE_MAX_LEN) {
+ const head = cleaned.slice(0, 200);
+ const tail = cleaned.slice(-200);
+ return `${head} ... [truncated ${rawLen - 400} chars] ... ${tail}`;
+ }
+ if (looksLikeHtmlOrJson(cleaned)) {
+ // Any document-like HTML (Cloudflare challenge pages, proxy error pages,
+ // gateway error pages) is a strong signal the upstream returned a page
+ // instead of JSON. Summarize it instead of letting HTML pollute the
+ // phase's error and downstream interpolation contexts.
+ const title = cleaned.match(/<title[^>]*>([^<]*)<\/title>/i)?.[1]?.trim();
+ const stripped = cleaned.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
+ const m = stripped.match(/(?:Unable to load site|Ray ID[: ]+([A-Za-z0-9]+)|[A-Z][a-z]+Error[: ]+(.{0,200}))/i);
+ const hint = title || (m ? (m[1] || m[0]).trim() : stripped.slice(0, 200));
+ return `Upstream returned non-JSON response (${rawLen} chars). Hint: ${hint}`;
+ }
+ return cleaned;
+ }
+
  function getFinalOutput(messages: Message[]): string {
  for (let i = messages.length - 1; i >= 0; i--) {
  const msg = messages[i];
  if (msg.role === "assistant") {
  for (const part of msg.content) {
- if (part.type === "text") return part.text;
+ if (part.type === "text" && part.text.trim()) return part.text;
  }
  }
  }
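In the truncation branch, `head` and `tail` are sliced from the whitespace-collapsed `cleaned` string, but the reported count is `rawLen - 400`. When a long raw message is mostly whitespace, that count overstates what was cut, and a short `cleaned` string is emitted twice (once as head, once as tail). The sketch below keeps the raw-length branch decision that the code comment asks for but counts against the string actually sliced. `truncateMiddle` and `sanitizeLength` are hypothetical helpers that cover only the length branch, not the HTML summarizer.

```typescript
const ERROR_MESSAGE_MAX_LEN = 4096; // same cap as in the diff

// Hypothetical: keep `keep` chars from each end of an already-collapsed string.
function truncateMiddle(cleaned: string, keep = 200): string {
  if (cleaned.length <= keep * 2) return cleaned; // nothing would be cut
  const dropped = cleaned.length - keep * 2;
  return `${cleaned.slice(0, keep)} ... [truncated ${dropped} chars] ... ${cleaned.slice(-keep)}`;
}

// Length branch only: decide on the raw length, count against the collapsed text.
function sanitizeLength(raw: string): string {
  const cleaned = raw.replace(/\s+/g, " ").trim();
  if (raw.length > ERROR_MESSAGE_MAX_LEN) return truncateMiddle(cleaned);
  return cleaned;
}
```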
@@ -289,8 +344,17 @@ export async function runAgentTask(
  result.stopReason = "aborted";
  result.errorMessage = "Subagent was aborted";
  }
+ // On failure, build a short, structured errorMessage + a placeholder
+ // output. We deliberately do NOT copy the raw errorMessage into
+ // `output`: upstream providers (e.g. a Cloudflare challenge page) can
+ // surface huge HTML/JSON in errorMessage, and that garbage would
+ // otherwise flow into downstream phase interpolations.
  if (isFailed(result) && !result.output) {
- result.output = result.errorMessage || result.stderr || "(no output)";
+ result.output = TRANSPORT_ERROR_PLACEHOLDER;
+ if (!result.errorMessage) {
+ result.errorMessage = result.stderr || `Subagent exited with code ${result.exitCode} (stopReason: ${result.stopReason ?? "unknown"})`;
+ }
+ result.errorMessage = sanitizeErrorMessage(result.errorMessage);
  }
  return result;
  } finally {
@@ -10,6 +10,8 @@
  * result are skipped.
  */
 
+ import * as path from "node:path";
+ import * as fs from "node:fs";
  import type { AgentConfig } from "./agents.ts";
  import { coerceArray, evaluateCondition, interpolate, type InterpolationContext, safeParse } from "./interpolate.ts";
  import { isFailed, type LiveUpdate, mapWithConcurrencyLimit, runAgentTask, type RunResult } from "./runner.ts";
@@ -147,6 +149,9 @@ function mergePhaseState(
  const ran = results.filter((r) => r.stopReason !== "budget-skipped");
  const anyFailed = ran.some(isFailed);
  const usage = aggregateUsage(results.map((r) => r.usage));
+ // B12: surface the model(s) used in the fan-out so consumers can show
+ // which model produced the merged output.
+ const model = ran.find((r) => r.model !== undefined)?.model;
  // Combine outputs as a labelled list; also expose a JSON array of outputs.
  const combinedText = ran
  .map((r, i) => `### [${i + 1}/${ran.length}] ${r.agent}${isFailed(r) ? " (failed)" : ""}\n\n${r.output}`)
@@ -163,6 +168,7 @@ function mergePhaseState(
  output: combinedText,
  json: jsonArray,
  usage,
+ model,
  attempts: attempts > results.length ? attempts : undefined,
  budgetTruncated: budgetSkips.length > 0 || undefined,
  subProgress: { done: ran.length, total: results.length, running: 0, failed: failedCount },
@@ -188,6 +194,89 @@ function liveSink(state: RunState, phaseId: string, emitProgress: () => void): (
  };
  }
 
+
+ /**
+ * Pre-read files listed in a phase's `context` field and return them as
+ * markdown code blocks. Handles:
+ * - literal paths
+ * - interpolation refs (e.g. `{steps.scout.json}` resolving to `["a.ts"]`)
+ * - per-file truncation via `contextLimit`
+ *
+ * The result is a single string that should be prepended to the phase task so
+ * the subagent never needs to spend turns on file exploration.
+ */
+ const CONTEXT_MAX_FILE_BYTES = 10 * 1024 * 1024; // 10 MB
+ const MAX_TOTAL_CONTEXT_CHARS = 200_000;
+
+ async function resolvePhaseContext(
+ phase: Phase,
+ ctx: InterpolationContext,
+ ): Promise<string> {
+ const entries = phase.context;
+ if (!entries || entries.length === 0) return "";
+ const limit = phase.contextLimit ?? 8000;
+
+ const paths: string[] = [];
+ for (const entry of entries) {
+ const r = interpolate(entry, ctx);
+ if (r.text !== entry) {
+ // Resolved — may be a JSON array from {steps.X.json}
+ const parsed = safeParse(r.text);
+ if (Array.isArray(parsed)) {
+ for (const item of parsed) {
+ if (typeof item === "string" && item.trim()) paths.push(item.trim());
+ }
+ } else if (typeof r.text === "string" && r.text.trim()) {
+ paths.push(r.text.trim());
+ }
+ } else {
+ // Unchanged — literal path
+ paths.push(entry);
+ }
+ }
+
+ const unique = Array.from(new Set(paths));
+
+ // Diagnose JSON blobs masquerading as file paths — common when a context
+ // entry like {steps.discover.output} resolves to {"files":[...]} instead
+ // of a flat path or JSON array. The author should use {steps.discover.json.files}.
+ const jsonBlobs = unique.filter((p) => p.startsWith("{"));
+ for (const blob of jsonBlobs) {
+ console.warn(
+ `[taskflow] Context entry "${blob.slice(0, 80)}…" looks like a JSON object, not a file path. ` +
+ `Use {steps.<id>.json.<field>} to extract a specific field.`,
+ );
+ }
+ const filtered = jsonBlobs.length ? unique.filter((p) => !p.startsWith("{")) : unique;
+
+ const blocks: string[] = [];
+ for (const p of filtered) {
+ try {
+ const abs = path.resolve(p);
+ const stat = fs.statSync(abs);
+ if (!stat.isFile()) continue;
+ if (stat.size > CONTEXT_MAX_FILE_BYTES) continue;
+ const content = fs.readFileSync(abs, "utf-8");
+ const truncated =
+ content.length > limit
+ ? content.slice(0, limit) + `\n... [truncated ${content.length - limit} chars]`
+ : content;
+ const ext = path.extname(p).slice(1) || "txt";
+ blocks.push(`## File: ${p}\n\n\`\`\`${ext}\n${truncated}\n\`\`\``);
+ } catch {
+ console.warn(`[taskflow] Skipped unreadable context file: ${p}`);
+ }
+ }
+
+ // Safety cap: truncate total context when too many files are listed.
+ let result = blocks.join("\n\n") + "\n\n";
+ if (result.length > MAX_TOTAL_CONTEXT_CHARS) {
+ result = result.slice(0, MAX_TOTAL_CONTEXT_CHARS) + `\n\n... [truncated ${result.length - MAX_TOTAL_CONTEXT_CHARS} total chars]`;
+ }
+ return result;
+ }
+
+
  async function executePhase(
  phase: Phase,
  state: RunState,
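`resolvePhaseContext` first turns each `context` entry into paths and only then touches the filesystem. That first step can be written as a pure function, which makes its rules easy to test:

- An entry left unchanged by interpolation is a literal path.
- A changed entry is read as a JSON array when it parses as one, and as a single path otherwise.
- Duplicates are dropped.
- Anything starting with `{` is set aside as a stray JSON object.

In this sketch, `resolve` stands in for `interpolate(...).text` and `tryParse` stands in for `safeParse`. Both substitutions are assumptions.

```typescript
// Hypothetical: maps a raw context entry to its interpolated text.
type Resolve = (entry: string) => string;

// Assumed stand-in for the package's safeParse(): undefined on invalid JSON.
function tryParse(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}

function contextPaths(entries: string[], resolve: Resolve): { paths: string[]; rejected: string[] } {
  const collected: string[] = [];
  for (const entry of entries) {
    const text = resolve(entry);
    if (text === entry) {
      collected.push(entry); // unchanged: literal path
      continue;
    }
    const parsed = tryParse(text);
    if (Array.isArray(parsed)) {
      for (const item of parsed) {
        if (typeof item === "string" && item.trim()) collected.push(item.trim());
      }
    } else if (text.trim()) {
      collected.push(text.trim());
    }
  }
  const unique = Array.from(new Set(collected));
  // Objects such as {"files":[...]} are not paths; report them instead of reading them.
  return {
    paths: unique.filter((p) => !p.startsWith("{")),
    rejected: unique.filter((p) => p.startsWith("{")),
  };
}

const resolved: Record<string, string> = {
  "{steps.scout.json}": '["src/a.ts", "src/b.ts"]',
  "{steps.discover.output}": '{"files":["src/c.ts"]}',
};
const out = contextPaths(
  ["README.md", "{steps.scout.json}", "src/a.ts", "{steps.discover.output}"],
  (entry) => resolved[entry] ?? entry,
);
console.log(out.paths.join(", ")); // README.md, src/a.ts, src/b.ts
```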
@@ -200,6 +289,12 @@ async function executePhase(
  const previousOutput = lastCompletedOutput(state, phase);
  const run = deps.runTask ?? runAgentTask;
 
+ // Resolve context pre-read files once, before any type branching.
+ // The content is prepended to every task so the subagent never spends
+ // turns on file exploration for files the flow author already knows.
+ const ctx = buildInterpolationContext(state, previousOutput);
+ const preRead = await resolvePhaseContext(phase, ctx);
+
  const baseRun = (agentName: string, task: string, onLive?: (l: LiveUpdate) => void) =>
  run(
  deps.cwd,
@@ -228,6 +323,10 @@ async function executePhase(
  if (deps.signal?.aborted) break;
  last = await baseRun(agentName, task, onLive);
  usages.push(last.usage);
+ // B6: aggregate and surface cumulative usage before the retry decision,
+ // so the TUI / budget guard see the in-flight spend on every attempt.
+ const liveRetry = state.phases[phase.id];
+ if (liveRetry) liveRetry.usage = aggregateUsage(usages);
  if (!isFailed(last)) break;
  // Stop retrying on abort or once the run is over budget.
  if (deps.signal?.aborted || overBudget(state).over) break;
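The B6 change stores `aggregateUsage(usages)` on the live phase after every attempt, before the success and budget checks, so spend from failed attempts is visible between retries. The simulation below replays scripted attempts instead of spawning subagents. Several parts of it are assumptions:

- the `Usage` shape (`input`, `output`, `cost` are names the render code reads);
- plain summation as the behaviour of `aggregateUsage`;
- a single `maxUSD` cap standing in for `overBudget(state)`.

```typescript
interface Usage { input: number; output: number; cost: number }
interface Attempt { ok: boolean; usage: Usage }

// Assumed behaviour of aggregateUsage(): a field-wise sum.
function sumUsage(usages: Usage[]): Usage {
  return usages.reduce(
    (acc, u) => ({ input: acc.input + u.input, output: acc.output + u.output, cost: acc.cost + u.cost }),
    { input: 0, output: 0, cost: 0 },
  );
}

// Replays scripted attempts and returns how many ran; `live` plays the role of state.phases[id].
function replayRetries(attempts: Attempt[], maxRetries: number, maxUSD: number, live: { usage?: Usage }): number {
  const usages: Usage[] = [];
  let ran = 0;
  for (const attempt of attempts.slice(0, maxRetries + 1)) {
    ran++;
    usages.push(attempt.usage);
    live.usage = sumUsage(usages); // published before the retry decision
    if (attempt.ok) break;
    if (live.usage.cost >= maxUSD) break; // stand-in for overBudget(state).over
  }
  return ran;
}
```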
@@ -313,24 +412,26 @@ async function executePhase(
  // interpolated task. gate additionally parses a verdict; reduce simply pulls
  // its inputs from `from` phases (already exposed via interpolation).
  if (type === "agent" || type === "gate" || type === "reduce") {
- const ctx = buildInterpolationContext(state, previousOutput);
  const { text } = interpolate(phase.task ?? "", ctx);
- const inputHash = hashInput(phase.id, phase.agent ?? "", text);
+ const fullTask = preRead + text;
+ const inputHash = hashInput(phase.id, phase.agent ?? "", fullTask);
  const cached = cachedPhase(prior, inputHash);
  if (cached) return cached;
 
- const r = await runOne(phase.agent ?? defaultAgent(deps), text, liveSink(state, phase.id, emitProgress));
+ const r = await runOne(phase.agent ?? defaultAgent(deps), fullTask, liveSink(state, phase.id, emitProgress));
  const ps = resultToPhaseState(phase.id, r, inputHash, parseJson);
  if (type === "gate" && ps.status === "done") ps.gate = parseGateVerdict(r.output);
  return ps;
  }
 
  if (type === "parallel") {
- const ctx = buildInterpolationContext(state, previousOutput);
- const branches = (phase.branches ?? []).map((b) => ({
- agent: b.agent ?? phase.agent ?? defaultAgent(deps),
- task: interpolate(b.task, ctx).text,
- }));
+ const branches = (phase.branches ?? []).map((b) => {
+ const r = interpolate(b.task, ctx);
+ return {
+ agent: b.agent ?? phase.agent ?? defaultAgent(deps),
+ task: preRead + r.text,
+ };
+ });
  const inputHash = hashInput(phase.id, JSON.stringify(branches));
  const cached = cachedPhase(prior, inputHash);
  if (cached) return cached;
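The cache key is now computed from `fullTask` (pre-read context plus interpolated task). As a result, editing a file listed in `context` invalidates the cached phase even when the task text is unchanged. `hashInput` itself is not in this diff. The sketch assumes a SHA-256 over NUL-separated parts using Node's `node:crypto`, and `hashParts` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Assumed stand-in for hashInput(): SHA-256 over NUL-separated parts.
function hashParts(...parts: string[]): string {
  const hash = createHash("sha256");
  for (const part of parts) hash.update(part).update("\0");
  return hash.digest("hex");
}

const task = "Review the change for bugs.";
const v1 = hashParts("review", "analyst", "## File: a.ts\n\nconst x = 1;\n\n" + task);
const v2 = hashParts("review", "analyst", "## File: a.ts\n\nconst x = 2;\n\n" + task);
console.log(v1 === v2); // false: the edited context file changes the cache key
```

The separator keeps `("a", "b")` and `("ab", "")` from colliding, which plain concatenation would not.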
@@ -340,7 +441,6 @@ async function executePhase(
  }
 
  if (type === "map") {
- const ctx = buildInterpolationContext(state, previousOutput);
  const overResolved = interpolate(phase.over ?? "", ctx).text;
  // `over` may itself be a placeholder that resolved to a JSON string.
  const arr = coerceArray(safeParse(overResolved)) ?? coerceArray(directRef(phase.over ?? "", state));
@@ -359,7 +459,7 @@ async function executePhase(
  const localCtx = buildInterpolationContext(state, previousOutput, { [loopVar]: item });
  return {
  agent: phase.agent ?? defaultAgent(deps),
- task: interpolate(phase.task ?? "", localCtx).text,
+ task: preRead + interpolate(phase.task ?? "", localCtx).text,
  };
  });
  const inputHash = hashInput(phase.id, JSON.stringify(tasks));
@@ -424,7 +524,7 @@ async function executePhase(
  provided[k] = typeof v === "string" ? interpolate(v, ctx).text : v;
  }
  const subArgs = resolveArgs(subDef, provided);
- const inputHash = hashInput(phase.id, `flow:${name}`, JSON.stringify(subArgs));
+ const inputHash = hashInput(phase.id, `flow:${name}`, preRead, JSON.stringify(subArgs));
  const cached = cachedPhase(prior, inputHash);
  if (cached) return cached;
 
@@ -442,10 +542,16 @@ async function executePhase(
  phases: {},
  createdAt: Date.now(),
  updatedAt: Date.now(),
- cwd: deps.cwd,
+ cwd: phase.cwd ?? deps.cwd,
  };
+ // B8: pass this flow phase's preRead content to every sub-flow phase by
+ // wrapping runTask — sub-phase preRead still gets prepended on top of it.
+ const baseRunTask = deps.runTask ?? runAgentTask;
+ const subRunTask: typeof runAgentTask = (cwd, agents, agentName, subTask, opts, globalThinking) =>
+ baseRunTask(cwd, agents, agentName, preRead + subTask, opts, globalThinking);
  const subResult = await executeTaskflow(subState, {
  ...deps,
+ runTask: subRunTask,
  _stack: [...stack, state.flowName],
  persist: undefined,
  onProgress: () => {
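The B8 wrapper is function composition: the sub-flow receives a `runTask` that prefixes the parent `flow` phase's pre-read to every task before delegating. The sketch replaces the six-argument `runAgentTask` with a simplified, hypothetical `(agent, task) => Promise<string>` signature. It shows that the wrapper closest to the base runner contributes the leading context. This matches the diff's order, where the parent flow's pre-read ends up ahead of the sub-phase's own.

```typescript
// Hypothetical, simplified signature; the real runAgentTask takes more arguments.
type RunTask = (agent: string, task: string) => Promise<string>;

// Return a runner that prefixes every task with `preRead` before delegating.
function withPreRead(base: RunTask, preRead: string): RunTask {
  if (!preRead) return base; // prepending "" would be a no-op anyway
  return (agent, task) => base(agent, preRead + task);
}

// A fake runner that echoes what it was asked to do.
const echo: RunTask = async (agent, task) => `${agent}: ${task}`;

const parent = withPreRead(echo, "[parent ctx] ");
const nested = withPreRead(parent, "[child ctx] ");
nested("executor", "fix it").then(console.log); // executor: [parent ctx] [child ctx] fix it
```

Returning `base` unchanged for an empty `preRead` behaves the same as the diff's unconditional wrapper.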
@@ -494,7 +600,7 @@ async function executePhase(
 
  /** Resolve a `{steps.x.json}`-style ref directly to its parsed value (bypassing stringify). */
  function directRef(over: string, state: RunState): unknown {
- const m = over.match(/^\{steps\.([a-zA-Z0-9_]+)\.(output|json)(?:\.([a-zA-Z0-9_]+(?:\.[a-zA-Z0-9_]+)*))?\}$/);
+ const m = over.match(/^\{steps\.([a-zA-Z0-9_-]+)\.(output|json)(?:\.([a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)*))?\}$/);
  if (!m) return undefined;
  const step = state.phases[m[1]];
  if (!step || step.status !== "done") return undefined;
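`directRef` gets the same hyphen fix as `PLACEHOLDER`, so `map` phases can iterate over `{steps.<id>.json...}` refs whose id contains `-`. The parser below wraps the updated pattern from this hunk. `parseDirectRef` and the example id `find-files` are illustrative.

```typescript
// The updated directRef pattern from this hunk.
const DIRECT_REF =
  /^\{steps\.([a-zA-Z0-9_-]+)\.(output|json)(?:\.([a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)*))?\}$/;

interface DirectRef { step: string; field: "output" | "json"; path?: string }

// Split a whole-string `{steps.<id>.<output|json>[.path]}` ref into its parts.
function parseDirectRef(ref: string): DirectRef | undefined {
  const m = ref.match(DIRECT_REF);
  if (!m) return undefined;
  return { step: m[1], field: m[2] as DirectRef["field"], path: m[3] };
}

console.log(JSON.stringify(parseDirectRef("{steps.find-files.json.files}")));
// {"step":"find-files","field":"json","path":"files"}
```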
@@ -5,6 +5,7 @@
  * to a subagent (an isolated `pi` process). Phases form a DAG via `dependsOn`.
  */
 
+ import * as path from "node:path";
  import { StringEnum } from "@earendil-works/pi-ai";
  import { Type, type Static } from "typebox";
 
@@ -102,6 +103,18 @@ const PhaseSchema = Type.Object(
102
103
  Type.Boolean({ description: "If true, a failure does not abort the run", default: false }),
103
104
  ),
104
105
  concurrency: Type.Optional(Type.Number({ description: "Override max concurrency for map/parallel" })),
106
+ context: Type.Optional(
107
+ Type.Array(Type.String(), {
108
+ description:
109
+ "File paths or {steps.X} refs to pre-read and inject before the task. Resolves interpolated refs first, then reads each file (capped per-file). Eliminates O(N²) turn-cost exploration.",
110
+ }),
111
+ ),
112
+ contextLimit: Type.Optional(
113
+ Type.Number({
114
+ description: "Max characters to read per file referenced in context (default 8000).",
115
+ default: 8000,
116
+ }),
117
+ ),
105
118
  },
106
119
  { additionalProperties: false },
107
120
  );
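The new `context` and `contextLimit` phase keys describe a pre-read step: resolve any `{steps.X}` refs, read each listed file, and cap each file at `contextLimit` characters (default 8000) before injecting the result ahead of the task.

The code that builds this text is not in this hunk. The sketch below is therefore a hypothetical illustration of the per-file cap only. It assumes refs are already interpolated into plain paths and injects `readFile` for testability. The section header format, the truncation marker, and the choice to skip unreadable files are all assumptions.

```typescript
// Hypothetical pre-read builder: caps each file at `contextLimit` characters.
function buildPreRead(
  files: string[],
  readFile: (p: string) => string,
  contextLimit = 8000,
): string {
  const sections: string[] = [];
  for (const file of files) {
    let text: string;
    try {
      text = readFile(file);
    } catch {
      continue; // assumption: unreadable files are skipped, not fatal
    }
    const body = text.length > contextLimit ? text.slice(0, contextLimit) + "\n[...truncated]" : text;
    sections.push(`--- ${file} ---\n${body}`);
  }
  // Trailing blank line so the task text starts on its own paragraph.
  return sections.length > 0 ? sections.join("\n\n") + "\n\n" : "";
}
```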
@@ -126,6 +139,13 @@ export const TaskflowSchema = Type.Object(
126
139
  agentScope: Type.Optional(
127
140
  StringEnum(["user", "project", "both"] as const, { description: "Agent discovery scope", default: "user" }),
128
141
  ),
142
+ strictInterpolation: Type.Optional(
143
+ Type.Boolean({
144
+ description:
145
+ "When true, unresolved interpolation placeholders and validation warnings about missing deps/args become hard errors",
146
+ default: false,
147
+ }),
148
+ ),
129
149
  phases: Type.Array(PhaseSchema, { minItems: 1, description: "Ordered phase definitions (DAG via dependsOn)" }),
130
150
  },
131
151
  { additionalProperties: false },
@@ -190,6 +210,8 @@ export function desugar(def: unknown): Taskflow {
190
210
  if (typeof d.concurrency === "number") meta.concurrency = d.concurrency;
191
211
  if (d.agentScope === "user" || d.agentScope === "project" || d.agentScope === "both") meta.agentScope = d.agentScope;
192
212
  if (d.args && typeof d.args === "object") meta.args = d.args as Taskflow["args"];
213
+ if (d.budget) meta.budget = d.budget;
214
+ if (typeof d.strictInterpolation === "boolean") meta.strictInterpolation = d.strictInterpolation;
193
215
  const nameOf = (fallback: string) => (typeof d.name === "string" && d.name.trim() ? d.name.trim() : fallback);
194
216
 
195
217
  // chain → sequential agent phases
@@ -228,20 +250,35 @@ export function desugar(def: unknown): Taskflow {
228
250
  export interface ValidationResult {
229
251
  ok: boolean;
230
252
  errors: string[];
253
+ /** Non-fatal issues the user should fix; e.g. `{steps.X}` references that
254
+ * aren't declared in `dependsOn` (the phase will run in parallel with its
255
+ * producer and see the literal placeholder). */
256
+ warnings: string[];
231
257
  }
232
258
 
233
- export function validateTaskflow(def: unknown): ValidationResult {
259
+ export interface ValidationOptions {
260
+ /** Resolved invocation args, used for runtime checks like missing `{args.X}`. */
261
+ args?: Record<string, unknown>;
262
+ /** Runtime working directory, used for mismatch warnings (e.g. cwd vs args.codebase). */
263
+ cwd?: string;
264
+ /** Override the flow's own `strictInterpolation` flag for this validation call. */
265
+ strict?: boolean;
266
+ }
267
+
268
+ export function validateTaskflow(def: unknown, opts: ValidationOptions = {}): ValidationResult {
234
269
  const errors: string[] = [];
270
+ const warnings: string[] = [];
235
271
 
236
272
  if (typeof def !== "object" || def === null) {
237
- return { ok: false, errors: ["Taskflow must be an object"] };
273
+ return { ok: false, errors: ["Taskflow must be an object"], warnings };
238
274
  }
239
275
  const flow = def as Partial<Taskflow>;
276
+ const strict = opts.strict ?? flow.strictInterpolation === true;
240
277
 
241
278
  if (!flow.name || typeof flow.name !== "string") errors.push("Missing or invalid 'name'");
242
279
  if (!Array.isArray(flow.phases) || flow.phases.length === 0) {
243
280
  errors.push("Taskflow must have at least one phase");
244
- return { ok: false, errors };
281
+ return { ok: false, errors, warnings };
245
282
  }
246
283
 
247
284
  const ids = new Set<string>();
@@ -318,7 +355,99 @@ export function validateTaskflow(def: unknown): ValidationResult {
318
355
  const finals = (flow.phases as Phase[]).filter((p) => p?.final);
319
356
  if (finals.length > 1) errors.push(`Only one phase may be marked 'final' (found ${finals.length})`);
320
357
 
321
- return { ok: errors.length === 0, errors };
358
+ // --- Soft warnings: {steps.X.*} references that aren't declared deps -------
359
+ // Catches the most common authoring mistake: the task talks about
360
+ // `{steps.review.output}` but `dependsOn: ["review"]` is missing, so the
361
+ // phase runs in parallel with `review` and the model sees the literal
362
+ // placeholder string. The runtime can't infer the intent.
363
+ if (errors.length === 0) {
364
+ const idToPhase = new Map((flow.phases as Phase[]).map((p) => [p.id, p]));
365
+ for (const p of flow.phases as Phase[]) {
366
+ if (!p?.id) continue;
367
+ const deps = new Set(dependenciesOf(p));
368
+ const refs = collectRefs(p);
369
+ for (const ref of refs.steps) {
370
+ if (ref === p.id) {
371
+ warnings.push(`Phase '${p.id}': references its own output via {steps.${ref}.*}; this is almost always a bug.`);
372
+ continue;
373
+ }
374
+ if (!idToPhase.has(ref)) {
375
+ // Unknown ref is already an error from the dependsOn check, but
376
+ // {steps.X.*} can appear in a task without dependsOn. Don't
377
+ // double-warn — the dependsOn loop above already flags it.
378
+ continue;
379
+ }
380
+ if (!deps.has(ref)) {
381
+ warnings.push(
382
+ `Phase '${p.id}': task references {steps.${ref}.*} but '${ref}' is not in dependsOn. ` +
383
+ `The phase will run in parallel with '${ref}' and see the literal placeholder. ` +
384
+ `Add "dependsOn": ["${ref}"] (or include '${ref}' transitively).`,
385
+ );
386
+ }
387
+ }
388
+ }
389
+ }
390
+
391
+ // --- Runtime/invocation warnings: missing args + cwd/codebase mismatch -----
392
+ if (errors.length === 0 && opts.args) {
393
+ const argRefs = new Set<string>();
394
+ for (const p of flow.phases as Phase[]) {
395
+ if (!p?.id) continue;
396
+ for (const ref of collectRefs(p).args) argRefs.add(ref);
397
+ }
398
+ for (const ref of argRefs) {
399
+ if (!(ref in opts.args)) {
400
+ warnings.push(
401
+ `Taskflow references {args.${ref}} but the invocation did not provide '${ref}'. ` +
402
+ `The placeholder will remain literal unless a default or runtime arg is supplied.`,
403
+ );
404
+ }
405
+ }
406
+ if (opts.cwd && typeof opts.args.codebase === "string" && opts.args.codebase.trim()) {
407
+ const cwd = path.resolve(opts.cwd);
408
+ const codebase = path.resolve(cwd, opts.args.codebase);
409
+ // Safe case: cwd is the codebase root or a subdirectory within it.
410
+ // Warn when cwd is a sibling, unrelated path, or a parent of the
411
+ // codebase (agents that rely on cwd would inspect too broad a tree).
412
+ if (!pathContains(codebase, cwd)) {
413
+ warnings.push(
414
+ `Invocation cwd '${cwd}' does not match args.codebase '${codebase}'. ` +
415
+ `Some agents may inspect the wrong repo if they rely on cwd. Prefer running from the codebase root or set phase.cwd explicitly.`,
416
+ );
417
+ }
418
+ }
419
+ }
420
+
421
+ if (strict && warnings.length) {
422
+ errors.push(...warnings.map((w) => `Strict interpolation: ${w}`));
423
+ }
424
+
425
+ return { ok: errors.length === 0, errors, warnings };
426
+ }
427
+
428
+ function collectRefs(phase: Phase): { steps: string[]; args: string[] } {
429
+ const steps = new Set<string>();
430
+ const args = new Set<string>();
431
+ const scan = (s: string | undefined) => {
432
+ if (!s) return;
433
+ let m: RegExpExecArray | null;
434
+ const stepRe = /\{steps\.([a-zA-Z0-9_-]+)/g;
435
+ while ((m = stepRe.exec(s)) !== null) steps.add(m[1]);
436
+ const argRe = /\{args\.([a-zA-Z0-9_-]+)/g;
437
+ while ((m = argRe.exec(s)) !== null) args.add(m[1]);
438
+ };
439
+ scan(phase.task);
440
+ scan(phase.over);
441
+ scan(phase.when);
442
+ for (const b of phase.branches ?? []) scan(b.task);
443
+ for (const v of Object.values(phase.with ?? {})) if (typeof v === "string") scan(v);
444
+ for (const c of phase.context ?? []) scan(c);
445
+ return { steps: Array.from(steps), args: Array.from(args) };
446
+ }
447
+
448
+ function pathContains(parent: string, child: string): boolean {
449
+ const rel = path.relative(parent, child);
450
+ return rel === "" || (!rel.startsWith("..") && !path.isAbsolute(rel));
322
451
  }
323
452
 
324
453
  /** Returns a cycle path if the DAG has one, else null. */
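The new soft-warning pass in `validateTaskflow` scans each phase for `{steps.X}` references. It flags two cases: a phase that references its own output, and a reference to a known phase that is missing from the phase's dependencies. When `strictInterpolation` is on, every warning is also copied into `errors` with a `Strict interpolation:` prefix.

The sketch below reproduces that logic for `task` strings only. `PhaseLike`, `stepRefs`, and `undeclaredRefWarnings` are hypothetical names. It treats a phase's dependencies as its direct `dependsOn` list, which is an assumption because `dependenciesOf` is not shown in the diff.

```typescript
// Hypothetical minimal phase shape for this sketch.
interface PhaseLike {
  id: string;
  task?: string;
  dependsOn?: string[];
}

// Same capture pattern as collectRefs: the id right after "{steps.".
function stepRefs(text: string): Set<string> {
  const refs = new Set<string>();
  const re = /\{steps\.([a-zA-Z0-9_-]+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    if (m[1]) refs.add(m[1]);
  }
  return refs;
}

function undeclaredRefWarnings(phases: PhaseLike[], strict = false): { errors: string[]; warnings: string[] } {
  const ids = new Set(phases.map((p) => p.id));
  const warnings: string[] = [];
  for (const p of phases) {
    const deps = new Set(p.dependsOn ?? []); // assumption: direct deps only
    for (const ref of stepRefs(p.task ?? "")) {
      if (ref === p.id) {
        warnings.push(`Phase '${p.id}': references its own output via {steps.${ref}.*}.`);
      } else if (ids.has(ref) && !deps.has(ref)) {
        warnings.push(`Phase '${p.id}': task references {steps.${ref}.*} but '${ref}' is not in dependsOn.`);
      }
    }
  }
  // Strict mode promotes every warning to an error, keeping the warning too.
  const errors = strict ? warnings.map((w) => `Strict interpolation: ${w}`) : [];
  return { errors, warnings };
}
```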
@@ -45,6 +45,12 @@ export interface PhaseState {
45
45
  budgetTruncated?: boolean;
46
46
  /** Human-in-the-loop outcome (approval phases only). */
47
47
  approval?: { decision: "approve" | "reject" | "edit"; note?: string; auto?: boolean };
48
+ /** Non-fatal diagnostic warnings accumulated during this phase (e.g.
49
+ * unresolved interpolation placeholders, suspicious templates). */
50
+ warnings?: string[];
51
+ /** Truncated previews of interpolated strings used to execute this phase,
52
+ * useful when diagnosing why a model saw a literal placeholder. */
53
+ interpolation?: Array<{ source: string; text: string; missing?: string[] }>;
48
54
  }
49
55
 
50
56
  export interface RunState {
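`PhaseState` now carries `warnings` and an `interpolation` array of `{ source, text, missing? }` records, described as truncated previews of the interpolated strings. To make that shape concrete, here is a hypothetical interpolator that fills `{steps.ID.output}` and `{args.NAME}` placeholders, leaves unresolved ones literal, and returns a record of that shape.

The function name, the supported placeholder forms, and the 200-character preview default are assumptions for illustration. They do not describe the runtime's actual interpolation code.

```typescript
// Record shape taken from the PhaseState.interpolation field in the diff.
interface InterpolationRecord {
  source: string;
  text: string;
  missing?: string[];
}

// Hypothetical interpolator: unresolved placeholders stay literal and are listed in `missing`.
function interpolate(
  source: string,
  template: string,
  outputs: Record<string, string>,
  argValues: Record<string, string>,
  previewLimit = 200,
): InterpolationRecord {
  const missing: string[] = [];
  const text = template.replace(
    /\{(steps\.([a-zA-Z0-9_-]+)\.output|args\.([a-zA-Z0-9_-]+))\}/g,
    (whole: string, _inner: string, stepId: string | undefined, argName: string | undefined) => {
      // Unmatched capture groups arrive as undefined, which tells us which form matched.
      const value =
        stepId !== undefined ? outputs[stepId] : argName !== undefined ? argValues[argName] : undefined;
      if (value === undefined) {
        missing.push(whole);
        return whole; // keep the literal placeholder, as the model would see it
      }
      return value;
    },
  );
  const preview = text.length > previewLimit ? text.slice(0, previewLimit) : text;
  return missing.length > 0 ? { source, text: preview, missing } : { source, text: preview };
}
```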
@@ -148,8 +154,48 @@ export function saveRun(state: RunState): void {
148
154
  }
149
155
 
150
156
  export function loadRun(cwd: string, runId: string): RunState | null {
157
+ const dir = runsDir(cwd);
158
+
159
+ // Reject runIds that could be used for path traversal or filesystem abuse.
160
+ // Legitimate runIds are produced by newRunId() and contain only
161
+ // [A-Za-z0-9._-]; anything else (empty string, path separators, NUL bytes,
162
+ // backslashes on POSIX, forward slashes on Windows) is suspicious.
163
+ if (
164
+ typeof runId !== "string" ||
165
+ runId.length === 0 ||
166
+ runId.includes("/") ||
167
+ runId.includes("\\") ||
168
+ runId.includes("\0")
169
+ ) {
170
+ return null;
171
+ }
172
+
173
+ const filePath = path.resolve(dir, `${runId}.json`);
174
+ // Reject runIds that would escape the runs directory (e.g. "../etc/passwd").
175
+ // Compare with a path-separator suffix so legitimate filenames like "..foo"
176
+ // (a name that just happens to start with two dots) are not false-positives.
177
+ const rel = path.relative(dir, filePath);
178
+ if (rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel)) return null;
179
+
180
+ // Resolve symlinks on both the runs dir and the file, so the containment
181
+ // check below is on a consistent physical path. Without normalizing `dir`,
182
+ // a legitimate run on macOS (where /var → /private/var) would compare a
183
+ // symlinked dir prefix to a real path and falsely flag traversal. A
184
+ // malicious file already placed inside the runs dir could otherwise also
185
+ // point at an arbitrary path on disk and bypass the lexical check above.
186
+ let realDir: string;
187
+ let realFilePath: string;
188
+ try {
189
+ realDir = fs.realpathSync(dir);
190
+ realFilePath = fs.realpathSync(filePath);
191
+ } catch {
192
+ return null;
193
+ }
194
+ const realRel = path.relative(realDir, realFilePath);
195
+ if (realRel === ".." || realRel.startsWith(`..${path.sep}`) || path.isAbsolute(realRel)) return null;
196
+
151
197
  try {
152
- const raw = fs.readFileSync(path.join(runsDir(cwd), `${runId}.json`), "utf-8");
198
+ const raw = fs.readFileSync(realFilePath, "utf-8");
153
199
  return JSON.parse(raw) as RunState;
154
200
  } catch {
155
201
  return null;
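The hardened `loadRun` applies a lexical guard before it touches the filesystem. It rejects empty ids and ids containing `/`, `\`, or NUL, then checks that `path.relative(dir, filePath)` does not climb out of the runs directory. Only after that does it re-check containment on `fs.realpathSync` paths.

The sketch below isolates the lexical part so it can run without files on disk. `lexicallySafeRunPath` is a hypothetical helper name. The symlink re-check from the diff is deliberately left out.

```typescript
import * as path from "node:path";

// Hypothetical helper: the lexical half of loadRun's guard (no realpath step).
function lexicallySafeRunPath(runsDir: string, runId: string): string | null {
  // Cheap rejections first: empty ids, either path separator, and NUL bytes.
  if (runId.length === 0 || runId.includes("/") || runId.includes("\\") || runId.includes("\0")) {
    return null;
  }
  const dir = path.resolve(runsDir);
  const filePath = path.resolve(dir, `${runId}.json`);
  const rel = path.relative(dir, filePath);
  // Compare against ".." plus a separator so a name like "..foo.json" is not
  // mistaken for a parent-directory escape.
  if (rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel)) return null;
  return filePath;
}
```

Because separators are rejected up front, the containment check mainly acts as a second line of defence. The `..${path.sep}` comparison is what keeps legitimate ids that merely start with two dots from being refused.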
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pi-taskflow",
3
- "version": "0.0.6",
3
+ "version": "0.0.7",
4
4
  "description": "Lightweight workflow orchestration for the Pi coding agent — declarative multi-phase taskflows with dynamic fan-out, isolated subagent context, resumable runs, and saveable commands.",
5
5
  "keywords": [
6
6
  "pi-package",
@@ -188,6 +188,85 @@ Review the audit results below. If any endpoint is missing auth, end with
188
188
  3. Reference upstream results explicitly with `{steps.ID...}` and set `dependsOn`.
189
189
  4. Mark the result-bearing phase with `"final": true` (else the last phase wins).
190
190
 
191
+ ## Common mistakes (the runtime warns about these, but avoid them anyway)
192
+
193
+ The runtime validates your flow at startup and at each phase's interpolation.
194
+ Two patterns account for most of the broken runs seen in practice; avoid them. If you
195
+ want warnings like these to become hard failures, set `"strictInterpolation": true`
196
+ on the flow.
197
+
198
+ ### 1. Referencing `{steps.X}` without `dependsOn: ["X"]`
199
+
200
+ ```jsonc
201
+ // ❌ WRONG — 'fix-issues' will run in parallel with 'code-review-1' and see the
202
+ // literal string "{steps.code-review-1.output}" instead of the review text.
203
+ {
204
+ "id": "code-review-1", "type": "agent", "task": "review code"
205
+ },
206
+ {
207
+ "id": "fix-issues", "type": "agent",
208
+ "task": "fix {steps.code-review-1.output}" // ← no dependsOn!
209
+ }
210
+ ```
211
+
212
+ The runtime logs a warning at run start (`Phase 'fix-issues': task references
213
+ {steps.code-review-1.*} but 'code-review-1' is not in dependsOn`) and the phase
214
+ itself gets a `warnings` field with a non-fatal `unresolved placeholders` line.
215
+ The TUI shows a `⚠N` badge. **Always declare the chain:**
216
+
217
+ ```jsonc
218
+ // ✅ RIGHT
219
+ {
220
+ "id": "code-review-1", "type": "agent", "task": "review code"
221
+ },
222
+ {
223
+ "id": "fix-issues", "type": "agent",
224
+ "task": "fix {steps.code-review-1.output}",
225
+ "dependsOn": ["code-review-1"] // ← declared
226
+ },
227
+ {
228
+ "id": "code-review-2", "type": "agent",
229
+ "task": "re-review {steps.fix-issues.output}",
230
+ "dependsOn": ["fix-issues"]
231
+ }
232
+ ```
233
+
234
+ Tip: write the `task` first (it tells you what each phase needs), then scan for
235
+ `{steps.*}` references and add the matching `dependsOn`. Ignore the warning only
236
+ when the reference is not a real dependency, i.e. the phase does not need that step's output.
237
+
238
+ ### 2. Assuming the runtime knows "this is a chain"
239
+
240
+ Phase order in the `phases` array is **documentation, not execution order**.
241
+ The DAG comes from `dependsOn`. If you list `code-review-1`, `fix-issues`,
242
+ `code-review-2`, `fix-final` in that order with no `dependsOn`, the runtime
243
+ treats them as four independent phases and runs all of them in **layer 0** in
244
+ parallel. A phase that finishes first may not be the one you expected.
245
+
246
+ ```jsonc
247
+ // ❌ This is not a chain — it's 4 parallel phases, all racing.
248
+ "phases": [
249
+ { "id": "code-review-1", ... },
250
+ { "id": "fix-issues", ... },
251
+ { "id": "code-review-2", ... },
252
+ { "id": "fix-final", ... }
253
+ ]
254
+ ```
255
+
256
+ Use the shorthand if you literally just want `a → b → c → d`:
257
+
258
+ ```jsonc
259
+ { "chain": [
260
+ { "agent": "reviewer", "task": "review code" },
261
+ { "agent": "executor", "task": "fix {previous.output}" },
262
+ { "agent": "reviewer", "task": "re-review" },
263
+ { "agent": "executor", "task": "apply final fixes" }
264
+ ] }
265
+ ```
266
+
267
+ …or write the full DAG with explicit `dependsOn` (so reviewers/fixers can run
268
+ in parallel against multiple review streams when you want that).
269
+
191
270
  ## Configuration
192
271
 
193
272
  For the full set of knobs — per-phase `model`/`thinking`/`tools`/`cwd`, the
@@ -197,7 +276,7 @@ variables, and storage paths — read `configuration.md` (next to this file).
197
276
 
198
277
  Quick reference:
199
278
 
200
- - **Flow:** `name`, `description`, `concurrency` (default 8), `budget` (`maxUSD`/`maxTokens`), `agentScope` (user|project|both), `args`.
279
+ - **Flow:** `name`, `description`, `concurrency` (default 8), `budget` (`maxUSD`/`maxTokens`), `agentScope` (user|project|both), `args`, `strictInterpolation`.
201
280
  - **Phase:** `model`, `thinking`, `tools` (whitelist), `cwd`, `output:"json"`, `concurrency` (map/parallel fan-out), `when`, `join` (all|any), `retry`, `use`/`with` (flow), `final`.
202
281
  - **Precedence (model/thinking/tools):** phase value → `settings.subagents.agentOverrides[agent]` → agent frontmatter → global/default.
203
282
  - **Concurrency:** same-layer phases use `flow.concurrency`; a `map`/`parallel` phase uses `phase.concurrency ?? flow.concurrency ?? 8`.
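"Layer" in the concurrency rule and in the common-mistakes section above means a DAG level. Layer 0 holds every phase with no `dependsOn`, and each later layer holds phases whose dependencies all sit in earlier layers. The sketch below computes these levels Kahn-style to show why four phases without `dependsOn` all land in layer 0, while a declared chain yields one phase per layer.

The runtime's real scheduler is not part of this diff. `DagPhase` and `layers` are hypothetical, and the sketch assumes the flow has already passed validation (known ids, no cycles).

```typescript
// Hypothetical minimal phase shape: only the DAG edges matter here.
interface DagPhase {
  id: string;
  dependsOn?: string[];
}

// Groups phases into layers; every phase in a layer depends only on earlier layers.
function layers(phases: DagPhase[]): string[][] {
  const placed = new Set<string>();
  const out: string[][] = [];
  let remaining = phases.slice();
  while (remaining.length > 0) {
    // Ready = all deps already placed in an earlier layer (not this one).
    const ready = remaining.filter((p) => (p.dependsOn ?? []).every((d) => placed.has(d)));
    if (ready.length === 0) throw new Error("cycle or unknown dependency");
    out.push(ready.map((p) => p.id));
    for (const p of ready) placed.add(p.id);
    remaining = remaining.filter((p) => !placed.has(p.id));
  }
  return out;
}
```

Within one layer, at most `flow.concurrency` phases would run at once under the rule above. Array order only decides the order of ids inside a layer, never which layer a phase belongs to.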
@@ -86,7 +86,6 @@ Keys of each object in `phases[]`. Some only apply to specific `type`s.
86
86
  | `cwd` | all | flow cwd | Run this phase's subagent in a different directory. |
87
87
  | `concurrency` | map, parallel | flow concurrency | Fan-out cap for this phase only. See §4. |
88
88
  | `final` | all | last phase | Exactly one phase may be `final`; its output is returned. |
89
- | `optional` | all | `false` | ⚠️ Declared in schema but **not yet enforced** — a failed phase still skips downstream. |
90
89
 
91
90
  ---
92
91
 
@@ -270,6 +269,5 @@ Taskflow shares the subagent settings file at `~/.pi/agent/settings.json`:
270
269
  These keys validate but the runtime does **not** act on them yet — don't rely on
271
270
  them for behavior:
272
271
 
273
- - `phase.optional` — a failed phase still marks downstream phases as skipped.
274
272
  - `arg.required` — missing required args are not rejected.
275
273
  - `flow.version` — informational only.