pi-taskflow 0.0.20 → 0.0.22

package/CHANGELOG.md CHANGED
@@ -2,6 +2,29 @@
2
2
 
3
3
  All notable changes to pi-taskflow are documented here. This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) format.
4
4
 
5
+ ## [0.0.22] — 2026-06-10
6
+
7
+ > Dogfooding release. The `dogfood-full` self-audit taskflow (which itself
8
+ > exercises all 9 phase types + when/join/retry/budget/cache/eval/flow-def/
9
+ > loop/tournament/approval) ran against the codebase and surfaced these fixes.
10
+
11
+ ### Added
12
+ - **Live auto-refresh for the `/tf runs` panel.** The run-history panel was a static snapshot taken when opened, so a background (detached) run's progress never updated while watching. It now polls run state on a 1s interval and re-renders only when a run's status/`updatedAt` actually changes — phase progress (including `map`/`parallel` `subProgress` like `24/24`) updates live. The user's selection follows the same `runId` across refreshes, a green `● live` tag shows while any run is running, and the refresh timer is cleared on close (`dispose()`) and `unref`'d so it never keeps the event loop alive. Fully backward-compatible: without live hooks the panel renders statically as before.
13
+ - 5 new tests (`test/runs-view.test.ts`): refresh-on-change, no-render-when-unchanged, dispose-stops-timer, selection-follows-runId, back-compat-no-hooks.
14
+
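The refresh-on-change and selection-follows-`runId` behaviour described in the entry above can be sketched as follows. This is a minimal illustration rather than the package's code: `Run` is a hypothetical, trimmed-down stand-in for the real run state, and `reselect` is an invented helper name.

```typescript
// Hypothetical, trimmed-down run record; the real run state carries more fields.
interface Run {
  runId: string;
  status: "running" | "done" | "failed" | "paused";
  updatedAt: number;
}

// True when the refreshed list differs in membership, status, or updatedAt.
function hasChanged(prev: Run[], next: Run[]): boolean {
  if (prev.length !== next.length) return true;
  const byId = new Map(prev.map((r) => [r.runId, r] as const));
  for (const n of next) {
    const p = byId.get(n.runId);
    if (!p) return true;
    if (p.status !== n.status || p.updatedAt !== n.updatedAt) return true;
  }
  return false;
}

// Keep the cursor on the same runId after a refresh; clamp if that run is gone.
function reselect(prev: Run[], selected: number, next: Run[]): number {
  const selectedId = prev[selected]?.runId;
  const idx = next.findIndex((r) => r.runId === selectedId);
  return idx >= 0 ? idx : Math.min(selected, next.length - 1);
}

const before: Run[] = [
  { runId: "a", status: "running", updatedAt: 1 },
  { runId: "b", status: "done", updatedAt: 1 },
];
const after: Run[] = [
  { runId: "c", status: "running", updatedAt: 3 },
  { runId: "a", status: "running", updatedAt: 2 },
  { runId: "b", status: "done", updatedAt: 1 },
];

console.log(hasChanged(before, before)); // false
console.log(hasChanged(before, after)); // true
console.log(reselect(before, 0, after)); // 1
```

Comparing only `status` and `updatedAt` keeps each poll cheap, which matters when the check runs once per second for as long as the panel is open.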
15
+ ### Fixed
16
+ - **`safeParse` now prefers a `json`-tagged fence in multi-fence output.** When an LLM phase emitted an evidence block (e.g. ```` ```typescript ````) *before* the ```` ```json ```` payload, the old single-match regex grabbed the first fence, failed to parse, and the balanced-bracket fallback was misled by braces in the prose — `safeParse` returned `undefined` and any downstream `map` phase failed with `'over' did not resolve to an array`. It now scans every fenced block and tries `json`-tagged ones first, then untagged. (3 new multi-fence tests.)
17
+ - **Unresolved interpolation refs are surfaced as phase warnings.** `interpolate()` returns `missing[]` (placeholders with no source), but the runtime discarded it on the main task path — so `{args.typo}` or a `{steps.x.output}` without `dependsOn` was silently left intact in the dispatched task. The `interpolate.ts` doc comment promised "a recorded warning" that no code produced. The runtime now logs `[taskflow] phase X: unresolved refs ...` and attaches the message to `PhaseState.warnings` (persisted in the run record, visible in `/tf runs`). Doc comment corrected to match.
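The multi-fence `safeParse` fix above can be illustrated with a small standalone sketch. It uses the same fence-scanning idea (collect every fenced block, try `json`-tagged blocks first, then the rest), but `parseFenced` is a hypothetical helper, not the package's `safeParse`, and it omits the direct-parse and balanced-bracket steps. The fence marker is built programmatically so the example can itself sit inside a fenced block.

```typescript
// Three backticks, assembled at runtime to avoid a literal fence in this example.
const FENCE = "`".repeat(3);

// Hypothetical helper: return the first fenced block that parses as JSON,
// trying json-tagged fences before untagged or other-language ones.
function parseFenced(text: string): unknown {
  const fenceRe = new RegExp(FENCE + "(\\w*)[ \\t]*\\r?\\n?([\\s\\S]*?)" + FENCE, "g");
  const blocks: { lang: string; body: string }[] = [];
  let m: RegExpExecArray | null;
  while ((m = fenceRe.exec(text)) !== null) {
    blocks.push({ lang: (m[1] ?? "").toLowerCase(), body: (m[2] ?? "").trim() });
  }
  const ordered = [
    ...blocks.filter((b) => b.lang === "json"),
    ...blocks.filter((b) => b.lang !== "json"),
  ];
  for (const block of ordered) {
    try {
      return JSON.parse(block.body);
    } catch {
      // not valid JSON; try the next fence
    }
  }
  return undefined;
}

// An evidence block precedes the JSON payload, as in the failure described above.
const llmOutput = [
  "Evidence:",
  FENCE + "typescript",
  "const cfg = { retries: 2 };",
  FENCE,
  "Result:",
  FENCE + "json",
  '["a.ts", "b.ts"]',
  FENCE,
].join("\n");

console.log(parseFenced(llmOutput)); // the array from the json-tagged fence
```

With a first-match-only regex, the `typescript` block would be tried alone and the payload never reached. Ordering by tag lets the payload win regardless of position.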
18
+
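The unresolved-ref warning described above can be sketched end to end. The interpolator here is a deliberately simplified assumption that resolves only `{args.name}` placeholders, and both function names are hypothetical. The real `interpolate()` also handles `steps`, `previous` and `item` references.

```typescript
// Hypothetical, simplified interpolator: resolves {args.name} only and
// reports every placeholder it could not resolve in `missing`.
function interpolateArgs(
  template: string,
  args: Record<string, string>,
): { text: string; missing: string[] } {
  const missing: string[] = [];
  const text = template.replace(/\{([\w.]+)\}/g, (whole: string, ref: string) => {
    const key = /^args\.(\w+)$/.exec(ref)?.[1];
    const value = key === undefined ? undefined : args[key];
    if (value !== undefined) return value;
    missing.push(ref); // no source: leave the placeholder intact
    return whole;
  });
  return { text, missing };
}

// Same idea as the runtime's warning: de-duplicate the refs, then format one message.
function unresolvedWarning(missing: string[]): string | undefined {
  if (missing.length === 0) return undefined;
  const unique = Array.from(new Set(missing));
  return `unresolved refs in task: ${unique.map((r) => `{${r}}`).join(", ")}`;
}

const interp = interpolateArgs("Review {args.file} for {args.typo} and {args.typo}", { file: "a.ts" });
console.log(interp.text); // Review a.ts for {args.typo} and {args.typo}
console.log(unresolvedWarning(interp.missing)); // unresolved refs in task: {args.typo}
```

The task still runs with the placeholder left in place. The warning only makes the typo visible instead of silent.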
19
+ ## [0.0.21] — 2026-06-10
20
+
21
+ ### Added
22
+ - **Per-step context pre-read in shorthand modes.** Single, chain, and tasks shorthand steps now accept `context` (file paths) and `contextLimit`, desugared directly onto the generated phases. This eliminates `O(N²)` file exploration without writing the full DSL. In parallel `tasks` mode all branches share the deduped union of step contexts; chain steps each carry their own context. A top-level `context` in chain mode produces a warning, because chain mode has no flow-level default context. Context-file changes automatically invalidate phase caches.
23
+
24
+ ### Fixed
25
+ - **Headless approval safety.** Approval phases now auto-reject (not auto-approve) when running in detached/background/CI mode, preventing silent bypass of human gates.
26
+ - **Step-reference validator accepts transitive ancestors.** The step-reference checker previously raised false positives on valid DAGs where a referenced step was an indirect (multi-level) ancestor rather than a direct dependency. The validator now resolves the full transitive closure of ancestors.
27
+
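The step-reference fix above amounts to checking a reference against every transitive ancestor, not only the direct dependencies. A minimal sketch follows, assuming a hypothetical `PhaseDef` shape with `id` and `dependsOn`. The validator's real types and function names do not appear in this diff.

```typescript
// Hypothetical phase shape, for illustration only.
interface PhaseDef {
  id: string;
  dependsOn?: string[];
}

// Collect every phase reachable from `id` by following dependsOn edges.
function ancestorsOf(id: string, phases: PhaseDef[]): Set<string> {
  const byId = new Map(phases.map((p) => [p.id, p] as const));
  const seen = new Set<string>();
  const stack: string[] = [...(byId.get(id)?.dependsOn ?? [])];
  while (stack.length > 0) {
    const cur = stack.pop() as string;
    if (seen.has(cur)) continue; // already visited (also guards against cycles)
    seen.add(cur);
    stack.push(...(byId.get(cur)?.dependsOn ?? []));
  }
  return seen;
}

const demoPhases: PhaseDef[] = [
  { id: "plan" },
  { id: "build", dependsOn: ["plan"] },
  { id: "review", dependsOn: ["build"] },
];

// `review` may reference {steps.plan.output}: plan is a transitive ancestor.
console.log(ancestorsOf("review", demoPhases).has("plan")); // true
console.log(ancestorsOf("plan", demoPhases).has("review")); // false
```

A checker that looked only at `review`'s own `dependsOn` would reject the reference to `plan`, which is the false positive the entry describes.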
5
28
  ## [0.0.20] — 2026-06-10
6
29
 
7
30
  ### Added
package/README.md CHANGED
@@ -8,7 +8,7 @@
8
8
  <a href="./LICENSE"><img src="https://img.shields.io/badge/license-MIT-43D9AD?style=flat-square" alt="MIT license"></a>
9
9
  <a href="#whats-inside"><img src="https://img.shields.io/badge/runtime%20deps-0-43D9AD?style=flat-square" alt="zero runtime dependencies"></a>
10
10
  <a href="https://github.com/heggria/pi-taskflow/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/heggria/pi-taskflow/ci.yml?branch=main&style=flat-square&label=CI" alt="CI status"></a>
11
- <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-535-6E8BFF?style=flat-square" alt="535 tests"></a>
11
+ <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-608-6E8BFF?style=flat-square" alt="608 tests"></a>
12
12
  <a href="#whats-inside"><img src="https://img.shields.io/badge/dogfooded-%E2%9C%93-43D9AD?style=flat-square" alt="dogfooded"></a>
13
13
  <a href="https://pi.dev"><img src="https://img.shields.io/badge/for-Pi%20coding%20agent-B692FF?style=flat-square" alt="for the Pi coding agent"></a>
14
14
  </p>
@@ -304,7 +304,7 @@ Flow-level keys: `name`, `description`, `args`, `concurrency` (default 8), `agen
304
304
  - **`when`** — skip a phase unless an expression is truthy. Supports `{refs}`, `== != < > <= >=`, `&& || !`, parentheses, and quoted strings/numbers. Pair with `join: "any"` on the merge phase for real if/else routing. Parse errors **fail open**.
305
305
  - **`join: "any"`** — an OR-join: the phase runs as soon as *one* dependency completes (default `"all"` waits for all).
306
306
  - **`retry`** — `{ "max": 2, "backoffMs": 500, "factor": 2 }` retries a failing subagent with fixed or exponential backoff; usage is summed and the attempt count shows as `↻N` in the TUI. Transient provider errors (rate-limit / 5xx / timeout) **auto-retry even without an explicit policy**; hard errors don't.
307
- - **`approval`** — pause for a human (Approve / Reject / Edit). Reject halts the flow; Edit injects the typed note as the phase output for downstream steps. Non-interactive runs auto-approve.
307
+ - **`approval`** — pause for a human (Approve / Reject / Edit). Reject halts the flow; Edit injects the typed note as the phase output for downstream steps. Non-interactive runs auto-reject (safety: approval gates are never bypassed).
308
308
  - **`flow`** — `{ "type": "flow", "use": "deep-research", "with": { "topic": "{item}" } }` runs a **saved** flow as a phase (recursion is detected and rejected). Or **generate the sub-flow at runtime**: `{ "type": "flow", "def": "{steps.plan.json}" }` resolves an upstream phase's JSON output into a sub-flow, **validates it (cycles / dangling refs / duplicate ids), then runs it** — the number and shape of the generated phases is decided at runtime, not authored in advance. A malformed plan fails *open* (the phase is skipped with a `defError`, the run continues). This is how a planner decides *at runtime* what work to spawn — the declarative answer to a code-mode `for` loop, with each generated plan checked before it spends a token. Pair it with `loop` for **data-dependent iterative replanning** (round N's plan depends on round N-1's result). See [`examples/dynamic-plan-execute.json`](./examples/dynamic-plan-execute.json) and [`examples/iterative-replan.json`](./examples/iterative-replan.json).
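How the three `retry` fields might combine into wait times can be sketched as below. The README gives the policy shape but not the formula, so the schedule here is an assumption: the n-th retry waits `backoffMs * factor^(n-1)`, and a missing `factor` is treated as 1 (fixed backoff). `retryDelays` is a hypothetical helper.

```typescript
// Policy shape taken from the README example; the delay formula is an assumption.
interface RetryPolicy {
  max: number;
  backoffMs: number;
  factor?: number;
}

// Assumed schedule: retry n (1-based) waits backoffMs * factor^(n-1).
function retryDelays(policy: RetryPolicy): number[] {
  const factor = policy.factor ?? 1;
  return Array.from({ length: policy.max }, (_, i) => policy.backoffMs * Math.pow(factor, i));
}

console.log(retryDelays({ max: 2, backoffMs: 500, factor: 2 })); // [ 500, 1000 ]
console.log(retryDelays({ max: 3, backoffMs: 500 })); // [ 500, 500, 500 ]
```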
309
309
 
310
310
  ### Loop-until-done (`loop`)
@@ -434,7 +434,7 @@ Resume is keyed on each phase's input hash — if an upstream output changed, de
434
434
  ```
435
435
  .pi/taskflows/<name>.json # project-scoped definitions (commit to share)
436
436
  ~/.pi/agent/taskflows/<name>.json # user-scoped definitions
437
- .pi/taskflows/runs/<runId>.json # run state for resume (gitignore this)
437
+ .pi/taskflows/runs/<flowName>/<runId>.json # run state for resume (gitignore this)
438
438
  ```
439
439
 
440
440
  > Commit `.pi/taskflows/` and your whole team shares the pipelines — no config sync, no onboarding doc. Run state is written atomically and guarded by a zero-dependency file lock, so concurrent runs never corrupt the index.
@@ -608,12 +608,12 @@ Copy one into `.pi/taskflows/<name>.json` (or `~/.pi/agent/taskflows/`) and it r
608
608
 
609
609
  <div align="center">
610
610
 
611
- **0 runtime dependencies** · **535 tests** · **9 phase types** · **cross-session resume** · **cross-run memoization** · **~5.4k LOC runtime**
611
+ **0 runtime dependencies** · **608 tests** · **9 phase types** · **cross-session resume** · **cross-run memoization** · **~7.7k LOC runtime**
612
612
 
613
613
  </div>
614
614
 
615
615
  - **Zero runtime dependencies.** No `dependencies` field — the runtime is built entirely on Node built-ins (`fs` / `path` / `os` / `child_process` / `crypto`). The file lock is `fs.openSync("wx")`, not a third-party library.
616
- - **535 tests across 21 test files** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, callback isolation, the idle watchdog, model-role init config, and parseModelFromLabel with parenthesized-model-name regression.
616
+ - **608 tests across 26 test files** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, live run-history refresh, callback isolation, the idle watchdog, model-role init config, and parseModelFromLabel with parenthesized-model-name regression.
617
617
  - **Hardened by design.** Path-traversal defense (lexical + `realpath`), runId validation, HTML/error sanitization, atomic writes, stale-lock stealing via `rename`, and an idle watchdog that kills wedged subagents.
618
618
  - **Dogfooded.** Every new feature has to survive the project's own `self-improve` taskflow before it ships.
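The `fs.openSync("wx")` lock mentioned in the list above relies on exclusive creation: opening with the `wx` flag fails with `EEXIST` if the file already exists, so only one caller can create the lock file. A minimal sketch of that idea, assuming hypothetical `tryLock` and `unlock` helpers and leaving out the stale-lock stealing the README also describes:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Try to take the lock by creating the file exclusively.
// Returns false if another holder already created it.
function tryLock(lockPath: string): boolean {
  try {
    const fd = fs.openSync(lockPath, "wx");
    fs.writeSync(fd, String(process.pid)); // record the owner for debugging
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return false;
    throw err; // any other error is unexpected
  }
}

// Release the lock; a missing file is not an error.
function unlock(lockPath: string): void {
  fs.rmSync(lockPath, { force: true });
}

const lockDir = fs.mkdtempSync(path.join(os.tmpdir(), "lock-demo-"));
const lockPath = path.join(lockDir, "index.lock");

const firstTry = tryLock(lockPath); // lock is free
const secondTry = tryLock(lockPath); // lock is already held
unlock(lockPath);
const thirdTry = tryLock(lockPath); // free again after release
unlock(lockPath);
fs.rmSync(lockDir, { recursive: true, force: true });

console.log(firstTry, secondTry, thirdTry); // true false true
```

Because the check and the creation happen in one system call, two processes racing for the lock cannot both succeed, and no third-party locking library is needed.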
619
619
 
@@ -637,7 +637,7 @@ Our `self-improve` flow is a 10-phase DAG — it audits the codebase, patches de
637
637
 
638
638
  ## Status & limits
639
639
 
640
- **v0.0.17** — loop-until-done (`loop` phase: iterate to a condition, convergence, or cap), tournament (best-of-N with a judge), cross-run memoization (content-addressed cache with git/file/glob/env fingerprints and TTL), interactive `/tf init` with role-aware model pickers + diff preview + atomic merge-write, configurable built-in agents, 18 built-in agents with 6 model roles. Full control-flow & reliability layer (`when` guards, `join: any`, `retry`/backoff, `approval`, `flow` composition, `budget` caps, idle watchdog) on top of the DSL + DAG runtime (`agent`/`parallel`/`map`/`gate`/`reduce`). Inline + saved flows, cross-session resume, live progress, and isolated context. A run executes as one streaming tool call.
640
+ **v0.0.20** — loop-until-done (`loop` phase: iterate to a condition, convergence, or cap), tournament (best-of-N with a judge), cross-run memoization (content-addressed cache with git/file/glob/env fingerprints and TTL), interactive `/tf init` with role-aware model pickers + diff preview + atomic merge-write, configurable built-in agents, 18 built-in agents with 6 model roles. Full control-flow & reliability layer (`when` guards, `join: any`, `retry`/backoff, `approval`, `flow` composition, `budget` caps, idle watchdog) on top of the DSL + DAG runtime (`agent`/`parallel`/`map`/`gate`/`reduce`). Inline + saved flows, cross-session resume, live progress, and isolated context. A run executes as one streaming tool call.
641
641
 
642
642
  Known boundaries (tracked, bounded — no surprises mid-flow):
643
643
 
@@ -74,6 +74,7 @@ export interface AgentConfig {
74
74
  filePath: string;
75
75
  }
76
76
 
77
+ /** @internal */
77
78
  export interface AgentDiscoveryResult {
78
79
  agents: AgentConfig[];
79
80
  projectAgentsDir: string | null;
@@ -224,7 +225,13 @@ export interface SubagentSettings {
224
225
  * E.g. `{{fast}}` → `openrouter/deepseek/deepseek-v4-flash` if modelRoles.fast is set.
225
226
  * Returns undefined if the value is not a role reference or the role is unmapped.
226
227
  */
227
- export function resolveModelRole(model: string | undefined, roles?: Record<string, string>): string | undefined {
228
+ /**
229
+ * Resolve `{{roleName}}` model references against a role→model mapping.
230
+ * E.g. `{{fast}}` → `openrouter/deepseek/deepseek-v4-flash` if modelRoles.fast is set.
231
+ * Returns undefined if the value is not a role reference or the role is unmapped.
232
+ * @internal
233
+ */
234
+ function resolveModelRole(model: string | undefined, roles?: Record<string, string>): string | undefined {
228
235
  if (!model || !roles) return model;
229
236
  const match = model.match(/^\{\{(\w+)\}\}$/);
230
237
  if (!match) return model;
@@ -6,11 +6,15 @@
6
6
  * before deciding. Every line is padded to the full dialog width so the
7
7
  * overlay composites cleanly (no see-through, no ghosting in scrollback).
8
8
  *
9
- * While the dialog is open, SGR mouse reporting is enabled so the wheel
10
- * scrolls the viewport instead of the terminal scrollback. It is restored
11
- * on dispose.
9
+ * Mouse tracking is intentionally NOT used here. Enabling terminal-level
10
+ * SGR mouse reporting (DECSET 1000h/1006h) to capture wheel events would
11
+ * interfere with the terminal's native scrollback after the dialog closes,
12
+ * because the restore sequence depends on the overlay framework reliably
13
+ * calling dispose — which is not guaranteed across all lifecycle paths.
14
+ * Keyboard scrolling (↑↓/PgUp/PgDn/Home/End/j/k/g/G) covers the same
15
+ * ground without risking a stuck mouse-tracking mode.
12
16
  *
13
- * Keys: wheel/↑↓ scroll · PgUp/PgDn page · Home/End jump ·
17
+ * Keys: ↑↓ scroll · PgUp/PgDn page · Home/End jump ·
14
18
  * a/Enter approve · e edit (guidance) · r/Esc reject.
15
19
  */
16
20
 
@@ -28,31 +32,16 @@ export interface ApprovalViewOptions {
28
32
  upstream?: string;
29
33
  }
30
34
 
31
- /** Minimal writer used to toggle terminal mouse reporting. */
32
- export interface TerminalWriter {
33
- write(data: string): void;
34
- }
35
-
36
35
  const FALLBACK_ROWS = 24;
37
- /** Wheel ticks scroll this many lines. */
38
- const WHEEL_STEP = 3;
39
- /** SGR mouse sequence: ESC [ < B ; X ; Y (M|m) */
40
- const MOUSE_SGR = /^\x1b\[<(\d+);(\d+);(\d+)([Mm])$/;
41
- /** Enable basic mouse tracking + SGR encoding. */
42
- const MOUSE_ON = "\x1b[?1000h\x1b[?1006h";
43
- /** Restore: disable SGR encoding + mouse tracking. */
44
- const MOUSE_OFF = "\x1b[?1006l\x1b[?1000l";
45
36
 
46
37
  export class ApprovalViewComponent {
47
38
  private theme: Theme;
48
39
  private opts: ApprovalViewOptions;
49
40
  private onDone: (choice: ApprovalChoice) => void;
50
41
  private getRows: () => number;
51
- private term?: TerminalWriter;
52
42
  private scrollOffset = 0;
53
43
  private cachedWidth?: number;
54
44
  private cachedBody?: string[];
55
- private mouseEnabled = false;
56
45
  private decided = false;
57
46
 
58
47
  constructor(
@@ -60,43 +49,19 @@ export class ApprovalViewComponent {
60
49
  opts: ApprovalViewOptions,
61
50
  onDone: (choice: ApprovalChoice) => void,
62
51
  getRows?: () => number,
63
- term?: TerminalWriter,
64
52
  ) {
65
53
  this.theme = theme;
66
54
  this.opts = opts;
67
55
  this.onDone = onDone;
68
56
  this.getRows = getRows ?? (() => FALLBACK_ROWS);
69
- this.term = term;
70
- this.enableMouse();
71
- }
72
-
73
- private enableMouse(): void {
74
- if (this.term && !this.mouseEnabled) {
75
- try {
76
- this.term.write(MOUSE_ON);
77
- this.mouseEnabled = true;
78
- } catch {
79
- // non-tty / closed stream — wheel support is best-effort
80
- }
81
- }
82
57
  }
83
58
 
84
- /** Restore terminal mouse state. Idempotent; call from the overlay's dispose. */
85
- dispose(): void {
86
- if (this.term && this.mouseEnabled) {
87
- this.mouseEnabled = false;
88
- try {
89
- this.term.write(MOUSE_OFF);
90
- } catch {
91
- // ignore
92
- }
93
- }
94
- }
59
+ /** No-op kept for compatibility with Pi TUI overlay dispose contract. */
60
+ dispose(): void {}
95
61
 
96
62
  private decide(choice: ApprovalChoice): void {
97
63
  if (this.decided) return;
98
64
  this.decided = true;
99
- this.dispose();
100
65
  this.onDone(choice);
101
66
  }
102
67
 
@@ -155,17 +120,6 @@ export class ApprovalViewComponent {
155
120
  }
156
121
 
157
122
  handleInput(data: string): void {
158
- // Mouse events (SGR) — wheel scrolls, everything else is swallowed.
159
- const mouse = MOUSE_SGR.exec(data);
160
- if (mouse) {
161
- const b = Number(mouse[1]);
162
- if (b & 64) {
163
- // Wheel: low two bits 0 = up, 1 = down.
164
- if ((b & 3) === 0) this.clampScroll(-WHEEL_STEP);
165
- else if ((b & 3) === 1) this.clampScroll(WHEEL_STEP);
166
- }
167
- return;
168
- }
169
123
  // Decisions
170
124
  if (matchesKey(data, "return") || data === "a" || data === "y") {
171
125
  this.decide("approve");
@@ -251,7 +205,7 @@ export class ApprovalViewComponent {
251
205
 
252
206
  // Key hints
253
207
  lines.push(this.hrule(width, "├", "┤"));
254
- const scrollHint = cap > 0 ? "wheel/↑↓/PgUp/PgDn scroll · " : "";
208
+ const scrollHint = cap > 0 ? "↑↓/PgUp/PgDn scroll · " : "";
255
209
  lines.push(this.row(th.fg("dim", `${scrollHint}a/Enter approve · e edit · r/Esc reject`), width));
256
210
  lines.push(this.hrule(width, "╰", "╯"));
257
211
  return lines;
@@ -135,6 +135,7 @@ export function resolveFingerprint(entries: string[] | undefined, cwd: string):
135
135
  // Cross-run cache store
136
136
  // ---------------------------------------------------------------------------
137
137
 
138
+ /** @internal */
138
139
  export interface CacheEntry {
139
140
  /** The full cache key (== phase inputHash incl. fingerprint). */
140
141
  key: string;
@@ -56,8 +56,8 @@ try {
56
56
  agents,
57
57
  globalThinking: settings.globalThinking,
58
58
  persist: (s) => saveRun(s, cleanupConfig),
59
- // No requestApproval — approval phases auto-reject in detached mode
60
- // (fail-open: phase records the rejection, run continues).
59
+ // No requestApproval — approval phases auto-reject in detached/CI mode
60
+ // (safety: approval gates are never bypassed; the run records the rejection).
61
61
  loadFlow: (name: string) => getFlow(ctx.cwd, name)?.def,
62
62
  });
63
63
 
@@ -59,6 +59,15 @@ const ShorthandStep = Type.Object(
59
59
  {
60
60
  agent: Type.Optional(Type.String({ description: "Agent for this step (defaults to the first available agent)" })),
61
61
  task: Type.String({ description: "Task prompt for this step (supports {previous.output} in chains)" }),
62
+ context: Type.Optional(
63
+ Type.Array(Type.String(), {
64
+ description:
65
+ "File paths to pre-read and inject before this step's task (same as Phase.context). In parallel `tasks` mode all branches SHARE the union of step contexts.",
66
+ }),
67
+ ),
68
+ contextLimit: Type.Optional(
69
+ Type.Number({ description: "Max characters to read per context file (default 8000)." }),
70
+ ),
62
71
  },
63
72
  { additionalProperties: false },
64
73
  );
@@ -82,6 +91,15 @@ const TaskflowParams = Type.Object({
82
91
  task: Type.Optional(
83
92
  Type.String({ description: "Shorthand single mode: the task prompt (like subagent single mode)" }),
84
93
  ),
94
+ context: Type.Optional(
95
+ Type.Array(Type.String(), {
96
+ description:
97
+ "Shorthand single mode: file paths to pre-read and inject before the task (same as Phase.context).",
98
+ }),
99
+ ),
100
+ contextLimit: Type.Optional(
101
+ Type.Number({ description: "Shorthand single mode: max characters to read per context file (default 8000)." }),
102
+ ),
85
103
  tasks: Type.Optional(
86
104
  Type.Array(ShorthandStep, {
87
105
  description: "Shorthand parallel mode: run these tasks concurrently and merge results (like subagent parallel)",
@@ -188,7 +206,6 @@ async function runFlow(
188
206
  },
189
207
  done,
190
208
  () => tui.terminal.rows,
191
- tui.terminal,
192
209
  );
193
210
  const onAbort = () => done("reject");
194
211
  signal?.addEventListener("abort", onAbort, { once: true });
@@ -573,7 +590,7 @@ export default function (pi: ExtensionAPI) {
573
590
  : params.tasks
574
591
  ? { tasks: params.tasks, name: params.name }
575
592
  : params.task
576
- ? { task: params.task, agent: params.agent, name: params.name }
593
+ ? { task: params.task, agent: params.agent, name: params.name, context: params.context, contextLimit: params.contextLimit }
577
594
  : undefined);
578
595
 
579
596
  if (shorthandSpec !== undefined) {
@@ -781,8 +798,13 @@ export default function (pi: ExtensionAPI) {
781
798
  );
782
799
  return;
783
800
  }
784
- const result = await ctx.ui.custom<RunHistoryResult | undefined>((_tui, theme, _kb, done) => {
785
- return new RunHistoryComponent(runs, theme, (r) => done(r));
801
+ const result = await ctx.ui.custom<RunHistoryResult | undefined>((tui, theme, _kb, done) => {
802
+ const comp = new RunHistoryComponent(runs, theme, (r) => done(r), {
803
+ refresh: () => listRuns(ctx.cwd, 50),
804
+ requestRender: () => tui.requestRender(),
805
+ intervalMs: 1000,
806
+ });
807
+ return comp;
786
808
  });
787
809
  if (result?.action === "resume") {
788
810
  if (ctx.isIdle()) {
@@ -8,8 +8,11 @@
8
8
  * {previous.output} alias for the immediately-preceding completed phase output
9
9
  * {item} / {item.f} map loop variable (or custom name via phase.as)
10
10
  *
11
- * Unknown placeholders are left intact (with a recorded warning) rather than
12
- * throwing, so a partially-specified task still runs.
11
+ * Unknown placeholders are left intact rather than throwing, so a
12
+ * partially-specified task still runs. The unresolved refs are returned in
13
+ * `missing[]`; the runtime surfaces them as a phase warning (see
14
+ * `warnUnresolvedRefs` in runtime.ts) — logged and persisted to
15
+ * `PhaseState.warnings`.
13
16
  */
14
17
 
15
18
  export interface InterpolationContext {
@@ -123,13 +126,21 @@ export function safeParse(text: string): unknown {
123
126
  } catch {
124
127
  // noop
125
128
  }
126
- // Extract from a ```json fenced block
127
- const fence = trimmed.match(/```(?:json)?\s*([\s\S]*?)```/i);
128
- if (fence) {
129
+ // Extract from fenced blocks. Outputs often contain multiple fences
130
+ // (e.g. a ```typescript evidence block before the ```json payload), so try
131
+ // every fence — json-tagged blocks first, then untagged/other blocks.
132
+ const fenceRe = /```(\w*)[ \t]*\r?\n?([\s\S]*?)```/g;
133
+ const fenced: { lang: string; body: string }[] = [];
134
+ let fm: RegExpExecArray | null;
135
+ while ((fm = fenceRe.exec(trimmed)) !== null) {
136
+ fenced.push({ lang: fm[1].toLowerCase(), body: fm[2].trim() });
137
+ }
138
+ const ordered = [...fenced.filter((b) => b.lang === "json"), ...fenced.filter((b) => b.lang !== "json")];
139
+ for (const block of ordered) {
129
140
  try {
130
- return JSON.parse(fence[1].trim());
141
+ return JSON.parse(block.body);
131
142
  } catch {
132
- // noop
143
+ // noop — try the next fence
133
144
  }
134
145
  }
135
146
  // Extract the first balanced [...] or {...}
@@ -69,7 +69,7 @@ export interface RunOptions {
69
69
  * 5 minutes is generous enough for slow reasoning/long tool calls while still
70
70
  * bounding a true hang.
71
71
  */
72
- export const DEFAULT_IDLE_TIMEOUT_MS = 5 * 60_000;
72
+ const DEFAULT_IDLE_TIMEOUT_MS = 5 * 60_000;
73
73
 
74
74
  export function isFailed(r: RunResult): boolean {
75
75
  return r.exitCode !== 0 || r.stopReason === "error" || r.stopReason === "aborted";
@@ -40,6 +40,19 @@ function isResumable(r: RunState): boolean {
40
40
  return r.status === "paused" || r.status === "failed";
41
41
  }
42
42
 
43
+ /** Detect whether a refreshed run list differs from the current one in any way
44
+ * the panel renders (status, updatedAt, phase progress, membership). */
45
+ function hasChanged(prev: RunState[], next: RunState[]): boolean {
46
+ if (prev.length !== next.length) return true;
47
+ const byId = new Map(prev.map((r) => [r.runId, r]));
48
+ for (const n of next) {
49
+ const p = byId.get(n.runId);
50
+ if (!p) return true;
51
+ if (p.status !== n.status || p.updatedAt !== n.updatedAt) return true;
52
+ }
53
+ return false;
54
+ }
55
+
43
56
  export class RunHistoryComponent {
44
57
  private runs: RunState[];
45
58
  private theme: Theme;
@@ -48,14 +61,62 @@ export class RunHistoryComponent {
48
61
  private mode: "list" | "detail" = "list";
49
62
  private cachedWidth?: number;
50
63
  private cachedLines?: string[];
64
+ /** Live-refresh wiring: re-read run state from disk while the panel is open
65
+ * so background (detached) runs show live progress without reopening. */
66
+ private timer?: ReturnType<typeof setInterval>;
67
+ private refresh?: () => RunState[];
68
+ private requestRender?: () => void;
51
69
 
52
- constructor(runs: RunState[], theme: Theme, onDone: (result?: RunHistoryResult) => void) {
70
+ constructor(
71
+ runs: RunState[],
72
+ theme: Theme,
73
+ onDone: (result?: RunHistoryResult) => void,
74
+ /** Optional live-refresh hooks. When both are provided the panel polls
75
+ * `refresh()` on an interval and calls `requestRender()` if anything changed. */
76
+ live?: { refresh: () => RunState[]; requestRender: () => void; intervalMs?: number },
77
+ ) {
53
78
  if (!runs.length) {
54
79
  throw new Error("RunHistoryComponent requires at least one run");
55
80
  }
56
81
  this.runs = runs;
57
82
  this.theme = theme;
58
83
  this.onDone = onDone;
84
+ if (live) {
85
+ this.refresh = live.refresh;
86
+ this.requestRender = live.requestRender;
87
+ const intervalMs = Math.max(250, live.intervalMs ?? 1000);
88
+ this.timer = setInterval(() => this.poll(), intervalMs);
89
+ // Don't keep the event loop alive just for the panel refresh.
90
+ (this.timer as { unref?: () => void }).unref?.();
91
+ }
92
+ }
93
+
94
+ /** Re-read run state; if anything changed, refresh the cached render. */
95
+ private poll(): void {
96
+ if (!this.refresh) return;
97
+ let next: RunState[];
98
+ try {
99
+ next = this.refresh();
100
+ } catch {
101
+ return; // transient read/lock error — try again next tick
102
+ }
103
+ if (!next.length) return;
104
+ if (!hasChanged(this.runs, next)) return;
105
+ // Preserve the user's selection by runId across refreshes.
106
+ const selectedId = this.runs[this.selected]?.runId;
107
+ this.runs = next;
108
+ const idx = next.findIndex((r) => r.runId === selectedId);
109
+ this.selected = idx >= 0 ? idx : Math.min(this.selected, next.length - 1);
110
+ this.invalidate();
111
+ this.requestRender?.();
112
+ }
113
+
114
+ /** Stop the refresh timer when the panel closes. */
115
+ dispose(): void {
116
+ if (this.timer) {
117
+ clearInterval(this.timer);
118
+ this.timer = undefined;
119
+ }
59
120
  }
60
121
 
61
122
  handleInput(data: string): void {
@@ -104,7 +165,8 @@ export class RunHistoryComponent {
104
165
  for (const l of renderProgress(run, th).split("\n")) lines.push(truncateToWidth(l, width));
105
166
  lines.push("");
106
167
  const hint = isResumable(run) ? "Esc back · r resume" : "Esc back";
107
- lines.push(truncateToWidth(` ${th.fg("dim", hint)}`, width));
168
+ const liveTag = this.timer && run.status === "running" ? th.fg("success", " ● live") : "";
169
+ lines.push(truncateToWidth(` ${th.fg("dim", hint)}${liveTag}`, width));
108
170
  lines.push("");
109
171
  this.cachedWidth = width;
110
172
  this.cachedLines = lines;
@@ -129,7 +191,11 @@ export class RunHistoryComponent {
129
191
  });
130
192
 
131
193
  lines.push("");
132
- lines.push(truncateToWidth(` ${th.fg("dim", "↑↓ select · Enter details · r resume · q close")}`, width));
194
+ const anyRunning = this.runs.some((r) => r.status === "running");
195
+ const liveHint = this.timer && anyRunning ? th.fg("success", " ● live") : "";
196
+ lines.push(
197
+ truncateToWidth(` ${th.fg("dim", "↑↓ select · Enter details · r resume · q close")}${liveHint}`, width),
198
+ );
133
199
  lines.push("");
134
200
 
135
201
  this.cachedWidth = width;
@@ -47,7 +47,7 @@ export interface RuntimeDeps {
47
47
  onProgress?: (state: RunState) => void;
48
48
  /** Injectable task runner (defaults to spawning a real subagent). Enables testing. */
49
49
  runTask?: typeof runAgentTask;
50
- /** Resolve an `approval` phase. Omit for non-interactive runs (auto-approve). */
50
+ /** Resolve an `approval` phase. Omit for non-interactive runs (auto-reject). */
51
51
  requestApproval?: (req: ApprovalRequest) => Promise<ApprovalDecision>;
52
52
  /** Resolve a saved taskflow by name for `flow` (sub-workflow) phases. */
53
53
  loadFlow?: (name: string) => Taskflow | undefined;
@@ -87,8 +87,7 @@ function buildInterpolationContext(
87
87
  return { args: state.args, steps, previousOutput, locals };
88
88
  }
89
89
 
90
- function resultToPhaseState(id: string, r: RunResult, inputHash: string, parseJson: boolean): PhaseState {
91
- const failed = isFailed(r);
90
+ function resultToPhaseState(id: string, r: RunResult, inputHash: string, parseJson: boolean): PhaseState { const failed = isFailed(r);
92
91
  const attempts = attemptsOf(r);
93
92
  // For failed phases, embed the error info in the output so downstream
94
93
  // phases (and the user) can see what went wrong. The raw r.output is
@@ -110,6 +109,22 @@ function resultToPhaseState(id: string, r: RunResult, inputHash: string, parseJs
110
109
  };
111
110
  }
112
111
 
112
+ /**
113
+ * Surface unresolved interpolation placeholders (the `missing[]` from
114
+ * `interpolate()`). Without this they are silently left intact in the task —
115
+ * the doc comment in interpolate.ts promises "a recorded warning". We both
116
+ * log to the console and return a string to attach to PhaseState.warnings so
117
+ * the warning is persisted in the run record and visible in `/tf runs`.
118
+ * Returns undefined when nothing is missing.
119
+ */
120
+ function warnUnresolvedRefs(phaseId: string, missing: string[]): string | undefined {
121
+ if (!missing.length) return undefined;
122
+ const unique = Array.from(new Set(missing));
123
+ const msg = `unresolved refs in task: ${unique.map((m) => `{${m}}`).join(", ")} — left intact (check dependsOn / placeholder spelling)`;
124
+ console.warn(`[taskflow] phase '${phaseId}': ${msg}`);
125
+ return msg;
126
+ }
127
+
113
128
  /** Attempts recorded by the retry wrapper (defaults to 1). */
114
129
  function attemptsOf(r: RunResult): number {
115
130
  const a = r.attempts;
@@ -392,6 +407,7 @@ async function executePhase(
392
407
  runId: state.runId,
393
408
  thinking: phase.thinking,
394
409
  tools: phase.tools,
410
+ preRead,
395
411
  };
396
412
 
397
413
  const baseRun = (agentName: string, task: string, onLive?: (l: LiveUpdate) => void) =>
@@ -581,7 +597,9 @@ async function executePhase(
581
597
  return ps;
582
598
  }
583
599
  }
584
- const { text } = interpolate(phase.task ?? "", ctx);
600
+ const interp = interpolate(phase.task ?? "", ctx);
601
+ const text = interp.text;
602
+ const refWarning = warnUnresolvedRefs(phase.id, interp.missing);
585
603
  const fullTask = preRead + text;
586
604
  const agentName = resolveAgent(phase.agent, deps, state);
587
605
  const inputHash = cacheKey(cc, [phase.id, agentName, phase.model ?? "", fullTask]);
@@ -590,6 +608,7 @@ async function executePhase(
590
608
 
591
609
  const r = await runOne(agentName, fullTask, liveSink(state, phase.id, emitProgress));
592
610
  const ps = resultToPhaseState(phase.id, r, inputHash, parseJson);
611
+ if (refWarning) ps.warnings = [...(ps.warnings ?? []), refWarning];
593
612
  if (type === "gate" && ps.status === "done") ps.gate = parseGateVerdict(r.output);
594
613
 
595
614
  // onBlock:retry — re-execute upstream + gate until pass or max attempts.
@@ -700,13 +719,16 @@ async function executePhase(
700
719
  const cached = cachedPhase(cc, inputHash);
701
720
  if (cached) return cached;
702
721
 
703
- // Non-interactive (headless/CI/tests): auto-approve, fail-open, but record it.
722
+ // Non-interactive (headless/CI/detached): auto-REJECT, fail-open, but record it.
723
+ // Approval gates are safety boundaries — bypassing them silently in CI would
724
+ // let unreviewed work ship. Detached/CI runs must not bypass approval gates.
704
725
  if (!deps.requestApproval) {
705
726
  return {
706
727
  id: phase.id,
707
728
  status: "done",
708
- output: "(auto-approved: no interactive approver available)",
709
- approval: { decision: "approve", auto: true },
729
+ output: "(auto-rejected: no interactive approver available)",
730
+ approval: { decision: "reject", auto: true },
731
+ gate: { verdict: "block", reason: "(auto-rejected: no interactive approver available)" },
710
732
  usage: emptyUsage(),
711
733
  inputHash,
712
734
  endedAt: Date.now(),
@@ -1185,15 +1207,26 @@ interface PhaseCacheCtx {
1185
1207
  * silently serve a stale cross-run hit). */
1186
1208
  thinking?: string;
1187
1209
  tools?: string[];
1210
+ /** Resolved `context` pre-read content. Explicitly part of the cache identity
1211
+ * so a context-file change always invalidates the phase — independent of
1212
+ * whether a given branch happens to fold preRead into its task string
1213
+ * (previously this was only incidentally true via `fullTask`). */
1214
+ preRead?: string;
1188
1215
  }
1189
1216
 
1190
1217
  /** Fold the phase fingerprint into the base hash parts to form the final cache key. */
1191
1218
  function cacheKey(cc: PhaseCacheCtx, baseParts: string[]): string {
1192
1219
  // Fold the full cache identity into the hash: flow name (prevents collisions
1193
1220
  // across different flows that share a phase.id + task + model), the per-phase
1194
- // thinking/tools config (changing either changes the subagent's output), and
1195
- // the resolved world-state fingerprint.
1196
- const parts = [`flow:${cc.flowName}`, ...baseParts, `think:${cc.thinking ?? ""}`, `tools:${JSON.stringify(cc.tools ?? [])}`];
1221
+ // thinking/tools config (changing either changes the subagent's output), the
1222
+ // resolved context pre-read content, and the world-state fingerprint.
1223
+ const parts = [
1224
+ `flow:${cc.flowName}`,
1225
+ ...baseParts,
1226
+ `think:${cc.thinking ?? ""}`,
1227
+ `tools:${JSON.stringify(cc.tools ?? [])}`,
1228
+ `ctx:${cc.preRead ?? ""}`,
1229
+ ];
1197
1230
  return cc.fingerprint ? hashInput(...parts, cc.fingerprint) : hashInput(...parts);
1198
1231
  }
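The effect of adding `ctx:` to the key parts can be illustrated with a small sketch. The names `CacheCtxSketch`, `fnv1a`, and `cacheKeySketch` are hypothetical; the package's `hashInput` is not shown in this diff, so a 32-bit FNV-1a hash stands in for it purely for illustration.

```typescript
// Hypothetical subset of PhaseCacheCtx, limited to the fields used below.
interface CacheCtxSketch {
  flowName: string;
  thinking?: string;
  tools?: string[];
  preRead?: string;
  fingerprint?: string;
}

// Stand-in hash (32-bit FNV-1a). The real `hashInput` may differ entirely.
function fnv1a(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Same part order as the diff: flow name, base parts, thinking, tools, then
// the resolved pre-read content, then the optional fingerprint.
function cacheKeySketch(cc: CacheCtxSketch, baseParts: string[]): string {
  const parts = [
    `flow:${cc.flowName}`,
    ...baseParts,
    `think:${cc.thinking ?? ""}`,
    `tools:${JSON.stringify(cc.tools ?? [])}`,
    `ctx:${cc.preRead ?? ""}`,
  ];
  if (cc.fingerprint) parts.push(cc.fingerprint);
  return fnv1a(JSON.stringify(parts));
}
```

With this shape, two calls that differ only in `preRead` produce different keys even when `baseParts` is identical, which is the property the new doc comment on `preRead` describes.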
1199
1232
 
@@ -13,8 +13,8 @@ import { Type, type Static } from "typebox";
13
13
  // Phase types
14
14
  // ---------------------------------------------------------------------------
15
15
 
16
- export const PHASE_TYPES = ["agent", "parallel", "map", "gate", "reduce", "approval", "flow", "loop", "tournament"] as const;
17
- export type PhaseType = (typeof PHASE_TYPES)[number];
16
+ const PHASE_TYPES = ["agent", "parallel", "map", "gate", "reduce", "approval", "flow", "loop", "tournament"] as const;
17
+ type PhaseType = (typeof PHASE_TYPES)[number];
18
18
 
19
19
  /** Loop iteration bounds. Authors may lower the max; the hard cap is a runaway guard. */
20
20
  export const LOOP_DEFAULT_MAX_ITERATIONS = 10;
@@ -36,17 +36,18 @@ export const MAX_DYNAMIC_CONCURRENCY = 16;
36
36
  /** Tournament competitor bounds. */
37
37
  export const TOURNAMENT_DEFAULT_VARIANTS = 3;
38
38
  export const TOURNAMENT_HARD_MAX_VARIANTS = 20;
39
- export const TOURNAMENT_MODES = ["best", "aggregate"] as const;
39
+ const TOURNAMENT_MODES = ["best", "aggregate"] as const;
40
+ /** @internal */
40
41
  export type TournamentMode = (typeof TOURNAMENT_MODES)[number];
41
42
 
42
- export const OUTPUT_FORMATS = ["text", "json"] as const;
43
- export const JOIN_MODES = ["all", "any"] as const;
44
- export const CACHE_SCOPES = ["run-only", "cross-run", "off"] as const;
43
+ const OUTPUT_FORMATS = ["text", "json"] as const;
44
+ const JOIN_MODES = ["all", "any"] as const;
45
+ const CACHE_SCOPES = ["run-only", "cross-run", "off"] as const;
45
46
  export type CacheScope = (typeof CACHE_SCOPES)[number];
46
47
  /** Allowed fingerprint entry prefixes. `glob!:` = content-hash variant of `glob:`. */
47
- export const CACHE_FINGERPRINT_PREFIXES = ["git:", "glob:", "glob!:", "file:", "env:"] as const;
48
+ const CACHE_FINGERPRINT_PREFIXES = ["git:", "glob:", "glob!:", "file:", "env:"] as const;
48
49
  /** Phase types that must NOT be cached across runs (a fresh result is required each run). */
49
- export const CACHE_CROSS_RUN_BLOCKED_TYPES = ["gate", "approval", "loop", "tournament"] as const;
50
+ const CACHE_CROSS_RUN_BLOCKED_TYPES = ["gate", "approval", "loop", "tournament"] as const;
50
51
 
51
52
  const ParallelTaskSchema = Type.Object(
52
53
  {
@@ -282,7 +283,7 @@ export type ArgSpec = Static<typeof ArgSpecSchema>;
282
283
  export type RetryPolicy = Static<typeof RetrySchema>;
283
284
  export type Budget = Static<typeof BudgetSchema>;
284
285
  export type CachePolicy = Static<typeof CacheSchema>;
285
- export type JoinMode = (typeof JOIN_MODES)[number];
286
+ type JoinMode = (typeof JOIN_MODES)[number];
286
287
 
287
288
  // ---------------------------------------------------------------------------
288
289
  // Shorthand (non-DAG) specs — subagent-style ergonomics
@@ -302,6 +303,10 @@ export type JoinMode = (typeof JOIN_MODES)[number];
302
303
  export interface ShorthandStep {
303
304
  agent?: string;
304
305
  task: string;
306
+ /** Files to pre-read and inject before the task (pass-through to Phase.context). */
307
+ context?: string[];
308
+ /** Max characters per context file (pass-through to Phase.contextLimit). */
309
+ contextLimit?: number;
305
310
  }
306
311
 
307
312
  /** True when `def` is a shorthand spec (no `phases`, but a task/tasks/chain field). */
@@ -316,11 +321,22 @@ export function isShorthand(def: unknown): boolean {
316
321
  );
317
322
  }
318
323
 
324
+ /** Coerce an unknown value into a non-empty list of non-empty strings (or undefined). */
325
+ function readContextList(v: unknown): string[] | undefined {
326
+ if (!Array.isArray(v)) return undefined;
327
+ const list = v.filter((x): x is string => typeof x === "string" && x.trim().length > 0);
328
+ return list.length ? list : undefined;
329
+ }
330
+
319
331
  function readStep(s: unknown): ShorthandStep {
320
332
  if (typeof s === "string") return { task: s };
321
333
  if (s && typeof s === "object") {
322
334
  const o = s as Record<string, unknown>;
323
- return { agent: typeof o.agent === "string" ? o.agent : undefined, task: String(o.task ?? "") };
335
+ const step: ShorthandStep = { agent: typeof o.agent === "string" ? o.agent : undefined, task: String(o.task ?? "") };
336
+ const ctx = readContextList(o.context);
337
+ if (ctx) step.context = ctx;
338
+ if (typeof o.contextLimit === "number") step.contextLimit = o.contextLimit;
339
+ return step;
324
340
  }
325
341
  return { task: "" };
326
342
  }
@@ -345,10 +361,19 @@ export function desugar(def: unknown): Taskflow {
345
361
 
346
362
  // chain → sequential agent phases
347
363
  if (Array.isArray(d.chain) && d.chain.length > 0) {
364
+ // Spec-level context in chain mode would be a flow-level default (every
365
+ // step), which is deliberately NOT supported — declare it per step instead.
366
+ if (d.context !== undefined || d.contextLimit !== undefined) {
367
+ console.warn(
368
+ "[taskflow] Shorthand chain ignores top-level 'context'/'contextLimit' — put them on individual steps instead.",
369
+ );
370
+ }
348
371
  const steps = d.chain.map(readStep);
349
372
  const phases: Phase[] = steps.map((s, i) => {
350
373
  const phase: Phase = { id: `step${i + 1}`, type: "agent", task: s.task };
351
374
  if (s.agent) phase.agent = s.agent;
375
+ if (s.context) phase.context = s.context;
376
+ if (s.contextLimit !== undefined) phase.contextLimit = s.contextLimit;
352
377
  if (i > 0) phase.dependsOn = [`step${i}`];
353
378
  if (i === steps.length - 1) phase.final = true;
354
379
  return phase;
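The context handling in `desugar` can be sketched in isolation. `readContextList` below follows the helper in the diff; `StepLike` and `sharedContext` are hypothetical names that condense what the parallel `tasks` branch does (union of contexts, largest limit wins), and are not part of the package.

```typescript
// Coerce an unknown value into a non-empty list of non-empty strings.
function readContextList(v: unknown): string[] | undefined {
  if (!Array.isArray(v)) return undefined;
  const list = v.filter((x): x is string => typeof x === "string" && x.trim().length > 0);
  return list.length ? list : undefined;
}

// Hypothetical reduced step shape for this sketch.
interface StepLike {
  task: string;
  context?: string[];
  contextLimit?: number;
}

// Parallel `tasks` mode: all branches share one phase-level context, so the
// spec-level list and every step-level list are merged and de-duplicated,
// and the largest declared limit is used.
function sharedContext(
  specContext: unknown,
  specLimit: unknown,
  steps: StepLike[],
): { context?: string[]; contextLimit?: number } {
  const shared = [...(readContextList(specContext) ?? []), ...steps.flatMap((s) => s.context ?? [])];
  const limits = [typeof specLimit === "number" ? specLimit : undefined, ...steps.map((s) => s.contextLimit)]
    .filter((n): n is number => typeof n === "number");
  return {
    context: shared.length ? Array.from(new Set(shared)) : undefined,
    contextLimit: limits.length ? Math.max(...limits) : undefined,
  };
}
```

Taking the maximum limit is the permissive choice: no branch sees less of a file than it asked for, at the cost of some branches seeing more.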
@@ -356,16 +381,30 @@ export function desugar(def: unknown): Taskflow {
356
381
  return { name: nameOf("chain"), ...meta, phases };
357
382
  }
358
383
 
359
- // tasks → one parallel phase (fan-out + merge), no extra aggregation agent
384
+ // tasks → one parallel phase (fan-out + merge), no extra aggregation agent.
385
+ // Context is SHARED across all branches (the runtime pre-reads per phase, not
386
+ // per branch): spec-level context plus the union of step-level contexts.
360
387
  if (Array.isArray(d.tasks) && d.tasks.length > 0) {
361
- const branches: ParallelTask[] = d.tasks.map(readStep).map((s) => (s.agent ? { task: s.task, agent: s.agent } : { task: s.task }));
362
- return { name: nameOf("parallel"), ...meta, phases: [{ id: "parallel", type: "parallel", branches, final: true }] };
388
+ const steps = d.tasks.map(readStep);
389
+ const branches: ParallelTask[] = steps.map((s) => (s.agent ? { task: s.task, agent: s.agent } : { task: s.task }));
390
+ const phase: Phase = { id: "parallel", type: "parallel", branches, final: true };
391
+ const shared = [...(readContextList(d.context) ?? []), ...steps.flatMap((s) => s.context ?? [])];
392
+ if (shared.length) phase.context = Array.from(new Set(shared));
393
+ const limits = [
394
+ typeof d.contextLimit === "number" ? d.contextLimit : undefined,
395
+ ...steps.map((s) => s.contextLimit),
396
+ ].filter((n): n is number => typeof n === "number");
397
+ if (limits.length) phase.contextLimit = Math.max(...limits);
398
+ return { name: nameOf("parallel"), ...meta, phases: [phase] };
363
399
  }
364
400
 
365
- // single task → one agent phase
401
+ // single task → one agent phase (the spec itself is the step)
366
402
  if (typeof d.task === "string") {
367
403
  const phase: Phase = { id: "main", type: "agent", task: d.task, final: true };
368
404
  if (typeof d.agent === "string") phase.agent = d.agent;
405
+ const ctx = readContextList(d.context);
406
+ if (ctx) phase.context = ctx;
407
+ if (typeof d.contextLimit === "number") phase.contextLimit = d.contextLimit;
369
408
  return { name: nameOf("task"), ...meta, phases: [phase] };
370
409
  }
371
410
 
@@ -376,6 +415,7 @@ export function desugar(def: unknown): Taskflow {
376
415
  // Validation (beyond schema: DAG integrity, phase-type requirements)
377
416
  // ---------------------------------------------------------------------------
378
417
 
418
+ /** @internal */
379
419
  export interface ValidationResult {
380
420
  ok: boolean;
381
421
  errors: string[];
@@ -618,16 +658,41 @@ export function validateTaskflow(def: unknown, opts: ValidationOptions = {}): Va
618
658
  // placeholder string. The runtime can't infer the intent — fail fast at
619
659
  // validation time so the mistake is caught before the run starts.
620
660
  //
661
+ // The check uses TRANSITIVE ancestors: if phase B depends on A, and C depends
662
+ // on B, then C may reference {steps.A.*} transitively. Only truly unreachable
663
+ // refs are errors.
664
+ //
621
665
  // Phases with `join: "any"` are exempt: by design they only need ONE of
622
666
  // their declared deps to complete, and may reference other phases as
623
667
  // informational context (not as true dependencies).
624
668
  if (errors.length === 0) {
625
669
  const idToPhase = new Map((flow.phases as Phase[]).map((p) => [p.id, p]));
670
+ // Precompute transitive ancestors for every phase via BFS over dependsOn.
671
+ const transitiveCache = new Map<string, Set<string>>();
672
+ const transitiveAncestors = (phaseId: string): Set<string> => {
673
+ const cached = transitiveCache.get(phaseId);
674
+ if (cached) return cached;
675
+ const result = new Set<string>();
676
+ const queue = [...(idToPhase.get(phaseId)?.dependsOn ?? []), ...(idToPhase.get(phaseId)?.from ?? [])];
677
+ while (queue.length) {
678
+ const id = queue.shift()!;
679
+ if (result.has(id)) continue;
680
+ result.add(id);
681
+ const dep = idToPhase.get(id);
682
+ if (dep) {
683
+ for (const d of [...(dep.dependsOn ?? []), ...(dep.from ?? [])]) {
684
+ if (!result.has(d)) queue.push(d);
685
+ }
686
+ }
687
+ }
688
+ transitiveCache.set(phaseId, result);
689
+ return result;
690
+ };
626
691
  for (const p of flow.phases as Phase[]) {
627
692
  if (!p?.id) continue;
628
693
  const isJoinAny = p.join === "any";
629
694
  if (isJoinAny) continue;
630
- const deps = new Set(dependenciesOf(p));
695
+ const transitive = transitiveAncestors(p.id);
631
696
  const refs = collectRefs(p);
632
697
  for (const ref of refs.steps) {
633
698
  if (ref === p.id) {
@@ -640,9 +705,9 @@ export function validateTaskflow(def: unknown, opts: ValidationOptions = {}): Va
640
705
  // double-warn — the dependsOn loop above already flags it.
641
706
  continue;
642
707
  }
643
- if (!deps.has(ref)) {
708
+ if (!transitive.has(ref)) {
644
709
  errors.push(
645
- `Phase '${p.id}': task references {steps.${ref}.*} but '${ref}' is not in dependsOn. ` +
710
+ `Phase '${p.id}': task references {steps.${ref}.*} but '${ref}' is not reachable via dependsOn. ` +
646
711
  `The phase will run in parallel with '${ref}' and see the literal placeholder. ` +
647
712
  `Add "dependsOn": ["${ref}"] (or include '${ref}' transitively).`,
648
713
  );
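The transitive-ancestor check above can be sketched as a standalone function. This is a simplified illustration: `PhaseLike` is a hypothetical reduced shape, only `dependsOn` edges are followed (the diff also follows `from`), and the per-phase memoization cache is omitted.

```typescript
// Hypothetical reduced phase shape for this sketch.
interface PhaseLike {
  id: string;
  dependsOn?: string[];
}

// Breadth-first walk over dependsOn edges, collecting every phase reachable
// from `phaseId`. The visited set also makes the walk terminate on cycles.
function transitiveAncestors(phases: PhaseLike[], phaseId: string): Set<string> {
  const byId = new Map<string, PhaseLike>();
  for (const p of phases) byId.set(p.id, p);
  const result = new Set<string>();
  const queue: string[] = [...(byId.get(phaseId)?.dependsOn ?? [])];
  while (queue.length) {
    const id = queue.shift()!;
    if (result.has(id)) continue;
    result.add(id);
    for (const d of byId.get(id)?.dependsOn ?? []) {
      if (!result.has(d)) queue.push(d);
    }
  }
  return result;
}
```

For a chain where `C` depends on `B` and `B` depends on `A`, the set for `C` contains both `A` and `B`, so a `{steps.A.*}` reference in `C` passes validation; a reference to an unrelated phase is not in the set and is reported as unreachable.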
@@ -29,6 +29,7 @@ export interface SavedFlow {
29
29
  def: Taskflow;
30
30
  }
31
31
 
32
+ /** @internal */
32
33
  export type PhaseStatus = "pending" | "running" | "done" | "failed" | "skipped";
33
34
 
34
35
  export interface PhaseState {
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pi-taskflow",
3
- "version": "0.0.20",
3
+ "version": "0.0.22",
4
4
  "description": "A declarative, verifiable graph of task nodes for the Pi coding agent — not a workflow you script, but a DAG you declare: statically verified before it runs, with dynamic fan-out, gates, isolated subagent context, resumable runs, and saveable commands.",
5
5
  "keywords": [
6
6
  "pi-package",
@@ -37,7 +37,7 @@
37
37
  ],
38
38
  "scripts": {
39
39
  "typecheck": "tsc --noEmit",
40
- "test": "PI_TASKFLOW_BUILTIN_AGENTS_DIR= node --experimental-strip-types --test test/interpolate.test.ts test/condition.test.ts test/schema.test.ts test/usage.test.ts test/runtime.test.ts test/features.test.ts test/runner.test.ts test/store.test.ts test/agents.test.ts test/init.test.ts test/render.test.ts test/approval-view.test.ts test/desugar.test.ts test/cache.test.ts test/loop.test.ts test/tournament.test.ts test/verify.test.ts test/gate-eval.test.ts test/transient-error.test.ts test/runtime-branches.test.ts test/interpolate-extended.test.ts test/store-extended.test.ts test/flow-def.test.ts test/detached.test.ts",
40
+ "test": "PI_TASKFLOW_BUILTIN_AGENTS_DIR= node --experimental-strip-types --test test/interpolate.test.ts test/condition.test.ts test/schema.test.ts test/usage.test.ts test/runtime.test.ts test/features.test.ts test/runner.test.ts test/store.test.ts test/agents.test.ts test/init.test.ts test/render.test.ts test/approval-view.test.ts test/desugar.test.ts test/cache.test.ts test/loop.test.ts test/tournament.test.ts test/verify.test.ts test/gate-eval.test.ts test/transient-error.test.ts test/runtime-branches.test.ts test/interpolate-extended.test.ts test/store-extended.test.ts test/flow-def.test.ts test/detached.test.ts test/runs-view.test.ts",
41
41
  "test:e2e": "PI_TASKFLOW_PI_BIN=pi node --experimental-strip-types test/e2e.mts",
42
42
  "test:dogfood-cache": "node --experimental-strip-types test/dogfood-cache.mts"
43
43
  },
@@ -43,10 +43,25 @@ proper flow, so you still get progress, persistence, resume, and `save`.
43
43
  ```
44
44
 
45
45
  - `agent` is optional (defaults to the first available agent).
46
+ - `context` (optional, per step or top-level in single mode): file paths to
47
+ pre-read and inject before the task — same as the full-DSL `Phase.context`
48
+ (per-file `contextLimit`, default 8000 chars). In **parallel `tasks` mode**
49
+ all branches SHARE the union of step contexts (the runtime pre-reads per
50
+ phase, not per branch). In **chain mode** declare `context` on individual
51
+ steps; a top-level `context` is ignored (with a warning).
46
52
  - Add `name` to label the run (and to `save` it as a `/tf:<name>` command).
47
53
  - Precedence if several are given: `chain` > `tasks` > `task`.
48
54
  - You can pass these as top-level tool params **or** inside `define`.
49
55
 
56
+ ```jsonc
57
+ // context pre-read in shorthand — the file content is injected before the task
58
+ { "chain": [
59
+ { "task": "Map the public API of src/lib", "agent": "scout" },
60
+ { "task": "Write docs for:\n{previous.output}", "agent": "doc-writer",
61
+ "context": ["AGENTS.md", "docs/style-guide.md"] }
62
+ ] }
63
+ ```
64
+
50
65
  ## How to author a taskflow
51
66
 
52
67
  Call the `taskflow` tool. To run a brand-new flow you write inline, pass
@@ -128,7 +143,7 @@ deciding. The (interpolated) `task` is the prompt shown.
128
143
  - **Reject** → halt the flow (same mechanism as a blocking gate).
129
144
  - **Edit** → the typed note becomes this phase's `output`, so you can inject
130
145
  guidance mid-run: reference it downstream with `{steps.<id>.output}`.
131
- - **Non-interactive** runs (headless/CI/print mode) **auto-approve** and record it.
146
+ - **Non-interactive** runs (headless/CI/print mode) **auto-reject** and record it — approval gates are safety boundaries that must never be silently bypassed.
132
147
  - **Background (detached)** runs **auto-reject** (no interactive approver) — downstream sees the rejection; the flow continues (fail-open).
133
148
 
134
149
  ```jsonc
@@ -170,9 +185,10 @@ Use hyphens in ids, never underscores. Sub-flow phases reference each other in
170
185
  their **own** `{steps.x.output}` namespace (no parent-id prefixing needed).
171
186
 
172
187
  **Fail-open & limits:** if the `def` doesn't parse, has the wrong shape, or fails
173
- validation, the phase fails *open*: it's marked failed with a `defError`, the
174
- upstream output is preserved, and the run continues (use `optional: true` on the
175
- flow phase so a bad plan never aborts the run). An **empty** `phases` array is a
188
+ validation, the phase completes with `status: "done"` and carries a `defError`
189
+ diagnostic field; downstream phases receive empty output. Authors who want a
190
+ hard failure can add a gate that checks for `defError`. The run continues
191
+ (add `optional: true` on the flow phase so a bad plan never aborts the run). An **empty** `phases` array is a
176
192
  valid no-op (the planner decided there's nothing to do). Inline nesting is capped
177
193
  at `MAX_DYNAMIC_NESTING` (5) to bound runaway self-spawning.
178
194
 
@@ -217,7 +233,7 @@ A `tournament` phase runs `variants` competing attempts in parallel, then a
217
233
  (`mode: "aggregate"`). Use it when one shot is unreliable and you want the best
218
234
  of several drafts, or a synthesis of diverse approaches.
219
235
 
220
- - `variants` — the competing attempts: a number (run the same `task` N times) or an array of `{task, agent?}` for genuinely different approaches.
236
+ - `variants` — a number specifying how many competing variants to spawn from `task` (default 3, max 20). For genuinely different approaches, use the `branches` field instead: an explicit array of `{task, agent?}` definitions.
221
237
  - `mode` — `"best"` (judge picks one winner, default) or `"aggregate"` (judge merges all into one output).
222
238
  - `judge` — the judge's rubric/instructions (how to choose or merge).
223
239
  - `judgeAgent` — *(optional)* the agent that runs the judge step; defaults to the phase `agent`.
@@ -450,12 +466,13 @@ Add `detach: true` to `action: "run"` to spawn the flow in a detached child proc
450
466
 
451
467
  ## Operating a run (lifecycle, resume, inspection)
452
468
 
453
- A run moves through: **running →** `completed` (a `final` phase produced output) **/** `blocked` (a gate emitted BLOCK, an `approval` was rejected, or the `budget` cap was hit) **/** `failed` (a non-`optional` phase errored) **/** `paused` (the run was aborted). `failed` and `paused` runs are resumable; `blocked` is terminal (fix the gate/budget and re-run).
469
+ A run moves through: **running →** `completed` (a `final` phase produced output) **/** `blocked` (a gate emitted BLOCK, an `approval` was rejected, or the `budget` cap was hit) **/** `failed` (a non-`optional` phase errored) **/** `paused` (the run was aborted). `failed` and `paused` runs are resumable.
454
470
 
455
- - **Resume is cache-aware.** `action: "resume"` re-runs only what didn't finish: every phase already `done` is reused from its recorded output (within-run cache), so resuming after a crash or a `blocked`/`failed` stop never repeats completed work. A phase that was mid-flight is re-executed cleanly (stale `error`/`endedAt` are cleared first).
471
+ - **`blocked` runs:** a blocked status halts the current run: the flow status is set to `blocked` and remaining phases are skipped. Re-running the flow resumes from the last completed state: `done` phases with matching input hashes are skipped; blocked/failed/skipped phases are re-attempted. Fix the gate condition or budget before re-running.
472
+ - **Resume is cache-aware.** `action: "resume"` re-runs only what didn't finish: every phase already `done` is reused from its recorded output (within-run cache), so resuming after a crash or a failed/blocked stop never repeats completed work. A phase that was mid-flight is re-executed cleanly (stale `error`/`endedAt` are cleared first).
456
473
  - **When to resume vs. re-run.** Resume when the inputs are unchanged and you just want to continue/retry the tail (fixed a gate, raised the budget, approved a checkpoint). Re-run from scratch when the task or upstream inputs changed — resume would reuse now-stale outputs. (For reuse *across* runs, opt a phase into `cache: {scope:"cross-run"}` — see configuration.md.)
457
474
  - **Budget mid-run.** When the run-wide `budget` is exceeded, remaining phases are skipped and an in-flight `map`/`parallel` stops spawning new items; the run ends `blocked` with the partial outputs preserved.
458
- - **Inspect runs.** `/tf runs` lists recent runs with status; `/tf show <name>` prints a saved flow's definition. Run state lives at `<project .pi>/taskflows/runs/<runId>.json` (gitignored).
475
+ - **Inspect runs.** `/tf runs` lists recent runs with status; `/tf show <name>` prints a saved flow's definition. Run state lives at `<project .pi>/taskflows/runs/<flowName>/<runId>.json` (gitignored).
459
476
 
460
477
  ## User commands
461
478
 
@@ -286,7 +286,7 @@ for the design.
286
286
  ### `ttl` (cross-run only)
287
287
 
288
288
  Max age before a cross-run hit is treated as a miss: e.g. `"30m"`, `"6h"`, `"7d"`.
289
- Omit for no time bound. A hit older than the TTL re-executes the phase.
289
+ Omit for no time bound. A hit older than the TTL re-executes the phase. Cross-run cache entries are hard-evicted after 90 days regardless of per-entry TTL. This ceiling is not configurable.
290
290
 
291
291
  ### `fingerprint` (cross-run only)
292
292
 
@@ -298,7 +298,7 @@ Each entry is one of:
298
298
  | Entry | Becomes a miss when… | Resolves to |
299
299
  |-------|----------------------|-------------|
300
300
  | `git:HEAD` / `git:<ref>` | the commit moves | the resolved SHA (30s timeout → `<timeout>`; no git → `<no-git>`) |
301
- | `glob:<pattern>` | the **set of matching paths** changes | sorted path list (mtime-free) |
301
+ | `glob:<pattern>` | the **set of matching paths** or their metadata changes | sorted path list with size + mtime (content-hashed globs use `glob!:` instead, which is mtime-independent) |
302
302
  | `glob!:<pattern>` | the **contents** of matching files change | content hashes (capped at 5000 matches) |
303
303
  | `file:<path>` | that file's content changes | sha256 of the file (>10 MB or missing → `<skip>`/`<missing>`) |
304
304
  | `env:<NAME>` | the env var changes | the env value |
@@ -333,7 +333,7 @@ Each entry is one of:
333
333
  |------|------|---------|
334
334
  | User-scoped flow | `~/.pi/agent/taskflows/<name>.json` | personal |
335
335
  | Project-scoped flow | `<nearest .pi>/taskflows/<name>.json` | ✅ commit to share |
336
- | Run state (resume) | `<project .pi>/taskflows/runs/<runId>.json` | ❌ gitignore |
336
+ | Run state (resume) | `<project .pi>/taskflows/runs/<flowName>/<runId>.json` | ❌ gitignore |
337
337
 
338
338
  - `action: "save"` takes `scope: "project"` (default) or `"user"`.
339
339
  - Saved flows auto-register as `/tf:<name>` (immediately for the current session,