pi-taskflow 0.0.27 → 0.0.28

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,60 @@
2
2
 
3
3
  All notable changes to pi-taskflow are documented here. This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) format.
4
4
 
5
+ ## [0.0.28] — 2026-06-27
6
+
7
+ > Granular-reuse release: **incremental recompute goes from whole-flow to
8
+ > per-phase and per-item.** v0.0.27 *proved* the recompute cost win; this
9
+ > release makes that win far larger and easier to opt into. Editing one phase
10
+ > now invalidates only that phase and its transitive dependents (a sibling keeps
11
+ > its cache hit), a `map` phase re-executes only the items that actually changed,
12
+ > and a single `incremental` flag flips a whole flow into cross-run reuse without
13
+ > annotating every phase.
14
+
15
+ ### Added
16
+ - **Per-phase structural sub-fingerprint (`v3:phasefp`).** The cache key now
17
+ folds a per-phase fingerprint — the phase plus its transitive `dependsOn ∪ from`
18
+ closure — instead of the whole-flow `v2:flowdef` hash. Editing phase B
19
+ invalidates only B and its dependents; an independent sibling A keeps its hit.
20
+ `cacheKeys` emits a 4-tier read ladder (`v3:phasefp`, the only tier written, then `v2:flowdef` →
21
+ bare flowdef → legacy, all three read-only) so the upgrade is additive: no
22
+ miss-storm for unchanged flows. Fail-open: any per-phase error degrades that
23
+ phase to the whole-flow hash. Soundness fallback to whole-flow when per-phase
24
+ invalidation can't be statically guaranteed (flow-wide `contextSharing`, any
25
+ `shareContext` phase in the closure, `join: "any"`, or sub-flow inner phases).
26
+ (`extensions/flowir/phasefp.ts`, `test/cache-phasefp.test.ts` — 11 tests.)
27
+ - **Per-item cross-run caching for `map` phases.** When one of N items changes
28
+ between runs, only that item re-executes (N−1 cache hits) while the whole-map
29
+ fast path and every soundness fallback stay intact. Per-item keys omit the
30
+ structural fingerprint (which hashes the whole `over` source) so changing one
31
+ item no longer moves every key at once; they fold `[phase.id, it.agent, model,
32
+ it.task]` + the world-state tail, so task/agent/upstream/world changes still
33
+ invalidate the right items. Disabled (whole-map only) under run-only/off scope,
34
+ `shareContext`/flow-wide `contextSharing`, or inside a runtime-generated
35
+ sub-flow. (`test/cache-peritem.test.ts` — 11 tests.)
36
+ - **`incremental` flag** — flow-level (`TaskflowSchema.incremental`) and
37
+ invocation-level (`run` tool arg). Defaults every phase to `scope:"cross-run"`
38
+ so re-running a flow reuses unchanged phases across runs/sessions, without
39
+ annotating each phase. The invocation arg wins over the flow field; per-phase
40
+ cache settings and the cross-run-blocked types (gate/approval/loop/tournament)
41
+ still take precedence; default remains the safe `run-only` (fresh each run).
42
+ (`resolveCacheScope` in `extensions/index.ts`, `test/incremental-flag.test.ts`.)
43
+ - **Reuse reporting.** The end-of-run cache report and `/tf recompute` now show
44
+ reused-vs-executed counts and a per-phase "Why" trace (the explainable-
45
+ reactivity view: `▲ rerun / ✂ cutoff / ✓ reused / ✗ failed`, with `← causedBy`).
46
+ Dollar figures are reported only for within-run reuse, where the prior usage is
47
+ preserved; cross-run hits are counted but never attributed an invented saving.
48
+ (`summarizeReuse` / `RecomputeDecision` in `extensions/runtime.ts`,
49
+ `test/reuse-summary.test.ts`.)
50
+ - Tests: 804 → 846 (+42).
51
+
52
+ ### Changed
53
+ - **`phaseFingerprint` strips more policy fields** (`cache`, `retry`,
54
+ `concurrency`, `final`): none changes a phase's subagent *output*, so a no-op
55
+ config tweak no longer causes false cache invalidation.
56
+ - **README** test count and feature line refreshed (804 → 846 across 46 files);
57
+ `per-item map caching` added to the headline capabilities.
58
+
5
59
  ## [0.0.27] — 2026-06-25
6
60
 
7
61
  > Evidence release: **the incremental-recompute cost win is now proven, not
package/README.md CHANGED
@@ -8,7 +8,7 @@
8
8
  <a href="./LICENSE"><img src="https://img.shields.io/badge/license-MIT-43D9AD?style=flat-square" alt="MIT license"></a>
9
9
  <a href="#whats-inside"><img src="https://img.shields.io/badge/runtime%20deps-0-43D9AD?style=flat-square" alt="zero runtime dependencies"></a>
10
10
  <a href="https://github.com/heggria/pi-taskflow/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/heggria/pi-taskflow/ci.yml?branch=main&style=flat-square&label=CI" alt="CI status"></a>
11
- <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-804-6E8BFF?style=flat-square" alt="804 tests"></a>
11
+ <a href="#whats-inside"><img src="https://img.shields.io/badge/tests-846-6E8BFF?style=flat-square" alt="846 tests"></a>
12
12
  <a href="#whats-inside"><img src="https://img.shields.io/badge/dogfooded-%E2%9C%93-43D9AD?style=flat-square" alt="dogfooded"></a>
13
13
  <a href="https://pi.dev"><img src="https://img.shields.io/badge/for-Pi%20coding%20agent-B692FF?style=flat-square" alt="for the Pi coding agent"></a>
14
14
  </p>
@@ -728,12 +728,12 @@ Copy one into `.pi/taskflows/<name>.json` (or `~/.pi/agent/taskflows/`) and it r
728
728
 
729
729
  <div align="center">
730
730
 
731
- **0 runtime dependencies** · **804 tests** · **9 phase types** · **shared context tree** · **cross-session resume** · **cross-run memoization** · **incremental recompute** · **FlowIR compile seam** · **detached execution** · **`compile` Mermaid renderer** · **~9k LOC runtime**
731
+ **0 runtime dependencies** · **846 tests** · **9 phase types** · **shared context tree** · **cross-session resume** · **cross-run memoization** · **per-item map caching** · **incremental recompute** · **FlowIR compile seam** · **detached execution** · **`compile` Mermaid renderer** · **~9k LOC runtime**
732
732
 
733
733
  </div>
734
734
 
735
735
  - **Zero runtime dependencies.** No `dependencies` field — the runtime is built entirely on Node built-ins (`fs` / `path` / `os` / `child_process` / `crypto`). The file lock is `fs.openSync("wx")`, not a third-party library.
736
- - **804 tests across 42 test files** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), backward-compatible cache-key migration (3-tier legacy fallback), the FlowIR compile seam (determinism, declared-plane synthesis), incremental recompute (early-cutoff propagation, partial cascade strictly < full, observed ∪ declared union frontier), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, the shared context tree (blackboard reuse, supervision spawn, subflow validation/nesting), workspace isolation (temp/dedicated/worktree lifecycle, fail-open degrade, dynamic-flow rejection), dynamic sub-flow security hardening, detached execution (PID persistence, stale detection, crash→failed, resume after failure), live run-history refresh, callback isolation, the idle watchdog, model-role init config, parseModelFromLabel with parenthesized-model-name regression, and multi-fence `safeParse` recovery, plus the `compile` Mermaid renderer (id-collision disambiguation, markdown-injection hardening, and full verify-overlay category coverage).
736
+ - **846 tests across 46 test files** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), backward-compatible cache-key migration (4-tier legacy fallback), per-phase structural sub-fingerprint (v3:phasefp — editing one phase invalidates only it and its dependents), per-item map caching (one changed item re-executes, N−1 cache hits), the `incremental` flag (run-wide cross-run default), reuse reporting, the FlowIR compile seam (determinism, declared-plane synthesis), incremental recompute (early-cutoff propagation, partial cascade strictly < full, observed ∪ declared union frontier), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, the shared context tree (blackboard reuse, supervision spawn, subflow validation/nesting), workspace isolation (temp/dedicated/worktree lifecycle, fail-open degrade, dynamic-flow rejection), dynamic sub-flow security hardening, detached execution (PID persistence, stale detection, crash→failed, resume after failure), live run-history refresh, callback isolation, the idle watchdog, model-role init config, parseModelFromLabel with parenthesized-model-name regression, and multi-fence `safeParse` recovery, plus the `compile` Mermaid renderer (id-collision disambiguation, markdown-injection hardening, and full verify-overlay category coverage).
737
737
  - **Hardened by design.** Path-traversal defense (lexical + `realpath` containment check), runId validation, HTML/error sanitization, atomic writes, stale-lock stealing via `rename`, and an idle watchdog that kills wedged subagents (SIGTERM → SIGKILL after 5 minutes of silence). Dynamic sub-flows additionally get breadth caps, `cwd` containment, budget clamping, nesting depth caps, and prototype-pollution defense.
738
738
  - **Dogfooded.** Every new feature has to survive the project's own `self-improve` taskflow before it ships.
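The changelog describes the cache-key migration as a four-tier read ladder. New entries are written only under the `v3:phasefp` key, while `v2:flowdef`, bare flowdef, and legacy keys are still probed read-only, so unchanged flows keep hitting after an upgrade. Below is a minimal sketch of that lookup order. It assumes a plain `Map` as the store. The tier labels, key strings, and the `lookup`/`record` helpers are hypothetical and are not pi-taskflow's `cacheKeys` API:

```typescript
// Hypothetical tier labels, newest first. Only the v3 tier is ever written.
type Tier = "v3:phasefp" | "v2:flowdef" | "flowdef" | "legacy";

interface LadderKeys {
  write: string; // key new entries are stored under (the v3:phasefp tier)
  read: Array<{ tier: Tier; key: string }>; // every tier, probed newest first
}

// Return the first tier that hits, so entries written by an older release
// still serve unchanged flows after an upgrade (no miss-storm).
function lookup(store: Map<string, string>, keys: LadderKeys): { tier: Tier; value: string } | undefined {
  for (const { tier, key } of keys.read) {
    const value = store.get(key);
    if (value !== undefined) return { tier, value };
  }
  return undefined;
}

// Writes touch only the newest tier; older tiers are read-only.
function record(store: Map<string, string>, keys: LadderKeys, value: string): void {
  store.set(keys.write, value);
}

// Usage: a store populated by an older release under its v2 key only.
const store = new Map<string, string>([["v2:abc", "old output"]]);
const keys: LadderKeys = {
  write: "v3:xyz",
  read: [
    { tier: "v3:phasefp", key: "v3:xyz" },
    { tier: "v2:flowdef", key: "v2:abc" },
    { tier: "flowdef", key: "abc" },
    { tier: "legacy", key: "legacy:abc" },
  ],
};
console.log(lookup(store, keys)?.tier); // v2:flowdef
record(store, keys, "new output");
console.log(lookup(store, keys)?.tier); // v3:phasefp
```

Keeping writes on a single tier means the old tiers drain naturally as entries expire. Nothing ever has to rewrite them.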
739
739
 
@@ -71,3 +71,5 @@ export type {
71
71
  TaskflowIR,
72
72
  TaskflowIRMeta,
73
73
  } from "./meta.ts";
74
+
75
+ export { phaseFingerprint } from "./phasefp.ts";
@@ -0,0 +1,121 @@
1
+ /**
2
+ * Per-phase structural sub-fingerprint (M6).
3
+ *
4
+ * `phaseFingerprint` produces a content-addressed hash of ONLY the subset of
5
+ * the flow definition that can affect a single phase's subagent output: the
6
+ * phase itself plus its transitive dependency closure. Folding this into the
7
+ * cross-run cache key (instead of the whole-flow `flowDefHash`) means editing
8
+ * phase B invalidates only B and its transitive dependents — independent
9
+ * sibling phase A keeps its cache hit.
10
+ *
11
+ * ## Soundness (the fallback gate)
12
+ *
13
+ * Per-phase invalidation is only sound when a phase's *real* dependencies are
14
+ * fully captured by the static `dependsOn ∪ from` closure. Three cases break
15
+ * that guarantee, so `phaseFingerprint` returns `undefined` for them and the
16
+ * caller falls back to the whole-flow `flowDefHash` (safe, = pre-M6 behavior):
17
+ *
18
+ * 1. **Shared Context Tree** (`def.contextSharing === true` or any closure
19
+ * member has `shareContext === true`): a sharing phase can read sibling
20
+ * blackboard writes OUTSIDE its declared deps, so the static closure
21
+ * under-approximates real reads.
22
+ * 2. **`flow` phase in the closure** (`type === "flow"`): a `flow` phase's
23
+ * sub-structure is resolved at runtime (inline `def`) or from a saved
24
+ * flow (`use`) and is not statically visible here. Editing the saved
25
+ * sub-flow would not move this phase's sub-fingerprint.
26
+ * 3. **`join: "any"` phase** (`phase.join === "any"`): validation exempts it
27
+ * from the `{steps.X}`-must-be-in-`dependsOn` check, so it may read
28
+ * phases outside its static closure. The closure under-approximates its
29
+ * real reads, so fall back to whole-flow invalidation.
30
+ *
31
+ * `cache`, `retry`, `concurrency`, and `final` are stripped from each phase
32
+ * before hashing: none of them changes the subagent's OUTPUT (they are policy,
33
+ * execution mechanics, or result selection). `cache`'s sub-fields
34
+ * (`scope`/`ttl`/`fingerprint`) reach the cache key through other paths
35
+ * (`cc.scope` gates the lookup, `cc.ttlMs` governs expiry, `cc.fingerprint` is
36
+ * in the key tail). Every other `Phase` field is hashed. `PhaseSchema` uses
37
+ * `additionalProperties: false`, so no surprise field can be missed.
38
+ *
39
+ * Pure + async (Web Crypto via `hashCanonical`). Reuses the vendored
40
+ * `canonicalJson`/`hashCanonical` (byte-identical to overstory's contract) so
41
+ * the sub-fingerprint shares one hashing contract with `flowDefHash`. Not
42
+ * expected to throw; callers still wrap it in try/catch and degrade to `flowDefHash`.
43
+ *
44
+ * @see docs/internal/cache-migration.md (v3:phasefp tier)
45
+ */
46
+
47
+ import { transitiveDependencies, type Phase, type Taskflow } from "../schema.ts";
48
+ import { canonicalJson, hashCanonical } from "./hash.ts";
49
+
50
+ /** Fields stripped before hashing because they do NOT affect a phase's
51
+ * subagent OUTPUT, only execution mechanics or result selection — folding
52
+ * them in would cause false cache invalidation on a no-op config change:
53
+ * - `cache`: policy object; its sub-fields reach the key via
54
+ * `cc.scope`/`cc.ttlMs`/`cc.fingerprint`.
55
+ * - `retry`: retry/backoff is execution mechanics; a successful phase
56
+ * produces the same output regardless of how many attempts it took.
57
+ * - `concurrency`: fan-out parallelism; does not change any item's output.
58
+ * - `final`: marks which phase's output is the flow result; does not change
59
+ * the phase's own output. */
60
+ const PHASE_FP_STRIP = ["cache", "retry", "concurrency", "final"] as const;
61
+
62
+ /** Clone a phase into a plain record with policy fields removed. */
63
+ function stripPolicy(phase: Phase): Record<string, unknown> {
64
+ const rec = phase as unknown as Record<string, unknown>;
65
+ const out: Record<string, unknown> = {};
66
+ for (const k of Object.keys(rec)) {
67
+ if ((PHASE_FP_STRIP as readonly string[]).includes(k)) continue;
68
+ out[k] = rec[k];
69
+ }
70
+ return out;
71
+ }
72
+
73
+ /**
74
+ * Per-phase structural sub-fingerprint.
75
+ *
76
+ * @returns the hex hash, or `undefined` when per-phase soundness cannot be
77
+ * guaranteed (caller falls back to the whole-flow `flowDefHash`). Never
78
+ * throws.
79
+ */
80
+ export async function phaseFingerprint(def: Taskflow, phaseId: string): Promise<string | undefined> {
81
+ const phases = def.phases as Phase[];
82
+ const byId = new Map(phases.map((p) => [p.id, p]));
83
+ const phase = byId.get(phaseId);
84
+ if (!phase) return undefined;
85
+
86
+ // --- Soundness gate: fall back to whole-flow when static closure is unsafe. ---
87
+ // Flow-wide context sharing enables cross-sibling reads outside declared deps.
88
+ if (def.contextSharing === true) return undefined;
89
+ // A `join: "any"` phase may interpolate `{steps.X.*}` refs to phases OUTSIDE
90
+ // its declared dependsOn (validation deliberately exempts it — schema.ts), so
91
+ // the static closure under-approximates its real reads. Fall back to
92
+ // whole-flow invalidation rather than rely on the key tail alone (which would
93
+ // be an undocumented coupling). Safe, = pre-M6 behavior.
94
+ if (phase.join === "any") return undefined;
95
+
96
+ const closureIds = transitiveDependencies(phases, phaseId);
97
+ const closurePhases: Phase[] = [];
98
+ for (const id of closureIds) {
99
+ const p = byId.get(id);
100
+ if (!p) continue; // unknown dep — validation reports elsewhere
101
+ // Per-phase sharing: this closure member can read sibling blackboard
102
+ // writes outside its own declared deps.
103
+ if (p.shareContext === true) return undefined;
104
+ // A flow phase's sub-structure is runtime/saved-flow-resolved and not
105
+ // statically visible — editing it would not move the sub-fingerprint.
106
+ if ((p.type ?? "agent") === "flow") return undefined;
107
+ closurePhases.push(p);
108
+ }
109
+ // The self phase's own sharing/type is part of the closure too.
110
+ if (phase.shareContext === true) return undefined;
111
+ if ((phase.type ?? "agent") === "flow") return undefined;
112
+
113
+ // --- Build the canonical payload. ---
114
+ // `deps` is the SORTED transitive closure (self excluded). canonicalJson
115
+ // sorts OBJECT keys but preserves ARRAY order, so we sort the array
116
+ // explicitly for determinism independent of dependency walk order.
117
+ const depsPayload = closurePhases.map((p) => ({ id: p.id, def: stripPolicy(p) })).sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
118
+ const payload = { self: stripPolicy(phase), deps: depsPayload };
119
+
120
+ return hashCanonical(canonicalJson(payload));
121
+ }
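The core of `phaseFingerprint` can be sketched outside the package to show why editing phase B leaves sibling A's key alone. This sketch makes several simplifying assumptions:

- It uses a minimal `Phase` shape (`id`, `task`, `dependsOn`, and a single-id `from`).
- It substitutes Node's `node:crypto` SHA-256 for the vendored Web Crypto `hashCanonical`.
- It omits the soundness gate (`contextSharing`, `shareContext`, `join: "any"`, `flow` phases).

`closure` and `phaseFp` are hypothetical names:

```typescript
import { createHash } from "node:crypto";

// Hypothetical minimal phase shape; the real Phase type has many more fields.
interface Phase {
  id: string;
  task?: string;
  dependsOn?: string[];
  from?: string; // assumed here to be a single upstream phase id
  [field: string]: unknown;
}

// Policy fields that never change a phase's output (same list as PHASE_FP_STRIP).
const STRIP = new Set(["cache", "retry", "concurrency", "final"]);

// Canonical JSON: object keys sorted recursively, array order preserved.
function canonicalJson(v: unknown): string {
  if (Array.isArray(v)) return `[${v.map(canonicalJson).join(",")}]`;
  if (v !== null && typeof v === "object") {
    const obj = v as Record<string, unknown>;
    const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(v);
}

// Transitive dependsOn ∪ from closure, self excluded, sorted for determinism.
function closure(byId: Map<string, Phase>, id: string): string[] {
  const seen = new Set<string>();
  const stack = [id];
  while (stack.length > 0) {
    const p = byId.get(stack.pop()!);
    if (!p) continue; // unknown dep: validation reports it elsewhere
    for (const dep of [...(p.dependsOn ?? []), ...(p.from ? [p.from] : [])]) {
      if (!seen.has(dep)) {
        seen.add(dep);
        stack.push(dep);
      }
    }
  }
  seen.delete(id);
  return Array.from(seen).sort();
}

function stripPolicy(p: Phase): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const k of Object.keys(p)) if (!STRIP.has(k)) out[k] = p[k];
  return out;
}

// Hash the phase plus its closure; phases outside the closure are ignored.
function phaseFp(phases: Phase[], id: string): string | undefined {
  const byId = new Map(phases.map((p) => [p.id, p] as [string, Phase]));
  const self = byId.get(id);
  if (!self) return undefined;
  const deps: Array<{ id: string; def: Record<string, unknown> }> = [];
  for (const depId of closure(byId, id)) {
    const p = byId.get(depId);
    if (p) deps.push({ id: depId, def: stripPolicy(p) });
  }
  const payload = { self: stripPolicy(self), deps };
  return createHash("sha256").update(canonicalJson(payload)).digest("hex");
}

const v1: Phase[] = [
  { id: "a", task: "summarize README" },
  { id: "b", task: "list TODOs" },
  { id: "c", task: "merge findings", dependsOn: ["b"] },
];
// Edit only phase b's task.
const v2: Phase[] = v1.map((p) => (p.id === "b" ? { ...p, task: "list FIXMEs" } : p));
console.log(phaseFp(v1, "a") === phaseFp(v2, "a")); // true: sibling a keeps its key
console.log(phaseFp(v1, "c") === phaseFp(v2, "c")); // false: dependent c is invalidated
```

Sorting the closure before hashing matters because canonical JSON preserves array order. Without the sort, the same dependency set walked in a different order would produce a different key.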
@@ -28,7 +28,7 @@ import { type AgentScope, discoverAgents, readSubagentSettings, shouldSyncBuilti
28
28
  import { renderRunResult, summarizeRun } from "./render.ts";
29
29
  import { RunHistoryComponent, type RunHistoryResult } from "./runs-view.ts";
30
30
  import { ApprovalViewComponent, type ApprovalChoice } from "./approval-view.ts";
31
- import { executeTaskflow, recomputeTaskflow, type ApprovalDecision, type ApprovalRequest, type RecomputeReport, type RuntimeDeps, type RuntimeResult } from "./runtime.ts";
31
+ import { executeTaskflow, recomputeTaskflow, summarizeReuse, type ApprovalDecision, type ApprovalRequest, type RecomputeReport, type RuntimeDeps, type RuntimeResult } from "./runtime.ts";
32
32
  import { type UsageStats } from "./usage.ts";
33
33
  import { finalPhase, resolveArgs, type Taskflow, validateTaskflow, desugar, isShorthand } from "./schema.ts";
34
34
  import {
@@ -150,6 +150,12 @@ const TaskflowParams = Type.Object({
150
150
  description: "Run in background (detached child process); return runId immediately. Status polled via store.",
151
151
  }),
152
152
  ),
153
+ incremental: Type.Optional(
154
+ Type.Boolean({
155
+ description:
156
+ "For action=run: default every phase to cross-run caching so re-running the flow reuses unchanged phases across runs/sessions (incremental recompute). Overrides the flow's own `incremental` field. Per-phase cache settings and cross-run-blocked types (gate/approval/loop/tournament) still take precedence. Omit to use the flow's setting (default: run-only — fresh each run).",
157
+ }),
158
+ ),
153
159
  });
154
160
 
155
161
  function formatFlowIR(ir: TaskflowIR): string {
@@ -225,6 +231,17 @@ function formatRecompute(r: RecomputeReport): string {
225
231
  if (r.cutoff.length > 0) lines.push(` → saved ${r.cutoff.length} re-execution(s).`);
226
232
  }
227
233
  lines.push(`✓ reused (outside frontier): ${r.reused.join(", ") || "—"}`);
234
+ // Per-phase "why" — the explainable-reactivity trace (like React DevTools
235
+ // telling you why each component re-rendered). Only shown when present.
236
+ if (r.decisions && r.decisions.length > 0) {
237
+ const glyph: Record<string, string> = { rerun: "▲", cutoff: "✂", reused: "✓", failed: "✗" };
238
+ lines.push("");
239
+ lines.push("Why:");
240
+ for (const d of r.decisions) {
241
+ const cause = d.causedBy && d.causedBy.length ? ` ← ${d.causedBy.join(", ")}` : "";
242
+ lines.push(` ${glyph[d.outcome] ?? "•"} ${d.phaseId}: ${d.reason}${cause}`);
243
+ }
244
+ }
228
245
  return lines.join("\n");
229
246
  }
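The "Why" block can be exercised on its own. This sketch reuses the field names that `formatRecompute` reads from each decision (`phaseId`, `outcome`, `reason`, `causedBy`). The decision data and reason strings are invented for illustration, and the real `RecomputeDecision` type in `extensions/runtime.ts` may carry more fields:

```typescript
// Field names follow the ones formatRecompute reads; the outcome set is
// inferred from its glyph table, with "•" for anything else.
interface RecomputeDecision {
  phaseId: string;
  outcome: string;
  reason: string;
  causedBy?: string[];
}

const GLYPH: Record<string, string> = { rerun: "▲", cutoff: "✂", reused: "✓", failed: "✗" };

function formatWhy(decisions: RecomputeDecision[]): string[] {
  if (decisions.length === 0) return []; // the section is omitted entirely
  const lines = ["", "Why:"];
  for (const d of decisions) {
    const cause = d.causedBy && d.causedBy.length ? ` ← ${d.causedBy.join(", ")}` : "";
    lines.push(`  ${GLYPH[d.outcome] ?? "•"} ${d.phaseId}: ${d.reason}${cause}`);
  }
  return lines;
}

console.log(
  formatWhy([
    { phaseId: "scout", outcome: "rerun", reason: "task edited" },
    { phaseId: "plan", outcome: "cutoff", reason: "upstream output unchanged", causedBy: ["scout"] },
    { phaseId: "report", outcome: "reused", reason: "outside frontier" },
  ]).join("\n"),
);
// prints an empty line, then:
// Why:
//   ▲ scout: task edited
//   ✂ plan: upstream output unchanged ← scout
//   ✓ report: outside frontier
```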
230
247
 
@@ -242,6 +259,18 @@ function makeRunState(def: Taskflow, args: Record<string, unknown>, cwd: string)
242
259
  };
243
260
  }
244
261
 
262
+ /** Resolve the run-wide default cache scope from the incremental flags. The
263
+ * invocation-level override (the `incremental` tool arg) wins; otherwise the
264
+ * flow's own `incremental` field; otherwise the safe `run-only` default
265
+ * (each run starts fresh — cross-run reuse is opt-in). Exported for testing. */
266
+ export function resolveCacheScope(
267
+ incrementalOverride: boolean | undefined,
268
+ flowIncremental: boolean | undefined,
269
+ ): "cross-run" | "run-only" {
270
+ const on = typeof incrementalOverride === "boolean" ? incrementalOverride : flowIncremental;
271
+ return on === true ? "cross-run" : "run-only";
272
+ }
273
+
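Combined with the per-phase rules the changelog lists, the precedence runs as follows:

1. The invocation arg.
2. The flow's `incremental` field.
3. `run-only`.

An explicit per-phase `cache.scope` beats that default, and cross-run-blocked types never reuse across runs. Below is a sketch under those assumptions. `effectiveScope`, `PhaseLike`, and the `"off"` scope value are hypothetical, and what a blocked type falls back to (here `run-only`) is a guess. The real per-phase logic lives in `executePhase`:

```typescript
type Scope = "cross-run" | "run-only" | "off"; // "off" is assumed from the changelog's "run-only/off scope"
type RunDefault = "cross-run" | "run-only";

// Same precedence as resolveCacheScope above: invocation arg, then flow field, then run-only.
function resolveCacheScope(override: boolean | undefined, flowIncremental: boolean | undefined): RunDefault {
  const on = typeof override === "boolean" ? override : flowIncremental;
  return on === true ? "cross-run" : "run-only";
}

// Hypothetical per-phase step (the real one runs inside executePhase).
const CROSS_RUN_BLOCKED = new Set(["gate", "approval", "loop", "tournament"]);

interface PhaseLike {
  type?: string;
  cache?: { scope?: Scope };
}

function effectiveScope(phase: PhaseLike, runDefault: RunDefault): Scope {
  // An explicit per-phase scope beats the run-wide default.
  const requested: Scope = phase.cache?.scope ?? runDefault;
  // Blocked types never reuse across runs; falling back to run-only is an assumption.
  if (requested === "cross-run" && CROSS_RUN_BLOCKED.has(phase.type ?? "agent")) return "run-only";
  return requested;
}

const runDefault = resolveCacheScope(undefined, true); // flow declares incremental: true
console.log(runDefault); // cross-run
console.log(effectiveScope({ type: "gate" }, runDefault)); // run-only
console.log(effectiveScope({ cache: { scope: "off" } }, runDefault)); // off
console.log(resolveCacheScope(false, true)); // run-only: the invocation arg wins
```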
245
274
  async function runFlow(
246
275
  def: Taskflow,
247
276
  args: Record<string, unknown>,
@@ -249,6 +278,9 @@ async function runFlow(
249
278
  signal: AbortSignal | undefined,
250
279
  onUpdate: ((p: AgentToolResult<TaskflowDetails>) => void) | undefined,
251
280
  existing?: RunState,
281
+ // Invocation-level incremental override: when set, wins over def.incremental.
282
+ // undefined → fall back to the flow's own `incremental` field (default off).
283
+ incrementalOverride?: boolean,
252
284
  ): Promise<RuntimeResult> {
253
285
  const state = existing ?? makeRunState(def, args, ctx.cwd);
254
286
 
@@ -374,11 +406,15 @@ async function runFlow(
374
406
  persist: persistThrottled,
375
407
  requestApproval,
376
408
  loadFlow: (name: string) => getFlow(ctx.cwd, name)?.def,
377
- // Cross-run cache is opt-in per phase (cache:{scope:"cross-run"}).
378
- // Defaulting every real run to cross-run was reviewed out: it silently
379
- // persists phase outputs and can serve stale results for phases whose
380
- // agents read files at runtime (those files are not in the cache key).
381
- cacheScopeDefault: "run-only",
409
+ // Cross-run cache is opt-in. By default a real run is `run-only` (fresh
410
+ // each run): defaulting every phase to cross-run silently persists
411
+ // outputs and can serve stale results for phases whose agents read files
412
+ // at runtime (those files are not in the cache key). A user opts in
413
+ // explicitly — the invocation `incremental` arg wins, else the flow's
414
+ // own `incremental` field, else the safe run-only default. All the
415
+ // soundness fallbacks (blocked types, per-phase fingerprint, shareContext)
416
+ // still apply per phase inside executePhase.
417
+ cacheScopeDefault: resolveCacheScope(incrementalOverride, def.incremental),
382
418
  });
383
419
  // Auto-report cache savings at the end of a real run so the user sees the
384
420
  // M1-M5 effect without running a separate /tf command.
@@ -958,7 +994,7 @@ export default function (pi: ExtensionAPI) {
958
994
  };
959
995
  }
960
996
 
961
- const result = await runFlow(def, args, ctx, signal, onUpdate as any);
997
+ const result = await runFlow(def, args, ctx, signal, onUpdate as any, undefined, params.incremental as boolean | undefined);
962
998
  // Surface the validation warnings in the tool result so the model
963
999
  // can acknowledge or fix them, and the user sees them in the chat.
964
1000
  if (v.warnings.length) {
@@ -1399,15 +1435,18 @@ function errorResult(action: string, message: string): ToolResult {
1399
1435
  };
1400
1436
  }
1401
1437
 
1402
- function formatCacheReport(state: RunState, totalUsage: UsageStats): string {
1403
- const cached = Object.values(state.phases).filter((p) => p.cacheHit === "cross-run");
1404
- if (cached.length === 0) return "";
1405
- // Honest reporting: we know these phases spent 0 tokens *this run* because
1406
- // they were served from cache. We do NOT estimate dollars/tokens "saved" —
1407
- // that requires guessing what a re-execution would have cost, and the mix of
1408
- // cheap vs expensive phases (tournament/loop) makes such a guess misleading.
1409
- const cachedTokens = cached.reduce((sum, p) => sum + ((p.usage?.input ?? 0) + (p.usage?.output ?? 0)), 0);
1410
- return `💾 ${cached.length} phase(s) reused from cross-run cache (${cachedTokens.toLocaleString()} tokens spent on them this run)`;
1438
+ function formatCacheReport(state: RunState, _totalUsage: UsageStats): string {
1439
+ const r = summarizeReuse(state);
1440
+ const reused = r.reusedRunOnly + r.reusedCrossRun;
1441
+ if (reused === 0) return ""; // nothing reused, so no incremental story to tell
1442
+ // Honest framing: report reused-vs-executed counts, and a dollar figure only
1443
+ // for within-run reuse (where the prior usage is preserved). Cross-run hits
1444
+ // zero their usage, so their original cost is genuinely unknown — we say
1445
+ // "reused" without inventing a savings number for them.
1446
+ const parts: string[] = [`♻️ ${reused}/${r.done} phase(s) reused (${r.executed} executed this run)`];
1447
+ if (r.savedUSD > 0) parts.push(`~$${r.savedUSD.toFixed(4)} of re-execution avoided`);
1448
+ if (r.reusedCrossRun > 0) parts.push(`${r.reusedCrossRun} from cross-run cache`);
1449
+ return parts.join(" · ");
1411
1450
  }
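A worked example of the report line helps show how the counts combine. It uses a stand-in for the `RunState` phases and the same classification as `summarizeReuse`. The run contains:

- one within-run reuse with a preserved $0.02 cost;
- two cross-run hits;
- three fresh executions;
- one failed phase, which is not counted.

`PhaseStateLike`, `summarize`, and `report` are hypothetical names for this sketch:

```typescript
// Minimal stand-in for PhaseState; only the fields the report reads.
interface PhaseStateLike {
  status: "done" | "failed" | "pending";
  cacheHit?: "run-only" | "cross-run";
  usage?: { cost?: number };
}

interface ReuseSummary {
  executed: number;
  reusedRunOnly: number;
  reusedCrossRun: number;
  done: number;
  savedUSD: number;
}

// Same classification as summarizeReuse: only done phases count.
function summarize(phases: Record<string, PhaseStateLike>): ReuseSummary {
  const s: ReuseSummary = { executed: 0, reusedRunOnly: 0, reusedCrossRun: 0, done: 0, savedUSD: 0 };
  for (const id of Object.keys(phases)) {
    const ps = phases[id];
    if (ps.status !== "done") continue;
    if (ps.cacheHit === "run-only") {
      s.reusedRunOnly++;
      s.savedUSD += ps.usage?.cost ?? 0; // prior usage survives a within-run resume
    } else if (ps.cacheHit === "cross-run") {
      s.reusedCrossRun++; // usage zeroed: no dollar figure attributed
    } else {
      s.executed++;
    }
  }
  s.done = s.executed + s.reusedRunOnly + s.reusedCrossRun;
  return s;
}

// Same string assembly as formatCacheReport.
function report(phases: Record<string, PhaseStateLike>): string {
  const r = summarize(phases);
  const reused = r.reusedRunOnly + r.reusedCrossRun;
  if (reused === 0) return "";
  const parts = [`♻️ ${reused}/${r.done} phase(s) reused (${r.executed} executed this run)`];
  if (r.savedUSD > 0) parts.push(`~$${r.savedUSD.toFixed(4)} of re-execution avoided`);
  if (r.reusedCrossRun > 0) parts.push(`${r.reusedCrossRun} from cross-run cache`);
  return parts.join(" · ");
}

const phases: Record<string, PhaseStateLike> = {
  scout: { status: "done" },
  plan: { status: "done" },
  build: { status: "done" },
  lint: { status: "done", cacheHit: "run-only", usage: { cost: 0.02 } },
  docs: { status: "done", cacheHit: "cross-run", usage: { cost: 0 } },
  tests: { status: "done", cacheHit: "cross-run" },
  deploy: { status: "failed" },
};
console.log(report(phases));
// ♻️ 3/6 phase(s) reused (3 executed this run) · ~$0.0200 of re-execution avoided · 2 from cross-run cache
```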
1412
1451
 
1413
1452
  function finalResult(action: string, result: RuntimeResult): ToolResult {
@@ -20,7 +20,7 @@ import { type Budget, type CacheScope, dependenciesOf, finalPhase, LOOP_DEFAULT_
20
20
  import { verifyTaskflow } from "./verify.ts";
21
21
  import { hashInput, newRunId, type PhaseState, type RunState, runsDir } from "./store.ts";
22
22
  import { CacheStore, resolveFingerprint } from "./cache.ts";
23
- import { compileTaskflowToIR } from "./flowir/index.ts";
23
+ import { compileTaskflowToIR, phaseFingerprint } from "./flowir/index.ts";
24
24
  import { computeStaleFrontier, declaredReadMapOfDef, readMapOf } from "./stale.ts";
25
25
  import { ctxDirFor, drainPendingSpawns, initCtxDir, registerNode, setNodeStatus, type SpawnAssignment } from "./context-store.ts";
26
26
  import { allocateWorkspace, isWorkspaceKeyword, type Workspace } from "./workspace.ts";
@@ -72,6 +72,55 @@ export interface RuntimeResult {
72
72
  finalOutput: string;
73
73
  ok: boolean;
74
74
  totalUsage: UsageStats;
75
+ /** Incremental-reuse summary: how many phases were reused from cache vs.
76
+ * freshly executed this run, and the cost the reused work would otherwise
77
+ * have incurred (known only for within-run resume; cross-run hits zero
78
+ * their usage so their original cost is not recoverable). Optional &
79
+ * additive — callers that ignore it are unaffected. */
80
+ reuse?: ReuseSummary;
81
+ }
82
+
83
+ /** A run's incremental-reuse accounting (see RuntimeResult.reuse). */
84
+ export interface ReuseSummary {
85
+ /** Phases that completed by executing a subagent this run. */
86
+ executed: number;
87
+ /** Phases served from the within-run resume cache (no new tokens). */
88
+ reusedRunOnly: number;
89
+ /** Phases restored from the cross-run store (no new tokens). */
90
+ reusedCrossRun: number;
91
+ /** Total phases that reached `done` (executed + reused). */
92
+ done: number;
93
+ /** USD the within-run-reused phases would have cost if re-executed (their
94
+ * preserved prior usage). Cross-run hits are excluded (cost not recoverable). */
95
+ savedUSD: number;
96
+ }
97
+
98
+ /** Compute the incremental-reuse summary from a run's terminal phase states.
99
+ * Pure, total, never throws. A phase is "reused" iff it carries a `cacheHit`
100
+ * marker (set by `cachedPhase` for both within-run resume and cross-run hits). */
101
+ export function summarizeReuse(state: RunState): ReuseSummary {
102
+ let executed = 0;
103
+ let reusedRunOnly = 0;
104
+ let reusedCrossRun = 0;
105
+ let savedUSD = 0;
106
+ for (const ps of Object.values(state.phases)) {
107
+ if (ps.status !== "done") continue;
108
+ if (ps.cacheHit === "run-only") {
109
+ reusedRunOnly++;
110
+ savedUSD += ps.usage?.cost ?? 0; // within-run resume preserves prior usage
111
+ } else if (ps.cacheHit === "cross-run") {
112
+ reusedCrossRun++; // cross-run hits zero their usage — cost not recoverable
113
+ } else {
114
+ executed++;
115
+ }
116
+ }
117
+ return {
118
+ executed,
119
+ reusedRunOnly,
120
+ reusedCrossRun,
121
+ done: executed + reusedRunOnly + reusedCrossRun,
122
+ savedUSD,
123
+ };
75
124
  }
76
125
 
77
126
  function buildInterpolationContext(
@@ -120,6 +169,31 @@ function resultToPhaseState(id: string, r: RunResult, inputHash: string, parseJs
120
169
  };
121
170
  }
122
171
 
172
+ /**
173
+ * Synthesize a 0-token `RunResult` from a cached per-item `PhaseState` so a
174
+ * cross-run per-item cache hit flows through `mergePhaseState` as a normal
175
+ * successful fan-out item. `stopReason: "cache-hit"` is NOT in `isFailed`'s
176
+ * failure set (only "error"/"aborted"/non-zero exit), so the item counts as
177
+ * success. Usage is `emptyUsage()` — a cached item spent no new tokens this
178
+ * run, so `mergePhaseState`'s `aggregateUsage` charges nothing for it.
179
+ *
180
+ * Used only by the `map` per-item cache path (see `runFanout`). Fail-open by
181
+ * construction: this is only reached AFTER a successful `cachedPhase` lookup,
182
+ * so `ps.output` is always present.
183
+ */
184
+ function phaseStateToRunResult(ps: PhaseState, it: { agent: string; task: string }): RunResult {
185
+ return {
186
+ agent: it.agent,
187
+ task: it.task,
188
+ exitCode: 0,
189
+ output: ps.output ?? "",
190
+ stderr: "",
191
+ usage: emptyUsage(),
192
+ model: ps.model,
193
+ stopReason: "cache-hit",
194
+ };
195
+ }
196
+
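The per-item path relies on keys that fold only the item's own inputs plus the world-state tail, so editing one of N items moves exactly one key. Below is a minimal sketch of that behavior. It uses an in-memory `Map` as the cache and assumes SHA-256 over a JSON array as the key. `itemKey`, `runMap`, and the world-tail string are hypothetical. Unlike the real path, this sketch has no failure or budget-skip handling:

```typescript
import { createHash } from "node:crypto";

interface Item {
  agent: string;
  task: string;
}

// Hypothetical per-item key: folds [phaseId, agent, model, task] plus a world-state
// tail, but not a hash of the whole `over` array, so one edited item moves one key.
function itemKey(phaseId: string, it: Item, model: string, worldTail: string): string {
  const material = JSON.stringify([phaseId, it.agent, model, it.task, worldTail]);
  return createHash("sha256").update(material).digest("hex");
}

// Run a map phase against a per-item cache: hits skip execution, fresh results are recorded.
function runMap(
  phaseId: string,
  items: Item[],
  model: string,
  worldTail: string,
  cache: Map<string, string>,
  exec: (it: Item) => string,
): { outputs: string[]; executed: number } {
  let executed = 0;
  const outputs = items.map((it) => {
    const key = itemKey(phaseId, it, model, worldTail);
    const hit = cache.get(key);
    if (hit !== undefined) return hit; // cache hit: no subagent spawned
    executed++;
    const out = exec(it);
    cache.set(key, out); // the real code skips failed and budget-skipped items
    return out;
  });
  return { outputs, executed };
}

const cache = new Map<string, string>();
const exec = (it: Item) => `${it.agent}: ${it.task.toUpperCase()}`;
const run1: Item[] = [
  { agent: "reviewer", task: "review a.ts" },
  { agent: "reviewer", task: "review b.ts" },
  { agent: "reviewer", task: "review c.ts" },
];
console.log(runMap("review", run1, "m1", "world-1", cache, exec).executed); // 3
const run2: Item[] = run1.map((it, i) => (i === 1 ? { ...it, task: "review b2.ts" } : it));
console.log(runMap("review", run2, "m1", "world-1", cache, exec).executed); // 1
console.log(runMap("review", run2, "m1", "world-2", cache, exec).executed); // 3: world change moves every key
```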
123
197
  /** Convert observed read refs (e.g. "steps.scout.output") into a structured
124
198
  * readSet keyed by upstream phase id, tagging each with the version
125
199
  * (= inputHash) that was current when read. Only `steps.*` refs are upstream
@@ -277,12 +351,20 @@ function mergePhaseState(
277
351
  const model = ran.find((r) => r.model !== undefined)?.model;
278
352
  // Combine outputs as a labelled list; also expose a JSON array of outputs.
279
353
  // For failed items, use the error message instead of the useless placeholder.
280
- const combinedText = ran
354
+ // Labels are positionally aligned to the ORIGINAL `over` array: we iterate
355
+ // over ALL results (including budget-skipped, which are filtered to null) and
356
+ // use `results.length` as N, so item k's label reads `[k/N]` matching its
357
+ // position in `over` — not its rank among non-skipped items. Per-item cache
358
+ // hits (`stopReason: "cache-hit"`) are not budget-skipped, so they keep their
359
+ // original positional label.
360
+ const combinedText = results
281
361
  .map((r, i) => {
282
- const label = `### [${i + 1}/${ran.length}] ${r.agent}${isFailed(r) ? " (failed)" : ""}`;
362
+ if (r.stopReason === "budget-skipped") return null;
363
+ const label = `### [${i + 1}/${results.length}] ${r.agent}${isFailed(r) ? " (failed)" : ""}`;
283
364
  const content = isFailed(r) ? (r.errorMessage || r.stderr || r.output) : r.output;
284
365
  return `${label}\n\n${content}`;
285
366
  })
367
+ .filter((x): x is string => x !== null)
286
368
  .join("\n\n---\n\n");
287
369
  // Only successful runs feed the parsed JSON array (no error/skip strings).
288
370
  const jsonArray = parseJson ? ran.filter((r) => !isFailed(r)).map((r) => safeParse(r.output) ?? r.output) : undefined;
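The positional labelling change can be checked in isolation. This sketch mirrors the label construction above with a hypothetical `ItemResult` stand-in. A boolean `failed` replaces `isFailed()`, and the error-text substitution for failed items is omitted:

```typescript
// Minimal stand-in for RunResult; `failed` replaces the real isFailed() check.
interface ItemResult {
  agent: string;
  output: string;
  stopReason?: string;
  failed?: boolean;
}

// Same labelling as mergePhaseState: N is results.length and index i is the
// item's position in `over`, so skipping an item never renumbers the others.
function combineOutputs(results: ItemResult[]): string {
  return results
    .map((r, i) => {
      if (r.stopReason === "budget-skipped") return null; // skipped items get no section
      const label = `### [${i + 1}/${results.length}] ${r.agent}${r.failed ? " (failed)" : ""}`;
      return `${label}\n\n${r.output}`;
    })
    .filter((x): x is string => x !== null)
    .join("\n\n---\n\n");
}

const combined = combineOutputs([
  { agent: "scan", output: "ok", stopReason: "cache-hit" },
  { agent: "scan", output: "", stopReason: "budget-skipped" },
  { agent: "scan", output: "2 findings" },
]);
console.log(combined.split("\n\n---\n\n").map((s) => s.split("\n")[0]));
// [ '### [1/3] scan', '### [3/3] scan' ]
```

Under the previous code, which numbered only the non-skipped items, the third item would have been labelled `[2/2]` even though it is the third entry in `over`.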
@@ -721,6 +803,7 @@ async function executePhaseInner(
721
803
  flowName: state.flowName,
722
804
  runId: state.runId,
723
805
  flowDefHash: state.flowDefHash === "failed" ? undefined : state.flowDefHash,
806
+ phaseFp: state.phaseFingerprints?.[phase.id],
724
807
  forceRerun: opts?.forceRerun,
725
808
  thinking: phase.thinking,
726
809
  tools: phase.tools,
@@ -820,7 +903,14 @@ async function executePhaseInner(
820
903
  const parseJson = phase.output === "json";
821
904
 
822
905
  // Runs a list of sub-tasks with live fan-out progress + aggregate live usage/activity.
823
- const runFanout = async (items: Array<{ agent: string; task: string }>): Promise<RunResult[]> => {
906
+ // `perItem` (map only) enables per-item cross-run caching: each item is looked
907
+ // up in the cache before spawning a subagent, and a successful fresh item is
908
+ // recorded so a later run with that item unchanged hits per-item. When
909
+ // `perItem` is undefined (parallel, or non-cacheable maps) the path is inert.
910
+ const runFanout = async (
911
+ items: Array<{ agent: string; task: string }>,
912
+ perItem?: { keyOf: (idx: number) => CacheKeys | null; cc: PhaseCacheCtx },
913
+ ): Promise<RunResult[]> => {
824
914
  let done = 0;
825
915
  let running = 0;
826
916
  let failed = 0;
@@ -854,6 +944,28 @@ async function executePhaseInner(
854
944
  stopReason: "budget-skipped",
855
945
  } satisfies RunResult;
856
946
  }
947
+ // Per-item cross-run cache lookup (map only). A hit synthesizes a 0-token
948
+ // RunResult and returns immediately — the item never spawns a subagent and
949
+ // never reaches the ctx_spawn drain below (a cached item can't have queued
950
+ // new spawns). Fail-open: any error in the lookup path degrades to executing.
951
+ if (perItem) {
952
+ try {
953
+ const ckItem = perItem.keyOf(idx);
954
+ if (ckItem) {
955
+ const hit = cachedPhase(perItem.cc, ckItem);
956
+ if (hit) {
957
+ done++;
958
+ const synth = phaseStateToRunResult(hit, it);
959
+ liveUsages[idx] = emptyUsage();
960
+ if (hit.model) latestModel = hit.model;
961
+ refresh();
962
+ return synth;
963
+ }
964
+ }
965
+ } catch {
966
+ /* fail-open: a cache read error must never sink the item */
967
+ }
968
+ }
857
969
  running++;
858
970
  refresh();
859
971
  if (ctxDir) {
@@ -869,6 +981,23 @@ async function executePhaseInner(
869
981
  done++;
870
982
  if (isFailed(r)) failed++;
871
983
  liveUsages[idx] = r.usage;
984
+ // Per-item cross-run cache record (map only): persist a successful fresh
985
+ // item so a later run with this item unchanged hits per-item instead of
986
+ // re-running. Failed and budget-skipped items are never cached (a stale
987
+ // failure would be served on the next run). Fail-open: a write error never
988
+ // sinks the item — the fresh `r` is already in hand and flows downstream.
989
+ if (perItem && !isFailed(r) && r.stopReason !== "budget-skipped") {
990
+ try {
991
+ const ckItem = perItem.keyOf(idx);
992
+ if (ckItem) {
993
+ const ccItem: PhaseCacheCtx = { ...perItem.cc, phaseId: `${phase.id}#item${idx}` };
994
+ const itemPs = resultToPhaseState(`${phase.id}#item${idx}`, r, ckItem.key, parseJson);
995
+ recordCache(ccItem, itemPs);
996
+ }
997
+ } catch {
998
+ /* fail-open: cache write must never sink the item */
999
+ }
1000
+ }
872
1001
  if (ctxDir) {
873
1002
  try {
874
1003
  const itemNid = nodeIdFor(String(idx));
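Together, the two hunks above form a fail-open, read-then-record loop around each fan-out item. A hit returns a 0-token result without spawning anything. A fresh success is recorded, and failures are never cached. The following synchronous sketch shows the control flow under simplifying assumptions. `runItems`, `Store`, and `exec` are hypothetical names, and a plain `Map` stands in for the persistent cross-run store:

```typescript
// Hypothetical, minimal shapes for the sketch.
interface Item { agent: string; task: string }
interface Result { output: string; tokens: number; failed?: boolean; cached?: boolean }
type Store = Map<string, string>; // per-item key -> cached output

function runItems(
  items: Item[],
  keyOf: (it: Item) => string | null,
  store: Store,
  exec: (it: Item) => Result,
): Result[] {
  return items.map((it) => {
    let key: string | null = null;
    try {
      key = keyOf(it);
      const hit = key !== null ? store.get(key) : undefined;
      // Hit: synthesize a 0-token result without running the item.
      if (hit !== undefined) return { output: hit, tokens: 0, cached: true };
    } catch {
      key = null; // fail-open: a lookup error just means "execute"
    }
    const r = exec(it);
    // Record only successful fresh results; a cached failure would be replayed.
    if (!r.failed && key !== null) {
      try {
        store.set(key, r.output);
      } catch {
        /* fail-open: a write error never sinks the item */
      }
    }
    return r;
  });
}
```

The keys are content-addressed (agent plus task), so two identical items share one entry. Here the second duplicate in a run hits because this sketch runs items sequentially.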
@@ -1068,12 +1197,59 @@ async function executePhaseInner(
1068
1197
  task: preRead + interpolate(phase.task ?? "", localCtx).text,
1069
1198
  };
1070
1199
  });
1200
+ // Per-item caching is sound ONLY when ALL of:
1201
+ // - cross-run scope: run-only has no persistent store, so per-item entries
1202
+ // could never be re-read (no point keying them).
1203
+ // - no Shared Context Tree (`!sharing`): a sharing map item can read sibling
1204
+ // blackboard writes OUTSIDE its declared deps, so the per-item key (which
1205
+ // folds only the item's own task) under-approximates real reads and could
1206
+ // serve a stale result. Fall back to whole-map.
1207
+ // - not inside a runtime-generated sub-flow (`def:` frame in the stack):
1208
+ // such flows are untrusted / possibly non-deterministic, so per-item reuse
1209
+ // is unsafe. Fall back to whole-map (which still applies breadth caps).
1210
+ // `undefined phaseFingerprint` is NOT a blocker for soundness — it is a
1211
+ // DELIBERATE design choice: per-item keys omit BOTH phaseFp and flowDefHash
1212
+ // (via ccPerItem below) so a changing `over` cannot move unchanged items'
1213
+ // keys. See ccPerItem for the full soundness argument.
1214
+ const perItemCacheable =
1215
+ cc.scope === "cross-run" &&
1216
+ !sharing &&
1217
+ !(deps._stack ?? []).some((s) => s.startsWith("def:"));
1218
+ // Per-item cache context: structural fingerprints (phaseFp + flowDefHash)
1219
+ // are OMITTED so a changing `over` cannot move unchanged items' keys. Both
1220
+ // fingerprints hash `over` (the array source); folding either into a
1221
+ // per-item key means editing one item invalidates EVERY per-item key at
1222
+ // once (no partial reuse) — the bug fixed here. A single item's output is
1223
+ // fully specified by `it.task` (template + {item}/{as} value + any
1224
+ // upstream-output refs + args) + `it.agent` + model + thinking/tools/preRead
1225
+ // + the world-state `fingerprint`; `over` only determines WHICH items
1226
+ // exist, not WHAT any item computes. `flowName` is retained for cross-flow
1227
+ // collision prevention. Soundness: docs/internal/cache-migration.md.
1228
+ // NB: perItemCacheable already gates on scope === "cross-run", which is
1229
+ // blocked upstream when flowDefHash === "failed", so ccPerItem is only
1230
+ // built when flowDefHash is a real hash (or already undefined) — setting
1231
+ // it to undefined here is a safe no-op for the failed case.
1232
+ const ccPerItem: PhaseCacheCtx = { ...cc, phaseFp: undefined, flowDefHash: undefined };
1233
+ // Pre-compute per-item CacheKeys once so the lookup and the record path use
1234
+ // the IDENTICAL key (built from ccPerItem, NOT the whole-phase cc). The
1235
+ // per-item key folds `it.agent` (Arbiter fix): a different agent means
1236
+ // different output, so a per-item key WITHOUT the agent could serve a stale
1237
+ // cross-agent hit when only `phase.agent` changed (the whole-map key would
1238
+ // correctly miss via JSON.stringify(tasks), but per-item keys would not).
1239
+ const perItemKeys: (CacheKeys | null)[] = perItemCacheable
1240
+ ? tasks.map((it) => cacheKeys(ccPerItem, [phase.id, it.agent, phase.model ?? "", it.task]))
1241
+ : tasks.map(() => null);
1242
+ const perItem = perItemCacheable
1243
+ ? { keyOf: (idx: number): CacheKeys | null => perItemKeys[idx] ?? null, cc: ccPerItem }
1244
+ : undefined;
1245
+ // Whole-map key keeps the FULL cc (phaseFp + flowDefHash) so its fast path
1246
+ // and any pre-existing whole-map entries are unchanged (backward compat).
1071
1247
  const ck = cacheKeys(cc, [phase.id, phase.model ?? "", JSON.stringify(tasks)]);
1072
1248
  const inputHash = ck.key;
1073
1249
  const cached = cachedPhase(cc, ck);
1074
1250
  if (cached) return cached;
1075
1251
 
1076
- const results = await runFanout(tasks);
1252
+ const results = await runFanout(tasks, perItem);
1077
1253
  const ps = mergePhaseState(phase.id, results, inputHash, parseJson);
1078
1254
  if (readRefs.length) ps.reads = readRefsToReads(readRefs, state);
1079
1255
  if (mapTruncated) {
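The `ccPerItem` comment argues that folding an `over`-dependent structural fingerprint into per-item keys would invalidate every item whenever any single item changed. The following sketch makes that concrete with a simplified key: `hashParts` is a hypothetical stand-in for the package's hash helper, and the real per-item key also folds the flow name, `thinking`/`tools`, and any `fingerprint` entries through `cacheKeys`:

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for the package's hash helper (implementation not shown in the diff).
function hashParts(...parts: string[]): string {
  const h = createHash("sha256");
  for (const p of parts) h.update(p).update("\u0000");
  return h.digest("hex");
}

interface Task { agent: string; task: string }

// What the fix avoids: an `over`-dependent fingerprint folded into EVERY item key.
function naivePerItemKey(phaseId: string, model: string, over: string[], t: Task): string {
  const structuralFp = hashParts("over", JSON.stringify(over)); // moves on any item edit
  return hashParts(phaseId, structuralFp, t.agent, model, t.task);
}

// Simplified version of the shipped shape: only what the item itself computes.
function perItemKey(phaseId: string, model: string, t: Task): string {
  return hashParts(phaseId, t.agent, model, t.task);
}

const toTasks = (over: string[]): Task[] =>
  over.map((f) => ({ agent: "auditor", task: `audit ${f}` }));
```

Editing one element of `over` changes every naive key but only that element's per-item key. Including the agent still invalidates items when `phase.agent` changes.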
@@ -1635,6 +1811,12 @@ export interface PhaseCacheCtx {
1635
1811
  * key so two structurally-different flows that share a name can never
1636
1812
  * collide, and a changed flow never serves a stale cross-run hit. */
1637
1813
  flowDefHash?: string | "failed";
1814
+ /** Per-phase structural sub-fingerprint (M6). When present, folds into the
1815
+ * key as `v3:phasefp:<subfp>` so editing phase B invalidates only B + its
1816
+ * transitive dependents. When absent (sub-flow inner states, or a phase
1817
+ * for which per-phase soundness couldn't be guaranteed), `cacheKeys`
1818
+ * falls back to `flowDefHash` — preserving pre-M6 whole-flow behavior. */
1819
+ phaseFp?: string;
1638
1820
  /** Force this phase to re-execute, ignoring the within-run prior AND the
1639
1821
  * cross-run store (M5 recompute seed). Downstream phases are NOT forced —
1640
1822
  * they re-evaluate naturally: if the seed's new output changed their
@@ -1646,27 +1828,34 @@ export interface PhaseCacheCtx {
1646
1828
  /** A computed cache identity: the new (versioned) key plus the read-only
1647
1829
  * fallback keys used to honor entries written by older releases. The `key`
1648
1830
  * is what we WRITE under and what `PhaseState.inputHash` carries; the
1649
- * `legacyKey`/`bareKey` are consulted READ-ONLY on a miss so an upgrade
1650
- * never produces a miss-storm. See docs/internal/cache-migration.md. */
1831
+ * `v2Key`/`bareKey`/`legacyKey` are consulted READ-ONLY on a miss so an
1832
+ * upgrade never produces a miss-storm. See docs/internal/cache-migration.md. */
1651
1833
  export interface CacheKeys {
1652
- /** Current key: folds `v2:flowdef:<hash>` (the overstory content fingerprint). */
1834
+ /** Current key: folds `v3:phasefp:<subfp>` (the per-phase structural
1835
+ * sub-fingerprint; degrades to the whole-flow hash when per-phase
1836
+ * soundness couldn't be guaranteed). */
1653
1837
  key: string;
1654
- /** Pre-flowDefHash-era key: the flowdef line OMITTED entirely. Read-only. */
1655
- legacyKey: string;
1838
+ /** Pre-M6 key: `v2:flowdef:<flowDefHash>` (whole-flow fingerprint).
1839
+ * Read-only. */
1840
+ v2Key: string;
1656
1841
  /** Bare (unversioned) `flowdef:` key — written by pre-H1 code that folded
1657
1842
  * the hash without a `v2:` prefix. Read-only. Removed in v0.1.0. */
1658
1843
  bareKey: string;
1844
+ /** Pre-flowDefHash-era key: the flowdef line OMITTED entirely. Read-only. */
1845
+ legacyKey: string;
1659
1846
  }
1660
1847
 
1661
1848
  /** Fold the phase fingerprint into the base hash parts to form the cache keys.
1662
1849
  *
1663
- * Three keys are produced for backward compatibility (see
1850
+ * Four keys are produced for backward compatibility (see
1664
1851
  * docs/internal/cache-migration.md):
1665
- * - `key` : `v2:flowdef:<hash>` — the current write key.
1852
+ * - `key` : `v3:phasefp:<subfp>` — the current write key (per-phase
1853
+ * structural sub-fingerprint; falls back to the whole-flow hash when
1854
+ * `cc.phaseFp` is absent).
1855
+ * - `v2Key` : `v2:flowdef:<flowDefHash>` — pre-M6 whole-flow key.
1856
+ * - `bareKey` : bare `flowdef:<flowDefHash>` (unversioned) — pre-H1 entries.
1666
1857
  * - `legacyKey`: the flowdef line omitted — pre-flowDefHash entries.
1667
- * - `bareKey` : bare `flowdef:<hash>` (unversioned) — pre-H1 entries that
1668
- * folded the hash without the `v2:` prefix.
1669
- * `cachedPhase` consults all three READ-ONLY on a miss; `recordCache` writes
1858
+ * `cachedPhase` consults all four READ-ONLY on a miss; `recordCache` writes
1670
1859
  * only `key`. This means an upgrade never produces a miss-storm: existing
1671
1860
  * entries (whichever shape) still hit, and new writes converge on `key`. */
1672
1861
  export function cacheKeys(cc: PhaseCacheCtx, baseParts: string[]): CacheKeys {
@@ -1682,10 +1871,15 @@ export function cacheKeys(cc: PhaseCacheCtx, baseParts: string[]): CacheKeys {
1682
1871
  ];
1683
1872
  const fold = (parts: string[]): string =>
1684
1873
  cc.fingerprint ? hashInput(...parts, cc.fingerprint) : hashInput(...parts);
1874
+ // Per-phase sub-fingerprint; falls back to the whole-flow hash when absent
1875
+ // (sub-flow inner states, or soundness fallback) — preserving pre-M6 behavior.
1876
+ const fp = cc.phaseFp ?? cc.flowDefHash ?? "";
1877
+ const fdh = cc.flowDefHash ?? "";
1685
1878
  return {
1686
- key: fold([`flow:${cc.flowName}`, `v2:flowdef:${cc.flowDefHash ?? ""}`, ...tail]),
1879
+ key: fold([`flow:${cc.flowName}`, `v3:phasefp:${fp}`, ...tail]),
1880
+ v2Key: fold([`flow:${cc.flowName}`, `v2:flowdef:${fdh}`, ...tail]),
1881
+ bareKey: fold([`flow:${cc.flowName}`, `flowdef:${fdh}`, ...tail]),
1687
1882
  legacyKey: fold([`flow:${cc.flowName}`, ...tail]),
1688
- bareKey: fold([`flow:${cc.flowName}`, `flowdef:${cc.flowDefHash ?? ""}`, ...tail]),
1689
1883
  };
1690
1884
  }
1691
1885
 
@@ -1696,9 +1890,10 @@ export function cacheKeys(cc: PhaseCacheCtx, baseParts: string[]): CacheKeys {
1696
1890
  * - "cross-run": within-run first, then the persistent cross-run store.
1697
1891
  * On a cross-run hit, usage is zeroed and `cacheHit` records the source.
1698
1892
  *
1699
- * The cross-run read is THREE-TIER and READ-ONLY for fallback keys: it tries
1700
- * `keys.key` (current `v2:flowdef:` shape) first, then `keys.bareKey` (pre-H1
1701
- * bare `flowdef:`), then `keys.legacyKey` (pre-flowDefHash, no flowdef line).
1893
+ * The cross-run read is FOUR-TIER and READ-ONLY for fallback keys: it tries
1894
+ * `keys.key` (current `v3:phasefp:` shape) first, then `keys.v2Key` (pre-M6
1895
+ * `v2:flowdef:`), then `keys.bareKey` (pre-H1 bare `flowdef:`), then
1896
+ * `keys.legacyKey` (pre-flowDefHash, no flowdef line).
1702
1897
  * A hit on ANY tier is restored as a cache hit; we do NOT write-through (no
1703
1898
  * re-store under the new key) so the cache size stays stable and the legacy
1704
1899
  * entry ages out naturally. See docs/internal/cache-migration.md.
@@ -1707,14 +1902,17 @@ function cachedPhase(cc: PhaseCacheCtx, keys: CacheKeys): PhaseState | null {
1707
1902
  if (cc.scope === "off") return null;
1708
1903
  if (cc.forceRerun) return null;
1709
1904
 
1710
- // 1. within-run resume (fastest; always allowed unless scope is off)
1905
+ // 1. within-run resume (fastest; always allowed unless scope is off). Flag
1906
+ // it as a `run-only` cache hit so the run summary can count it as reused
1907
+ // work (it spent no new tokens). The prior usage is preserved verbatim so
1908
+ // the summary can report what the reuse would otherwise have cost.
1711
1909
  if (cc.prior && cc.prior.status === "done" && cc.prior.inputHash === keys.key) {
1712
- return { ...cc.prior, status: "done" };
1910
+ return { ...cc.prior, status: "done", cacheHit: "run-only" };
1713
1911
  }
1714
1912
 
1715
- // 2. cross-run memoization (opt-in) — three-tier read-only fallback.
1913
+ // 2. cross-run memoization (opt-in) — four-tier read-only fallback.
1716
1914
  if (cc.scope === "cross-run") {
1717
- for (const k of [keys.key, keys.bareKey, keys.legacyKey]) {
1915
+ for (const k of [keys.key, keys.v2Key, keys.bareKey, keys.legacyKey]) {
1718
1916
  const e = cc.store.get(k, cc.ttlMs);
1719
1917
  if (!e) continue;
1720
1918
  // If we stored the full PhaseState, restore it (preserving gate,
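The `cacheKeys` change and the four-tier read above combine as follows. The new write key folds the per-phase fingerprint, and the three older shapes remain readable. A hit on any tier is returned without being re-stored under the new key. The following sketch mirrors that ladder. `hashParts` is a hypothetical hash stand-in, a `Map` replaces the store, and the `fingerprint`/TTL handling is omitted:

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for the package's hash helper.
function hashParts(...parts: string[]): string {
  const h = createHash("sha256");
  for (const p of parts) h.update(p).update("\u0000");
  return h.digest("hex");
}

interface KeyCtx { flowName: string; flowDefHash?: string; phaseFp?: string }
interface Keys { key: string; v2Key: string; bareKey: string; legacyKey: string }

function buildKeys(cc: KeyCtx, tail: string[]): Keys {
  const fp = cc.phaseFp ?? cc.flowDefHash ?? ""; // per-phase fp, else whole-flow hash
  const fdh = cc.flowDefHash ?? "";
  const flow = `flow:${cc.flowName}`;
  return {
    key: hashParts(flow, `v3:phasefp:${fp}`, ...tail), // the only key ever written
    v2Key: hashParts(flow, `v2:flowdef:${fdh}`, ...tail),
    bareKey: hashParts(flow, `flowdef:${fdh}`, ...tail),
    legacyKey: hashParts(flow, ...tail),
  };
}

// Read-only ladder, newest shape first; no write-through on a fallback hit.
function lookup(store: Map<string, string>, keys: Keys): string | undefined {
  for (const k of [keys.key, keys.v2Key, keys.bareKey, keys.legacyKey]) {
    const e = store.get(k);
    if (e !== undefined) return e;
  }
  return undefined;
}
```

A sibling edit changes `flowDefHash` and therefore `v2Key`. It leaves `key` untouched as long as this phase's own fingerprint is stable.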
@@ -1895,6 +2093,22 @@ export interface RecomputeReport {
1895
2093
  /** Phases in the frontier whose inputHash did NOT move → cached result
1896
2094
  * reused, no re-execution (early cutoff). Empty in dry-run (unknowable). */
1897
2095
  readonly cutoff: readonly string[];
2096
+ /** Per-phase decision trace: WHY each phase was rerun / cut off / reused.
2097
+ * The "explainable reactivity" layer — like React DevTools telling you why
2098
+ * a component re-rendered. Additive; callers that ignore it are unaffected. */
2099
+ readonly decisions: readonly RecomputeDecision[];
2100
+ }
2101
+
2102
+ /** Why a single phase landed in its recompute outcome. */
2103
+ export interface RecomputeDecision {
2104
+ readonly phaseId: string;
2105
+ /** What happened (real run) or would happen (dry-run). */
2106
+ readonly outcome: "rerun" | "cutoff" | "reused" | "failed";
2107
+ /** Human-readable cause. */
2108
+ readonly reason: string;
2109
+ /** The upstream phase(s) that caused this outcome, when applicable
2110
+ * (e.g. the changed upstreams that forced a rerun). */
2111
+ readonly causedBy?: readonly string[];
1898
2112
  }
1899
2113
 
1900
2114
  /** Scan a flow for dependencies that cannot be observed through the readSet.
@@ -1946,6 +2160,30 @@ export async function recomputeTaskflow(
1946
2160
  const allIds = Object.keys(newState.phases);
1947
2161
 
1948
2162
  if (opts.dryRun) {
2163
+ // Explain each phase WITHOUT executing: a frontier phase "may rerun"
2164
+ // because it (transitively) reads a changed seed; everything else is
2165
+ // reused as unreachable. We name the in-frontier upstream(s) as the cause.
2166
+ const seedSet0 = new Set(seeds);
2167
+ const upstreamsOf = (id: string): string[] => {
2168
+ const observed = (newState.phases[id]?.reads ?? []).map((r) => r.stepId).filter((u) => u !== id);
2169
+ const decl = (declared.get(id) ?? []).filter((u) => u !== id);
2170
+ return [...new Set([...observed, ...decl])];
2171
+ };
2172
+ const decisions: RecomputeDecision[] = allIds.map((id) => {
2173
+ if (!frontier.has(id)) {
2174
+ return { phaseId: id, outcome: "reused", reason: "not reachable from any changed seed" };
2175
+ }
2176
+ if (seedSet0.has(id)) {
2177
+ return { phaseId: id, outcome: "rerun", reason: "forced by recompute request (seed)" };
2178
+ }
2179
+ const causes = upstreamsOf(id).filter((u) => frontier.has(u));
2180
+ return {
2181
+ phaseId: id,
2182
+ outcome: "rerun",
2183
+ reason: "reads a phase in the stale frontier; may re-run if that upstream's output moves",
2184
+ causedBy: causes.length ? causes : undefined,
2185
+ };
2186
+ });
1949
2187
  return {
1950
2188
  report: {
1951
2189
  dryRun: true,
@@ -1954,6 +2192,7 @@ export async function recomputeTaskflow(
1954
2192
  rerun: [...frontier],
1955
2193
  reused: allIds.filter((id) => !frontier.has(id)),
1956
2194
  cutoff: [],
2195
+ decisions,
1957
2196
  },
1958
2197
  state: newState,
1959
2198
  };
@@ -2003,6 +2242,11 @@ export async function recomputeTaskflow(
2003
2242
  .filter((id) => frontier.has(id));
2004
2243
  const rerun: string[] = [];
2005
2244
  const cutoff: string[] = [];
2245
+ const decisions: RecomputeDecision[] = [];
2246
+ // Phases whose OUTPUT actually moved this recompute (seed forced, or result
2247
+ // changed). Used to attribute a downstream rerun to the specific upstream(s)
2248
+ // that changed — the "why" of the decision trace.
2249
+ const outputMoved = new Set<string>();
2006
2250
  const noop = () => {};
2007
2251
  let aborted = false;
2008
2252
  for (const id of order) {
@@ -2015,17 +2259,50 @@ export async function recomputeTaskflow(
2015
2259
  const phase = newState.def.phases.find((p) => p.id === id);
2016
2260
  if (!phase) continue;
2017
2261
  const before = newState.phases[id]?.inputHash;
2018
- const execOpts = seedSet.has(id) ? { forceRerun: true } : undefined;
2262
+ const isSeed = seedSet.has(id);
2263
+ const execOpts = isSeed ? { forceRerun: true } : undefined;
2264
+ // The upstream(s) of this phase whose output moved — the cause of a rerun.
2265
+ const changedUpstreams = depsFor(id).filter((u) => outputMoved.has(u));
2019
2266
  try {
2020
2267
  const ps = await executePhase(phase, newState, deps, newState.phases[id], noop, 0, execOpts);
2021
2268
  newState.phases[id] = ps;
2022
2269
  // A phase counts as "rerun" if it was a forced seed OR its result moved;
2023
2270
  // otherwise it hit its cache (inputHash unchanged) → early cutoff.
2024
- if (seedSet.has(id) || ps.inputHash !== before) rerun.push(id);
2025
- else cutoff.push(id);
2271
+ if (isSeed || ps.inputHash !== before) {
2272
+ rerun.push(id);
2273
+ outputMoved.add(id);
2274
+ decisions.push(
2275
+ isSeed
2276
+ ? { phaseId: id, outcome: "rerun", reason: "forced by recompute request (seed)" }
2277
+ : {
2278
+ phaseId: id,
2279
+ outcome: "rerun",
2280
+ reason: "input changed — an upstream's output moved",
2281
+ causedBy: changedUpstreams.length ? changedUpstreams : undefined,
2282
+ },
2283
+ );
2284
+ } else {
2285
+ cutoff.push(id);
2286
+ decisions.push({
2287
+ phaseId: id,
2288
+ outcome: "cutoff",
2289
+ reason: "input unchanged — upstream(s) re-ran but produced identical output (early cutoff)",
2290
+ causedBy: depsFor(id).filter((u) => frontier.has(u)).length
2291
+ ? depsFor(id).filter((u) => frontier.has(u))
2292
+ : undefined,
2293
+ });
2294
+ }
2026
2295
  } catch {
2027
2296
  // A failing recompute phase is recorded as rerun (it was attempted).
2028
2297
  rerun.push(id);
2298
+ outputMoved.add(id);
2299
+ decisions.push({ phaseId: id, outcome: "failed", reason: "re-execution attempted but the phase failed" });
2300
+ }
2301
+ }
2302
+ // Frontier-external phases were never touched — record them as reused.
2303
+ for (const id of allIds) {
2304
+ if (!frontier.has(id)) {
2305
+ decisions.push({ phaseId: id, outcome: "reused", reason: "not reachable from any changed seed" });
2029
2306
  }
2030
2307
  }
2031
2308
  return {
@@ -2036,6 +2313,7 @@ export async function recomputeTaskflow(
2036
2313
  rerun,
2037
2314
  reused: allIds.filter((id) => !frontier.has(id)),
2038
2315
  cutoff,
2316
+ decisions,
2039
2317
  },
2040
2318
  state: newState,
2041
2319
  };
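The recompute loop above attributes each outcome to a cause. A seed is forced. A downstream phase whose input moved names the upstreams whose output moved. A phase whose input did not move is an early cutoff, and phases outside the frontier are reused. The following sketch reproduces that attribution on a toy DAG. `explain` and the `inputMoved` predicate are hypothetical: the predicate stands in for actually re-executing the phase and comparing `inputHash`. The `failed` outcome is omitted.

```typescript
type Outcome = "rerun" | "cutoff" | "reused";
interface Decision { phaseId: string; outcome: Outcome; causedBy?: string[] }

function explain(
  order: string[], // topological order of the frontier
  allIds: string[],
  frontier: Set<string>,
  seeds: Set<string>,
  deps: Record<string, string[]>, // phase -> direct upstreams
  inputMoved: (id: string) => boolean, // hypothetical: did re-evaluation move inputHash?
): Decision[] {
  const outputMoved = new Set<string>();
  const out: Decision[] = [];
  for (const id of order) {
    const ups = deps[id] ?? [];
    const changed = ups.filter((u) => outputMoved.has(u));
    if (seeds.has(id) || inputMoved(id)) {
      outputMoved.add(id);
      const causedBy = seeds.has(id) || changed.length === 0 ? undefined : changed;
      out.push({ phaseId: id, outcome: "rerun", causedBy });
    } else {
      // Upstreams re-ran but produced identical output: early cutoff.
      const inFrontier = ups.filter((u) => frontier.has(u));
      out.push({ phaseId: id, outcome: "cutoff", causedBy: inFrontier.length ? inFrontier : undefined });
    }
  }
  for (const id of allIds) if (!frontier.has(id)) out.push({ phaseId: id, outcome: "reused" });
  return out;
}
```

For seed `A` with dependents `B` (whose input moves) and `C` (whose input does not), plus an unrelated `D`, the result is: `A` rerun, `B` rerun caused by `A`, `C` cut off, and `D` reused.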
@@ -2099,6 +2377,27 @@ async function runTaskflowLayers(state: RunState, deps: RuntimeDeps): Promise<Ru
2099
2377
  }
2100
2378
  }
2101
2379
 
2380
+ // M6: per-phase structural sub-fingerprints. Computed once per run (when
2381
+ // cross-run is potentially active) so editing phase B invalidates only B +
2382
+ // its transitive dependents, not independent siblings. Each value is either
2383
+ // a precise per-phase hash or the whole-flow `flowDefHash` (soundness
2384
+ // fallback for shareContext / `flow` phases). Skipped entirely when
2385
+ // `flowDefHash === "failed"` (cross-run is disabled for the run anyway).
2386
+ // Never throws into the run — a per-phase error degrades that phase to the
2387
+ // whole-flow hash (safe, = pre-M6 behavior).
2388
+ if (state.flowDefHash !== "failed" && state.phaseFingerprints === undefined) {
2389
+ const whole = state.flowDefHash ?? "";
2390
+ const map: Record<string, string> = {};
2391
+ for (const p of def.phases) {
2392
+ try {
2393
+ map[p.id] = (await phaseFingerprint(def, p.id)) ?? whole;
2394
+ } catch {
2395
+ map[p.id] = whole; // fail-open → whole-flow scope
2396
+ }
2397
+ }
2398
+ state.phaseFingerprints = map;
2399
+ }
2400
+
2102
2401
  state.status = "running";
2103
2402
  safeEmit(deps, state);
2104
2403
 
@@ -2238,5 +2537,6 @@ async function runTaskflowLayers(state: RunState, deps: RuntimeDeps): Promise<Ru
2238
2537
  finalOutput,
2239
2538
  ok: state.status === "completed",
2240
2539
  totalUsage,
2540
+ reuse: summarizeReuse(state),
2241
2541
  };
2242
2542
  }
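The run result now carries `reuse: summarizeReuse(state)`, but `summarizeReuse` itself is not shown in this diff. The following is only a hypothetical sketch of a summary that could be built from the fields this release adds. It counts `cacheHit: "cross-run"` and `"run-only"` phases separately and sums the preserved prior usage of within-run hits, since the `cachedPhase` comment says that usage is kept verbatim while cross-run hits are zeroed. All names and shapes here are assumptions.

```typescript
// Hypothetical shapes; the real PhaseState/usage types are richer.
interface Usage { input: number; output: number }
interface PhaseSnap { status: string; cacheHit?: "cross-run" | "run-only"; usage?: Usage }
interface ReuseSummary { crossRun: number; runOnly: number; executed: number; priorTokensReused: number }

function summarize(phases: Record<string, PhaseSnap>): ReuseSummary {
  const s: ReuseSummary = { crossRun: 0, runOnly: 0, executed: 0, priorTokensReused: 0 };
  for (const p of Object.values(phases)) {
    if (p.status !== "done") continue;
    if (p.cacheHit === "cross-run") {
      s.crossRun++; // usage is zeroed on cross-run hits, so nothing to add
    } else if (p.cacheHit === "run-only") {
      s.runOnly++;
      // Within-run hits keep the prior attempt's usage: what the reuse would have cost.
      if (p.usage) s.priorTokensReused += p.usage.input + p.usage.output;
    } else {
      s.executed++;
    }
  }
  return s;
}
```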
@@ -284,6 +284,12 @@ export const TaskflowSchema = Type.Object(
284
284
  "Enable the Shared Context Tree for ALL phases in this flow (shorthand for setting shareContext on every phase). Default false.",
285
285
  }),
286
286
  ),
287
+ incremental: Type.Optional(
288
+ Type.Boolean({
289
+ description:
290
+ "Default every phase to cross-run caching (scope:'cross-run') so re-running this flow reuses unchanged phases across runs/sessions. Equivalent to setting cache:{scope:'cross-run'} on every phase; per-phase cache settings and the cross-run-blocked types (gate/approval/loop/tournament) still take precedence. Default false (run-only — each run starts fresh unless a phase opts in). A run-time `incremental` argument overrides this.",
291
+ }),
292
+ ),
287
293
  phases: Type.Array(PhaseSchema, { minItems: 1, description: "Ordered phase definitions (DAG via dependsOn)" }),
288
294
  },
289
295
  { additionalProperties: false },
@@ -855,6 +861,37 @@ export function dependenciesOf(phase: Phase): string[] {
855
861
  return Array.from(set);
856
862
  }
857
863
 
864
+ /**
865
+ * Transitive upstream dependency closure of a phase: every id reachable via
866
+ * `dependsOn ∪ from`, including indirect ancestors. Cycle-safe (visited set).
867
+ * Returns the closure EXCLUDING `phaseId` itself. Sorted for deterministic
868
+ * hashing. Shares the exact edge semantics with `topoLayers`/`detectCycle` so
869
+ * the closure is complete for every valid flow (validation already rejects
870
+ * `{steps.X}` refs that aren't reachable via these edges, except for
871
+ * `join: "any"` phases — handled by callers as needed).
872
+ *
873
+ * Hoisted out of `validateTaskflow` so `phaseFingerprint` (M6) and validation
874
+ * share one source of truth for "what does this phase structurally depend on".
875
+ */
876
+ export function transitiveDependencies(phases: Phase[], phaseId: string): string[] {
877
+ const byId = new Map(phases.map((p) => [p.id, p]));
878
+ const seen = new Set<string>();
879
+ const queue: string[] = [];
880
+ const seed = byId.get(phaseId);
881
+ if (seed) for (const d of dependenciesOf(seed)) queue.push(d);
882
+ while (queue.length) {
883
+ const id = queue.shift()!;
884
+ if (seen.has(id)) continue;
885
+ if (!byId.has(id)) continue; // unknown dep — validation reports elsewhere
886
+ seen.add(id);
887
+ const dep = byId.get(id)!;
888
+ for (const d of dependenciesOf(dep)) {
889
+ if (!seen.has(d)) queue.push(d);
890
+ }
891
+ }
892
+ return Array.from(seen).sort();
893
+ }
894
+
858
895
  /** Topologically ordered layers; phases in the same layer can run concurrently. */
859
896
  export function topoLayers(phases: Phase[]): Phase[][] {
860
897
  const byId = new Map(phases.map((p) => [p.id, p]));
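`transitiveDependencies` supplies the closure that a per-phase fingerprint hashes over. Hashing a phase together with its upstream closure means that editing `B` changes the fingerprints of `B` and everything downstream of it, but not those of an independent sibling. The following sketch pairs a BFS closure with such a fingerprint and the fail-open map built in `runTaskflowLayers`. `Phase`, `directDeps`, and `phaseFp` are hypothetical simplifications (this sketch assumes `from` names a single upstream id), and the sketch does not implement the soundness fallbacks (`shareContext`, `join: "any"`, sub-flows).

```typescript
import { createHash } from "node:crypto";

interface Phase { id: string; task: string; dependsOn?: string[]; from?: string }

// Edge set: dependsOn ∪ from (assumption: `from` is one upstream id).
function directDeps(p: Phase): string[] {
  const s = new Set(p.dependsOn ?? []);
  if (p.from) s.add(p.from);
  return [...s];
}

// BFS over upstream edges; visited set makes it cycle-safe; sorted for stable hashing.
function closure(phases: Phase[], id: string): string[] {
  const byId = new Map(phases.map((p) => [p.id, p]));
  const seen = new Set<string>();
  const start = byId.get(id);
  const queue: string[] = start ? directDeps(start) : [];
  while (queue.length > 0) {
    const cur = queue.shift()!;
    if (seen.has(cur) || !byId.has(cur)) continue;
    seen.add(cur);
    for (const d of directDeps(byId.get(cur)!)) if (!seen.has(d)) queue.push(d);
  }
  return [...seen].sort();
}

// Hypothetical per-phase fingerprint: the phase itself plus its upstream closure.
function phaseFp(phases: Phase[], id: string): string | null {
  const byId = new Map(phases.map((p) => [p.id, p]));
  if (!byId.has(id)) return null;
  const h = createHash("sha256");
  for (const pid of [id, ...closure(phases, id)]) h.update(JSON.stringify(byId.get(pid))).update("\n");
  return h.digest("hex");
}

// Fail-open: a null or a throw for one phase degrades it to the whole-flow hash.
function fingerprintAll(phases: Phase[], whole: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const p of phases) {
    try {
      out[p.id] = phaseFp(phases, p.id) ?? whole;
    } catch {
      out[p.id] = whole;
    }
  }
  return out;
}
```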
@@ -42,10 +42,11 @@ export interface PhaseState {
42
42
  model?: string;
43
43
  error?: string;
44
44
  inputHash?: string;
45
- /** When this result was served from cache: 'cross-run' for the persistent
46
- * cross-run store. (Within-run resume reuses prior state verbatim and is not
47
- * flagged here.) */
48
- cacheHit?: "cross-run";
45
+ /** When this result was served from cache instead of executed:
46
+ * 'cross-run' = restored from the persistent cross-run store;
47
+ * 'run-only' = within-run resume (a prior attempt with the same inputHash).
48
+ * A phase with this set spent no new tokens this run. */
49
+ cacheHit?: "cross-run" | "run-only";
49
50
  startedAt?: number;
50
51
  endedAt?: number;
51
52
  /** Live fan-out progress for map/parallel phases. */
@@ -114,6 +115,13 @@ export interface RunState {
114
115
  * recompute derives this fresh from `def` so old runs (pre-H1) also get
115
116
  * union semantics. */
116
117
  declaredDeps?: Record<string, DeclaredDeps>;
118
+ /** Per-phase structural sub-fingerprints (M6). Computed once per run
119
+ * alongside `flowDefHash`. Each value is either a precise per-phase hash
120
+ * (when sound) or the whole-flow `flowDefHash` (fallback for
121
+ * shareContext / `flow` phases). Folded into the cross-run cache key as
122
+ * `v3:phasefp:<subfp>` so editing phase B invalidates only B + its
123
+ * transitive dependents. Audit/resume only — recompute derives fresh. */
124
+ phaseFingerprints?: Record<string, string>;
117
125
  }
118
126
 
119
127
  // ---------------------------------------------------------------------------
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pi-taskflow",
3
- "version": "0.0.27",
3
+ "version": "0.0.28",
4
4
  "description": "A declarative, verifiable graph of task nodes for the Pi coding agent — not a workflow you script, but a DAG you declare: statically verified before it runs, with dynamic fan-out, gates, isolated subagent context, resumable runs, and saveable commands.",
5
5
  "keywords": [
6
6
  "pi-package",
@@ -549,10 +549,58 @@ Quick reference:
549
549
 
550
550
  - **Flow:** `name`, `description`, `concurrency` (default 8), `budget` (`maxUSD`/`maxTokens`), `agentScope` (user|project|both), `args`, `strictInterpolation`.
551
551
  - **Phase:** `model`, `thinking`, `tools` (whitelist), `cwd`, `output:"json"`, `concurrency` (map/parallel fan-out), `when`, `join` (all|any), `retry`, `use`/`with` (flow), `optional` (fail-soft — a failed/blocked phase won't abort the run), `final`.
552
- - **Cross-run caching:** add `cache: { "scope": "cross-run" }` to a phase to memoize its output across runs (same input → instant reuse, zero tokens). See `configuration.md` for `ttl`, `fingerprint` (git/glob/file/env invalidation), and scope options.
552
+ - **Cross-run caching:** add `cache: { "scope": "cross-run" }` to a phase to memoize its output across runs (same input → instant reuse, zero tokens), or set `incremental: true` at the flow level (or pass `incremental: true` to `run`) to default every phase to cross-run reuse. See `configuration.md` for `ttl`, `fingerprint` (git/glob/file/env invalidation), scope options, and the `incremental` precedence rules.
553
553
  - **Precedence (model/thinking/tools):** phase value → agent frontmatter (resolved via `modelRoles`) → global/default.
554
554
  - **Concurrency:** same-layer phases use `flow.concurrency`; a `map`/`parallel` phase uses `phase.concurrency ?? flow.concurrency ?? 8`.
555
555
 
556
+ ### Per-item map caching (cross-run)
557
+
558
+ A `map` phase with `cache: { "scope": "cross-run" }` is cached **per item**, not
559
+ just as a whole. When one of N items changes between runs, only that item
560
+ re-executes — the other N−1 are served from the cross-run cache for $0.
561
+
562
+ ```jsonc
563
+ { "id": "audit-each", "type": "map",
564
+ "over": "{steps.discover.json.files}", // array from an upstream phase
565
+ "task": "audit {item}",
566
+ "cache": { "scope": "cross-run" }, // ← enables per-item reuse
567
+ "dependsOn": ["discover"], "final": true }
568
+ ```
569
+
570
+ How it works:
571
+
572
+ - The **whole-map** entry is still checked first (fast path): an identical
573
+ re-run is a single $0 hit and never enters the fan-out.
574
+ - On a whole-map miss, each item is looked up individually before it spawns a
575
+ subagent; a hit returns a 0-token synthesized result. Successful fresh items
576
+ are recorded so a later run with that item unchanged reuses them.
577
+ - Per-item keys fold the item's resolved task **and agent** (so changing
578
+ `phase.agent` invalidates every item), plus the phase `model`,
579
+ `thinking`/`tools`, and any `fingerprint` entries. Unlike a standalone
580
+ cross-run phase, they omit the phase sub-fingerprint and flow hash, so editing `over` leaves unchanged items' keys stable.
581
+
582
+ Automatic fallbacks (per-item disables and the whole-map path is used):
583
+
584
+ - `shareContext: true` on the phase, or flow-wide `contextSharing: true` — a
585
+ sharing item can read sibling blackboard writes outside its declared deps, so
586
+ the per-item key would under-approximate real reads.
587
+ - The map runs **inside a runtime-generated sub-flow** (a `flow { def }` phase
588
+ or a `ctx_spawn({subflow})`) — untrusted / possibly non-deterministic.
589
+ - `scope: "run-only"` (default) or `"off"` — no persistent store to reuse from.
590
+
591
+ Notes & limitations:
592
+
593
+ - Duplicate items (identical task + agent) share a single entry — reuse is
594
+ content-addressable, not positional.
595
+ - Failed items and **budget-skipped** items are never cached, so they always
596
+ re-execute on the next run.
597
+ - `{steps.<map>.json[k]}` indexes the k-th **successful** item (not the k-th
598
+ position in `over`). The merged `output` text does keep positional labels
599
+ (`[k/N]` marks the k-th element of `over`), but budget-skipped items are omitted from it.
600
+ - Within-run resume of a partially-completed map is not supported (only
601
+ fully-completed maps resume within a run); cross-run per-item reuse covers the
602
+ common case.
603
+
556
604
  ## Actions
557
605
 
558
606
  - `action: "run"` — run an inline `define` (a one-off DAG) **or** a saved `name` (with optional `args`). Use `define` for an ad-hoc flow; use `name` to invoke something previously saved. Add `detach: true` to run in the background (returns immediately with the runId; poll the store for status).
@@ -283,6 +283,28 @@ for the design.
283
283
  | `cross-run` | Reuse an identical-input result from **any** prior run (the persistent store). |
284
284
  | `off` | Never reuse, even within a run (force re-execution every time). |
285
285
 
286
+ ### Flow-wide opt-in: `incremental`
287
+
288
+ Rather than annotating every phase with `cache: { "scope": "cross-run" }`, set
289
+ `incremental: true` at the **flow** level (or pass `incremental: true` as the
290
+ `run` tool argument) to default *every* phase to cross-run reuse:
291
+
292
+ ```jsonc
293
+ {
294
+ "name": "audit",
295
+ "incremental": true, // ← every phase defaults to scope:"cross-run"
296
+ "phases": [ /* ... */ ]
297
+ }
298
+ ```
299
+
300
+ Precedence: an explicit **per-phase** `cache` setting always wins; for phases
301
+ without one, the invocation `incremental` argument overrides the flow's
302
+ `incremental` field. The cross-run-blocked phase types (`gate`/`approval`/`loop`/
303
+ `tournament`) and all per-phase soundness fallbacks still apply. The default
304
+ remains `run-only` (each run starts fresh unless something opts in), because
305
+ cross-run reuse silently persists outputs and can serve stale results for phases
306
+ whose agents read files at runtime, unless a `fingerprint` covers those files.
307
+
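The precedence rules above can be summarized as a small resolution function. In this sketch, `effectiveScope` and `PhaseLike` are hypothetical names. The sketch also assumes that a blocked phase type simply degrades to `run-only`; the real runtime may instead reject or otherwise handle an explicit cross-run request on those types.

```typescript
type Scope = "off" | "run-only" | "cross-run";
interface PhaseLike { type: string; cache?: { scope?: Scope } }

const CROSS_RUN_BLOCKED = new Set(["gate", "approval", "loop", "tournament"]);

function effectiveScope(phase: PhaseLike, flowIncremental?: boolean, runIncremental?: boolean): Scope {
  let scope: Scope;
  if (phase.cache?.scope) {
    scope = phase.cache.scope; // 1. an explicit per-phase setting always wins
  } else {
    // 2. the run argument overrides the flow field; the default is run-only
    const incremental = runIncremental ?? flowIncremental ?? false;
    scope = incremental ? "cross-run" : "run-only";
  }
  // 3. assumption: blocked types never get cross-run reuse
  if (scope === "cross-run" && CROSS_RUN_BLOCKED.has(phase.type)) return "run-only";
  return scope;
}
```

For example, with `incremental: true` on the flow, a plain `map` phase resolves to `cross-run`. Passing `incremental: false` to `run` turns it back to `run-only`, and a phase with `cache: { "scope": "off" }` stays `off` either way.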
286
308
  ### `ttl` (cross-run only)
287
309
 
288
310
  Max age before a cross-run hit is treated as a miss: e.g. `"30m"`, `"6h"`, `"7d"`.