pi-taskflow 0.0.18 → 0.0.19

package/CHANGELOG.md CHANGED
@@ -2,6 +2,18 @@
2
2
 
3
3
  All notable changes to pi-taskflow are documented here. This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) format.
4
4
 
5
+ ## [0.0.19] — 2026-06-10
6
+
7
+ ### Documentation
8
+ - **Closed the SKILL coverage gap — the LLM can now author every shipped feature.** A schema-vs-SKILL.md audit (`docs/internal/skill-coverage-audit.md`, machine-checked + cross-adversarial reviewed) found several implemented + tested features that were undocumented in the LLM-facing skill, so the model never generated them. All ~46 user-facing schema fields are now documented across SKILL.md + configuration.md.
9
+ - **SKILL.md**: phase-type table now lists all 9 types (added `loop`, `tournament`) with a “details” column pointing each to its section; new **Loop phases** (`until`/`maxIterations`/`convergence`) and **Tournament phases** (`variants`/`judge`/`mode`/`judgeAgent`) sections; `eval` (zero-token machine gate) and `onBlock: "retry"` (self-healing rework loop) folded into the Gate section; cross-run `cache` pointer + `optional` + static `branches` notes.
10
+ - **SKILL.md**: new **Operating a run** section — run lifecycle (`running → completed/blocked/failed/paused`), cache-aware resume, when to resume vs. re-run, budget-mid-run behavior, and run inspection. Clarified action semantics (`define` vs `name`, save scope/collision, `verify`/`agents` actions).
11
+ - **configuration.md**: new **§2.1 Context pre-reading** (`context`/`contextLimit` — resolution order, per-file 8000-char cap, 200k total cap) and **§8 Cross-run caching** (`cache.scope`, `ttl`, full `fingerprint` prefix table for git/glob/glob!/file/env). Fixed a stale “5 phase types” → 9 cross-file drift.
12
+ - Every documented JSON example validates against the live schema; all run-status/resume claims verified against the runtime (`blocked` is terminal; `paused`/`failed` are resumable). 560 tests pass, zero regression.
13
+
14
+ ### CI
15
+ - GitHub Packages publish is now best-effort (`continue-on-error`) so an unscoped-package 404 there can never block the npm publish or the GitHub Release.
16
+
5
17
  ## [0.0.18] — 2026-06-09
6
18
 
7
19
  ### Added
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pi-taskflow",
3
- "version": "0.0.18",
3
+ "version": "0.0.19",
4
4
  "description": "A declarative, verifiable graph of task nodes for the Pi coding agent — not a workflow you script, but a DAG you declare: statically verified before it runs, with dynamic fan-out, gates, isolated subagent context, resumable runs, and saveable commands.",
5
5
  "keywords": [
6
6
  "pi-package",
@@ -79,15 +79,17 @@ Call the `taskflow` tool. To run a brand-new flow you write inline, pass
79
79
 
80
80
  ### Phase types
81
81
 
82
- | type | meaning |
83
- |------|---------|
84
- | `agent` | one subagent runs `task` |
85
- | `parallel` | run `branches[]` concurrently |
86
- | `map` | fan out over `over` (an array) — one subagent per item, `{item}` bound |
87
- | `gate` | quality/review step that can **halt the flow** (see below) |
88
- | `reduce` | aggregate `from[]` phases into one output |
89
- | `approval` | **human-in-the-loop** pause: ask a person to approve / reject / edit before continuing |
90
- | `flow` | run a **sub-flow** as one phase — **saved** (`use`) or **runtime-generated** (`def`) |
82
+ | type | meaning | details |
83
+ |------|---------|---------|
84
+ | `agent` | one subagent runs `task` | DSL shape |
85
+ | `parallel` | run `branches[]` concurrently | Conditional routing |
86
+ | `map` | fan out over `over` (an array) — one subagent per item, `{item}` bound | DSL shape |
87
+ | `gate` | quality/review step that can **halt the flow** | Gate phases |
88
+ | `reduce` | aggregate `from[]` phases into one output | DSL shape |
89
+ | `approval` | **human-in-the-loop** pause: ask a person to approve / reject / edit before continuing | Approval phases |
90
+ | `flow` | run a **sub-flow** as one phase — **saved** (`use`) or **runtime-generated** (`def`) | Sub-flows |
91
+ | `loop` | repeat a body until a condition / convergence / `maxIterations` | Loop phases |
92
+ | `tournament` | run N competing `variants`, a `judge` picks the best or aggregates | Tournament phases |
91
93
 
92
94
  ### Control-flow fields (any phase)
93
95
 
@@ -100,7 +102,9 @@ Call the `taskflow` tool. To run a brand-new flow you write inline, pass
100
102
  ### Conditional routing (when + gate/branches)
101
103
 
102
104
  Pair `when` with an upstream phase that emits a decision to build real if/else
103
- routing. Use `join: "any"` on the merge phase so it runs whichever branch fired:
105
+ routing. Use `join: "any"` on the merge phase so it runs whichever branch fired. For
106
+ static (non-conditional) concurrency, a `parallel` phase runs fixed `branches[]`
107
+ instead — `{ "type": "parallel", "branches": [{"task":"..."}, {"task":"...","agent":"reviewer"}] }`.
104
108
 
105
109
  ```jsonc
106
110
  { "id": "triage", "type": "agent", "agent": "analyst", "output": "json",
@@ -176,6 +180,62 @@ so round N's plan depends on round N-1's **result** (not a one-shot fan-out):
176
180
  the declarative equivalent of `for (...) { read result; decide next }`. See
177
181
  `examples/dynamic-plan-execute.json` and `examples/iterative-replan.json`.
178
182
 
183
+ ### Loop phases (iterate until done)
184
+
185
+ A `loop` phase runs its body repeatedly, exposing each iteration's output as
186
+ `{steps.<thisId>.output}` / `.json` so the next round can react to the last. It
187
+ stops on the first of: `until` truthy, **convergence** (output stops changing),
188
+ or `maxIterations` (hard cap). This is the declarative "keep going until good
189
+ enough" — the runtime always terminates (the cap is mandatory).
190
+
191
+ - `until` — stop condition, same operators as `when` (a parse error stops the loop, fail-safe).
192
+ - `maxIterations` — hard iteration cap (required to bound the loop).
193
+ - `convergence` — `true` to stop early when an iteration's output equals the previous one.
194
+
195
+ ```jsonc
196
+ {
197
+ "id": "refine",
198
+ "type": "loop",
199
+ "agent": "executor",
200
+ "maxIterations": 5,
201
+ "until": "{steps.refine.json.done} == true",
202
+ "convergence": true,
203
+ "task": "Improve the draft. When nothing else needs fixing, output JSON {\"done\":true,\"draft\":\"...\"}; otherwise {\"done\":false,\"draft\":\"...\"}.",
204
+ "output": "json",
205
+ "final": true
206
+ }
207
+ ```
208
+
209
+ For data-dependent **replanning** each round, pair a `loop` body that emits a
210
+ plan with `flow{def}` (see Sub-flows above). See `examples/iterative-replan.json`.
211
+
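The three stop conditions described for `loop` phases can be sketched as follows. This is an illustrative model only, not the package's runtime: `runLoop`, `LoopResult`, and the callback shapes are hypothetical names, and `until` is modelled as a plain predicate rather than the `when` expression syntax.

```typescript
// Hypothetical sketch of loop-phase termination; names are illustrative, not the package's API.
type LoopStop = "until" | "convergence" | "maxIterations";

interface LoopResult {
  iterations: number;
  output: string;
  stoppedBy: LoopStop;
}

function runLoop(
  body: (previous: string | undefined, iteration: number) => string,
  opts: {
    maxIterations: number;               // mandatory hard cap
    until?: (output: string) => boolean; // stop condition, checked after each round
    convergence?: boolean;               // stop when the output equals the previous round's
  },
): LoopResult {
  if (!Number.isInteger(opts.maxIterations) || opts.maxIterations < 1) {
    throw new Error("maxIterations must be a positive integer");
  }
  let previous: string | undefined;
  for (let i = 1; i <= opts.maxIterations; i++) {
    const output = body(previous, i);
    if (opts.until?.(output)) return { iterations: i, output, stoppedBy: "until" };
    if (opts.convergence && output === previous) {
      return { iterations: i, output, stoppedBy: "convergence" };
    }
    previous = output;
  }
  // The cap was reached without `until` or convergence firing.
  return { iterations: opts.maxIterations, output: previous as string, stoppedBy: "maxIterations" };
}
```

Because the cap is validated up front, the loop is bounded regardless of what the body returns, which mirrors the "always terminates" guarantee stated above.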
212
+ ### Tournament phases (N variants, judge picks best)
213
+
214
+ A `tournament` phase runs `variants` competing attempts in parallel, then a
215
+ **judge** sub-phase selects the winner (`mode: "best"`) or merges them
216
+ (`mode: "aggregate"`). Use it when one shot is unreliable and you want the best
217
+ of several drafts, or a synthesis of diverse approaches.
218
+
219
+ - `variants` — the competing attempts: a number (run the same `task` N times) or an array of `{task, agent?}` for genuinely different approaches.
220
+ - `mode` — `"best"` (judge picks one winner, default) or `"aggregate"` (judge merges all into one output).
221
+ - `judge` — the judge's rubric/instructions (how to choose or merge).
222
+ - `judgeAgent` — *(optional)* the agent that runs the judge step; defaults to the phase `agent`.
223
+ - Fail-open: if the judge's pick is unparseable, variant 1 is returned (work is never lost).
224
+
225
+ ```jsonc
226
+ {
227
+ "id": "headline",
228
+ "type": "tournament",
229
+ "agent": "executor",
230
+ "variants": 3,
231
+ "mode": "best",
232
+ "judge": "Pick the clearest, most accurate headline. End with: WINNER: <n>.",
233
+ "task": "Write one headline for the article below.\n\n{steps.draft.output}",
234
+ "dependsOn": ["draft"],
235
+ "final": true
236
+ }
237
+ ```
238
+
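The fail-open rule for `mode: "best"` can be illustrated with a small parser. This is a sketch under assumptions: `pickWinner` is a hypothetical helper, and the `WINNER: <n>` convention is taken from the judge prompt in the example above, not from a documented output format.

```typescript
// Hypothetical helper: select the judged variant, falling back to variant 1
// when the judge's pick cannot be parsed or is out of range.
function pickWinner(judgeOutput: string, variants: string[]): string {
  const first = variants[0];
  if (first === undefined) throw new Error("no variants to choose from");
  const match = /WINNER:\s*(\d+)/i.exec(judgeOutput);
  // The judge's pick is 1-based; an unparseable pick maps to index 0.
  const index = match ? Number(match[1]) - 1 : 0;
  return variants[index] ?? first;
}
```

Falling back to the first variant means a malformed judge response costs the selection step but never the generated work.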
179
239
  ### Budget (cost / token caps)
180
240
 
181
241
  Add a run-wide ceiling at the top level. When accumulated cost/tokens exceed it,
@@ -206,6 +266,30 @@ Review the audit results below. If any endpoint is missing auth, end with
206
266
  {steps.audit.output}
207
267
  ```
208
268
 
269
+ **Zero-token machine checks (`eval`).** Before spending a token on the LLM gate,
270
+ list machine-checkable assertions in `eval`. If **all** pass, the gate
271
+ auto-passes with **no LLM call**; if any fails, it falls through to the LLM
272
+ `task` (the qualitative residue). Each entry supports the `when` operators plus
273
+ `X contains Y` (substring). A parse error fails **open** (consistent with the
274
+ gate invariant).
275
+
276
+ ```jsonc
277
+ { "id": "quality", "type": "gate", "dependsOn": ["build","test"],
278
+ "eval": ["{steps.build.output} contains BUILD SUCCESS", "{steps.test.json.failures} == 0"],
279
+ "task": "Review the diff for subtle logic errors a linter can't catch. VERDICT: PASS or BLOCK." }
280
+ ```
281
+
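The routing described for `eval` (all assertions pass, so no LLM call; any failure falls through to `task`) might look like the sketch below. It assumes the entries are already interpolated and supports only `contains` and `==`; `evalAssertion` and `routeGate` are hypothetical names, and parse-error handling is deliberately not modelled.

```typescript
// Hypothetical evaluator for already-interpolated eval entries.
// Only two operators are covered here; the real set follows the `when` operators.
function evalAssertion(entry: string): boolean {
  const containsAt = entry.indexOf(" contains ");
  if (containsAt >= 0) {
    const haystack = entry.slice(0, containsAt);
    const needle = entry.slice(containsAt + " contains ".length);
    return haystack.includes(needle);
  }
  const eqAt = entry.indexOf(" == ");
  if (eqAt >= 0) {
    return entry.slice(0, eqAt).trim() === entry.slice(eqAt + " == ".length).trim();
  }
  throw new Error(`unsupported assertion in this sketch: ${entry}`);
}

// "auto-pass" skips the LLM entirely; "llm-task" falls through to the gate's task prompt.
function routeGate(evalEntries: string[]): "auto-pass" | "llm-task" {
  const allPass = evalEntries.length > 0 && evalEntries.every((e) => evalAssertion(e));
  return allPass ? "auto-pass" : "llm-task";
}
```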
282
+ **Self-healing (`onBlock: "retry"`).** By default a blocking gate halts the run
283
+ (`onBlock: "halt"`). With `onBlock: "retry"` the gate instead **re-runs its
284
+ upstream `dependsOn` phases and re-evaluates**, up to `retry.max` rounds (or
285
+ until PASS / budget / abort) — a generate→critique→regenerate rework loop.
286
+
287
+ ```jsonc
288
+ { "id": "spec-gate", "type": "gate", "onBlock": "retry", "retry": { "max": 3 },
289
+ "dependsOn": ["implement"],
290
+ "task": "Does the implementation satisfy ALL acceptance criteria? VERDICT: PASS or BLOCK with reasons." }
291
+ ```
292
+
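The rework loop behind `onBlock: "retry"` can be sketched as below. The function and parameter names are hypothetical, budget and abort handling are left out, and the sketch assumes `retry.max` counts re-runs after the first evaluation, which the text above does not spell out.

```typescript
type Verdict = "PASS" | "BLOCK";

// Hypothetical model: re-run the upstream phases and re-evaluate the gate
// until it passes or the retry allowance is used up.
function runGateWithRetry(
  runUpstream: (round: number) => string,
  judge: (upstreamOutput: string) => Verdict,
  retryMax: number,
): { verdict: Verdict; upstreamRuns: number } {
  let upstreamRuns = 1;
  let verdict = judge(runUpstream(upstreamRuns));
  // upstreamRuns - 1 is the number of retries consumed so far.
  while (verdict === "BLOCK" && upstreamRuns - 1 < retryMax) {
    upstreamRuns++;
    verdict = judge(runUpstream(upstreamRuns));
  }
  return { verdict, upstreamRuns };
}
```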
209
293
  ### Structured-verify phases (v0.0.8.1)
210
294
 
211
295
  A "verify" phase typically runs `npx tsc --noEmit && npm test && git diff --stat`
@@ -343,16 +427,26 @@ variables, and storage paths — read `configuration.md` (next to this file).
343
427
  Quick reference:
344
428
 
345
429
  - **Flow:** `name`, `description`, `concurrency` (default 8), `budget` (`maxUSD`/`maxTokens`), `agentScope` (user|project|both), `args`, `strictInterpolation`.
346
- - **Phase:** `model`, `thinking`, `tools` (whitelist), `cwd`, `output:"json"`, `concurrency` (map/parallel fan-out), `when`, `join` (all|any), `retry`, `use`/`with` (flow), `final`.
430
+ - **Phase:** `model`, `thinking`, `tools` (whitelist), `cwd`, `output:"json"`, `concurrency` (map/parallel fan-out), `when`, `join` (all|any), `retry`, `use`/`with` (flow), `optional` (fail-soft — a failed/blocked phase won't abort the run), `final`.
431
+ - **Cross-run caching:** add `cache: { "scope": "cross-run" }` to a phase to memoize its output across runs (same input → instant reuse, zero tokens). See `configuration.md` for `ttl`, `fingerprint` (git/glob/file/env invalidation), and scope options.
347
432
  - **Precedence (model/thinking/tools):** phase value → agent frontmatter (resolved via `modelRoles`) → global/default.
348
433
  - **Concurrency:** same-layer phases use `flow.concurrency`; a `map`/`parallel` phase uses `phase.concurrency ?? flow.concurrency ?? 8`.
349
434
 
350
435
  ## Actions
351
436
 
352
- - `action: "run"` — run inline `define` or a saved `name` (with optional `args`).
353
- - `action: "save"` — persist `define` (scope `project` or `user`); becomes `/tf:<name>`.
354
- - `action: "resume"` — continue a paused/failed run by `runId` (completed phases are cached).
355
- - `action: "list"` — list saved flows.
437
+ - `action: "run"` — run an inline `define` (a one-off DAG) **or** a saved `name` (with optional `args`). Use `define` for an ad-hoc flow; use `name` to invoke something previously saved.
438
+ - `action: "save"` — persist `define` (scope `project` — default, committed/shared — or `user`); it becomes `/tf:<name>`. On a name collision, project overrides user.
439
+ - `action: "resume"` — continue a paused/failed run by `runId`.
440
+ - `action: "list"` — list saved flows. `action: "verify"` — static-check a `define` (zero tokens). `action: "agents"` — list available agents.
441
+
442
+ ## Operating a run (lifecycle, resume, inspection)
443
+
444
+ A run moves through: **running →** `completed` (a `final` phase produced output) **/** `blocked` (a gate emitted BLOCK, an `approval` was rejected, or the `budget` cap was hit) **/** `failed` (a non-`optional` phase errored) **/** `paused` (the run was aborted). `failed` and `paused` runs are resumable; `blocked` is terminal (fix the gate/budget and re-run).
445
+
446
+ - **Resume is cache-aware.** `action: "resume"` re-runs only what didn't finish: every phase already `done` is reused from its recorded output (within-run cache), so resuming after a crash or a `paused`/`failed` stop never repeats completed work. A phase that was mid-flight is re-executed cleanly (stale `error`/`endedAt` are cleared first).
447
+ - **When to resume vs. re-run.** Resume when the inputs are unchanged and you just want to continue/retry the tail (fixed a gate, raised the budget, approved a checkpoint). Re-run from scratch when the task or upstream inputs changed — resume would reuse now-stale outputs. (For reuse *across* runs, opt a phase into `cache: {scope:"cross-run"}` — see configuration.md.)
448
+ - **Budget mid-run.** When the run-wide `budget` is exceeded, remaining phases are skipped and an in-flight `map`/`parallel` stops spawning new items; the run ends `blocked` with the partial outputs preserved.
449
+ - **Inspect runs.** `/tf runs` lists recent runs with status; `/tf show <name>` prints a saved flow's definition. Run state lives at `<project .pi>/taskflows/runs/<runId>.json` (gitignored).
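The lifecycle and resume rules above reduce to two small checks, sketched here. The run statuses come from the text; the phase statuses other than `done` and both function names are assumptions made for illustration.

```typescript
type RunStatus = "running" | "completed" | "blocked" | "failed" | "paused";
// Only `done` is named in the docs; the other phase statuses are hypothetical.
type PhaseStatus = "pending" | "running" | "done" | "failed";

// `failed` and `paused` runs can be resumed; `blocked` is terminal.
function isResumable(status: RunStatus): boolean {
  return status === "failed" || status === "paused";
}

// On resume, every phase already `done` is reused; everything else runs again.
function phasesToRerun(phases: Record<string, PhaseStatus>): string[] {
  return Object.keys(phases).filter((id) => phases[id] !== "done");
}
```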
356
450
 
357
451
  ## User commands
358
452
 
@@ -50,7 +50,7 @@ Keys of each object in `phases[]`. Some only apply to specific `type`s.
50
50
  ```jsonc
51
51
  {
52
52
  "id": "audit", // required, unique — referenced via {steps.audit.output}
53
- "type": "map", // agent | parallel | map | gate | reduce (default: agent)
53
+ "type": "map", // agent | parallel | map | gate | reduce | approval | flow | loop | tournament (default: agent)
54
54
  "agent": "analyst", // agent name to run this phase
55
55
  "task": "Audit {item.route}…",
56
56
  "dependsOn": ["discover"],// DAG edges
@@ -71,7 +71,7 @@ Keys of each object in `phases[]`. Some only apply to specific `type`s.
71
71
  | Key | Applies to | Default | Notes |
72
72
  |-----|-----------|---------|-------|
73
73
  | `id` | all | — | **Required, unique.** Used in `{steps.<id>…}`. |
74
- | `type` | all | `agent` | One of the 5 phase types. |
74
+ | `type` | all | `agent` | One of the 9 phase types (agent, parallel, map, gate, reduce, approval, flow, loop, tournament). |
75
75
  | `agent` | all | first available | Agent name; resolved from the scoped pool. |
76
76
  | `task` | agent, gate, map, reduce | — | Prompt; supports interpolation. Required for these types. |
77
77
  | `over` | map | — | **Required for map.** Must resolve to an array. |
@@ -85,8 +85,52 @@ Keys of each object in `phases[]`. Some only apply to specific `type`s.
85
85
  | `tools` | all | agent default | Whitelist of tools for the subagent. See §5. |
86
86
  | `cwd` | all | flow cwd | Run this phase's subagent in a different directory. |
87
87
  | `concurrency` | map, parallel | flow concurrency | Fan-out cap for this phase only. See §4. |
88
+ | `context` | all | — | File paths / `{steps.X}` refs to **pre-read and inject** before the task. See §2.1. |
89
+ | `contextLimit` | all | `8000` | Max characters read **per file** in `context`. See §2.1. |
90
+ | `cache` | all | `run-only` | Per-phase cache policy (`scope`/`ttl`/`fingerprint`). See §8. |
88
91
  | `final` | all | last phase | Exactly one phase may be `final`; its output is returned. |
89
92
 
93
+ > Gate-only control fields (`eval`, `onBlock`) and the loop/tournament control
94
+ > fields (`until`/`maxIterations`/`convergence`, `variants`/`judge`/`judgeAgent`/`mode`)
95
+ > are documented in `SKILL.md` next to their phase types.
96
+
97
+ ---
98
+
99
+ ## 2.1 Context pre-reading (`context` / `contextLimit`)
100
+
101
+ Instead of making a subagent *discover* files by exploring (an O(N²) turn-cost
102
+ spiral), you can **pre-read** known files and inject their contents ahead of the
103
+ task prompt. List file paths and/or `{steps.X}` refs in `context`; the runtime
104
+ resolves interpolated refs first, then reads each file and prepends labeled
105
+ blocks to the task.
106
+
107
+ ```jsonc
108
+ {
109
+ "id": "review",
110
+ "type": "agent",
111
+ "agent": "reviewer",
112
+ "context": ["src/auth.ts", "src/middleware.ts", "{steps.spec.output}"],
113
+ "contextLimit": 12000,
114
+ "task": "Review the auth flow against the spec above. VERDICT: PASS or BLOCK.",
115
+ "dependsOn": ["spec"]
116
+ }
117
+ ```
118
+
119
+ **Behavior & limits (all enforced in the runtime):**
120
+
121
+ | Aspect | Rule |
122
+ |--------|------|
123
+ | Resolution order | interpolate `{steps.X}` / `{args.X}` refs **first**, then read file paths. |
124
+ | Per-file cap | `contextLimit` characters per file (default **8000**); longer files are truncated with a marker. |
125
+ | Total cap | the combined injected block is hard-capped at **200,000 chars**; overflow is truncated with a notice. |
126
+ | Unreadable file | skipped with a `console.warn` (never aborts the phase). |
127
+ | JSON-looking entry | a value that looks like a JSON blob (not a path) is diagnosed and skipped, not read as a file. |
128
+
129
+ Use `context` for **known, bounded** inputs (a handful of source files, an
130
+ upstream phase's output). For large/unknown exploration, let the agent use its
131
+ `read`/`grep` tools instead — pre-reading hundreds of files just hits the total
132
+ cap.
133
+
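The two caps in the table above compose roughly as follows. This is a sketch, not the runtime's code: `buildContextBlock`, the block labels, and the truncation marker strings are hypothetical, and whether the real total cap includes the notice text is not stated.

```typescript
const DEFAULT_CONTEXT_LIMIT = 8000; // per-file default from the table above
const TOTAL_CONTEXT_CAP = 200000;   // hard cap on the combined injected block

// Hypothetical assembly of the injected context from files that were already read.
function buildContextBlock(
  files: Array<{ path: string; content: string }>,
  contextLimit: number = DEFAULT_CONTEXT_LIMIT,
): string {
  const blocks = files.map(({ path, content }) => {
    // Per-file cap: keep the first `contextLimit` characters and add a marker.
    const body = content.length > contextLimit
      ? content.slice(0, contextLimit) + "\n[truncated]"
      : content;
    return `--- ${path} ---\n${body}`;
  });
  const joined = blocks.join("\n\n");
  // Total cap: truncate the combined block and append a notice.
  return joined.length > TOTAL_CONTEXT_CAP
    ? joined.slice(0, TOTAL_CONTEXT_CAP) + "\n[context truncated]"
    : joined;
}
```

With the default per-file limit, roughly 25 full-size files are enough to reach the total cap, which is why bulk pre-reading gains little.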
90
134
  ---
91
135
 
92
136
  ## 3. Declaring & passing arguments
@@ -209,7 +253,73 @@ Taskflow shares the subagent settings file at `~/.pi/agent/settings.json`:
209
253
 
210
254
  ---
211
255
 
212
- ## 8. Environment variables
256
+ ## 8. Cross-run caching (`cache`)
257
+
258
+ By default every phase is **`run-only`**: completed phases are reused only when
259
+ you *resume the same run* (the historical behavior). Opt a phase into the
260
+ persistent **cross-run** memoization store to reuse an identical-input result
261
+ from *any prior run* — instant, zero tokens. See `docs/rfc-cross-run-memoization.md`
262
+ for the design.
263
+
264
+ ```jsonc
265
+ {
266
+ "id": "summarize-deps",
267
+ "type": "agent",
268
+ "agent": "writer",
269
+ "task": "Summarize the dependency tree of this repo.",
270
+ "cache": {
271
+ "scope": "cross-run",
272
+ "ttl": "6h",
273
+ "fingerprint": ["git:HEAD", "file:package-lock.json"]
274
+ }
275
+ }
276
+ ```
277
+
278
+ ### `scope`
279
+
280
+ | Value | Meaning |
281
+ |-------|---------|
282
+ | `run-only` (default) | Reuse only within a resumed run — exactly the historical behavior. |
283
+ | `cross-run` | Reuse an identical-input result from **any** prior run (the persistent store). |
284
+ | `off` | Never reuse, even within a run (force re-execution every time). |
285
+
286
+ ### `ttl` (cross-run only)
287
+
288
+ Max age before a cross-run hit is treated as a miss: e.g. `"30m"`, `"6h"`, `"7d"`.
289
+ Omit for no time bound. A hit older than the TTL re-executes the phase.
290
+
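A TTL check consistent with the examples above (`"30m"`, `"6h"`, `"7d"`) could be sketched like this. Only the three units shown in the docs are handled; `parseTtl` and `isFresh` are hypothetical names, and any other accepted syntax is unknown.

```typescript
// Hypothetical parser for the TTL strings shown above: <integer><m|h|d>.
function parseTtl(ttl: string): number {
  const match = /^(\d+)([mhd])$/.exec(ttl);
  if (!match) throw new Error(`unsupported ttl in this sketch: ${ttl}`);
  const [, amount, unit] = match;
  const factor = unit === "m" ? 60000 : unit === "h" ? 3600000 : 86400000;
  return Number(amount) * factor; // milliseconds
}

// A stored result is a hit only while it is no older than the TTL;
// an omitted TTL means no time bound.
function isFresh(storedAtMs: number, nowMs: number, ttl?: string): boolean {
  return ttl === undefined || nowMs - storedAtMs <= parseTtl(ttl);
}
```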
291
+ ### `fingerprint` (cross-run only)
292
+
293
+ The cache key is normally `phaseId + agent + model + interpolated-task`. A
294
+ fingerprint folds **“did the world change?”** signals into that key, so an
295
+ external change becomes a cache **miss** even when the task text is identical.
296
+ Each entry is one of:
297
+
298
+ | Entry | Becomes a miss when… | Resolves to |
299
+ |-------|----------------------|-------------|
300
+ | `git:HEAD` / `git:<ref>` | the commit moves | the resolved SHA (30s timeout → `<timeout>`; no git → `<no-git>`) |
301
+ | `glob:<pattern>` | the **set of matching paths** changes | sorted path list (mtime-free) |
302
+ | `glob!:<pattern>` | the **contents** of matching files change | content hashes (capped at 5000 matches) |
303
+ | `file:<path>` | that file's content changes | sha256 of the file (>10 MB or missing → `<skip>`/`<missing>`) |
304
+ | `env:<NAME>` | the env var changes | the env value |
305
+
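How a fingerprint turns an external change into a cache miss can be shown with a simplified key builder. This is an assumption-laden sketch: `cacheKey` and `KeyInput` are hypothetical, the fingerprint entries are passed in already resolved (for example a commit SHA), and a real store would presumably hash the result rather than use the raw string.

```typescript
interface KeyInput {
  phaseId: string;
  agent: string;
  model: string;
  task: string;                    // the interpolated task text
  resolvedFingerprint?: string[];  // e.g. a resolved SHA or a file hash
}

// Hypothetical key: the base fields plus every resolved fingerprint value, in order.
function cacheKey(input: KeyInput): string {
  const parts = [
    input.phaseId,
    input.agent,
    input.model,
    input.task,
    ...(input.resolvedFingerprint ?? []),
  ];
  // JSON encoding keeps field boundaries unambiguous.
  return JSON.stringify(parts);
}
```

Two runs with identical task text but different resolved `git:HEAD` values produce different keys, so the second run is a miss.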
306
+ ### What is cached, and when
307
+
308
+ - Only phases whose **`status` is `done`** and that **were not themselves a cache
309
+ hit** are written to the store (no re-storing a value just read).
310
+ - The store is keyed by the full input hash + fingerprint, tagged with
311
+ `flowName`/`phaseId`/`runId`/`model` for inspection and LRU eviction.
312
+ - Cross-run reuse is **safe by construction**: a different agent, model, task, or
313
+ fingerprint produces a different key, so stale results are never served.
314
+
315
+ > **When to use it:** expensive, deterministic phases whose inputs rarely change
316
+ > (dependency summaries, doc generation, repeated audits of the same tree). For
317
+ > phases that *should* re-run every time (anything reading live external state
318
+ > without a fingerprint), leave the default `run-only` or set `off`.
319
+
320
+ ---
321
+
322
+ ## 9. Environment variables
213
323
 
214
324
  | Variable | Effect |
215
325
  |----------|--------|
@@ -217,7 +327,7 @@ Taskflow shares the subagent settings file at `~/.pi/agent/settings.json`:
217
327
 
218
328
  ---
219
329
 
220
- ## 9. Storage & file locations
330
+ ## 10. Storage & file locations
221
331
 
222
332
  | What | Path | Commit? |
223
333
  |------|------|---------|
@@ -233,7 +343,7 @@ Taskflow shares the subagent settings file at `~/.pi/agent/settings.json`:
233
343
 
234
344
  ---
235
345
 
236
- ## 10. Quick recipes
346
+ ## 11. Quick recipes
237
347
 
238
348
  **Pin a strong model only for the review gate:**
239
349
  ```jsonc