oh-my-workflow 0.2.1 → 0.4.0

package/skill/SKILL.md CHANGED
@@ -1,5 +1,5 @@
1
1
  ---
2
- name: oh-my-workflow
2
+ name: omw
3
3
  description: Use when a task decomposes into multiple coding-agent CLI calls (claude -p / codex exec) that should run as one structured, schema-gated, journaled workflow — fan-out search, verify-vote, pipeline, or loop-until-dry. Teaches you to author a plain-JS omw script, run it with `omw run`, read the JSONL journal, and repair your own script from structured failures.
4
4
  ---
5
5
 
@@ -9,11 +9,15 @@ You write a **plain-JS orchestration script**. Its nodes are whole coding-agent
9
9
  CLIs you already pay for (`claude -p`, `codex exec`). omw is the thin glue: it
10
10
  runs your script, schema-gates each node's output, and journals every step — so
11
11
  you can read your own failure and fix your own script. (What's "deterministic" is
12
- scoped below — the engine's guarantees and `--agent fake`, not your script.)
12
+ scoped below — the engine's guarantees and `--agent fake`, not your script unless
13
+ you pass `--strict`.)
13
14
 
14
- The runtime gives your script exactly **five hooks** (`agent` / `pipeline` /
15
- `parallel` / `phase` / `log`). That is the entire surface. There is no DSL to
16
- learn; everything else is ordinary JavaScript control flow.
15
+ omw is the **open twin of Claude Code's native dynamic Workflow**: the same
16
+ authoring shape and vocabulary (`agent` / `parallel` / `pipeline` / `workflow` /
17
+ `budget`), but the nodes are *external coding-agent CLIs*, it runs from any host,
18
+ and there is **no magic** — no source transform, no ambient globals, no
19
+ sandbox-by-default. Your script is ordinary JavaScript; the runtime hands it a
20
+ **hooks object** as the first argument. There is no DSL to learn.
17
21
 
18
22
  ## When to use this
19
23
 
@@ -24,15 +28,16 @@ benefits from structure you'd otherwise hand-roll:
24
28
  - **Verify / vote**: produce a finding, then have K independent agents judge it.
25
29
  - **Pipeline**: each item flows scope → search → verify → synthesize independently.
26
30
  - **Loop-until-dry**: keep spawning finders until a round returns nothing new.
31
+ - **Budget-bounded loop**: keep working until a token ceiling is reached.
27
32
 
28
33
  You want: bounded concurrency, schema-validated node output with automatic
29
34
  node-level retry, a replayable journal, and a `null`-on-failure contract so one
30
35
  bad node never crashes the run.
31
36
 
32
- **Don't** use omw for a single agent call, or for work that needs a sandbox
33
- (omw deliberately has none; your script is trusted code), or where a node is a
34
- single raw LLM API call (that's LangGraph/Mastra territory; an omw node is a
35
- *whole coding agent*).
37
+ **Don't** use omw for a single agent call, or where a node is a single raw LLM
38
+ API call (that's LangGraph/Mastra territory; an omw node is a *whole coding
39
+ agent*). omw has no sandbox by default (your script is trusted code), though you
40
+ can opt into a determinism sandbox with `--strict`.
36
41
 
37
42
  ## The 30-second free demo (no API key, nothing to clone)
38
43
 
@@ -40,11 +45,12 @@ omw is on npm, so you can run the whole thing in one line — no install step, n
40
45
  key, no cost:
41
46
 
42
47
  ```sh
43
- bunx oh-my-workflow@latest run examples/deep-research --agent fake
48
+ bunx github:domuk-k/oh-my-workflow run examples/deep-research --agent fake
44
49
  # → {"confirmed":[…],"summary":{…}} exit 0 · no key · no cost · deterministic
45
50
  ```
46
51
 
47
- > Tip: use `@latest`. A bare `bunx oh-my-workflow` can serve a stale cached copy.
52
+ > Tip: the GitHub source keeps the skill and runtime aligned before a new npm
53
+ > release lands.
48
54
 
49
55
  That single command runs the **whole spine** for you — a fan-out search, a
50
56
  pipeline, a scripted schema-fail→self-repair, and a scripted timeout→drop — and
@@ -52,48 +58,72 @@ prints one result JSON. Want to watch it happen? Add `--pretty` for the
52
58
  phase/fan-out tree on stderr:
53
59
 
54
60
  ```sh
55
- bunx oh-my-workflow@latest run examples/deep-research --agent fake --pretty
61
+ bunx github:domuk-k/oh-my-workflow run examples/deep-research --agent fake --pretty
56
62
  ```
57
63
 
58
64
  `--agent fake` is a built-in, deterministic adapter — it's the no-key demo engine
59
- and the test double. When you're ready for real work, run `claude login` once and
60
- swap `--agent fake` → `--agent claude`. Same script, real nodes.
65
+ and the test double. For real work, write the workflow and run `omw run <file>`;
66
+ the CLI defaults to `--agent auto`, choosing the current/installed coding-agent
67
+ CLI. Set `OMW_AGENT=claude|codex|hermes` only when you need to pin it.
61
68
 
62
69
  > **Reading this as a skill?** You already have it. To install/update it for a
63
- > coding agent: `bunx oh-my-workflow@latest skill install` (→ `~/.claude/skills/`,
64
- > or `--project` for one repo); `omw skill path` prints the bundled copy for other
70
+ > coding agent: `bunx github:domuk-k/oh-my-workflow skill install` (→ `~/.claude/skills/`;
71
+ > `--codex` → `~/.codex/skills/`; `--opencode` → `~/.config/opencode/skills/`;
72
+ > `--project` for one repo). `omw skill path` prints the bundled copy for other
65
73
  > hosts. Re-run `skill install` anytime to refresh.
66
74
 
67
75
  ---
68
76
 
69
- ## The 5 hooks (the entire API)
77
+ ## The hooks (the entire API)
70
78
 
71
- Your script is a module that **default-exports** `async (rt, args) => result`.
72
- `rt` is the runtime; `args` is whatever `--args '{…}'` passed (parsed JSON).
73
- The returned value is serialized to stdout as the run's single result JSON.
79
+ Your script is a module that **default-exports** a function taking the **hooks**
80
+ as a destructured first argument and your `args` second:
74
81
 
75
82
  ```ts
76
- export default async function (rt, args) {
77
- // rt.agent / rt.pipeline / rt.parallel / rt.phase / rt.log
83
+ export default async function ({ agent, parallel, pipeline, phase, log, workflow, budget }, args) {
84
+ // destructure only the hooks you use
78
85
  return { /* whatever you want on stdout */ };
79
86
  }
80
87
  ```
81
88
 
82
- ### `rt.agent(prompt, opts?) => Promise<result | null>`
89
+ `args` is whatever `--args '{…}'` passed (parsed JSON). The returned value is
90
+ serialized to stdout as the run's single result JSON. (Legacy `(rt, args)` scripts
91
+ that call `rt.agent(…)` still run — the same object is passed — but they're
92
+ deprecated; run `omw codemod <file>` to migrate. The bridge is removed in 0.5.)
83
93
 
84
- Runs one coding-agent CLI node. **Never throws.** A terminal failure resolves to
85
- `null` (and is journaled with a failure `kind`). This is the load-bearing
86
- **null-contract** — build on it with `filter(Boolean)` and abstain quorums.
94
+ Optionally declare a `meta` block (a pure literal, like native):
87
95
 
88
96
  ```ts
89
- const out = await rt.agent("SCOPE the question into topics", {
97
+ export const meta = {
98
+ name: "deep-research",
99
+ description: "fan-out research with verify",
100
+ phases: [{ title: "Search", model: "smart" }, { title: "Verify" }],
101
+ };
102
+ ```
103
+
104
+ `meta.phases[].model` and `meta.model` set a default model per phase / for the
105
+ run; the effective model resolves along **`opts.model > phase model > meta.model`**.
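The precedence above can be sketched as a small resolver. This is an illustration only: `resolveModel` and its parameter names are hypothetical and not part of omw's API, and `"base"` is a made-up model string. Only the ordering `opts.model > phase model > meta.model` comes from the text.

```ts
// Hypothetical helper, not omw's implementation: returns the first defined
// model along opts.model > phase model > meta.model.
type ModelSources = {
  optsModel?: string;  // agent(prompt, { model })
  phaseModel?: string; // meta.phases[].model for the active phase
  metaModel?: string;  // meta.model
};

function resolveModel({ optsModel, phaseModel, metaModel }: ModelSources): string | undefined {
  return optsModel ?? phaseModel ?? metaModel;
}

// No per-call model, so the phase default beats the run default.
console.log(resolveModel({ phaseModel: "smart", metaModel: "base" })); // prints "smart"
```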
106
+
107
+ ### `agent(prompt, opts?) => Promise<result | null>`
108
+
109
+ Runs one coding-agent CLI node. **Never throws** (the one exception is `budget`
110
+ exhaustion — see below). A terminal failure resolves to `null` (and is journaled
111
+ with a failure `kind`). This is the load-bearing **null-contract** — build on it
112
+ with `filter(Boolean)` and abstain quorums.
113
+
114
+ ```ts
115
+ const out = await agent("SCOPE the question into topics", {
90
116
  schema: { type: "object", required: ["topics"], properties: { topics: { type: "array" } } },
91
- label: "scope", // shows in the journal / --pretty tree
92
- phase: "Scope", // overrides the ambient phase() for this call
93
- model: "smart", // tier alias or raw model string, passed to the adapter
94
- timeoutMs: 120_000, // kill the subprocess after this; failure kind = "timeout"
95
- cwd: "/path/to/repo", // run the agent in this directory
96
- maxRetries: 2, // schema-gate retries (default 2 → up to 3 attempts)
117
+ label: "scope", // shows in the journal / --pretty tree (cosmetic; not in resume key)
118
+ phase: "Scope", // overrides the ambient phase() for this call (cosmetic)
119
+ model: "smart", // tier alias or raw model string, passed to the adapter
120
+ effort: "high", // reasoning-effort hint: low|medium|high|xhigh|max (adapter maps it where supported)
121
+ agentType: "Explore", // cross-vendor node profile (named agent persona)
122
+ isolation: "worktree", // run this node in a fresh ephemeral git worktree (cwd = the worktree)
123
+ timeoutMs: 120_000, // kill the subprocess after this; failure kind = "timeout"
124
+ cwd: "/path/to/repo", // run the agent in this directory
125
+ maxRetries: 2, // schema-gate retries (default 2 → up to 3 attempts)
126
+ inheritMcp: false, // default: isolate from host MCP servers (fast). true = inherit (claude only; codex ignores)
97
127
  });
98
128
  ```
99
129
 
@@ -105,8 +135,14 @@ const out = await rt.agent("SCOPE the question into topics", {
105
135
  structured outcome. The schema is plain JSON Schema.
106
136
  - **Without `schema`**: one shot; returns the raw text string, or `null` on
107
137
  adapter failure.
138
+ - `effort`/`agentType` are passed through to adapters that support them; the
139
+ `claude` adapter has no faithful CLI flag for them yet, so it **drops them with
140
+ a one-time warn** (honest-scope) rather than silently pretending.
141
+ - `isolation: "worktree"` gives the node its own ephemeral `git worktree` as cwd,
142
+ so parallel file-mutating nodes don't clobber each other; the worktree is
143
+ auto-removed if the node left it clean. A non-git cwd runs in place with a warn.
108
144
 
109
- ### `rt.parallel(thunks) => Promise<any[]>` — barrier
145
+ ### `parallel(thunks) => Promise<any[]>` — barrier
110
146
 
111
147
  Runs thunks concurrently, awaits **all** of them. A thunk that throws (or whose
112
148
  agent fails) becomes `null` in the result array — the call itself never rejects.
@@ -114,12 +150,12 @@ agent fails) becomes `null` in the result array — the call itself never reject
114
150
  together (dedup, count, cross-comparison).
115
151
 
116
152
  ```ts
117
- const results = (await rt.parallel(
118
- topics.map((t) => () => rt.agent(`SEARCH ${t}`, { schema: S, label: `search:${t}` })),
153
+ const results = (await parallel(
154
+ topics.map((t) => () => agent(`SEARCH ${t}`, { schema: S, label: `search:${t}` })),
119
155
  )).filter(Boolean);
120
156
  ```
121
157
 
122
- ### `rt.pipeline(items, ...stages) => Promise<any[]>` — no barrier (default)
158
+ ### `pipeline(items, ...stages) => Promise<any[]>` — no barrier (default)
123
159
 
124
160
  Runs each item through **all** stages independently. Item A can be in stage 3
125
161
  while item B is still in stage 1 — wall-clock is the slowest single chain, not
@@ -129,16 +165,49 @@ default for multi-stage work; only use `parallel` as a barrier when a stage
129
165
  genuinely needs the whole previous result set at once.
130
166
 
131
167
  ```ts
132
- const verified = (await rt.pipeline(
168
+ const verified = (await pipeline(
133
169
  found,
134
170
  async (f) => {
135
- const v = await rt.agent(`VERIFY ${JSON.stringify(f)}`, { schema: V });
171
+ const v = await agent(`VERIFY ${JSON.stringify(f)}`, { schema: V });
136
172
  return v ? { ...f, ...v } : null; // null → dropped by the filter below
137
173
  },
138
174
  )).filter(Boolean);
139
175
  ```
140
176
 
141
- ### `rt.phase(title)` and `rt.log(msg)`
177
+ ### `workflow(ref, args?) => Promise<result>` — nested sub-workflow
178
+
179
+ Runs another workflow inline as a sub-step, **one level deep**, sharing this run's
180
+ adapter, journal, and budget pool. `ref` is a path string or `{ scriptPath }`.
181
+
182
+ ```ts
183
+ const sub = await workflow({ scriptPath: "./refine.ts" }, { topic });
184
+ ```
185
+
186
+ A `workflow()` call **inside** a child throws (`"workflow() nesting is one level
187
+ only"`) — a runaway-recursion backstop.
188
+
189
+ ### `budget` — token ceiling
190
+
191
+ `budget` is `{ total, spent(), remaining() }`. Set a ceiling with `--budget N`;
192
+ `total` is `null` when unset and `remaining()` is then `Infinity`. Once spent
193
+ reaches `total`, `agent()` **throws `BudgetExceededError`** — the *one* documented
194
+ exception to the null-contract — so a bounded loop terminates instead of spinning.
195
+ A throw inside `parallel`/`pipeline` is still swallowed to `null` (matches native).
196
+
197
+ ```ts
198
+ const out = [];
199
+ while (budget.remaining() > 50_000) { // guard, or let agent() throw at the ceiling
200
+ const r = await agent("find the next bug");
201
+ if (r) out.push(r);
202
+ }
203
+ ```
204
+
205
+ > `budget` counts **output tokens the adapter reports** (success or a failure
206
+ > envelope that carries `usage`). A token-less failure (a killed timeout reports
207
+ > no usage) can't be counted — so a loop on a purely-timing-out node isn't bounded
208
+ > by `--budget` alone; pair it with your own iteration cap.
209
+
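The note above recommends pairing `--budget` with your own iteration cap. A minimal sketch of that pairing, assuming stand-in `agent` and `budget` values shaped like the hooks described in this section. `boundedLoop`, `maxIters`, and `floor` are hypothetical names, and the stubs are not omw's implementation.

```ts
type Budget = { total: number | null; spent(): number; remaining(): number };
type Agent = (prompt: string) => Promise<string | null>;

// Stops when the budget guard trips OR after maxIters rounds, whichever is first.
// The cap covers nodes that fail without reporting usage (e.g. killed timeouts).
async function boundedLoop(agent: Agent, budget: Budget, maxIters: number, floor = 50_000): Promise<string[]> {
  const out: string[] = [];
  for (let i = 0; i < maxIters && budget.remaining() > floor; i++) {
    const r = await agent("find the next bug");
    if (r) out.push(r); // null-contract: a failed node resolves to null
  }
  return out;
}

// Stubs: a node that always times out, so no tokens are ever counted.
const neverSpends: Budget = { total: 100_000, spent: () => 0, remaining: () => 100_000 };
const alwaysTimesOut: Agent = async () => null;

boundedLoop(alwaysTimesOut, neverSpends, 5).then((out) => console.log(out.length));
```

Without the `maxIters` bound, this stub pairing would loop indefinitely, because `remaining()` never drops.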
210
+ ### `phase(title)` and `log(msg)`
142
211
 
143
212
  `phase` groups subsequent `agent()` calls under a heading in the journal and the
144
213
  `--pretty` tree. `log` emits a narration line. Both are side-channel only — they
@@ -157,10 +226,10 @@ pass hundreds of items — only ~N agent subprocesses run at once; the rest queu
157
226
  ### Fan-out (barrier)
158
227
 
159
228
  ```ts
160
- export default async function (rt, args) {
161
- rt.phase("Search");
162
- const hits = (await rt.parallel(
163
- args.queries.map((q) => () => rt.agent(`SEARCH: ${q}`, { schema: HIT, label: `q:${q}` })),
229
+ export default async function ({ agent, parallel, phase }, args) {
230
+ phase("Search");
231
+ const hits = (await parallel(
232
+ args.queries.map((q) => () => agent(`SEARCH: ${q}`, { schema: HIT, label: `q:${q}` })),
164
233
  )).filter(Boolean);
165
234
  return { hits, count: hits.length };
166
235
  }
@@ -173,10 +242,10 @@ Count only real verdicts, and require a quorum of *cast* votes so an all-abstain
173
242
  finding doesn't silently survive.
174
243
 
175
244
  ```ts
176
- async function survives(rt, claim) {
177
- const votes = (await rt.parallel(
245
+ async function survives({ agent, parallel }, claim) {
246
+ const votes = (await parallel(
178
247
  [1, 2, 3].map(() => () =>
179
- rt.agent(`Try to REFUTE this claim. Default to refuted=true if unsure: ${claim}`, {
248
+ agent(`Try to REFUTE this claim. Default to refuted=true if unsure: ${claim}`, {
180
249
  schema: { type: "object", required: ["refuted"], properties: { refuted: { type: "boolean" } } },
181
250
  })),
182
251
  )).filter(Boolean); // drop abstainers (null)
@@ -185,11 +254,11 @@ async function survives(rt, claim) {
185
254
  }
186
255
  ```
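The body of `survives` is elided in this hunk. To illustrate the quorum rule described above (count only cast votes, and require enough of them), here is a minimal sketch. `tally`, `minCast`, and the strict-majority rule are assumptions for illustration, not the omitted code.

```ts
type Vote = { refuted: boolean } | null; // null = abstained (the node failed)

// A claim survives only if enough votes were actually cast AND
// fewer than half of the cast votes refute it.
function tally(votes: Vote[], minCast = 2): boolean {
  const cast = votes.filter((v): v is { refuted: boolean } => v !== null);
  if (cast.length < minCast) return false; // an all-abstain round cannot pass
  const refuted = cast.filter((v) => v.refuted).length;
  return refuted * 2 < cast.length;
}

console.log(tally([null, null, null])); // prints false
```

Requiring `minCast` cast votes is what keeps a round of three failed verifiers from silently confirming a claim.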
187
256
 
188
- **Fresh context is the point — not self-critique.** Each `rt.agent()` call is a
189
- brand-new `claude -p` subprocess with no memory of the producer's turn, so a
190
- verify-vote node judges the claim cold. That is the structural form of Anthropic's
191
- own guidance for its most capable model: *"Separate, fresh-context verifier
192
- subagents tend to outperform self-critique"* ([Fable 5 prompting
257
+ **Fresh context is the point — not self-critique.** Each `agent()` call is a
258
+ brand-new subprocess with no memory of the producer's turn, so a verify-vote node
259
+ judges the claim cold. That is the structural form of Anthropic's own guidance for
260
+ its most capable model: *"Separate, fresh-context verifier subagents tend to
261
+ outperform self-critique"* ([Fable 5 prompting
193
262
  guide](https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/prompting-claude-fable-5)).
194
263
  omw gets it for free — **as long as you keep verification a separate `agent()`
195
264
  call.** Do **not** verify by feeding the result back into the producer's own
@@ -197,10 +266,9 @@ session: the schema-gate's in-session self-repair (the `--resume` / `followUp`
197
266
  path) deliberately *reuses* the producer's context to fix output **format**, which
198
267
  is the exact opposite of fresh-context verification. Use self-repair to make a
199
268
  node's JSON valid; use a new `agent()` to judge whether the *content* is true.
200
- (A **cross-CLI** verifier — a different agent CLI than the producer, so a shared
201
- memorized shortcut can't survive both — is a natural extension but **not a feature
202
- today**: omw binds one adapter per run, so per-node verifier selection is future
203
- work. Verify with fresh same-CLI nodes for now.)
269
+ (A **cross-CLI** verifier — a different agent CLI than the producer, via per-node
270
+ `agentType` / a different adapter — is a natural extension but **not a feature
271
+ today**: omw binds one adapter per run. Verify with fresh same-CLI nodes for now.)
204
272
 
205
273
  ### Gate on evidence, not intent
206
274
 
@@ -228,7 +296,7 @@ const strong = {
228
296
  output: { type: "string" }, // the observed tail, not a claim about it
229
297
  },
230
298
  };
231
- const built = await rt.agent("Run the build and the test suite. Report the command, its exit code, the number of passing tests, and the tail of the output.", { schema: strong });
299
+ const built = await agent("Run the build and the test suite. Report the command, its exit code, the number of passing tests, and the tail of the output.", { schema: strong });
232
300
  ```
233
301
 
234
302
  **Executable-evidence verify node** — combine this with fresh-context verification:
@@ -236,11 +304,11 @@ a separate node *runs what the producer built and observes the result* before th
236
304
  finding is accepted, rather than judging the producer's description of it.
237
305
 
238
306
  ```ts
239
- const verified = (await rt.pipeline(
307
+ const verified = (await pipeline(
240
308
  artifacts,
241
309
  async (a) => {
242
310
  // a.path was written by an upstream node; this fresh node runs it and reports facts.
243
- const v = await rt.agent(
311
+ const v = await agent(
244
312
  `Run \`${a.runCmd}\` in ${a.path}. Report exitCode and the output tail. Do not fix anything — only observe and report.`,
245
313
  { schema: { type: "object", required: ["exitCode", "output"], properties: { exitCode: { type: "number" }, output: { type: "string" } } } },
246
314
  );
@@ -252,10 +320,10 @@ const verified = (await rt.pipeline(
252
320
  ### Pipeline (no barrier)
253
321
 
254
322
  ```ts
255
- const out = (await rt.pipeline(
323
+ const out = (await pipeline(
256
324
  items,
257
- (item) => rt.agent(`ANALYZE ${item.id}`, { schema: A, label: `analyze:${item.id}` }),
258
- (analysis, item) => (analysis ? rt.agent(`SUMMARIZE ${item.id}: ${JSON.stringify(analysis)}`, { schema: S }) : null),
325
+ (item) => agent(`ANALYZE ${item.id}`, { schema: A, label: `analyze:${item.id}` }),
326
+ (analysis, item) => (analysis ? agent(`SUMMARIZE ${item.id}: ${JSON.stringify(analysis)}`, { schema: S }) : null),
259
327
  )).filter(Boolean);
260
328
  ```
261
329
 
@@ -266,8 +334,8 @@ For unknown-size discovery: keep going until K consecutive rounds find nothing n
266
334
  ```ts
267
335
  const seen = new Set(); const found = []; let dry = 0;
268
336
  while (dry < 2) {
269
- const round = (await rt.parallel(
270
- FINDERS.map((f) => () => rt.agent(f.prompt, { schema: BUG })),
337
+ const round = (await parallel(
338
+ FINDERS.map((f) => () => agent(f.prompt, { schema: BUG })),
271
339
  )).filter(Boolean);
272
340
  const fresh = round.filter((b) => !seen.has(b.key));
273
341
  if (fresh.length === 0) { dry++; continue; }
@@ -275,12 +343,25 @@ while (dry < 2) {
275
343
  }
276
344
  ```
277
345
 
346
+ ### Loop-until-budget
347
+
348
+ Scale depth to a token ceiling — guard on `budget.total` so an unset budget
349
+ (`remaining()` = `Infinity`) doesn't loop forever.
350
+
351
+ ```ts
352
+ const bugs = [];
353
+ while (budget.total && budget.remaining() > 50_000) {
354
+ const r = await agent("Find one more bug.", { schema: BUG });
355
+ if (r) bugs.push(r);
356
+ }
357
+ ```
358
+
278
359
  ---
279
360
 
280
361
  ## The run → journal → fix loop (this is the UX)
281
362
 
282
363
  ```sh
283
- bun src/cli/omw.ts run my-workflow.ts --agent claude --args '{"q":"…"}'
364
+ bunx github:domuk-k/oh-my-workflow run my-workflow.ts --args '{"q":"…"}' --pretty
284
365
  ```
285
366
 
286
367
  - **stdout** = the result JSON, one blob. Pipe it, parse it.
@@ -293,13 +374,15 @@ bun src/cli/omw.ts run my-workflow.ts --agent claude --args '{"q":"…"}'
293
374
  | code | meaning | where the detail is |
294
375
  |---|---|---|
295
376
  | `0` | run completed (node failures are absorbed by the null-contract) | stdout = result JSON |
296
- | `1` | **script error** — your JS threw, or syntax/load failure | stderr: `{"error":"script_error"\|"load_failed",…}` |
377
+ | `1` | **script error** — your JS threw (incl. `BudgetExceededError`), or syntax/load failure | stderr: `{"error":"script_error"\|"load_failed",…}` |
297
378
  | `2` | usage error (bad flags) | stderr: usage line |
298
379
  | `3` | adapter CLI not on PATH | stderr: `{"error":"adapter_missing","install_hint":…}` |
380
+ | `4` | completed, but a node hit `internal_error` (author bug, e.g. invalid schema) | stdout = partial result; stderr: `{"error":"internal_error_nodes",…}` |
299
381
 
300
382
  Exit `1` means **your script** threw (an `agent()` returning `null` does *not*
301
- throw — only your own code does). Exit `0` with fewer results than expected means
302
- nodes failed and were filtered — read the journal.
383
+ throw — only your own code, or an uncaught `BudgetExceededError`, does). Exit `0`
384
+ with fewer results than expected means nodes failed and were filtered — read the
385
+ journal.
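A caller that shells out to `omw run` can branch on the exit codes in the table above. The sketch below is a hypothetical wrapper-side helper (`describeExit` is not part of omw). The code-to-meaning mapping follows the table, while the hint strings are illustrative.

```ts
type ExitInfo = { ok: boolean; resultOnStdout: boolean; hint: string };

// Maps an omw exit code to what the caller should read next.
function describeExit(code: number): ExitInfo {
  switch (code) {
    case 0:
      return { ok: true, resultOnStdout: true, hint: "completed; read the journal if results look thin" };
    case 1:
      return { ok: false, resultOnStdout: false, hint: "script error or load failure; read the stderr JSON" };
    case 2:
      return { ok: false, resultOnStdout: false, hint: "usage error; read the stderr usage line" };
    case 3:
      return { ok: false, resultOnStdout: false, hint: "adapter CLI missing; see install_hint on stderr" };
    case 4:
      return { ok: false, resultOnStdout: true, hint: "partial result; a node hit internal_error" };
    default:
      return { ok: false, resultOnStdout: false, hint: `unexpected exit code ${code}` };
  }
}

console.log(describeExit(4).resultOnStdout); // prints true
```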
303
386
 
304
387
  ### Reading a journal
305
388
 
@@ -339,33 +422,33 @@ Failure `kind`s on `agent_end`:
339
422
  `omw replay .omw/<runId>.jsonl [--json]` reconstructs the tree / a stats summary
340
423
  from a journal — a read-only **fixture replay** (reading back what a run
341
424
  recorded). For *live* resume (re-running nodes whose key changed, reusing the
342
- cached ones), use `omw run <wf> --resume <journal>` — see Scope below.
425
+ cached ones), use `omw run <wf> --resume <journal|runId>` — see Scope below.
343
426
 
344
427
  `omw validate <wf> [--json]` is a pre-flight that loads the module and lints a
345
428
  `fake` fixture for the silent-degradation traps (top-level `responses`, a string
346
429
  `match`, no rules+default) **without spawning agents** — exit 0 clean, 1 on a
347
- load/fixture problem. And a node that throws an `internal_error` (e.g. a JSON
348
- Schema that won't compile) no longer hides behind the null-contract: the run
349
- escalates to **exit 4** (the partial result still prints to stdout, and a
350
- `{"error":"internal_error_nodes","calls":[…]}` line goes to stderr), so an author
351
- bug reads differently from a flaky node abstaining.
430
+ load/fixture problem.
352
431
 
353
432
  ---
354
433
 
355
434
  ## Conventions (follow these)
356
435
 
357
- 1. **Build on the null-contract.** `agent()` returns `null`, never throws.
358
- `.filter(Boolean)` after every `parallel`/`pipeline`. For votes, require a
359
- quorum of *cast* (non-null) results so all-abstain can't pass.
436
+ 1. **Build on the null-contract.** `agent()` returns `null`, never throws (except
437
+ `BudgetExceededError` at the ceiling). `.filter(Boolean)` after every
438
+ `parallel`/`pipeline`. For votes, require a quorum of *cast* (non-null) results
439
+ so all-abstain can't pass.
360
440
  2. **Always pass a `schema` when you need structured data.** The gate's
361
441
  self-repair is the one genuine differentiator — use it instead of parsing
362
442
  prose yourself. Keep schemas tight (`required` + types).
363
443
  3. **Stay deterministic.** Don't branch the *shape* of the run on `Date.now()` /
364
- `Math.random()` / wall-clock. The resume key is `(callIndex, promptHash,
365
- optsHash)` (the journaled field is `call`); if a re-run's `agent()` call order shifts, every key shifts and
366
- resume breaks. Vary content by index, not by randomness. (omw can't *enforce*
367
- this: there is no sandbox, so it's a convention you keep; enforcement is v2.)
368
- 4. **stdout is for the machine.** Return your result; use `rt.log` / `--pretty`
444
+ `Math.random()` / wall-clock. The resume key is the **semantic** subset of
445
+ `(callIndex, promptHash, optsHash)`: cosmetic `label`/`phase` changes don't
446
+ bust the cache, but `model`/`schema`/`effort`/`isolation` do. If a re-run's
447
+ `agent()` call order shifts, every key shifts and resume breaks; vary content
448
+ by index, not by randomness. omw can't enforce determinism by default (no
449
+ sandbox), but pass **`--strict`** to make `Date`/`Math.random` throw, for
450
+ a reproducible run.
451
+ 4. **stdout is for the machine.** Return your result; use `log` / `--pretty`
369
452
  for humans. Never `console.log` to stdout from a workflow.
370
453
  5. **Ship a `fake` fixture for your example.** Export `const fake` alongside your
371
454
  default export so `--agent fake` runs deterministically with no key. The shape:
@@ -376,7 +459,8 @@ bug reads differently from a flaky node abstaining.
376
459
  // `responses` is a cursor that advances per invocation and sticks on the last —
377
460
  // so [invalidJSON, validJSON] models a schema self-repair, and a single
378
461
  // { fail } models a hard failure. A FakeResponse is { text } (a raw JSON
379
- // STRING the gate then extracts + validates) or { fail, stderr }.
462
+ // STRING the gate then extracts + validates) or { fail, stderr }. Either may
463
+ // carry { outputTokens } to drive budget tests.
380
464
  rules: [
381
465
  { match: (p) => p.includes("SCOPE"), responses: [{ text: '{"topics":["a","b"]}' }] },
382
466
  { match: (p) => p.includes("SEARCH a"),
@@ -390,7 +474,7 @@ bug reads differently from a flaky node abstaining.
390
474
  Common mistake: a top-level `responses` array (instead of `rules`) or a string
391
475
  `match` is silently ignored — every node then returns `default` and the demo
392
476
  degenerates to an empty result. See `examples/deep-research/workflow.ts` for a
393
- full working fixture.
477
+ full working fixture, and `conformance/*.ts` for native-shaped samples.
394
478
 
395
479
  ---
396
480
 
@@ -402,31 +486,41 @@ agents that expose such a CLI can be nodes.
402
486
  | adapter | status | invoke | structured out | in-session follow-up |
403
487
  |---|---|---|---|---|
404
488
  | **fake** | built-in, free, deterministic | in-process fixtures | as scripted | yes (fixture) |
405
- | **claude** | **full** (live-verified, claude 2.1.x) | `claude -p <p> --output-format json` | parse `.result` | `--resume` |
406
- | **codex** | **experimental** (live-verified, codex 0.137.x) | `codex exec --json -s workspace-write` | last `agent_message` from JSONL | `exec resume` |
489
+ | **claude** | **full** (live-verified, claude 2.1.x) | `claude -p <p> --output-format json --strict-mcp-config` | parse `.result` | `--resume` (same cwd) |
490
+ | **codex** | **experimental** (live-verified, codex 0.137.x) | `codex exec --json -s workspace-write` | last `agent_message` from JSONL | `exec resume` (same cwd) |
491
+ | **hermes** | **experimental** | `hermes -z <prompt> --yolo` | stdout IS the response (heuristic JSON extract) | — (fresh retries) |
407
492
  | **pi** | planned | `pi --print` | stdout | — |
408
493
  | **kiro** | **not a fit** | — | — | — |
409
494
 
410
495
  > The "in-session follow-up" column is the adapter flag the **schema gate** uses to
411
496
  > re-prompt a node in the same session — *not* run-level resume. Run-level resume
412
- > (skipping unchanged nodes across separate runs) is **v2**; see Honest scope below.
497
+ > (`--resume`, skipping unchanged nodes across runs) is a separate path.
413
498
 
414
499
  - **claude** renames its envelope onto omw's contract (`session_id→sessionId`,
415
- `total_cost_usd→costUsd`, `duration_ms→durationMs`; `is_error`/non-success
416
- `subtype` → `ok:false`).
500
+ `total_cost_usd→costUsd`, `duration_ms→durationMs`, `usage.output_tokens→
501
+ outputTokens`; `is_error`/non-success `subtype` → `ok:false`). By default a node
502
+ runs **isolated from the host's MCP servers** (`--strict-mcp-config`) — booting
503
+ figma/devtools/etc. on every node is the dominant fan-out latency, and a
504
+ coding-agent node rarely needs them. Opt back in per call with `{ inheritMcp:
505
+ true }`. `opts.effort`/`opts.agentType` have no faithful `claude -p` flag yet, so
506
+ they're **dropped with a one-time warn** rather than silently honored. The
507
+ schema-gate `--resume` runs in the **same cwd** as the original invoke and
508
+ **mirrors the same MCP choice**.
417
509
  - **codex** is experimental: it has **no cost field** (tokens only, so `costUsd`
418
510
  stays undefined), and its JSONL can include malformed lines under MCP
419
511
  (openai/codex#15451) — omw tolerates them line-by-line and fails *actionably*
420
- (surfacing the reason) rather than returning empty. Default sandbox is
421
- `workspace-write`.
422
- - **pi** isn't wired yet (not installed locally; `--agent pi` returns exit 3
423
- with an install hint). It's a planned experimental adapter.
424
- - **kiro is excluded on purpose**: its CLI is a VS-Code-based IDE launcher (open
425
- files, diffs, extensions), with no headless prompt→result interface, so it
426
- can't be an omw node. The bar for an adapter is a real headless execution CLI.
512
+ rather than returning empty. Default sandbox is `workspace-write`.
513
+ - **hermes** is experimental: `-z/--oneshot` prints only the response text, so the
514
+ result is stdout (no JSON envelope; schema-gate extracts JSON heuristically).
515
+ `--yolo` runs it non-interactively. No in-session followUp (no session id on
516
+ stdout); schema retries use fresh invokes. No cost field.
517
+ - **pi** isn't wired yet (`--agent pi` → exit 3 with an install hint).
518
+ - **kiro is excluded on purpose**: its CLI is a VS-Code-based IDE launcher, with
519
+ no headless prompt→result interface — so it can't be an omw node.
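The claude renames listed above amount to a small field mapping. The sketch below uses the raw envelope fields named in that bullet (`session_id`, `total_cost_usd`, `duration_ms`, `usage.output_tokens`, `is_error`, `subtype`) plus `.result` from the adapter table. `normalizeClaude` and the `NodeResult` type are hypothetical, and treating `subtype === "success"` as the success marker is an assumption.

```ts
type ClaudeEnvelope = {
  result?: string;
  session_id?: string;
  total_cost_usd?: number;
  duration_ms?: number;
  is_error?: boolean;
  subtype?: string;
  usage?: { output_tokens?: number };
};

type NodeResult = {
  ok: boolean;
  text?: string;
  sessionId?: string;
  costUsd?: number;
  durationMs?: number;
  outputTokens?: number;
};

// Renames the snake_case envelope onto camelCase fields and derives `ok`.
function normalizeClaude(raw: ClaudeEnvelope): NodeResult {
  const ok = raw.is_error !== true && raw.subtype === "success";
  return {
    ok,
    text: raw.result,
    sessionId: raw.session_id,
    costUsd: raw.total_cost_usd,
    durationMs: raw.duration_ms,
    outputTokens: raw.usage?.output_tokens,
  };
}

console.log(normalizeClaude({ subtype: "success", result: "{}", usage: { output_tokens: 12 } }).outputTokens); // prints 12
```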
427
520
 
428
521
  Missing CLI → exit 3 with `install_hint`. Run `--agent fake` any time for the
429
- free path.
522
+ free path. `--agent auto` is the default: it honors `OMW_AGENT`, then host
523
+ environment hints, then installed CLIs (`claude`, `codex`, `hermes`).
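The `--agent auto` order described above can be sketched as follows. `pickAgent` and its inputs are hypothetical. The text does not say what the host environment hints are, so they are modeled as an opaque optional string, and checking installed CLIs in the listed order (`claude`, `codex`, `hermes`) is an assumption.

```ts
type AgentName = "claude" | "codex" | "hermes";
const KNOWN: readonly AgentName[] = ["claude", "codex", "hermes"];

function isKnown(x: string | undefined): x is AgentName {
  return x !== undefined && (KNOWN as readonly string[]).includes(x);
}

// Order: OMW_AGENT pin, then a host hint, then the first installed CLI.
function pickAgent(
  env: { OMW_AGENT?: string },
  hostHint: string | undefined,
  installed: ReadonlySet<string>,
): AgentName | null {
  if (isKnown(env.OMW_AGENT)) return env.OMW_AGENT;
  if (isKnown(hostHint)) return hostHint;
  return KNOWN.find((name) => installed.has(name)) ?? null;
}

console.log(pickAgent({ OMW_AGENT: "codex" }, "claude", new Set(["claude"]))); // prints "codex"
```

A `null` result here corresponds to the case where no adapter CLI can be found, which the CLI reports as exit 3.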
430
524
 
431
525
  ---
432
526
 
@@ -451,54 +545,57 @@ self-repair loop, which is the one piece a "subprocess + for-loop" doesn't have.
451
545
 
452
546
  ### Resemblance ledger (vs the CC dynamic-workflow surface)
453
547
 
454
- **✅ Genuinely the same idea** — model-authored plain-JS orchestration; the
455
- 5-hook shape (`agent`/`pipeline`/`parallel`/`phase`/`log`); `null`-resolution +
456
- `filter(Boolean)`; schema-forced structured output; a step-by-step journal;
457
- resume key `(callIndex, promptHash, optsHash)` (frozen and **proven byte-stable**
458
- across re-runs); **live resume** via `omw run --resume <journal>` — a **per-node
459
- key match** (cached nodes skip the adapter, `agent_end{cached:true}`; nodes whose
460
- key changed re-run; verified end-to-end on `--agent fake`).
548
+ **✅ Genuinely the same idea** — model-authored plain-JS orchestration with the
549
+ destructured-DI shape; the native vocabulary `agent`/`parallel`/`pipeline`/
550
+ `phase`/`log`/`workflow`/`budget`; an optional `meta`/`phases` block with model
551
+ precedence; `null`-resolution + `filter(Boolean)`; schema-forced structured
552
+ output; `agent` opts `effort`/`agentType`/`isolation:'worktree'`; `budget` with a
553
+ shared spend pool and a `BudgetExceededError` ceiling; nested `workflow()` (one
554
+ level); a step-by-step journal; the resume key `(callIndex, promptHash,
555
+ optsHash)` (frozen, byte-stable, and keyed on the **semantic** opts subset);
556
+ **live resume** via `omw run --resume <journal|runId>`; and an opt-in `--strict`
557
+ determinism sandbox.
461
558
 
462
559
  > One honest altitude difference even here: a CC Workflow node is a single
463
560
  > in-harness subagent; an **omw node is a whole external coding-agent CLI**
464
- > subprocess. Same orchestration shape, heavier nodes.
561
+ > subprocess. Same orchestration shape, heavier nodes. And the no-magic stance is
562
+ > deliberate: omw runs your script as-is (no source transform), hands hooks as an
563
+ > argument (no ambient globals), and leaves determinism opt-in (`--strict`).
465
564
 
466
565
  **🟡 Designed-but-scoped** —
467
- - *Determinism enforcement*: CC throws on `Date.now`/`Math.random`; omw treats it
468
- as a **convention** (no sandbox), so live resume holds **only for workflows that
469
- keep it**. A guard that *enforces* it in resume mode is v2.
470
- - *Resume is per-node, not dependency-aware*: it matches `(callIndex, promptHash,
471
- optsHash)`, so an upstream edit invalidates a downstream node **only if** that
472
- output is threaded into the downstream prompt/opts. This is deliberate — it
473
- preserves **parallel/pipeline sibling cache** (independent fan-out nodes aren't
474
- forced live just because an earlier sibling changed). **The trap**: an omw node
475
- is a whole coding-agent CLI that works on the **filesystem**, so "node 1 writes
476
- files, node 2 reads them" is the *normal* coding-agent idiom, not an exotic
477
- anti-pattern — and that channel is invisible to the key. Edit node 1 → on resume
478
- it re-runs and writes different files, but node 2 **hits its cache and serves a
566
+ - *Determinism enforcement*: native throws on `Date.now`/`Math.random` always;
567
+ omw makes it **opt-in** via `--strict` (the rest of the time it's a convention).
568
+ - *Resume is per-node, not dependency-aware*: it matches the semantic
569
+ `(callIndex, promptHash, optsHash)`, so an upstream edit invalidates a
570
+ downstream node **only if** that output is threaded into the downstream
571
+ prompt/opts. This is deliberate — it preserves **parallel/pipeline sibling
572
+ cache**. **The trap**: an omw node is a whole coding-agent CLI that works on the
573
+ **filesystem**, so "node 1 writes files, node 2 reads them" is the *normal*
574
+ idiom, and that channel is invisible to the key. Edit node 1 → on resume it
575
+ re-runs and writes different files, but node 2 **hits its cache and serves a
479
576
  summary of the old files** (silently stale). Remedies: (a) re-run fresh (drop
480
- `--resume`) when an upstream's filesystem effects changed, or (b) thread a
481
- content digest of the changed files into the downstream prompt so its hash moves.
482
- An opt-in `--strict-resume` (prefix truncation: force every node after the first
483
- key MISS live correct cascade for *linear* workflows, but over-invalidates
484
- *parallel* siblings) and a dependency-aware cascade are both **v2** candidates;
485
- per-node stays the default precisely because it keeps the parallel cache.
486
-
487
- **❌ Not implemented (CC Workflow has these; omw v1 does not)** — `budget`
488
- (token-target loops), nested `workflow()` (running another workflow inline), a
489
- `meta`/`phases` declaration block, `opts.agentType` (custom subagent types),
490
- `opts.effort`, `run_in_background`, and `isolation: 'worktree'`. Don't write
491
- scripts that assume these.
577
+ `--resume`), or (b) thread a content digest of the changed files into the
578
+ downstream prompt so its hash moves. A dependency-aware cascade is v2.
579
+ - *`budget` counts reported output tokens only*: a token-less failure (a killed
580
+ timeout) can't be counted, so pair `--budget` with your own iteration cap when a
581
+ node may fail without producing tokens.
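Remedy (b) for the stale-cache trap above, threading a content digest of the upstream files into the downstream prompt, might look roughly like this. It is a minimal sketch: `digestFiles` and `summarizePrompt` are hypothetical helpers (not omw APIs), and the prompt wording is made up. The point is only that the prompt text, and therefore its `promptHash`, changes whenever the files change.

```javascript
const { createHash } = require('node:crypto');
const { readFileSync } = require('node:fs');

// Hypothetical helper (not part of omw): hash the files an upstream node wrote.
// Paths are sorted so the digest does not depend on argument order.
function digestFiles(paths) {
  const hash = createHash('sha256');
  for (const p of [...paths].sort()) {
    hash.update(p).update('\0').update(readFileSync(p)).update('\0');
  }
  return hash.digest('hex').slice(0, 16);
}

// Hypothetical prompt builder for the downstream node. Embedding the digest
// makes the filesystem dependency visible to the per-node resume key.
function summarizePrompt(paths) {
  return `Summarize the generated files. inputs-digest: ${digestFiles(paths)}`;
}
```

A downstream node built from `summarizePrompt(paths)` would then miss its cache on resume whenever an upstream node rewrote any of those files, and hit it otherwise.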
582
+
583
+ **❌ Not implemented** (native has these; omw does not) — `run_in_background`
584
+ (async node scheduling), and per-node verifier selection across *different*
585
+ adapters in one run (omw binds one adapter per run; `agentType` is passed through
586
+ but cross-CLI routing is future work). Don't write scripts that assume these.
492
587
 
493
588
  ---
494
589
 
495
590
  ## Quick reference
496
591
 
497
- - Module: `export default async (rt, args) => result` · optional `export const fake`.
592
+ - Module: `export default async ({ agent, parallel, pipeline, phase, log, workflow, budget }, args) => result` · optional `export const meta` / `export const fake`. (Legacy `(rt, args)` still runs; `omw codemod <file>` migrates it.)
498
593
  - Path resolves a directory to `workflow.ts` / `workflow.js` / `index.ts` / `index.js`.
499
- - `omw run <wf> --agent <fake|claude|codex|pi> [--args JSON] [--concurrency N] [--resume <journal.jsonl>] [--pretty]`
594
+ - `omw run <wf> [--agent <auto|fake|claude|codex|hermes|pi>] [--args JSON] [--concurrency N] [--budget N] [--resume <journal|runId>] [--strict] [--pretty]`
500
595
  - `omw replay <journal.jsonl> [--json]`
501
596
  - `omw validate <wf> [--json]` — pre-flight: load + fake-fixture lint, no agents spawned.
502
- - exit codes: `0` ok · `1` script/load error · `2` usage · `3` adapter missing · `4` completed but a node hit `internal_error` (author bug; result still on stdout).
597
+ - `omw codemod <file> [--to-di] [--write]`: migrate a legacy `(rt, args)` workflow to destructured DI.
598
+ - `omw skill install [--codex|--opencode] [--project]` — install this skill for a coding agent.
599
+ - exit codes: `0` ok · `1` script/load error (incl. budget ceiling) · `2` usage · `3` adapter missing · `4` completed but a node hit `internal_error` (author bug; result still on stdout).
503
600
  - stdout = result JSON · journal = `.omw/<runId>.jsonl` · `--pretty` tree = stderr.
504
- - `agent()` never throws → `filter(Boolean)`; quorum of cast votes for verify-vote.
601
+ - `agent()` never throws (except `BudgetExceededError`) → `filter(Boolean)`; quorum of cast votes for verify-vote.
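The last item, `filter(Boolean)` plus a quorum of cast votes, can be illustrated with a small tally function. This is a sketch under assumptions: the `verdict` field and the `minCast` threshold are hypothetical, and the input array stands in for whatever your verifier nodes returned (with `null` for nodes that failed).

```javascript
// Hypothetical tally for a verify-vote step. `results` holds one entry per
// verifier node; a failed node is assumed to have resolved to null.
function tallyVotes(results, { minCast = 2 } = {}) {
  // Drop failed nodes: only votes that were actually cast count.
  const cast = results.filter(Boolean);
  // Too few cast votes means no decision, rather than a pass or a fail.
  if (cast.length < minCast) {
    return { decided: false, passed: false, cast: cast.length };
  }
  // Majority is computed over cast votes, not over the number of nodes launched.
  const yes = cast.filter((v) => v.verdict === 'pass').length;
  return { decided: true, passed: yes * 2 > cast.length, cast: cast.length };
}
```

Computing the majority over cast votes means one timed-out verifier does not count as a "no", while the `minCast` floor keeps a single surviving vote from deciding the outcome alone.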