@agent-compose/cli 0.2.1 → 0.2.2

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@agent-compose/cli",
3
- "version": "0.2.1",
3
+ "version": "0.2.2",
4
4
  "description": "Command-line interface for agent-compose — register, invoke, and monitor workflows from your terminal.",
5
5
  "license": "MIT",
6
6
  "repository": {
package/skills/ac:demo.md CHANGED
@@ -30,9 +30,13 @@ Make sure the user understands the three primitives before you start
30
30
  typing. Keep it brief — they can read [`docs/how-it-works.md`](../../docs/how-it-works.md)
31
31
  for the full version.
32
32
 
33
- > **Workflow** — `async (ctx, sandbox) => T`. The function you author.
34
- > Owns the run end-to-end: orchestrates phases, kicks off agent loops,
35
- > writes metadata, returns a structured result.
33
+ > **Workflow** — the unit of work you author. Two shapes are valid:
34
+ > a single `async (ctx, sandbox) => T` body (run-form, agent-driven)
35
+ > or a typed pipeline of `defineStep(...)` objects composed with
36
+ > `defineWorkflow({...}).step(s1).step(s2).build()` (step-form). The
37
+ > demo below uses run-form because the body is a single agent loop;
38
+ > step-form is the right shape when the work decomposes into phases
39
+ > with typed handoffs.
36
40
 
37
41
  > **Agent loop** — `agent({ runtime, prompt, ... })`. Embedded inside
38
42
  > the workflow body. Drives one iteration cycle of an LLM agent against
@@ -7,131 +7,352 @@ effort: high
7
7
 
8
8
  # Generate Workflow
9
9
 
10
- Interactively scaffold a new workflow from a plain-English description.
11
-
12
- > **Mental model.** A workflow is `async (ctx, sandbox) => T`. The `ctx`
13
- > exposes `run`, `input?`, `setMetadata`, and `step` (for named timeline
14
- > phases). The `sandbox` is the runner's own VM — pass it to
15
- > `agent({ sandbox, ... })` and to any helper that takes a
16
- > `SandboxProvider`. Agent loops live *inside* the workflow body via
17
- > `agent`. There's no `defineAgent` / `spawnAgent` model anymore.
18
-
19
- ## Steps
20
-
21
- 1. Ask the user:
22
- - What should this workflow do? (plain English: describe the phases,
23
- their inputs/outputs, and any external calls.)
24
- - Workflow name? (kebab-case, e.g. `code-review`)
25
- - Where to create it? (project-relative path, e.g. `src/workflows/`)
26
- - Runtime for agent loops? (default `claudeRuntime`)
27
- - Network policy needed? (which domains + which brokered secrets to inject?)
28
- - Capture a snapshot on success? (default off; on for setup/environment
29
- workflows whose end-state other workflows boot from)
30
- - Boot this run from another workflow's snapshot? (default no — fresh
31
- `node24` base sandbox)
32
- - Enable the built-in memory extractor? (default off — opt in only when
33
- this workflow's events are worth memorising AND the `workflow-memory`
34
- template is registered in the same factory)
35
- - Additional post-hooks? (ordered list of workflow names that fan out
36
- after this run completes)
37
-
38
- 2. If the user has a reference workflow in their project, read it to match
39
- their conventions (prompt structure, error handling). Otherwise follow
40
- the patterns in [`sdk/README.md`](../../sdk/README.md) and the example
41
- in [`docs/how-it-works.md`](../../docs/how-it-works.md).
42
-
43
- 3. Generate a complete, real implementation:
44
-
45
- ```ts
46
- import { defineWorkflow, agent, claudeRuntime } from "@agent-compose/sdk";
47
- import PROMPT from "./prompt.md" with { type: "text" };
48
-
49
- export default defineWorkflow({
50
- async run(ctx, sandbox) {
51
- // Phase 1 — wrap any non-agent setup in ctx.step() so it shows
52
- // up on the timeline with its own duration:
53
- const data = await ctx.step("fetch-input", async () => {
54
- // pure-server work, fetches, etc.
55
- });
56
-
57
- // Phase 2: agent loop. Each `agent` is one iteration cycle
58
- // with its own prompt + tool allowlist + budget.
59
- const result = await agent({
60
- sandbox,
61
- runtime: claudeRuntime,
62
- prompt: `${PROMPT}\n\nContext: ${JSON.stringify(data)}`,
63
- tools: ["Read", "Write", "Edit", "Bash", "Grep", "Glob"],
64
- budget: { turnsPerIteration: 40, maxIterations: 6 },
65
- // responseSchema: MyZodSchema, // for typed handoffs
66
- });
67
-
68
- await ctx.setMetadata({ summary: result.status?.summary });
69
- return { ok: !!result.status?.completed };
70
- },
71
-
72
- // ── Optional metadata picked up by the bundler at registration ──
73
- //
74
- // networkPolicy / placeholders — outbound traffic + brokered secrets:
75
- //
76
- // networkPolicy: {
77
- // allow: {
78
- // "api.github.com": [{ transform: [{ headers: {
79
- // Authorization: "basic:$GITHUB_TOKEN",
80
- // } }] }],
81
- // "*.openai.com": [{ transform: [{ headers: {
82
- // "OpenAI-Beta": "$OPENAI_BETA_HEADER",
83
- // } }] }],
84
- // },
85
- // },
86
- // // Placeholder values the runner sees in env vars AFTER brokering.
87
- // // The real secret never enters the VM — Vercel's firewall
88
- // // substitutes it on the way out. Only needed when a tool /
89
- // // SDK validates the env-var format on startup.
90
- // placeholders: { GITHUB_TOKEN: "ghp_" + "x".repeat(36) },
91
- //
92
- // snapshots (all snapshot config in one object):
93
- //
94
- // snapshots: {
95
- // // Boot from a specific captured snapshot. Pick the id from
96
- // // `agentc snapshot list` or the factory snapshots page.
97
- // bootFrom: { snapshotId: "snap_abc…" },
98
- // saveLatest: true, // capture sandbox on success
99
- // retainSteps: false, // keep one snapshot per successful step
100
- // },
101
- //
102
- // memory: opt-in (default false). Requires `workflow-memory` to
103
- // be registered in this factory.
104
- // memory: true,
105
- //
106
- // postRunHooks — ordered list of workflow names that run after this:
107
- // postRunHooks: ["audit-trail", "notify-slack"],
108
- });
109
- ```
110
-
111
- - Always declare the return type (or let TS infer + show it on hover).
112
- - Use `Promise.all` to fan out independent `agent` calls.
113
- - Use `ctx.step("phase-name", () => …)` for any named non-agent phase.
114
-
115
- 4. Show the complete file plus a separate `prompt.md` (or per-phase
116
- prompts) and confirm before writing.
117
-
118
- 5. Write the files, then suggest the user run `bun run typecheck` (or
119
- their project's equivalent) to validate.
120
-
121
- 6. Tell the user: `agentc register <path>` to register, then
122
- `agentc invoke <name> --follow` to test.
10
+ Walk the user through building a new workflow, **in their own words**.
11
+ This skill is used by operators and product folks as well as engineers;
12
+ ask questions in plain language and decide the shape internally.
13
+
14
+ ## What to ask the user
15
+
16
+ Ask these one at a time. Keep the language non-technical. Don't ask
17
+ about "step-form vs run-form"; that's an internal call.
18
+
19
+ 1. **What should this workflow do?**
20
+ One paragraph in plain English. This becomes the workflow's
21
+ `description` (shown on the dashboard tile and run page header).
22
+ Example: "Pulls open pull requests from a GitHub repo, scores each
23
+ one by review urgency, and posts the top five to Slack."
24
+
25
+ 2. **What information does it take in?**
26
+ Walk through each input field: a short name, a type (text, number,
27
+ yes/no, list, object), and a one-line description of what it's for.
28
+ Each field's description becomes a `.describe(...)` on the zod
29
+ schema so it shows up in the dashboard's Input panel.
30
+
31
+ 3. **What does it produce?**
32
+ Same pattern for the output. Fields + types + per-field
33
+ descriptions. The dashboard's Output panel renders these straight
34
+ from the schema.
35
+
36
+ 4. **Where should it live?**
37
+ Project-relative path, e.g. `src/workflows/`. Also ask for a
38
+ kebab-case name if one isn't obvious from the description.
39
+
40
+ 5. **Does it need to call external services?**
41
+ If yes, ask which (GitHub, Slack, OpenAI, an internal API, etc.).
42
+ You'll generate a network policy + brokered-secret stubs.
43
+
44
+ 6. **Should it run on a schedule?**
45
+ If yes, ask for the cadence in plain English ("every weekday at 9am
46
+ ET", "every hour", "the 1st of every month"). Convert to cron
47
+ yourself.
48
+
49
+ 7. **Anything else worth recording?**
50
+ Optional. Most workflows are fine without this.
51
+
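Question 6 leaves the plain-English-to-cron conversion to the skill. As a rough sketch, the three example cadences could map to standard five-field cron expressions as shown here. The `cadenceToCron` helper and its lookup table are hypothetical and not part of agent-compose. Five-field cron carries no timezone, so "ET" would have to be configured wherever the schedule is registered, and midnight is an assumed time for the monthly case.

```typescript
// Hypothetical helper: maps the example cadences from question 6 to
// standard five-field cron (minute hour day-of-month month day-of-week).
const CADENCE_TO_CRON: Record<string, string> = {
  "every weekday at 9am": "0 9 * * 1-5",
  "every hour": "0 * * * *",
  // Time of day is not given in the example; midnight is an assumption.
  "the 1st of every month": "0 0 1 * *",
};

function cadenceToCron(cadence: string): string | undefined {
  // Normalise case and drop a trailing "ET"; the timezone is not
  // expressible in the cron string itself.
  const key = cadence.trim().toLowerCase().replace(/\s+et$/, "");
  return CADENCE_TO_CRON[key];
}
```

Anything the table does not recognise returns `undefined`, which is a cue to ask the user a follow-up question rather than guess a schedule.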
52
+ **Do NOT ask** the user about:
53
+
54
+ - step-form vs run-form
55
+ - sandbox environments / dependency installation
56
+ - snapshots: `saveLatest`, `retainSteps`, `bootFrom`
57
+ - `memory` / `postRunHooks` / `processors`
58
+
59
+ Decide those internally based on what they described (see the
60
+ "Internal decisions" section below).
61
+
62
+ ## Internal decisions
63
+
64
+ After the user has answered the questions above, decide the shape WITHOUT asking:
65
+
66
+ ### Pick the workflow shape
67
+
68
+ - **Step-form** (`defineWorkflow({...}).step(s1).step(s2).build()` with
69
+ `defineStep(...)` objects): pick this when the body decomposes
70
+ cleanly into named phases with typed handoffs. Each step's output
71
+ threads into the next step's input. The dashboard renders each step
72
+ as a typed phase with its own duration. Examples: ETL pipelines,
73
+ classification → enrichment → publish, fetch → score → rank.
74
+
75
+ - **Run-form** (one `async (ctx, sandbox)` body with `agent({...})`
76
+ calls inside) — pick this when the body is dominated by one or more
77
+ agent loops. Decomposing an LLM iteration into typed engine steps is
78
+ the wrong shape. Use `ctx.step("phase-name", () => …)` inside the
79
+ body for observability sub-events when there's pre-agent setup
80
+ worth tracing on the run timeline.
81
+
82
+ When the user's description says "agent", "LLM", "the model decides",
83
+ "reasons through", "writes a summary based on", "navigates the
84
+ website" — that's run-form. When they describe deterministic phases
85
+ that each transform structured data — that's step-form. If genuinely
86
+ mixed, default to run-form and use `ctx.step` for the deterministic
87
+ phases.
88
+
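The keyword heuristic in the paragraph above can be sketched as a small classifier. `pickShape` and `AGENT_CUES` are illustrative names only; the cue phrases are taken from that paragraph, and treating every description without an agent cue as step-form is an assumption for cases the text does not cover.

```typescript
type WorkflowShape = "run-form" | "step-form";

// Phrases that signal an agent-driven body, per the guidance above.
const AGENT_CUES = [
  "agent",
  "llm",
  "the model decides",
  "reasons through",
  "writes a summary based on",
  "navigates the website",
];

function pickShape(description: string): WorkflowShape {
  const text = description.toLowerCase();
  // Any agent cue wins, so mixed descriptions default to run-form.
  return AGENT_CUES.some((cue) => text.includes(cue)) ? "run-form" : "step-form";
}
```

Plain substring matching is deliberately crude; it is a starting point for the internal decision, not a replacement for reading the description.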
89
+ ### Should it have a sandbox environment?
90
+
91
+ If the workflow needs specific tools, CLIs, or installed packages in
92
+ its runner VM (Playwright + Chromium, a specific Node version, an SDK
93
+ that needs `npm install`-ing first), it needs a **sandbox
94
+ environment** to capture that pre-installed state. The user doesn't
95
+ need to know about this; you decide.
96
+
97
+ When you decide a sandbox env is needed:
98
+
99
+ 1. Generate a small `defineSandboxEnvironment` workflow alongside the
100
+ user's workflow. Name it after the deps it captures (e.g.
101
+ `playwright-env`).
102
+ 2. Register that first with `agentc register --build` so the snapshot
103
+ is captured.
104
+ 3. Reference the captured snapshot from the user's workflow via
105
+ `snapshots: { bootFrom: { snapshotId: "<id from the env's first
106
+ run>" } }`.
107
+ 4. Tell the user about it in user terms: "I'll also create a one-time
108
+ setup workflow `playwright-env` that installs Playwright. We
109
+ register that first to capture a baseline image, then your workflow
110
+ boots from that image so it doesn't reinstall on every run."
111
+
112
+ Most workflows DO NOT need this. Default to no sandbox env.
113
+
114
+ ### Always include in the generated file
115
+
116
+ - `description: "..."` on the `defineWorkflow` / `defineSandboxEnvironment`
117
+ call — straight from the user's question 1 answer.
118
+ - `.describe("...")` on **every** zod schema field from the user's
119
+ question 2 + 3 answers. Schema descriptions appear in the dashboard
120
+ IO panels; without them, fields show only their type.
121
+ - `.describe("...")` on the root schema too where helpful; it surfaces
122
+ at the panel header.
123
+
124
+ ## Templates
125
+
126
+ Generate the appropriate template based on your internal shape
127
+ decision.
128
+
129
+ ### Template A — step-form (typed pipeline)
130
+
131
+ ```ts
132
+ import { defineWorkflow, defineStep } from "@agent-compose/sdk";
133
+ import { z } from "zod";
134
+
135
+ const InputSchema = z.object({
136
+ repo: z.string().describe("GitHub repository as `owner/name`"),
137
+ limit: z.number().int().positive().describe("Maximum number of PRs to score"),
138
+ }).describe("Inputs for the PR triage workflow");
139
+
140
+ const FetchOutput = z.object({
141
+ prs: z.array(z.object({
142
+ number: z.number().describe("PR number"),
143
+ title: z.string().describe("PR title"),
144
+ author: z.string().describe("PR author's GitHub login"),
145
+ })).describe("Open PRs pulled from GitHub"),
146
+ });
147
+
148
+ const ScoreOutput = z.object({
149
+ scored: z.array(z.object({
150
+ number: z.number().describe("PR number"),
151
+ score: z.number().describe("Review-urgency score, 0..1"),
152
+ })).describe("PRs scored by review urgency"),
153
+ });
154
+
155
+ const OutputSchema = z.object({
156
+ top: z.array(z.object({
157
+ number: z.number().describe("PR number"),
158
+ score: z.number().describe("Review-urgency score, 0..1"),
159
+ })).describe("The top-scoring PRs, descending by score"),
160
+ }).describe("Top PRs surfaced by the triage workflow");
161
+
162
+ const fetchStep = defineStep({
163
+ name: "fetch-prs",
164
+ input: InputSchema,
165
+ output: FetchOutput,
166
+ run: async ({ input }) => {
167
+ // Pure server-side work: fetch, parse, transform.
168
+ return { prs: [/* ... */] };
169
+ },
170
+ });
171
+
172
+ const scoreStep = defineStep({
173
+ name: "score-prs",
174
+ input: FetchOutput,
175
+ output: ScoreOutput,
176
+ run: ({ input }) => ({
177
+ scored: input.prs.map(p => ({ number: p.number, score: 0 })),
178
+ }),
179
+ });
180
+
181
+ const pickTopStep = defineStep({
182
+ name: "pick-top",
183
+ input: ScoreOutput,
184
+ output: OutputSchema,
185
+ run: ({ input }) => ({ top: [...input.scored].sort((a, b) => b.score - a.score).slice(0, 5) }),
186
+ });
187
+
188
+ export default defineWorkflow({
189
+ id: "pr-triage",
190
+ description: "Pulls open PRs from a GitHub repo, scores each one by review urgency, surfaces the top five.",
191
+ input: InputSchema,
192
+ output: OutputSchema,
193
+ })
194
+ .step(fetchStep)
195
+ .step(scoreStep)
196
+ .step(pickTopStep)
197
+ .build();
198
+ ```
199
+
200
+ Notes:
201
+ - Each step's `output` schema must satisfy the next step's `input`
202
+ schema (the SDK enforces this at `defineWorkflow().step(...)` time
203
+ via TS inference). Reshape inside the upstream step's `run` body,
204
+ not at the boundary.
205
+ - Step bodies don't receive a `sandbox`. If a step needs to run code
206
+ inside the runner VM, that's the agent-driven shape — use run-form.
207
+
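To make the first note concrete, here is a minimal sketch of how a chained builder can use TypeScript inference to reject a step whose input does not match the previous step's output. The `Pipeline` class, the `Step` type, and the two sample steps are hypothetical stand-ins written for this illustration; this is not the agent-compose SDK's implementation, and it uses plain types rather than zod schemas.

```typescript
// Illustrative only: NOT the agent-compose SDK.
type Step<I, O> = { name: string; run: (input: I) => O };

class Pipeline<In, Cur> {
  private readonly steps: Step<any, any>[];

  private constructor(steps: Step<any, any>[]) {
    this.steps = steps;
  }

  static start<In>(): Pipeline<In, In> {
    return new Pipeline<In, In>([]);
  }

  // The next step must accept the current output type `Cur`.
  step<Next>(s: Step<Cur, Next>): Pipeline<In, Next> {
    return new Pipeline<In, Next>([...this.steps, s]);
  }

  // Runs the steps in order, threading each output into the next input.
  build(): (input: In) => Cur {
    return (input) =>
      this.steps.reduce((value, s) => s.run(value), input as any) as Cur;
  }
}

const listPrs: Step<{ repo: string }, { prs: number[] }> = {
  name: "list-prs",
  run: () => ({ prs: [3, 1, 2] }),
};

const countPrs: Step<{ prs: number[] }, { count: number }> = {
  name: "count-prs",
  run: ({ prs }) => ({ count: prs.length }),
};

const runPipeline = Pipeline.start<{ repo: string }>()
  .step(listPrs)
  .step(countPrs)
  .build();

// Pipeline.start<{ repo: string }>().step(countPrs) would not type-check:
// countPrs expects { prs: number[] }, not { repo: string }.
```

Because the mismatch surfaces at the `.step(...)` call, any reshaping has to happen inside the upstream step's `run` body, which is the constraint the note describes.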
208
+ ### Template B — run-form (agent-driven body)
209
+
210
+ ```ts
211
+ import { defineWorkflow, agent, claudeRuntime } from "@agent-compose/sdk";
212
+ import { z } from "zod";
213
+ import PROMPT from "./prompt.md" with { type: "text" };
214
+
215
+ const InputSchema = z.object({
216
+ repo: z.string().describe("GitHub repository as `owner/name`"),
217
+ }).describe("Inputs for the code review agent");
218
+
219
+ const OutputSchema = z.object({
220
+ ok: z.boolean().describe("True when the agent finished its review"),
221
+ summary: z.string().optional().describe("One-paragraph summary of the agent's findings"),
222
+ }).describe("Result of the code review agent");
223
+
224
+ export default defineWorkflow({
225
+ description: "Runs an LLM agent that reviews recent changes in a repo and writes a summary.",
226
+ input: InputSchema,
227
+ output: OutputSchema,
228
+
229
+ async run(ctx, sandbox) {
230
+ // Pre-agent setup. `ctx.step("phase", () => ...)` is for the run
231
+ // timeline — emits step_started / step_completed events. It's
232
+ // observability sugar, not an engine step.
233
+ const data = await ctx.step("fetch-input", async () => {
234
+ // pure-server fetch / preparation
235
+ return { /* ... */ };
236
+ });
237
+
238
+ // Agent loop — the LLM iterates against the sandbox until it
239
+ // emits `exit_signal: true` or hits the budget.
240
+ const result = await agent({
241
+ sandbox,
242
+ runtime: claudeRuntime,
243
+ prompt: `${PROMPT}\n\nContext: ${JSON.stringify(data)}`,
244
+ tools: ["Read", "Write", "Edit", "Bash", "Grep", "Glob"],
245
+ budget: { turnsPerIteration: 40, maxIterations: 6 },
246
+ // responseSchema: MyZodSchema, // for typed handoffs
247
+ });
248
+
249
+ await ctx.setMetadata({ summary: result.status?.summary });
250
+ return {
251
+ ok: !!result.status?.completed,
252
+ summary: result.status?.summary,
253
+ };
254
+ },
255
+
256
+ // ── Optional metadata, attached by the bundler at registration ──
257
+ //
258
+ // networkPolicy: outbound traffic + brokered secret substitution.
259
+ // networkPolicy: {
260
+ // allow: {
261
+ // "api.github.com": [{ transform: [{ headers: {
262
+ // Authorization: "basic:$GITHUB_TOKEN",
263
+ // } }] }],
264
+ // },
265
+ // },
266
+ // // Placeholder values the runner sees in env vars AFTER brokering.
267
+ // // Only needed when a tool / SDK validates the env var format on
268
+ // // startup. The real secret never enters the VM — Vercel's
269
+ // // firewall substitutes it on the way out.
270
+ // placeholders: { GITHUB_TOKEN: "ghp_" + "x".repeat(36) },
271
+ //
272
+ // snapshots — boot source + capture mode:
273
+ // snapshots: {
274
+ // bootFrom: { snapshotId: "snap_abc…" },
275
+ // saveLatest: true,
276
+ // retainSteps: false,
277
+ // },
278
+ //
279
+ // memory — opt-in. Requires `workflow-memory` to be registered in
280
+ // this factory.
281
+ // memory: true,
282
+ //
283
+ // postRunHooks — workflows that run after this one completes:
284
+ // postRunHooks: ["audit-trail", "notify-slack"],
285
+ });
286
+ ```
287
+
288
+ Notes:
289
+ - Always declare the return type (or let TS infer + show it on hover).
290
+ - `Promise.all` to fan out independent `agent` calls.
291
+ - `ctx.step("phase-name", () => …)` for any named non-agent phase
292
+ worth showing on the dashboard timeline.
293
+
294
+ ### Sandbox environment companion (only when you decided one is needed)
295
+
296
+ ```ts
297
+ // playwright-env.ts — captures a baseline VM with Playwright pre-installed.
298
+ // Register this first with `agentc register playwright-env.ts --build`
299
+ // so the snapshot exists before user workflows reference it.
300
+ import { defineSandboxEnvironment } from "@agent-compose/sdk";
301
+
302
+ export default defineSandboxEnvironment({
303
+ name: "playwright-env",
304
+ description: "Sandbox with Playwright + Chromium installed for browser-using agents.",
305
+ setup: async (sb) => {
306
+ await sb.commands.run("sudo npm install -g playwright");
307
+ await sb.commands.run("sudo npx playwright install --with-deps chromium");
308
+ },
309
+ });
310
+ ```
311
+
312
+ After `agentc register --build` succeeds, the CLI prints a snapshot
313
+ id; reference it on the user workflow:
314
+
315
+ ```ts
316
+ export default defineWorkflow({
317
+ description: "...",
318
+ input: InputSchema,
319
+ output: OutputSchema,
320
+ snapshots: { bootFrom: { snapshotId: "<id from --build>" } },
321
+ // ...
322
+ });
323
+ ```
324
+
325
+ ## Final steps
326
+
327
+ 1. Show the user the file(s) you plan to write and confirm before
328
+ writing. In plain language: "I'll create `pr-triage.ts` in
329
+ `src/workflows/`. It takes `{ repo, limit }` and returns
330
+ `{ top: [...] }`. Sound right?"
331
+ 2. Write the files.
332
+ 3. Suggest `bun run typecheck` (or the project's equivalent) to
333
+ validate.
334
+ 4. Tell the user the next commands in plain English:
335
+ - "To register: `agentc register src/workflows/pr-triage.ts`."
336
+ - "To test it: `agentc invoke pr-triage --follow`."
337
+ - (If you added a sandbox env companion: "First register the
338
+ `playwright-env` workflow with `--build` so the snapshot is
339
+ captured, then register and invoke `pr-triage`.")
123
340
 
124
341
  ## When the user asks for memory extraction
125
342
 
126
- The built-in memory extractor (`workflow-memory`) is a separate workflow
127
- that must be registered in the same factory. Walk them through it:
343
+ The built-in memory extractor (`workflow-memory`) is a separate
344
+ workflow that must be registered in the same factory. Walk them
345
+ through it in user terms:
128
346
 
129
- 1. Copy the reference recipe from `templates/workflow-memory.ts` (or
347
+ 1. "I'll add the `workflow-memory` recipe to your project."
348
+ Copy from `templates/workflow-memory.ts` (or
130
349
  `.agentc/smoketest/workflows/workflow-memory.ts` for the smoke
131
- variant) into their project.
132
- 2. `agentc register ./workflow-memory.ts` to install it.
133
- 3. Set `memory: true` on the source workflow(s) — done.
350
+ variant).
351
+ 2. "Run `agentc register ./workflow-memory.ts` once to install it."
352
+ 3. "Now any workflow with `memory: true` will trigger it after each
353
+ successful run."
134
354
 
135
- If they want to skip the built-in and run custom post-run workflows,
136
- use `postRunHooks: [...]` instead. Each post-hook is dispatched in order
137
- with the source run's full context.
355
+ For custom post-run workflows (analytics, audit, notifications)
356
+ without the built-in extractor, use `postRunHooks: [...]` instead.
357
+ Each post-hook runs in declaration order with the source run's full
358
+ context.
@@ -107,8 +107,16 @@ export default defineWorkflow({
107
107
  // On every successful run — captures the sandbox VM after the last step:
108
108
  defineWorkflow({ snapshots: { saveLatest: true }, run: ... });
109
109
 
110
- // Retain one per step (storage scales with step count):
111
- defineWorkflow({ snapshots: { saveLatest: true, retainSteps: true }, run: ... });
110
+ // Retain one snapshot per step (storage scales linearly with step count).
111
+ // Most useful with step-form workflows where each `.step(definedStep)`
112
+ // is a distinct engine step; a run-form workflow has only one outer
113
+ // "run" step, so `retainSteps: true` there behaves the same as
114
+ // `saveLatest: true` alone.
115
+ defineWorkflow({ id: "pipeline", input, output, snapshots: { saveLatest: true, retainSteps: true } })
116
+ .step(fetchStep)
117
+ .step(scoreStep)
118
+ .step(pickTopStep)
119
+ .build();
112
120
  ```
113
121
 
114
122
  Per-invocation override: