@agent-compose/cli 0.2.1 → 0.2.2

Files changed (4)

package/package.json +1 -1
package/skills/ac:demo.md +7 -3
package/skills/ac:generate-workflow.md +343 -122
package/skills/ac:snapshots.md +10 -2

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@agent-compose/cli",
-  "version": "0.2.1",
+  "version": "0.2.2",
   "description": "Command-line interface for agent-compose — register, invoke, and monitor workflows from your terminal.",
   "license": "MIT",
   "repository": {

package/skills/ac:demo.md CHANGED

@@ -30,9 +30,13 @@ Make sure the user understands the three primitives before you start
 typing. Keep it brief — they can read [`docs/how-it-works.md`](../../docs/how-it-works.md)
 for the full version.
-> **Workflow** — `async (ctx, sandbox) => T`. The function you author.
-> Owns the run end-to-end: orchestrates phases, kicks off agent loops,
-> writes metadata, returns a structured result.
+> **Workflow** — the unit of work you author. Two shapes are valid:
+> a single `async (ctx, sandbox) => T` body (run-form, agent-driven)
+> or a typed pipeline of `defineStep(...)` objects composed with
+> `defineWorkflow({...}).step(s1).step(s2).build()` (step-form). The
+> demo below uses run-form because the body is a single agent loop;
+> step-form is the right shape when the work decomposes into phases
+> with typed handoffs.
 > **Agent loop** — `agent({ runtime, prompt, ... })`. Embedded inside
 > the workflow body. Drives one iteration cycle of an LLM agent against

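The "typed handoffs" that the new step-form wording refers to can be illustrated without the SDK. The sketch below is a minimal, self-contained model of the idea, not the `@agent-compose/sdk` implementation: `Step`, `Pipeline`, `pipeline`, and `startPipeline` are hypothetical names invented for this example, steps are synchronous for brevity, and plain TypeScript types stand in for the zod schemas used in the real templates.

```typescript
// Hypothetical stand-in for a step: a name plus a typed transform.
// This is NOT the @agent-compose/sdk implementation.
interface Step<I, O> {
  name: string;
  run: (input: I) => O;
}

// Hypothetical builder: `.step()` only accepts a step whose input type
// matches the output type of the step before it.
interface Pipeline<In, Out> {
  step<Next>(s: Step<Out, Next>): Pipeline<In, Next>;
  build(): (input: In) => Out;
}

function pipeline<In, Out>(steps: ReadonlyArray<Step<any, any>>): Pipeline<In, Out> {
  return {
    step: <Next>(s: Step<Out, Next>) => pipeline<In, Next>([...steps, s]),
    build: () => (input: In) => {
      // Thread each step's output into the next step's input.
      let value: unknown = input;
      for (const s of steps) value = s.run(value);
      return value as Out;
    },
  };
}

function startPipeline<T>(): Pipeline<T, T> {
  return pipeline<T, T>([]);
}

// Illustrative data shapes, loosely modelled on the PR triage example.
type Pr = { repo: string; number: number; title: string };
type Scored = { number: number; score: number };

const fetchPrs: Step<{ repo: string }, { prs: Pr[] }> = {
  name: "fetch-prs",
  run: (input) => ({
    prs: [
      { repo: input.repo, number: 1, title: "fix typo" },
      { repo: input.repo, number: 2, title: "add cache layer" },
    ],
  }),
};

const scorePrs: Step<{ prs: Pr[] }, { scored: Scored[] }> = {
  name: "score-prs",
  // Placeholder scoring rule: longer titles score higher.
  run: (input) => ({
    scored: input.prs.map((p) => ({ number: p.number, score: p.title.length })),
  }),
};

const pickTop: Step<{ scored: Scored[] }, { top: Scored[] }> = {
  name: "pick-top",
  run: (input) => ({
    top: [...input.scored].sort((a, b) => b.score - a.score).slice(0, 1),
  }),
};

const triage = startPipeline<{ repo: string }>()
  .step(fetchPrs)
  .step(scorePrs)
  .step(pickTop)
  .build();
```

In this model, swapping the order of `.step(scorePrs)` and `.step(fetchPrs)` fails to compile, because the step's input type no longer matches the pipeline's current output type. That is the property the step-form description relies on; how the SDK itself enforces it is not shown in this diff.
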
package/skills/ac:generate-workflow.md CHANGED

@@ -7,131 +7,352 @@ effort: high
 # Generate Workflow
-Interactively scaffold a new workflow from a plain-English description.
-> **Mental model.** A workflow is `async (ctx, sandbox) => T`. The `ctx`
-> exposes `run`, `input?`, `setMetadata`, and `step` (for named timeline
-> phases). The `sandbox` is the runner's own VM — pass it to
-> `agent({ sandbox, ... })` and to any helper that takes a
-> `SandboxProvider`. Agent loops live *inside* the workflow body via
-> `agent`. There's no `defineAgent` / `spawnAgent` model anymore.
-## Steps
-1. Ask the user:
-   - What should this workflow do? (plain English — describe the phases,
-     their inputs/outputs, and any external calls.)
-   - Workflow name? (kebab-case, e.g. `code-review`)
-   - Where to create it? (project-relative path, e.g. `src/workflows/`)
-   - Runtime for agent loops? (default `claudeRuntime`)
-   - Network policy needed? (which domains + which brokered secrets to inject?)
-   - Capture a snapshot on success? (default off; on for setup/environment
-     workflows whose end-state other workflows boot from)
-   - Boot this run from another workflow's snapshot? (default no — fresh
-     `node24` base sandbox)
-   - Enable the built-in memory extractor? (default off — opt in only when
-     this workflow's events are worth memorising AND the `workflow-memory`
-     template is registered in the same factory)
-   - Additional post-hooks? (ordered list of workflow names that fan out
-     after this run completes)
-2. If the user has a reference workflow in their project, read it to match
-   their conventions (prompt structure, error handling). Otherwise follow
-   the patterns in [`sdk/README.md`](../../sdk/README.md) and the example
-   in [`docs/how-it-works.md`](../../docs/how-it-works.md).
-3. Generate a complete, real implementation:
-   ```ts
-   import { defineWorkflow, agent, claudeRuntime } from "@agent-compose/sdk";
-   import PROMPT from "./prompt.md" with { type: "text" };
-   export default defineWorkflow({
-     async run(ctx, sandbox) {
-       // Phase 1 — wrap any non-agent setup in ctx.step() so it shows
-       // up on the timeline with its own duration:
-       const data = await ctx.step("fetch-input", async () => {
-         // pure-server work, fetches, etc.
-       });
-       // Phase 2 — agent loop. Each `agent` is one iteration cycle
-       // with its own prompt + tool allowlist + budget.
-       const result = await agent({
-         sandbox,
-         runtime:    claudeRuntime,
-         prompt:     `${PROMPT}\n\nContext: ${JSON.stringify(data)}`,
-         tools:      ["Read", "Write", "Edit", "Bash", "Grep", "Glob"],
-         budget:     { turnsPerIteration: 40, maxIterations: 6 },
-         // responseSchema: MyZodSchema,   // for typed handoffs
-       });
-       await ctx.setMetadata({ summary: result.status?.summary });
-       return { ok: !!result.status?.completed };
-     },
-     // ── Optional metadata picked up by the bundler at registration ──
-     //
-     // networkPolicy / placeholders — outbound traffic + brokered secrets:
-     //
-     //   networkPolicy: {
-     //     allow: {
-     //       "api.github.com": [{ transform: [{ headers: {
-     //         Authorization: "basic:$GITHUB_TOKEN",
-     //       } }] }],
-     //       "*.openai.com":   [{ transform: [{ headers: {
-     //         "OpenAI-Beta":  "$OPENAI_BETA_HEADER",
-     //       } }] }],
-     //     },
-     //   },
-     //   // Placeholder values the runner sees in env vars AFTER brokering.
-     //   // The real secret never enters the VM — Vercel's firewall
-     //   // substitutes it on the way out. Only needed when a tool /
-     //   // SDK validates the env-var format on startup.
-     //   placeholders: { GITHUB_TOKEN: "ghp_" + "x".repeat(36) },
-     //
-     // snapshots — all snapshot config in one object:
-     //
-     //   snapshots: {
-     //     // Boot from a specific captured snapshot. Pick the id from
-     //     // `agentc snapshot list` or the factory snapshots page.
-     //     bootFrom:    { snapshotId: "snap_abc…" },
-     //     saveLatest:  true,    // capture sandbox on success
-     //     retainSteps: false,   // keep one snapshot per successful step
-     //   },
-     //
-     // memory — opt-in (default false). Requires `workflow-memory` to
-     //          be registered in this factory.
-     //   memory: true,
-     //
-     // postRunHooks — ordered list of workflow names that run after this:
-     //   postRunHooks: ["audit-trail", "notify-slack"],
-   });
-   ```
-   - Always declare the return type (or let TS infer + show it on hover).
-   - Use `Promise.all` to fan out independent `agent` calls.
-   - Use `ctx.step("phase-name", () => …)` for any named non-agent phase.
-4. Show the complete file plus a separate `prompt.md` (or per-phase
-   prompts) and confirm before writing.
-5. Write the files, then suggest the user run `bun run typecheck` (or
-   their project's equivalent) to validate.
-6. Tell the user: `agentc register <path>` to register, then
-   `agentc invoke <name> --follow` to test.
+Walk the user through building a new workflow, **in their own words**.
+This skill is used by operators and product folks as well as engineers
+— ask questions in plain language and decide the shape internally.
+## What to ask the user
+Ask these one at a time. Keep the language non-technical. Don't ask
+about "step-form vs run-form" — that's an internal call.
+1. **What should this workflow do?**
+   One paragraph in plain English. This becomes the workflow's
+   `description` (shown on the dashboard tile and run page header).
+   Example: "Pulls open pull requests from a GitHub repo, scores each
+   one by review urgency, and posts the top five to Slack."
+2. **What information does it take in?**
+   Walk through each input field: a short name, a type (text, number,
+   yes/no, list, object), and a one-line description of what it's for.
+   Each field's description becomes a `.describe(...)` on the zod
+   schema so it shows up in the dashboard's Input panel.
+3. **What does it produce?**
+   Same pattern for the output. Fields + types + per-field
+   descriptions. The dashboard's Output panel renders these straight
+   from the schema.
+4. **Where should it live?**
+   Project-relative path, e.g. `src/workflows/`. Also ask for a
+   kebab-case name if one isn't obvious from the description.
+5. **Does it need to call external services?**
+   If yes, ask which (GitHub, Slack, OpenAI, an internal API, etc.).
+   You'll generate a network policy + brokered-secret stubs.
+6. **Should it run on a schedule?**
+   If yes, ask for the cadence in plain English ("every weekday at 9am
+   ET", "every hour", "the 1st of every month"). Convert to cron
+   yourself.
+7. **Anything else worth recording?**
+   Optional. Most workflows are fine without this.
+**Do NOT ask** the user about:
+- step-form vs run-form
+- sandbox environments / dependency installation
+- snapshots: `saveLatest`, `retainSteps`, `bootFrom`
+- `memory` / `postRunHooks` / `processors`
+Decide those internally based on what they described (see the
+"Internal decisions" section below).
+## Internal decisions
+After the user answers the questions above, decide the shape WITHOUT asking:
+### Pick the workflow shape
+- **Step-form** (`defineWorkflow({...}).step(s1).step(s2).build()` with
+  `defineStep(...)` objects) — pick this when the body decomposes
+  cleanly into named phases with typed handoffs. Each step's output
+  threads into the next step's input. The dashboard renders each step
+  as a typed phase with its own duration. Examples: ETL pipelines,
+  classification → enrichment → publish, fetch → score → rank.
+- **Run-form** (one `async (ctx, sandbox)` body with `agent({...})`
+  calls inside) — pick this when the body is dominated by one or more
+  agent loops. Decomposing an LLM iteration into typed engine steps is
+  the wrong shape. Use `ctx.step("phase-name", () => …)` inside the
+  body for observability sub-events when there's pre-agent setup
+  worth tracing on the run timeline.
+When the user's description says "agent", "LLM", "the model decides",
+"reasons through", "writes a summary based on", "navigates the
+website" — that's run-form. When they describe deterministic phases
+that each transform structured data — that's step-form. If genuinely
+mixed, default to run-form and use `ctx.step` for the deterministic
+phases.
+### Should it have a sandbox environment?
+If the workflow needs specific tools, CLIs, or installed packages in
+its runner VM (Playwright + Chromium, a specific Node version, an SDK
+that needs `npm install`-ing first), it needs a **sandbox
+environment** to capture that pre-installed state. The user doesn't
+need to know about this — you decide.
+When you decide a sandbox env is needed:
+1. Generate a small `defineSandboxEnvironment` workflow alongside the
+   user's workflow. Name it after the deps it captures (e.g.
+   `playwright-env`).
+2. Register that first with `agentc register --build` so the snapshot
+   is captured.
+3. Reference the captured snapshot from the user's workflow via
+   `snapshots: { bootFrom: { snapshotId: "<id from the env's first
+   run>" } }`.
+4. Tell the user about it in user terms: "I'll also create a one-time
+   setup workflow `playwright-env` that installs Playwright. We
+   register that first to capture a baseline image, then your workflow
+   boots from that image so it doesn't reinstall on every run."
+Most workflows DO NOT need this. Default to no sandbox env.
+### Always include in the generated file
+- `description: "..."` on the `defineWorkflow` / `defineSandboxEnvironment`
+  call — straight from the user's question 1 answer.
+- `.describe("...")` on **every** zod schema field — from the user's
+  question 2 + 3 answers. Schema descriptions appear in the dashboard
+  IO panels; without them, fields show only their type.
+- `.describe("...")` on the root schema too where helpful — surfaces
+  at the panel header.
+## Templates
+Generate the appropriate template based on your internal shape
+decision.
+### Template A — step-form (typed pipeline)
+```ts
+import { defineWorkflow, defineStep } from "@agent-compose/sdk";
+import { z } from "zod";
+const InputSchema = z.object({
+  repo:  z.string().describe("GitHub repository as `owner/name`"),
+  limit: z.number().int().positive().describe("Maximum number of PRs to score"),
+}).describe("Inputs for the PR triage workflow");
+const FetchOutput = z.object({
+  prs: z.array(z.object({
+    number: z.number().describe("PR number"),
+    title:  z.string().describe("PR title"),
+    author: z.string().describe("PR author's GitHub login"),
+  })).describe("Open PRs pulled from GitHub"),
+});
+const ScoreOutput = z.object({
+  scored: z.array(z.object({
+    number: z.number().describe("PR number"),
+    score:  z.number().describe("Review-urgency score, 0..1"),
+  })).describe("PRs scored by review urgency"),
+});
+const OutputSchema = z.object({
+  top: z.array(z.object({
+    number: z.number().describe("PR number"),
+    score:  z.number().describe("Review-urgency score, 0..1"),
+  })).describe("The top-scoring PRs, descending by score"),
+}).describe("Top PRs surfaced by the triage workflow");
+const fetchStep = defineStep({
+  name:   "fetch-prs",
+  input:  InputSchema,
+  output: FetchOutput,
+  run: async ({ input }) => {
+    // Pure server-side work: fetch, parse, transform.
+    return { prs: [/* ... */] };
+  },
+});
+const scoreStep = defineStep({
+  name:   "score-prs",
+  input:  FetchOutput,
+  output: ScoreOutput,
+  run: ({ input }) => ({
+    scored: input.prs.map(p => ({ number: p.number, score: 0 })),
+  }),
+});
+const pickTopStep = defineStep({
+  name:   "pick-top",
+  input:  ScoreOutput,
+  output: OutputSchema,
+  run: ({ input }) => ({ top: [...input.scored].sort((a, b) => b.score - a.score).slice(0, 5) }),
+});
+export default defineWorkflow({
+  id:          "pr-triage",
+  description: "Pulls open PRs from a GitHub repo, scores each one by review urgency, surfaces the top five.",
+  input:       InputSchema,
+  output:      OutputSchema,
+})
+  .step(fetchStep)
+  .step(scoreStep)
+  .step(pickTopStep)
+  .build();
+```
+Notes:
+- Each step's `output` schema must satisfy the next step's `input`
+  schema (the SDK enforces this at `defineWorkflow().step(...)` time
+  via TS inference). Reshape inside the upstream step's `run` body,
+  not at the boundary.
+- Step bodies don't receive a `sandbox`. If a step needs to run code
+  inside the runner VM, that's the agent-driven shape — use run-form.
+### Template B — run-form (agent-driven body)
+```ts
+import { defineWorkflow, agent, claudeRuntime } from "@agent-compose/sdk";
+import { z } from "zod";
+import PROMPT from "./prompt.md" with { type: "text" };
+const InputSchema = z.object({
+  repo: z.string().describe("GitHub repository as `owner/name`"),
+}).describe("Inputs for the code review agent");
+const OutputSchema = z.object({
+  ok:      z.boolean().describe("True when the agent finished its review"),
+  summary: z.string().optional().describe("One-paragraph summary of the agent's findings"),
+}).describe("Result of the code review agent");
+export default defineWorkflow({
+  description: "Runs an LLM agent that reviews recent changes in a repo and writes a summary.",
+  input:       InputSchema,
+  output:      OutputSchema,
+  async run(ctx, sandbox) {
+    // Pre-agent setup. `ctx.step("phase", () => ...)` is for the run
+    // timeline — emits step_started / step_completed events. It's
+    // observability sugar, not an engine step.
+    const data = await ctx.step("fetch-input", async () => {
+      // pure-server fetch / preparation
+      return { /* ... */ };
+    });
+    // Agent loop — the LLM iterates against the sandbox until it
+    // emits `exit_signal: true` or hits the budget.
+    const result = await agent({
+      sandbox,
+      runtime:    claudeRuntime,
+      prompt:     `${PROMPT}\n\nContext: ${JSON.stringify(data)}`,
+      tools:      ["Read", "Write", "Edit", "Bash", "Grep", "Glob"],
+      budget:     { turnsPerIteration: 40, maxIterations: 6 },
+      // responseSchema: MyZodSchema,   // for typed handoffs
+    });
+    await ctx.setMetadata({ summary: result.status?.summary });
+    return {
+      ok:      !!result.status?.completed,
+      summary: result.status?.summary,
+    };
+  },
+  // ── Optional metadata, attached by the bundler at registration ──
+  //
+  // networkPolicy: outbound traffic + brokered secret substitution.
+  //   networkPolicy: {
+  //     allow: {
+  //       "api.github.com": [{ transform: [{ headers: {
+  //         Authorization: "basic:$GITHUB_TOKEN",
+  //       } }] }],
+  //     },
+  //   },
+  //   // Placeholder values the runner sees in env vars AFTER brokering.
+  //   // Only needed when a tool / SDK validates the env var format on
+  //   // startup. The real secret never enters the VM — Vercel's
+  //   // firewall substitutes it on the way out.
+  //   placeholders: { GITHUB_TOKEN: "ghp_" + "x".repeat(36) },
+  //
+  // snapshots — boot source + capture mode:
+  //   snapshots: {
+  //     bootFrom:    { snapshotId: "snap_abc…" },
+  //     saveLatest:  true,
+  //     retainSteps: false,
+  //   },
+  //
+  // memory — opt-in. Requires `workflow-memory` to be registered in
+  // this factory.
+  //   memory: true,
+  //
+  // postRunHooks — workflows that run after this one completes:
+  //   postRunHooks: ["audit-trail", "notify-slack"],
+});
+```
+Notes:
+- Always declare the return type (or let TS infer + show it on hover).
+- `Promise.all` to fan out independent `agent` calls.
+- `ctx.step("phase-name", () => …)` for any named non-agent phase
+  worth showing on the dashboard timeline.
+### Sandbox environment companion (only when you decided one is needed)
+```ts
+// playwright-env.ts — captures a baseline VM with Playwright pre-installed.
+// Register this first with `agentc register playwright-env.ts --build`
+// so the snapshot exists before user workflows reference it.
+import { defineSandboxEnvironment } from "@agent-compose/sdk";
+export default defineSandboxEnvironment({
+  name:        "playwright-env",
+  description: "Sandbox with Playwright + Chromium installed for browser-using agents.",
+  setup: async (sb) => {
+    await sb.commands.run("sudo npm install -g playwright");
+    await sb.commands.run("sudo npx playwright install --with-deps chromium");
+  },
+});
+```
+After `agentc register --build` succeeds, the CLI prints a snapshot
+id; reference it on the user workflow:
+```ts
+export default defineWorkflow({
+  description: "...",
+  input: InputSchema,
+  output: OutputSchema,
+  snapshots: { bootFrom: { snapshotId: "<id from --build>" } },
+  // ...
+});
+```
+## Final steps
+1. Show the user the file(s) you plan to write and confirm before
+   writing. In plain language: "I'll create `pr-triage.ts` in
+   `src/workflows/`. It takes `{ repo, limit }` and returns
+   `{ top: [...] }`. Sound right?"
+2. Write the files.
+3. Suggest `bun run typecheck` (or the project's equivalent) to
+   validate.
+4. Tell the user the next commands in plain English:
+   - "To register: `agentc register src/workflows/pr-triage.ts`."
+   - "To test it: `agentc invoke pr-triage --follow`."
+   - (If you added a sandbox env companion: "First register the
+     `playwright-env` workflow with `--build` so the snapshot is
+     captured, then register and invoke `pr-triage`.")
 ## When the user asks for memory extraction
-The built-in memory extractor (`workflow-memory`) is a separate workflow
-that must be registered in the same factory. Walk them through it:
+The built-in memory extractor (`workflow-memory`) is a separate
+workflow that must be registered in the same factory. Walk them
+through it in user terms:
-1. Copy the reference recipe from `templates/workflow-memory.ts` (or
+1. "I'll add the `workflow-memory` recipe to your project."
+   Copy from `templates/workflow-memory.ts` (or
    `.agentc/smoketest/workflows/workflow-memory.ts` for the smoke
-   variant) into their project.
-2. `agentc register ./workflow-memory.ts` to install it.
-3. Set `memory: true` on the source workflow(s) — done.
+   variant).
+2. "Run `agentc register ./workflow-memory.ts` once to install it."
+3. "Now any workflow with `memory: true` will trigger it after each
+   successful run."
-If they want to skip the built-in and run custom post-run workflows,
-use `postRunHooks: [...]` instead. Each post-hook is dispatched in order
-with the source run's full context.
+For custom post-run workflows (analytics, audit, notifications)
+without the built-in extractor, use `postRunHooks: [...]` instead.
+Each post-hook runs in declaration order with the source run's full
+context.

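The shape-selection rule in the new "Pick the workflow shape" section (agent-style phrasing means run-form, deterministic phases mean step-form, mixed cases default to run-form) can be sketched as a small classifier. This is only an illustration of the rule as the skill text states it: `pickShape`, `WorkflowShape`, and `AGENT_SIGNALS` are hypothetical names, and the phrase list is copied from the skill text rather than from any SDK or CLI code.

```typescript
type WorkflowShape = "run-form" | "step-form";

// Phrases the skill text lists as signals of an agent-driven body.
const AGENT_SIGNALS: readonly string[] = [
  "agent",
  "LLM",
  "the model decides",
  "reasons through",
  "writes a summary based on",
  "navigates the website",
];

// Hypothetical helper: returns "run-form" as soon as any agent signal
// appears, which also covers the "genuinely mixed" case, since the
// skill says to default to run-form there.
function pickShape(description: string): WorkflowShape {
  const hasAgentSignal = AGENT_SIGNALS.some((phrase) =>
    // Leading word boundary only, so "agents" and "LLMs" also match.
    new RegExp(`\\b${phrase}`, "i").test(description),
  );
  return hasAgentSignal ? "run-form" : "step-form";
}
```

For example, the PR triage description quoted in question 1 contains none of the listed phrases and falls through to step-form, while a description such as "an agent navigates the website and fills in the form" selects run-form. A keyword match is a crude proxy for the judgment the skill asks for, so treat it as a starting point only.
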
package/skills/ac:snapshots.md CHANGED

@@ -107,8 +107,16 @@ export default defineWorkflow({
 // On every successful run — captures the sandbox VM after the last step:
 defineWorkflow({ snapshots: { saveLatest: true }, run: ... });
-// Retain one per step (storage scales with step count):
-defineWorkflow({ snapshots: { saveLatest: true, retainSteps: true }, run: ... });
+// Retain one snapshot per step (storage scales linearly with step count).
+// Most useful with step-form workflows where each `.step(definedStep)`
+// is a distinct engine step; a run-form workflow has only one outer
+// "run" step, so `retainSteps: true` there behaves the same as
+// `saveLatest: true` alone.
+defineWorkflow({ id: "pipeline", input, output, snapshots: { saveLatest: true, retainSteps: true } })
+  .step(fetchStep)
+  .step(scoreStep)
+  .step(pickTopStep)
+  .build();
 ```
 Per-invocation override: