@zibby/cli 0.4.16 → 0.4.18

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (40)
  1. package/dist/bin/zibby.js +2 -2
  2. package/dist/commands/init.js +64 -64
  3. package/dist/commands/workflows/generate.js +108 -108
  4. package/dist/commands/workflows/schedule.js +10 -0
  5. package/dist/commands/workflows/validate-helpers.js +1 -0
  6. package/dist/commands/workflows/validate.js +4 -0
  7. package/dist/package.json +5 -3
  8. package/dist/templates/.claude/CLAUDE.md +425 -0
  9. package/dist/templates/.claude/commands/add-node.md +63 -0
  10. package/dist/templates/.claude/commands/add-skill.md +83 -0
  11. package/dist/templates/.claude/commands/new-workflow.md +61 -0
  12. package/dist/templates/.claude/commands/validate-workflow.md +67 -0
  13. package/package.json +5 -3
  14. package/templates/.claude/CLAUDE.md +425 -0
  15. package/templates/.claude/commands/add-node.md +63 -0
  16. package/templates/.claude/commands/add-skill.md +83 -0
  17. package/templates/.claude/commands/new-workflow.md +61 -0
  18. package/templates/.claude/commands/validate-workflow.md +67 -0
  19. package/templates/zibby-workflow-claude/agents-md-block.md +173 -0
  20. package/templates/zibby-workflow-claude/claude/agents/zibby-test-author.md +87 -0
  21. package/templates/zibby-workflow-claude/claude/agents/zibby-workflow-builder.md +101 -0
  22. package/templates/zibby-workflow-claude/claude/commands/zibby-add-node.md +75 -0
  23. package/templates/zibby-workflow-claude/claude/commands/zibby-debug.md +67 -0
  24. package/templates/zibby-workflow-claude/claude/commands/zibby-delete.md +37 -0
  25. package/templates/zibby-workflow-claude/claude/commands/zibby-deploy.md +87 -0
  26. package/templates/zibby-workflow-claude/claude/commands/zibby-list.md +30 -0
  27. package/templates/zibby-workflow-claude/claude/commands/zibby-memory-cost.md +39 -0
  28. package/templates/zibby-workflow-claude/claude/commands/zibby-memory-pull.md +47 -0
  29. package/templates/zibby-workflow-claude/claude/commands/zibby-memory-remote-use-hosted.md +61 -0
  30. package/templates/zibby-workflow-claude/claude/commands/zibby-memory-stats.md +38 -0
  31. package/templates/zibby-workflow-claude/claude/commands/zibby-static-ip.md +70 -0
  32. package/templates/zibby-workflow-claude/claude/commands/zibby-tail.md +53 -0
  33. package/templates/zibby-workflow-claude/claude/commands/zibby-test-debug.md +59 -0
  34. package/templates/zibby-workflow-claude/claude/commands/zibby-test-generate.md +39 -0
  35. package/templates/zibby-workflow-claude/claude/commands/zibby-test-run.md +49 -0
  36. package/templates/zibby-workflow-claude/claude/commands/zibby-test-write.md +46 -0
  37. package/templates/zibby-workflow-claude/claude/commands/zibby-trigger.md +56 -0
  38. package/templates/zibby-workflow-claude/claude/settings.json +10 -0
  39. package/templates/zibby-workflow-claude/cursor/rules/zibby-workflows.mdc +119 -0
  40. package/templates/zibby-workflow-claude/manifest.json +47 -0
@@ -0,0 +1,63 @@
+ ---
+ description: Add a node to an existing Zibby workflow graph
+ argument-hint: <workflow-name> <node-purpose>
+ ---
+
+ # /add-node
+
+ The user wants to extend an existing workflow with a new node.
+
+ **Arguments:** $ARGUMENTS
+
+ ## Steps
+
+ 1. **Find the workflow** — should be at `.zibby/workflows/<name>/`. If
+ the user didn't specify a name and there's only one workflow, use
+ that. If there are multiple, ask which.
+
+ 2. **Read the existing `graph.mjs`** to understand:
+ - Current node sequence
+ - State shape (what each prior node returns)
+ - Where the new node should slot in (before / after / parallel)
+
+ 3. **Decide LLM vs custom-code:**
+ - Custom-code (`execute`): the work is deterministic — git ops,
+ file IO, HTTP, parsing structured data, math
+ - LLM (`prompt`): the work needs judgement — summarization,
+ classification, generation, planning
+
+ 4. **Create the node file** at `.zibby/workflows/<name>/nodes/<node>.mjs`:
+
+ ```js
+ import { z } from 'zod';
+
+ export const myNode = {
+ name: 'my_node',
+ outputSchema: z.object({ /* what this returns to state */ }),
+ prompt: (state) => `…use state.previousNodeName.field…`,
+ // OR for custom-code:
+ // execute: async (state) => ({ /* match outputSchema */ }),
+ };
+ ```
+
+ 5. **Wire it into `graph.mjs`:**
+ - `import { myNode } from './nodes/my-node.mjs';`
+ - `graph.addNode('my_node', myNode);`
+ - `graph.addEdge('prev_node', 'my_node');`
+ - `graph.addEdge('my_node', 'next_node');` (or 'END')
+
+ 6. **Validate + run** to confirm it integrates:
+ ```bash
+ zibby workflow validate <name>
+ zibby workflow run <name> -p ...
+ ```
+
+ 7. **Report**: what node you added, how state changed, the test result.
+
+ ## Watch for
+
+ - The new node's `prompt` references `state.X.field` — make sure that
+ field exists in the previous node's `outputSchema`. If it doesn't,
+ fix the producer schema, not the consumer prompt.
+ - Don't change other nodes' schemas without telling the user — that's
+ a breaking change to downstream consumers.
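The producer/consumer rule above can be sketched in plain JS (illustrative only — not the Zibby API; the node name `fetch_issue` and its fields are hypothetical). The idea: extract every `state.<node>.<field>` a prompt reads and compare against the fields the producer declares.

```js
// Illustrative stand-in: `outputKeys` plays the role of the fields
// declared in a producer node's zod outputSchema.
const producer = {
  name: 'fetch_issue',
  outputKeys: ['title', 'body'],
};

const consumerPrompt = (state) =>
  `Summarize ${state.fetch_issue.title}: ${state.fetch_issue.labels}`;

// Pull `state.<node>.<field>` references out of the prompt's source text.
function referencedFields(promptFn, nodeName) {
  const re = new RegExp(`state\\.${nodeName}\\.(\\w+)`, 'g');
  return [...String(promptFn).matchAll(re)].map((m) => m[1]);
}

function missingFields(producerNode, promptFn) {
  return referencedFields(promptFn, producerNode.name).filter(
    (f) => !producerNode.outputKeys.includes(f)
  );
}

// `labels` is read by the consumer but never produced — per the rule
// above, fix fetch_issue's schema, not the prompt.
console.log(missingFields(producer, consumerPrompt)); // → [ 'labels' ]
```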
@@ -0,0 +1,83 @@
+ ---
+ description: Add a custom MCP skill to a Zibby workflow
+ argument-hint: <workflow-name> <skill-purpose-or-mcp-server-name>
+ ---
+
+ # /add-skill
+
+ The user wants to add a custom skill (MCP tool bundle) to a workflow.
+
+ **Arguments:** $ARGUMENTS
+
+ ## Steps
+
+ 1. **Identify the MCP server** the skill wraps:
+ - If the user named one (e.g. "slack", "linear", "filesystem"), find
+ the official MCP server. Standard ones live at
+ `@modelcontextprotocol/server-<name>`.
+ - If unsure, ask: "Which MCP server should this skill wrap, or
+ should it be a JS-only middleware?"
+
+ 2. **Find the workflow** at `.zibby/workflows/<name>/`. Create a
+ `skills/` subfolder if it doesn't exist.
+
+ 3. **Write `skills/<id>.mjs`:**
+
+ ```js
+ import { registerSkill } from '@zibby/agent-workflow';
+
+ registerSkill({
+ id: 'slack', // referenced by node `skills: ['slack']`
+ serverName: 'slack-mcp',
+ command: 'npx',
+ args: ['-y', '@modelcontextprotocol/server-slack'],
+ allowedTools: ['mcp__slack__*'], // pattern of tools the agent gets access to
+ envKeys: ['SLACK_BOT_TOKEN'],
+ description: 'Read channels, post messages, search history',
+ });
+ ```
+
+ 4. **Import the skill file from `graph.mjs`** at the TOP, before
+ `new WorkflowGraph()`:
+
+ ```js
+ import './skills/slack.mjs'; // side-effect: registers the skill
+ import { WorkflowGraph } from '@zibby/agent-workflow';
+ // ...
+ ```
+
+ 5. **Opt nodes into the skill:**
+
+ ```js
+ graph.addNode('post_summary', {
+ ...postSummaryNode,
+ skills: ['slack'], // ← agent gets slack tools here
+ });
+ ```
+
+ 6. **Document the env requirement.** Add to the workflow's README or
+ tell the user which env var they need to set:
+ - Locally: `export SLACK_BOT_TOKEN=xoxb-...` or put in `.env`
+ - Cloud: `zibby workflow env set <workflow> SLACK_BOT_TOKEN=...`
+
+ 7. **Validate + test:**
+ ```bash
+ zibby workflow validate <name>
+ zibby workflow run <name> -p ...
+ ```
+
+ The agent should now have access to `mcp__slack__*` tools in the
+ nodes that opted in.
+
+ ## When NOT to use a custom skill
+
+ - If the work can be done with plain Node.js (HTTP call, file write,
+ git command) — use a custom-code node with `execute()` instead. MCP
+ skills are for tool surfaces the agent decides to use, not for
+ deterministic glue.
+
+ ## When to use `middleware` instead of MCP
+
+ If you don't have an MCP server but want to attach a JS helper that
+ nodes can use, see the "Custom skill via a non-MCP function" section
+ of `.claude/CLAUDE.md`. This is rare — prefer MCP when one exists.
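As a sketch of the "deterministic glue" alternative mentioned above — a custom-code node doing plain Node.js work with no MCP skill at all. The node name, upstream state, and payload here are hypothetical, and zod is stubbed with a comment so the snippet runs standalone:

```js
// Hypothetical custom-code node: parse structured data deterministically.
// No agent call, no skills — just `execute`.
const parseRelease = {
  name: 'parse_release',
  // outputSchema: z.object({ version: z.string(), assets: z.number() })
  execute: async (state) => {
    const release = JSON.parse(state.fetch_release.raw);
    // The returned object must match the declared outputSchema.
    return { version: release.tag_name, assets: release.assets.length };
  },
};

// Simulate the state an upstream node would have produced:
parseRelease
  .execute({ fetch_release: { raw: '{"tag_name":"v1.2.0","assets":[{},{}]}' } })
  .then((out) => console.log(out)); // → { version: 'v1.2.0', assets: 2 }
```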
@@ -0,0 +1,61 @@
+ ---
+ description: Scaffold a new Zibby workflow from a natural-language description
+ argument-hint: <description of what the workflow should do>
+ ---
+
+ # /new-workflow
+
+ You're about to create a new Zibby workflow. The user's request is:
+
+ **$ARGUMENTS**
+
+ ## Steps
+
+ 1. **Sketch the graph.** Based on the user's request, decide:
+ - How many nodes? (Typically 2-5. More than 7 is usually a sign of
+ overdesign — collapse adjacent nodes.)
+ - Which nodes need an LLM (judgement, generation) vs custom-code
+ (deterministic: git ops, HTTP, file IO)?
+ - What's the linear sequence vs conditional branching?
+ - What's the final output the user cares about?
+
+ 2. **Pick a workflow name** — kebab-case, ≤24 chars, descriptive.
+ Examples: `code-review`, `pr-summary`, `nightly-changelog`.
+
+ 3. **Run the scaffold:**
+ ```bash
+ zibby workflow new <name>
+ ```
+ This creates `.zibby/workflows/<name>/` with starter files.
+
+ 4. **Edit the files** in this order (read CLAUDE.md §1 if you've forgotten the shapes):
+ - `workflow.json` — set `name`, `description`, `defaultAgent`
+ - `nodes/*.mjs` — one file per node, each with `name`, `outputSchema`
+ (Zod), and either `prompt` (LLM) or `execute` (custom-code)
+ - `graph.mjs` — wire them up with `addNode` + `addEdge` + `setEntryPoint`
+
+ 5. **Validate** (this catches 80% of mistakes before running anything):
+ ```bash
+ zibby workflow validate <name>
+ ```
+ Fix any reported issues.
+
+ 6. **Test locally** with a realistic input:
+ ```bash
+ zibby workflow run <name> -p <key>=<value>
+ ```
+ Watch the timeline. If a node fails, the `raw` field shows what the
+ agent actually returned vs what the schema expected.
+
+ 7. **Report back to the user** with:
+ - The workflow path
+ - The local test result
+ - The exact `zibby workflow run` command they can use
+ - Ask if they want to deploy
+
+ ## DO NOT
+
+ - Don't deploy without asking (`zibby workflow deploy` has cost)
+ - Don't use `state.set()` / `state.get()` inside `execute()` — just `return`
+ - Don't skip `zibby workflow validate` — it catches schema typos fast
+ - Don't add nodes the request didn't ask for
+
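The `addNode` + `addEdge` + `setEntryPoint` wiring above can be illustrated with a toy executor (NOT the `@zibby/agent-workflow` API — a minimal stand-in, with hypothetical node names) showing how each node's return value lands in state under the node's name:

```js
// Toy linear graph runner to illustrate the wiring semantics only.
class ToyGraph {
  constructor() { this.nodes = {}; this.edges = {}; this.entry = null; }
  addNode(name, node) { this.nodes[name] = node; }
  addEdge(from, to) { this.edges[from] = to; }
  setEntryPoint(name) { this.entry = name; }
  async run(input) {
    const state = { ...input };
    // Follow edges from the entry point until 'END'.
    for (let cur = this.entry; cur && cur !== 'END'; cur = this.edges[cur]) {
      state[cur] = await this.nodes[cur].execute(state);
    }
    return state;
  }
}

const graph = new ToyGraph();
graph.addNode('double', { execute: async (s) => ({ value: s.n * 2 }) });
graph.addNode('describe', {
  execute: async (s) => ({ text: `got ${s.double.value}` }), // reads upstream output
});
graph.setEntryPoint('double');
graph.addEdge('double', 'describe');
graph.addEdge('describe', 'END');

graph.run({ n: 21 }).then((state) => console.log(state.describe.text)); // → got 42
```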
@@ -0,0 +1,67 @@
+ ---
+ description: Statically validate a Zibby workflow + run it locally with sample input
+ argument-hint: <workflow-name> [optional input as key=value pairs]
+ ---
+
+ # /validate-workflow
+
+ The user wants to verify a workflow works before deploying.
+
+ **Arguments:** $ARGUMENTS
+
+ ## Steps
+
+ 1. **Static validation first** (fast — does NOT call any LLM):
+
+ ```bash
+ zibby workflow validate <name>
+ ```
+
+ Checks:
+ - Graph topology (entry point set, edges reach END, no orphan nodes)
+ - Every node has `outputSchema`
+ - Every `skills: ['x']` reference is registered
+ - Zod schemas parse cleanly
+
+ If this fails, **fix the reported issues before running anything
+ else**. Validation errors mean the workflow can't possibly work.
+
+ 2. **Local dry-run** with realistic input:
+
+ ```bash
+ zibby workflow run <name> -p key1=value1 -p key2=value2
+ ```
+
+ Watch the timeline (`┌ nodeName … └ done`). Each node should:
+ - Show timing under ~30s for LLM nodes, <1s for custom-code
+ - Print its output
+ - Hand off to the next node
+
+ 3. **If a node fails:**
+ - Read the `raw` field in its output — that's what the agent
+ actually returned
+ - Compare to the `outputSchema` — what didn't match?
+ - Fix the prompt (be more specific about the output shape) OR
+ relax the schema (some fields optional). Prefer fixing prompts.
+
+ 4. **If the whole graph fails:**
+ - Check `state` shape — is the input you provided in the right
+ place? Top-level keys, not nested under `input`.
+ - Check the entry point — `graph.setEntryPoint('first_node')`.
+
+ 5. **Report back:**
+ - Validation result (pass / fail + what)
+ - Local run result (pass / fail + which node)
+ - If failed: a one-line diagnosis + a proposed fix
+ - If passed: the exact command the user can use to deploy
+
+ ## DO
+
+ - Run `validate` before `run` before `deploy`. Cost increases 10× at each step.
+ - Use realistic inputs (`-p`) — defaults are usually placeholders.
+
+ ## DON'T
+
+ - Don't deploy a workflow that hasn't passed local run.
+ - Don't suppress / ignore Zod errors — they're telling you the agent
+ produced something the next node won't accept.
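The topology checks listed in step 1 can be sketched as a reachability walk (illustrative JS, not the CLI's actual implementation; the graph shape assumes the linear edge map used elsewhere in these templates):

```js
// Sketch: static topology checks — entry point set, edges reach END,
// no orphan nodes. `edges` maps each node name to its successor.
function checkTopology(entry, edges, nodeNames) {
  const problems = [];
  if (!entry) problems.push('no entry point set');
  // Walk from the entry; stop at END, a dead end, or a cycle.
  const seen = new Set();
  let cur = entry;
  while (cur && cur !== 'END' && !seen.has(cur)) {
    seen.add(cur);
    cur = edges[cur];
  }
  if (cur !== 'END') problems.push('edges never reach END');
  for (const name of nodeNames) {
    if (!seen.has(name)) problems.push(`orphan node: ${name}`);
  }
  return problems;
}

// A graph missing its final edge to END, with one unwired node:
console.log(checkTopology('a', { a: 'b' }, ['a', 'b', 'c']));
// → [ 'edges never reach END', 'orphan node: c' ]
```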
@@ -0,0 +1,173 @@
+ <!-- BEGIN zibby-workflows zibby-template-version: 4 -->
+ ## Zibby
+
+ This project uses **Zibby** — there are two surfaces:
+
+ 1. **Workflows** — graphs of AI-agent-driven steps that run inside an ECS Fargate sandbox in Zibby Cloud. Used for automation that needs an LLM in the loop (analyze tickets, draft replies, write code, etc.).
+
+ 2. **Tests** — plain-language `.txt` specs that Zibby's runner converts to Playwright executions. Produces video + JSON results. Used for end-to-end UI testing where specs survive UI churn better than raw selector-based tests.
+
+ Both share `.zibby.config.mjs` at the project root.
+
+ ---
+
+ ### Workflows
+
+ Files:
+ ```
+ <paths.workflows or .zibby/workflows>/<name>/
+ ├── workflow.json name, entryClass, triggers, schemas (manifest)
+ ├── graph.mjs nodes + edges from START to END
+ ├── nodes/
+ │ ├── index.mjs barrel export
+ │ └── *.mjs one node per file: { id, description, run(ctx) }
+ └── package.json deps; bundled at deploy time
+ ```
+
+ Each node has `async run(ctx)` where `ctx` provides:
+ - `ctx.input` — outputs from upstream nodes
+ - `ctx.agent({ prompt, schema })` — call the configured LLM with structured output
+ - `ctx.shell(cmd)` — run shell in the sandbox (egress proxy enabled)
+ - `ctx.log(...)` — emit a log line (visible via `zibby workflow logs`)
+
+ Common dev loop:
+ ```
+ zibby workflow new <name> # scaffold
+ zibby workflow run <name> # one-shot local run (preferred for the dev loop)
+ zibby workflow run <name> -p k=v # with input
+ zibby workflow deploy <name> # build + push to Zibby Cloud
+ zibby workflow trigger <uuid> # invoke the cloud workflow
+ zibby workflow logs <uuid> -t # tail live logs (docker-compose-style)
+ zibby workflow list # find UUIDs and statuses (local + cloud)
+ zibby workflow download <uuid> # pull the cloud workflow source back to .zibby/workflows/
+ zibby workflow delete <uuid> # remove a deployed workflow
+ ```
+
+ **`run` vs `start`.** `workflow run` is the one-shot CLI iteration command — load the graph, execute it once, print the result, exit. That's the right primitive for the dev loop and for CI/CD. `workflow start` is a *long-lived* local dev server (default port 3848) used by Studio for replay/debug; for plain CLI iteration always prefer `run`.
+
+ `run` and `trigger` accept the same input flag surface — flip the verb to switch between local and cloud:
+ - `-p key=value` (repeatable) — highest precedence
+ - `--input '<json>'` — JSON string
+ - `--input-file path.json` — JSON file, lowest precedence
+
+ Static outbound IPs (for customers behind firewalls): see `--dedicated-ip` flag on `deploy`.
+
+ #### Per-workflow env vars
+
+ Each deployed workflow has its own encrypted env-var bag (KMS-backed). Vars get injected into the Fargate task at trigger time, and **workflow env wins over project secrets on conflict**. Use this for per-pipeline credentials (different `ANTHROPIC_API_KEY` per workflow, a workflow-only `DATABASE_URL`, etc.).
+
+ ```
+ zibby workflow env list <uuid> # show key names (values never returned)
+ zibby workflow env set <uuid> ANTHROPIC_API_KEY=sk-… # add or rotate one key
+ zibby workflow env unset <uuid> OLD_KEY # remove one key
+ zibby workflow env push <uuid> --file .env [--file .env.prod] # bulk replace from .env files
+ ```
+
+ Fast path on first deploy — sync a `.env` in one shot:
+ ```
+ zibby workflow deploy my-pipeline --env .env [--env .env.prod]
+ ```
+ The CLI deploys, then runs `push` against the freshly-minted UUID.
+
+ ---
+
+ ### Tests
+
+ Files:
+ ```
+ test-specs/ source `.txt` specs (paths.specs)
+ tests/ generated `.spec.js` (paths.generated; regenerated each run)
+ test-results/ videos, traces, JSON results per run
+ .zibby/memory/.dolt/ local test memory DB (selectors, page model, history)
+ playwright.config.js
+ ```
+
+ A spec is plain-language imperative English describing what to test. Zibby's runner reads the spec, drives the browser via MCP, generates Playwright, and produces a video.
+
+ Common dev loop:
+ ```
+ zibby test test-specs/<name>.txt # run a spec
+ zibby test "go to example.com and ..." # inline, no file
+ zibby test <spec> --agent claude # override the configured agent (claude|cursor|codex|gemini)
+ zibby test --sources <ids> --execution <id> # cloud test cases (run from a stored execution)
+ zibby generate -t ENG-1234 # generate specs from a Jira ticket
+ zibby video # organize videos next to spec files
+ zibby upload <spec-path> # upload existing artifacts to cloud
+ ```
+
+ When debugging a failed test, watch the video at `test-results/<spec>/video.webm` — that's almost always faster than reading logs.
+
+ #### Test memory
+
+ `.zibby/memory/.dolt/` is a **local-first Dolt SQL database** (Git-for-data) that learns from every test run — selectors that worked, page-element fingerprints, navigation transitions, timing quirks, recorded insights. The runner auto-pulls before a run and auto-pushes after a passing run. Keying is **per-domain** (not per-spec), so any spec that hits `myapp.com` benefits from selectors learned by every other spec on the same domain.
+
+ When `zibby test` runs and `.zibby/memory/.dolt/` exists, the agent gets 5 MCP tools auto-exposed:
+
+ - `memory_get_test_history` — query recent runs (pass/fail/timing); filter by spec-path substring
+ - `memory_get_selectors` — query known selectors per page with stability metrics (success/fail counts)
+ - `memory_get_page_model` — query page structure (elements, roles, accessible names, best selector)
+ - `memory_get_navigation` — known page-to-page transitions (what click/submit produced what URL)
+ - `memory_save_insight` — save an observation. Categories: `selector_tip | timing | navigation | workaround | flaky | general`
+
+ > **AFTER completing the test, you MUST call `memory_save_insight` at least once.** Save any useful finding: reliable selectors, timing quirks, navigation patterns, workarounds. Be specific — future runs will read your insights. (Lifted from the memory skill's `promptFragment`.)
+
+ Local CLI:
+ ```
+ zibby memory stats # row counts, last commit, per-spec breakdown
+ zibby memory cost # real LLM token spend per spec / per domain
+ zibby memory compact # prune old runs + Dolt GC (--max-runs 50, --max-age 90d)
+ zibby memory reset -f # wipe the DB
+ ```
+
+ **Team sync.** Memory is local-first; opt into a shared remote so teammates' learnings flow back to you:
+
+ ```
+ zibby memory remote add aws://my-bucket/team/proj/main # BYO S3 / GCS / DoltHub / file:///
+ zibby memory remote use --hosted # OR: Zibby-managed S3 (signed-in only)
+ zibby memory pull # manual override (auto on test start)
+ zibby memory push # manual override (auto on passing test)
+ ```
+
+ Set `memorySync.remote` in `.zibby.config.mjs` (`'hosted'` or an `aws://...` URL) and `zibby init` auto-wires the remote — teammates clone the repo, run `zibby init`, and they're plugged into the same memory.
+
+ ---
+
+ ### How to invoke the CLI
+
+ The `zibby` command might be on PATH (if installed globally via npm) OR not — depending on the user's setup. **If `zibby` returns "command not found", fall back to `./.zibby/bin/zibby`** — a project-local shim auto-generated by the scaffolder that routes to whichever CLI binary the user has. Always exists in this project.
+
+ ```
+ # Try first:
+ zibby workflow list
+
+ # If "command not found":
+ ./.zibby/bin/zibby workflow list
+ ```
+
+ Don't waste time on `npx @zibby/cli` — not always published.
+
+ ---
+
+ ### Reference (always prefer canonical docs over these notes)
+
+ **Workflows**
+ - Concepts: https://docs.zibby.app/workflows
+ - Node SDK (ctx.*): https://docs.zibby.app/workflows/sdk
+ - Deploying & bundling: https://docs.zibby.app/workflows/deploying
+ - Triggering & inputs: https://docs.zibby.app/workflows/triggers
+ - Live log streaming: https://docs.zibby.app/workflows/logs
+ - Per-workflow env vars: https://docs.zibby.app/cloud/env-vars
+ - Egress proxy / static IPs: https://docs.zibby.app/workflows/egress
+ - Security & secrets: https://docs.zibby.app/workflows/security
+ - Debugging: https://docs.zibby.app/workflows/debugging
+
+ **Tests**
+ - Spec format: https://docs.zibby.app/tests/specs
+ - Running (`zibby test`): https://docs.zibby.app/tests/running
+ - Generating from Jira: https://docs.zibby.app/tests/generating
+ - Test memory: https://docs.zibby.app/tests/memory
+ - Debugging: https://docs.zibby.app/tests/debugging
+ - MCP browser config: https://docs.zibby.app/tests/playwright-mcp
+
+ When in doubt about behavior, fetch the docs URL — these notes are a snapshot, the docs are kept current.
+ <!-- END zibby-workflows -->
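The `-p` / `--input` / `--input-file` precedence described above can be sketched as a shallow merge, lowest precedence first (illustrative JS, not the CLI source; the flag values are hypothetical):

```js
// Sketch: later spreads win, so -p pairs override --input, which
// overrides --input-file.
function resolveInput({ inputFile = {}, inputJson = {}, pairs = {} }) {
  return { ...inputFile, ...inputJson, ...pairs };
}

console.log(
  resolveInput({
    inputFile: { env: 'dev', region: 'us-east-1' }, // --input-file (lowest)
    inputJson: { env: 'staging' },                  // --input
    pairs: { env: 'prod' },                         // -p (highest)
  })
); // → { env: 'prod', region: 'us-east-1' }
```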
@@ -0,0 +1,87 @@
+ <!-- zibby-template-version: 4 -->
+ ---
+ name: zibby-test-author
+ description: Sub-agent that helps the user design and author Zibby test specs end-to-end. Invoke when the user says "help me write a test for X", "I need to test this flow", or asks for guidance on what to put in a spec.
+ ---
+
+ You are an expert at authoring Zibby test specs and running them. The user has invoked you because they want guidance on testing a feature or flow.
+
+ ## What you know
+
+ A **Zibby test spec** is a plain-language `.txt` file that Zibby's runner converts to a Playwright execution at runtime. The runner's AI agent (configured per-project in `.zibby.config.mjs`) reads the spec, navigates the browser via MCP, generates a Playwright script, and produces a video + JSON results.
+
+ It's the right tool when:
+ - The user wants tests that survive UI churn (specs are higher-level than CSS selectors)
+ - They have non-engineers writing test descriptions
+ - They want test memory across runs (Dolt-backed, so the agent learns the app over time)
+
+ It's NOT the right tool when:
+ - The user wants 1000s of micro-tests in a tight CI loop (Zibby runs are LLM-mediated; slower than raw Playwright)
+ - They have a fully-deterministic API testing need (use plain `pytest` or similar)
+
+ ## Spec layout
+
+ ```
+ <workflowsBasePath if any>/...
+ ├── .zibby.config.mjs
+ ├── test-specs/ ← spec source (paths.specs)
+ │ ├── login-happy-path.txt
+ │ ├── checkout-flow.txt
+ │ └── ...
+ ├── tests/ ← Generated Playwright (paths.generated)
+ │ └── *.spec.js ← regenerated each run by default
+ ├── test-results/ ← Videos, traces, JSON results per run
+ └── playwright.config.js
+ ```
+
+ A spec is unambiguous English with one action per line. See `/zibby-test-write` for the format.
+
+ ## Your job in this conversation
+
+ 1. **Listen for the goal.** What user-facing behavior is being tested? What's the success criterion? Be skeptical of vague specs.
+
+ 2. **Decompose into one user goal per spec.** Don't write a spec that does login + signup + checkout + admin in one file — that's four specs. Smaller specs = easier to debug, easier to localize regressions.
+
+ 3. **Write the spec(s)** to `test-specs/<kebab-name>.txt` — concrete, one action per line, stable selectors (visible text, ARIA labels, not CSS classes).
+
+ 4. **Run iteratively.** Author → run → watch the video → tighten ambiguous lines → re-run. Encourage:
+ ```
+ zibby test test-specs/<name>.txt # run it
+ open test-results/<name>/video.webm # watch what the agent did
+ ```
+ When the run fails, the video usually pinpoints the issue in 30 seconds.
+
+ 5. **Stop when the spec exercises the goal end-to-end.** Don't pile on "while we're at it" verifications — they bloat runtime and make failures harder to attribute.
+
+ ## Test memory (`.zibby/memory/.dolt/`)
+
+ When `zibby test` runs and `.zibby/memory/.dolt/` exists (initialized by `zibby memory init` or auto-created on first run with `-m` / a `memorySync.remote` config), the agent gets 5 MCP tools auto-exposed. They read from a local-first Dolt SQL DB that learns selectors, page model, navigation, and history **per-domain** across every spec hitting the same site:
+
+ - `memory_get_test_history` — recent runs (filter by spec-path substring) — pass/fail and timing
+ - `memory_get_selectors` — known selectors per page with stability metrics (success/fail counts)
+ - `memory_get_page_model` — page elements, ARIA roles, accessible names, best-known selector
+ - `memory_get_navigation` — known page-to-page transitions (what click/submit produced what URL)
+ - `memory_save_insight` — save observations: `selector_tip | timing | navigation | workaround | flaky | general`
+
+ > **Hard rule: after every test run, the agent MUST call `memory_save_insight` at least once.** Save reliable selectors, timing quirks, navigation patterns, workarounds — be specific. Future runs read these. (This is in the memory skill's prompt fragment; surface it to the user if they ask why their tests keep getting smarter.)
+
+ Team sync (optional): a project may have `memorySync.remote: 'hosted'` (Zibby-managed S3, signed-in only) or `'aws://...' / 'gs://...'` (BYO) configured in `.zibby.config.mjs`. If set, the runner auto-pulls before each run and auto-pushes after passing runs. Manual override: `zibby memory pull` / `zibby memory push`.
+
+ ## Hard rules
+
+ - **Never recommend `--headless` for first runs.** Watching the browser is the primary debugging tool when authoring; headless hides everything.
+ - **Never recommend disabling video.** Videos are 99% of post-mortem signal; they're cheap.
+ - **Don't write CSS selectors into specs.** Use what a human user would describe — visible text, role labels, the field's placeholder. Selectors belong in generated `.spec.js`, not the source.
+ - **Don't suggest `npx playwright test` directly** to bypass Zibby for "speed". They lose the agent + memory; only suggest if the user explicitly wants raw Playwright.
+ - **Always call `memory_save_insight` at the end of a test run.** This is non-negotiable — without it, memory degrades to the seeded baseline and stops compounding.
+
+ ## Reference
+
+ - Spec format and conventions: https://docs.zibby.app/tests/specs
+ - Running specs (`zibby test`): https://docs.zibby.app/tests/running
+ - Generating specs from a Jira ticket: https://docs.zibby.app/tests/generating
+ - Test memory (Dolt-backed): https://docs.zibby.app/tests/memory
+ - Debugging failures: https://docs.zibby.app/tests/debugging
+ - MCP browser config: https://docs.zibby.app/tests/playwright-mcp
+
+ When in doubt about behavior, fetch the docs URL — these are kept current; this prompt is a snapshot.
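A hypothetical spec in the style described above — one action per line, visible text and field labels rather than CSS selectors (the app, URL, and credentials here are invented for illustration; consult the spec-format docs for exact conventions):

```
Go to https://myapp.example.com/login
Type "demo@example.com" into the Email field
Type "correct-horse" into the Password field
Click the "Sign in" button
Verify the page shows "Welcome back"
```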
@@ -0,0 +1,101 @@
+ <!-- zibby-template-version: 4 -->
+ ---
+ name: zibby-workflow-builder
+ description: Sub-agent that walks the user through building, testing, and deploying a Zibby agent workflow end-to-end. Use it when the user says "help me build a workflow that does X" or asks broad architectural questions about a workflow they're starting.
+ ---
+
+ You are an expert at building Zibby agent workflows. The user has invoked you because they want guidance on designing or implementing a workflow.
+
+ ## What you know
+
+ A **Zibby workflow** is a graph of AI-agent-driven steps that run inside an ECS Fargate sandbox. It's the right tool when the user wants to:
+ - Automate something that requires an LLM in the loop (analyze, summarize, decide, draft, write code)
+ - Combine LLM steps with deterministic shell or HTTP work
+ - Run reliably in the cloud, with retries, audit logs, and IP-allowlistable egress
+
+ It's NOT the right tool when the user wants:
+ - Pure deterministic data transformation (use a Lambda)
+ - Real-time interactive UI work (LLM calls are too slow for sub-second response)
+ - One-off scripts (just run them locally)
+
+ ## Anatomy of a workflow
+
+ ```
+ <workflowsBasePath>/<workflow-name>/
+ ├── workflow.json # name, entryClass, triggers, optional input/output schemas
+ ├── graph.mjs # exports the workflow graph (nodes + edges)
+ ├── nodes/
+ │ ├── index.mjs # registry of all nodes
+ │ ├── example.mjs # one node = one .mjs file
+ │ └── <your-nodes>.mjs
+ └── package.json # deps; bundled at deploy time
+ ```
+
+ Each **node** has a `run(ctx)` method. `ctx` provides:
+ - `ctx.input` — outputs from upstream nodes (and the trigger's input)
+ - `ctx.agent({ prompt, schema })` — call the configured LLM with structured output
+ - `ctx.shell(command)` — run shell in the sandbox (egress proxy is on, see docs.zibby.app)
+ - `ctx.log(...)` — emit a log line that shows up in `-t`
+
+ The return value of `run()` is the node's output, available to downstream nodes via `ctx.input.<this-node-id>`.
+
+ ## Your job in this conversation
+
+ 1. **Listen for the goal.** Ask clarifying questions until you understand what the user wants the workflow to DO from input to output. Be skeptical of vague specs.
+
+ 2. **Decompose into nodes.** Each node should have ONE clear responsibility. If a step is "fetch data, analyze it, draft a reply, send the reply" — that's 3-4 nodes, not one. Smaller nodes = easier to retry, replace, debug.
+
+ 3. **Sketch the graph.** Tell the user the node list and the edges. Confirm before generating code.
+
+ 4. **Generate the scaffold** if they don't have one yet:
+ ```
+ zibby workflow new <slug>
+ ```
+ Then add nodes one at a time using the `/zibby-add-node` command.
+
+ 5. **Run iteratively.** Encourage the loop:
+ ```
+ zibby workflow run <slug> # one-shot local run (mirrors trigger flags)
+ # ... iterate ...
+ zibby workflow deploy <slug> # when ready
+ zibby workflow trigger <uuid> # cloud test
+ zibby workflow logs <uuid> -t # watch
+ ```
+
+ 6. **Stop when the workflow does the goal end-to-end.** Don't pile on speculative nodes.
+
+ ## Per-workflow env vars
+
+ Each deployed workflow has its own encrypted env-var bag (KMS-backed). Workflow env wins over project secrets on conflict.
+
+ - `zibby workflow env list <uuid>` — show key names (values never returned)
+ - `zibby workflow env set <uuid> ANTHROPIC_API_KEY=sk-…` — add or rotate one key
+ - `zibby workflow env unset <uuid> OLD_KEY` — remove one key
+ - `zibby workflow env push <uuid> --file .env [--file .env.prod]` — bulk replace from .env files (later files override)
+ - `zibby workflow deploy <slug> --env .env` — fast path: deploy + auto-`push` of .env to the new UUID
+
+ Use this for credentials specific to one workflow (per-pipeline `ANTHROPIC_API_KEY`, a workflow-only `DATABASE_URL`, an external webhook secret). Project-wide secrets stay on the project record.
+
+ ## Pulling a deployed workflow back to local
+
+ ```
+ zibby workflow download <uuid>
+ ```
+
+ Pulls the cloud workflow's source back into `.zibby/workflows/<name>/`. Useful when collaborators need the source from cloud (e.g. you deployed from one machine, the user wants to iterate on another), or when reverting after a local mistake. UUIDs come from `zibby workflow list`.
+
+ ## Hard rules
+
+ - **Never recommend `--force` flags or skipping checks** to make a deploy go faster. Build problems are signal.
+ - **Never write API keys / secrets into workflow source.** Use the project's secret store (configured in `.zibby.config.mjs` or via the cloud UI).
+ - **Don't tell the user to manually edit `bundleS3Key` or other CFN-managed fields in DynamoDB.** These get overwritten on next deploy.
+ - **If a node uses external APIs, mention the egress proxy** (`http://<egress-ip>:3128` is set in `HTTP_PROXY` env at runtime) and the customer-IP-allowlist story.
+
+ ## Reference
+
+ - Concepts and node API: https://docs.zibby.app/workflows/concepts
+ - Node SDK (ctx.agent, ctx.shell, ctx.log): https://docs.zibby.app/workflows/sdk
+ - Triggers and inputs: https://docs.zibby.app/workflows/triggers
+ - Egress and security: https://docs.zibby.app/workflows/egress
+
+ When in doubt about API surface or recent changes, **fetch the docs URL** for current info — these docs are the canonical reference and are updated more often than your training data.
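The `run(ctx)` shape and the `ctx.input.<node-id>` flow described above can be sketched with a stubbed `ctx` (the real one comes from the Zibby runtime; the node id, prompt, and stub agent here are all illustrative):

```js
// Hypothetical node in the { id, description, run(ctx) } shape.
const summarizeNode = {
  id: 'summarize',
  description: 'Summarize the fetched ticket',
  async run(ctx) {
    ctx.log('summarizing', ctx.input.fetch.title);
    // In the real runtime, ctx.agent calls the configured LLM with
    // structured output; here it is stubbed below.
    const result = await ctx.agent({
      prompt: `Summarize: ${ctx.input.fetch.title}`,
      schema: { summary: 'string' }, // placeholder, not a real schema object
    });
    return result; // becomes ctx.input.summarize for downstream nodes
  },
};

// Stubbed runtime: upstream output sits under ctx.input.<node-id>,
// and the "LLM" just echoes part of the prompt.
const ctx = {
  input: { fetch: { title: 'Login button unresponsive' } },
  log: () => {},
  agent: async ({ prompt }) => ({ summary: prompt.slice(0, 10) }),
};

summarizeNode.run(ctx).then((out) => console.log(out.summary)); // → Summarize:
```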