@zibby/cli 0.4.14 → 0.4.17
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/bin/zibby.js +2 -2
- package/dist/commands/init.js +64 -64
- package/dist/commands/workflows/generate.js +108 -108
- package/dist/commands/workflows/logs.js +16 -16
- package/dist/commands/workflows/run.js +7 -7
- package/dist/commands/workflows/schedule.js +10 -0
- package/dist/package.json +4 -2
- package/dist/templates/.claude/CLAUDE.md +425 -0
- package/dist/templates/.claude/commands/add-node.md +63 -0
- package/dist/templates/.claude/commands/add-skill.md +83 -0
- package/dist/templates/.claude/commands/new-workflow.md +61 -0
- package/dist/templates/.claude/commands/validate-workflow.md +67 -0
- package/dist/utils/session-uploader.js +1 -1
- package/package.json +4 -2
- package/templates/.claude/CLAUDE.md +425 -0
- package/templates/.claude/commands/add-node.md +63 -0
- package/templates/.claude/commands/add-skill.md +83 -0
- package/templates/.claude/commands/new-workflow.md +61 -0
- package/templates/.claude/commands/validate-workflow.md +67 -0
- package/templates/zibby-workflow-claude/agents-md-block.md +173 -0
- package/templates/zibby-workflow-claude/claude/agents/zibby-test-author.md +87 -0
- package/templates/zibby-workflow-claude/claude/agents/zibby-workflow-builder.md +101 -0
- package/templates/zibby-workflow-claude/claude/commands/zibby-add-node.md +75 -0
- package/templates/zibby-workflow-claude/claude/commands/zibby-debug.md +67 -0
- package/templates/zibby-workflow-claude/claude/commands/zibby-delete.md +37 -0
- package/templates/zibby-workflow-claude/claude/commands/zibby-deploy.md +87 -0
- package/templates/zibby-workflow-claude/claude/commands/zibby-list.md +30 -0
- package/templates/zibby-workflow-claude/claude/commands/zibby-memory-cost.md +39 -0
- package/templates/zibby-workflow-claude/claude/commands/zibby-memory-pull.md +47 -0
- package/templates/zibby-workflow-claude/claude/commands/zibby-memory-remote-use-hosted.md +61 -0
- package/templates/zibby-workflow-claude/claude/commands/zibby-memory-stats.md +38 -0
- package/templates/zibby-workflow-claude/claude/commands/zibby-static-ip.md +70 -0
- package/templates/zibby-workflow-claude/claude/commands/zibby-tail.md +53 -0
- package/templates/zibby-workflow-claude/claude/commands/zibby-test-debug.md +59 -0
- package/templates/zibby-workflow-claude/claude/commands/zibby-test-generate.md +39 -0
- package/templates/zibby-workflow-claude/claude/commands/zibby-test-run.md +49 -0
- package/templates/zibby-workflow-claude/claude/commands/zibby-test-write.md +46 -0
- package/templates/zibby-workflow-claude/claude/commands/zibby-trigger.md +56 -0
- package/templates/zibby-workflow-claude/claude/settings.json +10 -0
- package/templates/zibby-workflow-claude/cursor/rules/zibby-workflows.mdc +119 -0
- package/templates/zibby-workflow-claude/manifest.json +47 -0
@@ -0,0 +1,67 @@
---
description: Statically validate a Zibby workflow + run it locally with sample input
argument-hint: <workflow-name> [optional input as key=value pairs]
---

# /validate-workflow

The user wants to verify a workflow works before deploying.

**Arguments:** $ARGUMENTS

## Steps

1. **Static validation first** (fast — does NOT call any LLM):

   ```bash
   zibby workflow validate <name>
   ```

   Checks:
   - Graph topology (entry point set, edges reach END, no orphan nodes)
   - Every node has `outputSchema`
   - Every `skills: ['x']` reference is registered
   - Zod schemas parse cleanly

   If this fails, **fix the reported issues before running anything else**. Validation errors mean the workflow can't possibly work.

2. **Local dry-run** with realistic input:

   ```bash
   zibby workflow run <name> -p key1=value1 -p key2=value2
   ```

   Watch the timeline (`┌ nodeName … └ done`). Each node should:
   - Show timing under ~30s for LLM nodes, <1s for custom-code
   - Print its output
   - Hand off to the next node

3. **If a node fails:**
   - Read the `raw` field in its output — that's what the agent actually returned
   - Compare it to the `outputSchema` — what didn't match?
   - Fix the prompt (be more specific about the output shape) OR relax the schema (make some fields optional). Prefer fixing prompts.

4. **If the whole graph fails:**
   - Check the `state` shape — is the input you provided in the right place? Top-level keys, not nested under `input`.
   - Check the entry point — `graph.setEntryPoint('first_node')`.

5. **Report back:**
   - Validation result (pass / fail + what)
   - Local run result (pass / fail + which node)
   - If failed: a one-line diagnosis + a proposed fix
   - If passed: the exact command the user can use to deploy

## DO

- Run validate before run before deploy. Cost increases 10× at each step.
- Use realistic inputs (`-p`) — defaults are usually placeholders.

## DON'T

- Don't deploy a workflow that hasn't passed local run.
- Don't suppress / ignore Zod errors — they're telling you the agent produced something the next node won't accept.
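The topology portion of the static check is easy to picture in code. The sketch below is illustrative only: the `{ nodes, edges, entryPoint }` shape and the `validateTopology` helper are invented for this example, not Zibby's actual data model. It shows mechanically what "entry point set, edges reach END, no orphan nodes" means:

```javascript
// Minimal sketch of static topology checks (hypothetical graph shape,
// not Zibby's implementation): entry point set, END reachable, no orphans.
const END = '__END__';

function validateTopology({ nodes, edges, entryPoint }) {
  const errors = [];
  if (!entryPoint) errors.push('no entry point set');

  // Walk forward from the entry point, collecting every reachable node id.
  const reachable = new Set();
  const stack = entryPoint ? [entryPoint] : [];
  while (stack.length) {
    const id = stack.pop();
    if (reachable.has(id)) continue;
    reachable.add(id);
    for (const [from, to] of edges) if (from === id) stack.push(to);
  }

  if (!reachable.has(END)) errors.push('no path from entry point to END');
  for (const id of nodes)
    if (!reachable.has(id)) errors.push(`orphan node: ${id}`);
  return errors;
}

const good = {
  nodes: ['fetch', 'analyze'],
  edges: [['fetch', 'analyze'], ['analyze', END]],
  entryPoint: 'fetch',
};
console.log(validateTopology(good));                                 // []
console.log(validateTopology({ ...good, edges: [['fetch', END]] })); // [ 'orphan node: analyze' ]
```

A real validator also needs the workflow's own modules loaded to check schema parsing and skill registration; those steps don't reduce to pure graph arithmetic.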
@@ -0,0 +1,173 @@
<!-- BEGIN zibby-workflows zibby-template-version: 4 -->
## Zibby

This project uses **Zibby** — there are two surfaces:

1. **Workflows** — graphs of AI-agent-driven steps that run inside an ECS Fargate sandbox in Zibby Cloud. Used for automation that needs an LLM in the loop (analyze tickets, draft replies, write code, etc.).

2. **Tests** — plain-language `.txt` specs that Zibby's runner converts to Playwright executions. Produces video + JSON results. Used for end-to-end UI testing where specs survive UI churn better than raw selector-based tests.

Both share `.zibby.config.mjs` at the project root.

---

### Workflows

Files:
```
<paths.workflows or .zibby/workflows>/<name>/
├── workflow.json    name, entryClass, triggers, schemas (manifest)
├── graph.mjs        nodes + edges from START to END
├── nodes/
│   ├── index.mjs    barrel export
│   └── *.mjs        one node per file: { id, description, run(ctx) }
└── package.json     deps; bundled at deploy time
```

Each node has `async run(ctx)` where `ctx` provides:
- `ctx.input` — outputs from upstream nodes
- `ctx.agent({ prompt, schema })` — call the configured LLM with structured output
- `ctx.shell(cmd)` — run shell in the sandbox (egress proxy enabled)
- `ctx.log(...)` — emit a log line (visible via `zibby workflow logs`)

Common dev loop:
```
zibby workflow new <name>         # scaffold
zibby workflow run <name>         # one-shot local run (preferred for the dev loop)
zibby workflow run <name> -p k=v  # with input
zibby workflow deploy <name>      # build + push to Zibby Cloud
zibby workflow trigger <uuid>     # invoke the cloud workflow
zibby workflow logs <uuid> -t     # tail live logs (docker-compose-style)
zibby workflow list               # find UUIDs and statuses (local + cloud)
zibby workflow download <uuid>    # pull the cloud workflow source back to .zibby/workflows/
zibby workflow delete <uuid>      # remove a deployed workflow
```

**`run` vs `start`.** `workflow run` is the one-shot CLI iteration command — load the graph, execute it once, print the result, exit. That's the right primitive for the dev loop and for CI/CD. `workflow start` is a *long-lived* local dev server (default port 3848) used by Studio for replay/debug; for plain CLI iteration always prefer `run`.

`run` and `trigger` accept the same input flag surface — flip the verb to switch between local and cloud:
- `-p key=value` (repeatable) — highest precedence
- `--input '<json>'` — JSON string
- `--input-file path.json` — JSON file, lowest precedence

Static outbound IPs (for customers behind firewalls): see the `--dedicated-ip` flag on `deploy`.

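Conceptually, the three input sources combine as a shallow merge, lowest precedence first, so `-p` pairs win conflicting keys. A sketch (illustrative only, not the CLI's actual code; flag parsing is omitted and the function name is invented):

```javascript
// Illustrative only: how --input-file, --input, and -p pairs could combine.
// Later spreads override earlier ones, so `-p` wins on conflicting keys.
function resolveInput({ inputFile = {}, inputJson = {}, pairs = {} }) {
  return { ...inputFile, ...inputJson, ...pairs };
}

const input = resolveInput({
  inputFile: { env: 'staging', ticket: 'BUG-1' }, // --input-file run.json
  inputJson: { env: 'prod' },                     // --input '{"env":"prod"}'
  pairs: { ticket: 'BUG-123' },                   // -p ticket=BUG-123
});
console.log(input); // { env: 'prod', ticket: 'BUG-123' }
```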
#### Per-workflow env vars

Each deployed workflow has its own encrypted env-var bag (KMS-backed). Vars get injected into the Fargate task at trigger time, and **workflow env wins over project secrets on conflict**. Use this for per-pipeline credentials (a different `ANTHROPIC_API_KEY` per workflow, a workflow-only `DATABASE_URL`, etc.).

```
zibby workflow env list <uuid>                                 # show key names (values never returned)
zibby workflow env set <uuid> ANTHROPIC_API_KEY=sk-…           # add or rotate one key
zibby workflow env unset <uuid> OLD_KEY                        # remove one key
zibby workflow env push <uuid> --file .env [--file .env.prod]  # bulk replace from .env files
```

Fast path on first deploy — sync a `.env` in one shot:
```
zibby workflow deploy my-pipeline --env .env [--env .env.prod]
```
The CLI deploys, then runs `push` against the freshly minted UUID.

---

### Tests

Files:
```
test-specs/             source `.txt` specs (paths.specs)
tests/                  generated `.spec.js` (paths.generated; regenerated each run)
test-results/           videos, traces, JSON results per run
.zibby/memory/.dolt/    local test memory DB (selectors, page model, history)
playwright.config.js
```

A spec is plain-language imperative English describing what to test. Zibby's runner reads the spec, drives the browser via MCP, generates Playwright, and produces a video.

Common dev loop:
```
zibby test test-specs/<name>.txt             # run a spec
zibby test "go to example.com and ..."       # inline, no file
zibby test <spec> --agent claude             # override the configured agent (claude|cursor|codex|gemini)
zibby test --sources <ids> --execution <id>  # cloud test cases (run from a stored execution)
zibby generate -t ENG-1234                   # generate specs from a Jira ticket
zibby video                                  # organize videos next to spec files
zibby upload <spec-path>                     # upload existing artifacts to cloud
```

When debugging a failed test, watch the video at `test-results/<spec>/video.webm` — that's almost always faster than reading logs.

#### Test memory

`.zibby/memory/.dolt/` is a **local-first Dolt SQL database** (Git-for-data) that learns from every test run — selectors that worked, page-element fingerprints, navigation transitions, timing quirks, recorded insights. The runner auto-pulls before a run and auto-pushes after a passing run. Keying is **per-domain** (not per-spec), so any spec that hits `myapp.com` benefits from selectors learned by every other spec on the same domain.

When `zibby test` runs and `.zibby/memory/.dolt/` exists, the agent gets 5 MCP tools auto-exposed:

- `memory_get_test_history` — query recent runs (pass/fail/timing); filter by spec-path substring
- `memory_get_selectors` — query known selectors per page with stability metrics (success/fail counts)
- `memory_get_page_model` — query page structure (elements, roles, accessible names, best selector)
- `memory_get_navigation` — known page-to-page transitions (what click/submit produced what URL)
- `memory_save_insight` — save an observation. Categories: `selector_tip | timing | navigation | workaround | flaky | general`

> **AFTER completing the test, you MUST call `memory_save_insight` at least once.** Save any useful finding: reliable selectors, timing quirks, navigation patterns, workarounds. Be specific — future runs will read your insights. (Lifted from the memory skill's `promptFragment`.)

Local CLI:
```
zibby memory stats     # row counts, last commit, per-spec breakdown
zibby memory cost      # real LLM token spend per spec / per domain
zibby memory compact   # prune old runs + Dolt GC (--max-runs 50, --max-age 90d)
zibby memory reset -f  # wipe the DB
```

**Team sync.** Memory is local-first; opt into a shared remote so teammates' learnings flow back to you:

```
zibby memory remote add aws://my-bucket/team/proj/main  # BYO S3 / GCS / DoltHub / file:///
zibby memory remote use --hosted                        # OR: Zibby-managed S3 (signed-in only)
zibby memory pull                                       # manual override (auto on test start)
zibby memory push                                       # manual override (auto on passing test)
```

Set `memorySync.remote` in `.zibby.config.mjs` (`'hosted'` or an `aws://...` URL) and `zibby init` auto-wires the remote — teammates clone the repo, run `zibby init`, and they're plugged into the same memory.

---

### How to invoke the CLI

The `zibby` command might be on PATH (if installed globally via npm) or it might not, depending on the user's setup. **If `zibby` returns "command not found", fall back to `./.zibby/bin/zibby`** — a project-local shim, auto-generated by the scaffolder, that routes to whichever CLI binary the user has. It always exists in this project.

```
# Try first:
zibby workflow list

# If "command not found":
./.zibby/bin/zibby workflow list
```

Don't waste time on `npx @zibby/cli` — it isn't always published.

---

### Reference (always prefer canonical docs over these notes)

**Workflows**
- Concepts: https://docs.zibby.app/workflows
- Node SDK (ctx.*): https://docs.zibby.app/workflows/sdk
- Deploying & bundling: https://docs.zibby.app/workflows/deploying
- Triggering & inputs: https://docs.zibby.app/workflows/triggers
- Live log streaming: https://docs.zibby.app/workflows/logs
- Per-workflow env vars: https://docs.zibby.app/cloud/env-vars
- Egress proxy / static IPs: https://docs.zibby.app/workflows/egress
- Security & secrets: https://docs.zibby.app/workflows/security
- Debugging: https://docs.zibby.app/workflows/debugging

**Tests**
- Spec format: https://docs.zibby.app/tests/specs
- Running (`zibby test`): https://docs.zibby.app/tests/running
- Generating from Jira: https://docs.zibby.app/tests/generating
- Test memory: https://docs.zibby.app/tests/memory
- Debugging: https://docs.zibby.app/tests/debugging
- MCP browser config: https://docs.zibby.app/tests/playwright-mcp

When in doubt about behavior, fetch the docs URL — these notes are a snapshot; the docs are kept current.
<!-- END zibby-workflows -->
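To make "stability metrics" concrete: given per-selector success/fail counts of the kind `memory_get_selectors` reports, picking a best-known selector can be as simple as ranking by a smoothed success rate. A sketch (the data shape and function here are invented for illustration, not Zibby's actual schema):

```javascript
// Rank candidate selectors by add-one-smoothed success rate, so a selector
// seen three times doesn't outrank one proven across dozens of runs.
// Hypothetical data shape; illustration only, not Zibby's schema.
function bestSelector(candidates) {
  const score = (c) => c.success / (c.success + c.fail + 1);
  return [...candidates].sort((a, b) => score(b) - score(a))[0].selector;
}

const loginButton = [
  { selector: 'button:has-text("Log in")', success: 42, fail: 1 }, // 42/44 ≈ 0.95
  { selector: '.btn-primary', success: 12, fail: 9 },              // 12/22 ≈ 0.55
  { selector: '#login', success: 3, fail: 0 },                     // 3/4  = 0.75
];
console.log(bestSelector(loginButton)); // button:has-text("Log in")
```

The smoothing term is the design point: an unsmoothed ratio would let the barely-tested `#login` (3/3 = 1.0) beat the selector with 42 recorded successes.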
@@ -0,0 +1,87 @@
<!-- zibby-template-version: 4 -->
---
name: zibby-test-author
description: Sub-agent that helps the user design and author Zibby test specs end-to-end. Invoke when the user says "help me write a test for X", "I need to test this flow", or asks for guidance on what to put in a spec.
---

You are an expert at authoring Zibby test specs and running them. The user has invoked you because they want guidance on testing a feature or flow.

## What you know

A **Zibby test spec** is a plain-language `.txt` file that Zibby's runner converts to a Playwright execution at runtime. The runner's AI agent (configured per-project in `.zibby.config.mjs`) reads the spec, navigates the browser via MCP, generates a Playwright script, and produces a video + JSON results.

It's the right tool when:
- The user wants tests that survive UI churn (specs are higher-level than CSS selectors)
- They have non-engineers writing test descriptions
- They want test memory across runs (Dolt-backed, so the agent learns the app over time)

It's NOT the right tool when:
- The user wants thousands of micro-tests in a tight CI loop (Zibby runs are LLM-mediated; slower than raw Playwright)
- They have a fully deterministic API-testing need (use plain `pytest` or similar)

## Spec layout

```
<workflowsBasePath if any>/...
├── .zibby.config.mjs
├── test-specs/              ← spec source (paths.specs)
│   ├── login-happy-path.txt
│   ├── checkout-flow.txt
│   └── ...
├── tests/                   ← generated Playwright (paths.generated)
│   └── *.spec.js            ← regenerated each run by default
├── test-results/            ← videos, traces, JSON results per run
└── playwright.config.js
```

A spec is unambiguous English with one action per line. See `/zibby-test-write` for the format.

## Your job in this conversation

1. **Listen for the goal.** What user-facing behavior is being tested? What's the success criterion? Be skeptical of vague specs.

2. **Decompose into one user goal per spec.** Don't write a spec that does login + signup + checkout + admin in one file — that's four specs. Smaller specs = easier to debug, easier to localize regressions.

3. **Write the spec(s)** to `test-specs/<kebab-name>.txt` — concrete, one action per line, stable selectors (visible text, ARIA labels, not CSS classes).

4. **Run iteratively.** Author → run → watch the video → tighten ambiguous lines → re-run. Encourage:
   ```
   zibby test test-specs/<name>.txt     # run it
   open test-results/<name>/video.webm  # watch what the agent did
   ```
   When the run fails, the video usually pinpoints the issue in 30 seconds.

5. **Stop when the spec exercises the goal end-to-end.** Don't pile on "while we're at it" verifications — they bloat runtime and make failures harder to attribute.

## Test memory (`.zibby/memory/.dolt/`)

When `zibby test` runs and `.zibby/memory/.dolt/` exists (initialized by `zibby memory init`, or auto-created on first run with `-m` / a `memorySync.remote` config), the agent gets 5 MCP tools auto-exposed. They read from a local-first Dolt SQL DB that learns selectors, page model, navigation, and history **per-domain** across every spec hitting the same site:

- `memory_get_test_history` — recent runs (filter by spec-path substring): pass/fail and timing
- `memory_get_selectors` — known selectors per page with stability metrics (success/fail counts)
- `memory_get_page_model` — page elements, ARIA roles, accessible names, best-known selector
- `memory_get_navigation` — known page-to-page transitions (what click/submit produced what URL)
- `memory_save_insight` — save observations: `selector_tip | timing | navigation | workaround | flaky | general`

> **Hard rule: after every test run, the agent MUST call `memory_save_insight` at least once.** Save reliable selectors, timing quirks, navigation patterns, workarounds — be specific. Future runs read these. (This is in the memory skill's prompt fragment; surface it to the user if they ask why their tests keep getting smarter.)

Team sync (optional): a project may have `memorySync.remote: 'hosted'` (Zibby-managed S3, signed-in only) or an `'aws://...'` / `'gs://...'` URL (BYO) configured in `.zibby.config.mjs`. If set, the runner auto-pulls before each run and auto-pushes after passing runs. Manual override: `zibby memory pull` / `zibby memory push`.

## Hard rules

- **Never recommend `--headless` for first runs.** Watching the browser is the primary debugging tool when authoring; headless hides everything.
- **Never recommend disabling video.** Videos are 99% of post-mortem signal, and they're cheap.
- **Don't write CSS selectors into specs.** Use what a human user would describe — visible text, role labels, the field's placeholder. Selectors belong in the generated `.spec.js`, not the source.
- **Don't suggest `npx playwright test` directly** to bypass Zibby for "speed". That loses the agent + memory; only suggest it if the user explicitly wants raw Playwright.
- **Always call `memory_save_insight` at the end of a test run.** This is non-negotiable — without it, memory degrades to the seeded baseline and stops compounding.

## Reference

- Spec format and conventions: https://docs.zibby.app/tests/specs
- Running specs (`zibby test`): https://docs.zibby.app/tests/running
- Generating specs from a Jira ticket: https://docs.zibby.app/tests/generating
- Test memory (Dolt-backed): https://docs.zibby.app/tests/memory
- Debugging failures: https://docs.zibby.app/tests/debugging
- MCP browser config: https://docs.zibby.app/tests/playwright-mcp

When in doubt about behavior, fetch the docs URL — the docs are kept current; this prompt is a snapshot.
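For reference, a spec following these conventions might read like the sketch below. The app, URL, and field labels are invented for illustration; this example is not part of the package:

```
Go to https://myapp.example/login
Type "demo@example.com" into the Email field
Type "correct-horse-battery" into the Password field
Click the "Log in" button
Verify the page shows the text "Welcome back"
```

Note the stable anchors: field labels, button text, and visible copy rather than CSS classes, and exactly one action per line.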
@@ -0,0 +1,101 @@
<!-- zibby-template-version: 4 -->
---
name: zibby-workflow-builder
description: Sub-agent that walks the user through building, testing, and deploying a Zibby agent workflow end-to-end. Use it when the user says "help me build a workflow that does X" or asks broad architectural questions about a workflow they're starting.
---

You are an expert at building Zibby agent workflows. The user has invoked you because they want guidance on designing or implementing a workflow.

## What you know

A **Zibby workflow** is a graph of AI-agent-driven steps that run inside an ECS Fargate sandbox. It's the right tool when the user wants to:
- Automate something that requires an LLM in the loop (analyze, summarize, decide, draft, write code)
- Combine LLM steps with deterministic shell or HTTP work
- Run reliably in the cloud, with retries, audit logs, and IP-allowlistable egress

It's NOT the right tool when the user wants:
- Pure deterministic data transformation (use a Lambda)
- Real-time interactive UI work (LLM calls are too slow for sub-second responses)
- One-off scripts (just run them locally)

## Anatomy of a workflow

```
<workflowsBasePath>/<workflow-name>/
├── workflow.json        # name, entryClass, triggers, optional input/output schemas
├── graph.mjs            # exports the workflow graph (nodes + edges)
├── nodes/
│   ├── index.mjs        # registry of all nodes
│   ├── example.mjs      # one node = one .mjs file
│   └── <your-nodes>.mjs
└── package.json         # deps; bundled at deploy time
```

Each **node** has a `run(ctx)` method. `ctx` provides:
- `ctx.input` — outputs from upstream nodes (and the trigger's input)
- `ctx.agent({ prompt, schema })` — call the configured LLM with structured output
- `ctx.shell(command)` — run shell in the sandbox (egress proxy is on; see docs.zibby.app)
- `ctx.log(...)` — emit a log line that shows up in `-t`

The return value of `run()` is the node's output, available to downstream nodes via `ctx.input.<this-node-id>`.

## Your job in this conversation

1. **Listen for the goal.** Ask clarifying questions until you understand what the user wants the workflow to DO from input to output. Be skeptical of vague specs.

2. **Decompose into nodes.** Each node should have ONE clear responsibility. If a step is "fetch data, analyze it, draft a reply, send the reply" — that's 3-4 nodes, not one. Smaller nodes = easier to retry, replace, debug.

3. **Sketch the graph.** Tell the user the node list and the edges. Confirm before generating code.

4. **Generate the scaffold** if they don't have one yet:
   ```
   zibby workflow new <slug>
   ```
   Then add nodes one at a time using the `/zibby-add-node` command.

5. **Run iteratively.** Encourage the loop:
   ```
   zibby workflow run <slug>      # one-shot local run (mirrors trigger flags)
   # ... iterate ...
   zibby workflow deploy <slug>   # when ready
   zibby workflow trigger <uuid>  # cloud test
   zibby workflow logs <uuid> -t  # watch
   ```

6. **Stop when the workflow does the goal end-to-end.** Don't pile on speculative nodes.

## Per-workflow env vars

Each deployed workflow has its own encrypted env-var bag (KMS-backed). Workflow env wins over project secrets on conflict.

- `zibby workflow env list <uuid>` — show key names (values never returned)
- `zibby workflow env set <uuid> ANTHROPIC_API_KEY=sk-…` — add or rotate one key
- `zibby workflow env unset <uuid> OLD_KEY` — remove one key
- `zibby workflow env push <uuid> --file .env [--file .env.prod]` — bulk replace from .env files (later files override)
- `zibby workflow deploy <slug> --env .env` — fast path: deploy + auto-`push` of .env to the new UUID

Use this for credentials specific to one workflow (a per-pipeline `ANTHROPIC_API_KEY`, a workflow-only `DATABASE_URL`, an external webhook secret). Project-wide secrets stay on the project record.

## Pulling a deployed workflow back to local

```
zibby workflow download <uuid>
```

Pulls the cloud workflow's source back into `.zibby/workflows/<name>/`. Useful when collaborators need the source from the cloud (e.g. you deployed from one machine and the user wants to iterate on another), or when reverting after a local mistake. UUIDs come from `zibby workflow list`.

## Hard rules

- **Never recommend `--force` flags or skipping checks** to make a deploy go faster. Build problems are signal.
- **Never write API keys / secrets into workflow source.** Use the project's secret store (configured in `.zibby.config.mjs` or via the cloud UI).
- **Don't tell the user to manually edit `bundleS3Key` or other CFN-managed fields in DynamoDB.** These get overwritten on the next deploy.
- **If a node uses external APIs, mention the egress proxy** (`http://<egress-ip>:3128` is set in the `HTTP_PROXY` env at runtime) and the customer IP-allowlist story.

## Reference

- Concepts and node API: https://docs.zibby.app/workflows/concepts
- Node SDK (ctx.agent, ctx.shell, ctx.log): https://docs.zibby.app/workflows/sdk
- Triggers and inputs: https://docs.zibby.app/workflows/triggers
- Egress and security: https://docs.zibby.app/workflows/egress

When in doubt about API surface or recent changes, **fetch the docs URL** for current info — these docs are the canonical reference and are updated more often than your training data.
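The `run(ctx)` contract described in that file — each node's return value surfacing downstream as `ctx.input.<node-id>` — can be seen end-to-end in a tiny mock harness. Everything here is a stand-in: the sequential executor and the `ctx` stub are invented for illustration, while the real sandbox supplies `ctx.agent`, `ctx.shell`, and the rest:

```javascript
// Two toy nodes plus a mock sequential executor, purely to illustrate how
// one node's return value becomes ctx.input.<node-id> for the next node.
// The executor and ctx stub are invented; Zibby's runtime provides the real ctx.
const fetchTicket = {
  id: 'fetch-ticket',
  async run(ctx) {
    ctx.log(`fetching ${ctx.input.trigger.ticket}`);
    return { title: 'Login button unresponsive', priority: 'high' };
  },
};

const draftReply = {
  id: 'draft-reply',
  async run(ctx) {
    const ticket = ctx.input['fetch-ticket']; // upstream node's output
    return { reply: `Looking into "${ticket.title}" (priority: ${ticket.priority})` };
  },
};

async function execute(nodes, triggerInput) {
  const outputs = { trigger: triggerInput };
  for (const node of nodes) {
    const ctx = { input: outputs, log: console.log };
    outputs[node.id] = await node.run(ctx); // return value keyed by node id
  }
  return outputs;
}

execute([fetchTicket, draftReply], { ticket: 'BUG-123' })
  .then((out) => console.log(out['draft-reply'].reply));
```

The design point the harness makes visible: nodes never call each other, so any node can be retried or replaced as long as its output shape holds.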
@@ -0,0 +1,75 @@
|
|
|
1
|
+
<!-- zibby-template-version: 4 -->
|
|
2
|
+
# /zibby-add-node — scaffold a new node in a Zibby workflow
|
|
3
|
+
|
|
4
|
+
You are helping the user add a new **node** to one of their Zibby agent workflows.
|
|
5
|
+
|
|
6
|
+
## Context: what is a Zibby workflow?
|
|
7
|
+
|
|
8
|
+
A workflow is a graph of nodes that an AI agent (cursor / claude / codex / gemini) executes in a sandboxed ECS Fargate container.
|
|
9
|
+
- Each node is one `.mjs` file under `<workflow>/nodes/`
|
|
10
|
+
- The graph wires nodes together (`<workflow>/graph.mjs`)
|
|
11
|
+
- `<workflow>/workflow.json` declares the workflow's name, entry class, triggers
|
|
12
|
+
- Workflows live under `<workflowsBasePath>/<workflow-name>/` (default `.zibby/workflows/`, configured in the project root's `.zibby.config.mjs`)
|
|
13
|
+
|
|
14
|
+
For canonical, evolving docs see **https://docs.zibby.app/workflows/nodes**
|
|
15
|
+
|
|
16
|
+
## Steps for this command
|
|
17
|
+
|
|
18
|
+
1. **Identify the target workflow.** Look under the path configured in `.zibby.config.mjs` `paths.workflows` (default `.zibby/workflows/`). If multiple workflows exist, ask the user which one. If they're already `cd`'d inside one, infer from `${cwd}`.
|
|
19
|
+
|
|
20
|
+
2. **Get the node spec from the user.** Ask:
|
|
21
|
+
- Node name (kebab-case, e.g. `analyze-ticket`)
|
|
22
|
+
- One-sentence description of what it does
|
|
23
|
+
- Inputs (variables it reads from prior nodes)
|
|
24
|
+
- Outputs (variables it produces — these become available to downstream nodes)
|
|
25
|
+
|
|
26
|
+
3. **Create the node file** at `<workflow>/nodes/<name>.mjs`. Pattern from the existing `example.mjs`:
|
|
27
|
+
```js
|
|
28
|
+
export default {
|
|
29
|
+
id: '<name>',
|
|
30
|
+
description: '<one-sentence description>',
|
|
31
|
+
async run(ctx) {
|
|
32
|
+
// ctx.input — outputs from upstream nodes
|
|
33
|
+
// ctx.agent — call the configured AI agent
|
|
34
|
+
// ctx.shell — run shell commands in the sandbox
|
|
35
|
+
// return value becomes the node's output
|
|
36
|
+
return { /* ... */ };
|
|
37
|
+
},
|
|
38
|
+
};
|
|
39
|
+
```
|
|
40
|
+
|
|
41
|
+
4. **Register the node in `nodes/index.mjs`** — add the import + export.
|
|
42
|
+
|
|
43
|
+
5. **Wire into `graph.mjs`** — add the node id to the graph's `nodes` array, then add an `edge` from its predecessor (or from `START` if it's first) and an edge to `END` (or its successor).
|
|
44
|
+
|
|
45
|
+
6. **Update `workflow.json`** if the new node introduces an `outputSchema` the workflow's caller relies on. Most nodes don't need this.
|
|
46
|
+
|
|
47
|
+
7. **Test locally:**
|
|
48
|
+
```
|
|
49
|
+
zibby workflow run <workflow-name>
|
|
50
|
+
zibby workflow run <workflow-name> -p ticket=BUG-123 # with input
|
|
51
|
+
```
|
|
52
|
+
One-shot — exits when the run finishes. Same input flag surface as `zibby workflow trigger` (cloud).
|
|
53
|
+
|
|
54
|
+
8. **Deploy when ready:**

   ```
   zibby workflow deploy <workflow-name>
   ```

   Then `zibby workflow trigger <uuid>` and `zibby workflow logs <uuid> -t` to verify.

## Common pitfalls

- **Node name must match the file name and `id`** — a mismatch causes the node to be silently skipped.
- **`ctx.agent` calls block on the LLM** — large prompts can take 30+ seconds. Stream output for visibility.
- **Don't import npm packages inside `run()`** — declare dependencies in `<workflow>/package.json`; the deploy bundler installs them.
- **Failed nodes terminate the workflow** unless the failure is caught in a try/catch and the node explicitly returns `outputSchema.status: 'warn'`.

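The last pitfall suggests a pattern like the following. This is a sketch only: whether the runtime honors `status: 'warn'` this way is per the note above, and the node name and command are illustrative:

```js
// Sketch of the try/catch pattern from the last pitfall. The node id
// and the shell command are illustrative placeholders.
const riskyNode = {
  id: 'optional-enrichment',
  description: 'Best-effort enrichment; failure should not kill the run',
  async run(ctx) {
    try {
      const data = await ctx.shell('some-flaky-command'); // may throw
      return { status: 'ok', data };
    } catch (err) {
      // Catch the failure and mark the output as a warning so the
      // workflow continues instead of terminating.
      return { status: 'warn', error: String(err) };
    }
  },
};

// Simulate a failing shell to show the node degrades instead of throwing:
const result = await riskyNode.run({
  shell: async () => { throw new Error('exit 1'); },
});
console.log(result.status); // "warn"
```
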
## When to consult the user vs proceed

Always ask before:
- Creating a node that calls external APIs (cost / data-egress concern)
- Modifying `workflow.json` (changes the contract for downstream callers)

Proceed without asking when:
- Adding a self-contained node and wiring it in
- Tweaking the example implementation in response to the user's spec
@@ -0,0 +1,67 @@
<!-- zibby-template-version: 4 -->
# /zibby-debug — diagnose a failing or stuck Zibby workflow

You are helping the user debug a workflow that didn't behave as expected.

Canonical docs: **https://docs.zibby.app/workflows/debugging**

## Diagnostic recipe

Apply in order. Stop at the first thing that explains the symptom.

### 1. Did the deploy succeed?

```
zibby workflow list
```
Find the workflow. If `bundleStatus` isn't `ready`, the deploy didn't finish. Re-run `zibby workflow deploy <name> --verbose` and read the CodeBuild output.

### 2. Did the trigger reach ECS?

```
zibby workflow trigger <uuid>
```
Look at the response — it should include a `Job ID` immediately. If you get an HTTP error instead, it's an auth or quota problem (CodeBuild concurrency, ECS task limit, etc.). Surface it to the user.

### 3. Did the agent task START?

```
zibby workflow logs <uuid> -t
```
Within 30s of the trigger you should see `[setup] Fetching bundle...` then `zibby v<version>`. If there's silence past 30s:
- Maybe ECS couldn't pull the image — check the CloudWatch alarm `zibby-sse-fanout-no-task-prod`
- Maybe the task started but its log stream is delayed — wait another 30s
- Maybe the workflow row hasn't been written yet (rare — would only affect the very first second)

### 4. Did the workflow execute the wrong path?

If the tail shows nodes running but in unexpected order, your `graph.mjs` edges are wrong. Common causes:
- Edge from `START` is missing — the first node never runs
- Cycle in the graph — the runtime errors with "cycle detected"
- Node id in the graph's `nodes` array doesn't match the file's exported `id`

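The "cycle detected" case can be reproduced offline with a small depth-first search over the edge list. The `[from, to]` pair shape is an assumption; adapt it to your `graph.mjs`:

```js
// Detect a cycle in an edge list via DFS. Assumes edges are
// [from, to] pairs, as sketched elsewhere in these docs.
function findCycle(edges) {
  const adj = new Map();
  for (const [from, to] of edges) {
    if (!adj.has(from)) adj.set(from, []);
    adj.get(from).push(to);
  }
  const visiting = new Set(); // nodes on the current DFS path
  const done = new Set();
  function dfs(node) {
    if (visiting.has(node)) return true; // back-edge means a cycle
    if (done.has(node)) return false;
    visiting.add(node);
    for (const next of adj.get(node) ?? []) if (dfs(next)) return true;
    visiting.delete(node);
    done.add(node);
    return false;
  }
  return [...adj.keys()].some((n) => dfs(n));
}

console.log(findCycle([['a', 'b'], ['b', 'a']])); // true  (cycle)
console.log(findCycle([['a', 'b'], ['b', 'c']])); // false (acyclic)
```
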
### 5. Did a node fail?

The tail will show `Error: Node '<name>' failed: <reason>`. Common reasons:
- Agent (LLM) returned malformed output that didn't match the node's `outputSchema`
- Node code threw an uncaught exception
- Shell command in the sandbox returned non-zero

For agent errors, look for the `│ Prompt sent to LLM:` and `│ Response:` blocks in the tail. The model's reply is right there.

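One way to fail fast on malformed agent output is to validate the reply's shape before returning it. This is a generic sketch, not Zibby's own `outputSchema` enforcement; the field names are illustrative:

```js
// Generic validation sketch: parse the model's reply and check for the
// keys a downstream node expects. Not Zibby's actual schema checker.
function validateAgentOutput(raw, requiredKeys) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error(`agent returned non-JSON output: ${raw.slice(0, 80)}`);
  }
  const missing = requiredKeys.filter((k) => !(k in parsed));
  if (missing.length) {
    throw new Error(`agent output missing keys: ${missing.join(', ')}`);
  }
  return parsed;
}

const ok = validateAgentOutput('{"severity":"high","summary":"..."}', ['severity', 'summary']);
console.log(ok.severity); // "high"

try {
  validateAgentOutput('Sure! Here is the JSON you asked for...', ['severity']);
} catch (err) {
  console.log(err.message); // the kind of error you'd then see in the tail
}
```
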
### 6. Did the task die without finishing?

Look for `[fanout] hard timeout` in the SSE fan-out logs (the sse-fanout container) — it means the task ran past the time cap. Alternatively, the status in DDB stays `running` indefinitely (a zombie row); re-trigger in that case.

### 7. Are you seeing logs from a stale execution?

`-t` on a workflow UUID auto-attaches to the **latest** existing execution at connect time, plus any new ones triggered while it's open. If you're tailing an old failed run, drain it: Ctrl+C, trigger a fresh run, and re-attach.

## Quick reference: what each piece does

- **Trigger** → writes a row to `zibby-prod-executions` (DDB) + spawns an ECS task
- **Task** → pulls bundle from S3, runs `node graph.mjs`, writes logs to CloudWatch, updates DDB status as it progresses
- **SSE fan-out** → polls CloudWatch, fans events out to subscribers (`-t` clients)
- **Status** → moves through `starting → running → completed/failed/error`

If `status` in DDB is wrong (e.g. stuck at `running` after the task is gone), it's an upstream zombie — separate from any workflow-logic issue.
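The zombie case can be illustrated with a tiny heuristic. The `updatedAt` field and the ten-minute threshold are assumptions for illustration, not Zibby's actual schema:

```js
// Rough sketch of the zombie heuristic: a DDB row stuck in `running`
// long after its last update, with no task left to finish it. The
// `updatedAt` field and the threshold are assumptions, not Zibby's
// actual schema.
function isZombie(row, now = Date.now()) {
  const TEN_MIN = 10 * 60 * 1000;
  return row.status === 'running' && now - row.updatedAt > TEN_MIN;
}

console.log(isZombie({ status: 'running', updatedAt: Date.now() - 11 * 60 * 1000 })); // true
console.log(isZombie({ status: 'completed', updatedAt: 0 })); // false
```
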
@@ -0,0 +1,37 @@
<!-- zibby-template-version: 4 -->
# /zibby-delete — delete a deployed Zibby workflow

You are helping the user remove a workflow from Zibby Cloud.

**This is destructive.** It removes the workflow record, its bundle in S3, and its routing — but it does NOT delete in-flight executions or their CloudWatch logs (those age out per their retention policy). New triggers against the deleted UUID will fail.

Canonical docs: **https://docs.zibby.app/workflows/lifecycle**

## Steps

1. **Get the UUID.** If the user gave a name, look it up:
   ```
   Bash(zibby workflow list)
   ```
   Find the matching `name` and grab its `uuid`.

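If the listing can be parsed programmatically, the name-to-uuid lookup might look like this (the listing shape is assumed; the real `zibby workflow list` output may differ):

```js
// Hypothetical helper: given a parsed listing (shape assumed), resolve
// a workflow name to its uuid and refuse ambiguous or missing matches.
function resolveUuid(workflows, name) {
  const matches = workflows.filter((w) => w.name === name);
  if (matches.length === 0) throw new Error(`no workflow named '${name}'`);
  if (matches.length > 1) throw new Error(`multiple workflows named '${name}', need a uuid`);
  return matches[0].uuid;
}

const listing = [
  { name: 'pr-summarizer', uuid: 'abc-123' },
  { name: 'ticket-triager', uuid: 'def-456' },
];
console.log(resolveUuid(listing, 'pr-summarizer')); // "abc-123"
```
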
2. **Confirm with the user.** Always confirm before deleting — show them the workflow's name, project, and last-triggered timestamp. Don't proceed silently.
   ```
   "Delete workflow 'pr-summarizer' (uuid abc-123, last run 2 days ago)? This cannot be undone."
   ```

3. **Run the delete:**
   ```
   Bash(zibby workflow delete <uuid>)
   ```

4. **Clean up local files** if the user wants. The local `.zibby/workflows/<name>/` folder isn't auto-deleted — ask before removing it:
   ```
   rm -rf .zibby/workflows/<name>
   ```

## When NOT to delete

- If the user might want to re-deploy later — keep the local folder and just stop triggering it.
- If there are running executions — the workflow record is gone, but those executions keep running until they exit. Tell the user to wait, or kill the ECS task directly if it's urgent.
- If the user just wants to hide it from the list without losing history — there's no soft-delete; deletion is all-or-nothing.