@zibby/cli 0.4.14 → 0.4.17

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. package/dist/bin/zibby.js +2 -2
  2. package/dist/commands/init.js +64 -64
  3. package/dist/commands/workflows/generate.js +108 -108
  4. package/dist/commands/workflows/logs.js +16 -16
  5. package/dist/commands/workflows/run.js +7 -7
  6. package/dist/commands/workflows/schedule.js +10 -0
  7. package/dist/package.json +4 -2
  8. package/dist/templates/.claude/CLAUDE.md +425 -0
  9. package/dist/templates/.claude/commands/add-node.md +63 -0
  10. package/dist/templates/.claude/commands/add-skill.md +83 -0
  11. package/dist/templates/.claude/commands/new-workflow.md +61 -0
  12. package/dist/templates/.claude/commands/validate-workflow.md +67 -0
  13. package/dist/utils/session-uploader.js +1 -1
  14. package/package.json +4 -2
  15. package/templates/.claude/CLAUDE.md +425 -0
  16. package/templates/.claude/commands/add-node.md +63 -0
  17. package/templates/.claude/commands/add-skill.md +83 -0
  18. package/templates/.claude/commands/new-workflow.md +61 -0
  19. package/templates/.claude/commands/validate-workflow.md +67 -0
  20. package/templates/zibby-workflow-claude/agents-md-block.md +173 -0
  21. package/templates/zibby-workflow-claude/claude/agents/zibby-test-author.md +87 -0
  22. package/templates/zibby-workflow-claude/claude/agents/zibby-workflow-builder.md +101 -0
  23. package/templates/zibby-workflow-claude/claude/commands/zibby-add-node.md +75 -0
  24. package/templates/zibby-workflow-claude/claude/commands/zibby-debug.md +67 -0
  25. package/templates/zibby-workflow-claude/claude/commands/zibby-delete.md +37 -0
  26. package/templates/zibby-workflow-claude/claude/commands/zibby-deploy.md +87 -0
  27. package/templates/zibby-workflow-claude/claude/commands/zibby-list.md +30 -0
  28. package/templates/zibby-workflow-claude/claude/commands/zibby-memory-cost.md +39 -0
  29. package/templates/zibby-workflow-claude/claude/commands/zibby-memory-pull.md +47 -0
  30. package/templates/zibby-workflow-claude/claude/commands/zibby-memory-remote-use-hosted.md +61 -0
  31. package/templates/zibby-workflow-claude/claude/commands/zibby-memory-stats.md +38 -0
  32. package/templates/zibby-workflow-claude/claude/commands/zibby-static-ip.md +70 -0
  33. package/templates/zibby-workflow-claude/claude/commands/zibby-tail.md +53 -0
  34. package/templates/zibby-workflow-claude/claude/commands/zibby-test-debug.md +59 -0
  35. package/templates/zibby-workflow-claude/claude/commands/zibby-test-generate.md +39 -0
  36. package/templates/zibby-workflow-claude/claude/commands/zibby-test-run.md +49 -0
  37. package/templates/zibby-workflow-claude/claude/commands/zibby-test-write.md +46 -0
  38. package/templates/zibby-workflow-claude/claude/commands/zibby-trigger.md +56 -0
  39. package/templates/zibby-workflow-claude/claude/settings.json +10 -0
  40. package/templates/zibby-workflow-claude/cursor/rules/zibby-workflows.mdc +119 -0
  41. package/templates/zibby-workflow-claude/manifest.json +47 -0
package/templates/zibby-workflow-claude/claude/commands/zibby-deploy.md
@@ -0,0 +1,87 @@
+ <!-- zibby-template-version: 4 -->
+ # /zibby-deploy — deploy a Zibby workflow to the cloud
+
+ You are helping the user deploy a workflow they've been building locally.
+
+ ## What `zibby workflow deploy` does
+
+ 1. Bundles the workflow's source (graph.mjs + nodes/ + package.json) into a tarball
+ 2. Uploads it to S3 via a presigned URL
+ 3. Triggers AWS CodeBuild to install deps + bake the bundle
+ 4. Updates DynamoDB so future triggers run the new bundle
+
+ A successful deploy is required before `zibby workflow trigger <uuid>` works against the cloud.
+
+ Canonical docs: **https://docs.zibby.app/workflows/deploying**
+
+ ## Steps for this command
+
+ 1. **Identify the workflow.** If the user passes a name, use it. Otherwise list everything under `paths.workflows` (from `.zibby.config.mjs`) and ask.
+
+ 2. **Pre-flight checks.** Read the workflow folder and confirm:
+ - `graph.mjs` exists and exports a graph
+ - `nodes/` has at least one node
+ - `workflow.json` is valid (must have `name`, `entryClass`, `triggers`)
+ - `package.json` declares all imports used in nodes (run a quick grep to spot missing deps; sketch below)
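+
+ A minimal sketch of that grep (a hypothetical helper, not a built-in check; assumes ES-module imports in `nodes/*.mjs`):
+
+ ```js
+ // check-deps.mjs: flag imports not declared in package.json.
+ // Won't catch dynamic import(); a quick pre-flight heuristic only.
+ import { readFileSync, readdirSync } from 'node:fs';
+
+ const pkg = JSON.parse(readFileSync('package.json', 'utf8'));
+ const declared = new Set(Object.keys(pkg.dependencies ?? {}));
+
+ for (const file of readdirSync('nodes').filter((f) => f.endsWith('.mjs'))) {
+   const src = readFileSync(`nodes/${file}`, 'utf8');
+   for (const [, spec] of src.matchAll(/from\s+['"]([^'"]+)['"]/g)) {
+     if (spec.startsWith('.') || spec.startsWith('node:')) continue; // skip relative + builtin
+     const name = spec.startsWith('@') ? spec.split('/').slice(0, 2).join('/') : spec.split('/')[0];
+     if (!declared.has(name)) console.warn(`${file}: "${name}" missing from package.json`);
+   }
+ }
+ ```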
+
+ 3. **Run the deploy:**
+ ```
+ zibby workflow deploy <workflow-name>
+ ```
+ This is interactive if `--project` isn't passed: the user picks a project, and the CLI handles auth via the saved session token.
+
+ 4. **Watch the build.** The CLI streams CodeBuild output. If it succeeds, it prints the workflow's UUID. If it fails, the build logs show why — usually a missing dep in `package.json` or a syntax error in a node.
+
+ 5. **Verify post-deploy:**
+ ```
+ zibby workflow trigger <uuid> --input '{}'
+ zibby workflow logs <uuid> -t
+ ```
+ Tail logs until the workflow reaches `completed` (or `failed` — diagnose from logs).
+
+ ## Common failure modes
+
+ - **Build fails with module-not-found** → a node imports a package not in `package.json`. Add it and redeploy.
+ - **Build succeeds but trigger fails immediately** → `entryClass` in `workflow.json` doesn't match a class exported by `graph.mjs`.
+ - **Workflow runs but a node fails** → tail the live logs and read the error. Most are in the agent's prompt/output handling.
+
+ ## Optional flags worth knowing
+
+ `zibby workflow deploy` accepts:
+ - `--project <id>` — skip the interactive project picker
+ - `--api-key <key>` — use a PAT instead of the session token (for CI)
+ - `--env <path>` — sync a `.env` file into per-workflow env vars after deploy. Repeatable; later files override.
+ - `--verbose` — print raw CodeBuild output during the build (helpful for debugging build failures)
+
+ ### Seeding per-workflow env on first deploy
+
+ If the workflow needs its own `ANTHROPIC_API_KEY`, `DATABASE_URL`, etc., put them in a `.env` and pass `--env`:
+
+ ```
+ zibby workflow deploy <name> --env .env
+ zibby workflow deploy <name> --env .env --env .env.prod # later files win
+ ```
+
+ After deploy, manage them surgically with `zibby workflow env set/unset/list/push <uuid>`. See `/zibby-list` to recover the UUID; full guide at https://docs.zibby.app/cloud/env-vars.
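+
+ For intuition, the later-files-win rule behaves like an ordered object spread. A toy sketch, assuming plain `KEY=VALUE` lines (the real CLI may handle quoting and comments differently):
+
+ ```js
+ // Merge .env files in the order passed; later files override earlier ones.
+ import { readFileSync } from 'node:fs';
+
+ const parseEnv = (path) =>
+   Object.fromEntries(
+     readFileSync(path, 'utf8')
+       .split('\n')
+       .filter((l) => l.includes('=') && !l.trimStart().startsWith('#'))
+       .map((l) => [l.slice(0, l.indexOf('=')).trim(), l.slice(l.indexOf('=') + 1).trim()])
+   );
+
+ // Same order as the flags above: .env first, .env.prod second, so .env.prod wins.
+ const merged = { ...parseEnv('.env'), ...parseEnv('.env.prod') };
+ console.log(merged);
+ ```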
+
+ ## Static outbound IP (dedicated egress)
+
+ If the user's workflow needs to call APIs that require IP allowlisting (corporate GitHub, GitLab Enterprise, paranoid SaaS firewalls), the workflow needs the **dedicated egress IP** addon. The flag lives on the legacy alias `zibby deploy` (NOT `zibby workflow deploy`):
+
+ | Flag | What it does |
+ |------|-------------|
+ | `zibby deploy <name> --dedicated-ip status` | Show current addon state for the account (active / inactive / billing) |
+ | `zibby deploy <name> --dedicated-ip enable` | Enable the addon on the account (Pro subscription required, ~$50/mo). One-time per account. |
+ | `zibby deploy <name> --dedicated-ip use` | Mark THIS workflow as using the static egress IP (per-workflow opt-in, after `enable`) |
+ | `zibby deploy <name> --dedicated-ip unuse` | Stop routing this workflow through the static IP |
+ | `zibby deploy <name> --dedicated-ip disable` | Disable the addon for the whole account |
+
+ Typical first-time flow when the user says "I need a static outbound IP":
+ 1. `zibby deploy <name> --dedicated-ip status` — check whether they have it
+ 2. If inactive → `zibby deploy <name> --dedicated-ip enable` — enables the account-wide addon (interactive billing prompt; a Pro subscription is a prerequisite)
+ 3. `zibby deploy <name> --dedicated-ip use` — opts this specific workflow in
+ 4. Regular `zibby workflow deploy <name>` from now on uses the static IP
+
+ After `--dedicated-ip use`, every node in this workflow gets its outbound HTTP routed through the egress proxy, and `process.env.HTTP_PROXY` / `HTTPS_PROXY` are set in the sandbox automatically. The assigned static IPs are published for customers at https://docs.zibby.app/workflows/egress.
+
+ **Don't** run `--dedicated-ip enable` without confirming with the user — it has billing impact ($50/mo addon). See `/zibby-static-ip` for the deeper walkthrough.
package/templates/zibby-workflow-claude/claude/commands/zibby-list.md
@@ -0,0 +1,30 @@
+ <!-- zibby-template-version: 4 -->
+ # /zibby-list — list workflows (local + cloud) with their UUIDs and statuses
+
+ You are helping the user see what workflows exist — locally scaffolded and remotely deployed.
+
+ Canonical docs: **https://docs.zibby.app/cli-reference#workflow-list**
+
+ ## Steps
+
+ 1. **Run the list command:**
+ ```
+ Bash(zibby workflow list)
+ ```
+ This shows both local (in `.zibby/workflows/`) and remote (deployed to Zibby Cloud) workflows. Each row has: name, UUID, project, last triggered.
+
+ 2. **Filter on demand.** If the user wants only local or only remote:
+ ```
+ zibby workflow list --local-only
+ zibby workflow list --remote-only --project <id>
+ ```
+
+ ## When you'd use this
+
+ - User asks "what workflows do I have?" → run it, show the result.
+ - You need a UUID to pass into `/zibby-trigger`, `/zibby-tail`, or `/zibby-delete` and the user only knows the name → run it, look up the UUID.
+ - After a deploy, to confirm the bundle landed.
+
+ ## Output expectations
+
+ The output is human-readable text (not JSON). If you need to extract a specific UUID programmatically, parse the line containing the workflow name. If the user has many workflows, ask which one they want — don't grep blind.
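+
+ If a script truly needs the UUID, a rough extraction sketch (the row layout is an assumption; prefer asking the user when the match is ambiguous):
+
+ ```js
+ // Pull the UUID for a named workflow out of `zibby workflow list` output.
+ import { execSync } from 'node:child_process';
+
+ const name = 'hello-world'; // hypothetical workflow name
+ const out = execSync('zibby workflow list', { encoding: 'utf8' });
+ const row = out.split('\n').find((l) => l.includes(name));
+ const uuid = row?.match(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/i)?.[0];
+ console.log(uuid ?? `no row matching "${name}"`);
+ ```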
package/templates/zibby-workflow-claude/claude/commands/zibby-memory-cost.md
@@ -0,0 +1,39 @@
+ <!-- zibby-template-version: 4 -->
+ # /zibby-memory-cost — show real LLM token spend across past test runs
+
+ You are helping the user see how many input/output/cache tokens their tests have actually burned, broken down per spec and per domain. This is real measured spend (read off run records in `.zibby/memory/.dolt/`), not an estimate.
+
+ Canonical docs: **https://docs.zibby.app/tests/memory**
+
+ ## What the command shows
+
+ ```
+ Bash(zibby memory cost)
+ ```
+
+ Per-spec and per-domain rollup of:
+ - Input tokens
+ - Output tokens
+ - Cache hit / cache write tokens (when the agent supports prompt caching)
+ - Estimated $ cost (uses current public model pricing)
+ - Recent-runs trend, so you can see if a spec is getting cheaper or more expensive over time
+
+ The numbers are pulled from `test_runs` rows in the Dolt DB — every test run records the agent's actual usage on completion.
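+
+ For intuition, the $ estimate is roughly this shape; the prices below are illustrative placeholders, while the real command tracks current public model pricing:
+
+ ```js
+ // Rough shape of the per-run cost estimate. PRICE values are made up
+ // (USD per million tokens); the token counts are the exact part.
+ const PRICE = { input: 3.0, output: 15.0, cacheRead: 0.3, cacheWrite: 3.75 };
+
+ const estimateUsd = (run) =>
+   (run.inputTokens * PRICE.input +
+     run.outputTokens * PRICE.output +
+     run.cacheReadTokens * PRICE.cacheRead +
+     run.cacheWriteTokens * PRICE.cacheWrite) / 1e6;
+
+ // One hypothetical recorded run:
+ console.log(estimateUsd({ inputTokens: 120_000, outputTokens: 8_000, cacheReadTokens: 400_000, cacheWriteTokens: 0 })); // 0.6
+ ```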
+
+ ## When to invoke
+
+ - User asks "how much are my tests costing me?" or "which spec is the expensive one?"
+ - After enabling prompt caching to confirm cache hits are landing
+ - When deciding whether to swap to a cheaper agent on hot specs (`--agent` per run)
+ - When triaging a regression in test runtime — high token counts often correlate with the agent retrying
+
+ ## Caveats
+
+ - **Only counts what's in local memory.** Runs on machines that haven't pulled from the team remote won't appear. Run `/zibby-memory-pull` first if you want the full team picture.
+ - **Pricing is informational.** Public API pricing changes; treat the $ column as a guide, not a bill. The token counts themselves are exact.
+ - **Empty if you've never run a test with memory enabled.** Confirm the runs are in there with `/zibby-memory-stats` first.
+
+ ## Related
+
+ - `/zibby-memory-stats` — what's in the DB at all
+ - `/zibby-memory-pull` — refresh from team remote before reading cost
package/templates/zibby-workflow-claude/claude/commands/zibby-memory-pull.md
@@ -0,0 +1,47 @@
+ <!-- zibby-template-version: 4 -->
+ # /zibby-memory-pull — pull the team's latest test memory from the configured remote
+
+ You are helping the user fetch the team's latest learnings (selectors, page model, insights, run history) from the project's configured memory remote into local `.zibby/memory/.dolt/`.
+
+ Canonical docs: **https://docs.zibby.app/tests/memory**
+
+ ## When a manual pull is needed (vs. the automatic sync)
+
+ `zibby test` auto-pulls before every run when a remote is configured, and auto-pushes after every passing run. So most of the time the user doesn't need to invoke pull manually. Manual pull is for:
+
+ - Fresh clone of the repo — first sync to seed `.zibby/memory/.dolt/` from the remote
+ - After a teammate landed a big batch of new learnings and the user wants them before running anything
+ - Inspecting team memory (`/zibby-memory-stats`, `/zibby-memory-cost`) without running a test
+ - Reconciling after a manual conflict in the Dolt DB
+
+ ## How to run
+
+ ```
+ Bash(zibby memory pull)
+ ```
+
+ The CLI fetches from whatever remote `zibby memory remote info` reports — a BYO S3/GCS/DoltHub URL or the Zibby-hosted backend. No flags.
+
+ ## Pre-flight: is a remote configured?
+
+ Before suggesting `pull`, check:
+
+ ```
+ Bash(zibby memory remote info)
+ ```
+
+ - **No remote configured** → pull errors out. Tell the user to either:
+   - Add their own: `zibby memory remote add aws://my-bucket/team/proj/main`
+   - Use the hosted one: `zibby memory remote use --hosted` (requires `zibby login`)
+   - See `/zibby-memory-remote-use-hosted` for the hosted path.
+ - **Hosted remote, signed out** → `zibby login` first.
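+
+ A tiny automation sketch of that pre-flight (assumes `zibby memory remote info` exits non-zero when no remote is configured; verify against your CLI version):
+
+ ```js
+ // Pull team memory only when a remote is actually configured.
+ import { execSync } from 'node:child_process';
+
+ try {
+   execSync('zibby memory remote info', { stdio: 'pipe' });
+ } catch {
+   console.error('No memory remote configured: `zibby memory remote add <url>` or `zibby memory remote use --hosted` first.');
+   process.exit(1);
+ }
+ execSync('zibby memory pull', { stdio: 'inherit' });
+ ```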
+
+ ## After pulling
+
+ Confirm the pull landed with `/zibby-memory-stats` — row counts should jump (selectors, runs, insights) compared to before.
+
+ ## Related
+
+ - `zibby memory push` — manual push (auto on passing test, but sometimes you want to share now)
+ - `/zibby-memory-stats` — verify what came in
+ - `/zibby-memory-remote-use-hosted` — switch to the Zibby-managed S3 backend
package/templates/zibby-workflow-claude/claude/commands/zibby-memory-remote-use-hosted.md
@@ -0,0 +1,61 @@
+ <!-- zibby-template-version: 4 -->
+ # /zibby-memory-remote-use-hosted — switch this project's memory remote to Zibby-managed S3
+
+ You are helping the user point their `.zibby/memory/.dolt/` at Zibby's hosted S3 backend, instead of running their own S3 bucket / GCS / DoltHub repo.
+
+ Canonical docs: **https://docs.zibby.app/tests/memory**
+
+ ## What this does
+
+ ```
+ Bash(zibby memory remote use --hosted)
+ ```
+
+ Allocates a tenant-scoped prefix on Zibby-managed S3 for this project (keyed on the projectId in `.zibby.config.mjs`) and writes that as the local Dolt remote. After this, every `zibby test` run auto-pulls before and auto-pushes after — same as a BYO remote, just without the bucket plumbing.
+
+ ## Prerequisite: signed in
+
+ Hosted remote is **signed-in users only**. Verify:
+
+ ```
+ Bash(zibby status)
+ ```
+
+ If not signed in, run `zibby login` first. The CLI uses the saved session to derive the tenant prefix; it won't fall back to anonymous.
+
+ ## When to use hosted vs BYO
+
+ | | Hosted (`--hosted`) | BYO (`zibby memory remote add aws://...`) |
+ |---|---|---|
+ | Setup time | Zero — `--hosted` and you're done | Provision an S3 bucket, IAM, optional KMS |
+ | Who can read | Everyone with project access on Zibby | Whoever you grant in IAM |
+ | Where data lives | Zibby-managed AWS account | Your account |
+ | Compliance / data residency | Limited regions | Wherever you want |
+ | Cost | Included in plan | Your S3 bill |
+
+ If the user has any data-residency requirement or a regulated workload, prefer BYO. Otherwise hosted is the path of least resistance.
+
+ ## Switching from BYO to hosted
+
+ `zibby memory remote use --hosted` overwrites the existing remote. If they had a BYO remote and might want to keep its history, run `zibby memory push` against the old remote first so nothing's lost — then switch.
+
+ ## After switching
+
+ 1. `zibby memory pull` — seed `.zibby/memory/.dolt/` from the hosted prefix (a no-op the very first time per project)
+ 2. `/zibby-memory-stats` — confirm the data landed
+ 3. Commit `.zibby.config.mjs` if you set `memorySync.remote: 'hosted'` so teammates auto-wire on next `zibby init` (sketch below)
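+
+ A sketch of that config entry; the surrounding shape of `.zibby.config.mjs` is assumed, and only the `memorySync.remote: 'hosted'` key comes from the step above:
+
+ ```js
+ // .zibby.config.mjs (sketch); a real config will carry more fields.
+ export default {
+   projectId: 'proj_abc123', // hypothetical id; the hosted prefix is keyed on this
+   memorySync: {
+     remote: 'hosted', // teammates auto-wire on their next `zibby init`
+   },
+ };
+ ```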
+
+ ## Reverting
+
+ ```
+ Bash(zibby memory remote remove)
+ ```
+
+ Drops the remote — memory becomes local-only again. The data on Zibby's S3 isn't deleted (it's still tenant-scoped), but nothing pushes or pulls until a new remote is configured.
+
+ ## Related
+
+ - `/zibby-memory-pull` — manual pull (auto on test start)
+ - `/zibby-memory-stats` — verify what's in the local DB
+ - `zibby memory remote info` — show current remote config
+ - `zibby memory remote add <url>` — BYO remote (S3/GCS/DoltHub/file:///)
package/templates/zibby-workflow-claude/claude/commands/zibby-memory-stats.md
@@ -0,0 +1,38 @@
+ <!-- zibby-template-version: 4 -->
+ # /zibby-memory-stats — inspect the local test memory database
+
+ You are helping the user see what's in their `.zibby/memory/.dolt/` test-memory DB — row counts per table, last commit, and per-spec breakdown.
+
+ Canonical docs: **https://docs.zibby.app/tests/memory**
+
+ ## What the command shows
+
+ ```
+ Bash(zibby memory stats)
+ ```
+
+ Prints a summary of the local Dolt database:
+ - **Test runs** — total runs recorded, pass/fail split, last run timestamp
+ - **Selectors** — total cached selectors, top pages by selector count
+ - **Page model** — pages mapped, total elements
+ - **Navigation** — known transitions
+ - **Insights** — count by category (`selector_tip | timing | navigation | workaround | flaky | general`)
+ - **Dolt status** — current branch, last commit hash, uncommitted changes
+
+ ## When to invoke
+
+ - User asks "what does Zibby know about my app?" or "show me what's in test memory"
+ - After running a few tests, to confirm the agent is actually persisting learnings
+ - Before a `zibby memory compact` to see how much there is to prune
+ - Before a `zibby memory remote add` to know what's about to ship to the team
+
+ ## Empty database?
+
+ If the user just ran `zibby memory init` (or it auto-initialized on first `zibby test`), most counts will be 0. That's fine — selectors and page model populate after the first successful run. Suggest running a test first.
+
+ ## Related commands
+
+ - `/zibby-memory-cost` — real LLM token spend per spec / per domain
+ - `/zibby-memory-pull` — pull team's latest learnings from the configured remote
+ - `zibby memory compact` — prune old runs (`--max-runs N`, `--max-age <days>`)
+ - `zibby memory reset -f` — wipe the DB (destructive — confirm first)
package/templates/zibby-workflow-claude/claude/commands/zibby-static-ip.md
@@ -0,0 +1,70 @@
+ <!-- zibby-template-version: 4 -->
+ # /zibby-static-ip — set up a dedicated outbound static IP for a workflow
+
+ You are helping the user route a workflow's outbound traffic through a static IP address — needed when the workflow calls APIs that require IP allowlisting (corporate GitLab/GitHub Enterprise, internal SaaS, firewalls).
+
+ Canonical docs: **https://docs.zibby.app/workflows/egress**
+
+ > Note: the `--dedicated-ip` flag lives on the legacy alias `zibby deploy <name>`, NOT on `zibby workflow deploy <name>`. The two share a handler, but only `zibby deploy` exposes this flag in `--help`.
+
+ ## What "static IP" means here
+
+ By default, workflow tasks run on Fargate and their outbound traffic exits via AWS-managed IPs that rotate. With the **dedicated egress** addon enabled, the workflow's outbound traffic is routed through a Zibby-managed proxy whose IP is pinned and customer-allowlistable.
+
+ Two pieces:
+ 1. **Account-level addon** — `~$50/mo`, requires a Pro subscription. One-time toggle per account.
+ 2. **Per-workflow opt-in** — once the addon is on, each workflow opts in individually (some workflows might not need it; no point routing them).
+
+ ## Steps
+
+ 1. **Confirm the user understands the cost.** Before any `--dedicated-ip enable`, explicitly say:
+ ```
+ "This will enable a $50/mo dedicated-egress addon on your account. Confirm?"
+ ```
+ If they don't have a Pro subscription, the enable call returns 402 — direct them to https://zibby.dev/billing first.
+
+ 2. **Check current state:**
+ ```
+ Bash(zibby deploy <name> --dedicated-ip status)
+ ```
+ Output tells you: addon active or inactive, whether this workflow is currently using it, and the assigned IPs to publish to customers.
+
+ 3. **If the addon is inactive — enable it** (only after explicit user confirmation):
+ ```
+ Bash(zibby deploy <name> --dedicated-ip enable)
+ ```
+ This is one-time per account. After this, the addon is active for ALL workflows in the account that opt in.
+
+ 4. **Opt this workflow in:**
+ ```
+ Bash(zibby deploy <name> --dedicated-ip use)
+ ```
+ From now on, every deploy of this workflow + every triggered execution routes outbound through the static IP.
+
+ 5. **Re-deploy the workflow** so the runtime picks up the change:
+ ```
+ Bash(zibby workflow deploy <name>)
+ ```
+
+ 6. **Verify in a node** by adding a quick log:
+ ```js
+ ctx.log(`HTTP_PROXY=${process.env.HTTP_PROXY}`);
+ ```
+ The proxy URL should be set on the node's process. The IP that external services see is the dedicated one.
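+
+ To check end-to-end, the node can ask an external echo service what IP it sees. ipify below is just an example service, not part of Zibby; also note that whether a given HTTP client honors `HTTP_PROXY` varies by library:
+
+ ```js
+ // Inside a node: confirm the egress IP that external services actually see.
+ const res = await fetch('https://api.ipify.org?format=json');
+ const { ip } = await res.json();
+ ctx.log(`Outbound IP: ${ip}`); // should match the IPs from --dedicated-ip status
+ ```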
+
+ ## Reverting
+
+ - `Bash(zibby deploy <name> --dedicated-ip unuse)` — stop routing this workflow's egress through the static IP. Other opted-in workflows are unaffected.
+ - `Bash(zibby deploy <name> --dedicated-ip disable)` — disable the addon entirely (also stops billing).
+
+ ## Tell the user the IPs
+
+ After `enable`, the assigned outbound IPs are visible in the `--dedicated-ip status` output. Surface them clearly so the user can paste them into their customer's firewall allowlist:
+ ```
+ Outbound static IPs (allowlist these in the customer's firewall):
+   54.252.121.111
+ ```
+
+ ## Don't confuse this with the inbound static IPs
+
+ This addon is OUTBOUND (workflow → external API). There's also an INBOUND static-IP set for `https://logs-stream.zibby.app` (the SSE log endpoint customers tail with `zibby workflow logs -t`). That one is unrelated to this addon — see https://docs.zibby.app/security/ip-allowlist.
package/templates/zibby-workflow-claude/claude/commands/zibby-tail.md
@@ -0,0 +1,53 @@
+ <!-- zibby-template-version: 4 -->
+ # /zibby-tail — stream live logs from a Zibby workflow
+
+ You are helping the user tail logs from a workflow execution.
+
+ `zibby workflow logs <jobId> -t` is the live-streaming variant (like `heroku logs --tail` or `docker compose logs -f`). Without `-t`, you get a one-shot fetch.
+
+ Canonical docs: **https://docs.zibby.app/workflows/logs**
+
+ ## What `<jobId>` accepts
+
+ - A **workflow UUID** (`2b1ea07f-...`) → tails ALL currently-active executions of that workflow, plus any new ones triggered while the tail is open. Lines are interleaved by arrival time, prefixed with `(taskId)` so concurrent runs are distinguishable.
+ - An **execution ID** (`abc-...`) → tails just that single execution. Closes when the execution drains.
+
+ ## Example flow
+
+ ```
+ $ zibby workflow trigger 2b1ea07f-3ede-4bfd-a51d-431f0bab008e
+ Job ID: 569ef1ee-15c8-4ee7-af30-933c8bec7ea7
+
+ $ zibby workflow logs 2b1ea07f-3ede-4bfd-a51d-431f0bab008e -t
+ Streaming logs for workflow 2b1ea07f-...
+ Press Ctrl+C to stop.
+
+ ┌─ Execution: 569ef1ee...7ea7 (task: 9e61c690)
+ └─ Streaming logs...
+
+ 2026-05-04 06:16:27.621 (9e61c690) zibby v0.1.91
+ 2026-05-04 06:16:27.997 (9e61c690) Workflow: hello-world
+ 2026-05-04 06:16:33.055 (9e61c690) │ Prompt sent to LLM:
+ ...
+ 2026-05-04 06:16:56.389 (9e61c690) ✓ Workflow completed
+ ```
+
+ ## Tail behavior worth knowing
+
+ - **Auto-switch:** when the current execution drains and a new trigger lands, the tail seamlessly picks it up — no need to re-run the command.
+ - **Multi-attach:** if you trigger two workflows in parallel, the tail attaches to both. Lines from each are interleaved by timestamp.
+ - **Reconnect:** transient network blips reconnect silently. Long quiet windows (3+ min) are kept alive via 5s keepalives.
+
+ ## Steps for this command
+
+ 1. Identify the jobId. If the user gives a workflow name (not a UUID), look it up via `zibby workflow list` and use the UUID.
+ 2. **Run in the background** — `-t` is a long-running stream. If you call `Bash` without `run_in_background: true`, it'll block the chat for minutes:
+ ```
+ Bash({ command: "zibby workflow logs <jobId> -t", run_in_background: true })
+ ```
+ Then use `BashOutput` (or whatever your background-output tool is) to read new lines as they arrive.
+ 3. If the user wants a **one-shot snapshot** (no follow), drop the `-t` flag and call Bash normally:
+ ```
+ Bash({ command: "zibby workflow logs <jobId>" })
+ ```
+ 4. To stop the live tail, kill the background bash task. The CLI exits cleanly with `Stopped streaming.`
package/templates/zibby-workflow-claude/claude/commands/zibby-test-debug.md
@@ -0,0 +1,59 @@
+ <!-- zibby-template-version: 4 -->
+ # /zibby-test-debug — diagnose a failing Zibby test
+
+ You are helping the user figure out why a test failed.
+
+ Canonical docs: **https://docs.zibby.app/tests/debugging**
+
+ ## Diagnostic recipe
+
+ Apply in order. Stop at the first thing that explains the symptom.
+
+ ### 1. Was the failure in the AGENT phase or the PLAYWRIGHT phase?
+
+ Two places things go wrong:
+ - **Agent phase** (`generate_script` / `execute_live`) — the AI couldn't figure out what to do or what selector to use. Look for `MCP error`, `tool timeout`, or generic LLM gibberish in the output.
+ - **Playwright phase** (after script generation) — the script ran but an assertion failed or a selector didn't match. Look for `expect(...)` failures, `Timeout` waiting for selector, or "element not visible".
+
+ ### 2. Open the artifacts
+
+ Each run produces:
+ - `test-results/<spec-name>/video.webm` — watch what happened
+ - `test-results/<spec-name>/trace.zip` — Playwright trace; open with `npx playwright show-trace <path>`
+ - `.zibby/output/sessions/<session-id>/` — the agent's internal state, prompts, responses
+ - `tests/<spec-name>.spec.js` — the generated Playwright code
+
+ The video usually tells you in 30 seconds what's wrong.
+
+ ### 3. Check the spec
+
+ Spec ambiguity is the most common cause. If the spec says "click the button" and there are five buttons, the agent picks one — possibly the wrong one. Re-read the spec asking: would a stranger know exactly which element this means?
+
+ ### 4. Re-run with more verbosity
+
+ ```
+ Bash(zibby test test-specs/<name>.txt --verbose) # info-level logs
+ Bash(zibby test test-specs/<name>.txt --debug) # all logs, lots
+ Bash(zibby test test-specs/<name>.txt) # headed by default, so you can watch the browser
+ ```
+
+ ### 5. Re-execute one node from a prior session
+
+ If `execute_live` failed but the script generation was OK, re-execute just that node against the existing session:
+ ```
+ Bash(zibby test --node execute_live --session last)
+ ```
+
+ ### 6. Common errors and fixes
+
+ | Symptom | Likely cause | Fix |
+ |---------|-------------|-----|
+ | "ZIBBY_API_KEY required" | not authenticated | `Bash(zibby login)` |
+ | "MCP server not responding" | playwright-mcp config drifted | `Bash(zibby setup-playwright)` |
+ | "Selector not found" | UI changed since last run | re-generate from the spec, or update the selector in `.spec.js` |
+ | Agent loops forever in `execute_live` | spec is too vague | tighten the spec; add the explicit selector text |
+ | "Module not found" | missing dep in the repo | `Bash(npm install)` in the repo root |
+
+ ### 7. When to escalate
+
+ If the same spec passed yesterday and fails today, and the codebase didn't change → check whether Zibby pushed an agent update (`zibby --version`). Otherwise it's almost always spec ambiguity or a real product regression.
package/templates/zibby-workflow-claude/claude/commands/zibby-test-generate.md
@@ -0,0 +1,39 @@
+ <!-- zibby-template-version: 4 -->
+ # /zibby-test-generate — generate test specs from a Jira ticket / requirements
+
+ You are helping the user auto-generate test specs from a ticket description (Jira) or a free-text requirements doc. Zibby's `generate` command runs the configured AI agent against the codebase + ticket and produces `.txt` specs in `test-specs/`.
+
+ Canonical docs: **https://docs.zibby.app/tests/generating**
+
+ ## Inputs the command accepts
+
+ - **Jira ticket key:** `-t ENG-1234` — fetches the ticket from the configured Jira integration
+ - **Inline description:** `-d "When a user clicks Buy, charge the card and email a receipt"`
+ - **File:** `-i path/to/requirements.md`
+ - **Repo path** (defaults to cwd): `--repo /path/to/codebase`
+
+ ## Steps
+
+ 1. **Get the source.** Ask the user: ticket key, file, or paste the requirements?
+
+ 2. **Run generate:**
+ ```
+ Bash(zibby generate -t ENG-1234)
+ Bash(zibby generate -i requirements.md)
+ Bash(zibby generate -d "happy-path checkout flow")
+ ```
+
+ 3. **Pick the agent and model** if the user has preferences:
+ ```
+ Bash(zibby generate -t ENG-1234 --agent claude --model claude-opus-4)
+ ```
+
+ 4. **Output destination.** Specs land in `test-specs/` by default; override with `-o <dir>`.
+
+ 5. **Review before running.** After generation, the user should read the spec(s) and tweak them. Then `/zibby-test-run` to execute.
+
+ ## Tips
+
+ - **Better tickets = better specs.** Vague tickets ("fix login") generate vague tests. Tight acceptance criteria ("login redirects to /dashboard on success, shows 'invalid' on failure") generate sharp specs.
+ - **Generation is iterative.** If the first cut is wrong, re-run with `-d "be more specific about X"` rather than hand-editing.
+ - **The agent reads the codebase.** It looks at component files, route definitions, etc., so the specs reference real selectors and URLs. Run from the actual repo root, not a subdir.
package/templates/zibby-workflow-claude/claude/commands/zibby-test-run.md
@@ -0,0 +1,49 @@
+ <!-- zibby-template-version: 4 -->
+ # /zibby-test-run — execute a Zibby test spec
+
+ You are helping the user run an existing test spec through Zibby. A spec is a `.txt` file describing what to test in plain language; Zibby's runner turns it into a Playwright execution and produces a video + JSON results.
+
+ Canonical docs: **https://docs.zibby.app/tests/running**
+
+ ## Steps
+
+ 1. **Identify the spec.** Most projects keep specs under `test-specs/` (configurable in `.zibby.config.mjs` `paths.specs`). If the user named one, use it. Otherwise list what's there and ask:
+ ```
+ ls test-specs/
+ ```
+
+ 2. **Run it.** From the project root:
+ ```
+ Bash(zibby test test-specs/<name>.txt)
+ ```
+ For a quick inline test without writing a spec file:
+ ```
+ Bash(zibby test "go to example.com and check that the title contains Example")
+ ```
+
+ 3. **Output to expect.** Zibby streams the run live — agent thinking, browser actions, assertion results, final pass/fail. The generated `.spec.js` lands in `tests/<name>.spec.js` (configurable via `paths.generated`). Video + traces go under `test-results/`.
+
+ 4. **If running headless / in CI:**
+ ```
+ Bash(zibby test test-specs/<name>.txt --headless)
+ ```
+
+ 5. **If running a specific node only** (advanced — re-execute one phase of a prior session):
+ ```
+ Bash(zibby test --node execute_live --session last)
+ ```
+
+ ## Useful flags
+
+ - `--agent claude|cursor|codex|gemini` — override the configured agent for this run
+ - `--workflow QuickSmokeWorkflow` — use a non-default workflow for the run
+ - `--verbose` / `--debug` — escalate log levels
+ - `-m, --mem` — enable test memory (Dolt-backed knowledge from prior runs)
+ - `--sync` / `--no-sync` — force / skip cloud upload regardless of config
+ - `--sources <ids> --execution <id>` — run cloud-stored test cases from a specific execution (comma-separated IDs)
+
+ ## Common failure modes
+
+ - **"No spec found"** — the path is relative to the project root, not the cwd. Check `paths.specs` in `.zibby.config.mjs`.
+ - **"Browser crashed"** — usually the Playwright browser cache is stale. Run once without `--headless` (headed is the default) to see what's happening, then re-add `--headless` once it's healthy.
+ - **MCP errors during `execute_live`** — the agent's MCP tool config may need refreshing. See `/zibby-test-debug`.
package/templates/zibby-workflow-claude/claude/commands/zibby-test-write.md
@@ -0,0 +1,46 @@
+ <!-- zibby-template-version: 4 -->
+ # /zibby-test-write — author a new Zibby test spec
+
+ You are helping the user write a new test spec. Specs are plain-language `.txt` files in `test-specs/` (configurable via `.zibby.config.mjs` `paths.specs`). Zibby's runner converts them to Playwright at execution time.
+
+ Canonical docs: **https://docs.zibby.app/tests/specs**
+
+ ## Spec format (informal but conventional)
+
+ A spec is mostly imperative English with one action per line. Common shape:
+
+ ```
+ Title: <one-line summary>
+
+ Setup:
+ - Open <url>
+ - Log in as <user>
+
+ Steps:
+ - Click <element>
+ - Type <value> into <field>
+ - Wait for <state>
+
+ Verify:
+ - <assertion>
+ - <assertion>
+ ```
+
+ Zibby tolerates loose phrasing — what matters is being unambiguous about WHICH element and WHAT value. Use stable selectors (visible text, ARIA labels) over CSS class names.
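+
+ As a concrete instance (hypothetical app, URL, and credentials), the shape above filled in:
+
+ ```
+ Title: User can add an item to the cart
+
+ Setup:
+ - Open https://shop.example.com
+ - Log in as demo@example.com
+
+ Steps:
+ - Click the "Laptops" category link
+ - Click the "Add to cart" button on the first product card
+ - Wait for the cart badge to update
+
+ Verify:
+ - The cart badge shows "1"
+ - A toast saying "Added to cart" is visible
+ ```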
+
+ ## Steps for this command
+
+ 1. **Ask the user what they want to test.** What's the user flow? What are they verifying? What URL?
+ 2. **Find a similar existing spec to mirror.** `ls test-specs/` and read 1-2 to match the project's conventions.
+ 3. **Write the spec to `test-specs/<kebab-case-name>.txt`** using the `Write` tool.
+ 4. **Offer to run it immediately** with `/zibby-test-run` (or just `Bash(zibby test test-specs/<name>.txt)`).
+
+ ## Naming conventions
+
+ - kebab-case: `login-with-sso.txt`, `cart-checkout-happy-path.txt`
+ - Group by feature: `users-create.txt`, `users-edit.txt`, `users-delete.txt`
+ - Avoid ambiguous names like `test1.txt`
+
+ ## When the spec is complex
+
+ For multi-page flows or many assertions, split the work into multiple specs and run them as a collection. Don't pile everything into one spec — Playwright errors are easier to localize when each spec covers one user goal.