@zibby/skills 0.1.27 → 0.1.28

package/dist/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@zibby/skills",
3
- "version": "0.1.26",
3
+ "version": "0.1.28",
4
4
  "description": "Built-in skill definitions for Zibby test automation framework",
5
5
  "type": "module",
6
6
  "main": "dist/index.js",
@@ -0,0 +1,114 @@
1
+ ---
2
+ sidebar_position: 4
3
+ title: Agent operator
4
+ ---
5
+
6
+ # The agent-ops sidecar
7
+
8
+ Every Zibby Managed App ships with **agent-ops**, an autonomous daemon sidecar that runs alongside the app container. It does what a human operator would do — checks health, restarts on crash, prunes disk, rolls upgrades — only it does it every hour, never sleeps, and never forgets to file a runbook.
9
+
10
+ This is the structural difference between "deploy button on a VM" and **Zibby**.
11
+
12
+ ## What it does
13
+
14
+ | Task | Cadence | What happens |
15
+ |---|---|---|
16
+ | **Hourly health check** | every 60 min | HTTP probe + container state + EFS usage. Recorded as a structured run record. |
17
+ | **Self-heal on OOM** | event-driven | Container exits with OOMKilled → agent-ops triggers an ECS restart, records the recovery. |
18
+ | **Disk-pressure prune** | when EFS > 90% | Removes safe-to-delete caches (e.g. n8n execution history older than 30 days). Configurable. |
19
+ | **Upgrade orchestration** | on schedule | When a new app version lands in the catalog, agent-ops can run the in-place upgrade on a cron you set. |
20
+ | **Activity log** | every action | One row in the app's "Agent activity" tab, with structured fields you can grep / chart. |
21
+
22
+ Every action lands in DynamoDB as an `app-runs` record — queryable by anything from a workflow node to a Grafana dashboard.
23
+
24
+ ## See it in action
25
+
26
+ ```bash
27
+ zibby app activity a1b2c3d4
28
+ ```
29
+
30
+ ```
31
+ Time Action Status Duration Notes
32
+ 14:00:01 hourly_health_check ok 1.2s
33
+ 13:00:01 hourly_health_check ok 1.1s
34
+ 12:04:38 oom_recovery ok 4.8s restarted container after OOMKilled
35
+ 12:00:01 hourly_health_check warn 2.1s container restarting
36
+ 11:00:01 hourly_health_check ok 1.3s
37
+ 10:00:01 hourly_health_check ok 1.0s
38
+ ```
39
+
40
+ The dashboard's "Agent activity" tab shows the same records with extra context (HTTP status codes, container logs at failure time, recovery diff).
41
+
42
+ ## Run records ≠ logs
43
+
44
+ A run record is **structured metadata** about something agent-ops did:
45
+
46
+ ```json
47
+ {
48
+ "instanceId": "a1b2c3d4",
49
+ "runId": "01J9KZQF...",
50
+ "action": "hourly_health_check",
51
+ "status": "ok",
52
+ "startedAt": "2026-05-30T14:00:01Z",
53
+ "duration_ms": 1234,
54
+ "httpStatus": 200,
55
+ "containerState": "RUNNING"
56
+ }
57
+ ```
58
+
59
+ Logs are unstructured text. Run records are queryable, chartable, and aggregatable — that's why the Agent activity tab can show a 30-day uptime % without you running grep.
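To illustrate why structured records aggregate more easily than log text, here is a minimal sketch that derives an uptime percentage from a list of run records. The `uptimePercent` helper is hypothetical, and treating "share of health checks with status `ok`" as uptime is an assumption; the docs do not say how the Agent activity tab actually computes its figure.

```javascript
// Hypothetical helper: share of health-check run records whose status is "ok".
// Assumes records use the `action` and `status` fields from the example above.
function uptimePercent(records) {
  const checks = records.filter((r) => r.action === "hourly_health_check");
  if (checks.length === 0) return null; // no checks recorded, so no figure
  const ok = checks.filter((r) => r.status === "ok").length;
  return (ok / checks.length) * 100;
}

// Sample data modelled on the activity listing earlier on this page.
const sampleRecords = [
  { action: "hourly_health_check", status: "ok" },
  { action: "hourly_health_check", status: "ok" },
  { action: "oom_recovery", status: "ok" }, // not a health check, ignored
  { action: "hourly_health_check", status: "warn" },
  { action: "hourly_health_check", status: "ok" },
];

console.log(uptimePercent(sampleRecords)); // 75
```

The same filter-then-count shape works for any other field in the record, which is the practical difference from grepping free-form log lines.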
60
+
61
+ The CLI exposes them:
62
+
63
+ ```bash
64
+ zibby app activity a1b2c3d4 --since 7d
65
+ zibby app activity a1b2c3d4 --action oom_recovery
66
+ zibby app activity a1b2c3d4 --json | jq '.[] | select(.status == "warn")'
67
+ ```
68
+
69
+ ## Why a sidecar, not a centralized controller?
70
+
71
+ Three properties only this shape gives you:
72
+
73
+ 1. **No noisy-neighbor failure mode** — your instance's agent-ops can't be blocked by another instance's slow health check
74
+ 2. **Per-instance customization without a central feature flag** — env vars (`AGENT_OPS_CHECK_INTERVAL_MIN=15`, `AGENT_OPS_PRUNE_THRESHOLD=80`) live on your instance and just work
75
+ 3. **Egress identity matches the app** — outbound calls from agent-ops use the same ENI / NAT path as the app itself, so when the app has a dedicated egress IP, agent-ops's webhook callbacks come from the same IP
76
+
77
+ ## Customize via env
78
+
79
+ Per-instance agent-ops behavior is tunable via env vars (set on the app instance — Apps → ENV tab — or via `zibby app env set`):
80
+
81
+ | Env var | Default | What it controls |
82
+ |---|---|---|
83
+ | `AGENT_OPS_CHECK_INTERVAL_MIN` | 60 | Minutes between health checks (hourly by default) |
84
+ | `AGENT_OPS_PRUNE_THRESHOLD` | 90 | EFS usage % that triggers disk-pressure prune |
85
+ | `AGENT_OPS_AUTO_UPGRADE` | `false` | If `true`, upgrade automatically when catalog publishes a new version |
86
+ | `AGENT_OPS_NOTIFY_WEBHOOK` | — | URL to POST run records to (any HTTPS endpoint — your own backend, n8n, etc.) |
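The table's defaults can be read as a small configuration loader. The sketch below is illustrative only: `loadAgentOpsConfig` and the returned property names are hypothetical, and agent-ops' real parsing rules (for example, how it treats invalid numbers) are not documented here.

```javascript
// Hypothetical loader: maps the documented env vars to a config object,
// falling back to the defaults listed in the table above.
function loadAgentOpsConfig(env) {
  const toInt = (value, fallback) => {
    const n = Number.parseInt(value ?? "", 10);
    return Number.isNaN(n) ? fallback : n; // unset or non-numeric -> default
  };
  return {
    checkIntervalMin: toInt(env.AGENT_OPS_CHECK_INTERVAL_MIN, 60),
    pruneThreshold: toInt(env.AGENT_OPS_PRUNE_THRESHOLD, 90),
    autoUpgrade: env.AGENT_OPS_AUTO_UPGRADE === "true",
    notifyWebhook: env.AGENT_OPS_NOTIFY_WEBHOOK || null,
  };
}

// With nothing set, every field takes its documented default.
console.log(loadAgentOpsConfig({}));
```

Passing a plain object instead of reading `process.env` directly keeps the sketch easy to exercise with different settings.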
87
+
88
+ `AGENT_OPS_NOTIFY_WEBHOOK` is how you wire agent-ops into your existing observability stack — fire every run record into your team's #ops Slack via a workflow trigger, into Datadog via their webhook receiver, or into your own database.
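As a rough sketch of what a receiving endpoint might do with each POSTed run record, the function below sorts records into three buckets. It assumes the webhook body is the run-record JSON shown under "Run records ≠ logs"; `routeRunRecord` and the bucket names (`page`, `notify`, `ignore`) are hypothetical and not part of Zibby.

```javascript
// Hypothetical triage for an incoming run record (already JSON-parsed).
function routeRunRecord(record) {
  // Recoveries and failed upgrades are treated as worth waking someone for.
  if (record.action === "oom_recovery" || record.action === "failed_upgrade") {
    return "page";
  }
  // Anything else that is not "ok" gets a low-priority notification.
  if (record.status !== "ok") return "notify";
  return "ignore";
}

// Example body, using only fields that appear in the documented record.
const body = JSON.stringify({
  instanceId: "a1b2c3d4",
  action: "hourly_health_check",
  status: "warn",
  httpStatus: 200,
});

console.log(routeRunRecord(JSON.parse(body))); // notify
```

Keeping the decision in a pure function makes it independent of whichever HTTP framework or workflow node actually receives the request.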
89
+
90
+ ## Hooking agent-ops into a workflow
91
+
92
+ The most powerful pattern: a Zibby workflow that runs **on agent-ops events**.
93
+
94
+ Example: when an `oom_recovery` fires, run a workflow that pulls the container's last-100-lines, classifies the crash, and pages whoever owns this app:
95
+
96
+ ```bash
97
+ zibby app env set a1b2c3d4 \
98
+ AGENT_OPS_NOTIFY_WEBHOOK=https://api-prod.zibby.app/v1/workflows/<wf-uuid>/trigger
99
+ ```
100
+
101
+ The workflow receives the run record as `input`, can call back to `zibby app logs` / `zibby app status`, and decides what to do. Agent-ops + workflows compose into a self-operating fleet — humans only get pinged for genuinely novel failure modes.
102
+
103
+ ## Upgrade orchestration
104
+
105
+ When you `zibby app upgrade <id>` manually, agent-ops watches the rollout and rolls back if the new task fails health checks twice in a row. With `AGENT_OPS_AUTO_UPGRADE=true` set, the upgrade fires on a cron (default: weekly, Sunday 04:00 UTC) — agent-ops runs the same flow:
106
+
107
+ 1. Register new task definition revision (catalog's latest)
108
+ 2. Update service, watch the rollout
109
+ 3. If 2 consecutive health checks pass on the new revision → keep it
110
+ 4. If 2 fail → roll back to the previous revision, log a `failed_upgrade` run record
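The keep-or-roll-back rule in steps 3 and 4 can be sketched as a small function over a sequence of health-check results. This is an illustration under assumptions, not agent-ops' implementation: `decideRollout` is a hypothetical name, and "2 fail" is read as two failures in a row, matching the "twice in a row" wording above.

```javascript
// Hypothetical decision rule: `healthChecks` is an ordered array of booleans
// (true = check passed). Returns "keep", "rollback", or "pending".
function decideRollout(healthChecks, needed = 2) {
  let streak = 0;
  let last = null;
  for (const passed of healthChecks) {
    // Extend the streak if this result matches the previous one, else restart.
    streak = passed === last ? streak + 1 : 1;
    last = passed;
    if (streak >= needed) return passed ? "keep" : "rollback";
  }
  return "pending"; // not enough consecutive results either way yet
}

console.log(decideRollout([true, true])); // keep
console.log(decideRollout([false, true, false, false])); // rollback
console.log(decideRollout([true, false, true])); // pending
```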
111
+
112
+ The activity log shows the full attempted upgrade timeline so you can see why a rollback happened.
113
+
114
+ → Done with apps. See [Workflows](../get-started/your-first-workflow) for the agent-pipeline counterpart, or [CLI Reference](../cli-reference#app-commands) for the full `zibby app` command surface.
@@ -0,0 +1,120 @@
1
+ ---
2
+ sidebar_position: 2
3
+ title: Deploy your first app
4
+ ---
5
+
6
+ # Deploy your first app
7
+
8
+ A complete walk-through, from `zibby app templates` to a running n8n instance behind a stable URL, in roughly 45-90 seconds.
9
+
10
+ ## Prerequisites
11
+
12
+ You'll need the CLI installed and authenticated:
13
+
14
+ ```bash
15
+ npm install -g @zibby/cli
16
+ zibby login # OAuth in browser, saves session to ~/.zibby/config.json
17
+ ```
18
+
19
+ You also need a project. If you don't have one yet, deploy a workflow first or create one in the [Zibby dashboard](https://studio.zibby.dev) — apps are scoped to projects so per-instance EFS volumes can be isolated per team.
20
+
21
+ ## Browse the catalog
22
+
23
+ ```bash
24
+ zibby app templates
25
+ ```
26
+
27
+ ```
28
+ ID Display name Tier Rate Description
29
+ n8n n8n Light $0.05/hr Workflow automation. 200+ integrations.
30
+ grafana Grafana Light $0.05/hr Dashboards for metrics, logs, traces.
31
+ gastown Gas Town Light $0.05/hr Multi-agent workspace. Coordinate Claude, Codex, Cursor, Gemini.
32
+ drawio draw.io Light $0.05/hr Client-side diagram editor. Flowcharts, UML, ER, network.
33
+ open-webui Open WebUI Heavy $0.25/hr ChatGPT-style UI for Ollama / OpenAI-compatible endpoints.
34
+ ```
35
+
36
+ ## Deploy
37
+
38
+ ```bash
39
+ zibby app deploy n8n --project <project-id> --name automations
40
+ ```
41
+
42
+ On success:
43
+
44
+ ```
45
+ ↑ Provisioning n8n on Fargate…
46
+ ECS service + EFS volume + ALB target group
47
+ agent-ops sidecar starting…
48
+ ✔ Deployed (instanceId: a1b2c3d4)
49
+ → Public URL: https://a1b2c3d4.apps.zibby.dev
50
+ ```
51
+
52
+ `--project` is interactive-prompted if omitted (CLI walks you through your project list).
53
+ `--name` controls the **display name** — what shows in `zibby app list` and the dashboard. The subdomain is a separate opaque identifier (the instance ID), stable for the life of the instance.
54
+
55
+ The provisioning steps:
56
+
57
+ 1. **Allocate an instance ID** — short hex token used as the subdomain
58
+ 2. **Create EFS access point** — per-instance volume, encrypted at rest, AZ-pinned
59
+ 3. **Register task definition** — pinned to the catalog entry's image + your resource tier
60
+ 4. **Spin up ECS service** — desired count 1, agent-ops sidecar bundled alongside the app container
61
+ 5. **Wire the ALB** — listener rule routes `<id>.apps.zibby.dev` to the new target group
62
+ 6. **Health-check loop** — the first agent-ops tick fires once the container is up
63
+
64
+ Wall-clock: ~45-90 seconds. The CLI streams progress and prints the public URL the moment the ALB is responsive.
65
+
66
+ ## Verify
67
+
68
+ ```bash
69
+ zibby app status a1b2c3d4
70
+ ```
71
+
72
+ ```
73
+ ● automations (n8n v1.97.1)
74
+ ┌ status running (1/1) ✓
75
+ ├ resources 0.5 vCPU · 1 GB RAM ✓
76
+ └ hourly $0.05/hr ✓
77
+
78
+ Public URL: https://a1b2c3d4.apps.zibby.dev
79
+
80
+ Last agent-ops run: 14:00:01 hourly_health_check ok (1.2s)
81
+ ```
82
+
83
+ Open the URL in a browser — n8n's setup screen renders, you create the admin account, and the instance is private to you. The data sits on the EFS volume, encrypted and isolated; no other Zibby customer can reach it.
84
+
85
+ ## Watch logs while it warms up
86
+
87
+ If the app behaves oddly on first launch, tail logs:
88
+
89
+ ```bash
90
+ zibby app logs a1b2c3d4 -t
91
+ ```
92
+
93
+ Logs cover both the app container **and** the agent-ops sidecar. Container logs are color-coded by source:
94
+
95
+ ```
96
+ 14:00:00.122 [n8n] Listening on port 5678
97
+ 14:00:01.044 [agent-ops] hourly_health_check: HTTP 200 in 1.2s
98
+ 14:00:01.061 [agent-ops] ✓ instance healthy — next tick in 60m
99
+ ```
100
+
101
+ `Ctrl+C` exits tail mode; logs persist in CloudWatch with 30-day retention.
102
+
103
+ ## What's actually private vs shared
104
+
105
+ Mental model that lines up with what the bill shows:
106
+
107
+ | Resource | Per-instance? |
108
+ |---|---|
109
+ | Subdomain (`<id>.apps.zibby.dev`) | ✅ Yours |
110
+ | EFS volume | ✅ Yours, encrypted |
111
+ | ALB target group | ✅ Yours |
112
+ | ECS task definition | ✅ Yours (revisions tracked) |
113
+ | Fargate task | ✅ Yours |
114
+ | ALB itself | Shared — pooled across all tenants |
115
+ | ECS cluster | Shared |
116
+ | EFS file system | Shared, but per-instance access points enforce isolation |
117
+
118
+ The shared bits are why per-minute pricing can be $0.05/hr instead of $30/mo — economies of scale on the platform side.
119
+
120
+ → Next: [Manage instances](./managing)
@@ -0,0 +1,74 @@
1
+ ---
2
+ sidebar_position: 1
3
+ title: Apps overview
4
+ ---
5
+
6
+ # Managed Apps
7
+
8
+ One-click hosted instances of open-source tools (n8n, Grafana, Open WebUI, draw.io, Gas Town, …), each private to your project — with an **autonomous agent-ops sidecar** that handles health checks, self-healing, and upgrades on its own.
9
+
10
+ ```bash
11
+ zibby app templates # browse the catalog
12
+ zibby app deploy n8n # one-click — ECS service + EFS volume + ALB target group
13
+ zibby app logs <id> -t # tail logs, SSE auto-reconnect
14
+ zibby app status <id> # uptime, cost, version, agent-ops activity
15
+ ```
16
+
17
+ ## Why apps (not workflows)
18
+
19
+ Both are pillars of Zibby Cloud. Pick by **how long the thing needs to run**:
20
+
21
+ | | **Workflow** | **App** |
22
+ |---|---|---|
23
+ | Lifetime | Per-trigger (seconds to minutes) | Long-lived (24/7 or paused) |
24
+ | Surface | A graph of agent CLI calls | A whole open-source application |
25
+ | Billing | Per execution | Per minute, while running |
26
+ | Persistence | Session JSONL + S3 artifacts | Encrypted-at-rest EFS volume |
27
+ | Best for | "When ticket lands, classify it" | "Host n8n for the team" |
28
+
29
+ If you find yourself wanting to **run an open-source web app behind a stable URL**, that's an App. If you want **agent-driven business logic that fires on events**, that's a Workflow.
30
+
31
+ ## What you get with every app
32
+
33
+ - **Private subdomain** — `<instance-id>.apps.zibby.dev`, TLS by default
34
+ - **Dedicated EFS volume** — encrypted-at-rest, persists across container restarts and upgrades
35
+ - **Per-instance ALB target group** — only your instance sits behind it; the ALB itself is shared across tenants
36
+ - **Per-minute Fargate billing** — including the agent-ops sidecar; billing stops when the instance is destroyed
37
+ - **agent-ops sidecar** (see [Agent operator](./agent-ops)) — hourly health checks, self-healing, upgrades
38
+ - **SSE log streaming** — `zibby app logs -t` tails any container from anywhere
39
+ - **Dedicated egress IP addon** — pin outbound HTTPS through one whitelistable IP for self-hosted GitLab / Salesforce / Oracle Cloud
40
+
41
+ ## The catalog
42
+
43
+ Each marketplace entry is a curated bundle: container image, EFS volume layout, ALB wiring, secrets pattern, resource defaults. Today's catalog:
44
+
45
+ | App | Category | Tier | Rate |
46
+ |---|---|---|---|
47
+ | **n8n** | Workflow automation | Light | $0.05/hr |
48
+ | **Grafana** | Metrics + dashboards | Light | $0.05/hr |
49
+ | **Gas Town** | Multi-agent workspace | Light | $0.05/hr |
50
+ | **draw.io** | Diagrams + flowcharts | Light | $0.05/hr |
51
+ | **Open WebUI** | ChatGPT-style UI for Ollama | Heavy | $0.25/hr |
52
+
53
+ `zibby app templates` is the canonical, always-up-to-date list — the table above is a snapshot.
54
+
55
+ ## How tiers work
56
+
57
+ The catalog groups apps into three resource tiers:
58
+
59
+ | Tier | CPU | RAM | Rate |
60
+ |---|---|---|---|
61
+ | **Light** | 0.5 vCPU | 1 GB | $0.05/hr |
62
+ | **Standard** | 1 vCPU | 2 GB | $0.12/hr |
63
+ | **Heavy** | 2 vCPU | 4 GB | $0.25/hr |
64
+
65
+ Per-instance resource overrides are supported when you need to bump CPU / memory for one specific deployment without forking the catalog entry. See [Managing instances → resource overrides](./managing#resource-overrides).
66
+
67
+ ## Pricing model
68
+
69
+ - **Per-minute Fargate billing** while the instance is running, scoped to the tier above
70
+ - **No flat platform fee** for apps — you pay only for what's running
71
+ - **Destroy to stop the meter** — `zibby app destroy` stops billing immediately; redeploy when you need the app back (data is gone after destroy; pause-without-destroy is on the roadmap)
72
+ - **Free tier**: $10 in credits on signup, enough to run a Light app for ~8 days
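The "~8 days" figure follows from the hourly rate: $10 / $0.05 per hour = 200 hours, or about 8.3 days. A small sketch of the same arithmetic for each tier is below; the rates come from the tier table above, while `runtimeDays` is a hypothetical helper that assumes the instance runs continuously with no other charges.

```javascript
// Hourly rates copied from the tier table on this page.
const HOURLY_RATE_USD = { Light: 0.05, Standard: 0.12, Heavy: 0.25 };

// Hypothetical helper: days of continuous runtime a credit balance covers.
function runtimeDays(creditsUsd, tier) {
  const rate = HOURLY_RATE_USD[tier];
  if (rate === undefined) throw new Error(`unknown tier: ${tier}`);
  return creditsUsd / rate / 24; // hours of runtime, converted to days
}

console.log(runtimeDays(10, "Light").toFixed(1)); // 8.3
console.log(runtimeDays(10, "Heavy").toFixed(1)); // 1.7
```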
73
+
74
+ → Next: [Deploy your first app](./deploy)
@@ -0,0 +1,121 @@
1
+ ---
2
+ sidebar_position: 3
3
+ title: Manage instances
4
+ ---
5
+
6
+ # Operating instances
7
+
8
+ Every lifecycle action — restart, scale, upgrade, rotate credentials, tear down — is one CLI call. All operations are scoped by **instance ID** (`a1b2c3d4`-style); `zibby app list` shows the ID alongside the display name.
9
+
10
+ ## Inventory
11
+
12
+ ```bash
13
+ zibby app list # all instances under your account
14
+ zibby app list --project <project-id> # scope to one project
15
+ ```
16
+
17
+ ```
18
+ ID Name App Tier Status Hourly Uptime
19
+ a1b2c3d4 automations n8n@1.97.1 Light running $0.05/hr 7d 14h
20
+ a8f7e6d5 metrics grafana Light running $0.05/hr 21d 3h
21
+ b2c3d4e5 webui open-webui Heavy paused — —
22
+ ```
23
+
24
+ `paused` instances are not billed; `running` are. `status` is updated every 60s by the agent-ops sidecar.
25
+
26
+ ## Single-instance status
27
+
28
+ ```bash
29
+ zibby app status a1b2c3d4
30
+ ```
31
+
32
+ A one-screen summary: status, resources, hourly rate, public URL, last agent-ops run.
33
+
34
+ ## Logs
35
+
36
+ ```bash
37
+ zibby app logs a1b2c3d4 # last 200 lines, both containers
38
+ zibby app logs a1b2c3d4 -t # tail mode, polls every 3s
39
+ zibby app logs a1b2c3d4 --lines 1000 # bigger window
40
+ zibby app logs a1b2c3d4 --json # raw JSON lines
41
+ zibby app logs a1b2c3d4 --verbose # full body, no parsing
42
+ ```
43
+
44
+ Logs include both the **app** container and the **agent-ops** sidecar, prefixed by source. Tail mode reconnects automatically on network blips.
45
+
46
+ ## Upgrade (zero-downtime)
47
+
48
+ ```bash
49
+ zibby app upgrade a1b2c3d4
50
+ zibby app upgrade a1b2c3d4 --version 0.1.16 # pin a specific agent-ops version
51
+ ```
52
+
53
+ Behind the scenes:
54
+
55
+ 1. Register a new task definition revision (new image version, same volume, same env)
56
+ 2. Update the ECS service with the new revision
57
+ 3. ALB drains old tasks while new ones come up; the listener serves the new tasks once they pass health checks
58
+ 4. Old tasks shut down
59
+
60
+ A load-bearing n8n stays serving traffic the whole time. `--yes` skips the confirmation prompt for automation.
61
+
62
+ ## Restart
63
+
64
+ ```bash
65
+ zibby app restart a1b2c3d4
66
+ ```
67
+
68
+ Forces the ECS service to roll the current tasks — useful when an app gets wedged on a stuck connection and you don't want a full upgrade.
69
+
70
+ ## Rotate credentials
71
+
72
+ For BYOK apps (e.g. open-webui pointing at Anthropic via your own key):
73
+
74
+ ```bash
75
+ zibby app update-credential a1b2c3d4
76
+ ```
77
+
78
+ This picks up whatever's currently in your workspace credentials (set via [Settings → Workspace credentials](https://studio.zibby.dev/settings/workspace) or `zibby creds set`) and rolls the task with the new secret env. EFS data is preserved; the task restarts in ~30s.
79
+
80
+ ## ENV vars
81
+
82
+ Every app instance has a per-instance encrypted env-var bag, same shape as workflow env. Use it for per-instance config (e.g. `N8N_ENCRYPTION_KEY`, `DATABASE_URL` pointing at an external RDS).
83
+
84
+ Set via the dashboard (Apps → instance → ENV tab) or via CLI:
85
+
86
+ ```bash
87
+ zibby app env list a1b2c3d4
88
+ zibby app env set a1b2c3d4 N8N_HOST=automations.acme.com
89
+ zibby app env unset a1b2c3d4 OLD_FLAG
90
+ ```
91
+
92
+ Changes apply on the next task restart. Use `zibby app restart` to roll immediately.
93
+
94
+ ## Resource overrides
95
+
96
+ Default resources come from the catalog entry's tier. To bump CPU / memory for one instance:
97
+
98
+ ```bash
99
+ zibby app deploy n8n --project <id> --cpu 1024 --memory 2048 # 1 vCPU / 2 GB
100
+ ```
101
+
102
+ Per-instance overrides survive upgrades; the upgrade flow re-registers the task definition with the same override values unless `--reset-resources` is passed.
103
+
104
+ ## Destroy
105
+
106
+ ```bash
107
+ zibby app destroy a1b2c3d4
108
+ zibby app destroy a1b2c3d4 --yes # skip confirmation
109
+ ```
110
+
111
+ This:
112
+
113
+ 1. Drains the ECS service (in-flight requests finish)
114
+ 2. Deletes the service + task definition revision
115
+ 3. Removes the ALB listener rule + target group
116
+ 4. Releases the EFS access point — **destroys the volume data permanently**
117
+ 5. Stops the billing meter immediately
118
+
119
+ There's no soft-delete. If you might want the data later, snapshot it externally first (or wait for the pause-without-destroy feature on the roadmap).
120
+
121
+ → Next: [Agent operator](./agent-ops)
@@ -265,6 +265,111 @@ Templates are starter workflow scaffolds. `add` overwrites existing files in pla
265
265
  Options on `add`:
266
266
  - `--skip-memory` — strip `SKILLS.MEMORY` from copied `execute-live.mjs` (browser-test template only)
267
267
 
268
+ ## App commands {#app-commands}
269
+
270
+ `zibby app` manages [Managed App instances](./apps/) — hosted open-source tools (n8n, Grafana, …) with an autonomous agent-ops sidecar. Each verb is keyed by **instance ID** (`a1b2c3d4`-style); `zibby app list` shows IDs alongside display names.
271
+
272
+ | Command | What it does |
273
+ |---|---|
274
+ | [`zibby app templates`](#app-templates) | Browse the catalog (n8n, grafana, gastown, drawio, open-webui, …) |
275
+ | [`zibby app list`](#app-list) | List deployed instances under your account |
276
+ | [`zibby app deploy <appType>`](#app-deploy) | Deploy an app from the catalog |
277
+ | [`zibby app status <id>`](#app-status) | One-screen summary: status, resources, URL, last agent-ops run |
278
+ | [`zibby app logs <id>`](#app-logs) | Logs from app + agent-ops, with `-t` tail mode |
279
+ | [`zibby app upgrade <id>`](#app-upgrade) | Zero-downtime roll to the catalog's current image |
280
+ | [`zibby app restart <id>`](#app-restart) | Force ECS service to roll the running tasks |
281
+ | [`zibby app update-credential <id>`](#app-update-credential) | Rotate a BYOK credential and restart |
282
+ | [`zibby app destroy <id>`](#app-destroy) | Tear down service + volume (data permanently deleted) |
283
+
284
+ ### app templates {#app-templates}
285
+
286
+ ```bash
287
+ zibby app templates
288
+ ```
289
+
290
+ Print the live catalog — id, display name, tier, hourly rate, one-line description.
291
+
292
+ ### app list {#app-list}
293
+
294
+ ```bash
295
+ zibby app list # all instances under your account
296
+ zibby app list --project <id> # scope to one project
297
+ ```
298
+
299
+ Options:
300
+ - `--project <id>` — project to scope the listing to (default: all projects your account owns)
301
+ - `--api-key <key>` — API key (or `ZIBBY_API_KEY` env)
302
+
303
+ ### app deploy {#app-deploy}
304
+
305
+ ```bash
306
+ zibby app deploy n8n --project <project-id> --name automations
307
+ ```
308
+
309
+ Options:
310
+ - `--project <id>` — interactive picker if omitted
311
+ - `--name <name>` — display name in the dashboard / `zibby app list` (defaults to `appType`)
312
+ - `--cpu <units>` — Fargate CPU units (e.g. `1024` for 1 vCPU; default from tier)
313
+ - `--memory <mb>` — Fargate memory in MB (e.g. `2048` for 2 GB; default from tier)
314
+ - `--api-key <key>` — API key (or `ZIBBY_API_KEY` env)
315
+
316
+ Returns an `instanceId` and the public URL.
317
+
318
+ ### app status {#app-status}
319
+
320
+ ```bash
321
+ zibby app status a1b2c3d4
322
+ ```
323
+
324
+ Prints status, resources, hourly rate, public URL, and the latest agent-ops run summary.
325
+
326
+ ### app logs {#app-logs}
327
+
328
+ ```bash
329
+ zibby app logs a1b2c3d4 # last 200 lines
330
+ zibby app logs a1b2c3d4 -t # tail mode, polls every 3s, SSE auto-reconnect
331
+ zibby app logs a1b2c3d4 --lines 1000 # bigger window
332
+ zibby app logs a1b2c3d4 --json # raw JSON lines
333
+ zibby app logs a1b2c3d4 --verbose # full line including JSON body
334
+ ```
335
+
336
+ Logs cover **both** containers — the app and the agent-ops sidecar — prefixed by source. Default output is the parsed `<time> <msg>` summary.
337
+
338
+ ### app upgrade {#app-upgrade}
339
+
340
+ ```bash
341
+ zibby app upgrade a1b2c3d4
342
+ zibby app upgrade a1b2c3d4 --version 0.1.16 # pin a specific agent-ops version
343
+ zibby app upgrade a1b2c3d4 --yes # skip confirmation
344
+ ```
345
+
346
+ Registers a new task definition revision, updates the ECS service, and lets the ALB drain old tasks before they exit. Zero-downtime for HTTP traffic.
347
+
348
+ ### app restart {#app-restart}
349
+
350
+ ```bash
351
+ zibby app restart a1b2c3d4
352
+ ```
353
+
354
+ Forces the ECS service to roll the current tasks without changing the task definition. Useful when the app gets wedged on a stuck connection.
355
+
356
+ ### app update-credential {#app-update-credential}
357
+
358
+ ```bash
359
+ zibby app update-credential a1b2c3d4
360
+ ```
361
+
362
+ Picks up whatever's currently in your workspace credentials and rolls the task with the new secret env. EFS data is preserved; the task restarts in ~30s. Used by BYOK apps (e.g. Open WebUI pointing at Anthropic via your own key).
363
+
364
+ ### app destroy {#app-destroy}
365
+
366
+ ```bash
367
+ zibby app destroy a1b2c3d4 # interactive confirm
368
+ zibby app destroy a1b2c3d4 --yes # skip the confirmation prompt
369
+ ```
370
+
371
+ Drains the ECS service, deletes the task definition revision, removes the ALB listener rule + target group, releases the EFS access point (**destroying the volume data permanently**), and stops the billing meter immediately. No soft delete.
372
+
268
373
  ## Environment variables
269
374
 
270
375
  | Var | Purpose |
package/docs/intro.md CHANGED
@@ -56,8 +56,20 @@ zibby template add <name> # add a template later (overwrites =
56
56
  - **Run anywhere** — local with hot reload, or cloud with Heroku-style bundles (~3s cold start).
57
57
  - **Session replay** — every run lands as on-disk JSONL + artifacts. Re-run any node via `--session <id> --node <name>`.
58
58
  - **Cloud-native** — SSE log streaming, dedicated egress IPs for firewalled GitLab / GitHub Enterprise / Salesforce.
59
+ - **Hosted apps too** — [Managed Apps](./apps/) host open-source tools (n8n, Grafana, Open WebUI, draw.io) with an autonomous agent-ops sidecar that handles health checks, self-healing, and upgrades.
59
60
  - **Drive it from your AI agent** — [`@zibby/mcp-cli`](./packages/mcp-cli) exposes deploy / trigger / logs / debug as MCP tools. Add one snippet to Claude Code, Cursor, Codex, or Gemini and they call Zibby directly from chat. See [Use from your AI agent](./get-started/use-from-agents).
60
61
 
62
+ ## Two product surfaces
63
+
64
+ | | **Workflows** | **Apps** |
65
+ |---|---|---|
66
+ | Lifetime | Per trigger (seconds-minutes) | Long-lived |
67
+ | Surface | Graph of agent CLI calls | A whole open-source application |
68
+ | Billing | Per execution | Per minute, while running |
69
+ | Best for | "When ticket lands, classify it" | "Host n8n for the team" |
70
+
71
+ Pick by how long the thing needs to run — see [Apps overview](./apps/) for the decision tree.
72
+
61
73
  ## How it compares
62
74
 
63
75
  | | Zibby | Claude Code Agent Teams | Devin | Mastra / LangGraph / CrewAI |
@@ -30,6 +30,7 @@ You don't have to use the recipes. You can build whatever pipeline you want with
30
30
  | Recipe | What it does | Best for |
31
31
  |---|---|---|
32
32
  | [`zibby test`](./test) | Drives a browser via Cursor or Claude, runs assertions, generates a Playwright script + verification video | E2E test generation from plain-English specs |
33
+ | [Sentry Triage](./sentry-triage) | Hourly: fetch unresolved Sentry issues, classify by severity, route via Slack/Lark — author DM + usergroup mention | Automated incident routing without a human triager |
33
34
  | `zibby analyze` | Reads a Jira/Linear ticket, walks the codebase, produces an implementation plan | Pre-implementation planning, ticket triage |
34
35
  | `zibby generate` | Generates test specs from a ticket + codebase | Backfilling test coverage on legacy projects |
35
36
  | `zibby video` | Re-records or organizes verification videos for an existing test | Producing demos, regenerating after code changes |
@@ -0,0 +1,93 @@
1
+ ---
2
+ sidebar_position: 3
3
+ title: Sentry triage recipe
4
+ ---
5
+
6
+ # `sentry-triage` — agent-driven Sentry triage
7
+
8
+ An hourly Sentry triage workflow that fetches unresolved issues, classifies them by severity, and **routes them to the right human**. Three nodes, end-to-end agent-driven, deployed from the marketplace in one click.
9
+
10
+ ```
11
+ fetch_issues → classify → dispatch_alerts
12
+ (deterministic (LLM — (LLM —
13
+ + Sentry API) severity) Slack/Lark, agent-driven routing)
14
+ ```
15
+
16
+ ## What it does
17
+
18
+ 1. **fetch_issues** — calls Sentry's REST API for issues unresolved + unassigned + `lastSeen:-60m`. Hydrates each with `suspectCommits[]` (author email from Sentry's GitHub integration) for downstream routing.
19
+ 2. **classify** — labels each issue `NOISE | LOW | MEDIUM | HIGH | CRITICAL` based on a configurable rubric (impact metric, surface area, payment paths, security tags). Skips below-threshold issues.
20
+ 3. **dispatch_alerts** — the routing brain. Three layers of decisioning:
21
+ - **Free-form `DISPATCH_RULES`** in env (highest priority) — natural language like *"send to Sam for billing issues"*
22
+ - **Structured env vars** — `SLACK_CHANNEL`, `ROUTING_PREFER_AUTHOR`, `ROUTING_HIGH_SEVERITY_GROUP`
23
+ - **Defaults** — channel-only post, threshold `MEDIUM`
24
+
25
+ The agent uses [`slack_lookup_user_by_email`](../skills/slack), [`slack_list_usergroups`](../skills/slack), [`slack_search_users`](../skills/slack) (or the Lark equivalents) to resolve names → IDs, then `slack_post_message` / `lark_send_message` to deliver. Channel post, user DM, usergroup mention — same agent decides per-issue based on what you wrote in the rules.
26
+
27
+ ## Deploy from the marketplace
28
+
29
+ ```bash
30
+ zibby workflow templates deploy sentry-triage --project <project-id>
31
+ ```
32
+
33
+ Or via the dashboard: `/marketplace/workflows` → Sentry Triage → Deploy.
34
+
35
+ After deploy, configure ENV (Apps → workflow → ENV tab):
36
+
37
+ | Env var | Required? | Default | What it does |
38
+ |---|---|---|---|
39
+ | `SLACK_CHANNEL` *or* `LARK_RECEIVE_ID` | Yes (one of) | — | Channel id (Slack `C…`) / chat id (Lark `oc_…`) for fallback posts |
40
+ | `SEVERITY_THRESHOLD` | No | `MEDIUM` | Skip anything below: `NOISE` / `LOW` / `MEDIUM` / `HIGH` / `CRITICAL` |
41
+ | `ROUTING_PREFER_AUTHOR` | No | `false` | If `true`, when a suspect commit author is known, DM them |
42
+ | `ROUTING_HIGH_SEVERITY_GROUP` | No | — | Slack usergroup handle (`@oncall`) mentioned on CRITICAL/HIGH |
43
+ | `SLACK_MENTIONS` *or* `LARK_MENTIONS` | No | `[]` | JSON array of mentions prepended on CRITICAL only |
44
+ | `DISPATCH_RULES` | No | — | Free-form natural-language override (see below) |
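`SEVERITY_THRESHOLD` implies an ordering over the five labels. A minimal sketch of that comparison is below, assuming the order `NOISE < LOW < MEDIUM < HIGH < CRITICAL` as listed in the table; `meetsThreshold` is a hypothetical helper, not the template's actual code.

```javascript
// Severity labels in ascending order, as listed for SEVERITY_THRESHOLD.
const SEVERITIES = ["NOISE", "LOW", "MEDIUM", "HIGH", "CRITICAL"];

// Hypothetical check: is `severity` at or above `threshold`?
function meetsThreshold(severity, threshold = "MEDIUM") {
  const s = SEVERITIES.indexOf(severity);
  const t = SEVERITIES.indexOf(threshold);
  if (s === -1 || t === -1) throw new Error("unknown severity label");
  return s >= t;
}

console.log(meetsThreshold("HIGH")); // true
console.log(meetsThreshold("LOW")); // false
console.log(meetsThreshold("LOW", "NOISE")); // true
```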
45
+
46
+ ## DISPATCH_RULES — natural-language routing
47
+
48
+ When you set `DISPATCH_RULES`, the agent treats it as **authoritative**; the structured env vars become fallbacks for things the rules don't cover.
49
+
50
+ ```
51
+ DISPATCH_RULES="
52
+ - CRITICAL bugs in /payment/ → DM Sam and post to #incidents
53
+ - HIGH severity → DM the suspect commit author if known, else post to #engineering
54
+ - Anything mentioning 'security' → also mention the @security usergroup
55
+ - Frontend bugs (zibby-frontend project) → only Sarah, never page on-call
56
+ - NOISE → skip entirely
57
+ "
58
+ ```
59
+
60
+ The agent reads issue metadata (severity, message, tags, suspectCommit author email, project name) and applies rules in order. The node runs at temperature 0 with schema-enforced output, so the same rule applied to the same issue should produce the same routing, and every dispatch records who got it and why under `dispatched[].recipient.{kind,id,label}`.
61
+
62
+ ## Author-DM path
63
+
64
+ When `ROUTING_PREFER_AUTHOR=true` and Sentry has a `suspectCommits[0].author.email`:
65
+
66
+ ```
67
+ 1. agent reads issue.suspectCommits[0].authorEmail
68
+ 2. → slack_lookup_user_by_email(email)
69
+ 3a. ✓ returns {id, name} → slack_post_message(channel: <user-id>, text: …)
70
+ 3b. ✗ users_not_found → channel fallback
71
+ ```
72
+
73
+ Requires the [Sentry → GitHub integration](https://sentry.io/settings/integrations/github/) installed and Code Mappings configured. Without it, `suspectCommits[]` is empty and the agent falls back to channel-only routing automatically.
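The author-DM fallback can be sketched as a function that returns a recipient in the `{kind, id, label}` shape mentioned earlier. Everything here is illustrative: `pickRecipient` is hypothetical, the lookup is injected as a plain synchronous function standing in for `slack_lookup_user_by_email`, the `kind` values `"user"` and `"channel"` are assumptions, and the sketch accepts either `author.email` or `authorEmail` because this page uses both spellings. The IDs are placeholders.

```javascript
// Hypothetical recipient selection for the ROUTING_PREFER_AUTHOR=true path.
// `lookupUserByEmail` returns {id, name} or null when no user matches.
function pickRecipient(issue, lookupUserByEmail, fallbackChannel) {
  const commit = issue.suspectCommits?.[0];
  const email = commit?.author?.email ?? commit?.authorEmail;
  const user = email ? lookupUserByEmail(email) : null;
  return user
    ? { kind: "user", id: user.id, label: user.name }
    : { kind: "channel", id: fallbackChannel, label: fallbackChannel };
}

// Stub directory standing in for the Slack lookup (placeholder data).
const stubLookup = (email) =>
  email === "sam@example.com" ? { id: "U123", name: "Sam" } : null;

const known = { suspectCommits: [{ author: { email: "sam@example.com" } }] };
const unknown = { suspectCommits: [] };

console.log(pickRecipient(known, stubLookup, "C0FALLBACK").kind); // user
console.log(pickRecipient(unknown, stubLookup, "C0FALLBACK").kind); // channel
```

An empty `suspectCommits` array and a failed lookup both land on the channel fallback, which mirrors steps 3a and 3b above.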
74
+
75
+ If you deployed your backend with `RELEASE_SHA` Sentry release-tracking on, suspect commits populate within ~minutes of new issues being created. (The platform-side wiring — `Sentry.init({release})` + `sentry-cli releases set-commits --auto` at deploy time — is what makes per-issue blame work; without it, every issue lands with `suspectCommits: []`.)
76
+
77
+ ## Customize the prompts
78
+
79
+ Each node's prompt lives in its own module — fork the template, edit, redeploy:
80
+
81
+ ```bash
82
+ zibby workflow download <uuid>
83
+ # edit nodes/dispatch-node.js
84
+ zibby workflow deploy ./sentry-triage # same UUID, new version
85
+ ```
86
+
87
+ Or fork the whole template repo if you want long-term divergence — it's just a `@zibby/workflow-templates/sentry-triage/` directory in the published package.
88
+
89
+ ## Cadence
90
+
91
+ Default: hourly cron, fires `sinceMinutes=60`. Change it in the trigger config (Apps → workflow → Triggers), keeping the lookback window between 5 and 1440 minutes (`inputSchema` enforces this range).
92
+
93
+ → Next: [`zibby test`](./test) (the browser-testing recipe) or [Build your own workflow](../get-started/your-first-workflow).