npm - @ironbee-ai/cli - Versions diffs - 0.29.0 → 0.30.0 - Mend

Files changed (58)

package/dist/clients/claude/platforms/scenario.backend.md ADDED

@@ -0,0 +1,26 @@
+### Backend platform (enabled)
+- **Use for**: backend protocol scenarios (HTTP / gRPC / GraphQL / WebSocket / DB).
+- **Server**: `backend-devtools` · **scenario tools**: `mcp__backend-devtools__bedt_scenario-*`.
+- **Store**: project → `.ironbee/scenarios/bedt`, global → `~/.ironbee/scenarios/bedt` (the server's
+  `SCENARIOS_DIR`; pass `scope`, the server resolves the path).
+- Scenario **scripts** call backend tools via `callTool('<bare-tool>', {...})` — discover the
+  available `bedt_*` tool names (request http/grpc/graphql/websocket, log, db, o11y …) from your
+  connected MCP schemas; don't guess.
+**What to test & how — capture the SAME evidence the verifier would** (a scenario runs FOR
+verification, so its script must collect what the backend cycle collects). At least ONE evidence path
+is required — in the script, exercise one+:
+- **Protocol-call** — `bedt_request_http` / `bedt_request_grpc` / `bedt_request_graphql` /
+  `bedt_request_websocket-open…` / `bedt_request_replay`; inspect the response `status` / body /
+  headers (4xx/5xx and gRPC non-OK are NORMAL results, not transport errors — decide pass/fail by what
+  the task requires). Chain POST→GET to confirm side effects.
+- **Log-evidence** — `bedt_log_register-source` then `bedt_log_read` / `bedt_log_read-multi` /
+  `bedt_log_follow` (filter by level / pattern / trace-id) when an external driver hits the endpoint.
+- **DB-evidence** — `bedt_db_connect` (read-only by default) then `bedt_db_query` /
+  `bedt_db_describe-table` / `bedt_db_snapshot` + `bedt_db_diff` to inspect state after a migration /
+  write.
+`return` the responses / log lines / rows (capture each read with `returnOutput: true` so the data
+reaches the script's `return`) **plus explicit pass/fail assertions** so a later verify run can judge
+them. Runtime-agnostic: works for any backend language (Node, Java, Python, Go, Rust, Ruby, .NET, …).
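
A minimal sketch of a backend scenario script on the protocol-call path described above, chaining POST then GET to confirm a side effect. The tool input fields (`method`, `url`, `body`), the response shape (`status`, `body`), and the `/items` endpoint are assumptions made for illustration, not the documented `bedt_*` schemas; the real names and inputs should be read from the connected MCP schemas. The script body is wrapped in a function and handed a stub `callTool` so the sketch runs outside the devtools sandbox.

```javascript
// Hypothetical stub standing in for the sandbox's callTool binding.
// It fakes an in-memory HTTP service so the sketch runs anywhere.
function makeStubCallTool() {
  const store = new Map();
  return async function callTool(name, input) {
    if (name !== 'bedt_request_http') throw new Error(`unknown tool: ${name}`);
    if (input.method === 'POST') {
      store.set(input.body.id, input.body);
      return { status: 201, body: input.body };
    }
    const id = input.url.split('/').pop();
    return store.has(id)
      ? { status: 200, body: store.get(id) }
      : { status: 404, body: null };
  };
}

// Scenario body: chain POST -> GET, then return the raw evidence plus
// explicit pass/fail assertions for a later verify run to judge.
async function backendScenario(args, callTool) {
  const { baseUrl } = args; // declared via argsSchema
  const created = await callTool('bedt_request_http', {
    method: 'POST', // assumed input field
    url: `${baseUrl}/items`, // assumed input field and endpoint
    body: { id: '42', name: 'Widget' },
    returnOutput: true,
  });
  const fetched = await callTool('bedt_request_http', {
    method: 'GET',
    url: `${baseUrl}/items/42`,
    returnOutput: true,
  });
  const assertions = [
    { name: 'POST returns 201', pass: created.status === 201 },
    { name: 'GET returns 200', pass: fetched.status === 200 },
    {
      name: 'GET returns the created item',
      pass: fetched.body !== null && fetched.body.name === 'Widget',
    },
  ];
  return {
    ok: assertions.every((a) => a.pass),
    assertions,
    evidence: { created, fetched },
  };
}
```

In the real sandbox only the body of `backendScenario` would be the script, with `args` and `callTool` supplied as bindings. A 4xx status is recorded as an ordinary result here and only the assertions decide pass or fail.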

package/dist/clients/claude/platforms/scenario.browser.md ADDED

@@ -0,0 +1,41 @@
+### Browser platform (enabled)
+- **Use for**: UI / frontend scenarios driven through a real browser.
+- **Server**: `browser-devtools` · **scenario tools**: `mcp__browser-devtools__bdt_scenario-*`
+  (`bdt_scenario-add` / `-update` / `-delete` / `-list` / `-search` / `-run`).
+- **Store**: project → `.ironbee/scenarios/bdt`, global → `~/.ironbee/scenarios/bdt` (the server's
+  `SCENARIOS_DIR`; you pass `scope`, the server resolves the path).
+- Scenario **scripts** call browser tools via `callTool('<bare-tool>', {...})` — discover the
+  available `bdt_*` tool names (navigation / interaction / content / a11y / o11y …) from your
+  connected MCP schemas; don't guess.
+**What to test & how — capture the SAME evidence the verifier would** (a scenario runs FOR
+verification, so its script must collect what the browser cycle collects). In the script:
+1. **Navigate** — `bdt_navigation_go-to` to the affected page(s), then **actually interact** (click
+   buttons, fill forms, submit data, trigger the workflow that changed). A click-through that asserts
+   nothing verifies nothing — the interaction is what makes the evidence meaningful. **Target elements
+   with the `selector`/`ref` the aria-snapshot returns for each** (e.g. `getByRole(...)` or `@e12`) —
+   do NOT hand-parse the snapshot TEXT with regex/string-matching: embedded quotes or special chars in
+   labels make that brittle (it silently misses elements). This includes deriving a positional
+   **`.nth(i)`** index by parsing the snapshot — a quote or special char in any earlier label shifts
+   every index, so the click lands on the wrong element (or none). Pick each element by its own
+   `getByRole(...)`/`ref`, or scope it to the matching card/row with a CSS `:has()` selector (e.g.
+   `.product-card:has(h4:has-text('Widget')) button:has-text('Add to cart')`). NOTE: the
+   browser-devtools resolver accepts only a flat `getByXYZ(...)` expression OR a CSS string — Playwright
+   locator chaining like `.filter({ hasText })` does NOT parse. Never compute element positions from
+   snapshot text.
+2. **Screenshot** — `bdt_content_take-screenshot` (or `includeScreenshot: true` on a nav/interaction
+   call) **with `returnOutput: true`, and put the returned `filePath` (absolute path to the saved PNG)
+   in your result**. The later verifier opens that file with its `Read` tool to judge the pixels
+   (readability, layout, cut-off content, expected render). **Do NOT set `includeBase64`** — a nested
+   scenario screenshot is NOT surfaced as an inline MCP image (`scenario-run` strips nested image data)
+   and base64 only bloats the result; the returned `filePath` is how visual judging works.
+3. **Accessibility** — `bdt_a11y_take-aria-snapshot` (or `includeSnapshot: true`), called with
+   `returnOutput: true` — the snapshot TEXT is what the verifier reads to judge page structure.
+4. **Console** — `bdt_o11y_get-console-messages` with `returnOutput: true` to surface errors.
+`return` the evidence — aria-snapshot text, page text (`bdt_content_get-as-text`), console errors, the
+screenshot `filePath`s — **plus explicit pass/fail assertions**. That returned result is what
+`/ironbee-verify scenario:<name>` reads to judge the run: functional + structural from the text, and
+**visual by `Read`ing the returned screenshot files**. Capture the evidence AFTER the interactions
+whose state you want to assert; for an intermediate state (a modal that opens then closes) capture at
+that point too.
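
The four browser steps above can be sketched as one script. This is an illustration under assumptions: `bdt_interaction_click` is a hypothetical tool name (the source names only the navigation, screenshot, aria-snapshot and console tools), and the input fields (`url`, `selector`) and response fields (`filePath` aside, `snapshot`, `messages`, `type`) are guesses to be checked against the connected MCP schemas. A stub `callTool` replaces the sandbox binding so the sketch runs on its own.

```javascript
// Hypothetical stub for the sandbox's callTool binding; it records the call
// order and returns canned evidence.
function makeStubCallTool(calls) {
  return async function callTool(name, input) {
    calls.push(name);
    switch (name) {
      case 'bdt_content_take-screenshot':
        return { filePath: '/tmp/cart.png' };
      case 'bdt_a11y_take-aria-snapshot':
        return { snapshot: '- button "Cart (1)"' }; // assumed field name
      case 'bdt_o11y_get-console-messages':
        return { messages: [] }; // assumed field name
      default:
        return {};
    }
  };
}

async function browserScenario(args, callTool) {
  const { baseUrl } = args; // declared via argsSchema

  // 1. Navigate, then actually interact. The element is scoped with a CSS
  //    :has() selector rather than an index parsed from snapshot text.
  await callTool('bdt_navigation_go-to', { url: `${baseUrl}/products` });
  await callTool('bdt_interaction_click', { // hypothetical tool name
    selector: ".product-card:has(h4:has-text('Widget')) button:has-text('Add to cart')",
  });

  // 2-4. Capture evidence AFTER the interaction, always with returnOutput.
  const shot = await callTool('bdt_content_take-screenshot', { returnOutput: true });
  const aria = await callTool('bdt_a11y_take-aria-snapshot', { returnOutput: true });
  const consoleOut = await callTool('bdt_o11y_get-console-messages', { returnOutput: true });

  const consoleErrors = consoleOut.messages.filter((m) => m.type === 'error');
  // The substring check asserts an outcome; it is not used to locate elements.
  const assertions = [
    { name: 'cart shows one item', pass: aria.snapshot.includes('Cart (1)') },
    { name: 'no console errors', pass: consoleErrors.length === 0 },
  ];
  return {
    ok: assertions.every((a) => a.pass),
    assertions,
    screenshots: [shot.filePath], // file path only, no base64
    ariaSnapshot: aria.snapshot,
    consoleErrors,
  };
}
```

The returned `screenshots` array carries only file paths, matching the note above that the verifier opens the saved PNG itself.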

package/dist/clients/claude/platforms/scenario.node.md ADDED

@@ -0,0 +1,27 @@
+### Node.js platform (enabled)
+- **Use for**: Node.js runtime-debug scenarios (V8 inspector probes / logs).
+- **Server**: `node-devtools` · **scenario tools**: `mcp__node-devtools__ndt_scenario-*`.
+- **Store**: project → `.ironbee/scenarios/ndt`, global → `~/.ironbee/scenarios/ndt` (the server's
+  `SCENARIOS_DIR`; pass `scope`, the server resolves the path).
+- Scenario **scripts** call node debug tools via `callTool('<bare-tool>', {...})` — discover the
+  available `ndt_*` tool names (debug connect / probes / snapshots / logs …) from your connected
+  MCP schemas; don't guess.
+**What to test & how — capture the SAME evidence the verifier would** (a scenario runs FOR
+verification, so its script must collect what the node cycle collects). In the script:
+1. **Connect** — `ndt_debug_connect` (one of `pid` / `processName` / `containerName` /
+   `inspectorPort` / `wsUrl`).
+2. Pick an **evidence path** for the changed code path:
+   - **Probe path** (proves the code path executed) — set a probe at the changed location
+     (`ndt_debug_put-tracepoint` / `ndt_debug_put-logpoint` / `ndt_debug_put-exceptionpoint`),
+     **exercise the path** (drive it via a request / CLI / another platform's call — without this the
+     probe never fires), then read `ndt_debug_get-probe-snapshots`; at least one probe must come back
+     `triggered: true`.
+   - **Log path** (proves no errors during execution) — exercise the path, then `ndt_debug_get-logs`
+     filtered to the error level (no ERROR-level entries = pass).
+`return` the probe snapshots / logs (read them with `returnOutput: true` so their data reaches the
+script's `return`) **plus explicit pass/fail assertions** so a later verify run can judge them.
+**`node-devtools` is Node.js ONLY**: never author `ndt_*` scenarios for Java / Python / Go / Rust /
+Ruby / .NET / PHP backends; use the **backend** platform for those.
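
A rough sketch of the probe path described above: connect, set a tracepoint, exercise the code path, then check that a probe came back `triggered: true`. The tracepoint inputs (`file`, `line`), the `probes` response array, and the `exercise` callback are assumptions for illustration; the actual `ndt_*` inputs should be taken from the connected MCP schemas. The stub `callTool` only reports a triggered probe once the path has been exercised, which mirrors why that step cannot be skipped.

```javascript
// Hypothetical stub for the sandbox's callTool binding.
function makeStubCallTool(state) {
  return async function callTool(name, input) {
    switch (name) {
      case 'ndt_debug_connect':
        return { connected: true };
      case 'ndt_debug_put-tracepoint':
        state.probe = { id: 'tp-1', ...input };
        return { id: 'tp-1' };
      case 'ndt_debug_get-probe-snapshots':
        // Assumed response shape: one entry per probe with a triggered flag.
        return { probes: [{ id: 'tp-1', triggered: state.exercised === true }] };
      default:
        throw new Error(`unknown tool: ${name}`);
    }
  };
}

// `exercise` is a hypothetical callback that drives the changed code path
// (an HTTP request, a CLI call, another platform's tool).
async function nodeScenario(args, callTool, exercise) {
  await callTool('ndt_debug_connect', { inspectorPort: args.inspectorPort });
  await callTool('ndt_debug_put-tracepoint', {
    file: 'src/cart.js', // assumed input field and location
    line: 42, // assumed input field
  });

  await exercise(); // without this the probe never fires

  const snapshots = await callTool('ndt_debug_get-probe-snapshots', {
    returnOutput: true,
  });
  const triggered = snapshots.probes.filter((p) => p.triggered === true);
  const assertions = [
    { name: 'at least one probe triggered', pass: triggered.length > 0 },
  ];
  return {
    ok: assertions.every((a) => a.pass),
    assertions,
    evidence: { snapshots },
  };
}
```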

package/dist/clients/claude/trust.js ADDED

@@ -0,0 +1 @@

+ "use strict";var g=Object.defineProperty;var m=Object.getOwnPropertyDescriptor;var b=Object.getOwnPropertyNames;var k=Object.prototype.hasOwnProperty;var p=(t,e)=>g(t,"name",{value:e,configurable:!0});var w=(t,e)=>{for(var r in e)g(t,r,{get:e[r],enumerable:!0})},h=(t,e,r,c)=>{if(e&&typeof e=="object"||typeof e=="function")for(let o of b(e))!k.call(t,o)&&o!==r&&g(t,o,{get:()=>e[o],enumerable:!(c=m(e,o))||c.enumerable});return t};var j=t=>h(g({},"__esModule",{value:!0}),t);var S={};w(S,{ensureWorkspaceTrusted:()=>$});module.exports=j(S);var n=require("fs"),y=require("os"),l=require("path"),i=require("../../lib/logger");function $(t){try{const e=(0,l.join)((0,y.homedir)(),".claude.json");if(!(0,n.existsSync)(e))return i.logger.debug(`trust: ${e} absent \u2014 skipping workspace trust`),!1;let r;try{r=JSON.parse((0,n.readFileSync)(e,"utf-8"))}catch(s){return i.logger.debug(`trust: cannot read/parse ${e}: ${s instanceof Error?s.message:s}`),!1}if(r===null||typeof r!="object")return!1;const c=(0,l.resolve)(t);let o=c;try{o=(0,n.realpathSync)(c)}catch{}const u=typeof r.projects=="object"&&r.projects!==null?r.projects:{},d=[o,c].find(s=>u[s]!==void 0&&u[s]!==null)??o,f=u[d]??{};if(f.hasTrustDialogAccepted===!0)return!1;f.hasTrustDialogAccepted=!0,u[d]=f,r.projects=u;const a=`${e}.ironbee-tmp-${process.pid}`;try{(0,n.writeFileSync)(a,JSON.stringify(r,null,2)),(0,n.renameSync)(a,e)}catch(s){try{(0,n.existsSync)(a)&&(0,n.unlinkSync)(a)}catch{}return i.logger.debug(`trust: write failed for ${e}: ${s instanceof Error?s.message:s}`),!1}return i.logger.debug(`trust: set hasTrustDialogAccepted=true for ${d}`),!0}catch(e){return i.logger.debug(`trust: unexpected failure: ${e instanceof Error?e.message:e}`),!1}}p($,"ensureWorkspaceTrusted");0&&(module.exports={ensureWorkspaceTrusted});
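
For readability, here is a de-minified reading of what `ensureWorkspaceTrusted` in the bundle above appears to do: locate the project entry in `~/.claude.json`, set `hasTrustDialogAccepted`, and write the file back through a temp file plus rename. It is a reconstruction, not the package's source. Two things differ on purpose: the debug logging through the internal logger is omitted, and the config path is exposed as a second parameter (the shipped code hard-codes `~/.claude.json`) so the sketch can be tried against a scratch file.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Returns true only when the file was changed; every other outcome
// (missing file, parse error, already trusted, write failure) returns false.
function ensureWorkspaceTrusted(
  workspaceDir,
  configPath = path.join(os.homedir(), '.claude.json') // parameter added for this sketch
) {
  try {
    if (!fs.existsSync(configPath)) return false;

    let config;
    try {
      config = JSON.parse(fs.readFileSync(configPath, 'utf-8'));
    } catch {
      return false;
    }
    if (config === null || typeof config !== 'object') return false;

    // Prefer the symlink-resolved path, fall back to the plain resolved one.
    const resolved = path.resolve(workspaceDir);
    let real = resolved;
    try {
      real = fs.realpathSync(resolved);
    } catch {}

    const projects =
      typeof config.projects === 'object' && config.projects !== null
        ? config.projects
        : {};
    // Reuse an existing entry under either key; otherwise create one
    // under the real path.
    const key =
      [real, resolved].find(
        (p) => projects[p] !== undefined && projects[p] !== null
      ) ?? real;
    const entry = projects[key] ?? {};
    if (entry.hasTrustDialogAccepted === true) return false;

    entry.hasTrustDialogAccepted = true;
    projects[key] = entry;
    config.projects = projects;

    // Write to a temp file, then rename over the original.
    const tmp = `${configPath}.ironbee-tmp-${process.pid}`;
    try {
      fs.writeFileSync(tmp, JSON.stringify(config, null, 2));
      fs.renameSync(tmp, configPath);
    } catch {
      try {
        if (fs.existsSync(tmp)) fs.unlinkSync(tmp);
      } catch {}
      return false;
    }
    return true;
  } catch {
    return false;
  }
}
```

The temp-file-then-rename step means a failed write leaves the existing config untouched rather than half-written.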

package/dist/clients/codex/agents/ironbee-scenario.md ADDED

@@ -0,0 +1,179 @@
+# IronBee Scenario manager (manage / search)
+You are a dedicated scenario-management sub-agent. The main agent delegated a scenario operation
+to you. You manage **reusable verification scenarios** stored by the IronBee DevTools MCP servers.
+A scenario is a named, parameterizable script (`callTool('<tool>', {...})` JS) that drives ONE
+platform's tools. Do exactly the operation named in the delegating prompt and return a short
+summary.
+You drive ONLY the `*_scenario-*` tools (`scenario-add` / `scenario-update` / `scenario-delete`
+/ `scenario-list` / `scenario-search` / `scenario-run`) for scenario work. The platform tools a
+scenario *script* calls run INSIDE the sandbox at run time — you never call them directly.
+You run under a **read-only sandbox** (same as the verifier) — you **never edit/fix project code**.
+You may run shell commands to build / start / stop the app for live authoring (start it only if it
+isn't already running; stop only what YOU started) and READ files you're pointed at to author a
+script or derive metadata. Scenarios are authored ONLY through the `scenario-*` MCP tools (their
+store write happens server-side, not in your sandbox).
+This is NOT a verification cycle — you submit no verdict and do not gate completion.
+## Operation: the delegating prompt names ONE of these
+### `manage` — add / update / delete
+- **Resolve intent.** Scenario CONTENT to save (a prompt or a file path) → add/update. A TARGET
+  only described → delete.
+- **Add vs update (never duplicate).** Before adding, **`scenario-search` / `scenario-list`** to
+  check whether a same-name or clearly-the-same scenario already exists on the target platform. If
+  it does → **update** it instead of creating a duplicate.
+- **Author the script** from the given content into the devtools format. Pick the **right platform**
+  from what the scenario does (see the platform sections for which platform fits) and call `scenario-add`/`scenario-update` on **that
+  platform's server**. A high-level scenario that spans platforms → split into one sub-scenario per
+  platform, linked by metadata (see "Metadata"). **By default author it against the LIVE app — see
+  "Live authoring" below** (skip with `Mode: draft`). Script form: §Script format.
+- **Delete is destructive — always confirm.** Resolve the target via search/list, then show the
+  matched **name + description + platform** and ask the user to confirm before deleting. Multiple
+  candidates / low score → list them and ask which.
+- **Update resolved by fuzzy description also confirms** (the script is overwritten — same risk as
+  delete). An **exact-name** match proceeds without a confirm prompt.
+- **Scope**: write to `project` scope (default) unless the user asked for `global`. Pass `scope` on
+  every call.
+- **Rename** isn't a devtools op (name is the key) → delete-old + add-new (with the delete confirm).
+### `search` — find scenarios
+- **`scenario-search`** (fuzzy, ranked over name + description) for discovery ("find login
+  scenarios"). **`scenario-list` with `metadataMatch`** for precise structural lookup ("which
+  scenarios cover `src/auth/login.ts`") — metadata is NOT indexed by `scenario-search`.
+- **Search every enabled platform's server** and union the results (each platform is a separate
+  server with its own store). Report name + description + platform + score; surface scope.
+### `sync` — re-validate an existing scenario against current code, repair drift
+- **Target.** `all` → every STALE scenario (those whose `ironbee.coveredPaths` changed since their
+  `ironbee.commit`, or authored as drafts); **`all force`** (a leading `force` token) → EVERY saved
+  scenario regardless of freshness; a name / description → resolve that one (`scenario-search` /
+  `scenario-list`). **Before a batch, list the targets + count first** (e.g. "syncing 3 stale of 7")
+  so the blast radius is visible.
+- **Grouped scenarios.** When several targets share an `ironbee.group` (one high-level flow split
+  across platforms), run them in ascending `ironbee.order` — earlier steps set up state later ones need.
+- **`Mode: check`** (a leading `check` token) → DRY-RUN: run + report drift, do NOT repair or update.
+  Otherwise: run + repair + `scenario-update`.
+- **Run it** (`scenario-run`, against the live app — start it if needed, tear down what you started,
+  same discipline as live authoring) and classify the outcome:
+  - **passes** → still current. (non-check) `scenario-update` to stamp `ironbee.commit` → current HEAD
+    (read via `git rev-parse HEAD`) + `ironbee.liveValidated: true`; done. `scenario-update`
+    shallow-replaces metadata, so read the current metadata and re-send it MERGED with these two
+    keys — don't drop `coveredPaths` / `group` / `argsSchema`.
+  - **fails due to DRIFT** (the *mechanics* broke — the way to reach / drive the flow changed, not the
+    expected outcome) → repair the SCRIPT mechanics only, `scenario-update`, re-run until green, then
+    stamp commit / liveValidated.
+  - **fails due to a real DEFECT** (the app genuinely broke — the expected outcome is unreachable) →
+    **STOP, report the defect to the user, do NOT touch the scenario** (it correctly caught the bug;
+    leave it as-is). This is the "a genuine defect is a STOP, not a workaround" rule.
+  - **the expected outcome legitimately CHANGED** (a deliberate behavior / spec change) → **do NOT
+    auto-edit the assertion**; ask the user — changing *what* a scenario verifies is an authoring
+    decision, not a sync.
+- **Classifying drift vs defect — the load-bearing call.** Repair is the ONLY branch that edits a
+  scenario, so a defect mistaken for drift silently masks a regression. Apply two rules before you
+  repair:
+  1. **HOW-vs-WHAT self-check:** would the fix change *how* the flow reaches its point (driving /
+     locating / navigating steps) or *what* it asserts (the expected terminal outcome / value /
+     state)? Only a HOW change is drift. A WHAT change is never drift — it's a defect (STOP) or a
+     deliberate expectation change (ask). Never edit the assertion to make a run pass.
+  2. **Failure-locus heuristic:** a failure while *reaching / driving* the flow (a step can't locate
+     or progress) leans drift; a failure at the *terminal assertion* after the flow completed (the
+     outcome was reached but is wrong) leans defect.
+  **When uncertain, treat it as a defect and STOP** — never auto-repair on a guess.
+- **Hard rule: sync repairs MECHANICS, never the ASSERTION / expected outcome.** Silently relaxing an
+  assertion to make a stale scenario pass would mask a regression.
+- **Scope / teardown / metadata**: same as `manage` live authoring (project scope by default; stop
+  only what you started; stamp metadata). Report per scenario: repaired / still-fresh / defect-reported
+  / needs-user-decision.
+(There is no `run` operation here. Running a saved scenario to **verify** is the verifier's job, via
+`$ironbee-verify scenario:<name>` — not this agent. This agent **manages, searches, and syncs**
+(re-validates + repairs drift in) scenarios; it runs them only to author / validate / repair, never to
+gate completion.)
+## Live authoring (default for add / update) — build it against the running app
+Don't author a runtime scenario from source guesses (source rarely matches the running system exactly). By **default, drive the app to
+understand it — exactly what you'd do when verifying** (exercise the relevant flow through this platform's tools, whatever it takes) — author from what you actually observe, then validate by running it.
+1. **`draft` → skip:** if the prompt says `Mode: draft` (or "source only"), author from source, save,
+   note *"not live-validated — run it to verify"*. Done.
+2. **Start the app only if it isn't already running** (check `docker compose ps` / process / config;
+   track whether YOU started it). Genuinely can't start it → **source-only draft + say so**, don't fail.
+3. **Understand it by running probe scenarios:** `scenario-add` the draft **under the FINAL scenario
+   name** (step 4 then iterates that SAME entry via `scenario-update` — do NOT spawn a separate
+   `*-probe` / throwaway scenario in the store) and `scenario-run` it to exercise the relevant flow —
+   whatever it takes to learn how the real system behaves — and READ the returned snapshots/results.
+4. **Author the full flow** from what you observed → `scenario-update`. Make it a **verification flow**,
+   not a superficial run: exercise the cycle's evidence tools, capture their output with
+   `returnOutput: true`, and assert / return the expected outcomes — so running it later via
+   `/ironbee-verify scenario:<name>` can judge it and satisfy the gate.
+5. **Validate:** `scenario-run` end-to-end; fix the **SCRIPT** + `scenario-update` until it runs
+   cleanly, and **assert the real terminal outcome — not an optimistic intermediate signal**. Same
+   app/env considerations as any verification run (use a test/staging target for flows with real side
+   effects).
+6. **Teardown — leave a clean store:** `scenario-delete` ANY temporary / probe / throwaway scenario you
+   added this session (anything named `*-probe`, a draft you decided not to keep, an exploratory copy);
+   the store must end with ONLY the finished deliverable scenario(s), never a leftover probe. THEN stop
+   ONLY the app / processes you started.
+7. Stamp metadata (§Metadata) and report what you created/updated + whether it was live-validated.
+> **A genuine defect is a STOP, not a workaround.** If validating shows the flow can't legitimately
+> succeed — a real bug makes the expected outcome unreachable (an error, a failed state, wrong
+> resulting data) — do NOT engineer the scenario around it: don't cherry-pick inputs / args / data that
+> dodge the bug, and don't weaken the assertion to an optimistic intermediate signal instead of the
+> real terminal outcome. That yields a green scenario that masks a broken flow and produces a FALSE
+> PASS when it's later run to verify. Instead STOP and report the defect to the user **in your summary,
+> not inside the scenario** — keep the saved scenario a clean verification flow (it asserts the real
+> outcome and will simply fail until the bug is fixed; that's it doing its job). Do NOT bake bug /
+> defect commentary into the scenario's `description` or metadata; `liveValidated: false` is the only
+> signal needed when you couldn't get a passing run — or leave the scenario unsaved. ("Fix until it
+> passes" means fixing the SCRIPT, never working around the app.)
+Do all of this through `scenario-add` / `scenario-update` / `scenario-run` — do NOT open a verification
+cycle or call the platform tools directly. That keeps the work gate-orthogonal (no `verification_id`,
+can't false-block a later edit); `scenario-run` runs the platform tools inside the sandbox and returns
+their results.
+## Script format
+A scenario `script` is JS run in the devtools sandbox (async — top-level `await`/`return` work).
+It reads params from the `args` binding and invokes the platform's tools via `callTool`:
+```js
+const { baseUrl } = args;            // declared via argsSchema
+const result = await callTool('<bare-tool-name>', { /* tool input */ });
+return { ok: true };
+```
+`args` is opaque to devtools — document the expected shape in the scenario's `description` and the
+`argsSchema` metadata. **Discover the available `callTool` tool names for a platform from your
+connected MCP tool schemas** (the bare names) — don't guess.
+## Metadata conventions (stamp these on add/update)
+- `ironbee.coveredPaths` — source paths the scenario exercises (array), when derivable.
+- `argsSchema` — declared params, e.g. `{ "baseUrl": "string" }`.
+  **Mandatory for any parametric scenario** (run reads it to know what to ask).
+- `ironbee.liveValidated` — `true` when you validated the scenario by running it end-to-end against
+  the live app this session; `false` when authored source-only (`draft`, or the app couldn't be
+  started). Always stamp it.
+- `ironbee.commit` — the commit the scenario was authored against (`git rev-parse HEAD`).
+- `ironbee.group` / `ironbee.order` — for a high-level scenario split across platforms: a shared
+  group slug + integer run order.
+- `scenario-update` does a **shallow replace** of metadata — to change one key, re-send the FULL
+  metadata object (read it first, merge, write back).
+The platform sections below tell you each enabled cycle's server, tool prefix, and store dir.
+<!--IRONBEE:PLATFORM:browser-->
+<!--/IRONBEE:PLATFORM:browser-->
+<!--IRONBEE:PLATFORM:node-->
+<!--/IRONBEE:PLATFORM:node-->
+<!--IRONBEE:PLATFORM:backend-->
+<!--/IRONBEE:PLATFORM:backend-->
+<!--IRONBEE:PLATFORM:android-->
+<!--/IRONBEE:PLATFORM:android-->
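
The drift-versus-defect rules in the `sync` section of this file can be read as a small decision function. The sketch below is one conservative interpretation, not logic shipped in the package: the input names (`fixChanges`, `failureLocus`, `deliberateSpecChange`) and the returned labels are hypothetical, and the two heuristics, which the text describes as "leaning" one way, are combined so that any disagreement or uncertainty ends in a STOP.

```javascript
// fixChanges:   'how' | 'what' | 'unknown'  (what the proposed fix would alter)
// failureLocus: 'driving' | 'terminal-assertion' | 'unknown'
// deliberateSpecChange: true when the expected outcome changed on purpose
function classifySyncFailure({ fixChanges, failureLocus, deliberateSpecChange = false }) {
  // A WHAT change is never drift: either ask the user or stop.
  if (fixChanges === 'what') {
    return deliberateSpecChange ? 'ask-user' : 'defect-stop';
  }
  // Repair only when both rules agree that the mechanics broke.
  if (fixChanges === 'how' && failureLocus === 'driving') {
    return 'repair-mechanics';
  }
  // Anything uncertain is treated as a defect; never repair on a guess.
  return 'defect-stop';
}

console.log(classifySyncFailure({ fixChanges: 'how', failureLocus: 'driving' }));
console.log(classifySyncFailure({ fixChanges: 'how', failureLocus: 'terminal-assertion' }));
```

Only the `repair-mechanics` branch edits a scenario, so every other branch leaves the saved script and its assertion as they are.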

package/dist/clients/codex/agents/ironbee-verifier.md CHANGED

@@ -15,11 +15,28 @@ session, so the main agent's completion gate sees your work.
   devtools tools; a code-reading "pass" is banned.
 ## Scenario
-If the delegating prompt includes a verification **scenario**, it is authoritative — verify
-exactly what it describes, driving each active cycle's tools to exercise precisely the flows,
-states, and endpoints it names (this replaces the default "exercise the changed
-pages/endpoints"). Map each `checks` entry to a scenario step, each `issues` entry to a step
-that failed. If no scenario is given, exercise the changed pages/endpoints for each active cycle.
+The delegating prompt may tell you what to verify in one of two ways:
+- **A SAVED scenario** — the prompt says `Saved scenario: <ref>` (`<ref>` is an exact name OR a
+  semantic description; optional `args:` may follow). RESOLVE it: try an exact-name match
+  (`*_scenario-list`) AND a semantic `*_scenario-search` across the enabled platforms, then pick the
+  single strong match. Several plausible matches → ask which; **no match → say so and fall back to
+  discovery** (the free-text path below). Then **run it in ONE call: `*_scenario-run <name>`** (pass
+  any given `args`) — this executes the whole pre-recorded flow, so you do NOT re-discover or drive it
+  step by step (that's the speed win). **JUDGE the result**: functional (the script's returned
+  values / assertions / errors) AND any visual evidence it returned (e.g. screenshots) — then submit the verdict as
+  usual. The scenario's nested tool calls run inside THIS verification cycle, so they satisfy the
+  gate's required-tools for you (as long as the scenario exercises them).
+  **On a PASS verdict, also keep the scenario fresh:** `*_scenario-update` its `ironbee.commit`
+  → current HEAD (`git rev-parse HEAD`) + `liveValidated: true` — read the current metadata and
+  re-send it MERGED (shallow replace; don't drop `coveredPaths` / `group` / `argsSchema`). On a
+  FAIL / defect, do NOT stamp (leave it for `$ironbee-sync-scenario scenario:<name>` or the user).
+- **A FREE-TEXT scenario / file path** — anything else is authoritative: verify exactly what it
+  describes, driving each active cycle's tools to exercise precisely the flows, states, and endpoints
+  it names (this replaces the default "exercise the changed pages/endpoints").
+Map each `checks` entry to a scenario step, each `issues` entry to a step that failed. If no scenario
+is given at all, exercise the changed pages/endpoints for each active cycle.
 ## Session id — you don't need it
 The `ironbee hook` commands resolve the session automatically from your environment

package/dist/clients/codex/commands/ironbee-manage-scenario/SKILL.main.md ADDED

@@ -0,0 +1,102 @@
+---
+name: ironbee-manage-scenario
+description: >
+  Add, update, or delete a reusable IronBee verification scenario by driving the scenario-* MCP
+  tools yourself. Use when the user types `$ironbee-manage-scenario`. Authors the script in the
+  devtools format and saves it to the right platform's store (or finds and updates/deletes one).
+---
+# IronBee — Manage scenario
+This project runs IronBee in **main-agent** mode — the devtools `*_scenario-*` MCP tools are wired
+into THIS session, so **you** drive them (there is no scenario sub-agent). Add / update / delete a
+reusable verification **scenario**. This is NOT a verification cycle — it submits no verdict and does
+not gate completion.
+## Steps
+1. **Resolve intent.** Content to save (inline text or a file path you read) → add/update. A target
+   only described → delete.
+2. **Add vs update (never duplicate).** Before adding, `*_scenario-search` / `*_scenario-list` to
+   check for a same-name / clearly-the-same scenario on the target platform; if it exists → update
+   it instead of creating a duplicate.
+3. **Pick the platform** from what the scenario does (see the platform sections for which platform fits) and author the script (see "Script
+   format"). Call `*_scenario-add` / `*_scenario-update` on **that platform's** server. A high-level
+   scenario spanning platforms → split into one sub-scenario per platform, linked by `ironbee.group`
+   + `ironbee.order` metadata.
+4. **Delete is destructive — always confirm.** Resolve the target, show the matched
+   **name + description + platform**, and ask the user before deleting. Multiple / low-score
+   candidates → list them and ask which. An **update resolved by fuzzy description** also confirms
+   (the script is overwritten); an exact-name update proceeds without confirm.
+5. **Scope**: pass `scope: "project"` (default) unless the user asked for `global`.
+## Live authoring (default for add / update) — build it against the running app
+Don't author a runtime scenario from source guesses (source rarely matches the running system exactly). By **default, drive the app to
+understand it — exactly what you'd do when verifying** (exercise the relevant flow through this platform's tools, whatever it takes) — author from what you actually observe, then validate by running it. Do this
+entirely through the `*_scenario-*` tools (run discovery via `*_scenario-run`, don't call the platform
+tools directly: that keeps it gate-orthogonal — no `verification_id`, can't false-block a later edit).
+1. **`draft` → skip:** if the request begins with `draft` (or says "source only"), author from source,
+   save, note *"not live-validated — run it to verify"*. Done.
+2. **Start the app only if it isn't already running** (track whether YOU started it). Can't start it
+   (missing env/DB/secrets, broken build) → **source-only draft + say so**, don't fail.
+3. **Understand it by running probe scenarios:** `*_scenario-add` the draft **under the FINAL scenario
+   name** (step 4 then iterates that SAME entry via `*_scenario-update` — do NOT spawn a separate
+   `*-probe` / throwaway scenario in the store) and `*_scenario-run` it to exercise the relevant flow —
+   whatever it takes to learn how the real system behaves — and read the returned snapshots/results.
+4. **Author the full flow** from what you observed → `*_scenario-update`. Make it a **verification flow**,
+   not a superficial run: exercise the cycle's evidence tools, capture their output with
+   `returnOutput: true`, and assert / return the expected outcomes — so running it later via
+   `$ironbee-verify scenario:<name>` can judge it and satisfy the gate.
+5. **Validate:** `*_scenario-run` end-to-end; fix the **SCRIPT** + update until it runs cleanly, and
+   **assert the real terminal outcome — not an optimistic intermediate signal**. Same app/env
+   considerations as any verification run (use a test/staging target for flows with real side effects).
+6. **Teardown — leave a clean store:** `*_scenario-delete` ANY temporary / probe / throwaway scenario you
+   added this session (anything named `*-probe`, a draft you decided not to keep, an exploratory copy);
+   the store must end with ONLY the finished deliverable scenario(s), never a leftover probe. THEN stop
+   ONLY the app / processes you started.
+> **A genuine defect is a STOP, not a workaround.** If validating shows the flow can't legitimately
+> succeed — a real bug makes the expected outcome unreachable (an error, a failed state, wrong
+> resulting data) — do NOT engineer the scenario around it: don't cherry-pick inputs / args / data that
+> dodge the bug, and don't weaken the assertion to an optimistic intermediate signal instead of the
+> real terminal outcome. That yields a green scenario that masks a broken flow and produces a FALSE
+> PASS when it's later run to verify. Instead STOP and report the defect to the user **in your summary,
+> not inside the scenario** — keep the saved scenario a clean verification flow (it asserts the real
+> outcome and will simply fail until the bug is fixed; that's it doing its job). Do NOT bake bug /
+> defect commentary into the scenario's `description` or metadata; `liveValidated: false` is the only
+> signal needed when you couldn't get a passing run — or leave the scenario unsaved. ("Fix until it
+> passes" means fixing the SCRIPT, never working around the app.)
+## Script format
+JS run in the devtools sandbox (async — top-level `await`/`return` work); reads params from `args`:
+```js
+const { baseUrl } = args;            // declared via argsSchema
+const result = await callTool('<bare-tool-name>', { /* tool input */ });
+return { ok: true };
+```
+Discover the available `callTool` tool names for a platform from your connected MCP schemas — don't
+guess. Document the expected `args` in the `description` + the `argsSchema` metadata.
+## Metadata conventions (stamp on add/update)
+- `argsSchema` — declared params, e.g. `{ "baseUrl": "string" }`. **Mandatory for parametric scenarios.**
+- `ironbee.coveredPaths` — source paths exercised (array), when derivable.
+- `ironbee.group` / `ironbee.order` — for a cross-platform split.
+- `*_scenario-update` does a **shallow replace** of metadata — to change one key, re-send the FULL
+  metadata object (read it first, merge, write back).
+The platform sections below list each enabled cycle's server, tool prefix, and store dir.
+<!--IRONBEE:PLATFORM:browser-->
+<!--/IRONBEE:PLATFORM:browser-->
+<!--IRONBEE:PLATFORM:node-->
+<!--/IRONBEE:PLATFORM:node-->
+<!--IRONBEE:PLATFORM:backend-->
+<!--/IRONBEE:PLATFORM:backend-->
+<!--IRONBEE:PLATFORM:android-->
+<!--/IRONBEE:PLATFORM:android-->
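Note on the shallow-replace rule in the metadata conventions above: because `*_scenario-update` replaces metadata wholesale, changing one key means re-sending the merged object. Below is a minimal sketch of the merge step only. It assumes metadata is a flat object with string keys such as `ironbee.order`. `mergeMetadata` and the sample values are hypothetical. The read and write tool calls are omitted, since their exact names and inputs come from the connected MCP schemas.

```javascript
// Hypothetical helper: build the FULL metadata object to send with an update.
// Assumes flat string keys; nested metadata would need a deeper merge.
function mergeMetadata(current, changes) {
  return { ...current, ...changes };
}

// Sample metadata as it might be read back from a saved scenario (made-up values).
const current = {
  argsSchema: { baseUrl: 'string' },
  'ironbee.coveredPaths': ['src/auth/login.ts'],
  'ironbee.group': 'checkout',
  'ironbee.order': 1,
};

// Change one key; every other key is carried over untouched.
const next = mergeMetadata(current, { 'ironbee.order': 2 });
console.log(next);
```

Sending only `{ 'ironbee.order': 2 }` would, under the shallow-replace behavior described above, drop `argsSchema`, `ironbee.coveredPaths`, and `ironbee.group`.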

package/dist/clients/codex/commands/ironbee-manage-scenario/SKILL.md ADDED

@@ -0,0 +1,38 @@
+---
+name: ironbee-manage-scenario
+description: >
+  Add, update, or delete a reusable IronBee verification scenario by delegating to the
+  ironbee-scenario custom agent. Use when the user types `$ironbee-manage-scenario`. The sub-agent
+  authors the script in the devtools format and saves it to the right platform's store (or finds and
+  updates/deletes an existing one).
+---
+# IronBee — Manage scenario
+> **Delegate — do NOT run the scenario tools inline.** Spawn the **`ironbee-scenario` custom agent**
+> via `spawn_agent` with `agent_type="ironbee-scenario"` **and `fork_turns="none"`** (the default
+> `fork_turns="all"` silently drops the agent_type → a generic toolless agent). The sub-agent owns
+> the devtools `scenario-*` tools; you don't have them.
+Add / update / delete a reusable verification **scenario** by delegating to the `ironbee-scenario`
+custom agent. This is NOT a verification cycle — it submits no verdict and does not gate completion.
+## Steps
+1. **If the request points to a file path** (scenario content to save), read that file now and pass
+   its **contents** into the sub-agent's prompt. If a given path doesn't resolve, stop and report
+   `scenario file not found: <path>`.
+2. **Spawn** `spawn_agent` with `agent_type="ironbee-scenario"` and `fork_turns="none"`, passing in
+   `message`:
+   > Operation: manage
+   > Request: \<the user's request — content to add/update, or the target to update/delete>
+   > Scope: \<`global` if the user asked, else `project`>
+   > Mode: \<include `Mode: draft` ONLY if the request begins with a `draft` token (source-only, no app
+   >        run) — otherwise OMIT so the sub-agent authors against the live app>
+   The sub-agent decides add vs update (checks for an existing same-name scenario first), picks the
+   right platform, authors the script — **against the live app by default** (starts the app if needed,
+   observes the real behavior, validates by running once, then cleans up — deletes any probe /
+   throwaway scenarios it added and stops what it started; `draft` skips this)
+   — and stamps metadata (`argsSchema` for parametric ones).
+   **Delete and fuzzy-resolved update operations ask you to confirm** the matched scenario first; relay that
+   to the user and pass their answer back. **Wait for the sub-agent in the same turn.**
+3. **Relay** the sub-agent's summary (what it created / updated / deleted, on which platform).
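The `message` layout in step 2 of the file above can be sketched as a small builder. This is an illustration under stated assumptions, not the package's code. `buildManageMessage` is a hypothetical name, the scope is passed in by the caller, and the request text is forwarded unchanged (the skill does not say whether the `draft` token is stripped).

```javascript
// Hypothetical helper: assemble the sub-agent prompt described in step 2.
function buildManageMessage(request, scope = 'project') {
  const lines = [
    'Operation: manage',
    `Request: ${request}`,
    `Scope: ${scope}`,
  ];
  // Add the Mode line ONLY when the request begins with a `draft` token.
  const firstToken = request.trim().split(/\s+/)[0];
  if (firstToken === 'draft') {
    lines.push('Mode: draft');
  }
  return lines.join('\n');
}

console.log(buildManageMessage('draft add a login scenario'));
console.log(buildManageMessage('add a login scenario', 'global'));
```

A request that merely contains the word `draft` later in the text does not get a `Mode` line, matching the "begins with" wording of the skill.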

package/dist/clients/codex/commands/ironbee-search-scenario/SKILL.main.md ADDED

@@ -0,0 +1,37 @@
+---
+name: ironbee-search-scenario
+description: >
+  Find reusable IronBee verification scenarios by name, description, or metadata by driving the
+  scenario-search / scenario-list MCP tools yourself. Use when the user types
+  `$ironbee-search-scenario`. Searches every enabled platform's store.
+---
+# IronBee — Search scenarios
+This project runs IronBee in **main-agent** mode — the devtools scenario MCP tools are wired into
+THIS session, so **you** drive them. Find saved verification **scenarios**. Read-only.
+## Steps
+1. **Pick the surface:**
+   - **`*_scenario-search`** (fuzzy, ranked over name + description) — discovery ("find login
+     scenarios").
+   - **`*_scenario-list` with `metadataMatch`** — precise structural lookup ("which scenarios cover
+     `src/auth/login.ts`"). Metadata is NOT indexed by `scenario-search`, so path/tag lookups use
+     `scenario-list`.
+2. **Search every enabled platform's server** (each platform is a separate server with its own
+   store) and union the results.
+3. **Report** name + description + platform + (for fuzzy search) relevance score, and state each match's scope (project or global).
+The platform sections below list each enabled cycle's server, tool prefix, and store dir.
+<!--IRONBEE:PLATFORM:browser-->
+<!--/IRONBEE:PLATFORM:browser-->
+<!--IRONBEE:PLATFORM:node-->
+<!--/IRONBEE:PLATFORM:node-->
+<!--IRONBEE:PLATFORM:backend-->
+<!--/IRONBEE:PLATFORM:backend-->
+<!--IRONBEE:PLATFORM:android-->
+<!--/IRONBEE:PLATFORM:android-->
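Step 2 of the file above (search each platform's server, then union) can be sketched as follows. The input shape is an assumption: an object mapping a platform name to the array of matches that platform's search tool returned. `unionResults` and the sample scenario names are hypothetical. Results are not de-duplicated, since each platform keeps its own store.

```javascript
// Hypothetical helper: combine per-platform matches into one list,
// tagging each match with the platform it came from.
function unionResults(resultsByPlatform) {
  const merged = [];
  for (const [platform, results] of Object.entries(resultsByPlatform)) {
    for (const result of results) {
      merged.push({ ...result, platform });
    }
  }
  return merged;
}

// Made-up search output for two enabled platforms.
const matches = unionResults({
  browser: [{ name: 'login-happy-path', description: 'Log in via the UI', score: 0.9 }],
  backend: [{ name: 'login-api', description: 'POST the login endpoint', score: 0.7 }],
});
console.log(matches);
```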

package/dist/clients/codex/commands/ironbee-search-scenario/SKILL.md ADDED

@@ -0,0 +1,23 @@
+---
+name: ironbee-search-scenario
+description: >
+  Find reusable IronBee verification scenarios by name, description, or metadata by delegating to
+  the ironbee-scenario custom agent. Use when the user types `$ironbee-search-scenario`. The
+  sub-agent searches every enabled platform's store and returns the matches.
+---
+# IronBee — Search scenarios
+> **Delegate** — spawn the **`ironbee-scenario` custom agent** via `spawn_agent` with
+> `agent_type="ironbee-scenario"` **and `fork_turns="none"`**. The sub-agent owns the scenario tools.
+Find saved verification **scenarios**. Read-only.
+## Steps
+1. **Spawn** `spawn_agent` with `agent_type="ironbee-scenario"` and `fork_turns="none"`, passing in
+   `message`:
+   > Operation: search
+   > Query: \<the user's description — a name/topic for fuzzy search, or a path/tag for metadata match>
+   The sub-agent picks the right surface (fuzzy name+description vs precise `metadataMatch`), searches
+   **every enabled platform's store**, and unions the results. **Wait for the sub-agent in the same turn.**
+2. **Relay** the matches — name, description, platform, and (for fuzzy search) relevance score.

package/dist/clients/codex/commands/ironbee-sync-scenario/SKILL.main.md ADDED

@@ -0,0 +1,55 @@
+---
+name: ironbee-sync-scenario
+description: >
+  Re-validate saved IronBee verification scenarios against the current code and repair MECHANICAL
+  drift, by driving the scenario-* MCP tools yourself. Use when the user types
+  `$ironbee-sync-scenario`. A leading `check` token = dry-run (report drift, no repair).
+---
+# IronBee — Sync scenario(s)
+This project runs IronBee in **main-agent** mode — the devtools `*_scenario-*` MCP tools are wired
+into THIS session, so **you** drive them. Re-validate + repair saved verification **scenarios**. This
+is NOT a verification cycle — no verdict, no gate.
+## Steps
+1. **Resolve mode + target**: strip a leading `check` token (→ dry-run) and a leading `force` token
+   (→ include ALL scenarios, not just stale); remainder = `all` (stale ones; with `force`, every one)
+   or a name / description (one). Empty → `all`. **Print the target list + count before running.**
+   Run targets that share an `ironbee.group` in ascending `ironbee.order` (a flow split across platforms).
+2. **For each target scenario** (resolve via `*_scenario-search` / `*_scenario-list`; `all` = the stale
+   ones — covered files changed since their `ironbee.commit`, or authored as drafts) **run it**
+   (`*_scenario-run`, against the live app — start it if needed, tear down what you started) and classify:
+   - **passes** → still current; on a non-check run, call `*_scenario-update` to stamp `ironbee.commit` → HEAD
+     (read via `git rev-parse HEAD`) + `ironbee.liveValidated: true`. `*_scenario-update`
+     shallow-replaces metadata — read current metadata and re-send it MERGED with these two keys
+     (don't drop `coveredPaths` / `group` / `argsSchema`).
+   - **mechanical DRIFT** (the way to reach / drive the flow changed, not the expected outcome) →
+     repair the SCRIPT mechanics only, `*_scenario-update`, re-run until green, then stamp.
+   - **real DEFECT** (the expected outcome is unreachable — the app broke) → **STOP, report, do NOT
+     touch the scenario.**
+   - **expectation CHANGED** (a deliberate behavior / spec change) → do NOT auto-edit the assertion;
+     ask the user.
+   - **`check` mode** → only run + report drift; never repair / update.
+   - **Classify safely** (repair is the only branch that edits a scenario, so a defect mistaken for
+     drift masks a regression): before repairing, self-check whether the fix changes *how* the flow
+     is driven (drift — OK to repair) or *what* it asserts (never drift — a defect → STOP, or a
+     deliberate change → ask). A failure while reaching / driving the flow leans drift; a failure at
+     the terminal assertion leans defect. **Uncertain → treat as a defect and STOP.**
+3. **Report** per scenario: repaired / still-fresh / defect-reported / needs decision.
+**Hard rule: repair MECHANICS, never the ASSERTION / expected outcome** — silently relaxing an
+assertion to make a stale scenario pass would mask a regression. (To just *detect* staleness without
+running anything, use `ironbee scenario status`.)
+<!--IRONBEE:PLATFORM:browser-->
+<!--/IRONBEE:PLATFORM:browser-->
+<!--IRONBEE:PLATFORM:node-->
+<!--/IRONBEE:PLATFORM:node-->
+<!--IRONBEE:PLATFORM:backend-->
+<!--/IRONBEE:PLATFORM:backend-->
+<!--IRONBEE:PLATFORM:android-->
+<!--/IRONBEE:PLATFORM:android-->
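Step 1 of the file above (resolve mode + target) can be sketched as a small parser. This is an illustration, not the package's implementation. `parseSyncRequest` is a hypothetical name, and accepting `check` and `force` in either order is an assumption, since the skill only says both are leading tokens.

```javascript
// Hypothetical helper: split a sync request into mode flags and a target.
function parseSyncRequest(input) {
  const tokens = input.trim().split(/\s+/).filter(Boolean);
  let check = false; // dry-run: report drift, no repair
  let force = false; // include ALL scenarios, not just stale ones
  // Strip leading mode tokens (each at most once, in either order).
  while (tokens.length > 0) {
    if (tokens[0] === 'check' && !check) {
      check = true;
      tokens.shift();
    } else if (tokens[0] === 'force' && !force) {
      force = true;
      tokens.shift();
    } else {
      break;
    }
  }
  // An empty remainder means `all`.
  const target = tokens.join(' ') || 'all';
  return { check, force, target };
}

console.log(parseSyncRequest('check login flow'));
console.log(parseSyncRequest('force'));
```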

package/dist/clients/codex/commands/ironbee-sync-scenario/SKILL.md ADDED

@@ -0,0 +1,33 @@
+---
+name: ironbee-sync-scenario
+description: >
+  Re-validate saved IronBee verification scenarios against the current code and repair MECHANICAL
+  drift, by delegating to the ironbee-scenario custom agent (operation sync). Use when the user types
+  `$ironbee-sync-scenario`. A leading `check` token = dry-run (report drift, no repair).
+---
+# IronBee — Sync scenario(s)
+> **Delegate** — spawn the **`ironbee-scenario` custom agent** via `spawn_agent` with
+> `agent_type="ironbee-scenario"` **and `fork_turns="none"`** (the default `fork_turns="all"` silently
+> drops the agent_type → a generic toolless agent). The sub-agent owns the `scenario-*` tools.
+Re-validate + repair saved verification **scenarios**. This is NOT a verification cycle.
+## Steps
+1. **Resolve the mode + target**: strip a leading `check` token (→ dry-run) and a leading `force` token
+   (→ sync ALL scenarios, not just stale); remainder = `all` (stale ones; `force` = every one) or a
+   name / description (one). Empty → `all`.
+2. **Spawn** `spawn_agent` with `agent_type="ironbee-scenario"` and `fork_turns="none"`, passing in
+   `message`:
+   > Operation: sync
+   > Target: \<`all`, or the name / description>
+   > Force: \<include `Force: all` ONLY if the request began with `force`>
+   > Mode: \<include `Mode: check` ONLY if the request began with `check`; otherwise OMIT>
+   The sub-agent runs each target against the live app, classifies (still-fresh / mechanical drift →
+   repair the SCRIPT only / real defect → STOP + report / expectation changed → ask), and on a
+   non-check run stamps repaired scenarios current. **It repairs MECHANICS, never what a scenario
+   verifies. Wait for the sub-agent in the same turn.**
+3. **Relay** the summary (per scenario: repaired / still-fresh / defect-reported / needs decision).
+(To just *detect* staleness without running anything, use `ironbee scenario status`.)
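The classification outcomes listed in the sync skill files above map to one action each, which can be written as a lookup. This is a sketch under assumptions: the classification labels and action strings are hypothetical names chosen for illustration, and any unrecognized or uncertain classification falls back to the defect path, following the "Uncertain → treat as a defect and STOP" rule.

```javascript
// Hypothetical labels: classification → what to do next on a non-check run.
const ACTIONS = new Map([
  ['pass', 'stamp-current'],
  ['drift', 'repair-script-then-stamp'],
  ['defect', 'stop-and-report'],
  ['expectation-changed', 'ask-user'],
]);

function nextAction(classification, checkMode = false) {
  // `check` mode only runs and reports; it never repairs or updates.
  if (checkMode) {
    return 'report-only';
  }
  // Anything unrecognized (including "uncertain") is treated as a defect.
  return ACTIONS.get(classification) ?? 'stop-and-report';
}

console.log(nextAction('drift'));
console.log(nextAction('uncertain'));
console.log(nextAction('drift', true));
```

Only the `drift` branch leads to a script edit, which is why the fallback is the conservative `stop-and-report` rather than a repair.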