@ironbee-ai/cli 0.31.0 → 0.33.0
- package/CHANGELOG.md +8 -0
- package/dist/clients/base.js +1 -1
- package/dist/clients/claude/agents/ironbee-scenario.md +40 -11
- package/dist/clients/claude/agents/ironbee-verifier.md +40 -4
- package/dist/clients/claude/commands/ironbee-manage-scenario.md +2 -1
- package/dist/clients/claude/hooks/require-verdict.js +2 -2
- package/dist/clients/claude/hooks/require-verification.js +3 -3
- package/dist/clients/claude/hooks/track-action-monitor.js +1 -1
- package/dist/clients/claude/hooks/track-action.js +1 -1
- package/dist/clients/claude/index.js +4 -4
- package/dist/clients/claude/platforms/scenario.terminal.md +26 -0
- package/dist/clients/claude/platforms/skill.browser.md +1 -1
- package/dist/clients/claude/platforms/skill.terminal.md +62 -0
- package/dist/clients/codex/agents/ironbee-scenario.md +39 -10
- package/dist/clients/codex/agents/ironbee-verifier.md +39 -3
- package/dist/clients/codex/commands/ironbee-manage-scenario/SKILL.main.md +21 -6
- package/dist/clients/codex/commands/ironbee-manage-scenario/SKILL.md +2 -1
- package/dist/clients/codex/commands/ironbee-search-scenario/SKILL.main.md +3 -0
- package/dist/clients/codex/commands/ironbee-sync-scenario/SKILL.main.md +4 -1
- package/dist/clients/codex/commands/ironbee-verify/SKILL.main.md +4 -0
- package/dist/clients/codex/hooks/require-verification.js +1 -1
- package/dist/clients/codex/hooks/track-action.js +1 -1
- package/dist/clients/codex/index.js +2 -2
- package/dist/clients/codex/platforms/command-verify.terminal.md +61 -0
- package/dist/clients/codex/platforms/rule.terminal.md +31 -0
- package/dist/clients/codex/platforms/scenario.terminal.md +36 -0
- package/dist/clients/codex/platforms/skill.browser.md +1 -1
- package/dist/clients/codex/platforms/skill.terminal.md +57 -0
- package/dist/clients/codex/rules/ironbee-verification.main.md +3 -0
- package/dist/clients/codex/skills/ironbee-verification.main.md +14 -0
- package/dist/clients/codex/util.js +1 -1
- package/dist/clients/cursor/commands/ironbee-manage-scenario/SKILL.md +21 -6
- package/dist/clients/cursor/commands/ironbee-search-scenario/SKILL.md +3 -0
- package/dist/clients/cursor/commands/ironbee-sync-scenario/SKILL.md +4 -1
- package/dist/clients/cursor/commands/ironbee-verify/SKILL.md +4 -0
- package/dist/clients/cursor/hooks/require-verdict.js +2 -2
- package/dist/clients/cursor/hooks/require-verification.js +3 -3
- package/dist/clients/cursor/hooks/track-action-monitor.js +1 -1
- package/dist/clients/cursor/hooks/track-action.js +1 -1
- package/dist/clients/cursor/index.js +1 -1
- package/dist/clients/cursor/platforms/command-verify.terminal.md +61 -0
- package/dist/clients/cursor/platforms/rule.terminal.md +31 -0
- package/dist/clients/cursor/platforms/scenario.terminal.md +29 -0
- package/dist/clients/cursor/platforms/skill.browser.md +1 -1
- package/dist/clients/cursor/platforms/skill.terminal.md +54 -0
- package/dist/clients/cursor/rules/ironbee-verification.mdc +3 -0
- package/dist/clients/cursor/skills/ironbee-verification.md +14 -0
- package/dist/clients/registry.js +1 -1
- package/dist/commands/config.js +2 -2
- package/dist/commands/hook.js +22 -19
- package/dist/commands/install.js +1 -1
- package/dist/commands/platform-suggest.js +2 -0
- package/dist/commands/scenario.js +1 -1
- package/dist/commands/terminal.js +1 -0
- package/dist/hooks/core/actions.js +9 -7
- package/dist/hooks/core/run-checks.js +7 -0
- package/dist/hooks/core/verification-context.js +19 -15
- package/dist/hooks/core/verify-gate.js +35 -21
- package/dist/import/claude/events/tool-call.js +1 -1
- package/dist/import/codex/events/tool-call.js +1 -1
- package/dist/index.js +1 -1
- package/dist/lib/config.js +1 -1
- package/dist/lib/event.js +1 -1
- package/dist/lib/headless.js +1 -0
- package/dist/lib/install-version.js +1 -1
- package/dist/lib/platform-section.js +5 -4
- package/dist/lib/prompt.js +6 -5
- package/dist/lib/scenario-staleness.js +1 -1
- package/dist/tui/config/schema.js +1 -1
- package/dist/tui/platforms/area.js +2 -2
- package/dist/tui/projects/area.js +4 -4
- package/dist/tui/shell/session.js +5 -5
- package/package.json +1 -1
@@ -61,7 +61,7 @@ This is NOT a verification cycle — you submit no verdict and do not gate compl
 - **passes** → still current. (non-check) `scenario-update` to stamp `ironbee.commit` → current HEAD
 (read via `git rev-parse HEAD`) + `ironbee.liveValidated: true`; done. `scenario-update`
 shallow-replaces metadata, so read the current metadata and re-send it MERGED with these two
-keys — don't drop `coveredPaths` / `group`
+keys — don't drop `coveredPaths` / `group`. (Omit `params` to keep the stored contract.)
 - **fails due to DRIFT** (the *mechanics* broke — the way to reach / drive the flow changed, not the
 expected outcome) → repair the SCRIPT mechanics only, `scenario-update`, re-run until green, then
 stamp commit / liveValidated.
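Because `scenario-update` shallow-replaces metadata, the stamping step in this hunk is a read-merge-write. The sketch below illustrates only the merge. It assumes metadata is a flat object with dotted keys such as `ironbee.commit`; the diff does not confirm the stored shape, and `stampMetadata` is a hypothetical helper, not part of the IronBee CLI.

```js
// Hypothetical helper: merge the two freshness stamps into the stored metadata.
// Assumption: metadata is a flat object keyed by dotted names ("ironbee.commit").
function stampMetadata(current, headCommit) {
  return {
    ...current, // keep coveredPaths, group, order, and anything else already stored
    'ironbee.commit': headCommit,
    'ironbee.liveValidated': true,
  };
}

// Illustrative stored metadata (values are made up for the example).
const stored = {
  'ironbee.coveredPaths': ['src/login.ts'],
  'ironbee.group': 'checkout',
  'ironbee.commit': 'abc1234',
  'ironbee.liveValidated': false,
};

const merged = stampMetadata(stored, 'def5678');
console.log(merged);
```

The merged object is what would be re-sent as `metadata`; `params` is simply left out of the update so the stored contract is kept.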
@@ -139,30 +139,56 @@ their results.
 
 ## Script format
 A scenario `script` is JS run in the devtools sandbox (async — top-level `await`/`return` work).
-It reads
+It reads its inputs from the `args` binding and invokes the platform's tools via `callTool`:
 
 ```js
-const { baseUrl } = args; // declared
+const { baseUrl } = args; // declared in the scenario's `params` contract
 const result = await callTool('<bare-tool-name>', { /* tool input */ });
 return { ok: true };
 ```
 
-
-`argsSchema` metadata. **Discover the available `callTool` tool names for a
-connected MCP tool schemas** (the bare names) — don't guess.
+Declare each input the script reads via the first-class **`params`** contract (see §Parameters) —
+not the old opaque `argsSchema` metadata key. **Discover the available `callTool` tool names for a
+platform from your connected MCP tool schemas** (the bare names) — don't guess.
+
+## Parameters (`params`) — typed, defaulted, validated
+For a parametric scenario, declare each input via the first-class **`params`** array on
+`scenario-add` / `scenario-update` (a top-level field, NOT inside `metadata` — this supersedes the
+old `argsSchema` metadata convention). Each entry:
+
+- `name` (required) — the `args` key the script reads (e.g. `baseUrl`).
+- `description` — what the param is (agent/human-facing).
+- `type` — `string` / `number` / `boolean` / `object` / `array`. Omit for an untyped passthrough;
+`object` / `array` are shallow-checked at the top level only (inner shape not validated).
+- `default` — applied when the caller omits the arg; for `object` / `array` it doubles as the
+concrete shape example. **Capture sensible defaults from the live-authoring run** so the scenario
+re-runs "as captured" with zero args.
+- `example` — documentation-only concrete shape, surfaced when there's no `default` (typically for
+`object` / `array`). Never injected or validated.
+- `required` — `true` rejects the run when there's no value AND no `default`.
+
+`scenario-run` then applies defaults for omitted args, enforces `required`, and shallow-validates
+declared types: re-running after a fix needs no re-derived args (`scenario-run { name }` reproduces
+the captured values), and a wrong-type / required-missing run fails loudly instead of running with
+`undefined`. The declared params ride in `scenario-list` / `-search` / `-run` output, so the
+contract is visible without reading the script. Pass `args` only to OVERRIDE a default. A scenario
+with no `params` keeps the fully-opaque `args` passthrough (document its shape in `description`).
+
+**`scenario-update` shallow-replaces `params`** (same as `metadata`): to change one entry, re-send
+the FULL `params` array; omit `params` entirely to keep the stored contract.
 
 ## Metadata conventions (stamp these on add/update)
 - `ironbee.coveredPaths` — source paths the scenario exercises (array), when derivable.
-- `argsSchema` — declared params, e.g. `{ "baseUrl": "string" }`.
-**Mandatory for any parametric scenario** (run reads it to know what to ask).
 - `ironbee.liveValidated` — `true` when you validated the scenario by running it end-to-end against
 the live app this session; `false` when authored source-only (`draft`, or the app couldn't be
 started). Always stamp it.
 - `ironbee.commit` — the commit the scenario was authored against (`git rev-parse HEAD`).
 - `ironbee.group` / `ironbee.order` — for a high-level scenario split across platforms: a shared
 group slug + integer run order.
-- `scenario-update` does a **shallow replace** of metadata — to change one key,
-
+- `scenario-update` does a **shallow replace** of metadata (and of `params`) — to change one key,
+re-send the FULL object / array (read it first, merge, write back).
+- (The scenario's typed input contract is the first-class **`params`** field — see §Parameters —
+NOT a metadata key.)
 
 The platform sections below tell you each enabled cycle's server, tool prefix, and store dir.
 
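The `params` rules added in this hunk (defaults for omitted args, `required` enforcement, shallow type checks, `example` never injected) can be sketched as a small resolver. This is a hypothetical re-implementation for illustration only: the real `scenario-run` logic is not part of this diff, and `resolveArgs`, `shallowTypeOf`, and the sample values are assumptions.

```js
// Hypothetical sketch of the described contract; not IronBee's actual code.
function shallowTypeOf(value) {
  if (Array.isArray(value)) return 'array';
  if (value === null) return 'null';
  return typeof value; // 'string' | 'number' | 'boolean' | 'object' | ...
}

function resolveArgs(params, args = {}) {
  const resolved = { ...args };
  for (const p of params) {
    // A default applies only when the caller omitted the arg.
    if (resolved[p.name] === undefined && p.default !== undefined) {
      resolved[p.name] = p.default;
    }
    const value = resolved[p.name];
    if (value === undefined) {
      // `example` is documentation-only, so it is never used as a fallback.
      if (p.required) throw new Error(`missing required param: ${p.name}`);
      continue;
    }
    // Top-level check only: the inner shape of objects/arrays is not inspected.
    if (p.type && shallowTypeOf(value) !== p.type) {
      throw new Error(`param ${p.name}: expected ${p.type}, got ${shallowTypeOf(value)}`);
    }
  }
  return resolved;
}

// Illustrative contract (names and values are made up).
const params = [
  { name: 'baseUrl', type: 'string', default: 'http://localhost:3000', description: 'App root URL' },
  { name: 'retries', type: 'number', default: 2 },
  { name: 'user', type: 'object', required: true, example: { email: 'a@example.com' } },
];

console.log(resolveArgs(params, { user: { email: 'a@example.com' } }));
```

Under these assumptions, a run with no `args` succeeds only when every required param has a default, which matches the "re-runs as captured with zero args" goal.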
@@ -177,3 +203,6 @@ The platform sections below tell you each enabled cycle's server, tool prefix, a
 
 <!--IRONBEE:PLATFORM:android-->
 <!--/IRONBEE:PLATFORM:android-->
+
+<!--IRONBEE:PLATFORM:terminal-->
+<!--/IRONBEE:PLATFORM:terminal-->
@@ -29,14 +29,30 @@ The delegating prompt may tell you what to verify in one of two ways:
 gate's required-tools for you (as long as the scenario exercises them).
 **On a PASS verdict, also keep the scenario fresh:** `*_scenario-update` its `ironbee.commit`
 → current HEAD (`git rev-parse HEAD`) + `liveValidated: true` — read the current metadata and
-re-send it MERGED (shallow replace; don't drop `coveredPaths` / `group
+re-send it MERGED (shallow replace; don't drop `coveredPaths` / `group`; omit `params` to keep
+the stored typed contract). On a
 FAIL / defect, do NOT stamp (leave it for `$ironbee-sync-scenario scenario:<name>` or the user).
 - **A FREE-TEXT scenario / file path** — anything else is authoritative: verify exactly what it
 describes, driving each active cycle's tools to exercise precisely the flows, states, and endpoints
 it names (this replaces the default "exercise the changed pages/endpoints").
 
 Map each `checks` entry to a scenario step, each `issues` entry to a step that failed. If no scenario
-is given at all, exercise the changed pages/endpoints for each active cycle
+is given at all, exercise the changed pages/endpoints for each active cycle **plus the downstream
+flows they feed** (see *Verify end-to-end* below).
+
+## Verify end-to-end — trace the blast radius (don't stop at the edited file)
+
+A change's defect most often surfaces not on the edited file's own surface but in a **downstream
+consumer** of what the change produces — wherever its output is read back, stored, rendered, or acted
+on. Before driving tools, spend ONE quick pass reading/grepping the code to map the blast radius:
+identify what the change produces and which other surfaces consume it, then exercise the FULL flow
+from where the change is produced through to where its effect is observable — not only the surface the
+edited file owns. A feature that works at its source but breaks in a downstream consumer is a **FAIL**.
+
+This holds even when the consumer was not itself edited: the place you should have updated but didn't
+never appears in the changed-files list, so don't let that list bound your verification — **follow the
+data, not the diff.** Keep the mapping quick (a focused scan, not a full audit) so it doesn't eat the
+speed budget.
 
 ## Session id — you don't need it
 The `ironbee hook` commands resolve the session automatically from your environment
@@ -59,6 +75,23 @@ echo '{"status":"pass","checks":["..."]}' | ironbee hook submit-verdict
 echo '{}' | ironbee hook verification-start --intent fix
 ```
 (No declared mode → plain form as above, no flag.)
+1.5. **Run the project checks FIRST (lint/test/…)** — the deterministic first step of every
+verification cycle. Run them with a **generous timeout** (they may take minutes):
+```
+echo '{}' | ironbee hook run-checks
+```
+This runs the project's configured `verification.checks` and records the results IronBee's
+gate reads.
+
+🛑 **HARD STOP — IF ANY REQUIRED CHECK FAILS, THE VERIFICATION HAS ALREADY FAILED.** Do **NOT**
+drive the devtools tools. Do **NOT** submit a pass. Do **NOT** rationalize the failure away — it
+is **NOT your call** whether a required failure is "just a planted fixture", "unrelated to my
+change", "pre-existing", or "not really broken": IronBee marked the check **required**, so a
+non-zero exit **IS** a verification failure, full stop. Immediately submit a **fail** verdict
+whose `issues` are the failing checks (the gate enforces the fix).
+
+Only when **every** required check PASSES do you continue to the application/devtools flow below.
+(If it reports "no checks configured", just continue.)
 2. Build and start the application **only if it isn't already running** (check
 `docker compose ps` / process output / config — don't guess ports). **Track whether YOU
 started it**: if it was already up, the user or main agent owns it — leave it alone.
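The hard-stop rule added in step 1.5 reduces to a mechanical decision: any required check with a non-zero exit produces a fail verdict, with no judgment about its cause. The sketch below assumes a result shape of `{ name, required, exitCode }` and a `"fail"` status string mirroring the `"pass"` shown in the hunk header; neither the record format written by `ironbee hook run-checks` nor the exact fail payload is shown in this diff, and `verdictFromChecks` is a hypothetical name.

```js
// Hypothetical sketch: decide whether the check results already settle the verdict.
// Assumed input shape: [{ name, required, exitCode }].
function verdictFromChecks(results) {
  const failed = results.filter((r) => r.required && r.exitCode !== 0);
  if (failed.length > 0) {
    // A required non-zero exit IS a failure, whatever its apparent cause.
    return {
      status: 'fail',
      issues: failed.map((r) => `${r.name} exited with code ${r.exitCode}`),
    };
  }
  // Nothing required failed: no verdict yet, continue to the devtools flow.
  return null;
}

const sample = [
  { name: 'lint', required: true, exitCode: 0 },
  { name: 'test', required: true, exitCode: 1 },
  { name: 'spellcheck', required: false, exitCode: 2 },
];

console.log(JSON.stringify(verdictFromChecks(sample)));
```

Non-required failures are ignored here because the hunk only makes required checks blocking; how optional failures should be reported is left open.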
@@ -118,7 +151,7 @@ Each tool call is a separate LLM round-trip, and that round-trip — not the too
 — is the dominant cost of a verification. Drive the tools in as few turns as you can:
 
 - **Batch a scope's work into ONE `*_execute` call.** Each cycle exposes a batch tool
-(`bdt_execute` / `ndt_execute` / `bedt_execute` / `adt_execute`) that runs many steps in
+(`bdt_execute` / `ndt_execute` / `bedt_execute` / `adt_execute` / `tdt_execute`) that runs many steps in
 one turn — nest each as a `callTool('<tool>', { … })`. A batch nests only that cycle's own
 tools (you can't mix servers in one `*_execute`). It's a JS sandbox, so a later step
 can reuse a value an earlier `callTool` returned
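The batching advice in this hunk can be pictured as one script body that chains several `callTool` steps and passes a value from one step to the next. In the sketch below, `callTool` is a local stand-in so the example runs outside the sandbox, and the tool names `open-page` and `read-title` are hypothetical; real names come from the connected MCP tool schemas.

```js
// Stand-in for the sandbox-provided `callTool`; tool names are hypothetical.
async function callTool(name, input) {
  if (name === 'open-page') return { pageId: 'p1' };
  if (name === 'read-title') return { title: `title of ${input.pageId}` };
  throw new Error(`unknown tool: ${name}`);
}

// What a single batched script body might look like: two steps, one turn.
async function batch() {
  const opened = await callTool('open-page', { url: 'http://localhost:3000' });
  // A later step reuses a value that an earlier callTool returned.
  const read = await callTool('read-title', { pageId: opened.pageId });
  return { ok: read.title.length > 0, title: read.title };
}

batch().then((result) => console.log(result));
```

Both steps target the same stand-in server, which mirrors the constraint that one `*_execute` batch cannot mix servers.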
@@ -147,3 +180,6 @@ Each tool call is a separate LLM round-trip, and that round-trip — not the too
 
 <!--IRONBEE:PLATFORM:android-->
 <!--/IRONBEE:PLATFORM:android-->
+
+<!--IRONBEE:PLATFORM:terminal-->
+<!--/IRONBEE:PLATFORM:terminal-->
@@ -69,23 +69,35 @@ tools directly: that keeps it gate-orthogonal — no `verification_id`, can't fa
 > passes" means fixing the SCRIPT, never working around the app.)
 
 ## Script format
-JS run in the devtools sandbox (async — top-level `await`/`return` work); reads
+JS run in the devtools sandbox (async — top-level `await`/`return` work); reads its inputs from `args`:
 
 ```js
-const { baseUrl } = args; // declared
+const { baseUrl } = args; // declared in the scenario's `params` contract
 const result = await callTool('<bare-tool-name>', { /* tool input */ });
 return { ok: true };
 ```
 
 Discover the available `callTool` tool names for a platform from your connected MCP schemas — don't
-guess.
+guess. Declare each input via the first-class **`params`** contract (§Parameters), not `argsSchema`.
+
+## Parameters (`params`) — typed, defaulted, validated
+Declare a parametric scenario's inputs via the first-class **`params`** array on
+`scenario-add` / `scenario-update` (top-level field, NOT metadata — supersedes the old `argsSchema`
+convention). Each entry: `name` (required — the `args` key the script reads), `description`, `type`
+(`string`/`number`/`boolean`/`object`/`array`; `object`/`array` shallow-checked at the top level),
+`default` (applied when the arg is omitted — **capture it from the live-authoring run** so the
+scenario re-runs "as captured" with zero args), `example` (doc-only shape when there's no `default`),
+`required` (reject the run when there's no value AND no `default`). `scenario-run` applies defaults,
+enforces `required`, shallow-validates declared types, and surfaces `params` in list/search/run
+output. Pass `args` only to OVERRIDE a default. `scenario-update` shallow-replaces `params` (re-send
+the full array; omit to keep the stored contract).
 
 ## Metadata conventions (stamp on add/update)
-- `argsSchema` — declared params, e.g. `{ "baseUrl": "string" }`. **Mandatory for parametric scenarios.**
 - `ironbee.coveredPaths` — source paths exercised (array), when derivable.
 - `ironbee.group` / `ironbee.order` — for a cross-platform split.
-- `*_scenario-update` does a **shallow replace** of metadata — to change one key,
-
+- `*_scenario-update` does a **shallow replace** of metadata (and of `params`) — to change one key,
+re-send the FULL object / array (read it first, merge, write back). The typed input contract is the
+first-class `params` field (§Parameters), not a metadata key.
 
 The platform sections below list each enabled cycle's server, tool prefix, and store dir.
 
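Since `scenario-update` replaces `params` wholesale, changing one entry means patching it locally and re-sending the complete array. A minimal sketch of that patch step follows; `withUpdatedParam` is a hypothetical helper and the sample contract is made up for illustration.

```js
// Hypothetical helper: patch one entry, but return the FULL array for re-sending.
function withUpdatedParam(storedParams, name, patch) {
  if (!storedParams.some((p) => p.name === name)) {
    throw new Error(`no such param: ${name}`);
  }
  // Untouched entries are passed through, so nothing is dropped by the replace.
  return storedParams.map((p) => (p.name === name ? { ...p, ...patch } : p));
}

// Illustrative stored contract.
const storedParams = [
  { name: 'baseUrl', type: 'string', default: 'http://localhost:3000' },
  { name: 'retries', type: 'number', default: 2 },
];

const nextParams = withUpdatedParam(storedParams, 'retries', { default: 5 });
console.log(nextParams);
```

Sending only the patched entry would, under the shallow-replace rule, silently remove `baseUrl` from the contract.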
@@ -100,3 +112,6 @@ The platform sections below list each enabled cycle's server, tool prefix, and s
 
 <!--IRONBEE:PLATFORM:android-->
 <!--/IRONBEE:PLATFORM:android-->
+
+<!--IRONBEE:PLATFORM:terminal-->
+<!--/IRONBEE:PLATFORM:terminal-->
@@ -32,7 +32,8 @@ custom agent. This is NOT a verification cycle — it submits no verdict and doe
 right platform, authors the script — **against the live app by default** (starts the app if needed,
 observes the real behavior, validates by running once, then cleans up — deletes any probe /
 throwaway scenarios it added and stops what it started; `draft` skips this)
-— and
+— and declares the typed `params` contract for parametric ones (defaults captured from the run)
+plus stamps metadata.
 **Delete and fuzzy-resolved update ask you to confirm** the matched scenario first — relay that
 to the user and pass their answer back. **Wait for the sub-agent in the same turn.**
 3. **Relay** the sub-agent's summary (what it created / updated / deleted, on which platform).
@@ -23,7 +23,7 @@ is NOT a verification cycle — no verdict, no gate.
 - **passes** → still current; (non-check) `*_scenario-update` to stamp `ironbee.commit` → HEAD
 (read via `git rev-parse HEAD`) + `ironbee.liveValidated: true`. `*_scenario-update`
 shallow-replaces metadata — read current metadata and re-send it MERGED with these two keys
-(don't drop `coveredPaths` / `group
+(don't drop `coveredPaths` / `group`; omit `params` to keep the stored typed contract).
 - **mechanical DRIFT** (the way to reach / drive the flow changed, not the expected outcome) →
 repair the SCRIPT mechanics only, `*_scenario-update`, re-run until green, then stamp.
 - **real DEFECT** (the expected outcome is unreachable — the app broke) → **STOP, report, do NOT
@@ -53,3 +53,6 @@ running anything, use `ironbee scenario status`.)
 
 <!--IRONBEE:PLATFORM:android-->
 <!--/IRONBEE:PLATFORM:android-->
+
+<!--IRONBEE:PLATFORM:terminal-->
+<!--/IRONBEE:PLATFORM:terminal-->
@@ -72,6 +72,7 @@ stripping a leading `fix` / `report` mode token.
 ```
 echo '{"session_id":"<your-session-id>"}' | ironbee hook verification-start --intent fix
 ```
+1.5. **Run the project checks FIRST (lint/test/…)**: `echo '{"session_id":"<your-session-id>"}' | ironbee hook run-checks` (generous shell timeout — they may take minutes). Runs the configured `verification.checks` and records the results the gate reads. 🛑 **IF ANY REQUIRED CHECK FAILS, THE VERIFICATION HAS ALREADY FAILED — STOP.** It is **NOT your call** whether the failure is "just a fixture", "unrelated", or "pre-existing" — a required non-zero exit **IS** a failure. Do **NOT** touch the devtools tools or submit a pass; submit a **fail** verdict whose `issues` are the failing checks (the gate enforces the fix). Only when **every** required check PASSES do you continue. ("no checks configured" → continue.)
 2. **Build and start** the application if not already running (don't guess ports). Track what YOU started.
 3. **For every active cycle, run its flow** — driven by the scenario above when supplied, otherwise
 per the platform sections near the bottom of this file. All active cycles must be exercised within
@@ -102,6 +103,9 @@ stripping a leading `fix` / `report` mode token.
 <!--IRONBEE:PLATFORM:android-->
 <!--/IRONBEE:PLATFORM:android-->
 
+<!--IRONBEE:PLATFORM:terminal-->
+<!--/IRONBEE:PLATFORM:terminal-->
+
 ---
 
 ## When to FAIL
@@ -3,7 +3,7 @@
 Start verification first:
 echo '{"session_id":"${t}"}' | ironbee hook verification-start
 
-Then use the verification tools for the active cycle(s) \u2014 mcp__browser-devtools__bdt_* for browser, mcp__node-devtools__ndt_* for node, mcp__backend-devtools__bedt_* for backend, mcp__android-devtools__adt_* for android.`;process.stdout.write(JSON.stringify({hookSpecificOutput:{hookEventName:"PreToolUse",permissionDecision:"deny",permissionDecisionReason:p}})),process.exit(0);return}const _=r.tool_name??"",S=(0,f.extractCodexMcpServer)(_),c=(0,A.recordingToolsForServer)(S),j=c!==null?(0,f.canonicalizeCodexToolName)(_.split("__").pop()??""):"";if(!s&&!g&&c!==null&&(0,i.isRecordingRequired)(n)&&!(0,i.isRecordingActive)(n)&&j!==c.startTool){const p=`BLOCKED: Recording is required but not started.
+Then use the verification tools for the active cycle(s) \u2014 mcp__browser-devtools__bdt_* for browser, mcp__node-devtools__ndt_* for node, mcp__backend-devtools__bedt_* for backend, mcp__android-devtools__adt_* for android, mcp__terminal-devtools__tdt_* for terminal.`;process.stdout.write(JSON.stringify({hookSpecificOutput:{hookEventName:"PreToolUse",permissionDecision:"deny",permissionDecisionReason:p}})),process.exit(0);return}const _=r.tool_name??"",S=(0,f.extractCodexMcpServer)(_),c=(0,A.recordingToolsForServer)(S),j=c!==null?(0,f.canonicalizeCodexToolName)(_.split("__").pop()??""):"";if(!s&&!g&&c!==null&&(0,i.isRecordingRequired)(n)&&!(0,i.isRecordingActive)(n)&&j!==c.startTool){const p=`BLOCKED: Recording is required but not started.
 
 1. Start recording NOW:
 Use mcp__${c.server}__${c.startTool}
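The only change in this minified hunk is that the deny message now also names the terminal cycle's tool prefix. Written out readably, the payload and the cycle-to-prefix pairs look roughly like the sketch below. The object shape and the prefixes are taken from the minified line; the names `CYCLE_TOOL_PREFIXES`, `toolHint`, and `denyDecision` are hypothetical and do not appear in the bundle.

```js
// Cycle-to-prefix pairs as listed in the 0.33.0 deny message.
const CYCLE_TOOL_PREFIXES = {
  browser: 'mcp__browser-devtools__bdt_',
  node: 'mcp__node-devtools__ndt_',
  backend: 'mcp__backend-devtools__bedt_',
  android: 'mcp__android-devtools__adt_',
  terminal: 'mcp__terminal-devtools__tdt_', // added in this release
};

// Hypothetical helper: render the "prefix* for cycle" list used in the message.
function toolHint() {
  return Object.entries(CYCLE_TOOL_PREFIXES)
    .map(([cycle, prefix]) => `${prefix}* for ${cycle}`)
    .join(', ');
}

// Hypothetical helper mirroring the object the hook serializes to stdout.
function denyDecision(reason) {
  return {
    hookSpecificOutput: {
      hookEventName: 'PreToolUse',
      permissionDecision: 'deny',
      permissionDecisionReason: reason,
    },
  };
}

console.log(JSON.stringify(denyDecision(`Use the verification tools: ${toolHint()}.`)));
```

Keeping the pairs in one table means a new cycle is a one-line addition, whereas the bundled string has to be edited by hand for each platform.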
@@ -1 +1 @@
|
|
|
1
|
-
"use strict";var N=Object.defineProperty;var K=Object.getOwnPropertyDescriptor;var W=Object.getOwnPropertyNames;var X=Object.prototype.hasOwnProperty;var b=(t,e)=>N(t,"name",{value:e,configurable:!0});var Z=(t,e)=>{for(var o in e)N(t,o,{get:e[o],enumerable:!0})},ee=(t,e,o,n)=>{if(e&&typeof e=="object"||typeof e=="function")for(let i of W(e))!X.call(t,i)&&i!==o&&N(t,i,{get:()=>e[i],enumerable:!(n=K(e,i))||n.enumerable});return t};var te=t=>ee(N({},"__esModule",{value:!0}),t);var se={};Z(se,{run:()=>ie});module.exports=te(se);var T=require("../../../hooks/core/actions"),v=require("../../../hooks/core/nested-tools"),$=require("../../../import/ids"),L=require("../../../lib/runtime-paths"),r=require("../../../hooks/core/session-state"),P=require("../../../hooks/core/tool-use-stash"),U=require("../../../lib/config"),a=require("../../../lib/logger"),h=require("../../../lib/output"),q=require("../../../lib/recording-tools"),H=require("../../../lib/stdin"),x=require("../../../queue"),d=require("../util");function A(t){if(t==null)return 0;if(typeof t=="string")try{return Buffer.byteLength(t,"utf8")}catch{return 0}try{return Buffer.byteLength(JSON.stringify(t),"utf8")}catch{return 0}}b(A,"safeStringifyBytes");function oe(t){if(t==null)return{isError:!1,errorText:void 0};if(typeof t=="object"&&t!==null){const e=t;if(e.isError===!0||e.is_error===!0){const o=e.error??e.message??e.errorMessage;return{isError:!0,errorText:typeof o=="string"?o:JSON.stringify(e).slice(0,500)}}}if(typeof t=="string"){const e=t;if(/(?:^|\n)Process exited with code [1-9]/.test(e)||/^Exit code:\s*[1-9]/m.test(e)||/apply_patch verification failed/i.test(e)||/failed to find expected lines/i.test(e)||/^\s*Error\b/.test(e)||/(?:^|\n)\[Request interrupted by user\]/.test(e)||/modified since (?:last )?read|stale read/i.test(e)||/file (?:is )?too large|exceeds/i.test(e)||/file not found|No such file or directory|does not 
exist/i.test(e))return{isError:!0,errorText:e.slice(0,500)}}return{isError:!1,errorText:void 0}}b(oe,"detectFailure");function ne(t){if(t===null||typeof t!="object")return;const e=t._metadata;if(e===null||typeof e!="object")return;const o=e.toolCallId;if(typeof o=="string"&&/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(o))return o}b(ne,"extractMetadataToolCallId");function re(t,e){const o=(0,P.consumeToolUseData)(t,e);if(!o?.start_ns)return null;try{const n=process.hrtime.bigint()-BigInt(o.start_ns);return Number(n/1000000n)}catch(n){return a.logger.debug(`failed to derive duration from stash: ${n}`),null}}b(re,"deriveDurationMs");async function ie(t){const e=(0,d.parseCodexHookStdin)((0,H.readStdin)()),o=e.session_id??"default",n=(0,L.sessionDir)(t,o),i=`${n}/actions.jsonl`;(0,a.setLogFile)(`${n}/session.log`);const y=e.tool_name??"",s=e.tool_use_id??"",g=e.tool_input,R=g&&typeof g=="object"?{...g,_metadata:void 0}:void 0,C=e.tool_response,l=(0,d.extractCodexMcpServer)(y),D=l==="browser-devtools"||l==="node-devtools"||l==="backend-devtools"||l==="android-devtools",z=re(o,s),c=(0,d.classifyCodexTool)(y),F=D&&(0,v.isNestedToolContainer)(c.tool_name,l),J=F?(0,v.extractNestedToolCallsFromResponse)(C,l):null,f=J!==null?{isError:!1,errorText:void 0}:oe(C);if(D){const w=c.tool_name,u=(0,q.recordingToolsForServer)(l);u!==null&&(w===u.startTool?(0,r.setRecordingActive)(n,!0):w===u.stopTool&&(0,r.setRecordingActive)(n,!1));const E=(0,r.getActiveActivityId)(n),m={...(0,T.baseFields)(i),type:"tool_call",timestamp:Date.now(),tool_type:c.tool_type,tool_name:c.tool_name,mcp_server:c.mcp_server??l,tool_input:R,tool_input_size:A(R),tool_response:f.isError?void 0:C,tool_response_size:f.isError?0:A(C),duration:z};E&&(m.activity_id=E);const B=ne(g);B!==void 0?m.id=B:s.length>0&&(m.id=(0,$.deriveToolCallEventIdFromToolUseId)(o,s)),s&&(m.tool_use_id=s);const k=(0,r.getActiveVerificationId)(n);k&&(m.verification_id=k);const 
S=(0,r.getActiveTraceId)(n);if(S&&(m.trace_id=S),f.isError&&(m.error=f.errorText),await(0,T.appendAction)(i,m),F&&!f.isError){const G=J??(0,v.extractNestedToolCalls)(R??g,l);for(const _ of G){u!==null&&(_.name===u.startTool?((0,r.setRecordingActive)(n,!0),a.logger.debug(`track-action (nested): recording started (${u.cycle})`)):_.name===u.stopTool&&((0,r.setRecordingActive)(n,!1),a.logger.debug(`track-action (nested): recording stopped (${u.cycle})`)));const I={...(0,T.baseFields)(i),type:"tool_call",timestamp:_.startTime??Date.now(),tool_name:_.name,tool_type:"mcp",tool_input:_.args,duration:_.duration??null,mcp_server:l,nested:!0,...s?{parent_tool_use_id:s}:{}};E&&(I.activity_id=E),k&&(I.verification_id=k),S&&(I.trace_id=S),await(0,T.appendAction)(i,I),a.logger.debug(`track-action (nested): ${_.name}`)}}(0,h.writeAndExit)(JSON.stringify({}),0);return}if(!(0,U.isJobQueueEnabled)(t)){(0,h.writeAndExit)(JSON.stringify({}),0);return}const M=(0,r.getActiveActivityId)(n),V=(0,d.extractCodexToolInput)(y,g),Q=A(g),Y=f.isError?0:A(C),p={...(0,T.baseFields)(i),type:"tool_call",timestamp:Date.now(),tool_type:c.tool_type,tool_name:c.tool_name||(0,d.normalizeCodexToolName)(y),mcp_server:c.mcp_server,tool_input:V,tool_input_size:Q,tool_response_size:Y,duration:z};M&&(p.activity_id=M),s.length>0&&(p.id=(0,$.deriveToolCallEventIdFromToolUseId)(o,s)),s&&(p.tool_use_id=s);const O=(0,r.getActiveVerificationId)(n);O&&(p.verification_id=O);const j=(0,r.getActiveTraceId)(n);j&&(p.trace_id=j),f.isError&&(p.error=f.errorText);try{(0,x.submit)(t,o,x.SEND_EVENT_TYPE,p)}catch(w){w instanceof x.JobTooLargeError?a.logger.debug(`track-action: wire event too large for tool_call ${y}; dropping`):a.logger.debug(`queue submit failed for tool_call ${y}: ${w}`)}(0,h.writeAndExit)(JSON.stringify({}),0)}b(ie,"run");0&&(module.exports={run});
|
|
1
|
+
"use strict";var N=Object.defineProperty;var K=Object.getOwnPropertyDescriptor;var W=Object.getOwnPropertyNames;var X=Object.prototype.hasOwnProperty;var b=(t,e)=>N(t,"name",{value:e,configurable:!0});var Z=(t,e)=>{for(var o in e)N(t,o,{get:e[o],enumerable:!0})},ee=(t,e,o,n)=>{if(e&&typeof e=="object"||typeof e=="function")for(let i of W(e))!X.call(t,i)&&i!==o&&N(t,i,{get:()=>e[i],enumerable:!(n=K(e,i))||n.enumerable});return t};var te=t=>ee(N({},"__esModule",{value:!0}),t);var se={};Z(se,{run:()=>ie});module.exports=te(se);var T=require("../../../hooks/core/actions"),v=require("../../../hooks/core/nested-tools"),$=require("../../../import/ids"),L=require("../../../lib/runtime-paths"),r=require("../../../hooks/core/session-state"),P=require("../../../hooks/core/tool-use-stash"),U=require("../../../lib/config"),a=require("../../../lib/logger"),h=require("../../../lib/output"),q=require("../../../lib/recording-tools"),H=require("../../../lib/stdin"),x=require("../../../queue"),d=require("../util");function A(t){if(t==null)return 0;if(typeof t=="string")try{return Buffer.byteLength(t,"utf8")}catch{return 0}try{return Buffer.byteLength(JSON.stringify(t),"utf8")}catch{return 0}}b(A,"safeStringifyBytes");function oe(t){if(t==null)return{isError:!1,errorText:void 0};if(typeof t=="object"&&t!==null){const e=t;if(e.isError===!0||e.is_error===!0){const o=e.error??e.message??e.errorMessage;return{isError:!0,errorText:typeof o=="string"?o:JSON.stringify(e).slice(0,500)}}}if(typeof t=="string"){const e=t;if(/(?:^|\n)Process exited with code [1-9]/.test(e)||/^Exit code:\s*[1-9]/m.test(e)||/apply_patch verification failed/i.test(e)||/failed to find expected lines/i.test(e)||/^\s*Error\b/.test(e)||/(?:^|\n)\[Request interrupted by user\]/.test(e)||/modified since (?:last )?read|stale read/i.test(e)||/file (?:is )?too large|exceeds/i.test(e)||/file not found|No such file or directory|does not 
exist/i.test(e))return{isError:!0,errorText:e.slice(0,500)}}return{isError:!1,errorText:void 0}}b(oe,"detectFailure");function ne(t){if(t===null||typeof t!="object")return;const e=t._metadata;if(e===null||typeof e!="object")return;const o=e.toolCallId;if(typeof o=="string"&&/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(o))return o}b(ne,"extractMetadataToolCallId");function re(t,e){const o=(0,P.consumeToolUseData)(t,e);if(!o?.start_ns)return null;try{const n=process.hrtime.bigint()-BigInt(o.start_ns);return Number(n/1000000n)}catch(n){return a.logger.debug(`failed to derive duration from stash: ${n}`),null}}b(re,"deriveDurationMs");async function ie(t){const e=(0,d.parseCodexHookStdin)((0,H.readStdin)()),o=e.session_id??"default",n=(0,L.sessionDir)(t,o),i=`${n}/actions.jsonl`;(0,a.setLogFile)(`${n}/session.log`);const y=e.tool_name??"",s=e.tool_use_id??"",g=e.tool_input,R=g&&typeof g=="object"?{...g,_metadata:void 0}:void 0,C=e.tool_response,l=(0,d.extractCodexMcpServer)(y),D=l==="browser-devtools"||l==="node-devtools"||l==="backend-devtools"||l==="android-devtools"||l==="terminal-devtools",z=re(o,s),c=(0,d.classifyCodexTool)(y),F=D&&(0,v.isNestedToolContainer)(c.tool_name,l),J=F?(0,v.extractNestedToolCallsFromResponse)(C,l):null,f=J!==null?{isError:!1,errorText:void 0}:oe(C);if(D){const w=c.tool_name,u=(0,q.recordingToolsForServer)(l);u!==null&&(w===u.startTool?(0,r.setRecordingActive)(n,!0):w===u.stopTool&&(0,r.setRecordingActive)(n,!1));const E=(0,r.getActiveActivityId)(n),m={...(0,T.baseFields)(i),type:"tool_call",timestamp:Date.now(),tool_type:c.tool_type,tool_name:c.tool_name,mcp_server:c.mcp_server??l,tool_input:R,tool_input_size:A(R),tool_response:f.isError?void 0:C,tool_response_size:f.isError?0:A(C),duration:z};E&&(m.activity_id=E);const B=ne(g);B!==void 0?m.id=B:s.length>0&&(m.id=(0,$.deriveToolCallEventIdFromToolUseId)(o,s)),s&&(m.tool_use_id=s);const k=(0,r.getActiveVerificationId)(n);k&&(m.verification_id=k);const 
S=(0,r.getActiveTraceId)(n);if(S&&(m.trace_id=S),f.isError&&(m.error=f.errorText),await(0,T.appendAction)(i,m),F&&!f.isError){const G=J??(0,v.extractNestedToolCalls)(R??g,l);for(const _ of G){u!==null&&(_.name===u.startTool?((0,r.setRecordingActive)(n,!0),a.logger.debug(`track-action (nested): recording started (${u.cycle})`)):_.name===u.stopTool&&((0,r.setRecordingActive)(n,!1),a.logger.debug(`track-action (nested): recording stopped (${u.cycle})`)));const I={...(0,T.baseFields)(i),type:"tool_call",timestamp:_.startTime??Date.now(),tool_name:_.name,tool_type:"mcp",tool_input:_.args,duration:_.duration??null,mcp_server:l,nested:!0,...s?{parent_tool_use_id:s}:{}};E&&(I.activity_id=E),k&&(I.verification_id=k),S&&(I.trace_id=S),await(0,T.appendAction)(i,I),a.logger.debug(`track-action (nested): ${_.name}`)}}(0,h.writeAndExit)(JSON.stringify({}),0);return}if(!(0,U.isJobQueueEnabled)(t)){(0,h.writeAndExit)(JSON.stringify({}),0);return}const M=(0,r.getActiveActivityId)(n),V=(0,d.extractCodexToolInput)(y,g),Q=A(g),Y=f.isError?0:A(C),p={...(0,T.baseFields)(i),type:"tool_call",timestamp:Date.now(),tool_type:c.tool_type,tool_name:c.tool_name||(0,d.normalizeCodexToolName)(y),mcp_server:c.mcp_server,tool_input:V,tool_input_size:Q,tool_response_size:Y,duration:z};M&&(p.activity_id=M),s.length>0&&(p.id=(0,$.deriveToolCallEventIdFromToolUseId)(o,s)),s&&(p.tool_use_id=s);const O=(0,r.getActiveVerificationId)(n);O&&(p.verification_id=O);const j=(0,r.getActiveTraceId)(n);j&&(p.trace_id=j),f.isError&&(p.error=f.errorText);try{(0,x.submit)(t,o,x.SEND_EVENT_TYPE,p)}catch(w){w instanceof x.JobTooLargeError?a.logger.debug(`track-action: wire event too large for tool_call ${y}; dropping`):a.logger.debug(`queue submit failed for tool_call ${y}: ${w}`)}(0,h.writeAndExit)(JSON.stringify({}),0)}b(ie,"run");0&&(module.exports={run});
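For readers of the minified hook above, the failure heuristic bundled as `detectFailure` can be restated in readable form. This is a minimal sketch reconstructed from the minified text, not the package's original source: the constant name `FAILURE_PATTERNS` and the parameter name `response` are my own, and the last pattern is written as `does not exist` on the assumption that the line break inside it is a wrapping artifact of this diff view.

```javascript
// Readable restatement of the minified `detectFailure` helper.
// Identifier names here are illustrative; only the logic is taken from the bundle.
const FAILURE_PATTERNS = [
  /(?:^|\n)Process exited with code [1-9]/,
  /^Exit code:\s*[1-9]/m,
  /apply_patch verification failed/i,
  /failed to find expected lines/i,
  /^\s*Error\b/,
  /(?:^|\n)\[Request interrupted by user\]/,
  /modified since (?:last )?read|stale read/i,
  /file (?:is )?too large|exceeds/i,
  /file not found|No such file or directory|does not exist/i,
];

function detectFailure(response) {
  // A missing response is treated as success.
  if (response == null) return { isError: false, errorText: undefined };

  // Structured responses signal failure through an explicit flag.
  if (typeof response === "object") {
    if (response.isError === true || response.is_error === true) {
      const detail = response.error ?? response.message ?? response.errorMessage;
      return {
        isError: true,
        errorText:
          typeof detail === "string"
            ? detail
            : JSON.stringify(response).slice(0, 500),
      };
    }
  }

  // Plain-text responses are scanned for known failure phrases.
  if (
    typeof response === "string" &&
    FAILURE_PATTERNS.some((pattern) => pattern.test(response))
  ) {
    return { isError: true, errorText: response.slice(0, 500) };
  }

  return { isError: false, errorText: undefined };
}
```

Note that in the hook itself this check is skipped when nested tool calls were successfully extracted from the response; in that case the event is recorded as a non-error without running the heuristic.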
|
|
@@ -1,3 +1,3 @@
|
|
|
1
|
-
"use strict";var M=Object.defineProperty;var re=Object.getOwnPropertyDescriptor;var se=Object.getOwnPropertyNames;var ae=Object.prototype.hasOwnProperty;var S=(f,e)=>M(f,"name",{value:e,configurable:!0});var le=(f,e)=>{for(var o in e)M(f,o,{get:e[o],enumerable:!0})},ce=(f,e,o,r)=>{if(e&&typeof e=="object"||typeof e=="function")for(let i of se(e))!ae.call(f,i)&&i!==o&&M(f,i,{get:()=>e[i],enumerable:!(r=re(e,i))||r.enumerable});return f};var de=f=>ce(M({},"__esModule",{value:!0}),f);var fe={};le(fe,{CodexClient:()=>ue});module.exports=de(fe);var s=require("fs"),m=require("path"),U=require("../../lib/gitignore"),b=require("../../lib/logger"),l=require("../../lib/output"),O=require("../../lib/fs-prune"),d=require("../../lib/config"),C=require("../../lib/platform-section"),n=require("./util"),H=require("./thread-map"),q=require("../../lib/runtime-paths"),W=require("./hooks/verify-gate"),D=require("./hooks/activity-end"),X=require("./hooks/session-start"),Y=require("./hooks/activity-start"),z=require("./hooks/require-verification"),Q=require("./hooks/require-verdict"),Z=require("./hooks/clear-verdict"),j=require("./hooks/track-action"),ee=require("./hooks/track-action-monitor"),oe=require("./hooks/track-action-pre"),ne=require("./hooks/subagent-start"),te=require("./hooks/subagent-stop");const B="~/.ironbee/projects",E="browser-devtools",A="node-devtools",_="backend-devtools",I="android-devtools",ge="ironbee",$="ironbee-verifier",V=30,L="Verifies recent code changes through real browser/runtime/backend tools and submits the IronBee verdict. Spawn this custom agent (by agent_type) after editing code to run the verification cycle out-of-band \u2014 it drives the devtools tools, judges the result, and records the verdict in the shared session. It does NOT edit code.",x="ironbee-scenario",J=["ironbee-manage-scenario","ironbee-search-scenario","ironbee-sync-scenario"],F="Manages and searches reusable IronBee verification scenarios via the devtools scenario tools. 
Spawn this custom agent (by agent_type) from the scenario slash commands to author/update/delete saved scenarios and find them by name/description/metadata. NOT a verification cycle (running a saved scenario to verify is done via $ironbee-verify scenario:<name>).";function P(f){return(0,m.join)(__dirname,"..",f,"platforms")}S(P,"platformsDirFor");function y(f){return l.pc.dim(f)}S(y,"codexColor");function G(f){return f.hooks.some(e=>e.command.includes(ge))}S(G,"isIronBeeHookGroup");function me(f){const e=Object.keys(f);return e.length===0?!0:e.length===1&&e[0]==="hooks"?Object.keys(f.hooks??{}).length===0:!1}S(me,"isCodexHooksEmpty");class ue{constructor(){this.name="codex";this.supportsVerifierModel=!0}static{S(this,"CodexClient")}detect(e){return(0,s.existsSync)((0,m.join)(e,".agents","skills","ironbee-verify"))}resolveProjectDir(){return process.env.CODEX_PROJECT_DIR??process.env.IRONBEE_PROJECT_DIR??process.cwd()}install(e,o){const r=o??(0,d.loadConfig)(e),i=(0,d.getVerificationMode)(r),t=i!=="monitor",a=(0,d.getCodexVerifierMode)(r);this.cleanupArtifacts(e);const g=(0,n.codexHooksJsonPath)(e);if(this.mergeHooksConfig(g,i,a),this.mergeConfigToml(e,r,t,a),t&&(i==="enforce"&&this.writeAgentsMdBlock(e,r,a),this.writeSkills(e,i==="enforce",r,a),(0,C.syncPlatformSectionsToConfig)(e,P)),(0,U.ensureIronBeeGitignored)(e),console.log(` ${l.pc.dim("\u2192")} ${y("[codex]")} hooks ${l.pc.dim("\u2192")} ${l.pc.dim(g)}`),console.log(` ${l.pc.dim("\u2192")} ${y("[codex]")} config ${l.pc.dim("\u2192")} ${l.pc.dim((0,n.codexConfigTomlPath)(e))}`),t){const p=a==="main-agent"?`${l.pc.yellow("main-agent")} (the main agent drives the devtools tools directly)`:`${l.pc.bold("sub-agent")} (delegated to the ironbee-verifier custom agent)`;console.log(` ${l.pc.dim("\u2192")} ${y("[codex]")} verify ${l.pc.dim("\u2192")} ${p}`)}i==="enforce"?(console.log(` ${l.pc.dim("\u2192")} ${y("[codex]")} agents ${l.pc.dim("\u2192")} ${l.pc.dim((0,m.join)(e,"AGENTS.md"))}`),console.log(` 
${l.pc.dim("\u2192")} ${y("[codex]")} skill ${l.pc.dim("\u2192")} ${l.pc.dim((0,m.join)(e,".agents","skills","ironbee-verification","SKILL.md"))}`),console.log(` ${l.pc.dim("\u2192")} ${y("[codex]")} command ${l.pc.dim("\u2192")} ${l.pc.dim((0,m.join)(e,".agents","skills","ironbee-verify","SKILL.md"))}`)):i==="assist"?(console.log(` ${l.pc.dim("\u2192")} ${y("[codex]")} ${l.pc.yellow("assist mode")} (verification.auto: false) \u2014 manual $ironbee-verify only, no enforcement`),console.log(` ${l.pc.dim("\u2192")} ${y("[codex]")} command ${l.pc.dim("\u2192")} ${l.pc.dim((0,m.join)(e,".agents","skills","ironbee-verify","SKILL.md"))}`)):console.log(` ${l.pc.dim("\u2192")} ${y("[codex]")} ${l.pc.yellow("monitoring-only mode")} (verification.enable: false)`),console.log(),console.log(` ${l.pc.yellow("\u26A0")} ${l.pc.yellow("Codex requires one-time TUI setup:")}`),console.log(` ${l.pc.yellow("1.")} Run ${l.pc.bold("/hooks")} in a fresh Codex session to review and trust IronBee hooks`),console.log(` ${l.pc.yellow("2.")} Restart any open Codex sessions to pick up new hook config`)}uninstall(e){this.cleanupArtifacts(e),(0,s.existsSync)((0,n.codexHooksJsonPath)(e))||this.removeFeaturesHooksFlag(e),(0,O.pruneEmptyDirs)((0,m.join)(e,".codex"));const o=(0,H.codexThreadMapPath)(e);if((0,s.existsSync)(o))try{(0,s.unlinkSync)(o)}catch(r){b.logger.debug(`failed to remove codex thread map: ${r}`)}console.log(` ${l.pc.dim("\u2192")} ${y("[codex]")} removed hooks, MCP entries, AGENTS.md block, and skills`)}removeFeaturesHooksFlag(e){const o=(0,n.codexConfigTomlPath)(e);if((0,s.existsSync)(o))try{const r=(0,s.readFileSync)(o,"utf-8");let i=(0,n.removeFeaturesHooks)(r);i=(0,n.removeSandboxWritableRoot)(i,B),i.trim().length===0?(0,s.unlinkSync)(o):i!==r&&(0,s.writeFileSync)(o,i)}catch(r){b.logger.debug(`failed to strip [features] hooks from config.toml: ${r}`)}}cleanupArtifacts(e){this.migrateAwayFromUserLevel();const 
o=(0,n.codexHooksJsonPath)(e);this.removeIronBeeHooks(o),this.maybeDeleteEmptyHooks(o),this.removeIronBeeMcpServers(e),this.removeVerifierAgentToml(e),this.removeScenarioAgentToml(e);const r=(0,m.join)(e,"AGENTS.md");if((0,s.existsSync)(r))try{const t=(0,s.readFileSync)(r,"utf-8"),a=(0,n.stripAgentsMdBlock)(t);a===null?(0,s.unlinkSync)(r):a!==t&&(0,s.writeFileSync)(r,a)}catch(t){b.logger.debug(`failed to strip AGENTS.md block: ${t}`)}const i=(0,m.join)(e,".agents","skills");this.removeDir((0,m.join)(i,"ironbee-verification")),this.removeDir((0,m.join)(i,"ironbee-verify"));for(const t of J)this.removeDir((0,m.join)(i,t));this.removeDir((0,m.join)(i,"ironbee-run-scenario")),(0,O.pruneEmptyDirs)((0,m.join)(e,".agents"))}async runVerifyGate(e){await(0,W.run)(e)}async runActivityEnd(e){await(0,D.run)(e)}async runSessionStart(e){await(0,X.run)(e)}async runActivityStart(e){await(0,Y.run)(e)}async runRequireVerification(e,o){await(0,z.run)(e,o)}async runRequireVerdict(e,o){await(0,Q.run)(e,o)}async runClearVerdict(e){await(0,Z.run)(e)}async runTrackAction(e){await(0,j.run)(e)}async runTrackActionMonitor(e){await(0,ee.run)(e)}async runTrackActionPre(e){await(0,oe.run)(e)}async runSubagentStart(e){await(0,ne.run)(e)}async runSubagentStop(e){await(0,te.run)(e)}resolveAgentSessionId(e,o){const r=process.env.CODEX_THREAD_ID;if(typeof r=="string"&&r.length>0&&o)return(0,H.lookupThreadSession)(o,r)}async runSessionEnd(e){b.logger.debug("session-end: no-op on Codex (no SessionEnd hook event)")}mergeHooksConfig(e,o,r){const i=o!=="monitor",t=o==="assist"?" 
--soft":"";(0,s.mkdirSync)((0,m.dirname)(e),{recursive:!0});let a={hooks:{}};if((0,s.existsSync)(e))try{a=JSON.parse((0,s.readFileSync)(e,"utf-8")),a.hooks||(a.hooks={})}catch(v){b.logger.debug(`failed to parse ${e}: ${v}`),a={hooks:{}}}for(const v of Object.keys(a.hooks)){const c=a.hooks[v].filter(h=>!G(h));c.length===0?delete a.hooks[v]:a.hooks[v]=c}const g=S((v,c,h)=>{a.hooks[v]||(a.hooks[v]=[]),a.hooks[v].push({matcher:c,hooks:[{type:"command",command:h}]})},"addGroup");g("SessionStart",".*","ironbee hook session-start --client codex"),g("UserPromptSubmit",".*","ironbee hook activity-start --client codex"),g("PreToolUse",".*","ironbee hook track-action-pre --client codex"),i&&(g("PreToolUse","^mcp__(browser|node|backend|android)[-_]devtools__.*",`ironbee hook require-verification --client codex${t}`),g("PreToolUse","^apply_patch$",`ironbee hook require-verdict --client codex${t}`),g("PostToolUse","^apply_patch$","ironbee hook clear-verdict --client codex"),r==="sub-agent"&&g("SubagentStart",".*","ironbee hook subagent-start --client codex")),g("SubagentStop",".*","ironbee hook subagent-stop --client codex"),g("PostToolUse",".*",i?"ironbee hook track-action --client codex":"ironbee hook track-action-monitor --client codex"),g("Stop",".*",o==="enforce"?"ironbee hook verify-gate --client codex":"ironbee hook activity-end --client codex"),(0,s.writeFileSync)(e,JSON.stringify(a,null,2))}removeIronBeeHooks(e){if((0,s.existsSync)(e))try{const o=(0,s.readFileSync)(e,"utf-8"),r=JSON.parse(o);if(!r.hooks)return;let i=!1;for(const t of Object.keys(r.hooks)){const a=r.hooks[t].filter(g=>!G(g));a.length!==r.hooks[t].length&&(i=!0),a.length===0?delete r.hooks[t]:r.hooks[t]=a}i&&(0,s.writeFileSync)(e,JSON.stringify(r,null,2))}catch(o){b.logger.debug(`failed to strip IronBee hooks from ${e}: ${o}`)}}maybeDeleteEmptyHooks(e){if((0,s.existsSync)(e))try{const o=JSON.parse((0,s.readFileSync)(e,"utf-8"));me(o)&&(0,s.unlinkSync)(e)}catch(o){b.logger.debug(`failed to inspect ${e} for 
emptiness: ${o}`)}}mergeConfigToml(e,o,r,i){(0,s.mkdirSync)((0,m.join)(e,".codex"),{recursive:!0});let t=(0,n.readCodexConfigToml)(e);if(t=(0,n.ensureFeaturesHooksTrue)(t),t=r&&(0,q.resolveRuntimeLocation)(e)==="external"?(0,n.ensureSandboxWritableRoot)(t,B):(0,n.removeSandboxWritableRoot)(t,B),t=(0,n.removeMcpServer)(t,E),t=(0,n.removeMcpServer)(t,A),t=(0,n.removeMcpServer)(t,_),t=(0,n.removeMcpServer)(t,I),r&&i==="main-agent"){t=this.upsertSessionMcpServers(t,e,o),t=(0,n.removeAgentsTable)(t,$),t=(0,n.removeAgentsTable)(t,x),t=(0,n.removeMultiAgentV2SpawnMetadata)(t),this.removeVerifierAgentToml(e),this.removeScenarioAgentToml(e),(0,n.writeCodexConfigToml)(e,t);return}if(r){const g=(0,d.getVerificationModel)(o,"codex"),p=(0,s.existsSync)((0,n.userCodexConfigTomlPath)())?(0,s.readFileSync)((0,n.userCodexConfigTomlPath)(),"utf-8"):"",u=(0,n.extractTomlTopLevelModel)(t)===null&&(0,n.extractTomlTopLevelModel)(p)===null;g===void 0&&u&&console.log(` ${l.pc.dim("\u2192")} ${y("[codex]")} ${l.pc.yellow("\u26A0 no model for the verifier")} \u2014 the ${l.pc.bold("ironbee-verifier")} sub-agent inherits the session model, but neither this project's .codex/config.toml nor ~/.codex/config.toml has a top-level ${l.pc.bold("model")}, so it may fail to spawn ("could not resolve the child model"). 
Fix: set ${l.pc.bold("model")} in ~/.codex/config.toml, or set ${l.pc.bold("verification.model")} in your ironbee config.`),this.writeVerifierAgentToml(e,o,g),t=(0,n.upsertAgentsTable)(t,$,[`description = ${JSON.stringify(L)}`,`config_file = ${JSON.stringify(`agents/${$}.toml`)}`]),t=(0,n.ensureMultiAgentV2SpawnMetadataExposed)(t),this.writeScenarioAgentToml(e,o,g),t=(0,n.upsertAgentsTable)(t,x,[`description = ${JSON.stringify(F)}`,`config_file = ${JSON.stringify(`agents/${x}.toml`)}`])}else t=(0,n.removeAgentsTable)(t,$),t=(0,n.removeAgentsTable)(t,x),t=(0,n.removeMultiAgentV2SpawnMetadata)(t),this.removeVerifierAgentToml(e),this.removeScenarioAgentToml(e);(0,n.writeCodexConfigToml)(e,t)}writeVerifierAgentToml(e,o,r){this.writeCustomAgentToml(e,o,r,$,L,"skill","read-only")}writeScenarioAgentToml(e,o,r){this.writeCustomAgentToml(e,o,r,x,F,"scenario","read-only")}writeCustomAgentToml(e,o,r,i,t,a,g){const p=(0,m.join)(__dirname,"agents",`${i}.md`);let u;try{u=(0,s.readFileSync)(p,"utf-8")}catch(k){b.logger.debug(`failed to read agent source ${p}: ${k}`);return}const v=P("codex");for(const k of d.ALL_CYCLES){const w=(0,d.isCycleEnabled)(o,k)?ie=>{const N=(0,m.join)(v,(0,C.fragmentFilename)(a,k,ie));return(0,s.existsSync)(N)?(0,s.readFileSync)(N,"utf-8").trimEnd():null}:null;u=(0,C.applyPlatformSection)(u,k,w,`${i}.toml`)}const c=[];c.push(`name = ${JSON.stringify(i)}`),c.push(`description = ${JSON.stringify(t)}`),c.push(`sandbox_mode = ${JSON.stringify(g)}`),r&&c.push(`model = ${JSON.stringify(r)}`),c.push("developer_instructions = '''"),c.push(u.replace(/'''/g,"```").trimEnd()),c.push("'''");const h=S((k,T,w)=>{k&&(c.push(""),c.push(`[mcp_servers.${T}]`),c.push(...K(w)),c.push(`startup_timeout_sec = ${V}`),c.push("required = true"),c.push('default_tools_approval_mode = 
"approve"'))},"addCycle");h((0,d.isCycleEnabled)(o,"browser"),E,(0,d.getMcpServerEntry)(e)),h((0,d.isCycleEnabled)(o,"node"),A,(0,d.getNodeDevToolsMcpEntry)(e)),h((0,d.isCycleEnabled)(o,"backend"),_,(0,d.getBackendDevToolsMcpEntry)(e)),h((0,d.isCycleEnabled)(o,"android"),I,(0,d.getAndroidDevToolsMcpEntry)(e));const R=(0,n.codexAgentTomlPath)(e,i);(0,s.mkdirSync)((0,m.dirname)(R),{recursive:!0}),(0,s.writeFileSync)(R,c.join(`
|
|
1
|
+
"use strict";var P=Object.defineProperty;var le=Object.getOwnPropertyDescriptor;var ce=Object.getOwnPropertyNames;var de=Object.prototype.hasOwnProperty;var S=(f,e)=>P(f,"name",{value:e,configurable:!0});var me=(f,e)=>{for(var o in e)P(f,o,{get:e[o],enumerable:!0})},ge=(f,e,o,r)=>{if(e&&typeof e=="object"||typeof e=="function")for(let i of ce(e))!de.call(f,i)&&i!==o&&P(f,i,{get:()=>e[i],enumerable:!(r=le(e,i))||r.enumerable});return f};var ue=f=>ge(P({},"__esModule",{value:!0}),f);var be={};me(be,{CodexClient:()=>pe});module.exports=ue(be);var s=require("fs"),q=require("os"),m=require("path"),W=require("../../lib/gitignore"),p=require("../../lib/logger"),c=require("../../lib/output"),N=require("../../lib/fs-prune"),D=require("../../lib/headless"),l=require("../../lib/config"),C=require("../../lib/platform-section"),n=require("./util"),B=require("./thread-map"),X=require("../../lib/runtime-paths"),Y=require("./hooks/verify-gate"),z=require("./hooks/activity-end"),Q=require("./hooks/session-start"),Z=require("./hooks/activity-start"),j=require("./hooks/require-verification"),ee=require("./hooks/require-verdict"),oe=require("./hooks/clear-verdict"),ne=require("./hooks/track-action"),te=require("./hooks/track-action-monitor"),ie=require("./hooks/track-action-pre"),re=require("./hooks/subagent-start"),se=require("./hooks/subagent-stop");const O="~/.ironbee/projects",w="browser-devtools",A="node-devtools",_="backend-devtools",I="android-devtools",R="terminal-devtools",fe="ironbee",$="ironbee-verifier",L=30,J="Verifies recent code changes through real browser/runtime/backend tools and submits the IronBee verdict. Spawn this custom agent (by agent_type) after editing code to run the verification cycle out-of-band \u2014 it drives the devtools tools, judges the result, and records the verdict in the shared session. 
It does NOT edit code.",x="ironbee-scenario",F=["ironbee-manage-scenario","ironbee-search-scenario","ironbee-sync-scenario"],G="Manages and searches reusable IronBee verification scenarios via the devtools scenario tools. Spawn this custom agent (by agent_type) from the scenario slash commands to author/update/delete saved scenarios and find them by name/description/metadata. NOT a verification cycle (running a saved scenario to verify is done via $ironbee-verify scenario:<name>).";function H(f){return(0,m.join)(__dirname,"..",f,"platforms")}S(H,"platformsDirFor");function k(f){return c.pc.dim(f)}S(k,"codexColor");function K(f){return f.hooks.some(e=>e.command.includes(fe))}S(K,"isIronBeeHookGroup");function ve(f){const e=Object.keys(f);return e.length===0?!0:e.length===1&&e[0]==="hooks"?Object.keys(f.hooks??{}).length===0:!1}S(ve,"isCodexHooksEmpty");class pe{constructor(){this.name="codex";this.supportsVerifierModel=!0}static{S(this,"CodexClient")}detect(e){return(0,s.existsSync)((0,m.join)(e,".agents","skills","ironbee-verify"))}resolveProjectDir(){return process.env.CODEX_PROJECT_DIR??process.env.IRONBEE_PROJECT_DIR??process.cwd()}async runHeadlessPrompt(e,o){const r=(0,s.mkdtempSync)((0,m.join)((0,q.tmpdir)(),"ironbee-codex-")),i=(0,m.join)(r,"last.txt");try{await(0,D.runHeadlessCommand)("codex",["exec","--sandbox","read-only","--skip-git-repo-check","-o",i,e],{cwd:o.projectDir,timeoutMs:o.timeoutMs,signal:o.signal});try{return(0,s.readFileSync)(i,"utf8")}catch{return""}}finally{try{(0,s.rmSync)(r,{recursive:!0,force:!0})}catch{}}}install(e,o){const r=o??(0,l.loadConfig)(e),i=(0,l.getVerificationMode)(r),t=i!=="monitor",a=(0,l.getCodexVerifierMode)(r);this.cleanupArtifacts(e);const g=(0,n.codexHooksJsonPath)(e);if(this.mergeHooksConfig(g,i,a),this.mergeConfigToml(e,r,t,a),t&&(i==="enforce"&&this.writeAgentsMdBlock(e,r,a),this.writeSkills(e,i==="enforce",r,a),(0,C.syncPlatformSectionsToConfig)(e,H)),(0,W.ensureIronBeeGitignored)(e),console.log(` 
${c.pc.dim("\u2192")} ${k("[codex]")} hooks ${c.pc.dim("\u2192")} ${c.pc.dim(g)}`),console.log(` ${c.pc.dim("\u2192")} ${k("[codex]")} config ${c.pc.dim("\u2192")} ${c.pc.dim((0,n.codexConfigTomlPath)(e))}`),t){const h=a==="main-agent"?`${c.pc.yellow("main-agent")} (the main agent drives the devtools tools directly)`:`${c.pc.bold("sub-agent")} (delegated to the ironbee-verifier custom agent)`;console.log(` ${c.pc.dim("\u2192")} ${k("[codex]")} verify ${c.pc.dim("\u2192")} ${h}`)}i==="enforce"?(console.log(` ${c.pc.dim("\u2192")} ${k("[codex]")} agents ${c.pc.dim("\u2192")} ${c.pc.dim((0,m.join)(e,"AGENTS.md"))}`),console.log(` ${c.pc.dim("\u2192")} ${k("[codex]")} skill ${c.pc.dim("\u2192")} ${c.pc.dim((0,m.join)(e,".agents","skills","ironbee-verification","SKILL.md"))}`),console.log(` ${c.pc.dim("\u2192")} ${k("[codex]")} command ${c.pc.dim("\u2192")} ${c.pc.dim((0,m.join)(e,".agents","skills","ironbee-verify","SKILL.md"))}`)):i==="assist"?(console.log(` ${c.pc.dim("\u2192")} ${k("[codex]")} ${c.pc.yellow("assist mode")} (verification.auto: false) \u2014 manual $ironbee-verify only, no enforcement`),console.log(` ${c.pc.dim("\u2192")} ${k("[codex]")} command ${c.pc.dim("\u2192")} ${c.pc.dim((0,m.join)(e,".agents","skills","ironbee-verify","SKILL.md"))}`)):console.log(` ${c.pc.dim("\u2192")} ${k("[codex]")} ${c.pc.yellow("monitoring-only mode")} (verification.enable: false)`),console.log(),console.log(` ${c.pc.yellow("\u26A0")} ${c.pc.yellow("Codex requires one-time TUI setup:")}`),console.log(` ${c.pc.yellow("1.")} Run ${c.pc.bold("/hooks")} in a fresh Codex session to review and trust IronBee hooks`),console.log(` ${c.pc.yellow("2.")} Restart any open Codex sessions to pick up new hook config`)}uninstall(e){this.cleanupArtifacts(e),(0,s.existsSync)((0,n.codexHooksJsonPath)(e))||this.removeFeaturesHooksFlag(e),(0,N.pruneEmptyDirs)((0,m.join)(e,".codex"));const 
o=(0,B.codexThreadMapPath)(e);if((0,s.existsSync)(o))try{(0,s.unlinkSync)(o)}catch(r){p.logger.debug(`failed to remove codex thread map: ${r}`)}console.log(` ${c.pc.dim("\u2192")} ${k("[codex]")} removed hooks, MCP entries, AGENTS.md block, and skills`)}removeFeaturesHooksFlag(e){const o=(0,n.codexConfigTomlPath)(e);if((0,s.existsSync)(o))try{const r=(0,s.readFileSync)(o,"utf-8");let i=(0,n.removeFeaturesHooks)(r);i=(0,n.removeSandboxWritableRoot)(i,O),i.trim().length===0?(0,s.unlinkSync)(o):i!==r&&(0,s.writeFileSync)(o,i)}catch(r){p.logger.debug(`failed to strip [features] hooks from config.toml: ${r}`)}}cleanupArtifacts(e){this.migrateAwayFromUserLevel();const o=(0,n.codexHooksJsonPath)(e);this.removeIronBeeHooks(o),this.maybeDeleteEmptyHooks(o),this.removeIronBeeMcpServers(e),this.removeVerifierAgentToml(e),this.removeScenarioAgentToml(e);const r=(0,m.join)(e,"AGENTS.md");if((0,s.existsSync)(r))try{const t=(0,s.readFileSync)(r,"utf-8"),a=(0,n.stripAgentsMdBlock)(t);a===null?(0,s.unlinkSync)(r):a!==t&&(0,s.writeFileSync)(r,a)}catch(t){p.logger.debug(`failed to strip AGENTS.md block: ${t}`)}const i=(0,m.join)(e,".agents","skills");this.removeDir((0,m.join)(i,"ironbee-verification")),this.removeDir((0,m.join)(i,"ironbee-verify"));for(const t of F)this.removeDir((0,m.join)(i,t));this.removeDir((0,m.join)(i,"ironbee-run-scenario")),(0,N.pruneEmptyDirs)((0,m.join)(e,".agents"))}async runVerifyGate(e){await(0,Y.run)(e)}async runActivityEnd(e){await(0,z.run)(e)}async runSessionStart(e){await(0,Q.run)(e)}async runActivityStart(e){await(0,Z.run)(e)}async runRequireVerification(e,o){await(0,j.run)(e,o)}async runRequireVerdict(e,o){await(0,ee.run)(e,o)}async runClearVerdict(e){await(0,oe.run)(e)}async runTrackAction(e){await(0,ne.run)(e)}async runTrackActionMonitor(e){await(0,te.run)(e)}async runTrackActionPre(e){await(0,ie.run)(e)}async runSubagentStart(e){await(0,re.run)(e)}async runSubagentStop(e){await(0,se.run)(e)}resolveAgentSessionId(e,o){const 
r=process.env.CODEX_THREAD_ID;if(typeof r=="string"&&r.length>0&&o)return(0,B.lookupThreadSession)(o,r)}async runSessionEnd(e){p.logger.debug("session-end: no-op on Codex (no SessionEnd hook event)")}mergeHooksConfig(e,o,r){const i=o!=="monitor",t=o==="assist"?" --soft":"";(0,s.mkdirSync)((0,m.dirname)(e),{recursive:!0});let a={hooks:{}};if((0,s.existsSync)(e))try{a=JSON.parse((0,s.readFileSync)(e,"utf-8")),a.hooks||(a.hooks={})}catch(v){p.logger.debug(`failed to parse ${e}: ${v}`),a={hooks:{}}}for(const v of Object.keys(a.hooks)){const d=a.hooks[v].filter(b=>!K(b));d.length===0?delete a.hooks[v]:a.hooks[v]=d}const g=S((v,d,b)=>{a.hooks[v]||(a.hooks[v]=[]),a.hooks[v].push({matcher:d,hooks:[{type:"command",command:b}]})},"addGroup");g("SessionStart",".*","ironbee hook session-start --client codex"),g("UserPromptSubmit",".*","ironbee hook activity-start --client codex"),g("PreToolUse",".*","ironbee hook track-action-pre --client codex"),i&&(g("PreToolUse","^mcp__(browser|node|backend|android|terminal)[-_]devtools__.*",`ironbee hook require-verification --client codex${t}`),g("PreToolUse","^apply_patch$",`ironbee hook require-verdict --client codex${t}`),g("PostToolUse","^apply_patch$","ironbee hook clear-verdict --client codex"),r==="sub-agent"&&g("SubagentStart",".*","ironbee hook subagent-start --client codex")),g("SubagentStop",".*","ironbee hook subagent-stop --client codex"),g("PostToolUse",".*",i?"ironbee hook track-action --client codex":"ironbee hook track-action-monitor --client codex"),g("Stop",".*",o==="enforce"?"ironbee hook verify-gate --client codex":"ironbee hook activity-end --client codex"),(0,s.writeFileSync)(e,JSON.stringify(a,null,2))}removeIronBeeHooks(e){if((0,s.existsSync)(e))try{const o=(0,s.readFileSync)(e,"utf-8"),r=JSON.parse(o);if(!r.hooks)return;let i=!1;for(const t of Object.keys(r.hooks)){const a=r.hooks[t].filter(g=>!K(g));a.length!==r.hooks[t].length&&(i=!0),a.length===0?delete 
r.hooks[t]:r.hooks[t]=a}i&&(0,s.writeFileSync)(e,JSON.stringify(r,null,2))}catch(o){p.logger.debug(`failed to strip IronBee hooks from ${e}: ${o}`)}}maybeDeleteEmptyHooks(e){if((0,s.existsSync)(e))try{const o=JSON.parse((0,s.readFileSync)(e,"utf-8"));ve(o)&&(0,s.unlinkSync)(e)}catch(o){p.logger.debug(`failed to inspect ${e} for emptiness: ${o}`)}}mergeConfigToml(e,o,r,i){(0,s.mkdirSync)((0,m.join)(e,".codex"),{recursive:!0});let t=(0,n.readCodexConfigToml)(e);if(t=(0,n.ensureFeaturesHooksTrue)(t),t=r&&(0,X.resolveRuntimeLocation)(e)==="external"?(0,n.ensureSandboxWritableRoot)(t,O):(0,n.removeSandboxWritableRoot)(t,O),t=(0,n.removeMcpServer)(t,w),t=(0,n.removeMcpServer)(t,A),t=(0,n.removeMcpServer)(t,_),t=(0,n.removeMcpServer)(t,I),t=(0,n.removeMcpServer)(t,R),r&&i==="main-agent"){t=this.upsertSessionMcpServers(t,e,o),t=(0,n.removeAgentsTable)(t,$),t=(0,n.removeAgentsTable)(t,x),t=(0,n.removeMultiAgentV2SpawnMetadata)(t),this.removeVerifierAgentToml(e),this.removeScenarioAgentToml(e),(0,n.writeCodexConfigToml)(e,t);return}if(r){const g=(0,l.getVerificationModel)(o,"codex"),h=(0,s.existsSync)((0,n.userCodexConfigTomlPath)())?(0,s.readFileSync)((0,n.userCodexConfigTomlPath)(),"utf-8"):"",u=(0,n.extractTomlTopLevelModel)(t)===null&&(0,n.extractTomlTopLevelModel)(h)===null;g===void 0&&u&&console.log(` ${c.pc.dim("\u2192")} ${k("[codex]")} ${c.pc.yellow("\u26A0 no model for the verifier")} \u2014 the ${c.pc.bold("ironbee-verifier")} sub-agent inherits the session model, but neither this project's .codex/config.toml nor ~/.codex/config.toml has a top-level ${c.pc.bold("model")}, so it may fail to spawn ("could not resolve the child model"). 
Fix: set ${c.pc.bold("model")} in ~/.codex/config.toml, or set ${c.pc.bold("verification.model")} in your ironbee config.`),this.writeVerifierAgentToml(e,o,g),t=(0,n.upsertAgentsTable)(t,$,[`description = ${JSON.stringify(J)}`,`config_file = ${JSON.stringify(`agents/${$}.toml`)}`]),t=(0,n.ensureMultiAgentV2SpawnMetadataExposed)(t),this.writeScenarioAgentToml(e,o,g),t=(0,n.upsertAgentsTable)(t,x,[`description = ${JSON.stringify(G)}`,`config_file = ${JSON.stringify(`agents/${x}.toml`)}`])}else t=(0,n.removeAgentsTable)(t,$),t=(0,n.removeAgentsTable)(t,x),t=(0,n.removeMultiAgentV2SpawnMetadata)(t),this.removeVerifierAgentToml(e),this.removeScenarioAgentToml(e);(0,n.writeCodexConfigToml)(e,t)}writeVerifierAgentToml(e,o,r){this.writeCustomAgentToml(e,o,r,$,J,"skill","read-only")}writeScenarioAgentToml(e,o,r){this.writeCustomAgentToml(e,o,r,x,G,"scenario","read-only")}writeCustomAgentToml(e,o,r,i,t,a,g){const h=(0,m.join)(__dirname,"agents",`${i}.md`);let u;try{u=(0,s.readFileSync)(h,"utf-8")}catch(y){p.logger.debug(`failed to read agent source ${h}: ${y}`);return}const v=H("codex");for(const y of l.ALL_CYCLES){const E=(0,l.isCycleEnabled)(o,y)?ae=>{const V=(0,m.join)(v,(0,C.fragmentFilename)(a,y,ae));return(0,s.existsSync)(V)?(0,s.readFileSync)(V,"utf-8").trimEnd():null}:null;u=(0,C.applyPlatformSection)(u,y,E,`${i}.toml`)}const d=[];d.push(`name = ${JSON.stringify(i)}`),d.push(`description = ${JSON.stringify(t)}`),d.push(`sandbox_mode = ${JSON.stringify(g)}`),r&&d.push(`model = ${JSON.stringify(r)}`),d.push("developer_instructions = '''"),d.push(u.replace(/'''/g,"```").trimEnd()),d.push("'''");const b=S((y,T,E)=>{y&&(d.push(""),d.push(`[mcp_servers.${T}]`),d.push(...U(E)),d.push(`startup_timeout_sec = ${L}`),d.push("required = true"),d.push('default_tools_approval_mode = 
"approve"'))},"addCycle");b((0,l.isCycleEnabled)(o,"browser"),w,(0,l.getMcpServerEntry)(e)),b((0,l.isCycleEnabled)(o,"node"),A,(0,l.getNodeDevToolsMcpEntry)(e)),b((0,l.isCycleEnabled)(o,"backend"),_,(0,l.getBackendDevToolsMcpEntry)(e)),b((0,l.isCycleEnabled)(o,"android"),I,(0,l.getAndroidDevToolsMcpEntry)(e)),b((0,l.isCycleEnabled)(o,"terminal"),R,(0,l.getTerminalDevToolsMcpEntry)(e));const M=(0,n.codexAgentTomlPath)(e,i);(0,s.mkdirSync)((0,m.dirname)(M),{recursive:!0}),(0,s.writeFileSync)(M,d.join(`
|
|
2
2
|
`)+`
|
|
3
|
-
`)}upsertSessionMcpServers(e,o,r){let i=e;const t=S((a,g,
|
|
3
|
+
`)}upsertSessionMcpServers(e,o,r){let i=e;const t=S((a,g,h)=>{if(!a)return;const u=[...U(h),`startup_timeout_sec = ${L}`,'default_tools_approval_mode = "approve"'];i=(0,n.upsertMcpServer)(i,g,u)},"addCycle");return t((0,l.isCycleEnabled)(r,"browser"),w,(0,l.getMcpServerEntry)(o)),t((0,l.isCycleEnabled)(r,"node"),A,(0,l.getNodeDevToolsMcpEntry)(o)),t((0,l.isCycleEnabled)(r,"backend"),_,(0,l.getBackendDevToolsMcpEntry)(o)),t((0,l.isCycleEnabled)(r,"android"),I,(0,l.getAndroidDevToolsMcpEntry)(o)),t((0,l.isCycleEnabled)(r,"terminal"),R,(0,l.getTerminalDevToolsMcpEntry)(o)),i}removeVerifierAgentToml(e){const o=(0,n.codexAgentTomlPath)(e,$);if((0,s.existsSync)(o))try{(0,s.unlinkSync)(o)}catch(r){p.logger.debug(`failed to remove verifier agent toml: ${r}`)}}removeScenarioAgentToml(e){const o=(0,n.codexAgentTomlPath)(e,x);if((0,s.existsSync)(o))try{(0,s.unlinkSync)(o)}catch(r){p.logger.debug(`failed to remove scenario agent toml: ${r}`)}}removeIronBeeMcpServers(e){let o=(0,n.readCodexConfigToml)(e);o&&(o=(0,n.removeMcpServer)(o,w),o=(0,n.removeMcpServer)(o,A),o=(0,n.removeMcpServer)(o,_),o=(0,n.removeMcpServer)(o,I),o=(0,n.removeMcpServer)(o,R),o=(0,n.removeAgentsTable)(o,$),o=(0,n.removeAgentsTable)(o,x),o=(0,n.removeMultiAgentV2SpawnMetadata)(o),(0,n.writeCodexConfigToml)(e,o))}migrateAwayFromUserLevel(){const e=(0,n.userCodexHooksJsonPath)();this.removeIronBeeHooks(e),this.maybeDeleteEmptyHooks(e);const o=(0,n.userCodexConfigTomlPath)();if((0,s.existsSync)(o))try{let i=(0,s.readFileSync)(o,"utf-8");const t=i;i=(0,n.removeMcpServer)(i,w),i=(0,n.removeMcpServer)(i,A),i=(0,n.removeMcpServer)(i,_),i=(0,n.removeMcpServer)(i,I),i=(0,n.removeMcpServer)(i,R),i=(0,n.removeAgentsTable)(i,$),i=(0,n.removeMultiAgentV2SpawnMetadata)(i),i!==t&&(0,s.writeFileSync)(o,i)}catch(i){p.logger.debug(`migrate: failed to clean user-level config.toml: ${i}`)}const r=(0,n.userCodexAgentTomlPath)($);if((0,s.existsSync)(r))try{(0,s.unlinkSync)(r)}catch(i){p.logger.debug(`migrate: failed to remove 
user-level verifier toml: ${i}`)}}writeAgentsMdBlock(e,o,r){const i=(0,m.join)(e,"AGENTS.md"),t=r==="main-agent"?"ironbee-verification.main.md":"ironbee-verification.md",a=(0,m.join)(__dirname,"rules",t);let g;try{g=(0,s.readFileSync)(a,"utf-8")}catch(d){p.logger.debug(`failed to read rule source ${a}: ${d}`);return}const h=H("codex");for(const d of l.ALL_CYCLES){const M=(0,l.isCycleEnabled)(o,d)?y=>{const T=(0,m.join)(h,(0,C.fragmentFilename)("rule",d,y));if(!(0,s.existsSync)(T)){const E=y.length>0?`${d}:${y}`:d;return p.logger.debug(`AGENTS.md platform-section ${E}: missing fragment ${T}, using placeholder`),null}return(0,s.readFileSync)(T,"utf-8").trimEnd()}:null;g=(0,C.applyPlatformSection)(g,d,M,"AGENTS.md")}const u=(0,s.existsSync)(i)?(0,s.readFileSync)(i,"utf-8"):"",v=(0,n.upsertAgentsMdBlock)(u,g);(0,s.writeFileSync)(i,v)}writeSkills(e,o,r,i){const t=(0,m.join)(e,".agents","skills"),a=i==="main-agent";if(o){const u=(0,m.join)(t,"ironbee-verification");(0,s.mkdirSync)(u,{recursive:!0});const v=(0,m.join)(__dirname,"skills",a?"ironbee-verification.main.md":"ironbee-verification.md");try{let d=(0,s.readFileSync)(v,"utf-8");a&&(d=this.spliceCycleFragments(d,"skill",r,"ironbee-verification/SKILL.md")),(0,s.writeFileSync)((0,m.join)(u,"SKILL.md"),d)}catch(d){p.logger.debug(`failed to copy skill ${v}: ${d}`)}}const g=(0,m.join)(t,"ironbee-verify");(0,s.mkdirSync)(g,{recursive:!0});const h=(0,m.join)(__dirname,"commands","ironbee-verify",a?"SKILL.main.md":"SKILL.md");try{let u=(0,s.readFileSync)(h,"utf-8");a&&(u=this.spliceCycleFragments(u,"command-verify",r,"ironbee-verify/SKILL.md")),(0,s.writeFileSync)((0,m.join)(g,"SKILL.md"),u)}catch(u){p.logger.debug(`failed to copy verify command ${h}: ${u}`)}for(const u of F){const v=(0,m.join)(t,u);(0,s.mkdirSync)(v,{recursive:!0});const d=(0,m.join)(__dirname,"commands",u,a?"SKILL.main.md":"SKILL.md");try{let 
b=(0,s.readFileSync)(d,"utf-8");a&&(b=this.spliceCycleFragments(b,"scenario",r,`${u}/SKILL.md`)),(0,s.writeFileSync)((0,m.join)(v,"SKILL.md"),b)}catch(b){p.logger.debug(`failed to copy scenario command ${d}: ${b}`)}}}spliceCycleFragments(e,o,r,i){const t=H("codex");let a=e;for(const g of l.ALL_CYCLES){const u=(0,l.isCycleEnabled)(r,g)?v=>{const d=(0,m.join)(t,(0,C.fragmentFilename)(o,g,v));return(0,s.existsSync)(d)?(0,s.readFileSync)(d,"utf-8").trimEnd():null}:null;a=(0,C.applyPlatformSection)(a,g,u,i)}return a}removeDir(e){if((0,s.existsSync)(e))try{(0,s.rmSync)(e,{recursive:!0,force:!0})}catch(o){p.logger.debug(`failed to remove ${e}: ${o}`)}}}function U(f){return(0,n.tomlBodyFromRecord)(f)}S(U,"mcpEntryToTomlBody");0&&(module.exports={CodexClient});
|
|
@@ -0,0 +1,61 @@
|
|
|
1
|
+
<!-- Terminal verification is ENABLED for this project. -->
|
|
2
|
+
|
|
3
|
+
## Terminal Mode (when `terminal.verifyPatterns` matches an edited file)
|
|
4
|
+
|
|
5
|
+
> **Precondition: the change must have terminal-observable behavior.** If the change is a web-only UI with no command-line / REPL / TUI surface, this section does NOT apply — `tdt_*` tools spawn a program attached to a PTY. Just do browser verification.
|
|
6
|
+
|
|
7
|
+
If the project has terminal verification enabled (run `ironbee terminal enable` once at setup) and your edits touch matching paths, the Stop hook also enforces a terminal cycle. The same `verification-start` covers both cycles; one platform-agnostic verdict covers both.
|
|
8
|
+
|
|
9
|
+
### Mode behavior (terminal cycle)
|
|
10
|
+
- **default** (no arg or `default`): exercise only the commands / code paths your diff touched.
|
|
11
|
+
- **full**: exercise every terminal-reachable code path from files matching `terminal.verifyPatterns`.
|
|
12
|
+
- `visual` / `functional`: browser-only modes; terminal cycle behaves as `default` when they are passed.
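
The mode rules above can be sketched as a small resolver. This is an illustration only: `resolveTerminalMode` is a hypothetical name, not something this package exports, and throwing on an unrecognized value is an assumption, since the text does not say how unknown modes are handled.

```javascript
// Hypothetical helper: maps the mode argument to the terminal cycle's
// effective mode, following the three bullets above.
const BROWSER_ONLY_MODES = new Set(["visual", "functional"]);

function resolveTerminalMode(arg) {
  // No argument (or an explicit "default") selects default mode.
  if (arg === undefined || arg === null || arg === "" || arg === "default") {
    return "default";
  }
  if (arg === "full") {
    return "full";
  }
  // Browser-only modes fall back to default for the terminal cycle.
  if (BROWSER_ONLY_MODES.has(arg)) {
    return "default";
  }
  // ASSUMPTION: unknown values are rejected; the source does not specify this.
  throw new Error(`unknown verification mode: ${arg}`);
}
```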
|
|
13
|
+
|
|
14
|
+
### Steps (run within step 3 of the Universal steps above)
|
|
15
|
+
1. **Pick an evidence path** for the changed code:
|
|
16
|
+
- **Run-evidence** (proves a non-interactive command works): run the affected command one-shot with `mcp__terminal-devtools__tdt_pty_run` — it spawns the command attached to a PTY, runs it to completion, and returns the FULL output plus exit code. Confirm the output shows the expected result AND the exit code matches expectation. Best for CLIs, build targets, scripts, and test runs.
|
|
17
|
+
- **Interactive-evidence** (proves a REPL / shell / TUI change works):
|
|
18
|
+
- Spawn the program: `mcp__terminal-devtools__tdt_pty_start` (returns a `paneId`).
|
|
19
|
+
- Drive input: `mcp__terminal-devtools__tdt_interaction_send-keys` (tmux key syntax — `Enter`, `C-c`, `Up`, `Tab`, …) and `mcp__terminal-devtools__tdt_interaction_send-text` (literal text).
|
|
20
|
+
- Synchronize before reading: `mcp__terminal-devtools__tdt_sync_wait-for` (block until the expected output appears — prefer over delays).
|
|
21
|
+
- Capture output: `mcp__terminal-devtools__tdt_content_capture` — `mode: stream` for line-oriented programs (REPLs, shells; incremental `since` cursor reads only new lines), `mode: screen` for full-screen TUIs. Confirm it shows the expected result.
|
|
22
|
+
- Stop the pane: `mcp__terminal-devtools__tdt_pty_stop`.
|
|
23
|
+
- Auxiliary (NOT gate evidence): `mcp__terminal-devtools__tdt_sync_wait-for-idle`, `mcp__terminal-devtools__tdt_content_get-cursor`, `mcp__terminal-devtools__tdt_pty_resize`, `mcp__terminal-devtools__tdt_pty_signal`, `mcp__terminal-devtools__tdt_pty_list`.
|
|
24
|
+
2. **Submit verdict** — platform-agnostic, just status + checks (+ issues/fixes).
|
|
25
|
+
|
|
26
|
+
### Verdict (platform-agnostic)
|
|
27
|
+
```json
|
|
28
|
+
{
|
|
29
|
+
"session_id": "...",
|
|
30
|
+
"status": "pass",
|
|
31
|
+
"checks": ["`mycli build` exits 0 with the new summary line", "REPL `:help` lists the new command"]
|
|
32
|
+
}
|
|
33
|
+
```
|
|
34
|
+
|
|
35
|
+
For a multi-cycle pass, both browser and terminal pass criteria must hold.
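
A verdict of the shape shown above could be assembled with a small builder. This is a minimal sketch, not the CLI's own code: `buildVerdict` is a hypothetical name, and requiring a non-empty `checks` array is an assumption. Only the field names (`session_id`, `status`, `checks`, `issues`, `fixes`) come from this document, and status values other than `"pass"` are not listed here, so the sketch accepts any non-empty string.

```javascript
// Hypothetical builder for the platform-agnostic verdict object.
function buildVerdict({ sessionId, status, checks, issues, fixes }) {
  if (typeof sessionId !== "string" || sessionId.length === 0) {
    throw new Error("session_id is required");
  }
  if (typeof status !== "string" || status.length === 0) {
    throw new Error("status is required");
  }
  // ASSUMPTION: at least one check is expected; the source only shows an example.
  if (!Array.isArray(checks) || checks.length === 0) {
    throw new Error("checks must be a non-empty array");
  }
  const verdict = { session_id: sessionId, status, checks };
  // issues / fixes are optional and only included when present.
  if (Array.isArray(issues) && issues.length > 0) verdict.issues = issues;
  if (Array.isArray(fixes) && fixes.length > 0) verdict.fixes = fixes;
  return verdict;
}
```

One verdict still covers every active cycle, so the `checks` array would list browser and terminal observations together.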
|
|
36
|
+
|
|
37
|
+
---
|
|
38
|
+
|
|
39
|
+
## Default Mode (terminal cycle)
|
|
40
|
+
|
|
41
|
+
Focus on the commands or code paths your diff touched — not the entire program.
|
|
42
|
+
|
|
43
|
+
### 1. Study the changes
|
|
44
|
+
1. Run `git diff --name-only` and `git diff --name-only HEAD~1`
|
|
45
|
+
2. **Ignore `.ironbee/`, `.claude/`, `.cursor/`** — tool config, not application code
|
|
46
|
+
3. **Read the full diff** for every terminal file in scope — note new commands, changed flags, new output lines, changed exit codes, new REPL/TUI behavior
|
|
47
|
+
4. Before spawning, identify: which command / subcommand / REPL command / TUI screen is affected? What input exercises it? What output / exit code proves it works?
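
Steps 1 and 2 above amount to collecting changed paths and dropping tool configuration. A rough sketch, assuming the combined text output of the two `git diff --name-only` commands is passed in as a string and that paths are relative to the repository root; `filterChangedFiles` is a hypothetical name.

```javascript
// Directories the steps above say to ignore (tool config, not application code).
const TOOL_CONFIG_DIRS = [".ironbee/", ".claude/", ".cursor/"];

// Hypothetical helper: turns `git diff --name-only` output into a
// de-duplicated list of application files.
function filterChangedFiles(gitDiffOutput) {
  const paths = gitDiffOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    // ASSUMPTION: the ignored directories sit at the repository root.
    .filter((path) => !TOOL_CONFIG_DIRS.some((dir) => path.startsWith(dir)));
  // The two git commands can report the same file twice.
  return [...new Set(paths)];
}
```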
|
|
48
|
+
|
|
49
|
+
### 2. Verify against the running program
|
|
50
|
+
- **Run-evidence**: run the affected command via `tdt_pty_run`; the output must show the expected result and the exit code must match expectation
|
|
51
|
+
- **Interactive-evidence**: spawn the program, drive the affected input flow (send-keys / send-text), wait for the expected output (`tdt_sync_wait-for`), and capture it (`tdt_content_capture`) — the capture must show the expected state after your change
|
|
52
|
+
|
|
53
|
+
---
|
|
54
|
+
|
|
55
|
+
## Full Mode (`$ironbee-verify full`, terminal cycle)
|
|
56
|
+
|
|
57
|
+
Verify every terminal-reachable code path from files matching `terminal.verifyPatterns`, not just the changed files. Do NOT run `git diff` or scope to recent changes.
|
|
58
|
+
|
|
59
|
+
- Exercise every command / subcommand / REPL command / TUI screen in scope
|
|
60
|
+
- Drive at least one happy-path flow AND one error-path flow per command (confirm both the success output/exit `0` and the expected failure output/non-zero exit)
|
|
61
|
+
- Capture output (run-evidence or interactive-evidence) for each path; no unexpected crashes or stack traces
|
|
@@ -0,0 +1,31 @@
|
|
|
1
|
+
<!-- Terminal verification is ENABLED for this project. The Stop hook
|
|
2
|
+
enforces a terminal cycle whenever an edited file matches
|
|
3
|
+
`terminal.verifyPatterns`. -->
|
|
4
|
+
|
|
5
|
+
## Terminal cycle
|
|
6
|
+
|
|
7
|
+
If an edited file matches `terminal.verifyPatterns`, the change ALSO requires verification through the **terminal-devtools** MCP server (prefix `tdt_`). Terminal-cycle verification means spawning the affected program attached to a PTY and confirming its behavior: either running the command one-shot and checking its output and exit code, OR driving an interactive session (REPL / shell / TUI) and capturing the rendered output.
|
|
8
|
+
|
|
9
|
+
Both cycles can be active simultaneously (e.g. you edit both a React component and a CLI command in the same task). One `verification-start` covers all active cycles; one platform-agnostic verdict covers them all; one retry counter applies globally.
|
|
10
|
+
|
|
11
|
+
### ⚠️ `terminal-devtools` is ONLY for terminal-observable behavior
|
|
12
|
+
|
|
13
|
+
`terminal-devtools` drives CLIs, REPLs, shells, and TUIs through a PTY. It does NOT apply to web-only UI changes with no command-line surface. If the change produces no terminal-observable output (stdout / stderr / exit code / rendered TUI), do NOT call `tdt_*` tools — use the browser cycle for web-only projects.
|
|
14
|
+
|
|
15
|
+
**Misconfiguration recovery.** If the terminal cycle is being enforced for a change with no terminal-observable behavior, the operator enabled the terminal cycle by mistake. The Stop hook will keep blocking with `incomplete_tools` for the terminal cycle. Don't attempt to spawn a PTY. Instead, stop and clearly report to the user that this change has no terminal-observable behavior, and ask them to run `ironbee terminal disable` to unblock the gate.
|
|
16
|
+
|
|
17
|
+
### Terminal-cycle additions to the main flow
|
|
18
|
+
|
|
19
|
+
These attach to the **Required steps** above — they don't replace any step. Numbering follows the main flow:
|
|
20
|
+
|
|
21
|
+
- **Within step 3 (run flow):** also run the terminal flow by picking ONE evidence path:
|
|
22
|
+
- **Run-evidence**: run the affected command one-shot (`tdt_pty_run`) and confirm its output AND exit code match expectation
|
|
23
|
+
- **Interactive-evidence**: spawn a pane (`tdt_pty_start`) → drive input (`tdt_interaction_send-keys` / `tdt_interaction_send-text`) → synchronize (`tdt_sync_wait-for`) → capture output (`tdt_content_capture`, `mode: stream` for REPLs/shells, `mode: screen` for TUIs) → stop the pane (`tdt_pty_stop`). Auxiliary only (NOT evidence): `tdt_sync_wait-for-idle`, `tdt_content_get-cursor`, `tdt_pty_resize` / `tdt_pty_signal` / `tdt_pty_list`.
|
|
24
|
+
- **Within step 6 (submit verdict):** submit one platform-agnostic verdict with `status` + `checks` (+ `issues`/`fixes` as needed). Terminal-cycle pass criteria: (command ran via `tdt_pty_run` with output + exit code confirmed) OR (pane spawned AND input driven AND output captured showing the expected result).
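
The terminal-cycle pass criteria above are a plain boolean expression, sketched here for clarity. The function name and the field names on `evidence` are hypothetical bookkeeping flags invented for this illustration; they are not a data structure this package defines.

```javascript
// Hypothetical check: (run-evidence complete) OR (interactive-evidence complete).
function terminalCyclePasses(evidence = {}) {
  const run = evidence.run;
  // Run-evidence: command ran via tdt_pty_run, output AND exit code confirmed.
  const runOk = Boolean(
    run && run.ranViaPtyRun && run.outputConfirmed && run.exitCodeConfirmed
  );
  const interactive = evidence.interactive;
  // Interactive-evidence: pane spawned AND input driven AND captured output
  // shows the expected result.
  const interactiveOk = Boolean(
    interactive &&
      interactive.paneSpawned &&
      interactive.inputDriven &&
      interactive.outputCapturedWithExpectedResult
  );
  return runOk || interactiveOk;
}
```

Note that a run whose exit code was never checked does not pass, which matches the banned cases listed for the terminal cycle.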
|
|
25
|
+
|
|
26
|
+
### Additional BANNED for terminal cycle
|
|
27
|
+
|
|
28
|
+
- Calling `tdt_*` tools without first opening a verification cycle (`ironbee hook verification-start`).
|
|
29
|
+
- **Calling `tdt_*` tools when the change has NO terminal-observable behavior.** For web-only projects, use only the browser cycle.
|
|
30
|
+
- Claiming `status: pass` for a terminal cycle when no evidence path was exercised.
|
|
31
|
+
- Claiming `status: pass` on the run-evidence path without confirming the exit code, or on the interactive-evidence path without capturing output that shows the expected result.
|
|
@@ -0,0 +1,36 @@
|
|
|
1
|
+
### terminal platform (enabled)
|
|
2
|
+
- **Use for**: CLI / REPL / shell / TUI scenarios driven through a PTY.
|
|
3
|
+
- **Server**: `terminal-devtools` · **scenario tools**: the `tdt_scenario-*` tools
|
|
4
|
+
(`tdt_scenario-add` / `-update` / `-delete` / `-list` / `-search` / `-run`).
|
|
5
|
+
- **Store**: project → `.ironbee/scenarios/tdt`, global → `~/.ironbee/scenarios/tdt` (the
|
|
6
|
+
server's `SCENARIOS_DIR`; you pass `scope`, the server resolves the path).
|
|
7
|
+
- Scenario **scripts** call this platform's tools via `callTool('<bare-tool>', {...})` — discover
|
|
8
|
+
the available `tdt_*` tool names from your connected MCP tool schemas; don't guess.
|
|
9
|
+
|
|
10
|
+
**What to test & how — capture the SAME evidence the verifier would** (a scenario runs FOR
|
|
11
|
+
verification, so its script must collect what the terminal cycle collects). In the script, pick an
|
|
12
|
+
**evidence path** for the changed code area:
|
|
13
|
+
1. **Run-evidence path** — run the affected command one-shot with `tdt_pty_run` (with
|
|
14
|
+
`returnOutput: true`): it spawns the command attached to a PTY, runs it to completion, and returns
|
|
15
|
+
the FULL output plus exit code. Put the returned output AND exit code in your result; the verifier
|
|
16
|
+
reads them to judge whether the change behaved correctly. Best for non-interactive CLIs, build
|
|
17
|
+
targets, scripts, and test runs.
|
|
18
|
+
2. **Interactive-evidence path** — drive a live session:
|
|
19
|
+
- Spawn the program: `tdt_pty_start` (returns a `paneId` you reference for the rest of the script).
|
|
20
|
+
- Drive input: `tdt_interaction_send-keys` (tmux key syntax — `Enter`, `C-c`, `Up`, `Tab`, …) and
|
|
21
|
+
`tdt_interaction_send-text` (literal text).
|
|
22
|
+
- **Synchronize before reading** — `tdt_sync_wait-for` to block until the expected output appears
|
|
23
|
+
(prefer over fixed delays).
|
|
24
|
+
- Capture output: `tdt_content_capture` (with `returnOutput: true`) — `mode: stream` for
|
|
25
|
+
line-oriented programs (REPLs, shells; incremental `since` cursor reads only new lines),
|
|
26
|
+
`mode: screen` for full-screen TUIs. Its captured text is what the verifier reads.
|
|
27
|
+
- Stop the pane: `tdt_pty_stop`.
|
|
28
|
+
- Optional helpers (NOT evidence): `tdt_sync_wait-for-idle` (wait until output settles),
|
|
29
|
+
`tdt_content_get-cursor` (read the stream cursor), `tdt_pty_resize` / `tdt_pty_signal` /
|
|
30
|
+
`tdt_pty_list`.
|
|
31
|
+
|
|
32
|
+
`return` the evidence — the captured output text, the exit code (run-evidence) — **plus explicit
|
|
33
|
+
pass/fail assertions**. That returned result is what `$ironbee-verify scenario:<name>` reads to judge
|
|
34
|
+
functional correctness (from the output text and exit code). **`terminal-devtools` has no
|
|
35
|
+
screenshots / video** — there is no visual artifact to capture; the captured text and exit code ARE
|
|
36
|
+
the evidence. **`terminal-devtools` is for terminal-observable behavior only.**
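
The run-evidence path described above might look roughly like this in a scenario script. Treat it as a sketch under stated assumptions: only `callTool`, the `tdt_pty_run` tool name, and `returnOutput: true` appear in this document. The `command` argument name and the `{ output, exitCode }` result shape are hypothetical and should be checked against the connected MCP tool schemas. `callTool` is passed in as a parameter so the sketch can run without a live server.

```javascript
// Hypothetical scenario body for the run-evidence path.
async function runEvidenceScenario(callTool, command, expectedText) {
  // ASSUMPTION: the argument name `command` and the result fields
  // `output` / `exitCode` are illustrative, not confirmed by the source.
  const result = await callTool("tdt_pty_run", { command, returnOutput: true });

  // Explicit pass/fail assertions, returned alongside the raw evidence.
  const assertions = [
    { name: "exit code is 0", pass: result.exitCode === 0 },
    {
      name: `output contains "${expectedText}"`,
      pass: String(result.output).includes(expectedText),
    },
  ];

  // Return the captured text, the exit code, and the assertions together.
  return {
    output: result.output,
    exitCode: result.exitCode,
    assertions,
    pass: assertions.every((a) => a.pass),
  };
}
```

Returning the raw output next to the assertions keeps the evidence readable even when an assertion fails.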
|
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
|
|
7
7
|
> **Recording (only when `recording.enable` is on in config):** the gate blocks every other browser tool until you first call `mcp__browser-devtools__bdt_content_start-recording`, and `submit-verdict` rejects with `"recording is still active"` unless you call `mcp__browser-devtools__bdt_content_stop-recording` after the steps below. **Treat start/stop as bookends around steps 1-5.** The same is enforced as step 6 of the Universal flow.
|
|
8
8
|
|
|
9
|
-
1. **Navigate**: `mcp__browser-devtools__bdt_navigation_go-to` — go to the affected page(s)
|
|
9
|
+
1. **Navigate**: `mcp__browser-devtools__bdt_navigation_go-to` — go to the affected page(s) **AND any downstream page that renders or consumes what the change produces** — verify the change's effect where it's observed, not only the page the edited file owns
|
|
10
10
|
2. **Interact**: actually exercise what changed — click buttons, fill forms, submit data, trigger workflows. Don't just look at the page.
|
|
11
11
|
3. **Screenshot**: `mcp__browser-devtools__bdt_content_take-screenshot` — capture the final visual state
|
|
12
12
|
4. **Accessibility**: `mcp__browser-devtools__bdt_a11y_take-aria-snapshot` — verify page structure
|