@ironbee-ai/cli 0.25.1 → 0.27.0
- package/CHANGELOG.md +8 -0
- package/dist/clients/claude/agents/ironbee-verifier.md +33 -0
- package/dist/clients/claude/commands/ironbee-verify.md +1 -0
- package/dist/clients/claude/hooks/activity-start.js +1 -1
- package/dist/clients/claude/hooks/session-end.js +1 -1
- package/dist/clients/claude/hooks/subagent-start.js +1 -0
- package/dist/clients/claude/hooks/subagent-stop.js +1 -0
- package/dist/clients/claude/hooks/verify-gate.js +4 -4
- package/dist/clients/claude/index.js +6 -6
- package/dist/clients/claude/platforms/skill.android.md +2 -0
- package/dist/clients/claude/platforms/skill.backend.md +2 -0
- package/dist/clients/claude/platforms/skill.browser.md +2 -0
- package/dist/clients/claude/platforms/skill.node.md +2 -0
- package/dist/clients/claude/rules/ironbee-verification.md +2 -1
- package/dist/clients/claude/skills/ironbee-verification.md +5 -0
- package/dist/clients/codex/agents/ironbee-verifier.md +75 -26
- package/dist/clients/codex/commands/ironbee-verify/SKILL.md +38 -61
- package/dist/clients/codex/index.js +2 -2
- package/dist/clients/codex/platforms/skill.android.md +2 -0
- package/dist/clients/codex/platforms/skill.backend.md +2 -0
- package/dist/clients/codex/platforms/skill.browser.md +2 -0
- package/dist/clients/codex/platforms/skill.node.md +2 -0
- package/dist/clients/codex/rules/ironbee-verification.md +10 -24
- package/dist/clients/codex/skills/ironbee-verification.md +40 -68
- package/dist/clients/codex/util.js +32 -22
- package/dist/clients/cursor/platforms/skill.android.md +2 -0
- package/dist/clients/cursor/platforms/skill.backend.md +2 -0
- package/dist/clients/cursor/platforms/skill.browser.md +2 -0
- package/dist/clients/cursor/platforms/skill.node.md +2 -0
- package/dist/clients/cursor/skills/ironbee-verification.md +21 -0
- package/dist/commands/hook.js +14 -14
- package/dist/commands/update.js +1 -1
- package/dist/hooks/core/activity-end.js +1 -1
- package/dist/hooks/core/activity-participants.js +1 -0
- package/dist/hooks/core/activity.js +1 -1
- package/dist/hooks/core/session-state.js +1 -1
- package/dist/hooks/core/submit-verdict.js +2 -2
- package/dist/hooks/core/verification-lifecycle.js +1 -1
- package/dist/hooks/core/verify-gate.js +24 -24
- package/dist/lib/config.js +1 -1
- package/dist/lib/install-version.js +1 -1
- package/dist/lib/platform-section.js +3 -3
- package/package.json +1 -1
@@ -1,90 +1,67 @@
 ---
 name: ironbee-verify
 description: >
-
-
-
-
-submits a verdict. Default is verify-only (report the verdict and stop); a
-leading `fix` argument adds the fix-and-re-verify loop until pass. A custom
-scenario may ride along with the invocation — inline text or a path to a
-scenario file — defining exactly what to verify.
+Delegate verification of the current code changes to the ironbee-verifier custom agent. Use
+when the user types `$ironbee-verify`. Default is verify-only (report the verdict and stop);
+a leading `fix` argument adds the fix-and-re-verify loop until pass. Optionally pass a custom
+scenario (inline text or a file path) that defines what to verify.
 ---
 
 # IronBee Verify
 
-
+> **Delegate — do NOT verify inline.** Run this command by spawning the **`ironbee-verifier` custom agent** via `spawn_agent` with `agent_type="ironbee-verifier"` **and `fork_turns="none"`** (the default `fork_turns="all"` silently drops the agent_type → a generic toolless agent; not a generic "act as" agent either) and relaying its verdict. The verifier owns the devtools tools; you (the main agent) don't have them. Everything below describes what the **verifier** does — your job is only to spawn it (passing the mode + scenario in its prompt) and report back its verdict.
+
+Verify the current code changes by **delegating to the `ironbee-verifier` custom agent**. It drives the verification tools out-of-band in this **shared session** and returns a verdict summary — so the heavy devtools output (DOM, console, screenshots) stays in its context, not yours. **You do not run the verification tools yourself**: you resolve the mode and scenario (below), spawn the verifier, and relay its result. The gate still runs every active cycle and all must pass for `status: pass`.
 
 ## Mode
 
 The FIRST whitespace-delimited token of whatever the user provided alongside `$ironbee-verify` selects the mode; everything after it is the scenario:
 
-- `fix` → **verify-and-fix**: on a fail verdict, fix the reported issues
+- `fix` → **verify-and-fix**: on a fail verdict, fix the reported issues and re-delegate until the verdict passes.
 - `report` → **verify-only** (the explicit form of the default).
 - Anything else, or nothing → **verify-only** (default), and the WHOLE provided text is the scenario.
 
-**Verify-only** means:
+**Verify-only** means: relay the verdict and STOP — do **not** edit code, do **not** re-delegate on fail. The fail verdict is still submitted and recorded (that's the point — an honest status report). If the user wants the issues repaired, suggest `$ironbee-verify fix`. One caveat (enforce mode): if code was edited earlier in THIS turn, the Stop gate may still block on the fail verdict and demand fixes — follow the gate then; the mode token never overrides enforcement.
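The mode rule in the added `## Mode` lines (first whitespace-delimited token selects the mode, the rest is the scenario) can be sketched as a small parser. This is only an illustration: `parseInvocation` and the returned object shape are hypothetical names, not part of the package, and the exact-match, case-sensitive comparison on `fix` / `report` is an assumption.

```javascript
// Hypothetical helper (not part of @ironbee-ai/cli): splits the text supplied
// alongside `$ironbee-verify` into a mode and a scenario, following the Mode
// rules quoted in the diff above.
function parseInvocation(input) {
  const text = (input ?? "").trim();
  // Capture the first whitespace-delimited token and whatever follows it.
  const match = text.match(/^(\S+)(?:\s+([\s\S]*))?$/);
  if (!match) return { mode: "verify-only", scenario: "" }; // nothing provided
  const [, first, rest = ""] = match;
  // Assumption: the token must match exactly and case-sensitively.
  if (first === "fix") return { mode: "verify-and-fix", scenario: rest.trim() };
  if (first === "report") return { mode: "verify-only", scenario: rest.trim() };
  // Any other first token: default mode, and the WHOLE text is the scenario.
  return { mode: "verify-only", scenario: text };
}
```

Under these assumptions, `parseInvocation("fix login flow")` yields the fix mode with scenario `login flow`, while `parseInvocation("check the login page")` stays verify-only and keeps the whole text as the scenario.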
 
 ## Verification scenario
 
-A custom verification scenario may be supplied when this command is invoked — either as **inline text** or as a **path to a file** (any location, any format;
-
-- **If a scenario is supplied, it is authoritative**: verify exactly what it describes. Drive each active cycle's tools to exercise precisely the flows, states, and endpoints it names — this **replaces** the default "exercise the changed pages/endpoints" guidance.
-- **If the scenario is (or points to) a file path**, read that file with your file-read tool and treat its contents as the scenario. Do not assume a fixed location or format — read whatever path was given.
-- **If the path does not resolve to an existing file**, stop and report `scenario file not found: <path>`, then ask how to proceed — do not verify the literal path string or guess a target.
-- **If no scenario is supplied**, fall back to the default flow: exercise the changed pages/endpoints per the active platform sections below.
-
-Whatever the scenario directs, the gate is unchanged — you must still call every active cycle's required tools and submit a non-empty `checks`. Map each `checks` entry to a concrete scenario step/expectation, and each `issues` entry to a scenario step that failed.
-
-## Universal steps
-
-1. **Start verification**: Run `echo '{"session_id":"<your-session-id>"}' | ironbee hook verification-start` via Bash (substitute the actual session ID printed by the SessionStart hook).
-**In fix mode**, add the intent flag so IronBee's completion gate enforces fix-until-pass:
-`echo '{"session_id":"<your-session-id>"}' | ironbee hook verification-start --intent fix`
-2. **Build and start** the application if not already running.
-3. **For every active cycle, run its flow** — driven by the **Verification scenario** above when one was supplied, otherwise as described in the platform sections near the bottom of this file. All active cycles must be exercised within this same verification cycle.
-4. **Stop** the dev server when verification is complete (every cycle — including the final one).
-5. **Honor any cycle-specific teardown** noted in the platform sections BEFORE submitting your verdict.
-6. **Submit your verdict** via Bash. One verdict covers every active cycle:
-- Pass: `echo '{"session_id":"...","status":"pass","checks":["..."]}' | ironbee hook submit-verdict`
-- Fail: `echo '{"session_id":"...","status":"fail","checks":["..."],"issues":["describe what failed"]}' | ironbee hook submit-verdict`
-- N/A (nothing to verify — never fake evidence): global `echo '{"session_id":"...","status":"not_applicable","reason":["no runtime surface — type-only/config/refactor"]}'`, or per-platform on a pass/fail verdict `"not_applicable_cycles":["browser"],"reason":["server-only change"]`. `reason` is REQUIRED (recorded + observable); strict mode rejects N/A.
-7. **If failed** → collect ALL issues first (finish testing every active cycle) and submit ONE fail verdict with all issues. Then branch by mode:
-- **Verify-only (default)**: report the issues to the user and stop — do not edit code. Suggest `$ironbee-verify fix` to repair them.
-- **Fix mode (`fix` token)**: fix everything, rebuild, and re-verify until pass. Do not fix one issue at a time — batch fixes to avoid repeated build/restart cycles.
-8. If pass after a previous fail, include `"fixes"` in the verdict describing what was fixed.
+A custom verification scenario may be supplied when this command is invoked — either as **inline text** or as a **path to a file** (any location, any format; read at run time).
 
-
+> The scenario is whatever the user provided alongside `$ironbee-verify`, after stripping a leading `fix` / `report` mode token — the remainder is the scenario; empty remainder → the verifier uses its default flow.
 
-
-
+- **If a scenario is supplied, it is authoritative**: the verifier must verify exactly what it describes, exercising precisely the flows/states/endpoints it names — this **replaces** the default "exercise the changed pages/endpoints" guidance.
+- **If the scenario is (or points to) a file path**, read that file with your file-read tool yourself and pass its **contents** into the verifier's prompt (the verifier has no file-read tool). Do not assume a fixed location or format — read whatever path was given.
+- **If the path does not resolve to an existing file**, stop and report `scenario file not found: <path>`, then ask how to proceed — do not delegate with the literal path string or guess a target.
+- **If no scenario is supplied**, the verifier falls back to exercising the changed pages/endpoints per the active cycles.
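The four scenario rules in the added lines above could be sketched roughly as follows. The skill text does not say how to tell a file path from inline text, so `looksLikePath` is a placeholder heuristic and purely an assumption; `resolveScenario` and its result shape are likewise hypothetical names for illustration.

```javascript
const fs = require("node:fs");

// Assumption: treat the scenario as a path when it is a single token that
// contains a path separator or ends in a file extension. The source does not
// define this test; swap in whatever rule fits the project.
function looksLikePath(text) {
  return !/\s/.test(text) && (/[\\/]/.test(text) || /\.[A-Za-z0-9]+$/.test(text));
}

// Hypothetical helper: resolves the scenario before delegating to the verifier.
function resolveScenario(scenario) {
  const text = (scenario ?? "").trim();
  if (text === "") return { kind: "none" };              // verifier uses its default flow
  if (!looksLikePath(text)) return { kind: "inline", text }; // inline text is used as-is
  if (!fs.existsSync(text) || !fs.statSync(text).isFile()) {
    // Stop and report; never delegate with the literal path string.
    return { kind: "error", message: `scenario file not found: ${text}` };
  }
  // Pass the file's contents, not its path: the verifier has no file-read tool.
  return { kind: "file", text: fs.readFileSync(text, "utf-8") };
}
```

Returning an `error` result instead of throwing keeps the "stop, report, then ask how to proceed" step in the caller's hands.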
 
-
-<!--/IRONBEE:PLATFORM:node-->
+## Steps
 
-
-
+1. **Resolve the mode and scenario**: strip a leading `fix` / `report` token (see **Mode**); then file path → read it now; inline text → use as-is; empty → none.
+2. **Spawn the `ironbee-verifier` custom agent** — call `spawn_agent` with **`agent_type="ironbee-verifier"`** AND **`fork_turns="none"`**. The `fork_turns="none"` is REQUIRED: the default `fork_turns="all"` is a full-history fork that silently DROPS the `agent_type` override, giving you a generic agent *without* the verification tools. (Do NOT "act as" the verifier or use a plain generic fork either.) Put the task, the mode, and the resolved scenario in the `message`, e.g.:
+> Verify the current code changes.
+> Mode: \<`fix` in fix mode — OMIT this line entirely in verify-only mode>
+> Scenario: \<the resolved scenario text, or "none — exercise the changed pages/endpoints">
+The verifier runs `verification-start` (relaying the fix intent to IronBee's completion gate, which then enforces fix-until-pass on you) → drives every active cycle's tools → submits the single verdict, all in this shared session. It resolves the session id from the environment, so you don't pass one.
+**Wait for the verifier in the same turn — do NOT background it.** Let it run to completion and read its verdict before responding; a backgrounded verifier can let your turn end (and the Stop gate fire) before its verdict is recorded.
+3. **Relay the verifier's summary** — the verdict status and, on fail, the issues it found.
+4. **On a fail verdict, branch by mode**:
+- **Verify-only (default)**: stop here. Report the issues clearly and suggest `$ironbee-verify fix` to repair them. Do not edit code.
+- **Fix mode (`fix` token)**: fix the issues it reported. Optionally record what you fixed so the next pass verdict can describe it:
+```
+echo '{"fixes":["what you repaired"]}' | ironbee hook record-fix
+```
+Then re-run the verification by re-delegating (step 2) — repeat until the verdict passes. (If you skip `record-fix`, IronBee fills `fixes` from the files you changed since the fail.)
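Step 2 of the added `## Steps` list names three `spawn_agent` inputs (`agent_type`, `fork_turns`, `message`) and a three-line message template. A minimal sketch of assembling them is below; `buildSpawnArgs` is a hypothetical helper, and passing the inputs as one plain object is an assumption about how the tool call is shaped, not something the diff confirms.

```javascript
// Hypothetical helper: assembles the inputs for the `spawn_agent` call
// described in step 2. Only the three field names and the message template
// come from the skill text; the object shape is an assumption.
function buildSpawnArgs(mode, scenarioText) {
  const lines = ["Verify the current code changes."];
  // The Mode line is omitted entirely in verify-only mode.
  if (mode === "verify-and-fix") lines.push("Mode: fix");
  // Fallback wording copied from the message template in the skill text.
  lines.push(`Scenario: ${scenarioText || "none — exercise the changed pages/endpoints"}`);
  return {
    agent_type: "ironbee-verifier",
    fork_turns: "none", // required: the default "all" drops the agent_type override
    message: lines.join("\n"),
  };
}
```

Hard-coding `fork_turns: "none"` in one place is a deliberate choice here, since the skill text says forgetting it silently produces a generic agent without the verification tools.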
 
-
-<!--/IRONBEE:PLATFORM:android-->
+Do NOT verify inline — always delegate, so your context stays clean. The per-cycle "how to verify" detail (which tools to drive, the verdict expectations) lives in the `ironbee-verifier` custom agent itself — you don't need it here to delegate.
 
 ---
 
-##
-
-If you observe ANY problem on any active cycle — wrong data, unexpected errors, broken interactions, missing evidence, anything that doesn't match the spec — you MUST submit a **fail** verdict.
-
-**Do NOT rationalize away problems.** If something looks wrong or behaves unexpectedly, it IS wrong.
-
-**After a fail verdict in fix mode, you MUST fix the issues and re-verify** — do not just report and stop. In verify-only mode (the default) the opposite holds: report and stop; fixing without the `fix` token is overstepping.
-
-## Verdict Quality
+## What the verifier judges (so you know what to expect back)
 
-
--
-- BAD: `["it works", "looks good", "feature implemented"]`
+- It submits a **fail** verdict on ANY problem on any active cycle — wrong data, unexpected errors, broken interactions, missing evidence. It does not rationalize problems away.
+- Its `checks` are specific observations (e.g. `"submitted valid credentials, redirected to /dashboard"`, `"console clean — 0 errors"`), not `"it works"`.
 
 ## Important
--
--
--
+- The **verifier** produces the verdict; your job is to delegate, relay it, and — in fix mode — fix on fail.
+- **Fix mode only**: a fail verdict means you must fix the issues and re-delegate until pass. In verify-only mode (the default) you report and stop — fixing without the `fix` token is overstepping.
+- Never verify inline to "save a round trip" — delegation keeps your context clean and is the supported path.
@@ -1,3 +1,3 @@
-
"use strict";var E=Object.defineProperty;var W=Object.getOwnPropertyDescriptor;var Y=Object.getOwnPropertyNames;var z=Object.prototype.hasOwnProperty;var h=(u,o)=>E(u,"name",{value:o,configurable:!0});var Q=(u,o)=>{for(var e in o)E(u,e,{get:o[e],enumerable:!0})},Z=(u,o,e,r)=>{if(o&&typeof o=="object"||typeof o=="function")for(let n of Y(o))!z.call(u,n)&&n!==e&&E(u,n,{get:()=>o[n],enumerable:!(r=W(o,n))||r.enumerable});return u};var j=u=>Z(E({},"__esModule",{value:!0}),u);var io={};Q(io,{CodexClient:()=>no});module.exports=j(io);var i=require("fs"),a=require("path"),B=require("../../lib/gitignore"),f=require("../../lib/logger"),l=require("../../lib/output"),P=require("../../lib/fs-prune"),d=require("../../lib/config"),$=require("../../lib/platform-section"),t=require("./util"),H=require("./thread-map"),N=require("./hooks/verify-gate"),O=require("./hooks/activity-end"),V=require("./hooks/session-start"),G=require("./hooks/activity-start"),J=require("./hooks/require-verification"),L=require("./hooks/require-verdict"),F=require("./hooks/clear-verdict"),K=require("./hooks/track-action"),U=require("./hooks/track-action-monitor"),q=require("./hooks/track-action-pre"),D=require("./hooks/subagent-start"),X=require("./hooks/subagent-stop");const w="browser-devtools",T="node-devtools",A="backend-devtools",_="android-devtools",oo="ironbee",k="ironbee-verifier",I="Verifies recent code changes through real browser/runtime/backend tools and submits the IronBee verdict. Spawn this custom agent (by agent_type) after editing code to run the verification cycle out-of-band \u2014 it drives the devtools tools, judges the result, and records the verdict in the shared session. 
It does NOT edit code.";function R(u){return(0,a.join)(__dirname,"..",u,"platforms")}h(R,"platformsDirFor");function b(u){return l.pc.dim(u)}h(b,"codexColor");function M(u){return u.hooks.some(o=>o.command.includes(oo))}h(M,"isIronBeeHookGroup");function eo(u){const o=Object.keys(u);return o.length===0?!0:o.length===1&&o[0]==="hooks"?Object.keys(u.hooks??{}).length===0:!1}h(eo,"isCodexHooksEmpty");class no{constructor(){this.name="codex";this.supportsVerifierModel=!0}static{h(this,"CodexClient")}detect(o){return(0,i.existsSync)((0,a.join)(o,".agents","skills","ironbee-verify"))}resolveProjectDir(){return process.env.CODEX_PROJECT_DIR??process.env.IRONBEE_PROJECT_DIR??process.cwd()}install(o,e){const r=e??(0,d.loadConfig)(o),n=(0,d.getVerificationMode)(r),s=n!=="monitor";this.cleanupArtifacts(o);const c=(0,t.codexHooksJsonPath)(o);this.mergeHooksConfig(c,n),this.mergeConfigToml(o,r,s),s&&(n==="enforce"&&this.writeAgentsMdBlock(o,r),this.writeSkills(o,n==="enforce"),(0,$.syncPlatformSectionsToConfig)(o,R)),(0,B.ensureIronBeeGitignored)(o),console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} hooks ${l.pc.dim("\u2192")} ${l.pc.dim(c)}`),console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} config ${l.pc.dim("\u2192")} ${l.pc.dim((0,t.codexConfigTomlPath)(o))}`),n==="enforce"?(console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} agents ${l.pc.dim("\u2192")} ${l.pc.dim((0,a.join)(o,"AGENTS.md"))}`),console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} skill ${l.pc.dim("\u2192")} ${l.pc.dim((0,a.join)(o,".agents","skills","ironbee-verification","SKILL.md"))}`),console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} command ${l.pc.dim("\u2192")} ${l.pc.dim((0,a.join)(o,".agents","skills","ironbee-verify","SKILL.md"))}`)):n==="assist"?(console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} ${l.pc.yellow("assist mode")} (verification.auto: false) \u2014 manual $ironbee-verify only, no enforcement`),console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} command ${l.pc.dim("\u2192")} 
${l.pc.dim((0,a.join)(o,".agents","skills","ironbee-verify","SKILL.md"))}`)):console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} ${l.pc.yellow("monitoring-only mode")} (verification.enable: false)`),console.log(),console.log(` ${l.pc.yellow("\u26A0")} ${l.pc.yellow("Codex requires one-time TUI setup:")}`),console.log(` ${l.pc.yellow("1.")} Run ${l.pc.bold("/hooks")} in a fresh Codex session to review and trust IronBee hooks`),console.log(` ${l.pc.yellow("2.")} Restart any open Codex sessions to pick up new hook config`)}uninstall(o){this.cleanupArtifacts(o),(0,P.pruneEmptyDirs)((0,a.join)(o,".codex"));const e=(0,H.codexThreadMapPath)(o);if((0,i.existsSync)(e))try{(0,i.unlinkSync)(e)}catch(r){f.logger.debug(`failed to remove codex thread map: ${r}`)}console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} removed hooks, MCP entries, AGENTS.md block, and skills`)}cleanupArtifacts(o){this.migrateAwayFromUserLevel();const e=(0,t.codexHooksJsonPath)(o);this.removeIronBeeHooks(e),this.maybeDeleteEmptyHooks(e),this.removeIronBeeMcpServers(o),this.removeVerifierAgentToml(o);const r=(0,a.join)(o,"AGENTS.md");if((0,i.existsSync)(r))try{const s=(0,i.readFileSync)(r,"utf-8"),c=(0,t.stripAgentsMdBlock)(s);c===null?(0,i.unlinkSync)(r):c!==s&&(0,i.writeFileSync)(r,c)}catch(s){f.logger.debug(`failed to strip AGENTS.md block: ${s}`)}const n=(0,a.join)(o,".agents","skills");this.removeDir((0,a.join)(n,"ironbee-verification")),this.removeDir((0,a.join)(n,"ironbee-verify")),(0,P.pruneEmptyDirs)((0,a.join)(o,".agents"))}async runVerifyGate(o){await(0,N.run)(o)}async runActivityEnd(o){await(0,O.run)(o)}async runSessionStart(o){await(0,V.run)(o)}async runActivityStart(o){await(0,G.run)(o)}async runRequireVerification(o,e){await(0,J.run)(o,e)}async runRequireVerdict(o,e){await(0,L.run)(o,e)}async runClearVerdict(o){await(0,F.run)(o)}async runTrackAction(o){await(0,K.run)(o)}async runTrackActionMonitor(o){await(0,U.run)(o)}async runTrackActionPre(o){await(0,q.run)(o)}async 
runSubagentStart(o){await(0,D.run)(o)}async runSubagentStop(o){await(0,X.run)(o)}resolveAgentSessionId(o,e){const r=process.env.CODEX_THREAD_ID;if(typeof r=="string"&&r.length>0&&e)return(0,H.lookupThreadSession)(e,r)}async runSessionEnd(o){f.logger.debug("session-end: no-op on Codex (no SessionEnd hook event)")}mergeHooksConfig(o,e){const r=e!=="monitor",n=e==="assist"?" --soft":"";(0,i.mkdirSync)((0,a.dirname)(o),{recursive:!0});let s={hooks:{}};if((0,i.existsSync)(o))try{s=JSON.parse((0,i.readFileSync)(o,"utf-8")),s.hooks||(s.hooks={})}catch(m){f.logger.debug(`failed to parse ${o}: ${m}`),s={hooks:{}}}for(const m of Object.keys(s.hooks)){const v=s.hooks[m].filter(y=>!M(y));v.length===0?delete s.hooks[m]:s.hooks[m]=v}const c=h((m,v,y)=>{s.hooks[m]||(s.hooks[m]=[]),s.hooks[m].push({matcher:v,hooks:[{type:"command",command:y}]})},"addGroup");c("SessionStart",".*","ironbee hook session-start --client codex"),c("UserPromptSubmit",".*","ironbee hook activity-start --client codex"),c("PreToolUse",".*","ironbee hook track-action-pre --client codex"),r&&(c("PreToolUse","^mcp__(browser|node|backend|android)[-_]devtools__.*",`ironbee hook require-verification --client codex${n}`),c("PreToolUse","^apply_patch$",`ironbee hook require-verdict --client codex${n}`),c("PostToolUse","^apply_patch$","ironbee hook clear-verdict --client codex"),c("SubagentStart",".*","ironbee hook subagent-start --client codex")),c("SubagentStop",".*","ironbee hook subagent-stop --client codex"),c("PostToolUse",".*",r?"ironbee hook track-action --client codex":"ironbee hook track-action-monitor --client codex"),c("Stop",".*",e==="enforce"?"ironbee hook verify-gate --client codex":"ironbee hook activity-end --client codex"),(0,i.writeFileSync)(o,JSON.stringify(s,null,2))}removeIronBeeHooks(o){if((0,i.existsSync)(o))try{const e=(0,i.readFileSync)(o,"utf-8"),r=JSON.parse(e);if(!r.hooks)return;let n=!1;for(const s of Object.keys(r.hooks)){const 
c=r.hooks[s].filter(g=>!M(g));c.length!==r.hooks[s].length&&(n=!0),c.length===0?delete r.hooks[s]:r.hooks[s]=c}n&&(0,i.writeFileSync)(o,JSON.stringify(r,null,2))}catch(e){f.logger.debug(`failed to strip IronBee hooks from ${o}: ${e}`)}}maybeDeleteEmptyHooks(o){if((0,i.existsSync)(o))try{const e=JSON.parse((0,i.readFileSync)(o,"utf-8"));eo(e)&&(0,i.unlinkSync)(o)}catch(e){f.logger.debug(`failed to inspect ${o} for emptiness: ${e}`)}}mergeConfigToml(o,e,r){(0,i.mkdirSync)((0,a.join)(o,".codex"),{recursive:!0});let n=(0,t.readCodexConfigToml)(o);if(n=(0,t.ensureFeaturesHooksTrue)(n),n=(0,t.removeMcpServer)(n,w),n=(0,t.removeMcpServer)(n,T),n=(0,t.removeMcpServer)(n,A),n=(0,t.removeMcpServer)(n,_),r){const s=(0,d.getVerificationModel)(e,"codex"),c=(0,i.existsSync)((0,t.userCodexConfigTomlPath)())?(0,i.readFileSync)((0,t.userCodexConfigTomlPath)(),"utf-8"):"",g=(0,t.extractTomlTopLevelModel)(n)===null&&(0,t.extractTomlTopLevelModel)(c)===null;s===void 0&&g&&console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} ${l.pc.yellow("\u26A0 no model for the verifier")} \u2014 the ${l.pc.bold("ironbee-verifier")} sub-agent inherits the session model, but neither this project's .codex/config.toml nor ~/.codex/config.toml has a top-level ${l.pc.bold("model")}, so it may fail to spawn ("could not resolve the child model"). 
Fix: set ${l.pc.bold("model")} in ~/.codex/config.toml, or set ${l.pc.bold("verification.model")} in your ironbee config.`),this.writeVerifierAgentToml(o,e,s),n=(0,t.upsertAgentsTable)(n,k,[`description = ${JSON.stringify(I)}`,`config_file = ${JSON.stringify(`agents/${k}.toml`)}`])}else n=(0,t.removeAgentsTable)(n,k),this.removeVerifierAgentToml(o);(0,t.writeCodexConfigToml)(o,n)}writeVerifierAgentToml(o,e,r){const n=(0,a.join)(__dirname,"agents",`${k}.md`);let s;try{s=(0,i.readFileSync)(n,"utf-8")}catch(v){f.logger.debug(`failed to read verifier agent source ${n}: ${v}`);return}const c=R("codex");for(const v of d.ALL_CYCLES){const S=(0,d.isCycleEnabled)(e,v)?C=>{const x=(0,a.join)(c,(0,$.fragmentFilename)("skill",v,C));return(0,i.existsSync)(x)?(0,i.readFileSync)(x,"utf-8").trimEnd():null}:null;s=(0,$.applyPlatformSection)(s,v,S,`${k}.toml`)}const g=[];g.push(`name = ${JSON.stringify(k)}`),g.push(`description = ${JSON.stringify(I)}`),g.push('sandbox_mode = "read-only"'),r&&g.push(`model = ${JSON.stringify(r)}`),g.push("developer_instructions = '''"),g.push(s.replace(/'''/g,"```").trimEnd()),g.push("'''");const p=h((v,y,S)=>{v&&(g.push(""),g.push(`[mcp_servers.${y}]`),g.push(...to(S)),g.push("required = true"),g.push('default_tools_approval_mode = "approve"'))},"addCycle");p((0,d.isCycleEnabled)(e,"browser"),w,(0,d.getMcpServerEntry)(o)),p((0,d.isCycleEnabled)(e,"node"),T,(0,d.getNodeDevToolsMcpEntry)(o)),p((0,d.isCycleEnabled)(e,"backend"),A,(0,d.getBackendDevToolsMcpEntry)(o)),p((0,d.isCycleEnabled)(e,"android"),_,(0,d.getAndroidDevToolsMcpEntry)(o));const m=(0,t.codexAgentTomlPath)(o,k);(0,i.mkdirSync)((0,a.dirname)(m),{recursive:!0}),(0,i.writeFileSync)(m,g.join(`
+
"use strict";var E=Object.defineProperty;var W=Object.getOwnPropertyDescriptor;var Y=Object.getOwnPropertyNames;var z=Object.prototype.hasOwnProperty;var h=(g,o)=>E(g,"name",{value:o,configurable:!0});var Q=(g,o)=>{for(var n in o)E(g,n,{get:o[n],enumerable:!0})},Z=(g,o,n,r)=>{if(o&&typeof o=="object"||typeof o=="function")for(let e of Y(o))!z.call(g,e)&&e!==n&&E(g,e,{get:()=>o[e],enumerable:!(r=W(o,e))||r.enumerable});return g};var j=g=>Z(E({},"__esModule",{value:!0}),g);var ro={};Q(ro,{CodexClient:()=>to});module.exports=j(ro);var i=require("fs"),a=require("path"),B=require("../../lib/gitignore"),f=require("../../lib/logger"),l=require("../../lib/output"),P=require("../../lib/fs-prune"),u=require("../../lib/config"),$=require("../../lib/platform-section"),t=require("./util"),R=require("./thread-map"),N=require("./hooks/verify-gate"),V=require("./hooks/activity-end"),O=require("./hooks/session-start"),G=require("./hooks/activity-start"),J=require("./hooks/require-verification"),L=require("./hooks/require-verdict"),F=require("./hooks/clear-verdict"),K=require("./hooks/track-action"),U=require("./hooks/track-action-monitor"),q=require("./hooks/track-action-pre"),D=require("./hooks/subagent-start"),X=require("./hooks/subagent-stop");const T="browser-devtools",w="node-devtools",A="backend-devtools",_="android-devtools",oo="ironbee",k="ironbee-verifier",eo=30,I="Verifies recent code changes through real browser/runtime/backend tools and submits the IronBee verdict. Spawn this custom agent (by agent_type) after editing code to run the verification cycle out-of-band \u2014 it drives the devtools tools, judges the result, and records the verdict in the shared session. 
It does NOT edit code.";function H(g){return(0,a.join)(__dirname,"..",g,"platforms")}h(H,"platformsDirFor");function b(g){return l.pc.dim(g)}h(b,"codexColor");function M(g){return g.hooks.some(o=>o.command.includes(oo))}h(M,"isIronBeeHookGroup");function no(g){const o=Object.keys(g);return o.length===0?!0:o.length===1&&o[0]==="hooks"?Object.keys(g.hooks??{}).length===0:!1}h(no,"isCodexHooksEmpty");class to{constructor(){this.name="codex";this.supportsVerifierModel=!0}static{h(this,"CodexClient")}detect(o){return(0,i.existsSync)((0,a.join)(o,".agents","skills","ironbee-verify"))}resolveProjectDir(){return process.env.CODEX_PROJECT_DIR??process.env.IRONBEE_PROJECT_DIR??process.cwd()}install(o,n){const r=n??(0,u.loadConfig)(o),e=(0,u.getVerificationMode)(r),s=e!=="monitor";this.cleanupArtifacts(o);const c=(0,t.codexHooksJsonPath)(o);this.mergeHooksConfig(c,e),this.mergeConfigToml(o,r,s),s&&(e==="enforce"&&this.writeAgentsMdBlock(o,r),this.writeSkills(o,e==="enforce"),(0,$.syncPlatformSectionsToConfig)(o,H)),(0,B.ensureIronBeeGitignored)(o),console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} hooks ${l.pc.dim("\u2192")} ${l.pc.dim(c)}`),console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} config ${l.pc.dim("\u2192")} ${l.pc.dim((0,t.codexConfigTomlPath)(o))}`),e==="enforce"?(console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} agents ${l.pc.dim("\u2192")} ${l.pc.dim((0,a.join)(o,"AGENTS.md"))}`),console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} skill ${l.pc.dim("\u2192")} ${l.pc.dim((0,a.join)(o,".agents","skills","ironbee-verification","SKILL.md"))}`),console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} command ${l.pc.dim("\u2192")} ${l.pc.dim((0,a.join)(o,".agents","skills","ironbee-verify","SKILL.md"))}`)):e==="assist"?(console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} ${l.pc.yellow("assist mode")} (verification.auto: false) \u2014 manual $ironbee-verify only, no enforcement`),console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} command ${l.pc.dim("\u2192")} 
${l.pc.dim((0,a.join)(o,".agents","skills","ironbee-verify","SKILL.md"))}`)):console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} ${l.pc.yellow("monitoring-only mode")} (verification.enable: false)`),console.log(),console.log(` ${l.pc.yellow("\u26A0")} ${l.pc.yellow("Codex requires one-time TUI setup:")}`),console.log(` ${l.pc.yellow("1.")} Run ${l.pc.bold("/hooks")} in a fresh Codex session to review and trust IronBee hooks`),console.log(` ${l.pc.yellow("2.")} Restart any open Codex sessions to pick up new hook config`)}uninstall(o){this.cleanupArtifacts(o),(0,P.pruneEmptyDirs)((0,a.join)(o,".codex"));const n=(0,R.codexThreadMapPath)(o);if((0,i.existsSync)(n))try{(0,i.unlinkSync)(n)}catch(r){f.logger.debug(`failed to remove codex thread map: ${r}`)}console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} removed hooks, MCP entries, AGENTS.md block, and skills`)}cleanupArtifacts(o){this.migrateAwayFromUserLevel();const n=(0,t.codexHooksJsonPath)(o);this.removeIronBeeHooks(n),this.maybeDeleteEmptyHooks(n),this.removeIronBeeMcpServers(o),this.removeVerifierAgentToml(o);const r=(0,a.join)(o,"AGENTS.md");if((0,i.existsSync)(r))try{const s=(0,i.readFileSync)(r,"utf-8"),c=(0,t.stripAgentsMdBlock)(s);c===null?(0,i.unlinkSync)(r):c!==s&&(0,i.writeFileSync)(r,c)}catch(s){f.logger.debug(`failed to strip AGENTS.md block: ${s}`)}const e=(0,a.join)(o,".agents","skills");this.removeDir((0,a.join)(e,"ironbee-verification")),this.removeDir((0,a.join)(e,"ironbee-verify")),(0,P.pruneEmptyDirs)((0,a.join)(o,".agents"))}async runVerifyGate(o){await(0,N.run)(o)}async runActivityEnd(o){await(0,V.run)(o)}async runSessionStart(o){await(0,O.run)(o)}async runActivityStart(o){await(0,G.run)(o)}async runRequireVerification(o,n){await(0,J.run)(o,n)}async runRequireVerdict(o,n){await(0,L.run)(o,n)}async runClearVerdict(o){await(0,F.run)(o)}async runTrackAction(o){await(0,K.run)(o)}async runTrackActionMonitor(o){await(0,U.run)(o)}async runTrackActionPre(o){await(0,q.run)(o)}async 
runSubagentStart(o){await(0,D.run)(o)}async runSubagentStop(o){await(0,X.run)(o)}resolveAgentSessionId(o,n){const r=process.env.CODEX_THREAD_ID;if(typeof r=="string"&&r.length>0&&n)return(0,R.lookupThreadSession)(n,r)}async runSessionEnd(o){f.logger.debug("session-end: no-op on Codex (no SessionEnd hook event)")}mergeHooksConfig(o,n){const r=n!=="monitor",e=n==="assist"?" --soft":"";(0,i.mkdirSync)((0,a.dirname)(o),{recursive:!0});let s={hooks:{}};if((0,i.existsSync)(o))try{s=JSON.parse((0,i.readFileSync)(o,"utf-8")),s.hooks||(s.hooks={})}catch(m){f.logger.debug(`failed to parse ${o}: ${m}`),s={hooks:{}}}for(const m of Object.keys(s.hooks)){const v=s.hooks[m].filter(y=>!M(y));v.length===0?delete s.hooks[m]:s.hooks[m]=v}const c=h((m,v,y)=>{s.hooks[m]||(s.hooks[m]=[]),s.hooks[m].push({matcher:v,hooks:[{type:"command",command:y}]})},"addGroup");c("SessionStart",".*","ironbee hook session-start --client codex"),c("UserPromptSubmit",".*","ironbee hook activity-start --client codex"),c("PreToolUse",".*","ironbee hook track-action-pre --client codex"),r&&(c("PreToolUse","^mcp__(browser|node|backend|android)[-_]devtools__.*",`ironbee hook require-verification --client codex${e}`),c("PreToolUse","^apply_patch$",`ironbee hook require-verdict --client codex${e}`),c("PostToolUse","^apply_patch$","ironbee hook clear-verdict --client codex"),c("SubagentStart",".*","ironbee hook subagent-start --client codex")),c("SubagentStop",".*","ironbee hook subagent-stop --client codex"),c("PostToolUse",".*",r?"ironbee hook track-action --client codex":"ironbee hook track-action-monitor --client codex"),c("Stop",".*",n==="enforce"?"ironbee hook verify-gate --client codex":"ironbee hook activity-end --client codex"),(0,i.writeFileSync)(o,JSON.stringify(s,null,2))}removeIronBeeHooks(o){if((0,i.existsSync)(o))try{const n=(0,i.readFileSync)(o,"utf-8"),r=JSON.parse(n);if(!r.hooks)return;let e=!1;for(const s of Object.keys(r.hooks)){const 
c=r.hooks[s].filter(d=>!M(d));c.length!==r.hooks[s].length&&(e=!0),c.length===0?delete r.hooks[s]:r.hooks[s]=c}e&&(0,i.writeFileSync)(o,JSON.stringify(r,null,2))}catch(n){f.logger.debug(`failed to strip IronBee hooks from ${o}: ${n}`)}}maybeDeleteEmptyHooks(o){if((0,i.existsSync)(o))try{const n=JSON.parse((0,i.readFileSync)(o,"utf-8"));no(n)&&(0,i.unlinkSync)(o)}catch(n){f.logger.debug(`failed to inspect ${o} for emptiness: ${n}`)}}mergeConfigToml(o,n,r){(0,i.mkdirSync)((0,a.join)(o,".codex"),{recursive:!0});let e=(0,t.readCodexConfigToml)(o);if(e=(0,t.ensureFeaturesHooksTrue)(e),e=(0,t.removeMcpServer)(e,T),e=(0,t.removeMcpServer)(e,w),e=(0,t.removeMcpServer)(e,A),e=(0,t.removeMcpServer)(e,_),r){const s=(0,u.getVerificationModel)(n,"codex"),c=(0,i.existsSync)((0,t.userCodexConfigTomlPath)())?(0,i.readFileSync)((0,t.userCodexConfigTomlPath)(),"utf-8"):"",d=(0,t.extractTomlTopLevelModel)(e)===null&&(0,t.extractTomlTopLevelModel)(c)===null;s===void 0&&d&&console.log(` ${l.pc.dim("\u2192")} ${b("[codex]")} ${l.pc.yellow("\u26A0 no model for the verifier")} \u2014 the ${l.pc.bold("ironbee-verifier")} sub-agent inherits the session model, but neither this project's .codex/config.toml nor ~/.codex/config.toml has a top-level ${l.pc.bold("model")}, so it may fail to spawn ("could not resolve the child model"). 
Fix: set ${l.pc.bold("model")} in ~/.codex/config.toml, or set ${l.pc.bold("verification.model")} in your ironbee config.`),this.writeVerifierAgentToml(o,n,s),e=(0,t.upsertAgentsTable)(e,k,[`description = ${JSON.stringify(I)}`,`config_file = ${JSON.stringify(`agents/${k}.toml`)}`]),e=(0,t.ensureMultiAgentV2SpawnMetadataExposed)(e)}else e=(0,t.removeAgentsTable)(e,k),e=(0,t.removeMultiAgentV2SpawnMetadata)(e),this.removeVerifierAgentToml(o);(0,t.writeCodexConfigToml)(o,e)}writeVerifierAgentToml(o,n,r){const e=(0,a.join)(__dirname,"agents",`${k}.md`);let s;try{s=(0,i.readFileSync)(e,"utf-8")}catch(v){f.logger.debug(`failed to read verifier agent source ${e}: ${v}`);return}const c=H("codex");for(const v of u.ALL_CYCLES){const S=(0,u.isCycleEnabled)(n,v)?C=>{const x=(0,a.join)(c,(0,$.fragmentFilename)("skill",v,C));return(0,i.existsSync)(x)?(0,i.readFileSync)(x,"utf-8").trimEnd():null}:null;s=(0,$.applyPlatformSection)(s,v,S,`${k}.toml`)}const d=[];d.push(`name = ${JSON.stringify(k)}`),d.push(`description = ${JSON.stringify(I)}`),d.push('sandbox_mode = "read-only"'),r&&d.push(`model = ${JSON.stringify(r)}`),d.push("developer_instructions = '''"),d.push(s.replace(/'''/g,"```").trimEnd()),d.push("'''");const p=h((v,y,S)=>{v&&(d.push(""),d.push(`[mcp_servers.${y}]`),d.push(...io(S)),d.push(`startup_timeout_sec = ${eo}`),d.push("required = true"),d.push('default_tools_approval_mode = "approve"'))},"addCycle");p((0,u.isCycleEnabled)(n,"browser"),T,(0,u.getMcpServerEntry)(o)),p((0,u.isCycleEnabled)(n,"node"),w,(0,u.getNodeDevToolsMcpEntry)(o)),p((0,u.isCycleEnabled)(n,"backend"),A,(0,u.getBackendDevToolsMcpEntry)(o)),p((0,u.isCycleEnabled)(n,"android"),_,(0,u.getAndroidDevToolsMcpEntry)(o));const m=(0,t.codexAgentTomlPath)(o,k);(0,i.mkdirSync)((0,a.dirname)(m),{recursive:!0}),(0,i.writeFileSync)(m,d.join(`
|
|
2
2
|
`)+`
|
|
3
|
-
`)}removeVerifierAgentToml(o){const
|
|
3
|
+
`)}removeVerifierAgentToml(o){const n=(0,t.codexAgentTomlPath)(o,k);if((0,i.existsSync)(n))try{(0,i.unlinkSync)(n)}catch(r){f.logger.debug(`failed to remove verifier agent toml: ${r}`)}}removeIronBeeMcpServers(o){let n=(0,t.readCodexConfigToml)(o);n&&(n=(0,t.removeMcpServer)(n,T),n=(0,t.removeMcpServer)(n,w),n=(0,t.removeMcpServer)(n,A),n=(0,t.removeMcpServer)(n,_),n=(0,t.removeAgentsTable)(n,k),n=(0,t.removeMultiAgentV2SpawnMetadata)(n),(0,t.writeCodexConfigToml)(o,n))}migrateAwayFromUserLevel(){const o=(0,t.userCodexHooksJsonPath)();this.removeIronBeeHooks(o),this.maybeDeleteEmptyHooks(o);const n=(0,t.userCodexConfigTomlPath)();if((0,i.existsSync)(n))try{let e=(0,i.readFileSync)(n,"utf-8");const s=e;e=(0,t.removeMcpServer)(e,T),e=(0,t.removeMcpServer)(e,w),e=(0,t.removeMcpServer)(e,A),e=(0,t.removeMcpServer)(e,_),e=(0,t.removeAgentsTable)(e,k),e=(0,t.removeMultiAgentV2SpawnMetadata)(e),e!==s&&(0,i.writeFileSync)(n,e)}catch(e){f.logger.debug(`migrate: failed to clean user-level config.toml: ${e}`)}const r=(0,t.userCodexAgentTomlPath)(k);if((0,i.existsSync)(r))try{(0,i.unlinkSync)(r)}catch(e){f.logger.debug(`migrate: failed to remove user-level verifier toml: ${e}`)}}writeAgentsMdBlock(o,n){const r=(0,a.join)(o,"AGENTS.md"),e=(0,a.join)(__dirname,"rules","ironbee-verification.md");let s;try{s=(0,i.readFileSync)(e,"utf-8")}catch(m){f.logger.debug(`failed to read rule source ${e}: ${m}`);return}const c=H("codex");for(const m of u.ALL_CYCLES){const y=(0,u.isCycleEnabled)(n,m)?S=>{const C=(0,a.join)(c,(0,$.fragmentFilename)("rule",m,S));if(!(0,i.existsSync)(C)){const x=S.length>0?`${m}:${S}`:m;return f.logger.debug(`AGENTS.md platform-section ${x}: missing fragment ${C}, using placeholder`),null}return(0,i.readFileSync)(C,"utf-8").trimEnd()}:null;s=(0,$.applyPlatformSection)(s,m,y,"AGENTS.md")}const d=(0,i.existsSync)(r)?(0,i.readFileSync)(r,"utf-8"):"",p=(0,t.upsertAgentsMdBlock)(d,s);(0,i.writeFileSync)(r,p)}writeSkills(o,n){const 
r=(0,a.join)(o,".agents","skills");if(n){const c=(0,a.join)(r,"ironbee-verification");(0,i.mkdirSync)(c,{recursive:!0});const d=(0,a.join)(__dirname,"skills","ironbee-verification.md");try{const p=(0,i.readFileSync)(d,"utf-8");(0,i.writeFileSync)((0,a.join)(c,"SKILL.md"),p)}catch(p){f.logger.debug(`failed to copy skill ${d}: ${p}`)}}const e=(0,a.join)(r,"ironbee-verify");(0,i.mkdirSync)(e,{recursive:!0});const s=(0,a.join)(__dirname,"commands","ironbee-verify","SKILL.md");try{const c=(0,i.readFileSync)(s,"utf-8");(0,i.writeFileSync)((0,a.join)(e,"SKILL.md"),c)}catch(c){f.logger.debug(`failed to copy verify command ${s}: ${c}`)}}removeDir(o){if((0,i.existsSync)(o))try{(0,i.rmSync)(o,{recursive:!0,force:!0})}catch(n){f.logger.debug(`failed to remove ${o}: ${n}`)}}}function io(g){return(0,t.tomlBodyFromRecord)(g)}h(io,"mcpEntryToTomlBody");0&&(module.exports={CodexClient});
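Reviewer note: the minified `mergeHooksConfig` above is hard to read, so here is a de-minified sketch of the hook layout it appears to write for each mode. The function name `buildCodexHooks` and its return value are illustrative, not package exports. The sketch leaves out the file I/O and the step that strips previously installed IronBee groups.

```javascript
// Hedged sketch, reconstructed from the minified source above.
// buildCodexHooks is a hypothetical name; the real method also reads,
// filters and rewrites .codex/hooks.json, which is omitted here.
function buildCodexHooks(mode) {
  const verificationOn = mode !== "monitor"; // "assist" and "enforce" add the gate hooks
  const soft = mode === "assist" ? " --soft" : ""; // assist mode passes --soft to the gates
  const hooks = {};
  const addGroup = (event, matcher, command) => {
    if (!hooks[event]) hooks[event] = [];
    hooks[event].push({ matcher, hooks: [{ type: "command", command }] });
  };

  addGroup("SessionStart", ".*", "ironbee hook session-start --client codex");
  addGroup("UserPromptSubmit", ".*", "ironbee hook activity-start --client codex");
  addGroup("PreToolUse", ".*", "ironbee hook track-action-pre --client codex");
  if (verificationOn) {
    addGroup("PreToolUse", "^mcp__(browser|node|backend|android)[-_]devtools__.*",
      `ironbee hook require-verification --client codex${soft}`);
    addGroup("PreToolUse", "^apply_patch$", `ironbee hook require-verdict --client codex${soft}`);
    addGroup("PostToolUse", "^apply_patch$", "ironbee hook clear-verdict --client codex");
    addGroup("SubagentStart", ".*", "ironbee hook subagent-start --client codex");
  }
  addGroup("SubagentStop", ".*", "ironbee hook subagent-stop --client codex");
  addGroup("PostToolUse", ".*", verificationOn
    ? "ironbee hook track-action --client codex"
    : "ironbee hook track-action-monitor --client codex");
  addGroup("Stop", ".*", mode === "enforce"
    ? "ironbee hook verify-gate --client codex"
    : "ironbee hook activity-end --client codex");
  return { hooks };
}
```

As written in the minified code, the `SubagentStart` group sits inside the verification-enabled branch while `SubagentStop` is registered in every mode.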
@@ -33,6 +33,8 @@ If you see only `ios/`, `web/`, or no mobile directories — the project does NO
 - Read Logcat output for the tag(s) relevant to the changed code: `mcp__android-devtools__adt_o11y_log-read` or `mcp__android-devtools__adt_o11y_log-follow` (drain a follow with `mcp__android-devtools__adt_o11y_log-get-followed`, stop it with `mcp__android-devtools__adt_o11y_log-stop-follow`).
 - Confirm expected log lines appear AND no unexpected crashes (FATAL / E/ entries for the app package).
 
+**Batch (speed):** connect + launch-app run standalone first (prerequisites). On the device-evidence path, batch the UI interactions + the UI snapshot into one `mcp__android-devtools__adt_execute`; the snapshot captures the state after the batched interactions, so to assert an intermediate state take a snapshot at that point too. The device-evidence screenshot is usually pixel-judged (a visual change) — take THAT one standalone with `includeBase64: true` so you can see it; batch it only when it's purely gate evidence. Log-evidence reads batch together too.
+
 ### Verdict fields
 The verdict is platform-agnostic — submit only semantic judgment:
 
@@ -13,6 +13,8 @@ The **backend protocol cycle** verifies backend changes by driving real protocol
 
 You can satisfy the cycle via **protocol-call evidence** (you drive the request yourself), **log evidence** (something else drives the request, you read the resulting logs), **DB evidence** (you inspect database state directly), or any combination. Pick whichever fits the task; one is enough.
 
+**Batch (speed):** group consecutive `bedt_*` steps into one `mcp__backend-devtools__bedt_execute` — e.g. a POST then a GET that reuses the created id (bind the first call's result: `const r = callTool('bedt_request_http', {…POST…}); callTool('bedt_request_http', { /* GET using an id from r */ })`), register-source + read, or db-connect + query. Keep a step standalone only when you must inspect its result to DECIDE what to do next, not just to pass a value along.
+
 ### Path A — Protocol-call evidence
 
 1. **Confirm a backend service is running** (the user's dev server, Docker compose, k8s port-forward, …). The agent itself does not start the service — ask the user if uncertain.
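Reviewer note: the added "Batch (speed)" paragraph describes binding the first call's result and reusing it in the second. The sketch below shows that pattern. Everything around the two `callTool` lines is assumed: the stub `callTool`, the argument fields (`method`, `url`, `body`), the result shape (`status`, `body`), and the localhost URL are placeholders, since the diff does not show the real `bedt_request_http` schema.

```javascript
// Stand-in for the batch runtime. In a real run, callTool would be supplied
// inside mcp__backend-devtools__bedt_execute; this stub only mimics a tiny
// in-memory HTTP API so the chaining pattern can be exercised locally.
const store = new Map();
let nextId = 1;
function callTool(name, args) {
  if (name !== "bedt_request_http") throw new Error(`unexpected tool: ${name}`);
  if (args.method === "POST") {
    const id = nextId++;
    store.set(id, args.body);
    return { status: 201, body: { id, ...args.body } };
  }
  const id = Number(args.url.split("/").pop());
  return store.has(id)
    ? { status: 200, body: { id, ...store.get(id) } }
    : { status: 404, body: null };
}

// The batch body itself: a POST, then a GET that reuses the created id.
// The argument and result shapes are assumptions, not the documented schema.
function runBatch() {
  const created = callTool("bedt_request_http", {
    method: "POST",
    url: "http://localhost:3000/items",
    body: { name: "widget" },
  });
  const fetched = callTool("bedt_request_http", {
    method: "GET",
    url: `http://localhost:3000/items/${created.body.id}`,
  });
  return { created, fetched };
}
```

The second call only passes a value along, so per the paragraph's rule it belongs in the same batch. A step would stay standalone if its result had to be inspected to decide what to call next.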
@@ -14,6 +14,8 @@
 
 All four tools are MANDATORY (the Stop hook checks each). Functional interaction is expected for every verification.
 
+**Batch (speed):** navigate (step 1) is standalone — read the ARIA snapshot it returns to decide your interactions. Then run steps 2–5 in ONE `mcp__browser-devtools__bdt_execute` batch — `callTool('bdt_interaction_…', …)` for each interaction, `callTool('bdt_content_take-screenshot', …)`, `callTool('bdt_a11y_take-aria-snapshot', …)`, `callTool('bdt_o11y_get-console-messages', …)` — instead of four separate turns. Screenshot/aria/console capture the state AFTER the batched interactions, so batch interactions that lead to ONE state you want to assert; to assert an intermediate state (e.g. a modal that opens then closes) take a screenshot/snapshot at that point too — interleave it in the batch or split into two. The interaction is what makes the evidence meaningful: a batch of just the four evidence tools with no real interaction passes the tool-presence check but verifies nothing. If you must judge the screenshot's pixels, take that one standalone with `includeBase64: true`.
+
 ### Verdict fields
 The verdict is platform-agnostic — you submit only semantic judgment:
 
@@ -31,6 +31,8 @@ If you see `pom.xml`, `build.gradle`, `requirements.txt`, `pyproject.toml`, `go.
 - Read errors: `ndt_debug_get-logs` with the error-level filter.
 4. **Disconnect** (optional): `ndt_debug_disconnect`.
 
+**Batch (speed):** connect (step 2) is standalone discovery. Batch consecutive `ndt_*` calls in one `mcp__node-devtools__ndt_execute` — set several probes together, then later read snapshots/logs together. The exercise step is ALWAYS separate: whatever triggers the code path (a browser/backend call on another server, a CLI command, the user) can't share an `ndt_*` batch — so node runs as set probes (batch) → exercise (separate) → read snapshots (batch).
+
 ### Verdict fields
 The verdict is platform-agnostic — you submit only semantic judgment:
 
@@ -1,30 +1,16 @@
-You MUST verify all code changes
+You MUST verify all code changes before completing any task — by DELEGATING to the `ironbee-verifier` custom agent. You do not have the verification tools; the verifier does. Never verify inline.
 
-
+After editing code, before reporting completion: spawn the `ironbee-verifier` custom agent — call `spawn_agent` with `agent_type="ironbee-verifier"` and `fork_turns="none"` (REQUIRED: the default `fork_turns="all"` silently drops the agent_type → a generic agent without the verification tools; not a generic "act as" agent either) with a prompt describing what to verify. It drives the verification tools, exercises every active cycle (browser / runtime / backend, as wired up for this project), and submits the single verdict in this shared session — then returns a summary. Relay it. **Wait for the verifier in the same turn — do NOT background it; if it is backgrounded your turn can end before its verdict is recorded, leaving your changes unverified.**
 
-
+If verification FAILS: fix the issues the verifier reported, optionally record what you fixed (`echo '{"fixes":["what you repaired"]}' | ironbee hook record-fix`), then re-delegate until it passes. Every code edit (apply_patch) clears the verdict, requiring re-delegation.
 
-
-- judges the result and submits a single verdict via `ironbee hook submit-verdict`,
-- runs inside your session, so the Stop gate sees its work,
-- does NOT edit code — if it finds problems it returns them as `issues` for YOU to fix.
-
-You (the main agent) do **not** have the devtools tools and must not try to drive them. Your job is to edit code, spawn the verifier to verify, and — if it reports a fail — fix the issues and re-spawn it.
-
-## After a fail → fix → record → re-verify
-
-1. The verifier returns a fail verdict with `issues`.
-2. Fix the issues in your code.
-3. Record what you fixed so the next pass verdict captures it (the verifier can't author this — it didn't make the edit):
-```
-echo '{"fixes":["fixed null check in src/foo.ts"]}' | ironbee hook record-fix
-```
-4. Re-spawn the `ironbee-verifier` custom agent. Repeat until it passes.
+The Stop gate blocks completion until a verdict exists for your changes — delegation is the only path.
 
 ## BANNED
 
--
-- "
--
-
-
+- Running the verification tools (`bdt_*` / `ndt_*` / `bedt_*`) or `ironbee hook verification-start` / `submit-verdict` yourself — those are the verifier's job. Delegate.
+- Using the generic `spawn_agent` tool / a plain fork to "be" the verifier — that spawns a DEFAULT agent without the devtools. Spawn the `ironbee-verifier` custom agent by its `agent_type`.
+- Reporting a task complete without delegating verification of your changes.
+- Submitting a verdict based on assumptions, code reading, or prior knowledge — the verifier verifies through real tools.
+- Writing `verdict.json` directly.
+- Backgrounding the verifier custom agent, or ending your turn before it returns its verdict — wait for it in the same turn.
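Reviewer note: the rule text in this hunk is what `writeAgentsMdBlock` wraps between the `<!-- ironbee:start -->` and `<!-- ironbee:end -->` markers in the project's `AGENTS.md`. Below is a de-minified sketch of the marker handling as it appears in the minified `util.js`. The function bodies are reconstructed by hand from that code, so treat them as an approximation.

```javascript
// Hedged reconstruction of the marker-block helpers from the minified util.js.
const START = "<!-- ironbee:start -->";
const END = "<!-- ironbee:end -->";

// Insert the block, or replace an existing one in place.
function upsertAgentsMdBlock(existing, content) {
  const block = `${START}\n${content.trimEnd()}\n${END}`;
  const s = existing.indexOf(START);
  const e = existing.indexOf(END);
  if (s >= 0 && e > s) {
    return existing.slice(0, s) + block + existing.slice(e + END.length);
  }
  return existing.trim().length === 0
    ? block + "\n"
    : existing.trimEnd() + "\n\n" + block + "\n";
}

// Remove the block; null signals that nothing is left and the file can be deleted.
function stripAgentsMdBlock(existing) {
  const s = existing.indexOf(START);
  const e = existing.indexOf(END);
  if (s < 0 || e < s) return existing.trim().length === 0 ? null : existing;
  const before = existing.slice(0, s).trimEnd();
  const after = existing.slice(e + END.length).trimStart();
  const joined = before + (before.length > 0 && after.length > 0 ? "\n\n" : "") + after;
  if (joined.trim().length === 0) return null;
  return joined.endsWith("\n") ? joined : joined + "\n";
}
```

The `null` return matches the uninstall path in `cleanupArtifacts`, which unlinks `AGENTS.md` when stripping leaves no user content behind.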
@@ -2,82 +2,54 @@
 name: ironbee-verification
 description: >
 MANDATORY verification after code changes. Activates when implementing features, fixing
-bugs, modifying UI components, API endpoints, styles, refactoring, or any task that
-
-
-near the bottom of this file. After every code edit you MUST verify the affected
-cycle(s) through real tools and submit a single verdict (pass or fail) before
-reporting task completion. If verification fails, submit the fail verdict first,
-then fix.
+bugs, modifying UI components, API endpoints, styles, refactoring, or any task that changes
+application behavior. After editing code you MUST verify the changes before reporting task
+completion — and you verify by DELEGATING to the ironbee-verifier custom agent, never inline.
 ---
 
 # IronBee Verification
 
-
-No task is complete until changes are verified — through **real tools**, not by reading code or inferring behavior. After verification, you MUST submit a verdict (pass or fail) before doing anything else. If verification fails, submit the fail verdict first, then fix.
-
-## Cycles
-
-IronBee runs verification in **cycles**. A single Stop hook can drive multiple cycles in parallel — every active cycle must pass for your task to complete.
-
-You don't choose which cycle runs — the file pattern decides. A single edited file can match multiple cycles' patterns and activate them all. Cycles always run in parallel within a single Stop run. Each cycle has its own tools, flow steps, and verdict fields.
-
-**See the platform sections near the bottom of this file** for which cycles are active for this project, the tools they expose, and the per-cycle verdict fields you must include.
-
-## Application lifecycle (your responsibility)
-
-For every active cycle you manage the running application:
-- **Build** if needed (`npm run build`, `docker compose build`, …)
-- **Start** before navigating/connecting (`npm run dev`, `docker compose up -d`, …)
-- **Stop** when verification is complete
-
-If already running, skip start. If the build fails, fix it before proceeding.
+> **Delegate — do NOT verify inline.** You verify by spawning the **`ironbee-verifier` custom agent** via `spawn_agent` with `agent_type="ironbee-verifier"` **and `fork_turns="none"`** (the default `fork_turns="all"` silently drops the agent_type → a generic toolless agent; not a generic "act as" agent either) and relaying its verdict. The verifier owns the devtools tools; you (the main agent) don't have them.
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+## Rule
+No task is complete until your changes are verified through **real tools** — and you verify by
+**delegating to the `ironbee-verifier` custom agent**, never inline. You do not have the
+verification tools (browser / runtime / backend devtools); the verifier does. After delegating,
+relay its verdict; on fail, fix the reported issues and re-delegate until it passes.
+
+## How to verify — delegate
+1. Finish your code edits.
+2. Spawn the `ironbee-verifier` custom agent: call `spawn_agent` with `agent_type="ironbee-verifier"`
+AND `fork_turns="none"` (REQUIRED — the default `fork_turns="all"` silently drops the agent_type,
+giving a generic agent without the verification tools; not a generic "act as" agent either) with a
+prompt like *"Verify the recent changes"* (optionally describe what changed, or pass a
+scenario). It drives the verification tools, exercises every active cycle, and submits the
+single verdict in this **shared session** — then returns a short summary.
+**Wait for it in the same turn — do NOT background the verifier.** Let it run to completion
+and read its verdict before you respond. If the verifier is backgrounded, your turn can end
+(and the Stop gate fire) before its verdict is recorded, leaving your changes unverified.
+3. **Relay the verdict.** If it FAILED: fix the issues it reported. Optionally record what you
+fixed so the next pass verdict can describe it:
 ```
-echo '
+echo '{"fixes":["what you repaired"]}' | ironbee hook record-fix
 ```
-
-- Pass → `{ "session_id": "...", "status": "pass", "checks": [...] }`
-- Fail → add `"issues": [...]` describing what failed.
-- Pass after a previous fail → add `"fixes": [...]` describing what was repaired.
-- **The Stop hook enforces that you called the required tools for every active cycle and that the verdict carries non-empty `checks`.**
-8. If failed → fix → rebuild → go back to step 2 → repeat until pass.
-
-<!--IRONBEE:PLATFORM:browser-->
-<!--/IRONBEE:PLATFORM:browser-->
-
-<!--IRONBEE:PLATFORM:node-->
-<!--/IRONBEE:PLATFORM:node-->
-
-<!--IRONBEE:PLATFORM:backend-->
-<!--/IRONBEE:PLATFORM:backend-->
+Then re-delegate. Repeat until it passes.
 
-
-
+The Stop gate enforces this: it blocks completion until a verdict exists for your changes. Since
+you can't verify inline, delegation is the only path forward.
 
-##
--
-
--
--
--
+## BANNED
+- Trying to run the verification tools yourself (`bdt_*` / `ndt_*` / `bedt_*`) or
+`ironbee hook verification-start` / `submit-verdict` — those are the verifier's job. Delegate.
+- Using the generic `spawn_agent` tool / a plain fork to "be" the verifier — that spawns a
+DEFAULT agent without the devtools. Spawn the `ironbee-verifier` custom agent via `spawn_agent` with `agent_type="ironbee-verifier"` and `fork_turns="none"`.
+- Reporting a task complete without delegating verification of your changes.
+- Claiming verification passed based on code reading, assumptions, or prior knowledge.
+- Backgrounding the verifier custom agent (or ending your turn before it returns its verdict) —
+wait for it to finish in the same turn.
 
 ## Subagent teams
--
--
--
+- Implementation subagents write code; they do NOT verify.
+- Verification is ALWAYS delegated to the dedicated `ironbee-verifier` custom agent — it owns the
+per-cycle browser/node/backend flows and the verification tools. Each session's verification
+is isolated via session-specific verdict files.
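Reviewer note: step 3 of the new skill text pipes a JSON payload into `ironbee hook record-fix` with `echo`. When that payload is produced from a script, building it with `JSON.stringify` avoids shell-quoting mistakes. The sketch below assumes only what the skill shows, a `{"fixes":[...]}` object on stdin. The helper names and the non-empty check are illustrative additions, not package behavior.

```javascript
const { spawnSync } = require("node:child_process");

// Build the stdin payload for `ironbee hook record-fix`.
// Rejecting an empty list is an assumption made for this sketch.
function buildRecordFixPayload(fixes) {
  const cleaned = fixes.map((f) => String(f).trim()).filter((f) => f.length > 0);
  if (cleaned.length === 0) {
    throw new Error("record-fix needs at least one non-empty fix description");
  }
  return JSON.stringify({ fixes: cleaned });
}

// Pipe the payload to the CLI instead of going through `echo`.
// Defined but not called here; it needs the ironbee binary on PATH.
function recordFix(fixes) {
  return spawnSync("ironbee", ["hook", "record-fix"], {
    input: buildRecordFixPayload(fixes),
    encoding: "utf-8",
  });
}
```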
|
|
@@ -1,38 +1,48 @@
|
|
|
1
|
-
"use strict";var
|
|
1
|
+
"use strict";var k=Object.defineProperty;var E=Object.getOwnPropertyDescriptor;var L=Object.getOwnPropertyNames;var W=Object.prototype.hasOwnProperty;var o=(n,t)=>k(n,"name",{value:t,configurable:!0});var M=(n,t)=>{for(var e in t)k(n,e,{get:t[e],enumerable:!0})},P=(n,t,e,s)=>{if(t&&typeof t=="object"||typeof t=="function")for(let r of L(t))!W.call(n,r)&&r!==e&&k(n,r,{get:()=>t[r],enumerable:!(s=E(t,r))||s.enumerable});return n};var J=n=>P(k({},"__esModule",{value:!0}),n);var pn={};M(pn,{AGENTS_MD_END_MARKER:()=>x,AGENTS_MD_START_MARKER:()=>v,canonicalizeCodexServerName:()=>C,canonicalizeCodexToolName:()=>$,classifyCodexTool:()=>V,codexAgentTomlPath:()=>en,codexConfigTomlPath:()=>T,codexHooksJsonPath:()=>ln,decodeJwtPayload:()=>A,ensureFeaturesHooksTrue:()=>Z,ensureMultiAgentV2SpawnMetadataExposed:()=>q,extractBashBinary:()=>j,extractCodexMcpServer:()=>I,extractCodexToolInput:()=>D,extractTomlTopLevelModel:()=>rn,findTomlSection:()=>h,normalizeCodexToolName:()=>S,parseCodexHookStdin:()=>B,readCodexConfigToml:()=>an,removeAgentsTable:()=>tn,removeMcpServer:()=>N,removeMultiAgentV2SpawnMetadata:()=>Q,resolveCodexUsage:()=>U,stripAgentsMdBlock:()=>un,tomlBodyFromRecord:()=>sn,upsertAgentsMdBlock:()=>on,upsertAgentsTable:()=>nn,upsertMcpServer:()=>Y,userCodexAgentTomlPath:()=>fn,userCodexConfigTomlPath:()=>cn,userCodexHooksJsonPath:()=>gn,writeCodexConfigToml:()=>dn});module.exports=J(pn);var m=require("fs"),b=require("os"),p=require("path"),y=require("../../lib/logger");function B(n){try{return JSON.parse(n)}catch(t){return y.logger.debug(`failed to parse Codex hook stdin: ${t}`),{}}}o(B,"parseCodexHookStdin");const _="mcp__",z={browser_devtools:"browser-devtools",node_devtools:"node-devtools",backend_devtools:"backend-devtools",android_devtools:"android-devtools"},H=["bdt_","ndt_","bedt_","adt_"];function C(n){return z[n]??n}o(C,"canonicalizeCodexServerName");function $(n){if(!H.some(e=>n.startsWith(e)))return n;const t=n.split("_");return 
t.length<=3?n:`${t[0]}_${t[1]}_${t.slice(2).join("-")}`}o($,"canonicalizeCodexToolName");const F=[["bdt_","browser-devtools"],["ndt_","node-devtools"],["bedt_","backend-devtools"],["adt_","android-devtools"]];function I(n){if(!n)return null;if(n.startsWith(_)){const t=n.slice(_.length),e=t.indexOf("__");return e<0?null:C(t.slice(0,e))}for(const[t,e]of F)if(n.startsWith(t))return e;return null}o(I,"extractCodexMcpServer");function S(n){return n==="exec_command"?"Bash":n==="apply_patch"?"Edit":n==="update_plan"?"TodoWrite":n==="read_file"?"Read":n==="web_search"?"WebSearch":n==="web_fetch"?"WebFetch":n}o(S,"normalizeCodexToolName");function V(n){if(!n)return{tool_type:null,tool_name:"",mcp_server:null};if(n.startsWith(_)){const s=n.slice(_.length),r=s.indexOf("__");if(r>=0){const i=s.slice(0,r),u=C(i),a=s.slice(r+2);return{tool_type:"mcp",tool_name:$(a),mcp_server:u}}}const t=I(n);if(t!==null&&!n.startsWith(_))return{tool_type:"mcp",tool_name:$(n),mcp_server:t};const e=S(n);return n==="spawn_agent"||n==="wait_agent"||n==="close_agent"?{tool_type:"sub_agent",tool_name:e,mcp_server:null}:{tool_type:null,tool_name:e,mcp_server:null}}o(V,"classifyCodexTool");function D(n,t){if(!n||t===void 0)return;if(n==="apply_patch"){if(typeof t=="string")return{input_size:t.length};if(typeof t=="object"&&t!==null){const r=t,i=r.command??r.input;if(typeof i=="string")return{input_size:i.length}}return{input_size:void 0}}if(typeof t!="object"||t===null)return;const e=t;if(S(n)==="Bash"){const r=e.cmd??e.command,i=typeof r=="string"?j(r):void 0;return{workdir:e.workdir,binary:i}}if(n==="update_plan"){const r=e.explanation,i=e.plan;return{explanation:typeof r=="string"?r:void 0,plan_step_count:Array.isArray(i)?i.length:void 0}}if(n==="spawn_agent"){const r=e.agent_type,i=e.message,u=e.fork_context;return{agent_type:typeof r=="string"?r:void 0,message_size:typeof i=="string"?i.length:void 0,fork_context:typeof u=="boolean"?u:void 0}}if(n==="wait_agent"){const 
r=e.targets,i=e.timeout_ms;return{target_count:Array.isArray(r)?r.length:void 0,timeout_ms:typeof i=="number"?i:void 0}}if(n==="close_agent"){const r=e.target;return{target:typeof r=="string"?r:void 0}}if(n==="view_image"){const r=e.path,i=e.detail;return{path:typeof r=="string"?r:void 0,detail:typeof i=="string"?i:void 0}}if(n==="write_stdin"){const r=e.session_id,i=e.chars,u=e.yield_time_ms,a=e.max_output_tokens;return{session_id:typeof r=="number"?r:void 0,chars_size:typeof i=="string"?i.length:void 0,yield_time_ms:typeof u=="number"?u:void 0,max_output_tokens:typeof a=="number"?a:void 0}}if(n.startsWith(_)||I(n)!==null){if("_metadata"in e){const{_metadata:r,...i}=e;return i}return e}}o(D,"extractCodexToolInput");function j(n){const t=n.trim();if(!t)return;const e=t.split(/\s+/);for(const s of e)if(!/^[A-Za-z_][A-Za-z0-9_]*=/.test(s)&&s.length>0)return s.split(/[\\/]/).pop()??s}o(j,"extractBashBinary");function A(n){const t=n.split(".");if(t.length!==3)return null;try{const e=Buffer.from(t[1],"base64url").toString("utf-8"),s=JSON.parse(e);return typeof s!="object"||s===null?null:s}catch{return null}}o(A,"decodeJwtPayload");function K(n){if(typeof n=="string"){const t=A(n);return t?{email:t.email,planType:t["https://api.openai.com/auth"]?.chatgpt_plan_type}:{}}if(typeof n=="object"&&n!==null){const t=n;return{email:t.email,planType:t.chatgpt_plan_type}}return{}}o(K,"extractIdTokenFields");function U(n){const t=n??(0,p.join)((0,b.homedir)(),".codex","auth.json");if(!(0,m.existsSync)(t))return{};try{const e=JSON.parse((0,m.readFileSync)(t,"utf-8")),s=e.auth_mode==="chatgpt"||e.auth_mode==="swic"?"subscription":e.auth_mode==="api"?"api":void 0,{email:r,planType:i}=K(e.tokens?.id_token);return{usageType:s,usagePlan:i?.toLowerCase(),userEmail:r}}catch(e){return y.logger.debug(`failed to parse ${t}: ${e}`),{}}}o(U,"resolveCodexUsage");function X(n,t){return n.trim()===`[${t}]`}o(X,"tableHeaderLineExact");function G(n){const 
t=n.trim();return/^\[\[?[^\]]+\]\]?$/.test(t)}o(G,"isAnyTableHeader");function R(n){const e=n.trim().match(/^\[([^[\]]+)\]$/);return e===null?null:e[1]}o(R,"tableHeaderName");function h(n,t){let e=-1;for(let r=0;r<n.length;r+=1)if(X(n[r],t)){e=r;break}if(e<0)return null;let s=n.length;for(let r=e+1;r<n.length;r+=1)if(G(n[r])){s=r;break}return{startIdx:e,endIdx:s}}o(h,"findTomlSection");function O(n){const t=[...n];for(;t.length>0&&t[t.length-1].trim()==="";)t.pop();return t}o(O,"trimTrailingBlanks");function w(n,t){return n.length===0?t.join(`
|
|
2
2
|
`)+`
|
|
3
3
|
`:n.replace(/\n+$/,"")+`
|
|
4
4
|
|
|
5
5
|
`+t.join(`
|
|
6
6
|
`)+`
|
|
7
|
-
`}o(
|
|
8
|
-
`),e=
|
|
9
|
-
`);return
|
|
10
|
-
`)?
|
|
11
|
-
`}o(Z,"ensureFeaturesHooksTrue");function q(n
|
|
12
|
-
`),
|
|
13
|
-
`);return
|
|
14
|
-
`)?
|
|
15
|
-
`}o(q,"
|
|
16
|
-
`),
|
|
7
|
+
`}o(w,"appendBlockWithSeparator");function Z(n){const t=n.split(`
|
|
8
|
+
`),e=h(t,"features");if(e===null)return w(n,["[features]","hooks = true"]);const s=t.slice(e.startIdx+1,e.endIdx),r=/^\s*hooks\s*=/;let i=!1;for(let d=0;d<s.length;d+=1)if(r.test(s[d])){s[d]="hooks = true",i=!0;break}i||s.unshift("hooks = true");const u=O(s),c=[...t.slice(0,e.startIdx),t[e.startIdx],...u,...e.endIdx<t.length?[""]:[],...t.slice(e.endIdx)].join(`
|
|
9
|
+
`);return c.endsWith(`
|
|
10
|
+
`)?c:c+`
|
|
11
|
+
`}o(Z,"ensureFeaturesHooksTrue");function q(n){const t=n.split(`
|
|
12
|
+
`),e=h(t,"features.multi_agent_v2");if(e===null)return w(n,["[features.multi_agent_v2]","hide_spawn_agent_metadata = false"]);const s=t.slice(e.startIdx+1,e.endIdx),r=/^\s*hide_spawn_agent_metadata\s*=/;let i=!1;for(let d=0;d<s.length;d+=1)if(r.test(s[d])){s[d]="hide_spawn_agent_metadata = false",i=!0;break}i||s.unshift("hide_spawn_agent_metadata = false");const u=O(s),c=[...t.slice(0,e.startIdx),t[e.startIdx],...u,...e.endIdx<t.length?[""]:[],...t.slice(e.endIdx)].join(`
|
|
13
|
+
`);return c.endsWith(`
|
|
14
|
+
`)?c:c+`
|
|
15
|
+
`}o(q,"ensureMultiAgentV2SpawnMetadataExposed");function Q(n){const t=n.split(`
|
|
16
|
+
`),e=h(t,"features.multi_agent_v2");if(e===null)return n;const s=t.slice(e.startIdx+1,e.endIdx).filter(a=>a.trim().length>0);if(!(s.length===1&&/^\s*hide_spawn_agent_metadata\s*=\s*false\s*$/.test(s[0])))return n;const u=[...t.slice(0,e.startIdx),...t.slice(e.endIdx)].join(`
|
|
17
|
+
`).replace(/\n{3,}/g,`
|
|
18
|
+
|
|
19
|
+
`);return u.endsWith(`
|
|
20
|
+
`)?u:u+`
|
|
21
|
+
`}o(Q,"removeMultiAgentV2SpawnMetadata");function Y(n,t,e){const s=`mcp_servers.${t}`,r=n.split(`
|
|
22
|
+
`),i=h(r,s),a=[`[${s}]`,...e];if(i===null)return w(n,a);const c=r.slice(0,i.startIdx),d=r.slice(i.endIdx),l=[...c,...a,...d.length>0?[""]:[],...d].join(`
|
|
23
|
+
`);return l.endsWith(`
|
|
24
|
+
`)?l:l+`
|
|
25
|
+
`}o(Y,"upsertMcpServer");function N(n,t){const e=`mcp_servers.${t}`,s=`${e}.`,r=n.split(`
|
|
26
|
+
`),i=[];let u=!1,a=!1;for(const l of r){const g=R(l);if(g!==null&&(u=g===e||g.startsWith(s),u)){a=!0;continue}u||i.push(l)}if(!a)return n;const c=[];let d=!1;for(const l of i){const g=l.trim().length===0;g&&d||(c.push(l),d=g)}const f=c.join(`
|
|
17
27
|
`);return f.endsWith(`
|
|
18
28
|
`)||f.length===0?f:f+`
|
|
19
|
-
`}o(
|
|
20
|
-
`),i=
|
|
21
|
-
`);return
|
|
22
|
-
`)?
|
|
23
|
-
`}o(
|
|
24
|
-
`),i=[];let
|
|
29
|
+
`}o(N,"removeMcpServer");function nn(n,t,e){const s=`agents.${t}`,r=n.split(`
|
|
30
|
+
`),i=h(r,s),a=[`[${s}]`,...e];if(i===null)return w(n,a);const c=r.slice(0,i.startIdx),d=r.slice(i.endIdx),l=[...c,...a,...d.length>0?[""]:[],...d].join(`
|
|
31
|
+
`);return l.endsWith(`
|
|
32
|
+
`)?l:l+`
|
|
33
|
+
`}o(nn,"upsertAgentsTable");function tn(n,t){const e=`agents.${t}`,s=`${e}.`,r=n.split(`
|
|
34
|
+
`),i=[];let u=!1,a=!1;for(const l of r){const g=R(l);if(g!==null&&(u=g===e||g.startsWith(s),u)){a=!0;continue}u||i.push(l)}if(!a)return n;const c=[];let d=!1;for(const l of i){const g=l.trim().length===0;g&&d||(c.push(l),d=g)}const f=c.join(`
|
|
25
35
|
`);return f.endsWith(`
|
|
26
36
|
`)||f.length===0?f:f+`
|
|
27
|
-
`}o(
|
|
28
|
-
`)){const e=t.trim();if(e.startsWith("["))break;const s=e.match(/^model\s*=\s*"([^"]*)"/);if(s&&s[1].length>0)return s[1]}return null}o(
|
|
37
|
+
`}o(tn,"removeAgentsTable");function en(n,t){return(0,p.join)(n,".codex","agents",`${t}.toml`)}o(en,"codexAgentTomlPath");function rn(n){for(const t of n.split(`
|
|
38
|
+
`)){const e=t.trim();if(e.startsWith("["))break;const s=e.match(/^model\s*=\s*"([^"]*)"/);if(s&&s[1].length>0)return s[1]}return null}o(rn,"extractTomlTopLevelModel");function sn(n){const t=[];for(const[e,s]of Object.entries(n))if(s!=null){if(typeof s=="string")t.push(`${e} = ${JSON.stringify(s)}`);else if(typeof s=="number"||typeof s=="boolean")t.push(`${e} = ${s}`);else if(Array.isArray(s)){const r=s.map(i=>typeof i=="string"?JSON.stringify(i):typeof i=="number"||typeof i=="boolean"?String(i):JSON.stringify(i));t.push(`${e} = [${r.join(", ")}]`)}else if(typeof s=="object"){const r=s,i=[];for(const[u,a]of Object.entries(r))a!=null&&(typeof a=="string"?i.push(`${u} = ${JSON.stringify(a)}`):typeof a=="number"||typeof a=="boolean"?i.push(`${u} = ${a}`):i.push(`${u} = ${JSON.stringify(a)}`));t.push(`${e} = { ${i.join(", ")} }`)}}return t}o(sn,"tomlBodyFromRecord");const v="<!-- ironbee:start -->",x="<!-- ironbee:end -->";function on(n,t){const e=`${v}
|
|
29
39
|
${t.trimEnd()}
|
|
30
|
-
${
|
|
40
|
+
${x}`,s=n.indexOf(v),r=n.indexOf(x);if(s>=0&&r>s){const i=n.slice(0,s),u=n.slice(r+x.length);return i+e+u}return n.trim().length===0?e+`
|
|
31
41
|
`:n.trimEnd()+`
|
|
32
42
|
|
|
33
43
|
`+e+`
|
|
34
|
-
`}o(
|
|
44
|
+
`}o(on,"upsertAgentsMdBlock");function un(n){const t=n.indexOf(v),e=n.indexOf(x);if(t<0||e<t)return n.trim().length===0?null:n;const s=n.slice(0,t).trimEnd(),r=n.slice(e+x.length).trimStart(),i=s+(s.length>0&&r.length>0?`
|
|
35
45
|
|
|
36
46
|
`:"")+r;return i.trim().length===0?null:i.endsWith(`
|
|
37
47
|
`)?i:i+`
|
|
38
|
-
`}o(
|
|
48
|
+
`}o(un,"stripAgentsMdBlock");function an(n){const t=T(n);if(!(0,m.existsSync)(t))return"";try{return(0,m.readFileSync)(t,"utf-8")}catch(e){return y.logger.debug(`failed to read ${t}: ${e}`),""}}o(an,"readCodexConfigToml");function dn(n,t){const e=T(n);try{(0,m.writeFileSync)(e,t)}catch(s){y.logger.debug(`failed to write ${e}: ${s}`)}}o(dn,"writeCodexConfigToml");function T(n){return(0,p.join)(n,".codex","config.toml")}o(T,"codexConfigTomlPath");function ln(n){return(0,p.join)(n,".codex","hooks.json")}o(ln,"codexHooksJsonPath");function cn(){return(0,p.join)((0,b.homedir)(),".codex","config.toml")}o(cn,"userCodexConfigTomlPath");function gn(){return(0,p.join)((0,b.homedir)(),".codex","hooks.json")}o(gn,"userCodexHooksJsonPath");function fn(n){return(0,p.join)((0,b.homedir)(),".codex","agents",`${n}.toml`)}o(fn,"userCodexAgentTomlPath");0&&(module.exports={AGENTS_MD_END_MARKER,AGENTS_MD_START_MARKER,canonicalizeCodexServerName,canonicalizeCodexToolName,classifyCodexTool,codexAgentTomlPath,codexConfigTomlPath,codexHooksJsonPath,decodeJwtPayload,ensureFeaturesHooksTrue,ensureMultiAgentV2SpawnMetadataExposed,extractBashBinary,extractCodexMcpServer,extractCodexToolInput,extractTomlTopLevelModel,findTomlSection,normalizeCodexToolName,parseCodexHookStdin,readCodexConfigToml,removeAgentsTable,removeMcpServer,removeMultiAgentV2SpawnMetadata,resolveCodexUsage,stripAgentsMdBlock,tomlBodyFromRecord,upsertAgentsMdBlock,upsertAgentsTable,upsertMcpServer,userCodexAgentTomlPath,userCodexConfigTomlPath,userCodexHooksJsonPath,writeCodexConfigToml});
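The minified `on` / `un` pair above (exported as `upsertAgentsMdBlock` and `stripAgentsMdBlock`) is hard to follow in this form. Below is a readable sketch of what those two helpers appear to do, de-minified by hand from the added lines. The function and constant names come from the export list; the local variable names are chosen here for illustration and are not the package's original source.

```javascript
// Hand de-minified reading of the marker-block helpers in codex/util.js.
// Local names (content, body, block, before, after) are illustrative assumptions.
const AGENTS_MD_START_MARKER = "<!-- ironbee:start -->";
const AGENTS_MD_END_MARKER = "<!-- ironbee:end -->";

// Insert the managed block, or replace it in place if both markers already exist.
function upsertAgentsMdBlock(content, body) {
  const block = `${AGENTS_MD_START_MARKER}\n${body.trimEnd()}\n${AGENTS_MD_END_MARKER}`;
  const start = content.indexOf(AGENTS_MD_START_MARKER);
  const end = content.indexOf(AGENTS_MD_END_MARKER);
  if (start >= 0 && end > start) {
    // Existing block: keep everything outside the markers untouched.
    return content.slice(0, start) + block + content.slice(end + AGENTS_MD_END_MARKER.length);
  }
  // Empty file: the block becomes the whole file.
  if (content.trim().length === 0) return block + "\n";
  // Otherwise append after one blank line.
  return content.trimEnd() + "\n\n" + block + "\n";
}

// Remove the managed block; return null when nothing else is left in the file.
function stripAgentsMdBlock(content) {
  const start = content.indexOf(AGENTS_MD_START_MARKER);
  const end = content.indexOf(AGENTS_MD_END_MARKER);
  if (start < 0 || end < start) return content.trim().length === 0 ? null : content;
  const before = content.slice(0, start).trimEnd();
  const after = content.slice(end + AGENTS_MD_END_MARKER.length).trimStart();
  const joined = before + (before.length > 0 && after.length > 0 ? "\n\n" : "") + after;
  if (joined.trim().length === 0) return null;
  return joined.endsWith("\n") ? joined : joined + "\n";
}
```

Under this reading, upserting twice with different bodies replaces the block rather than duplicating it. A `null` result from `stripAgentsMdBlock` presumably signals to the caller that the file can be deleted, though the caller is not shown in this hunk.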
|
|
@@ -33,6 +33,8 @@ If you see only `ios/`, `web/`, or no mobile directories — the project does NO
|
|
|
33
33
|
- Read Logcat output for the tag(s) relevant to the changed code: `MCP:adt_o11y_log-read` or `MCP:adt_o11y_log-follow` (drain a follow with `MCP:adt_o11y_log-get-followed`, stop it with `MCP:adt_o11y_log-stop-follow`).
|
|
34
34
|
- Confirm expected log lines appear AND no unexpected crashes (FATAL / E/ entries for the app package).
|
|
35
35
|
|
|
36
|
+
**Batch (speed):** connect + launch-app run standalone first (prerequisites). On the device-evidence path, batch the UI interactions + the UI snapshot into one `MCP:adt_execute`; the snapshot captures the state after the batched interactions, so to assert an intermediate state take a snapshot at that point too. The device-evidence screenshot is usually pixel-judged (a visual change) — take THAT one standalone with `includeBase64: true` so you can see it; batch it only when it's purely gate evidence. Log-evidence reads batch together too.
|
|
37
|
+
|
|
36
38
|
### Verdict fields
|
|
37
39
|
The verdict is platform-agnostic — submit only semantic judgment:
|
|
38
40
|
|