@ironbee-ai/cli 0.9.6 → 0.10.1
- package/CHANGELOG.md +2 -0
- package/README.md +15 -40
- package/dist/analysis/scoring.d.ts.map +1 -1
- package/dist/analysis/scoring.js +6 -4
- package/dist/analysis/scoring.js.map +1 -1
- package/dist/analysis/verdict-details.d.ts +5 -5
- package/dist/analysis/verdict-details.d.ts.map +1 -1
- package/dist/analysis/verdict-details.js +5 -5
- package/dist/analysis/verdict-details.js.map +1 -1
- package/dist/analysis/verification-quality.d.ts +4 -3
- package/dist/analysis/verification-quality.d.ts.map +1 -1
- package/dist/analysis/verification-quality.js +7 -44
- package/dist/analysis/verification-quality.js.map +1 -1
- package/dist/clients/claude/commands/ironbee-verify.md +5 -5
- package/dist/clients/claude/hooks/require-verdict.js +2 -2
- package/dist/clients/claude/hooks/session-start.d.ts.map +1 -1
- package/dist/clients/claude/hooks/session-start.js +1 -7
- package/dist/clients/claude/hooks/session-start.js.map +1 -1
- package/dist/clients/claude/platforms/command-verify.backend.md +6 -31
- package/dist/clients/claude/platforms/command-verify.browser.md +2 -2
- package/dist/clients/claude/platforms/command-verify.node.md +8 -14
- package/dist/clients/claude/platforms/rule.backend.md +2 -2
- package/dist/clients/claude/platforms/rule.browser.md +1 -1
- package/dist/clients/claude/platforms/rule.node.md +3 -4
- package/dist/clients/claude/platforms/skill.backend.md +10 -41
- package/dist/clients/claude/platforms/skill.browser.md +4 -7
- package/dist/clients/claude/platforms/skill.node.md +11 -26
- package/dist/clients/claude/rules/ironbee-verification.md +3 -4
- package/dist/clients/claude/skills/ironbee-verification.md +8 -6
- package/dist/clients/cursor/commands/ironbee-verify/SKILL.md +5 -5
- package/dist/clients/cursor/hooks/require-verdict.js +2 -2
- package/dist/clients/cursor/hooks/session-start.d.ts.map +1 -1
- package/dist/clients/cursor/hooks/session-start.js +1 -7
- package/dist/clients/cursor/hooks/session-start.js.map +1 -1
- package/dist/clients/cursor/platforms/command-verify.backend.md +6 -31
- package/dist/clients/cursor/platforms/command-verify.browser.md +2 -2
- package/dist/clients/cursor/platforms/command-verify.node.md +8 -14
- package/dist/clients/cursor/platforms/rule.backend.md +2 -2
- package/dist/clients/cursor/platforms/rule.browser.md +1 -1
- package/dist/clients/cursor/platforms/rule.node.md +3 -4
- package/dist/clients/cursor/platforms/skill.backend.md +10 -41
- package/dist/clients/cursor/platforms/skill.browser.md +4 -7
- package/dist/clients/cursor/platforms/skill.node.md +11 -26
- package/dist/clients/cursor/rules/ironbee-verification.mdc +3 -4
- package/dist/clients/cursor/skills/ironbee-verification.md +8 -6
- package/dist/commands/analyze.d.ts.map +1 -1
- package/dist/commands/analyze.js +0 -9
- package/dist/commands/analyze.js.map +1 -1
- package/dist/commands/login.js +1 -1
- package/dist/commands/login.js.map +1 -1
- package/dist/commands/status.d.ts.map +1 -1
- package/dist/commands/status.js +0 -4
- package/dist/commands/status.js.map +1 -1
- package/dist/commands/verification-toggle.d.ts +7 -5
- package/dist/commands/verification-toggle.d.ts.map +1 -1
- package/dist/commands/verification-toggle.js +7 -5
- package/dist/commands/verification-toggle.js.map +1 -1
- package/dist/commands/verification.d.ts +24 -0
- package/dist/commands/verification.d.ts.map +1 -0
- package/dist/commands/verification.js +65 -0
- package/dist/commands/verification.js.map +1 -0
- package/dist/commands/verify.d.ts.map +1 -1
- package/dist/commands/verify.js +1 -34
- package/dist/commands/verify.js.map +1 -1
- package/dist/hooks/core/actions.d.ts +12 -54
- package/dist/hooks/core/actions.d.ts.map +1 -1
- package/dist/hooks/core/actions.js +50 -4
- package/dist/hooks/core/actions.js.map +1 -1
- package/dist/hooks/core/submit-verdict.d.ts.map +1 -1
- package/dist/hooks/core/submit-verdict.js +2 -3
- package/dist/hooks/core/submit-verdict.js.map +1 -1
- package/dist/hooks/core/verify-gate.d.ts +10 -6
- package/dist/hooks/core/verify-gate.d.ts.map +1 -1
- package/dist/hooks/core/verify-gate.js +59 -163
- package/dist/hooks/core/verify-gate.js.map +1 -1
- package/dist/index.js +3 -6
- package/dist/index.js.map +1 -1
- package/dist/lib/collector.d.ts +37 -4
- package/dist/lib/collector.d.ts.map +1 -1
- package/dist/lib/collector.js +68 -8
- package/dist/lib/collector.js.map +1 -1
- package/dist/lib/config.d.ts +6 -4
- package/dist/lib/config.d.ts.map +1 -1
- package/dist/lib/config.js +2 -1
- package/dist/lib/config.js.map +1 -1
- package/dist/lib/platform-section.d.ts +1 -1
- package/dist/lib/platform-section.js +1 -1
- package/package.json +1 -1
- package/dist/commands/disable-verification.d.ts +0 -16
- package/dist/commands/disable-verification.d.ts.map +0 -1
- package/dist/commands/disable-verification.js +0 -39
- package/dist/commands/disable-verification.js.map +0 -1
- package/dist/commands/enable-verification.d.ts +0 -14
- package/dist/commands/enable-verification.d.ts.map +0 -1
- package/dist/commands/enable-verification.js +0 -37
- package/dist/commands/enable-verification.js.map +0 -1
package/CHANGELOG.md
CHANGED
package/README.md
CHANGED
|
@@ -150,10 +150,10 @@ To revert: `ironbee backend disable` (drops the block clean if no customizations
|
|
|
150
150
|
### Optional: monitoring-only mode (no enforcement)
|
|
151
151
|
|
|
152
152
|
```bash
|
|
153
|
-
ironbee disable
|
|
153
|
+
ironbee verification disable
|
|
154
154
|
```
|
|
155
155
|
|
|
156
|
-
Turns off enforcement but keeps the telemetry path intact. Session lifecycle and tool-call events still flow to the IronBee Collector, but the agent never sees a verify-gate, skill, rule, or `/ironbee-verify` command — useful when you want observability without slowing the agent down. To re-enable: `ironbee enable
|
|
156
|
+
Turns off enforcement but keeps the telemetry path intact. Session lifecycle and tool-call events still flow to the IronBee Collector, but the agent never sees a verify-gate, skill, rule, or `/ironbee-verify` command — useful when you want observability without slowing the agent down. To re-enable: `ironbee verification enable`.
|
|
157
157
|
|
|
158
158
|
The toggle re-renders all client artifacts (hooks, skill, rule, MCP servers, permissions) atomically. The change takes effect on the next agent session — restart your editor / agent after toggling.
|
|
159
159
|
|
|
@@ -182,8 +182,7 @@ ironbee analyze [session-id] Analyze session metrics (or al
|
|
|
182
182
|
ironbee browser <enable|disable> Manage the browser cycle (default-on; bdt_* tools via browser-devtools)
|
|
183
183
|
ironbee node <enable|disable> Manage the Node.js runtime debug cycle (V8 inspector probes via node-devtools)
|
|
184
184
|
ironbee backend <enable|disable> Manage the runtime-agnostic backend protocol cycle (HTTP/gRPC/GraphQL/WS via backend-devtools)
|
|
185
|
-
ironbee enable
|
|
186
|
-
ironbee disable-verification Monitoring-only mode (no enforcement; sessions still ship to collector)
|
|
185
|
+
ironbee verification <enable|disable> Master verification toggle (enable = enforce; disable = monitoring-only, no enforcement but sessions/tools still ship to collector)
|
|
187
186
|
ironbee config get <key> Read a config value (default: merged effective value)
|
|
188
187
|
ironbee config set <key> <value> Write a config value; auto re-renders client artifacts when needed
|
|
189
188
|
ironbee config unset <key> Remove a config value; auto re-renders when needed
|
|
@@ -296,7 +295,7 @@ ironbee config path # print the project config file path
|
|
|
296
295
|
|
|
297
296
|
**Type coercion** — `set` parses the value as JSON when it can (`true`/`42`/`[…]`/`{…}`) and falls back to a raw string when JSON parse fails. URLs and paths pass through unquoted; pass `--json` to force strict JSON parsing (e.g. when you want the literal string `"42"` instead of the number `42`).
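The coercion rule described above can be sketched as follows. This is a minimal illustration, not IronBee's actual implementation: the function name `coerceConfigValue` and its `strictJson` parameter (standing in for the `--json` flag) are hypothetical.

```javascript
// Hypothetical helper mirroring the documented rule: try JSON first,
// fall back to the raw string unless strict parsing was requested.
function coerceConfigValue(raw, strictJson = false) {
    if (strictJson) {
        // Strict mode: invalid JSON is an error, never a silent string fallback.
        return JSON.parse(raw);
    }
    try {
        return JSON.parse(raw);
    } catch {
        // Not valid JSON (e.g. a URL or a path): keep it as a plain string.
        return raw;
    }
}

console.log(coerceConfigValue("true"));                  // true (boolean)
console.log(coerceConfigValue("42"));                    // 42 (number)
console.log(coerceConfigValue("http://localhost:3000")); // http://localhost:3000 (string)
console.log(coerceConfigValue('"42"', true));            // 42 (string, via strict JSON)
```

In this sketch, strict mode is the only way to obtain the string `"42"`, because the lenient path always prefers the JSON interpretation when one exists.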
|
|
298
297
|
|
|
299
|
-
**Smart artifact re-render** — when a top-level key affects installed client artifacts (`verification`, `collector`, `browser`, `node`, `backend`, `browserDevTools`, `nodeDevTools`, `backendDevTools`), `set` and `unset` re-render the client files (hooks, MCP entries, skill, rule, permissions) automatically — same code path `enable
|
|
298
|
+
**Smart artifact re-render** — when a top-level key affects installed client artifacts (`verification`, `collector`, `browser`, `node`, `backend`, `browserDevTools`, `nodeDevTools`, `backendDevTools`), `set` and `unset` re-render the client files (hooks, MCP entries, skill, rule, permissions) automatically — same code path `verification enable` / `node enable` / `backend enable` use. Other keys (`maxRetries`, `recording`, `jobQueue`, `analytics`, `import`, `ignoredVerifyPatterns`) are pure config flips that the next agent session picks up — no rerender needed.
|
|
300
299
|
|
|
301
300
|
Pass `--no-rerender` to skip the rerender on artifact-affecting keys (handy for scripted bulk edits — follow up with `ironbee install` to resync). If a rerender fails midway, the config file is rolled back to its prior bytes so disk state never diverges from installed artifacts.
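The rollback guarantee can be pictured with a short sketch. Everything here is an assumption made for illustration: `writeConfigWithRollback` and the `rerender` callback are hypothetical names, and the real CLI may structure this differently.

```javascript
const fs = require("node:fs");

// Hypothetical sketch: write the new config, run the rerender step, and
// restore the exact prior bytes if the rerender throws.
function writeConfigWithRollback(configPath, newContents, rerender) {
    // Capture the prior bytes (or note that the file did not exist).
    const priorBytes = fs.existsSync(configPath) ? fs.readFileSync(configPath) : null;
    fs.writeFileSync(configPath, newContents);
    try {
        rerender();
    } catch (err) {
        if (priorBytes === null) {
            fs.rmSync(configPath, { force: true });
        } else {
            fs.writeFileSync(configPath, priorBytes);
        }
        throw err; // surface the failure after restoring disk state
    }
}
```

Restoring a raw `Buffer` rather than re-serializing parsed JSON is what keeps the file byte-identical to its prior state in this sketch.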
|
|
302
301
|
|
|
@@ -371,23 +370,22 @@ When the agent tries to complete a task, IronBee runs these checks:
|
|
|
371
370
|
3. **Were the cycle's required tools used?**
|
|
372
371
|
- **Browser cycle**: navigate, screenshot, accessibility snapshot, console check (all-of)
|
|
373
372
|
- **Node cycle**: connect; then either probe path (`(put-tracepoint | put-logpoint | put-exceptionpoint) AND get-probe-snapshots`) OR log path (`get-logs`)
|
|
374
|
-
4. **Does a verdict exist?** — The agent must submit a single verdict
|
|
375
|
-
5. **Is the verdict valid?** —
|
|
376
|
-
6. **Pass or fail?** — `status: "pass"` is honored
|
|
373
|
+
4. **Does a verdict exist?** — The agent must submit a single verdict via `ironbee hook submit-verdict`.
|
|
374
|
+
5. **Is the verdict valid?** — Required: `status` ∈ {pass, fail} + `checks` (non-empty array). On fail, `issues` is required; on pass-after-fail, `fixes` is required.
|
|
375
|
+
6. **Pass or fail?** — Server-derived pass criteria from `tool_call` records is currently a no-op stub (TODO — see `verify-gate.ts`). For now `status: "pass"` is honored as-is. When evidence extractors land, per-cycle pass criteria (zero console errors, probe triggered, evidence path exercised) will be derived from the agent's tool_calls and override `status: pass` to fail when criteria don't hold.
|
|
377
376
|
7. **Retry limit** — After `maxRetries` failed attempts (default 3, single global counter), the agent is allowed to complete but must report unresolved issues.
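The required-tools rule from step 3 can be sketched as two predicates over the set of tools used. This is an illustration only: the browser tool labels are placeholders derived from the prose (the real tool identifiers are not shown here), the node labels reuse the names from the list, and the sketch ignores call ordering.

```javascript
// Placeholder labels for the browser cycle (all four are required).
const BROWSER_REQUIRED = ["navigate", "screenshot", "accessibility-snapshot", "console-check"];
// Any one of these probe setters satisfies the first half of the probe path.
const NODE_PROBE_SETTERS = ["put-tracepoint", "put-logpoint", "put-exceptionpoint"];

function browserCycleSatisfied(toolsUsed) {
    const used = new Set(toolsUsed);
    return BROWSER_REQUIRED.every((tool) => used.has(tool));
}

function nodeCycleSatisfied(toolsUsed) {
    const used = new Set(toolsUsed);
    if (!used.has("connect")) {
        return false; // connect is mandatory on both paths
    }
    const probePath = NODE_PROBE_SETTERS.some((tool) => used.has(tool)) && used.has("get-probe-snapshots");
    const logPath = used.has("get-logs");
    return probePath || logPath;
}
```

Under these assumptions, `["connect", "get-logs"]` satisfies the node cycle through the log path, while `["connect", "put-tracepoint"]` does not, because no probe snapshots were read back.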
|
|
378
377
|
|
|
379
378
|
### Verdict format
|
|
380
379
|
|
|
381
|
-
Verdicts are
|
|
380
|
+
Verdicts are platform-agnostic — the same minimal shape regardless of which cycles (browser / node / backend / multi-cycle) ran. Structural evidence (pages tested, console error counts, probe snapshots, endpoints called, log sources, DB connections, …) is intentionally NOT part of the verdict — the gate (will) derive it from the `tool_call` records of your `bdt_*` / `ndt_*` / `bedt_*` invocations, so the agent cannot misreport it.
|
|
381
|
+
|
|
382
|
+
Submit via `echo '<json>' | ironbee hook submit-verdict`:
|
|
382
383
|
|
|
383
384
|
```json
|
|
384
385
|
{
|
|
385
386
|
"session_id": "<your-session-id>",
|
|
386
387
|
"status": "pass",
|
|
387
|
-
"
|
|
388
|
-
"checks": ["form submits successfully", "new item appears in list"],
|
|
389
|
-
"console_errors": 0,
|
|
390
|
-
"network_failures": 0
|
|
388
|
+
"checks": ["form submits successfully", "new item appears in list"]
|
|
391
389
|
}
|
|
392
390
|
```
|
|
393
391
|
|
|
@@ -397,10 +395,7 @@ On failure, include an `issues` array describing what went wrong:
|
|
|
397
395
|
{
|
|
398
396
|
"session_id": "<your-session-id>",
|
|
399
397
|
"status": "fail",
|
|
400
|
-
"pages_tested": ["http://localhost:3000/dashboard"],
|
|
401
398
|
"checks": ["form renders", "submit button unresponsive"],
|
|
402
|
-
"console_errors": 2,
|
|
403
|
-
"network_failures": 0,
|
|
404
399
|
"issues": ["button click handler not firing", "TypeError in console"]
|
|
405
400
|
}
|
|
406
401
|
```
|
|
@@ -411,31 +406,12 @@ On pass after a previous fail, include a `fixes` array describing what was fixed
|
|
|
411
406
|
{
|
|
412
407
|
"session_id": "<your-session-id>",
|
|
413
408
|
"status": "pass",
|
|
414
|
-
"pages_tested": ["http://localhost:3000/dashboard"],
|
|
415
409
|
"checks": ["form submits successfully", "new item appears in list"],
|
|
416
|
-
"console_errors": 0,
|
|
417
|
-
"network_failures": 0,
|
|
418
410
|
"fixes": ["reattached click handler to submit button", "fixed TypeError in event handler"]
|
|
419
411
|
}
|
|
420
412
|
```
|
|
421
413
|
|
|
422
|
-
|
|
423
|
-
|
|
424
|
-
```json
|
|
425
|
-
{
|
|
426
|
-
"session_id": "<your-session-id>",
|
|
427
|
-
"status": "pass",
|
|
428
|
-
"checks": ["POST /api/orders returned 201", "tracepoint at handler.ts:42 fired once"],
|
|
429
|
-
"node_processes_connected": ["pid:12345 (next-server)"],
|
|
430
|
-
"node_probes_set": [
|
|
431
|
-
{ "type": "tracepoint", "location": "src/api/orders.ts:42", "triggered": true }
|
|
432
|
-
],
|
|
433
|
-
"node_probe_snapshots_collected": 1,
|
|
434
|
-
"node_log_errors": []
|
|
435
|
-
}
|
|
436
|
-
```
|
|
437
|
-
|
|
438
|
-
If both cycles are active, populate browser fields **and** `node_*` fields in the same verdict — both cycles' pass criteria must hold for the gate to honor `status: "pass"`.
|
|
414
|
+
Multi-cycle runs (e.g. browser + node + backend all active in the same turn) use the same single verdict. Cycles are derived from the `file_changes` you made; the pass criteria for each cycle are derived from your `tool_call` records.
|
|
439
415
|
|
|
440
416
|
The agent must submit a verdict after **every** verification attempt — both pass and fail. File edits are blocked until a verdict is submitted after using devtools tools.
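The validity rules for the minimal verdict shape can be sketched as a small validator. This is a hedged illustration: `validateVerdict` and its `previousStatus` parameter are hypothetical, and treating an empty `issues` or `fixes` array as missing is an assumption not confirmed by the README.

```javascript
// Hypothetical validator for the verdict shape described in this section.
// Returns a list of problems; an empty list means the verdict is well-formed.
function validateVerdict(verdict, previousStatus = null) {
    const isNonEmptyArray = (value) => Array.isArray(value) && value.length > 0;
    const errors = [];

    if (verdict.status !== "pass" && verdict.status !== "fail") {
        errors.push('status must be "pass" or "fail"');
    }
    if (!isNonEmptyArray(verdict.checks)) {
        errors.push("checks must be a non-empty array");
    }
    if (verdict.status === "fail" && !isNonEmptyArray(verdict.issues)) {
        errors.push("issues is required on fail");
    }
    if (verdict.status === "pass" && previousStatus === "fail" && !isNonEmptyArray(verdict.fixes)) {
        errors.push("fixes is required on pass after a previous fail");
    }
    return errors;
}
```

Note that the sketch checks shape only. It says nothing about whether the claimed `status` is true, which is the part the README describes as a future, tool-call-derived check.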
|
|
441
417
|
|
|
@@ -499,11 +475,10 @@ Each session is divided into three phases:
|
|
|
499
475
|
| First-pass rate | Percentage of verification chains where the first verdict was pass |
|
|
500
476
|
| Verdicts | Total verdict count (pass + fail) |
|
|
501
477
|
| Avg retries | Average number of fail verdicts before pass per chain |
|
|
502
|
-
| Avg console errs | Average `console_errors` across all verdicts |
|
|
503
|
-
| Avg network fails | Average `network_failures` across all verdicts |
|
|
504
|
-
| Avg pages tested | Average number of pages tested per verdict |
|
|
505
478
|
| Avg checks | Average number of checks performed per verdict |
|
|
506
479
|
|
|
480
|
+
> Per-cycle structural metrics (avg console errors, avg network failures, avg pages tested) are temporarily absent — they depended on agent-claimed evidence that has been removed from the verdict shape. They will return when `verify-gate` derives structural evidence from `tool_call` records (TODO).
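The two averages that remain in this table can be sketched from the compiled `verification-quality.js` shown later in this diff. The loop structure and the one-decimal rounding follow that file; the explicit `fail` guard is an assumption, since the corresponding line falls outside the visible hunk.

```javascript
// Average retries: fails before the first pass, averaged over chains that reach a pass.
function averageRetries(chains) {
    const retryCounts = [];
    for (const chain of chains) {
        let failsBeforePass = 0;
        for (const v of chain.verdicts) {
            if (v.status === "pass") {
                retryCounts.push(failsBeforePass);
                break;
            }
            if (v.status === "fail") { // assumed guard; not visible in the diff
                failsBeforePass++;
            }
        }
    }
    return retryCounts.length > 0
        ? Math.round((retryCounts.reduce((a, b) => a + b, 0) / retryCounts.length) * 10) / 10
        : 0;
}

// Average checks: mean length of `checks` over verdicts that carry a checks array.
function averageChecks(verdicts) {
    let total = 0;
    let entries = 0;
    for (const v of verdicts) {
        if (v.checks !== undefined) {
            total += v.checks.length;
            entries++;
        }
    }
    return entries > 0 ? Math.round((total / entries) * 10) / 10 : 0;
}
```

A chain made up only of fail verdicts contributes nothing to the retry average, which matches the comment added to the compiled file in this release.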
|
|
481
|
+
|
|
507
482
|
#### Code Changes
|
|
508
483
|
|
|
509
484
|
| Metric | Meaning |
|
|
@@ -531,7 +506,7 @@ Three scores summarize the session:
|
|
|
531
506
|
| Score | Formula | What it measures |
|
|
532
507
|
|-------|---------|-----------------|
|
|
533
508
|
| **Efficiency** | `coding_time / (coding_time + fix_time) × 100` | How much productive time vs fix overhead. High = minimal wasted time on fixes |
|
|
534
|
-
| **Quality** | `(pass_pct +
|
|
509
|
+
| **Quality** | `(pass_pct + checks_pct) / 2` | How thorough the verification was. Components: pass rate, check depth (5+ checks = 100%). Page-coverage and error-cleanliness components were temporarily removed (depended on agent-claimed evidence) — they'll return when verify-gate derives structural evidence from `tool_call` records. |
|
|
535
510
|
| **Confidence** | `pass_count / total_verdicts × 100` | How likely the agent's code works. Based on verdict pass rate |
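The two-component Quality formula can be sketched as below, following the `scoring.js` hunk later in this diff. The field names (`passCount`, `totalVerifications`, `averageChecksCount`) are taken from that compiled code; the standalone function name `qualityScore` is hypothetical.

```javascript
// Sketch of the Quality score: average of pass rate and check depth (5+ checks = 100%).
function qualityScore(quality) {
    if (quality === null || quality.totalVerifications === 0) {
        return 0; // no verifications means no quality signal
    }
    const passPct = (quality.passCount / quality.totalVerifications) * 100;
    const checksPct = Math.min(quality.averageChecksCount / 5, 1) * 100;
    return Math.round((passPct + checksPct) / 2);
}

console.log(qualityScore({ passCount: 3, totalVerifications: 4, averageChecksCount: 2.5 })); // 63
console.log(qualityScore({ passCount: 4, totalVerifications: 4, averageChecksCount: 7 }));   // 100
```

Because only two components remain, a session with a perfect pass rate but a single check per verdict scores 60 in this sketch, so check depth now carries half the weight.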
|
|
536
511
|
|
|
537
512
|
### Project Analysis
|
|
package/dist/analysis/scoring.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"scoring.d.ts","sourceRoot":"","sources":["../../src/analysis/scoring.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,EAAE,YAAY,EAAE,MAAM,iBAAiB,CAAC;AAC/C,OAAO,EAAE,2BAA2B,EAAE,MAAM,wBAAwB,CAAC;AAErE,MAAM,WAAW,cAAc;IAC3B,oBAAoB,EAAE,MAAM,CAAC;IAC7B,wBAAwB,EAAE,MAAM,CAAC;IACjC,mBAAmB,EAAE,MAAM,CAAC;CAC/B;AAED,wBAAgB,gBAAgB,CAC5B,IAAI,EAAE,YAAY,GAAG,IAAI,EACzB,OAAO,EAAE,2BAA2B,GAAG,IAAI,GAC5C,cAAc,GAAG,IAAI,
|
|
1
|
+
{"version":3,"file":"scoring.d.ts","sourceRoot":"","sources":["../../src/analysis/scoring.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,EAAE,YAAY,EAAE,MAAM,iBAAiB,CAAC;AAC/C,OAAO,EAAE,2BAA2B,EAAE,MAAM,wBAAwB,CAAC;AAErE,MAAM,WAAW,cAAc;IAC3B,oBAAoB,EAAE,MAAM,CAAC;IAC7B,wBAAwB,EAAE,MAAM,CAAC;IACjC,mBAAmB,EAAE,MAAM,CAAC;CAC/B;AAED,wBAAgB,gBAAgB,CAC5B,IAAI,EAAE,YAAY,GAAG,IAAI,EACzB,OAAO,EAAE,2BAA2B,GAAG,IAAI,GAC5C,cAAc,GAAG,IAAI,CAmDvB"}
|
package/dist/analysis/scoring.js
CHANGED
|
@@ -25,17 +25,19 @@ function calculateScoring(time, quality) {
|
|
|
25
25
|
agentEfficiencyScore = Math.round((codingTime / productiveTime) * 100);
|
|
26
26
|
}
|
|
27
27
|
// --- Verification Quality Score (0-100) ---
|
|
28
|
-
// Average of
|
|
28
|
+
// Average of two components: pass rate, check depth.
|
|
29
|
+
// TODO: when verify-gate derives structural evidence from tool_call records
|
|
30
|
+
// (pages tested, console error count, network failure count), add those
|
|
31
|
+
// components back here (page coverage, error cleanliness). See
|
|
32
|
+
// `verify-gate.ts` evidence-extractor TODO.
|
|
29
33
|
let verificationQualityScore;
|
|
30
34
|
if (quality === null || quality.totalVerifications === 0) {
|
|
31
35
|
verificationQualityScore = 0;
|
|
32
36
|
}
|
|
33
37
|
else {
|
|
34
38
|
const passPct = (quality.passCount / quality.totalVerifications) * 100;
|
|
35
|
-
const pagesPct = Math.min(quality.averagePagesTestedCount / 3, 1) * 100;
|
|
36
39
|
const checksPct = Math.min(quality.averageChecksCount / 5, 1) * 100;
|
|
37
|
-
|
|
38
|
-
verificationQualityScore = Math.round((passPct + pagesPct + checksPct + cleanPct) / 4);
|
|
40
|
+
verificationQualityScore = Math.round((passPct + checksPct) / 2);
|
|
39
41
|
}
|
|
40
42
|
// --- Code Confidence Score (0-100) ---
|
|
41
43
|
// pass_count / total_verdicts * 100
|
|
package/dist/analysis/scoring.js.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"scoring.js","sourceRoot":"","sources":["../../src/analysis/scoring.ts"],"names":[],"mappings":";AAAA;;;;;GAKG;;AAWH,
|
|
1
|
+
{"version":3,"file":"scoring.js","sourceRoot":"","sources":["../../src/analysis/scoring.ts"],"names":[],"mappings":";AAAA;;;;;GAKG;;AAWH,4CAsDC;AAtDD,SAAgB,gBAAgB,CAC5B,IAAyB,EACzB,OAA2C;IAE3C,IAAI,IAAI,KAAK,IAAI,IAAI,OAAO,KAAK,IAAI,EAAE,CAAC;QACpC,OAAO,IAAI,CAAC;IAChB,CAAC;IAED,yCAAyC;IACzC,+CAA+C;IAC/C,kDAAkD;IAClD,MAAM,UAAU,GAAW,IAAI,KAAK,IAAI,CAAC,CAAC,CAAC,IAAI,CAAC,UAAU,CAAC,CAAC,CAAC,CAAC,CAAC;IAC/D,MAAM,UAAU,GAAW,IAAI,KAAK,IAAI,CAAC,CAAC,CAAC,IAAI,CAAC,OAAO,CAAC,CAAC,CAAC,CAAC,CAAC;IAC5D,MAAM,cAAc,GAAW,UAAU,GAAG,UAAU,CAAC;IAEvD,IAAI,oBAA4B,CAAC;IACjC,IAAI,cAAc,KAAK,CAAC,EAAE,CAAC;QACvB,oBAAoB,GAAG,GAAG,CAAC;IAC/B,CAAC;SAAM,CAAC;QACJ,oBAAoB,GAAG,IAAI,CAAC,KAAK,CAAC,CAAC,UAAU,GAAG,cAAc,CAAC,GAAG,GAAG,CAAC,CAAC;IAC3E,CAAC;IAED,6CAA6C;IAC7C,qDAAqD;IACrD,4EAA4E;IAC5E,wEAAwE;IACxE,+DAA+D;IAC/D,4CAA4C;IAC5C,IAAI,wBAAgC,CAAC;IACrC,IAAI,OAAO,KAAK,IAAI,IAAI,OAAO,CAAC,kBAAkB,KAAK,CAAC,EAAE,CAAC;QACvD,wBAAwB,GAAG,CAAC,CAAC;IACjC,CAAC;SAAM,CAAC;QACJ,MAAM,OAAO,GAAW,CAAC,OAAO,CAAC,SAAS,GAAG,OAAO,CAAC,kBAAkB,CAAC,GAAG,GAAG,CAAC;QAC/E,MAAM,SAAS,GAAW,IAAI,CAAC,GAAG,CAAC,OAAO,CAAC,kBAAkB,GAAG,CAAC,EAAE,CAAC,CAAC,GAAG,GAAG,CAAC;QAC5E,wBAAwB,GAAG,IAAI,CAAC,KAAK,CAAC,CAAC,OAAO,GAAG,SAAS,CAAC,GAAG,CAAC,CAAC,CAAC;IACrE,CAAC;IAED,wCAAwC;IACxC,oCAAoC;IACpC,MAAM,SAAS,GAAW,OAAO,KAAK,IAAI,CAAC,CAAC,CAAC,OAAO,CAAC,SAAS,CAAC,CAAC,CAAC,CAAC,CAAC;IACnE,MAAM,aAAa,GAAW,OAAO,KAAK,IAAI,CAAC,CAAC,CAAC,OAAO,CAAC,kBAAkB,CAAC,CAAC,CAAC,CAAC,CAAC;IAEhF,IAAI,mBAA2B,CAAC;IAChC,IAAI,aAAa,KAAK,CAAC,EAAE,CAAC;QACtB,mBAAmB,GAAG,GAAG,CAAC;IAC9B,CAAC;SAAM,CAAC;QACJ,mBAAmB,GAAG,IAAI,CAAC,KAAK,CAAC,CAAC,SAAS,GAAG,aAAa,CAAC,GAAG,GAAG,CAAC,CAAC;IACxE,CAAC;IAED,OAAO;QACH,oBAAoB;QACpB,wBAAwB;QACxB,mBAAmB;KACtB,CAAC;AACN,CAAC"}
|
|
package/dist/analysis/verdict-details.d.ts
CHANGED
|
@@ -1,19 +1,19 @@
|
|
|
1
1
|
/**
|
|
2
2
|
* IronBee — Verdict Details Extraction
|
|
3
3
|
*
|
|
4
|
-
* Extracts raw verdict data (checks, issues, fixes
|
|
5
|
-
*
|
|
4
|
+
* Extracts raw verdict data (status, checks, issues, fixes) from
|
|
5
|
+
* actions.jsonl for semantic analysis by LLM agents. Per-cycle structural
|
|
6
|
+
* evidence (pages tested, console errors, probes, endpoints called, …) is
|
|
7
|
+
* NOT part of the verdict shape — it will be derived from `tool_call`
|
|
8
|
+
* records by a future extractor. See `verify-gate.ts` TODO.
|
|
6
9
|
* Pure logic — no process.exit, no stdin, no side effects.
|
|
7
10
|
*/
|
|
8
11
|
export interface VerdictDetail {
|
|
9
12
|
timestamp: number;
|
|
10
13
|
status: string;
|
|
11
|
-
pages_tested: string[];
|
|
12
14
|
checks: string[];
|
|
13
15
|
issues: string[];
|
|
14
16
|
fixes: string[];
|
|
15
|
-
console_errors: number;
|
|
16
|
-
network_failures: number;
|
|
17
17
|
verification_id?: string;
|
|
18
18
|
}
|
|
19
19
|
export interface VerdictDetailsResult {
|
|
package/dist/analysis/verdict-details.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"verdict-details.d.ts","sourceRoot":"","sources":["../../src/analysis/verdict-details.ts"],"names":[],"mappings":"AAAA
|
|
1
|
+
{"version":3,"file":"verdict-details.d.ts","sourceRoot":"","sources":["../../src/analysis/verdict-details.ts"],"names":[],"mappings":"AAAA;;;;;;;;;GASG;AAKH,MAAM,WAAW,aAAa;IAC1B,SAAS,EAAE,MAAM,CAAC;IAClB,MAAM,EAAE,MAAM,CAAC;IACf,MAAM,EAAE,MAAM,EAAE,CAAC;IACjB,MAAM,EAAE,MAAM,EAAE,CAAC;IACjB,KAAK,EAAE,MAAM,EAAE,CAAC;IAChB,eAAe,CAAC,EAAE,MAAM,CAAC;CAC5B;AAED,MAAM,WAAW,oBAAoB;IACjC,QAAQ,EAAE,aAAa,EAAE,CAAC;CAC7B;AAED,wBAAgB,qBAAqB,CAAC,WAAW,EAAE,MAAM,GAAG,oBAAoB,GAAG,IAAI,CAkDtF"}
|
|
package/dist/analysis/verdict-details.js
CHANGED
|
@@ -2,8 +2,11 @@
|
|
|
2
2
|
/**
|
|
3
3
|
* IronBee — Verdict Details Extraction
|
|
4
4
|
*
|
|
5
|
-
* Extracts raw verdict data (checks, issues, fixes
|
|
6
|
-
*
|
|
5
|
+
* Extracts raw verdict data (status, checks, issues, fixes) from
|
|
6
|
+
* actions.jsonl for semantic analysis by LLM agents. Per-cycle structural
|
|
7
|
+
* evidence (pages tested, console errors, probes, endpoints called, …) is
|
|
8
|
+
* NOT part of the verdict shape — it will be derived from `tool_call`
|
|
9
|
+
* records by a future extractor. See `verify-gate.ts` TODO.
|
|
7
10
|
* Pure logic — no process.exit, no stdin, no side effects.
|
|
8
11
|
*/
|
|
9
12
|
Object.defineProperty(exports, "__esModule", { value: true });
|
|
@@ -35,12 +38,9 @@ function extractVerdictDetails(actionsFile) {
|
|
|
35
38
|
const detail = {
|
|
36
39
|
timestamp: entry.timestamp,
|
|
37
40
|
status: typeof verdict.status === "string" ? verdict.status : "unknown",
|
|
38
|
-
pages_tested: Array.isArray(verdict.pages_tested) ? verdict.pages_tested.map(String) : [],
|
|
39
41
|
checks: Array.isArray(verdict.checks) ? verdict.checks.map(String) : [],
|
|
40
42
|
issues: Array.isArray(verdict.issues) ? verdict.issues.map(String) : [],
|
|
41
43
|
fixes: Array.isArray(verdict.fixes) ? verdict.fixes.map(String) : [],
|
|
42
|
-
console_errors: typeof verdict.console_errors === "number" ? verdict.console_errors : 0,
|
|
43
|
-
network_failures: typeof verdict.network_failures === "number" ? verdict.network_failures : 0,
|
|
44
44
|
};
|
|
45
45
|
if (typeof entry.verification_id === "string") {
|
|
46
46
|
detail.verification_id = entry.verification_id;
|
|
package/dist/analysis/verdict-details.js.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"verdict-details.js","sourceRoot":"","sources":["../../src/analysis/verdict-details.ts"],"names":[],"mappings":";AAAA
|
|
1
|
+
{"version":3,"file":"verdict-details.js","sourceRoot":"","sources":["../../src/analysis/verdict-details.ts"],"names":[],"mappings":";AAAA;;;;;;;;;GASG;;AAkBH,sDAkDC;AAlED,2BAA8C;AAgB9C,SAAgB,qBAAqB,CAAC,WAAmB;IACrD,IAAI,CAAC,IAAA,eAAU,EAAC,WAAW,CAAC,EAAE,CAAC;QAC3B,OAAO,IAAI,CAAC;IAChB,CAAC;IAED,IAAI,OAAe,CAAC;IACpB,IAAI,CAAC;QACD,OAAO,GAAG,IAAA,iBAAY,EAAC,WAAW,EAAE,OAAO,CAAC,CAAC;IACjD,CAAC;IAAC,MAAM,CAAC;QACL,OAAO,IAAI,CAAC;IAChB,CAAC;IAED,MAAM,KAAK,GAAa,OAAO,CAAC,IAAI,EAAE,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC,MAAM,CAAC,CAAC,CAAS,EAAW,EAAE,CAAC,CAAC,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC;IAChG,MAAM,QAAQ,GAAoB,EAAE,CAAC;IAErC,KAAK,MAAM,IAAI,IAAI,KAAK,EAAE,CAAC;QACvB,IAAI,CAAC;YACD,MAAM,KAAK,GAAU,IAAI,CAAC,KAAK,CAAC,IAAI,CAAU,CAAC;YAC/C,IAAI,KAAK,CAAC,IAAI,KAAK,eAAe,EAAE,CAAC;gBACjC,SAAS;YACb,CAAC;YAED,MAAM,OAAO,GAA4B,KAAK,CAAC,OAAkC,CAAC;YAClF,IAAI,CAAC,OAAO,EAAE,CAAC;gBACX,SAAS;YACb,CAAC;YAED,MAAM,MAAM,GAAkB;gBAC1B,SAAS,EAAE,KAAK,CAAC,SAAS;gBAC1B,MAAM,EAAE,OAAO,OAAO,CAAC,MAAM,KAAK,QAAQ,CAAC,CAAC,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC,CAAC,SAAS;gBACvE,MAAM,EAAE,KAAK,CAAC,OAAO,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC,CAAC,CAAC,OAAO,CAAC,MAAM,CAAC,GAAG,CAAC,MAAM,CAAC,CAAC,CAAC,CAAC,EAAE;gBACvE,MAAM,EAAE,KAAK,CAAC,OAAO,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC,CAAC,CAAC,OAAO,CAAC,MAAM,CAAC,GAAG,CAAC,MAAM,CAAC,CAAC,CAAC,CAAC,EAAE;gBACvE,KAAK,EAAE,KAAK,CAAC,OAAO,CAAC,OAAO,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,OAAO,CAAC,KAAK,CAAC,GAAG,CAAC,MAAM,CAAC,CAAC,CAAC,CAAC,EAAE;aACvE,CAAC;YAEF,IAAI,OAAO,KAAK,CAAC,eAAe,KAAK,QAAQ,EAAE,CAAC;gBAC5C,MAAM,CAAC,eAAe,GAAG,KAAK,CAAC,eAAe,CAAC;YACnD,CAAC;YAED,QAAQ,CAAC,IAAI,CAAC,MAAM,CAAC,CAAC;QAC1B,CAAC;QAAC,MAAM,CAAC;YACL,uBAAuB;QAC3B,CAAC;IACL,CAAC;IAED,IAAI,QAAQ,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;QACxB,OAAO,IAAI,CAAC;IAChB,CAAC;IAED,OAAO,EAAE,QAAQ,EAAE,CAAC;AACxB,CAAC"}
|
|
package/dist/analysis/verification-quality.d.ts
CHANGED
|
@@ -2,6 +2,10 @@
|
|
|
2
2
|
* IronBee — Verification Quality Analysis
|
|
3
3
|
*
|
|
4
4
|
* Reads actions.jsonl and calculates verification quality metrics.
|
|
5
|
+
* Per-cycle structural metrics (avg console errors, network failures,
|
|
6
|
+
* pages tested) have been removed — those values are no longer part of
|
|
7
|
+
* the verdict shape; they will return when verify-gate derives them
|
|
8
|
+
* from `tool_call` records. See `verify-gate.ts` TODO.
|
|
5
9
|
* Pure logic — no process.exit, no stdin, no side effects.
|
|
6
10
|
*/
|
|
7
11
|
export interface VerificationQualityAnalysis {
|
|
@@ -10,9 +14,6 @@ export interface VerificationQualityAnalysis {
|
|
|
10
14
|
passCount: number;
|
|
11
15
|
failCount: number;
|
|
12
16
|
averageRetries: number;
|
|
13
|
-
averageConsoleErrors: number;
|
|
14
|
-
averageNetworkFailures: number;
|
|
15
|
-
averagePagesTestedCount: number;
|
|
16
17
|
averageChecksCount: number;
|
|
17
18
|
}
|
|
18
19
|
export declare function analyzeVerificationQuality(actionsFile: string): VerificationQualityAnalysis | null;
|
|
package/dist/analysis/verification-quality.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"verification-quality.d.ts","sourceRoot":"","sources":["../../src/analysis/verification-quality.ts"],"names":[],"mappings":"AAAA
|
|
1
|
+
{"version":3,"file":"verification-quality.d.ts","sourceRoot":"","sources":["../../src/analysis/verification-quality.ts"],"names":[],"mappings":"AAAA;;;;;;;;;GASG;AAKH,MAAM,WAAW,2BAA2B;IACxC,oBAAoB,EAAE,MAAM,CAAC;IAC7B,kBAAkB,EAAE,MAAM,CAAC;IAC3B,SAAS,EAAE,MAAM,CAAC;IAClB,SAAS,EAAE,MAAM,CAAC;IAClB,cAAc,EAAE,MAAM,CAAC;IACvB,kBAAkB,EAAE,MAAM,CAAC;CAC9B;AA6ED,wBAAgB,0BAA0B,CAAC,WAAW,EAAE,MAAM,GAAG,2BAA2B,GAAG,IAAI,CAoFlG"}
|
|
package/dist/analysis/verification-quality.js
CHANGED
|
@@ -3,6 +3,10 @@
|
|
|
3
3
|
* IronBee — Verification Quality Analysis
|
|
4
4
|
*
|
|
5
5
|
* Reads actions.jsonl and calculates verification quality metrics.
|
|
6
|
+
* Per-cycle structural metrics (avg console errors, network failures,
|
|
7
|
+
* pages tested) have been removed — those values are no longer part of
|
|
8
|
+
* the verdict shape; they will return when verify-gate derives them
|
|
9
|
+
* from `tool_call` records. See `verify-gate.ts` TODO.
|
|
6
10
|
* Pure logic — no process.exit, no stdin, no side effects.
|
|
7
11
|
*/
|
|
8
12
|
Object.defineProperty(exports, "__esModule", { value: true });
|
|
@@ -37,9 +41,6 @@ function extractVerdictData(entry) {
|
|
|
37
41
|
const v = verdict;
|
|
38
42
|
return {
|
|
39
43
|
status: typeof v.status === "string" ? v.status : undefined,
|
|
40
|
-
console_errors: typeof v.console_errors === "number" ? v.console_errors : undefined,
|
|
41
|
-
network_failures: typeof v.network_failures === "number" ? v.network_failures : undefined,
|
|
42
|
-
pages_tested: Array.isArray(v.pages_tested) ? v.pages_tested : undefined,
|
|
43
44
|
checks: Array.isArray(v.checks) ? v.checks : undefined,
|
|
44
45
|
};
|
|
45
46
|
}
|
|
@@ -102,15 +103,13 @@ function analyzeVerificationQuality(actionsFile) {
|
|
|
102
103
|
failCount++;
|
|
103
104
|
}
|
|
104
105
|
}
|
|
105
|
-
// Average retries: for each chain that ends in pass, count fails before the pass
|
|
106
|
+
// Average retries: for each chain that ends in pass, count fails before the pass.
|
|
107
|
+
// Chains that never reach pass don't contribute (average is over pass-terminated chains).
|
|
106
108
|
const retryCounts = [];
|
|
107
109
|
for (const chain of chains) {
|
|
108
|
-
// Find the first pass in the chain
|
|
109
110
|
let failsBeforePass = 0;
|
|
110
|
-
let foundPass = false;
|
|
111
111
|
for (const v of chain.verdicts) {
|
|
112
112
|
if (v.status === "pass") {
|
|
113
|
-
foundPass = true;
|
|
114
113
|
retryCounts.push(failsBeforePass);
|
|
115
114
|
break;
|
|
116
115
|
}
|
|
@@ -118,52 +117,19 @@ function analyzeVerificationQuality(actionsFile) {
|
|
|
118
117
|
failsBeforePass++;
|
|
119
118
|
}
|
|
120
119
|
}
|
|
121
|
-
// Chains that never reach pass don't contribute to average retries
|
|
122
|
-
if (!foundPass) {
|
|
123
|
-
// still count them — they had all fails as "retries" with no success
|
|
124
|
-
// but per spec, average retries is about retries before pass
|
|
125
|
-
// so we skip chains with no pass
|
|
126
|
-
}
|
|
127
120
|
}
|
|
128
121
|
const averageRetries = retryCounts.length > 0
|
|
129
122
|
? Math.round((retryCounts.reduce((a, b) => a + b, 0) / retryCounts.length) * 10) / 10
|
|
130
123
|
: 0;
|
|
131
|
-
//
|
|
132
|
-
let totalConsoleErrors = 0;
|
|
133
|
-
let consoleErrorCount = 0;
|
|
134
|
-
let totalNetworkFailures = 0;
|
|
135
|
-
let networkFailureCount = 0;
|
|
136
|
-
let totalPagesTestedCount = 0;
|
|
137
|
-
let pagesTestedEntries = 0;
|
|
124
|
+
// Average checks count
|
|
138
125
|
let totalChecksCount = 0;
|
|
139
126
|
let checksEntries = 0;
|
|
140
127
|
for (const v of allVerdicts) {
|
|
141
|
-
if (v.console_errors !== undefined) {
|
|
142
|
-
totalConsoleErrors += v.console_errors;
|
|
143
|
-
consoleErrorCount++;
|
|
144
|
-
}
|
|
145
|
-
if (v.network_failures !== undefined) {
|
|
146
|
-
totalNetworkFailures += v.network_failures;
|
|
147
|
-
networkFailureCount++;
|
|
148
|
-
}
|
|
149
|
-
if (v.pages_tested !== undefined) {
|
|
150
|
-
totalPagesTestedCount += v.pages_tested.length;
|
|
151
|
-
pagesTestedEntries++;
|
|
152
|
-
}
|
|
153
128
|
if (v.checks !== undefined) {
|
|
154
129
|
totalChecksCount += v.checks.length;
|
|
155
130
|
checksEntries++;
|
|
156
131
|
}
|
|
157
132
|
}
|
|
158
|
-
const averageConsoleErrors = consoleErrorCount > 0
|
|
159
|
-
? Math.round((totalConsoleErrors / consoleErrorCount) * 10) / 10
|
|
160
|
-
: 0;
|
|
161
|
-
const averageNetworkFailures = networkFailureCount > 0
|
|
162
|
-
? Math.round((totalNetworkFailures / networkFailureCount) * 10) / 10
|
|
163
|
-
: 0;
|
|
164
|
-
const averagePagesTestedCount = pagesTestedEntries > 0
|
|
165
|
-
? Math.round((totalPagesTestedCount / pagesTestedEntries) * 10) / 10
|
|
166
|
-
: 0;
|
|
167
133
|
const averageChecksCount = checksEntries > 0
|
|
168
134
|
? Math.round((totalChecksCount / checksEntries) * 10) / 10
|
|
169
135
|
: 0;
|
|
@@ -173,9 +139,6 @@ function analyzeVerificationQuality(actionsFile) {
|
|
|
173
139
|
passCount,
|
|
174
140
|
failCount,
|
|
175
141
|
averageRetries,
|
|
176
|
-
averageConsoleErrors,
|
|
177
|
-
averageNetworkFailures,
|
|
178
|
-
averagePagesTestedCount,
|
|
179
142
|
averageChecksCount,
|
|
180
143
|
};
|
|
181
144
|
}
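The hunk above keeps a single averaging pattern: sum a field only over the verdicts that carry it, then round to one decimal place, with 0 as the fallback when no verdict has the field. A minimal sketch of that pattern follows. The `averageOf` helper and the sample verdicts are hypothetical and are not part of the package.

```javascript
// Hypothetical helper (not in the package): averages an optional value
// across verdicts, skipping verdicts where the value is undefined.
function averageOf(verdicts, measure) {
    let total = 0;
    let entries = 0;
    for (const v of verdicts) {
        const value = measure(v);
        if (value !== undefined) {
            total += value;
            entries++;
        }
    }
    // Round to one decimal place; report 0 when no verdict carried the value.
    return entries > 0 ? Math.round((total / entries) * 10) / 10 : 0;
}

// Illustrative data only.
const sampleVerdicts = [
    { status: "fail", checks: ["form renders", "submit button unresponsive"] },
    { status: "pass", checks: ["form submits successfully"] },
    { status: "pass" }, // no checks field: excluded from the average
];

const averageChecksCount = averageOf(
    sampleVerdicts,
    (v) => (v.checks !== undefined ? v.checks.length : undefined),
);
console.log(averageChecksCount); // 1.5 (3 checks over the 2 verdicts that have them)
```

Verdicts without the field are left out of the denominator instead of counting as zero, which matches the `!== undefined` guard in the retained code.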
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"verification-quality.js","sourceRoot":"","sources":["../../src/analysis/verification-quality.ts"],"names":[],"mappings":";AAAA
|
|
1
|
+
{"version":3,"file":"verification-quality.js","sourceRoot":"","sources":["../../src/analysis/verification-quality.ts"],"names":[],"mappings":";AAAA;;;;;;;;;GASG;;AAyFH,gEAoFC;AA3KD,2BAA8C;AAqB9C,SAAS,YAAY,CAAC,WAAmB;IACrC,IAAI,CAAC,IAAA,eAAU,EAAC,WAAW,CAAC,EAAE,CAAC;QAC3B,OAAO,EAAE,CAAC;IACd,CAAC;IAED,MAAM,OAAO,GAAW,IAAA,iBAAY,EAAC,WAAW,EAAE,OAAO,CAAC,CAAC;IAC3D,MAAM,KAAK,GAAa,OAAO,CAAC,IAAI,EAAE,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC,MAAM,CAAC,CAAC,CAAS,EAAW,EAAE,CAAC,CAAC,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC;IAChG,MAAM,OAAO,GAAY,EAAE,CAAC;IAE5B,KAAK,MAAM,IAAI,IAAI,KAAK,EAAE,CAAC;QACvB,IAAI,CAAC;YACD,MAAM,KAAK,GAAU,IAAI,CAAC,KAAK,CAAC,IAAI,CAAU,CAAC;YAC/C,OAAO,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;QACxB,CAAC;QAAC,MAAM,CAAC;YACL,uBAAuB;QAC3B,CAAC;IACL,CAAC;IAED,OAAO,OAAO,CAAC;AACnB,CAAC;AAED,SAAS,kBAAkB,CAAC,KAAY;IACpC,IAAI,KAAK,CAAC,IAAI,KAAK,eAAe,EAAE,CAAC;QACjC,OAAO,IAAI,CAAC;IAChB,CAAC;IAED,MAAM,OAAO,GAAa,KAAiC,CAAC,OAAO,CAAC;IACpE,IAAI,OAAO,OAAO,KAAK,QAAQ,IAAI,OAAO,KAAK,IAAI,EAAE,CAAC;QAClD,OAAO,IAAI,CAAC;IAChB,CAAC;IAED,MAAM,CAAC,GAA4B,OAAkC,CAAC;IACtE,OAAO;QACH,MAAM,EAAE,OAAO,CAAC,CAAC,MAAM,KAAK,QAAQ,CAAC,CAAC,CAAC,CAAC,CAAC,MAAM,CAAC,CAAC,CAAC,SAAS;QAC3D,MAAM,EAAE,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,MAAmB,CAAC,CAAC,CAAC,SAAS;KACtE,CAAC;AACN,CAAC;AAED,SAAS,WAAW,CAAC,OAAgB;IACjC,MAAM,MAAM,GAAY,EAAE,CAAC;IAC3B,IAAI,YAAY,GAAU,EAAE,QAAQ,EAAE,EAAE,EAAE,CAAC;IAE3C,KAAK,MAAM,KAAK,IAAI,OAAO,EAAE,CAAC;QAC1B,IAAI,KAAK,CAAC,IAAI,KAAK,eAAe,EAAE,CAAC;YACjC,MAAM,IAAI,GAAuB,kBAAkB,CAAC,KAAK,CAAC,CAAC;YAC3D,IAAI,IAAI,KAAK,IAAI,EAAE,CAAC;gBAChB,YAAY,CAAC,QAAQ,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;YACrC,CAAC;QACL,CAAC;QAED,IAAI,KAAK,CAAC,IAAI,KAAK,wBAAwB,IAAK,KAAiC,CAAC,MAAM,KAAK,OAAO,EAAE,CAAC;YACnG,IAAI,YAAY,CAAC,QAAQ,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;gBACnC,MAAM,CAAC,IAAI,CAAC,YAAY,CAAC,CAAC;YAC9B,CAAC;YACD,YAAY,GAAG,EAAE,QAAQ,EAAE,EAAE,EAAE,CAAC;QACpC,CAAC;IACL,CAAC;IAED,yCAAyC;IACzC,IAAI,YAAY,CAAC,QAAQ,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;QACnC,MAAM,CAAC,IAAI,CAAC,YAAY,CAAC,CAAC;IAC9B,
CAAC;IAED,OAAO,MAAM,CAAC;AAClB,CAAC;AAED,SAAgB,0BAA0B,CAAC,WAAmB;IAC1D,MAAM,OAAO,GAAY,YAAY,CAAC,WAAW,CAAC,CAAC;IAEnD,oCAAoC;IACpC,MAAM,WAAW,GAAkB,EAAE,CAAC;IACtC,KAAK,MAAM,KAAK,IAAI,OAAO,EAAE,CAAC;QAC1B,MAAM,IAAI,GAAuB,kBAAkB,CAAC,KAAK,CAAC,CAAC;QAC3D,IAAI,IAAI,KAAK,IAAI,EAAE,CAAC;YAChB,WAAW,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;QAC3B,CAAC;IACL,CAAC;IAED,IAAI,WAAW,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;QAC3B,OAAO,IAAI,CAAC;IAChB,CAAC;IAED,MAAM,MAAM,GAAY,WAAW,CAAC,OAAO,CAAC,CAAC;IAE7C,0BAA0B;IAC1B,IAAI,kBAAkB,GAAW,CAAC,CAAC;IACnC,MAAM,WAAW,GAAW,MAAM,CAAC,MAAM,CAAC;IAE1C,KAAK,MAAM,KAAK,IAAI,MAAM,EAAE,CAAC;QACzB,IAAI,KAAK,CAAC,QAAQ,CAAC,MAAM,GAAG,CAAC,IAAI,KAAK,CAAC,QAAQ,CAAC,CAAC,CAAC,CAAC,MAAM,KAAK,MAAM,EAAE,CAAC;YACnE,kBAAkB,EAAE,CAAC;QACzB,CAAC;IACL,CAAC;IAED,MAAM,oBAAoB,GAAW,WAAW,GAAG,CAAC;QAChD,CAAC,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC,kBAAkB,GAAG,WAAW,CAAC,GAAG,GAAG,CAAC;QACtD,CAAC,CAAC,CAAC,CAAC;IAER,qBAAqB;IACrB,IAAI,SAAS,GAAW,CAAC,CAAC;IAC1B,IAAI,SAAS,GAAW,CAAC,CAAC;IAC1B,KAAK,MAAM,CAAC,IAAI,WAAW,EAAE,CAAC;QAC1B,IAAI,CAAC,CAAC,MAAM,KAAK,MAAM,EAAE,CAAC;YACtB,SAAS,EAAE,CAAC;QAChB,CAAC;aAAM,IAAI,CAAC,CAAC,MAAM,KAAK,MAAM,EAAE,CAAC;YAC7B,SAAS,EAAE,CAAC;QAChB,CAAC;IACL,CAAC;IAED,kFAAkF;IAClF,0FAA0F;IAC1F,MAAM,WAAW,GAAa,EAAE,CAAC;IACjC,KAAK,MAAM,KAAK,IAAI,MAAM,EAAE,CAAC;QACzB,IAAI,eAAe,GAAW,CAAC,CAAC;QAChC,KAAK,MAAM,CAAC,IAAI,KAAK,CAAC,QAAQ,EAAE,CAAC;YAC7B,IAAI,CAAC,CAAC,MAAM,KAAK,MAAM,EAAE,CAAC;gBACtB,WAAW,CAAC,IAAI,CAAC,eAAe,CAAC,CAAC;gBAClC,MAAM;YACV,CAAC;YACD,IAAI,CAAC,CAAC,MAAM,KAAK,MAAM,EAAE,CAAC;gBACtB,eAAe,EAAE,CAAC;YACtB,CAAC;QACL,CAAC;IACL,CAAC;IAED,MAAM,cAAc,GAAW,WAAW,CAAC,MAAM,GAAG,CAAC;QACjD,CAAC,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC,WAAW,CAAC,MAAM,CAAC,CAAC,CAAS,EAAE,CAAS,EAAU,EAAE,CAAC,CAAC,GAAG,CAAC,EAAE,CAAC,CAAC,GAAG,WAAW,CAAC,MAAM,CAAC,GAAG,EAAE,CAAC,GAAG,EAAE;QAC7G,CAAC,CAAC,CAAC,CAAC;IAER,uBAAuB;IACvB,IAAI,gBAAgB,GAAW,CAAC,CAAC;IACjC,IAAI,aAAa,GAAW,CAAC,CAAC;IAC9B,KAAK,MAAM,CAAC,IAAI,WAAW,EAAE,CAAC;QAC1B,IAAI,CAAC,CAAC,MAAM,KAAK,SAAS,EAAE,CAAC;YACzB,gBAAgB,IAAI,CAAC,CAAC,MAAM,
CAAC,MAAM,CAAC;YACpC,aAAa,EAAE,CAAC;QACpB,CAAC;IACL,CAAC;IACD,MAAM,kBAAkB,GAAW,aAAa,GAAG,CAAC;QAChD,CAAC,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC,gBAAgB,GAAG,aAAa,CAAC,GAAG,EAAE,CAAC,GAAG,EAAE;QAC1D,CAAC,CAAC,CAAC,CAAC;IAER,OAAO;QACH,oBAAoB;QACpB,kBAAkB,EAAE,WAAW,CAAC,MAAM;QACtC,SAAS;QACT,SAAS;QACT,cAAc;QACd,kBAAkB;KACrB,CAAC;AACN,CAAC"}
|
|
@@ -1,17 +1,17 @@
|
|
|
1
1
|
# IronBee Verify
|
|
2
2
|
|
|
3
|
-
Verify the current code changes through real tools. The gate runs every cycle that has been wired up for this project, and all active cycles must be satisfied within a single verification cycle for `status: pass`. Each cycle has its own tools
|
|
3
|
+
Verify the current code changes through real tools. The gate runs every cycle that has been wired up for this project, and all active cycles must be satisfied within a single verification cycle for `status: pass`. Each cycle has its own tools and flow — **see the platform sections near the bottom of this file** for which cycles apply and what to call. The verdict shape itself is platform-agnostic (`status`, `checks`, `issues?`, `fixes?`); the gate enforces that you called each cycle's required tools and that `checks` is non-empty.
|
|
4
4
|
|
|
5
5
|
## Universal steps
|
|
6
6
|
|
|
7
7
|
1. **Start verification**: Run `echo '{"session_id":"<your-session-id>"}' | ironbee hook verification-start` via Bash (substitute the actual session ID printed by the SessionStart hook).
|
|
8
8
|
2. **Build and start** the application if not already running.
|
|
9
9
|
3. **For every active cycle, run its flow** as described in the platform sections near the bottom of this file. All active cycles must be exercised within this same verification cycle.
|
|
10
|
-
4. **Stop** the dev server when verification is complete.
|
|
10
|
+
4. **Stop** the dev server when verification is complete (every cycle — including the final one).
|
|
11
11
|
5. **Honor any cycle-specific teardown** noted in the platform sections BEFORE submitting your verdict.
|
|
12
|
-
6. **Submit your verdict** via Bash.
|
|
13
|
-
- Pass: `echo '{"session_id":"...","status":"pass",
|
|
14
|
-
- Fail: `echo '{"session_id":"...","status":"fail",
|
|
12
|
+
6. **Submit your verdict** via Bash. One verdict covers every active cycle:
|
|
13
|
+
- Pass: `echo '{"session_id":"...","status":"pass","checks":["..."]}' | ironbee hook submit-verdict`
|
|
14
|
+
- Fail: `echo '{"session_id":"...","status":"fail","checks":["..."],"issues":["describe what failed"]}' | ironbee hook submit-verdict`
|
|
15
15
|
7. **If failed** → collect ALL issues first (finish testing every active cycle), submit one fail verdict with all issues, then fix everything, rebuild, and re-verify. Do not fix one issue at a time — batch fixes to avoid repeated build/restart cycles.
|
|
16
16
|
8. If pass after a previous fail, include `"fixes"` in the verdict describing what was fixed.
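The verdict rules in the steps above (`checks` non-empty, `issues` on fail, `fixes` on a pass that follows a fail) can be sketched as a small payload builder. `buildVerdict` is a hypothetical helper for illustration, not something the CLI ships; the real validation lives in the gate.

```javascript
// Hypothetical helper (not part of the ironbee CLI): builds the JSON string
// that the steps above pipe into `ironbee hook submit-verdict`.
function buildVerdict({ sessionId, status, checks, issues, fixes }) {
    if (status !== "pass" && status !== "fail") {
        throw new Error('status must be "pass" or "fail"');
    }
    if (!Array.isArray(checks) || checks.length === 0) {
        throw new Error("checks must be a non-empty array");
    }
    if (status === "fail" && (!Array.isArray(issues) || issues.length === 0)) {
        throw new Error("a fail verdict must describe its issues");
    }
    const verdict = { session_id: sessionId, status, checks };
    if (issues !== undefined) verdict.issues = issues; // required on fail
    if (fixes !== undefined) verdict.fixes = fixes;    // pass after a previous fail
    return JSON.stringify(verdict);
}

// Illustrative values only; the session ID comes from the SessionStart hook.
const payload = buildVerdict({
    sessionId: "<your-session-id>",
    status: "fail",
    checks: ["form renders", "submit button unresponsive"],
    issues: ["button click handler not firing"],
});
console.log(`echo '${payload}' | ironbee hook submit-verdict`);
```

The printed command is only safe to paste when no string in the payload contains a single quote, since the payload is wrapped in single quotes for the shell.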
|
|
17
17
|
|
|
@@ -40,8 +40,8 @@ async function run(projectDir) {
|
|
|
40
40
|
if ((0, actions_1.hasToolCallsSinceLastVerdict)(actionsFile)) {
|
|
41
41
|
process.stderr.write(`BLOCKED: You used verification tools (browser-devtools / node-devtools / backend-devtools) but did not submit a verdict. You MUST submit a verdict (pass or fail) before editing code.
|
|
42
42
|
|
|
43
|
-
Submit your verdict first
|
|
44
|
-
echo '{"session_id":"${sessionId}","status":"fail","checks":[...],"issues":["describe what failed"]
|
|
43
|
+
Submit your verdict first:
|
|
44
|
+
echo '{"session_id":"${sessionId}","status":"fail","checks":["..."],"issues":["describe what failed"]}' | ironbee hook submit-verdict
|
|
45
45
|
|
|
46
46
|
Then you can edit code to fix the issues.
|
|
47
47
|
`);
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"session-start.d.ts","sourceRoot":"","sources":["../../../../src/clients/claude/hooks/session-start.ts"],"names":[],"mappings":"AAAA;;;;;;;GAOG;AAgBH,wBAAsB,GAAG,CAAC,UAAU,EAAE,MAAM,GAAG,OAAO,CAAC,IAAI,CAAC,
|
|
1
|
+
{"version":3,"file":"session-start.d.ts","sourceRoot":"","sources":["../../../../src/clients/claude/hooks/session-start.ts"],"names":[],"mappings":"AAAA;;;;;;;GAOG;AAgBH,wBAAsB,GAAG,CAAC,UAAU,EAAE,MAAM,GAAG,OAAO,CAAC,IAAI,CAAC,CAiF3D"}
|
|
@@ -62,18 +62,12 @@ async function run(projectDir) {
|
|
|
62
62
|
const verdictPass = JSON.stringify({
|
|
63
63
|
session_id: sessionId,
|
|
64
64
|
status: "pass",
|
|
65
|
-
pages_tested: ["http://localhost:3000/affected-page"],
|
|
66
65
|
checks: ["form submits successfully", "new item appears in list"],
|
|
67
|
-
console_errors: 0,
|
|
68
|
-
network_failures: 0,
|
|
69
66
|
});
|
|
70
67
|
const verdictFail = JSON.stringify({
|
|
71
68
|
session_id: sessionId,
|
|
72
69
|
status: "fail",
|
|
73
|
-
pages_tested: ["http://localhost:3000/affected-page"],
|
|
74
70
|
checks: ["form renders", "submit button unresponsive"],
|
|
75
|
-
console_errors: 2,
|
|
76
|
-
network_failures: 0,
|
|
77
71
|
issues: ["button click handler not firing", "TypeError in console"],
|
|
78
72
|
});
|
|
79
73
|
(0, output_1.writeAndExit)(`
|
|
@@ -93,7 +87,7 @@ Submit via Bash:
|
|
|
93
87
|
On fail (issues is required):
|
|
94
88
|
echo '${verdictFail}' | ironbee hook submit-verdict
|
|
95
89
|
|
|
96
|
-
Required fields: session_id, status,
|
|
90
|
+
Required fields: session_id, status, checks
|
|
97
91
|
On fail, include: issues (array of strings describing what failed)
|
|
98
92
|
On pass after a previous fail, include: fixes (array of strings describing what was fixed)
|
|
99
93
|
=====================================
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"session-start.js","sourceRoot":"","sources":["../../../../src/clients/claude/hooks/session-start.ts"],"names":[],"mappings":";AAAA;;;;;;;GAOG;;AAgBH,
|
|
1
|
+
{"version":3,"file":"session-start.js","sourceRoot":"","sources":["../../../../src/clients/claude/hooks/session-start.ts"],"names":[],"mappings":";AAAA;;;;;;;GAOG;;AAgBH,kBAiFC;AA/FD,yDAA2F;AAC3F,qEAAuH;AACvH,kCAAiE;AACjE,gDAAyE;AACzE,gDAAyD;AACzD,gDAAmD;AACnD,8CAA+C;AAC/C,sDAA2D;AAOpD,KAAK,UAAU,GAAG,CAAC,UAAkB;IACxC,IAAI,KAA8B,CAAC;IACnC,IAAI,CAAC;QACD,KAAK,GAAG,IAAI,CAAC,KAAK,CAAC,IAAA,iBAAS,GAAE,CAA4B,CAAC;IAC/D,CAAC;IAAC,OAAO,CAAU,EAAE,CAAC;QAClB,eAAM,CAAC,KAAK,CAAC,0BAA0B,CAAC,EAAE,CAAC,CAAC;QAC5C,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;IACpB,CAAC;IAED,MAAM,SAAS,GAAW,KAAK,CAAC,UAAU,IAAI,SAAS,CAAC;IACxD,MAAM,WAAW,GAAW,GAAG,UAAU,sBAAsB,SAAS,gBAAgB,CAAC;IACzF,IAAA,mBAAU,EAAC,GAAG,UAAU,sBAAsB,SAAS,cAAc,CAAC,CAAC;IAEvE,MAAM,UAAU,GAAW,GAAG,UAAU,sBAAsB,SAAS,EAAE,CAAC;IAC1E,wEAAwE;IACxE,wEAAwE;IACxE,qEAAqE;IACrE,iEAAiE;IACjE,qEAAqE;IACrE,sBAAsB;IACtB,IAAA,4BAAY,EAAC,UAAU,EAAE,IAAA,yBAAkB,GAAE,CAAC,CAAC;IAC/C,IAAA,wBAAQ,EAAC,UAAU,EAAE,IAAA,yBAAkB,GAAE,CAAC,CAAC;IAE3C,MAAM,KAAK,GAAuB;QAC9B,GAAG,IAAA,oBAAU,EAAC,WAAW,CAAC;QAC1B,IAAI,EAAE,eAAe;QACrB,SAAS,EAAE,IAAI,CAAC,GAAG,EAAE;QACrB,UAAU,EAAE,SAAS;QACrB,MAAM,EAAE,QAAQ;QAChB,MAAM,EAAE,KAAK,CAAC,MAAM,IAAI,SAAS;KACpC,CAAC;IAEF,MAAM,IAAA,sBAAY,EAAC,WAAW,EAAE,KAAK,CAAC,CAAC;IAEvC,qEAAqE;IACrE,yEAAyE;IACzE,yEAAyE;IACzE,oDAAoD;IACpD,IAAI,KAAK,CAAC,MAAM,KAAK,SAAS,EAAE,CAAC;QAC7B,MAAM,IAAA,mCAAmB,EAAC,UAAU,EAAE,WAAW,EAAE,sBAAY,CAAC,CAAC;IACrE,CAAC;SAAM,CAAC;QACJ,MAAM,IAAA,qCAAqB,EAAC,UAAU,EAAE,WAAW,EAAE,sBAAY,CAAC,CAAC;IACvE,CAAC;IAED,MAAM,IAAA,6BAAiB,EAAC,QAAQ,EAAE,SAAS,EAAE,IAAA,+BAAsB,EAAC,IAAA,mBAAU,EAAC,UAAU,CAAC,CAAC,CAAC,CAAC;IAC7F,eAAM,CAAC,KAAK,CAAC,kBAAkB,SAAS,KAAK,KAAK,CAAC,MAAM,IAAI,SAAS,GAAG,CAAC,CAAC;IAE3E,MAAM,WAAW,GAAW,IAAI,CAAC,SAAS,CAAC;QACvC,UAAU,EAAE,SAAS;QACrB,MAAM,EAAE,MAAM;QACd,MAAM,EAAE,CAAC,2BAA2B,EAAE,0BAA0B,CAAC;KACpE,CAAC,CAAC;IACH,MAAM,WAAW,GAAW,IAAI,CAAC,SAAS,CAAC;QACvC,UAAU,EAAE,SAAS;QACrB,MAAM,EAAE,MAAM;QACd,MAAM,EAAE,CAAC,cAAc,EAAE,4BAA4B,CAAC;QACtD,MAAM,EAAE,CAAC,iCAAiC,EAAE,sBAAsB,CAAC;KACtE,CAAC,CAAC
;IAEH,IAAA,qBAAY,EAAC;;;;cAIH,SAAS;;;;;;;;UAQb,WAAW;;;UAGX,WAAW;;;;;;CAMpB,EAAE,CAAC,CAAC,CAAC;AACN,CAAC"}
|
|
@@ -2,7 +2,7 @@
|
|
|
2
2
|
|
|
3
3
|
## Backend Mode (when `backend.verifyPatterns` matches an edited file)
|
|
4
4
|
|
|
5
|
-
If the project has the backend protocol cycle enabled (`ironbee backend enable` once at setup) and your edits touch matching paths (e.g. `server/**`, `api/**`, `routes/**`, `controllers/**`), the Stop hook also enforces a Backend cycle. The same `verification-start` covers every active cycle;
|
|
5
|
+
If the project has the backend protocol cycle enabled (`ironbee backend enable` once at setup) and your edits touch matching paths (e.g. `server/**`, `api/**`, `routes/**`, `controllers/**`), the Stop hook also enforces a Backend cycle. The same `verification-start` covers every active cycle; one platform-agnostic verdict covers them all.
|
|
6
6
|
|
|
7
7
|
This cycle is **runtime- and language-agnostic** — it works for Node, Java, Python, Go, Rust, Ruby, .NET, PHP, Elixir, Kotlin, Scala. The agent makes real protocol calls (HTTP / gRPC / GraphQL / WebSocket) against the running service, inspects logs, OR reads database state; it never attaches to a process.
|
|
8
8
|
|
|
@@ -45,46 +45,21 @@ The cycle is satisfied by ANY ONE of three evidence paths: protocol-call (you dr
|
|
|
45
45
|
5. **Trace correlation (optional, `o11y_*` primitives):** IronBee already pins the verification cycle's traceId on every backend tool call via `_metadata.traceId` (outranks any session pin), so the orchestrator's correlation root is authoritative. Use `bedt_o11y_get-trace-context` to read it, then pass it to `bedt_log_read { pattern: "<traceId>" }` to slice logs for one flow. `bedt_o11y_new-trace-id` / `bedt_o11y_set-trace-context` are available when you want to anchor a flow under an explicit id (e.g. integration-test runs).
|
|
46
46
|
6. **Submit verdict** including the fields matching the path(s) you exercised. If browser and/or node cycles are also active, include their fields in the SAME verdict — do not submit two verdicts.
|
|
47
47
|
|
|
48
|
-
### Verdict (
|
|
48
|
+
### Verdict (platform-agnostic)
|
|
49
49
|
|
|
50
|
-
|
|
51
|
-
```json
|
|
52
|
-
{
|
|
53
|
-
"session_id": "...",
|
|
54
|
-
"status": "pass",
|
|
55
|
-
"checks": ["POST /api/orders returned 201 with order id", "GET /api/orders/:id reflects new order"],
|
|
56
|
-
"backend_endpoints_called": [
|
|
57
|
-
"POST http://localhost:3000/api/orders",
|
|
58
|
-
"GET http://localhost:3000/api/orders/42"
|
|
59
|
-
],
|
|
60
|
-
"backend_response_statuses": [201, 200],
|
|
61
|
-
"backend_traces_collected": ["00-1234abcd-...01"]
|
|
62
|
-
}
|
|
63
|
-
```
|
|
64
|
-
|
|
65
|
-
Log-evidence path:
|
|
66
|
-
```json
|
|
67
|
-
{
|
|
68
|
-
"session_id": "...",
|
|
69
|
-
"status": "pass",
|
|
70
|
-
"checks": ["api-server logged 'order 42 created' on POST /api/orders", "no ERROR-level lines after the change"],
|
|
71
|
-
"backend_log_sources_read": ["api-server"]
|
|
72
|
-
}
|
|
73
|
-
```
|
|
50
|
+
The verdict shape is the same regardless of which evidence path (protocol-call / log / db) you took — `status` + `checks` (+ `issues` / `fixes` as needed):
|
|
74
51
|
|
|
75
|
-
DB-evidence path:
|
|
76
52
|
```json
|
|
77
53
|
{
|
|
78
54
|
"session_id": "...",
|
|
79
55
|
"status": "pass",
|
|
80
|
-
"checks": ["
|
|
81
|
-
"backend_db_connections_read": ["app"]
|
|
56
|
+
"checks": ["POST /api/orders returned 201 with order id", "GET /api/orders/:id reflects new order"]
|
|
82
57
|
}
|
|
83
58
|
```
|
|
84
59
|
|
|
85
|
-
|
|
60
|
+
The gate requires that AT LEAST one evidence path was actually exercised in your tool calls — `bedt_request_*` for protocol-call, `bedt_log_register-source` + `bedt_log_read*` / `_follow` for log-evidence, or `bedt_db_connect` + a read/diff/snapshot/get-changes for DB-evidence. If none were used, the gate will reject.
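As a rough illustration of the evidence-path rule stated above, the sketch below classifies a list of tool names into the paths they exercise. This is not the gate's implementation. It assumes tool calls are available as plain name strings, that `_follow` refers to a `bedt_log_*` tool whose name ends in `_follow`, and that any `bedt_db_*` call other than `bedt_db_connect` counts as a read/diff/snapshot/get-changes. All three are assumptions.

```javascript
// Hypothetical classifier (not the gate's real logic): reports which backend
// evidence paths a list of tool names would exercise.
function exercisedEvidencePaths(toolNames) {
    // Assumption: drop any prefix before "bedt_" so only the bare tool name is compared.
    const names = toolNames.map((n) =>
        n.includes("bedt_") ? n.slice(n.indexOf("bedt_")) : n,
    );
    const has = (predicate) => names.some(predicate);
    const paths = [];

    // Protocol-call path: any bedt_request_* tool.
    if (has((n) => n.startsWith("bedt_request_"))) {
        paths.push("protocol-call");
    }

    // Log-evidence path: a registered source plus a read or follow call.
    const logRegistered = has((n) => n === "bedt_log_register-source");
    const logRead = has(
        (n) =>
            n.startsWith("bedt_log_read") ||
            (n.startsWith("bedt_log_") && n.endsWith("_follow")),
    );
    if (logRegistered && logRead) {
        paths.push("log-evidence");
    }

    // DB-evidence path: a connection plus any other bedt_db_* call (assumed).
    const dbConnected = has((n) => n === "bedt_db_connect");
    const dbUsed = has((n) => n.startsWith("bedt_db_") && n !== "bedt_db_connect");
    if (dbConnected && dbUsed) {
        paths.push("db-evidence");
    }

    return paths;
}

const paths = exercisedEvidencePaths(["bedt_log_register-source", "bedt_log_read"]);
console.log(paths.length > 0 ? `evidence: ${paths.join(", ")}` : "gate would reject");
```

Under these assumptions, connecting to a database without reading anything from it yields an empty list, which corresponds to the rejection case described above.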
|
|
86
61
|
|
|
87
|
-
For a multi-cycle pass (browser + backend, or browser + node + backend), every active cycle's
|
|
62
|
+
For a multi-cycle pass (browser + backend, or browser + node + backend), every active cycle's pass criteria must hold.
|
|
88
63
|
|
|
89
64
|
---
|
|
90
65
|
|
|
@@ -38,8 +38,8 @@ If no argument is given, use **default** mode. `default` and `full` apply to eve
|
|
|
38
38
|
6. **Stop** the dev server when verification is complete
|
|
39
39
|
7. **If recording was started, stop it now** — `mcp__browser-devtools__bdt_content_stop-recording`. submit-verdict rejects with `"recording is still active"` when this step is skipped. (Recording is a server-side opt-in via `recording.enable` — when on, the gate forces `mcp__browser-devtools__bdt_content_start-recording` BEFORE the steps above and demands the matching stop here.)
|
|
40
40
|
8. **Submit your verdict** via Bash:
|
|
41
|
-
- Pass: `echo '{"session_id":"...","status":"pass","
|
|
42
|
-
- Fail: `echo '{"session_id":"...","status":"fail","
|
|
41
|
+
- Pass: `echo '{"session_id":"...","status":"pass","checks":["..."]}' | ironbee hook submit-verdict`
|
|
42
|
+
- Fail: `echo '{"session_id":"...","status":"fail","checks":["..."],"issues":["describe what failed"]}' | ironbee hook submit-verdict`
|
|
43
43
|
9. **If failed** → collect ALL issues first (finish testing all affected pages), submit one fail verdict with all issues, then fix everything, rebuild, and re-verify. Do not fix one issue at a time — batch fixes to avoid repeated build/restart cycles.
|
|
44
44
|
10. If pass after a previous fail, include `"fixes"` in the verdict describing what was fixed
|
|
45
45
|
|