@ironbee-ai/cli 0.9.1 → 0.9.3
- package/CHANGELOG.md +12 -0
- package/README.md +7 -7
- package/dist/clients/claude/hooks/require-verdict.js +1 -1
- package/dist/clients/claude/platforms/command-verify.backend.md +72 -13
- package/dist/clients/claude/platforms/command-verify.node.md +8 -8
- package/dist/clients/claude/platforms/rule.backend.md +15 -6
- package/dist/clients/claude/platforms/rule.node.md +3 -3
- package/dist/clients/claude/platforms/skill.backend.md +67 -6
- package/dist/clients/claude/platforms/skill.node.md +10 -10
- package/dist/clients/cursor/hooks/require-verdict.js +1 -1
- package/dist/clients/cursor/platforms/command-verify.backend.md +72 -13
- package/dist/clients/cursor/platforms/command-verify.node.md +8 -8
- package/dist/clients/cursor/platforms/rule.backend.md +15 -6
- package/dist/clients/cursor/platforms/rule.node.md +3 -3
- package/dist/clients/cursor/platforms/skill.backend.md +67 -6
- package/dist/clients/cursor/platforms/skill.node.md +10 -10
- package/dist/hooks/core/actions.d.ts +24 -10
- package/dist/hooks/core/actions.d.ts.map +1 -1
- package/dist/hooks/core/actions.js.map +1 -1
- package/dist/hooks/core/submit-verdict.js +1 -1
- package/dist/hooks/core/submit-verdict.js.map +1 -1
- package/dist/hooks/core/verify-gate.d.ts.map +1 -1
- package/dist/hooks/core/verify-gate.js +45 -20
- package/dist/hooks/core/verify-gate.js.map +1 -1
- package/dist/lib/config.d.ts +29 -2
- package/dist/lib/config.d.ts.map +1 -1
- package/dist/lib/config.js +58 -2
- package/dist/lib/config.js.map +1 -1
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,17 @@
 # Changelog
 
+## 0.9.3 (2026-05-12)
+
+### Features
+
+* **backend:** support db and o11y domains in the backend platform ([#13](https://github.com/ironbee-ai/ironbee-cli/issues/13)) ([6801ebe](https://github.com/ironbee-ai/ironbee-cli/commit/6801ebe9204cafd4ffccbc305418f8c284db5ba6))
+
+## 0.9.2 (2026-05-11)
+
+### Features
+
+* **backend:** support log domain in the backend platform ([#11](https://github.com/ironbee-ai/ironbee-cli/issues/11)) ([4512275](https://github.com/ironbee-ai/ironbee-cli/commit/4512275b5f90c501481f4b07c13057fea475e3fb))
+
 ## 0.9.1 (2026-05-11)
 
 ### Bug Fixes
package/README.md
CHANGED
@@ -372,7 +372,7 @@ When the agent tries to complete a task, IronBee runs these checks:
 - **Browser cycle**: navigate, screenshot, accessibility snapshot, console check (all-of)
 - **Node cycle**: connect; then either probe path (`(put-tracepoint | put-logpoint | put-exceptionpoint) AND get-probe-snapshots`) OR log path (`get-logs`)
 4. **Does a verdict exist?** — The agent must submit a single verdict carrying evidence for every active cycle via `ironbee hook submit-verdict`.
-5. **Is the verdict valid?** — Per active cycle: browser fields (pages_tested, console_errors, network_failures) and/or node fields (
+5. **Is the verdict valid?** — Per active cycle: browser fields (pages_tested, console_errors, network_failures) and/or node fields (node_processes_connected, node_probes_set / node_log_errors).
 6. **Pass or fail?** — `status: "pass"` is honored only if every active cycle's evidence backs the claim. The gate overrides to fail if it doesn't.
 7. **Retry limit** — After `maxRetries` failed attempts (default 3, single global counter), the agent is allowed to complete but must report unresolved issues.
 
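The pass/fail and retry-limit checks in the README hunk above can be read as a small decision function. The following is a minimal sketch under stated assumptions, not the package's actual gate: the names `GateInput`, `GateOutcome`, and `decideGate` are hypothetical, and it assumes the attempt that reaches `maxRetries` failures is the one that lets the agent complete.

```typescript
// Hypothetical names for illustration only; none of these come from the package.
type GateOutcome = "allow" | "block" | "allow-with-unresolved-issues";

interface GateInput {
  verdictExists: boolean;
  claimedStatus: "pass" | "fail";
  // True when every active cycle's evidence backs the claimed status.
  evidenceBacksClaim: boolean;
  // Failed attempts recorded before this one (single global counter).
  failedAttempts: number;
  // The README states a default of 3.
  maxRetries?: number;
}

function decideGate(input: GateInput): GateOutcome {
  const maxRetries = input.maxRetries ?? 3;

  // A "pass" is honored only when a verdict exists and its evidence holds up;
  // otherwise the attempt counts as a failure.
  const passed =
    input.verdictExists &&
    input.claimedStatus === "pass" &&
    input.evidenceBacksClaim;
  if (passed) {
    return "allow";
  }

  // Assumption: the failure that reaches maxRetries releases the agent,
  // which must then report the unresolved issues.
  const failures = input.failedAttempts + 1;
  return failures >= maxRetries ? "allow-with-unresolved-issues" : "block";
}
```

With the default limit, the first two failed attempts block completion and the third releases the agent with unresolved issues to report.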
@@ -419,23 +419,23 @@ On pass after a previous fail, include a `fixes` array describing what was fixed
 }
 ```
 
-For a **node-cycle** verdict (probe path), use the `
+For a **node-cycle** verdict (probe path), use the `node_*` fields instead of (or alongside) the browser fields:
 
 ```json
 {
 "session_id": "<your-session-id>",
 "status": "pass",
 "checks": ["POST /api/orders returned 201", "tracepoint at handler.ts:42 fired once"],
-"
-"
+"node_processes_connected": ["pid:12345 (next-server)"],
+"node_probes_set": [
 { "type": "tracepoint", "location": "src/api/orders.ts:42", "triggered": true }
 ],
-"
-"
+"node_probe_snapshots_collected": 1,
+"node_log_errors": []
 }
 ```
 
-If both cycles are active, populate browser fields **and** `
+If both cycles are active, populate browser fields **and** `node_*` fields in the same verdict — both cycles' pass criteria must hold for the gate to honor `status: "pass"`.
 
 The agent must submit a verdict after **every** verification attempt — both pass and fail. File edits are blocked until a verdict is submitted after using devtools tools.
 
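To make the node-cycle verdict fields above concrete, here is a minimal sketch of how a reader might check such a verdict locally before submitting it. It is an illustration under assumptions, not the gate's real logic: the interface and function names are hypothetical, the element type of `node_log_errors` is assumed, and the rule that a non-empty `node_log_errors` always rules out a pass is a conservative reading of the text.

```typescript
// Field names come from the README diff; the types are assumptions.
interface NodeProbe {
  type: string;
  location: string;
  triggered: boolean;
}

interface NodeVerdictFields {
  node_processes_connected?: string[];
  node_probes_set?: NodeProbe[];
  node_probe_snapshots_collected?: number;
  node_log_errors?: unknown[];
}

// Hypothetical helper: returns true when the node fields could back "pass".
function nodeEvidenceBacksPass(v: NodeVerdictFields): boolean {
  // Every node-cycle verdict needs at least one connected process.
  const connected = (v.node_processes_connected ?? []).length > 0;
  if (!connected) {
    return false;
  }

  // Assumption: collected log errors always rule out a pass.
  const logErrors = v.node_log_errors;
  if (logErrors !== undefined && logErrors.length > 0) {
    return false;
  }

  // Probe path: at least one probe fired. Log path: errors were collected
  // and the list is empty.
  const probePath = (v.node_probes_set ?? []).some((p) => p.triggered);
  const logPath = logErrors !== undefined;
  return probePath || logPath;
}
```

Run against the README's example verdict, this returns `true`; dropping `node_processes_connected` or supplying neither probes nor `node_log_errors` returns `false`.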
@@ -40,7 +40,7 @@ async function run(projectDir) {
 if ((0, actions_1.hasToolCallsSinceLastVerdict)(actionsFile)) {
 process.stderr.write(`BLOCKED: You used verification tools (browser-devtools / node-devtools / backend-devtools) but did not submit a verdict. You MUST submit a verdict (pass or fail) before editing code.
 
-Submit your verdict first (include cycle-appropriate fields — browser fields for bdt_*,
+Submit your verdict first (include cycle-appropriate fields — browser fields for bdt_*, node_* for ndt_*, backend_endpoints_called/backend_response_statuses for bedt_*):
 echo '{"session_id":"${sessionId}","status":"fail","checks":[...],"issues":["describe what failed"], ...}' | ironbee hook submit-verdict
 
 Then you can edit code to fix the issues.
@@ -4,27 +4,50 @@
 
 If the project has the backend protocol cycle enabled (`ironbee backend enable` once at setup) and your edits touch matching paths (e.g. `server/**`, `api/**`, `routes/**`, `controllers/**`), the Stop hook also enforces a Backend cycle. The same `verification-start` covers every active cycle; the same verdict file carries fields for all of them.
 
-This cycle is **runtime- and language-agnostic** — it works for Node, Java, Python, Go, Rust, Ruby, .NET, PHP, Elixir, Kotlin, Scala. The agent makes real protocol calls (HTTP / gRPC / GraphQL / WebSocket) against the running service
+This cycle is **runtime- and language-agnostic** — it works for Node, Java, Python, Go, Rust, Ruby, .NET, PHP, Elixir, Kotlin, Scala. The agent makes real protocol calls (HTTP / gRPC / GraphQL / WebSocket) against the running service, inspects logs, OR reads database state; it never attaches to a process.
 
 ### Mode behavior (backend cycle)
-- **default** (no arg or `default`):
-- **full**:
+- **default** (no arg or `default`): exercise the endpoints your diff touched via ONE of the three evidence paths (protocol-call, log evidence, or DB evidence — see below). Map each changed file → the route(s) / handler(s) / RPC method(s) / table(s) it exposes, then either call them yourself and chain follow-ups to verify side effects, OR set up log capture, OR inspect database state directly.
+- **full**: cover every endpoint reachable from files matching `backend.verifyPatterns`, not just the changed files. Cover the success path, at least one error path, and any auth-gated variant for each. For schema / migration changes, verify every affected table.
 - `visual` / `functional`: browser-only modes; this cycle behaves as `default` when they are passed.
 
 ### Steps (additive to the browser flow above)
+
+The cycle is satisfied by ANY ONE of three evidence paths: protocol-call (you drive the request), log-evidence (something else drives it; you read the logs), or DB-evidence (you inspect database state). Pick whichever fits the task — multiple can be combined.
+
 1. **Confirm the backend service is running** (the user's dev server / Docker compose / k8s port-forward / …). Don't start the service yourself — ask the user if it's not obvious.
-2. **Identify the affected endpoint
-3. **
+2. **Identify the affected layer** — wire endpoint, log output, and/or database state. Map your code change to its observable side(s).
+3. **Exercise ONE or more evidence paths**:
+
+**Path A — Protocol-call (you drive the request):**
 - `mcp__backend-devtools__bedt_request_http` — HTTP/1.1 + HTTP/2 (ALPN auto-negotiates).
 - `mcp__backend-devtools__bedt_request_grpc` — unary + 3 streaming modes; `.proto` text or descriptor.
 - `mcp__backend-devtools__bedt_request_graphql` — query/mutation/persisted query.
 - `mcp__backend-devtools__bedt_request_websocket-open` then `bedt_request_websocket-send` / `bedt_request_websocket-receive` / `bedt_request_websocket-close` for stateful WS sessions.
 - `mcp__backend-devtools__bedt_request_replay` — re-issue a captured curl command or HAR entry.
-
-
-
+- **Inspect the response** — `status`, body, headers, `traceId`. **4xx/5xx and gRPC non-OK are normal results, not transport errors.** Decide PASS/FAIL based on what the test actually requires.
+
+**Path B — Log evidence (an external driver hits the endpoint; you read the logs):**
+- `mcp__backend-devtools__bedt_log_register-source` — register the running service's log destination (`type: "file"` + `path`, `type: "docker"` + `container`, or `type: "kubernetes"` + `pod`).
+- `mcp__backend-devtools__bedt_log_read` / `bedt_log_read-multi` — point-in-time read with filters (`tail`, `since`/`until`, `pattern`, `level`, `parseJson` + `jsonFilter`, `contextBefore`/`contextAfter`, `select`, `coalesce`).
+- `mcp__backend-devtools__bedt_log_follow` + `bedt_log_get-followed` (and `bedt_log_stop-follow`) — streaming follow when you need to capture lines that emit AFTER you trigger.
+- **Verify the lines match the expectation** — error gone, expected line present, trace-id chained through, no unexpected ERROR-level entries.
+
+**Path C — DB evidence (you inspect database state directly):**
+- `mcp__backend-devtools__bedt_db_connect` — open a named connection (`type: "postgres" | "mysql" | "sqlite"`, prefer `connectionStringEnv` over inline `connectionString`, default `allowWrites: false`).
+- `mcp__backend-devtools__bedt_db_list-tables` / `bedt_db_describe-table` — discover schema; great for migration verification.
+- `mcp__backend-devtools__bedt_db_query` — run a read query (parametrized; readonly mode is enforced server-side).
+- `mcp__backend-devtools__bedt_db_snapshot` (+ `bedt_db_diff`) — pre/post state diff so you can prove a code path changed exactly the rows it should have.
+- `mcp__backend-devtools__bedt_db_watch-changes` + `bedt_db_get-changes` — streaming change capture for verifying writes triggered by a protocol call or external driver.
+- `mcp__backend-devtools__bedt_db_transaction-begin/-commit/-rollback` — scope seed data to one test so it doesn't leak.
+- `bedt_db_disconnect` to clean up (optional — session teardown handles it).
+4. **Chain follow-ups across paths** — protocol-call → DB read to verify writes landed, protocol-call → log read to verify trace-id chained, etc. Use `bedt_request_set-default-headers` for auth tokens (host-scoped), `bedt_request_set-cookies` for session state on Path A.
+5. **Trace correlation (optional, `o11y_*` primitives):** IronBee already pins the verification cycle's traceId on every backend tool call via `_metadata.traceId` (outranks any session pin), so the orchestrator's correlation root is authoritative. Use `bedt_o11y_get-trace-context` to read it, then pass it to `bedt_log_read { pattern: "<traceId>" }` to slice logs for one flow. `bedt_o11y_new-trace-id` / `bedt_o11y_set-trace-context` are available when you want to anchor a flow under an explicit id (e.g. integration-test runs).
+6. **Submit verdict** including the fields matching the path(s) you exercised. If browser and/or node cycles are also active, include their fields in the SAME verdict — do not submit two verdicts.
 
 ### Verdict (backend-cycle fields)
+
+Protocol-call path:
 ```json
 {
 "session_id": "...",
@@ -39,7 +62,27 @@ This cycle is **runtime- and language-agnostic** — it works for Node, Java, Py
 }
 ```
 
-
+Log-evidence path:
+```json
+{
+"session_id": "...",
+"status": "pass",
+"checks": ["api-server logged 'order 42 created' on POST /api/orders", "no ERROR-level lines after the change"],
+"backend_log_sources_read": ["api-server"]
+}
+```
+
+DB-evidence path:
+```json
+{
+"session_id": "...",
+"status": "pass",
+"checks": ["users table has new email_verified column with default false", "row count unchanged after migration"],
+"backend_db_connections_read": ["app"]
+}
+```
+
+At least one of `backend_endpoints_called`, `backend_log_sources_read`, or `backend_db_connections_read` must be non-empty. `backend_response_statuses` and `backend_traces_collected` are optional but strongly recommended on the protocol-call path — same order as `backend_endpoints_called`. There is no automatic pass-criteria override on this cycle: if `status: pass` and the evidence is structurally valid, the gate honors it.
 
 For a multi-cycle pass (browser + backend, or browser + node + backend), every active cycle's evidence must be present — claiming `pass` without one cycle's fields will be overridden to fail.
 
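The structural rule stated above (at least one of the three evidence arrays non-empty, with the optional arrays in the same order as `backend_endpoints_called`) can be sketched as a small pre-submit check. This is a minimal illustration, not the gate's implementation: the interface and function names are hypothetical, the element types are assumptions, and "same order" is approximated here by requiring equal lengths.

```typescript
// Field names come from the diff; element types are assumptions.
interface BackendVerdictFields {
  backend_endpoints_called?: string[];
  backend_response_statuses?: unknown[];
  backend_traces_collected?: unknown[];
  backend_log_sources_read?: string[];
  backend_db_connections_read?: string[];
}

type EvidencePath = "protocol-call" | "log-evidence" | "db-evidence";

// Hypothetical helper: lists which evidence paths the verdict carries.
function backendEvidencePaths(v: BackendVerdictFields): EvidencePath[] {
  const paths: EvidencePath[] = [];
  if ((v.backend_endpoints_called ?? []).length > 0) paths.push("protocol-call");
  if ((v.backend_log_sources_read ?? []).length > 0) paths.push("log-evidence");
  if ((v.backend_db_connections_read ?? []).length > 0) paths.push("db-evidence");
  return paths;
}

// Hypothetical helper: optional per-endpoint arrays, when present, should
// line up one-to-one with backend_endpoints_called (approximated by length).
function optionalArraysAligned(v: BackendVerdictFields): boolean {
  const n = (v.backend_endpoints_called ?? []).length;
  const statusesOk =
    v.backend_response_statuses === undefined ||
    v.backend_response_statuses.length === n;
  const tracesOk =
    v.backend_traces_collected === undefined ||
    v.backend_traces_collected.length === n;
  return statusesOk && tracesOk;
}

function backendVerdictIsStructurallyValid(v: BackendVerdictFields): boolean {
  return backendEvidencePaths(v).length > 0 && optionalArraysAligned(v);
}
```

The log-evidence and DB-evidence examples shown in the diff both satisfy this check, since each carries one non-empty evidence array and no per-endpoint arrays.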
@@ -53,22 +96,38 @@ Focus on the endpoints you changed — not every endpoint in the service.
 1. Run `git diff --name-only` and `git diff --name-only HEAD~1`
 2. **Ignore `.ironbee/`, `.claude/`, `.cursor/`** — tool config, not application code
 3. **Read the full diff** for route / handler / controller / service files in scope — note the wire-level address (HTTP method+path, gRPC service+method, GraphQL operation, WebSocket path), request shape, response shape, side effects (DB writes, downstream calls, queue puts)
-4. Before opening the request
+4. Before opening the request or log tools, you should be able to answer: what endpoints did I touch? What does each return on the happy path? What does each return on the error path? What side effects need verification? Which side (request or log) is easier to drive for this task?
 
 ### 2. Verify against the running service
+Pick the evidence path(s) that fit the task:
+
+**Protocol-call path** (you drive the request):
 - **Call each changed endpoint** with the matching tool — `mcp__backend-devtools__bedt_request_http` / `bedt_request_grpc` / `bedt_request_graphql` / `bedt_request_websocket-open`
 - **Cross-reference the response against the diff** — status, body shape, headers, gRPC status code
 - **Chain a follow-up call** to verify side effects (POST then GET, set then list, mutation then query, …)
 - **Test one error path** per new branch — invalid body, missing field, missing auth, 404 path
 - **Capture `traceId`** when available — useful for joining with downstream cycle evidence
 
+**Log-evidence path** (an external driver hits it; you read the logs):
+- **Register the service's log source** with `mcp__backend-devtools__bedt_log_register-source` (file / docker / kubernetes)
+- **Read or follow** with `bedt_log_read` / `bedt_log_read-multi` (point-in-time, filter by `pattern` / `level` / `since-until` / `jsonFilter`) or `bedt_log_follow` (streaming for after-the-trigger capture)
+- **Correlate with `traceId`** — use `pattern: "<traceId>"` to pull only the lines for one request
+- **Verify the expected line is present** AND **no unexpected ERROR-level entries appeared** for the touched route(s)
+
+**DB-evidence path** (you inspect database state directly):
+- **Open a named, readonly connection** with `mcp__backend-devtools__bedt_db_connect` — prefer `connectionStringEnv` so the secret never flows through the agent context.
+- **Discover schema** via `bedt_db_list-tables` + `bedt_db_describe-table` for migrations / column additions.
+- **Run targeted reads** via `bedt_db_query` for row-count, content, and constraint checks; `bedt_db_snapshot` + `bedt_db_diff` for pre/post state proofs; `bedt_db_watch-changes` + `bedt_db_get-changes` for streaming change capture during a triggered call.
+- **Disconnect** when done (`bedt_db_disconnect`) — optional; the session tears connections down at end.
+
 ---
 
 ## Full Mode (`/ironbee-verify full`, backend cycle)
 
-Exercise every endpoint reachable from files matching `backend.verifyPatterns`, not just the changed files. Do NOT run `git diff` or scope to recent changes.
+Exercise every endpoint / log source / DB table reachable from files matching `backend.verifyPatterns`, not just the changed files. Do NOT run `git diff` or scope to recent changes.
 
-- Hit every route / RPC method / GraphQL operation / WebSocket lifecycle in scope
+- Hit every route / RPC method / GraphQL operation / WebSocket lifecycle in scope (protocol-call) OR cover them via the log feed when an external driver / test suite drives them (log evidence) OR via direct DB inspection for schema / migration coverage (DB evidence)
 - Cover the success path AND at least one error path for each
 - Cover any auth-gated variant (unauthenticated, wrong role) where authentication is present
--
+- For migrations: verify every affected table's schema (`bedt_db_describe-table`) + sample row count before / after
+- Any unexpected error response, unexpected ERROR-level log line, or unexpected schema drift during the run is a fail, regardless of when it was introduced
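The log-evidence checks described in the hunk above (correlate by `traceId`, confirm the expected line is present, confirm no unexpected ERROR-level entries) amount to a simple filter over the lines returned by a log read. The sketch below illustrates that reasoning on already-fetched lines. The `LogLine` shape and the function name are hypothetical; the real tool output format is not shown in this diff.

```typescript
// Hypothetical shape for a parsed log line; the real tool output may differ.
interface LogLine {
  level: string;
  message: string;
}

interface LogCheckResult {
  expectedFound: boolean;
  errorLines: LogLine[];
  pass: boolean;
}

// Scope the lines to one request by traceId, then apply both checks.
function checkLogEvidence(
  lines: LogLine[],
  traceId: string,
  expectedText: string
): LogCheckResult {
  // Keep only lines that mention the trace id, mirroring a read with
  // pattern: "<traceId>".
  const scoped = lines.filter((l) => l.message.indexOf(traceId) !== -1);

  const expectedFound = scoped.some(
    (l) => l.message.indexOf(expectedText) !== -1
  );
  const errorLines = scoped.filter((l) => l.level.toUpperCase() === "ERROR");

  return {
    expectedFound,
    errorLines,
    pass: expectedFound && errorLines.length === 0,
  };
}
```

Scoping by trace id first keeps ERROR lines from unrelated requests from failing the check, which is the reason the hunk recommends correlating before verifying.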
@@ -16,9 +16,9 @@ If the project has node backend verification enabled (`ironbee node enable` once
 2. **Connect**: `mcp__node-devtools__ndt_debug_connect` with one of `pid` / `processName` / `containerId` / `containerName` / `inspectorPort` / `wsUrl`. Inspector is auto-activated via SIGUSR1 if needed.
 3. **Pick an evidence path** for each changed code path:
 - **Probe path** (proves the code path executed): `mcp__node-devtools__ndt_debug_put-tracepoint` (or `put-logpoint` / `put-exceptionpoint`) at the changed code, exercise the path (e.g. trigger the API call from the browser), then `mcp__node-devtools__ndt_debug_get-probe-snapshots`. At least one probe must come back with `triggered: true`.
-- **Log path** (proves no errors): exercise the path, then `mcp__node-devtools__ndt_debug_get-logs` with the error level filter. `
+- **Log path** (proves no errors): exercise the path, then `mcp__node-devtools__ndt_debug_get-logs` with the error level filter. `node_log_errors` must be empty for `status: pass`.
 4. **Disconnect** (optional): `mcp__node-devtools__ndt_debug_disconnect`.
-5. **Submit verdict** including `
+5. **Submit verdict** including `node_*` fields. If browser cycle is also active, include browser fields in the SAME verdict — do not submit two verdicts.
 
 ### Verdict (node-cycle fields)
 ```json
@@ -26,12 +26,12 @@ If the project has node backend verification enabled (`ironbee node enable` once
 "session_id": "...",
 "status": "pass",
 "checks": ["POST /api/orders returned 201", "tracepoint at handler.ts:42 fired once"],
-"
-"
+"node_processes_connected": ["pid:12345 (next-server)"],
+"node_probes_set": [
 { "type": "tracepoint", "location": "src/api/orders.ts:42", "triggered": true }
 ],
-"
-"
+"node_probe_snapshots_collected": 1,
+"node_log_errors": []
 }
 ```
 
@@ -54,7 +54,7 @@ Focus on the code you changed — not the entire Node service.
 - **Exercise the path end-to-end** (trigger from browser, curl, or the backend cycle if active)
 - **Each touched probe must report `triggered: true`** in `mcp__node-devtools__ndt_debug_get-probe-snapshots`
 - **Check one edge case per new branch** — invalid input, missing field, auth failure, …
-- **Logs** — `mcp__node-devtools__ndt_debug_get-logs` at error level; `
+- **Logs** — `mcp__node-devtools__ndt_debug_get-logs` at error level; `node_log_errors` must be empty for `pass`
 
 ---
 
@@ -64,4 +64,4 @@ Probe every Node code path reachable from files matching `node.verifyPatterns`,
 
 - Place probes at every handler / route / service entry point in scope, plus key internal branch points (early returns, error catches, conditional middleware)
 - Exercise each path with at least one happy-path call AND one failure-path call
-- `
+- `node_log_errors` must be empty after the full run — any unexpected log error is a fail, regardless of when it was introduced
@@ -4,20 +4,29 @@
 
 ## Backend cycle (runtime-agnostic protocol verification)
 
-When the file matches `backend.verifyPatterns`, the Stop hook ALSO requires verification through the **backend-devtools** MCP server (prefix `bedt_`).
+When the file matches `backend.verifyPatterns`, the Stop hook ALSO requires verification through the **backend-devtools** MCP server (prefix `bedt_`). The cycle is satisfied by **any one of three** evidence paths: a real protocol call (HTTP / gRPC / GraphQL / WebSocket) against the running service, log inspection from the running service (file / docker / kubernetes) when an external driver is hitting the endpoint, OR direct database inspection for schema / data-state changes. Pick whichever fits the task — language- and framework-independent in all three cases.
 
-The backend cycle and the node cycle are **independent**. Node attaches to a Node.js V8 inspector with non-blocking probes (`ndt_*`); backend drives wire protocols from outside (`bedt_*`). They can be active in the same task; both must be satisfied for `status: pass`.
+The backend cycle and the node cycle are **independent**. Node attaches to a Node.js V8 inspector with non-blocking probes (`ndt_*`); backend drives wire protocols and/or reads logs / DB state from outside (`bedt_*`). They can be active in the same task; both must be satisfied for `status: pass`.
 
 ### Backend-cycle additions to the main flow
 
 These attach to the **Required steps** above — they don't replace any step. Numbering follows the main flow:
 
-- **Within step 3 (run flow):** also run the backend flow
-- **
+- **Within step 3 (run flow):** also run the backend flow via ONE of three evidence paths:
+- **Protocol-call** — identify the affected endpoint(s) → call the matching protocol tool (`bedt_request_http` / `bedt_request_grpc` / `bedt_request_graphql` / `bedt_request_websocket-open` / `bedt_request_replay`) → inspect status / body / `traceId` → chain follow-up calls when verifying side effects. **A 4xx / 5xx response is a normal result, not an error** — only transport failures populate the `error` field.
+- **Log evidence** — `bedt_log_register-source` (file / docker / kubernetes) → `bedt_log_read` (point-in-time, supports `tail` / `since-until` / `pattern` / `level` / `parseJson` + `jsonFilter` / `contextBefore-After` / `select` / `coalesce`) OR `bedt_log_follow` + `bedt_log_get-followed` (streaming). Fit for jobs / queue workers / async handlers, or any case where an external driver is hitting the endpoint and you only need to verify what the server logged. `bedt_log_register-source` is mandatory on this path (the gate counts it as the setup step).
+- **DB evidence** — `bedt_db_connect` (named, `connectionStringEnv` preferred, default readonly) → ONE of `bedt_db_query` / `bedt_db_describe-table` / `bedt_db_list-tables` / `bedt_db_snapshot` (+ optional `bedt_db_diff`) / `bedt_db_get-changes`. Fit for migrations, seed-data changes, query-result regressions, and any code change whose side effect lives in a relational DB. `bedt_db_connect` is mandatory on this path (same anti-fluke rule as `log-evidence` — the connection name is on the wire).
+- **Within step 6 (submit verdict):** include `backend_endpoints_called` for protocol-call evidence (optionally with `backend_response_statuses` and `backend_traces_collected`), `backend_log_sources_read` for log-evidence, and/or `backend_db_connections_read` for db-evidence. **At least one** of those three arrays must be non-empty. One verdict carries fields for every active cycle.
+
+### Trace correlation (`o11y_*` is auxiliary, not evidence)
+
+IronBee already injects the active verification `traceId` into every backend tool call via `_metadata.traceId`. That id outranks anything `bedt_o11y_new-trace-id` / `bedt_o11y_set-trace-context` would pin for the cycle, so the orchestrator's correlation root stays authoritative. The `o11y_*` tools are still useful for log searches (`bedt_log_read { pattern: traceId }` against an explicit id), but they don't count toward the gate.
 
 ### Additional BANNED for backend cycle
 
 - Calling `bedt_*` tools without first opening a verification cycle (`ironbee hook verification-start`).
 - Treating a 4xx / 5xx response as a transport failure when the test was specifically asking for that error condition (e.g. "POST should reject malformed body with 400"). Decide PASS/FAIL based on the test's intent, not the status code's HTTP-class default.
-- Submitting a backend
-- Inferring backend behavior by reading code without
+- Submitting a backend verdict that omits ALL of `backend_endpoints_called`, `backend_log_sources_read`, and `backend_db_connections_read` — at least one of those evidence arrays must be non-empty.
+- Inferring backend behavior by reading code without exercising any evidence path. The cycle is satisfied only by making a real protocol call, reading real logs, or inspecting real DB state on the running service.
+- Reading a pre-existing log source / DB unrelated to your task to fake the log-evidence or db-evidence path. `bedt_log_register-source` and `bedt_db_connect` are required setup steps on those paths so the registration / connection is on the wire.
+- Opening a DB connection with `allowWrites: true` to "set up" verification data without an explicit need (seed / migration). Read-only is the default for a reason — flipping it widens the blast radius if a query goes wrong.
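The DB-evidence path above leans on a pre/post snapshot comparison to show that a code path changed only the rows it should have. The following is a minimal sketch of that idea on in-memory data. It does not reproduce `bedt_db_snapshot` or `bedt_db_diff`, whose output formats are not shown in this diff: the `Snapshot` shape (rows keyed by a primary-key string) and the function name are assumptions, and rows are compared via `JSON.stringify`, which assumes stable column order.

```typescript
// Hypothetical shapes: a snapshot maps a primary-key string to its row.
type Row = Record<string, unknown>;
type Snapshot = Record<string, Row>;

interface SnapshotDiff {
  inserted: string[];
  deleted: string[];
  updated: string[];
}

// Compare two snapshots of the same table taken before and after a call.
function diffSnapshots(before: Snapshot, after: Snapshot): SnapshotDiff {
  const diff: SnapshotDiff = { inserted: [], deleted: [], updated: [] };

  for (const key of Object.keys(after)) {
    if (!Object.prototype.hasOwnProperty.call(before, key)) {
      diff.inserted.push(key);
    } else if (JSON.stringify(before[key]) !== JSON.stringify(after[key])) {
      // Assumption: both snapshots serialize columns in the same order.
      diff.updated.push(key);
    }
  }

  for (const key of Object.keys(before)) {
    if (!Object.prototype.hasOwnProperty.call(after, key)) {
      diff.deleted.push(key);
    }
  }

  return diff;
}
```

A verification would then assert that the diff contains exactly the expected keys, for example one inserted order row and nothing updated or deleted.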
@@ -19,11 +19,11 @@ Both cycles can be active simultaneously (e.g. you edit both a React component a
 These attach to the **Required steps** above — they don't replace any step. Numbering follows the main flow:
 
 - **Within step 3 (run flow):** also run the node flow: connect (`ndt_debug_connect`) → set probe (`ndt_debug_put-tracepoint` / `put-logpoint` / `put-exceptionpoint`) AND exercise + read snapshots (`ndt_debug_get-probe-snapshots`), OR exercise + read logs (`ndt_debug_get-logs`). When both browser and node cycles are active, run BOTH within the same verification cycle.
-- **Within step 6 (submit verdict):** include `
+- **Within step 6 (submit verdict):** include `node_*` fields (`node_processes_connected` non-empty, plus `node_probes_set` and/or `node_log_errors`). One verdict carries fields for every active cycle.
 
 ### Additional BANNED for node cycle
 
 - Calling `ndt_*` tools without first opening a verification cycle (`ironbee hook verification-start`).
 - **Calling `ndt_*` tools when the project's backend is NOT Node.js** (Java / Python / Go / Rust / .NET / Ruby / PHP / Elixir / etc.). Use the browser cycle only for non-Node backends.
-- Claiming `status: pass` for a node cycle when no probe triggered AND `
-- Submitting a node-only verdict that omits `
+- Claiming `status: pass` for a node cycle when no probe triggered AND `node_log_errors` was never collected.
+- Submitting a node-only verdict that omits `node_processes_connected` — every node-cycle verdict requires this field non-empty.
@@ -5,11 +5,15 @@
 
 ## Backend cycle — runtime- and language-agnostic
 
-The **backend protocol cycle** verifies backend changes by driving real protocol calls (HTTP / gRPC / GraphQL / WebSocket) against the running service and reading the responses. It works for ANY backend runtime: Node, Java, Python, Go, Rust, Ruby, .NET, PHP, Elixir, Kotlin, Scala — the agent never attaches to a process
+The **backend protocol cycle** verifies backend changes by driving real protocol calls (HTTP / gRPC / GraphQL / WebSocket) against the running service and reading the responses, OR by inspecting the logs of the running service (file / docker / kubernetes) when an external driver is hitting the endpoint. It works for ANY backend runtime: Node, Java, Python, Go, Rust, Ruby, .NET, PHP, Elixir, Kotlin, Scala — the agent never attaches to a process.
 
-**This is different from the node cycle.** Node-cycle (`ndt_*`) attaches to a V8 inspector and sets non-blocking probes inside a running Node.js process — it's Node-only. Backend-cycle (`bedt_*`) makes outside-in protocol calls
+**This is different from the node cycle.** Node-cycle (`ndt_*`) attaches to a V8 inspector and sets non-blocking probes inside a running Node.js process — it's Node-only. Backend-cycle (`bedt_*`) makes outside-in protocol calls and/or reads logs of any service. They can be active at the same time when both are enabled.
 
-## Backend flow
+## Backend flow (three evidence paths — at least one is required)
+
+You can satisfy the cycle via **protocol-call evidence** (you drive the request yourself), **log evidence** (something else drives the request, you read the resulting logs), **DB evidence** (you inspect database state directly), or any combination. Pick whichever fits the task; one is enough.
+
+### Path A — Protocol-call evidence
 
 1. **Confirm a backend service is running** (the user's dev server, Docker compose, k8s port-forward, …). The agent itself does not start the service — ask the user if uncertain.
 2. **Identify the affected endpoint(s)** for your code change. Look at routes / handlers / controllers in the changed files. Map them to wire-level addresses (URL, gRPC service+method, GraphQL operation name, WebSocket path).
@@ -25,9 +29,44 @@ The **backend protocol cycle** verifies backend changes by driving real protocol
|
|
|
25
29
|
4. **Inspect the response** — `status` (HTTP / gRPC code), body, headers, returned `traceId` (always W3C `traceparent`).
|
|
26
30
|
**4xx/5xx and gRPC non-OK are normal results, not errors.** A test for "404 Not Found" SHOULD return 404. Only transport-level failures (DNS, TLS, timeout, abort) populate the response's `error` field. Decide PASS/FAIL based on what your task actually requires.
|
|
27
31
|
5. **Chain follow-up calls** if you need to verify side effects (e.g. POST then GET to confirm the new resource is readable). Use `bedt_request_set-default-headers` to pin auth tokens once per host, `bedt_request_set-cookies` for session cookies — both stay scoped to that target across calls.
|
|
28
|
-
6. **Submit verdict** including `backend_*` fields. If browser and/or node cycles are also active, include their fields in the SAME verdict — do not submit two verdicts.
|
|
29
32
|
|
|
30
|
-
###
|
|
33
|
+
### Path B — Log evidence (when an external driver hits the endpoint)
|
|
34
|
+
|
|
35
|
+
Useful when an integration test, the user, or a deploy script is already driving the protocol — your job is to verify "side B" by reading what the server logged. Also a fit for jobs / queue workers / cron handlers where there is no synchronous request to make.
|
|
36
|
+
|
|
37
|
+
1. **Register the log source** with `mcp__backend-devtools__bedt_log_register-source` — pick `type: "file"` for any process whose stdout is redirected to a file, `type: "docker"` for a container (`container: "<name|id>"`), `type: "kubernetes"` for a pod (`pod`, optionally `kubernetesContainer` / `namespace`). Source names are session-unique; re-register to overwrite. Listing/check helpers: `bedt_log_list-sources`, `bedt_log_check-source`.
|
|
38
|
+
2. **Read or follow the source**:
|
|
39
|
+
- `mcp__backend-devtools__bedt_log_read` / `bedt_log_read-multi` — point-in-time read across one or many sources. Filters: `tail` (last N lines), `since` / `until` (ISO-8601 — natively docker; file sources require `parseJson: true` so timestamp is extracted from a JSON field), `pattern` (substring; use for trace-id correlation), `level` (ERROR/WARN/INFO/DEBUG/TRACE/FATAL), `limit`, `parseJson`, `jsonFilter` (dot-path equality predicates against parsed JSON), `contextBefore` / `contextAfter`, `select` (dot-path projection to trim verbose JSONL), `coalesce` (fold multi-line stack traces into one line).
|
|
40
|
+
- `mcp__backend-devtools__bedt_log_follow` — open a streaming subscription that pushes lines into a ring buffer; `bedt_log_get-followed` drains it on demand; `bedt_log_stop-follow` tears it down. Use this when you need to capture logs that emit AFTER your trigger.
|
|
41
|
+
3. **Verify the lines you got match the expectation** — error gone, expected log line present, trace-id chained through. Plain-text and JSON sources are both supported; JSON sources accept structural predicates (`jsonFilter: { 'level': 'error', 'route': '/api/orders' }`).
|
|
42
|
+
4. **Unregister when done** — `bedt_log_unregister-source` cleans up. Optional; the session tears them down at end too.
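To make the `jsonFilter` idea in the steps above concrete, here is a minimal sketch of dot-path equality filtering over JSONL lines. It is an illustration only: `getByPath` and `filterJsonLines` are hypothetical names, and the assumed semantics (strict equality, non-JSON lines skipped) may differ from what `bedt_log_read` actually does.

```javascript
// Hypothetical helpers. This is NOT the backend-devtools implementation,
// only one plausible reading of "dot-path equality predicates".

// Resolve a dot-path such as "http.route" against a parsed object.
function getByPath(obj, path) {
  return path
    .split(".")
    .reduce((cur, key) => (cur == null ? undefined : cur[key]), obj);
}

// Keep the parsed lines for which every predicate in jsonFilter matches.
function filterJsonLines(lines, jsonFilter) {
  const matches = [];
  for (const line of lines) {
    let parsed;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // assumption: plain-text lines are skipped, not treated as errors
    }
    if (parsed === null || typeof parsed !== "object") continue;
    const ok = Object.entries(jsonFilter).every(
      ([path, expected]) => getByPath(parsed, path) === expected
    );
    if (ok) matches.push(parsed);
  }
  return matches;
}

// Sample log lines (invented for illustration).
const sampleLines = [
  '{"level":"error","route":"/api/orders","msg":"db timeout"}',
  '{"level":"info","route":"/api/orders","msg":"order 42 created"}',
  "plain text line that is not JSON",
  '{"level":"error","route":"/api/users","msg":"bad input"}',
];

const hits = filterJsonLines(sampleLines, { level: "error", route: "/api/orders" });
```

With the sample lines above, only the first line satisfies both predicates, so `hits` holds a single parsed object.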
|
|
43
|
+
|
|
44
|
+
### Path C — DB evidence (when the change touches database state)
|
|
45
|
+
|
|
46
|
+
Best fit for schema migrations, seed-data changes, query-result regressions, and any backend change whose side effect is visible in a relational DB. The agent opens a named, read-only-by-default connection and inspects the state — no need to drive a protocol call when the verification IS reading the DB.
|
|
47
|
+
|
|
48
|
+
1. **Open a named connection** with `mcp__backend-devtools__bedt_db_connect` — `type: "postgres" | "mysql" | "sqlite"`, `connectionString` or (preferred) `connectionStringEnv`. Default is `allowWrites: false` (server-side READ ONLY mode); only set `allowWrites: true` if you need to seed test data. The session can hold multiple named connections side-by-side (`bedt_db_list-connections` to inspect).
|
|
49
|
+
2. **Inspect state** — pick the tool that fits:
|
|
50
|
+
- `mcp__backend-devtools__bedt_db_list-tables` — discover tables.
|
|
51
|
+
- `mcp__backend-devtools__bedt_db_describe-table` — verify schema (columns / types / indexes / FKs) after a migration.
|
|
52
|
+
- `mcp__backend-devtools__bedt_db_query` — run an arbitrary read query (`SELECT count(*) FROM orders WHERE …`, etc.). Always parameterized — the gate honors readonly mode.
|
|
53
|
+
- `mcp__backend-devtools__bedt_db_snapshot` + `bedt_db_diff` — pre/post state diff for verifying that a code path changed exactly the rows it should.
|
|
54
|
+
- `mcp__backend-devtools__bedt_db_watch-changes` + `bedt_db_get-changes` — streaming change capture when a protocol call (or external driver) triggers writes you want to verify after the fact.
|
|
55
|
+
3. **(Optional) Seed / migrate** — `bedt_db_seed` (structured) and `bedt_db_run-script` (raw SQL) write data; both require `allowWrites: true` at connect time. Use sparingly and prefer per-test transactions (`bedt_db_transaction-begin` / `-commit` / `-rollback`) so the seed doesn't leak across tests.
|
|
56
|
+
4. **Disconnect when done** — `bedt_db_disconnect` releases the connection cleanly. Optional; the session tears them down at end too.
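The pre/post comparison mentioned for `bedt_db_snapshot` + `bedt_db_diff` can be pictured with the sketch below. It assumes each snapshot is an array of row objects with a unique `id` column; the function name `diffSnapshots` and the `{ added, removed, changed }` result shape are hypothetical and are not taken from the tool's real output.

```javascript
// Hypothetical sketch of a keyed row diff. The real bedt_db_diff output
// format is not documented here, so this shape is an assumption.
function diffSnapshots(before, after, key = "id") {
  const index = (rows) => new Map(rows.map((row) => [row[key], row]));
  const beforeByKey = index(before);
  const afterByKey = index(after);

  const added = [];
  const removed = [];
  const changed = [];

  for (const [k, row] of afterByKey) {
    if (!beforeByKey.has(k)) {
      added.push(row);
    } else if (JSON.stringify(beforeByKey.get(k)) !== JSON.stringify(row)) {
      // Simplification: string comparison is sensitive to column order.
      changed.push({ before: beforeByKey.get(k), after: row });
    }
  }
  for (const [k, row] of beforeByKey) {
    if (!afterByKey.has(k)) removed.push(row);
  }
  return { added, removed, changed };
}

// Invented example rows: one order is updated, one is inserted.
const pre = [
  { id: 1, status: "pending" },
  { id: 2, status: "pending" },
];
const post = [
  { id: 1, status: "pending" },
  { id: 2, status: "paid" },
  { id: 3, status: "pending" },
];

const delta = diffSnapshots(pre, post);
```

A diff like this supports a check such as "the code path changed exactly the rows it should": here one row is added, one is changed, and none are removed.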
|
|
57
|
+
|
|
58
|
+
### Trace correlation across paths (`o11y_*` primitives)
|
|
59
|
+
|
|
60
|
+
The IronBee verification cycle already pins a W3C trace id on every backend tool call via `_metadata.traceId` (see `require-verification`). It outranks any trace pin the agent sets, so you usually don't need the `o11y_*` tools. They are still available when you want to grep logs by trace id, or anchor a multi-tool flow under an explicit id:
|
|
61
|
+
|
|
62
|
+
- `mcp__backend-devtools__bedt_o11y_new-trace-id` — generate a fresh 32-hex id and pin it on the session (shadowed by IronBee's verification traceId for the cycle).
|
|
63
|
+
- `mcp__backend-devtools__bedt_o11y_set-trace-context` — set / clear specific trace-context values.
|
|
64
|
+
- `mcp__backend-devtools__bedt_o11y_get-trace-context` — inspect the current pin (useful for `bedt_log_read { pattern: "<traceId>" }`).
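As an illustration of trace-id correlation, the sketch below extracts the 32-hex trace id from a W3C `traceparent` value and uses it as a substring pattern over log lines, which mirrors what `pattern` is described as doing. The helper names and the sample log lines are hypothetical; the header layout (`version-traceid-parentid-flags`) follows the W3C Trace Context format.

```javascript
// Hypothetical helpers for illustration; not part of backend-devtools.

// traceparent layout: 2-hex version, 32-hex trace id, 16-hex parent id, 2-hex flags.
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function traceIdFromTraceparent(header) {
  const match = TRACEPARENT.exec(header.trim());
  if (!match) return null;
  const traceId = match[2];
  // An all-zero trace id is invalid under W3C Trace Context.
  if (/^0+$/.test(traceId)) return null;
  return traceId;
}

// Substring match, analogous to passing the trace id as a `pattern` filter.
function linesForTrace(lines, traceId) {
  return lines.filter((line) => line.includes(traceId));
}

// Example value taken from the W3C Trace Context specification.
const traceId = traceIdFromTraceparent(
  "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
);

// Invented log lines for the example.
const correlated = linesForTrace(
  [
    "INFO trace=4bf92f3577b34da6a3ce929d0e0e4736 order 42 created",
    "INFO trace=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa unrelated request",
  ],
  traceId
);
```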
|
|
65
|
+
|
|
66
|
+
### Submit verdict
|
|
67
|
+
|
|
68
|
+
Include the fields matching the path(s) you exercised. If browser and/or node cycles are also active, include their fields in the SAME verdict — do not submit two verdicts.
|
|
69
|
+
|
|
31
70
|
```json
|
|
32
71
|
{
|
|
33
72
|
"session_id": "<sid>",
|
|
@@ -42,7 +81,29 @@ The **backend protocol cycle** verifies backend changes by driving real protocol
|
|
|
42
81
|
}
|
|
43
82
|
```
|
|
44
83
|
|
|
45
|
-
|
|
84
|
+
Or, for a log-evidence path:
|
|
85
|
+
|
|
86
|
+
```json
|
|
87
|
+
{
|
|
88
|
+
"session_id": "<sid>",
|
|
89
|
+
"status": "pass",
|
|
90
|
+
"checks": ["api-server logged 'order 42 created' on POST /api/orders", "no ERROR-level lines after the change"],
|
|
91
|
+
"backend_log_sources_read": ["api-server"]
|
|
92
|
+
}
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
Or, for a DB-evidence path:
|
|
96
|
+
|
|
97
|
+
```json
|
|
98
|
+
{
|
|
99
|
+
"session_id": "<sid>",
|
|
100
|
+
"status": "pass",
|
|
101
|
+
"checks": ["users table has new email_verified column with default false", "row count unchanged after migration"],
|
|
102
|
+
"backend_db_connections_read": ["app"]
|
|
103
|
+
}
|
|
104
|
+
```
|
|
105
|
+
|
|
106
|
+
At least one of `backend_endpoints_called`, `backend_log_sources_read`, or `backend_db_connections_read` must be non-empty. The optional protocol-call fields (`backend_response_statuses`, `backend_traces_collected`) are strongly recommended when you used the protocol-call path — same order as `backend_endpoints_called`. Status interpretation is YOUR call: there is no automatic pass-criteria override on this cycle. If the agent claims `status: pass` and the evidence is structurally valid, the gate honors it.
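The structural rule above can be sketched as a small pre-submit check. This is a hedged illustration, not the gate's actual code: `checkBackendEvidence` is a hypothetical name, and treating "same order" as "one status per endpoint" is an assumption.

```javascript
// Hypothetical pre-submit check; the real verify gate may apply other rules.
const EVIDENCE_FIELDS = [
  "backend_endpoints_called",
  "backend_log_sources_read",
  "backend_db_connections_read",
];

function checkBackendEvidence(verdict) {
  const problems = [];
  const nonEmpty = (field) =>
    Array.isArray(verdict[field]) && verdict[field].length > 0;

  // At least one evidence field must be a non-empty array.
  if (!EVIDENCE_FIELDS.some(nonEmpty)) {
    problems.push("no backend evidence: " + EVIDENCE_FIELDS.join(", ") + " are all empty");
  }

  // Assumption: statuses are listed in the same order as endpoints,
  // so the two arrays should have the same length when both are present.
  const endpoints = verdict.backend_endpoints_called ?? [];
  const statuses = verdict.backend_response_statuses;
  if (Array.isArray(statuses) && statuses.length !== endpoints.length) {
    problems.push("backend_response_statuses does not line up with backend_endpoints_called");
  }
  return problems;
}

const logOnlyVerdict = {
  session_id: "<sid>",
  status: "pass",
  checks: ["no ERROR-level lines after the change"],
  backend_log_sources_read: ["api-server"],
};

const logOnlyProblems = checkBackendEvidence(logOnlyVerdict);
```

Running a check like this before `ironbee hook submit-verdict` surfaces a missing evidence field early; here `logOnlyProblems` is empty because the log-evidence field alone satisfies the rule.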
|
|
46
107
|
|
|
47
108
|
## Multi-cycle (browser + backend, or browser + node + backend)
|
|
48
109
|
|
|
@@ -28,7 +28,7 @@ If you see `pom.xml`, `build.gradle`, `requirements.txt`, `pyproject.toml`, `go.
|
|
|
28
28
|
- Read collected snapshots: `ndt_debug_get-probe-snapshots`. At least one probe must come back with `triggered: true`.
|
|
29
29
|
- **Log path** (proves no errors during execution):
|
|
30
30
|
- Exercise the path.
|
|
31
|
-
- Read errors: `ndt_debug_get-logs` with the error-level filter. `
|
|
31
|
+
- Read errors: `ndt_debug_get-logs` with the error-level filter. `node_log_errors` must be empty for `status: pass`.
|
|
32
32
|
4. **Disconnect** (optional): `ndt_debug_disconnect`.
|
|
33
33
|
|
|
34
34
|
### Node verdict fields
|
|
@@ -37,18 +37,18 @@ If you see `pom.xml`, `build.gradle`, `requirements.txt`, `pyproject.toml`, `go.
|
|
|
37
37
|
"session_id": "<sid>",
|
|
38
38
|
"status": "pass",
|
|
39
39
|
"checks": ["POST /api/orders returned 201", "tracepoint at handler.ts:42 fired once"],
|
|
40
|
-
"
|
|
41
|
-
"
|
|
40
|
+
"node_processes_connected": ["pid:12345 (next-server)"],
|
|
41
|
+
"node_probes_set": [
|
|
42
42
|
{ "type": "tracepoint", "location": "src/api/orders.ts:42", "triggered": true }
|
|
43
43
|
],
|
|
44
|
-
"
|
|
45
|
-
"
|
|
44
|
+
"node_probe_snapshots_collected": 1,
|
|
45
|
+
"node_log_errors": []
|
|
46
46
|
}
|
|
47
47
|
```
|
|
48
48
|
|
|
49
49
|
For `status: "pass"` (node cycle):
|
|
50
50
|
- If probes were set, at least one must have `triggered: true` (proves the code path executed).
|
|
51
|
-
- If only logs were used, `
|
|
51
|
+
- If only logs were used, `node_log_errors.length === 0` (no errors observed).
|
|
52
52
|
- If both forms were used, both conditions must hold.
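The three pass conditions above can be expressed as one predicate. The sketch below is an assumption-laden illustration: `nodePassConditionsHold` is a hypothetical name, a missing `node_log_errors` field is taken to mean "logs were not read", and a verdict with neither probes nor logs is treated as not passing.

```javascript
// Hypothetical predicate mirroring the listed conditions; not the gate itself.
function nodePassConditionsHold(verdict) {
  const probes = verdict.node_probes_set ?? [];
  const logErrors = verdict.node_log_errors;

  const usedProbes = probes.length > 0;
  // Assumption: the field is only present when the log path was exercised.
  const usedLogs = Array.isArray(logErrors);

  // Assumption: no node evidence at all cannot support status "pass".
  if (!usedProbes && !usedLogs) return false;

  const probesOk = !usedProbes || probes.some((p) => p.triggered === true);
  const logsOk = !usedLogs || logErrors.length === 0;
  return probesOk && logsOk;
}

const exampleVerdict = {
  session_id: "<sid>",
  status: "pass",
  checks: ["POST /api/orders returned 201", "tracepoint at handler.ts:42 fired once"],
  node_processes_connected: ["pid:12345 (next-server)"],
  node_probes_set: [
    { type: "tracepoint", location: "src/api/orders.ts:42", triggered: true },
  ],
  node_probe_snapshots_collected: 1,
  node_log_errors: [],
};

const exampleHolds = nodePassConditionsHold(exampleVerdict);
```

For the example verdict, both forms were used and both conditions hold, so `exampleHolds` is `true`.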
|
|
53
53
|
|
|
54
54
|
## Multi-cycle (browser + node simultaneously)
|
|
@@ -65,12 +65,12 @@ Submit ONE verdict carrying fields for every active cycle:
|
|
|
65
65
|
"checks": ["checkout submits", "POST /api/orders returned 201", "no console errors"],
|
|
66
66
|
"console_errors": 0,
|
|
67
67
|
"network_failures": 0,
|
|
68
|
-
"
|
|
69
|
-
"
|
|
68
|
+
"node_processes_connected": ["pid:12345 (next-server)"],
|
|
69
|
+
"node_probes_set": [
|
|
70
70
|
{ "type": "tracepoint", "location": "src/api/orders.ts:42", "triggered": true }
|
|
71
71
|
],
|
|
72
|
-
"
|
|
73
|
-
"
|
|
72
|
+
"node_probe_snapshots_collected": 1,
|
|
73
|
+
"node_log_errors": []
|
|
74
74
|
}
|
|
75
75
|
```
|
|
76
76
|
|
|
@@ -44,7 +44,7 @@ async function run(projectDir) {
|
|
|
44
44
|
permission: "deny",
|
|
45
45
|
agent_message: `BLOCKED: You used verification tools (browser-devtools / node-devtools / backend-devtools) but did not submit a verdict. You MUST submit a verdict (pass or fail) before editing code.
|
|
46
46
|
|
|
47
|
-
Submit your verdict first (include cycle-appropriate fields — browser fields for bdt_*,
|
|
47
|
+
Submit your verdict first (include cycle-appropriate fields — browser fields for bdt_*, node_* for ndt_*, backend_endpoints_called/backend_response_statuses for bedt_*):
|
|
48
48
|
echo '{"session_id":"${sessionId}","status":"fail","checks":[...],"issues":["describe what failed"], ...}' | ironbee hook submit-verdict
|
|
49
49
|
|
|
50
50
|
Then you can edit code to fix the issues.`,
|