@openwop/openwop-conformance 1.19.0 → 1.21.0

package/CHANGELOG.md CHANGED
@@ -1,8 +1,31 @@
  # `@openwop/openwop-conformance` Changelog
 
- ## [Unreleased]
+ ## [1.21.0] — 2026-06-07 — RFC 0090/0091/0092 conformance scenarios + vitest 4
 
- _No unreleased changes._
+ ### Security — test-tooling dependency hygiene (consumer-facing)
+ - **Bumped `vitest` 3 → 4 (kept in `dependencies`).** `vitest` was a hard `dependency` pinned `^3.0.0`, dragging the 2 critical `vitest`-3 advisories (which only trigger under `vitest --ui`, never `vitest run`, never production) into every consumer's tree. The fix is the **version bump alone**: `dependencies.vitest` is now `^4.0.0` — `vitest` 4 carries none of those advisories, so a consumer install now reports `0 vulnerabilities`. **`vitest` MUST remain a runtime `dependency`** (not a `devDependency` or optional peer): the suite's `bin` (`dist/cli.js`) runtime-invokes `npx vitest`, so a bare `npx -y @openwop/openwop-conformance` install path requires `vitest` resolvable from the installed package — enforced by the CI install-path guardrail (regression `F-2026-05-06-C`). The server-free gate passes under `vitest` 4; server-requiring scenarios are unaffected (they gate on `OPENWOP_BASE_URL`, orthogonal to the runner version). No scenario behavior or count change.
+
+ ### Added — RFCs 0090/0091/0092 conformance scenarios (count 324 → 330)
+ Always-on, server-free shape probes:
+ - **`agent-verifier-shape.test.ts`** (RFC 0090) — the content-free `agent.verified` payload $def (closed `verdict` enum; `additionalProperties:false` backing `verifier-no-content-leak`), `agent.verified` in the RunEventType enum, the additive `successCriteria` on the `terminate` decision, and `executionModel.version` 6 + the `verifier{supported,gating}` capability sub-block.
+ - **`aiproviders-input-shape.test.ts`** (RFC 0091) — the additive `capabilities.aiProviders.input.modalities[]` (closed enum text/image/audio/document) + positive-integer `maxBytesPerPart`.
+ - **`agent-requires-capabilities-shape.test.ts`** (RFC 0092) — the additive `AgentManifest.requiresCapabilities[]` (optional; non-string-array rejected).
+
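The RFC 0091 shape probe above describes a closed modality enum plus a positive-integer byte limit. A minimal hand-written sketch of that check follows. It is not the suite's schema validator: the function name `isValidInputCapability` is hypothetical, and treating both fields as optional is an assumption drawn from the word "additive".

```typescript
// Closed enum taken from the changelog entry: text / image / audio / document.
const MODALITIES: readonly string[] = ["text", "image", "audio", "document"];

// Hypothetical validator for the `capabilities.aiProviders.input` block.
// Assumption: both `modalities` and `maxBytesPerPart` are optional.
function isValidInputCapability(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const block = value as Record<string, unknown>;

  const modalities = block["modalities"];
  if (modalities !== undefined) {
    if (!Array.isArray(modalities)) return false;
    // Every entry must be a string drawn from the closed enum.
    const allKnown = modalities.every(
      (m) => typeof m === "string" && MODALITIES.indexOf(m) !== -1,
    );
    if (!allKnown) return false;
  }

  const maxBytes = block["maxBytesPerPart"];
  if (maxBytes !== undefined) {
    // Positive integer only: rejects 0, negatives, fractions, and non-numbers.
    if (typeof maxBytes !== "number" || !Number.isInteger(maxBytes) || maxBytes <= 0) {
      return false;
    }
  }
  return true;
}

const accepted = isValidInputCapability({ modalities: ["text", "image"], maxBytesPerPart: 1048576 });
const rejectedEnum = isValidInputCapability({ modalities: ["video"] });
const rejectedLimit = isValidInputCapability({ modalities: ["text"], maxBytesPerPart: 0 });
```

Here `accepted` is `true`, while `rejectedEnum` and `rejectedLimit` are both `false`, mirroring the out-of-enum and non-positive negatives a shape probe of this kind would exercise.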
+ Capability-gated **behavioral** legs (soft-skip by default; hard-fail under `OPENWOP_REQUIRE_BEHAVIOR=true`) — the Active→Accepted reference-host proof for each RFC:
+ - **`agent-capability-degraded-projection.test.ts`** (RFC 0092 §B) — gated on `agents.manifestRuntime`; the `degraded[]` projection iff-contract on the NORMATIVE `GET /v1/agents` (well-formed/unique members; a degraded key MUST be one the agent requires); non-vacuous via `OPENWOP_DEGRADED_CAPABILITY_AGENT_ID`. Black-box, no seam.
+ - **`callai-multimodal.test.ts`** (RFC 0091 §A/§B) — gated on `aiProviders.input.modalities` including a non-text modality; an advertised modality part is accepted, an unadvertised one is rejected with `unsupported_modality`; via the `POST /v1/host/sample/ai/call` seam (soft-skip on 404).
+ - **`verifier-gating.test.ts`** (RFC 0090 §B) — gated on `multiAgent.executionModel.verifier.gating`; a `fail` verdict blocks commit/terminate-as-success + emits `agent.verified{verdict:"fail"}`, a `pass` completes; via the `POST /v1/host/sample/agents/verify-run` seam (soft-skip on 404).
+
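The soft-skip / hard-fail rule for the behavioral legs above can be sketched as a small decision function. The name `decideGate` is hypothetical and this is not the suite's `behaviorGate`. It assumes that strict mode (`OPENWOP_REQUIRE_BEHAVIOR=true`) turns every would-be skip, such as an unadvertised capability or a 404 from a seam, into a failure, which the changelog implies but does not spell out.

```typescript
type GateOutcome = "run" | "soft-skip" | "hard-fail";

// Hypothetical helper. `preconditionMet` stands for "the capability is
// advertised and the seam answered"; `requireBehavior` stands for the
// OPENWOP_REQUIRE_BEHAVIOR=true strict mode.
function decideGate(preconditionMet: boolean, requireBehavior: boolean): GateOutcome {
  if (preconditionMet) return "run";
  // Assumption: strict mode escalates a skip to a failure.
  return requireBehavior ? "hard-fail" : "soft-skip";
}

// A host whose seam returns 404, evaluated in default and in strict mode.
const defaultModeOutcome = decideGate(false, false); // "soft-skip"
const strictModeOutcome = decideGate(false, true); // "hard-fail"
const wiredHostOutcome = decideGate(true, true); // "run"
```

Under this reading, a host that has not wired a seam still passes a default run but cannot be certified under strict mode until the leg actually executes.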
+ ### Changed — test diagnostics (no behavior / count change)
+ - **`agent-channel-dispatch.test.ts` (RFC 0082 §B) — value assertions over boolean assertions.** The four pin-comparison checks (Leg 1 `resolvedChannel`, Leg 2 + Leg 3 `resolvedAgentVersion`) now assert the observed value directly (`expect(observed, …).toBe(expected)` / `.not.toBe(movedVersion)`) instead of pre-computing a boolean and asserting `.toBe(true)`. A failing run on a third-party host now surfaces the *actual* resolved channel/version in the diff, not just "expected true, got false" + the requirement text. Pass/fail semantics, scenario count (324), and the published suite behavior are unchanged. Also documented Leg 3's test-isolation caveat: it promotes the bound agent's `stable` head via the conformance-only seam without rollback (the seam exposes no rollback primitive), so hosts SHOULD run the suite against an isolated/ephemeral deployment store.
+ - **`spec-corpus-validity.test.ts` — markdown link-walker prunes CI sibling-repo checkouts.** Added `examples-ext` + `registry-ext` to the walker's `ignoredDirs`. After the 2026-06 repo split, the host-conformance gates check `openwop-examples` / `openwop-registry` out into the workspace; the server-free corpus link-scan would otherwise recurse into them and flag those repos' own cross-repo links as broken. Consumer runs (`--base-url`) don't have those dirs, so this is a no-op for them. No assertion / count change.
+ - **`experimental-tier-shape.test.ts` — self-test busts the memoized env cache.** The `experimentalGate routes through behaviorGate` unit test deletes `OPENWOP_REQUIRE_BEHAVIOR` to assert default-mode soft-skip, but `behaviorGate` reads a memoized `loadEnv()` snapshot; under a strict suite run the delete was a no-op (order-dependent: passed in isolation, threw in the full run). Now calls the existing `__resetEnvCacheForTests()` after the toggle + in `finally`. No assertion / count change.
+
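The order-dependent failure described for `experimental-tier-shape.test.ts` is a general hazard of memoized environment snapshots. The sketch below reproduces it with stand-ins: `fakeEnv`, `loadEnvSnapshot`, `resetEnvSnapshot`, and `gateMode` are hypothetical names, not the suite's `loadEnv()`, `__resetEnvCacheForTests()`, or `behaviorGate`, and a plain object replaces `process.env` so the example stays self-contained.

```typescript
// Stand-in for process.env (assumption: the real code reads process.env).
const fakeEnv: Record<string, string | undefined> = {
  OPENWOP_REQUIRE_BEHAVIOR: "true",
};

let snapshot: { requireBehavior: boolean } | undefined;

// Memoized read: the environment is inspected only on the first call.
function loadEnvSnapshot(): { requireBehavior: boolean } {
  if (snapshot === undefined) {
    snapshot = { requireBehavior: fakeEnv["OPENWOP_REQUIRE_BEHAVIOR"] === "true" };
  }
  return snapshot;
}

// Test-only escape hatch that discards the memoized snapshot.
function resetEnvSnapshot(): void {
  snapshot = undefined;
}

function gateMode(): "soft-skip" | "hard-fail" {
  return loadEnvSnapshot().requireBehavior ? "hard-fail" : "soft-skip";
}

// An earlier test in a strict run primes the cache.
const primedMode = gateMode();

// Deleting the variable alone does nothing: the snapshot is already taken.
delete fakeEnv["OPENWOP_REQUIRE_BEHAVIOR"];
const staleMode = gateMode();

// Resetting the cache after the toggle makes the change visible.
resetEnvSnapshot();
const freshMode = gateMode();
```

`primedMode` and `staleMode` are both `"hard-fail"`, and only `freshMode` is `"soft-skip"`. In isolation nothing primes the cache first, which is why a test of this shape can pass alone and fail in a full run.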
+ ## [1.20.0] — 2026-06-02 — RFC 0082 §B channel-dispatch: Leg 3 targets the fixture's agent
+
+ A one-line correctness fix to `agent-channel-dispatch.test.ts` Leg 3 (the seam-guarded cross-move non-re-resolution proof). The promote that moves the channel between the original run and the replay fork now passes `agentId: "core.conformance.channel-agent"` (the fixture's bound agent) to `driveDeploymentTransition`, instead of letting the host-sample `deployment-transition` seam default to its sample agent (`core.openwop.agents.sample`). Without this, Leg 3 could never observe a move — the seam promoted a different agent than the fixture binds, so a fresh fixture run's `stable` head was unchanged and Leg 3's `movedVersion === pinnedVersion` guard always self-skipped. Now a host whose promote actually mutates the deployment store for the bound agent can exercise Leg 3's strict assertion (a replay still carries the ORIGINAL pin after the channel moves). Hosts whose promote only emits an event (e.g. the reference sample seam) still self-skip honestly — no behavior change for them. Legs 1+2 (production-path resolve-and-pin + replay re-read) are unchanged, and the 1.19.0 second-host witness (MyndHyve) stands; this only widens what a future host can prove. No new scenario file (count unchanged at 324).
+
+ **Version note:** this is a patch-scope change but is released as `1.20.0` (skipping `1.19.1`) because `1.19.1` is a reserved string in the `openwop-check-publish-metadata.sh` posture guard (a deliberate catch for stale pre-OpenWOP "WOP-era" version artifacts: `1.18.6` / `1.19.1` / `v19`). `EXPECTED_CONFORMANCE_VERSION` + the `check-npm-pack-contents.sh` assert advance to `1.20.0` in lockstep.
 
  ## [1.19.0] — 2026-06-02 — RFC 0082 §B production-path channel-dispatch scenario
 
package/README.md CHANGED
@@ -93,7 +93,7 @@ Exit code is non-zero on any failed assertion.
 
  ## What's Covered
 
- The current suite has 324 scenario files under `src/scenarios/`. 2026-06-02 (RFC 0082 §B — deployment channel resolve-and-pin, production-path coverage) added `agent-channel-dispatch.test.ts` (capability-gated on `agents.deployment.supported` + the seeded `conformance-agent-channel-dispatch` fixture + advertised `replay` mode via `behaviorGate('openwop-deployment-channel-dispatch', …)` — proves the §B pin from a REAL run graph, complementing `agent-deployment-lifecycle.test.ts` Leg 4's host-sample seam: a canonical `POST /v1/runs` of a node binding `agent.channel:"stable"` MUST record `resolvedChannel` + `resolvedAgentVersion` on `agent.invocation.started` (RFC 0077), a `:fork{mode:"replay"}` MUST re-read that recorded version, and the seam-guarded Leg 3 MOVES the channel then asserts a replay STILL carries the original pin — never re-resolving a moved channel; soft-skips by default, hard-fails under `OPENWOP_REQUIRE_BEHAVIOR=true` — the production-path proof of the §B contract). 2026-06-01 (RFC 0085 — `openwop-agent-platform` meta-profile, the Active→Accepted behavioral gate) added `agent-platform-aggregate-evidence.test.ts` (capability-gated on a host CLAIMING `openwop-agent-platform` in its live discovery `profiles[]` via `behaviorGate('openwop-agent-platform', …)` — the §C/§D honest-advertisement rule on the live `/.well-known/openwop`: the claim MUST satisfy the §B floor predicate (`isAgentPlatformPartial` → `partial`/`full`, never `none`), backed by the per-capability evidence not the profile string; `OPENWOP_AGENT_PLATFORM_TIER=full` forces the non-vacuous full bar — all governance terms + tenant installScope + all 16 §D terms; server-requiring, the always-on §B/§D derivation legs stay in `agent-platform-profile.test.ts` — the RFC 0085 → Accepted bar). 
2026-06-01 (RFC 0084 — budget, quota + cost policy, the Active→Accepted behavioral gate) added `budget-enforcement.test.ts` (capability-gated on `budget.supported` via `behaviorGate('openwop-budget-enforcement', …)` — the §C/§D enforcement via the new `POST /v1/host/sample/budget/run` seam + the test event-log seam: a `hard-cost-exhaust` run emits the strict-ordered `budget.reserved → budget.consumed → budget.threshold.crossed{percent} → budget.exhausted → cap.breached{kind:"budget-cost"} → run.failed{error:"budget_exhausted"}` chain; a `model-denied` run is refused `budget_model_denied` BEFORE the provider call (fail-closed); an `advisory` host emits the `budget.*` events without stopping; every `budget.*` payload content-free backing `budget-no-pricing-leak`; new lib helper `src/lib/budgetPolicy.ts`; soft-skips on 404 — the RFC 0084 → Accepted bar). 2026-06-01 (RFC 0080 — agent memory capability reconciliation, the Active→Accepted behavioral gate) added `memory-degraded-projection.test.ts` (capability-gated on `agents.manifestRuntime.supported` + `memory.supported` via `behaviorGate('openwop-memory-degraded', …)` — the §C degraded-projection iff-contract on the NORMATIVE `GET /v1/agents`: a degraded inventory entry MUST carry `memoryDegraded:true` + a non-empty, unique `degradedMemoryDimensions[]` from the closed §A-name enum, a non-degraded entry MUST NOT, the inventory is non-empty, and the degraded branch runs non-vacuously when `OPENWOP_DEGRADED_AGENT_ID` names a known-degraded agent; black-box, no POST seam — the RFC 0080 → Accepted bar). This batch also documents the two RFC 0068 conformance seams (`POST /v1/host/sample/memory/consolidate` + `.../commitment/fire`) in `host-sample-test-seams.md` (the 0068 gated scenarios shipped in 1.14.0). 
2026-06-01 (RFC 0034 — collector-side BYOK-canary inspection) added `otel-collector-canary-inspection.test.ts` (always-on server-free: stands up a real `OtelCollector`, POSTs synthetic OTLP/HTTP-JSON traces + metrics through its actual ingest path, and proves the new `findCanaryLeakage()` inspector catches a canary embedded in a span attribute / resource attribute / span name / metric data-point attribute while reporting ZERO hits on a redacted payload and never matching an empty canary — the non-vacuous proof that the conformance collector now inspects what the host's OTLP exporter ACTUALLY shipped over the wire, closing the `secret-leakage-otel-attribute` / `-debug-bundle-otel` collector-seam gap; the live capability-gated complement is the new collector-export describe block in `secret-leakage-otel-attribute.test.ts`). 2026-06-01 (RFC 0035 — sandbox wall-clock timeout, the 7th-of-8 graduation) added `sandbox-wasm-timeout.test.ts` (worker-driven server-free: `probeTimeout` in `wasm-sandbox-probe.ts` spawns a worker thread running the committed `misbehaving-timeout.wasm` + a main-thread kill-timer — the thread preemption a same-thread probe can't do — asserting `sandbox_timeout` with a well-behaved positive control; graduates `node-pack-sandbox-timeout` reference-impl→protocol, so 7 of 8 `node-pack-sandbox-*` invariants are now protocol-tier, only the JS-specific `no-eval` permanently exempt). 
2026-05-31 (audit-response black-box / graduation batch) added three more: `sandbox-wasm-isolation.test.ts` (RFC 0035 — drives the committed `fixtures/wasm-sandbox/*.wasm` through `wasm-sandbox-probe.ts`: escape/capability-gate via static `WebAssembly.Module.imports()`, an OOB-store memory trap, double-instantiate isolation; 10/10; graduates 6 `node-pack-sandbox-*` invariants reference-impl→protocol), `workspace-cross-tenant-isolation-blackbox.test.ts` (RFC 0059 — two-credential black-box on the normative §C `/v1/host/workspace/files` endpoints: owner A writes, a second-tenant credential fails closed; no seam), and `prompt-resolution-chain-event.test.ts` (RFC 0029 — reads the durable `agent.promptResolved.chain[]` precedence record via the normative `GET /v1/runs/{runId}/events/poll`; no seam) — each the production-path proof that graduates its surface into the `openwop-core-standard` floor. 2026-05-31 (RFC 0088 — the `openwop-core-standard` Core Standard Profile, the audit-response Core Candidate target) added `core-standard-profile.test.ts` (always-on server-free derivation probe: `isCoreStandard` derives the §B floor — `openwop-core` ∧ `openwop-interrupts` ∧ (`openwop-stream-sse` ∨ `openwop-stream-poll`) — a bare `openwop-core` host without interrupts is excluded, a host with no event transport fails, and the annex is absent from `deriveProfiles` because it composes rather than redefines). 
2026-05-31 (RFC 0082 — agent deployment lifecycle, the Active→Accepted behavioral gate) added `agent-deployment-lifecycle.test.ts` (capability-gated on `agents.deployment.supported` via `behaviorGate('openwop-deployment-lifecycle', …)` — the §E promotion contract via the new `POST /v1/host/sample/agents/deployment-transition` seam + the test event-log seam across four legs: `promote` (authorize RFC 0049 → approvalGate RFC 0051 → eval-verify RFC 0081 → content-free `deployment.promoted` with a seven-state `toState` + `toVersion`, the record validating `agent-deployment.schema.json`), `unauthorized` (fail-closed — `allowed:false`, no `deployment.promoted`, the behavioral leg of `deployment-promotion-fail-closed`), `eval-gate-unmet` (`eval_gate_unmet` denial, §E-3), and `channel-pin` (the §B `resolvedAgentVersion` recorded-fact on `agent.invocation.started`); new lib helper `src/lib/agentDeployment.ts`; soft-skips on 404 — the RFC 0082 → Accepted bar). 2026-05-31 (RFC 0081 — agent evaluation, the Active→Accepted behavioral gate) added `agent-eval-run.test.ts` (capability-gated on `agents.evalSuite.supported` via `behaviorGate('openwop-eval-run', …)` — the §B `mode:"eval"` projection via the new `POST /v1/host/sample/agents/eval-run` seam + the test event-log seam: `eval.started`-first → one `eval.scored` per task → `eval.completed`-once ordering (count == `eval.completed.taskCount`), the content-free `eval.scored` legs (`score` ∈ 0..1) backing `eval-summary-no-content-leak`, and the NORMATIVE `GET /v1/runs/{runId}/eval-summary` schema-valid `EvalSummary` round-trip with `passedCount <= taskCount`; new lib helper `src/lib/agentEval.ts`; soft-skips on 404 — the RFC 0081 → Accepted bar). 
2026-05-31 (RFC 0083 — durable trigger bridge, the Active→Accepted behavioral gate) added `trigger-bridge-delivery.test.ts` (profile-gated on `openwop-trigger-bridge` derived from the live discovery doc — the §C delivery model via the `POST /v1/host/sample/trigger-bridge/deliver` seam + the test event-log seam: dedup→effectively-once `trigger.delivery.attempted{delivered}` (§C-1), retry-exhaustion→`{dead-lettered}` + `trigger.subscription.state.changed{toState:dead-lettered}` (§C-2 + RFC 0053), and the delivered run's `run.started.causationId` == the delivery id (§C / RFC 0040); both `trigger.*` events content-free; the always-on shape stays in `trigger-bridge-shape.test.ts`; new lib helper `src/lib/triggerBridge.ts`). 2026-05-31 (RFC 0087 — agent org-chart, the Active→Accepted behavioral gate) added two capability-gated behavioral scenarios (both gated on `agents.orgChart.supported`, black-box on the normative `/v1/agents/org-chart` surface — no new POST seam): `agent-org-chart-scoping.test.ts` (the `GET /v1/agents/org-chart` tree-shape — departments form an acyclic `parentDepartmentId` tree, members reference `host:<id>` roster entries — + the §D responsibility roll-up via `GET /v1/agents/org-chart/{departmentId}` with a deduped `responsibilities[]` union + the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ORG_CHART_DEPARTMENT_ID`) and `org-position-no-authority-escalation.test.ts` (the behavioral leg of the protocol-tier invariant — the live org-chart wire carries NO authority-bearing field on any member/department/responsibility-view object; the structural leg stays always-on in `agent-org-chart-shape.test.ts`, and the deeper RFC 0049/0051 authority-invariance legs stay reference-impl tier per the `agent-manifest-runtime` no-host-hook precedent). 
2026-05-31 (RFCs 0086 + 0077 — the Active→Accepted behavioral gate) added four capability-gated behavioral scenarios so a non-steward host can be mechanically certified non-vacuously under `OPENWOP_REQUIRE_BEHAVIOR=true`: `agent-roster-attribution.test.ts` (RFC 0086 §B/§C; gated on `agents.roster.supported` — the normative `GET /v1/agents/roster` read shape + `total==roster.length`, the §C `roster.run.initiated`-before-`agent.invocation.started` ordering, the content-free payload backing `roster-attribution-no-content`, the durable work-item `triggerSubscriptionId`, and the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ROSTER_ID`), `agent-live-invocation-bracket.test.ts` (RFC 0077 §E; gated on `agents.liveRuntime.supported` — `agent.invocation.started`-first / `agent.invocation.completed`-last bracket, matching `invocationId`, `source`/`outcome` closed enums, content-free), `agent-live-structured-output.test.ts` (RFC 0077 §B step 6; gated on `agents.liveRuntime.structuredOutput` — a result violating `handoff.returnSchemaRef` fails the invocation `outcome:"failed"` rather than shipping as completed), and `agent-live-allowlist-enforced.test.ts` (RFC 0077 §F-1 / RFC 0002 §A14; gated on `agents.liveRuntime.supported` — a tool outside `toolAllowlist` is not callable); all four drive the documented `POST /v1/host/sample/roster/fire` + `POST /v1/host/sample/agents/live-invoke` seams plus the test event-log seam and soft-skip on 404 (these are the RFC 0086 / 0077 Active→Accepted bars). 
2026-05-30 (RFC 0087 — agent org-chart, Draft -> Active) added `agent-org-chart-shape.test.ts` (always-on server-free: the `capabilities.agents.orgChart` shape + the `AgentOrgChart` round-trip + the non-`host:` member negative + the **§B structural non-authority guarantee** — the schema rejects a `scopes`/`canDispatch`/`permissions`/`authority` field on a member (`additionalProperties:false`), and a member's key set is exactly `{rosterId, departmentId, roleId, reportsTo}` — backing the protocol-tier `org-position-no-authority-escalation` invariant; no new RunEventType). 2026-05-30 (RFC 0086 — standing agent roster, Draft -> Active) added `agent-roster-shape.test.ts` (always-on server-free: the `capabilities.agents.roster` shape + the `AgentRosterEntry` round-trip + the `host:` `rosterId` + `agentRef` version-XOR-channel negatives + the content-free `roster.run.initiated` negatives backing the protocol-tier `roster-attribution-no-content` invariant + the additive `roster` inventory projection + RunEventType-enum membership). 2026-05-30 (RFC 0082 — agent deployment lifecycle, Draft -> Active) added `agent-deployment-shape.test.ts` (always-on server-free: the `capabilities.agents.deployment` shape + the `AgentDeployment` record round-trip + the `AgentRef` `channel` XOR `version` `not`-clause + the four `deployment.*` payloads + the content-free negatives backing the protocol-tier `deployment-event-no-content-leak` invariant). 2026-05-30 (RFC 0085 — `openwop-agent-platform` meta-profile, Draft -> Active) added `agent-platform-profile.test.ts` (always-on server-free derivation of the operational-annex `none`/`partial`/`full` status: all-floor ⇒ partial, missing-flag ⇒ none, the replay-OR-`nondeterminismPolicy.declared` term, floor+governance ⇒ full, missing-tenant-scope ⇒ partial-not-full per the honest-advertisement rule, eval/deploy/budget-are-advisory-not-hard-terms, + the `capabilities.nondeterminismPolicy.declared` shape). 
2026-05-30 (RFC 0084 — budget, quota + cost policy, Draft -> Active) added `budget-policy-shape.test.ts` (always-on server-free: `budget-policy.schema.json` round-trip + the §A orthogonality guard — a wall-time field is rejected (it's RFC 0058's `runTimeoutMs`) — + threshold/onExhaustion negatives + the four content-free `budget.{reserved,consumed,threshold.crossed,exhausted}` payloads + the four `cap.breached{budget-*}` kinds + RunEventType-enum membership + the no-pricing-property structural check backing the protocol-tier `budget-no-pricing-leak` invariant + the `capabilities.budget`/`limits.maxBudget*` shape). 2026-05-30 (RFC 0083 — durable trigger + channel bridge, Draft -> Active) added `trigger-bridge-shape.test.ts` (always-on server-free: `trigger-subscription.schema.json` round-trip + missing-`state`/out-of-enum-`source`/unknown-property negatives + the four-state vocab + the two content-free `trigger.{subscription.state.changed,delivery.attempted}` payloads incl. closed `state`/`outcome` enums + RunEventType-enum membership + the `triggerBridge`/`webhooks.durable` capability shape + the `openwop-trigger-bridge` profile derivation incl. the no-dead-letter-sink negative). 2026-05-30 (RFC 0079 — credential provenance + egress policy, Draft -> Active) added `egress-provenance-shape.test.ts` (always-on server-free: `credential-provenance.schema.json` round-trip + `audiences:[]`/missing-`credentialId`/unknown-property negatives + the no-secret-property structural check backing the protocol-tier `egress-decision-no-secret-leak` invariant + the content-free `egress.decided` record incl. the `decision` enum + RunEventType-enum membership + the `httpClient.egressPolicy` shape; the behavioral `egress-credential-audience-bound` confused-deputy MUST is reference-impl tier, deferred to a host). 
2026-05-30 (RFC 0078 — portable tool catalog, Draft -> Active) added `tool-descriptor-shape.test.ts` (always-on server-free: `tool-descriptor.schema.json` round-trip + the §C-1 `exec` ⇒ `host-extension` cross-field MUST (RFC 0069) + the `safetyTier`-required negative + `additionalProperties:false`, the `capabilities.toolCatalog` `supported`/`sources`/`sessionLifecycle` shape, and the two content-free `tool.session.{opened,closed}` payload $defs incl. the closed `outcome` enum + RunEventType-enum membership). 2026-05-30 (RFC 0080 — agent memory capability reconciliation, Draft -> Active) added `memory-capability-model-shape.test.ts` (always-on server-free: the additive `capabilities.memory.{writable,search,retention}` dimension shapes + malformed-instance negatives — `retention.ttl` non-boolean, out-of-enum `search.modes`, unknown property under `additionalProperties:false` — the `agent-inventory-response` `memoryDegraded`/`degradedMemoryDimensions` closed-enum fields, and the `openwop-memory` derivation surfacing for read/write + long-term hosts while withholding from `writable:false`). 2026-05-30 (RFC 0081 — agent evaluation, Draft -> Active) added `agent-eval-suite-shape.test.ts` (always-on server-free: the `capabilities.agents.evalSuite` shape + the `AgentEvalSuite`/`EvalSummary` schema round-trips + the three `eval.{started,scored,completed}` payloads + the content-free negatives — a task entry with a `taskOutput` body, a `safetyFinding` with an `excerpt` — backing the new `eval-summary-no-content-leak` SECURITY invariant). 
2026-05-29 (RFC 0076 §B — `ctx.http.safeFetch` live-run audit) added `safefetch-live-audit.test.ts` (`behaviorGate('openwop-safefetch-live-audit', …)`, gated on `httpClient.safeFetch` + `toolHooks.prePostEvents`) — asserts the audit-when-both MUST against the **durable run event log** via the new `POST /v1/host/sample/http/safe-fetch-run` open seam + the test event-log seam, closing the seam-vs-production gap (a production `createSafeFetch()` with no audit hooks passes the inline `safefetch-behavior.test.ts` but FAILS this under `OPENWOP_REQUIRE_BEHAVIOR=true`); this is the RFC 0076 §B → Accepted bar; run seam soft-skips on 404 (host-pending). 2026-05-29 (RFC 0066 — `x-openwop-form` picker UX hints, Draft → Active) added `x-openwop-form-pack-manifest.test.ts` (always-on server-free: an annotated `configSchema` stays a valid 2020-12 schema + the advisory hints don't change what it accepts, each §A annotation matches the shape, an unknown `kind` validates for forward-compat, 3 negatives — missing/non-string `kind`, non-string `dependsOn`). 2026-05-29 (RFC 0076 §B — `ctx.http.safeFetch`) added `safefetch-behavior.test.ts` (seam-gated: SSRF block / DNS-rebinding / `Connection: upgrade` refusal / tool-hooks audit-when-both, via `POST /v1/host/sample/http/safe-fetch`; advertisement contract stays in `http-client-ssrf.test.ts`). 2026-05-29 (RFC 0076 §A — pack `runtime.requires[]` install gate) added two: `runtime-requires-shape.test.ts` (server-free closed-vocabulary validation — the 8 tokens validate, a raw builtin name is rejected, empty-array≡omission, `uniqueItems`) + `runtime-requires-install-gate.test.ts` (seam-gated install-grant / install-refuse → `pack_runtime_requirement_unmet` / non-sandbox SHOULD-projection, soft-skip on 404 via `POST /v1/host/sample/packs/install-gate`). 
2026-05-29 (RFC 0047 — `host.oauth` authorization-code roundtrip) added `oauth-authorization-code-roundtrip.test.ts` — capability-gated on `capabilities.oauth.supported` + `grants` including `authorization_code`; drives the `POST /v1/host/sample/oauth/authorize-code-roundtrip` seam against the one canonical synthetic provider in `fixtures/oauth-providers/synthetic.json` (soft-skip on 404, Tier-2 host-pending), asserting a successful grant returns a credential REFERENCE (token persisted as a `host.credentials` entry) and that the authorization code / state / PKCE verifier / acquired access+refresh tokens never appear on any run-visible surface (RFC 0047 §C + §C.2 / `credential-payload-redaction`). Closes the RFC 0047 Tier-2 gap (capability-shape + redaction scenarios existed; the actual authorization-code dance was unexercised). 2026-05-26 (RFC 0070 — agent-manifest runtime) added `agent-manifest-runtime.test.ts`; 2026-05-26 (RFC 0071 — artifact-type + chat card packs) added six: `artifact-type-pack-manifest-validation.test.ts` + `artifact-schema-compile-bounded.test.ts` (server-free) + `artifact-type-pack-install.test.ts` + `artifact-type-store-without-render.test.ts` + `chat-card-pack-manifest-validation.test.ts` (server-free) + `chat-card-pack-execution.test.ts` (capability-gated, host-pending). 
2026-05-26 (RFCs 0067 / 0068 / 0069 — spec-gap Draft cohort) added five scenarios: `byok-auth-modes.test.ts` (RFC 0067; always-on schema-shape of `aiProviders.authModes` + a discovery-gated §B auth-mode-contract cross-field check), `memory-consolidation-shape.test.ts` (RFC 0068; always-on shape of `agents.memoryConsolidation`/`agents.commitments` + the `agent.memory.consolidated`/`commitment.fired` payload $defs), `memory-consolidation-idempotent.test.ts` + `commitment-fired.test.ts` (RFC 0068; capability-gated behavioral, soft-skip on the documented `/v1/host/sample/memory/consolidate` + `/commitment/fire` seams), and `exec-not-protocol-tier.test.ts` (RFC 0069; always-on server-free structural assertion that the protocol corpus defines no `core.*`/`openwop.*` exec-class primitive — backs the `exec-must-not-be-protocol-tier` SECURITY invariant). 2026-05-25 (RFC 0061 — stateful agent-loop lifecycle, executionModel.version 5) added four `agent-loop-*.test.ts` scenarios: `-version5-shape` (always-on; validates `executionModel.statefulResume`/`transcriptWindow` + the 1–5 version ceiling) plus `-iteration-monotonic` (gated on `version >= 5`; `runOrchestrator.decided.iteration` increments 1,2,3… exactly once per turn), `-workspace-snapshot` (gated additionally on `host.workspace.supported`; a turn-i workspace write is invisible to turn i, visible to turn i+1), and `-stateful-resume` (gated on `statefulResume`; a mid-loop suspend resumes at the same iteration without resetting the counter) — the three behavioral scenarios drive the documented agent-loop seam (`POST /v1/host/sample/agentloop/run`) and soft-skip until a host wires it. 
2026-05-25 (RFC 0059 — host.workspace M2, reference-host enforcement) added two `workspace-*.test.ts` scenarios: `-behavior` (capability-gated CRUD round-trip / `If-Match` 409 `workspace_conflict` / `workspace_too_large` / §D run-start snapshot, all via the real `/v1/host/workspace/files` §C endpoints) and `-cross-tenant-isolation` (WCT-1 — drives the documented `POST /v1/host/sample/workspace/op` seam to assert a file owned by one `{tenant, workspace}` is unreadable, on both `get` and `list`, under a different owner; backs the new `workspace-cross-tenant-isolation` SECURITY invariant). The in-memory reference host now advertises `capabilities.workspace.supported` and honors §C/§D/§E end-to-end. 2026-05-25 (RFC 0062 — memory.distillation "dreams") added five `distillation-*.test.ts` scenarios: `-shape` (always-on; validates the `capabilities.memory.distillation` block + the additive `distillation` sub-object on `memory.compacted`) plus `-token-budget` (within budget `tokensUsed ≤ tokenBudget`; an un-meetable budget → `token_budget_exceeded` with no partial archive), `-stable-archive` (same sources + budget ⇒ byte-stable archive checksum), `-index-roundtrip` (gated additionally on `indexEmitted`; the `MEMORY-INDEX.json` workspace file is retrievable + `workspace.updated` fired), and `-secret-carryforward` (SR-1: a redacted source secret never appears in the archive) — the four behavioral scenarios drive the documented memory-distillation seam (`POST /v1/host/sample/memory/distill`) and soft-skip until a host wires it. 
2026-05-25 (RFC 0063 — core.subWorkflow.outputAttestation) added four `subrun-*.test.ts` scenarios: `-attestation-shape` (always-on; validates the `capabilities.agents.subRunAttestation` flag) plus `-checksum-stable` (the child output checksum is the byte-stable, key-order-invariant RFC 8785 JCS + SHA-256 digest), `-approval-gate` (`requireApproval` → `accept` merges, `reject` does not), and `-approval-fail-closed` (no `accept`/`edit-accept` → no merge; backs the deferred `subrun-merge-approval-fail-closed` invariant) — the three behavioral scenarios drive the documented sub-run attestation seam (`POST /v1/host/sample/subrun/attest`) and soft-skip until a host wires it. 2026-05-25 (RFC 0064 — host.toolHooks) added five `tool-hooks-*.test.ts` scenarios: `-shape` (always-on; validates the `capabilities.toolHooks` block + the optional content-free fields on `agentToolCalled` / `agentToolReturned`) plus `-content-free` (gated on `prePostEvents`), `-authorization-fail-closed` (gated on `perToolAuthorization`), `-rate-limit` (gated on `perToolRateLimit`), and `-secret-redaction` (gated on `prePostEvents` + the SR-1 `argsHash` redaction rule) — the four behavioral scenarios drive the documented tool-hooks invoke seam (`POST /v1/host/sample/toolhooks/invoke`) and soft-skip until a host wires it. 2026-05-25 (RFC 0060 — host.heartbeat) added four `heartbeat-*.test.ts` scenarios: `-capability-shape` (always-on; validates the `capabilities.heartbeat` block) plus `-fires-once-per-tick`, `-idempotent-no-spam`, and `-runtime-bound` (gated on `capabilities.heartbeat.supported` + the host heartbeat tick seam; soft-skip until a host wires it). 
2026-05-25 (RFC 0057 — memory write-attribution) added five `memory-attribution-*.test.ts` scenarios: `-shape` (always-on advertisement check on `capabilities.memory.attribution`), plus `-no-content`, `-tenant-scoped`, `-emits-on-write`, and `-replay-stable` (gated on `capabilities.memory.attribution.emitsWriteEvents`) verifying the content-free `memory.written` RunEvent, its two SECURITY invariants (`memory-attribution-no-content` + `memory-attribution-tenant-scoped`), and the §D replay rule that a `replay`-mode fork MUST NOT regenerate `memoryId`. 2026-05-25 (RFC 0025 §C point 1 — test-catalog isolation invariant; pairs with the 25 publish-error scenarios in `pack-registry-publish.test.ts`) added `pack-registry-isolation.test.ts` — capability-gated on `capabilities.packs.testMode.{supported, isolated}: true`; PUTs a disposable pack into `/v1/packs-test/{name}` and asserts the same `(name, version)` does NOT appear via `GET /v1/packs/{name}` — anchors the test-catalog isolation MUST in RFC 0025 §C. 2026-05-25 (RFC 0028 Tier-2 post-promotion T2 — read-side sister scenario for workspace-membership enforcement) added `prompt-read-workspace-membership-enforced.test.ts` — gates on `capabilities.prompts.supported: true` (broader than `mutableLibrary` so read-only hosts that expose `?workspaceId=` are also probed); drives `GET /v1/prompts?workspaceId=<random-non-member>` and interprets the response: 4xx PASS (canonical envelope check on 403); 200 with empty `templates[]` PASS (correct null result for a nonexistent workspace); 200 with non-empty `templates[]` FAIL (cross-tenant leak); 200 without `templates[]` field SKIP (host doesn't expose workspace-scoped reads). Verifies SECURITY invariant `prompt-read-workspace-membership-enforced`. Same-day T1 strengthened `prompt-mutation-workspace-membership-enforced.test.ts` to pin `error === "workspace_membership_required"` when the host's refusal status is 403 (other refusal codes unconstrained). 
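
The response interpretation that `prompt-read-workspace-membership-enforced.test.ts` applies (4xx PASS, 200 with empty `templates[]` PASS, 200 with non-empty `templates[]` FAIL, 200 without `templates[]` SKIP) can be sketched as a small pure function. This is an illustration only: `classifyWorkspaceRead`, `ProbeResponse`, and `Verdict` are hypothetical names, the canonical-envelope check on 403 is omitted, and mapping any other status to `skip` is an assumption the text above does not cover.

```typescript
type Verdict = "pass" | "fail" | "skip";

interface ProbeResponse {
  status: number;
  body: unknown;
}

// Classify a GET /v1/prompts?workspaceId=<random-non-member> response.
function classifyWorkspaceRead(res: ProbeResponse): Verdict {
  // Any 4xx is a refusal, which is the desired outcome.
  if (res.status >= 400 && res.status < 500) return "pass";

  if (res.status === 200) {
    const body = res.body;
    if (typeof body !== "object" || body === null || !("templates" in body)) {
      // Host does not expose workspace-scoped reads.
      return "skip";
    }
    const templates = (body as { templates: unknown }).templates;
    if (!Array.isArray(templates)) return "skip";
    // Empty list is a correct null result; anything else is a cross-tenant leak.
    return templates.length === 0 ? "pass" : "fail";
  }

  // Assumption: statuses not described in the text are treated as inconclusive.
  return "skip";
}

console.log(classifyWorkspaceRead({ status: 200, body: { templates: [{ id: "t1" }] } }));
```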
2026-05-25 (RFC 0028 Tier-2 follow-up — workspace-membership enforcement on mutating prompt endpoints, filed in response to a self-disclosed adopter vulnerability) added `prompt-mutation-workspace-membership-enforced.test.ts` — capability-gated on `capabilities.prompts.mutableLibrary: true`; drives `POST /v1/prompts` with a cryptographically-random non-member `workspaceId` and asserts the host refuses (NOT a 2xx; any 4xx/5xx is acceptable — silent success is the failure mode). Verifies SECURITY invariant `prompt-mutation-workspace-membership-enforced`. 2026-05-22 (RFC 0034 §B follow-up — secret-leakage harness against the OTel + debug-bundle seams) added `secret-leakage-otel-attribute.test.ts` — gates on `capabilities.secrets.supported` + `capabilities.observability.testSeams.{otelScrape,debugBundleExport}` AND the `OPENWOP_CANARY_SECRET_VALUE` env (host operator + conformance runner agree on the canary). Drives the existing `openwop-smoke-byok-roundtrip` fixture end-to-end; scrapes both seams after run completion; hard-fails if the canary plaintext appears in any OTel span attribute or debug-bundle field. Verifies SECURITY invariants `secret-leakage-otel-attribute` + `secret-leakage-debug-bundle-otel`. 2026-05-22 (RFC 0041 Phase 4 — replay determinism under nondeterministic models) added three scenarios: `replay-divergence-at-refusal.test.ts` (advertisement-shape probe on `replayDeterminism.refusalDivergenceEmission` + 2 `it.todo` for the dual-direction refusal-divergence case), `replay-observable-sequence-determinism.test.ts` (capability-gated; behavioral assertion soft-skipped until a `conformance-phase4-nondet-tool` fixture ships), `replay-llm-cache-key-portable.test.ts` (intra-host reproducibility + non-recipe-field invariance + Phase 4 advertisement alignment — reuses the existing `POST /v1/host/sample/test/llm-cache-key` seam from the sibling `replay-llm-cache-key.test.ts`). 
2026-05-20 (RFC 0027 §A templateKinds-coverage follow-up — paired with `prompt-end-to-end-events.test.ts`) added `prompt-all-four-kinds-events.test.ts` exercising all four `PromptKind` values (`system`, `user`, `schema-hint`, `few-shot`) end-to-end through the reference workflow-engine sample's `local.sample.demo.mock-ai` dispatch path; capability-gated via `behaviorGate('prompts-supported', ...)`. Closes the credibility gap where the host advertised `templateKinds: ["system", "user", "few-shot", "schema-hint"]` but only the system+user pair was actually wired into dispatch. 2026-05-20 (RFCs 0030–0033 — envelope LLM-contract-hardening track) added 15 scenarios across four `Active` RFCs: `envelope-reasoning-shape.test.ts` (RFC 0030, always-on; asserts the OPTIONAL `reasoning` property on the three universal-kind schemas + the `schema.response` deliberate omission), `envelope-reasoning-secret-redaction.test.ts` (RFC 0030, capability-gated on `capabilities.envelopes.reasoning.supported` + `secrets.supported`; 5 `it.todo()` placeholders for SECURITY invariant `envelope-reasoning-secret-redaction`), `envelope-tier-one-subset-static.test.ts` (RFC 0030, always-on for load-bearing rules — no `oneOf` / `allOf` / `not` / `prefixItems` / `propertyNames` anywhere; gated on `tierOneSubsetCompliance: "strict"` for OpenAI-strict-only constraints), `envelope-variant-discriminator-static.test.ts` (RFC 0031, always-on; asserts no `oneOf` + every `anyOf` branch declares a single-string-enum discriminator in `required` on every `schemas/envelopes/*.schema.json`), `model-capability-substituted.test.ts` (RFC 0031, advertisement-shape probe on `capabilities.modelCapabilities.advertised[]` identifier pattern + 5 `it.todo()` placeholders for SECURITY invariant `model-capability-substituted-no-credential-disclosure`), `model-capability-insufficient.test.ts` (RFC 0031, 6 `it.todo()` placeholders for refusal + no-recursive-fallback), `node-module-required-capabilities-shape.test.ts` (RFC 0031 
SHOULD-tier authoring-convention; 4 `it.todo()` placeholders), and the six envelope-reliability events from RFC 0032 (`envelope-retry-attempted` carrying the shared advertisement-shape probe enforcing both MUST-tier events in `events[]` per RFC 0032 §C, plus `envelope-retry-exhausted`, `envelope-refusal-shape`, `envelope-truncated`, `envelope-nl-to-format-engaged`, `envelope-recovery-applied` — collectively 39 `it.todo()` placeholders covering retry/refusal/truncation/recovery + SECURITY invariants `envelope-refusal-no-prompt-leak` and `envelope-recovery-no-content-leak`), plus RFC 0033's two scenarios (`envelope-completion-distinguishes-truncation.test.ts` + `envelope-truncation-cap-exhaustion.test.ts` — 12 `it.todo()` placeholders covering the truncation-vs-schema-violation retry-routing distinction + the DoS-bound assertion). Reference workflow-engine sample advertises `capabilities.envelopes.reasoning: { supported: true, promptDirective: "off" }` + `tierOneSubsetCompliance: "warn"` honestly (schemas accept the field; host doesn't yet inject the directive); the other three RFCs' capability blocks defer to reference-host emission code per the staged RFC 0027 §G precedent. 2026-05-20 (RFC 0028 §B Phase B — prompt-pack boot-time install) added `prompt-pack-install.test.ts` (capability-gated on `capabilities.prompts.endpointsSupported: true`; asserts a host that ran the boot-time pack loader surfaces ≥ 1 pack-source template under `GET /v1/prompts?source=pack` carrying the canonical `meta.source: "pack"` + `meta.packName` + `meta.packVersion` stamps; positively identifies the in-tree `vendor.openwop.prompt-sample` reference pack's `writer-system` template when present). Pairs with the new `host/promptPackLoader.ts` boot-time entry on the reference workflow-engine sample, which scans `examples/packs/*` plus `OPENWOP_PROMPT_PACKS_DIR` and calls `installPackTemplates()` for each `kind: "prompt"` pack found. 
2026-05-20 (RFC 0029 Phase C — prompt resolution chain wire shape) added three more scenarios: `prompt-resolution-chain-node-wins.test.ts` (capability-gated on `capabilities.prompts.supported: true`; asserts layer-1 node-config supersedes lower layers per `spec/v1/prompts.md` §"Resolution chain (normative)"), `prompt-resolution-chain-agent-intrinsic.test.ts` (additionally gated on `capabilities.prompts.agentBindings: true`; asserts agent intrinsic `systemPromptRef` wins over `promptOverrides` AND lower layers when the node has no layer-1 ref), `prompt-resolution-chain-fallback-cascade.test.ts` (asserts layer 3 workflow-defaults wins over layer 4 host-defaults; layer 4 host-defaults wins when 1-3 yield null; resolved is null when all four yield null but chain[] still lists every attempted layer). The scenarios drive the host's `POST /v1/host/sample/prompt/resolve` test seam (reference-host implementation deferred to follow-up slice per RFC 0021 staging precedent). 2026-05-20 (RFC 0027 Phase A — prompt templates wire shape) added three scenarios: `prompt-template-shape.test.ts` (always-on; Ajv compileability + positive/negative round-trip for PromptTemplate + PromptRef + PromptKind), `prompt-composed-secret-redaction.test.ts` (capability-gated on `capabilities.prompts.supported: true` + `observability: "full"`; asserts `[REDACTED:<secretId>]` markers in `prompt.composed` payloads for `source: "secret"` variable bindings per SECURITY/threat-model-secret-leakage.md §SR-1), `prompt-composed-trust-marker.test.ts` (same capability gates; asserts `<UNTRUSTED>...</UNTRUSTED>` wrapping + `contentTrust: "untrusted"` propagation per RFC 0020 §D). Paired with new `fixtures/prompt-templates/` sub-directory + per-fixture schema-validity describe block + future SECURITY invariants `prompt-composed-secret-redaction` and `prompt-composed-trust-marker` (lands alongside reference-host emission per RFC 0021 staging precedent). 
2026-05-18 (RFC 0022 `Draft` — runtime variable mapping) added four `it.todo()` placeholder scenarios covering the new mapping surfaces on `core.dispatch` (§A — `dispatch-input-mapping.test.ts`, `dispatch-output-mapping.test.ts`, `dispatch-cross-worker-handoff.test.ts`) and `core.subWorkflow` (§B — `subworkflow-input-mapping.test.ts`). Gated on `capabilities.agents.dispatchMapping` (dispatch trio) and `capabilities.subWorkflow.inputMapping` (subWorkflow). Promote to live assertions when RFC 0022 reaches `Active` + a reference host advertises the matching flags. 2026-05-17 (RFC 0003 §D handoff-schema enforcement, HV-1) added `agentPackHandoffSchemaValidation.test.ts` — verifies the host validates dispatch payloads against `handoff.taskSchemaRef` AND return payloads against `handoff.returnSchemaRef` per RFC 0003 §D. Paired with the new `agent-pack-handoff-schema-enforcement` row in `SECURITY/invariants.yaml`. 2026-05-17 (AI Envelope gap-closure, DRAFT v1.x — `spec/v1/ai-envelope.md`) added 7 advertisement-shape scenarios with `it.todo()` behavioral placeholders gated on `capabilities.envelopeContracts.advertised: true`: `aiEnvelope.universalKinds.test.ts`, `aiEnvelope.schemaDrift.test.ts`, `aiEnvelope.correlationReplay.test.ts`, `aiEnvelope.contractRefusal.test.ts`, `aiEnvelope.trustBoundaryPropagation.test.ts`, `aiEnvelope.redaction.test.ts`, `aiEnvelope.capBreached.test.ts`. Paired with the new `envelope-redaction-sr-1-carry-forward` row in `SECURITY/invariants.yaml`. 2026-05-17 (post-publish hardening, deep audit of `core.openwop.agents`) added `agents-run-tool-allowlist.test.ts` — server-free scenario locking in the `core.openwop.agents@1.0.1` safety-fix that closes `OPENWOP-AUDIT-2026-003` (function-typed `tool.handler` properties rejected at `validateTools()` with `INVALID_TOOL_DECLARATION`; tool-driven runs require `ctx.agentRuntime`; tool-less safe fallback preserved). Paired with the new `agents-run-no-raw-handler` row in `SECURITY/invariants.yaml`. 
Same-day post-publish hardening added `idempotency-key-determinism.test.ts` — server-free scenario locking in the `core.openwop.http@1.1.2` determinism safety-fix (default `composite` mode produces deterministic keys in `(runId, nodeId, payload)`; removed `uuid` mode rejects with `CONFIG_INVALID`; cross-impl vector test lets third-party reimplementations verify wire agreement). Paired with the new `idempotency-key-deterministic` row in `SECURITY/invariants.yaml`. 2026-05-17 (Phase 3 of RFC 0013) added three server-free scenarios exercising the reference workflow-chain expansion library (`conformance/src/lib/workflow-chain-expansion.ts`): `workflow-chain-expansion.test.ts` (parameter substitution + node id collision avoidance + edge rewriting + capability propagation + runtime-invariance contract), `workflow-chain-unresolvable-typeid.test.ts` (rejection with `chain_unresolvable_typeid` when a chain references an unknown typeId), and `workflow-chain-pack-signature-verification.test.ts` (Ed25519 verification recipe reuse from `node-packs.md §Signing`). Earlier that day (Phase 1) added `workflow-chain-pack-manifest-validation.test.ts` — server-free schema-validation scenario covering the new `workflow-chain-pack-manifest.schema.json` (positive sample + two negatives: kind/contents mismatch and invalid `chainId`). Closes RFC 0013 (`Workflow-chain packs`, `Draft`) Phases 1 + 3 alongside the new `spec/v1/workflow-chain-packs.md`, the `Capabilities.workflowChainPacks` block, and the registry build-index/conformance-check `kind` routing from Phase 2. Earlier that day, the suite added 27 `it.todo()` placeholder scenarios paired with RFCs 0014-0020 (host capability surfaces — fs, kvStorage, tableStorage, queueBus, sql/vector/search, blob/cache, mcp.serverMount). These promote to live assertions when each RFC reaches `Active` + the matching capability block lands in `schemas/capabilities.schema.json` + a reference host advertises the capability. 
Earlier additions include 18 Multi-Agent Shift scenarios (Phases 1-5) added 2026-05-10, the `registry-public.test.ts` public-registry healthcheck added 2026-05-11 (opt-in via `OPENWOP_TEST_PUBLIC_REGISTRY=true`), the `replay-llm-cache-key.test.ts` placeholder added 2026-05-11 (three `it.todo()` cases for the cross-host LLM cache-key recipe per `replay.md` §"LLM cache-key recipe"), the two `production-*.test.ts` scenarios added 2026-05-11 for the `openwop-production` profile per RFC 0009 (`production-backpressure.test.ts`, `production-retention-expiry.test.ts`), the four `auth-*.test.ts` scenarios added 2026-05-11/12 for the production-auth profiles per RFC 0010 (`auth-api-key-rotation.test.ts`, `auth-oauth2-client-credentials.test.ts`, `auth-oidc-user-bearer.test.ts`, `auth-mtls.test.ts` (opt-in via `OPENWOP_TEST_MTLS=1`)), `replay-retention-expiry.test.ts` added 2026-05-12 (capability shape + 410/422 envelope per `replay.md` §"Retention and garbage collection"), `bulk-cancel.test.ts` added 2026-05-12 (Phase B close-out of R1 — `POST /v1/runs:bulk-cancel`), the two Phase H launch-blocker advertisement-contract scenarios added 2026-05-12 (`mcp-toolcall-redaction.test.ts` for the MCP-1 invariant per `host-capabilities.md §host.mcp` + `threat-model-prompt-injection.md §UNTRUSTED`, and `http-client-ssrf.test.ts` for the SSRF + body-size cap advertisement contract on `capabilities.httpClient`), the `wasm-pack-abi-version-rejection.test.ts` Track 7 scenario added 2026-05-12 for the ABI-mismatch positive path via the `vendor.openwop.misbehaving-abi` pack per RFC 0008 §H, the `otel-trace-propagation-subworkflow.test.ts` Track 11 close-out added 2026-05-13 (parent + child run spans share the inbound traceparent's traceId across the `core.subWorkflow` dispatch boundary), and the three RFC 0012 (Memory Compaction Profile, `Active`) scenarios added 2026-05-13/14: `memory-compaction-sr1-carry-forward.test.ts` (load-bearing SR-1 §D), `memory-compaction-event-emitted.test.ts` 
(canonical §B payload shape), and `memory-compaction-provenance-tag.test.ts` (soft assertion on §C `compacted-from:<id>` convention). All three gate on `capabilities.memory.compaction.supported` + the host's test seam at `/v1/test/memory/{seed,compact}` (Postgres reference host enables both via `OPENWOP_MEMORY_COMPACTION=true OPENWOP_TEST_TRIGGER_COMPACTION=true`). 2026-05-15 (gap-closure CF-3) added `interrupt-token-matrix.test.ts` (malformed / unknown / replay / cross-run-id paths on `GET|POST /v1/interrupts/{token}`). 2026-05-31 (RFC 0078 portable tool catalog + RFC 0079 credential provenance / egress policy — the Active→Accepted behavioral gate) added four: `tool-catalog-projection.test.ts` (capability-gated on `toolCatalog.supported` via `behaviorGate('openwop-tool-catalog', …)` — the NORMATIVE `GET /v1/tools` list with each `ToolDescriptor` schema-valid + `source`/`safetyTier` in the closed vocab + content-free, `GET /v1/tools/{toolId}` round-trip + unknown-id 404, 401-unauthenticated, and the §F-2 cross-principal non-disclosure; black-box, no POST seam), `tool-session-lifecycle.test.ts` (gated on `toolCatalog.sessionLifecycle` — the §D `tool.session.opened`-before / `tool.session.closed`-after bracket over the RFC 0064 call events via the `POST /v1/host/sample/tools/session-run` seam, one shared `sessionId`, content-free), `egress-audience-binding.test.ts` (KEYSTONE — gated on `httpClient.egressPolicy.supported`; the §C confused-deputy MUST via `POST /v1/host/sample/egress/decide`: an out-of-audience egress is denied/downgraded with the credential NOT attached, a provenance-unevaluable egress fails closed — the behavioral leg of `egress-credential-audience-bound`), and `egress-decision-content-free.test.ts` (the SR-1 canary — the credential value never surfaces in `egress.decided` and `reason` stays in the CLOSED vocabulary). 
The maintained scenario-to-spec map lives in [`coverage.md`](./coverage.md); this README keeps the operator quickstart and the historical scenario notes below.
+ The current suite has 330 scenario files under `src/scenarios/`. 2026-06-07 (RFCs 0090/0091/0092 — verifier turn + convergence, multimodal perception input, agent capability requirements) added six: the always-on, server-free shape probes `agent-verifier-shape.test.ts`, `aiproviders-input-shape.test.ts`, `agent-requires-capabilities-shape.test.ts`, plus the capability-gated **behavioral** legs `agent-capability-degraded-projection.test.ts` (RFC 0092 §B — the `degraded[]` projection on `GET /v1/agents`, black-box, non-vacuous via `OPENWOP_DEGRADED_CAPABILITY_AGENT_ID`), `callai-multimodal.test.ts` (RFC 0091 §A/§B — advertised modality accepted / unadvertised → `unsupported_modality`, via the `POST /v1/host/sample/ai/call` seam), and `verifier-gating.test.ts` (RFC 0090 §B — a `fail` verdict blocks commit, via the `POST /v1/host/sample/agents/verify-run` seam). The three behavioral legs soft-skip by default and hard-fail under `OPENWOP_REQUIRE_BEHAVIOR=true` — the Active→Accepted reference-host proof for each RFC. 2026-06-02 (RFC 0082 §B — deployment channel resolve-and-pin, production-path coverage) added `agent-channel-dispatch.test.ts` (capability-gated on `agents.deployment.supported` + the seeded `conformance-agent-channel-dispatch` fixture + advertised `replay` mode via `behaviorGate('openwop-deployment-channel-dispatch', …)` — proves the §B pin from a REAL run graph, complementing `agent-deployment-lifecycle.test.ts` Leg 4's host-sample seam: a canonical `POST /v1/runs` of a node binding `agent.channel:"stable"` MUST record `resolvedChannel` + `resolvedAgentVersion` on `agent.invocation.started` (RFC 0077), a `:fork{mode:"replay"}` MUST re-read that recorded version, and the seam-guarded Leg 3 MOVES the channel then asserts a replay STILL carries the original pin — never re-resolving a moved channel; soft-skips by default, hard-fails under `OPENWOP_REQUIRE_BEHAVIOR=true` — the production-path proof of the §B contract). 
2026-06-01 (RFC 0085 — `openwop-agent-platform` meta-profile, the Active→Accepted behavioral gate) added `agent-platform-aggregate-evidence.test.ts` (capability-gated on a host CLAIMING `openwop-agent-platform` in its live discovery `profiles[]` via `behaviorGate('openwop-agent-platform', …)` — the §C/§D honest-advertisement rule on the live `/.well-known/openwop`: the claim MUST satisfy the §B floor predicate (`isAgentPlatformPartial` → `partial`/`full`, never `none`), backed by the per-capability evidence not the profile string; `OPENWOP_AGENT_PLATFORM_TIER=full` forces the non-vacuous full bar — all governance terms + tenant installScope + all 16 §D terms; server-requiring, the always-on §B/§D derivation legs stay in `agent-platform-profile.test.ts` — the RFC 0085 → Accepted bar). 2026-06-01 (RFC 0084 — budget, quota + cost policy, the Active→Accepted behavioral gate) added `budget-enforcement.test.ts` (capability-gated on `budget.supported` via `behaviorGate('openwop-budget-enforcement', …)` — the §C/§D enforcement via the new `POST /v1/host/sample/budget/run` seam + the test event-log seam: a `hard-cost-exhaust` run emits the strict-ordered `budget.reserved → budget.consumed → budget.threshold.crossed{percent} → budget.exhausted → cap.breached{kind:"budget-cost"} → run.failed{error:"budget_exhausted"}` chain; a `model-denied` run is refused `budget_model_denied` BEFORE the provider call (fail-closed); an `advisory` host emits the `budget.*` events without stopping; every `budget.*` payload content-free backing `budget-no-pricing-leak`; new lib helper `src/lib/budgetPolicy.ts`; soft-skips on 404 — the RFC 0084 → Accepted bar). 
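
The strict ordering of the `hard-cost-exhaust` chain described for `budget-enforcement.test.ts` could be checked along these lines. This is a minimal sketch, not the suite's `src/lib/budgetPolicy.ts`: it assumes the event log has already been reduced to a list of event type strings, that unrelated events may interleave, and that "strict" means the first occurrence of each expected type appears later than the first occurrence of the previous one. `firstOccurrencesInOrder` is a hypothetical helper name.

```typescript
const HARD_COST_EXHAUST_CHAIN = [
  "budget.reserved",
  "budget.consumed",
  "budget.threshold.crossed",
  "budget.exhausted",
  "cap.breached",
  "run.failed",
] as const;

// True when every expected type occurs and the first occurrences are
// strictly increasing in position within the observed log.
function firstOccurrencesInOrder(
  observed: readonly string[],
  expected: readonly string[],
): boolean {
  let previous = -1;
  for (const type of expected) {
    const index = observed.indexOf(type);
    if (index === -1 || index <= previous) return false;
    previous = index;
  }
  return true;
}

const log = [
  "run.started",
  "budget.reserved",
  "budget.consumed",
  "budget.consumed",
  "budget.threshold.crossed",
  "budget.exhausted",
  "cap.breached",
  "run.failed",
];
console.log(firstOccurrencesInOrder(log, HARD_COST_EXHAUST_CHAIN));
```

Payload-level assertions such as `cap.breached{kind:"budget-cost"}` or `run.failed{error:"budget_exhausted"}` are out of scope for this sketch and would need the full event objects.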
2026-06-01 (RFC 0080 — agent memory capability reconciliation, the Active→Accepted behavioral gate) added `memory-degraded-projection.test.ts` (capability-gated on `agents.manifestRuntime.supported` + `memory.supported` via `behaviorGate('openwop-memory-degraded', …)` — the §C degraded-projection iff-contract on the NORMATIVE `GET /v1/agents`: a degraded inventory entry MUST carry `memoryDegraded:true` + a non-empty, unique `degradedMemoryDimensions[]` from the closed §A-name enum, a non-degraded entry MUST NOT, the inventory is non-empty, and the degraded branch runs non-vacuously when `OPENWOP_DEGRADED_AGENT_ID` names a known-degraded agent; black-box, no POST seam — the RFC 0080 → Accepted bar). This batch also documents the two RFC 0068 conformance seams (`POST /v1/host/sample/memory/consolidate` + `.../commitment/fire`) in `host-sample-test-seams.md` (the 0068 gated scenarios shipped in 1.14.0). 2026-06-01 (RFC 0034 — collector-side BYOK-canary inspection) added `otel-collector-canary-inspection.test.ts` (always-on server-free: stands up a real `OtelCollector`, POSTs synthetic OTLP/HTTP-JSON traces + metrics through its actual ingest path, and proves the new `findCanaryLeakage()` inspector catches a canary embedded in a span attribute / resource attribute / span name / metric data-point attribute while reporting ZERO hits on a redacted payload and never matching an empty canary — the non-vacuous proof that the conformance collector now inspects what the host's OTLP exporter ACTUALLY shipped over the wire, closing the `secret-leakage-otel-attribute` / `-debug-bundle-otel` collector-seam gap; the live capability-gated complement is the new collector-export describe block in `secret-leakage-otel-attribute.test.ts`). 
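
The three properties claimed for the canary inspector (it catches an embedded canary, reports zero hits on a redacted payload, and never matches an empty canary) can be shown with a small recursive scan. This is a sketch under assumptions and is not the package's `findCanaryLeakage()`: the helper name `findCanaryHits`, the path notation, and the sample payload shape are hypothetical, and only string values (not object keys) are inspected.

```typescript
// Walk an arbitrary JSON-like payload and return the paths of every
// string value that contains the canary plaintext.
function findCanaryHits(payload: unknown, canary: string, path = "$"): string[] {
  // An empty canary would match every string, so it must never match.
  if (canary.length === 0) return [];

  if (typeof payload === "string") {
    return payload.includes(canary) ? [path] : [];
  }
  if (Array.isArray(payload)) {
    return payload.flatMap((item, index) =>
      findCanaryHits(item, canary, `${path}[${index}]`),
    );
  }
  if (payload !== null && typeof payload === "object") {
    return Object.entries(payload).flatMap(([key, value]) =>
      findCanaryHits(value, canary, `${path}.${key}`),
    );
  }
  return [];
}

const leaked = {
  spans: [{ name: "call", attributes: { auth: "Bearer sk-canary-123" } }],
};
console.log(findCanaryHits(leaked, "sk-canary-123"));
```

Checking the empty canary first is what makes the zero-hit result on a redacted payload meaningful: without that guard, a misconfigured run with no canary set would flag every string.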
2026-06-01 (RFC 0035 — sandbox wall-clock timeout, the 7th-of-8 graduation) added `sandbox-wasm-timeout.test.ts` (worker-driven server-free: `probeTimeout` in `wasm-sandbox-probe.ts` spawns a worker thread running the committed `misbehaving-timeout.wasm` + a main-thread kill-timer — the thread preemption a same-thread probe can't do — asserting `sandbox_timeout` with a well-behaved positive control; graduates `node-pack-sandbox-timeout` reference-impl→protocol, so 7 of 8 `node-pack-sandbox-*` invariants are now protocol-tier, only the JS-specific `no-eval` permanently exempt). 2026-05-31 (audit-response black-box / graduation batch) added three more: `sandbox-wasm-isolation.test.ts` (RFC 0035 — drives the committed `fixtures/wasm-sandbox/*.wasm` through `wasm-sandbox-probe.ts`: escape/capability-gate via static `WebAssembly.Module.imports()`, an OOB-store memory trap, double-instantiate isolation; 10/10; graduates 6 `node-pack-sandbox-*` invariants reference-impl→protocol), `workspace-cross-tenant-isolation-blackbox.test.ts` (RFC 0059 — two-credential black-box on the normative §C `/v1/host/workspace/files` endpoints: owner A writes, a second-tenant credential fails closed; no seam), and `prompt-resolution-chain-event.test.ts` (RFC 0029 — reads the durable `agent.promptResolved.chain[]` precedence record via the normative `GET /v1/runs/{runId}/events/poll`; no seam) — each the production-path proof that graduates its surface into the `openwop-core-standard` floor. 2026-05-31 (RFC 0088 — the `openwop-core-standard` Core Standard Profile, the audit-response Core Candidate target) added `core-standard-profile.test.ts` (always-on server-free derivation probe: `isCoreStandard` derives the §B floor — `openwop-core` ∧ `openwop-interrupts` ∧ (`openwop-stream-sse` ∨ `openwop-stream-poll`) — a bare `openwop-core` host without interrupts is excluded, a host with no event transport fails, and the annex is absent from `deriveProfiles` because it composes rather than redefines). 
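
The §B floor that `core-standard-profile.test.ts` derives (`openwop-core` ∧ `openwop-interrupts` ∧ (`openwop-stream-sse` ∨ `openwop-stream-poll`)) reduces to a short predicate. The sketch below is illustrative only: the real `isCoreStandard` may take a capabilities document rather than a profile list, and `meetsCoreStandardFloor` is a hypothetical name.

```typescript
// Evaluate the Core Standard floor over a list of already-derived profile names.
function meetsCoreStandardFloor(profiles: readonly string[]): boolean {
  const has = (name: string): boolean => profiles.includes(name);
  const hasEventTransport = has("openwop-stream-sse") || has("openwop-stream-poll");
  return has("openwop-core") && has("openwop-interrupts") && hasEventTransport;
}

// A bare core host without interrupts is excluded.
console.log(meetsCoreStandardFloor(["openwop-core", "openwop-stream-sse"]));
// A host with no event transport fails.
console.log(meetsCoreStandardFloor(["openwop-core", "openwop-interrupts"]));
// Core + interrupts + at least one transport satisfies the floor.
console.log(meetsCoreStandardFloor(["openwop-core", "openwop-interrupts", "openwop-stream-poll"]));
```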
2026-05-31 (RFC 0082 — agent deployment lifecycle, the Active→Accepted behavioral gate) added `agent-deployment-lifecycle.test.ts` (capability-gated on `agents.deployment.supported` via `behaviorGate('openwop-deployment-lifecycle', …)` — the §E promotion contract via the new `POST /v1/host/sample/agents/deployment-transition` seam + the test event-log seam across four legs: `promote` (authorize RFC 0049 → approvalGate RFC 0051 → eval-verify RFC 0081 → content-free `deployment.promoted` with a seven-state `toState` + `toVersion`, the record validating `agent-deployment.schema.json`), `unauthorized` (fail-closed — `allowed:false`, no `deployment.promoted`, the behavioral leg of `deployment-promotion-fail-closed`), `eval-gate-unmet` (`eval_gate_unmet` denial, §E-3), and `channel-pin` (the §B `resolvedAgentVersion` recorded-fact on `agent.invocation.started`); new lib helper `src/lib/agentDeployment.ts`; soft-skips on 404 — the RFC 0082 → Accepted bar). 2026-05-31 (RFC 0081 — agent evaluation, the Active→Accepted behavioral gate) added `agent-eval-run.test.ts` (capability-gated on `agents.evalSuite.supported` via `behaviorGate('openwop-eval-run', …)` — the §B `mode:"eval"` projection via the new `POST /v1/host/sample/agents/eval-run` seam + the test event-log seam: `eval.started`-first → one `eval.scored` per task → `eval.completed`-once ordering (count == `eval.completed.taskCount`), the content-free `eval.scored` legs (`score` ∈ 0..1) backing `eval-summary-no-content-leak`, and the NORMATIVE `GET /v1/runs/{runId}/eval-summary` schema-valid `EvalSummary` round-trip with `passedCount <= taskCount`; new lib helper `src/lib/agentEval.ts`; soft-skips on 404 — the RFC 0081 → Accepted bar). 
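
The `eval.*` ordering rules listed for `agent-eval-run.test.ts` (`eval.started` first, one `eval.scored` per task with `score` in 0..1, a single `eval.completed` whose `taskCount` equals the number of scored events) might be checked roughly as follows. This is a sketch, not the suite's `src/lib/agentEval.ts`: the `{ type, payload }` event envelope and the helper name `checkEvalSequence` are assumptions, and treating `eval.completed` as the last `eval.*` event is this sketch's reading of the arrow notation.

```typescript
interface EvalEvent {
  type: string;
  payload?: { score?: number; taskCount?: number };
}

// Return a list of human-readable problems; an empty list means the
// sequence satisfies the ordering and counting rules sketched here.
function checkEvalSequence(events: readonly EvalEvent[]): string[] {
  const problems: string[] = [];
  const evalEvents = events.filter((event) => event.type.startsWith("eval."));
  const scored = evalEvents.filter((event) => event.type === "eval.scored");
  const completed = evalEvents.filter((event) => event.type === "eval.completed");

  if (evalEvents[0]?.type !== "eval.started") {
    problems.push("eval.started must be the first eval event");
  }
  if (completed.length !== 1) {
    problems.push("exactly one eval.completed is expected");
  } else {
    if (evalEvents[evalEvents.length - 1]?.type !== "eval.completed") {
      problems.push("eval.completed must be the last eval event");
    }
    if (completed[0]?.payload?.taskCount !== scored.length) {
      problems.push("eval.completed.taskCount must equal the eval.scored count");
    }
  }
  for (const event of scored) {
    const score = event.payload?.score;
    if (typeof score !== "number" || score < 0 || score > 1) {
      problems.push("eval.scored.score must be a number in 0..1");
    }
  }
  return problems;
}

console.log(
  checkEvalSequence([
    { type: "eval.started" },
    { type: "eval.scored", payload: { score: 0.5 } },
    { type: "eval.scored", payload: { score: 1 } },
    { type: "eval.completed", payload: { taskCount: 2 } },
  ]),
);
```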
2026-05-31 (RFC 0083 — durable trigger bridge, the Active→Accepted behavioral gate) added `trigger-bridge-delivery.test.ts` (profile-gated on `openwop-trigger-bridge` derived from the live discovery doc — the §C delivery model via the `POST /v1/host/sample/trigger-bridge/deliver` seam + the test event-log seam: dedup→effectively-once `trigger.delivery.attempted{delivered}` (§C-1), retry-exhaustion→`{dead-lettered}` + `trigger.subscription.state.changed{toState:dead-lettered}` (§C-2 + RFC 0053), and the delivered run's `run.started.causationId` == the delivery id (§C / RFC 0040); both `trigger.*` events content-free; the always-on shape stays in `trigger-bridge-shape.test.ts`; new lib helper `src/lib/triggerBridge.ts`). 2026-05-31 (RFC 0087 — agent org-chart, the Active→Accepted behavioral gate) added two capability-gated behavioral scenarios (both gated on `agents.orgChart.supported`, black-box on the normative `/v1/agents/org-chart` surface — no new POST seam): `agent-org-chart-scoping.test.ts` (the `GET /v1/agents/org-chart` tree-shape — departments form an acyclic `parentDepartmentId` tree, members reference `host:<id>` roster entries — + the §D responsibility roll-up via `GET /v1/agents/org-chart/{departmentId}` with a deduped `responsibilities[]` union + the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ORG_CHART_DEPARTMENT_ID`) and `org-position-no-authority-escalation.test.ts` (the behavioral leg of the protocol-tier invariant — the live org-chart wire carries NO authority-bearing field on any member/department/responsibility-view object; the structural leg stays always-on in `agent-org-chart-shape.test.ts`, and the deeper RFC 0049/0051 authority-invariance legs stay reference-impl tier per the `agent-manifest-runtime` no-host-hook precedent). 
2026-05-31 (RFCs 0086 + 0077 — the Active→Accepted behavioral gate) added four capability-gated behavioral scenarios so a non-steward host can be mechanically certified non-vacuously under `OPENWOP_REQUIRE_BEHAVIOR=true`: `agent-roster-attribution.test.ts` (RFC 0086 §B/§C; gated on `agents.roster.supported` — the normative `GET /v1/agents/roster` read shape + `total==roster.length`, the §C `roster.run.initiated`-before-`agent.invocation.started` ordering, the content-free payload backing `roster-attribution-no-content`, the durable work-item `triggerSubscriptionId`, and the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ROSTER_ID`), `agent-live-invocation-bracket.test.ts` (RFC 0077 §E; gated on `agents.liveRuntime.supported` — `agent.invocation.started`-first / `agent.invocation.completed`-last bracket, matching `invocationId`, `source`/`outcome` closed enums, content-free), `agent-live-structured-output.test.ts` (RFC 0077 §B step 6; gated on `agents.liveRuntime.structuredOutput` — a result violating `handoff.returnSchemaRef` fails the invocation `outcome:"failed"` rather than shipping as completed), and `agent-live-allowlist-enforced.test.ts` (RFC 0077 §F-1 / RFC 0002 §A14; gated on `agents.liveRuntime.supported` — a tool outside `toolAllowlist` is not callable); all four drive the documented `POST /v1/host/sample/roster/fire` + `POST /v1/host/sample/agents/live-invoke` seams plus the test event-log seam and soft-skip on 404 (these are the RFC 0086 / 0077 Active→Accepted bars). 
2026-05-30 (RFC 0087 — agent org-chart, Draft -> Active) added `agent-org-chart-shape.test.ts` (always-on server-free: the `capabilities.agents.orgChart` shape + the `AgentOrgChart` round-trip + the non-`host:` member negative + the **§B structural non-authority guarantee** — the schema rejects a `scopes`/`canDispatch`/`permissions`/`authority` field on a member (`additionalProperties:false`), and a member's key set is exactly `{rosterId, departmentId, roleId, reportsTo}` — backing the protocol-tier `org-position-no-authority-escalation` invariant; no new RunEventType). 2026-05-30 (RFC 0086 — standing agent roster, Draft -> Active) added `agent-roster-shape.test.ts` (always-on server-free: the `capabilities.agents.roster` shape + the `AgentRosterEntry` round-trip + the `host:` `rosterId` + `agentRef` version-XOR-channel negatives + the content-free `roster.run.initiated` negatives backing the protocol-tier `roster-attribution-no-content` invariant + the additive `roster` inventory projection + RunEventType-enum membership). 2026-05-30 (RFC 0082 — agent deployment lifecycle, Draft -> Active) added `agent-deployment-shape.test.ts` (always-on server-free: the `capabilities.agents.deployment` shape + the `AgentDeployment` record round-trip + the `AgentRef` `channel` XOR `version` `not`-clause + the four `deployment.*` payloads + the content-free negatives backing the protocol-tier `deployment-event-no-content-leak` invariant). 2026-05-30 (RFC 0085 — `openwop-agent-platform` meta-profile, Draft -> Active) added `agent-platform-profile.test.ts` (always-on server-free derivation of the operational-annex `none`/`partial`/`full` status: all-floor ⇒ partial, missing-flag ⇒ none, the replay-OR-`nondeterminismPolicy.declared` term, floor+governance ⇒ full, missing-tenant-scope ⇒ partial-not-full per the honest-advertisement rule, eval/deploy/budget-are-advisory-not-hard-terms, + the `capabilities.nondeterminismPolicy.declared` shape). 
2026-05-30 (RFC 0084 — budget, quota + cost policy, Draft -> Active) added `budget-policy-shape.test.ts` (always-on server-free: `budget-policy.schema.json` round-trip + the §A orthogonality guard — a wall-time field is rejected (it's RFC 0058's `runTimeoutMs`) — + threshold/onExhaustion negatives + the four content-free `budget.{reserved,consumed,threshold.crossed,exhausted}` payloads + the four `cap.breached{budget-*}` kinds + RunEventType-enum membership + the no-pricing-property structural check backing the protocol-tier `budget-no-pricing-leak` invariant + the `capabilities.budget`/`limits.maxBudget*` shape). 2026-05-30 (RFC 0083 — durable trigger + channel bridge, Draft -> Active) added `trigger-bridge-shape.test.ts` (always-on server-free: `trigger-subscription.schema.json` round-trip + missing-`state`/out-of-enum-`source`/unknown-property negatives + the four-state vocab + the two content-free `trigger.{subscription.state.changed,delivery.attempted}` payloads incl. closed `state`/`outcome` enums + RunEventType-enum membership + the `triggerBridge`/`webhooks.durable` capability shape + the `openwop-trigger-bridge` profile derivation incl. the no-dead-letter-sink negative). 2026-05-30 (RFC 0079 — credential provenance + egress policy, Draft -> Active) added `egress-provenance-shape.test.ts` (always-on server-free: `credential-provenance.schema.json` round-trip + `audiences:[]`/missing-`credentialId`/unknown-property negatives + the no-secret-property structural check backing the protocol-tier `egress-decision-no-secret-leak` invariant + the content-free `egress.decided` record incl. the `decision` enum + RunEventType-enum membership + the `httpClient.egressPolicy` shape; the behavioral `egress-credential-audience-bound` confused-deputy MUST is reference-impl tier, deferred to a host). 
2026-05-30 (RFC 0078 — portable tool catalog, Draft -> Active) added `tool-descriptor-shape.test.ts` (always-on server-free: `tool-descriptor.schema.json` round-trip + the §C-1 `exec` ⇒ `host-extension` cross-field MUST (RFC 0069) + the `safetyTier`-required negative + `additionalProperties:false`, the `capabilities.toolCatalog` `supported`/`sources`/`sessionLifecycle` shape, and the two content-free `tool.session.{opened,closed}` payload $defs incl. the closed `outcome` enum + RunEventType-enum membership). 2026-05-30 (RFC 0080 — agent memory capability reconciliation, Draft -> Active) added `memory-capability-model-shape.test.ts` (always-on server-free: the additive `capabilities.memory.{writable,search,retention}` dimension shapes + malformed-instance negatives — `retention.ttl` non-boolean, out-of-enum `search.modes`, unknown property under `additionalProperties:false` — the `agent-inventory-response` `memoryDegraded`/`degradedMemoryDimensions` closed-enum fields, and the `openwop-memory` derivation surfacing for read/write + long-term hosts while withholding from `writable:false`). 2026-05-30 (RFC 0081 — agent evaluation, Draft -> Active) added `agent-eval-suite-shape.test.ts` (always-on server-free: the `capabilities.agents.evalSuite` shape + the `AgentEvalSuite`/`EvalSummary` schema round-trips + the three `eval.{started,scored,completed}` payloads + the content-free negatives — a task entry with a `taskOutput` body, a `safetyFinding` with an `excerpt` — backing the new `eval-summary-no-content-leak` SECURITY invariant). 
2026-05-29 (RFC 0076 §B — `ctx.http.safeFetch` live-run audit) added `safefetch-live-audit.test.ts` (`behaviorGate('openwop-safefetch-live-audit', …)`, gated on `httpClient.safeFetch` + `toolHooks.prePostEvents`) — asserts the audit-when-both MUST against the **durable run event log** via the new `POST /v1/host/sample/http/safe-fetch-run` open seam + the test event-log seam, closing the seam-vs-production gap (a production `createSafeFetch()` with no audit hooks passes the inline `safefetch-behavior.test.ts` but FAILS this under `OPENWOP_REQUIRE_BEHAVIOR=true`); this is the RFC 0076 §B → Accepted bar; run seam soft-skips on 404 (host-pending). 2026-05-29 (RFC 0066 — `x-openwop-form` picker UX hints, Draft → Active) added `x-openwop-form-pack-manifest.test.ts` (always-on server-free: an annotated `configSchema` stays a valid 2020-12 schema + the advisory hints don't change what it accepts, each §A annotation matches the shape, an unknown `kind` validates for forward-compat, 3 negatives — missing/non-string `kind`, non-string `dependsOn`). 2026-05-29 (RFC 0076 §B — `ctx.http.safeFetch`) added `safefetch-behavior.test.ts` (seam-gated: SSRF block / DNS-rebinding / `Connection: upgrade` refusal / tool-hooks audit-when-both, via `POST /v1/host/sample/http/safe-fetch`; advertisement contract stays in `http-client-ssrf.test.ts`). 2026-05-29 (RFC 0076 §A — pack `runtime.requires[]` install gate) added two: `runtime-requires-shape.test.ts` (server-free closed-vocabulary validation — the 8 tokens validate, a raw builtin name is rejected, empty-array≡omission, `uniqueItems`) + `runtime-requires-install-gate.test.ts` (seam-gated install-grant / install-refuse → `pack_runtime_requirement_unmet` / non-sandbox SHOULD-projection, soft-skip on 404 via `POST /v1/host/sample/packs/install-gate`). 
2026-05-29 (RFC 0047 — `host.oauth` authorization-code roundtrip) added `oauth-authorization-code-roundtrip.test.ts` — capability-gated on `capabilities.oauth.supported` + `grants` including `authorization_code`; drives the `POST /v1/host/sample/oauth/authorize-code-roundtrip` seam against the one canonical synthetic provider in `fixtures/oauth-providers/synthetic.json` (soft-skip on 404, Tier-2 host-pending), asserting a successful grant returns a credential REFERENCE (token persisted as a `host.credentials` entry) and that the authorization code / state / PKCE verifier / acquired access+refresh tokens never appear on any run-visible surface (RFC 0047 §C + §C.2 / `credential-payload-redaction`). Closes the RFC 0047 Tier-2 gap (capability-shape + redaction scenarios existed; the actual authorization-code dance was unexercised). 2026-05-26 (RFC 0070 — agent-manifest runtime) added `agent-manifest-runtime.test.ts`; 2026-05-26 (RFC 0071 — artifact-type + chat card packs) added six: `artifact-type-pack-manifest-validation.test.ts` + `artifact-schema-compile-bounded.test.ts` (server-free) + `artifact-type-pack-install.test.ts` + `artifact-type-store-without-render.test.ts` + `chat-card-pack-manifest-validation.test.ts` (server-free) + `chat-card-pack-execution.test.ts` (capability-gated, host-pending). 
2026-05-26 (RFCs 0067 / 0068 / 0069 — spec-gap Draft cohort) added five scenarios: `byok-auth-modes.test.ts` (RFC 0067; always-on schema-shape of `aiProviders.authModes` + a discovery-gated §B auth-mode-contract cross-field check), `memory-consolidation-shape.test.ts` (RFC 0068; always-on shape of `agents.memoryConsolidation`/`agents.commitments` + the `agent.memory.consolidated`/`commitment.fired` payload $defs), `memory-consolidation-idempotent.test.ts` + `commitment-fired.test.ts` (RFC 0068; capability-gated behavioral, soft-skip on the documented `/v1/host/sample/memory/consolidate` + `/commitment/fire` seams), and `exec-not-protocol-tier.test.ts` (RFC 0069; always-on server-free structural assertion that the protocol corpus defines no `core.*`/`openwop.*` exec-class primitive — backs the `exec-must-not-be-protocol-tier` SECURITY invariant). 2026-05-25 (RFC 0061 — stateful agent-loop lifecycle, executionModel.version 5) added four `agent-loop-*.test.ts` scenarios: `-version5-shape` (always-on; validates `executionModel.statefulResume`/`transcriptWindow` + the 1–5 version ceiling) plus `-iteration-monotonic` (gated on `version >= 5`; `runOrchestrator.decided.iteration` increments 1,2,3… exactly once per turn), `-workspace-snapshot` (gated additionally on `host.workspace.supported`; a turn-i workspace write is invisible to turn i, visible to turn i+1), and `-stateful-resume` (gated on `statefulResume`; a mid-loop suspend resumes at the same iteration without resetting the counter) — the three behavioral scenarios drive the documented agent-loop seam (`POST /v1/host/sample/agentloop/run`) and soft-skip until a host wires it. 
2026-05-25 (RFC 0059 — host.workspace M2, reference-host enforcement) added two `workspace-*.test.ts` scenarios: `-behavior` (capability-gated CRUD round-trip / `If-Match` 409 `workspace_conflict` / `workspace_too_large` / §D run-start snapshot, all via the real `/v1/host/workspace/files` §C endpoints) and `-cross-tenant-isolation` (WCT-1 — drives the documented `POST /v1/host/sample/workspace/op` seam to assert a file owned by one `{tenant, workspace}` is unreadable, on both `get` and `list`, under a different owner; backs the new `workspace-cross-tenant-isolation` SECURITY invariant). The in-memory reference host now advertises `capabilities.workspace.supported` and honors §C/§D/§E end-to-end. 2026-05-25 (RFC 0062 — memory.distillation "dreams") added five `distillation-*.test.ts` scenarios: `-shape` (always-on; validates the `capabilities.memory.distillation` block + the additive `distillation` sub-object on `memory.compacted`) plus `-token-budget` (within budget `tokensUsed ≤ tokenBudget`; an un-meetable budget → `token_budget_exceeded` with no partial archive), `-stable-archive` (same sources + budget ⇒ byte-stable archive checksum), `-index-roundtrip` (gated additionally on `indexEmitted`; the `MEMORY-INDEX.json` workspace file is retrievable + `workspace.updated` fired), and `-secret-carryforward` (SR-1: a redacted source secret never appears in the archive) — the four behavioral scenarios drive the documented memory-distillation seam (`POST /v1/host/sample/memory/distill`) and soft-skip until a host wires it. 
2026-05-25 (RFC 0063 — core.subWorkflow.outputAttestation) added four `subrun-*.test.ts` scenarios: `-attestation-shape` (always-on; validates the `capabilities.agents.subRunAttestation` flag) plus `-checksum-stable` (the child output checksum is the byte-stable, key-order-invariant RFC 8785 JCS + SHA-256 digest), `-approval-gate` (`requireApproval` → `accept` merges, `reject` does not), and `-approval-fail-closed` (no `accept`/`edit-accept` → no merge; backs the deferred `subrun-merge-approval-fail-closed` invariant) — the three behavioral scenarios drive the documented sub-run attestation seam (`POST /v1/host/sample/subrun/attest`) and soft-skip until a host wires it. 2026-05-25 (RFC 0064 — host.toolHooks) added five `tool-hooks-*.test.ts` scenarios: `-shape` (always-on; validates the `capabilities.toolHooks` block + the optional content-free fields on `agentToolCalled` / `agentToolReturned`) plus `-content-free` (gated on `prePostEvents`), `-authorization-fail-closed` (gated on `perToolAuthorization`), `-rate-limit` (gated on `perToolRateLimit`), and `-secret-redaction` (gated on `prePostEvents` + the SR-1 `argsHash` redaction rule) — the four behavioral scenarios drive the documented tool-hooks invoke seam (`POST /v1/host/sample/toolhooks/invoke`) and soft-skip until a host wires it. 2026-05-25 (RFC 0060 — host.heartbeat) added four `heartbeat-*.test.ts` scenarios: `-capability-shape` (always-on; validates the `capabilities.heartbeat` block) plus `-fires-once-per-tick`, `-idempotent-no-spam`, and `-runtime-bound` (gated on `capabilities.heartbeat.supported` + the host heartbeat tick seam; soft-skip until a host wires it). 
2026-05-25 (RFC 0057 — memory write-attribution) added five `memory-attribution-*.test.ts` scenarios: `-shape` (always-on advertisement check on `capabilities.memory.attribution`), plus `-no-content`, `-tenant-scoped`, `-emits-on-write`, and `-replay-stable` (gated on `capabilities.memory.attribution.emitsWriteEvents`) verifying the content-free `memory.written` RunEvent, its two SECURITY invariants (`memory-attribution-no-content` + `memory-attribution-tenant-scoped`), and the §D replay rule that a `replay`-mode fork MUST NOT regenerate `memoryId`. 2026-05-25 (RFC 0025 §C point 1 — test-catalog isolation invariant; pairs with the 25 publish-error scenarios in `pack-registry-publish.test.ts`) added `pack-registry-isolation.test.ts` — capability-gated on `capabilities.packs.testMode.{supported, isolated}: true`; PUTs a disposable pack into `/v1/packs-test/{name}` and asserts the same `(name, version)` does NOT appear via `GET /v1/packs/{name}` — anchors the test-catalog isolation MUST in RFC 0025 §C. 2026-05-25 (RFC 0028 Tier-2 post-promotion T2 — read-side sister scenario for workspace-membership enforcement) added `prompt-read-workspace-membership-enforced.test.ts` — gates on `capabilities.prompts.supported: true` (broader than `mutableLibrary` so read-only hosts that expose `?workspaceId=` are also probed); drives `GET /v1/prompts?workspaceId=<random-non-member>` and interprets the response: 4xx PASS (canonical envelope check on 403); 200 with empty `templates[]` PASS (correct null result for a nonexistent workspace); 200 with non-empty `templates[]` FAIL (cross-tenant leak); 200 without `templates[]` field SKIP (host doesn't expose workspace-scoped reads). Verifies SECURITY invariant `prompt-read-workspace-membership-enforced`. Same-day T1 strengthened `prompt-mutation-workspace-membership-enforced.test.ts` to pin `error === "workspace_membership_required"` when the host's refusal status is 403 (other refusal codes unconstrained). 
2026-05-25 (RFC 0028 Tier-2 follow-up — workspace-membership enforcement on mutating prompt endpoints, filed in response to a self-disclosed adopter vulnerability) added `prompt-mutation-workspace-membership-enforced.test.ts` — capability-gated on `capabilities.prompts.mutableLibrary: true`; drives `POST /v1/prompts` with a cryptographically-random non-member `workspaceId` and asserts the host refuses (NOT a 2xx; any 4xx/5xx is acceptable — silent success is the failure mode). Verifies SECURITY invariant `prompt-mutation-workspace-membership-enforced`. 2026-05-22 (RFC 0034 §B follow-up — secret-leakage harness against the OTel + debug-bundle seams) added `secret-leakage-otel-attribute.test.ts` — gates on `capabilities.secrets.supported` + `capabilities.observability.testSeams.{otelScrape,debugBundleExport}` AND the `OPENWOP_CANARY_SECRET_VALUE` env (host operator + conformance runner agree on the canary). Drives the existing `openwop-smoke-byok-roundtrip` fixture end-to-end; scrapes both seams after run completion; hard-fails if the canary plaintext appears in any OTel span attribute or debug-bundle field. Verifies SECURITY invariants `secret-leakage-otel-attribute` + `secret-leakage-debug-bundle-otel`. 2026-05-22 (RFC 0041 Phase 4 — replay determinism under nondeterministic models) added three scenarios: `replay-divergence-at-refusal.test.ts` (advertisement-shape probe on `replayDeterminism.refusalDivergenceEmission` + 2 `it.todo` for the dual-direction refusal-divergence case), `replay-observable-sequence-determinism.test.ts` (capability-gated; behavioral assertion soft-skipped until a `conformance-phase4-nondet-tool` fixture ships), `replay-llm-cache-key-portable.test.ts` (intra-host reproducibility + non-recipe-field invariance + Phase 4 advertisement alignment — reuses the existing `POST /v1/host/sample/test/llm-cache-key` seam from the sibling `replay-llm-cache-key.test.ts`). 
2026-05-20 (RFC 0027 §A templateKinds-coverage follow-up — paired with `prompt-end-to-end-events.test.ts`) added `prompt-all-four-kinds-events.test.ts` exercising all four `PromptKind` values (`system`, `user`, `schema-hint`, `few-shot`) end-to-end through the reference workflow-engine sample's `local.sample.demo.mock-ai` dispatch path; capability-gated via `behaviorGate('prompts-supported', ...)`. Closes the credibility gap where the host advertised `templateKinds: ["system", "user", "few-shot", "schema-hint"]` but only the system+user pair was actually wired into dispatch. 2026-05-20 (RFCs 0030–0033 — envelope LLM-contract-hardening track) added 15 scenarios across four `Active` RFCs: `envelope-reasoning-shape.test.ts` (RFC 0030, always-on; asserts the OPTIONAL `reasoning` property on the three universal-kind schemas + the `schema.response` deliberate omission), `envelope-reasoning-secret-redaction.test.ts` (RFC 0030, capability-gated on `capabilities.envelopes.reasoning.supported` + `secrets.supported`; 5 `it.todo()` placeholders for SECURITY invariant `envelope-reasoning-secret-redaction`), `envelope-tier-one-subset-static.test.ts` (RFC 0030, always-on for load-bearing rules — no `oneOf` / `allOf` / `not` / `prefixItems` / `propertyNames` anywhere; gated on `tierOneSubsetCompliance: "strict"` for OpenAI-strict-only constraints), `envelope-variant-discriminator-static.test.ts` (RFC 0031, always-on; asserts no `oneOf` + every `anyOf` branch declares a single-string-enum discriminator in `required` on every `schemas/envelopes/*.schema.json`), `model-capability-substituted.test.ts` (RFC 0031, advertisement-shape probe on `capabilities.modelCapabilities.advertised[]` identifier pattern + 5 `it.todo()` placeholders for SECURITY invariant `model-capability-substituted-no-credential-disclosure`), `model-capability-insufficient.test.ts` (RFC 0031, 6 `it.todo()` placeholders for refusal + no-recursive-fallback), `node-module-required-capabilities-shape.test.ts` (RFC 0031 
SHOULD-tier authoring-convention; 4 `it.todo()` placeholders), and the six envelope-reliability events from RFC 0032 (`envelope-retry-attempted` carrying the shared advertisement-shape probe enforcing both MUST-tier events in `events[]` per RFC 0032 §C, plus `envelope-retry-exhausted`, `envelope-refusal-shape`, `envelope-truncated`, `envelope-nl-to-format-engaged`, `envelope-recovery-applied` — collectively 39 `it.todo()` placeholders covering retry/refusal/truncation/recovery + SECURITY invariants `envelope-refusal-no-prompt-leak` and `envelope-recovery-no-content-leak`), plus RFC 0033's two scenarios (`envelope-completion-distinguishes-truncation.test.ts` + `envelope-truncation-cap-exhaustion.test.ts` — 12 `it.todo()` placeholders covering the truncation-vs-schema-violation retry-routing distinction + the DoS-bound assertion). Reference workflow-engine sample advertises `capabilities.envelopes.reasoning: { supported: true, promptDirective: "off" }` + `tierOneSubsetCompliance: "warn"` honestly (schemas accept the field; host doesn't yet inject the directive); the other three RFCs' capability blocks defer to reference-host emission code per the staged RFC 0027 §G precedent. 2026-05-20 (RFC 0028 §B Phase B — prompt-pack boot-time install) added `prompt-pack-install.test.ts` (capability-gated on `capabilities.prompts.endpointsSupported: true`; asserts a host that ran the boot-time pack loader surfaces ≥ 1 pack-source template under `GET /v1/prompts?source=pack` carrying the canonical `meta.source: "pack"` + `meta.packName` + `meta.packVersion` stamps; positively identifies the in-tree `vendor.openwop.prompt-sample` reference pack's `writer-system` template when present). Pairs with the new `host/promptPackLoader.ts` boot-time entry on the reference workflow-engine sample, which scans `examples/packs/*` plus `OPENWOP_PROMPT_PACKS_DIR` and calls `installPackTemplates()` for each `kind: "prompt"` pack found. 
2026-05-20 (RFC 0029 Phase C — prompt resolution chain wire shape) added three more scenarios: `prompt-resolution-chain-node-wins.test.ts` (capability-gated on `capabilities.prompts.supported: true`; asserts layer-1 node-config supersedes lower layers per `spec/v1/prompts.md` §"Resolution chain (normative)"), `prompt-resolution-chain-agent-intrinsic.test.ts` (additionally gated on `capabilities.prompts.agentBindings: true`; asserts agent intrinsic `systemPromptRef` wins over `promptOverrides` AND lower layers when the node has no layer-1 ref), `prompt-resolution-chain-fallback-cascade.test.ts` (asserts layer 3 workflow-defaults wins over layer 4 host-defaults; layer 4 host-defaults wins when 1-3 yield null; resolved is null when all four yield null but chain[] still lists every attempted layer). The scenarios drive the host's `POST /v1/host/sample/prompt/resolve` test seam (reference-host implementation deferred to follow-up slice per RFC 0021 staging precedent). 2026-05-20 (RFC 0027 Phase A — prompt templates wire shape) added three scenarios: `prompt-template-shape.test.ts` (always-on; Ajv compileability + positive/negative round-trip for PromptTemplate + PromptRef + PromptKind), `prompt-composed-secret-redaction.test.ts` (capability-gated on `capabilities.prompts.supported: true` + `observability: "full"`; asserts `[REDACTED:<secretId>]` markers in `prompt.composed` payloads for `source: "secret"` variable bindings per SECURITY/threat-model-secret-leakage.md §SR-1), `prompt-composed-trust-marker.test.ts` (same capability gates; asserts `<UNTRUSTED>...</UNTRUSTED>` wrapping + `contentTrust: "untrusted"` propagation per RFC 0020 §D). Paired with new `fixtures/prompt-templates/` sub-directory + per-fixture schema-validity describe block + future SECURITY invariants `prompt-composed-secret-redaction` and `prompt-composed-trust-marker` (lands alongside reference-host emission per RFC 0021 staging precedent). 
2026-05-18 (RFC 0022 `Draft` — runtime variable mapping) added four `it.todo()` placeholder scenarios covering the new mapping surfaces on `core.dispatch` (§A — `dispatch-input-mapping.test.ts`, `dispatch-output-mapping.test.ts`, `dispatch-cross-worker-handoff.test.ts`) and `core.subWorkflow` (§B — `subworkflow-input-mapping.test.ts`). Gated on `capabilities.agents.dispatchMapping` (dispatch trio) and `capabilities.subWorkflow.inputMapping` (subWorkflow). Promote to live assertions when RFC 0022 reaches `Active` + a reference host advertises the matching flags. 2026-05-17 (RFC 0003 §D handoff-schema enforcement, HV-1) added `agentPackHandoffSchemaValidation.test.ts` — verifies the host validates dispatch payloads against `handoff.taskSchemaRef` AND return payloads against `handoff.returnSchemaRef` per RFC 0003 §D. Paired with the new `agent-pack-handoff-schema-enforcement` row in `SECURITY/invariants.yaml`. 2026-05-17 (AI Envelope gap-closure, DRAFT v1.x — `spec/v1/ai-envelope.md`) added 7 advertisement-shape scenarios with `it.todo()` behavioral placeholders gated on `capabilities.envelopeContracts.advertised: true`: `aiEnvelope.universalKinds.test.ts`, `aiEnvelope.schemaDrift.test.ts`, `aiEnvelope.correlationReplay.test.ts`, `aiEnvelope.contractRefusal.test.ts`, `aiEnvelope.trustBoundaryPropagation.test.ts`, `aiEnvelope.redaction.test.ts`, `aiEnvelope.capBreached.test.ts`. Paired with the new `envelope-redaction-sr-1-carry-forward` row in `SECURITY/invariants.yaml`. 2026-05-17 (post-publish hardening, deep audit of `core.openwop.agents`) added `agents-run-tool-allowlist.test.ts` — server-free scenario locking in the `core.openwop.agents@1.0.1` safety-fix that closes `OPENWOP-AUDIT-2026-003` (function-typed `tool.handler` properties rejected at `validateTools()` with `INVALID_TOOL_DECLARATION`; tool-driven runs require `ctx.agentRuntime`; tool-less safe fallback preserved). Paired with the new `agents-run-no-raw-handler` row in `SECURITY/invariants.yaml`. 
Same-day post-publish hardening added `idempotency-key-determinism.test.ts` — server-free scenario locking in the `core.openwop.http@1.1.2` determinism safety-fix (default `composite` mode produces deterministic keys in `(runId, nodeId, payload)`; removed `uuid` mode rejects with `CONFIG_INVALID`; cross-impl vector test lets third-party reimplementations verify wire agreement). Paired with the new `idempotency-key-deterministic` row in `SECURITY/invariants.yaml`. 2026-05-17 (Phase 3 of RFC 0013) added three server-free scenarios exercising the reference workflow-chain expansion library (`conformance/src/lib/workflow-chain-expansion.ts`): `workflow-chain-expansion.test.ts` (parameter substitution + node id collision avoidance + edge rewriting + capability propagation + runtime-invariance contract), `workflow-chain-unresolvable-typeid.test.ts` (rejection with `chain_unresolvable_typeid` when a chain references an unknown typeId), and `workflow-chain-pack-signature-verification.test.ts` (Ed25519 verification recipe reuse from `node-packs.md §Signing`). Earlier that day (Phase 1) added `workflow-chain-pack-manifest-validation.test.ts` — server-free schema-validation scenario covering the new `workflow-chain-pack-manifest.schema.json` (positive sample + two negatives: kind/contents mismatch and invalid `chainId`). Closes RFC 0013 (`Workflow-chain packs`, `Draft`) Phases 1 + 3 alongside the new `spec/v1/workflow-chain-packs.md`, the `Capabilities.workflowChainPacks` block, and the registry build-index/conformance-check `kind` routing from Phase 2. Earlier that day, the suite added 27 `it.todo()` placeholder scenarios paired with RFCs 0014-0020 (host capability surfaces — fs, kvStorage, tableStorage, queueBus, sql/vector/search, blob/cache, mcp.serverMount). These promote to live assertions when each RFC reaches `Active` + the matching capability block lands in `schemas/capabilities.schema.json` + a reference host advertises the capability. 
Earlier additions include 18 Multi-Agent Shift scenarios (Phases 1-5) added 2026-05-10, the `registry-public.test.ts` public-registry healthcheck added 2026-05-11 (opt-in via `OPENWOP_TEST_PUBLIC_REGISTRY=true`), the `replay-llm-cache-key.test.ts` placeholder added 2026-05-11 (three `it.todo()` cases for the cross-host LLM cache-key recipe per `replay.md` §"LLM cache-key recipe"), the two `production-*.test.ts` scenarios added 2026-05-11 for the `openwop-production` profile per RFC 0009 (`production-backpressure.test.ts`, `production-retention-expiry.test.ts`), the four `auth-*.test.ts` scenarios added 2026-05-11/12 for the production-auth profiles per RFC 0010 (`auth-api-key-rotation.test.ts`, `auth-oauth2-client-credentials.test.ts`, `auth-oidc-user-bearer.test.ts`, `auth-mtls.test.ts` (opt-in via `OPENWOP_TEST_MTLS=1`)), `replay-retention-expiry.test.ts` added 2026-05-12 (capability shape + 410/422 envelope per `replay.md` §"Retention and garbage collection"), `bulk-cancel.test.ts` added 2026-05-12 (Phase B close-out of R1 — `POST /v1/runs:bulk-cancel`), the two Phase H launch-blocker advertisement-contract scenarios added 2026-05-12 (`mcp-toolcall-redaction.test.ts` for the MCP-1 invariant per `host-capabilities.md §host.mcp` + `threat-model-prompt-injection.md §UNTRUSTED`, and `http-client-ssrf.test.ts` for the SSRF + body-size cap advertisement contract on `capabilities.httpClient`), the `wasm-pack-abi-version-rejection.test.ts` Track 7 scenario added 2026-05-12 for the ABI-mismatch positive path via the `vendor.openwop.misbehaving-abi` pack per RFC 0008 §H, the `otel-trace-propagation-subworkflow.test.ts` Track 11 close-out added 2026-05-13 (parent + child run spans share the inbound traceparent's traceId across the `core.subWorkflow` dispatch boundary), and the three RFC 0012 (Memory Compaction Profile, `Active`) scenarios added 2026-05-13/14: `memory-compaction-sr1-carry-forward.test.ts` (load-bearing SR-1 §D), `memory-compaction-event-emitted.test.ts` 
(canonical §B payload shape), and `memory-compaction-provenance-tag.test.ts` (soft assertion on §C `compacted-from:<id>` convention). All three gate on `capabilities.memory.compaction.supported` + the host's test seam at `/v1/test/memory/{seed,compact}` (Postgres reference host enables both via `OPENWOP_MEMORY_COMPACTION=true OPENWOP_TEST_TRIGGER_COMPACTION=true`). 2026-05-15 (gap-closure CF-3) added `interrupt-token-matrix.test.ts` (malformed / unknown / replay / cross-run-id paths on `GET|POST /v1/interrupts/{token}`). 2026-05-31 (RFC 0078 portable tool catalog + RFC 0079 credential provenance / egress policy — the Active→Accepted behavioral gate) added four: `tool-catalog-projection.test.ts` (capability-gated on `toolCatalog.supported` via `behaviorGate('openwop-tool-catalog', …)` — the NORMATIVE `GET /v1/tools` list with each `ToolDescriptor` schema-valid + `source`/`safetyTier` in the closed vocab + content-free, `GET /v1/tools/{toolId}` round-trip + unknown-id 404, 401-unauthenticated, and the §F-2 cross-principal non-disclosure; black-box, no POST seam), `tool-session-lifecycle.test.ts` (gated on `toolCatalog.sessionLifecycle` — the §D `tool.session.opened`-before / `tool.session.closed`-after bracket over the RFC 0064 call events via the `POST /v1/host/sample/tools/session-run` seam, one shared `sessionId`, content-free), `egress-audience-binding.test.ts` (KEYSTONE — gated on `httpClient.egressPolicy.supported`; the §C confused-deputy MUST via `POST /v1/host/sample/egress/decide`: an out-of-audience egress is denied/downgraded with the credential NOT attached, a provenance-unevaluable egress fails closed — the behavioral leg of `egress-credential-audience-bound`), and `egress-decision-content-free.test.ts` (the SR-1 canary — the credential value never surfaces in `egress.decided` and `reason` stays in the CLOSED vocabulary). 
The maintained scenario-to-spec map lives in [`coverage.md`](./coverage.md); this README keeps the operator quickstart and the historical scenario notes below.
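The historical notes above describe how `prompt-read-workspace-membership-enforced.test.ts` interprets the response to `GET /v1/prompts?workspaceId=<random-non-member>`. A minimal sketch of that decision table follows. The `ProbeResponse` shape, the function name, and the `"unspecified"` verdict are hypothetical, as is treating a non-array `templates` value as a skip; the suite's real helper may differ.

```typescript
// Hypothetical verdict labels; "unspecified" covers statuses the notes do not mention.
type Verdict = "pass" | "fail" | "skip" | "unspecified";

// Hypothetical, minimal view of an HTTP response (not the suite's real type).
interface ProbeResponse {
  status: number;
  body: unknown;
}

function classifyWorkspaceRead(res: ProbeResponse): Verdict {
  // Any 4xx is a refusal and therefore a pass.
  if (res.status >= 400 && res.status < 500) return "pass";

  if (res.status === 200) {
    const body = res.body;
    // 200 without a templates[] field: the host does not expose workspace-scoped reads.
    if (typeof body !== "object" || body === null || !("templates" in body)) {
      return "skip";
    }
    const templates = (body as { templates: unknown }).templates;
    // Assumption: a non-array value is treated like a missing field.
    if (!Array.isArray(templates)) return "skip";
    // Empty list is a correct null result; a non-empty list is a cross-tenant leak.
    return templates.length === 0 ? "pass" : "fail";
  }

  // The notes do not define other statuses for this probe.
  return "unspecified";
}
```

The canonical-envelope check on 403 (`error === "workspace_membership_required"`) is a separate assertion and is left out of this sketch.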
 
  High-level coverage includes:
 
@@ -172,7 +172,7 @@ Server-required (added in 1.7.0):
  |---|---|---|
  | **Redaction** | [`capabilities.md`](../spec/v1/capabilities.md) §"Secrets" + NFR-7 + §"aiProviders" | Vendor-neutral assertions that the server doesn't leak secret material. Three scenario groups: (a) discovery shape contract — `secrets` + `aiProviders` advertisements are well-formed regardless of `secrets.supported`; when `supported === true`, scopes MUST be non-empty + `resolution === 'host-managed'`; `byok ⊆ supported`. (b) bearer-token redaction — invalid Bearer canary in `Authorization` header is not echoed in the 401 response body. (c) credentialRef echo control — gated on `secrets.supported === true`; canary planted in `configurable.ai.credentialRef` MUST NOT appear in any RunEvent payload (poll-based capture; transport-agnostic). Uses runtime-built canary fixtures (`lib/canaries.ts`) that defeat static secret scanners. 6 scenarios. |
 
- Current source tree: 324 scenario files. Use [`coverage.md`](./coverage.md) for current grade/gap tracking.
+ Current source tree: 330 scenario files. Use [`coverage.md`](./coverage.md) for current grade/gap tracking.
 
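The `subrun-checksum-stable` scenario in the notes above expects the child output checksum to be a byte-stable, key-order-invariant RFC 8785 JCS + SHA-256 digest. The sketch below illustrates the idea under simplifying assumptions: `canonicalize` and `outputChecksum` are hypothetical names, the hex encoding of the digest is assumed (the notes do not state an encoding), and the canonicalizer relies on `JSON.stringify` for primitives rather than being a complete RFC 8785 implementation.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Simplified JCS-style canonicalization: object keys sorted by UTF-16 code
// units, no insignificant whitespace, primitives serialized by JSON.stringify.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    if (typeof value === "number" && !Number.isFinite(value)) {
      throw new TypeError("non-finite numbers cannot be canonicalized");
    }
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Array order is significant and is preserved.
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  // Default sort() compares UTF-16 code units, which is the ordering JCS asks for.
  const keys = Object.keys(value).sort();
  return (
    "{" +
    keys.map((k) => JSON.stringify(k) + ":" + canonicalize(value[k])).join(",") +
    "}"
  );
}

// Assumption: the digest is rendered as lowercase hex.
function outputChecksum(value: Json): string {
  return createHash("sha256").update(canonicalize(value), "utf8").digest("hex");
}
```

Two outputs that differ only in key order canonicalize to the same bytes, so their checksums match; reordering array elements changes the checksum.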
  ## Remaining Gaps
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@openwop/openwop-conformance",
- "version": "1.19.0",
+ "version": "1.21.0",
  "description": "Production-ready black-box conformance suite for OpenWOP v1.0 compliant servers.",
  "repository": {
  "type": "git",
@@ -41,7 +41,7 @@
  "dependencies": {
  "ajv": "^8.17.0",
  "ajv-formats": "^3.0.1",
- "vitest": "^3.0.0"
+ "vitest": "^4.0.0"
  },
  "devDependencies": {
  "typescript": "^5.6.0",
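The changelog for this release says `vitest` must stay in `dependencies` (not `devDependencies`) because the package's `bin` invokes `npx vitest` at runtime. The sketch below shows one way such a placement check could look. It is not the project's CI guardrail: `PackageManifest`, `vitestPlacementProblems`, and the `minMajor` parameter are hypothetical, and the version parsing only understands caret or bare ranges such as `^4.0.0`.

```typescript
// Hypothetical, minimal view of a package.json manifest.
interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Returns human-readable problems; an empty array means the placement looks fine.
function vitestPlacementProblems(pkg: PackageManifest, minMajor = 4): string[] {
  const problems: string[] = [];
  const range = pkg.dependencies?.["vitest"];

  if (range === undefined) {
    // A devDependency is not installed for consumers, so the CLI could not resolve it.
    problems.push("vitest is not declared in dependencies");
    return problems;
  }

  // Assumption: only "^N.x.y" or "N.x.y" style ranges are recognised here.
  const major = /^\^?(\d+)\./.exec(range);
  if (major === null) {
    problems.push(`unrecognised vitest range: ${range}`);
  } else if (Number(major[1]) < minMajor) {
    problems.push(`vitest range ${range} is below major ${minMajor}`);
  }
  return problems;
}
```

Run against the manifest after this hunk (`"vitest": "^4.0.0"` in `dependencies`), the function returns no problems; the previous `^3.0.0` range yields one.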
@@ -51,6 +51,12 @@
  "items": { "type": "string", "minLength": 1 },
  "description": "Tool identifiers the agent MAY invoke. Format: `<scope>:<tool-id>` where scope is `openwop:` (Core/protocol tools), `mcp:` (MCP-namespaced tools), or `<vendor>.<host>` for host-extension tools. Hosts MUST enforce this allowlist when dispatching the agent (see RFC 0002 §A14 mapping for the reference-impl `ToolPermissionService.filterTools(harnessRole)` derivation pattern). Hosts MAY treat absence as `[]` (no tools) or as `*` (host-default tool surface) — that policy is host-configured."
  },
+ "requiresCapabilities": {
+ "type": "array",
+ "uniqueItems": true,
+ "items": { "type": "string", "minLength": 1 },
+ "description": "RFC 0092. Host-capability keys this agent needs to run fully — the agent-layer analogue of a node-pack's `peerDependencies`. Dotted identifiers from the discovery vocabulary (`capabilities.md` / `host-capabilities.md`), e.g. `host.workspace`, `aiProviders.toolCalling`, `multiAgent.executionModel.verifier`. A host that does NOT advertise a listed key MUST surface this agent as degraded on `GET /v1/agents` via the existing `degraded[]` inventory field (RFC 0072 §C); it MAY still dispatch at the RFC 0070 floor, but a silent satisfied-looking entry is non-conformant. Advisory about NEED only — never widens the agent's authority. Absent ⇒ no declared requirements."
59
+ },
54
60
  "memoryShape": {
55
61
  "type": "object",
56
62
  "description": "Declares which memory backends the agent reads/writes. Hosts that advertise `capabilities.agents.memoryBackends` MAY filter manifests during install based on these flags. The full memory-shape contract lands in RFC 0004 (Phase 3); RFC 0003 only declares the descriptor field and reserves `longTerm: true` as the redaction-harness trigger.",
@@ -553,8 +553,18 @@
553
553
  "version": {
554
554
  "type": "integer",
555
555
  "minimum": 1,
556
- "maximum": 5,
557
- "description": "Profile version. 1 = Phase 1 (execution-loop framework + planner→worker handoff). 2 = Phase 2 (confidence-floor escalation + agent-memory lifecycle, RFC 0039). 3 = Phase 3 (cross-host causation, RFC 0040). 4 = Phase 4 (replay determinism under nondeterministic models, RFC 0041). 5 = Phase 5 (stateful agent-loop lifecycle — per-iteration workspace+memory snapshot inputs, the observable `iteration` counter on `runOrchestrator.decided`, and stateful HITL resume, RFC 0061). A host advertising `version: N` MUST implement all phases 1..N additively."
556
+ "maximum": 6,
557
+ "description": "Profile version. 1 = Phase 1 (execution-loop framework + planner→worker handoff). 2 = Phase 2 (confidence-floor escalation + agent-memory lifecycle, RFC 0039). 3 = Phase 3 (cross-host causation, RFC 0040). 4 = Phase 4 (replay determinism under nondeterministic models, RFC 0041). 5 = Phase 5 (stateful agent-loop lifecycle — per-iteration workspace+memory snapshot inputs, the observable `iteration` counter on `runOrchestrator.decided`, and stateful HITL resume, RFC 0061). 6 = Phase 6 (verifier/critic turn — the `agent.verified` event + `successCriteria` on the `terminate` decision, RFC 0090). A host advertising `version: N` MUST implement all phases 1..N additively."
558
+ },
559
+ "verifier": {
560
+ "type": "object",
561
+ "additionalProperties": false,
562
+ "required": ["supported"],
563
+ "description": "RFC 0090 (`version >= 6`). The verifier/critic turn: the host emits `agent.verified` over a prior result and (optionally) gates commit on the verdict. Absent ⇒ no verifier turn; conformance scenarios soft-skip.",
564
+ "properties": {
565
+ "supported": { "type": "boolean", "description": "Host emits `agent.verified` and honors RFC 0090 §A. Applies only when `version >= 6`." },
566
+ "gating": { "type": "boolean", "description": "Host enforces the RFC 0090 §B commit-gating contract: a `fail` verdict blocks merge/terminate (fail-closed, composing RFC 0063); `revise` routes back to an actor turn. Absent/false ⇒ the verdict is observability-only." }
567
+ }
558
568
  },
559
569
  "confidenceEscalationFloor": {
560
570
  "type": "number",
@@ -686,6 +696,20 @@
686
696
  "uniqueItems": true,
687
697
  "description": "Subset of `supported` for which BYOK is permitted. Empty array → all calls use platform-managed keys; non-empty → clients MAY pass `ai.credentialRef` in `RunOptions.configurable` for matching providers."
688
698
  },
699
+ "input": {
700
+ "type": "object",
701
+ "additionalProperties": false,
702
+ "description": "RFC 0091. Multimodal PERCEPTION input on `ctx.callAI` — the modalities a `callAI` message ContentPart may carry as model INPUT. Absent ⇒ text-only (today's behavior); a `string` message content is always valid. Distinct from `imageGeneration` (output) and the `ai-envelope.md` media emission types (output).",
703
+ "properties": {
704
+ "modalities": {
705
+ "type": "array",
706
+ "uniqueItems": true,
707
+ "items": { "type": "string", "enum": ["text", "image", "audio", "document"] },
708
+ "description": "Input modalities the host's `callAI` accepts as ContentParts. `text` is implicit even when omitted. A ContentPart whose `type` is not advertised here MUST be rejected with `unsupported_modality` (never silently dropped)."
709
+ },
710
+ "maxBytesPerPart": { "type": "integer", "minimum": 1, "description": "Optional host cap on a single inline (`data`) or `mediaRef` part." }
711
+ }
712
+ },
689
713
  "authModes": {
690
714
  "type": "object",
691
715
  "description": "RFC 0067 (`Draft`). Optional per-provider advertisement of HOW the host expects a provider's credential to be supplied. Keys are provider ids appearing in `supported`; values are the auth modes the host honors for that provider. Absent ⇒ no advertisement: a provider in `byok` defaults to `apiKey` semantics (client passes `ai.credentialRef`); a provider in `supported` but not in `byok` defaults to `none` (platform-managed). This map only DESCRIBES the supply mechanism — `oauth-pkce`/`oauth-device` flow mechanics compose RFC 0047 `host.oauth` and resolve credentials by `ref` (RFC 0046), never on `ai.credentialRef`. A provider with `apiKey` MUST appear in `byok`; a provider whose modes are exactly `[\"none\"]` MUST NOT appear in `byok`. Consumers MUST ignore an auth mode they don't recognize rather than reject the discovery doc.",
@@ -64,6 +64,19 @@
64
64
  "minimum": 0,
65
65
  "maximum": 1,
66
66
  "description": "RFC 0039 §A — supervisor's stated confidence in this termination decision, in [0, 1]. Optional; absent means 'no opinion stated' (NOT low confidence). Hosts advertising `capabilities.multiAgent.executionModel.version >= 2` MUST honor a confidence floor as documented on NextWorkerDecision.confidence above; the same MUST applies to TerminateDecision because a confidently-wrong terminate is the same class of failure as a confidently-wrong dispatch."
67
+ },
68
+ "successCriteria": {
69
+ "type": "array",
70
+ "description": "RFC 0090 (`multiAgent.executionModel.version >= 6`). Optional structured convergence record — the conditions the supervisor judged when terminating. Content-free: criterion keys + booleans only, never result content. When present, a terminate with any `met: false` entry signals a give-up (NOT goal-satisfied); consumers MUST NOT treat such a run as a success. Absent ⇒ today's free-text `reason` semantics, unchanged.",
71
+ "items": {
72
+ "type": "object",
73
+ "additionalProperties": false,
74
+ "required": ["key", "met"],
75
+ "properties": {
76
+ "key": { "type": "string", "minLength": 1, "description": "Criterion identifier (e.g. `goal-answered`)." },
77
+ "met": { "type": "boolean", "description": "Whether the supervisor judged this criterion satisfied." }
78
+ }
79
+ }
67
80
  }
68
81
  },
69
82
  "additionalProperties": false
@@ -2,7 +2,7 @@
2
2
  "$schema": "https://json-schema.org/draft/2020-12/schema",
3
3
  "$id": "https://openwop.dev/spec/v1/run-event-payloads.schema.json",
4
4
  "title": "RunEventPayloads",
5
- "description": "Per-RunEventType payload schemas. The base RunEventDoc shape (run-event.schema.json) leaves `payload` permissive for forward-compat. This schema defines the canonical payload contract for each known RunEventType. Consumers MAY pin strict payload validation via `$defs.<typeId>` and `ajv.validate(schema.$defs[event.type], event.payload)`. Unknown event types MUST be tolerated (no $defs match → fold best-effort).\n\n94 variants from `run-event.schema.json#$defs.RunEventType` are covered, grouped into ~20 shape families with shared $defs. Naming convention: camelCase keys mirror dotted RunEventType names (e.g., `run.started` → `runStarted`).",
5
+ "description": "Per-RunEventType payload schemas. The base RunEventDoc shape (run-event.schema.json) leaves `payload` permissive for forward-compat. This schema defines the canonical payload contract for each known RunEventType. Consumers MAY pin strict payload validation via `$defs.<typeId>` and `ajv.validate(schema.$defs[event.type], event.payload)`. Unknown event types MUST be tolerated (no $defs match → fold best-effort).\n\n95 variants from `run-event.schema.json#$defs.RunEventType` are covered, grouped into ~20 shape families with shared $defs. Naming convention: camelCase keys mirror dotted RunEventType names (e.g., `run.started` → `runStarted`).",
6
6
  "type": "object",
7
7
  "$defs": {
8
8
  "_typeIndex": {
@@ -70,6 +70,7 @@
70
70
  "agent.toolReturned": { "$ref": "#/$defs/agentToolReturned" },
71
71
  "agent.handoff": { "$ref": "#/$defs/agentHandoff" },
72
72
  "agent.decided": { "$ref": "#/$defs/agentDecided" },
73
+ "agent.verified": { "$ref": "#/$defs/agentVerified" },
73
74
  "runOrchestrator.decided": { "$ref": "#/$defs/runOrchestratorDecided" },
74
75
  "node.dispatched": { "$ref": "#/$defs/nodeDispatched" },
75
76
  "conversation.opened": { "$ref": "#/$defs/conversationOpened" },
@@ -1254,6 +1255,21 @@
1254
1255
  "additionalProperties": true
1255
1256
  },
1256
1257
 
1258
+ "agentVerified": {
1259
+ "type": "object",
1260
+ "description": "RFC 0090 (`multiAgent.executionModel.version >= 6`). A critic agent's independent verdict over a prior result, emitted before the result is committed/merged. Content-free: names the target + verdict + (optional) criteria KEYS, never the verified content (SECURITY invariant `verifier-no-content-leak`). A `fail` verdict on a `verifier.gating: true` host MUST block the merge/terminate; `revise` SHOULD route back to another actor turn (bounded by `maxLoopIterations`, RFC 0058).",
1261
+ "required": ["agentId", "target", "verdict"],
1262
+ "properties": {
1263
+ "agentId": { "type": "string", "minLength": 3, "maxLength": 256, "description": "AgentRef.agentId of the verifying (critic) agent. SHOULD differ from the agent whose work is checked; a host MAY allow self-verification but MUST keep verifier identity inspectable." },
1264
+ "target": { "type": "string", "minLength": 1, "description": "Opaque reference to what was checked — the eventId of the verified `agent.decided`, a child runId, or a tool callId. Chains the verdict to its subject." },
1265
+ "verdict": { "type": "string", "enum": ["pass", "fail", "revise"], "description": "`pass`: acceptable, MAY commit/terminate. `fail`: rejected; a `verifier.gating` host MUST NOT commit it. `revise`: needs another actor turn; SHOULD route back, not terminate." },
1266
+ "criteria": { "type": "array", "items": { "type": "string", "minLength": 1 }, "uniqueItems": true, "description": "Optional closed list of the criteria KEYS evaluated (e.g. `[\"schema-valid\",\"grounded\"]`). Keys only — never per-criterion verdict text — for SIEM safety." },
1267
+ "confidence": { "type": "number", "minimum": 0, "maximum": 1, "description": "Optional verifier confidence in `[0,1]` — the critic's confidence in its OWN verdict, distinct from the actor's `agent.decided.confidence`. MAY drive the RFC 0039 escalation contract." },
1268
+ "causationHostId": { "type": "string", "minLength": 1, "description": "RFC 0040 §A — cross-host causation pointer when the verified target lives on a different host." }
1269
+ },
1270
+ "additionalProperties": false
1271
+ },
1272
+
1257
1273
  "runOrchestratorDecided": {
1258
1274
  "type": "object",
1259
1275
  "description": "Multi-Agent Shift Phase 5. Emitted exactly once per orchestrator decision by a `core.orchestrator.supervisor` node (or host-extension equivalent). Carries the deciding agent's identity and the typed `OrchestratorDecision` (see `orchestrator-decision.schema.json`). The envelope's top-level `nodeId` carries the supervisor node-id; the payload does NOT duplicate it.",
@@ -123,6 +123,7 @@
123
123
  "agent.toolReturned",
124
124
  "agent.handoff",
125
125
  "agent.decided",
126
+ "agent.verified",
126
127
  "agent.invocation.started",
127
128
  "agent.invocation.completed",
128
129
  "eval.started",
@@ -0,0 +1,108 @@
1
+ /**
2
+ * Agent capability-requirement degraded projection (RFC 0092 §B) — behavioral.
3
+ *
4
+ * Gated on `capabilities.agents.manifestRuntime` (root-first per RFC 0073).
5
+ * Soft-skips when unadvertised (default) / hard-fails under
6
+ * `OPENWOP_REQUIRE_BEHAVIOR=true`. The always-on wire-shape coverage lives in
7
+ * `agent-requires-capabilities-shape.test.ts` (the `requiresCapabilities[]`
8
+ * field); this asserts host BEHAVIOR on the NORMATIVE `GET /v1/agents` inventory:
9
+ *
10
+ * §B iff-contract — when an agent's `requiresCapabilities[]` names a key the
11
+ * host does NOT advertise, that key MUST appear in the inventory entry's
12
+ * `degraded[]` (the canonical RFC 0072 §C field — NOT a `degradedCapabilities`
13
+ * field). An agent whose requirements are all satisfied MUST omit `degraded`
14
+ * (or carry it empty). `degraded[]` members are unique, non-empty strings.
15
+ *
16
+ * Non-vacuity — the inventory MUST be non-empty. When
17
+ * `OPENWOP_DEGRADED_CAPABILITY_AGENT_ID` names an agent the host knows is
18
+ * capability-degraded (its `requiresCapabilities` names an unadvertised key),
19
+ * the degraded branch is asserted NON-VACUOUSLY against that agent.
20
+ *
21
+ * Black-box on the normative path — no POST seam.
22
+ *
23
+ * Spec references:
24
+ * - https://github.com/openwop/openwop/blob/main/RFCS/0092-agent-capability-requirements.md
25
+ * - https://github.com/openwop/openwop/blob/main/RFCS/0072-agent-inventory-and-dispatch.md (§C degraded[])
26
+ */
27
+
28
+ import { describe, it, expect } from 'vitest';
29
+ import { driver } from '../lib/driver.js';
30
+ import { behaviorGate } from '../lib/behavior-gate.js';
31
+ import { readManifestRuntimeCap, listManifestAgents } from '../lib/agentRuntime.js';
32
+
33
+ interface InventoryEntry {
34
+ agentId?: string;
35
+ requiresCapabilities?: unknown;
36
+ degraded?: unknown;
37
+ [k: string]: unknown;
38
+ }
39
+
40
+ describe('agent-capability-degraded-projection (RFC 0092 §B)', () => {
41
+ it('surfaces unmet requiresCapabilities in degraded[] and nothing on the rest', async () => {
42
+ const mr = await readManifestRuntimeCap();
43
+ if (!behaviorGate('openwop-agent-capability-degraded', mr?.supported === true)) return;
44
+
45
+ const inv = await listManifestAgents();
46
+ if (inv === null) return; // cap advertised but /v1/agents unserved — soft-skip
47
+ const agents = (inv.agents ?? []) as InventoryEntry[];
48
+
49
+ // Non-vacuity: an advertising + serving host MUST expose its inventory.
50
+ expect(
51
+ agents.length >= 1,
52
+ driver.describe('RFC 0072 §A', 'GET /v1/agents MUST return the installed manifest agents'),
53
+ ).toBe(true);
54
+
55
+ // §B well-formedness on EVERY entry: degraded[], when present, is a unique
56
+ // list of non-empty strings.
57
+ for (const a of agents) {
58
+ const d = a.degraded;
59
+ if (d !== undefined) {
60
+ expect(
61
+ Array.isArray(d),
62
+ driver.describe('agent-inventory-response.schema.json', `degraded MUST be an array when present (agent ${a.agentId})`),
63
+ ).toBe(true);
64
+ if (Array.isArray(d)) {
65
+ for (const k of d) {
66
+ expect(
67
+ typeof k === 'string' && k.length > 0,
68
+ driver.describe('RFC 0072 §C', `degraded[] members MUST be non-empty strings (agent ${a.agentId}, got ${String(k)})`),
69
+ ).toBe(true);
70
+ }
71
+ expect(
72
+ new Set(d as string[]).size === d.length,
73
+ driver.describe('RFC 0092 §B', `degraded[] MUST be unique (agent ${a.agentId})`),
74
+ ).toBe(true);
75
+ }
76
+ }
77
+ }
78
+
79
+ // Non-vacuous degraded branch when the host names a known capability-degraded agent.
80
+ const degradedId = process.env.OPENWOP_DEGRADED_CAPABILITY_AGENT_ID;
81
+ if (degradedId) {
82
+ const target = agents.find((a) => a.agentId === degradedId);
83
+ expect(
84
+ target !== undefined,
85
+ driver.describe('RFC 0092 §B', `OPENWOP_DEGRADED_CAPABILITY_AGENT_ID=${degradedId} MUST appear in the inventory`),
86
+ ).toBe(true);
87
+ if (target) {
88
+ const d = target.degraded;
89
+ expect(
90
+ Array.isArray(d) && d.length >= 1,
91
+ driver.describe('RFC 0092 §B', 'the named capability-degraded agent MUST carry a non-empty degraded[]'),
92
+ ).toBe(true);
93
+ // When the entry also exposes requiresCapabilities, every degraded key
94
+ // MUST be one the agent actually requires (the projection is a subset of
95
+ // requirements, never invented).
96
+ if (Array.isArray(d) && Array.isArray(target.requiresCapabilities)) {
97
+ const req = new Set(target.requiresCapabilities as unknown[]);
98
+ for (const k of d) {
99
+ expect(
100
+ req.has(k),
101
+ driver.describe('RFC 0092 §B', `a degraded[] key MUST be one the agent requires (got ${String(k)})`),
102
+ ).toBe(true);
103
+ }
104
+ }
105
+ }
106
+ }
107
+ });
108
+ });
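The §B contract that this scenario asserts from the outside can also be read from the host side. The following is a minimal sketch only, assuming the host keeps its advertised capability keys as a flat set of dotted identifiers. `projectDegraded`, `ManifestLike` and `InventoryEntryLike` are hypothetical names for illustration and are not part of the conformance package or any reference host.

```typescript
// Hypothetical host-side helper; illustrative only.
interface ManifestLike {
  agentId: string;
  requiresCapabilities?: string[];
}

interface InventoryEntryLike {
  agentId: string;
  requiresCapabilities?: string[];
  degraded?: string[];
}

/**
 * Build an inventory entry whose degraded[] lists exactly the required keys
 * the host does not advertise. The field is omitted when nothing is unmet.
 */
function projectDegraded(
  manifest: ManifestLike,
  advertised: ReadonlySet<string>,
): InventoryEntryLike {
  const required = manifest.requiresCapabilities ?? [];
  // De-duplicate first so degraded[] is unique, then keep only unmet keys.
  const unmet = Array.from(new Set(required)).filter((key) => !advertised.has(key));
  const entry: InventoryEntryLike = { agentId: manifest.agentId };
  if (required.length > 0) entry.requiresCapabilities = required.slice();
  if (unmet.length > 0) entry.degraded = unmet;
  return entry;
}
```

Because `degraded[]` is derived by filtering the agent's own requirements, it is always a subset of `requiresCapabilities`, which is the property the final loop of the scenario checks.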
@@ -45,6 +45,10 @@ import { readDeploymentCap, driveDeploymentTransition } from '../lib/agentDeploy
45
45
 
46
46
  const FIXTURE_ID = 'conformance-agent-channel-dispatch';
47
47
  const BOUND_CHANNEL = 'stable';
48
+ /** The agentId the fixture's node binds (`agent.channel`). Leg 3 promotes THIS
49
+ * agent's channel head — not the deployment-transition seam's default sample
50
+ * agent — so the move is observable to a fresh run of the fixture. */
51
+ const BOUND_AGENT_ID = 'core.conformance.channel-agent';
48
52
  const HTTP_SKIP = !process.env.OPENWOP_BASE_URL;
49
53
 
50
54
  interface RunEventDoc {
@@ -125,12 +129,12 @@ describe.skipIf(HTTP_SKIP)('agent-channel-dispatch (RFC 0082 §B): production ru
125
129
  ),
126
130
  ).toBe(true);
127
131
  expect(
128
- started!.payload.resolvedChannel === BOUND_CHANNEL,
132
+ started!.payload.resolvedChannel,
129
133
  driver.describe(
130
134
  'agent-deployment.md §B',
131
135
  `agent.invocation.started MUST carry the bound channel as resolvedChannel ("${BOUND_CHANNEL}")`,
132
136
  ),
133
- ).toBe(true);
137
+ ).toBe(BOUND_CHANNEL);
134
138
  const pinnedVersion = started!.payload.resolvedAgentVersion;
135
139
  expect(
136
140
  typeof pinnedVersion === 'string' && (pinnedVersion as string).length > 0,
@@ -162,20 +166,29 @@ describe.skipIf(HTTP_SKIP)('agent-channel-dispatch (RFC 0082 §B): production ru
162
166
  driver.describe('agent-deployment.md §B', 'a replay fork MUST re-emit agent.invocation.started'),
163
167
  ).toBe(true);
164
168
  expect(
165
- fork1Started!.payload.resolvedAgentVersion === pinnedVersion,
169
+ fork1Started!.payload.resolvedAgentVersion,
166
170
  driver.describe(
167
171
  'agent-deployment.md §B',
168
172
  'a replay MUST re-read the recorded resolvedAgentVersion (NOT re-resolve the channel)',
169
173
  ),
170
- ).toBe(true);
174
+ ).toBe(pinnedVersion);
171
175
 
172
176
  // ---- Leg 3 (seam-guarded): move the channel, prove non-re-resolution -
173
177
  // The strongest form of §B: after the original pin, MOVE `stable` to a new
174
178
  // active version via the optional deployment seam. A replay fork of the
175
179
  // ORIGINAL run MUST still carry the ORIGINAL pin — proving the host re-reads
176
180
  // the recorded fact rather than re-resolving the (now-moved) channel.
181
+ //
182
+ // TEST ISOLATION: this promotes the bound agent's `stable` head via the
183
+ // conformance-only seam and does NOT roll it back — the seam exposes no
184
+ // rollback primitive (see agentDeployment.ts: scenarios are promote /
185
+ // unauthorized / eval-gate-unmet / channel-pin, no `rollback`). Scenarios are
186
+ // independent and the seam is conformance-only (404/403 in production), so the
187
+ // un-restored head is benign; hosts SHOULD nonetheless run the suite against an
188
+ // isolated/ephemeral deployment store rather than shared production state.
177
189
  const moved = await driveDeploymentTransition({
178
190
  scenario: 'promote',
191
+ agentId: BOUND_AGENT_ID,
179
192
  channel: BOUND_CHANNEL,
180
193
  });
181
194
  if (moved === null) {
@@ -212,18 +225,18 @@ describe.skipIf(HTTP_SKIP)('agent-channel-dispatch (RFC 0082 §B): production ru
212
225
  await pollUntilTerminal(fork2RunId, { timeoutMs: 15_000 });
213
226
  const fork2Started = await firstInvocationStarted(fork2RunId);
214
227
  expect(
215
- fork2Started?.payload.resolvedAgentVersion === pinnedVersion,
228
+ fork2Started?.payload.resolvedAgentVersion,
216
229
  driver.describe(
217
230
  'agent-deployment.md §B',
218
231
  'after the channel moves, a replay of the original run MUST still carry the ORIGINAL pin — never re-resolving the moved channel',
219
232
  ),
220
- ).toBe(true);
233
+ ).toBe(pinnedVersion);
221
234
  expect(
222
- fork2Started?.payload.resolvedAgentVersion !== movedVersion,
235
+ fork2Started?.payload.resolvedAgentVersion,
223
236
  driver.describe(
224
237
  'agent-deployment.md §B',
225
238
  'a replay MUST NOT resolve to the post-move version (proves the recorded fact is re-read, not re-resolved)',
226
239
  ),
227
- ).toBe(true);
240
+ ).not.toBe(movedVersion);
228
241
  });
229
242
  });
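Legs 2 and 3 above rest on the difference between resolving a channel head and re-reading a recorded pin. As a rough sketch under stated assumptions, the in-memory model below shows why a replay stays on the original version after the channel moves. `ChannelHeads`, `startRun` and `replayRun` are hypothetical names; the real host API and storage are not shown in this diff.

```typescript
// Hypothetical in-memory model; names are illustrative, not the host API.
type ChannelHeads = Map<string, string>; // "agentId@channel" -> active version

interface InvocationStartedLike {
  resolvedChannel: string;
  resolvedAgentVersion: string;
}

function headKey(agentId: string, channel: string): string {
  return `${agentId}@${channel}`;
}

/** Fresh run: resolve the channel head now and record it as a fact. */
function startRun(heads: ChannelHeads, agentId: string, channel: string): InvocationStartedLike {
  const version = heads.get(headKey(agentId, channel));
  if (version === undefined) {
    throw new Error(`no active version for ${headKey(agentId, channel)}`);
  }
  return { resolvedChannel: channel, resolvedAgentVersion: version };
}

/** Replay fork: copy the recorded pin and never consult the channel heads. */
function replayRun(recorded: InvocationStartedLike): InvocationStartedLike {
  return {
    resolvedChannel: recorded.resolvedChannel,
    resolvedAgentVersion: recorded.resolvedAgentVersion,
  };
}
```

In this model, a fresh run started after a promote observes the new head, while `replayRun` cannot observe it at all because it takes no `ChannelHeads` argument. That structural separation is one way to make the "re-read, not re-resolve" rule hard to violate.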
@@ -0,0 +1,64 @@
1
+ /**
2
+ * Agent-level capability requirements (RFC 0092).
3
+ *
4
+ * Always-on, server-free schema-shape probe of the additive
5
+ * `AgentManifest.requiresCapabilities[]`: a manifest WITH it validates, a
6
+ * manifest WITHOUT it still validates (absent ⇒ no requirements), and a
7
+ * non-string-array value is rejected.
8
+ *
9
+ * The degraded-projection behavior (an unmet key surfaced in the inventory
10
+ * `degraded[]`) is gated on `agents.manifestRuntime` and lands with a reference
11
+ * host (RFC 0092 §Conformance — deferred to Active → Accepted).
12
+ *
13
+ * Spec references:
14
+ * - https://github.com/openwop/openwop/blob/main/RFCS/0092-agent-capability-requirements.md
15
+ * - https://github.com/openwop/openwop/blob/main/RFCS/0072-agent-inventory-and-dispatch.md (the degraded[] marker)
16
+ */
17
+
18
+ import { describe, it, expect } from 'vitest';
19
+ import { readFileSync, readdirSync } from 'node:fs';
20
+ import { join } from 'node:path';
21
+ import Ajv2020 from 'ajv/dist/2020.js';
22
+ import addFormats from 'ajv-formats';
23
+ import { SCHEMAS_DIR } from '../lib/paths.js';
24
+
25
+ const BASE = 'https://openwop.dev/spec/v1/';
26
+ const why = (specRef: string, requirement: string): string => `${specRef} — ${requirement}`;
27
+ function loadSchema(name: string): Record<string, unknown> {
28
+ return JSON.parse(readFileSync(join(SCHEMAS_DIR, name), 'utf8')) as Record<string, unknown>;
29
+ }
30
+
31
+ describe('agent-requires-capabilities-shape: AgentManifest.requiresCapabilities (RFC 0092 §A, server-free)', () => {
32
+ const ajv = new Ajv2020({ strict: false, allErrors: true });
33
+ addFormats(ajv);
34
+ for (const f of readdirSync(SCHEMAS_DIR)) {
35
+ if (f.endsWith('.schema.json')) {
36
+ try {
37
+ ajv.addSchema(loadSchema(f));
38
+ } catch {
39
+ /* duplicate/ignore */
40
+ }
41
+ }
42
+ }
43
+ const manifest = ajv.getSchema(`${BASE}agent-manifest.schema.json`)!;
44
+
45
+ const base = { agentId: 'core.openwop.agents.demo', persona: 'Demo', modelClass: 'general', systemPrompt: 'do it' };
46
+
47
+ it('a manifest WITH requiresCapabilities validates', () => {
48
+ expect(
49
+ manifest({ ...base, requiresCapabilities: ['host.workspace', 'aiProviders.toolCalling'] }),
50
+ why('RFC 0092 §A', 'a manifest declaring requiresCapabilities MUST validate'),
51
+ ).toBe(true);
52
+ });
53
+
54
+ it('a manifest WITHOUT it still validates (absent ⇒ no requirements)', () => {
55
+ expect(manifest(base), why('RFC 0092 §A', 'requiresCapabilities is optional')).toBe(true);
56
+ });
57
+
58
+ it('rejects a non-string-array value', () => {
59
+ expect(
60
+ manifest({ ...base, requiresCapabilities: [123] }),
61
+ why('RFC 0092 §A', 'requiresCapabilities items MUST be non-empty strings'),
62
+ ).toBe(false);
63
+ });
64
+ });
@@ -0,0 +1,106 @@
1
+ /**
2
+ * Agent verifier turn + convergence criteria (RFC 0090).
3
+ *
4
+ * Always-on, server-free schema-shape probe. Verifies:
5
+ * - the `agentVerified` payload $def validates a content-free verdict and
6
+ * rejects an out-of-enum `verdict` and a content-carrying payload
7
+ * (additionalProperties:false — `verifier-no-content-leak`);
8
+ * - `agent.verified` appears in the RunEventType enum;
9
+ * - the `terminate` OrchestratorDecision accepts the additive `successCriteria`;
10
+ * - `capabilities.multiAgent.executionModel` accepts `version: 6` + the
11
+ * `verifier { supported, gating }` sub-block (and rejects version 7).
12
+ *
13
+ * Behavioral assertions (a `verifier.gating` host blocking a merge on `fail`)
14
+ * are gated on `capabilities.multiAgent.executionModel.verifier.gating` and land
15
+ * with a reference host (RFC 0090 §Conformance — deferred to Active → Accepted).
16
+ *
17
+ * Spec references:
18
+ * - https://github.com/openwop/openwop/blob/main/RFCS/0090-agent-verifier-and-convergence.md
19
+ * - https://github.com/openwop/openwop/blob/main/spec/v1/multi-agent-execution.md
20
+ */
21
+
22
+ import { describe, it, expect } from 'vitest';
23
+ import { readFileSync, readdirSync } from 'node:fs';
24
+ import { join } from 'node:path';
25
+ import Ajv2020 from 'ajv/dist/2020.js';
26
+ import addFormats from 'ajv-formats';
27
+ import { SCHEMAS_DIR } from '../lib/paths.js';
28
+
29
+ const BASE = 'https://openwop.dev/spec/v1/';
30
+ const why = (specRef: string, requirement: string): string => `${specRef} — ${requirement}`;
31
+ function loadSchema(name: string): Record<string, unknown> {
32
+ return JSON.parse(readFileSync(join(SCHEMAS_DIR, name), 'utf8')) as Record<string, unknown>;
33
+ }
34
+ /** Register every corpus schema so relative cross-file $refs resolve. */
35
+ function newAjvWithCorpus(): Ajv2020 {
36
+ const ajv = new Ajv2020({ strict: false, allErrors: true });
37
+ addFormats(ajv);
38
+ for (const f of readdirSync(SCHEMAS_DIR)) {
39
+ if (f.endsWith('.schema.json')) {
40
+ try {
41
+ ajv.addSchema(loadSchema(f));
42
+ } catch {
43
+ /* duplicate/ignore */
44
+ }
45
+ }
46
+ }
47
+ return ajv;
48
+ }
49
+
50
+ describe('agent-verifier-shape: agent.verified payload (RFC 0090 §A, server-free)', () => {
51
+ const ajv = newAjvWithCorpus();
52
+ const verified = ajv.getSchema(`${BASE}run-event-payloads.schema.json#/$defs/agentVerified`);
53
+
54
+ it('a conforming content-free verdict validates', () => {
55
+ expect(verified, 'the agentVerified $def MUST exist').toBeTruthy();
56
+ expect(
57
+ verified!({ agentId: 'core.openwop.verifier', target: 'evt-42', verdict: 'pass', criteria: ['grounded'], confidence: 0.9 }),
58
+ why('RFC 0090 §A', 'a conforming agent.verified payload MUST validate'),
59
+ ).toBe(true);
60
+ });
61
+
62
+ it('rejects an out-of-enum verdict', () => {
63
+ expect(verified!({ agentId: 'c', target: 'e', verdict: 'ok' }), why('RFC 0090 §A', 'verdict MUST be pass|fail|revise')).toBe(false);
64
+ });
65
+
66
+ it('rejects a content-carrying payload (verifier-no-content-leak)', () => {
67
+ expect(
68
+ verified!({ agentId: 'c', target: 'e', verdict: 'fail', result: 'the secret answer' }),
69
+ why('RFC 0090 §SECURITY', 'agent.verified MUST be content-free (additionalProperties:false)'),
70
+ ).toBe(false);
71
+ });
72
+ });
73
+
74
+ describe('agent-verifier-shape: RunEventType + terminate + capability (RFC 0090)', () => {
75
+ const ajv = newAjvWithCorpus();
76
+
77
+ it('agent.verified is registered in the RunEventType enum', () => {
78
+ const runEvent = loadSchema('run-event.schema.json') as { $defs?: { RunEventType?: { enum?: string[] } } };
79
+ expect(
80
+ runEvent.$defs?.RunEventType?.enum?.includes('agent.verified'),
81
+ why('RFC 0090 §A', 'agent.verified MUST appear in the RunEventType enum'),
82
+ ).toBe(true);
83
+ });
84
+
85
+ it('the terminate decision accepts the additive successCriteria', () => {
86
+ const decision = ajv.getSchema(`${BASE}orchestrator-decision.schema.json`)!;
87
+ expect(
88
+ decision({ kind: 'terminate', reason: 'goal-reached', successCriteria: [{ key: 'goal-answered', met: true }] }),
89
+ why('RFC 0090 §C', 'terminate MUST accept successCriteria[{key,met}]'),
90
+ ).toBe(true);
91
+ expect(
92
+ decision({ kind: 'terminate', successCriteria: [{ key: 'x' }] }),
93
+ why('RFC 0090 §C', 'a successCriteria entry MUST require both key and met'),
94
+ ).toBe(false);
95
+ });
96
+
97
+ it('capabilities accepts executionModel.version 6 + verifier sub-block', () => {
98
+ const execModel = ajv.getSchema(`${BASE}capabilities.schema.json#/properties/multiAgent/properties/executionModel`);
99
+ expect(execModel, 'the executionModel sub-schema MUST exist').toBeTruthy();
100
+ expect(
101
+ execModel!({ supported: true, version: 6, verifier: { supported: true, gating: true } }),
102
+ why('RFC 0090 §D', 'version:6 + verifier{supported,gating} MUST validate'),
103
+ ).toBe(true);
104
+ expect(execModel!({ supported: true, version: 7 }), why('RFC 0090 §D', 'version above the ceiling MUST be rejected')).toBe(false);
105
+ });
106
+ });
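The shape probes above cover only the wire format. The schema descriptions also state two behavioral rules: how a gating host reacts to a verdict, and how a consumer should read `successCriteria`. The sketch below illustrates both under assumptions: `nextStepFor` and `isGiveUp` are hypothetical helpers, the `NextStep` labels are invented for illustration, and `revise` is treated as always routing back on a gating host even though the schema text words that as SHOULD.

```typescript
// Hypothetical helpers; illustrative only, not part of the conformance suite.
type Verdict = 'pass' | 'fail' | 'revise';
type NextStep = 'commit' | 'block' | 'route-to-actor';

interface VerifierCapabilityLike {
  supported: boolean;
  gating?: boolean;
}

/** Map a verdict to the host's next step, honoring the gating flag. */
function nextStepFor(verdict: Verdict, verifier?: VerifierCapabilityLike): NextStep {
  // Absent or false gating: the verdict is observability-only.
  if (!verifier || !verifier.supported || verifier.gating !== true) return 'commit';
  if (verdict === 'fail') return 'block'; // fail-closed on a gating host
  if (verdict === 'revise') return 'route-to-actor';
  return 'commit';
}

interface SuccessCriterionLike {
  key: string;
  met: boolean;
}

// Only the fields this sketch needs; the real TerminateDecision has more.
interface TerminateDecisionLike {
  kind: 'terminate';
  reason?: string;
  successCriteria?: SuccessCriterionLike[];
}

/** True when successCriteria is present and at least one entry is unmet. */
function isGiveUp(decision: TerminateDecisionLike): boolean {
  return (decision.successCriteria ?? []).some((c) => !c.met);
}
```

A consumer following the `successCriteria` description would check `isGiveUp` before counting a terminated run as a success. When the field is absent the function returns `false`, leaving the free-text `reason` semantics unchanged.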
@@ -0,0 +1,69 @@
1
+ /**
2
+ * Multimodal perception input (RFC 0091).
3
+ *
4
+ * Always-on, server-free schema-shape probe of the additive
5
+ * `capabilities.aiProviders.input` block: the `modalities` enum is closed
6
+ * (text/image/audio/document) and `maxBytesPerPart` is a positive integer.
7
+ *
8
+ * Behavioral assertions (a host accepting an image ContentPart on callAI and
9
+ * rejecting an unadvertised modality with `unsupported_modality`) are gated on
10
+ * `aiProviders.input.modalities` and land with a reference host
11
+ * (RFC 0091 §Conformance — deferred to Active → Accepted).
12
+ *
13
+ * Spec references:
14
+ * - https://github.com/openwop/openwop/blob/main/RFCS/0091-multimodal-perception-input.md
15
+ * - https://github.com/openwop/openwop/blob/main/spec/v1/host-capabilities.md
16
+ */
17
+
18
+ import { describe, it, expect } from 'vitest';
19
+ import { readFileSync, readdirSync } from 'node:fs';
20
+ import { join } from 'node:path';
21
+ import Ajv2020 from 'ajv/dist/2020.js';
22
+ import addFormats from 'ajv-formats';
23
+ import { SCHEMAS_DIR } from '../lib/paths.js';
24
+
25
+ const BASE = 'https://openwop.dev/spec/v1/';
26
+ const why = (specRef: string, requirement: string): string => `${specRef} — ${requirement}`;
27
+ function loadSchema(name: string): Record<string, unknown> {
28
+ return JSON.parse(readFileSync(join(SCHEMAS_DIR, name), 'utf8')) as Record<string, unknown>;
29
+ }
30
+
31
+ describe('aiproviders-input-shape: callAI perception input (RFC 0091 §B, server-free)', () => {
32
+ const ajv = new Ajv2020({ strict: false, allErrors: true });
33
+ addFormats(ajv);
34
+ for (const f of readdirSync(SCHEMAS_DIR)) {
35
+ if (f.endsWith('.schema.json')) {
36
+ try {
37
+ ajv.addSchema(loadSchema(f));
38
+ } catch {
39
+ /* duplicate/ignore */
40
+ }
41
+ }
42
+ }
43
+ const input = ajv.getSchema(`${BASE}capabilities.schema.json#/properties/aiProviders/properties/input`);
44
+
45
+ it('the aiProviders.input sub-schema exists', () => {
46
+ expect(input, 'capabilities.aiProviders.input MUST be declared').toBeTruthy();
47
+ });
48
+
49
+ it('accepts a conforming modalities advertisement', () => {
50
+ expect(
51
+ input!({ modalities: ['text', 'image', 'document'], maxBytesPerPart: 1048576 }),
52
+ why('RFC 0091 §B', 'a conforming input advertisement MUST validate'),
53
+ ).toBe(true);
54
+ });
55
+
56
+ it('rejects an out-of-enum modality', () => {
57
+ expect(
58
+ input!({ modalities: ['text', 'video'] }),
59
+ why('RFC 0091 §B', 'modalities is a closed enum (text/image/audio/document)'),
60
+ ).toBe(false);
61
+ });
62
+
63
+ it('rejects a non-positive maxBytesPerPart', () => {
64
+ expect(
65
+ input!({ modalities: ['text'], maxBytesPerPart: 0 }),
66
+ why('RFC 0091 §B', 'maxBytesPerPart MUST be a positive integer'),
67
+ ).toBe(false);
68
+ });
69
+ });
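The `modalities` description pairs the closed enum with a runtime rule: `text` is implicit, and a ContentPart whose `type` is not advertised must be rejected with `unsupported_modality` rather than dropped. A minimal sketch of that check follows. The `ContentPartLike` shape and the `checkPart` helper are assumptions for illustration, and the `maxBytesPerPart` cap is deliberately left out because this diff names no error code for it.

```typescript
// Hypothetical host-side check; the ContentPart shape here is an assumption.
type Modality = 'text' | 'image' | 'audio' | 'document';

interface ContentPartLike {
  type: string;
}

interface InputCapabilityLike {
  modalities?: Modality[];
  maxBytesPerPart?: number; // not enforced in this sketch
}

interface PartCheck {
  ok: boolean;
  code?: string;
}

/** Accept a part only when its type is advertised (text is always accepted). */
function checkPart(part: ContentPartLike, input?: InputCapabilityLike): PartCheck {
  // An absent `input` block means text-only; `text` is implicit even when omitted.
  const accepted = new Set<string>(['text']);
  for (const m of input?.modalities ?? []) accepted.add(m);
  if (!accepted.has(part.type)) {
    return { ok: false, code: 'unsupported_modality' };
  }
  return { ok: true };
}
```

Returning an explicit refusal for every unadvertised type, including types outside the enum such as `video`, keeps the host from silently discarding input that the caller believed was delivered.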
@@ -0,0 +1,86 @@
+ /**
+  * Multimodal perception input on callAI (RFC 0091 §A/§B) — behavioral.
+  *
+  * Gated on `capabilities.aiProviders.input.modalities` including a non-text
+  * modality (root-first per RFC 0073). Soft-skips when unadvertised (default) /
+  * hard-fails under `OPENWOP_REQUIRE_BEHAVIOR=true`. The always-on wire-shape
+  * coverage lives in `aiproviders-input-shape.test.ts`; this asserts host
+  * BEHAVIOR via the documented host-sample callAI seam
+  * `POST /v1/host/sample/ai/call` (soft-skips on 404 until a host wires it):
+  *
+  *   - an ADVERTISED non-text modality (e.g. an `image` ContentPart) is ACCEPTED
+  *     (not rejected as unsupported);
+  *   - an UNADVERTISED modality MUST be rejected with the canonical
+  *     `unsupported_modality` error — never silently dropped (RFC 0091 §A).
+  *
+  * Spec references:
+  *   - https://github.com/openwop/openwop/blob/main/RFCS/0091-multimodal-perception-input.md
+  *   - https://github.com/openwop/openwop/blob/main/spec/v1/host-capabilities.md (§host.aiProviders)
+  */
+
+ import { describe, it, expect } from 'vitest';
+ import { driver } from '../lib/driver.js';
+ import { behaviorGate } from '../lib/behavior-gate.js';
+ import { readCapabilityFamily } from '../lib/discovery-capabilities.js';
+
+ const ALL_MODALITIES = ['text', 'image', 'audio', 'document'];
+
+ /** Read the canonical error code from a seam response body (tolerant of
+  *  `{error}` / `{code}` / `{error:{code}}` shapes). */
+ function errCode(json: unknown): string | undefined {
+   const j = json as { error?: unknown; code?: unknown };
+   if (typeof j?.code === 'string') return j.code;
+   if (typeof j?.error === 'string') return j.error;
+   const e = j?.error as { code?: unknown } | undefined;
+   if (e && typeof e.code === 'string') return e.code;
+   return undefined;
+ }
+
+ const SEAM = '/v1/host/sample/ai/call';
+
+ describe('callai-multimodal (RFC 0091 §A/§B)', () => {
+   it('accepts an advertised non-text modality and rejects an unadvertised one with unsupported_modality', async () => {
+     const ai = await readCapabilityFamily<Record<string, unknown>>('aiProviders');
+     const input = ai?.input as { modalities?: unknown } | undefined;
+     const modalities = Array.isArray(input?.modalities) ? (input!.modalities as string[]) : [];
+     const advertisedNonText = modalities.filter((m) => m !== 'text');
+     // Gate on advertising at least one non-text input modality.
+     if (!behaviorGate('openwop-callai-multimodal', advertisedNonText.length > 0)) return;
+
+     const accepted = advertisedNonText[0]!; // an advertised non-text modality
+     const unadvertised = ALL_MODALITIES.find((m) => m !== 'text' && !modalities.includes(m));
+
+     // 1) An advertised modality part is ACCEPTED (not an unsupported_modality refusal).
+     const okPart =
+       accepted === 'image'
+         ? { type: 'image', mimeType: 'image/png', dataBase64: 'iVBORw0KGgo=' }
+         : accepted === 'audio'
+           ? { type: 'audio', mimeType: 'audio/mp3', dataBase64: 'AAAA' }
+           : { type: 'file', mimeType: 'application/pdf', dataBase64: 'JVBERi0=' };
+     const okRes = await driver.post(SEAM, {
+       messages: [{ role: 'user', content: [{ type: 'text', text: 'describe this' }, okPart] }],
+     });
+     if (okRes.status === 404) return; // seam unwired — soft-skip the whole behavioral suite
+     expect(
+       errCode(okRes.json) !== 'unsupported_modality',
+       driver.describe('RFC 0091 §B', `an advertised modality (${accepted}) MUST NOT be rejected as unsupported_modality`),
+     ).toBe(true);
+
+     // 2) An unadvertised modality MUST be rejected with unsupported_modality.
+     if (unadvertised) {
+       const badPart =
+         unadvertised === 'audio'
+           ? { type: 'audio', mimeType: 'audio/mp3', dataBase64: 'AAAA' }
+           : unadvertised === 'document'
+             ? { type: 'file', mimeType: 'application/pdf', dataBase64: 'JVBERi0=' }
+             : { type: 'image', mimeType: 'image/png', dataBase64: 'iVBORw0KGgo=' };
+       const badRes = await driver.post(SEAM, {
+         messages: [{ role: 'user', content: [badPart] }],
+       });
+       expect(
+         errCode(badRes.json) === 'unsupported_modality',
+         driver.describe('RFC 0091 §A', `an unadvertised modality (${unadvertised}) MUST be rejected with unsupported_modality (never silently dropped)`),
+       ).toBe(true);
+     }
+   });
+ });
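For orientation, the behavior this scenario probes can be sketched from the host's side: a request whose content parts use only advertised modalities passes, and any other part yields the canonical `unsupported_modality` code rather than being dropped. Everything below is hypothetical and not taken from a real host: the function name `checkModalities`, the mapping of a `file` part to the `document` modality (inferred from the test payloads), the decision to always accept `text`, and the `{error:{code}}` response shape (one of the three shapes `errCode` tolerates).

```typescript
// Hypothetical host-side check; names and part shapes are assumptions
// drawn from the test payloads, not from a documented host API.
type ContentPart = { type: string; [key: string]: unknown };
type Message = { role: string; content: ContentPart[] };
type CheckResult = { ok: true } | { ok: false; error: { code: string } };

// Assumption: a `file` part counts as the `document` modality.
const PART_TYPE_TO_MODALITY: Record<string, string | undefined> = {
  text: 'text',
  image: 'image',
  audio: 'audio',
  file: 'document',
};

function checkModalities(messages: Message[], advertised: string[]): CheckResult {
  for (const message of messages) {
    for (const part of message.content) {
      const modality = PART_TYPE_TO_MODALITY[part.type];
      // Assumption: text is always accepted, whether or not it is advertised.
      if (modality === 'text') continue;
      // Unknown part types and unadvertised modalities are refused, never dropped.
      if (modality === undefined || !advertised.includes(modality)) {
        return { ok: false, error: { code: 'unsupported_modality' } };
      }
    }
  }
  return { ok: true };
}
```

With `advertised = ['text', 'image']`, a text-plus-image message passes and an audio-only message is refused, which mirrors the two halves of the scenario.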
@@ -29,6 +29,7 @@
  import { describe, it, expect } from 'vitest';
  import { driver } from '../lib/driver.js';
  import { experimentalGate } from '../lib/behavior-gate.js';
+ import { __resetEnvCacheForTests } from '../lib/env.js';
  import { capabilityFamily } from '../lib/discovery-capabilities.js';

  const HTTP_SKIP = !process.env.OPENWOP_BASE_URL;
@@ -180,6 +181,12 @@ describe.skipIf(HTTP_SKIP)('experimental-tier-shape: §D experimentalGate helper
    it('experimentalGate routes through behaviorGate when tier === undefined or "stable"', () => {
      const prevReqBeh = process.env.OPENWOP_REQUIRE_BEHAVIOR;
      delete process.env.OPENWOP_REQUIRE_BEHAVIOR;
+     // behaviorGate/experimentalGate read a memoized loadEnv() snapshot. Under a
+     // strict suite run (e.g. the conformance-soak sets OPENWOP_REQUIRE_BEHAVIOR=true
+     // process-wide) an earlier scenario has already cached requireBehavior=true, so
+     // the delete above is a no-op against the cache and the default-mode assertions
+     // below would wrongly throw. Bust the memo so this self-test sees default mode.
+     __resetEnvCacheForTests();
      try {
        // Stable + advertised → proceed.
        expect(experimentalGate('test-stable', true, 'stable')).toBe(true);
@@ -188,6 +195,10 @@ describe.skipIf(HTTP_SKIP)('experimental-tier-shape: §D experimentalGate helper
        expect(experimentalGate('test-not-adv', false, 'stable')).toBe(false);
      } finally {
        if (prevReqBeh !== undefined) process.env.OPENWOP_REQUIRE_BEHAVIOR = prevReqBeh;
+       // Restore the real env into the memo so later scenarios gate correctly (a
+       // leaked default-mode cache would turn their strict behaviorGates into
+       // silent soft-skips — a coverage hole).
+       __resetEnvCacheForTests();
      }
    });
  });
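The two `__resetEnvCacheForTests()` calls above exist because the gates read a memoized environment snapshot, so mutating the environment after the first read has no effect until the memo is cleared. The sketch below illustrates that failure mode in isolation. It is not the package's `lib/env.js`, whose internals this diff does not show: `loadEnv`, `resetEnvCache`, the `EnvSnapshot` shape, and the `fakeEnv` object standing in for `process.env` are all hypothetical.

```typescript
// Hypothetical memoized env reader; a stand-in, not the real lib/env.js.
type EnvSnapshot = { requireBehavior: boolean };

// Plain object standing in for process.env so the sketch needs no Node typings.
const fakeEnv: Record<string, string | undefined> = { OPENWOP_REQUIRE_BEHAVIOR: 'true' };

let cached: EnvSnapshot | undefined;

function loadEnv(): EnvSnapshot {
  // The first call takes the snapshot; later calls reuse it even if fakeEnv changes.
  if (cached === undefined) {
    cached = { requireBehavior: fakeEnv.OPENWOP_REQUIRE_BEHAVIOR === 'true' };
  }
  return cached;
}

function resetEnvCache(): void {
  cached = undefined;
}

const before = loadEnv().requireBehavior; // snapshot taken in strict mode
delete fakeEnv.OPENWOP_REQUIRE_BEHAVIOR;
const stale = loadEnv().requireBehavior; // the delete is invisible to the memo
resetEnvCache();
const fresh = loadEnv().requireBehavior; // re-read after busting the memo
```

Here `before` and `stale` are both `true` and only `fresh` is `false`. The second reset in the `finally` block guards against the mirror-image problem: a default-mode snapshot leaking into later strict scenarios.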
@@ -387,7 +387,19 @@ function extractReadmeDocumentIndex(readme: string): string {
  }

  function listMarkdownFilesRecursive(dir: string, repoRoot: string = dir): string[] {
-   const ignoredDirs = new Set(['.git', 'node_modules', 'dist']);
+   const ignoredDirs = new Set([
+     '.git',
+     'node_modules',
+     'dist',
+     // CI cross-repo checkouts: the host-conformance workflows (conformance-soak,
+     // postgres-host-conformance) check out openwop-examples + openwop-registry into
+     // examples-ext/ + registry-ext/ inside the workspace. Those carry their own
+     // READMEs whose links are relative to THEIR repo root (../../spec, ../../RFCS,
+     // ../../conformance, …) and don't resolve from this corpus. They're link-checked
+     // in their own repos; do not scan a vendored sibling-repo checkout here.
+     'examples-ext',
+     'registry-ext',
+   ]);
    // Repo-relative directory paths to prune. These are subtrees whose
    // content shouldn't be link-checked because either (a) they're
    // generated build output (`site/out`) or (b) they're a vendored
@@ -1484,9 +1496,15 @@ describe('spec-corpus: RFC 0089 conformance certification bundle + binding rule'
  // own captured discovery document AND be floor-proven. The bundle lives in
  // `examples/`, which is NOT bundled into the published tarball, so this skips
  // cleanly under the published layout (V1_DIR === null).
- describe.skipIf(V1_DIR === null)('spec-corpus: RFC 0089 committed reference-host certification bundle', () => {
-   const repoRoot = V1_DIR === null ? '' : pathResolve(V1_DIR, '..', '..');
-   const bundlePath = join(repoRoot, 'examples', 'hosts', 'in-memory', 'certification-bundle.json');
+ // The committed reference-host certification bundle lives with the in-memory host,
+ // which moved to the openwop-examples repo (2026-06). When the host tree is absent
+ // (the spec corpus on its own), this committed-bundle check self-skips — it is
+ // validated in openwop-examples CI against the published @openwop/openwop-conformance
+ // verifyBundle. The sample-bundle schema + binding-rule checks above still run here.
+ const RFC0089_BUNDLE_PATH =
+   V1_DIR === null ? null : join(pathResolve(V1_DIR, '..', '..'), 'examples', 'hosts', 'in-memory', 'certification-bundle.json');
+ describe.skipIf(RFC0089_BUNDLE_PATH === null || !existsSync(RFC0089_BUNDLE_PATH))('spec-corpus: RFC 0089 committed reference-host certification bundle', () => {
+   const bundlePath = RFC0089_BUNDLE_PATH as string;

    const ajv = new Ajv2020({ allErrors: true, strict: false });
    addFormats(ajv);
@@ -0,0 +1,73 @@
+ /**
+  * Verifier turn + gating (RFC 0090 §A/§B) — behavioral.
+  *
+  * Gated on `capabilities.multiAgent.executionModel.verifier.gating === true`
+  * (root-first per RFC 0073). Soft-skips when unadvertised (default) / hard-fails
+  * under `OPENWOP_REQUIRE_BEHAVIOR=true`. The always-on wire-shape coverage lives
+  * in `agent-verifier-shape.test.ts`; this asserts host BEHAVIOR via the
+  * documented host-sample seam `POST /v1/host/sample/agents/verify-run`
+  * (soft-skips on 404 until a host wires it):
+  *
+  *   - a `fail` verdict on a gating host MUST block commit/terminate-as-success:
+  *     the turn does NOT complete successfully, and a content-free `agent.verified`
+  *     with `verdict: "fail"` is emitted (RFC 0090 §A/§B; composes the RFC 0063
+  *     fail-closed merge gate);
+  *   - a `pass` verdict completes normally.
+  *
+  * Spec references:
+  *   - https://github.com/openwop/openwop/blob/main/RFCS/0090-agent-verifier-and-convergence.md
+  *   - https://github.com/openwop/openwop/blob/main/spec/v1/multi-agent-execution.md (§"Verifier and convergence")
+  */
+
+ import { describe, it, expect } from 'vitest';
+ import { driver } from '../lib/driver.js';
+ import { behaviorGate } from '../lib/behavior-gate.js';
+ import { readCapabilityFamily } from '../lib/discovery-capabilities.js';
+
+ const SEAM = '/v1/host/sample/agents/verify-run';
+
+ /** Pull the agent.verified verdict out of a seam response's events[] (tolerant). */
+ function verifiedVerdict(json: unknown): string | undefined {
+   const events = (json as { events?: Array<{ type?: string; verdict?: unknown }> })?.events ?? [];
+   const v = events.find((e) => e?.type === 'agent.verified');
+   return typeof v?.verdict === 'string' ? v.verdict : undefined;
+ }
+
+ /** Did the turn complete as a success? (tolerant of status/outcome shapes). */
+ function isSuccess(json: unknown): boolean {
+   const j = json as { status?: unknown; outcome?: unknown; committed?: unknown };
+   return j?.status === 'completed' || j?.outcome === 'completed' || j?.committed === true;
+ }
+
+ describe('verifier-gating (RFC 0090 §B)', () => {
+   it('a fail verdict blocks commit on a gating host; a pass verdict completes', async () => {
+     const ma = await readCapabilityFamily<Record<string, unknown>>('multiAgent');
+     const em = ma?.executionModel as { verifier?: { gating?: unknown } } | undefined;
+     const gating = em?.verifier?.gating === true;
+     if (!behaviorGate('openwop-verifier-gating', gating)) return;
+
+     // FAIL verdict → must NOT complete as success; agent.verified{fail} emitted.
+     const failRes = await driver.post(SEAM, { simulateVerdict: 'fail' });
+     if (failRes.status === 404) return; // seam unwired — soft-skip
+     expect(
+       verifiedVerdict(failRes.json) === 'fail',
+       driver.describe('RFC 0090 §A', 'a verify-run forcing a fail MUST emit agent.verified{verdict:"fail"}'),
+     ).toBe(true);
+     expect(
+       !isSuccess(failRes.json),
+       driver.describe('RFC 0090 §B', 'on a gating host a fail verdict MUST block commit/terminate-as-success'),
+     ).toBe(true);
+
+     // PASS verdict → completes normally.
+     const passRes = await driver.post(SEAM, { simulateVerdict: 'pass' });
+     if (passRes.status === 404) return;
+     expect(
+       verifiedVerdict(passRes.json) === 'pass',
+       driver.describe('RFC 0090 §A', 'a verify-run forcing a pass MUST emit agent.verified{verdict:"pass"}'),
+     ).toBe(true);
+     expect(
+       isSuccess(passRes.json),
+       driver.describe('RFC 0090 §B', 'a pass verdict MUST allow the turn to complete'),
+     ).toBe(true);
+   });
+ });
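To make the expected seam behavior concrete, here is a minimal sketch of a response that would satisfy both halves of the scenario above. It is not a documented contract. The handler name `verifyRun`, the `'failed'` status value, and the exact response shape are assumptions inferred from the tolerant readers `verifiedVerdict` and `isSuccess`; only the `simulateVerdict` request field and the `agent.verified` event type come from the test itself. The sketch models only the `pass` and `fail` verdicts exercised here, and letting a non-gating host complete on `fail` is likewise an assumption.

```typescript
// Hypothetical host-side handler; request/response shapes are assumptions
// inferred from the tolerant readers in the test, not a documented contract.
type Verdict = 'pass' | 'fail';
type VerifyRunResponse = {
  status: 'completed' | 'failed';
  events: Array<{ type: 'agent.verified'; verdict: Verdict }>;
};

function verifyRun(request: { simulateVerdict: Verdict }, gating: boolean): VerifyRunResponse {
  const verdict = request.simulateVerdict;
  // The event carries only the verdict, keeping the payload content-free.
  const events = [{ type: 'agent.verified' as const, verdict }];
  // On a gating host a fail verdict blocks completion; otherwise the turn completes.
  const blocked = gating && verdict === 'fail';
  return { status: blocked ? 'failed' : 'completed', events };
}
```

On a gating host, `verifyRun({ simulateVerdict: 'fail' }, true)` reports a non-success status alongside the `fail` event, while the `pass` case completes.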