@openwop/openwop-conformance 1.16.0 → 1.18.1

package/CHANGELOG.md CHANGED
@@ -4,6 +4,38 @@
 
  _No unreleased changes._
 
+ ## [1.18.1] — 2026-06-01 — RFC 0050 SAML behavioral leg: full 7-variant seam coverage
+
+ Standalone conformance patch — published via the `openwop-conformance/v1.18.1` per-package tag (PUBLISHING.md §"CI automation"; only the `publish-conformance` job runs), NOT a coordinated spec-corpus release. `EXPECTED_CONFORMANCE_VERSION` advances to `1.18.1` in lockstep. No new scenario file → patch bump.
+
+ ### Changed — RFC 0050 `auth-saml-profile` opt-in behavioral leg
+
+ Expanded the opt-in `auth/saml/validate` behavioral leg from a single `alg-none` negative to the **full RFC 0050 §A variant set driven over the host's live seam**: 1 positive (valid, signed, in-window, non-wrapped → MUST be accepted, 2xx) + 6 negatives (`alg-none`, `unsigned`, `bad-signature`, `expired`, `not-yet-valid`, `signature-wrapping` → MUST be rejected, non-2xx). The signature-wrapping (XSW) case is the load-bearing security property. This makes the behavioral cert **non-vacuous against a host's real XML-DSig ACS** (distinct from the in-process reference suite, which proves the assertions detectably malformed against the bundled oracle, not the host). Still gated on `OPENWOP_TEST_SAML_IDP_URL` (an operator-supplied HTTP synthetic IdP serving the bundled `createSyntheticSamlIdp()` assertions) + the `openwop-auth-saml` advertisement; soft-skips otherwise. The steward prerequisite for graduating RFC 0050 `Active → Accepted` on a host with a real SAML ACS. Scenario docstring updated `Draft → Active`.
+
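The seven-variant set described above can be read as an expectation table. The following is a minimal sketch, not the package's scenario code: the six negative variant names and the 2xx / non-2xx rule come from the changelog entry, while the `"valid"` label, `MUST_ACCEPT`, and `variantPasses` are hypothetical names introduced for illustration.

```typescript
// Hypothetical sketch. Only the negative variant names and the accept/reject
// rule come from the changelog; every identifier below is illustrative.
type SamlVariant =
  | "valid"
  | "alg-none"
  | "unsigned"
  | "bad-signature"
  | "expired"
  | "not-yet-valid"
  | "signature-wrapping";

// true  => the host MUST accept the assertion (2xx)
// false => the host MUST reject the assertion (non-2xx)
const MUST_ACCEPT: Record<SamlVariant, boolean> = {
  "valid": true,
  "alg-none": false,
  "unsigned": false,
  "bad-signature": false,
  "expired": false,
  "not-yet-valid": false,
  "signature-wrapping": false,
};

const is2xx = (status: number): boolean => status >= 200 && status < 300;

// Returns true when the observed HTTP status matches what the variant requires.
function variantPasses(variant: SamlVariant, status: number): boolean {
  return MUST_ACCEPT[variant] ? is2xx(status) : !is2xx(status);
}
```

Under this reading, a host that returns 2xx for the `signature-wrapping` variant fails the leg even if it rejects the other five negatives.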
+ ## [1.18.0] — 2026-06-01 — RFC 0085 openwop-agent-platform live aggregate-evidence gate
+
+ Standalone conformance minor — a scenario addition published via the `openwop-conformance/v1.18.0` per-package tag (PUBLISHING.md §"CI automation"; only the `publish-conformance` job runs), NOT a coordinated spec-corpus release. `EXPECTED_CONFORMANCE_VERSION` advances to `1.18.0` in lockstep. The steward prerequisite that lets MyndHyve run the RFC 0085 §C live aggregate-evidence scenario non-vacuously under `OPENWOP_REQUIRE_BEHAVIOR=true` to graduate the `openwop-agent-platform` meta-profile from `Active` to `Accepted` — the capstone of the agent-platform program. The normative surface (the `nondeterminismPolicy.declared` flag + the `isAgentPlatform*` predicate helpers + the operational annex) already shipped — this release is the gated test surface only.
+
+ ### Added — RFC 0085 live aggregate-evidence scenario
+
+ - **`agent-platform-aggregate-evidence.test.ts`** (`behaviorGate('openwop-agent-platform', …)`, gated on a host CLAIMING `openwop-agent-platform` in its live discovery `profiles[]`) — the RFC 0085 §C `Active → Accepted` bar. Reads the live `/.well-known/openwop` and asserts the §C/§D honest-advertisement rule: a host MAY advertise `openwop-agent-platform` only if its real wire satisfies the §B floor predicate (`isAgentPlatformPartial`), deriving to `partial` or `full` (never `none`) — the platform claim is **backed by** the per-capability evidence (each constituent cap's gated scenario runs in the same suite run), never the profile string alone. When the operator declares the cert tier `full` (`OPENWOP_AGENT_PLATFORM_TIER=full`), the full predicate (authorization + tenant installScope + memory.attribution + debugBundle + triggerBridge + egressPolicy) MUST hold + all 16 §D terms satisfied. Server-requiring — the always-on §B/§D predicate-derivation legs stay in `agent-platform-profile.test.ts`. This is the RFC 0085 → Accepted bar.
+
+ Additive + capability-gated; existing v1.0-only hosts pass unchanged. No new schemas (the `nondeterminismPolicy.declared` flag + the `isAgentPlatform*` predicate helpers shipped at `Draft → Active`).
+
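The honest-advertisement rule in the 1.18.0 entry can be sketched as a two-step check: derive a tier from the host's real wire, then compare it against the claim. This is a hedged illustration only. The package's `isAgentPlatform*` helpers are not shown in this diff, so `deriveTier` and `claimIsBacked` are hypothetical names, the two predicates are reduced to boolean inputs, and the "all 16 §D terms" requirement for the `full` tier is not modeled.

```typescript
// Hypothetical sketch; not the package's isAgentPlatform* helpers.
type Tier = "none" | "partial" | "full";

// floorHolds:    assumed result of the §B floor predicate on the live wire.
// fullTermsHold: assumed result of the additional full-tier terms
//                (authorization, tenant installScope, memory.attribution,
//                debugBundle, triggerBridge, egressPolicy).
function deriveTier(floorHolds: boolean, fullTermsHold: boolean): Tier {
  if (!floorHolds) return "none";
  return fullTermsHold ? "full" : "partial";
}

// A host claiming the profile must derive to partial or full (never none).
// If the operator declares the cert tier "full", the derived tier must be full.
function claimIsBacked(derived: Tier, operatorDeclaresFull: boolean): boolean {
  if (derived === "none") return false;
  return operatorDeclaresFull ? derived === "full" : true;
}
```

The design point is that the profile string alone never decides the outcome; the derived tier does.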
+ ## [1.17.0] — 2026-06-01 — harden the agent-platform behavioral gates against vacuous passes
+
+ Standalone conformance minor — a strictness fix to three existing capability-gated behavioral scenarios published via the `openwop-conformance/v1.17.0` per-package tag (PUBLISHING.md §"CI automation"). `EXPECTED_CONFORMANCE_VERSION` advances to `1.17.0` in lockstep. **No new scenario files** (count unchanged); this tightens the assertions inside three scenarios so a host that advertises the capability + wires the seam but emits **no evidence** now FAILS rather than vacuously passing. Addresses an independent audit finding that these gates "could pass without proving the behavior they claim."
+
+ The fix preserves every legitimate soft-skip (capability/profile not advertised, event-log seam absent, drive-seam unwired → the host hasn't opted in). What changes is that **once a host has opted in and returned a runId**, missing evidence is a hard failure.
+
+ ### Changed — agent-platform behavioral gates (RFC 0081 / 0082 / 0083)
+
+ - **`agent-eval-run.test.ts`** (RFC 0081) — previously the entire `eval.*` ordering/content block was wrapped in `if (startedEvents.length > 0)` and the `EvalSummary` read in `if (status === 200)`, so a host emitting nothing (or a non-200 summary) passed. Now: a wired eval-run MUST return a runId; the event-log seam MUST return the events; **exactly one** `eval.started`, **≥1** `eval.scored`, **exactly one** `eval.completed`; the ordering + per-task count + content-free + `score∈0..1` checks run unconditionally; and `GET /v1/runs/{runId}/eval-summary` MUST serve a **200** schema-valid `EvalSummary`.
+ - **`agent-deployment-lifecycle.test.ts`** (RFC 0082) — previously the positive promote could pass with no record/runId and no `deployment.promoted`; the denial/channel-pin legs were conditionally checked. Now: a promote MUST return a runId + a schema-valid record + emit **≥1** `deployment.promoted`; the unauthorized + eval-gate-unmet legs MUST return a runId, be denied, and emit **zero** `deployment.promoted`; the channel-pin leg MUST emit `agent.invocation.started` carrying `resolvedAgentVersion`.
+ - **`trigger-bridge-delivery.test.ts`** (RFC 0083) — previously dedup checked `≤1` delivered (zero passed) and causation only checked that a causationId *existed*. Now: dedup MUST be **exactly one** delivered attempt; exhaustion MUST emit a terminal `dead-lettered` delivery + a `dead-lettered` subscription transition; and the delivered run's `run.started.causationId` MUST **equal the delivery id** (the `trigger.delivery.attempted{delivered}` event's `eventId`) per trigger-bridge.md §C — not merely be non-empty.
+
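To illustrate why the old `if (startedEvents.length > 0)` guard allowed vacuous passes, here is a minimal sketch of the unconditional count checks described for `agent-eval-run.test.ts`. The event shape is a simplified assumption (the real payload schemas are not shown in this diff), and `checkEvalEvents` is a hypothetical name, not a function from the package.

```typescript
// Hypothetical sketch with a simplified event shape; not the scenario's code.
interface EvalEvent {
  type: string;
  score?: number; // assumed to be present on eval.scored only
}

// Returns a list of problems; an empty list means the evidence is sufficient.
function checkEvalEvents(events: EvalEvent[]): string[] {
  const problems: string[] = [];
  const count = (t: string): number => events.filter((e) => e.type === t).length;

  // These run unconditionally, so an empty event list is a failure.
  if (count("eval.started") !== 1) problems.push("expected exactly one eval.started");
  if (count("eval.scored") < 1) problems.push("expected at least one eval.scored");
  if (count("eval.completed") !== 1) problems.push("expected exactly one eval.completed");

  // Ordering: the first eval.* event must be eval.started.
  const evalOnly = events.filter((e) => e.type.indexOf("eval.") === 0);
  if (evalOnly.length > 0 && evalOnly[0].type !== "eval.started") {
    problems.push("eval.started must come first");
  }

  // Every score must be a number in the closed range 0..1.
  for (const e of events) {
    if (e.type !== "eval.scored") continue;
    if (typeof e.score !== "number" || e.score < 0 || e.score > 1) {
      problems.push("eval.scored score must be in 0..1");
    }
  }
  return problems;
}
```

With the earlier conditional wrapping, `checkEvalEvents([])` would effectively have been skipped; here it reports three problems.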
+ New shared helper `requireEvents()` in `src/lib/event-log-query.ts` asserts a query succeeded (hard-fail, no vacuous pass) and returns the typed events. Capability-gated + additive; reference hosts that don't advertise these surfaces continue to soft-skip. MyndHyve (which already emits the full evidence) continues to pass non-vacuously.
+
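The signature of the real `requireEvents()` helper is not visible in this diff. As a rough sketch of the "hard-fail, no vacuous pass" idea, assuming a query result that carries an HTTP status and an optional event array, a helper of this kind might look like the following. `EventQueryResult` and `requireEventsSketch` are hypothetical names, and treating only status 200 as success is an assumption.

```typescript
// Hypothetical stand-in for the idea behind requireEvents(); the real helper
// in src/lib/event-log-query.ts may differ in signature and behavior.
interface EventQueryResult<E> {
  status: number;
  events?: E[];
}

// Throws when the query did not succeed, instead of letting the caller
// silently skip its assertions; returns the typed events otherwise.
function requireEventsSketch<E>(result: EventQueryResult<E>, what: string): E[] {
  const events = result.events;
  if (result.status !== 200 || events === undefined) {
    throw new Error(`event-log query for ${what} failed (status ${result.status})`);
  }
  return events;
}
```

Soft-skips for hosts that have not opted in would be decided before a helper like this is called; once a run has been driven, a failed query is treated as a test failure rather than a skip.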
  ## [1.16.0] — 2026-06-01 — RFC 0084 budget-enforcement behavioral gate
 
  Standalone conformance minor — a scenario addition published via the `openwop-conformance/v1.16.0` per-package tag (PUBLISHING.md §"CI automation"; only the `publish-conformance` job runs), NOT a coordinated spec-corpus release. `EXPECTED_CONFORMANCE_VERSION` advances to `1.16.0` in lockstep. The steward prerequisite that lets MyndHyve run the RFC 0084 §C/§D budget-enforcement scenario non-vacuously under `OPENWOP_REQUIRE_BEHAVIOR=true` to graduate `budget` from `Active` to `Accepted`. The normative surface (`budget-policy.schema.json` + the four `budget.*` events + the four `cap.breached{budget-*}` kinds) already shipped — this release is the gated test surface only.
package/README.md CHANGED
@@ -93,7 +93,7 @@ Exit code is non-zero on any failed assertion.
 
  ## What's Covered
 
- The current suite has 322 scenario files under `src/scenarios/`. 2026-06-01 (RFC 0084 — budget, quota + cost policy, the Active→Accepted behavioral gate) added `budget-enforcement.test.ts` (capability-gated on `budget.supported` via `behaviorGate('openwop-budget-enforcement', …)` — the §C/§D enforcement via the new `POST /v1/host/sample/budget/run` seam + the test event-log seam: a `hard-cost-exhaust` run emits the strict-ordered `budget.reserved → budget.consumed → budget.threshold.crossed{percent} → budget.exhausted → cap.breached{kind:"budget-cost"} → run.failed{error:"budget_exhausted"}` chain; a `model-denied` run is refused `budget_model_denied` BEFORE the provider call (fail-closed); an `advisory` host emits the `budget.*` events without stopping; every `budget.*` payload content-free backing `budget-no-pricing-leak`; new lib helper `src/lib/budgetPolicy.ts`; soft-skips on 404 — the RFC 0084 → Accepted bar). 2026-06-01 (RFC 0080 — agent memory capability reconciliation, the Active→Accepted behavioral gate) added `memory-degraded-projection.test.ts` (capability-gated on `agents.manifestRuntime.supported` + `memory.supported` via `behaviorGate('openwop-memory-degraded', …)` — the §C degraded-projection iff-contract on the NORMATIVE `GET /v1/agents`: a degraded inventory entry MUST carry `memoryDegraded:true` + a non-empty, unique `degradedMemoryDimensions[]` from the closed §A-name enum, a non-degraded entry MUST NOT, the inventory is non-empty, and the degraded branch runs non-vacuously when `OPENWOP_DEGRADED_AGENT_ID` names a known-degraded agent; black-box, no POST seam — the RFC 0080 → Accepted bar). This batch also documents the two RFC 0068 conformance seams (`POST /v1/host/sample/memory/consolidate` + `.../commitment/fire`) in `host-sample-test-seams.md` (the 0068 gated scenarios shipped in 1.14.0). 
2026-06-01 (RFC 0034 — collector-side BYOK-canary inspection) added `otel-collector-canary-inspection.test.ts` (always-on server-free: stands up a real `OtelCollector`, POSTs synthetic OTLP/HTTP-JSON traces + metrics through its actual ingest path, and proves the new `findCanaryLeakage()` inspector catches a canary embedded in a span attribute / resource attribute / span name / metric data-point attribute while reporting ZERO hits on a redacted payload and never matching an empty canary — the non-vacuous proof that the conformance collector now inspects what the host's OTLP exporter ACTUALLY shipped over the wire, closing the `secret-leakage-otel-attribute` / `-debug-bundle-otel` collector-seam gap; the live capability-gated complement is the new collector-export describe block in `secret-leakage-otel-attribute.test.ts`). 2026-06-01 (RFC 0035 — sandbox wall-clock timeout, the 7th-of-8 graduation) added `sandbox-wasm-timeout.test.ts` (worker-driven server-free: `probeTimeout` in `wasm-sandbox-probe.ts` spawns a worker thread running the committed `misbehaving-timeout.wasm` + a main-thread kill-timer — the thread preemption a same-thread probe can't do — asserting `sandbox_timeout` with a well-behaved positive control; graduates `node-pack-sandbox-timeout` reference-impl→protocol, so 7 of 8 `node-pack-sandbox-*` invariants are now protocol-tier, only the JS-specific `no-eval` permanently exempt). 
2026-05-31 (audit-response black-box / graduation batch) added three more: `sandbox-wasm-isolation.test.ts` (RFC 0035 — drives the committed `fixtures/wasm-sandbox/*.wasm` through `wasm-sandbox-probe.ts`: escape/capability-gate via static `WebAssembly.Module.imports()`, an OOB-store memory trap, double-instantiate isolation; 10/10; graduates 6 `node-pack-sandbox-*` invariants reference-impl→protocol), `workspace-cross-tenant-isolation-blackbox.test.ts` (RFC 0059 — two-credential black-box on the normative §C `/v1/host/workspace/files` endpoints: owner A writes, a second-tenant credential fails closed; no seam), and `prompt-resolution-chain-event.test.ts` (RFC 0029 — reads the durable `agent.promptResolved.chain[]` precedence record via the normative `GET /v1/runs/{runId}/events/poll`; no seam) — each the production-path proof that graduates its surface into the `openwop-core-standard` floor. 2026-05-31 (RFC 0088 — the `openwop-core-standard` Core Standard Profile, the audit-response Core Candidate target) added `core-standard-profile.test.ts` (always-on server-free derivation probe: `isCoreStandard` derives the §B floor — `openwop-core` ∧ `openwop-interrupts` ∧ (`openwop-stream-sse` ∨ `openwop-stream-poll`) — a bare `openwop-core` host without interrupts is excluded, a host with no event transport fails, and the annex is absent from `deriveProfiles` because it composes rather than redefines). 
2026-05-31 (RFC 0082 — agent deployment lifecycle, the Active→Accepted behavioral gate) added `agent-deployment-lifecycle.test.ts` (capability-gated on `agents.deployment.supported` via `behaviorGate('openwop-deployment-lifecycle', …)` — the §E promotion contract via the new `POST /v1/host/sample/agents/deployment-transition` seam + the test event-log seam across four legs: `promote` (authorize RFC 0049 → approvalGate RFC 0051 → eval-verify RFC 0081 → content-free `deployment.promoted` with a seven-state `toState` + `toVersion`, the record validating `agent-deployment.schema.json`), `unauthorized` (fail-closed — `allowed:false`, no `deployment.promoted`, the behavioral leg of `deployment-promotion-fail-closed`), `eval-gate-unmet` (`eval_gate_unmet` denial, §E-3), and `channel-pin` (the §B `resolvedAgentVersion` recorded-fact on `agent.invocation.started`); new lib helper `src/lib/agentDeployment.ts`; soft-skips on 404 — the RFC 0082 → Accepted bar). 2026-05-31 (RFC 0081 — agent evaluation, the Active→Accepted behavioral gate) added `agent-eval-run.test.ts` (capability-gated on `agents.evalSuite.supported` via `behaviorGate('openwop-eval-run', …)` — the §B `mode:"eval"` projection via the new `POST /v1/host/sample/agents/eval-run` seam + the test event-log seam: `eval.started`-first → one `eval.scored` per task → `eval.completed`-once ordering (count == `eval.completed.taskCount`), the content-free `eval.scored` legs (`score` ∈ 0..1) backing `eval-summary-no-content-leak`, and the NORMATIVE `GET /v1/runs/{runId}/eval-summary` schema-valid `EvalSummary` round-trip with `passedCount <= taskCount`; new lib helper `src/lib/agentEval.ts`; soft-skips on 404 — the RFC 0081 → Accepted bar). 
2026-05-31 (RFC 0083 — durable trigger bridge, the Active→Accepted behavioral gate) added `trigger-bridge-delivery.test.ts` (profile-gated on `openwop-trigger-bridge` derived from the live discovery doc — the §C delivery model via the `POST /v1/host/sample/trigger-bridge/deliver` seam + the test event-log seam: dedup→effectively-once `trigger.delivery.attempted{delivered}` (§C-1), retry-exhaustion→`{dead-lettered}` + `trigger.subscription.state.changed{toState:dead-lettered}` (§C-2 + RFC 0053), and the delivered run's `run.started.causationId` == the delivery id (§C / RFC 0040); both `trigger.*` events content-free; the always-on shape stays in `trigger-bridge-shape.test.ts`; new lib helper `src/lib/triggerBridge.ts`). 2026-05-31 (RFC 0087 — agent org-chart, the Active→Accepted behavioral gate) added two capability-gated behavioral scenarios (both gated on `agents.orgChart.supported`, black-box on the normative `/v1/agents/org-chart` surface — no new POST seam): `agent-org-chart-scoping.test.ts` (the `GET /v1/agents/org-chart` tree-shape — departments form an acyclic `parentDepartmentId` tree, members reference `host:<id>` roster entries — + the §D responsibility roll-up via `GET /v1/agents/org-chart/{departmentId}` with a deduped `responsibilities[]` union + the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ORG_CHART_DEPARTMENT_ID`) and `org-position-no-authority-escalation.test.ts` (the behavioral leg of the protocol-tier invariant — the live org-chart wire carries NO authority-bearing field on any member/department/responsibility-view object; the structural leg stays always-on in `agent-org-chart-shape.test.ts`, and the deeper RFC 0049/0051 authority-invariance legs stay reference-impl tier per the `agent-manifest-runtime` no-host-hook precedent). 
2026-05-31 (RFCs 0086 + 0077 — the Active→Accepted behavioral gate) added four capability-gated behavioral scenarios so a non-steward host can be mechanically certified non-vacuously under `OPENWOP_REQUIRE_BEHAVIOR=true`: `agent-roster-attribution.test.ts` (RFC 0086 §B/§C; gated on `agents.roster.supported` — the normative `GET /v1/agents/roster` read shape + `total==roster.length`, the §C `roster.run.initiated`-before-`agent.invocation.started` ordering, the content-free payload backing `roster-attribution-no-content`, the durable work-item `triggerSubscriptionId`, and the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ROSTER_ID`), `agent-live-invocation-bracket.test.ts` (RFC 0077 §E; gated on `agents.liveRuntime.supported` — `agent.invocation.started`-first / `agent.invocation.completed`-last bracket, matching `invocationId`, `source`/`outcome` closed enums, content-free), `agent-live-structured-output.test.ts` (RFC 0077 §B step 6; gated on `agents.liveRuntime.structuredOutput` — a result violating `handoff.returnSchemaRef` fails the invocation `outcome:"failed"` rather than shipping as completed), and `agent-live-allowlist-enforced.test.ts` (RFC 0077 §F-1 / RFC 0002 §A14; gated on `agents.liveRuntime.supported` — a tool outside `toolAllowlist` is not callable); all four drive the documented `POST /v1/host/sample/roster/fire` + `POST /v1/host/sample/agents/live-invoke` seams plus the test event-log seam and soft-skip on 404 (these are the RFC 0086 / 0077 Active→Accepted bars). 
2026-05-30 (RFC 0087 — agent org-chart, Draft -> Active) added `agent-org-chart-shape.test.ts` (always-on server-free: the `capabilities.agents.orgChart` shape + the `AgentOrgChart` round-trip + the non-`host:` member negative + the **§B structural non-authority guarantee** — the schema rejects a `scopes`/`canDispatch`/`permissions`/`authority` field on a member (`additionalProperties:false`), and a member's key set is exactly `{rosterId, departmentId, roleId, reportsTo}` — backing the protocol-tier `org-position-no-authority-escalation` invariant; no new RunEventType). 2026-05-30 (RFC 0086 — standing agent roster, Draft -> Active) added `agent-roster-shape.test.ts` (always-on server-free: the `capabilities.agents.roster` shape + the `AgentRosterEntry` round-trip + the `host:` `rosterId` + `agentRef` version-XOR-channel negatives + the content-free `roster.run.initiated` negatives backing the protocol-tier `roster-attribution-no-content` invariant + the additive `roster` inventory projection + RunEventType-enum membership). 2026-05-30 (RFC 0082 — agent deployment lifecycle, Draft -> Active) added `agent-deployment-shape.test.ts` (always-on server-free: the `capabilities.agents.deployment` shape + the `AgentDeployment` record round-trip + the `AgentRef` `channel` XOR `version` `not`-clause + the four `deployment.*` payloads + the content-free negatives backing the protocol-tier `deployment-event-no-content-leak` invariant). 2026-05-30 (RFC 0085 — `openwop-agent-platform` meta-profile, Draft -> Active) added `agent-platform-profile.test.ts` (always-on server-free derivation of the operational-annex `none`/`partial`/`full` status: all-floor ⇒ partial, missing-flag ⇒ none, the replay-OR-`nondeterminismPolicy.declared` term, floor+governance ⇒ full, missing-tenant-scope ⇒ partial-not-full per the honest-advertisement rule, eval/deploy/budget-are-advisory-not-hard-terms, + the `capabilities.nondeterminismPolicy.declared` shape). 
2026-05-30 (RFC 0084 — budget, quota + cost policy, Draft -> Active) added `budget-policy-shape.test.ts` (always-on server-free: `budget-policy.schema.json` round-trip + the §A orthogonality guard — a wall-time field is rejected (it's RFC 0058's `runTimeoutMs`) — + threshold/onExhaustion negatives + the four content-free `budget.{reserved,consumed,threshold.crossed,exhausted}` payloads + the four `cap.breached{budget-*}` kinds + RunEventType-enum membership + the no-pricing-property structural check backing the protocol-tier `budget-no-pricing-leak` invariant + the `capabilities.budget`/`limits.maxBudget*` shape). 2026-05-30 (RFC 0083 — durable trigger + channel bridge, Draft -> Active) added `trigger-bridge-shape.test.ts` (always-on server-free: `trigger-subscription.schema.json` round-trip + missing-`state`/out-of-enum-`source`/unknown-property negatives + the four-state vocab + the two content-free `trigger.{subscription.state.changed,delivery.attempted}` payloads incl. closed `state`/`outcome` enums + RunEventType-enum membership + the `triggerBridge`/`webhooks.durable` capability shape + the `openwop-trigger-bridge` profile derivation incl. the no-dead-letter-sink negative). 2026-05-30 (RFC 0079 — credential provenance + egress policy, Draft -> Active) added `egress-provenance-shape.test.ts` (always-on server-free: `credential-provenance.schema.json` round-trip + `audiences:[]`/missing-`credentialId`/unknown-property negatives + the no-secret-property structural check backing the protocol-tier `egress-decision-no-secret-leak` invariant + the content-free `egress.decided` record incl. the `decision` enum + RunEventType-enum membership + the `httpClient.egressPolicy` shape; the behavioral `egress-credential-audience-bound` confused-deputy MUST is reference-impl tier, deferred to a host). 
2026-05-30 (RFC 0078 — portable tool catalog, Draft -> Active) added `tool-descriptor-shape.test.ts` (always-on server-free: `tool-descriptor.schema.json` round-trip + the §C-1 `exec` ⇒ `host-extension` cross-field MUST (RFC 0069) + the `safetyTier`-required negative + `additionalProperties:false`, the `capabilities.toolCatalog` `supported`/`sources`/`sessionLifecycle` shape, and the two content-free `tool.session.{opened,closed}` payload $defs incl. the closed `outcome` enum + RunEventType-enum membership). 2026-05-30 (RFC 0080 — agent memory capability reconciliation, Draft -> Active) added `memory-capability-model-shape.test.ts` (always-on server-free: the additive `capabilities.memory.{writable,search,retention}` dimension shapes + malformed-instance negatives — `retention.ttl` non-boolean, out-of-enum `search.modes`, unknown property under `additionalProperties:false` — the `agent-inventory-response` `memoryDegraded`/`degradedMemoryDimensions` closed-enum fields, and the `openwop-memory` derivation surfacing for read/write + long-term hosts while withholding from `writable:false`). 2026-05-30 (RFC 0081 — agent evaluation, Draft -> Active) added `agent-eval-suite-shape.test.ts` (always-on server-free: the `capabilities.agents.evalSuite` shape + the `AgentEvalSuite`/`EvalSummary` schema round-trips + the three `eval.{started,scored,completed}` payloads + the content-free negatives — a task entry with a `taskOutput` body, a `safetyFinding` with an `excerpt` — backing the new `eval-summary-no-content-leak` SECURITY invariant). 
2026-05-29 (RFC 0076 §B — `ctx.http.safeFetch` live-run audit) added `safefetch-live-audit.test.ts` (`behaviorGate('openwop-safefetch-live-audit', …)`, gated on `httpClient.safeFetch` + `toolHooks.prePostEvents`) — asserts the audit-when-both MUST against the **durable run event log** via the new `POST /v1/host/sample/http/safe-fetch-run` open seam + the test event-log seam, closing the seam-vs-production gap (a production `createSafeFetch()` with no audit hooks passes the inline `safefetch-behavior.test.ts` but FAILS this under `OPENWOP_REQUIRE_BEHAVIOR=true`); this is the RFC 0076 §B → Accepted bar; run seam soft-skips on 404 (host-pending). 2026-05-29 (RFC 0066 — `x-openwop-form` picker UX hints, Draft → Active) added `x-openwop-form-pack-manifest.test.ts` (always-on server-free: an annotated `configSchema` stays a valid 2020-12 schema + the advisory hints don't change what it accepts, each §A annotation matches the shape, an unknown `kind` validates for forward-compat, 3 negatives — missing/non-string `kind`, non-string `dependsOn`). 2026-05-29 (RFC 0076 §B — `ctx.http.safeFetch`) added `safefetch-behavior.test.ts` (seam-gated: SSRF block / DNS-rebinding / `Connection: upgrade` refusal / tool-hooks audit-when-both, via `POST /v1/host/sample/http/safe-fetch`; advertisement contract stays in `http-client-ssrf.test.ts`). 2026-05-29 (RFC 0076 §A — pack `runtime.requires[]` install gate) added two: `runtime-requires-shape.test.ts` (server-free closed-vocabulary validation — the 8 tokens validate, a raw builtin name is rejected, empty-array≡omission, `uniqueItems`) + `runtime-requires-install-gate.test.ts` (seam-gated install-grant / install-refuse → `pack_runtime_requirement_unmet` / non-sandbox SHOULD-projection, soft-skip on 404 via `POST /v1/host/sample/packs/install-gate`). 
2026-05-29 (RFC 0047 — `host.oauth` authorization-code roundtrip) added `oauth-authorization-code-roundtrip.test.ts` — capability-gated on `capabilities.oauth.supported` + `grants` including `authorization_code`; drives the `POST /v1/host/sample/oauth/authorize-code-roundtrip` seam against the one canonical synthetic provider in `fixtures/oauth-providers/synthetic.json` (soft-skip on 404, Tier-2 host-pending), asserting a successful grant returns a credential REFERENCE (token persisted as a `host.credentials` entry) and that the authorization code / state / PKCE verifier / acquired access+refresh tokens never appear on any run-visible surface (RFC 0047 §C + §C.2 / `credential-payload-redaction`). Closes the RFC 0047 Tier-2 gap (capability-shape + redaction scenarios existed; the actual authorization-code dance was unexercised). 2026-05-26 (RFC 0070 — agent-manifest runtime) added `agent-manifest-runtime.test.ts`; 2026-05-26 (RFC 0071 — artifact-type + chat card packs) added six: `artifact-type-pack-manifest-validation.test.ts` + `artifact-schema-compile-bounded.test.ts` (server-free) + `artifact-type-pack-install.test.ts` + `artifact-type-store-without-render.test.ts` + `chat-card-pack-manifest-validation.test.ts` (server-free) + `chat-card-pack-execution.test.ts` (capability-gated, host-pending). 
2026-05-26 (RFCs 0067 / 0068 / 0069 — spec-gap Draft cohort) added five scenarios: `byok-auth-modes.test.ts` (RFC 0067; always-on schema-shape of `aiProviders.authModes` + a discovery-gated §B auth-mode-contract cross-field check), `memory-consolidation-shape.test.ts` (RFC 0068; always-on shape of `agents.memoryConsolidation`/`agents.commitments` + the `agent.memory.consolidated`/`commitment.fired` payload $defs), `memory-consolidation-idempotent.test.ts` + `commitment-fired.test.ts` (RFC 0068; capability-gated behavioral, soft-skip on the documented `/v1/host/sample/memory/consolidate` + `/commitment/fire` seams), and `exec-not-protocol-tier.test.ts` (RFC 0069; always-on server-free structural assertion that the protocol corpus defines no `core.*`/`openwop.*` exec-class primitive — backs the `exec-must-not-be-protocol-tier` SECURITY invariant). 2026-05-25 (RFC 0061 — stateful agent-loop lifecycle, executionModel.version 5) added four `agent-loop-*.test.ts` scenarios: `-version5-shape` (always-on; validates `executionModel.statefulResume`/`transcriptWindow` + the 1–5 version ceiling) plus `-iteration-monotonic` (gated on `version >= 5`; `runOrchestrator.decided.iteration` increments 1,2,3… exactly once per turn), `-workspace-snapshot` (gated additionally on `host.workspace.supported`; a turn-i workspace write is invisible to turn i, visible to turn i+1), and `-stateful-resume` (gated on `statefulResume`; a mid-loop suspend resumes at the same iteration without resetting the counter) — the three behavioral scenarios drive the documented agent-loop seam (`POST /v1/host/sample/agentloop/run`) and soft-skip until a host wires it. 
2026-05-25 (RFC 0059 — host.workspace M2, reference-host enforcement) added two `workspace-*.test.ts` scenarios: `-behavior` (capability-gated CRUD round-trip / `If-Match` 409 `workspace_conflict` / `workspace_too_large` / §D run-start snapshot, all via the real `/v1/host/workspace/files` §C endpoints) and `-cross-tenant-isolation` (WCT-1 — drives the documented `POST /v1/host/sample/workspace/op` seam to assert a file owned by one `{tenant, workspace}` is unreadable, on both `get` and `list`, under a different owner; backs the new `workspace-cross-tenant-isolation` SECURITY invariant). The in-memory reference host now advertises `capabilities.workspace.supported` and honors §C/§D/§E end-to-end. 2026-05-25 (RFC 0062 — memory.distillation "dreams") added five `distillation-*.test.ts` scenarios: `-shape` (always-on; validates the `capabilities.memory.distillation` block + the additive `distillation` sub-object on `memory.compacted`) plus `-token-budget` (within budget `tokensUsed ≤ tokenBudget`; an un-meetable budget → `token_budget_exceeded` with no partial archive), `-stable-archive` (same sources + budget ⇒ byte-stable archive checksum), `-index-roundtrip` (gated additionally on `indexEmitted`; the `MEMORY-INDEX.json` workspace file is retrievable + `workspace.updated` fired), and `-secret-carryforward` (SR-1: a redacted source secret never appears in the archive) — the four behavioral scenarios drive the documented memory-distillation seam (`POST /v1/host/sample/memory/distill`) and soft-skip until a host wires it. 
2026-05-25 (RFC 0063 — core.subWorkflow.outputAttestation) added four `subrun-*.test.ts` scenarios: `-attestation-shape` (always-on; validates the `capabilities.agents.subRunAttestation` flag) plus `-checksum-stable` (the child output checksum is the byte-stable, key-order-invariant RFC 8785 JCS + SHA-256 digest), `-approval-gate` (`requireApproval` → `accept` merges, `reject` does not), and `-approval-fail-closed` (no `accept`/`edit-accept` → no merge; backs the deferred `subrun-merge-approval-fail-closed` invariant) — the three behavioral scenarios drive the documented sub-run attestation seam (`POST /v1/host/sample/subrun/attest`) and soft-skip until a host wires it. 2026-05-25 (RFC 0064 — host.toolHooks) added five `tool-hooks-*.test.ts` scenarios: `-shape` (always-on; validates the `capabilities.toolHooks` block + the optional content-free fields on `agentToolCalled` / `agentToolReturned`) plus `-content-free` (gated on `prePostEvents`), `-authorization-fail-closed` (gated on `perToolAuthorization`), `-rate-limit` (gated on `perToolRateLimit`), and `-secret-redaction` (gated on `prePostEvents` + the SR-1 `argsHash` redaction rule) — the four behavioral scenarios drive the documented tool-hooks invoke seam (`POST /v1/host/sample/toolhooks/invoke`) and soft-skip until a host wires it. 2026-05-25 (RFC 0060 — host.heartbeat) added four `heartbeat-*.test.ts` scenarios: `-capability-shape` (always-on; validates the `capabilities.heartbeat` block) plus `-fires-once-per-tick`, `-idempotent-no-spam`, and `-runtime-bound` (gated on `capabilities.heartbeat.supported` + the host heartbeat tick seam; soft-skip until a host wires it). 
2026-05-25 (RFC 0057 — memory write-attribution) added five `memory-attribution-*.test.ts` scenarios: `-shape` (always-on advertisement check on `capabilities.memory.attribution`), plus `-no-content`, `-tenant-scoped`, `-emits-on-write`, and `-replay-stable` (gated on `capabilities.memory.attribution.emitsWriteEvents`) verifying the content-free `memory.written` RunEvent, its two SECURITY invariants (`memory-attribution-no-content` + `memory-attribution-tenant-scoped`), and the §D replay rule that a `replay`-mode fork MUST NOT regenerate `memoryId`. 2026-05-25 (RFC 0025 §C point 1 — test-catalog isolation invariant; pairs with the 25 publish-error scenarios in `pack-registry-publish.test.ts`) added `pack-registry-isolation.test.ts` — capability-gated on `capabilities.packs.testMode.{supported, isolated}: true`; PUTs a disposable pack into `/v1/packs-test/{name}` and asserts the same `(name, version)` does NOT appear via `GET /v1/packs/{name}` — anchors the test-catalog isolation MUST in RFC 0025 §C. 2026-05-25 (RFC 0028 Tier-2 post-promotion T2 — read-side sister scenario for workspace-membership enforcement) added `prompt-read-workspace-membership-enforced.test.ts` — gates on `capabilities.prompts.supported: true` (broader than `mutableLibrary` so read-only hosts that expose `?workspaceId=` are also probed); drives `GET /v1/prompts?workspaceId=<random-non-member>` and interprets the response: 4xx PASS (canonical envelope check on 403); 200 with empty `templates[]` PASS (correct null result for a nonexistent workspace); 200 with non-empty `templates[]` FAIL (cross-tenant leak); 200 without `templates[]` field SKIP (host doesn't expose workspace-scoped reads). Verifies SECURITY invariant `prompt-read-workspace-membership-enforced`. Same-day T1 strengthened `prompt-mutation-workspace-membership-enforced.test.ts` to pin `error === "workspace_membership_required"` when the host's refusal status is 403 (other refusal codes unconstrained). 
2026-05-25 (RFC 0028 Tier-2 follow-up — workspace-membership enforcement on mutating prompt endpoints, filed in response to a self-disclosed adopter vulnerability) added `prompt-mutation-workspace-membership-enforced.test.ts` — capability-gated on `capabilities.prompts.mutableLibrary: true`; drives `POST /v1/prompts` with a cryptographically-random non-member `workspaceId` and asserts the host refuses (NOT a 2xx; any 4xx/5xx is acceptable — silent success is the failure mode). Verifies SECURITY invariant `prompt-mutation-workspace-membership-enforced`. 2026-05-22 (RFC 0034 §B follow-up — secret-leakage harness against the OTel + debug-bundle seams) added `secret-leakage-otel-attribute.test.ts` — gates on `capabilities.secrets.supported` + `capabilities.observability.testSeams.{otelScrape,debugBundleExport}` AND the `OPENWOP_CANARY_SECRET_VALUE` env (host operator + conformance runner agree on the canary). Drives the existing `openwop-smoke-byok-roundtrip` fixture end-to-end; scrapes both seams after run completion; hard-fails if the canary plaintext appears in any OTel span attribute or debug-bundle field. Verifies SECURITY invariants `secret-leakage-otel-attribute` + `secret-leakage-debug-bundle-otel`. 2026-05-22 (RFC 0041 Phase 4 — replay determinism under nondeterministic models) added three scenarios: `replay-divergence-at-refusal.test.ts` (advertisement-shape probe on `replayDeterminism.refusalDivergenceEmission` + 2 `it.todo` for the dual-direction refusal-divergence case), `replay-observable-sequence-determinism.test.ts` (capability-gated; behavioral assertion soft-skipped until a `conformance-phase4-nondet-tool` fixture ships), `replay-llm-cache-key-portable.test.ts` (intra-host reproducibility + non-recipe-field invariance + Phase 4 advertisement alignment — reuses the existing `POST /v1/host/sample/test/llm-cache-key` seam from the sibling `replay-llm-cache-key.test.ts`). 
2026-05-20 (RFC 0027 §A templateKinds-coverage follow-up — paired with `prompt-end-to-end-events.test.ts`) added `prompt-all-four-kinds-events.test.ts` exercising all four `PromptKind` values (`system`, `user`, `schema-hint`, `few-shot`) end-to-end through the reference workflow-engine sample's `local.sample.demo.mock-ai` dispatch path; capability-gated via `behaviorGate('prompts-supported', ...)`. Closes the credibility gap where the host advertised `templateKinds: ["system", "user", "few-shot", "schema-hint"]` but only the system+user pair was actually wired into dispatch. 2026-05-20 (RFCs 0030–0033 — envelope LLM-contract-hardening track) added 15 scenarios across four `Active` RFCs: `envelope-reasoning-shape.test.ts` (RFC 0030, always-on; asserts the OPTIONAL `reasoning` property on the three universal-kind schemas + the `schema.response` deliberate omission), `envelope-reasoning-secret-redaction.test.ts` (RFC 0030, capability-gated on `capabilities.envelopes.reasoning.supported` + `secrets.supported`; 5 `it.todo()` placeholders for SECURITY invariant `envelope-reasoning-secret-redaction`), `envelope-tier-one-subset-static.test.ts` (RFC 0030, always-on for load-bearing rules — no `oneOf` / `allOf` / `not` / `prefixItems` / `propertyNames` anywhere; gated on `tierOneSubsetCompliance: "strict"` for OpenAI-strict-only constraints), `envelope-variant-discriminator-static.test.ts` (RFC 0031, always-on; asserts no `oneOf` + every `anyOf` branch declares a single-string-enum discriminator in `required` on every `schemas/envelopes/*.schema.json`), `model-capability-substituted.test.ts` (RFC 0031, advertisement-shape probe on `capabilities.modelCapabilities.advertised[]` identifier pattern + 5 `it.todo()` placeholders for SECURITY invariant `model-capability-substituted-no-credential-disclosure`), `model-capability-insufficient.test.ts` (RFC 0031, 6 `it.todo()` placeholders for refusal + no-recursive-fallback), `node-module-required-capabilities-shape.test.ts` (RFC 0031 
SHOULD-tier authoring-convention; 4 `it.todo()` placeholders), and the six envelope-reliability events from RFC 0032 (`envelope-retry-attempted` carrying the shared advertisement-shape probe enforcing both MUST-tier events in `events[]` per RFC 0032 §C, plus `envelope-retry-exhausted`, `envelope-refusal-shape`, `envelope-truncated`, `envelope-nl-to-format-engaged`, `envelope-recovery-applied` — collectively 39 `it.todo()` placeholders covering retry/refusal/truncation/recovery + SECURITY invariants `envelope-refusal-no-prompt-leak` and `envelope-recovery-no-content-leak`), plus RFC 0033's two scenarios (`envelope-completion-distinguishes-truncation.test.ts` + `envelope-truncation-cap-exhaustion.test.ts` — 12 `it.todo()` placeholders covering the truncation-vs-schema-violation retry-routing distinction + the DoS-bound assertion). Reference workflow-engine sample advertises `capabilities.envelopes.reasoning: { supported: true, promptDirective: "off" }` + `tierOneSubsetCompliance: "warn"` honestly (schemas accept the field; host doesn't yet inject the directive); the other three RFCs' capability blocks defer to reference-host emission code per the staged RFC 0027 §G precedent. 2026-05-20 (RFC 0028 §B Phase B — prompt-pack boot-time install) added `prompt-pack-install.test.ts` (capability-gated on `capabilities.prompts.endpointsSupported: true`; asserts a host that ran the boot-time pack loader surfaces ≥ 1 pack-source template under `GET /v1/prompts?source=pack` carrying the canonical `meta.source: "pack"` + `meta.packName` + `meta.packVersion` stamps; positively identifies the in-tree `vendor.openwop.prompt-sample` reference pack's `writer-system` template when present). Pairs with the new `host/promptPackLoader.ts` boot-time entry on the reference workflow-engine sample, which scans `examples/packs/*` plus `OPENWOP_PROMPT_PACKS_DIR` and calls `installPackTemplates()` for each `kind: "prompt"` pack found. 
2026-05-20 (RFC 0029 Phase C — prompt resolution chain wire shape) added three more scenarios: `prompt-resolution-chain-node-wins.test.ts` (capability-gated on `capabilities.prompts.supported: true`; asserts layer-1 node-config supersedes lower layers per `spec/v1/prompts.md` §"Resolution chain (normative)"), `prompt-resolution-chain-agent-intrinsic.test.ts` (additionally gated on `capabilities.prompts.agentBindings: true`; asserts agent intrinsic `systemPromptRef` wins over `promptOverrides` AND lower layers when the node has no layer-1 ref), `prompt-resolution-chain-fallback-cascade.test.ts` (asserts layer 3 workflow-defaults wins over layer 4 host-defaults; layer 4 host-defaults wins when 1-3 yield null; resolved is null when all four yield null but chain[] still lists every attempted layer). The scenarios drive the host's `POST /v1/host/sample/prompt/resolve` test seam (reference-host implementation deferred to follow-up slice per RFC 0021 staging precedent). 2026-05-20 (RFC 0027 Phase A — prompt templates wire shape) added three scenarios: `prompt-template-shape.test.ts` (always-on; Ajv compileability + positive/negative round-trip for PromptTemplate + PromptRef + PromptKind), `prompt-composed-secret-redaction.test.ts` (capability-gated on `capabilities.prompts.supported: true` + `observability: "full"`; asserts `[REDACTED:<secretId>]` markers in `prompt.composed` payloads for `source: "secret"` variable bindings per SECURITY/threat-model-secret-leakage.md §SR-1), `prompt-composed-trust-marker.test.ts` (same capability gates; asserts `<UNTRUSTED>...</UNTRUSTED>` wrapping + `contentTrust: "untrusted"` propagation per RFC 0020 §D). Paired with new `fixtures/prompt-templates/` sub-directory + per-fixture schema-validity describe block + future SECURITY invariants `prompt-composed-secret-redaction` and `prompt-composed-trust-marker` (lands alongside reference-host emission per RFC 0021 staging precedent). 
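The fallback cascade those three scenarios assert can be illustrated with a first-non-null walk over the four layers. This is a sketch under stated assumptions: the layer labels, the `chain[]` entry shape, and the choice to stop recording once a layer wins are all hypothetical, and the agent layer's internal `systemPromptRef`-over-`promptOverrides` precedence is collapsed into a single candidate. The normative behavior is defined in `spec/v1/prompts.md`.

```typescript
// Hypothetical layer labels and shapes; not the wire format from the spec.
type PromptLayer = "node-config" | "agent" | "workflow-defaults" | "host-defaults";

interface LayerAttempt {
  layer: PromptLayer;
  ref: string | null;
}

interface PromptResolution {
  resolved: string | null;
  chain: LayerAttempt[];
}

const PROMPT_LAYER_ORDER: PromptLayer[] = [
  "node-config",
  "agent",
  "workflow-defaults",
  "host-defaults",
];

function resolvePromptSketch(
  candidates: Partial<Record<PromptLayer, string | null>>,
): PromptResolution {
  const chain: LayerAttempt[] = [];
  for (const layer of PROMPT_LAYER_ORDER) {
    const ref = candidates[layer] ?? null;
    // Record every layer that was actually attempted, including null results.
    chain.push({ layer, ref });
    // Assumption: the walk stops at the first layer that yields a ref.
    if (ref !== null) return { resolved: ref, chain };
  }
  // All four layers yielded null: resolved is null, chain lists all four attempts.
  return { resolved: null, chain };
}
```

For example, `resolvePromptSketch({ "workflow-defaults": "w", "host-defaults": "h" })` resolves to `"w"` after attempting three layers, while an empty input resolves to `null` with four chain entries.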
2026-05-18 (RFC 0022 `Draft` — runtime variable mapping) added four `it.todo()` placeholder scenarios covering the new mapping surfaces on `core.dispatch` (§A — `dispatch-input-mapping.test.ts`, `dispatch-output-mapping.test.ts`, `dispatch-cross-worker-handoff.test.ts`) and `core.subWorkflow` (§B — `subworkflow-input-mapping.test.ts`). Gated on `capabilities.agents.dispatchMapping` (dispatch trio) and `capabilities.subWorkflow.inputMapping` (subWorkflow). Promote to live assertions when RFC 0022 reaches `Active` + a reference host advertises the matching flags. 2026-05-17 (RFC 0003 §D handoff-schema enforcement, HV-1) added `agentPackHandoffSchemaValidation.test.ts` — verifies the host validates dispatch payloads against `handoff.taskSchemaRef` AND return payloads against `handoff.returnSchemaRef` per RFC 0003 §D. Paired with the new `agent-pack-handoff-schema-enforcement` row in `SECURITY/invariants.yaml`. 2026-05-17 (AI Envelope gap-closure, DRAFT v1.x — `spec/v1/ai-envelope.md`) added 7 advertisement-shape scenarios with `it.todo()` behavioral placeholders gated on `capabilities.envelopeContracts.advertised: true`: `aiEnvelope.universalKinds.test.ts`, `aiEnvelope.schemaDrift.test.ts`, `aiEnvelope.correlationReplay.test.ts`, `aiEnvelope.contractRefusal.test.ts`, `aiEnvelope.trustBoundaryPropagation.test.ts`, `aiEnvelope.redaction.test.ts`, `aiEnvelope.capBreached.test.ts`. Paired with the new `envelope-redaction-sr-1-carry-forward` row in `SECURITY/invariants.yaml`. 2026-05-17 (post-publish hardening, deep audit of `core.openwop.agents`) added `agents-run-tool-allowlist.test.ts` — server-free scenario locking in the `core.openwop.agents@1.0.1` safety-fix that closes `OPENWOP-AUDIT-2026-003` (function-typed `tool.handler` properties rejected at `validateTools()` with `INVALID_TOOL_DECLARATION`; tool-driven runs require `ctx.agentRuntime`; tool-less safe fallback preserved). Paired with the new `agents-run-no-raw-handler` row in `SECURITY/invariants.yaml`. 
Same-day post-publish hardening added `idempotency-key-determinism.test.ts` — server-free scenario locking in the `core.openwop.http@1.1.2` determinism safety-fix (default `composite` mode produces deterministic keys in `(runId, nodeId, payload)`; removed `uuid` mode rejects with `CONFIG_INVALID`; cross-impl vector test lets third-party reimplementations verify wire agreement). Paired with the new `idempotency-key-deterministic` row in `SECURITY/invariants.yaml`. 2026-05-17 (Phase 3 of RFC 0013) added three server-free scenarios exercising the reference workflow-chain expansion library (`conformance/src/lib/workflow-chain-expansion.ts`): `workflow-chain-expansion.test.ts` (parameter substitution + node id collision avoidance + edge rewriting + capability propagation + runtime-invariance contract), `workflow-chain-unresolvable-typeid.test.ts` (rejection with `chain_unresolvable_typeid` when a chain references an unknown typeId), and `workflow-chain-pack-signature-verification.test.ts` (Ed25519 verification recipe reuse from `node-packs.md §Signing`). Earlier that day (Phase 1) added `workflow-chain-pack-manifest-validation.test.ts` — server-free schema-validation scenario covering the new `workflow-chain-pack-manifest.schema.json` (positive sample + two negatives: kind/contents mismatch and invalid `chainId`). Closes RFC 0013 (`Workflow-chain packs`, `Draft`) Phases 1 + 3 alongside the new `spec/v1/workflow-chain-packs.md`, the `Capabilities.workflowChainPacks` block, and the registry build-index/conformance-check `kind` routing from Phase 2. Earlier that day, the suite added 27 `it.todo()` placeholder scenarios paired with RFCs 0014-0020 (host capability surfaces — fs, kvStorage, tableStorage, queueBus, sql/vector/search, blob/cache, mcp.serverMount). These promote to live assertions when each RFC reaches `Active` + the matching capability block lands in `schemas/capabilities.schema.json` + a reference host advertises the capability. 
Earlier additions include 18 Multi-Agent Shift scenarios (Phases 1-5) added 2026-05-10, the `registry-public.test.ts` public-registry healthcheck added 2026-05-11 (opt-in via `OPENWOP_TEST_PUBLIC_REGISTRY=true`), the `replay-llm-cache-key.test.ts` placeholder added 2026-05-11 (three `it.todo()` cases for the cross-host LLM cache-key recipe per `replay.md` §"LLM cache-key recipe"), the two `production-*.test.ts` scenarios added 2026-05-11 for the `openwop-production` profile per RFC 0009 (`production-backpressure.test.ts`, `production-retention-expiry.test.ts`), the four `auth-*.test.ts` scenarios added 2026-05-11/12 for the production-auth profiles per RFC 0010 (`auth-api-key-rotation.test.ts`, `auth-oauth2-client-credentials.test.ts`, `auth-oidc-user-bearer.test.ts`, `auth-mtls.test.ts` (opt-in via `OPENWOP_TEST_MTLS=1`)), `replay-retention-expiry.test.ts` added 2026-05-12 (capability shape + 410/422 envelope per `replay.md` §"Retention and garbage collection"), `bulk-cancel.test.ts` added 2026-05-12 (Phase B close-out of R1 — `POST /v1/runs:bulk-cancel`), the two Phase H launch-blocker advertisement-contract scenarios added 2026-05-12 (`mcp-toolcall-redaction.test.ts` for the MCP-1 invariant per `host-capabilities.md §host.mcp` + `threat-model-prompt-injection.md §UNTRUSTED`, and `http-client-ssrf.test.ts` for the SSRF + body-size cap advertisement contract on `capabilities.httpClient`), the `wasm-pack-abi-version-rejection.test.ts` Track 7 scenario added 2026-05-12 for the ABI-mismatch positive path via the `vendor.openwop.misbehaving-abi` pack per RFC 0008 §H, the `otel-trace-propagation-subworkflow.test.ts` Track 11 close-out added 2026-05-13 (parent + child run spans share the inbound traceparent's traceId across the `core.subWorkflow` dispatch boundary), and the three RFC 0012 (Memory Compaction Profile, `Active`) scenarios added 2026-05-13/14: `memory-compaction-sr1-carry-forward.test.ts` (load-bearing SR-1 §D), `memory-compaction-event-emitted.test.ts` 
(canonical §B payload shape), and `memory-compaction-provenance-tag.test.ts` (soft assertion on §C `compacted-from:<id>` convention). All three gate on `capabilities.memory.compaction.supported` + the host's test seam at `/v1/test/memory/{seed,compact}` (Postgres reference host enables both via `OPENWOP_MEMORY_COMPACTION=true OPENWOP_TEST_TRIGGER_COMPACTION=true`). 2026-05-15 (gap-closure CF-3) added `interrupt-token-matrix.test.ts` (malformed / unknown / replay / cross-run-id paths on `GET|POST /v1/interrupts/{token}`). 2026-05-31 (RFC 0078 portable tool catalog + RFC 0079 credential provenance / egress policy — the Active→Accepted behavioral gate) added four: `tool-catalog-projection.test.ts` (capability-gated on `toolCatalog.supported` via `behaviorGate('openwop-tool-catalog', …)` — the NORMATIVE `GET /v1/tools` list with each `ToolDescriptor` schema-valid + `source`/`safetyTier` in the closed vocab + content-free, `GET /v1/tools/{toolId}` round-trip + unknown-id 404, 401-unauthenticated, and the §F-2 cross-principal non-disclosure; black-box, no POST seam), `tool-session-lifecycle.test.ts` (gated on `toolCatalog.sessionLifecycle` — the §D `tool.session.opened`-before / `tool.session.closed`-after bracket over the RFC 0064 call events via the `POST /v1/host/sample/tools/session-run` seam, one shared `sessionId`, content-free), `egress-audience-binding.test.ts` (KEYSTONE — gated on `httpClient.egressPolicy.supported`; the §C confused-deputy MUST via `POST /v1/host/sample/egress/decide`: an out-of-audience egress is denied/downgraded with the credential NOT attached, a provenance-unevaluable egress fails closed — the behavioral leg of `egress-credential-audience-bound`), and `egress-decision-content-free.test.ts` (the SR-1 canary — the credential value never surfaces in `egress.decided` and `reason` stays in the CLOSED vocabulary). 
The maintained scenario-to-spec map lives in [`coverage.md`](./coverage.md); this README keeps the operator quickstart and the historical scenario notes below.
+ The current suite has 323 scenario files under `src/scenarios/`. 2026-06-01 (RFC 0085 — `openwop-agent-platform` meta-profile, the Active→Accepted behavioral gate) added `agent-platform-aggregate-evidence.test.ts` (capability-gated on a host CLAIMING `openwop-agent-platform` in its live discovery `profiles[]` via `behaviorGate('openwop-agent-platform', …)` — the §C/§D honest-advertisement rule on the live `/.well-known/openwop`: the claim MUST satisfy the §B floor predicate (`isAgentPlatformPartial` → `partial`/`full`, never `none`), backed by the per-capability evidence not the profile string; `OPENWOP_AGENT_PLATFORM_TIER=full` forces the non-vacuous full bar — all governance terms + tenant installScope + all 16 §D terms; server-requiring, the always-on §B/§D derivation legs stay in `agent-platform-profile.test.ts` — the RFC 0085 → Accepted bar). 2026-06-01 (RFC 0084 — budget, quota + cost policy, the Active→Accepted behavioral gate) added `budget-enforcement.test.ts` (capability-gated on `budget.supported` via `behaviorGate('openwop-budget-enforcement', …)` — the §C/§D enforcement via the new `POST /v1/host/sample/budget/run` seam + the test event-log seam: a `hard-cost-exhaust` run emits the strict-ordered `budget.reserved → budget.consumed → budget.threshold.crossed{percent} → budget.exhausted → cap.breached{kind:"budget-cost"} → run.failed{error:"budget_exhausted"}` chain; a `model-denied` run is refused `budget_model_denied` BEFORE the provider call (fail-closed); an `advisory` host emits the `budget.*` events without stopping; every `budget.*` payload content-free backing `budget-no-pricing-leak`; new lib helper `src/lib/budgetPolicy.ts`; soft-skips on 404 — the RFC 0084 → Accepted bar). 
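The strict ordering of the `hard-cost-exhaust` chain can be checked against an event log with an in-order subsequence match. The sketch below assumes a hypothetical `{ type }` event shape and tolerates unrelated events interleaved between the chain members; it is a lenient illustration, not the assertion `budget-enforcement.test.ts` actually makes.

```typescript
// Hypothetical event shape; only the `type` strings come from the notes above.
interface LoggedEvent {
  type: string;
}

const HARD_COST_EXHAUST_CHAIN: string[] = [
  "budget.reserved",
  "budget.consumed",
  "budget.threshold.crossed",
  "budget.exhausted",
  "cap.breached",
  "run.failed",
];

// Returns true when every chain member appears in the log in the given order.
// Assumption: other events may appear between chain members.
function followsChain(log: LoggedEvent[], chain: string[]): boolean {
  let next = 0;
  for (const event of log) {
    if (next < chain.length && event.type === chain[next]) next += 1;
  }
  return next === chain.length;
}
```

A stricter checker would also reject a chain member that appears early and out of order; this version only proves that an in-order occurrence exists.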
2026-06-01 (RFC 0080 — agent memory capability reconciliation, the Active→Accepted behavioral gate) added `memory-degraded-projection.test.ts` (capability-gated on `agents.manifestRuntime.supported` + `memory.supported` via `behaviorGate('openwop-memory-degraded', …)` — the §C degraded-projection iff-contract on the NORMATIVE `GET /v1/agents`: a degraded inventory entry MUST carry `memoryDegraded:true` + a non-empty, unique `degradedMemoryDimensions[]` from the closed §A-name enum, a non-degraded entry MUST NOT, the inventory is non-empty, and the degraded branch runs non-vacuously when `OPENWOP_DEGRADED_AGENT_ID` names a known-degraded agent; black-box, no POST seam — the RFC 0080 → Accepted bar). This batch also documents the two RFC 0068 conformance seams (`POST /v1/host/sample/memory/consolidate` + `.../commitment/fire`) in `host-sample-test-seams.md` (the 0068 gated scenarios shipped in 1.14.0). 2026-06-01 (RFC 0034 — collector-side BYOK-canary inspection) added `otel-collector-canary-inspection.test.ts` (always-on server-free: stands up a real `OtelCollector`, POSTs synthetic OTLP/HTTP-JSON traces + metrics through its actual ingest path, and proves the new `findCanaryLeakage()` inspector catches a canary embedded in a span attribute / resource attribute / span name / metric data-point attribute while reporting ZERO hits on a redacted payload and never matching an empty canary — the non-vacuous proof that the conformance collector now inspects what the host's OTLP exporter ACTUALLY shipped over the wire, closing the `secret-leakage-otel-attribute` / `-debug-bundle-otel` collector-seam gap; the live capability-gated complement is the new collector-export describe block in `secret-leakage-otel-attribute.test.ts`). 
2026-06-01 (RFC 0035 — sandbox wall-clock timeout, the 7th-of-8 graduation) added `sandbox-wasm-timeout.test.ts` (worker-driven server-free: `probeTimeout` in `wasm-sandbox-probe.ts` spawns a worker thread running the committed `misbehaving-timeout.wasm` + a main-thread kill-timer — the thread preemption a same-thread probe can't do — asserting `sandbox_timeout` with a well-behaved positive control; graduates `node-pack-sandbox-timeout` reference-impl→protocol, so 7 of 8 `node-pack-sandbox-*` invariants are now protocol-tier, only the JS-specific `no-eval` permanently exempt). 2026-05-31 (audit-response black-box / graduation batch) added three more: `sandbox-wasm-isolation.test.ts` (RFC 0035 — drives the committed `fixtures/wasm-sandbox/*.wasm` through `wasm-sandbox-probe.ts`: escape/capability-gate via static `WebAssembly.Module.imports()`, an OOB-store memory trap, double-instantiate isolation; 10/10; graduates 6 `node-pack-sandbox-*` invariants reference-impl→protocol), `workspace-cross-tenant-isolation-blackbox.test.ts` (RFC 0059 — two-credential black-box on the normative §C `/v1/host/workspace/files` endpoints: owner A writes, a second-tenant credential fails closed; no seam), and `prompt-resolution-chain-event.test.ts` (RFC 0029 — reads the durable `agent.promptResolved.chain[]` precedence record via the normative `GET /v1/runs/{runId}/events/poll`; no seam) — each the production-path proof that graduates its surface into the `openwop-core-standard` floor. 2026-05-31 (RFC 0088 — the `openwop-core-standard` Core Standard Profile, the audit-response Core Candidate target) added `core-standard-profile.test.ts` (always-on server-free derivation probe: `isCoreStandard` derives the §B floor — `openwop-core` ∧ `openwop-interrupts` ∧ (`openwop-stream-sse` ∨ `openwop-stream-poll`) — a bare `openwop-core` host without interrupts is excluded, a host with no event transport fails, and the annex is absent from `deriveProfiles` because it composes rather than redefines). 
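The §B floor described above is a plain boolean predicate over a host's profiles. The following is an illustrative re-statement only: the function name and the assumption that the input is a flat list of profile strings are hypothetical, and the suite's own `isCoreStandard` may take a different input.

```typescript
// Illustrative re-statement of the floor; not the suite's `isCoreStandard`.
// Assumption: the host's profiles are available as a flat list of strings.
function sketchIsCoreStandard(profiles: readonly string[]): boolean {
  const has = (p: string): boolean => profiles.indexOf(p) !== -1;
  return (
    has("openwop-core") &&
    has("openwop-interrupts") &&
    // At least one event transport is required.
    (has("openwop-stream-sse") || has("openwop-stream-poll"))
  );
}
```

Under this sketch a bare `["openwop-core"]` host is excluded, and so is a host with interrupts but neither stream profile, matching the two negatives the scenario lists.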
2026-05-31 (RFC 0082 — agent deployment lifecycle, the Active→Accepted behavioral gate) added `agent-deployment-lifecycle.test.ts` (capability-gated on `agents.deployment.supported` via `behaviorGate('openwop-deployment-lifecycle', …)` — the §E promotion contract via the new `POST /v1/host/sample/agents/deployment-transition` seam + the test event-log seam across four legs: `promote` (authorize RFC 0049 → approvalGate RFC 0051 → eval-verify RFC 0081 → content-free `deployment.promoted` with a seven-state `toState` + `toVersion`, the record validating `agent-deployment.schema.json`), `unauthorized` (fail-closed — `allowed:false`, no `deployment.promoted`, the behavioral leg of `deployment-promotion-fail-closed`), `eval-gate-unmet` (`eval_gate_unmet` denial, §E-3), and `channel-pin` (the §B `resolvedAgentVersion` recorded-fact on `agent.invocation.started`); new lib helper `src/lib/agentDeployment.ts`; soft-skips on 404 — the RFC 0082 → Accepted bar). 2026-05-31 (RFC 0081 — agent evaluation, the Active→Accepted behavioral gate) added `agent-eval-run.test.ts` (capability-gated on `agents.evalSuite.supported` via `behaviorGate('openwop-eval-run', …)` — the §B `mode:"eval"` projection via the new `POST /v1/host/sample/agents/eval-run` seam + the test event-log seam: `eval.started`-first → one `eval.scored` per task → `eval.completed`-once ordering (count == `eval.completed.taskCount`), the content-free `eval.scored` legs (`score` ∈ 0..1) backing `eval-summary-no-content-leak`, and the NORMATIVE `GET /v1/runs/{runId}/eval-summary` schema-valid `EvalSummary` round-trip with `passedCount <= taskCount`; new lib helper `src/lib/agentEval.ts`; soft-skips on 404 — the RFC 0081 → Accepted bar). 
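The `eval.*` ordering rules listed for `agent-eval-run.test.ts` can be expressed as a checker that returns a list of violations. This is a sketch with a hypothetical flattened event shape (`score` and `taskCount` directly on the event); the real payload layout is defined by the RFC 0081 schemas and may differ.

```typescript
// Hypothetical flattened event shape; real payloads follow the RFC 0081 schemas.
interface EvalLogEvent {
  type: string;
  score?: number;
  taskCount?: number;
}

function checkEvalEvents(log: EvalLogEvent[]): string[] {
  const problems: string[] = [];
  // Only eval.* events take part in the ordering rules.
  const evalEvents = log.filter((e) => e.type.indexOf("eval.") === 0);

  const first = evalEvents[0];
  if (first === undefined || first.type !== "eval.started") {
    problems.push("eval.started must be the first eval.* event");
  }

  let completed: EvalLogEvent | undefined;
  let completedCount = 0;
  let scoredCount = 0;
  for (const e of evalEvents) {
    if (e.type === "eval.completed") {
      completed = e;
      completedCount += 1;
    } else if (e.type === "eval.scored") {
      scoredCount += 1;
      if (completedCount > 0) problems.push("eval.scored appeared after eval.completed");
      if (typeof e.score !== "number" || e.score < 0 || e.score > 1) {
        problems.push("eval.scored carries a score outside 0..1");
      }
    }
  }

  if (completedCount !== 1 || completed === undefined) {
    problems.push("eval.completed must appear exactly once");
  } else if (completed.taskCount !== scoredCount) {
    problems.push("eval.scored count must equal eval.completed.taskCount");
  }
  return problems;
}
```

An empty result means the log satisfies all four rules; returning messages rather than a boolean keeps a failing run's diagnosis readable.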
2026-05-31 (RFC 0083 — durable trigger bridge, the Active→Accepted behavioral gate) added `trigger-bridge-delivery.test.ts` (profile-gated on `openwop-trigger-bridge` derived from the live discovery doc — the §C delivery model via the `POST /v1/host/sample/trigger-bridge/deliver` seam + the test event-log seam: dedup→effectively-once `trigger.delivery.attempted{delivered}` (§C-1), retry-exhaustion→`{dead-lettered}` + `trigger.subscription.state.changed{toState:dead-lettered}` (§C-2 + RFC 0053), and the delivered run's `run.started.causationId` == the delivery id (§C / RFC 0040); both `trigger.*` events content-free; the always-on shape stays in `trigger-bridge-shape.test.ts`; new lib helper `src/lib/triggerBridge.ts`). 2026-05-31 (RFC 0087 — agent org-chart, the Active→Accepted behavioral gate) added two capability-gated behavioral scenarios (both gated on `agents.orgChart.supported`, black-box on the normative `/v1/agents/org-chart` surface — no new POST seam): `agent-org-chart-scoping.test.ts` (the `GET /v1/agents/org-chart` tree-shape — departments form an acyclic `parentDepartmentId` tree, members reference `host:<id>` roster entries — + the §D responsibility roll-up via `GET /v1/agents/org-chart/{departmentId}` with a deduped `responsibilities[]` union + the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ORG_CHART_DEPARTMENT_ID`) and `org-position-no-authority-escalation.test.ts` (the behavioral leg of the protocol-tier invariant — the live org-chart wire carries NO authority-bearing field on any member/department/responsibility-view object; the structural leg stays always-on in `agent-org-chart-shape.test.ts`, and the deeper RFC 0049/0051 authority-invariance legs stay reference-impl tier per the `agent-manifest-runtime` no-host-hook precedent). 
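The tree-shape assertion on `GET /v1/agents/org-chart` amounts to verifying that following `parentDepartmentId` from any department terminates at a root. A minimal sketch follows, assuming a hypothetical department shape in which a root has a missing or `null` parent, and additionally treating a dangling parent reference as invalid (the notes above only state acyclicity).

```typescript
// Hypothetical department shape; only `parentDepartmentId` is named in the notes.
interface DepartmentNode {
  departmentId: string;
  parentDepartmentId?: string | null;
}

function isAcyclicDepartmentTree(departments: DepartmentNode[]): boolean {
  const parentOf = new Map<string, string | null>();
  for (const d of departments) {
    parentOf.set(d.departmentId, d.parentDepartmentId ?? null);
  }

  for (const d of departments) {
    const seen = new Set<string>();
    let current: string | null = d.departmentId;
    // Walk upward until a root (null parent) is reached.
    while (current !== null) {
      if (seen.has(current)) return false; // revisited a node: cycle
      seen.add(current);
      const parent = parentOf.get(current);
      // Assumption: a parent id that names no known department is invalid.
      if (parent === undefined) return false;
      current = parent;
    }
  }
  return true;
}
```

The walk restarts from every department, which is quadratic in the worst case but simple to read; memoizing verified nodes would bring it down to linear time for large charts.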
2026-05-31 (RFCs 0086 + 0077 — the Active→Accepted behavioral gate) added four capability-gated behavioral scenarios so a non-steward host can be mechanically certified non-vacuously under `OPENWOP_REQUIRE_BEHAVIOR=true`: `agent-roster-attribution.test.ts` (RFC 0086 §B/§C; gated on `agents.roster.supported` — the normative `GET /v1/agents/roster` read shape + `total==roster.length`, the §C `roster.run.initiated`-before-`agent.invocation.started` ordering, the content-free payload backing `roster-attribution-no-content`, the durable work-item `triggerSubscriptionId`, and the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ROSTER_ID`), `agent-live-invocation-bracket.test.ts` (RFC 0077 §E; gated on `agents.liveRuntime.supported` — `agent.invocation.started`-first / `agent.invocation.completed`-last bracket, matching `invocationId`, `source`/`outcome` closed enums, content-free), `agent-live-structured-output.test.ts` (RFC 0077 §B step 6; gated on `agents.liveRuntime.structuredOutput` — a result violating `handoff.returnSchemaRef` fails the invocation `outcome:"failed"` rather than shipping as completed), and `agent-live-allowlist-enforced.test.ts` (RFC 0077 §F-1 / RFC 0002 §A14; gated on `agents.liveRuntime.supported` — a tool outside `toolAllowlist` is not callable); all four drive the documented `POST /v1/host/sample/roster/fire` + `POST /v1/host/sample/agents/live-invoke` seams plus the test event-log seam and soft-skip on 404 (these are the RFC 0086 / 0077 Active→Accepted bars). 
2026-05-30 (RFC 0087 — agent org-chart, Draft -> Active) added `agent-org-chart-shape.test.ts` (always-on server-free: the `capabilities.agents.orgChart` shape + the `AgentOrgChart` round-trip + the non-`host:` member negative + the **§B structural non-authority guarantee** — the schema rejects a `scopes`/`canDispatch`/`permissions`/`authority` field on a member (`additionalProperties:false`), and a member's key set is exactly `{rosterId, departmentId, roleId, reportsTo}` — backing the protocol-tier `org-position-no-authority-escalation` invariant; no new RunEventType). 2026-05-30 (RFC 0086 — standing agent roster, Draft -> Active) added `agent-roster-shape.test.ts` (always-on server-free: the `capabilities.agents.roster` shape + the `AgentRosterEntry` round-trip + the `host:` `rosterId` + `agentRef` version-XOR-channel negatives + the content-free `roster.run.initiated` negatives backing the protocol-tier `roster-attribution-no-content` invariant + the additive `roster` inventory projection + RunEventType-enum membership). 2026-05-30 (RFC 0082 — agent deployment lifecycle, Draft -> Active) added `agent-deployment-shape.test.ts` (always-on server-free: the `capabilities.agents.deployment` shape + the `AgentDeployment` record round-trip + the `AgentRef` `channel` XOR `version` `not`-clause + the four `deployment.*` payloads + the content-free negatives backing the protocol-tier `deployment-event-no-content-leak` invariant). 2026-05-30 (RFC 0085 — `openwop-agent-platform` meta-profile, Draft -> Active) added `agent-platform-profile.test.ts` (always-on server-free derivation of the operational-annex `none`/`partial`/`full` status: all-floor ⇒ partial, missing-flag ⇒ none, the replay-OR-`nondeterminismPolicy.declared` term, floor+governance ⇒ full, missing-tenant-scope ⇒ partial-not-full per the honest-advertisement rule, eval/deploy/budget-are-advisory-not-hard-terms, + the `capabilities.nondeterminismPolicy.declared` shape). 
2026-05-30 (RFC 0084 — budget, quota + cost policy, Draft -> Active) added `budget-policy-shape.test.ts` (always-on server-free: `budget-policy.schema.json` round-trip + the §A orthogonality guard — a wall-time field is rejected (it's RFC 0058's `runTimeoutMs`) — + threshold/onExhaustion negatives + the four content-free `budget.{reserved,consumed,threshold.crossed,exhausted}` payloads + the four `cap.breached{budget-*}` kinds + RunEventType-enum membership + the no-pricing-property structural check backing the protocol-tier `budget-no-pricing-leak` invariant + the `capabilities.budget`/`limits.maxBudget*` shape). 2026-05-30 (RFC 0083 — durable trigger + channel bridge, Draft -> Active) added `trigger-bridge-shape.test.ts` (always-on server-free: `trigger-subscription.schema.json` round-trip + missing-`state`/out-of-enum-`source`/unknown-property negatives + the four-state vocab + the two content-free `trigger.{subscription.state.changed,delivery.attempted}` payloads incl. closed `state`/`outcome` enums + RunEventType-enum membership + the `triggerBridge`/`webhooks.durable` capability shape + the `openwop-trigger-bridge` profile derivation incl. the no-dead-letter-sink negative). 2026-05-30 (RFC 0079 — credential provenance + egress policy, Draft -> Active) added `egress-provenance-shape.test.ts` (always-on server-free: `credential-provenance.schema.json` round-trip + `audiences:[]`/missing-`credentialId`/unknown-property negatives + the no-secret-property structural check backing the protocol-tier `egress-decision-no-secret-leak` invariant + the content-free `egress.decided` record incl. the `decision` enum + RunEventType-enum membership + the `httpClient.egressPolicy` shape; the behavioral `egress-credential-audience-bound` confused-deputy MUST is reference-impl tier, deferred to a host). 
2026-05-30 (RFC 0078 — portable tool catalog, Draft -> Active) added `tool-descriptor-shape.test.ts` (always-on server-free: `tool-descriptor.schema.json` round-trip + the §C-1 `exec` ⇒ `host-extension` cross-field MUST (RFC 0069) + the `safetyTier`-required negative + `additionalProperties:false`, the `capabilities.toolCatalog` `supported`/`sources`/`sessionLifecycle` shape, and the two content-free `tool.session.{opened,closed}` payload $defs incl. the closed `outcome` enum + RunEventType-enum membership). 2026-05-30 (RFC 0080 — agent memory capability reconciliation, Draft -> Active) added `memory-capability-model-shape.test.ts` (always-on server-free: the additive `capabilities.memory.{writable,search,retention}` dimension shapes + malformed-instance negatives — `retention.ttl` non-boolean, out-of-enum `search.modes`, unknown property under `additionalProperties:false` — the `agent-inventory-response` `memoryDegraded`/`degradedMemoryDimensions` closed-enum fields, and the `openwop-memory` derivation surfacing for read/write + long-term hosts while withholding from `writable:false`). 2026-05-30 (RFC 0081 — agent evaluation, Draft -> Active) added `agent-eval-suite-shape.test.ts` (always-on server-free: the `capabilities.agents.evalSuite` shape + the `AgentEvalSuite`/`EvalSummary` schema round-trips + the three `eval.{started,scored,completed}` payloads + the content-free negatives — a task entry with a `taskOutput` body, a `safetyFinding` with an `excerpt` — backing the new `eval-summary-no-content-leak` SECURITY invariant). 
2026-05-29 (RFC 0076 §B — `ctx.http.safeFetch` live-run audit) added `safefetch-live-audit.test.ts` (`behaviorGate('openwop-safefetch-live-audit', …)`, gated on `httpClient.safeFetch` + `toolHooks.prePostEvents`) — asserts the audit-when-both MUST against the **durable run event log** via the new `POST /v1/host/sample/http/safe-fetch-run` open seam + the test event-log seam, closing the seam-vs-production gap (a production `createSafeFetch()` with no audit hooks passes the inline `safefetch-behavior.test.ts` but FAILS this under `OPENWOP_REQUIRE_BEHAVIOR=true`); this is the RFC 0076 §B → Accepted bar; run seam soft-skips on 404 (host-pending). 2026-05-29 (RFC 0066 — `x-openwop-form` picker UX hints, Draft → Active) added `x-openwop-form-pack-manifest.test.ts` (always-on server-free: an annotated `configSchema` stays a valid 2020-12 schema + the advisory hints don't change what it accepts, each §A annotation matches the shape, an unknown `kind` validates for forward-compat, 3 negatives — missing/non-string `kind`, non-string `dependsOn`). 2026-05-29 (RFC 0076 §B — `ctx.http.safeFetch`) added `safefetch-behavior.test.ts` (seam-gated: SSRF block / DNS-rebinding / `Connection: upgrade` refusal / tool-hooks audit-when-both, via `POST /v1/host/sample/http/safe-fetch`; advertisement contract stays in `http-client-ssrf.test.ts`). 2026-05-29 (RFC 0076 §A — pack `runtime.requires[]` install gate) added two: `runtime-requires-shape.test.ts` (server-free closed-vocabulary validation — the 8 tokens validate, a raw builtin name is rejected, empty-array≡omission, `uniqueItems`) + `runtime-requires-install-gate.test.ts` (seam-gated install-grant / install-refuse → `pack_runtime_requirement_unmet` / non-sandbox SHOULD-projection, soft-skip on 404 via `POST /v1/host/sample/packs/install-gate`). 
2026-05-29 (RFC 0047 — `host.oauth` authorization-code roundtrip) added `oauth-authorization-code-roundtrip.test.ts` — capability-gated on `capabilities.oauth.supported` + `grants` including `authorization_code`; drives the `POST /v1/host/sample/oauth/authorize-code-roundtrip` seam against the one canonical synthetic provider in `fixtures/oauth-providers/synthetic.json` (soft-skip on 404, Tier-2 host-pending), asserting a successful grant returns a credential REFERENCE (token persisted as a `host.credentials` entry) and that the authorization code / state / PKCE verifier / acquired access+refresh tokens never appear on any run-visible surface (RFC 0047 §C + §C.2 / `credential-payload-redaction`). Closes the RFC 0047 Tier-2 gap (capability-shape + redaction scenarios existed; the actual authorization-code dance was unexercised). 2026-05-26 (RFC 0070 — agent-manifest runtime) added `agent-manifest-runtime.test.ts`; 2026-05-26 (RFC 0071 — artifact-type + chat card packs) added six: `artifact-type-pack-manifest-validation.test.ts` + `artifact-schema-compile-bounded.test.ts` (server-free) + `artifact-type-pack-install.test.ts` + `artifact-type-store-without-render.test.ts` + `chat-card-pack-manifest-validation.test.ts` (server-free) + `chat-card-pack-execution.test.ts` (capability-gated, host-pending). 
2026-05-26 (RFCs 0067 / 0068 / 0069 — spec-gap Draft cohort) added five scenarios: `byok-auth-modes.test.ts` (RFC 0067; always-on schema-shape of `aiProviders.authModes` + a discovery-gated §B auth-mode-contract cross-field check), `memory-consolidation-shape.test.ts` (RFC 0068; always-on shape of `agents.memoryConsolidation`/`agents.commitments` + the `agent.memory.consolidated`/`commitment.fired` payload $defs), `memory-consolidation-idempotent.test.ts` + `commitment-fired.test.ts` (RFC 0068; capability-gated behavioral, soft-skip on the documented `/v1/host/sample/memory/consolidate` + `/commitment/fire` seams), and `exec-not-protocol-tier.test.ts` (RFC 0069; always-on server-free structural assertion that the protocol corpus defines no `core.*`/`openwop.*` exec-class primitive — backs the `exec-must-not-be-protocol-tier` SECURITY invariant). 2026-05-25 (RFC 0061 — stateful agent-loop lifecycle, executionModel.version 5) added four `agent-loop-*.test.ts` scenarios: `-version5-shape` (always-on; validates `executionModel.statefulResume`/`transcriptWindow` + the 1–5 version ceiling) plus `-iteration-monotonic` (gated on `version >= 5`; `runOrchestrator.decided.iteration` increments 1,2,3… exactly once per turn), `-workspace-snapshot` (gated additionally on `host.workspace.supported`; a turn-i workspace write is invisible to turn i, visible to turn i+1), and `-stateful-resume` (gated on `statefulResume`; a mid-loop suspend resumes at the same iteration without resetting the counter) — the three behavioral scenarios drive the documented agent-loop seam (`POST /v1/host/sample/agentloop/run`) and soft-skip until a host wires it. 
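The `-iteration-monotonic` rule noted above (the iteration reads 1, 2, 3… exactly once per turn) can be sketched as a pure check over a recorded event sequence. This is a minimal illustration, not the suite's implementation: the `DecidedEvent` shape and the `isMonotonicFromOne` helper are hypothetical names, and only the `iteration` field is taken from the notes.

```typescript
// Hypothetical event shape; only `iteration` comes from the scenario notes.
interface DecidedEvent {
  iteration: number;
}

// True when the recorded iterations read exactly 1, 2, 3, ... with no gaps,
// repeats, or resets (an empty log is trivially monotonic).
function isMonotonicFromOne(events: DecidedEvent[]): boolean {
  return events.every((event, index) => event.iteration === index + 1);
}
```

A repeated or reset counter (for example after a suspend/resume) fails the check, which is the failure mode the `-stateful-resume` scenario description is concerned with.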
2026-05-25 (RFC 0059 — host.workspace M2, reference-host enforcement) added two `workspace-*.test.ts` scenarios: `-behavior` (capability-gated CRUD round-trip / `If-Match` 409 `workspace_conflict` / `workspace_too_large` / §D run-start snapshot, all via the real `/v1/host/workspace/files` §C endpoints) and `-cross-tenant-isolation` (WCT-1 — drives the documented `POST /v1/host/sample/workspace/op` seam to assert a file owned by one `{tenant, workspace}` is unreadable, on both `get` and `list`, under a different owner; backs the new `workspace-cross-tenant-isolation` SECURITY invariant). The in-memory reference host now advertises `capabilities.workspace.supported` and honors §C/§D/§E end-to-end. 2026-05-25 (RFC 0062 — memory.distillation "dreams") added five `distillation-*.test.ts` scenarios: `-shape` (always-on; validates the `capabilities.memory.distillation` block + the additive `distillation` sub-object on `memory.compacted`) plus `-token-budget` (within budget `tokensUsed ≤ tokenBudget`; an un-meetable budget → `token_budget_exceeded` with no partial archive), `-stable-archive` (same sources + budget ⇒ byte-stable archive checksum), `-index-roundtrip` (gated additionally on `indexEmitted`; the `MEMORY-INDEX.json` workspace file is retrievable + `workspace.updated` fired), and `-secret-carryforward` (SR-1: a redacted source secret never appears in the archive) — the four behavioral scenarios drive the documented memory-distillation seam (`POST /v1/host/sample/memory/distill`) and soft-skip until a host wires it. 
2026-05-25 (RFC 0063 — core.subWorkflow.outputAttestation) added four `subrun-*.test.ts` scenarios: `-attestation-shape` (always-on; validates the `capabilities.agents.subRunAttestation` flag) plus `-checksum-stable` (the child output checksum is the byte-stable, key-order-invariant RFC 8785 JCS + SHA-256 digest), `-approval-gate` (`requireApproval` → `accept` merges, `reject` does not), and `-approval-fail-closed` (no `accept`/`edit-accept` → no merge; backs the deferred `subrun-merge-approval-fail-closed` invariant) — the three behavioral scenarios drive the documented sub-run attestation seam (`POST /v1/host/sample/subrun/attest`) and soft-skip until a host wires it. 2026-05-25 (RFC 0064 — host.toolHooks) added five `tool-hooks-*.test.ts` scenarios: `-shape` (always-on; validates the `capabilities.toolHooks` block + the optional content-free fields on `agentToolCalled` / `agentToolReturned`) plus `-content-free` (gated on `prePostEvents`), `-authorization-fail-closed` (gated on `perToolAuthorization`), `-rate-limit` (gated on `perToolRateLimit`), and `-secret-redaction` (gated on `prePostEvents` + the SR-1 `argsHash` redaction rule) — the four behavioral scenarios drive the documented tool-hooks invoke seam (`POST /v1/host/sample/toolhooks/invoke`) and soft-skip until a host wires it. 2026-05-25 (RFC 0060 — host.heartbeat) added four `heartbeat-*.test.ts` scenarios: `-capability-shape` (always-on; validates the `capabilities.heartbeat` block) plus `-fires-once-per-tick`, `-idempotent-no-spam`, and `-runtime-bound` (gated on `capabilities.heartbeat.supported` + the host heartbeat tick seam; soft-skip until a host wires it). 
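The `-checksum-stable` property above (a key-order-invariant digest) can be illustrated with a simplified canonicalization followed by SHA-256. This is a sketch in the spirit of RFC 8785, assuming plain JSON input; it has not been checked against the RFC's test vectors, and `canonicalize` / `outputChecksum` are hypothetical helper names, not functions from the conformance package.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Simplified canonical form: object keys sorted by UTF-16 code units,
// no insignificant whitespace, primitives serialized via JSON.stringify.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  const members = keys.map((key) => JSON.stringify(key) + ":" + canonicalize(value[key] as Json));
  return "{" + members.join(",") + "}";
}

// Hex-encoded SHA-256 over the UTF-8 bytes of the canonical form.
function outputChecksum(value: Json): string {
  return createHash("sha256").update(canonicalize(value), "utf8").digest("hex");
}
```

Two outputs that differ only in key order canonicalize to the same string, so their checksums match; any change in a value changes the digest.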
2026-05-25 (RFC 0057 — memory write-attribution) added five `memory-attribution-*.test.ts` scenarios: `-shape` (always-on advertisement check on `capabilities.memory.attribution`), plus `-no-content`, `-tenant-scoped`, `-emits-on-write`, and `-replay-stable` (gated on `capabilities.memory.attribution.emitsWriteEvents`) verifying the content-free `memory.written` RunEvent, its two SECURITY invariants (`memory-attribution-no-content` + `memory-attribution-tenant-scoped`), and the §D replay rule that a `replay`-mode fork MUST NOT regenerate `memoryId`. 2026-05-25 (RFC 0025 §C point 1 — test-catalog isolation invariant; pairs with the 25 publish-error scenarios in `pack-registry-publish.test.ts`) added `pack-registry-isolation.test.ts` — capability-gated on `capabilities.packs.testMode.{supported, isolated}: true`; PUTs a disposable pack into `/v1/packs-test/{name}` and asserts the same `(name, version)` does NOT appear via `GET /v1/packs/{name}` — anchors the test-catalog isolation MUST in RFC 0025 §C. 2026-05-25 (RFC 0028 Tier-2 post-promotion T2 — read-side sister scenario for workspace-membership enforcement) added `prompt-read-workspace-membership-enforced.test.ts` — gates on `capabilities.prompts.supported: true` (broader than `mutableLibrary` so read-only hosts that expose `?workspaceId=` are also probed); drives `GET /v1/prompts?workspaceId=<random-non-member>` and interprets the response: 4xx PASS (canonical envelope check on 403); 200 with empty `templates[]` PASS (correct null result for a nonexistent workspace); 200 with non-empty `templates[]` FAIL (cross-tenant leak); 200 without `templates[]` field SKIP (host doesn't expose workspace-scoped reads). Verifies SECURITY invariant `prompt-read-workspace-membership-enforced`. Same-day T1 strengthened `prompt-mutation-workspace-membership-enforced.test.ts` to pin `error === "workspace_membership_required"` when the host's refusal status is 403 (other refusal codes unconstrained). 
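The response interpretation described above for `prompt-read-workspace-membership-enforced.test.ts` can be sketched as a small classifier. This is an illustration under stated assumptions, not the scenario's code: `ProbeResponse`, `Verdict`, and `classifyWorkspaceRead` are hypothetical names, and the handling of statuses the notes do not mention (anything other than 4xx or 200) is an assumption of this sketch.

```typescript
type Verdict = "pass" | "fail" | "skip";

// Hypothetical minimal view of the HTTP response the probe inspects.
interface ProbeResponse {
  status: number;
  body: unknown;
}

function hasTemplatesArray(body: unknown): body is { templates: unknown[] } {
  return (
    typeof body === "object" &&
    body !== null &&
    Array.isArray((body as { templates?: unknown }).templates)
  );
}

function classifyWorkspaceRead(res: ProbeResponse): Verdict {
  // Any 4xx refusal is a pass.
  if (res.status >= 400 && res.status < 500) return "pass";
  if (res.status === 200) {
    // No `templates[]` field: the host does not expose workspace-scoped reads.
    if (!hasTemplatesArray(res.body)) return "skip";
    // Empty list is a correct null result; a non-empty list is a cross-tenant leak.
    return res.body.templates.length === 0 ? "pass" : "fail";
  }
  // Assumption: statuses not covered by the notes are treated as inconclusive.
  return "skip";
}
```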
2026-05-25 (RFC 0028 Tier-2 follow-up — workspace-membership enforcement on mutating prompt endpoints, filed in response to a self-disclosed adopter vulnerability) added `prompt-mutation-workspace-membership-enforced.test.ts` — capability-gated on `capabilities.prompts.mutableLibrary: true`; drives `POST /v1/prompts` with a cryptographically-random non-member `workspaceId` and asserts the host refuses (NOT a 2xx; any 4xx/5xx is acceptable — silent success is the failure mode). Verifies SECURITY invariant `prompt-mutation-workspace-membership-enforced`. 2026-05-22 (RFC 0034 §B follow-up — secret-leakage harness against the OTel + debug-bundle seams) added `secret-leakage-otel-attribute.test.ts` — gates on `capabilities.secrets.supported` + `capabilities.observability.testSeams.{otelScrape,debugBundleExport}` AND the `OPENWOP_CANARY_SECRET_VALUE` env (host operator + conformance runner agree on the canary). Drives the existing `openwop-smoke-byok-roundtrip` fixture end-to-end; scrapes both seams after run completion; hard-fails if the canary plaintext appears in any OTel span attribute or debug-bundle field. Verifies SECURITY invariants `secret-leakage-otel-attribute` + `secret-leakage-debug-bundle-otel`. 2026-05-22 (RFC 0041 Phase 4 — replay determinism under nondeterministic models) added three scenarios: `replay-divergence-at-refusal.test.ts` (advertisement-shape probe on `replayDeterminism.refusalDivergenceEmission` + 2 `it.todo` for the dual-direction refusal-divergence case), `replay-observable-sequence-determinism.test.ts` (capability-gated; behavioral assertion soft-skipped until a `conformance-phase4-nondet-tool` fixture ships), `replay-llm-cache-key-portable.test.ts` (intra-host reproducibility + non-recipe-field invariance + Phase 4 advertisement alignment — reuses the existing `POST /v1/host/sample/test/llm-cache-key` seam from the sibling `replay-llm-cache-key.test.ts`). 
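The canary check described above for `secret-leakage-otel-attribute.test.ts` (hard-fail if the canary plaintext appears in any span attribute or debug-bundle field) amounts to a recursive scan of the scraped payload. The following is a minimal sketch, assuming the scraped data is already parsed JSON; `findCanary` and the sample payload layout are hypothetical and only string values (not keys) are inspected.

```typescript
// Returns the JSON-style paths of every string value that contains the canary.
function findCanary(value: unknown, canary: string, path: string = "$"): string[] {
  const hits: string[] = [];
  if (typeof value === "string") {
    if (value.indexOf(canary) !== -1) hits.push(path);
  } else if (Array.isArray(value)) {
    value.forEach((item, index) => {
      hits.push(...findCanary(item, canary, `${path}[${index}]`));
    });
  } else if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      hits.push(...findCanary(record[key], canary, `${path}.${key}`));
    }
  }
  return hits;
}
```

A scenario built on this would fail when the returned list is non-empty, and the paths give the operator a starting point for locating the leaking attribute.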
2026-05-20 (RFC 0027 §A templateKinds-coverage follow-up — paired with `prompt-end-to-end-events.test.ts`) added `prompt-all-four-kinds-events.test.ts` exercising all four `PromptKind` values (`system`, `user`, `schema-hint`, `few-shot`) end-to-end through the reference workflow-engine sample's `local.sample.demo.mock-ai` dispatch path; capability-gated via `behaviorGate('prompts-supported', ...)`. Closes the credibility gap where the host advertised `templateKinds: ["system", "user", "few-shot", "schema-hint"]` but only the system+user pair was actually wired into dispatch. 2026-05-20 (RFCs 0030–0033 — envelope LLM-contract-hardening track) added 15 scenarios across four `Active` RFCs: `envelope-reasoning-shape.test.ts` (RFC 0030, always-on; asserts the OPTIONAL `reasoning` property on the three universal-kind schemas + the `schema.response` deliberate omission), `envelope-reasoning-secret-redaction.test.ts` (RFC 0030, capability-gated on `capabilities.envelopes.reasoning.supported` + `secrets.supported`; 5 `it.todo()` placeholders for SECURITY invariant `envelope-reasoning-secret-redaction`), `envelope-tier-one-subset-static.test.ts` (RFC 0030, always-on for load-bearing rules — no `oneOf` / `allOf` / `not` / `prefixItems` / `propertyNames` anywhere; gated on `tierOneSubsetCompliance: "strict"` for OpenAI-strict-only constraints), `envelope-variant-discriminator-static.test.ts` (RFC 0031, always-on; asserts no `oneOf` + every `anyOf` branch declares a single-string-enum discriminator in `required` on every `schemas/envelopes/*.schema.json`), `model-capability-substituted.test.ts` (RFC 0031, advertisement-shape probe on `capabilities.modelCapabilities.advertised[]` identifier pattern + 5 `it.todo()` placeholders for SECURITY invariant `model-capability-substituted-no-credential-disclosure`), `model-capability-insufficient.test.ts` (RFC 0031, 6 `it.todo()` placeholders for refusal + no-recursive-fallback), `node-module-required-capabilities-shape.test.ts` (RFC 0031 
SHOULD-tier authoring-convention; 4 `it.todo()` placeholders), and the six envelope-reliability events from RFC 0032 (`envelope-retry-attempted` carrying the shared advertisement-shape probe enforcing both MUST-tier events in `events[]` per RFC 0032 §C, plus `envelope-retry-exhausted`, `envelope-refusal-shape`, `envelope-truncated`, `envelope-nl-to-format-engaged`, `envelope-recovery-applied` — collectively 39 `it.todo()` placeholders covering retry/refusal/truncation/recovery + SECURITY invariants `envelope-refusal-no-prompt-leak` and `envelope-recovery-no-content-leak`), plus RFC 0033's two scenarios (`envelope-completion-distinguishes-truncation.test.ts` + `envelope-truncation-cap-exhaustion.test.ts` — 12 `it.todo()` placeholders covering the truncation-vs-schema-violation retry-routing distinction + the DoS-bound assertion). Reference workflow-engine sample advertises `capabilities.envelopes.reasoning: { supported: true, promptDirective: "off" }` + `tierOneSubsetCompliance: "warn"` honestly (schemas accept the field; host doesn't yet inject the directive); the other three RFCs' capability blocks defer to reference-host emission code per the staged RFC 0027 §G precedent. 2026-05-20 (RFC 0028 §B Phase B — prompt-pack boot-time install) added `prompt-pack-install.test.ts` (capability-gated on `capabilities.prompts.endpointsSupported: true`; asserts a host that ran the boot-time pack loader surfaces ≥ 1 pack-source template under `GET /v1/prompts?source=pack` carrying the canonical `meta.source: "pack"` + `meta.packName` + `meta.packVersion` stamps; positively identifies the in-tree `vendor.openwop.prompt-sample` reference pack's `writer-system` template when present). Pairs with the new `host/promptPackLoader.ts` boot-time entry on the reference workflow-engine sample, which scans `examples/packs/*` plus `OPENWOP_PROMPT_PACKS_DIR` and calls `installPackTemplates()` for each `kind: "prompt"` pack found. 
2026-05-20 (RFC 0029 Phase C — prompt resolution chain wire shape) added three more scenarios: `prompt-resolution-chain-node-wins.test.ts` (capability-gated on `capabilities.prompts.supported: true`; asserts layer-1 node-config supersedes lower layers per `spec/v1/prompts.md` §"Resolution chain (normative)"), `prompt-resolution-chain-agent-intrinsic.test.ts` (additionally gated on `capabilities.prompts.agentBindings: true`; asserts agent intrinsic `systemPromptRef` wins over `promptOverrides` AND lower layers when the node has no layer-1 ref), `prompt-resolution-chain-fallback-cascade.test.ts` (asserts layer 3 workflow-defaults wins over layer 4 host-defaults; layer 4 host-defaults wins when 1-3 yield null; resolved is null when all four yield null but chain[] still lists every attempted layer). The scenarios drive the host's `POST /v1/host/sample/prompt/resolve` test seam (reference-host implementation deferred to follow-up slice per RFC 0021 staging precedent). 2026-05-20 (RFC 0027 Phase A — prompt templates wire shape) added three scenarios: `prompt-template-shape.test.ts` (always-on; Ajv compileability + positive/negative round-trip for PromptTemplate + PromptRef + PromptKind), `prompt-composed-secret-redaction.test.ts` (capability-gated on `capabilities.prompts.supported: true` + `observability: "full"`; asserts `[REDACTED:<secretId>]` markers in `prompt.composed` payloads for `source: "secret"` variable bindings per SECURITY/threat-model-secret-leakage.md §SR-1), `prompt-composed-trust-marker.test.ts` (same capability gates; asserts `<UNTRUSTED>...</UNTRUSTED>` wrapping + `contentTrust: "untrusted"` propagation per RFC 0020 §D). Paired with new `fixtures/prompt-templates/` sub-directory + per-fixture schema-validity describe block + future SECURITY invariants `prompt-composed-secret-redaction` and `prompt-composed-trust-marker` (lands alongside reference-host emission per RFC 0021 staging precedent). 
2026-05-18 (RFC 0022 `Draft` — runtime variable mapping) added four `it.todo()` placeholder scenarios covering the new mapping surfaces on `core.dispatch` (§A — `dispatch-input-mapping.test.ts`, `dispatch-output-mapping.test.ts`, `dispatch-cross-worker-handoff.test.ts`) and `core.subWorkflow` (§B — `subworkflow-input-mapping.test.ts`). Gated on `capabilities.agents.dispatchMapping` (dispatch trio) and `capabilities.subWorkflow.inputMapping` (subWorkflow). Promote to live assertions when RFC 0022 reaches `Active` + a reference host advertises the matching flags. 2026-05-17 (RFC 0003 §D handoff-schema enforcement, HV-1) added `agentPackHandoffSchemaValidation.test.ts` — verifies the host validates dispatch payloads against `handoff.taskSchemaRef` AND return payloads against `handoff.returnSchemaRef` per RFC 0003 §D. Paired with the new `agent-pack-handoff-schema-enforcement` row in `SECURITY/invariants.yaml`. 2026-05-17 (AI Envelope gap-closure, DRAFT v1.x — `spec/v1/ai-envelope.md`) added 7 advertisement-shape scenarios with `it.todo()` behavioral placeholders gated on `capabilities.envelopeContracts.advertised: true`: `aiEnvelope.universalKinds.test.ts`, `aiEnvelope.schemaDrift.test.ts`, `aiEnvelope.correlationReplay.test.ts`, `aiEnvelope.contractRefusal.test.ts`, `aiEnvelope.trustBoundaryPropagation.test.ts`, `aiEnvelope.redaction.test.ts`, `aiEnvelope.capBreached.test.ts`. Paired with the new `envelope-redaction-sr-1-carry-forward` row in `SECURITY/invariants.yaml`. 2026-05-17 (post-publish hardening, deep audit of `core.openwop.agents`) added `agents-run-tool-allowlist.test.ts` — server-free scenario locking in the `core.openwop.agents@1.0.1` safety-fix that closes `OPENWOP-AUDIT-2026-003` (function-typed `tool.handler` properties rejected at `validateTools()` with `INVALID_TOOL_DECLARATION`; tool-driven runs require `ctx.agentRuntime`; tool-less safe fallback preserved). Paired with the new `agents-run-no-raw-handler` row in `SECURITY/invariants.yaml`. 
Same-day post-publish hardening added `idempotency-key-determinism.test.ts` — server-free scenario locking in the `core.openwop.http@1.1.2` determinism safety-fix (default `composite` mode produces deterministic keys in `(runId, nodeId, payload)`; removed `uuid` mode rejects with `CONFIG_INVALID`; cross-impl vector test lets third-party reimplementations verify wire agreement). Paired with the new `idempotency-key-deterministic` row in `SECURITY/invariants.yaml`. 2026-05-17 (Phase 3 of RFC 0013) added three server-free scenarios exercising the reference workflow-chain expansion library (`conformance/src/lib/workflow-chain-expansion.ts`): `workflow-chain-expansion.test.ts` (parameter substitution + node id collision avoidance + edge rewriting + capability propagation + runtime-invariance contract), `workflow-chain-unresolvable-typeid.test.ts` (rejection with `chain_unresolvable_typeid` when a chain references an unknown typeId), and `workflow-chain-pack-signature-verification.test.ts` (Ed25519 verification recipe reuse from `node-packs.md §Signing`). Earlier that day (Phase 1) added `workflow-chain-pack-manifest-validation.test.ts` — server-free schema-validation scenario covering the new `workflow-chain-pack-manifest.schema.json` (positive sample + two negatives: kind/contents mismatch and invalid `chainId`). Closes RFC 0013 (`Workflow-chain packs`, `Draft`) Phases 1 + 3 alongside the new `spec/v1/workflow-chain-packs.md`, the `Capabilities.workflowChainPacks` block, and the registry build-index/conformance-check `kind` routing from Phase 2. Earlier that day, the suite added 27 `it.todo()` placeholder scenarios paired with RFCs 0014-0020 (host capability surfaces — fs, kvStorage, tableStorage, queueBus, sql/vector/search, blob/cache, mcp.serverMount). These promote to live assertions when each RFC reaches `Active` + the matching capability block lands in `schemas/capabilities.schema.json` + a reference host advertises the capability. 
Earlier additions include 18 Multi-Agent Shift scenarios (Phases 1-5) added 2026-05-10, the `registry-public.test.ts` public-registry healthcheck added 2026-05-11 (opt-in via `OPENWOP_TEST_PUBLIC_REGISTRY=true`), the `replay-llm-cache-key.test.ts` placeholder added 2026-05-11 (three `it.todo()` cases for the cross-host LLM cache-key recipe per `replay.md` §"LLM cache-key recipe"), the two `production-*.test.ts` scenarios added 2026-05-11 for the `openwop-production` profile per RFC 0009 (`production-backpressure.test.ts`, `production-retention-expiry.test.ts`), the four `auth-*.test.ts` scenarios added 2026-05-11/12 for the production-auth profiles per RFC 0010 (`auth-api-key-rotation.test.ts`, `auth-oauth2-client-credentials.test.ts`, `auth-oidc-user-bearer.test.ts`, `auth-mtls.test.ts` (opt-in via `OPENWOP_TEST_MTLS=1`)), `replay-retention-expiry.test.ts` added 2026-05-12 (capability shape + 410/422 envelope per `replay.md` §"Retention and garbage collection"), `bulk-cancel.test.ts` added 2026-05-12 (Phase B close-out of R1 — `POST /v1/runs:bulk-cancel`), the two Phase H launch-blocker advertisement-contract scenarios added 2026-05-12 (`mcp-toolcall-redaction.test.ts` for the MCP-1 invariant per `host-capabilities.md §host.mcp` + `threat-model-prompt-injection.md §UNTRUSTED`, and `http-client-ssrf.test.ts` for the SSRF + body-size cap advertisement contract on `capabilities.httpClient`), the `wasm-pack-abi-version-rejection.test.ts` Track 7 scenario added 2026-05-12 for the ABI-mismatch positive path via the `vendor.openwop.misbehaving-abi` pack per RFC 0008 §H, the `otel-trace-propagation-subworkflow.test.ts` Track 11 close-out added 2026-05-13 (parent + child run spans share the inbound traceparent's traceId across the `core.subWorkflow` dispatch boundary), and the three RFC 0012 (Memory Compaction Profile, `Active`) scenarios added 2026-05-13/14: `memory-compaction-sr1-carry-forward.test.ts` (load-bearing SR-1 §D), `memory-compaction-event-emitted.test.ts` 
(canonical §B payload shape), and `memory-compaction-provenance-tag.test.ts` (soft assertion on §C `compacted-from:<id>` convention). All three gate on `capabilities.memory.compaction.supported` + the host's test seam at `/v1/test/memory/{seed,compact}` (Postgres reference host enables both via `OPENWOP_MEMORY_COMPACTION=true OPENWOP_TEST_TRIGGER_COMPACTION=true`). 2026-05-15 (gap-closure CF-3) added `interrupt-token-matrix.test.ts` (malformed / unknown / replay / cross-run-id paths on `GET|POST /v1/interrupts/{token}`). 2026-05-31 (RFC 0078 portable tool catalog + RFC 0079 credential provenance / egress policy — the Active→Accepted behavioral gate) added four: `tool-catalog-projection.test.ts` (capability-gated on `toolCatalog.supported` via `behaviorGate('openwop-tool-catalog', …)` — the NORMATIVE `GET /v1/tools` list with each `ToolDescriptor` schema-valid + `source`/`safetyTier` in the closed vocab + content-free, `GET /v1/tools/{toolId}` round-trip + unknown-id 404, 401-unauthenticated, and the §F-2 cross-principal non-disclosure; black-box, no POST seam), `tool-session-lifecycle.test.ts` (gated on `toolCatalog.sessionLifecycle` — the §D `tool.session.opened`-before / `tool.session.closed`-after bracket over the RFC 0064 call events via the `POST /v1/host/sample/tools/session-run` seam, one shared `sessionId`, content-free), `egress-audience-binding.test.ts` (KEYSTONE — gated on `httpClient.egressPolicy.supported`; the §C confused-deputy MUST via `POST /v1/host/sample/egress/decide`: an out-of-audience egress is denied/downgraded with the credential NOT attached, a provenance-unevaluable egress fails closed — the behavioral leg of `egress-credential-audience-bound`), and `egress-decision-content-free.test.ts` (the SR-1 canary — the credential value never surfaces in `egress.decided` and `reason` stays in the CLOSED vocabulary). 
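The audience-binding rule exercised by `egress-audience-binding.test.ts` (an out-of-audience egress never carries the credential, and unevaluable provenance fails closed) can be sketched as a decision function. This is a hedged illustration only: the `credentialId` and `audiences` field names and the `allowed`/`denied` values appear in the coverage notes, but `EgressOutcome`, `decideEgress`, exact-host matching of audiences, and the omission of the `downgraded` branch are assumptions of this sketch.

```typescript
type EgressDecision = "allowed" | "denied";

// Field names follow the coverage notes; the audience format (bare host) is assumed.
interface CredentialProvenance {
  credentialId: string;
  audiences: string[];
}

// Hypothetical result type for this sketch.
interface EgressOutcome {
  decision: EgressDecision;
  credentialAttached: boolean;
}

function decideEgress(
  provenance: CredentialProvenance | undefined,
  destinationHost: string,
): EgressOutcome {
  // Fail closed: missing or empty provenance cannot be evaluated.
  if (provenance === undefined || provenance.audiences.length === 0) {
    return { decision: "denied", credentialAttached: false };
  }
  // Assumption: an audience matches only on exact host equality.
  const inAudience = provenance.audiences.some((audience) => audience === destinationHost);
  return inAudience
    ? { decision: "allowed", credentialAttached: true }
    : { decision: "denied", credentialAttached: false };
}
```

The property worth noting is that no code path returns `credentialAttached: true` without a positive audience match, which is the confused-deputy guard the scenario is described as asserting.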
The maintained scenario-to-spec map lives in [`coverage.md`](./coverage.md); this README keeps the operator quickstart and the historical scenario notes below.
 
  High-level coverage includes:
 
@@ -172,7 +172,7 @@ Server-required (added in 1.7.0):
  |---|---|---|
  | **Redaction** | [`capabilities.md`](../spec/v1/capabilities.md) §"Secrets" + NFR-7 + §"aiProviders" | Vendor-neutral assertions that the server doesn't leak secret material. Three scenario groups: (a) discovery shape contract — `secrets` + `aiProviders` advertisements are well-formed regardless of `secrets.supported`; when `supported === true`, scopes MUST be non-empty + `resolution === 'host-managed'`; `byok ⊆ supported`. (b) bearer-token redaction — invalid Bearer canary in `Authorization` header is not echoed in the 401 response body. (c) credentialRef echo control — gated on `secrets.supported === true`; canary planted in `configurable.ai.credentialRef` MUST NOT appear in any RunEvent payload (poll-based capture; transport-agnostic). Uses runtime-built canary fixtures (`lib/canaries.ts`) that defeat static secret scanners. 6 scenarios. |
 
- Current source tree: 322 scenario files. Use [`coverage.md`](./coverage.md) for current grade/gap tracking.
+ Current source tree: 323 scenario files. Use [`coverage.md`](./coverage.md) for current grade/gap tracking.
 
  ## Remaining Gaps
 
package/coverage.md CHANGED
@@ -58,7 +58,7 @@
  | Credential provenance + egress policy (RFC 0079 — `spec/v1/host-capabilities.md` §"Credential provenance + egress policy") | `egress-provenance-shape.test.ts` | A (always-on, server-free shape probe; doubles as the public test for `egress-decision-no-secret-leak`) | RFC 0079 promoted Draft → Active 2026-05-30 (4 UQs resolved via MyndHyve review). Always-on shape probe asserts `credential-provenance.schema.json` round-trips a conforming `CredentialProvenance` + rejects `audiences:[]` / missing `credentialId` / unknown property, the descriptor + `egress.decided` declare NO secret-value property (the content-free **`egress-decision-no-secret-leak`** protocol-tier invariant), the `egress.decided` payload validates a content-free record + enforces the `decision` enum + required `decision`/`destination`, and `capabilities.httpClient.egressPolicy` is declared. **The behavioral audience-binding MUST-NOT (`egress-credential-audience-bound`) is reference-impl tier** at Draft→Active — a credential bound to audience A on an egress to B must be `denied`/`downgraded` (never `allowed`-with-credential), fail-closed on unevaluable provenance — and lands in the gated `egress-audience-binding.test.ts` + `egress-decision-content-free.test.ts` (soft-skip until a host wires `egressPolicy` over `safeFetch`). Path to `Accepted`: a reference host enforces §C + the binding scenario passes → `egress-credential-audience-bound` graduates protocol-tier (RFC 0035 precedent). |
  | Durable trigger + channel bridge (RFC 0083 — `spec/v1/trigger-bridge.md`, `spec/v1/profiles.md` §`openwop-trigger-bridge`) | `trigger-bridge-shape.test.ts` | A (always-on, server-free shape probe) | RFC 0083 promoted Draft → Active 2026-05-30 (5 UQs resolved via MyndHyve review). Always-on shape probe asserts `trigger-subscription.schema.json` round-trips a conforming `TriggerSubscription` + rejects missing-`state`/out-of-enum-`source`/unknown-property, the four-state vocab (`active`/`paused`/`failed`/`dead-lettered`) is stable, the two content-free `trigger.{subscription.state.changed,delivery.attempted}` payloads validate + enforce the `state`/`outcome` enums + RunEventType-enum membership, `capabilities.triggerBridge` + `webhooks.durable` are declared, and `deriveProfiles` surfaces `openwop-trigger-bridge` for bridge+sink+durable-source while withholding it with no dead-letter sink. **Behavioral scenario deferred** per RFC 0083 §Conformance: `trigger-bridge-delivery.test.ts` (dedup → retry → dead-letter → trigger→run causation) is profile-gated on `openwop-trigger-bridge` and soft-skips until a reference host wires durable delivery. Path to `Accepted`: a host wires the state machine + delivery loop + the scenario passes. |
  | Budget, quota + cost policy (RFC 0084 — `spec/v1/budget-policy.md`) | `budget-policy-shape.test.ts` | A (always-on, server-free shape probe; doubles as the public test for `budget-no-pricing-leak`) | RFC 0084 promoted Draft → Active 2026-05-30 (5 UQs resolved via MyndHyve review). Always-on shape probe asserts `budget-policy.schema.json` round-trips a conforming `BudgetPolicy` + enforces the §A/§E orthogonality guard (a wall-time field is rejected — that's RFC 0058's `runTimeoutMs`) + threshold/onExhaustion negatives, the four content-free `budget.{reserved,consumed,threshold.crossed,exhausted}` payloads validate + enforce the `dimension`/`scope` enums, the four `cap.breached{budget-tokens,budget-cost,budget-tool-calls,budget-retries}` kinds + the four `budget.*` RunEventType-enum entries are present, the payloads declare no pricing/credential property (the **`budget-no-pricing-leak`** protocol-tier invariant), and `capabilities.budget` + `limits.maxBudget{Tokens,CostUsd}` are declared. **Behavioral scenario authored** (2026-06-01; see §"Capability-gated scenarios"): `budget-enforcement.test.ts` (accrue → threshold → exhaust → `cap.breached{budget-cost}` → `run.failed{budget_exhausted}`; `budget_model_denied`; advisory no-stop) gates on `budget.supported` via `behaviorGate('openwop-budget-enforcement', …)` + the `POST /v1/host/sample/budget/run` seam and soft-skips until a reference host wires accounting. Path to `Accepted`: a host wires the budget accumulator + the exhaustion stop + the scenario passes non-vacuously (MyndHyve `budget`). |
- | `openwop-agent-platform` meta-profile (RFC 0085 — `spec/v1/agent-platform-profile.md`, operational annex) | `agent-platform-profile.test.ts` | A (always-on, server-free derivation probe) | RFC 0085 promoted Draft → Active 2026-05-30 (5 UQs resolved via MyndHyve review). Operational annex (NOT a closed `profiles.md` catalog predicate). Always-on derivation probe asserts `isAgentPlatformPartial`/`isAgentPlatformFull`/`agentPlatformStatus` derive `none`/`partial`/`full` correctly: all-floor ⇒ partial, a missing floor flag ⇒ none, the replay-OR-`nondeterminismPolicy.declared` term, floor+governance ⇒ full, a missing governance term (tenant `installScope`) ⇒ partial-not-full (the honest-advertisement rule), and that the eval/deploy/budget platform-plus tier is advisory (a full host without them is still full); plus `capabilities.nondeterminismPolicy.declared` is declared. **Live aggregate-evidence assertion deferred** per RFC 0085 §C: when a host claims `full`, the meta-scenario must assert every required constituent scenario is in its passing set, naturally gated on a reference host reaching partial/full (Postgres is the candidate). Path to `Accepted`: ≥1 host reports `partial`/`full` backed by the aggregate + renders the badge. |
+ | `openwop-agent-platform` meta-profile (RFC 0085 — `spec/v1/agent-platform-profile.md`, operational annex) | `agent-platform-profile.test.ts` | A (always-on, server-free derivation probe) | RFC 0085 promoted Draft → Active 2026-05-30 (5 UQs resolved via MyndHyve review). Operational annex (NOT a closed `profiles.md` catalog predicate). Always-on derivation probe asserts `isAgentPlatformPartial`/`isAgentPlatformFull`/`agentPlatformStatus` derive `none`/`partial`/`full` correctly: all-floor ⇒ partial, a missing floor flag ⇒ none, the replay-OR-`nondeterminismPolicy.declared` term, floor+governance ⇒ full, a missing governance term (tenant `installScope`) ⇒ partial-not-full (the honest-advertisement rule), and that the eval/deploy/budget platform-plus tier is advisory (a full host without them is still full); plus `capabilities.nondeterminismPolicy.declared` is declared. **Live aggregate-evidence assertion authored** (2026-06-01; see §"Capability-gated scenarios"): `agent-platform-aggregate-evidence.test.ts` (gated on a host CLAIMING `openwop-agent-platform` in live `profiles[]` via `behaviorGate('openwop-agent-platform', …)`) reads the live discovery and asserts the §C/§D honest-advertisement rule: the claim MUST satisfy the §B floor predicate (`partial`/`full`, never `none`), backed by the per-capability evidence; `OPENWOP_AGENT_PLATFORM_TIER=full` forces the non-vacuous full-predicate bar. Path to `Accepted`: a host advertises `openwop-agent-platform` backed by the §B floor + passes the aggregate scenario claiming `full` (MyndHyve, after the memory batch surfaces the floor's `memory.supported`). |
62
62
  | `openwop-core-standard` profile (RFC 0088 — `spec/v1/core-standard-profile.md`, operational annex) | `core-standard-profile.test.ts` | A (always-on, server-free derivation probe) | RFC 0088 (`Active` 2026-05-31). Operational annex (NOT a closed `profiles.md` catalog predicate) naming the stable **Core Standard Profile** — the floor of normative MUSTs with black-box production-path conformance. Always-on derivation probe asserts `isCoreStandard` derives the §B floor (`openwop-core` ∧ `openwop-interrupts` ∧ (`openwop-stream-sse` ∨ `openwop-stream-poll`)): core+interrupts+default-transport ⇒ core-standard; a bare `openwop-core` host without `clarification.request` ⇒ NOT core-standard (the floor is stricter than the v1 minimum); a host with no event transport (`supportedTransports:[]`) ⇒ fails; a non-1.x host ⇒ fails; and `openwop-core-standard` is absent from `deriveProfiles` (it composes, it does not redefine). The §C floor scenarios (runs-lifecycle / discovery / auth-401 / event-ordering / failure-path / idempotency / interrupts / webhook-negative / audit-log-verify) are all already black-box production-path. **Live aggregate-evidence assertion deferred** per RFC 0088 §C: a host claiming `openwop-core-standard` must pass every §C floor scenario black-box — already satisfied by MyndHyve + all four reference hosts. Path to `Accepted`: ≥1 host advertises the claim in its `conformance.md`/`INTEROP-MATRIX.md` row backed by the floor scenarios. |
63
63
  | Agent deployment lifecycle (RFC 0082 — `capabilities.agents.deployment`) | `agent-deployment-shape.test.ts` | A (always-on, server-free shape probe; doubles as the public test for `deployment-event-no-content-leak`) | RFC 0082 promoted Draft → Active 2026-05-30. Always-on shape probe asserts `capabilities.agents.deployment` (+ `supported`/`channels`/`canary`/`rollback`/`states` sub-flags); the `AgentDeployment` record compiles + round-trips and rejects malformed ones (out-of-enum `state`; `canaryPercent` out of 0..100); the **`AgentRef` `channel` XOR `version`** rule (each alone + neither validate; both rejected by the `not` clause, §A); the four `deployment.*` payloads validate content-free records + reject malformed ones; `agent.invocation.started` carries the additive recorded-fact `resolvedAgentVersion`/`resolvedChannel` (§B channel pin); and all four event names appear in the RunEventType enum. The **content-free negatives** (a `deployment.promoted` carrying a `manifestBody`; a `deployment.state.changed` carrying a `prompt`) are the public test for protocol-tier SECURITY invariant `deployment-event-no-content-leak`. The behavioral `deployment-promotion-fail-closed` invariant is `reference-impl` tier until the behavioral scenario lands (then graduates to protocol; RFC 0035 precedent). **Behavioral scenario authored + gated** (2026-05-31; see §"Capability-gated scenarios"): `agent-deployment-lifecycle.test.ts` (the §E authz → approvalGate → eval-verify → `deployment.promoted` promotion + the record round-trip, the fail-closed denial, the `eval_gate_unmet` denial, and the §B `resolvedAgentVersion` channel pin) gates on `capabilities.agents.deployment.supported` + the `POST /v1/host/sample/agents/deployment-transition` seam (`behaviorGate('openwop-deployment-lifecycle', …)`) and soft-skips until a host wires it. When it passes against a host, the `deployment-promotion-fail-closed` invariant graduates reference-impl → protocol tier. Path to `Accepted`: a host implements the deployment store + canary router + the `POST /v1/agents/{agentId}/deployments` promotion contract (the endpoint + SDK helper already landed). |
64
64
  | Standing agent roster (RFC 0086 — `capabilities.agents.roster`) | `agent-roster-shape.test.ts` | A (always-on, server-free shape probe; doubles as the public test for `roster-attribution-no-content`) | RFC 0086 promoted Draft → Active 2026-05-30. Always-on shape probe asserts `capabilities.agents.roster` (+ `supported`/`installScope`/`portfolioTriggerSources` sub-flags); the `AgentRosterEntry` record compiles + round-trips and rejects malformed ones (a non-`host:` `rosterId`; an `agentRef` carrying BOTH `version` and `channel` — the RFC 0082 §A XOR rule; a missing `rosterId`); the `roster.run.initiated` payload validates a content-free attribution record + requires its ids/persona/triggerSource; the `AgentInventoryEntry` carries the additive optional `roster` portfolio projection (§B); and `roster.run.initiated` appears in the RunEventType enum. The **content-free negatives** (a `roster.run.initiated` carrying a `body`; one carrying a `prompt`) are the public test for protocol-tier SECURITY invariant `roster-attribution-no-content`. **Behavioral scenarios deferred** per RFC 0086 §Conformance (reference host): a scheduled portfolio fire emitting `roster.run.initiated` before `agent.invocation.started`; the RFC 0083 work-item causation chain; the replay re-read; the cross-tenant `GET /v1/agents/roster/{id}` 404 — gate on `capabilities.agents.roster.supported` + the roster-store seam and soft-skip until a host wires it (the host-extension at `/v1/host/sample/roster` + board attribution, apps/workflow-engine #368, is the reference demonstration). Path to `Accepted`: a non-steward host advertises `agents.roster` + emits `roster.run.initiated`. |
@@ -121,6 +121,7 @@ Thirty-one scenario groups validate optional profiles where the host's discovery
121
121
  | `egress-decision-content-free.test.ts` | `capabilities.httpClient.egressPolicy.supported` (RFC 0079 §F / SR-1) | A (the secret non-leak — a `canary` credential's sentinel never surfaces in the decision (`canaryLeaked !== true`), the `egress.decided` payload carries no forbidden content key, and `reason` stays in the CLOSED vocabulary so no blocked destination spills into a free-form field) | `host-pending` | `behaviorGate('openwop-egress-decision-content-free', …)`. Seam-gated; soft-skips on 404. **Part of the RFC 0079 → Accepted bar.** First adopter: MyndHyve `httpClient.egressPolicy`. |
122
122
  | `memory-degraded-projection.test.ts` | `capabilities.agents.manifestRuntime.supported` + `capabilities.memory.supported` (RFC 0080 §C, `agent-memory.md`) | A (the §C iff-contract on the NORMATIVE `GET /v1/agents`: a degraded entry MUST carry `memoryDegraded:true` + a non-empty, unique `degradedMemoryDimensions[]` drawn from the closed §A-name enum [read/write/search/long-term-durability/compaction/attribution/replay-snapshot/retention]; a non-degraded entry MUST NOT carry a non-empty list; the inventory is non-empty; the degraded branch runs non-vacuously when `OPENWOP_DEGRADED_AGENT_ID` names a known-degraded agent) | `host-pending` | `behaviorGate('openwop-memory-degraded', …)`. Black-box on the normative path (no POST seam); soft-skips on 404 / when the host computes no degradation. **This is the RFC 0080 → Accepted bar.** First adopter: MyndHyve `memory`. |
123
123
  | `budget-enforcement.test.ts` | `capabilities.budget.supported` (RFC 0084 §C/§D, `budget-policy.md`) + `SECURITY/invariants.yaml` `budget-no-pricing-leak` | A (the §C/§D enforcement via `POST /v1/host/sample/budget/run` + the test event-log seam: a `hard-cost-exhaust` run emits the strict-ordered `budget.reserved → budget.consumed → budget.threshold.crossed{percent} → budget.exhausted → cap.breached{kind:"budget-cost"} → run.failed{error:"budget_exhausted"}` chain; a `model-denied` run is refused `budget_model_denied` BEFORE the provider call (fail-closed); an `advisory` host emits the `budget.*` events without stopping; every `budget.*` payload content-free — no pricing/rate) | `host-pending` | `behaviorGate('openwop-budget-enforcement', …)`. Seam-gated; soft-skips on 404. **This is the RFC 0084 → Accepted bar.** First adopter: MyndHyve `budget`. |
124
+ | `agent-platform-aggregate-evidence.test.ts` | `openwop-agent-platform` claim — live discovery `profiles[]` includes it (RFC 0085 §C, `agent-platform-profile.md`) | A (the §C/§D honest-advertisement on live `/.well-known/openwop`: a host claiming `openwop-agent-platform` MUST satisfy the §B floor predicate (`isAgentPlatformPartial` → `partial`/`full`, never `none`), the claim backed by per-capability evidence not the profile string; `OPENWOP_AGENT_PLATFORM_TIER=full` forces the full-predicate bar — all governance terms + tenant installScope + all 16 §D terms) | `host-pending` | `behaviorGate('openwop-agent-platform', …)`. Black-box on the discovery doc (no POST seam); soft-skips until a host claims the profile. **This is the RFC 0085 → Accepted bar.** First adopter: MyndHyve (after the memory batch surfaces the floor's `memory.supported`). |
124
125
  | `approval-gate-events.test.ts` | `approval.granted` / `.rejected` / `.overridden` (RFC 0051 §B, `interrupt-profiles.md` §approvalGate) | Server-free (event-payload schema validity: required fields incl. mandatory `overridden.reason`; additionalProperties:false negatives) | host-pass (server-free) | Always runs; no host needed. |
125
126
  | `approval-gate-flow.test.ts` | `core.openwop.governance.approvalGate` (RFC 0051 §A) + `capabilities.authorization` (RFC 0049) | A (capability-gated on `authorization.supported`; unauthorized-principal-denied + override-audited via the `governance/approval-gate` seam) | `host-pending` | Behavioral probe soft-skips on 404. Grant/reject-loopback/quorum scenarios deferred until a governance host wires the seam. |
126
127
  | `scheduling-capability-shape.test.ts` | `capabilities.scheduling` (RFC 0052 §A, `host-capabilities.md` §host.scheduling) | A (advertisement shape always — `supported` boolean; `cron`/`delayed`/`calendar` booleans; `maxFutureHorizon` ISO-8601 duration) | `host-pending` | Always runs; asserts the block is absent or well-formed. |
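The `agent-platform-aggregate-evidence.test.ts` row above describes an honest-advertisement rule: a claimed profile must derive to `partial` or `full`, and an operator-declared `full` tier must derive to `full`. The following is a minimal sketch of that rule under a deliberately simplified model. `PlatformTerms`, `derivePlatformStatus`, and `checkClaim` are hypothetical names for illustration; the suite's real predicates (`isAgentPlatformPartial`, `isAgentPlatformFull`, `agentPlatformStatus`) inspect the live discovery document and many more terms than the two boolean lists assumed here.

```typescript
// Hypothetical, simplified model of the RFC 0085 derivation. The real
// predicates live in the suite's lib/profiles.js; these names are illustrative.
type PlatformStatus = 'none' | 'partial' | 'full';

interface PlatformTerms {
  floor: boolean[];      // one entry per floor term (assumed shape)
  governance: boolean[]; // one entry per governance term (assumed shape)
}

function derivePlatformStatus(terms: PlatformTerms): PlatformStatus {
  // A single missing floor term means the host is not on the platform at all.
  const floorOk = terms.floor.length > 0 && terms.floor.every(Boolean);
  if (!floorOk) return 'none';
  // All floor terms hold; governance decides between partial and full.
  const governanceOk = terms.governance.length > 0 && terms.governance.every(Boolean);
  return governanceOk ? 'full' : 'partial';
}

// Returns a list of problems; an empty list means the claim is honest.
function checkClaim(claimed: boolean, terms: PlatformTerms, requireFull: boolean): string[] {
  const problems: string[] = [];
  if (!claimed) return problems; // unclaimed: nothing to verify (soft-skip)
  const status = derivePlatformStatus(terms);
  if (status === 'none') problems.push('claimed profile derives to none');
  if (requireFull && status !== 'full') problems.push(`full tier required, derived ${status}`);
  return problems;
}
```

Returning a problem list rather than throwing keeps the sketch independent of any test runner; a real scenario would turn each entry into an `expect` failure.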
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@openwop/openwop-conformance",
3
- "version": "1.16.0",
3
+ "version": "1.18.1",
4
4
  "description": "Production-ready black-box conformance suite for OpenWOP v1.0 compliant servers.",
5
5
  "repository": {
6
6
  "type": "git",
@@ -764,8 +764,8 @@
764
764
  "required": ["kind", "limit", "observed"],
765
765
  "properties": {
766
766
  "kind": { "type": "string", "enum": ["clarification", "schema", "envelopes", "node-executions", "wasm-memory", "wasm-fuel", "wasm-execution-time", "run-duration", "loop-iterations", "budget-tokens", "budget-cost", "budget-tool-calls", "budget-retries"] },
767
- "limit": { "type": "integer", "minimum": 0 },
768
- "observed": { "type": "integer", "minimum": 0 },
767
+ "limit": { "type": "number", "minimum": 0, "description": "The effective ceiling. A whole number for the engine/WASM/RFC 0058 integer kinds; a fractional value for budget-cost (dollars, matching budget.exhausted + maxCostUsd — RFC 0084 §D)." },
768
+ "observed": { "type": "number", "minimum": 0, "description": "The observed value at breach. Same numeric domain as limit for the kind." },
769
769
  "nodeId": { "type": "string" }
770
770
  },
771
771
  "additionalProperties": true
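The hunk above relaxes `limit` and `observed` from `integer` to `number` so that a `budget-cost` breach can carry fractional dollar amounts. A dependency-free sketch of what that means for a consumer follows. `capBreachProblems` is a hypothetical helper, not part of the package, and treating every kind other than `budget-cost` as whole-number is an assumption taken from the field description; the schema itself only requires a non-negative number.

```typescript
// Kinds copied from the schema enum in the hunk above.
const CAP_KINDS: readonly string[] = [
  'clarification', 'schema', 'envelopes', 'node-executions', 'wasm-memory',
  'wasm-fuel', 'wasm-execution-time', 'run-duration', 'loop-iterations',
  'budget-tokens', 'budget-cost', 'budget-tool-calls', 'budget-retries',
];

interface CapBreachLike {
  kind: string;
  limit: unknown;
  observed: unknown;
}

// Hypothetical check: returns human-readable problems, empty when acceptable.
function capBreachProblems(p: CapBreachLike): string[] {
  const problems: string[] = [];
  if (!CAP_KINDS.includes(p.kind)) problems.push(`unknown kind ${p.kind}`);
  const fields: Array<['limit' | 'observed', unknown]> = [
    ['limit', p.limit],
    ['observed', p.observed],
  ];
  for (const [name, value] of fields) {
    if (typeof value !== 'number' || !Number.isFinite(value) || value < 0) {
      // Mirrors the schema: type number, minimum 0.
      problems.push(`${name} must be a non-negative number`);
    } else if (p.kind !== 'budget-cost' && !Number.isInteger(value)) {
      // Assumption of this sketch, stricter than the schema itself.
      problems.push(`${name} should be a whole number for kind ${p.kind}`);
    }
  }
  return problems;
}
```

Under these assumptions a payload such as `{ kind: 'budget-cost', limit: 12.5, observed: 12.75 }` is accepted, while the same fractional values on `wasm-fuel` are flagged.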
@@ -13,6 +13,7 @@
13
13
  * from leaking across scenarios.
14
14
  */
15
15
 
16
+ import { expect } from 'vitest';
16
17
  import { driver } from './driver.js';
17
18
 
18
19
  export interface TestEvent {
@@ -50,6 +51,23 @@ export async function queryTestEvents(
50
51
  return { ok: true, events: body.events ?? [] };
51
52
  }
52
53
 
54
+ /**
55
+ * Assert a query SUCCEEDED and return its typed events. Use this AFTER a
56
+ * scenario's legitimate soft-skip gates (capability/profile not advertised,
57
+ * seam unwired) so that a host which has opted in but returns no events FAILS
58
+ * the assertion rather than vacuously passing an `if (q.ok && events.length)`
59
+ * guard. The trailing `throw` is unreachable once `expect` has fired but lets
60
+ * TypeScript narrow `QueryOutcome` to the `{ ok: true }` branch.
61
+ */
62
+ export function requireEvents(q: QueryOutcome, where: string): TestEvent[] {
63
+ expect(
64
+ q.ok,
65
+ `${where}: the event-log seam MUST return events for a wired behavioral run (got ${q.ok ? 'ok' : ('reason' in q ? q.reason : 'error')})`,
66
+ ).toBe(true);
67
+ if (!q.ok) throw new Error(`${where}: event-log query not ok`);
68
+ return q.events;
69
+ }
70
+
53
71
  /** Reset the test-only event log + capability overlay (suite teardown). */
54
72
  export async function resetTestSeam(): Promise<void> {
55
73
  await driver.post('/v1/host/sample/test/reset', {});
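The `requireEvents` helper added in this hunk exists to close a vacuous-pass hole: a guard of the form `if (q.ok && events.length)` silently skips every assertion when the query fails. The contrast can be sketched without a test runner as below. The `QueryOutcome` shape here is an assumption (the real type is defined elsewhere in `event-log-query.ts`), and `countEventsVacuous` and `requireEventsSketch` are hypothetical names that throw a plain `Error` where the real helper uses `expect`.

```typescript
// Assumed shapes for illustration; the suite's real types may differ.
interface TestEventLike {
  type: string;
  sequence: number;
  payload: Record<string, unknown>;
}
type QueryOutcomeLike =
  | { ok: true; events: TestEventLike[] }
  | { ok: false; reason: string };

// Vacuous pattern: a failed query yields undefined and nothing is asserted.
function countEventsVacuous(q: QueryOutcomeLike): number | undefined {
  if (q.ok) return q.events.length;
  return undefined;
}

// Hard-fail pattern: a failed query is itself the failure.
function requireEventsSketch(q: QueryOutcomeLike, where: string): TestEventLike[] {
  if (!q.ok) {
    throw new Error(`${where}: event-log query failed (${q.reason})`);
  }
  // TypeScript has narrowed q to the { ok: true } branch here.
  return q.events;
}
```

Returning the narrowed `events` array lets callers assert on counts directly, so a "MUST emit NO event" leg can only pass when the query itself succeeded.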
@@ -42,7 +42,7 @@ import {
42
42
  DEPLOYMENT_STATES,
43
43
  DEPLOYMENT_CONTENT_FORBIDDEN,
44
44
  } from '../lib/agentDeployment.js';
45
- import { queryTestEvents, isEventLogSeamAvailable, resetTestSeam } from '../lib/event-log-query.js';
45
+ import { queryTestEvents, requireEvents, isEventLogSeamAvailable, resetTestSeam } from '../lib/event-log-query.js';
46
46
 
47
47
  function loadSchema(name: string): Record<string, unknown> {
48
48
  return JSON.parse(readFileSync(join(SCHEMAS_DIR, name), 'utf8')) as Record<string, unknown>;
@@ -71,76 +71,99 @@ describe('agent-deployment-lifecycle (RFC 0082 §B/§E)', () => {
71
71
  const promote = await driveDeploymentTransition({ scenario: 'promote' });
72
72
  if (promote === null) return; // deployment seam unwired — soft-skip the whole behavioral suite
73
73
 
74
- if (promote.record) {
74
+ // The host has ADVERTISED agents.deployment AND wired the seam — missing
75
+ // evidence is a FAILURE, not a soft-skip. A successful promote MUST return
76
+ // a runId + a schema-valid record + emit ≥1 content-free deployment.promoted.
77
+ expect(
78
+ typeof promote.runId === 'string' && (promote.runId as string).length > 0,
79
+ driver.describe('agent-deployment.md §E', 'a wired promote MUST return the runId'),
80
+ ).toBe(true);
81
+ expect(
82
+ promote.record !== undefined && promote.record !== null,
83
+ driver.describe('agent-deployment.md §E', 'a successful promote MUST return the deployment record'),
84
+ ).toBe(true);
85
+ expect(
86
+ validateRecord(promote.record),
87
+ driver.describe('agent-deployment.schema.json', `a promoted deployment record MUST validate (${ajv.errorsText(validateRecord.errors)})`),
88
+ ).toBe(true);
89
+
90
+ const promotedEvents = requireEvents(
91
+ await queryTestEvents(promote.runId as string, { type: 'deployment.promoted' }),
92
+ 'deployment.promoted',
93
+ );
94
+ expect(
95
+ promotedEvents.length >= 1,
96
+ driver.describe('agent-deployment.md §E', 'a successful promote MUST emit at least one deployment.promoted'),
97
+ ).toBe(true);
98
+ for (const e of promotedEvents) {
99
+ expectContentFree(e.payload, 'deployment.promoted');
75
100
  expect(
76
- validateRecord(promote.record),
77
- driver.describe(
78
- 'agent-deployment.schema.json',
79
- `a promoted deployment record MUST validate (${ajv.errorsText(validateRecord.errors)})`,
80
- ),
101
+ typeof e.payload.toState === 'string' && DEPLOYMENT_STATES.includes(e.payload.toState as string),
102
+ driver.describe('run-event-payloads.schema.json#/$defs/deploymentPromoted', 'toState MUST be in the seven-state vocabulary'),
103
+ ).toBe(true);
104
+ expect(
105
+ typeof e.payload.toVersion === 'string' && (e.payload.toVersion as string).length > 0,
106
+ driver.describe('agent-deployment.md §D', 'deployment.promoted MUST carry the promoted toVersion'),
81
107
  ).toBe(true);
82
- }
83
- if (promote.runId) {
84
- const pq = await queryTestEvents(promote.runId, { type: 'deployment.promoted' });
85
- if (pq.ok) {
86
- for (const e of pq.events) {
87
- expectContentFree(e.payload, 'deployment.promoted');
88
- expect(
89
- typeof e.payload.toState === 'string' && DEPLOYMENT_STATES.includes(e.payload.toState as string),
90
- driver.describe('run-event-payloads.schema.json#/$defs/deploymentPromoted', 'toState MUST be in the seven-state vocabulary'),
91
- ).toBe(true);
92
- expect(
93
- typeof e.payload.toVersion === 'string' && (e.payload.toVersion as string).length > 0,
94
- driver.describe('agent-deployment.md §D', 'deployment.promoted MUST carry the promoted toVersion'),
95
- ).toBe(true);
96
- }
97
- }
98
108
  }
99
109
 
100
110
  // ---- Leg 2: fail-closed authz (§E-1; deployment-promotion-fail-closed) -
101
111
  const unauth = await driveDeploymentTransition({ scenario: 'unauthorized' });
102
- if (unauth && unauth.runId) {
103
- expect(
104
- unauth.allowed !== true,
105
- driver.describe('agent-deployment.md §E-1', 'a principal without deploy:promote MUST be denied (fail-closed)'),
106
- ).toBe(true);
107
- const uq = await queryTestEvents(unauth.runId, { type: 'deployment.promoted' });
108
- if (uq.ok) {
109
- expect(
110
- uq.events.length === 0,
111
- driver.describe('SECURITY invariant deployment-promotion-fail-closed', 'a denied transition MUST emit NO deployment.promoted'),
112
- ).toBe(true);
113
- }
114
- }
112
+ expect(
113
+ unauth !== null && typeof unauth.runId === 'string' && (unauth.runId as string).length > 0,
114
+ driver.describe('agent-deployment.md §E-1', 'the unauthorized scenario MUST return a runId to evidence the fail-closed denial'),
115
+ ).toBe(true);
116
+ expect(
117
+ unauth!.allowed !== true,
118
+ driver.describe('agent-deployment.md §E-1', 'a principal without deploy:promote MUST be denied (fail-closed)'),
119
+ ).toBe(true);
120
+ const unauthPromoted = requireEvents(
121
+ await queryTestEvents(unauth!.runId as string, { type: 'deployment.promoted' }),
122
+ 'deployment.promoted (unauthorized)',
123
+ );
124
+ expect(
125
+ unauthPromoted.length === 0,
126
+ driver.describe('SECURITY invariant deployment-promotion-fail-closed', 'a denied transition MUST emit NO deployment.promoted'),
127
+ ).toBe(true);
115
128
 
116
129
  // ---- Leg 3: eval-gate-unmet denial (§E-3) ----------------------------
117
130
  const evalUnmet = await driveDeploymentTransition({ scenario: 'eval-gate-unmet' });
118
- if (evalUnmet && evalUnmet.runId) {
119
- expect(
120
- evalUnmet.error === 'eval_gate_unmet' || evalUnmet.allowed !== true,
121
- driver.describe('agent-deployment.md §E-3', 'a promote whose eval evidence has passed:false MUST be denied (eval_gate_unmet)'),
122
- ).toBe(true);
123
- const eq = await queryTestEvents(evalUnmet.runId, { type: 'deployment.promoted' });
124
- if (eq.ok) {
125
- expect(
126
- eq.events.length === 0,
127
- driver.describe('agent-deployment.md §E-3', 'an unmet eval gate MUST emit NO deployment.promoted'),
128
- ).toBe(true);
129
- }
130
- }
131
+ expect(
132
+ evalUnmet !== null && typeof evalUnmet.runId === 'string' && (evalUnmet.runId as string).length > 0,
133
+ driver.describe('agent-deployment.md §E-3', 'the eval-gate-unmet scenario MUST return a runId to evidence the denial'),
134
+ ).toBe(true);
135
+ expect(
136
+ evalUnmet!.error === 'eval_gate_unmet' || evalUnmet!.allowed !== true,
137
+ driver.describe('agent-deployment.md §E-3', 'a promote whose eval evidence has passed:false MUST be denied (eval_gate_unmet)'),
138
+ ).toBe(true);
139
+ const evalUnmetPromoted = requireEvents(
140
+ await queryTestEvents(evalUnmet!.runId as string, { type: 'deployment.promoted' }),
141
+ 'deployment.promoted (eval-gate-unmet)',
142
+ );
143
+ expect(
144
+ evalUnmetPromoted.length === 0,
145
+ driver.describe('agent-deployment.md §E-3', 'an unmet eval gate MUST emit NO deployment.promoted'),
146
+ ).toBe(true);
131
147
 
132
148
  // ---- Leg 4: channel-resolution pin (§B) ------------------------------
133
149
  const pin = await driveDeploymentTransition({ scenario: 'channel-pin', channel: 'stable' });
134
- if (pin && pin.runId) {
135
- const iq = await queryTestEvents(pin.runId, { type: 'agent.invocation.started' });
136
- if (iq.ok && iq.events.length > 0) {
137
- const started = iq.events.sort((a, b) => a.sequence - b.sequence)[0]!;
138
- expect(
139
- typeof started.payload.resolvedAgentVersion === 'string' && (started.payload.resolvedAgentVersion as string).length > 0,
140
- driver.describe('agent-deployment.md §B', 'a @channel-bound run MUST record resolvedAgentVersion on agent.invocation.started (the recorded fact a replay re-reads)'),
141
- ).toBe(true);
142
- }
143
- }
150
+ expect(
151
+ pin !== null && typeof pin.runId === 'string' && (pin.runId as string).length > 0,
152
+ driver.describe('agent-deployment.md §B', 'the channel-pin scenario MUST return a runId'),
153
+ ).toBe(true);
154
+ const invEvents = requireEvents(
155
+ await queryTestEvents(pin!.runId as string, { type: 'agent.invocation.started' }),
156
+ 'agent.invocation.started (channel-pin)',
157
+ );
158
+ expect(
159
+ invEvents.length >= 1,
160
+ driver.describe('agent-deployment.md §B', 'a @channel-bound run MUST emit agent.invocation.started'),
161
+ ).toBe(true);
162
+ const startedInv = invEvents.sort((a, b) => a.sequence - b.sequence)[0]!;
163
+ expect(
164
+ typeof startedInv.payload.resolvedAgentVersion === 'string' && (startedInv.payload.resolvedAgentVersion as string).length > 0,
165
+ driver.describe('agent-deployment.md §B', 'a @channel-bound run MUST record resolvedAgentVersion on agent.invocation.started (the recorded fact a replay re-reads)'),
166
+ ).toBe(true);
144
167
 
145
168
  await resetTestSeam();
146
169
  });
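Leg 4 above picks the earliest `agent.invocation.started` by sorting the returned array in place, which reorders the caller's data. One non-mutating alternative is a single pass that tracks the minimum. This is only a sketch; `earliestBySequence` and `SequencedEvent` are hypothetical names and are not part of the suite.

```typescript
// Minimal shape needed for the selection; real events carry more fields.
interface SequencedEvent {
  sequence: number;
  payload: Record<string, unknown>;
}

// Returns the event with the lowest sequence, or undefined for an empty list.
// The input array is left untouched.
function earliestBySequence<T extends SequencedEvent>(events: readonly T[]): T | undefined {
  let earliest: T | undefined;
  for (const e of events) {
    if (earliest === undefined || e.sequence < earliest.sequence) {
      earliest = e;
    }
  }
  return earliest;
}
```

Because the function returns `undefined` for an empty list, a caller still needs the preceding "at least one event" assertion before reading the payload.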
@@ -38,7 +38,7 @@ import {
38
38
  getEvalSummary,
39
39
  EVAL_CONTENT_FORBIDDEN,
40
40
  } from '../lib/agentEval.js';
41
- import { queryTestEvents, isEventLogSeamAvailable, resetTestSeam } from '../lib/event-log-query.js';
41
+ import { queryTestEvents, requireEvents, isEventLogSeamAvailable, resetTestSeam } from '../lib/event-log-query.js';
42
42
 
43
43
  function loadSchema(name: string): Record<string, unknown> {
44
44
  return JSON.parse(readFileSync(join(SCHEMAS_DIR, name), 'utf8')) as Record<string, unknown>;
@@ -61,83 +61,110 @@ describe('agent-eval-run (RFC 0081 §B/§C)', () => {
61
61
 
62
62
  const run = await driveEvalRun({ modes: ['golden'] });
63
63
  if (run === null) return; // eval-run seam unwired — soft-skip the whole behavioral suite
64
- if (!run.runId) return;
65
64
 
66
- // ---- Legs 1+2: eval.* ordering + content-free (§C) -------------------
67
- const startedQ = await queryTestEvents(run.runId, { type: 'eval.started' });
68
- const scoredQ = await queryTestEvents(run.runId, { type: 'eval.scored' });
69
- const completedQ = await queryTestEvents(run.runId, { type: 'eval.completed' });
65
+ // From here the host has ADVERTISED agents.evalSuite AND wired the eval-run
66
+ // seam, so missing evidence is a FAILURE, not a soft-skip. A host claiming the
67
+ // capability MUST produce the runId, the full eval.* sequence, and the
68
+ // normative EvalSummary, or it is advertising a capability it doesn't deliver.
69
+ expect(
70
+ typeof run.runId === 'string' && run.runId.length > 0,
71
+ driver.describe('agent-evaluation.md §B', 'a wired eval-run seam MUST return the projected runId'),
72
+ ).toBe(true);
73
+ const runId = run.runId as string;
70
74
 
71
- if (startedQ.ok && scoredQ.ok && startedQ.events.length > 0) {
72
- const started = startedQ.events.sort((a, b) => a.sequence - b.sequence)[0]!;
75
+ // ---- Legs 1+2: eval.* ordering + content-free (§C) -------------------
76
+ const startedQ = await queryTestEvents(runId, { type: 'eval.started' });
77
+ const scoredQ = await queryTestEvents(runId, { type: 'eval.scored' });
78
+ const completedQ = await queryTestEvents(runId, { type: 'eval.completed' });
73
79
 
74
- // eval.started precedes every eval.scored (§C ordering).
75
- for (const s of scoredQ.events) {
76
- expect(
77
- started.sequence < s.sequence,
78
- driver.describe('agent-evaluation.md §C', 'eval.started MUST precede every eval.scored'),
79
- ).toBe(true);
80
- }
80
+ // The event-log seam MUST return the eval.* events for a wired eval run
81
+ // (requireEvents hard-fails if a leg's query is not ok — no vacuous pass).
82
+ const startedEvents = requireEvents(startedQ, 'eval.started');
83
+ const scoredEvents = requireEvents(scoredQ, 'eval.scored');
84
+ const completedEvents = requireEvents(completedQ, 'eval.completed');
81
85
 
82
- if (completedQ.ok && completedQ.events.length > 0) {
83
- const completed = completedQ.events.sort((a, b) => a.sequence - b.sequence)[completedQ.events.length - 1]!;
84
- for (const s of scoredQ.events) {
85
- expect(
86
- s.sequence < completed.sequence,
87
- driver.describe('agent-evaluation.md §C', 'every eval.scored MUST precede eval.completed'),
88
- ).toBe(true);
89
- }
90
- // eval.scored is emitted once per task (count == eval.completed.taskCount).
91
- if (typeof completed.payload.taskCount === 'number') {
92
- expect(
93
- scoredQ.events.length === completed.payload.taskCount,
94
- driver.describe('agent-evaluation.md §C', 'one eval.scored per task (count == eval.completed.taskCount)'),
95
- ).toBe(true);
96
- }
97
- expectContentFree(completed.payload, 'eval.completed');
98
- }
86
+ // eval.started exactly once (FIRST); eval.completed exactly once (LAST);
87
+ // ≥1 eval.scored: a wired eval run MUST emit the full sequence.
88
+ expect(
89
+ startedEvents.length === 1,
90
+ driver.describe('agent-evaluation.md §C', 'an eval run MUST emit exactly one eval.started'),
91
+ ).toBe(true);
92
+ expect(
93
+ scoredEvents.length >= 1,
94
+ driver.describe('agent-evaluation.md §C', 'an eval run MUST emit at least one eval.scored'),
95
+ ).toBe(true);
96
+ expect(
97
+ completedEvents.length === 1,
98
+ driver.describe('agent-evaluation.md §C', 'an eval run MUST emit exactly one eval.completed'),
99
+ ).toBe(true);
100
+ const started = startedEvents[0]!;
101
+ const completed = completedEvents[0]!;
99
102
 
100
- // each eval.scored content-free + score 0..1, passed boolean.
101
- for (const s of scoredQ.events) {
102
- expectContentFree(s.payload, 'eval.scored');
103
- expect(
104
- typeof s.payload.score === 'number' && (s.payload.score as number) >= 0 && (s.payload.score as number) <= 1,
105
- driver.describe('run-event-payloads.schema.json#/$defs/evalScored', 'eval.scored.score MUST be in 0..1'),
106
- ).toBe(true);
107
- expect(
108
- typeof s.payload.passed === 'boolean',
109
- driver.describe('run-event-payloads.schema.json#/$defs/evalScored', 'eval.scored.passed MUST be a boolean'),
110
- ).toBe(true);
111
- }
112
- expectContentFree(started.payload, 'eval.started');
103
+ // Ordering: eval.started precedes every eval.scored precedes eval.completed.
104
+ for (const s of scoredEvents) {
105
+ expect(
106
+ started.sequence < s.sequence,
107
+ driver.describe('agent-evaluation.md §C', 'eval.started MUST precede every eval.scored'),
108
+ ).toBe(true);
109
+ expect(
110
+ s.sequence < completed.sequence,
111
+ driver.describe('agent-evaluation.md §C', 'every eval.scored MUST precede eval.completed'),
112
+ ).toBe(true);
113
113
  }
114
114
 
115
- // ---- Leg 3: NORMATIVE EvalSummary read (§C) --------------------------
116
- const { status, summary } = await getEvalSummary(run.runId);
117
- if (status === 200 && summary) {
118
- const ajv = new Ajv2020({ strict: false, allErrors: true });
119
- addFormats(ajv);
120
- const validate = ajv.compile(loadSchema('eval-summary.schema.json'));
115
+ // One eval.scored per task (count == eval.completed.taskCount).
116
+ expect(
117
+ typeof completed.payload.taskCount === 'number',
118
+ driver.describe('run-event-payloads.schema.json#/$defs/evalCompleted', 'eval.completed MUST carry a numeric taskCount'),
119
+ ).toBe(true);
120
+ expect(
121
+ scoredEvents.length === completed.payload.taskCount,
122
+ driver.describe('agent-evaluation.md §C', 'one eval.scored per task (count == eval.completed.taskCount)'),
123
+ ).toBe(true);
124
+
125
+ // Content-free (§C / eval-summary-no-content-leak) + score ∈ 0..1, passed boolean.
126
+ expectContentFree(started.payload, 'eval.started');
127
+ expectContentFree(completed.payload, 'eval.completed');
128
+ for (const s of scoredEvents) {
129
+ expectContentFree(s.payload, 'eval.scored');
130
+ expect(
131
+ typeof s.payload.score === 'number' && (s.payload.score as number) >= 0 && (s.payload.score as number) <= 1,
132
+ driver.describe('run-event-payloads.schema.json#/$defs/evalScored', 'eval.scored.score MUST be in 0..1'),
133
+ ).toBe(true);
121
134
  expect(
122
- validate(summary),
123
- driver.describe(
124
- 'eval-summary.schema.json',
125
- `GET /v1/runs/{runId}/eval-summary MUST return a schema-valid EvalSummary (${ajv.errorsText(validate.errors)})`,
126
- ),
135
+ typeof s.payload.passed === 'boolean',
136
+ driver.describe('run-event-payloads.schema.json#/$defs/evalScored', 'eval.scored.passed MUST be a boolean'),
127
137
  ).toBe(true);
138
+ }
128
139
 
129
- const tasks = (summary.tasks as Array<Record<string, unknown>> | undefined) ?? [];
130
- const passedCount = summary.passedCount as number | undefined;
131
- const taskCount = summary.taskCount as number | undefined;
132
- if (typeof passedCount === 'number' && typeof taskCount === 'number') {
133
- expect(
134
- passedCount <= taskCount,
135
- driver.describe('agent-evaluation.md §C', 'EvalSummary.passedCount MUST NOT exceed taskCount'),
136
- ).toBe(true);
137
- }
138
- for (const t of tasks) {
139
- expectContentFree(t, 'EvalSummary.tasks[]');
140
- }
140
+ // ---- Leg 3: NORMATIVE EvalSummary read (§C): MUST serve a 200 -------
141
+ const { status, summary } = await getEvalSummary(runId);
142
+ expect(
143
+ status === 200 && summary !== undefined,
144
+ driver.describe('agent-evaluation.md §C', `GET /v1/runs/{runId}/eval-summary MUST serve a 200 EvalSummary for a completed eval run (got ${status})`),
145
+ ).toBe(true);
146
+ const sum = summary as Record<string, unknown>;
147
+ const ajv = new Ajv2020({ strict: false, allErrors: true });
148
+ addFormats(ajv);
149
+ const validate = ajv.compile(loadSchema('eval-summary.schema.json'));
150
+ expect(
151
+ validate(sum),
152
+ driver.describe('eval-summary.schema.json', `EvalSummary MUST be schema-valid (${ajv.errorsText(validate.errors)})`),
153
+ ).toBe(true);
154
+
155
+ const tasks = (sum.tasks as Array<Record<string, unknown>> | undefined) ?? [];
156
+ const passedCount = sum.passedCount as number | undefined;
157
+ const taskCount = sum.taskCount as number | undefined;
158
+ expect(
159
+ typeof passedCount === 'number' && typeof taskCount === 'number',
160
+ driver.describe('eval-summary.schema.json', 'EvalSummary MUST carry numeric passedCount + taskCount'),
161
+ ).toBe(true);
162
+ expect(
163
+ (passedCount as number) <= (taskCount as number),
164
+ driver.describe('agent-evaluation.md §C', 'EvalSummary.passedCount MUST NOT exceed taskCount'),
165
+ ).toBe(true);
166
+ for (const t of tasks) {
167
+ expectContentFree(t, 'EvalSummary.tasks[]');
141
168
  }
142
169
 
143
170
  await resetTestSeam();
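The rewritten eval legs above enforce one rule set: exactly one `eval.started`, at least one `eval.scored`, exactly one `eval.completed`, strict ordering between them, and one `eval.scored` per task. The same checks can be sketched as a pure function over an event list. `evalSequenceProblems` and `EvalEventLike` are hypothetical names for illustration, and the event shape is an assumption based on the fields the test reads.

```typescript
interface EvalEventLike {
  type: 'eval.started' | 'eval.scored' | 'eval.completed';
  sequence: number;
  payload: Record<string, unknown>;
}

// Returns every rule violation found; an empty list means the sequence is valid.
function evalSequenceProblems(events: readonly EvalEventLike[]): string[] {
  const problems: string[] = [];
  const started = events.filter((e) => e.type === 'eval.started');
  const scored = events.filter((e) => e.type === 'eval.scored');
  const completed = events.filter((e) => e.type === 'eval.completed');

  if (started.length !== 1) problems.push('expected exactly one eval.started');
  if (scored.length < 1) problems.push('expected at least one eval.scored');
  if (completed.length !== 1) problems.push('expected exactly one eval.completed');

  const first = started[0];
  const last = completed[0];
  // Ordering is only meaningful once both bookends are unambiguous.
  if (started.length !== 1 || completed.length !== 1 || first === undefined || last === undefined) {
    return problems;
  }
  for (const s of scored) {
    if (!(first.sequence < s.sequence)) problems.push('eval.started must precede every eval.scored');
    if (!(s.sequence < last.sequence)) problems.push('every eval.scored must precede eval.completed');
  }
  if (last.payload.taskCount !== scored.length) {
    problems.push('eval.scored count must equal eval.completed.taskCount');
  }
  return problems;
}
```

Collecting all violations in one pass, instead of stopping at the first, gives a host implementer the full picture from a single failed run.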
@@ -0,0 +1,68 @@
1
+ /**
2
+ * openwop-agent-platform — LIVE aggregate-evidence (RFC 0085 §C) — behavioral.
3
+ *
4
+ * The `Active → Accepted` bar for the meta-profile. Capability-gated on a host
5
+ * CLAIMING the operational annex — i.e. its live discovery `profiles[]` includes
6
+ * `openwop-agent-platform`. Soft-skips when unclaimed (default) / hard-fails
7
+ * under `OPENWOP_REQUIRE_BEHAVIOR=true`.
8
+ *
9
+ * The always-on derivation legs in `agent-platform-profile.test.ts` prove the
10
+ * §B predicate logic against synthetic payloads; THIS asserts the §C/§D
11
+ * honest-advertisement rule against the LIVE discovery doc: a host MAY advertise
12
+ * `openwop-agent-platform` only if its real wire satisfies the §B floor
13
+ * predicate — the platform claim is **backed by** the per-capability evidence
14
+ * (each constituent cap's gated scenario — agent-manifest-runtime,
15
+ * agent-live-*, tool-catalog/hooks, safe-fetch, provider-usage, prompts, memory,
16
+ * feedback, replay, + the governance scenarios — runs in this same suite run and
17
+ * must pass), never asserted on the profile string alone.
18
+ *
19
+ * When the operator declares the cert tier `full`
20
+ * (`OPENWOP_AGENT_PLATFORM_TIER=full`), the full predicate (all governance terms
21
+ * + tenant installScope) MUST hold non-vacuously.
22
+ *
23
+ * Spec references:
24
+ * - https://github.com/openwop/openwop/blob/main/spec/v1/agent-platform-profile.md (§C/§D)
25
+ * - https://github.com/openwop/openwop/blob/main/RFCS/0085-agent-platform-meta-profile.md
26
+ */
27
+
28
+ import { describe, it, expect } from 'vitest';
29
+ import { driver } from '../lib/driver.js';
30
+ import { behaviorGate } from '../lib/behavior-gate.js';
31
+ import { isAgentPlatformPartial, isAgentPlatformFull, agentPlatformStatus, agentPlatformSatisfiedTerms } from '../lib/profiles.js';
32
+
33
+ describe('agent-platform-aggregate-evidence (RFC 0085 §C)', () => {
34
+ it('a host claiming openwop-agent-platform satisfies the §B floor on live discovery; full when the operator certifies full', async () => {
35
+ const res = await driver.get('/.well-known/openwop', { authenticated: false });
36
+ const disco = (res.status === 200 ? res.json : null) as Record<string, unknown> | null;
37
+ const profiles = Array.isArray(disco?.profiles) ? (disco!.profiles as unknown[]) : [];
38
+ const claims = disco !== null && profiles.includes('openwop-agent-platform');
39
+ if (!behaviorGate('openwop-agent-platform', claims)) return;
40
+
41
+ // §C / §D honest-advertisement: the profile claim MUST be backed by the §B
42
+ // floor predicate holding on the live discovery payload — never asserted on
43
+ // the profile string alone.
44
+ expect(
45
+ isAgentPlatformPartial(disco!),
46
+ driver.describe('agent-platform-profile.md §C', 'claiming openwop-agent-platform MUST satisfy the §B floor predicate on live discovery (claim backed by per-capability evidence)'),
47
+ ).toBe(true);
48
+
49
+ const status = agentPlatformStatus(disco!);
50
+ expect(
51
+ status === 'partial' || status === 'full',
52
+ driver.describe('agent-platform-profile.md §D', 'a claimed openwop-agent-platform host MUST derive to partial or full, never none'),
53
+ ).toBe(true);
54
+
55
+ // Non-vacuous FULL bar: when the operator declares the cert tier `full`,
56
+ // every governance term + tenant installScope MUST hold, and all 16 §D terms MUST be satisfied.
57
+ if (process.env.OPENWOP_AGENT_PLATFORM_TIER === 'full') {
58
+ expect(
59
+ isAgentPlatformFull(disco!),
60
+ driver.describe('agent-platform-profile.md §B/§D', 'a host certifying `full` MUST satisfy every governance term: authorization + tenant installScope + memory.attribution + debugBundle + triggerBridge + httpClient.egressPolicy'),
61
+ ).toBe(true);
62
+ expect(
63
+ agentPlatformSatisfiedTerms(disco!).length,
64
+ driver.describe('agent-platform-profile.md §D', 'a host certifying `full` satisfies all 16 §D terms'),
65
+ ).toBe(16);
66
+ }
67
+ });
68
+ });
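The gating rule described in the docstring above (soft-skip when the profile is unclaimed, hard-fail under `OPENWOP_REQUIRE_BEHAVIOR=true`) can be sketched as follows. This is a minimal illustration, not the package's code: the real `behaviorGate` lives in `../lib/behavior-gate.js` and is not shown in this diff, so `sketchBehaviorGate`, `claimsProfile`, and the `Env` type are hypothetical names, and throwing an `Error` in strict mode is an assumption about how a hard failure would surface.

```typescript
// Hypothetical sketch: NOT the real behaviorGate from ../lib/behavior-gate.js.
type Env = Record<string, string | undefined>;

// Returns true when the behavioral assertions should run, false to soft-skip.
// Assumption: strict mode surfaces an unclaimed profile by throwing.
function sketchBehaviorGate(profile: string, claimed: boolean, env: Env): boolean {
  if (claimed) return true; // host claims the profile: run the assertions
  if (env.OPENWOP_REQUIRE_BEHAVIOR === 'true') {
    throw new Error(`behavior required, but the host does not claim ${profile}`);
  }
  return false; // default: soft-skip
}

// Mirrors the claim check in the scenario: a discovery doc claims a profile
// only if `profiles` is an array that contains the profile string.
function claimsProfile(disco: unknown, profile: string): boolean {
  if (typeof disco !== 'object' || disco === null) return false;
  const profiles = (disco as { profiles?: unknown }).profiles;
  return Array.isArray(profiles) && profiles.includes(profile);
}
```

Passing the environment as a parameter, rather than reading `process.env` inside the function, keeps the sketch testable without mutating global state.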
@@ -13,10 +13,11 @@
13
13
  * missing any reports `partial`, never `full` (the honest-advertisement rule).
14
14
  * - `capabilities.nondeterminismPolicy.declared` is declared in the schema.
15
15
  *
16
- * The LIVE aggregate-evidence assertion (does every required constituent scenario
17
- * actually pass against a host claiming `full`?) is the `Active → Accepted` step
18
- * per RFC 0085 §C — naturally gated on a reference host reaching partial/full, and
19
- * deferred here. This scenario asserts the discovery-predicate derivation only.
16
+ * The LIVE aggregate-evidence assertion (the §C honest-advertisement rule on a
17
+ * host claiming `openwop-agent-platform`) is the `Active → Accepted` step per RFC
18
+ * 0085 §C — capability-gated, server-requiring, and lives in the sibling
19
+ * `agent-platform-aggregate-evidence.test.ts`. THIS scenario asserts the
20
+ * discovery-predicate derivation only (always-on, server-free).
20
21
  *
21
22
  * Spec references:
22
23
  * - https://github.com/openwop/openwop/blob/main/spec/v1/agent-platform-profile.md
@@ -1,16 +1,20 @@
1
1
  /**
2
2
  * auth-saml-profile — RFC 0050: openwop-auth-saml profile.
3
3
  *
4
- * Status: DRAFT. RFC 0050 (SAML / SCIM enterprise identity profiles) is
5
- * `Draft`. The profile is documented in `auth-profiles.md`
4
+ * Status: ACTIVE. RFC 0050 (SAML / SCIM enterprise identity profiles) is
5
+ * `Active`. The profile is documented in `auth-profiles.md`
6
6
  * §`openwop-auth-saml` and reserved in `capabilities.auth.profiles`.
7
7
  *
8
8
  * Capability shape runs unconditionally when the profile is advertised.
9
- * The assertion-validation behavior (1 positive + ≥6 negatives: bad
10
- * signature, `alg:none`, absent signature, `NotOnOrAfter` expiry,
11
- * `NotBefore` not-yet-valid, signature-wrapping) is opt-in via
12
- * `OPENWOP_TEST_SAML_IDP_URL` (operator-supplied synthetic IdP), because
13
- * a deterministic XML-DSig signer harness isn't bundled yet — follows the
9
+ * The assertion-validation behavior is exercised over the host's live
10
+ * `auth/saml/validate` seam for the FULL §A variant set — 1 positive
11
+ * (valid, signed, in-window, non-wrapped → accepted) + 6 negatives
12
+ * (`alg:none`, unsigned, bad-signature, `NotOnOrAfter` expiry, `NotBefore`
13
+ * not-yet-valid, signature-wrapping → rejected). This behavioral leg is
14
+ * opt-in via `OPENWOP_TEST_SAML_IDP_URL` (an operator-supplied HTTP
15
+ * synthetic IdP serving the bundled `createSyntheticSamlIdp()` assertions),
16
+ * because it requires a live host ACS + an HTTP IdP the seam can resolve
17
+ * variants from — the in-process minter below is not a server. Follows the
14
18
  * `auth-mtls.test.ts` opt-in precedent. Soft-skips otherwise.
15
19
  *
16
20
  * @see RFCS/0050-saml-scim-enterprise-identity-profiles.md
@@ -65,19 +69,52 @@ describe('auth-saml-profile: advertisement shape (RFC 0050)', () => {
65
69
  describe('auth-saml-profile: assertion validation (RFC 0050 §A — opt-in)', () => {
66
70
  const idpUrl = process.env.OPENWOP_TEST_SAML_IDP_URL;
67
71
 
68
- it('rejects an `alg:none` / unsigned assertion (synthetic IdP required)', async () => {
72
+ // The host's real SAML ACS is driven over the `auth/saml/validate` seam for
73
+ // every variant the bundled synthetic IdP mints — the full RFC 0050 §A MUST
74
+ // list, not just one negative. The seam receives `{ idpUrl, variant }`,
75
+ // resolves `{ certificatePem, assertion }` of that variant from the
76
+ // operator-supplied synthetic IdP, runs it through the host's genuine
77
+ // validator, and answers 2xx (accepted) / non-2xx (rejected). This is the
78
+ // NON-VACUOUS behavioral leg: it exercises the host's ACS on the live wire,
79
+ // distinct from the in-process reference suite below (which proves the
80
+ // assertions are detectably malformed against the bundled oracle, not the
81
+ // host). All legs soft-skip until `OPENWOP_TEST_SAML_IDP_URL` is supplied.
82
+ const gated = (): boolean => idpUrl !== undefined && idpUrl.length > 0;
83
+
84
+ it('ACCEPTS a valid signed, in-window, non-wrapped assertion (synthetic IdP required)', async () => {
69
85
  const profiles = await readProfiles();
70
86
  if (profiles === null || !profiles.includes(SAML_PROFILE)) return; // capability-gated
71
- if (idpUrl === undefined || idpUrl.length === 0) return; // opt-in: synthetic-IdP harness not provided
72
- // With a synthetic IdP, an `alg:none`/unsigned assertion presented to the
73
- // host's SAML ACS MUST be rejected with `unauthenticated`.
74
- const res = await driver.post('/v1/host/sample/auth/saml/validate', { idpUrl, variant: 'alg-none' });
87
+ if (!gated()) return; // opt-in: synthetic-IdP harness not provided
88
+ const res = await driver.post('/v1/host/sample/auth/saml/validate', { idpUrl, variant: 'valid' });
75
89
  if (res.status === 404) return; // seam unwired
76
90
  expect(
77
91
  res.status,
78
- driver.describe('RFC 0050 §A', 'an `alg:none`/unsigned SAML assertion MUST be rejected (non-2xx)'),
79
- ).toBeGreaterThanOrEqual(400);
92
+ driver.describe('RFC 0050 §A', 'a valid signed, in-window, non-wrapped SAML assertion MUST be accepted (2xx)'),
93
+ ).toBeLessThan(400);
80
94
  });
95
+
96
+ // The 6 negatives the RFC 0050 §A MUST list requires a host to reject. The
97
+ // signature-wrapping (XSW) case is the load-bearing security property — a
98
+ // host MUST bind the validated signature to the consumed assertion.
99
+ const negatives: ReadonlyArray<[Exclude<SamlVariant, 'valid'>, string]> = [
100
+ ['alg-none', 'an `alg:none` SAML assertion MUST be rejected (non-2xx)'],
101
+ ['unsigned', 'an unsigned SAML assertion MUST be rejected (non-2xx)'],
102
+ ['bad-signature', 'a SAML assertion with an invalid signature MUST be rejected (non-2xx)'],
103
+ ['expired', 'a SAML assertion past `NotOnOrAfter` MUST be rejected (non-2xx)'],
104
+ ['not-yet-valid', 'a SAML assertion before `NotBefore` MUST be rejected (non-2xx)'],
105
+ ['signature-wrapping', 'a signature-wrapped (XSW) SAML assertion MUST be rejected (non-2xx)'],
106
+ ];
107
+
108
+ for (const [variant, requirement] of negatives) {
109
+ it(`REJECTS the ${variant} assertion over the seam (synthetic IdP required)`, async () => {
110
+ const profiles = await readProfiles();
111
+ if (profiles === null || !profiles.includes(SAML_PROFILE)) return; // capability-gated
112
+ if (!gated()) return; // opt-in: synthetic-IdP harness not provided
113
+ const res = await driver.post('/v1/host/sample/auth/saml/validate', { idpUrl, variant });
114
+ if (res.status === 404) return; // seam unwired
115
+ expect(res.status, driver.describe('RFC 0050 §A', requirement)).toBeGreaterThanOrEqual(400);
116
+ });
117
+ }
81
118
  });
82
119
 
83
120
  describe('category: auth-saml synthetic-IdP reference suite (RFC 0050 §A)', () => {
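The seven-variant expectation matrix in the hunk above can be summarized as a single decision function. The sketch below is illustrative only. The variant names come from the scenario, but `SAML_VARIANTS`, `SketchSamlVariant`, `Verdict`, and `judgeSeamResponse` are hypothetical, and the package's own `SamlVariant` type is not shown in this diff.

```typescript
// Variant names are taken from the scenario; the types and helper are illustrative.
const SAML_VARIANTS = [
  'valid',
  'alg-none',
  'unsigned',
  'bad-signature',
  'expired',
  'not-yet-valid',
  'signature-wrapping',
] as const;
type SketchSamlVariant = (typeof SAML_VARIANTS)[number];

type Verdict = 'pass' | 'fail' | 'skip';

// Classifies one seam response the way the scenario does:
// 404 means the seam is unwired (soft-skip); otherwise only the
// `valid` variant may be accepted, and every other variant must be rejected.
function judgeSeamResponse(variant: SketchSamlVariant, status: number): Verdict {
  if (status === 404) return 'skip';
  const accepted = status < 400; // same threshold as toBeLessThan(400) above
  const shouldAccept = variant === 'valid';
  return accepted === shouldAccept ? 'pass' : 'fail';
}
```

Note that the scenario's acceptance threshold is `status < 400`, which is slightly looser than the "2xx" wording in its requirement strings: a 3xx response to the `valid` variant would also pass.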
@@ -36,7 +36,7 @@ import {
36
36
  DELIVERY_OUTCOMES,
37
37
  SUBSCRIPTION_STATES,
38
38
  } from '../lib/triggerBridge.js';
39
- import { queryTestEvents, isEventLogSeamAvailable, resetTestSeam } from '../lib/event-log-query.js';
39
+ import { queryTestEvents, requireEvents, isEventLogSeamAvailable, resetTestSeam } from '../lib/event-log-query.js';
40
40
 
41
41
  const CONTENT_FREE_FORBIDDEN = ['body', 'headers', 'payload', 'secret', 'credentials', 'token', 'apiKey'];
42
42
 
@@ -57,69 +57,105 @@ describe('trigger-bridge-delivery (RFC 0083 §C)', () => {
57
57
  // ---- Leg 1: dedup → effectively-once (§C-1) ---------------------------
58
58
  const dedup = await driveDelivery({ scenario: 'dedup', dedupKey: 'conformance-dedup-key', source: 'queue' });
59
59
  if (dedup === null) return; // delivery seam unwired — soft-skip the whole behavioral suite
60
- if (dedup.runId || dedup.subscriptionId) {
61
- const subId = dedup.subscriptionId;
62
- const q = await queryTestEvents(dedup.runId ?? '__dedup__', { type: 'trigger.delivery.attempted' });
63
- if (q.ok) {
64
- const deliveredForKey = q.events.filter(
65
- (e) => e.payload.dedupKey === 'conformance-dedup-key' && e.payload.outcome === 'delivered',
66
- );
67
- // Effectively-once: a repeated dedupKey MUST NOT produce two 'delivered' attempts.
68
- expect(
69
- deliveredForKey.length <= 1,
70
- driver.describe('trigger-bridge.md §C-1', 'a repeated dedupKey MUST be effectively-once (≤1 delivered attempt)'),
71
- ).toBe(true);
72
- for (const e of q.events) {
73
- expect(
74
- typeof e.payload.outcome === 'string' && DELIVERY_OUTCOMES.includes(e.payload.outcome as string),
75
- driver.describe('run-event-payloads.schema.json#triggerDeliveryAttempted', 'outcome MUST be delivered|retrying|dead-lettered'),
76
- ).toBe(true);
77
- expectContentFree(e.payload, 'trigger.delivery.attempted');
78
- }
79
- }
80
- void subId;
60
+
61
+ // The profile is derived AND the seam is wired — missing evidence is a
62
+ // FAILURE, not a soft-skip. A repeated dedupKey MUST be effectively-once:
63
+ // EXACTLY one delivered attempt for the key (zero would mean no delivery at all).
64
+ const dedupQueryId = dedup.runId ?? '__dedup__';
65
+ const dedupEvents = requireEvents(
66
+ await queryTestEvents(dedupQueryId, { type: 'trigger.delivery.attempted' }),
67
+ 'trigger.delivery.attempted (dedup)',
68
+ );
69
+ const deliveredForKey = dedupEvents.filter(
70
+ (e) => e.payload.dedupKey === 'conformance-dedup-key' && e.payload.outcome === 'delivered',
71
+ );
72
+ expect(
73
+ deliveredForKey.length === 1,
74
+ driver.describe('trigger-bridge.md §C-1', 'a repeated dedupKey MUST be effectively-once — EXACTLY one delivered attempt (not zero, not two)'),
75
+ ).toBe(true);
76
+ for (const e of dedupEvents) {
77
+ expect(
78
+ typeof e.payload.outcome === 'string' && DELIVERY_OUTCOMES.includes(e.payload.outcome as string),
79
+ driver.describe('run-event-payloads.schema.json#triggerDeliveryAttempted', 'outcome MUST be delivered|retrying|dead-lettered'),
80
+ ).toBe(true);
81
+ expectContentFree(e.payload, 'trigger.delivery.attempted');
81
82
  }
82
83
 
83
84
  // ---- Leg 2: retry → dead-letter (§C-2 + RFC 0053) --------------------
84
85
  const exhaust = await driveDelivery({ scenario: 'exhaust', source: 'webhook' });
85
- if (exhaust && (exhaust.runId || exhaust.subscriptionId)) {
86
- const key = exhaust.runId ?? '__exhaust__';
87
- const dq = await queryTestEvents(key, { type: 'trigger.delivery.attempted' });
88
- if (dq.ok && dq.events.length > 0) {
89
- const terminal = dq.events.sort((a, b) => a.sequence - b.sequence)[dq.events.length - 1]!;
90
- expect(
91
- terminal.payload.outcome === 'dead-lettered',
92
- driver.describe('trigger-bridge.md §C-2', 'an exhausted retry policy MUST terminate in a dead-lettered delivery'),
93
- ).toBe(true);
94
- }
95
- const sq = await queryTestEvents(key, { type: 'trigger.subscription.state.changed' });
96
- if (sq.ok && sq.events.length > 0) {
97
- const toDeadLetter = sq.events.some((e) => e.payload.toState === 'dead-lettered');
98
- expect(
99
- toDeadLetter,
100
- driver.describe('trigger-bridge.md §B', 'the subscription MUST transition to dead-lettered on exhaustion'),
101
- ).toBe(true);
102
- for (const e of sq.events) {
103
- expect(
104
- typeof e.payload.toState === 'string' && SUBSCRIPTION_STATES.includes(e.payload.toState as string),
105
- driver.describe('trigger-bridge.md §B', 'toState MUST be in the four-state vocabulary'),
106
- ).toBe(true);
107
- expectContentFree(e.payload, 'trigger.subscription.state.changed');
108
- }
109
- }
86
+ expect(
87
+ exhaust !== null,
88
+ driver.describe('trigger-bridge.md §C-2', 'the exhaust scenario MUST be wired when the delivery seam is'),
89
+ ).toBe(true);
90
+ const exKey = exhaust!.runId ?? '__exhaust__';
91
+ const exhaustEvents = requireEvents(
92
+ await queryTestEvents(exKey, { type: 'trigger.delivery.attempted' }),
93
+ 'trigger.delivery.attempted (exhaust)',
94
+ );
95
+ expect(
96
+ exhaustEvents.length >= 1,
97
+ driver.describe('trigger-bridge.md §C-2', 'an exhausted delivery MUST emit ≥1 trigger.delivery.attempted'),
98
+ ).toBe(true);
99
+ const terminal = exhaustEvents.sort((a, b) => a.sequence - b.sequence)[exhaustEvents.length - 1]!;
100
+ expect(
101
+ terminal.payload.outcome === 'dead-lettered',
102
+ driver.describe('trigger-bridge.md §C-2', 'an exhausted retry policy MUST terminate in a dead-lettered delivery'),
103
+ ).toBe(true);
104
+ const stateEvents = requireEvents(
105
+ await queryTestEvents(exKey, { type: 'trigger.subscription.state.changed' }),
106
+ 'trigger.subscription.state.changed (exhaust)',
107
+ );
108
+ expect(
109
+ stateEvents.length >= 1,
110
+ driver.describe('trigger-bridge.md §B', 'exhaustion MUST emit ≥1 trigger.subscription.state.changed'),
111
+ ).toBe(true);
112
+ expect(
113
+ stateEvents.some((e) => e.payload.toState === 'dead-lettered'),
114
+ driver.describe('trigger-bridge.md §B', 'the subscription MUST transition to dead-lettered on exhaustion'),
115
+ ).toBe(true);
116
+ for (const e of stateEvents) {
117
+ expect(
118
+ typeof e.payload.toState === 'string' && SUBSCRIPTION_STATES.includes(e.payload.toState as string),
119
+ driver.describe('trigger-bridge.md §B', 'toState MUST be in the four-state vocabulary'),
120
+ ).toBe(true);
121
+ expectContentFree(e.payload, 'trigger.subscription.state.changed');
110
122
  }
111
123
 
112
124
  // ---- Leg 3: delivery → run causation (§C / RFC 0040) -----------------
125
+ // §C: "the run started by a successful delivery MUST carry the delivery's
126
+ // id as causationId on its run.started." The delivery's id is the
127
+ // trigger.delivery.attempted{delivered} event's id, so we assert EQUALITY
128
+ // (not merely "a causation id exists") — the trigger→run link MUST resolve.
113
129
  const delivered = await driveDelivery({ scenario: 'deliver', source: 'schedule' });
114
- if (delivered?.runId) {
115
- const rq = await queryTestEvents(delivered.runId, { type: 'run.started' });
116
- if (rq.ok && rq.events[0]) {
117
- expect(
118
- typeof rq.events[0].causationId === 'string' && (rq.events[0].causationId as string).length > 0,
119
- driver.describe('trigger-bridge.md §C / RFC 0040', 'the delivered run.started MUST carry the delivery causationId (resolvable via /ancestry)'),
120
- ).toBe(true);
121
- }
122
- }
130
+ expect(
131
+ delivered !== null && typeof delivered.runId === 'string' && (delivered.runId as string).length > 0,
132
+ driver.describe('trigger-bridge.md §C', 'a successful delivery MUST create a run'),
133
+ ).toBe(true);
134
+ const deliveredRunId = delivered!.runId as string;
135
+ const attemptEvents = requireEvents(
136
+ await queryTestEvents(deliveredRunId, { type: 'trigger.delivery.attempted' }),
137
+ 'trigger.delivery.attempted (deliver)',
138
+ );
139
+ const deliveredEvent = attemptEvents.find((e) => e.payload.outcome === 'delivered');
140
+ expect(
141
+ deliveredEvent !== undefined,
142
+ driver.describe('trigger-bridge.md §C-1', 'a successful delivery MUST emit a trigger.delivery.attempted{outcome:delivered}'),
143
+ ).toBe(true);
144
+ const runStartedEvents = requireEvents(
145
+ await queryTestEvents(deliveredRunId, { type: 'run.started' }),
146
+ 'run.started (deliver)',
147
+ );
148
+ expect(
149
+ runStartedEvents.length >= 1,
150
+ driver.describe('trigger-bridge.md §C', 'a delivered run MUST emit run.started'),
151
+ ).toBe(true);
152
+ const runStarted = runStartedEvents.sort((a, b) => a.sequence - b.sequence)[0]!;
153
+ expect(
154
+ typeof runStarted.causationId === 'string' &&
155
+ (runStarted.causationId as string).length > 0 &&
156
+ runStarted.causationId === deliveredEvent!.eventId,
157
+ driver.describe('trigger-bridge.md §C / RFC 0040', 'run.started.causationId MUST EQUAL the delivery id (the trigger.delivery.attempted{delivered} eventId) — resolvable via /ancestry'),
158
+ ).toBe(true);
123
159
 
124
160
  await resetTestSeam();
125
161
  });
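The tightened trigger-bridge checks above (missing evidence is a failure, exactly one delivered attempt per `dedupKey`, and `run.started.causationId` equal to the delivered attempt's `eventId`) can be sketched in isolation. Everything here is a hedged illustration: the real `requireEvents` and `queryTestEvents` live in `../lib/event-log-query.js` and are not shown in this diff, so `SketchEvent`, `SketchQueryResult`, and the three helpers are hypothetical. The sketch also assumes that only a failed query throws, since the scenario still asserts `length >= 1` separately after calling `requireEvents`.

```typescript
// Illustrative shapes; field names mirror the ones the scenario reads.
interface SketchEvent {
  eventId: string;
  sequence: number;
  causationId?: string;
  payload: Record<string, unknown>;
}
type SketchQueryResult =
  | { ok: true; events: SketchEvent[] }
  | { ok: false; error: string };

// Hypothetical stand-in for requireEvents: a failed query is a hard failure,
// not a soft-skip. Assumption: an empty (but successful) result is returned as-is.
function sketchRequireEvents(result: SketchQueryResult, label: string): SketchEvent[] {
  if (!result.ok) throw new Error(`${label}: event-log query failed (${result.error})`);
  return result.events;
}

// Effectively-once means this count is exactly 1 for a repeated dedupKey.
function deliveredCountForKey(events: readonly SketchEvent[], dedupKey: string): number {
  return events.filter(
    (e) => e.payload.dedupKey === dedupKey && e.payload.outcome === 'delivered',
  ).length;
}

// True only when the earliest run.started carries a non-empty causationId
// equal to the eventId of the delivered attempt.
function causationResolves(
  runStarted: readonly SketchEvent[],
  attempts: readonly SketchEvent[],
): boolean {
  const delivered = attempts.find((e) => e.payload.outcome === 'delivered');
  // Copy before sorting so the caller's array is left untouched.
  const first = [...runStarted].sort((a, b) => a.sequence - b.sequence)[0];
  if (delivered === undefined || first === undefined) return false;
  return (
    typeof first.causationId === 'string' &&
    first.causationId.length > 0 &&
    first.causationId === delivered.eventId
  );
}
```

Unlike the scenario, which calls `sort` directly on the queried array, the sketch sorts a copy. Both select the same event; the copy only avoids reordering the input as a side effect.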