@openwop/openwop-conformance 1.14.0 → 1.15.0

package/CHANGELOG.md CHANGED
@@ -4,6 +4,16 @@
 
  _No unreleased changes._
 
+ ## [1.15.0] — 2026-06-01 — RFC 0080 memory degraded-projection behavioral gate
+
+ Standalone conformance minor — a scenario addition published via the `openwop-conformance/v1.15.0` per-package tag (PUBLISHING.md §"CI automation"; only the `publish-conformance` job runs), NOT a coordinated spec-corpus release. `EXPECTED_CONFORMANCE_VERSION` advances to `1.15.0` in lockstep. The steward prerequisite that lets MyndHyve run the RFC 0080 §C degraded-projection scenario non-vacuously under `OPENWOP_REQUIRE_BEHAVIOR=true` to graduate `memory` from `Active` to `Accepted`. The normative surface (the `memory.{search,retention,writable}` dimensions + the `memoryDegraded`/`degradedMemoryDimensions` inventory fields) already shipped — this release is the gated test surface only. The companion RFC 0068 gated scenarios (`memory-consolidation-idempotent` + `commitment-fired`) shipped in 1.14.0; this release additionally documents their two host seams in `host-sample-test-seams.md`.
+
+ ### Added — RFC 0080 behavioral scenario
+
+ - **`memory-degraded-projection.test.ts`** (`behaviorGate('openwop-memory-degraded', …)`, gated on `agents.manifestRuntime.supported` + `memory.supported`) — the RFC 0080 §C degraded-projection iff-contract on the NORMATIVE `GET /v1/agents`: a degraded inventory entry MUST carry `memoryDegraded:true` + a non-empty, unique `degradedMemoryDimensions[]` drawn from the closed §A-name enum (`read`/`write`/`search`/`long-term-durability`/`compaction`/`attribution`/`replay-snapshot`/`retention`); a non-degraded entry MUST NOT carry a non-empty list; the inventory is non-empty; the degraded branch runs non-vacuously when `OPENWOP_DEGRADED_AGENT_ID` names a known-degraded agent. Black-box on the normative path — no POST seam. This is the RFC 0080 → Accepted bar.
+
+ Additive + capability-gated; existing v1.0-only hosts pass unchanged. No new schemas (the `memory.{search,retention,writable}` dimensions + the `memoryDegraded`/`degradedMemoryDimensions` inventory fields shipped at `Draft → Active`). Also documents the two RFC 0068 conformance seams (`POST /v1/host/sample/memory/consolidate` + `.../commitment/fire`) in `host-sample-test-seams.md` — the 0068 gated scenarios (`memory-consolidation-idempotent` + `commitment-fired`) already shipped in 1.14.0.
+
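The iff-contract that `memory-degraded-projection.test.ts` asserts can be sketched as a per-entry check. This is a minimal illustration, not the suite's code: the `InventoryEntry` interface and the `checkDegradedProjection` helper are hypothetical names, and only the two field names and the closed enum values come from the changelog entry above.

```typescript
// Closed §A-name enum, copied from the changelog entry.
const MEMORY_DIMENSIONS: readonly string[] = [
  "read",
  "write",
  "search",
  "long-term-durability",
  "compaction",
  "attribution",
  "replay-snapshot",
  "retention",
];

// Hypothetical entry shape: only the two fields the changelog names.
interface InventoryEntry {
  memoryDegraded?: boolean;
  degradedMemoryDimensions?: string[];
}

// Returns a list of violations; an empty list means the entry satisfies the contract.
function checkDegradedProjection(entry: InventoryEntry): string[] {
  const problems: string[] = [];
  const dims = entry.degradedMemoryDimensions ?? [];
  if (entry.memoryDegraded === true) {
    if (dims.length === 0) {
      problems.push("degraded entry needs a non-empty degradedMemoryDimensions[]");
    }
    if (new Set(dims).size !== dims.length) {
      problems.push("degradedMemoryDimensions[] entries must be unique");
    }
    for (const d of dims) {
      if (!MEMORY_DIMENSIONS.includes(d)) {
        problems.push(`dimension outside the closed enum: ${d}`);
      }
    }
  } else if (dims.length > 0) {
    problems.push("non-degraded entry must not carry a non-empty degradedMemoryDimensions[]");
  }
  return problems;
}
```

Returning a list of violations rather than a boolean keeps each failed clause of the contract visible; a real scenario would more likely assert each clause separately.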
  ## [1.14.0] — 2026-06-01 — RFC 0078 tool-catalog + RFC 0079 egress-policy behavioral gates
 
  Standalone conformance minor — a scenario addition published via the `openwop-conformance/v1.14.0` per-package tag (PUBLISHING.md §"CI automation"; only the `publish-conformance` job runs), NOT a coordinated spec-corpus release. `EXPECTED_CONFORMANCE_VERSION` advances to `1.14.0` in lockstep. Four gated behavioral scenarios + two `src/lib/` helpers, the steward prerequisite to graduating `toolCatalog` (RFC 0078) and `httpClient.egressPolicy` (RFC 0079) from `Active` to `Accepted` on a non-steward host (MyndHyve). All additive + capability-gated; existing v1.0-only hosts pass unchanged. The normative surface (`GET /v1/tools` + `GET /v1/tools/{toolId}`, the `tool.session.*`/`egress.decided` events + the `tool-descriptor`/`credential-provenance` schemas) already shipped — this release is the gated test surface only.
package/README.md CHANGED
@@ -93,7 +93,7 @@ Exit code is non-zero on any failed assertion.
 
  ## What's Covered
 
- The current suite has 319 scenario files under `src/scenarios/`. 2026-06-01 (RFC 0035 — sandbox wall-clock timeout, the 7th-of-8 graduation) added `sandbox-wasm-timeout.test.ts` (worker-driven server-free: `probeTimeout` in `wasm-sandbox-probe.ts` spawns a worker thread running the committed `misbehaving-timeout.wasm` + a main-thread kill-timer — the thread preemption a same-thread probe can't do — asserting `sandbox_timeout` with a well-behaved positive control; graduates `node-pack-sandbox-timeout` reference-impl→protocol, so 7 of 8 `node-pack-sandbox-*` invariants are now protocol-tier, only the JS-specific `no-eval` permanently exempt). 2026-05-31 (audit-response black-box / graduation batch) added three more: `sandbox-wasm-isolation.test.ts` (RFC 0035 — drives the committed `fixtures/wasm-sandbox/*.wasm` through `wasm-sandbox-probe.ts`: escape/capability-gate via static `WebAssembly.Module.imports()`, an OOB-store memory trap, double-instantiate isolation; 10/10; graduates 6 `node-pack-sandbox-*` invariants reference-impl→protocol), `workspace-cross-tenant-isolation-blackbox.test.ts` (RFC 0059 — two-credential black-box on the normative §C `/v1/host/workspace/files` endpoints: owner A writes, a second-tenant credential fails closed; no seam), and `prompt-resolution-chain-event.test.ts` (RFC 0029 — reads the durable `agent.promptResolved.chain[]` precedence record via the normative `GET /v1/runs/{runId}/events/poll`; no seam) — each the production-path proof that graduates its surface into the `openwop-core-standard` floor. 
2026-05-31 (RFC 0088 — the `openwop-core-standard` Core Standard Profile, the audit-response Core Candidate target) added `core-standard-profile.test.ts` (always-on server-free derivation probe: `isCoreStandard` derives the §B floor — `openwop-core` ∧ `openwop-interrupts` ∧ (`openwop-stream-sse` ∨ `openwop-stream-poll`) — a bare `openwop-core` host without interrupts is excluded, a host with no event transport fails, and the annex is absent from `deriveProfiles` because it composes rather than redefines). 2026-05-31 (RFC 0082 — agent deployment lifecycle, the Active→Accepted behavioral gate) added `agent-deployment-lifecycle.test.ts` (capability-gated on `agents.deployment.supported` via `behaviorGate('openwop-deployment-lifecycle', …)` — the §E promotion contract via the new `POST /v1/host/sample/agents/deployment-transition` seam + the test event-log seam across four legs: `promote` (authorize RFC 0049 → approvalGate RFC 0051 → eval-verify RFC 0081 → content-free `deployment.promoted` with a seven-state `toState` + `toVersion`, the record validating `agent-deployment.schema.json`), `unauthorized` (fail-closed — `allowed:false`, no `deployment.promoted`, the behavioral leg of `deployment-promotion-fail-closed`), `eval-gate-unmet` (`eval_gate_unmet` denial, §E-3), and `channel-pin` (the §B `resolvedAgentVersion` recorded-fact on `agent.invocation.started`); new lib helper `src/lib/agentDeployment.ts`; soft-skips on 404 — the RFC 0082 → Accepted bar). 
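The §B floor named in the `core-standard-profile.test.ts` entry above is a plain boolean expression over profile names. The following is a sketch under assumptions: `isCoreStandardSketch` is a hypothetical stand-in, and the suite's real `isCoreStandard` may take a different input (for example a discovery document rather than a set of derived profile names).

```typescript
// Hypothetical stand-in for the suite's isCoreStandard; the input shape is an assumption.
function isCoreStandardSketch(profiles: ReadonlySet<string>): boolean {
  // At least one event transport must be present.
  const hasEventTransport =
    profiles.has("openwop-stream-sse") || profiles.has("openwop-stream-poll");
  // Floor: openwop-core AND openwop-interrupts AND (sse OR poll).
  return (
    profiles.has("openwop-core") &&
    profiles.has("openwop-interrupts") &&
    hasEventTransport
  );
}
```

Under this reading, a bare `openwop-core` host without interrupts is excluded and a host with no event transport fails, matching the two negatives the entry lists.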
2026-05-31 (RFC 0081 — agent evaluation, the Active→Accepted behavioral gate) added `agent-eval-run.test.ts` (capability-gated on `agents.evalSuite.supported` via `behaviorGate('openwop-eval-run', …)` — the §B `mode:"eval"` projection via the new `POST /v1/host/sample/agents/eval-run` seam + the test event-log seam: `eval.started`-first → one `eval.scored` per task → `eval.completed`-once ordering (count == `eval.completed.taskCount`), the content-free `eval.scored` legs (`score` ∈ 0..1) backing `eval-summary-no-content-leak`, and the NORMATIVE `GET /v1/runs/{runId}/eval-summary` schema-valid `EvalSummary` round-trip with `passedCount <= taskCount`; new lib helper `src/lib/agentEval.ts`; soft-skips on 404 — the RFC 0081 → Accepted bar). 2026-05-31 (RFC 0083 — durable trigger bridge, the Active→Accepted behavioral gate) added `trigger-bridge-delivery.test.ts` (profile-gated on `openwop-trigger-bridge` derived from the live discovery doc — the §C delivery model via the `POST /v1/host/sample/trigger-bridge/deliver` seam + the test event-log seam: dedup→effectively-once `trigger.delivery.attempted{delivered}` (§C-1), retry-exhaustion→`{dead-lettered}` + `trigger.subscription.state.changed{toState:dead-lettered}` (§C-2 + RFC 0053), and the delivered run's `run.started.causationId` == the delivery id (§C / RFC 0040); both `trigger.*` events content-free; the always-on shape stays in `trigger-bridge-shape.test.ts`; new lib helper `src/lib/triggerBridge.ts`). 
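The event-ordering leg of the `agent-eval-run.test.ts` entry above can be illustrated with a small check over a run's event list. This is a sketch, not the scenario's implementation: the `EvalEvent` envelope (`type` plus `payload`) and the `checkEvalOrdering` helper are assumptions, while the event names, the `score` range, and `taskCount` come from the entry.

```typescript
// Assumed event envelope; the real RunEvent shape may differ.
interface EvalEvent {
  type: string;
  payload?: { score?: number; taskCount?: number };
}

// Returns a list of violations; an empty list means the ordering rules hold.
function checkEvalOrdering(events: EvalEvent[]): string[] {
  const problems: string[] = [];
  const evalEvents = events.filter((e) => e.type.startsWith("eval."));

  // eval.started must be the first eval event.
  if (evalEvents[0]?.type !== "eval.started") {
    problems.push("eval.started must be the first eval event");
  }

  // Every eval.scored carries a score within 0..1.
  const scored = evalEvents.filter((e) => e.type === "eval.scored");
  for (const s of scored) {
    const score = s.payload?.score;
    if (typeof score !== "number" || score < 0 || score > 1) {
      problems.push("eval.scored score must be a number within 0..1");
    }
  }

  // eval.completed appears exactly once, and its taskCount equals the scored count.
  const completed = evalEvents.filter((e) => e.type === "eval.completed");
  if (completed.length !== 1) {
    problems.push("eval.completed must appear exactly once");
  } else if (scored.length !== completed[0]?.payload?.taskCount) {
    problems.push("eval.scored count must equal eval.completed.taskCount");
  }
  return problems;
}
```

The sketch leaves out the content-free payload check and the `GET /v1/runs/{runId}/eval-summary` round-trip, which depend on schemas not shown here.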
2026-05-31 (RFC 0087 — agent org-chart, the Active→Accepted behavioral gate) added two capability-gated behavioral scenarios (both gated on `agents.orgChart.supported`, black-box on the normative `/v1/agents/org-chart` surface — no new POST seam): `agent-org-chart-scoping.test.ts` (the `GET /v1/agents/org-chart` tree-shape — departments form an acyclic `parentDepartmentId` tree, members reference `host:<id>` roster entries — + the §D responsibility roll-up via `GET /v1/agents/org-chart/{departmentId}` with a deduped `responsibilities[]` union + the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ORG_CHART_DEPARTMENT_ID`) and `org-position-no-authority-escalation.test.ts` (the behavioral leg of the protocol-tier invariant — the live org-chart wire carries NO authority-bearing field on any member/department/responsibility-view object; the structural leg stays always-on in `agent-org-chart-shape.test.ts`, and the deeper RFC 0049/0051 authority-invariance legs stay reference-impl tier per the `agent-manifest-runtime` no-host-hook precedent). 
2026-05-31 (RFCs 0086 + 0077 — the Active→Accepted behavioral gate) added four capability-gated behavioral scenarios so a non-steward host can be mechanically certified non-vacuously under `OPENWOP_REQUIRE_BEHAVIOR=true`: `agent-roster-attribution.test.ts` (RFC 0086 §B/§C; gated on `agents.roster.supported` — the normative `GET /v1/agents/roster` read shape + `total==roster.length`, the §C `roster.run.initiated`-before-`agent.invocation.started` ordering, the content-free payload backing `roster-attribution-no-content`, the durable work-item `triggerSubscriptionId`, and the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ROSTER_ID`), `agent-live-invocation-bracket.test.ts` (RFC 0077 §E; gated on `agents.liveRuntime.supported` — `agent.invocation.started`-first / `agent.invocation.completed`-last bracket, matching `invocationId`, `source`/`outcome` closed enums, content-free), `agent-live-structured-output.test.ts` (RFC 0077 §B step 6; gated on `agents.liveRuntime.structuredOutput` — a result violating `handoff.returnSchemaRef` fails the invocation `outcome:"failed"` rather than shipping as completed), and `agent-live-allowlist-enforced.test.ts` (RFC 0077 §F-1 / RFC 0002 §A14; gated on `agents.liveRuntime.supported` — a tool outside `toolAllowlist` is not callable); all four drive the documented `POST /v1/host/sample/roster/fire` + `POST /v1/host/sample/agents/live-invoke` seams plus the test event-log seam and soft-skip on 404 (these are the RFC 0086 / 0077 Active→Accepted bars). 
2026-05-30 (RFC 0087 — agent org-chart, Draft -> Active) added `agent-org-chart-shape.test.ts` (always-on server-free: the `capabilities.agents.orgChart` shape + the `AgentOrgChart` round-trip + the non-`host:` member negative + the **§B structural non-authority guarantee** — the schema rejects a `scopes`/`canDispatch`/`permissions`/`authority` field on a member (`additionalProperties:false`), and a member's key set is exactly `{rosterId, departmentId, roleId, reportsTo}` — backing the protocol-tier `org-position-no-authority-escalation` invariant; no new RunEventType). 2026-05-30 (RFC 0086 — standing agent roster, Draft -> Active) added `agent-roster-shape.test.ts` (always-on server-free: the `capabilities.agents.roster` shape + the `AgentRosterEntry` round-trip + the `host:` `rosterId` + `agentRef` version-XOR-channel negatives + the content-free `roster.run.initiated` negatives backing the protocol-tier `roster-attribution-no-content` invariant + the additive `roster` inventory projection + RunEventType-enum membership). 2026-05-30 (RFC 0082 — agent deployment lifecycle, Draft -> Active) added `agent-deployment-shape.test.ts` (always-on server-free: the `capabilities.agents.deployment` shape + the `AgentDeployment` record round-trip + the `AgentRef` `channel` XOR `version` `not`-clause + the four `deployment.*` payloads + the content-free negatives backing the protocol-tier `deployment-event-no-content-leak` invariant). 2026-05-30 (RFC 0085 — `openwop-agent-platform` meta-profile, Draft -> Active) added `agent-platform-profile.test.ts` (always-on server-free derivation of the operational-annex `none`/`partial`/`full` status: all-floor ⇒ partial, missing-flag ⇒ none, the replay-OR-`nondeterminismPolicy.declared` term, floor+governance ⇒ full, missing-tenant-scope ⇒ partial-not-full per the honest-advertisement rule, eval/deploy/budget-are-advisory-not-hard-terms, + the `capabilities.nondeterminismPolicy.declared` shape). 
2026-05-30 (RFC 0084 — budget, quota + cost policy, Draft -> Active) added `budget-policy-shape.test.ts` (always-on server-free: `budget-policy.schema.json` round-trip + the §A orthogonality guard — a wall-time field is rejected (it's RFC 0058's `runTimeoutMs`) — + threshold/onExhaustion negatives + the four content-free `budget.{reserved,consumed,threshold.crossed,exhausted}` payloads + the four `cap.breached{budget-*}` kinds + RunEventType-enum membership + the no-pricing-property structural check backing the protocol-tier `budget-no-pricing-leak` invariant + the `capabilities.budget`/`limits.maxBudget*` shape). 2026-05-30 (RFC 0083 — durable trigger + channel bridge, Draft -> Active) added `trigger-bridge-shape.test.ts` (always-on server-free: `trigger-subscription.schema.json` round-trip + missing-`state`/out-of-enum-`source`/unknown-property negatives + the four-state vocab + the two content-free `trigger.{subscription.state.changed,delivery.attempted}` payloads incl. closed `state`/`outcome` enums + RunEventType-enum membership + the `triggerBridge`/`webhooks.durable` capability shape + the `openwop-trigger-bridge` profile derivation incl. the no-dead-letter-sink negative). 2026-05-30 (RFC 0079 — credential provenance + egress policy, Draft -> Active) added `egress-provenance-shape.test.ts` (always-on server-free: `credential-provenance.schema.json` round-trip + `audiences:[]`/missing-`credentialId`/unknown-property negatives + the no-secret-property structural check backing the protocol-tier `egress-decision-no-secret-leak` invariant + the content-free `egress.decided` record incl. the `decision` enum + RunEventType-enum membership + the `httpClient.egressPolicy` shape; the behavioral `egress-credential-audience-bound` confused-deputy MUST is reference-impl tier, deferred to a host). 
2026-05-30 (RFC 0078 — portable tool catalog, Draft -> Active) added `tool-descriptor-shape.test.ts` (always-on server-free: `tool-descriptor.schema.json` round-trip + the §C-1 `exec` ⇒ `host-extension` cross-field MUST (RFC 0069) + the `safetyTier`-required negative + `additionalProperties:false`, the `capabilities.toolCatalog` `supported`/`sources`/`sessionLifecycle` shape, and the two content-free `tool.session.{opened,closed}` payload $defs incl. the closed `outcome` enum + RunEventType-enum membership). 2026-05-30 (RFC 0080 — agent memory capability reconciliation, Draft -> Active) added `memory-capability-model-shape.test.ts` (always-on server-free: the additive `capabilities.memory.{writable,search,retention}` dimension shapes + malformed-instance negatives — `retention.ttl` non-boolean, out-of-enum `search.modes`, unknown property under `additionalProperties:false` — the `agent-inventory-response` `memoryDegraded`/`degradedMemoryDimensions` closed-enum fields, and the `openwop-memory` derivation surfacing for read/write + long-term hosts while withholding from `writable:false`). 2026-05-30 (RFC 0081 — agent evaluation, Draft -> Active) added `agent-eval-suite-shape.test.ts` (always-on server-free: the `capabilities.agents.evalSuite` shape + the `AgentEvalSuite`/`EvalSummary` schema round-trips + the three `eval.{started,scored,completed}` payloads + the content-free negatives — a task entry with a `taskOutput` body, a `safetyFinding` with an `excerpt` — backing the new `eval-summary-no-content-leak` SECURITY invariant). 
2026-05-29 (RFC 0076 §B — `ctx.http.safeFetch` live-run audit) added `safefetch-live-audit.test.ts` (`behaviorGate('openwop-safefetch-live-audit', …)`, gated on `httpClient.safeFetch` + `toolHooks.prePostEvents`) — asserts the audit-when-both MUST against the **durable run event log** via the new `POST /v1/host/sample/http/safe-fetch-run` open seam + the test event-log seam, closing the seam-vs-production gap (a production `createSafeFetch()` with no audit hooks passes the inline `safefetch-behavior.test.ts` but FAILS this under `OPENWOP_REQUIRE_BEHAVIOR=true`); this is the RFC 0076 §B → Accepted bar; run seam soft-skips on 404 (host-pending). 2026-05-29 (RFC 0066 — `x-openwop-form` picker UX hints, Draft → Active) added `x-openwop-form-pack-manifest.test.ts` (always-on server-free: an annotated `configSchema` stays a valid 2020-12 schema + the advisory hints don't change what it accepts, each §A annotation matches the shape, an unknown `kind` validates for forward-compat, 3 negatives — missing/non-string `kind`, non-string `dependsOn`). 2026-05-29 (RFC 0076 §B — `ctx.http.safeFetch`) added `safefetch-behavior.test.ts` (seam-gated: SSRF block / DNS-rebinding / `Connection: upgrade` refusal / tool-hooks audit-when-both, via `POST /v1/host/sample/http/safe-fetch`; advertisement contract stays in `http-client-ssrf.test.ts`). 2026-05-29 (RFC 0076 §A — pack `runtime.requires[]` install gate) added two: `runtime-requires-shape.test.ts` (server-free closed-vocabulary validation — the 8 tokens validate, a raw builtin name is rejected, empty-array≡omission, `uniqueItems`) + `runtime-requires-install-gate.test.ts` (seam-gated install-grant / install-refuse → `pack_runtime_requirement_unmet` / non-sandbox SHOULD-projection, soft-skip on 404 via `POST /v1/host/sample/packs/install-gate`). 
2026-05-29 (RFC 0047 — `host.oauth` authorization-code roundtrip) added `oauth-authorization-code-roundtrip.test.ts` — capability-gated on `capabilities.oauth.supported` + `grants` including `authorization_code`; drives the `POST /v1/host/sample/oauth/authorize-code-roundtrip` seam against the one canonical synthetic provider in `fixtures/oauth-providers/synthetic.json` (soft-skip on 404, Tier-2 host-pending), asserting a successful grant returns a credential REFERENCE (token persisted as a `host.credentials` entry) and that the authorization code / state / PKCE verifier / acquired access+refresh tokens never appear on any run-visible surface (RFC 0047 §C + §C.2 / `credential-payload-redaction`). Closes the RFC 0047 Tier-2 gap (capability-shape + redaction scenarios existed; the actual authorization-code dance was unexercised). 2026-05-26 (RFC 0070 — agent-manifest runtime) added `agent-manifest-runtime.test.ts`; 2026-05-26 (RFC 0071 — artifact-type + chat card packs) added six: `artifact-type-pack-manifest-validation.test.ts` + `artifact-schema-compile-bounded.test.ts` (server-free) + `artifact-type-pack-install.test.ts` + `artifact-type-store-without-render.test.ts` + `chat-card-pack-manifest-validation.test.ts` (server-free) + `chat-card-pack-execution.test.ts` (capability-gated, host-pending). 
2026-05-26 (RFCs 0067 / 0068 / 0069 — spec-gap Draft cohort) added five scenarios: `byok-auth-modes.test.ts` (RFC 0067; always-on schema-shape of `aiProviders.authModes` + a discovery-gated §B auth-mode-contract cross-field check), `memory-consolidation-shape.test.ts` (RFC 0068; always-on shape of `agents.memoryConsolidation`/`agents.commitments` + the `agent.memory.consolidated`/`commitment.fired` payload $defs), `memory-consolidation-idempotent.test.ts` + `commitment-fired.test.ts` (RFC 0068; capability-gated behavioral, soft-skip on the documented `/v1/host/sample/memory/consolidate` + `/commitment/fire` seams), and `exec-not-protocol-tier.test.ts` (RFC 0069; always-on server-free structural assertion that the protocol corpus defines no `core.*`/`openwop.*` exec-class primitive — backs the `exec-must-not-be-protocol-tier` SECURITY invariant). 2026-05-25 (RFC 0061 — stateful agent-loop lifecycle, executionModel.version 5) added four `agent-loop-*.test.ts` scenarios: `-version5-shape` (always-on; validates `executionModel.statefulResume`/`transcriptWindow` + the 1–5 version ceiling) plus `-iteration-monotonic` (gated on `version >= 5`; `runOrchestrator.decided.iteration` increments 1,2,3… exactly once per turn), `-workspace-snapshot` (gated additionally on `host.workspace.supported`; a turn-i workspace write is invisible to turn i, visible to turn i+1), and `-stateful-resume` (gated on `statefulResume`; a mid-loop suspend resumes at the same iteration without resetting the counter) — the three behavioral scenarios drive the documented agent-loop seam (`POST /v1/host/sample/agentloop/run`) and soft-skip until a host wires it. 
2026-05-25 (RFC 0059 — host.workspace M2, reference-host enforcement) added two `workspace-*.test.ts` scenarios: `-behavior` (capability-gated CRUD round-trip / `If-Match` 409 `workspace_conflict` / `workspace_too_large` / §D run-start snapshot, all via the real `/v1/host/workspace/files` §C endpoints) and `-cross-tenant-isolation` (WCT-1 — drives the documented `POST /v1/host/sample/workspace/op` seam to assert a file owned by one `{tenant, workspace}` is unreadable, on both `get` and `list`, under a different owner; backs the new `workspace-cross-tenant-isolation` SECURITY invariant). The in-memory reference host now advertises `capabilities.workspace.supported` and honors §C/§D/§E end-to-end. 2026-05-25 (RFC 0062 — memory.distillation "dreams") added five `distillation-*.test.ts` scenarios: `-shape` (always-on; validates the `capabilities.memory.distillation` block + the additive `distillation` sub-object on `memory.compacted`) plus `-token-budget` (within budget `tokensUsed ≤ tokenBudget`; an un-meetable budget → `token_budget_exceeded` with no partial archive), `-stable-archive` (same sources + budget ⇒ byte-stable archive checksum), `-index-roundtrip` (gated additionally on `indexEmitted`; the `MEMORY-INDEX.json` workspace file is retrievable + `workspace.updated` fired), and `-secret-carryforward` (SR-1: a redacted source secret never appears in the archive) — the four behavioral scenarios drive the documented memory-distillation seam (`POST /v1/host/sample/memory/distill`) and soft-skip until a host wires it. 
2026-05-25 (RFC 0063 — core.subWorkflow.outputAttestation) added four `subrun-*.test.ts` scenarios: `-attestation-shape` (always-on; validates the `capabilities.agents.subRunAttestation` flag) plus `-checksum-stable` (the child output checksum is the byte-stable, key-order-invariant RFC 8785 JCS + SHA-256 digest), `-approval-gate` (`requireApproval` → `accept` merges, `reject` does not), and `-approval-fail-closed` (no `accept`/`edit-accept` → no merge; backs the deferred `subrun-merge-approval-fail-closed` invariant) — the three behavioral scenarios drive the documented sub-run attestation seam (`POST /v1/host/sample/subrun/attest`) and soft-skip until a host wires it. 2026-05-25 (RFC 0064 — host.toolHooks) added five `tool-hooks-*.test.ts` scenarios: `-shape` (always-on; validates the `capabilities.toolHooks` block + the optional content-free fields on `agentToolCalled` / `agentToolReturned`) plus `-content-free` (gated on `prePostEvents`), `-authorization-fail-closed` (gated on `perToolAuthorization`), `-rate-limit` (gated on `perToolRateLimit`), and `-secret-redaction` (gated on `prePostEvents` + the SR-1 `argsHash` redaction rule) — the four behavioral scenarios drive the documented tool-hooks invoke seam (`POST /v1/host/sample/toolhooks/invoke`) and soft-skip until a host wires it. 2026-05-25 (RFC 0060 — host.heartbeat) added four `heartbeat-*.test.ts` scenarios: `-capability-shape` (always-on; validates the `capabilities.heartbeat` block) plus `-fires-once-per-tick`, `-idempotent-no-spam`, and `-runtime-bound` (gated on `capabilities.heartbeat.supported` + the host heartbeat tick seam; soft-skip until a host wires it). 
2026-05-25 (RFC 0057 — memory write-attribution) added five `memory-attribution-*.test.ts` scenarios: `-shape` (always-on advertisement check on `capabilities.memory.attribution`), plus `-no-content`, `-tenant-scoped`, `-emits-on-write`, and `-replay-stable` (gated on `capabilities.memory.attribution.emitsWriteEvents`) verifying the content-free `memory.written` RunEvent, its two SECURITY invariants (`memory-attribution-no-content` + `memory-attribution-tenant-scoped`), and the §D replay rule that a `replay`-mode fork MUST NOT regenerate `memoryId`. 2026-05-25 (RFC 0025 §C point 1 — test-catalog isolation invariant; pairs with the 25 publish-error scenarios in `pack-registry-publish.test.ts`) added `pack-registry-isolation.test.ts` — capability-gated on `capabilities.packs.testMode.{supported, isolated}: true`; PUTs a disposable pack into `/v1/packs-test/{name}` and asserts the same `(name, version)` does NOT appear via `GET /v1/packs/{name}` — anchors the test-catalog isolation MUST in RFC 0025 §C. 2026-05-25 (RFC 0028 Tier-2 post-promotion T2 — read-side sister scenario for workspace-membership enforcement) added `prompt-read-workspace-membership-enforced.test.ts` — gates on `capabilities.prompts.supported: true` (broader than `mutableLibrary` so read-only hosts that expose `?workspaceId=` are also probed); drives `GET /v1/prompts?workspaceId=<random-non-member>` and interprets the response: 4xx PASS (canonical envelope check on 403); 200 with empty `templates[]` PASS (correct null result for a nonexistent workspace); 200 with non-empty `templates[]` FAIL (cross-tenant leak); 200 without `templates[]` field SKIP (host doesn't expose workspace-scoped reads). Verifies SECURITY invariant `prompt-read-workspace-membership-enforced`. Same-day T1 strengthened `prompt-mutation-workspace-membership-enforced.test.ts` to pin `error === "workspace_membership_required"` when the host's refusal status is 403 (other refusal codes unconstrained). 
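The response interpretation described for `prompt-read-workspace-membership-enforced.test.ts` above reads naturally as a small decision function. The sketch below is illustrative only: `interpretWorkspaceRead` and the `Verdict` type are hypothetical names, and the `UNSPECIFIED` result covers statuses the entry does not mention rather than guessing at the scenario's behavior for them.

```typescript
type Verdict = "PASS" | "FAIL" | "SKIP" | "UNSPECIFIED";

// Maps a GET /v1/prompts?workspaceId=<random-non-member> response to a verdict.
function interpretWorkspaceRead(status: number, body: unknown): Verdict {
  // Any 4xx is a refusal and passes (the envelope check on 403 is omitted here).
  if (status >= 400 && status < 500) return "PASS";
  if (status === 200) {
    const templates = (body as { templates?: unknown } | null)?.templates;
    // No templates[] field: the host does not expose workspace-scoped reads.
    if (!Array.isArray(templates)) return "SKIP";
    // Empty list is a correct null result; a non-empty list is a cross-tenant leak.
    return templates.length === 0 ? "PASS" : "FAIL";
  }
  // Assumption: the changelog entry does not say how other statuses are treated.
  return "UNSPECIFIED";
}
```

For example, `interpretWorkspaceRead(200, { templates: [{ id: "t1" }] })` yields `"FAIL"`, the cross-tenant leak case.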
2026-05-25 (RFC 0028 Tier-2 follow-up — workspace-membership enforcement on mutating prompt endpoints, filed in response to a self-disclosed adopter vulnerability) added `prompt-mutation-workspace-membership-enforced.test.ts` — capability-gated on `capabilities.prompts.mutableLibrary: true`; drives `POST /v1/prompts` with a cryptographically-random non-member `workspaceId` and asserts the host refuses (NOT a 2xx; any 4xx/5xx is acceptable — silent success is the failure mode). Verifies SECURITY invariant `prompt-mutation-workspace-membership-enforced`. 2026-05-22 (RFC 0034 §B follow-up — secret-leakage harness against the OTel + debug-bundle seams) added `secret-leakage-otel-attribute.test.ts` — gates on `capabilities.secrets.supported` + `capabilities.observability.testSeams.{otelScrape,debugBundleExport}` AND the `OPENWOP_CANARY_SECRET_VALUE` env (host operator + conformance runner agree on the canary). Drives the existing `openwop-smoke-byok-roundtrip` fixture end-to-end; scrapes both seams after run completion; hard-fails if the canary plaintext appears in any OTel span attribute or debug-bundle field. Verifies SECURITY invariants `secret-leakage-otel-attribute` + `secret-leakage-debug-bundle-otel`. 2026-05-22 (RFC 0041 Phase 4 — replay determinism under nondeterministic models) added three scenarios: `replay-divergence-at-refusal.test.ts` (advertisement-shape probe on `replayDeterminism.refusalDivergenceEmission` + 2 `it.todo` for the dual-direction refusal-divergence case), `replay-observable-sequence-determinism.test.ts` (capability-gated; behavioral assertion soft-skipped until a `conformance-phase4-nondet-tool` fixture ships), `replay-llm-cache-key-portable.test.ts` (intra-host reproducibility + non-recipe-field invariance + Phase 4 advertisement alignment — reuses the existing `POST /v1/host/sample/test/llm-cache-key` seam from the sibling `replay-llm-cache-key.test.ts`). 
2026-05-20 (RFC 0027 §A templateKinds-coverage follow-up — paired with `prompt-end-to-end-events.test.ts`) added `prompt-all-four-kinds-events.test.ts` exercising all four `PromptKind` values (`system`, `user`, `schema-hint`, `few-shot`) end-to-end through the reference workflow-engine sample's `local.sample.demo.mock-ai` dispatch path; capability-gated via `behaviorGate('prompts-supported', ...)`. Closes the credibility gap where the host advertised `templateKinds: ["system", "user", "few-shot", "schema-hint"]` but only the system+user pair was actually wired into dispatch. 2026-05-20 (RFCs 0030–0033 — envelope LLM-contract-hardening track) added 15 scenarios across four `Active` RFCs: `envelope-reasoning-shape.test.ts` (RFC 0030, always-on; asserts the OPTIONAL `reasoning` property on the three universal-kind schemas + the `schema.response` deliberate omission), `envelope-reasoning-secret-redaction.test.ts` (RFC 0030, capability-gated on `capabilities.envelopes.reasoning.supported` + `secrets.supported`; 5 `it.todo()` placeholders for SECURITY invariant `envelope-reasoning-secret-redaction`), `envelope-tier-one-subset-static.test.ts` (RFC 0030, always-on for load-bearing rules — no `oneOf` / `allOf` / `not` / `prefixItems` / `propertyNames` anywhere; gated on `tierOneSubsetCompliance: "strict"` for OpenAI-strict-only constraints), `envelope-variant-discriminator-static.test.ts` (RFC 0031, always-on; asserts no `oneOf` + every `anyOf` branch declares a single-string-enum discriminator in `required` on every `schemas/envelopes/*.schema.json`), `model-capability-substituted.test.ts` (RFC 0031, advertisement-shape probe on `capabilities.modelCapabilities.advertised[]` identifier pattern + 5 `it.todo()` placeholders for SECURITY invariant `model-capability-substituted-no-credential-disclosure`), `model-capability-insufficient.test.ts` (RFC 0031, 6 `it.todo()` placeholders for refusal + no-recursive-fallback), `node-module-required-capabilities-shape.test.ts` (RFC 0031 
SHOULD-tier authoring-convention; 4 `it.todo()` placeholders), and the six envelope-reliability events from RFC 0032 (`envelope-retry-attempted` carrying the shared advertisement-shape probe enforcing both MUST-tier events in `events[]` per RFC 0032 §C, plus `envelope-retry-exhausted`, `envelope-refusal-shape`, `envelope-truncated`, `envelope-nl-to-format-engaged`, `envelope-recovery-applied` — collectively 39 `it.todo()` placeholders covering retry/refusal/truncation/recovery + SECURITY invariants `envelope-refusal-no-prompt-leak` and `envelope-recovery-no-content-leak`), plus RFC 0033's two scenarios (`envelope-completion-distinguishes-truncation.test.ts` + `envelope-truncation-cap-exhaustion.test.ts` — 12 `it.todo()` placeholders covering the truncation-vs-schema-violation retry-routing distinction + the DoS-bound assertion). Reference workflow-engine sample advertises `capabilities.envelopes.reasoning: { supported: true, promptDirective: "off" }` + `tierOneSubsetCompliance: "warn"` honestly (schemas accept the field; host doesn't yet inject the directive); the other three RFCs' capability blocks defer to reference-host emission code per the staged RFC 0027 §G precedent. 2026-05-20 (RFC 0028 §B Phase B — prompt-pack boot-time install) added `prompt-pack-install.test.ts` (capability-gated on `capabilities.prompts.endpointsSupported: true`; asserts a host that ran the boot-time pack loader surfaces ≥ 1 pack-source template under `GET /v1/prompts?source=pack` carrying the canonical `meta.source: "pack"` + `meta.packName` + `meta.packVersion` stamps; positively identifies the in-tree `vendor.openwop.prompt-sample` reference pack's `writer-system` template when present). Pairs with the new `host/promptPackLoader.ts` boot-time entry on the reference workflow-engine sample, which scans `examples/packs/*` plus `OPENWOP_PROMPT_PACKS_DIR` and calls `installPackTemplates()` for each `kind: "prompt"` pack found. 
2026-05-20 (RFC 0029 Phase C — prompt resolution chain wire shape) added three more scenarios: `prompt-resolution-chain-node-wins.test.ts` (capability-gated on `capabilities.prompts.supported: true`; asserts layer-1 node-config supersedes lower layers per `spec/v1/prompts.md` §"Resolution chain (normative)"), `prompt-resolution-chain-agent-intrinsic.test.ts` (additionally gated on `capabilities.prompts.agentBindings: true`; asserts agent intrinsic `systemPromptRef` wins over `promptOverrides` AND lower layers when the node has no layer-1 ref), and `prompt-resolution-chain-fallback-cascade.test.ts` (asserts layer 3 workflow-defaults wins over layer 4 host-defaults; layer 4 host-defaults wins when layers 1-3 yield null; `resolved` is null when all four yield null but `chain[]` still lists every attempted layer). The scenarios drive the host's `POST /v1/host/sample/prompt/resolve` test seam (reference-host implementation deferred to a follow-up slice per RFC 0021 staging precedent). 2026-05-20 (RFC 0027 Phase A — prompt templates wire shape) added three scenarios: `prompt-template-shape.test.ts` (always-on; Ajv compilability + positive/negative round-trip for PromptTemplate + PromptRef + PromptKind), `prompt-composed-secret-redaction.test.ts` (capability-gated on `capabilities.prompts.supported: true` + `observability: "full"`; asserts `[REDACTED:<secretId>]` markers in `prompt.composed` payloads for `source: "secret"` variable bindings per SECURITY/threat-model-secret-leakage.md §SR-1), and `prompt-composed-trust-marker.test.ts` (same capability gates; asserts `<UNTRUSTED>...</UNTRUSTED>` wrapping + `contentTrust: "untrusted"` propagation per RFC 0020 §D). Paired with the new `fixtures/prompt-templates/` sub-directory + per-fixture schema-validity describe block + future SECURITY invariants `prompt-composed-secret-redaction` and `prompt-composed-trust-marker` (these land alongside reference-host emission per RFC 0021 staging precedent). 
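
The fallback cascade that the resolution-chain scenarios assert can be pictured with a minimal sketch. Only the precedence order (layer 1 first, layer 4 last) and the rule that the chain lists every attempted layer come from the notes above. The layer labels, the `LayerAttempt` and `Resolution` types, and `resolvePrompt` are hypothetical illustrations, not the host seam's wire shape.

```typescript
// Hypothetical layer labels; the normative names live in spec/v1/prompts.md.
type Layer = "node-config" | "agent" | "workflow-defaults" | "host-defaults";

interface LayerAttempt {
  layer: Layer;
  ref: string | null; // null means the layer yielded nothing
}

interface Resolution {
  resolved: string | null;
  chain: LayerAttempt[];
}

const ORDER: Layer[] = ["node-config", "agent", "workflow-defaults", "host-defaults"];

// Walks layers 1..4 in precedence order; the first non-null ref wins.
// Assumption: layers below the winner are not attempted, so they are not recorded.
function resolvePrompt(candidates: Partial<Record<Layer, string | null>>): Resolution {
  const chain: LayerAttempt[] = [];
  for (const layer of ORDER) {
    const ref = candidates[layer] ?? null;
    chain.push({ layer, ref });
    if (ref !== null) return { resolved: ref, chain };
  }
  // All four layers yielded null: nothing resolved, but every attempt is still listed.
  return { resolved: null, chain };
}
```

Under these assumptions, a call with only workflow and host defaults set resolves to the workflow default, and a call with no candidates returns `resolved: null` with a four-entry chain.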
2026-05-18 (RFC 0022 `Draft` — runtime variable mapping) added four `it.todo()` placeholder scenarios covering the new mapping surfaces on `core.dispatch` (§A — `dispatch-input-mapping.test.ts`, `dispatch-output-mapping.test.ts`, `dispatch-cross-worker-handoff.test.ts`) and `core.subWorkflow` (§B — `subworkflow-input-mapping.test.ts`). Gated on `capabilities.agents.dispatchMapping` (dispatch trio) and `capabilities.subWorkflow.inputMapping` (subWorkflow). Promote to live assertions when RFC 0022 reaches `Active` + a reference host advertises the matching flags. 2026-05-17 (RFC 0003 §D handoff-schema enforcement, HV-1) added `agentPackHandoffSchemaValidation.test.ts` — verifies the host validates dispatch payloads against `handoff.taskSchemaRef` AND return payloads against `handoff.returnSchemaRef` per RFC 0003 §D. Paired with the new `agent-pack-handoff-schema-enforcement` row in `SECURITY/invariants.yaml`. 2026-05-17 (AI Envelope gap-closure, DRAFT v1.x — `spec/v1/ai-envelope.md`) added 7 advertisement-shape scenarios with `it.todo()` behavioral placeholders gated on `capabilities.envelopeContracts.advertised: true`: `aiEnvelope.universalKinds.test.ts`, `aiEnvelope.schemaDrift.test.ts`, `aiEnvelope.correlationReplay.test.ts`, `aiEnvelope.contractRefusal.test.ts`, `aiEnvelope.trustBoundaryPropagation.test.ts`, `aiEnvelope.redaction.test.ts`, `aiEnvelope.capBreached.test.ts`. Paired with the new `envelope-redaction-sr-1-carry-forward` row in `SECURITY/invariants.yaml`. 2026-05-17 (post-publish hardening, deep audit of `core.openwop.agents`) added `agents-run-tool-allowlist.test.ts` — server-free scenario locking in the `core.openwop.agents@1.0.1` safety-fix that closes `OPENWOP-AUDIT-2026-003` (function-typed `tool.handler` properties rejected at `validateTools()` with `INVALID_TOOL_DECLARATION`; tool-driven runs require `ctx.agentRuntime`; tool-less safe fallback preserved). Paired with the new `agents-run-no-raw-handler` row in `SECURITY/invariants.yaml`. 
Same-day post-publish hardening added `idempotency-key-determinism.test.ts` — server-free scenario locking in the `core.openwop.http@1.1.2` determinism safety-fix (default `composite` mode produces deterministic keys in `(runId, nodeId, payload)`; removed `uuid` mode rejects with `CONFIG_INVALID`; cross-impl vector test lets third-party reimplementations verify wire agreement). Paired with the new `idempotency-key-deterministic` row in `SECURITY/invariants.yaml`. 2026-05-17 (Phase 3 of RFC 0013) added three server-free scenarios exercising the reference workflow-chain expansion library (`conformance/src/lib/workflow-chain-expansion.ts`): `workflow-chain-expansion.test.ts` (parameter substitution + node id collision avoidance + edge rewriting + capability propagation + runtime-invariance contract), `workflow-chain-unresolvable-typeid.test.ts` (rejection with `chain_unresolvable_typeid` when a chain references an unknown typeId), and `workflow-chain-pack-signature-verification.test.ts` (Ed25519 verification recipe reuse from `node-packs.md §Signing`). Earlier that day (Phase 1) added `workflow-chain-pack-manifest-validation.test.ts` — server-free schema-validation scenario covering the new `workflow-chain-pack-manifest.schema.json` (positive sample + two negatives: kind/contents mismatch and invalid `chainId`). Closes RFC 0013 (`Workflow-chain packs`, `Draft`) Phases 1 + 3 alongside the new `spec/v1/workflow-chain-packs.md`, the `Capabilities.workflowChainPacks` block, and the registry build-index/conformance-check `kind` routing from Phase 2. Earlier that day, the suite added 27 `it.todo()` placeholder scenarios paired with RFCs 0014-0020 (host capability surfaces — fs, kvStorage, tableStorage, queueBus, sql/vector/search, blob/cache, mcp.serverMount). These promote to live assertions when each RFC reaches `Active` + the matching capability block lands in `schemas/capabilities.schema.json` + a reference host advertises the capability. 
Earlier additions include 18 Multi-Agent Shift scenarios (Phases 1-5) added 2026-05-10, the `registry-public.test.ts` public-registry healthcheck added 2026-05-11 (opt-in via `OPENWOP_TEST_PUBLIC_REGISTRY=true`), the `replay-llm-cache-key.test.ts` placeholder added 2026-05-11 (three `it.todo()` cases for the cross-host LLM cache-key recipe per `replay.md` §"LLM cache-key recipe"), the two `production-*.test.ts` scenarios added 2026-05-11 for the `openwop-production` profile per RFC 0009 (`production-backpressure.test.ts`, `production-retention-expiry.test.ts`), the four `auth-*.test.ts` scenarios added 2026-05-11/12 for the production-auth profiles per RFC 0010 (`auth-api-key-rotation.test.ts`, `auth-oauth2-client-credentials.test.ts`, `auth-oidc-user-bearer.test.ts`, `auth-mtls.test.ts` (opt-in via `OPENWOP_TEST_MTLS=1`)), `replay-retention-expiry.test.ts` added 2026-05-12 (capability shape + 410/422 envelope per `replay.md` §"Retention and garbage collection"), `bulk-cancel.test.ts` added 2026-05-12 (Phase B close-out of R1 — `POST /v1/runs:bulk-cancel`), the two Phase H launch-blocker advertisement-contract scenarios added 2026-05-12 (`mcp-toolcall-redaction.test.ts` for the MCP-1 invariant per `host-capabilities.md §host.mcp` + `threat-model-prompt-injection.md §UNTRUSTED`, and `http-client-ssrf.test.ts` for the SSRF + body-size cap advertisement contract on `capabilities.httpClient`), the `wasm-pack-abi-version-rejection.test.ts` Track 7 scenario added 2026-05-12 for the ABI-mismatch positive path via the `vendor.openwop.misbehaving-abi` pack per RFC 0008 §H, the `otel-trace-propagation-subworkflow.test.ts` Track 11 close-out added 2026-05-13 (parent + child run spans share the inbound traceparent's traceId across the `core.subWorkflow` dispatch boundary), and the three RFC 0012 (Memory Compaction Profile, `Active`) scenarios added 2026-05-13/14: `memory-compaction-sr1-carry-forward.test.ts` (load-bearing SR-1 §D), `memory-compaction-event-emitted.test.ts` 
(canonical §B payload shape), and `memory-compaction-provenance-tag.test.ts` (soft assertion on §C `compacted-from:<id>` convention). All three gate on `capabilities.memory.compaction.supported` + the host's test seam at `/v1/test/memory/{seed,compact}` (Postgres reference host enables both via `OPENWOP_MEMORY_COMPACTION=true OPENWOP_TEST_TRIGGER_COMPACTION=true`). 2026-05-15 (gap-closure CF-3) added `interrupt-token-matrix.test.ts` (malformed / unknown / replay / cross-run-id paths on `GET|POST /v1/interrupts/{token}`). 2026-05-31 (RFC 0078 portable tool catalog + RFC 0079 credential provenance / egress policy — the Active→Accepted behavioral gate) added four: `tool-catalog-projection.test.ts` (capability-gated on `toolCatalog.supported` via `behaviorGate('openwop-tool-catalog', …)` — the NORMATIVE `GET /v1/tools` list with each `ToolDescriptor` schema-valid + `source`/`safetyTier` in the closed vocab + content-free, `GET /v1/tools/{toolId}` round-trip + unknown-id 404, 401-unauthenticated, and the §F-2 cross-principal non-disclosure; black-box, no POST seam), `tool-session-lifecycle.test.ts` (gated on `toolCatalog.sessionLifecycle` — the §D `tool.session.opened`-before / `tool.session.closed`-after bracket over the RFC 0064 call events via the `POST /v1/host/sample/tools/session-run` seam, one shared `sessionId`, content-free), `egress-audience-binding.test.ts` (KEYSTONE — gated on `httpClient.egressPolicy.supported`; the §C confused-deputy MUST via `POST /v1/host/sample/egress/decide`: an out-of-audience egress is denied/downgraded with the credential NOT attached, a provenance-unevaluable egress fails closed — the behavioral leg of `egress-credential-audience-bound`), and `egress-decision-content-free.test.ts` (the SR-1 canary — the credential value never surfaces in `egress.decided` and `reason` stays in the CLOSED vocabulary). 
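
The confused-deputy rule behind `egress-audience-binding.test.ts` can be sketched as a small decision function. This is an illustration under stated assumptions, not the reference host's logic: the `credentialId` and `audiences` field names appear elsewhere in these notes, but the `"allow"`/`"deny"` labels, the exact-string audience match, and `decideEgress` itself are hypothetical (the real `decision` enum is a closed vocabulary defined by the spec).

```typescript
interface CredentialProvenance {
  credentialId: string;
  audiences: string[];
}

// Hypothetical decision labels, chosen only for this sketch.
type Decision = "allow" | "deny";

interface EgressDecision {
  decision: Decision;
  credentialAttached: boolean;
}

function decideEgress(
  provenance: CredentialProvenance | undefined,
  targetAudience: string
): EgressDecision {
  // Fail closed: provenance that cannot be evaluated never yields an attached credential.
  if (provenance === undefined || provenance.audiences.length === 0) {
    return { decision: "deny", credentialAttached: false };
  }
  // Assumption: audiences are compared as exact strings.
  const inAudience = provenance.audiences.indexOf(targetAudience) !== -1;
  return inAudience
    ? { decision: "allow", credentialAttached: true }
    : { decision: "deny", credentialAttached: false };
}
```

The property the scenario cares about is visible in the return values: whenever the decision is not an allow, `credentialAttached` is `false`.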
The maintained scenario-to-spec map lives in [`coverage.md`](./coverage.md); this README keeps the operator quickstart and the historical scenario notes below.
96
+ The current suite has 321 scenario files under `src/scenarios/`. 2026-06-01 (RFC 0080 — agent memory capability reconciliation, the Active→Accepted behavioral gate) added `memory-degraded-projection.test.ts` (capability-gated on `agents.manifestRuntime.supported` + `memory.supported` via `behaviorGate('openwop-memory-degraded', …)` — the §C degraded-projection iff-contract on the NORMATIVE `GET /v1/agents`: a degraded inventory entry MUST carry `memoryDegraded:true` + a non-empty, unique `degradedMemoryDimensions[]` from the closed §A-name enum, a non-degraded entry MUST NOT, the inventory is non-empty, and the degraded branch runs non-vacuously when `OPENWOP_DEGRADED_AGENT_ID` names a known-degraded agent; black-box, no POST seam — the RFC 0080 → Accepted bar). This batch also documents the two RFC 0068 conformance seams (`POST /v1/host/sample/memory/consolidate` + `.../commitment/fire`) in `host-sample-test-seams.md` (the 0068 gated scenarios shipped in 1.14.0). 2026-06-01 (RFC 0034 — collector-side BYOK-canary inspection) added `otel-collector-canary-inspection.test.ts` (always-on server-free: stands up a real `OtelCollector`, POSTs synthetic OTLP/HTTP-JSON traces + metrics through its actual ingest path, and proves the new `findCanaryLeakage()` inspector catches a canary embedded in a span attribute / resource attribute / span name / metric data-point attribute while reporting ZERO hits on a redacted payload and never matching an empty canary — the non-vacuous proof that the conformance collector now inspects what the host's OTLP exporter ACTUALLY shipped over the wire, closing the `secret-leakage-otel-attribute` / `-debug-bundle-otel` collector-seam gap; the live capability-gated complement is the new collector-export describe block in `secret-leakage-otel-attribute.test.ts`). 
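
The degraded-projection iff-contract described above can be expressed as a per-entry check. The sketch below is a hedged illustration, not the scenario's code: the field names `memoryDegraded` and `degradedMemoryDimensions` and the eight dimension names come from the release notes, while the `InventoryEntry` type and `degradedProjectionErrors` are hypothetical.

```typescript
// The closed dimension vocabulary as listed in the 1.15.0 changelog entry.
const MEMORY_DIMENSIONS: string[] = [
  "read", "write", "search", "long-term-durability",
  "compaction", "attribution", "replay-snapshot", "retention",
];

// Hypothetical minimal view of one GET /v1/agents inventory entry.
interface InventoryEntry {
  memoryDegraded?: boolean;
  degradedMemoryDimensions?: string[];
}

// Returns a list of contract violations; an empty list means the entry conforms.
function degradedProjectionErrors(entry: InventoryEntry): string[] {
  const errors: string[] = [];
  const dims = entry.degradedMemoryDimensions ?? [];
  if (entry.memoryDegraded === true) {
    if (dims.length === 0) errors.push("degraded entry needs a non-empty degradedMemoryDimensions[]");
    if (dims.some((d, i) => dims.indexOf(d) !== i)) errors.push("degradedMemoryDimensions[] must be unique");
    if (dims.some((d) => MEMORY_DIMENSIONS.indexOf(d) === -1)) errors.push("dimension outside the closed enum");
  } else if (dims.length > 0) {
    errors.push("non-degraded entry must not carry a non-empty degradedMemoryDimensions[]");
  }
  return errors;
}
```

A check like this only exercises the degraded branch when the inventory actually contains a degraded entry, which is why the scenario relies on `OPENWOP_DEGRADED_AGENT_ID` to name one.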
2026-06-01 (RFC 0035 — sandbox wall-clock timeout, the 7th-of-8 graduation) added `sandbox-wasm-timeout.test.ts` (worker-driven server-free: `probeTimeout` in `wasm-sandbox-probe.ts` spawns a worker thread running the committed `misbehaving-timeout.wasm` + a main-thread kill-timer — the thread preemption a same-thread probe can't do — asserting `sandbox_timeout` with a well-behaved positive control; graduates `node-pack-sandbox-timeout` reference-impl→protocol, so 7 of 8 `node-pack-sandbox-*` invariants are now protocol-tier, only the JS-specific `no-eval` permanently exempt). 2026-05-31 (audit-response black-box / graduation batch) added three more: `sandbox-wasm-isolation.test.ts` (RFC 0035 — drives the committed `fixtures/wasm-sandbox/*.wasm` through `wasm-sandbox-probe.ts`: escape/capability-gate via static `WebAssembly.Module.imports()`, an OOB-store memory trap, double-instantiate isolation; 10/10; graduates 6 `node-pack-sandbox-*` invariants reference-impl→protocol), `workspace-cross-tenant-isolation-blackbox.test.ts` (RFC 0059 — two-credential black-box on the normative §C `/v1/host/workspace/files` endpoints: owner A writes, a second-tenant credential fails closed; no seam), and `prompt-resolution-chain-event.test.ts` (RFC 0029 — reads the durable `agent.promptResolved.chain[]` precedence record via the normative `GET /v1/runs/{runId}/events/poll`; no seam) — each the production-path proof that graduates its surface into the `openwop-core-standard` floor. 2026-05-31 (RFC 0088 — the `openwop-core-standard` Core Standard Profile, the audit-response Core Candidate target) added `core-standard-profile.test.ts` (always-on server-free derivation probe: `isCoreStandard` derives the §B floor — `openwop-core` ∧ `openwop-interrupts` ∧ (`openwop-stream-sse` ∨ `openwop-stream-poll`) — a bare `openwop-core` host without interrupts is excluded, a host with no event transport fails, and the annex is absent from `deriveProfiles` because it composes rather than redefines). 
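
The §B floor quoted above is a plain boolean formula, so it can be sketched in a few lines. The function below is a hypothetical stand-in written for illustration (it is not the suite's `isCoreStandard`), and it assumes the host's derived profiles arrive as an array of profile-name strings.

```typescript
// Floor: openwop-core AND openwop-interrupts AND (openwop-stream-sse OR openwop-stream-poll).
function meetsCoreStandardFloor(profiles: string[]): boolean {
  const has = (name: string): boolean => profiles.indexOf(name) !== -1;
  return (
    has("openwop-core") &&
    has("openwop-interrupts") &&
    (has("openwop-stream-sse") || has("openwop-stream-poll"))
  );
}
```

Either event transport satisfies the last term, so a poll-only host and an SSE-only host both meet the floor, while a bare `openwop-core` host does not.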
2026-05-31 (RFC 0082 — agent deployment lifecycle, the Active→Accepted behavioral gate) added `agent-deployment-lifecycle.test.ts` (capability-gated on `agents.deployment.supported` via `behaviorGate('openwop-deployment-lifecycle', …)` — the §E promotion contract via the new `POST /v1/host/sample/agents/deployment-transition` seam + the test event-log seam across four legs: `promote` (authorize RFC 0049 → approvalGate RFC 0051 → eval-verify RFC 0081 → content-free `deployment.promoted` with a seven-state `toState` + `toVersion`, the record validating `agent-deployment.schema.json`), `unauthorized` (fail-closed — `allowed:false`, no `deployment.promoted`, the behavioral leg of `deployment-promotion-fail-closed`), `eval-gate-unmet` (`eval_gate_unmet` denial, §E-3), and `channel-pin` (the §B `resolvedAgentVersion` recorded-fact on `agent.invocation.started`); new lib helper `src/lib/agentDeployment.ts`; soft-skips on 404 — the RFC 0082 → Accepted bar). 2026-05-31 (RFC 0081 — agent evaluation, the Active→Accepted behavioral gate) added `agent-eval-run.test.ts` (capability-gated on `agents.evalSuite.supported` via `behaviorGate('openwop-eval-run', …)` — the §B `mode:"eval"` projection via the new `POST /v1/host/sample/agents/eval-run` seam + the test event-log seam: `eval.started`-first → one `eval.scored` per task → `eval.completed`-once ordering (count == `eval.completed.taskCount`), the content-free `eval.scored` legs (`score` ∈ 0..1) backing `eval-summary-no-content-leak`, and the NORMATIVE `GET /v1/runs/{runId}/eval-summary` schema-valid `EvalSummary` round-trip with `passedCount <= taskCount`; new lib helper `src/lib/agentEval.ts`; soft-skips on 404 — the RFC 0081 → Accepted bar). 
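
The `eval.*` ordering that `agent-eval-run.test.ts` asserts can be illustrated with a small checker. This is a sketch under assumptions: the event names, `score`, and `taskCount` come from the notes above, but the flattened `EvalEvent` shape, the requirement that `eval.completed` closes the sequence, and `evalOrderingErrors` are hypothetical. The input is assumed to be already filtered to one run's `eval.*` events.

```typescript
// Hypothetical flattened event shape; real payloads are defined by the spec's schemas.
interface EvalEvent {
  type: string;
  score?: number;     // present on eval.scored
  taskCount?: number; // present on eval.completed
}

function evalOrderingErrors(events: EvalEvent[]): string[] {
  const errors: string[] = [];
  const types = events.map((e) => e.type);
  if (types[0] !== "eval.started") errors.push("eval.started must come first");

  const completedAt = types.indexOf("eval.completed");
  if (completedAt === -1 || completedAt !== types.lastIndexOf("eval.completed")) {
    errors.push("eval.completed must appear exactly once");
  } else if (completedAt !== types.length - 1) {
    // Assumption: completed closes the sequence, after every eval.scored.
    errors.push("eval.completed must come last");
  }

  const scored = events.filter((e) => e.type === "eval.scored");
  for (const e of scored) {
    if (typeof e.score !== "number" || e.score < 0 || e.score > 1) {
      errors.push("eval.scored score must be within 0..1");
    }
  }

  const completed = completedAt === -1 ? undefined : events[completedAt];
  if (completed !== undefined && completed.taskCount !== scored.length) {
    errors.push("eval.scored count must equal eval.completed.taskCount");
  }
  return errors;
}
```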
2026-05-31 (RFC 0083 — durable trigger bridge, the Active→Accepted behavioral gate) added `trigger-bridge-delivery.test.ts` (profile-gated on `openwop-trigger-bridge` derived from the live discovery doc — the §C delivery model via the `POST /v1/host/sample/trigger-bridge/deliver` seam + the test event-log seam: dedup→effectively-once `trigger.delivery.attempted{delivered}` (§C-1), retry-exhaustion→`{dead-lettered}` + `trigger.subscription.state.changed{toState:dead-lettered}` (§C-2 + RFC 0053), and the delivered run's `run.started.causationId` == the delivery id (§C / RFC 0040); both `trigger.*` events content-free; the always-on shape stays in `trigger-bridge-shape.test.ts`; new lib helper `src/lib/triggerBridge.ts`). 2026-05-31 (RFC 0087 — agent org-chart, the Active→Accepted behavioral gate) added two capability-gated behavioral scenarios (both gated on `agents.orgChart.supported`, black-box on the normative `/v1/agents/org-chart` surface — no new POST seam): `agent-org-chart-scoping.test.ts` (the `GET /v1/agents/org-chart` tree-shape — departments form an acyclic `parentDepartmentId` tree, members reference `host:<id>` roster entries — + the §D responsibility roll-up via `GET /v1/agents/org-chart/{departmentId}` with a deduped `responsibilities[]` union + the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ORG_CHART_DEPARTMENT_ID`) and `org-position-no-authority-escalation.test.ts` (the behavioral leg of the protocol-tier invariant — the live org-chart wire carries NO authority-bearing field on any member/department/responsibility-view object; the structural leg stays always-on in `agent-org-chart-shape.test.ts`, and the deeper RFC 0049/0051 authority-invariance legs stay reference-impl tier per the `agent-manifest-runtime` no-host-hook precedent). 
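
The tree-shape leg of `agent-org-chart-scoping.test.ts` (departments form an acyclic `parentDepartmentId` tree) can be sketched as a parent-pointer walk. Only `parentDepartmentId` is taken from the notes above; the `departmentId` key on a department object, the treatment of a dangling parent reference as a failure, and `isAcyclicDepartmentTree` are assumptions made for illustration.

```typescript
// Hypothetical minimal department shape.
interface Department {
  departmentId: string;
  parentDepartmentId?: string | null; // null or absent marks a root
}

function isAcyclicDepartmentTree(departments: Department[]): boolean {
  const parentOf = new Map<string, string | null>();
  for (const d of departments) {
    parentOf.set(d.departmentId, d.parentDepartmentId ?? null);
  }
  for (const d of departments) {
    const seen = new Set<string>();
    let current: string | null = d.departmentId;
    // Follow parent pointers up to a root; revisiting a node means a cycle.
    while (current !== null) {
      if (seen.has(current)) return false;
      seen.add(current);
      const next = parentOf.get(current);
      if (next === undefined) return false; // assumption: a dangling parent is invalid
      current = next;
    }
  }
  return true;
}
```

The walk is quadratic in the worst case, which is fine for a conformance check over a small org chart.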
2026-05-31 (RFCs 0086 + 0077 — the Active→Accepted behavioral gate) added four capability-gated behavioral scenarios so a non-steward host can be mechanically certified non-vacuously under `OPENWOP_REQUIRE_BEHAVIOR=true`: `agent-roster-attribution.test.ts` (RFC 0086 §B/§C; gated on `agents.roster.supported` — the normative `GET /v1/agents/roster` read shape + `total==roster.length`, the §C `roster.run.initiated`-before-`agent.invocation.started` ordering, the content-free payload backing `roster-attribution-no-content`, the durable work-item `triggerSubscriptionId`, and the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ROSTER_ID`), `agent-live-invocation-bracket.test.ts` (RFC 0077 §E; gated on `agents.liveRuntime.supported` — `agent.invocation.started`-first / `agent.invocation.completed`-last bracket, matching `invocationId`, `source`/`outcome` closed enums, content-free), `agent-live-structured-output.test.ts` (RFC 0077 §B step 6; gated on `agents.liveRuntime.structuredOutput` — a result violating `handoff.returnSchemaRef` fails the invocation `outcome:"failed"` rather than shipping as completed), and `agent-live-allowlist-enforced.test.ts` (RFC 0077 §F-1 / RFC 0002 §A14; gated on `agents.liveRuntime.supported` — a tool outside `toolAllowlist` is not callable); all four drive the documented `POST /v1/host/sample/roster/fire` + `POST /v1/host/sample/agents/live-invoke` seams plus the test event-log seam and soft-skip on 404 (these are the RFC 0086 / 0077 Active→Accepted bars). 
2026-05-30 (RFC 0087 — agent org-chart, Draft -> Active) added `agent-org-chart-shape.test.ts` (always-on server-free: the `capabilities.agents.orgChart` shape + the `AgentOrgChart` round-trip + the non-`host:` member negative + the **§B structural non-authority guarantee** — the schema rejects a `scopes`/`canDispatch`/`permissions`/`authority` field on a member (`additionalProperties:false`), and a member's key set is exactly `{rosterId, departmentId, roleId, reportsTo}` — backing the protocol-tier `org-position-no-authority-escalation` invariant; no new RunEventType). 2026-05-30 (RFC 0086 — standing agent roster, Draft -> Active) added `agent-roster-shape.test.ts` (always-on server-free: the `capabilities.agents.roster` shape + the `AgentRosterEntry` round-trip + the `host:` `rosterId` + `agentRef` version-XOR-channel negatives + the content-free `roster.run.initiated` negatives backing the protocol-tier `roster-attribution-no-content` invariant + the additive `roster` inventory projection + RunEventType-enum membership). 2026-05-30 (RFC 0082 — agent deployment lifecycle, Draft -> Active) added `agent-deployment-shape.test.ts` (always-on server-free: the `capabilities.agents.deployment` shape + the `AgentDeployment` record round-trip + the `AgentRef` `channel` XOR `version` `not`-clause + the four `deployment.*` payloads + the content-free negatives backing the protocol-tier `deployment-event-no-content-leak` invariant). 2026-05-30 (RFC 0085 — `openwop-agent-platform` meta-profile, Draft -> Active) added `agent-platform-profile.test.ts` (always-on server-free derivation of the operational-annex `none`/`partial`/`full` status: all-floor ⇒ partial, missing-flag ⇒ none, the replay-OR-`nondeterminismPolicy.declared` term, floor+governance ⇒ full, missing-tenant-scope ⇒ partial-not-full per the honest-advertisement rule, eval/deploy/budget-are-advisory-not-hard-terms, + the `capabilities.nondeterminismPolicy.declared` shape). 
2026-05-30 (RFC 0084 — budget, quota + cost policy, Draft -> Active) added `budget-policy-shape.test.ts` (always-on server-free: `budget-policy.schema.json` round-trip + the §A orthogonality guard — a wall-time field is rejected (it's RFC 0058's `runTimeoutMs`) — + threshold/onExhaustion negatives + the four content-free `budget.{reserved,consumed,threshold.crossed,exhausted}` payloads + the four `cap.breached{budget-*}` kinds + RunEventType-enum membership + the no-pricing-property structural check backing the protocol-tier `budget-no-pricing-leak` invariant + the `capabilities.budget`/`limits.maxBudget*` shape). 2026-05-30 (RFC 0083 — durable trigger + channel bridge, Draft -> Active) added `trigger-bridge-shape.test.ts` (always-on server-free: `trigger-subscription.schema.json` round-trip + missing-`state`/out-of-enum-`source`/unknown-property negatives + the four-state vocab + the two content-free `trigger.{subscription.state.changed,delivery.attempted}` payloads incl. closed `state`/`outcome` enums + RunEventType-enum membership + the `triggerBridge`/`webhooks.durable` capability shape + the `openwop-trigger-bridge` profile derivation incl. the no-dead-letter-sink negative). 2026-05-30 (RFC 0079 — credential provenance + egress policy, Draft -> Active) added `egress-provenance-shape.test.ts` (always-on server-free: `credential-provenance.schema.json` round-trip + `audiences:[]`/missing-`credentialId`/unknown-property negatives + the no-secret-property structural check backing the protocol-tier `egress-decision-no-secret-leak` invariant + the content-free `egress.decided` record incl. the `decision` enum + RunEventType-enum membership + the `httpClient.egressPolicy` shape; the behavioral `egress-credential-audience-bound` confused-deputy MUST is reference-impl tier, deferred to a host). 
2026-05-30 (RFC 0078 — portable tool catalog, Draft -> Active) added `tool-descriptor-shape.test.ts` (always-on server-free: `tool-descriptor.schema.json` round-trip + the §C-1 `exec` ⇒ `host-extension` cross-field MUST (RFC 0069) + the `safetyTier`-required negative + `additionalProperties:false`, the `capabilities.toolCatalog` `supported`/`sources`/`sessionLifecycle` shape, and the two content-free `tool.session.{opened,closed}` payload $defs incl. the closed `outcome` enum + RunEventType-enum membership). 2026-05-30 (RFC 0080 — agent memory capability reconciliation, Draft -> Active) added `memory-capability-model-shape.test.ts` (always-on server-free: the additive `capabilities.memory.{writable,search,retention}` dimension shapes + malformed-instance negatives — `retention.ttl` non-boolean, out-of-enum `search.modes`, unknown property under `additionalProperties:false` — the `agent-inventory-response` `memoryDegraded`/`degradedMemoryDimensions` closed-enum fields, and the `openwop-memory` derivation surfacing for read/write + long-term hosts while withholding from `writable:false`). 2026-05-30 (RFC 0081 — agent evaluation, Draft -> Active) added `agent-eval-suite-shape.test.ts` (always-on server-free: the `capabilities.agents.evalSuite` shape + the `AgentEvalSuite`/`EvalSummary` schema round-trips + the three `eval.{started,scored,completed}` payloads + the content-free negatives — a task entry with a `taskOutput` body, a `safetyFinding` with an `excerpt` — backing the new `eval-summary-no-content-leak` SECURITY invariant). 
2026-05-29 (RFC 0076 §B — `ctx.http.safeFetch` live-run audit) added `safefetch-live-audit.test.ts` (`behaviorGate('openwop-safefetch-live-audit', …)`, gated on `httpClient.safeFetch` + `toolHooks.prePostEvents`) — asserts the audit-when-both MUST against the **durable run event log** via the new `POST /v1/host/sample/http/safe-fetch-run` open seam + the test event-log seam, closing the seam-vs-production gap (a production `createSafeFetch()` with no audit hooks passes the inline `safefetch-behavior.test.ts` but FAILS this under `OPENWOP_REQUIRE_BEHAVIOR=true`); this is the RFC 0076 §B → Accepted bar; run seam soft-skips on 404 (host-pending). 2026-05-29 (RFC 0066 — `x-openwop-form` picker UX hints, Draft → Active) added `x-openwop-form-pack-manifest.test.ts` (always-on server-free: an annotated `configSchema` stays a valid 2020-12 schema + the advisory hints don't change what it accepts, each §A annotation matches the shape, an unknown `kind` validates for forward-compat, 3 negatives — missing/non-string `kind`, non-string `dependsOn`). 2026-05-29 (RFC 0076 §B — `ctx.http.safeFetch`) added `safefetch-behavior.test.ts` (seam-gated: SSRF block / DNS-rebinding / `Connection: upgrade` refusal / tool-hooks audit-when-both, via `POST /v1/host/sample/http/safe-fetch`; advertisement contract stays in `http-client-ssrf.test.ts`). 2026-05-29 (RFC 0076 §A — pack `runtime.requires[]` install gate) added two: `runtime-requires-shape.test.ts` (server-free closed-vocabulary validation — the 8 tokens validate, a raw builtin name is rejected, empty-array≡omission, `uniqueItems`) + `runtime-requires-install-gate.test.ts` (seam-gated install-grant / install-refuse → `pack_runtime_requirement_unmet` / non-sandbox SHOULD-projection, soft-skip on 404 via `POST /v1/host/sample/packs/install-gate`). 
2026-05-29 (RFC 0047 — `host.oauth` authorization-code roundtrip) added `oauth-authorization-code-roundtrip.test.ts` — capability-gated on `capabilities.oauth.supported` + `grants` including `authorization_code`; drives the `POST /v1/host/sample/oauth/authorize-code-roundtrip` seam against the one canonical synthetic provider in `fixtures/oauth-providers/synthetic.json` (soft-skip on 404, Tier-2 host-pending), asserting a successful grant returns a credential REFERENCE (token persisted as a `host.credentials` entry) and that the authorization code / state / PKCE verifier / acquired access+refresh tokens never appear on any run-visible surface (RFC 0047 §C + §C.2 / `credential-payload-redaction`). Closes the RFC 0047 Tier-2 gap (capability-shape + redaction scenarios existed; the actual authorization-code dance was unexercised). 2026-05-26 (RFC 0070 — agent-manifest runtime) added `agent-manifest-runtime.test.ts`; 2026-05-26 (RFC 0071 — artifact-type + chat card packs) added six: `artifact-type-pack-manifest-validation.test.ts` + `artifact-schema-compile-bounded.test.ts` (server-free) + `artifact-type-pack-install.test.ts` + `artifact-type-store-without-render.test.ts` + `chat-card-pack-manifest-validation.test.ts` (server-free) + `chat-card-pack-execution.test.ts` (capability-gated, host-pending). 
2026-05-26 (RFCs 0067 / 0068 / 0069 — spec-gap Draft cohort) added five scenarios: `byok-auth-modes.test.ts` (RFC 0067; always-on schema-shape of `aiProviders.authModes` + a discovery-gated §B auth-mode-contract cross-field check), `memory-consolidation-shape.test.ts` (RFC 0068; always-on shape of `agents.memoryConsolidation`/`agents.commitments` + the `agent.memory.consolidated`/`commitment.fired` payload $defs), `memory-consolidation-idempotent.test.ts` + `commitment-fired.test.ts` (RFC 0068; capability-gated behavioral, soft-skip on the documented `/v1/host/sample/memory/consolidate` + `/commitment/fire` seams), and `exec-not-protocol-tier.test.ts` (RFC 0069; always-on server-free structural assertion that the protocol corpus defines no `core.*`/`openwop.*` exec-class primitive — backs the `exec-must-not-be-protocol-tier` SECURITY invariant). 2026-05-25 (RFC 0061 — stateful agent-loop lifecycle, executionModel.version 5) added four `agent-loop-*.test.ts` scenarios: `-version5-shape` (always-on; validates `executionModel.statefulResume`/`transcriptWindow` + the 1–5 version ceiling) plus `-iteration-monotonic` (gated on `version >= 5`; `runOrchestrator.decided.iteration` increments 1,2,3… exactly once per turn), `-workspace-snapshot` (gated additionally on `host.workspace.supported`; a turn-i workspace write is invisible to turn i, visible to turn i+1), and `-stateful-resume` (gated on `statefulResume`; a mid-loop suspend resumes at the same iteration without resetting the counter) — the three behavioral scenarios drive the documented agent-loop seam (`POST /v1/host/sample/agentloop/run`) and soft-skip until a host wires it. 
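
The `-iteration-monotonic` rule (the iteration value increments 1, 2, 3 and so on, exactly once per turn) reduces to a one-line sequence check. The sketch assumes the `iteration` values have already been extracted from the run's `runOrchestrator.decided` events in emission order; `isMonotonicFromOne` is a hypothetical helper.

```typescript
// True when the sequence is exactly 1, 2, 3, ... with no gaps, repeats, or resets.
function isMonotonicFromOne(iterations: number[]): boolean {
  return iterations.every((value, index) => value === index + 1);
}
```

A counter that restarts after a suspend (for example `1, 2, 1, 2`) fails the same check, which is the failure mode the `-stateful-resume` scenario is described as guarding against.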
2026-05-25 (RFC 0059 — host.workspace M2, reference-host enforcement) added two `workspace-*.test.ts` scenarios: `-behavior` (capability-gated CRUD round-trip / `If-Match` 409 `workspace_conflict` / `workspace_too_large` / §D run-start snapshot, all via the real `/v1/host/workspace/files` §C endpoints) and `-cross-tenant-isolation` (WCT-1 — drives the documented `POST /v1/host/sample/workspace/op` seam to assert a file owned by one `{tenant, workspace}` is unreadable, on both `get` and `list`, under a different owner; backs the new `workspace-cross-tenant-isolation` SECURITY invariant). The in-memory reference host now advertises `capabilities.workspace.supported` and honors §C/§D/§E end-to-end. 2026-05-25 (RFC 0062 — memory.distillation "dreams") added five `distillation-*.test.ts` scenarios: `-shape` (always-on; validates the `capabilities.memory.distillation` block + the additive `distillation` sub-object on `memory.compacted`) plus `-token-budget` (within budget `tokensUsed ≤ tokenBudget`; an un-meetable budget → `token_budget_exceeded` with no partial archive), `-stable-archive` (same sources + budget ⇒ byte-stable archive checksum), `-index-roundtrip` (gated additionally on `indexEmitted`; the `MEMORY-INDEX.json` workspace file is retrievable + `workspace.updated` fired), and `-secret-carryforward` (SR-1: a redacted source secret never appears in the archive) — the four behavioral scenarios drive the documented memory-distillation seam (`POST /v1/host/sample/memory/distill`) and soft-skip until a host wires it. 
2026-05-25 (RFC 0063 — core.subWorkflow.outputAttestation) added four `subrun-*.test.ts` scenarios: `-attestation-shape` (always-on; validates the `capabilities.agents.subRunAttestation` flag) plus `-checksum-stable` (the child output checksum is the byte-stable, key-order-invariant RFC 8785 JCS + SHA-256 digest), `-approval-gate` (`requireApproval` → `accept` merges, `reject` does not), and `-approval-fail-closed` (no `accept`/`edit-accept` → no merge; backs the deferred `subrun-merge-approval-fail-closed` invariant) — the three behavioral scenarios drive the documented sub-run attestation seam (`POST /v1/host/sample/subrun/attest`) and soft-skip until a host wires it. 2026-05-25 (RFC 0064 — host.toolHooks) added five `tool-hooks-*.test.ts` scenarios: `-shape` (always-on; validates the `capabilities.toolHooks` block + the optional content-free fields on `agentToolCalled` / `agentToolReturned`) plus `-content-free` (gated on `prePostEvents`), `-authorization-fail-closed` (gated on `perToolAuthorization`), `-rate-limit` (gated on `perToolRateLimit`), and `-secret-redaction` (gated on `prePostEvents` + the SR-1 `argsHash` redaction rule) — the four behavioral scenarios drive the documented tool-hooks invoke seam (`POST /v1/host/sample/toolhooks/invoke`) and soft-skip until a host wires it. 2026-05-25 (RFC 0060 — host.heartbeat) added four `heartbeat-*.test.ts` scenarios: `-capability-shape` (always-on; validates the `capabilities.heartbeat` block) plus `-fires-once-per-tick`, `-idempotent-no-spam`, and `-runtime-bound` (gated on `capabilities.heartbeat.supported` + the host heartbeat tick seam; soft-skip until a host wires it). 
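
The key-order invariance that `-checksum-stable` asserts follows from canonicalizing before hashing. The sketch below is a simplified illustration, not a complete RFC 8785 implementation: it sorts object keys and delegates string and number serialization to `JSON.stringify`. The `Json` type, `canonicalize`, `outputChecksum`, and the hex digest encoding are assumptions.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Simplified canonical form: sorted object keys, no insignificant whitespace.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return "[" + value.map((v) => canonicalize(v)).join(",") + "]";
  const obj: { [key: string]: Json } = value;
  const keys = Object.keys(obj).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k])).join(",") + "}";
}

// SHA-256 over the canonical bytes; hex output is an assumption of this sketch.
function outputChecksum(value: Json): string {
  return createHash("sha256").update(canonicalize(value), "utf8").digest("hex");
}
```

Two payloads that differ only in key order produce the same canonical string and therefore the same digest, while array order remains significant.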
2026-05-25 (RFC 0057 — memory write-attribution) added five `memory-attribution-*.test.ts` scenarios: `-shape` (always-on advertisement check on `capabilities.memory.attribution`), plus `-no-content`, `-tenant-scoped`, `-emits-on-write`, and `-replay-stable` (gated on `capabilities.memory.attribution.emitsWriteEvents`) verifying the content-free `memory.written` RunEvent, its two SECURITY invariants (`memory-attribution-no-content` + `memory-attribution-tenant-scoped`), and the §D replay rule that a `replay`-mode fork MUST NOT regenerate `memoryId`. 2026-05-25 (RFC 0025 §C point 1 — test-catalog isolation invariant; pairs with the 25 publish-error scenarios in `pack-registry-publish.test.ts`) added `pack-registry-isolation.test.ts` — capability-gated on `capabilities.packs.testMode.{supported, isolated}: true`; PUTs a disposable pack into `/v1/packs-test/{name}` and asserts the same `(name, version)` does NOT appear via `GET /v1/packs/{name}` — anchors the test-catalog isolation MUST in RFC 0025 §C. 2026-05-25 (RFC 0028 Tier-2 post-promotion T2 — read-side sister scenario for workspace-membership enforcement) added `prompt-read-workspace-membership-enforced.test.ts` — gates on `capabilities.prompts.supported: true` (broader than `mutableLibrary` so read-only hosts that expose `?workspaceId=` are also probed); drives `GET /v1/prompts?workspaceId=<random-non-member>` and interprets the response: 4xx PASS (canonical envelope check on 403); 200 with empty `templates[]` PASS (correct null result for a nonexistent workspace); 200 with non-empty `templates[]` FAIL (cross-tenant leak); 200 without `templates[]` field SKIP (host doesn't expose workspace-scoped reads). Verifies SECURITY invariant `prompt-read-workspace-membership-enforced`. Same-day T1 strengthened `prompt-mutation-workspace-membership-enforced.test.ts` to pin `error === "workspace_membership_required"` when the host's refusal status is 403 (other refusal codes unconstrained). 
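
The four-way interpretation that `prompt-read-workspace-membership-enforced.test.ts` applies to `GET /v1/prompts?workspaceId=<random-non-member>` can be sketched as a small classifier. This is illustrative only: `ProbeResponse`, `Verdict`, and `classifyWorkspaceRead` are hypothetical names, the canonical-envelope check on 403 is not modeled, and the `"unspecified"` branch is an assumption for statuses the note above does not cover.

```typescript
type Verdict = "pass" | "fail" | "skip" | "unspecified";

interface ProbeResponse {
  status: number;
  body: unknown;
}

function classifyWorkspaceRead(res: ProbeResponse): Verdict {
  // Any 4xx refusal is a pass.
  if (res.status >= 400 && res.status < 500) return "pass";
  // Assumption: statuses other than 4xx and 200 are not described in the source.
  if (res.status !== 200) return "unspecified";

  const body = res.body;
  // 200 without a templates field: the host does not expose workspace-scoped reads.
  if (typeof body !== "object" || body === null || !("templates" in body)) return "skip";

  const templates = (body as { templates: unknown }).templates;
  // Assumption: a non-array templates value is treated like a missing field.
  if (!Array.isArray(templates)) return "skip";

  // Empty list is the correct null result; a non-empty list is a cross-tenant leak.
  return templates.length === 0 ? "pass" : "fail";
}

console.log(classifyWorkspaceRead({ status: 200, body: { templates: [{ id: "t1" }] } })); // "fail"
```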
2026-05-25 (RFC 0028 Tier-2 follow-up — workspace-membership enforcement on mutating prompt endpoints, filed in response to a self-disclosed adopter vulnerability) added `prompt-mutation-workspace-membership-enforced.test.ts` — capability-gated on `capabilities.prompts.mutableLibrary: true`; drives `POST /v1/prompts` with a cryptographically-random non-member `workspaceId` and asserts the host refuses (NOT a 2xx; any 4xx/5xx is acceptable — silent success is the failure mode). Verifies SECURITY invariant `prompt-mutation-workspace-membership-enforced`. 2026-05-22 (RFC 0034 §B follow-up — secret-leakage harness against the OTel + debug-bundle seams) added `secret-leakage-otel-attribute.test.ts` — gates on `capabilities.secrets.supported` + `capabilities.observability.testSeams.{otelScrape,debugBundleExport}` AND the `OPENWOP_CANARY_SECRET_VALUE` env (host operator + conformance runner agree on the canary). Drives the existing `openwop-smoke-byok-roundtrip` fixture end-to-end; scrapes both seams after run completion; hard-fails if the canary plaintext appears in any OTel span attribute or debug-bundle field. Verifies SECURITY invariants `secret-leakage-otel-attribute` + `secret-leakage-debug-bundle-otel`. 2026-05-22 (RFC 0041 Phase 4 — replay determinism under nondeterministic models) added three scenarios: `replay-divergence-at-refusal.test.ts` (advertisement-shape probe on `replayDeterminism.refusalDivergenceEmission` + 2 `it.todo` for the dual-direction refusal-divergence case), `replay-observable-sequence-determinism.test.ts` (capability-gated; behavioral assertion soft-skipped until a `conformance-phase4-nondet-tool` fixture ships), `replay-llm-cache-key-portable.test.ts` (intra-host reproducibility + non-recipe-field invariance + Phase 4 advertisement alignment — reuses the existing `POST /v1/host/sample/test/llm-cache-key` seam from the sibling `replay-llm-cache-key.test.ts`). 
2026-05-20 (RFC 0027 §A templateKinds-coverage follow-up — paired with `prompt-end-to-end-events.test.ts`) added `prompt-all-four-kinds-events.test.ts` exercising all four `PromptKind` values (`system`, `user`, `schema-hint`, `few-shot`) end-to-end through the reference workflow-engine sample's `local.sample.demo.mock-ai` dispatch path; capability-gated via `behaviorGate('prompts-supported', ...)`. Closes the credibility gap where the host advertised `templateKinds: ["system", "user", "few-shot", "schema-hint"]` but only the system+user pair was actually wired into dispatch. 2026-05-20 (RFCs 0030–0033 — envelope LLM-contract-hardening track) added 15 scenarios across four `Active` RFCs: `envelope-reasoning-shape.test.ts` (RFC 0030, always-on; asserts the OPTIONAL `reasoning` property on the three universal-kind schemas + the `schema.response` deliberate omission), `envelope-reasoning-secret-redaction.test.ts` (RFC 0030, capability-gated on `capabilities.envelopes.reasoning.supported` + `secrets.supported`; 5 `it.todo()` placeholders for SECURITY invariant `envelope-reasoning-secret-redaction`), `envelope-tier-one-subset-static.test.ts` (RFC 0030, always-on for load-bearing rules — no `oneOf` / `allOf` / `not` / `prefixItems` / `propertyNames` anywhere; gated on `tierOneSubsetCompliance: "strict"` for OpenAI-strict-only constraints), `envelope-variant-discriminator-static.test.ts` (RFC 0031, always-on; asserts no `oneOf` + every `anyOf` branch declares a single-string-enum discriminator in `required` on every `schemas/envelopes/*.schema.json`), `model-capability-substituted.test.ts` (RFC 0031, advertisement-shape probe on `capabilities.modelCapabilities.advertised[]` identifier pattern + 5 `it.todo()` placeholders for SECURITY invariant `model-capability-substituted-no-credential-disclosure`), `model-capability-insufficient.test.ts` (RFC 0031, 6 `it.todo()` placeholders for refusal + no-recursive-fallback), `node-module-required-capabilities-shape.test.ts` (RFC 0031 
SHOULD-tier authoring-convention; 4 `it.todo()` placeholders), and the six envelope-reliability events from RFC 0032 (`envelope-retry-attempted` carrying the shared advertisement-shape probe enforcing both MUST-tier events in `events[]` per RFC 0032 §C, plus `envelope-retry-exhausted`, `envelope-refusal-shape`, `envelope-truncated`, `envelope-nl-to-format-engaged`, `envelope-recovery-applied` — collectively 39 `it.todo()` placeholders covering retry/refusal/truncation/recovery + SECURITY invariants `envelope-refusal-no-prompt-leak` and `envelope-recovery-no-content-leak`), plus RFC 0033's two scenarios (`envelope-completion-distinguishes-truncation.test.ts` + `envelope-truncation-cap-exhaustion.test.ts` — 12 `it.todo()` placeholders covering the truncation-vs-schema-violation retry-routing distinction + the DoS-bound assertion). Reference workflow-engine sample advertises `capabilities.envelopes.reasoning: { supported: true, promptDirective: "off" }` + `tierOneSubsetCompliance: "warn"` honestly (schemas accept the field; host doesn't yet inject the directive); the other three RFCs' capability blocks defer to reference-host emission code per the staged RFC 0027 §G precedent. 2026-05-20 (RFC 0028 §B Phase B — prompt-pack boot-time install) added `prompt-pack-install.test.ts` (capability-gated on `capabilities.prompts.endpointsSupported: true`; asserts a host that ran the boot-time pack loader surfaces ≥ 1 pack-source template under `GET /v1/prompts?source=pack` carrying the canonical `meta.source: "pack"` + `meta.packName` + `meta.packVersion` stamps; positively identifies the in-tree `vendor.openwop.prompt-sample` reference pack's `writer-system` template when present). Pairs with the new `host/promptPackLoader.ts` boot-time entry on the reference workflow-engine sample, which scans `examples/packs/*` plus `OPENWOP_PROMPT_PACKS_DIR` and calls `installPackTemplates()` for each `kind: "prompt"` pack found. 
2026-05-20 (RFC 0029 Phase C — prompt resolution chain wire shape) added three more scenarios: `prompt-resolution-chain-node-wins.test.ts` (capability-gated on `capabilities.prompts.supported: true`; asserts layer-1 node-config supersedes lower layers per `spec/v1/prompts.md` §"Resolution chain (normative)"), `prompt-resolution-chain-agent-intrinsic.test.ts` (additionally gated on `capabilities.prompts.agentBindings: true`; asserts agent intrinsic `systemPromptRef` wins over `promptOverrides` AND lower layers when the node has no layer-1 ref), `prompt-resolution-chain-fallback-cascade.test.ts` (asserts layer 3 workflow-defaults wins over layer 4 host-defaults; layer 4 host-defaults wins when 1-3 yield null; resolved is null when all four yield null but chain[] still lists every attempted layer). The scenarios drive the host's `POST /v1/host/sample/prompt/resolve` test seam (reference-host implementation deferred to follow-up slice per RFC 0021 staging precedent). 2026-05-20 (RFC 0027 Phase A — prompt templates wire shape) added three scenarios: `prompt-template-shape.test.ts` (always-on; Ajv compileability + positive/negative round-trip for PromptTemplate + PromptRef + PromptKind), `prompt-composed-secret-redaction.test.ts` (capability-gated on `capabilities.prompts.supported: true` + `observability: "full"`; asserts `[REDACTED:<secretId>]` markers in `prompt.composed` payloads for `source: "secret"` variable bindings per SECURITY/threat-model-secret-leakage.md §SR-1), `prompt-composed-trust-marker.test.ts` (same capability gates; asserts `<UNTRUSTED>...</UNTRUSTED>` wrapping + `contentTrust: "untrusted"` propagation per RFC 0020 §D). Paired with new `fixtures/prompt-templates/` sub-directory + per-fixture schema-validity describe block + future SECURITY invariants `prompt-composed-secret-redaction` and `prompt-composed-trust-marker` (lands alongside reference-host emission per RFC 0021 staging precedent). 
2026-05-18 (RFC 0022 `Draft` — runtime variable mapping) added four `it.todo()` placeholder scenarios covering the new mapping surfaces on `core.dispatch` (§A — `dispatch-input-mapping.test.ts`, `dispatch-output-mapping.test.ts`, `dispatch-cross-worker-handoff.test.ts`) and `core.subWorkflow` (§B — `subworkflow-input-mapping.test.ts`). Gated on `capabilities.agents.dispatchMapping` (dispatch trio) and `capabilities.subWorkflow.inputMapping` (subWorkflow). Promote to live assertions when RFC 0022 reaches `Active` + a reference host advertises the matching flags. 2026-05-17 (RFC 0003 §D handoff-schema enforcement, HV-1) added `agentPackHandoffSchemaValidation.test.ts` — verifies the host validates dispatch payloads against `handoff.taskSchemaRef` AND return payloads against `handoff.returnSchemaRef` per RFC 0003 §D. Paired with the new `agent-pack-handoff-schema-enforcement` row in `SECURITY/invariants.yaml`. 2026-05-17 (AI Envelope gap-closure, DRAFT v1.x — `spec/v1/ai-envelope.md`) added 7 advertisement-shape scenarios with `it.todo()` behavioral placeholders gated on `capabilities.envelopeContracts.advertised: true`: `aiEnvelope.universalKinds.test.ts`, `aiEnvelope.schemaDrift.test.ts`, `aiEnvelope.correlationReplay.test.ts`, `aiEnvelope.contractRefusal.test.ts`, `aiEnvelope.trustBoundaryPropagation.test.ts`, `aiEnvelope.redaction.test.ts`, `aiEnvelope.capBreached.test.ts`. Paired with the new `envelope-redaction-sr-1-carry-forward` row in `SECURITY/invariants.yaml`. 2026-05-17 (post-publish hardening, deep audit of `core.openwop.agents`) added `agents-run-tool-allowlist.test.ts` — server-free scenario locking in the `core.openwop.agents@1.0.1` safety-fix that closes `OPENWOP-AUDIT-2026-003` (function-typed `tool.handler` properties rejected at `validateTools()` with `INVALID_TOOL_DECLARATION`; tool-driven runs require `ctx.agentRuntime`; tool-less safe fallback preserved). Paired with the new `agents-run-no-raw-handler` row in `SECURITY/invariants.yaml`. 
Same-day post-publish hardening added `idempotency-key-determinism.test.ts` — server-free scenario locking in the `core.openwop.http@1.1.2` determinism safety-fix (default `composite` mode produces deterministic keys in `(runId, nodeId, payload)`; removed `uuid` mode rejects with `CONFIG_INVALID`; cross-impl vector test lets third-party reimplementations verify wire agreement). Paired with the new `idempotency-key-deterministic` row in `SECURITY/invariants.yaml`. 2026-05-17 (Phase 3 of RFC 0013) added three server-free scenarios exercising the reference workflow-chain expansion library (`conformance/src/lib/workflow-chain-expansion.ts`): `workflow-chain-expansion.test.ts` (parameter substitution + node id collision avoidance + edge rewriting + capability propagation + runtime-invariance contract), `workflow-chain-unresolvable-typeid.test.ts` (rejection with `chain_unresolvable_typeid` when a chain references an unknown typeId), and `workflow-chain-pack-signature-verification.test.ts` (Ed25519 verification recipe reuse from `node-packs.md §Signing`). Earlier that day (Phase 1) added `workflow-chain-pack-manifest-validation.test.ts` — server-free schema-validation scenario covering the new `workflow-chain-pack-manifest.schema.json` (positive sample + two negatives: kind/contents mismatch and invalid `chainId`). Closes RFC 0013 (`Workflow-chain packs`, `Draft`) Phases 1 + 3 alongside the new `spec/v1/workflow-chain-packs.md`, the `Capabilities.workflowChainPacks` block, and the registry build-index/conformance-check `kind` routing from Phase 2. Earlier that day, the suite added 27 `it.todo()` placeholder scenarios paired with RFCs 0014-0020 (host capability surfaces — fs, kvStorage, tableStorage, queueBus, sql/vector/search, blob/cache, mcp.serverMount). These promote to live assertions when each RFC reaches `Active` + the matching capability block lands in `schemas/capabilities.schema.json` + a reference host advertises the capability. 
Earlier additions include 18 Multi-Agent Shift scenarios (Phases 1-5) added 2026-05-10, the `registry-public.test.ts` public-registry healthcheck added 2026-05-11 (opt-in via `OPENWOP_TEST_PUBLIC_REGISTRY=true`), the `replay-llm-cache-key.test.ts` placeholder added 2026-05-11 (three `it.todo()` cases for the cross-host LLM cache-key recipe per `replay.md` §"LLM cache-key recipe"), the two `production-*.test.ts` scenarios added 2026-05-11 for the `openwop-production` profile per RFC 0009 (`production-backpressure.test.ts`, `production-retention-expiry.test.ts`), the four `auth-*.test.ts` scenarios added 2026-05-11/12 for the production-auth profiles per RFC 0010 (`auth-api-key-rotation.test.ts`, `auth-oauth2-client-credentials.test.ts`, `auth-oidc-user-bearer.test.ts`, `auth-mtls.test.ts` (opt-in via `OPENWOP_TEST_MTLS=1`)), `replay-retention-expiry.test.ts` added 2026-05-12 (capability shape + 410/422 envelope per `replay.md` §"Retention and garbage collection"), `bulk-cancel.test.ts` added 2026-05-12 (Phase B close-out of R1 — `POST /v1/runs:bulk-cancel`), the two Phase H launch-blocker advertisement-contract scenarios added 2026-05-12 (`mcp-toolcall-redaction.test.ts` for the MCP-1 invariant per `host-capabilities.md §host.mcp` + `threat-model-prompt-injection.md §UNTRUSTED`, and `http-client-ssrf.test.ts` for the SSRF + body-size cap advertisement contract on `capabilities.httpClient`), the `wasm-pack-abi-version-rejection.test.ts` Track 7 scenario added 2026-05-12 for the ABI-mismatch positive path via the `vendor.openwop.misbehaving-abi` pack per RFC 0008 §H, the `otel-trace-propagation-subworkflow.test.ts` Track 11 close-out added 2026-05-13 (parent + child run spans share the inbound traceparent's traceId across the `core.subWorkflow` dispatch boundary), and the three RFC 0012 (Memory Compaction Profile, `Active`) scenarios added 2026-05-13/14: `memory-compaction-sr1-carry-forward.test.ts` (load-bearing SR-1 §D), `memory-compaction-event-emitted.test.ts` 
(canonical §B payload shape), and `memory-compaction-provenance-tag.test.ts` (soft assertion on §C `compacted-from:<id>` convention). All three gate on `capabilities.memory.compaction.supported` + the host's test seam at `/v1/test/memory/{seed,compact}` (Postgres reference host enables both via `OPENWOP_MEMORY_COMPACTION=true OPENWOP_TEST_TRIGGER_COMPACTION=true`). 2026-05-15 (gap-closure CF-3) added `interrupt-token-matrix.test.ts` (malformed / unknown / replay / cross-run-id paths on `GET|POST /v1/interrupts/{token}`). 2026-05-31 (RFC 0078 portable tool catalog + RFC 0079 credential provenance / egress policy — the Active→Accepted behavioral gate) added four: `tool-catalog-projection.test.ts` (capability-gated on `toolCatalog.supported` via `behaviorGate('openwop-tool-catalog', …)` — the NORMATIVE `GET /v1/tools` list with each `ToolDescriptor` schema-valid + `source`/`safetyTier` in the closed vocab + content-free, `GET /v1/tools/{toolId}` round-trip + unknown-id 404, 401-unauthenticated, and the §F-2 cross-principal non-disclosure; black-box, no POST seam), `tool-session-lifecycle.test.ts` (gated on `toolCatalog.sessionLifecycle` — the §D `tool.session.opened`-before / `tool.session.closed`-after bracket over the RFC 0064 call events via the `POST /v1/host/sample/tools/session-run` seam, one shared `sessionId`, content-free), `egress-audience-binding.test.ts` (KEYSTONE — gated on `httpClient.egressPolicy.supported`; the §C confused-deputy MUST via `POST /v1/host/sample/egress/decide`: an out-of-audience egress is denied/downgraded with the credential NOT attached, a provenance-unevaluable egress fails closed — the behavioral leg of `egress-credential-audience-bound`), and `egress-decision-content-free.test.ts` (the SR-1 canary — the credential value never surfaces in `egress.decided` and `reason` stays in the CLOSED vocabulary). 
The maintained scenario-to-spec map lives in [`coverage.md`](./coverage.md); this README keeps the operator quickstart and the historical scenario notes below.
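
The confused-deputy rule exercised by `egress-audience-binding.test.ts` (an out-of-audience egress goes out without the credential, and an egress whose provenance cannot be evaluated fails closed) can be sketched as follows. Every name here is hypothetical, including the `reason` strings, which stand in for the RFC's closed vocabulary; the "downgraded" outcome is not modeled.

```typescript
// Hypothetical shapes for illustration; the real wire schema is defined by RFC 0079.
interface CredentialRef {
  id: string;
  // Hosts the credential may be sent to. Undefined models provenance that cannot be evaluated.
  audience?: readonly string[];
}

interface EgressDecision {
  outcome: "allow" | "deny";
  credentialAttached: boolean;
  // Placeholder vocabulary, not the RFC's actual closed set.
  reason: "in_audience" | "out_of_audience" | "provenance_unevaluable";
  // Identifier only; the decision record never carries the secret value.
  credentialId: string;
}

function decideEgress(cred: CredentialRef, targetHost: string): EgressDecision {
  // Fail closed when there is no audience information to evaluate.
  if (cred.audience === undefined) {
    return { outcome: "deny", credentialAttached: false, reason: "provenance_unevaluable", credentialId: cred.id };
  }
  const target = targetHost.toLowerCase();
  const inAudience = cred.audience.some((host) => host.toLowerCase() === target);
  if (!inAudience) {
    return { outcome: "deny", credentialAttached: false, reason: "out_of_audience", credentialId: cred.id };
  }
  return { outcome: "allow", credentialAttached: true, reason: "in_audience", credentialId: cred.id };
}

const decision = decideEgress({ id: "cred-1", audience: ["api.example.com"] }, "evil.example.net");
console.log(decision.outcome, decision.credentialAttached); // deny false
```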
 
  High-level coverage includes:
 
@@ -172,7 +172,7 @@ Server-required (added in 1.7.0):
  |---|---|---|
  | **Redaction** | [`capabilities.md`](../spec/v1/capabilities.md) §"Secrets" + NFR-7 + §"aiProviders" | Vendor-neutral assertions that the server doesn't leak secret material. Three scenario groups: (a) discovery shape contract — `secrets` + `aiProviders` advertisements are well-formed regardless of `secrets.supported`; when `supported === true`, scopes MUST be non-empty + `resolution === 'host-managed'`; `byok ⊆ supported`. (b) bearer-token redaction — invalid Bearer canary in `Authorization` header is not echoed in the 401 response body. (c) credentialRef echo control — gated on `secrets.supported === true`; canary planted in `configurable.ai.credentialRef` MUST NOT appear in any RunEvent payload (poll-based capture; transport-agnostic). Uses runtime-built canary fixtures (`lib/canaries.ts`) that defeat static secret scanners. 6 scenarios. |
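
The redaction row above mentions runtime-built canaries that defeat static secret scanners, plus checks that a planted canary is not echoed back. A minimal sketch of both ideas follows; it is not the contents of `lib/canaries.ts`, and `buildCanary` and `leaksCanary` are hypothetical names.

```typescript
import { randomBytes } from "node:crypto";

// Assemble the canary from fragments at runtime so that no complete,
// secret-looking literal exists in the source for a static scanner to flag.
function buildCanary(): string {
  const prefix = ["sk", "canary"].join("-");
  return `${prefix}-${randomBytes(16).toString("hex")}`;
}

// Returns true when the canary plaintext appears anywhere in the serialized payload.
function leaksCanary(canary: string, payload: unknown): boolean {
  // Guard: an empty canary would trivially match every payload.
  if (canary.length === 0) return false;
  // JSON.stringify returns undefined for some inputs (for example, undefined itself).
  const text = JSON.stringify(payload) ?? "";
  return text.includes(canary);
}

const canary = buildCanary();
const redactedBody = { error: "unauthorized", detail: "invalid bearer token" };
const echoedBody = { error: "unauthorized", detail: `invalid bearer token ${canary}` };
console.log(leaksCanary(canary, redactedBody), leaksCanary(canary, echoedBody)); // false true
```

The canary here contains only hex digits and hyphens, so JSON escaping cannot hide a match; a canary with quotes or backslashes would need the comparison done on unescaped field values instead.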
 
- Current source tree: 319 scenario files. Use [`coverage.md`](./coverage.md) for current grade/gap tracking.
+ Current source tree: 321 scenario files. Use [`coverage.md`](./coverage.md) for current grade/gap tracking.
 
  ## Remaining Gaps
 
package/coverage.md CHANGED
@@ -45,15 +45,15 @@
  | Multi-agent confidence-floor escalation (RFC 0039 — `spec/v1/multi-agent-execution.md` §"Confidence escalation", `version: 2`) | `multi-agent-confidence-escalation.test.ts` | B (1 advertisement-shape probe on `confidenceEscalationFloor` + 1 behavioral assertion against the low-confidence fixture) | RFC 0039 filed Draft → promoted Active 2026-05-22 after the confidence-floor half landed end-to-end. Advertisement-shape probe asserts `capabilities.multiAgent.executionModel.confidenceEscalationFloor` (when present) is a number in `[0.5, 1.0]`; values below the spec floor are non-conformant. Behavioral assertion drives the `conformance-multi-agent-confidence-escalation` fixture (supervisor `mockDispatchPlan` carries one decision with `confidence: 0.3`) and asserts: parent reaches `waiting-clarification` (NOT `completed` because no dispatch fired); exactly ONE `core.workflowChain.confidence-escalated` event with `payload.confidence === 0.3`, `payload.floor ∈ [0.5, 1.0]`, `payload.escalationKind ∈ {clarify, escalate}`; causationId chains back to the `runOrchestrator.decided` event; ZERO `core.workflowChain.event` records (the load-bearing distinction from `version: 1` — confidence floor MUST fire BEFORE any dispatch.began). Reference workflow-engine advertises `version: 2` + `confidenceEscalationFloor: 0.5` when both `OPENWOP_MULTI_AGENT_EXECUTION_MODEL=true` AND `OPENWOP_MULTI_AGENT_EXECUTION_MODEL_PHASE_2=true` are set; floor tunable via `OPENWOP_MULTI_AGENT_CONFIDENCE_FLOOR`. Path to `Accepted`: non-steward host advertises `version: 2` + the behavioral assertion passes against it. Memory-lifecycle half of RFC 0039 (MAE-2/3) remains explicit follow-up: `crossChildMemoryConcurrency` capability field is schema-landed but the host's MemoryAdapter doesn't yet implement either contract. |
  | Sandbox execution contract (RFC 0035 — `spec/v1/host-capabilities.md` §"Sandbox execution contract") | `sandbox-no-host-fs-escape.test.ts`, `sandbox-no-host-env-leak.test.ts`, `sandbox-no-network-escape.test.ts`, `sandbox-no-host-process-escape.test.ts`, `sandbox-memory-cap.test.ts`, `sandbox-timeout-cap.test.ts`, `sandbox-capability-gate-respected.test.ts`, `sandbox-no-cross-pack-mutation.test.ts` | C+ (advertisement-shape probes always-on; 8 capability-gated behavioral stubs scaffolded; soft-skip on hosts that don't advertise `capabilities.sandbox.supported`) | RFC 0035 promoted Draft → Active 2026-05-21. 8 scenarios, one per `node-pack-sandbox-*` invariant in `SECURITY/invariants.yaml`. Behavioral assertions remain stubbed with `expect(true).toBe(true)` + docstring expected-wire-shape pending the synthetic `vendor.openwop.misbehaving-sandbox` pack + a first sandbox-executing reference host. Path to `Accepted`: first sandbox-executing host advertises + implements the 8 failure-mode invariants + the 8 scenarios pass; at that point the 8 `node-pack-sandbox-*` SECURITY rows graduate from `reference-impl` → `protocol` tier per RFC 0035 §"Acceptance criteria." |
  | Multi-region idempotency + cross-engine append-ordering (RFC 0036 — `spec/v1/idempotency.md` §"`multiRegion` sub-block", `spec/v1/replay.md` §"Cross-region replay") | `multi-region-idempotency.test.ts`, `cross-engine-append-ordering.test.ts`, **`multi-region-idempotency-behavior.test.ts` (2026-05-22)**, **`cross-engine-append-behavior.test.ts` (2026-05-22)** | A (2 categorical-shape probes always-on + 1 granular `multiRegion` shape probe + 1 `crossEngineOrdering` shape probe + 6 multi-region behavioral assertions + 4 cross-engine Lamport-ordering behavioral assertions; all 10 behavioral assertions PASS against the reference workflow-engine when `OPENWOP_TEST_MULTI_REGION_SIMULATOR=true` + `OPENWOP_TEST_CROSS_ENGINE_HARNESS=true` are set) | RFC 0036 §B + §C behavioral close-out landed 2026-05-22 via the new workflow-engine test seams (`POST /v1/host/sample/test/multi-region/simulate-partition` + `POST/GET /v1/host/sample/test/cross-engine/{append,read,reset}`) — see `spec/v1/host-sample-test-seams.md` §6 + §7. The new `multi-region-idempotency-behavior.test.ts` exercises the canonical lex-min convergence rule + order-invariance + 400-on-mismatch; the new `cross-engine-append-behavior.test.ts` exercises Lamport-clock monotonicity + per-engine order preservation + read-determinism. Path to `Accepted`: non-steward host advertises matching capabilities + the behavioral assertions pass against it. |
- | Secret-leakage telemetry / debug-bundle export (RFC 0034 §B — `spec/v1/host-capabilities.md` §"OTel collector test seam") | **`secret-leakage-otel-attribute.test.ts` (2026-05-22)** | A (3 capability-gated probes — OTel span scrape + debug-bundle scrape + advertisement-shape; soft-skips honestly until host advertises `capabilities.observability.testSeams.{otelScrape, debugBundleExport}` AND `capabilities.secrets.supported` AND `OPENWOP_CANARY_SECRET_VALUE` env is set) | Broadens the existing protocol-tier `secret-leakage-otel-attribute` + `secret-leakage-debug-bundle-otel` SECURITY invariants from envelope-acceptor-narrow (already covered by `envelope-reasoning-secret-redaction.test.ts`) to executor-side-broad. Drives the existing `openwop-smoke-byok-roundtrip` fixture; scrapes both seams after run completion; hard-fails if the BYOK canary plaintext appears in any OTel span attribute or debug-bundle field. |
+ | Secret-leakage telemetry / debug-bundle export (RFC 0034 §B — `spec/v1/host-capabilities.md` §"OTel collector test seam") | **`secret-leakage-otel-attribute.test.ts` (2026-05-22)**, **`otel-collector-canary-inspection.test.ts` (2026-06-01)** | A (host scrape-seam probes + collector-side over-the-wire inspection: `secret-leakage-otel-attribute.test.ts` scrapes the host seams AND — new — runs `OtelCollector.findCanaryLeakage()` against the live real OTLP export when the collector is active; `otel-collector-canary-inspection.test.ts` is the always-on, server-free proof that the inspector is non-vacuous) | Broadens the existing protocol-tier `secret-leakage-otel-attribute` + `secret-leakage-debug-bundle-otel` SECURITY invariants from envelope-acceptor-narrow (`envelope-reasoning-secret-redaction.test.ts`) to executor-side-broad. **Collector-seam gap CLOSED 2026-06-01:** `OtelCollector.findCanaryLeakage()` scans every captured span name/attribute/resource-attribute + metric data-point attribute for the BYOK canary, so the conformance collector now inspects what the host's OTLP exporter ACTUALLY shipped over the wire — a host can no longer redact in its scrape seam while leaking on the real export. The always-on scenario stands up a real collector, POSTs synthetic OTLP/HTTP-JSON through the actual ingest path, and proves the inspector catches a planted canary in each surface + reports zero on a redacted payload + never matches an empty canary. Residual is adoption-only (the live assertion soft-skips until a host exports OTLP to the collector). |
  | Experimental capability tier (RFC 0042 — `schemas/capabilities.schema.json` §`multiAgent.executionModel.tier`) | **`experimental-tier-shape.test.ts` (2026-05-22)** | A (6 server-free + helper-routing assertions across §A schema discipline + §D experimentalGate routing; always-on for hosts that advertise tier='experimental' on any capability sub-block; helper-level behavioral probes for the `experimentalGate()` routing under both default + OPENWOP_REQUIRE_EXPERIMENTAL modes) | RFC 0042 (Draft) lands the audit's "Active RFC → carve-out" pattern. Schema diff lands on `multiAgent.executionModel` with optional `tier ∈ {stable, experimental}` + `experimentalUntil` (ISO-8601 sunset) + `if/then` conditional enforcing §B sunset MUST mechanically. New `experimentalGate()` helper in `conformance/src/lib/behavior-gate.ts` routes scenarios under default mode + `OPENWOP_REQUIRE_EXPERIMENTAL=true` strict-mode. |
  | Sandbox WASM-isolation behavioral graduation (RFC 0035 §B) | **`sandbox-wasm-isolation.test.ts` (2026-05-31)** | A (10 always-on server-free assertions against the committed `fixtures/wasm-sandbox/*.wasm` via the suite-local `wasm-sandbox-probe.ts`: `misbehaving-{fs,env,network,process}` → `sandbox_escape_attempt`+`escapeKind` by static `WebAssembly.Module.imports()` gate, `misbehaving-memory` OOB store → `sandbox_memory_exceeded`, un-granted `openwop.*` → `sandbox_capability_denied`, mutable-global double-instantiate → isolated context; all 10 PASS) | **Graduates 6 `node-pack-sandbox-*` SECURITY rows reference-impl → protocol 2026-05-31** (fs-gated / no-env / network-gated / no-process / memory-cap / isolated-context). These hold by construction in any WASM sandbox — a forbidden op can only be a declared import refused before instantiation; the memory bound is engine-enforced. The reference host is `examples/hosts/wasm-sandbox/` (#412). **`node-pack-sandbox-timeout` ALSO graduated `reference-impl → protocol` 2026-06-01** via the worker-driven `sandbox-wasm-timeout.test.ts` (`probeTimeout` spawns a worker + a main-thread kill-timer — the thread preemption a server-free probe can't do; 2/2 non-vacuous incl. a well-behaved positive control) — so **7 of 8** `node-pack-sandbox-*` invariants are now protocol-tier. `node-pack-sandbox-no-eval` is JS-specific + a permanent exemption. RFC 0035 `Active → Accepted` separately needs a **non-steward** sandbox-executing host. |
  | Sandbox MVP behavioral close-out (RFC 0035 §B) | **`sandbox-mvp-behavior.test.ts` (2026-05-22)** | A (10 capability-gated behavioral assertions covering 7 of 8 §B failure-mode invariants — 5 escape kinds + timeout + memory-exceeded + cross-pack-mutation isolation + capability-gate-violation + 2 well-behaved baselines; all 10 PASS against the workflow-engine's node:vm-based sandbox MVP) | Companion to the existing 8 advertisement-shape sandbox scenarios (`sandbox-no-host-fs-escape.test.ts` et al.). Exercises the canonical 4-code error catalog at `spec/v1/host-capabilities.md` §"Error codes" (`sandbox_escape_attempt` + `sandbox_capability_denied` + `sandbox_memory_exceeded` + `sandbox_timeout`) with spec-mandated `details.{escapeKind, requestedCapability, requestedBytes}` populated. Wire-shape per `spec/v1/host-sample-test-seams.md §8`. Production adopters use wasmtime/nsjail behind the same HTTP test-seam contract. |
- | RFC 0041 §B replay-divergence-at-refusal behavioral (`version: 4`) | `replay-divergence-at-refusal.test.ts` (advertisement-shape + behavioral; 3 assertions PASS against workflow-engine when the `multiAgent.executionModel.version: 4` advertisement is enabled) | A (was `it.todo` until 2026-05-23 when the executor wiring landed — see commit `1fce55a` + `bba3b4a`. Behavioral assertions cover both divergence directions: original=valid + replay=refusal AND original=refusal + replay=valid) | Closes Track #4 of the 2026-05-22 multi-agent behavioral-harness close-out. Reference workflow-engine emits `replay.divergedAtRefusal` event + fails run with `error.code: 'replay_diverged_at_refusal'` when source vs replay envelope kinds differ at the same nodeId. Gated on `OPENWOP_MULTI_AGENT_EXECUTION_MODEL_PHASE_4=true` AND `run.forkMode === 'replay'`. Path-to-Accepted for RFC 0041: non-steward host advertises `multiAgent.executionModel.version: 4` end-to-end. |
+ | RFC 0041 §B replay-divergence-at-refusal behavioral (`version: 4`) | `replay-divergence-at-refusal.test.ts` (advertisement-shape + behavioral; 3 assertions PASS against workflow-engine when the `multiAgent.executionModel.version: 4` advertisement is enabled) | A (was `it.todo` until 2026-05-23 when the executor wiring landed — see commit `1fce55a` + `bba3b4a`. Behavioral assertions cover both divergence directions: original=valid + replay=refusal AND original=refusal + replay=valid) | Closes Track #4 of the 2026-05-22 multi-agent behavioral-harness close-out. Reference workflow-engine emits `replay.divergedAtRefusal` event + fails run with `error.code: 'replay_diverged_at_refusal'` when source vs replay envelope kinds differ at the same nodeId. Gated on `OPENWOP_MULTI_AGENT_EXECUTION_MODEL_PHASE_4=true` AND `run.forkMode === 'replay'`. Path-to-Accepted for RFC 0041: non-steward host advertises `multiAgent.executionModel.version: 4` end-to-end. **RFC 0041 §C sibling:** `replay-observable-sequence-determinism.test.ts` is likewise now ACTIVE capability-gated behavioral (2026-06-01 — was an `it.skip` placeholder; the `conformance-phase4-nondet-tool` fixture having shipped, it drives a `mode:replay` fork of the nondet fixture and asserts observable event-log prefix byte-equivalence + nondeterministic-node observable-result caching, gated on `replayDeterminism.supported` + `version >= 4`; soft-skips against hosts that haven't wired the pure-replay observable-cache path). |
  | Agent-manifest runtime floor (RFC 0070 — `capabilities.agents.manifestRuntime`) | `agent-manifest-runtime.test.ts` | B (capability-gated; lists ≥1 installed manifest agent + dispatches one with attributed `agent.reasoned`+`agent.decided` events, plus a §F sub-threshold-escalation assertion) | RFC 0070 filed Draft 2026-05-26. Gated on `capabilities.agents.manifestRuntime.supported` + the host dispatch seam (`POST /v1/host/sample/agents/{agentId}/dispatch`); soft-skips when either is absent. The reference **workflow-engine** host advertises `manifestRuntime: { supported: true, handoffValidation: true }`, loads pack `agents[]` (RFC 0003 `installAgents`) into an AgentRegistry at boot, and dispatches end-to-end (toolAllowlist-filtered per RFC 0002 §A14, handoff-validated per RFC 0003 §D, confidence-escalating per §F) — see `apps/workflow-engine/backend/typescript/test/agent-dispatch-route.test.ts` (6 HTTP assertions, incl. the normative inventory). **RFC 0072 (`Draft`):** the scenario's inventory leg now drives the NORMATIVE `GET /v1/agents` (§A) so it runs black-box against any conformant host; the dispatch leg stays on the sample seam (soft-skips off-steward) pending the executor-integration tier. RFC 0072 §C `peerDependenciesMeta` disposition + `degraded[]` are unit-tested in `agent-loader.test.ts`. Path to `Active → Accepted` (RFC 0070): a non-steward host advertises `manifestRuntime` + serves `GET /v1/agents`. |
  | Live manifest dispatch (RFC 0077 — `capabilities.agents.liveRuntime`) | `agent-live-runtime-shape.test.ts` | A (always-on, server-free shape probe) | RFC 0077 promoted Draft → Active 2026-05-29 (5 UQs resolved via MyndHyve T4 co-design). Always-on shape probe asserts `capabilities.agents.liveRuntime` (+ `supported`/`structuredOutput`/`confidenceEscalation`/`sources` sub-flags) is declared, the `agentInvocationStarted`/`agentInvocationCompleted` payloads validate conforming content-free records + reject malformed ones (`started` missing `source`; `completed` out-of-enum `outcome`), and both event names appear in the RunEventType enum. `liveRuntime` ⊃ `manifestRuntime`. **Behavioral scenarios deferred** per RFC 0077 §Conformance (reference host): the started→completed bracket ordering, `structuredOutput` enforcement, and `toolAllowlist` enforcement gate on `capabilities.agents.liveRuntime.supported` + a live-invoke seam and soft-skip until a host wires it. Path to `Accepted`: a non-steward host advertises `liveRuntime` + emits the invocation pair (net-new MyndHyve T4 work, queued behind §B). |
  | Agent evaluation (RFC 0081 — `capabilities.agents.evalSuite`) | `agent-eval-suite-shape.test.ts` | A (always-on, server-free shape probe; doubles as the public test for `eval-summary-no-content-leak`) | RFC 0081 promoted Draft → Active 2026-05-30. Always-on shape probe asserts `capabilities.agents.evalSuite` (+ `supported`/`modes` sub-flags) is declared; the `AgentEvalSuite` + `EvalSummary` schemas compile + round-trip a conforming artifact and reject malformed ones (bad `suiteId` infix; `passScore` out of 0..1; out-of-range `aggregateScore`); the `eval.started`/`eval.scored`/`eval.completed` payloads validate content-free records + reject malformed ones; and all three event names appear in the RunEventType enum. The **content-free negatives** (an `EvalSummary` task entry carrying a `taskOutput` body; a `safetyFinding` carrying an `excerpt`) are the public test for protocol-tier SECURITY invariant `eval-summary-no-content-leak`. **Behavioral scenario authored + gated** (2026-05-31; see §"Capability-gated scenarios"): `agent-eval-run.test.ts` (the `eval.started`→per-task `eval.scored`→`eval.completed` ordering, the content-free `eval.scored` legs, and the NORMATIVE `GET /v1/runs/{runId}/eval-summary` schema-valid round-trip) gates on `capabilities.agents.evalSuite.supported` + the `POST /v1/host/sample/agents/eval-run` seam (`behaviorGate('openwop-eval-run', …)`) and soft-skips until a host wires the eval projection. Path to `Accepted`: a host advertises `evalSuite` + runs a golden/regression suite end-to-end (the `GET /v1/runs/{runId}/eval-summary` endpoint + SDK helper already landed). |
- | Memory capability model (RFC 0080 — `spec/v1/agent-memory.md` §"Memory capability model", `spec/v1/profiles.md` §`openwop-memory`) | `memory-capability-model-shape.test.ts` | A (always-on, server-free shape probe) | RFC 0080 promoted Draft → Active 2026-05-30 (4 UQs resolved via MyndHyve review). Always-on shape probe asserts the additive `capabilities.memory.{writable,search,retention}` dimensions are declared (existing `supported`/`compaction`/`distillation`/`attribution` untouched), `memory.search`/`memory.retention` validate conforming instances + reject malformed ones (`retention.ttl` non-boolean; out-of-enum `search.modes`; unknown property under `additionalProperties:false`), `agent-inventory-response` declares `memoryDegraded` + the closed-enum `degradedMemoryDimensions` (the eight §A dimension names), and `deriveProfiles` surfaces `openwop-memory` for a read/write + long-term host while withholding it from a `writable:false` host. **Behavioral scenario deferred** per RFC 0080 §Conformance: `memory-degraded-projection.test.ts` (a live `GET /v1/agents` stamping `memoryDegraded` when an agent's `memoryShape` exceeds the host's reconciled model) gates on `agents.manifestRuntime` + `memory` and soft-skips until a reference host computes it. Path to `Accepted`: a host computes the §C degraded projection + the scenario passes against it. |
+ | Memory capability model (RFC 0080 — `spec/v1/agent-memory.md` §"Memory capability model", `spec/v1/profiles.md` §`openwop-memory`) | `memory-capability-model-shape.test.ts` | A (always-on, server-free shape probe) | RFC 0080 promoted Draft → Active 2026-05-30 (4 UQs resolved via MyndHyve review). Always-on shape probe asserts the additive `capabilities.memory.{writable,search,retention}` dimensions are declared (existing `supported`/`compaction`/`distillation`/`attribution` untouched), `memory.search`/`memory.retention` validate conforming instances + reject malformed ones (`retention.ttl` non-boolean; out-of-enum `search.modes`; unknown property under `additionalProperties:false`), `agent-inventory-response` declares `memoryDegraded` + the closed-enum `degradedMemoryDimensions` (the eight §A dimension names), and `deriveProfiles` surfaces `openwop-memory` for a read/write + long-term host while withholding it from a `writable:false` host. **Behavioral scenario authored** (2026-06-01; see §"Capability-gated scenarios"): `memory-degraded-projection.test.ts` (a live `GET /v1/agents` stamping `memoryDegraded` + the closed-enum `degradedMemoryDimensions` when an agent's `memoryShape` exceeds the host's reconciled model) gates on `agents.manifestRuntime` + `memory` via `behaviorGate('openwop-memory-degraded', …)` and soft-skips until a host computes the §C projection. Path to `Accepted`: a host computes the §C degraded projection + the scenario passes against it non-vacuously (MyndHyve `memory`). |
  | Portable tool catalog (RFC 0078 — `spec/v1/tool-catalog.md`) | `tool-descriptor-shape.test.ts` | A (always-on, server-free shape probe) | RFC 0078 promoted Draft → Active 2026-05-30 (4 UQs resolved via MyndHyve review). Always-on shape probe asserts `tool-descriptor.schema.json` round-trips a conforming `ToolDescriptor` + rejects the malformed (`safetyTier`-required, `additionalProperties:false`), enforces the §C-1/§F-4 cross-field MUST (`safetyTier:"exec"` ⇒ `source:"host-extension"`, RFC 0069 — an `exec`+`node-pack` descriptor is rejected), asserts the `capabilities.toolCatalog` `supported`/`sources`/`sessionLifecycle` shape, and validates the two content-free `tool.session.{opened,closed}` payloads (incl. the closed `outcome` enum) + their RunEventType-enum membership. **Behavioral scenarios deferred** per RFC 0078 §Conformance: `tool-catalog-projection.test.ts` (the authorization-scoped `GET /v1/tools` + `404` non-disclosure) + `tool-session-lifecycle.test.ts` (the `tool.session.*` bracket ordering) gate on `capabilities.toolCatalog.supported` + `sessionLifecycle` and soft-skip until a reference host serves the catalog. Path to `Accepted`: a host projects ≥1 tool source at `GET /v1/tools` + the projection scenario passes. |
  | Credential provenance + egress policy (RFC 0079 — `spec/v1/host-capabilities.md` §"Credential provenance + egress policy") | `egress-provenance-shape.test.ts` | A (always-on, server-free shape probe; doubles as the public test for `egress-decision-no-secret-leak`) | RFC 0079 promoted Draft → Active 2026-05-30 (4 UQs resolved via MyndHyve review). Always-on shape probe asserts `credential-provenance.schema.json` round-trips a conforming `CredentialProvenance` + rejects `audiences:[]` / missing `credentialId` / unknown property, the descriptor + `egress.decided` declare NO secret-value property (the content-free **`egress-decision-no-secret-leak`** protocol-tier invariant), the `egress.decided` payload validates a content-free record + enforces the `decision` enum + required `decision`/`destination`, and `capabilities.httpClient.egressPolicy` is declared. **The behavioral audience-binding MUST-NOT (`egress-credential-audience-bound`) is reference-impl tier** at Draft→Active — a credential bound to audience A on an egress to B must be `denied`/`downgraded` (never `allowed`-with-credential), fail-closed on unevaluable provenance — and lands in the gated `egress-audience-binding.test.ts` + `egress-decision-content-free.test.ts` (soft-skip until a host wires `egressPolicy` over `safeFetch`). Path to `Accepted`: a reference host enforces §C + the binding scenario passes → `egress-credential-audience-bound` graduates protocol-tier (RFC 0035 precedent). |
  | Durable trigger + channel bridge (RFC 0083 — `spec/v1/trigger-bridge.md`, `spec/v1/profiles.md` §`openwop-trigger-bridge`) | `trigger-bridge-shape.test.ts` | A (always-on, server-free shape probe) | RFC 0083 promoted Draft → Active 2026-05-30 (5 UQs resolved via MyndHyve review). Always-on shape probe asserts `trigger-subscription.schema.json` round-trips a conforming `TriggerSubscription` + rejects missing-`state`/out-of-enum-`source`/unknown-property, the four-state vocab (`active`/`paused`/`failed`/`dead-lettered`) is stable, the two content-free `trigger.{subscription.state.changed,delivery.attempted}` payloads validate + enforce the `state`/`outcome` enums + RunEventType-enum membership, `capabilities.triggerBridge` + `webhooks.durable` are declared, and `deriveProfiles` surfaces `openwop-trigger-bridge` for bridge+sink+durable-source while withholding it with no dead-letter sink. **Behavioral scenario deferred** per RFC 0083 §Conformance: `trigger-bridge-delivery.test.ts` (dedup → retry → dead-letter → trigger→run causation) is profile-gated on `openwop-trigger-bridge` and soft-skips until a reference host wires durable delivery. Path to `Accepted`: a host wires the state machine + delivery loop + the scenario passes. |
@@ -119,6 +119,7 @@ Thirty-one scenario groups validate optional profiles where the host's discovery
  | `tool-session-lifecycle.test.ts` | `capabilities.toolCatalog.sessionLifecycle` (RFC 0078 §D, `tool-catalog.md`) | A (the §D bracket via `POST /v1/host/sample/tools/session-run` + the test event-log seam: `tool.session.opened` before the first RFC 0064 call event → `tool.session.closed` after the last, one shared `sessionId`, each carrying a `toolId`, `closed.outcome` ∈ {completed,failed,aborted,expired}, both content-free) | `host-pending` | `behaviorGate('openwop-tool-session-lifecycle', …)`. Seam-gated; soft-skips on 404. **Part of the RFC 0078 → Accepted bar.** First adopter: MyndHyve `toolCatalog`. |
  | `egress-audience-binding.test.ts` | `capabilities.httpClient.egressPolicy.supported` (RFC 0079 §C, `host-capabilities.md`) + `SECURITY/invariants.yaml` `egress-credential-audience-bound` | A (KEYSTONE — the §C confused-deputy MUST via `POST /v1/host/sample/egress/decide`: an out-of-audience egress is `denied`/`downgraded` with `reason:"out-of-audience"` and the credential is NOT attached (`credentialAttached !== true`); a provenance-unevaluable egress fails closed `denied`+`reason:"provenance-unevaluable"`; decision/reason ∈ the closed enums) | `host-pending` | `behaviorGate('openwop-egress-audience-binding', …)`. Seam-gated; soft-skips on 404. **This is the RFC 0079 → Accepted bar** (the `egress-credential-audience-bound` invariant graduates reference-impl → protocol tier when this passes against a host). First adopter: MyndHyve `httpClient.egressPolicy`. |
  | `egress-decision-content-free.test.ts` | `capabilities.httpClient.egressPolicy.supported` (RFC 0079 §F / SR-1) | A (the secret non-leak — a `canary` credential's sentinel never surfaces in the decision (`canaryLeaked !== true`), the `egress.decided` payload carries no forbidden content key, and `reason` stays in the CLOSED vocabulary so no blocked destination spills into a free-form field) | `host-pending` | `behaviorGate('openwop-egress-decision-content-free', …)`. Seam-gated; soft-skips on 404. **Part of the RFC 0079 → Accepted bar.** First adopter: MyndHyve `httpClient.egressPolicy`. |
+ | `memory-degraded-projection.test.ts` | `capabilities.agents.manifestRuntime.supported` + `capabilities.memory.supported` (RFC 0080 §C, `agent-memory.md`) | A (the §C iff-contract on the NORMATIVE `GET /v1/agents`: a degraded entry MUST carry `memoryDegraded:true` + a non-empty, unique `degradedMemoryDimensions[]` drawn from the closed §A-name enum [read/write/search/long-term-durability/compaction/attribution/replay-snapshot/retention]; a non-degraded entry MUST NOT carry a non-empty list; the inventory is non-empty; the degraded branch runs non-vacuously when `OPENWOP_DEGRADED_AGENT_ID` names a known-degraded agent) | `host-pending` | `behaviorGate('openwop-memory-degraded', …)`. Black-box on the normative path (no POST seam); soft-skips on 404 / when the host computes no degradation. **This is the RFC 0080 → Accepted bar.** First adopter: MyndHyve `memory`. |
  | `approval-gate-events.test.ts` | `approval.granted` / `.rejected` / `.overridden` (RFC 0051 §B, `interrupt-profiles.md` §approvalGate) | Server-free (event-payload schema validity: required fields incl. mandatory `overridden.reason`; additionalProperties:false negatives) | host-pass (server-free) | Always runs; no host needed. |
  | `approval-gate-flow.test.ts` | `core.openwop.governance.approvalGate` (RFC 0051 §A) + `capabilities.authorization` (RFC 0049) | A (capability-gated on `authorization.supported`; unauthorized-principal-denied + override-audited via the `governance/approval-gate` seam) | `host-pending` | Behavioral probe soft-skips on 404. Grant/reject-loopback/quorum scenarios deferred until a governance host wires the seam. |
  | `scheduling-capability-shape.test.ts` | `capabilities.scheduling` (RFC 0052 §A, `host-capabilities.md` §host.scheduling) | A (advertisement shape always — `supported` boolean; `cron`/`delayed`/`calendar` booleans; `maxFutureHorizon` ISO-8601 duration) | `host-pending` | Always runs; asserts the block is absent or well-formed. |
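The `memory-degraded-projection.test.ts` row above states the RFC 0080 §C iff-contract in prose. As a rough illustration only, the same rule can be written as a pure per-entry check. The names `checkDegradedProjection`, `InventoryEntryLike`, and `DIMENSION_NAMES` are hypothetical and not part of the suite; only the eight dimension names and the two field names come from the row itself.

```typescript
// Hypothetical helper (an assumption for illustration, not the suite's code):
// returns a list of contract violations for one GET /v1/agents inventory entry.
const DIMENSION_NAMES: readonly string[] = [
  'read',
  'write',
  'search',
  'long-term-durability',
  'compaction',
  'attribution',
  'replay-snapshot',
  'retention',
];

interface InventoryEntryLike {
  agentId?: string;
  memoryDegraded?: unknown;
  degradedMemoryDimensions?: unknown;
}

function checkDegradedProjection(entry: InventoryEntryLike): string[] {
  const problems: string[] = [];
  const dims = entry.degradedMemoryDimensions;

  if (entry.memoryDegraded === true) {
    // Degraded: the list must exist and be non-empty.
    if (!Array.isArray(dims) || dims.length === 0) {
      problems.push('memoryDegraded:true requires a non-empty degradedMemoryDimensions');
      return problems;
    }
    // Every member must come from the closed dimension vocabulary.
    for (const d of dims) {
      if (typeof d !== 'string' || !DIMENSION_NAMES.includes(d)) {
        problems.push(`unknown dimension: ${String(d)}`);
      }
    }
    // Members must be unique.
    if (new Set(dims).size !== dims.length) {
      problems.push('degradedMemoryDimensions must be unique');
    }
  } else if (dims !== undefined && !(Array.isArray(dims) && dims.length === 0)) {
    // Not degraded: the list must be absent or empty.
    problems.push('a non-degraded entry must not carry a non-empty degradedMemoryDimensions');
  }
  return problems;
}
```

Returning a list of problems rather than a boolean is a design choice of this sketch: it lets a caller report every violation for an entry at once.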
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@openwop/openwop-conformance",
-   "version": "1.14.0",
+   "version": "1.15.0",
    "description": "Production-ready black-box conformance suite for OpenWOP v1.0 compliant servers.",
    "repository": {
      "type": "git",
@@ -83,6 +83,23 @@ export interface CapturedMetric {
    };
  }

+ /**
+  * One place where a canary string was found inside the captured OTLP
+  * export. Returned by `OtelCollector.findCanaryLeakage()` so a leak
+  * assertion can name the offending surface (which span, which attribute)
+  * rather than just `true`.
+  */
+ export interface CanaryLeak {
+   /** Which captured surface leaked: a span field or a metric data point. */
+   readonly surface: 'span.name' | 'span.attribute' | 'span.resourceAttribute' | 'metric.attribute';
+   /** The span/metric name the leak was found under. */
+   readonly emitterName: string;
+   /** Attribute key when `surface` is an attribute; `undefined` for `span.name`. */
+   readonly key: string | undefined;
+   /** Stringified value (or the name itself) that contained the canary. */
+   readonly value: string;
+ }
+
  /**
   * Decode an OTLP attribute-value object into a primitive. Returns `null`
   * when the value shape is unrecognized.
@@ -235,6 +252,61 @@ export class OtelCollector {
      return this._spans.filter((s) => s.name === name);
    }

+   /**
+    * Scan every captured span (name + attribute keys/values + resource
+    * attribute keys/values) and metric data-point attribute for the given
+    * canary substring, returning one `CanaryLeak` per hit.
+    *
+    * This is the collector-side complement to the host's
+    * `GET /v1/host/sample/test/otel/spans` scrape seam: the scrape seam
+    * reports what the host *says* it emitted; this method inspects what
+    * the host's OTLP exporter *actually shipped over the wire* to the
+    * collector. A host could redact in its scrape seam yet still leak on
+    * the real export — only collector-side inspection catches that, which
+    * is the gap `docs/KNOWN-LIMITS.md` tracked for
+    * `secret-leakage-otel-attribute` / `-debug-bundle-otel`.
+    *
+    * The match is a plain substring test (case-sensitive) so an attribute
+    * value that merely *embeds* the canary (e.g. inside a JSON blob or an
+    * error message) is still caught. Empty/whitespace canaries return no
+    * hits — a guard against vacuous "everything leaks" assertions.
+    *
+    * @see SECURITY/invariants.yaml secret-leakage-otel-attribute
+    * @see SECURITY/threat-model-secret-leakage.md
+    */
+   findCanaryLeakage(canary: string): readonly CanaryLeak[] {
+     const hits: CanaryLeak[] = [];
+     if (canary.trim() === '') return hits;
+     const contains = (v: unknown): string | null => {
+       const s = typeof v === 'string' ? v : JSON.stringify(v);
+       return s !== undefined && s.includes(canary) ? s : null;
+     };
+     for (const sp of this._spans) {
+       if (sp.name.includes(canary)) {
+         hits.push({ surface: 'span.name', emitterName: sp.name, key: undefined, value: sp.name });
+       }
+       for (const [key, val] of sp.attributes) {
+         const m = contains(val) ?? (key.includes(canary) ? key : null);
+         if (m !== null) hits.push({ surface: 'span.attribute', emitterName: sp.name, key, value: m });
+       }
+       for (const [key, val] of sp.resourceAttributes) {
+         const m = contains(val) ?? (key.includes(canary) ? key : null);
+         if (m !== null) {
+           hits.push({ surface: 'span.resourceAttribute', emitterName: sp.name, key, value: m });
+         }
+       }
+     }
+     for (const metric of this._metrics) {
+       for (const [key, val] of metric.dataPoint.attributes) {
+         const m = contains(val) ?? (key.includes(canary) ? key : null);
+         if (m !== null) {
+           hits.push({ surface: 'metric.attribute', emitterName: metric.name, key, value: m });
+         }
+       }
+     }
+     return hits;
+   }
+
    metrics(): readonly CapturedMetric[] {
      return this._metrics;
    }
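The doc comment on `findCanaryLeakage()` notes that the match is a plain, case-sensitive substring test, so a canary embedded in a JSON blob is still caught. The real method needs a running collector, so the following is a standalone sketch of just that matching rule over plain objects. `SpanLike`, `Hit`, and `scanSpans` are hypothetical names introduced for illustration; the sketch covers span names and span attributes only, and is not the library's implementation.

```typescript
// Hypothetical, simplified stand-in for a captured span (an assumption).
interface SpanLike {
  name: string;
  attributes: Record<string, unknown>;
}

interface Hit {
  surface: 'span.name' | 'span.attribute';
  emitterName: string;
  key: string | undefined;
  value: string;
}

function scanSpans(spans: readonly SpanLike[], canary: string): Hit[] {
  const hits: Hit[] = [];
  // Guard: an empty or whitespace-only canary would match everything.
  if (canary.trim() === '') return hits;

  for (const sp of spans) {
    if (sp.name.includes(canary)) {
      hits.push({ surface: 'span.name', emitterName: sp.name, key: undefined, value: sp.name });
    }
    for (const [key, val] of Object.entries(sp.attributes)) {
      // Non-string values are stringified so an embedded canary is still visible.
      const s = typeof val === 'string' ? val : JSON.stringify(val);
      if (s !== undefined && s.includes(canary)) {
        hits.push({ surface: 'span.attribute', emitterName: sp.name, key, value: s });
      } else if (key.includes(canary)) {
        // The attribute key itself can also carry the canary.
        hits.push({ surface: 'span.attribute', emitterName: sp.name, key, value: key });
      }
    }
  }
  return hits;
}

// Usage: the canary sits inside a nested object value, not a bare string.
const demoHits = scanSpans(
  [{ name: 'openwop.node.execute', attributes: { 'error.detail': { headers: { auth: 'Bearer sk-demo-canary' } } } }],
  'sk-demo-canary',
);
```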
@@ -0,0 +1,121 @@
+ /**
+  * Memory-capability degraded projection (RFC 0080 §C) — behavioral.
+  *
+  * Gated on `capabilities.agents.manifestRuntime` + `capabilities.memory`
+  * (root-first per RFC 0073). Soft-skips when either is unadvertised (default) /
+  * hard-fails under `OPENWOP_REQUIRE_BEHAVIOR=true`. The always-on wire-shape
+  * coverage lives in `memory-capability-model-shape.test.ts` (the schema fields +
+  * the closed dimension enum); this asserts host BEHAVIOR on the NORMATIVE
+  * `GET /v1/agents` inventory:
+  *
+  * §C iff-contract — for EVERY inventory entry, when the host cannot satisfy an
+  * agent's requested `memoryShape` it MUST stamp `memoryDegraded: true` together
+  * with a NON-EMPTY `degradedMemoryDimensions[]` whose members are the RFC 0080
+  * §A dimension names (the CLOSED enum, NOT the `memoryShape` keys) and are
+  * unique; a non-degraded entry MUST carry `memoryDegraded` absent or `false`
+  * and MUST NOT carry a non-empty `degradedMemoryDimensions`.
+  *
+  * Non-vacuity — the inventory MUST be non-empty (the cap is advertised + the
+  * endpoint serves). When `OPENWOP_DEGRADED_AGENT_ID` names an agent the host
+  * knows is degraded (an agent whose `memoryShape` exceeds host capability —
+  * e.g. one requesting `longTerm` on a host without long-term durability), the
+  * degraded branch is asserted NON-VACUOUSLY against that agent.
+  *
+  * Black-box on the normative path — no POST seam.
+  *
+  * Spec references:
+  * - https://github.com/openwop/openwop/blob/main/spec/v1/agent-memory.md (§"Memory capability model")
+  * - https://github.com/openwop/openwop/blob/main/RFCS/0080-agent-memory-capability-reconciliation.md
+  */
+
+ import { describe, it, expect } from 'vitest';
+ import { driver } from '../lib/driver.js';
+ import { behaviorGate } from '../lib/behavior-gate.js';
+ import { readCapabilityFamily } from '../lib/discovery-capabilities.js';
+ import { readManifestRuntimeCap, listManifestAgents } from '../lib/agentRuntime.js';
+
+ /** The CLOSED RFC 0080 §A dimension vocabulary (agent-inventory-response.schema.json
+  * `degradedMemoryDimensions` enum). NOT the `memoryShape` keys. */
+ const DIMENSIONS = [
+   'read',
+   'write',
+   'search',
+   'long-term-durability',
+   'compaction',
+   'attribution',
+   'replay-snapshot',
+   'retention',
+ ];
+
+ interface InventoryEntry {
+   agentId?: string;
+   memoryDegraded?: unknown;
+   degradedMemoryDimensions?: unknown;
+   [k: string]: unknown;
+ }
+
+ describe('memory-degraded-projection (RFC 0080 §C)', () => {
+   it('stamps memoryDegraded + a closed-enum degradedMemoryDimensions on degraded agents and nothing on the rest', async () => {
+     const mr = await readManifestRuntimeCap();
+     const memory = await readCapabilityFamily<Record<string, unknown>>('memory');
+     const advertised = mr?.supported === true && !!memory && memory.supported === true;
+     if (!behaviorGate('openwop-memory-degraded', advertised)) return;
+
+     const inv = await listManifestAgents();
+     if (inv === null) return; // host advertises the cap but doesn't serve /v1/agents — soft-skip
+     const agents = (inv.agents ?? []) as InventoryEntry[];
+
+     // Non-vacuity: an advertising + serving host MUST expose its inventory.
+     expect(
+       agents.length >= 1,
+       driver.describe('agent-memory.md §"Memory capability model"', 'GET /v1/agents MUST return the installed manifest agents'),
+     ).toBe(true);
+
+     // §C iff-contract on EVERY entry.
+     for (const a of agents) {
+       const degraded = a.memoryDegraded === true;
+       const dims = a.degradedMemoryDimensions;
+
+       if (degraded) {
+         expect(
+           Array.isArray(dims) && dims.length >= 1,
+           driver.describe('RFC 0080 §C', `memoryDegraded:true MUST carry a non-empty degradedMemoryDimensions (agent ${a.agentId})`),
+         ).toBe(true);
+         if (Array.isArray(dims)) {
+           for (const d of dims) {
+             expect(
+               typeof d === 'string' && DIMENSIONS.includes(d),
+               driver.describe('agent-inventory-response.schema.json', `degradedMemoryDimensions members MUST be RFC 0080 §A dimension names (got ${String(d)})`),
+             ).toBe(true);
+           }
+           expect(
+             new Set(dims as string[]).size === dims.length,
+             driver.describe('RFC 0080 §C', 'degradedMemoryDimensions MUST be unique'),
+           ).toBe(true);
+         }
+       } else {
+         // Not degraded ⇒ no non-empty dimension list (absent or empty both pass).
+         expect(
+           dims === undefined || (Array.isArray(dims) && dims.length === 0),
+           driver.describe('RFC 0080 §C', `a non-degraded entry MUST NOT carry a non-empty degradedMemoryDimensions (agent ${a.agentId})`),
+         ).toBe(true);
+       }
+     }
+
+     // Non-vacuous degraded branch when the host names a known-degraded agent.
+     const degradedId = process.env.OPENWOP_DEGRADED_AGENT_ID;
+     if (degradedId) {
+       const target = agents.find((a) => a.agentId === degradedId);
+       expect(
+         target !== undefined,
+         driver.describe('RFC 0080 §C', `OPENWOP_DEGRADED_AGENT_ID=${degradedId} MUST appear in the inventory`),
+       ).toBe(true);
+       if (target) {
+         expect(
+           target.memoryDegraded === true && Array.isArray(target.degradedMemoryDimensions) && target.degradedMemoryDimensions.length >= 1,
+           driver.describe('RFC 0080 §C', 'the named degraded agent MUST project memoryDegraded:true + a non-empty degradedMemoryDimensions'),
+         ).toBe(true);
+       }
+     }
+   });
+ });
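The header comment of the scenario above says the gate soft-skips by default when the capability is unadvertised and hard-fails under `OPENWOP_REQUIRE_BEHAVIOR=true`. The diff does not show how `behaviorGate` is implemented, so the following is only a guess at the shape of that decision. `sketchBehaviorGate` and its `env` parameter are hypothetical, and the real helper in `../lib/behavior-gate.js` may differ in signature, messages, and skip reporting.

```typescript
// Hypothetical sketch of a soft-skip / hard-fail gate. This is an assumption
// for illustration, NOT the suite's behaviorGate implementation.
function sketchBehaviorGate(
  gateName: string,
  advertised: boolean,
  env: Record<string, string | undefined>,
): boolean {
  // Capability advertised: the scenario body should run.
  if (advertised) return true;

  // Strict mode: an unadvertised capability is treated as a failure.
  if (env.OPENWOP_REQUIRE_BEHAVIOR === 'true') {
    throw new Error(`${gateName}: capability not advertised but behavior is required`);
  }

  // Default mode: tell the caller to return early (soft-skip).
  return false;
}
```

A caller would use it the way the scenario uses the real gate: `if (!sketchBehaviorGate('openwop-memory-degraded', advertised, process.env)) return;`. Passing the environment in as a parameter is a choice made here so the sketch can be exercised without touching global state.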
@@ -0,0 +1,211 @@
+ /**
+  * otel-collector-canary-inspection — always-on proof that the conformance
+  * OTel collector inspects real OTLP span attributes for secret leakage.
+  *
+  * Context: `secret-leakage-otel-attribute.test.ts` proves a host doesn't
+  * leak a BYOK canary on its `GET /v1/host/sample/test/otel/spans` scrape
+  * seam. But the scrape seam reports what the host *says* it emitted; a
+  * host could redact there yet still ship the plaintext over the wire via
+  * its real OTLP exporter. `docs/KNOWN-LIMITS.md` tracked exactly this gap:
+  *
+  * "The conformance OTel collector seam doesn't yet inspect span
+  * attributes; a host could pass conformance while leaking BYOK
+  * material on telemetry exports."
+  *
+  * `OtelCollector.findCanaryLeakage()` (added with this scenario) closes
+  * the harness half: it scans every captured span's name, attributes, and
+  * resource attributes — plus metric data-point attributes — for the
+  * canary, returning the offending surface. This file is the always-on,
+  * server-free proof that the inspector is NON-VACUOUS: it stands up a
+  * real collector, POSTs synthetic OTLP/HTTP-JSON payloads through the
+  * collector's actual ingest path, and asserts the inspector (a) catches a
+  * canary embedded in a span attribute / resource attribute / span name /
+  * metric attribute, and (b) reports zero hits on a redacted payload.
+  *
+  * Pairing this with the live, capability-gated collector scrape in
+  * `secret-leakage-otel-attribute.test.ts` means the invariant is proven
+  * end-to-end the moment a host exports OTLP to the collector — the
+  * inspector itself is no longer the missing piece.
+  *
+  * Runs unconditionally (no host, no network): it exercises the collector
+  * library directly, so it executes in CI on every run.
+  *
+  * @see conformance/src/lib/otel-collector.ts findCanaryLeakage
+  * @see SECURITY/invariants.yaml secret-leakage-otel-attribute
+  * @see docs/KNOWN-LIMITS.md "Behavior tests too coarse to fully prove an invariant"
+  */
+
+ import { describe, it, expect, afterEach } from 'vitest';
+ import { OtelCollector } from '../lib/otel-collector.js';
+
+ const CANARY = 'sk-canary-DO-NOT-LEAK-0f3a9c';
+ const REDACTED = '[REDACTED:openwop-conformance-canary-secret]';
+
+ /** Build an OTLP/HTTP-JSON traces export carrying the given span+resource attrs. */
+ function tracesPayload(opts: {
+   spanName: string;
+   spanAttrs: Record<string, string>;
+   resourceAttrs: Record<string, string>;
+ }): unknown {
+   const toAttrs = (m: Record<string, string>) =>
+     Object.entries(m).map(([key, value]) => ({ key, value: { stringValue: value } }));
+   return {
+     resourceSpans: [
+       {
+         resource: { attributes: toAttrs(opts.resourceAttrs) },
+         scopeSpans: [
+           {
+             scope: { name: 'openwop' },
+             spans: [
+               {
+                 traceId: 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa',
+                 spanId: 'bbbbbbbbbbbbbbbb',
+                 name: opts.spanName,
+                 startTimeUnixNano: '1',
+                 endTimeUnixNano: '2',
+                 attributes: toAttrs(opts.spanAttrs),
+               },
+             ],
+           },
+         ],
+       },
+     ],
+   };
+ }
+
+ /** Build an OTLP/HTTP-JSON metrics export with one sum data point carrying attrs. */
+ function metricsPayload(metricName: string, attrs: Record<string, string>): unknown {
+   return {
+     resourceMetrics: [
+       {
+         scopeMetrics: [
+           {
+             scope: { name: 'openwop' },
+             metrics: [
+               {
+                 name: metricName,
+                 sum: {
+                   dataPoints: [
+                     {
+                       asInt: '1',
+                       attributes: Object.entries(attrs).map(([key, value]) => ({
+                         key,
+                         value: { stringValue: value },
+                       })),
+                     },
+                   ],
+                 },
+               },
+             ],
+           },
+         ],
+       },
+     ],
+   };
+ }
+
+ describe('otel-collector-canary-inspection: collector inspects real OTLP exports', () => {
+   let collector: OtelCollector | null = null;
+
+   afterEach(async () => {
+     if (collector) {
+       await collector.stop();
+       collector = null;
+     }
+   });
+
+   async function postTraces(payload: unknown): Promise<void> {
+     const res = await fetch(`${collector!.endpoint()}/v1/traces`, {
+       method: 'POST',
+       headers: { 'content-type': 'application/json' },
+       body: JSON.stringify(payload),
+     });
+     expect(res.status).toBeLessThan(300);
+   }
+
+   async function postMetrics(payload: unknown): Promise<void> {
+     const res = await fetch(`${collector!.endpoint()}/v1/metrics`, {
+       method: 'POST',
+       headers: { 'content-type': 'application/json' },
+       body: JSON.stringify(payload),
+     });
+     expect(res.status).toBeLessThan(300);
+   }
+
+   it('catches a canary embedded in a span attribute value', async () => {
+     collector = new OtelCollector();
+     await collector.start();
+     await postTraces(
+       tracesPayload({
+         spanName: 'openwop.node.execute',
+         spanAttrs: { 'openwop.node.id': 'n1', 'http.request.header.authorization': `Bearer ${CANARY}` },
+         resourceAttrs: { 'service.name': 'host' },
+       }),
+     );
+
+     const leaks = collector.findCanaryLeakage(CANARY);
+     expect(leaks.length).toBeGreaterThan(0);
+     const attrLeak = leaks.find((l) => l.surface === 'span.attribute');
+     expect(attrLeak).toBeDefined();
+     expect(attrLeak!.key).toBe('http.request.header.authorization');
+     expect(attrLeak!.value).toContain(CANARY);
+   });
+
+   it('catches a canary in a resource attribute and in a span name', async () => {
+     collector = new OtelCollector();
+     await collector.start();
+     await postTraces(
+       tracesPayload({
+         spanName: `openwop.run ${CANARY}`,
+         spanAttrs: { 'openwop.run.id': 'r1' },
+         resourceAttrs: { 'service.name': 'host', 'deployment.token': CANARY },
+       }),
+     );
+
+     const leaks = collector.findCanaryLeakage(CANARY);
+     const surfaces = new Set(leaks.map((l) => l.surface));
+     expect(surfaces.has('span.name')).toBe(true);
+     expect(surfaces.has('span.resourceAttribute')).toBe(true);
+   });
+
+   it('catches a canary in a metric data-point attribute', async () => {
+     collector = new OtelCollector();
+     await collector.start();
+     await postMetrics(metricsPayload('openwop.node.duration', { 'secret.echo': CANARY }));
+
+     const leaks = collector.findCanaryLeakage(CANARY);
+     const metricLeak = leaks.find((l) => l.surface === 'metric.attribute');
+     expect(metricLeak).toBeDefined();
+     expect(metricLeak!.emitterName).toBe('openwop.node.duration');
+   });
+
+   it('reports ZERO hits when the host redacts the canary before export (positive control)', async () => {
+     collector = new OtelCollector();
+     await collector.start();
+     await postTraces(
+       tracesPayload({
+         spanName: 'openwop.node.execute',
+         spanAttrs: { 'openwop.node.id': 'n1', 'http.request.header.authorization': `Bearer ${REDACTED}` },
+         resourceAttrs: { 'service.name': 'host', 'deployment.token': REDACTED },
+       }),
+     );
+     await postMetrics(metricsPayload('openwop.node.duration', { 'secret.echo': REDACTED }));
+
+     expect(collector.findCanaryLeakage(CANARY)).toEqual([]);
+   });
+
+   it('an empty or whitespace canary never produces a (vacuous) hit', async () => {
+     collector = new OtelCollector();
+     await collector.start();
+     await postTraces(
+       tracesPayload({
+         spanName: 'openwop.node.execute',
+         spanAttrs: { 'a': 'b' },
+         resourceAttrs: { 'service.name': 'host' },
+       }),
+     );
+
+     expect(collector.findCanaryLeakage('')).toEqual([]);
+     expect(collector.findCanaryLeakage(' ')).toEqual([]);
+   });
+ });
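The `toAttrs` helper in the test above encodes a plain record as an OTLP/HTTP-JSON attribute list (`[{ key, value: { stringValue } }]`). For readers less familiar with that wire shape, here is a sketch of the inverse direction: decoding such a list back into primitives and dropping values whose shape is not recognized. `decodeValue` and `attributesToMap` are hypothetical names, only the scalar `stringValue`/`boolValue`/`doubleValue`/`intValue` variants are handled, and the collector's own decoder may behave differently.

```typescript
type Primitive = string | number | boolean;

interface OtlpKeyValue {
  key: string;
  value?: Record<string, unknown>;
}

// Hypothetical decoder (an assumption, not the collector's code): returns a
// primitive for the scalar AnyValue variants and null for anything else.
function decodeValue(value: Record<string, unknown> | undefined): Primitive | null {
  if (value === undefined) return null;
  if (typeof value.stringValue === 'string') return value.stringValue;
  if (typeof value.boolValue === 'boolean') return value.boolValue;
  if (typeof value.doubleValue === 'number') return value.doubleValue;
  // OTLP JSON commonly carries 64-bit integers as decimal strings.
  if (typeof value.intValue === 'string' || typeof value.intValue === 'number') {
    const n = Number(value.intValue);
    return Number.isFinite(n) ? n : null;
  }
  return null;
}

function attributesToMap(attrs: readonly OtlpKeyValue[]): Map<string, Primitive> {
  const out = new Map<string, Primitive>();
  for (const { key, value } of attrs) {
    const decoded = decodeValue(value);
    // Unrecognized shapes are skipped rather than stored as null.
    if (decoded !== null) out.set(key, decoded);
  }
  return out;
}

// Usage: one string, one stringified integer, one unrecognized shape.
const decodedAttrs = attributesToMap([
  { key: 'service.name', value: { stringValue: 'host' } },
  { key: 'retry.count', value: { intValue: '42' } },
  { key: 'odd', value: { somethingElse: 1 } },
]);
```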
@@ -8,87 +8,204 @@
8
8
  * Asserts (behavioral, when a host advertises `version: 4` + the contract):
9
9
  *
10
10
  * 1. A `mode: replay` fork from event-log index `fromSeq` produces an
11
- * event-log prefix `[0, fromSeq]` that is byte-equivalent to the
12
- * original run's prefix (modulo per-region clock fields per RFC 0036
13
- * §E and ULID component-T entropy when ULIDs are minted fresh).
11
+ * observable event-log prefix `[0, fromSeq]` that is byte-equivalent
12
+ * to the original run's prefix (modulo volatile per-event fields:
13
+ * eventId/ULID entropy, per-region `observedAt` clocks per RFC 0036
14
+ * §E, and the run id itself).
14
15
  *
15
- * 2. The replay's `RunSnapshot.variables`, `RunSnapshot.channels`, and
16
- * `RunSnapshot.status` at the boundary index are byte-equivalent to
17
- * the original.
16
+ * 2. (Crucially per §C.) The replay reproduces the OBSERVABLE RESULT of
17
+ * a nondeterministic tool node EVEN WHEN a fresh call would produce
18
+ * different bytes. The `conformance-phase4-nondet-tool` fixture's
19
+ * first node declares `config.nondeterministic: true`; a `version: 4`
20
+ * host MUST replay the original event-log entries for that node
21
+ * (cache the observable result) rather than re-executing it, so the
22
+ * node's terminal payload is identical across original + replay.
18
23
  *
19
- * 3. (Crucially per §C.) The replay reproduces observable output EVEN
20
- * WHEN the underlying tool call would have produced different bytes.
21
- * The reference test uses a mock tool that returns a fresh random
22
- * string on each call; the host MUST cache the original observable
23
- * result so replay returns the SAME string the original got, not
24
- * the bytes a fresh call would return now.
24
+ * The `conformance-phase4-nondet-tool` fixture ships in the suite (added
25
+ * via the RFC 0041 Phase 4 fixtures commit). These assertions are now
26
+ * runnable capability-gated `it()` bodies consistent with the sibling
27
+ * `replay-divergence-at-refusal.test.ts`, which is likewise active and
28
+ * soft-skips on the same gate. They light up the moment a host advertises
29
+ * the `version: 4` replay-determinism contract; against hosts that don't
30
+ * (incl. the reference workflow-engine, which has not yet wired the
31
+ * pure-replay observable-cache path), they soft-skip honestly.
25
32
  *
26
- * Driving the assertion requires a workflow fixture whose tool call is
27
- * pure-nondeterministic (different bytes on each call) but whose
28
- * observable result is what gets cached. Reference workflow-engine ships
29
- * `core.noop` + deterministic fixtures; the `version: 4` wiring needs a
30
- * nondeterministic-tool fixture (e.g., `conformance-phase4-nondet-tool`).
31
- * Until that lands, the cross-boundary assertion is surfaced as `it.todo`
32
- * so test reporters track the gap.
33
+ * RFC 0042 §B note: RFC 0041 §C is `Active` (not yet `Accepted`), so its
34
+ * wire shape MAY shift compatibly within v1.x — a host wiring this before
35
+ * RFC 0041 graduates SHOULD advertise `multiAgent.executionModel.tier:
36
+ * 'experimental'` + `experimentalUntil` per RFC 0042 §A.
33
37
  *
34
38
  * @see RFCS/0041-multi-agent-replay-under-nondeterminism.md §C
35
39
  * @see spec/v1/replay.md §"Observable-output-sequence determinism vs bit-equivalent execution (MAE-9 closure)"
36
40
  * @see spec/v1/multi-agent-execution.md §"Replay determinism under nondeterminism (RFC 0041)"
37
41
  */
38
42
 
39
- import { describe, it } from 'vitest';
40
-
41
- // Behavioral assertions in this file are currently `it.todo` placeholders;
42
- // the `conformance-phase4-nondet-tool` fixture hasn't shipped yet. When
43
- // it does, the `it.todo` calls flip back to runnable `it(...)` bodies
44
- // that read discovery (via `driver.get('/.well-known/openwop')`), gate
45
- // on `multiAgent.executionModel.version >= 4` AND
46
- // `replayDeterminism.supported: true`, and drive the workflow through
47
- // the fixture.
48
-
49
- describe('replay-observable-sequence-determinism: prefix byte-equivalence (RFC 0041 §C)', () => {
50
- // Behavioral assertion drives a workflow with at least one node whose
51
- // underlying tool call is nondeterministic (different bytes on each
52
- // call). The assertion sequence:
53
- // 1. POST /v1/runs { workflowId: 'conformance-phase4-nondet-tool' }
54
- // → runs to completion, capturing the original event log.
55
- // 2. Capture original event-log prefix [0, N] where N is the index
56
- // after the nondeterministic-tool node fires.
57
- // 3. POST /v1/runs/{runId}:fork { mode: 'replay', fromSeq: N }
58
- // 4. Read replay event-log prefix [0, N].
59
- // 5. Assert byte-equivalence modulo the carve-outs:
60
- // - per-region observedAt timestamps (RFC 0036 §E)
61
- // - ULID component-T entropy on newly-minted eventIds
62
- // 6. Read original + replay RunSnapshot at index N; assert
63
- // variables + channels + status byte-equivalent.
64
- // Surfaced as `todo` until the `conformance-phase4-nondet-tool`
65
- // fixture ships in the suite — consistent with the sibling RFC 0041
66
- // scenarios (`replay-divergence-at-refusal.test.ts`,
67
- // `replay-llm-cache-key-portable.test.ts`).
68
- // Marked out of stable profile via RFC 0042 §B (experimental tier):
69
- // RFC 0041 §C remains Active, so its wire shape MAY shift compatibly
70
- // within v1.x. Hosts that wire this assertion before RFC 0041 graduates
71
- // to Accepted SHOULD advertise `multiAgent.executionModel.tier:
72
- // 'experimental'` + `experimentalUntil` per RFC 0042 §A. Path-to-runnable
73
- // requires: (a) host pure-replay observable-cache emission via the
74
- // `:fork mode: replay` re-dispatch path and (b) the test seam endpoint
75
- // contract for cache-hit-vs-fresh-call distinction (see
76
- // `spec/v1/host-sample-test-seams.md` for the established seam pattern).
77
- it.skip('original and replay event-log prefixes [0, fromSeq] MUST be byte-equivalent (modulo per-region clock + ULID-T entropy) — out of stable profile via RFC 0042');
78
- });
79
-
80
- describe('replay-observable-sequence-determinism: observable-result caching (RFC 0041 §C)', () => {
81
- // The load-bearing assertion: a nondeterministic tool call's OBSERVABLE
82
- // RESULT (return value + side-effects on workflow state + emitted events)
83
- // is what gets cached, not the bytes-on-the-wire of the underlying call.
84
- // The replay's reproduction of the observable sequence is what makes
85
- // this a valid determinism contract — bit-equivalent execution would
86
- // require unbounded caching (rejected per RFC 0041 §"Alternatives
87
- // considered" #2).
88
- // Marked out of stable profile via RFC 0042 §B (experimental tier):
89
- // see the prefix-byte-equivalence comment above for the same routing.
90
- // This is RFC 0041 §C's load-bearing assertion; it lands as a runnable
91
- // `it()` when RFC 0041 graduates to Accepted on first non-steward host
92
- // adoption.
93
- it.skip('replay of a workflow containing a nondeterministic tool call reproduces the original observable result, NOT a fresh call — out of stable profile via RFC 0042');
94
- });
43
+ import { describe, it, expect } from 'vitest';
44
+ import { driver } from '../lib/driver.js';
45
+ import { capabilityFamily } from '../lib/discovery-capabilities.js';
46
+
47
+ const HTTP_SKIP = !process.env.OPENWOP_BASE_URL;
48
+ const FIXTURE = 'conformance-phase4-nondet-tool';
49
+ const NONDET_NODE_ID = 'nondet-tool';
50
+
51
+ interface ExecutionModelCaps {
52
+ version?: unknown;
53
+ replayDeterminism?: { supported?: unknown };
54
+ }
55
+ interface DiscoveryDoc {
56
+ capabilities?: {
57
+ multiAgent?: { executionModel?: ExecutionModelCaps };
58
+ };
59
+ }
60
+
61
+ interface RunSnapshot {
62
+ status?: string;
63
+ }
64
+ interface RunEventDoc {
65
+ type: string;
66
+ nodeId?: string;
67
+ sequence?: number;
68
+ payload?: Record<string, unknown>;
69
+ }
70
+
71
+ async function readDiscovery(): Promise<DiscoveryDoc | null> {
72
+ try {
73
+ const res = await driver.get('/.well-known/openwop');
74
+ if (res.status !== 200) return null;
75
+ return res.json as DiscoveryDoc;
76
+ } catch {
77
+ return null;
78
+ }
79
+ }
80
+
81
+ /** Soft-skip unless the host advertises the RFC 0041 §C version-4 contract. */
82
+ async function gateOnPhase4(ctx: { skip: () => void }): Promise<boolean> {
83
+ const d = await readDiscovery();
84
+ const em = capabilityFamily<{ executionModel?: ExecutionModelCaps }>(d, 'multiAgent')?.executionModel;
85
+ const version = typeof em?.version === 'number' ? em.version : 0;
86
+ if (em?.replayDeterminism?.supported !== true || version < 4) {
87
+ ctx.skip();
88
+ return false;
89
+ }
90
+ return true;
91
+ }
92
+
93
+ async function pollUntilTerminal(runId: string): Promise<RunSnapshot> {
94
+ for (let i = 0; i < 50; i++) {
95
+ const r = await driver.get(`/v1/runs/${encodeURIComponent(runId)}`);
96
+ const snap = r.json as RunSnapshot;
97
+ if (snap.status === 'completed' || snap.status === 'failed' || snap.status === 'cancelled') {
98
+ return snap;
99
+ }
100
+ await new Promise((resolve) => setTimeout(resolve, 100));
101
+ }
102
+ throw new Error(`run ${runId} did not reach terminal within 5s`);
103
+ }
104
+
105
+ async function readEvents(runId: string): Promise<RunEventDoc[]> {
106
+ const r = await driver.get(`/v1/runs/${encodeURIComponent(runId)}/events`);
107
+ const body = r.json as { events?: RunEventDoc[] };
108
+ return body.events ?? [];
109
+ }
110
+
111
+ /**
112
+ * Strip volatile per-event fields so two runs of the same workflow are
113
+ * comparable. Removes the run id, freshly-minted event ids/ULIDs, and the
114
+ * per-region observed-at clock (RFC 0036 §E carve-out) wherever they
115
+ * appear at the event top level.
116
+ */
117
+ function stripVolatile(ev: RunEventDoc): Record<string, unknown> {
118
+ const clone = JSON.parse(JSON.stringify(ev)) as Record<string, unknown>;
119
+ for (const k of ['eventId', 'runId', 'observedAt', 'timestamp', 'occurredAt', 'emittedAt', 'id']) {
120
+ delete clone[k];
121
+ }
122
+ return clone;
123
+ }
124
+
125
+ /** Create the fixture run; returns null (with a skip) if it isn't advertised. */
126
+ async function startFixtureRun(ctx: { skip: () => void }): Promise<string | null> {
127
+ const create = await driver.post('/v1/runs', { workflowId: FIXTURE });
128
+ if (create.status === 404 || create.status === 422) {
129
+ ctx.skip(); // fixture not advertised by this host
130
+ return null;
131
+ }
132
+ expect(create.status).toBe(201);
133
+ return (create.json as { runId: string }).runId;
134
+ }
135
+
136
+ describe.skipIf(HTTP_SKIP)(
137
+ 'replay-observable-sequence-determinism: prefix byte-equivalence (RFC 0041 §C)',
138
+ () => {
139
+ it('original and replay event-log prefixes MUST be byte-equivalent (modulo per-event clock + ULID entropy)', async (ctx) => {
140
+ if (!(await gateOnPhase4(ctx))) return;
141
+
142
+ const sourceRunId = await startFixtureRun(ctx);
143
+ if (sourceRunId === null) return;
144
+ const sourceTerminal = await pollUntilTerminal(sourceRunId);
145
+ expect(sourceTerminal.status).toBe('completed');
146
+ const sourceEvents = await readEvents(sourceRunId);
147
+
148
+ const forkRes = await driver.post(`/v1/runs/${encodeURIComponent(sourceRunId)}:fork`, {
149
+ fromSeq: 0,
150
+ mode: 'replay',
151
+ });
152
+ expect(forkRes.status).toBe(201);
153
+ const replayRunId = (forkRes.json as { runId: string }).runId;
154
+ await pollUntilTerminal(replayRunId);
155
+ const replayEvents = await readEvents(replayRunId);
156
+
157
+ const sourceNorm = sourceEvents.map(stripVolatile);
158
+ const replayNorm = replayEvents.map(stripVolatile);
159
+ expect(
160
+ replayNorm,
161
+ driver.describe(
162
+ 'RFCS/0041-multi-agent-replay-under-nondeterminism.md §C',
163
+ 'a mode:replay fork MUST reproduce the original observable event-log sequence byte-for-byte modulo volatile per-event fields (eventId/ULID entropy, per-region observedAt clock)',
164
+ ),
165
+ ).toEqual(sourceNorm);
166
+ });
167
+ },
168
+ );
169
+
170
+ describe.skipIf(HTTP_SKIP)(
171
+ 'replay-observable-sequence-determinism: observable-result caching (RFC 0041 §C)',
172
+ () => {
173
+ it('replay of a nondeterministic tool node reproduces the ORIGINAL observable result, NOT a fresh call', async (ctx) => {
174
+ if (!(await gateOnPhase4(ctx))) return;
175
+
176
+ const sourceRunId = await startFixtureRun(ctx);
177
+ if (sourceRunId === null) return;
178
+ expect((await pollUntilTerminal(sourceRunId)).status).toBe('completed');
179
+ const sourceEvents = await readEvents(sourceRunId);
180
+
181
+ // The terminal event(s) for the nondeterministic node carry its
182
+ // observable result. Capture every event scoped to that node.
183
+ const sourceNodeEvents = sourceEvents.filter((e) => e.nodeId === NONDET_NODE_ID).map(stripVolatile);
184
+ expect(
185
+ sourceNodeEvents.length,
186
+ driver.describe(
187
+ 'RFCS/0041-multi-agent-replay-under-nondeterminism.md §C',
188
+ `the fixture's nondeterministic node \`${NONDET_NODE_ID}\` MUST emit at least one observable event`,
189
+ ),
190
+ ).toBeGreaterThan(0);
191
+
192
+ const forkRes = await driver.post(`/v1/runs/${encodeURIComponent(sourceRunId)}:fork`, {
193
+ fromSeq: 0,
194
+ mode: 'replay',
195
+ });
196
+ expect(forkRes.status).toBe(201);
197
+ const replayRunId = (forkRes.json as { runId: string }).runId;
198
+ await pollUntilTerminal(replayRunId);
199
+ const replayEvents = await readEvents(replayRunId);
200
+ const replayNodeEvents = replayEvents.filter((e) => e.nodeId === NONDET_NODE_ID).map(stripVolatile);
201
+
202
+ expect(
203
+ replayNodeEvents,
204
+ driver.describe(
205
+ 'RFCS/0041-multi-agent-replay-under-nondeterminism.md §C',
206
+ 'the nondeterministic tool node MUST replay its ORIGINAL observable result (cached event-log entry) rather than re-executing — bit-equivalent re-execution would require unbounded caching, rejected per RFC 0041 §"Alternatives considered" #2',
207
+ ),
208
+ ).toEqual(sourceNodeEvents);
209
+ });
210
+ },
211
+ );
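Both scenarios above rest on normalising events before the `toEqual` comparison. The sketch below restates that step in standalone form and applies it to hand-written events. The sample events, the `node.completed` type, and all field values are invented for illustration; a real host's event shape may differ.

```typescript
// Standalone restatement of the normalisation step; sample data is invented.
const VOLATILE_KEYS = ['eventId', 'runId', 'observedAt', 'timestamp', 'occurredAt', 'emittedAt', 'id'];

function stripVolatileSketch(ev: Record<string, unknown>): Record<string, unknown> {
  // Deep-clone via JSON so the caller's event object is never mutated.
  const clone = JSON.parse(JSON.stringify(ev)) as Record<string, unknown>;
  for (const k of VOLATILE_KEYS) delete clone[k];
  return clone;
}

const originalEvent = {
  eventId: '01A', runId: 'run-1', observedAt: '2026-01-01T00:00:00Z',
  type: 'node.completed', nodeId: 'nondet-tool', sequence: 3,
  payload: { output: 'abc' },
};
// A faithful replay: new ids and clock, same observable payload.
const replayedEvent = {
  eventId: '01B', runId: 'run-2', observedAt: '2026-01-01T00:05:00Z',
  type: 'node.completed', nodeId: 'nondet-tool', sequence: 3,
  payload: { output: 'abc' },
};
// A re-execution: the nondeterministic tool produced different bytes.
const reExecutedEvent = {
  eventId: '01C', runId: 'run-3', observedAt: '2026-01-01T00:06:00Z',
  type: 'node.completed', nodeId: 'nondet-tool', sequence: 3,
  payload: { output: 'xyz' },
};

// Key order is identical in all three literals, so comparing serialisations is safe here.
const norm = (ev: Record<string, unknown>): string => JSON.stringify(stripVolatileSketch(ev));
const replayMatches = norm(originalEvent) === norm(replayedEvent);
const reExecutionMatches = norm(originalEvent) === norm(reExecutedEvent);
```

Only top-level keys are removed, so a volatile value nested inside `payload` would still cause a mismatch under this normalisation.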
@@ -56,6 +56,7 @@ import { driver } from '../lib/driver.js';
56
56
  import { pollUntilTerminal } from '../lib/polling.js';
57
57
  import { isFixtureAdvertised } from '../lib/fixtures.js';
58
58
  import { capabilityFamily } from '../lib/discovery-capabilities.js';
59
+ import { getCollector, waitForRunSpans } from '../lib/otel-collector.js';
59
60
 
60
61
  const HTTP_SKIP = !process.env.OPENWOP_BASE_URL;
61
62
  const BYOK_WORKFLOW_ID = 'openwop-smoke-byok-roundtrip';
@@ -205,6 +206,57 @@ describe.skipIf(HTTP_SKIP || FIXTURE_SKIP)(
205
206
  },
206
207
  );
207
208
 
209
+ describe.skipIf(HTTP_SKIP || FIXTURE_SKIP)(
210
+ 'secret-leakage-otel-attribute: real OTLP export scrape (collector-side)',
211
+ () => {
212
+ // Distinct from the scrape-seam probe above: this asserts against what
213
+ // the host's OTLP exporter ACTUALLY shipped over the wire to the
214
+ // conformance collector, not what the host self-reports via its
215
+ // `/v1/host/sample/test/otel/spans` seam. A host could redact in its
216
+ // seam yet leak on the real export — only this catches that. Closes
217
+ // the `docs/KNOWN-LIMITS.md` "collector seam doesn't inspect span
218
+ // attributes" gap. Gated on the in-process collector being active
219
+ // (`OPENWOP_OTEL_COLLECTOR=true` + the host configured to export to it).
220
+ it('real-exported OTel span/metric attributes MUST NOT contain the BYOK canary plaintext', async (ctx) => {
221
+ const collector = getCollector();
222
+ if (!collector || !CANARY_VALUE) {
223
+ ctx.skip();
224
+ return;
225
+ }
226
+ const d = await readDiscovery();
227
+ const secretsOk = capabilityFamily<{ supported?: unknown }>(d, 'secrets')?.supported === true;
228
+ const obsOk = capabilityFamily<unknown>(d, 'observability') !== undefined;
229
+ if (!secretsOk || !obsOk) {
230
+ ctx.skip();
231
+ return;
232
+ }
233
+
234
+ collector.reset();
235
+ const runId = await startByokRun();
236
+ if (runId === null) {
237
+ ctx.skip();
238
+ return;
239
+ }
240
+ const terminal = await pollUntilTerminal(runId);
241
+ expect(terminal.status).toBe('completed');
242
+
243
+ // Hosts export spans asynchronously after the run terminates; poll until the
244
+ // run's spans land (or the timeout elapses — an absent export is a
245
+ // separate coverage concern, not a leak).
246
+ await waitForRunSpans(runId, { timeoutMs: 8_000 });
247
+
248
+ const leaks = collector.findCanaryLeakage(CANARY_VALUE);
249
+ expect(
250
+ leaks,
251
+ driver.describe(
252
+ 'SECURITY/invariants.yaml secret-leakage-otel-attribute',
253
+ `no real-exported OTel span/metric attribute may contain the BYOK canary plaintext. Leaking surfaces: ${JSON.stringify(leaks)}`,
254
+ ),
255
+ ).toEqual([]);
256
+ });
257
+ },
258
+ );
259
+
208
260
  describe.skipIf(HTTP_SKIP || FIXTURE_SKIP)(
209
261
  'secret-leakage-otel-attribute: advertisement-shape probe (RFC 0034 §A)',
210
262
  () => {