npm - @openwop/openwop-conformance - Versions diffs - 1.5.0 → 1.6.0 - Mend

@openwop/openwop-conformance 1.5.0 → 1.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (61) hide show

package/CHANGELOG.md +19 -0
package/README.md +2 -2
package/api/asyncapi.yaml +8 -3
package/api/openapi.yaml +305 -0
package/coverage.md +29 -4
package/fixtures/conformance-phase4-nondet-tool.json +53 -0
package/fixtures/conformance-phase4-replay-divergence.json +40 -0
package/fixtures.md +5 -3
package/package.json +1 -1
package/schemas/README.md +2 -0
package/schemas/capabilities.schema.json +167 -3
package/schemas/credential-reference.schema.json +21 -0
package/schemas/node-pack-manifest.schema.json +112 -1
package/schemas/run-diff-response.schema.json +64 -0
package/schemas/run-event-payloads.schema.json +104 -2
package/schemas/run-event.schema.json +8 -1
package/schemas/run-snapshot.schema.json +11 -0
package/src/lib/behavior-gate.ts +51 -0
package/src/lib/driver.ts +13 -1
package/src/lib/saml-idp.ts +179 -0
package/src/scenarios/approval-gate-events.test.ts +61 -0
package/src/scenarios/approval-gate-flow.test.ts +68 -0
package/src/scenarios/auth-saml-profile.test.ts +119 -0
package/src/scenarios/auth-scim-profile.test.ts +65 -0
package/src/scenarios/authorization-fail-closed.test.ts +80 -0
package/src/scenarios/authorization-roles-shape.test.ts +83 -0
package/src/scenarios/connector-manifest-validity.test.ts +142 -0
package/src/scenarios/credential-payload-redaction.test.ts +93 -0
package/src/scenarios/credentials-capability-shape.test.ts +90 -0
package/src/scenarios/cross-engine-append-behavior.test.ts +204 -0
package/src/scenarios/cross-host-traceparent-propagation.test.ts +13 -6
package/src/scenarios/cross-workspace-isolation.test.ts +72 -0
package/src/scenarios/deadletter-capability-shape.test.ts +59 -0
package/src/scenarios/deadletter-retry-exhaustion.test.ts +62 -0
package/src/scenarios/experimental-tier-shape.test.ts +192 -0
package/src/scenarios/identity-owner-shape.test.ts +64 -0
package/src/scenarios/multi-agent-confidence-escalation.test.ts +13 -12
package/src/scenarios/multi-agent-memory-lifecycle.test.ts +87 -12
package/src/scenarios/multi-region-idempotency-behavior.test.ts +203 -0
package/src/scenarios/oauth-capability-shape.test.ts +97 -0
package/src/scenarios/oauth-connector-redaction.test.ts +91 -0
package/src/scenarios/pack-registry-isolation.test.ts +108 -0
package/src/scenarios/pack-registry-publish.test.ts +1 -1
package/src/scenarios/prompt-mutation-workspace-membership-enforced.test.ts +126 -0
package/src/scenarios/prompt-read-workspace-membership-enforced.test.ts +183 -0
package/src/scenarios/replay-divergence-at-refusal.test.ts +187 -7
package/src/scenarios/replay-observable-sequence-determinism.test.ts +20 -6
package/src/scenarios/run-diff.test.ts +143 -0
package/src/scenarios/sandbox-capability-gate-respected.test.ts +7 -1
package/src/scenarios/sandbox-memory-cap.test.ts +7 -5
package/src/scenarios/sandbox-mvp-behavior.test.ts +280 -0
package/src/scenarios/sandbox-no-cross-pack-mutation.test.ts +7 -1
package/src/scenarios/sandbox-no-host-env-leak.test.ts +5 -1
package/src/scenarios/sandbox-no-host-fs-escape.test.ts +9 -1
package/src/scenarios/sandbox-no-host-process-escape.test.ts +5 -1
package/src/scenarios/sandbox-no-network-escape.test.ts +5 -1
package/src/scenarios/sandbox-timeout-cap.test.ts +7 -5
package/src/scenarios/scheduling-capability-shape.test.ts +81 -0
package/src/scenarios/scheduling-cron-fires-once.test.ts +66 -0
package/src/scenarios/secret-leakage-otel-attribute.test.ts +241 -0
package/src/scenarios/spec-corpus-validity.test.ts +2 -2

package/CHANGELOG.md CHANGED Viewed

@@ -1,5 +1,24 @@
 # `@openwop/openwop-conformance` Changelog
+## [1.6.0] — 2026-05-25
+Minor bump per `PUBLISHING.md` §"Versioning alignment" — ships the conformance scenarios for the **MyndHyve protocol-extension cohort (RFCs 0045–0054)** so adopting hosts can pin the released suite, run it against their deployment, and report pass for `Draft → Active → Accepted` graduation (per `RFCS/0001-rfc-process.md` §"Promotion to Accepted"). All additive — every new scenario is capability-gated and soft-skips against a host that doesn't advertise the surface, so existing v1.0-only hosts pass unchanged.
+### Added — RFC 0045–0054 cohort scenarios
+- **RFC 0045** (connector pack manifest) — `connector-manifest-validity.test.ts` (server-free: §A schema validity of the `connector` block + both ConnectorAuth variants; §B action/trigger typeId-resolution).
+- **RFC 0046** (`host.credentials`) — `credentials-capability-shape.test.ts` (always) + `credential-payload-redaction.test.ts` (gated; SECURITY invariant `credential-payload-redaction` via the `credentials/echo` seam).
+- **RFC 0047** (`host.oauth`) — `oauth-capability-shape.test.ts` + `oauth-connector-redaction.test.ts` (gated; token redaction via the `oauth/connector-echo` seam).
+- **RFC 0048** (identity triple) — `identity-owner-shape.test.ts` (server-free) + `cross-workspace-isolation.test.ts` (gated; fail-closed `run_forbidden` via the `identity/*` seams).
+- **RFC 0049** (RBAC) — `authorization-roles-shape.test.ts` (always) + `authorization-fail-closed.test.ts` (gated; SECURITY invariant `authorization-fail-closed` via the `authorization/decide` seam).
+- **RFC 0050** (SAML / SCIM) — `auth-saml-profile.test.ts` + `auth-scim-profile.test.ts` (advertisement shape always; behavior opt-in via `OPENWOP_TEST_SAML_IDP_URL` / `OPENWOP_TEST_SCIM_URL` + the `auth/saml/validate` + `auth/scim/provision` seams). Now ships a **bundled synthetic SAML IdP harness** (`conformance/src/lib/saml-idp.ts`, `node:crypto` RSA-SHA256, no deps) that mints the 1-positive + 6-negative assertion variants and whose `verify()` implements the RFC 0050 §A MUST list — `auth-saml-profile.test.ts` runs the full negative reference suite **server-free**; a host's real ACS validates the same assertions over the seam.
+- **RFC 0051** (approval gate) — `approval-gate-events.test.ts` (server-free) + `approval-gate-flow.test.ts` (gated; unauthorized-denied + override-audited via the `governance/approval-gate` seam).
+- **RFC 0052** (scheduling) — `scheduling-capability-shape.test.ts` (always) + `scheduling-cron-fires-once.test.ts` (gated; once-per-tick + missed-tick via the `scheduling/tick` seam).
+- **RFC 0053** (dead-letter) — `deadletter-capability-shape.test.ts` (always) + `deadletter-retry-exhaustion.test.ts` (gated; `run.dead_lettered` + fork-eligibility via the `deadletter/exhaust` seam).
+- **RFC 0054** (run diff) — `run-diff-*.test.ts` (landed with the run-diff endpoint).
+Two new SECURITY invariants gated in this cohort: `credential-payload-redaction` (0046, also covers 0047 tokens) + `authorization-fail-closed` (0049). New `/v1/host/sample/*` seams are catalogued in `spec/v1/host-sample-test-seams.md` §"Open seams". Suite scenario-file count → 230.
 ## [1.5.0] — 2026-05-22
 Minor bump per `PUBLISHING.md` §"Versioning alignment" — unblocks MyndHyve's RFC 0044 + RFC 0039 Half A co-graduation by shipping the relaxed + RFC-0044-routing assertion logic in `multi-agent-confidence-escalation.test.ts`. No new scenario files; no new fixtures. Behavioral honesty pass on 8 sandbox scenarios + schema additions for the new RFC 0044 capability advertisement.

package/README.md CHANGED Viewed

@@ -93,7 +93,7 @@ Exit code is non-zero on any failed assertion.
 ## What's Covered
-The current suite has 205 scenario files under `src/scenarios/`. 2026-05-22 (RFC 0041 Phase 4 — replay determinism under nondeterministic models) added three scenarios: `replay-divergence-at-refusal.test.ts` (advertisement-shape probe on `replayDeterminism.refusalDivergenceEmission` + 2 `it.todo` for the dual-direction refusal-divergence case), `replay-observable-sequence-determinism.test.ts` (capability-gated; behavioral assertion soft-skipped until a `conformance-phase4-nondet-tool` fixture ships), `replay-llm-cache-key-portable.test.ts` (intra-host reproducibility + non-recipe-field invariance + Phase 4 advertisement alignment — reuses the existing `POST /v1/host/sample/test/llm-cache-key` seam from the sibling `replay-llm-cache-key.test.ts`). 2026-05-20 (RFC 0027 §A templateKinds-coverage follow-up — paired with `prompt-end-to-end-events.test.ts`) added `prompt-all-four-kinds-events.test.ts` exercising all four `PromptKind` values (`system`, `user`, `schema-hint`, `few-shot`) end-to-end through the reference workflow-engine sample's `local.sample.demo.mock-ai` dispatch path; capability-gated via `behaviorGate('prompts-supported', ...)`. Closes the credibility gap where the host advertised `templateKinds: ["system", "user", "few-shot", "schema-hint"]` but only the system+user pair was actually wired into dispatch. 2026-05-20 (RFCs 0030–0033 — envelope LLM-contract-hardening track) added 15 scenarios across four `Active` RFCs: `envelope-reasoning-shape.test.ts` (RFC 0030, always-on; asserts the OPTIONAL `reasoning` property on the three universal-kind schemas + the `schema.response` deliberate omission), `envelope-reasoning-secret-redaction.test.ts` (RFC 0030, capability-gated on `capabilities.envelopes.reasoning.supported` + `secrets.supported`; 5 `it.todo()` placeholders for SECURITY invariant `envelope-reasoning-secret-redaction`), `envelope-tier-one-subset-static.test.ts` (RFC 0030, always-on for load-bearing rules — no `oneOf` / `allOf` / `not` / `prefixItems` / `propertyNames` anywhere; gated on `tierOneSubsetCompliance: "strict"` for OpenAI-strict-only constraints), `envelope-variant-discriminator-static.test.ts` (RFC 0031, always-on; asserts no `oneOf` + every `anyOf` branch declares a single-string-enum discriminator in `required` on every `schemas/envelopes/*.schema.json`), `model-capability-substituted.test.ts` (RFC 0031, advertisement-shape probe on `capabilities.modelCapabilities.advertised[]` identifier pattern + 5 `it.todo()` placeholders for SECURITY invariant `model-capability-substituted-no-credential-disclosure`), `model-capability-insufficient.test.ts` (RFC 0031, 6 `it.todo()` placeholders for refusal + no-recursive-fallback), `node-module-required-capabilities-shape.test.ts` (RFC 0031 SHOULD-tier authoring-convention; 4 `it.todo()` placeholders), and the six envelope-reliability events from RFC 0032 (`envelope-retry-attempted` carrying the shared advertisement-shape probe enforcing both MUST-tier events in `events[]` per RFC 0032 §C, plus `envelope-retry-exhausted`, `envelope-refusal-shape`, `envelope-truncated`, `envelope-nl-to-format-engaged`, `envelope-recovery-applied` — collectively 39 `it.todo()` placeholders covering retry/refusal/truncation/recovery + SECURITY invariants `envelope-refusal-no-prompt-leak` and `envelope-recovery-no-content-leak`), plus RFC 0033's two scenarios (`envelope-completion-distinguishes-truncation.test.ts` + `envelope-truncation-cap-exhaustion.test.ts` — 12 `it.todo()` placeholders covering the truncation-vs-schema-violation retry-routing distinction + the DoS-bound assertion). Reference workflow-engine sample advertises `capabilities.envelopes.reasoning: { supported: true, promptDirective: "off" }` + `tierOneSubsetCompliance: "warn"` honestly (schemas accept the field; host doesn't yet inject the directive); the other three RFCs' capability blocks defer to reference-host emission code per the staged RFC 0027 §G precedent. 2026-05-20 (RFC 0028 §B Phase B — prompt-pack boot-time install) added `prompt-pack-install.test.ts` (capability-gated on `capabilities.prompts.endpointsSupported: true`; asserts a host that ran the boot-time pack loader surfaces ≥ 1 pack-source template under `GET /v1/prompts?source=pack` carrying the canonical `meta.source: "pack"` + `meta.packName` + `meta.packVersion` stamps; positively identifies the in-tree `vendor.openwop.prompt-sample` reference pack's `writer-system` template when present). Pairs with the new `host/promptPackLoader.ts` boot-time entry on the reference workflow-engine sample, which scans `examples/packs/*` plus `OPENWOP_PROMPT_PACKS_DIR` and calls `installPackTemplates()` for each `kind: "prompt"` pack found. 2026-05-20 (RFC 0029 Phase C — prompt resolution chain wire shape) added three more scenarios: `prompt-resolution-chain-node-wins.test.ts` (capability-gated on `capabilities.prompts.supported: true`; asserts layer-1 node-config supersedes lower layers per `spec/v1/prompts.md` §"Resolution chain (normative)"), `prompt-resolution-chain-agent-intrinsic.test.ts` (additionally gated on `capabilities.prompts.agentBindings: true`; asserts agent intrinsic `systemPromptRef` wins over `promptOverrides` AND lower layers when the node has no layer-1 ref), `prompt-resolution-chain-fallback-cascade.test.ts` (asserts layer 3 workflow-defaults wins over layer 4 host-defaults; layer 4 host-defaults wins when 1-3 yield null; resolved is null when all four yield null but chain[] still lists every attempted layer). The scenarios drive the host's `POST /v1/host/sample/prompt/resolve` test seam (reference-host implementation deferred to follow-up slice per RFC 0021 staging precedent). 2026-05-20 (RFC 0027 Phase A — prompt templates wire shape) added three scenarios: `prompt-template-shape.test.ts` (always-on; Ajv compileability + positive/negative round-trip for PromptTemplate + PromptRef + PromptKind), `prompt-composed-secret-redaction.test.ts` (capability-gated on `capabilities.prompts.supported: true` + `observability: "full"`; asserts `[REDACTED:<secretId>]` markers in `prompt.composed` payloads for `source: "secret"` variable bindings per SECURITY/threat-model-secret-leakage.md §SR-1), `prompt-composed-trust-marker.test.ts` (same capability gates; asserts `<UNTRUSTED>...</UNTRUSTED>` wrapping + `contentTrust: "untrusted"` propagation per RFC 0020 §D). Paired with new `fixtures/prompt-templates/` sub-directory + per-fixture schema-validity describe block + future SECURITY invariants `prompt-composed-secret-redaction` and `prompt-composed-trust-marker` (lands alongside reference-host emission per RFC 0021 staging precedent). 2026-05-18 (RFC 0022 `Draft` — runtime variable mapping) added four `it.todo()` placeholder scenarios covering the new mapping surfaces on `core.dispatch` (§A — `dispatch-input-mapping.test.ts`, `dispatch-output-mapping.test.ts`, `dispatch-cross-worker-handoff.test.ts`) and `core.subWorkflow` (§B — `subworkflow-input-mapping.test.ts`). Gated on `capabilities.agents.dispatchMapping` (dispatch trio) and `capabilities.subWorkflow.inputMapping` (subWorkflow). Promote to live assertions when RFC 0022 reaches `Active` + a reference host advertises the matching flags. 2026-05-17 (RFC 0003 §D handoff-schema enforcement, HV-1) added `agentPackHandoffSchemaValidation.test.ts` — verifies the host validates dispatch payloads against `handoff.taskSchemaRef` AND return payloads against `handoff.returnSchemaRef` per RFC 0003 §D. Paired with the new `agent-pack-handoff-schema-enforcement` row in `SECURITY/invariants.yaml`. 2026-05-17 (AI Envelope gap-closure, DRAFT v1.x — `spec/v1/ai-envelope.md`) added 7 advertisement-shape scenarios with `it.todo()` behavioral placeholders gated on `capabilities.envelopeContracts.advertised: true`: `aiEnvelope.universalKinds.test.ts`, `aiEnvelope.schemaDrift.test.ts`, `aiEnvelope.correlationReplay.test.ts`, `aiEnvelope.contractRefusal.test.ts`, `aiEnvelope.trustBoundaryPropagation.test.ts`, `aiEnvelope.redaction.test.ts`, `aiEnvelope.capBreached.test.ts`. Paired with the new `envelope-redaction-sr-1-carry-forward` row in `SECURITY/invariants.yaml`. 2026-05-17 (post-publish hardening, deep audit of `core.openwop.agents`) added `agents-run-tool-allowlist.test.ts` — server-free scenario locking in the `core.openwop.agents@1.0.1` safety-fix that closes `OPENWOP-AUDIT-2026-003` (function-typed `tool.handler` properties rejected at `validateTools()` with `INVALID_TOOL_DECLARATION`; tool-driven runs require `ctx.agentRuntime`; tool-less safe fallback preserved). Paired with the new `agents-run-no-raw-handler` row in `SECURITY/invariants.yaml`. Same-day post-publish hardening added `idempotency-key-determinism.test.ts` — server-free scenario locking in the `core.openwop.http@1.1.2` determinism safety-fix (default `composite` mode produces deterministic keys in `(runId, nodeId, payload)`; removed `uuid` mode rejects with `CONFIG_INVALID`; cross-impl vector test lets third-party reimplementations verify wire agreement). Paired with the new `idempotency-key-deterministic` row in `SECURITY/invariants.yaml`. 2026-05-17 (Phase 3 of RFC 0013) added three server-free scenarios exercising the reference workflow-chain expansion library (`conformance/src/lib/workflow-chain-expansion.ts`): `workflow-chain-expansion.test.ts` (parameter substitution + node id collision avoidance + edge rewriting + capability propagation + runtime-invariance contract), `workflow-chain-unresolvable-typeid.test.ts` (rejection with `chain_unresolvable_typeid` when a chain references an unknown typeId), and `workflow-chain-pack-signature-verification.test.ts` (Ed25519 verification recipe reuse from `node-packs.md §Signing`). Earlier that day (Phase 1) added `workflow-chain-pack-manifest-validation.test.ts` — server-free schema-validation scenario covering the new `workflow-chain-pack-manifest.schema.json` (positive sample + two negatives: kind/contents mismatch and invalid `chainId`). Closes RFC 0013 (`Workflow-chain packs`, `Draft`) Phases 1 + 3 alongside the new `spec/v1/workflow-chain-packs.md`, the `Capabilities.workflowChainPacks` block, and the registry build-index/conformance-check `kind` routing from Phase 2. Earlier that day, the suite added 27 `it.todo()` placeholder scenarios paired with RFCs 0014-0020 (host capability surfaces — fs, kvStorage, tableStorage, queueBus, sql/vector/search, blob/cache, mcp.serverMount). These promote to live assertions when each RFC reaches `Active` + the matching capability block lands in `schemas/capabilities.schema.json` + a reference host advertises the capability. Earlier additions include 18 Multi-Agent Shift scenarios (Phases 1-5) added 2026-05-10, the `registry-public.test.ts` public-registry healthcheck added 2026-05-11 (opt-in via `OPENWOP_TEST_PUBLIC_REGISTRY=true`), the `replay-llm-cache-key.test.ts` placeholder added 2026-05-11 (three `it.todo()` cases for the cross-host LLM cache-key recipe per `replay.md` §"LLM cache-key recipe"), the two `production-*.test.ts` scenarios added 2026-05-11 for the `openwop-production` profile per RFC 0009 (`production-backpressure.test.ts`, `production-retention-expiry.test.ts`), the four `auth-*.test.ts` scenarios added 2026-05-11/12 for the production-auth profiles per RFC 0010 (`auth-api-key-rotation.test.ts`, `auth-oauth2-client-credentials.test.ts`, `auth-oidc-user-bearer.test.ts`, `auth-mtls.test.ts` (opt-in via `OPENWOP_TEST_MTLS=1`)), `replay-retention-expiry.test.ts` added 2026-05-12 (capability shape + 410/422 envelope per `replay.md` §"Retention and garbage collection"), `bulk-cancel.test.ts` added 2026-05-12 (Phase B close-out of R1 — `POST /v1/runs:bulk-cancel`), the two Phase H launch-blocker advertisement-contract scenarios added 2026-05-12 (`mcp-toolcall-redaction.test.ts` for the MCP-1 invariant per `host-capabilities.md §host.mcp` + `threat-model-prompt-injection.md §UNTRUSTED`, and `http-client-ssrf.test.ts` for the SSRF + body-size cap advertisement contract on `capabilities.httpClient`), the `wasm-pack-abi-version-rejection.test.ts` Track 7 scenario added 2026-05-12 for the ABI-mismatch positive path via the `vendor.openwop.misbehaving-abi` pack per RFC 0008 §H, the `otel-trace-propagation-subworkflow.test.ts` Track 11 close-out added 2026-05-13 (parent + child run spans share the inbound traceparent's traceId across the `core.subWorkflow` dispatch boundary), and the three RFC 0012 (Memory Compaction Profile, `Active`) scenarios added 2026-05-13/14: `memory-compaction-sr1-carry-forward.test.ts` (load-bearing SR-1 §D), `memory-compaction-event-emitted.test.ts` (canonical §B payload shape), and `memory-compaction-provenance-tag.test.ts` (soft assertion on §C `compacted-from:<id>` convention). All three gate on `capabilities.memory.compaction.supported` + the host's test seam at `/v1/test/memory/{seed,compact}` (Postgres reference host enables both via `OPENWOP_MEMORY_COMPACTION=true OPENWOP_TEST_TRIGGER_COMPACTION=true`). 2026-05-15 (gap-closure CF-3) added `interrupt-token-matrix.test.ts` (malformed / unknown / replay / cross-run-id paths on `GET|POST /v1/interrupts/{token}`). The maintained scenario-to-spec map lives in [`coverage.md`](./coverage.md); this README keeps the operator quickstart and the historical scenario notes below.
+The current suite has 231 scenario files under `src/scenarios/`. 2026-05-25 (RFC 0025 §C point 1 — test-catalog isolation invariant; pairs with the 25 publish-error scenarios in `pack-registry-publish.test.ts`) added `pack-registry-isolation.test.ts` — capability-gated on `capabilities.packs.testMode.{supported, isolated}: true`; PUTs a disposable pack into `/v1/packs-test/{name}` and asserts the same `(name, version)` does NOT appear via `GET /v1/packs/{name}` — anchors the test-catalog isolation MUST in RFC 0025 §C. 2026-05-25 (RFC 0028 Tier-2 post-promotion T2 — read-side sister scenario for workspace-membership enforcement) added `prompt-read-workspace-membership-enforced.test.ts` — gates on `capabilities.prompts.supported: true` (broader than `mutableLibrary` so read-only hosts that expose `?workspaceId=` are also probed); drives `GET /v1/prompts?workspaceId=<random-non-member>` and interprets the response: 4xx PASS (canonical envelope check on 403); 200 with empty `templates[]` PASS (correct null result for a nonexistent workspace); 200 with non-empty `templates[]` FAIL (cross-tenant leak); 200 without `templates[]` field SKIP (host doesn't expose workspace-scoped reads). Verifies SECURITY invariant `prompt-read-workspace-membership-enforced`. Same-day T1 strengthened `prompt-mutation-workspace-membership-enforced.test.ts` to pin `error === "workspace_membership_required"` when the host's refusal status is 403 (other refusal codes unconstrained). 2026-05-25 (RFC 0028 Tier-2 follow-up — workspace-membership enforcement on mutating prompt endpoints, filed in response to a self-disclosed adopter vulnerability) added `prompt-mutation-workspace-membership-enforced.test.ts` — capability-gated on `capabilities.prompts.mutableLibrary: true`; drives `POST /v1/prompts` with a cryptographically-random non-member `workspaceId` and asserts the host refuses (NOT a 2xx; any 4xx/5xx is acceptable — silent success is the failure mode). Verifies SECURITY invariant `prompt-mutation-workspace-membership-enforced`. 2026-05-22 (RFC 0034 §B follow-up — secret-leakage harness against the OTel + debug-bundle seams) added `secret-leakage-otel-attribute.test.ts` — gates on `capabilities.secrets.supported` + `capabilities.observability.testSeams.{otelScrape,debugBundleExport}` AND the `OPENWOP_CANARY_SECRET_VALUE` env (host operator + conformance runner agree on the canary). Drives the existing `openwop-smoke-byok-roundtrip` fixture end-to-end; scrapes both seams after run completion; hard-fails if the canary plaintext appears in any OTel span attribute or debug-bundle field. Verifies SECURITY invariants `secret-leakage-otel-attribute` + `secret-leakage-debug-bundle-otel`. 2026-05-22 (RFC 0041 Phase 4 — replay determinism under nondeterministic models) added three scenarios: `replay-divergence-at-refusal.test.ts` (advertisement-shape probe on `replayDeterminism.refusalDivergenceEmission` + 2 `it.todo` for the dual-direction refusal-divergence case), `replay-observable-sequence-determinism.test.ts` (capability-gated; behavioral assertion soft-skipped until a `conformance-phase4-nondet-tool` fixture ships), `replay-llm-cache-key-portable.test.ts` (intra-host reproducibility + non-recipe-field invariance + Phase 4 advertisement alignment — reuses the existing `POST /v1/host/sample/test/llm-cache-key` seam from the sibling `replay-llm-cache-key.test.ts`). 2026-05-20 (RFC 0027 §A templateKinds-coverage follow-up — paired with `prompt-end-to-end-events.test.ts`) added `prompt-all-four-kinds-events.test.ts` exercising all four `PromptKind` values (`system`, `user`, `schema-hint`, `few-shot`) end-to-end through the reference workflow-engine sample's `local.sample.demo.mock-ai` dispatch path; capability-gated via `behaviorGate('prompts-supported', ...)`. Closes the credibility gap where the host advertised `templateKinds: ["system", "user", "few-shot", "schema-hint"]` but only the system+user pair was actually wired into dispatch. 2026-05-20 (RFCs 0030–0033 — envelope LLM-contract-hardening track) added 15 scenarios across four `Active` RFCs: `envelope-reasoning-shape.test.ts` (RFC 0030, always-on; asserts the OPTIONAL `reasoning` property on the three universal-kind schemas + the `schema.response` deliberate omission), `envelope-reasoning-secret-redaction.test.ts` (RFC 0030, capability-gated on `capabilities.envelopes.reasoning.supported` + `secrets.supported`; 5 `it.todo()` placeholders for SECURITY invariant `envelope-reasoning-secret-redaction`), `envelope-tier-one-subset-static.test.ts` (RFC 0030, always-on for load-bearing rules — no `oneOf` / `allOf` / `not` / `prefixItems` / `propertyNames` anywhere; gated on `tierOneSubsetCompliance: "strict"` for OpenAI-strict-only constraints), `envelope-variant-discriminator-static.test.ts` (RFC 0031, always-on; asserts no `oneOf` + every `anyOf` branch declares a single-string-enum discriminator in `required` on every `schemas/envelopes/*.schema.json`), `model-capability-substituted.test.ts` (RFC 0031, advertisement-shape probe on `capabilities.modelCapabilities.advertised[]` identifier pattern + 5 `it.todo()` placeholders for SECURITY invariant `model-capability-substituted-no-credential-disclosure`), `model-capability-insufficient.test.ts` (RFC 0031, 6 `it.todo()` placeholders for refusal + no-recursive-fallback), `node-module-required-capabilities-shape.test.ts` (RFC 0031 SHOULD-tier authoring-convention; 4 `it.todo()` placeholders), and the six envelope-reliability events from RFC 0032 (`envelope-retry-attempted` carrying the shared advertisement-shape probe enforcing both MUST-tier events in `events[]` per RFC 0032 §C, plus `envelope-retry-exhausted`, `envelope-refusal-shape`, `envelope-truncated`, `envelope-nl-to-format-engaged`, `envelope-recovery-applied` — collectively 39 `it.todo()` placeholders covering retry/refusal/truncation/recovery + SECURITY invariants `envelope-refusal-no-prompt-leak` and `envelope-recovery-no-content-leak`), plus RFC 0033's two scenarios (`envelope-completion-distinguishes-truncation.test.ts` + `envelope-truncation-cap-exhaustion.test.ts` — 12 `it.todo()` placeholders covering the truncation-vs-schema-violation retry-routing distinction + the DoS-bound assertion). Reference workflow-engine sample advertises `capabilities.envelopes.reasoning: { supported: true, promptDirective: "off" }` + `tierOneSubsetCompliance: "warn"` honestly (schemas accept the field; host doesn't yet inject the directive); the other three RFCs' capability blocks defer to reference-host emission code per the staged RFC 0027 §G precedent. 2026-05-20 (RFC 0028 §B Phase B — prompt-pack boot-time install) added `prompt-pack-install.test.ts` (capability-gated on `capabilities.prompts.endpointsSupported: true`; asserts a host that ran the boot-time pack loader surfaces ≥ 1 pack-source template under `GET /v1/prompts?source=pack` carrying the canonical `meta.source: "pack"` + `meta.packName` + `meta.packVersion` stamps; positively identifies the in-tree `vendor.openwop.prompt-sample` reference pack's `writer-system` template when present). Pairs with the new `host/promptPackLoader.ts` boot-time entry on the reference workflow-engine sample, which scans `examples/packs/*` plus `OPENWOP_PROMPT_PACKS_DIR` and calls `installPackTemplates()` for each `kind: "prompt"` pack found. 2026-05-20 (RFC 0029 Phase C — prompt resolution chain wire shape) added three more scenarios: `prompt-resolution-chain-node-wins.test.ts` (capability-gated on `capabilities.prompts.supported: true`; asserts layer-1 node-config supersedes lower layers per `spec/v1/prompts.md` §"Resolution chain (normative)"), `prompt-resolution-chain-agent-intrinsic.test.ts` (additionally gated on `capabilities.prompts.agentBindings: true`; asserts agent intrinsic `systemPromptRef` wins over `promptOverrides` AND lower layers when the node has no layer-1 ref), `prompt-resolution-chain-fallback-cascade.test.ts` (asserts layer 3 workflow-defaults wins over layer 4 host-defaults; layer 4 host-defaults wins when 1-3 yield null; resolved is null when all four yield null but chain[] still lists every attempted layer). The scenarios drive the host's `POST /v1/host/sample/prompt/resolve` test seam (reference-host implementation deferred to follow-up slice per RFC 0021 staging precedent). 2026-05-20 (RFC 0027 Phase A — prompt templates wire shape) added three scenarios: `prompt-template-shape.test.ts` (always-on; Ajv compileability + positive/negative round-trip for PromptTemplate + PromptRef + PromptKind), `prompt-composed-secret-redaction.test.ts` (capability-gated on `capabilities.prompts.supported: true` + `observability: "full"`; asserts `[REDACTED:<secretId>]` markers in `prompt.composed` payloads for `source: "secret"` variable bindings per SECURITY/threat-model-secret-leakage.md §SR-1), `prompt-composed-trust-marker.test.ts` (same capability gates; asserts `<UNTRUSTED>...</UNTRUSTED>` wrapping + `contentTrust: "untrusted"` propagation per RFC 0020 §D). Paired with new `fixtures/prompt-templates/` sub-directory + per-fixture schema-validity describe block + future SECURITY invariants `prompt-composed-secret-redaction` and `prompt-composed-trust-marker` (lands alongside reference-host emission per RFC 0021 staging precedent). 2026-05-18 (RFC 0022 `Draft` — runtime variable mapping) added four `it.todo()` placeholder scenarios covering the new mapping surfaces on `core.dispatch` (§A — `dispatch-input-mapping.test.ts`, `dispatch-output-mapping.test.ts`, `dispatch-cross-worker-handoff.test.ts`) and `core.subWorkflow` (§B — `subworkflow-input-mapping.test.ts`). Gated on `capabilities.agents.dispatchMapping` (dispatch trio) and `capabilities.subWorkflow.inputMapping` (subWorkflow). Promote to live assertions when RFC 0022 reaches `Active` + a reference host advertises the matching flags. 2026-05-17 (RFC 0003 §D handoff-schema enforcement, HV-1) added `agentPackHandoffSchemaValidation.test.ts` — verifies the host validates dispatch payloads against `handoff.taskSchemaRef` AND return payloads against `handoff.returnSchemaRef` per RFC 0003 §D. Paired with the new `agent-pack-handoff-schema-enforcement` row in `SECURITY/invariants.yaml`. 2026-05-17 (AI Envelope gap-closure, DRAFT v1.x — `spec/v1/ai-envelope.md`) added 7 advertisement-shape scenarios with `it.todo()` behavioral placeholders gated on `capabilities.envelopeContracts.advertised: true`: `aiEnvelope.universalKinds.test.ts`, `aiEnvelope.schemaDrift.test.ts`, `aiEnvelope.correlationReplay.test.ts`, `aiEnvelope.contractRefusal.test.ts`, `aiEnvelope.trustBoundaryPropagation.test.ts`, `aiEnvelope.redaction.test.ts`, `aiEnvelope.capBreached.test.ts`. Paired with the new `envelope-redaction-sr-1-carry-forward` row in `SECURITY/invariants.yaml`. 2026-05-17 (post-publish hardening, deep audit of `core.openwop.agents`) added `agents-run-tool-allowlist.test.ts` — server-free scenario locking in the `core.openwop.agents@1.0.1` safety-fix that closes `OPENWOP-AUDIT-2026-003` (function-typed `tool.handler` properties rejected at `validateTools()` with `INVALID_TOOL_DECLARATION`; tool-driven runs require `ctx.agentRuntime`; tool-less safe fallback preserved). Paired with the new `agents-run-no-raw-handler` row in `SECURITY/invariants.yaml`. Same-day post-publish hardening added `idempotency-key-determinism.test.ts` — server-free scenario locking in the `core.openwop.http@1.1.2` determinism safety-fix (default `composite` mode produces deterministic keys in `(runId, nodeId, payload)`; removed `uuid` mode rejects with `CONFIG_INVALID`; cross-impl vector test lets third-party reimplementations verify wire agreement). Paired with the new `idempotency-key-deterministic` row in `SECURITY/invariants.yaml`. 2026-05-17 (Phase 3 of RFC 0013) added three server-free scenarios exercising the reference workflow-chain expansion library (`conformance/src/lib/workflow-chain-expansion.ts`): `workflow-chain-expansion.test.ts` (parameter substitution + node id collision avoidance + edge rewriting + capability propagation + runtime-invariance contract), `workflow-chain-unresolvable-typeid.test.ts` (rejection with `chain_unresolvable_typeid` when a chain references an unknown typeId), and `workflow-chain-pack-signature-verification.test.ts` (Ed25519 verification recipe reuse from `node-packs.md §Signing`). Earlier that day (Phase 1) added `workflow-chain-pack-manifest-validation.test.ts` — server-free schema-validation scenario covering the new `workflow-chain-pack-manifest.schema.json` (positive sample + two negatives: kind/contents mismatch and invalid `chainId`). Closes RFC 0013 (`Workflow-chain packs`, `Draft`) Phases 1 + 3 alongside the new `spec/v1/workflow-chain-packs.md`, the `Capabilities.workflowChainPacks` block, and the registry build-index/conformance-check `kind` routing from Phase 2. Earlier that day, the suite added 27 `it.todo()` placeholder scenarios paired with RFCs 0014-0020 (host capability surfaces — fs, kvStorage, tableStorage, queueBus, sql/vector/search, blob/cache, mcp.serverMount). These promote to live assertions when each RFC reaches `Active` + the matching capability block lands in `schemas/capabilities.schema.json` + a reference host advertises the capability. Earlier additions include 18 Multi-Agent Shift scenarios (Phases 1-5) added 2026-05-10, the `registry-public.test.ts` public-registry healthcheck added 2026-05-11 (opt-in via `OPENWOP_TEST_PUBLIC_REGISTRY=true`), the `replay-llm-cache-key.test.ts` placeholder added 2026-05-11 (three `it.todo()` cases for the cross-host LLM cache-key recipe per `replay.md` §"LLM cache-key recipe"), the two `production-*.test.ts` scenarios added 2026-05-11 for the `openwop-production` profile per RFC 0009 (`production-backpressure.test.ts`, `production-retention-expiry.test.ts`), the four `auth-*.test.ts` scenarios added 2026-05-11/12 for the production-auth profiles per RFC 0010 (`auth-api-key-rotation.test.ts`, `auth-oauth2-client-credentials.test.ts`, `auth-oidc-user-bearer.test.ts`, `auth-mtls.test.ts` (opt-in via `OPENWOP_TEST_MTLS=1`)), `replay-retention-expiry.test.ts` added 2026-05-12 (capability shape + 410/422 envelope per `replay.md` §"Retention and garbage collection"), `bulk-cancel.test.ts` added 2026-05-12 (Phase B close-out of R1 — `POST /v1/runs:bulk-cancel`), the two Phase H launch-blocker advertisement-contract scenarios added 2026-05-12 (`mcp-toolcall-redaction.test.ts` for the MCP-1 invariant per `host-capabilities.md §host.mcp` + `threat-model-prompt-injection.md §UNTRUSTED`, and `http-client-ssrf.test.ts` for the SSRF + body-size cap advertisement contract on `capabilities.httpClient`), the `wasm-pack-abi-version-rejection.test.ts` Track 7 scenario added 2026-05-12 for the ABI-mismatch positive path via the `vendor.openwop.misbehaving-abi` pack per RFC 0008 §H, the `otel-trace-propagation-subworkflow.test.ts` Track 11 close-out added 2026-05-13 (parent + child run spans share the inbound traceparent's traceId across the `core.subWorkflow` dispatch boundary), and the three RFC 0012 (Memory Compaction Profile, `Active`) scenarios added 2026-05-13/14: `memory-compaction-sr1-carry-forward.test.ts` (load-bearing SR-1 §D), `memory-compaction-event-emitted.test.ts` (canonical §B payload shape), and `memory-compaction-provenance-tag.test.ts` (soft assertion on §C `compacted-from:<id>` convention). All three gate on `capabilities.memory.compaction.supported` + the host's test seam at `/v1/test/memory/{seed,compact}` (Postgres reference host enables both via `OPENWOP_MEMORY_COMPACTION=true OPENWOP_TEST_TRIGGER_COMPACTION=true`). 2026-05-15 (gap-closure CF-3) added `interrupt-token-matrix.test.ts` (malformed / unknown / replay / cross-run-id paths on `GET|POST /v1/interrupts/{token}`). The maintained scenario-to-spec map lives in [`coverage.md`](./coverage.md); this README keeps the operator quickstart and the historical scenario notes below.
 High-level coverage includes:
@@ -172,7 +172,7 @@ Server-required (added in 1.7.0):
 |---|---|---|
 | **Redaction** | [`capabilities.md`](../spec/v1/capabilities.md) §"Secrets" + NFR-7 + §"aiProviders" | Vendor-neutral assertions that the server doesn't leak secret material. Three scenario groups: (a) discovery shape contract — `secrets` + `aiProviders` advertisements are well-formed regardless of `secrets.supported`; when `supported === true`, scopes MUST be non-empty + `resolution === 'host-managed'`; `byok ⊆ supported`. (b) bearer-token redaction — invalid Bearer canary in `Authorization` header is not echoed in the 401 response body. (c) credentialRef echo control — gated on `secrets.supported === true`; canary planted in `configurable.ai.credentialRef` MUST NOT appear in any RunEvent payload (poll-based capture; transport-agnostic). Uses runtime-built canary fixtures (`lib/canaries.ts`) that defeat static secret scanners. 6 scenarios. |
-Current source tree: 205 scenario files. Use [`coverage.md`](./coverage.md) for current grade/gap tracking.
+Current source tree: 231 scenario files. Use [`coverage.md`](./coverage.md) for current grade/gap tracking.
 ## Remaining Gaps

package/api/asyncapi.yaml CHANGED Viewed

@@ -433,9 +433,14 @@ components:
       summary: |
         Type-erased handler for `debug` mode — discriminate on the
         `type` field per the `RunEventType` enum in the run-event
-        JSON Schema. Includes events filtered out of `updates`:
-        `log.appended`, `variable.changed`, `version.pinned`,
-        `lease.*`, `node.retried`, `replay.diverged`, etc.
+        JSON Schema (the authoritative, exhaustive event list; the
+        named messages above are a curated `updates`-tier subset).
+        Includes events filtered out of `updates`: `log.appended`,
+        `variable.changed`, `version.pinned`, `lease.*`, `node.retried`,
+        `replay.diverged`, `connector.authorized`,
+        `connector.auth_expired` (RFC 0047), `authorization.decided`
+        (RFC 0049), `approval.granted` / `approval.rejected` /
+        `approval.overridden` (RFC 0051), etc.
       contentType: application/json
       payload:
         $ref: '#/components/schemas/RunEventDoc'

package/api/openapi.yaml CHANGED Viewed

@@ -61,6 +61,13 @@ tags:
     description: Audit-log integrity verification (gated on the `openwop-audit-log-integrity` profile).
   - name: prompts
     description: Prompt-template library — list, fetch, render, mutate (RFC 0028; gated on `capabilities.prompts.*`).
+  - name: packs-test
+    description: |
+      RFC 0025 (`Draft`). Test-mode mirror of the production `/v1/packs/*` publish/get/delete/sig surface against
+      an isolated catalog. Gated on `capabilities.packs.testMode.supported: true` plus the reference impl's
+      `OPENWOP_PACKS_TEST_NAMESPACE_ENABLED=true` env-gate. Lets the conformance suite exercise the documented
+      19-code publish error catalog without `packs:publish` scope on the real registry. Hosts that haven't
+      mounted this surface MUST return `404 Not Found` for every path under `/v1/packs-test/`.
 # ─────────────────────────────────────────────────────────────────────────────
 # PATHS
@@ -502,6 +509,54 @@ paths:
             application/json:
               schema: { $ref: '#/components/schemas/Error' }
+  /v1/runs/{runId}:diff:
+    get:
+      tags: [runs]
+      summary: |
+        RFC 0054 — return a deterministic, replay-aware structured diff of
+        two runs (typically a run and its RFC 0011 fork): `divergedAtSeq` +
+        ordered `eventDiffs[]` + `stateDiff`. The diff is a pure function of
+        the two event logs (see `replay.md` determinism contract). Requires
+        `runs:read` on BOTH runs. Hosts that don't implement it return 404.
+      operationId: diffRun
+      parameters:
+        - $ref: '#/components/parameters/RunId'
+        - name: against
+          in: query
+          required: true
+          description: The other run id to diff `{runId}` against (the `b` run).
+          schema: { type: string }
+      responses:
+        '200':
+          description: |
+            Structured diff of the two runs. `divergedAtSeq: null` + empty
+            `eventDiffs` when the logs are identical.
+          content:
+            application/json:
+              schema:
+                $ref: '../schemas/run-diff-response.schema.json'
+        '400':
+          description: Missing or malformed `against` query parameter.
+          content:
+            application/json:
+              schema: { $ref: '#/components/schemas/Error' }
+        '401': { $ref: '#/components/responses/Unauthenticated' }
+        '403':
+          description: |
+            Caller lacks `runs:read` on `{runId}` and/or on `against`
+            (`forbidden`); composes with RFC 0048 cross-workspace
+            isolation.
+          content:
+            application/json:
+              schema: { $ref: '#/components/schemas/Error' }
+        '404':
+          description: |
+            Either run doesn't exist, OR the host doesn't implement the diff
+            endpoint and treats the path as absent.
+          content:
+            application/json:
+              schema: { $ref: '#/components/schemas/Error' }
   /v1/runs/{runId}:pause:
     post:
       tags: [runs]
@@ -1146,6 +1201,209 @@ paths:
               schema:
                 $ref: '../schemas/error-envelope.schema.json'
+  # ── Test-mode pack registry namespace (RFC 0025) ─────────────────────────
+  # Mirrors the production /v1/packs/* PUT/GET/DELETE/.sig surface against an
+  # isolated catalog. Conformance scenarios under
+  # `conformance/src/scenarios/pack-registry-publish.test.ts` exercise the
+  # 19-code publish error catalog through this namespace. Hosts that don't
+  # advertise `capabilities.packs.testMode.supported: true` MUST return
+  # 404 for every path below.
+  /v1/packs-test/{name}/-/{version}.tgz:
+    parameters:
+      - $ref: '#/components/parameters/PackName'
+      - $ref: '#/components/parameters/PackVersion'
+    put:
+      tags: [packs-test]
+      summary: Publish a pack tarball to the isolated test catalog.
+      description: |
+        Mirror of `PUT /v1/packs/{name}/-/{version}.tgz` per
+        `spec/v1/node-packs.md` §"PUT /v1/packs/{name}/-/{version}.tgz".
+        Request shape, response shape, status codes, and the 19-code
+        publish error catalog (`invalid_pack_scope`, `invalid_pack_name`,
+        `invalid_version`, `invalid_body`, eight `tarball_*` codes,
+        `invalid_manifest`, `manifest_mismatch` (or the granular
+        `manifest_name_mismatch` / `manifest_version_mismatch` pair),
+        `pack_integrity_failure`, `unsupported_runtime`, `forbidden`,
+        `conflict`/`version_conflict`) MUST be served verbatim. The
+        test catalog MUST be isolated per RFC 0025 §C — a pack PUT'd
+        here MUST NOT appear in `GET /v1/packs/{name}` listings.
+      operationId: putTestPackTarball
+      parameters:
+        - in: header
+          name: X-Pack-Signing-Method
+          required: false
+          schema: { type: string, enum: [sigstore, manual, none] }
+        - in: header
+          name: X-Pack-Sha256
+          required: false
+          schema:
+            type: string
+            pattern: '^sha256-[A-Za-z0-9+/=]+$'
+          description: Caller-asserted SHA-256 (server verifies; mismatch surfaces `pack_integrity_failure`).
+      requestBody:
+        required: true
+        description: Gzipped tarball bytes (`application/tar+gzip`, `application/gzip`, `application/x-gzip`, or `application/octet-stream`).
+        content:
+          application/gzip:
+            schema: { type: string, format: binary }
+          application/x-gzip:
+            schema: { type: string, format: binary }
+          application/tar+gzip:
+            schema: { type: string, format: binary }
+          application/octet-stream:
+            schema: { type: string, format: binary }
+      responses:
+        '200':
+          description: Idempotent re-publish — identical sha256 content already published; existing record returned.
+          content:
+            application/json:
+              schema: { $ref: '#/components/schemas/TestPackPublishRecord' }
+        '201':
+          description: New version published to the test catalog.
+          content:
+            application/json:
+              schema: { $ref: '#/components/schemas/TestPackPublishRecord' }
+        '400':
+          description: |
+            One of the 17 spec-documented 400-class publish error codes:
+            URL/scope (`invalid_pack_scope`, `invalid_pack_name`, `invalid_version`),
+            body shape (`invalid_body`),
+            tarball extraction (`tarball_gunzip_failed`, `tarball_too_large`,
+            `tarball_manifest_missing`, `tarball_manifest_too_large`,
+            `tarball_manifest_not_json`, `tarball_entry_missing`,
+            `tarball_entry_too_large`, `tarball_path_traversal`,
+            `tarball_tar_parse_failed`),
+            manifest contents (`invalid_manifest`, `manifest_mismatch` or
+            the granular `manifest_name_mismatch` / `manifest_version_mismatch` pair,
+            `pack_integrity_failure`, `unsupported_runtime`).
+          content:
+            application/json:
+              schema: { $ref: '../schemas/error-envelope.schema.json' }
+        '401': { $ref: '#/components/responses/Unauthenticated' }
+        '403':
+          description: '`forbidden` — caller lacks `packs:publish` scope or the namespace claim.'
+          content:
+            application/json:
+              schema: { $ref: '../schemas/error-envelope.schema.json' }
+        '404':
+          description: 'Host does not advertise `capabilities.packs.testMode.supported: true`, or the test-mode env-gate is unset.'
+          content:
+            application/json:
+              schema: { $ref: '../schemas/error-envelope.schema.json' }
+        '409':
+          description: '`conflict` (or `version_conflict`) — `(name, version)` already published with different content.'
+          content:
+            application/json:
+              schema: { $ref: '../schemas/error-envelope.schema.json' }
+    get:
+      tags: [packs-test]
+      summary: Fetch a published test-catalog tarball.
+      description: 'Mirror of `GET /v1/packs/{name}/-/{version}.tgz`. Returns the gzipped tarball bytes with `Content-Type: application/tar+gzip` and an `ETag: "sha256-..."` matching the manifest''s `tarballSha256`.'
+      operationId: getTestPackTarball
+      responses:
+        '200':
+          description: Tarball bytes.
+          content:
+            application/tar+gzip:
+              schema: { type: string, format: binary }
+        '400':
+          description: '`invalid_pack_name` or `invalid_version` — URL params malformed.'
+          content:
+            application/json:
+              schema: { $ref: '../schemas/error-envelope.schema.json' }
+        '401': { $ref: '#/components/responses/Unauthenticated' }
+        '403':
+          description: '`forbidden` — caller lacks `packs:read` scope.'
+          content:
+            application/json:
+              schema: { $ref: '../schemas/error-envelope.schema.json' }
+        '404':
+          description: 'Pack version not found in the test catalog (or host does not advertise `capabilities.packs.testMode.supported: true`).'
+          content:
+            application/json:
+              schema: { $ref: '../schemas/error-envelope.schema.json' }
+  /v1/packs-test/{name}/-/{version}:
+    parameters:
+      - $ref: '#/components/parameters/PackName'
+      - $ref: '#/components/parameters/PackVersion'
+    delete:
+      tags: [packs-test]
+      summary: Unpublish a test-catalog version (mirrors unpublish-window semantics).
+      description: |
+        Mirror of `DELETE /v1/packs/{name}/-/{version}` per
+        `spec/v1/node-packs.md`. Returns `400 unpublish_window_expired`
+        for versions older than the registry's unpublish window
+        (default 72h). Test-mode implementations MAY shorten the
+        window for tractable conformance fixtures but MUST surface
+        the same error code.
+      operationId: deleteTestPackVersion
+      responses:
+        '204':
+          description: Version successfully unpublished from the test catalog.
+        '400':
+          description: '`unpublish_window_expired`, `invalid_pack_name`, or `invalid_version`.'
+          content:
+            application/json:
+              schema: { $ref: '../schemas/error-envelope.schema.json' }
+        '401': { $ref: '#/components/responses/Unauthenticated' }
+        '403':
+          description: '`forbidden` — caller lacks `packs:publish` scope.'
+          content:
+            application/json:
+              schema: { $ref: '../schemas/error-envelope.schema.json' }
+        '404':
+          description: Version doesn't exist in the test catalog, or host does not advertise the test-mode capability.
+          content:
+            application/json:
+              schema: { $ref: '../schemas/error-envelope.schema.json' }
+  /v1/packs-test/{name}/-/{version}.sig:
+    parameters:
+      - $ref: '#/components/parameters/PackName'
+      - $ref: '#/components/parameters/PackVersion'
+    get:
+      tags: [packs-test]
+      summary: Fetch the detached Ed25519 signature for a test-catalog pack.
+      description: |
+        Mirror of `GET /v1/packs/{name}/-/{version}.sig`. Returns the
+        signature blob over `pack.json` for this version. MAY 302-redirect
+        to a storage-backend signed URL.
+      operationId: getTestPackSignature
+      responses:
+        '200':
+          description: Signature blob.
+          content:
+            application/octet-stream:
+              schema: { type: string, format: binary }
+        '302':
+          description: Redirect to a storage-backend signed URL (clients SHOULD follow).
+          headers:
+            Location:
+              schema: { type: string, format: uri }
+        '400':
+          description: '`invalid_pack_name` or `invalid_version` — URL params malformed.'
+          content:
+            application/json:
+              schema: { $ref: '../schemas/error-envelope.schema.json' }
+        '401': { $ref: '#/components/responses/Unauthenticated' }
+        '403':
+          description: '`forbidden` — caller lacks `packs:read` scope.'
+          content:
+            application/json:
+              schema: { $ref: '../schemas/error-envelope.schema.json' }
+        '404':
+          description: |
+            `signature_not_available` — version is missing, yanked,
+            unsigned at publish time, OR the registry's storage backend
+            is unwired. The four cases are intentionally
+            indistinguishable per spec/v1/node-packs.md §"GET /v1/packs/{name}/-/{version}.sig".
+            Also returned when the host does not advertise
+            `capabilities.packs.testMode.supported: true`.
+          content:
+            application/json:
+              schema: { $ref: '../schemas/error-envelope.schema.json' }
 # ─────────────────────────────────────────────────────────────────────────────
 # COMPONENTS
 # ─────────────────────────────────────────────────────────────────────────────
@@ -1189,6 +1447,25 @@ components:
         Duplicate requests return the cached response with header
         `openwop-Idempotent-Replay: true`.
+    PackName:
+      in: path
+      name: name
+      required: true
+      schema:
+        type: string
+        minLength: 3
+        maxLength: 214
+      description: Reverse-DNS pack name per `spec/v1/node-packs.md` §Naming (e.g. `vendor.acme.salesforce-tools`).
+    PackVersion:
+      in: path
+      name: version
+      required: true
+      schema:
+        type: string
+        pattern: '^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-[\w.-]+)?(?:\+[\w.-]+)?$'
+      description: SemVer 2.0.0 version of the pack.
   responses:
     Unauthenticated:
       description: Missing or invalid credential.
@@ -1304,3 +1581,31 @@ components:
                 supported:
                   type: array
                   items: { type: string, enum: [values, updates, messages, debug] }
+    TestPackPublishRecord:
+      description: |
+        Response body for a successful publish against the test-mode
+        registry namespace (RFC 0025). Mirror of the production publish
+        record returned by `PUT /v1/packs/{name}/-/{version}.tgz`.
+      type: object
+      required: [name, version, tarballSha256, publishedAt]
+      properties:
+        name:
+          type: string
+          description: Reverse-DNS pack name as PUT'd.
+        version:
+          type: string
+          description: SemVer 2.0.0 version as PUT'd.
+        tarballSha256:
+          type: string
+          pattern: '^sha256-[A-Za-z0-9+/=]+$'
+          description: Server-computed SHA-256 over the uploaded tarball bytes.
+        publishedAt:
+          type: string
+          format: date-time
+        signed:
+          type: boolean
+          description: Whether a sibling `.sig` signature blob was persisted.
+        signingMethod:
+          type: string
+          enum: [sigstore, manual, none]

package/coverage.md CHANGED Viewed

@@ -1,6 +1,6 @@
 # OpenWOP Conformance Coverage Map
-> **Status: Living document. Updated 2026-05-22.** This map connects the current scenario files to the protocol surfaces they protect and records the remaining gaps from the protocol deep dive. Scenario names are source-of-truth file names under `conformance/src/scenarios/`.
+> **Status: Living document. Updated 2026-05-25.** This map connects the current scenario files to the protocol surfaces they protect and records the remaining gaps from the protocol deep dive. Scenario names are source-of-truth file names under `conformance/src/scenarios/`.
 > **Shape grade vs behavior grade.** Some optional-profile scenarios validate **capability shape** (the host's discovery advertisement is well-formed) without yet exercising **behavior** (the host actually implements the profile end-to-end). The "Current grade" column reflects shape; see §"Capability-gated scenarios: shape vs behavior" below for the dual-grade view and the `OPENWOP_REQUIRE_BEHAVIOR=true` strict-mode runner flag.
@@ -40,10 +40,14 @@
 | Envelope variant discrimination + model capabilities (RFC 0031 — `spec/v1/ai-envelope.md` §"Variant payload discrimination (normative)", `spec/v1/host-capabilities.md` §"Model-capability declarations", `spec/v1/node-packs.md` §"Model-capability declarations on NodeModules") | `envelope-variant-discriminator-static.test.ts`, `model-capability-substituted.test.ts`, `model-capability-insufficient.test.ts`, `node-module-required-capabilities-shape.test.ts` | B+ (discriminator-static + advertisement-shape always-on; 14 live behavioral assertions across substitution + insufficient + authoring-convention, capability-gated) | RFC 0031 promoted Draft → Active 2026-05-20. `envelope-variant-discriminator-static` (always-on) walks every `schemas/envelopes/*.schema.json` asserting no `oneOf` at any nesting depth (Gemini silently drops `oneOf`, producing looser-than-declared schemas — a silent correctness bug) AND every `anyOf` branch declares a single-string-`enum` discriminator in `required` per RFC 0031 §A. `model-capability-substituted` (capability-gated on `capabilities.modelCapabilities.supported` + `substitutionSupported`) carries advertisement-shape check on the `advertised: string[]` pattern (each identifier matches the spec-reserved set OR `^x-host-<host>-<key>$` per RFC 0031 §C) + 4 live behavioral assertions covering substitution emission + SECURITY invariant `model-capability-substituted-no-credential-disclosure`'s all-or-nothing `"[REDACTED]"` redaction option. `model-capability-insufficient` (capability-gated on `modelCapabilities.supported`) carries 6 live behavioral assertions covering refusal emission paths + the no-recursive-fallback constraint (RFC 0031 §"Unresolved questions" #3 — `fallbackAttempted: true` when the declared fallback itself fails; NO chaining). `node-module-required-capabilities-shape` (SHOULD-tier authoring convention check) carries 4 live assertions for the `core.ai.*` typeId-pattern recommendation. Path to `Accepted`: reference host implements `executor/modelCapabilityGate.ts` end-to-end + advertises `capabilities.modelCapabilities: { supported: true, advertised: [...], substitutionSupported: true }` (the live behavioral assertions soft-skip cleanly on hosts that haven't wired the executor yet). |
 | Envelope-reliability run-event vocabulary (RFC 0032 — `spec/v1/ai-envelope.md` §"Envelope-reliability events" + line-448 scope clarification, `spec/v1/observability.md` §"Envelope-reliability events (RFC 0032)") | `envelope-retry-attempted.test.ts`, `envelope-retry-exhausted.test.ts`, `envelope-refusal-shape.test.ts`, `envelope-truncated.test.ts`, `envelope-nl-to-format-engaged.test.ts`, `envelope-recovery-applied.test.ts` | B (1 shared advertisement-shape probe with MUST-events enforcement; 34 live behavioral assertions across the six events, all capability- + fixture-gated) | RFC 0032 promoted Draft → Active 2026-05-20. Carries the central `ai-envelope.md` line-448 scope clarification (per-kind routing events forbidden; cross-kind operational events permitted via RFC). `envelope-retry-attempted` carries the shared advertisement-shape probe: when `capabilities.envelopes.reliability.supported: true`, the host MUST list both `envelope.retry.exhausted` AND `envelope.refusal` in `events[]` (the two MUST-tier events per RFC 0032 §C); `maxRetryAttempts` MUST be in `[1, 16]`. The six scenarios collectively carry 34 live behavioral assertions (drained 2026-05-19 via the conformance `mock` provider + `POST /v1/host/sample/test/mock-ai/program` seam): retry on schema-violation + retry on truncation + retry-exhausted terminal failure + provider refusal (no-retry MUST per RFC 0032 §B.3 + RFC 0033 §D) + truncation cut-off + NL-to-Format escalation (Tam et al. mitigation per arXiv 2408.02442) + lenient-parsing recovery + SECURITY invariants `envelope-refusal-no-prompt-leak` (BYOK + prompt-content redaction on `refusalText`) and `envelope-recovery-no-content-leak` (no pre-recovery substrings in the event payload). Path to `Accepted`: reference host implements `executor/envelopeReliability.ts` end-to-end + advertises `capabilities.envelopes.reliability: { supported: true, events: [...], maxRetryAttempts: <n> }` (the behavioral assertions already pass against the reference host's end-to-end emission path under `OPENWOP_ENVELOPE_RELIABILITY_END_TO_END=true`; the no-flag default still soft-skips). |
 | Envelope-completion retry routing (RFC 0033 — `spec/v1/ai-envelope.md` §"Envelope-completion criteria", `spec/v1/observability.md` §"Envelope-completion retry routing (RFC 0033)") | `envelope-completion-distinguishes-truncation.test.ts`, `envelope-truncation-cap-exhaustion.test.ts` | B− (1 advertisement-shape probe on `completion.{distinguishesTruncation, truncationBudgetMultiplier}`; 9 live behavioral assertions across the two retry paths + the DoS-bound assertion) | RFC 0033 promoted Draft → Active 2026-05-20. Closes `spec/v1/ai-envelope.md` §"Open spec gaps" E5 (refusal-mode + retry-policy interaction). Reuses RFC 0032's event vocabulary; introduces NO new event types. `envelope-completion-distinguishes-truncation` (capability-gated on `completion.distinguishesTruncation: true`) carries 5 live behavioral assertions covering both retry paths — truncation MUST increase output budget (RECOMMENDED 2× per `truncationBudgetMultiplier`) WITHOUT a corrective fragment; schema-violation MUST add a corrective fragment WITHOUT a budget change. `envelope-truncation-cap-exhaustion` carries 4 live behavioral assertions covering the DoS-bound assertion (truncation retries count against `Capabilities.limits.schemaRounds`; exhaustion → `envelope.retry.exhausted { finalReason: "truncation" }` + `cap.breached { kind: "schema" }` + node fails with NEW error code `envelope_truncation_unrecoverable` per RFC 0033 §F). All 9 assertions are fixture- + capability-gated against the conformance `mock` provider via `POST /v1/host/sample/test/mock-ai/program`. Path to `Accepted`: reference host implements the truncation-vs-schema-violation retry-routing branch end-to-end (`executor/envelopeReliability.ts` + `stop_reason` inspection in `aiProviders/aiProvidersHost.ts`) + advertises `capabilities.envelopes.reliability.completion.distinguishesTruncation: true`. |
-| Multi-agent execution model + handoff state machine (RFC 0037 Phase 1 — `spec/v1/multi-agent-execution.md`) | `multi-agent-handoff-state-machine.test.ts` | B (1 advertisement-shape probe + 1 behavioral 4-event causation-chain assertion against the parent+child fixture pair) | RFC 0037 Phase 1 filed Draft → promoted Active 2026-05-21 after spec + schema + scenario landed atomically. Advertisement-shape probe asserts `capabilities.multiAgent.executionModel.{supported, version ∈ [1,4]}` when present. Behavioral assertion drives the `conformance-multi-agent-handoff` parent + `conformance-multi-agent-handoff-child` fixture pair: runs the supervisor → next-worker → child completed loop and asserts the 4 `core.workflowChain.event` records appear in the exact phase sequence `dispatch.began → dispatch.succeeded → child.completed → output.harvested` with each event's `causationId === prior.eventId` and `dispatch.began.causationId === runOrchestrator.decided.eventId`, plus `output.harvested.harvestedKeys === ['parentResult']` (proves the spec §"Transition events" table on real wire). Reference workflow-engine advertises + emits end-to-end when `OPENWOP_MULTI_AGENT_EXECUTION_MODEL=true`; the no-flag default soft-skips honestly. Path to `Accepted`: non-steward host advertises + the behavioral assertion passes against it. |
-| Multi-agent Phase 2 confidence-floor escalation (RFC 0039 — `spec/v1/multi-agent-execution.md` §"Confidence escalation") | `multi-agent-confidence-escalation.test.ts` | B (1 advertisement-shape probe on `confidenceEscalationFloor` + 1 behavioral assertion against the low-confidence fixture) | RFC 0039 Phase 2 filed Draft → promoted Active 2026-05-22 after the confidence-floor half landed end-to-end. Advertisement-shape probe asserts `capabilities.multiAgent.executionModel.confidenceEscalationFloor` (when present) is a number in `[0.5, 1.0]`; values below the spec floor are non-conformant. Behavioral assertion drives the `conformance-multi-agent-confidence-escalation` fixture (supervisor `mockDispatchPlan` carries one decision with `confidence: 0.3`) and asserts: parent reaches `waiting-clarification` (NOT `completed` because no dispatch fired); exactly ONE `core.workflowChain.confidence-escalated` event with `payload.confidence === 0.3`, `payload.floor ∈ [0.5, 1.0]`, `payload.escalationKind ∈ {clarify, escalate}`; causationId chains back to the `runOrchestrator.decided` event; ZERO `core.workflowChain.event` records (the load-bearing distinction from Phase 1 — confidence floor MUST fire BEFORE any dispatch.began). Reference workflow-engine advertises `version: 2` + `confidenceEscalationFloor: 0.5` when both `OPENWOP_MULTI_AGENT_EXECUTION_MODEL=true` AND `OPENWOP_MULTI_AGENT_EXECUTION_MODEL_PHASE_2=true` are set; floor tunable via `OPENWOP_MULTI_AGENT_CONFIDENCE_FLOOR`. Path to `Accepted`: non-steward host advertises `version: 2` + the behavioral assertion passes against it. Memory-lifecycle half of RFC 0039 (MAE-2/3) remains explicit follow-up: `crossChildMemoryConcurrency` capability field is schema-landed but the host's MemoryAdapter doesn't yet implement either contract. |
+| Multi-agent execution model + handoff state machine (RFC 0037 — `spec/v1/multi-agent-execution.md`, `version: 1`) | `multi-agent-handoff-state-machine.test.ts` | B (1 advertisement-shape probe + 1 behavioral 4-event causation-chain assertion against the parent+child fixture pair) | RFC 0037 filed Draft → promoted Active 2026-05-21 after spec + schema + scenario landed atomically. Advertisement-shape probe asserts `capabilities.multiAgent.executionModel.{supported, version ∈ [1,4]}` when present. Behavioral assertion drives the `conformance-multi-agent-handoff` parent + `conformance-multi-agent-handoff-child` fixture pair: runs the supervisor → next-worker → child completed loop and asserts the 4 `core.workflowChain.event` records appear in the exact phase sequence `dispatch.began → dispatch.succeeded → child.completed → output.harvested` with each event's `causationId === prior.eventId` and `dispatch.began.causationId === runOrchestrator.decided.eventId`, plus `output.harvested.harvestedKeys === ['parentResult']` (proves the spec §"Transition events" table on real wire). Reference workflow-engine advertises + emits end-to-end when `OPENWOP_MULTI_AGENT_EXECUTION_MODEL=true`; the no-flag default soft-skips honestly. Path to `Accepted`: non-steward host advertises + the behavioral assertion passes against it. |
+| Multi-agent confidence-floor escalation (RFC 0039 — `spec/v1/multi-agent-execution.md` §"Confidence escalation", `version: 2`) | `multi-agent-confidence-escalation.test.ts` | B (1 advertisement-shape probe on `confidenceEscalationFloor` + 1 behavioral assertion against the low-confidence fixture) | RFC 0039 filed Draft → promoted Active 2026-05-22 after the confidence-floor half landed end-to-end. Advertisement-shape probe asserts `capabilities.multiAgent.executionModel.confidenceEscalationFloor` (when present) is a number in `[0.5, 1.0]`; values below the spec floor are non-conformant. Behavioral assertion drives the `conformance-multi-agent-confidence-escalation` fixture (supervisor `mockDispatchPlan` carries one decision with `confidence: 0.3`) and asserts: parent reaches `waiting-clarification` (NOT `completed` because no dispatch fired); exactly ONE `core.workflowChain.confidence-escalated` event with `payload.confidence === 0.3`, `payload.floor ∈ [0.5, 1.0]`, `payload.escalationKind ∈ {clarify, escalate}`; causationId chains back to the `runOrchestrator.decided` event; ZERO `core.workflowChain.event` records (the load-bearing distinction from `version: 1` — confidence floor MUST fire BEFORE any dispatch.began). Reference workflow-engine advertises `version: 2` + `confidenceEscalationFloor: 0.5` when both `OPENWOP_MULTI_AGENT_EXECUTION_MODEL=true` AND `OPENWOP_MULTI_AGENT_EXECUTION_MODEL_PHASE_2=true` are set; floor tunable via `OPENWOP_MULTI_AGENT_CONFIDENCE_FLOOR`. Path to `Accepted`: non-steward host advertises `version: 2` + the behavioral assertion passes against it. Memory-lifecycle half of RFC 0039 (MAE-2/3) remains explicit follow-up: `crossChildMemoryConcurrency` capability field is schema-landed but the host's MemoryAdapter doesn't yet implement either contract. |
 | Sandbox execution contract (RFC 0035 — `spec/v1/host-capabilities.md` §"Sandbox execution contract") | `sandbox-no-host-fs-escape.test.ts`, `sandbox-no-host-env-leak.test.ts`, `sandbox-no-network-escape.test.ts`, `sandbox-no-host-process-escape.test.ts`, `sandbox-memory-cap.test.ts`, `sandbox-timeout-cap.test.ts`, `sandbox-capability-gate-respected.test.ts`, `sandbox-no-cross-pack-mutation.test.ts` | C+ (advertisement-shape probes always-on; 8 capability-gated behavioral stubs scaffolded; soft-skip on hosts that don't advertise `capabilities.sandbox.supported`) | RFC 0035 promoted Draft → Active 2026-05-21. 8 scenarios, one per `node-pack-sandbox-*` invariant in `SECURITY/invariants.yaml`. Behavioral assertions remain stubbed with `expect(true).toBe(true)` + docstring expected-wire-shape pending the synthetic `vendor.openwop.misbehaving-sandbox` pack + a first sandbox-executing reference host. Path to `Accepted`: first sandbox-executing host advertises + implements the 8 failure-mode invariants + the 8 scenarios pass; at that point the 8 `node-pack-sandbox-*` SECURITY rows graduate from `reference-impl` → `protocol` tier per RFC 0035 §"Acceptance criteria." |
-| Multi-region idempotency + cross-engine append-ordering (RFC 0036 — `spec/v1/idempotency.md` §"`multiRegion` sub-block", `spec/v1/replay.md` §"Cross-region replay") | `multi-region-idempotency.test.ts`, `cross-engine-append-ordering.test.ts` | C+ (2 categorical-shape probes always-on + 1 granular `multiRegion` shape probe + 1 `crossEngineOrdering` shape probe; behavioral assertions deferred to simulator landing per RFC 0036 §C) | RFC 0036 promoted Draft → Active 2026-05-21. The existing `multi-region-idempotency.test.ts` covers the categorical `capabilities.idempotency.crossRegion ∈ {single-region, best-effort, strict}` claim plus the matching operator-tier metric names; a third describe block added 2026-05-21 covers the granular `capabilities.idempotency.multiRegion.{supported, replicationLagBoundMs, partitionRecoveryStrategy}` advertisement shape (`replicationLagBoundMs ∈ [0, 60000]`; `partitionRecoveryStrategy ∈ {last-writer-wins, first-writer-wins}` OR `^x-host-<host>-<key>$`). NEW `cross-engine-append-ordering.test.ts` covers `capabilities.eventLog.crossEngineOrdering.{supported, orderingModel ∈ {lamport, vector-clock, global-sequencer}}` shape. Behavioral two-engine-append-then-cross-read assertion deferred until the Postgres reference host's multi-region simulator lands per RFC 0036 §C. Path to `Accepted`: simulator + behavioral conformance pass against the reference host; non-steward host advertises the same. |
+| Multi-region idempotency + cross-engine append-ordering (RFC 0036 — `spec/v1/idempotency.md` §"`multiRegion` sub-block", `spec/v1/replay.md` §"Cross-region replay") | `multi-region-idempotency.test.ts`, `cross-engine-append-ordering.test.ts`, **`multi-region-idempotency-behavior.test.ts` (2026-05-22)**, **`cross-engine-append-behavior.test.ts` (2026-05-22)** | A (2 categorical-shape probes always-on + 1 granular `multiRegion` shape probe + 1 `crossEngineOrdering` shape probe + 6 multi-region behavioral assertions + 4 cross-engine Lamport-ordering behavioral assertions; all 10 behavioral assertions PASS against the reference workflow-engine when `OPENWOP_TEST_MULTI_REGION_SIMULATOR=true` + `OPENWOP_TEST_CROSS_ENGINE_HARNESS=true` are set) | RFC 0036 §B + §C behavioral close-out landed 2026-05-22 via the new workflow-engine test seams (`POST /v1/host/sample/test/multi-region/simulate-partition` + `POST/GET /v1/host/sample/test/cross-engine/{append,read,reset}`) — see `spec/v1/host-sample-test-seams.md` §6 + §7. The new `multi-region-idempotency-behavior.test.ts` exercises the canonical lex-min convergence rule + order-invariance + 400-on-mismatch; the new `cross-engine-append-behavior.test.ts` exercises Lamport-clock monotonicity + per-engine order preservation + read-determinism. Path to `Accepted`: non-steward host advertises matching capabilities + the behavioral assertions pass against it. |
+| Secret-leakage telemetry / debug-bundle export (RFC 0034 §B — `spec/v1/host-capabilities.md` §"OTel collector test seam") | **`secret-leakage-otel-attribute.test.ts` (2026-05-22)** | A− (3 capability-gated probes — OTel span scrape + debug-bundle scrape + advertisement-shape; soft-skips honestly until host advertises `capabilities.observability.testSeams.{otelScrape, debugBundleExport}` AND `capabilities.secrets.supported` AND `OPENWOP_CANARY_SECRET_VALUE` env is set) | Broadens the existing protocol-tier `secret-leakage-otel-attribute` + `secret-leakage-debug-bundle-otel` SECURITY invariants from envelope-acceptor-narrow (already covered by `envelope-reasoning-secret-redaction.test.ts`) to executor-side-broad. Drives the existing `openwop-smoke-byok-roundtrip` fixture; scrapes both seams after run completion; hard-fails if the BYOK canary plaintext appears in any OTel span attribute or debug-bundle field. |
+| Experimental capability tier (RFC 0042 — `schemas/capabilities.schema.json` §`multiAgent.executionModel.tier`) | **`experimental-tier-shape.test.ts` (2026-05-22)** | A (6 server-free + helper-routing assertions across §A schema discipline + §D experimentalGate routing; always-on for hosts that advertise tier='experimental' on any capability sub-block; helper-level behavioral probes for the `experimentalGate()` routing under both default + OPENWOP_REQUIRE_EXPERIMENTAL modes) | RFC 0042 (Draft) lands the audit's "Active RFC → carve-out" pattern. Schema diff lands on `multiAgent.executionModel` with optional `tier ∈ {stable, experimental}` + `experimentalUntil` (ISO-8601 sunset) + `if/then` conditional enforcing §B sunset MUST mechanically. New `experimentalGate()` helper in `conformance/src/lib/behavior-gate.ts` routes scenarios under default mode + `OPENWOP_REQUIRE_EXPERIMENTAL=true` strict-mode. |
+| Sandbox MVP behavioral close-out (RFC 0035 §B) | **`sandbox-mvp-behavior.test.ts` (2026-05-22)** | A (10 capability-gated behavioral assertions covering 7 of 8 §B failure-mode invariants — 5 escape kinds + timeout + memory-exceeded + cross-pack-mutation isolation + capability-gate-violation + 2 well-behaved baselines; all 10 PASS against the workflow-engine's node:vm-based sandbox MVP) | Companion to the existing 8 advertisement-shape sandbox scenarios (`sandbox-no-host-fs-escape.test.ts` et al.). Exercises the canonical 4-code error catalog at `spec/v1/host-capabilities.md` §"Error codes" (`sandbox_escape_attempt` + `sandbox_capability_denied` + `sandbox_memory_exceeded` + `sandbox_timeout`) with spec-mandated `details.{escapeKind, requestedCapability, requestedBytes}` populated. Wire-shape per `spec/v1/host-sample-test-seams.md §8`. Production adopters use wasmtime/nsjail behind the same HTTP test-seam contract. |
+| RFC 0041 §B replay-divergence-at-refusal behavioral (`version: 4`) | `replay-divergence-at-refusal.test.ts` (advertisement-shape + behavioral; 3 assertions PASS against workflow-engine when the `multiAgent.executionModel.version: 4` advertisement is enabled) | A (was `it.todo` until 2026-05-23 when the executor wiring landed — see commit `1fce55a` + `bba3b4a`. Behavioral assertions cover both divergence directions: original=valid + replay=refusal AND original=refusal + replay=valid) | Closes Track #4 of the 2026-05-22 multi-agent behavioral-harness close-out. Reference workflow-engine emits `replay.divergedAtRefusal` event + fails run with `error.code: 'replay_diverged_at_refusal'` when source vs replay envelope kinds differ at the same nodeId. Gated on `OPENWOP_MULTI_AGENT_EXECUTION_MODEL_PHASE_4=true` AND `run.forkMode === 'replay'`. Path-to-Accepted for RFC 0041: non-steward host advertises `multiAgent.executionModel.version: 4` end-to-end. |
 ---
@@ -71,6 +75,23 @@ Twenty-two scenario groups validate optional profiles where the host's discovery
 | `replay-retention-expiry.test.ts` | `openwop-replay-fork` (`replay.md` §"Retention and garbage collection") | B (capability shape always; 410/422 envelope on expired-range fork gated on `OPENWOP_TEST_EXPIRED_REPLAY_RUN_ID`; details.{sourceRunId, fromSeq, retentionBoundary} soft-checks per spec SHOULD) | `host-pending` | Reference host advertises `replay.supported: true` + operator produces a known-expired run id (no standardized force-expire endpoint per RFC 0009 Q#1). |
 | `discovery.test.ts` — auth-scoped subtests (3 of them) | `openwop-discovery-auth-scoped` (`capabilities-change-detection.md` §"Scoped capability views", RFC 0011) | B (capability shape + mode/endpointPath typing always; required-field-preservation in authenticated view always; authorization-oracle probe gated on `OPENWOP_TEST_UNAUTHORIZED_API_KEY`) | `host-pending` | Reference host advertises `capabilities.discovery.authScoped.supported: true` + serves an authenticated capability view that satisfies the base schema + a tenant-scoped key pair for the oracle probe. |
 | `fs-path-traversal.test.ts` | `capabilities.fs` (RFC 0014, `host-fs-capability.md`) | A (advertisement shape + two path-escape probes asserting `path_outside_sandbox`) | host-pass (workflow-engine reference) | Reference host advertises `capabilities.fs.supported: true` with sandboxRoot under `<dataDir>/host-fs`. |
+| `credentials-capability-shape.test.ts` | `capabilities.credentials` (RFC 0046, `host-capabilities.md` §host.credentials) | A (advertisement shape always — `supported` boolean; `scopes` ⊆ {user,workspace,tenant}; `rotation` ∈ {none,two-key-overlap}) | `host-pending` | Always runs; asserts the block is absent or well-formed. No host advertises `capabilities.credentials` yet (RFC 0046 `Draft`). |
+| `credential-payload-redaction.test.ts` | `capabilities.credentials` (RFC 0046) + `SECURITY/invariants.yaml` `credential-payload-redaction` | A (advertisement shape always; redaction MUST-NOT via optional `POST /v1/host/sample/credentials/echo` seam — canary plaintext absent from all observable surfaces) | `host-pending` | Capability-gated on `credentials.supported`; behavioral probe soft-skips on 404 when the seam is unwired, mirroring `fs-path-traversal`. |
+| `oauth-capability-shape.test.ts` | `capabilities.oauth` (RFC 0047, `host-capabilities.md` §host.oauth) | A (advertisement shape always — `supported` boolean; `grants` ⊆ {authorization_code,client_credentials,refresh_token}; every `providers[].id` non-empty) | `host-pending` | Always runs; asserts the block is absent or well-formed. No host advertises `capabilities.oauth` yet (RFC 0047 `Draft`). |
+| `oauth-connector-redaction.test.ts` | `capabilities.oauth` (RFC 0047) + `SECURITY/invariants.yaml` `credential-payload-redaction` | A (advertisement shape always; token-material redaction via optional `POST /v1/host/sample/oauth/connector-echo` seam — canary token absent from all observable surfaces; `connector.authorized` carries the ref not the token) | `host-pending` | Capability-gated on `oauth.supported`; behavioral probe soft-skips on 404. Reuses the RFC 0046 redaction invariant (OAuth tokens are stored as host.credentials entries). |
+| `connector-manifest-validity.test.ts` | `node-pack-manifest.schema.json` §Connector (RFC 0045, `node-packs.md` §Connectors) | Server-free (schema validity of the `connector` block incl. both ConnectorAuth variants + positive/negative round-trip; §B action/trigger typeId-resolution semantics — `connector_action_unresolved` on an unknown typeId) | host-pass (server-free) | Always runs; no host needed. Behavioral idempotency-hint + rate-limit-honored scenarios deferred until a host advertises a connector. |
+| `identity-owner-shape.test.ts` | `run-snapshot.schema.json` properties.owner (RFC 0048 §C, `auth.md` §Identity claims) | Server-free (owner triple schema validity: positive `{tenant}` + full triple; negative missing-tenant + unknown-prop) | host-pass (server-free) | Always runs; no host needed. |
+| `cross-workspace-isolation.test.ts` | RFC 0048 §C/§D (`auth.md` §Identity claims, `rest-endpoints.md` `run_forbidden`) | A (owner-echo shape if a sample run is readable; §D isolation MUST-NOT via optional `POST /v1/host/sample/identity/cross-workspace-read` seam — cross-workspace read fails closed with `run_forbidden`/`not_found`) | `host-pending` | Behavioral probe soft-skips on 404; no host advertises run ownership yet (RFC 0048 `Draft`). |
+| `authorization-roles-shape.test.ts` | `capabilities.authorization` (RFC 0049 §A, `auth.md` §"Role-based authorization") | A (advertisement shape always — `supported` boolean; `failClosed` const true; every `roles[].role` non-empty + `scopes` array) | `host-pending` | Always runs; asserts the block is absent or well-formed. |
+| `authorization-fail-closed.test.ts` | `capabilities.authorization` (RFC 0049 §C) + `SECURITY/invariants.yaml` `authorization-fail-closed` | A (advertisement `failClosed===true` always; fail-closed MUST-NOT via optional `POST /v1/host/sample/authorization/decide` seam — an unseeded-role principal resolves `allowed:false`) | `host-pending` | Capability-gated on `authorization.supported`; behavioral probe soft-skips on 404. Scope-match + denial-audited scenarios deferred to a host. |
+| `auth-saml-profile.test.ts` | `openwop-auth-saml` (RFC 0050, `auth-profiles.md` §`openwop-auth-saml`) | A+B (profile-advertisement shape always; **1-positive + 6-negative reference suite runs server-free** via the bundled synthetic IdP `conformance/src/lib/saml-idp.ts` — `alg:none`/unsigned/bad-sig/expired/not-yet-valid/wrapping; host-ACS validation opt-in via `OPENWOP_TEST_SAML_IDP_URL` + the `auth/saml/validate` seam) | host-pass (server-free reference) | Synthetic IdP bundled (`node:crypto`, no deps). Host-ACS pass is the remaining graduation gate. |
+| `auth-scim-profile.test.ts` | `openwop-auth-scim` (RFC 0050, `auth-profiles.md` §`openwop-auth-scim`) | B (profile-advertisement shape always; SCIM user/group provisioning → principal/role roundtrip opt-in via `OPENWOP_TEST_SCIM_URL` + the `auth/scim/provision` seam) | `host-pending` | Behavior opt-in (operator-supplied SCIM endpoint); deactivate ⇒ subsequent-deny assertion deferred to a host. |
+| `approval-gate-events.test.ts` | `approval.granted` / `.rejected` / `.overridden` (RFC 0051 §B, `interrupt-profiles.md` §approvalGate) | Server-free (event-payload schema validity: required fields incl. mandatory `overridden.reason`; additionalProperties:false negatives) | host-pass (server-free) | Always runs; no host needed. |
+| `approval-gate-flow.test.ts` | `core.openwop.governance.approvalGate` (RFC 0051 §A) + `capabilities.authorization` (RFC 0049) | A (capability-gated on `authorization.supported`; unauthorized-principal-denied + override-audited via the `governance/approval-gate` seam) | `host-pending` | Behavioral probe soft-skips on 404. Grant/reject-loopback/quorum scenarios deferred until a governance host wires the seam. |
+| `scheduling-capability-shape.test.ts` | `capabilities.scheduling` (RFC 0052 §A, `host-capabilities.md` §host.scheduling) | A (advertisement shape always — `supported` boolean; `cron`/`delayed`/`calendar` booleans; `maxFutureHorizon` ISO-8601 duration) | `host-pending` | Always runs; asserts the block is absent or well-formed. |
+| `scheduling-cron-fires-once.test.ts` | `capabilities.scheduling` (RFC 0052 §B) | A (once-per-tick + missed-tick MUST-NOT via optional `POST /v1/host/sample/scheduling/tick` seam — single tick fires exactly one run; missed window never floods) | `host-pending` | Capability-gated on `scheduling.supported` + `cron`; soft-skips on 404. Delayed-horizon + calendar scenarios deferred. |
+| `deadletter-capability-shape.test.ts` | `capabilities.deadLetter` (RFC 0053 §A, `host-capabilities.md` §host.deadLetter) | A (advertisement shape always — `supported` boolean; `retentionDays` integer ≥ 1) | `host-pending` | Always runs; asserts the block is absent or well-formed. |
+| `deadletter-retry-exhaustion.test.ts` | `capabilities.deadLetter` (RFC 0053 §C) + `run.dead_lettered` event | A (retry-exhaustion → `run.dead_lettered` with `attempts` + dead-lettered run fork-eligible, via optional `POST /v1/host/sample/deadletter/exhaust` seam) | `host-pending` | Capability-gated on `deadLetter.supported`; soft-skips on 404. Retention-purge scenario deferred (needs clock seam). |
 | `kv-cross-tenant-isolation.test.ts`, `kv-atomic-increment.test.ts`, `kv-cas.test.ts` (three scenarios) | `capabilities.kvStorage` (RFC 0015, `host-kv-storage-capability.md`) + `SECURITY/invariants.yaml` `kv-cross-tenant-isolation` | A (advertisement shape always; behavioral cross-tenant `set`/`get`, 50× concurrent atomic increment convergence, CAS matching/stale-expect) | host-pass via opt-in test seam | Reference host exposes `POST /v1/host/sample/test/surface` env-gated on `OPENWOP_TEST_SEAM_ENABLED=true`; hosts that don't expose the seam soft-skip the behavioral assertions and verify advertisement shape only. |
 | `table-cross-tenant-isolation.test.ts` | `capabilities.tableStorage` (RFC 0016, `host-table-storage-capability.md`) | A (advertisement shape + behavioral cross-tenant insert/query proof) | host-pass via opt-in test seam | Same seam dependency as kv row. |
 | `queue-cross-tenant-isolation.test.ts` | `capabilities.queueBus` (RFC 0017, `host-queue-bus-capability.md`) + `SECURITY/invariants.yaml` `queue-cross-tenant-isolation` | A (advertisement shape + behavioral cross-tenant publish/consume proof) | host-pass via opt-in test seam | Same seam dependency as kv row. |
@@ -80,6 +101,8 @@ Twenty-two scenario groups validate optional profiles where the host's discovery
 | `prompt-end-to-end-events.test.ts`, `prompt-resolution-chain-node-wins.test.ts`, `prompt-resolution-chain-fallback-cascade.test.ts` (three scenarios) | `prompts-supported` profile — gates on `capabilities.prompts.supported: true` (RFC 0027 + RFC 0029, `spec/v1/prompts.md`) | A (advertisement shape always + end-to-end resolve + emit during real workflow dispatch; resolution chain Layers 1, 3, 4 exercised) | host-pass (workflow-engine reference) | Reference host advertises `capabilities.prompts.supported: true` since RFC 0027 ref-impl landed; dispatch wiring in `bootstrap/nodes.ts` walks the resolution chain and emits `agent.promptResolved` + `prompt.composed` per spec/v1/prompts.md §"Composition + observability". |
 | `prompt-pack-install.test.ts`, `prompt-list-and-fetch.test.ts`, `prompt-render-deterministic.test.ts` (three scenarios) | `prompts-endpoints` profile — gates on `capabilities.prompts.endpointsSupported: true` (RFC 0028 §A, `spec/v1/prompts.md` §"Discovery & distribution") | A (advertisement shape always + list/get/render contract + pack-source provenance stamps + ETag honoring when supported) | host-pass (workflow-engine reference) | Reference host serves the six `/v1/prompts*` routes via `routes/prompts.ts` against the in-memory `PromptStore`. Pack-install existence claim opt-in via `OPENWOP_TEST_PROMPT_PACK_INSTALLED=true` (the in-tree `vendor.openwop.prompt-sample` pack auto-installs via `promptPackLoader.ts`). |
 | `prompt-mutable-lifecycle.test.ts` | `prompts-mutable` profile — gates on `capabilities.prompts.mutableLibrary: true` (RFC 0028 §C) | A (advertisement shape + CRUD lifecycle + pack/host source 403-on-mutation) | host-pass (workflow-engine reference) | Reference host advertises `mutableLibrary: true`; user-source templates accepted, pack + host-built-in templates return 403 on POST/PUT/DELETE. |
+| `prompt-mutation-workspace-membership-enforced.test.ts` | `prompts-mutable` profile — gates on `capabilities.prompts.mutableLibrary: true` (RFC 0028 Tier-2 follow-up, post-promotion) + `SECURITY/invariants.yaml` `prompt-mutation-workspace-membership-enforced` | A (advertisement gate + cross-workspace write refusal — drives `POST /v1/prompts` with a random non-member `workspaceId`, asserts any 4xx/5xx; on 403 specifically, additionally pins canonical `error === "workspace_membership_required"` envelope per `rest-endpoints.md` §"Common error codes"; other refusal codes unconstrained) | capability-gated (no reference-host membership backend yet; soft-skips cleanly until a host wires the workspace-member resolver) | Filed 2026-05-25 in response to a MyndHyve self-disclosed Admin-SDK-bypasses-DB-rules vulnerability on revision `00207-vzq`. T1 canonicalization same-day (2026-05-25) added the 403-envelope check. Operator override via `OPENWOP_TEST_NONMEMBER_WORKSPACE_ID`. |
+| `prompt-read-workspace-membership-enforced.test.ts` | `prompts-supported` profile — gates on `capabilities.prompts.supported: true` (broader than `mutableLibrary` per MyndHyve relay Option B: read-only hosts that expose `?workspaceId=` reads are NOT exempt from the symmetric authz invariant) + `SECURITY/invariants.yaml` `prompt-read-workspace-membership-enforced` | A (advertisement gate + cross-workspace read refusal — drives `GET /v1/prompts?workspaceId=<random-non-member>`, interprets response: 4xx PASS with canonical envelope check on 403; 200 with empty `templates[]` PASS as correct null result; 200 with non-empty `templates[]` FAIL as cross-tenant leak; 200 without `templates[]` field SKIP via response-shape detection — host doesn't expose workspace-scoped reads) | capability-gated (no reference-host workspace-scoped read backend yet; soft-skips cleanly on the response-shape detection) | T2 sister scenario filed 2026-05-25 alongside T1; same threat model as the write scenario but probes the read path. Read paths are NOT exempt from cross-tenant authz — a `GET ?workspaceId=<not-mine>` that returns another workspace's templates is a data leak with the same blast radius as a cross-tenant write. Uses response-shape detection (rather than a new capability field) to self-skip hosts without workspace-scoped reads. Operator override via `OPENWOP_TEST_NONMEMBER_WORKSPACE_ID`. |
 | `prompt-resolution-chain-agent-intrinsic.test.ts` | `prompts-agent-bindings` profile — gates on `capabilities.prompts.agentBindings: true` (RFC 0029 §A Layer 2) | A (advertisement shape + Layer 2 agent intrinsic / overrides / library-default precedence over Layers 3-4) | host-pass (workflow-engine reference) | Reference host advertises `agentBindings: true` so Layer 2 sub-layers (agent-intrinsic / agent-overrides / agent-library-default) walk per RFC 0029 §B. |
 | `prompt-composed-secret-redaction.test.ts`, `prompt-composed-trust-marker.test.ts` (two scenarios) | `prompts-observability-full` profile — gates on `prompts.supported + observability: "full"` (RFC 0027 §E + RFC 0020 §D) + `SECURITY/invariants.yaml` `prompt-composed-secret-redaction` + `prompt-composed-trust-marker` | A (advertisement shape + `[REDACTED:<credentialRef>]` markers for secret-source bindings + `<UNTRUSTED>...</UNTRUSTED>` wrapping + `contentTrust: "untrusted"` propagation) | host-pass (workflow-engine reference) | Reference host advertises `observability: "full"` (sourced from `host/promptHostConfig.ts`). Composition pipeline in `host/promptCompose.ts` enforces SR-1 carry-forward + untrusted-content marker per `SECURITY/threat-model-secret-leakage.md` §SR-1. |
@@ -118,6 +141,7 @@ Every OpenAPI operation should have:
 | `pauseRun` | `pause-resume.test.ts` covers direct route behavior for running → paused, idempotent re-pause, terminal conflict, and pause-during-suspend race | Conflict and race paths covered with `details.runStatus`; endpoint is no longer coverage-missing | Add explicit immediate-vs-drain-current-node policy assertion when a host advertises both drain policies. |
 | `resumeRun` | `pause-resume.test.ts` covers direct route behavior for paused → running and non-paused conflict | Conflict path covered with `details.runStatus`; endpoint is no longer coverage-missing | Good. |
 | `forkRun` | `replay-fork.test.ts`, `replayDeterminism.test.ts` | Negative `fromSeq`, past-end, unknown source, invalid overlay | Add arbitrary-event fork and retention-expired source. |
+| `diffRun` | `run-diff.test.ts` (RFC 0054); soft-skips on 404 when the endpoint is unimplemented | Self-diff `divergedAtSeq: null`/empty (determinism floor), two-fixture divergence with `eventDiffs[0].seq === divergedAtSeq`, response-shape + `stateDiff` redaction-safety, `400` (missing `against`) + `404` (nonexistent `against`) | Add a bespoke deterministically-divergent fork fixture for `divergedAtSeq === N`-at-a-chosen-seq; full cross-principal `403` needs a multi-principal harness. |
 | `resolveInterruptByRun` | `interrupt-approval.test.ts`, `interrupt-clarification.test.ts`, `approval-payload.test.ts`, `interruptRace.test.ts` | Invalid action, unknown node, race cases | Add auth-required and quorum profile scenarios. |
 | `inspectInterruptByToken` | `interrupt-token-matrix.test.ts` (CF-3, 2026-05-15) covers malformed + unknown token paths | Negative paths covered | Add explicit expired-token case when a host advertises a TTL seam. |
 | `resolveInterruptByToken` | `interrupt-token-matrix.test.ts` covers replay (already-resolved) + unknown token; `interrupt-external-event-correlation.test.ts` covers positive path | Replay path + unknown-token path covered with explicit assertions | Add wrong-action case once the host advertises a typed allowed-actions vocabulary in the interrupt manifest. |
@@ -130,6 +154,7 @@ Every OpenAPI operation should have:
 | `updatePromptTemplate` | `prompt-mutable-lifecycle.test.ts` covers positive update + non-monotonic-version conflict + pack-sourced-readonly 403 against the reference workflow-engine | Positive update + 403 readonly-source + 409 conflict covered | Add `501` not-mutable-library negative for hosts that advertise `mutableLibrary: false`. |
 | `deletePromptTemplate` | `prompt-mutable-lifecycle.test.ts` covers positive delete + pack-sourced-readonly 403 against the reference workflow-engine | Positive delete + 403 readonly-source covered | Add `501` not-mutable-library negative + `404` unknown-template scenarios. |
 | `renderPromptTemplate` | `prompt-render-deterministic.test.ts` exercises `POST /v1/prompts:render` end-to-end against the reference workflow-engine; deterministic-hash invariant verified across `:render` + `prompt.composed` event paths. `prompt-composed-secret-redaction.test.ts` + `prompt-composed-trust-marker.test.ts` exercise the shared compose pipeline via the `/v1/host/sample/prompt/compose` seam | Deterministic render + composition redaction + trust-marker invariants covered | Add `400 prompt_variable_unresolved` matrix for missing variables across all four PromptKinds. |
+| `putTestPackTarball`, `getTestPackTarball`, `deleteTestPackVersion`, `getTestPackSignature` | `pack-registry-publish.test.ts` covers the 19-code publish error catalog through the RFC 0025 `/v1/packs-test/*` mirror namespace, gated on `capabilities.packs.testMode.supported: true` (RFC 0025 §A). 26 scenarios soft-skip when the advertisement is absent; when present, the suite exercises URL/scope, body-shape, tarball-extraction, manifest-contents, integrity, auth/conflict, unpublish-window, and signature-endpoint pairing. | Soft-skip on advertisement absence; behavioral on advertisement presence | Add real-tarball-builder fixtures so the manifest_mismatch / pack_integrity_failure / unsupported_runtime branches assert against a meaningful gzip+tar payload (currently soft-skipped with explanatory comments). |
 ---