npm - @openwop/openwop-conformance - Versions diffs - 1.12.0 → 1.13.0 - Mend

@openwop/openwop-conformance 1.12.0 → 1.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/CHANGELOG.md +12 -0
package/README.md +2 -2
package/coverage.md +7 -5
package/package.json +1 -1
package/src/lib/agentDeployment.ts +117 -0
package/src/lib/agentEval.ts +83 -0
package/src/scenarios/agent-deployment-lifecycle.test.ts +147 -0
package/src/scenarios/agent-eval-run.test.ts +145 -0

package/CHANGELOG.md CHANGED

@@ -4,6 +4,18 @@
 _No unreleased changes._
+## [1.13.0] — 2026-05-31 — RFC 0081 eval-suite + RFC 0082 deployment behavioral gates
+Standalone conformance minor — a scenario addition published via the `openwop-conformance/v1.13.0` per-package tag (PUBLISHING.md §"CI automation"; only the `publish-conformance` job runs), NOT a coordinated spec-corpus release. `EXPECTED_CONFORMANCE_VERSION` advances to `1.13.0` in lockstep. All additive + capability-gated; existing v1.0-only hosts pass unchanged. Ships the two gated behavioral scenarios RFC 0081 + RFC 0082 each name in their §Conformance but deferred at `Draft → Active` — the steward prerequisite to graduating `agents.evalSuite` (RFC 0081) and `agents.deployment` (RFC 0082) from `Active` to `Accepted` on a non-steward host (MyndHyve). The normative surface (the `mode:"eval"` discriminator + `GET /v1/runs/{runId}/eval-summary`, the `GET`/`POST /v1/agents/{agentId}/deployments` endpoints + the `channel` binding, the `eval.*`/`deployment.*` events, and the SDK helpers) already shipped — this release is the gated test surface only.
+### Added — RFC 0081 behavioral scenario
+- **`agent-eval-run.test.ts`** (`behaviorGate('openwop-eval-run', …)`, gated on `agents.evalSuite.supported`) — drives the §B `mode:"eval"` projection via the new `POST /v1/host/sample/agents/eval-run` seam + the test event-log seam and asserts the §C ordering (`eval.started` first → one `eval.scored` per task → `eval.completed` once, count == `eval.completed.taskCount`), the content-free `eval.scored` legs (`score` ∈ 0..1, `passed` boolean, no task-output/rubric/completion — backing the protocol-tier `eval-summary-no-content-leak` invariant), and the NORMATIVE `GET /v1/runs/{runId}/eval-summary` returning a schema-valid `EvalSummary` with `passedCount <= taskCount`. The normative eval-summary read runs black-box. **This is the RFC 0081 → Accepted bar.** New lib helper `src/lib/agentEval.ts`; new seam in `host-sample-test-seams.md` §"Open seams". No new schemas (`agent-eval-suite.schema.json` + `eval-summary.schema.json` + the three `eval.*` payload $defs shipped at `Draft → Active`); no new SECURITY invariant.
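The §C ordering and content-free rules that `agent-eval-run.test.ts` asserts can be expressed as a pure check over a captured event list. This is a minimal sketch, not the contents of `src/lib/agentEval.ts` (which this diff does not show). The `RunEvent` shape (a `type` plus a flat `payload` record) and the forbidden-key list are assumptions for illustration.

```typescript
// Hypothetical, simplified event shape; the real RunEvent envelope carries more fields.
interface RunEvent {
  type: string;
  payload: Record<string, unknown>;
}

// Assumed names for content-bearing fields that an eval.scored leg must not carry.
const FORBIDDEN_SCORED_KEYS = ["taskOutput", "rubric", "completion"];

// Returns a list of violations; an empty list means the eval.* sequence conforms.
function checkEvalSequence(events: RunEvent[]): string[] {
  const errors: string[] = [];
  const evalEvents = events.filter((e) => e.type.startsWith("eval."));

  if (evalEvents[0]?.type !== "eval.started") {
    errors.push("eval.started must be the first eval.* event");
  }

  const completed = evalEvents.filter((e) => e.type === "eval.completed");
  if (completed.length !== 1) {
    errors.push(`expected exactly one eval.completed, got ${completed.length}`);
  } else if (evalEvents[evalEvents.length - 1]?.type !== "eval.completed") {
    errors.push("eval.completed must be the last eval.* event");
  }

  const scored = evalEvents.filter((e) => e.type === "eval.scored");
  for (const s of scored) {
    const { score, passed } = s.payload;
    if (typeof score !== "number" || score < 0 || score > 1) {
      errors.push("eval.scored.score must be a number in 0..1");
    }
    if (typeof passed !== "boolean") {
      errors.push("eval.scored.passed must be a boolean");
    }
    for (const key of FORBIDDEN_SCORED_KEYS) {
      if (key in s.payload) errors.push(`eval.scored carries content field ${key}`);
    }
  }

  // One eval.scored per task: the count must match eval.completed.taskCount.
  if (completed.length === 1 && completed[0]?.payload.taskCount !== scored.length) {
    errors.push("eval.scored count must equal eval.completed.taskCount");
  }
  return errors;
}
```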
+### Added — RFC 0082 behavioral scenario
+- **`agent-deployment-lifecycle.test.ts`** (`behaviorGate('openwop-deployment-lifecycle', …)`, gated on `agents.deployment.supported`) — drives the §E promotion contract via the new `POST /v1/host/sample/agents/deployment-transition` seam + the test event-log seam across four legs: `promote` (authorize RFC 0049 `deploy:promote` → RFC 0051 approvalGate → RFC 0081 eval-verify → a content-free `deployment.promoted` with a seven-state `toState` + `toVersion`, the returned record validating `agent-deployment.schema.json`); `unauthorized` (a principal without `deploy:promote` fails closed — `allowed:false`, NO `deployment.promoted`, the behavioral leg of the `deployment-promotion-fail-closed` invariant, which graduates reference-impl → protocol tier when this passes against a host); `eval-gate-unmet` (a promote whose `evalRunId` has `EvalSummary.passed:false` is denied `eval_gate_unmet` with NO `deployment.promoted`, §E-3); and `channel-pin` (a `@channel`-bound run records `resolvedAgentVersion` on `agent.invocation.started`, the §B recorded fact a replay re-reads). The normative `GET /v1/agents/{agentId}/deployments` read runs black-box. **This is the RFC 0082 → Accepted bar.** New lib helper `src/lib/agentDeployment.ts`; new seam in `host-sample-test-seams.md` §"Open seams". No new schemas (`agent-deployment.schema.json` + `agent-deployment-transition.schema.json` + the four `deployment.*` payload $defs shipped at `Draft → Active`); no new SECURITY invariant.
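The `promote`, `unauthorized`, and `eval-gate-unmet` legs all exercise one fail-closed ordering: authorize, then approve, then verify the eval, and only then promote. A hypothetical sketch of that decision as a single function follows. `PromoteRequest`, `decidePromotion`, and every denial code other than `eval_gate_unmet` are invented for illustration and do not come from `agent-deployment-transition.schema.json`; treating an unread `EvalSummary` as unmet is likewise an assumption.

```typescript
// Hypothetical inputs; a real host derives these from RFC 0049 scopes,
// the RFC 0051 approval gate, and the RFC 0081 EvalSummary for evalRunId.
interface PromoteRequest {
  principalScopes: string[];
  approvalGranted: boolean | undefined; // undefined = no approval decision recorded
  evalSummaryPassed: boolean | undefined; // undefined = summary missing or unreadable
}

type PromoteDecision = { allowed: true } | { allowed: false; reason: string };

// Fail closed: any missing or negative signal denies. Only an allowed
// decision would lead the host to emit deployment.promoted.
function decidePromotion(req: PromoteRequest): PromoteDecision {
  if (!req.principalScopes.includes("deploy:promote")) {
    return { allowed: false, reason: "scope_missing" }; // hypothetical code
  }
  if (req.approvalGranted !== true) {
    return { allowed: false, reason: "approval_required" }; // hypothetical code
  }
  if (req.evalSummaryPassed !== true) {
    return { allowed: false, reason: "eval_gate_unmet" };
  }
  return { allowed: true };
}
```

Checking with `!== true` rather than `=== false` is what makes the function fail closed: an absent signal is treated the same as a negative one.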
 ## [1.12.0] — 2026-05-31 — RFC 0087 org-chart + RFC 0083 trigger-bridge behavioral gates
 Standalone conformance minor — a scenario addition published via the `openwop-conformance/v1.12.0` per-package tag (PUBLISHING.md §"CI automation"; only the `publish-conformance` job runs), NOT a coordinated spec-corpus release. `EXPECTED_CONFORMANCE_VERSION` advances to `1.12.0` in lockstep. All additive + capability/profile/seam-gated; existing v1.0-only hosts pass unchanged. Ships the three gated behavioral scenarios RFC 0087 + RFC 0083 each name in their §Conformance but deferred at `Draft → Active` — the steward prerequisite to graduating `agents.orgChart` (RFC 0087) and `triggerBridge` (RFC 0083) from `Active` to `Accepted` on a non-steward host (MyndHyve).

package/README.md CHANGED

@@ -93,7 +93,7 @@ Exit code is non-zero on any failed assertion.
 ## What's Covered
-The current suite has 308 scenario files under `src/scenarios/`. 2026-05-31 (RFC 0083 — durable trigger bridge, the Active→Accepted behavioral gate) added `trigger-bridge-delivery.test.ts` (profile-gated on `openwop-trigger-bridge` derived from the live discovery doc — the §C delivery model via the `POST /v1/host/sample/trigger-bridge/deliver` seam + the test event-log seam: dedup→effectively-once `trigger.delivery.attempted{delivered}` (§C-1), retry-exhaustion→`{dead-lettered}` + `trigger.subscription.state.changed{toState:dead-lettered}` (§C-2 + RFC 0053), and the delivered run's `run.started.causationId` == the delivery id (§C / RFC 0040); both `trigger.*` events content-free; the always-on shape stays in `trigger-bridge-shape.test.ts`; new lib helper `src/lib/triggerBridge.ts`). 2026-05-31 (RFC 0087 — agent org-chart, the Active→Accepted behavioral gate) added two capability-gated behavioral scenarios (both gated on `agents.orgChart.supported`, black-box on the normative `/v1/agents/org-chart` surface — no new POST seam): `agent-org-chart-scoping.test.ts` (the `GET /v1/agents/org-chart` tree-shape — departments form an acyclic `parentDepartmentId` tree, members reference `host:<id>` roster entries — + the §D responsibility roll-up via `GET /v1/agents/org-chart/{departmentId}` with a deduped `responsibilities[]` union + the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ORG_CHART_DEPARTMENT_ID`) and `org-position-no-authority-escalation.test.ts` (the behavioral leg of the protocol-tier invariant — the live org-chart wire carries NO authority-bearing field on any member/department/responsibility-view object; the structural leg stays always-on in `agent-org-chart-shape.test.ts`, and the deeper RFC 0049/0051 authority-invariance legs stay reference-impl tier per the `agent-manifest-runtime` no-host-hook precedent). 
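The org-chart tree-shape and roll-up assertions reduce to two small graph operations. Following `parentDepartmentId` links from any department must terminate, and a roll-up is a deduplicated union of responsibilities. The sketch assumes a simplified department shape (`id`, `parentDepartmentId`, `responsibilities`) and assumes the roll-up covers a department plus all of its descendants. Neither assumption is confirmed by the text above, and `isAcyclicTree` and `rollUpResponsibilities` are illustrative names.

```typescript
// Simplified, hypothetical department shape for illustration only.
interface Department {
  id: string;
  parentDepartmentId?: string;
  responsibilities: string[];
}

// True when every parent chain terminates (no department reaches itself).
// A dangling parent id simply ends the walk here; a stricter check could flag it.
function isAcyclicTree(departments: Department[]): boolean {
  const byId = new Map(departments.map((d) => [d.id, d] as const));
  for (const start of departments) {
    const seen = new Set<string>();
    let current: Department | undefined = start;
    while (current) {
      if (seen.has(current.id)) return false; // revisited a node: cycle
      seen.add(current.id);
      current = current.parentDepartmentId ? byId.get(current.parentDepartmentId) : undefined;
    }
  }
  return true;
}

// Deduplicated union of responsibilities for rootId and its descendants.
// Precondition: the chart is acyclic (otherwise this would not terminate).
function rollUpResponsibilities(departments: Department[], rootId: string): string[] {
  const result = new Set<string>();
  const stack = [rootId];
  while (stack.length > 0) {
    const id = stack.pop()!;
    for (const d of departments) {
      if (d.id === id) d.responsibilities.forEach((r) => result.add(r));
      if (d.parentDepartmentId === id) stack.push(d.id);
    }
  }
  return Array.from(result);
}
```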
2026-05-31 (RFCs 0086 + 0077 — the Active→Accepted behavioral gate) added four capability-gated behavioral scenarios so a non-steward host can be mechanically certified non-vacuously under `OPENWOP_REQUIRE_BEHAVIOR=true`: `agent-roster-attribution.test.ts` (RFC 0086 §B/§C; gated on `agents.roster.supported` — the normative `GET /v1/agents/roster` read shape + `total==roster.length`, the §C `roster.run.initiated`-before-`agent.invocation.started` ordering, the content-free payload backing `roster-attribution-no-content`, the durable work-item `triggerSubscriptionId`, and the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ROSTER_ID`), `agent-live-invocation-bracket.test.ts` (RFC 0077 §E; gated on `agents.liveRuntime.supported` — `agent.invocation.started`-first / `agent.invocation.completed`-last bracket, matching `invocationId`, `source`/`outcome` closed enums, content-free), `agent-live-structured-output.test.ts` (RFC 0077 §B step 6; gated on `agents.liveRuntime.structuredOutput` — a result violating `handoff.returnSchemaRef` fails the invocation `outcome:"failed"` rather than shipping as completed), and `agent-live-allowlist-enforced.test.ts` (RFC 0077 §F-1 / RFC 0002 §A14; gated on `agents.liveRuntime.supported` — a tool outside `toolAllowlist` is not callable); all four drive the documented `POST /v1/host/sample/roster/fire` + `POST /v1/host/sample/agents/live-invoke` seams plus the test event-log seam and soft-skip on 404 (these are the RFC 0086 / 0077 Active→Accepted bars). 
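The RFC 0077 bracket rule and the RFC 0086 ordering rule are both positional checks over one run's event log. Here is a hypothetical sketch, assuming each logged event exposes `type` and an optional `invocationId` at the top level (the real payload placement may differ). `checkInvocationBracket` is an illustrative name.

```typescript
// Hypothetical flattened view of a run event.
interface LoggedEvent {
  type: string;
  invocationId?: string;
}

// For one invocation: exactly one started and one completed event, started
// first and completed last among that invocation's events, and any
// roster.run.initiated in the log must precede the started event.
function checkInvocationBracket(events: LoggedEvent[], invocationId: string): string[] {
  const errors: string[] = [];
  const own = events.filter((e) => e.invocationId === invocationId);
  const count = (t: string) => own.filter((e) => e.type === t).length;

  if (count("agent.invocation.started") !== 1) errors.push("expected one agent.invocation.started");
  if (count("agent.invocation.completed") !== 1) errors.push("expected one agent.invocation.completed");
  if (own[0]?.type !== "agent.invocation.started") errors.push("started must come first");
  if (own[own.length - 1]?.type !== "agent.invocation.completed") errors.push("completed must come last");

  const rosterIdx = events.findIndex((e) => e.type === "roster.run.initiated");
  const startedIdx = events.findIndex(
    (e) => e.type === "agent.invocation.started" && e.invocationId === invocationId,
  );
  if (rosterIdx !== -1 && startedIdx !== -1 && rosterIdx > startedIdx) {
    errors.push("roster.run.initiated must precede agent.invocation.started");
  }
  return errors;
}
```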
2026-05-30 (RFC 0087 — agent org-chart, Draft -> Active) added `agent-org-chart-shape.test.ts` (always-on server-free: the `capabilities.agents.orgChart` shape + the `AgentOrgChart` round-trip + the non-`host:` member negative + the **§B structural non-authority guarantee** — the schema rejects a `scopes`/`canDispatch`/`permissions`/`authority` field on a member (`additionalProperties:false`), and a member's key set is exactly `{rosterId, departmentId, roleId, reportsTo}` — backing the protocol-tier `org-position-no-authority-escalation` invariant; no new RunEventType). 2026-05-30 (RFC 0086 — standing agent roster, Draft -> Active) added `agent-roster-shape.test.ts` (always-on server-free: the `capabilities.agents.roster` shape + the `AgentRosterEntry` round-trip + the `host:` `rosterId` + `agentRef` version-XOR-channel negatives + the content-free `roster.run.initiated` negatives backing the protocol-tier `roster-attribution-no-content` invariant + the additive `roster` inventory projection + RunEventType-enum membership). 2026-05-30 (RFC 0082 — agent deployment lifecycle, Draft -> Active) added `agent-deployment-shape.test.ts` (always-on server-free: the `capabilities.agents.deployment` shape + the `AgentDeployment` record round-trip + the `AgentRef` `channel` XOR `version` `not`-clause + the four `deployment.*` payloads + the content-free negatives backing the protocol-tier `deployment-event-no-content-leak` invariant). 2026-05-30 (RFC 0085 — `openwop-agent-platform` meta-profile, Draft -> Active) added `agent-platform-profile.test.ts` (always-on server-free derivation of the operational-annex `none`/`partial`/`full` status: all-floor ⇒ partial, missing-flag ⇒ none, the replay-OR-`nondeterminismPolicy.declared` term, floor+governance ⇒ full, missing-tenant-scope ⇒ partial-not-full per the honest-advertisement rule, eval/deploy/budget-are-advisory-not-hard-terms, + the `capabilities.nondeterminismPolicy.declared` shape). 
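The §B structural non-authority guarantee comes down to a closed key set on every org-chart member. The same check can be sketched directly, outside Ajv. The four keys come from the text above; `checkMember` and the sample values in the test are hypothetical.

```typescript
// The only keys an org-chart member may carry, per the always-on shape scenario.
const MEMBER_KEYS = ["rosterId", "departmentId", "roleId", "reportsTo"] as const;

// `extra` lists keys outside the allowed set (for example an authority-bearing
// `scopes` or `canDispatch` field); `missing` lists allowed keys that are absent.
function checkMember(member: Record<string, unknown>): { extra: string[]; missing: string[] } {
  const allowed = new Set<string>(MEMBER_KEYS);
  const keys = Object.keys(member);
  return {
    extra: keys.filter((k) => !allowed.has(k)),
    missing: MEMBER_KEYS.filter((k) => !keys.includes(k)),
  };
}
```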
2026-05-30 (RFC 0084 — budget, quota + cost policy, Draft -> Active) added `budget-policy-shape.test.ts` (always-on server-free: `budget-policy.schema.json` round-trip + the §A orthogonality guard — a wall-time field is rejected (it's RFC 0058's `runTimeoutMs`) — + threshold/onExhaustion negatives + the four content-free `budget.{reserved,consumed,threshold.crossed,exhausted}` payloads + the four `cap.breached{budget-*}` kinds + RunEventType-enum membership + the no-pricing-property structural check backing the protocol-tier `budget-no-pricing-leak` invariant + the `capabilities.budget`/`limits.maxBudget*` shape). 2026-05-30 (RFC 0083 — durable trigger + channel bridge, Draft -> Active) added `trigger-bridge-shape.test.ts` (always-on server-free: `trigger-subscription.schema.json` round-trip + missing-`state`/out-of-enum-`source`/unknown-property negatives + the four-state vocab + the two content-free `trigger.{subscription.state.changed,delivery.attempted}` payloads incl. closed `state`/`outcome` enums + RunEventType-enum membership + the `triggerBridge`/`webhooks.durable` capability shape + the `openwop-trigger-bridge` profile derivation incl. the no-dead-letter-sink negative). 2026-05-30 (RFC 0079 — credential provenance + egress policy, Draft -> Active) added `egress-provenance-shape.test.ts` (always-on server-free: `credential-provenance.schema.json` round-trip + `audiences:[]`/missing-`credentialId`/unknown-property negatives + the no-secret-property structural check backing the protocol-tier `egress-decision-no-secret-leak` invariant + the content-free `egress.decided` record incl. the `decision` enum + RunEventType-enum membership + the `httpClient.egressPolicy` shape; the behavioral `egress-credential-audience-bound` confused-deputy MUST is reference-impl tier, deferred to a host). 
2026-05-30 (RFC 0078 — portable tool catalog, Draft -> Active) added `tool-descriptor-shape.test.ts` (always-on server-free: `tool-descriptor.schema.json` round-trip + the §C-1 `exec` ⇒ `host-extension` cross-field MUST (RFC 0069) + the `safetyTier`-required negative + `additionalProperties:false`, the `capabilities.toolCatalog` `supported`/`sources`/`sessionLifecycle` shape, and the two content-free `tool.session.{opened,closed}` payload $defs incl. the closed `outcome` enum + RunEventType-enum membership). 2026-05-30 (RFC 0080 — agent memory capability reconciliation, Draft -> Active) added `memory-capability-model-shape.test.ts` (always-on server-free: the additive `capabilities.memory.{writable,search,retention}` dimension shapes + malformed-instance negatives — `retention.ttl` non-boolean, out-of-enum `search.modes`, unknown property under `additionalProperties:false` — the `agent-inventory-response` `memoryDegraded`/`degradedMemoryDimensions` closed-enum fields, and the `openwop-memory` derivation surfacing for read/write + long-term hosts while withholding from `writable:false`). 2026-05-30 (RFC 0081 — agent evaluation, Draft -> Active) added `agent-eval-suite-shape.test.ts` (always-on server-free: the `capabilities.agents.evalSuite` shape + the `AgentEvalSuite`/`EvalSummary` schema round-trips + the three `eval.{started,scored,completed}` payloads + the content-free negatives — a task entry with a `taskOutput` body, a `safetyFinding` with an `excerpt` — backing the new `eval-summary-no-content-leak` SECURITY invariant). 
2026-05-29 (RFC 0076 §B — `ctx.http.safeFetch` live-run audit) added `safefetch-live-audit.test.ts` (`behaviorGate('openwop-safefetch-live-audit', …)`, gated on `httpClient.safeFetch` + `toolHooks.prePostEvents`) — asserts the audit-when-both MUST against the **durable run event log** via the new `POST /v1/host/sample/http/safe-fetch-run` open seam + the test event-log seam, closing the seam-vs-production gap (a production `createSafeFetch()` with no audit hooks passes the inline `safefetch-behavior.test.ts` but FAILS this under `OPENWOP_REQUIRE_BEHAVIOR=true`); this is the RFC 0076 §B → Accepted bar; run seam soft-skips on 404 (host-pending). 2026-05-29 (RFC 0066 — `x-openwop-form` picker UX hints, Draft → Active) added `x-openwop-form-pack-manifest.test.ts` (always-on server-free: an annotated `configSchema` stays a valid 2020-12 schema + the advisory hints don't change what it accepts, each §A annotation matches the shape, an unknown `kind` validates for forward-compat, 3 negatives — missing/non-string `kind`, non-string `dependsOn`). 2026-05-29 (RFC 0076 §B — `ctx.http.safeFetch`) added `safefetch-behavior.test.ts` (seam-gated: SSRF block / DNS-rebinding / `Connection: upgrade` refusal / tool-hooks audit-when-both, via `POST /v1/host/sample/http/safe-fetch`; advertisement contract stays in `http-client-ssrf.test.ts`). 2026-05-29 (RFC 0076 §A — pack `runtime.requires[]` install gate) added two: `runtime-requires-shape.test.ts` (server-free closed-vocabulary validation — the 8 tokens validate, a raw builtin name is rejected, empty-array≡omission, `uniqueItems`) + `runtime-requires-install-gate.test.ts` (seam-gated install-grant / install-refuse → `pack_runtime_requirement_unmet` / non-sandbox SHOULD-projection, soft-skip on 404 via `POST /v1/host/sample/packs/install-gate`). 
2026-05-29 (RFC 0047 — `host.oauth` authorization-code roundtrip) added `oauth-authorization-code-roundtrip.test.ts` — capability-gated on `capabilities.oauth.supported` + `grants` including `authorization_code`; drives the `POST /v1/host/sample/oauth/authorize-code-roundtrip` seam against the one canonical synthetic provider in `fixtures/oauth-providers/synthetic.json` (soft-skip on 404, Tier-2 host-pending), asserting a successful grant returns a credential REFERENCE (token persisted as a `host.credentials` entry) and that the authorization code / state / PKCE verifier / acquired access+refresh tokens never appear on any run-visible surface (RFC 0047 §C + §C.2 / `credential-payload-redaction`). Closes the RFC 0047 Tier-2 gap (capability-shape + redaction scenarios existed; the actual authorization-code dance was unexercised). 2026-05-26 (RFC 0070 — agent-manifest runtime) added `agent-manifest-runtime.test.ts`; 2026-05-26 (RFC 0071 — artifact-type + chat card packs) added six: `artifact-type-pack-manifest-validation.test.ts` + `artifact-schema-compile-bounded.test.ts` (server-free) + `artifact-type-pack-install.test.ts` + `artifact-type-store-without-render.test.ts` + `chat-card-pack-manifest-validation.test.ts` (server-free) + `chat-card-pack-execution.test.ts` (capability-gated, host-pending). 
2026-05-26 (RFCs 0067 / 0068 / 0069 — spec-gap Draft cohort) added five scenarios: `byok-auth-modes.test.ts` (RFC 0067; always-on schema-shape of `aiProviders.authModes` + a discovery-gated §B auth-mode-contract cross-field check), `memory-consolidation-shape.test.ts` (RFC 0068; always-on shape of `agents.memoryConsolidation`/`agents.commitments` + the `agent.memory.consolidated`/`commitment.fired` payload $defs), `memory-consolidation-idempotent.test.ts` + `commitment-fired.test.ts` (RFC 0068; capability-gated behavioral, soft-skip on the documented `/v1/host/sample/memory/consolidate` + `/commitment/fire` seams), and `exec-not-protocol-tier.test.ts` (RFC 0069; always-on server-free structural assertion that the protocol corpus defines no `core.*`/`openwop.*` exec-class primitive — backs the `exec-must-not-be-protocol-tier` SECURITY invariant). 2026-05-25 (RFC 0061 — stateful agent-loop lifecycle, executionModel.version 5) added four `agent-loop-*.test.ts` scenarios: `-version5-shape` (always-on; validates `executionModel.statefulResume`/`transcriptWindow` + the 1–5 version ceiling) plus `-iteration-monotonic` (gated on `version >= 5`; `runOrchestrator.decided.iteration` increments 1,2,3… exactly once per turn), `-workspace-snapshot` (gated additionally on `host.workspace.supported`; a turn-i workspace write is invisible to turn i, visible to turn i+1), and `-stateful-resume` (gated on `statefulResume`; a mid-loop suspend resumes at the same iteration without resetting the counter) — the three behavioral scenarios drive the documented agent-loop seam (`POST /v1/host/sample/agentloop/run`) and soft-skip until a host wires it. 
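The `-iteration-monotonic` assertion amounts to checking that successive `runOrchestrator.decided.iteration` values form exactly 1, 2, 3, … with no gaps, repeats, or resets. Here is a sketch, assuming the values have already been pulled from the event log in emission order. `firstIterationViolation` is an illustrative name.

```typescript
// Returns the index of the first out-of-sequence iteration value, or -1 when
// the sequence is exactly 1, 2, 3, ... (one increment per turn).
function firstIterationViolation(iterations: number[]): number {
  for (let i = 0; i < iterations.length; i++) {
    if (iterations[i] !== i + 1) return i;
  }
  return -1;
}
```

A counter that resets after a suspend, such as `[1, 2, 1]`, is caught at index 2.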
2026-05-25 (RFC 0059 — host.workspace M2, reference-host enforcement) added two `workspace-*.test.ts` scenarios: `-behavior` (capability-gated CRUD round-trip / `If-Match` 409 `workspace_conflict` / `workspace_too_large` / §D run-start snapshot, all via the real `/v1/host/workspace/files` §C endpoints) and `-cross-tenant-isolation` (WCT-1 — drives the documented `POST /v1/host/sample/workspace/op` seam to assert a file owned by one `{tenant, workspace}` is unreadable, on both `get` and `list`, under a different owner; backs the new `workspace-cross-tenant-isolation` SECURITY invariant). The in-memory reference host now advertises `capabilities.workspace.supported` and honors §C/§D/§E end-to-end. 2026-05-25 (RFC 0062 — memory.distillation "dreams") added five `distillation-*.test.ts` scenarios: `-shape` (always-on; validates the `capabilities.memory.distillation` block + the additive `distillation` sub-object on `memory.compacted`) plus `-token-budget` (within budget `tokensUsed ≤ tokenBudget`; an un-meetable budget → `token_budget_exceeded` with no partial archive), `-stable-archive` (same sources + budget ⇒ byte-stable archive checksum), `-index-roundtrip` (gated additionally on `indexEmitted`; the `MEMORY-INDEX.json` workspace file is retrievable + `workspace.updated` fired), and `-secret-carryforward` (SR-1: a redacted source secret never appears in the archive) — the four behavioral scenarios drive the documented memory-distillation seam (`POST /v1/host/sample/memory/distill`) and soft-skip until a host wires it. 
2026-05-25 (RFC 0063 — core.subWorkflow.outputAttestation) added four `subrun-*.test.ts` scenarios: `-attestation-shape` (always-on; validates the `capabilities.agents.subRunAttestation` flag) plus `-checksum-stable` (the child output checksum is the byte-stable, key-order-invariant RFC 8785 JCS + SHA-256 digest), `-approval-gate` (`requireApproval` → `accept` merges, `reject` does not), and `-approval-fail-closed` (no `accept`/`edit-accept` → no merge; backs the deferred `subrun-merge-approval-fail-closed` invariant) — the three behavioral scenarios drive the documented sub-run attestation seam (`POST /v1/host/sample/subrun/attest`) and soft-skip until a host wires it. 2026-05-25 (RFC 0064 — host.toolHooks) added five `tool-hooks-*.test.ts` scenarios: `-shape` (always-on; validates the `capabilities.toolHooks` block + the optional content-free fields on `agentToolCalled` / `agentToolReturned`) plus `-content-free` (gated on `prePostEvents`), `-authorization-fail-closed` (gated on `perToolAuthorization`), `-rate-limit` (gated on `perToolRateLimit`), and `-secret-redaction` (gated on `prePostEvents` + the SR-1 `argsHash` redaction rule) — the four behavioral scenarios drive the documented tool-hooks invoke seam (`POST /v1/host/sample/toolhooks/invoke`) and soft-skip until a host wires it. 2026-05-25 (RFC 0060 — host.heartbeat) added four `heartbeat-*.test.ts` scenarios: `-capability-shape` (always-on; validates the `capabilities.heartbeat` block) plus `-fires-once-per-tick`, `-idempotent-no-spam`, and `-runtime-bound` (gated on `capabilities.heartbeat.supported` + the host heartbeat tick seam; soft-skip until a host wires it). 
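The `-checksum-stable` property (byte-stable and key-order-invariant) follows from canonicalizing before hashing. Below is a simplified RFC 8785-style canonicalizer with a SHA-256 digest from Node's `crypto` module. It sorts object keys by UTF-16 code units and relies on `JSON.stringify` for strings and finite numbers, which matches JCS for ordinary JSON data. It is a sketch and not a complete, audited JCS implementation. `canonicalize` and `outputChecksum` are illustrative names.

```typescript
import { createHash } from "node:crypto";

// Simplified RFC 8785 (JCS) serialization for JSON-compatible input (no
// undefined, functions, or bigint): keys sorted by UTF-16 code units (the
// default string sort order), no whitespace, primitives via JSON.stringify.
function canonicalize(value: unknown): string {
  if (value === null || typeof value !== "object") {
    if (typeof value === "number" && !Number.isFinite(value)) {
      throw new Error("JCS does not permit NaN or Infinity");
    }
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonicalize(v)).join(",")}]`;
  }
  const obj = value as Record<string, unknown>;
  const members = Object.keys(obj)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
  return `{${members.join(",")}}`;
}

// Hex-encoded SHA-256 over the UTF-8 bytes of the canonical form.
function outputChecksum(output: unknown): string {
  return createHash("sha256").update(canonicalize(output), "utf8").digest("hex");
}
```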
2026-05-25 (RFC 0057 — memory write-attribution) added five `memory-attribution-*.test.ts` scenarios: `-shape` (always-on advertisement check on `capabilities.memory.attribution`), plus `-no-content`, `-tenant-scoped`, `-emits-on-write`, and `-replay-stable` (gated on `capabilities.memory.attribution.emitsWriteEvents`) verifying the content-free `memory.written` RunEvent, its two SECURITY invariants (`memory-attribution-no-content` + `memory-attribution-tenant-scoped`), and the §D replay rule that a `replay`-mode fork MUST NOT regenerate `memoryId`. 2026-05-25 (RFC 0025 §C point 1 — test-catalog isolation invariant; pairs with the 25 publish-error scenarios in `pack-registry-publish.test.ts`) added `pack-registry-isolation.test.ts` — capability-gated on `capabilities.packs.testMode.{supported, isolated}: true`; PUTs a disposable pack into `/v1/packs-test/{name}` and asserts the same `(name, version)` does NOT appear via `GET /v1/packs/{name}` — anchors the test-catalog isolation MUST in RFC 0025 §C. 2026-05-25 (RFC 0028 Tier-2 post-promotion T2 — read-side sister scenario for workspace-membership enforcement) added `prompt-read-workspace-membership-enforced.test.ts` — gates on `capabilities.prompts.supported: true` (broader than `mutableLibrary` so read-only hosts that expose `?workspaceId=` are also probed); drives `GET /v1/prompts?workspaceId=<random-non-member>` and interprets the response: 4xx PASS (canonical envelope check on 403); 200 with empty `templates[]` PASS (correct null result for a nonexistent workspace); 200 with non-empty `templates[]` FAIL (cross-tenant leak); 200 without `templates[]` field SKIP (host doesn't expose workspace-scoped reads). Verifies SECURITY invariant `prompt-read-workspace-membership-enforced`. Same-day T1 strengthened `prompt-mutation-workspace-membership-enforced.test.ts` to pin `error === "workspace_membership_required"` when the host's refusal status is 403 (other refusal codes unconstrained). 
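The response interpretation in `prompt-read-workspace-membership-enforced.test.ts` is a small decision table, and it can be sketched as a pure function. The verdicts mirror the text above. The function name and the simplified inputs (a `status` plus an already-parsed `body`) are assumptions, the canonical 403 envelope check is omitted, and treating unlisted statuses as `skip` is a guess the text does not make.

```typescript
type Verdict = "pass" | "fail" | "skip";

// Classifies the host's answer to GET /v1/prompts?workspaceId=<random-non-member>.
function classifyWorkspaceRead(status: number, body: unknown): Verdict {
  if (status >= 400 && status < 500) return "pass"; // host refused the read
  if (status === 200) {
    const templates = (body as { templates?: unknown } | null)?.templates;
    if (!Array.isArray(templates)) return "skip"; // no workspace-scoped read exposed
    return templates.length === 0 ? "pass" : "fail"; // non-empty = cross-tenant leak
  }
  return "skip"; // assumption: other statuses are not described above
}
```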
2026-05-25 (RFC 0028 Tier-2 follow-up — workspace-membership enforcement on mutating prompt endpoints, filed in response to a self-disclosed adopter vulnerability) added `prompt-mutation-workspace-membership-enforced.test.ts` — capability-gated on `capabilities.prompts.mutableLibrary: true`; drives `POST /v1/prompts` with a cryptographically-random non-member `workspaceId` and asserts the host refuses (NOT a 2xx; any 4xx/5xx is acceptable — silent success is the failure mode). Verifies SECURITY invariant `prompt-mutation-workspace-membership-enforced`. 2026-05-22 (RFC 0034 §B follow-up — secret-leakage harness against the OTel + debug-bundle seams) added `secret-leakage-otel-attribute.test.ts` — gates on `capabilities.secrets.supported` + `capabilities.observability.testSeams.{otelScrape,debugBundleExport}` AND the `OPENWOP_CANARY_SECRET_VALUE` env (host operator + conformance runner agree on the canary). Drives the existing `openwop-smoke-byok-roundtrip` fixture end-to-end; scrapes both seams after run completion; hard-fails if the canary plaintext appears in any OTel span attribute or debug-bundle field. Verifies SECURITY invariants `secret-leakage-otel-attribute` + `secret-leakage-debug-bundle-otel`. 2026-05-22 (RFC 0041 Phase 4 — replay determinism under nondeterministic models) added three scenarios: `replay-divergence-at-refusal.test.ts` (advertisement-shape probe on `replayDeterminism.refusalDivergenceEmission` + 2 `it.todo` for the dual-direction refusal-divergence case), `replay-observable-sequence-determinism.test.ts` (capability-gated; behavioral assertion soft-skipped until a `conformance-phase4-nondet-tool` fixture ships), `replay-llm-cache-key-portable.test.ts` (intra-host reproducibility + non-recipe-field invariance + Phase 4 advertisement alignment — reuses the existing `POST /v1/host/sample/test/llm-cache-key` seam from the sibling `replay-llm-cache-key.test.ts`). 
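The secret-leakage harness hard-fails if the canary plaintext appears in any scraped OTel span attribute or debug-bundle field. One way to express that is a recursive scan over the parsed JSON. `findCanary` and its path format are illustrative, and this sketch only matches the literal plaintext.

```typescript
// Returns the JSON paths at which `canary` occurs as a substring of a string
// value or of an object key. Only the literal plaintext is matched.
function findCanary(value: unknown, canary: string, path = "$"): string[] {
  if (typeof value === "string") {
    return value.includes(canary) ? [path] : [];
  }
  if (Array.isArray(value)) {
    return value.flatMap((v, i) => findCanary(v, canary, `${path}[${i}]`));
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([k, v]) => {
      const hits = findCanary(v, canary, `${path}.${k}`);
      return k.includes(canary) ? [`${path}.${k} (key)`, ...hits] : hits;
    });
  }
  return []; // numbers, booleans, null cannot contain the canary string
}
```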
2026-05-20 (RFC 0027 §A templateKinds-coverage follow-up — paired with `prompt-end-to-end-events.test.ts`) added `prompt-all-four-kinds-events.test.ts` exercising all four `PromptKind` values (`system`, `user`, `schema-hint`, `few-shot`) end-to-end through the reference workflow-engine sample's `local.sample.demo.mock-ai` dispatch path; capability-gated via `behaviorGate('prompts-supported', ...)`. Closes the credibility gap where the host advertised `templateKinds: ["system", "user", "few-shot", "schema-hint"]` but only the system+user pair was actually wired into dispatch. 2026-05-20 (RFCs 0030–0033 — envelope LLM-contract-hardening track) added 15 scenarios across four `Active` RFCs: `envelope-reasoning-shape.test.ts` (RFC 0030, always-on; asserts the OPTIONAL `reasoning` property on the three universal-kind schemas + the `schema.response` deliberate omission), `envelope-reasoning-secret-redaction.test.ts` (RFC 0030, capability-gated on `capabilities.envelopes.reasoning.supported` + `secrets.supported`; 5 `it.todo()` placeholders for SECURITY invariant `envelope-reasoning-secret-redaction`), `envelope-tier-one-subset-static.test.ts` (RFC 0030, always-on for load-bearing rules — no `oneOf` / `allOf` / `not` / `prefixItems` / `propertyNames` anywhere; gated on `tierOneSubsetCompliance: "strict"` for OpenAI-strict-only constraints), `envelope-variant-discriminator-static.test.ts` (RFC 0031, always-on; asserts no `oneOf` + every `anyOf` branch declares a single-string-enum discriminator in `required` on every `schemas/envelopes/*.schema.json`), `model-capability-substituted.test.ts` (RFC 0031, advertisement-shape probe on `capabilities.modelCapabilities.advertised[]` identifier pattern + 5 `it.todo()` placeholders for SECURITY invariant `model-capability-substituted-no-credential-disclosure`), `model-capability-insufficient.test.ts` (RFC 0031, 6 `it.todo()` placeholders for refusal + no-recursive-fallback), `node-module-required-capabilities-shape.test.ts` (RFC 0031 
SHOULD-tier authoring-convention; 4 `it.todo()` placeholders), and the six envelope-reliability events from RFC 0032 (`envelope-retry-attempted` carrying the shared advertisement-shape probe enforcing both MUST-tier events in `events[]` per RFC 0032 §C, plus `envelope-retry-exhausted`, `envelope-refusal-shape`, `envelope-truncated`, `envelope-nl-to-format-engaged`, `envelope-recovery-applied` — collectively 39 `it.todo()` placeholders covering retry/refusal/truncation/recovery + SECURITY invariants `envelope-refusal-no-prompt-leak` and `envelope-recovery-no-content-leak`), plus RFC 0033's two scenarios (`envelope-completion-distinguishes-truncation.test.ts` + `envelope-truncation-cap-exhaustion.test.ts` — 12 `it.todo()` placeholders covering the truncation-vs-schema-violation retry-routing distinction + the DoS-bound assertion). Reference workflow-engine sample advertises `capabilities.envelopes.reasoning: { supported: true, promptDirective: "off" }` + `tierOneSubsetCompliance: "warn"` honestly (schemas accept the field; host doesn't yet inject the directive); the other three RFCs' capability blocks defer to reference-host emission code per the staged RFC 0027 §G precedent. 2026-05-20 (RFC 0028 §B Phase B — prompt-pack boot-time install) added `prompt-pack-install.test.ts` (capability-gated on `capabilities.prompts.endpointsSupported: true`; asserts a host that ran the boot-time pack loader surfaces ≥ 1 pack-source template under `GET /v1/prompts?source=pack` carrying the canonical `meta.source: "pack"` + `meta.packName` + `meta.packVersion` stamps; positively identifies the in-tree `vendor.openwop.prompt-sample` reference pack's `writer-system` template when present). Pairs with the new `host/promptPackLoader.ts` boot-time entry on the reference workflow-engine sample, which scans `examples/packs/*` plus `OPENWOP_PROMPT_PACKS_DIR` and calls `installPackTemplates()` for each `kind: "prompt"` pack found. 
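The `envelope-variant-discriminator-static.test.ts` rule (no `oneOf`, and every `anyOf` branch declares a single-string-enum discriminator in `required`) is a recursive walk over each envelope schema. Here is a simplified sketch, assuming branches are inline objects with `properties` and `required`. A `$ref` branch would need resolving first, which this sketch does not do. `checkVariants` and `hasDiscriminator` are illustrative names.

```typescript
type Schema = { [key: string]: unknown };

// True when some required property's schema is an enum with exactly one string value.
function hasDiscriminator(branch: Schema): boolean {
  const props = (branch.properties ?? {}) as Record<string, Schema>;
  const required = Array.isArray(branch.required) ? (branch.required as string[]) : [];
  return required.some((name) => {
    const e = props[name]?.enum;
    return Array.isArray(e) && e.length === 1 && typeof e[0] === "string";
  });
}

// Collects violations anywhere in the schema tree (objects and arrays alike).
function checkVariants(node: unknown, path = "#"): string[] {
  if (node === null || typeof node !== "object") return [];
  const schema = node as Schema;
  const errors: string[] = [];
  if ("oneOf" in schema) errors.push(`${path}: oneOf is not allowed`);
  const anyOf = schema.anyOf;
  if (Array.isArray(anyOf)) {
    anyOf.forEach((branch, i) => {
      if (!hasDiscriminator(branch as Schema)) {
        errors.push(`${path}/anyOf/${i}: no single-string-enum discriminator`);
      }
    });
  }
  for (const [key, child] of Object.entries(schema)) {
    errors.push(...checkVariants(child, `${path}/${key}`));
  }
  return errors;
}
```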
2026-05-20 (RFC 0029 Phase C — prompt resolution chain wire shape) added three more scenarios: `prompt-resolution-chain-node-wins.test.ts` (capability-gated on `capabilities.prompts.supported: true`; asserts layer-1 node-config supersedes lower layers per `spec/v1/prompts.md` §"Resolution chain (normative)"), `prompt-resolution-chain-agent-intrinsic.test.ts` (additionally gated on `capabilities.prompts.agentBindings: true`; asserts agent intrinsic `systemPromptRef` wins over `promptOverrides` AND lower layers when the node has no layer-1 ref), `prompt-resolution-chain-fallback-cascade.test.ts` (asserts layer 3 workflow-defaults wins over layer 4 host-defaults; layer 4 host-defaults wins when 1-3 yield null; resolved is null when all four yield null but chain[] still lists every attempted layer). The scenarios drive the host's `POST /v1/host/sample/prompt/resolve` test seam (reference-host implementation deferred to follow-up slice per RFC 0021 staging precedent). 2026-05-20 (RFC 0027 Phase A — prompt templates wire shape) added three scenarios: `prompt-template-shape.test.ts` (always-on; Ajv compileability + positive/negative round-trip for PromptTemplate + PromptRef + PromptKind), `prompt-composed-secret-redaction.test.ts` (capability-gated on `capabilities.prompts.supported: true` + `observability: "full"`; asserts `[REDACTED:<secretId>]` markers in `prompt.composed` payloads for `source: "secret"` variable bindings per SECURITY/threat-model-secret-leakage.md §SR-1), `prompt-composed-trust-marker.test.ts` (same capability gates; asserts `<UNTRUSTED>...</UNTRUSTED>` wrapping + `contentTrust: "untrusted"` propagation per RFC 0020 §D). Paired with new `fixtures/prompt-templates/` sub-directory + per-fixture schema-validity describe block + future SECURITY invariants `prompt-composed-secret-redaction` and `prompt-composed-trust-marker` (lands alongside reference-host emission per RFC 0021 staging precedent). 
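The resolution-chain scenarios check a first-non-null walk over ordered layers, where `chain[]` records each layer consulted and still lists all four when every layer yields null. Here is a hypothetical sketch. The layer names in the usage, `resolvePrompt`, and the `Resolution` shape are illustrative; the normative definition is `spec/v1/prompts.md` §"Resolution chain (normative)". Stopping the `chain[]` record at the winning layer is an assumption.

```typescript
type PromptRef = string;

interface Resolution {
  resolved: PromptRef | null;
  chain: string[]; // every layer consulted, in priority order
}

// Layers are given in priority order (layer 1 first); each lookup returns
// null when that layer has no ref. The first non-null ref wins.
function resolvePrompt(layers: Array<{ name: string; lookup: () => PromptRef | null }>): Resolution {
  const chain: string[] = [];
  for (const layer of layers) {
    chain.push(layer.name);
    const ref = layer.lookup();
    if (ref !== null) return { resolved: ref, chain };
  }
  return { resolved: null, chain };
}
```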
2026-05-18 (RFC 0022 `Draft` — runtime variable mapping) added four `it.todo()` placeholder scenarios covering the new mapping surfaces on `core.dispatch` (§A — `dispatch-input-mapping.test.ts`, `dispatch-output-mapping.test.ts`, `dispatch-cross-worker-handoff.test.ts`) and `core.subWorkflow` (§B — `subworkflow-input-mapping.test.ts`). Gated on `capabilities.agents.dispatchMapping` (dispatch trio) and `capabilities.subWorkflow.inputMapping` (subWorkflow). Promote to live assertions when RFC 0022 reaches `Active` + a reference host advertises the matching flags. 2026-05-17 (RFC 0003 §D handoff-schema enforcement, HV-1) added `agentPackHandoffSchemaValidation.test.ts` — verifies the host validates dispatch payloads against `handoff.taskSchemaRef` AND return payloads against `handoff.returnSchemaRef` per RFC 0003 §D. Paired with the new `agent-pack-handoff-schema-enforcement` row in `SECURITY/invariants.yaml`. 2026-05-17 (AI Envelope gap-closure, DRAFT v1.x — `spec/v1/ai-envelope.md`) added 7 advertisement-shape scenarios with `it.todo()` behavioral placeholders gated on `capabilities.envelopeContracts.advertised: true`: `aiEnvelope.universalKinds.test.ts`, `aiEnvelope.schemaDrift.test.ts`, `aiEnvelope.correlationReplay.test.ts`, `aiEnvelope.contractRefusal.test.ts`, `aiEnvelope.trustBoundaryPropagation.test.ts`, `aiEnvelope.redaction.test.ts`, `aiEnvelope.capBreached.test.ts`. Paired with the new `envelope-redaction-sr-1-carry-forward` row in `SECURITY/invariants.yaml`. 2026-05-17 (post-publish hardening, deep audit of `core.openwop.agents`) added `agents-run-tool-allowlist.test.ts` — server-free scenario locking in the `core.openwop.agents@1.0.1` safety-fix that closes `OPENWOP-AUDIT-2026-003` (function-typed `tool.handler` properties rejected at `validateTools()` with `INVALID_TOOL_DECLARATION`; tool-driven runs require `ctx.agentRuntime`; tool-less safe fallback preserved). Paired with the new `agents-run-no-raw-handler` row in `SECURITY/invariants.yaml`. 
Same-day post-publish hardening added `idempotency-key-determinism.test.ts` — server-free scenario locking in the `core.openwop.http@1.1.2` determinism safety-fix (default `composite` mode produces deterministic keys in `(runId, nodeId, payload)`; removed `uuid` mode rejects with `CONFIG_INVALID`; cross-impl vector test lets third-party reimplementations verify wire agreement). Paired with the new `idempotency-key-deterministic` row in `SECURITY/invariants.yaml`. 2026-05-17 (Phase 3 of RFC 0013) added three server-free scenarios exercising the reference workflow-chain expansion library (`conformance/src/lib/workflow-chain-expansion.ts`): `workflow-chain-expansion.test.ts` (parameter substitution + node id collision avoidance + edge rewriting + capability propagation + runtime-invariance contract), `workflow-chain-unresolvable-typeid.test.ts` (rejection with `chain_unresolvable_typeid` when a chain references an unknown typeId), and `workflow-chain-pack-signature-verification.test.ts` (Ed25519 verification recipe reuse from `node-packs.md §Signing`). Earlier that day (Phase 1) added `workflow-chain-pack-manifest-validation.test.ts` — server-free schema-validation scenario covering the new `workflow-chain-pack-manifest.schema.json` (positive sample + two negatives: kind/contents mismatch and invalid `chainId`). Closes RFC 0013 (`Workflow-chain packs`, `Draft`) Phases 1 + 3 alongside the new `spec/v1/workflow-chain-packs.md`, the `Capabilities.workflowChainPacks` block, and the registry build-index/conformance-check `kind` routing from Phase 2. Earlier that day, the suite added 27 `it.todo()` placeholder scenarios paired with RFCs 0014-0020 (host capability surfaces — fs, kvStorage, tableStorage, queueBus, sql/vector/search, blob/cache, mcp.serverMount). These promote to live assertions when each RFC reaches `Active` + the matching capability block lands in `schemas/capabilities.schema.json` + a reference host advertises the capability. 
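Two steps that `workflow-chain-expansion.test.ts` and `workflow-chain-unresolvable-typeid.test.ts` exercise are typeId resolution and collision-free renaming with edge rewriting. The sketch below illustrates those steps and is not the `conformance/src/lib/workflow-chain-expansion.ts` library. The `<instance>.<node>` naming scheme and every function name are assumptions. Only the `chain_unresolvable_typeid` code comes from the text. Parameter substitution and capability propagation are left out.

```typescript
// Hypothetical sketch of typeId resolution plus collision-free renaming with
// edge rewriting. The "<instance>.<node>" naming scheme is an assumption.
interface ChainNode {
  id: string;
  typeId: string;
}

interface ChainEdge {
  from: string;
  to: string;
}

interface Chain {
  nodes: ChainNode[];
  edges: ChainEdge[];
}

function expandChain(chain: Chain, instanceId: string, hostNodeIds: Iterable<string>, knownTypeIds: Set<string>): Chain {
  // Refuse the whole chain if any node references an unknown typeId.
  for (const n of chain.nodes) {
    if (!knownTypeIds.has(n.typeId)) {
      throw Object.assign(new Error(`unknown typeId ${n.typeId}`), { code: "chain_unresolvable_typeid" });
    }
  }
  // Give every chain node a fresh id that collides with nothing already in the workflow.
  const taken = new Set<string>(hostNodeIds);
  const rename = new Map<string, string>();
  for (const n of chain.nodes) {
    let candidate = `${instanceId}.${n.id}`;
    for (let i = 2; taken.has(candidate); i++) candidate = `${instanceId}.${n.id}.${i}`;
    taken.add(candidate);
    rename.set(n.id, candidate);
  }
  const mapId = (id: string): string => {
    const renamed = rename.get(id);
    if (renamed === undefined) throw new Error(`edge references unknown node ${id}`);
    return renamed;
  };
  // Rewrite node ids and both ends of every edge through the same mapping.
  return {
    nodes: chain.nodes.map((n) => ({ ...n, id: mapId(n.id) })),
    edges: chain.edges.map((e) => ({ from: mapId(e.from), to: mapId(e.to) })),
  };
}
```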
Earlier additions include 18 Multi-Agent Shift scenarios (Phases 1-5) added 2026-05-10, the `registry-public.test.ts` public-registry healthcheck added 2026-05-11 (opt-in via `OPENWOP_TEST_PUBLIC_REGISTRY=true`), the `replay-llm-cache-key.test.ts` placeholder added 2026-05-11 (three `it.todo()` cases for the cross-host LLM cache-key recipe per `replay.md` §"LLM cache-key recipe"), the two `production-*.test.ts` scenarios added 2026-05-11 for the `openwop-production` profile per RFC 0009 (`production-backpressure.test.ts`, `production-retention-expiry.test.ts`), the four `auth-*.test.ts` scenarios added 2026-05-11/12 for the production-auth profiles per RFC 0010 (`auth-api-key-rotation.test.ts`, `auth-oauth2-client-credentials.test.ts`, `auth-oidc-user-bearer.test.ts`, `auth-mtls.test.ts` (opt-in via `OPENWOP_TEST_MTLS=1`)), `replay-retention-expiry.test.ts` added 2026-05-12 (capability shape + 410/422 envelope per `replay.md` §"Retention and garbage collection"), `bulk-cancel.test.ts` added 2026-05-12 (Phase B close-out of R1 — `POST /v1/runs:bulk-cancel`), the two Phase H launch-blocker advertisement-contract scenarios added 2026-05-12 (`mcp-toolcall-redaction.test.ts` for the MCP-1 invariant per `host-capabilities.md §host.mcp` + `threat-model-prompt-injection.md §UNTRUSTED`, and `http-client-ssrf.test.ts` for the SSRF + body-size cap advertisement contract on `capabilities.httpClient`), the `wasm-pack-abi-version-rejection.test.ts` Track 7 scenario added 2026-05-12 for the ABI-mismatch positive path via the `vendor.openwop.misbehaving-abi` pack per RFC 0008 §H, the `otel-trace-propagation-subworkflow.test.ts` Track 11 close-out added 2026-05-13 (parent + child run spans share the inbound traceparent's traceId across the `core.subWorkflow` dispatch boundary), and the three RFC 0012 (Memory Compaction Profile, `Active`) scenarios added 2026-05-13/14: `memory-compaction-sr1-carry-forward.test.ts` (load-bearing SR-1 §D), `memory-compaction-event-emitted.test.ts` 
(canonical §B payload shape), and `memory-compaction-provenance-tag.test.ts` (soft assertion on §C `compacted-from:<id>` convention). All three gate on `capabilities.memory.compaction.supported` + the host's test seam at `/v1/test/memory/{seed,compact}` (Postgres reference host enables both via `OPENWOP_MEMORY_COMPACTION=true OPENWOP_TEST_TRIGGER_COMPACTION=true`). 2026-05-15 (gap-closure CF-3) added `interrupt-token-matrix.test.ts` (malformed / unknown / replay / cross-run-id paths on `GET|POST /v1/interrupts/{token}`). The maintained scenario-to-spec map lives in [`coverage.md`](./coverage.md); this README keeps the operator quickstart and the historical scenario notes below.
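The cross-boundary check in `otel-trace-propagation-subworkflow.test.ts` reduces to comparing trace IDs against the inbound header. This is a minimal sketch that assumes the harness has already collected each exported span's `traceId`. The collection step and the function names are hypothetical. The header layout follows the W3C Trace Context `traceparent` format.

```typescript
// Sketch: parse a W3C traceparent header and require every collected span
// (parent run and child run) to carry its traceId.
interface TraceParent {
  version: string;
  traceId: string;
  parentId: string;
  flags: string;
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceParent | null {
  const m = TRACEPARENT_RE.exec(header.trim());
  if (m === null) return null;
  const [, version, traceId, parentId, flags] = m;
  if (version === "ff") return null; // version ff is reserved as invalid
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null; // all-zero ids are invalid
  return { version, traceId, parentId, flags };
}

function spansShareInboundTrace(inboundHeader: string, spanTraceIds: string[]): boolean {
  const inbound = parseTraceparent(inboundHeader);
  if (inbound === null || spanTraceIds.length === 0) return false;
  return spanTraceIds.every((id) => id === inbound.traceId);
}
```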
+The current suite has 310 scenario files under `src/scenarios/`. 2026-05-31 (RFC 0082 — agent deployment lifecycle, the Active→Accepted behavioral gate) added `agent-deployment-lifecycle.test.ts` (capability-gated on `agents.deployment.supported` via `behaviorGate('openwop-deployment-lifecycle', …)` — the §E promotion contract via the new `POST /v1/host/sample/agents/deployment-transition` seam + the test event-log seam across four legs: `promote` (authorize RFC 0049 → approvalGate RFC 0051 → eval-verify RFC 0081 → content-free `deployment.promoted` with a seven-state `toState` + `toVersion`, the record validating `agent-deployment.schema.json`), `unauthorized` (fail-closed — `allowed:false`, no `deployment.promoted`, the behavioral leg of `deployment-promotion-fail-closed`), `eval-gate-unmet` (`eval_gate_unmet` denial, §E-3), and `channel-pin` (the §B `resolvedAgentVersion` recorded-fact on `agent.invocation.started`); new lib helper `src/lib/agentDeployment.ts`; soft-skips on 404 — the RFC 0082 → Accepted bar). 2026-05-31 (RFC 0081 — agent evaluation, the Active→Accepted behavioral gate) added `agent-eval-run.test.ts` (capability-gated on `agents.evalSuite.supported` via `behaviorGate('openwop-eval-run', …)` — the §B `mode:"eval"` projection via the new `POST /v1/host/sample/agents/eval-run` seam + the test event-log seam: `eval.started`-first → one `eval.scored` per task → `eval.completed`-once ordering (count == `eval.completed.taskCount`), the content-free `eval.scored` legs (`score` ∈ 0..1) backing `eval-summary-no-content-leak`, and the NORMATIVE `GET /v1/runs/{runId}/eval-summary` schema-valid `EvalSummary` round-trip with `passedCount <= taskCount`; new lib helper `src/lib/agentEval.ts`; soft-skips on 404 — the RFC 0081 → Accepted bar). 
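The §C ordering and the content-free legs asserted by `agent-eval-run.test.ts` can be checked with a single pass over the run's `eval.*` events. This is a hypothetical sketch and not the scenario's source. The trimmed event shape and `evalOrderingViolations` are assumptions. The `GET /v1/runs/{runId}/eval-summary` round-trip and its `passedCount <= taskCount` check are not modeled.

```typescript
// Hypothetical, trimmed event shape: real eval.* payloads carry more fields.
interface EvalEvent {
  type: "eval.started" | "eval.scored" | "eval.completed";
  payload: { score?: number; passed?: boolean; taskCount?: number };
}

// Returns the violations found in one run's eval.* events, given in log order.
function evalOrderingViolations(events: EvalEvent[]): string[] {
  const problems: string[] = [];
  if (events.length === 0 || events[0].type !== "eval.started") problems.push("eval.started must come first");
  if (events.filter((e) => e.type === "eval.started").length !== 1) problems.push("expected exactly one eval.started");
  const completedAt = events.findIndex((e) => e.type === "eval.completed");
  const completedCount = events.filter((e) => e.type === "eval.completed").length;
  if (completedCount !== 1) problems.push(`expected exactly one eval.completed, saw ${completedCount}`);
  const scored = events.filter((e) => e.type === "eval.scored");
  if (completedAt >= 0) {
    if (events.slice(completedAt + 1).some((e) => e.type === "eval.scored")) problems.push("eval.scored after eval.completed");
    const taskCount = events[completedAt].payload.taskCount;
    if (taskCount !== scored.length) problems.push(`saw ${scored.length} eval.scored but taskCount is ${taskCount}`);
  }
  for (const s of scored) {
    const { score, passed } = s.payload;
    // Content-free leg: a bounded score and a boolean verdict.
    if (typeof score !== "number" || score < 0 || score > 1) problems.push(`score out of 0..1: ${score}`);
    if (typeof passed !== "boolean") problems.push("passed must be a boolean");
  }
  return problems;
}
```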
2026-05-31 (RFC 0083 — durable trigger bridge, the Active→Accepted behavioral gate) added `trigger-bridge-delivery.test.ts` (profile-gated on `openwop-trigger-bridge` derived from the live discovery doc — the §C delivery model via the `POST /v1/host/sample/trigger-bridge/deliver` seam + the test event-log seam: dedup→effectively-once `trigger.delivery.attempted{delivered}` (§C-1), retry-exhaustion→`{dead-lettered}` + `trigger.subscription.state.changed{toState:dead-lettered}` (§C-2 + RFC 0053), and the delivered run's `run.started.causationId` == the delivery id (§C / RFC 0040); both `trigger.*` events content-free; the always-on shape stays in `trigger-bridge-shape.test.ts`; new lib helper `src/lib/triggerBridge.ts`). 2026-05-31 (RFC 0087 — agent org-chart, the Active→Accepted behavioral gate) added two capability-gated behavioral scenarios (both gated on `agents.orgChart.supported`, black-box on the normative `/v1/agents/org-chart` surface — no new POST seam): `agent-org-chart-scoping.test.ts` (the `GET /v1/agents/org-chart` tree-shape — departments form an acyclic `parentDepartmentId` tree, members reference `host:<id>` roster entries — + the §D responsibility roll-up via `GET /v1/agents/org-chart/{departmentId}` with a deduped `responsibilities[]` union + the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ORG_CHART_DEPARTMENT_ID`) and `org-position-no-authority-escalation.test.ts` (the behavioral leg of the protocol-tier invariant — the live org-chart wire carries NO authority-bearing field on any member/department/responsibility-view object; the structural leg stays always-on in `agent-org-chart-shape.test.ts`, and the deeper RFC 0049/0051 authority-invariance legs stay reference-impl tier per the `agent-manifest-runtime` no-host-hook precedent). 
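The org-chart tree-shape and roll-up assertions in `agent-org-chart-scoping.test.ts` can be sketched as a cycle check over `parentDepartmentId` plus a deduplicated subtree union. The department shape is trimmed. The roll-up semantics (own responsibilities plus every descendant's) are an assumption about §D, not taken from the spec.

```typescript
// Hypothetical, trimmed department shape for illustration.
interface Department {
  departmentId: string;
  parentDepartmentId?: string;
  responsibilities: string[];
}

// Walk each department's parent chain; revisiting an id means the chart has a cycle.
function isAcyclic(depts: Department[]): boolean {
  const parentOf = new Map<string, string | undefined>();
  for (const d of depts) parentOf.set(d.departmentId, d.parentDepartmentId);
  for (const d of depts) {
    const seen = new Set<string>();
    let cur: string | undefined = d.departmentId;
    while (cur !== undefined) {
      if (seen.has(cur)) return false;
      seen.add(cur);
      cur = parentOf.get(cur);
    }
  }
  return true;
}

// Assumed roll-up: deduplicated union of a department's responsibilities and those
// of all its descendants. Call only after isAcyclic() holds.
function rollUpResponsibilities(depts: Department[], rootId: string): string[] {
  const children = new Map<string, Department[]>();
  for (const d of depts) {
    if (d.parentDepartmentId === undefined) continue;
    const list = children.get(d.parentDepartmentId) ?? [];
    list.push(d);
    children.set(d.parentDepartmentId, list);
  }
  const root = depts.find((d) => d.departmentId === rootId);
  if (root === undefined) return [];
  const out = new Set<string>();
  const stack: Department[] = [root];
  while (stack.length > 0) {
    const d = stack.pop() as Department;
    for (const r of d.responsibilities) out.add(r);
    stack.push(...(children.get(d.departmentId) ?? []));
  }
  return Array.from(out);
}
```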
2026-05-31 (RFCs 0086 + 0077 — the Active→Accepted behavioral gate) added four capability-gated behavioral scenarios so a non-steward host can be mechanically certified non-vacuously under `OPENWOP_REQUIRE_BEHAVIOR=true`: `agent-roster-attribution.test.ts` (RFC 0086 §B/§C; gated on `agents.roster.supported` — the normative `GET /v1/agents/roster` read shape + `total==roster.length`, the §C `roster.run.initiated`-before-`agent.invocation.started` ordering, the content-free payload backing `roster-attribution-no-content`, the durable work-item `triggerSubscriptionId`, and the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ROSTER_ID`), `agent-live-invocation-bracket.test.ts` (RFC 0077 §E; gated on `agents.liveRuntime.supported` — `agent.invocation.started`-first / `agent.invocation.completed`-last bracket, matching `invocationId`, `source`/`outcome` closed enums, content-free), `agent-live-structured-output.test.ts` (RFC 0077 §B step 6; gated on `agents.liveRuntime.structuredOutput` — a result violating `handoff.returnSchemaRef` fails the invocation `outcome:"failed"` rather than shipping as completed), and `agent-live-allowlist-enforced.test.ts` (RFC 0077 §F-1 / RFC 0002 §A14; gated on `agents.liveRuntime.supported` — a tool outside `toolAllowlist` is not callable); all four drive the documented `POST /v1/host/sample/roster/fire` + `POST /v1/host/sample/agents/live-invoke` seams plus the test event-log seam and soft-skip on 404 (these are the RFC 0086 / 0077 Active→Accepted bars). 
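The bracket asserted by `agent-live-invocation-bracket.test.ts` is a first/last/matching-id check. This hypothetical sketch assumes the caller has already sliced the run log down to the events of one invocation. The slicing step, the event shape, and `bracketViolations` are illustrative, and the closed `source`/`outcome` enums are not modeled.

```typescript
// Hypothetical, trimmed event shape.
interface InvocationEvent {
  type: string;
  payload: { invocationId?: string };
}

// Assumes `events` is the slice of the run log for one invocation, in order.
function bracketViolations(events: InvocationEvent[]): string[] {
  const problems: string[] = [];
  const first = events[0];
  const last = events[events.length - 1];
  if (first?.type !== "agent.invocation.started") problems.push("agent.invocation.started must be first");
  if (last?.type !== "agent.invocation.completed") problems.push("agent.invocation.completed must be last");
  for (const t of ["agent.invocation.started", "agent.invocation.completed"]) {
    const n = events.filter((e) => e.type === t).length;
    if (n !== 1) problems.push(`expected exactly one ${t}, saw ${n}`);
  }
  // Both bracket events must name the same invocation.
  const startId = first?.payload.invocationId;
  if (startId === undefined || last?.payload.invocationId !== startId) problems.push("invocationId mismatch");
  return problems;
}
```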
2026-05-30 (RFC 0087 — agent org-chart, Draft -> Active) added `agent-org-chart-shape.test.ts` (always-on server-free: the `capabilities.agents.orgChart` shape + the `AgentOrgChart` round-trip + the non-`host:` member negative + the **§B structural non-authority guarantee** — the schema rejects a `scopes`/`canDispatch`/`permissions`/`authority` field on a member (`additionalProperties:false`), and a member's key set is exactly `{rosterId, departmentId, roleId, reportsTo}` — backing the protocol-tier `org-position-no-authority-escalation` invariant; no new RunEventType). 2026-05-30 (RFC 0086 — standing agent roster, Draft -> Active) added `agent-roster-shape.test.ts` (always-on server-free: the `capabilities.agents.roster` shape + the `AgentRosterEntry` round-trip + the `host:` `rosterId` + `agentRef` version-XOR-channel negatives + the content-free `roster.run.initiated` negatives backing the protocol-tier `roster-attribution-no-content` invariant + the additive `roster` inventory projection + RunEventType-enum membership). 2026-05-30 (RFC 0082 — agent deployment lifecycle, Draft -> Active) added `agent-deployment-shape.test.ts` (always-on server-free: the `capabilities.agents.deployment` shape + the `AgentDeployment` record round-trip + the `AgentRef` `channel` XOR `version` `not`-clause + the four `deployment.*` payloads + the content-free negatives backing the protocol-tier `deployment-event-no-content-leak` invariant). 2026-05-30 (RFC 0085 — `openwop-agent-platform` meta-profile, Draft -> Active) added `agent-platform-profile.test.ts` (always-on server-free derivation of the operational-annex `none`/`partial`/`full` status: all-floor ⇒ partial, missing-flag ⇒ none, the replay-OR-`nondeterminismPolicy.declared` term, floor+governance ⇒ full, missing-tenant-scope ⇒ partial-not-full per the honest-advertisement rule, eval/deploy/budget-are-advisory-not-hard-terms, + the `capabilities.nondeterminismPolicy.declared` shape). 
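The `none`/`partial`/`full` derivation exercised by `agent-platform-profile.test.ts` can be modeled as two nested predicates. This sketch is hypothetical. The input grouping and all flag names are assumptions, and the real term list lives in the operational annex. It encodes the rules listed above: all-floor gives partial, a missing floor flag gives none, replay OR a declared nondeterminism policy counts toward the floor, floor plus governance gives full, a missing tenant scope stays partial, and eval/deploy/budget are not inputs.

```typescript
// Hypothetical input shape: the grouping and names are illustrative only.
interface PlatformFlags {
  floor: Record<string, boolean>; // every floor capability must be advertised
  replay: boolean;
  nondeterminismPolicyDeclared: boolean;
  governance: Record<string, boolean>; // additionally required for "full"
  tenantScoped: boolean;
  // eval, deployment and budget flags are advisory, so they are deliberately not inputs.
}

type ProfileStatus = "none" | "partial" | "full";

function deriveAgentPlatformStatus(f: PlatformFlags): ProfileStatus {
  // Floor: every floor flag, plus replay OR a declared nondeterminism policy.
  const floorMet = Object.values(f.floor).every(Boolean) && (f.replay || f.nondeterminismPolicyDeclared);
  if (!floorMet) return "none";
  // Honest advertisement: without tenant scoping, governance alone cannot reach "full".
  const fullMet = Object.values(f.governance).every(Boolean) && f.tenantScoped;
  return fullMet ? "full" : "partial";
}
```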
2026-05-30 (RFC 0084 — budget, quota + cost policy, Draft -> Active) added `budget-policy-shape.test.ts` (always-on server-free: `budget-policy.schema.json` round-trip + the §A orthogonality guard — a wall-time field is rejected (it's RFC 0058's `runTimeoutMs`) — + threshold/onExhaustion negatives + the four content-free `budget.{reserved,consumed,threshold.crossed,exhausted}` payloads + the four `cap.breached{budget-*}` kinds + RunEventType-enum membership + the no-pricing-property structural check backing the protocol-tier `budget-no-pricing-leak` invariant + the `capabilities.budget`/`limits.maxBudget*` shape). 2026-05-30 (RFC 0083 — durable trigger + channel bridge, Draft -> Active) added `trigger-bridge-shape.test.ts` (always-on server-free: `trigger-subscription.schema.json` round-trip + missing-`state`/out-of-enum-`source`/unknown-property negatives + the four-state vocab + the two content-free `trigger.{subscription.state.changed,delivery.attempted}` payloads incl. closed `state`/`outcome` enums + RunEventType-enum membership + the `triggerBridge`/`webhooks.durable` capability shape + the `openwop-trigger-bridge` profile derivation incl. the no-dead-letter-sink negative). 2026-05-30 (RFC 0079 — credential provenance + egress policy, Draft -> Active) added `egress-provenance-shape.test.ts` (always-on server-free: `credential-provenance.schema.json` round-trip + `audiences:[]`/missing-`credentialId`/unknown-property negatives + the no-secret-property structural check backing the protocol-tier `egress-decision-no-secret-leak` invariant + the content-free `egress.decided` record incl. the `decision` enum + RunEventType-enum membership + the `httpClient.egressPolicy` shape; the behavioral `egress-credential-audience-bound` confused-deputy MUST is reference-impl tier, deferred to a host). 
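Structural checks like the no-pricing-property guard behind `budget-no-pricing-leak`, or the no-secret-property guard behind `egress-decision-no-secret-leak`, can be sketched as a walk that collects every name declared under a JSON Schema `properties` keyword. The forbidden names in the usage below (`unitPrice`, `secretValue`) are hypothetical placeholders, not the lists the scenarios actually use.

```typescript
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

// Collect every property name declared under any "properties" keyword,
// including nested subschemas and $defs.
function declaredPropertyNames(schema: Json, out: Set<string> = new Set()): Set<string> {
  if (Array.isArray(schema)) {
    for (const item of schema) declaredPropertyNames(item, out);
  } else if (schema !== null && typeof schema === "object") {
    const props = schema["properties"];
    if (props !== null && typeof props === "object" && !Array.isArray(props)) {
      for (const name of Object.keys(props)) out.add(name);
    }
    for (const v of Object.values(schema)) declaredPropertyNames(v, out);
  }
  return out;
}

// Return whichever forbidden names the schema declares anywhere.
function forbiddenProperties(schema: Json, forbidden: string[]): string[] {
  const names = declaredPropertyNames(schema);
  return forbidden.filter((f) => names.has(f));
}
```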
2026-05-30 (RFC 0078 — portable tool catalog, Draft -> Active) added `tool-descriptor-shape.test.ts` (always-on server-free: `tool-descriptor.schema.json` round-trip + the §C-1 `exec` ⇒ `host-extension` cross-field MUST (RFC 0069) + the `safetyTier`-required negative + `additionalProperties:false`, the `capabilities.toolCatalog` `supported`/`sources`/`sessionLifecycle` shape, and the two content-free `tool.session.{opened,closed}` payload $defs incl. the closed `outcome` enum + RunEventType-enum membership). 2026-05-30 (RFC 0080 — agent memory capability reconciliation, Draft -> Active) added `memory-capability-model-shape.test.ts` (always-on server-free: the additive `capabilities.memory.{writable,search,retention}` dimension shapes + malformed-instance negatives — `retention.ttl` non-boolean, out-of-enum `search.modes`, unknown property under `additionalProperties:false` — the `agent-inventory-response` `memoryDegraded`/`degradedMemoryDimensions` closed-enum fields, and the `openwop-memory` derivation surfacing for read/write + long-term hosts while withholding from `writable:false`). 2026-05-30 (RFC 0081 — agent evaluation, Draft -> Active) added `agent-eval-suite-shape.test.ts` (always-on server-free: the `capabilities.agents.evalSuite` shape + the `AgentEvalSuite`/`EvalSummary` schema round-trips + the three `eval.{started,scored,completed}` payloads + the content-free negatives — a task entry with a `taskOutput` body, a `safetyFinding` with an `excerpt` — backing the new `eval-summary-no-content-leak` SECURITY invariant). 
2026-05-29 (RFC 0076 §B — `ctx.http.safeFetch` live-run audit) added `safefetch-live-audit.test.ts` (`behaviorGate('openwop-safefetch-live-audit', …)`, gated on `httpClient.safeFetch` + `toolHooks.prePostEvents`) — asserts the audit-when-both MUST against the **durable run event log** via the new `POST /v1/host/sample/http/safe-fetch-run` open seam + the test event-log seam, closing the seam-vs-production gap (a production `createSafeFetch()` with no audit hooks passes the inline `safefetch-behavior.test.ts` but FAILS this under `OPENWOP_REQUIRE_BEHAVIOR=true`); this is the RFC 0076 §B → Accepted bar; run seam soft-skips on 404 (host-pending). 2026-05-29 (RFC 0066 — `x-openwop-form` picker UX hints, Draft → Active) added `x-openwop-form-pack-manifest.test.ts` (always-on server-free: an annotated `configSchema` stays a valid 2020-12 schema + the advisory hints don't change what it accepts, each §A annotation matches the shape, an unknown `kind` validates for forward-compat, 3 negatives — missing/non-string `kind`, non-string `dependsOn`). 2026-05-29 (RFC 0076 §B — `ctx.http.safeFetch`) added `safefetch-behavior.test.ts` (seam-gated: SSRF block / DNS-rebinding / `Connection: upgrade` refusal / tool-hooks audit-when-both, via `POST /v1/host/sample/http/safe-fetch`; advertisement contract stays in `http-client-ssrf.test.ts`). 2026-05-29 (RFC 0076 §A — pack `runtime.requires[]` install gate) added two: `runtime-requires-shape.test.ts` (server-free closed-vocabulary validation — the 8 tokens validate, a raw builtin name is rejected, empty-array≡omission, `uniqueItems`) + `runtime-requires-install-gate.test.ts` (seam-gated install-grant / install-refuse → `pack_runtime_requirement_unmet` / non-sandbox SHOULD-projection, soft-skip on 404 via `POST /v1/host/sample/packs/install-gate`). 
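The SSRF-block and DNS-rebinding legs of `safefetch-behavior.test.ts` target address-level policy. This minimal IPv4-only sketch is not the host's `createSafeFetch()`. The blocked-range list is a common baseline rather than the spec's list, and IPv6 is out of scope. The rebinding defense shown judges the addresses actually resolved for the connection, not the hostname.

```typescript
// Sketch: refuse loopback, private, link-local, CGNAT and "this network" IPv4 ranges.
// The range list is an assumption; IPv6 is not handled here.
function isBlockedIPv4(addr: string): boolean {
  const parts = addr.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) {
    throw new Error(`not an IPv4 literal: ${addr}`);
  }
  if (parts.some((p) => Number(p) > 255)) throw new Error(`octet out of range: ${addr}`);
  const [a, b] = parts.map(Number);
  return (
    a === 0 || // 0.0.0.0/8
    a === 10 || // 10.0.0.0/8
    a === 127 || // loopback
    (a === 100 && b >= 64 && b <= 127) || // 100.64.0.0/10 carrier-grade NAT
    (a === 169 && b === 254) || // link-local, including cloud metadata endpoints
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) // 192.168.0.0/16
  );
}

// DNS-rebinding defense: judge the addresses resolved for this connection, refuse
// if any is blocked, and then connect to one of these same checked addresses.
function mayConnect(resolvedAddresses: string[]): boolean {
  return resolvedAddresses.length > 0 && resolvedAddresses.every((addr) => !isBlockedIPv4(addr));
}
```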
2026-05-29 (RFC 0047 — `host.oauth` authorization-code roundtrip) added `oauth-authorization-code-roundtrip.test.ts` — capability-gated on `capabilities.oauth.supported` + `grants` including `authorization_code`; drives the `POST /v1/host/sample/oauth/authorize-code-roundtrip` seam against the one canonical synthetic provider in `fixtures/oauth-providers/synthetic.json` (soft-skip on 404, Tier-2 host-pending), asserting a successful grant returns a credential REFERENCE (token persisted as a `host.credentials` entry) and that the authorization code / state / PKCE verifier / acquired access+refresh tokens never appear on any run-visible surface (RFC 0047 §C + §C.2 / `credential-payload-redaction`). Closes the RFC 0047 Tier-2 gap (capability-shape + redaction scenarios existed; the actual authorization-code dance was unexercised). 2026-05-26 (RFC 0070 — agent-manifest runtime) added `agent-manifest-runtime.test.ts`; 2026-05-26 (RFC 0071 — artifact-type + chat card packs) added six: `artifact-type-pack-manifest-validation.test.ts` + `artifact-schema-compile-bounded.test.ts` (server-free) + `artifact-type-pack-install.test.ts` + `artifact-type-store-without-render.test.ts` + `chat-card-pack-manifest-validation.test.ts` (server-free) + `chat-card-pack-execution.test.ts` (capability-gated, host-pending). 
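The "never appears on any run-visible surface" assertion can be sketched as a recursive substring scan over whatever the harness collects (events, outputs, logs). This is a hypothetical helper. The event and field names in the usage are illustrative. A plain substring scan will not catch encoded forms such as base64, so it is a floor rather than a proof.

```typescript
// Collect every secret value that appears inside any string (key or value)
// reachable from a run-visible surface such as events, outputs or logs.
function findLeakedSecrets(surface: unknown, secrets: string[]): string[] {
  const leaked = new Set<string>();
  const visit = (v: unknown): void => {
    if (typeof v === "string") {
      for (const s of secrets) if (s.length > 0 && v.includes(s)) leaked.add(s);
    } else if (Array.isArray(v)) {
      for (const item of v) visit(item);
    } else if (v !== null && typeof v === "object") {
      for (const [k, child] of Object.entries(v)) {
        visit(k);
        visit(child);
      }
    }
  };
  visit(surface);
  return Array.from(leaked);
}
```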
2026-05-26 (RFCs 0067 / 0068 / 0069 — spec-gap Draft cohort) added five scenarios: `byok-auth-modes.test.ts` (RFC 0067; always-on schema-shape of `aiProviders.authModes` + a discovery-gated §B auth-mode-contract cross-field check), `memory-consolidation-shape.test.ts` (RFC 0068; always-on shape of `agents.memoryConsolidation`/`agents.commitments` + the `agent.memory.consolidated`/`commitment.fired` payload $defs), `memory-consolidation-idempotent.test.ts` + `commitment-fired.test.ts` (RFC 0068; capability-gated behavioral, soft-skip on the documented `/v1/host/sample/memory/consolidate` + `/commitment/fire` seams), and `exec-not-protocol-tier.test.ts` (RFC 0069; always-on server-free structural assertion that the protocol corpus defines no `core.*`/`openwop.*` exec-class primitive — backs the `exec-must-not-be-protocol-tier` SECURITY invariant). 2026-05-25 (RFC 0061 — stateful agent-loop lifecycle, executionModel.version 5) added four `agent-loop-*.test.ts` scenarios: `-version5-shape` (always-on; validates `executionModel.statefulResume`/`transcriptWindow` + the 1–5 version ceiling) plus `-iteration-monotonic` (gated on `version >= 5`; `runOrchestrator.decided.iteration` increments 1,2,3… exactly once per turn), `-workspace-snapshot` (gated additionally on `host.workspace.supported`; a turn-i workspace write is invisible to turn i, visible to turn i+1), and `-stateful-resume` (gated on `statefulResume`; a mid-loop suspend resumes at the same iteration without resetting the counter) — the three behavioral scenarios drive the documented agent-loop seam (`POST /v1/host/sample/agentloop/run`) and soft-skip until a host wires it. 
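Two observable rules from the agent-loop scenarios can be modeled in a few lines. The first is that `runOrchestrator.decided.iteration` reads 1, 2, 3, and so on without resets. The second is that a turn-i workspace write only becomes visible at turn i+1. This is a hypothetical model of the observable behavior, not a host's implementation. `iterationsAreMonotonic` and `TurnSnapshotWorkspace` are illustrative names.

```typescript
// Iteration values from successive runOrchestrator.decided events (in log order)
// must read 1, 2, 3, ... with no gap, repeat or reset, even across a suspend/resume.
function iterationsAreMonotonic(iterations: number[]): boolean {
  return iterations.every((value, index) => value === index + 1);
}

// Hypothetical model of the snapshot rule: reads during turn i see the state captured
// when turn i began; writes go to the live copy and become visible at turn i + 1.
class TurnSnapshotWorkspace {
  private live = new Map<string, string>();
  private snapshot = new Map<string, string>();

  beginTurn(): void {
    this.snapshot = new Map(this.live);
  }

  read(path: string): string | undefined {
    return this.snapshot.get(path);
  }

  write(path: string, content: string): void {
    this.live.set(path, content);
  }
}
```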
2026-05-25 (RFC 0059 — host.workspace M2, reference-host enforcement) added two `workspace-*.test.ts` scenarios: `-behavior` (capability-gated CRUD round-trip / `If-Match` 409 `workspace_conflict` / `workspace_too_large` / §D run-start snapshot, all via the real `/v1/host/workspace/files` §C endpoints) and `-cross-tenant-isolation` (WCT-1 — drives the documented `POST /v1/host/sample/workspace/op` seam to assert a file owned by one `{tenant, workspace}` is unreadable, on both `get` and `list`, under a different owner; backs the new `workspace-cross-tenant-isolation` SECURITY invariant). The in-memory reference host now advertises `capabilities.workspace.supported` and honors §C/§D/§E end-to-end. 2026-05-25 (RFC 0062 — memory.distillation "dreams") added five `distillation-*.test.ts` scenarios: `-shape` (always-on; validates the `capabilities.memory.distillation` block + the additive `distillation` sub-object on `memory.compacted`) plus `-token-budget` (within budget `tokensUsed ≤ tokenBudget`; an un-meetable budget → `token_budget_exceeded` with no partial archive), `-stable-archive` (same sources + budget ⇒ byte-stable archive checksum), `-index-roundtrip` (gated additionally on `indexEmitted`; the `MEMORY-INDEX.json` workspace file is retrievable + `workspace.updated` fired), and `-secret-carryforward` (SR-1: a redacted source secret never appears in the archive) — the four behavioral scenarios drive the documented memory-distillation seam (`POST /v1/host/sample/memory/distill`) and soft-skip until a host wires it. 
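The `If-Match` leg of `workspace-behavior.test.ts` is standard optimistic concurrency. In this hypothetical in-memory sketch, only the `409 workspace_conflict` response and the `workspace_too_large` code come from the text. The ETag format, the per-file size cap, and the class name are assumptions.

```typescript
// Hypothetical in-memory model of conditional writes. The ETag format and the
// per-file size cap are assumptions; only the two error codes come from the README.
type PutResult =
  | { ok: true; etag: string }
  | { ok: false; status: 409; error: "workspace_conflict" }
  | { ok: false; error: "workspace_too_large" };

class WorkspaceFiles {
  private readonly files = new Map<string, { content: string; version: number }>();
  private readonly maxBytes: number;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  put(path: string, content: string, ifMatch?: string): PutResult {
    if (new TextEncoder().encode(content).length > this.maxBytes) {
      return { ok: false, error: "workspace_too_large" };
    }
    const current = this.files.get(path);
    const currentEtag = current === undefined ? undefined : `"v${current.version}"`;
    // A stale If-Match means the file changed since the caller last read it.
    if (ifMatch !== undefined && ifMatch !== currentEtag) {
      return { ok: false, status: 409, error: "workspace_conflict" };
    }
    const version = (current?.version ?? 0) + 1;
    this.files.set(path, { content, version });
    return { ok: true, etag: `"v${version}"` };
  }
}
```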
2026-05-25 (RFC 0063 — core.subWorkflow.outputAttestation) added four `subrun-*.test.ts` scenarios: `-attestation-shape` (always-on; validates the `capabilities.agents.subRunAttestation` flag) plus `-checksum-stable` (the child output checksum is the byte-stable, key-order-invariant RFC 8785 JCS + SHA-256 digest), `-approval-gate` (`requireApproval` → `accept` merges, `reject` does not), and `-approval-fail-closed` (no `accept`/`edit-accept` → no merge; backs the deferred `subrun-merge-approval-fail-closed` invariant) — the three behavioral scenarios drive the documented sub-run attestation seam (`POST /v1/host/sample/subrun/attest`) and soft-skip until a host wires it. 2026-05-25 (RFC 0064 — host.toolHooks) added five `tool-hooks-*.test.ts` scenarios: `-shape` (always-on; validates the `capabilities.toolHooks` block + the optional content-free fields on `agentToolCalled` / `agentToolReturned`) plus `-content-free` (gated on `prePostEvents`), `-authorization-fail-closed` (gated on `perToolAuthorization`), `-rate-limit` (gated on `perToolRateLimit`), and `-secret-redaction` (gated on `prePostEvents` + the SR-1 `argsHash` redaction rule) — the four behavioral scenarios drive the documented tool-hooks invoke seam (`POST /v1/host/sample/toolhooks/invoke`) and soft-skip until a host wires it. 2026-05-25 (RFC 0060 — host.heartbeat) added four `heartbeat-*.test.ts` scenarios: `-capability-shape` (always-on; validates the `capabilities.heartbeat` block) plus `-fires-once-per-tick`, `-idempotent-no-spam`, and `-runtime-bound` (gated on `capabilities.heartbeat.supported` + the host heartbeat tick seam; soft-skip until a host wires it). 
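The key-order-invariant checksum in `subrun-checksum-stable` can be sketched as a compact RFC 8785 (JCS) serializer followed by SHA-256. This covers plain JSON values. Object keys are sorted by UTF-16 code units, which is JavaScript's default string sort, and strings and numbers use ECMAScript `JSON.stringify` rules as RFC 8785 specifies. The `outputChecksum` name and hex encoding are assumptions, and production code may prefer a vetted JCS library.

```typescript
import { createHash } from "node:crypto";

// JCS (RFC 8785) serialization for JSON-compatible values: no whitespace,
// object keys sorted by UTF-16 code units, ECMAScript number/string formatting.
function jcs(value: unknown): string {
  if (value === null || typeof value === "boolean" || typeof value === "string") return JSON.stringify(value);
  if (typeof value === "number") {
    if (!Number.isFinite(value)) throw new Error("JCS forbids NaN and Infinity");
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) return `[${value.map(jcs).join(",")}]`;
  if (typeof value === "object") {
    const obj = value as Record<string, unknown>;
    // Default sort compares UTF-16 code units, which is the order RFC 8785 requires.
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${jcs(obj[k])}`).join(",")}}`;
  }
  throw new Error(`not JSON-serializable: ${typeof value}`);
}

// Hypothetical helper: SHA-256 over the canonical UTF-8 bytes, hex-encoded.
function outputChecksum(output: unknown): string {
  return createHash("sha256").update(jcs(output), "utf8").digest("hex");
}
```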
2026-05-25 (RFC 0057 — memory write-attribution) added five `memory-attribution-*.test.ts` scenarios: `-shape` (always-on advertisement check on `capabilities.memory.attribution`), plus `-no-content`, `-tenant-scoped`, `-emits-on-write`, and `-replay-stable` (gated on `capabilities.memory.attribution.emitsWriteEvents`) verifying the content-free `memory.written` RunEvent, its two SECURITY invariants (`memory-attribution-no-content` + `memory-attribution-tenant-scoped`), and the §D replay rule that a `replay`-mode fork MUST NOT regenerate `memoryId`. 2026-05-25 (RFC 0025 §C point 1 — test-catalog isolation invariant; pairs with the 25 publish-error scenarios in `pack-registry-publish.test.ts`) added `pack-registry-isolation.test.ts` — capability-gated on `capabilities.packs.testMode.{supported, isolated}: true`; PUTs a disposable pack into `/v1/packs-test/{name}` and asserts the same `(name, version)` does NOT appear via `GET /v1/packs/{name}` — anchors the test-catalog isolation MUST in RFC 0025 §C. 2026-05-25 (RFC 0028 Tier-2 post-promotion T2 — read-side sister scenario for workspace-membership enforcement) added `prompt-read-workspace-membership-enforced.test.ts` — gates on `capabilities.prompts.supported: true` (broader than `mutableLibrary` so read-only hosts that expose `?workspaceId=` are also probed); drives `GET /v1/prompts?workspaceId=<random-non-member>` and interprets the response: 4xx PASS (canonical envelope check on 403); 200 with empty `templates[]` PASS (correct null result for a nonexistent workspace); 200 with non-empty `templates[]` FAIL (cross-tenant leak); 200 without `templates[]` field SKIP (host doesn't expose workspace-scoped reads). Verifies SECURITY invariant `prompt-read-workspace-membership-enforced`. Same-day T1 strengthened `prompt-mutation-workspace-membership-enforced.test.ts` to pin `error === "workspace_membership_required"` when the host's refusal status is 403 (other refusal codes unconstrained). 
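The read-side decision table for `prompt-read-workspace-membership-enforced.test.ts` maps directly onto a small classifier. This is a hypothetical sketch. It omits the canonical error-envelope check on 403, and treating statuses outside 4xx/200 as "skip" is an assumption the README does not state.

```typescript
type Verdict = "pass" | "fail" | "skip";

// Interpret GET /v1/prompts?workspaceId=<random-non-member> per the README's table.
function classifyReadProbe(status: number, body: unknown): Verdict {
  if (status >= 400 && status < 500) return "pass"; // refusal is a correct outcome
  if (status === 200) {
    const templates = (body as { templates?: unknown } | null)?.templates;
    if (!Array.isArray(templates)) return "skip"; // host doesn't expose workspace-scoped reads
    return templates.length === 0 ? "pass" : "fail"; // non-empty means a cross-tenant leak
  }
  return "skip"; // assumption: other statuses are left unjudged in this sketch
}
```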
2026-05-25 (RFC 0028 Tier-2 follow-up — workspace-membership enforcement on mutating prompt endpoints, filed in response to a self-disclosed adopter vulnerability) added `prompt-mutation-workspace-membership-enforced.test.ts` — capability-gated on `capabilities.prompts.mutableLibrary: true`; drives `POST /v1/prompts` with a cryptographically-random non-member `workspaceId` and asserts the host refuses (NOT a 2xx; any 4xx/5xx is acceptable — silent success is the failure mode). Verifies SECURITY invariant `prompt-mutation-workspace-membership-enforced`. 2026-05-22 (RFC 0034 §B follow-up — secret-leakage harness against the OTel + debug-bundle seams) added `secret-leakage-otel-attribute.test.ts` — gates on `capabilities.secrets.supported` + `capabilities.observability.testSeams.{otelScrape,debugBundleExport}` AND the `OPENWOP_CANARY_SECRET_VALUE` env (host operator + conformance runner agree on the canary). Drives the existing `openwop-smoke-byok-roundtrip` fixture end-to-end; scrapes both seams after run completion; hard-fails if the canary plaintext appears in any OTel span attribute or debug-bundle field. Verifies SECURITY invariants `secret-leakage-otel-attribute` + `secret-leakage-debug-bundle-otel`. 2026-05-22 (RFC 0041 Phase 4 — replay determinism under nondeterministic models) added three scenarios: `replay-divergence-at-refusal.test.ts` (advertisement-shape probe on `replayDeterminism.refusalDivergenceEmission` + 2 `it.todo` for the dual-direction refusal-divergence case), `replay-observable-sequence-determinism.test.ts` (capability-gated; behavioral assertion soft-skipped until a `conformance-phase4-nondet-tool` fixture ships), `replay-llm-cache-key-portable.test.ts` (intra-host reproducibility + non-recipe-field invariance + Phase 4 advertisement alignment — reuses the existing `POST /v1/host/sample/test/llm-cache-key` seam from the sibling `replay-llm-cache-key.test.ts`). 
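The mutation-side probe flips the read-side logic: a 2xx is the failure. This hypothetical sketch also includes the later pin of `workspace_membership_required` on 403 refusals. The `ws-` id prefix and treating non-4xx/5xx codes as not-refused are assumptions.

```typescript
import { randomUUID } from "node:crypto";

// A workspace id the caller cannot plausibly be a member of; the "ws-" prefix is an assumption.
function randomNonMemberWorkspaceId(): string {
  return `ws-${randomUUID()}`;
}

interface ProbeResponse {
  status: number;
  body?: { error?: string };
}

function classifyMutationProbe(res: ProbeResponse): "pass" | "fail" {
  // When the refusal is a 403, the error code is pinned.
  if (res.status === 403) return res.body?.error === "workspace_membership_required" ? "pass" : "fail";
  // Any other 4xx/5xx refusal is acceptable.
  if (res.status >= 400 && res.status < 600) return "pass";
  // Silent success (2xx) is the failure mode; other codes are treated the same way here.
  return "fail";
}
```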
2026-05-20 (RFC 0027 §A templateKinds-coverage follow-up — paired with `prompt-end-to-end-events.test.ts`) added `prompt-all-four-kinds-events.test.ts` exercising all four `PromptKind` values (`system`, `user`, `schema-hint`, `few-shot`) end-to-end through the reference workflow-engine sample's `local.sample.demo.mock-ai` dispatch path; capability-gated via `behaviorGate('prompts-supported', ...)`. Closes the credibility gap where the host advertised `templateKinds: ["system", "user", "few-shot", "schema-hint"]` but only the system+user pair was actually wired into dispatch. 2026-05-20 (RFCs 0030–0033 — envelope LLM-contract-hardening track) added 15 scenarios across four `Active` RFCs: `envelope-reasoning-shape.test.ts` (RFC 0030, always-on; asserts the OPTIONAL `reasoning` property on the three universal-kind schemas + the `schema.response` deliberate omission), `envelope-reasoning-secret-redaction.test.ts` (RFC 0030, capability-gated on `capabilities.envelopes.reasoning.supported` + `secrets.supported`; 5 `it.todo()` placeholders for SECURITY invariant `envelope-reasoning-secret-redaction`), `envelope-tier-one-subset-static.test.ts` (RFC 0030, always-on for load-bearing rules — no `oneOf` / `allOf` / `not` / `prefixItems` / `propertyNames` anywhere; gated on `tierOneSubsetCompliance: "strict"` for OpenAI-strict-only constraints), `envelope-variant-discriminator-static.test.ts` (RFC 0031, always-on; asserts no `oneOf` + every `anyOf` branch declares a single-string-enum discriminator in `required` on every `schemas/envelopes/*.schema.json`), `model-capability-substituted.test.ts` (RFC 0031, advertisement-shape probe on `capabilities.modelCapabilities.advertised[]` identifier pattern + 5 `it.todo()` placeholders for SECURITY invariant `model-capability-substituted-no-credential-disclosure`), `model-capability-insufficient.test.ts` (RFC 0031, 6 `it.todo()` placeholders for refusal + no-recursive-fallback), `node-module-required-capabilities-shape.test.ts` (RFC 0031 
SHOULD-tier authoring-convention; 4 `it.todo()` placeholders), and the six envelope-reliability events from RFC 0032 (`envelope-retry-attempted` carrying the shared advertisement-shape probe enforcing both MUST-tier events in `events[]` per RFC 0032 §C, plus `envelope-retry-exhausted`, `envelope-refusal-shape`, `envelope-truncated`, `envelope-nl-to-format-engaged`, `envelope-recovery-applied` — collectively 39 `it.todo()` placeholders covering retry/refusal/truncation/recovery + SECURITY invariants `envelope-refusal-no-prompt-leak` and `envelope-recovery-no-content-leak`), plus RFC 0033's two scenarios (`envelope-completion-distinguishes-truncation.test.ts` + `envelope-truncation-cap-exhaustion.test.ts` — 12 `it.todo()` placeholders covering the truncation-vs-schema-violation retry-routing distinction + the DoS-bound assertion). Reference workflow-engine sample advertises `capabilities.envelopes.reasoning: { supported: true, promptDirective: "off" }` + `tierOneSubsetCompliance: "warn"` honestly (schemas accept the field; host doesn't yet inject the directive); the other three RFCs' capability blocks defer to reference-host emission code per the staged RFC 0027 §G precedent. 2026-05-20 (RFC 0028 §B Phase B — prompt-pack boot-time install) added `prompt-pack-install.test.ts` (capability-gated on `capabilities.prompts.endpointsSupported: true`; asserts a host that ran the boot-time pack loader surfaces ≥ 1 pack-source template under `GET /v1/prompts?source=pack` carrying the canonical `meta.source: "pack"` + `meta.packName` + `meta.packVersion` stamps; positively identifies the in-tree `vendor.openwop.prompt-sample` reference pack's `writer-system` template when present). Pairs with the new `host/promptPackLoader.ts` boot-time entry on the reference workflow-engine sample, which scans `examples/packs/*` plus `OPENWOP_PROMPT_PACKS_DIR` and calls `installPackTemplates()` for each `kind: "prompt"` pack found. 
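The static rule behind `envelope-variant-discriminator-static.test.ts` can be sketched as a schema walk. It reports any `oneOf`, and any `anyOf` branch that lacks a `required` property whose schema is a single-string `enum`. This is a hypothetical helper that works on already-parsed schema JSON. It can false-positive on a property literally named `oneOf`, which a real implementation would distinguish.

```typescript
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

// A branch qualifies if some required property is declared as a one-element string enum.
function hasDiscriminator(branch: Json): boolean {
  if (branch === null || typeof branch !== "object" || Array.isArray(branch)) return false;
  const props = branch["properties"];
  const required = branch["required"];
  if (props === null || typeof props !== "object" || Array.isArray(props) || !Array.isArray(required)) return false;
  return required.some((name) => {
    if (typeof name !== "string") return false;
    const prop = props[name];
    if (prop === null || typeof prop !== "object" || Array.isArray(prop)) return false;
    const values = prop["enum"];
    return Array.isArray(values) && values.length === 1 && typeof values[0] === "string";
  });
}

function discriminatorProblems(schema: Json, path = "#"): string[] {
  const problems: string[] = [];
  if (Array.isArray(schema)) {
    schema.forEach((s, i) => problems.push(...discriminatorProblems(s, `${path}/${i}`)));
    return problems;
  }
  if (schema === null || typeof schema !== "object") return problems;
  if ("oneOf" in schema) problems.push(`${path}: oneOf is not allowed`);
  const anyOf = schema["anyOf"];
  if (Array.isArray(anyOf)) {
    anyOf.forEach((branch, i) => {
      if (!hasDiscriminator(branch)) problems.push(`${path}/anyOf/${i}: no single-string-enum discriminator in required`);
    });
  }
  // Recurse into every subschema so nested anyOf/oneOf are checked too.
  for (const [k, v] of Object.entries(schema)) problems.push(...discriminatorProblems(v, `${path}/${k}`));
  return problems;
}
```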
Earlier additions include 18 Multi-Agent Shift scenarios (Phases 1-5) added 2026-05-10, the `registry-public.test.ts` public-registry healthcheck added 2026-05-11 (opt-in via `OPENWOP_TEST_PUBLIC_REGISTRY=true`), the `replay-llm-cache-key.test.ts` placeholder added 2026-05-11 (three `it.todo()` cases for the cross-host LLM cache-key recipe per `replay.md` §"LLM cache-key recipe"), the two `production-*.test.ts` scenarios added 2026-05-11 for the `openwop-production` profile per RFC 0009 (`production-backpressure.test.ts`, `production-retention-expiry.test.ts`), the four `auth-*.test.ts` scenarios added 2026-05-11/12 for the production-auth profiles per RFC 0010 (`auth-api-key-rotation.test.ts`, `auth-oauth2-client-credentials.test.ts`, `auth-oidc-user-bearer.test.ts`, `auth-mtls.test.ts` (opt-in via `OPENWOP_TEST_MTLS=1`)), `replay-retention-expiry.test.ts` added 2026-05-12 (capability shape + 410/422 envelope per `replay.md` §"Retention and garbage collection"), `bulk-cancel.test.ts` added 2026-05-12 (Phase B close-out of R1 — `POST /v1/runs:bulk-cancel`), the two Phase H launch-blocker advertisement-contract scenarios added 2026-05-12 (`mcp-toolcall-redaction.test.ts` for the MCP-1 invariant per `host-capabilities.md §host.mcp` + `threat-model-prompt-injection.md §UNTRUSTED`, and `http-client-ssrf.test.ts` for the SSRF + body-size cap advertisement contract on `capabilities.httpClient`), the `wasm-pack-abi-version-rejection.test.ts` Track 7 scenario added 2026-05-12 for the ABI-mismatch positive path via the `vendor.openwop.misbehaving-abi` pack per RFC 0008 §H, the `otel-trace-propagation-subworkflow.test.ts` Track 11 close-out added 2026-05-13 (parent + child run spans share the inbound traceparent's traceId across the `core.subWorkflow` dispatch boundary), and the three RFC 0012 (Memory Compaction Profile, `Active`) scenarios added 2026-05-13/14: `memory-compaction-sr1-carry-forward.test.ts` (load-bearing SR-1 §D), `memory-compaction-event-emitted.test.ts` 
(canonical §B payload shape), and `memory-compaction-provenance-tag.test.ts` (soft assertion on §C `compacted-from:<id>` convention). All three gate on `capabilities.memory.compaction.supported` + the host's test seam at `/v1/test/memory/{seed,compact}` (Postgres reference host enables both via `OPENWOP_MEMORY_COMPACTION=true OPENWOP_TEST_TRIGGER_COMPACTION=true`). 2026-05-15 (gap-closure CF-3) added `interrupt-token-matrix.test.ts` (malformed / unknown / replay / cross-run-id paths on `GET|POST /v1/interrupts/{token}`). The maintained scenario-to-spec map lives in [`coverage.md`](./coverage.md); this README keeps the operator quickstart and the historical scenario notes below.
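The four-layer prompt resolution chain probed by the RFC 0029 scenarios can be pictured as a first-non-null walk. The sketch below is illustrative only. The `Layer` names, the `PromptRef` shape and `resolvePrompt` are hypothetical. Layer 2 is assumed to be the agent layer and is treated as a single lookup: the README says agent-intrinsic `systemPromptRef` outranks `promptOverrides`, but how a host combines those inside that layer is not modeled. The function records every layer it consults in `chain`, so an all-null resolution still lists all four attempts, which is the behavior `prompt-resolution-chain-fallback-cascade.test.ts` asserts.

```typescript
// Hypothetical model of the layered lookup; names are illustrative, not the spec's wire types.
type Layer = "node" | "agent" | "workflow-defaults" | "host-defaults";

interface PromptRef {
  templateId: string; // assumed minimal shape
}

interface ChainEntry {
  layer: Layer;
  ref: PromptRef | null;
}

interface Resolution {
  resolved: PromptRef | null;
  chain: ChainEntry[]; // every layer that was consulted, in order
}

// Highest-precedence layer first (layer 1 through layer 4).
const LAYER_ORDER: readonly Layer[] = ["node", "agent", "workflow-defaults", "host-defaults"];

function resolvePrompt(lookups: Record<Layer, () => PromptRef | null>): Resolution {
  const chain: ChainEntry[] = [];
  for (const layer of LAYER_ORDER) {
    const ref = lookups[layer]();
    chain.push({ layer, ref });
    if (ref !== null) {
      // First non-null layer wins; lower layers are never consulted.
      return { resolved: ref, chain };
    }
  }
  // All layers yielded null: nothing resolved, but the chain still lists all four attempts.
  return { resolved: null, chain };
}
```

For example, if only the workflow-defaults and host-defaults lookups return a ref, `resolved` is the workflow-defaults ref and `chain` holds three entries.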
 High-level coverage includes:
@@ -172,7 +172,7 @@ Server-required (added in 1.7.0):
 |---|---|---|
 | **Redaction** | [`capabilities.md`](../spec/v1/capabilities.md) §"Secrets" + NFR-7 + §"aiProviders" | Vendor-neutral assertions that the server doesn't leak secret material. Three scenario groups: (a) discovery shape contract — `secrets` + `aiProviders` advertisements are well-formed regardless of `secrets.supported`; when `supported === true`, scopes MUST be non-empty + `resolution === 'host-managed'`; `byok ⊆ supported`. (b) bearer-token redaction — invalid Bearer canary in `Authorization` header is not echoed in the 401 response body. (c) credentialRef echo control — gated on `secrets.supported === true`; canary planted in `configurable.ai.credentialRef` MUST NOT appear in any RunEvent payload (poll-based capture; transport-agnostic). Uses runtime-built canary fixtures (`lib/canaries.ts`) that defeat static secret scanners. 6 scenarios. |
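At its core, the credentialRef echo-control check plants a unique value and searches every captured payload for it. Here is a minimal sketch, assuming a hypothetical `RunEventLike` shape and hypothetical `makeCanary`/`findCanaryLeaks` helpers. This is not the `lib/canaries.ts` API. The canary prefix is assembled at runtime so the source never contains a complete secret-looking literal for a static scanner to flag. Each payload is serialized with `JSON.stringify` before a substring search, so the check works the same way however the events were captured.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical minimal event shape; real RunEvents carry more fields.
interface RunEventLike {
  type: string;
  payload?: unknown;
}

// Build the canary at runtime: the prefix is joined from fragments so the
// source never contains a full secret-looking literal.
function makeCanary(): string {
  const prefix = ["sk", "canary"].join("-");
  return `${prefix}-${randomBytes(8).toString("hex")}`; // 8 random bytes, hex-encoded
}

// Return the type of every event whose serialized payload contains the canary.
function findCanaryLeaks(events: RunEventLike[], canary: string): string[] {
  return events
    .filter((e) => JSON.stringify(e.payload ?? null).includes(canary))
    .map((e) => e.type);
}
```

A scenario would plant the canary in `configurable.ai.credentialRef`, poll the run's events, and expect `findCanaryLeaks` to return an empty array. The same `includes` check applies to a 401 response body in the bearer-token case.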
-Current source tree: 308 scenario files. Use [`coverage.md`](./coverage.md) for current grade/gap tracking.
+Current source tree: 310 scenario files. Use [`coverage.md`](./coverage.md) for current grade/gap tracking.
 ## Remaining Gaps

package/coverage.md CHANGED

@@ -51,14 +51,14 @@
 | RFC 0041 §B replay-divergence-at-refusal behavioral (`version: 4`) | `replay-divergence-at-refusal.test.ts` (advertisement-shape + behavioral; 3 assertions PASS against workflow-engine when the `multiAgent.executionModel.version: 4` advertisement is enabled) | A (was `it.todo` until 2026-05-23 when the executor wiring landed — see commit `1fce55a` + `bba3b4a`. Behavioral assertions cover both divergence directions: original=valid + replay=refusal AND original=refusal + replay=valid) | Closes Track #4 of the 2026-05-22 multi-agent behavioral-harness close-out. Reference workflow-engine emits `replay.divergedAtRefusal` event + fails run with `error.code: 'replay_diverged_at_refusal'` when source vs replay envelope kinds differ at the same nodeId. Gated on `OPENWOP_MULTI_AGENT_EXECUTION_MODEL_PHASE_4=true` AND `run.forkMode === 'replay'`. Path-to-Accepted for RFC 0041: non-steward host advertises `multiAgent.executionModel.version: 4` end-to-end. |
 | Agent-manifest runtime floor (RFC 0070 — `capabilities.agents.manifestRuntime`) | `agent-manifest-runtime.test.ts` | B (capability-gated; lists ≥1 installed manifest agent + dispatches one with attributed `agent.reasoned`+`agent.decided` events, plus a §F sub-threshold-escalation assertion) | RFC 0070 filed Draft 2026-05-26. Gated on `capabilities.agents.manifestRuntime.supported` + the host dispatch seam (`POST /v1/host/sample/agents/{agentId}/dispatch`); soft-skips when either is absent. The reference **workflow-engine** host advertises `manifestRuntime: { supported: true, handoffValidation: true }`, loads pack `agents[]` (RFC 0003 `installAgents`) into an AgentRegistry at boot, and dispatches end-to-end (toolAllowlist-filtered per RFC 0002 §A14, handoff-validated per RFC 0003 §D, confidence-escalating per §F) — see `apps/workflow-engine/backend/typescript/test/agent-dispatch-route.test.ts` (6 HTTP assertions, incl. the normative inventory). **RFC 0072 (`Draft`):** the scenario's inventory leg now drives the NORMATIVE `GET /v1/agents` (§A) so it runs black-box against any conformant host; the dispatch leg stays on the sample seam (soft-skips off-steward) pending the executor-integration tier. RFC 0072 §C `peerDependenciesMeta` disposition + `degraded[]` are unit-tested in `agent-loader.test.ts`. Path to `Active → Accepted` (RFC 0070): a non-steward host advertises `manifestRuntime` + serves `GET /v1/agents`. |
 | Live manifest dispatch (RFC 0077 — `capabilities.agents.liveRuntime`) | `agent-live-runtime-shape.test.ts` | A (always-on, server-free shape probe) | RFC 0077 promoted Draft → Active 2026-05-29 (5 UQs resolved via MyndHyve T4 co-design). Always-on shape probe asserts `capabilities.agents.liveRuntime` (+ `supported`/`structuredOutput`/`confidenceEscalation`/`sources` sub-flags) is declared, the `agentInvocationStarted`/`agentInvocationCompleted` payloads validate conforming content-free records + reject malformed ones (`started` missing `source`; `completed` out-of-enum `outcome`), and both event names appear in the RunEventType enum. `liveRuntime` ⊃ `manifestRuntime`. **Behavioral scenarios deferred** per RFC 0077 §Conformance (reference host): the started→completed bracket ordering, `structuredOutput` enforcement, and `toolAllowlist` enforcement gate on `capabilities.agents.liveRuntime.supported` + a live-invoke seam and soft-skip until a host wires it. Path to `Accepted`: a non-steward host advertises `liveRuntime` + emits the invocation pair (net-new MyndHyve T4 work, queued behind §B). |
-| Agent evaluation (RFC 0081 — `capabilities.agents.evalSuite`) | `agent-eval-suite-shape.test.ts` | A (always-on, server-free shape probe; doubles as the public test for `eval-summary-no-content-leak`) | RFC 0081 promoted Draft → Active 2026-05-30. Always-on shape probe asserts `capabilities.agents.evalSuite` (+ `supported`/`modes` sub-flags) is declared; the `AgentEvalSuite` + `EvalSummary` schemas compile + round-trip a conforming artifact and reject malformed ones (bad `suiteId` infix; `passScore` out of 0..1; out-of-range `aggregateScore`); the `eval.started`/`eval.scored`/`eval.completed` payloads validate content-free records + reject malformed ones; and all three event names appear in the RunEventType enum. The **content-free negatives** (an `EvalSummary` task entry carrying a `taskOutput` body; a `safetyFinding` carrying an `excerpt`) are the public test for protocol-tier SECURITY invariant `eval-summary-no-content-leak`. **Behavioral scenarios deferred** per RFC 0081 §Conformance (reference host): `agent-eval-run.test.ts` (the `eval.started`→per-task `eval.scored`→`eval.completed` ordering, the `EvalSummary` round-trip, the `mode: "eval"` 501 on unadvertised hosts) gates on `capabilities.agents.evalSuite.supported` + the eval-run seam and soft-skips until a host wires the eval projection. Path to `Accepted`: a host advertises `evalSuite` + runs a golden/regression suite end-to-end (the `GET /v1/runs/{runId}/eval-summary` endpoint + SDK helper land with it). |
+| Agent evaluation (RFC 0081 — `capabilities.agents.evalSuite`) | `agent-eval-suite-shape.test.ts` | A (always-on, server-free shape probe; doubles as the public test for `eval-summary-no-content-leak`) | RFC 0081 promoted Draft → Active 2026-05-30. Always-on shape probe asserts `capabilities.agents.evalSuite` (+ `supported`/`modes` sub-flags) is declared; the `AgentEvalSuite` + `EvalSummary` schemas compile + round-trip a conforming artifact and reject malformed ones (bad `suiteId` infix; `passScore` out of 0..1; out-of-range `aggregateScore`); the `eval.started`/`eval.scored`/`eval.completed` payloads validate content-free records + reject malformed ones; and all three event names appear in the RunEventType enum. The **content-free negatives** (an `EvalSummary` task entry carrying a `taskOutput` body; a `safetyFinding` carrying an `excerpt`) are the public test for protocol-tier SECURITY invariant `eval-summary-no-content-leak`. **Behavioral scenario authored + gated** (2026-05-31; see §"Capability-gated scenarios"): `agent-eval-run.test.ts` (the `eval.started`→per-task `eval.scored`→`eval.completed` ordering, the content-free `eval.scored` legs, and the NORMATIVE `GET /v1/runs/{runId}/eval-summary` schema-valid round-trip) gates on `capabilities.agents.evalSuite.supported` + the `POST /v1/host/sample/agents/eval-run` seam (`behaviorGate('openwop-eval-run', …)`) and soft-skips until a host wires the eval projection. Path to `Accepted`: a host advertises `evalSuite` + runs a golden/regression suite end-to-end (the `GET /v1/runs/{runId}/eval-summary` endpoint + SDK helper already landed). |
 | Memory capability model (RFC 0080 — `spec/v1/agent-memory.md` §"Memory capability model", `spec/v1/profiles.md` §`openwop-memory`) | `memory-capability-model-shape.test.ts` | A (always-on, server-free shape probe) | RFC 0080 promoted Draft → Active 2026-05-30 (4 UQs resolved via MyndHyve review). Always-on shape probe asserts the additive `capabilities.memory.{writable,search,retention}` dimensions are declared (existing `supported`/`compaction`/`distillation`/`attribution` untouched), `memory.search`/`memory.retention` validate conforming instances + reject malformed ones (`retention.ttl` non-boolean; out-of-enum `search.modes`; unknown property under `additionalProperties:false`), `agent-inventory-response` declares `memoryDegraded` + the closed-enum `degradedMemoryDimensions` (the eight §A dimension names), and `deriveProfiles` surfaces `openwop-memory` for a read/write + long-term host while withholding it from a `writable:false` host. **Behavioral scenario deferred** per RFC 0080 §Conformance: `memory-degraded-projection.test.ts` (a live `GET /v1/agents` stamping `memoryDegraded` when an agent's `memoryShape` exceeds the host's reconciled model) gates on `agents.manifestRuntime` + `memory` and soft-skips until a reference host computes it. Path to `Accepted`: a host computes the §C degraded projection + the scenario passes against it. |
 | Portable tool catalog (RFC 0078 — `spec/v1/tool-catalog.md`) | `tool-descriptor-shape.test.ts` | A (always-on, server-free shape probe) | RFC 0078 promoted Draft → Active 2026-05-30 (4 UQs resolved via MyndHyve review). Always-on shape probe asserts `tool-descriptor.schema.json` round-trips a conforming `ToolDescriptor` + rejects the malformed (`safetyTier`-required, `additionalProperties:false`), enforces the §C-1/§F-4 cross-field MUST (`safetyTier:"exec"` ⇒ `source:"host-extension"`, RFC 0069 — an `exec`+`node-pack` descriptor is rejected), asserts the `capabilities.toolCatalog` `supported`/`sources`/`sessionLifecycle` shape, and validates the two content-free `tool.session.{opened,closed}` payloads (incl. the closed `outcome` enum) + their RunEventType-enum membership. **Behavioral scenarios deferred** per RFC 0078 §Conformance: `tool-catalog-projection.test.ts` (the authorization-scoped `GET /v1/tools` + `404` non-disclosure) + `tool-session-lifecycle.test.ts` (the `tool.session.*` bracket ordering) gate on `capabilities.toolCatalog.supported` + `sessionLifecycle` and soft-skip until a reference host serves the catalog. Path to `Accepted`: a host projects ≥1 tool source at `GET /v1/tools` + the projection scenario passes. |
 | Credential provenance + egress policy (RFC 0079 — `spec/v1/host-capabilities.md` §"Credential provenance + egress policy") | `egress-provenance-shape.test.ts` | A (always-on, server-free shape probe; doubles as the public test for `egress-decision-no-secret-leak`) | RFC 0079 promoted Draft → Active 2026-05-30 (4 UQs resolved via MyndHyve review). Always-on shape probe asserts `credential-provenance.schema.json` round-trips a conforming `CredentialProvenance` + rejects `audiences:[]` / missing `credentialId` / unknown property, the descriptor + `egress.decided` declare NO secret-value property (the content-free **`egress-decision-no-secret-leak`** protocol-tier invariant), the `egress.decided` payload validates a content-free record + enforces the `decision` enum + required `decision`/`destination`, and `capabilities.httpClient.egressPolicy` is declared. **The behavioral audience-binding MUST-NOT (`egress-credential-audience-bound`) is reference-impl tier** at Draft→Active — a credential bound to audience A on an egress to B must be `denied`/`downgraded` (never `allowed`-with-credential), fail-closed on unevaluable provenance — and lands in the gated `egress-audience-binding.test.ts` + `egress-decision-content-free.test.ts` (soft-skip until a host wires `egressPolicy` over `safeFetch`). Path to `Accepted`: a reference host enforces §C + the binding scenario passes → `egress-credential-audience-bound` graduates protocol-tier (RFC 0035 precedent). |
 | Durable trigger + channel bridge (RFC 0083 — `spec/v1/trigger-bridge.md`, `spec/v1/profiles.md` §`openwop-trigger-bridge`) | `trigger-bridge-shape.test.ts` | A (always-on, server-free shape probe) | RFC 0083 promoted Draft → Active 2026-05-30 (5 UQs resolved via MyndHyve review). Always-on shape probe asserts `trigger-subscription.schema.json` round-trips a conforming `TriggerSubscription` + rejects missing-`state`/out-of-enum-`source`/unknown-property, the four-state vocab (`active`/`paused`/`failed`/`dead-lettered`) is stable, the two content-free `trigger.{subscription.state.changed,delivery.attempted}` payloads validate + enforce the `state`/`outcome` enums + RunEventType-enum membership, `capabilities.triggerBridge` + `webhooks.durable` are declared, and `deriveProfiles` surfaces `openwop-trigger-bridge` for bridge+sink+durable-source while withholding it with no dead-letter sink. **Behavioral scenario deferred** per RFC 0083 §Conformance: `trigger-bridge-delivery.test.ts` (dedup → retry → dead-letter → trigger→run causation) is profile-gated on `openwop-trigger-bridge` and soft-skips until a reference host wires durable delivery. Path to `Accepted`: a host wires the state machine + delivery loop + the scenario passes. |
 | Budget, quota + cost policy (RFC 0084 — `spec/v1/budget-policy.md`) | `budget-policy-shape.test.ts` | A (always-on, server-free shape probe; doubles as the public test for `budget-no-pricing-leak`) | RFC 0084 promoted Draft → Active 2026-05-30 (5 UQs resolved via MyndHyve review). Always-on shape probe asserts `budget-policy.schema.json` round-trips a conforming `BudgetPolicy` + enforces the §A/§E orthogonality guard (a wall-time field is rejected — that's RFC 0058's `runTimeoutMs`) + threshold/onExhaustion negatives, the four content-free `budget.{reserved,consumed,threshold.crossed,exhausted}` payloads validate + enforce the `dimension`/`scope` enums, the four `cap.breached{budget-tokens,budget-cost,budget-tool-calls,budget-retries}` kinds + the four `budget.*` RunEventType-enum entries are present, the payloads declare no pricing/credential property (the **`budget-no-pricing-leak`** protocol-tier invariant), and `capabilities.budget` + `limits.maxBudget{Tokens,CostUsd}` are declared. **Behavioral scenario deferred** per RFC 0084 §Conformance: `budget-enforcement.test.ts` (accrue → threshold → exhaust → `cap.breached{budget-cost}` → `run.failed{budget_exhausted}`; `budget_model_denied`; advisory no-stop) gates on `budget.supported` + `enforce` and soft-skips until a reference host wires accounting. Path to `Accepted`: a host wires the budget accumulator + the exhaustion stop + the scenario passes. |
 | `openwop-agent-platform` meta-profile (RFC 0085 — `spec/v1/agent-platform-profile.md`, operational annex) | `agent-platform-profile.test.ts` | A (always-on, server-free derivation probe) | RFC 0085 promoted Draft → Active 2026-05-30 (5 UQs resolved via MyndHyve review). Operational annex (NOT a closed `profiles.md` catalog predicate). Always-on derivation probe asserts `isAgentPlatformPartial`/`isAgentPlatformFull`/`agentPlatformStatus` derive `none`/`partial`/`full` correctly: all-floor ⇒ partial, a missing floor flag ⇒ none, the replay-OR-`nondeterminismPolicy.declared` term, floor+governance ⇒ full, a missing governance term (tenant `installScope`) ⇒ partial-not-full (the honest-advertisement rule), and that the eval/deploy/budget platform-plus tier is advisory (a full host without them is still full); plus `capabilities.nondeterminismPolicy.declared` is declared. **Live aggregate-evidence assertion deferred** per RFC 0085 §C: when a host claims `full`, the meta-scenario must assert every required constituent scenario is in its passing set — naturally gated on a reference host reaching partial/full (Postgres is the candidate). Path to `Accepted`: ≥1 host reports `partial`/`full` backed by the aggregate + renders the badge. |
-| Agent deployment lifecycle (RFC 0082 — `capabilities.agents.deployment`) | `agent-deployment-shape.test.ts` | A (always-on, server-free shape probe; doubles as the public test for `deployment-event-no-content-leak`) | RFC 0082 promoted Draft → Active 2026-05-30. Always-on shape probe asserts `capabilities.agents.deployment` (+ `supported`/`channels`/`canary`/`rollback`/`states` sub-flags); the `AgentDeployment` record compiles + round-trips and rejects malformed ones (out-of-enum `state`; `canaryPercent` out of 0..100); the **`AgentRef` `channel` XOR `version`** rule (each alone + neither validate; both rejected by the `not` clause, §A); the four `deployment.*` payloads validate content-free records + reject malformed ones; `agent.invocation.started` carries the additive recorded-fact `resolvedAgentVersion`/`resolvedChannel` (§B channel pin); and all four event names appear in the RunEventType enum. The **content-free negatives** (a `deployment.promoted` carrying a `manifestBody`; a `deployment.state.changed` carrying a `prompt`) are the public test for protocol-tier SECURITY invariant `deployment-event-no-content-leak`. The behavioral `deployment-promotion-fail-closed` invariant is `reference-impl` tier until the behavioral scenario lands (then graduates to protocol; RFC 0035 precedent). **Behavioral scenarios deferred** per RFC 0082 §Conformance (reference host): `agent-deployment-lifecycle.test.ts` (authz → approvalGate → eval-verify → `deployment.promoted`; the fail-closed denial; the §B replay re-read of `resolvedAgentVersion`) gates on `capabilities.agents.deployment.supported` + the deployment-store seam and soft-skips until a host wires it. Path to `Accepted`: a host implements the deployment store + canary router + the `POST /v1/agents/{agentId}/deployments` promotion contract (the endpoint + SDK helper land with it). |
+| Agent deployment lifecycle (RFC 0082 — `capabilities.agents.deployment`) | `agent-deployment-shape.test.ts` | A (always-on, server-free shape probe; doubles as the public test for `deployment-event-no-content-leak`) | RFC 0082 promoted Draft → Active 2026-05-30. Always-on shape probe asserts `capabilities.agents.deployment` (+ `supported`/`channels`/`canary`/`rollback`/`states` sub-flags); the `AgentDeployment` record compiles + round-trips and rejects malformed ones (out-of-enum `state`; `canaryPercent` out of 0..100); the **`AgentRef` `channel` XOR `version`** rule (each alone + neither validate; both rejected by the `not` clause, §A); the four `deployment.*` payloads validate content-free records + reject malformed ones; `agent.invocation.started` carries the additive recorded-fact `resolvedAgentVersion`/`resolvedChannel` (§B channel pin); and all four event names appear in the RunEventType enum. The **content-free negatives** (a `deployment.promoted` carrying a `manifestBody`; a `deployment.state.changed` carrying a `prompt`) are the public test for protocol-tier SECURITY invariant `deployment-event-no-content-leak`. The behavioral `deployment-promotion-fail-closed` invariant is `reference-impl` tier until the behavioral scenario lands (then graduates to protocol; RFC 0035 precedent). **Behavioral scenario authored + gated** (2026-05-31; see §"Capability-gated scenarios"): `agent-deployment-lifecycle.test.ts` (the §E authz → approvalGate → eval-verify → `deployment.promoted` promotion + the record round-trip, the fail-closed denial, the `eval_gate_unmet` denial, and the §B `resolvedAgentVersion` channel pin) gates on `capabilities.agents.deployment.supported` + the `POST /v1/host/sample/agents/deployment-transition` seam (`behaviorGate('openwop-deployment-lifecycle', …)`) and soft-skips until a host wires it. When it passes against a host, the `deployment-promotion-fail-closed` invariant graduates reference-impl → protocol tier. Path to `Accepted`: a host implements the deployment store + canary router + the `POST /v1/agents/{agentId}/deployments` promotion contract (the endpoint + SDK helper already landed). |
 | Standing agent roster (RFC 0086 — `capabilities.agents.roster`) | `agent-roster-shape.test.ts` | A (always-on, server-free shape probe; doubles as the public test for `roster-attribution-no-content`) | RFC 0086 promoted Draft → Active 2026-05-30. Always-on shape probe asserts `capabilities.agents.roster` (+ `supported`/`installScope`/`portfolioTriggerSources` sub-flags); the `AgentRosterEntry` record compiles + round-trips and rejects malformed ones (a non-`host:` `rosterId`; an `agentRef` carrying BOTH `version` and `channel` — the RFC 0082 §A XOR rule; a missing `rosterId`); the `roster.run.initiated` payload validates a content-free attribution record + requires its ids/persona/triggerSource; the `AgentInventoryEntry` carries the additive optional `roster` portfolio projection (§B); and `roster.run.initiated` appears in the RunEventType enum. The **content-free negatives** (a `roster.run.initiated` carrying a `body`; one carrying a `prompt`) are the public test for protocol-tier SECURITY invariant `roster-attribution-no-content`. **Behavioral scenarios deferred** per RFC 0086 §Conformance (reference host): a scheduled portfolio fire emitting `roster.run.initiated` before `agent.invocation.started`; the RFC 0083 work-item causation chain; the replay re-read; the cross-tenant `GET /v1/agents/roster/{id}` 404 — gate on `capabilities.agents.roster.supported` + the roster-store seam and soft-skip until a host wires it (the host-extension at `/v1/host/sample/roster` + board attribution, apps/workflow-engine #368, is the reference demonstration). Path to `Accepted`: a non-steward host advertises `agents.roster` + emits `roster.run.initiated`. |
 | Agent org-chart (RFC 0087 — `capabilities.agents.orgChart`) | `agent-org-chart-shape.test.ts` | A (always-on, server-free shape probe; doubles as the public test for `org-position-no-authority-escalation`) | RFC 0087 promoted Draft → Active 2026-05-30. Always-on shape probe asserts `capabilities.agents.orgChart` (+ `supported`/`installScope`/`departmentNesting`/`responsibilityView` sub-flags); the `AgentOrgChart` record compiles + round-trips and rejects malformed ones (a non-`host:` member `rosterId`; a chart missing `members`). The **§B structural non-authority guarantee**: the schema **rejects** an authority-bearing field on a member (`scopes`/`canDispatch`/`permissions`/`authority` — every object is `additionalProperties:false`), and a conforming member's key set is exactly `{rosterId, departmentId, roleId, reportsTo}`. These are the public test for protocol-tier SECURITY invariant `org-position-no-authority-escalation` (an org edge confers no authority — position describes, never authorizes). NO new RunEventType (the org-chart is structure + a read, not an event surface). **Behavioral scenarios deferred** per RFC 0087 §Conformance (reference host): the live-dispatch refusal of a manager's tool over-reach; an RFC 0049 decision invariant to org position; the cross-tenant `GET /v1/agents/org-chart` 404; the §D responsibility roll-up over live roster portfolios — gate on `capabilities.agents.orgChart.supported` + the org-store seam and soft-skip until a host wires it (the host-extension at `/v1/host/sample/org-chart`, apps/workflow-engine #371, is the reference demonstration). Path to `Accepted`: a non-steward host advertises `agents.orgChart` + passes the behavioral non-authority scenario. |
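The RFC 0082 row calls the `AgentRef` pin rule an XOR. A ref carrying neither `channel` nor `version` also validates, though, so the shape probe is really checking "at most one of the two". Below is a sketch of that predicate, along with one plausible JSON Schema spelling of the `not` clause. `AgentRefLike`, `agentRefPinIsValid` and `agentRefPinClause` are hypothetical names. The schema fragment is an assumption about how the rule could be expressed, not a quote from the published schema.

```typescript
// Hypothetical subset of an AgentRef: only the fields the pin rule concerns.
interface AgentRefLike {
  agentId: string;
  version?: string; // pin an exact manifest version
  channel?: string; // or follow a deployment channel
}

// "At most one" check: version alone, channel alone, or neither is accepted;
// carrying both is rejected.
function agentRefPinIsValid(ref: AgentRefLike): boolean {
  const hasVersion = ref.version !== undefined;
  const hasChannel = ref.channel !== undefined;
  return !(hasVersion && hasChannel);
}

// One way to express the same rule in JSON Schema: an instance that has
// BOTH properties satisfies the inner schema, so `not` rejects it.
const agentRefPinClause = {
  not: { required: ["channel", "version"] },
} as const;
```

The same "not both" shape reappears in the RFC 0086 roster row, which rejects an `agentRef` carrying both `version` and `channel`.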
@@ -111,6 +111,8 @@ Thirty-one scenario groups validate optional profiles where the host's discovery
 | `agent-org-chart-scoping.test.ts` | `capabilities.agents.orgChart.supported` (RFC 0087 §A/§C/§D, `agent-org-chart.md`) | A (the normative `GET /v1/agents/org-chart` tree-shape — acyclic `parentDepartmentId` tree + `host:<id>` member rosterIds — + the §D `GET /v1/agents/org-chart/{departmentId}` responsibility roll-up with a deduped `responsibilities[]` union + `recursive=false` shape-stability + the RFC 0074 cross-tenant 404 via `OPENWOP_CROSS_TENANT_ORG_CHART_DEPARTMENT_ID`) | `host-pending` | `behaviorGate('openwop-org-chart-scoping', …)`. Black-box on the normative path (no POST seam); soft-skips on 404 / when the cross-tenant env var is unset. **Part of the RFC 0087 → Accepted bar.** First adopter: MyndHyve `agents.orgChart`. |
 | `org-position-no-authority-escalation.test.ts` | `capabilities.agents.orgChart.supported` (RFC 0087 §B) + `SECURITY/invariants.yaml` `org-position-no-authority-escalation` | A (behavioral leg of the protocol-tier invariant — the live org-chart wire carries NO authority-bearing field (`scopes`/`canDispatch`/`permissions`/`authority`/`roleGrants`/`capabilities`) on any member / department / responsibility-view object, proving the host's projector strips position-as-authority at every install scope) | `host-pending` | `behaviorGate('openwop-org-position-no-authority', …)`. Black-box on the normative path; soft-skips on 404. The STRUCTURAL leg (schema rejects an authority field) stays always-on in `agent-org-chart-shape.test.ts`; the deeper RFC 0049/0051 authority-invariance legs stay reference-impl tier (a non-normative authz-decide hook would be required — the `agent-manifest-runtime` precedent). **Part of the RFC 0087 → Accepted bar.** |
 | `trigger-bridge-delivery.test.ts` | `openwop-trigger-bridge` profile (RFC 0083 §C/§D, `trigger-bridge.md`; derived from discovery — bridge + dead-letter sink + durable source) | A (the §C delivery model via `POST /v1/host/sample/trigger-bridge/deliver` + the test event-log seam: dedup→effectively-once `trigger.delivery.attempted{delivered}` (§C-1), retry-exhaustion→`{dead-lettered}` + `trigger.subscription.state.changed{toState:dead-lettered}` (§C-2 + RFC 0053), and the delivered run's `run.started.causationId` == the delivery id (§C / RFC 0040); both `trigger.*` events content-free) | `host-pending` | `behaviorGate('openwop-trigger-bridge', …)`. Profile-gated; seam-gated; soft-skips on 404. Normative `GET /v1/trigger-subscriptions` read runs black-box. **This is the RFC 0083 → Accepted bar.** First adopter: MyndHyve `triggerBridge`. |
+| `agent-eval-run.test.ts` | `capabilities.agents.evalSuite.supported` (RFC 0081 §B/§C, `agent-evaluation.md`) | A (the §C `mode:"eval"` projection via `POST /v1/host/sample/agents/eval-run` + the test event-log seam: `eval.started`-first → one `eval.scored` per task → `eval.completed`-once ordering (`count == eval.completed.taskCount`), every `eval.scored` content-free + `score` ∈ 0..1 backing `eval-summary-no-content-leak`; plus the NORMATIVE `GET /v1/runs/{runId}/eval-summary` returning a schema-valid `EvalSummary` with `passedCount <= taskCount` and no per-task output body) | `host-pending` | `behaviorGate('openwop-eval-run', …)`. Seam-gated; soft-skips on 404. The normative eval-summary read runs black-box on any `evalSuite` host. **This is the RFC 0081 → Accepted bar.** First adopter: MyndHyve `agents.evalSuite`. |
+| `agent-deployment-lifecycle.test.ts` | `capabilities.agents.deployment.supported` (RFC 0082 §B/§E, `agent-deployment.md`) + `SECURITY/invariants.yaml` `deployment-promotion-fail-closed` | A (the §E promotion contract via `POST /v1/host/sample/agents/deployment-transition` + the test event-log seam: a `promote` runs authorize (RFC 0049) → approvalGate (RFC 0051) → eval-verify (RFC 0081) → content-free `deployment.promoted` with a seven-state `toState` + `toVersion`, the returned record validating `agent-deployment.schema.json`; an `unauthorized` principal fails closed — `allowed:false`, NO `deployment.promoted` (the behavioral leg of `deployment-promotion-fail-closed`); an `eval-gate-unmet` promote is denied `eval_gate_unmet` with NO `deployment.promoted` (§E-3); and a `channel-pin` run records `resolvedAgentVersion` on `agent.invocation.started` (§B)) | `host-pending` | `behaviorGate('openwop-deployment-lifecycle', …)`. Seam-gated; soft-skips on 404. The normative `GET /v1/agents/{agentId}/deployments` read runs black-box on any `deployment` host. **This is the RFC 0082 → Accepted bar** (the `deployment-promotion-fail-closed` invariant graduates reference-impl → protocol tier when this passes against a host). First adopter: MyndHyve `agents.deployment`. |
 | `approval-gate-events.test.ts` | `approval.granted` / `.rejected` / `.overridden` (RFC 0051 §B, `interrupt-profiles.md` §approvalGate) | Server-free (event-payload schema validity: required fields incl. mandatory `overridden.reason`; additionalProperties:false negatives) | host-pass (server-free) | Always runs; no host needed. |
 | `approval-gate-flow.test.ts` | `core.openwop.governance.approvalGate` (RFC 0051 §A) + `capabilities.authorization` (RFC 0049) | A (capability-gated on `authorization.supported`; unauthorized-principal-denied + override-audited via the `governance/approval-gate` seam) | `host-pending` | Behavioral probe soft-skips on 404. Grant/reject-loopback/quorum scenarios deferred until a governance host wires the seam. |
 | `scheduling-capability-shape.test.ts` | `capabilities.scheduling` (RFC 0052 §A, `host-capabilities.md` §host.scheduling) | A (advertisement shape always — `supported` boolean; `cron`/`delayed`/`calendar` booleans; `maxFutureHorizon` ISO-8601 duration) | `host-pending` | Always runs; asserts the block is absent or well-formed. |
@@ -171,9 +173,9 @@ Every OpenAPI operation should have:
 | `getRunAncestry` | `cross-host-ancestry-endpoint.test.ts`, `cross-host-causation-shape.test.ts` (RFC 0040 §C); capability-gated on `capabilities.multiAgent.executionModel.crossHostCausation.ancestryEndpointSupported` | Unadvertised-host 404 path + top-level `parent: null` shape covered | Add positive multi-hop traversal once a reference host implements end-to-end cross-host composition. |
 | `listAgents` | `agent-manifest-runtime.test.ts` (RFC 0072 §A); capability-gated on `capabilities.agents.manifestRuntime.supported` | Normative `GET /v1/agents` inventory leg — lists ≥1 installed manifest agent; soft-skips (404) when unadvertised | Black-box across hosts; dispatch leg via sample seam pending the executor-integration tier. |
 | `getAgent` | `agent-manifest-runtime.test.ts` (RFC 0072 §A) + `agent-dispatch-route.test.ts` (reference host) | One manifest agent's inventory entry + 404 for unknown | Covered against the workflow-engine reference host. |
-| `getEvalSummary` | `agent-eval-suite-shape.test.ts` (RFC 0081 — schema/shape of the returned `EvalSummary`); capability-gated on `capabilities.agents.evalSuite.supported` | Wire shape + the content-free invariant covered always-on; the live `GET /v1/runs/{runId}/eval-summary` round-trip is the behavioral `agent-eval-run.test.ts` deferred to `Active → Accepted` (reference host wires the eval projection). | Add the live 200/404/409 path once a host runs a `mode:"eval"` suite end-to-end. |
-| `listAgentDeployments` | `agent-deployment-shape.test.ts` (RFC 0082 — the `AgentDeployment` record shape the array returns); capability-gated on `capabilities.agents.deployment.supported` | Record shape covered always-on; the live list is the behavioral `agent-deployment-lifecycle.test.ts` deferred to `Active → Accepted` (reference host wires the deployment store). | Add the live 200/404 path once a host implements the deployment store. |
-| `transitionAgentDeployment` | `agent-deployment-shape.test.ts` (RFC 0082 — the `agent-deployment-transition` request + `deployment.*` event shapes) + `deployment-event-no-content-leak` public test; capability-gated on `capabilities.agents.deployment.supported` | Request/record/event shapes + content-free negatives covered always-on; the live authz→gate→eval-verify→`deployment.promoted` path is the behavioral `agent-deployment-lifecycle.test.ts` deferred to `Active → Accepted`. | Add the live fail-closed `403` / `eval_gate_unmet` / `no_active_deployment` assertions once a host wires the promotion contract. |
+| `getEvalSummary` | `agent-eval-suite-shape.test.ts` (RFC 0081 — schema/shape of the returned `EvalSummary`) + the now-authored behavioral `agent-eval-run.test.ts` (the live `GET /v1/runs/{runId}/eval-summary` schema-valid round-trip); capability-gated on `capabilities.agents.evalSuite.supported` | Wire shape + the content-free invariant covered always-on; the live round-trip is `agent-eval-run.test.ts`, `host-pending` until a host advertises `evalSuite` + wires the `POST /v1/host/sample/agents/eval-run` seam. | First adopter: MyndHyve `agents.evalSuite`. |
+| `listAgentDeployments` | `agent-deployment-shape.test.ts` (RFC 0082 — the `AgentDeployment` record shape the array returns) + the now-authored behavioral `agent-deployment-lifecycle.test.ts` (the live `GET /v1/agents/{agentId}/deployments` black-box read); capability-gated on `capabilities.agents.deployment.supported` | Record shape covered always-on; the live list is `agent-deployment-lifecycle.test.ts`, `host-pending` until a host advertises `deployment` + wires the deployment-transition seam. | First adopter: MyndHyve `agents.deployment`. |
+| `transitionAgentDeployment` | `agent-deployment-shape.test.ts` (RFC 0082 — the `agent-deployment-transition` request + `deployment.*` event shapes) + `deployment-event-no-content-leak` public test + the now-authored behavioral `agent-deployment-lifecycle.test.ts` (the live authz→gate→eval-verify→`deployment.promoted` + fail-closed + `eval_gate_unmet` legs via the seam); capability-gated on `capabilities.agents.deployment.supported` | Request/record/event shapes + content-free negatives covered always-on; the live promotion contract is `agent-deployment-lifecycle.test.ts`, `host-pending` until a host wires the deployment store. | First adopter: MyndHyve `agents.deployment`. |
 | `listAgentRoster` | `agent-roster-shape.test.ts` (RFC 0086 §B — the `agent-roster-response` shape); capability-gated on `capabilities.agents.roster.supported` | Response shape covered always-on; the live `GET /v1/agents/roster` 200/404 + tenant-scoping is deferred to `Active → Accepted` (reference host serves the normative `/v1/agents/roster`, vs the sample-extension `/v1/host/sample/roster`). | Add the live path once a host serves the normative roster endpoint. |
 | `getAgentRosterEntry` | `agent-roster-shape.test.ts` (RFC 0086 §B — the `agent-roster-entry` shape); capability-gated on `capabilities.agents.roster.supported` | Entry shape covered always-on; the live 200/404 + cross-tenant-404 is deferred to `Active → Accepted`. | Add the live path once a host serves the normative endpoint. |
 | `getAgentOrgChart` | `agent-org-chart-shape.test.ts` (RFC 0087 §C — the `agent-org-chart` shape + the `org-position-no-authority-escalation` structural test); capability-gated on `capabilities.agents.orgChart.supported` | Chart shape + the no-authority structural guarantee covered always-on; the live `GET /v1/agents/org-chart` 200/404 + tenant-scoping is deferred to `Active → Accepted`. | Add the live path once a host serves the normative endpoint. |
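The new rows are all `behaviorGate(...)`-gated and "seam-gated; soft-skips on 404". A minimal sketch of that decision follows, assuming the semantics stated in the scenario headers: run when the capability is advertised, soft-skip by default when it is not, and hard-fail when `OPENWOP_REQUIRE_BEHAVIOR=true`. `shouldRunBehavior` and `isSeamUnwired` are hypothetical names; the shipped helper in `src/lib/behavior-gate.ts` may be implemented differently.

```typescript
// Hypothetical stand-in for the gate the `behaviorGate(...)` rows reference;
// it is NOT the package's src/lib/behavior-gate.ts.
type Env = Record<string, string | undefined>;

function shouldRunBehavior(scenario: string, advertised: boolean, env: Env = {}): boolean {
  // Capability advertised in discovery: the behavioral scenario runs.
  if (advertised) return true;
  // Strict mode: an unadvertised capability is a hard failure, not a skip.
  if (env['OPENWOP_REQUIRE_BEHAVIOR'] === 'true') {
    throw new Error(`${scenario}: capability unadvertised under OPENWOP_REQUIRE_BEHAVIOR=true`);
  }
  // Default: soft-skip.
  return false;
}

// Host-sample seams (e.g. POST /v1/host/sample/agents/eval-run) are treated as
// unwired on these statuses, mirroring driveEvalRun()/driveDeploymentTransition().
function isSeamUnwired(status: number): boolean {
  return status === 404 || status === 405;
}
```

A scenario would call something like `shouldRunBehavior('openwop-eval-run', cap?.supported === true, process.env)` before driving any seam, so a v1.0-only host passes unchanged unless strict mode is requested.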

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@openwop/openwop-conformance",
-  "version": "1.12.0",
+  "version": "1.13.0",
   "description": "Production-ready black-box conformance suite for OpenWOP v1.0 compliant servers.",
   "repository": {
     "type": "git",

package/src/lib/agentDeployment.ts ADDED

@@ -0,0 +1,117 @@
+/**
+ * Shared helpers for the RFC 0082 `agents.deployment` conformance scenario.
+ * Lives in lib/ (not a `*.test.ts`) so scenarios import it via
+ * `../lib/agentDeployment.js`.
+ *
+ * Two surfaces:
+ *   - the NORMATIVE read (`GET /v1/agents/{agentId}/deployments`, RFC 0082
+ *     §C/§E), exercised black-box; and
+ *   - the host-sample deployment-transition seam
+ *     (`POST /v1/host/sample/agents/deployment-transition`), used to drive the
+ *     §E promotion contract (authorize → approvalGate → eval-verify →
+ *     `deployment.promoted`), the fail-closed denial, the eval-gate-unmet
+ *     denial, and the §B channel-resolution pin, so the `deployment.*` events +
+ *     the `resolvedAgentVersion` recorded fact can be asserted against the test
+ *     event-log seam. The seam is OPTIONAL — scenarios soft-skip on 404/405
+ *     (the reference deployment store is deferred per RFC 0082 §Conformance).
+ *
+ * Gating uses the `agents.deployment.supported` capability flag from the live
+ * discovery doc (root-first per RFC 0073).
+ *
+ * @see RFCS/0082-agent-deployment-lifecycle.md
+ * @see spec/v1/agent-deployment.md
+ */
+import { driver } from './driver.js';
+import { readCapabilityFamily } from './discovery-capabilities.js';
+/** Reads `agents.deployment` from discovery (root-first per RFC 0073); null
+ *  when unadvertised. */
+export async function readDeploymentCap(): Promise<Record<string, unknown> | null> {
+  const agents = await readCapabilityFamily<{ deployment?: unknown }>('agents');
+  const dep = agents?.deployment;
+  return dep && typeof dep === 'object' ? (dep as Record<string, unknown>) : null;
+}
+export interface DeploymentRecord {
+  agentId?: string;
+  version?: string;
+  state?: string;
+  canaryPercent?: number;
+  channels?: string[];
+  [k: string]: unknown;
+}
+/** GET the NORMATIVE deployment-record list (RFC 0082 §C/§E
+ *  `GET /v1/agents/{agentId}/deployments`); null when the host doesn't serve it
+ *  (404/405/501). */
+export async function listDeployments(agentId: string): Promise<DeploymentRecord[] | null> {
+  const res = await driver.get(`/v1/agents/${encodeURIComponent(agentId)}/deployments`);
+  if (res.status === 404 || res.status === 405 || res.status === 501) return null;
+  return (res.json as DeploymentRecord[] | undefined) ?? [];
+}
+export interface TransitionResult {
+  runId?: string;
+  record?: DeploymentRecord;
+  /** Fail-closed signal: false when the principal lacks the `deploy:*` scope. */
+  allowed?: boolean;
+  /** A denial reason (e.g. `eval_gate_unmet`, `no_active_deployment`). */
+  error?: string;
+  /** §B — the channel→version pin recorded at first resolution. */
+  resolvedAgentVersion?: string;
+}
+/**
+ * Drive one deployment transition through the host-sample seam (RFC 0082 §E).
+ * `scenario`:
+ *   - `promote`         — authorize → gate → eval-verify (when `evalRunId` set)
+ *                         → emit `deployment.promoted` (§E).
+ *   - `unauthorized`    — a principal lacking `deploy:promote`; MUST fail closed
+ *                         (`allowed:false`, no `deployment.promoted`) — the
+ *                         `deployment-promotion-fail-closed` invariant.
+ *   - `eval-gate-unmet` — promote with an `evalRunId` whose `EvalSummary.passed`
+ *                         is false; MUST deny with `eval_gate_unmet` (§E-3).
+ *   - `channel-pin`     — start a `@channel`-bound run; the resolved version is
+ *                         recorded as `resolvedAgentVersion` on
+ *                         `agent.invocation.started` (§B).
+ * Returns null when the seam is unwired (404/405).
+ */
+export async function driveDeploymentTransition(
+  body: {
+    scenario: 'promote' | 'unauthorized' | 'eval-gate-unmet' | 'channel-pin';
+    agentId?: string;
+    version?: string;
+    channel?: string;
+    evalRunId?: string;
+  },
+): Promise<TransitionResult | null> {
+  const res = await driver.post('/v1/host/sample/agents/deployment-transition', body);
+  if (res.status === 404 || res.status === 405) return null;
+  return (res.json as TransitionResult | undefined) ?? {};
+}
+/** The seven-state lifecycle vocabulary (RFC 0082 §C). */
+export const DEPLOYMENT_STATES = [
+  'draft',
+  'test',
+  'staged',
+  'active',
+  'paused',
+  'deprecated',
+  'rolled-back',
+];
+/** Content keys a `deployment.*` event / record MUST NEVER carry (SECURITY
+ *  invariant `deployment-event-no-content-leak`, SR-1): manifest body, prompt,
+ *  or credential material. */
+export const DEPLOYMENT_CONTENT_FORBIDDEN = [
+  'manifestBody',
+  'manifest',
+  'prompt',
+  'systemPrompt',
+  'body',
+  'secret',
+  'credentials',
+  'token',
+  'apiKey',
+];
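The two exported lists above are what the lifecycle scenario checks a `deployment.promoted` payload against. As a sketch, the same checks can be restated as one pure function that returns violations instead of calling `expect`, which makes them easy to unit-test offline. `promotedViolations` is a hypothetical name; like the scenario's `expectContentFree`, it only inspects top-level keys.

```typescript
// Hypothetical pure restatement of leg 1 of agent-deployment-lifecycle.test.ts.
// Both lists copy DEPLOYMENT_STATES / DEPLOYMENT_CONTENT_FORBIDDEN above.
const DEPLOY_STATES: readonly string[] = [
  'draft', 'test', 'staged', 'active', 'paused', 'deprecated', 'rolled-back',
];
const DEPLOY_FORBIDDEN: readonly string[] = [
  'manifestBody', 'manifest', 'prompt', 'systemPrompt', 'body',
  'secret', 'credentials', 'token', 'apiKey',
];

function promotedViolations(payload: Record<string, unknown>): string[] {
  const out: string[] = [];
  // Content-free: no forbidden key may appear at the top level of the payload.
  for (const key of DEPLOY_FORBIDDEN) {
    if (key in payload) out.push(`content key present: ${key}`);
  }
  // toState must be one of the seven lifecycle states.
  const toState = payload['toState'];
  if (typeof toState !== 'string' || !DEPLOY_STATES.includes(toState)) {
    out.push('toState not in the seven-state vocabulary');
  }
  // toVersion must be a non-empty string.
  const toVersion = payload['toVersion'];
  if (typeof toVersion !== 'string' || toVersion.length === 0) {
    out.push('toVersion missing or empty');
  }
  return out;
}
```

For example, `{ toState: 'active', toVersion: '1.2.0' }` yields no violations, while a payload that carries a `prompt` key or a `toState` of `'live'` is flagged.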

package/src/lib/agentEval.ts ADDED

@@ -0,0 +1,83 @@
+/**
+ * Shared helpers for the RFC 0081 `agents.evalSuite` conformance scenario.
+ * Lives in lib/ (not a `*.test.ts`) so scenarios import it via
+ * `../lib/agentEval.js`.
+ *
+ * Two surfaces:
+ *   - the NORMATIVE read (`GET /v1/runs/{runId}/eval-summary`, RFC 0081 §C),
+ *     exercised black-box; and
+ *   - the host-sample eval-run seam (`POST /v1/host/sample/agents/eval-run`),
+ *     used to drive the §B `mode:"eval"` projection so the `eval.*` ordering +
+ *     the terminal `EvalSummary` can be asserted against the test event-log
+ *     seam. The seam is OPTIONAL — scenarios soft-skip on 404/405 (the
+ *     reference eval projection is deferred per RFC 0081 §Conformance).
+ *
+ * Gating uses the `agents.evalSuite.supported` capability flag from the live
+ * discovery doc (root-first per RFC 0073).
+ *
+ * @see RFCS/0081-agent-evaluation-and-scorecards.md
+ * @see spec/v1/agent-evaluation.md
+ */
+import { driver } from './driver.js';
+import { readCapabilityFamily } from './discovery-capabilities.js';
+/** Reads `agents.evalSuite` from discovery (root-first per RFC 0073); null when
+ *  unadvertised. */
+export async function readEvalSuiteCap(): Promise<Record<string, unknown> | null> {
+  const agents = await readCapabilityFamily<{ evalSuite?: unknown }>('agents');
+  const es = agents?.evalSuite;
+  return es && typeof es === 'object' ? (es as Record<string, unknown>) : null;
+}
+export interface EvalRunResult {
+  runId?: string;
+  suiteId?: string;
+  suiteVersion?: string;
+  taskCount?: number;
+  passed?: boolean;
+  aggregateScore?: number;
+}
+/**
+ * Drive one `mode:"eval"` projection through the host-sample eval-run seam
+ * (RFC 0081 §B). `body.modes` selects the eval modes (default golden); the host
+ * picks a default agent + a built-in golden suite when `agentId` is omitted.
+ * Returns null when the seam is unwired (404/405).
+ */
+export async function driveEvalRun(
+  body: { agentId?: string; modes?: string[]; taskCount?: number } = {},
+): Promise<EvalRunResult | null> {
+  const res = await driver.post('/v1/host/sample/agents/eval-run', body);
+  if (res.status === 404 || res.status === 405) return null;
+  return (res.json as EvalRunResult | undefined) ?? {};
+}
+/** GET the NORMATIVE eval scorecard (RFC 0081 §C
+ *  `GET /v1/runs/{runId}/eval-summary`); returns `{ status, summary }` so a
+ *  caller can distinguish a 404 (not-an-eval-run / unadvertised) or 409 (still
+ *  running) from a served summary. */
+export async function getEvalSummary(
+  runId: string,
+): Promise<{ status: number; summary: Record<string, unknown> | undefined }> {
+  const res = await driver.get(`/v1/runs/${encodeURIComponent(runId)}/eval-summary`);
+  return { status: res.status, summary: res.json as Record<string, unknown> | undefined };
+}
+/** The closed five-mode vocabulary (RFC 0081 §D). */
+export const EVAL_MODES = ['golden', 'rubric', 'adversarial', 'regression', 'live-shadow'];
+/** Content keys an `eval.*` event / `EvalSummary` task entry MUST NEVER carry
+ *  (SECURITY invariant `eval-summary-no-content-leak`, SR-1): task output,
+ *  rubric prose, model completion, prompt, or credential material. */
+export const EVAL_CONTENT_FORBIDDEN = [
+  'taskOutput',
+  'output',
+  'rubric',
+  'completion',
+  'prompt',
+  'body',
+  'secret',
+  'credentials',
+  'token',
+  'apiKey',
+];
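`getEvalSummary` deliberately returns the raw `{ status, summary }` pair so a caller can tell a 404 (not an eval run, or unadvertised) and a 409 (still running) apart from a served scorecard. A minimal sketch of that interpretation, combined with the two summary invariants the scenario asserts (`passedCount <= taskCount`, content-free `tasks[]`), might look like this. `interpretEvalSummary` and the `kind` labels are hypothetical, and the check is shallow like the scenario's.

```typescript
// Hypothetical interpretation of the getEvalSummary() result shape above.
type EvalSummaryRead = { status: number; summary: Record<string, unknown> | undefined };
type EvalSummaryOutcome =
  | { kind: 'served'; violations: string[] }
  | { kind: 'not-eval-or-unadvertised' }
  | { kind: 'still-running' }
  | { kind: 'unexpected'; status: number };

// Copies EVAL_CONTENT_FORBIDDEN above.
const EVAL_FORBIDDEN: readonly string[] = [
  'taskOutput', 'output', 'rubric', 'completion', 'prompt',
  'body', 'secret', 'credentials', 'token', 'apiKey',
];

function interpretEvalSummary(read: EvalSummaryRead): EvalSummaryOutcome {
  if (read.status === 404) return { kind: 'not-eval-or-unadvertised' };
  if (read.status === 409) return { kind: 'still-running' };
  if (read.status !== 200 || read.summary === undefined) {
    return { kind: 'unexpected', status: read.status };
  }
  const s = read.summary;
  const violations: string[] = [];
  // passedCount may never exceed taskCount.
  const passed = s['passedCount'];
  const total = s['taskCount'];
  if (typeof passed === 'number' && typeof total === 'number' && passed > total) {
    violations.push('passedCount exceeds taskCount');
  }
  // Each task entry must be content-free (top-level keys only).
  const tasks: unknown[] = Array.isArray(s['tasks']) ? s['tasks'] : [];
  tasks.forEach((t, i) => {
    if (t === null || typeof t !== 'object') return;
    for (const k of EVAL_FORBIDDEN) {
      if (k in t) violations.push(`tasks[${i}] carries ${k}`);
    }
  });
  return { kind: 'served', violations };
}
```

Keeping the status mapping separate from the invariant checks means a 409 can be retried later rather than being reported as a conformance failure.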

package/src/scenarios/agent-deployment-lifecycle.test.ts ADDED

@@ -0,0 +1,147 @@
+/**
+ * Agent deployment lifecycle — the §E promotion contract + §B channel pin
+ * (RFC 0082) — behavioral.
+ *
+ * Capability-gated on `agents.deployment.supported` (root-first per RFC 0073).
+ * Soft-skips when unadvertised (default) / hard-fails under
+ * `OPENWOP_REQUIRE_BEHAVIOR=true`. The always-on wire-shape coverage lives in
+ * `agent-deployment-shape.test.ts`; this asserts host BEHAVIOR via the
+ * `POST /v1/host/sample/agents/deployment-transition` seam + the test event-log
+ * seam + the NORMATIVE `GET /v1/agents/{agentId}/deployments` read:
+ *
+ *   1. PROMOTE (§E) — authorize → approvalGate → eval-verify → a content-free
+ *      `deployment.promoted` with `toState` in the seven-state vocabulary; the
+ *      returned record validates against `agent-deployment.schema.json`.
+ *   2. FAIL-CLOSED (§E-1, `deployment-promotion-fail-closed`) — a principal
+ *      lacking `deploy:promote` is denied (`allowed !== true`) and emits NO
+ *      `deployment.promoted`.
+ *   3. EVAL-GATE-UNMET (§E-3) — a promote whose `evalRunId` has `passed:false`
+ *      is denied with `eval_gate_unmet` and emits NO `deployment.promoted`.
+ *   4. CHANNEL PIN (§B) — a `@channel`-bound run records the resolved version as
+ *      `resolvedAgentVersion` on `agent.invocation.started` (the recorded fact a
+ *      replay re-reads rather than re-resolving).
+ *
+ * Each leg soft-skips independently (seam absent / event-log seam absent).
+ *
+ * Spec references:
+ *   - https://github.com/openwop/openwop/blob/main/spec/v1/agent-deployment.md (§B/§E)
+ *   - https://github.com/openwop/openwop/blob/main/RFCS/0082-agent-deployment-lifecycle.md
+ */
+import { describe, it, expect } from 'vitest';
+import { readFileSync } from 'node:fs';
+import { join } from 'node:path';
+import Ajv2020 from 'ajv/dist/2020.js';
+import addFormats from 'ajv-formats';
+import { driver } from '../lib/driver.js';
+import { behaviorGate } from '../lib/behavior-gate.js';
+import { SCHEMAS_DIR } from '../lib/paths.js';
+import {
+  readDeploymentCap,
+  driveDeploymentTransition,
+  DEPLOYMENT_STATES,
+  DEPLOYMENT_CONTENT_FORBIDDEN,
+} from '../lib/agentDeployment.js';
+import { queryTestEvents, isEventLogSeamAvailable, resetTestSeam } from '../lib/event-log-query.js';
+function loadSchema(name: string): Record<string, unknown> {
+  return JSON.parse(readFileSync(join(SCHEMAS_DIR, name), 'utf8')) as Record<string, unknown>;
+}
+function expectContentFree(payload: Record<string, unknown>, where: string): void {
+  for (const f of DEPLOYMENT_CONTENT_FORBIDDEN) {
+    expect(
+      !(f in payload),
+      driver.describe('RFC 0082 §D (deployment-event-no-content-leak)', `${where} MUST be content-free (no ${f})`),
+    ).toBe(true);
+  }
+}
+describe('agent-deployment-lifecycle (RFC 0082 §B/§E)', () => {
+  it('promotes via the eval+RBAC+approval gate, fails closed without scope/eval, and pins the channel version', async () => {
+    const cap = await readDeploymentCap();
+    if (!behaviorGate('openwop-deployment-lifecycle', cap?.supported === true)) return;
+    if (!(await isEventLogSeamAvailable())) return; // event-log seam absent — soft-skip
+    const ajv = new Ajv2020({ strict: false, allErrors: true });
+    addFormats(ajv);
+    const validateRecord = ajv.compile(loadSchema('agent-deployment.schema.json'));
+    // ---- Leg 1: eval+RBAC+approval-gated promotion (§E) ------------------
+    const promote = await driveDeploymentTransition({ scenario: 'promote' });
+    if (promote === null) return; // deployment seam unwired — soft-skip the whole behavioral suite
+    if (promote.record) {
+      expect(
+        validateRecord(promote.record),
+        driver.describe(
+          'agent-deployment.schema.json',
+          `a promoted deployment record MUST validate (${ajv.errorsText(validateRecord.errors)})`,
+        ),
+      ).toBe(true);
+    }
+    if (promote.runId) {
+      const pq = await queryTestEvents(promote.runId, { type: 'deployment.promoted' });
+      if (pq.ok) {
+        for (const e of pq.events) {
+          expectContentFree(e.payload, 'deployment.promoted');
+          expect(
+            typeof e.payload.toState === 'string' && DEPLOYMENT_STATES.includes(e.payload.toState as string),
+            driver.describe('run-event-payloads.schema.json#/$defs/deploymentPromoted', 'toState MUST be in the seven-state vocabulary'),
+          ).toBe(true);
+          expect(
+            typeof e.payload.toVersion === 'string' && (e.payload.toVersion as string).length > 0,
+            driver.describe('agent-deployment.md §D', 'deployment.promoted MUST carry the promoted toVersion'),
+          ).toBe(true);
+        }
+      }
+    }
+    // ---- Leg 2: fail-closed authz (§E-1; deployment-promotion-fail-closed) -
+    const unauth = await driveDeploymentTransition({ scenario: 'unauthorized' });
+    if (unauth && unauth.runId) {
+      expect(
+        unauth.allowed !== true,
+        driver.describe('agent-deployment.md §E-1', 'a principal without deploy:promote MUST be denied (fail-closed)'),
+      ).toBe(true);
+      const uq = await queryTestEvents(unauth.runId, { type: 'deployment.promoted' });
+      if (uq.ok) {
+        expect(
+          uq.events.length === 0,
+          driver.describe('SECURITY invariant deployment-promotion-fail-closed', 'a denied transition MUST emit NO deployment.promoted'),
+        ).toBe(true);
+      }
+    }
+    // ---- Leg 3: eval-gate-unmet denial (§E-3) ----------------------------
+    const evalUnmet = await driveDeploymentTransition({ scenario: 'eval-gate-unmet' });
+    if (evalUnmet && evalUnmet.runId) {
+      expect(
+        evalUnmet.error === 'eval_gate_unmet' || evalUnmet.allowed !== true,
+        driver.describe('agent-deployment.md §E-3', 'a promote whose eval evidence has passed:false MUST be denied (eval_gate_unmet)'),
+      ).toBe(true);
+      const eq = await queryTestEvents(evalUnmet.runId, { type: 'deployment.promoted' });
+      if (eq.ok) {
+        expect(
+          eq.events.length === 0,
+          driver.describe('agent-deployment.md §E-3', 'an unmet eval gate MUST emit NO deployment.promoted'),
+        ).toBe(true);
+      }
+    }
+    // ---- Leg 4: channel-resolution pin (§B) ------------------------------
+    const pin = await driveDeploymentTransition({ scenario: 'channel-pin', channel: 'stable' });
+    if (pin && pin.runId) {
+      const iq = await queryTestEvents(pin.runId, { type: 'agent.invocation.started' });
+      if (iq.ok && iq.events.length > 0) {
+        const started = iq.events.sort((a, b) => a.sequence - b.sequence)[0]!;
+        expect(
+          typeof started.payload.resolvedAgentVersion === 'string' && (started.payload.resolvedAgentVersion as string).length > 0,
+          driver.describe('agent-deployment.md §B', 'a @channel-bound run MUST record resolvedAgentVersion on agent.invocation.started (the recorded fact a replay re-reads)'),
+        ).toBe(true);
+      }
+    }
+    await resetTestSeam();
+  });
+});
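Legs 1 to 3 exercise one property from different angles: the §E stages (authorize, then approvalGate, then eval-verify) all have to pass before `deployment.promoted` is emitted, and any failing stage leaves the event stream empty. The following in-memory model is a sketch of that fail-closed ordering, not a host implementation. `promoteModel`, the `forbidden` and `approval_required` error codes, and the fixed `toState: 'active'` are assumptions; only `eval_gate_unmet` comes from the scenario.

```typescript
// Hypothetical fail-closed model of the RFC 0082 §E promotion order.
interface PromoteRequest {
  agentId: string;
  toVersion: string;
  scopes: string[];
  approved: boolean;    // simplification: the approval gate as a boolean
  evalPassed?: boolean; // undefined when no evalRunId was supplied
}
interface PromoteOutcome {
  allowed: boolean;
  error?: string;
  events: Array<{ type: string; payload: Record<string, unknown> }>;
}

function promoteModel(req: PromoteRequest): PromoteOutcome {
  // 1. authorize: missing deploy:promote denies before anything else runs.
  if (!req.scopes.includes('deploy:promote')) {
    return { allowed: false, error: 'forbidden', events: [] }; // hypothetical code
  }
  // 2. approvalGate: an unapproved transition is also denied, emitting nothing.
  if (!req.approved) {
    return { allowed: false, error: 'approval_required', events: [] }; // hypothetical code
  }
  // 3. eval-verify: failing eval evidence denies with eval_gate_unmet.
  if (req.evalPassed === false) {
    return { allowed: false, error: 'eval_gate_unmet', events: [] };
  }
  // 4. Only now is the content-free deployment.promoted emitted.
  return {
    allowed: true,
    events: [{
      type: 'deployment.promoted',
      payload: { agentId: req.agentId, toState: 'active', toVersion: req.toVersion },
    }],
  };
}
```

Because every denial returns before the event is constructed, the "NO `deployment.promoted`" assertions in legs 2 and 3 hold by construction in this model. A real host has to guarantee the same thing across its authz, approval, and eval services.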

package/src/scenarios/agent-eval-run.test.ts ADDED

@@ -0,0 +1,145 @@
+/**
+ * Agent eval-run — the `mode:"eval"` projection (RFC 0081 §B/§C) — behavioral.
+ *
+ * Capability-gated on `agents.evalSuite.supported` (root-first per RFC 0073).
+ * Soft-skips when unadvertised (default) / hard-fails under
+ * `OPENWOP_REQUIRE_BEHAVIOR=true`. The always-on wire-shape coverage lives in
+ * `agent-eval-suite-shape.test.ts`; this asserts host BEHAVIOR via the
+ * `POST /v1/host/sample/agents/eval-run` seam + the test event-log seam + the
+ * NORMATIVE `GET /v1/runs/{runId}/eval-summary` read:
+ *
+ *   1. ORDERING (§C) — an eval run emits `eval.started` FIRST, one `eval.scored`
+ *      per task, then `eval.completed` once (count == eval.completed.taskCount).
+ *   2. CONTENT-FREE (SR-1 / `eval-summary-no-content-leak`) — every `eval.scored`
+ *      carries scores / ids / scalars ONLY (never task output / rubric / prose);
+ *      `score` ∈ 0..1; `passed` is a boolean.
+ *   3. NORMATIVE SUMMARY (§C) — `GET /v1/runs/{runId}/eval-summary` returns a
+ *      schema-valid `EvalSummary` whose `passedCount <= taskCount` and whose
+ *      task entries carry no output body.
+ *
+ * Each leg soft-skips independently (seam absent / event-log seam absent).
+ *
+ * Spec references:
+ *   - https://github.com/openwop/openwop/blob/main/spec/v1/agent-evaluation.md (§B/§C)
+ *   - https://github.com/openwop/openwop/blob/main/RFCS/0081-agent-evaluation-and-scorecards.md
+ */
+import { describe, it, expect } from 'vitest';
+import { readFileSync } from 'node:fs';
+import { join } from 'node:path';
+import Ajv2020 from 'ajv/dist/2020.js';
+import addFormats from 'ajv-formats';
+import { driver } from '../lib/driver.js';
+import { behaviorGate } from '../lib/behavior-gate.js';
+import { SCHEMAS_DIR } from '../lib/paths.js';
+import {
+  readEvalSuiteCap,
+  driveEvalRun,
+  getEvalSummary,
+  EVAL_CONTENT_FORBIDDEN,
+} from '../lib/agentEval.js';
+import { queryTestEvents, isEventLogSeamAvailable, resetTestSeam } from '../lib/event-log-query.js';
+function loadSchema(name: string): Record<string, unknown> {
+  return JSON.parse(readFileSync(join(SCHEMAS_DIR, name), 'utf8')) as Record<string, unknown>;
+}
+function expectContentFree(payload: Record<string, unknown>, where: string): void {
+  for (const f of EVAL_CONTENT_FORBIDDEN) {
+    expect(
+      !(f in payload),
+      driver.describe('RFC 0081 §C (eval-summary-no-content-leak)', `${where} MUST be content-free (no ${f})`),
+    ).toBe(true);
+  }
+}
+describe('agent-eval-run (RFC 0081 §B/§C)', () => {
+  it('emits eval.started → per-task eval.scored → eval.completed and serves a content-free EvalSummary', async () => {
+    const cap = await readEvalSuiteCap();
+    if (!behaviorGate('openwop-eval-run', cap?.supported === true)) return;
+    if (!(await isEventLogSeamAvailable())) return; // event-log seam absent — soft-skip
+    const run = await driveEvalRun({ modes: ['golden'] });
+    if (run === null) return; // eval-run seam unwired — soft-skip the whole behavioral suite
+    if (!run.runId) return;
+    // ---- Legs 1+2: eval.* ordering + content-free (§C) -------------------
+    const startedQ = await queryTestEvents(run.runId, { type: 'eval.started' });
+    const scoredQ = await queryTestEvents(run.runId, { type: 'eval.scored' });
+    const completedQ = await queryTestEvents(run.runId, { type: 'eval.completed' });
+    if (startedQ.ok && scoredQ.ok && startedQ.events.length > 0) {
+      const started = startedQ.events.sort((a, b) => a.sequence - b.sequence)[0]!;
+      // eval.started precedes every eval.scored (§C ordering).
+      for (const s of scoredQ.events) {
+        expect(
+          started.sequence < s.sequence,
+          driver.describe('agent-evaluation.md §C', 'eval.started MUST precede every eval.scored'),
+        ).toBe(true);
+      }
+      if (completedQ.ok && completedQ.events.length > 0) {
+        const completed = completedQ.events.sort((a, b) => a.sequence - b.sequence)[completedQ.events.length - 1]!;
+        for (const s of scoredQ.events) {
+          expect(
+            s.sequence < completed.sequence,
+            driver.describe('agent-evaluation.md §C', 'every eval.scored MUST precede eval.completed'),
+          ).toBe(true);
+        }
+        // eval.scored is emitted once per task (count == eval.completed.taskCount).
+        if (typeof completed.payload.taskCount === 'number') {
+          expect(
+            scoredQ.events.length === completed.payload.taskCount,
+            driver.describe('agent-evaluation.md §C', 'one eval.scored per task (count == eval.completed.taskCount)'),
+          ).toBe(true);
+        }
+        expectContentFree(completed.payload, 'eval.completed');
+      }
+      // each eval.scored content-free + score ∈ 0..1, passed boolean.
+      for (const s of scoredQ.events) {
+        expectContentFree(s.payload, 'eval.scored');
+        expect(
+          typeof s.payload.score === 'number' && (s.payload.score as number) >= 0 && (s.payload.score as number) <= 1,
+          driver.describe('run-event-payloads.schema.json#/$defs/evalScored', 'eval.scored.score MUST be in 0..1'),
+        ).toBe(true);
+        expect(
+          typeof s.payload.passed === 'boolean',
+          driver.describe('run-event-payloads.schema.json#/$defs/evalScored', 'eval.scored.passed MUST be a boolean'),
+        ).toBe(true);
+      }
+      expectContentFree(started.payload, 'eval.started');
+    }
+    // ---- Leg 3: NORMATIVE EvalSummary read (§C) --------------------------
+    const { status, summary } = await getEvalSummary(run.runId);
+    if (status === 200 && summary) {
+      const ajv = new Ajv2020({ strict: false, allErrors: true });
+      addFormats(ajv);
+      const validate = ajv.compile(loadSchema('eval-summary.schema.json'));
+      expect(
+        validate(summary),
+        driver.describe(
+          'eval-summary.schema.json',
+          `GET /v1/runs/{runId}/eval-summary MUST return a schema-valid EvalSummary (${ajv.errorsText(validate.errors)})`,
+        ),
+      ).toBe(true);
+      const tasks = (summary.tasks as Array<Record<string, unknown>> | undefined) ?? [];
+      const passedCount = summary.passedCount as number | undefined;
+      const taskCount = summary.taskCount as number | undefined;
+      if (typeof passedCount === 'number' && typeof taskCount === 'number') {
+        expect(
+          passedCount <= taskCount,
+          driver.describe('agent-evaluation.md §C', 'EvalSummary.passedCount MUST NOT exceed taskCount'),
+        ).toBe(true);
+      }
+      for (const t of tasks) {
+        expectContentFree(t, 'EvalSummary.tasks[]');
+      }
+    }
+    await resetTestSeam();
+  });
+});
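The §C ordering that legs 1 and 2 check against the event-log seam can also be expressed as a single pure function over a run's events, which is useful when testing a host's projector offline. This is a sketch, assuming events carry the `type`, `sequence`, and `payload` fields the scenario reads. It additionally requires exactly one `eval.completed`, matching the "`eval.completed` once" wording in the scenario header. `evalOrderingViolations` is a hypothetical name.

```typescript
// Hypothetical offline check of the RFC 0081 §C eval.* ordering.
interface RunEvent {
  type: string;
  sequence: number;
  payload: Record<string, unknown>;
}

function evalOrderingViolations(events: RunEvent[]): string[] {
  // Consider only eval.* events, in sequence order (filter() copies, so the
  // caller's array is not mutated by sort()).
  const evs = events.filter((e) => e.type.startsWith('eval.')).sort((a, b) => a.sequence - b.sequence);
  if (evs.length === 0) return ['no eval.* events'];
  const out: string[] = [];
  if (evs[0]!.type !== 'eval.started') out.push('eval.started is not first');
  const completed = evs.filter((e) => e.type === 'eval.completed');
  if (completed.length !== 1) {
    out.push(`expected exactly one eval.completed, got ${completed.length}`);
  } else if (evs[evs.length - 1]!.type !== 'eval.completed') {
    out.push('eval.completed is not last');
  }
  // One eval.scored per task: count must equal eval.completed.taskCount.
  const scored = evs.filter((e) => e.type === 'eval.scored').length;
  const taskCount = completed[0]?.payload['taskCount'];
  if (typeof taskCount === 'number' && scored !== taskCount) {
    out.push(`${scored} eval.scored vs taskCount ${taskCount}`);
  }
  return out;
}
```

A well-formed two-task run (`eval.started`, two `eval.scored`, then `eval.completed` with `taskCount: 2`) returns an empty list.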