npm - @openwop/openwop-conformance - Versions diffs - 1.4.0 → 1.5.0 - Mend

@openwop/openwop-conformance 1.4.0 → 1.5.0


Files changed (13)

package/CHANGELOG.md +41 -0
package/coverage.md +7 -7
package/package.json +1 -1
package/schemas/capabilities.schema.json +9 -0
package/src/scenarios/multi-agent-confidence-escalation.test.ts +46 -9
package/src/scenarios/sandbox-capability-gate-respected.test.ts +9 -13
package/src/scenarios/sandbox-memory-cap.test.ts +5 -8
package/src/scenarios/sandbox-no-cross-pack-mutation.test.ts +8 -13
package/src/scenarios/sandbox-no-host-env-leak.test.ts +10 -21
package/src/scenarios/sandbox-no-host-fs-escape.test.ts +12 -15
package/src/scenarios/sandbox-no-host-process-escape.test.ts +14 -13
package/src/scenarios/sandbox-no-network-escape.test.ts +10 -31
package/src/scenarios/sandbox-timeout-cap.test.ts +5 -8

package/CHANGELOG.md CHANGED

@@ -1,5 +1,46 @@
 # `@openwop/openwop-conformance` Changelog
+## [1.5.0] — 2026-05-22
+Minor bump per `PUBLISHING.md` §"Versioning alignment" — unblocks MyndHyve's RFC 0044 + RFC 0039 Half A co-graduation by shipping the relaxed + RFC-0044-routing assertion logic in `multi-agent-confidence-escalation.test.ts`. No new scenario files; no new fixtures. Behavioral honesty pass on 8 sandbox scenarios + schema additions for the new RFC 0044 capability advertisement.
+### Changed — RFC 0044 interrupt-kind routing in `multi-agent-confidence-escalation.test.ts`
+Previously the scenario asserted `expect(terminal.status).toBe('waiting-clarification')` — strict equality on the clarify-kind escalation path, which rejected even RFC 0039 §A's own escalate-approval path (→ `waiting-approval`). v1.5.0 ships the relaxed + RFC-0044-routing logic landed in upstream commits `f03d01d` (relaxation to accept both canonical statuses) + `641d088` (RFC 0044 vendor-kind routing):
+- **Canonical kind advertised** (`clarification` / `approval`) → strict `expect(terminal.status).toBe('waiting-clarification' | 'waiting-approval')`.
+- **Vendor kind advertised** (`x-host-<host>-<kind>` per `host-extensions.md` §"Canonical prefixes") → `expect(terminal.status.startsWith('waiting-')).toBe(true)`; the host's `interrupt.md` mapping determines the suffix.
+- **No advertisement** → fall back to the canonical either-status check (preserves the `f03d01d` relaxation).
+This unblocks MyndHyve's `confidenceEscalationInterruptKind: 'x-host-myndhyve-low-confidence'` advertisement (their entrenched `interrupt.kind: 'low-confidence'` → `waiting-approval` mapping) without forcing a cross-cutting rename of `LOW_CONFIDENCE_SUSPEND_REASON` + `mockAgent.node` + `escalationThreshold.ts` + downstream UI consumers. See RFC 0044 §B (`RFCS/0044-confidence-escalation-interrupt-kind-advertisement.md`) for the normative contract.
+### Changed — Sandbox scenarios converted vacuous `expect(true).toBe(true)` to `it.todo` (honesty pass)
+The 8 `sandbox-*.test.ts` scenarios in v1.4.0 carried `expect(true).toBe(true)` tautology assertions for their behavioral legs. v1.5.0 converts them to `it.todo()` per upstream commit `5864a2f`:
+- `sandbox-no-host-process-escape.test.ts`
+- `sandbox-no-network-escape.test.ts`
+- `sandbox-no-host-fs-escape.test.ts`
+- `sandbox-no-host-env-leak.test.ts`
+- `sandbox-timeout-cap.test.ts`
+- `sandbox-memory-cap.test.ts`
+- `sandbox-no-cross-pack-mutation.test.ts`
+- `sandbox-capability-gate-respected.test.ts`
+Test reporters now surface 8 todos instead of 8 vacuous passes. The advertisement-shape probes (in `sandbox-no-host-fs-escape`, `sandbox-memory-cap`, `sandbox-timeout-cap`) still run real discovery-doc assertions when capabilities are advertised. Behavioral assertions light up when a sandbox-executing reference host wires the seam.
+### Changed — `schemas/capabilities.schema.json` (vendored): adds `multiAgent.executionModel.confidenceEscalationInterruptKind`
+Per RFC 0044 §A. The optional field accepts the canonical literal `"clarification"` / `"approval"` OR a vendor extension matching `^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$` per `host-extensions.md` §"Canonical prefixes". Required for the routing logic above; absent advertisement falls back to the canonical-status check.
+### No new scenario files
+Scenario file count unchanged at 205 (the v1.4.0 baseline). All changes are behavior modifications to existing files.
+### Known limits — unchanged from v1.4.0
+The 6 `it.todo` behavioral assertions across RFC 0034 OTel-seam-gated, RFC 0040 traceparent-propagation, RFC 0041 refusal-divergence + observable-sequence scenarios remain. The 8 sandbox `it.todo` assertions are new in v1.5.0 (replacing the v1.4.0 vacuous-pass shapes).
 ## [1.4.0] — 2026-05-22
 Minor bump per `PUBLISHING.md` §"Versioning alignment" — bundles 45 new conformance scenarios + 23 new fixtures landing since the 1.3.0 publish (2026-05-19). Unblocks non-steward host adoption of RFCs 0027 + 0028 + 0029 + 0030 + 0031 + 0032 + 0033 + 0034 + 0035 + 0036 + 0037 + 0039 + 0040 + 0041 against a single suite version.
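The three routing branches in the 1.5.0 changelog entry can be condensed into a pure function. This is a sketch, not code from the package: `expectationFor` and `EscalationExpectation` are hypothetical names. The vendor-kind regex and the status strings are the ones the scenario itself uses.

```typescript
// Vendor-extension pattern, as used by the scenario and the vendored schema.
const VENDOR_KIND = /^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$/;

interface EscalationExpectation {
  branch: 'canonical' | 'vendor' | 'fallback';
  // Returns true when the polled terminal status satisfies the branch.
  accepts: (status: string) => boolean;
}

function expectationFor(advertisedKind: unknown): EscalationExpectation {
  if (advertisedKind === 'clarification' || advertisedKind === 'approval') {
    // Canonical kind: maps 1:1 onto a single waiting-* status.
    const expected =
      advertisedKind === 'clarification' ? 'waiting-clarification' : 'waiting-approval';
    return { branch: 'canonical', accepts: (status) => status === expected };
  }
  if (typeof advertisedKind === 'string' && VENDOR_KIND.test(advertisedKind)) {
    // Vendor kind: any waiting-* status; the host's interrupt.md mapping picks the suffix.
    return { branch: 'vendor', accepts: (status) => status.startsWith('waiting-') };
  }
  // No recognized advertisement: either canonical status is accepted.
  return {
    branch: 'fallback',
    accepts: (status) => status === 'waiting-clarification' || status === 'waiting-approval',
  };
}

const myndhyve = expectationFor('x-host-myndhyve-low-confidence');
console.log(myndhyve.branch, myndhyve.accepts('waiting-approval')); // vendor true
```

One consequence worth noticing: a value that matches neither branch, such as a bare `low-confidence`, is not flagged by the scenario; it silently takes the fallback branch. Validating the host's discovery document against the vendored `capabilities.schema.json` would reject such a value.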

package/coverage.md CHANGED

@@ -1,6 +1,6 @@
 # OpenWOP Conformance Coverage Map
-> **Status: Living document. Updated 2026-05-11.** This map connects the current scenario files to the protocol surfaces they protect and records the remaining gaps from the protocol deep dive. Scenario names are source-of-truth file names under `conformance/src/scenarios/`.
+> **Status: Living document. Updated 2026-05-22.** This map connects the current scenario files to the protocol surfaces they protect and records the remaining gaps from the protocol deep dive. Scenario names are source-of-truth file names under `conformance/src/scenarios/`.
 > **Shape grade vs behavior grade.** Some optional-profile scenarios validate **capability shape** (the host's discovery advertisement is well-formed) without yet exercising **behavior** (the host actually implements the profile end-to-end). The "Current grade" column reflects shape; see §"Capability-gated scenarios: shape vs behavior" below for the dual-grade view and the `OPENWOP_REQUIRE_BEHAVIOR=true` strict-mode runner flag.
@@ -124,12 +124,12 @@ Every OpenAPI operation should have:
 | `getArtifact` | Indirect through approval payload fixtures | `route-coverage.test.ts` covers unknown artifact `404`/`403` envelope; `artifact-auth.test.ts` (CF-4 close-out 2026-05-15; SQLite host 401-before-404 stub landed 2026-05-19, closes the info-leak surface for every HTTP method) covers `401` unauthenticated path | Negative paths covered (401 + 405 non-GET + 404/403) | Add positive artifact-read scenario once a reference host implements `getArtifact` end-to-end. |
 | `registerWebhook` | Webhook spec exists | `route-coverage.test.ts` covers invalid URL validation envelope | Add positive registration with a test receiver when harness support exists. |
 | `unregisterWebhook` | Webhook spec exists | `route-coverage.test.ts` covers unknown subscription behavior | Add full register-then-unregister roundtrip with a test receiver. |
-| `listPromptTemplates` | `prompt-template-shape.test.ts` covers schema shape + advertisement contract for `capabilities.prompts.*`; capability-gated behavioral list-with-filter scenarios deferred to RFC 0028 acceptance gate | n/a yet — endpoint surface in spec only (RFC 0028 Draft); reference host hasn't implemented the route yet | Add positive list-with-filter scenario + auth-failure + invalid-cursor scenarios once a reference host implements the route. |
-| `createPromptTemplate` | None — endpoint surface in spec only (RFC 0028 Draft) | n/a yet | Add positive create + `409` duplicate + `501` not-mutable-library + auth/scope scenarios once a reference host implements the route. |
-| `getPromptTemplate` | None — endpoint surface in spec only | n/a yet | Add positive fetch + `404` unknown + `400` ambiguous-libraryId + `ETag` revalidation scenarios. |
-| `updatePromptTemplate` | None — endpoint surface in spec only | n/a yet | Add positive update + `409` non-monotonic-version + `403` pack-sourced-readonly + `501` not-mutable-library scenarios. |
-| `deletePromptTemplate` | None — endpoint surface in spec only | n/a yet | Add positive delete + `403` pack-sourced-readonly + `404` unknown + `501` not-mutable-library scenarios. |
-| `renderPromptTemplate` | `prompt-composed-secret-redaction.test.ts` + `prompt-composed-trust-marker.test.ts` exercise the compose pipeline via the `/v1/host/sample/prompt/compose` host-extension seam; capability-gated. The spec'd `POST /v1/prompts:render` endpoint shares the same composition pipeline (RFC 0028 §A deterministic-render invariant matches RFC 0027 §F replay invariant). | Composition redaction + trust-marker invariants covered via the seam | Add positive `:render` via the spec'd endpoint + `400 prompt_variable_unresolved` + `404 template_not_found` once a reference host implements the route. |
+| `listPromptTemplates` | `prompt-template-shape.test.ts` + `prompt-list-and-fetch.test.ts` cover schema shape + advertisement contract + list/get contract for `capabilities.prompts.*` against the reference workflow-engine (RFC 0028 `Active` — endpoints live under `apps/workflow-engine/backend/typescript/src/routes/prompts.ts`) | Behavioral list + advertisement-shape covered | Add cross-host list-with-filter parity scenario when a second host advertises `endpointsSupported: true`. |
+| `createPromptTemplate` | `prompt-mutable-lifecycle.test.ts` covers CRUD lifecycle against the reference workflow-engine (gated on `mutableLibrary: true`); user-source POST succeeds, pack + host-built-in templates return 403 | Positive create + readonly-source 403 path covered | Add explicit `409` duplicate-id scenario + auth/scope matrix scenarios. |
+| `getPromptTemplate` | `prompt-list-and-fetch.test.ts` covers positive fetch + ambiguous-libraryId + ETag honoring when host advertises it | Positive fetch + 404 + ETag covered | Good — minor gap is the `400 ambiguous_template_id` cross-library disambiguation matrix. |
+| `updatePromptTemplate` | `prompt-mutable-lifecycle.test.ts` covers positive update + non-monotonic-version conflict + pack-sourced-readonly 403 against the reference workflow-engine | Positive update + 403 readonly-source + 409 conflict covered | Add `501` not-mutable-library negative for hosts that advertise `mutableLibrary: false`. |
+| `deletePromptTemplate` | `prompt-mutable-lifecycle.test.ts` covers positive delete + pack-sourced-readonly 403 against the reference workflow-engine | Positive delete + 403 readonly-source covered | Add `501` not-mutable-library negative + `404` unknown-template scenarios. |
+| `renderPromptTemplate` | `prompt-render-deterministic.test.ts` exercises `POST /v1/prompts:render` end-to-end against the reference workflow-engine; deterministic-hash invariant verified across `:render` + `prompt.composed` event paths. `prompt-composed-secret-redaction.test.ts` + `prompt-composed-trust-marker.test.ts` exercise the shared compose pipeline via the `/v1/host/sample/prompt/compose` seam | Deterministic render + composition redaction + trust-marker invariants covered | Add `400 prompt_variable_unresolved` matrix for missing variables across all four PromptKinds. |
 ---

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@openwop/openwop-conformance",
-  "version": "1.4.0",
+  "version": "1.5.0",
   "description": "Production-ready black-box conformance suite for OpenWOP v1.0 compliant servers.",
   "repository": {
     "type": "git",

package/schemas/capabilities.schema.json CHANGED

@@ -415,6 +415,15 @@
               "maximum": 1.0,
               "description": "RFC 0039 §A. Operator-declared confidence floor at or above the spec floor of 0.5; when an OrchestratorDecision carries `confidence` below this floor, the host MUST escalate via a `clarify` or `escalate` interrupt instead of executing the decision. Absent: the spec floor 0.5 applies. Values < 0.5 are non-conformant; values > 1.0 are nonsense. Applies only when `version >= 2`."
             },
+            "confidenceEscalationInterruptKind": {
+              "type": "string",
+              "anyOf": [
+                { "const": "clarification" },
+                { "const": "approval" },
+                { "pattern": "^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$" }
+              ],
+              "description": "RFC 0044 — the literal `interrupt.kind` the host emits when escalating a below-floor confidence decision per RFC 0039 §A. `clarification` and `approval` are the canonical values matching the clarify-OR-approval choice in RFC 0039 §A; vendor-extension kinds use the canonical host-extension namespace `^x-host-<host>-<kind>$` per `spec/v1/host-extensions.md` §\"Canonical prefixes\". When advertised, hosts MUST emit an interrupt of the advertised kind on every confidence-escalation event AND the host's downstream `interrupt.md` mapping determines the `waiting-*` terminal status. Absent: conformance assumes the host uses one of the two canonical kinds (the relaxed assertion accepts either). Hosts using vendor kinds MUST also publish a non-normative kind-mapping document per RFC 0044 §C."
+            },
             "crossChildMemoryConcurrency": {
               "type": "string",
               "enum": ["strict", "advisory"],
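A dependency-free type guard mirroring the new `anyOf` might look like the following. It is an illustration under assumptions: `isConfidenceEscalationInterruptKind` and `readAdvertisedKind` are hypothetical helpers, and a JSON Schema validator run against the vendored schema remains the authoritative check.

```typescript
type CanonicalInterruptKind = 'clarification' | 'approval';
type VendorInterruptKind = `x-host-${string}-${string}`;
type ConfidenceEscalationInterruptKind = CanonicalInterruptKind | VendorInterruptKind;

// Same pattern as the schema's third anyOf branch.
const VENDOR_KIND_PATTERN = /^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$/;

function isConfidenceEscalationInterruptKind(
  value: unknown,
): value is ConfidenceEscalationInterruptKind {
  if (typeof value !== 'string') return false; // "type": "string"
  if (value === 'clarification' || value === 'approval') return true; // the two const branches
  return VENDOR_KIND_PATTERN.test(value); // the pattern branch
}

// Walks a discovery document to the advertised field; undefined when absent or invalid.
function readAdvertisedKind(discovery: unknown): ConfidenceEscalationInterruptKind | undefined {
  const value = (
    discovery as {
      capabilities?: {
        multiAgent?: { executionModel?: { confidenceEscalationInterruptKind?: unknown } };
      };
    } | null
  )?.capabilities?.multiAgent?.executionModel?.confidenceEscalationInterruptKind;
  return isConfidenceEscalationInterruptKind(value) ? value : undefined;
}

console.log(isConfidenceEscalationInterruptKind('x-host-myndhyve-low-confidence')); // true
console.log(isConfidenceEscalationInterruptKind('x-host-myndhyve')); // false
```

The pattern only requires at least two hyphen-separated segments after `x-host-`; it does not mark where the host name ends and the kind begins, so the schema alone cannot tell that the host in `x-host-myndhyve-low-confidence` is `myndhyve`.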

package/src/scenarios/multi-agent-confidence-escalation.test.ts CHANGED

@@ -63,6 +63,7 @@ interface DiscoveryDoc {
         supported?: unknown;
         version?: unknown;
         confidenceEscalationFloor?: unknown;
+        confidenceEscalationInterruptKind?: unknown;
       };
     };
   };
@@ -111,15 +112,51 @@ describe.skipIf(BEHAVIORAL_SKIP)('multi-agent-confidence-escalation: behavioral
     const terminal = await pollUntilTerminal(runId);
     // Phase 2 escalation suspends the parent — NOT a terminal `completed`.
     // The conformance pollUntilTerminal returns when the run reaches any
-    // settled status; we expect `waiting-clarification` or equivalent
-    // non-completed status carrying an open clarification interrupt.
-    expect(
-      terminal.status,
-      driver.describe(
-        'RFCS/0039-multi-agent-confidence-and-memory-lifecycle.md §A + spec/v1/interrupt.md',
-        'a host emitting `interrupt.kind: "clarification"` MUST surface the run as `waiting-clarification` per spec/v1/interrupt.md §"Interrupt kinds"; low-confidence decision MUST NOT reach `completed` because no dispatch fired',
-      ),
-    ).toBe('waiting-clarification');
+    // settled status. RFC 0039 §A gives hosts a choice: clarify-kind
+    // escalation (→ waiting-clarification) OR escalate-kind approval
+    // (→ waiting-approval).
+    //
+    // RFC 0044 routing: when the host advertises
+    // `capabilities.multiAgent.executionModel.confidenceEscalationInterruptKind`
+    // the scenario derives the expected terminal-status from that advertisement
+    // (canonical kinds map 1:1 to waiting-clarification / waiting-approval per
+    // `interrupt.md`; vendor `x-host-<host>-<kind>` kinds accept any waiting-*
+    // status — the host's own interrupt.md mapping determines the suffix).
+    // When the host does NOT advertise the field, fall back to the canonical
+    // either-status check.
+    const advertisedKind = d?.capabilities?.multiAgent?.executionModel?.confidenceEscalationInterruptKind;
+    const isVendorKind = typeof advertisedKind === 'string' && /^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$/.test(advertisedKind);
+    const isCanonicalKind = advertisedKind === 'clarification' || advertisedKind === 'approval';
+    if (isCanonicalKind) {
+      const expectedStatus = advertisedKind === 'clarification' ? 'waiting-clarification' : 'waiting-approval';
+      expect(
+        terminal.status,
+        driver.describe(
+          'RFCS/0044-confidence-escalation-interrupt-kind-advertisement.md §B',
+          `host advertising confidenceEscalationInterruptKind: "${advertisedKind}" MUST surface the run as "${expectedStatus}" per spec/v1/interrupt.md §"Interrupt kinds"`,
+        ),
+      ).toBe(expectedStatus);
+    } else if (isVendorKind) {
+      const status = terminal.status as string;
+      expect(
+        typeof status === 'string' && status.startsWith('waiting-'),
+        driver.describe(
+          'RFCS/0044-confidence-escalation-interrupt-kind-advertisement.md §B',
+          `host advertising vendor confidenceEscalationInterruptKind ("${advertisedKind}") MUST surface the run as a waiting-* status; the suffix is determined by the host's interrupt.md mapping (see the host's vendor-extensions doc per RFC 0044 §C)`,
+        ),
+      ).toBe(true);
+    } else {
+      // No advertisement — fall back to the canonical either-status check.
+      const acceptedStatuses = ['waiting-clarification', 'waiting-approval'];
+      expect(
+        acceptedStatuses.includes(terminal.status as string),
+        driver.describe(
+          'RFCS/0039-multi-agent-confidence-and-memory-lifecycle.md §A + spec/v1/interrupt.md',
+          'a host below the confidence floor MUST surface the run as `waiting-clarification` (clarify-kind escalation) OR `waiting-approval` (escalate-kind escalation) per RFC 0039 §A; the low-confidence decision MUST NOT reach `completed` because no dispatch fired',
+        ),
+      ).toBe(true);
+    }
     const eventsRes = await driver.get(`/v1/runs/${encodeURIComponent(runId)}/events`);
     expect(eventsRes.status).toBe(200);

package/src/scenarios/sandbox-capability-gate-respected.test.ts CHANGED

@@ -13,19 +13,15 @@
  * @see SECURITY/invariants.yaml node-pack-sandbox-capability-gate-respected
  */
-import { describe, it, expect } from 'vitest';
-import { driver } from '../lib/driver.js';
+import { describe, it } from 'vitest';
-const HTTP_SKIP = !process.env.OPENWOP_BASE_URL;
-interface D { capabilities?: { sandbox?: { supported?: unknown } } }
-async function ok(): Promise<boolean> { try { const r = await driver.get('/.well-known/openwop'); return r.status === 200 && (r.json as D).capabilities?.sandbox?.supported === true; } catch { return false; } }
+// Behavioral assertion lands when the misbehaving-capability-gate typeId
+// ships + a host advertises `capabilities.sandbox.supported: true`.
+// Expected: error.code === 'sandbox_capability_denied';
+// details.requestedCapability is set to the disallowed identifier.
+// Surfaced as `todo` so test reporters track the gap rather than
+// reporting a vacuous PASS.
-describe.skipIf(HTTP_SKIP)('sandbox-capability-gate-respected: behavioral (RFC 0035 §B)', () => {
-  it('a misbehaving pack calling an undeclared host capability fails closed with sandbox_capability_denied', async () => {
-    if (!(await ok())) return;
-    // Behavioral assertion lands when the misbehaving-capability-gate typeId
-    // is available. Expected: error.code === 'sandbox_capability_denied';
-    // details.requestedCapability is set to the disallowed identifier.
-    expect(true).toBe(true);
-  });
+describe('sandbox-capability-gate-respected: behavioral (RFC 0035 §B)', () => {
+  it.todo('a misbehaving pack calling an undeclared host capability fails closed with sandbox_capability_denied');
 });
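The expected failure described in the comment reduces to a check on the error envelope. In this sketch the `{ error: { code, details } }` shape is assumed from the `response.error.*` paths quoted in the fs-escape scenario, and `isCapabilityDenied` is a hypothetical helper.

```typescript
// Assumed envelope shape, modeled on the `response.error.code` /
// `response.error.details.*` paths quoted in the sandbox scenario comments.
interface SandboxErrorEnvelope {
  error?: { code?: unknown; details?: Record<string, unknown> };
}

// True when the host failed closed with sandbox_capability_denied and named the
// disallowed capability. If the caller knows which capability the misbehaving
// pack requested, the reported identifier must match it exactly.
function isCapabilityDenied(res: SandboxErrorEnvelope, expectedCapability?: string): boolean {
  if (res.error?.code !== 'sandbox_capability_denied') return false;
  const requested = res.error?.details?.requestedCapability;
  if (typeof requested !== 'string' || requested.length === 0) return false;
  return expectedCapability === undefined || requested === expectedCapability;
}

const denied = {
  error: { code: 'sandbox_capability_denied', details: { requestedCapability: 'host.fetch' } },
};
console.log(isCapabilityDenied(denied, 'host.fetch')); // true
```

Passing `'host.fetch'` as `expectedCapability` gives the assertion described for `sandbox-no-network-escape`.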

package/src/scenarios/sandbox-memory-cap.test.ts CHANGED

@@ -50,12 +50,9 @@ describe.skipIf(HTTP_SKIP)('sandbox-memory-cap: capability shape + behavioral (R
     ).toBe(true);
   });
-  it('a misbehaving pack allocating beyond memoryLimitBytes fails with sandbox_memory_exceeded', async () => {
-    const sb = await readSandbox();
-    if (!sb || sb.memoryLimitBytes === undefined) return; // soft-skip
-    // Behavioral assertion lands when the misbehaving-memory-cap typeId is
-    // available. Expected: error.code === 'sandbox_memory_exceeded';
-    // details.requestedBytes > memoryLimitBytes.
-    expect(true).toBe(true);
-  });
+  // Behavioral assertion lands when the misbehaving-memory-cap typeId is
+  // available. Expected: error.code === 'sandbox_memory_exceeded';
+  // details.requestedBytes > memoryLimitBytes. Surfaced as `todo` so
+  // test reporters track the gap rather than reporting a vacuous PASS.
+  it.todo('a misbehaving pack allocating beyond memoryLimitBytes fails with sandbox_memory_exceeded');
 });
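The memory-cap expectation, and the `sandbox_timeout` / `details.elapsedMs > wallClockLimitMs` expectation in `sandbox-timeout-cap.test.ts`, share one shape: a specific error code plus a measured value above the advertised limit. A sketch of a shared check follows; the envelope shape, the helper name `capEnforced`, and the example limits are assumptions.

```typescript
interface SandboxErrorEnvelope {
  error?: { code?: unknown; details?: Record<string, unknown> };
}

// True when the host failed with `expectedCode` AND reported a numeric
// measurement under `details[measuredKey]` that exceeds the advertised limit.
function capEnforced(
  res: SandboxErrorEnvelope,
  expectedCode: string,
  measuredKey: string,
  advertisedLimit: number,
): boolean {
  if (res.error?.code !== expectedCode) return false;
  const measured = res.error?.details?.[measuredKey];
  return typeof measured === 'number' && measured > advertisedLimit;
}

// Illustrative limits; a real run would use the memoryLimitBytes /
// wallClockLimitMs values the scenarios' readSandbox() reads from the host.
const memoryLimitBytes = 64 * 1024 * 1024;
const wallClockLimitMs = 5000;

console.log(
  capEnforced(
    { error: { code: 'sandbox_memory_exceeded', details: { requestedBytes: 80 * 1024 * 1024 } } },
    'sandbox_memory_exceeded', 'requestedBytes', memoryLimitBytes,
  ),
); // true
console.log(
  capEnforced(
    { error: { code: 'sandbox_timeout', details: { elapsedMs: 4000 } } },
    'sandbox_timeout', 'elapsedMs', wallClockLimitMs,
  ),
); // false: the reported elapsed time is under the limit
```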

package/src/scenarios/sandbox-no-cross-pack-mutation.test.ts CHANGED

@@ -17,19 +17,14 @@
  * @see SECURITY/invariants.yaml node-pack-sandbox-no-cross-pack-mutation
  */
-import { describe, it, expect } from 'vitest';
-import { driver } from '../lib/driver.js';
+import { describe, it } from 'vitest';
-const HTTP_SKIP = !process.env.OPENWOP_BASE_URL;
-interface D { capabilities?: { sandbox?: { supported?: unknown } } }
-async function ok(): Promise<boolean> { try { const r = await driver.get('/.well-known/openwop'); return r.status === 200 && (r.json as D).capabilities?.sandbox?.supported === true; } catch { return false; } }
+// Behavioral assertion lands when the misbehaving-cross-pack-mutation
+// typeIds ship + a host advertises `capabilities.sandbox.supported: true`.
+// Expected: pack-b read returns the absent sentinel value; pack-a's
+// mutation did not cross the isolation boundary. Surfaced as `todo` so
+// test reporters track the gap rather than reporting a vacuous PASS.
-describe.skipIf(HTTP_SKIP)('sandbox-no-cross-pack-mutation: behavioral (RFC 0035 §B)', () => {
-  it('pack A writing a sentinel is NOT visible to pack B in the same host process', async () => {
-    if (!(await ok())) return;
-    // Behavioral assertion lands when the misbehaving-cross-pack-mutation
-    // typeIds are available. Expected: pack-b read returns the absent
-    // sentinel value; pack-a's mutation did not cross the isolation boundary.
-    expect(true).toBe(true);
-  });
+describe('sandbox-no-cross-pack-mutation: behavioral (RFC 0035 §B)', () => {
+  it.todo('pack A writing a sentinel is NOT visible to pack B in the same host process');
 });

package/src/scenarios/sandbox-no-host-env-leak.test.ts CHANGED

@@ -12,27 +12,16 @@
  * @see SECURITY/invariants.yaml node-pack-sandbox-no-host-env-leak
  */
-import { describe, it, expect } from 'vitest';
-import { driver } from '../lib/driver.js';
+import { describe, it } from 'vitest';
-const HTTP_SKIP = !process.env.OPENWOP_BASE_URL;
+// Behavioral assertion lands when a sandbox-executing host advertises
+// `capabilities.sandbox.supported: true` AND ships a misbehaving-env-leak
+// typeId. The assertion sets a canary env var on the host process, runs
+// the misbehaving pack that reads `process.env`, and asserts the pack's
+// view of env does NOT contain the canary (unless the host has forwarded
+// it via an `allowedHostCalls` entry). Surfaced as `todo` so test
+// reporters track the gap rather than reporting a vacuous PASS.
-interface DiscoveryDoc { capabilities?: { sandbox?: { supported?: unknown } } }
-async function sandboxSupported(): Promise<boolean> {
-  try {
-    const res = await driver.get('/.well-known/openwop');
-    if (res.status !== 200) return false;
-    return (res.json as DiscoveryDoc).capabilities?.sandbox?.supported === true;
-  } catch { return false; }
-}
-describe.skipIf(HTTP_SKIP)('sandbox-no-host-env-leak: behavioral (RFC 0035 §B)', () => {
-  it('a misbehaving pack reading process.env does NOT see host env vars unless explicitly allowed', async () => {
-    if (!(await sandboxSupported())) return; // soft-skip — no sandbox-executing host yet
-    // Behavioral assertion lands when the misbehaving-env-leak typeId is available.
-    // Expected: invocation returns empty/filtered env mapping; the host's own
-    // env (e.g., DATABASE_URL, OPENAI_API_KEY) is NOT visible to the pack.
-    expect(true).toBe(true);
-  });
+describe('sandbox-no-host-env-leak: behavioral (RFC 0035 §B)', () => {
+  it.todo('a misbehaving pack reading process.env does NOT see host env vars unless explicitly allowed');
 });
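The canary approach in the comment can be sketched as follows. `makeCanary`, `canaryLeaked`, `packEnv` (whatever env mapping the misbehaving pack reports back), and `forwardedNames` are all hypothetical. In particular, the comment ties forwarding to an `allowedHostCalls` entry without saying how entries map to env var names, so `forwardedNames` only stands in for that mapping. The check also looks for the canary value under other keys, since a pack that sees the value under a different name has still received it.

```typescript
import { randomUUID } from 'node:crypto';

interface Canary {
  name: string;
  value: string;
}

// Fresh canary per run so a stale value can never produce a false pass.
function makeCanary(): Canary {
  return {
    name: `OPENWOP_CANARY_${randomUUID().replace(/-/g, '_').toUpperCase()}`,
    value: randomUUID(),
  };
}

function canaryLeaked(
  packEnv: Readonly<Record<string, string | undefined>>,
  canary: Canary,
  forwardedNames: readonly string[] = [],
): boolean {
  // Explicitly forwarded by the host: visibility is expected, not a leak.
  if (forwardedNames.includes(canary.name)) return false;
  // Leaked under its own name...
  if (Object.prototype.hasOwnProperty.call(packEnv, canary.name)) return true;
  // ...or its value surfaced under some other key.
  return Object.values(packEnv).some((v) => typeof v === 'string' && v.includes(canary.value));
}

const canary = makeCanary();
console.log(canaryLeaked({ PATH: '/usr/bin' }, canary)); // false
```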

package/src/scenarios/sandbox-no-host-fs-escape.test.ts CHANGED

@@ -73,19 +73,16 @@ describe.skipIf(HTTP_SKIP)('sandbox-no-host-fs-escape: capability shape (RFC 003
   });
 });
-describe.skipIf(HTTP_SKIP)('sandbox-no-host-fs-escape: behavioral (RFC 0035 §B node-pack-sandbox-no-host-fs-escape)', () => {
-  it('a misbehaving pack that reads outside the sandbox root fails closed with sandbox_escape_attempt', async () => {
-    const sb = await readSandboxCaps();
-    if (sb?.supported !== true) return; // soft-skip — no sandbox-executing host yet
-    // Behavioral assertion lands when the vendor.openwop.misbehaving-sandbox
-    // synthetic pack ships + a host advertises capabilities.sandbox.supported.
-    // Expected wire shape:
-    //   POST /v1/host/sample/test/sandbox-load { packId: 'vendor.openwop.misbehaving-sandbox' }
-    //   → 200 OK
-    //   POST /v1/host/sample/test/sandbox-invoke { typeId: 'misbehave.fs-escape-read', args: { path: '/etc/passwd' } }
-    //   → response.error.code === 'sandbox_escape_attempt'
-    //   → response.error.details.escapeKind === 'host-fs-escape'
-    expect(true).toBe(true);
-  });
+// Behavioral assertion lands when the vendor.openwop.misbehaving-sandbox
+// synthetic pack ships + a host advertises capabilities.sandbox.supported.
+// Expected wire shape:
+//   POST /v1/host/sample/test/sandbox-load { packId: 'vendor.openwop.misbehaving-sandbox' }
+//   → 200 OK
+//   POST /v1/host/sample/test/sandbox-invoke { typeId: 'misbehave.fs-escape-read', args: { path: '/etc/passwd' } }
+//   → response.error.code === 'sandbox_escape_attempt'
+//   → response.error.details.escapeKind === 'host-fs-escape'
+// Surfaced as `todo` so test reporters track the gap rather than reporting
+// a vacuous PASS.
+describe('sandbox-no-host-fs-escape: behavioral (RFC 0035 §B node-pack-sandbox-no-host-fs-escape)', () => {
+  it.todo('a misbehaving pack that reads outside the sandbox root fails closed with sandbox_escape_attempt + escapeKind: "host-fs-escape"');
 });
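The expected wire shape in the comment can be captured as data plus a response check. This is a sketch: the request paths, payloads, and error fields are copied from the comment, while `FS_ESCAPE_PLAN`, `PlannedRequest`, `InvokeResponse`, and `failedClosedOnFsEscape` are hypothetical names.

```typescript
interface PlannedRequest {
  method: 'POST';
  path: string;
  body: Record<string, unknown>;
  expectStatus?: number; // only the load step names a status (200 OK)
}

// The two calls, in order, as listed in the scenario comment.
const FS_ESCAPE_PLAN: readonly PlannedRequest[] = [
  {
    method: 'POST',
    path: '/v1/host/sample/test/sandbox-load',
    body: { packId: 'vendor.openwop.misbehaving-sandbox' },
    expectStatus: 200,
  },
  {
    method: 'POST',
    path: '/v1/host/sample/test/sandbox-invoke',
    body: { typeId: 'misbehave.fs-escape-read', args: { path: '/etc/passwd' } },
  },
];

interface InvokeResponse {
  error?: { code?: unknown; details?: { escapeKind?: unknown } };
}

// True when the invoke step failed closed with the expected fs-escape envelope.
function failedClosedOnFsEscape(res: InvokeResponse): boolean {
  return (
    res.error?.code === 'sandbox_escape_attempt' &&
    res.error?.details?.escapeKind === 'host-fs-escape'
  );
}

console.log(
  failedClosedOnFsEscape({
    error: { code: 'sandbox_escape_attempt', details: { escapeKind: 'host-fs-escape' } },
  }),
); // true
```

A driver would send the two requests in order, stop if the load step does not return 200, and hand the invoke response body to `failedClosedOnFsEscape`.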

package/src/scenarios/sandbox-no-host-process-escape.test.ts CHANGED

@@ -12,19 +12,20 @@
  * @see SECURITY/invariants.yaml node-pack-sandbox-no-host-process-escape
  */
-import { describe, it, expect } from 'vitest';
-import { driver } from '../lib/driver.js';
+import { describe, it } from 'vitest';
-const HTTP_SKIP = !process.env.OPENWOP_BASE_URL;
-interface D { capabilities?: { sandbox?: { supported?: unknown } } }
-async function ok(): Promise<boolean> { try { const r = await driver.get('/.well-known/openwop'); return r.status === 200 && (r.json as D).capabilities?.sandbox?.supported === true; } catch { return false; } }
+// Behavioral assertion lands when a sandbox-executing host advertises
+// `capabilities.sandbox.supported: true` AND ships a misbehaving-process-escape
+// typeId (e.g., vendor.openwop.misbehaving-process). The assertion drives:
+//   1. POST /v1/runs { workflowId: 'conformance-sandbox-process-escape' }
+//      where the workflow includes a node loading the misbehaving typeId.
+//   2. The node attempts spawn/fork/exec inside the sandbox.
+//   3. Assert the run terminates with error.code === 'sandbox_escape_attempt'
+//      AND error.details.escapeKind === 'host-process-escape'.
+//   4. Assert no host process was actually spawned (host-side probe).
+// Surfaced as `todo` so test reporters track the gap rather than reporting
+// a vacuous PASS.
-describe.skipIf(HTTP_SKIP)('sandbox-no-host-process-escape: behavioral (RFC 0035 §B)', () => {
-  it('a misbehaving pack calling spawn/fork/exec fails closed with sandbox_escape_attempt', async () => {
-    if (!(await ok())) return; // soft-skip — no sandbox-executing host yet
-    // Behavioral assertion lands when the misbehaving-process-escape typeId
-    // is available. Expected: error.code === 'sandbox_escape_attempt';
-    // details.escapeKind === 'host-process-escape'.
-    expect(true).toBe(true);
-  });
+describe('sandbox-no-host-process-escape: behavioral (RFC 0035 §B)', () => {
+  it.todo('a misbehaving pack calling spawn/fork/exec fails closed with sandbox_escape_attempt + escapeKind: "host-process-escape"');
 });

package/src/scenarios/sandbox-no-network-escape.test.ts CHANGED

@@ -13,37 +13,16 @@
  * @see SECURITY/invariants.yaml node-pack-sandbox-no-network-escape
  */
-import { describe, it, expect } from 'vitest';
-import { driver } from '../lib/driver.js';
+import { describe, it } from 'vitest';
-const HTTP_SKIP = !process.env.OPENWOP_BASE_URL;
+// Behavioral assertion lands when a sandbox-executing host advertises
+// `capabilities.sandbox.supported: true` (with `host.fetch` NOT in
+// `allowedHostCalls`) AND ships a misbehaving-network-escape typeId.
+// The assertion drives the pack to fetch() inside the sandbox and asserts
+// error.code === 'sandbox_capability_denied' with
+// details.requestedCapability === 'host.fetch'. Surfaced as `todo` so
+// test reporters track the gap rather than reporting a vacuous PASS.
-interface DiscoveryDoc {
-  capabilities?: { sandbox?: { supported?: unknown; allowedHostCalls?: unknown } };
-}
-async function readSandbox(): Promise<{ supported: boolean; allowedHostCalls: string[] } | null> {
-  try {
-    const res = await driver.get('/.well-known/openwop');
-    if (res.status !== 200) return null;
-    const sb = (res.json as DiscoveryDoc).capabilities?.sandbox;
-    if (!sb || sb.supported !== true) return null;
-    return {
-      supported: true,
-      allowedHostCalls: Array.isArray(sb.allowedHostCalls) ? sb.allowedHostCalls.filter((s): s is string => typeof s === 'string') : [],
-    };
-  } catch { return null; }
-}
-describe.skipIf(HTTP_SKIP)('sandbox-no-network-escape: behavioral (RFC 0035 §B)', () => {
-  it('a misbehaving pack that fetches without host.fetch in allowedHostCalls fails closed with sandbox_capability_denied', async () => {
-    const sb = await readSandbox();
-    if (!sb) return; // soft-skip — no sandbox-executing host yet
-    if (sb.allowedHostCalls.includes('host.fetch')) return; // host permits fetch — the negative test doesn't apply
-    // Behavioral assertion lands when the misbehaving-network-escape typeId
-    // is available. Expected error code: sandbox_capability_denied with
-    // details.requestedCapability: 'host.fetch'.
-    expect(true).toBe(true);
-  });
+describe('sandbox-no-network-escape: behavioral (RFC 0035 §B)', () => {
+  it.todo('a misbehaving pack fetching without host.fetch in allowedHostCalls fails closed with sandbox_capability_denied');
 });

package/src/scenarios/sandbox-timeout-cap.test.ts CHANGED

@@ -50,12 +50,9 @@ describe.skipIf(HTTP_SKIP)('sandbox-timeout-cap: capability shape + behavioral (
     ).toBe(true);
   });
-  it('a misbehaving pack exceeding wallClockLimitMs fails with sandbox_timeout', async () => {
-    const sb = await readSandbox();
-    if (!sb || sb.wallClockLimitMs === undefined) return;
-    // Behavioral assertion lands when the misbehaving-timeout-cap typeId is
-    // available. Expected: error.code === 'sandbox_timeout';
-    // details.elapsedMs > wallClockLimitMs.
-    expect(true).toBe(true);
-  });
+  // Behavioral assertion lands when the misbehaving-timeout-cap typeId is
+  // available. Expected: error.code === 'sandbox_timeout';
+  // details.elapsedMs > wallClockLimitMs. Surfaced as `todo` so test
+  // reporters track the gap rather than reporting a vacuous PASS.
+  it.todo('a misbehaving pack exceeding wallClockLimitMs fails with sandbox_timeout');
 });