@openwop/openwop-conformance 1.4.0 → 1.5.0

package/CHANGELOG.md CHANGED
@@ -1,5 +1,46 @@
1
1
  # `@openwop/openwop-conformance` Changelog
2
2
 
3
+ ## [1.5.0] — 2026-05-22
4
+
5
+ Minor bump per `PUBLISHING.md` §"Versioning alignment" — unblocks MyndHyve's RFC 0044 + RFC 0039 Half A co-graduation by shipping the relaxed + RFC-0044-routing assertion logic in `multi-agent-confidence-escalation.test.ts`. No new scenario files; no new fixtures. Behavioral honesty pass on 8 sandbox scenarios + schema additions for the new RFC 0044 capability advertisement.
6
+
7
+ ### Changed — RFC 0044 interrupt-kind routing in `multi-agent-confidence-escalation.test.ts`
8
+
9
+ Previously the scenario asserted `expect(terminal.status).toBe('waiting-clarification')` — strict equality on the clarify-kind escalation path, which rejected even RFC 0039 §A's own escalate-approval path (→ `waiting-approval`). v1.5.0 ships the relaxed + RFC-0044-routing logic landed in upstream commits `f03d01d` (relaxation to accept both canonical statuses) + `641d088` (RFC 0044 vendor-kind routing):
10
+
11
+ - **Canonical kind advertised** (`clarification` / `approval`) → strict `expect(terminal.status).toBe(expectedStatus)`, where `expectedStatus` is `'waiting-clarification'` for `clarification` and `'waiting-approval'` for `approval`.
12
+ - **Vendor kind advertised** (`x-host-<host>-<kind>` per `host-extensions.md` §"Canonical prefixes") → `expect(terminal.status.startsWith('waiting-')).toBe(true)`; the host's `interrupt.md` mapping determines the suffix.
13
+ - **No advertisement** → fall back to the canonical either-status check (preserves the `f03d01d` relaxation).
14
+
15
+ This unblocks MyndHyve's `confidenceEscalationInterruptKind: 'x-host-myndhyve-low-confidence'` advertisement (their entrenched `interrupt.kind: 'low-confidence'` → `waiting-approval` mapping) without forcing a cross-cutting rename of `LOW_CONFIDENCE_SUSPEND_REASON` + `mockAgent.node` + `escalationThreshold.ts` + downstream UI consumers. See RFC 0044 §B (`RFCS/0044-confidence-escalation-interrupt-kind-advertisement.md`) for the normative contract.
16
+
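The three routing branches listed above can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not the suite's actual code: `Expectation`, `expectedTerminalStatus`, and `statusSatisfies` are hypothetical names introduced here. Only the advertised-kind values, the vendor-kind regex, and the `waiting-*` statuses are taken from the changelog entry.

```typescript
// Hypothetical helper names; only the kinds, regex, and statuses come from the changelog.
type Expectation =
  | { mode: 'exact'; status: 'waiting-clarification' | 'waiting-approval' }
  | { mode: 'prefix'; prefix: 'waiting-' }
  | { mode: 'either'; statuses: readonly string[] };

// Vendor-kind pattern quoted in the changelog (x-host-<host>-<kind>).
const VENDOR_KIND = /^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$/;

// Derive what the scenario should expect from the advertised kind.
function expectedTerminalStatus(advertisedKind: unknown): Expectation {
  if (advertisedKind === 'clarification') {
    return { mode: 'exact', status: 'waiting-clarification' };
  }
  if (advertisedKind === 'approval') {
    return { mode: 'exact', status: 'waiting-approval' };
  }
  if (typeof advertisedKind === 'string' && VENDOR_KIND.test(advertisedKind)) {
    // Vendor kinds: the host's own mapping decides the suffix.
    return { mode: 'prefix', prefix: 'waiting-' };
  }
  // No (recognized) advertisement: accept either canonical status.
  return { mode: 'either', statuses: ['waiting-clarification', 'waiting-approval'] };
}

// Check an observed terminal status against the derived expectation.
function statusSatisfies(status: unknown, e: Expectation): boolean {
  if (typeof status !== 'string') return false;
  switch (e.mode) {
    case 'exact':
      return status === e.status;
    case 'prefix':
      return status.startsWith(e.prefix);
    case 'either':
      return e.statuses.includes(status);
  }
}
```

Under this sketch, a host advertising `x-host-myndhyve-low-confidence` and settling on `waiting-approval` satisfies the check, while a host advertising `clarification` and settling on `waiting-approval` does not.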
17
+ ### Changed — Sandbox scenarios converted vacuous `expect(true).toBe(true)` to `it.todo` (honesty pass)
18
+
19
+ The 8 `sandbox-*.test.ts` scenarios in v1.4.0 carried `expect(true).toBe(true)` tautology assertions for their behavioral legs. v1.5.0 converts them to `it.todo()` per upstream commit `5864a2f`:
20
+
21
+ - `sandbox-no-host-process-escape.test.ts`
22
+ - `sandbox-no-network-escape.test.ts`
23
+ - `sandbox-no-host-fs-escape.test.ts`
24
+ - `sandbox-no-host-env-leak.test.ts`
25
+ - `sandbox-timeout-cap.test.ts`
26
+ - `sandbox-memory-cap.test.ts`
27
+ - `sandbox-no-cross-pack-mutation.test.ts`
28
+ - `sandbox-capability-gate-respected.test.ts`
29
+
30
+ Test reporters now surface 8 todos instead of 8 vacuous passes. The advertisement-shape probes (in `sandbox-no-host-fs-escape`, `sandbox-memory-cap`, `sandbox-timeout-cap`) still run real discovery-doc assertions when capabilities are advertised. Behavioral assertions light up when a sandbox-executing reference host wires the seam.
31
+
32
+ ### Changed — `schemas/capabilities.schema.json` (vendored): adds `multiAgent.executionModel.confidenceEscalationInterruptKind`
33
+
34
+ Per RFC 0044 §A. The optional field accepts the canonical literal `"clarification"` / `"approval"` OR a vendor extension matching `^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$` per `host-extensions.md` §"Canonical prefixes". Required for the routing logic above; absent advertisement falls back to the canonical-status check.
35
+
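To make the accepted value space concrete, the sketch below classifies a candidate `confidenceEscalationInterruptKind` value against the constraints described above (a string that is one of the two canonical literals or matches the vendor pattern). `classifyInterruptKind` and `KindClass` are hypothetical names for illustration only; a real host would more likely validate against the vendored schema with a JSON Schema validator.

```typescript
// Hypothetical classifier; mirrors the schema's `type: string` + `anyOf` constraints.
type KindClass = 'absent' | 'canonical' | 'vendor' | 'invalid';

// Pattern copied from the schema's vendor-extension branch.
const VENDOR_KIND_PATTERN = /^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$/;

function classifyInterruptKind(value: unknown): KindClass {
  if (value === undefined) return 'absent'; // the field is optional
  if (typeof value !== 'string') return 'invalid'; // schema requires a string
  if (value === 'clarification' || value === 'approval') return 'canonical';
  if (VENDOR_KIND_PATTERN.test(value)) return 'vendor';
  return 'invalid'; // e.g. a bare 'low-confidence' without the x-host- prefix
}

// Example inputs and how this sketch classifies them.
const samples: Array<[unknown, KindClass]> = [
  [undefined, 'absent'],
  ['approval', 'canonical'],
  ['x-host-myndhyve-low-confidence', 'vendor'],
  ['low-confidence', 'invalid'],
  ['x-host-acme', 'invalid'], // missing the <kind> segment
];
for (const [input, expected] of samples) {
  console.log(JSON.stringify(input), classifyInterruptKind(input) === expected);
}
```

Note that the scenario code in this release treats any value that is neither canonical nor pattern-matching the same as an absent advertisement and falls back to the either-status check; rejecting such values is left to schema validation.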
36
+ ### No new scenario files
37
+
38
+ Scenario file count unchanged at 205 (the v1.4.0 baseline). All changes are behavior modifications to existing files.
39
+
40
+ ### Known limits — unchanged from v1.4.0
41
+
42
+ The 6 `it.todo` behavioral assertions across RFC 0034 OTel-seam-gated, RFC 0040 traceparent-propagation, RFC 0041 refusal-divergence + observable-sequence scenarios remain. The 8 sandbox `it.todo` assertions are new in v1.5.0 (replacing the v1.4.0 vacuous-pass shapes).
43
+
3
44
  ## [1.4.0] — 2026-05-22
4
45
 
5
46
  Minor bump per `PUBLISHING.md` §"Versioning alignment" — bundles 45 new conformance scenarios + 23 new fixtures landing since the 1.3.0 publish (2026-05-19). Unblocks non-steward host adoption of RFCs 0027 + 0028 + 0029 + 0030 + 0031 + 0032 + 0033 + 0034 + 0035 + 0036 + 0037 + 0039 + 0040 + 0041 against a single suite version.
package/coverage.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # OpenWOP Conformance Coverage Map
2
2
 
3
- > **Status: Living document. Updated 2026-05-11.** This map connects the current scenario files to the protocol surfaces they protect and records the remaining gaps from the protocol deep dive. Scenario names are source-of-truth file names under `conformance/src/scenarios/`.
3
+ > **Status: Living document. Updated 2026-05-22.** This map connects the current scenario files to the protocol surfaces they protect and records the remaining gaps from the protocol deep dive. Scenario names are source-of-truth file names under `conformance/src/scenarios/`.
4
4
 
5
5
  > **Shape grade vs behavior grade.** Some optional-profile scenarios validate **capability shape** (the host's discovery advertisement is well-formed) without yet exercising **behavior** (the host actually implements the profile end-to-end). The "Current grade" column reflects shape; see §"Capability-gated scenarios: shape vs behavior" below for the dual-grade view and the `OPENWOP_REQUIRE_BEHAVIOR=true` strict-mode runner flag.
6
6
 
@@ -124,12 +124,12 @@ Every OpenAPI operation should have:
124
124
  | `getArtifact` | Indirect through approval payload fixtures | `route-coverage.test.ts` covers unknown artifact `404`/`403` envelope; `artifact-auth.test.ts` (CF-4 close-out 2026-05-15; SQLite host 401-before-404 stub landed 2026-05-19, closes the info-leak surface for every HTTP method) covers `401` unauthenticated path | Negative paths covered (401 + 405 non-GET + 404/403) | Add positive artifact-read scenario once a reference host implements `getArtifact` end-to-end. |
125
125
  | `registerWebhook` | Webhook spec exists | `route-coverage.test.ts` covers invalid URL validation envelope | Add positive registration with a test receiver when harness support exists. |
126
126
  | `unregisterWebhook` | Webhook spec exists | `route-coverage.test.ts` covers unknown subscription behavior | Add full register-then-unregister roundtrip with a test receiver. |
127
- | `listPromptTemplates` | `prompt-template-shape.test.ts` covers schema shape + advertisement contract for `capabilities.prompts.*`; capability-gated behavioral list-with-filter scenarios deferred to RFC 0028 acceptance gate | n/a yet: endpoint surface in spec only (RFC 0028 Draft); reference host hasn't implemented the route yet | Add positive list-with-filter scenario + auth-failure + invalid-cursor scenarios once a reference host implements the route. |
128
- | `createPromptTemplate` | None: endpoint surface in spec only (RFC 0028 Draft) | n/a yet | Add positive create + `409` duplicate + `501` not-mutable-library + auth/scope scenarios once a reference host implements the route. |
129
- | `getPromptTemplate` | None: endpoint surface in spec only | n/a yet | Add positive fetch + `404` unknown + `400` ambiguous-libraryId + `ETag` revalidation scenarios. |
130
- | `updatePromptTemplate` | None: endpoint surface in spec only | n/a yet | Add positive update + `409` non-monotonic-version + `403` pack-sourced-readonly + `501` not-mutable-library scenarios. |
131
- | `deletePromptTemplate` | None: endpoint surface in spec only | n/a yet | Add positive delete + `403` pack-sourced-readonly + `404` unknown + `501` not-mutable-library scenarios. |
132
- | `renderPromptTemplate` | `prompt-composed-secret-redaction.test.ts` + `prompt-composed-trust-marker.test.ts` exercise the compose pipeline via the `/v1/host/sample/prompt/compose` host-extension seam; capability-gated. The spec'd `POST /v1/prompts:render` endpoint shares the same composition pipeline (RFC 0028 §A deterministic-render invariant matches RFC 0027 §F replay invariant). | Composition redaction + trust-marker invariants covered via the seam | Add positive `:render` via the spec'd endpoint + `400 prompt_variable_unresolved` + `404 template_not_found` once a reference host implements the route. |
127
+ | `listPromptTemplates` | `prompt-template-shape.test.ts` + `prompt-list-and-fetch.test.ts` cover schema shape + advertisement contract + list/get contract for `capabilities.prompts.*` against the reference workflow-engine (RFC 0028 `Active`; endpoints live under `apps/workflow-engine/backend/typescript/src/routes/prompts.ts`) | Behavioral list + advertisement-shape covered | Add cross-host list-with-filter parity scenario when a second host advertises `endpointsSupported: true`. |
128
+ | `createPromptTemplate` | `prompt-mutable-lifecycle.test.ts` covers CRUD lifecycle against the reference workflow-engine (gated on `mutableLibrary: true`); user-source POST succeeds, pack + host-built-in templates return 403 | Positive create + readonly-source 403 path covered | Add explicit `409` duplicate-id scenario + auth/scope matrix scenarios. |
129
+ | `getPromptTemplate` | `prompt-list-and-fetch.test.ts` covers positive fetch + ambiguous-libraryId + ETag honoring when host advertises it | Positive fetch + 404 + ETag covered | Good — minor gap is the `400 ambiguous_template_id` cross-library disambiguation matrix. |
130
+ | `updatePromptTemplate` | `prompt-mutable-lifecycle.test.ts` covers positive update + non-monotonic-version conflict + pack-sourced-readonly 403 against the reference workflow-engine | Positive update + 403 readonly-source + 409 conflict covered | Add `501` not-mutable-library negative for hosts that advertise `mutableLibrary: false`. |
131
+ | `deletePromptTemplate` | `prompt-mutable-lifecycle.test.ts` covers positive delete + pack-sourced-readonly 403 against the reference workflow-engine | Positive delete + 403 readonly-source covered | Add `501` not-mutable-library negative + `404` unknown-template scenarios. |
132
+ | `renderPromptTemplate` | `prompt-render-deterministic.test.ts` exercises `POST /v1/prompts:render` end-to-end against the reference workflow-engine; deterministic-hash invariant verified across `:render` + `prompt.composed` event paths. `prompt-composed-secret-redaction.test.ts` + `prompt-composed-trust-marker.test.ts` exercise the shared compose pipeline via the `/v1/host/sample/prompt/compose` seam | Deterministic render + composition redaction + trust-marker invariants covered | Add `400 prompt_variable_unresolved` matrix for missing variables across all four PromptKinds. |
133
133
 
134
134
  ---
135
135
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@openwop/openwop-conformance",
3
- "version": "1.4.0",
3
+ "version": "1.5.0",
4
4
  "description": "Production-ready black-box conformance suite for OpenWOP v1.0 compliant servers.",
5
5
  "repository": {
6
6
  "type": "git",
@@ -415,6 +415,15 @@
415
415
  "maximum": 1.0,
416
416
  "description": "RFC 0039 §A. Operator-declared confidence floor at or above the spec floor of 0.5; when an OrchestratorDecision carries `confidence` below this floor, the host MUST escalate via a `clarify` or `escalate` interrupt instead of executing the decision. Absent: the spec floor 0.5 applies. Values < 0.5 are non-conformant; values > 1.0 are nonsense. Applies only when `version >= 2`."
417
417
  },
418
+ "confidenceEscalationInterruptKind": {
419
+ "type": "string",
420
+ "anyOf": [
421
+ { "const": "clarification" },
422
+ { "const": "approval" },
423
+ { "pattern": "^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$" }
424
+ ],
425
+ "description": "RFC 0044 — the literal `interrupt.kind` the host emits when escalating a below-floor confidence decision per RFC 0039 §A. `clarification` and `approval` are the canonical values matching the clarify-OR-approval choice in RFC 0039 §A; vendor-extension kinds use the canonical host-extension namespace `^x-host-<host>-<kind>$` per `spec/v1/host-extensions.md` §\"Canonical prefixes\". When advertised, hosts MUST emit an interrupt of the advertised kind on every confidence-escalation event AND the host's downstream `interrupt.md` mapping determines the `waiting-*` terminal status. Absent: conformance assumes the host uses one of the two canonical kinds (the relaxed assertion accepts either). Hosts using vendor kinds MUST also publish a non-normative kind-mapping document per RFC 0044 §C."
426
+ },
418
427
  "crossChildMemoryConcurrency": {
419
428
  "type": "string",
420
429
  "enum": ["strict", "advisory"],
@@ -63,6 +63,7 @@ interface DiscoveryDoc {
63
63
  supported?: unknown;
64
64
  version?: unknown;
65
65
  confidenceEscalationFloor?: unknown;
66
+ confidenceEscalationInterruptKind?: unknown;
66
67
  };
67
68
  };
68
69
  };
@@ -111,15 +112,51 @@ describe.skipIf(BEHAVIORAL_SKIP)('multi-agent-confidence-escalation: behavioral
111
112
  const terminal = await pollUntilTerminal(runId);
112
113
  // Phase 2 escalation suspends the parent — NOT a terminal `completed`.
113
114
  // The conformance pollUntilTerminal returns when the run reaches any
114
- // settled status; we expect `waiting-clarification` or equivalent
115
- // non-completed status carrying an open clarification interrupt.
116
- expect(
117
- terminal.status,
118
- driver.describe(
119
- 'RFCS/0039-multi-agent-confidence-and-memory-lifecycle.md §A + spec/v1/interrupt.md',
120
- 'a host emitting `interrupt.kind: "clarification"` MUST surface the run as `waiting-clarification` per spec/v1/interrupt.md §"Interrupt kinds"; low-confidence decision MUST NOT reach `completed` because no dispatch fired',
121
- ),
122
- ).toBe('waiting-clarification');
115
+ // settled status. RFC 0039 §A gives hosts a choice: clarify-kind
116
+ // escalation (→ waiting-clarification) OR escalate-kind approval
117
+ // (→ waiting-approval).
118
+ //
119
+ // RFC 0044 routing: when the host advertises
120
+ // `capabilities.multiAgent.executionModel.confidenceEscalationInterruptKind`
121
+ // the scenario derives the expected terminal-status from that advertisement
122
+ // (canonical kinds map 1:1 to waiting-clarification / waiting-approval per
123
+ // `interrupt.md`; vendor `x-host-<host>-<kind>` kinds accept any waiting-*
124
+ // status — the host's own interrupt.md mapping determines the suffix).
125
+ // When the host does NOT advertise the field, fall back to the canonical
126
+ // either-status check.
127
+ const advertisedKind = d?.capabilities?.multiAgent?.executionModel?.confidenceEscalationInterruptKind;
128
+ const isVendorKind = typeof advertisedKind === 'string' && /^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$/.test(advertisedKind);
129
+ const isCanonicalKind = advertisedKind === 'clarification' || advertisedKind === 'approval';
130
+
131
+ if (isCanonicalKind) {
132
+ const expectedStatus = advertisedKind === 'clarification' ? 'waiting-clarification' : 'waiting-approval';
133
+ expect(
134
+ terminal.status,
135
+ driver.describe(
136
+ 'RFCS/0044-confidence-escalation-interrupt-kind-advertisement.md §B',
137
+ `host advertising confidenceEscalationInterruptKind: "${advertisedKind}" MUST surface the run as "${expectedStatus}" per spec/v1/interrupt.md §"Interrupt kinds"`,
138
+ ),
139
+ ).toBe(expectedStatus);
140
+ } else if (isVendorKind) {
141
+ const status = terminal.status as string;
142
+ expect(
143
+ typeof status === 'string' && status.startsWith('waiting-'),
144
+ driver.describe(
145
+ 'RFCS/0044-confidence-escalation-interrupt-kind-advertisement.md §B',
146
+ `host advertising vendor confidenceEscalationInterruptKind ("${advertisedKind}") MUST surface the run as a waiting-* status; the suffix is determined by the host's interrupt.md mapping (see the host's vendor-extensions doc per RFC 0044 §C)`,
147
+ ),
148
+ ).toBe(true);
149
+ } else {
150
+ // No advertisement — fall back to the canonical either-status check.
151
+ const acceptedStatuses = ['waiting-clarification', 'waiting-approval'];
152
+ expect(
153
+ acceptedStatuses.includes(terminal.status as string),
154
+ driver.describe(
155
+ 'RFCS/0039-multi-agent-confidence-and-memory-lifecycle.md §A + spec/v1/interrupt.md',
156
+ 'a host below the confidence floor MUST surface the run as `waiting-clarification` (clarify-kind escalation) OR `waiting-approval` (escalate-kind escalation) per RFC 0039 §A; the low-confidence decision MUST NOT reach `completed` because no dispatch fired',
157
+ ),
158
+ ).toBe(true);
159
+ }
123
160
 
124
161
  const eventsRes = await driver.get(`/v1/runs/${encodeURIComponent(runId)}/events`);
125
162
  expect(eventsRes.status).toBe(200);
@@ -13,19 +13,15 @@
13
13
  * @see SECURITY/invariants.yaml node-pack-sandbox-capability-gate-respected
14
14
  */
15
15
 
16
- import { describe, it, expect } from 'vitest';
17
- import { driver } from '../lib/driver.js';
16
+ import { describe, it } from 'vitest';
18
17
 
19
- const HTTP_SKIP = !process.env.OPENWOP_BASE_URL;
20
- interface D { capabilities?: { sandbox?: { supported?: unknown } } }
21
- async function ok(): Promise<boolean> { try { const r = await driver.get('/.well-known/openwop'); return r.status === 200 && (r.json as D).capabilities?.sandbox?.supported === true; } catch { return false; } }
18
+ // Behavioral assertion lands when the misbehaving-capability-gate typeId
19
+ // ships + a host advertises `capabilities.sandbox.supported: true`.
20
+ // Expected: error.code === 'sandbox_capability_denied';
21
+ // details.requestedCapability is set to the disallowed identifier.
22
+ // Surfaced as `todo` so test reporters track the gap rather than
23
+ // reporting a vacuous PASS.
22
24
 
23
- describe.skipIf(HTTP_SKIP)('sandbox-capability-gate-respected: behavioral (RFC 0035 §B)', () => {
24
- it('a misbehaving pack calling an undeclared host capability fails closed with sandbox_capability_denied', async () => {
25
- if (!(await ok())) return;
26
- // Behavioral assertion lands when the misbehaving-capability-gate typeId
27
- // is available. Expected: error.code === 'sandbox_capability_denied';
28
- // details.requestedCapability is set to the disallowed identifier.
29
- expect(true).toBe(true);
30
- });
25
+ describe('sandbox-capability-gate-respected: behavioral (RFC 0035 §B)', () => {
26
+ it.todo('a misbehaving pack calling an undeclared host capability fails closed with sandbox_capability_denied');
31
27
  });
@@ -50,12 +50,9 @@ describe.skipIf(HTTP_SKIP)('sandbox-memory-cap: capability shape + behavioral (R
50
50
  ).toBe(true);
51
51
  });
52
52
 
53
- it('a misbehaving pack allocating beyond memoryLimitBytes fails with sandbox_memory_exceeded', async () => {
54
- const sb = await readSandbox();
55
- if (!sb || sb.memoryLimitBytes === undefined) return; // soft-skip
56
- // Behavioral assertion lands when the misbehaving-memory-cap typeId is
57
- // available. Expected: error.code === 'sandbox_memory_exceeded';
58
- // details.requestedBytes > memoryLimitBytes.
59
- expect(true).toBe(true);
60
- });
53
+ // Behavioral assertion lands when the misbehaving-memory-cap typeId is
54
+ // available. Expected: error.code === 'sandbox_memory_exceeded';
55
+ // details.requestedBytes > memoryLimitBytes. Surfaced as `todo` so
56
+ // test reporters track the gap rather than reporting a vacuous PASS.
57
+ it.todo('a misbehaving pack allocating beyond memoryLimitBytes fails with sandbox_memory_exceeded');
61
58
  });
@@ -17,19 +17,14 @@
17
17
  * @see SECURITY/invariants.yaml node-pack-sandbox-no-cross-pack-mutation
18
18
  */
19
19
 
20
- import { describe, it, expect } from 'vitest';
21
- import { driver } from '../lib/driver.js';
20
+ import { describe, it } from 'vitest';
22
21
 
23
- const HTTP_SKIP = !process.env.OPENWOP_BASE_URL;
24
- interface D { capabilities?: { sandbox?: { supported?: unknown } } }
25
- async function ok(): Promise<boolean> { try { const r = await driver.get('/.well-known/openwop'); return r.status === 200 && (r.json as D).capabilities?.sandbox?.supported === true; } catch { return false; } }
22
+ // Behavioral assertion lands when the misbehaving-cross-pack-mutation
23
+ // typeIds ship + a host advertises `capabilities.sandbox.supported: true`.
24
+ // Expected: pack-b read returns the absent sentinel value; pack-a's
25
+ // mutation did not cross the isolation boundary. Surfaced as `todo` so
26
+ // test reporters track the gap rather than reporting a vacuous PASS.
26
27
 
27
- describe.skipIf(HTTP_SKIP)('sandbox-no-cross-pack-mutation: behavioral (RFC 0035 §B)', () => {
28
- it('pack A writing a sentinel is NOT visible to pack B in the same host process', async () => {
29
- if (!(await ok())) return;
30
- // Behavioral assertion lands when the misbehaving-cross-pack-mutation
31
- // typeIds are available. Expected: pack-b read returns the absent
32
- // sentinel value; pack-a's mutation did not cross the isolation boundary.
33
- expect(true).toBe(true);
34
- });
28
+ describe('sandbox-no-cross-pack-mutation: behavioral (RFC 0035 §B)', () => {
29
+ it.todo('pack A writing a sentinel is NOT visible to pack B in the same host process');
35
30
  });
@@ -12,27 +12,16 @@
12
12
  * @see SECURITY/invariants.yaml node-pack-sandbox-no-host-env-leak
13
13
  */
14
14
 
15
- import { describe, it, expect } from 'vitest';
16
- import { driver } from '../lib/driver.js';
15
+ import { describe, it } from 'vitest';
17
16
 
18
- const HTTP_SKIP = !process.env.OPENWOP_BASE_URL;
17
+ // Behavioral assertion lands when a sandbox-executing host advertises
18
+ // `capabilities.sandbox.supported: true` AND ships a misbehaving-env-leak
19
+ // typeId. The assertion sets a canary env var on the host process, runs
20
+ // the misbehaving pack that reads `process.env`, and asserts the pack's
21
+ // view of env does NOT contain the canary (unless the host has forwarded
22
+ // it via an `allowedHostCalls` entry). Surfaced as `todo` so test
23
+ // reporters track the gap rather than reporting a vacuous PASS.
19
24
 
20
- interface DiscoveryDoc { capabilities?: { sandbox?: { supported?: unknown } } }
21
-
22
- async function sandboxSupported(): Promise<boolean> {
23
- try {
24
- const res = await driver.get('/.well-known/openwop');
25
- if (res.status !== 200) return false;
26
- return (res.json as DiscoveryDoc).capabilities?.sandbox?.supported === true;
27
- } catch { return false; }
28
- }
29
-
30
- describe.skipIf(HTTP_SKIP)('sandbox-no-host-env-leak: behavioral (RFC 0035 §B)', () => {
31
- it('a misbehaving pack reading process.env does NOT see host env vars unless explicitly allowed', async () => {
32
- if (!(await sandboxSupported())) return; // soft-skip — no sandbox-executing host yet
33
- // Behavioral assertion lands when the misbehaving-env-leak typeId is available.
34
- // Expected: invocation returns empty/filtered env mapping; the host's own
35
- // env (e.g., DATABASE_URL, OPENAI_API_KEY) is NOT visible to the pack.
36
- expect(true).toBe(true);
37
- });
25
+ describe('sandbox-no-host-env-leak: behavioral (RFC 0035 §B)', () => {
26
+ it.todo('a misbehaving pack reading process.env does NOT see host env vars unless explicitly allowed');
38
27
  });
@@ -73,19 +73,16 @@ describe.skipIf(HTTP_SKIP)('sandbox-no-host-fs-escape: capability shape (RFC 003
73
73
  });
74
74
  });
75
75
 
76
- describe.skipIf(HTTP_SKIP)('sandbox-no-host-fs-escape: behavioral (RFC 0035 §B node-pack-sandbox-no-host-fs-escape)', () => {
77
- it('a misbehaving pack that reads outside the sandbox root fails closed with sandbox_escape_attempt', async () => {
78
- const sb = await readSandboxCaps();
79
- if (sb?.supported !== true) return; // soft-skip: no sandbox-executing host yet
80
-
81
- // Behavioral assertion lands when the vendor.openwop.misbehaving-sandbox
82
- // synthetic pack ships + a host advertises capabilities.sandbox.supported.
83
- // Expected wire shape:
84
- // POST /v1/host/sample/test/sandbox-load { packId: 'vendor.openwop.misbehaving-sandbox' }
85
- // → 200 OK
86
- // POST /v1/host/sample/test/sandbox-invoke { typeId: 'misbehave.fs-escape-read', args: { path: '/etc/passwd' } }
87
- // → response.error.code === 'sandbox_escape_attempt'
88
- // → response.error.details.escapeKind === 'host-fs-escape'
89
- expect(true).toBe(true);
90
- });
76
+ // Behavioral assertion lands when the vendor.openwop.misbehaving-sandbox
77
+ // synthetic pack ships + a host advertises capabilities.sandbox.supported.
78
+ // Expected wire shape:
79
+ // POST /v1/host/sample/test/sandbox-load { packId: 'vendor.openwop.misbehaving-sandbox' }
80
+ // → 200 OK
81
+ // POST /v1/host/sample/test/sandbox-invoke { typeId: 'misbehave.fs-escape-read', args: { path: '/etc/passwd' } }
82
+ // → response.error.code === 'sandbox_escape_attempt'
83
+ // → response.error.details.escapeKind === 'host-fs-escape'
84
+ // Surfaced as `todo` so test reporters track the gap rather than reporting
85
+ // a vacuous PASS.
86
+ describe('sandbox-no-host-fs-escape: behavioral (RFC 0035 §B node-pack-sandbox-no-host-fs-escape)', () => {
87
+ it.todo('a misbehaving pack that reads outside the sandbox root fails closed with sandbox_escape_attempt + escapeKind: "host-fs-escape"');
91
88
  });
@@ -12,19 +12,20 @@
12
12
  * @see SECURITY/invariants.yaml node-pack-sandbox-no-host-process-escape
13
13
  */
14
14
 
15
- import { describe, it, expect } from 'vitest';
16
- import { driver } from '../lib/driver.js';
15
+ import { describe, it } from 'vitest';
17
16
 
18
- const HTTP_SKIP = !process.env.OPENWOP_BASE_URL;
19
- interface D { capabilities?: { sandbox?: { supported?: unknown } } }
20
- async function ok(): Promise<boolean> { try { const r = await driver.get('/.well-known/openwop'); return r.status === 200 && (r.json as D).capabilities?.sandbox?.supported === true; } catch { return false; } }
17
+ // Behavioral assertion lands when a sandbox-executing host advertises
18
+ // `capabilities.sandbox.supported: true` AND ships a misbehaving-process-escape
19
+ // typeId (e.g., vendor.openwop.misbehaving-process). The assertion drives:
20
+ // 1. POST /v1/runs { workflowId: 'conformance-sandbox-process-escape' }
21
+ // where the workflow includes a node loading the misbehaving typeId.
22
+ // 2. The node attempts spawn/fork/exec inside the sandbox.
23
+ // 3. Assert the run terminates with error.code === 'sandbox_escape_attempt'
24
+ // AND error.details.escapeKind === 'host-process-escape'.
25
+ // 4. Assert no host process was actually spawned (host-side probe).
26
+ // Surfaced as `todo` so test reporters track the gap rather than reporting
27
+ // a vacuous PASS.
21
28
 
22
- describe.skipIf(HTTP_SKIP)('sandbox-no-host-process-escape: behavioral (RFC 0035 §B)', () => {
23
- it('a misbehaving pack calling spawn/fork/exec fails closed with sandbox_escape_attempt', async () => {
24
- if (!(await ok())) return; // soft-skip — no sandbox-executing host yet
25
- // Behavioral assertion lands when the misbehaving-process-escape typeId
26
- // is available. Expected: error.code === 'sandbox_escape_attempt';
27
- // details.escapeKind === 'host-process-escape'.
28
- expect(true).toBe(true);
29
- });
29
+ describe('sandbox-no-host-process-escape: behavioral (RFC 0035 §B)', () => {
30
+ it.todo('a misbehaving pack calling spawn/fork/exec fails closed with sandbox_escape_attempt + escapeKind: "host-process-escape"');
30
31
  });
@@ -13,37 +13,16 @@
13
13
  * @see SECURITY/invariants.yaml node-pack-sandbox-no-network-escape
14
14
  */
15
15
 
16
- import { describe, it, expect } from 'vitest';
17
- import { driver } from '../lib/driver.js';
16
+ import { describe, it } from 'vitest';
18
17
 
19
- const HTTP_SKIP = !process.env.OPENWOP_BASE_URL;
18
+ // Behavioral assertion lands when a sandbox-executing host advertises
19
+ // `capabilities.sandbox.supported: true` (with `host.fetch` NOT in
20
+ // `allowedHostCalls`) AND ships a misbehaving-network-escape typeId.
21
+ // The assertion drives the pack to fetch() inside the sandbox and asserts
22
+ // error.code === 'sandbox_capability_denied' with
23
+ // details.requestedCapability === 'host.fetch'. Surfaced as `todo` so
24
+ // test reporters track the gap rather than reporting a vacuous PASS.
20
25
 
21
- interface DiscoveryDoc {
22
- capabilities?: { sandbox?: { supported?: unknown; allowedHostCalls?: unknown } };
23
- }
24
-
25
- async function readSandbox(): Promise<{ supported: boolean; allowedHostCalls: string[] } | null> {
26
- try {
27
- const res = await driver.get('/.well-known/openwop');
28
- if (res.status !== 200) return null;
29
- const sb = (res.json as DiscoveryDoc).capabilities?.sandbox;
30
- if (!sb || sb.supported !== true) return null;
31
- return {
32
- supported: true,
33
- allowedHostCalls: Array.isArray(sb.allowedHostCalls) ? sb.allowedHostCalls.filter((s): s is string => typeof s === 'string') : [],
34
- };
35
- } catch { return null; }
36
- }
37
-
38
- describe.skipIf(HTTP_SKIP)('sandbox-no-network-escape: behavioral (RFC 0035 §B)', () => {
39
- it('a misbehaving pack that fetches without host.fetch in allowedHostCalls fails closed with sandbox_capability_denied', async () => {
40
- const sb = await readSandbox();
41
- if (!sb) return; // soft-skip — no sandbox-executing host yet
42
- if (sb.allowedHostCalls.includes('host.fetch')) return; // host permits fetch — the negative test doesn't apply
43
-
44
- // Behavioral assertion lands when the misbehaving-network-escape typeId
45
- // is available. Expected error code: sandbox_capability_denied with
46
- // details.requestedCapability: 'host.fetch'.
47
- expect(true).toBe(true);
48
- });
26
+ describe('sandbox-no-network-escape: behavioral (RFC 0035 §B)', () => {
27
+ it.todo('a misbehaving pack fetching without host.fetch in allowedHostCalls fails closed with sandbox_capability_denied');
49
28
  });
@@ -50,12 +50,9 @@ describe.skipIf(HTTP_SKIP)('sandbox-timeout-cap: capability shape + behavioral (
50
50
  ).toBe(true);
51
51
  });
52
52
 
53
- it('a misbehaving pack exceeding wallClockLimitMs fails with sandbox_timeout', async () => {
54
- const sb = await readSandbox();
55
- if (!sb || sb.wallClockLimitMs === undefined) return;
56
- // Behavioral assertion lands when the misbehaving-timeout-cap typeId is
57
- // available. Expected: error.code === 'sandbox_timeout';
58
- // details.elapsedMs > wallClockLimitMs.
59
- expect(true).toBe(true);
60
- });
53
+ // Behavioral assertion lands when the misbehaving-timeout-cap typeId is
54
+ // available. Expected: error.code === 'sandbox_timeout';
55
+ // details.elapsedMs > wallClockLimitMs. Surfaced as `todo` so test
56
+ // reporters track the gap rather than reporting a vacuous PASS.
57
+ it.todo('a misbehaving pack exceeding wallClockLimitMs fails with sandbox_timeout');
61
58
  });