@openwop/openwop-conformance 1.4.0 → 1.5.0
- package/CHANGELOG.md +41 -0
- package/coverage.md +7 -7
- package/package.json +1 -1
- package/schemas/capabilities.schema.json +9 -0
- package/src/scenarios/multi-agent-confidence-escalation.test.ts +46 -9
- package/src/scenarios/sandbox-capability-gate-respected.test.ts +9 -13
- package/src/scenarios/sandbox-memory-cap.test.ts +5 -8
- package/src/scenarios/sandbox-no-cross-pack-mutation.test.ts +8 -13
- package/src/scenarios/sandbox-no-host-env-leak.test.ts +10 -21
- package/src/scenarios/sandbox-no-host-fs-escape.test.ts +12 -15
- package/src/scenarios/sandbox-no-host-process-escape.test.ts +14 -13
- package/src/scenarios/sandbox-no-network-escape.test.ts +10 -31
- package/src/scenarios/sandbox-timeout-cap.test.ts +5 -8
package/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,46 @@
|
|
|
1
1
|
# `@openwop/openwop-conformance` Changelog
|
|
2
2
|
|
|
3
|
+
## [1.5.0] — 2026-05-22
|
|
4
|
+
|
|
5
|
+
Minor bump per `PUBLISHING.md` §"Versioning alignment" — unblocks MyndHyve's RFC 0044 + RFC 0039 Half A co-graduation by shipping the relaxed + RFC-0044-routing assertion logic in `multi-agent-confidence-escalation.test.ts`. No new scenario files; no new fixtures. Behavioral honesty pass on 8 sandbox scenarios + schema additions for the new RFC 0044 capability advertisement.
|
|
6
|
+
|
|
7
|
+
### Changed — RFC 0044 interrupt-kind routing in `multi-agent-confidence-escalation.test.ts`
|
|
8
|
+
|
|
9
|
+
Previously the scenario asserted `expect(terminal.status).toBe('waiting-clarification')` — strict equality on the clarify-kind escalation path, which rejected even RFC 0039 §A's own escalate-approval path (→ `waiting-approval`). v1.5.0 ships the relaxed + RFC-0044-routing logic landed in upstream commits `f03d01d` (relaxation to accept both canonical statuses) + `641d088` (RFC 0044 vendor-kind routing):
|
|
10
|
+
|
|
11
|
+
- **Canonical kind advertised** (`clarification` / `approval`) → strict `expect(terminal.status).toBe('waiting-clarification' | 'waiting-approval')`.
|
|
12
|
+
- **Vendor kind advertised** (`x-host-<host>-<kind>` per `host-extensions.md` §"Canonical prefixes") → `expect(terminal.status.startsWith('waiting-')).toBe(true)`; the host's `interrupt.md` mapping determines the suffix.
|
|
13
|
+
- **No advertisement** → fall back to the canonical either-status check (preserves the `f03d01d` relaxation).
|
|
14
|
+
|
|
15
|
+
This unblocks MyndHyve's `confidenceEscalationInterruptKind: 'x-host-myndhyve-low-confidence'` advertisement (their entrenched `interrupt.kind: 'low-confidence'` → `waiting-approval` mapping) without forcing a cross-cutting rename of `LOW_CONFIDENCE_SUSPEND_REASON` + `mockAgent.node` + `escalationThreshold.ts` + downstream UI consumers. See RFC 0044 §B (`RFCS/0044-confidence-escalation-interrupt-kind-advertisement.md`) for the normative contract.
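The three routing cases above can be sketched as a single predicate. This is a minimal illustration, not the scenario's actual code: the helper name `isAcceptedEscalationStatus` and its signature are assumptions made for this example, while the status strings and the vendor-kind pattern are taken from this changelog entry.

```typescript
// Hypothetical helper; the name and signature are assumptions for illustration.
// The pattern is the vendor-kind pattern quoted in this changelog.
const VENDOR_KIND = /^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$/;

function isAcceptedEscalationStatus(advertisedKind: unknown, status: string): boolean {
  // Canonical kinds map 1:1 to exactly one waiting-* status.
  if (advertisedKind === 'clarification') return status === 'waiting-clarification';
  if (advertisedKind === 'approval') return status === 'waiting-approval';

  // Vendor kinds accept any waiting-* status; the host's own mapping picks the suffix.
  if (typeof advertisedKind === 'string' && VENDOR_KIND.test(advertisedKind)) {
    return status.startsWith('waiting-');
  }

  // No (or unrecognized) advertisement: fall back to the either-status check.
  return status === 'waiting-clarification' || status === 'waiting-approval';
}

console.log(isAcceptedEscalationStatus('x-host-myndhyve-low-confidence', 'waiting-approval')); // true
console.log(isAcceptedEscalationStatus('clarification', 'waiting-approval')); // false
console.log(isAcceptedEscalationStatus(undefined, 'waiting-approval')); // true
```

The second call shows why the canonical branch is stricter than the fallback: a host that advertises `clarification` but suspends as `waiting-approval` is rejected, whereas the same status passes when nothing is advertised.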
|
|
16
|
+
|
|
17
|
+
### Changed — Sandbox scenarios converted vacuous `expect(true).toBe(true)` to `it.todo` (honesty pass)
|
|
18
|
+
|
|
19
|
+
The 8 `sandbox-*.test.ts` scenarios in v1.4.0 carried `expect(true).toBe(true)` tautology assertions for their behavioral legs. v1.5.0 converts them to `it.todo()` per upstream commit `5864a2f`:
|
|
20
|
+
|
|
21
|
+
- `sandbox-no-host-process-escape.test.ts`
|
|
22
|
+
- `sandbox-no-network-escape.test.ts`
|
|
23
|
+
- `sandbox-no-host-fs-escape.test.ts`
|
|
24
|
+
- `sandbox-no-host-env-leak.test.ts`
|
|
25
|
+
- `sandbox-timeout-cap.test.ts`
|
|
26
|
+
- `sandbox-memory-cap.test.ts`
|
|
27
|
+
- `sandbox-no-cross-pack-mutation.test.ts`
|
|
28
|
+
- `sandbox-capability-gate-respected.test.ts`
|
|
29
|
+
|
|
30
|
+
Test reporters now surface 8 todos instead of 8 vacuous passes. The advertisement-shape probes (in `sandbox-no-host-fs-escape`, `sandbox-memory-cap`, `sandbox-timeout-cap`) still run real discovery-doc assertions when capabilities are advertised. Behavioral assertions light up when a sandbox-executing reference host wires the seam.
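To see why the conversion changes what a reporter shows, here is a toy model of a test runner. It is only a sketch under stated assumptions: `Leg`, `report`, and `sandboxSupported` are hypothetical names invented for this illustration and are not part of vitest or of the conformance suite.

```typescript
// Toy model of a test reporter; all names here are hypothetical.
type Outcome = 'pass' | 'fail' | 'todo';

interface Leg {
  name: string;
  body?: () => void; // an absent body models `it.todo(name)`
}

function report(leg: Leg): Outcome {
  if (leg.body === undefined) return 'todo';
  try {
    leg.body();
    return 'pass'; // a body that returns without throwing counts as a pass
  } catch {
    return 'fail';
  }
}

// Assumption for the example: no sandbox-executing host is wired up.
const sandboxSupported: boolean = false;

const name = 'a misbehaving pack reading process.env does NOT see host env vars unless explicitly allowed';

// v1.4.0 shape: soft-skip, then a tautology. Neither path can fail.
const vacuousLeg: Leg = {
  name,
  body: () => {
    if (!sandboxSupported) return;
    // stands in for `expect(true).toBe(true)`
  },
};

// v1.5.0 shape: declared but not implemented.
const todoLeg: Leg = { name };

console.log(report(vacuousLeg)); // pass
console.log(report(todoLeg)); // todo
```

In this model the v1.4.0 shape reports `pass` whether or not a sandbox exists, which is the vacuous result the honesty pass removes; the v1.5.0 shape reports `todo` until a real body is written.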
|
|
31
|
+
|
|
32
|
+
### Changed — `schemas/capabilities.schema.json` (vendored): adds `multiAgent.executionModel.confidenceEscalationInterruptKind`
|
|
33
|
+
|
|
34
|
+
Per RFC 0044 §A. The optional field accepts the canonical literal `"clarification"` / `"approval"` OR a vendor extension matching `^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$` per `host-extensions.md` §"Canonical prefixes". Required for the routing logic above; absent advertisement falls back to the canonical-status check.
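The accepted value space can be checked with a small classifier. This is a sketch only: `classifyInterruptKind` and the `KindClass` type are hypothetical names for this example; the two canonical literals and the pattern come from the schema addition described here.

```typescript
// Hypothetical classifier; names are assumptions for illustration.
type KindClass = 'canonical' | 'vendor' | 'invalid';

// Vendor-extension pattern as given in the schema addition.
const VENDOR_KIND_PATTERN = /^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$/;

function classifyInterruptKind(value: unknown): KindClass {
  if (value === 'clarification' || value === 'approval') return 'canonical';
  if (typeof value === 'string' && VENDOR_KIND_PATTERN.test(value)) return 'vendor';
  return 'invalid';
}

console.log(classifyInterruptKind('approval')); // canonical
console.log(classifyInterruptKind('x-host-myndhyve-low-confidence')); // vendor
console.log(classifyInterruptKind('x-host-acme')); // invalid: no kind segment after the host segment
console.log(classifyInterruptKind('X-Host-Acme-Review')); // invalid: uppercase is not allowed
console.log(classifyInterruptKind('low-confidence')); // invalid: missing the x-host- prefix
```

Note that the pattern requires both a host segment and a kind segment, each starting with a lowercase letter, so a bare `x-host-acme` does not qualify.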
|
|
35
|
+
|
|
36
|
+
### No new scenario files
|
|
37
|
+
|
|
38
|
+
Scenario file count unchanged at 205 (the v1.4.0 baseline). All changes are behavior modifications to existing files.
|
|
39
|
+
|
|
40
|
+
### Known limits — unchanged from v1.4.0
|
|
41
|
+
|
|
42
|
+
The 6 `it.todo` behavioral assertions across RFC 0034 OTel-seam-gated, RFC 0040 traceparent-propagation, RFC 0041 refusal-divergence + observable-sequence scenarios remain. The 8 sandbox `it.todo` assertions are new in v1.5.0 (replacing the v1.4.0 vacuous-pass shapes).
|
|
43
|
+
|
|
3
44
|
## [1.4.0] — 2026-05-22
|
|
4
45
|
|
|
5
46
|
Minor bump per `PUBLISHING.md` §"Versioning alignment" — bundles 45 new conformance scenarios + 23 new fixtures landing since the 1.3.0 publish (2026-05-19). Unblocks non-steward host adoption of RFCs 0027 + 0028 + 0029 + 0030 + 0031 + 0032 + 0033 + 0034 + 0035 + 0036 + 0037 + 0039 + 0040 + 0041 against a single suite version.
|
package/coverage.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# OpenWOP Conformance Coverage Map
|
|
2
2
|
|
|
3
|
-
> **Status: Living document. Updated 2026-05-
|
|
3
|
+
> **Status: Living document. Updated 2026-05-22.** This map connects the current scenario files to the protocol surfaces they protect and records the remaining gaps from the protocol deep dive. Scenario names are source-of-truth file names under `conformance/src/scenarios/`.
|
|
4
4
|
|
|
5
5
|
> **Shape grade vs behavior grade.** Some optional-profile scenarios validate **capability shape** (the host's discovery advertisement is well-formed) without yet exercising **behavior** (the host actually implements the profile end-to-end). The "Current grade" column reflects shape; see §"Capability-gated scenarios: shape vs behavior" below for the dual-grade view and the `OPENWOP_REQUIRE_BEHAVIOR=true` strict-mode runner flag.
|
|
6
6
|
|
|
@@ -124,12 +124,12 @@ Every OpenAPI operation should have:
|
|
|
124
124
|
| `getArtifact` | Indirect through approval payload fixtures | `route-coverage.test.ts` covers unknown artifact `404`/`403` envelope; `artifact-auth.test.ts` (CF-4 close-out 2026-05-15; SQLite host 401-before-404 stub landed 2026-05-19, closes the info-leak surface for every HTTP method) covers `401` unauthenticated path | Negative paths covered (401 + 405 non-GET + 404/403) | Add positive artifact-read scenario once a reference host implements `getArtifact` end-to-end. |
|
|
125
125
|
| `registerWebhook` | Webhook spec exists | `route-coverage.test.ts` covers invalid URL validation envelope | Add positive registration with a test receiver when harness support exists. |
|
|
126
126
|
| `unregisterWebhook` | Webhook spec exists | `route-coverage.test.ts` covers unknown subscription behavior | Add full register-then-unregister roundtrip with a test receiver. |
|
|
127
|
-
| `listPromptTemplates` | `prompt-template-shape.test.ts`
|
|
128
|
-
| `createPromptTemplate` |
|
|
129
|
-
| `getPromptTemplate` |
|
|
130
|
-
| `updatePromptTemplate` |
|
|
131
|
-
| `deletePromptTemplate` |
|
|
132
|
-
| `renderPromptTemplate` | `prompt-composed-secret-redaction.test.ts` + `prompt-composed-trust-marker.test.ts` exercise the compose pipeline via the `/v1/host/sample/prompt/compose`
|
|
127
|
+
| `listPromptTemplates` | `prompt-template-shape.test.ts` + `prompt-list-and-fetch.test.ts` cover schema shape + advertisement contract + list/get contract for `capabilities.prompts.*` against the reference workflow-engine (RFC 0028 `Active` — endpoints live under `apps/workflow-engine/backend/typescript/src/routes/prompts.ts`) | Behavioral list + advertisement-shape covered | Add cross-host list-with-filter parity scenario when a second host advertises `endpointsSupported: true`. |
|
|
128
|
+
| `createPromptTemplate` | `prompt-mutable-lifecycle.test.ts` covers CRUD lifecycle against the reference workflow-engine (gated on `mutableLibrary: true`); user-source POST succeeds, pack + host-built-in templates return 403 | Positive create + readonly-source 403 path covered | Add explicit `409` duplicate-id scenario + auth/scope matrix scenarios. |
|
|
129
|
+
| `getPromptTemplate` | `prompt-list-and-fetch.test.ts` covers positive fetch + ambiguous-libraryId + ETag honoring when host advertises it | Positive fetch + 404 + ETag covered | Good — minor gap is the `400 ambiguous_template_id` cross-library disambiguation matrix. |
|
|
130
|
+
| `updatePromptTemplate` | `prompt-mutable-lifecycle.test.ts` covers positive update + non-monotonic-version conflict + pack-sourced-readonly 403 against the reference workflow-engine | Positive update + 403 readonly-source + 409 conflict covered | Add `501` not-mutable-library negative for hosts that advertise `mutableLibrary: false`. |
|
|
131
|
+
| `deletePromptTemplate` | `prompt-mutable-lifecycle.test.ts` covers positive delete + pack-sourced-readonly 403 against the reference workflow-engine | Positive delete + 403 readonly-source covered | Add `501` not-mutable-library negative + `404` unknown-template scenarios. |
|
|
132
|
+
| `renderPromptTemplate` | `prompt-render-deterministic.test.ts` exercises `POST /v1/prompts:render` end-to-end against the reference workflow-engine; deterministic-hash invariant verified across `:render` + `prompt.composed` event paths. `prompt-composed-secret-redaction.test.ts` + `prompt-composed-trust-marker.test.ts` exercise the shared compose pipeline via the `/v1/host/sample/prompt/compose` seam | Deterministic render + composition redaction + trust-marker invariants covered | Add `400 prompt_variable_unresolved` matrix for missing variables across all four PromptKinds. |
|
|
133
133
|
|
|
134
134
|
---
|
|
135
135
|
|
package/package.json
CHANGED
package/schemas/capabilities.schema.json
CHANGED
|
@@ -415,6 +415,15 @@
|
|
|
415
415
|
"maximum": 1.0,
|
|
416
416
|
"description": "RFC 0039 §A. Operator-declared confidence floor at or above the spec floor of 0.5; when an OrchestratorDecision carries `confidence` below this floor, the host MUST escalate via a `clarify` or `escalate` interrupt instead of executing the decision. Absent: the spec floor 0.5 applies. Values < 0.5 are non-conformant; values > 1.0 are nonsense. Applies only when `version >= 2`."
|
|
417
417
|
},
|
|
418
|
+
"confidenceEscalationInterruptKind": {
|
|
419
|
+
"type": "string",
|
|
420
|
+
"anyOf": [
|
|
421
|
+
{ "const": "clarification" },
|
|
422
|
+
{ "const": "approval" },
|
|
423
|
+
{ "pattern": "^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$" }
|
|
424
|
+
],
|
|
425
|
+
"description": "RFC 0044 — the literal `interrupt.kind` the host emits when escalating a below-floor confidence decision per RFC 0039 §A. `clarification` and `approval` are the canonical values matching the clarify-OR-approval choice in RFC 0039 §A; vendor-extension kinds use the canonical host-extension namespace `^x-host-<host>-<kind>$` per `spec/v1/host-extensions.md` §\"Canonical prefixes\". When advertised, hosts MUST emit an interrupt of the advertised kind on every confidence-escalation event AND the host's downstream `interrupt.md` mapping determines the `waiting-*` terminal status. Absent: conformance assumes the host uses one of the two canonical kinds (the relaxed assertion accepts either). Hosts using vendor kinds MUST also publish a non-normative kind-mapping document per RFC 0044 §C."
|
|
426
|
+
},
|
|
418
427
|
"crossChildMemoryConcurrency": {
|
|
419
428
|
"type": "string",
|
|
420
429
|
"enum": ["strict", "advisory"],
|
package/src/scenarios/multi-agent-confidence-escalation.test.ts
CHANGED
|
@@ -63,6 +63,7 @@ interface DiscoveryDoc {
|
|
|
63
63
|
supported?: unknown;
|
|
64
64
|
version?: unknown;
|
|
65
65
|
confidenceEscalationFloor?: unknown;
|
|
66
|
+
confidenceEscalationInterruptKind?: unknown;
|
|
66
67
|
};
|
|
67
68
|
};
|
|
68
69
|
};
|
|
@@ -111,15 +112,51 @@ describe.skipIf(BEHAVIORAL_SKIP)('multi-agent-confidence-escalation: behavioral
|
|
|
111
112
|
const terminal = await pollUntilTerminal(runId);
|
|
112
113
|
// Phase 2 escalation suspends the parent — NOT a terminal `completed`.
|
|
113
114
|
// The conformance pollUntilTerminal returns when the run reaches any
|
|
114
|
-
// settled status
|
|
115
|
-
//
|
|
116
|
-
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
|
|
122
|
-
|
|
115
|
+
// settled status. RFC 0039 §A gives hosts a choice: clarify-kind
|
|
116
|
+
// escalation (→ waiting-clarification) OR escalate-kind approval
|
|
117
|
+
// (→ waiting-approval).
|
|
118
|
+
//
|
|
119
|
+
// RFC 0044 routing: when the host advertises
|
|
120
|
+
// `capabilities.multiAgent.executionModel.confidenceEscalationInterruptKind`
|
|
121
|
+
// the scenario derives the expected terminal-status from that advertisement
|
|
122
|
+
// (canonical kinds map 1:1 to waiting-clarification / waiting-approval per
|
|
123
|
+
// `interrupt.md`; vendor `x-host-<host>-<kind>` kinds accept any waiting-*
|
|
124
|
+
// status — the host's own interrupt.md mapping determines the suffix).
|
|
125
|
+
// When the host does NOT advertise the field, fall back to the canonical
|
|
126
|
+
// either-status check.
|
|
127
|
+
const advertisedKind = d?.capabilities?.multiAgent?.executionModel?.confidenceEscalationInterruptKind;
|
|
128
|
+
const isVendorKind = typeof advertisedKind === 'string' && /^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$/.test(advertisedKind);
|
|
129
|
+
const isCanonicalKind = advertisedKind === 'clarification' || advertisedKind === 'approval';
|
|
130
|
+
|
|
131
|
+
if (isCanonicalKind) {
|
|
132
|
+
const expectedStatus = advertisedKind === 'clarification' ? 'waiting-clarification' : 'waiting-approval';
|
|
133
|
+
expect(
|
|
134
|
+
terminal.status,
|
|
135
|
+
driver.describe(
|
|
136
|
+
'RFCS/0044-confidence-escalation-interrupt-kind-advertisement.md §B',
|
|
137
|
+
`host advertising confidenceEscalationInterruptKind: "${advertisedKind}" MUST surface the run as "${expectedStatus}" per spec/v1/interrupt.md §"Interrupt kinds"`,
|
|
138
|
+
),
|
|
139
|
+
).toBe(expectedStatus);
|
|
140
|
+
} else if (isVendorKind) {
|
|
141
|
+
const status = terminal.status as string;
|
|
142
|
+
expect(
|
|
143
|
+
typeof status === 'string' && status.startsWith('waiting-'),
|
|
144
|
+
driver.describe(
|
|
145
|
+
'RFCS/0044-confidence-escalation-interrupt-kind-advertisement.md §B',
|
|
146
|
+
`host advertising vendor confidenceEscalationInterruptKind ("${advertisedKind}") MUST surface the run as a waiting-* status; the suffix is determined by the host's interrupt.md mapping (see the host's vendor-extensions doc per RFC 0044 §C)`,
|
|
147
|
+
),
|
|
148
|
+
).toBe(true);
|
|
149
|
+
} else {
|
|
150
|
+
// No advertisement — fall back to the canonical either-status check.
|
|
151
|
+
const acceptedStatuses = ['waiting-clarification', 'waiting-approval'];
|
|
152
|
+
expect(
|
|
153
|
+
acceptedStatuses.includes(terminal.status as string),
|
|
154
|
+
driver.describe(
|
|
155
|
+
'RFCS/0039-multi-agent-confidence-and-memory-lifecycle.md §A + spec/v1/interrupt.md',
|
|
156
|
+
'a host below the confidence floor MUST surface the run as `waiting-clarification` (clarify-kind escalation) OR `waiting-approval` (escalate-kind escalation) per RFC 0039 §A; the low-confidence decision MUST NOT reach `completed` because no dispatch fired',
|
|
157
|
+
),
|
|
158
|
+
).toBe(true);
|
|
159
|
+
}
|
|
123
160
|
|
|
124
161
|
const eventsRes = await driver.get(`/v1/runs/${encodeURIComponent(runId)}/events`);
|
|
125
162
|
expect(eventsRes.status).toBe(200);
|
package/src/scenarios/sandbox-capability-gate-respected.test.ts
CHANGED
|
@@ -13,19 +13,15 @@
|
|
|
13
13
|
* @see SECURITY/invariants.yaml node-pack-sandbox-capability-gate-respected
|
|
14
14
|
*/
|
|
15
15
|
|
|
16
|
-
import { describe, it
|
|
17
|
-
import { driver } from '../lib/driver.js';
|
|
16
|
+
import { describe, it } from 'vitest';
|
|
18
17
|
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
|
|
18
|
+
// Behavioral assertion lands when the misbehaving-capability-gate typeId
|
|
19
|
+
// ships + a host advertises `capabilities.sandbox.supported: true`.
|
|
20
|
+
// Expected: error.code === 'sandbox_capability_denied';
|
|
21
|
+
// details.requestedCapability is set to the disallowed identifier.
|
|
22
|
+
// Surfaced as `todo` so test reporters track the gap rather than
|
|
23
|
+
// reporting a vacuous PASS.
|
|
22
24
|
|
|
23
|
-
describe
|
|
24
|
-
it('a misbehaving pack calling an undeclared host capability fails closed with sandbox_capability_denied'
|
|
25
|
-
if (!(await ok())) return;
|
|
26
|
-
// Behavioral assertion lands when the misbehaving-capability-gate typeId
|
|
27
|
-
// is available. Expected: error.code === 'sandbox_capability_denied';
|
|
28
|
-
// details.requestedCapability is set to the disallowed identifier.
|
|
29
|
-
expect(true).toBe(true);
|
|
30
|
-
});
|
|
25
|
+
describe('sandbox-capability-gate-respected: behavioral (RFC 0035 §B)', () => {
|
|
26
|
+
it.todo('a misbehaving pack calling an undeclared host capability fails closed with sandbox_capability_denied');
|
|
31
27
|
});
|
package/src/scenarios/sandbox-memory-cap.test.ts
CHANGED
|
@@ -50,12 +50,9 @@ describe.skipIf(HTTP_SKIP)('sandbox-memory-cap: capability shape + behavioral (R
|
|
|
50
50
|
).toBe(true);
|
|
51
51
|
});
|
|
52
52
|
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
// details.requestedBytes > memoryLimitBytes.
|
|
59
|
-
expect(true).toBe(true);
|
|
60
|
-
});
|
|
53
|
+
// Behavioral assertion lands when the misbehaving-memory-cap typeId is
|
|
54
|
+
// available. Expected: error.code === 'sandbox_memory_exceeded';
|
|
55
|
+
// details.requestedBytes > memoryLimitBytes. Surfaced as `todo` so
|
|
56
|
+
// test reporters track the gap rather than reporting a vacuous PASS.
|
|
57
|
+
it.todo('a misbehaving pack allocating beyond memoryLimitBytes fails with sandbox_memory_exceeded');
|
|
61
58
|
});
|
package/src/scenarios/sandbox-no-cross-pack-mutation.test.ts
CHANGED
|
@@ -17,19 +17,14 @@
|
|
|
17
17
|
* @see SECURITY/invariants.yaml node-pack-sandbox-no-cross-pack-mutation
|
|
18
18
|
*/
|
|
19
19
|
|
|
20
|
-
import { describe, it
|
|
21
|
-
import { driver } from '../lib/driver.js';
|
|
20
|
+
import { describe, it } from 'vitest';
|
|
22
21
|
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
|
|
22
|
+
// Behavioral assertion lands when the misbehaving-cross-pack-mutation
|
|
23
|
+
// typeIds ship + a host advertises `capabilities.sandbox.supported: true`.
|
|
24
|
+
// Expected: pack-b read returns the absent sentinel value; pack-a's
|
|
25
|
+
// mutation did not cross the isolation boundary. Surfaced as `todo` so
|
|
26
|
+
// test reporters track the gap rather than reporting a vacuous PASS.
|
|
26
27
|
|
|
27
|
-
describe
|
|
28
|
-
it('pack A writing a sentinel is NOT visible to pack B in the same host process'
|
|
29
|
-
if (!(await ok())) return;
|
|
30
|
-
// Behavioral assertion lands when the misbehaving-cross-pack-mutation
|
|
31
|
-
// typeIds are available. Expected: pack-b read returns the absent
|
|
32
|
-
// sentinel value; pack-a's mutation did not cross the isolation boundary.
|
|
33
|
-
expect(true).toBe(true);
|
|
34
|
-
});
|
|
28
|
+
describe('sandbox-no-cross-pack-mutation: behavioral (RFC 0035 §B)', () => {
|
|
29
|
+
it.todo('pack A writing a sentinel is NOT visible to pack B in the same host process');
|
|
35
30
|
});
|
package/src/scenarios/sandbox-no-host-env-leak.test.ts
CHANGED
|
@@ -12,27 +12,16 @@
|
|
|
12
12
|
* @see SECURITY/invariants.yaml node-pack-sandbox-no-host-env-leak
|
|
13
13
|
*/
|
|
14
14
|
|
|
15
|
-
import { describe, it
|
|
16
|
-
import { driver } from '../lib/driver.js';
|
|
15
|
+
import { describe, it } from 'vitest';
|
|
17
16
|
|
|
18
|
-
|
|
17
|
+
// Behavioral assertion lands when a sandbox-executing host advertises
|
|
18
|
+
// `capabilities.sandbox.supported: true` AND ships a misbehaving-env-leak
|
|
19
|
+
// typeId. The assertion sets a canary env var on the host process, runs
|
|
20
|
+
// the misbehaving pack that reads `process.env`, and asserts the pack's
|
|
21
|
+
// view of env does NOT contain the canary (unless the host has forwarded
|
|
22
|
+
// it via an `allowedHostCalls` entry). Surfaced as `todo` so test
|
|
23
|
+
// reporters track the gap rather than reporting a vacuous PASS.
|
|
19
24
|
|
|
20
|
-
|
|
21
|
-
|
|
22
|
-
async function sandboxSupported(): Promise<boolean> {
|
|
23
|
-
try {
|
|
24
|
-
const res = await driver.get('/.well-known/openwop');
|
|
25
|
-
if (res.status !== 200) return false;
|
|
26
|
-
return (res.json as DiscoveryDoc).capabilities?.sandbox?.supported === true;
|
|
27
|
-
} catch { return false; }
|
|
28
|
-
}
|
|
29
|
-
|
|
30
|
-
describe.skipIf(HTTP_SKIP)('sandbox-no-host-env-leak: behavioral (RFC 0035 §B)', () => {
|
|
31
|
-
it('a misbehaving pack reading process.env does NOT see host env vars unless explicitly allowed', async () => {
|
|
32
|
-
if (!(await sandboxSupported())) return; // soft-skip — no sandbox-executing host yet
|
|
33
|
-
// Behavioral assertion lands when the misbehaving-env-leak typeId is available.
|
|
34
|
-
// Expected: invocation returns empty/filtered env mapping; the host's own
|
|
35
|
-
// env (e.g., DATABASE_URL, OPENAI_API_KEY) is NOT visible to the pack.
|
|
36
|
-
expect(true).toBe(true);
|
|
37
|
-
});
|
|
25
|
+
describe('sandbox-no-host-env-leak: behavioral (RFC 0035 §B)', () => {
|
|
26
|
+
it.todo('a misbehaving pack reading process.env does NOT see host env vars unless explicitly allowed');
|
|
38
27
|
});
|
package/src/scenarios/sandbox-no-host-fs-escape.test.ts
CHANGED
|
@@ -73,19 +73,16 @@ describe.skipIf(HTTP_SKIP)('sandbox-no-host-fs-escape: capability shape (RFC 003
|
|
|
73
73
|
});
|
|
74
74
|
});
|
|
75
75
|
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
|
|
79
|
-
|
|
80
|
-
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
// → response.error.details.escapeKind === 'host-fs-escape'
|
|
89
|
-
expect(true).toBe(true);
|
|
90
|
-
});
|
|
76
|
+
// Behavioral assertion lands when the vendor.openwop.misbehaving-sandbox
|
|
77
|
+
// synthetic pack ships + a host advertises capabilities.sandbox.supported.
|
|
78
|
+
// Expected wire shape:
|
|
79
|
+
// POST /v1/host/sample/test/sandbox-load { packId: 'vendor.openwop.misbehaving-sandbox' }
|
|
80
|
+
// → 200 OK
|
|
81
|
+
// POST /v1/host/sample/test/sandbox-invoke { typeId: 'misbehave.fs-escape-read', args: { path: '/etc/passwd' } }
|
|
82
|
+
// → response.error.code === 'sandbox_escape_attempt'
|
|
83
|
+
// → response.error.details.escapeKind === 'host-fs-escape'
|
|
84
|
+
// Surfaced as `todo` so test reporters track the gap rather than reporting
|
|
85
|
+
// a vacuous PASS.
|
|
86
|
+
describe('sandbox-no-host-fs-escape: behavioral (RFC 0035 §B node-pack-sandbox-no-host-fs-escape)', () => {
|
|
87
|
+
it.todo('a misbehaving pack that reads outside the sandbox root fails closed with sandbox_escape_attempt + escapeKind: "host-fs-escape"');
|
|
91
88
|
});
|
package/src/scenarios/sandbox-no-host-process-escape.test.ts
CHANGED
|
@@ -12,19 +12,20 @@
|
|
|
12
12
|
* @see SECURITY/invariants.yaml node-pack-sandbox-no-host-process-escape
|
|
13
13
|
*/
|
|
14
14
|
|
|
15
|
-
import { describe, it
|
|
16
|
-
import { driver } from '../lib/driver.js';
|
|
15
|
+
import { describe, it } from 'vitest';
|
|
17
16
|
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
|
|
17
|
+
// Behavioral assertion lands when a sandbox-executing host advertises
|
|
18
|
+
// `capabilities.sandbox.supported: true` AND ships a misbehaving-process-escape
|
|
19
|
+
// typeId (e.g., vendor.openwop.misbehaving-process). The assertion drives:
|
|
20
|
+
// 1. POST /v1/runs { workflowId: 'conformance-sandbox-process-escape' }
|
|
21
|
+
// where the workflow includes a node loading the misbehaving typeId.
|
|
22
|
+
// 2. The node attempts spawn/fork/exec inside the sandbox.
|
|
23
|
+
// 3. Assert the run terminates with error.code === 'sandbox_escape_attempt'
|
|
24
|
+
// AND error.details.escapeKind === 'host-process-escape'.
|
|
25
|
+
// 4. Assert no host process was actually spawned (host-side probe).
|
|
26
|
+
// Surfaced as `todo` so test reporters track the gap rather than reporting
|
|
27
|
+
// a vacuous PASS.
|
|
21
28
|
|
|
22
|
-
describe
|
|
23
|
-
it('a misbehaving pack calling spawn/fork/exec fails closed with sandbox_escape_attempt
|
|
24
|
-
if (!(await ok())) return; // soft-skip — no sandbox-executing host yet
|
|
25
|
-
// Behavioral assertion lands when the misbehaving-process-escape typeId
|
|
26
|
-
// is available. Expected: error.code === 'sandbox_escape_attempt';
|
|
27
|
-
// details.escapeKind === 'host-process-escape'.
|
|
28
|
-
expect(true).toBe(true);
|
|
29
|
-
});
|
|
29
|
+
describe('sandbox-no-host-process-escape: behavioral (RFC 0035 §B)', () => {
|
|
30
|
+
it.todo('a misbehaving pack calling spawn/fork/exec fails closed with sandbox_escape_attempt + escapeKind: "host-process-escape"');
|
|
30
31
|
});
|
package/src/scenarios/sandbox-no-network-escape.test.ts
CHANGED
|
@@ -13,37 +13,16 @@
|
|
|
13
13
|
* @see SECURITY/invariants.yaml node-pack-sandbox-no-network-escape
|
|
14
14
|
*/
|
|
15
15
|
|
|
16
|
-
import { describe, it
|
|
17
|
-
import { driver } from '../lib/driver.js';
|
|
16
|
+
import { describe, it } from 'vitest';
|
|
18
17
|
|
|
19
|
-
|
|
18
|
+
// Behavioral assertion lands when a sandbox-executing host advertises
|
|
19
|
+
// `capabilities.sandbox.supported: true` (with `host.fetch` NOT in
|
|
20
|
+
// `allowedHostCalls`) AND ships a misbehaving-network-escape typeId.
|
|
21
|
+
// The assertion drives the pack to fetch() inside the sandbox and asserts
|
|
22
|
+
// error.code === 'sandbox_capability_denied' with
|
|
23
|
+
// details.requestedCapability === 'host.fetch'. Surfaced as `todo` so
|
|
24
|
+
// test reporters track the gap rather than reporting a vacuous PASS.
|
|
20
25
|
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
}
|
|
24
|
-
|
|
25
|
-
async function readSandbox(): Promise<{ supported: boolean; allowedHostCalls: string[] } | null> {
|
|
26
|
-
try {
|
|
27
|
-
const res = await driver.get('/.well-known/openwop');
|
|
28
|
-
if (res.status !== 200) return null;
|
|
29
|
-
const sb = (res.json as DiscoveryDoc).capabilities?.sandbox;
|
|
30
|
-
if (!sb || sb.supported !== true) return null;
|
|
31
|
-
return {
|
|
32
|
-
supported: true,
|
|
33
|
-
allowedHostCalls: Array.isArray(sb.allowedHostCalls) ? sb.allowedHostCalls.filter((s): s is string => typeof s === 'string') : [],
|
|
34
|
-
};
|
|
35
|
-
} catch { return null; }
|
|
36
|
-
}
|
|
37
|
-
|
|
38
|
-
describe.skipIf(HTTP_SKIP)('sandbox-no-network-escape: behavioral (RFC 0035 §B)', () => {
|
|
39
|
-
it('a misbehaving pack that fetches without host.fetch in allowedHostCalls fails closed with sandbox_capability_denied', async () => {
|
|
40
|
-
const sb = await readSandbox();
|
|
41
|
-
if (!sb) return; // soft-skip — no sandbox-executing host yet
|
|
42
|
-
if (sb.allowedHostCalls.includes('host.fetch')) return; // host permits fetch — the negative test doesn't apply
|
|
43
|
-
|
|
44
|
-
// Behavioral assertion lands when the misbehaving-network-escape typeId
|
|
45
|
-
// is available. Expected error code: sandbox_capability_denied with
|
|
46
|
-
// details.requestedCapability: 'host.fetch'.
|
|
47
|
-
expect(true).toBe(true);
|
|
48
|
-
});
|
|
26
|
+
describe('sandbox-no-network-escape: behavioral (RFC 0035 §B)', () => {
|
|
27
|
+
it.todo('a misbehaving pack fetching without host.fetch in allowedHostCalls fails closed with sandbox_capability_denied');
|
|
49
28
|
});
|
package/src/scenarios/sandbox-timeout-cap.test.ts
CHANGED
|
@@ -50,12 +50,9 @@ describe.skipIf(HTTP_SKIP)('sandbox-timeout-cap: capability shape + behavioral (
|
|
|
50
50
|
).toBe(true);
|
|
51
51
|
});
|
|
52
52
|
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
// details.elapsedMs > wallClockLimitMs.
|
|
59
|
-
expect(true).toBe(true);
|
|
60
|
-
});
|
|
53
|
+
// Behavioral assertion lands when the misbehaving-timeout-cap typeId is
|
|
54
|
+
// available. Expected: error.code === 'sandbox_timeout';
|
|
55
|
+
// details.elapsedMs > wallClockLimitMs. Surfaced as `todo` so test
|
|
56
|
+
// reporters track the gap rather than reporting a vacuous PASS.
|
|
57
|
+
it.todo('a misbehaving pack exceeding wallClockLimitMs fails with sandbox_timeout');
|
|
61
58
|
});
|