@checkstack/satellite-backend 0.5.1 → 0.6.1

package/CHANGELOG.md CHANGED
@@ -1,5 +1,104 @@
  # @checkstack/satellite-backend
 
+ ## 0.6.1
+
+ ### Patch Changes
+
+ - Updated dependencies [13373ce]
+   - @checkstack/common@0.14.0
+   - @checkstack/backend-api@0.21.1
+   - @checkstack/queue-api@0.3.10
+   - @checkstack/automation-backend@0.5.1
+   - @checkstack/automation-common@0.4.1
+   - @checkstack/gitops-backend@0.5.1
+   - @checkstack/gitops-common@0.6.1
+   - @checkstack/healthcheck-backend@1.6.1
+   - @checkstack/healthcheck-common@1.5.1
+   - @checkstack/satellite-common@0.8.1
+   - @checkstack/script-packages-backend@0.3.1
+   - @checkstack/script-packages-common@0.3.1
+   - @checkstack/secrets-backend@0.2.1
+   - @checkstack/secrets-common@0.2.1
+   - @checkstack/signal-common@0.2.7
+
+ ## 0.6.0
+
+ ### Minor Changes
+
+ - 9dcc848: Layered OS-level script sandbox, secure and fail-closed by default (epic #247).
+
+   Script and shell health checks and the `run_shell` / `run_script` automation actions now run inside a layered OS-level sandbox by default. The sandbox lives in `core/backend-api/src/script-sandbox/` (the single source of truth) and is enforced inside the shared runners, so it applies wherever a job runs.
+
+   Layers:
+
+   - Resource caps (CPU / memory / PID / FD / file-size, via `prlimit` on capable Linux; ESM JS-heap cap via `--max-old-space-size`; portable wall-clock timeout) and an OOM-safe streaming output cap.
+   - Privilege drop via a NON-ROOT supervisor model: the shipped images run the supervisor as non-root uid `65532`, so every sandboxed script inherits non-root and can never be host-root; filesystem + network confinement is delivered by ROOTLESS `bwrap`/`nsjail` via unprivileged user namespaces. `enforced.privilege` is truthful (true only when the child cannot run as host-root). Runners no longer pass `uid`/`gid` to `Bun.spawn` (a silent no-op and a forward-compat hazard).
+   - Filesystem isolation (`scratch-only` / `scratch-plus-ro`) confining the child to its per-run scratch dir over a read-only base; the interpreter path is RO-bound so the runtime execs, and `TMPDIR` is pinned to the in-namespace tmpfs.
+   - Network egress control: `deny` (routeless loopback-only netns), `allowlist` (real plumbed egress via macvlan OR rootless slirp4netns + an in-kernel nftables filter), and an always-on metadata / link-local block (`169.254.0.0/16`, `fe80::/10`, `fc00::/7`). No-blackhole invariant: `enforced.network` is never true when egress is actually severed or unfiltered; unpluggable egress degrades to surfaced host net.
+   - Per-run fork-bomb containment via `RLIMIT_NPROC` inside the fresh per-run user+PID namespace; a centralized forbidden-env denylist (`LD_PRELOAD`, `LD_LIBRARY_PATH`, `DYLD_*`, `NODE_OPTIONS`, `BUN_*`, caller `PATH` overrides).
+   - A validated tuned seccomp profile (`deploy/seccomp/checkstack-userns.json`) and a live `clone(CLONE_NEWUSER|CLONE_NEWNET)` capability probe (not the static sysctl), shipped by default in both Dockerfiles, `docker-compose.yml`, and `deploy/k8s/checkstack-sandbox.yaml`.
+
+   Global policy and operator surface:
+
+   - The global sandbox policy lives in ONE durable row owned by `script-packages` (its `ConfigService` row in shared `plugin_configs`). A single process-wide provider serves every runner; the two script plugins no longer register competing providers. A dedicated admin-only `script-sandbox.manage` permission gates both reading and writing the policy. New `getSandboxPolicy` / `setSandboxPolicy` endpoints and a Settings -> Script Sandbox admin UI (`enabled`, `onUnavailable`, network/filesystem/privilege modes, allow list, metadata block, resource caps). The startup capability/readiness log is emitted in-process by `script-packages-backend` (no fragile init-order RPC self-loop), and on a host that cannot enforce a layer a one-time startup warning explains the two local-dev paths (Docker, or set the global policy to `degrade`).
+   - Satellite relay: the WS protocol carries the resolved policy in the `authenticated` message and a `sandbox_policy` push-on-change; a satellite caches the last relayed policy and resolves every run through it.
+
+   BREAKING CHANGES (platform in BETA, shipped as minor):
+
+   - Scripts run sandboxed by default. The shipped global default is FAIL-CLOSED (`onUnavailable: "fail"`): when a requested layer cannot be enforced the run is REFUSED (clean `exitCode: -1`, never an unsandboxed spawn) rather than silently degrading. Deployments on hosts that cannot enforce a layer (no bubblewrap, user namespaces blocked, no `/proc` unmask) must run the official images with the documented runtime flags (the bundled seccomp profile + `systempaths=unconfined`, or k8s `procMount: Unmasked`), or set the global policy to `degrade`. On macOS / restricted containers the strong layers degrade to the portable subset and are surfaced per run.
+   - Default network posture is deny-egress (`allowlist` with an empty allow list, which resolves to the routeless `deny` path). Scripts calling external endpoints fail until those destinations are allowlisted in the global default. The always-on metadata / link-local block applies even under looser modes.
+   - The per-action / per-check `sandbox` config override and the transport `ScriptRequest.sandbox` field are removed; policy is global-only, so an automation/check author can no longer weaken the sandbox on their own item. Stored configs carrying a stray `sandbox` key are tolerated (stripped on parse).
+   - The shared runners' `run()` no longer accepts a `sandbox` option; callers rely on the global policy provider.
+   - A satellite fails closed (most restrictive profile) until it receives the first relayed policy; a relay-read failure or an older core keeps it fail-closed. A relay failure can never loosen a satellite's sandbox.
+
+   State and scale: the global policy is a single durable Postgres row read identically on every pod. Capability detection is per-process, deterministic from the host kernel, and surfaced per run via the `EffectiveSandbox` report (a Linux pod and a macOS satellite may legitimately differ). `CHECKSTACK_SANDBOX_UID/GID` and macvlan addressing are genuinely per-host infrastructure, surfaced per run, not the queryable policy. The satellite's policy cache is satellite-local transport state. No new pod-local current-state.
+
+   This is a beta minor.
+
+ - 9dcc848: Align workspace dependency versions and migrate React Router to v7.
+
+   BREAKING CHANGES (React Router v7): All frontend packages now depend on `react-router-dom@^7.16.0`. Previously the workspace declared four divergent ranges (`^6.20.0`, `^6.22.0`, `^7.1.1`, `^7.14.2`), which resolved both `react-router@6` and `react-router@7` into a single bundle. Everything is now unified on v7. The public imports the app uses (`BrowserRouter`, `Routes`, `Route`, `Link`, `NavLink`, `MemoryRouter`, `useNavigate`, `useParams`, `useSearchParams`, `useLocation`) are unchanged between v6 and v7, so no source rewrites were required - but any out-of-tree plugin still on react-router v6 should upgrade to v7 (see the React Router v6 -> v7 upgrade guide) to share the host's single router instance via the import map.
+
+   Other unified ranges (no API change): `react` -> `^18.3.1`, the `@orpc/*` family (`contract`, `server`, `client`, `tanstack-query`, `openapi`, `zod`) -> `^1.14.4`, and `better-auth` -> `^1.6.13`.
+
+   Removed the pre-rename `@orpc/react-query` leftover from `@checkstack/frontend-api`; its `createRouterUtils` / `RouterUtils` / `ProcedureUtils` now come from `@orpc/tanstack-query` (the package already in use).
+
+   Stale in-range runtime deps pulled up to current published versions: `hono` `^4.12.23`, `@tanstack/react-query` (+devtools) `^5.100.14`, `date-fns` `^4.4.0`, `jose` `^6.2.3`, `tar` `^7.5.16`, `semver` `^7.8.1`, `@xyflow/react` `^12.11.0`.
+
+ ### Patch Changes
+
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+   - @checkstack/backend-api@0.21.0
+   - @checkstack/healthcheck-backend@1.6.0
+   - @checkstack/healthcheck-common@1.5.0
+   - @checkstack/automation-backend@0.5.0
+   - @checkstack/automation-common@0.4.0
+   - @checkstack/common@0.13.0
+   - @checkstack/script-packages-common@0.3.0
+   - @checkstack/script-packages-backend@0.3.0
+   - @checkstack/satellite-common@0.8.0
+   - @checkstack/gitops-backend@0.5.0
+   - @checkstack/gitops-common@0.6.0
+   - @checkstack/secrets-backend@0.2.0
+   - @checkstack/secrets-common@0.2.0
+   - @checkstack/queue-api@0.3.9
+   - @checkstack/signal-common@0.2.6
+
  ## 0.5.1
 
  ### Patch Changes
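The 0.6.0 changelog entry above describes two defaults that interact. First, `onUnavailable: "fail"` refuses a run (`exitCode: -1`) when a requested layer cannot be enforced. Second, an `allowlist` with an empty allow list resolves to the routeless `deny` path.

The TypeScript below is a minimal sketch of that decision under stated assumptions. It is not code from `core/backend-api/src/script-sandbox/`. The names `SketchPolicy`, `LayerCapabilities`, `decideRun` and `resolveNetworkMode` are hypothetical. Only the policy field names, the `fail` / `degrade` values and the `-1` exit code come from the changelog.

```typescript
// Hypothetical sketch of the fail-closed run decision described in the changelog.
// Field names follow the changelog; every type and function name is illustrative.
type NetworkMode = "deny" | "allowlist"; // looser modes omitted from the sketch

interface SketchPolicy {
  onUnavailable: "fail" | "degrade";
  network: { mode: NetworkMode; allow: string[] };
}

// Whether this host can enforce each layer (assumed shape, e.g. from a startup probe).
interface LayerCapabilities {
  filesystem: boolean;
  network: boolean;
  privilege: boolean;
}

type Layer = keyof LayerCapabilities;

type RunDecision =
  | { kind: "run"; degraded: Layer[] }
  | { kind: "refuse"; exitCode: -1; missing: Layer[] };

// An allowlist with no entries permits no egress, so it collapses to `deny`.
function resolveNetworkMode(policy: SketchPolicy): NetworkMode {
  if (policy.network.mode === "allowlist" && policy.network.allow.length === 0) {
    return "deny";
  }
  return policy.network.mode;
}

function decideRun(
  policy: SketchPolicy,
  requested: Layer[],
  caps: LayerCapabilities,
): RunDecision {
  const missing = requested.filter((layer) => !caps[layer]);
  if (missing.length === 0) return { kind: "run", degraded: [] };
  // `fail` (the shipped default): refuse the run instead of spawning unsandboxed.
  if (policy.onUnavailable === "fail") {
    return { kind: "refuse", exitCode: -1, missing };
  }
  // `degrade`: run with the layers that remain and surface the dropped ones.
  return { kind: "run", degraded: missing };
}

// Usage: the shipped defaults on a host without filesystem or network isolation.
const shippedDefaults: SketchPolicy = {
  onUnavailable: "fail",
  network: { mode: "allowlist", allow: [] },
};
const limitedHost: LayerCapabilities = { filesystem: false, network: false, privilege: true };

console.log(resolveNetworkMode(shippedDefaults)); // deny
console.log(decideRun(shippedDefaults, ["filesystem", "network"], limitedHost).kind); // refuse
```

The refusal is modeled as its own result variant rather than a flag on a normal run. That way a caller cannot spawn anyway by overlooking a field, which is one way to honor the changelog's "never an unsandboxed spawn" guarantee.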
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@checkstack/satellite-backend",
-   "version": "0.5.1",
+   "version": "0.6.1",
    "license": "Elastic-2.0",
    "type": "module",
    "main": "src/index.ts",
@@ -15,29 +15,29 @@
      "test": "bun test"
    },
    "dependencies": {
-     "@checkstack/backend-api": "0.18.0",
-     "@checkstack/automation-backend": "0.2.0",
-     "@checkstack/automation-common": "0.2.0",
-     "@checkstack/satellite-common": "0.6.0",
-     "@checkstack/healthcheck-common": "1.3.0",
-     "@checkstack/signal-common": "0.2.5",
-     "@checkstack/healthcheck-backend": "1.3.0",
-     "@checkstack/script-packages-common": "0.1.0",
-     "@checkstack/script-packages-backend": "0.1.0",
-     "@checkstack/secrets-common": "0.0.1",
-     "@checkstack/secrets-backend": "0.0.1",
-     "@checkstack/gitops-backend": "0.3.7",
-     "@checkstack/gitops-common": "0.4.2",
-     "@checkstack/common": "0.12.0",
-     "@checkstack/queue-api": "0.3.6",
+     "@checkstack/backend-api": "0.21.0",
+     "@checkstack/automation-backend": "0.5.0",
+     "@checkstack/automation-common": "0.4.0",
+     "@checkstack/satellite-common": "0.8.0",
+     "@checkstack/healthcheck-common": "1.5.0",
+     "@checkstack/signal-common": "0.2.6",
+     "@checkstack/healthcheck-backend": "1.6.0",
+     "@checkstack/script-packages-common": "0.3.0",
+     "@checkstack/script-packages-backend": "0.3.0",
+     "@checkstack/secrets-common": "0.2.0",
+     "@checkstack/secrets-backend": "0.2.0",
+     "@checkstack/gitops-backend": "0.5.0",
+     "@checkstack/gitops-common": "0.6.0",
+     "@checkstack/common": "0.13.0",
+     "@checkstack/queue-api": "0.3.9",
      "drizzle-orm": "^0.45.0",
      "zod": "^4.2.1",
-     "@orpc/server": "^1.13.2"
+     "@orpc/server": "^1.14.4"
    },
    "devDependencies": {
      "@checkstack/drizzle-helper": "0.0.5",
-     "@checkstack/scripts": "0.3.4",
-     "@checkstack/test-utils-backend": "0.1.31",
+     "@checkstack/scripts": "0.4.0",
+     "@checkstack/test-utils-backend": "0.1.34",
      "@checkstack/tsconfig": "0.0.7",
      "@types/bun": "^1.0.0",
      "@types/pg": "^8.20.0",
@@ -1,7 +1,7 @@
  /**
   * Integration test (real Postgres): cross-pod heartbeat-lost detection.
   *
-  * This is the DETERMINISTIC backstop for `.agent/rules/state-and-scale.md` that
+  * This is the DETERMINISTIC backstop for `.claude/rules/state-and-scale.md` that
   * the single-process unit suite structurally cannot provide for the heartbeat
   * monitor. The prior fix made connection STATUS durable but left the
   * online→offline transition DETECTION pod-local (an in-memory `previousStatuses`
package/src/index.ts CHANGED
@@ -10,7 +10,10 @@ import {
  import { HealthCheckApi } from "@checkstack/healthcheck-common";
  import { healthCheckHooks } from "@checkstack/healthcheck-backend";
  import { ScriptPackagesApi } from "@checkstack/script-packages-common";
- import { scriptPackagesChangedHook } from "@checkstack/script-packages-backend";
+ import {
+   scriptPackagesChangedHook,
+   sandboxPolicyChangedHook,
+ } from "@checkstack/script-packages-backend";
  import { secretResolverRef } from "@checkstack/secrets-backend";
  import { resolveSatelliteRunSecrets } from "./run-secret-resolver";
  import { SatelliteService } from "./service";
@@ -256,6 +259,17 @@ export default createBackendPlugin({
        resolver: secretResolver,
      }),
    },
+   {
+     // Global sandbox-policy relay: carry the resolved cluster-wide
+     // policy in the `authenticated` payload so a satellite enforces it
+     // from its first run, and push it on change. A satellite stays
+     // FAIL-CLOSED (deny egress) until this first relay arrives, so a
+     // read failure here can never loosen its sandbox.
+     getCurrentPolicy: async () => {
+       const spClient = rpcClient.forPlugin(ScriptPackagesApi);
+       return spClient.getSandboxPolicy();
+     },
+   },
  );
 
  // Register satellite WebSocket endpoint via the scoped WS registry
@@ -343,6 +357,18 @@ export default createBackendPlugin({
        { mode: "broadcast" },
      );
 
+     // Fan the global sandbox-policy change out to THIS instance's connected
+     // satellites (push-on-change relay). Broadcast mode so every core pod
+     // pushes to its own satellites; offline satellites converge via the
+     // policy carried in `authenticated` on reconnect.
+     onHook(
+       sandboxPolicyChangedHook,
+       async ({ policy }) => {
+         wsHandler.pushSandboxPolicyToAll(policy);
+       },
+       { mode: "broadcast" },
+     );
+
      logger.debug("✅ Satellite Backend afterPluginsReady complete.");
    },
  });
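The `onHook(..., { mode: "broadcast" })` registration above follows from the fact that each core pod holds WebSocket connections only for its own satellites. A policy change therefore has to run the handler on every pod.

The following is a toy TypeScript model of that reasoning, not the `@checkstack/backend-api` hook bus. `CorePod`, `deliver` and `pushedCount` are hypothetical names. The `"single"` mode is an illustrative stand-in for any delivery mode in which only one pod handles the event.

```typescript
// Hypothetical toy model of the relay's delivery mode; CorePod, deliver and
// pushedCount are illustrative names, not @checkstack/backend-api APIs.
class CorePod {
  // Frames this pod has written, recorded as "satelliteId:policyVersion".
  readonly sent: string[] = [];
  private readonly satellites: string[];

  constructor(satellites: string[]) {
    // Only the satellites whose WebSocket terminates on this pod.
    this.satellites = satellites;
  }

  // Mirrors pushSandboxPolicyToAll: it reaches this pod's connections only.
  pushSandboxPolicyToAll(policyVersion: string): void {
    for (const satellite of this.satellites) {
      this.sent.push(`${satellite}:${policyVersion}`);
    }
  }
}

// "broadcast": every pod runs the handler; "single": exactly one pod does.
function deliver(pods: CorePod[], mode: "broadcast" | "single", policyVersion: string): void {
  const handlers = mode === "broadcast" ? pods : pods.slice(0, 1);
  for (const pod of handlers) {
    pod.pushSandboxPolicyToAll(policyVersion);
  }
}

function pushedCount(pods: CorePod[]): number {
  return pods.reduce((total, pod) => total + pod.sent.length, 0);
}

// Usage: three satellites spread across two core pods.
const singlePods = [new CorePod(["sat-1"]), new CorePod(["sat-2", "sat-3"])];
deliver(singlePods, "single", "v2");
console.log(pushedCount(singlePods)); // 1

const broadcastPods = [new CorePod(["sat-1"]), new CorePod(["sat-2", "sat-3"])];
deliver(broadcastPods, "broadcast", "v2");
console.log(pushedCount(broadcastPods)); // 3
```

In the `single` case, sat-2 and sat-3 would keep the old policy until their next `authenticated`. Broadcast delivery is what keeps already-connected satellites current.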
@@ -529,4 +529,89 @@ describe("SatelliteWsHandler", () => {
        expect(mirrors).toHaveLength(0);
      });
    });
+
+   describe("global sandbox-policy relay", () => {
+     const POLICY = {
+       enabled: true,
+       onUnavailable: "degrade" as const,
+       resources: { cpuSeconds: 33 },
+       filesystem: { mode: "scratch-plus-ro" as const },
+       network: {
+         mode: "deny" as const,
+         allow: [] as string[],
+         denyLinkLocalAndMetadata: true,
+       },
+       privilege: { mode: "drop-to-uid" as const },
+     };
+
+     function makeSandboxSink() {
+       return {
+         sink: {
+           getCurrentPolicy: mock(async () => POLICY),
+         },
+       };
+     }
+
+     async function authedHandlerWithSandboxSink() {
+       const { sink } = makeSandboxSink();
+       const h = new SatelliteWsHandler(
+         service,
+         configRelay,
+         resultHandler,
+         logger,
+         undefined,
+         undefined,
+         undefined,
+         sink,
+       );
+       const ws = createMockWs();
+       const { onMessage } = h.onConnection(ws);
+       await onMessage(
+         JSON.stringify({
+           type: "authenticate",
+           clientId: "sat-1",
+           token: "csat_valid-token",
+         }),
+       );
+       return { h, ws };
+     }
+
+     it("carries the resolved policy in the authenticated payload", async () => {
+       const { ws } = await authedHandlerWithSandboxSink();
+       const auth = JSON.parse(ws.messages[0]);
+       expect(auth.type).toBe("authenticated");
+       expect(auth.sandboxPolicy.network.mode).toBe("deny");
+       expect(auth.sandboxPolicy.resources.cpuSeconds).toBe(33);
+     });
+
+     it("omits sandboxPolicy entirely when no sink is wired (version-skew safe)", async () => {
+       const ws = createMockWs();
+       const { onMessage } = handler.onConnection(ws);
+       await onMessage(
+         JSON.stringify({
+           type: "authenticate",
+           clientId: "sat-1",
+           token: "csat_valid-token",
+         }),
+       );
+       const auth = JSON.parse(ws.messages[0]);
+       expect("sandboxPolicy" in auth).toBe(false);
+     });
+
+     it("pushes sandbox_policy to every connected satellite on change", async () => {
+       const { h, ws } = await authedHandlerWithSandboxSink();
+       ws.messages.length = 0;
+
+       h.pushSandboxPolicyToAll({
+         ...POLICY,
+         network: { mode: "allowlist", allow: ["10.0.0.1"], denyLinkLocalAndMetadata: true },
+       });
+
+       expect(ws.messages).toHaveLength(1);
+       const msg = JSON.parse(ws.messages[0]);
+       expect(msg.type).toBe("sandbox_policy");
+       expect(msg.policy.network.mode).toBe("allowlist");
+       expect(msg.policy.network.allow).toEqual(["10.0.0.1"]);
+     });
+   });
  });
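The three tests above pin down what the core sends:

- `sandboxPolicy` inside `authenticated` when a sink is wired;
- no such field when no sink is wired;
- a `sandbox_policy` frame whenever the policy changes.

The changelog describes the receiving side only in prose: the satellite caches the last relayed policy and stays fail-closed until the first one arrives. The following TypeScript is therefore a hypothetical sketch of that client. `SatellitePolicyCache`, `MOST_RESTRICTIVE` and the reduced `SketchPolicy` shape are illustrative names. Keeping a previously cached policy when a later `authenticated` omits the field is an assumption that this diff does not settle.

```typescript
// Hypothetical satellite-side sketch; the satellite client is not part of this diff.
interface SketchPolicy {
  onUnavailable: "fail" | "degrade";
  network: {
    mode: "deny" | "allowlist";
    allow: string[];
    denyLinkLocalAndMetadata: boolean;
  };
}

// Assumed most-restrictive profile: refuse unenforceable runs, deny all egress.
const MOST_RESTRICTIVE: SketchPolicy = {
  onUnavailable: "fail",
  network: { mode: "deny", allow: [], denyLinkLocalAndMetadata: true },
};

class SatellitePolicyCache {
  private relayed: SketchPolicy | undefined;

  // Feed every inbound WebSocket frame; only the two relay messages matter.
  handle(raw: string): void {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return;
    const msg = parsed as { type?: unknown; sandboxPolicy?: unknown; policy?: unknown };
    // A real client should validate the payload shape before trusting it.
    if (msg.type === "authenticated" && msg.sandboxPolicy !== undefined) {
      this.relayed = msg.sandboxPolicy as SketchPolicy;
    } else if (msg.type === "sandbox_policy" && msg.policy !== undefined) {
      this.relayed = msg.policy as SketchPolicy;
    }
    // An `authenticated` without `sandboxPolicy` (older core, or a failed read
    // on the core side) leaves the current state unchanged.
  }

  // The policy every run resolves through: fail-closed until the first relay.
  current(): SketchPolicy {
    return this.relayed ?? MOST_RESTRICTIVE;
  }
}

// Usage
const cache = new SatellitePolicyCache();
cache.handle(JSON.stringify({ type: "authenticated" })); // older core: no policy
console.log(cache.current().network.mode); // deny
cache.handle(
  JSON.stringify({
    type: "sandbox_policy",
    policy: {
      onUnavailable: "degrade",
      network: { mode: "allowlist", allow: ["10.0.0.1"], denyLinkLocalAndMetadata: true },
    },
  }),
);
console.log(cache.current().network.mode); // allowlist
```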
@@ -14,6 +14,7 @@ import {
    type ResultMessage,
    type SatelliteWithStatus,
  } from "@checkstack/satellite-common";
+ import type { SandboxPolicy } from "@checkstack/common";
 
  /**
   * Optional plug-point for driving a satellite connection lifecycle edge into
@@ -90,6 +91,18 @@ export interface SatelliteSecretSink {
    }): Promise<Record<string, string>>;
  }
 
+ /**
+  * Optional plug-point for relaying the GLOBAL script-sandbox policy to
+  * satellites. Wired from `afterPluginsReady` against the script-packages RPC.
+  * When absent, the `authenticated` message omits `sandboxPolicy` (version-skew
+  * safe) and the satellite stays FAIL-CLOSED until a policy arrives, so a
+  * missing sink can never loosen a satellite's sandbox.
+  */
+ export interface SatelliteSandboxPolicySink {
+   /** The current resolved global sandbox policy to relay to satellites. */
+   getCurrentPolicy(): Promise<SandboxPolicy>;
+ }
+
  /**
   * Active satellite connection tracking.
   */
@@ -138,6 +151,12 @@ export class SatelliteWsHandler implements WebSocketRouteHandler {
       * unset, such a request is answered with an error.
       */
      private secretSink?: SatelliteSecretSink,
+     /**
+      * Optional. When set, the `authenticated` message carries the resolved
+      * global sandbox policy and the handler can push `sandbox_policy` on change.
+      * When unset, the field is omitted and the satellite stays fail-closed.
+      */
+     private sandboxPolicySink?: SatelliteSandboxPolicySink,
    ) {}
 
    /**
@@ -220,6 +239,7 @@ export class SatelliteWsHandler implements WebSocketRouteHandler {
          await this.configRelay.getAssignmentsForSatellite(satellite.id);
        const scriptPackagesLockfileHash =
          await this.resolveDesiredLockfileHash();
+       const sandboxPolicy = await this.resolveSandboxPolicy();
 
        this.sendMessage(ws, {
          type: "authenticated",
@@ -228,6 +248,7 @@ export class SatelliteWsHandler implements WebSocketRouteHandler {
          ...(scriptPackagesLockfileHash === undefined
            ? {}
            : { scriptPackagesLockfileHash }),
+         ...(sandboxPolicy === undefined ? {} : { sandboxPolicy }),
        });
 
        this.logger.info(
@@ -420,6 +441,38 @@ export class SatelliteWsHandler implements WebSocketRouteHandler {
      );
    }
 
+   /**
+    * Push the new global sandbox policy to EVERY connected satellite. Called by
+    * the `script-sandbox.policy-changed` broadcast handler so each core instance
+    * fans the change out to its own satellites (push-on-change relay).
+    * Best-effort liveness; the policy carried in `authenticated` on (re)connect
+    * is the durable backstop.
+    */
+   pushSandboxPolicyToAll(policy: SandboxPolicy): void {
+     for (const conn of this.connections.values()) {
+       this.sendMessage(conn.ws, { type: "sandbox_policy", policy });
+     }
+     this.logger.debug(
+       `Pushed sandbox_policy to ${this.connections.size} satellite(s)`,
+     );
+   }
+
+   /**
+    * Resolve the current global sandbox policy for the `authenticated` payload.
+    * Returns `undefined` when the sink isn't wired or its read throws, so the
+    * field is omitted (version-skew safe) and the satellite stays FAIL-CLOSED
+    * (denies egress) - a relay failure must never loosen a satellite's sandbox.
+    */
+   private async resolveSandboxPolicy(): Promise<SandboxPolicy | undefined> {
+     if (!this.sandboxPolicySink) return undefined;
+     try {
+       return await this.sandboxPolicySink.getCurrentPolicy();
+     } catch (error) {
+       this.logger.error("Failed to resolve global sandbox policy:", error);
+       return undefined;
+     }
+   }
+
    /**
     * Resolve the desired lockfile hash for assignment payloads. Returns
     * `undefined` when the sink isn't wired (so the field is omitted entirely