@checkstack/backend-api 0.17.1 → 0.19.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -1,5 +1,300 @@
  # @checkstack/backend-api
 
+ ## 0.19.0
+
+ ### Minor Changes
+
+ - 270ef29: Fix automation provider actions and `secretEnv` script actions throwing in production.
+
+   The automation dispatch engine resolved provider-action dependencies (the integration connection store, the secret resolver) through a `getService` that was a throwing stub, so Jira / Teams / Webex actions and `secretEnv` script actions threw at execute time in production. The entire dispatch test suite stubbed `getService`, so the tests never exercised the real path and the break went unnoticed.
+
+   Root cause: the plugin `env` exposed `registerService` but no resolver, so the dispatch path (the only context that resolves arbitrary cross-plugin refs outside an RPC handler) had nothing real to call.
+
+   Changes:
+
+   - `@checkstack/backend-api`: add `getService<S>(ref: ServiceRef<S>): Promise<S>` to the plugin `env` (`BackendPluginRegistry`). It resolves a service registered by any plugin through the real `ServiceRegistry` using the calling plugin's identity, and throws a clear error if the ref is not registered (never silently `undefined`). **NEW PLUGIN-AUTHOR CONTRACT**: `env.getService` is now available to resolve arbitrary cross-plugin service refs at init / afterPluginsReady time.
+   - `@checkstack/backend`: implement `env.getService` in both the plugin loader and the runtime single-plugin registration path, backed by `ServiceRegistry.get(ref, { pluginId })`.
+   - `@checkstack/automation-backend`: wire the dispatch `getService` to `env.getService` (was a throwing stub). This also activates run-wide provider-credential masking, because resolving the connection store / secret resolver now flows through the run's masking interceptor.
+
+   Also fixes a test-only seam where the `core/backend` test preload registered a no-op `registerRouter`, silently disabling oRPC router registration across the suite.
+
+ - 270ef29: Fix suspend/resume durability + complete the run-wide secret-masking guarantee.
+
+   A panel review confirmed several defects in the automation dispatch engine's suspend/resume durability and in the run-wide masking choke point. These survived because the unit suite stubbed the seam under test; the fixes ship with tests that exercise the real suspend / sweep / resume paths.
+
+   Suspend/resume durability:
+
+   - **Stalled sweeper no longer re-runs intentional waits.** `findStalledRunIds` now joins `automation_runs` and returns only `status = 'running'` runs, and suspend-finalisation no longer clobbers the run's `lastActionPath` checkpoint to `null`. Previously, any wait longer than the stale window (>60s) was re-walked from the top every sweep cycle, re-firing pre-wait side effects and leaking wait locks. The wait-aware sweeps now also run before the stalled-run sweep.
+   - **Stalled recovery refuses a run holding a live wait lock.** `recoverStalledRun` now only recovers a genuinely-`running` run with no wait lock; a crash-mid-wait recovery is left to the wait/resume paths instead of re-walking from the top and creating a duplicate lock + duplicate delay job.
+   - **Cancelled runs can no longer resurrect.** `resumeRun` guards on `status === 'waiting'` (mirroring `checkWaitUntil`) and drops any stale lock for a non-waiting run, so `wakeWaitingRuns` / delay-expiry / a racing queue job can't wake a cancelled or terminal run. `cancelActiveRuns` (restart mode) now deletes the cancelled runs' wait locks + run-state in the same operation.
+   - **Concurrency check-then-create is serialized.** The `mode` check + `createRun` now run under a transaction-scoped advisory lock keyed on `(automationId, scope)`, so two concurrent fires can't both pass a `single`-mode "no active run" check and double-run.
+
+   Masking guarantee (now genuinely covers scope + artifacts):
+
+   - **The run-wide masking choke point now also masks the durable scope snapshot and produced artifacts.** The `RunSecretRegistry` is threaded into `RunStateStore.upsert` (masks `scopeSnapshot`) and `ArtifactStore.record` (masks `data`), so a resolved connection credential threaded into `scope.variables` or surfaced into an artifact is redacted before it is persisted, and therefore cannot reach a read-only user via `getRunScopeForReplay`. **GUARANTEE CHANGE**: run-wide masking now covers step output, run error, scope snapshot, and artifact data for every action.
+   - **`testConnection` / `testProviderConnection` mask provider errors.** These RPCs run outside a dispatch run, so they build a per-call mask set from the resolved/submitted connection config and pass any provider error through it before returning. A provider error that echoes a token therefore can't cross back to the browser.
+   - **Short secrets surface a warning.** `setSecret` now warns when a value is shorter than `MIN_MASKABLE_LENGTH` (4), because such a value cannot be auto-redacted (the threshold is intentionally not lowered).
+
+   Internal:
+
+   - `@checkstack/backend-api`: `withXactLock`'s `fn` now receives the transaction handle `tx` so a critical section can run on the locked connection; the doc comment explains why running on the pool inside the lock window is still safe. The incident dedup caller's comment is corrected accordingly. `RunStore` gains `findWaitLocksByRun`.
+
+ - 270ef29: Fix several correctness defects around distributed coordination and stored-data handling.
+
+   - Dwell `for:` timers now fire via an atomic `DELETE ... RETURNING` claim, so two pods (or the stalled sweeper vs the queue consumer) can no longer both fire the same dwell.
+   - Postgres session-level advisory locks now keep connection affinity. A shared `AdvisoryLockService` (backed by a dedicated pooled client) replaces the previous acquire/release-on-different-connection pattern that leaked locks. It is used by the script-packages installer election, the automation run resume + stalled sweeper, and (via a new transaction-scoped `withXactLock`) incident dedup.
+   - A storage migration that crashed mid-flight is now resumed on startup under the installer-election lock, instead of permanently wedging installs.
+   - Distributed script-package blobs carry a `blobSha256` and are verified before extraction (the SRI `integrity` hashes the npm tarball, not the transported archive). Backward-safe: entries without the field skip verification until a re-install regenerates the manifest.
+   - Archive extraction rejects zip-slip paths (absolute or `..` entries) before writing anything.
+   - `incident.create` with `dedupe_open_for_system` serializes its check-then-create per system, so concurrent triggers for the same system can't both open a duplicate incident.
+   - Seeded auto-incident filter expressions JSON-encode interpolated ids so a quote/backslash can't corrupt the expression.
+   - Stored jsonb snapshots (dwell `actorSnapshot`, wait-lock `waitConfig`) are validated with zod on load and degrade safely instead of flowing through as the wrong type.
+
+ - b995afb: Harden the advisory-lock service against holder-connection termination.
+
+   A session-level advisory lock is held on a dedicated checked-out pool client.
+   If that backend is terminated (admin kill, failover, network drop) while the
+   lock is held, `pg` emits an `'error'` on the client; with no listener attached,
+   that error is re-thrown by the EventEmitter and crashes the pod. The service
+   now attaches an error listener to the held client so the loss degrades
+   gracefully: the session lock is auto-released server-side when the backend
+   dies, and the key simply becomes acquirable again.
+
+   Also de-flaked the advisory-lock integration test: it now terminates only the
+   lock-holding backend (found via `pg_locks`) instead of every backend in the
+   database. The old blanket kill also tore down the pool's idle connections,
+   whose async errors flaked the run and left the pool unusable.
+
+ - 270ef29: Add in-UI script testing for automation `run_script` / `run_shell` actions.
+
+   A new `testScript` RPC runs a TypeScript or shell script against an
+   editable, auto-seeded sample context using the same sandboxed runner the
+   real action uses, so operators can test scripts directly in the editor
+   without dispatching a whole automation. The test panel appears beneath any
+   script field flagged `x-script-testable` via the new `ScriptTestPanel` /
+   `ContextSampleEditor` components in `@checkstack/ui` and the
+   `scriptTestRenderer` prop threaded through `DynamicForm`.
+
+   - `@checkstack/automation-common`: adds the `testScript` contract +
+     `ScriptTest*` schemas (gated by `automation.manage`).
+   - `@checkstack/automation-backend`: implements `testScript` reusing the
+     shared ESM / shell runners; central-only, time-bounded.
+   - `@checkstack/backend-api`: new `x-script-testable` config-schema
+     metadata propagated to the frontend JSON Schema.
+   - `@checkstack/ui`: new `ScriptTestPanel` + `ContextSampleEditor`
+     components and a `scriptTestRenderer` prop on `DynamicForm`.
+   - `@checkstack/automation-frontend`: wires the test panel into the action
+     editor.
+   - `@checkstack/integration-script-backend`: marks the `run_script` /
+     `run_shell` script fields as testable.
+
+ - 270ef29: Activate npm packages in script execution: thread the managed
+   `resolutionRoot` into every user-script call site so an allowlisted package
+   can actually be `import`ed.
+
+   - `@checkstack/backend-api`: the ESM runner now always writes a per-run
+     `bunfig.toml` with `[install] auto = "disable"` and runs with that dir as
+     CWD. Without this, Bun silently auto-installs any imported package from the
+     registry (verified), defeating the allowlist; with it, imports resolve
+     only against the reconciled `current/node_modules` (when a `resolutionRoot`
+     is set) and otherwise fail fast.
+   - `@checkstack/script-packages-backend`: `resolveResolutionRoot` /
+     `resolveResolutionRootFromStore` / `resolveResolutionRootForHost` decide a
+     host's resolution-root status (`none` / `ready` / `notReady`) from the
+     local `<store>/current`.
+   - `run_script` (integration-script-backend), the inline-script collector
+     (healthcheck-script-backend, core + satellite), and the in-UI `testScript`
+     / `testCollectorScript` endpoints all resolve the root per run and pass it
+     to the runner; `run_script` surfaces a clear "npm packages not ready"
+     error when packages are configured but not yet synced. Shell paths are
+     unaffected (no module resolution).
+
+   An opt-in end-to-end test (`CHECKSTACK_E2E_NETWORK=1`) proves an allowlisted
+   package imports successfully through the real `run_script` action execute
+   path, while the non-network degradation tests always run.
+
+   BREAKING CHANGES: `@checkstack/backend-api`'s `defaultEsmScriptRunner` now
+   always disables Bun auto-install for the user subprocess. A script that
+   previously relied on Bun silently fetching an un-vendored package from the
+   registry at import time will now fail to resolve it. This is intentional
+   (package availability is governed by the admin allowlist), but any caller
+   depending on the old implicit auto-install behavior must add the package to
+   the allowlist instead. The new `EsmScriptRunOptions.resolutionRoot` field is
+   optional and additive (defaults to today's `os.tmpdir()` behavior when
+   unset), so the runner API itself is source-compatible.
+
+ - 270ef29: Add the per-host script-package reconciler and the runner resolution root.
+
+   - `@checkstack/backend-api`: `EsmScriptRunOptions.resolutionRoot`. When
+     set, the per-run temp dir is created inside it so module resolution walks
+     up to `<resolutionRoot>/node_modules` and user scripts can `import`
+     managed npm packages. Defaults to today's `os.tmpdir()` behavior when
+     unset (backward-compatible; isolation is unchanged, since the subprocess
+     still only sees `SAFE_ENV_VARS`).
+   - `@checkstack/script-packages-backend`: content-addressed cache archive
+     (tar+gzip per package), pure delta diff (`computeMissingBlobs`), atomic
+     `current` symlink swap, the host reconciler (`reconcileToHash`, which is
+     idempotent: it pulls only missing blobs, materializes a versioned tree via
+     `bun install --offline`, and atomically flips `current`), the concrete
+     fs/Bun adapter, the central install resolver, and the
+     `script-packages.changed` broadcast hook. An opt-in end-to-end test
+     (`CHECKSTACK_E2E_NETWORK=1`) proves resolve -> publish -> cold reconcile
+     (no registry) -> offline materialize -> import.
+
+ - 270ef29: Secrets platform Phase 2: secret -> env-var mapping with central resolve, inject, and mask.
+
+   - Script consumers declare a least-privilege `secretEnv` allowlist
+     (`{ ENV_NAME: "${{ secrets.NAME }}" }`). The automation `run_script` /
+     `run_shell` actions resolve ONLY the declared secrets via
+     `secretResolverRef.resolveForRun`, inject them into the runner env for
+     that run (memory-only; the ESM runner gained a per-run `env` option), and
+     mask their values out of stdout/stderr/result/error via the run-scoped
+     masking context. A missing required secret fails the run clearly. Scripts
+     have no ambient secret access.
+   - Test panel: `testScript` / `testCollectorScript` inject named
+     `__SECRET_<NAME>__` placeholders by default, or user-supplied per-secret
+     overrides; real production values are never resolved in the test path,
+     and overrides are masked out of the result.
+   - Healthcheck collectors carry the `secretEnv` field for authoring +
+     the test panel; runtime injection on satellites lands in Phase 3.
+   - Editor UX: a new `@checkstack/ui` `SecretEnvEditor` renders `x-secret-env`
+     record fields with `${{ secrets.* }}` name autocomplete (from
+     `listSecretNames`), wired into the automation action editor and the
+     healthcheck collector editor. New `withConfigMeta` helper +
+     `x-secret-env` config-meta key in `@checkstack/backend-api`.
+
+ - 270ef29: Secrets platform Phase 3: just-in-time secret delivery to satellites + source-side masking, and central-execution injection for healthcheck collectors.
+
+   - New satellite WS messages `request_run_secrets` / `run_secrets`: just
+     before a satellite runs a collector that declares a `secretEnv`, it asks
+     core for that collector's resolved env; core resolves ONLY the secrets the
+     collector's OWN persisted assignment declares (least-privilege: the
+     satellite cannot choose) and replies with the env map (or a clear error).
+     The satellite injects it memory-only for the run and drops it on
+     completion. Secrets never ride the persisted assignment and never touch
+     disk.
+   - Source-side masking: the satellite runs `maskSecrets` over the collector's
+     stdout/stderr/result/error using the run's delivered values BEFORE the
+     result leaves the satellite (defense in depth).
+   - `CollectorStrategy.execute` gains an optional `secretEnv`. The
+     inline-script and shell collectors inject it into the runner
+     (`process.env` / `$VAR`) and mask the values out of their output.
+   - Healthcheck collectors running centrally (the queue executor) also resolve
+     and inject `secretEnv` via `secretResolverRef`, closing the gap where a
+     centrally run `secretEnv` collector got no secrets. A missing required
+     secret fails the run clearly in all paths.
+
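The `env.getService` contract from the first 0.19.0 entry (resolve a ref under the calling plugin's identity, and throw a clear error rather than return `undefined`) can be illustrated with an in-memory registry. This is a minimal sketch under stated assumptions, not the `@checkstack/backend` implementation: `ServiceRef`, `createServiceRef`, `SketchServiceRegistry`, and `makePluginEnv` are hypothetical names, and the real `ServiceRegistry.get(ref, { pluginId })` may behave differently.

```typescript
// Hypothetical types: the real ServiceRef / ServiceRegistry used by
// @checkstack/backend-api and @checkstack/backend are not shown in this diff.
interface ServiceRef<S> {
  readonly id: string;
  // Phantom field: never set at runtime, it only ties the ref to its service type.
  readonly __service?: S;
}

function createServiceRef<S>(id: string): ServiceRef<S> {
  return { id };
}

class SketchServiceRegistry {
  private readonly services = new Map<string, unknown>();

  register<S>(ref: ServiceRef<S>, impl: S): void {
    this.services.set(ref.id, impl);
  }

  // Resolve on behalf of a calling plugin. An unregistered ref is a hard
  // error naming both the ref and the caller, never a silent `undefined`.
  async get<S>(ref: ServiceRef<S>, caller: { pluginId: string }): Promise<S> {
    if (!this.services.has(ref.id)) {
      throw new Error(
        `Service "${ref.id}" requested by plugin "${caller.pluginId}" is not registered`,
      );
    }
    return this.services.get(ref.id) as S;
  }
}

// Each plugin receives an env whose getService is bound to its own identity.
function makePluginEnv(registry: SketchServiceRegistry, pluginId: string) {
  return {
    registerService<S>(ref: ServiceRef<S>, impl: S): void {
      registry.register(ref, impl);
    },
    getService<S>(ref: ServiceRef<S>): Promise<S> {
      return registry.get(ref, { pluginId });
    },
  };
}
```

Binding `pluginId` inside `makePluginEnv` is the point of the design: a plugin only ever resolves under its own identity, and a missing ref fails loudly at the call site instead of surfacing later as an `undefined` dereference deep inside an action.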
+ ### Patch Changes
+
+ - Updated dependencies [270ef29]
+ - Updated dependencies [270ef29]
+ - Updated dependencies [270ef29]
+ - Updated dependencies [b995afb]
+ - Updated dependencies [b995afb]
+ - Updated dependencies [270ef29]
+ - Updated dependencies [270ef29]
+   - @checkstack/healthcheck-common@1.4.0
+   - @checkstack/cache-api@0.3.7
+   - @checkstack/queue-api@0.3.7
+
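The redaction behind the 0.19.0 masking guarantees, together with the `MIN_MASKABLE_LENGTH` warning, can be sketched as literal value substitution. This is a minimal sketch that assumes masking is a plain string replace: `maskSecretValues`, `MaskResult`, and the `***` mask token are hypothetical, while the 4-character threshold mirrors the `MIN_MASKABLE_LENGTH` value quoted in the changelog.

```typescript
// Hypothetical helper: the real masking API in this codebase is not shown.
// The threshold mirrors MIN_MASKABLE_LENGTH (4) from the changelog.
const MIN_MASKABLE_LENGTH = 4;
const MASK = "***";

interface MaskResult {
  text: string;
  // Names of secrets too short to redact; the caller should surface a warning.
  unmaskable: string[];
}

function maskSecretValues(text: string, secrets: Record<string, string>): MaskResult {
  const unmaskable: string[] = [];
  const values: string[] = [];
  for (const [name, value] of Object.entries(secrets)) {
    // Short (including empty) values are reported instead of masked.
    if (value.length < MIN_MASKABLE_LENGTH) {
      unmaskable.push(name);
    } else {
      values.push(value);
    }
  }
  // Replace longer values first so a secret that contains another secret is
  // masked whole rather than leaving a fragment of it behind.
  values.sort((a, b) => b.length - a.length);
  let masked = text;
  for (const value of values) {
    masked = masked.split(value).join(MASK);
  }
  return { text: masked, unmaskable };
}
```

Ordering by length matters when one secret contains another: replacing the shorter value first would leave the rest of the longer secret in the output. Refusing to mask very short values is presumably why the threshold exists at all, since masking a one- to three-character string would redact unrelated text everywhere it appears.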
+ ## 0.18.0
+
+ ### Minor Changes
+
+ - 6d52276: feat(automation): expose `trigger.actor` so automations can filter on who/what caused an event
+
+   Every platform event now carries an **actor** - the user, application (API
+   client), service (backend-to-backend), or `system` (background /
+   unauthenticated) that caused it - and the automation engine surfaces it to
+   automations as `trigger.actor`. This lets a trigger filter gate on the
+   origin of the event it reacts to:
+
+   ```text
+   {{ trigger.actor.type == "system" }} # auto-created by the platform
+   {{ trigger.actor.type == "user" }} # a human
+   {{ trigger.actor.id == "app-deploybot" }} # a specific application
+   ```
+
+   `trigger.actor` is available on **every** trigger - it is injected by the
+   platform, not declared per trigger - and editor autocomplete + Run Script
+   context types include `trigger.actor.{type,id,name}`.
+
+   How it works:
+
+   - **`@checkstack/common`** adds the canonical `Actor` type / `ActorSchema`
+     and `SYSTEM_ACTOR`.
+   - **`@checkstack/backend-api`** adds `resolveActor(user)` and a
+     `HookEventMeta` envelope. The hook listener / `onHook` signature gains an
+     optional second `meta` argument (additive, backward compatible).
+   - **`@checkstack/backend`** wraps emitted hooks in an envelope so the actor
+     travels with the payload through the distributed queue, unwrapping it
+     before delivery. The RPC emit path captures the authenticated caller;
+     background emits default to the system actor. Raw/legacy queue data is
+     treated as a system-actor payload, so delivery stays backward compatible.
+   - **`@checkstack/automation-backend`** threads the actor into the dispatch
+     scope (`trigger.actor`), where it is available to trigger filters,
+     top-level conditions, and all run templates, and persists it in the run's
+     scope snapshot. Manual runs are attributed to the invoking user.
+   - **`@checkstack/automation-common`** / **`@checkstack/automation-frontend`**
+     expose `trigger.actor` in the editor variable scope and the generated
+     Run Script `context.trigger.actor` types.
+
+   No database migration and no per-trigger schema changes: the actor rides as
+   event-envelope metadata and in the run scope snapshot.
+
+ - 35bc682: feat(healthcheck): expose check + system run-context to script collectors
+
+   Script health checks can now read which check and system a run is for.
+   Previously, shell scripts got only a curated env whitelist and inline
+   scripts got only `context.config`, so a script had no built-in way to know
+   its own check name or the system it was checking.
+
+   - `@checkstack/backend-api`: new `CollectorRunContext` type
+     (`{ check: { id, name, intervalSeconds }, system: { id, name } }`) and
+     an optional `runContext` param on `CollectorStrategy.execute`. Because it
+     is optional, existing collector implementations are unaffected.
+   - Shell-script collector: injects reserved `CHECKSTACK_CHECK_ID`,
+     `CHECKSTACK_CHECK_NAME`, `CHECKSTACK_CHECK_INTERVAL_SECONDS`,
+     `CHECKSTACK_SYSTEM_ID`, `CHECKSTACK_SYSTEM_NAME` env vars (user-supplied
+     `env` still wins on collision).
+   - Inline-script collector: exposes `context.check` and `context.system`
+     alongside `context.config`; the inline-script editor now types them for
+     autocomplete.
+   - Shell editors (health-check collectors and automation shell actions) now
+     also suggest the user's own `env` (JSON) keys as `$NAME` completions, via
+     the new exported `customShellEnvVars` helper. Keys that aren't valid shell
+     identifiers are omitted.
+   - Fix: the Typefox `CodeEditor` captured a stale `onChange` at editor start,
+     so editing one `DynamicForm` field reverted sibling fields changed since
+     mount (e.g. typing in a shell `script` field wiped an unsaved `env` value,
+     or deleted a sibling automation action added after mount). The change
+     handler now routes through a ref to the current `onChange`.
+   - Fix: focusing a JSON editor threw "LanguageStatusService.addStatus is not
+     supported" because the standalone service set omitted `ILanguageStatusService`.
+     That one service is now registered via `serviceOverrides`.
+   - Fix: the automation trigger card nested a `<Badge>` (a `<div>`) inside a
+     `<p>`, producing a `validateDOMNesting` warning. Switched the wrapper to a
+     `<div>`.
+   - Local runs (`queue-executor`) and satellite runs both populate the
+     context. `SatelliteAssignment` (and the `getAssignmentsForSatellite`
+     RPC output) gained optional `configName` / `systemName` so the metadata
+     reaches satellite-side execution; `HealthCheckService` resolves the
+     system name via the catalog client.
+
+   BREAKING CHANGE: `createHealthCheckRouter` now requires a `catalogClient`
+   option (used to resolve system names for satellite assignments). Update
+   call sites to pass the catalog RPC client.
+
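The shell-collector behaviour in 35bc682 (reserved `CHECKSTACK_*` variables, with user-supplied `env` winning on collision) amounts to an ordered object merge. A sketch under assumptions: `buildShellCollectorEnv` is a hypothetical helper, `CollectorRunContext` follows the shape quoted in the changelog, and the curated safe-env whitelist that the real collector also applies is deliberately left out.

```typescript
// Shape taken from the changelog's description of CollectorRunContext.
interface CollectorRunContext {
  check: { id: string; name: string; intervalSeconds: number };
  system: { id: string; name: string };
}

// Hypothetical helper: builds the subprocess env for a shell collector run.
// The real collector also merges a curated whitelist, omitted here.
function buildShellCollectorEnv(
  runContext: CollectorRunContext | undefined,
  userEnv: Record<string, string> = {},
): Record<string, string> {
  const reserved: Record<string, string> = runContext
    ? {
        CHECKSTACK_CHECK_ID: runContext.check.id,
        CHECKSTACK_CHECK_NAME: runContext.check.name,
        // Env values are strings, so the interval is stringified.
        CHECKSTACK_CHECK_INTERVAL_SECONDS: String(runContext.check.intervalSeconds),
        CHECKSTACK_SYSTEM_ID: runContext.system.id,
        CHECKSTACK_SYSTEM_NAME: runContext.system.name,
      }
    : {};
  // Spread order decides collisions: later keys overwrite earlier ones,
  // so user-supplied env wins over the reserved variables.
  return { ...reserved, ...userEnv };
}
```

Because `runContext` is optional, a caller that has no context simply gets the user env back unchanged, which matches the changelog's note that existing collector implementations are unaffected.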
+ ### Patch Changes
+
+ - Updated dependencies [6d52276]
+ - Updated dependencies [35bc682]
+   - @checkstack/common@0.12.0
+   - @checkstack/healthcheck-common@1.3.0
+   - @checkstack/signal-common@0.2.5
+   - @checkstack/cache-api@0.3.6
+   - @checkstack/queue-api@0.3.6
+
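The `$NAME` completion rule from 35bc682 (suggest the user's own `env` JSON keys, omitting keys that are not valid shell identifiers) can be sketched as below. `shellEnvCompletions` is a hypothetical stand-in: the real exported `customShellEnvVars` helper's signature is not shown in this diff. The identifier pattern is the POSIX shell variable-name rule: a letter or underscore, followed by letters, digits, or underscores.

```typescript
// POSIX shell variable names: letter or underscore first, then [A-Za-z0-9_].
const SHELL_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Hypothetical stand-in for customShellEnvVars: turns the env JSON text from
// the editor into `$NAME` completion labels.
function shellEnvCompletions(envJson: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(envJson);
  } catch {
    // The field is mid-edit; invalid JSON simply yields no suggestions.
    return [];
  }
  // Only a plain JSON object has meaningful keys for env vars.
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return [];
  }
  return Object.keys(parsed)
    .filter((key) => SHELL_IDENTIFIER.test(key))
    .map((key) => `$${key}`);
}
```

Silently returning an empty list on unparsable JSON keeps autocomplete quiet while the user is still typing, rather than raising an error on every keystroke.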
  ## 0.17.1
 
  ### Patch Changes
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@checkstack/backend-api",
-   "version": "0.17.1",
+   "version": "0.19.0",
    "license": "Elastic-2.0",
    "type": "module",
    "main": "./src/index.ts",
@@ -10,11 +10,11 @@
      "lint:code": "eslint . --max-warnings 0"
    },
    "dependencies": {
-     "@checkstack/common": "0.11.0",
-     "@checkstack/healthcheck-common": "1.1.2",
-     "@checkstack/cache-api": "0.3.4",
-     "@checkstack/queue-api": "0.3.4",
-     "@checkstack/signal-common": "0.2.4",
+     "@checkstack/common": "0.12.0",
+     "@checkstack/healthcheck-common": "1.3.0",
+     "@checkstack/cache-api": "0.3.6",
+     "@checkstack/queue-api": "0.3.6",
+     "@checkstack/signal-common": "0.2.5",
      "@orpc/client": "^1.13.14",
      "@orpc/contract": "^1.13.14",
      "@orpc/openapi": "^1.13.2",
@@ -26,9 +26,11 @@
      "zod": "^4.2.1"
    },
    "devDependencies": {
-     "@types/bun": "latest",
+     "@checkstack/scripts": "0.3.4",
      "@checkstack/tsconfig": "0.0.7",
-     "@checkstack/scripts": "0.3.3"
+     "@types/bun": "latest",
+     "@types/pg": "^8.20.0",
+     "pg": "^8.21.0"
    },
    "peerDependencies": {
      "hono": "^4.12.14",
@@ -0,0 +1,29 @@
+ import { describe, it, expect } from "bun:test";
+ import { SYSTEM_ACTOR } from "@checkstack/common";
+ import { resolveActor } from "./actor";
+
+ describe("resolveActor", () => {
+   it("falls back to the system actor when there is no caller", () => {
+     expect(resolveActor(undefined)).toEqual(SYSTEM_ACTOR);
+   });
+
+   it("maps a real (human) user", () => {
+     expect(
+       resolveActor({ type: "user", id: "user-1", name: "Nico" }),
+     ).toEqual({ type: "user", id: "user-1", name: "Nico" });
+   });
+
+   it("maps an application (API client)", () => {
+     expect(
+       resolveActor({ type: "application", id: "app-deploybot", name: "Deploy Bot" }),
+     ).toEqual({ type: "application", id: "app-deploybot", name: "Deploy Bot" });
+   });
+
+   it("maps a service to its originating plugin id", () => {
+     expect(resolveActor({ type: "service", pluginId: "healthcheck" })).toEqual({
+       type: "service",
+       id: "healthcheck",
+       name: "healthcheck",
+     });
+   });
+ });
package/src/actor.ts ADDED
@@ -0,0 +1,27 @@
+ import { SYSTEM_ACTOR, type Actor } from "@checkstack/common";
+ import type { AuthUser } from "./types";
+
+ /**
+  * Resolve the canonical platform {@link Actor} for an event from the
+  * authenticated caller. Background / unauthenticated emits (no `user`)
+  * resolve to the system actor, so every emitted event carries an actor.
+  *
+  * - {@link RealUser} -> `{ type: "user" }`
+  * - {@link ApplicationUser} -> `{ type: "application" }`
+  * - {@link ServiceUser} -> `{ type: "service", id: pluginId }`
+  * - `undefined` -> {@link SYSTEM_ACTOR}
+  */
+ export function resolveActor(user?: AuthUser): Actor {
+   if (!user) return SYSTEM_ACTOR;
+   switch (user.type) {
+     case "user": {
+       return { type: "user", id: user.id, name: user.name };
+     }
+     case "application": {
+       return { type: "application", id: user.id, name: user.name };
+     }
+     case "service": {
+       return { type: "service", id: user.pluginId, name: user.pluginId };
+     }
+   }
+ }
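`resolveActor` produces the actor; the 0.18.0 changelog says `@checkstack/backend` then wraps each emitted hook in an envelope so the actor survives the distributed queue, and treats raw/legacy queue data as a system-actor payload. The round trip can be sketched as follows, assuming a JSON-serialising queue. The envelope layout, the `__hookEnvelope` marker, `SKETCH_SYSTEM_ACTOR`, and both function names are hypothetical, and a local `Actor` type stands in for the one exported by `@checkstack/common`.

```typescript
// Local stand-in for the Actor shape that resolveActor returns.
type ActorType = "user" | "application" | "service" | "system";
interface Actor {
  type: ActorType;
  id: string;
  name: string;
}

// Hypothetical placeholder: the real SYSTEM_ACTOR value is not shown in this diff.
const SKETCH_SYSTEM_ACTOR: Actor = { type: "system", id: "system", name: "system" };

// Hypothetical envelope layout; the marker field name is invented for this sketch.
interface HookEnvelope<P> {
  __hookEnvelope: 1;
  payload: P;
  meta: { actor: Actor };
}

// Emit side: background emits with no caller default to the system actor.
function wrapHookPayload<P>(payload: P, actor: Actor = SKETCH_SYSTEM_ACTOR): HookEnvelope<P> {
  return { __hookEnvelope: 1, payload, meta: { actor } };
}

function isHookEnvelope(data: unknown): data is HookEnvelope<unknown> {
  return (
    typeof data === "object" &&
    data !== null &&
    (data as { __hookEnvelope?: unknown }).__hookEnvelope === 1
  );
}

// Delivery side: unwrap envelopes; raw/legacy queue data passes through
// unchanged and is attributed to the system actor.
function unwrapHookPayload(data: unknown): { payload: unknown; meta: { actor: Actor } } {
  if (isHookEnvelope(data)) {
    return { payload: data.payload, meta: data.meta };
  }
  return { payload: data, meta: { actor: SKETCH_SYSTEM_ACTOR } };
}
```

An explicit marker field is what keeps the legacy fallback safe: an ordinary payload that happens to have its own `payload` or `meta` keys is not mistaken for an envelope.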
@@ -0,0 +1,111 @@
+ /**
+  * Integration test (real Postgres) for the advisory-lock service.
+  *
+  * This is part of the surgical integration lane (plan §14.4 #1). It pins the
+  * one behaviour fakes cannot model faithfully: Postgres session-level advisory
+  * locks are tied to the DB *connection* that acquired them, so the holding
+  * client must be the same one that releases, and killing the holding
+  * connection must auto-release the lock.
+  *
+  * Gated behind `CHECKSTACK_IT=1` so the default `bun test` never runs it. The
+  * `integration` CI job sets that flag and provides a real Postgres service
+  * container. Connection comes from `CHECKSTACK_IT_PG_URL` (defaulting to the
+  * `docker-compose-dev.yml` Postgres port).
+  */
+ import { afterAll, beforeAll, describe, expect, it } from "bun:test";
+ import { Pool } from "pg";
+ import { createAdvisoryLockService } from "./advisory-lock";
+
+ const PG_URL =
+   process.env.CHECKSTACK_IT_PG_URL ??
+   "postgres://postgres:postgres@localhost:5432/postgres";
+
+ describe.skipIf(!process.env.CHECKSTACK_IT)(
+   "advisory-lock (real Postgres)",
+   () => {
+     let pool: Pool;
+
+     beforeAll(() => {
+       pool = new Pool({ connectionString: PG_URL });
+       // A pooled client can error asynchronously while idle (e.g. its backend
+       // is terminated by the kill test below). pg emits that on the pool; with
+       // no handler it surfaces as an unhandled "Connection terminated
+       // unexpectedly" error that fails the whole test file. Swallowing
+       // idle-client errors is the documented pg pattern; the tests still
+       // assert behaviour through fresh checkouts.
+       pool.on("error", () => {});
+     });
+
+     afterAll(async () => {
+       await pool.end();
+     });
+
+     it("a second tryAcquire of the same key returns null until release", async () => {
+       const service = createAdvisoryLockService(pool);
+       const key = `it-advisory-lock:${crypto.randomUUID()}`;
+
+       const first = await service.tryAcquire(key);
+       expect(first).not.toBeNull();
+
+       // The lock is held, so a concurrent acquire of the SAME key must fail.
+       const second = await service.tryAcquire(key);
+       expect(second).toBeNull();
+
+       // After release, a third acquire succeeds.
+       await first?.release();
+       const third = await service.tryAcquire(key);
+       expect(third).not.toBeNull();
+       await third?.release();
+     });
+
+     it("killing the holding connection auto-releases the lock", async () => {
+       const service = createAdvisoryLockService(pool);
+       const key = `it-advisory-lock:${crypto.randomUUID()}`;
+
+       // Acquire on a dedicated client owned by the handle.
+       const held = await service.tryAcquire(key);
+       expect(held).not.toBeNull();
+
+       // While held, the key is unavailable.
+       const blocked = await service.tryAcquire(key);
+       expect(blocked).toBeNull();
+
+       // Terminate ONLY the backend holding the advisory lock (found via
+       // `pg_locks`) from a fresh connection. Dropping that session makes
+       // Postgres auto-release the lock. We deliberately do NOT kill every other
+       // backend (the old approach): that also terminated the pool's idle
+       // connections, whose async "connection terminated" errors flaked the test
+       // and left the pool unusable. The query matches any session holding an
+       // advisory lock, so it assumes the dedicated test Postgres has no other
+       // advisory-lock holders; here that is just the handle's client.
+       const killer = await pool.connect();
+       try {
+         await killer.query(
+           `SELECT pg_terminate_backend(pid)
+            FROM pg_locks
+            WHERE locktype = 'advisory'
+              AND pid <> pg_backend_pid()`,
+         );
+       } finally {
+         killer.release();
+       }
+
+       // The lock should now be acquirable again. Retry briefly because the
+       // server takes a moment to reap the terminated backend's session locks.
+       let reacquired: Awaited<ReturnType<typeof service.tryAcquire>> = null;
+       for (let attempt = 0; attempt < 20 && reacquired === null; attempt++) {
+         reacquired = await service.tryAcquire(key);
+         if (reacquired === null) {
+           await new Promise((resolve) => setTimeout(resolve, 50));
+         }
+       }
+       expect(reacquired).not.toBeNull();
+       await reacquired?.release();
+
+       // The `held` handle still owns its (now-terminated) client. Release it so
+       // the dead client is returned to the pool; otherwise `pool.end()` in
+       // afterAll blocks waiting for the checked-out client to drain. The unlock
+       // query runs against a dead connection and rejects; that's expected.
+       await held?.release().catch(() => {});
+     });
+   },
+ );
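These tests pin the advisory-lock service's contract without showing the service itself: acquire and release on one checked-out client, hand the client back on a failed acquire, make release idempotent, and (per b995afb) keep an `'error'` listener on the held client. A minimal sketch that meets those expectations, assuming the `hashtextextended($1, 0)` keying mentioned in the unit test's fake, might look like this. `createAdvisoryLockSketch` and the `Lock*` types are hypothetical and are not the package's `createAdvisoryLockService`; the `removeListener` method is an extra assumption (pg's `Client` is an EventEmitter, so it has one).

```typescript
// Structural types loosely modelled on the AdvisoryLockPool /
// AdvisoryLockPoolClient shapes the unit test uses, plus removeListener.
interface LockClient {
  query<T>(text: string, values?: unknown[]): Promise<{ rows: T[] }>;
  release(): void;
  on(event: "error", listener: (err: Error) => void): unknown;
  removeListener(event: "error", listener: (err: Error) => void): unknown;
}
interface LockPool {
  connect(): Promise<LockClient>;
}
interface LockHandle {
  release(): Promise<void>;
}

function createAdvisoryLockSketch(pool: LockPool) {
  return {
    async tryAcquire(key: string): Promise<LockHandle | null> {
      const client = await pool.connect();
      let ok = false;
      try {
        const res = await client.query<{ ok: boolean }>(
          "SELECT pg_try_advisory_lock(hashtextextended($1, 0)) AS ok",
          [key],
        );
        ok = res.rows[0]?.ok === true;
      } finally {
        // Failed (or thrown) acquire: hand the client straight back.
        if (!ok) client.release();
      }
      if (!ok) return null;

      // A checked-out client with no 'error' listener crashes the process if
      // its backend is terminated; the server drops the session lock anyway.
      const onError = (): void => {};
      client.on("error", onError);

      let released = false;
      return {
        async release(): Promise<void> {
          if (released) return; // idempotent
          released = true;
          try {
            // Unlock on the SAME client: session locks belong to the connection.
            await client.query("SELECT pg_advisory_unlock(hashtextextended($1, 0))", [key]);
          } finally {
            client.removeListener("error", onError);
            client.release();
          }
        },
      };
    },
  };
}
```

Detaching the listener on release matters because pooled clients are reused: attaching a fresh listener on every acquire without removing it would accumulate listeners on long-lived connections.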
@@ -0,0 +1,132 @@
+ import { describe, it, expect } from "bun:test";
+ import {
+   createAdvisoryLockService,
+   type AdvisoryLockPool,
+   type AdvisoryLockPoolClient,
+ } from "./advisory-lock";
+
+ /**
+  * Faithful fake of a `pg.Pool` that models Postgres' per-connection
+  * SESSION advisory-lock semantics:
+  *
+  * - A key can be held by at most one connection at a time.
+  * - `pg_try_advisory_lock` succeeds only if the key is free; it then
+  *   binds the key to the acquiring connection.
+  * - `pg_advisory_unlock` only frees the key if THIS connection holds it
+  *   (a no-op otherwise). That is exactly the bug we are guarding against:
+  *   an unlock issued on a different connection does nothing.
+  *
+  * This lets the test prove the service keeps acquire + release on ONE
+  * client.
+  */
+ interface FakePool extends AdvisoryLockPool {
+   checkedOut: number;
+   released: number;
+ }
+
+ function makeFakePool(): FakePool {
+   // key -> owning connection id (or absent if free)
+   const heldBy = new Map<string, number>();
+   let nextConnId = 0;
+   const counters = { checkedOut: 0, released: 0 };
+
+   // hashtextextended($1, 0) is opaque here; we just key on the raw string,
+   // which is faithful since the SQL is deterministic per key.
+   function keyOf(values: unknown[] | undefined): string {
+     return String(values?.[0]);
+   }
+
+   return {
+     get checkedOut() {
+       return counters.checkedOut;
+     },
+     get released() {
+       return counters.released;
+     },
+     async connect(): Promise<AdvisoryLockPoolClient> {
+       const connId = nextConnId++;
+       counters.checkedOut++;
+       return {
+         async query<T>(queryText: string, values?: unknown[]) {
+           const key = keyOf(values);
+           if (queryText.includes("pg_try_advisory_lock")) {
+             const owner = heldBy.get(key);
+             const ok = owner === undefined;
+             if (ok) heldBy.set(key, connId);
+             return { rows: [{ ok } as unknown as T] };
+           }
+           if (queryText.includes("pg_advisory_unlock")) {
+             // Only the owning connection can release; this models the leak bug.
+             if (heldBy.get(key) === connId) heldBy.delete(key);
+             return { rows: [{ ok: true } as unknown as T] };
+           }
+           return { rows: [] };
+         },
+         release() {
+           counters.released++;
+         },
+         on() {
+           // The fake never emits async client errors; the real client's
+           // `on('error')` hardening is exercised by the IT against real
+           // Postgres (killing the holding connection).
+         },
+       };
+     },
+   };
+ }
+
+ describe("createAdvisoryLockService", () => {
+   it("acquire → second acquire fails while held → release → third acquire succeeds", async () => {
+     const pool = makeFakePool();
+     const svc = createAdvisoryLockService(pool);
+
+     const first = await svc.tryAcquire("k");
+     expect(first).not.toBeNull();
+
+     // Held: a second acquire (different pooled connection) must fail.
+     const second = await svc.tryAcquire("k");
+     expect(second).toBeNull();
+
+     // Release on the SAME client that acquired (the bug is release no-op'ing
+     // because it ran on a different connection).
+     await first!.release();
+
+     const third = await svc.tryAcquire("k");
+     expect(third).not.toBeNull();
+     await third!.release();
+   });
+
+   it("returns the client to the pool on both the failed-acquire and release paths", async () => {
+     const pool = makeFakePool();
+     const svc = createAdvisoryLockService(pool);
+
+     const h = await svc.tryAcquire("k");
+     const blocked = await svc.tryAcquire("k"); // fails → must release client
+     expect(blocked).toBeNull();
+     await h!.release();
+
+     // 2 connects (one held+released, one failed+released) => 2 releases.
+     expect(pool.checkedOut).toBe(2);
+     expect(pool.released).toBe(2);
+   });
+
+   it("release is idempotent", async () => {
+     const pool = makeFakePool();
+     const svc = createAdvisoryLockService(pool);
+     const h = await svc.tryAcquire("k");
+     await h!.release();
+     await h!.release(); // no throw, no double client.release
119
+ expect(pool.released).toBe(1);
120
+ });
121
+
122
+ it("different keys do not block each other", async () => {
123
+ const pool = makeFakePool();
124
+ const svc = createAdvisoryLockService(pool);
125
+ const a = await svc.tryAcquire("a");
126
+ const b = await svc.tryAcquire("b");
127
+ expect(a).not.toBeNull();
128
+ expect(b).not.toBeNull();
129
+ await a!.release();
130
+ await b!.release();
131
+ });
132
+ });
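The fake pool above depends on one rule: `pg_advisory_unlock` frees a key only when it runs on the connection that holds it. The sketch below models that rule on its own. It contrasts a proxy that borrows a fresh connection for every statement with a caller that uses one connection for both statements. `SessionLockModel`, `naiveProxy` and `affine` are hypothetical names for illustration only. They are not exports of `@checkstack/backend-api`.

```typescript
// Hypothetical in-memory model of Postgres session advisory locks:
// each key is owned by at most one connection id.
class SessionLockModel {
  private heldBy = new Map<string, number>();
  private nextConn = 0;

  /** Borrow a "connection" (modelled as a fresh numeric id). */
  connect(): number {
    return this.nextConn++;
  }

  /** Like pg_try_advisory_lock: succeeds only if the key is free. */
  tryLock(conn: number, key: string): boolean {
    if (this.heldBy.has(key)) return false;
    this.heldBy.set(key, conn);
    return true;
  }

  /** Like pg_advisory_unlock: only the owning connection frees the key. */
  unlock(conn: number, key: string): boolean {
    if (this.heldBy.get(key) !== conn) return false; // wrong session: no-op
    this.heldBy.delete(key);
    return true;
  }

  isHeld(key: string): boolean {
    return this.heldBy.has(key);
  }
}

// Proxy-style: each statement borrows its own connection, so the unlock
// runs on a different connection than the lock and silently no-ops.
function naiveProxy(db: SessionLockModel, key: string): void {
  db.tryLock(db.connect(), key);
  db.unlock(db.connect(), key);
}

// Affine: one dedicated connection for both acquire and release.
function affine(db: SessionLockModel, key: string): void {
  const conn = db.connect();
  if (db.tryLock(conn, key)) db.unlock(conn, key);
}
```

After `naiveProxy` the key is still held, which is the leak. After `affine` it is free again.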
@@ -0,0 +1,174 @@
1
+ /**
2
+ * Postgres advisory-lock helpers with correct connection affinity.
3
+ *
4
+ * Postgres session-level advisory locks (`pg_try_advisory_lock` /
5
+ * `pg_advisory_unlock`) are tied to the DB *session* (connection) that
6
+ * acquired them. The platform runs every plugin query through a
7
+ * schema-scoped proxy that wraps each statement in its own short
8
+ * transaction on a connection borrowed from the shared pool and returned
9
+ * immediately. Acquiring a session lock through that proxy therefore runs
10
+ * the lock on one pooled connection and the unlock on a *different* one —
11
+ * so the unlock no-ops and the lock leaks until the original connection is
12
+ * recycled. This module fixes that two ways:
13
+ *
14
+ * - {@link AdvisoryLockService.tryAcquire} checks out ONE dedicated
15
+ * client from the pool, acquires the session lock on it, and returns a
16
+ * handle that owns that client. `release()` runs the unlock on the SAME
17
+ * client and then returns it to the pool. Use this for long-held locks
18
+ * (e.g. an installer election held across a minutes-long `bun install`)
19
+ * where a long-open transaction would be unacceptable.
20
+ *
21
+ * - {@link withXactLock} wraps acquire + work + release in a single
22
+ * transaction using `pg_advisory_xact_lock`, which auto-releases at
23
+ * COMMIT/ROLLBACK. Use this for SHORT critical sections (e.g. a
24
+ * find-then-create dedup) where holding a transaction for the duration
25
+ * is fine and the auto-release removes any chance of a leak.
26
+ *
27
+ * Keys are arbitrary strings hashed to Postgres' 64-bit lock space via
28
+ * `hashtextextended(key, 0)`. Callers SHOULD namespace keys (e.g.
29
+ * `"script-packages.installer"`, `"incident.dedupe:<systemId>"`) since the
30
+ * advisory-lock space is global to the database server, not schema-scoped.
31
+ */
32
+ import { sql } from "drizzle-orm";
33
+ import type { SafeDatabase } from "./plugin-system";
34
+
35
+ /**
36
+ * Minimal pool surface this module needs. Modelled on `pg.Pool` /
37
+ * `pg.PoolClient` without importing `pg` directly so the helper stays a
38
+ * pure type-level contract; the backend wires in the real `adminPool`.
39
+ */
40
+ export interface AdvisoryLockPoolClient {
41
+ query<T>(
42
+ queryText: string,
43
+ values?: unknown[],
44
+ ): Promise<{ rows: T[] }>;
45
+ /** Return the client to the pool. */
46
+ release(): void;
47
+ /**
48
+ * Subscribe to async client errors. A session-lock client is held for a long
49
+ * time; if its backend dies (admin termination, failover, network drop) `pg`
50
+ * emits `'error'` on the client, and an `'error'` with no listener is
51
+ * re-thrown by the EventEmitter and would crash the pod. We attach a listener
52
+ * so that loss degrades gracefully instead. Modelled on `pg.Client.on`.
53
+ */
54
+ on(event: "error", listener: (err: Error) => void): void;
55
+ }
56
+
57
+ export interface AdvisoryLockPool {
58
+ connect(): Promise<AdvisoryLockPoolClient>;
59
+ }
60
+
61
+ /** A held session-level advisory lock that owns its dedicated client. */
62
+ export interface AdvisoryLockHandle {
63
+ /**
64
+ * Release the lock (`pg_advisory_unlock` on the SAME client) and return
65
+ * the client to the pool. Idempotent: a second call is a no-op.
66
+ */
67
+ release(): Promise<void>;
68
+ }
69
+
70
+ export interface AdvisoryLockService {
71
+ /**
72
+ * Try to acquire a session-level advisory lock for `key` on a dedicated
73
+ * pooled client. Returns a handle on success, or `null` if the lock is
74
+ * already held (by this or another process). The handle owns the client
75
+ * until `release()` is called, so callers MUST always release in a
76
+ * `finally`.
77
+ */
78
+ tryAcquire(key: string): Promise<AdvisoryLockHandle | null>;
79
+ }
80
+
81
+ /**
82
+ * Build an {@link AdvisoryLockService} backed by a pool. The backend
83
+ * provides the real admin pool; tests can provide a faithful fake that
84
+ * models per-connection session-lock semantics.
85
+ */
86
+ export function createAdvisoryLockService(
87
+ pool: AdvisoryLockPool,
88
+ ): AdvisoryLockService {
89
+ return {
90
+ async tryAcquire(key) {
91
+ const client = await pool.connect();
92
+ // A held session lock keeps this client checked out (not idle), so the
93
+ // pool's own error handler won't cover it. If this backend is terminated
94
+ // (admin kill / failover) while the lock is held, `pg` emits `'error'`
95
+ // here; without a listener the process crashes. Swallow it: the session
96
+ // lock is auto-released server-side when the backend dies, and a later
97
+ // `release()` still returns the client in its `finally` even if the unlock
98
+ // rejects, so the loss surfaces as the key becoming acquirable again.
99
+ client.on("error", () => {});
100
+ let acquired = false;
101
+ try {
102
+ const result = await client.query<{ ok: boolean }>(
103
+ "SELECT pg_try_advisory_lock(hashtextextended($1, 0)) AS ok",
104
+ [key],
105
+ );
106
+ acquired = Boolean(result.rows[0]?.ok);
107
+ } catch (error) {
108
+ client.release();
109
+ throw error;
110
+ }
111
+ if (!acquired) {
112
+ // Did not get the lock — return the client immediately. (A failed
113
+ // pg_try_advisory_lock acquires nothing, so there is nothing to
114
+ // unlock.)
115
+ client.release();
116
+ return null;
117
+ }
118
+
119
+ let released = false;
120
+ return {
121
+ async release() {
122
+ if (released) return;
123
+ released = true;
124
+ try {
125
+ await client.query(
126
+ "SELECT pg_advisory_unlock(hashtextextended($1, 0))",
127
+ [key],
128
+ );
129
+ } finally {
130
+ client.release();
131
+ }
132
+ },
133
+ };
134
+ },
135
+ };
136
+ }
137
+
138
+ /**
139
+ * Run `fn` while holding a transaction-scoped advisory lock for `key`. The
140
+ * lock is acquired with `pg_advisory_xact_lock` (which BLOCKS until granted)
141
+ * inside a transaction and auto-released at COMMIT/ROLLBACK, so there is no
142
+ * unlock to leak. Use only for SHORT critical sections — the lock is held
143
+ * for the whole transaction.
144
+ *
145
+ * Because the scoped DB runs an entire `transaction()` callback on a single
146
+ * dedicated connection, the lock + the work + the implicit release all share
147
+ * one session, which is exactly the connection affinity advisory locks require.
148
+ *
149
+ * `fn` receives the transaction handle `tx` and MUST run its
150
+ * read-then-write critical section on it (not on the outer pool). Running
151
+ * the work on the pool would put it on a DIFFERENT connection than the one
152
+ * holding the lock — so two concurrent callers' critical sections could
153
+ * interleave even though both "hold" the lock. Using `tx` keeps the
154
+ * read-check + write atomic with respect to the lock.
155
+ */
156
+ export async function withXactLock<
157
+ S extends Record<string, unknown>,
158
+ T,
159
+ >({
160
+ db,
161
+ key,
162
+ fn,
163
+ }: {
164
+ db: SafeDatabase<S>;
165
+ key: string;
166
+ fn: (tx: Parameters<Parameters<SafeDatabase<S>["transaction"]>[0]>[0]) => Promise<T>;
167
+ }): Promise<T> {
168
+ return db.transaction(async (tx) => {
169
+ await tx.execute(
170
+ sql`SELECT pg_advisory_xact_lock(hashtextextended(${key}, 0))`,
171
+ );
172
+ return fn(tx);
173
+ });
174
+ }
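The handle keeps a pooled client checked out until `release()` is called, so callers should pair `tryAcquire` with `try`/`finally`. The sketch below shows that pattern using a hypothetical `runIfElected` helper and an in-memory stand-in for the service. The two interfaces are copied locally so the block runs on its own. The real service is registered under `coreServices.advisoryLock`.

```typescript
// Local copies of the interfaces from advisory-lock.ts (self-contained sketch).
interface AdvisoryLockHandle {
  release(): Promise<void>;
}
interface AdvisoryLockService {
  tryAcquire(key: string): Promise<AdvisoryLockHandle | null>;
}

// Hypothetical in-memory stand-in: one process-wide set of held keys.
function createInMemoryLockService(): AdvisoryLockService {
  const held = new Set<string>();
  return {
    async tryAcquire(key) {
      if (held.has(key)) return null;
      held.add(key);
      let released = false;
      return {
        async release() {
          if (released) return; // idempotent, like the real handle
          released = true;
          held.delete(key);
        },
      };
    },
  };
}

// Hypothetical helper: run `work` only if this caller wins the lock,
// and always release the lock (and, in the real service, its client).
async function runIfElected<T>(
  locks: AdvisoryLockService,
  key: string,
  work: () => Promise<T>,
): Promise<{ elected: true; value: T } | { elected: false }> {
  const handle = await locks.tryAcquire(key);
  if (handle === null) return { elected: false }; // someone else holds it
  try {
    return { elected: true, value: await work() };
  } finally {
    await handle.release();
  }
}
```

A losing contender returns at once instead of blocking. That fits a long-held election such as `"script-packages.installer"`.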
@@ -17,6 +17,15 @@ export interface CollectorResult<TResult> {
17
17
  error?: string;
18
18
  }
19
19
 
20
+ /**
21
+ * Curated, read-only metadata about the health check + system a collector
22
+ * run is for. Metadata only - never secrets/config.
23
+ */
24
+ export interface CollectorRunContext {
25
+ check: { id: string; name: string; intervalSeconds: number };
26
+ system: { id: string; name: string };
27
+ }
28
+
20
29
  /**
21
30
  * Generic collector strategy interface.
22
31
  *
@@ -71,12 +80,24 @@ export interface CollectorStrategy<
71
80
  * @param params.config - Validated collector configuration
72
81
  * @param params.client - Connected transport client
73
82
  * @param params.pluginId - ID of the transport strategy invoking this collector
83
+ * @param params.runContext - Curated, read-only metadata about the health
84
+ * check + system this run is for (metadata only, never secrets/config)
74
85
  * @returns Collector result with typed metadata
75
86
  */
76
87
  execute(params: {
77
88
  config: TConfig;
78
89
  client: TClient;
79
90
  pluginId: string;
91
+ runContext?: CollectorRunContext;
92
+ /**
93
+ * Resolved secret env for THIS run (the collector's declared
94
+ * `secretEnv` mapped to values), delivered just-in-time. Injected into
95
+ * the collector's script execution env and never persisted. Empty /
96
+ * absent when the collector declares no secrets. The collector is
97
+ * responsible for masking these values out of its returned output
98
+ * (source-side defense in depth).
99
+ */
100
+ secretEnv?: Record<string, string>;
80
101
  }): Promise<CollectorResult<TResult>>;
81
102
 
82
103
  /**
@@ -16,6 +16,7 @@ import type { PluginArtifactStore } from "./plugin-artifact-store";
16
16
  import type { EventBus } from "./event-bus-types";
17
17
  import type { WebSocketRouteRegistry } from "./ws-registry";
18
18
  import type { ReadinessRegistry } from "./readiness-registry";
19
+ import type { AdvisoryLockService } from "./advisory-lock";
19
20
 
20
21
  export * from "./types";
21
22
 
@@ -66,4 +67,10 @@ export const coreServices = {
66
67
  readinessRegistry: createServiceRef<ReadinessRegistry>(
67
68
  "core.readinessRegistry",
68
69
  ),
70
+ /**
71
+ * Postgres advisory-lock service backed by a dedicated pooled client, so
72
+ * session-level locks keep connection affinity across acquire/release.
73
+ * See {@link AdvisoryLockService}.
74
+ */
75
+ advisoryLock: createServiceRef<AdvisoryLockService>("core.advisoryLock"),
69
76
  };
@@ -1,5 +1,9 @@
1
- import { describe, expect, it } from "bun:test";
1
+ import { afterAll, beforeAll, describe, expect, it } from "bun:test";
2
+ import { mkdtemp, mkdir, writeFile, rm } from "node:fs/promises";
3
+ import { tmpdir } from "node:os";
4
+ import path from "node:path";
2
5
  import {
6
+ defaultEsmScriptRunner,
3
7
  normaliseUserScript,
4
8
  rewriteHelperImports,
5
9
  } from "./esm-script-runner";
@@ -167,3 +171,91 @@ describe("rewriteHelperImports", () => {
167
171
  expect(out).toBe(`import x from "${HELPER_URL}";`);
168
172
  });
169
173
  });
174
+
175
+ describe("defaultEsmScriptRunner resolutionRoot", () => {
176
+ let root: string;
177
+
178
+ beforeAll(async () => {
179
+ // A throwaway "store" with a node_modules holding one fake package.
180
+ root = await mkdtemp(path.join(tmpdir(), "cs-resroot-"));
181
+ const pkgDir = path.join(root, "node_modules", "fake-pkg");
182
+ await mkdir(pkgDir, { recursive: true });
183
+ await writeFile(
184
+ path.join(pkgDir, "package.json"),
185
+ JSON.stringify({ name: "fake-pkg", version: "1.0.0", main: "index.mjs" }),
186
+ );
187
+ await writeFile(
188
+ path.join(pkgDir, "index.mjs"),
189
+ "export const greeting = 'hello-from-pkg';\n",
190
+ );
191
+ });
192
+
193
+ afterAll(async () => {
194
+ await rm(root, { recursive: true, force: true });
195
+ });
196
+
197
+ it("lets a script import a package from <resolutionRoot>/node_modules", async () => {
198
+ const res = await defaultEsmScriptRunner.run({
199
+ script: `import { greeting } from "fake-pkg";\nexport default greeting;`,
200
+ context: {},
201
+ timeoutMs: 15_000,
202
+ resolutionRoot: root,
203
+ });
204
+ expect(res.error).toBeUndefined();
205
+ expect(res.result).toBe("hello-from-pkg");
206
+ });
207
+
208
+ it("cannot resolve the package without a resolutionRoot (backward-compatible isolation)", async () => {
209
+ const res = await defaultEsmScriptRunner.run({
210
+ script: `import { greeting } from "fake-pkg";\nexport default greeting;`,
211
+ context: {},
212
+ timeoutMs: 15_000,
213
+ });
214
+ // No resolutionRoot -> runs under os.tmpdir(), no node_modules -> the
215
+ // import fails, so no result is produced and an error is surfaced.
216
+ expect(res.result).toBeUndefined();
217
+ expect(res.error).toBeDefined();
218
+ });
219
+
220
+ it("does NOT auto-install a missing package from the registry (degradation)", async () => {
221
+ // A real, installable package name that is NOT in any resolutionRoot.
222
+ // With auto-install disabled in the per-run bunfig, Bun must error
223
+ // instead of silently fetching it from the registry.
224
+ const res = await defaultEsmScriptRunner.run({
225
+ script: `import isodd from "is-odd";\nexport default typeof isodd;`,
226
+ context: {},
227
+ timeoutMs: 20_000,
228
+ });
229
+ expect(res.result).toBeUndefined();
230
+ expect(res.error).toBeDefined();
231
+ });
232
+ });
233
+
234
+ describe("defaultEsmScriptRunner injected env", () => {
235
+ it("exposes injected env vars as process.env in the subprocess", async () => {
236
+ const res = await defaultEsmScriptRunner.run({
237
+ script: `export default process.env.API_TOKEN ?? null;`,
238
+ context: {},
239
+ timeoutMs: 15_000,
240
+ env: { API_TOKEN: "injected-secret-value" },
241
+ });
242
+ expect(res.error).toBeUndefined();
243
+ expect(res.result).toBe("injected-secret-value");
244
+ });
245
+
246
+ it("does NOT expose backend env that was not injected (isolation intact)", async () => {
247
+ // A backend secret present in the parent process must NOT leak through
248
+ // unless it was explicitly injected for this run.
249
+ process.env.__CS_TEST_BACKEND_SECRET = "must-not-leak";
250
+ try {
251
+ const res = await defaultEsmScriptRunner.run({
252
+ script: `export default process.env.__CS_TEST_BACKEND_SECRET ?? null;`,
253
+ context: {},
254
+ timeoutMs: 15_000,
255
+ });
256
+ expect(res.result).toBeNull();
257
+ } finally {
258
+ delete process.env.__CS_TEST_BACKEND_SECRET;
259
+ }
260
+ });
261
+ });
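The two injected-env tests above pin down a specific merge order:
- allowlisted parent vars pass through,
- per-run injected vars are layered on top and win on a key collision,
- nothing else from the parent is inherited.

The sketch below expresses that order as a pure function. `buildChildEnv` and `SAFE_NAMES` are hypothetical. The real allowlist lives in `SAFE_ENV_VARS` / `pickSafeEnv()`, and its contents are not shown in this diff.

```typescript
// Build a child-process env from an explicit allowlist plus per-run injections.
function buildChildEnv(
  parentEnv: Record<string, string | undefined>,
  safeNames: readonly string[],
  injected: Record<string, string> = {},
): Record<string, string> {
  const env: Record<string, string> = {};
  // Copy only allowlisted names that are actually set in the parent.
  for (const name of safeNames) {
    const value = parentEnv[name];
    if (value !== undefined) env[name] = value;
  }
  // The later spread wins, so an injected key overrides a safe var of the same name.
  return { ...env, ...injected };
}

// Hypothetical allowlist, for illustration only.
const SAFE_NAMES = ["PATH", "HOME", "TZ"] as const;
```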
@@ -86,6 +86,34 @@ export interface EsmScriptRunOptions {
86
86
  helperModuleName?: string;
87
87
  /** Name of the helper function injected as a global AND exported by the virtual module. */
88
88
  helperFunctionName?: string;
89
+ /**
90
+ * Optional directory the per-run temp dir is created *inside*, so Node /
91
+ * Bun module resolution walks up to `<resolutionRoot>/node_modules` and
92
+ * the user's script can `import` managed npm packages.
93
+ *
94
+ * When unset (the default), the per-run dir is created under
95
+ * `os.tmpdir()` exactly as before - backward-compatible, no node_modules
96
+ * visible, isolation unchanged. The script-packages reconciler points
97
+ * this at `<store>/current` (the atomically-flipped symlink to the
98
+ * active materialized tree).
99
+ *
100
+ * Execution isolation is unchanged either way: the subprocess still gets
101
+ * only `SAFE_ENV_VARS` plus any per-run `env`, so packages cannot read backend secrets.
102
+ */
103
+ resolutionRoot?: string;
104
+ /**
105
+ * Extra environment variables injected into the subprocess for THIS run
106
+ * only, merged on top of `SAFE_ENV_VARS`. The Secrets platform uses this
107
+ * to inject a run's resolved secret -> env allowlist (decision 5,
108
+ * least-privilege): only the consumer's declared secrets are injected,
109
+ * memory-only, for the lifetime of this run. It deliberately does NOT
110
+ * widen the ambient `SAFE_ENV_VARS` whitelist — the values live only in
111
+ * this options object and the spawned process env.
112
+ *
113
+ * The user's script reads these as `process.env.ENV_NAME`. On a key
114
+ * collision with a safe var, the injected value wins.
115
+ */
116
+ env?: Record<string, string>;
89
117
  }
90
118
 
91
119
  /**
@@ -301,14 +329,22 @@ export const defaultEsmScriptRunner: EsmScriptRunner = {
301
329
  timeoutMs,
302
330
  helperModuleName,
303
331
  helperFunctionName,
332
+ resolutionRoot,
333
+ env: injectedEnv,
304
334
  }) {
305
335
  const sessionId = randomUUID();
306
336
  const markerStart = `##__CS_SCRIPT_RESULT_${sessionId}_START__##`;
307
337
  const markerEnd = `##__CS_SCRIPT_RESULT_${sessionId}_END__##`;
308
338
 
309
- const tmpDir = await mkdtemp(path.join(tmpdir(), "checkstack-script-"));
339
+ // When a `resolutionRoot` is given, create the per-run dir *inside* it
340
+ // so module resolution walks up to `<resolutionRoot>/node_modules`.
341
+ // Otherwise fall back to `os.tmpdir()` (today's behavior - no
342
+ // node_modules visible, fully backward compatible).
343
+ const tmpBase = resolutionRoot ?? tmpdir();
344
+ const tmpDir = await mkdtemp(path.join(tmpBase, "checkstack-script-"));
310
345
  const userScriptPath = path.join(tmpDir, "user.mjs");
311
346
  const runnerPath = path.join(tmpDir, "runner.mjs");
347
+ const bunfigPath = path.join(tmpDir, "bunfig.toml");
312
348
 
313
349
  const hasHelper =
314
350
  typeof helperModuleName === "string" &&
@@ -348,6 +384,14 @@ export const defaultEsmScriptRunner: EsmScriptRunner = {
348
384
  })
349
385
  : normalisedSource;
350
386
 
387
+ // Disable Bun auto-install in the per-run dir ALWAYS. Without this,
388
+ // `import "any-package"` silently fetches from the registry (verified
389
+ // empirically), defeating the whole managed-allowlist model. With it,
390
+ // an import resolves ONLY against the reconciled `<resolutionRoot>/
391
+ // node_modules` (when set) and otherwise fails fast - the clear
392
+ // degradation the package feature requires.
393
+ await writeFile(bunfigPath, '[install]\nauto = "disable"\n', "utf8");
394
+
351
395
  await writeFile(userScriptPath, userSource, "utf8");
352
396
  await writeFile(
353
397
  runnerPath,
@@ -363,7 +407,14 @@ export const defaultEsmScriptRunner: EsmScriptRunner = {
363
407
 
364
408
  proc = spawn({
365
409
  cmd: [process.execPath, runnerPath],
366
- env: pickSafeEnv(),
410
+ // CWD = the per-run dir so Bun reads its `bunfig.toml`
411
+ // (auto-install disabled) and resolves modules from
412
+ // `<resolutionRoot>/node_modules` when set.
413
+ cwd: tmpDir,
414
+ // Per-run injected env wins over the safe-vars whitelist. The
415
+ // injected secret values live only here + the child process; they
416
+ // never widen the ambient SAFE_ENV_VARS.
417
+ env: { ...pickSafeEnv(), ...injectedEnv },
367
418
  stdout: "pipe",
368
419
  stderr: "pipe",
369
420
  });
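Creating the per-run dir inside `resolutionRoot` works because bare-specifier resolution walks upward. It checks a `node_modules` directory in the importing file's directory and then in each ancestor. The sketch below lists that lookup order. It is modelled on `NODE_MODULES_PATHS` from Node's CommonJS resolution docs, and it assumes Bun's Node-compatible resolver walks ancestors the same way. It handles POSIX paths only, and `/srv/store/current` is a made-up example path.

```typescript
import * as path from "node:path";

// List the node_modules directories searched for a bare import, nearest first.
function nodeModulesLookupPaths(startDir: string): string[] {
  const dirs: string[] = [];
  let dir = path.posix.resolve(startDir);
  for (;;) {
    // A directory already named node_modules is skipped.
    if (path.posix.basename(dir) !== "node_modules") {
      dirs.push(path.posix.join(dir, "node_modules"));
    }
    const parent = path.posix.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return dirs;
}
```

With no `resolutionRoot`, the walk starts under `os.tmpdir()`, where there is normally no `node_modules`. That, combined with the per-run `bunfig.toml` disabling auto-install, is why the fallback tests expect the import to fail.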
@@ -1,4 +1,9 @@
1
- import type { Hook, HookSubscribeOptions, HookUnsubscribe } from "./hooks";
1
+ import type {
2
+ Hook,
3
+ HookEventMeta,
4
+ HookSubscribeOptions,
5
+ HookUnsubscribe,
6
+ } from "./hooks";
2
7
 
3
8
  /**
4
9
  * EventBus interface for dependency injection
@@ -7,22 +12,26 @@ export interface EventBus {
7
12
  subscribe<T>(
8
13
  pluginId: string,
9
14
  hook: Hook<T>,
10
- listener: (payload: T) => Promise<void>,
15
+ listener: (payload: T, meta?: HookEventMeta) => Promise<void>,
11
16
  options?: HookSubscribeOptions
12
17
  ): Promise<HookUnsubscribe>;
13
18
 
14
19
  /**
15
20
  * Emit a hook through the distributed queue system.
16
21
  * All instances receive broadcast hooks; one instance handles work-queue hooks.
22
+ *
23
+ * `meta` carries event-envelope metadata (the acting `actor`). When omitted,
24
+ * the bus defaults to the system actor, so every emitted hook carries an
25
+ * actor even when emitted from a background/unauthenticated context.
17
26
  */
18
- emit<T>(hook: Hook<T>, payload: T): Promise<void>;
27
+ emit<T>(hook: Hook<T>, payload: T, meta?: HookEventMeta): Promise<void>;
19
28
 
20
29
  /**
21
30
  * Emit a hook locally only (not distributed).
22
31
  * Use for instance-local hooks that should only run on THIS instance.
23
32
  * Uses Promise.allSettled to ensure one listener error doesn't block others.
24
33
  */
25
- emitLocal<T>(hook: Hook<T>, payload: T): Promise<void>;
34
+ emitLocal<T>(hook: Hook<T>, payload: T, meta?: HookEventMeta): Promise<void>;
26
35
 
27
36
  shutdown(): Promise<void>;
28
37
  }
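The sketch below illustrates the `emit` contract with an in-memory bus. A caller may omit `meta`, and the bus then stamps the system actor, so every delivered event carries one. The `Actor` shape, `SYSTEM_ACTOR` and the string hook ids are hypothetical simplifications. The real `Actor` type comes from `@checkstack/common`, and the real bus takes `Hook<T>` objects.

```typescript
// Hypothetical stand-ins for the real Actor / HookEventMeta types.
type Actor = { kind: "system" } | { kind: "user"; userId: string };
interface HookEventMeta {
  actor: Actor;
}
type Listener<T> = (payload: T, meta?: HookEventMeta) => Promise<void>;

const SYSTEM_ACTOR: Actor = { kind: "system" };

// Minimal local-only bus keyed by a string hook id.
class InMemoryBus {
  private listeners = new Map<string, Listener<unknown>[]>();

  subscribe<T>(hookId: string, listener: Listener<T>): void {
    const list = this.listeners.get(hookId) ?? [];
    list.push(listener as Listener<unknown>);
    this.listeners.set(hookId, list);
  }

  async emit<T>(hookId: string, payload: T, meta?: HookEventMeta): Promise<void> {
    // Default the envelope so listeners always receive an actor.
    const envelope: HookEventMeta = meta ?? { actor: SYSTEM_ACTOR };
    const list = this.listeners.get(hookId) ?? [];
    // Like emitLocal: one failing listener does not block the others.
    await Promise.allSettled(list.map((l) => l(payload, envelope)));
  }
}
```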
package/src/hooks.ts CHANGED
@@ -1,4 +1,4 @@
1
- import type { AccessRule } from "@checkstack/common";
1
+ import type { AccessRule, Actor } from "@checkstack/common";
2
2
 
3
3
  /**
4
4
  * Hook definition for type-safe event emission and subscription
@@ -8,6 +8,19 @@ export interface Hook<T = unknown> {
8
8
  _type?: T; // Phantom type for TypeScript inference
9
9
  }
10
10
 
11
+ /**
12
+ * Envelope metadata that travels alongside every emitted hook payload,
13
+ * independent of the hook's typed payload. Injected centrally at emit time
14
+ * (from the request context, defaulting to the system actor) and delivered to
15
+ * subscribers as the optional second listener argument.
16
+ *
17
+ * The automation engine reads `actor` and exposes it to automations as
18
+ * `trigger.actor`, so a trigger filter can gate on who/what caused the event.
19
+ */
20
+ export interface HookEventMeta {
21
+ actor: Actor;
22
+ }
23
+
11
24
  /**
12
25
  * Create a typed hook
13
26
  */
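Because `meta` is an optional second argument, existing single-argument listeners still type-check. A listener that does read `meta` should tolerate its absence. The sketch below shows a subscriber-side gate in the spirit of the `trigger.actor` filter described in the `HookEventMeta` doc. It reuses a hypothetical `Actor` shape, since the real type is exported from `@checkstack/common` and its fields are not shown here.

```typescript
// Hypothetical Actor shape, for illustration only.
type Actor = { kind: "system" } | { kind: "user"; userId: string };
interface HookEventMeta {
  actor: Actor;
}

// True only for events a user caused. A missing envelope is treated as
// system-originated, matching the bus's system-actor default.
function isUserTriggered(meta?: HookEventMeta): boolean {
  return (meta?.actor.kind ?? "system") === "user";
}

// A pre-existing listener that ignores `meta` still fits the widened
// `(payload, meta?) => Promise<void>` signature.
const legacyListener: (payload: { id: string }, meta?: HookEventMeta) => Promise<void> =
  async (payload) => {
    void payload;
  };
```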
package/src/index.ts CHANGED
@@ -17,6 +17,7 @@ export * from "./rpc";
17
17
  export * from "./test-utils";
18
18
  export * from "./hooks";
19
19
  export * from "./event-bus-types";
20
+ export * from "./actor";
20
21
  export * from "./plugin-source";
21
22
  export * from "./plugin-artifact-store";
22
23
  export * from "./notification-strategy";
@@ -32,3 +33,4 @@ export * from "./incremental-aggregation";
32
33
  export * from "./aggregated-result";
33
34
  export * from "./ws-registry";
34
35
  export * from "./readiness-registry";
36
+ export * from "./advisory-lock";
@@ -2,7 +2,12 @@ import { NodePgDatabase } from "drizzle-orm/node-postgres";
2
2
  import { ServiceRef } from "./service-ref";
3
3
  import { ExtensionPoint } from "./extension-point";
4
4
  import type { AccessRule, PluginMetadata } from "@checkstack/common";
5
- import type { Hook, HookSubscribeOptions, HookUnsubscribe } from "./hooks";
5
+ import type {
6
+ Hook,
7
+ HookEventMeta,
8
+ HookSubscribeOptions,
9
+ HookUnsubscribe,
10
+ } from "./hooks";
6
11
  import { Router } from "@orpc/server";
7
12
  import { RpcContext } from "./rpc";
8
13
  import { AnyContractRouter } from "@orpc/contract";
@@ -48,7 +53,7 @@ export type AfterPluginsReadyContext = {
48
53
  */
49
54
  onHook: <T>(
50
55
  hook: Hook<T>,
51
- listener: (payload: T) => Promise<void>,
56
+ listener: (payload: T, meta?: HookEventMeta) => Promise<void>,
52
57
  options?: HookSubscribeOptions,
53
58
  ) => HookUnsubscribe;
54
59
  /**
@@ -80,6 +85,20 @@ export type BackendPluginRegistry = {
80
85
  ) => Promise<void>;
81
86
  }) => void;
82
87
  registerService: <S>(ref: ServiceRef<S>, impl: S) => void;
88
+ /**
89
+ * Resolve a platform service registered by another plugin under `ref`,
90
+ * using THIS plugin's identity as the consumer (for audit / scoped
91
+ * factories). Mirrors the standard dependency-injection resolution used
92
+ * for declared `deps`, but allows resolving ARBITRARY cross-plugin refs
93
+ * at runtime — the path used by the automation dispatch engine to hand
94
+ * `getService` to provider actions at execute time.
95
+ *
96
+ * Resolves the service, or throws a clear error if `ref` is not
97
+ * registered (it never silently returns `undefined`). Safe to call from
98
+ * `init` / `afterPluginsReady` onward, by which point services are
99
+ * registered.
100
+ */
101
+ getService: <S>(ref: ServiceRef<S>) => Promise<S>;
83
102
  registerExtensionPoint: <T>(ref: ExtensionPoint<T>, impl: T) => void;
84
103
  getExtensionPoint: <T>(ref: ExtensionPoint<T>) => T;
85
104
  /**
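The `getService` contract has three parts:
- resolve by ref,
- record the calling plugin's identity,
- throw rather than return `undefined` for an unregistered ref.

The sketch below models that contract with a map-backed registry. The changelog describes the real implementation as backed by `ServiceRegistry.get(ref, { pluginId })`. The `ServiceRef`, `makeRef` and `MiniServiceRegistry` below are simplified hypothetical stand-ins, because the real registry's internals are not in this diff.

```typescript
// Simplified, hypothetical stand-ins for ServiceRef / ServiceRegistry.
interface ServiceRef<S> {
  id: string;
  _type?: S; // phantom type, like Hook<T>
}
function makeRef<S>(id: string): ServiceRef<S> {
  return { id };
}

class MiniServiceRegistry {
  private impls = new Map<string, unknown>();
  readonly lookups: Array<{ ref: string; pluginId: string }> = [];

  register<S>(ref: ServiceRef<S>, impl: S): void {
    this.impls.set(ref.id, impl);
  }

  // Resolve on behalf of `pluginId`; never silently returns undefined.
  async get<S>(ref: ServiceRef<S>, opts: { pluginId: string }): Promise<S> {
    if (!this.impls.has(ref.id)) {
      throw new Error(
        `Service "${ref.id}" is not registered (requested by plugin "${opts.pluginId}")`,
      );
    }
    this.lookups.push({ ref: ref.id, pluginId: opts.pluginId });
    return this.impls.get(ref.id) as S;
  }
}

// A per-plugin `env.getService`, bound to the calling plugin's identity.
function makeGetService(registry: MiniServiceRegistry, pluginId: string) {
  return <S>(ref: ServiceRef<S>): Promise<S> => registry.get(ref, { pluginId });
}
```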
@@ -0,0 +1,44 @@
1
+ import { describe, expect, test } from "bun:test";
2
+ import { z } from "zod";
3
+ import { configString, withConfigMeta } from "./zod-config";
4
+ import { toJsonSchema } from "./schema-utils";
5
+
6
+ describe("toJsonSchema x-* metadata", () => {
7
+ test("propagates x-script-testable and x-editor-types onto the field", () => {
8
+ const schema = z.object({
9
+ script: configString({
10
+ "x-editor-types": ["typescript"],
11
+ "x-script-testable": true,
12
+ }),
13
+ });
14
+
15
+ const json = toJsonSchema(schema) as {
16
+ properties: Record<string, Record<string, unknown>>;
17
+ };
18
+
19
+ expect(json.properties.script?.["x-script-testable"]).toBe(true);
20
+ expect(json.properties.script?.["x-editor-types"]).toEqual(["typescript"]);
21
+ });
22
+
23
+ test("omits x-script-testable when not set", () => {
24
+ const schema = z.object({
25
+ plain: configString({}),
26
+ });
27
+ const json = toJsonSchema(schema) as {
28
+ properties: Record<string, Record<string, unknown>>;
29
+ };
30
+ expect("x-script-testable" in (json.properties.plain ?? {})).toBe(false);
31
+ });
32
+
33
+ test("propagates x-secret-env onto a record field via withConfigMeta", () => {
34
+ const schema = z.object({
35
+ secretEnv: withConfigMeta(z.record(z.string(), z.string()), {
36
+ "x-secret-env": true,
37
+ }),
38
+ });
39
+ const json = toJsonSchema(schema) as {
40
+ properties: Record<string, Record<string, unknown>>;
41
+ };
42
+ expect(json.properties.secretEnv?.["x-secret-env"]).toBe(true);
43
+ });
44
+ });
@@ -67,6 +67,12 @@ function addSchemaMetadata(
67
67
  if (meta["x-editor-types"]) {
68
68
  jsonField["x-editor-types"] = meta["x-editor-types"];
69
69
  }
70
+ if (meta["x-script-testable"]) {
71
+ jsonField["x-script-testable"] = true;
72
+ }
73
+ if (meta["x-secret-env"]) {
74
+ jsonField["x-secret-env"] = true;
75
+ }
70
76
  if (meta["x-hidden-when"]) {
71
77
  jsonField["x-hidden-when"] = meta["x-hidden-when"];
72
78
  }
package/src/zod-config.ts CHANGED
@@ -38,6 +38,22 @@ export interface ConfigMeta {
38
38
  * - "formdata": Key/value pair editor (URL-encoded)
39
39
  */
40
40
  "x-editor-types"?: EditorType[];
41
+ /**
42
+ * Mark this field as an inline script that can be tested in-UI. When the
43
+ * editor renders the field (via `MultiTypeEditorField`) and the owning
44
+ * page supplies a `scriptTestRenderer`, a test panel appears beneath the
45
+ * editor so operators can run the script against a sample context.
46
+ */
47
+ "x-script-testable"?: boolean;
48
+ /**
49
+ * Mark a record field as a secret -> env mapping
50
+ * (`{ ENV_NAME: "${{ secrets.NAME }}" }`). The editor renders a
51
+ * dedicated key (env name) + secret-name picker, with the available
52
+ * names supplied to `DynamicForm` via `secretNames` (from the secrets
53
+ * plugin's `listSecretNames`). Without the marker the record falls back
54
+ * to the plain JSON editor.
55
+ */
56
+ "x-secret-env"?: boolean;
41
57
  }
42
58
 
43
59
  /**
@@ -164,3 +180,20 @@ export function configBoolean(meta: ConfigMeta) {
164
180
  schema.register(configRegistry, meta);
165
181
  return schema;
166
182
  }
183
+
184
+ /**
185
+ * Attach config metadata to an existing schema (e.g. a `z.record`) and
186
+ * return it. Use this when a field's base schema is defined elsewhere
187
+ * (such as `secretEnvMappingSchema` from `@checkstack/secrets-common`) but
188
+ * still needs editor metadata like `x-secret-env`.
189
+ */
190
+ export function withConfigMeta<T extends z.ZodTypeAny>(
191
+ schema: T,
192
+ meta: ConfigMeta,
193
+ ): T {
194
+ // The registry is typed `z.registry<ConfigMeta>()`, so registering the
195
+ // meta is sound; the generic `T` confuses zod's conditional `.register`
196
+ // overload, so register through the base schema type.
197
+ (schema as z.ZodTypeAny).register(configRegistry, meta);
198
+ return schema;
199
+ }
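The `x-secret-env` marker describes mappings shaped like `{ ENV_NAME: "${{ secrets.NAME }}" }`. The sketch below turns such a mapping into the per-run `env` record that `EsmScriptRunOptions.env` accepts. It rests on three assumptions: exactly one reference per value, a `[A-Za-z0-9_]+` secret-name charset, and a synchronous lookup. `resolveSecretEnv` is hypothetical and is not the secrets plugin's actual resolver.

```typescript
// Matches exactly one `${{ secrets.NAME }}` reference, with optional inner spaces.
const SECRET_REF = /^\$\{\{\s*secrets\.([A-Za-z0-9_]+)\s*\}\}$/;

// Resolve an env-name -> secret-reference mapping into env-name -> value.
function resolveSecretEnv(
  mapping: Record<string, string>,
  lookup: (secretName: string) => string | undefined,
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const [envName, ref] of Object.entries(mapping)) {
    const match = SECRET_REF.exec(ref.trim());
    if (!match) {
      throw new Error(`${envName}: expected "\${{ secrets.NAME }}", got "${ref}"`);
    }
    const secretName = match[1] as string; // group 1 always participates in a match
    const value = lookup(secretName);
    if (value === undefined) {
      // Report the secret's name, never a value.
      throw new Error(`${envName}: secret "${secretName}" not found`);
    }
    env[envName] = value;
  }
  return env;
}
```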