@checkstack/signal-backend 0.2.12 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +51 -0
- package/package.json +3 -3
- package/src/signal-service-impl.test.ts +73 -0
- package/src/signal-service-impl.ts +50 -7
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,56 @@
 # @checkstack/signal-backend
 
+## 0.3.0
+
+### Minor Changes
+
+- 9dcc848: Stop the spurious "Plugin unknown is not using new API. Skipping." startup warning.
+
+  `@checkstack/signal-backend` is a host-consumed library (the backend imports `SignalServiceImpl` and `createWebSocketHandler` directly), but its `package.json` declared `checkstack.type: "backend"`, so plugin discovery inserted it as a runtime backend plugin and the loader tried to read a default `register()` export it does not have - logging the offending package as the literal `unknown`.
+
+  - Reclassify `@checkstack/signal-backend` to `checkstack.type: "tooling"` (like `@checkstack/backend-api`), so it is no longer discovered or registered as a backend plugin. No runtime behavior change - the SignalService and WebSocket handler are still instantiated and registered directly by the host backend.
+  - Harden the loader's skip diagnostic so it can never render `unknown`: it resolves the offending plugin by its database-row package name (falling back to the on-disk path) and tells operators to set `checkstack.type` to `"tooling"` for host-consumed libraries.
+
+  This is a beta minor.
+
+### Patch Changes
+
+- 9dcc848: Write-path hardening: post-commit side effects can no longer fail a committed write, multi-row mutations are now atomic, and retry-duplication is blocked at the database.
+
+  **Platform-level (automatic for all current and future plugins):**
+
+  - signal-backend: `SignalService` (broadcast / sendToUser / sendToUsers / sendToAuthorizedUsers) is now resilient by construction - a transient event-bus/queue failure is caught and logged instead of thrown. Real-time signals are best-effort UI nudges; the authoritative data is already committed by the time a mutation broadcasts, so a signal-transport blip must never turn a successful write into a client-visible error. Every plugin's broadcasts inherit this without per-call-site `try/catch` (which would inevitably be forgotten and regress). This mirrors `createCachedScope`, which already makes cache invalidation non-throwing - so the cache + signal halves of the "post-commit side effect fails the response" class are both closed at the platform seam. Durable side effects (events/hooks that drive automations, queue jobs) intentionally still surface failures. Documented in `developer-guide/backend/signals.md`.
+
+  **Atomic multi-write mutations (each previously committed row-by-row in autocommit, so a mid-sequence failure left partial/orphaned state):**
+
+  - slo-backend: `createObjective` now inserts the objective and its 1:1 streak row in one transaction; the post-create reconcile/status/notify steps are best-effort and can no longer fail the (committed) create.
+  - incident-backend: `createIncident`, `updateIncident`, `addUpdate`, and `resolveIncident` wrap their row + system-link + timeline writes in a transaction (no more wiped system associations on a failed re-insert, or status flips with no matching timeline entry).
+  - maintenance-backend: same for `createMaintenance`, `updateMaintenance`, `addUpdate`, `closeMaintenance`.
+  - automation-backend: `cancelRun` marks the run cancelled and tears down its wait locks + durable state in one transaction - previously a failure after the status update could leave a wait lock behind, letting a later trigger event resume an already-cancelled run.
+  - healthcheck-backend: `ingestSatelliteResult` commits the run row and its hourly-aggregate increment together (no orphaned run, no aggregate without a backing run). NOTE: this guarantees run/aggregate consistency but does not yet make a _duplicate satellite delivery_ idempotent - that needs a dedupe key on the high-volume runs table and is tracked as a follow-up.
+
+  **Retry-duplication blocked at the DB (paired with the SQLSTATE 23505 -> 409 mapping shipped separately):**
+
+  - catalog-backend: new unique indexes on `groups.name`, `environments.name` (consistent with `systems.name`), on `system_links (system_id, url)`, and on `system_contacts (system_id, user_id)` + `(system_id, email)` (NULLs are distinct, so user vs mailbox contacts don't interfere). Name uniqueness is CASE-INSENSITIVE: the three name indexes are functional `lower(name)` indexes (the existing `systems.name` index is rebuilt this way too), so "Api" and "api" collide while the stored value keeps its original casing. The systems pre-write name check (`getSystemByName`) is case-folded to match. Migration `0005` de-dupes any pre-existing rows first - names are preserved by suffixing later case-insensitive duplicates (" (2)", " (3)", ...), redundant contact/link rows are removed keeping the earliest. (Link URLs stay case-sensitive - URL paths are; contact emails are deduped exact-match.)
+  - incident-backend / maintenance-backend: unique index on `incident_links (incident_id, url)` / `maintenance_links (maintenance_id, url)`, with a de-dupe step in the migration.
+
+  **Behavior change:** creating a group/environment with a duplicate name, or attaching a duplicate contact/link, now returns `409 Conflict` instead of silently creating a duplicate. The migrations resolve existing duplicates on upgrade.
+
+  This is a beta patch.
+
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+  - @checkstack/backend-api@0.21.0
+  - @checkstack/common@0.13.0
+  - @checkstack/signal-common@0.2.6
+
 ## 0.2.12
 
 ### Patch Changes
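The retry-duplication fix in this changelog has two parts. First, the migration renames or removes existing duplicates so that the unique indexes can be built. Second, the write path turns the resulting PostgreSQL unique violation (SQLSTATE 23505) into a `409 Conflict`. The TypeScript below is a hedged, in-memory sketch of both rules as the changelog describes them. `suffixDuplicateNames` and `httpStatusForDbError` are hypothetical names. The real migration `0005` is SQL, and neither it nor the 409 mapping appears in this diff.

```typescript
// Hypothetical in-memory model of the rules described in the changelog;
// the real migration and the 23505 -> 409 mapping live in other packages.

interface NamedRow {
  id: number;
  name: string;
}

/**
 * Keep the first row for each case-insensitive name and rename later
 * duplicates by suffixing " (2)", " (3)", ... Rows are assumed to be
 * ordered oldest first. The original casing of each name is preserved.
 */
function suffixDuplicateNames(rows: NamedRow[]): NamedRow[] {
  // Lower-cased names already in use, including generated suffixed ones.
  const taken = new Set<string>();
  // Next suffix number to try for each lower-cased base name.
  const nextSuffix = new Map<string, number>();

  return rows.map((row) => {
    const key = row.name.toLowerCase();
    if (!taken.has(key)) {
      taken.add(key);
      return row;
    }
    let n = nextSuffix.get(key) ?? 2;
    let candidate = `${row.name} (${n})`;
    // Skip any suffix that would collide with a name already in use.
    while (taken.has(candidate.toLowerCase())) {
      n += 1;
      candidate = `${row.name} (${n})`;
    }
    nextSuffix.set(key, n + 1);
    taken.add(candidate.toLowerCase());
    return { ...row, name: candidate };
  });
}

// PostgreSQL reports a unique-constraint violation as SQLSTATE 23505.
const UNIQUE_VIOLATION = "23505";

/** Map a database error to an HTTP status: duplicates become 409 Conflict. */
function httpStatusForDbError(error: unknown): number {
  const code =
    typeof error === "object" && error !== null && "code" in error
      ? (error as { code: unknown }).code
      : undefined;
  return code === UNIQUE_VIOLATION ? 409 : 500;
}
```

In this model, the `toLowerCase()` key does the job of the functional `lower(name)` index: "Api" and "api" collide, but each row keeps its stored casing.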
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@checkstack/signal-backend",
-  "version": "0.
+  "version": "0.3.0",
   "license": "Elastic-2.0",
   "type": "module",
   "exports": {
@@ -11,7 +11,7 @@
   "dependencies": {
     "@checkstack/common": "0.12.0",
     "@checkstack/signal-common": "0.2.5",
-    "@checkstack/backend-api": "0.
+    "@checkstack/backend-api": "0.20.0"
   },
   "devDependencies": {
     "@types/bun": "latest",
@@ -26,6 +26,6 @@
     "lint:code": "eslint . --max-warnings 0"
   },
   "checkstack": {
-    "type": "
+    "type": "tooling"
   }
 }
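The `checkstack.type` change above is what removes this package from plugin discovery. The sketch below illustrates the two loader behaviors the changelog describes. First, a package typed `"tooling"` is not treated as a backend plugin. Second, the skip diagnostic names the package by its database-row package name and falls back to the on-disk path. `DiscoveredPackage`, `isRuntimeBackendPlugin`, and `skipDiagnostic` are hypothetical, and so is the message wording. The loader itself is not part of this diff.

```typescript
// Hypothetical shapes for illustration only; not Checkstack's actual loader API.
interface DiscoveredPackage {
  /** Package name from the plugin's database row, when known. */
  packageName?: string;
  /** On-disk location of the package. */
  path: string;
  /** Value of `checkstack.type` in the package's package.json. */
  checkstackType?: string;
}

/** Only "backend" packages are registered; host-consumed libraries use "tooling". */
function isRuntimeBackendPlugin(pkg: DiscoveredPackage): boolean {
  return pkg.checkstackType === "backend";
}

/** Build a skip message that always identifies the package (illustrative wording). */
function skipDiagnostic(pkg: DiscoveredPackage): string {
  // Prefer the database-row package name; fall back to the on-disk path.
  const label = pkg.packageName || pkg.path;
  return (
    `Plugin ${label} has no default register() export. Skipping. ` +
    `If it is a host-consumed library, set checkstack.type to "tooling".`
  );
}
```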
package/src/signal-service-impl.test.ts
CHANGED
@@ -161,6 +161,79 @@ describe("SignalServiceImpl", () => {
     });
   });
 
+  describe("resilience: a signal failure never fails the caller's committed write", () => {
+    let throwingService: SignalServiceImpl;
+    let throwingLogger: Logger;
+    let warnCalls: number;
+
+    beforeEach(() => {
+      warnCalls = 0;
+      const throwingEventBus = {
+        // Simulate a transient event-bus/queue outage on emit.
+        emit: mock(async () => {
+          throw new Error("event bus unavailable");
+        }),
+        subscribe: mock(async () => {}),
+        shutdown: mock(async () => {}),
+      } as unknown as EventBus;
+      throwingLogger = {
+        debug: mock(() => {}),
+        info: mock(() => {}),
+        warn: mock(() => {
+          warnCalls += 1;
+        }),
+        error: mock(() => {}),
+        child: mock(() => throwingLogger),
+      } as unknown as Logger;
+      throwingService = new SignalServiceImpl(throwingEventBus, throwingLogger);
+    });
+
+    it("broadcast resolves (does not throw) when the event bus emit fails, and logs a warning", async () => {
+      await expect(
+        throwingService.broadcast(TEST_BROADCAST_SIGNAL, { message: "hi" }),
+      ).resolves.toBeUndefined();
+      expect(warnCalls).toBe(1);
+    });
+
+    it("sendToUser resolves when the event bus emit fails", async () => {
+      await expect(
+        throwingService.sendToUser(TEST_USER_SIGNAL, "u1", {
+          notification: "n",
+          count: 1,
+        }),
+      ).resolves.toBeUndefined();
+      expect(warnCalls).toBe(1);
+    });
+
+    it("sendToUsers resolves when every per-user emit fails", async () => {
+      await expect(
+        throwingService.sendToUsers(TEST_USER_SIGNAL, ["u1", "u2"], {
+          notification: "n",
+          count: 1,
+        }),
+      ).resolves.toBeUndefined();
+      expect(warnCalls).toBe(2);
+    });
+
+    it("sendToAuthorizedUsers resolves when the auth-filter RPC fails", async () => {
+      throwingService.setAuthClient({
+        filterUsersByAccessRule: mock(async () => {
+          throw new Error("auth rpc down");
+        }),
+      });
+      await expect(
+        throwingService.sendToAuthorizedUsers(
+          TEST_USER_SIGNAL,
+          ["u1"],
+          { notification: "n", count: 1 },
+          testPluginMetadata,
+          { id: "thing.read" },
+        ),
+      ).resolves.toBeUndefined();
+      expect(warnCalls).toBe(1);
+    });
+  });
+
   describe("Signal Hooks", () => {
     it("should have correct hook IDs", () => {
       expect(SIGNAL_BROADCAST_HOOK.id).toBe("signal.internal.broadcast");
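The resilience tests above define the contract. Every send path resolves even when the event bus rejects, and it logs one warning per failed emit, so a two-user `sendToUsers` produces two warnings. Below is a standalone sketch of that best-effort wrapper. It assumes a minimal hypothetical `WarnLogger` interface in place of the real `Logger`. The `sendToEach` helper and its `Promise.all` fan-out are choices made here for illustration, because the diff does not show how `sendToUsers` iterates over recipients.

```typescript
// Minimal logger shape assumed for this sketch (the real Logger has more methods).
interface WarnLogger {
  warn(message: string, meta?: Record<string, unknown>): void;
}

/**
 * Await a best-effort side effect. A rejection is logged as a warning and
 * swallowed, so the caller's already-committed write still succeeds.
 */
async function safeSend(
  logger: WarnLogger,
  description: string,
  send: () => Promise<void>,
): Promise<void> {
  try {
    await send();
  } catch (error) {
    logger.warn(`Signal delivery failed (${description})`, { error });
  }
}

/**
 * Hypothetical fan-out helper: each emit has its own guard, so one failure
 * neither skips the remaining users nor rejects the combined promise.
 */
async function sendToEach(
  logger: WarnLogger,
  userIds: string[],
  emit: (userId: string) => Promise<void>,
): Promise<void> {
  await Promise.all(
    userIds.map((userId) =>
      safeSend(logger, `-> user ${userId}`, () => emit(userId)),
    ),
  );
}
```

Because the guard lives inside the helper, call sites need no `try/catch` of their own. This pattern suits only side effects that are genuinely best-effort. For that reason, the changelog keeps durable side effects (events/hooks that drive automations, queue jobs) surfacing their failures.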
package/src/signal-service-impl.ts
CHANGED
@@ -37,6 +37,34 @@ export class SignalServiceImpl implements SignalService {
     this.authClient = client;
   }
 
+  /**
+   * Real-time signals are BEST-EFFORT, by platform contract. They are a UI
+   * convenience (a live push so a client refreshes sooner); the authoritative
+   * data is already committed to the database by the time a mutation broadcasts.
+   * A transient event-bus/queue failure must therefore NEVER turn a successful,
+   * committed write into a client-visible error. Every signal-send routes
+   * through here so that guarantee holds for ALL callers - including future
+   * plugins - without each call site needing its own try/catch (which would
+   * inevitably be forgotten and regress). The mirror of `createCachedScope`,
+   * which already makes cache invalidation non-throwing the same way.
+   *
+   * If a send fails, the client simply misses one live nudge and picks up the
+   * (already-persisted) state on its next fetch/refetch.
+   */
+  private async safeSend(
+    description: string,
+    send: () => Promise<void>,
+  ): Promise<void> {
+    try {
+      await send();
+    } catch (error) {
+      this.logger.warn(
+        `Signal delivery failed (${description}) - the write succeeded; clients will reconcile on next fetch.`,
+        { error },
+      );
+    }
+  }
+
   async broadcast<T>(signal: Signal<T>, payload: T): Promise<void> {
     const message: SignalMessage<T> = {
       signalId: signal.id,
@@ -48,7 +76,9 @@ export class SignalServiceImpl implements SignalService {
     this.logger.debug(`Broadcasting signal: ${signal.id}`);
 
     // Emit to EventBus - all backend instances receive and push to their WebSocket clients
-    await this.
+    await this.safeSend(`broadcast ${signal.id}`, () =>
+      this.eventBus.emit(SIGNAL_BROADCAST_HOOK, message),
+    );
   }
 
   async sendToUser<T>(
@@ -65,7 +95,9 @@ export class SignalServiceImpl implements SignalService {
 
     this.logger.debug(`Sending signal ${signal.id} to user ${userId}`);
 
-    await this.
+    await this.safeSend(`${signal.id} -> user ${userId}`, () =>
+      this.eventBus.emit(SIGNAL_USER_HOOK, { userId, message }),
+    );
   }
 
   async sendToUsers<T>(
@@ -97,11 +129,22 @@ export class SignalServiceImpl implements SignalService {
     // Construct fully-qualified access rule ID: ${pluginMetadata.pluginId}.${accessRule.id}
     const qualifiedAccessRule = qualifyAccessRuleId(pluginMetadata, accessRule);
 
-    // Filter users via auth RPC
-
-
-
-
+    // Filter users via auth RPC. Best-effort like the send itself: if the auth
+    // lookup transiently fails, skip the live nudge rather than fail the caller's
+    // already-committed write.
+    let authorizedIds: string[];
+    try {
+      authorizedIds = await this.authClient.filterUsersByAccessRule({
+        userIds,
+        accessRule: qualifiedAccessRule,
+      });
+    } catch (error) {
+      this.logger.warn(
+        `Signal delivery failed (authz filter for ${signal.id}) - the write succeeded; clients will reconcile on next fetch.`,
+        { error },
+      );
+      return;
+    }
 
     if (authorizedIds.length === 0) {
       this.logger.debug(