@checkstack/healthcheck-backend 0.4.2 → 0.6.0

package/CHANGELOG.md CHANGED
@@ -1,5 +1,121 @@
  # @checkstack/healthcheck-backend
 
+ ## 0.6.0
+
+ ### Minor Changes
+
+ - 11d2679: Add ability to pause health check configurations globally. When paused, health checks continue to be scheduled but execution is skipped for all systems using that configuration. Users with manage access can pause/resume from the Health Checks config page.
+ - cce5453: Add notification suppression for incidents
+
+   - Added `suppressNotifications` field to incidents, allowing active incidents to optionally suppress health check notifications
+   - When enabled, health status change notifications will not be sent for affected systems while the incident is active (not resolved)
+   - Mirrors the existing maintenance notification suppression pattern
+   - Added toggle UI in the IncidentEditor dialog
+   - Added `hasActiveIncidentWithSuppression` RPC endpoint for service-to-service queries
+
+ ### Patch Changes
+
+ - Updated dependencies [11d2679]
+ - Updated dependencies [cce5453]
+   - @checkstack/healthcheck-common@0.6.0
+   - @checkstack/incident-common@0.4.0
+
+ ## 0.5.0
+
+ ### Minor Changes
+
+ - 095cf4e: ### Cross-Tier Data Aggregation
+
+   Implements intelligent cross-tier querying for health check history, enabling seamless data retrieval across raw, hourly, and daily storage tiers.
+
+   **What changed:**
+
+   - `getAggregatedHistory` now queries all three tiers (raw, hourly, daily) in parallel
+   - Added `NormalizedBucket` type for unified bucket format across tiers
+   - Added `mergeTieredBuckets()` to merge data with priority (raw > hourly > daily)
+   - Added `combineBuckets()` and `reaggregateBuckets()` for re-aggregation to target bucket size
+   - Raw data preserves full granularity when available (uses target bucket interval)
+
+   **Why:**
+
+   - Previously, the API only queried raw runs, which are retained for a limited period (default 7 days)
+   - For longer time ranges, data was missing because hourly/daily aggregates weren't queried
+   - The retention job only runs periodically, so we can't assume tier boundaries based on config
+   - Querying all tiers ensures no gaps in data coverage
+
+   **Technical details:**
+
+   - Additive metrics (counts, latencySum) are summed correctly for accurate averages
+   - p95 latency uses max of source p95s as conservative upper-bound approximation
+   - `aggregatedResult` (strategy-specific) is preserved for raw-only buckets
+
+ - ac3a4cf: ### Dynamic Bucket Sizing for Health Check Visualization
+
+   Implements industry-standard dynamic bucket sizing for health check data aggregation, following patterns from Grafana/VictoriaMetrics.
+
+   **What changed:**
+
+   - Replaced fixed `bucketSize: "hourly" | "daily" | "auto"` with dynamic `targetPoints` parameter (default: 500)
+   - Bucket interval is now calculated as `(endDate - startDate) / targetPoints` with a minimum of 1 second
+   - Added `bucketIntervalSeconds` to aggregated response and individual buckets
+   - Updated chart components to use dynamic time formatting based on bucket interval
+
+   **Why:**
+
+   - A 24-hour view with 1-second health checks previously returned 86,400+ data points, causing lag
+   - Now returns ~500 data points regardless of timeframe, ensuring consistent chart performance
+   - Charts still preserve visual fidelity through proper aggregation
+
+   **Breaking Change:**
+
+   - `bucketSize` parameter removed from `getAggregatedHistory` and `getDetailedAggregatedHistory` endpoints
+   - Use `targetPoints` instead (defaults to 500 if not specified)
+
+   ***
+
+   ### Collector Aggregated Charts Fix
+
+   Fixed issue where collector auto-charts (like HTTP request response time charts) were not showing in aggregated data mode.
+
+   **What changed:**
+
+   - Added `aggregatedResultSchema` to `CollectorDtoSchema`
+   - Backend now returns collector aggregated schemas via `getCollectors` endpoint
+   - Frontend `useStrategySchemas` hook now merges collector aggregated schemas
+   - Service now calls each collector's `aggregateResult()` when building buckets
+   - Aggregated collector data stored in `aggregatedResult.collectors[uuid]`
+
+   **Why:**
+
+   - Previously only strategy-level aggregated results were computed
+   - Collectors like HTTP Request Collector have their own `aggregateResult` method
+   - Without calling these, fields like `avgResponseTimeMs` and `successRate` were missing from aggregated buckets
+
+ - db1f56f: Add ephemeral field stripping to reduce database storage for health checks
+
+   - Added `x-ephemeral` metadata flag to `HealthResultMeta` for marking fields that should not be persisted
+   - All health result factory functions (`healthResultString`, `healthResultNumber`, `healthResultBoolean`, `healthResultArray`, `healthResultJSONPath`) now accept `x-ephemeral`
+   - Added `stripEphemeralFields()` utility to remove ephemeral fields before database storage
+   - Integrated ephemeral field stripping into `queue-executor.ts` for all collector results
+   - HTTP Request collector now explicitly marks `body` as ephemeral
+
+   This significantly reduces database storage for health checks with large response bodies, while still allowing assertions to run against the full response at execution time.
+
+ ### Patch Changes
+
+ - Updated dependencies [ac3a4cf]
+ - Updated dependencies [db1f56f]
+   - @checkstack/healthcheck-common@0.5.0
+   - @checkstack/common@0.6.0
+   - @checkstack/backend-api@0.5.1
+   - @checkstack/catalog-backend@0.2.8
+   - @checkstack/catalog-common@1.2.4
+   - @checkstack/command-backend@0.1.7
+   - @checkstack/integration-backend@0.1.7
+   - @checkstack/maintenance-common@0.4.2
+   - @checkstack/signal-common@0.1.4
+   - @checkstack/queue-api@0.2.1
+
  ## 0.4.2
 
  ### Patch Changes
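
Reviewer note on the 0.5.0 entries in the hunk above: the changelog describes the bucket interval as `(endDate - startDate) / targetPoints` with a 1-second minimum, and a tier merge with priority raw > hourly > daily. The sketch below illustrates both rules under stated assumptions. The names `TieredBucket`, `bucketIntervalSeconds` (as a function) and `mergeByPriority` are hypothetical and are not taken from the package; rounding down to whole seconds is also an assumption, and the merge assumes buckets from different tiers have already been aligned to the same start timestamps.

```typescript
// Hypothetical shapes for illustration; the package's real types are not shown in this diff.
type Tier = "raw" | "hourly" | "daily";

interface TieredBucket {
  bucketStart: number; // epoch milliseconds
  tier: Tier;
  runCount: number;
}

// Interval = (end - start) / targetPoints, never below 1 second.
// Flooring to whole seconds is an assumption of this sketch.
function bucketIntervalSeconds(start: Date, end: Date, targetPoints = 500): number {
  const rangeSeconds = (end.getTime() - start.getTime()) / 1000;
  return Math.max(1, Math.floor(rangeSeconds / targetPoints));
}

// A lower number wins when two tiers cover the same bucket start.
const TIER_PRIORITY: Record<Tier, number> = { raw: 0, hourly: 1, daily: 2 };

// Keep one bucket per start timestamp, preferring raw over hourly over daily.
function mergeByPriority(buckets: TieredBucket[]): TieredBucket[] {
  const byStart = new Map<number, TieredBucket>();
  for (const bucket of buckets) {
    const existing = byStart.get(bucket.bucketStart);
    if (!existing || TIER_PRIORITY[bucket.tier] < TIER_PRIORITY[existing.tier]) {
      byStart.set(bucket.bucketStart, bucket);
    }
  }
  return Array.from(byStart.values()).sort((a, b) => a.bucketStart - b.bucketStart);
}

// A 24-hour range at the default 500 target points: 86400 / 500 = 172.8, floored to 172.
const dayInterval = bucketIntervalSeconds(new Date(0), new Date(86400 * 1000));
```

With these assumptions a 24-hour window yields buckets of roughly three minutes, which is consistent with the "~500 data points regardless of timeframe" claim in the changelog.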
@@ -0,0 +1 @@
+ ALTER TABLE "health_check_aggregates" ADD COLUMN "latency_sum_ms" integer;
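
Reviewer note on the migration above: the new nullable `latency_sum_ms` column lines up with the 0.5.0 changelog statement that additive metrics (counts, latencySum) are summed for accurate averages and that p95 uses the maximum of the source p95 values. The sketch below shows why a stored sum matters when buckets are combined. `AggregateRow` and `combineRows` are hypothetical names, and treating a `null` sum as "no latency data for that row" is an assumption, not something this diff confirms.

```typescript
// Hypothetical row shape mirroring a few columns of health_check_aggregates.
interface AggregateRow {
  runCount: number;
  latencySumMs: number | null; // nullable, like the new column
  p95LatencyMs: number | null;
}

interface CombinedLatency {
  runCount: number;
  avgLatencyMs: number | null;
  p95LatencyMs: number | null;
}

// Combine several aggregate rows into one larger bucket.
// Sums are additive, so sum / count gives a run-weighted average;
// the p95 is approximated by the largest source p95 (an upper bound).
function combineRows(rows: AggregateRow[]): CombinedLatency {
  let runCount = 0;
  let latencySumMs = 0;
  let runsWithLatency = 0;
  let p95LatencyMs: number | null = null;

  for (const row of rows) {
    runCount += row.runCount;
    if (row.latencySumMs !== null) {
      // Assumption: rows without a stored sum are left out of the average.
      latencySumMs += row.latencySumMs;
      runsWithLatency += row.runCount;
    }
    if (row.p95LatencyMs !== null) {
      p95LatencyMs = p95LatencyMs === null ? row.p95LatencyMs : Math.max(p95LatencyMs, row.p95LatencyMs);
    }
  }

  const avgLatencyMs = runsWithLatency > 0 ? latencySumMs / runsWithLatency : null;
  return { runCount, avgLatencyMs, p95LatencyMs };
}

// 90 runs averaging 100 ms plus 10 runs averaging 500 ms.
const combined = combineRows([
  { runCount: 90, latencySumMs: 9000, p95LatencyMs: 150 },
  { runCount: 10, latencySumMs: 5000, p95LatencyMs: 800 },
]);
// combined.avgLatencyMs is 140 (14000 / 100); averaging the two averages would give 300.
```

Storing only `avg_latency_ms` would force the second, skewed calculation whenever buckets of unequal run counts are merged.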
@@ -0,0 +1 @@
+ ALTER TABLE "health_check_configurations" ADD COLUMN "paused" boolean DEFAULT false NOT NULL;
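
Reviewer note on the migration above: the `paused` column defaults to `false` and is `NOT NULL`, so existing configurations keep running after the upgrade. The 0.6.0 changelog says paused checks stay scheduled while execution is skipped. A minimal sketch of such a guard follows; `PausableConfig` and `runIfNotPaused` are hypothetical names, and where the package actually performs this check is not visible in this diff.

```typescript
// Hypothetical subset of a health check configuration row.
interface PausableConfig {
  id: string;
  paused: boolean;
}

// Run the check only when the configuration is not paused.
// The scheduled job itself is left untouched; a paused configuration
// simply turns this invocation into a no-op that returns undefined.
function runIfNotPaused<T>(config: PausableConfig, execute: () => T): T | undefined {
  if (config.paused) {
    return undefined;
  }
  return execute();
}

// Usage: the callback may be synchronous or return a Promise.
let executions = 0;
const check = () => {
  executions += 1;
  return "healthy";
};

const active = runIfNotPaused({ id: "cfg-1", paused: false }, check);
const skipped = runIfNotPaused({ id: "cfg-2", paused: true }, check);
```

Keeping the schedule in place means resuming only requires flipping the flag back; no jobs have to be re-registered.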
@@ -0,0 +1,413 @@
+ {
+   "id": "bb50b71f-3f81-4cb2-aac6-7e7564060fa1",
+   "prevId": "1ae9bf74-594e-4801-b49a-e1b7073d8572",
+   "version": "7",
+   "dialect": "postgresql",
+   "tables": {
+     "public.health_check_aggregates": {
+       "name": "health_check_aggregates",
+       "schema": "",
+       "columns": {
+         "id": {
+           "name": "id",
+           "type": "uuid",
+           "primaryKey": true,
+           "notNull": true,
+           "default": "gen_random_uuid()"
+         },
+         "configuration_id": {
+           "name": "configuration_id",
+           "type": "uuid",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "system_id": {
+           "name": "system_id",
+           "type": "text",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "bucket_start": {
+           "name": "bucket_start",
+           "type": "timestamp",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "bucket_size": {
+           "name": "bucket_size",
+           "type": "bucket_size",
+           "typeSchema": "public",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "run_count": {
+           "name": "run_count",
+           "type": "integer",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "healthy_count": {
+           "name": "healthy_count",
+           "type": "integer",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "degraded_count": {
+           "name": "degraded_count",
+           "type": "integer",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "unhealthy_count": {
+           "name": "unhealthy_count",
+           "type": "integer",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "latency_sum_ms": {
+           "name": "latency_sum_ms",
+           "type": "integer",
+           "primaryKey": false,
+           "notNull": false
+         },
+         "avg_latency_ms": {
+           "name": "avg_latency_ms",
+           "type": "integer",
+           "primaryKey": false,
+           "notNull": false
+         },
+         "min_latency_ms": {
+           "name": "min_latency_ms",
+           "type": "integer",
+           "primaryKey": false,
+           "notNull": false
+         },
+         "max_latency_ms": {
+           "name": "max_latency_ms",
+           "type": "integer",
+           "primaryKey": false,
+           "notNull": false
+         },
+         "p95_latency_ms": {
+           "name": "p95_latency_ms",
+           "type": "integer",
+           "primaryKey": false,
+           "notNull": false
+         },
+         "aggregated_result": {
+           "name": "aggregated_result",
+           "type": "jsonb",
+           "primaryKey": false,
+           "notNull": false
+         }
+       },
+       "indexes": {
+         "health_check_aggregates_bucket_unique": {
+           "name": "health_check_aggregates_bucket_unique",
+           "columns": [
+             {
+               "expression": "configuration_id",
+               "isExpression": false,
+               "asc": true,
+               "nulls": "last"
+             },
+             {
+               "expression": "system_id",
+               "isExpression": false,
+               "asc": true,
+               "nulls": "last"
+             },
+             {
+               "expression": "bucket_start",
+               "isExpression": false,
+               "asc": true,
+               "nulls": "last"
+             },
+             {
+               "expression": "bucket_size",
+               "isExpression": false,
+               "asc": true,
+               "nulls": "last"
+             }
+           ],
+           "isUnique": true,
+           "concurrently": false,
+           "method": "btree",
+           "with": {}
+         }
+       },
+       "foreignKeys": {
+         "health_check_aggregates_configuration_id_health_check_configurations_id_fk": {
+           "name": "health_check_aggregates_configuration_id_health_check_configurations_id_fk",
+           "tableFrom": "health_check_aggregates",
+           "tableTo": "health_check_configurations",
+           "columnsFrom": [
+             "configuration_id"
+           ],
+           "columnsTo": [
+             "id"
+           ],
+           "onDelete": "cascade",
+           "onUpdate": "no action"
+         }
+       },
+       "compositePrimaryKeys": {},
+       "uniqueConstraints": {},
+       "policies": {},
+       "checkConstraints": {},
+       "isRLSEnabled": false
+     },
+     "public.health_check_configurations": {
+       "name": "health_check_configurations",
+       "schema": "",
+       "columns": {
+         "id": {
+           "name": "id",
+           "type": "uuid",
+           "primaryKey": true,
+           "notNull": true,
+           "default": "gen_random_uuid()"
+         },
+         "name": {
+           "name": "name",
+           "type": "text",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "strategy_id": {
+           "name": "strategy_id",
+           "type": "text",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "config": {
+           "name": "config",
+           "type": "jsonb",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "collectors": {
+           "name": "collectors",
+           "type": "jsonb",
+           "primaryKey": false,
+           "notNull": false
+         },
+         "interval_seconds": {
+           "name": "interval_seconds",
+           "type": "integer",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "is_template": {
+           "name": "is_template",
+           "type": "boolean",
+           "primaryKey": false,
+           "notNull": false,
+           "default": false
+         },
+         "created_at": {
+           "name": "created_at",
+           "type": "timestamp",
+           "primaryKey": false,
+           "notNull": true,
+           "default": "now()"
+         },
+         "updated_at": {
+           "name": "updated_at",
+           "type": "timestamp",
+           "primaryKey": false,
+           "notNull": true,
+           "default": "now()"
+         }
+       },
+       "indexes": {},
+       "foreignKeys": {},
+       "compositePrimaryKeys": {},
+       "uniqueConstraints": {},
+       "policies": {},
+       "checkConstraints": {},
+       "isRLSEnabled": false
+     },
+     "public.health_check_runs": {
+       "name": "health_check_runs",
+       "schema": "",
+       "columns": {
+         "id": {
+           "name": "id",
+           "type": "uuid",
+           "primaryKey": true,
+           "notNull": true,
+           "default": "gen_random_uuid()"
+         },
+         "configuration_id": {
+           "name": "configuration_id",
+           "type": "uuid",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "system_id": {
+           "name": "system_id",
+           "type": "text",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "status": {
+           "name": "status",
+           "type": "health_check_status",
+           "typeSchema": "public",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "latency_ms": {
+           "name": "latency_ms",
+           "type": "integer",
+           "primaryKey": false,
+           "notNull": false
+         },
+         "result": {
+           "name": "result",
+           "type": "jsonb",
+           "primaryKey": false,
+           "notNull": false
+         },
+         "timestamp": {
+           "name": "timestamp",
+           "type": "timestamp",
+           "primaryKey": false,
+           "notNull": true,
+           "default": "now()"
+         }
+       },
+       "indexes": {},
+       "foreignKeys": {
+         "health_check_runs_configuration_id_health_check_configurations_id_fk": {
+           "name": "health_check_runs_configuration_id_health_check_configurations_id_fk",
+           "tableFrom": "health_check_runs",
+           "tableTo": "health_check_configurations",
+           "columnsFrom": [
+             "configuration_id"
+           ],
+           "columnsTo": [
+             "id"
+           ],
+           "onDelete": "cascade",
+           "onUpdate": "no action"
+         }
+       },
+       "compositePrimaryKeys": {},
+       "uniqueConstraints": {},
+       "policies": {},
+       "checkConstraints": {},
+       "isRLSEnabled": false
+     },
+     "public.system_health_checks": {
+       "name": "system_health_checks",
+       "schema": "",
+       "columns": {
+         "system_id": {
+           "name": "system_id",
+           "type": "text",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "configuration_id": {
+           "name": "configuration_id",
+           "type": "uuid",
+           "primaryKey": false,
+           "notNull": true
+         },
+         "enabled": {
+           "name": "enabled",
+           "type": "boolean",
+           "primaryKey": false,
+           "notNull": true,
+           "default": true
+         },
+         "state_thresholds": {
+           "name": "state_thresholds",
+           "type": "jsonb",
+           "primaryKey": false,
+           "notNull": false
+         },
+         "retention_config": {
+           "name": "retention_config",
+           "type": "jsonb",
+           "primaryKey": false,
+           "notNull": false
+         },
+         "created_at": {
+           "name": "created_at",
+           "type": "timestamp",
+           "primaryKey": false,
+           "notNull": true,
+           "default": "now()"
+         },
+         "updated_at": {
+           "name": "updated_at",
+           "type": "timestamp",
+           "primaryKey": false,
+           "notNull": true,
+           "default": "now()"
+         }
+       },
+       "indexes": {},
+       "foreignKeys": {
+         "system_health_checks_configuration_id_health_check_configurations_id_fk": {
+           "name": "system_health_checks_configuration_id_health_check_configurations_id_fk",
+           "tableFrom": "system_health_checks",
+           "tableTo": "health_check_configurations",
+           "columnsFrom": [
+             "configuration_id"
+           ],
+           "columnsTo": [
+             "id"
+           ],
+           "onDelete": "cascade",
+           "onUpdate": "no action"
+         }
+       },
+       "compositePrimaryKeys": {
+         "system_health_checks_system_id_configuration_id_pk": {
+           "name": "system_health_checks_system_id_configuration_id_pk",
+           "columns": [
+             "system_id",
+             "configuration_id"
+           ]
+         }
+       },
+       "uniqueConstraints": {},
+       "policies": {},
+       "checkConstraints": {},
+       "isRLSEnabled": false
+     }
+   },
+   "enums": {
+     "public.bucket_size": {
+       "name": "bucket_size",
+       "schema": "public",
+       "values": [
+         "hourly",
+         "daily"
+       ]
+     },
+     "public.health_check_status": {
+       "name": "health_check_status",
+       "schema": "public",
+       "values": [
+         "healthy",
+         "unhealthy",
+         "degraded"
+       ]
+     }
+   },
+   "schemas": {},
+   "sequences": {},
+   "roles": {},
+   "policies": {},
+   "views": {},
+   "_meta": {
+     "columns": {},
+     "schemas": {},
+     "tables": {}
+   }
+ }
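
Reviewer note on the snapshot above: `health_check_runs.result` is a `jsonb` column, and the 0.5.0 changelog describes dropping fields flagged `x-ephemeral` before storage so that large values such as the HTTP `body` are not persisted. The sketch below illustrates that idea only. `FieldMeta` and `dropEphemeral` are hypothetical, the flat field-to-metadata map is an assumption, and the real `stripEphemeralFields()` signature is not shown in this diff.

```typescript
// Hypothetical per-field metadata carrying the x-ephemeral flag.
type FieldMeta = { "x-ephemeral"?: boolean };

// Return a copy of the result without fields flagged as ephemeral.
// The input object is left untouched, so assertions can still run
// against the full result before the trimmed copy is stored.
function dropEphemeral(
  result: Record<string, unknown>,
  meta: Record<string, FieldMeta>,
): Record<string, unknown> {
  const persisted: Record<string, unknown> = {};
  for (const key of Object.keys(result)) {
    if (meta[key]?.["x-ephemeral"] === true) {
      continue; // flagged field: skip it
    }
    persisted[key] = result[key];
  }
  return persisted;
}

// Usage with made-up field names; only `body` is flagged.
const fullResult = { statusCode: 200, responseTimeMs: 42, body: "<html>...</html>" };
const stored = dropEphemeral(fullResult, { body: { "x-ephemeral": true } });
```

Returning a copy rather than mutating in place matches the changelog's point that assertions still see the full response at execution time.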