@checkstack/anomaly-common 1.2.3 → 1.3.1

package/CHANGELOG.md CHANGED
@@ -1,5 +1,63 @@
  # @checkstack/anomaly-common

+ ## 1.3.1
+
+ ### Patch Changes
+
+ - Updated dependencies [13373ce]
+   - @checkstack/common@0.14.0
+   - @checkstack/catalog-common@2.3.1
+   - @checkstack/notification-common@1.3.1
+   - @checkstack/signal-common@0.2.7
+
+ ## 1.3.0
+
+ ### Minor Changes
+
+ - 9dcc848: Auto-resolve anomalies that settle at a new normal, and add global suppression.
+
+   Part A (bug fix): a confirmed anomaly used to stay stuck in `anomaly` indefinitely when the metric settled at a new stable level. Both detectors now carry a baseline-independent self-resolution path - spike: after `STABLE_RESOLUTION_RUN_COUNT` (5) consecutive healthy samples within `STABLE_RESOLUTION_RELATIVE_BAND` (10%) the row self-resolves to `recovered`; drift: when the projected change goes flat relative to the new mean for `STABLE_DRIFT_RESOLUTION_RUN_COUNT` (2) analyzer runs. The original baseline-relative recovery path is unchanged.
+
+   Part B (feature): global (per-row) suppression. New `suppressedAt` / `suppressedValue` / `suppressedBaseline` columns (Drizzle migration `0005`), `suppressAnomaly` / `unsuppressAnomaly` RPCs gated by `anomaly_feed.manage`, and a `suppression` filter on `getAnomalies` (default `active` hides suppressed rows). Suppressed rows drop out of the dashboard badge/widget active count; the widget exposes an eye-off suppress affordance. Suppression auto-clears once the observed value moves more than `SUPPRESSION_REACTIVATION_DELTA` (25%) from the value it was suppressed at. All suppression state lives on the shared `anomalies` row, so every pod reads the same active/suppressed set. Distinct from the existing per-user notification mute.
+
+   This is a beta minor.
+
+ - 9dcc848: Harden config-versioning so stored configs always migrate-then-validate and broken migration chains fail fast at boot.
+
+   - `@checkstack/backend-api` `Versioned<T>` gains `parseAssumingV1` (migrate-from-v1 then validate leniently, runtime path), `parseStrictAssumingV1` (migrate then validate strictly, editor path), and `validateMigrationChainFromV1()`. A standalone pure helper `assertMigrationChainFromV1({ version, migrations })` is the single shared implementation behind the constructor guard and `validateMigrationChainFromV1`.
+   - `Versioned` now validates its own v1 -> `version` chain in the constructor, which runs at module import / plugin registration. A new `no-restricted-syntax` ESLint rule bans calling `parse` / `safeParse` / `parseAsync` / `strict` directly on a `Versioned`'s `.schema` member.
+   - Auth strategy migration chains are validated at the `betterAuthExtensionPoint.addStrategy` chokepoint (`@checkstack/auth-backend`).
+   - Automation action AND trigger configs migrate-then-validate (lenient at dispatch, strict in the editor validator, recursing into `choose`/`parallel`/`repeat`/`sequence` blocks). The `run_script` / `run_shell` action configs bump to `version: 2` dropping the removed `sandbox` key, fixing the editor's `Unrecognized key: sandbox` error.
+   - Anomaly read path now validates: `getAnomalyConfig` / `getAnomalyAssignmentConfig` run stored records through `Versioned.parseRecord`; `PartialAnomalySettingsSchema` moved to `@checkstack/anomaly-common`. Notification ConfigService reads thread the migrations argument, and per-strategy `userConfig` is migrate-then-validated before `send()`.
+   - gitops-apply migrate-then-validates authored health-check config; integration connection validation routes through `safeValidate`. The latent HTTP health-check `result` schema (at `version: 3` with no migrations) now ships a pass-through v1 -> v2 -> v3 chain.
+
+   BREAKING CHANGES (fail-fast at boot, intended):
+
+   - Any `Versioned` config with `version > 1` and an incomplete or non-contiguous migration chain now throws at construction (boot) instead of failing lazily on first read. This covers every `Versioned` instance repo-wide, including future plugin types. Out-of-tree plugins shipping such a config must add the missing migration step(s); all in-repo strategies already have complete chains.
+   - An auth strategy declaring `configVersion > 1` without a complete chain throws at registration.
+   - A trigger's per-automation config is now a versioned `config: Versioned<TConfig>` instead of a bare `configSchema?`. Plugins registering triggers with `configSchema:` must wrap it: `config: new Versioned({ version: 1, schema })`. The underlying schema stays reachable via `config.schema`; triggers without per-automation config are unaffected.
+
+   State and scale: all affected reads resolve from shared Postgres / in-process registries, so every pod sees the same migrated answer. No new framework-owned current-state store.
+
+   This is a beta minor.
+
+ ### Patch Changes
+
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+ - Updated dependencies [9dcc848]
+   - @checkstack/notification-common@1.3.0
+   - @checkstack/catalog-common@2.3.0
+   - @checkstack/common@0.13.0
+   - @checkstack/signal-common@0.2.6
+
  ## 1.2.3

  ### Patch Changes
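
The fail-fast rule listed under BREAKING CHANGES (a `version > 1` config must carry a complete, contiguous chain from v1) can be pictured with a minimal sketch. This diff shows neither the implementation of `assertMigrationChainFromV1` nor the shape of a migration, so the `Migration` interface (`fromVersion`, `toVersion`, `migrate`) and the function name `assertContiguousChainFromV1` below are hypothetical stand-ins, not the `@checkstack/backend-api` API.

```typescript
// Hypothetical migration shape; the real one in @checkstack/backend-api may differ.
interface Migration {
  fromVersion: number;
  toVersion: number;
  migrate: (data: unknown) => unknown;
}

// Throws unless the migrations form a gap-free v1 -> v2 -> ... -> `version` chain.
// A version-1 config needs no migrations, so the loop body never runs for it.
function assertContiguousChainFromV1(args: {
  version: number;
  migrations: Migration[];
}): void {
  const { version, migrations } = args;
  const byFrom = new Map<number, Migration>();
  for (const m of migrations) byFrom.set(m.fromVersion, m);

  for (let v = 1; v < version; v++) {
    const step = byFrom.get(v);
    if (!step || step.toVersion !== v + 1) {
      throw new Error(`Missing migration step v${v} -> v${v + 1}`);
    }
  }
}

const passThrough = (data: unknown): unknown => data;

// A complete pass-through chain for a version-3 config: no error.
assertContiguousChainFromV1({
  version: 3,
  migrations: [
    { fromVersion: 1, toVersion: 2, migrate: passThrough },
    { fromVersion: 2, toVersion: 3, migrate: passThrough },
  ],
});

// A version-3 config with no migrations: rejected up front.
let failure = "";
try {
  assertContiguousChainFromV1({ version: 3, migrations: [] });
} catch (error) {
  failure = (error as Error).message;
}
```

Running a check like this in a constructor, rather than on first read, is what would turn a latent data-path failure into a boot-time one, which matches the intent the changelog states.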
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@checkstack/anomaly-common",
-   "version": "1.2.3",
+   "version": "1.3.1",
    "license": "Elastic-2.0",
    "type": "module",
    "exports": {
@@ -9,16 +9,16 @@
      }
    },
    "dependencies": {
-     "@checkstack/common": "0.11.0",
-     "@checkstack/catalog-common": "2.2.2",
-     "@checkstack/notification-common": "1.2.0",
-     "@checkstack/signal-common": "0.2.4",
+     "@checkstack/common": "0.13.0",
+     "@checkstack/catalog-common": "2.3.0",
+     "@checkstack/notification-common": "1.3.0",
+     "@checkstack/signal-common": "0.2.6",
      "zod": "^4.2.1"
    },
    "devDependencies": {
      "typescript": "^5.7.2",
      "@checkstack/tsconfig": "0.0.7",
-     "@checkstack/scripts": "0.3.3"
+     "@checkstack/scripts": "0.4.0"
    },
    "scripts": {
      "typecheck": "tsgo -b",
@@ -0,0 +1,106 @@
+ import { describe, test, expect } from "bun:test";
+ import {
+   STABLE_RESOLUTION_RUN_COUNT,
+   STABLE_RESOLUTION_RELATIVE_BAND,
+   SUPPRESSION_REACTIVATION_DELTA,
+   hasSettledAtNewLevel,
+   appendRecentSample,
+   isDriftFlatRelative,
+   hasChangedSinceSuppression,
+ } from "./self-resolution";
+
+ describe("hasSettledAtNewLevel", () => {
+   test("false when fewer than the required number of samples", () => {
+     const window = Array.from(
+       { length: STABLE_RESOLUTION_RUN_COUNT - 1 },
+       () => 500,
+     );
+     expect(hasSettledAtNewLevel(window)).toBe(false);
+   });
+
+   test("true when enough samples sit inside the tight relative band", () => {
+     // All within 10% of mean 500 → settled at a new level.
+     const window = [500, 505, 498, 502, 500];
+     expect(window.length).toBe(STABLE_RESOLUTION_RUN_COUNT);
+     expect(hasSettledAtNewLevel(window)).toBe(true);
+   });
+
+   test("false when samples are still spread wider than the band", () => {
+     // (max-min)/mean = (700-300)/500 = 0.8 ≫ band
+     const window = [300, 700, 400, 600, 500];
+     expect(hasSettledAtNewLevel(window)).toBe(false);
+   });
+
+   test("only considers the most recent STABLE_RESOLUTION_RUN_COUNT samples", () => {
+     // Old volatile prefix, recent settled suffix.
+     const window = [10, 9000, 500, 502, 498, 500, 501];
+     expect(hasSettledAtNewLevel(window)).toBe(true);
+   });
+
+   test("band boundary is inclusive", () => {
+     // spread exactly at the band edge relative to mean
+     const mean = 100;
+     const spread = STABLE_RESOLUTION_RELATIVE_BAND * mean; // 10
+     const window = [mean - spread / 2, mean, mean, mean, mean + spread / 2];
+     expect(hasSettledAtNewLevel(window)).toBe(true);
+   });
+ });
+
+ describe("appendRecentSample", () => {
+   test("appends to an empty window", () => {
+     expect(appendRecentSample(undefined, 5)).toEqual([5]);
+   });
+
+   test("caps the window at STABLE_RESOLUTION_RUN_COUNT, dropping oldest", () => {
+     let window: number[] | undefined;
+     for (let i = 0; i < STABLE_RESOLUTION_RUN_COUNT + 3; i++) {
+       window = appendRecentSample(window, i);
+     }
+     expect(window).toHaveLength(STABLE_RESOLUTION_RUN_COUNT);
+     // Last value pushed was N+2; window keeps the most recent N.
+     expect(window?.at(-1)).toBe(STABLE_RESOLUTION_RUN_COUNT + 2);
+     expect(window?.[0]).toBe(3);
+   });
+ });
+
+ describe("isDriftFlatRelative", () => {
+   test("true when projected change is small relative to mean", () => {
+     expect(isDriftFlatRelative({ projectedChange: 5, mean: 1000 })).toBe(true);
+   });
+
+   test("false when projected change is large relative to mean", () => {
+     expect(isDriftFlatRelative({ projectedChange: 500, mean: 1000 })).toBe(
+       false,
+     );
+   });
+
+   test("handles near-zero mean without dividing by zero", () => {
+     expect(isDriftFlatRelative({ projectedChange: 0, mean: 0 })).toBe(true);
+     expect(isDriftFlatRelative({ projectedChange: 1, mean: 0 })).toBe(false);
+   });
+ });
+
+ describe("hasChangedSinceSuppression", () => {
+   test("false when the value stays within the reactivation band", () => {
+     // 10% move, band is 25%
+     expect(
+       hasChangedSinceSuppression({ observedValue: 110, suppressedValue: 100 }),
+     ).toBe(false);
+   });
+
+   test("true when the value moves beyond the reactivation band", () => {
+     const beyond = 100 * (1 + SUPPRESSION_REACTIVATION_DELTA) + 1;
+     expect(
+       hasChangedSinceSuppression({
+         observedValue: beyond,
+         suppressedValue: 100,
+       }),
+     ).toBe(true);
+   });
+
+   test("reacts to moves in either direction", () => {
+     expect(
+       hasChangedSinceSuppression({ observedValue: 50, suppressedValue: 100 }),
+     ).toBe(true);
+   });
+ });
@@ -0,0 +1,115 @@
+ /**
+  * Self-resolution and suppression heuristics.
+  *
+  * PART A (auto-resolve): a confirmed anomaly must clear once the metric settles
+  * at a *new* stable level, even while that level is still anomalous against the
+  * stale baseline. The classic case is "broken then fixed at a clearly different
+  * value": the new value IS the new normal, but the relative resolver keeps
+  * comparing against the old mean and never fires until the slow hourly baseline
+  * analyzer drags the mean across. These pure helpers give both the spike
+  * detector and the drift evaluator a baseline-independent escape hatch.
+  *
+  * PART B (suppression): an operator can silence a known anomaly. It auto-clears
+  * ("changes again") once the observed value moves outside a relative band around
+  * the value it was suppressed at.
+  */
+
+ /**
+  * Number of consecutive healthy samples that must sit inside the tight band
+  * before a confirmed spike anomaly self-resolves.
+  *
+  * Chosen as 5: deliberately stricter than the typical confirmation window
+  * (~3 runs) so a brief cluster of similar values can't masquerade as a new
+  * stable regime, while still resolving within a handful of check cycles rather
+  * than the hours-to-days the baseline analyzer would take.
+  */
+ export const STABLE_RESOLUTION_RUN_COUNT = 5;
+
+ /**
+  * Maximum relative spread, (max − min) / max(|mean|, ε), the recent-sample
+  * window may have to count as "settled". 0.10 == within 10% of each other.
+  *
+  * Tight enough that genuinely volatile metrics never qualify, loose enough to
+  * absorb normal jitter around a new operating point.
+  */
+ export const STABLE_RESOLUTION_RELATIVE_BAND = 0.1;
+
+ /**
+  * Number of consecutive baseline-analyzer runs a confirmed *drift* anomaly must
+  * show a flat (relative) slope before it self-resolves. The analyzer runs
+  * hourly, so 2 runs ≈ the metric has held its new level for ~2h, which is the
+  * same confidence shape as the drift confirmation threshold.
+  */
+ export const STABLE_DRIFT_RESOLUTION_RUN_COUNT = 2;
+
+ /**
+  * Relative band for "the suppressed metric changed again". When the observed
+  * value moves more than 25% away from the value it was suppressed at (relative
+  * to that value), the suppression auto-clears. Chosen wider than the
+  * self-resolution band: suppression is a deliberate operator action, so we only
+  * undo it on a clearly material move, not on routine jitter.
+  */
+ export const SUPPRESSION_REACTIVATION_DELTA = 0.25;
+
+ /**
+  * Returns true when the supplied rolling window of recent healthy samples has
+  * both (a) reached the required length and (b) settled inside the tight
+  * relative band — i.e. the metric has found a new stable level.
+  */
+ export function hasSettledAtNewLevel(samples: number[]): boolean {
+   if (samples.length < STABLE_RESOLUTION_RUN_COUNT) return false;
+
+   const window = samples.slice(-STABLE_RESOLUTION_RUN_COUNT);
+   const min = Math.min(...window);
+   const max = Math.max(...window);
+   const mean = window.reduce((sum, v) => sum + v, 0) / window.length;
+   const denominator = Math.max(Math.abs(mean), Number.EPSILON);
+
+   return (max - min) / denominator <= STABLE_RESOLUTION_RELATIVE_BAND;
+ }
+
+ /**
+  * Append a sample to a rolling window, keeping at most
+  * {@link STABLE_RESOLUTION_RUN_COUNT} most-recent entries (oldest-first).
+  */
+ export function appendRecentSample(
+   existing: number[] | undefined,
+   value: number,
+ ): number[] {
+   const next = [...(existing ?? []), value];
+   return next.slice(-STABLE_RESOLUTION_RUN_COUNT);
+ }
+
+ /**
+  * Returns true when the projected drift change is flat relative to the current
+  * mean — the trend has stopped walking and the metric is holding its new level.
+  */
+ export function isDriftFlatRelative({
+   projectedChange,
+   mean,
+ }: {
+   projectedChange: number;
+   mean: number;
+ }): boolean {
+   const denominator = Math.max(Math.abs(mean), Number.EPSILON);
+   return Math.abs(projectedChange) / denominator <= STABLE_RESOLUTION_RELATIVE_BAND;
+ }
+
+ /**
+  * Returns true when an observed value has moved outside the relative band
+  * around the value an anomaly was suppressed at — i.e. it "changed again" and
+  * suppression should auto-clear.
+  */
+ export function hasChangedSinceSuppression({
+   observedValue,
+   suppressedValue,
+ }: {
+   observedValue: number;
+   suppressedValue: number;
+ }): boolean {
+   const denominator = Math.max(Math.abs(suppressedValue), Number.EPSILON);
+   return (
+     Math.abs(observedValue - suppressedValue) / denominator >
+     SUPPRESSION_REACTIVATION_DELTA
+   );
+ }
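
To see how the two spike helpers above are meant to combine, here is a minimal sketch that feeds a sample stream through `appendRecentSample` and checks `hasSettledAtNewLevel` after each append. The two helpers and their constants are copied from the file above so the block runs on its own. The sample values are made up, and every sample is treated as healthy for simplicity; how the real detector selects healthy samples is not shown in this diff.

```typescript
// Constants and helpers copied from the self-resolution module in this diff.
const STABLE_RESOLUTION_RUN_COUNT = 5;
const STABLE_RESOLUTION_RELATIVE_BAND = 0.1;

function hasSettledAtNewLevel(samples: number[]): boolean {
  if (samples.length < STABLE_RESOLUTION_RUN_COUNT) return false;
  const window = samples.slice(-STABLE_RESOLUTION_RUN_COUNT);
  const min = Math.min(...window);
  const max = Math.max(...window);
  const mean = window.reduce((sum, v) => sum + v, 0) / window.length;
  const denominator = Math.max(Math.abs(mean), Number.EPSILON);
  return (max - min) / denominator <= STABLE_RESOLUTION_RELATIVE_BAND;
}

function appendRecentSample(
  existing: number[] | undefined,
  value: number,
): number[] {
  const next = [...(existing ?? []), value];
  return next.slice(-STABLE_RESOLUTION_RUN_COUNT);
}

// Hypothetical stream: the metric jumps from ~100 to ~500 and stays there.
const incoming = [100, 480, 510, 500, 505, 498, 502];

let recentSamples: number[] | undefined;
let settledAtIndex = -1;
for (let i = 0; i < incoming.length; i++) {
  recentSamples = appendRecentSample(recentSamples, incoming[i]);
  if (settledAtIndex === -1 && hasSettledAtNewLevel(recentSamples)) {
    settledAtIndex = i;
  }
}

// At index 4 the window still contains the old 100, so the spread is far
// outside the band. At index 5 the window is [480, 510, 500, 505, 498]:
// spread 30 over mean 498.6 is about 6%, inside the 10% band.
console.log(settledAtIndex); // 5
```

The window has to flush the last pre-jump value before it can qualify, which is why resolution takes exactly `STABLE_RESOLUTION_RUN_COUNT` samples at the new level and not fewer.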
package/src/index.ts CHANGED
@@ -3,6 +3,7 @@ export * from "./engine/baseline";
  export * from "./engine/thresholds";
  export * from "./engine/config";
  export * from "./engine/drift";
+ export * from "./engine/self-resolution";
  export * from "./access";
  export * from "./rpc-contract";
  export * from "./plugin-metadata";
@@ -1,7 +1,7 @@
  import { createClientDefinition, proc } from "@checkstack/common";
  import { z } from "zod";
  import { pluginMetadata } from "./plugin-metadata";
- import { AnomalyStateSchema, AnomalySettingsSchema, AnomalyFieldConfigSchema, AnomalyKindSchema } from "./schema";
+ import { AnomalyStateSchema, AnomalySettingsSchema, PartialAnomalySettingsSchema, AnomalyKindSchema } from "./schema";
  import { anomalyAccess } from "./access";

  export const AnomalyDtoSchema = z.object({
@@ -21,6 +21,9 @@ export const AnomalyDtoSchema = z.object({
    startedAt: z.string(),
    confirmedAt: z.string().nullable(),
    recoveredAt: z.string().nullable(),
+   suppressedAt: z.string().nullable(),
+   suppressedValue: z.number().nullable(),
+   suppressedBaseline: z.number().nullable(),
    metadata: z.record(z.string(), z.unknown()).nullable(),
  });

@@ -67,21 +70,6 @@ const VersionedAnomalySettingsSchema = z.object({
    originalVersion: z.number().optional(),
  });

- /**
-  * Partial settings schema for assignment-level overrides.
-  * Only includes fields that the user explicitly sets.
-  */
- const PartialAnomalySettingsSchema = z.object({
-   enabled: z.boolean().optional(),
-   sensitivity: z.number().optional(),
-   confirmationWindow: z.number().int().optional(),
-   baselineWindow: z.string().optional(),
-   notify: z.boolean().optional(),
-   driftEnabled: z.boolean().optional(),
-   driftThreshold: z.number().optional(),
-   fieldOverrides: z.record(z.string(), AnomalyFieldConfigSchema).optional(),
- });
-
  const VersionedPartialAnomalySettingsSchema = z.object({
    version: z.number(),
    data: PartialAnomalySettingsSchema,
@@ -101,6 +89,11 @@ export const anomalyContract = {
        configurationId: z.string().optional(),
        state: AnomalyStateSchema.optional(),
        kind: AnomalyKindSchema.optional(),
+       /**
+        * Suppression filter. "active" (default) hides globally-suppressed rows,
+        * "suppressed" lists only them, "all" ignores the flag.
+        */
+       suppression: z.enum(["active", "suppressed", "all"]).optional(),
        limit: z.number().optional().default(50),
      }))
      .output(z.array(AnomalyDtoSchema)),
@@ -201,6 +194,46 @@ export const anomalyContract = {
      )
      .output(z.object({ success: z.boolean() })),

+   /**
+    * Globally suppress an anomaly row so it disappears from the active feed
+    * until the metric "changes again". Suppression is per-row (not per-user)
+    * and lives in shared Postgres so every pod sees the same active set.
+    * Gated by `feed.manage`. Idempotent.
+    */
+   suppressAnomaly: proc({
+     operationType: "mutation",
+     userType: "authenticated",
+     access: [anomalyAccess.feed.manage],
+     instanceAccess: { idParam: "systemId" },
+   })
+     .route({ method: "POST" })
+     .input(
+       z.object({
+         systemId: z.string(),
+         anomalyId: z.string(),
+       }),
+     )
+     .output(z.object({ success: z.boolean() })),
+
+   /**
+    * Manually clear suppression on an anomaly row, returning it to the active
+    * feed. Gated by `feed.manage`. Idempotent.
+    */
+   unsuppressAnomaly: proc({
+     operationType: "mutation",
+     userType: "authenticated",
+     access: [anomalyAccess.feed.manage],
+     instanceAccess: { idParam: "systemId" },
+   })
+     .route({ method: "POST" })
+     .input(
+       z.object({
+         systemId: z.string(),
+         anomalyId: z.string(),
+       }),
+     )
+     .output(z.object({ success: z.boolean() })),
+
    /**
     * Remove a mute previously created via muteAnomalyNotification. No-op if
     * no matching record exists.
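
The `suppression` filter and the auto-clear rule can be illustrated with a small in-memory sketch. The server-side query behind `getAnomalies` is not part of this package, so `AnomalyRow` and `applySuppressionFilter` below are hypothetical; only the three filter values, the `active` default, and `hasChangedSinceSuppression` (copied from the self-resolution module) come from the diff.

```typescript
type SuppressionFilter = "active" | "suppressed" | "all";

// Hypothetical row shape: only the column the filter needs.
interface AnomalyRow {
  id: string;
  suppressedAt: string | null;
}

// Assumed semantics, following the contract comment: "active" hides
// suppressed rows, "suppressed" lists only them, "all" ignores the flag.
function applySuppressionFilter(
  rows: AnomalyRow[],
  suppression: SuppressionFilter = "active",
): AnomalyRow[] {
  if (suppression === "all") return rows;
  const wantSuppressed = suppression === "suppressed";
  return rows.filter((row) => (row.suppressedAt !== null) === wantSuppressed);
}

// Copied from the self-resolution module in this diff.
const SUPPRESSION_REACTIVATION_DELTA = 0.25;

function hasChangedSinceSuppression({
  observedValue,
  suppressedValue,
}: {
  observedValue: number;
  suppressedValue: number;
}): boolean {
  const denominator = Math.max(Math.abs(suppressedValue), Number.EPSILON);
  return (
    Math.abs(observedValue - suppressedValue) / denominator >
    SUPPRESSION_REACTIVATION_DELTA
  );
}

// Made-up rows: "b" is suppressed, "a" and "c" are not.
const rows: AnomalyRow[] = [
  { id: "a", suppressedAt: null },
  { id: "b", suppressedAt: "2024-01-01T00:00:00Z" },
  { id: "c", suppressedAt: null },
];

const activeIds = applySuppressionFilter(rows).map((row) => row.id);
const suppressedIds = applySuppressionFilter(rows, "suppressed").map((row) => row.id);

// A 10% move stays inside the 25% band; a 30% move leaves it.
const staysSuppressed = !hasChangedSinceSuppression({ observedValue: 110, suppressedValue: 100 });
const reactivates = hasChangedSinceSuppression({ observedValue: 130, suppressedValue: 100 });
```

Because the comparison is strict (`>`), a move of exactly 25% would leave the row suppressed under this helper.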
package/src/schema.ts CHANGED
@@ -33,6 +33,19 @@ export const AnomalyMetadataSchema = z
    .object({
      trendData: z.record(z.string(), z.unknown()).optional(),
      relatedAnomalies: z.array(z.string()).optional(), // UUIDs
+     /**
+      * Rolling window of the most recent healthy numeric samples, used by the
+      * self-resolution path (PART A) to decide that the metric has settled at a
+      * new stable level even while it is still anomalous against the stale
+      * baseline. Oldest-first; capped at {@link STABLE_RESOLUTION_RUN_COUNT}.
+      */
+     recentSamples: z.array(z.number()).optional(),
+     /**
+      * Count of consecutive baseline-analyzer runs in which a confirmed drift
+      * anomaly's slope has been flat relative to the (new) mean. Used by the
+      * drift self-resolution path.
+      */
+     stableDriftRunCount: z.number().optional(),
    })
    .catchall(z.unknown());
  export type AnomalyMetadata = z.infer<typeof AnomalyMetadataSchema>;
@@ -56,3 +69,23 @@ export const AnomalySettingsSchema = z.object({
    fieldOverrides: z.record(z.string(), AnomalyFieldConfigSchema).optional(),
  });
  export type AnomalySettings = z.infer<typeof AnomalySettingsSchema>;
+
+ /**
+  * Partial settings schema for assignment-level overrides. Only includes the
+  * fields a user may explicitly override on a per-assignment basis, so every
+  * field is optional. Stored as a versioned record alongside the template
+  * config and migrated/validated on read.
+  */
+ export const PartialAnomalySettingsSchema = z.object({
+   enabled: z.boolean().optional(),
+   sensitivity: z.number().optional(),
+   confirmationWindow: z.number().int().optional(),
+   baselineWindow: z.string().optional(),
+   notify: z.boolean().optional(),
+   driftEnabled: z.boolean().optional(),
+   driftThreshold: z.number().optional(),
+   fieldOverrides: z.record(z.string(), AnomalyFieldConfigSchema).optional(),
+ });
+ export type PartialAnomalySettings = z.infer<
+   typeof PartialAnomalySettingsSchema
+ >;
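
The `stableDriftRunCount` metadata field added in this file suggests a per-run counter for the drift self-resolution path. The analyzer code that maintains it lives outside this package, so the following is only a sketch under stated assumptions: `nextDriftState` is a hypothetical helper, and resetting the counter to zero on a non-flat run is an assumption drawn from the word "consecutive" in the field's comment. `isDriftFlatRelative` and the two constants are copied from the self-resolution module.

```typescript
// Copied from the self-resolution module in this diff.
const STABLE_RESOLUTION_RELATIVE_BAND = 0.1;
const STABLE_DRIFT_RESOLUTION_RUN_COUNT = 2;

function isDriftFlatRelative({
  projectedChange,
  mean,
}: {
  projectedChange: number;
  mean: number;
}): boolean {
  const denominator = Math.max(Math.abs(mean), Number.EPSILON);
  return Math.abs(projectedChange) / denominator <= STABLE_RESOLUTION_RELATIVE_BAND;
}

// Hypothetical per-run update: bump the counter on a flat run, reset it
// otherwise (assumed), and report whether the anomaly may self-resolve.
function nextDriftState(args: {
  stableDriftRunCount: number | undefined;
  projectedChange: number;
  mean: number;
}): { stableDriftRunCount: number; resolved: boolean } {
  const flat = isDriftFlatRelative({
    projectedChange: args.projectedChange,
    mean: args.mean,
  });
  const count = flat ? (args.stableDriftRunCount ?? 0) + 1 : 0;
  return {
    stableDriftRunCount: count,
    resolved: count >= STABLE_DRIFT_RESOLUTION_RUN_COUNT,
  };
}

// Three made-up analyzer runs around a mean of 1000.
const run1 = nextDriftState({ stableDriftRunCount: undefined, projectedChange: 400, mean: 1000 });
const run2 = nextDriftState({ stableDriftRunCount: run1.stableDriftRunCount, projectedChange: 20, mean: 1000 });
const run3 = nextDriftState({ stableDriftRunCount: run2.stableDriftRunCount, projectedChange: 10, mean: 1000 });
```

In this trace the first run is still drifting (40% projected change), the second is the first flat run, and the third reaches the required two consecutive flat runs.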