npm - @checkstack/anomaly-common - Versions diffs - 1.2.3 → 1.3.0 - Mend

@checkstack/anomaly-common 1.2.3 → 1.3.0

Files changed (7)

package/CHANGELOG.md +48 -0
package/package.json +6 -6
package/src/engine/self-resolution.test.ts +106 -0
package/src/engine/self-resolution.ts +115 -0
package/src/index.ts +1 -0
package/src/rpc-contract.ts +49 -16
package/src/schema.ts +33 -0

package/CHANGELOG.md CHANGED

@@ -1,5 +1,53 @@
 # @checkstack/anomaly-common
+## 1.3.0
+### Minor Changes
+- 9dcc848: Auto-resolve anomalies that settle at a new normal, and add global suppression.
+  Part A (bug fix): a confirmed anomaly used to stay stuck in `anomaly` indefinitely when the metric settled at a new stable level. Both detectors now carry a baseline-independent self-resolution path - spike: after `STABLE_RESOLUTION_RUN_COUNT` (5) consecutive healthy samples within `STABLE_RESOLUTION_RELATIVE_BAND` (10%) the row self-resolves to `recovered`; drift: when the projected change goes flat relative to the new mean for `STABLE_DRIFT_RESOLUTION_RUN_COUNT` (2) analyzer runs. The original baseline-relative recovery path is unchanged.
+  Part B (feature): global (per-row) suppression. New `suppressedAt` / `suppressedValue` / `suppressedBaseline` columns (Drizzle migration `0005`), `suppressAnomaly` / `unsuppressAnomaly` RPCs gated by `anomaly_feed.manage`, and a `suppression` filter on `getAnomalies` (default `active` hides suppressed rows). Suppressed rows drop out of the dashboard badge/widget active count; the widget exposes an eye-off suppress affordance. Suppression auto-clears once the observed value moves more than `SUPPRESSION_REACTIVATION_DELTA` (25%) from the value it was suppressed at. All suppression state lives on the shared `anomalies` row, so every pod reads the same active/suppressed set. Distinct from the existing per-user notification mute.
+  This is a beta minor.
+- 9dcc848: Harden config-versioning so stored configs always migrate-then-validate and broken migration chains fail fast at boot.
+  - `@checkstack/backend-api` `Versioned<T>` gains `parseAssumingV1` (migrate-from-v1 then validate leniently, runtime path), `parseStrictAssumingV1` (migrate then validate strictly, editor path), and `validateMigrationChainFromV1()`. A standalone pure helper `assertMigrationChainFromV1({ version, migrations })` is the single shared implementation behind the constructor guard and `validateMigrationChainFromV1`.
+  - `Versioned` now validates its own v1 -> `version` chain in the constructor, which runs at module import / plugin registration. A new `no-restricted-syntax` ESLint rule bans calling `parse` / `safeParse` / `parseAsync` / `strict` directly on a `Versioned`'s `.schema` member.
+  - Auth strategy migration chains are validated at the `betterAuthExtensionPoint.addStrategy` chokepoint (`@checkstack/auth-backend`).
+  - Automation action AND trigger configs migrate-then-validate (lenient at dispatch, strict in the editor validator, recursing into `choose`/`parallel`/`repeat`/`sequence` blocks). The `run_script` / `run_shell` action configs bump to `version: 2` dropping the removed `sandbox` key, fixing the editor's `Unrecognized key: sandbox` error.
+  - Anomaly read path now validates: `getAnomalyConfig` / `getAnomalyAssignmentConfig` run stored records through `Versioned.parseRecord`; `PartialAnomalySettingsSchema` moved to `@checkstack/anomaly-common`. Notification ConfigService reads thread the migrations argument, and per-strategy `userConfig` is migrate-then-validated before `send()`.
+  - gitops-apply migrate-then-validates authored health-check config; integration connection validation routes through `safeValidate`. The latent HTTP health-check `result` schema (at `version: 3` with no migrations) now ships a pass-through v1 -> v2 -> v3 chain.
+  BREAKING CHANGES (fail-fast at boot, intended):
+  - Any `Versioned` config with `version > 1` and an incomplete or non-contiguous migration chain now throws at construction (boot) instead of failing lazily on first read. This covers every `Versioned` instance repo-wide, including future plugin types. Out-of-tree plugins shipping such a config must add the missing migration step(s); all in-repo strategies already have complete chains.
+  - An auth strategy declaring `configVersion > 1` without a complete chain throws at registration.
+  - A trigger's per-automation config is now a versioned `config: Versioned<TConfig>` instead of a bare `configSchema?`. Plugins registering triggers with `configSchema:` must wrap it: `config: new Versioned({ version: 1, schema })`. The underlying schema stays reachable via `config.schema`; triggers without per-automation config are unaffected.
+  State and scale: all affected reads resolve from shared Postgres / in-process registries, so every pod sees the same migrated answer. No new framework-owned current-state store.
+  This is a beta minor.
+### Patch Changes
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+- Updated dependencies [9dcc848]
+  - @checkstack/notification-common@1.3.0
+  - @checkstack/catalog-common@2.3.0
+  - @checkstack/common@0.13.0
+  - @checkstack/signal-common@0.2.6
 ## 1.2.3
 ### Patch Changes
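
The 1.3.0 changelog entry says that a `Versioned` config with `version > 1` and an incomplete or non-contiguous migration chain now throws at construction. The real `assertMigrationChainFromV1` helper lives in `@checkstack/backend-api` and is not part of this diff, so the following is only a minimal sketch of what such a check could look like. The `Migration` shape (`fromVersion`, `toVersion`, `migrate`) and the name `assertChainFromV1Sketch` are assumptions made for illustration.

```typescript
// Hypothetical migration step shape; the real type in @checkstack/backend-api may differ.
interface Migration {
  fromVersion: number;
  toVersion: number;
  migrate: (data: unknown) => unknown;
}

// Throws unless the migrations form a contiguous v1 -> v2 -> ... -> version chain.
function assertChainFromV1Sketch({
  version,
  migrations,
}: {
  version: number;
  migrations: Migration[];
}): void {
  const byFrom = new Map<number, Migration>();
  for (const m of migrations) byFrom.set(m.fromVersion, m);
  for (let v = 1; v < version; v++) {
    const step = byFrom.get(v);
    if (!step || step.toVersion !== v + 1) {
      throw new Error(`Missing migration step v${v} -> v${v + 1}`);
    }
  }
}

const passThrough = (data: unknown): unknown => data;

// Complete chain, comparable to the pass-through v1 -> v2 -> v3 chain in the changelog.
assertChainFromV1Sketch({
  version: 3,
  migrations: [
    { fromVersion: 1, toVersion: 2, migrate: passThrough },
    { fromVersion: 2, toVersion: 3, migrate: passThrough },
  ],
});

// Incomplete chain: version 3 with no migrations is rejected.
let threw = false;
try {
  assertChainFromV1Sketch({ version: 3, migrations: [] });
} catch {
  threw = true;
}
```

Running the check eagerly, rather than on first read, is what turns a latent data problem into the boot-time failure the changelog lists under breaking changes.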

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@checkstack/anomaly-common",
-  "version": "1.2.3",
+  "version": "1.3.0",
   "license": "Elastic-2.0",
   "type": "module",
   "exports": {
@@ -9,16 +9,16 @@
     }
   },
   "dependencies": {
-    "@checkstack/common": "0.11.0",
-    "@checkstack/catalog-common": "2.2.2",
-    "@checkstack/notification-common": "1.2.0",
-    "@checkstack/signal-common": "0.2.4",
+    "@checkstack/common": "0.12.0",
+    "@checkstack/catalog-common": "2.2.3",
+    "@checkstack/notification-common": "1.2.1",
+    "@checkstack/signal-common": "0.2.5",
     "zod": "^4.2.1"
   },
   "devDependencies": {
     "typescript": "^5.7.2",
     "@checkstack/tsconfig": "0.0.7",
-    "@checkstack/scripts": "0.3.3"
+    "@checkstack/scripts": "0.3.4"
   },
   "scripts": {
     "typecheck": "tsgo -b",

package/src/engine/self-resolution.test.ts ADDED

@@ -0,0 +1,106 @@
+import { describe, test, expect } from "bun:test";
+import {
+  STABLE_RESOLUTION_RUN_COUNT,
+  STABLE_RESOLUTION_RELATIVE_BAND,
+  SUPPRESSION_REACTIVATION_DELTA,
+  hasSettledAtNewLevel,
+  appendRecentSample,
+  isDriftFlatRelative,
+  hasChangedSinceSuppression,
+} from "./self-resolution";
+describe("hasSettledAtNewLevel", () => {
+  test("false when fewer than the required number of samples", () => {
+    const window = Array.from(
+      { length: STABLE_RESOLUTION_RUN_COUNT - 1 },
+      () => 500,
+    );
+    expect(hasSettledAtNewLevel(window)).toBe(false);
+  });
+  test("true when enough samples sit inside the tight relative band", () => {
+    // All within 10% of mean 500 → settled at a new level.
+    const window = [500, 505, 498, 502, 500];
+    expect(window.length).toBe(STABLE_RESOLUTION_RUN_COUNT);
+    expect(hasSettledAtNewLevel(window)).toBe(true);
+  });
+  test("false when samples are still spread wider than the band", () => {
+    // (max-min)/mean = (700-300)/500 = 0.8 ≫ band
+    const window = [300, 700, 400, 600, 500];
+    expect(hasSettledAtNewLevel(window)).toBe(false);
+  });
+  test("only considers the most recent STABLE_RESOLUTION_RUN_COUNT samples", () => {
+    // Old volatile prefix, recent settled suffix.
+    const window = [10, 9000, 500, 502, 498, 500, 501];
+    expect(hasSettledAtNewLevel(window)).toBe(true);
+  });
+  test("band boundary is inclusive", () => {
+    // spread exactly at the band edge relative to mean
+    const mean = 100;
+    const spread = STABLE_RESOLUTION_RELATIVE_BAND * mean; // 10
+    const window = [mean - spread / 2, mean, mean, mean, mean + spread / 2];
+    expect(hasSettledAtNewLevel(window)).toBe(true);
+  });
+});
+describe("appendRecentSample", () => {
+  test("appends to an empty window", () => {
+    expect(appendRecentSample(undefined, 5)).toEqual([5]);
+  });
+  test("caps the window at STABLE_RESOLUTION_RUN_COUNT, dropping oldest", () => {
+    let window: number[] | undefined;
+    for (let i = 0; i < STABLE_RESOLUTION_RUN_COUNT + 3; i++) {
+      window = appendRecentSample(window, i);
+    }
+    expect(window).toHaveLength(STABLE_RESOLUTION_RUN_COUNT);
+    // Last value pushed was N+2; window keeps the most recent N.
+    expect(window?.at(-1)).toBe(STABLE_RESOLUTION_RUN_COUNT + 2);
+    expect(window?.[0]).toBe(3);
+  });
+});
+describe("isDriftFlatRelative", () => {
+  test("true when projected change is small relative to mean", () => {
+    expect(isDriftFlatRelative({ projectedChange: 5, mean: 1000 })).toBe(true);
+  });
+  test("false when projected change is large relative to mean", () => {
+    expect(isDriftFlatRelative({ projectedChange: 500, mean: 1000 })).toBe(
+      false,
+    );
+  });
+  test("handles near-zero mean without dividing by zero", () => {
+    expect(isDriftFlatRelative({ projectedChange: 0, mean: 0 })).toBe(true);
+    expect(isDriftFlatRelative({ projectedChange: 1, mean: 0 })).toBe(false);
+  });
+});
+describe("hasChangedSinceSuppression", () => {
+  test("false when the value stays within the reactivation band", () => {
+    // 10% move, band is 25%
+    expect(
+      hasChangedSinceSuppression({ observedValue: 110, suppressedValue: 100 }),
+    ).toBe(false);
+  });
+  test("true when the value moves beyond the reactivation band", () => {
+    const beyond = 100 * (1 + SUPPRESSION_REACTIVATION_DELTA) + 1;
+    expect(
+      hasChangedSinceSuppression({
+        observedValue: beyond,
+        suppressedValue: 100,
+      }),
+    ).toBe(true);
+  });
+  test("reacts to moves in either direction", () => {
+    expect(
+      hasChangedSinceSuppression({ observedValue: 50, suppressedValue: 100 }),
+    ).toBe(true);
+  });
+});
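
The `hasSettledAtNewLevel` tests above depend on the relative spread `(max - min) / max(|mean|, ε)`. As a worked check of the arithmetic, the sketch below recomputes that spread for three windows: the settled and volatile windows from the tests, plus the boundary window with its values written out (`mean = 100`, `spread = 10`). The helper name `relativeSpread` and the local `BAND` constant are illustrative and not part of the package.

```typescript
// Relative spread as defined in self-resolution.ts: (max - min) / max(|mean|, eps).
function relativeSpread(window: number[]): number {
  const min = Math.min(...window);
  const max = Math.max(...window);
  const mean = window.reduce((sum, v) => sum + v, 0) / window.length;
  return (max - min) / Math.max(Math.abs(mean), Number.EPSILON);
}

const BAND = 0.1; // mirrors STABLE_RESOLUTION_RELATIVE_BAND

// Settled window: spread 7 over mean 501, roughly 0.014.
const settledSpread = relativeSpread([500, 505, 498, 502, 500]);
// Volatile window: spread 400 over mean 500, exactly 0.8.
const volatileSpread = relativeSpread([300, 700, 400, 600, 500]);
// Boundary window: spread 10 over mean 100, exactly 0.1 (inclusive comparison).
const boundarySpread = relativeSpread([95, 100, 100, 100, 105]);

console.log(
  settledSpread <= BAND,
  volatileSpread <= BAND,
  boundarySpread <= BAND,
); // prints: true false true
```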

package/src/engine/self-resolution.ts ADDED

@@ -0,0 +1,115 @@
+/**
+ * Self-resolution and suppression heuristics.
+ *
+ * PART A (auto-resolve): a confirmed anomaly must clear once the metric settles
+ * at a *new* stable level, even while that level is still anomalous against the
+ * stale baseline. The classic case is "broken then fixed at a clearly different
+ * value": the new value IS the new normal, but the relative resolver keeps
+ * comparing against the old mean and never fires until the slow hourly baseline
+ * analyzer drags the mean across. These pure helpers give both the spike
+ * detector and the drift evaluator a baseline-independent escape hatch.
+ *
+ * PART B (suppression): an operator can silence a known anomaly. It auto-clears
+ * ("changes again") once the observed value moves outside a relative band around
+ * the value it was suppressed at.
+ */
+/**
+ * Number of consecutive healthy samples that must sit inside the tight band
+ * before a confirmed spike anomaly self-resolves.
+ *
+ * Chosen as 5: deliberately stricter than the typical confirmation window
+ * (~3 runs) so a brief cluster of similar values can't masquerade as a new
+ * stable regime, while still resolving within a handful of check cycles rather
+ * than the hours-to-days the baseline analyzer would take.
+ */
+export const STABLE_RESOLUTION_RUN_COUNT = 5;
+/**
+ * Maximum relative spread, (max − min) / max(|mean|, ε), the recent-sample
+ * window may have to count as "settled". 0.10 == within 10% of each other.
+ *
+ * Tight enough that genuinely volatile metrics never qualify, loose enough to
+ * absorb normal jitter around a new operating point.
+ */
+export const STABLE_RESOLUTION_RELATIVE_BAND = 0.1;
+/**
+ * Number of consecutive baseline-analyzer runs a confirmed *drift* anomaly must
+ * show a flat (relative) slope before it self-resolves. The analyzer runs
+ * hourly, so 2 runs ≈ the metric has held its new level for ~2h, which is the
+ * same confidence shape as the drift confirmation threshold.
+ */
+export const STABLE_DRIFT_RESOLUTION_RUN_COUNT = 2;
+/**
+ * Relative band for "the suppressed metric changed again". When the observed
+ * value moves more than 25% away from the value it was suppressed at (relative
+ * to that value), the suppression auto-clears. Chosen wider than the
+ * self-resolution band: suppression is a deliberate operator action, so we only
+ * undo it on a clearly material move, not on routine jitter.
+ */
+export const SUPPRESSION_REACTIVATION_DELTA = 0.25;
+/**
+ * Returns true when the supplied rolling window of recent healthy samples has
+ * both (a) reached the required length and (b) settled inside the tight
+ * relative band — i.e. the metric has found a new stable level.
+ */
+export function hasSettledAtNewLevel(samples: number[]): boolean {
+  if (samples.length < STABLE_RESOLUTION_RUN_COUNT) return false;
+  const window = samples.slice(-STABLE_RESOLUTION_RUN_COUNT);
+  const min = Math.min(...window);
+  const max = Math.max(...window);
+  const mean = window.reduce((sum, v) => sum + v, 0) / window.length;
+  const denominator = Math.max(Math.abs(mean), Number.EPSILON);
+  return (max - min) / denominator <= STABLE_RESOLUTION_RELATIVE_BAND;
+}
+/**
+ * Append a sample to a rolling window, keeping at most
+ * {@link STABLE_RESOLUTION_RUN_COUNT} most-recent entries (oldest-first).
+ */
+export function appendRecentSample(
+  existing: number[] | undefined,
+  value: number,
+): number[] {
+  const next = [...(existing ?? []), value];
+  return next.slice(-STABLE_RESOLUTION_RUN_COUNT);
+}
+/**
+ * Returns true when the projected drift change is flat relative to the current
+ * mean — the trend has stopped walking and the metric is holding its new level.
+ */
+export function isDriftFlatRelative({
+  projectedChange,
+  mean,
+}: {
+  projectedChange: number;
+  mean: number;
+}): boolean {
+  const denominator = Math.max(Math.abs(mean), Number.EPSILON);
+  return Math.abs(projectedChange) / denominator <= STABLE_RESOLUTION_RELATIVE_BAND;
+}
+/**
+ * Returns true when an observed value has moved outside the relative band
+ * around the value an anomaly was suppressed at — i.e. it "changed again" and
+ * suppression should auto-clear.
+ */
+export function hasChangedSinceSuppression({
+  observedValue,
+  suppressedValue,
+}: {
+  observedValue: number;
+  suppressedValue: number;
+}): boolean {
+  const denominator = Math.max(Math.abs(suppressedValue), Number.EPSILON);
+  return (
+    Math.abs(observedValue - suppressedValue) / denominator >
+    SUPPRESSION_REACTIVATION_DELTA
+  );
+}
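
To show how the helpers above might fit together, here is a minimal sketch of a spike anomaly that settles at a new level and is later checked against a suppression value. The constants and helper bodies are copied from the file so the snippet runs standalone. The sample values and the driving loop are hypothetical; the real detector that calls these helpers is not part of this package diff.

```typescript
// Constants and helpers copied from self-resolution.ts so the sketch runs standalone.
const STABLE_RESOLUTION_RUN_COUNT = 5;
const STABLE_RESOLUTION_RELATIVE_BAND = 0.1;
const SUPPRESSION_REACTIVATION_DELTA = 0.25;

function hasSettledAtNewLevel(samples: number[]): boolean {
  if (samples.length < STABLE_RESOLUTION_RUN_COUNT) return false;
  const window = samples.slice(-STABLE_RESOLUTION_RUN_COUNT);
  const min = Math.min(...window);
  const max = Math.max(...window);
  const mean = window.reduce((sum, v) => sum + v, 0) / window.length;
  const denominator = Math.max(Math.abs(mean), Number.EPSILON);
  return (max - min) / denominator <= STABLE_RESOLUTION_RELATIVE_BAND;
}

function appendRecentSample(existing: number[] | undefined, value: number): number[] {
  return [...(existing ?? []), value].slice(-STABLE_RESOLUTION_RUN_COUNT);
}

function hasChangedSinceSuppression(args: {
  observedValue: number;
  suppressedValue: number;
}): boolean {
  const denominator = Math.max(Math.abs(args.suppressedValue), Number.EPSILON);
  return (
    Math.abs(args.observedValue - args.suppressedValue) / denominator >
    SUPPRESSION_REACTIVATION_DELTA
  );
}

// Hypothetical healthy samples after a metric jumped to a new level near 500.
const healthySamples = [498, 503, 500, 497, 502];

let recentSamples: number[] = [];
let resolvedAtIndex = -1; // index of the sample that triggers self-resolution
for (let i = 0; i < healthySamples.length; i++) {
  recentSamples = appendRecentSample(recentSamples, healthySamples[i]);
  if (resolvedAtIndex === -1 && hasSettledAtNewLevel(recentSamples)) {
    resolvedAtIndex = i;
  }
}

// Suppressed at 500: a move to 600 (20%) stays suppressed, 650 (30%) reactivates.
const reactivatesAt600 = hasChangedSinceSuppression({ observedValue: 600, suppressedValue: 500 });
const reactivatesAt650 = hasChangedSinceSuppression({ observedValue: 650, suppressedValue: 500 });
```

In this run the window only reaches the required length on the fifth sample, so that is the earliest point at which the row could self-resolve.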

package/src/index.ts CHANGED

@@ -3,6 +3,7 @@ export * from "./engine/baseline";
 export * from "./engine/thresholds";
 export * from "./engine/config";
 export * from "./engine/drift";
+export * from "./engine/self-resolution";
 export * from "./access";
 export * from "./rpc-contract";
 export * from "./plugin-metadata";

package/src/rpc-contract.ts CHANGED

@@ -1,7 +1,7 @@
 import { createClientDefinition, proc } from "@checkstack/common";
 import { z } from "zod";
 import { pluginMetadata } from "./plugin-metadata";
-import { AnomalyStateSchema, AnomalySettingsSchema, AnomalyFieldConfigSchema, AnomalyKindSchema } from "./schema";
+import { AnomalyStateSchema, AnomalySettingsSchema, PartialAnomalySettingsSchema, AnomalyKindSchema } from "./schema";
 import { anomalyAccess } from "./access";
 export const AnomalyDtoSchema = z.object({
@@ -21,6 +21,9 @@ export const AnomalyDtoSchema = z.object({
   startedAt: z.string(),
   confirmedAt: z.string().nullable(),
   recoveredAt: z.string().nullable(),
+  suppressedAt: z.string().nullable(),
+  suppressedValue: z.number().nullable(),
+  suppressedBaseline: z.number().nullable(),
   metadata: z.record(z.string(), z.unknown()).nullable(),
 });
@@ -67,21 +70,6 @@ const VersionedAnomalySettingsSchema = z.object({
   originalVersion: z.number().optional(),
 });
-/**
- * Partial settings schema for assignment-level overrides.
- * Only includes fields that the user explicitly sets.
- */
-const PartialAnomalySettingsSchema = z.object({
-  enabled: z.boolean().optional(),
-  sensitivity: z.number().optional(),
-  confirmationWindow: z.number().int().optional(),
-  baselineWindow: z.string().optional(),
-  notify: z.boolean().optional(),
-  driftEnabled: z.boolean().optional(),
-  driftThreshold: z.number().optional(),
-  fieldOverrides: z.record(z.string(), AnomalyFieldConfigSchema).optional(),
-});
 const VersionedPartialAnomalySettingsSchema = z.object({
   version: z.number(),
   data: PartialAnomalySettingsSchema,
@@ -101,6 +89,11 @@ export const anomalyContract = {
       configurationId: z.string().optional(),
       state: AnomalyStateSchema.optional(),
       kind: AnomalyKindSchema.optional(),
+      /**
+       * Suppression filter. "active" (default) hides globally-suppressed rows,
+       * "suppressed" lists only them, "all" ignores the flag.
+       */
+      suppression: z.enum(["active", "suppressed", "all"]).optional(),
       limit: z.number().optional().default(50),
     }))
     .output(z.array(AnomalyDtoSchema)),
@@ -201,6 +194,46 @@ export const anomalyContract = {
     )
     .output(z.object({ success: z.boolean() })),
+  /**
+   * Globally suppress an anomaly row so it disappears from the active feed
+   * until the metric "changes again". Suppression is per-row (not per-user)
+   * and lives in shared Postgres so every pod sees the same active set.
+   * Gated by `feed.manage`. Idempotent.
+   */
+  suppressAnomaly: proc({
+    operationType: "mutation",
+    userType: "authenticated",
+    access: [anomalyAccess.feed.manage],
+    instanceAccess: { idParam: "systemId" },
+  })
+    .route({ method: "POST" })
+    .input(
+      z.object({
+        systemId: z.string(),
+        anomalyId: z.string(),
+      }),
+    )
+    .output(z.object({ success: z.boolean() })),
+  /**
+   * Manually clear suppression on an anomaly row, returning it to the active
+   * feed. Gated by `feed.manage`. Idempotent.
+   */
+  unsuppressAnomaly: proc({
+    operationType: "mutation",
+    userType: "authenticated",
+    access: [anomalyAccess.feed.manage],
+    instanceAccess: { idParam: "systemId" },
+  })
+    .route({ method: "POST" })
+    .input(
+      z.object({
+        systemId: z.string(),
+        anomalyId: z.string(),
+      }),
+    )
+    .output(z.object({ success: z.boolean() })),
   /**
    * Remove a mute previously created via muteAnomalyNotification. No-op if
    * no matching record exists.
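
The `suppression` filter added to `getAnomalies` above takes `"active"` (the default), `"suppressed"`, or `"all"`. The server-side query is not part of this package, so the sketch below only illustrates the documented semantics on an in-memory list. It assumes a row counts as suppressed when `suppressedAt` is non-null; `AnomalyRow` and `filterBySuppression` are hypothetical names.

```typescript
type SuppressionFilter = "active" | "suppressed" | "all";

// Minimal row shape: only the field this sketch needs from AnomalyDtoSchema.
interface AnomalyRow {
  id: string;
  suppressedAt: string | null;
}

// Hypothetical in-memory equivalent of the `suppression` filter.
// Assumption: a non-null `suppressedAt` marks a globally suppressed row.
function filterBySuppression(
  rows: AnomalyRow[],
  suppression: SuppressionFilter = "active",
): AnomalyRow[] {
  if (suppression === "active") {
    return rows.filter((row) => row.suppressedAt === null);
  }
  if (suppression === "suppressed") {
    return rows.filter((row) => row.suppressedAt !== null);
  }
  return rows; // "all" ignores the flag
}

const rows: AnomalyRow[] = [
  { id: "a", suppressedAt: null },
  { id: "b", suppressedAt: "2025-01-01T00:00:00.000Z" },
  { id: "c", suppressedAt: null },
];

const activeIds = filterBySuppression(rows).map((row) => row.id);
const suppressedIds = filterBySuppression(rows, "suppressed").map((row) => row.id);
const allIds = filterBySuppression(rows, "all").map((row) => row.id);
```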

package/src/schema.ts CHANGED

@@ -33,6 +33,19 @@ export const AnomalyMetadataSchema = z
   .object({
     trendData: z.record(z.string(), z.unknown()).optional(),
     relatedAnomalies: z.array(z.string()).optional(), // UUIDs
+    /**
+     * Rolling window of the most recent healthy numeric samples, used by the
+     * self-resolution path (PART A) to decide that the metric has settled at a
+     * new stable level even while it is still anomalous against the stale
+     * baseline. Oldest-first; capped at {@link STABLE_RESOLUTION_RUN_COUNT}.
+     */
+    recentSamples: z.array(z.number()).optional(),
+    /**
+     * Count of consecutive baseline-analyzer runs in which a confirmed drift
+     * anomaly's slope has been flat relative to the (new) mean. Used by the
+     * drift self-resolution path.
+     */
+    stableDriftRunCount: z.number().optional(),
   })
   .catchall(z.unknown());
 export type AnomalyMetadata = z.infer<typeof AnomalyMetadataSchema>;
@@ -56,3 +69,23 @@ export const AnomalySettingsSchema = z.object({
   fieldOverrides: z.record(z.string(), AnomalyFieldConfigSchema).optional(),
 });
 export type AnomalySettings = z.infer<typeof AnomalySettingsSchema>;
+/**
+ * Partial settings schema for assignment-level overrides. Only includes the
+ * fields a user may explicitly override on a per-assignment basis, so every
+ * field is optional. Stored as a versioned record alongside the template
+ * config and migrated/validated on read.
+ */
+export const PartialAnomalySettingsSchema = z.object({
+  enabled: z.boolean().optional(),
+  sensitivity: z.number().optional(),
+  confirmationWindow: z.number().int().optional(),
+  baselineWindow: z.string().optional(),
+  notify: z.boolean().optional(),
+  driftEnabled: z.boolean().optional(),
+  driftThreshold: z.number().optional(),
+  fieldOverrides: z.record(z.string(), AnomalyFieldConfigSchema).optional(),
+});
+export type PartialAnomalySettings = z.infer<
+  typeof PartialAnomalySettingsSchema
+>;
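
The `stableDriftRunCount` metadata field added in this file counts consecutive analyzer runs with a flat relative slope. The sketch below shows one way such a counter could be advanced and compared against `STABLE_DRIFT_RESOLUTION_RUN_COUNT`. The constants and `isDriftFlatRelative` are copied from `self-resolution.ts`. The update function `nextStableDriftRunCount`, the reset to 0 on a non-flat run, and the sample runs are assumptions, since the analyzer itself is not part of this diff.

```typescript
// Copied from self-resolution.ts so the sketch runs standalone.
const STABLE_RESOLUTION_RELATIVE_BAND = 0.1;
const STABLE_DRIFT_RESOLUTION_RUN_COUNT = 2;

interface DriftRun {
  projectedChange: number;
  mean: number;
}

function isDriftFlatRelative({ projectedChange, mean }: DriftRun): boolean {
  const denominator = Math.max(Math.abs(mean), Number.EPSILON);
  return Math.abs(projectedChange) / denominator <= STABLE_RESOLUTION_RELATIVE_BAND;
}

// Hypothetical per-run update: increment on a flat run, reset otherwise
// (the reset is inferred from the word "consecutive" in the schema comment).
function nextStableDriftRunCount(previous: number | undefined, run: DriftRun): number {
  return isDriftFlatRelative(run) ? (previous ?? 0) + 1 : 0;
}

// Hypothetical hourly analyzer runs for one confirmed drift anomaly.
const runs: DriftRun[] = [
  { projectedChange: 400, mean: 1000 }, // 40% of mean: still drifting
  { projectedChange: 20, mean: 1400 }, // about 1.4% of mean: flat
  { projectedChange: 5, mean: 1410 }, // about 0.4% of mean: flat again
];

let stableDriftRunCount = 0;
const resolvedAfterRun: boolean[] = [];
for (const run of runs) {
  stableDriftRunCount = nextStableDriftRunCount(stableDriftRunCount, run);
  resolvedAfterRun.push(stableDriftRunCount >= STABLE_DRIFT_RESOLUTION_RUN_COUNT);
}
```

Under these assumptions the anomaly would only resolve after the third run, once two flat runs have occurred back to back.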