@checkstack/slo-backend 0.5.0 → 0.6.0
- package/CHANGELOG.md +91 -0
- package/package.json +19 -19
- package/src/hooks.ts +0 -39
- package/src/index.ts +149 -127
- package/src/slo-entity.test.ts +255 -0
- package/src/slo-entity.ts +162 -0
- package/src/streak-calculator.ts +65 -16
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,96 @@
 # @checkstack/slo-backend
 
+## 0.6.0
+
+### Minor Changes
+
+- b995afb: Make `slo` a plugin-backed, COMPUTED reactive entity via the Model-B entity state machine + rewire its cross-plugin consumers.
+
+  SLO defines a `slo` entity `{ objectiveId, systemId, target, budgetRemainingPercent, currentStreak, bestStreak }` keyed by `objectiveId`. There is no framework `entity_state` row: its current state is assembled on demand by a `read` accessor (`createSloEntityRead` / `computeSloEntityState`). `currentStreak` / `bestStreak` / `systemId` / `target` come from the authoritative `slo_streaks` + `slo_objectives` tables, and `budgetRemainingPercent` (plus `target`) is COMPUTED on the fly via the SLO engine's `computeStatus` (downtime aggregation over the objective's rolling window). The daily snapshot job's streak-persist write drives through the fail-soft `writeSloEntity` (`handle.mutate({ id: objectiveId, apply })`): `apply` persists the streak to `slo_streaks` (its own write) and returns the freshly-computed view; the framework snapshots `prev` via the computed `read` BEFORE the write, appends the transition log, and emits `ENTITY_CHANGED`.
+
+  Compute-on-read (not materialize): the budget is a pure function of the objective's append-only downtime history, so storing a second copy would duplicate the engine's source of truth and risk drift. The `read` recomputes from the same tables the SLO API already reads; it is only exercised on the prev-snapshot of the once-daily streak job and on reactive scope/wake resolution, so the recompute cost is negligible. The append-only `slo_downtime_events` + `slo_daily_snapshots` tables are declared non-reactive (bookkeeping); the live budget/streak is the entity. Operators author budget/streak thresholds as reactive `numeric_state` conditions over `state.slo.<objectiveId>.budgetRemainingPercent` / `currentStreak`.
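The compute-on-read idea in this entry can be sketched roughly as follows. This is a minimal, synchronous illustration and not the package's implementation: `ObjectiveRow`, `StreakRow`, `ReadDeps`, `computeSloView`, and `createSloRead` are hypothetical names, in-memory maps stand in for the `slo_objectives` and `slo_streaks` tables, and the budget function stands in for the engine's `computeStatus`. The real accessor appears to be asynchronous.

```typescript
// Hypothetical row shapes standing in for `slo_objectives` and `slo_streaks`.
interface ObjectiveRow { systemId: string; target: number }
interface StreakRow { currentStreak: number; bestStreak: number }

interface SloView {
  objectiveId: string;
  systemId: string;
  target: number;
  budgetRemainingPercent: number;
  currentStreak: number;
  bestStreak: number;
}

interface ReadDeps {
  objectives: Map<string, ObjectiveRow>;
  streaks: Map<string, StreakRow>;
  // Stand-in (assumption) for the engine's budget computation.
  computeBudgetRemainingPercent: (objectiveId: string) => number;
}

// Assemble one objective's view on demand; nothing is written or cached.
function computeSloView(deps: ReadDeps, objectiveId: string): SloView | undefined {
  const objective = deps.objectives.get(objectiveId);
  if (!objective) return undefined; // objective no longer exists
  const streak = deps.streaks.get(objectiveId);
  return {
    objectiveId,
    systemId: objective.systemId,
    target: objective.target,
    budgetRemainingPercent: deps.computeBudgetRemainingPercent(objectiveId),
    currentStreak: streak?.currentStreak ?? 0, // missing streak row defaults to 0
    bestStreak: streak?.bestStreak ?? 0,
  };
}

// Batch accessor: ids without an objective are omitted from the result.
function createSloRead(deps: ReadDeps) {
  return (ids: string[]): Record<string, SloView> => {
    const out: Record<string, SloView> = {};
    for (const id of ids) {
      const view = computeSloView(deps, id);
      if (view) out[id] = view;
    }
    return out;
  };
}
```

Because the view is rebuilt from the source tables on every call, there is no second copy of the budget that could drift from the downtime history.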
+
+  The healthcheck + catalog consumers switched from `onHook(<hook>)` to `onEntityChanged({ kind })`, all keeping `work-queue` delivery (each handler performs side-effecting writes that must run once per cluster):
+
+  - `slo-system-down` / `slo-upstream-down`: react to `health` changes filtered to a degraded transition (`classifyHealthChange().degraded`).
+  - `slo-system-up`: reacts to `health` changes filtered to a recovered transition (`classifyHealthChange().recovered`).
+  - `slo-system-cleanup`: reacts to `catalog-system` tombstones (`change.next === null`).
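The filtering these consumers perform can be pictured with a small sketch. Everything here is an assumption made for illustration: the status values, the `EntityChange` shape, and the transition rules in `classifyHealthChangeSketch` are hypothetical, and the real `classifyHealthChange` exported by `@checkstack/healthcheck-backend` may classify transitions differently.

```typescript
// Hypothetical status values and change shape; the real payload carries more fields.
type HealthStatus = "healthy" | "degraded" | "unhealthy";
interface HealthState { status: HealthStatus }
interface EntityChange<S> { id: string; prev: S | null; next: S | null }

// Assumed rule: "degraded" means healthy (or unknown) -> not healthy,
// "recovered" means not healthy -> healthy.
function classifyHealthChangeSketch(change: EntityChange<HealthState>) {
  const previousStatus = change.prev?.status;
  const newStatus = change.next?.status;
  const wasUnhealthy = previousStatus !== undefined && previousStatus !== "healthy";
  const isUnhealthy = newStatus !== undefined && newStatus !== "healthy";
  return {
    systemId: change.id,
    previousStatus,
    newStatus,
    degraded: !wasUnhealthy && isUnhealthy,
    recovered: wasUnhealthy && newStatus === "healthy",
  };
}

// A tombstone is a change whose next state is null (the entity was deleted).
function isTombstone<S>(change: EntityChange<S>): boolean {
  return change.next === null;
}
```

A handler would return early unless the flag it cares about is set, so each worker group only acts on its own kind of transition.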
+
+  BREAKING CHANGES:
+
+  - The `slo.budget.warning` / `slo.budget.critical` / `slo.budget.exhausted` and `slo.streak.broken` automation triggers are removed. These thresholds were never emitted by the engine (the underlying hooks were inert) and are replaced by reactive `numeric_state` conditions over the `slo` entity (`budgetRemainingPercent < 20`, `currentStreak == 0`, etc.). Re-author any automations that referenced these trigger ids as `numeric_state` / `state` conditions. The `slo.achievement.unlocked` and `slo.weekly.digest` triggers are KEPT.
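To make the migration concrete, the sketch below evaluates threshold conditions of the kind quoted above against an `slo` entity state. The `NumericCondition` shape and the `holds` helper are hypothetical and exist only for this example; the actual `numeric_state` condition schema of the automation engine is not shown in this diff and may look different.

```typescript
// Hypothetical condition shape; not the automation engine's real schema.
type Op = "<" | "<=" | "==" | ">=" | ">";
interface SloState { budgetRemainingPercent: number; currentStreak: number }
interface NumericCondition {
  field: keyof SloState;
  op: Op;
  value: number;
}

// Evaluate one numeric condition against the current entity state.
function holds(cond: NumericCondition, state: SloState): boolean {
  const actual = state[cond.field];
  switch (cond.op) {
    case "<": return actual < cond.value;
    case "<=": return actual <= cond.value;
    case "==": return actual === cond.value;
    case ">=": return actual >= cond.value;
    case ">": return actual > cond.value;
  }
}

// Rough equivalents of two removed triggers, using the thresholds quoted in the changelog.
const budgetLow: NumericCondition = { field: "budgetRemainingPercent", op: "<", value: 20 };
const streakBroken: NumericCondition = { field: "currentStreak", op: "==", value: 0 };
```

The thresholds now live in the automation definition rather than in hook ids fixed by the plugin, so operators can choose their own cut-offs.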
+
+- b995afb: Remove the dead `slo.budget.warning` / `slo.budget.critical` / `slo.budget.exhausted` / `slo.streak.broken` hook descriptors from `sloHooks`.
+
+  These four `createHook` descriptors had no emitter and no trigger registration left: per the reactive automation engine (§9.2) the SLO budget IS the reactive entity, and the old threshold/streak triggers became `numeric_state` / `state` conditions over `state.slo.<objectiveId>.budgetRemainingPercent` + `currentStreak`. Nothing in the repo emitted or subscribed to the four hooks, so they were unreachable surface. `sloAchievementUnlocked` and `sloWeeklyDigest` are unaffected and stay.
+
+  BREAKING CHANGES:
+
+  - Removed `sloHooks.sloBudgetWarning`, `sloHooks.sloBudgetCritical`, `sloHooks.sloBudgetExhausted`, and `sloHooks.sloStreakBroken`. Author SLO budget / streak threshold automations as reactive `numeric_state` / `state` conditions over the `slo` entity state instead.
+
+### Patch Changes
+
+- b995afb: Extract a shared `withEntityWrite` / `withEntityRemove` guard for PLUGIN-BACKED (Model B) reactive entities and refactor the per-domain copies onto it.
+
+  Every plugin-backed domain (incident, catalog, dependency, maintenance, slo, satellite) reimplemented the same "no handle wired → run the plugin write directly; handle wired → route through `handle.mutate` / `handle.remove`" guard, varying only in the id-key name. `@checkstack/automation-backend` now exports `withEntityWrite` / `withEntityRemove` (from the entity barrel) and each domain's thin, well-named wrappers (`writeIncidentEntity`, `writeMaintenanceEntity`, satellite's `mirror`, …) delegate to it, so the branch lives in exactly one place. Behavior is unchanged.
+
+  `writeHealthEntity` (healthcheck-backend) is intentionally NOT migrated onto the helper — it is genuinely bespoke (closure-captured durable state, distinct rethrow-vs-fail-soft branches, a per-system serializer, and it returns the computed state). SLO keeps its fail-soft `onError` wrapper around the shared guard.
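The guard described in this entry, together with a fail-soft wrapper of the kind SLO keeps around it, might look roughly like the sketch below. The `Handle` interface and both function names are hypothetical, the code is synchronous for brevity, and the real `withEntityWrite` signature in `@checkstack/automation-backend` is not shown in this diff. How the real wrapper behaves after a failure is also an assumption here.

```typescript
// Hypothetical handle shape, loosely modeled on `handle.mutate({ id, apply })`.
interface Handle<S> {
  mutate(args: { id: string; apply: () => S }): S;
}

interface WriteArgs<S> {
  handle: Handle<S> | undefined;
  id: string;
  apply: () => S; // the plugin's own write, returning the fresh state
}

// Shared guard: one branch, kept in one place.
function withEntityWriteSketch<S>(args: WriteArgs<S>): S {
  // No handle wired: run the plugin write directly.
  if (!args.handle) return args.apply();
  // Handle wired: route the write through the entity handle.
  return args.handle.mutate({ id: args.id, apply: args.apply });
}

// Fail-soft wrapper (assumed semantics): report the error instead of rethrowing.
function writeFailSoftSketch<S>(
  args: WriteArgs<S> & { onError: (error: unknown) => void },
): S | undefined {
  try {
    return withEntityWriteSketch(args);
  } catch (error) {
    args.onError(error);
    return undefined;
  }
}
```

Keeping the branch in a single helper means each domain wrapper only supplies its id and its `apply` callback.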
+
+- Updated dependencies [270ef29]
+- Updated dependencies [b995afb]
+- Updated dependencies [b995afb]
+- Updated dependencies [b995afb]
+- Updated dependencies [270ef29]
+- Updated dependencies [270ef29]
+- Updated dependencies [270ef29]
+- Updated dependencies [270ef29]
+- Updated dependencies [270ef29]
+- Updated dependencies [270ef29]
+- Updated dependencies [270ef29]
+- Updated dependencies [270ef29]
+- Updated dependencies [270ef29]
+- Updated dependencies [b995afb]
+- Updated dependencies [b995afb]
+- Updated dependencies [b995afb]
+- Updated dependencies [b995afb]
+- Updated dependencies [270ef29]
+- Updated dependencies [b995afb]
+- Updated dependencies [270ef29]
+- Updated dependencies [b995afb]
+- Updated dependencies [b995afb]
+- Updated dependencies [270ef29]
+- Updated dependencies [b995afb]
+- Updated dependencies [b995afb]
+- Updated dependencies [270ef29]
+- Updated dependencies [b995afb]
+- Updated dependencies [b995afb]
+- Updated dependencies [b995afb]
+- Updated dependencies [b995afb]
+- Updated dependencies [b995afb]
+- Updated dependencies [b995afb]
+- Updated dependencies [b995afb]
+- Updated dependencies [270ef29]
+- Updated dependencies [270ef29]
+- Updated dependencies [270ef29]
+- Updated dependencies [270ef29]
+- Updated dependencies [270ef29]
+- Updated dependencies [270ef29]
+- Updated dependencies [270ef29]
+- Updated dependencies [270ef29]
+- Updated dependencies [b995afb]
+- Updated dependencies [b995afb]
+  - @checkstack/backend-api@0.19.0
+  - @checkstack/automation-backend@0.3.0
+  - @checkstack/gitops-common@0.5.0
+  - @checkstack/gitops-backend@0.4.0
+  - @checkstack/healthcheck-backend@1.4.0
+  - @checkstack/healthcheck-common@1.4.0
+  - @checkstack/catalog-backend@1.3.0
+  - @checkstack/cache-api@0.3.7
+  - @checkstack/command-backend@0.1.32
+  - @checkstack/queue-api@0.3.7
+  - @checkstack/cache-utils@0.2.12
+
 ## 0.5.0
 
 ### Minor Changes
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@checkstack/slo-backend",
-  "version": "0.
+  "version": "0.6.0",
   "license": "Elastic-2.0",
   "type": "module",
   "main": "src/index.ts",
@@ -14,30 +14,30 @@
     "lint:code": "eslint . --max-warnings 0"
   },
   "dependencies": {
-    "@checkstack/backend-api": "0.
-    "@checkstack/cache-api": "0.3.
-    "@checkstack/cache-utils": "0.2.
-    "@checkstack/slo-common": "0.4.
-    "@checkstack/healthcheck-common": "1.
-    "@checkstack/healthcheck-backend": "1.
-    "@checkstack/dependency-common": "1.1.
-    "@checkstack/catalog-common": "2.2.
-    "@checkstack/catalog-backend": "1.
-    "@checkstack/command-backend": "0.1.
-    "@checkstack/signal-common": "0.2.
-    "@checkstack/automation-backend": "0.
-    "@checkstack/gitops-backend": "0.3.
-    "@checkstack/gitops-common": "0.4.
-    "@checkstack/common": "0.
-    "@checkstack/queue-api": "0.3.
+    "@checkstack/backend-api": "0.18.0",
+    "@checkstack/cache-api": "0.3.6",
+    "@checkstack/cache-utils": "0.2.11",
+    "@checkstack/slo-common": "0.4.2",
+    "@checkstack/healthcheck-common": "1.3.0",
+    "@checkstack/healthcheck-backend": "1.3.0",
+    "@checkstack/dependency-common": "1.1.3",
+    "@checkstack/catalog-common": "2.2.3",
+    "@checkstack/catalog-backend": "1.2.0",
+    "@checkstack/command-backend": "0.1.31",
+    "@checkstack/signal-common": "0.2.5",
+    "@checkstack/automation-backend": "0.2.0",
+    "@checkstack/gitops-backend": "0.3.7",
+    "@checkstack/gitops-common": "0.4.2",
+    "@checkstack/common": "0.12.0",
+    "@checkstack/queue-api": "0.3.6",
     "drizzle-orm": "^0.45.0",
     "zod": "^4.2.1",
     "@orpc/server": "^1.13.2"
   },
   "devDependencies": {
     "@checkstack/drizzle-helper": "0.0.5",
-    "@checkstack/scripts": "0.3.
-    "@checkstack/test-utils-backend": "0.1.
+    "@checkstack/scripts": "0.3.4",
+    "@checkstack/test-utils-backend": "0.1.31",
     "@checkstack/tsconfig": "0.0.7",
     "@types/bun": "^1.0.0",
     "drizzle-kit": "^0.31.10",
package/src/hooks.ts
CHANGED
@@ -7,45 +7,6 @@ import type { AchievementType } from "@checkstack/slo-common";
  * Registered as integration events so they flow through configured notification channels.
  */
 export const sloHooks = {
-  /**
-   * Emitted when an SLO's error budget consumption exceeds the warning threshold.
-   */
-  sloBudgetWarning: createHook<{
-    systemId: string;
-    objectiveId: string;
-    target: number;
-    budgetRemainingPercent: number;
-  }>("slo.budget.warning"),
-
-  /**
-   * Emitted when an SLO's error budget consumption exceeds the critical threshold.
-   */
-  sloBudgetCritical: createHook<{
-    systemId: string;
-    objectiveId: string;
-    target: number;
-    budgetRemainingPercent: number;
-  }>("slo.budget.critical"),
-
-  /**
-   * Emitted when an SLO's error budget is fully exhausted.
-   */
-  sloBudgetExhausted: createHook<{
-    systemId: string;
-    objectiveId: string;
-    target: number;
-  }>("slo.budget.exhausted"),
-
-  /**
-   * Emitted when a reliability streak is broken.
-   */
-  sloStreakBroken: createHook<{
-    systemId: string;
-    objectiveId: string;
-    streak: number;
-    bestStreak: number;
-  }>("slo.streak.broken"),
-
   /**
    * Emitted when a system unlocks a new reliability achievement.
    */
package/src/index.ts
CHANGED
|
@@ -10,18 +10,34 @@ import {
|
|
|
10
10
|
AchievementTypeSchema,
|
|
11
11
|
} from "@checkstack/slo-common";
|
|
12
12
|
import { createBackendPlugin, coreServices } from "@checkstack/backend-api";
|
|
13
|
-
import {
|
|
13
|
+
import {
|
|
14
|
+
automationTriggerExtensionPoint,
|
|
15
|
+
entityExtensionPoint,
|
|
16
|
+
type EntityHandle,
|
|
17
|
+
} from "@checkstack/automation-backend";
|
|
14
18
|
import { SloService } from "./service";
|
|
15
19
|
import { SloEngine } from "./slo-engine";
|
|
16
20
|
import { createRouter } from "./router";
|
|
17
21
|
import { createSloCache } from "./cache";
|
|
18
22
|
import { DependencyApi } from "@checkstack/dependency-common";
|
|
19
23
|
import { HealthCheckApi } from "@checkstack/healthcheck-common";
|
|
20
|
-
import {
|
|
21
|
-
|
|
24
|
+
import {
|
|
25
|
+
CATALOG_SYSTEM_ENTITY_KIND,
|
|
26
|
+
} from "@checkstack/catalog-backend";
|
|
27
|
+
import {
|
|
28
|
+
HEALTH_ENTITY_KIND,
|
|
29
|
+
classifyHealthChange,
|
|
30
|
+
} from "@checkstack/healthcheck-backend";
|
|
22
31
|
import { registerSearchProvider } from "@checkstack/command-backend";
|
|
23
32
|
import { resolveRoute } from "@checkstack/common";
|
|
24
33
|
import { sloHooks } from "./hooks";
|
|
34
|
+
import {
|
|
35
|
+
SLO_ENTITY_KIND,
|
|
36
|
+
SloEntityStateSchema,
|
|
37
|
+
createSloEntityRead,
|
|
38
|
+
deriveSloTriggerEvents,
|
|
39
|
+
type SloEntityState,
|
|
40
|
+
} from "./slo-entity";
|
|
25
41
|
import { setupDailySnapshotJob } from "./streak-calculator";
|
|
26
42
|
import { setupWeeklyDigestJob } from "./weekly-digest";
|
|
27
43
|
import { evaluateAchievements } from "./achievement-evaluator";
|
|
@@ -32,32 +48,13 @@ import { registerSloGitOpsKinds } from "./slo-gitops-kinds";
|
|
|
32
48
|
// Integration Event Payload Schemas
|
|
33
49
|
// =============================================================================
|
|
34
50
|
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
|
|
42
|
-
const sloBudgetCriticalPayloadSchema = z.object({
|
|
43
|
-
systemId: z.string(),
|
|
44
|
-
objectiveId: z.string(),
|
|
45
|
-
target: z.number(),
|
|
46
|
-
budgetRemainingPercent: z.number(),
|
|
47
|
-
});
|
|
48
|
-
|
|
49
|
-
const sloBudgetExhaustedPayloadSchema = z.object({
|
|
50
|
-
systemId: z.string(),
|
|
51
|
-
objectiveId: z.string(),
|
|
52
|
-
target: z.number(),
|
|
53
|
-
});
|
|
54
|
-
|
|
55
|
-
const sloStreakBrokenPayloadSchema = z.object({
|
|
56
|
-
systemId: z.string(),
|
|
57
|
-
objectiveId: z.string(),
|
|
58
|
-
streak: z.number(),
|
|
59
|
-
bestStreak: z.number(),
|
|
60
|
-
});
|
|
51
|
+
// NOTE: The `budget.warning` / `.critical` / `.exhausted` and
|
|
52
|
+
// `streak.broken` trigger payload schemas were removed (§9.2). Those four
|
|
53
|
+
// thresholds are now authored as reactive `numeric_state` conditions over
|
|
54
|
+
// the `slo` entity's `budgetRemainingPercent` / `currentStreak`, not as
|
|
55
|
+
// pre-baked event triggers. The hooks they fronted were never emitted by
|
|
56
|
+
// the engine (inert), so removing the trigger registrations is behavior-
|
|
57
|
+
// preserving.
|
|
61
58
|
|
|
62
59
|
const sloAchievementUnlockedPayloadSchema = z.object({
|
|
63
60
|
systemId: z.string(),
|
|
@@ -89,6 +86,19 @@ const sloWeeklyDigestPayloadSchema = z.object({
|
|
|
89
86
|
// Plugin Definition
|
|
90
87
|
// =============================================================================
|
|
91
88
|
|
|
89
|
+
// Reactive `slo` entity handle (§10.7). Defined in register() via the
|
|
90
|
+
// entity extension point; mutated from the daily snapshot job onward.
|
|
91
|
+
let sloEntity: EntityHandle<SloEntityState> | undefined;
|
|
92
|
+
|
|
93
|
+
// The SLO service + engine are created in afterPluginsReady (they need the
|
|
94
|
+
// resolved database + RPC clients), but the PLUGIN-BACKED + COMPUTED entity
|
|
95
|
+
// `read` accessor must be supplied at `defineEntity` time in register(). These
|
|
96
|
+
// holders bridge the two: the `read` closure resolves them lazily, and
|
|
97
|
+
// afterPluginsReady sets them before any mutation runs (the daily job — the
|
|
98
|
+
// only mutation site — runs from afterPluginsReady onward).
|
|
99
|
+
let sloEntityServiceRef: SloService | undefined;
|
|
100
|
+
let sloEntityEngineRef: SloEngine | undefined;
|
|
101
|
+
|
|
92
102
|
export default createBackendPlugin({
|
|
93
103
|
metadata: pluginMetadata,
|
|
94
104
|
register(env) {
|
|
@@ -99,59 +109,53 @@ export default createBackendPlugin({
|
|
|
99
109
|
automationTriggerExtensionPoint,
|
|
100
110
|
);
|
|
101
111
|
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
|
|
107
|
-
|
|
108
|
-
|
|
109
|
-
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
|
|
114
|
-
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
|
|
112
|
+
// ─── Reactive `slo` entity (§10.7, §9.2) ───────────────────────────
|
|
113
|
+
// The SLO budget IS the entity. The former `budget.warning/.critical/
|
|
114
|
+
// .exhausted` + `streak.broken` triggers are removed — those thresholds
|
|
115
|
+
// are now authored as `numeric_state` conditions over
|
|
116
|
+
// `state.slo.<objectiveId>.budgetRemainingPercent` / `currentStreak`.
|
|
117
|
+
// The deriver fires no legacy events; it exists so `slo` is a known
|
|
118
|
+
// reactive kind (scope + wake resolution).
|
|
119
|
+
//
|
|
120
|
+
// PLUGIN-BACKED + COMPUTED (Model B): there is NO framework `entity_state`
|
|
121
|
+
// row. `read` assembles each objective's view by reading `slo_streaks` +
|
|
122
|
+
// `slo_objectives` and COMPUTING `budgetRemainingPercent` via the engine
|
|
123
|
+
// (see `createSloEntityRead`). No `indexes` — those only apply to
|
|
124
|
+
// store-backed kinds, and a plugin-backed kind keeps its state in its own
|
|
125
|
+
// tables. The `read` closure resolves the service + engine set by
|
|
126
|
+
// afterPluginsReady (the daily job is the only mutation site).
|
|
127
|
+
const entityPoint = env.getExtensionPoint(entityExtensionPoint);
|
|
128
|
+
sloEntity = entityPoint.defineEntity<SloEntityState>({
|
|
129
|
+
kind: SLO_ENTITY_KIND,
|
|
130
|
+
state: SloEntityStateSchema,
|
|
131
|
+
read: (ids) => {
|
|
132
|
+
const service = sloEntityServiceRef;
|
|
133
|
+
const engine = sloEntityEngineRef;
|
|
134
|
+
if (!service || !engine) {
|
|
135
|
+
throw new Error(
|
|
136
|
+
"slo entity read before init: service/engine not yet resolved",
|
|
137
|
+
);
|
|
138
|
+
}
|
|
139
|
+
return createSloEntityRead({ service, engine })(ids);
|
|
126
140
|
},
|
|
127
|
-
|
|
128
|
-
|
|
129
|
-
|
|
130
|
-
|
|
131
|
-
|
|
132
|
-
|
|
133
|
-
|
|
134
|
-
|
|
135
|
-
|
|
136
|
-
|
|
137
|
-
|
|
138
|
-
|
|
139
|
-
|
|
140
|
-
|
|
141
|
-
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
|
|
145
|
-
id: "streak.broken",
|
|
146
|
-
displayName: "SLO Streak Broken",
|
|
147
|
-
description: "Fired when a reliability streak is broken",
|
|
148
|
-
category: "SLO",
|
|
149
|
-
payloadSchema: sloStreakBrokenPayloadSchema,
|
|
150
|
-
hook: sloHooks.sloStreakBroken,
|
|
151
|
-
contextKey: (p) => p.systemId,
|
|
152
|
-
},
|
|
153
|
-
pluginMetadata,
|
|
154
|
-
);
|
|
141
|
+
});
|
|
142
|
+
entityPoint.registerChangeDeriver({
|
|
143
|
+
kind: SLO_ENTITY_KIND,
|
|
144
|
+
derive: deriveSloTriggerEvents,
|
|
145
|
+
});
|
|
146
|
+
// Event-sourced history is NOT the live entity (§5): downtime events +
|
|
147
|
+
// daily snapshots are append-only records, the budget/streak is the
|
|
148
|
+
// reactive entity.
|
|
149
|
+
entityPoint.declareNonReactiveState({
|
|
150
|
+
table: "slo_downtime_events",
|
|
151
|
+
reason: "bookkeeping",
|
|
152
|
+
note: "Append-only downtime history. The live budget/streak is the `slo` entity.",
|
|
153
|
+
});
|
|
154
|
+
entityPoint.declareNonReactiveState({
|
|
155
|
+
table: "slo_daily_snapshots",
|
|
156
|
+
reason: "bookkeeping",
|
|
157
|
+
note: "Append-only daily trend snapshots. The live budget/streak is the `slo` entity.",
|
|
158
|
+
});
|
|
155
159
|
|
|
156
160
|
automationTriggers.registerTrigger(
|
|
157
161
|
{
|
|
@@ -183,6 +187,8 @@ export default createBackendPlugin({
|
|
|
183
187
|
// Shared references across init/afterPluginsReady (maintenance-backend pattern)
|
|
184
188
|
let sharedEngine: SloEngine;
|
|
185
189
|
let gitopsService: SloService | undefined;
|
|
190
|
+
// Reactive `slo` entity handle (§10.7), defined just above in register().
|
|
191
|
+
const onEntityChanged = entityPoint.onEntityChanged;
|
|
186
192
|
|
|
187
193
|
// ─── GitOps Entity Kind Registration ─────────────────────────────
|
|
188
194
|
const kindRegistry = env.getExtensionPoint(entityKindExtensionPoint);
|
|
@@ -264,7 +270,6 @@ export default createBackendPlugin({
|
|
|
264
270
|
afterPluginsReady: async ({
|
|
265
271
|
database,
|
|
266
272
|
logger,
|
|
267
|
-
onHook,
|
|
268
273
|
emitHook,
|
|
269
274
|
rpcClient,
|
|
270
275
|
signalService,
|
|
@@ -277,6 +282,12 @@ export default createBackendPlugin({
|
|
|
277
282
|
signalService,
|
|
278
283
|
logger,
|
|
279
284
|
});
|
|
285
|
+
// Publish the service + engine for the PLUGIN-BACKED + COMPUTED entity
|
|
286
|
+
// `read` accessor (defined in register()). The daily snapshot job — the
|
|
287
|
+
// only `slo` mutation site — runs from here onward, so the refs are set
|
|
288
|
+
// before any `read`/`mutate` can fire.
|
|
289
|
+
sloEntityServiceRef = service;
|
|
290
|
+
sloEntityEngineRef = engine;
|
|
280
291
|
|
|
281
292
|
const dependencyClient = rpcClient.forPlugin(DependencyApi);
|
|
282
293
|
const healthCheckClient = rpcClient.forPlugin(HealthCheckApi);
|
|
@@ -345,41 +356,52 @@ export default createBackendPlugin({
|
|
|
345
356
|
}
|
|
346
357
|
};
|
|
347
358
|
|
|
359
|
+
// Cross-plugin consumers now react to the reactive `health` /
|
|
360
|
+
// `catalog-system` ENTITY changes via `onEntityChanged` instead of
|
|
361
|
+
// the (being-removed) directional hooks (§10.7). `classifyHealthChange`
|
|
362
|
+
// reproduces the exact degraded/recovered transition predicate the
|
|
363
|
+
// old `systemDegraded` / `systemHealthy` hooks fired on. Each
|
|
364
|
+
// consumer keeps `work-queue` delivery with its original
|
|
365
|
+
// `workerGroup`: these are side-effecting writes (open/close downtime,
|
|
366
|
+
// achievements, cleanup) that must run exactly once per cluster — not
|
|
367
|
+
// per-instance — so broadcast would double-apply them.
|
|
368
|
+
|
|
348
369
|
// =====================================================================
|
|
349
370
|
// Perspective 1: System goes DOWN — open downtime events
|
|
350
371
|
// =====================================================================
|
|
351
|
-
|
|
352
|
-
|
|
353
|
-
async (
|
|
372
|
+
onEntityChanged({
|
|
373
|
+
kind: HEALTH_ENTITY_KIND,
|
|
374
|
+
handler: async (change) => {
|
|
375
|
+
const { systemId, degraded, previousStatus, newStatus } =
|
|
376
|
+
classifyHealthChange(change);
|
|
377
|
+
if (!degraded) return;
|
|
354
378
|
logger.debug(
|
|
355
|
-
`SLO: System ${
|
|
379
|
+
`SLO: System ${systemId} degraded (${previousStatus} → ${newStatus})`,
|
|
356
380
|
);
|
|
357
381
|
await engine.handleSystemDown({
|
|
358
|
-
systemId
|
|
382
|
+
systemId,
|
|
359
383
|
getUpstreamHealthStatus,
|
|
360
384
|
});
|
|
361
385
|
},
|
|
362
|
-
{ mode: "work-queue", workerGroup: "slo-system-down" },
|
|
363
|
-
);
|
|
386
|
+
delivery: { mode: "work-queue", workerGroup: "slo-system-down" },
|
|
387
|
+
});
|
|
364
388
|
|
|
365
389
|
// =====================================================================
|
|
366
390
|
// Perspective 1: System goes UP — close downtime events
|
|
367
391
|
// =====================================================================
|
|
368
|
-
|
|
369
|
-
|
|
370
|
-
async (
|
|
371
|
-
|
|
372
|
-
|
|
373
|
-
|
|
374
|
-
});
|
|
392
|
+
onEntityChanged({
|
|
393
|
+
kind: HEALTH_ENTITY_KIND,
|
|
394
|
+
handler: async (change) => {
|
|
395
|
+
const { systemId, recovered } = classifyHealthChange(change);
|
|
396
|
+
if (!recovered) return;
|
|
397
|
+
logger.debug(`SLO: System ${systemId} recovered`);
|
|
398
|
+
await engine.handleSystemUp({ systemId });
|
|
375
399
|
|
|
376
400
|
// Also handle Perspective 2 (as upstream)
|
|
377
|
-
const downstreamIds = await getDownstreamSystemIds(
|
|
378
|
-
payload.systemId,
|
|
379
|
-
);
|
|
401
|
+
const downstreamIds = await getDownstreamSystemIds(systemId);
|
|
380
402
|
if (downstreamIds.length > 0) {
|
|
381
403
|
await engine.handleUpstreamUp({
|
|
382
|
-
upstreamSystemId:
|
|
404
|
+
upstreamSystemId: systemId,
|
|
383
405
|
downstreamSystemIds: downstreamIds,
|
|
384
406
|
getUpstreamHealthStatus,
|
|
385
407
|
});
|
|
@@ -387,54 +409,53 @@ export default createBackendPlugin({
|
|
|
387
409
|
|
|
388
410
|
// Evaluate achievements on recovery (rapid_recovery, clean_sheet, etc.)
|
|
389
411
|
await evaluateAchievements({
|
|
390
|
-
systemId
|
|
412
|
+
systemId,
|
|
391
413
|
service,
|
|
392
414
|
engine,
|
|
393
415
|
logger,
|
|
394
416
|
});
|
|
395
417
|
},
|
|
396
|
-
{ mode: "work-queue", workerGroup: "slo-system-up" },
|
|
397
|
-
);
|
|
418
|
+
delivery: { mode: "work-queue", workerGroup: "slo-system-up" },
|
|
419
|
+
});
|
|
398
420
|
|
|
399
421
|
// =====================================================================
|
|
400
422
|
// Perspective 2: Upstream degraded — split downstream "self" events
|
|
401
|
-
// We re-use the
|
|
423
|
+
// We re-use the degraded transition, checking downstream systems
|
|
402
424
|
// =====================================================================
|
|
403
|
-
|
|
404
|
-
|
|
405
|
-
async (
|
|
406
|
-
const
|
|
407
|
-
|
|
408
|
-
);
|
|
425
|
+
onEntityChanged({
|
|
426
|
+
kind: HEALTH_ENTITY_KIND,
|
|
427
|
+
handler: async (change) => {
|
|
428
|
+
const { systemId, degraded } = classifyHealthChange(change);
|
|
429
|
+
if (!degraded) return;
|
|
430
|
+
const downstreamIds = await getDownstreamSystemIds(systemId);
|
|
409
431
|
if (downstreamIds.length > 0) {
|
|
410
432
|
await engine.handleUpstreamDown({
|
|
411
|
-
upstreamSystemId:
|
|
412
|
-
upstreamSystemName:
|
|
433
|
+
upstreamSystemId: systemId,
|
|
434
|
+
upstreamSystemName: systemId,
|
|
413
435
|
downstreamSystemIds: downstreamIds,
|
|
414
436
|
});
|
|
415
437
|
}
|
|
416
438
|
},
|
|
417
|
-
{ mode: "work-queue", workerGroup: "slo-upstream-down" },
|
|
418
|
-
);
|
|
439
|
+
delivery: { mode: "work-queue", workerGroup: "slo-upstream-down" },
|
|
440
|
+
});
|
|
419
441
|
|
|
420
442
|
// =====================================================================
|
|
421
|
-
// Subscribe to catalog system deletion for cleanup
|
|
443
|
+
// Subscribe to catalog system deletion (tombstone) for cleanup
|
|
422
444
|
// =====================================================================
|
|
423
|
-
|
|
424
|
-
|
|
425
|
-
async (
|
|
445
|
+
onEntityChanged({
|
|
446
|
+
kind: CATALOG_SYSTEM_ENTITY_KIND,
|
|
447
|
+
handler: async (change) => {
|
|
448
|
+
// Only react to a tombstone (delete), not create/update.
|
|
449
|
+
if (change.next !== null) return;
|
|
450
|
+
const systemId = change.id;
|
|
426
451
|
logger.debug(
|
|
427
|
-
`Cleaning up SLO data for deleted system: ${
|
|
452
|
+
`Cleaning up SLO data for deleted system: ${systemId}`,
|
|
428
453
|
);
|
|
429
|
-
await service.deleteObjectivesForSystem({
|
|
430
|
-
|
|
431
|
-
});
|
|
432
|
-
await service.deleteAchievementsForSystem({
|
|
433
|
-
systemId: payload.systemId,
|
|
434
|
-
});
|
|
454
|
+
await service.deleteObjectivesForSystem({ systemId });
|
|
455
|
+
await service.deleteAchievementsForSystem({ systemId });
|
|
435
456
|
},
|
|
436
|
-
{ mode: "work-queue", workerGroup: "slo-system-cleanup" },
|
|
437
|
-
);
|
|
457
|
+
delivery: { mode: "work-queue", workerGroup: "slo-system-cleanup" },
|
|
458
|
+
});
|
|
438
459
|
|
|
439
460
|
// =====================================================================
|
|
440
461
|
// Daily snapshot + streak calculation cron job
|
|
@@ -444,6 +465,7 @@ export default createBackendPlugin({
|
|
|
444
465
|
engine,
|
|
445
466
|
logger,
|
|
446
467
|
queueManager,
|
|
468
|
+
getSloEntity: () => sloEntity,
|
|
447
469
|
});
|
|
448
470
|
|
|
449
471
|
// =====================================================================
|
package/src/slo-entity.test.ts
|
@@ -0,0 +1,255 @@
|
|
|
1
|
+
import { describe, it, expect } from "bun:test";
|
|
2
|
+
import type { EntityHandle } from "@checkstack/automation-backend";
|
|
3
|
+
|
|
4
|
+
import {
|
|
5
|
+
SLO_ENTITY_KIND,
|
|
6
|
+
SloEntityStateSchema,
|
|
7
|
+
computeSloEntityState,
|
|
8
|
+
createSloEntityRead,
|
|
9
|
+
deriveSloTriggerEvents,
|
|
10
|
+
writeSloEntity,
|
|
11
|
+
type SloEntityState,
|
|
12
|
+
} from "./slo-entity";
|
|
13
|
+
import type { SloService } from "./service";
|
|
14
|
+
import type { SloEngine } from "./slo-engine";
|
|
15
|
+
|
|
16
|
+
describe("deriveSloTriggerEvents", () => {
|
|
17
|
+
it("fires no legacy trigger events (thresholds are numeric_state conditions, §9.2)", () => {
|
|
18
|
+
expect(
|
|
19
|
+
deriveSloTriggerEvents({
|
|
20
|
+
kind: SLO_ENTITY_KIND,
|
|
21
|
+
id: "obj-1",
|
|
22
|
+
prev: null,
|
|
23
|
+
next: {
|
|
24
|
+
objectiveId: "obj-1",
|
|
25
|
+
systemId: "sys-1",
|
|
26
|
+
target: 99.9,
|
|
27
|
+
budgetRemainingPercent: 10,
|
|
28
|
+
currentStreak: 0,
|
|
29
|
+
bestStreak: 5,
|
|
30
|
+
},
|
|
31
|
+
delta: {},
|
|
32
|
+
changedFields: [],
|
|
33
|
+
actor: { type: "system", id: "system" },
|
|
34
|
+
occurredAt: new Date().toISOString(),
|
|
35
|
+
}),
|
|
36
|
+
).toEqual([]);
|
|
37
|
+
});
|
|
38
|
+
});
|
|
39
|
+
|
|
40
|
+
describe("SloEntityStateSchema", () => {
|
|
41
|
+
it("parses the reactive subset", () => {
|
|
42
|
+
const parsed = SloEntityStateSchema.parse({
|
|
43
|
+
objectiveId: "o",
|
|
44
|
+
systemId: "s",
|
|
45
|
+
target: 99.5,
|
|
46
|
+
budgetRemainingPercent: 42,
|
|
47
|
+
currentStreak: 3,
|
|
48
|
+
bestStreak: 9,
|
|
49
|
+
});
|
|
50
|
+
expect(parsed.budgetRemainingPercent).toBe(42);
|
|
51
|
+
});
|
|
52
|
+
});
|
|
53
|
+
|
|
54
|
+
// ─── Fakes ──────────────────────────────────────────────────────────────
|
|
55
|
+
|
|
56
|
+
function makeService(over: {
|
|
57
|
+
objective?: { id: string; systemId: string; target: number } | undefined;
|
|
58
|
+
streak?: { currentStreak: number; bestStreak: number } | undefined;
|
|
59
|
+
}): SloService {
|
|
60
|
+
return {
|
|
61
|
+
async getObjective() {
|
|
62
|
+
return over.objective;
|
|
63
|
+
},
|
|
64
|
+
async getStreak() {
|
|
65
|
+
return over.streak;
|
|
66
|
+
},
|
|
67
|
+
} as unknown as SloService;
|
|
68
|
+
}
|
|
69
|
+
|
|
70
|
+
function makeEngine(budgetRemainingPercent: number): SloEngine {
|
|
71
|
+
return {
|
|
72
|
+
async computeStatus() {
|
|
73
|
+
return { errorBudgetRemainingPercent: budgetRemainingPercent };
|
|
74
|
+
},
|
|
75
|
+
} as unknown as SloEngine;
|
|
76
|
+
}
+
+describe("computeSloEntityState", () => {
+  it("assembles the view by reading streak/objective + COMPUTING budget", async () => {
+    const service = makeService({
+      objective: { id: "obj-1", systemId: "sys-1", target: 99.9 },
+      streak: { currentStreak: 4, bestStreak: 12 },
+    });
+    const engine = makeEngine(20);
+    const state = await computeSloEntityState({
+      service,
+      engine,
+      objectiveId: "obj-1",
+    });
+    expect(state).toEqual({
+      objectiveId: "obj-1",
+      systemId: "sys-1",
+      target: 99.9,
+      budgetRemainingPercent: 20,
+      currentStreak: 4,
+      bestStreak: 12,
+    });
+  });
+
+  it("defaults missing streak counters to 0", async () => {
+    const service = makeService({
+      objective: { id: "obj-2", systemId: "sys-2", target: 99 },
+      streak: undefined,
+    });
+    const state = await computeSloEntityState({
+      service,
+      engine: makeEngine(100),
+      objectiveId: "obj-2",
+    });
+    expect(state?.currentStreak).toBe(0);
+    expect(state?.bestStreak).toBe(0);
+  });
+
+  it("returns undefined when the objective no longer exists", async () => {
+    const service = makeService({ objective: undefined });
+    const state = await computeSloEntityState({
+      service,
+      engine: makeEngine(50),
+      objectiveId: "gone",
+    });
+    expect(state).toBeUndefined();
+  });
+});
+
+describe("createSloEntityRead", () => {
+  it("computes the view per id and omits missing objectives", async () => {
+    const service = {
+      async getObjective({ id }: { id: string }) {
+        if (id === "obj-1") return { id, systemId: "sys-1", target: 99.9 };
+        return undefined;
+      },
+      async getStreak() {
+        return { currentStreak: 2, bestStreak: 7 };
+      },
+    } as unknown as SloService;
+    const read = createSloEntityRead({ service, engine: makeEngine(33) });
+    const out = await read(["obj-1", "missing"]);
+    expect(Object.keys(out)).toEqual(["obj-1"]);
+    expect(out["obj-1"]).toEqual({
+      objectiveId: "obj-1",
+      systemId: "sys-1",
+      target: 99.9,
+      budgetRemainingPercent: 33,
+      currentStreak: 2,
+      bestStreak: 7,
+    });
+  });
+
+  it("returns {} for an empty id list without touching the service", async () => {
+    let called = false;
+    const service = {
+      async getObjective() {
+        called = true;
+        return undefined;
+      },
+    } as unknown as SloService;
+    const read = createSloEntityRead({ service, engine: makeEngine(0) });
+    expect(await read([])).toEqual({});
+    expect(called).toBe(false);
+  });
+});
+
+describe("writeSloEntity", () => {
+  it("drives the streak write through handle.mutate keyed by objectiveId", async () => {
+    const calls: Array<{ id: string; next: SloEntityState }> = [];
+    const handle = {
+      kind: SLO_ENTITY_KIND,
+      async mutate(input: {
+        id: string;
+        apply: () => Promise<SloEntityState>;
+      }) {
+        const next = await input.apply();
+        calls.push({ id: input.id, next });
+        return next;
+      },
+    } as unknown as EntityHandle<SloEntityState>;
+
+    let applied = false;
+    await writeSloEntity({
+      handle,
+      objectiveId: "obj-7",
+      apply: async () => {
+        applied = true;
+        return {
+          objectiveId: "obj-7",
+          systemId: "sys-7",
+          target: 99.9,
+          budgetRemainingPercent: 20,
+          currentStreak: 4,
+          bestStreak: 12,
+        };
+      },
+    });
+    expect(applied).toBe(true);
+    expect(calls).toEqual([
+      {
+        id: "obj-7",
+        next: {
+          objectiveId: "obj-7",
+          systemId: "sys-7",
+          target: 99.9,
+          budgetRemainingPercent: 20,
+          currentStreak: 4,
+          bestStreak: 12,
+        },
+      },
+    ]);
+  });
+
+  it("still runs the streak write when no handle is wired", async () => {
+    let applied = false;
+    await writeSloEntity({
+      handle: undefined,
+      objectiveId: "x",
+      apply: async () => {
+        applied = true;
+        return {
+          objectiveId: "x",
+          systemId: "x",
+          target: 1,
+          budgetRemainingPercent: 1,
+          currentStreak: 0,
+          bestStreak: 0,
+        };
+      },
+    });
+    expect(applied).toBe(true);
+  });
+
+  it("routes entity-layer errors to onError (fail-soft) without rethrowing", async () => {
+    let captured: unknown;
+    const handle = {
+      kind: SLO_ENTITY_KIND,
+      async mutate() {
+        throw new Error("nope");
+      },
+    } as unknown as EntityHandle<SloEntityState>;
+    await writeSloEntity({
+      handle,
+      objectiveId: "x",
+      apply: async () => ({
+        objectiveId: "x",
+        systemId: "x",
+        target: 1,
+        budgetRemainingPercent: 1,
+        currentStreak: 0,
+        bestStreak: 0,
+      }),
+      onError: (e) => {
+        captured = e;
+      },
+    });
+    expect((captured as Error).message).toBe("nope");
+  });
+});
@@ -0,0 +1,162 @@
+/**
+ * The reactive `slo` entity (reactive automation engine §10.7, §9.2).
+ *
+ * Model B PLUGIN-BACKED + COMPUTED entity. There is NO framework
+ * `entity_state` row for an SLO. The current state is assembled on demand by
+ * the `read` accessor from two sources:
+ *
+ *   - `slo_streaks` + `slo_objectives` (authoritative tables) supply
+ *     `currentStreak` / `bestStreak` / `systemId` / `target`, and
+ *   - the SLO engine COMPUTES `budgetRemainingPercent` (and re-surfaces
+ *     `target`) on the fly via `computeStatus` (downtime aggregation over the
+ *     objective's window).
+ *
+ * The streak-persist site (the daily snapshot job) drives its write through
+ * `handle.mutate({ id: objectiveId, apply })`: `apply` persists the streak to
+ * `slo_streaks` (the plugin's own write) and returns the freshly-computed
+ * view. The framework snapshots `prev` via `read` BEFORE the write, appends
+ * the transition log, and emits `ENTITY_CHANGED`.
+ *
+ * Per §9.2 the SLO budget IS the entity, and the four removed threshold hooks
+ * (`budget.warning/critical/exhausted`, `streak.broken`) become derived
+ * `numeric_state` / `state` conditions over
+ * `state.slo.<objectiveId>.budgetRemainingPercent` + `currentStreak`. The
+ * change deriver therefore emits NO legacy trigger events — operators author
+ * thresholds as reactive conditions, not pre-baked event triggers.
+ */
+import { z } from "zod";
+import type {
+  EntityChangeDeriver,
+  EntityHandle,
+  EntityMutationOpts,
+  EntityRead,
+} from "@checkstack/automation-backend";
+import { withEntityWrite } from "@checkstack/automation-backend";
+
+import type { SloService } from "./service";
+import type { SloEngine } from "./slo-engine";
+
+export const SLO_ENTITY_KIND = "slo";
+
+export const SloEntityStateSchema = z.object({
+  objectiveId: z.string(),
+  systemId: z.string(),
+  target: z.number(),
+  budgetRemainingPercent: z.number(),
+  currentStreak: z.number().int().nonnegative(),
+  bestStreak: z.number().int().nonnegative(),
+});
+
+export type SloEntityState = z.infer<typeof SloEntityStateSchema>;
+
+/**
+ * SLO change → trigger events. Intentionally empty: the threshold/streak
+ * hooks were removed (§9.2) and replaced by `numeric_state` / `state`
+ * conditions over the entity state, so a change fires no legacy event. The
+ * deriver is still registered so the kind is a known reactive kind (its
+ * state is resolvable into automation scope for those conditions + wakes
+ * suspended `wait_until`s whose condition reads `state.slo.*`).
+ */
+export const deriveSloTriggerEvents: EntityChangeDeriver = () => [];
+
+/**
+ * Compute the reactive `slo` view for a single objective: read the objective
+ * config + streak, compute the error-budget remaining via the engine, and
+ * assemble the `{ objectiveId, systemId, target, budgetRemainingPercent,
+ * currentStreak, bestStreak }` subset. Returns `undefined` when the objective
+ * no longer exists (missing ids are omitted from the batched `read`).
+ *
+ * Compute-on-read (not materialized): the budget is a pure function of the
+ * objective's append-only downtime history over its rolling window. Storing a
+ * second copy would duplicate the engine's source of truth and risk drift; a
+ * read recomputes from the same tables the API already reads. See the change
+ * doc for the cost assessment.
+ */
+export async function computeSloEntityState(args: {
+  service: SloService;
+  engine: SloEngine;
+  objectiveId: string;
+}): Promise<SloEntityState | undefined> {
+  const { service, engine, objectiveId } = args;
+  const objective = await service.getObjective({ id: objectiveId });
+  if (!objective) return undefined;
+
+  const [status, streak] = await Promise.all([
+    engine.computeStatus({ objective }),
+    service.getStreak({ objectiveId }),
+  ]);
+
+  return {
+    objectiveId,
+    systemId: objective.systemId,
+    target: objective.target,
+    budgetRemainingPercent: status.errorBudgetRemainingPercent,
+    currentStreak: streak?.currentStreak ?? 0,
+    bestStreak: streak?.bestStreak ?? 0,
+  };
+}
+
+/**
+ * Build the PLUGIN-BACKED + COMPUTED `read` accessor for the `slo` entity.
+ * For each objective id, assembles the view via {@link computeSloEntityState}
+ * (missing objectives omitted). This is the single source of truth that
+ * `handle.mutate` snapshots `prev` from and `get`/`getMany`/scope enrichment
+ * route through — no framework `entity_state` storage.
+ */
+export function createSloEntityRead(deps: {
+  service: SloService;
+  engine: SloEngine;
+}): EntityRead<SloEntityState> {
+  const { service, engine } = deps;
+  return async (ids) => {
+    if (ids.length === 0) return {};
+    const out: Record<string, SloEntityState> = {};
+    await Promise.all(
+      ids.map(async (objectiveId) => {
+        const state = await computeSloEntityState({
+          service,
+          engine,
+          objectiveId,
+        });
+        if (state) out[objectiveId] = state;
+      }),
+    );
+    return out;
+  };
+}
+
+/**
+ * Drive the streak-persist write through `handle.mutate` (§10.7). `apply`
+ * performs the REAL `slo_streaks` write (the plugin's own db/tx) and returns
+ * the freshly-computed `slo` view (budget recomputed + post-write streak).
+ * The framework snapshots `prev` via `read` BEFORE the write, appends the
+ * transition log, and emits `ENTITY_CHANGED`. No-op (no emit) when the
+ * recomputed view is structurally equal to `prev`.
+ *
+ * When no handle is available (tests / before wiring), the write still runs
+ * — the entity reactivity is layered on top, never required for the streak
+ * write to succeed. Errors from the entity layer are routed to `onError` so a
+ * mirror/transition failure never breaks the daily job.
+ */
+export async function writeSloEntity(args: {
+  handle: EntityHandle<SloEntityState> | undefined;
+  objectiveId: string;
+  opts?: EntityMutationOpts;
+  apply: () => Promise<SloEntityState>;
+  onError?: (error: unknown) => void;
+}): Promise<void> {
+  const { handle, objectiveId, opts, apply, onError } = args;
+  if (!handle) {
+    await apply();
+    return;
+  }
+  // A wired handle routes through the shared guard; the daily-job caller wants
+  // an entity-layer (mirror/transition) failure to be fail-soft so it never
+  // breaks the streak persist, so errors are routed to `onError` rather than
+  // rethrown (the bespoke SLO behavior the shared guard does not encode).
+  try {
+    await withEntityWrite({ handle, id: objectiveId, opts, apply });
+  } catch (error) {
+    onError?.(error);
+  }
+}
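The doc comments in `slo-entity.ts` describe a sequence the diff itself does not show: snapshot `prev` via `read`, run `apply`, and emit only when the view changed. The sketch below illustrates that sequence under stated assumptions. `makeSketchHandle`, `writeFailSoft`, `SloView`, and `ChangeEvent` are hypothetical names, and the `JSON.stringify` comparison is a stand-in for whatever structural-equality check the framework really uses; the actual `EntityHandle` and `withEntityWrite` live in `@checkstack/automation-backend` and may differ.

```typescript
// Hypothetical stand-ins: not the real @checkstack/automation-backend types.
type SloView = {
  objectiveId: string;
  systemId: string;
  target: number;
  budgetRemainingPercent: number;
  currentStreak: number;
  bestStreak: number;
};

type ChangeEvent = { id: string; prev: SloView | undefined; next: SloView };

// Assumed shape of a computed-entity handle: `read` recomputes the view,
// `emit` publishes a change event.
function makeSketchHandle(
  read: (id: string) => Promise<SloView | undefined>,
  emit: (event: ChangeEvent) => void,
) {
  return {
    async mutate(input: { id: string; apply: () => Promise<SloView> }) {
      // Snapshot the computed view BEFORE the plugin's own write runs.
      const prev = await read(input.id);
      // `apply` persists the streak and returns the freshly computed view.
      const next = await input.apply();
      // Stand-in structural comparison; both views are built with the same
      // key order, so string equality is sufficient for this sketch.
      if (JSON.stringify(prev) !== JSON.stringify(next)) {
        emit({ id: input.id, prev, next });
      }
      return next;
    },
  };
}

// Fail-soft wrapper in the spirit of `writeSloEntity`: without a handle the
// write still runs; with one, entity-layer errors go to `onError`.
async function writeFailSoft(args: {
  handle: ReturnType<typeof makeSketchHandle> | undefined;
  id: string;
  apply: () => Promise<SloView>;
  onError?: (error: unknown) => void;
}): Promise<void> {
  if (!args.handle) {
    await args.apply();
    return;
  }
  try {
    await args.handle.mutate({ id: args.id, apply: args.apply });
  } catch (error) {
    args.onError?.(error);
  }
}
```

As in the source, an error thrown by `apply` on the no-handle path is not caught, so only failures routed through the handle are fail-soft.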
package/src/streak-calculator.ts
CHANGED
@@ -2,6 +2,12 @@ import type { SloService } from "./service";
 import type { SloEngine } from "./slo-engine";
 import type { Logger } from "@checkstack/backend-api";
 import type { QueueManager } from "@checkstack/queue-api";
+import type { EntityHandle } from "@checkstack/automation-backend";
+import {
+  computeSloEntityState,
+  writeSloEntity,
+  type SloEntityState,
+} from "./slo-entity";
 
 const SNAPSHOT_QUEUE = "slo-daily-snapshots";
 const SNAPSHOT_JOB_ID = "slo-daily-snapshot-run";
@@ -12,6 +18,8 @@ interface StreakCalculatorDeps {
   engine: SloEngine;
   logger: Logger;
   queueManager: QueueManager;
+  /** Resolver for the reactive `slo` entity (§10.7). Undefined in tests. */
+  getSloEntity?: () => EntityHandle<SloEntityState> | undefined;
 }
 
 /**
@@ -20,7 +28,7 @@ interface StreakCalculatorDeps {
  * and updating streak counters for all active objectives.
  */
 export async function setupDailySnapshotJob(deps: StreakCalculatorDeps) {
-  const { queueManager, logger, service, engine } = deps;
+  const { queueManager, logger, service, engine, getSloEntity } = deps;
 
   const queue = queueManager.getQueue<{ trigger: "scheduled" }>(SNAPSHOT_QUEUE);
 
@@ -28,7 +36,7 @@ export async function setupDailySnapshotJob(deps: StreakCalculatorDeps) {
   await queue.consume(
     async () => {
       logger.info("Starting daily SLO snapshot job");
-      await runDailySnapshotJob({ service, engine, logger });
+      await runDailySnapshotJob({ service, engine, logger, getSloEntity });
       logger.info("Completed daily SLO snapshot job");
     },
     { consumerGroup: WORKER_GROUP, maxRetries: 0 },
@@ -57,8 +65,9 @@ export async function runDailySnapshotJob(deps: {
   service: SloService;
   engine: SloEngine;
   logger: Logger;
+  getSloEntity?: () => EntityHandle<SloEntityState> | undefined;
 }) {
-  const { service, engine, logger } = deps;
+  const { service, engine, logger, getSloEntity } = deps;
 
   const objectives = await service.listObjectives();
   const today = new Date();
@@ -79,24 +88,64 @@ export async function runDailySnapshotJob(deps: {
           availabilityPercent: status.currentAvailability ?? 100,
           budgetConsumedMinutes: status.errorBudgetConsumedMinutes,
           budgetRemainingPercent: status.errorBudgetRemainingPercent,
-
+
           burnRate: status.burnRate ?? null,
           streakDays: streak?.currentStreak ?? 0,
         },
       });
 
-      // 2. Update streak
-
-
-
-
-
-
-
-
-
-
-
+      // 2. Update streak (if currently meeting target, increment; else reset)
+      // AND surface the recomputed `slo` entity, driven through
+      // `handle.mutate` (§10.7). The REAL `slo_streaks` write runs INSIDE
+      // `apply` (the plugin's own write) so the framework snapshots `prev`
+      // via the COMPUTED `read` BEFORE the streak flips, then emits
+      // `ENTITY_CHANGED`. Operators author budget/streak thresholds as
+      // `numeric_state` conditions over this state (§9.2). The change is a
+      // no-op (no emit) when neither budget nor streak moved.
+      await writeSloEntity({
+        handle: getSloEntity?.(),
+        objectiveId: objective.id,
+        apply: async () => {
+          if (!status.isBreaching && !status.hasOpenDowntime) {
+            await service.incrementStreak({ objectiveId: objective.id });
+          } else if (status.isBreaching) {
+            const currentStreak = streak?.currentStreak ?? 0;
+            if (currentStreak > 0) {
+              await service.resetStreak({ objectiveId: objective.id });
+              logger.info(
+                `SLO ${objective.id}: Streak broken at ${currentStreak} days`,
+              );
+            }
+          }
+          // Re-assemble the computed view from the POST-write tables so the
+          // emitted `next` reflects the updated streak + recomputed budget.
+          const next = await computeSloEntityState({
+            service,
+            engine,
+            objectiveId: objective.id,
+          });
+          if (next) return next;
+          // The objective vanished mid-cycle (raced delete). Fall back to a
+          // view from the in-hand objective + post-write streak so `apply`
+          // still returns a valid state and the mutate is a no-op.
+          const freshStreak = await service.getStreak({
+            objectiveId: objective.id,
+          });
+          return {
+            objectiveId: objective.id,
+            systemId: objective.systemId,
+            target: objective.target,
+            budgetRemainingPercent: status.errorBudgetRemainingPercent,
+            currentStreak: freshStreak?.currentStreak ?? 0,
+            bestStreak: freshStreak?.bestStreak ?? 0,
+          };
+        },
+        onError: (error) =>
+          logger.warn(
+            `Failed to surface slo entity for objective ${objective.id}`,
+            { error },
+          ),
+      });
     } catch (error) {
       logger.error(
         `Failed to process daily snapshot for objective ${objective.id}`,
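The streak branch inside `apply` in the hunk above reduces to a three-way decision, which may be easier to review as a pure function. The sketch below restates only the branch conditions visible in the diff. `decideStreakAction` and `StreakAction` are hypothetical names that do not exist in the package, and the `"hold"` outcome is simply a label for the paths where the diff calls neither `incrementStreak` nor `resetStreak`.

```typescript
// Hypothetical helper: restates the branch conditions from the hunk above.
type StreakAction = "increment" | "reset" | "hold";

function decideStreakAction(input: {
  isBreaching: boolean;
  hasOpenDowntime: boolean;
  currentStreak: number;
}): StreakAction {
  // Meeting target with no open downtime: the streak grows by one day.
  if (!input.isBreaching && !input.hasOpenDowntime) return "increment";
  // Breaching with a live streak: the streak is broken and reset.
  if (input.isBreaching && input.currentStreak > 0) return "reset";
  // Otherwise (open downtime without a breach, or a breach with no streak
  // to lose) neither service call is made.
  return "hold";
}
```

Read this way, an objective with open downtime that is not yet breaching keeps its streak unchanged for the day, and a breaching objective whose streak is already 0 triggers no write.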