@checkstack/slo-backend 0.4.6 → 0.6.0
- package/CHANGELOG.md +215 -0
- package/package.json +19 -20
- package/src/hooks.ts +2 -40
- package/src/index.ts +161 -127
- package/src/slo-entity.test.ts +255 -0
- package/src/slo-entity.ts +162 -0
- package/src/streak-calculator.ts +65 -16
- package/tsconfig.json +3 -6
package/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,220 @@
|
|
|
1
1
|
# @checkstack/slo-backend
|
|
2
2
|
|
|
3
|
+
## 0.6.0
|
|
4
|
+
|
|
5
|
+
### Minor Changes
|
|
6
|
+
|
|
7
|
+
- b995afb: Make `slo` a plugin-backed, COMPUTED reactive entity via the Model-B entity state machine + rewire its cross-plugin consumers.
|
|
8
|
+
|
|
9
|
+
SLO defines a `slo` entity `{ objectiveId, systemId, target, budgetRemainingPercent, currentStreak, bestStreak }` keyed by `objectiveId`. There is no framework `entity_state` row: its current state is assembled on demand by a `read` accessor (`createSloEntityRead` / `computeSloEntityState`). `currentStreak` / `bestStreak` / `systemId` / `target` come from the authoritative `slo_streaks` + `slo_objectives` tables, and `budgetRemainingPercent` (plus `target`) is COMPUTED on the fly via the SLO engine's `computeStatus` (downtime aggregation over the objective's rolling window). The daily snapshot job's streak-persist write drives through the fail-soft `writeSloEntity` (`handle.mutate({ id: objectiveId, apply })`): `apply` persists the streak to `slo_streaks` (its own write) and returns the freshly-computed view; the framework snapshots `prev` via the computed `read` BEFORE the write, appends the transition log, and emits `ENTITY_CHANGED`.
|
|
10
|
+
|
|
11
|
+
Compute-on-read (not materialize): the budget is a pure function of the objective's append-only downtime history, so storing a second copy would duplicate the engine's source of truth and risk drift. The `read` recomputes from the same tables the SLO API already reads; it is only exercised on the prev-snapshot of the once-daily streak job and on reactive scope/wake resolution, so the recompute cost is negligible. The append-only `slo_downtime_events` + `slo_daily_snapshots` tables are declared non-reactive (bookkeeping); the live budget/streak is the entity. Operators author budget/streak thresholds as reactive `numeric_state` conditions over `state.slo.<objectiveId>.budgetRemainingPercent` / `currentStreak`.
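The compute-on-read design described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the row shapes, the `readSloStateSketch` and `budgetRemainingSketch` functions, the `windowDays` field, and the error-budget formula are all assumptions, since the real `computeStatus` and table schemas are not shown in this diff.

```typescript
// Hypothetical row shapes standing in for slo_objectives, slo_streaks and
// slo_downtime_events. The real schemas are not part of this diff.
interface ObjectiveRow { objectiveId: string; systemId: string; target: number; windowDays: number; }
interface StreakRow { objectiveId: string; currentStreak: number; bestStreak: number; }
interface DowntimeRow { objectiveId: string; durationMs: number; }

interface SloStateSketch {
  objectiveId: string;
  systemId: string;
  target: number;
  budgetRemainingPercent: number;
  currentStreak: number;
  bestStreak: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed textbook error-budget formula: the share of allowed downtime
// (window * (100 - target) / 100) not yet consumed, clamped to [0, 100].
function budgetRemainingSketch(target: number, windowDays: number, downtimeMs: number): number {
  const allowedMs = (windowDays * DAY_MS * (100 - target)) / 100;
  if (allowedMs <= 0) return downtimeMs > 0 ? 0 : 100;
  return Math.min(100, Math.max(0, 100 * (1 - downtimeMs / allowedMs)));
}

// Compute-on-read: no entity row is stored. Every call re-derives the view
// from the authoritative rows, so there is no second copy that could drift.
function readSloStateSketch(
  ids: string[],
  tables: { objectives: ObjectiveRow[]; streaks: StreakRow[]; downtime: DowntimeRow[] },
): Map<string, SloStateSketch> {
  const result = new Map<string, SloStateSketch>();
  for (const id of ids) {
    const objective = tables.objectives.find((o) => o.objectiveId === id);
    if (!objective) continue; // unknown id: no state to report
    const streak = tables.streaks.find((s) => s.objectiveId === id);
    const downtimeMs = tables.downtime
      .filter((d) => d.objectiveId === id)
      .reduce((sum, d) => sum + d.durationMs, 0);
    result.set(id, {
      objectiveId: id,
      systemId: objective.systemId,
      target: objective.target,
      budgetRemainingPercent: budgetRemainingSketch(objective.target, objective.windowDays, downtimeMs),
      currentStreak: streak?.currentStreak ?? 0,
      bestStreak: streak?.bestStreak ?? 0,
    });
  }
  return result;
}
```

Because the budget is recomputed on each call, the cost scales with how often `read` runs, which the changelog argues is rare (the daily job's prev-snapshot and reactive scope/wake resolution).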
|
|
12
|
+
|
|
13
|
+
The healthcheck + catalog consumers switched from `onHook(<hook>)` to `onEntityChanged({ kind })`, all keeping `work-queue` delivery (each handler performs side-effecting writes that must run once per cluster):
|
|
14
|
+
|
|
15
|
+
- `slo-system-down` / `slo-upstream-down`: react to `health` changes filtered to a degraded transition (`classifyHealthChange().degraded`).
|
|
16
|
+
- `slo-system-up`: reacts to `health` changes filtered to a recovered transition (`classifyHealthChange().recovered`).
|
|
17
|
+
- `slo-system-cleanup`: reacts to `catalog-system` tombstones (`change.next === null`).
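The filtering pattern these four consumers share can be sketched as below. The real `classifyHealthChange` lives in `@checkstack/healthcheck-backend` and its implementation is not shown in this diff, so `classifySketch`, the status values, the `prev` / `next` change shape, and the use of `change.id` as the system id are assumptions made for illustration only.

```typescript
// Assumed status set; the real enum is defined in healthcheck-common.
type HealthStatusSketch = "healthy" | "degraded" | "unhealthy";

interface HealthStateSketch { status: HealthStatusSketch; }

// Assumed change shape: `next === null` marks a tombstone (entity removed).
interface EntityChangeSketch<S> { id: string; prev: S | null; next: S | null; }

// Hypothetical stand-in for classifyHealthChange: derive directional flags
// from a single prev -> next transition.
function classifySketch(change: EntityChangeSketch<HealthStateSketch>) {
  const previousStatus = change.prev?.status ?? null;
  const newStatus = change.next?.status ?? null;
  const wasHealthy = previousStatus === null || previousStatus === "healthy";
  return {
    systemId: change.id,
    previousStatus,
    newStatus,
    // healthy (or unknown) -> anything non-healthy
    degraded: wasHealthy && newStatus !== null && newStatus !== "healthy",
    // non-healthy -> healthy
    recovered: previousStatus !== null && previousStatus !== "healthy" && newStatus === "healthy",
  };
}

// Tombstone check used by a cleanup-style consumer.
function isTombstone<S>(change: EntityChangeSketch<S>): boolean {
  return change.next === null;
}

// Each consumer sees every change for its kind and returns early unless the
// transition is the one it cares about.
function handleDown(change: EntityChangeSketch<HealthStateSketch>, opened: string[]): void {
  const { systemId, degraded } = classifySketch(change);
  if (!degraded) return;
  opened.push(systemId);
}
```

One consequence of this design is that every subscriber receives every change of the kind and pays for one classification call, in exchange for a single event stream instead of one hook per direction.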
|
|
18
|
+
|
|
19
|
+
BREAKING CHANGES:
|
|
20
|
+
|
|
21
|
+
- The `slo.budget.warning` / `slo.budget.critical` / `slo.budget.exhausted` and `slo.streak.broken` automation triggers are removed. These thresholds were never emitted by the engine (the underlying hooks were inert) and are replaced by reactive `numeric_state` conditions over the `slo` entity (`budgetRemainingPercent < 20`, `currentStreak == 0`, etc.). Re-author any automations that referenced these trigger ids as `numeric_state` / `state` conditions. The `slo.achievement.unlocked` and `slo.weekly.digest` triggers are KEPT.
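To make the replacement concrete, the sketch below evaluates a threshold condition against the current `slo` entity state. The condition shape (`NumericStateConditionSketch`) and the evaluator are hypothetical: this diff does not show the automation engine's actual `numeric_state` schema, so treat the field names as placeholders.

```typescript
// Numeric fields of the `slo` entity named in the changelog.
type SloNumericField = "budgetRemainingPercent" | "currentStreak" | "bestStreak" | "target";

// Hypothetical condition shape; the real numeric_state schema is not shown here.
interface NumericStateConditionSketch {
  objectiveId: string;
  field: SloNumericField;
  op: "<" | "<=" | "==" | ">=" | ">";
  value: number;
}

type SloStateByObjective = Record<string, Record<SloNumericField, number>>;

// Returns false when the objective has no state, so a missing entity
// never satisfies a threshold.
function evaluateNumericState(cond: NumericStateConditionSketch, state: SloStateByObjective): boolean {
  const entity = state[cond.objectiveId];
  if (!entity) return false;
  const actual = entity[cond.field];
  switch (cond.op) {
    case "<": return actual < cond.value;
    case "<=": return actual <= cond.value;
    case "==": return actual === cond.value;
    case ">=": return actual >= cond.value;
    case ">": return actual > cond.value;
  }
}
```

Under these assumptions, the old `slo.budget.warning` trigger corresponds to a condition such as `{ field: "budgetRemainingPercent", op: "<", value: 20 }`, and `slo.streak.broken` to `{ field: "currentStreak", op: "==", value: 0 }`.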
|
|
22
|
+
|
|
23
|
+
- b995afb: Remove the dead `slo.budget.warning` / `slo.budget.critical` / `slo.budget.exhausted` / `slo.streak.broken` hook descriptors from `sloHooks`.
|
|
24
|
+
|
|
25
|
+
These four `createHook` descriptors had no emitter and no trigger registration left: per the reactive automation engine (§9.2) the SLO budget IS the reactive entity, and the old threshold/streak triggers became `numeric_state` / `state` conditions over `state.slo.<objectiveId>.budgetRemainingPercent` + `currentStreak`. Nothing in the repo emitted or subscribed to the four hooks, so they were unreachable surface. `sloAchievementUnlocked` and `sloWeeklyDigest` are unaffected and stay.
|
|
26
|
+
|
|
27
|
+
BREAKING CHANGES:
|
|
28
|
+
|
|
29
|
+
- Removed `sloHooks.sloBudgetWarning`, `sloHooks.sloBudgetCritical`, `sloHooks.sloBudgetExhausted`, and `sloHooks.sloStreakBroken`. Author SLO budget / streak threshold automations as reactive `numeric_state` / `state` conditions over the `slo` entity state instead.
|
|
30
|
+
|
|
31
|
+
### Patch Changes
|
|
32
|
+
|
|
33
|
+
- b995afb: Extract a shared `withEntityWrite` / `withEntityRemove` guard for PLUGIN-BACKED (Model B) reactive entities and refactor the per-domain copies onto it.
|
|
34
|
+
|
|
35
|
+
Every plugin-backed domain (incident, catalog, dependency, maintenance, slo, satellite) reimplemented the same "no handle wired → run the plugin write directly; handle wired → route through `handle.mutate` / `handle.remove`" guard, varying only in the id-key name. `@checkstack/automation-backend` now exports `withEntityWrite` / `withEntityRemove` (from the entity barrel) and each domain's thin, well-named wrappers (`writeIncidentEntity`, `writeMaintenanceEntity`, satellite's `mirror`, …) delegate to it, so the branch lives in exactly one place. Behavior is unchanged.
|
|
36
|
+
|
|
37
|
+
`writeHealthEntity` (healthcheck-backend) is intentionally NOT migrated onto the helper — it is genuinely bespoke (closure-captured durable state, distinct rethrow-vs-fail-soft branches, a per-system serializer, and it returns the computed state). SLO keeps its fail-soft `onError` wrapper around the shared guard.
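The shared guard and SLO's fail-soft wrapper described above might look roughly like this. The names `withEntityWriteSketch`, `writeFailSoftSketch`, and `EntityHandleSketch` are hypothetical, and the signatures are guesses: only the `handle.mutate({ id, apply })` call shape and the "no handle wired, run the write directly" branch come from the changelog. The sketch is synchronous to stay short; real database writes would be asynchronous.

```typescript
// Assumed minimal handle surface; the real EntityHandle is exported by
// @checkstack/automation-backend and is not shown in this diff.
interface EntityHandleSketch<S> {
  mutate(args: { id: string; apply: () => S }): S;
}

interface EntityWriteArgs<S> {
  handle: EntityHandleSketch<S> | undefined;
  id: string;
  apply: () => S; // the plugin's own write; returns the fresh state
}

// The single branch every domain previously duplicated:
// no handle wired -> run the plugin write directly,
// handle wired    -> route it through handle.mutate so the framework can
//                    snapshot prev, log the transition and emit the change.
function withEntityWriteSketch<S>(args: EntityWriteArgs<S>): S {
  if (!args.handle) return args.apply();
  return args.handle.mutate({ id: args.id, apply: args.apply });
}

// Fail-soft wrapper in the spirit of SLO's onError: a failed entity write is
// reported but never propagates to the caller (here, the daily job).
function writeFailSoftSketch<S>(
  args: EntityWriteArgs<S>,
  onError: (error: unknown) => void,
): S | undefined {
  try {
    return withEntityWriteSketch(args);
  } catch (error) {
    onError(error);
    return undefined;
  }
}
```

Keeping the fail-soft behavior in a wrapper, rather than inside the shared guard, lets other domains keep rethrowing while SLO alone swallows errors.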
|
|
38
|
+
|
|
39
|
+
- Updated dependencies [270ef29]
|
|
40
|
+
- Updated dependencies [b995afb]
|
|
41
|
+
- Updated dependencies [b995afb]
|
|
42
|
+
- Updated dependencies [b995afb]
|
|
43
|
+
- Updated dependencies [270ef29]
|
|
44
|
+
- Updated dependencies [270ef29]
|
|
45
|
+
- Updated dependencies [270ef29]
|
|
46
|
+
- Updated dependencies [270ef29]
|
|
47
|
+
- Updated dependencies [270ef29]
|
|
48
|
+
- Updated dependencies [270ef29]
|
|
49
|
+
- Updated dependencies [270ef29]
|
|
50
|
+
- Updated dependencies [270ef29]
|
|
51
|
+
- Updated dependencies [270ef29]
|
|
52
|
+
- Updated dependencies [b995afb]
|
|
53
|
+
- Updated dependencies [b995afb]
|
|
54
|
+
- Updated dependencies [b995afb]
|
|
55
|
+
- Updated dependencies [b995afb]
|
|
56
|
+
- Updated dependencies [270ef29]
|
|
57
|
+
- Updated dependencies [b995afb]
|
|
58
|
+
- Updated dependencies [270ef29]
|
|
59
|
+
- Updated dependencies [b995afb]
|
|
60
|
+
- Updated dependencies [b995afb]
|
|
61
|
+
- Updated dependencies [270ef29]
|
|
62
|
+
- Updated dependencies [b995afb]
|
|
63
|
+
- Updated dependencies [b995afb]
|
|
64
|
+
- Updated dependencies [270ef29]
|
|
65
|
+
- Updated dependencies [b995afb]
|
|
66
|
+
- Updated dependencies [b995afb]
|
|
67
|
+
- Updated dependencies [b995afb]
|
|
68
|
+
- Updated dependencies [b995afb]
|
|
69
|
+
- Updated dependencies [b995afb]
|
|
70
|
+
- Updated dependencies [b995afb]
|
|
71
|
+
- Updated dependencies [b995afb]
|
|
72
|
+
- Updated dependencies [270ef29]
|
|
73
|
+
- Updated dependencies [270ef29]
|
|
74
|
+
- Updated dependencies [270ef29]
|
|
75
|
+
- Updated dependencies [270ef29]
|
|
76
|
+
- Updated dependencies [270ef29]
|
|
77
|
+
- Updated dependencies [270ef29]
|
|
78
|
+
- Updated dependencies [270ef29]
|
|
79
|
+
- Updated dependencies [270ef29]
|
|
80
|
+
- Updated dependencies [b995afb]
|
|
81
|
+
- Updated dependencies [b995afb]
|
|
82
|
+
- @checkstack/backend-api@0.19.0
|
|
83
|
+
- @checkstack/automation-backend@0.3.0
|
|
84
|
+
- @checkstack/gitops-common@0.5.0
|
|
85
|
+
- @checkstack/gitops-backend@0.4.0
|
|
86
|
+
- @checkstack/healthcheck-backend@1.4.0
|
|
87
|
+
- @checkstack/healthcheck-common@1.4.0
|
|
88
|
+
- @checkstack/catalog-backend@1.3.0
|
|
89
|
+
- @checkstack/cache-api@0.3.7
|
|
90
|
+
- @checkstack/command-backend@0.1.32
|
|
91
|
+
- @checkstack/queue-api@0.3.7
|
|
92
|
+
- @checkstack/cache-utils@0.2.12
|
|
93
|
+
|
|
94
|
+
## 0.5.0
|
|
95
|
+
|
|
96
|
+
### Minor Changes
|
|
97
|
+
|
|
98
|
+
- 41c77f4: feat(automation): type enum-able trigger/artifact fields as enums for editor value autocompletion
|
|
99
|
+
|
|
100
|
+
The automation editor's staged completion offers concrete values after a
|
|
101
|
+
comparator (`{{ trigger.payload.severity == "high" }}`) only when the
|
|
102
|
+
field's JSON Schema carries an `enum`. Several trigger payload + artifact
|
|
103
|
+
schemas declared closed-set fields as loose `z.string()`, so no values
|
|
104
|
+
were suggested. Tightened them to the canonical enums that already
|
|
105
|
+
existed in each plugin's `-common` package (and matched the hook payload
|
|
106
|
+
types in lockstep so the trigger's `payloadSchema` and `hook` keep the
|
|
107
|
+
same `TPayload`):
|
|
108
|
+
|
|
109
|
+
- **incident** — trigger payloads: `severity` → `IncidentSeverityEnum`,
|
|
110
|
+
`status` / `statusChange` → `IncidentStatusEnum`.
|
|
111
|
+
- **healthcheck** — trigger payloads: `previousStatus` / `newStatus` /
|
|
112
|
+
`status` → `HealthCheckStatusSchema` (across systemDegraded,
|
|
113
|
+
systemHealthy, systemHealthChanged, checkFailed; plus checkCompleted's
|
|
114
|
+
hook type).
|
|
115
|
+
- **dependency** — trigger + artifact: `impactType` → `ImpactTypeSchema`;
|
|
116
|
+
impactPropagated `previousState` / `newState` → `DerivedStateSchema`.
|
|
117
|
+
Also deduped the inline `impactTypeSchema` action-config enum to reuse
|
|
118
|
+
the canonical `ImpactTypeSchema`.
|
|
119
|
+
- **maintenance** — trigger + artifact: `status` →
|
|
120
|
+
`MaintenanceStatusEnum`; deduped the inline `maintenanceStatusEnum`
|
|
121
|
+
(used by `add_update.statusChange`) to the canonical one.
|
|
122
|
+
- **slo** — `achievement.unlocked` trigger + hook: `achievement` →
|
|
123
|
+
`AchievementTypeSchema`.
|
|
124
|
+
|
|
125
|
+
Runtime behaviour is unchanged — these fields always carried valid enum
|
|
126
|
+
values (the underlying records are enum-constrained); only the schema
|
|
127
|
+
types were loose. The hook payload generics are now precise too, which
|
|
128
|
+
caught one stale test fixture asserting an invalid `impactType: "soft"`.
|
|
129
|
+
|
|
130
|
+
Fields that look enum-ish but are genuinely free-form were intentionally
|
|
131
|
+
left as `z.string()`: satellite `region` (user-entered), Jira issue
|
|
132
|
+
`status` (per-instance workflow name), notification `strategyQualifiedId`
|
|
133
|
+
/ `errorMessage`, healthcheck collector `result`, and script
|
|
134
|
+
`stdout` / `stderr`.
|
|
135
|
+
|
|
136
|
+
### Patch Changes
|
|
137
|
+
|
|
138
|
+
- 41c77f4: feat(automation): one-time migration of webhook subscriptions + remove legacy integration backend
|
|
139
|
+
|
|
140
|
+
**BREAKING CHANGES** (platform is in BETA — no major bump):
|
|
141
|
+
|
|
142
|
+
- `IntegrationProvider` no longer carries `config` (subscription
|
|
143
|
+
config) or `deliver`. The interface now models a connection provider
|
|
144
|
+
only: connection schema + `getConnectionOptions` + `testConnection`.
|
|
145
|
+
- The legacy subscription / delivery-log / event endpoints
|
|
146
|
+
(`listSubscriptions`, `createSubscription`, `getDeliveryLogs`,
|
|
147
|
+
`listEventTypes`, …) are removed from `integrationContract`.
|
|
148
|
+
- `delivery-coordinator`, `hook-subscriber`, `event-registry`, and the
|
|
149
|
+
`integrationEventExtensionPoint` are deleted. Plugins that
|
|
150
|
+
previously called `integrationEvents.registerEvent(...)` now
|
|
151
|
+
register their hooks as automation triggers via
|
|
152
|
+
`automationTriggerExtensionPoint.registerTrigger(...)`.
|
|
153
|
+
- Frontend pages `IntegrationsPage` and `DeliveryLogsPage` are gone;
|
|
154
|
+
the integration plugin's only remaining UI is connection
|
|
155
|
+
management. Subscription management lives under `/automation/...`.
|
|
156
|
+
- `webhook_subscriptions` and `delivery_logs` tables stay in the
|
|
157
|
+
database for one release as a safety net (no code reads or writes
|
|
158
|
+
them), and will be dropped in a follow-up migration.
|
|
159
|
+
|
|
160
|
+
**New**:
|
|
161
|
+
|
|
162
|
+
- `jira.create_issue`, `teams.post_message`, `webex.post_message`,
|
|
163
|
+
`webhook.send`, `integration-script.run_shell`, and
|
|
164
|
+
`integration-script.run_script` actions registered against the
|
|
165
|
+
Automation Platform with matching `*.message`, `*.delivery`,
|
|
166
|
+
`shell.result`, and `script.result` artifact types. The script
|
|
167
|
+
plugin exposes **two** actions — `run_shell` runs bash via the
|
|
168
|
+
shared `ShellScriptRunner` (Monaco `shell` editor), `run_script`
|
|
169
|
+
runs an ESM module in a Bun subprocess via `EsmScriptRunner`
|
|
170
|
+
(Monaco `typescript` editor + `defineIntegration` helper) — to
|
|
171
|
+
preserve the legacy provider split. `jira.create_issue` keeps the
|
|
172
|
+
dynamic field-mapping dropdown (driven by
|
|
173
|
+
`JIRA_RESOLVERS.FIELD_OPTIONS`).
|
|
174
|
+
- One-time data migration runs on boot in
|
|
175
|
+
`automation-backend.afterPluginsReady`. It reads
|
|
176
|
+
`webhook_subscriptions` via a new service RPC
|
|
177
|
+
`IntegrationApi.listLegacySubscriptions`, translates each row into
|
|
178
|
+
a single-trigger / single-action automation (marked with
|
|
179
|
+
`managed_by = "migrated-subscription:<id>"`), and is idempotent
|
|
180
|
+
across restarts.
|
|
181
|
+
- Failed translations are recorded in a new
|
|
182
|
+
`automation_migration_failures` table and surfaced via
|
|
183
|
+
`AutomationApi.listMigrationFailures` /
|
|
184
|
+
`acknowledgeMigrationFailure` so admins can review and re-create
|
|
185
|
+
failed entries by hand.
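The idempotency property claimed for the boot-time migration can be illustrated with the sketch below. Everything here is hypothetical except the `migrated-subscription:<id>` marker: the row shapes, the `migrateSketch` function, and the validation rule used to produce a failure are stand-ins, since the real translation logic is in `@checkstack/automation-backend` and is not part of this diff.

```typescript
// Hypothetical shapes for a legacy webhook subscription, a migrated
// automation, and a recorded migration failure.
interface LegacySubscriptionSketch { id: string; eventType: string; providerAction: string; }
interface AutomationSketch { managedBy: string; trigger: string; action: string; }
interface MigrationFailureSketch { subscriptionId: string; reason: string; }

// Re-runnable migration: a row is skipped when an automation carrying its
// marker already exists, so restarting the process creates no duplicates.
function migrateSketch(
  rows: LegacySubscriptionSketch[],
  automations: AutomationSketch[],
  failures: MigrationFailureSketch[],
): number {
  let created = 0;
  for (const row of rows) {
    const marker = `migrated-subscription:${row.id}`;
    if (automations.some((a) => a.managedBy === marker)) continue; // already migrated

    // Stand-in for "translation failed": a row with missing fields.
    if (row.eventType === "" || row.providerAction === "") {
      if (!failures.some((f) => f.subscriptionId === row.id)) {
        failures.push({ subscriptionId: row.id, reason: "untranslatable subscription" });
      }
      continue;
    }

    // One legacy row becomes one single-trigger / single-action automation.
    automations.push({ managedBy: marker, trigger: row.eventType, action: row.providerAction });
    created += 1;
  }
  return created;
}
```

The marker doubles as the deduplication key, which is why no separate "migration done" flag is needed in this sketch.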
|
|
186
|
+
|
|
187
|
+
- Updated dependencies [e2d6f25]
|
|
188
|
+
- Updated dependencies [41c77f4]
|
|
189
|
+
- Updated dependencies [41c77f4]
|
|
190
|
+
- Updated dependencies [e1a2077]
|
|
191
|
+
- Updated dependencies [41c77f4]
|
|
192
|
+
- Updated dependencies [41c77f4]
|
|
193
|
+
- Updated dependencies [41c77f4]
|
|
194
|
+
- Updated dependencies [41c77f4]
|
|
195
|
+
- Updated dependencies [41c77f4]
|
|
196
|
+
- Updated dependencies [41c77f4]
|
|
197
|
+
- Updated dependencies [41c77f4]
|
|
198
|
+
- Updated dependencies [6d52276]
|
|
199
|
+
- Updated dependencies [6d52276]
|
|
200
|
+
- Updated dependencies [35bc682]
|
|
201
|
+
- @checkstack/automation-backend@0.2.0
|
|
202
|
+
- @checkstack/healthcheck-backend@1.3.0
|
|
203
|
+
- @checkstack/catalog-backend@1.2.0
|
|
204
|
+
- @checkstack/common@0.12.0
|
|
205
|
+
- @checkstack/backend-api@0.18.0
|
|
206
|
+
- @checkstack/healthcheck-common@1.3.0
|
|
207
|
+
- @checkstack/catalog-common@2.2.3
|
|
208
|
+
- @checkstack/dependency-common@1.1.3
|
|
209
|
+
- @checkstack/slo-common@0.4.2
|
|
210
|
+
- @checkstack/command-backend@0.1.31
|
|
211
|
+
- @checkstack/gitops-backend@0.3.7
|
|
212
|
+
- @checkstack/gitops-common@0.4.2
|
|
213
|
+
- @checkstack/signal-common@0.2.5
|
|
214
|
+
- @checkstack/cache-api@0.3.6
|
|
215
|
+
- @checkstack/queue-api@0.3.6
|
|
216
|
+
- @checkstack/cache-utils@0.2.11
|
|
217
|
+
|
|
3
218
|
## 0.4.6
|
|
4
219
|
|
|
5
220
|
### Patch Changes
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@checkstack/slo-backend",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.6.0",
|
|
4
4
|
"license": "Elastic-2.0",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"main": "src/index.ts",
|
|
@@ -14,31 +14,30 @@
|
|
|
14
14
|
"lint:code": "eslint . --max-warnings 0"
|
|
15
15
|
},
|
|
16
16
|
"dependencies": {
|
|
17
|
-
"@checkstack/backend-api": "0.
|
|
18
|
-
"@checkstack/cache-api": "0.3.
|
|
19
|
-
"@checkstack/cache-utils": "0.2.
|
|
20
|
-
"@checkstack/slo-common": "0.4.
|
|
21
|
-
"@checkstack/healthcheck-common": "1.
|
|
22
|
-
"@checkstack/healthcheck-backend": "1.
|
|
23
|
-
"@checkstack/dependency-common": "1.1.
|
|
24
|
-
"@checkstack/catalog-common": "2.2.
|
|
25
|
-
"@checkstack/catalog-backend": "1.
|
|
26
|
-
"@checkstack/command-backend": "0.1.
|
|
27
|
-
"@checkstack/signal-common": "0.2.
|
|
28
|
-
"@checkstack/
|
|
29
|
-
"@checkstack/
|
|
30
|
-
"@checkstack/gitops-
|
|
31
|
-
"@checkstack/
|
|
32
|
-
"@checkstack/
|
|
33
|
-
"@checkstack/queue-api": "0.3.4",
|
|
17
|
+
"@checkstack/backend-api": "0.18.0",
|
|
18
|
+
"@checkstack/cache-api": "0.3.6",
|
|
19
|
+
"@checkstack/cache-utils": "0.2.11",
|
|
20
|
+
"@checkstack/slo-common": "0.4.2",
|
|
21
|
+
"@checkstack/healthcheck-common": "1.3.0",
|
|
22
|
+
"@checkstack/healthcheck-backend": "1.3.0",
|
|
23
|
+
"@checkstack/dependency-common": "1.1.3",
|
|
24
|
+
"@checkstack/catalog-common": "2.2.3",
|
|
25
|
+
"@checkstack/catalog-backend": "1.2.0",
|
|
26
|
+
"@checkstack/command-backend": "0.1.31",
|
|
27
|
+
"@checkstack/signal-common": "0.2.5",
|
|
28
|
+
"@checkstack/automation-backend": "0.2.0",
|
|
29
|
+
"@checkstack/gitops-backend": "0.3.7",
|
|
30
|
+
"@checkstack/gitops-common": "0.4.2",
|
|
31
|
+
"@checkstack/common": "0.12.0",
|
|
32
|
+
"@checkstack/queue-api": "0.3.6",
|
|
34
33
|
"drizzle-orm": "^0.45.0",
|
|
35
34
|
"zod": "^4.2.1",
|
|
36
35
|
"@orpc/server": "^1.13.2"
|
|
37
36
|
},
|
|
38
37
|
"devDependencies": {
|
|
39
38
|
"@checkstack/drizzle-helper": "0.0.5",
|
|
40
|
-
"@checkstack/scripts": "0.3.
|
|
41
|
-
"@checkstack/test-utils-backend": "0.1.
|
|
39
|
+
"@checkstack/scripts": "0.3.4",
|
|
40
|
+
"@checkstack/test-utils-backend": "0.1.31",
|
|
42
41
|
"@checkstack/tsconfig": "0.0.7",
|
|
43
42
|
"@types/bun": "^1.0.0",
|
|
44
43
|
"drizzle-kit": "^0.31.10",
|
package/src/hooks.ts
CHANGED
|
@@ -1,4 +1,5 @@
|
|
|
1
1
|
import { createHook } from "@checkstack/backend-api";
|
|
2
|
+
import type { AchievementType } from "@checkstack/slo-common";
|
|
2
3
|
|
|
3
4
|
/**
|
|
4
5
|
* SLO hooks for cross-plugin communication.
|
|
@@ -6,51 +7,12 @@ import { createHook } from "@checkstack/backend-api";
|
|
|
6
7
|
* Registered as integration events so they flow through configured notification channels.
|
|
7
8
|
*/
|
|
8
9
|
export const sloHooks = {
|
|
9
|
-
/**
|
|
10
|
-
* Emitted when an SLO's error budget consumption exceeds the warning threshold.
|
|
11
|
-
*/
|
|
12
|
-
sloBudgetWarning: createHook<{
|
|
13
|
-
systemId: string;
|
|
14
|
-
objectiveId: string;
|
|
15
|
-
target: number;
|
|
16
|
-
budgetRemainingPercent: number;
|
|
17
|
-
}>("slo.budget.warning"),
|
|
18
|
-
|
|
19
|
-
/**
|
|
20
|
-
* Emitted when an SLO's error budget consumption exceeds the critical threshold.
|
|
21
|
-
*/
|
|
22
|
-
sloBudgetCritical: createHook<{
|
|
23
|
-
systemId: string;
|
|
24
|
-
objectiveId: string;
|
|
25
|
-
target: number;
|
|
26
|
-
budgetRemainingPercent: number;
|
|
27
|
-
}>("slo.budget.critical"),
|
|
28
|
-
|
|
29
|
-
/**
|
|
30
|
-
* Emitted when an SLO's error budget is fully exhausted.
|
|
31
|
-
*/
|
|
32
|
-
sloBudgetExhausted: createHook<{
|
|
33
|
-
systemId: string;
|
|
34
|
-
objectiveId: string;
|
|
35
|
-
target: number;
|
|
36
|
-
}>("slo.budget.exhausted"),
|
|
37
|
-
|
|
38
|
-
/**
|
|
39
|
-
* Emitted when a reliability streak is broken.
|
|
40
|
-
*/
|
|
41
|
-
sloStreakBroken: createHook<{
|
|
42
|
-
systemId: string;
|
|
43
|
-
objectiveId: string;
|
|
44
|
-
streak: number;
|
|
45
|
-
bestStreak: number;
|
|
46
|
-
}>("slo.streak.broken"),
|
|
47
|
-
|
|
48
10
|
/**
|
|
49
11
|
* Emitted when a system unlocks a new reliability achievement.
|
|
50
12
|
*/
|
|
51
13
|
sloAchievementUnlocked: createHook<{
|
|
52
14
|
systemId: string;
|
|
53
|
-
achievement:
|
|
15
|
+
achievement: AchievementType;
|
|
54
16
|
}>("slo.achievement.unlocked"),
|
|
55
17
|
|
|
56
18
|
/**
|
package/src/index.ts
CHANGED
|
@@ -7,20 +7,37 @@ import {
|
|
|
7
7
|
pluginMetadata,
|
|
8
8
|
sloContract,
|
|
9
9
|
sloRoutes,
|
|
10
|
+
AchievementTypeSchema,
|
|
10
11
|
} from "@checkstack/slo-common";
|
|
11
12
|
import { createBackendPlugin, coreServices } from "@checkstack/backend-api";
|
|
12
|
-
import {
|
|
13
|
+
import {
|
|
14
|
+
automationTriggerExtensionPoint,
|
|
15
|
+
entityExtensionPoint,
|
|
16
|
+
type EntityHandle,
|
|
17
|
+
} from "@checkstack/automation-backend";
|
|
13
18
|
import { SloService } from "./service";
|
|
14
19
|
import { SloEngine } from "./slo-engine";
|
|
15
20
|
import { createRouter } from "./router";
|
|
16
21
|
import { createSloCache } from "./cache";
|
|
17
22
|
import { DependencyApi } from "@checkstack/dependency-common";
|
|
18
23
|
import { HealthCheckApi } from "@checkstack/healthcheck-common";
|
|
19
|
-
import {
|
|
20
|
-
|
|
24
|
+
import {
|
|
25
|
+
CATALOG_SYSTEM_ENTITY_KIND,
|
|
26
|
+
} from "@checkstack/catalog-backend";
|
|
27
|
+
import {
|
|
28
|
+
HEALTH_ENTITY_KIND,
|
|
29
|
+
classifyHealthChange,
|
|
30
|
+
} from "@checkstack/healthcheck-backend";
|
|
21
31
|
import { registerSearchProvider } from "@checkstack/command-backend";
|
|
22
32
|
import { resolveRoute } from "@checkstack/common";
|
|
23
33
|
import { sloHooks } from "./hooks";
|
|
34
|
+
import {
|
|
35
|
+
SLO_ENTITY_KIND,
|
|
36
|
+
SloEntityStateSchema,
|
|
37
|
+
createSloEntityRead,
|
|
38
|
+
deriveSloTriggerEvents,
|
|
39
|
+
type SloEntityState,
|
|
40
|
+
} from "./slo-entity";
|
|
24
41
|
import { setupDailySnapshotJob } from "./streak-calculator";
|
|
25
42
|
import { setupWeeklyDigestJob } from "./weekly-digest";
|
|
26
43
|
import { evaluateAchievements } from "./achievement-evaluator";
|
|
@@ -31,36 +48,17 @@ import { registerSloGitOpsKinds } from "./slo-gitops-kinds";
|
|
|
31
48
|
// Integration Event Payload Schemas
|
|
32
49
|
// =============================================================================
|
|
33
50
|
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
const sloBudgetCriticalPayloadSchema = z.object({
|
|
42
|
-
systemId: z.string(),
|
|
43
|
-
objectiveId: z.string(),
|
|
44
|
-
target: z.number(),
|
|
45
|
-
budgetRemainingPercent: z.number(),
|
|
46
|
-
});
|
|
47
|
-
|
|
48
|
-
const sloBudgetExhaustedPayloadSchema = z.object({
|
|
49
|
-
systemId: z.string(),
|
|
50
|
-
objectiveId: z.string(),
|
|
51
|
-
target: z.number(),
|
|
52
|
-
});
|
|
53
|
-
|
|
54
|
-
const sloStreakBrokenPayloadSchema = z.object({
|
|
55
|
-
systemId: z.string(),
|
|
56
|
-
objectiveId: z.string(),
|
|
57
|
-
streak: z.number(),
|
|
58
|
-
bestStreak: z.number(),
|
|
59
|
-
});
|
|
51
|
+
// NOTE: The `budget.warning` / `.critical` / `.exhausted` and
|
|
52
|
+
// `streak.broken` trigger payload schemas were removed (§9.2). Those four
|
|
53
|
+
// thresholds are now authored as reactive `numeric_state` conditions over
|
|
54
|
+
// the `slo` entity's `budgetRemainingPercent` / `currentStreak`, not as
|
|
55
|
+
// pre-baked event triggers. The hooks they fronted were never emitted by
|
|
56
|
+
// the engine (inert), so removing the trigger registrations is behavior-
|
|
57
|
+
// preserving.
|
|
60
58
|
|
|
61
59
|
const sloAchievementUnlockedPayloadSchema = z.object({
|
|
62
60
|
systemId: z.string(),
|
|
63
|
-
achievement:
|
|
61
|
+
achievement: AchievementTypeSchema,
|
|
64
62
|
});
|
|
65
63
|
|
|
66
64
|
const sloWeeklyDigestPayloadSchema = z.object({
|
|
@@ -88,82 +86,100 @@ const sloWeeklyDigestPayloadSchema = z.object({
|
|
|
88
86
|
// Plugin Definition
|
|
89
87
|
// =============================================================================
|
|
90
88
|
|
|
89
|
+
// Reactive `slo` entity handle (§10.7). Defined in register() via the
|
|
90
|
+
// entity extension point; mutated from the daily snapshot job onward.
|
|
91
|
+
let sloEntity: EntityHandle<SloEntityState> | undefined;
|
|
92
|
+
|
|
93
|
+
// The SLO service + engine are created in afterPluginsReady (they need the
|
|
94
|
+
// resolved database + RPC clients), but the PLUGIN-BACKED + COMPUTED entity
|
|
95
|
+
// `read` accessor must be supplied at `defineEntity` time in register(). These
|
|
96
|
+
// holders bridge the two: the `read` closure resolves them lazily, and
|
|
97
|
+
// afterPluginsReady sets them before any mutation runs (the daily job — the
|
|
98
|
+
// only mutation site — runs from afterPluginsReady onward).
|
|
99
|
+
let sloEntityServiceRef: SloService | undefined;
|
|
100
|
+
let sloEntityEngineRef: SloEngine | undefined;
|
|
101
|
+
|
|
91
102
|
export default createBackendPlugin({
|
|
92
103
|
metadata: pluginMetadata,
|
|
93
104
|
register(env) {
|
|
94
105
|
env.registerAccessRules(sloAccessRules);
|
|
95
106
|
|
|
96
|
-
// Register hooks as
|
|
97
|
-
const
|
|
98
|
-
|
|
107
|
+
// Register hooks as automation triggers
|
|
108
|
+
const automationTriggers = env.getExtensionPoint(
|
|
109
|
+
automationTriggerExtensionPoint,
|
|
99
110
|
);
|
|
100
111
|
|
|
101
|
-
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
|
|
107
|
-
|
|
108
|
-
|
|
112
|
+
// ─── Reactive `slo` entity (§10.7, §9.2) ───────────────────────────
|
|
113
|
+
// The SLO budget IS the entity. The former `budget.warning/.critical/
|
|
114
|
+
// .exhausted` + `streak.broken` triggers are removed — those thresholds
|
|
115
|
+
// are now authored as `numeric_state` conditions over
|
|
116
|
+
// `state.slo.<objectiveId>.budgetRemainingPercent` / `currentStreak`.
|
|
117
|
+
// The deriver fires no legacy events; it exists so `slo` is a known
|
|
118
|
+
// reactive kind (scope + wake resolution).
|
|
119
|
+
//
|
|
120
|
+
// PLUGIN-BACKED + COMPUTED (Model B): there is NO framework `entity_state`
|
|
121
|
+
// row. `read` assembles each objective's view by reading `slo_streaks` +
|
|
122
|
+
// `slo_objectives` and COMPUTING `budgetRemainingPercent` via the engine
|
|
123
|
+
// (see `createSloEntityRead`). No `indexes` — those only apply to
|
|
124
|
+
// store-backed kinds, and a plugin-backed kind keeps its state in its own
|
|
125
|
+
// tables. The `read` closure resolves the service + engine set by
|
|
126
|
+
// afterPluginsReady (the daily job is the only mutation site).
|
|
127
|
+
const entityPoint = env.getExtensionPoint(entityExtensionPoint);
|
|
128
|
+
sloEntity = entityPoint.defineEntity<SloEntityState>({
|
|
129
|
+
kind: SLO_ENTITY_KIND,
|
|
130
|
+
state: SloEntityStateSchema,
|
|
131
|
+
read: (ids) => {
|
|
132
|
+
const service = sloEntityServiceRef;
|
|
133
|
+
const engine = sloEntityEngineRef;
|
|
134
|
+
if (!service || !engine) {
|
|
135
|
+
throw new Error(
|
|
136
|
+
"slo entity read before init: service/engine not yet resolved",
|
|
137
|
+
);
|
|
138
|
+
}
|
|
139
|
+
return createSloEntityRead({ service, engine })(ids);
|
|
109
140
|
},
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
|
|
114
|
-
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
displayName: "SLO Budget Exhausted",
|
|
129
|
-
description: "Fired when an SLO error budget is fully consumed",
|
|
130
|
-
category: "SLO",
|
|
131
|
-
payloadSchema: sloBudgetExhaustedPayloadSchema,
|
|
132
|
-
},
|
|
133
|
-
pluginMetadata,
|
|
134
|
-
);
|
|
135
|
-
|
|
136
|
-
integrationEvents.registerEvent(
|
|
137
|
-
{
|
|
138
|
-
hook: sloHooks.sloStreakBroken,
|
|
139
|
-
displayName: "SLO Streak Broken",
|
|
140
|
-
description: "Fired when a reliability streak is broken",
|
|
141
|
-
category: "SLO",
|
|
142
|
-
payloadSchema: sloStreakBrokenPayloadSchema,
|
|
143
|
-
},
|
|
144
|
-
pluginMetadata,
|
|
145
|
-
);
|
|
141
|
+
});
|
|
142
|
+
entityPoint.registerChangeDeriver({
|
|
143
|
+
kind: SLO_ENTITY_KIND,
|
|
144
|
+
derive: deriveSloTriggerEvents,
|
|
145
|
+
});
|
|
146
|
+
// Event-sourced history is NOT the live entity (§5): downtime events +
|
|
147
|
+
// daily snapshots are append-only records, the budget/streak is the
|
|
148
|
+
// reactive entity.
|
|
149
|
+
entityPoint.declareNonReactiveState({
|
|
150
|
+
table: "slo_downtime_events",
|
|
151
|
+
reason: "bookkeeping",
|
|
152
|
+
note: "Append-only downtime history. The live budget/streak is the `slo` entity.",
|
|
153
|
+
});
|
|
154
|
+
entityPoint.declareNonReactiveState({
|
|
155
|
+
table: "slo_daily_snapshots",
|
|
156
|
+
reason: "bookkeeping",
|
|
157
|
+
note: "Append-only daily trend snapshots. The live budget/streak is the `slo` entity.",
|
|
158
|
+
});
|
|
146
159
|
|
|
147
|
-
|
|
160
|
+
automationTriggers.registerTrigger(
|
|
148
161
|
{
|
|
149
|
-
|
|
162
|
+
id: "achievement.unlocked",
|
|
150
163
|
displayName: "SLO Achievement Unlocked",
|
|
151
164
|
description:
|
|
152
165
|
"Fired when a system unlocks a new reliability achievement",
|
|
153
166
|
category: "SLO",
|
|
154
167
|
payloadSchema: sloAchievementUnlockedPayloadSchema,
|
|
168
|
+
hook: sloHooks.sloAchievementUnlocked,
|
|
169
|
+
contextKey: (p) => p.systemId,
|
|
155
170
|
},
|
|
156
171
|
pluginMetadata,
|
|
157
172
|
);
|
|
158
173
|
|
|
159
|
-
|
|
174
|
+
automationTriggers.registerTrigger(
|
|
160
175
|
{
|
|
161
|
-
|
|
176
|
+
id: "weekly.digest",
|
|
162
177
|
displayName: "SLO Weekly Digest",
|
|
163
178
|
description:
|
|
164
179
|
"Weekly summary of SLO performance across all systems (Monday 09:00 UTC)",
|
|
165
180
|
category: "SLO",
|
|
166
181
|
payloadSchema: sloWeeklyDigestPayloadSchema,
|
|
182
|
+
hook: sloHooks.sloWeeklyDigest,
|
|
167
183
|
},
|
|
168
184
|
pluginMetadata,
|
|
169
185
|
);
|
|
@@ -171,6 +187,8 @@ export default createBackendPlugin({
|
|
|
171
187
|
// Shared references across init/afterPluginsReady (maintenance-backend pattern)
|
|
172
188
|
let sharedEngine: SloEngine;
|
|
173
189
|
let gitopsService: SloService | undefined;
|
|
190
|
+
// Reactive `slo` entity handle (§10.7), defined just above in register().
|
|
191
|
+
const onEntityChanged = entityPoint.onEntityChanged;
|
|
174
192
|
|
|
175
193
|
// ─── GitOps Entity Kind Registration ─────────────────────────────
|
|
176
194
|
const kindRegistry = env.getExtensionPoint(entityKindExtensionPoint);
|
|
@@ -252,7 +270,6 @@ export default createBackendPlugin({
|
|
|
252
270
|
afterPluginsReady: async ({
|
|
253
271
|
database,
|
|
254
272
|
logger,
|
|
255
|
-
onHook,
|
|
256
273
|
emitHook,
|
|
257
274
|
rpcClient,
|
|
258
275
|
signalService,
|
|
@@ -265,6 +282,12 @@ export default createBackendPlugin({
|
|
|
265
282
|
signalService,
|
|
266
283
|
logger,
|
|
267
284
|
});
|
|
285
|
+
// Publish the service + engine for the PLUGIN-BACKED + COMPUTED entity
|
|
286
|
+
// `read` accessor (defined in register()). The daily snapshot job — the
|
|
287
|
+
// only `slo` mutation site — runs from here onward, so the refs are set
|
|
288
|
+
// before any `read`/`mutate` can fire.
|
|
289
|
+
sloEntityServiceRef = service;
|
|
290
|
+
sloEntityEngineRef = engine;
|
|
268
291
|
|
|
269
292
|
const dependencyClient = rpcClient.forPlugin(DependencyApi);
|
|
270
293
|
const healthCheckClient = rpcClient.forPlugin(HealthCheckApi);
|
|
@@ -333,41 +356,52 @@ export default createBackendPlugin({
|
|
|
333
356
|
}
|
|
334
357
|
};
|
|
335
358
|
|
|
359
|
+
// Cross-plugin consumers now react to the reactive `health` /
|
|
360
|
+
// `catalog-system` ENTITY changes via `onEntityChanged` instead of
|
|
361
|
+
// the (being-removed) directional hooks (§10.7). `classifyHealthChange`
|
|
362
|
+
// reproduces the exact degraded/recovered transition predicate the
|
|
363
|
+
// old `systemDegraded` / `systemHealthy` hooks fired on. Each
|
|
364
|
+
// consumer keeps `work-queue` delivery with its original
|
|
365
|
+
// `workerGroup`: these are side-effecting writes (open/close downtime,
|
|
366
|
+
// achievements, cleanup) that must run exactly once per cluster — not
|
|
367
|
+
// per-instance — so broadcast would double-apply them.
|
|
368
|
+
|
|
336
369
|
// =====================================================================
|
|
337
370
|
// Perspective 1: System goes DOWN — open downtime events
|
|
338
371
|
// =====================================================================
|
|
339
|
-
|
|
340
|
-
|
|
341
|
-
async (
|
|
372
|
+
onEntityChanged({
|
|
373
|
+
kind: HEALTH_ENTITY_KIND,
|
|
374
|
+
handler: async (change) => {
|
|
375
|
+
const { systemId, degraded, previousStatus, newStatus } =
|
|
376
|
+
classifyHealthChange(change);
|
|
377
|
+
if (!degraded) return;
|
|
342
378
|
logger.debug(
|
|
343
|
-
`SLO: System ${
|
|
379
|
+
`SLO: System ${systemId} degraded (${previousStatus} → ${newStatus})`,
|
|
344
380
|
);
|
|
345
381
|
await engine.handleSystemDown({
|
|
346
|
-
systemId
|
|
382
|
+
systemId,
|
|
347
383
|
getUpstreamHealthStatus,
|
|
348
384
|
});
|
|
349
385
|
},
|
|
350
|
-
{ mode: "work-queue", workerGroup: "slo-system-down" },
|
|
351
|
-
);
|
|
386
|
+
delivery: { mode: "work-queue", workerGroup: "slo-system-down" },
|
|
387
|
+
});
|
|
352
388
|
|
|
353
389
|
// =====================================================================
|
|
354
390
|
// Perspective 1: System goes UP — close downtime events
|
|
355
391
|
// =====================================================================
|
|
356
|
-
|
|
357
|
-
|
|
358
|
-
async (
|
|
359
|
-
|
|
360
|
-
|
|
361
|
-
|
|
362
|
-
});
|
|
392
|
+
onEntityChanged({
|
|
393
|
+
kind: HEALTH_ENTITY_KIND,
|
|
394
|
+
handler: async (change) => {
|
|
395
|
+
const { systemId, recovered } = classifyHealthChange(change);
|
|
396
|
+
if (!recovered) return;
|
|
397
|
+
logger.debug(`SLO: System ${systemId} recovered`);
|
|
398
|
+
await engine.handleSystemUp({ systemId });
|
|
363
399
|
|
|
364
400
|
// Also handle Perspective 2 (as upstream)
|
|
365
|
-
const downstreamIds = await getDownstreamSystemIds(
|
|
366
|
-
payload.systemId,
|
|
367
|
-
);
|
|
401
|
+
const downstreamIds = await getDownstreamSystemIds(systemId);
|
|
368
402
|
if (downstreamIds.length > 0) {
|
|
369
403
|
await engine.handleUpstreamUp({
|
|
370
|
-
upstreamSystemId:
|
|
404
|
+
upstreamSystemId: systemId,
|
|
371
405
|
downstreamSystemIds: downstreamIds,
|
|
372
406
|
getUpstreamHealthStatus,
|
|
373
407
|
});
|
|
@@ -375,54 +409,53 @@ export default createBackendPlugin({
|
|
|
375
409
|
|
|
376
410
|
// Evaluate achievements on recovery (rapid_recovery, clean_sheet, etc.)
|
|
377
411
|
await evaluateAchievements({
|
|
378
|
-
systemId
|
|
412
|
+
systemId,
|
|
379
413
|
service,
|
|
380
414
|
engine,
|
|
381
415
|
logger,
|
|
382
416
|
});
|
|
383
417
|
},
|
|
384
|
-
{ mode: "work-queue", workerGroup: "slo-system-up" },
|
|
385
|
-
);
|
|
418
|
+
delivery: { mode: "work-queue", workerGroup: "slo-system-up" },
|
|
419
|
+
});
|
|
386
420
|
|
|
387
421
|
// =====================================================================
|
|
388
422
|
// Perspective 2: Upstream degraded — split downstream "self" events
|
|
389
|
-
// We re-use the
|
|
423
|
+
// We re-use the degraded transition, checking downstream systems
|
|
390
424
|
// =====================================================================
|
|
391
|
-
|
|
392
|
-
|
|
393
|
-
async (
|
|
394
|
-
const
|
|
395
|
-
|
|
396
|
-
);
|
|
425
|
+
onEntityChanged({
|
|
426
|
+
kind: HEALTH_ENTITY_KIND,
|
|
427
|
+
handler: async (change) => {
|
|
428
|
+
const { systemId, degraded } = classifyHealthChange(change);
|
|
429
|
+
if (!degraded) return;
|
|
430
|
+
const downstreamIds = await getDownstreamSystemIds(systemId);
|
|
397
431
|
if (downstreamIds.length > 0) {
|
|
398
432
|
await engine.handleUpstreamDown({
|
|
399
|
-
upstreamSystemId:
|
|
400
|
-
upstreamSystemName:
|
|
433
|
+
upstreamSystemId: systemId,
|
|
434
|
+
upstreamSystemName: systemId,
|
|
401
435
|
downstreamSystemIds: downstreamIds,
|
|
402
436
|
});
|
|
403
437
|
}
|
|
404
438
|
},
|
|
405
|
-
{ mode: "work-queue", workerGroup: "slo-upstream-down" },
|
|
406
|
-
);
|
|
439
|
+
delivery: { mode: "work-queue", workerGroup: "slo-upstream-down" },
|
|
440
|
+
});
|
|
407
441
|
|
|
408
442
|
// =====================================================================
|
|
409
|
-
// Subscribe to catalog system deletion for cleanup
|
|
443
|
+
// Subscribe to catalog system deletion (tombstone) for cleanup
|
|
410
444
|
// =====================================================================
|
|
411
|
-
|
|
412
|
-
|
|
413
|
-
async (
|
|
445
|
+
onEntityChanged({
|
|
446
|
+
kind: CATALOG_SYSTEM_ENTITY_KIND,
|
|
447
|
+
handler: async (change) => {
|
|
448
|
+
// Only react to a tombstone (delete), not create/update.
|
|
449
|
+
if (change.next !== null) return;
|
|
450
|
+
const systemId = change.id;
|
|
414
451
|
logger.debug(
|
|
415
|
-
`Cleaning up SLO data for deleted system: ${
|
|
452
|
+
`Cleaning up SLO data for deleted system: ${systemId}`,
|
|
416
453
|
);
|
|
417
|
-
await service.deleteObjectivesForSystem({
|
|
418
|
-
|
|
419
|
-
});
|
|
420
|
-
await service.deleteAchievementsForSystem({
|
|
421
|
-
systemId: payload.systemId,
|
|
422
|
-
});
|
|
454
|
+
await service.deleteObjectivesForSystem({ systemId });
|
|
455
|
+
await service.deleteAchievementsForSystem({ systemId });
|
|
423
456
|
},
|
|
424
|
-
{ mode: "work-queue", workerGroup: "slo-system-cleanup" },
|
|
425
|
-
);
|
|
457
|
+
delivery: { mode: "work-queue", workerGroup: "slo-system-cleanup" },
|
|
458
|
+
});
|
|
426
459
|
|
|
427
460
|
// =====================================================================
|
|
428
461
|
// Daily snapshot + streak calculation cron job
|
|
@@ -432,6 +465,7 @@ export default createBackendPlugin({
|
|
|
432
465
|
engine,
|
|
433
466
|
logger,
|
|
434
467
|
queueManager,
|
|
468
|
+
getSloEntity: () => sloEntity,
|
|
435
469
|
});
|
|
436
470
|
|
|
437
471
|
// =====================================================================
|
|
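Note on `classifyHealthChange`: the health handlers in this file all route through it, but its body is not part of this diff. The sketch below only illustrates the kind of degraded/recovered transition predicate the comment describes. The `HealthStatus` values, the `EntityChange` shape, the use of `change.id` as the system id, and the name `classifyHealthChangeSketch` are assumptions made for illustration, not the package's actual definitions.

```typescript
// Hypothetical shapes: the real health entity state and change envelope
// live in other packages and are not shown in this diff.
type HealthStatus = "healthy" | "degraded" | "unhealthy";

interface HealthState {
  status: HealthStatus;
}

interface EntityChange<S> {
  kind: string;
  id: string;
  prev: S | null;
  next: S | null;
}

interface HealthTransition {
  systemId: string;
  previousStatus: HealthStatus | undefined;
  newStatus: HealthStatus | undefined;
  degraded: boolean;
  recovered: boolean;
}

// Assumed predicate: "degraded" is a move from healthy (or unknown) to a
// non-healthy status; "recovered" is a move from non-healthy to healthy.
// A tombstone (next === null) is neither.
function classifyHealthChangeSketch(
  change: EntityChange<HealthState>,
): HealthTransition {
  const previousStatus = change.prev?.status;
  const newStatus = change.next?.status;
  const wasHealthy =
    previousStatus === undefined || previousStatus === "healthy";
  const isHealthy = newStatus === "healthy";
  return {
    systemId: change.id,
    previousStatus,
    newStatus,
    degraded: wasHealthy && newStatus !== undefined && !isHealthy,
    recovered: !wasHealthy && isHealthy,
  };
}
```

Under these assumptions a move between two non-healthy statuses sets neither flag, so handlers that return early on `!degraded` or `!recovered` would ignore it.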
@@ -0,0 +1,255 @@
|
|
|
1
|
+
import { describe, it, expect } from "bun:test";
|
|
2
|
+
import type { EntityHandle } from "@checkstack/automation-backend";
|
|
3
|
+
|
|
4
|
+
import {
|
|
5
|
+
SLO_ENTITY_KIND,
|
|
6
|
+
SloEntityStateSchema,
|
|
7
|
+
computeSloEntityState,
|
|
8
|
+
createSloEntityRead,
|
|
9
|
+
deriveSloTriggerEvents,
|
|
10
|
+
writeSloEntity,
|
|
11
|
+
type SloEntityState,
|
|
12
|
+
} from "./slo-entity";
|
|
13
|
+
import type { SloService } from "./service";
|
|
14
|
+
import type { SloEngine } from "./slo-engine";
|
|
15
|
+
|
|
16
|
+
describe("deriveSloTriggerEvents", () => {
|
|
17
|
+
it("fires no legacy trigger events (thresholds are numeric_state conditions, §9.2)", () => {
|
|
18
|
+
expect(
|
|
19
|
+
deriveSloTriggerEvents({
|
|
20
|
+
kind: SLO_ENTITY_KIND,
|
|
21
|
+
id: "obj-1",
|
|
22
|
+
prev: null,
|
|
23
|
+
next: {
|
|
24
|
+
objectiveId: "obj-1",
|
|
25
|
+
systemId: "sys-1",
|
|
26
|
+
target: 99.9,
|
|
27
|
+
budgetRemainingPercent: 10,
|
|
28
|
+
currentStreak: 0,
|
|
29
|
+
bestStreak: 5,
|
|
30
|
+
},
|
|
31
|
+
delta: {},
|
|
32
|
+
changedFields: [],
|
|
33
|
+
actor: { type: "system", id: "system" },
|
|
34
|
+
occurredAt: new Date().toISOString(),
|
|
35
|
+
}),
|
|
36
|
+
).toEqual([]);
|
|
37
|
+
});
|
|
38
|
+
});
|
|
39
|
+
|
|
40
|
+
describe("SloEntityStateSchema", () => {
|
|
41
|
+
it("parses the reactive subset", () => {
|
|
42
|
+
const parsed = SloEntityStateSchema.parse({
|
|
43
|
+
objectiveId: "o",
|
|
44
|
+
systemId: "s",
|
|
45
|
+
target: 99.5,
|
|
46
|
+
budgetRemainingPercent: 42,
|
|
47
|
+
currentStreak: 3,
|
|
48
|
+
bestStreak: 9,
|
|
49
|
+
});
|
|
50
|
+
expect(parsed.budgetRemainingPercent).toBe(42);
|
|
51
|
+
});
|
|
52
|
+
});
|
|
53
|
+
|
|
54
|
+
// ─── Fakes ──────────────────────────────────────────────────────────────
|
|
55
|
+
|
|
56
|
+
function makeService(over: {
|
|
57
|
+
objective?: { id: string; systemId: string; target: number } | undefined;
|
|
58
|
+
streak?: { currentStreak: number; bestStreak: number } | undefined;
|
|
59
|
+
}): SloService {
|
|
60
|
+
return {
|
|
61
|
+
async getObjective() {
|
|
62
|
+
return over.objective;
|
|
63
|
+
},
|
|
64
|
+
async getStreak() {
|
|
65
|
+
return over.streak;
|
|
66
|
+
},
|
|
67
|
+
} as unknown as SloService;
|
|
68
|
+
}
|
|
69
|
+
|
|
70
|
+
function makeEngine(budgetRemainingPercent: number): SloEngine {
|
|
71
|
+
return {
|
|
72
|
+
async computeStatus() {
|
|
73
|
+
return { errorBudgetRemainingPercent: budgetRemainingPercent };
|
|
74
|
+
},
|
|
75
|
+
} as unknown as SloEngine;
|
|
76
|
+
}
|
|
77
|
+
|
|
78
|
+
describe("computeSloEntityState", () => {
|
|
79
|
+
it("assembles the view by reading streak/objective + COMPUTING budget", async () => {
|
|
80
|
+
const service = makeService({
|
|
81
|
+
objective: { id: "obj-1", systemId: "sys-1", target: 99.9 },
|
|
82
|
+
streak: { currentStreak: 4, bestStreak: 12 },
|
|
83
|
+
});
|
|
84
|
+
const engine = makeEngine(20);
|
|
85
|
+
const state = await computeSloEntityState({
|
|
86
|
+
service,
|
|
87
|
+
engine,
|
|
88
|
+
objectiveId: "obj-1",
|
|
89
|
+
});
|
|
90
|
+
expect(state).toEqual({
|
|
91
|
+
objectiveId: "obj-1",
|
|
92
|
+
systemId: "sys-1",
|
|
93
|
+
target: 99.9,
|
|
94
|
+
budgetRemainingPercent: 20,
|
|
95
|
+
currentStreak: 4,
|
|
96
|
+
bestStreak: 12,
|
|
97
|
+
});
|
|
98
|
+
});
|
|
99
|
+
|
|
100
|
+
it("defaults missing streak counters to 0", async () => {
|
|
101
|
+
const service = makeService({
|
|
102
|
+
objective: { id: "obj-2", systemId: "sys-2", target: 99 },
|
|
103
|
+
streak: undefined,
|
|
104
|
+
});
|
|
105
|
+
const state = await computeSloEntityState({
|
|
106
|
+
service,
|
|
107
|
+
engine: makeEngine(100),
|
|
108
|
+
objectiveId: "obj-2",
|
|
109
|
+
});
|
|
110
|
+
expect(state?.currentStreak).toBe(0);
|
|
111
|
+
expect(state?.bestStreak).toBe(0);
|
|
112
|
+
});
|
|
113
|
+
|
|
114
|
+
it("returns undefined when the objective no longer exists", async () => {
|
|
115
|
+
const service = makeService({ objective: undefined });
|
|
116
|
+
const state = await computeSloEntityState({
|
|
117
|
+
service,
|
|
118
|
+
engine: makeEngine(50),
|
|
119
|
+
objectiveId: "gone",
|
|
120
|
+
});
|
|
121
|
+
expect(state).toBeUndefined();
|
|
122
|
+
});
|
|
123
|
+
});
|
|
124
|
+
|
|
125
|
+
describe("createSloEntityRead", () => {
|
|
126
|
+
it("computes the view per id and omits missing objectives", async () => {
|
|
127
|
+
const service = {
|
|
128
|
+
async getObjective({ id }: { id: string }) {
|
|
129
|
+
if (id === "obj-1") return { id, systemId: "sys-1", target: 99.9 };
|
|
130
|
+
return undefined;
|
|
131
|
+
},
|
|
132
|
+
async getStreak() {
|
|
133
|
+
return { currentStreak: 2, bestStreak: 7 };
|
|
134
|
+
},
|
|
135
|
+
} as unknown as SloService;
|
|
136
|
+
const read = createSloEntityRead({ service, engine: makeEngine(33) });
|
|
137
|
+
const out = await read(["obj-1", "missing"]);
|
|
138
|
+
expect(Object.keys(out)).toEqual(["obj-1"]);
|
|
139
|
+
expect(out["obj-1"]).toEqual({
|
|
140
|
+
objectiveId: "obj-1",
|
|
141
|
+
systemId: "sys-1",
|
|
142
|
+
target: 99.9,
|
|
143
|
+
budgetRemainingPercent: 33,
|
|
144
|
+
currentStreak: 2,
|
|
145
|
+
bestStreak: 7,
|
|
146
|
+
});
|
|
147
|
+
});
|
|
148
|
+
|
|
149
|
+
it("returns {} for an empty id list without touching the service", async () => {
|
|
150
|
+
let called = false;
|
|
151
|
+
const service = {
|
|
152
|
+
async getObjective() {
|
|
153
|
+
called = true;
|
|
154
|
+
return undefined;
|
|
155
|
+
},
|
|
156
|
+
} as unknown as SloService;
|
|
157
|
+
const read = createSloEntityRead({ service, engine: makeEngine(0) });
|
|
158
|
+
expect(await read([])).toEqual({});
|
|
159
|
+
expect(called).toBe(false);
|
|
160
|
+
});
|
|
161
|
+
});
|
|
162
|
+
|
|
163
|
+
describe("writeSloEntity", () => {
|
|
164
|
+
it("drives the streak write through handle.mutate keyed by objectiveId", async () => {
|
|
165
|
+
const calls: Array<{ id: string; next: SloEntityState }> = [];
|
|
166
|
+
const handle = {
|
|
167
|
+
kind: SLO_ENTITY_KIND,
|
|
168
|
+
async mutate(input: {
|
|
169
|
+
id: string;
|
|
170
|
+
apply: () => Promise<SloEntityState>;
|
|
171
|
+
}) {
|
|
172
|
+
const next = await input.apply();
|
|
173
|
+
calls.push({ id: input.id, next });
|
|
174
|
+
return next;
|
|
175
|
+
},
|
|
176
|
+
} as unknown as EntityHandle<SloEntityState>;
|
|
177
|
+
|
|
178
|
+
let applied = false;
|
|
179
|
+
await writeSloEntity({
|
|
180
|
+
handle,
|
|
181
|
+
objectiveId: "obj-7",
|
|
182
|
+
apply: async () => {
|
|
183
|
+
applied = true;
|
|
184
|
+
return {
|
|
185
|
+
objectiveId: "obj-7",
|
|
186
|
+
systemId: "sys-7",
|
|
187
|
+
target: 99.9,
|
|
188
|
+
budgetRemainingPercent: 20,
|
|
189
|
+
currentStreak: 4,
|
|
190
|
+
bestStreak: 12,
|
|
191
|
+
};
|
|
192
|
+
},
|
|
193
|
+
});
|
|
194
|
+
expect(applied).toBe(true);
|
|
195
|
+
expect(calls).toEqual([
|
|
196
|
+
{
|
|
197
|
+
id: "obj-7",
|
|
198
|
+
next: {
|
|
199
|
+
objectiveId: "obj-7",
|
|
200
|
+
systemId: "sys-7",
|
|
201
|
+
target: 99.9,
|
|
202
|
+
budgetRemainingPercent: 20,
|
|
203
|
+
currentStreak: 4,
|
|
204
|
+
bestStreak: 12,
|
|
205
|
+
},
|
|
206
|
+
},
|
|
207
|
+
]);
|
|
208
|
+
});
|
|
209
|
+
|
|
210
|
+
it("still runs the streak write when no handle is wired", async () => {
|
|
211
|
+
let applied = false;
|
|
212
|
+
await writeSloEntity({
|
|
213
|
+
handle: undefined,
|
|
214
|
+
objectiveId: "x",
|
|
215
|
+
apply: async () => {
|
|
216
|
+
applied = true;
|
|
217
|
+
return {
|
|
218
|
+
objectiveId: "x",
|
|
219
|
+
systemId: "x",
|
|
220
|
+
target: 1,
|
|
221
|
+
budgetRemainingPercent: 1,
|
|
222
|
+
currentStreak: 0,
|
|
223
|
+
bestStreak: 0,
|
|
224
|
+
};
|
|
225
|
+
},
|
|
226
|
+
});
|
|
227
|
+
expect(applied).toBe(true);
|
|
228
|
+
});
|
|
229
|
+
|
|
230
|
+
it("routes entity-layer errors to onError (fail-soft) without rethrowing", async () => {
|
|
231
|
+
let captured: unknown;
|
|
232
|
+
const handle = {
|
|
233
|
+
kind: SLO_ENTITY_KIND,
|
|
234
|
+
async mutate() {
|
|
235
|
+
throw new Error("nope");
|
|
236
|
+
},
|
|
237
|
+
} as unknown as EntityHandle<SloEntityState>;
|
|
238
|
+
await writeSloEntity({
|
|
239
|
+
handle,
|
|
240
|
+
objectiveId: "x",
|
|
241
|
+
apply: async () => ({
|
|
242
|
+
objectiveId: "x",
|
|
243
|
+
systemId: "x",
|
|
244
|
+
target: 1,
|
|
245
|
+
budgetRemainingPercent: 1,
|
|
246
|
+
currentStreak: 0,
|
|
247
|
+
bestStreak: 0,
|
|
248
|
+
}),
|
|
249
|
+
onError: (e) => {
|
|
250
|
+
captured = e;
|
|
251
|
+
},
|
|
252
|
+
});
|
|
253
|
+
expect((captured as Error).message).toBe("nope");
|
|
254
|
+
});
|
|
255
|
+
});
|
|
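Note on the `writeSloEntity` tests above: they pin down three behaviours (route through `mutate` when a handle exists, still run `apply` when none is wired, and report entity-layer errors to `onError` without rethrowing). The same fail-soft pattern can be sketched independently of the framework. `MiniHandle` and `failSoftWrite` are hypothetical names; the real `EntityHandle` and `withEntityWrite` come from `@checkstack/automation-backend` and are not reproduced here.

```typescript
// Hypothetical minimal handle, standing in for the framework's EntityHandle.
interface MiniHandle<S> {
  mutate(input: { id: string; apply: () => Promise<S> }): Promise<S>;
}

async function failSoftWrite<S>(args: {
  handle: MiniHandle<S> | undefined;
  id: string;
  apply: () => Promise<S>;
  onError?: (error: unknown) => void;
}): Promise<void> {
  const { handle, id, apply, onError } = args;
  if (!handle) {
    // No entity layer wired: the domain write still has to happen.
    await apply();
    return;
  }
  try {
    await handle.mutate({ id, apply });
  } catch (error) {
    // Report instead of rethrowing so the caller's loop keeps going.
    onError?.(error);
  }
}
```

One consequence of this shape, at least in the sketch: if `mutate` propagates an error thrown by `apply` itself, that error is also routed to `onError` rather than to the caller, so `onError` should log enough detail to tell the two cases apart.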
@@ -0,0 +1,162 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* The reactive `slo` entity (reactive automation engine §10.7, §9.2).
|
|
3
|
+
*
|
|
4
|
+
* Model B PLUGIN-BACKED + COMPUTED entity. There is NO framework
|
|
5
|
+
* `entity_state` row for an SLO. The current state is assembled on demand by
|
|
6
|
+
* the `read` accessor from two sources:
|
|
7
|
+
*
|
|
8
|
+
* - `slo_streaks` + `slo_objectives` (authoritative tables) supply
|
|
9
|
+
* `currentStreak` / `bestStreak` / `systemId` / `target`, and
|
|
10
|
+
* - the SLO engine COMPUTES `budgetRemainingPercent` (and re-surfaces
|
|
11
|
+
* `target`) on the fly via `computeStatus` (downtime aggregation over the
|
|
12
|
+
* objective's window).
|
|
13
|
+
*
|
|
14
|
+
* The streak-persist site (the daily snapshot job) drives its write through
|
|
15
|
+
* `handle.mutate({ id: objectiveId, apply })`: `apply` persists the streak to
|
|
16
|
+
* `slo_streaks` (the plugin's own write) and returns the freshly-computed
|
|
17
|
+
* view. The framework snapshots `prev` via `read` BEFORE the write, appends
|
|
18
|
+
* the transition log, and emits `ENTITY_CHANGED`.
|
|
19
|
+
*
|
|
20
|
+
* Per §9.2 the SLO budget IS the entity, and the four removed threshold hooks
|
|
21
|
+
* (`budget.warning/critical/exhausted`, `streak.broken`) become derived
|
|
22
|
+
* `numeric_state` / `state` conditions over
|
|
23
|
+
* `state.slo.<objectiveId>.budgetRemainingPercent` + `currentStreak`. The
|
|
24
|
+
* change deriver therefore emits NO legacy trigger events — operators author
|
|
25
|
+
* thresholds as reactive conditions, not pre-baked event triggers.
|
|
26
|
+
*/
|
|
27
|
+
import { z } from "zod";
|
|
28
|
+
import type {
|
|
29
|
+
EntityChangeDeriver,
|
|
30
|
+
EntityHandle,
|
|
31
|
+
EntityMutationOpts,
|
|
32
|
+
EntityRead,
|
|
33
|
+
} from "@checkstack/automation-backend";
|
|
34
|
+
import { withEntityWrite } from "@checkstack/automation-backend";
|
|
35
|
+
|
|
36
|
+
import type { SloService } from "./service";
|
|
37
|
+
import type { SloEngine } from "./slo-engine";
|
|
38
|
+
|
|
39
|
+
export const SLO_ENTITY_KIND = "slo";
|
|
40
|
+
|
|
41
|
+
export const SloEntityStateSchema = z.object({
|
|
42
|
+
objectiveId: z.string(),
|
|
43
|
+
systemId: z.string(),
|
|
44
|
+
target: z.number(),
|
|
45
|
+
budgetRemainingPercent: z.number(),
|
|
46
|
+
currentStreak: z.number().int().nonnegative(),
|
|
47
|
+
bestStreak: z.number().int().nonnegative(),
|
|
48
|
+
});
|
|
49
|
+
|
|
50
|
+
export type SloEntityState = z.infer<typeof SloEntityStateSchema>;
|
|
51
|
+
|
|
52
|
+
/**
|
|
53
|
+
* SLO change → trigger events. Intentionally empty: the threshold/streak
|
|
54
|
+
* hooks were removed (§9.2) and replaced by `numeric_state` / `state`
|
|
55
|
+
* conditions over the entity state, so a change fires no legacy event. The
|
|
56
|
+
* deriver is still registered so the kind is a known reactive kind (its
|
|
57
|
+
* state is resolvable into automation scope for those conditions + wakes
|
|
58
|
+
* suspended `wait_until`s whose condition reads `state.slo.*`).
|
|
59
|
+
*/
|
|
60
|
+
export const deriveSloTriggerEvents: EntityChangeDeriver = () => [];
|
|
61
|
+
|
|
62
|
+
/**
|
|
63
|
+
* Compute the reactive `slo` view for a single objective: read the objective
|
|
64
|
+
* config + streak, compute the error-budget remaining via the engine, and
|
|
65
|
+
* assemble the `{ objectiveId, systemId, target, budgetRemainingPercent,
|
|
66
|
+
* currentStreak, bestStreak }` subset. Returns `undefined` when the objective
|
|
67
|
+
* no longer exists (missing ids are omitted from the batched `read`).
|
|
68
|
+
*
|
|
69
|
+
* Compute-on-read (not materialized): the budget is a pure function of the
|
|
70
|
+
* objective's append-only downtime history over its rolling window. Storing a
|
|
71
|
+
* second copy would duplicate the engine's source of truth and risk drift; a
|
|
72
|
+
* read recomputes from the same tables the API already reads. See the change
|
|
73
|
+
* doc for the cost assessment.
|
|
74
|
+
*/
|
|
75
|
+
export async function computeSloEntityState(args: {
|
|
76
|
+
service: SloService;
|
|
77
|
+
engine: SloEngine;
|
|
78
|
+
objectiveId: string;
|
|
79
|
+
}): Promise<SloEntityState | undefined> {
|
|
80
|
+
const { service, engine, objectiveId } = args;
|
|
81
|
+
const objective = await service.getObjective({ id: objectiveId });
|
|
82
|
+
if (!objective) return undefined;
|
|
83
|
+
|
|
84
|
+
const [status, streak] = await Promise.all([
|
|
85
|
+
engine.computeStatus({ objective }),
|
|
86
|
+
service.getStreak({ objectiveId }),
|
|
87
|
+
]);
|
|
88
|
+
|
|
89
|
+
return {
|
|
90
|
+
objectiveId,
|
|
91
|
+
systemId: objective.systemId,
|
|
92
|
+
target: objective.target,
|
|
93
|
+
budgetRemainingPercent: status.errorBudgetRemainingPercent,
|
|
94
|
+
currentStreak: streak?.currentStreak ?? 0,
|
|
95
|
+
bestStreak: streak?.bestStreak ?? 0,
|
|
96
|
+
};
|
|
97
|
+
}
|
|
98
|
+
|
|
99
|
+
/**
|
|
100
|
+
* Build the PLUGIN-BACKED + COMPUTED `read` accessor for the `slo` entity.
|
|
101
|
+
* For each objective id, assembles the view via {@link computeSloEntityState}
|
|
102
|
+
* (missing objectives omitted). This is the single source of truth that
|
|
103
|
+
* `handle.mutate` snapshots `prev` from and `get`/`getMany`/scope enrichment
|
|
104
|
+
* route through — no framework `entity_state` storage.
|
|
105
|
+
*/
|
|
106
|
+
export function createSloEntityRead(deps: {
|
|
107
|
+
service: SloService;
|
|
108
|
+
engine: SloEngine;
|
|
109
|
+
}): EntityRead<SloEntityState> {
|
|
110
|
+
const { service, engine } = deps;
|
|
111
|
+
return async (ids) => {
|
|
112
|
+
if (ids.length === 0) return {};
|
|
113
|
+
const out: Record<string, SloEntityState> = {};
|
|
114
|
+
await Promise.all(
|
|
115
|
+
ids.map(async (objectiveId) => {
|
|
116
|
+
const state = await computeSloEntityState({
|
|
117
|
+
service,
|
|
118
|
+
engine,
|
|
119
|
+
objectiveId,
|
|
120
|
+
});
|
|
121
|
+
if (state) out[objectiveId] = state;
|
|
122
|
+
}),
|
|
123
|
+
);
|
|
124
|
+
return out;
|
|
125
|
+
};
|
|
126
|
+
}
|
|
127
|
+
|
|
128
|
+
/**
|
|
129
|
+
* Drive the streak-persist write through `handle.mutate` (§10.7). `apply`
|
|
130
|
+
* performs the REAL `slo_streaks` write (the plugin's own db/tx) and returns
|
|
131
|
+
* the freshly-computed `slo` view (budget recomputed + post-write streak).
|
|
132
|
+
* The framework snapshots `prev` via `read` BEFORE the write, appends the
|
|
133
|
+
* transition log, and emits `ENTITY_CHANGED`. No-op (no emit) when the
|
|
134
|
+
* recomputed view is structurally equal to `prev`.
|
|
135
|
+
*
|
|
136
|
+
* When no handle is available (tests / before wiring), the write still runs
|
|
137
|
+
* — the entity reactivity is layered on top, never required for the streak
|
|
138
|
+
* write to succeed. Errors from the entity layer are routed to `onError` so a
|
|
139
|
+
* mirror/transition failure never breaks the daily job.
|
|
140
|
+
*/
|
|
141
|
+
export async function writeSloEntity(args: {
|
|
142
|
+
handle: EntityHandle<SloEntityState> | undefined;
|
|
143
|
+
objectiveId: string;
|
|
144
|
+
opts?: EntityMutationOpts;
|
|
145
|
+
apply: () => Promise<SloEntityState>;
|
|
146
|
+
onError?: (error: unknown) => void;
|
|
147
|
+
}): Promise<void> {
|
|
148
|
+
const { handle, objectiveId, opts, apply, onError } = args;
|
|
149
|
+
if (!handle) {
|
|
150
|
+
await apply();
|
|
151
|
+
return;
|
|
152
|
+
}
|
|
153
|
+
// A wired handle routes through the shared guard; the daily-job caller wants
|
|
154
|
+
// an entity-layer (mirror/transition) failure to be fail-soft so it never
|
|
155
|
+
// breaks the streak persist, so errors are routed to `onError` rather than
|
|
156
|
+
// rethrown (the bespoke SLO behavior the shared guard does not encode).
|
|
157
|
+
try {
|
|
158
|
+
await withEntityWrite({ handle, id: objectiveId, opts, apply });
|
|
159
|
+
} catch (error) {
|
|
160
|
+
onError?.(error);
|
|
161
|
+
}
|
|
162
|
+
}
|
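Note on the no-op rule in `writeSloEntity`'s doc comment ("no emit when the recomputed view is structurally equal to `prev`"): the comparison itself lives in the framework and is not shown in this diff. Because every field of the `slo` state is a primitive, a field-by-field check is enough to illustrate it. The helper name `changedFields` and the treatment of a missing `prev` as "every field changed" are assumptions for this sketch.

```typescript
// Mirrors the fields of SloEntityStateSchema from the diff above.
interface SloEntityState {
  objectiveId: string;
  systemId: string;
  target: number;
  budgetRemainingPercent: number;
  currentStreak: number;
  bestStreak: number;
}

// Returns the fields whose values differ between prev and next.
// An empty result means the mutation would be a no-op (nothing to emit).
function changedFields(
  prev: SloEntityState | null,
  next: SloEntityState,
): Array<keyof SloEntityState> {
  const keys = Object.keys(next) as Array<keyof SloEntityState>;
  if (prev === null) return keys; // assumed: first observation counts as a change
  return keys.filter((key) => prev[key] !== next[key]);
}
```

Since `budgetRemainingPercent` is recomputed from downtime over a rolling window, it may shift slightly from one daily run to the next even when the streak is unchanged, in which case a strict comparison like this one would still report a change.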
package/src/streak-calculator.ts
CHANGED
|
@@ -2,6 +2,12 @@ import type { SloService } from "./service";
|
|
|
2
2
|
import type { SloEngine } from "./slo-engine";
|
|
3
3
|
import type { Logger } from "@checkstack/backend-api";
|
|
4
4
|
import type { QueueManager } from "@checkstack/queue-api";
|
|
5
|
+
import type { EntityHandle } from "@checkstack/automation-backend";
|
|
6
|
+
import {
|
|
7
|
+
computeSloEntityState,
|
|
8
|
+
writeSloEntity,
|
|
9
|
+
type SloEntityState,
|
|
10
|
+
} from "./slo-entity";
|
|
5
11
|
|
|
6
12
|
const SNAPSHOT_QUEUE = "slo-daily-snapshots";
|
|
7
13
|
const SNAPSHOT_JOB_ID = "slo-daily-snapshot-run";
|
|
@@ -12,6 +18,8 @@ interface StreakCalculatorDeps {
|
|
|
12
18
|
engine: SloEngine;
|
|
13
19
|
logger: Logger;
|
|
14
20
|
queueManager: QueueManager;
|
|
21
|
+
/** Resolver for the reactive `slo` entity (§10.7). Undefined in tests. */
|
|
22
|
+
getSloEntity?: () => EntityHandle<SloEntityState> | undefined;
|
|
15
23
|
}
|
|
16
24
|
|
|
17
25
|
/**
|
|
@@ -20,7 +28,7 @@ interface StreakCalculatorDeps {
|
|
|
20
28
|
* and updating streak counters for all active objectives.
|
|
21
29
|
*/
|
|
22
30
|
export async function setupDailySnapshotJob(deps: StreakCalculatorDeps) {
|
|
23
|
-
const { queueManager, logger, service, engine } = deps;
|
|
31
|
+
const { queueManager, logger, service, engine, getSloEntity } = deps;
|
|
24
32
|
|
|
25
33
|
const queue = queueManager.getQueue<{ trigger: "scheduled" }>(SNAPSHOT_QUEUE);
|
|
26
34
|
|
|
@@ -28,7 +36,7 @@ export async function setupDailySnapshotJob(deps: StreakCalculatorDeps) {
|
|
|
28
36
|
await queue.consume(
|
|
29
37
|
async () => {
|
|
30
38
|
logger.info("Starting daily SLO snapshot job");
|
|
31
|
-
await runDailySnapshotJob({ service, engine, logger });
|
|
39
|
+
await runDailySnapshotJob({ service, engine, logger, getSloEntity });
|
|
32
40
|
logger.info("Completed daily SLO snapshot job");
|
|
33
41
|
},
|
|
34
42
|
{ consumerGroup: WORKER_GROUP, maxRetries: 0 },
|
|
@@ -57,8 +65,9 @@ export async function runDailySnapshotJob(deps: {
|
|
|
57
65
|
service: SloService;
|
|
58
66
|
engine: SloEngine;
|
|
59
67
|
logger: Logger;
|
|
68
|
+
getSloEntity?: () => EntityHandle<SloEntityState> | undefined;
|
|
60
69
|
}) {
|
|
61
|
-
const { service, engine, logger } = deps;
|
|
70
|
+
const { service, engine, logger, getSloEntity } = deps;
|
|
62
71
|
|
|
63
72
|
const objectives = await service.listObjectives();
|
|
64
73
|
const today = new Date();
|
|
@@ -79,24 +88,64 @@ export async function runDailySnapshotJob(deps: {
|
|
|
79
88
|
availabilityPercent: status.currentAvailability ?? 100,
|
|
80
89
|
budgetConsumedMinutes: status.errorBudgetConsumedMinutes,
|
|
81
90
|
budgetRemainingPercent: status.errorBudgetRemainingPercent,
|
|
82
|
-
|
|
91
|
+
|
|
83
92
|
burnRate: status.burnRate ?? null,
|
|
84
93
|
streakDays: streak?.currentStreak ?? 0,
|
|
85
94
|
},
|
|
86
95
|
});
|
|
87
96
|
|
|
88
|
-
// 2. Update streak
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
|
|
97
|
+
// 2. Update streak (if currently meeting target, increment; else reset)
|
|
98
|
+
// AND surface the recomputed `slo` entity, driven through
|
|
99
|
+
// `handle.mutate` (§10.7). The REAL `slo_streaks` write runs INSIDE
|
|
100
|
+
// `apply` (the plugin's own write) so the framework snapshots `prev`
|
|
101
|
+
// via the COMPUTED `read` BEFORE the streak flips, then emits
|
|
102
|
+
// `ENTITY_CHANGED`. Operators author budget/streak thresholds as
|
|
103
|
+
// `numeric_state` conditions over this state (§9.2). The change is a
|
|
104
|
+
// no-op (no emit) when neither budget nor streak moved.
|
|
105
|
+
await writeSloEntity({
|
|
106
|
+
handle: getSloEntity?.(),
|
|
107
|
+
objectiveId: objective.id,
|
|
108
|
+
apply: async () => {
|
|
109
|
+
if (!status.isBreaching && !status.hasOpenDowntime) {
|
|
110
|
+
await service.incrementStreak({ objectiveId: objective.id });
|
|
111
|
+
} else if (status.isBreaching) {
|
|
112
|
+
const currentStreak = streak?.currentStreak ?? 0;
|
|
113
|
+
if (currentStreak > 0) {
|
|
114
|
+
await service.resetStreak({ objectiveId: objective.id });
|
|
115
|
+
logger.info(
|
|
116
|
+
`SLO ${objective.id}: Streak broken at ${currentStreak} days`,
|
|
117
|
+
);
|
|
118
|
+
}
|
|
119
|
+
}
|
|
120
|
+
// Re-assemble the computed view from the POST-write tables so the
|
|
121
|
+
// emitted `next` reflects the updated streak + recomputed budget.
|
|
122
|
+
const next = await computeSloEntityState({
|
|
123
|
+
service,
|
|
124
|
+
engine,
|
|
125
|
+
objectiveId: objective.id,
|
|
126
|
+
});
|
|
127
|
+
if (next) return next;
|
|
128
|
+
// The objective vanished mid-cycle (raced delete). Fall back to a
|
|
129
|
+
// view from the in-hand objective + post-write streak so `apply`
|
|
130
|
+
// still returns a valid state and the mutate is a no-op.
|
|
131
|
+
const freshStreak = await service.getStreak({
|
|
132
|
+
objectiveId: objective.id,
|
|
133
|
+
});
|
|
134
|
+
return {
|
|
135
|
+
objectiveId: objective.id,
|
|
136
|
+
systemId: objective.systemId,
|
|
137
|
+
target: objective.target,
|
|
138
|
+
budgetRemainingPercent: status.errorBudgetRemainingPercent,
|
|
139
|
+
currentStreak: freshStreak?.currentStreak ?? 0,
|
|
140
|
+
bestStreak: freshStreak?.bestStreak ?? 0,
|
|
141
|
+
};
|
|
142
|
+
},
|
|
143
|
+
onError: (error) =>
|
|
144
|
+
logger.warn(
|
|
145
|
+
`Failed to surface slo entity for objective ${objective.id}`,
|
|
146
|
+
{ error },
|
|
147
|
+
),
|
|
148
|
+
});
|
|
100
149
|
} catch (error) {
|
|
101
150
|
logger.error(
|
|
102
151
|
`Failed to process daily snapshot for objective ${objective.id}`,
|
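Note on the streak branch inside `apply`: the added code has three outcomes, one of which is implicit (an objective that is not breaching but has open downtime is left untouched, as is a breaching objective whose streak is already 0). Pulling the decision into a pure function makes the three cases explicit. `decideStreakAction` and `StreakAction` are hypothetical names; only the `isBreaching` / `hasOpenDowntime` flags and the `currentStreak > 0` guard come from the diff.

```typescript
type StreakAction = "increment" | "reset" | "hold";

interface StatusFlags {
  isBreaching: boolean;
  hasOpenDowntime: boolean;
}

// Same branching as the `apply` callback above, without the side effects.
function decideStreakAction(
  status: StatusFlags,
  currentStreak: number,
): StreakAction {
  // Meeting target with no ongoing downtime: the day counts toward the streak.
  if (!status.isBreaching && !status.hasOpenDowntime) return "increment";
  // Breaching with a live streak: the streak is broken.
  if (status.isBreaching && currentStreak > 0) return "reset";
  // Open downtime without a breach, or breaching at streak 0: no write.
  return "hold";
}
```

A caller would then map `"increment"` to `service.incrementStreak`, `"reset"` to `service.resetStreak`, and do nothing for `"hold"`.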
package/tsconfig.json
CHANGED
|
@@ -4,6 +4,9 @@
|
|
|
4
4
|
"src"
|
|
5
5
|
],
|
|
6
6
|
"references": [
|
|
7
|
+
{
|
|
8
|
+
"path": "../automation-backend"
|
|
9
|
+
},
|
|
7
10
|
{
|
|
8
11
|
"path": "../backend-api"
|
|
9
12
|
},
|
|
@@ -43,12 +46,6 @@
|
|
|
43
46
|
{
|
|
44
47
|
"path": "../healthcheck-common"
|
|
45
48
|
},
|
|
46
|
-
{
|
|
47
|
-
"path": "../integration-backend"
|
|
48
|
-
},
|
|
49
|
-
{
|
|
50
|
-
"path": "../integration-common"
|
|
51
|
-
},
|
|
52
49
|
{
|
|
53
50
|
"path": "../queue-api"
|
|
54
51
|
},
|