@luanpdd/kit-mcp 1.18.0 → 1.19.0

@@ -1,47 +1,49 @@
1
1
  ---
2
2
  name: burn-rate-status
3
- description: Burn-rate table per SLO with % budget spent, ETA to exhaustion, and action (page/ticket/warn/ok). Runnable manually or in /loop. Applies the burn-rate-alerting skill.
4
- argument-hint: "[<slo_name>] [--lookahead 4h] [--baseline 1h]"
3
+ description: Burn-rate table per SLO, consuming .planning/slos/*.yml + .planning/metrics/snapshots/. Computes current SLI, burn rate (% budget spent/h), ETA to exhaustion, and action (PAGE/TICKET/WARN/OK). Applies the burn-rate-alerting skill.
4
+ argument-hint: "[<slo_name>] [--lookahead 4h] [--baseline 1h] [--format table|json]"
5
5
  allowed-tools:
6
6
  - Read
7
7
  - Bash
8
- - Task
9
8
  - Glob
10
9
  ---
11
10
 
12
11
  <objective>
13
- Burn-rate snapshot for one SLO (if specified) or ALL defined SLOs. Applies the [`burn-rate-alerting`](../skills/burn-rate-alerting/SKILL.md) skill: formula `burn_rate = error_rate / (1 - target)`, lookahead ≤ 4× baseline.
12
+ Burn-rate snapshot for one SLO (if specified) or ALL SLOs defined in `.planning/slos/*.yml`. Applies the [`burn-rate-alerting`](../skills/burn-rate-alerting/SKILL.md) skill: formula `burn_rate = error_rate / (1 - target)`, lookahead ≤ 4× baseline.
13
+
14
+ **Reads:** `.planning/slos/*.yml` (definitions) + `.planning/metrics/snapshots/*.json` (events persisted via `metrics.persistSnapshot()`, Phase 99).
14
15
 
15
16
  **Creates/Updates:** nothing; this command is read-only.
16
17
 
17
- **After:** the user sees a table with status (PAGE / TICKET / WARN / OK) and can choose to invoke `/investigar-producao` if there is an active burn.
18
+ **After:** the user sees a table with status (PAGE / TICKET / WARN / OK) and can choose to invoke `/investigar-producao` if there is an active burn, or generate more load and re-snapshot via the `metrics-snapshot` MCP tool if the window is empty.
18
19
  </objective>
19
20
 
20
21
  <context>
21
22
  **Arguments:** `$ARGUMENTS`: optional `<slo_name>` for a single SLO; no args = all SLOs.
22
23
 
23
24
  **Flags:**
24
- - `--lookahead <duration>`: predictive window (default: `4h` for short-term)
25
+ - `--lookahead <duration>`: predictive window (default: `4h` for short-term page-tier)
25
26
  - `--baseline <duration>`: base window (default: `1h`)
26
27
  - `--format <table|json>` — output format (default: `table`)
27
28
 
28
29
  **Canonical combinations:**
29
- - short-term: lookahead 4h, baseline 1h (page-tier)
30
- - long-term: lookahead 3d, baseline 18h (ticket-tier)
31
-
32
- **Loop pattern:** run this command via the `loop` skill at a 5min interval for continuous monitoring.
33
-
34
- ```text
35
- /loop 5m /burn-rate-status
36
- ```
30
+ - short-term (page-tier): lookahead 4h, baseline 1h
31
+ - long-term (ticket-tier): lookahead 3d, baseline 18h
32
+
33
+ **Phase 99 wiring:** this command consumes data persisted by the
34
+ `persistSnapshot()` API added to `src/core/metrics.js` (Phase 99). With no
35
+ snapshots in the window, the command emits "no_data" for the SLO instead of
36
+ making up numbers. To generate data, invoke the `metrics-snapshot` MCP tool
37
+ during normal use; a future phase may auto-persist.
37
38
  </context>
38
39
 
39
40
  <process>
40
41
 
41
42
  ## 1. Parse arguments
42
43
 
44
+ Bash:
43
45
  ```bash
44
- SLO_NAME=$(echo "$ARGUMENTS" | awk '{print $1}' | grep -v '^--' || true)
46
+ SLO_NAME=$(echo "$ARGUMENTS" | awk '{for(i=1;i<=NF;i++) if($i !~ /^--/) {print $i; exit}}')
45
47
  LOOKAHEAD=$(echo "$ARGUMENTS" | grep -oE -- '--lookahead [^ ]+' | awk '{print $2}')
46
48
  BASELINE=$(echo "$ARGUMENTS" | grep -oE -- '--baseline [^ ]+' | awk '{print $2}')
47
49
  FORMAT=$(echo "$ARGUMENTS" | grep -oE -- '--format [^ ]+' | awk '{print $2}')
@@ -51,90 +53,240 @@ FORMAT=$(echo "$ARGUMENTS" | grep -oE -- '--format [^ ]+' | awk '{print $2}')
51
53
  [ -z "$FORMAT" ] && FORMAT="table"
52
54
  ```
53
55
 
54
- ## 2. List SLOs
56
+ Convert duration to ms (helper):
57
+ ```bash
58
+ to_ms() {
59
+ local d="$1"
60
+ case "$d" in
61
+ *h) echo $(( ${d%h} * 3600000 ));;
62
+ *m) echo $(( ${d%m} * 60000 ));;
63
+ *s) echo $(( ${d%s} * 1000 ));;
64
+ *d) echo $(( ${d%d} * 86400000 ));;
65
+ *) echo 0 ;;
66
+ esac
67
+ }
68
+ LOOKAHEAD_MS=$(to_ms "$LOOKAHEAD")
69
+ BASELINE_MS=$(to_ms "$BASELINE")
70
+ ```
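The duration conversion above can also be sketched in JavaScript, which may be convenient if the inline node scripts later take over argument handling. This is a minimal sketch under assumptions: `toMs` and `UNIT_MS` are hypothetical names, not part of the package, and unlike the shell helper this version rejects inputs whose prefix is not a plain integer.

```javascript
// Hypothetical helper mirroring the shell to_ms function; not part of kit-mcp.
const UNIT_MS = { s: 1000, m: 60000, h: 3600000, d: 86400000 };

function toMs(duration) {
  // Accept only "<integer><unit>" with unit in s, m, h, d.
  const match = /^(\d+)([smhd])$/.exec(String(duration));
  if (!match) return 0; // same fallback value as the shell helper's default branch
  return Number(match[1]) * UNIT_MS[match[2]];
}

console.log(toMs('4h'), toMs('3d'), toMs('bogus'));
```

Returning 0 for unparseable input keeps the shell behavior, but a caller would likely want to treat a 0 ms window as an error rather than query an empty window.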
71
+
72
+ ## 2. List SLOs (FIX Phase 99: extension `.yml`, not `.md`)
55
73
 
56
74
  ```bash
57
75
  if [ -n "$SLO_NAME" ]; then
58
- SLO_FILES=(".planning/slos/${SLO_NAME}.md")
76
+ SLO_FILES=(".planning/slos/${SLO_NAME}.yml")
59
77
  else
60
- SLO_FILES=(.planning/slos/*.md)
78
+ SLO_FILES=(.planning/slos/*.yml)
61
79
  fi
62
80
 
63
- if [ ${#SLO_FILES[@]} -eq 0 ] || [ ! -f "${SLO_FILES[0]}" ]; then
64
- echo "Nenhum SLO definido. Rode /definir-slo <feature> primeiro."
81
+ # Drop nonexistent entries (in case the glob has no match).
82
+ EXISTING_SLOS=()
83
+ for f in "${SLO_FILES[@]}"; do
84
+ [ -f "$f" ] && EXISTING_SLOS+=("$f")
85
+ done
86
+
87
+ if [ ${#EXISTING_SLOS[@]} -eq 0 ]; then
88
+ echo "Nenhum SLO definido em .planning/slos/. Rode /definir-slo <feature> primeiro."
65
89
  exit 0
66
90
  fi
67
91
  ```
68
92
 
69
- ## 3. For each SLO, dispatch to `burn-rate-forecaster`
93
+ ## 3. For each SLO, load metadata + compute SLI
70
94
 
71
- For each `SLO_FILE`:
95
+ For each `SLO_FILE` in `EXISTING_SLOS`:
96
+
97
+ ### 3.1 Extract canonical fields from the YAML via regex
98
+
99
+ The project's SLOs follow a fixed schema (validated by `test/unit/slo-schema.test.js`). No `js-yaml`; use regexes over the known keys:
72
100
 
73
101
  ```bash
74
- SLO_NAME=$(basename "$SLO_FILE" .md)
75
- TARGET=$(grep -oE 'Target.*[0-9.]+' "$SLO_FILE" | head -1 | grep -oE '[0-9.]+')
102
+ SLO_NAME=$(grep -oE '^\s+name:\s*\S+' "$SLO_FILE" | head -1 | awk '{print $2}')
103
+ SERVICE=$(grep -oE '^\s+service:\s*\S+' "$SLO_FILE" | head -1 | awk '{print $2}')
104
+ SLO_TYPE=$(grep -oE '^\s+type:\s*\S+' "$SLO_FILE" | head -1 | awk '{print $2}')
105
+
106
+ # Availability SLO: target = ratio decimal (e.g. 0.995)
107
+ TARGET_RATIO=$(grep -oE '^target:\s*[0-9.]+' "$SLO_FILE" | awk '{print $2}')
108
+ # Latency SLO: target_ms + percentile
109
+ TARGET_MS=$(grep -oE '^target_ms:\s*[0-9]+' "$SLO_FILE" | awk '{print $2}')
110
+ PERCENTILE=$(grep -oE '^\s+percentile:\s*[0-9]+' "$SLO_FILE" | awk '{print $2}')
76
111
  ```
77
112
 
78
- ```text
79
- Task(
80
- subagent_type="burn-rate-forecaster",
81
- prompt="
82
- slo_name: ${SLO_NAME}
83
- target: ${TARGET}
84
- lookahead: ${LOOKAHEAD}
85
- baseline: ${BASELINE}
86
-
87
- Compute current burn rate + ETA + status (PAGE/TICKET/WARN/OK).
88
- Output in a format compatible with the master table.
89
- "
90
- )
113
+ ### 3.2 Load snapshots from the baseline window
114
+
115
+ Use the `loadSnapshots()` API added in Phase 99. Inline node script:
116
+
117
+ ```bash
118
+ SNAPS_JSON=$(node --input-type=module -e "
119
+ import { loadSnapshots } from './src/core/metrics.js';
120
+ const snaps = await loadSnapshots(process.cwd(), $BASELINE_MS);
121
+ console.log(JSON.stringify(snaps));
122
+ ")
123
+ SNAPSHOT_COUNT=$(echo "$SNAPS_JSON" | node -e "console.log(JSON.parse(require('fs').readFileSync(0,'utf8')).length)")
91
124
  ```
92
125
 
93
- ## 4. Aggregate results into a table
126
+ If `SNAPSHOT_COUNT < 2`, mark the SLO as `no_data`:
127
+ ```bash
128
+ if [ "$SNAPSHOT_COUNT" -lt 2 ]; then
129
+ echo "SLO $SLO_NAME: insufficient snapshots in baseline window ($BASELINE) — got $SNAPSHOT_COUNT, need ≥2"
130
+ echo "Generate data: invoke 'metrics-snapshot' MCP tool during normal use."
131
+ STATUS="no_data"
132
+ continue # skip to the next SLO
133
+ fi
134
+ ```
94
135
 
136
+ ### 3.3 Compute SLI per SLO type
137
+
138
+ **Availability (`type: event-based`):**
139
+
140
+ Inline node: first vs. last snapshot within the window. The counter delta gives the good/bad events in the window:
141
+
142
+ ```bash
143
+ SLI_RESULT=$(node --input-type=module -e "
144
+ import { loadSnapshots } from './src/core/metrics.js';
145
+ const snaps = await loadSnapshots(process.cwd(), $BASELINE_MS);
146
+ if (snaps.length < 2) { console.log(JSON.stringify({sli:null, error:'no_data'})); process.exit(0); }
147
+ const first = snaps[0];
148
+ const last = snaps[snaps.length - 1];
149
+ let goodFirst = 0, goodLast = 0, totalFirst = 0, totalLast = 0;
150
+ for (const [k,v] of Object.entries(first.counters)) {
151
+ if (k.endsWith(':ok')) goodFirst += v;
152
+ totalFirst += v;
153
+ }
154
+ for (const [k,v] of Object.entries(last.counters)) {
155
+ if (k.endsWith(':ok')) goodLast += v;
156
+ totalLast += v;
157
+ }
158
+ const good = goodLast - goodFirst;
159
+ const total = totalLast - totalFirst;
160
+ const sli = total > 0 ? good / total : null;
161
+ const errorRate = total > 0 ? (total - good) / total : 0;
162
+ console.log(JSON.stringify({sli, errorRate, good, total, totalFirst, totalLast}));
163
+ ")
95
164
  ```
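The first-versus-last delta above can be exercised in isolation, without touching the filesystem. The sketch below assumes counter keys of the form `tool:status`, with `:ok` marking good events as in the inline script; the `search:error` key and the sample snapshots are made up for illustration, and `tally` and `availabilitySli` are hypothetical names.

```javascript
// Sum counters, splitting "good" events (keys ending in ':ok') from the total.
function tally(counters) {
  let good = 0;
  let total = 0;
  for (const [key, value] of Object.entries(counters)) {
    if (key.endsWith(':ok')) good += value;
    total += value;
  }
  return { good, total };
}

// Counters are cumulative, so events inside the window = last minus first.
function availabilitySli(first, last) {
  const a = tally(first.counters);
  const b = tally(last.counters);
  const good = b.good - a.good;
  const total = b.total - a.total;
  if (total <= 0) return { sli: null, errorRate: 0, good, total };
  return { sli: good / total, errorRate: (total - good) / total, good, total };
}

// Made-up snapshots for illustration only.
const first = { ts: 1000, counters: { 'search:ok': 90, 'search:error': 10 } };
const last = { ts: 2000, counters: { 'search:ok': 180, 'search:error': 20 } };
console.log(availabilitySli(first, last));
```

One caveat of the delta approach: if the server restarts between the two snapshots, the in-memory counters restart from zero and the delta can go negative. The `total <= 0` guard reports that case as having no SLI instead of a misleading ratio.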
165
+
166
+ **Latency (`type: percentile`):**
167
+
168
+ For latency, use the p95 of the last snapshot in the window (cumulative; the FIFO histogram gives p95 over the latest 1000 samples). The fraction of samples above target_ms is the budget consumed, and SLI = 1 minus that fraction.
169
+
170
+ ```bash
171
+ SLI_RESULT=$(node --input-type=module -e "
172
+ import { loadSnapshots } from './src/core/metrics.js';
173
+ const snaps = await loadSnapshots(process.cwd(), $BASELINE_MS);
174
+ if (snaps.length < 1) { console.log(JSON.stringify({sli:null, error:'no_data'})); process.exit(0); }
175
+ const last = snaps[snaps.length - 1];
176
+ const target = $TARGET_MS;
177
+ let totalSamples = 0, slowSamples = 0;
178
+ for (const [tool, lat] of Object.entries(last.latency)) {
179
+ totalSamples += lat.count;
180
+ if (lat.p95 > target) slowSamples += Math.round(lat.count * 0.05); // approximation: p95 above target → ~5% slow
181
+ }
182
+ const sli = totalSamples > 0 ? 1 - (slowSamples / totalSamples) : null;
183
+ const errorRate = totalSamples > 0 ? slowSamples / totalSamples : 0;
184
+ console.log(JSON.stringify({sli, errorRate, totalSamples, slowSamples, p95Max: Math.max(0, ...Object.values(last.latency).map(l => l.p95 || 0))}));
185
+ ")
186
+ ```
187
+
188
+ ### 3.4 Compute burn rate + status
189
+
190
+ Apply the canonical formula from the `burn-rate-alerting` skill:
191
+
192
+ ```bash
193
+ BURN_STATUS=$(node --input-type=module -e "
194
+ const result = $SLI_RESULT;
195
+ if (result.error) { console.log(JSON.stringify({status:'no_data'})); process.exit(0); }
196
+ const target = ${TARGET_RATIO:-0} || (1 - 0.05); // latency SLOs leave TARGET_RATIO empty: fall back to 1 - ratio_above_target (5%)
197
+ const errorRate = result.errorRate;
198
+ const slack = 1 - target; // budget = (1 - target)
199
+ const burnRate = slack > 0 ? errorRate / slack : 0;
200
+
201
+ let status, action;
202
+ if (burnRate >= 14.4) {
203
+ status = 'PAGE';
204
+ action = 'Page on-call — invoke /investigar-producao';
205
+ } else if (burnRate >= 6.0) {
206
+ status = 'TICKET';
207
+ action = 'Open ticket — investigate before budget exhausted';
208
+ } else if (burnRate >= 1.0) {
209
+ status = 'WARN';
210
+ action = 'Monitor — burn rate sustained ≥1× exhausts budget in window';
211
+ } else {
212
+ status = 'OK';
213
+ action = '—';
214
+ }
215
+
216
+ // ETA exhaustion (predictive). For burn_rate=0 (no errors), ETA is ∞.
217
+ const baselineHours = $BASELINE_MS / 3600000;
218
+ const eta = burnRate > 0 ? (30 * 24) / burnRate : null; // hours until a full 30-day budget is exhausted at this burn rate
219
+ const etaStr = eta === null ? '—' : (eta < 24 ? eta.toFixed(1) + 'h' : (eta/24).toFixed(1) + 'd');
220
+
221
+ console.log(JSON.stringify({status, action, burnRate: burnRate.toFixed(2), errorRate: (errorRate*100).toFixed(2), eta: etaStr}));
222
+ ")
223
+ ```
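To make the thresholds concrete, here is a small worked sketch of the formula `burn_rate = error_rate / (1 - target)` with the 14.4 / 6.0 / 1.0 cut-offs used above. `classifyBurn` is a hypothetical name and the error rates are illustrative inputs, not measured data.

```javascript
// Classify a burn rate using the same thresholds as the inline script.
function classifyBurn(errorRate, target) {
  const budget = 1 - target; // allowed error fraction
  const burnRate = budget > 0 ? errorRate / budget : 0;
  let status = 'OK';
  if (burnRate >= 14.4) status = 'PAGE';
  else if (burnRate >= 6.0) status = 'TICKET';
  else if (burnRate >= 1.0) status = 'WARN';
  return { burnRate, status };
}

// Illustrative error rates against a 99.5% availability target.
for (const errorRate of [0.1, 0.04, 0.01, 0.001]) {
  const { burnRate, status } = classifyBurn(errorRate, 0.995);
  console.log(errorRate, burnRate.toFixed(2), status);
}
```

Because `1 - 0.995` is not exactly representable in floating point, an error rate that sits precisely on a threshold may land on either side of it. If that matters, rounding the burn rate before comparing is one option.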
224
+
225
+ ### 3.5 Accumulate the table row
226
+
227
+ ```bash
228
+ SLO_ROWS+=("| $SLO_NAME | ${TARGET_RATIO:-${TARGET_MS}ms p$PERCENTILE} | $BASELINE | ${ERROR_RATE}% | ${BURN_RATE}× | $ETA | **$STATUS** | $ACTION |")
229
+ ```
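The row template above reads `ERROR_RATE`, `BURN_RATE`, `ETA`, `STATUS` and `ACTION`, but the step that unpacks them from the `BURN_STATUS` JSON is not shown. One way to bridge that gap, sketched under assumptions, is to build the row directly from the JSON. `formatRow` is a hypothetical helper, and the `n/a` placeholders for `no_data` rows are a choice made here, not something the command specifies.

```javascript
// Build one markdown table row from the JSON string captured in BURN_STATUS.
function formatRow(sloName, targetLabel, baseline, burnJson) {
  const r = JSON.parse(burnJson);
  if (r.status === 'no_data') {
    // Assumed placeholder layout for SLOs without enough snapshots.
    return `| ${sloName} | ${targetLabel} | ${baseline} | n/a | n/a | n/a | **no_data** | n/a |`;
  }
  return `| ${sloName} | ${targetLabel} | ${baseline} | ${r.errorRate}% | ${r.burnRate}× | ${r.eta} | **${r.status}** | ${r.action} |`;
}

// Illustrative input shaped like the output of step 3.4.
const sample = JSON.stringify({
  status: 'TICKET',
  action: 'Open ticket',
  burnRate: '8.00',
  errorRate: '4.00',
  eta: '3.8d',
});
console.log(formatRow('login_success', '0.995', '1h', sample));
```

Emitting a row for `no_data` SLOs as well would let the later status counts see them, which the `continue` in step 3.2 otherwise prevents.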
230
+
231
+ ## 4. Render the master table
232
+
233
+ ```text
96
234
  ═══════════════════════════════════════════════════════════
97
235
  framework ► BURN-RATE-STATUS ▸ {timestamp}
236
+ baseline=$BASELINE lookahead=$LOOKAHEAD snapshots=$TOTAL_SNAPS
98
237
  ═══════════════════════════════════════════════════════════
99
238
 
100
- | SLO | Target | Window | Budget spent | Burn rate | ETA to exhaustion | Status | Action |
239
+ | SLO | Target | Window | Error rate | Burn rate | ETA to exhaustion | Status | Action |
101
240
  |---|---|---|---|---|---|---|---|
102
- | checkout_success | 99.9% | 30d | 23% | 1.4× | 12d | OK | informational |
103
- | login_success | 99.95% | 30d | 78% | 8.0× | 4h | **PAGE** | invoke /investigar-producao |
104
- | search_latency | 99% | 30d | 15% | 0.7× | — | OK | — |
241
+ {$SLO_ROWS}
105
242
  ```
106
243
 
107
244
  ## 5. Suggest next actions
108
245
 
109
- If any SLO is in PAGE or TICKET status:
110
-
246
+ ```bash
247
+ # Count rows per status
248
+ PAGE_COUNT=$(printf '%s\n' "${SLO_ROWS[@]}" | grep -c "PAGE" || true)
249
+ TICKET_COUNT=$(printf '%s\n' "${SLO_ROWS[@]}" | grep -c "TICKET" || true)
250
+ WARN_COUNT=$(printf '%s\n' "${SLO_ROWS[@]}" | grep -c "WARN" || true)
251
+ NO_DATA_COUNT=$(printf '%s\n' "${SLO_ROWS[@]}" | grep -c "no_data" || true)
111
252
  ```
112
- ## ⚠ SLOs in alert:
113
- 1. login_success — burn rate 8.0×, ETA 4h
114
- → /investigar-producao "login_success burn rate = 8.0× at {timestamp}"
115
253
 
116
- ## SLOs in WARN (>= 80% spent):
117
- - (none)
254
+ Output:
255
+ ```text
256
+ ## Next actions
257
+
258
+ {If PAGE_COUNT > 0:}
259
+ ⚠ {PAGE_COUNT} SLO(s) in PAGE: invoke /investigar-producao "<slo_name> burn rate {burn_rate}×"
260
+
261
+ {If TICKET_COUNT > 0:}
262
+ ☐ {TICKET_COUNT} SLO(s) in TICKET: open an issue, investigate before the budget is exhausted
263
+
264
+ {If WARN_COUNT > 0:}
265
+ ⓘ {WARN_COUNT} SLO(s) in WARN: a sustained burn rate ≥1× exhausts the budget
118
266
 
119
- ## SLOs OK:
120
- - 2 SLOs in healthy compliance
267
+ {If NO_DATA_COUNT > 0:}
268
+ ⊘ {NO_DATA_COUNT} SLO(s) with no data in the window: invoke the 'metrics-snapshot' MCP tool periodically to populate .planning/metrics/snapshots/
121
269
  ```
122
270
 
123
- ## 6. `/loop` mode
271
+ ## 6. `/loop` mode (idempotency)
124
272
 
125
273
  When called inside `/loop`, behavior is idempotent:
126
- - Do not accumulate state between invocations (fresh snapshot)
127
- - Short output if nothing changed (status only; do not repeat the full table on every loop)
128
- - Trigger AskUserQuestion ONLY when status changes from OK → WARN/TICKET/PAGE (transition)
274
+ - Fresh snapshot on every invocation (do not accumulate state).
275
+ - Short output if the status has not changed (summary line only; do not repeat the full table).
276
+ - Trigger AskUserQuestion ONLY when some SLO transitions OK → WARN/TICKET/PAGE.
129
277
 
130
278
  </process>
131
279
 
132
280
  <success_criteria>
133
- - [ ] $ARGUMENTS parsed (optional SLO + flags)
134
- - [ ] SLOs discovered via glob `.planning/slos/*.md`
135
- - [ ] `burn-rate-forecaster` invoked for each SLO
136
- - [ ] Table aggregated in a consistent format
137
- - [ ] Status enum: PAGE / TICKET / WARN / OK
281
+ - [ ] $ARGUMENTS parsed (optional SLO + flags --lookahead/--baseline/--format)
282
+ - [ ] SLOs discovered via glob `.planning/slos/*.yml` (FIX Phase 99: extension `.yml`, not `.md`)
283
+ - [ ] Snapshots loaded via `loadSnapshots()` (Phase 99 `src/core/metrics.js`)
284
+ - [ ] SLI computed per type (event-based ratio for availability, percentile for latency)
285
+ - [ ] Burn rate computed with the formula `error_rate / (1 - target)` (skill burn-rate-alerting)
286
+ - [ ] Status enum: PAGE / TICKET / WARN / OK / no_data
287
+ - [ ] ETA to exhaustion computed (predictive forecast)
288
+ - [ ] Aggregated markdown table
138
289
  - [ ] Suggested next actions for SLOs in alert
139
- - [ ] Idempotent (can run in /loop without accumulation)
290
+ - [ ] Idempotent in /loop (no state accumulation)
291
+ - [ ] no_data handled gracefully: makes up no numbers, suggests `metrics-snapshot`
140
292
  </success_criteria>
@@ -1,6 +1,6 @@
1
1
  {
2
- "version": "1.18.0",
3
- "timestamp": "2026-05-09T17:01:35.745Z",
2
+ "version": "1.19.0",
3
+ "timestamp": "2026-05-09T17:53:53.960Z",
4
4
  "files": {
5
5
  "COMANDOS.md": "d24ec61a6ec35db314cc5f2ae287bfb927b794789c8f1d558c55862f5e6534b2",
6
6
  "COMPATIBILITY.md": "794e336a87045cdf0161785b9a7a0975a49abbd80bdd816b8852251fcc8126ca",
@@ -68,7 +68,7 @@
68
68
  "commands/auditar-uat.md": "83e9f21584938350ee96ef0f0bb870786537bf38220c7a8ec0e04d06659c6bda",
69
69
  "commands/autonomo.md": "ae5746a8b9cd63d9ac8cf2774b8b466789ccefec3d9e267dcb98d97481ede57f",
70
70
  "commands/branch-pr.md": "77866ec7ef8d65ad6cea9d17491b7c7605238b3a3505dd3e128f18cd150c9be4",
71
- "commands/burn-rate-status.md": "d5316301d4ac576bf57dbd24a56b1c2063610819520af3f75026499c6525c438",
71
+ "commands/burn-rate-status.md": "040fcc64b00bf5bcb9b69b7d3c1ef729647ddf0060d232fe061ea70b242747cf",
72
72
  "commands/capturar-payloads.md": "507d009d9fb28fe12d18c3d3a599fbb23605254564e5753b056e0f32fb92f20b",
73
73
  "commands/caracterizar-prompt.md": "996b923d6c807d94be77d14dbfec3fdabf98d3bf111f6928932421b724847fb3",
74
74
  "commands/caracterizar.md": "994ce4136ba44b74890874f3274c26bcdc9f4feb5f4852cb0288687142ab1403",
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@luanpdd/kit-mcp",
3
- "version": "1.18.0",
3
+ "version": "1.19.0",
4
4
  "description": "Generic infrastructure to ship YOUR personal kit of agents/commands/skills as an MCP server, with cross-IDE sync (Claude Code, Cursor, Codex, Gemini, Windsurf, Antigravity, Copilot, Trae).",
5
5
  "type": "module",
6
6
  "bin": {
@@ -1,4 +1,5 @@
1
1
  // OBS-18-01 / OBS-18-02 — in-memory golden signals for kit-mcp server.
2
+ // OBS-19-01 / OBS-19-02 / OBS-19-03 — disk-persistent rolling snapshots.
2
3
  //
3
4
  // Phase 94: Eat Your Own Dog Food. The skill `four-golden-signals` says any
4
5
  // user-facing service worth its salt instruments Latency + Traffic + Errors
@@ -7,31 +8,48 @@
7
8
  // operator wants when something feels off.
8
9
  //
9
10
  // Scope decisions (see .planning/phases/94-golden-signals-mcp-server/94-CONTEXT.md):
10
- // - Zero dependencies. Map + array stdlib only — preserves the 6-deps budget
11
- // that Phase 92.01 fought to maintain and that Phase 93.01 enforces in CI.
12
- // - In-memory only. No file persistence, no socket export, no OTel SDK.
13
- // kit-mcp is a developer tool launched on demand by an IDE; cross-process
14
- // telemetry pipelines are explicit non-goals (see <deferred> block in
15
- // 94-CONTEXT.md). A future phase can layer OTel on top of this API.
11
+ // - Zero new dependencies. Phase 99 adds fs/promises + path from stdlib only —
12
+ // the 6-deps budget Phase 92.01 fought to maintain and Phase 93.01 enforces
13
+ // in CI is preserved.
14
+ // - In-memory primary, on-demand persistence. The Map+array core stays
15
+ // in-memory; persistSnapshot writes a JSON file under .planning/metrics/
16
+ // snapshots/ when called. No background timer, no implicit writes; the
17
+ // /burn-rate-status command and metrics-snapshot tool are the writers.
16
18
  // - Bounded memory. Histograms cap at HISTOGRAM_CAP=1000 samples per tool
17
- // with FIFO drop. At cap, p50/p95/p99 over the latest 1000 samples is
18
- // more useful than an unbounded array that could grow for the lifetime
19
- // of a long-lived MCP session.
19
+ // with FIFO drop.
20
+ // - Bounded disk. cleanupOldSnapshots prunes files > 30 days old on every
21
+ // persistSnapshot call (rolling window, no separate retention job).
20
22
  // - Snapshot is read-only. Returns a fresh plain-object copy so callers
21
23
  // can JSON.stringify it without exposing internal Map references.
24
+ // - Persisted shape includes `ts` (epoch ms) inside the JSON. We do NOT
25
+ // parse the filename for windowing — filesystem-safe ISO encoding
26
+ // (`replace(/[:.]/g, '-')`) is one-way (cannot reliably round-trip back
27
+ // through Date.parse) and mtime is unreliable across copy/touch. The
28
+ // in-file ts is authoritative.
22
29
  //
23
- // API surface (4 exports):
30
+ // API surface (4 sync exports + 2 async):
24
31
  // incrementInvocation(tool, status) — counter++ keyed `${tool}:${status}`
25
32
  // recordLatency(tool, ms) — push to histogram, FIFO at cap
26
33
  // snapshot() — { counters, latency } plain object
27
34
  // reset() — clear both maps; called on boot if
28
35
  // KIT_MCP_METRICS_RESET=1
36
+ // persistSnapshot(rootDir) — write {ts, counters, latency} to
37
+ // .planning/metrics/snapshots/<ts>.json
38
+ // + cleanup files > 30d
39
+ // loadSnapshots(rootDir, windowMs) — read all snapshots whose in-file ts
40
+ // is within windowMs (default 30d),
41
+ // sorted ascending by ts
29
42
  //
30
43
  // Boot-time reset honors the env var by calling reset() at module load when
31
44
  // the flag is set. This keeps the signal "fresh" for a probe in tests or for
32
45
  // an operator who spawned the server with the flag for a clean comparison.
33
46
 
47
+ import fs from 'node:fs/promises';
48
+ import path from 'node:path';
49
+
34
50
  const HISTOGRAM_CAP = 1000;
51
+ const DEFAULT_RETENTION_MS = 30 * 86400 * 1000; // 30 days rolling.
52
+ const SNAPSHOT_DIR_REL = path.join('.planning', 'metrics', 'snapshots');
35
53
 
36
54
  const counters = new Map(); // key: `${tool}:${status}` → count (number)
37
55
  const histograms = new Map(); // key: tool → number[] (length ≤ HISTOGRAM_CAP)
@@ -130,6 +148,112 @@ export function reset() {
130
148
  histograms.clear();
131
149
  }
132
150
 
151
+ /**
152
+ * OBS-19-01 — Persist the current snapshot to disk under
153
+ * `<rootDir>/.planning/metrics/snapshots/<timestamp>.json`. Runs the rolling
154
+ * cleanup of files older than `retentionMs` (default 30d) on every call so
155
+ * callers don't need a separate retention job.
156
+ *
157
+ * The on-disk shape is `{ ts: <epoch_ms>, counters, latency }`. The `ts` field
158
+ * inside the JSON — NOT the filename — is the authoritative timestamp for
159
+ * loadSnapshots windowing. The filename uses an ISO encoding with `:` and `.`
160
+ * replaced by `-` for filesystem safety; that encoding is one-way (cannot
161
+ * round-trip back through Date.parse), so we never parse it for ordering.
162
+ *
163
+ * @param {string} [rootDir=process.cwd()] Project root. Snapshots land under
164
+ * `<rootDir>/.planning/metrics/snapshots/`.
165
+ * @param {object} [opts]
166
+ * @param {number} [opts.retentionMs] Override the rolling-window age in ms.
167
+ * Defaults to 30 days. Tests use shorter windows to drive the cleanup path.
168
+ * @returns {Promise<{file: string, snap: {ts: number, counters: object, latency: object}}>}
169
+ */
170
+ export async function persistSnapshot(rootDir = process.cwd(), opts = {}) {
171
+ const retentionMs = Number.isFinite(opts.retentionMs) ? opts.retentionMs : DEFAULT_RETENTION_MS;
172
+ const dir = path.join(rootDir, SNAPSHOT_DIR_REL);
173
+ await fs.mkdir(dir, { recursive: true });
174
+ const ts = Date.now();
175
+ const snap = { ts, ...snapshot() };
176
+ // Filesystem-safe ISO encoding — Windows forbids `:` in paths and `.` is
177
+ // ambiguous with extension separators on shells with brace expansion.
178
+ const isoSafe = new Date(ts).toISOString().replace(/[:.]/g, '-');
179
+ const file = path.join(dir, `${isoSafe}.json`);
180
+ await fs.writeFile(file, JSON.stringify(snap, null, 2));
181
+ await cleanupOldSnapshots(dir, retentionMs);
182
+ return { file, snap };
183
+ }
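The filename encoding used by `persistSnapshot` can be illustrated on its own. The sketch below applies the same `toISOString().replace(/[:.]/g, '-')` transformation to a fixed timestamp; `snapshotFileName` is a hypothetical name introduced here, not an export of the module.

```javascript
// Same transformation persistSnapshot applies to build a filesystem-safe name.
function snapshotFileName(ts) {
  return new Date(ts).toISOString().replace(/[:.]/g, '-') + '.json';
}

// Epoch 0 gives a fixed, easy-to-check example.
console.log(snapshotFileName(0)); // 1970-01-01T00-00-00-000Z.json
```

Since every `:` and `.` becomes `-`, the name no longer distinguishes date separators from time separators, which is why the module treats the in-file `ts` field as the authoritative timestamp.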
184
+
185
+ /**
186
+ * OBS-19-02 — Load all snapshots from disk whose in-file `ts` is within the
187
+ * sliding window. Returns the array sorted ascending by `ts` so consumers
188
+ * (`/burn-rate-status`) can compute first-vs-last deltas without re-sorting.
189
+ *
190
+ * Defensive against malformed JSON: a corrupt file is skipped silently rather
191
+ * than aborting the whole load. The 30d window is rolling from "now" — pass a
192
+ * smaller value to drive recent-only views (e.g. `60 * 60 * 1000` for last
193
+ * hour) when computing burn rate over a baseline window.
194
+ *
195
+ * @param {string} [rootDir=process.cwd()] Project root.
196
+ * @param {number} [windowMs] Sliding window in ms. Defaults to 30 days.
197
+ * @returns {Promise<Array<{ts: number, counters: object, latency: object}>>}
198
+ * Empty array if the snapshots directory does not exist.
199
+ */
200
+ export async function loadSnapshots(rootDir = process.cwd(), windowMs = DEFAULT_RETENTION_MS) {
201
+ const dir = path.join(rootDir, SNAPSHOT_DIR_REL);
202
+ const cutoff = Date.now() - windowMs;
203
+ let files;
204
+ try {
205
+ files = await fs.readdir(dir);
206
+ } catch {
207
+ return []; // Dir absent on first run — not an error.
208
+ }
209
+ const results = [];
210
+ for (const f of files) {
211
+ if (!f.endsWith('.json')) continue;
212
+ try {
213
+ const raw = await fs.readFile(path.join(dir, f), 'utf-8');
214
+ const parsed = JSON.parse(raw);
215
+ if (Number.isFinite(parsed?.ts) && parsed.ts >= cutoff) {
216
+ results.push(parsed);
217
+ }
218
+ } catch {
219
+ // Corrupt file — skip silently rather than break the whole burn-rate
220
+ // calculation. A future phase can surface counts via a doctor probe.
221
+ }
222
+ }
223
+ return results.sort((a, b) => a.ts - b.ts);
224
+ }
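The windowing and ordering that `loadSnapshots` performs can be tested without any files by extracting the filter-and-sort step. This is a sketch under assumptions: `selectWindow` is a hypothetical pure function, and the injectable `now` parameter is added here only to make the example deterministic.

```javascript
// Keep snapshots whose in-file ts is inside the window, sorted ascending by ts.
function selectWindow(snapshots, windowMs, now = Date.now()) {
  const cutoff = now - windowMs;
  return snapshots
    .filter((s) => Number.isFinite(s?.ts) && s.ts >= cutoff) // skip malformed entries
    .sort((a, b) => a.ts - b.ts); // filter() returned a copy, so the input is not mutated
}

// Illustrative data: one stale entry, one malformed entry, two in-window entries.
const all = [{ ts: 9000 }, { ts: 4000 }, { ts: 'x' }, { ts: 6000 }];
console.log(selectWindow(all, 5000, 10000).map((s) => s.ts)); // [ 6000, 9000 ]
```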
225
+
226
+ /**
227
+ * OBS-19-03 — Internal helper: delete snapshot files older than `maxAgeMs`.
228
+ * Called from persistSnapshot on every write so retention is implicit.
229
+ * Uses fs.stat().mtimeMs as the age proxy; we accept the small drift versus
230
+ * the in-file `ts` because cleanup is best-effort eviction, not authoritative
231
+ * windowing (loadSnapshots reads the in-file ts).
232
+ *
233
+ * @param {string} dir Absolute path to the snapshots directory.
234
+ * @param {number} maxAgeMs Files with mtime older than this are unlinked.
235
+ * @returns {Promise<void>}
236
+ */
237
+ async function cleanupOldSnapshots(dir, maxAgeMs) {
238
+ const cutoff = Date.now() - maxAgeMs;
239
+ let files;
240
+ try {
241
+ files = await fs.readdir(dir);
242
+ } catch {
243
+ return;
244
+ }
245
+ for (const f of files) {
246
+ if (!f.endsWith('.json')) continue;
247
+ const fp = path.join(dir, f);
248
+ try {
249
+ const stat = await fs.stat(fp);
250
+ if (stat.mtimeMs < cutoff) await fs.unlink(fp);
251
+ } catch {
252
+ // Unlink can race with concurrent cleanup; ignore ENOENT and friends.
253
+ }
254
+ }
255
+ }
256
+
133
257
  // Boot-time reset honors KIT_MCP_METRICS_RESET=1. We call reset() instead of
134
258
  // merely skipping init because the maps are already empty at module load —
135
259
  // the call is a no-op today but documents the contract for any future module
@@ -141,3 +265,4 @@ if (process.env.KIT_MCP_METRICS_RESET === '1') {
141
265
  // Exported for tests only — keeps the API surface explicit while letting unit
142
266
  // tests assert on the FIFO behavior at the boundary.
143
267
  export const __TEST_HISTOGRAM_CAP = HISTOGRAM_CAP;
268
+ export const __TEST_SNAPSHOT_DIR_REL = SNAPSHOT_DIR_REL;