@luanpdd/kit-mcp 1.17.0 → 1.19.0
- package/kit/commands/burn-rate-status.md +213 -61
- package/kit/file-manifest.json +3 -3
- package/package.json +1 -1
- package/src/core/metrics.js +268 -0
- package/src/mcp-server/index.js +51 -12
package/kit/commands/burn-rate-status.md
CHANGED
|
@@ -1,47 +1,49 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: burn-rate-status
|
|
3
|
-
description: Burn-rate table per SLO
|
|
4
|
-
argument-hint: "[<slo_name>] [--lookahead 4h] [--baseline 1h]"
|
|
3
|
+
description: Burn-rate table per SLO, built from .planning/slos/*.yml + .planning/metrics/snapshots/. Computes current SLI, burn rate (% of budget spent per hour), ETA to exhaustion, and action (PAGE/TICKET/WARN/OK). Applies the burn-rate-alerting skill.
|
|
4
|
+
argument-hint: "[<slo_name>] [--lookahead 4h] [--baseline 1h] [--format table|json]"
|
|
5
5
|
allowed-tools:
|
|
6
6
|
- Read
|
|
7
7
|
- Bash
|
|
8
|
-
- Task
|
|
9
8
|
- Glob
|
|
10
9
|
---
|
|
11
10
|
|
|
12
11
|
<objective>
|
|
13
|
-
Snapshot de burn rate para 1 SLO (se especificado) ou TODOS os SLOs definidos
|
|
12
|
+
Burn-rate snapshot for one SLO (if specified) or ALL SLOs defined in `.planning/slos/*.yml`. Applies the [`burn-rate-alerting`](../skills/burn-rate-alerting/SKILL.md) skill: formula `burn_rate = error_rate / (1 - target)`, lookahead ≤ 4× baseline.
|
|
13
|
+
|
|
14
|
+
**Reads:** `.planning/slos/*.yml` (definitions) + `.planning/metrics/snapshots/*.json` (events persisted via `metrics.persistSnapshot()`, Phase 99).
|
|
14
15
|
|
|
15
16
|
**Creates/Updates:** nothing; the command is read-only.
|
|
16
17
|
|
|
17
|
-
**After:** the user sees a table with status (PAGE / TICKET / WARN / OK) and can choose to invoke `/investigar-producao` if there is an active burn.
|
|
18
|
+
**After:** the user sees a table with status (PAGE / TICKET / WARN / OK) and can choose to invoke `/investigar-producao` if there is an active burn, or run more load and re-snapshot via the `metrics-snapshot` MCP tool if the window is empty.
|
|
18
19
|
</objective>
|
|
19
20
|
|
|
20
21
|
<context>
|
|
21
22
|
**Arguments:** `$ARGUMENTS`: optional `<slo_name>` for a single SLO; no args = all.
|
|
22
23
|
|
|
23
24
|
**Flags:**
|
|
24
|
-
- `--lookahead <duration>`: predictive window (default: `4h` for short-term)
|
|
25
|
+
- `--lookahead <duration>`: predictive window (default: `4h` for the short-term page tier)
|
|
25
26
|
- `--baseline <duration>`: baseline window (default: `1h`)
|
|
26
27
|
- `--format <table|json>` — output format (default: `table`)
|
|
27
28
|
|
|
28
29
|
**Canonical combinations:**
|
|
29
|
-
- short-term: lookahead 4h, baseline 1h
|
|
30
|
-
- long-term: lookahead 3d, baseline 18h
|
|
31
|
-
|
|
32
|
-
**
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
|
|
30
|
+
- short-term (page-tier): lookahead 4h, baseline 1h
|
|
31
|
+
- long-term (ticket-tier): lookahead 3d, baseline 18h
|
|
32
|
+
|
|
33
|
+
**Phase 99 wiring:** this command consumes data persisted by the
|
|
34
|
+
`persistSnapshot()` API added to `src/core/metrics.js` (Phase 99). With no
|
|
35
|
+
snapshots in the window, the command emits "no_data" for the SLO instead of
|
|
36
|
+
making up numbers. To generate data, invoke the `metrics-snapshot` MCP tool
|
|
37
|
+
during normal use; a future phase may auto-persist.
|
|
37
38
|
</context>
|
|
38
39
|
|
|
39
40
|
<process>
|
|
40
41
|
|
|
41
42
|
## 1. Parse arguments
|
|
42
43
|
|
|
44
|
+
Bash:
|
|
43
45
|
```bash
|
|
44
|
-
SLO_NAME=$(echo "$ARGUMENTS" | awk '{
|
|
46
|
+
SLO_NAME=$(echo "$ARGUMENTS" | awk '{for(i=1;i<=NF;i++){ if($i ~ /^--/){i++; continue} print $i; exit }}')  # skip each --flag together with its value
|
|
45
47
|
LOOKAHEAD=$(echo "$ARGUMENTS" | grep -oE -- '--lookahead [^ ]+' | awk '{print $2}')
|
|
46
48
|
BASELINE=$(echo "$ARGUMENTS" | grep -oE -- '--baseline [^ ]+' | awk '{print $2}')
|
|
47
49
|
FORMAT=$(echo "$ARGUMENTS" | grep -oE -- '--format [^ ]+' | awk '{print $2}')
|
|
@@ -51,90 +53,240 @@ FORMAT=$(echo "$ARGUMENTS" | grep -oE -- '--format [^ ]+' | awk '{print $2}')
|
|
|
51
53
|
[ -z "$FORMAT" ] && FORMAT="table"
|
|
52
54
|
```
|
|
53
55
|
|
|
54
|
-
|
|
56
|
+
Convert duration to ms (helper):
|
|
57
|
+
```bash
|
|
58
|
+
to_ms() {
|
|
59
|
+
local d="$1"
|
|
60
|
+
case "$d" in
|
|
61
|
+
*h) echo $(( ${d%h} * 3600000 ));;
|
|
62
|
+
*m) echo $(( ${d%m} * 60000 ));;
|
|
63
|
+
*s) echo $(( ${d%s} * 1000 ));;
|
|
64
|
+
*d) echo $(( ${d%d} * 86400000 ));;
|
|
65
|
+
*) echo 0 ;;
|
|
66
|
+
esac
|
|
67
|
+
}
|
|
68
|
+
LOOKAHEAD_MS=$(to_ms "$LOOKAHEAD")
|
|
69
|
+
BASELINE_MS=$(to_ms "$BASELINE")
|
|
70
|
+
```
|
|
71
|
+
|
|
72
|
+
## 2. List SLOs (FIX Phase 99: extension `.yml`, not `.md`)
|
|
55
73
|
|
|
56
74
|
```bash
|
|
57
75
|
if [ -n "$SLO_NAME" ]; then
|
|
58
|
-
SLO_FILES=(".planning/slos/${SLO_NAME}.
|
|
76
|
+
SLO_FILES=(".planning/slos/${SLO_NAME}.yml")
|
|
59
77
|
else
|
|
60
|
-
SLO_FILES=(.planning/slos/*.
|
|
78
|
+
SLO_FILES=(.planning/slos/*.yml)
|
|
61
79
|
fi
|
|
62
80
|
|
|
63
|
-
|
|
64
|
-
|
|
81
|
+
# Filter out nonexistent entries (in case the glob has no match).
|
|
82
|
+
EXISTING_SLOS=()
|
|
83
|
+
for f in "${SLO_FILES[@]}"; do
|
|
84
|
+
[ -f "$f" ] && EXISTING_SLOS+=("$f")
|
|
85
|
+
done
|
|
86
|
+
|
|
87
|
+
if [ ${#EXISTING_SLOS[@]} -eq 0 ]; then
|
|
88
|
+
echo "No SLO defined in .planning/slos/. Run /definir-slo <feature> first."
|
|
65
89
|
exit 0
|
|
66
90
|
fi
|
|
67
91
|
```
|
|
68
92
|
|
|
69
|
-
## 3. Para cada SLO,
|
|
93
|
+
## 3. For each SLO, load metadata + compute SLI
|
|
70
94
|
|
|
71
|
-
For each `SLO_FILE`:
|
|
95
|
+
For each `SLO_FILE` in `EXISTING_SLOS`:
|
|
96
|
+
|
|
97
|
+
### 3.1 Extract canonical fields from the YAML via regex
|
|
98
|
+
|
|
99
|
+
The project's SLOs follow a fixed schema (validated by `test/unit/slo-schema.test.js`). No `js-yaml` is used: regex over the known keys:
|
|
72
100
|
|
|
73
101
|
```bash
|
|
74
|
-
SLO_NAME=$(
|
|
75
|
-
|
|
102
|
+
SLO_NAME=$(grep -oE '^\s+name:\s*\S+' "$SLO_FILE" | head -1 | awk '{print $2}')
|
|
103
|
+
SERVICE=$(grep -oE '^\s+service:\s*\S+' "$SLO_FILE" | head -1 | awk '{print $2}')
|
|
104
|
+
SLO_TYPE=$(grep -oE '^\s+type:\s*\S+' "$SLO_FILE" | head -1 | awk '{print $2}')
|
|
105
|
+
|
|
106
|
+
# Availability SLO: target = ratio decimal (e.g. 0.995)
|
|
107
|
+
TARGET_RATIO=$(grep -oE '^target:\s*[0-9.]+' "$SLO_FILE" | awk '{print $2}')
|
|
108
|
+
# Latency SLO: target_ms + percentile
|
|
109
|
+
TARGET_MS=$(grep -oE '^target_ms:\s*[0-9]+' "$SLO_FILE" | awk '{print $2}')
|
|
110
|
+
PERCENTILE=$(grep -oE '^\s+percentile:\s*[0-9]+' "$SLO_FILE" | awk '{print $2}')
|
|
76
111
|
```
|
|
77
112
|
|
|
78
|
-
|
|
79
|
-
|
|
80
|
-
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
"
|
|
90
|
-
)
|
|
113
|
+
### 3.2 Load snapshots from the baseline window
|
|
114
|
+
|
|
115
|
+
Use the `loadSnapshots()` API added in Phase 99. Inline node script:
|
|
116
|
+
|
|
117
|
+
```bash
|
|
118
|
+
SNAPS_JSON=$(node --input-type=module -e "
|
|
119
|
+
import { loadSnapshots } from './src/core/metrics.js';
|
|
120
|
+
const snaps = await loadSnapshots(process.cwd(), $BASELINE_MS);
|
|
121
|
+
console.log(JSON.stringify(snaps));
|
|
122
|
+
")
|
|
123
|
+
SNAPSHOT_COUNT=$(echo "$SNAPS_JSON" | node -e "console.log(JSON.parse(require('fs').readFileSync(0,'utf8')).length)")
|
|
91
124
|
```
|
|
92
125
|
|
|
93
|
-
|
|
126
|
+
If `SNAPSHOT_COUNT < 2`, mark the SLO as `no_data`:
|
|
127
|
+
```bash
|
|
128
|
+
if [ "$SNAPSHOT_COUNT" -lt 2 ]; then
|
|
129
|
+
echo "SLO $SLO_NAME: insufficient snapshots in baseline window ($BASELINE) — got $SNAPSHOT_COUNT, need ≥2"
|
|
130
|
+
echo "Generate data: invoke 'metrics-snapshot' MCP tool during normal use."
|
|
131
|
+
STATUS="no_data"
|
|
132
|
+
continue # skip to the next SLO
|
|
133
|
+
fi
|
|
134
|
+
```
|
|
94
135
|
|
|
136
|
+
### 3.3 Compute SLI by SLO type
|
|
137
|
+
|
|
138
|
+
**Availability (`type: event-based`):**
|
|
139
|
+
|
|
140
|
+
Inline node: first vs. last snapshot within the window. The counter delta gives the good/bad events in the window:
|
|
141
|
+
|
|
142
|
+
```bash
|
|
143
|
+
SLI_RESULT=$(node --input-type=module -e "
|
|
144
|
+
import { loadSnapshots } from './src/core/metrics.js';
|
|
145
|
+
const snaps = await loadSnapshots(process.cwd(), $BASELINE_MS);
|
|
146
|
+
if (snaps.length < 2) { console.log(JSON.stringify({sli:null, error:'no_data'})); process.exit(0); }
|
|
147
|
+
const first = snaps[0];
|
|
148
|
+
const last = snaps[snaps.length - 1];
|
|
149
|
+
let goodFirst = 0, goodLast = 0, totalFirst = 0, totalLast = 0;
|
|
150
|
+
for (const [k,v] of Object.entries(first.counters)) {
|
|
151
|
+
if (k.endsWith(':ok')) goodFirst += v;
|
|
152
|
+
totalFirst += v;
|
|
153
|
+
}
|
|
154
|
+
for (const [k,v] of Object.entries(last.counters)) {
|
|
155
|
+
if (k.endsWith(':ok')) goodLast += v;
|
|
156
|
+
totalLast += v;
|
|
157
|
+
}
|
|
158
|
+
const good = goodLast - goodFirst;
|
|
159
|
+
const total = totalLast - totalFirst;
|
|
160
|
+
const sli = total > 0 ? good / total : null;
|
|
161
|
+
const errorRate = total > 0 ? (total - good) / total : 0;
|
|
162
|
+
console.log(JSON.stringify({sli, errorRate, good, total, totalFirst, totalLast}));
|
|
163
|
+
")
|
|
95
164
|
```
|
|
165
|
+
|
|
166
|
+
**Latency (`type: percentile`):**
|
|
167
|
+
|
|
168
|
+
For latency, use the p95 of the last snapshot in the window (cumulative: the FIFO histogram yields p95 over the last 1000 samples). The fraction of samples above target_ms is the budget consumed, so SLI = 1 minus that fraction.
|
|
169
|
+
|
|
170
|
+
```bash
|
|
171
|
+
SLI_RESULT=$(node --input-type=module -e "
|
|
172
|
+
import { loadSnapshots } from './src/core/metrics.js';
|
|
173
|
+
const snaps = await loadSnapshots(process.cwd(), $BASELINE_MS);
|
|
174
|
+
if (snaps.length < 1) { console.log(JSON.stringify({sli:null, error:'no_data'})); process.exit(0); }
|
|
175
|
+
const last = snaps[snaps.length - 1];
|
|
176
|
+
const target = $TARGET_MS;
|
|
177
|
+
let totalSamples = 0, slowSamples = 0;
|
|
178
|
+
for (const [tool, lat] of Object.entries(last.latency)) {
|
|
179
|
+
totalSamples += lat.count;
|
|
180
|
+
if (lat.p95 > target) slowSamples += Math.round(lat.count * 0.05); // approximation: p95 above target → ~5% slow
|
|
181
|
+
}
|
|
182
|
+
const sli = totalSamples > 0 ? 1 - (slowSamples / totalSamples) : null;
|
|
183
|
+
const errorRate = totalSamples > 0 ? slowSamples / totalSamples : 0;
|
|
184
|
+
console.log(JSON.stringify({sli, errorRate, totalSamples, slowSamples, p95Max: Math.max(0, ...Object.values(last.latency).map(l => l.p95 || 0))}));
|
|
185
|
+
")
|
|
186
|
+
```
|
|
187
|
+
|
|
188
|
+
### 3.4 Compute burn rate + status
|
|
189
|
+
|
|
190
|
+
Apply the canonical formula from the `burn-rate-alerting` skill:
|
|
191
|
+
|
|
192
|
+
```bash
|
|
193
|
+
BURN_STATUS=$(node --input-type=module -e "
|
|
194
|
+
const result = $SLI_RESULT;
|
|
195
|
+
if (result.error) { console.log(JSON.stringify({status:'no_data'})); process.exit(0); }
|
|
196
|
+
const target = ${TARGET_RATIO:-0} || (1 - 0.05); // latency SLOs leave TARGET_RATIO empty: default to 0 so the JS stays valid and falls back to 1 - 5%
|
|
197
|
+
const errorRate = result.errorRate;
|
|
198
|
+
const slack = 1 - target; // budget = (1 - target)
|
|
199
|
+
const burnRate = slack > 0 ? errorRate / slack : 0;
|
|
200
|
+
|
|
201
|
+
let status, action;
|
|
202
|
+
if (burnRate >= 14.4) {
|
|
203
|
+
status = 'PAGE';
|
|
204
|
+
action = 'Page on-call — invoke /investigar-producao';
|
|
205
|
+
} else if (burnRate >= 6.0) {
|
|
206
|
+
status = 'TICKET';
|
|
207
|
+
action = 'Open ticket — investigate before budget exhausted';
|
|
208
|
+
} else if (burnRate >= 1.0) {
|
|
209
|
+
status = 'WARN';
|
|
210
|
+
action = 'Monitor — burn rate sustained ≥1× exhausts budget in window';
|
|
211
|
+
} else {
|
|
212
|
+
status = 'OK';
|
|
213
|
+
action = '—';
|
|
214
|
+
}
|
|
215
|
+
|
|
216
|
+
// ETA exhaustion (predictive). For burn_rate=0 (no errors), ETA is ∞.
|
|
217
|
+
const baselineHours = $BASELINE_MS / 3600000;
|
|
218
|
+
const eta = burnRate > 0 ? (1 / burnRate) * 30 * 24 / baselineHours : null; // hours until exhausted
|
|
219
|
+
const etaStr = eta === null ? '—' : (eta < 24 ? eta.toFixed(1) + 'h' : (eta/24).toFixed(1) + 'd');
|
|
220
|
+
|
|
221
|
+
console.log(JSON.stringify({status, action, burnRate: burnRate.toFixed(2), errorRate: (errorRate*100).toFixed(2), eta: etaStr}));
|
|
222
|
+
")
|
|
223
|
+
```
|
|
224
|
+
|
|
225
|
+
### 3.5 Accumulate the table row
|
|
226
|
+
|
|
227
|
+
```bash
|
|
228
|
+
SLO_ROWS+=("| $SLO_NAME | ${TARGET_RATIO:-${TARGET_MS}ms p$PERCENTILE} | $BASELINE | ${ERROR_RATE}% | ${BURN_RATE}× | $ETA | **$STATUS** | $ACTION |")
|
|
229
|
+
```
|
|
230
|
+
|
|
231
|
+
## 4. Render the master table
|
|
232
|
+
|
|
233
|
+
```text
|
|
96
234
|
═══════════════════════════════════════════════════════════
|
|
97
235
|
framework ► BURN-RATE-STATUS ▸ {timestamp}
|
|
236
|
+
baseline=$BASELINE lookahead=$LOOKAHEAD snapshots=$TOTAL_SNAPS
|
|
98
237
|
═══════════════════════════════════════════════════════════
|
|
99
238
|
|
|
100
|
-
| SLO | Target | Window |
|
|
239
|
+
| SLO | Target | Window | Error rate | Burn rate | ETA to exhaustion | Status | Action |
|
|
101
240
|
|---|---|---|---|---|---|---|---|
|
|
102
|
-
|
|
103
|
-
| login_success | 99.95% | 30d | 78% | 8.0× | 4h | **PAGE** | invoke /investigar-producao |
|
|
104
|
-
| search_latency | 99% | 30d | 15% | 0.7× | — | OK | — |
|
|
241
|
+
{$SLO_ROWS}
|
|
105
242
|
```
|
|
106
243
|
|
|
107
244
|
## 5. Suggest next actions
|
|
108
245
|
|
|
109
|
-
|
|
110
|
-
|
|
246
|
+
```bash
|
|
247
|
+
# Count rows per status
|
|
248
|
+
PAGE_COUNT=$(printf '%s\n' "${SLO_ROWS[@]}" | grep -c 'PAGE' || true)  # grep -c already prints 0 on no match; SLO_ROWS is an array
|
|
249
|
+
TICKET_COUNT=$(printf '%s\n' "${SLO_ROWS[@]}" | grep -c 'TICKET' || true)
|
|
250
|
+
WARN_COUNT=$(printf '%s\n' "${SLO_ROWS[@]}" | grep -c 'WARN' || true)
|
|
251
|
+
NO_DATA_COUNT=$(printf '%s\n' "${SLO_ROWS[@]}" | grep -c 'no_data' || true)
|
|
111
252
|
```
|
|
112
|
-
## ⚠ SLOs in alert:
|
|
113
|
-
1. login_success — burn rate 8.0×, ETA 4h
|
|
114
|
-
→ /investigar-producao "login_success burn rate = 8.0× at {timestamp}"
|
|
115
253
|
|
|
116
|
-
|
|
117
|
-
|
|
254
|
+
Output:
|
|
255
|
+
```text
|
|
256
|
+
## Next actions
|
|
257
|
+
|
|
258
|
+
{If PAGE_COUNT > 0:}
|
|
259
|
+
⚠ {PAGE_COUNT} SLO(s) in PAGE: invoke /investigar-producao "<slo_name> burn rate {burn_rate}×"
|
|
260
|
+
|
|
261
|
+
{If TICKET_COUNT > 0:}
|
|
262
|
+
☐ {TICKET_COUNT} SLO(s) in TICKET: open an issue, investigate before the budget is exhausted
|
|
263
|
+
|
|
264
|
+
{If WARN_COUNT > 0:}
|
|
265
|
+
ⓘ {WARN_COUNT} SLO(s) in WARN: a burn rate sustained at ≥1× exhausts the budget
|
|
118
266
|
|
|
119
|
-
|
|
120
|
-
-
|
|
267
|
+
{If NO_DATA_COUNT > 0:}
|
|
268
|
+
⊘ {NO_DATA_COUNT} SLO(s) with no data in the window: invoke the 'metrics-snapshot' MCP tool periodically to populate .planning/metrics/snapshots/
|
|
121
269
|
```
|
|
122
270
|
|
|
123
|
-
## 6. `/loop` mode
|
|
271
|
+
## 6. `/loop` mode (idempotency)
|
|
124
272
|
|
|
125
273
|
If called inside `/loop`, behavior is idempotent:
|
|
126
|
-
-
|
|
127
|
-
- Output curto se
|
|
128
|
-
- Acionar AskUserQuestion APENAS quando
|
|
274
|
+
- Fresh snapshot on every invocation (do not accumulate state).
|
|
275
|
+
- Short output if the status has not changed (summary line only; do not repeat the full table).
|
|
276
|
+
- Trigger AskUserQuestion ONLY when some SLO transitions OK → WARN/TICKET/PAGE.
|
|
129
277
|
|
|
130
278
|
</process>
|
|
131
279
|
|
|
132
280
|
<success_criteria>
|
|
133
|
-
- [ ] $ARGUMENTS parsed (optional SLO + flags)
|
|
134
|
-
- [ ] SLOs discovered via glob `.planning/slos/*.md`
|
|
135
|
-
- [ ] `
|
|
136
|
-
- [ ]
|
|
137
|
-
- [ ]
|
|
281
|
+
- [ ] $ARGUMENTS parsed (optional SLO + flags --lookahead/--baseline/--format)
|
|
282
|
+
- [ ] SLOs discovered via glob `.planning/slos/*.yml` (FIX Phase 99: extension `.yml`, not `.md`)
|
|
283
|
+
- [ ] Snapshots loaded via `loadSnapshots()` (Phase 99, `src/core/metrics.js`)
|
|
284
|
+
- [ ] SLI computed by type (event-based ratio for availability, percentile for latency)
|
|
285
|
+
- [ ] Burn rate computed with the formula `error_rate / (1 - target)` (burn-rate-alerting skill)
|
|
286
|
+
- [ ] Status enum: PAGE / TICKET / WARN / OK / no_data
|
|
287
|
+
- [ ] ETA to exhaustion computed (predictive forecast)
|
|
288
|
+
- [ ] Aggregated markdown table
|
|
138
289
|
- [ ] Next-action suggestions for SLOs in alert
|
|
139
|
-
- [ ] Idempotente
|
|
290
|
+
- [ ] Idempotent in /loop (no state accumulation)
|
|
291
|
+
- [ ] Graceful no_data: does not make up numbers, suggests `metrics-snapshot`
|
|
140
292
|
</success_criteria>
|
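The availability path in steps 3.3 and 3.4 of the command above can be sketched as two pure functions. This is a minimal sketch, not the command's actual implementation: the names `availabilityErrorRate` and `classifyBurn` are hypothetical, while the `:ok` counter suffix, the formula `burn_rate = error_rate / (1 - target)`, and the 14.4 / 6.0 / 1.0 thresholds are taken from the diff.

```javascript
// Hypothetical helper: error rate from the counter delta between the first
// and last snapshot in the window (counters are keyed `${tool}:${status}`).
function availabilityErrorRate(first, last) {
  const sum = (counters, onlyOk) =>
    Object.entries(counters)
      .filter(([key]) => !onlyOk || key.endsWith(':ok'))
      .reduce((acc, [, value]) => acc + value, 0);
  const good = sum(last.counters, true) - sum(first.counters, true);
  const total = sum(last.counters, false) - sum(first.counters, false);
  return total > 0 ? (total - good) / total : 0;
}

// Hypothetical helper: burn rate = error rate divided by the error budget,
// then mapped onto the status bands used in step 3.4.
function classifyBurn(errorRate, target) {
  const budget = 1 - target; // allowed error fraction
  const burnRate = budget > 0 ? errorRate / budget : 0;
  let status = 'OK';
  if (burnRate >= 14.4) status = 'PAGE';
  else if (burnRate >= 6.0) status = 'TICKET';
  else if (burnRate >= 1.0) status = 'WARN';
  return { burnRate, status };
}

// Illustrative data only: 100 new calls in the window, 5 of them errors.
const first = { counters: { 'kit:ok': 10, 'kit:error': 0 } };
const last = { counters: { 'kit:ok': 105, 'kit:error': 5 } };
const errorRate = availabilityErrorRate(first, last);
const verdict = classifyBurn(errorRate, 0.995);
console.log(errorRate, verdict.status);
```

With a 99.5% target, a 5% error rate in the window gives a burn rate of about 10×, which falls in the TICKET band.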
package/kit/file-manifest.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
|
-
"version": "1.
|
|
3
|
-
"timestamp": "2026-05-
|
|
2
|
+
"version": "1.19.0",
|
|
3
|
+
"timestamp": "2026-05-09T17:53:53.960Z",
|
|
4
4
|
"files": {
|
|
5
5
|
"COMANDOS.md": "d24ec61a6ec35db314cc5f2ae287bfb927b794789c8f1d558c55862f5e6534b2",
|
|
6
6
|
"COMPATIBILITY.md": "794e336a87045cdf0161785b9a7a0975a49abbd80bdd816b8852251fcc8126ca",
|
|
@@ -68,7 +68,7 @@
|
|
|
68
68
|
"commands/auditar-uat.md": "83e9f21584938350ee96ef0f0bb870786537bf38220c7a8ec0e04d06659c6bda",
|
|
69
69
|
"commands/autonomo.md": "ae5746a8b9cd63d9ac8cf2774b8b466789ccefec3d9e267dcb98d97481ede57f",
|
|
70
70
|
"commands/branch-pr.md": "77866ec7ef8d65ad6cea9d17491b7c7605238b3a3505dd3e128f18cd150c9be4",
|
|
71
|
-
"commands/burn-rate-status.md": "
|
|
71
|
+
"commands/burn-rate-status.md": "040fcc64b00bf5bcb9b69b7d3c1ef729647ddf0060d232fe061ea70b242747cf",
|
|
72
72
|
"commands/capturar-payloads.md": "507d009d9fb28fe12d18c3d3a599fbb23605254564e5753b056e0f32fb92f20b",
|
|
73
73
|
"commands/caracterizar-prompt.md": "996b923d6c807d94be77d14dbfec3fdabf98d3bf111f6928932421b724847fb3",
|
|
74
74
|
"commands/caracterizar.md": "994ce4136ba44b74890874f3274c26bcdc9f4feb5f4852cb0288687142ab1403",
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@luanpdd/kit-mcp",
|
|
3
|
-
"version": "1.
|
|
3
|
+
"version": "1.19.0",
|
|
4
4
|
"description": "Generic infrastructure to ship YOUR personal kit of agents/commands/skills as an MCP server, with cross-IDE sync (Claude Code, Cursor, Codex, Gemini, Windsurf, Antigravity, Copilot, Trae).",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"bin": {
|
package/src/core/metrics.js
ADDED
|
@@ -0,0 +1,268 @@
|
|
|
1
|
+
// OBS-18-01 / OBS-18-02 — in-memory golden signals for kit-mcp server.
|
|
2
|
+
// OBS-19-01 / OBS-19-02 / OBS-19-03 — disk-persistent rolling snapshots.
|
|
3
|
+
//
|
|
4
|
+
// Phase 94: Eat Your Own Dog Food. The skill `four-golden-signals` says any
|
|
5
|
+
// user-facing service worth its salt instruments Latency + Traffic + Errors
|
|
6
|
+
// + Saturation. The MCP server qualifies — every tool call is a request from
|
|
7
|
+
// an LLM client and tail latency / error rate are exactly the signals an
|
|
8
|
+
// operator wants when something feels off.
|
|
9
|
+
//
|
|
10
|
+
// Scope decisions (see .planning/phases/94-golden-signals-mcp-server/94-CONTEXT.md):
|
|
11
|
+
// - Zero new dependencies. Phase 99 adds fs/promises + path from stdlib only —
|
|
12
|
+
// the 6-deps budget Phase 92.01 fought to maintain and Phase 93.01 enforces
|
|
13
|
+
// in CI is preserved.
|
|
14
|
+
// - In-memory primary, on-demand persistence. The Map+array core stays
|
|
15
|
+
// in-memory; persistSnapshot writes a JSON file under .planning/metrics/
|
|
16
|
+
// snapshots/ when called. No background timer, no implicit writes — the
|
|
17
|
+
// /burn-rate-status command and metrics-snapshot tool are the writers.
|
|
18
|
+
// - Bounded memory. Histograms cap at HISTOGRAM_CAP=1000 samples per tool
|
|
19
|
+
// with FIFO drop.
|
|
20
|
+
// - Bounded disk. cleanupOldSnapshots prunes files > 30 days old on every
|
|
21
|
+
// persistSnapshot call (rolling window, no separate retention job).
|
|
22
|
+
// - Snapshot is read-only. Returns a fresh plain-object copy so callers
|
|
23
|
+
// can JSON.stringify it without exposing internal Map references.
|
|
24
|
+
// - Persisted shape includes `ts` (epoch ms) inside the JSON. We do NOT
|
|
25
|
+
// parse the filename for windowing — filesystem-safe ISO encoding
|
|
26
|
+
// (`replace(/[:.]/g, '-')`) is one-way (cannot reliably round-trip back
|
|
27
|
+
// through Date.parse) and mtime is unreliable across copy/touch. The
|
|
28
|
+
// in-file ts is authoritative.
|
|
29
|
+
//
|
|
30
|
+
// API surface (4 sync + 2 async exports):
|
|
31
|
+
// incrementInvocation(tool, status) — counter++ keyed `${tool}:${status}`
|
|
32
|
+
// recordLatency(tool, ms) — push to histogram, FIFO at cap
|
|
33
|
+
// snapshot() — { counters, latency } plain object
|
|
34
|
+
// reset() — clear both maps; called on boot if
|
|
35
|
+
// KIT_MCP_METRICS_RESET=1
|
|
36
|
+
// persistSnapshot(rootDir) — write {ts, counters, latency} to
|
|
37
|
+
// .planning/metrics/snapshots/<ts>.json
|
|
38
|
+
// + cleanup files > 30d
|
|
39
|
+
// loadSnapshots(rootDir, windowMs) — read all snapshots whose in-file ts
|
|
40
|
+
// is within windowMs (default 30d),
|
|
41
|
+
// sorted ascending by ts
|
|
42
|
+
//
|
|
43
|
+
// Boot-time reset honors the env var by calling reset() at module load when
|
|
44
|
+
// the flag is set. This keeps the signal "fresh" for a probe in tests or for
|
|
45
|
+
// an operator who spawned the server with the flag for a clean comparison.
|
|
46
|
+
|
|
47
|
+
import fs from 'node:fs/promises';
|
|
48
|
+
import path from 'node:path';
|
|
49
|
+
|
|
50
|
+
const HISTOGRAM_CAP = 1000;
|
|
51
|
+
const DEFAULT_RETENTION_MS = 30 * 86400 * 1000; // 30 days rolling.
|
|
52
|
+
const SNAPSHOT_DIR_REL = path.join('.planning', 'metrics', 'snapshots');
|
|
53
|
+
|
|
54
|
+
const counters = new Map(); // key: `${tool}:${status}` → count (number)
|
|
55
|
+
const histograms = new Map(); // key: tool → number[] (length ≤ HISTOGRAM_CAP)
|
|
56
|
+
|
|
57
|
+
/**
|
|
58
|
+
* Increment the invocation counter for a tool/status pair.
|
|
59
|
+
*
|
|
60
|
+
* @param {string} tool Tool name as it appears in the MCP request payload.
|
|
61
|
+
* @param {'ok'|'error'} [status='ok'] Outcome of the dispatch.
|
|
62
|
+
* @returns {void}
|
|
63
|
+
*/
|
|
64
|
+
export function incrementInvocation(tool, status = 'ok') {
|
|
65
|
+
if (typeof tool !== 'string' || tool.length === 0) return;
|
|
66
|
+
const key = `${tool}:${status}`;
|
|
67
|
+
counters.set(key, (counters.get(key) ?? 0) + 1);
|
|
68
|
+
}
|
|
69
|
+
|
|
70
|
+
/**
|
|
71
|
+
* Record an observed latency for a tool. Drops the oldest sample (FIFO) once
|
|
72
|
+
* the per-tool histogram reaches HISTOGRAM_CAP, keeping memory bounded across
|
|
73
|
+
* long-lived MCP sessions.
|
|
74
|
+
*
|
|
75
|
+
* @param {string} tool Tool name.
|
|
76
|
+
* @param {number} ms Elapsed wall-clock time in milliseconds.
|
|
77
|
+
* @returns {void}
|
|
78
|
+
*/
|
|
79
|
+
export function recordLatency(tool, ms) {
|
|
80
|
+
if (typeof tool !== 'string' || tool.length === 0) return;
|
|
81
|
+
if (typeof ms !== 'number' || !Number.isFinite(ms) || ms < 0) return;
|
|
82
|
+
let arr = histograms.get(tool);
|
|
83
|
+
if (!arr) {
|
|
84
|
+
arr = [];
|
|
85
|
+
histograms.set(tool, arr);
|
|
86
|
+
}
|
|
87
|
+
arr.push(ms);
|
|
88
|
+
if (arr.length > HISTOGRAM_CAP) arr.shift(); // FIFO drop oldest sample
|
|
89
|
+
}
|
|
90
|
+
|
|
91
|
+
/**
|
|
92
|
+
* Compute a percentile over a sorted ascending array. Linear-interpolation
|
|
93
|
+
* variant matches the typical Prometheus / Datadog reading. For N≤1000
|
|
94
|
+
* (HISTOGRAM_CAP) the sort cost on snapshot is acceptable — snapshots are
|
|
95
|
+
* read on-demand by the metrics-snapshot tool, not on every dispatch.
|
|
96
|
+
*
|
|
97
|
+
* @param {number[]} sorted Ascending-sorted samples.
|
|
98
|
+
* @param {number} p Percentile in [0, 1].
|
|
99
|
+
* @returns {number}
|
|
100
|
+
*/
|
|
101
|
+
function percentile(sorted, p) {
|
|
102
|
+
if (sorted.length === 0) return 0;
|
|
103
|
+
if (sorted.length === 1) return sorted[0];
|
|
104
|
+
const rank = p * (sorted.length - 1);
|
|
105
|
+
const lo = Math.floor(rank);
|
|
106
|
+
const hi = Math.ceil(rank);
|
|
107
|
+
if (lo === hi) return sorted[lo];
|
|
108
|
+
const frac = rank - lo;
|
|
109
|
+
return sorted[lo] + (sorted[hi] - sorted[lo]) * frac;
|
|
110
|
+
}
|
|
111
|
+
|
|
112
|
+
/**
|
|
113
|
+
* Build a read-only snapshot of all metrics. Counters are returned as a plain
|
|
114
|
+
* object keyed `${tool}:${status}` → count. Latency is keyed by tool to a
|
|
115
|
+
* `{ p50, p95, p99, count }` object so a single tool never appears split
|
|
116
|
+
* across status outcomes (latency observation point is a single line in the
|
|
117
|
+
* dispatcher, success and failure both record).
|
|
118
|
+
*
|
|
119
|
+
* @returns {{
|
|
120
|
+
* counters: Record<string, number>,
|
|
121
|
+
* latency: Record<string, { p50: number, p95: number, p99: number, count: number }>
|
|
122
|
+
* }}
|
|
123
|
+
*/
|
|
124
|
+
export function snapshot() {
|
|
125
|
+
const out = { counters: {}, latency: {} };
|
|
126
|
+
for (const [key, val] of counters) out.counters[key] = val;
|
|
127
|
+
for (const [tool, samples] of histograms) {
|
|
128
|
+
if (samples.length === 0) continue;
|
|
129
|
+
const sorted = [...samples].sort((a, b) => a - b);
|
|
130
|
+
out.latency[tool] = {
|
|
131
|
+
p50: percentile(sorted, 0.50),
|
|
132
|
+
p95: percentile(sorted, 0.95),
|
|
133
|
+
p99: percentile(sorted, 0.99),
|
|
134
|
+
count: samples.length,
|
|
135
|
+
};
|
|
136
|
+
}
|
|
137
|
+
return out;
|
|
138
|
+
}
|
|
139
|
+
|
|
140
|
+
/**
|
|
141
|
+
* Clear both counters and histograms. Used by tests and by the boot-time
|
|
142
|
+
* KIT_MCP_METRICS_RESET=1 path so an operator can probe a fresh window.
|
|
143
|
+
*
|
|
144
|
+
* @returns {void}
|
|
145
|
+
*/
|
|
146
|
+
export function reset() {
|
|
147
|
+
counters.clear();
|
|
148
|
+
histograms.clear();
|
|
149
|
+
}
|
|
150
|
+
|
|
151
|
+
/**
|
|
152
|
+
* OBS-19-01 — Persist the current snapshot to disk under
|
|
153
|
+
* `<rootDir>/.planning/metrics/snapshots/<timestamp>.json`. Runs the rolling
|
|
154
|
+
* cleanup of files older than `retentionMs` (default 30d) on every call so
|
|
155
|
+
* callers don't need a separate retention job.
|
|
156
|
+
*
|
|
157
|
+
* The on-disk shape is `{ ts: <epoch_ms>, counters, latency }`. The `ts` field
|
|
158
|
+
* inside the JSON — NOT the filename — is the authoritative timestamp for
|
|
159
|
+
* loadSnapshots windowing. The filename uses an ISO encoding with `:` and `.`
|
|
160
|
+
* replaced by `-` for filesystem safety; that encoding is one-way (cannot
|
|
161
|
+
* round-trip back through Date.parse), so we never parse it for ordering.
|
|
162
|
+
*
|
|
163
|
+
* @param {string} [rootDir=process.cwd()] Project root. Snapshots land under
|
|
164
|
+
* `<rootDir>/.planning/metrics/snapshots/`.
|
|
165
|
+
* @param {object} [opts]
|
|
166
|
+
* @param {number} [opts.retentionMs] Override the rolling-window age in ms.
|
|
167
|
+
* Defaults to 30 days. Tests use shorter windows to drive the cleanup path.
|
|
168
|
+
* @returns {Promise<{file: string, snap: {ts: number, counters: object, latency: object}}>}
|
|
169
|
+
*/
|
|
170
|
+
export async function persistSnapshot(rootDir = process.cwd(), opts = {}) {
|
|
171
|
+
const retentionMs = Number.isFinite(opts.retentionMs) ? opts.retentionMs : DEFAULT_RETENTION_MS;
|
|
172
|
+
const dir = path.join(rootDir, SNAPSHOT_DIR_REL);
|
|
173
|
+
await fs.mkdir(dir, { recursive: true });
|
|
174
|
+
const ts = Date.now();
|
|
175
|
+
const snap = { ts, ...snapshot() };
|
|
176
|
+
// Filesystem-safe ISO encoding — Windows forbids `:` in paths and `.` is
|
|
177
|
+
// ambiguous with extension separators on shells with brace expansion.
|
|
178
|
+
const isoSafe = new Date(ts).toISOString().replace(/[:.]/g, '-');
|
|
179
|
+
const file = path.join(dir, `${isoSafe}.json`);
|
|
180
|
+
await fs.writeFile(file, JSON.stringify(snap, null, 2));
|
|
181
|
+
await cleanupOldSnapshots(dir, retentionMs);
|
|
182
|
+
return { file, snap };
|
|
183
|
+
}
|
|
184
|
+
|
|
185
|
+
/**
|
|
186
|
+
* OBS-19-02 — Load all snapshots from disk whose in-file `ts` is within the
|
|
187
|
+
* sliding window. Returns the array sorted ascending by `ts` so consumers
|
|
188
|
+
* (`/burn-rate-status`) can compute first-vs-last deltas without re-sorting.
|
|
189
|
+
*
|
|
190
|
+
* Defensive against malformed JSON: a corrupt file is skipped silently rather
|
|
191
|
+
* than aborting the whole load. The 30d window is rolling from "now" — pass a
|
|
192
|
+
* smaller value to drive recent-only views (e.g. `60 * 60 * 1000` for last
|
|
193
|
+
* hour) when computing burn rate over a baseline window.
|
|
194
|
+
*
|
|
195
|
+
* @param {string} [rootDir=process.cwd()] Project root.
|
|
196
|
+
* @param {number} [windowMs] Sliding window in ms. Defaults to 30 days.
|
|
197
|
+
* @returns {Promise<Array<{ts: number, counters: object, latency: object}>>}
|
|
198
|
+
* Empty array if the snapshots directory does not exist.
|
|
199
|
+
*/
|
|
200
|
+
export async function loadSnapshots(rootDir = process.cwd(), windowMs = DEFAULT_RETENTION_MS) {
|
|
201
|
+
const dir = path.join(rootDir, SNAPSHOT_DIR_REL);
|
|
202
|
+
const cutoff = Date.now() - windowMs;
|
|
203
|
+
let files;
|
|
204
|
+
try {
|
|
205
|
+
files = await fs.readdir(dir);
|
|
206
|
+
} catch {
|
|
207
|
+
return []; // Dir absent on first run — not an error.
|
|
208
|
+
}
|
|
209
|
+
const results = [];
|
|
210
|
+
for (const f of files) {
|
|
211
|
+
if (!f.endsWith('.json')) continue;
|
|
212
|
+
try {
|
|
213
|
+
const raw = await fs.readFile(path.join(dir, f), 'utf-8');
|
|
214
|
+
const parsed = JSON.parse(raw);
|
|
215
|
+
if (Number.isFinite(parsed?.ts) && parsed.ts >= cutoff) {
|
|
216
|
+
results.push(parsed);
|
|
217
|
+
}
|
|
218
|
+
} catch {
|
|
219
|
+
+      // Corrupt file — skip silently rather than break the whole burn-rate
+      // calculation. A future phase can surface counts via a doctor probe.
+    }
+  }
+  return results.sort((a, b) => a.ts - b.ts);
+}
+
+/**
+ * OBS-19-03 — Internal helper: delete snapshot files older than `maxAgeMs`.
+ * Called from persistSnapshot on every write so retention is implicit.
+ * Uses fs.stat().mtimeMs as the age proxy; we accept the small drift versus
+ * the in-file `ts` because cleanup is best-effort eviction, not authoritative
+ * windowing (loadSnapshots reads the in-file ts).
+ *
+ * @param {string} dir Absolute path to the snapshots directory.
+ * @param {number} maxAgeMs Files with mtime older than this are unlinked.
+ * @returns {Promise<void>}
+ */
+async function cleanupOldSnapshots(dir, maxAgeMs) {
+  const cutoff = Date.now() - maxAgeMs;
+  let files;
+  try {
+    files = await fs.readdir(dir);
+  } catch {
+    return;
+  }
+  for (const f of files) {
+    if (!f.endsWith('.json')) continue;
+    const fp = path.join(dir, f);
+    try {
+      const stat = await fs.stat(fp);
+      if (stat.mtimeMs < cutoff) await fs.unlink(fp);
+    } catch {
+      // Unlink can race with concurrent cleanup; ignore ENOENT and friends.
+    }
+  }
+}
+
+// Boot-time reset honors KIT_MCP_METRICS_RESET=1. We call reset() instead of
+// merely skipping init because the maps are already empty at module load —
+// the call is a no-op today but documents the contract for any future module
+// that imports metrics.js after another module has already populated state.
+if (process.env.KIT_MCP_METRICS_RESET === '1') {
+  reset();
+}
+
+// Exported for tests only — keeps the API surface explicit while letting unit
+// tests assert on the FIFO behavior at the boundary.
+export const __TEST_HISTOGRAM_CAP = HISTOGRAM_CAP;
+export const __TEST_SNAPSHOT_DIR_REL = SNAPSHOT_DIR_REL;
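The retention rule in cleanupOldSnapshots comes down to one comparison: a file is evicted when its mtime is strictly earlier than now minus maxAgeMs. Below is a minimal sketch of only that selection step, kept free of filesystem calls so it can be run in isolation. `selectExpired` and the `{ name, mtimeMs }` entry shape are hypothetical names introduced for illustration; they are not part of metrics.js.

```javascript
// Hypothetical helper (not in metrics.js): decide which snapshot files a
// cleanup pass would unlink, given their names and modification times.
function selectExpired(entries, maxAgeMs, now = Date.now()) {
  const cutoff = now - maxAgeMs;
  return entries
    .filter((e) => e.name.endsWith('.json')) // non-JSON files are never touched
    .filter((e) => e.mtimeMs < cutoff)       // strictly older than the cutoff
    .map((e) => e.name);
}

// Example with a fixed clock: a 1-hour retention window at t = 10_000_000 ms,
// so the cutoff is 6_400_000 ms.
const expired = selectExpired(
  [
    { name: 'old.json', mtimeMs: 1_000_000 },
    { name: 'fresh.json', mtimeMs: 9_999_000 },
    { name: 'notes.txt', mtimeMs: 0 },
  ],
  3_600_000,
  10_000_000,
);
console.log(expired); // [ 'old.json' ]
```

Passing the clock in as a parameter is a choice made for this sketch; the diffed code calls Date.now() directly.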
package/src/mcp-server/index.js
CHANGED
@@ -1,10 +1,11 @@
-// kit-mcp server — exposes
+// kit-mcp server — exposes 7 tools, each with action-based dispatch (or none).
 //
-// kit
-// sync
-// gates
-// forensics
-// install
+// kit action: list-agents | list-commands | list-skills | get | search
+// sync action: targets | status | install | remove
+// gates action: list | get | for-stage
+// forensics action: collect | summarize | write-learnings | list-replays | record-replay | load-replay
+// install action: targets | install | dry-run (registers this MCP into an IDE)
+// metrics-snapshot (parameterless) (OBS-18 four-golden-signals readout)
 //
 // Transport: stdio (MCP standard).
 
@@ -30,6 +31,7 @@ import { recordReplay, listReplays, loadReplay, annotateReplay } from '../core/r
 import { installMcp, listInstallTargets } from './install.js';
 import { ensureSidecar } from '../ui/auto-spawn.js';
 import { wrapProgressForUi } from '../ui/wrapper.js';
+import { incrementInvocation, recordLatency, snapshot as metricsSnapshot } from '../core/metrics.js';
 
 const TOOLS = [
   {
@@ -130,6 +132,17 @@ const TOOLS = [
       required: ['action'],
     },
   },
+  {
+    // OBS-18 (Phase 94.01): expose four-golden-signals data for the MCP server itself.
+    // Read-only (no auth needed beyond the underlying transport): returns counters
+    // keyed `${tool}:${status}` and per-tool latency p50/p95/p99/count.
+    name: 'metrics-snapshot',
+    description: 'Read in-memory golden-signals metrics for this MCP server (counters + latency p50/p95/p99 per tool).',
+    inputSchema: {
+      type: 'object',
+      properties: {},
+    },
+  },
 ];
 
 // DRIFT-13-03: read version from package.json at module load (NOT inside
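The tool comment above describes the snapshot as counters keyed `${tool}:${status}` plus per-tool latency p50/p95/p99/count. The sketch below shows one way such a registry could be built in memory. It is an illustration under stated assumptions, not the contents of metrics.js: the function names mirror the imports in this diff, but the bodies, the nearest-rank percentile method, and the `{ counters, latency }` return shape are all assumed here, and the histogram cap that metrics.js applies is left out.

```javascript
// Hypothetical in-memory registry; the real metrics.js internals may differ.
const counters = new Map();  // `${tool}:${status}` -> invocation count
const latencies = new Map(); // tool -> array of observed latencies in ms

function incrementInvocation(tool, status) {
  const key = `${tool}:${status}`;
  counters.set(key, (counters.get(key) ?? 0) + 1);
}

function recordLatency(tool, ms) {
  if (!latencies.has(tool)) latencies.set(tool, []);
  latencies.get(tool).push(ms);
}

// Nearest-rank percentile over an ascending array (an assumed method).
function percentile(sorted, p) {
  if (sorted.length === 0) return null;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

function snapshot() {
  const latency = {};
  for (const [tool, values] of latencies) {
    const sorted = [...values].sort((a, b) => a - b);
    latency[tool] = {
      p50: percentile(sorted, 50),
      p95: percentile(sorted, 95),
      p99: percentile(sorted, 99),
      count: sorted.length,
    };
  }
  return { counters: Object.fromEntries(counters), latency };
}

// Usage: three calls to a tool named 'kit', ten latency observations.
incrementInvocation('kit', 'ok');
incrementInvocation('kit', 'ok');
incrementInvocation('kit', 'error');
for (const ms of [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]) recordLatency('kit', ms);
const snap = snapshot();
console.log(JSON.stringify(snap, null, 2));
```

With ten evenly spaced samples, nearest-rank gives p50 = 50 and p95 = p99 = 100, which shows how coarse the tail percentiles are at small sample counts.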
@@ -292,13 +305,21 @@ async function handleInstall(args) {
   }
 }
 
+// OBS-18 (Phase 94.01): metrics-snapshot is parameterless and read-only.
+// Returns the live snapshot synchronously — no auth, no projectRoot guard
+// (no disk reads, no shell). Wraps in an async fn for handler-API uniformity.
+async function handleMetricsSnapshot() {
+  return metricsSnapshot();
+}
+
 const HANDLERS = {
-  kit:
-  sync:
-  'reverse-sync':handleReverseSync,
-  gates:
-  forensics:
-  install:
+  kit: handleKit,
+  sync: handleSync,
+  'reverse-sync': handleReverseSync,
+  gates: handleGates,
+  forensics: handleForensics,
+  install: handleInstall,
+  'metrics-snapshot': handleMetricsSnapshot,
 };
 
 function slim(x) {
@@ -330,12 +351,30 @@ export async function createServer() {
     const { name, arguments: args } = req.params;
     const handler = HANDLERS[name];
     if (!handler) {
+      // OBS-18 (Phase 94.01): unknown-tool path counts as an error against
+      // the unknown name itself — useful signal if a client is mis-spelling
+      // a tool name in production. No latency observation (handler never ran).
+      incrementInvocation(name || 'unknown', 'error');
       return { content: [{ type: 'text', text: JSON.stringify({ error: `Unknown tool: ${name}` }) }], isError: true };
     }
+    // OBS-18 (Phase 94.01): timestamp the dispatch boundary. The four-golden-signals
+    // skill cares about the *user-facing* latency, which for the MCP server is the
+    // time from request receipt (we are inside the SDK callback) to the JSON envelope
+    // being ready. Date.now() is sub-millisecond-cheap and aligns with the bucket
+    // granularity we report (50/100/250/500ms thresholds in CONTEXT.md).
+    const start = Date.now();
     try {
       const result = await handler(args ?? {});
+      recordLatency(name, Date.now() - start);
+      incrementInvocation(name, 'ok');
       return { content: [{ type: 'text', text: JSON.stringify(result, null, 2) }] };
     } catch (e) {
+      // OBS-18: still record latency on the error path — half the value of a
+      // latency histogram is catching tail-latency-then-fail patterns. Status
+      // 'error' covers any thrown exception, including Phase 79.01 gates guard
+      // and the validateProjectRoot rejection (Phase 83.01).
+      recordLatency(name, Date.now() - start);
+      incrementInvocation(name, 'error');
       // SEC-14-06: full stack stays in stderr for operator debug; client envelope is sanitized.
       // sanitizeMcpError redacts secrets/paths from e.message, preserves e.code (Phase 83
       // EMANIFESTMISMATCH invariant), and emits NO stack field.
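The hunk above times the handler on both the success and the failure path, so slow failures still land in the latency histogram. The same pattern can be sketched as a standalone wrapper. This is an illustration, not the server's code: `instrument`, the single `record` callback (standing in for recordLatency plus incrementInvocation), and the injectable clock are hypothetical. Unlike the server, which turns the exception into a sanitized error envelope, the sketch simply rethrows.

```javascript
// Hypothetical wrapper: `record` stands in for recordLatency/incrementInvocation.
function instrument(name, handler, record, now = Date.now) {
  return async function instrumented(args) {
    const start = now();
    try {
      const result = await handler(args ?? {});
      record({ name, status: 'ok', ms: now() - start });
      return result;
    } catch (e) {
      // Record before rethrowing so slow failures still reach the histogram.
      record({ name, status: 'error', ms: now() - start });
      throw e;
    }
  };
}

// Usage with a fake clock that advances 5 ms on every reading.
const events = [];
let tick = 0;
const fakeNow = () => (tick += 5);
const push = (e) => events.push(e);

const ok = instrument('kit', async () => 'done', push, fakeNow);
const bad = instrument('gates', async () => { throw new Error('guard'); }, push, fakeNow);

const run = (async () => {
  await ok();
  await bad().catch(() => {}); // swallow here; the error was already recorded
  return events;
})();
run.then((evts) => console.log(evts));
```

Injecting the clock keeps the example deterministic; in the diffed code the two readings come from Date.now() directly.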