@luanpdd/kit-mcp 1.9.0 → 1.11.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +86 -0
- package/README.md +58 -0
- package/gates/ai-prompt-stability.md +120 -0
- package/gates/golden-signals-coverage.md +133 -0
- package/gates/legacy-refactor-safety.md +178 -0
- package/gates/observability-coverage.md +151 -0
- package/gates/postmortem-template-required.md +127 -0
- package/gates/prr-checklist-coverage.md +128 -0
- package/gates/release-pipeline-policy.md +132 -0
- package/kit/COMANDOS.md +15 -0
- package/kit/agents/ai-mutation-tester.md +298 -0
- package/kit/agents/cascading-failures-auditor.md +306 -0
- package/kit/agents/executor.md +13 -0
- package/kit/agents/golden-signals-instrumenter.md +241 -0
- package/kit/agents/legacy-characterizer.md +378 -0
- package/kit/agents/load-shedding-instrumenter.md +297 -0
- package/kit/agents/observability-coverage-auditor.md +325 -0
- package/kit/agents/omm-auditor.md +99 -0
- package/kit/agents/payload-capture-instrumenter.md +283 -0
- package/kit/agents/planner.md +29 -0
- package/kit/agents/postmortem-writer.md +282 -0
- package/kit/agents/prr-conductor.md +296 -0
- package/kit/agents/refactor-safety-auditor.md +414 -0
- package/kit/agents/release-pipeline-auditor.md +360 -0
- package/kit/agents/seam-finder.md +367 -0
- package/kit/agents/shotgun-surgery-detector.md +359 -0
- package/kit/agents/storytelling-analyst.md +309 -0
- package/kit/agents/supabase-architect.md +49 -0
- package/kit/agents/supabase-edge-fn-writer.md +114 -0
- package/kit/agents/supabase-migration-writer.md +80 -0
- package/kit/agents/supabase-storage-implementer.md +156 -0
- package/kit/agents/toil-auditor.md +277 -0
- package/kit/agents/verifier.md +30 -0
- package/kit/commands/auditar-cascading.md +111 -0
- package/kit/commands/auditar-marco.md +124 -1
- package/kit/commands/auditar-observabilidade-cobertura.md +183 -0
- package/kit/commands/auditar-refactor.md +219 -0
- package/kit/commands/auditar-release.md +109 -0
- package/kit/commands/auditar-toil.md +129 -0
- package/kit/commands/capturar-payloads.md +193 -0
- package/kit/commands/caracterizar-prompt.md +195 -0
- package/kit/commands/caracterizar.md +212 -0
- package/kit/commands/concluir-marco.md +95 -1
- package/kit/commands/detectar-duplicacao.md +197 -0
- package/kit/commands/discutir-fase.md +41 -0
- package/kit/commands/encontrar-seams.md +136 -0
- package/kit/commands/forense.md +103 -1
- package/kit/commands/golden-signals.md +142 -0
- package/kit/commands/legacy.md +263 -0
- package/kit/commands/load-shedding.md +117 -0
- package/kit/commands/observabilidade.md +2 -0
- package/kit/commands/postmortem.md +179 -0
- package/kit/commands/prr.md +205 -0
- package/kit/commands/refactor-seguro.md +321 -0
- package/kit/commands/risk-budget.md +220 -0
- package/kit/commands/sre.md +230 -0
- package/kit/commands/storytelling.md +179 -0
- package/kit/skills/_shared-legacy/glossary.md +389 -0
- package/kit/skills/_shared-sre/glossary.md +712 -0
- package/kit/skills/ai-prompt-characterization/SKILL.md +335 -0
- package/kit/skills/blameless-postmortems/SKILL.md +340 -0
- package/kit/skills/cascading-failures/SKILL.md +307 -0
- package/kit/skills/eliminating-toil/SKILL.md +243 -0
- package/kit/skills/event-based-slos/SKILL.md +22 -0
- package/kit/skills/four-golden-signals/SKILL.md +314 -0
- package/kit/skills/hermetic-builds/SKILL.md +323 -0
- package/kit/skills/legacy-api-only-applications/SKILL.md +358 -0
- package/kit/skills/legacy-characterization-tests/SKILL.md +330 -0
- package/kit/skills/legacy-effect-analysis/SKILL.md +331 -0
- package/kit/skills/legacy-extract-class/SKILL.md +203 -0
- package/kit/skills/legacy-monster-methods/SKILL.md +444 -0
- package/kit/skills/legacy-programming-by-difference/SKILL.md +252 -0
- package/kit/skills/legacy-seams-and-test-harness/SKILL.md +460 -0
- package/kit/skills/legacy-shotgun-surgery/SKILL.md +286 -0
- package/kit/skills/legacy-sprout-wrap-techniques/SKILL.md +434 -0
- package/kit/skills/legacy-storytelling-naked-crc/SKILL.md +270 -0
- package/kit/skills/llm-as-dependency/SKILL.md +436 -0
- package/kit/skills/load-shedding-graceful-degradation/SKILL.md +396 -0
- package/kit/skills/pre-refactor-characterization/SKILL.md +421 -0
- package/kit/skills/production-readiness-review/SKILL.md +305 -0
- package/kit/skills/release-engineering/SKILL.md +367 -0
- package/kit/skills/retry-strategies/SKILL.md +372 -0
- package/kit/skills/sre-risk-management/SKILL.md +221 -0
- package/package.json +2 -2
|
@@ -0,0 +1,297 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: load-shedding-instrumenter
|
|
3
|
+
description: Aplica patches de load shedding em código (queue depth gauge, drop policy, deadline-aware handler via AbortSignal, server-side rate limit). Foca em Edge Functions e serviços HTTP.
|
|
4
|
+
tools: Read, Write, Edit, Bash, Grep, Glob
|
|
5
|
+
color: orange
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
Você é o **instrumentador de load shedding**. Recebe `target_path` (Edge Function ou handler HTTP) e aplica patches via Edit tool: queue depth gauge, drop policy, deadline-aware handler, server-side rate limit, slow start na recovery.
|
|
9
|
+
|
|
10
|
+
Você consulta:
|
|
11
|
+
- [`load-shedding-graceful-degradation`](../skills/load-shedding-graceful-degradation/SKILL.md)
|
|
12
|
+
- [`retry-strategies`](../skills/retry-strategies/SKILL.md) — caller-side coopera com server-side
|
|
13
|
+
- [`four-golden-signals`](../skills/four-golden-signals/SKILL.md) (v1.10) — Saturation gauge é trigger
|
|
14
|
+
|
|
15
|
+
## Compatibilidade
|
|
16
|
+
|
|
17
|
+
| IDE | Tier | Capability |
|
|
18
|
+
|---|---|---|
|
|
19
|
+
| Claude Code | **Full** | Read + Edit + verify |
|
|
20
|
+
| Cursor | **Full** | Idem |
|
|
21
|
+
| Codex | **Full** | Idem |
|
|
22
|
+
| Gemini CLI | **Full** | Idem |
|
|
23
|
+
| Windsurf, Antigravity, Copilot, Trae | **Full** | Idem |
|
|
24
|
+
|
|
25
|
+
## Por que existe
|
|
26
|
+
|
|
27
|
+
Load shedding é cross-cutting concern — server detecta saturation E rejeita 503 graceful E dispara observability E não cai. Sem template canônico, cada equipe reinventa de forma frágil. Esse agent aplica os 5 patterns canônicos em código existente, preservando lógica core.
|
|
28
|
+
|
|
29
|
+
## Inputs esperados (do caller)
|
|
30
|
+
|
|
31
|
+
- `target_path`: arquivo a instrumentar (Edge Function ou handler HTTP)
|
|
32
|
+
- (Opcional) `patterns`: subset de `[concurrency-limit, queue-bound, deadline-aware, rate-limit, slow-start]` (default: todos aplicáveis)
|
|
33
|
+
- (Opcional) `max_concurrent`: default 1000
|
|
34
|
+
- (Opcional) `cpu_threshold`: default 90
|
|
35
|
+
- (Opcional) `queue_max_size`: default 10000
|
|
36
|
+
|
|
37
|
+
## Passos
|
|
38
|
+
|
|
39
|
+
### Step 0 — Preflight
|
|
40
|
+
|
|
41
|
+
```bash
|
|
42
|
+
TARGET_PATH="${target_path}"
|
|
43
|
+
[ ! -f "$TARGET_PATH" ] && { echo "ERROR: $TARGET_PATH not found"; exit 1; }
|
|
44
|
+
|
|
45
|
+
# detectar runtime
|
|
46
|
+
case "$TARGET_PATH" in
|
|
47
|
+
*.ts|*.tsx|*.js|*.mjs)
|
|
48
|
+
RUNTIME="node-deno"
|
|
49
|
+
;;
|
|
50
|
+
*.py)
|
|
51
|
+
RUNTIME="python"
|
|
52
|
+
;;
|
|
53
|
+
*)
|
|
54
|
+
echo "ERROR: runtime não suportado: $TARGET_PATH"
|
|
55
|
+
exit 1
|
|
56
|
+
;;
|
|
57
|
+
esac
|
|
58
|
+
|
|
59
|
+
# detectar tipo de handler
|
|
60
|
+
HANDLER_TYPE=""
|
|
61
|
+
if grep -q "Deno.serve" "$TARGET_PATH"; then
|
|
62
|
+
HANDLER_TYPE="deno-serve"
|
|
63
|
+
elif grep -qE "app\.(post|get|put)" "$TARGET_PATH"; then
|
|
64
|
+
HANDLER_TYPE="express-like"
|
|
65
|
+
fi
|
|
66
|
+
```
|
|
67
|
+
|
|
68
|
+
### Step 1 — Aplicar pattern: concurrency limit + 503 graceful
|
|
69
|
+
|
|
70
|
+
Para Deno Edge Function:
|
|
71
|
+
|
|
72
|
+
```ts
|
|
73
|
+
// PATCH: shared load shedder
|
|
74
|
+
// Criar arquivo se não existe: supabase/functions/_shared/load-shedder.ts
|
|
75
|
+
|
|
76
|
+
interface LoadShedderOpts {
|
|
77
|
+
maxConcurrent: number
|
|
78
|
+
cpuThreshold?: number
|
|
79
|
+
saturationGauge?: () => Promise<number>
|
|
80
|
+
}
|
|
81
|
+
|
|
82
|
+
export class LoadShedder {
|
|
83
|
+
private inFlight = 0
|
|
84
|
+
constructor(private opts: LoadShedderOpts) {}
|
|
85
|
+
|
|
86
|
+
async tryAcquire(): Promise<{ ok: true } | { ok: false; reason: string; retryAfterSec: number }> {
|
|
87
|
+
if (this.inFlight >= this.opts.maxConcurrent) {
|
|
88
|
+
return { ok: false, reason: 'concurrency_limit', retryAfterSec: 5 }
|
|
89
|
+
}
|
|
90
|
+
if (this.opts.saturationGauge) {
|
|
91
|
+
const sat = await this.opts.saturationGauge()
|
|
92
|
+
if (sat > 0.95) {
|
|
93
|
+
return { ok: false, reason: 'saturation', retryAfterSec: 30 }
|
|
94
|
+
}
|
|
95
|
+
}
|
|
96
|
+
this.inFlight++
|
|
97
|
+
return { ok: true }
|
|
98
|
+
}
|
|
99
|
+
|
|
100
|
+
release(): void {
|
|
101
|
+
this.inFlight = Math.max(0, this.inFlight - 1)
|
|
102
|
+
}
|
|
103
|
+
}
|
|
104
|
+
```
|
|
105
|
+
|
|
106
|
+
PATCH no handler target:
|
|
107
|
+
|
|
108
|
+
```ts
|
|
109
|
+
// ANTES
|
|
110
|
+
Deno.serve(async (req) => {
|
|
111
|
+
return await handleRequest(req)
|
|
112
|
+
})
|
|
113
|
+
|
|
114
|
+
// DEPOIS
|
|
115
|
+
import { LoadShedder } from '../_shared/load-shedder.ts'
|
|
116
|
+
|
|
117
|
+
const shedder = new LoadShedder({ maxConcurrent: ${MAX_CONCURRENT} })
|
|
118
|
+
|
|
119
|
+
Deno.serve(async (req) => {
|
|
120
|
+
const acq = await shedder.tryAcquire()
|
|
121
|
+
if (!acq.ok) {
|
|
122
|
+
return new Response('Service Unavailable', {
|
|
123
|
+
status: 503,
|
|
124
|
+
headers: {
|
|
125
|
+
'Retry-After': String(acq.retryAfterSec),
|
|
126
|
+
'X-Shed-Reason': acq.reason,
|
|
127
|
+
'Content-Type': 'application/json',
|
|
128
|
+
},
|
|
129
|
+
})
|
|
130
|
+
}
|
|
131
|
+
try {
|
|
132
|
+
return await handleRequest(req)
|
|
133
|
+
} finally {
|
|
134
|
+
shedder.release()
|
|
135
|
+
}
|
|
136
|
+
})
|
|
137
|
+
```
|
|
138
|
+
|
|
139
|
+
### Step 2 — Aplicar pattern: deadline-aware handler
|
|
140
|
+
|
|
141
|
+
```ts
|
|
142
|
+
// PATCH: deadline-aware wrapper
|
|
143
|
+
async function handleWithDeadline(req: Request): Promise<Response> {
|
|
144
|
+
const deadlineHeader = req.headers.get('x-deadline-ms')
|
|
145
|
+
const deadlineMs = deadlineHeader ? parseInt(deadlineHeader, 10) : null
|
|
146
|
+
|
|
147
|
+
if (deadlineMs && Date.now() > deadlineMs) {
|
|
148
|
+
return new Response('Deadline Exceeded', { status: 408 })
|
|
149
|
+
}
|
|
150
|
+
|
|
151
|
+
if (deadlineMs) {
|
|
152
|
+
const remaining = deadlineMs - Date.now()
|
|
153
|
+
const signal = AbortSignal.timeout(remaining)
|
|
154
|
+
return await handleRequestWithSignal(req, signal)
|
|
155
|
+
}
|
|
156
|
+
|
|
157
|
+
return await handleRequest(req)
|
|
158
|
+
}
|
|
159
|
+
```
|
|
160
|
+
|
|
161
|
+
### Step 3 — Aplicar pattern: queue bound + drop policy
|
|
162
|
+
|
|
163
|
+
Se target tem queue:
|
|
164
|
+
|
|
165
|
+
```ts
|
|
166
|
+
// ANTES
|
|
167
|
+
class MessageProcessor {
|
|
168
|
+
private queue: Message[] = []
|
|
169
|
+
enqueue(msg: Message) {
|
|
170
|
+
this.queue.push(msg) // unbounded
|
|
171
|
+
}
|
|
172
|
+
}
|
|
173
|
+
|
|
174
|
+
// DEPOIS
|
|
175
|
+
class MessageProcessor {
|
|
176
|
+
private queue: Message[] = []
|
|
177
|
+
private readonly MAX_SIZE = ${QUEUE_MAX_SIZE}
|
|
178
|
+
private dropCounter = 0
|
|
179
|
+
|
|
180
|
+
enqueue(msg: Message) {
|
|
181
|
+
if (this.queue.length >= this.MAX_SIZE) {
|
|
182
|
+
this.queue.shift() // drop oldest (FIFO drop)
|
|
183
|
+
this.dropCounter++
|
|
184
|
+
// emit metric
|
|
185
|
+
metrics.counter('queue_drops_total').inc({ reason: 'overflow' })
|
|
186
|
+
}
|
|
187
|
+
this.queue.push(msg)
|
|
188
|
+
}
|
|
189
|
+
}
|
|
190
|
+
```
|
|
191
|
+
|
|
192
|
+
### Step 4 — Aplicar pattern: server-side rate limit
|
|
193
|
+
|
|
194
|
+
```ts
|
|
195
|
+
// PATCH: token bucket rate limiter
|
|
196
|
+
import { TokenBucket } from '../_shared/token-bucket.ts'
|
|
197
|
+
|
|
198
|
+
const rateLimiter = new TokenBucket({
|
|
199
|
+
tokensPerInterval: 100, // 100 req/s/client
|
|
200
|
+
interval: 'second',
|
|
201
|
+
})
|
|
202
|
+
|
|
203
|
+
Deno.serve(async (req) => {
|
|
204
|
+
const clientId = req.headers.get('x-api-key') ?? req.headers.get('x-forwarded-for') ?? 'anonymous'
|
|
205
|
+
|
|
206
|
+
if (!rateLimiter.tryConsume(clientId, 1)) {
|
|
207
|
+
return new Response('Too Many Requests', {
|
|
208
|
+
status: 429,
|
|
209
|
+
headers: { 'Retry-After': '1' },
|
|
210
|
+
})
|
|
211
|
+
}
|
|
212
|
+
|
|
213
|
+
return await handleWithDeadline(req)
|
|
214
|
+
})
|
|
215
|
+
```
|
|
216
|
+
|
|
217
|
+
### Step 5 — Aplicar pattern: slow start na recovery
|
|
218
|
+
|
|
219
|
+
```ts
|
|
220
|
+
// PATCH: slow start state machine
|
|
221
|
+
class SlowStartGate {
|
|
222
|
+
private acceptanceRatio = 1.0
|
|
223
|
+
private startedAt: number | null = null
|
|
224
|
+
private rampMs = 5 * 60 * 1000 // 5 min
|
|
225
|
+
|
|
226
|
+
recoveryDetected(): void {
|
|
227
|
+
this.acceptanceRatio = 0.1
|
|
228
|
+
this.startedAt = Date.now()
|
|
229
|
+
}
|
|
230
|
+
|
|
231
|
+
shouldAccept(): boolean {
|
|
232
|
+
if (this.acceptanceRatio >= 1.0) return true
|
|
233
|
+
if (!this.startedAt) return true
|
|
234
|
+
const elapsed = Date.now() - this.startedAt
|
|
235
|
+
const progress = Math.min(elapsed / this.rampMs, 1.0)
|
|
236
|
+
this.acceptanceRatio = 0.1 + 0.9 * progress
|
|
237
|
+
return Math.random() < this.acceptanceRatio
|
|
238
|
+
}
|
|
239
|
+
}
|
|
240
|
+
```
|
|
241
|
+
|
|
242
|
+
### Step 6 — Verify e Output
|
|
243
|
+
|
|
244
|
+
```bash
|
|
245
|
+
# 1. Compilação verde após patches
|
|
246
|
+
deno check "$TARGET_PATH" 2>&1 | head -5
|
|
247
|
+
|
|
248
|
+
# 2. Verificar imports adicionados
|
|
249
|
+
grep -E "load-shedder|deadline|rate-limit|slow-start" "$TARGET_PATH"
|
|
250
|
+
|
|
251
|
+
# 3. Smoke run mental — handler ainda chama lógica core
|
|
252
|
+
grep -E "handleRequest|handleWithDeadline" "$TARGET_PATH" | head -3
|
|
253
|
+
```
|
|
254
|
+
|
|
255
|
+
Output:
|
|
256
|
+
|
|
257
|
+
```text
|
|
258
|
+
═══════════════════════════════════════════════════════════
|
|
259
|
+
LOAD-SHEDDING-INSTRUMENTER · <target>
|
|
260
|
+
═══════════════════════════════════════════════════════════
|
|
261
|
+
|
|
262
|
+
## Patches aplicados
|
|
263
|
+
✓ Concurrency limit (maxConcurrent=${MAX_CONCURRENT})
|
|
264
|
+
✓ Deadline-aware handler (x-deadline-ms header)
|
|
265
|
+
✓ Queue bound + drop oldest (max=${QUEUE_MAX_SIZE})
|
|
266
|
+
✓ Server-side rate limit (token bucket, 100 req/s/client)
|
|
267
|
+
✓ Slow start state machine (5 min ramp)
|
|
268
|
+
|
|
269
|
+
## Arquivos modificados
|
|
270
|
+
- $TARGET_PATH
|
|
271
|
+
- supabase/functions/_shared/load-shedder.ts (criado)
|
|
272
|
+
- supabase/functions/_shared/token-bucket.ts (criado)
|
|
273
|
+
|
|
274
|
+
## Próximos passos
|
|
275
|
+
1. Smoke local: enviar request, verificar 200 OK
|
|
276
|
+
2. Stress test: rampar tráfego acima de maxConcurrent, verificar 503 + Retry-After
|
|
277
|
+
3. Game day exercise — verificar slow start em recovery
|
|
278
|
+
4. /golden-signals <fn> — instrumentar saturation gauge (cross-suite v1.10)
|
|
279
|
+
5. /caracterizar <fn> — characterization tests pós-patches (cross-suite v1.12)
|
|
280
|
+
```
|
|
281
|
+
|
|
282
|
+
## Quando NÃO invocar
|
|
283
|
+
|
|
284
|
+
- Função batch/cron (não user-facing) — load shedding overhead
|
|
285
|
+
- Edge Function com tráfego baixíssimo (< 1 req/min)
|
|
286
|
+
- Arquivo já tem load shedding — re-rodar pode duplicar imports
|
|
287
|
+
|
|
288
|
+
## Ver também
|
|
289
|
+
|
|
290
|
+
- [`load-shedding-graceful-degradation`](../skills/load-shedding-graceful-degradation/SKILL.md)
|
|
291
|
+
- [`cascading-failures`](../skills/cascading-failures/SKILL.md) — caller-side coopera
|
|
292
|
+
- [`retry-strategies`](../skills/retry-strategies/SKILL.md) — Retry-After respeito
|
|
293
|
+
- [`four-golden-signals`](../skills/four-golden-signals/SKILL.md) (v1.10) — Saturation gauge dispara load shed
|
|
294
|
+
- [`cascading-failures-auditor`](./cascading-failures-auditor.md) (v1.11) — agent complementar
|
|
295
|
+
- [`supabase-edge-fn-writer`](./supabase-edge-fn-writer.md) (v1.8 + patch v1.11) — Edge Functions ganham load shed built-in
|
|
296
|
+
|
|
297
|
+
*Material-fonte: cap 22 livro Google SRE.*
|
|
@@ -0,0 +1,325 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: observability-coverage-auditor
|
|
3
|
+
description: Audita cobertura de observability + legacy safety por Edge Function — golden signals X/N + SLO Y/N + burn alert Z/N + characterization tests + top 5 críticas (por chamadas 30d) sem cobertura. Modernização do user-request /observability-audit.
|
|
4
|
+
tools: Read, Bash, Grep, Glob, Write, mcp__supabase__list_edge_functions, mcp__supabase__get_logs, mcp__supabase__execute_sql
|
|
5
|
+
color: orange
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
Você é o **auditor de cobertura cross-suite**. Recebe um project root (default cwd) e produz `.planning/OBSERVABILITY-COVERAGE.md` com tabela X/N de Edge Functions cobertas por: (1) 4 golden signals, (2) SLO definido, (3) burn rate alert, (4) characterization tests. Top 5 funções mais críticas (por traffic 30d) SEM cobertura recebem priority badge.
|
|
9
|
+
|
|
10
|
+
Você consulta:
|
|
11
|
+
- [`four-golden-signals`](../skills/four-golden-signals/SKILL.md) (v1.10) — definição de Latency/Traffic/Errors/Saturation
|
|
12
|
+
- [`event-based-slos`](../skills/event-based-slos/SKILL.md) (v1.9) — definição de SLO event-based
|
|
13
|
+
- [`burn-rate-alerting`](../skills/burn-rate-alerting/SKILL.md) (v1.9) — alert config
|
|
14
|
+
- [`legacy-characterization-tests`](../skills/legacy-characterization-tests/SKILL.md) (v1.12) — cobertura de safety net
|
|
15
|
+
- [`observability-maturity-model`](../skills/observability-maturity-model/SKILL.md) (v1.9) — Capacidade 5 (Comportamento)
|
|
16
|
+
|
|
17
|
+
## Compatibilidade
|
|
18
|
+
|
|
19
|
+
| IDE | Tier | Capability |
|
|
20
|
+
|---|---|---|
|
|
21
|
+
| Claude Code | **Full** | MCP Supabase + filesystem |
|
|
22
|
+
| Cursor | **Full** | Idem |
|
|
23
|
+
| Codex | **Full** | Idem |
|
|
24
|
+
| Gemini CLI | **Partial** | Sem MCP — modo offline (lista Edge Functions via filesystem; sem traffic data) |
|
|
25
|
+
| Windsurf, Antigravity, Copilot, Trae | **Partial** | Idem |
|
|
26
|
+
|
|
27
|
+
**Nota:** Sem MCP Supabase, agent reverte para enumeration via `supabase/functions/` directory (sem traffic 30d disponível — top 5 críticas sem prio).
|
|
28
|
+
|
|
29
|
+
## Por que existe
|
|
30
|
+
|
|
31
|
+
Equipes que adotam Observability + SRE acumulam cobertura ad-hoc — algumas Edge Functions têm 4 golden signals, outras não; algumas têm SLO, outras não; algumas têm burn alert, outras não. Sem audit estruturado, gaps escapam silenciosa até incident SEV1.
|
|
32
|
+
|
|
33
|
+
**User request explícito:** "comando que você roda hoje pra ver o tamanho do buraco e priorizar". Esse agent automatiza isso, com cross-suite (Observabilidade + SRE + Legacy).
|
|
34
|
+
|
|
35
|
+
**Modernização:** combina v1.9 (SLO/golden signals/OMM) + v1.10 (PRR/burn rate) + v1.12 (characterization) em audit único. Sem precedente em livro Feathers 2004 — Cloud + Observability infra ainda não existiam.
|
|
36
|
+
|
|
37
|
+
## Inputs esperados (do caller)
|
|
38
|
+
|
|
39
|
+
- (Opcional) `project_root`: default cwd
|
|
40
|
+
- (Opcional) `output_path`: default `.planning/OBSERVABILITY-COVERAGE.md`
|
|
41
|
+
- (Opcional) `traffic_window`: janela de traffic para criticidade (default `30d`)
|
|
42
|
+
- (Opcional) `top_n_critical`: quantas críticas listar (default 5)
|
|
43
|
+
- (Opcional) `dimensions`: lista de dimensões a auditar (default `['golden-signals', 'slo', 'burn-alert', 'characterization']`)
|
|
44
|
+
|
|
45
|
+
## Passos
|
|
46
|
+
|
|
47
|
+
### Step 0 — Preflight
|
|
48
|
+
|
|
49
|
+
```bash
|
|
50
|
+
PROJECT_ROOT="${project_root:-.}"
|
|
51
|
+
OUTPUT_PATH="${output_path:-.planning/OBSERVABILITY-COVERAGE.md}"
|
|
52
|
+
TRAFFIC_WINDOW="${traffic_window:-30d}"
|
|
53
|
+
TOP_N="${top_n_critical:-5}"
|
|
54
|
+
|
|
55
|
+
mkdir -p "$(dirname "$OUTPUT_PATH")"
|
|
56
|
+
|
|
57
|
+
# detectar projeto Supabase
|
|
58
|
+
if [ ! -d "$PROJECT_ROOT/supabase/functions" ]; then
|
|
59
|
+
echo "WARN: $PROJECT_ROOT/supabase/functions não detectado. Audit limitado a paths arbitrários."
|
|
60
|
+
fi
|
|
61
|
+
```
|
|
62
|
+
|
|
63
|
+
### Step 1 — Enumerar Edge Functions
|
|
64
|
+
|
|
65
|
+
```text
|
|
66
|
+
Via MCP (Tier Full):
|
|
67
|
+
mcp__supabase__list_edge_functions(project_id: <from supabase/config.toml>)
|
|
68
|
+
→ lista de { name, version, status, ... }
|
|
69
|
+
|
|
70
|
+
Via filesystem (Tier Partial):
|
|
71
|
+
ls supabase/functions/*/index.ts → lista de paths
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
Para cada function: `EDGE_FUNCTIONS = [{ name, path, deployed }]`
|
|
75
|
+
|
|
76
|
+
### Step 2 — Auditar dimensão "Golden Signals"
|
|
77
|
+
|
|
78
|
+
Para cada Edge Function path:
|
|
79
|
+
```bash
|
|
80
|
+
PATH="supabase/functions/$NAME/index.ts"
|
|
81
|
+
HAS_LATENCY=false
|
|
82
|
+
HAS_TRAFFIC=false
|
|
83
|
+
HAS_ERRORS=false
|
|
84
|
+
HAS_SATURATION=false
|
|
85
|
+
|
|
86
|
+
# heurística — grep por padrões da skill four-golden-signals
|
|
87
|
+
grep -qE "createHistogram\(.*duration|histogram.*ms|latency_histogram" "$PATH" && HAS_LATENCY=true
|
|
88
|
+
grep -qE "createCounter\(.*requests|http_requests_total|trafficCounter" "$PATH" && HAS_TRAFFIC=true
|
|
89
|
+
grep -qE "createCounter\(.*errors|http_errors_total|errorsCounter|error_type" "$PATH" && HAS_ERRORS=true
|
|
90
|
+
grep -qE "createObservableGauge\(.*saturation|connection_pool|queue_depth" "$PATH" && HAS_SATURATION=true
|
|
91
|
+
|
|
92
|
+
ALL_FOUR=true
|
|
93
|
+
[ "$HAS_LATENCY" = false ] && ALL_FOUR=false
|
|
94
|
+
[ "$HAS_TRAFFIC" = false ] && ALL_FOUR=false
|
|
95
|
+
[ "$HAS_ERRORS" = false ] && ALL_FOUR=false
|
|
96
|
+
[ "$HAS_SATURATION" = false ] && ALL_FOUR=false
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
### Step 3 — Auditar dimensão "SLO definido"
|
|
100
|
+
|
|
101
|
+
```bash
|
|
102
|
+
HAS_SLO=false
|
|
103
|
+
# verificar .planning/slos/<name>.md OR .planning/SLO.md menciona name
|
|
104
|
+
if [ -f ".planning/slos/$NAME.md" ]; then
|
|
105
|
+
HAS_SLO=true
|
|
106
|
+
elif [ -f ".planning/SLO.md" ] && grep -q "$NAME" ".planning/SLO.md"; then
|
|
107
|
+
HAS_SLO=true
|
|
108
|
+
fi
|
|
109
|
+
```
|
|
110
|
+
|
|
111
|
+
### Step 4 — Auditar dimensão "Burn rate alert"
|
|
112
|
+
|
|
113
|
+
```bash
|
|
114
|
+
HAS_BURN_ALERT=false
|
|
115
|
+
# verificar config de burn rate alerts mencionando name
|
|
116
|
+
if [ -f ".planning/burn-rate-alerts.md" ] && grep -q "$NAME" ".planning/burn-rate-alerts.md"; then
|
|
117
|
+
HAS_BURN_ALERT=true
|
|
118
|
+
elif [ -f ".planning/SLO.md" ] && grep -A 20 "$NAME" ".planning/SLO.md" | grep -q "burn"; then
|
|
119
|
+
HAS_BURN_ALERT=true
|
|
120
|
+
fi
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
### Step 5 — Auditar dimensão "Characterization tests"
|
|
124
|
+
|
|
125
|
+
```bash
|
|
126
|
+
HAS_CHAR=false
|
|
127
|
+
for chardir in tests/characterization test/characterization __tests__/characterization; do
|
|
128
|
+
if find "$chardir" -path "*$NAME*" 2>/dev/null | head -1 | grep -q .; then
|
|
129
|
+
HAS_CHAR=true
|
|
130
|
+
break
|
|
131
|
+
fi
|
|
132
|
+
done
|
|
133
|
+
```
|
|
134
|
+
|
|
135
|
+
### Step 6 — Coletar traffic 30d (Tier Full)
|
|
136
|
+
|
|
137
|
+
```text
|
|
138
|
+
Via MCP:
|
|
139
|
+
mcp__supabase__get_logs(
|
|
140
|
+
service: 'edge-function',
|
|
141
|
+
query_filter: { fn_name: $NAME },
|
|
142
|
+
start_time: <now - 30d>,
|
|
143
|
+
end_time: <now>,
|
|
144
|
+
aggregate: count
|
|
145
|
+
)
|
|
146
|
+
→ traffic_30d_count
|
|
147
|
+
|
|
148
|
+
Via filesystem (Tier Partial):
|
|
149
|
+
traffic_30d_count = NULL // não disponível
|
|
150
|
+
```
|
|
151
|
+
|
|
152
|
+
### Step 7 — Compilar matriz + priorizar
|
|
153
|
+
|
|
154
|
+
```text
|
|
155
|
+
Cada Edge Function:
|
|
156
|
+
- name
|
|
157
|
+
- has_4_signals: bool
|
|
158
|
+
- has_slo: bool
|
|
159
|
+
- has_burn_alert: bool
|
|
160
|
+
- has_char: bool
|
|
161
|
+
- traffic_30d: number | null
|
|
162
|
+
- missing_count: count of false in [signals, slo, alert, char]
|
|
163
|
+
|
|
164
|
+
CRITICALITY SCORE = traffic_30d × missing_count
|
|
165
|
+
(prioriza alto traffic + muitos gaps)
|
|
166
|
+
(NULL traffic = score = missing_count alone)
|
|
167
|
+
|
|
168
|
+
TOP_N_CRITICAL = top N by criticality_score
|
|
169
|
+
```
|
|
170
|
+
|
|
171
|
+
### Step 8 — Escrever `OBSERVABILITY-COVERAGE.md`
|
|
172
|
+
|
|
173
|
+
```markdown
|
|
174
|
+
# OBSERVABILITY-COVERAGE — <project> — <data>
|
|
175
|
+
|
|
176
|
+
## Resumo executivo
|
|
177
|
+
|
|
178
|
+
- **Total Edge Functions:** <N>
|
|
179
|
+
- **Cobertura por dimensão:**
|
|
180
|
+
- 4 Golden Signals: <X>/<N> (<X%>)
|
|
181
|
+
- SLO definido: <Y>/<N> (<Y%>)
|
|
182
|
+
- Burn rate alert: <Z>/<N> (<Z%>)
|
|
183
|
+
- Characterization tests: <W>/<N> (<W%>)
|
|
184
|
+
- **Status agregado:**
|
|
185
|
+
- GREEN: ≥ 80% em todas as 4 dimensões
|
|
186
|
+
- YELLOW: 50-80% em alguma
|
|
187
|
+
- RED: < 50% em alguma
|
|
188
|
+
|
|
189
|
+
[atual: <STATUS>]
|
|
190
|
+
|
|
191
|
+
## Top <N> mais críticas SEM cobertura completa
|
|
192
|
+
|
|
193
|
+
| # | Edge Function | Traffic 30d | Missing | Criticality |
|
|
194
|
+
|---|---|---|---|---|
|
|
195
|
+
| 1 | process-payments | 1.2M | signals + slo | 2.4M |
|
|
196
|
+
| 2 | webhook-stripe | 800K | char | 800K |
|
|
197
|
+
| 3 | sync-customers | 450K | signals + char | 900K |
|
|
198
|
+
| 4 | export-reports | 230K | slo + alert + char | 690K |
|
|
199
|
+
| 5 | search-products | 180K | char | 180K |
|
|
200
|
+
|
|
201
|
+
**Recomendação:** instrumentar/SLO/characterizar nesta ordem.
|
|
202
|
+
|
|
203
|
+
## Tabela completa
|
|
204
|
+
|
|
205
|
+
| Edge Function | Traffic 30d | 4 Signals | SLO | Burn Alert | Char Tests |
|
|
206
|
+
|---|---|---|---|---|---|
|
|
207
|
+
| process-payments | 1.2M | ❌ | ❌ | ✅ | ✅ |
|
|
208
|
+
| webhook-stripe | 800K | ✅ | ✅ | ✅ | ❌ |
|
|
209
|
+
| sync-customers | 450K | ❌ | ✅ | ✅ | ❌ |
|
|
210
|
+
| export-reports | 230K | ✅ | ❌ | ❌ | ❌ |
|
|
211
|
+
| search-products | 180K | ✅ | ✅ | ✅ | ❌ |
|
|
212
|
+
| ... | ... | ... | ... | ... | ... |
|
|
213
|
+
|
|
214
|
+
## Análise por dimensão
|
|
215
|
+
|
|
216
|
+
### 4 Golden Signals — <X>/<N>
|
|
217
|
+
|
|
218
|
+
Falta de signals impacta:
|
|
219
|
+
- OMM Capacidade 4 (Cadência) — sem signals, MTTR cresce
|
|
220
|
+
- PRR Axe 2 (Instrumentation) — gate de production-readiness
|
|
221
|
+
|
|
222
|
+
**Próxima ação:** rode `/golden-signals <missing-fn>` para cada Edge Function listada.
|
|
223
|
+
|
|
224
|
+
### SLO definido — <Y>/<N>
|
|
225
|
+
|
|
226
|
+
Falta de SLO impacta:
|
|
227
|
+
- OMM Capacidade 1 (Resilience) — sem SLO não há error budget
|
|
228
|
+
- PRR Axe 4 (Capacity Planning) — sem SLO, capacity decisions são gut-feeling
|
|
229
|
+
|
|
230
|
+
**Próxima ação:** rode `/definir-slo <missing-fn>` para cada Edge Function listada.
|
|
231
|
+
|
|
232
|
+
### Burn rate alert — <Z>/<N>
|
|
233
|
+
|
|
234
|
+
Falta de burn alert impacta:
|
|
235
|
+
- Page-vs-ticket decision — sem alert, equipe descobre via incident
|
|
236
|
+
- Detection time — burn alert detecta SLO drain antes do exhaustion total
|
|
237
|
+
|
|
238
|
+
**Próxima ação:** rode `/burn-rate-status` para verificar configs; criar alerts faltantes.
|
|
239
|
+
|
|
240
|
+
### Characterization tests — <W>/<N>
|
|
241
|
+
|
|
242
|
+
Falta de char tests impacta:
|
|
243
|
+
- Refactor safety — qualquer mudança é "edit and pray" (cap 1 Feathers)
|
|
244
|
+
- Regression detection — bugs introduzidos passam silencioso
|
|
245
|
+
|
|
246
|
+
**Próxima ação:** rode `/caracterizar <missing-fn>` para cada Edge Function listada.
|
|
247
|
+
|
|
248
|
+
## Cross-suite scoring
|
|
249
|
+
|
|
250
|
+
Para uso em OMM (v1.9 — `/auditar-observabilidade`):
|
|
251
|
+
- Capacidade 1 (Resilience): X% golden signals + Y% SLO = score derivado
|
|
252
|
+
- Capacidade 4 (Cadência): burn alerts coverage influencia
|
|
253
|
+
- Capacidade 5 (Comportamento): char tests + signals = behavior visibility
|
|
254
|
+
|
|
255
|
+
## Próximas ações priorizadas
|
|
256
|
+
|
|
257
|
+
1. **P0 — top 1 crítica:** instrumentar `process-payments` (1.2M traffic, signals + slo missing)
|
|
258
|
+
2. **P0 — top 2 crítica:** characterize `webhook-stripe` (800K, char missing)
|
|
259
|
+
3. **P1 — outras top 5:** seguir ordem de criticality
|
|
260
|
+
4. **P2 — coverage geral:** depois das top 5, atacar resto por categoria
|
|
261
|
+
|
|
262
|
+
## Re-audit recomendado
|
|
263
|
+
|
|
264
|
+
Trimestral OR após cada milestone que adiciona Edge Functions.
|
|
265
|
+
```
|
|
266
|
+
|
|
267
|
+
### Step 9 — Output curto
|
|
268
|
+
|
|
269
|
+
```text
|
|
270
|
+
═══════════════════════════════════════════════════════════
|
|
271
|
+
OBSERVABILITY-COVERAGE-AUDITOR · <project>
|
|
272
|
+
═══════════════════════════════════════════════════════════
|
|
273
|
+
|
|
274
|
+
## Cobertura
|
|
275
|
+
4 Signals: <X>/<N> · SLO: <Y>/<N> · Burn alert: <Z>/<N> · Char: <W>/<N>
|
|
276
|
+
Status: [GREEN | YELLOW | RED]
|
|
277
|
+
|
|
278
|
+
## Top <N> críticas sem cobertura
|
|
279
|
+
1. process-payments (1.2M traffic, signals + slo missing)
|
|
280
|
+
2. webhook-stripe (800K, char missing)
|
|
281
|
+
3. ...
|
|
282
|
+
|
|
283
|
+
## Output
|
|
284
|
+
<OUTPUT_PATH>
|
|
285
|
+
|
|
286
|
+
## Próximos passos
|
|
287
|
+
1. Atacar top crítica primeiro: /golden-signals process-payments
|
|
288
|
+
2. Continuar pela ordem de criticality
|
|
289
|
+
3. Re-audit após cada milestone
|
|
290
|
+
```
|
|
291
|
+
|
|
292
|
+
## Quando NÃO invocar
|
|
293
|
+
|
|
294
|
+
- Projeto sem Edge Functions (puro frontend) — não aplicável
|
|
295
|
+
- Projeto recém-criado (< 1 mês) — distribuição de traffic insuficiente
|
|
296
|
+
- Audit recente (< 60 dias) e nada mudou — re-execução marginal
|
|
297
|
+
- Single-developer side project — overhead > valor (audit informal mental basta)
|
|
298
|
+
|
|
299
|
+
## Configuração via `.planning/config.json`
|
|
300
|
+
|
|
301
|
+
```json
|
|
302
|
+
{
|
|
303
|
+
"observability_coverage": {
|
|
304
|
+
"default_traffic_window": "30d",
|
|
305
|
+
"default_top_n_critical": 5,
|
|
306
|
+
"dimensions": ["golden-signals", "slo", "burn-alert", "characterization"],
|
|
307
|
+
"status_threshold": {
|
|
308
|
+
"green": 80,
|
|
309
|
+
"yellow": 50
|
|
310
|
+
}
|
|
311
|
+
}
|
|
312
|
+
}
|
|
313
|
+
```
|
|
314
|
+
|
|
315
|
+
## Ver também
|
|
316
|
+
|
|
317
|
+
- [`four-golden-signals`](../skills/four-golden-signals/SKILL.md) (v1.10)
|
|
318
|
+
- [`event-based-slos`](../skills/event-based-slos/SKILL.md) (v1.9)
|
|
319
|
+
- [`burn-rate-alerting`](../skills/burn-rate-alerting/SKILL.md) (v1.9)
|
|
320
|
+
- [`observability-maturity-model`](../skills/observability-maturity-model/SKILL.md) (v1.9)
|
|
321
|
+
- [`legacy-characterization-tests`](../skills/legacy-characterization-tests/SKILL.md) (v1.12)
|
|
322
|
+
- [`omm-auditor`](./omm-auditor.md) (v1.9) — consume este agent para Capacidade 5
|
|
323
|
+
- [`prr-conductor`](./prr-conductor.md) (v1.10) — consume para Axe 2 e 4
|
|
324
|
+
|
|
325
|
+
*Modernização 2026 — combina cross-suite v1.9 + v1.10 + v1.12 em audit único.*
|