@luanpdd/kit-mcp 1.10.0 → 1.12.0

Files changed (63)
  1. package/gates/ai-prompt-stability.md +120 -0
  2. package/gates/legacy-refactor-safety.md +178 -0
  3. package/gates/observability-coverage.md +151 -0
  4. package/gates/release-pipeline-policy.md +132 -0
  5. package/kit/COMANDOS.md +15 -0
  6. package/kit/agents/ai-mutation-tester.md +298 -0
  7. package/kit/agents/cascading-failures-auditor.md +306 -0
  8. package/kit/agents/executor.md +13 -0
  9. package/kit/agents/legacy-characterizer.md +378 -0
  10. package/kit/agents/load-shedding-instrumenter.md +297 -0
  11. package/kit/agents/observability-coverage-auditor.md +325 -0
  12. package/kit/agents/omm-auditor.md +47 -0
  13. package/kit/agents/payload-capture-instrumenter.md +283 -0
  14. package/kit/agents/planner.md +29 -0
  15. package/kit/agents/prr-conductor.md +8 -0
  16. package/kit/agents/refactor-safety-auditor.md +414 -0
  17. package/kit/agents/release-pipeline-auditor.md +360 -0
  18. package/kit/agents/seam-finder.md +367 -0
  19. package/kit/agents/shotgun-surgery-detector.md +359 -0
  20. package/kit/agents/storytelling-analyst.md +309 -0
  21. package/kit/agents/supabase-edge-fn-writer.md +12 -0
  22. package/kit/agents/verifier.md +30 -0
  23. package/kit/commands/auditar-cascading.md +111 -0
  24. package/kit/commands/auditar-marco.md +44 -1
  25. package/kit/commands/auditar-observabilidade-cobertura.md +183 -0
  26. package/kit/commands/auditar-refactor.md +219 -0
  27. package/kit/commands/auditar-release.md +109 -0
  28. package/kit/commands/capturar-payloads.md +193 -0
  29. package/kit/commands/caracterizar-prompt.md +195 -0
  30. package/kit/commands/caracterizar.md +212 -0
  31. package/kit/commands/concluir-marco.md +41 -1
  32. package/kit/commands/detectar-duplicacao.md +197 -0
  33. package/kit/commands/discutir-fase.md +41 -0
  34. package/kit/commands/encontrar-seams.md +136 -0
  35. package/kit/commands/forense.md +40 -1
  36. package/kit/commands/legacy.md +263 -0
  37. package/kit/commands/load-shedding.md +117 -0
  38. package/kit/commands/observabilidade.md +2 -0
  39. package/kit/commands/refactor-seguro.md +321 -0
  40. package/kit/commands/sre.md +3 -0
  41. package/kit/commands/storytelling.md +179 -0
  42. package/kit/skills/_shared-legacy/glossary.md +389 -0
  43. package/kit/skills/_shared-sre/glossary.md +139 -0
  44. package/kit/skills/ai-prompt-characterization/SKILL.md +335 -0
  45. package/kit/skills/cascading-failures/SKILL.md +307 -0
  46. package/kit/skills/four-golden-signals/SKILL.md +17 -0
  47. package/kit/skills/hermetic-builds/SKILL.md +323 -0
  48. package/kit/skills/legacy-api-only-applications/SKILL.md +358 -0
  49. package/kit/skills/legacy-characterization-tests/SKILL.md +330 -0
  50. package/kit/skills/legacy-effect-analysis/SKILL.md +331 -0
  51. package/kit/skills/legacy-extract-class/SKILL.md +203 -0
  52. package/kit/skills/legacy-monster-methods/SKILL.md +444 -0
  53. package/kit/skills/legacy-programming-by-difference/SKILL.md +252 -0
  54. package/kit/skills/legacy-seams-and-test-harness/SKILL.md +460 -0
  55. package/kit/skills/legacy-shotgun-surgery/SKILL.md +286 -0
  56. package/kit/skills/legacy-sprout-wrap-techniques/SKILL.md +434 -0
  57. package/kit/skills/legacy-storytelling-naked-crc/SKILL.md +270 -0
  58. package/kit/skills/llm-as-dependency/SKILL.md +436 -0
  59. package/kit/skills/load-shedding-graceful-degradation/SKILL.md +396 -0
  60. package/kit/skills/pre-refactor-characterization/SKILL.md +421 -0
  61. package/kit/skills/release-engineering/SKILL.md +367 -0
  62. package/kit/skills/retry-strategies/SKILL.md +372 -0
  63. package/package.json +2 -2
@@ -0,0 +1,396 @@
+ ---
+ name: load-shedding-graceful-degradation
+ description: Use when instrumenting server-side defenses (load shedding via 503 rejection, queue management, deadline-aware handlers, graceful degradation modes). Google SRE book, chapter 22.
+ ---
+
+ # SRE: Load Shedding & Graceful Degradation
+
+ ## When to use
+
+ The LLM loads this skill when designing or instrumenting a user-facing handler or service that must protect itself from overload. Trigger phrases:
+
+ - "load shedding", "descarte de carga"
+ - "graceful degradation", "modo degradado"
+ - "queue management", "drop policy"
+ - "503 Service Unavailable", "Retry-After"
+ - "rate limit server-side"
+ - "deadline-aware handler"
+
+ ## Absolute rules
+
+ - **Load shedding > queueing > crashing.** When saturated, REJECTING (503) is better than queueing (the queue grows without bound), which in turn is better than crashing (every request fails).
+ - **ALWAYS send 503 with a Retry-After header.** Without Retry-After, clients retry immediately and cause a retry storm.
+ - **DEFAULT drop policy: drop oldest.** Dropping the oldest queued request (FIFO drop) keeps latency low for new requests; dropping the newest (LIFO drop) rejects fresh arrivals while stale requests, whose callers may already have given up, keep occupying the queue.
+ - **Queue depth must be bounded.** An unbounded queue eventually ends in OOM. With a bound, drop when it is exceeded and keep observability on the drop rate.
+ - **Degraded mode is a DESIGN-TIME decision, not improvisation.** Decide upfront what is acceptable (stale cache, default values, a simplified algorithm), not in the middle of an incident.
+ - **Deadline-aware handler.** Before processing, check that the request is still relevant (deadline > now). If it is not, abort rather than waste compute on an abandoned request.
+ - **Per-handler concurrency limit.** A semaphore with a limit; exceeding it returns 503. This protects the resources the handler depends on (DB connection pool, memory, CPU).
+
+ ## Canonical patterns
+
+ ### Pattern 1: Saturation-aware load shedder
+
+ ```ts
+ // Canonical server-side load shedder.
+ // getCpuUsage() and getQueueDepthRatio() are not defined in this snippet;
+ // they are assumed to be provided by the host application.
+ class LoadShedder {
+   private inFlight = 0
+   private readonly maxConcurrent: number
+   private readonly cpuThreshold: number
+   private readonly queueDepthThreshold: number
+
+   constructor(opts: { maxConcurrent: number; cpuThreshold?: number; queueDepthThreshold?: number }) {
+     this.maxConcurrent = opts.maxConcurrent
+     this.cpuThreshold = opts.cpuThreshold ?? 90 // percent
+     this.queueDepthThreshold = opts.queueDepthThreshold ?? 0.95 // ratio, 0..1
+   }
+
+   async tryAcquire(): Promise<{ ok: true } | { ok: false; reason: string; retryAfterSec: number }> {
+     // Check 1: concurrency limit
+     if (this.inFlight >= this.maxConcurrent) {
+       return { ok: false, reason: 'concurrency_limit', retryAfterSec: 5 }
+     }
+     // Reserve the slot before the first await; otherwise concurrent callers
+     // could all pass Check 1 while the probes below are still pending.
+     this.inFlight++
+     let admitted = false
+     try {
+       // Check 2: CPU saturation (last 30 s)
+       const cpu = await this.getCpuUsage()
+       if (cpu > this.cpuThreshold) {
+         return { ok: false, reason: 'cpu_saturation', retryAfterSec: 10 }
+       }
+
+       // Check 3: queue depth (downstream such as pgmq, redis, kafka)
+       const queueRatio = await this.getQueueDepthRatio()
+       if (queueRatio > this.queueDepthThreshold) {
+         return { ok: false, reason: 'queue_depth', retryAfterSec: 30 }
+       }
+
+       admitted = true
+       return { ok: true }
+     } finally {
+       // Give the slot back if the request was rejected or a probe threw.
+       if (!admitted) this.release()
+     }
+   }
+
+   release(): void {
+     this.inFlight = Math.max(0, this.inFlight - 1)
+   }
+ }
+
+ // Usage in a handler
+ const shedder = new LoadShedder({ maxConcurrent: 1000, cpuThreshold: 90 })
+
+ Deno.serve(async (req) => {
+   const acq = await shedder.tryAcquire()
+   if (!acq.ok) {
+     return new Response('Service Unavailable', {
+       status: 503,
+       headers: { 'Retry-After': String(acq.retryAfterSec), 'X-Shed-Reason': acq.reason },
+     })
+   }
+   try {
+     return await handleRequest(req)
+   } finally {
+     shedder.release()
+   }
+ })
+ ```
+
+ ### Pattern 2: Queue drop policies
+
+ ```text
+ DROP OLDEST (FIFO drop), RECOMMENDED DEFAULT
+ ============================================
+ on enqueue, when queue.size >= limit:
+   queue.shift()      // remove the first (oldest) element
+   queue.push(new)
+ Pros: new requests keep flowing; predictable latency for new arrivals
+ Cons: old requests are lost (they had already waited a long time)
+ Use case: webhooks, fast events, anything time-sensitive
+
+ DROP NEWEST (LIFO drop)
+ =======================
+ on enqueue, when queue.size >= limit:
+   REJECT new request (don't push)
+ Pros: already-queued requests are not discarded
+ Cons: new requests starve if traffic stays high; latency grows
+ Use case: batch processing where age matters (FIFO required)
+
+ DROP RANDOM
+ ===========
+ on enqueue, when queue.size >= limit:
+   randomly drop an existing element OR the new one
+ Pros: statistical fairness
+ Cons: harder to reason about; less predictable latency
+ Use case: research/exploratory, rarely production
+
+ DROP BY PRIORITY
+ ================
+ on enqueue, when queue.size >= limit:
+   drop lowest-priority element
+ Pros: supports tiered customer SLAs; high-priority requests preserved
+ Cons: requires a priority taxonomy; added complexity
+ Use case: multi-tenant with tiers (Free/Pro/Enterprise)
+ ```
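The first two policies above can be sketched as a small in-memory queue. This is a minimal illustration, not part of the package: `BoundedQueue`, `DropPolicy`, `EnqueueResult` and `droppedCount` are hypothetical names, and a real deployment would more likely rely on the bounding features of its queueing system (pgmq, Redis, Kafka).

```typescript
// Hypothetical sketch: names below are illustrative, not from the package.
type DropPolicy = 'drop-oldest' | 'drop-newest'

interface EnqueueResult<T> {
  accepted: boolean // was the new item stored?
  dropped: T | null // item discarded to respect the bound, if any
}

class BoundedQueue<T> {
  private items: T[] = []
  private readonly limit: number
  private readonly policy: DropPolicy
  droppedCount = 0 // export as a metric to observe the drop rate

  constructor(limit: number, policy: DropPolicy) {
    if (limit < 1) throw new Error('limit must be >= 1')
    this.limit = limit
    this.policy = policy
  }

  enqueue(item: T): EnqueueResult<T> {
    // Room available: store the item, nothing is dropped.
    if (this.items.length < this.limit) {
      this.items.push(item)
      return { accepted: true, dropped: null }
    }
    this.droppedCount++
    // Drop newest: reject the arriving item and keep the queue untouched.
    if (this.policy === 'drop-newest') {
      return { accepted: false, dropped: item }
    }
    // Drop oldest: evict the head, then store the arriving item.
    const oldest = this.items.shift() as T
    this.items.push(item)
    return { accepted: true, dropped: oldest }
  }

  dequeue(): T | undefined {
    return this.items.shift()
  }

  get size(): number {
    return this.items.length
  }
}
```

With `drop-newest`, a caller would typically turn `accepted: false` into a 503 with Retry-After; with `drop-oldest`, the evicted item is returned so its waiting caller can be answered with an error instead of being left hanging.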
+
+ ### Pattern 3: Degraded mode design
+
+ ```ts
+ // Degraded mode is a design-time decision, not improvisation.
+ // Db, InventoryApi, ReviewsApi, Personalizer, Cache, Product and withTimeout
+ // are assumed to be defined elsewhere in the application.
+ interface ProductService {
+   getProduct(id: string): Promise<Product>
+ }
+
+ // NORMAL mode: full features
+ class FullProductService implements ProductService {
+   constructor(
+     private db: Db,
+     private inventoryApi: InventoryApi,
+     private reviewsApi: ReviewsApi,
+     private personalizer: Personalizer,
+   ) {}
+
+   async getProduct(id: string): Promise<Product> {
+     const [base, inv, reviews, personalized] = await Promise.all([
+       this.db.fetch(id),
+       this.inventoryApi.fetch(id),
+       this.reviewsApi.fetch(id),
+       this.personalizer.fetch(id),
+     ])
+     return { ...base, inventory: inv, reviews, personalized }
+   }
+ }
+
+ // DEGRADED mode: minimal features when dependencies are down
+ class DegradedProductService implements ProductService {
+   constructor(private db: Db, private cache: Cache) {}
+
+   async getProduct(id: string): Promise<Product> {
+     const cached = await this.cache.get(`product:${id}`)
+     if (cached) return cached // stale data is OK in degraded mode
+
+     const base = await this.db.fetch(id)
+     return {
+       ...base,
+       inventory: { available: 'unknown' }, // placeholder
+       reviews: [],
+       personalized: null,
+     }
+   }
+ }
+
+ // Selector: switches based on health
+ class HealthAwareProductService implements ProductService {
+   constructor(
+     private full: FullProductService,
+     private degraded: DegradedProductService,
+     private healthCheck: () => boolean,
+   ) {}
+
+   async getProduct(id: string): Promise<Product> {
+     if (this.healthCheck()) {
+       try {
+         return await withTimeout(this.full.getProduct(id), 1000)
+       } catch (e) {
+         // Fall back to degraded; log here and raise an alert.
+         console.error('full path failed, serving degraded response', e)
+         return this.degraded.getProduct(id)
+       }
+     }
+     return this.degraded.getProduct(id)
+   }
+ }
+ ```
+
+ **Principle:** degraded mode is EXERCISED in production (1% of traffic always goes through it). When it has to go to 100%, the switch is a tested transition, not an improvisation.
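One way to keep the degraded path exercised is a sampling decision in front of the selector. The sketch below is an assumption about how that could look, not the package's implementation: `pickMode`, `Mode` and the injectable `rand` parameter are hypothetical names, and the 1% rate is taken from the principle above.

```typescript
// Hypothetical sketch: pickMode and Mode are illustrative names.
type Mode = 'full' | 'degraded'

// Returns which code path should serve a request.
// - Unhealthy dependencies: always degraded.
// - Healthy: route roughly `sampleRate` of the traffic through the degraded
//   path anyway, so that path is continuously exercised in production.
// `rand` is injectable so the decision can be tested deterministically.
function pickMode(
  healthy: boolean,
  sampleRate: number,
  rand: () => number = Math.random,
): Mode {
  if (!healthy) return 'degraded'
  return rand() < sampleRate ? 'degraded' : 'full'
}

// Example: about 1% of healthy traffic takes the degraded path.
const mode: Mode = pickMode(true, 0.01)
console.log(`serving request in ${mode} mode`)
```

Random sampling is the simplest choice; hashing a stable request or user identifier instead would keep a given user consistently on one path, at the cost of needing such an identifier.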
+
+ ### Pattern 4: Deadline-aware handler
+
+ ```ts
+ // The handler aborts early if the deadline has already passed.
+ // handleRequest and handleRequestWithSignal are assumed to exist elsewhere.
+ async function deadlineAwareHandler(req: Request): Promise<Response> {
+   const deadlineMs = parseDeadlineHeader(req) // x-deadline-ms or similar
+   if (deadlineMs === null) {
+     return handleRequest(req) // no valid deadline declared: default behavior
+   }
+
+   // Check 1: deadline already passed BEFORE processing
+   const remaining = deadlineMs - Date.now()
+   if (remaining <= 0) {
+     return new Response('Deadline Exceeded', {
+       status: 408, // Request Timeout
+       headers: { 'X-Deadline-Exceeded': 'true' },
+     })
+   }
+
+   // Check 2: use an AbortSignal to abort if the deadline passes during processing
+   const signal = AbortSignal.timeout(remaining)
+
+   try {
+     return await handleRequestWithSignal(req, signal)
+   } catch (e) {
+     if (signal.aborted) {
+       return new Response('Deadline Exceeded', { status: 408 })
+     }
+     throw e
+   }
+ }
+
+ function parseDeadlineHeader(req: Request): number | null {
+   const h = req.headers.get('x-deadline-ms')
+   if (!h) return null
+   const ms = Number.parseInt(h, 10) // Unix epoch, in milliseconds
+   return Number.isFinite(ms) ? ms : null // malformed header: treat as absent
+ }
+ ```
+
+ ### Pattern 5: Slow start on recovery
+
+ ```ts
+ // After the service recovers, accept traffic gradually.
+ class SlowStartLoadBalancer {
+   private acceptanceRatio = 0.0
+   private startedAt: number | null = null
+   private readonly rampDurationMs: number = 5 * 60 * 1000 // 5 min
+
+   recoveryDetected(): void {
+     this.acceptanceRatio = 0.1
+     this.startedAt = Date.now()
+   }
+
+   shouldAccept(): boolean {
+     if (this.acceptanceRatio >= 1.0) return true
+     if (this.startedAt === null) return true // no recovery in progress
+
+     const elapsed = Date.now() - this.startedAt
+     const progress = Math.min(elapsed / this.rampDurationMs, 1.0)
+
+     // ramp: 10% → 100% over rampDurationMs
+     this.acceptanceRatio = 0.1 + (0.9 * progress)
+
+     return Math.random() < this.acceptanceRatio
+   }
+ }
+
+ // In a handler
+ const slowStart = new SlowStartLoadBalancer()
+
+ Deno.serve(async (req) => {
+   if (!slowStart.shouldAccept()) {
+     return new Response('Service Recovering', {
+       status: 503,
+       headers: { 'Retry-After': '30' },
+     })
+   }
+   return handleRequest(req)
+ })
+ ```
+
+ ### Pattern 6: Tiered shedding (feature flags)
+
+ Different features have different priorities:
+
+ ```ts
+ // Critical features are always served; nice-to-have features are switched off under load.
+ // CRITICAL_PATHS, IMPORTANT_PATHS, getCpuLoad and the handle* functions are
+ // assumed to be defined elsewhere.
+ async function handleRequest(req: Request): Promise<Response> {
+   const path = new URL(req.url).pathname
+   const cpuLoad = await getCpuLoad()
+
+   // Path 1: critical (login, checkout), always served
+   if (CRITICAL_PATHS.includes(path)) {
+     return handleCritical(req)
+   }
+
+   // Path 2: important (browse, search), degraded mode above 70%
+   if (IMPORTANT_PATHS.includes(path)) {
+     if (cpuLoad > 70) {
+       return handleDegraded(req) // no personalization, no ML ranking
+     }
+     return handleNormal(req)
+   }
+
+   // Path 3: nice-to-have (recommendations, A/B experiments), switched off above 80%
+   if (cpuLoad > 80) {
+     return new Response(null, { status: 204 }) // No Content; the UI handles it
+   }
+   return handleNiceToHave(req)
+ }
+ ```
+
+ ## Anti-patterns
+
+ ### ANTI: unbounded queue
+
+ ```text
+ ANTI: the queue grows without limit; "we'll process it when we can".
+
+ PROBLEM: eventual memory exhaustion. OOM kill. The queue is lost. A worse
+          outcome than rejecting early.
+
+ RIGHT: bound + drop policy. Size the cap from the latency SLA
+        (queue_size_max / throughput < SLA_max_latency).
+ ```
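The sizing inequality above can be turned into a small calculation. The function below is a sketch under simplifying assumptions: `maxQueueSize` and its parameter names are hypothetical, throughput is treated as a steady drain rate, and the request's own service time is ignored.

```typescript
// Hypothetical helper: derives a queue cap from the inequality
//   queue_size_max / throughput < SLA_max_latency
// i.e. queue_size_max < throughput * SLA_max_latency.
function maxQueueSize(throughputPerSec: number, slaMaxLatencySec: number): number {
  if (throughputPerSec <= 0 || slaMaxLatencySec <= 0) {
    throw new Error('throughput and SLA latency must be positive')
  }
  const bound = throughputPerSec * slaMaxLatencySec
  // Largest integer strictly below the bound (the inequality is strict).
  return Math.max(0, Math.ceil(bound) - 1)
}

// Worked example: a worker pool draining 200 requests/s with a 2 s latency SLA.
// 200 * 2 = 400, so the cap must stay below 400 queued requests.
console.log(maxQueueSize(200, 2)) // 399
```

Because measured throughput drops exactly when the system is saturated, it seems safer to size the cap from a conservative throughput figure rather than from the peak.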
+
+ ### ANTI: 503 without Retry-After
+
+ ```text
+ ANTI: server saturated → status 503 + body "try again later".
+
+ PROBLEM: the client does not know how long to wait. It retries
+          immediately. Storm.
+
+ RIGHT: 503 + Retry-After: <seconds>. The client honors the header.
+        Backoff is spread out.
+ ```
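On the caller side, honoring the header could look like the sketch below. It is illustrative only: `retryDelayMs`, the 1 s fallback and the 20% jitter factor are assumptions, not values from the package. `Retry-After` may carry either a number of seconds or an HTTP date, so both forms are handled.

```typescript
// Hypothetical client-side helper: converts a Retry-After header value into
// a wait time in milliseconds.
// `rand` is injectable so the jitter can be tested deterministically.
function retryDelayMs(
  retryAfter: string | null,
  nowMs: number,
  fallbackMs: number = 1000,
  rand: () => number = Math.random,
): number {
  let baseMs = fallbackMs // used when the header is missing or unparseable
  if (retryAfter !== null) {
    const value = retryAfter.trim()
    if (/^\d+$/.test(value)) {
      // delay-seconds form, e.g. "30"
      baseMs = Number(value) * 1000
    } else {
      // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
      const dateMs = Date.parse(value)
      if (!Number.isNaN(dateMs)) baseMs = Math.max(0, dateMs - nowMs)
    }
  }
  // Add up to 20% jitter so clients that received the same Retry-After
  // value do not all retry at the same instant.
  return Math.round(baseMs * (1 + 0.2 * rand()))
}

// Usage after a 503 response:
//   const waitMs = retryDelayMs(res.headers.get('Retry-After'), Date.now())
//   await new Promise((resolve) => setTimeout(resolve, waitMs))
```

The jitter only ever lengthens the wait, so the server's requested minimum delay is still respected.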
+
+ ### ANTI: degraded mode improvised during an incident
+
+ ```text
+ ANTI: during an outage, "let's cut feature X temporarily".
+       There is no code path for it; ad-hoc engineering.
+
+ PROBLEM: a bug is introduced under pressure. The outage gets worse.
+
+ RIGHT: degraded mode is a design-time decision. The path is implemented
+        and tested in dev/staging. Switched via feature flag. 1% of
+        traffic always exercises the path.
+ ```
+
+ ### ANTI: handler ignores the deadline
+
+ ```text
+ ANTI: the handler processes a request for 30 s. The client gave up at 5 s.
+       25 seconds of zombie work.
+
+ PROBLEM: resources consumed by dead work. Throughput drops.
+
+ RIGHT: deadline-aware handler. Check at entry. AbortSignal during
+        processing. Abort early.
+ ```
+
+ ### ANTI: rate limiting only on the client side
+
+ ```text
+ ANTI: the client SDK has a rate limit. The server trusts it.
+
+ PROBLEM: a buggy client ignores the rate limit. A malicious client
+          ignores it. Others bypass the SDK entirely.
+
+ RIGHT: server-side rate limiting in the proxy/gateway (Kong, Envoy, AWS API
+        Gateway). Per-API-key, per-IP, global. Hard limit.
+ ```
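To make the server-side limit concrete, here is a minimal token-bucket sketch keyed by API key or IP. It is an illustration under stated assumptions, not how Kong, Envoy or AWS API Gateway implement their limits: `TokenBucketLimiter` and `tryConsume` are hypothetical names, and state lives in process memory, whereas a gateway would normally use a shared store and evict idle keys.

```typescript
// Hypothetical sketch of a per-key token bucket (in-memory, single process).
class TokenBucketLimiter {
  private buckets = new Map<string, { tokens: number; lastRefillMs: number }>()
  private readonly capacity: number
  private readonly refillPerSec: number

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity // maximum burst size per key
    this.refillPerSec = refillPerSec // sustained rate per key
  }

  // Returns true if the request identified by `key` may proceed.
  tryConsume(key: string, nowMs: number = Date.now()): boolean {
    // A key seen for the first time starts with a full bucket.
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, lastRefillMs: nowMs }

    // Refill in proportion to the elapsed time, capped at capacity.
    const elapsedSec = Math.max(0, (nowMs - bucket.lastRefillMs) / 1000)
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSec)
    bucket.lastRefillMs = nowMs

    const allowed = bucket.tokens >= 1
    if (allowed) bucket.tokens -= 1
    this.buckets.set(key, bucket)
    return allowed
  }
}

// Example: bursts of up to 20 requests, 10 requests/s sustained, per API key.
const limiter = new TokenBucketLimiter(20, 10)
console.log(limiter.tryConsume('api-key-123')) // true: the bucket starts full
```

A rejected request would be answered with a status such as 429 or 503 plus Retry-After, independent of whatever limit the client SDK claims to enforce.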
+
+ ## Verification
+
+ 1. Load shedder active on user-facing handlers
+ 2. Explicit drop policy on queues (not the unbounded default)
+ 3. 503 always returned with Retry-After
+ 4. Deadline-aware handler on external calls
+ 5. Degraded mode implemented AND exercised (not just designed)
+ 6. Slow start on recovery configured
+ 7. Per-handler concurrency limit
+ 8. Saturation metrics (ch. 6, v1.10) instrumented
+
+ ---
+
+ ## See also
+
+ - [`_shared-sre/glossary.md`](../_shared-sre/glossary.md): vocabulary (load shedding, drop policy, etc.)
+ - [`cascading-failures`](../cascading-failures/SKILL.md) (v1.11): parallel pattern (caller-side defenses)
+ - [`retry-strategies`](../retry-strategies/SKILL.md) (v1.11): caller-side retry cooperates with server-side shedding
+ - [`four-golden-signals`](../four-golden-signals/SKILL.md) (v1.10): the Saturation gauge triggers load shedding
+ - [`load-shedding-instrumenter`](../../agents/load-shedding-instrumenter.md) (v1.11): agent that applies the patches
+ - [`supabase-edge-fn-writer`](../../agents/supabase-edge-fn-writer.md) (v1.8 + patch v1.11): Edge Functions get built-in load shedding
+
+ *Source material: Site Reliability Engineering, Beyer/Jones/Petoff/Murphy (Google/O'Reilly, 2016), ch. 22 (subsections on load shedding and graceful degradation).*