npm - @luanpdd/kit-mcp - Versions diffs - 1.34.0 → 1.36.0 - Mend

@luanpdd/kit-mcp 1.34.0 → 1.36.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (118)

package/README.md +1 -1
package/bin/cli.js +2 -2
package/bin/mcp.js +6 -6
package/bin/ui.js +74 -74
package/gates/ai-prompt-stability.md +120 -120
package/gates/budget-description.md +68 -68
package/gates/confidence.md +29 -29
package/gates/dependency-check.md +33 -33
package/gates/dept-cycle-prevention.md +179 -179
package/gates/golden-signals-coverage.md +133 -133
package/gates/legacy-refactor-safety.md +178 -178
package/gates/multi-tenant-rls-coverage.md +102 -102
package/gates/no-personal-uuid.md +72 -72
package/gates/obs-agents-mcp-supabase.md +86 -86
package/gates/obs-skills-frontmatter.md +76 -76
package/gates/observability-coverage.md +151 -151
package/gates/omm-no-regression.md +83 -83
package/gates/postmortem-template-required.md +127 -127
package/gates/prr-checklist-coverage.md +128 -128
package/gates/regression.md +32 -32
package/gates/release-pipeline-policy.md +132 -132
package/gates/secrets-scan.md +33 -33
package/gates/service-role-not-in-user-facing.md +113 -113
package/gates/skill-must-include.md +71 -71
package/gates/sync-idempotent.md +62 -62
package/gates/verify-phase-goal.md +34 -34
package/kit/agents/designer-ui.md +216 -216
package/kit/agents/workflow-generator.md +537 -0
package/kit/commands/adicionar-backlog.md +1 -1
package/kit/commands/adicionar-fase.md +1 -1
package/kit/commands/adicionar-tarefa.md +1 -1
package/kit/commands/auditar-observabilidade.md +103 -103
package/kit/commands/auditar-toil.md +129 -129
package/kit/commands/caracterizar-prompt.md +195 -195
package/kit/commands/criar-workflow.md +158 -0
package/kit/commands/definir-perfil.md +1 -1
package/kit/commands/definir-slo.md +108 -108
package/kit/commands/fio.md +1 -1
package/kit/commands/golden-signals.md +142 -142
package/kit/commands/instrumentar-fase.md +200 -200
package/kit/commands/investigar-producao.md +162 -162
package/kit/commands/observabilidade.md +118 -118
package/kit/commands/postmortem.md +179 -179
package/kit/commands/prr.md +205 -205
package/kit/commands/publicar-rapido.md +207 -207
package/kit/commands/risk-budget.md +220 -220
package/kit/commands/sre.md +230 -230
package/kit/file-manifest.json +5 -2
package/kit/framework/references/output-style.md +22 -22
package/kit/hooks/post-apply-migration.js +199 -199
package/kit/hooks/sidecar-tool-publisher.js +210 -210
package/kit/skills/_shared-dados-distribuidos/glossary.md +224 -224
package/kit/skills/_shared-legacy/glossary.md +389 -389
package/kit/skills/_shared-multi-tenant/glossary.md +186 -186
package/kit/skills/_shared-observability/glossary.md +396 -396
package/kit/skills/_shared-sre/glossary.md +712 -712
package/kit/skills/_shared-supabase/glossary.md +234 -234
package/kit/skills/blameless-postmortems/SKILL.md +340 -340
package/kit/skills/burn-rate-alerting/SKILL.md +258 -258
package/kit/skills/cascading-failures/SKILL.md +311 -311
package/kit/skills/core-analysis-loop/SKILL.md +352 -352
package/kit/skills/distributed-tracing/SKILL.md +362 -362
package/kit/skills/dynamic-workflow-authoring/SKILL.md +327 -0
package/kit/skills/eliminating-toil/SKILL.md +243 -243
package/kit/skills/event-based-slos/SKILL.md +296 -296
package/kit/skills/four-golden-signals/SKILL.md +314 -314
package/kit/skills/hermetic-builds/SKILL.md +323 -323
package/kit/skills/legacy-monster-methods/SKILL.md +444 -444
package/kit/skills/llm-as-dependency/SKILL.md +436 -436
package/kit/skills/load-shedding-graceful-degradation/SKILL.md +396 -396
package/kit/skills/observability-driven-development/SKILL.md +315 -315
package/kit/skills/observability-maturity-model/SKILL.md +222 -222
package/kit/skills/opentelemetry-standard/SKILL.md +351 -351
package/kit/skills/production-readiness-review/SKILL.md +305 -305
package/kit/skills/release-engineering/SKILL.md +367 -367
package/kit/skills/retry-strategies/SKILL.md +372 -372
package/kit/skills/sre-risk-management/SKILL.md +221 -221
package/kit/skills/structured-events/SKILL.md +265 -265
package/kit/skills/supabase-cron-queues/SKILL.md +275 -275
package/kit/skills/supabase-database-functions/SKILL.md +332 -332
package/kit/skills/supabase-declarative-schema/SKILL.md +183 -183
package/kit/skills/supabase-pgvector-rag/SKILL.md +253 -253
package/kit/skills/supabase-postgres-style/SKILL.md +138 -138
package/kit/skills/supabase-storage/SKILL.md +234 -234
package/kit/skills/telemetry-pipelines/SKILL.md +259 -259
package/kit/skills/telemetry-sampling/SKILL.md +256 -256
package/kit/skills/ui-anti-padroes-ia/SKILL.md +261 -261
package/kit/skills/ui-contexto-produto/SKILL.md +248 -248
package/kit/skills/ui-cor-estrategia/SKILL.md +213 -213
package/kit/skills/ui-critica-auditoria/SKILL.md +260 -260
package/kit/skills/ui-motion-funcional/SKILL.md +264 -264
package/kit/skills/ui-ritmo-espacial/SKILL.md +259 -259
package/kit/skills/ui-tipografia/SKILL.md +211 -211
package/package.json +1 -1
package/src/cli/index.js +1114 -1114
package/src/cli/render.js +194 -194
package/src/cli/upgrade-check.js +135 -135
package/src/core/error-redaction.js +76 -76
package/src/core/failures.js +153 -153
package/src/core/gate-runner.js +205 -205
package/src/core/gates.js +82 -82
package/src/core/logger.js +170 -170
package/src/core/manifest-verify.js +174 -174
package/src/core/metrics.js +268 -268
package/src/core/notify.js +60 -60
package/src/core/path-safety.js +141 -141
package/src/core/replays.js +120 -120
package/src/core/ui.js +185 -185
package/src/mcp-server/install.js +149 -149
package/src/mcp-server/roots.js +124 -124
package/src/ui/auto-spawn.js +113 -113
package/src/ui/browser.js +78 -78
package/src/ui/client.js +130 -130
package/src/ui/events.js +65 -65
package/src/ui/lockfile.js +191 -191
package/src/ui/port.js +67 -67
package/src/ui/server.js +547 -547
package/src/ui/wrapper.js +129 -129

package/kit/skills/telemetry-sampling/SKILL.md CHANGED

@@ -1,256 +1,256 @@
----
-name: telemetry-sampling
-description: Use ao reduzir custo de telemetria — head/tail sampling, by-key, dynamic. 100% errors, by-tier para customers, head-based propaga via traceparent.
----
-# Observabilidade — Telemetry Sampling
-## Quando usar
-LLM carrega esta skill ao reduzir custo de telemetria sem perder sinal. Trigger phrases:
-- "sampling", "reduzir custo de telemetria"
-- "head-based vs tail-based"
-- "by-key sampling", "dynamic sampling"
-- "100% errors mas só 1% sucessos"
-- "trace fica incompleto após sampling"
-## Regras absolutas
-- **100% dos erros sempre** — sample 100% de eventos com `result.success = false`. Erros são raros e críticos. Nunca sample.
-- **100% de paying/enterprise customers** — high-value, baixo volume relativo, debug crucial.
-- **Head-based propaga via `traceparent` flag** — decisão tomada no service de entrada, propagada downstream para garantir trace completo.
-- **Tail-based requer collector buffer** — decisão pós-trace; impossível de implementar inline em código.
-- **Constant probability falha em low volume** — 1/1000 de 100 req/min = 0.1 evento/min, perde tudo.
-- **Sample rate gravado no evento** — sem isso, agregações reconstroem totais errados.
-- **Errors > success** — categorize: paying customers > free, enterprise > pro > free.
-- **Não sample antes de aggregate** — pre-aggregation perde alta cardinalidade. Sample evento bruto, aggregate no read.
-## Estratégias canônicas
-### Head-based sampling (decisão no início do trace)
-```ts
-// PT-BR: decisão tomada no service de entrada, propagada via traceparent flag
-import { trace, context } from '@opentelemetry/api'
-import { TraceFlags } from '@opentelemetry/api'
-function shouldSample(event: SpanContext): boolean {
-  // PT-BR: 100% errors (head-based: erros raramente são conhecidos no head;
-  //         verificar HTTP status no início via header)
-  if (event.attributes['result.success'] === false) return true
-  // PT-BR: 100% enterprise — alto valor
-  if (event.attributes['customer.tier'] === 'enterprise') return true
-  // PT-BR: 10% pro
-  if (event.attributes['customer.tier'] === 'pro') return Math.random() < 0.1
-  // PT-BR: 1% free baseline
-  return Math.random() < 0.01
-}
-// PT-BR: marcar flag sampled no traceparent — propaga para downstream
-const flags = shouldSample(event) ? TraceFlags.SAMPLED : TraceFlags.NONE
-```
-### Tail-based sampling (decisão após trace completar)
-```yaml
-# PT-BR: OTel Collector config — sampling pós-trace
-# 100% errors + outliers de latência + 1% success
-processors:
-  tail_sampling:
-    decision_wait: 10s   # PT-BR: buffer 10s para esperar todos os spans do trace
-    policies:
-      - name: errors-policy
-        type: status_code
-        status_code: { status_codes: [ERROR] }
-      - name: latency-outliers
-        type: latency
-        latency: { threshold_ms: 1000 }   # PT-BR: > 1s é outlier
-      - name: probabilistic-baseline
-        type: probabilistic
-        probabilistic: { sampling_percentage: 1 }
-```
-### By-key sampling
-```ts
-// PT-BR: taxas diferentes por chave — mais preciso que constant
-const SAMPLE_RATES: Record<string, number> = {
-  // chave: [error.type | endpoint | tenant_id, etc.]
-  'error_rate_limit': 0.5,         // PT-BR: 50% (já frequente, mas importante)
-  'error_validation': 1.0,         // PT-BR: 100% (raro, debug crítico)
-  'tenant_acme-corp': 1.0,         // PT-BR: 100% (big customer)
-  'endpoint_/health': 0.001,       // PT-BR: 0.1% (muito frequente, baixo valor)
-  'default': 0.05                  // PT-BR: 5% baseline
-}
-function sampleByKey(event: SpanLike): boolean {
-  const errorKey = `error_${event.attributes['error.type']}`
-  const tenantKey = `tenant_${event.attributes['tenant_id']}`
-  const endpointKey = `endpoint_${event.attributes['endpoint']}`
-  const rate = SAMPLE_RATES[errorKey]
-            ?? SAMPLE_RATES[tenantKey]
-            ?? SAMPLE_RATES[endpointKey]
-            ?? SAMPLE_RATES['default']
-  return Math.random() < rate
-}
-```
-### Dynamic sampling (taxa adapta com volume)
-```ts
-// PT-BR: lookback 30s — quanto traffic veio recentemente?
-let recentVolume = 0
-setInterval(() => { recentVolume = 0 }, 30_000)
-function sampleDynamic(event: SpanLike): boolean {
-  recentVolume++
-  // PT-BR: tráfego baixo → sample mais; tráfego alto → sample menos
-  if (recentVolume < 100) return true                  // até 100 spans em 30s, mantém todos
-  if (recentVolume < 1000) return Math.random() < 0.1  // até 1k, 10%
-  return Math.random() < 0.01                          // > 1k, 1%
-}
-```
-### Combinando: by-key + dynamic + head
-```ts
-function shouldSample(event: SpanLike): boolean {
-  // PT-BR: 1. Errors sempre 100%
-  if (event.attributes['result.success'] === false) return true
-  // PT-BR: 2. Enterprise sempre 100%
-  if (event.attributes['customer.tier'] === 'enterprise') return true
-  // PT-BR: 3. Outras chaves de alto valor
-  if (event.attributes['feature_flag.experiment_a'] === true) return true   // experimento ativo
-  // PT-BR: 4. Dynamic baseline
-  return sampleDynamic(event)
-}
-```
-## Patterns canônicos
-### Pattern: gravar sample_rate no evento
-```ts
-// PT-BR: sem sample_rate, agregações no read time não conseguem reconstruir totais
-const sampleRate = computeSampleRate(event)
-if (Math.random() < sampleRate) {
-  span.setAttribute('_sample_rate', sampleRate)   // PT-BR: 0.01 = 1% sampled
-  span.setAttribute('_sampled', true)
-  // PT-BR: agora o backend pode multiplicar contagens por 1/sample_rate
-  exportSpan(span)
-}
-```
-### Pattern: query reconstruindo totais com sample_rate
-```sql
--- PT-BR: sem sample_rate, count(*) está errado
--- COM sample_rate, sum(1/_sample_rate) reconstrói total estimado
-select
-  endpoint,
-  sum(1.0 / _sample_rate) as estimated_total,
-  count(*) as samples_collected,
-  sum(1.0 / _sample_rate) filter (where result_success = false) as estimated_errors
-from observability.events
-where timestamp > now() - interval '1 hour'
-group by endpoint
-order by estimated_total desc;
-```
-### Pattern: sampling para alta cardinalidade
-```ts
-// PT-BR: cardinalidade alta (millions of users) — não pode sample por user.id
-//         mas pode sample por (customer.tier, error.type) — combinação cardin. baixa
-function sampleByDimensions(event: SpanLike): number {
-  const key = `${event.attributes['customer.tier']}-${event.attributes['error.type'] ?? 'success'}`
-  const rates: Record<string, number> = {
-    'enterprise-success': 0.5,
-    'enterprise-error': 1.0,
-    'pro-success': 0.1,
-    'pro-error': 1.0,
-    'free-success': 0.01,
-    'free-error': 1.0,
-  }
-  return rates[key] ?? 0.01
-}
-```
-## Anti-patterns
-### ANTI: constant probability em low volume
-```text
-ANTI: app com 100 req/min, sample rate fixo 1/1000 → 0.1 evento/min retidos
-       Você verá 1 erro a cada 10 minutos. Sinal perdido.
-CERTO: dynamic sampling — alta taxa quando volume baixo, baixa quando alto.
-```
-### ANTI: sample errors
-```text
-ANTI: sample 1% de errors junto com 1% de success — erros são 0.5% do tráfego;
-       seu sample retém 0.005% de errors total. Praticamente nunca aparecem.
-CERTO: 100% errors. SEMPRE. Erros são raros e críticos.
-```
-### ANTI: sample sem gravar rate
-```text
-ANTI: sample 1/100 mas evento não tem _sample_rate
-       Backend conta literais → count = 1% do real → métricas erradas
-CERTO: gravar _sample_rate no evento; agregar com sum(1/rate) no read.
-```
-### ANTI: tail-based sem collector
-```text
-ANTI: tentar implementar tail-based em SDK do app — precisa bufferizar todos os spans
-       de cada trace, esperar conclusão, decidir, exportar. Memória e latência altas.
-CERTO: tail-based requer OTel Collector como sidecar/proxy. App envia 100% para
-       Collector; Collector decide via processor `tail_sampling`.
-```
-### ANTI: head-based sem propagação
-```text
-ANTI: decisão de sample tomada no service A → não propagada para B → B decide sozinho
-       → trace fica incompleto (alguns spans em A, outros em B, sem correlação)
-CERTO: marcar TraceFlags.SAMPLED no traceparent; B respeita decisão upstream.
-```
-## Verificação
-1. **Errors 100%** — `select count(*) where result_success=false` × `1/sample_rate` ≈ count real
-2. **Enterprise 100%** — verificar via query que enterprise tier tem _sample_rate=1 sempre
-3. **Sample rate gravado** — `select count(*) filter (where _sample_rate is null)` = 0
-4. **Trace integridade** — head-based: trace tem todos os spans (não 50% missing)
-5. **Custo redução real** — bytes/segundo enviado para backend caiu sem perder sinal de error/p99
----
-## Ver também
-- `kit/skills/_shared-observability/glossary.md` — termos sampling
-- `kit/skills/distributed-tracing/SKILL.md` — head vs tail decision timing
-- `kit/skills/opentelemetry-standard/SKILL.md` — Collector tail_sampling processor
-- `kit/skills/event-based-slos/SKILL.md` — SLO precisa de sample_rate para reconstruir totais
-*Material-fonte: Observability Engineering (O'Reilly, 2022) — Cap 17: "Cheap and Accurate Enough: Sampling".*
+---
+name: telemetry-sampling
+description: Use ao reduzir custo de telemetria — head/tail sampling, by-key, dynamic. 100% errors, by-tier para customers, head-based propaga via traceparent.
+---
+# Observabilidade — Telemetry Sampling
+## Quando usar
+LLM carrega esta skill ao reduzir custo de telemetria sem perder sinal. Trigger phrases:
+- "sampling", "reduzir custo de telemetria"
+- "head-based vs tail-based"
+- "by-key sampling", "dynamic sampling"
+- "100% errors mas só 1% sucessos"
+- "trace fica incompleto após sampling"
+## Regras absolutas
+- **100% dos erros sempre** — sample 100% de eventos com `result.success = false`. Erros são raros e críticos. Nunca sample.
+- **100% de paying/enterprise customers** — high-value, baixo volume relativo, debug crucial.
+- **Head-based propaga via `traceparent` flag** — decisão tomada no service de entrada, propagada downstream para garantir trace completo.
+- **Tail-based requer collector buffer** — decisão pós-trace; impossível de implementar inline em código.
+- **Constant probability falha em low volume** — 1/1000 de 100 req/min = 0.1 evento/min, perde tudo.
+- **Sample rate gravado no evento** — sem isso, agregações reconstroem totais errados.
+- **Errors > success** — categorize: paying customers > free, enterprise > pro > free.
+- **Não sample antes de aggregate** — pre-aggregation perde alta cardinalidade. Sample evento bruto, aggregate no read.
+## Estratégias canônicas
+### Head-based sampling (decisão no início do trace)
+```ts
+// PT-BR: decisão tomada no service de entrada, propagada via traceparent flag
+import { trace, context } from '@opentelemetry/api'
+import { TraceFlags } from '@opentelemetry/api'
+function shouldSample(event: SpanContext): boolean {
+  // PT-BR: 100% errors (head-based: erros raramente são conhecidos no head;
+  //         verificar HTTP status no início via header)
+  if (event.attributes['result.success'] === false) return true
+  // PT-BR: 100% enterprise — alto valor
+  if (event.attributes['customer.tier'] === 'enterprise') return true
+  // PT-BR: 10% pro
+  if (event.attributes['customer.tier'] === 'pro') return Math.random() < 0.1
+  // PT-BR: 1% free baseline
+  return Math.random() < 0.01
+}
+// PT-BR: marcar flag sampled no traceparent — propaga para downstream
+const flags = shouldSample(event) ? TraceFlags.SAMPLED : TraceFlags.NONE
+```
+### Tail-based sampling (decisão após trace completar)
+```yaml
+# PT-BR: OTel Collector config — sampling pós-trace
+# 100% errors + outliers de latência + 1% success
+processors:
+  tail_sampling:
+    decision_wait: 10s   # PT-BR: buffer 10s para esperar todos os spans do trace
+    policies:
+      - name: errors-policy
+        type: status_code
+        status_code: { status_codes: [ERROR] }
+      - name: latency-outliers
+        type: latency
+        latency: { threshold_ms: 1000 }   # PT-BR: > 1s é outlier
+      - name: probabilistic-baseline
+        type: probabilistic
+        probabilistic: { sampling_percentage: 1 }
+```
+### By-key sampling
+```ts
+// PT-BR: taxas diferentes por chave — mais preciso que constant
+const SAMPLE_RATES: Record<string, number> = {
+  // chave: [error.type | endpoint | tenant_id, etc.]
+  'error_rate_limit': 0.5,         // PT-BR: 50% (já frequente, mas importante)
+  'error_validation': 1.0,         // PT-BR: 100% (raro, debug crítico)
+  'tenant_acme-corp': 1.0,         // PT-BR: 100% (big customer)
+  'endpoint_/health': 0.001,       // PT-BR: 0.1% (muito frequente, baixo valor)
+  'default': 0.05                  // PT-BR: 5% baseline
+}
+function sampleByKey(event: SpanLike): boolean {
+  const errorKey = `error_${event.attributes['error.type']}`
+  const tenantKey = `tenant_${event.attributes['tenant_id']}`
+  const endpointKey = `endpoint_${event.attributes['endpoint']}`
+  const rate = SAMPLE_RATES[errorKey]
+            ?? SAMPLE_RATES[tenantKey]
+            ?? SAMPLE_RATES[endpointKey]
+            ?? SAMPLE_RATES['default']
+  return Math.random() < rate
+}
+```
+### Dynamic sampling (taxa adapta com volume)
+```ts
+// PT-BR: lookback 30s — quanto traffic veio recentemente?
+let recentVolume = 0
+setInterval(() => { recentVolume = 0 }, 30_000)
+function sampleDynamic(event: SpanLike): boolean {
+  recentVolume++
+  // PT-BR: tráfego baixo → sample mais; tráfego alto → sample menos
+  if (recentVolume < 100) return true                  // até 100 spans em 30s, mantém todos
+  if (recentVolume < 1000) return Math.random() < 0.1  // até 1k, 10%
+  return Math.random() < 0.01                          // > 1k, 1%
+}
+```
+### Combinando: by-key + dynamic + head
+```ts
+function shouldSample(event: SpanLike): boolean {
+  // PT-BR: 1. Errors sempre 100%
+  if (event.attributes['result.success'] === false) return true
+  // PT-BR: 2. Enterprise sempre 100%
+  if (event.attributes['customer.tier'] === 'enterprise') return true
+  // PT-BR: 3. Outras chaves de alto valor
+  if (event.attributes['feature_flag.experiment_a'] === true) return true   // experimento ativo
+  // PT-BR: 4. Dynamic baseline
+  return sampleDynamic(event)
+}
+```
+## Patterns canônicos
+### Pattern: gravar sample_rate no evento
+```ts
+// PT-BR: sem sample_rate, agregações no read time não conseguem reconstruir totais
+const sampleRate = computeSampleRate(event)
+if (Math.random() < sampleRate) {
+  span.setAttribute('_sample_rate', sampleRate)   // PT-BR: 0.01 = 1% sampled
+  span.setAttribute('_sampled', true)
+  // PT-BR: agora o backend pode multiplicar contagens por 1/sample_rate
+  exportSpan(span)
+}
+```
+### Pattern: query reconstruindo totais com sample_rate
+```sql
+-- PT-BR: sem sample_rate, count(*) está errado
+-- COM sample_rate, sum(1/_sample_rate) reconstrói total estimado
+select
+  endpoint,
+  sum(1.0 / _sample_rate) as estimated_total,
+  count(*) as samples_collected,
+  sum(1.0 / _sample_rate) filter (where result_success = false) as estimated_errors
+from observability.events
+where timestamp > now() - interval '1 hour'
+group by endpoint
+order by estimated_total desc;
+```
+### Pattern: sampling para alta cardinalidade
+```ts
+// PT-BR: cardinalidade alta (millions of users) — não pode sample por user.id
+//         mas pode sample por (customer.tier, error.type) — combinação cardin. baixa
+function sampleByDimensions(event: SpanLike): number {
+  const key = `${event.attributes['customer.tier']}-${event.attributes['error.type'] ?? 'success'}`
+  const rates: Record<string, number> = {
+    'enterprise-success': 0.5,
+    'enterprise-error': 1.0,
+    'pro-success': 0.1,
+    'pro-error': 1.0,
+    'free-success': 0.01,
+    'free-error': 1.0,
+  }
+  return rates[key] ?? 0.01
+}
+```
+## Anti-patterns
+### ANTI: constant probability em low volume
+```text
+ANTI: app com 100 req/min, sample rate fixo 1/1000 → 0.1 evento/min retidos
+       Você verá 1 erro a cada 10 minutos. Sinal perdido.
+CERTO: dynamic sampling — alta taxa quando volume baixo, baixa quando alto.
+```
+### ANTI: sample errors
+```text
+ANTI: sample 1% de errors junto com 1% de success — erros são 0.5% do tráfego;
+       seu sample retém 0.005% de errors total. Praticamente nunca aparecem.
+CERTO: 100% errors. SEMPRE. Erros são raros e críticos.
+```
+### ANTI: sample sem gravar rate
+```text
+ANTI: sample 1/100 mas evento não tem _sample_rate
+       Backend conta literais → count = 1% do real → métricas erradas
+CERTO: gravar _sample_rate no evento; agregar com sum(1/rate) no read.
+```
+### ANTI: tail-based sem collector
+```text
+ANTI: tentar implementar tail-based em SDK do app — precisa bufferizar todos os spans
+       de cada trace, esperar conclusão, decidir, exportar. Memória e latência altas.
+CERTO: tail-based requer OTel Collector como sidecar/proxy. App envia 100% para
+       Collector; Collector decide via processor `tail_sampling`.
+```
+### ANTI: head-based sem propagação
+```text
+ANTI: decisão de sample tomada no service A → não propagada para B → B decide sozinho
+       → trace fica incompleto (alguns spans em A, outros em B, sem correlação)
+CERTO: marcar TraceFlags.SAMPLED no traceparent; B respeita decisão upstream.
+```
+## Verificação
+1. **Errors 100%** — `select count(*) where result_success=false` × `1/sample_rate` ≈ count real
+2. **Enterprise 100%** — verificar via query que enterprise tier tem _sample_rate=1 sempre
+3. **Sample rate gravado** — `select count(*) filter (where _sample_rate is null)` = 0
+4. **Trace integridade** — head-based: trace tem todos os spans (não 50% missing)
+5. **Custo redução real** — bytes/segundo enviado para backend caiu sem perder sinal de error/p99
+---
+## Ver também
+- `kit/skills/_shared-observability/glossary.md` — termos sampling
+- `kit/skills/distributed-tracing/SKILL.md` — head vs tail decision timing
+- `kit/skills/opentelemetry-standard/SKILL.md` — Collector tail_sampling processor
+- `kit/skills/event-based-slos/SKILL.md` — SLO precisa de sample_rate para reconstruir totais
+*Material-fonte: Observability Engineering (O'Reilly, 2022) — Cap 17: "Cheap and Accurate Enough: Sampling".*
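
The skill file in this hunk relies on recording `_sample_rate` on every kept event and rebuilding totals at read time with `sum(1.0 / _sample_rate)`. The following is a minimal TypeScript sketch of that same weighting, not code from the package: the `SampledEvent` and `EndpointEstimate` shapes and the `estimateTotals` function are hypothetical names introduced for illustration, and only the idea of weighting each kept event by the inverse of its sample rate comes from the file.

```ts
// Hypothetical event shape (assumption). Field names loosely mirror the
// `_sample_rate` and `result_success` columns used in the SQL in the hunk.
interface SampledEvent {
  endpoint: string
  resultSuccess: boolean
  sampleRate: number // fraction in (0, 1]; 0.01 means 1% of events were kept
}

// Hypothetical result shape (assumption), one entry per endpoint.
interface EndpointEstimate {
  estimatedTotal: number   // sum of 1 / sampleRate over kept events
  samplesCollected: number // raw count of kept events
  estimatedErrors: number  // sum of 1 / sampleRate over kept failed events
}

function estimateTotals(events: SampledEvent[]): Map<string, EndpointEstimate> {
  const estimates = new Map<string, EndpointEstimate>()
  for (const event of events) {
    // A missing or zero rate would make the weight undefined or infinite,
    // so reject it instead of silently skewing the estimate.
    if (!(event.sampleRate > 0 && event.sampleRate <= 1)) {
      throw new RangeError(`invalid sample rate: ${event.sampleRate}`)
    }
    // Each kept event stands in for 1 / sampleRate original events.
    const weight = 1 / event.sampleRate
    const acc = estimates.get(event.endpoint) ?? {
      estimatedTotal: 0,
      samplesCollected: 0,
      estimatedErrors: 0,
    }
    acc.estimatedTotal += weight
    acc.samplesCollected += 1
    if (!event.resultSuccess) acc.estimatedErrors += weight
    estimates.set(event.endpoint, acc)
  }
  return estimates
}
```

With this weighting, an event kept at rate 0.01 contributes 100 to the estimated total, while an error kept at rate 1.0 contributes exactly 1, which is why the file insists that the rate be stored on the event itself rather than assumed at query time.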