@luanpdd/kit-mcp 1.35.0 → 1.36.0

Files changed (117)
  1. package/bin/cli.js +2 -2
  2. package/bin/mcp.js +6 -6
  3. package/bin/ui.js +74 -74
  4. package/gates/ai-prompt-stability.md +120 -120
  5. package/gates/budget-description.md +68 -68
  6. package/gates/confidence.md +29 -29
  7. package/gates/dependency-check.md +33 -33
  8. package/gates/dept-cycle-prevention.md +179 -179
  9. package/gates/golden-signals-coverage.md +133 -133
  10. package/gates/legacy-refactor-safety.md +178 -178
  11. package/gates/multi-tenant-rls-coverage.md +102 -102
  12. package/gates/no-personal-uuid.md +72 -72
  13. package/gates/obs-agents-mcp-supabase.md +86 -86
  14. package/gates/obs-skills-frontmatter.md +76 -76
  15. package/gates/observability-coverage.md +151 -151
  16. package/gates/omm-no-regression.md +83 -83
  17. package/gates/postmortem-template-required.md +127 -127
  18. package/gates/prr-checklist-coverage.md +128 -128
  19. package/gates/regression.md +32 -32
  20. package/gates/release-pipeline-policy.md +132 -132
  21. package/gates/secrets-scan.md +33 -33
  22. package/gates/service-role-not-in-user-facing.md +113 -113
  23. package/gates/skill-must-include.md +71 -71
  24. package/gates/sync-idempotent.md +62 -62
  25. package/gates/verify-phase-goal.md +34 -34
  26. package/kit/agents/designer-ui.md +216 -216
  27. package/kit/agents/workflow-generator.md +537 -167
  28. package/kit/commands/adicionar-backlog.md +1 -1
  29. package/kit/commands/adicionar-fase.md +1 -1
  30. package/kit/commands/adicionar-tarefa.md +1 -1
  31. package/kit/commands/auditar-observabilidade.md +103 -103
  32. package/kit/commands/auditar-toil.md +129 -129
  33. package/kit/commands/caracterizar-prompt.md +195 -195
  34. package/kit/commands/criar-workflow.md +158 -158
  35. package/kit/commands/definir-perfil.md +1 -1
  36. package/kit/commands/definir-slo.md +108 -108
  37. package/kit/commands/fio.md +1 -1
  38. package/kit/commands/golden-signals.md +142 -142
  39. package/kit/commands/instrumentar-fase.md +200 -200
  40. package/kit/commands/investigar-producao.md +162 -162
  41. package/kit/commands/observabilidade.md +118 -118
  42. package/kit/commands/postmortem.md +179 -179
  43. package/kit/commands/prr.md +205 -205
  44. package/kit/commands/publicar-rapido.md +207 -207
  45. package/kit/commands/risk-budget.md +220 -220
  46. package/kit/commands/sre.md +230 -230
  47. package/kit/file-manifest.json +424 -424
  48. package/kit/framework/references/output-style.md +22 -22
  49. package/kit/hooks/post-apply-migration.js +199 -199
  50. package/kit/hooks/sidecar-tool-publisher.js +210 -210
  51. package/kit/skills/_shared-dados-distribuidos/glossary.md +224 -224
  52. package/kit/skills/_shared-legacy/glossary.md +389 -389
  53. package/kit/skills/_shared-multi-tenant/glossary.md +186 -186
  54. package/kit/skills/_shared-observability/glossary.md +396 -396
  55. package/kit/skills/_shared-sre/glossary.md +712 -712
  56. package/kit/skills/_shared-supabase/glossary.md +234 -234
  57. package/kit/skills/blameless-postmortems/SKILL.md +340 -340
  58. package/kit/skills/burn-rate-alerting/SKILL.md +258 -258
  59. package/kit/skills/cascading-failures/SKILL.md +311 -311
  60. package/kit/skills/core-analysis-loop/SKILL.md +352 -352
  61. package/kit/skills/distributed-tracing/SKILL.md +362 -362
  62. package/kit/skills/dynamic-workflow-authoring/SKILL.md +327 -223
  63. package/kit/skills/eliminating-toil/SKILL.md +243 -243
  64. package/kit/skills/event-based-slos/SKILL.md +296 -296
  65. package/kit/skills/four-golden-signals/SKILL.md +314 -314
  66. package/kit/skills/hermetic-builds/SKILL.md +323 -323
  67. package/kit/skills/legacy-monster-methods/SKILL.md +444 -444
  68. package/kit/skills/llm-as-dependency/SKILL.md +436 -436
  69. package/kit/skills/load-shedding-graceful-degradation/SKILL.md +396 -396
  70. package/kit/skills/observability-driven-development/SKILL.md +315 -315
  71. package/kit/skills/observability-maturity-model/SKILL.md +222 -222
  72. package/kit/skills/opentelemetry-standard/SKILL.md +351 -351
  73. package/kit/skills/production-readiness-review/SKILL.md +305 -305
  74. package/kit/skills/release-engineering/SKILL.md +367 -367
  75. package/kit/skills/retry-strategies/SKILL.md +372 -372
  76. package/kit/skills/sre-risk-management/SKILL.md +221 -221
  77. package/kit/skills/structured-events/SKILL.md +265 -265
  78. package/kit/skills/supabase-cron-queues/SKILL.md +275 -275
  79. package/kit/skills/supabase-database-functions/SKILL.md +332 -332
  80. package/kit/skills/supabase-declarative-schema/SKILL.md +183 -183
  81. package/kit/skills/supabase-pgvector-rag/SKILL.md +253 -253
  82. package/kit/skills/supabase-postgres-style/SKILL.md +138 -138
  83. package/kit/skills/supabase-storage/SKILL.md +234 -234
  84. package/kit/skills/telemetry-pipelines/SKILL.md +259 -259
  85. package/kit/skills/telemetry-sampling/SKILL.md +256 -256
  86. package/kit/skills/ui-anti-padroes-ia/SKILL.md +261 -261
  87. package/kit/skills/ui-contexto-produto/SKILL.md +248 -248
  88. package/kit/skills/ui-cor-estrategia/SKILL.md +213 -213
  89. package/kit/skills/ui-critica-auditoria/SKILL.md +260 -260
  90. package/kit/skills/ui-motion-funcional/SKILL.md +264 -264
  91. package/kit/skills/ui-ritmo-espacial/SKILL.md +259 -259
  92. package/kit/skills/ui-tipografia/SKILL.md +211 -211
  93. package/package.json +1 -1
  94. package/src/cli/index.js +1114 -1114
  95. package/src/cli/render.js +194 -194
  96. package/src/cli/upgrade-check.js +135 -135
  97. package/src/core/error-redaction.js +76 -76
  98. package/src/core/failures.js +153 -153
  99. package/src/core/gate-runner.js +205 -205
  100. package/src/core/gates.js +82 -82
  101. package/src/core/logger.js +170 -170
  102. package/src/core/manifest-verify.js +174 -174
  103. package/src/core/metrics.js +268 -268
  104. package/src/core/notify.js +60 -60
  105. package/src/core/path-safety.js +141 -141
  106. package/src/core/replays.js +120 -120
  107. package/src/core/ui.js +185 -185
  108. package/src/mcp-server/install.js +149 -149
  109. package/src/mcp-server/roots.js +124 -124
  110. package/src/ui/auto-spawn.js +113 -113
  111. package/src/ui/browser.js +78 -78
  112. package/src/ui/client.js +130 -130
  113. package/src/ui/events.js +65 -65
  114. package/src/ui/lockfile.js +191 -191
  115. package/src/ui/port.js +67 -67
  116. package/src/ui/server.js +547 -547
  117. package/src/ui/wrapper.js +129 -129
@@ -1,195 +1,195 @@
- ---
- name: caracterizar-prompt
- description: Characterization of production LLM prompts/tools, using temperature=0, a fixed seed, and LLM-specific sanitization. Treats prompts as legacy code. A 2026 modernization with no precedent in 2004.
- argument-hint: "<prompt_file> [--inputs-dir PATH] [--provider openai|anthropic] [--seed N] [--max-tokens N] [--num-intents N]"
- allowed-tools:
-   - Read
-   - Write
-   - Edit
-   - Bash
-   - Grep
-   - Glob
-   - Task
- ---
-
- <objective>
- Characterize an **LLM prompt or tool definition** by capturing deterministic outputs as golden snapshots. Applies the [`ai-prompt-characterization`](../skills/ai-prompt-characterization/SKILL.md) skill: `temperature=0`, fixed `seed`, sanitization of timestamps/UUIDs/relative dates, and 5+ distinct intents. Treats the prompt as **legacy code too**: versioned, tested, code-reviewed.
-
- **Creates/Updates:**
- - `tests/characterization/prompts/<prompt-stem>.test.ts` (or `.py`/`.go` depending on the runtime)
- - `tests/characterization/prompts/__snapshots__/<prompt-stem>.test.ts.snap`
- - `tests/characterization/prompts/<prompt-stem>/inputs/<intent>.json` (canonical inputs per intent)
-
- **Afterwards:** a prompt change must keep the snapshot diff at 0 (or the change must be documented). Detects upstream model drift automatically.
- </objective>
-
- <context>
- **Arguments:**
- - `<prompt_file>`: prompt file (e.g., `prompts/generate-summary.md`). REQUIRED
- - `--inputs-dir <path>`: directory with canonical inputs per intent (default: the agent generates 5 synthetic inputs covering concise/detailed/code/edge/adversarial)
- - `--provider openai|anthropic`: LLM provider (default: detected via deps)
- - `--seed N`: seed for determinism (default: 42)
- - `--max-tokens N`: output limit (default: 500)
- - `--num-intents N`: number of intents to cover (default: 5; minimum: 5)
- - `--system-prompt <text>`: system prompt, if applicable
-
- **Examples:**
- ```
- /caracterizar-prompt prompts/generate-summary.md
- /caracterizar-prompt prompts/code-reviewer.md --num-intents 7 --max-tokens 1000
- /caracterizar-prompt prompts/intent-classifier.md --inputs-dir test-data/classifier-intents
- /caracterizar-prompt prompts/customer-support.md --provider anthropic --seed 123
- ```
-
- **Prerequisites:**
- - ANTHROPIC_API_KEY or OPENAI_API_KEY in the environment
- - Test framework (Vitest, Jest, pytest, ...)
- - The chosen provider supports `temperature=0` + `seed`
-
- **When this command is the right path:**
- - Production prompt longer than 50 lines
- - Prompt changes have broken things silently in the past
- - The team wants a baseline before refactoring a prompt
- - CI must detect upstream model drift (Claude 4.7 → 4.8)
- </context>
-
- <process>
-
- ## 1. Parse arguments
-
- ```bash
- PROMPT_FILE=$(echo "$ARGUMENTS" | awk '{print $1}')
- INPUTS_DIR=$(echo "$ARGUMENTS" | grep -oE -- '--inputs-dir [^ ]+' | awk '{print $2}')
- PROVIDER=$(echo "$ARGUMENTS" | grep -oE -- '--provider [^ ]+' | awk '{print $2}')
- SEED=$(echo "$ARGUMENTS" | grep -oE -- '--seed [0-9]+' | awk '{print $2}')
- MAX_TOKENS=$(echo "$ARGUMENTS" | grep -oE -- '--max-tokens [0-9]+' | awk '{print $2}')
- NUM_INTENTS=$(echo "$ARGUMENTS" | grep -oE -- '--num-intents [0-9]+' | awk '{print $2}')
-
- [ -z "$SEED" ] && SEED=42
- [ -z "$MAX_TOKENS" ] && MAX_TOKENS=500
- [ -z "$NUM_INTENTS" ] && NUM_INTENTS=5
-
- if [ -z "$PROMPT_FILE" ]; then
-   echo "ERROR: prompt_file is required"
-   exit 1
- fi
-
- if [ ! -f "$PROMPT_FILE" ]; then
-   echo "ERROR: file not found: $PROMPT_FILE"
-   exit 1
- fi
- ```
-
- ## 2. Detect provider + framework
-
- ```bash
- # auto-detect provider
- if [ -z "$PROVIDER" ]; then
-   if [ -n "$ANTHROPIC_API_KEY" ]; then
-     PROVIDER="anthropic"
-   elif [ -n "$OPENAI_API_KEY" ]; then
-     PROVIDER="openai"
-   else
-     echo "ERROR: no provider detected. Set ANTHROPIC_API_KEY or OPENAI_API_KEY"
-     exit 1
-   fi
- fi
-
- # detect test framework
- FRAMEWORK=""
- if [ -f "package.json" ]; then
-   if jq -re '.devDependencies.vitest' package.json >/dev/null 2>&1; then FRAMEWORK="vitest"
-   elif jq -re '.devDependencies.jest' package.json >/dev/null 2>&1; then FRAMEWORK="jest"
-   fi
- elif [ -f "pyproject.toml" ]; then
-   FRAMEWORK="pytest"
- fi
-
- [ -z "$FRAMEWORK" ] && FRAMEWORK="vitest" # sane default
- ```
-
- ## 3. Dispatch to `legacy-characterizer` (prompt mode)
-
- ```text
- Task(
-   subagent_type="legacy-characterizer",
-   prompt="
-     target_file: ${PROMPT_FILE}
-     target_kind: prompt
-     provider: ${PROVIDER}
-     seed: ${SEED}
-     max_tokens: ${MAX_TOKENS}
-     num_intents: ${NUM_INTENTS}
-     ${INPUTS_DIR:+inputs_dir: ${INPUTS_DIR}}
-     framework: ${FRAMEWORK}
-
-     Apply the ai-prompt-characterization skill. Steps:
-     1. Read the prompt + identify expected inputs (system prompt? user message format? tools?)
-     2. Generate (or read from inputs-dir) ${NUM_INTENTS}+ inputs covering distinct intents:
-        - concise: short request, short expected output
-        - detailed: elaborate request, long expected output
-        - code-heavy: input/output containing code
-        - edge case: ambiguous input
-        - adversarial: prompt injection attempt
-     3. For each intent: run the LLM with temperature=0 + seed=${SEED}
-     4. Capture text + finishReason + toolCalls (if function calling) + inputTokens + outputTokens + modelVersion
-     5. Sanitize: timestamps, UUIDs, relative dates, monetary values, versions
-     6. Save as snapshot tests using ${FRAMEWORK}
-     7. Behavioral coverage = % of intents covered (not % of lines)
-   "
- )
- ```
-
- ## 4. Post-output
-
- ```
- ═══════════════════════════════════════════════════════════
- framework ► CARACTERIZAR-PROMPT ▸ tests/characterization/prompts/...
- ═══════════════════════════════════════════════════════════
-
- [legacy-characterizer output in prompt mode]
-
- ## ⚠ MANDATORY MANUAL REVIEW
-
- Snapshots generated. Read each one before committing:
- 1. Verify that no PII/secret survives sanitization
- 2. Verify that no timestamp/UUID/relative date is left unredacted
- 3. Confirm the expected finishReason (stop vs length vs tool_use)
- 4. For tool_uses: confirm tool name + input shape
-
- ## Next steps
-
- 1. **Review snapshots** manually
- 2. **Run the suite locally:**
-    - JS/TS: `npm test -- tests/characterization/prompts`
-    - Python: `pytest tests/characterization/prompts`
- 3. **Commit** as `chore: characterize <prompt-name>`
- 4. **Configure CI:**
-    - `tests/characterization/prompts/**` run on PRs that touch `prompts/**`
-    - Red diff = behavioral change detected → human review
- 5. **Configure a nightly job** to detect upstream model drift:
-    - Anthropic publishes Claude 4.8 → re-run characterization → snapshot diff
- 6. **Cost:** ~${NUM_INTENTS} × ($0.015/1k input tokens × 2k = $0.03 + output) ≈ $0.10-0.50/run
-
- ## Cross-suite
-
- - **/caracterizar** (v1.12): characterization of code (not prompts), the analogous command
- - **`llm-as-dependency`** skill: faking the LLM in business logic tests (not these tests)
- - **`legacy-api-only-applications`** skill: an LLM provider is a special case of an external API
- - **/instrumentar-fase** (v1.9): instruments the prompt consumer (latency, tokens)
- ```
-
- </process>
-
- <success_criteria>
- - [ ] $ARGUMENTS parsed
- - [ ] Provider auto-detected OR specified
- - [ ] Test framework detected
- - [ ] `legacy-characterizer` invoked in prompt mode
- - [ ] ≥ 5 intents covering the canonical groups (concise/detailed/code/edge/adversarial)
- - [ ] temperature=0 + fixed seed applied
- - [ ] LLM-output-specific sanitization applied
- - [ ] Tests run against the real LLM only in characterization (not in business logic tests)
- - [ ] Next steps: review, commit, CI config, nightly drift detection
- - [ ] Cross-suite with llm-as-dependency and legacy-api-only-applications
- </success_criteria>
+ ---
+ name: caracterizar-prompt
+ description: Characterization of production LLM prompts/tools, using temperature=0, a fixed seed, and LLM-specific sanitization. Treats prompts as legacy code. A 2026 modernization with no precedent in 2004.
+ argument-hint: "<prompt_file> [--inputs-dir PATH] [--provider openai|anthropic] [--seed N] [--max-tokens N] [--num-intents N]"
+ allowed-tools:
+   - Read
+   - Write
+   - Edit
+   - Bash
+   - Grep
+   - Glob
+   - Task
+ ---
+
+ <objective>
+ Characterize an **LLM prompt or tool definition** by capturing deterministic outputs as golden snapshots. Applies the [`ai-prompt-characterization`](../skills/ai-prompt-characterization/SKILL.md) skill: `temperature=0`, fixed `seed`, sanitization of timestamps/UUIDs/relative dates, and 5+ distinct intents. Treats the prompt as **legacy code too**: versioned, tested, code-reviewed.
+
+ **Creates/Updates:**
+ - `tests/characterization/prompts/<prompt-stem>.test.ts` (or `.py`/`.go` depending on the runtime)
+ - `tests/characterization/prompts/__snapshots__/<prompt-stem>.test.ts.snap`
+ - `tests/characterization/prompts/<prompt-stem>/inputs/<intent>.json` (canonical inputs per intent)
+
+ **Afterwards:** a prompt change must keep the snapshot diff at 0 (or the change must be documented). Detects upstream model drift automatically.
+ </objective>
+
+ <context>
+ **Arguments:**
+ - `<prompt_file>`: prompt file (e.g., `prompts/generate-summary.md`). REQUIRED
+ - `--inputs-dir <path>`: directory with canonical inputs per intent (default: the agent generates 5 synthetic inputs covering concise/detailed/code/edge/adversarial)
+ - `--provider openai|anthropic`: LLM provider (default: detected via deps)
+ - `--seed N`: seed for determinism (default: 42)
+ - `--max-tokens N`: output limit (default: 500)
+ - `--num-intents N`: number of intents to cover (default: 5; minimum: 5)
+ - `--system-prompt <text>`: system prompt, if applicable
+
+ **Examples:**
+ ```
+ /caracterizar-prompt prompts/generate-summary.md
+ /caracterizar-prompt prompts/code-reviewer.md --num-intents 7 --max-tokens 1000
+ /caracterizar-prompt prompts/intent-classifier.md --inputs-dir test-data/classifier-intents
+ /caracterizar-prompt prompts/customer-support.md --provider anthropic --seed 123
+ ```
+
+ **Prerequisites:**
+ - ANTHROPIC_API_KEY or OPENAI_API_KEY in the environment
+ - Test framework (Vitest, Jest, pytest, ...)
+ - The chosen provider supports `temperature=0` + `seed`
+
+ **When this command is the right path:**
+ - Production prompt longer than 50 lines
+ - Prompt changes have broken things silently in the past
+ - The team wants a baseline before refactoring a prompt
+ - CI must detect upstream model drift (Claude 4.7 → 4.8)
+ </context>
+
+ <process>
+
+ ## 1. Parse arguments
+
+ ```bash
+ PROMPT_FILE=$(echo "$ARGUMENTS" | awk '{print $1}')
+ INPUTS_DIR=$(echo "$ARGUMENTS" | grep -oE -- '--inputs-dir [^ ]+' | awk '{print $2}')
+ PROVIDER=$(echo "$ARGUMENTS" | grep -oE -- '--provider [^ ]+' | awk '{print $2}')
+ SEED=$(echo "$ARGUMENTS" | grep -oE -- '--seed [0-9]+' | awk '{print $2}')
+ MAX_TOKENS=$(echo "$ARGUMENTS" | grep -oE -- '--max-tokens [0-9]+' | awk '{print $2}')
+ NUM_INTENTS=$(echo "$ARGUMENTS" | grep -oE -- '--num-intents [0-9]+' | awk '{print $2}')
+
+ [ -z "$SEED" ] && SEED=42
+ [ -z "$MAX_TOKENS" ] && MAX_TOKENS=500
+ [ -z "$NUM_INTENTS" ] && NUM_INTENTS=5
+
+ if [ -z "$PROMPT_FILE" ]; then
+   echo "ERROR: prompt_file is required"
+   exit 1
+ fi
+
+ if [ ! -f "$PROMPT_FILE" ]; then
+   echo "ERROR: file not found: $PROMPT_FILE"
+   exit 1
+ fi
+ ```
+
+ ## 2. Detect provider + framework
+
+ ```bash
+ # auto-detect provider
+ if [ -z "$PROVIDER" ]; then
+   if [ -n "$ANTHROPIC_API_KEY" ]; then
+     PROVIDER="anthropic"
+   elif [ -n "$OPENAI_API_KEY" ]; then
+     PROVIDER="openai"
+   else
+     echo "ERROR: no provider detected. Set ANTHROPIC_API_KEY or OPENAI_API_KEY"
+     exit 1
+   fi
+ fi
+
+ # detect test framework
+ FRAMEWORK=""
+ if [ -f "package.json" ]; then
+   if jq -re '.devDependencies.vitest' package.json >/dev/null 2>&1; then FRAMEWORK="vitest"
+   elif jq -re '.devDependencies.jest' package.json >/dev/null 2>&1; then FRAMEWORK="jest"
+   fi
+ elif [ -f "pyproject.toml" ]; then
+   FRAMEWORK="pytest"
+ fi
+
+ [ -z "$FRAMEWORK" ] && FRAMEWORK="vitest" # sane default
+ ```
+
+ ## 3. Dispatch to `legacy-characterizer` (prompt mode)
+
+ ```text
+ Task(
+   subagent_type="legacy-characterizer",
+   prompt="
+     target_file: ${PROMPT_FILE}
+     target_kind: prompt
+     provider: ${PROVIDER}
+     seed: ${SEED}
+     max_tokens: ${MAX_TOKENS}
+     num_intents: ${NUM_INTENTS}
+     ${INPUTS_DIR:+inputs_dir: ${INPUTS_DIR}}
+     framework: ${FRAMEWORK}
+
+     Apply the ai-prompt-characterization skill. Steps:
+     1. Read the prompt + identify expected inputs (system prompt? user message format? tools?)
+     2. Generate (or read from inputs-dir) ${NUM_INTENTS}+ inputs covering distinct intents:
+        - concise: short request, short expected output
+        - detailed: elaborate request, long expected output
+        - code-heavy: input/output containing code
+        - edge case: ambiguous input
+        - adversarial: prompt injection attempt
+     3. For each intent: run the LLM with temperature=0 + seed=${SEED}
+     4. Capture text + finishReason + toolCalls (if function calling) + inputTokens + outputTokens + modelVersion
+     5. Sanitize: timestamps, UUIDs, relative dates, monetary values, versions
+     6. Save as snapshot tests using ${FRAMEWORK}
+     7. Behavioral coverage = % of intents covered (not % of lines)
+   "
+ )
+ ```
+
+ ## 4. Post-output
+
+ ```
+ ═══════════════════════════════════════════════════════════
+ framework ► CARACTERIZAR-PROMPT ▸ tests/characterization/prompts/...
+ ═══════════════════════════════════════════════════════════
+
+ [legacy-characterizer output in prompt mode]
+
+ ## ⚠ MANDATORY MANUAL REVIEW
+
+ Snapshots generated. Read each one before committing:
+ 1. Verify that no PII/secret survives sanitization
+ 2. Verify that no timestamp/UUID/relative date is left unredacted
+ 3. Confirm the expected finishReason (stop vs length vs tool_use)
+ 4. For tool_uses: confirm tool name + input shape
+
+ ## Next steps
+
+ 1. **Review snapshots** manually
+ 2. **Run the suite locally:**
+    - JS/TS: `npm test -- tests/characterization/prompts`
+    - Python: `pytest tests/characterization/prompts`
+ 3. **Commit** as `chore: characterize <prompt-name>`
+ 4. **Configure CI:**
+    - `tests/characterization/prompts/**` run on PRs that touch `prompts/**`
+    - Red diff = behavioral change detected → human review
+ 5. **Configure a nightly job** to detect upstream model drift:
+    - Anthropic publishes Claude 4.8 → re-run characterization → snapshot diff
+ 6. **Cost:** ~${NUM_INTENTS} × ($0.015/1k input tokens × 2k = $0.03 + output) ≈ $0.10-0.50/run
+
+ ## Cross-suite
+
+ - **/caracterizar** (v1.12): characterization of code (not prompts), the analogous command
+ - **`llm-as-dependency`** skill: faking the LLM in business logic tests (not these tests)
+ - **`legacy-api-only-applications`** skill: an LLM provider is a special case of an external API
+ - **/instrumentar-fase** (v1.9): instruments the prompt consumer (latency, tokens)
+ ```
+
+ </process>
+
+ <success_criteria>
+ - [ ] $ARGUMENTS parsed
+ - [ ] Provider auto-detected OR specified
+ - [ ] Test framework detected
+ - [ ] `legacy-characterizer` invoked in prompt mode
+ - [ ] ≥ 5 intents covering the canonical groups (concise/detailed/code/edge/adversarial)
+ - [ ] temperature=0 + fixed seed applied
+ - [ ] LLM-output-specific sanitization applied
+ - [ ] Tests run against the real LLM only in characterization (not in business logic tests)
+ - [ ] Next steps: review, commit, CI config, nightly drift detection
+ - [ ] Cross-suite with llm-as-dependency and legacy-api-only-applications
+ </success_criteria>
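
The sanitization step in the dispatch prompt (item 5: timestamps, UUIDs, and similar volatile values) could be sketched in shell roughly as follows. This is an illustration only, not the `legacy-characterizer` agent's actual implementation: the function name `sanitize_llm_output` and the placeholder tokens `<TIMESTAMP>` and `<UUID>` are assumptions, and only ISO 8601 timestamps and canonical hyphenated UUIDs are handled. Relative dates, monetary values, and versions would need further patterns.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the package): read LLM output on stdin and
# replace volatile values with stable placeholders so that snapshots taken on
# different runs compare equal.
sanitize_llm_output() {
  sed -E \
    -e 's/[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})?/<TIMESTAMP>/g' \
    -e 's/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}/<UUID>/g'
}

# Example: sanitize a captured response before it is written to a snapshot.
printf 'request 123e4567-e89b-12d3-a456-426614174000 served at 2026-01-02T03:04:05Z\n' \
  | sanitize_llm_output
```

Running the sanitizer before the snapshot is written, rather than at comparison time, keeps the stored file free of volatile values, which is what the manual review checklist asks the reader to confirm.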