@luanpdd/kit-mcp 1.35.0 → 1.36.0
- package/bin/cli.js +2 -2
- package/bin/mcp.js +6 -6
- package/bin/ui.js +74 -74
- package/gates/ai-prompt-stability.md +120 -120
- package/gates/budget-description.md +68 -68
- package/gates/confidence.md +29 -29
- package/gates/dependency-check.md +33 -33
- package/gates/dept-cycle-prevention.md +179 -179
- package/gates/golden-signals-coverage.md +133 -133
- package/gates/legacy-refactor-safety.md +178 -178
- package/gates/multi-tenant-rls-coverage.md +102 -102
- package/gates/no-personal-uuid.md +72 -72
- package/gates/obs-agents-mcp-supabase.md +86 -86
- package/gates/obs-skills-frontmatter.md +76 -76
- package/gates/observability-coverage.md +151 -151
- package/gates/omm-no-regression.md +83 -83
- package/gates/postmortem-template-required.md +127 -127
- package/gates/prr-checklist-coverage.md +128 -128
- package/gates/regression.md +32 -32
- package/gates/release-pipeline-policy.md +132 -132
- package/gates/secrets-scan.md +33 -33
- package/gates/service-role-not-in-user-facing.md +113 -113
- package/gates/skill-must-include.md +71 -71
- package/gates/sync-idempotent.md +62 -62
- package/gates/verify-phase-goal.md +34 -34
- package/kit/agents/designer-ui.md +216 -216
- package/kit/agents/workflow-generator.md +537 -167
- package/kit/commands/adicionar-backlog.md +1 -1
- package/kit/commands/adicionar-fase.md +1 -1
- package/kit/commands/adicionar-tarefa.md +1 -1
- package/kit/commands/auditar-observabilidade.md +103 -103
- package/kit/commands/auditar-toil.md +129 -129
- package/kit/commands/caracterizar-prompt.md +195 -195
- package/kit/commands/criar-workflow.md +158 -158
- package/kit/commands/definir-perfil.md +1 -1
- package/kit/commands/definir-slo.md +108 -108
- package/kit/commands/fio.md +1 -1
- package/kit/commands/golden-signals.md +142 -142
- package/kit/commands/instrumentar-fase.md +200 -200
- package/kit/commands/investigar-producao.md +162 -162
- package/kit/commands/observabilidade.md +118 -118
- package/kit/commands/postmortem.md +179 -179
- package/kit/commands/prr.md +205 -205
- package/kit/commands/publicar-rapido.md +207 -207
- package/kit/commands/risk-budget.md +220 -220
- package/kit/commands/sre.md +230 -230
- package/kit/file-manifest.json +424 -424
- package/kit/framework/references/output-style.md +22 -22
- package/kit/hooks/post-apply-migration.js +199 -199
- package/kit/hooks/sidecar-tool-publisher.js +210 -210
- package/kit/skills/_shared-dados-distribuidos/glossary.md +224 -224
- package/kit/skills/_shared-legacy/glossary.md +389 -389
- package/kit/skills/_shared-multi-tenant/glossary.md +186 -186
- package/kit/skills/_shared-observability/glossary.md +396 -396
- package/kit/skills/_shared-sre/glossary.md +712 -712
- package/kit/skills/_shared-supabase/glossary.md +234 -234
- package/kit/skills/blameless-postmortems/SKILL.md +340 -340
- package/kit/skills/burn-rate-alerting/SKILL.md +258 -258
- package/kit/skills/cascading-failures/SKILL.md +311 -311
- package/kit/skills/core-analysis-loop/SKILL.md +352 -352
- package/kit/skills/distributed-tracing/SKILL.md +362 -362
- package/kit/skills/dynamic-workflow-authoring/SKILL.md +327 -223
- package/kit/skills/eliminating-toil/SKILL.md +243 -243
- package/kit/skills/event-based-slos/SKILL.md +296 -296
- package/kit/skills/four-golden-signals/SKILL.md +314 -314
- package/kit/skills/hermetic-builds/SKILL.md +323 -323
- package/kit/skills/legacy-monster-methods/SKILL.md +444 -444
- package/kit/skills/llm-as-dependency/SKILL.md +436 -436
- package/kit/skills/load-shedding-graceful-degradation/SKILL.md +396 -396
- package/kit/skills/observability-driven-development/SKILL.md +315 -315
- package/kit/skills/observability-maturity-model/SKILL.md +222 -222
- package/kit/skills/opentelemetry-standard/SKILL.md +351 -351
- package/kit/skills/production-readiness-review/SKILL.md +305 -305
- package/kit/skills/release-engineering/SKILL.md +367 -367
- package/kit/skills/retry-strategies/SKILL.md +372 -372
- package/kit/skills/sre-risk-management/SKILL.md +221 -221
- package/kit/skills/structured-events/SKILL.md +265 -265
- package/kit/skills/supabase-cron-queues/SKILL.md +275 -275
- package/kit/skills/supabase-database-functions/SKILL.md +332 -332
- package/kit/skills/supabase-declarative-schema/SKILL.md +183 -183
- package/kit/skills/supabase-pgvector-rag/SKILL.md +253 -253
- package/kit/skills/supabase-postgres-style/SKILL.md +138 -138
- package/kit/skills/supabase-storage/SKILL.md +234 -234
- package/kit/skills/telemetry-pipelines/SKILL.md +259 -259
- package/kit/skills/telemetry-sampling/SKILL.md +256 -256
- package/kit/skills/ui-anti-padroes-ia/SKILL.md +261 -261
- package/kit/skills/ui-contexto-produto/SKILL.md +248 -248
- package/kit/skills/ui-cor-estrategia/SKILL.md +213 -213
- package/kit/skills/ui-critica-auditoria/SKILL.md +260 -260
- package/kit/skills/ui-motion-funcional/SKILL.md +264 -264
- package/kit/skills/ui-ritmo-espacial/SKILL.md +259 -259
- package/kit/skills/ui-tipografia/SKILL.md +211 -211
- package/package.json +1 -1
- package/src/cli/index.js +1114 -1114
- package/src/cli/render.js +194 -194
- package/src/cli/upgrade-check.js +135 -135
- package/src/core/error-redaction.js +76 -76
- package/src/core/failures.js +153 -153
- package/src/core/gate-runner.js +205 -205
- package/src/core/gates.js +82 -82
- package/src/core/logger.js +170 -170
- package/src/core/manifest-verify.js +174 -174
- package/src/core/metrics.js +268 -268
- package/src/core/notify.js +60 -60
- package/src/core/path-safety.js +141 -141
- package/src/core/replays.js +120 -120
- package/src/core/ui.js +185 -185
- package/src/mcp-server/install.js +149 -149
- package/src/mcp-server/roots.js +124 -124
- package/src/ui/auto-spawn.js +113 -113
- package/src/ui/browser.js +78 -78
- package/src/ui/client.js +130 -130
- package/src/ui/events.js +65 -65
- package/src/ui/lockfile.js +191 -191
- package/src/ui/port.js +67 -67
- package/src/ui/server.js +547 -547
- package/src/ui/wrapper.js +129 -129
@@ -1,195 +1,195 @@
----
-name: caracterizar-prompt
-description: Characterization de prompts/tools LLM em produção — temperature=0 + seed fixo + sanitização específica. Trata prompts como código legacy. Modernização 2026 sem precedente em 2004.
-argument-hint: "<prompt_file> [--inputs-dir PATH] [--provider openai|anthropic] [--seed N] [--max-tokens N] [--num-intents N]"
-allowed-tools:
-- Read
-- Write
-- Edit
-- Bash
-- Grep
-- Glob
-- Task
----
-
-<objective>
-Caracterizar **prompt LLM ou tool definition** capturando outputs determinísticos como golden snapshots. Aplica skill [`ai-prompt-characterization`](../skills/ai-prompt-characterization/SKILL.md) — `temperature=0`, `seed` fixo, sanitização de timestamps/UUIDs/datas relativas, 5+ intents distintas. Trata prompt como **código legacy também** — versionado, testado, code-reviewed.
-
-**Cria/Atualiza:**
-- `tests/characterization/prompts/<prompt-stem>.test.ts` (ou `.py`/`.go` conforme runtime)
-- `tests/characterization/prompts/__snapshots__/<prompt-stem>.test.ts.snap`
-- `tests/characterization/prompts/<prompt-stem>/inputs/<intent>.json` — inputs canônicos por intent
-
-**Após:** mudança em prompt deve manter snapshot diff = 0 (ou mudança documentada). Detecta drift de model upstream automaticamente.
-</objective>
-
-<context>
-**Argumentos:**
-- `<prompt_file>` — arquivo do prompt (e.g., `prompts/generate-summary.md`) — OBRIGATÓRIO
-- `--inputs-dir <path>` — diretório com inputs canônicos por intent (default: agent gera 5 sintéticos cobrindo concise/detailed/code/edge/adversarial)
-- `--provider openai|anthropic` — provider de LLM (default: detecta via deps)
-- `--seed N` — seed para determinismo (default: 42)
-- `--max-tokens N` — limite output (default: 500)
-- `--num-intents N` — número de intents a cobrir (default: 5; mínimo: 5)
-- `--system-prompt <text>` — system prompt se aplicável
-
-**Exemplos:**
-```
-/caracterizar-prompt prompts/generate-summary.md
-/caracterizar-prompt prompts/code-reviewer.md --num-intents 7 --max-tokens 1000
-/caracterizar-prompt prompts/intent-classifier.md --inputs-dir test-data/classifier-intents
-/caracterizar-prompt prompts/customer-support.md --provider anthropic --seed 123
-```
-
-**Pré-requisitos:**
-- ANTHROPIC_API_KEY ou OPENAI_API_KEY em env
-- Test framework (Vitest, Jest, pytest, ...)
-- Provider escolhido suporta `temperature=0` + `seed`
-
-**Quando este comando é o caminho:**
-- Prompt em produção > 50 linhas
-- Mudanças em prompt quebraram silenciosamente no passado
-- Equipe quer baseline antes de refactor de prompt
-- CI deve detectar drift de model upstream (Claude 4.7 → 4.8)
-</context>
-
-<process>
-
-## 1. Parsear argumentos
-
-```bash
-PROMPT_FILE=$(echo "$ARGUMENTS" | awk '{print $1}')
-INPUTS_DIR=$(echo "$ARGUMENTS" | grep -oE -- '--inputs-dir [^ ]+' | awk '{print $2}')
-PROVIDER=$(echo "$ARGUMENTS" | grep -oE -- '--provider [^ ]+' | awk '{print $2}')
-SEED=$(echo "$ARGUMENTS" | grep -oE -- '--seed [0-9]+' | awk '{print $2}')
-MAX_TOKENS=$(echo "$ARGUMENTS" | grep -oE -- '--max-tokens [0-9]+' | awk '{print $2}')
-NUM_INTENTS=$(echo "$ARGUMENTS" | grep -oE -- '--num-intents [0-9]+' | awk '{print $2}')
-
-[ -z "$SEED" ] && SEED=42
-[ -z "$MAX_TOKENS" ] && MAX_TOKENS=500
-[ -z "$NUM_INTENTS" ] && NUM_INTENTS=5
-
-if [ -z "$PROMPT_FILE" ]; then
-echo "ERROR: prompt_file obrigatório"
-exit 1
-fi
-
-if [ ! -f "$PROMPT_FILE" ]; then
-echo "ERROR: arquivo não encontrado: $PROMPT_FILE"
-exit 1
-fi
-```
-
-## 2. Detectar provider + framework
-
-```bash
-# auto-detect provider
-if [ -z "$PROVIDER" ]; then
-if [ -n "$ANTHROPIC_API_KEY" ]; then
-PROVIDER="anthropic"
-elif [ -n "$OPENAI_API_KEY" ]; then
-PROVIDER="openai"
-else
-echo "ERROR: nenhum provider detectado. Setar ANTHROPIC_API_KEY ou OPENAI_API_KEY"
-exit 1
-fi
-fi
-
-# detectar test framework
-FRAMEWORK=""
-if [ -f "package.json" ]; then
-if jq -re '.devDependencies.vitest' package.json >/dev/null 2>&1; then FRAMEWORK="vitest"
-elif jq -re '.devDependencies.jest' package.json >/dev/null 2>&1; then FRAMEWORK="jest"
-fi
-elif [ -f "pyproject.toml" ]; then
-FRAMEWORK="pytest"
-fi
-
-[ -z "$FRAMEWORK" ] && FRAMEWORK="vitest" # default sane
-```
-
-## 3. Dispatch para `legacy-characterizer` (modo prompt)
-
-```text
-Task(
-subagent_type="legacy-characterizer",
-prompt="
-target_file: ${PROMPT_FILE}
-target_kind: prompt
-provider: ${PROVIDER}
-seed: ${SEED}
-max_tokens: ${MAX_TOKENS}
-num_intents: ${NUM_INTENTS}
-${INPUTS_DIR:+inputs_dir: ${INPUTS_DIR}}
-framework: ${FRAMEWORK}
-
-Aplicar skill ai-prompt-characterization. Etapas:
-1. Ler prompt + identificar inputs esperados (system prompt? user message format? tools?)
-2. Gerar (ou ler de inputs-dir) ${NUM_INTENTS}+ inputs cobrindo intents distintas:
-- concise: pedido curto, output esperado curto
-- detailed: pedido elaborado, output esperado longo
-- code-heavy: input/output com código
-- edge case: input ambíguo
-- adversarial: prompt injection attempt
-3. Para cada intent: rodar LLM com temperature=0 + seed=${SEED}
-4. Capturar text + finishReason + toolCalls (se function calling) + inputTokens + outputTokens + modelVersion
-5. Sanitizar: timestamps, UUIDs, datas relativas, valores monetários, versões
-6. Salvar como snapshot tests usando ${FRAMEWORK}
-7. Cobertura behavioral = % intents cobertas (não % linhas)
-"
-)
-```
-
-## 4. Pós-output
-
-```
-═══════════════════════════════════════════════════════════
-framework ► CARACTERIZAR-PROMPT ▸ tests/characterization/prompts/...
-═══════════════════════════════════════════════════════════
-
-[output do legacy-characterizer em modo prompt]
-
-## ⚠ REVISÃO MANUAL OBRIGATÓRIA
-
-Snapshots gerados — leia cada um antes de commit:
-1. Verificar nenhum PII/secret persiste pós-sanitização
-2. Verificar nenhum timestamp/UUID/data relativa unredacted
-3. Confirmar finishReason esperado (stop vs length vs tool_use)
-4. Para tool_uses: confirmar tool name + input shape
-
-## Próximos passos
-
-1. **Revisar snapshots** manualmente
-2. **Rodar suite local:**
-- JS/TS: `npm test -- tests/characterization/prompts`
-- Python: `pytest tests/characterization/prompts`
-3. **Commit** como `chore: characterize <prompt-name>`
-4. **Configurar CI:**
-- `tests/characterization/prompts/**` rodam em PR que toca `prompts/**`
-- Diff vermelho = mudança comportamental detectada → review humano
-5. **Configurar nightly** para detectar drift de model upstream:
-- Anthropic publica Claude 4.8 → re-run characterization → snapshot diff
-6. **Custo:** ~${NUM_INTENTS} × ($0.015/1k input tokens × 2k = $0.03 + output) ≈ $0.10-0.50/run
-
-## Cross-suite
-
-- **/caracterizar** (v1.12) — characterization de código (não prompt) — análogo
-- **`llm-as-dependency`** skill — fakear LLM em business logic tests (não esses tests)
-- **`legacy-api-only-applications`** skill — LLM provider é caso especial de API external
-- **/instrumentar-fase** (v1.9) — instrumenta consumer de prompt (latency, tokens)
-```
-
-</process>
-
-<success_criteria>
-- [ ] $ARGUMENTS parseados
-- [ ] Provider detectado automaticamente OU especificado
-- [ ] Framework de teste detectado
-- [ ] `legacy-characterizer` invocado em modo prompt
-- [ ] ≥ 5 intents cobrindo grupos canônicos (concise/detailed/code/edge/adversarial)
-- [ ] temperature=0 + seed=fixo aplicado
-- [ ] Sanitização específica para outputs LLM aplicada
-- [ ] Tests rodam contra LLM real apenas em characterization (não em business logic tests)
-- [ ] Próximos passos: review, commit, CI config, nightly drift detection
-- [ ] Cross-suite com llm-as-dependency e legacy-api-only-applications
-</success_criteria>
+---
+name: caracterizar-prompt
+description: Characterization de prompts/tools LLM em produção — temperature=0 + seed fixo + sanitização específica. Trata prompts como código legacy. Modernização 2026 sem precedente em 2004.
+argument-hint: "<prompt_file> [--inputs-dir PATH] [--provider openai|anthropic] [--seed N] [--max-tokens N] [--num-intents N]"
+allowed-tools:
+- Read
+- Write
+- Edit
+- Bash
+- Grep
+- Glob
+- Task
+---
+
+<objective>
+Caracterizar **prompt LLM ou tool definition** capturando outputs determinísticos como golden snapshots. Aplica skill [`ai-prompt-characterization`](../skills/ai-prompt-characterization/SKILL.md) — `temperature=0`, `seed` fixo, sanitização de timestamps/UUIDs/datas relativas, 5+ intents distintas. Trata prompt como **código legacy também** — versionado, testado, code-reviewed.
+
+**Cria/Atualiza:**
+- `tests/characterization/prompts/<prompt-stem>.test.ts` (ou `.py`/`.go` conforme runtime)
+- `tests/characterization/prompts/__snapshots__/<prompt-stem>.test.ts.snap`
+- `tests/characterization/prompts/<prompt-stem>/inputs/<intent>.json` — inputs canônicos por intent
+
+**Após:** mudança em prompt deve manter snapshot diff = 0 (ou mudança documentada). Detecta drift de model upstream automaticamente.
+</objective>
+
+<context>
+**Argumentos:**
+- `<prompt_file>` — arquivo do prompt (e.g., `prompts/generate-summary.md`) — OBRIGATÓRIO
+- `--inputs-dir <path>` — diretório com inputs canônicos por intent (default: agent gera 5 sintéticos cobrindo concise/detailed/code/edge/adversarial)
+- `--provider openai|anthropic` — provider de LLM (default: detecta via deps)
+- `--seed N` — seed para determinismo (default: 42)
+- `--max-tokens N` — limite output (default: 500)
+- `--num-intents N` — número de intents a cobrir (default: 5; mínimo: 5)
+- `--system-prompt <text>` — system prompt se aplicável
+
+**Exemplos:**
+```
+/caracterizar-prompt prompts/generate-summary.md
+/caracterizar-prompt prompts/code-reviewer.md --num-intents 7 --max-tokens 1000
+/caracterizar-prompt prompts/intent-classifier.md --inputs-dir test-data/classifier-intents
+/caracterizar-prompt prompts/customer-support.md --provider anthropic --seed 123
+```
+
+**Pré-requisitos:**
+- ANTHROPIC_API_KEY ou OPENAI_API_KEY em env
+- Test framework (Vitest, Jest, pytest, ...)
+- Provider escolhido suporta `temperature=0` + `seed`
+
+**Quando este comando é o caminho:**
+- Prompt em produção > 50 linhas
+- Mudanças em prompt quebraram silenciosamente no passado
+- Equipe quer baseline antes de refactor de prompt
+- CI deve detectar drift de model upstream (Claude 4.7 → 4.8)
+</context>
+
+<process>
+
+## 1. Parsear argumentos
+
+```bash
+PROMPT_FILE=$(echo "$ARGUMENTS" | awk '{print $1}')
+INPUTS_DIR=$(echo "$ARGUMENTS" | grep -oE -- '--inputs-dir [^ ]+' | awk '{print $2}')
+PROVIDER=$(echo "$ARGUMENTS" | grep -oE -- '--provider [^ ]+' | awk '{print $2}')
+SEED=$(echo "$ARGUMENTS" | grep -oE -- '--seed [0-9]+' | awk '{print $2}')
+MAX_TOKENS=$(echo "$ARGUMENTS" | grep -oE -- '--max-tokens [0-9]+' | awk '{print $2}')
+NUM_INTENTS=$(echo "$ARGUMENTS" | grep -oE -- '--num-intents [0-9]+' | awk '{print $2}')
+
+[ -z "$SEED" ] && SEED=42
+[ -z "$MAX_TOKENS" ] && MAX_TOKENS=500
+[ -z "$NUM_INTENTS" ] && NUM_INTENTS=5
+
+if [ -z "$PROMPT_FILE" ]; then
+echo "ERROR: prompt_file obrigatório"
+exit 1
+fi
+
+if [ ! -f "$PROMPT_FILE" ]; then
+echo "ERROR: arquivo não encontrado: $PROMPT_FILE"
+exit 1
+fi
+```
+
+## 2. Detectar provider + framework
+
+```bash
+# auto-detect provider
+if [ -z "$PROVIDER" ]; then
+if [ -n "$ANTHROPIC_API_KEY" ]; then
+PROVIDER="anthropic"
+elif [ -n "$OPENAI_API_KEY" ]; then
+PROVIDER="openai"
+else
+echo "ERROR: nenhum provider detectado. Setar ANTHROPIC_API_KEY ou OPENAI_API_KEY"
+exit 1
+fi
+fi
+
+# detectar test framework
+FRAMEWORK=""
+if [ -f "package.json" ]; then
+if jq -re '.devDependencies.vitest' package.json >/dev/null 2>&1; then FRAMEWORK="vitest"
+elif jq -re '.devDependencies.jest' package.json >/dev/null 2>&1; then FRAMEWORK="jest"
+fi
+elif [ -f "pyproject.toml" ]; then
+FRAMEWORK="pytest"
+fi
+
+[ -z "$FRAMEWORK" ] && FRAMEWORK="vitest" # default sane
+```
+
+## 3. Dispatch para `legacy-characterizer` (modo prompt)
+
+```text
+Task(
+subagent_type="legacy-characterizer",
+prompt="
+target_file: ${PROMPT_FILE}
+target_kind: prompt
+provider: ${PROVIDER}
+seed: ${SEED}
+max_tokens: ${MAX_TOKENS}
+num_intents: ${NUM_INTENTS}
+${INPUTS_DIR:+inputs_dir: ${INPUTS_DIR}}
+framework: ${FRAMEWORK}
+
+Aplicar skill ai-prompt-characterization. Etapas:
+1. Ler prompt + identificar inputs esperados (system prompt? user message format? tools?)
+2. Gerar (ou ler de inputs-dir) ${NUM_INTENTS}+ inputs cobrindo intents distintas:
+- concise: pedido curto, output esperado curto
+- detailed: pedido elaborado, output esperado longo
+- code-heavy: input/output com código
+- edge case: input ambíguo
+- adversarial: prompt injection attempt
+3. Para cada intent: rodar LLM com temperature=0 + seed=${SEED}
+4. Capturar text + finishReason + toolCalls (se function calling) + inputTokens + outputTokens + modelVersion
+5. Sanitizar: timestamps, UUIDs, datas relativas, valores monetários, versões
+6. Salvar como snapshot tests usando ${FRAMEWORK}
+7. Cobertura behavioral = % intents cobertas (não % linhas)
+"
+)
+```
+
+## 4. Pós-output
+
+```
+═══════════════════════════════════════════════════════════
+framework ► CARACTERIZAR-PROMPT ▸ tests/characterization/prompts/...
+═══════════════════════════════════════════════════════════
+
+[output do legacy-characterizer em modo prompt]
+
+## ⚠ REVISÃO MANUAL OBRIGATÓRIA
+
+Snapshots gerados — leia cada um antes de commit:
+1. Verificar nenhum PII/secret persiste pós-sanitização
+2. Verificar nenhum timestamp/UUID/data relativa unredacted
+3. Confirmar finishReason esperado (stop vs length vs tool_use)
+4. Para tool_uses: confirmar tool name + input shape
+
+## Próximos passos
+
+1. **Revisar snapshots** manualmente
+2. **Rodar suite local:**
+- JS/TS: `npm test -- tests/characterization/prompts`
+- Python: `pytest tests/characterization/prompts`
+3. **Commit** como `chore: characterize <prompt-name>`
+4. **Configurar CI:**
+- `tests/characterization/prompts/**` rodam em PR que toca `prompts/**`
+- Diff vermelho = mudança comportamental detectada → review humano
+5. **Configurar nightly** para detectar drift de model upstream:
+- Anthropic publica Claude 4.8 → re-run characterization → snapshot diff
+6. **Custo:** ~${NUM_INTENTS} × ($0.015/1k input tokens × 2k = $0.03 + output) ≈ $0.10-0.50/run
+
+## Cross-suite
+
+- **/caracterizar** (v1.12) — characterization de código (não prompt) — análogo
+- **`llm-as-dependency`** skill — fakear LLM em business logic tests (não esses tests)
+- **`legacy-api-only-applications`** skill — LLM provider é caso especial de API external
+- **/instrumentar-fase** (v1.9) — instrumenta consumer de prompt (latency, tokens)
+```
+
+</process>
+
+<success_criteria>
+- [ ] $ARGUMENTS parseados
+- [ ] Provider detectado automaticamente OU especificado
+- [ ] Framework de teste detectado
+- [ ] `legacy-characterizer` invocado em modo prompt
+- [ ] ≥ 5 intents cobrindo grupos canônicos (concise/detailed/code/edge/adversarial)
+- [ ] temperature=0 + seed=fixo aplicado
+- [ ] Sanitização específica para outputs LLM aplicada
+- [ ] Tests rodam contra LLM real apenas em characterization (não em business logic tests)
+- [ ] Próximos passos: review, commit, CI config, nightly drift detection
+- [ ] Cross-suite com llm-as-dependency e legacy-api-only-applications
+</success_criteria>
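
Reviewer note on step 5 of the dispatched task ("Sanitizar: timestamps, UUIDs, datas relativas, ..."): the diff names the sanitization categories but does not show how they are applied. The following is a minimal Python sketch of what such a pass might look like on the pytest side, not the package's actual implementation. The function name `sanitize`, the placeholder tokens, and every regular expression are assumptions made for illustration; monetary values and version strings, which the step also lists, are left out, and relative dates are matched in English only.

```python
import re

# Hypothetical rules (not taken from the package). Order matters: full
# timestamps are replaced before bare dates so that an ISO timestamp is
# not half-consumed by the date rule.
_RULES = [
    (re.compile(r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
                r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"), "<UUID>"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"
                r"(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?"), "<TIMESTAMP>"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "<DATE>"),
    (re.compile(r"\b(?:today|yesterday|tomorrow"
                r"|\d+ (?:minute|hour|day|week|month|year)s? ago)\b",
                re.IGNORECASE), "<RELATIVE_DATE>"),
]


def sanitize(text: str) -> str:
    """Replace volatile substrings with stable placeholders before snapshotting."""
    for pattern, placeholder in _RULES:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    raw = ("Order 123e4567-e89b-12d3-a456-426614174000 created "
           "2026-01-15T10:30:00Z, updated 3 days ago.")
    print(sanitize(raw))
```

Two model outputs that differ only in these volatile fields then sanitize to the same string, which is what lets a snapshot comparison flag behavioral changes rather than clock or ID noise. The manual review step in the diff still applies, since regex rules like these can miss formats they were not written for.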