cdp-edge 2.2.0 → 2.2.1
- package/README.md +25 -8
- package/extracted-skill/tracking-events-generator/agents/database-agent.md +5 -4
- package/extracted-skill/tracking-events-generator/agents/fraud-detection-agent.md +0 -1
- package/extracted-skill/tracking-events-generator/agents/linkedin-agent.md +1 -1
- package/extracted-skill/tracking-events-generator/agents/ltv-predictor-agent.md +4 -4
- package/extracted-skill/tracking-events-generator/agents/ml-clustering-agent.md +81 -70
- package/package.json +1 -1
- package/server-edge-tracker/index.js +1 -1
- package/server-edge-tracker/modules/ml/fraud.js +1 -16
- package/server-edge-tracker/modules/ml/ltv.js +1 -1
- package/server-edge-tracker/modules/ml/segmentation.js +157 -127
- package/server-edge-tracker/worker.js +178 -120
- package/server-edge-tracker/wrangler.toml +21 -4
package/README.md
CHANGED
|
@@ -2,7 +2,7 @@
|
|
|
2
2
|
|
|
3
3
|
**Padrão Quantum Tracking: 100% Cloudflare Edge.** Sem GTM. Sem Stape. Sem cookies de terceiros.
|
|
4
4
|
|
|
5
|
-
> **v2.0
|
|
5
|
+
> **v2.2.0** — Granite 4.0 Micro · K-means Vetorial Real (bge-m3) · Sem emails descartáveis · Cloudflare Workers · Meta CAPI v22.0 · GA4 MP · TikTok Events API v1.3
|
|
6
6
|
|
|
7
7
|
---
|
|
8
8
|
|
|
@@ -14,7 +14,7 @@
|
|
|
14
14
|
Meu ecossistema opera como um Cérebro de Conversão Privado na borda. Quando um evento de Lead bate no endpoint `/track`:
|
|
15
15
|
1. **O Escudo Frontal (Fraud Gate):** Inspeciono IP, ASN e Velocity na borda. Bloqueio bots silenciosamente antes mesmo deles carregarem.
|
|
16
16
|
2. **A Roleta Invisível (A/B LTV):** Faço o sorteio de prompts para testes A/B via KV Cache em ~0ms.
|
|
17
|
-
3. **O Cérebro Financeiro (LTV Predictor):** Rodo Machine Learning (
|
|
17
|
+
3. **O Cérebro Financeiro (LTV Predictor):** Rodo Machine Learning (Granite 4.0 Micro) para qualificar a intenção e gerar o LTV Preditivo.
|
|
18
18
|
4. **Envio para as Plataformas:** O Facebook/Google/LinkedIn recebem um payload limpo (sem bot) recheado com valor financeiro de intenção extrema.
|
|
19
19
|
5. **Máquina Autônoma (Background):** Meu banco SQLite (D1) retroalimenta os processos de Clustering (Fase 1) e Bidding (Fase 2) de forma autônoma pelas costas do usuário (`ctx.waitUntil`).
|
|
20
20
|
|
|
@@ -25,6 +25,25 @@ Meu ecossistema opera como um Cérebro de Conversão Privado na borda. Quando um
|
|
|
25
25
|
|
|
26
26
|
---
|
|
27
27
|
|
|
+## 📋 CHANGELOG v2.2.0 (April 10, 2026)
+
+### 🤖 AI Engine Upgrade: New Models
+
+- **LTV Prediction**: `@cf/meta/llama-3.1-8b-instruct` → **`@cf/ibm-granite/granite-4.0-h-micro`** (lower latency, optimized for edge and function calling)
+- **ML Clustering**: simulated LLM algorithm → **real vector K-means** with `@cf/baai/bge-m3` embeddings (cosine distance, K-means++ initialization, real silhouette score)
+- Granite is still used to name the segments after clustering
+
+### 🧹 Cleanup (Zero Junk)
+
+- Removed: disposable-email detection (mailinator, guerrilla, tempmail, etc.) from the Fraud Gate and from the `fraud-detection-agent.md` agent
+- Removed: secrets `WEBHOOK_SECRET_HOTMART` and `WEBHOOK_SECRET_KIWIFY` (wrangler + wrangler.toml)
+
+### 🔧 Observability
+
+- Added an `[observability]` block to `wrangler.toml` (`logs.enabled = true`, `traces.enabled = false`)
+
+---
+
28
47
|
## 📋 CHANGELOG v2.0.7 (10 de Abril de 2026)
|
|
29
48
|
|
|
30
49
|
### 🔧 Audit Completo — 45 Agentes
|
|
@@ -78,7 +97,7 @@ Meu ecossistema opera como um Cérebro de Conversão Privado na borda. Quando um
|
|
|
78
97
|
- **`GET /api/fraud/blocklist`** — IPs/fingerprints atualmente bloqueados
|
|
79
98
|
- **`POST /api/fraud/blocklist/add`** — Bloquear IP ou fingerprint (via KV, efeito imediato)
|
|
80
99
|
- **`DELETE /api/fraud/blocklist/remove`** — Remover do blocklist
|
|
81
|
-
- Sinais detectados: bot_score, datacenter IP, velocity attack,
|
|
100
|
+
- Sinais detectados: bot_score, datacenter IP, velocity attack, headless UA, sem Accept-Language
|
|
82
101
|
- Schema D1: `fraud_signals`, `fraud_alerts` + VIEW `v_fraud_dashboard`
|
|
83
102
|
- Agente: `fraud-detection-agent.md`
|
|
84
103
|
|
|
@@ -98,7 +117,7 @@ graph TD
|
|
|
98
117
|
FraudGate -->|score ≥ 80: Silent Drop 200| Void[/dev/null]
|
|
99
118
|
FraudGate -->|score < 80: Permitido| Worker[Cloudflare Worker Agent]
|
|
100
119
|
Worker -->|Identity Graph + _cdp_uid| D1[(D1 SQL — 21 tabelas)]
|
|
101
|
-
Worker -->|LTV + A/B Prompt| AI[Workers AI
|
|
120
|
+
Worker -->|LTV + A/B Prompt| AI[Workers AI Granite 4.0 Micro]
|
|
102
121
|
Worker -->|Segmento ML| Cluster[ML Clustering Engine]
|
|
103
122
|
Cluster -->|Bid otimizado| Bidding[Bidding Recommendations]
|
|
104
123
|
Worker -->|Background| Queue[Cloudflare Queues]
|
|
@@ -138,7 +157,7 @@ O sistema é composto por **43+ agentes** coordenados pelo **Master Orchestrator
|
|
|
138
157
|
### 🤖 Enterprise Intelligence (Fase 1–4)
|
|
139
158
|
| Agente | Endpoint Principal | Impacto |
|
|
140
159
|
|---|---|---|
|
|
141
|
-
| **ML Clustering Agent** | `POST /api/segmentation/cluster` |
|
|
160
|
+
| **ML Clustering Agent** | `POST /api/segmentation/cluster` | K-means vetorial real (bge-m3 embeddings + Granite naming) |
|
|
142
161
|
| **Bidding Agent** | `POST /api/bidding/recommend` | -20% CPA via bid por segmento de LTV |
|
|
143
162
|
| **A/B LTV Agent** | `POST /api/ltv/ab-test/create` | +25% precisão LTV via test de prompts |
|
|
144
163
|
| **Fraud Detection Agent** | Auto em `/track` | Bloqueia click fraud, bots, velocity attacks |
|
|
@@ -187,7 +206,7 @@ POST /track (evento Lead)
|
|
|
187
206
|
├─ [2] 🔮 A/B LTV Testing — sorteia variação ativa (KV cache ~0ms)
|
|
188
207
|
│ └─ passa customSystemPrompt para predictLtv()
|
|
189
208
|
│
|
|
190
|
-
├─ [3] 🧮 LTV Prediction — Workers AI
|
|
209
|
+
├─ [3] 🧮 LTV Prediction — Workers AI Granite 4.0 Micro
|
|
191
210
|
│ └─ Score 0-100 → class High/Medium/Low → valor em BRL
|
|
192
211
|
│
|
|
193
212
|
├─ [4] 💾 D1 Writes (background via ctx.waitUntil)
|
|
@@ -286,8 +305,6 @@ wrangler deploy
|
|
|
286
305
|
|---|---|---|
|
|
287
306
|
| `/track` | POST | Evento principal (browser → CAPI) |
|
|
288
307
|
| `/health` | GET | Smoke test completo |
|
|
289
|
-
| `/webhook/hotmart` | POST | Webhook Hotmart Purchase |
|
|
290
|
-
| `/webhook/kiwify` | POST | Webhook Kiwify Purchase |
|
|
291
308
|
| `/webhook/ticto` | POST | Webhook Ticto Purchase |
|
|
292
309
|
|
|
293
310
|
### Intelligence ML
|
package/extracted-skill/tracking-events-generator/agents/database-agent.md
CHANGED
|
@@ -467,9 +467,10 @@ await env.AUDIT_LOGS.put(logKey, JSON.stringify({
|
|
|
467
467
|
|
|
468
468
|
### Modelo em Uso
|
|
469
469
|
```
|
|
470
|
-
@cf/
|
|
471
|
-
|
|
472
|
-
|
|
470
|
+
@cf/ibm-granite/granite-4.0-h-micro ← LTV Prediction + Naming de Clusters
|
|
471
|
+
@cf/baai/bge-m3 ← Embeddings para K-means vetorial (ML Clustering)
|
|
472
|
+
Custo Granite: ~20-35 neurônios/requisição (3x mais eficiente que Llama 3.1 8B)
|
|
473
|
+
Limite Free: 10.000 neurônios/dia (~350 predições/dia com Granite)
|
|
473
474
|
```
|
|
474
475
|
|
|
475
476
|
### Uso no Worker (LTV Prediction)
|
|
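The model notes above quote roughly 20 to 35 neurons per request and a free allowance of 10,000 neurons per day. The sketch below only works through that arithmetic; `predictionsPerDay` is a hypothetical helper, and real consumption depends on prompt and output length.

```javascript
// Figures taken from the notes above; treat them as estimates, not guarantees.
const FREE_NEURONS_PER_DAY = 10000;

// How many predictions fit in a daily neuron budget at a given cost per request.
function predictionsPerDay(neuronsPerRequest, budget = FREE_NEURONS_PER_DAY) {
  return Math.floor(budget / neuronsPerRequest);
}

const bestCase = predictionsPerDay(20);  // cheapest quoted request cost
const worstCase = predictionsPerDay(35); // most expensive quoted request cost
console.log({ bestCase, worstCase });
```

The quoted figure of about 350 predictions per day sits between the two bounds, which corresponds to an average cost near 28 or 29 neurons per request.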
@@ -485,7 +486,7 @@ async function predictLtv(leadData, env) {
|
|
|
485
486
|
Responda apenas: {"class": "high|medium|low", "value": 0-1000}
|
|
486
487
|
`;
|
|
487
488
|
|
|
488
|
-
const response = await env.AI.run('@cf/
|
|
489
|
+
const response = await env.AI.run('@cf/ibm-granite/granite-4.0-h-micro', {
|
|
489
490
|
messages: [{ role: 'user', content: prompt }],
|
|
490
491
|
max_tokens: 50
|
|
491
492
|
});
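The prompt above asks the model to answer only with `{"class": "high|medium|low", "value": 0-1000}`, but a language model may still wrap the JSON in extra text or return an out-of-range value. The following is a minimal sketch of defensive parsing, assuming the generated text arrives as a string (the `ltv.js` change in this diff reads it from `aiRes.response`). `parseLtvResponse` and its fallback value are hypothetical and not part of the package.

```javascript
// Hypothetical helper: extracts and validates the JSON answer requested by the prompt.
function parseLtvResponse(raw, fallback = { class: 'low', value: 0 }) {
  if (typeof raw !== 'string') return fallback;
  // Take everything from the first '{' to the last '}' in case the model adds prose.
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return fallback;
  try {
    const parsed = JSON.parse(match[0]);
    const cls = String(parsed.class || '').toLowerCase();
    const value = Number(parsed.value);
    if (!['high', 'medium', 'low'].includes(cls) || !Number.isFinite(value)) return fallback;
    // Clamp to the 0-1000 range the prompt asks for.
    return { class: cls, value: Math.min(1000, Math.max(0, value)) };
  } catch {
    return fallback; // invalid JSON
  }
}
```

Returning a fixed fallback instead of throwing keeps the tracking request alive when the model misbehaves; whether a silent fallback is the right policy is a design choice for the caller.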
|
package/extracted-skill/tracking-events-generator/agents/fraud-detection-agent.md
CHANGED
|
@@ -71,7 +71,6 @@ checkFraudGate(env, request, payload)
|
|
|
71
71
|
| IP de datacenter | ASN = AWS, GCP, Azure, DigitalOcean, Linode | +35 pts |
|
|
72
72
|
| Sem headers de browser | Accept-Language ausente | +20 pts |
|
|
73
73
|
| Geo impossível | IP country ≠ país esperado (BR fora da LATAM) | +10 pts |
|
|
74
|
-
| Email temporário | @mailinator, @guerrilla, @tempmail, etc. | +25 pts |
|
|
75
74
|
|
|
76
75
|
### 4. Threshold de Ação
|
|
77
76
|
```
|
package/extracted-skill/tracking-events-generator/agents/linkedin-agent.md
CHANGED
|
@@ -43,7 +43,7 @@ import { predictLtv } from './ltv-predictor.js';
|
|
|
43
43
|
* @param {Request} request - request original
|
|
44
44
|
*/
|
|
45
45
|
async function dispatchLinkedIn(env, leadData, request) {
|
|
46
|
-
// 1. Obter LTV predito pelo ML (Workers AI —
|
|
46
|
+
// 1. Obter LTV predito pelo ML (Workers AI — Granite 4.0 Micro)
|
|
47
47
|
let conversionValue = 0;
|
|
48
48
|
try {
|
|
49
49
|
const ltvResult = await predictLtv(env, leadData, request);
|
package/extracted-skill/tracking-events-generator/agents/ltv-predictor-agent.md
CHANGED
|
@@ -14,7 +14,7 @@ Sua única responsabilidade é instruir o Cloudflare Architect a imbuir modelos
|
|
|
14
14
|
|
|
15
15
|
## 📦 O PACOTE DE ENTREGA OBRIGATÓRIO
|
|
16
16
|
Sempre que o Orquestrador invocar a Otimização de Baleias (LTV Prediction):
|
|
17
|
-
1. **Snippet de Injeção de ML**: Entregue ao Server Architect o bloco `await env.AI.run('@cf/
|
|
17
|
+
1. **Snippet de Injeção de ML**: Entregue ao Server Architect o bloco `await env.AI.run('@cf/ibm-granite/granite-4.0-h-micro', ...)` ajustado para predição puramente matemática.
|
|
18
18
|
2. **Override de Event Valuation**: Modifique como o evento `Lead` ou `Purchase` é envernizado com lucro preditivo antes do dispatch da CAPI.
|
|
19
19
|
|
|
20
20
|
> 👁️ "Não pague por cliques hoje. Compre os clientes de amanhã. Faça o algoritmo apostar sempre nas suas fichas vencedoras."
|
|
@@ -27,7 +27,7 @@ Sempre que o Orquestrador invocar a Otimização de Baleias (LTV Prediction):
|
|
|
27
27
|
- Dados de UTM: `utm_source`, `utm_medium`, `utm_campaign`
|
|
28
28
|
- `request.cf.asOrganization` e `request.cf.country` (sinais de qualidade do tráfego)
|
|
29
29
|
- Histórico D1 do `_cdp_uid`: páginas visitadas, tempo na página, eventos anteriores
|
|
30
|
-
- Binding `env.AI` (Cloudflare Workers AI — `@cf/
|
|
30
|
+
- Binding `env.AI` (Cloudflare Workers AI — `@cf/ibm-granite/granite-4.0-h-micro`)
|
|
31
31
|
|
|
32
32
|
## RESPONSABILIDADE
|
|
33
33
|
|
|
@@ -35,7 +35,7 @@ Sempre que o Orquestrador invocar a Otimização de Baleias (LTV Prediction):
|
|
|
35
35
|
- Classificar o lead em `predicted_ltv_class: 'High' | 'Medium' | 'Low'`
|
|
36
36
|
- Substituir `value: 0` do evento `Lead` pelo valor preditivo antes do dispatch CAPI/GA4/TikTok
|
|
37
37
|
- Registrar no D1 `identity_graph`: `predicted_ltv`, `predicted_ltv_class`
|
|
38
|
-
- Consumo
|
|
38
|
+
- Consumo: ~20–35 neurônios/request com Granite 4.0 Micro (~350 predições/dia no free tier, ilimitado no paid)
|
|
39
39
|
|
|
40
40
|
## SAÍDA
|
|
41
41
|
|
|
@@ -44,7 +44,7 @@ Sempre que o Orquestrador invocar a Otimização de Baleias (LTV Prediction):
|
|
|
44
44
|
"arquivos_criados": [
|
|
45
45
|
"cloudflare/ltv-predictor.js"
|
|
46
46
|
],
|
|
47
|
-
"modelo_ai": "@cf/
|
|
47
|
+
"modelo_ai": "@cf/ibm-granite/granite-4.0-h-micro",
|
|
48
48
|
"campo_substituido": "value",
|
|
49
49
|
"exemplo": {
|
|
50
50
|
"evento": "Lead",
|
package/extracted-skill/tracking-events-generator/agents/ml-clustering-agent.md
CHANGED
|
@@ -128,74 +128,74 @@ is_business_hours = 1 if 9 <= hour_of_day <= 18 else 0
|
|
|
128
128
|
|
|
129
129
|
---
|
|
130
130
|
|
|
131
|
-
## Fase 2 — K-Means
|
|
131
|
+
## Fase 2 — K-Means Vetorial Real (embeddinggemma-300m + K-means em JS)
|
|
132
132
|
|
|
133
|
-
|
|
133
|
+
> **Arquitetura atual:** O clustering não usa LLM para fazer os cálculos matemáticos.
|
|
134
|
+
> Em vez disso, usa **embeddings semânticos reais** + **K-means implementado em JavaScript**,
|
|
135
|
+
> com o Granite usado **apenas para nomear** os clusters resultantes.
|
|
134
136
|
|
|
135
|
-
|
|
136
|
-
# Enviar para: env.AI.run('@cf/meta/llama-3.1-8b-instruct', ...)
|
|
137
|
+
### 2.1 Pipeline de Clustering
|
|
137
138
|
|
|
138
|
-
|
|
139
|
-
|
|
139
|
+
```
|
|
140
|
+
100 leads (sample) → perfil textual → embeddinggemma-300m → vetores 768d
|
|
141
|
+
↓
|
|
142
|
+
K-means++ (cosine distance, JS puro)
|
|
143
|
+
↓
|
|
144
|
+
silhouette score real calculado em JS
|
|
145
|
+
↓
|
|
146
|
+
Granite 4.0 Micro nomeia cada cluster (1 call de LLM)
|
|
147
|
+
```
|
|
140
148
|
|
|
141
|
-
|
|
142
|
-
Your task: Perform K-means clustering to group customers into {n_clusters} segments.
|
|
149
|
+
### 2.2 Modelos Workers AI utilizados
|
|
143
150
|
|
|
144
|
-
|
|
145
|
-
|
|
146
|
-
-
|
|
147
|
-
|
|
151
|
+
| Modelo | ID | Uso |
|
|
152
|
+
|---|---|---|
|
|
153
|
+
| **Granite 4.0 Micro** | `@cf/ibm-granite/granite-4.0-h-micro` | LTV Prediction + Naming de clusters |
|
|
154
|
+
| **EmbeddingGemma 300M** | `@cf/baai/bge-m3` | Embeddings semânticos para K-means |
|
|
148
155
|
|
|
149
|
-
|
|
150
|
-
1. Normalize all features to 0-1 range (min-max normalization)
|
|
151
|
-
2. Initialize K-means centroids randomly
|
|
152
|
-
3. Assign each lead to nearest centroid (Euclidean distance)
|
|
153
|
-
4. Recalculate centroids as mean of assigned points
|
|
154
|
-
5. Iterate until convergence (max 100 iterations)
|
|
155
|
-
6. Calculate Silhouette Score for each cluster (cohesion vs separation)
|
|
156
|
+
### 2.3 Perfil textual por lead (input para embedding)
|
|
156
157
|
|
|
157
|
-
|
|
158
|
-
{
|
|
159
|
-
|
|
160
|
-
{
|
|
161
|
-
|
|
162
|
-
|
|
163
|
-
|
|
164
|
-
|
|
165
|
-
|
|
166
|
-
|
|
167
|
-
|
|
168
|
-
|
|
169
|
-
|
|
170
|
-
|
|
171
|
-
|
|
172
|
-
"top_features": ["ltv", "behavior_score", "engagement_score"]
|
|
173
|
-
}},
|
|
174
|
-
"centroid": {{
|
|
175
|
-
"ltv": 0.75,
|
|
176
|
-
"behavior_score": 0.80,
|
|
177
|
-
"engagement_score": 0.85
|
|
178
|
-
}},
|
|
179
|
-
"sample_leads": [lead_id_1, lead_id_2, lead_id_3]
|
|
180
|
-
}},
|
|
181
|
-
...
|
|
182
|
-
],
|
|
183
|
-
"silhouette_scores": {{
|
|
184
|
-
"overall": 0.62,
|
|
185
|
-
"by_cluster": [0.71, 0.58, 0.65, ...]
|
|
186
|
-
}},
|
|
187
|
-
"convergence": {{
|
|
188
|
-
"iterations": 47,
|
|
189
|
-
"final_inertia": 1523.45
|
|
190
|
-
}}
|
|
191
|
-
}}
|
|
158
|
+
```javascript
|
|
159
|
+
function _buildLeadProfile(l) {
|
|
160
|
+
return [
|
|
161
|
+
`LTV: ${l.predicted_ltv_class || 'desconhecido'}`,
|
|
162
|
+
`engajamento: ${Math.round(l.engagement_score || 0)}`,
|
|
163
|
+
`intenção: ${l.intention_level || 'desconhecida'}`,
|
|
164
|
+
`origem: ${l.utm_source || 'direto'}`,
|
|
165
|
+
`canal: ${l.utm_medium || 'desconhecido'}`,
|
|
166
|
+
`país: ${l.country || 'BR'}`,
|
|
167
|
+
`hora: ${l.hour_of_day || 12}h`,
|
|
168
|
+
(l.is_weekend ? 'fim-de-semana' : 'dia-útil'),
|
|
169
|
+
`recência: ${l.days_since_lead || 0} dias`,
|
|
170
|
+
].filter(Boolean).join(', ');
|
|
171
|
+
}
|
|
172
|
+
```
|
|
192
173
|
|
|
193
|
-
|
|
194
|
-
|
|
195
|
-
|
|
196
|
-
|
|
197
|
-
|
|
198
|
-
|
|
174
|
+
### 2.4 Chamada de embeddings em batch
|
|
175
|
+
|
|
176
|
+
```javascript
|
|
177
|
+
// Embeds até 100 perfis em uma única chamada
|
|
178
|
+
const embRes = await env.AI.run('@cf/baai/bge-m3', { text: profiles });
|
|
179
|
+
const vectors = embRes.data; // float32[][] — shape [N, 768]
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
### 2.5 K-means vetorial (cosine distance)
|
|
183
|
+
|
|
184
|
+
```javascript
|
|
185
|
+
// Inicialização K-means++ → iterações até convergência → assignments finais
|
|
186
|
+
const { assignments } = _kmeansRun(vectors, nClusters); // implementado em worker.js
|
|
187
|
+
const silhouetteScore = _silhouette(vectors, assignments, nClusters); // score real
|
|
188
|
+
```
|
|
189
|
+
|
|
190
|
+
### 2.6 Naming dos clusters via Granite (único uso de LLM)
|
|
191
|
+
|
|
192
|
+
```javascript
|
|
193
|
+
// Granite recebe apenas as estatísticas agregadas por cluster
|
|
194
|
+
// Retorna nome descritivo + recomendação de campanha em português
|
|
195
|
+
const nameRes = await env.AI.run('@cf/ibm-granite/granite-4.0-h-micro', {
|
|
196
|
+
messages: [{ role: 'user', content: namingPrompt }],
|
|
197
|
+
max_tokens: 800
|
|
198
|
+
});
|
|
199
199
|
```
|
|
200
200
|
|
|
201
201
|
### 2.2 Features para K-Means
|
|
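The batch call in section 2.4 reads `embRes.data` and assumes one fixed-size vector per profile. This diff names the embedding model in two ways (embeddinggemma-300m in the prose, `@cf/baai/bge-m3` in the calls), so it may be safer to check the shape of the response at runtime than to rely on a hard-coded dimension. The sketch below assumes only that the response carries a `data` array of numeric arrays, as the snippets above do; `validateEmbeddings` is a hypothetical helper.

```javascript
// Hypothetical helper: validates an embedding response before clustering.
function validateEmbeddings(embRes, expectedCount, minVectors) {
  const vectors = embRes && embRes.data;
  if (!Array.isArray(vectors) || vectors.length === 0) {
    throw new Error('embedding response has no data array');
  }
  if (vectors.length !== expectedCount) {
    throw new Error(`expected ${expectedCount} vectors, got ${vectors.length}`);
  }
  if (vectors.length < minVectors) {
    throw new Error(`need at least ${minVectors} vectors, got ${vectors.length}`);
  }
  // Derive the dimension from the data instead of assuming it.
  const dim = vectors[0].length;
  if (!vectors.every(v => Array.isArray(v) && v.length === dim)) {
    throw new Error('vectors have inconsistent dimensions');
  }
  return { vectors, dim };
}
```

A caller would pass `profiles.length` as `expectedCount` and `nClusters` as `minVectors`, then hand `vectors` to the K-means step.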
@@ -484,20 +484,30 @@ export async function onRequestGet(context: EventContext<Env>) {
|
|
|
484
484
|
// Feature Engineering
|
|
485
485
|
const features = extractFeatures(leads);
|
|
486
486
|
|
|
487
|
-
//
|
|
488
|
-
const
|
|
489
|
-
|
|
490
|
-
|
|
487
|
+
// 1. Embeddings reais via embeddinggemma-300m
|
|
488
|
+
const profiles = sample.map(_buildLeadProfile);
|
|
489
|
+
const embRes = await context.env.AI.run('@cf/baai/bge-m3', { text: profiles });
|
|
490
|
+
const vectors = embRes.data; // vetores 768d
|
|
491
|
+
|
|
492
|
+
// 2. K-means vetorial real (JS puro, cosine distance)
|
|
493
|
+
const { assignments } = _kmeansRun(vectors, nClusters);
|
|
494
|
+
const silhouetteScore = _silhouette(vectors, assignments, nClusters);
|
|
495
|
+
|
|
496
|
+
// 3. Granite apenas para nomear clusters
|
|
497
|
+
const nameRes = await context.env.AI.run('@cf/ibm-granite/granite-4.0-h-micro',
|
|
498
|
+
{ messages: [{ role: 'user', content: getNamingPrompt(clusterStats) }], max_tokens: 800 }
|
|
491
499
|
);
|
|
492
|
-
|
|
493
|
-
// Persistir no D1
|
|
500
|
+
|
|
501
|
+
// 4. Persistir no D1
|
|
494
502
|
await saveClusters(context.env.DB, clusters, algorithm);
|
|
495
|
-
|
|
503
|
+
|
|
496
504
|
return Response.json({
|
|
497
505
|
success: true,
|
|
498
506
|
algorithm,
|
|
507
|
+
engine: 'embeddinggemma-300m + kmeans vetorial',
|
|
499
508
|
n_clusters: nClusters,
|
|
500
|
-
|
|
509
|
+
silhouette_score: silhouetteScore,
|
|
510
|
+
clusters,
|
|
501
511
|
generated_at: new Date().toISOString()
|
|
502
512
|
});
|
|
503
513
|
}
|
|
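The `_kmeansRun` helper called above is implemented in `segmentation.js` later in this diff. As written there, it seeds the centroid list with references to the input vectors and then overwrites centroid components in place, which also changes those input vectors; its seeding loop also appends a random vector after each weighted pick while fewer than `k` centroids exist. The sketch below shows one way to seed centroids with copies and a single pick per round. It is an illustration under stated assumptions, not the package's code: `seedCentroids` and the injectable `rand` are hypothetical, and the pick is weighted by cosine distance as in the package (textbook K-means++ weights by squared distance).

```javascript
// Cosine distance, same definition as _cosDist in this diff.
function cosDist(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-10);
}

// Hypothetical seeding routine: one distance-weighted pick per round, stored as a copy.
function seedCentroids(vectors, k, rand = Math.random) {
  const n = vectors.length;
  const centroids = [vectors[Math.floor(rand() * n)].slice()];
  while (centroids.length < k) {
    // Distance from each point to its nearest existing centroid.
    const dists = vectors.map(v => Math.min(...centroids.map(c => cosDist(v, c))));
    const sum = dists.reduce((a, b) => a + b, 0);
    const r = rand() * sum;
    let cumul = 0, picked = -1;
    for (let i = 0; i < n; i++) {
      cumul += dists[i];
      if (cumul >= r) { picked = i; break; }
    }
    // Fallback only when rounding prevented a pick.
    if (picked === -1) picked = Math.floor(rand() * n);
    centroids.push(vectors[picked].slice()); // copy, so updates never touch the input
  }
  return centroids;
}
```

Because each centroid is a copy, a later update such as `centroids[c][d] = mean` leaves the embeddings untouched, so the silhouette score is computed on the original vectors.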
@@ -658,7 +668,7 @@ interface SegmentationAPI {
|
|
|
658
668
|
|
|
659
669
|
```
|
|
660
670
|
[ ] Feature Engineering Pipeline implementada
|
|
661
|
-
[ ] K-means Clustering
|
|
671
|
+
[ ] K-means Clustering vetorial (embeddinggemma-300m + JS)
|
|
662
672
|
[ ] DBSCAN Clustering para anomalias
|
|
663
673
|
[ ] Hierarchical Clustering (drill-down)
|
|
664
674
|
[ ] Auto-Interpretação de segmentos
|
|
@@ -679,7 +689,8 @@ interface SegmentationAPI {
|
|
|
679
689
|
| `Clusters vazios` | Menos de `min_data_points` no D1 | Aumentar `max_data_age_months` ou aguardar mais dados |
|
|
680
690
|
| `Silhouette Score < 0.3` | Clusters não são separáveis | Aumentar `n_clusters` ou usar features melhores |
|
|
681
691
|
| `Outliers excessivos` | Epsilon/MinPts muito agressivos no DBSCAN | Ajustar parâmetros de detecção de anomalias |
|
|
682
|
-
| `
|
|
692
|
+
| `embeddinggemma timeout` | Batch maior que 100 perfis | Limitar sample a 100 leads (padrão atual) |
|
|
693
|
+
| `vectors insuficientes` | embeddinggemma retornou menos vetores que nClusters | Reduzir nClusters ou verificar resposta da API |
|
|
683
694
|
|
|
684
695
|
---
|
|
685
696
|
|
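The troubleshooting table above treats `Silhouette Score < 0.3` as a sign that clusters are not separable. For each point, the silhouette value is s = (b - a) / max(a, b), where a is the mean distance to the other points of its own cluster and b is the lowest mean distance to any other cluster. The sketch below is a self-contained illustration with cosine distance on four toy vectors; `silhouette` here is written for this example and is not the package's `_silhouette`.

```javascript
function cosDist(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-10);
}

// Mean silhouette over all points, using cosine distance.
function silhouette(vectors, assignments, k) {
  let total = 0;
  for (let i = 0; i < vectors.length; i++) {
    const sums = new Array(k).fill(0), counts = new Array(k).fill(0);
    for (let j = 0; j < vectors.length; j++) {
      if (j === i) continue;
      sums[assignments[j]] += cosDist(vectors[i], vectors[j]);
      counts[assignments[j]]++;
    }
    const own = assignments[i];
    const a = counts[own] ? sums[own] / counts[own] : 0;
    let b = Infinity;
    for (let c = 0; c < k; c++) {
      if (c !== own && counts[c]) b = Math.min(b, sums[c] / counts[c]);
    }
    // No other cluster, or all distances zero: contribute 0 instead of NaN.
    if (b !== Infinity && Math.max(a, b) > 0) total += (b - a) / Math.max(a, b);
  }
  return total / vectors.length;
}

const toyVectors = [[1, 0], [1, 0.1], [0, 1], [0.1, 1]];
const wellSeparated = silhouette(toyVectors, [0, 0, 1, 1], 2); // neighbours grouped together
const mixedUp = silhouette(toyVectors, [0, 1, 0, 1], 2);       // neighbours split apart
console.log({ wellSeparated, mixedUp });
```

Grouping the near-parallel vectors together gives a score close to 1, while splitting them across clusters gives a negative score, which is the situation the table's threshold is meant to flag.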
package/package.json
CHANGED
package/server-edge-tracker/modules/ml/fraud.js
CHANGED
|
@@ -6,13 +6,6 @@
|
|
|
6
6
|
import { sha256, tryParseJson } from '../utils.js';
|
|
7
7
|
|
|
8
8
|
// ── Listas de detecção ────────────────────────────────────────────────────────
|
|
9
|
-
export const DISPOSABLE_EMAIL_DOMAINS = new Set([
|
|
10
|
-
'mailinator.com','guerrillamail.com','tempmail.com','throwaway.email',
|
|
11
|
-
'yopmail.com','sharklasers.com','guerrillamailblock.com','spam4.me',
|
|
12
|
-
'10minutemail.com','trashmail.com','maildrop.cc','fakeinbox.com',
|
|
13
|
-
'dispostable.com','getairmail.com','mailnull.com',
|
|
14
|
-
]);
|
|
15
|
-
|
|
16
9
|
export const DATACENTER_PATTERNS = /amazon|google|microsoft|digitalocean|linode|ovh|vultr|hetzner|contabo|cloudflare|packet|rackspace|leaseweb/i;
|
|
17
10
|
|
|
18
11
|
// ── checkFraudGate — roda ANTES de qualquer processamento de evento ────────────
|
|
@@ -64,15 +57,7 @@ export async function checkFraudGate(env, request, payload) {
|
|
|
64
57
|
result.score += 20; result.reasons.push('no_accept_language');
|
|
65
58
|
}
|
|
66
59
|
|
|
67
|
-
// 6.
|
|
68
|
-
if (email) {
|
|
69
|
-
const domain = email.split('@')[1]?.toLowerCase();
|
|
70
|
-
if (domain && DISPOSABLE_EMAIL_DOMAINS.has(domain)) {
|
|
71
|
-
result.score += 25; result.reasons.push('disposable_email');
|
|
72
|
-
}
|
|
73
|
-
}
|
|
74
|
-
|
|
75
|
-
// 7. Velocity check via KV
|
|
60
|
+
// 6. Velocity check via KV
|
|
76
61
|
if (env.GEO_CACHE && ip) {
|
|
77
62
|
const velKey1h = `fraud_velocity:${ip}:h`;
|
|
78
63
|
const velStr = await env.GEO_CACHE.get(velKey1h);
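The Fraud Gate adds points per detected signal, and the README flowchart in this diff silently drops events with a score of 80 or more. The sketch below illustrates that additive scheme using weights that appear in this diff (+35 datacenter IP, +20 missing Accept-Language, +10 geo mismatch; the +25 disposable-email signal is the one removed in this release). `scoreSignals`, the boolean input shape, and the reason names other than `no_accept_language` are hypothetical, and bot score and velocity checks are left out.

```javascript
// Weights as listed in the agent table in this diff; names are illustrative.
const WEIGHTS = { datacenter_ip: 35, no_accept_language: 20, geo_mismatch: 10 };
const BLOCK_THRESHOLD = 80; // "score >= 80: Silent Drop" in the README flowchart

// Hypothetical scorer: sums the weights of the signals that are present.
function scoreSignals(signals, baseScore = 0) {
  const result = { score: baseScore, reasons: [] };
  for (const [reason, points] of Object.entries(WEIGHTS)) {
    if (signals[reason]) {
      result.score += points;
      result.reasons.push(reason);
    }
  }
  result.blocked = result.score >= BLOCK_THRESHOLD;
  return result;
}
```

With only these three signals the maximum is 65 points, so in this simplified model an event is blocked only when another source, passed here as `baseScore`, contributes the rest.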
|
package/server-edge-tracker/modules/ml/ltv.js
CHANGED
|
@@ -161,7 +161,7 @@ export async function predictLtv(env, payload, request, customSystemPrompt = nul
|
|
|
161
161
|
{ role: 'system', content: systemContent },
|
|
162
162
|
{ role: 'user', content: JSON.stringify(userContext) },
|
|
163
163
|
];
|
|
164
|
-
const aiRes = await env.AI.run('@cf/
|
|
164
|
+
const aiRes = await env.AI.run('@cf/ibm-granite/granite-4.0-h-micro', { messages: prompt, max_tokens: 32 });
|
|
165
165
|
const parsed = JSON.parse(aiRes.response.trim());
|
|
166
166
|
if (typeof parsed.adjustment === 'number') {
|
|
167
167
|
aiAdjustment = Math.max(-10, Math.min(10, parsed.adjustment));
|
package/server-edge-tracker/modules/ml/segmentation.js
CHANGED
|
@@ -5,14 +5,84 @@
|
|
|
5
5
|
|
|
6
6
|
import { tryParseJson } from '../utils.js';
|
|
7
7
|
|
|
+// ── Helpers K-means vetorial ──────────────────────────────────────────────────
+
+function _cosDist(a, b) {
+  let dot = 0, na = 0, nb = 0;
+  for (let i = 0; i < a.length; i++) { dot += a[i]*b[i]; na += a[i]*a[i]; nb += b[i]*b[i]; }
+  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-10);
+}
+
+function _kmeansRun(vectors, k, maxIter = 25) {
+  const n = vectors.length, dim = vectors[0].length;
+  const centroids = [vectors[Math.floor(Math.random() * n)]];
+  while (centroids.length < k) {
+    const dists = vectors.map(v => Math.min(...centroids.map(c => _cosDist(v, c))));
+    const sum = dists.reduce((a, b) => a + b, 0);
+    let r = Math.random() * sum, cumul = 0;
+    for (let i = 0; i < n; i++) { cumul += dists[i]; if (cumul >= r) { centroids.push(vectors[i]); break; } }
+    if (centroids.length < k) centroids.push(vectors[Math.floor(Math.random() * n)]);
+  }
+  let assignments = new Array(n).fill(0);
+  for (let iter = 0; iter < maxIter; iter++) {
+    let changed = false;
+    for (let i = 0; i < n; i++) {
+      let best = 0, bestD = Infinity;
+      for (let c = 0; c < k; c++) { const d = _cosDist(vectors[i], centroids[c]); if (d < bestD) { bestD = d; best = c; } }
+      if (assignments[i] !== best) { assignments[i] = best; changed = true; }
+    }
+    if (!changed) break;
+    for (let c = 0; c < k; c++) {
+      const members = vectors.filter((_, i) => assignments[i] === c);
+      if (!members.length) continue;
+      for (let d = 0; d < dim; d++) centroids[c][d] = members.reduce((s, v) => s + v[d], 0) / members.length;
+    }
+  }
+  return { assignments, centroids };
+}
+
+function _silhouette(vectors, assignments, k) {
+  const n = vectors.length;
+  let total = 0;
+  for (let i = 0; i < n; i++) {
+    const ci = assignments[i];
+    const same = vectors.filter((_, j) => j !== i && assignments[j] === ci);
+    const a = same.length ? same.reduce((s, v) => s + _cosDist(vectors[i], v), 0) / same.length : 0;
+    let b = Infinity;
+    for (let c = 0; c < k; c++) {
+      if (c === ci) continue;
+      const other = vectors.filter((_, j) => assignments[j] === c);
+      if (other.length) b = Math.min(b, other.reduce((s, v) => s + _cosDist(vectors[i], v), 0) / other.length);
+    }
+    total += b === Infinity ? 0 : (b - a) / Math.max(a, b);
+  }
+  return Math.round((total / n) * 1000) / 1000;
+}
+
+function _buildLeadProfile(l) {
+  return [
+    `LTV: ${l.predicted_ltv_class || 'desconhecido'}`,
+    `engajamento: ${Math.round(l.engagement_score || 0)}`,
+    `intenção: ${l.intention_level || 'desconhecida'}`,
+    `origem: ${l.utm_source || 'direto'}`,
+    `canal: ${l.utm_medium || 'desconhecido'}`,
+    `país: ${l.country || 'BR'}`,
+    `estado: ${l.state || ''}`,
+    `hora: ${l.hour_of_day || 12}h`,
+    (l.is_weekend ? 'fim-de-semana' : 'dia-útil'),
+    `recência: ${l.days_since_lead || 0} dias`,
+  ].filter(Boolean).join(', ');
+}
+
8
77
|
// ── POST /api/segmentation/cluster ────────────────────────────────────────────
|
|
78
|
+
// Clustering real: embeddinggemma-300m → K-means vetorial → Granite para nomear
|
|
9
79
|
export async function handleSegmentationCluster(env, request, headers) {
|
|
10
80
|
if (!env.DB) return new Response(JSON.stringify({ error: 'DB não configurado' }), { status: 503, headers });
|
|
11
|
-
if (!env.AI) return new Response(JSON.stringify({ error: 'Workers AI não configurado
|
|
81
|
+
if (!env.AI) return new Response(JSON.stringify({ error: 'Workers AI não configurado' }), { status: 503, headers });
|
|
12
82
|
|
|
13
83
|
const url = new URL(request.url);
|
|
14
84
|
const algorithm = url.searchParams.get('algorithm') || 'kmeans';
|
|
15
|
-
const nClusters = Math.min(10, Math.max(
|
|
85
|
+
const nClusters = Math.min(10, Math.max(2, parseInt(url.searchParams.get('n_clusters') || '5')));
|
|
16
86
|
const clientVertical = url.searchParams.get('vertical') || 'general';
|
|
17
87
|
const forceRecluster = url.searchParams.get('force') === 'true';
|
|
18
88
|
|
|
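The `nClusters` clamp above relies on `parseInt`, which returns `NaN` for a non-numeric query value such as `?n_clusters=abc`; `Math.max(2, NaN)` and `Math.min(10, NaN)` are both `NaN`, so the clamp lets `NaN` through. The sketch below shows one way to fall back to the default instead. `clampClusters` and its option names are hypothetical; the bounds 2 and 10 and the default 5 come from the line above.

```javascript
// Hypothetical helper: parses a query value and clamps it, falling back on NaN.
function clampClusters(rawValue, { min = 2, max = 10, fallback = 5 } = {}) {
  // URLSearchParams.get returns null when the parameter is absent.
  const parsed = Number.parseInt(rawValue ?? '', 10);
  const n = Number.isNaN(parsed) ? fallback : parsed;
  return Math.min(max, Math.max(min, n));
}
```

It could be called as `clampClusters(url.searchParams.get('n_clusters'))`, which keeps the existing behaviour for valid numbers and for a missing parameter.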
@@ -21,16 +91,14 @@ export async function handleSegmentationCluster(env, request, headers) {
|
|
|
21
91
|
}
|
|
22
92
|
|
|
23
93
|
try {
|
|
24
|
-
// 1. Cluster recente? Evitar re-clustering desnecessário (< 7 dias)
|
|
25
94
|
if (!forceRecluster) {
|
|
26
95
|
const existing = await env.DB.prepare(`
|
|
27
96
|
SELECT id, created_at, cluster_name FROM ml_segments
|
|
28
97
|
WHERE clustering_algorithm = ? AND is_active = 1 AND client_vertical = ?
|
|
29
98
|
ORDER BY created_at DESC LIMIT 1
|
|
30
99
|
`).bind(algorithm, clientVertical).first();
|
|
31
|
-
|
|
32
100
|
if (existing) {
|
|
33
|
-
const ageDays = (Date.now() - new Date(existing.created_at).getTime()) /
|
|
101
|
+
const ageDays = (Date.now() - new Date(existing.created_at).getTime()) / 864e5;
|
|
34
102
|
if (ageDays < 7) {
|
|
35
103
|
return new Response(JSON.stringify({
|
|
36
104
|
success: true, message: 'Cluster existente ainda válido (< 7 dias). Use ?force=true para re-clustering.',
|
|
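The cache check above divides a millisecond difference by `864e5` (86,400,000 ms, one day) and reuses the existing cluster when it is less than 7 days old. The sketch below isolates that rule so it can be tested with a fixed clock. `isClusterFresh` and its parameters are hypothetical, and treating an unparsable timestamp as stale is an assumption rather than the package's behaviour.

```javascript
const MS_PER_DAY = 864e5; // 24 * 60 * 60 * 1000

// Hypothetical helper: true when the cluster is younger than maxAgeDays.
function isClusterFresh(createdAt, now = Date.now(), maxAgeDays = 7) {
  const created = new Date(createdAt).getTime();
  if (Number.isNaN(created)) return false; // unparsable timestamp: treat as stale
  return (now - created) / MS_PER_DAY < maxAgeDays;
}
```

Passing `now` explicitly makes the rule deterministic in tests; in the handler it would default to the current time.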
@@ -41,7 +109,6 @@ export async function handleSegmentationCluster(env, request, headers) {
|
|
|
41
109
|
}
|
|
42
110
|
}
|
|
43
111
|
|
|
44
|
-
// 2. Extrair leads históricos do D1 (últimos 6 meses, excluindo bots confirmados)
|
|
45
112
|
const leadsRes = await env.DB.prepare(`
|
|
46
113
|
SELECT id, predicted_ltv_class, engagement_score, intention_level,
|
|
47
114
|
country, state, utm_source, utm_medium, bot_score,
|
|
@@ -49,162 +116,125 @@ export async function handleSegmentationCluster(env, request, headers) {
|
|
|
49
116
|
CAST(julianday('now') - julianday(created_at) AS INTEGER) AS days_since_lead,
|
|
50
117
|
CASE WHEN strftime('%w', created_at) IN ('0','6') THEN 1 ELSE 0 END AS is_weekend
|
|
51
118
|
FROM leads
|
|
52
|
-
WHERE created_at >= datetime('now', '-6 months')
|
|
53
|
-
|
|
54
|
-
ORDER BY RANDOM()
|
|
55
|
-
LIMIT 2000
|
|
119
|
+
WHERE created_at >= datetime('now', '-6 months') AND (bot_score IS NULL OR bot_score < 2)
|
|
120
|
+
ORDER BY RANDOM() LIMIT 2000
|
|
56
121
|
`).all();
|
|
57
122
|
|
|
58
123
|
const leads = leadsRes.results || [];
|
|
59
|
-
|
|
60
124
|
if (leads.length < 50) {
|
|
61
|
-
return new Response(JSON.stringify({
|
|
62
|
-
error: 'Dados insuficientes para clustering. Mínimo: 50 leads nos últimos 6 meses.',
|
|
63
|
-
leads_found: leads.length, required: 50,
|
|
64
|
-
}), { status: 400, headers });
|
|
65
|
-
}
|
|
66
|
-
|
|
67
|
-
// 3. Feature Engineering — normalização 0–1
|
|
68
|
-
const features = leads.map(l => ({
|
|
69
|
-
id: l.id,
|
|
70
|
-
ltv: l.predicted_ltv_class === 'High' ? 1 : (l.predicted_ltv_class === 'Medium' ? 0.5 : 0),
|
|
71
|
-
engagement: Math.min((l.engagement_score || 0) / 100, 1),
|
|
72
|
-
intention: l.intention_level === 'comprador' || l.intention_level === 'high_intent' ? 1
|
|
73
|
-
: l.intention_level === 'interessado' ? 0.6
|
|
74
|
-
: l.intention_level === 'curioso' ? 0.3 : 0,
|
|
75
|
-
recency: Math.max(0, 1 - (l.days_since_lead || 0) / 180),
|
|
76
|
-
hour: (l.hour_of_day || 12) / 23,
|
|
77
|
-
is_weekend: l.is_weekend || 0,
|
|
78
|
-
is_br: l.country === 'BR' ? 1 : 0,
|
|
79
|
-
is_paid: ['facebook','google','tiktok','instagram','youtube'].includes((l.utm_source || '').toLowerCase()) ? 1 : 0,
|
|
80
|
-
}));
|
|
81
|
-
|
|
82
|
-
// 4. Prompt para Workers AI
|
|
83
|
-
const sampleSize = Math.min(features.length, 100);
|
|
84
|
-
const sample = features.slice(0, sampleSize);
|
|
85
|
-
|
|
86
|
-
const clusteringPrompt =
|
|
87
|
-
`You are a customer segmentation ML expert. Perform ${algorithm} clustering on ${sampleSize} customers into ${nClusters} segments.
|
|
88
|
-
|
|
89
|
-
Customer features (all normalized 0-1):
|
|
90
|
-
- ltv: predicted lifetime value (0=Low, 0.5=Medium, 1=High)
|
|
91
|
-
- engagement: browser engagement score
|
|
92
|
-
- intention: purchase intention (0=none, 0.3=curious, 0.6=interested, 1=buyer)
|
|
93
|
-
- recency: lead recency (1=today, 0=6 months ago)
|
|
94
|
-
- hour: conversion hour of day
|
|
95
|
-
- is_weekend: converted on weekend (0/1)
|
|
96
|
-
- is_br: lead from Brazil (0/1)
|
|
97
|
-
- is_paid: from paid traffic channel (0/1)
|
|
98
|
-
|
|
99
|
-
Data (${sampleSize} customers): ${JSON.stringify(sample.slice(0, 50))}
|
|
100
|
-
|
|
101
|
-
Return ONLY valid JSON, zero explanation:
|
|
102
|
-
{
|
|
103
|
-
"clusters": [
|
|
104
|
-
{
|
|
105
|
-
"cluster_id": 0,
|
|
106
|
-
"name": "[Nome Descritivo em Português]",
|
|
107
|
-
"size": ${Math.round(sampleSize / nClusters)},
|
|
108
|
-
"percentage": ${Math.round(100 / nClusters)},
|
|
109
|
-
"characteristics": {
|
|
110
|
-
"avg_ltv_class": 0.5,
|
|
111
|
-
"avg_behavior_score": 0.5,
|
|
112
|
-
"avg_engagement_score": 0.5,
|
|
113
|
-
"avg_intention_level": 0.5,
|
|
114
|
-
"avg_days_since_lead": 30,
|
|
115
|
-
"dominant_countries": ["BR"],
|
|
116
|
-
"dominant_states": ["SP", "RJ"],
|
|
117
|
-
"dominant_utm_sources": ["facebook"],
|
|
118
|
-
"top_features": ["ltv", "engagement"]
|
|
119
|
-
},
|
|
120
|
-
"centroid": { "ltv": 0.5, "engagement": 0.5, "intention": 0.5 },
|
|
121
|
-
"action_recommendation": "[Recomendação de campanha específica para este segmento]"
|
|
125
|
+
return new Response(JSON.stringify({ error: 'Dados insuficientes para clustering. Mínimo: 50 leads.', leads_found: leads.length, required: 50 }), { status: 400, headers });
|
|
122
126
|
}
|
|
123
|
-
],
|
|
124
|
-
"silhouette_score": 0.65,
|
|
125
|
-
"total_processed": ${sampleSize}
|
|
126
|
-
}`;
|
|
127
127
|
|
|
128
|
-
// 5. Workers AI
|
|
129
128
|
const startTime = Date.now();
|
|
130
|
-
const
|
|
131
|
-
|
|
132
|
-
|
|
133
|
-
|
|
134
|
-
const
|
|
135
|
-
|
|
136
|
-
if (!
|
|
129
|
+
const sample = leads.slice(0, 100);
|
|
130
|
+
const profiles = sample.map(_buildLeadProfile);
|
|
131
|
+
|
|
132
|
+
// Embeddings reais via embeddinggemma-300m
|
|
133
|
+
const embRes = await env.AI.run('@cf/baai/bge-m3', { text: profiles });
|
|
134
|
+
const vectors = embRes.data;
|
|
135
|
+
if (!vectors || vectors.length < nClusters) throw new Error(`embeddinggemma retornou ${vectors?.length ?? 0} vetores`);
|
|
136
|
+
|
|
137
|
+
// K-means vetorial real
|
|
138
|
+
const { assignments } = _kmeansRun(vectors, nClusters);
|
|
139
|
+
const silhouetteScore = _silhouette(vectors, assignments, nClusters);
|
|
140
|
+
|
|
141
|
+
// Agregação por cluster para nomear com Granite
|
|
142
|
+
const clusterStats = Array.from({ length: nClusters }, (_, c) => {
|
|
143
|
+
const members = sample.filter((_, i) => assignments[i] === c);
|
|
144
|
+
if (!members.length) return null;
|
|
145
|
+
const ltvMap = { High: 1, Medium: 0.5, Low: 0 };
|
|
146
|
+
const avgLtv = members.reduce((s, l) => s + (ltvMap[l.predicted_ltv_class] ?? 0), 0) / members.length;
|
|
147
|
+
const avgEng = members.reduce((s, l) => s + (l.engagement_score || 0), 0) / members.length;
|
|
148
|
+
const avgDays = members.reduce((s, l) => s + (l.days_since_lead || 0), 0) / members.length;
|
|
149
|
+
const freq = (arr) => arr.length ? [...arr.reduce((m,s) => m.set(s,(m.get(s)||0)+1), new Map())].sort((a,b)=>b[1]-a[1])[0]?.[0] : null;
|
|
150
|
+
return {
|
|
151
|
+
c, size: members.length, pct: Math.round(members.length / sample.length * 100),
|
|
152
|
+
avgLtv, avgEng, avgDays,
|
|
153
|
+
topSource: freq(members.map(l => l.utm_source).filter(Boolean)) || 'direto',
|
|
154
|
+
topState: freq(members.map(l => l.state).filter(Boolean)) || 'BR',
|
|
155
|
+
topIntent: freq(members.map(l => l.intention_level).filter(Boolean)) || 'desconhecida',
|
|
156
|
+
};
|
|
157
|
+
}).filter(Boolean);
|
|
158
|
+
|
|
159
|
+
// Granite is used only to name the segments
|
|
160
|
+
const namingPrompt =
|
|
161
|
+
`Você é especialista em segmentação de clientes. Dê um nome descritivo em português e uma recomendação de campanha para cada segmento. Retorne SOMENTE JSON válido:
|
|
162
|
+
{"segments":[{"cluster_id":0,"name":"...","action":"..."},...]}
|
|
163
|
+
|
|
164
|
+
${clusterStats.map(s => `Cluster ${s.c}: LTV=${s.avgLtv.toFixed(2)}, engajamento=${s.avgEng.toFixed(0)}, intenção="${s.topIntent}", origem="${s.topSource}", estado="${s.topState}", recência=${s.avgDays.toFixed(0)} dias, tamanho=${s.size}`).join('\n')}`;
|
|
165
|
+
|
|
166
|
+
const nameRes = await env.AI.run('@cf/ibm-granite/granite-4.0-h-micro', { messages: [{ role: 'user', content: namingPrompt }], max_tokens: 800 });
|
|
167
|
+
let clusterNames = {};
|
|
168
|
+
try {
|
|
169
|
+
const m = (nameRes?.response || '').match(/\{[\s\S]*\}/);
|
|
170
|
+
if (m) (JSON.parse(m[0]).segments || []).forEach(s => { clusterNames[s.cluster_id] = { name: s.name, action: s.action }; });
|
|
171
|
+
} catch { /* fall back to default names */ }
|
|
137
172
|
|
|
138
|
-
const
|
|
139
|
-
if (!jsonMatch) throw new Error('Resposta do Workers AI não contém JSON válido');
|
|
140
|
-
const mlResult = JSON.parse(jsonMatch[0]);
|
|
173
|
+
const duration = Date.now() - startTime;
|
|
141
174
|
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
|
|
175
|
+
const clusters = clusterStats.map(s => ({
|
|
176
|
+
cluster_id: s.c,
|
|
177
|
+
name: clusterNames[s.c]?.name || `Segmento ${s.c + 1}`,
|
|
178
|
+
size: s.size, percentage: s.pct,
|
|
179
|
+
action_recommendation: clusterNames[s.c]?.action || '',
|
|
180
|
+
characteristics: {
|
|
181
|
+
avg_ltv_class: s.avgLtv, avg_engagement_score: s.avgEng,
|
|
182
|
+
avg_intention_level: s.avgLtv, avg_days_since_lead: s.avgDays,
|
|
183
|
+
dominant_countries: ['BR'], dominant_states: [s.topState],
|
|
184
|
+
dominant_utm_sources: [s.topSource], top_features: ['ltv', 'engagement', 'intention'],
|
|
185
|
+
},
|
|
186
|
+
}));
|
|
145
187
|
|
|
146
|
-
// 6. Deactivate previous clusters
|
|
147
188
|
await env.DB.prepare(`UPDATE ml_segments SET is_active = 0 WHERE clustering_algorithm = ? AND client_vertical = ? AND is_active = 1`).bind(algorithm, clientVertical).run();
|
|
148
189
|
|
|
149
|
-
// 7. Persist the new clusters
|
|
150
190
|
const now = new Date().toISOString();
|
|
151
|
-
for (const cluster of
|
|
152
|
-
const ch = cluster.characteristics
|
|
191
|
+
for (const cluster of clusters) {
|
|
192
|
+
const ch = cluster.characteristics;
|
|
153
193
|
await env.DB.prepare(`
|
|
154
194
|
INSERT INTO ml_segments (
|
|
155
|
-
cluster_id, cluster_name, clustering_algorithm, client_vertical,
|
|
156
|
-
|
|
157
|
-
avg_intention_level, avg_days_since_lead,
|
|
195
|
+
cluster_id, cluster_name, clustering_algorithm, client_vertical, size, percentage,
|
|
196
|
+
avg_ltv_class, avg_behavior_score, avg_engagement_score, avg_intention_level, avg_days_since_lead,
|
|
158
197
|
dominant_countries, dominant_states, dominant_utm_sources, dominant_features,
|
|
159
198
|
silhouette_score, action_recommendations, bid_recommendations, campaign_recommendations,
|
|
160
199
|
is_active, created_at, updated_at
|
|
161
200
|
) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,1,?,?)
|
|
162
201
|
`).bind(
|
|
163
|
-
cluster.cluster_id
|
|
164
|
-
|
|
165
|
-
ch.
|
|
166
|
-
ch.
|
|
167
|
-
|
|
168
|
-
JSON.stringify(
|
|
169
|
-
mlResult.silhouette_score || 0,
|
|
170
|
-
JSON.stringify([cluster.action_recommendation || '']), JSON.stringify([]), JSON.stringify([]),
|
|
202
|
+
cluster.cluster_id, cluster.name, algorithm, clientVertical, cluster.size, cluster.percentage,
|
|
203
|
+
ch.avg_ltv_class, ch.avg_engagement_score, ch.avg_engagement_score, ch.avg_intention_level, ch.avg_days_since_lead,
|
|
204
|
+
JSON.stringify(ch.dominant_countries), JSON.stringify(ch.dominant_states),
|
|
205
|
+
JSON.stringify(ch.dominant_utm_sources), JSON.stringify(ch.top_features),
|
|
206
|
+
silhouetteScore,
|
|
207
|
+
JSON.stringify([cluster.action_recommendation]), JSON.stringify([]), JSON.stringify([]),
|
|
171
208
|
now, now,
|
|
172
209
|
).run();
|
|
173
210
|
}
|
|
174
211
|
|
|
175
|
-
// 8. Log to the history table
|
|
176
212
|
try {
|
|
177
213
|
await env.DB.prepare(`
|
|
178
|
-
INSERT INTO ml_clustering_history (
|
|
179
|
-
|
|
180
|
-
|
|
181
|
-
|
|
182
|
-
|
|
183
|
-
`).bind(
|
|
184
|
-
new Date(startTime).toISOString(), algorithm, leads.length, mlResult.clusters.length,
|
|
185
|
-
duration, Math.ceil(duration * 0.01),
|
|
186
|
-
JSON.stringify({ algorithm, n_clusters: nClusters, vertical: clientVertical }),
|
|
187
|
-
JSON.stringify({ clusters: mlResult.clusters.length, silhouette: mlResult.silhouette_score }),
|
|
214
|
+
INSERT INTO ml_clustering_history (clustering_id, started_at, completed_at, algorithm, n_leads_processed, n_clusters_created, total_duration_ms, workers_ai_neurons_used, status, parameters, results_summary)
|
|
215
|
+
VALUES (0, ?, datetime('now'), ?, ?, ?, ?, ?, 'completed', ?, ?)
|
|
216
|
+
`).bind(new Date(startTime).toISOString(), algorithm, leads.length, clusters.length, duration, Math.ceil(duration * 0.01),
|
|
217
|
+
JSON.stringify({ algorithm, n_clusters: nClusters, vertical: clientVertical, engine: 'embeddinggemma-300m+kmeans' }),
|
|
218
|
+
JSON.stringify({ clusters: clusters.length, silhouette: silhouetteScore }),
|
|
188
219
|
).run();
|
|
189
220
|
} catch (e) { console.error('[Segmentation] history log error:', e.message); }
|
|
190
221
|
|
|
191
222
|
return new Response(JSON.stringify({
|
|
192
|
-
success: true, algorithm,
|
|
193
|
-
|
|
194
|
-
|
|
223
|
+
success: true, algorithm, engine: 'embeddinggemma-300m + kmeans vetorial',
|
|
224
|
+
n_clusters: clusters.length, client_vertical: clientVertical,
|
|
225
|
+
leads_analyzed: leads.length, sample_embedded: sample.length,
|
|
226
|
+
duration_ms: duration, silhouette_score: silhouetteScore,
|
|
227
|
+
clusters, generated_at: now,
|
|
195
228
|
}), { status: 200, headers });
|
|
196
229
|
|
|
197
230
|
} catch (err) {
|
|
198
231
|
console.error('[Segmentation] cluster error:', err.message);
|
|
199
232
|
try {
|
|
200
|
-
if (env.DB)
|
|
201
|
-
|
|
202
|
-
|
|
203
|
-
|
|
204
|
-
|
|
205
|
-
}
|
|
206
|
-
} catch { /* do not block the error response */ }
|
|
207
|
-
|
|
233
|
+
if (env.DB) await env.DB.prepare(`
|
|
234
|
+
INSERT INTO ml_clustering_history (clustering_id, started_at, algorithm, n_leads_processed, n_clusters_created, total_duration_ms, workers_ai_neurons_used, status, error_message, parameters, results_summary)
|
|
235
|
+
VALUES (0, datetime('now'), ?, 0, 0, 0, 0, 'failed', ?, ?, '{}')
|
|
236
|
+
`).bind(algorithm, err.message, JSON.stringify({ algorithm, n_clusters: nClusters })).run();
|
|
237
|
+
} catch { /* do not block */ }
|
|
208
238
|
return new Response(JSON.stringify({ error: 'Erro ao executar clustering', message: err.message }), { status: 500, headers });
|
|
209
239
|
}
|
|
210
240
|
}
|
|
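The silhouette score now computed from the vectors replaces the fixed `0.65` that the old prompt template carried. The calculation can be sketched in isolation. This is a minimal sketch under stated assumptions, not the package's `_silhouette`: the names `cosineDistance` and `silhouetteScore` are hypothetical, and it follows the textbook convention that a point alone in its cluster scores 0, whereas the helper in this diff lets such a point contribute `(b - 0) / b = 1`.

```javascript
// Cosine distance: 1 - cos(a, b). The epsilon avoids division by zero.
function cosineDistance(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-10);
}

// Mean silhouette over all points: s(i) = (b - a) / max(a, b), where
// a = mean distance to the other members of the point's own cluster and
// b = smallest mean distance to the members of any other cluster.
function silhouetteScore(vectors, assignments) {
  const n = vectors.length;
  const labels = [...new Set(assignments)];
  let total = 0;
  for (let i = 0; i < n; i++) {
    // Mean distance from point i to every other point carrying `label`.
    const meanDistTo = (label) => {
      let sum = 0, count = 0;
      for (let j = 0; j < n; j++) {
        if (j === i || assignments[j] !== label) continue;
        sum += cosineDistance(vectors[i], vectors[j]);
        count++;
      }
      return count ? sum / count : null;
    };
    const a = meanDistTo(assignments[i]);
    if (a === null) continue; // singleton cluster: contributes 0 (assumed convention)
    let b = Infinity;
    for (const label of labels) {
      if (label === assignments[i]) continue;
      const d = meanDistTo(label);
      if (d !== null && d < b) b = d;
    }
    const denom = Math.max(a, b);
    if (b !== Infinity && denom > 0) total += (b - a) / denom;
  }
  return n ? total / n : 0;
}
```

Scores near 1 indicate tight, well-separated clusters; scores near or below 0 indicate points that sit closer to another cluster than to their own.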
@@ -1903,7 +1903,7 @@ async function predictLtv(env, payload, request, customSystemPrompt = null) {
|
|
|
1903
1903
|
has_phone: !!payload.phone,
|
|
1904
1904
|
})},
|
|
1905
1905
|
];
|
|
1906
|
-
const aiRes = await env.AI.run('@cf/
|
|
1906
|
+
const aiRes = await env.AI.run('@cf/ibm-granite/granite-4.0-h-micro', { messages: prompt, max_tokens: 32 });
|
|
1907
1907
|
const parsed = JSON.parse(aiRes.response.trim());
|
|
1908
1908
|
if (typeof parsed.adjustment === 'number') {
|
|
1909
1909
|
aiAdjustment = Math.max(-10, Math.min(10, parsed.adjustment));
|
|
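The LTV adjustment above is read with a bare `JSON.parse` on the model reply and then clamped to the range -10..10. A reply wrapped in a Markdown fence or followed by a stray sentence makes `JSON.parse` throw, so a more tolerant reader may be useful. The following is a minimal sketch, assuming a hypothetical helper name `parseAdjustment` and assuming that "no usable value" should mean an adjustment of 0; neither is part of the package.

```javascript
// Extracts {"adjustment": <number>} from a model reply and clamps it.
// Returns 0 when no usable number is found (assumed fallback).
function parseAdjustment(text, limit = 10) {
  // Take everything from the first "{" to the last "}" in the reply.
  const match = String(text ?? '').match(/\{[\s\S]*\}/);
  if (!match) return 0;
  try {
    const parsed = JSON.parse(match[0]);
    const value = parsed.adjustment;
    if (typeof value !== 'number' || !Number.isFinite(value)) return 0;
    return Math.max(-limit, Math.min(limit, value));
  } catch {
    return 0; // malformed JSON inside the braces
  }
}
```

The brace-matching regular expression is the same one the clustering handler in this diff uses for the segment-naming reply.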
@@ -2415,8 +2415,82 @@ function tryParseJson(str, fallback) {
|
|
|
2415
2415
|
try { return JSON.parse(str); } catch { return fallback !== undefined ? fallback : null; }
|
|
2416
2416
|
}
|
|
2417
2417
|
|
|
2418
|
+
// ── Vector K-means helpers (used by the embedding-based clustering) ───────────
|
|
2419
|
+
|
|
2420
|
+
function _cosDist(a, b) {
|
|
2421
|
+
let dot = 0, na = 0, nb = 0;
|
|
2422
|
+
for (let i = 0; i < a.length; i++) { dot += a[i]*b[i]; na += a[i]*a[i]; nb += b[i]*b[i]; }
|
|
2423
|
+
return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-10);
|
|
2424
|
+
}
|
|
2425
|
+
|
|
2426
|
+
function _kmeansRun(vectors, k, maxIter = 25) {
|
|
2427
|
+
const n = vectors.length;
|
|
2428
|
+
const dim = vectors[0].length;
|
|
2429
|
+
// K-means++ init
|
|
2430
|
+
const centroids = [vectors[Math.floor(Math.random() * n)]];
|
|
2431
|
+
while (centroids.length < k) {
|
|
2432
|
+
const dists = vectors.map(v => Math.min(...centroids.map(c => _cosDist(v, c))));
|
|
2433
|
+
const sum = dists.reduce((a, b) => a + b, 0);
|
|
2434
|
+
let r = Math.random() * sum, cumul = 0;
|
|
2435
|
+
for (let i = 0; i < n; i++) { cumul += dists[i]; if (cumul >= r) { centroids.push(vectors[i]); break; } }
|
|
2436
|
+
if (centroids.length < k) centroids.push(vectors[Math.floor(Math.random() * n)]);
|
|
2437
|
+
}
|
|
2438
|
+
|
|
2439
|
+
let assignments = new Array(n).fill(0);
|
|
2440
|
+
for (let iter = 0; iter < maxIter; iter++) {
|
|
2441
|
+
let changed = false;
|
|
2442
|
+
for (let i = 0; i < n; i++) {
|
|
2443
|
+
let best = 0, bestD = Infinity;
|
|
2444
|
+
for (let c = 0; c < k; c++) { const d = _cosDist(vectors[i], centroids[c]); if (d < bestD) { bestD = d; best = c; } }
|
|
2445
|
+
if (assignments[i] !== best) { assignments[i] = best; changed = true; }
|
|
2446
|
+
}
|
|
2447
|
+
if (!changed) break;
|
|
2448
|
+
// Recompute centroids
|
|
2449
|
+
for (let c = 0; c < k; c++) {
|
|
2450
|
+
const members = vectors.filter((_, i) => assignments[i] === c);
|
|
2451
|
+
if (members.length === 0) continue;
|
|
2452
|
+
for (let d = 0; d < dim; d++) centroids[c][d] = members.reduce((s, v) => s + v[d], 0) / members.length;
|
|
2453
|
+
}
|
|
2454
|
+
}
|
|
2455
|
+
return { assignments, centroids };
|
|
2456
|
+
}
|
|
2457
|
+
|
|
2458
|
+
function _silhouette(vectors, assignments, k) {
|
|
2459
|
+
const n = vectors.length;
|
|
2460
|
+
let total = 0;
|
|
2461
|
+
for (let i = 0; i < n; i++) {
|
|
2462
|
+
const ci = assignments[i];
|
|
2463
|
+
const sameCluster = vectors.filter((_, j) => j !== i && assignments[j] === ci);
|
|
2464
|
+
const a = sameCluster.length ? sameCluster.reduce((s, v) => s + _cosDist(vectors[i], v), 0) / sameCluster.length : 0;
|
|
2465
|
+
let b = Infinity;
|
|
2466
|
+
for (let c = 0; c < k; c++) {
|
|
2467
|
+
if (c === ci) continue;
|
|
2468
|
+
const other = vectors.filter((_, j) => assignments[j] === c);
|
|
2469
|
+
if (other.length) b = Math.min(b, other.reduce((s, v) => s + _cosDist(vectors[i], v), 0) / other.length);
|
|
2470
|
+
}
|
|
2471
|
+
total += b === Infinity ? 0 : (b - a) / Math.max(a, b);
|
|
2472
|
+
}
|
|
2473
|
+
return Math.round((total / n) * 1000) / 1000;
|
|
2474
|
+
}
|
|
2475
|
+
|
|
2476
|
+
function _buildLeadProfile(l) {
|
|
2477
|
+
return [
|
|
2478
|
+
`LTV: ${l.predicted_ltv_class || 'desconhecido'}`,
|
|
2479
|
+
`engajamento: ${Math.round(l.engagement_score || 0)}`,
|
|
2480
|
+
`intenção: ${l.intention_level || 'desconhecida'}`,
|
|
2481
|
+
`origem: ${l.utm_source || 'direto'}`,
|
|
2482
|
+
`canal: ${l.utm_medium || 'desconhecido'}`,
|
|
2483
|
+
`país: ${l.country || 'BR'}`,
|
|
2484
|
+
`estado: ${l.state || ''}`,
|
|
2485
|
+
`hora: ${l.hour_of_day || 12}h`,
|
|
2486
|
+
(l.is_weekend ? 'fim-de-semana' : 'dia-útil'),
|
|
2487
|
+
`recência: ${l.days_since_lead || 0} dias`,
|
|
2488
|
+
].filter(Boolean).join(', ');
|
|
2489
|
+
}
|
|
2490
|
+
|
|
2418
2491
|
// ── POST /api/segmentation/cluster ───────────────────────────────────────────
|
|
2419
|
-
//
|
|
2492
|
+
// Real clustering with embeddings (embeddinggemma-300m) + vector K-means
|
|
2493
|
+
// Granite is used only to name the segments
|
|
2420
2494
|
// Requires bindings: DB + AI
|
|
2421
2495
|
async function handleSegmentationCluster(env, request, headers) {
|
|
2422
2496
|
if (!env.DB) return new Response(JSON.stringify({ error: 'DB não configurado' }), { status: 503, headers });
|
|
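The K-means helpers added in this hunk can be exercised outside the Worker. The sketch below is a simplified variant written for illustration, not the package's `_kmeansRun`: the names `cosDist`, `makeRng`, and `kmeansCosine` are hypothetical, and the injectable random source is an assumption added so that runs are reproducible. It also differs in two respects that seem worth noting from the hunk as shown. First, the helper seeds `centroids` with references to rows of `vectors` and later updates them in place with `centroids[c][d] = ...`, which appears to overwrite those input rows; the sketch copies the seeds instead. Second, the sketch adds exactly one centroid per seeding pass.

```javascript
// Cosine distance: 1 - cos(a, b). The epsilon avoids division by zero.
function cosDist(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-10);
}

// Small seeded linear congruential generator (hypothetical helper).
function makeRng(seed) {
  let s = seed >>> 0;
  return () => {
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
    return s / 4294967296;
  };
}

function kmeansCosine(vectors, k, rng = Math.random, maxIter = 25) {
  const n = vectors.length;
  const dim = vectors[0].length;
  // Distance-weighted seeding; seeds are copies, so the input is never mutated.
  const centroids = [vectors[Math.floor(rng() * n)].slice()];
  while (centroids.length < k) {
    const dists = vectors.map(v => Math.min(...centroids.map(c => cosDist(v, c))));
    const sum = dists.reduce((a, b) => a + b, 0);
    const r = rng() * sum;
    let cumul = 0;
    let pick = n - 1; // fallback if rounding keeps cumul below r
    for (let i = 0; i < n; i++) {
      cumul += dists[i];
      if (cumul >= r) { pick = i; break; }
    }
    centroids.push(vectors[pick].slice());
  }
  const assignments = new Array(n).fill(-1);
  for (let iter = 0; iter < maxIter; iter++) {
    let changed = false;
    // Assignment step: each point goes to its nearest centroid.
    for (let i = 0; i < n; i++) {
      let best = 0, bestD = Infinity;
      for (let c = 0; c < k; c++) {
        const d = cosDist(vectors[i], centroids[c]);
        if (d < bestD) { bestD = d; best = c; }
      }
      if (assignments[i] !== best) { assignments[i] = best; changed = true; }
    }
    if (!changed) break;
    // Update step: each centroid becomes the mean of its members.
    for (let c = 0; c < k; c++) {
      const members = vectors.filter((_, i) => assignments[i] === c);
      if (members.length === 0) continue; // keep an empty cluster's centroid as is
      centroids[c] = Array.from({ length: dim },
        (_, d) => members.reduce((s, v) => s + v[d], 0) / members.length);
    }
  }
  return { assignments, centroids };
}
```

Calling `kmeansCosine(vectors, 2, makeRng(42))` on two groups of vectors pointing in clearly different directions should place each group under its own label. Which group receives label 0 depends on the seed.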
@@ -2424,7 +2498,7 @@ async function handleSegmentationCluster(env, request, headers) {
|
|
|
2424
2498
|
|
|
2425
2499
|
const url = new URL(request.url);
|
|
2426
2500
|
const algorithm = url.searchParams.get('algorithm') || 'kmeans';
|
|
2427
|
-
const nClusters = Math.min(10, Math.max(
|
|
2501
|
+
const nClusters = Math.min(10, Math.max(2, parseInt(url.searchParams.get('n_clusters') || '5')));
|
|
2428
2502
|
const clientVertical = url.searchParams.get('vertical') || 'general';
|
|
2429
2503
|
const forceRecluster = url.searchParams.get('force') === 'true';
|
|
2430
2504
|
|
|
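The `n_clusters` clamp above raises the lower bound to 2, but a non-numeric query value still passes through it: `parseInt('abc')` is `NaN`, and both `Math.max(2, NaN)` and `Math.min(10, NaN)` evaluate to `NaN`. A guarded reader can be sketched as follows. The helper name `parseClusterCount` and its option names are hypothetical; the defaults mirror the bounds 2..10 and the default of 5 shown in the diff.

```javascript
// Parses a cluster count from a raw query-string value.
// Falls back to `fallback` when the value is missing or not a number.
function parseClusterCount(rawValue, { min = 2, max = 10, fallback = 5 } = {}) {
  const n = Number.parseInt(rawValue ?? '', 10);
  if (!Number.isFinite(n)) return fallback;
  return Math.min(max, Math.max(min, n));
}
```

Inside the handler this would be called as `parseClusterCount(url.searchParams.get('n_clusters'))`, since `searchParams.get` returns `null` for an absent parameter.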
@@ -2480,96 +2554,94 @@ async function handleSegmentationCluster(env, request, headers) {
|
|
|
2480
2554
|
}), { status: 400, headers });
|
|
2481
2555
|
}
|
|
2482
2556
|
|
|
2483
|
-
|
|
2484
|
-
const features = leads.map(l => ({
|
|
2485
|
-
id: l.id,
|
|
2486
|
-
ltv: l.predicted_ltv_class === 'High' ? 1 : (l.predicted_ltv_class === 'Medium' ? 0.5 : 0),
|
|
2487
|
-
engagement: Math.min((l.engagement_score || 0) / 100, 1),
|
|
2488
|
-
intention: l.intention_level === 'comprador' || l.intention_level === 'high_intent' ? 1
|
|
2489
|
-
: l.intention_level === 'interessado' ? 0.6
|
|
2490
|
-
: l.intention_level === 'curioso' ? 0.3 : 0,
|
|
2491
|
-
recency: Math.max(0, 1 - (l.days_since_lead || 0) / 180),
|
|
2492
|
-
hour: (l.hour_of_day || 12) / 23,
|
|
2493
|
-
is_weekend: l.is_weekend || 0,
|
|
2494
|
-
is_br: l.country === 'BR' ? 1 : 0,
|
|
2495
|
-
is_paid: ['facebook','google','tiktok','instagram','youtube'].includes(
|
|
2496
|
-
(l.utm_source || '').toLowerCase()) ? 1 : 0,
|
|
2497
|
-
}));
|
|
2557
|
+
const startTime = Date.now();
|
|
2498
2558
|
|
|
2499
|
-
//
|
|
2500
|
-
const
|
|
2501
|
-
const
|
|
2502
|
-
|
|
2503
|
-
const
|
|
2504
|
-
|
|
2505
|
-
|
|
2506
|
-
|
|
2507
|
-
|
|
2508
|
-
- engagement: browser engagement score
|
|
2509
|
-
- intention: purchase intention (0=none, 0.3=curious, 0.6=interested, 1=buyer)
|
|
2510
|
-
- recency: lead recency (1=today, 0=6 months ago)
|
|
2511
|
-
- hour: conversion hour of day
|
|
2512
|
-
- is_weekend: converted on weekend (0/1)
|
|
2513
|
-
- is_br: lead from Brazil (0/1)
|
|
2514
|
-
- is_paid: from paid traffic channel (0/1)
|
|
2515
|
-
|
|
2516
|
-
Data (${sampleSize} customers): ${JSON.stringify(sample.slice(0, 50))}
|
|
2517
|
-
|
|
2518
|
-
Return ONLY valid JSON, zero explanation:
|
|
2519
|
-
{
|
|
2520
|
-
"clusters": [
|
|
2521
|
-
{
|
|
2522
|
-
"cluster_id": 0,
|
|
2523
|
-
"name": "[Nome Descritivo em Português]",
|
|
2524
|
-
"size": ${Math.round(sampleSize / nClusters)},
|
|
2525
|
-
"percentage": ${Math.round(100 / nClusters)},
|
|
2526
|
-
"characteristics": {
|
|
2527
|
-
"avg_ltv_class": 0.5,
|
|
2528
|
-
"avg_behavior_score": 0.5,
|
|
2529
|
-
"avg_engagement_score": 0.5,
|
|
2530
|
-
"avg_intention_level": 0.5,
|
|
2531
|
-
"avg_days_since_lead": 30,
|
|
2532
|
-
"dominant_countries": ["BR"],
|
|
2533
|
-
"dominant_states": ["SP", "RJ"],
|
|
2534
|
-
"dominant_utm_sources": ["facebook"],
|
|
2535
|
-
"top_features": ["ltv", "engagement"]
|
|
2536
|
-
},
|
|
2537
|
-
"centroid": { "ltv": 0.5, "engagement": 0.5, "intention": 0.5 },
|
|
2538
|
-
"action_recommendation": "[Recomendação de campanha específica para este segmento]"
|
|
2559
|
+
// 3. Build text profiles and embeddings via embeddinggemma-300m
|
|
2560
|
+
const sample = leads.slice(0, 100); // at most 100 per batch
|
|
2561
|
+
const profiles = sample.map(_buildLeadProfile);
|
|
2562
|
+
|
|
2563
|
+
const embRes = await env.AI.run('@cf/baai/bge-m3', { text: profiles });
|
|
2564
|
+
const vectors = embRes.data; // float32[][] shape [N, 768]
|
|
2565
|
+
|
|
2566
|
+
if (!vectors || vectors.length < nClusters) {
|
|
2567
|
+
throw new Error(`embeddinggemma retornou ${vectors?.length ?? 0} vetores — insuficiente para ${nClusters} clusters`);
|
|
2539
2568
|
}
|
|
2540
|
-
],
|
|
2541
|
-
"silhouette_score": 0.65,
|
|
2542
|
-
"total_processed": ${sampleSize}
|
|
2543
|
-
}`;
|
|
2544
2569
|
|
|
2545
|
-
//
|
|
2546
|
-
const
|
|
2547
|
-
|
|
2548
|
-
|
|
2549
|
-
|
|
2570
|
+
// 4. Real vector K-means (cosine distance)
|
|
2571
|
+
const { assignments } = _kmeansRun(vectors, nClusters);
|
|
2572
|
+
|
|
2573
|
+
// 5. Real silhouette score
|
|
2574
|
+
const silhouetteScore = _silhouette(vectors, assignments, nClusters);
|
|
2575
|
+
|
|
2576
|
+
// 6. Aggregate per-cluster statistics, used to name the clusters with Granite
|
|
2577
|
+
const clusterStats = Array.from({ length: nClusters }, (_, c) => {
|
|
2578
|
+
const members = sample.filter((_, i) => assignments[i] === c);
|
|
2579
|
+
if (members.length === 0) return null;
|
|
2580
|
+
const ltvMap = { High: 1, Medium: 0.5, Low: 0 };
|
|
2581
|
+
const avgLtv = members.reduce((s, l) => s + (ltvMap[l.predicted_ltv_class] ?? 0), 0) / members.length;
|
|
2582
|
+
const avgEng = members.reduce((s, l) => s + (l.engagement_score || 0), 0) / members.length;
|
|
2583
|
+
const avgDays = members.reduce((s, l) => s + (l.days_since_lead || 0), 0) / members.length;
|
|
2584
|
+
const sources = members.map(l => l.utm_source).filter(Boolean);
|
|
2585
|
+
const states = members.map(l => l.state).filter(Boolean);
|
|
2586
|
+
const topSource = sources.length ? [...sources.reduce((m, s) => m.set(s, (m.get(s)||0)+1), new Map())].sort((a,b)=>b[1]-a[1])[0]?.[0] : 'direto';
|
|
2587
|
+
const topState = states.length ? [...states.reduce((m, s) => m.set(s, (m.get(s)||0)+1), new Map())].sort((a,b)=>b[1]-a[1])[0]?.[0] : 'BR';
|
|
2588
|
+
const intentions = members.map(l => l.intention_level).filter(Boolean);
|
|
2589
|
+
const topIntent = intentions.length ? [...intentions.reduce((m, s) => m.set(s,(m.get(s)||0)+1), new Map())].sort((a,b)=>b[1]-a[1])[0]?.[0] : 'desconhecida';
|
|
2590
|
+
return { c, size: members.length, pct: Math.round(members.length / sample.length * 100), avgLtv, avgEng, avgDays, topSource, topState, topIntent };
|
|
2591
|
+
}).filter(Boolean);
|
|
2592
|
+
|
|
2593
|
+
// 7. Use Granite only to name each cluster and recommend an action
|
|
2594
|
+
const namingPrompt =
|
|
2595
|
+
`Você é especialista em segmentação de clientes. Dê um nome descritivo em português e uma recomendação de campanha para cada segmento abaixo. Retorne SOMENTE JSON válido:
|
|
2596
|
+
{"segments":[{"cluster_id":0,"name":"...","action":"..."},...]}
|
|
2597
|
+
|
|
2598
|
+
Segmentos:
|
|
2599
|
+
${clusterStats.map(s => `Cluster ${s.c}: LTV médio=${s.avgLtv.toFixed(2)}, engajamento=${s.avgEng.toFixed(0)}, intenção dominante="${s.topIntent}", origem="${s.topSource}", estado="${s.topState}", recência=${s.avgDays.toFixed(0)} dias, tamanho=${s.size} leads`).join('\n')}`;
|
|
2600
|
+
|
|
2601
|
+
const nameRes = await env.AI.run('@cf/ibm-granite/granite-4.0-h-micro', {
|
|
2602
|
+
messages: [{ role: 'user', content: namingPrompt }],
|
|
2603
|
+
max_tokens: 800,
|
|
2550
2604
|
});
|
|
2551
|
-
const duration = Date.now() - startTime;
|
|
2552
2605
|
|
|
2553
|
-
|
|
2606
|
+
let clusterNames = {};
|
|
2607
|
+
try {
|
|
2608
|
+
const m = (nameRes?.response || '').match(/\{[\s\S]*\}/);
|
|
2609
|
+
if (m) {
|
|
2610
|
+
const parsed = JSON.parse(m[0]);
|
|
2611
|
+
(parsed.segments || []).forEach(s => { clusterNames[s.cluster_id] = { name: s.name, action: s.action }; });
|
|
2612
|
+
}
|
|
2613
|
+
} catch { /* fall back to default names */ }
|
|
2554
2614
|
|
|
2555
|
-
|
|
2556
|
-
const jsonMatch = aiRes.response.trim().match(/\{[\s\S]*\}/);
|
|
2557
|
-
if (!jsonMatch) throw new Error('Resposta do Workers AI não contém JSON válido');
|
|
2558
|
-
const mlResult = JSON.parse(jsonMatch[0]);
|
|
2615
|
+
const duration = Date.now() - startTime;
|
|
2559
2616
|
|
|
2560
|
-
|
|
2561
|
-
|
|
2562
|
-
|
|
2617
|
+
// 8. Assemble the final result
|
|
2618
|
+
const clusters = clusterStats.map(s => ({
|
|
2619
|
+
cluster_id: s.c,
|
|
2620
|
+
name: clusterNames[s.c]?.name || `Segmento ${s.c + 1}`,
|
|
2621
|
+
size: s.size,
|
|
2622
|
+
percentage: s.pct,
|
|
2623
|
+
action_recommendation: clusterNames[s.c]?.action || '',
|
|
2624
|
+
characteristics: {
|
|
2625
|
+
avg_ltv_class: s.avgLtv,
|
|
2626
|
+
avg_engagement_score: s.avgEng,
|
|
2627
|
+
avg_intention_level: s.avgLtv,
|
|
2628
|
+
avg_days_since_lead: s.avgDays,
|
|
2629
|
+
dominant_countries: ['BR'],
|
|
2630
|
+
dominant_states: [s.topState],
|
|
2631
|
+
dominant_utm_sources: [s.topSource],
|
|
2632
|
+
top_features: ['ltv', 'engagement', 'intention'],
|
|
2633
|
+
},
|
|
2634
|
+
}));
|
|
2563
2635
|
|
|
2564
|
-
//
|
|
2636
|
+
// 9. Deactivate previous clusters for the same algorithm/vertical
|
|
2565
2637
|
await env.DB.prepare(
|
|
2566
2638
|
`UPDATE ml_segments SET is_active = 0 WHERE clustering_algorithm = ? AND client_vertical = ? AND is_active = 1`
|
|
2567
2639
|
).bind(algorithm, clientVertical).run();
|
|
2568
2640
|
|
|
2569
|
-
//
|
|
2641
|
+
// 10. Persist the new clusters in D1
|
|
2570
2642
|
const now = new Date().toISOString();
|
|
2571
|
-
for (const cluster of
|
|
2572
|
-
const ch = cluster.characteristics
|
|
2643
|
+
for (const cluster of clusters) {
|
|
2644
|
+
const ch = cluster.characteristics;
|
|
2573
2645
|
await env.DB.prepare(`
|
|
2574
2646
|
INSERT INTO ml_segments (
|
|
2575
2647
|
cluster_id, cluster_name, clustering_algorithm, client_vertical,
|
|
@@ -2581,23 +2653,23 @@ Return ONLY valid JSON, zero explanation:
|
|
|
2581
2653
|
is_active, created_at, updated_at
|
|
2582
2654
|
) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,1,?,?)
|
|
2583
2655
|
`).bind(
|
|
2584
|
-
cluster.cluster_id
|
|
2585
|
-
cluster.name
|
|
2656
|
+
cluster.cluster_id,
|
|
2657
|
+
cluster.name,
|
|
2586
2658
|
algorithm,
|
|
2587
2659
|
clientVertical,
|
|
2588
|
-
cluster.size
|
|
2589
|
-
cluster.percentage
|
|
2590
|
-
ch.avg_ltv_class
|
|
2591
|
-
ch.
|
|
2592
|
-
ch.avg_engagement_score
|
|
2593
|
-
ch.avg_intention_level
|
|
2594
|
-
ch.avg_days_since_lead
|
|
2595
|
-
JSON.stringify(ch.dominant_countries
|
|
2596
|
-
JSON.stringify(ch.dominant_states
|
|
2597
|
-
JSON.stringify(ch.dominant_utm_sources
|
|
2598
|
-
JSON.stringify(ch.top_features
|
|
2599
|
-
|
|
2600
|
-
JSON.stringify([cluster.action_recommendation
|
|
2660
|
+
cluster.size,
|
|
2661
|
+
cluster.percentage,
|
|
2662
|
+
ch.avg_ltv_class,
|
|
2663
|
+
ch.avg_engagement_score,
|
|
2664
|
+
ch.avg_engagement_score,
|
|
2665
|
+
ch.avg_intention_level,
|
|
2666
|
+
ch.avg_days_since_lead,
|
|
2667
|
+
JSON.stringify(ch.dominant_countries),
|
|
2668
|
+
JSON.stringify(ch.dominant_states),
|
|
2669
|
+
JSON.stringify(ch.dominant_utm_sources),
|
|
2670
|
+
JSON.stringify(ch.top_features),
|
|
2671
|
+
silhouetteScore,
|
|
2672
|
+
JSON.stringify([cluster.action_recommendation]),
|
|
2601
2673
|
JSON.stringify([]),
|
|
2602
2674
|
JSON.stringify([]),
|
|
2603
2675
|
now,
|
|
@@ -2605,7 +2677,7 @@ Return ONLY valid JSON, zero explanation:
|
|
|
2605
2677
|
).run();
|
|
2606
2678
|
}
|
|
2607
2679
|
|
|
2608
|
-
//
|
|
2680
|
+
// 11. Log to the clustering history
|
|
2609
2681
|
try {
|
|
2610
2682
|
await env.DB.prepare(`
|
|
2611
2683
|
INSERT INTO ml_clustering_history (
|
|
@@ -2617,23 +2689,25 @@ Return ONLY valid JSON, zero explanation:
|
|
|
2617
2689
|
new Date(startTime).toISOString(),
|
|
2618
2690
|
algorithm,
|
|
2619
2691
|
leads.length,
|
|
2620
|
-
|
|
2692
|
+
clusters.length,
|
|
2621
2693
|
duration,
|
|
2622
2694
|
Math.ceil(duration * 0.01),
|
|
2623
|
-
JSON.stringify({ algorithm, n_clusters: nClusters, vertical: clientVertical }),
|
|
2624
|
-
JSON.stringify({ clusters:
|
|
2695
|
+
JSON.stringify({ algorithm, n_clusters: nClusters, vertical: clientVertical, engine: 'embeddinggemma-300m+kmeans' }),
|
|
2696
|
+
JSON.stringify({ clusters: clusters.length, silhouette: silhouetteScore }),
|
|
2625
2697
|
).run();
|
|
2626
2698
|
} catch (e) { console.error('[Segmentation] history log error:', e.message); }
|
|
2627
2699
|
|
|
2628
2700
|
return new Response(JSON.stringify({
|
|
2629
2701
|
success: true,
|
|
2630
2702
|
algorithm,
|
|
2631
|
-
|
|
2703
|
+
engine: 'embeddinggemma-300m + kmeans vetorial',
|
|
2704
|
+
n_clusters: clusters.length,
|
|
2632
2705
|
client_vertical: clientVertical,
|
|
2633
2706
|
leads_analyzed: leads.length,
|
|
2707
|
+
sample_embedded: sample.length,
|
|
2634
2708
|
duration_ms: duration,
|
|
2635
|
-
silhouette_score:
|
|
2636
|
-
clusters
|
|
2709
|
+
silhouette_score: silhouetteScore,
|
|
2710
|
+
clusters,
|
|
2637
2711
|
generated_at: now,
|
|
2638
2712
|
}), { status: 200, headers });
|
|
2639
2713
|
|
|
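The per-cluster aggregation in the handler repeats the same "most frequent value" expression for source, state, and intention, and the new result object fills `avg_intention_level` from `s.avgLtv`. The sketch below shows one way the aggregation could be factored, with a numeric intention average. It is an illustration under stated assumptions: `mostFrequent`, `summarizeCluster`, and `INTENTION_WEIGHT` are hypothetical names, and the weights are taken from the feature-normalization code removed in this diff (`comprador`/`high_intent` = 1, `interessado` = 0.6, `curioso` = 0.3, anything else 0).

```javascript
// Returns the most frequent value, or null for an empty list.
// On ties, the value seen first wins.
function mostFrequent(values) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) || 0) + 1);
  let best = null, bestCount = 0;
  for (const [value, count] of counts) {
    if (count > bestCount) { best = value; bestCount = count; }
  }
  return best;
}

// Weights assumed from the removed normalization code in this diff.
const INTENTION_WEIGHT = { comprador: 1, high_intent: 1, interessado: 0.6, curioso: 0.3 };

// Summarizes a non-empty list of lead rows belonging to one cluster.
function summarizeCluster(members) {
  const mean = (f) => members.reduce((s, l) => s + f(l), 0) / members.length;
  return {
    size: members.length,
    avgEngagement: mean(l => l.engagement_score || 0),
    avgIntention: mean(l => INTENTION_WEIGHT[l.intention_level] ?? 0),
    topSource: mostFrequent(members.map(l => l.utm_source).filter(Boolean)) ?? 'direto',
    topState: mostFrequent(members.map(l => l.state).filter(Boolean)) ?? 'BR',
  };
}
```

The fallbacks `'direto'` and `'BR'` match the ones the handler uses when a cluster has no source or state values.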
@@ -2794,14 +2868,6 @@ async function handleSegmentationUpdate(env, request, headers) {
|
|
|
2794
2868
|
// Pure heuristic (no AI): zero added latency on /track
|
|
2795
2869
|
// ─────────────────────────────────────────────────────────────────────────────
|
|
2796
2870
|
|
|
2797
|
-
// Disposable email domains
|
|
2798
|
-
const DISPOSABLE_EMAIL_DOMAINS = new Set([
|
|
2799
|
-
'mailinator.com','guerrillamail.com','tempmail.com','throwaway.email',
|
|
2800
|
-
'yopmail.com','sharklasers.com','guerrillamailblock.com','spam4.me',
|
|
2801
|
-
'10minutemail.com','trashmail.com','maildrop.cc','fakeinbox.com',
|
|
2802
|
-
'dispostable.com','mailnull.com','tempr.email','getnada.com',
|
|
2803
|
-
]);
|
|
2804
|
-
|
|
2805
2871
|
// Known datacenter ASNs (avoid false negatives on legitimate ASNs)
|
|
2806
2872
|
const DATACENTER_PATTERNS = /amazon|google|microsoft|digitalocean|linode|ovh|vultr|hetzner|contabo|cloudflare|packet|rackspace|leaseweb/i;
|
|
2807
2873
|
|
|
@@ -2854,15 +2920,7 @@ async function checkFraudGate(env, request, payload) {
|
|
|
2854
2920
|
result.score += 20; result.reasons.push('no_accept_language');
|
|
2855
2921
|
}
|
|
2856
2922
|
|
|
2857
|
-
// 6.
|
|
2858
|
-
if (email) {
|
|
2859
|
-
const domain = email.split('@')[1]?.toLowerCase();
|
|
2860
|
-
if (domain && DISPOSABLE_EMAIL_DOMAINS.has(domain)) {
|
|
2861
|
-
result.score += 25; result.reasons.push('disposable_email');
|
|
2862
|
-
}
|
|
2863
|
-
}
|
|
2864
|
-
|
|
2865
|
-
// 7. Velocity check via KV
|
|
2923
|
+
// 6. Velocity check via KV
|
|
2866
2924
|
if (env.GEO_CACHE && ip) {
|
|
2867
2925
|
const velKey1h = `fraud_velocity:${ip}:h`;
|
|
2868
2926
|
const velStr = await env.GEO_CACHE.get(velKey1h);
|
|
@@ -3839,7 +3897,7 @@ export default {
|
|
|
3839
3897
|
|
|
3840
3898
|
// Workers AI — ping
|
|
3841
3899
|
try {
|
|
3842
|
-
await env.AI.run('@cf/
|
|
3900
|
+
await env.AI.run('@cf/ibm-granite/granite-4.0-h-micro', {
|
|
3843
3901
|
messages: [{ role: 'user', content: 'ping' }],
|
|
3844
3902
|
max_tokens: 1,
|
|
3845
3903
|
});
|
|
@@ -25,10 +25,10 @@ zone_name = "lancamentosabc.com.br"
|
|
|
25
25
|
|
|
26
26
|
# ── Public variables (not secrets) ────────────────────────────────────────────
|
|
27
27
|
[vars]
|
|
28
|
-
META_PIXEL_ID = "
|
|
29
|
-
GA4_MEASUREMENT_ID = "G-
|
|
30
|
-
TIKTOK_PIXEL_ID = "
|
|
31
|
-
SITE_DOMAIN = "
|
|
28
|
+
META_PIXEL_ID = "1583939052660159"
|
|
29
|
+
GA4_MEASUREMENT_ID = "G-G7VEN1MNH1"
|
|
30
|
+
TIKTOK_PIXEL_ID = "D71D6T3C77U56RM5VF0G"
|
|
31
|
+
SITE_DOMAIN = "lancamentosabc.com.br"
|
|
32
32
|
|
|
33
33
|
# ── Banco D1 ──────────────────────────────────────────────────────────────────
|
|
34
34
|
# After creating the database with "wrangler d1 create cdp-edge-db",
|
|
@@ -95,6 +95,22 @@ namespace_id = "1001"
|
|
|
95
95
|
limit = 60
|
|
96
96
|
period = 60
|
|
97
97
|
|
|
98
|
+
# ── Observability: logs + traces persisted in the Cloudflare dashboard ────────
|
|
99
|
+
[observability]
|
|
100
|
+
enabled = false
|
|
101
|
+
head_sampling_rate = 1
|
|
102
|
+
|
|
103
|
+
[observability.logs]
|
|
104
|
+
enabled = true
|
|
105
|
+
head_sampling_rate = 1
|
|
106
|
+
persist = true
|
|
107
|
+
invocation_logs = true
|
|
108
|
+
|
|
109
|
+
[observability.traces]
|
|
110
|
+
enabled = false
|
|
111
|
+
persist = true
|
|
112
|
+
head_sampling_rate = 1
|
|
113
|
+
|
|
98
114
|
# ── Secrets (NOT stored here; configure via CLI) ──────────────────────────────
|
|
99
115
|
# wrangler secret put META_ACCESS_TOKEN ← Meta CAPI token (required)
|
|
100
116
|
# wrangler secret put GA4_API_SECRET ← GA4 Measurement Protocol secret (required)
|
|
@@ -107,6 +123,7 @@ period = 60
|
|
|
107
123
|
# wrangler secret put RESEND_API_KEY ← Resend API key (resend.com)
|
|
108
124
|
# wrangler secret put RESEND_FROM_EMAIL ← Verified sender, e.g. "CDP Edge <noreply@seudominio.com.br>"
|
|
109
125
|
# wrangler secret put WA_WEBHOOK_VERIFY_TOKEN ← WhatsApp webhook verification token (you choose it; any secure string)
|
|
126
|
+
# wrangler secret put WEBHOOK_SECRET_TICTO ← HMAC-SHA256 Ticto
|
|
110
127
|
# wrangler secret put PINTEREST_ACCESS_TOKEN ← Bearer token Pinterest Conversions API
|
|
111
128
|
# wrangler secret put PINTEREST_AD_ACCOUNT_ID ← Pinterest ad account ID (e.g. 549755813XXX)
|
|
112
129
|
# wrangler secret put REDDIT_ACCESS_TOKEN ← Bearer token Reddit Conversions API
|