cdp-edge 1.13.0 → 1.14.0

@@ -0,0 +1,738 @@
1
+ # ML Clustering Agent — CDP Edge Quantum Tier
2
+
3
+ ## Identity
+
+ **Agent:** ML Clustering Agent
+ **Role:** Specialist in dynamic segmentation via machine learning
+ **Level:** God (Quantum Tier), AI clustering specialist
8
+
9
+ ---
10
+
11
+ ## Why this matters
+
+ | Current Situation | With Dynamic ML Segmentation |
+ |---|---|
+ | Static segments (manual tags) | Automatic groups that evolve with new data |
+ | Generic campaigns for everyone | +40% campaign relevance per segment |
+ | Manual behavior insights | ML clusters surface hidden patterns |
+ | Generic CTR per vertical | Optimization specific to each behavior cluster |
19
+
20
+ ---
21
+
22
+ ## What this agent configures
+
+ ```
+ Dynamic ML Segmentation
+ ├── K-means Clustering (Workers AI)
+ │   ├── Feature engineering: LTV, behavior, geo, time
+ │   ├── Configurable number of clusters (3-10)
+ │   ├── Auto-labeling of segments (ML interpretation)
+ │   └── Persistence in D1 (ml_segments table)
+ │
+ ├── DBSCAN Clustering (anomalies/fraud)
+ │   ├── Detection of anomalous leads (outliers)
+ │   ├── Configurable epsilon and MinPts
+ │   ├── Automatic "suspicious" flagging
+ │   └── Integration with the Security Agent
+ │
+ ├── Hierarchical Clustering (segment levels)
+ │   ├── Lead dendrogram (hierarchical similarity)
+ │   ├── Automatic selection of k (silhouette score)
+ │   ├── Granularity levels (macro → micro)
+ │   └── Drill-down navigation through clusters
+ │
+ ├── Feature Engineering Pipeline
+ │   ├── Normalization (min-max, z-score)
+ │   ├── Encoding (one-hot for categorical features)
+ │   ├── Feature selection (importance)
+ │   └── Time-based features (days since lead, hour of day)
+ │
+ └── Segment Auto-Interpretation
+     ├── Descriptive name generation (e.g. "Alto Valor + Alto Engajamento")
+     ├── Dominant characteristics (top features per cluster)
+     ├── Metric distributions (avg LTV, std behavior)
+     └── Action recommendations per segment
+ ```
56
+
57
+ ---
58
+
59
+ ## Prerequisites
+
+ - **Workers AI**: `env.AI` binding enabled in wrangler.toml
+ - **D1 Database**: `leads` table with historical data (last 6 months)
+ - **Server Architect**: integrate the endpoints under the `/api/segmentation/*` route
+ - **Feature Engineering**: pipeline ready for normalization and encoding
65
+
66
+ ---
67
+
68
+ ## Phase 1: Feature Engineering Pipeline
+
+ ### 1.1 Feature Extraction from D1
+
+ Query the historical leads and extract the numeric and categorical features:
+
+ ```sql
+ -- Numeric features
+ SELECT
+   ltv_class        AS ltv_numeric,        -- 0=Low, 1=Medium, 2=High
+   behavior_score   AS behavior_numeric,   -- 0-100
+   engagement_score AS engagement_numeric, -- 0-100
+   intention_level  AS intention_numeric,  -- 0-100
+   bot_score        AS bot_numeric,        -- 0-100 (inverted: 100 = human)
+   value            AS purchase_value,     -- purchase value (NULL for leads)
+   currency_value   AS currency_numeric,   -- 1.0 for BRL
+   created_at                              -- source of the time-based features (section 1.3)
+ FROM leads
+ WHERE created_at >= datetime('now', '-6 months');
+
+ -- Categorical features (for one-hot encoding)
+ SELECT DISTINCT
+   country,       -- BR, US, AR
+   state,         -- SP, RJ, MG
+   geo_timezone,  -- America/Sao_Paulo, America/New_York
+   utm_source,    -- facebook, google, tiktok
+   utm_medium     -- cpc, organic, social
+ FROM leads
+ WHERE created_at >= datetime('now', '-6 months');
+ ```
99
+
100
+ ### 1.2 Feature Normalization
+
+ ```python
+ # Normalization example (implemented in Workers AI)
+ # The numbers below are illustrative values only.
+ value, min_value, max_value = 75.0, 0.0, 100.0
+ mean, std = 50.0, 20.0
+
+ # Min-max normalization (0-1 range)
+ min_max_value = (value - min_value) / (max_value - min_value)
+
+ # Z-score normalization (mean=0, std=1)
+ z_score_value = (value - mean) / std
+
+ # One-hot encoding for categorical features (one slot per known country)
+ country_BR = [1, 0, 0, 0, 0]
+ country_US = [0, 1, 0, 0, 0]
+ country_AR = [0, 0, 1, 0, 0]
115
+ ```
116
+
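The formulas above operate on a single value; in practice each feature column is normalized as a whole. A minimal sketch of column-wise helpers, assuming features arrive as plain Python lists. The function names `min_max`, `z_score`, and `one_hot` are hypothetical and not part of cdp-edge. Constant columns are mapped to zeros to avoid dividing by zero, and categories not seen before encode as all zeros.

```python
from statistics import mean, pstdev


def min_max(values):
    """Scale a column to the 0-1 range; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center a column on its mean and divide by the population std deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def one_hot(value, categories):
    """Return a 0/1 vector with a 1 at the position of `value` (all zeros if unknown)."""
    return [1 if value == category else 0 for category in categories]


behavior_scores = [20, 50, 80]
print(min_max(behavior_scores))
print(z_score(behavior_scores))
print(one_hot("US", ["BR", "US", "AR"]))
```

Min-max keeps every feature on the same 0-1 scale the clustering prompts expect, while z-score is less sensitive to a single extreme value stretching the range.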
117
+ ### 1.3 Time-Based Features
118
+
119
+ ```python
120
+ # Time-based features for clustering
+ from datetime import datetime
+
+ now = datetime(2026, 4, 9, 12, 0)          # illustrative values
+ created_at = datetime(2026, 4, 4, 10, 30)  # a Saturday
+
+ days_since_lead = (now - created_at).days
+ hour_of_day = created_at.hour
+ day_of_week = created_at.weekday()  # 0=Monday, 6=Sunday
+ is_weekend = 1 if day_of_week in [5, 6] else 0
+ is_business_hours = 1 if 9 <= hour_of_day <= 18 else 0
127
+ ```
128
+
129
+ ---
130
+
131
+ ## Phase 2: K-Means Clustering (Workers AI)
+
+ ### 2.1 Prompt for Workers AI
+
+ ```python
+ # Send to: env.AI.run('@cf/meta/llama-3.1-8b-instruct', ...)
137
+
138
+ PROMPT_CLUSTERING = f"""
139
+ You are a Machine Learning expert specializing in customer segmentation.
140
+
141
+ You will receive {n_leads} customers with {features} each.
142
+ Your task: Perform K-means clustering to group customers into {n_clusters} segments.
143
+
144
+ INPUTS:
145
+ - leads: JSON array of customer objects
146
+ - features: list of feature names (ltv, behavior_score, engagement_score, etc.)
147
+ - n_clusters: number of segments to create (3-10)
148
+
149
+ TASK:
150
+ 1. Normalize all features to 0-1 range (min-max normalization)
151
+ 2. Initialize K-means centroids randomly
152
+ 3. Assign each lead to nearest centroid (Euclidean distance)
153
+ 4. Recalculate centroids as mean of assigned points
154
+ 5. Iterate until convergence (max 100 iterations)
155
+ 6. Calculate Silhouette Score for each cluster (cohesion vs separation)
156
+
157
+ OUTPUT (JSON only):
158
+ {{
159
+ "clusters": [
160
+ {{
161
+ "cluster_id": 0,
162
+ "name": "Segmento 0 - [AUTO-GENERATED DESCRIPTIVE NAME]",
163
+ "size": 123,
164
+ "percentage": 0.25,
165
+ "characteristics": {{
166
+ "avg_ltv": 497.50,
167
+ "avg_behavior_score": 75.3,
168
+ "avg_engagement_score": 82.1,
169
+ "dominant_countries": ["BR", "AR"],
170
+ "dominant_states": ["SP", "RJ"],
171
+ "dominant_utm_source": ["facebook", "google"],
172
+ "top_features": ["ltv", "behavior_score", "engagement_score"]
173
+ }},
174
+ "centroid": {{
175
+ "ltv": 0.75,
176
+ "behavior_score": 0.80,
177
+ "engagement_score": 0.85
178
+ }},
179
+ "sample_leads": ["lead_id_1", "lead_id_2", "lead_id_3"]
180
+ }},
181
+ ...
182
+ ],
183
+ "silhouette_scores": {{
184
+ "overall": 0.62,
185
+ "by_cluster": [0.71, 0.58, 0.65, ...]
186
+ }},
187
+ "convergence": {{
188
+ "iterations": 47,
189
+ "final_inertia": 1523.45
190
+ }}
191
+ }}
192
+
193
+ IMPORTANT:
194
+ - Generate descriptive names for segments based on cluster characteristics
195
+ - Example: "Segmento 0 - Alto Valor + Alto Engajamento (SP)"
196
+ - Example: "Segmento 1 - Lead Quente + Alta Intenção (RJ)"
197
+ - Return ONLY valid JSON, no explanations
198
+ """
199
+ ```
200
+
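A language model is not guaranteed to carry out the iterations in the prompt exactly, so the centroids and assignments it returns may differ from a true K-means run. The sketch below is a plain-Python version of Lloyd's algorithm that could be used offline to cross-check the model's output. It assumes the features are already scaled to 0-1. The helper names `kmeans` and `farthest_point_init` are hypothetical, and deterministic farthest-point seeding is swapped in for the random initialization named in the prompt so that results are reproducible.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly take
    the point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

Comparing these labels with the `clusters` array returned by the model gives a quick signal of whether the model's segmentation can be trusted for a given batch.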
201
+ ### 2.2 Features for K-Means
+
+ ```javascript
+ // Recommended clustering features (based on the D1 analysis)
+
+ const RECOMMENDED_FEATURES = [
+   // Financial
+   'ltv_class',          // Low/Medium/High (0-1-2)
+   'purchase_value',     // purchase value (null for leads)
+
+   // Behavioral
+   'behavior_score',     // 0-100 (engagement)
+   'engagement_score',   // 0-100 (interactions)
+   'intention_level',    // 0-100 (purchase probability)
+
+   // Temporal
+   'days_since_lead',    // recency (0-180 days)
+   'hour_of_day',        // 0-23
+   'is_weekend',         // 0/1
+   'is_business_hours',  // 0/1
+
+   // Geographic
+   'country',            // one-hot encoding
+   'state',              // one-hot encoding
+
+   // Traffic source
+   'utm_source',         // one-hot encoding
+   'utm_medium'          // one-hot encoding
+ ];
+
+ const DEFAULT_N_CLUSTERS = 5; // 5 segments (configurable)
+ ```
233
+
234
+ ---
235
+
236
+ ## Phase 3: DBSCAN Clustering (Anomaly Detection)
+
+ ### 3.1 Prompt for Workers AI
239
+
240
+ ```python
241
+ PROMPT_DBSCAN = f"""
242
+ You are a Machine Learning expert specializing in anomaly detection.
243
+
244
+ You will receive {n_leads} customers with {features}.
245
+ Your task: Perform DBSCAN clustering to detect outliers and fraudulent patterns.
246
+
247
+ INPUTS:
248
+ - leads: JSON array of customer objects
249
+ - features: list of feature names
250
+ - epsilon: distance threshold (default: 0.3)
251
+ - min_samples: minimum points to form cluster (default: 5)
252
+
253
+ TASK:
254
+ 1. For each lead, find all neighbors within epsilon (Euclidean distance on the normalized features)
255
+ 2. Mark as "core point" if >= min_samples neighbors within epsilon
256
+ 3. Mark as "border point" if < min_samples neighbors but reachable from core
257
+ 4. Mark as "outlier" if not reachable from any core point
258
+ 5. Identify clusters and noise (outliers)
259
+
260
+ OUTPUT (JSON only):
261
+ {{
262
+ "total_leads": 500,
263
+ "n_core_points": 450,
264
+ "n_border_points": 30,
265
+ "n_outliers": 20,
266
+ "outliers": [
267
+ {{
268
+ "lead_id": "lead_123",
269
+ "reason": "behavior_score too high (> 95)",
270
+ "risk_score": 0.92,
271
+ "features": {{
272
+ "behavior_score": 98,
273
+ "days_since_lead": 0,
274
+ "unusual_utm_pattern": true
275
+ }}
276
+ }},
277
+ ...
278
+ ],
279
+ "clusters": [
280
+ {{
281
+ "cluster_id": 0,
282
+ "size": 235,
283
+ "density": "high"
284
+ }},
285
+ ...
286
+ ]
287
+ }}
288
+
289
+ OUTLIER PATTERNS TO DETECT:
290
+ - behavior_score > 95 (bot-like behavior)
291
+ - days_since_lead = 0 AND behavior_score > 80 (instant lead, suspicious)
292
+ - unusual utm_source combination (e.g., unknown_source + high_value)
293
+ - geographic mismatch (high_value + unusual location)
294
+
295
+ Return ONLY valid JSON, no explanations.
296
+ """
297
+ ```
298
+
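The core, border, and outlier rules in the prompt can be checked against a direct implementation. The following is a minimal plain-Python DBSCAN sketch under stated assumptions: points are normalized feature vectors, distance is Euclidean, and a point counts itself among its neighbors. The names `dbscan`, `region_query`, and `NOISE` are hypothetical, and the brute-force neighbor search is only practical for small batches.

```python
import math

NOISE = -1  # label used for outliers


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps=0.3, min_samples=5):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may still become a border point later
            continue
        # points[i] is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.9, 0.9)]
labels = dbscan(points, eps=0.3, min_samples=3)
print(labels)
```

Leads labeled `NOISE` correspond to the `outliers` array in the prompt's output. The `reason` and `risk_score` fields are interpretations added by the model and have no counterpart in the algorithm itself.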
299
+ ---
300
+
301
+ ## Phase 4: Hierarchical Clustering (Drill-Down)
+
+ ### 4.1 Prompt for Workers AI
304
+
305
+ ```python
306
+ PROMPT_HIERARCHICAL = f"""
307
+ You are a Machine Learning expert specializing in hierarchical clustering.
308
+
309
+ You will receive {n_leads} customers.
310
+ Your task: Build hierarchical clustering tree from macro to micro segments.
311
+
312
+ INPUTS:
313
+ - leads: JSON array of customer objects
314
+ - features: list of feature names
315
+ - max_depth: maximum tree depth (default: 3)
+ - min_points: minimum cluster size required before a cluster is split further
316
+
317
+ TASK:
318
+ 1. Build binary hierarchical tree (top-down divisive clustering)
319
+ 2. At each level, split cluster into 2 sub-clusters using K-means
320
+ 3. Stop when max_depth reached or cluster size < min_points
321
+ 4. Calculate Silhouette Score at each split
322
+ 5. Prune branches with poor separation (silhouette < 0.3)
323
+
324
+ OUTPUT (JSON only):
325
+ {{
326
+ "tree": {{
327
+ "level_0": {{
328
+ "name": "Todos os Leads",
329
+ "size": 500,
330
+ "children": [
331
+ {{
332
+ "name": "Macro Segmento A (Alto Valor)",
333
+ "size": 180,
334
+ "children": [
335
+ {{
336
+ "name": "Micro Segmento A1 (SP - Alto Valor + Alto Engajamento)",
337
+ "size": 95
338
+ }},
339
+ {{
340
+ "name": "Micro Segmento A2 (RJ - Alto Valor + Médio Engajamento)",
341
+ "size": 85
342
+ }}
343
+ ]
344
+ }},
345
+ {{
346
+ "name": "Macro Segmento B (Leads Quentes)",
347
+ "size": 150,
348
+ "children": [...]
349
+ }}
350
+ ]
351
+ }}
352
+ }},
353
+ "statistics": {{
354
+ "n_levels": 3,
355
+ "n_leaf_clusters": 6,
356
+ "avg_leaf_size": 83.3,
357
+ "best_depth_for_lead": "level_2"
358
+ }}
359
+ }}
360
+
361
+ Return ONLY valid JSON, no explanations.
362
+ """
363
+ ```
364
+
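The pruning rule in step 5 depends on the silhouette score, which compares how close each lead is to its own cluster (a) with how close it is to the nearest other cluster (b), as s = (b - a) / max(a, b). A minimal sketch of the computation, assuming Euclidean distance over normalized features. The function name `silhouette_score` is hypothetical here and unrelated to any library function of the same name.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points; requires at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the closest other cluster
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


points = [(0.0,), (0.1,), (1.0,), (1.1,)]
good_split = silhouette_score(points, [0, 0, 1, 1])
bad_split = silhouette_score(points, [0, 1, 0, 1])
print(round(good_split, 3), round(bad_split, 3))
```

With the 0.3 threshold from the prompt, the first split would be kept and the second pruned.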
365
+ ---
366
+
367
+ ## Phase 5: Segment Auto-Interpretation
+
+ ### 5.1 Descriptive Name Generation
+
+ ```python
+ # Prompt for auto-labeling segments
373
+
374
+ PROMPT_INTERPRETATION = f"""
375
+ You are a marketing intelligence expert.
376
+
377
+ You will receive cluster centroids and characteristics.
378
+ Your task: Generate descriptive, actionable names for each segment.
379
+
380
+ INPUT: Cluster characteristics (avg values per feature)
381
+
382
+ OUTPUT: Descriptive segment name following this pattern:
383
+ "Segmento [N] - [VALUE_TYPE] + [BEHAVIOR_TYPE] ([GEO_TYPE])"
384
+
385
+ VALUE TYPES: "Alto Valor", "Médio Valor", "Baixo Valor", "Lead Quente"
386
+ BEHAVIOR TYPES: "Alto Engajamento", "Médio Engajamento", "Baixo Engajamento", "Alta Intenção"
387
+ GEO TYPES: "[UF]", "Sudeste", "Norte", "Internacional"
388
+
389
+ EXAMPLES:
390
+ - ltv=0.9, behavior=0.85, geo=SP → "Segmento 0 - Alto Valor + Alto Engajamento (SP)"
391
+ - ltv=0.7, behavior=0.6, geo=RJ → "Segmento 1 - Médio Valor + Médio Engajamento (RJ)"
392
+ - ltv=0.2, behavior=0.3, days=0, utm=tiktok → "Segmento 2 - Lead Quente + Baixo Engajamento (TikTok)"
393
+
394
+ Return ONLY valid JSON with segment names, no explanations.
395
+ """
396
+ ```
397
+
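When the model returns an unusable name, a rule-based fallback can produce labels in the same shape as the examples in the prompt. This is a sketch only: the 0.8 and 0.4 thresholds are assumptions chosen to reproduce the first two examples, the names `level` and `segment_name` are hypothetical, and the special values "Lead Quente" and "Alta Intenção" are not covered.

```python
def level(value, high=0.8, low=0.4):
    """Map a normalized 0-1 value to Alto / Médio / Baixo (thresholds assumed)."""
    if value >= high:
        return "Alto"
    if value >= low:
        return "Médio"
    return "Baixo"


def segment_name(cluster_id, ltv, behavior, geo):
    """Build a label such as 'Segmento 0 - Alto Valor + Alto Engajamento (SP)'."""
    return (
        f"Segmento {cluster_id} - {level(ltv)} Valor + "
        f"{level(behavior)} Engajamento ({geo})"
    )


print(segment_name(0, ltv=0.9, behavior=0.85, geo="SP"))
print(segment_name(1, ltv=0.7, behavior=0.6, geo="RJ"))
```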
398
+ ---
399
+
400
+ ## Phase 6: D1 Integration (Persistence)
+
+ ### 6.1 Schema of the ml_segments Table
+
+ ```sql
+ CREATE TABLE IF NOT EXISTS ml_segments (
+   id INTEGER PRIMARY KEY AUTOINCREMENT,
+   cluster_id INTEGER NOT NULL,
+   cluster_name TEXT NOT NULL,
+   clustering_algorithm TEXT NOT NULL,  -- 'kmeans', 'dbscan', 'hierarchical'
+   created_at TEXT DEFAULT (datetime('now')),
+   updated_at TEXT DEFAULT (datetime('now')),
+
+   -- Cluster statistics
+   size INTEGER NOT NULL,
+   percentage REAL NOT NULL,
+
+   -- Average characteristics
+   avg_ltv REAL,
+   avg_behavior_score REAL,
+   avg_engagement_score REAL,
+   avg_intention_level REAL,
+
+   -- Dominant characteristics
+   dominant_countries TEXT,      -- JSON array: ["BR", "US"]
+   dominant_states TEXT,         -- JSON array: ["SP", "RJ"]
+   dominant_utm_sources TEXT,    -- JSON array: ["facebook", "google"]
+   dominant_features TEXT,       -- JSON array: ["ltv", "behavior_score"]
+
+   -- Quality metrics
+   silhouette_score REAL,
+   cohesion REAL,
+   separation REAL,
+
+   -- Recommendations
+   action_recommendations TEXT,   -- JSON array
+   bid_recommendations TEXT,      -- JSON array
+   campaign_recommendations TEXT  -- JSON array
+ );
+
+ -- Indexes for performance
+ CREATE INDEX IF NOT EXISTS idx_ml_segments_created ON ml_segments(created_at);
+ CREATE INDEX IF NOT EXISTS idx_ml_segments_cluster ON ml_segments(cluster_id);
+ CREATE INDEX IF NOT EXISTS idx_ml_segments_algorithm ON ml_segments(clustering_algorithm);
+
+ -- Lead ↔ segment association table
+ CREATE TABLE IF NOT EXISTS ml_segment_members (
+   lead_id TEXT NOT NULL,
+   cluster_id INTEGER NOT NULL,
+   clustering_algorithm TEXT NOT NULL,  -- required by the primary key and the outliers query
+   confidence REAL NOT NULL,            -- 0-1 (how close the lead is to the centroid)
+   risk_score REAL,                     -- read by the outliers endpoint (DBSCAN only)
+   reason TEXT,                         -- read by the outliers endpoint (DBSCAN only)
+   updated_at TEXT DEFAULT (datetime('now')),
+   PRIMARY KEY (lead_id, cluster_id, clustering_algorithm)
+ );
+
+ CREATE INDEX IF NOT EXISTS idx_ml_segment_members_cluster ON ml_segment_members(cluster_id);
+ CREATE INDEX IF NOT EXISTS idx_ml_segment_members_lead ON ml_segment_members(lead_id);
+ ```
457
+
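Because re-clustering runs repeatedly, membership rows need to be overwritten rather than duplicated. The sketch below exercises that behavior locally with Python's built-in `sqlite3` module, since D1 is SQLite-based. It uses a reduced `ml_segment_members` table that assumes a `clustering_algorithm` column exists, because the composite primary key references it. The helper `save_member` is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ml_segment_members (
        lead_id TEXT NOT NULL,
        cluster_id INTEGER NOT NULL,
        clustering_algorithm TEXT NOT NULL,  -- assumed column, needed by the key
        confidence REAL NOT NULL,
        updated_at TEXT DEFAULT (datetime('now')),
        PRIMARY KEY (lead_id, cluster_id, clustering_algorithm)
    )
    """
)


def save_member(conn, lead_id, cluster_id, algorithm, confidence):
    """Insert a membership row, replacing any row with the same primary key."""
    conn.execute(
        "INSERT OR REPLACE INTO ml_segment_members "
        "(lead_id, cluster_id, clustering_algorithm, confidence) "
        "VALUES (?, ?, ?, ?)",
        (lead_id, cluster_id, algorithm, confidence),
    )


# The second call simulates a later re-clustering run for the same lead.
save_member(conn, "lead_123", 0, "kmeans", 0.61)
save_member(conn, "lead_123", 0, "kmeans", 0.87)

rows = conn.execute(
    "SELECT lead_id, cluster_id, confidence FROM ml_segment_members"
).fetchall()
print(rows)
```

Keeping `clustering_algorithm` in the key lets the same lead belong to one K-means segment and be flagged by DBSCAN at the same time.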
458
+ ---
459
+
460
+ ## Phase 7: REST API Exposure
+
+ ### 7.1 Clustering Endpoint
+
+ ```typescript
+ // server-edge-tracker/functions/api/segmentation/cluster.ts
+
+ export const onRequestPost: PagesFunction<Env> = async (context) => {
+   // Parameters arrive as a JSON body (POST, as in the API contract)
+   const body = (await context.request.json().catch(() => ({}))) as {
+     algorithm?: string;
+     n_clusters?: number;
+     client_vertical?: string;
+   };
+   const algorithm = body.algorithm ?? 'kmeans';
+   const requested = Number(body.n_clusters ?? 5) || 5;
+   const nClusters = Math.min(10, Math.max(3, Math.trunc(requested))); // documented range: 3-10
+   const clientVertical = body.client_vertical ?? 'general';
+
+   // Fetch historical leads from D1; all() resolves to { results, ... }
+   const { results: leads } = await context.env.DB.prepare(`
+     SELECT id, ltv_class, behavior_score, engagement_score, intention_level,
+            days_since_lead, hour_of_day, is_weekend, is_business_hours,
+            country, state, utm_source, utm_medium
+     FROM leads
+     WHERE created_at >= datetime('now', '-6 months')
+     ORDER BY created_at DESC
+   `).all();
+
+   // Feature engineering
+   const features = extractFeatures(leads);
+
+   // Clustering via Workers AI
+   const aiResult = await context.env.AI.run(
+     '@cf/meta/llama-3.1-8b-instruct',
+     { messages: [{ role: 'user', content: getClusteringPrompt(features, nClusters) }] }
+   );
+
+   // The model returns text: parse it once, before persisting or responding
+   let clusters;
+   try {
+     clusters = JSON.parse(aiResult.response);
+   } catch {
+     return Response.json({ success: false, error: 'Model returned invalid JSON' }, { status: 502 });
+   }
+
+   // Persist to D1
+   await saveClusters(context.env.DB, clusters, algorithm);
+
+   return Response.json({
+     success: true,
+     algorithm,
+     n_clusters: nClusters,
+     clusters,
+     generated_at: new Date().toISOString()
+   });
+ };
504
+ ```
505
+
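The endpoint relies on the model honoring the "JSON only" instruction, which is not guaranteed. A minimal sketch of a more tolerant parsing and validation step follows, written in Python, the language of this document's pseudocode blocks; the same checks would need to be ported to the Worker. The function names `parse_model_json` and `validate_clusters` are hypothetical, and the set of required fields is an assumption based on the output format in section 2.1.

```python
import json


def parse_model_json(text):
    """Decode the first JSON object found in a model response.

    Models sometimes wrap the JSON in prose despite the instructions,
    so decoding starts at the first "{" and ignores anything after
    the end of that object.
    """
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model response")
    payload, _end = json.JSONDecoder().raw_decode(text, start)
    return payload


def validate_clusters(payload, expected_clusters):
    """Check the fields the persistence step relies on; raise ValueError otherwise."""
    clusters = payload.get("clusters")
    if not isinstance(clusters, list) or len(clusters) != expected_clusters:
        raise ValueError("unexpected number of clusters")
    for cluster in clusters:
        for field in ("cluster_id", "name", "size"):
            if field not in cluster:
                raise ValueError(f"cluster is missing '{field}'")
    return clusters


raw = (
    "Sure, here is the result:\n"
    '{"clusters": [{"cluster_id": 0, "name": "Segmento 0", "size": 3}]}\n'
    "Let me know if you need anything else."
)
clusters = validate_clusters(parse_model_json(raw), expected_clusters=1)
print(clusters)
```

Rejecting a malformed response before it reaches `saveClusters` keeps partial or invented segments out of D1.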
506
+ ### 7.2 Segment Listing Endpoint
+
+ ```typescript
+ // server-edge-tracker/functions/api/segmentation/list.ts
+
+ export const onRequestGet: PagesFunction<Env> = async (context) => {
+   // D1's all() resolves to { results, success, meta }; the rows are in `results`
+   const { results } = await context.env.DB.prepare(`
+     SELECT id, cluster_id, cluster_name, clustering_algorithm, size, percentage,
+            avg_ltv, avg_behavior_score, avg_engagement_score,
+            dominant_countries, dominant_states, dominant_utm_sources,
+            silhouette_score, action_recommendations,
+            bid_recommendations, campaign_recommendations
+     FROM ml_segments
+     ORDER BY created_at DESC
+     LIMIT 10
+   `).all<Record<string, any>>();
+
+   return Response.json({
+     success: true,
+     // JSON array columns are stored as TEXT, so decode them for the client
+     clusters: results.map(c => ({
+       ...c,
+       dominant_countries: JSON.parse(c.dominant_countries || '[]'),
+       dominant_states: JSON.parse(c.dominant_states || '[]'),
+       dominant_utm_sources: JSON.parse(c.dominant_utm_sources || '[]'),
+       action_recommendations: JSON.parse(c.action_recommendations || '[]'),
+       bid_recommendations: JSON.parse(c.bid_recommendations || '[]'),
+       campaign_recommendations: JSON.parse(c.campaign_recommendations || '[]')
+     }))
+   });
+ };
535
+ ```
536
+
537
+ ### 7.3 Anomalies Endpoint (DBSCAN)
+
+ ```typescript
+ // server-edge-tracker/functions/api/segmentation/outliers.ts
+
+ export const onRequestGet: PagesFunction<Env> = async (context) => {
+   // Optional query parameters from the API contract
+   const { searchParams } = new URL(context.request.url);
+   const limit = parseInt(searchParams.get('limit') || '50', 10) || 50;
+   const threshold = parseFloat(searchParams.get('confidence_threshold') || '0.5') || 0.5;
+
+   const { results: outliers } = await context.env.DB.prepare(`
+     SELECT l.id, l.email, l.behavior_score, l.days_since_lead,
+            l.country, l.state, l.utm_source, l.utm_medium,
+            sm.risk_score, sm.reason
+     FROM ml_segment_members sm
+     INNER JOIN leads l ON sm.lead_id = l.id
+     WHERE sm.clustering_algorithm = 'dbscan'
+       AND sm.confidence < ?  -- low confidence = anomaly
+     ORDER BY sm.updated_at DESC
+     LIMIT ?
+   `).bind(threshold, limit).all();
+
+   return Response.json({
+     success: true,
+     outliers,
+     total: outliers.length,
+     generated_at: new Date().toISOString()
+   });
+ };
562
+ ```
563
+
564
+ ---
565
+
566
+ ## Inputs Received from the Orchestrator
+
+ ### Configuration Parameters
+
+ ```jsonc
+ {
+   "client_vertical": "curso-online",
+   "features_to_use": ["ltv", "behavior_score", "engagement_score", "geo", "time"],
+   "clustering_algorithms": ["kmeans", "dbscan", "hierarchical"],
+   "default_n_clusters": 5,
+   "update_frequency": "weekly",  // automatic re-clustering
+   "min_data_points": 100,        // minimum number of leads for clustering
+   "max_data_age_months": 6       // use only the last 6 months
+ }
+ ```
581
+
582
+ ---
583
+
584
+ ## Outputs for the Server Architect
+
+ ### Files Created
+
+ ```json
+ {
+   "endpoints": [
+     "server-edge-tracker/functions/api/segmentation/cluster.ts",
+     "server-edge-tracker/functions/api/segmentation/list.ts",
+     "server-edge-tracker/functions/api/segmentation/outliers.ts"
+   ],
+   "database": [
+     "server-edge-tracker/schema-segmentation.sql"
+   ],
+   "documentation": [
+     "server-edge-tracker/SEGMENTATION-DOCS.md"
+   ],
+   "integration_points": [
+     "LTV Predictor Agent: enrich predictions with cluster_id",
+     "Dashboard Agent: visualize segments in charts",
+     "Attribution Agent: segment-level attribution"
+   ]
+ }
+ ```
608
+
609
+ ### API Contracts
610
+
611
+ ```typescript
612
+ interface SegmentationAPI {
+   // Create new clusters
+   'POST /api/segmentation/cluster': {
+     algorithm: 'kmeans' | 'dbscan' | 'hierarchical';
+     n_clusters?: number;
+     features?: string[];
+     client_vertical?: string;
+   };
+
+   // List existing clusters
+   'GET /api/segmentation/list': {
+     limit?: number;
+     algorithm?: string;
+   };
+
+   // Query outliers/anomalies
+   'GET /api/segmentation/outliers': {
+     limit?: number;
+     confidence_threshold?: number; // < 0.5 = high risk
+   };
+
+   // Update segments
+   'PUT /api/segmentation/update': {
+     cluster_id: number;
+     action_recommendations: string[];
+     bid_recommendations: string[];
+   };
+ }
640
+ ```
641
+
642
+ ---
643
+
644
+ ## Integration with other agents
+
+ | When | Agent |
+ |---|---|
+ | After clusters are generated | → **Dashboard Agent** (chart visualization) |
+ | Segments created | → **LTV Predictor Agent** (enrich predictions with cluster_id) |
+ | Outliers detected | → **Security Enterprise Agent** (automatic blocking) |
+ | Recommendations generated | → **Attribution Agent** (per-segment attribution) |
+ | Clusters updated | → **Meta Agent** (segmented campaigns) |
+ | Weekly re-clustering | → **Intelligence Scheduling** (automatic cron) |
654
+
655
+ ---
656
+
657
+ ## Completion Checklist
+
+ ```
+ [ ] Feature Engineering Pipeline implemented
+ [ ] K-means Clustering via Workers AI
+ [ ] DBSCAN Clustering for anomalies
+ [ ] Hierarchical Clustering (drill-down)
+ [ ] Segment Auto-Interpretation
+ [ ] D1 schema created (ml_segments + ml_segment_members)
+ [ ] REST API exposed (/api/segmentation/*)
+ [ ] Integration with the LTV Predictor Agent
+ [ ] Integration with the Dashboard Agent
+ [ ] Integration with the Security Enterprise Agent
+ [ ] Complete documentation written
671
+ ```
672
+
673
+ ---
674
+
675
+ ## Troubleshooting
676
+
677
+ | Problem | Cause | Solution |
+ |---|---|---|
+ | `Empty clusters` | Fewer than `min_data_points` leads in D1 | Increase `max_data_age_months` or wait for more data |
+ | `Silhouette Score < 0.3` | Clusters are not well separated | Adjust `n_clusters` (try higher values first) or use more discriminative features |
+ | `Excessive outliers` | DBSCAN epsilon/MinPts too aggressive (epsilon too small or MinPts too high) | Tune the anomaly-detection parameters |
+ | `Workers AI timeout` | Prompt too long or too much data | Split into batches of 100-200 leads per request |
683
+
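The batching advice in the last row can be sketched as a small generator. The name `batches` and the default size of 150 (the midpoint of the suggested 100-200 range) are assumptions. Note that clusters produced from separate batches are not directly comparable, so a merge or re-labeling step would still be needed afterwards.

```python
def batches(items, size=150):
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


lead_ids = [f"lead_{i}" for i in range(350)]
batch_sizes = [len(batch) for batch in batches(lead_ids)]
print(batch_sizes)
```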
684
+ ---
685
+
686
+ ## Usage Examples
+
+ ### Case 1: Basic Segmentation
689
+
690
+ ```bash
691
+ curl -X POST "https://seudominio.com/api/segmentation/cluster" \
692
+ -H "Content-Type: application/json" \
693
+ -d '{
694
+ "algorithm": "kmeans",
695
+ "n_clusters": 5,
696
+ "client_vertical": "curso-online"
697
+ }'
698
+
699
+ # Returns:
700
+ {
701
+ "clusters": [
702
+ {
703
+ "cluster_id": 0,
704
+ "name": "Segmento 0 - Alto Valor + Alto Engajamento (SP)",
705
+ "size": 95,
706
+ "avg_ltv": 497.50,
707
+ "action_recommendations": [
708
+ "Priorizar remarketing em 24h",
709
+ "Criar lookalike audience de alto valor"
710
+ ]
711
+ },
712
+ ...
713
+ ]
714
+ }
715
+ ```
716
+
717
+ ### Case 2: Anomaly Detection
718
+
719
+ ```bash
720
+ curl "https://seudominio.com/api/segmentation/outliers?limit=20"
721
+
722
+ # Returns:
723
+ {
724
+ "outliers": [
725
+ {
726
+ "lead_id": "lead_123",
727
+ "risk_score": 0.92,
728
+ "reason": "behavior_score too high (> 95), suspicious bot activity"
729
+ }
730
+ ]
731
+ }
732
+ ```
733
+
734
+ ---
735
+
736
+ *ML Clustering Agent v1.0: Dynamic ML Segmentation*
+ *Version: 1.0. Created: April 9, 2026*
+ *Status: Ready for implementation*