npm - cdp-edge - Versions diffs - 1.2.2 → 1.4.0 - Mend

cdp-edge 1.2.2 → 1.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (141)

package/extracted-skill/tracking-events-generator/agents/ml-clustering-agent.md ADDED

@@ -0,0 +1,769 @@
+# ML Clustering Agent — CDP Edge Quantum Tier
+## Identity
+**Agent:** ML Clustering Agent
+**Role:** Specialist in dynamic segmentation via machine learning
+**Level:** God (Quantum Tier), AI clustering specialist
+---
+## Why this matters
+| Current situation | With dynamic ML segmentation |
+|---|---|
+| Static segments (manual tags) | Automatic groups that evolve as new data arrives |
+| Generic campaigns for everyone | +40% campaign relevance per segment |
+| Manual behavior insights | ML clusters uncover hidden patterns |
+| Generic CTR per vertical | Optimization specific to each behavior cluster |
+---
+## What this agent configures
+```
+Dynamic ML Segmentation
+├── K-means Clustering (Workers AI)
+│   ├── Feature engineering: LTV, behavior, geo, time
+│   ├── Configurable number of clusters (3-10)
+│   ├── Auto-labeling of segments (ML interpretation)
+│   └── Persistence in D1 (ml_segments table)
+│
+├── DBSCAN Clustering (anomalies/fraud)
+│   ├── Detection of anomalous leads (outliers)
+│   ├── Configurable epsilon and MinPts
+│   ├── Automatic "suspicious" flagging
+│   └── Integration with the Security Agent
+│
+├── Hierarchical Clustering (segment levels)
+│   ├── Lead dendrogram (hierarchical similarity)
+│   ├── Automatic selection of k (silhouette score)
+│   ├── Granularity levels (macro → micro)
+│   └── Drill-down navigation through clusters
+│
+├── Feature Engineering Pipeline
+│   ├── Normalization (min-max, z-score)
+│   ├── Encoding (one-hot for categorical features)
+│   ├── Feature selection (importance)
+│   └── Time-based features (days since lead, hour of day)
+│
+└── Automatic Segment Interpretation
+    ├── Descriptive name generation (e.g. "Alto Valor + Alto Engajamento")
+    ├── Dominant characteristics (top features per cluster)
+    ├── Metric distributions (avg LTV, std behavior)
+    └── Action recommendations per segment
+```
+---
+## Prerequisites
+- **Workers AI**: `env.AI` binding enabled in wrangler.toml
+- **D1 Database**: `leads` table with historical data (last 6 months)
+- **Server Architect**: integrate the endpoints under the `/api/segmentation/*` route
+- **Feature Engineering**: pipeline ready for normalization/encoding
+---
+## Phase 1: Feature Engineering Pipeline
+### 1.1 Extracting Features from D1
+Query historical leads and extract numeric and categorical features:
+```sql
+-- Numeric features
+SELECT
+  ltv_class AS ltv_numeric,                -- 0=Low, 1=Medium, 2=High
+  behavior_score AS behavior_numeric,      -- 0-100
+  engagement_score AS engagement_numeric,  -- 0-100
+  intention_level AS intention_numeric,    -- 0-100
+  bot_score AS bot_numeric,                -- 0-100 (inverted: 100 = human)
+  value AS purchase_value,                 -- purchase value (NULL for leads)
+  currency_value AS currency_numeric,      -- 1.0 for BRL
+  created_at                               -- source of the time-based features (section 1.3)
+FROM leads
+WHERE created_at >= datetime('now', '-6 months');
+-- Categorical features (for one-hot encoding)
+SELECT DISTINCT
+  country,       -- BR, US, AR
+  state,         -- SP, RJ, MG
+  geo_timezone,  -- America/Sao_Paulo, America/New_York
+  utm_source,    -- facebook, google, tiktok
+  utm_medium     -- cpc, organic, social
+FROM leads
+WHERE created_at >= datetime('now', '-6 months');
+```
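+### 1.2 Feature Normalization
+```python
+# Normalization example (implemented in Workers AI)
+from statistics import mean, pstdev
+def min_max(values):
+    """Min-max normalization (0-1 range); a constant column maps to 0."""
+    lo, hi = min(values), max(values)
+    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]
+def z_score(values):
+    """Z-score normalization (mean=0, std=1); a constant column maps to 0."""
+    mu, sigma = mean(values), pstdev(values)
+    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]
+# One-hot encoding for categorical features
+country_BR = [1, 0, 0, 0, 0]
+country_US = [0, 1, 0, 0, 0]
+country_AR = [0, 0, 1, 0, 0]
+```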
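Because the Worker itself runs JavaScript, the same formulas could be written in TypeScript. The sketch below is illustrative only: the function names `minMax`, `zScore`, and `oneHot`, and the `COUNTRIES` vocabulary, are assumptions and do not come from the package.

```typescript
// Min-max normalization to the 0-1 range; a constant column maps to 0.
function minMax(values: number[]): number[] {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  return values.map((v) => (hi > lo ? (v - lo) / (hi - lo) : 0));
}

// Z-score normalization using the population standard deviation.
function zScore(values: number[]): number[] {
  const mean = values.reduce((s, v) => s + v, 0) / values.length;
  const variance = values.reduce((s, v) => s + (v - mean) ** 2, 0) / values.length;
  const std = Math.sqrt(variance);
  return values.map((v) => (std > 0 ? (v - mean) / std : 0));
}

// One-hot encoding against a fixed vocabulary; unknown values map to all zeros.
function oneHot(value: string, vocabulary: string[]): number[] {
  return vocabulary.map((entry) => (entry === value ? 1 : 0));
}

const COUNTRIES = ['BR', 'US', 'AR']; // hypothetical vocabulary
console.log(minMax([0, 50, 100]));    // [0, 0.5, 1]
console.log(oneHot('US', COUNTRIES)); // [0, 1, 0]
```

A fixed vocabulary keeps the one-hot vector length stable between clustering runs, which matters if centroids from one run are compared with vectors from another.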
+### 1.3 Time-Based Features
+```python
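+# Time-based features for clustering
+from datetime import datetime, timezone
+now = datetime.now(timezone.utc)
+created_at = datetime(2026, 4, 4, 14, 30, tzinfo=timezone.utc)  # example lead timestamp (illustrative)
+days_since_lead = (now - created_at).days
+hour_of_day = created_at.hour
+day_of_week = created_at.weekday()  # 0=Monday, 6=Sunday
+is_weekend = 1 if day_of_week in (5, 6) else 0
+is_business_hours = 1 if 9 <= hour_of_day <= 18 else 0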
+```
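A TypeScript counterpart of these time features might look like the sketch below. It assumes timestamps are handled in UTC, and the `TimeFeatures` interface and `timeFeatures` function are hypothetical names. Note that JavaScript's `getUTCDay()` starts the week on Sunday, so the value is shifted to match the 0=Monday convention used above.

```typescript
interface TimeFeatures {
  days_since_lead: number;
  hour_of_day: number;
  day_of_week: number; // 0=Monday, 6=Sunday
  is_weekend: 0 | 1;
  is_business_hours: 0 | 1;
}

function timeFeatures(createdAt: Date, now: Date): TimeFeatures {
  const msPerDay = 24 * 60 * 60 * 1000;
  const hour = createdAt.getUTCHours();
  // JS uses 0=Sunday; shift so that 0=Monday and 6=Sunday.
  const dayOfWeek = (createdAt.getUTCDay() + 6) % 7;
  return {
    days_since_lead: Math.floor((now.getTime() - createdAt.getTime()) / msPerDay),
    hour_of_day: hour,
    day_of_week: dayOfWeek,
    is_weekend: dayOfWeek >= 5 ? 1 : 0,
    is_business_hours: hour >= 9 && hour <= 18 ? 1 : 0,
  };
}

// Illustrative timestamps: a lead created on a Saturday afternoon (UTC).
const example = timeFeatures(
  new Date('2026-04-04T14:30:00Z'),
  new Date('2026-04-09T00:00:00Z'),
);
console.log(example);
```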
+---
+## Phase 2: Real Vector K-Means (embeddinggemma-300m + K-means in JS)
+> **Current architecture:** clustering does not use an LLM for the mathematical calculations.
+> Instead, it uses **real semantic embeddings** + **K-means implemented in JavaScript**,
+> with Granite used **only to name** the resulting clusters.
+### 2.1 Clustering Pipeline
+```
+100 leads (sample) → text profile → embeddinggemma-300m → 768d vectors
+                                                                 ↓
+                                                  K-means++ (cosine distance, pure JS)
+                                                                 ↓
+                                             real silhouette score computed in JS
+                                                                 ↓
+                                 Granite 4.0 Micro names each cluster (1 LLM call)
+```
+### 2.2 Workers AI models used
+| Model | ID | Use |
+|---|---|---|
+| **Granite 4.0 Micro** | `@cf/ibm-granite/granite-4.0-h-micro` | LTV prediction + cluster naming |
+| **EmbeddingGemma 300M** | `@cf/baai/bge-m3` | Semantic embeddings for K-means (the ID as published names BAAI's BGE-M3, while the text calls the model embeddinggemma-300m) |
+### 2.3 Text profile per lead (embedding input)
+```typescript
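+// Loosely typed lead row; fields mirror the columns referenced in this document.
+interface LeadRow {
+  predicted_ltv_class?: string | null;
+  engagement_score?: number | null;
+  intention_level?: string | number | null;
+  utm_source?: string | null;
+  utm_medium?: string | null;
+  country?: string | null;
+  hour_of_day?: number | null;
+  is_weekend?: number | boolean | null;
+  days_since_lead?: number | null;
+}
+function _buildLeadProfile(l: LeadRow): string {
+  return [
+    `LTV: ${l.predicted_ltv_class || 'desconhecido'}`,
+    `engajamento: ${Math.round(l.engagement_score || 0)}`,
+    `intenção: ${l.intention_level || 'desconhecida'}`,
+    `origem: ${l.utm_source || 'direto'}`,
+    `canal: ${l.utm_medium || 'desconhecido'}`,
+    `país: ${l.country || 'BR'}`,
+    `hora: ${l.hour_of_day ?? 12}h`, // ?? keeps hour 0 (midnight) instead of replacing it with 12
+    (l.is_weekend ? 'fim-de-semana' : 'dia-útil'),
+    `recência: ${l.days_since_lead || 0} dias`,
+  ].join(', '); // the profile text stays in Portuguese, as in the original
+}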
+```
+### 2.4 Batched embeddings call
+```typescript
+// Embeds up to 100 profiles in a single call
+const embRes = await env.AI.run('@cf/baai/bge-m3', { text: profiles });
+const vectors = embRes.data; // float32[][], shape [N, 768]
+```
+### 2.5 Vector K-means (cosine distance)
+```typescript
+// K-means++ initialization → iterate until convergence → final assignments
+const { assignments } = _kmeansRun(vectors, nClusters); // implemented in index.ts
+const silhouetteScore = _silhouette(vectors, assignments, nClusters); // real score
+```
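The bodies of `_kmeansRun` and `_silhouette` are not shown in this diff. As a rough illustration of what K-means++ seeding, cosine-distance assignment, and a silhouette score involve, here is a minimal sketch. It is not the package's implementation: the names `kmeans`, `silhouette`, `cosineDistance`, and the seeded `mulberry32` generator are assumptions chosen to keep the example reproducible.

```typescript
type Vec = number[];

// Cosine distance = 1 - cosine similarity; zero vectors count as maximally distant.
function cosineDistance(a: Vec, b: Vec): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 1 : 1 - dot / Math.sqrt(na * nb);
}

// Small seeded PRNG (mulberry32) so that runs are reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function kmeans(vectors: Vec[], k: number, seed = 42, maxIter = 50) {
  const rand = mulberry32(seed);
  // K-means++ seeding: first centroid uniform, the rest proportional to squared distance.
  const centroids: Vec[] = [vectors[Math.floor(rand() * vectors.length)]];
  while (centroids.length < k) {
    const d2 = vectors.map((v) => Math.min(...centroids.map((c) => cosineDistance(v, c))) ** 2);
    let r = rand() * d2.reduce((s, x) => s + x, 0);
    let pick = d2.length - 1;
    for (let i = 0; i < d2.length; i++) {
      r -= d2[i];
      if (r <= 0) { pick = i; break; }
    }
    centroids.push(vectors[pick]);
  }
  let assignments: number[] = new Array<number>(vectors.length).fill(-1);
  for (let iter = 0; iter < maxIter; iter++) {
    // Assignment step: nearest centroid by cosine distance.
    const next = vectors.map((v) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (cosineDistance(v, centroids[c]) < cosineDistance(v, centroids[best])) best = c;
      }
      return best;
    });
    const changed = next.some((a, i) => a !== assignments[i]);
    assignments = next;
    if (!changed) break;
    // Update step: centroid = mean of its members (empty clusters keep their centroid).
    for (let c = 0; c < k; c++) {
      const members = vectors.filter((_, i) => assignments[i] === c);
      if (members.length === 0) continue;
      centroids[c] = members[0].map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    }
  }
  return { assignments, centroids };
}

// Mean silhouette over all points; singleton clusters score 0 by convention.
function silhouette(vectors: Vec[], assignments: number[], k: number): number {
  const scores = vectors.map((v, i) => {
    const sums = new Array<number>(k).fill(0);
    const counts = new Array<number>(k).fill(0);
    vectors.forEach((w, j) => {
      if (i === j) return;
      sums[assignments[j]] += cosineDistance(v, w);
      counts[assignments[j]] += 1;
    });
    const own = assignments[i];
    if (counts[own] === 0) return 0;
    const a = sums[own] / counts[own];
    let b = Infinity;
    for (let c = 0; c < k; c++) {
      if (c !== own && counts[c] > 0) b = Math.min(b, sums[c] / counts[c]);
    }
    if (!isFinite(b)) return 0; // only one non-empty cluster
    const denom = Math.max(a, b);
    return denom === 0 ? 0 : (b - a) / denom;
  });
  return scores.reduce((s, x) => s + x, 0) / scores.length;
}

// Toy data: two groups pointing in clearly different directions.
const demo: Vec[] = [[1, 0], [0.9, 0.1], [1, 0.05], [0, 1], [0.1, 0.9], [0.05, 1]];
const demoResult = kmeans(demo, 2);
console.log(demoResult.assignments, silhouette(demo, demoResult.assignments, 2));
```

The silhouette loop is quadratic in the number of vectors, which is affordable for the 100-lead sample described above but would need rethinking for much larger inputs.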
+### 2.6 Naming the clusters via Granite (the only LLM use)
+```typescript
+// Granite receives only the aggregated statistics per cluster
+// It returns a descriptive name + a campaign recommendation in Portuguese
+const nameRes = await env.AI.run('@cf/ibm-granite/granite-4.0-h-micro', {
+  messages: [{ role: 'user', content: namingPrompt }],
+  max_tokens: 800
+});
+```
+### 2.7 Features for K-Means
+```typescript
+// Recommended clustering features (based on the D1 analysis)
+const RECOMMENDED_FEATURES = [
+  // Financial
+  'ltv_class',          // Low/Medium/High (0/1/2)
+  'purchase_value',     // purchase value (null for leads)
+  // Behavioral
+  'behavior_score',     // 0-100 (engagement)
+  'engagement_score',   // 0-100 (interactions)
+  'intention_level',    // 0-100 (purchase probability)
+  // Temporal
+  'days_since_lead',    // recency (0-180 days)
+  'hour_of_day',        // 0-23
+  'is_weekend',         // 0/1
+  'is_business_hours',  // 0/1
+  // Geographic
+  'country',            // one-hot encoding
+  'state',              // one-hot encoding
+  // Traffic source
+  'utm_source',         // one-hot encoding
+  'utm_medium'          // one-hot encoding
+];
+const DEFAULT_N_CLUSTERS = 5;  // 5 segments (configurable)
+```
+---
+## Phase 3: DBSCAN Clustering (Anomaly Detection)
+### 3.1 Prompt for Workers AI
+```python
+PROMPT_DBSCAN = f"""
+You are a Machine Learning expert specializing in anomaly detection.
+You will receive {n_leads} customers with {features}.
+Your task: Perform DBSCAN clustering to detect outliers and fraudulent patterns.
+INPUTS:
+- leads: JSON array of customer objects
+- features: list of feature names
+- epsilon: distance threshold (default: 0.3)
+- min_samples: minimum points to form cluster (default: 5)
+TASK:
+1. For each lead, calculate distance to {min_samples} nearest neighbors
+2. Mark as "core point" if >= min_samples neighbors within epsilon
+3. Mark as "border point" if < min_samples neighbors but reachable from core
+4. Mark as "outlier" if not reachable from any core point
+5. Identify clusters and noise (outliers)
+OUTPUT (JSON only):
+{{
+  "total_leads": 500,
+  "n_core_points": 450,
+  "n_border_points": 30,
+  "n_outliers": 20,
+  "outliers": [
+    {{
+      "lead_id": "lead_123",
+      "reason": "behavior_score too high (> 95)",
+      "risk_score": 0.92,
+      "features": {{
+        "behavior_score": 98,
+        "days_since_lead": 0,
+        "unusual_utm_pattern": true
+      }}
+    }},
+    ...
+  ],
+  "clusters": [
+    {{
+      "cluster_id": 0,
+      "size": 235,
+      "density": "high"
+    }},
+    ...
+  ]
+}}
+OUTLIER PATTERNS TO DETECT:
+- behavior_score > 95 (bot-like behavior)
+- days_since_lead = 0 AND behavior_score > 80 (instant lead, suspicious)
+- unusual utm_source combination (e.g., unknown_source + high_value)
+- geographic mismatch (high_value + unusual location)
+Return ONLY valid JSON, no explanations.
+"""
+```
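The prompt above delegates the DBSCAN steps to the model. For comparison, the same core/border/outlier logic can be computed directly in code, in the spirit of the Phase 2 note about keeping the math out of the LLM. The sketch below is an assumption, not something the package is shown to contain: it uses Euclidean distance over features already normalized to 0-1, and the defaults `epsilon = 0.3` and `minSamples = 5` are taken from the prompt.

```typescript
type Point = number[];

function euclidean(a: Point, b: Point): number {
  return Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0));
}

const NOISE = -1;
const UNVISITED = -2;

// Returns one label per point: a cluster id (0, 1, ...) or NOISE for outliers.
function dbscan(points: Point[], epsilon = 0.3, minSamples = 5): number[] {
  const labels = new Array<number>(points.length).fill(UNVISITED);
  // Neighborhood query; the point itself counts as its own neighbor.
  const neighborsOf = (i: number): number[] =>
    points.map((_, j) => j).filter((j) => euclidean(points[i], points[j]) <= epsilon);
  let cluster = 0;
  for (let i = 0; i < points.length; i++) {
    if (labels[i] !== UNVISITED) continue;
    const seeds = neighborsOf(i);
    if (seeds.length < minSamples) {
      labels[i] = NOISE; // may later be relabeled as a border point
      continue;
    }
    labels[i] = cluster;
    const queue = seeds.filter((j) => j !== i);
    while (queue.length > 0) {
      const j = queue.shift() as number;
      if (labels[j] === NOISE) labels[j] = cluster; // border point reached from a core point
      if (labels[j] !== UNVISITED) continue;
      labels[j] = cluster;
      const reachable = neighborsOf(j);
      if (reachable.length >= minSamples) queue.push(...reachable); // j is a core point: expand
    }
    cluster++;
  }
  return labels;
}

// Toy data: two dense groups plus one isolated point.
const leadsDemo: Point[] = [
  [0.10, 0.10], [0.12, 0.11], [0.09, 0.13], [0.11, 0.08], [0.13, 0.12],
  [0.80, 0.80], [0.82, 0.79], [0.79, 0.83], [0.81, 0.78], [0.83, 0.82],
  [0.10, 0.90],
];
console.log(dbscan(leadsDemo)); // [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
```

Points labeled `-1` correspond to the "outlier" entries in the prompt's output format; the reasons and risk scores would still have to come from separate rules or from the model.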
+---
+## Phase 4: Hierarchical Clustering (Drill-Down)
+### 4.1 Prompt for Workers AI
+```python
+PROMPT_HIERARCHICAL = f"""
+You are a Machine Learning expert specializing in hierarchical clustering.
+You will receive {n_leads} customers.
+Your task: Build hierarchical clustering tree from macro to micro segments.
+INPUTS:
+- leads: JSON array of customer objects
+- features: list of feature names
+- max_depth: maximum tree depth (default: 3)
+TASK:
+1. Build binary hierarchical tree (top-down divisive clustering)
+2. At each level, split cluster into 2 sub-clusters using K-means
+3. Stop when max_depth reached or cluster size < min_points
+4. Calculate Silhouette Score at each split
+5. Prune branches with poor separation (silhouette < 0.3)
+OUTPUT (JSON only):
+{{
+  "tree": {{
+    "level_0": {{
+      "name": "Todos os Leads",
+      "size": 500,
+      "children": [
+        {{
+          "name": "Macro Segmento A (Alto Valor)",
+          "size": 180,
+          "children": [
+            {{
+              "name": "Micro Segmento A1 (SP - Alto Valor + Alto Engajamento)",
+              "size": 95
+            }},
+            {{
+              "name": "Micro Segmento A2 (RJ - Alto Valor + Médio Engajamento)",
+              "size": 85
+            }}
+          ]
+        }},
+        {{
+          "name": "Macro Segmento B (Leads Quentes)",
+          "size": 150,
+          "children": [...]
+        }}
+      ]
+    }}
+  }},
+  "statistics": {{
+    "n_levels": 3,
+    "n_leaf_clusters": 6,
+    "avg_leaf_size": 83.3,
+    "best_depth_for_lead": "level_2"
+  }}
+}}
+Return ONLY valid JSON, no explanations.
+"""
+```
+---
+## Phase 5: Automatic Segment Interpretation
+### 5.1 Generating Descriptive Names
+```python
+# Prompt for segment auto-labeling
+PROMPT_INTERPRETATION = f"""
+You are a marketing intelligence expert.
+You will receive cluster centroids and characteristics.
+Your task: Generate descriptive, actionable names for each segment.
+INPUT: Cluster characteristics (avg values per feature)
+OUTPUT: Descriptive segment name following this pattern:
+"[VALUE_TYPE] [BEHAVIOR_TYPE] [GEO_TYPE]"
+VALUE TYPES: "Alto Valor", "Médio Valor", "Baixo Valor", "Lead Quente"
+BEHAVIOR TYPES: "Alto Engajamento", "Médio Engajamento", "Baixo Engajamento", "Alta Intenção"
+GEO TYPES: "[UF]", "Sudeste", "Norte", "Internacional"
+EXAMPLES:
+- ltv=0.9, behavior=0.85, geo=SP → "Segmento 0 - Alto Valor + Alto Engajamento (SP)"
+- ltv=0.7, behavior=0.6, geo=RJ → "Segmento 1 - Médio Valor + Médio Engajamento (RJ)"
+- ltv=0.2, behavior=0.3, days=0, utm=tiktok → "Segmento 2 - Lead Quente + Baixo Engajamento (TikTok)"
+Return ONLY valid JSON with segment names, no explanations.
+"""
+```
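The prompts in this document ask for "ONLY valid JSON", but a model may still wrap its answer in extra text. A defensive parsing step could look like the sketch below. The helper name `extractJson` is hypothetical, and the sketch assumes the model's reply has already been obtained as a plain string.

```typescript
// Extracts the outermost JSON object or array from a model reply.
// Throws if nothing that looks like JSON is present or if parsing fails.
function extractJson(text: string): unknown {
  const start = text.search(/[\[{]/);
  const end = Math.max(text.lastIndexOf('}'), text.lastIndexOf(']'));
  if (start === -1 || end <= start) {
    throw new Error('no JSON found in model output');
  }
  return JSON.parse(text.slice(start, end + 1));
}

// Illustrative reply with stray prose around the JSON payload.
const reply = 'Here is the result: {"segments": ["Alto Valor", "Lead Quente"]} Hope it helps.';
const parsed = extractJson(reply) as { segments: string[] };
console.log(parsed.segments);
```

This only trims text before and after the payload; a reply containing several separate JSON fragments would still fail to parse and should be treated as an error.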
+---
+## Phase 6: D1 Integration (Persistence)
+### 6.1 Schema of the ml_segments Table
+```sql
+CREATE TABLE IF NOT EXISTS ml_segments (
+  id INTEGER PRIMARY KEY AUTOINCREMENT,
+  cluster_id INTEGER NOT NULL,
+  cluster_name TEXT NOT NULL,
+  clustering_algorithm TEXT NOT NULL,  -- 'kmeans', 'dbscan', 'hierarchical'
+  created_at TEXT DEFAULT (datetime('now')),
+  updated_at TEXT DEFAULT (datetime('now')),
+  -- Cluster statistics
+  size INTEGER NOT NULL,
+  percentage REAL NOT NULL,
+  -- Average characteristics
+  avg_ltv REAL,
+  avg_behavior_score REAL,
+  avg_engagement_score REAL,
+  avg_intention_level REAL,
+  -- Dominant characteristics
+  dominant_countries TEXT,        -- JSON array: ["BR", "US"]
+  dominant_states TEXT,            -- JSON array: ["SP", "RJ"]
+  dominant_utm_sources TEXT,        -- JSON array: ["facebook", "google"]
+  dominant_features TEXT,            -- JSON array: ["ltv", "behavior_score"]
+  -- Quality metrics
+  silhouette_score REAL,
+  cohesion REAL,
+  separation REAL,
+  -- Recommendations
+  action_recommendations TEXT,        -- JSON array
+  bid_recommendations TEXT,            -- JSON array
+  campaign_recommendations TEXT         -- JSON array
+);
+-- Indexes for performance
+CREATE INDEX IF NOT EXISTS idx_ml_segments_created ON ml_segments(created_at);
+CREATE INDEX IF NOT EXISTS idx_ml_segments_cluster ON ml_segments(cluster_id);
+CREATE INDEX IF NOT EXISTS idx_ml_segments_algorithm ON ml_segments(clustering_algorithm);
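+-- Lead ↔ segment association table
+CREATE TABLE IF NOT EXISTS ml_segment_members (
+  lead_id TEXT NOT NULL,
+  cluster_id INTEGER NOT NULL,
+  clustering_algorithm TEXT NOT NULL, -- 'kmeans', 'dbscan', 'hierarchical' (required by the primary key)
+  confidence REAL NOT NULL,           -- 0-1 (closeness to the centroid)
+  risk_score REAL,                    -- read by the outliers endpoint (section 7.3)
+  reason TEXT,                        -- read by the outliers endpoint (section 7.3)
+  updated_at TEXT DEFAULT (datetime('now')),
+  PRIMARY KEY (lead_id, cluster_id, clustering_algorithm)
+);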
+CREATE INDEX IF NOT EXISTS idx_ml_segment_members_cluster ON ml_segment_members(cluster_id);
+CREATE INDEX IF NOT EXISTS idx_ml_segment_members_lead ON ml_segment_members(lead_id);
+```
+---
+## Phase 7: REST API Exposure
+### 7.1 Clustering Endpoint
+```typescript
+// server-edge-tracker/functions/api/segmentation/cluster.ts
+// POST with a JSON body, matching the API contract and the curl example in this document.
+// buildClusterStats and mergeClusterNames are hypothetical helpers for steps the
+// original snippet left undefined (clusterStats, clusters).
+export const onRequestPost: PagesFunction<Env> = async (context) => {
+  const body = (await context.request.json()) as {
+    algorithm?: string; n_clusters?: number; client_vertical?: string;
+  };
+  const algorithm = body.algorithm || 'kmeans';
+  const nClusters = Number(body.n_clusters) || 5;
+  const clientVertical = body.client_vertical || 'general';
+  // Fetch historical leads from D1 (rows come back under `results`)
+  const { results: leads } = await context.env.DB.prepare(`
+    SELECT id, ltv_class, behavior_score, engagement_score, intention_level,
+           days_since_lead, hour_of_day, is_weekend, is_business_hours,
+           country, state, utm_source, utm_medium
+    FROM leads
+    WHERE created_at >= datetime('now', '-6 months')
+    ORDER BY created_at DESC
+  `).all();
+  // Feature engineering
+  const features = extractFeatures(leads);
+  // Sample at most 100 leads, the batch size used throughout this document
+  const sample = leads.slice(0, 100);
+  // 1. Real embeddings via embeddinggemma-300m
+  const profiles = sample.map(_buildLeadProfile);
+  const embRes   = await context.env.AI.run('@cf/baai/bge-m3', { text: profiles });
+  const vectors  = embRes.data; // 768d vectors
+  // 2. Real vector K-means (pure JS, cosine distance)
+  const { assignments } = _kmeansRun(vectors, nClusters);
+  const silhouetteScore = _silhouette(vectors, assignments, nClusters);
+  // 3. Granite is used only to name the clusters
+  const clusterStats = buildClusterStats(sample, assignments, nClusters);
+  const nameRes  = await context.env.AI.run('@cf/ibm-granite/granite-4.0-h-micro',
+    { messages: [{ role: 'user', content: getNamingPrompt(clusterStats) }], max_tokens: 800 }
+  );
+  const clusters = mergeClusterNames(clusterStats, nameRes);
+  // 4. Persist to D1
+  await saveClusters(context.env.DB, clusters, algorithm);
+  return Response.json({
+    success: true,
+    algorithm,
+    engine: 'embeddinggemma-300m + kmeans vetorial',
+    n_clusters: nClusters,
+    silhouette_score: silhouetteScore,
+    clusters,
+    generated_at: new Date().toISOString()
+  });
+};
+```
+### 7.2 Segment Query Endpoint
+```typescript
+// server-edge-tracker/functions/api/segmentation/list.ts
+export const onRequestGet: PagesFunction<Env> = async (context) => {
+  // D1 returns rows under `results`; the recommendation columns are selected
+  // so that the JSON.parse calls below have data to read.
+  const { results } = await context.env.DB.prepare(`
+    SELECT id, cluster_id, cluster_name, clustering_algorithm, size, percentage,
+           avg_ltv, avg_behavior_score, avg_engagement_score,
+           dominant_countries, dominant_states, dominant_utm_sources,
+           silhouette_score, action_recommendations,
+           bid_recommendations, campaign_recommendations
+    FROM ml_segments
+    ORDER BY created_at DESC
+    LIMIT 10
+  `).all<Record<string, any>>();
+  return Response.json({
+    success: true,
+    clusters: results.map(c => ({
+      ...c,
+      dominant_countries: JSON.parse(c.dominant_countries || '[]'),
+      dominant_states: JSON.parse(c.dominant_states || '[]'),
+      dominant_utm_sources: JSON.parse(c.dominant_utm_sources || '[]'),
+      action_recommendations: JSON.parse(c.action_recommendations || '[]'),
+      bid_recommendations: JSON.parse(c.bid_recommendations || '[]'),
+      campaign_recommendations: JSON.parse(c.campaign_recommendations || '[]')
+    }))
+  });
+};
+```
+### 7.3 Anomaly Endpoint (DBSCAN)
+```typescript
+// server-edge-tracker/functions/api/segmentation/outliers.ts
+export const onRequestGet: PagesFunction<Env> = async (context) => {
+  const { searchParams } = new URL(context.request.url);
+  // Optional ?limit=N from the API contract; defaults to 50 as in the original query
+  const limit = parseInt(searchParams.get('limit') || '50', 10) || 50;
+  const { results: outliers } = await context.env.DB.prepare(`
+    SELECT l.id, l.email, l.behavior_score, l.days_since_lead,
+           l.country, l.state, l.utm_source, l.utm_medium,
+           sm.risk_score, sm.reason
+    FROM ml_segment_members sm
+    INNER JOIN leads l ON sm.lead_id = l.id
+    WHERE sm.clustering_algorithm = 'dbscan'
+      AND sm.confidence < 0.5  -- low confidence = anomaly
+    ORDER BY sm.updated_at DESC
+    LIMIT ?
+  `).bind(limit).all();
+  return Response.json({
+    success: true,
+    outliers,
+    total: outliers.length,
+    generated_at: new Date().toISOString()
+  });
+};
+```
+---
+## Inputs Received from the Orchestrator
+### Configuration Parameters
+```jsonc
+{
+  "client_vertical": "curso-online",
+  "features_to_use": ["ltv", "behavior_score", "engagement_score", "geo", "time"],
+  "clustering_algorithms": ["kmeans", "dbscan", "hierarchical"],
+  "default_n_clusters": 5,
+  "update_frequency": "weekly",  // automatic re-clustering
+  "min_data_points": 100,  // minimum number of leads for clustering
+  "max_data_age_months": 6  // use only the last 6 months
+}
+```
+---
+## Outputs for the Server Architect
+### Files Created
+```json
+{
+  "endpoints": [
+    "functions/api/segmentation/cluster.ts",
+    "functions/api/segmentation/list.ts",
+    "functions/api/segmentation/outliers.ts"
+  ],
+  "database": [
+    "server-edge-tracker/schema-segmentation.sql"
+  ],
+  "documentation": [
+    "server-edge-tracker/SEGMENTATION-DOCS.md"
+  ],
+  "integration_points": [
+    "LTV Predictor Agent: enrich predictions with cluster_id",
+    "Dashboard Agent: visualize segments in charts",
+    "Attribution Agent: segment-level attribution"
+  ]
+}
+```
+### API Contracts
+```typescript
+interface SegmentationAPI {
+  // Create new clusters
+  'POST /api/segmentation/cluster': {
+    algorithm: 'kmeans' | 'dbscan' | 'hierarchical';
+    n_clusters?: number;
+    features?: string[];
+    client_vertical?: string;
+  };
+  // List existing clusters
+  'GET /api/segmentation/list': {
+    limit?: number;
+    algorithm?: string;
+  };
+  // Query outliers/anomalies
+  'GET /api/segmentation/outliers': {
+    limit?: number;
+    confidence_threshold?: number;  // < 0.5 = high risk
+  };
+  // Update segments
+  'PUT /api/segmentation/update': {
+    cluster_id: number;
+    action_recommendations: string[];
+    bid_recommendations: string[];
+  };
+}
+```
+---
+## Integration with other agents
+| When | Agent |
+|---|---|
+| After clusters are generated | → **Dashboard Agent** (chart visualization) |
+| Segments created | → **LTV Predictor Agent** (enrich predictions with cluster_id) |
+| Outliers detected | → **Security Enterprise Agent** (automatic blocking) |
+| Recommendations generated | → **Attribution Agent** (attribution per segment) |
+| Clusters updated | → **Meta Agent** (segmented campaigns) |
+| Weekly re-clustering | → **Intelligence Scheduling** (automatic cron) |
+---
+## Completion Checklist
+```
+[ ] Feature Engineering Pipeline implemented
+[ ] Vector K-means Clustering (embeddinggemma-300m + JS)
+[ ] DBSCAN Clustering for anomalies
+[ ] Hierarchical Clustering (drill-down)
+[ ] Automatic segment interpretation
+[ ] D1 schema created (ml_segments + ml_segment_members)
+[ ] REST API exposed (/api/segmentation/*)
+[ ] Integration with LTV Predictor Agent
+[ ] Integration with Dashboard Agent
+[ ] Integration with Security Enterprise Agent
+[ ] Complete documentation written
+```
+---
+## Troubleshooting
+| Problem | Cause | Solution |
+|---|---|---|
+| `Clusters vazios` (empty clusters) | Fewer than `min_data_points` leads in D1 | Increase `max_data_age_months` or wait for more data |
+| `Silhouette Score < 0.3` | Clusters are not separable | Increase `n_clusters` or use better features |
+| `Outliers excessivos` (too many outliers) | DBSCAN epsilon/MinPts too aggressive | Adjust the anomaly-detection parameters |
+| `embeddinggemma timeout` | Batch larger than 100 profiles | Limit the sample to 100 leads (current default) |
+| `vectors insuficientes` (insufficient vectors) | embeddinggemma returned fewer vectors than nClusters | Reduce nClusters or check the API response |
+---
+## Usage Examples
+### Case 1: Basic Segmentation
+```bash
+curl -X POST "https://seudominio.com/api/segmentation/cluster" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "algorithm": "kmeans",
+    "n_clusters": 5,
+    "client_vertical": "curso-online"
+  }'
+# Returns:
+{
+  "clusters": [
+    {
+      "cluster_id": 0,
+      "name": "Segmento 0 - Alto Valor + Alto Engajamento (SP)",
+      "size": 95,
+      "avg_ltv": 497.50,
+      "action_recommendations": [
+        "Priorizar remarketing em 24h",
+        "Criar lookalike audience de alto valor"
+      ]
+    },
+    ...
+  ]
+}
+```
+### Case 2: Anomaly Detection
+```bash
+curl "https://seudominio.com/api/segmentation/outliers?limit=20"
+# Returns:
+{
+  "outliers": [
+    {
+      "lead_id": "lead_123",
+      "risk_score": 0.92,
+      "reason": "behavior_score too high (> 95), suspicious bot activity"
+    }
+  ]
+}
+```
+---
+## COMMAND *new-ai-module: Activation as a generic module
+This agent is also invoked by the Master Orchestrator when it receives the `*new-ai-module` command with a description containing terms such as:
+**segmentar, agrupar, cluster, similaridade, perfil, embedding, distribuição** (segment, group, cluster, similarity, profile, embedding, distribution)
+### Responsibility in STEP 1 of the *new-ai-module pipeline
+The agent must deliver the following to the Master Orchestrator:
+1. **Chosen model**: for semantic clustering, `@cf/baai/bge-m3`; for segment auto-labeling, `@cf/ibm-granite/granite-4.0-h-micro`
+2. **Clustering strategy**: K-means (fixed groups) or DBSCAN (anomalies/fraud), with a recommended `n_clusters`
+3. **Feature pipeline**: which fields of the payload or of the D1 table feed the embedding vector
+4. **Output contract**: `{ segment_id, segment_label, similarity_score, cluster_method }`
+5. **Recommended KV cache TTL**: default 3600s; for behavioral segments use 1800s
+6. **Position in the `/track` pipeline**: in Mode A or C, this module runs after LTV Prediction
+> **Rule:** Reply with the complete package in a single message. No questions back to the Orchestrator.
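The output contract in item 4 lists field names only. One way to type it, and to fill it by picking the most similar centroid, is sketched below. The field types, the `SegmentOutput` interface, and the `assignSegment` helper are assumptions; the sketch also assumes at least one centroid is passed in.

```typescript
interface SegmentOutput {
  segment_id: number;
  segment_label: string;
  similarity_score: number; // cosine similarity to the chosen centroid
  cluster_method: 'kmeans' | 'dbscan';
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Picks the centroid with the highest cosine similarity to the lead's vector.
function assignSegment(
  vector: number[],
  centroids: number[][],
  labels: string[],
  method: SegmentOutput['cluster_method'],
): SegmentOutput {
  let best = 0;
  let bestSim = -Infinity;
  for (let i = 0; i < centroids.length; i++) {
    const sim = cosineSimilarity(vector, centroids[i]);
    if (sim > bestSim) {
      bestSim = sim;
      best = i;
    }
  }
  return {
    segment_id: best,
    segment_label: labels[best],
    similarity_score: bestSim,
    cluster_method: method,
  };
}

// Illustrative centroids and labels.
const segment = assignSegment([1, 0.1], [[1, 0], [0, 1]], ['Alto Valor', 'Baixo Valor'], 'kmeans');
console.log(segment);
```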
+---
+*ML Clustering Agent v1.0: Dynamic ML Segmentation*
+*Version: 1.0. Created: April 9, 2026*
+*Status: Ready for implementation*