leadgen 0.1.0__tar.gz

leadgen-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
leadgen-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,93 @@
+ Metadata-Version: 2.4
+ Name: leadgen
+ Version: 0.1.0
+ Summary: Lead generation pipeline for marketing audit services — discover, score, and rank websites by marketing opportunity
+ License: MIT
+ License-File: LICENSE
+ Keywords: cli,lead-generation,marketing,scraping,web-analysis
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Environment :: Console
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
+ Classifier: Topic :: Office/Business :: Financial :: Spreadsheet
+ Requires-Python: >=3.11
+ Requires-Dist: aiohttp>=3.9.0
+ Requires-Dist: aiosqlite>=0.19.0
+ Description-Content-Type: text/markdown
+
+ # leadgen
+
+ Lead generation pipeline for marketing audit services.
+
+ Discovers hundreds of websites via Google/Bing scraping, scores them by **marketing opportunity** (the worse their marketing, the higher the score), and stores everything in a local SQLite database ranked and ready for outreach.
+
+ Designed as the upstream stage of [ai-marketing-claude](https://github.com/your-repo/ai-marketing-claude).
+
+ ## Install
+
+ ```bash
+ pip install leadgen
+ ```
+
+ ## Quick start
+
+ ```bash
+ # Copy agents and skills to your project
+ leadgen init
+
+ # Run the full pipeline
+ leadgen run "agencias de marketing digital" --geo "Buenos Aires" --max 200
+
+ # See ranked results
+ leadgen rank --tier A
+
+ # Export to CSV
+ leadgen export --output leads.csv --min-tier B
+ ```
+
+ ## Pipeline
+
+ ```
+ Google/Bing SERP scraping
+
+ Pre-screen (fast, 8s timeout — filters parked domains and good-marketing sites)
+
+ Full analysis (SEO, CTAs, tracking, trust signals)
+
+ leads.db (SQLite, persists across runs)
+
+ CSV export → ai-marketing-claude
+ ```
+
+ ## Scoring
+
+ `opportunity_score = 100 - marketing_quality`
+
+ A site with no analytics, no CTAs, and no meta description scores **opportunity: 85** — that's a Tier A lead.
+
+ | Tier | Range | Action |
+ |------|-------|--------|
+ | A | 75–100 | Contact within 48h |
+ | B | 55–74 | Contact this week |
+ | C | 35–54 | Nurture list |
+ | D | 0–34 | Discard |
+
+ ## Commands
+
+ ```bash
+ leadgen run "<topic>"        # Full pipeline
+ leadgen discover "<topic>"   # Discovery only (no analysis)
+ leadgen rank                 # Show ranked leads
+ leadgen rank --tier A        # Filter by tier
+ leadgen stats                # DB statistics
+ leadgen export               # Export to CSV
+ leadgen init                 # Copy agents/skills to current directory
+ ```
+
+ ## Requirements
+
+ - Python 3.11+
+ - `aiohttp`, `aiosqlite`
@@ -0,0 +1,73 @@
+ # leadgen
+
+ Lead generation pipeline for marketing audit services.
+
+ Discovers hundreds of websites via Google/Bing scraping, scores them by **marketing opportunity** (the worse their marketing, the higher the score), and stores everything in a local SQLite database ranked and ready for outreach.
+
+ Designed as the upstream stage of [ai-marketing-claude](https://github.com/your-repo/ai-marketing-claude).
+
+ ## Install
+
+ ```bash
+ pip install leadgen
+ ```
+
+ ## Quick start
+
+ ```bash
+ # Copy agents and skills to your project
+ leadgen init
+
+ # Run the full pipeline
+ leadgen run "agencias de marketing digital" --geo "Buenos Aires" --max 200
+
+ # See ranked results
+ leadgen rank --tier A
+
+ # Export to CSV
+ leadgen export --output leads.csv --min-tier B
+ ```
+
+ ## Pipeline
+
+ ```
+ Google/Bing SERP scraping
+
+ Pre-screen (fast, 8s timeout — filters parked domains and good-marketing sites)
+
+ Full analysis (SEO, CTAs, tracking, trust signals)
+
+ leads.db (SQLite, persists across runs)
+
+ CSV export → ai-marketing-claude
+ ```
+
+ ## Scoring
+
+ `opportunity_score = 100 - marketing_quality`
+
+ A site with no analytics, no CTAs, and no meta description scores **opportunity: 85** — that's a Tier A lead.
+
+ | Tier | Range | Action |
+ |------|-------|--------|
+ | A | 75–100 | Contact within 48h |
+ | B | 55–74 | Contact this week |
+ | C | 35–54 | Nurture list |
+ | D | 0–34 | Discard |
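
The tier table above can be read as a simple threshold lookup. The following is a minimal sketch, not the package's actual code: the function names `opportunity_score` and `tier_for` are hypothetical, and it assumes non-integer scores fall into the tier whose lower bound they have reached.

```python
def opportunity_score(marketing_quality: float) -> float:
    """README formula: the worse the marketing, the higher the opportunity."""
    return 100 - marketing_quality


def tier_for(score: float) -> str:
    """Map a 0-100 score to a tier letter using the README ranges (hypothetical helper)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 75:
        return "A"  # contact within 48h
    if score >= 55:
        return "B"  # contact this week
    if score >= 35:
        return "C"  # nurture list
    return "D"      # discard


# A marketing quality of 15 gives the opportunity score of 85 quoted above.
print(tier_for(opportunity_score(15)))  # prints "A"
```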
+
+ ## Commands
+
+ ```bash
+ leadgen run "<topic>"        # Full pipeline
+ leadgen discover "<topic>"   # Discovery only (no analysis)
+ leadgen rank                 # Show ranked leads
+ leadgen rank --tier A        # Filter by tier
+ leadgen stats                # DB statistics
+ leadgen export               # Export to CSV
+ leadgen init                 # Copy agents/skills to current directory
+ ```
+
+ ## Requirements
+
+ - Python 3.11+
+ - `aiohttp`, `aiosqlite`
@@ -0,0 +1,46 @@
+ [build-system]
+ requires = ["hatchling"]
+ build-backend = "hatchling.build"
+
+ [project]
+ name = "leadgen"
+ version = "0.1.0"
+ description = "Lead generation pipeline for marketing audit services — discover, score, and rank websites by marketing opportunity"
+ readme = "README.md"
+ license = { text = "MIT" }
+ requires-python = ">=3.11"
+ keywords = ["lead-generation", "marketing", "scraping", "web-analysis", "cli"]
+ classifiers = [
+     "Development Status :: 3 - Alpha",
+     "Environment :: Console",
+     "Intended Audience :: Developers",
+     "License :: OSI Approved :: MIT License",
+     "Programming Language :: Python :: 3.11",
+     "Programming Language :: Python :: 3.12",
+     "Topic :: Internet :: WWW/HTTP :: Indexing/Search",
+     "Topic :: Office/Business :: Financial :: Spreadsheet",
+ ]
+
+ dependencies = [
+     "aiohttp>=3.9.0",
+     "aiosqlite>=0.19.0",
+ ]
+
+ [project.scripts]
+ leadgen = "leadgen.scripts.pipeline:main"
+
+ [tool.hatch.build.targets.wheel]
+ packages = ["src/leadgen"]
+
+ [tool.hatch.build.targets.wheel.sources]
+ "src" = ""
+
+ [tool.hatch.build.targets.wheel.include-only]
+ "src/leadgen/**" = true
+
+ [tool.hatch.build.targets.sdist]
+ include = [
+     "src/",
+     "README.md",
+     "LICENSE",
+ ]
@@ -0,0 +1,12 @@
+ """
+ leadgen — Lead generation pipeline for marketing audit services.
+
+ Usage:
+     leadgen run "agencias marketing digital" --geo "Buenos Aires" --max 200
+     leadgen rank --tier A
+     leadgen stats
+     leadgen export --output leads.csv
+     leadgen init   # copy agents/skills to current directory
+ """
+
+ __version__ = "0.1.0"
File without changes
@@ -0,0 +1,90 @@
+ # Agent: lead-discovery
+
+ ## Role
+ You are the agent in charge of discovering new leads. You build queries, run the discovery pipeline, and report the results to the user clearly.
+
+ ## When to activate
+ - The user says: "find leads for X", "find X companies", "I want leads for X in city Y", "start discovery"
+ - The user wants to populate the database with new URLs
+
+ ---
+
+ ## Execution flow
+
+ ### 1. Understand the brief
+ Before running anything, extract from the user's message:
+ - **Industry / niche**: what kind of company are they looking for?
+ - **Geography**: do they have a preferred city or region?
+ - **Volume**: how many leads do they want? (default: 200)
+ - **Urgency**: do they want a quick or an exhaustive discovery?
+
+ If the industry is missing, ask. You can assume the rest.
+
+ ### 2. Build the command
+ ```bash
+ python pipeline.py run "<topic>" --geo "<city>" --max <n> --per-query 30 --max-queries 8
+ ```
+
+ For a quick discovery (testing):
+ ```bash
+ python pipeline.py discover "<topic>" --geo "<city>" --max 50
+ ```
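
As a rough illustration of step 2, the command can be assembled as an argument list before it is shown to the user. This is a sketch under assumptions: `build_run_command` is a hypothetical helper, and it uses only the flags that appear in this agent file (`--geo`, `--max`, `--per-query`, `--max-queries`, `--modifiers`) with the defaults quoted above.

```python
import shlex


def build_run_command(topic, geo=None, max_leads=200, per_query=30,
                      max_queries=8, modifiers=None):
    """Build the argv list for `pipeline.py run` (hypothetical helper)."""
    argv = ["python", "pipeline.py", "run", topic]
    if geo:
        argv += ["--geo", geo]
    argv += ["--max", str(max_leads),
             "--per-query", str(per_query),
             "--max-queries", str(max_queries)]
    if modifiers:
        # The file's own example passes modifiers as one comma-separated value.
        argv += ["--modifiers", ",".join(modifiers)]
    return argv


cmd = build_run_command("estudios contables", geo="Córdoba", max_leads=150,
                        modifiers=["contador", "asesoría", "impuestos"])
# shlex.join quotes each argument so the printed line is safe to paste in a shell.
print(shlex.join(cmd))
```

Building a list rather than a string keeps topics with spaces or quotes intact if the command is later passed to `subprocess.run`.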
+
+ ### 3. Run and monitor
+ - Show the command before running it
+ - Report progress per query
+ - If more than 30% of requests fail with network errors or timeouts, warn that Google may be blocking
+
+ ### 4. Detect Google blocks
+
+ **Signs of a block:**
+ - The output shows `"Google blocked"` repeatedly
+ - All results come from Bing (engine=bing on every result)
+ - The discovered count is far below what is expected (< 5 per query)
+
+ **What to do if Google blocks:**
+ ```bash
+ # Reduce aggressiveness
+ python pipeline.py run "<topic>" --max-queries 4 --per-query 20 --max 80
+ ```
+ Then tell the user to wait 15-30 minutes before running another large session.
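
The three block signals listed in step 4 could be checked mechanically along these lines. This is only a sketch: `looks_blocked` and its inputs are hypothetical, and "repeatedly" is interpreted here as two or more occurrences, which is an assumption rather than something the pipeline defines.

```python
def looks_blocked(output: str, engines: list[str],
                  discovered: int, queries_run: int) -> list[str]:
    """Return the block signals that fired (hypothetical helper)."""
    reasons = []
    # Signal 1: the literal marker appears repeatedly (assumed: at least twice).
    if output.count("Google blocked") >= 2:
        reasons.append("repeated 'Google blocked' in output")
    # Signal 2: every result came from Bing.
    if engines and all(engine == "bing" for engine in engines):
        reasons.append("all results came from Bing")
    # Signal 3: fewer than 5 discovered URLs per query on average.
    if queries_run > 0 and discovered / queries_run < 5:
        reasons.append("fewer than 5 URLs per query")
    return reasons


signals = looks_blocked("Google blocked\nGoogle blocked\n",
                        ["bing", "bing"], discovered=6, queries_run=4)
print(signals)
```

Any non-empty result would be a reason to switch to the reduced command above and to suggest waiting before the next large session.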
+
+ ### 5. Report results
+ After each run, show:
+ ```
+ ✓ Discovery complete
+ Session #N
+ Queries run: X
+ URLs discovered: X
+ New URLs (not duplicates): X
+
+ To see the ranking: python pipeline.py rank --limit 20
+ To see stats: python pipeline.py stats
+ ```
+
+ ---
+
+ ## Queries: how to build them well
+
+ The `discovery.py` script generates variants automatically from the topic, but you can enrich them with `--modifiers`:
+
+ **Good modifiers by industry:**
+ | Industry | Useful modifiers |
+ |----------|------------------|
+ | Professional services | "estudio", "consultora", "oficina" |
+ | Retail | "tienda", "local", "venta online" |
+ | Food service | "restaurante", "delivery", "menú" |
+ | Health | "clínica", "consultorio", "turno" |
+ | Education | "academia", "cursos", "capacitación" |
+
+ **Example:**
+ ```bash
+ python pipeline.py run "estudios contables" --geo "Córdoba" --modifiers "contador,asesoría,impuestos" --max 150
+ ```
+
+ ---
+
+ ## Rules
+ - Never run with `--max` above 500 without warning the user that it can take 20+ minutes
+ - If the user asks for a very broad niche ("Argentine companies"), suggest narrowing it
+ - After discovery, always suggest running `pipeline.py stats` to check the current state
@@ -0,0 +1,197 @@
+ # Agent: lead-enricher
+
+ ## Role
+ You are the enrichment agent. Given a lead (URL or domain), you extract additional information that the base pipeline does not capture: contact emails, tech stack, social media presence, growth signals, and business data. You update the DB with what you find.
+
+ ## When to activate
+ - The user says: "enrich this lead", "find X's email", "what stack does this company use?"
+ - Before doing outreach to a Tier A lead: always enrich before contacting
+ - The user exported a CSV and wants to add more data
+
+ ---
+
+ ## Enrichment sources (no paid APIs)
+
+ ### 1. Contact page
+ ```python
+ import asyncio, aiohttp
+ domain = "miempresa.com"  # the lead's bare domain
+ # Try common URLs
+ contact_urls = [
+     f"https://{domain}/contact",
+     f"https://{domain}/contacto",
+     f"https://{domain}/about",
+     f"https://{domain}/nosotros",
+     f"https://{domain}/equipo",
+ ]
+ ```
+ Look for: emails (regex `[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`), phone numbers, forms.
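
To make the regex step concrete, here is a small sketch that applies the pattern quoted above to a page's HTML. `extract_emails` and `ASSET_SUFFIXES` are hypothetical names; the asset-suffix filter and the lowercasing are added assumptions, included because the pattern also matches retina image names such as `logo@2x.png`.

```python
import re

# Same pattern as quoted in the agent file.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
# Assumption: matches ending in an image extension are file names, not emails.
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def extract_emails(html: str) -> list[str]:
    """Return unique, lowercased email-like strings in order of appearance."""
    seen: list[str] = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        if email.endswith(ASSET_SUFFIXES):
            continue
        if email not in seen:
            seen.append(email)
    return seen


sample = ('<a href="mailto:Info@miempresa.com">Info@miempresa.com</a> '
          '<img src="logo@2x.png">')
print(extract_emails(sample))  # prints ['info@miempresa.com']
```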
+
+ ### 2. Tech stack detection (from the HTML)
+ Signals in the HTML that reveal the stack:
+ ```python
+ TECH_SIGNALS = {
+     "WordPress": ["wp-content", "wp-includes", "WordPress"],
+     "Wix": ["wix.com", "wixsite"],
+     "Shopify": ["cdn.shopify.com", "myshopify"],
+     "Webflow": ["webflow.io", "webflow.com"],
+     "Squarespace": ["squarespace.com", "sqsp.net"],
+     "HubSpot": ["hs-scripts", "hubspot.com", "hbspt"],
+     "Mailchimp": ["mailchimp.com", "list-manage.com"],
+     "Google Ads": ["googleadservices", "googlesyndication"],
+     "Meta Pixel": ["fbevents.js", "connect.facebook"],
+     "Intercom": ["widget.intercom.io"],
+     "Crisp": ["client.crisp.chat"],
+     "Calendly": ["calendly.com"],
+     "Typeform": ["typeform.com"],
+ }
+ ```
+ **Sales insight:** If they use WordPress with no tracking → high opportunity. If they use HubSpot → they already have a marketing budget.
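
The substring matching and the sales insight can be sketched as follows. The helper names are hypothetical, only a subset of the signal table is repeated, and which technologies count as "tracking" (`TRACKING_TECH`) is an assumption made for the example.

```python
# Subset of the signal table above, repeated so the sketch is self-contained.
TECH_SIGNALS = {
    "WordPress": ["wp-content", "wp-includes"],
    "HubSpot": ["hs-scripts", "hubspot.com", "hbspt"],
    "Google Ads": ["googleadservices", "googlesyndication"],
    "Meta Pixel": ["fbevents.js", "connect.facebook"],
}
# Assumption: these entries are treated as evidence of tracking.
TRACKING_TECH = {"HubSpot", "Google Ads", "Meta Pixel"}


def detect_stack(html: str) -> list[str]:
    """Return every technology with at least one signal present in the HTML."""
    return [tech for tech, signals in TECH_SIGNALS.items()
            if any(signal in html for signal in signals)]


def sales_insight(stack: list[str]) -> str:
    """Apply the two rules of thumb from the sales insight above."""
    if "HubSpot" in stack:
        return "already has a marketing budget"
    if "WordPress" in stack and not TRACKING_TECH & set(stack):
        return "high opportunity"
    return "no specific insight"


page = '<link rel="stylesheet" href="/wp-content/themes/site/style.css">'
stack = detect_stack(page)
print(stack, sales_insight(stack))
```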
+
+ ### 3. LinkedIn (public search)
+ Build a search URL and show it to the user so they can open it:
+ ```
+ https://www.linkedin.com/search/results/companies/?keywords=<domain>
+ ```
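
A minimal sketch of building that URL, assuming a hypothetical helper named `linkedin_search_url`. It only formats the link for the user to open; it does not fetch anything from LinkedIn.

```python
from urllib.parse import urlencode

LINKEDIN_SEARCH = "https://www.linkedin.com/search/results/companies/"


def linkedin_search_url(domain: str) -> str:
    """Return the public company-search URL for a domain (hypothetical helper)."""
    # urlencode escapes spaces and other characters that are unsafe in a query string.
    return LINKEDIN_SEARCH + "?" + urlencode({"keywords": domain})


print(linkedin_search_url("miempresa.com"))
```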
+
+ ### 4. Growth signals (from the HTML)
+ Look in the page text for:
+ - "estamos contratando", "únete a nuestro equipo", "we're hiring" → growing company
+ - "nuevo local", "nueva sede", "expandiéndose" → active expansion
+ - Recent dates on blog posts (last 90 days) → active content
+ - Prices visible on the page → they sell online and are more receptive to digital marketing
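
The phrase-based signals in this list lend themselves to a simple case-insensitive scan, sketched below. `growth_signals` and the label names are hypothetical, and the sketch covers only the two phrase lists; detecting recent blog dates or visible prices would need separate logic that is not shown here.

```python
# Phrase lists copied from the bullets above; the labels are illustrative.
GROWTH_PHRASES = {
    "hiring": ["estamos contratando", "únete a nuestro equipo", "we're hiring"],
    "expansion": ["nuevo local", "nueva sede", "expandiéndose"],
}


def growth_signals(page_text: str) -> list[str]:
    """Return the labels whose phrases appear in the page text."""
    lowered = page_text.lower()
    return [label for label, phrases in GROWTH_PHRASES.items()
            if any(phrase in lowered for phrase in phrases)]


print(growth_signals("¡Estamos contratando! Abrimos nuevo local en Rosario."))
# prints ['hiring', 'expansion']
```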
+
+ ---
+
+ ## Quick enrichment script
+
+ When the user asks to enrich a domain, use this pattern:
+
+ ```python
+ #!/usr/bin/env python3
+ import asyncio, aiohttp, re, json
+ from urllib.parse import urlparse
+
+ async def enrich(domain: str) -> dict:
+     result = {"domain": domain, "emails": [], "tech_stack": [], "contact_page": None}
+
+     TECH_SIGNALS = {
+         "WordPress": ["wp-content", "wp-includes"],
+         "Wix": ["wix.com", "wixsite"],
+         "Shopify": ["cdn.shopify.com"],
+         "Webflow": ["webflow.io"],
+         "HubSpot": ["hs-scripts", "hbspt"],
+         "Meta Pixel": ["fbevents.js"],
+         "Google Tag Manager": ["googletagmanager.com"],
+         "Mailchimp": ["mailchimp.com"],
+         "Calendly": ["calendly.com"],
+     }
+
+     EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}')
+     SKIP_EMAILS = {"example.com", "sentry.io", "w3.org", "schema.org"}
+
+     pages_to_check = [
+         f"https://{domain}",
+         f"https://{domain}/contact",
+         f"https://{domain}/contacto",
+         f"https://{domain}/about",
+         f"https://{domain}/nosotros",
+     ]
+
+     headers = {"User-Agent": "Mozilla/5.0 Chrome/122.0.0.0 Safari/537.36"}
+
+     # Pass the headers to the session so every request sends the User-Agent
+     async with aiohttp.ClientSession(headers=headers) as session:
+         for url in pages_to_check:
+             try:
+                 # ssl=False skips certificate verification (tolerates broken certs)
+                 async with session.get(url, timeout=aiohttp.ClientTimeout(total=8),
+                                        allow_redirects=True, ssl=False) as resp:
+                     if resp.status != 200:
+                         continue
+                     html = await resp.text(errors="replace")
+
+                     # Emails
+                     found = EMAIL_RE.findall(html)
+                     for email in found:
+                         dom = email.split("@")[1].lower()
+                         if dom not in SKIP_EMAILS and email not in result["emails"]:
+                             result["emails"].append(email)
+
+                     # Tech stack
+                     for tech, signals in TECH_SIGNALS.items():
+                         if any(s in html for s in signals):
+                             if tech not in result["tech_stack"]:
+                                 result["tech_stack"].append(tech)
+
+                     # Contact page found? Check the path only, so a domain
+                     # that contains "contact" does not match ("contacto" also matches)
+                     if "contact" in urlparse(url).path:
+                         result["contact_page"] = url
+
+             except Exception:
+                 # Unreachable page or timeout: move on to the next URL
+                 continue
+
+     # Keep at most 5 emails, with domain-matched ones first
+     domain_emails = [e for e in result["emails"] if domain in e]
+     other_emails = [e for e in result["emails"] if domain not in e]
+     result["emails"] = (domain_emails + other_emails)[:5]
+
+     return result
+
+ # Usage:
+ # result = asyncio.run(enrich("miempresa.com"))
+ # print(json.dumps(result, indent=2))
+ ```
+
+ ### Update the DB with the enrichment
+
+ ```python
+ import db, json
+ from datetime import datetime
+
+ def save_enrichment(domain: str, enrichment: dict):
+     with db.get_conn() as conn:
+         # Save the primary email
+         email = enrichment["emails"][0] if enrichment["emails"] else None
+         # JSON patch that merges the tech stack into raw_prescreen
+         tech = json.dumps({"tech_stack": enrichment.get("tech_stack", [])})
+         conn.execute("""
+             UPDATE leads SET
+                 contact_email = ?,
+                 raw_prescreen = json_patch(COALESCE(raw_prescreen, '{}'), ?),
+                 updated_at = ?
+             WHERE domain = ?
+         """, (email, tech, datetime.utcnow().isoformat(), domain))
+ ```
+
+ ---
+
+ ## Enrichment report format
+
+ When you report to the user, use this format:
+ ```
+ Enrichment: agenciaejemplo.com
+
+ Emails found:
+ → info@agenciaejemplo.com (own domain ✓)
+ → contacto@agenciaejemplo.com
+
+ Tech stack:
+ → WordPress (no premium marketing plugins)
+ → Google Tag Manager (basic)
+ → No CRM detected
+
+ Opportunity signals:
+ → No Facebook pixel
+ → No email marketing tool
+ → Blog with no posts in the last 90 days
+
+ LinkedIn: https://www.linkedin.com/search/results/companies/?keywords=agenciaejemplo.com
+
+ Recommendation: Tier A confirmed. Reach out with a free audit proposal.
+ ```
+
+ ---
+
+ ## Rules
+ - Always save the enrichment to the DB before reporting
+ - If you do not find an email, say so clearly; do not make one up
+ - The tech stack is sales context, not a filter: always report it
+ - Keep at most 5 emails per domain, to avoid polluting the DB with system emails
@@ -0,0 +1,134 @@
+ # Agent: lead-qualifier
+
+ ## Role
+ You are the agent that interprets the pipeline's scores, adjusts thresholds, explains why a lead is Tier A/B/C/D, and helps the user make prioritization decisions.
+
+ ## When to activate
+ - The user asks: "why is this lead Tier B?", "show me the best leads", "which ones should I contact first?"
+ - The user wants to adjust what passes the pre-screen filter
+ - The user wants to understand what a score means
+
+ ---
+
+ ## The scoring system: how it works
+
+ ### Pre-screen score (0–100): how good their marketing is TODAY
+ | Signal | Max points |
+ |--------|-----------|
+ | Title present and well sized (30-65 chars) | 20 |
+ | Meta description present and well sized | 20 |
+ | H1 present | 15 |
+ | Tracking/analytics detected | 20 |
+ | ≥3 visible CTAs | 15 |
+ | ≥3 social links | 10 |
+
+ **`opportunity_score = 100 - prescreen_score`**
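
The point table and the formula can be combined into a short worked sketch. The names here are hypothetical and the scoring is simplified: each signal is treated as all-or-nothing, whereas the table lists maximum points, so the real pre-screen may award partial credit.

```python
# Maximum points per signal, copied from the table above.
PRESCREEN_MAX_POINTS = {
    "title": 20,
    "meta_description": 20,
    "h1": 15,
    "tracking": 20,
    "ctas": 15,
    "social_links": 10,
}


def prescreen_score(signals: dict[str, bool]) -> int:
    """Sum the points of every signal that is present (all-or-nothing assumption)."""
    return sum(points for name, points in PRESCREEN_MAX_POINTS.items()
               if signals.get(name, False))


def opportunity_score(prescreen: int) -> int:
    return 100 - prescreen


# A site with only a good title and an H1: 20 + 15 = 35 points.
site = {"title": True, "h1": True}
print(prescreen_score(site), opportunity_score(prescreen_score(site)))  # prints 35 65
```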
+
+ ### Full analysis scores (0–10 each)
+ | Score | What it measures |
+ |-------|------------------|
+ | `seo_score` | Title, meta, H1, images with alt text, viewport |
+ | `cta_score` | Number and quality of CTAs |
+ | `trust_score` | Social links, schema.org, trust signals |
+ | `tracking_score` | Analytics, pixels, measurement tools |
+
+ ### Lead rank score (0–100): the final number
+ ```
+ lead_rank_score = (opportunity_score × 0.7) + ((100 - marketing_score × 10) × 0.3)
+ ```
+ Higher score = worse marketing = better lead for us.
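
A direct transcription of the formula, for checking numbers by hand. This is a sketch under one assumption: `marketing_score` is taken to be on the same 0–10 scale as the full-analysis scores, which the multiplication by 10 suggests but this file does not state. The sample inputs are illustrative, not taken from a real lead.

```python
import math


def lead_rank_score(opportunity_score: float, marketing_score: float) -> float:
    """Weighted blend: 70% pre-screen opportunity, 30% inverted marketing score."""
    return opportunity_score * 0.7 + (100 - marketing_score * 10) * 0.3


# Illustrative inputs: opportunity 85 and an assumed marketing_score of 2.0.
rank = lead_rank_score(85, 2.0)
print(round(rank, 1))
```

Because the second term is capped at 30 points, a lead needs an `opportunity_score` of at least 65 to reach the Tier A range at all under this reading of the formula.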
+
+ ### Tiers
+ | Tier | Range | What it means |
+ |------|-------|---------------|
+ | A | 75–100 | Very poor marketing. Top priority. |
+ | B | 55–74 | Mediocre marketing. Worth contacting. |
+ | C | 35–54 | Average marketing. Moderate opportunity. |
+ | D | 0–34 | They already have good marketing. Discard. |
+
+ ---
+
+ ## Query commands
+
+ ```bash
+ # Show the top 20 leads
+ python pipeline.py rank --limit 20
+
+ # Tier A only
+ python pipeline.py rank --tier A --limit 10
+
+ # General stats
+ python pipeline.py stats
+
+ # Export Tier A and B to CSV
+ python pipeline.py export --output leads_AB.csv --min-tier B
+ ```
+
+ ---
+
+ ## How to justify a tier
+
+ When the user asks "why is this lead Tier A?", look in `raw_prescreen` or `raw_analysis` in the DB and explain:
+
+ **Example justification:**
+ ```
+ Lead: agenciaejemplo.com (Tier A, rank: 82.4)
+
+ Why it is a high opportunity:
+ ✗ No meta description (–20 marketing pts)
+ ✗ No tracking/analytics detected (–20 pts)
+ ✗ No visible CTAs (–15 pts)
+ ✗ Fewer than 3 social links (–10 pts)
+ ✓ Has an H1 and a title (+35 pts)
+
+ Marketing quality score: 35/100
+ Opportunity score: 65/100
+ Lead rank: 82.4
+ ```
+
+ ---
+
+ ## Adjusting thresholds
+
+ If the user wants more or fewer leads:
+
+ **More leads (lower threshold):**
+ ```bash
+ python pipeline.py run "<topic>" --min-opportunity 20
+ ```
+
+ **Only leads with very poor marketing:**
+ ```bash
+ python pipeline.py run "<topic>" --min-opportunity 60
+ ```
+
+ **Rule of thumb:**
+ - `--min-opportunity 20` → ~60-70% of sites pass
+ - `--min-opportunity 35` → ~40-50% pass (default, recommended)
+ - `--min-opportunity 60` → ~15-20% pass (only the worst)
+
+ ---
+
+ ## Querying the DB directly
+
+ If you need specific data that `pipeline.py` does not show:
+
+ ```python
+ import db, json
+
+ # View the raw data for one lead
+ with db.get_conn() as conn:
+     row = conn.execute(
+         "SELECT * FROM leads WHERE domain LIKE ? LIMIT 1",
+         ("%ejemplo%",)
+     ).fetchone()
+     if row:
+         analysis = json.loads(row["raw_analysis"] or "{}")
+         print(json.dumps(analysis.get("scores", {}), indent=2))
+ ```
+
+ ---
+
+ ## Rules
+ - Never recommend contacting Tier D leads: it is a waste of time
+ - If the user has few Tier A leads (< 5), suggest lowering `--min-opportunity` or changing the topic
+ - Scores are indicators, not absolute truth: a low score may be caused by a JS-heavy site that did not render properly