@opendirectory.dev/skills 0.1.32 → 0.1.34

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38)
  1. package/package.json +1 -1
  2. package/registry.json +10 -0
  3. package/skills/blog-cover-image-cli/README.md +18 -20
  4. package/skills/brand-alchemy/README.md +18 -20
  5. package/skills/claude-md-generator/README.md +18 -20
  6. package/skills/cold-email-verifier/README.md +18 -20
  7. package/skills/cook-the-blog/README.md +18 -20
  8. package/skills/dependency-update-bot/README.md +18 -20
  9. package/skills/docs-from-code/README.md +18 -20
  10. package/skills/explain-this-pr/README.md +18 -20
  11. package/skills/google-trends-api-skills/README.md +18 -20
  12. package/skills/hackernews-intel/README.md +18 -20
  13. package/skills/human-tone/README.md +18 -20
  14. package/skills/kill-the-standup/README.md +18 -20
  15. package/skills/linkedin-post-generator/README.md +18 -20
  16. package/skills/llms-txt-generator/README.md +18 -20
  17. package/skills/meeting-brief-generator/README.md +18 -20
  18. package/skills/meta-ads-skill/README.md +18 -20
  19. package/skills/newsletter-digest/README.md +18 -20
  20. package/skills/noise2blog/README.md +18 -20
  21. package/skills/outreach-sequence-builder/README.md +18 -20
  22. package/skills/position-me/README.md +18 -20
  23. package/skills/pr-description-writer/README.md +18 -20
  24. package/skills/pricing-page-psychology-audit/README.md +18 -20
  25. package/skills/producthunt-launch-kit/README.md +18 -20
  26. package/skills/reddit-icp-monitor/README.md +18 -20
  27. package/skills/reddit-post-engine/README.md +18 -20
  28. package/skills/schema-markup-generator/README.md +18 -20
  29. package/skills/show-hn-writer/README.md +18 -20
  30. package/skills/tweet-thread-from-blog/README.md +18 -20
  31. package/skills/twitter-GTM-find-skill/README.md +18 -20
  32. package/skills/vc-finder/.env.example +18 -0
  33. package/skills/vc-finder/README.md +113 -0
  34. package/skills/vc-finder/SKILL.md +663 -0
  35. package/skills/vc-finder/evals/evals.json +125 -0
  36. package/skills/vc-finder/references/stage-signals.md +98 -0
  37. package/skills/vc-finder/references/vc-outreach-guide.md +142 -0
  38. package/skills/yc-intent-radar-skill/README.md +18 -20
@@ -0,0 +1,663 @@
---
name: vc-finder
description: Takes a startup product URL or description, detects the industry and funding stage, identifies 5 comparable funded companies, searches who invested in those companies (Track A), finds VCs who publish investment theses about this space (Track B), and returns a ranked sourced list of relevant investors with deep-dives and outreach hooks. Use when asked to find investors for a startup, identify which VCs fund products like mine, research who backs companies in my space, build a VC target list, or find investor-market fit. Trigger when a user says "find VCs for my startup", "who invests in my space", "build me a VC list", "which funds should I pitch", "find investors for my product", "who backed companies like mine", or "help me find venture capital".
compatibility: [claude-code, gemini-cli, github-copilot]
---

# VC Finder

Take a product URL or description. Detect industry and stage. Find 5 comparable funded companies. Run two research tracks: who invested in those comparables (Track A), and which VCs publish theses about this space (Track B). Return a sourced, ranked investor list with outreach hooks.

---

**Critical rule:** Every VC in Track A must include the specific comparable company they backed as evidence. Every VC in Track B must include the exact article or post title where they stated their thesis. If a VC name did not appear in Tavily search results, do not include them. No hallucinated fund names.

---

## Common Mistakes

| The agent will want to... | Why that's wrong |
|---|---|
| Add a16z or Sequoia because they are famous | A famous VC without evidence is noise. Only include VCs that appear in Tavily search results for this specific product. Name-dropping wastes the founder's time. |
| Continue when all 5 Track A searches return 0 results | Zero Track A results means the comparables were wrong or too obscure. Stop, regenerate comparables with broader known names, and retry. Continuing produces an evidence-free list. |
| Include a Track B VC without citing the article or post | Thesis without a source is indistinguishable from hallucination. The founder cannot verify it and the list loses all credibility. |
| Detect stage from website aesthetics ("site looks polished") | Stage must come from the specific CTA signals detected in Step 4. Aesthetic guessing sends founders to wrong-stage investors. |
| Write generic outreach hooks like "highlight your traction" | Every outreach hook must name this specific product's differentiator and a specific VC portfolio signal. Generic hooks are removed by the QA step. |
| Skip the URL fetch when the user also provides a description | Always fetch the URL. The live page often reveals stage signals (pricing CTAs, customer logos, job openings) that the user's description omits. |

---

## Step 1: Setup Check

```bash
echo "GEMINI_API_KEY: ${GEMINI_API_KEY:+set}"
echo "TAVILY_API_KEY: ${TAVILY_API_KEY:+set}"
echo "FIRECRAWL_API_KEY: ${FIRECRAWL_API_KEY:-not set, Tavily extract will be used as fallback}"
```

**If GEMINI_API_KEY is missing:** Stop. Tell the user: "GEMINI_API_KEY is required for product analysis and VC synthesis. Get it at aistudio.google.com. Add it to your .env file."

**If TAVILY_API_KEY is missing:** Stop. Tell the user: "TAVILY_API_KEY is required to research VC investments and theses. There is no fallback for this. Get it at app.tavily.com. Free tier: 1000 credits/month (about 125 full runs). Add it to your .env file."

**If only FIRECRAWL_API_KEY is missing:** Continue silently. Tavily extract will be used for the URL fetch.

---

## Step 2: Gather Input

You need:
- Product URL (required, unless user pastes a product description directly)
- Optional: target stage hint (pre-seed, seed, series-a, series-b) -- if provided, use it and skip stage detection
- Optional: geography preference (US, Europe, global) -- defaults to US if not specified

**If the user provides only a pasted description (no URL):** Skip Steps 3-4. Go directly to Step 5 with the pasted text as `product_content`. Set `stage_source` to `user_description`.

**If neither URL nor description is provided:** Ask: "What is the URL of your product or startup? Or paste a short description: what it does, who it is for, and what stage you are at (pre-seed, seed, Series A)."

Derive product slug from URL for the output filename:

```bash
PRODUCT_SLUG=$(python3 -c "
from urllib.parse import urlparse
url = 'URL_HERE'
host = urlparse(url).netloc.replace('www.', '')
print(host.split('.')[0])
")
```

---

## Step 3: Fetch Product Page

**Primary: Firecrawl (if FIRECRAWL_API_KEY is set)**

```bash
curl -s -X POST https://api.firecrawl.dev/v1/scrape \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "URL_HERE", "formats": ["markdown"], "onlyMainContent": true}' \
  | python3 -c "
import sys, json
d = json.load(sys.stdin)
content = d.get('data', {}).get('markdown', '') or d.get('markdown', '')
print(f'Fetched: {len(content)} characters')
open('/tmp/vc-product-raw.md', 'w').write(content)
"
```

**Fallback: Tavily extract (if FIRECRAWL_API_KEY is not set)**

```bash
curl -s -X POST https://api.tavily.com/extract \
  -H "Content-Type: application/json" \
  -d "{\"api_key\": \"$TAVILY_API_KEY\", \"urls\": [\"URL_HERE\"]}" \
  | python3 -c "
import sys, json
d = json.load(sys.stdin)
content = d.get('results', [{}])[0].get('raw_content', '')
print(f'Fetched via Tavily extract: {len(content)} characters')
open('/tmp/vc-product-raw.md', 'w').write(content)
"
```

**Step-level checkpoint:**

```bash
python3 -c "
content = open('/tmp/vc-product-raw.md').read()
if len(content) < 200:
    print('ERROR: Page returned fewer than 200 characters.')
else:
    print(f'Content OK: {len(content)} characters')
"
```

**If content < 200 characters:** Stop fetching. Tell the user: "The product page returned no readable content. This usually means the site is JavaScript-rendered and requires a browser. Please paste your product description directly: what it does, who it is for, and what stage you are at."

Proceed to Step 5 using the pasted description as `product_content`.

---

## Step 4: Detect Stage Signals Locally (No API)

Parse the fetched markdown with regex before any API call. This gives Gemini anchored evidence rather than asking it to guess from aesthetics.

```bash
python3 << 'PYEOF'
import re, json

content = open('/tmp/vc-product-raw.md').read().lower()
stage_signals = []

# Pre-seed signals
if re.search(r'join\s+(the\s+)?waitlist|sign\s+up\s+for\s+beta|early\s+access|request\s+(an?\s+)?invite|get\s+notified', content):
    stage_signals.append({'signal': 'waitlist or beta CTA', 'stage_hint': 'pre-seed'})

# Seed signals
if re.search(r'start\s+(your\s+)?free\s+trial|try\s+(it\s+)?for\s+free|request\s+a?\s+demo|book\s+a?\s+demo|schedule\s+a?\s+demo', content):
    stage_signals.append({'signal': 'free trial or demo CTA', 'stage_hint': 'seed'})

# Series A signals
if re.search(r'contact\s+sales|talk\s+to\s+(our\s+)?sales|see\s+pricing|view\s+pricing|plans\s+and\s+pricing', content):
    stage_signals.append({'signal': 'pricing or sales CTA', 'stage_hint': 'series-a'})
if re.search(r'case\s+stud(y|ies)|customer\s+stor(y|ies)|trusted\s+by\s+[\d,]+|used\s+by\s+[\d,]+', content):
    stage_signals.append({'signal': 'case studies or customer count', 'stage_hint': 'series-a'})

# Series A/B signals
if re.search(r'enterprise\s+(plan|pricing|tier)|we.?re\s+hiring|join\s+our\s+team|open\s+positions', content):
    stage_signals.append({'signal': 'enterprise tier or job openings', 'stage_hint': 'series-a-or-b'})

# Funding announcement -- extract directly if present
funding_match = re.search(
    r'raised\s+\$[\d,.]+\s*[mk]?|series\s+[abc]\s+round|seed\s+round|(\$[\d,.]+\s*[mk]?\s+(?:seed|series\s+[abc]))',
    content
)
if funding_match:
    stage_signals.append({'signal': f'funding text: {funding_match.group(0).strip()}', 'stage_hint': 'announced'})

# Determine dominant stage
if not stage_signals:
    dominant = 'unknown'
elif any(s['stage_hint'] == 'announced' for s in stage_signals):
    dominant = 'announced'
elif any(s['stage_hint'] in ('series-a-or-b', 'series-a') for s in stage_signals):
    dominant = 'series-a'
elif any(s['stage_hint'] == 'seed' for s in stage_signals):
    dominant = 'seed'
else:
    dominant = 'pre-seed'

confidence = 'high' if len(stage_signals) >= 2 else ('medium' if len(stage_signals) == 1 else 'low')

result = {'signals': stage_signals, 'dominant_stage': dominant, 'confidence': confidence}
json.dump(result, open('/tmp/vc-stage-signals.json', 'w'), indent=2)
print(f'Stage: {dominant} ({confidence} confidence) from {len(stage_signals)} signal(s)')
for s in stage_signals:
    print(f'  - {s["signal"]} -> {s["stage_hint"]}')
PYEOF
```

---

## Step 5: Product Analysis with Gemini

```bash
python3 << 'PYEOF'
import json

product_content = open('/tmp/vc-product-raw.md').read()[:6000]
stage_signals = json.load(open('/tmp/vc-stage-signals.json'))

request = {
    "system_instruction": {
        "parts": [{
            "text": "You are a venture capital analyst. Analyze a product page and return structured JSON only. No commentary. No em dashes. Vague category labels like 'technology' or 'software' alone are not acceptable at L2 or L3 -- be specific. Comparable companies must be real funded companies with public funding records, well-known enough to appear in press coverage."
        }]
    },
    "contents": [{
        "parts": [{
            "text": f"""Analyze this product page and return a JSON object with exactly these keys:

1. product_name: string
2. one_line_description: string -- what it does, for whom, core value prop. Under 20 words. No marketing language.
3. industry_taxonomy: object with:
   - l1: top-level (e.g. "software", "fintech", "healthtech", "consumer", "hardware")
   - l2: sector (e.g. "developer tools", "sales technology", "edtech", "logistics software")
   - l3: specific niche (e.g. "CI/CD automation", "outbound prospecting", "last-mile routing")
4. icp: object with:
   - buyer_persona: job title (e.g. "VP Engineering", "founder", "sales ops manager")
   - company_type: (e.g. "B2B SaaS", "e-commerce brand", "enterprise IT team")
   - company_size: (e.g. "5-50 employees", "50-500 employees", "enterprise")
5. detected_stage: one of: pre-seed, seed, series-a, series-b, unknown
6. stage_confidence: one of: high, medium, low
7. stage_evidence: one sentence citing exactly which CTA or text on the page drove this classification. Write "no clear signals found" if unknown.
8. comparable_companies: array of exactly 5 objects, each with:
   - name: real company name (must have public VC funding records)
   - similarity_reason: one sentence why this company is comparable to the product
   - estimated_stage: their funding stage as of your knowledge cutoff
9. geography_bias: one of: US, Europe, global, unclear -- infer from page text

Stage signals detected from the page (use as input to your stage classification):
{json.dumps(stage_signals, indent=2)}

Product page content:
{product_content}"""
        }]
    }],
    "generationConfig": {
        "temperature": 0.2,
        "maxOutputTokens": 3000
    }
}

json.dump(request, open('/tmp/vc-analysis-request.json', 'w'))
PYEOF

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/vc-analysis-request.json \
  | python3 -c "
import sys, json
d = json.load(sys.stdin)
text = d['candidates'][0]['content']['parts'][0]['text'].strip()
if text.startswith('\`\`\`'):
    text = '\n'.join(text.split('\n')[1:-1])
analysis = json.loads(text)
json.dump(analysis, open('/tmp/vc-product-analysis.json', 'w'), indent=2)
print('Product analysis complete.')
print('Product:', analysis['product_name'])
print('Industry:', analysis['industry_taxonomy']['l1'], '>', analysis['industry_taxonomy']['l2'], '>', analysis['industry_taxonomy']['l3'])
print('Stage:', analysis['detected_stage'], '(' + analysis['stage_confidence'] + ' confidence)')
print('Comparables:', ', '.join(c['name'] for c in analysis['comparable_companies']))
"
```

**If Gemini returns empty or JSON parsing fails:** Retry once with `maxOutputTokens` reduced to 2000. If retry also fails: Stop. Tell the user: "Product analysis failed. Please paste a direct description (3-5 sentences: what it does, who it is for, current stage) and run again."

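The fence-stripping and JSON-parsing logic used in this step (and again in Step 8) can be factored into one small helper; a minimal sketch, where the function name `parse_gemini_json` is illustrative, not part of the skill:

```python
import json

def parse_gemini_json(text: str) -> dict:
    """Strip an optional markdown code fence from a model response, then parse JSON."""
    text = text.strip()
    if text.startswith('```'):
        # Drop the opening fence line (e.g. ```json) and the closing fence line.
        text = '\n'.join(text.split('\n')[1:-1])
    return json.loads(text)

# Works on both fenced and bare responses:
fenced = '```json\n{"product_name": "Acme"}\n```'
bare = '{"product_name": "Acme"}'
print(parse_gemini_json(fenced) == parse_gemini_json(bare))  # → True
```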
---

## Step 6: Track A -- Who Invested in Comparable Companies

Run 5 Tavily searches, one per comparable. Save all results to a single file.

```bash
python3 << 'PYEOF'
import json, os, urllib.request

analysis = json.load(open('/tmp/vc-product-analysis.json'))
comparables = analysis['comparable_companies']
tavily_key = os.environ.get('TAVILY_API_KEY', '')
all_track_a = []

for comp in comparables:
    company = comp['name']
    query = f'"{company}" investors funding venture capital backed seed series'

    payload = json.dumps({
        "api_key": tavily_key,
        "query": query,
        "search_depth": "advanced",
        "max_results": 5,
        "include_answer": True
    }).encode()

    req = urllib.request.Request(
        'https://api.tavily.com/search',
        data=payload,
        headers={'Content-Type': 'application/json'},
        method='POST'
    )

    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            result = json.loads(resp.read())
        all_track_a.append({
            'comparable_company': company,
            'similarity_reason': comp['similarity_reason'],
            'query': query,
            'answer': result.get('answer', ''),
            'results': result.get('results', [])
        })
        print(f'Track A - {company}: {len(result.get("results", []))} results')
    except Exception as e:
        print(f'Track A - {company}: FAILED ({e})')
        all_track_a.append({
            'comparable_company': company,
            'similarity_reason': comp['similarity_reason'],
            'query': query,
            'answer': '',
            'results': [],
            'error': str(e)
        })

json.dump(all_track_a, open('/tmp/vc-tracka-results.json', 'w'), indent=2)
print(f'Track A complete. Comparables with results: {sum(1 for r in all_track_a if r.get("results"))}')
PYEOF
```

**If all 5 Track A searches return 0 results:** Tell the user: "No funding data found for the comparable companies. This usually means the comparables are too early-stage or obscure for public press coverage. I will retry with broader comparable names." Then re-run Step 5 with a note to Gemini to choose "well-funded companies with significant press coverage" and retry Step 6.

If the retry also returns 0 results: proceed to Track B only, and flag this in `data_quality_flags`.

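The zero-result condition can be checked deterministically from the saved Track A file rather than eyeballed; a minimal sketch (the helper name is illustrative):

```python
import json

def comparables_with_results(track_a: list) -> int:
    """Count Track A entries whose Tavily search returned at least one result."""
    return sum(1 for item in track_a if item.get('results'))

# Sample data in the shape written to /tmp/vc-tracka-results.json:
track_a = [
    {'comparable_company': 'CompA', 'results': [{'title': 'CompA raises $4M seed'}]},
    {'comparable_company': 'CompB', 'results': []},
]
hits = comparables_with_results(track_a)
if hits == 0:
    print('Zero Track A results: regenerate comparables and retry')
else:
    print(f'{hits} of {len(track_a)} comparables have funding evidence')
```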
---

## Step 7: Track B -- VCs With Investment Theses About This Space

Run 3 Tavily searches using the L2 and L3 taxonomy from Step 5.

```bash
python3 << 'PYEOF'
import json, os, urllib.request

analysis = json.load(open('/tmp/vc-product-analysis.json'))
l2 = analysis['industry_taxonomy']['l2']
l3 = analysis['industry_taxonomy']['l3']
stage = analysis['detected_stage']
tavily_key = os.environ.get('TAVILY_API_KEY', '')

queries = [
    {
        'name': 'thesis_l3',
        'query': f'venture capital investment thesis "{l3}" investing 2023 OR 2024 OR 2025'
    },
    {
        'name': 'thesis_l2',
        'query': f'VC fund "{l2}" investment thesis portfolio companies'
    },
    {
        'name': 'stage_space',
        'query': f'{stage} investors "{l3}" startup venture capital fund'
    }
]

all_track_b = []

for q in queries:
    payload = json.dumps({
        "api_key": tavily_key,
        "query": q['query'],
        "search_depth": "advanced",
        "max_results": 7,
        "include_answer": True
    }).encode()

    req = urllib.request.Request(
        'https://api.tavily.com/search',
        data=payload,
        headers={'Content-Type': 'application/json'},
        method='POST'
    )

    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            result = json.loads(resp.read())
        all_track_b.append({
            'query_name': q['name'],
            'query': q['query'],
            'answer': result.get('answer', ''),
            'results': result.get('results', [])
        })
        print(f"Track B - {q['name']}: {len(result.get('results', []))} results")
    except Exception as e:
        print(f"Track B - {q['name']}: FAILED ({e})")
        all_track_b.append({
            'query_name': q['name'],
            'query': q['query'],
            'answer': '',
            'results': [],
            'error': str(e)
        })

json.dump(all_track_b, open('/tmp/vc-trackb-results.json', 'w'), indent=2)
PYEOF
```

**If all 3 Track B searches return 0 results:** Proceed with Track A results only. Note in `data_quality_flags`: "No thesis-led investors found via public search. Try checking Substack manually for VC newsletters covering this niche."

---

## Step 8: Gemini Synthesis -- Rank and Score All VCs

```bash
python3 << 'PYEOF'
import json

analysis = json.load(open('/tmp/vc-product-analysis.json'))
track_a = json.load(open('/tmp/vc-tracka-results.json'))
track_b = json.load(open('/tmp/vc-trackb-results.json'))

# Compress results to stay within token limits
track_a_summary = []
for item in track_a:
    snippets = [{'title': r.get('title', ''), 'url': r.get('url', ''), 'content': r.get('content', '')[:400]}
                for r in item.get('results', [])[:3]]
    track_a_summary.append({
        'comparable_company': item['comparable_company'],
        'similarity_reason': item['similarity_reason'],
        'answer': item.get('answer', '')[:500],
        'top_results': snippets
    })

track_b_summary = []
for item in track_b:
    snippets = [{'title': r.get('title', ''), 'url': r.get('url', ''), 'content': r.get('content', '')[:400]}
                for r in item.get('results', [])[:4]]
    track_b_summary.append({
        'query_name': item['query_name'],
        'answer': item.get('answer', '')[:500],
        'top_results': snippets
    })

context = {
    'product': {
        'name': analysis['product_name'],
        'description': analysis['one_line_description'],
        'industry': analysis['industry_taxonomy'],
        'icp': analysis['icp'],
        'stage': analysis['detected_stage'],
        'stage_confidence': analysis['stage_confidence'],
        'geography': analysis['geography_bias']
    },
    'track_a_research': track_a_summary,
    'track_b_research': track_b_summary
}

request = {
    "system_instruction": {
        "parts": [{
            "text": """You are a venture capital research analyst. Synthesize investor research into a sourced, ranked list. Follow these rules exactly:
1. Only include VCs whose names appear in the provided Tavily search results. Do not add VCs not mentioned in the data.
2. Every Track A VC must have evidence_company: the specific comparable company they backed (required -- omit the VC if you cannot confirm this).
3. Every Track B VC must have thesis_source_title: the exact article or page title where they stated their thesis (required -- omit the VC if you cannot confirm this).
4. stage_fit_score 1-10: penalize 3 points if the VC's typical stage does not match the product's detected stage.
5. space_fit_score 1-10: only give 9-10 if the VC backed 2+ companies in this specific L3 niche.
6. check_size: use ranges from search result data only. If not found, write "not in search data".
7. approach_method: one of -- cold email, warm intro required, AngelList, application form, Twitter/X DM. Infer from what is publicly known about this fund's intake process.
8. outreach_hook: must reference this specific product's differentiator and a named VC portfolio signal or thesis quote. Generic hooks like 'highlight your traction' are not acceptable.
9. No em dashes anywhere in output.
10. No marketing language."""
        }]
    },
    "contents": [{
        "parts": [{
            "text": f"""Synthesize this VC research for the product below. Return a JSON object with exactly these keys:

1. product_summary: object with name, one_line_description, industry_l1, industry_l2, industry_l3, detected_stage, comparable_companies_used (array of names)

2. track_a_vcs: array of VC objects from Track A research. Each object:
   - fund_name, evidence_company (REQUIRED), evidence_source_url, stage_focus, check_size, thesis_summary (1-2 sentences), stage_fit_score (1-10), space_fit_score (1-10), approach_method

3. track_b_vcs: array of VC objects from Track B research. Each object:
   - fund_name, thesis_source_title (REQUIRED), thesis_source_url, stage_focus, check_size, thesis_summary (1-2 sentences), stage_fit_score (1-10), space_fit_score (1-10), approach_method

4. top_5_deep_dives: array of exactly 5 objects (the 5 highest combined score VCs across both tracks). Each:
   - fund_name, track ("A" or "B"), fund_overview (2-3 sentences), why_fit (2-3 sentences specific to this product's L3 niche), portfolio_in_space (array of 1-3 names from search data only), how_to_approach (specific steps, min 30 chars), outreach_hook (2-3 sentences, product-specific)

5. outreach_hooks: array of exactly 3 objects:
   - hook_type (e.g. "portfolio overlap angle", "thesis language mirror", "comparable exit angle"), hook_text (2-3 sentences a founder would actually send), best_for (which VC type this works for)

6. data_quality_flags: array of strings noting any gaps or low-confidence areas

Research data:
{json.dumps(context, indent=2)}"""
        }]
    }],
    "generationConfig": {
        "temperature": 0.3,
        "maxOutputTokens": 6000
    }
}

json.dump(request, open('/tmp/vc-synthesis-request.json', 'w'))
print('Synthesis request prepared.')
PYEOF

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/vc-synthesis-request.json \
  | python3 -c "
import sys, json
d = json.load(sys.stdin)
text = d['candidates'][0]['content']['parts'][0]['text'].strip()
if text.startswith('\`\`\`'):
    text = '\n'.join(text.split('\n')[1:-1])
result = json.loads(text)
json.dump(result, open('/tmp/vc-final-list.json', 'w'), indent=2)
print(f'Synthesis complete. Track A: {len(result.get(\"track_a_vcs\", []))} VCs. Track B: {len(result.get(\"track_b_vcs\", []))} VCs.')
"
```

**If Gemini returns empty or JSON parsing fails:** Retry once with `maxOutputTokens` reduced to 4000. If retry also fails: present whatever partial JSON was returned, mark missing sections `[INCOMPLETE]`, and tell the user: "Synthesis incomplete. The research data may have been too large. Try running again."

---

## Step 9: Self-QA

Run before presenting. Remove non-evidenced VCs structurally.

```bash
python3 << 'PYEOF'
import json

result = json.load(open('/tmp/vc-final-list.json'))
failures = []

# Remove Track A VCs missing evidence_company
original_a = len(result.get('track_a_vcs', []))
result['track_a_vcs'] = [v for v in result.get('track_a_vcs', []) if v.get('evidence_company')]
removed_a = original_a - len(result['track_a_vcs'])
if removed_a > 0:
    failures.append(f'Removed {removed_a} Track A VC(s) missing evidence_company')

# Remove Track B VCs missing thesis_source_title
original_b = len(result.get('track_b_vcs', []))
result['track_b_vcs'] = [v for v in result.get('track_b_vcs', []) if v.get('thesis_source_title')]
removed_b = original_b - len(result['track_b_vcs'])
if removed_b > 0:
    failures.append(f'Removed {removed_b} Track B VC(s) missing thesis_source_title')

# Check top 5 deep dives
dives = result.get('top_5_deep_dives', [])
if len(dives) < 5:
    failures.append(f'Only {len(dives)} deep dives (expected 5) -- insufficient search data')
for dd in dives:
    if not dd.get('how_to_approach') or len(dd.get('how_to_approach', '')) < 30:
        dd['how_to_approach'] = 'Approach method not determinable from search data. Check the fund website directly for application instructions.'
        failures.append(f"Fixed: '{dd.get('fund_name')}' had missing how_to_approach")

# Check outreach hooks count
if len(result.get('outreach_hooks', [])) != 3:
    failures.append(f"Expected 3 outreach hooks, got {len(result.get('outreach_hooks', []))}")

# Check for em dashes (U+2014) and replace them with double hyphens.
# ensure_ascii=False keeps the character literal so the membership test works.
dumped = json.dumps(result, ensure_ascii=False)
if '\u2014' in dumped:
    result = json.loads(dumped.replace('\u2014', '--'))
    failures.append('Fixed: em dash characters removed from output')

# Check for forbidden words
forbidden = ['powerful', 'robust', 'seamless', 'innovative', 'game-changing', 'streamline', 'leverage', 'transform']
full_text = json.dumps(result).lower()
for word in forbidden:
    if word in full_text:
        failures.append(f"Warning: forbidden word '{word}' found in output -- review before presenting")

# Ensure data_quality_flags exists
if 'data_quality_flags' not in result:
    result['data_quality_flags'] = []
result['data_quality_flags'].extend(failures)

json.dump(result, open('/tmp/vc-final-list.json', 'w'), indent=2)
print(f'QA complete. Issues addressed: {len(failures)}')
for f in failures:
    print(f'  - {f}')
if not failures:
    print('All QA checks passed.')
PYEOF
```

---

## Step 10: Save and Present Output

```bash
DATE=$(date +%Y-%m-%d)
OUTPUT_FILE="docs/vc-intel/${PRODUCT_SLUG}-${DATE}.md"
mkdir -p docs/vc-intel
```

Present the final output:

```
## VC Finder: [product_name]
Date: [today] | Stage: [detected_stage] ([stage_confidence] confidence) | Geography: [geography_bias]

---

### Product Analysis

What it does: [one_line_description]
Industry: [l1] > [l2] > [l3]
Buyer: [buyer_persona] at [company_type], [company_size]
Comparable companies used for research: [comma-separated list]

---

### Track A: VCs Who Backed Similar Companies

*These investors have already written a check in this space.*

| Fund | Backed Comparable | Stage Focus | Check Size | Fit Score | Approach |
|---|---|---|---|---|---|
[one row per Track A VC, sorted by space_fit_score descending]

---

### Track B: Thesis-Led Investors

*These investors are actively publishing about this space.*

| Fund | Thesis Source | Stage Focus | Check Size | Fit Score | Approach |
|---|---|---|---|---|---|
[one row per Track B VC, sorted by space_fit_score descending]

---

### Top 5 Deep Dives

#### [N]. [Fund Name] (Track [A/B])

Overview: [fund_overview]
Why it fits: [why_fit]
Portfolio in this space: [names, or "Not found in search data"]
How to approach: [how_to_approach]
Outreach hook: "[outreach_hook]"

[repeat for all 5]

---

### 3 Outreach Hooks for This Product Type

**1. [hook_type]**
[hook_text]
Best for: [best_for]

[repeat for all 3]

---
Data quality notes: [data_quality_flags, or "None"]
Saved to: docs/vc-intel/[PRODUCT_SLUG]-[DATE].md
```

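The table rows in the output template can be rendered mechanically from `/tmp/vc-final-list.json` rather than composed by hand; a minimal sketch for the Track A table, assuming the field names from Step 8's schema (the helper name and sample values are illustrative):

```python
def track_a_rows(vcs):
    """Render Track A VCs as markdown table rows, highest space_fit_score first."""
    cols = ['fund_name', 'evidence_company', 'stage_focus',
            'check_size', 'space_fit_score', 'approach_method']
    ranked = sorted(vcs, key=lambda v: v.get('space_fit_score', 0), reverse=True)
    return '\n'.join('| ' + ' | '.join(str(v.get(c, '')) for c in cols) + ' |' for v in ranked)

# Sample entries in the shape produced by Step 8:
sample = [
    {'fund_name': 'Fund X', 'evidence_company': 'CompA', 'stage_focus': 'seed',
     'check_size': '$1-3M', 'space_fit_score': 7, 'approach_method': 'cold email'},
    {'fund_name': 'Fund Y', 'evidence_company': 'CompB', 'stage_focus': 'series-a',
     'check_size': 'not in search data', 'space_fit_score': 9, 'approach_method': 'warm intro required'},
]
print(track_a_rows(sample))
```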
Clean up temp files:

```bash
rm -f /tmp/vc-product-raw.md /tmp/vc-stage-signals.json /tmp/vc-analysis-request.json \
      /tmp/vc-product-analysis.json /tmp/vc-tracka-results.json /tmp/vc-trackb-results.json \
      /tmp/vc-synthesis-request.json /tmp/vc-final-list.json /tmp/vc-qa-result.json
```