@opendirectory.dev/skills 0.1.41 → 0.1.43

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  ---
  name: vc-finder
- description: Takes a startup product URL or description, detects the industry and funding stage, identifies 5 comparable funded companies, searches who invested in those companies (Track A), finds VCs who publish investment theses about this space (Track B), and returns a ranked sourced list of relevant investors with deep-dives and outreach hooks. Use when asked to find investors for a startup, identify which VCs fund products like mine, research who backs companies in my space, build a VC target list, or find investor-market fit. Trigger when a user says "find VCs for my startup", "who invests in my space", "build me a VC list", "which funds should I pitch", "find investors for my product", "who backed companies like mine", or "help me find venture capital".
+ description: 'Takes a startup product URL or description, detects the industry and funding stage, identifies 5 comparable funded companies, searches who invested in those companies (Track A), finds VCs who publish investment theses about this space (Track B), and returns a ranked sourced list of relevant investors with deep-dives and outreach hooks. Use when asked to find investors for a startup, identify which VCs fund products like mine, research who backs companies in my space, build a VC target list, or find investor-market fit.'
  compatibility: [claude-code, gemini-cli, github-copilot]
  ---
 
@@ -10,7 +10,11 @@ Take a product URL or description. Detect industry and stage. Find 5 comparable
 
  ---
 
- **Critical rule:** Every VC in Track A must include the specific comparable company they backed as evidence. Every VC in Track B must include the exact article or post title where they stated their thesis. If a VC name did not appear in Tavily search results, do not include them. No hallucinated fund names.
+ **Zero-hallucination policy:** Every fact in the output must be traceable to a specific Tavily search result or the fetched product page. This applies to:
+ - Comparable company names: must appear in Tavily search results, not AI training knowledge
+ - VC fund names: must appear verbatim in Tavily search results
+ - Check sizes, stage focus, portfolio companies: must come from search snippets, not AI knowledge
+ - Fund overviews and thesis summaries: extracted from search snippets only. If a detail is not in the search data, write "not found in search data" -- do not fill from training knowledge.
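As an illustration of the policy above, a claim survives only when its text can be located in the collected snippet data; a minimal sketch (the snippet strings and the claim below are hypothetical, not part of the skill's steps):

```python
# Minimal traceability sketch (illustrative only).
# A fund name is kept only when it appears verbatim in collected search snippets.
snippets = [
    "boldstart ventures led the seed round in Snyk",   # hypothetical snippet
    "Heavybit invests in developer-first startups",    # hypothetical snippet
]
claim = "boldstart ventures"
evidence = [s for s in snippets if claim.lower() in s.lower()]
# Per the policy: emit the evidence, or the literal fallback string.
print(evidence if evidence else "not found in search data")
```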
 
  ---
 
@@ -19,25 +23,24 @@ Take a product URL or description. Detect industry and stage. Find 5 comparable
  | The agent will want to... | Why that's wrong |
  |---|---|
  | Add a16z or Sequoia because they are famous | A famous VC without evidence is noise. Only include VCs that appear in Tavily search results for this specific product. Name-dropping wastes the founder's time. |
- | Continue when all 5 Track A searches return 0 results | Zero Track A results means the comparables were wrong or too obscure. Stop, regenerate comparables with broader known names, and retry. Continuing produces an evidence-free list. |
+ | Generate comparable companies from training knowledge | Comparables must come from Tavily search results (Step 6). AI knowledge of companies is not evidence -- a company suggested from memory may have wrong funding status or may not be a true comparable. |
+ | Continue when all 5 Track A searches return 0 results | Zero Track A results means the comparables were wrong or too obscure. Stop, re-run Step 6 with broader search queries, and retry. |
  | Include a Track B VC without citing the article or post | Thesis without a source is indistinguishable from hallucination. The founder cannot verify it and the list loses all credibility. |
- | Detect stage from website aesthetics ("site looks polished") | Stage must come from the specific CTA signals detected in Step 4. Aesthetic guessing sends founders to wrong-stage investors. |
- | Write generic outreach hooks like "highlight your traction" | Every outreach hook must name this specific product's differentiator and a specific VC portfolio signal. Generic hooks are removed by the QA step. |
- | Skip the URL fetch when the user also provides a description | Always fetch the URL. The live page often reveals stage signals (pricing CTAs, customer logos, job openings) that the user's description omits. |
+ | Fill in fund overview from training knowledge | Fund overviews must come from Tavily snippet text only. If the snippets don't describe the fund, write "not found in search data". |
+ | Detect stage from website aesthetics | Stage must come from the specific CTA signals detected in Step 4. |
+ | Write generic outreach hooks | Every outreach hook must name this specific product's differentiator and a specific VC portfolio signal or thesis quote from the search data. |
+ | Skip the URL fetch when the user also provides a description | Always fetch the URL. The live page often reveals stage signals that the user's description omits. |
 
  ---
 
  ## Step 1: Setup Check
 
  ```bash
- echo "GEMINI_API_KEY: ${GEMINI_API_KEY:+set}"
- echo "TAVILY_API_KEY: ${TAVILY_API_KEY:+set}"
+ echo "TAVILY_API_KEY: ${TAVILY_API_KEY:+set}"
  echo "FIRECRAWL_API_KEY: ${FIRECRAWL_API_KEY:-not set, Tavily extract will be used as fallback}"
  ```
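The check above relies on standard POSIX parameter expansion; a quick sketch of the two forms used (the key value here is a placeholder, not a real credential):

```shell
# ${VAR:+set}      -> "set" when VAR is set and non-empty, otherwise empty
# ${VAR:-fallback} -> VAR's value when set and non-empty, otherwise "fallback"
TAVILY_API_KEY="tvly-placeholder"   # placeholder value for illustration
unset FIRECRAWL_API_KEY
echo "TAVILY_API_KEY: ${TAVILY_API_KEY:+set}"
echo "FIRECRAWL_API_KEY: ${FIRECRAWL_API_KEY:-not set}"
```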
 
- **If GEMINI_API_KEY is missing:** Stop. Tell the user: "GEMINI_API_KEY is required for product analysis and VC synthesis. Get it at aistudio.google.com. Add it to your .env file."
-
- **If TAVILY_API_KEY is missing:** Stop. Tell the user: "TAVILY_API_KEY is required to research VC investments and theses. There is no fallback for this. Get it at app.tavily.com. Free tier: 1000 credits/month (about 125 full runs). Add it to your .env file."
+ **If TAVILY_API_KEY is missing:** Stop. Tell the user: "TAVILY_API_KEY is required to research VC investments and theses. There is no fallback for this. Get it at app.tavily.com -- free tier: 1000 credits/month (about 125 full runs). Add it to your .env file."
 
  **If only FIRECRAWL_API_KEY is missing:** Continue silently. Tavily extract will be used for the URL fetch.
 
@@ -120,7 +123,7 @@ Proceed to Step 5 using the pasted description as `product_content`.
 
  ## Step 4: Detect Stage Signals Locally (No API)
 
- Parse the fetched markdown with regex before any API call. This gives Gemini anchored evidence rather than asking it to guess from aesthetics.
+ Parse the fetched markdown with regex before the analysis step.
 
  ```bash
  python3 << 'PYEOF'
@@ -129,25 +132,20 @@ import re, json
  content = open('/tmp/vc-product-raw.md').read().lower()
  stage_signals = []
 
- # Pre-seed signals
  if re.search(r'join\s+(the\s+)?waitlist|sign\s+up\s+for\s+beta|early\s+access|request\s+(an?\s+)?invite|get\s+notified', content):
  stage_signals.append({'signal': 'waitlist or beta CTA', 'stage_hint': 'pre-seed'})
 
- # Seed signals
  if re.search(r'start\s+(your\s+)?free\s+trial|try\s+(it\s+)?for\s+free|request\s+a?\s+demo|book\s+a?\s+demo|schedule\s+a?\s+demo', content):
  stage_signals.append({'signal': 'free trial or demo CTA', 'stage_hint': 'seed'})
 
- # Series A signals
  if re.search(r'contact\s+sales|talk\s+to\s+(our\s+)?sales|see\s+pricing|view\s+pricing|plans\s+and\s+pricing', content):
  stage_signals.append({'signal': 'pricing or sales CTA', 'stage_hint': 'series-a'})
  if re.search(r'case\s+stud(y|ies)|customer\s+stor(y|ies)|trusted\s+by\s+[\d,]+|used\s+by\s+[\d,]+', content):
  stage_signals.append({'signal': 'case studies or customer count', 'stage_hint': 'series-a'})
 
- # Series A/B signals
  if re.search(r'enterprise\s+(plan|pricing|tier)|we.?re\s+hiring|join\s+our\s+team|open\s+positions', content):
  stage_signals.append({'signal': 'enterprise tier or job openings', 'stage_hint': 'series-a-or-b'})
 
- # Funding announcement -- extract directly if present
  funding_match = re.search(
  r'raised\s+\$[\d,.]+\s*[mk]?|series\s+[abc]\s+round|seed\s+round|(\$[\d,.]+\s*[mk]?\s+(?:seed|series\s+[abc]))',
  content
@@ -155,7 +153,6 @@ funding_match = re.search(
  if funding_match:
  stage_signals.append({'signal': f'funding text: {funding_match.group(0).strip()}', 'stage_hint': 'announced'})
 
- # Determine dominant stage
  if not stage_signals:
  dominant = 'unknown'
  elif any(s['stage_hint'] == 'announced' for s in stage_signals):
@@ -181,87 +178,340 @@ PYEOF
 
  ---
 
- ## Step 5: Product Analysis with Gemini
+ ## Step 5: Product Analysis (Taxonomy, Stage, ICP)
+
+ Print the product content and stage signals:
+
+ ```bash
+ python3 -c "
+ import json
+ content = open('/tmp/vc-product-raw.md').read()[:6000]
+ signals = json.load(open('/tmp/vc-stage-signals.json'))
+ print('=== PRODUCT PAGE (first 6000 chars) ===')
+ print(content)
+ print()
+ print('=== DETECTED STAGE SIGNALS ===')
+ print(json.dumps(signals, indent=2))
+ "
+ ```
+
+ **AI instructions:** Analyze the product page content above. Generate the taxonomy, ICP, and stage classification only -- do NOT generate comparable companies yet (that is done via live search in Step 6).
+
+ Write to `/tmp/vc-product-analysis.json`:
+
+ - `product_name`: from the page
+ - `one_line_description`: what it does, for whom, core value prop. Under 20 words. No marketing language.
+ - `industry_taxonomy`: `l1` (top-level: fintech / healthtech / developer tools / consumer / etc.), `l2` (sector: sales technology / logistics software / etc.), `l3` (specific niche: outbound prospecting / last-mile routing / etc.). Vague labels like "technology" or "software" alone are not acceptable.
+ - `icp`: `buyer_persona` (job title), `company_type`, `company_size`
+ - `detected_stage`: pre-seed / seed / series-a / series-b / unknown
+ - `stage_confidence`: high / medium / low
+ - `stage_evidence`: one sentence citing exactly which CTA or text on the page drove this. Write "no clear signals found" if unknown.
+ - `geography_bias`: US / Europe / global / unclear
+ - `comparable_companies`: leave as empty array `[]` -- will be filled in Step 6
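To make the expected shape concrete, here is an illustrative `/tmp/vc-product-analysis.json` for a hypothetical outbound-sales product. Every value below is an example only; real values must come from the fetched page and stage signals, never from this sketch:

```python
# Illustrative analysis object for a hypothetical product ("ExampleReach").
import json

analysis = {
    "product_name": "ExampleReach",                      # hypothetical
    "one_line_description": "Automates outbound prospect research for B2B sales teams.",
    "industry_taxonomy": {"l1": "software", "l2": "sales technology",
                          "l3": "outbound prospecting"},
    "icp": {"buyer_persona": "sales ops manager",
            "company_type": "B2B SaaS", "company_size": "50-500 employees"},
    "detected_stage": "seed",
    "stage_confidence": "medium",
    "stage_evidence": "free trial CTA detected on the pricing section",
    "geography_bias": "US",
    "comparable_companies": []                           # always empty at this step
}

required = {"product_name", "one_line_description", "industry_taxonomy", "icp",
            "detected_stage", "stage_confidence", "stage_evidence",
            "geography_bias", "comparable_companies"}
assert required <= set(analysis), "missing keys"
assert analysis["comparable_companies"] == []
print(json.dumps(analysis["industry_taxonomy"]))
```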
 
  ```bash
  python3 << 'PYEOF'
  import json
 
- product_content = open('/tmp/vc-product-raw.md').read()[:6000]
- stage_signals = json.load(open('/tmp/vc-stage-signals.json'))
+ analysis = {
+ # FILL from your analysis above
+ "comparable_companies": []
+ }
 
- request = {
- "system_instruction": {
- "parts": [{
- "text": "You are a venture capital analyst. Analyze a product page and return structured JSON only. No commentary. No em dashes. Vague category labels like 'technology' or 'software' alone are not acceptable at L2 or L3 -- be specific. Comparable companies must be real funded companies with public funding records, well-known enough to appear in press coverage."
- }]
- },
- "contents": [{
- "parts": [{
- "text": f"""Analyze this product page and return a JSON object with exactly these keys:
-
- 1. product_name: string
- 2. one_line_description: string -- what it does, for whom, core value prop. Under 20 words. No marketing language.
- 3. industry_taxonomy: object with:
- - l1: top-level (e.g. "software", "fintech", "healthtech", "consumer", "hardware")
- - l2: sector (e.g. "developer tools", "sales technology", "edtech", "logistics software")
- - l3: specific niche (e.g. "CI/CD automation", "outbound prospecting", "last-mile routing")
- 4. icp: object with:
- - buyer_persona: job title (e.g. "VP Engineering", "founder", "sales ops manager")
- - company_type: (e.g. "B2B SaaS", "e-commerce brand", "enterprise IT team")
- - company_size: (e.g. "5-50 employees", "50-500 employees", "enterprise")
- 5. detected_stage: one of: pre-seed, seed, series-a, series-b, unknown
- 6. stage_confidence: one of: high, medium, low
- 7. stage_evidence: one sentence citing exactly which CTA or text on the page drove this classification. Write "no clear signals found" if unknown.
- 8. comparable_companies: array of exactly 5 objects, each with:
- - name: real company name (must have public VC funding records)
- - similarity_reason: one sentence why this company is comparable to the product
- - estimated_stage: their funding stage as of your knowledge cutoff
- 9. geography_bias: one of: US, Europe, global, unclear -- infer from page text
-
- Stage signals detected from the page (use as input to your stage classification):
- {json.dumps(stage_signals, indent=2)}
-
- Product page content:
- {product_content}"""
- }]
- }],
- "generationConfig": {
- "temperature": 0.2,
- "maxOutputTokens": 3000
- }
+ json.dump(analysis, open('/tmp/vc-product-analysis.json', 'w'), indent=2)
+ print('Product analysis written.')
+ PYEOF
+ ```
+
+ Verify:
+
+ ```bash
+ python3 -c "
+ import json
+ a = json.load(open('/tmp/vc-product-analysis.json'))
+ print('Product:', a['product_name'])
+ print('Industry:', a['industry_taxonomy']['l1'], '>', a['industry_taxonomy']['l2'], '>', a['industry_taxonomy']['l3'])
+ print('Stage:', a['detected_stage'], '(' + a['stage_confidence'] + ' confidence)')
+ "
+ ```
+
+ ---
+
+ ## Step 5b: Curated Pre-Match Against Verified Fund Dataset
+
+ Run the product taxonomy against a curated dataset of 25 verified VC funds (sourced from fund websites). Produces zero-hallucination fund matches and seed comparables for Track A -- no Tavily credits consumed.
+
+ Print product analysis for tag mapping:
+
+ ```bash
+ python3 -c "
+ import json
+ a = json.load(open('/tmp/vc-product-analysis.json'))
+ print('Taxonomy:', a['industry_taxonomy']['l1'], '>', a['industry_taxonomy']['l2'], '>', a['industry_taxonomy']['l3'])
+ print('Stage:', a['detected_stage'])
+ print('Geography:', a['geography_bias'])
+ "
+ ```
+
+ **AI instructions:** Map the product taxonomy to the standard tags used in the fund dataset. Available tags:
+ `DevTools`, `Infrastructure`, `Open Source`, `B2B SaaS`, `AI`, `Data`, `FinTech`, `HealthTech`, `Enterprise`, `Consumer`, `Marketplaces`, `E-commerce`, `Crypto`, `DeepTech`, `Cybersecurity`, `Generalist`
+
+ Pick 2-4 tags that describe this product. Map `detected_stage` to: `Pre-seed`, `Seed`, `Series A`, or `Growth`. Map `geography_bias` to: `US`, `Europe`, `India`, or `Global`.
+
+ Write product context:
+
+ ```bash
+ python3 << 'PYEOF'
+ import json
+
+ # FILL based on taxonomy analysis above
+ context = {
+ "extracted_tags": ["TagA", "TagB"], # 2-4 tags from the list above
+ "stage_hint": "Seed", # Pre-seed / Seed / Series A / Growth
+ "geography_hint": "US" # US / Europe / India / Global
  }
+ json.dump(context, open('/tmp/vc-product-context.json', 'w'), indent=2)
+ print('Product context:', context)
+ PYEOF
+ ```
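An illustrative fill for a hypothetical developer-tools product at seed stage, validated against the allowed tag, stage, and geography vocabularies listed above (the tag choices are examples only; real runs derive them from the Step 5 taxonomy):

```python
# Hypothetical context fill, checked against the Step 5b vocabularies.
ALLOWED_TAGS = {"DevTools", "Infrastructure", "Open Source", "B2B SaaS", "AI",
                "Data", "FinTech", "HealthTech", "Enterprise", "Consumer",
                "Marketplaces", "E-commerce", "Crypto", "DeepTech",
                "Cybersecurity", "Generalist"}
context = {
    "extracted_tags": ["DevTools", "Open Source"],   # example tags
    "stage_hint": "Seed",
    "geography_hint": "US",
}
assert set(context["extracted_tags"]) <= ALLOWED_TAGS, "unknown tag"
assert 2 <= len(context["extracted_tags"]) <= 4, "pick 2-4 tags"
assert context["stage_hint"] in {"Pre-seed", "Seed", "Series A", "Growth"}
assert context["geography_hint"] in {"US", "Europe", "India", "Global"}
print("context valid:", context["extracted_tags"])
```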
+
+ Run scoring against the embedded curated dataset:
+
+ ```bash
+ python3 << 'PYEOF'
+ import json
+
+ context = json.load(open('/tmp/vc-product-context.json'))
+
+ VC_FUNDS = [
+ {"fund_name":"Y Combinator","thesis":"We provide seed funding for startups. We invest in deeply technical teams building massive companies across all domains.","check_size":"$500k","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","B2B SaaS","DevTools","AI"],"geography_focus":["Global"],"notable_portfolio":["Stripe","Airbnb","GitLab"],"website":"https://www.ycombinator.com"},
+ {"fund_name":"boldstart ventures","thesis":"Day one partner for developer first, crypto, and SaaS founders. We love deeply technical founders solving hard infrastructure problems.","check_size":"$1M - $3M","stage_focus":["Pre-seed","Seed"],"industry_tags":["DevTools","Infrastructure","Crypto"],"geography_focus":["Global","US"],"notable_portfolio":["Snyk","Blockdaemon","Superhuman"],"website":"https://boldstart.vc"},
+ {"fund_name":"Heavybit","thesis":"The leading investor in developer-first startups. We help technical founders launch, gain traction, and build enterprise-ready companies.","check_size":"$1M - $5M","stage_focus":["Seed","Series A"],"industry_tags":["DevTools","Infrastructure","Open Source"],"geography_focus":["Global","US"],"notable_portfolio":["PagerDuty","Sanity","Netlify"],"website":"https://www.heavybit.com"},
+ {"fund_name":"Amplify Partners","thesis":"We invest in technical founders building the next generation of IT infrastructure, developer tools, and data platforms.","check_size":"$2M - $8M","stage_focus":["Seed","Series A"],"industry_tags":["DevTools","Infrastructure","AI","Data"],"geography_focus":["US"],"notable_portfolio":["Datadog","OCTO","dbt Labs"],"website":"https://www.amplifypartners.com"},
+ {"fund_name":"OSS Capital","thesis":"We exclusively back early-stage founders building Commercial Open Source Software (COSS) companies.","check_size":"$500k - $2M","stage_focus":["Pre-seed","Seed","Series A"],"industry_tags":["Open Source","DevTools"],"geography_focus":["Global"],"notable_portfolio":["Cal.com","Appsmith","Hoppscotch"],"website":"https://oss.capital"},
+ {"fund_name":"Sequoia Capital","thesis":"We help the daring build legendary companies, from idea to IPO and beyond. Sequoia is an early-stage and growth-stage investor.","check_size":"$1M - $10M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Enterprise","Consumer","AI"],"geography_focus":["Global"],"notable_portfolio":["Apple","Google","WhatsApp"],"website":"https://www.sequoiacap.com"},
+ {"fund_name":"Andreessen Horowitz (a16z)","thesis":"We invest in software eating the world. We back bold entrepreneurs building the future through technology.","check_size":"$1M - $50M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Crypto","Enterprise","Consumer","AI"],"geography_focus":["Global","US"],"notable_portfolio":["Facebook","Coinbase","Figma"],"website":"https://a16z.com"},
+ {"fund_name":"Point Nine Capital","thesis":"We are a seed-stage venture capital firm focused on B2B SaaS and B2B marketplaces globally.","check_size":"$1M - $3M","stage_focus":["Seed"],"industry_tags":["B2B SaaS","Marketplaces"],"geography_focus":["Europe","Global"],"notable_portfolio":["Zendesk","Typeform","Docplanner"],"website":"https://www.pointnine.com"},
+ {"fund_name":"Cherry Ventures","thesis":"We champion founders in Europe from their earliest days. We are generalist seed investors.","check_size":"$1M - $4M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","Consumer","B2B SaaS"],"geography_focus":["Europe"],"notable_portfolio":["FlixBus","Auto1 Group","Forto"],"website":"https://www.cherry.vc"},
+ {"fund_name":"First Round Capital","thesis":"We are the seed-stage firm that builds the most supportive community for founders.","check_size":"$1M - $4M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","B2B SaaS","Consumer"],"geography_focus":["US"],"notable_portfolio":["Uber","Notion","Roblox"],"website":"https://firstround.com"},
+ {"fund_name":"Bessemer Venture Partners","thesis":"BVP helps entrepreneurs lay strong foundations to build and forge long-standing companies.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Enterprise","Consumer","FinTech"],"geography_focus":["Global"],"notable_portfolio":["LinkedIn","Twilio","Shopify"],"website":"https://www.bvp.com"},
+ {"fund_name":"Index Ventures","thesis":"We back the best and most ambitious entrepreneurs across all stages to build category-defining businesses.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","FinTech","Consumer","B2B SaaS"],"geography_focus":["Europe","US","Global"],"notable_portfolio":["Dropbox","Slack","Figma"],"website":"https://www.indexventures.com"},
+ {"fund_name":"Lightspeed Venture Partners","thesis":"We invest globally in enterprise, consumer, and health founders who are shaping the future.","check_size":"$1M - $25M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Enterprise","Consumer","FinTech"],"geography_focus":["Global"],"notable_portfolio":["Snap","Rippling","MuleSoft"],"website":"https://lsvp.com"},
+ {"fund_name":"Accel","thesis":"We partner with exceptional founders from inception through all phases of private company growth.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","B2B SaaS","Consumer","DevTools"],"geography_focus":["Global"],"notable_portfolio":["Facebook","Atlassian","Spotify"],"website":"https://www.accel.com"},
+ {"fund_name":"Bain Capital Ventures","thesis":"From seed to growth, we back founders building legendary infrastructure, fintech, application, and commerce companies.","check_size":"$1M - $50M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Infrastructure","FinTech","B2B SaaS"],"geography_focus":["US","Global"],"notable_portfolio":["DocuSign","SendGrid","Redis"],"website":"https://www.baincapitalventures.com"},
+ {"fund_name":"Greylock Partners","thesis":"We partner with early-stage founders to build enterprise and consumer software companies that define new categories.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["Enterprise","Consumer","Cybersecurity","AI"],"geography_focus":["US"],"notable_portfolio":["Workday","Palo Alto Networks","LinkedIn"],"website":"https://greylock.com"},
+ {"fund_name":"Unusual Ventures","thesis":"We provide a breakthrough level of support for early-stage founders building enterprise tech.","check_size":"$1M - $5M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Enterprise","DevTools","B2B SaaS"],"geography_focus":["US"],"notable_portfolio":["Arctic Wolf","Harness","Vivun"],"website":"https://www.unusual.vc"},
+ {"fund_name":"Crane Venture Partners","thesis":"We back deep tech and enterprise founders in Europe solving hard problems with data and code.","check_size":"$1M - $4M","stage_focus":["Seed"],"industry_tags":["Enterprise","DeepTech","Data","AI"],"geography_focus":["Europe"],"notable_portfolio":["Onfido","Tessian","Forto"],"website":"https://crane.vc"},
+ {"fund_name":"Founder Collective","thesis":"We are a seed-stage venture capital fund, built by founders, for founders. We back weird, wonderful, and wild startups.","check_size":"$500k - $2M","stage_focus":["Seed"],"industry_tags":["Generalist","Consumer","B2B SaaS"],"geography_focus":["US","Global"],"notable_portfolio":["Uber","Airtable","BuzzFeed"],"website":"https://www.foundercollective.com"},
+ {"fund_name":"Benchmark","thesis":"We are a partnership of equal partners. We back mission-driven founders at the earliest stages and walk beside them for the long haul.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["Generalist","Marketplaces","Enterprise","Consumer"],"geography_focus":["US","Global"],"notable_portfolio":["Uber","Twitter","eBay","Snapchat"],"website":"https://www.benchmark.com"},
+ {"fund_name":"Accel India","thesis":"We partner with exceptional founders from inception through all phases of private company growth in the Indian ecosystem.","check_size":"$1M - $15M","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","B2B SaaS","Consumer","FinTech","E-commerce"],"geography_focus":["India"],"notable_portfolio":["Flipkart","Swiggy","Freshworks"],"website":"https://www.accel.com/india"},
+ {"fund_name":"Blume Ventures","thesis":"We are a seed and pre-seed venture fund that backs startups with both funding and active mentoring.","check_size":"$500k - $3M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","B2B SaaS","Consumer","DeepTech","HealthTech"],"geography_focus":["India"],"notable_portfolio":["Unacademy","Purplle","GreyOrange"],"website":"https://blume.vc"},
+ {"fund_name":"Elevation Capital","thesis":"We partner with visionary founders in India across early stages to help them build category-defining businesses.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["Generalist","Consumer","FinTech","B2B SaaS","HealthTech"],"geography_focus":["India"],"notable_portfolio":["Paytm","Swiggy","Meesho"],"website":"https://elevationcapital.com"},
+ {"fund_name":"Peak XV Partners","thesis":"Formerly Sequoia India & SEA, we partner with founders across early, growth, and public stages to build enduring companies.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Consumer","FinTech","B2B SaaS","DevTools","AI"],"geography_focus":["India","South Asia"],"notable_portfolio":["Zomato","Pine Labs","Cred"],"website":"https://www.peakxv.com"},
+ {"fund_name":"Nexus Venture Partners","thesis":"We are a US-India venture capital firm backing extraordinary founders building product-first companies.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["B2B SaaS","Enterprise","DevTools","Consumer"],"geography_focus":["India","US"],"notable_portfolio":["Postman","Hasura","Zepto"],"website":"https://nexusvp.com"}
+ ]
+
+ STAGE_ORDER = {"Pre-seed": 0, "Seed": 1, "Series A": 2, "Growth": 3}
+
+ def score_fund(fund, ctx):
+ score = 0
+ fund_tags = fund.get("industry_tags", [])
+ extracted_tags = ctx.get("extracted_tags", ["Generalist"])
+ tag_points = 0
+ matched_tags = []
+ for tag in extracted_tags:
+ if tag in fund_tags:
+ tag_points += 5 if tag == "Generalist" else 20
+ matched_tags.append(tag)
+ tag_points = min(tag_points, 60)
+ score += tag_points
+ stage_hint = ctx.get("stage_hint")
+ fund_stages = fund.get("stage_focus", [])
+ if not stage_hint:
+ score += 10
+ elif fund_stages:
+ if stage_hint in fund_stages:
+ score += 20
+ elif stage_hint in STAGE_ORDER:
+ hint_idx = STAGE_ORDER[stage_hint]
+ if any(f in STAGE_ORDER and abs(STAGE_ORDER[f] - hint_idx) == 1 for f in fund_stages):
+ score += 10
+ geo_hint = ctx.get("geography_hint")
+ fund_geo = fund.get("geography_focus", ["Global"])
+ if not geo_hint or geo_hint == "Global":
+ score += 10
+ elif fund_geo == ["India"] and geo_hint == "US":
+ pass
+ elif geo_hint in fund_geo:
+ score += 20
+ elif "Global" in fund_geo:
+ score += 15
+ if geo_hint == "US" and "India" in fund_geo and "US" not in fund_geo and "Global" not in fund_geo:
+ score = max(0, score - 30)
+ if fund_tags and extracted_tags and fund_tags[0] not in extracted_tags and tag_points <= 20:
+ score = max(0, score - 15)
+ return score, matched_tags
+
+ scored = []
+ for fund in VC_FUNDS:
+ score, matched_tags = score_fund(fund, context)
+ tier = "High" if score >= 70 else ("Medium" if score >= 40 else "Low")
+ scored.append({
+ "fund_name": fund["fund_name"],
+ "thesis": fund["thesis"],
+ "check_size": fund["check_size"],
+ "stage_focus": fund["stage_focus"],
+ "industry_tags": fund["industry_tags"],
+ "geography_focus": fund["geography_focus"],
+ "notable_portfolio": fund["notable_portfolio"],
+ "website": fund["website"],
+ "source": "verified (fund website)",
+ "score": score,
+ "confidence": tier,
+ "matched_tags": matched_tags
+ })
+
+ scored.sort(key=lambda x: (-x["score"], x["fund_name"]))
+ relevant = [m for m in scored if m["confidence"] in ("High", "Medium")]
 
- json.dump(request, open('/tmp/vc-analysis-request.json', 'w'))
+ curated_comparables = []
+ for m in relevant:
+ for company in m.get("notable_portfolio", []):
+ if company not in curated_comparables:
+ curated_comparables.append(company)
+
+ output = {
+ "high_medium_matches": relevant,
+ "curated_comparables": curated_comparables[:6]
+ }
+ json.dump(output, open('/tmp/vc-curated-matches.json', 'w'), indent=2)
+ print(f'Curated matches: {len(relevant)} High/Medium confidence funds')
+ for m in relevant[:8]:
+ print(f' {m["confidence"]:6} ({m["score"]:3}) {m["fund_name"]}')
+ print(f'Seed comparables from portfolio: {curated_comparables[:6]}')
  PYEOF
+ ```
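For clarity, a worked example of the tier arithmetic. The `score_fund` body below is copied verbatim from the scoring block above; the fund and product context are hypothetical illustrations:

```python
# Worked example of the Step 5b scoring rules (fund/context are hypothetical).
STAGE_ORDER = {"Pre-seed": 0, "Seed": 1, "Series A": 2, "Growth": 3}

def score_fund(fund, ctx):
    score = 0
    fund_tags = fund.get("industry_tags", [])
    extracted_tags = ctx.get("extracted_tags", ["Generalist"])
    tag_points = 0
    matched_tags = []
    for tag in extracted_tags:
        if tag in fund_tags:
            tag_points += 5 if tag == "Generalist" else 20
            matched_tags.append(tag)
    tag_points = min(tag_points, 60)
    score += tag_points
    stage_hint = ctx.get("stage_hint")
    fund_stages = fund.get("stage_focus", [])
    if not stage_hint:
        score += 10
    elif fund_stages:
        if stage_hint in fund_stages:
            score += 20
        elif stage_hint in STAGE_ORDER:
            hint_idx = STAGE_ORDER[stage_hint]
            if any(f in STAGE_ORDER and abs(STAGE_ORDER[f] - hint_idx) == 1 for f in fund_stages):
                score += 10
    geo_hint = ctx.get("geography_hint")
    fund_geo = fund.get("geography_focus", ["Global"])
    if not geo_hint or geo_hint == "Global":
        score += 10
    elif fund_geo == ["India"] and geo_hint == "US":
        pass
    elif geo_hint in fund_geo:
        score += 20
    elif "Global" in fund_geo:
        score += 15
    if geo_hint == "US" and "India" in fund_geo and "US" not in fund_geo and "Global" not in fund_geo:
        score = max(0, score - 30)
    if fund_tags and extracted_tags and fund_tags[0] not in extracted_tags and tag_points <= 20:
        score = max(0, score - 15)
    return score, matched_tags

# A developer-tools fund against a DevTools/Infrastructure seed-stage US product:
fund = {"industry_tags": ["DevTools", "Infrastructure", "Crypto"],
        "stage_focus": ["Pre-seed", "Seed"],
        "geography_focus": ["Global", "US"]}
ctx = {"extracted_tags": ["DevTools", "Infrastructure"],
       "stage_hint": "Seed", "geography_hint": "US"}
score, matched = score_fund(fund, ctx)
print(score, matched)  # 40 tag points + 20 exact stage + 20 geography = 80 -> High tier (>= 70)
```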
 
- curl -s -X POST \
- "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=$GEMINI_API_KEY" \
- -H "Content-Type: application/json" \
- -d @/tmp/vc-analysis-request.json \
- | python3 -c "
- import sys, json
- d = json.load(sys.stdin)
- text = d['candidates'][0]['content']['parts'][0]['text'].strip()
- if text.startswith('\`\`\`'):
- text = '\n'.join(text.split('\n')[1:-1])
- analysis = json.loads(text)
- json.dump(analysis, open('/tmp/vc-product-analysis.json', 'w'), indent=2)
- print('Product analysis complete.')
- print('Product:', analysis['product_name'])
- print('Industry:', analysis['industry_taxonomy']['l1'], '>', analysis['industry_taxonomy']['l2'], '>', analysis['industry_taxonomy']['l3'])
- print('Stage:', analysis['detected_stage'], '(' + analysis['stage_confidence'] + ' confidence)')
- print('Comparables:', ', '.join(c['name'] for c in analysis['comparable_companies']))
+ ---
+
+ ## Step 6: Discover Comparable Companies via Tavily
+
+ Load curated portfolio companies from Step 5b as seed comparables:
+
+ ```bash
+ python3 -c "
+ import json
+ matches = json.load(open('/tmp/vc-curated-matches.json'))
+ curated = matches.get('curated_comparables', [])
+ print(f'Curated portfolio comparables ({len(curated)}): {curated}')
+ need = max(0, 5 - len(curated))
+ print(f'Tavily will supplement with up to {need} more')
  "
  ```
 
- **If Gemini returns empty or JSON parsing fails:** Retry once with `maxOutputTokens` reduced to 2000. If retry also fails: Stop. Tell the user: "Product analysis failed. Please paste a direct description (3-5 sentences: what it does, who it is for, current stage) and run again."
+ **Do not use AI training knowledge to generate comparable companies.** Curated portfolio companies (above) are already zero-hallucination comparables from verified fund data. Tavily supplements with L3-niche-specific companies.
+
+ ```bash
+ python3 << 'PYEOF'
+ import json, os, urllib.request
+
+ analysis = json.load(open('/tmp/vc-product-analysis.json'))
+ l2 = analysis['industry_taxonomy']['l2']
+ l3 = analysis['industry_taxonomy']['l3']
+ tavily_key = os.environ.get('TAVILY_API_KEY', '')
+
+ queries = [
+ f'"{l3}" startup raised funding venture capital seed series',
+ f'"{l2}" companies venture backed funded startup'
+ ]
+
+ all_results = []
+ for query in queries:
430
+ payload = json.dumps({
431
+ "api_key": tavily_key,
432
+ "query": query,
433
+ "search_depth": "advanced",
434
+ "max_results": 8,
435
+ "include_answer": True
436
+ }).encode()
437
+
438
+ req = urllib.request.Request(
439
+ 'https://api.tavily.com/search',
440
+ data=payload,
441
+ headers={'Content-Type': 'application/json'},
442
+ method='POST'
443
+ )
444
+
445
+ try:
446
+ with urllib.request.urlopen(req, timeout=30) as resp:
447
+ result = json.loads(resp.read())
448
+ all_results.append({
449
+ 'query': query,
450
+ 'answer': result.get('answer', ''),
451
+ 'results': [
452
+ {'title': r.get('title',''), 'url': r.get('url',''), 'content': r.get('content','')[:500]}
453
+ for r in result.get('results', [])
454
+ ]
455
+ })
456
+ print(f'Comparable search: {len(result.get("results", []))} results for "{query[:60]}"')
457
+ except Exception as e:
458
+ print(f'Comparable search FAILED: {e}')
459
+ all_results.append({'query': query, 'answer': '', 'results': [], 'error': str(e)})
460
+
461
+ json.dump(all_results, open('/tmp/vc-comparable-search.json', 'w'), indent=2)
462
+ PYEOF
463
+ ```
464
+
465
+ Print results for AI selection:
466
+
467
+ ```bash
468
+ python3 -c "
469
+ import json
470
+ results = json.load(open('/tmp/vc-comparable-search.json'))
471
+ for r in results:
472
+ print(f'Query: {r[\"query\"]}')
473
+ print(f'Answer: {r.get(\"answer\",\"\")[:400]}')
474
+ for item in r.get('results', []):
475
+ print(f' - {item[\"title\"]} | {item[\"url\"]}')
476
+ print(f' {item[\"content\"][:200]}')
477
+ print()
478
+ "
479
+ ```
480
+
481
+ **AI instructions:** Combine the curated portfolio companies from `/tmp/vc-curated-matches.json` with the Tavily search results above. Pick exactly 5 comparable companies. Prioritize curated portfolio companies (already verified -- they are real portfolio companies of matched VC funds). Supplement with Tavily-discovered companies to reach 5 if needed.
482
+
483
+ For each comparable, write:
484
+ - `name`: company name
485
+ - `similarity_reason`: one sentence explaining the fit (for curated: reference the fund that backed them; for Tavily: cite the snippet)
486
+ - `source_url`: portfolio fund website for curated companies, Tavily result URL for discovered ones
487
+ - `estimated_stage`: from curated data or snippet text -- write "not in search data" if unknown
488
+ - `source_type`: `"curated_portfolio"` or `"tavily_discovered"`
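The prioritize-and-fill selection above can be sketched as a small merge: curated names first, Tavily-discovered names to fill the remaining slots, case-insensitive dedup, capped at 5. This is a sketch only; the company names are made-up placeholders, not real search output:

```python
# Sketch only: merge curated and Tavily-discovered comparables.
# Curated names take priority; discovered names fill remaining slots.
def pick_comparables(curated, discovered, limit=5):
    seen, picked = set(), []
    candidates = [(n, 'curated_portfolio') for n in curated] + \
                 [(n, 'tavily_discovered') for n in discovered]
    for name, source_type in candidates:
        key = name.lower()
        if key in seen:  # case-insensitive dedup across both sources
            continue
        seen.add(key)
        picked.append({'name': name, 'source_type': source_type})
        if len(picked) == limit:
            break
    return picked

# Placeholder names for illustration only.
chosen = pick_comparables(['Acme AI', 'BetaWorks'],
                          ['betaworks', 'Gamma Labs', 'Delta Inc', 'Epsilon Co'])
print([c['name'] for c in chosen])
```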
489
+
490
+ Update `/tmp/vc-product-analysis.json` with the `comparable_companies` array:
491
+
492
+ ```bash
493
+ python3 << 'PYEOF'
494
+ import json
495
+
496
+ analysis = json.load(open('/tmp/vc-product-analysis.json'))
497
+
498
+ analysis['comparable_companies'] = [
499
+ # FILL 5 companies -- curated_portfolio first, then tavily_discovered
500
+ # Each: {"name": str, "similarity_reason": str, "source_url": str, "estimated_stage": str, "source_type": str}
501
+ ]
502
+
503
+ json.dump(analysis, open('/tmp/vc-product-analysis.json', 'w'), indent=2)
504
+ print('Comparables written:', ', '.join(c['name'] for c in analysis['comparable_companies']))
505
+ PYEOF
506
+ ```
507
+
508
+ **If fewer than 3 comparable companies appear in the search results:** Broaden the queries. Run a third search: `"[l1] startup" funding round venture capital`. If still thin, proceed with what is available and flag in `data_quality_flags`.
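The broadening decision can be sketched as a tiny guard. `l1` below is a placeholder taxonomy value, not real analysis output:

```python
# Sketch: return the broader L1 fallback query only when too few
# comparable companies survived the first two searches.
def fallback_query(comparables, l1, minimum=3):
    if len(comparables) >= minimum:
        return None  # enough comparables; no third search needed
    return f'"{l1} startup" funding round venture capital'

print(fallback_query(['A', 'B'], 'Developer Tools'))
print(fallback_query(['A', 'B', 'C'], 'Developer Tools'))  # prints None
```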
259
509
 
260
510
  ---
261
511
 
262
- ## Step 6: Track A -- Who Invested in Comparable Companies
512
+ ## Step 7: Track A -- Who Invested in Comparable Companies
263
513
 
264
- Run 5 Tavily searches, one per comparable. Save all results to a single file.
514
+ Run 5 Tavily searches, one per comparable.
265
515
 
266
516
  ```bash
267
517
  python3 << 'PYEOF'
@@ -318,15 +568,13 @@ print(f'Track A complete. Comparables with results: {sum(1 for r in all_track_a
318
568
  PYEOF
319
569
  ```
320
570
 
321
- **If all 5 Track A searches return 0 results:** Tell the user: "No funding data found for the comparable companies. This usually means the comparables are too early-stage or obscure for public press coverage. I will retry with broader comparable names." Then re-run Step 5 with a note to Gemini to choose "well-funded companies with significant press coverage" and retry Step 6.
322
-
323
- If the retry also returns 0 results: proceed to Track B only, and flag this in `data_quality_flags`.
571
+ **If all 5 Track A searches return 0 results:** Re-run Step 6 with broader queries, favoring well-covered companies with significant press coverage. If the retry still returns 0: proceed with Track B only and flag it in `data_quality_flags`.
324
572
 
325
573
  ---
326
574
 
327
- ## Step 7: Track B -- VCs With Investment Theses About This Space
575
+ ## Step 8: Track B -- VCs With Investment Theses About This Space
328
576
 
329
- Run 3 Tavily searches using the L2 and L3 taxonomy from Step 5.
577
+ Run 3 Tavily searches using L2 and L3 taxonomy from Step 5.
330
578
 
331
579
  ```bash
332
580
  python3 << 'PYEOF'
@@ -339,18 +587,9 @@ stage = analysis['detected_stage']
339
587
  tavily_key = os.environ.get('TAVILY_API_KEY', '')
340
588
 
341
589
  queries = [
342
- {
343
- 'name': 'thesis_l3',
344
- 'query': f'venture capital investment thesis "{l3}" investing 2023 OR 2024 OR 2025'
345
- },
346
- {
347
- 'name': 'thesis_l2',
348
- 'query': f'VC fund "{l2}" investment thesis portfolio companies'
349
- },
350
- {
351
- 'name': 'stage_space',
352
- 'query': f'{stage} investors "{l3}" startup venture capital fund'
353
- }
590
+ {'name': 'thesis_l3', 'query': f'venture capital investment thesis "{l3}" investing 2023 OR 2024 OR 2025'},
591
+ {'name': 'thesis_l2', 'query': f'VC fund "{l2}" investment thesis portfolio companies'},
592
+ {'name': 'stage_space', 'query': f'{stage} investors "{l3}" startup venture capital fund'}
354
593
  ]
355
594
 
356
595
  all_track_b = []
@@ -383,33 +622,29 @@ for q in queries:
383
622
  print(f"Track B - {q['name']}: {len(result.get('results', []))} results")
384
623
  except Exception as e:
385
624
  print(f"Track B - {q['name']}: FAILED ({e})")
386
- all_track_b.append({
387
- 'query_name': q['name'],
388
- 'query': q['query'],
389
- 'answer': '',
390
- 'results': [],
391
- 'error': str(e)
392
- })
625
+ all_track_b.append({'query_name': q['name'], 'query': q['query'], 'answer': '', 'results': [], 'error': str(e)})
393
626
 
394
627
  json.dump(all_track_b, open('/tmp/vc-trackb-results.json', 'w'), indent=2)
395
628
  PYEOF
396
629
  ```
397
630
 
398
- **If all 3 Track B searches return 0 results:** Proceed with Track A results only. Note in `data_quality_flags`: "No thesis-led investors found via public search. Try checking Substack manually for VC newsletters covering this niche."
631
+ **If all 3 Track B searches return 0 results:** Proceed with Track A results only. Note in `data_quality_flags`: "No thesis-led investors found via public search."
399
632
 
400
633
  ---
401
634
 
402
- ## Step 8: Gemini Synthesis -- Rank and Score All VCs
635
+ ## Step 9: Synthesize -- Rank and Score All VCs
636
+
637
+ Print the research data:
403
638
 
404
639
  ```bash
405
- python3 << 'PYEOF'
640
+ python3 -c "
406
641
  import json
407
642
 
408
643
  analysis = json.load(open('/tmp/vc-product-analysis.json'))
409
644
  track_a = json.load(open('/tmp/vc-tracka-results.json'))
410
645
  track_b = json.load(open('/tmp/vc-trackb-results.json'))
646
+ curated = json.load(open('/tmp/vc-curated-matches.json'))
411
647
 
412
- # Compress results to stay within token limits
413
648
  track_a_summary = []
414
649
  for item in track_a:
415
650
  snippets = [{'title': r.get('title',''), 'url': r.get('url',''), 'content': r.get('content','')[:400]}
@@ -431,7 +666,22 @@ for item in track_b:
431
666
  'top_results': snippets
432
667
  })
433
668
 
434
- context = {
669
+ curated_summary = []
670
+ for m in curated.get('high_medium_matches', []):
671
+ curated_summary.append({
672
+ 'fund_name': m['fund_name'],
673
+ 'confidence': m['confidence'],
674
+ 'score': m['score'],
675
+ 'matched_tags': m['matched_tags'],
676
+ 'thesis': m['thesis'],
677
+ 'check_size': m['check_size'],
678
+ 'stage_focus': m['stage_focus'],
679
+ 'notable_portfolio': m['notable_portfolio'],
680
+ 'website': m['website'],
681
+ 'source': 'verified (fund website)'
682
+ })
683
+
684
+ print(json.dumps({
435
685
  'product': {
436
686
  'name': analysis['product_name'],
437
687
  'description': analysis['one_line_description'],
@@ -441,83 +691,58 @@ context = {
441
691
  'stage_confidence': analysis['stage_confidence'],
442
692
  'geography': analysis['geography_bias']
443
693
  },
694
+ 'curated_matches': curated_summary,
444
695
  'track_a_research': track_a_summary,
445
696
  'track_b_research': track_b_summary
446
- }
447
-
448
- request = {
449
- "system_instruction": {
450
- "parts": [{
451
- "text": """You are a venture capital research analyst. Synthesize investor research into a sourced, ranked list. Follow these rules exactly:
452
- 1. Only include VCs whose names appear in the provided Tavily search results. Do not add VCs not mentioned in the data.
453
- 2. Every Track A VC must have evidence_company: the specific comparable company they backed (required -- omit the VC if you cannot confirm this).
454
- 3. Every Track B VC must have thesis_source_title: the exact article or page title where they stated their thesis (required -- omit the VC if you cannot confirm this).
455
- 4. stage_fit_score 1-10: penalize 3 points if the VC's typical stage does not match the product's detected stage.
456
- 5. space_fit_score 1-10: only give 9-10 if the VC backed 2+ companies in this specific L3 niche.
457
- 6. check_size: use ranges from search result data only. If not found, write "not in search data".
458
- 7. approach_method: one of -- cold email, warm intro required, AngelList, application form, Twitter/X DM. Infer from what is publicly known about this fund's intake process.
459
- 8. outreach_hook: must reference this specific product's differentiator and a named VC portfolio signal or thesis quote. Generic hooks like 'highlight your traction' are not acceptable.
460
- 9. No em dashes anywhere in output.
461
- 10. No marketing language."""
462
- }]
463
- },
464
- "contents": [{
465
- "parts": [{
466
- "text": f"""Synthesize this VC research for the product below. Return a JSON object with exactly these keys:
467
-
468
- 1. product_summary: object with name, one_line_description, industry_l1, industry_l2, industry_l3, detected_stage, comparable_companies_used (array of names)
469
-
470
- 2. track_a_vcs: array of VC objects from Track A research. Each object:
471
- - fund_name, evidence_company (REQUIRED), evidence_source_url, stage_focus, check_size, thesis_summary (1-2 sentences), stage_fit_score (1-10), space_fit_score (1-10), approach_method
472
-
473
- 3. track_b_vcs: array of VC objects from Track B research. Each object:
474
- - fund_name, thesis_source_title (REQUIRED), thesis_source_url, stage_focus, check_size, thesis_summary (1-2 sentences), stage_fit_score (1-10), space_fit_score (1-10), approach_method
475
-
476
- 4. top_5_deep_dives: array of exactly 5 objects (the 5 highest combined score VCs across both tracks). Each:
477
- - fund_name, track ("A" or "B"), fund_overview (2-3 sentences), why_fit (2-3 sentences specific to this product's L3 niche), portfolio_in_space (array of 1-3 names from search data only), how_to_approach (specific steps, min 30 chars), outreach_hook (2-3 sentences, product-specific)
697
+ }, indent=2))
698
+ "
699
+ ```
478
700
 
479
- 5. outreach_hooks: array of exactly 3 objects:
480
- - hook_type (e.g. "portfolio overlap angle", "thesis language mirror", "comparable exit angle"), hook_text (2-3 sentences a founder would actually send), best_for (which VC type this works for)
701
+ **AI instructions -- zero-hallucination rules:**
702
+
703
+ Every field in the output must be traceable to the printed data above. Rules:
704
+
705
+ 1. **curated_vcs:** Use the `curated_matches` data directly. These are pre-verified -- no Tavily evidence required. `fund_overview` comes from the `thesis` field in the curated data. `check_size` and `stage_focus` come from the curated data fields. Do NOT fill from training knowledge even for these funds.
706
+ 2. **VC names (Track A / B):** Only include a fund if its name appears verbatim in the snippet text or title. No exceptions.
707
+ 3. **evidence_company (Track A):** The comparable company they backed -- must be stated in the snippet text, not inferred.
708
+ 4. **thesis_source_title (Track B):** The exact title of the article or post as it appears in the search results.
709
+ 5. **fund_overview (Track A / B):** Extract from snippet text only. Max 2 sentences. If the snippets do not describe the fund, write "not found in search data".
710
+ 6. **thesis_summary:** Close paraphrase of the snippet text. Do not add context from training knowledge.
711
+ 7. **check_size (Track A / B):** From snippet data only. Write "not in search data" if not mentioned.
712
+ 8. **portfolio_in_space:** Only companies that appear in the search snippets. Write "not found in search data" if none.
713
+ 9. **stage_fit_score 1-10:** Penalize 3 points if the VC's stated stage does not match the product's detected stage.
714
+ 10. **space_fit_score 1-10:** 9-10 only if the VC backed 2+ companies in the L3 niche per the snippets or curated data.
715
+ 11. **approach_method:** one of -- cold email / warm intro required / AngelList / application form / Twitter/X DM. Infer from snippets or fund website.
716
+ 12. **outreach_hook:** Must name a specific portfolio signal or thesis quote. Generic hooks like "highlight your traction" are not acceptable.
717
+ 13. No em dashes. No marketing language.
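Rule 2 (verbatim names only) is mechanical enough to sketch. The snippet below is an invented example, not real Tavily output:

```python
# Sketch of rule 2: a fund may be included only if its name appears
# verbatim in a snippet title or snippet content.
def name_in_snippets(fund_name, snippets):
    return any(
        fund_name in s.get('title', '') or fund_name in s.get('content', '')
        for s in snippets
    )

# Invented snippet for illustration.
snippets = [{'title': 'Example Ventures leads seed round in Acme AI',
             'content': 'Example Ventures announced a $3M seed investment.'}]
print(name_in_snippets('Example Ventures', snippets))  # prints True
print(name_in_snippets('Made Up Capital', snippets))   # prints False
```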
718
+
719
+ Write to `/tmp/vc-final-list.json`:
720
+
721
+ - `product_summary`: name, one_line_description, industry_l1, industry_l2, industry_l3, detected_stage, comparable_companies_used (names only)
722
+ - `curated_vcs`: fund_name, confidence ("High"/"Medium"), matched_tags, fund_overview (from thesis field), check_size, stage_focus, website, source ("verified (fund website)"), stage_fit_score, space_fit_score
723
+ - `track_a_vcs`: fund_name, evidence_company (REQUIRED), evidence_source_url, stage_focus, check_size, fund_overview, thesis_summary, stage_fit_score, space_fit_score, approach_method
724
+ - `track_b_vcs`: fund_name, thesis_source_title (REQUIRED), thesis_source_url, stage_focus, check_size, fund_overview, thesis_summary, stage_fit_score, space_fit_score, approach_method
725
+ - `top_5_deep_dives`: fund_name, track ("Curated"/"A"/"B"), fund_overview, why_fit, portfolio_in_space, how_to_approach (min 30 chars), outreach_hook
726
+ - `outreach_hooks`: 3 objects -- hook_type, hook_text (2-3 sentences), best_for
727
+ - `data_quality_flags`: gaps, missing fields, low-confidence areas
481
728
 
482
- 6. data_quality_flags: array of strings noting any gaps or low-confidence areas
729
+ ```bash
730
+ python3 << 'PYEOF'
731
+ import json
483
732
 
484
- Research data:
485
- {json.dumps(context, indent=2)}"""
486
- }]
487
- }],
488
- "generationConfig": {
489
- "temperature": 0.3,
490
- "maxOutputTokens": 6000
491
- }
733
+ result = {
734
+ # FILL from synthesis above
735
+ # Must include: product_summary, curated_vcs, track_a_vcs, track_b_vcs, top_5_deep_dives, outreach_hooks, data_quality_flags
492
736
  }
493
737
 
494
- json.dump(request, open('/tmp/vc-synthesis-request.json', 'w'))
495
- print('Synthesis request prepared.')
496
- PYEOF
497
-
498
- curl -s -X POST \
499
- "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=$GEMINI_API_KEY" \
500
- -H "Content-Type: application/json" \
501
- -d @/tmp/vc-synthesis-request.json \
502
- | python3 -c "
503
- import sys, json
504
- d = json.load(sys.stdin)
505
- text = d['candidates'][0]['content']['parts'][0]['text'].strip()
506
- if text.startswith('\`\`\`'):
507
- text = '\n'.join(text.split('\n')[1:-1])
508
- result = json.loads(text)
509
738
  json.dump(result, open('/tmp/vc-final-list.json', 'w'), indent=2)
510
- print(f'Synthesis complete. Track A: {len(result.get(\"track_a_vcs\", []))} VCs. Track B: {len(result.get(\"track_b_vcs\", []))} VCs.')
511
- "
739
+ print(f'Synthesis written. Curated: {len(result.get("curated_vcs", []))} VCs. Track A: {len(result.get("track_a_vcs", []))} VCs. Track B: {len(result.get("track_b_vcs", []))} VCs.')
740
+ PYEOF
512
741
  ```
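Before moving to Self-QA, a quick shape check helps catch a partially filled result. This is a sketch, not part of the skill's QA step; `REQUIRED_KEYS` mirrors the schema listed above and the sample dict is a stub:

```python
# Sketch: verify the synthesized output has all required top-level keys.
REQUIRED_KEYS = {'product_summary', 'curated_vcs', 'track_a_vcs',
                 'track_b_vcs', 'top_5_deep_dives', 'outreach_hooks',
                 'data_quality_flags'}

def missing_keys(result):
    # Returns a sorted list of required keys absent from the result dict.
    return sorted(REQUIRED_KEYS - set(result))

stub = {k: [] for k in REQUIRED_KEYS}
print(missing_keys(stub))  # prints []
print(missing_keys({'product_summary': {}}))
```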
513
742
 
514
- **If Gemini returns empty or JSON parsing fails:** Retry once with `maxOutputTokens` reduced to 4000. If retry also fails: present whatever partial JSON was returned, mark missing sections `[INCOMPLETE]`, and tell the user: "Synthesis incomplete. The research data may have been too large. Try running again."
515
-
516
743
  ---
517
744
 
518
- ## Step 9: Self-QA
519
-
520
- Run before presenting. Remove non-evidenced VCs structurally.
745
+ ## Step 10: Self-QA
521
746
 
522
747
  ```bash
523
748
  python3 << 'PYEOF'
@@ -540,6 +765,18 @@ removed_b = original_b - len(result['track_b_vcs'])
540
765
  if removed_b > 0:
541
766
  failures.append(f'Removed {removed_b} Track B VC(s) missing thesis_source_title')
542
767
 
768
+ # Remove deep dives for VCs that were stripped from all tracks
769
+ valid_funds = (
770
+ {v['fund_name'] for v in result.get('curated_vcs', [])} |
771
+ {v['fund_name'] for v in result.get('track_a_vcs', [])} |
772
+ {v['fund_name'] for v in result.get('track_b_vcs', [])}
773
+ )
774
+ original_dives = len(result.get('top_5_deep_dives', []))
775
+ result['top_5_deep_dives'] = [d for d in result.get('top_5_deep_dives', []) if d.get('fund_name') in valid_funds]
776
+ removed_dives = original_dives - len(result['top_5_deep_dives'])
777
+ if removed_dives > 0:
778
+ failures.append(f'Removed {removed_dives} deep dive(s) for funds stripped during QA')
779
+
543
780
  # Check top 5 deep dives
544
781
  dives = result.get('top_5_deep_dives', [])
545
782
  if len(dives) < 5:
@@ -548,25 +785,31 @@ for dd in dives:
548
785
  if not dd.get('how_to_approach') or len(dd.get('how_to_approach', '')) < 30:
549
786
  dd['how_to_approach'] = 'Approach method not determinable from search data. Check the fund website directly for application instructions.'
550
787
  failures.append(f"Fixed: '{dd.get('fund_name')}' had missing how_to_approach")
788
+ if not dd.get('fund_overview'):
789
+ dd['fund_overview'] = 'not found in search data'
551
790
 
552
791
  # Check outreach hooks count
553
792
  if len(result.get('outreach_hooks', [])) != 3:
554
793
  failures.append(f"Expected 3 outreach hooks, got {len(result.get('outreach_hooks', []))}")
555
794
 
556
795
  # Check for em dashes
557
- if '—' in json.dumps(result):
558
- result_str = json.dumps(result).replace('—', '')
559
- result = json.loads(result_str)
560
- failures.append('Fixed: em dash characters removed from output')
796
+ full_text = json.dumps(result)
797
+ if '—' in full_text:
798
+ result = json.loads(full_text.replace('—', '-'))
799
+ failures.append('Fixed: em dash characters replaced with hyphens')
561
800
 
562
801
  # Check for forbidden words
563
802
  forbidden = ['powerful', 'robust', 'seamless', 'innovative', 'game-changing', 'streamline', 'leverage', 'transform']
564
- full_text = json.dumps(result).lower()
803
+ full_text_lower = json.dumps(result).lower()
565
804
  for word in forbidden:
566
- if word in full_text:
805
+ if word in full_text_lower:
567
806
  failures.append(f"Warning: forbidden word '{word}' found in output -- review before presenting")
568
807
 
569
- # Ensure data_quality_flags exists
808
+ # Flag any "not found in search data" entries so user knows coverage is incomplete
809
+ not_found_count = json.dumps(result).count('not found in search data')
810
+ if not_found_count > 0:
811
+ failures.append(f'INFO: {not_found_count} field(s) marked "not found in search data" -- verify directly before outreach')
812
+
570
813
  if 'data_quality_flags' not in result:
571
814
  result['data_quality_flags'] = []
572
815
  result['data_quality_flags'].extend(failures)
@@ -582,7 +825,7 @@ PYEOF
582
825
 
583
826
  ---
584
827
 
585
- ## Step 10: Save and Present Output
828
+ ## Step 11: Save and Present Output
586
829
 
587
830
  ```bash
588
831
  DATE=$(date +%Y-%m-%d)
@@ -603,13 +846,23 @@ Date: [today] | Stage: [detected_stage] ([stage_confidence] confidence) | Geogra
603
846
  What it does: [one_line_description]
604
847
  Industry: [l1] > [l2] > [l3]
605
848
  Buyer: [buyer_persona] at [company_type], [company_size]
606
- Comparable companies used for research: [comma-separated list]
849
+ Comparable companies used: [comma-separated list, noting source_type for each]
850
+
851
+ ---
852
+
853
+ ### Curated Matches (Verified)
854
+
855
+ *Funds matched from a verified dataset of 25 VC funds sourced from fund websites. Zero hallucination -- details come directly from the dataset.*
856
+
857
+ | Fund | Confidence | Stage Focus | Check Size | Matched Tags |
858
+ |---|---|---|---|---|
859
+ [one row per curated VC, sorted by confidence then score]
607
860
 
608
861
  ---
609
862
 
610
863
  ### Track A: VCs Who Backed Similar Companies
611
864
 
612
- *These investors have already written a check in this space.*
865
+ *These investors have already written a check in this space. Evidence from live Tavily search.*
613
866
 
614
867
  | Fund | Backed Comparable | Stage Focus | Check Size | Fit Score | Approach |
615
868
  |---|---|---|---|---|---|
@@ -629,15 +882,15 @@ Comparable companies used for research: [comma-separated list]
629
882
 
630
883
  ### Top 5 Deep Dives
631
884
 
632
- #### [N]. [Fund Name] (Track [A/B])
885
+ #### [N]. [Fund Name] (Track [Curated/A/B])
633
886
 
634
- Overview: [fund_overview]
887
+ Overview: [fund_overview -- from dataset or search data only]
635
888
  Why it fits: [why_fit]
636
- Portfolio in this space: [names, or "Not found in search data"]
889
+ Portfolio in this space: [from dataset or search data, or "not found in search data"]
637
890
  How to approach: [how_to_approach]
638
891
  Outreach hook: "[outreach_hook]"
639
892
 
640
- [repeat for all 5]
893
+ [repeat for all available deep dives]
641
894
 
642
895
  ---
643
896
 
@@ -657,7 +910,7 @@ Saved to: docs/vc-intel/[PRODUCT_SLUG]-[DATE].md
657
910
  Clean up temp files:
658
911
 
659
912
  ```bash
660
- rm -f /tmp/vc-product-raw.md /tmp/vc-stage-signals.json /tmp/vc-analysis-request.json \
661
- /tmp/vc-product-analysis.json /tmp/vc-tracka-results.json /tmp/vc-trackb-results.json \
662
- /tmp/vc-synthesis-request.json /tmp/vc-final-list.json /tmp/vc-qa-result.json
913
+ rm -f /tmp/vc-product-raw.md /tmp/vc-stage-signals.json /tmp/vc-product-analysis.json \
914
+ /tmp/vc-product-context.json /tmp/vc-curated-matches.json /tmp/vc-comparable-search.json \
915
+ /tmp/vc-tracka-results.json /tmp/vc-trackb-results.json /tmp/vc-final-list.json
663
916
  ```