@opendirectory.dev/skills 0.1.41 → 0.1.42
- package/package.json +1 -1
- package/skills/vc-finder/.env.example +1 -5
- package/skills/vc-finder/README.md +16 -9
- package/skills/vc-finder/SKILL.md +446 -193
- package/skills/vc-finder/data/vc_funds.json +277 -0
- package/skills/vc-finder/evals/evals.json +43 -25
- package/skills/vc-finder/scripts/match_funds.py +144 -0
@@ -1,6 +1,6 @@
 ---
 name: vc-finder
-description: Takes a startup product URL or description, detects the industry and funding stage, identifies 5 comparable funded companies, searches who invested in those companies (Track A), finds VCs who publish investment theses about this space (Track B), and returns a ranked sourced list of relevant investors with deep-dives and outreach hooks. Use when asked to find investors for a startup, identify which VCs fund products like mine, research who backs companies in my space, build a VC target list, or find investor-market fit.
+description: 'Takes a startup product URL or description, detects the industry and funding stage, identifies 5 comparable funded companies, searches who invested in those companies (Track A), finds VCs who publish investment theses about this space (Track B), and returns a ranked sourced list of relevant investors with deep-dives and outreach hooks. Use when asked to find investors for a startup, identify which VCs fund products like mine, research who backs companies in my space, build a VC target list, or find investor-market fit.'
 compatibility: [claude-code, gemini-cli, github-copilot]
 ---
 
@@ -10,7 +10,11 @@ Take a product URL or description. Detect industry and stage. Find 5 comparable
 
 ---
 
-**
+**Zero-hallucination policy:** Every fact in the output must be traceable to a specific Tavily search result or the fetched product page. This applies to:
+- Comparable company names: must appear in Tavily search results, not AI training knowledge
+- VC fund names: must appear verbatim in Tavily search results
+- Check sizes, stage focus, portfolio companies: must come from search snippets, not AI knowledge
+- Fund overviews and thesis summaries: extracted from search snippets only. If a detail is not in the search data, write "not found in search data" -- do not fill from training knowledge.
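The "must appear verbatim in Tavily search results" rule lends itself to a mechanical check. The following is a minimal sketch, assuming search results are a list of dicts with `title` and `content` keys (the shape the Step 6 script stores). The helper name `appears_in_results` and the sample data are hypothetical and not part of the skill.

```python
def appears_in_results(name, results):
    """Return True if `name` occurs verbatim (case-insensitive) in any result title or snippet."""
    needle = name.strip().lower()
    if not needle:
        return False
    for item in results:
        haystack = (item.get("title", "") + " " + item.get("content", "")).lower()
        if needle in haystack:
            return True
    return False


# Hypothetical search results, for illustration only.
sample_results = [
    {"title": "Acme raises seed round",
     "content": "Acme, backed by Example Ventures, raised funding."},
]

for candidate in ["Example Ventures", "Famous Fund"]:
    if appears_in_results(candidate, sample_results):
        print(candidate, "-> keep")
    else:
        print(candidate, "-> drop: not found in search data")
```

A substring match like this only shows that the name is present somewhere in the results. Whether the surrounding snippet actually supports the claim still has to be judged from the text.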
 
 ---
 
@@ -19,25 +23,24 @@ Take a product URL or description. Detect industry and stage. Find 5 comparable
 | The agent will want to... | Why that's wrong |
 |---|---|
 | Add a16z or Sequoia because they are famous | A famous VC without evidence is noise. Only include VCs that appear in Tavily search results for this specific product. Name-dropping wastes the founder's time. |
-|
+| Generate comparable companies from training knowledge | Comparables must come from Tavily search results (Step 6). AI knowledge of companies is not evidence -- a company suggested from memory may have wrong funding status or may not be a true comparable. |
+| Continue when all 5 Track A searches return 0 results | Zero Track A results means the comparables were wrong or too obscure. Stop, re-run Step 6 with broader search queries, and retry. |
 | Include a Track B VC without citing the article or post | Thesis without a source is indistinguishable from hallucination. The founder cannot verify it and the list loses all credibility. |
-|
-|
-|
+| Fill in fund overview from training knowledge | Fund overviews must come from Tavily snippet text only. If the snippets don't describe the fund, write "not found in search data". |
+| Detect stage from website aesthetics | Stage must come from the specific CTA signals detected in Step 4. |
+| Write generic outreach hooks | Every outreach hook must name this specific product's differentiator and a specific VC portfolio signal or thesis quote from the search data. |
+| Skip the URL fetch when the user also provides a description | Always fetch the URL. The live page often reveals stage signals that the user's description omits. |
 
 ---
 
 ## Step 1: Setup Check
 
 ```bash
-echo "
-echo "TAVILY_API_KEY: ${TAVILY_API_KEY:+set}"
+echo "TAVILY_API_KEY: ${TAVILY_API_KEY:+set}"
 echo "FIRECRAWL_API_KEY: ${FIRECRAWL_API_KEY:-not set, Tavily extract will be used as fallback}"
 ```
 
-**If
-
-**If TAVILY_API_KEY is missing:** Stop. Tell the user: "TAVILY_API_KEY is required to research VC investments and theses. There is no fallback for this. Get it at app.tavily.com. Free tier: 1000 credits/month (about 125 full runs). Add it to your .env file."
+**If TAVILY_API_KEY is missing:** Stop. Tell the user: "TAVILY_API_KEY is required to research VC investments and theses. There is no fallback for this. Get it at app.tavily.com -- free tier: 1000 credits/month (about 125 full runs). Add it to your .env file."
 
 **If only FIRECRAWL_API_KEY is missing:** Continue silently. Tavily extract will be used for the URL fetch.
 
@@ -120,7 +123,7 @@ Proceed to Step 5 using the pasted description as `product_content`.
 
 ## Step 4: Detect Stage Signals Locally (No API)
 
-Parse the fetched markdown with regex before
+Parse the fetched markdown with regex before the analysis step.
 
 ```bash
 python3 << 'PYEOF'
@@ -129,25 +132,20 @@ import re, json
 content = open('/tmp/vc-product-raw.md').read().lower()
 stage_signals = []
 
-# Pre-seed signals
 if re.search(r'join\s+(the\s+)?waitlist|sign\s+up\s+for\s+beta|early\s+access|request\s+(an?\s+)?invite|get\s+notified', content):
     stage_signals.append({'signal': 'waitlist or beta CTA', 'stage_hint': 'pre-seed'})
 
-# Seed signals
 if re.search(r'start\s+(your\s+)?free\s+trial|try\s+(it\s+)?for\s+free|request\s+a?\s+demo|book\s+a?\s+demo|schedule\s+a?\s+demo', content):
     stage_signals.append({'signal': 'free trial or demo CTA', 'stage_hint': 'seed'})
 
-# Series A signals
 if re.search(r'contact\s+sales|talk\s+to\s+(our\s+)?sales|see\s+pricing|view\s+pricing|plans\s+and\s+pricing', content):
     stage_signals.append({'signal': 'pricing or sales CTA', 'stage_hint': 'series-a'})
 if re.search(r'case\s+stud(y|ies)|customer\s+stor(y|ies)|trusted\s+by\s+[\d,]+|used\s+by\s+[\d,]+', content):
     stage_signals.append({'signal': 'case studies or customer count', 'stage_hint': 'series-a'})
 
-# Series A/B signals
 if re.search(r'enterprise\s+(plan|pricing|tier)|we.?re\s+hiring|join\s+our\s+team|open\s+positions', content):
     stage_signals.append({'signal': 'enterprise tier or job openings', 'stage_hint': 'series-a-or-b'})
 
-# Funding announcement -- extract directly if present
 funding_match = re.search(
     r'raised\s+\$[\d,.]+\s*[mk]?|series\s+[abc]\s+round|seed\s+round|(\$[\d,.]+\s*[mk]?\s+(?:seed|series\s+[abc]))',
     content
@@ -155,7 +153,6 @@ funding_match = re.search(
 if funding_match:
     stage_signals.append({'signal': f'funding text: {funding_match.group(0).strip()}', 'stage_hint': 'announced'})
 
-# Determine dominant stage
 if not stage_signals:
     dominant = 'unknown'
 elif any(s['stage_hint'] == 'announced' for s in stage_signals):
@@ -181,87 +178,340 @@ PYEOF
 
 ---
 
-## Step 5: Product Analysis
+## Step 5: Product Analysis (Taxonomy, Stage, ICP)
+
+Print the product content and stage signals:
+
+```bash
+python3 -c "
+import json
+content = open('/tmp/vc-product-raw.md').read()[:6000]
+signals = json.load(open('/tmp/vc-stage-signals.json'))
+print('=== PRODUCT PAGE (first 6000 chars) ===')
+print(content)
+print()
+print('=== DETECTED STAGE SIGNALS ===')
+print(json.dumps(signals, indent=2))
+"
+```
+
+**AI instructions:** Analyze the product page content above. Generate the taxonomy, ICP, and stage classification only -- do NOT generate comparable companies yet (that is done via live search in Step 6).
+
+Write to `/tmp/vc-product-analysis.json`:
+
+- `product_name`: from the page
+- `one_line_description`: what it does, for whom, core value prop. Under 20 words. No marketing language.
+- `industry_taxonomy`: `l1` (top-level: fintech / healthtech / developer tools / consumer / etc.), `l2` (sector: sales technology / logistics software / etc.), `l3` (specific niche: outbound prospecting / last-mile routing / etc.). Vague labels like "technology" or "software" alone are not acceptable.
+- `icp`: `buyer_persona` (job title), `company_type`, `company_size`
+- `detected_stage`: pre-seed / seed / series-a / series-b / unknown
+- `stage_confidence`: high / medium / low
+- `stage_evidence`: one sentence citing exactly which CTA or text on the page drove this. Write "no clear signals found" if unknown.
+- `geography_bias`: US / Europe / global / unclear
+- `comparable_companies`: leave as empty array `[]` -- will be filled in Step 6
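The field list above can be checked before the file is written. The following is a minimal sketch, assuming the allowed values are exactly those named in the bullets. The function `validate_analysis` and the example record are hypothetical and not part of the skill.

```python
ALLOWED_STAGES = {"pre-seed", "seed", "series-a", "series-b", "unknown"}
ALLOWED_CONFIDENCE = {"high", "medium", "low"}
ALLOWED_GEOGRAPHY = {"US", "Europe", "global", "unclear"}
VAGUE_LABELS = {"technology", "software"}


def validate_analysis(analysis):
    """Return a list of problems in a Step 5 analysis dict; an empty list means it passes."""
    problems = []
    for key in ("product_name", "one_line_description", "stage_evidence"):
        if not analysis.get(key):
            problems.append(f"missing {key}")
    if len(analysis.get("one_line_description", "").split()) >= 20:
        problems.append("one_line_description must be under 20 words")
    taxonomy = analysis.get("industry_taxonomy", {})
    for level in ("l1", "l2", "l3"):
        label = taxonomy.get(level, "").strip().lower()
        if not label or label in VAGUE_LABELS:
            problems.append(f"industry_taxonomy.{level} is missing or too vague")
    icp = analysis.get("icp", {})
    for key in ("buyer_persona", "company_type", "company_size"):
        if not icp.get(key):
            problems.append(f"missing icp.{key}")
    if analysis.get("detected_stage") not in ALLOWED_STAGES:
        problems.append("detected_stage is not an allowed value")
    if analysis.get("stage_confidence") not in ALLOWED_CONFIDENCE:
        problems.append("stage_confidence is not an allowed value")
    if analysis.get("geography_bias") not in ALLOWED_GEOGRAPHY:
        problems.append("geography_bias is not an allowed value")
    if analysis.get("comparable_companies") != []:
        problems.append("comparable_companies must stay empty until Step 6")
    return problems


# Hypothetical record, for illustration only.
example = {
    "product_name": "Acme Router",
    "one_line_description": "Routes last-mile deliveries for regional couriers using live traffic data",
    "industry_taxonomy": {"l1": "logistics", "l2": "logistics software", "l3": "last-mile routing"},
    "icp": {"buyer_persona": "Head of Operations", "company_type": "courier", "company_size": "50-500"},
    "detected_stage": "seed",
    "stage_confidence": "medium",
    "stage_evidence": "Free trial CTA on the homepage.",
    "geography_bias": "US",
    "comparable_companies": [],
}
print(validate_analysis(example) or "analysis passes")
```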
 
 ```bash
 python3 << 'PYEOF'
 import json
 
-
-
+analysis = {
+    # FILL from your analysis above
+    "comparable_companies": []
+}
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+json.dump(analysis, open('/tmp/vc-product-analysis.json', 'w'), indent=2)
+print('Product analysis written.')
+PYEOF
+```
+
+Verify:
+
+```bash
+python3 -c "
+import json
+a = json.load(open('/tmp/vc-product-analysis.json'))
+print('Product:', a['product_name'])
+print('Industry:', a['industry_taxonomy']['l1'], '>', a['industry_taxonomy']['l2'], '>', a['industry_taxonomy']['l3'])
+print('Stage:', a['detected_stage'], '(' + a['stage_confidence'] + ' confidence)')
+"
+```
+
+---
+
+## Step 5b: Curated Pre-Match Against Verified Fund Dataset
+
+Run the product taxonomy against a curated dataset of 25 verified VC funds (sourced from fund websites). Produces zero-hallucination fund matches and seed comparables for Track A -- no Tavily credits consumed.
+
+Print product analysis for tag mapping:
+
+```bash
+python3 -c "
+import json
+a = json.load(open('/tmp/vc-product-analysis.json'))
+print('Taxonomy:', a['industry_taxonomy']['l1'], '>', a['industry_taxonomy']['l2'], '>', a['industry_taxonomy']['l3'])
+print('Stage:', a['detected_stage'])
+print('Geography:', a['geography_bias'])
+"
+```
+
+**AI instructions:** Map the product taxonomy to the standard tags used in the fund dataset. Available tags:
+`DevTools`, `Infrastructure`, `Open Source`, `B2B SaaS`, `AI`, `Data`, `FinTech`, `HealthTech`, `Enterprise`, `Consumer`, `Marketplaces`, `E-commerce`, `Crypto`, `DeepTech`, `Cybersecurity`, `Generalist`
+
+Pick 2-4 tags that describe this product. Map `detected_stage` to: `Pre-seed`, `Seed`, `Series A`, or `Growth`. Map `geography_bias` to: `US`, `Europe`, `India`, or `Global`.
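The stage and geography mapping described above can be written as a lookup. This is a sketch under assumptions: the source does not say how `series-b`, `unknown`, or `unclear` should map, so the choices below (`series-b` to `Growth`, `unknown` to no hint, `unclear` to `Global`) are guesses. `to_context_hints` is a hypothetical helper.

```python
# Assumed mapping from Step 5 values to the fund dataset vocabulary.
STAGE_MAP = {
    "pre-seed": "Pre-seed",
    "seed": "Seed",
    "series-a": "Series A",
    "series-b": "Growth",   # assumption: the dataset has no Series B bucket
    "unknown": None,        # assumption: pass no stage hint at all
}
GEOGRAPHY_MAP = {
    "US": "US",
    "Europe": "Europe",
    "global": "Global",
    "unclear": "Global",    # assumption: treat unclear as no regional preference
}


def to_context_hints(detected_stage, geography_bias):
    """Translate Step 5 fields into the stage and geography hints used for scoring."""
    return {
        "stage_hint": STAGE_MAP.get(detected_stage),
        "geography_hint": GEOGRAPHY_MAP.get(geography_bias, "Global"),
    }


print(to_context_hints("series-a", "Europe"))
print(to_context_hints("unknown", "unclear"))
```

Leaving the stage hint empty is safe with the scoring script in this step, which awards a neutral 10 points when no stage hint is present. `India` never comes out of this lookup because `geography_bias` has no such value, so it has to be set by hand when the page clearly targets that market.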
+
+Write product context:
+
+```bash
+python3 << 'PYEOF'
+import json
+
+# FILL based on taxonomy analysis above
+context = {
+    "extracted_tags": ["TagA", "TagB"],  # 2-4 tags from the list above
+    "stage_hint": "Seed",  # Pre-seed / Seed / Series A / Growth
+    "geography_hint": "US"  # US / Europe / India / Global
 }
+json.dump(context, open('/tmp/vc-product-context.json', 'w'), indent=2)
+print('Product context:', context)
+PYEOF
+```
+
+Run scoring against the embedded curated dataset:
+
+```bash
+python3 << 'PYEOF'
+import json
+
+context = json.load(open('/tmp/vc-product-context.json'))
+
+VC_FUNDS = [
+    {"fund_name":"Y Combinator","thesis":"We provide seed funding for startups. We invest in deeply technical teams building massive companies across all domains.","check_size":"$500k","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","B2B SaaS","DevTools","AI"],"geography_focus":["Global"],"notable_portfolio":["Stripe","Airbnb","GitLab"],"website":"https://www.ycombinator.com"},
+    {"fund_name":"boldstart ventures","thesis":"Day one partner for developer first, crypto, and SaaS founders. We love deeply technical founders solving hard infrastructure problems.","check_size":"$1M - $3M","stage_focus":["Pre-seed","Seed"],"industry_tags":["DevTools","Infrastructure","Crypto"],"geography_focus":["Global","US"],"notable_portfolio":["Snyk","Blockdaemon","Superhuman"],"website":"https://boldstart.vc"},
+    {"fund_name":"Heavybit","thesis":"The leading investor in developer-first startups. We help technical founders launch, gain traction, and build enterprise-ready companies.","check_size":"$1M - $5M","stage_focus":["Seed","Series A"],"industry_tags":["DevTools","Infrastructure","Open Source"],"geography_focus":["Global","US"],"notable_portfolio":["PagerDuty","Sanity","Netlify"],"website":"https://www.heavybit.com"},
+    {"fund_name":"Amplify Partners","thesis":"We invest in technical founders building the next generation of IT infrastructure, developer tools, and data platforms.","check_size":"$2M - $8M","stage_focus":["Seed","Series A"],"industry_tags":["DevTools","Infrastructure","AI","Data"],"geography_focus":["US"],"notable_portfolio":["Datadog","OCTO","dbt Labs"],"website":"https://www.amplifypartners.com"},
+    {"fund_name":"OSS Capital","thesis":"We exclusively back early-stage founders building Commercial Open Source Software (COSS) companies.","check_size":"$500k - $2M","stage_focus":["Pre-seed","Seed","Series A"],"industry_tags":["Open Source","DevTools"],"geography_focus":["Global"],"notable_portfolio":["Cal.com","Appsmith","Hoppscotch"],"website":"https://oss.capital"},
+    {"fund_name":"Sequoia Capital","thesis":"We help the daring build legendary companies, from idea to IPO and beyond. Sequoia is an early-stage and growth-stage investor.","check_size":"$1M - $10M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Enterprise","Consumer","AI"],"geography_focus":["Global"],"notable_portfolio":["Apple","Google","WhatsApp"],"website":"https://www.sequoiacap.com"},
+    {"fund_name":"Andreessen Horowitz (a16z)","thesis":"We invest in software eating the world. We back bold entrepreneurs building the future through technology.","check_size":"$1M - $50M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Crypto","Enterprise","Consumer","AI"],"geography_focus":["Global","US"],"notable_portfolio":["Facebook","Coinbase","Figma"],"website":"https://a16z.com"},
+    {"fund_name":"Point Nine Capital","thesis":"We are a seed-stage venture capital firm focused on B2B SaaS and B2B marketplaces globally.","check_size":"$1M - $3M","stage_focus":["Seed"],"industry_tags":["B2B SaaS","Marketplaces"],"geography_focus":["Europe","Global"],"notable_portfolio":["Zendesk","Typeform","Docplanner"],"website":"https://www.pointnine.com"},
+    {"fund_name":"Cherry Ventures","thesis":"We champion founders in Europe from their earliest days. We are generalist seed investors.","check_size":"$1M - $4M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","Consumer","B2B SaaS"],"geography_focus":["Europe"],"notable_portfolio":["FlixBus","Auto1 Group","Forto"],"website":"https://www.cherry.vc"},
+    {"fund_name":"First Round Capital","thesis":"We are the seed-stage firm that builds the most supportive community for founders.","check_size":"$1M - $4M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","B2B SaaS","Consumer"],"geography_focus":["US"],"notable_portfolio":["Uber","Notion","Roblox"],"website":"https://firstround.com"},
+    {"fund_name":"Bessemer Venture Partners","thesis":"BVP helps entrepreneurs lay strong foundations to build and forge long-standing companies.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Enterprise","Consumer","FinTech"],"geography_focus":["Global"],"notable_portfolio":["LinkedIn","Twilio","Shopify"],"website":"https://www.bvp.com"},
+    {"fund_name":"Index Ventures","thesis":"We back the best and most ambitious entrepreneurs across all stages to build category-defining businesses.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","FinTech","Consumer","B2B SaaS"],"geography_focus":["Europe","US","Global"],"notable_portfolio":["Dropbox","Slack","Figma"],"website":"https://www.indexventures.com"},
+    {"fund_name":"Lightspeed Venture Partners","thesis":"We invest globally in enterprise, consumer, and health founders who are shaping the future.","check_size":"$1M - $25M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Enterprise","Consumer","FinTech"],"geography_focus":["Global"],"notable_portfolio":["Snap","Rippling","MuleSoft"],"website":"https://lsvp.com"},
+    {"fund_name":"Accel","thesis":"We partner with exceptional founders from inception through all phases of private company growth.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","B2B SaaS","Consumer","DevTools"],"geography_focus":["Global"],"notable_portfolio":["Facebook","Atlassian","Spotify"],"website":"https://www.accel.com"},
+    {"fund_name":"Bain Capital Ventures","thesis":"From seed to growth, we back founders building legendary infrastructure, fintech, application, and commerce companies.","check_size":"$1M - $50M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Infrastructure","FinTech","B2B SaaS"],"geography_focus":["US","Global"],"notable_portfolio":["DocuSign","SendGrid","Redis"],"website":"https://www.baincapitalventures.com"},
+    {"fund_name":"Greylock Partners","thesis":"We partner with early-stage founders to build enterprise and consumer software companies that define new categories.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["Enterprise","Consumer","Cybersecurity","AI"],"geography_focus":["US"],"notable_portfolio":["Workday","Palo Alto Networks","LinkedIn"],"website":"https://greylock.com"},
+    {"fund_name":"Unusual Ventures","thesis":"We provide a breakthrough level of support for early-stage founders building enterprise tech.","check_size":"$1M - $5M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Enterprise","DevTools","B2B SaaS"],"geography_focus":["US"],"notable_portfolio":["Arctic Wolf","Harness","Vivun"],"website":"https://www.unusual.vc"},
+    {"fund_name":"Crane Venture Partners","thesis":"We back deep tech and enterprise founders in Europe solving hard problems with data and code.","check_size":"$1M - $4M","stage_focus":["Seed"],"industry_tags":["Enterprise","DeepTech","Data","AI"],"geography_focus":["Europe"],"notable_portfolio":["Onfido","Tessian","Forto"],"website":"https://crane.vc"},
+    {"fund_name":"Founder Collective","thesis":"We are a seed-stage venture capital fund, built by founders, for founders. We back weird, wonderful, and wild startups.","check_size":"$500k - $2M","stage_focus":["Seed"],"industry_tags":["Generalist","Consumer","B2B SaaS"],"geography_focus":["US","Global"],"notable_portfolio":["Uber","Airtable","BuzzFeed"],"website":"https://www.foundercollective.com"},
+    {"fund_name":"Benchmark","thesis":"We are a partnership of equal partners. We back mission-driven founders at the earliest stages and walk beside them for the long haul.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["Generalist","Marketplaces","Enterprise","Consumer"],"geography_focus":["US","Global"],"notable_portfolio":["Uber","Twitter","eBay","Snapchat"],"website":"https://www.benchmark.com"},
+    {"fund_name":"Accel India","thesis":"We partner with exceptional founders from inception through all phases of private company growth in the Indian ecosystem.","check_size":"$1M - $15M","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","B2B SaaS","Consumer","FinTech","E-commerce"],"geography_focus":["India"],"notable_portfolio":["Flipkart","Swiggy","Freshworks"],"website":"https://www.accel.com/india"},
+    {"fund_name":"Blume Ventures","thesis":"We are a seed and pre-seed venture fund that backs startups with both funding and active mentoring.","check_size":"$500k - $3M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","B2B SaaS","Consumer","DeepTech","HealthTech"],"geography_focus":["India"],"notable_portfolio":["Unacademy","Purplle","GreyOrange"],"website":"https://blume.vc"},
+    {"fund_name":"Elevation Capital","thesis":"We partner with visionary founders in India across early stages to help them build category-defining businesses.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["Generalist","Consumer","FinTech","B2B SaaS","HealthTech"],"geography_focus":["India"],"notable_portfolio":["Paytm","Swiggy","Meesho"],"website":"https://elevationcapital.com"},
+    {"fund_name":"Peak XV Partners","thesis":"Formerly Sequoia India & SEA, we partner with founders across early, growth, and public stages to build enduring companies.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Consumer","FinTech","B2B SaaS","DevTools","AI"],"geography_focus":["India","South Asia"],"notable_portfolio":["Zomato","Pine Labs","Cred"],"website":"https://www.peakxv.com"},
+    {"fund_name":"Nexus Venture Partners","thesis":"We are a US-India venture capital firm backing extraordinary founders building product-first companies.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["B2B SaaS","Enterprise","DevTools","Consumer"],"geography_focus":["India","US"],"notable_portfolio":["Postman","Hasura","Zepto"],"website":"https://nexusvp.com"}
+]
+
+STAGE_ORDER = {"Pre-seed": 0, "Seed": 1, "Series A": 2, "Growth": 3}
+
+def score_fund(fund, ctx):
+    score = 0
+    fund_tags = fund.get("industry_tags", [])
+    extracted_tags = ctx.get("extracted_tags", ["Generalist"])
+    tag_points = 0
+    matched_tags = []
+    for tag in extracted_tags:
+        if tag in fund_tags:
+            tag_points += 5 if tag == "Generalist" else 20
+            matched_tags.append(tag)
+    tag_points = min(tag_points, 60)
+    score += tag_points
+    stage_hint = ctx.get("stage_hint")
+    fund_stages = fund.get("stage_focus", [])
+    if not stage_hint:
+        score += 10
+    elif fund_stages:
+        if stage_hint in fund_stages:
+            score += 20
+        elif stage_hint in STAGE_ORDER:
+            hint_idx = STAGE_ORDER[stage_hint]
+            if any(f in STAGE_ORDER and abs(STAGE_ORDER[f] - hint_idx) == 1 for f in fund_stages):
+                score += 10
+    geo_hint = ctx.get("geography_hint")
+    fund_geo = fund.get("geography_focus", ["Global"])
+    if not geo_hint or geo_hint == "Global":
+        score += 10
+    elif fund_geo == ["India"] and geo_hint == "US":
+        pass
+    elif geo_hint in fund_geo:
+        score += 20
+    elif "Global" in fund_geo:
+        score += 15
+    if geo_hint == "US" and "India" in fund_geo and "US" not in fund_geo and "Global" not in fund_geo:
+        score = max(0, score - 30)
+    if fund_tags and extracted_tags and fund_tags[0] not in extracted_tags and tag_points <= 20:
+        score = max(0, score - 15)
+    return score, matched_tags
+
+scored = []
+for fund in VC_FUNDS:
+    score, matched_tags = score_fund(fund, context)
+    tier = "High" if score >= 70 else ("Medium" if score >= 40 else "Low")
+    scored.append({
+        "fund_name": fund["fund_name"],
+        "thesis": fund["thesis"],
+        "check_size": fund["check_size"],
+        "stage_focus": fund["stage_focus"],
+        "industry_tags": fund["industry_tags"],
+        "geography_focus": fund["geography_focus"],
+        "notable_portfolio": fund["notable_portfolio"],
+        "website": fund["website"],
+        "source": "verified (fund website)",
+        "score": score,
+        "confidence": tier,
+        "matched_tags": matched_tags
+    })
+
+scored.sort(key=lambda x: (-x["score"], x["fund_name"]))
+relevant = [m for m in scored if m["confidence"] in ("High", "Medium")]
 
-
+curated_comparables = []
+for m in relevant:
+    for company in m.get("notable_portfolio", []):
+        if company not in curated_comparables:
+            curated_comparables.append(company)
+
+output = {
+    "high_medium_matches": relevant,
+    "curated_comparables": curated_comparables[:6]
+}
+json.dump(output, open('/tmp/vc-curated-matches.json', 'w'), indent=2)
+print(f'Curated matches: {len(relevant)} High/Medium confidence funds')
+for m in relevant[:8]:
+    print(f' {m["confidence"]:6} ({m["score"]:3}) {m["fund_name"]}')
+print(f'Seed comparables from portfolio: {curated_comparables[:6]}')
 PYEOF
+```
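To make the scoring rules concrete, here is a condensed restatement of the `score_fund` logic above, applied to two fund entries from the dataset. The product context (`DevTools` and `Open Source` tags, `Seed`, `US`) is a hypothetical example, and the fund dicts are trimmed to the three fields the scorer reads. Treat it as a worked example under those assumptions rather than a replacement for the script.

```python
STAGE_ORDER = {"Pre-seed": 0, "Seed": 1, "Series A": 2, "Growth": 3}


def score_fund(fund, ctx):
    """Condensed version of the scoring rules: tags, then stage, then geography, then penalties."""
    tags = ctx.get("extracted_tags", ["Generalist"])
    fund_tags = fund.get("industry_tags", [])
    matched = [t for t in tags if t in fund_tags]
    # 20 points per matching tag (5 for "Generalist"), capped at 60.
    tag_points = min(sum(5 if t == "Generalist" else 20 for t in matched), 60)
    score = tag_points

    stage = ctx.get("stage_hint")
    fund_stages = fund.get("stage_focus", [])
    if not stage:
        score += 10  # no stage hint: neutral credit
    elif stage in fund_stages:
        score += 20  # exact stage match
    elif stage in STAGE_ORDER and any(
        s in STAGE_ORDER and abs(STAGE_ORDER[s] - STAGE_ORDER[stage]) == 1 for s in fund_stages
    ):
        score += 10  # adjacent stage

    geo = ctx.get("geography_hint")
    fund_geo = fund.get("geography_focus", ["Global"])
    if not geo or geo == "Global":
        score += 10
    elif fund_geo == ["India"] and geo == "US":
        pass
    elif geo in fund_geo:
        score += 20
    elif "Global" in fund_geo:
        score += 15
    # Penalty: US product against a fund focused on India without US or Global coverage.
    if geo == "US" and "India" in fund_geo and "US" not in fund_geo and "Global" not in fund_geo:
        score = max(0, score - 30)
    # Penalty: the fund's first tag is not among the product tags and tag overlap is weak.
    if fund_tags and tags and fund_tags[0] not in tags and tag_points <= 20:
        score = max(0, score - 15)
    return score, matched


def tier(score):
    return "High" if score >= 70 else ("Medium" if score >= 40 else "Low")


# Hypothetical product context.
CONTEXT = {"extracted_tags": ["DevTools", "Open Source"], "stage_hint": "Seed", "geography_hint": "US"}

# Two entries copied from the dataset, trimmed to the fields used for scoring.
HEAVYBIT = {"industry_tags": ["DevTools", "Infrastructure", "Open Source"],
            "stage_focus": ["Seed", "Series A"], "geography_focus": ["Global", "US"]}
BLUME = {"industry_tags": ["Generalist", "B2B SaaS", "Consumer", "DeepTech", "HealthTech"],
         "stage_focus": ["Pre-seed", "Seed"], "geography_focus": ["India"]}

for name, fund in (("Heavybit", HEAVYBIT), ("Blume Ventures", BLUME)):
    points, matched = score_fund(fund, CONTEXT)
    print(name, points, tier(points), matched)
# Heavybit: 40 (two tags) + 20 (stage) + 20 (geography) = 80 -> High
# Blume Ventures: 0 (tags) + 20 (stage) + 0 (geography), then both penalties floor at 0 -> Low
```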
|
|
237
394
|
|
|
238
|
-
|
|
239
|
-
|
|
240
|
-
|
|
241
|
-
|
|
242
|
-
|
|
243
|
-
|
|
244
|
-
|
|
245
|
-
|
|
246
|
-
|
|
247
|
-
|
|
248
|
-
|
|
249
|
-
|
|
250
|
-
|
|
251
|
-
print('
|
|
252
|
-
print('Industry:', analysis['industry_taxonomy']['l1'], '>', analysis['industry_taxonomy']['l2'], '>', analysis['industry_taxonomy']['l3'])
|
|
253
|
-
print('Stage:', analysis['detected_stage'], '(' + analysis['stage_confidence'] + ' confidence)')
|
|
254
|
-
print('Comparables:', ', '.join(c['name'] for c in analysis['comparable_companies']))
|
|
395
|
+
---
|
|
396
|
+
|
|
397
|
+
## Step 6: Discover Comparable Companies via Tavily
|
|
398
|
+
|
|
399
|
+
Load curated portfolio companies from Step 5b as seed comparables:
|
|
400
|
+
|
|
401
|
+
```bash
|
|
402
|
+
python3 -c "
|
|
403
|
+
import json
|
|
404
|
+
matches = json.load(open('/tmp/vc-curated-matches.json'))
|
|
405
|
+
curated = matches.get('curated_comparables', [])
|
|
406
|
+
print(f'Curated portfolio comparables ({len(curated)}): {curated}')
|
|
407
|
+
need = max(0, 5 - len(curated))
|
|
408
|
+
print(f'Tavily will supplement with up to {need} more')
|
|
255
409
|
"
|
|
256
410
|
```
|
|
257
411
|
|
|
258
|
-
**
|
|
412
|
+
**Do not use AI training knowledge to generate comparable companies.** Curated portfolio companies (above) are already zero-hallucination comparables from verified fund data. Tavily supplements with L3-niche-specific companies.
|
|
413
|
+
|
|
414
|
+
```bash
|
|
415
|
+
python3 << 'PYEOF'
|
|
416
|
+
import json, os, urllib.request
|
|
417
|
+
|
|
418
|
+
analysis = json.load(open('/tmp/vc-product-analysis.json'))
|
|
419
|
+
l2 = analysis['industry_taxonomy']['l2']
|
|
420
|
+
l3 = analysis['industry_taxonomy']['l3']
|
|
421
|
+
tavily_key = os.environ.get('TAVILY_API_KEY', '')
|
|
422
|
+
|
|
423
|
+
queries = [
|
|
424
|
+
f'"{l3}" startup raised funding venture capital seed series',
|
|
425
|
+
f'"{l2}" companies venture backed funded startup'
|
|
426
|
+
]
|
|
427
|
+
|
|
428
|
+
all_results = []
|
|
429
|
+
for query in queries:
|
|
430
|
+
payload = json.dumps({
|
|
431
|
+
"api_key": tavily_key,
|
|
432
|
+
"query": query,
|
|
433
|
+
"search_depth": "advanced",
|
|
434
|
+
"max_results": 8,
|
|
435
|
+
"include_answer": True
|
|
436
|
+
}).encode()
|
|
437
|
+
|
|
438
|
+
req = urllib.request.Request(
|
|
439
|
+
'https://api.tavily.com/search',
|
|
440
|
+
data=payload,
|
|
441
|
+
headers={'Content-Type': 'application/json'},
|
|
442
|
+
method='POST'
|
|
443
|
+
)
|
|
444
|
+
|
|
445
|
+
try:
|
|
446
|
+
with urllib.request.urlopen(req, timeout=30) as resp:
|
|
447
|
+
result = json.loads(resp.read())
|
|
448
|
+
all_results.append({
|
|
449
|
+
'query': query,
|
|
450
|
+
'answer': result.get('answer', ''),
|
|
451
|
+
'results': [
|
|
452
|
+
{'title': r.get('title',''), 'url': r.get('url',''), 'content': r.get('content','')[:500]}
|
|
453
|
+
for r in result.get('results', [])
|
|
454
|
+
]
|
|
455
|
+
})
|
|
456
|
+
print(f'Comparable search: {len(result.get("results", []))} results for "{query[:60]}"')
|
|
457
|
+
except Exception as e:
|
|
458
|
+
print(f'Comparable search FAILED: {e}')
|
|
459
|
+
all_results.append({'query': query, 'answer': '', 'results': [], 'error': str(e)})
|
|
460
|
+
|
|
461
|
+
json.dump(all_results, open('/tmp/vc-comparable-search.json', 'w'), indent=2)
|
|
462
|
+
PYEOF
|
|
463
|
+
```
|
|
464
|
+
|
|
465
|
+
Print results for AI selection:
|
|
466
|
+
|
|
467
|
+
```bash
|
|
468
|
+
python3 -c "
|
|
469
|
+
import json
|
|
470
|
+
results = json.load(open('/tmp/vc-comparable-search.json'))
|
|
471
|
+
for r in results:
|
|
472
|
+
print(f'Query: {r[\"query\"]}')
|
|
473
|
+
print(f'Answer: {r.get(\"answer\",\"\")[:400]}')
|
|
474
|
+
for item in r.get('results', []):
|
|
475
|
+
print(f' - {item[\"title\"]} | {item[\"url\"]}')
|
|
476
|
+
print(f' {item[\"content\"][:200]}')
|
|
477
|
+
print()
|
|
478
|
+
"
|
|
479
|
+
```
|
|
480
|
+
|
|
481
|
+
**AI instructions:** Combine the curated portfolio companies from `/tmp/vc-curated-matches.json` with the Tavily search results above. Pick exactly 5 comparable companies. Prioritize curated portfolio companies (already verified -- they are real portfolio companies of matched VC funds). Supplement with Tavily-discovered companies to reach 5 if needed.
+
+For each comparable write:
+- `name`: company name
+- `similarity_reason`: one sentence explaining the fit (for curated: reference the fund that backed them; for Tavily: cite the snippet)
+- `source_url`: portfolio fund website for curated companies, Tavily result URL for discovered ones
+- `estimated_stage`: from curated data or snippet text -- write "not in search data" if unknown
+- `source_type`: `"curated_portfolio"` or `"tavily_discovered"`
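A small check like the following could confirm that each comparable carries the five fields listed above before it is written to disk. It is only a sketch: `comparable_problems` and the two constants are hypothetical names, and treating every field as a non-empty string is an assumption drawn from the field descriptions.

```python
REQUIRED_KEYS = ("name", "similarity_reason", "source_url", "estimated_stage", "source_type")
ALLOWED_SOURCE_TYPES = {"curated_portfolio", "tavily_discovered"}


def comparable_problems(entry):
    """Return a list of human-readable problems for one comparable entry."""
    problems = []
    for key in REQUIRED_KEYS:
        value = entry.get(key)
        # Assumption: every field is expected to be a non-empty string.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {key}")
    if entry.get("source_type") not in ALLOWED_SOURCE_TYPES:
        problems.append(f"unexpected source_type: {entry.get('source_type')!r}")
    return problems
```

An empty list means the entry is structurally complete; it says nothing about whether the values are actually backed by the curated data or the snippets.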
+
+Update `/tmp/vc-product-analysis.json` with the `comparable_companies` array:
+
+```bash
+python3 << 'PYEOF'
+import json
+
+analysis = json.load(open('/tmp/vc-product-analysis.json'))
+
+analysis['comparable_companies'] = [
+    # FILL 5 companies -- curated_portfolio first, then tavily_discovered
+    # Each: {"name": str, "similarity_reason": str, "source_url": str, "estimated_stage": str, "source_type": str}
+]
+
+json.dump(analysis, open('/tmp/vc-product-analysis.json', 'w'), indent=2)
+print('Comparables written:', ', '.join(c['name'] for c in analysis['comparable_companies']))
+PYEOF
+```
+
+**If fewer than 3 comparable companies appear in the search results:** Broaden the queries. Run a third search: `"[l1] startup" funding round venture capital`. If still thin, proceed with what is available and flag in `data_quality_flags`.
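The fallback above might look like this in code. Both function names are hypothetical, and the count of identified comparables is assumed to be supplied by whoever reviewed the search output; only the query template itself comes from the text.

```python
def needs_broader_search(candidate_count, minimum=3):
    """True when fewer than `minimum` comparable companies were identified."""
    return candidate_count < minimum


def fallback_query(l1):
    """Build the third, broader query from the L1 industry label."""
    return f'"{l1} startup" funding round venture capital'
```

If the broader query still yields too few candidates, the text calls for continuing with what is available and recording the gap in `data_quality_flags` rather than padding the list.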
 
 ---
 
-## Step
+## Step 7: Track A -- Who Invested in Comparable Companies
 
-Run 5 Tavily searches, one per comparable.
+Run 5 Tavily searches, one per comparable.
 
 ```bash
 python3 << 'PYEOF'
@@ -318,15 +568,13 @@ print(f'Track A complete. Comparables with results: {sum(1 for r in all_track_a
 PYEOF
 ```
 
-**If all 5 Track A searches return 0 results:**
-
-If the retry also returns 0 results: proceed to Track B only, and flag this in `data_quality_flags`.
+**If all 5 Track A searches return 0 results:** Re-run Step 6 with broader queries. Retry with well-covered companies (those with significant press coverage). If still 0: proceed to Track B only and flag in `data_quality_flags`.
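The retry rule can be read as a three-way decision. The sketch below assumes each Track A entry is a dict with a `results` list, mirroring the shape used for the other search result files; `next_action` and its return labels are hypothetical.

```python
def next_action(track_results, already_retried):
    """Decide what to do after a Track A pass.

    Returns "continue" if any search produced results, "retry_step_6" on
    the first all-empty pass, and "track_b_only" if the retry was also empty.
    """
    has_any = any(item.get("results") for item in track_results)
    if has_any:
        return "continue"
    return "track_b_only" if already_retried else "retry_step_6"
```

On the `track_b_only` outcome, the caller would also add a note to `data_quality_flags`, as the text requires.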
 
 ---
 
-## Step
+## Step 8: Track B -- VCs With Investment Theses About This Space
 
-Run 3 Tavily searches using
+Run 3 Tavily searches using L2 and L3 taxonomy from Step 5.
 
 ```bash
 python3 << 'PYEOF'
@@ -339,18 +587,9 @@ stage = analysis['detected_stage']
 tavily_key = os.environ.get('TAVILY_API_KEY', '')
 
 queries = [
-    {
-
-
-    },
-    {
-        'name': 'thesis_l2',
-        'query': f'VC fund "{l2}" investment thesis portfolio companies'
-    },
-    {
-        'name': 'stage_space',
-        'query': f'{stage} investors "{l3}" startup venture capital fund'
-    }
+    {'name': 'thesis_l3', 'query': f'venture capital investment thesis "{l3}" investing 2023 OR 2024 OR 2025'},
+    {'name': 'thesis_l2', 'query': f'VC fund "{l2}" investment thesis portfolio companies'},
+    {'name': 'stage_space', 'query': f'{stage} investors "{l3}" startup venture capital fund'}
 ]
 
 all_track_b = []
@@ -383,33 +622,29 @@ for q in queries:
         print(f"Track B - {q['name']}: {len(result.get('results', []))} results")
     except Exception as e:
         print(f"Track B - {q['name']}: FAILED ({e})")
-        all_track_b.append({
-            'query_name': q['name'],
-            'query': q['query'],
-            'answer': '',
-            'results': [],
-            'error': str(e)
-        })
+        all_track_b.append({'query_name': q['name'], 'query': q['query'], 'answer': '', 'results': [], 'error': str(e)})
 
 json.dump(all_track_b, open('/tmp/vc-trackb-results.json', 'w'), indent=2)
 PYEOF
 ```
 
-**If all 3 Track B searches return 0 results:** Proceed with Track A results only. Note in `data_quality_flags`: "No thesis-led investors found via public search.
+**If all 3 Track B searches return 0 results:** Proceed with Track A results only. Note in `data_quality_flags`: "No thesis-led investors found via public search."
 
 ---
 
-## Step
+## Step 9: Synthesize -- Rank and Score All VCs
+
+Print the research data:
 
 ```bash
-python3
+python3 -c "
 import json
 
 analysis = json.load(open('/tmp/vc-product-analysis.json'))
 track_a = json.load(open('/tmp/vc-tracka-results.json'))
 track_b = json.load(open('/tmp/vc-trackb-results.json'))
+curated = json.load(open('/tmp/vc-curated-matches.json'))
 
-# Compress results to stay within token limits
 track_a_summary = []
 for item in track_a:
     snippets = [{'title': r.get('title',''), 'url': r.get('url',''), 'content': r.get('content','')[:400]}
@@ -431,7 +666,22 @@ for item in track_b:
         'top_results': snippets
     })
 
-
+curated_summary = []
+for m in curated.get('high_medium_matches', []):
+    curated_summary.append({
+        'fund_name': m['fund_name'],
+        'confidence': m['confidence'],
+        'score': m['score'],
+        'matched_tags': m['matched_tags'],
+        'thesis': m['thesis'],
+        'check_size': m['check_size'],
+        'stage_focus': m['stage_focus'],
+        'notable_portfolio': m['notable_portfolio'],
+        'website': m['website'],
+        'source': 'verified (fund website)'
+    })
+
+print(json.dumps({
     'product': {
         'name': analysis['product_name'],
         'description': analysis['one_line_description'],
@@ -441,83 +691,58 @@ context = {
         'stage_confidence': analysis['stage_confidence'],
         'geography': analysis['geography_bias']
     },
+    'curated_matches': curated_summary,
     'track_a_research': track_a_summary,
     'track_b_research': track_b_summary
-}
-
-
-    "system_instruction": {
-        "parts": [{
-            "text": """You are a venture capital research analyst. Synthesize investor research into a sourced, ranked list. Follow these rules exactly:
-1. Only include VCs whose names appear in the provided Tavily search results. Do not add VCs not mentioned in the data.
-2. Every Track A VC must have evidence_company: the specific comparable company they backed (required -- omit the VC if you cannot confirm this).
-3. Every Track B VC must have thesis_source_title: the exact article or page title where they stated their thesis (required -- omit the VC if you cannot confirm this).
-4. stage_fit_score 1-10: penalize 3 points if the VC's typical stage does not match the product's detected stage.
-5. space_fit_score 1-10: only give 9-10 if the VC backed 2+ companies in this specific L3 niche.
-6. check_size: use ranges from search result data only. If not found, write "not in search data".
-7. approach_method: one of -- cold email, warm intro required, AngelList, application form, Twitter/X DM. Infer from what is publicly known about this fund's intake process.
-8. outreach_hook: must reference this specific product's differentiator and a named VC portfolio signal or thesis quote. Generic hooks like 'highlight your traction' are not acceptable.
-9. No em dashes anywhere in output.
-10. No marketing language."""
-        }]
-    },
-    "contents": [{
-        "parts": [{
-            "text": f"""Synthesize this VC research for the product below. Return a JSON object with exactly these keys:
-
-1. product_summary: object with name, one_line_description, industry_l1, industry_l2, industry_l3, detected_stage, comparable_companies_used (array of names)
-
-2. track_a_vcs: array of VC objects from Track A research. Each object:
-- fund_name, evidence_company (REQUIRED), evidence_source_url, stage_focus, check_size, thesis_summary (1-2 sentences), stage_fit_score (1-10), space_fit_score (1-10), approach_method
-
-3. track_b_vcs: array of VC objects from Track B research. Each object:
-- fund_name, thesis_source_title (REQUIRED), thesis_source_url, stage_focus, check_size, thesis_summary (1-2 sentences), stage_fit_score (1-10), space_fit_score (1-10), approach_method
-
-4. top_5_deep_dives: array of exactly 5 objects (the 5 highest combined score VCs across both tracks). Each:
-- fund_name, track ("A" or "B"), fund_overview (2-3 sentences), why_fit (2-3 sentences specific to this product's L3 niche), portfolio_in_space (array of 1-3 names from search data only), how_to_approach (specific steps, min 30 chars), outreach_hook (2-3 sentences, product-specific)
+}, indent=2))
+"
+```
 
-
-
+**AI instructions -- zero-hallucination rules:**
+
+Every field in the output must be traceable to the printed data above. Rules:
+
+1. **curated_vcs:** Use the `curated_matches` data directly. These are pre-verified -- no Tavily evidence required. `fund_overview` comes from the `thesis` field in the curated data. `check_size` and `stage_focus` come from the curated data fields. Do NOT fill from training knowledge even for these funds.
+2. **VC names (Track A / B):** Only include a fund if its name appears verbatim in the snippet text or title. No exceptions.
+3. **evidence_company (Track A):** The comparable company they backed -- must be stated in the snippet text, not inferred.
+4. **thesis_source_title (Track B):** The exact title of the article or post as it appears in the search results.
+5. **fund_overview (Track A / B):** Extract from snippet text only. Max 2 sentences. If the snippets do not describe the fund, write "not found in search data".
+6. **thesis_summary:** Close paraphrase of the snippet text. Do not add context from training knowledge.
+7. **check_size (Track A / B):** From snippet data only. Write "not in search data" if not mentioned.
+8. **portfolio_in_space:** Only companies that appear in the search snippets. Write "not found in search data" if none.
+9. **stage_fit_score 1-10:** Penalize 3 points if the VC's stated stage does not match the product's detected stage.
+10. **space_fit_score 1-10:** 9-10 only if the VC backed 2+ companies in the L3 niche per the snippets or curated data.
+11. **approach_method:** one of -- cold email / warm intro required / AngelList / application form / Twitter/X DM. Infer from snippets or fund website.
+12. **outreach_hook:** Must name a specific portfolio signal or thesis quote. Generic hooks like "highlight your traction" are not acceptable.
+13. No em dashes. No marketing language.
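Two of the rules above are mechanical enough to sketch in code: the verbatim-name requirement (rule 2) and the stage-mismatch penalty (rule 9). The helpers below are hypothetical and make simplifying assumptions that the rules themselves do not state: snippets are dicts with `title` and `content` keys, a stage "match" is a case-insensitive string comparison, and the penalized score is clamped to the 1-10 range.

```python
def name_appears_verbatim(fund_name, snippets):
    """Rule 2: the fund name must occur verbatim in a snippet title or content."""
    return any(
        # Join with a newline so a name cannot match across the two fields.
        fund_name in (s.get("title", "") + "\n" + s.get("content", ""))
        for s in snippets
    )


def stage_fit_score(base_score, vc_stage, product_stage):
    """Rule 9: subtract 3 points on a stage mismatch, clamped to 1-10 (assumption)."""
    score = base_score
    if vc_stage.strip().lower() != product_stage.strip().lower():
        score -= 3
    return max(1, min(10, score))
```

A fund whose stated focus spans several stages would need a looser comparison than string equality; that judgment is left to the reviewer here.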
+
+Write to `/tmp/vc-final-list.json`:
+
+- `product_summary`: name, one_line_description, industry_l1, industry_l2, industry_l3, detected_stage, comparable_companies_used (names only)
+- `curated_vcs`: fund_name, confidence ("High"/"Medium"), matched_tags, fund_overview (from thesis field), check_size, stage_focus, website, source ("verified (fund website)"), stage_fit_score, space_fit_score
+- `track_a_vcs`: fund_name, evidence_company (REQUIRED), evidence_source_url, stage_focus, check_size, fund_overview, thesis_summary, stage_fit_score, space_fit_score, approach_method
+- `track_b_vcs`: fund_name, thesis_source_title (REQUIRED), thesis_source_url, stage_focus, check_size, fund_overview, thesis_summary, stage_fit_score, space_fit_score, approach_method
+- `top_5_deep_dives`: fund_name, track ("Curated"/"A"/"B"), fund_overview, why_fit, portfolio_in_space, how_to_approach (min 30 chars), outreach_hook
+- `outreach_hooks`: 3 objects -- hook_type, hook_text (2-3 sentences), best_for
+- `data_quality_flags`: gaps, missing fields, low-confidence areas
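Before writing the file, a quick structural check could confirm that all seven top-level sections listed above are present. This is a sketch with hypothetical names (`REQUIRED_SECTIONS`, `missing_sections`); it checks only for the presence of keys, not the per-field rules.

```python
REQUIRED_SECTIONS = (
    "product_summary",
    "curated_vcs",
    "track_a_vcs",
    "track_b_vcs",
    "top_5_deep_dives",
    "outreach_hooks",
    "data_quality_flags",
)


def missing_sections(result):
    """Return the required top-level keys that are absent from `result`."""
    return [key for key in REQUIRED_SECTIONS if key not in result]
```

An empty return value means the skeleton is complete; any names returned are candidates for a note in `data_quality_flags`.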
 
-
+```bash
+python3 << 'PYEOF'
+import json
 
-
-
-
-    }],
-    "generationConfig": {
-        "temperature": 0.3,
-        "maxOutputTokens": 6000
-    }
+result = {
+    # FILL from synthesis above
+    # Must include: product_summary, curated_vcs, track_a_vcs, track_b_vcs, top_5_deep_dives, outreach_hooks, data_quality_flags
 }
 
-json.dump(request, open('/tmp/vc-synthesis-request.json', 'w'))
-print('Synthesis request prepared.')
-PYEOF
-
-curl -s -X POST \
-"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=$GEMINI_API_KEY" \
--H "Content-Type: application/json" \
--d @/tmp/vc-synthesis-request.json \
-| python3 -c "
-import sys, json
-d = json.load(sys.stdin)
-text = d['candidates'][0]['content']['parts'][0]['text'].strip()
-if text.startswith('\`\`\`'):
-    text = '\n'.join(text.split('\n')[1:-1])
-result = json.loads(text)
 json.dump(result, open('/tmp/vc-final-list.json', 'w'), indent=2)
-print(f'Synthesis
-
+print(f'Synthesis written. Curated: {len(result.get("curated_vcs", []))} VCs. Track A: {len(result.get("track_a_vcs", []))} VCs. Track B: {len(result.get("track_b_vcs", []))} VCs.')
+PYEOF
 ```
 
-**If Gemini returns empty or JSON parsing fails:** Retry once with `maxOutputTokens` reduced to 4000. If retry also fails: present whatever partial JSON was returned, mark missing sections `[INCOMPLETE]`, and tell the user: "Synthesis incomplete. The research data may have been too large. Try running again."
-
 ---
 
-## Step
-
-Run before presenting. Remove non-evidenced VCs structurally.
+## Step 10: Self-QA
 
 ```bash
 python3 << 'PYEOF'
@@ -540,6 +765,18 @@ removed_b = original_b - len(result['track_b_vcs'])
 if removed_b > 0:
     failures.append(f'Removed {removed_b} Track B VC(s) missing thesis_source_title')
 
+# Remove deep dives for VCs that were stripped from all tracks
+valid_funds = (
+    {v['fund_name'] for v in result.get('curated_vcs', [])} |
+    {v['fund_name'] for v in result.get('track_a_vcs', [])} |
+    {v['fund_name'] for v in result.get('track_b_vcs', [])}
+)
+original_dives = len(result.get('top_5_deep_dives', []))
+result['top_5_deep_dives'] = [d for d in result.get('top_5_deep_dives', []) if d.get('fund_name') in valid_funds]
+removed_dives = original_dives - len(result['top_5_deep_dives'])
+if removed_dives > 0:
+    failures.append(f'Removed {removed_dives} deep dive(s) for funds stripped during QA')
+
 # Check top 5 deep dives
 dives = result.get('top_5_deep_dives', [])
 if len(dives) < 5:
@@ -548,25 +785,31 @@ for dd in dives:
     if not dd.get('how_to_approach') or len(dd.get('how_to_approach', '')) < 30:
         dd['how_to_approach'] = 'Approach method not determinable from search data. Check the fund website directly for application instructions.'
         failures.append(f"Fixed: '{dd.get('fund_name')}' had missing how_to_approach")
+    if not dd.get('fund_overview') or dd.get('fund_overview') == '':
+        dd['fund_overview'] = 'not found in search data'
 
 # Check outreach hooks count
 if len(result.get('outreach_hooks', [])) != 3:
     failures.append(f"Expected 3 outreach hooks, got {len(result.get('outreach_hooks', []))}")
 
 # Check for em dashes
-
-
-    result = json.loads(
-    failures.append('Fixed: em dash characters
+full_text = json.dumps(result)
+if '—' in full_text:
+    result = json.loads(full_text.replace('—', '-'))
+    failures.append('Fixed: em dash characters replaced with hyphens')
 
 # Check for forbidden words
 forbidden = ['powerful', 'robust', 'seamless', 'innovative', 'game-changing', 'streamline', 'leverage', 'transform']
-
+full_text_lower = json.dumps(result).lower()
 for word in forbidden:
-    if word in
+    if word in full_text_lower:
         failures.append(f"Warning: forbidden word '{word}' found in output -- review before presenting")
 
-#
+# Flag any "not found in search data" entries so user knows coverage is incomplete
+not_found_count = json.dumps(result).count('not found in search data')
+if not_found_count > 0:
+    failures.append(f'INFO: {not_found_count} field(s) marked "not found in search data" -- verify directly before outreach')
+
 if 'data_quality_flags' not in result:
     result['data_quality_flags'] = []
 result['data_quality_flags'].extend(failures)
@@ -582,7 +825,7 @@ PYEOF
 
 ---
 
-## Step
+## Step 11: Save and Present Output
 
 ```bash
 DATE=$(date +%Y-%m-%d)
@@ -603,13 +846,23 @@ Date: [today] | Stage: [detected_stage] ([stage_confidence] confidence) | Geogra
 What it does: [one_line_description]
 Industry: [l1] > [l2] > [l3]
 Buyer: [buyer_persona] at [company_type], [company_size]
-Comparable companies used
+Comparable companies used: [comma-separated list, noting source_type for each]
+
+---
+
+### Curated Matches (Verified)
+
+*Funds matched from a verified dataset of 25 VC funds sourced from fund websites. Zero hallucination -- details come directly from the dataset.*
+
+| Fund | Confidence | Stage Focus | Check Size | Matched Tags |
+|---|---|---|---|---|
+[one row per curated VC, sorted by confidence then score]
 
 ---
 
 ### Track A: VCs Who Backed Similar Companies
 
-*These investors have already written a check in this space.*
+*These investors have already written a check in this space. Evidence from live Tavily search.*
 
 | Fund | Backed Comparable | Stage Focus | Check Size | Fit Score | Approach |
 |---|---|---|---|---|---|
@@ -629,15 +882,15 @@ Comparable companies used for research: [comma-separated list]
 
 ### Top 5 Deep Dives
 
-#### [N]. [Fund Name] (Track [A/B])
+#### [N]. [Fund Name] (Track [Curated/A/B])
 
-Overview: [fund_overview]
+Overview: [fund_overview -- from dataset or search data only]
 Why it fits: [why_fit]
-Portfolio in this space: [
+Portfolio in this space: [from dataset or search data, or "not found in search data"]
 How to approach: [how_to_approach]
 Outreach hook: "[outreach_hook]"
 
-[repeat for all
+[repeat for all available deep dives]
 
 ---
 
@@ -657,7 +910,7 @@ Saved to: docs/vc-intel/[PRODUCT_SLUG]-[DATE].md
 Clean up temp files:
 
 ```bash
-rm -f /tmp/vc-product-raw.md /tmp/vc-stage-signals.json /tmp/vc-analysis
-/tmp/vc-product-
-/tmp/vc-
+rm -f /tmp/vc-product-raw.md /tmp/vc-stage-signals.json /tmp/vc-product-analysis.json \
+/tmp/vc-product-context.json /tmp/vc-curated-matches.json /tmp/vc-comparable-search.json \
+/tmp/vc-tracka-results.json /tmp/vc-trackb-results.json /tmp/vc-final-list.json
 ```
|