@opendirectory.dev/skills 0.1.33 → 0.1.34

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@opendirectory.dev/skills",
3
- "version": "0.1.33",
3
+ "version": "0.1.34",
4
4
  "main": "dist/index.js",
5
5
  "types": "dist/index.d.ts",
6
6
  "bin": {
package/registry.json CHANGED
@@ -289,6 +289,16 @@
289
289
  "version": "0.0.1",
290
290
  "path": "skills/twitter-GTM-find-skill"
291
291
  },
292
+ {
293
+ "name": "vc-finder",
294
+ "description": "Takes a startup product URL or description, detects the industry and funding stage, identifies 5 comparable funded companies, searches who invested...",
295
+ "tags": [
296
+ "SEO"
297
+ ],
298
+ "author": "opendirectory",
299
+ "version": "0.0.1",
300
+ "path": "skills/vc-finder"
301
+ },
292
302
  {
293
303
  "name": "yc-intent-radar-skill",
294
304
  "description": "Scrape daily job listings from YCombinator's Workatastartup platform without duplicates.",
@@ -0,0 +1,18 @@
1
+ # vc-finder: Environment Variables
2
+ # ===================================
3
+ # Gemini and Tavily are required. Firecrawl is recommended.
4
+
5
+ # Required: Google Gemini API key for product analysis and VC synthesis
6
+ # Get it: aistudio.google.com > Get API key
7
+ GEMINI_API_KEY=your_gemini_api_key_here
8
+
9
+ # Required: Tavily API key for VC investment research (Track A and Track B searches)
10
+ # Get it: app.tavily.com > API Keys
11
+ # Free tier: 1000 credits/month (~125 full runs at 8 searches per run)
12
+ TAVILY_API_KEY=your_tavily_api_key_here
13
+
14
+ # Recommended: Firecrawl API key for fetching JS-rendered product pages
15
+ # Get it: firecrawl.dev > Get API key
16
+ # Free tier: 500 credits/month
17
+ # If not set: Tavily extract is used as fallback (may miss content on React/Next.js sites)
18
+ FIRECRAWL_API_KEY=your_firecrawl_api_key_here
@@ -0,0 +1,113 @@
1
+ # vc-finder
2
+
3
+ Give the skill a product URL or description. It detects the industry and funding stage, identifies 5 comparable funded companies, searches who backed those companies (Track A), finds VCs who publish investment theses about this space (Track B), and returns a ranked sourced investor list with deep-dives and outreach hooks.
4
+
5
+ ## Install
6
+
7
+ ```bash
8
+ npx "@opendirectory.dev/skills" install vc-finder --target claude
9
+ ```
10
+
11
+ ### Video Tutorial
12
+ Watch this quick video to see how it's done:
13
+
14
+ https://github.com/user-attachments/assets/ee98a1b5-ebc4-452f-bbfb-c434f2935067
15
+
16
+ ### Step 1: Download the skill from GitHub
17
+ 1. Click the **Code** button on this repo's GitHub page.
18
+ 2. Select **Download ZIP** to download the repository.
19
+ 3. Extract the ZIP file on your computer.
20
+
21
+ ### Step 2: Install the Skill in Claude
22
+ 1. Open your **Claude desktop app**.
23
+ 2. Go to the sidebar on the left side and click on the **Customize** section.
24
+ 3. Click on the **Skills** tab, then click on the **+** (plus) icon button to create a new skill.
25
+ 4. Choose the option to **Upload a skill**, and drag and drop the `.zip` file (or you can extract it and drop the folder, both work).
26
+
27
+ > **Note:** Make sure you are uploading the folder that contains the `SKILL.md` file!
28
+
29
+ ## What It Does
30
+
31
+ - Fetches the product URL via Firecrawl (handles JS-rendered SPAs) or Tavily extract as fallback
32
+ - Detects funding stage from CTA signals on the page (waitlist, free trial, pricing, sales CTAs)
33
+ - Uses Gemini to map a 3-level industry taxonomy (L1 > L2 > L3) and identify 5 comparable funded companies
34
+ - Track A: 5 Tavily searches to find who invested in each comparable company
35
+ - Track B: 3 Tavily searches to find VCs who publish investment theses about this specific niche
36
+ - Gemini synthesizes and ranks all found VCs by stage fit and space fit (1-10 scores)
37
+ - Produces top 5 deep-dives with fund overview, portfolio evidence, how-to-approach, and outreach hook
38
+ - Generates 3 product-specific outreach hooks (not generic advice)
39
+ - Saves output to `docs/vc-intel/[product]-[date].md`
40
+
41
+ ## Requirements
42
+
43
+ | Requirement | Purpose | How to Set Up |
44
+ |---|---|---|
45
+ | Gemini API key | Product analysis and VC synthesis | aistudio.google.com, Get API key |
46
+ | Tavily API key | VC investment research (Track A and Track B) | app.tavily.com, free tier: 1000 credits/month |
47
+ | Firecrawl API key | Fetching JS-rendered product pages | firecrawl.dev, free tier: 500 credits/month |
48
+
49
+ Gemini and Tavily are required. Firecrawl is recommended -- without it, Tavily extract is used as fallback (may miss JS-rendered content).
50
+
51
+ ## Setup
52
+
53
+ ```bash
54
+ cp .env.example .env
55
+ # Add GEMINI_API_KEY and TAVILY_API_KEY (required)
56
+ # Add FIRECRAWL_API_KEY (recommended)
57
+ ```
58
+
59
+ ## How to Use
60
+
61
+ ```
62
+ "Find VCs for my startup: https://example.com"
63
+ "Who invests in developer tools at seed stage?"
64
+ "Build me a VC target list for https://example.com"
65
+ "Which funds should I pitch? https://example.com"
66
+ "Find investors for my product: [paste description]"
67
+ "Who backed companies like mine? https://example.com"
68
+ ```
69
+
70
+ Or paste a product description directly if the URL is behind a login or returns no readable content.
71
+
72
+ ## Why Two Tracks
73
+
74
+ **Track A (portfolio mapping):** VCs who already wrote a check in your space. These investors have proven they understand the category, the risks, and the buyer. They need less convincing than a generalist fund.
75
+
76
+ **Track B (thesis matching):** VCs who are actively publishing about your space. An investor who wrote a 2,000-word blog post about why they want to invest in CI/CD tooling is actively looking for deals. Your cold email lands in a much warmer inbox.
77
+
78
+ Generic "VCs in B2B SaaS" lists skip both signals. This skill produces only VCs with named evidence for each entry.
79
+
80
+ ## Output
81
+
82
+ Each run produces:
83
+
84
+ 1. **Product analysis**: detected industry taxonomy, stage, ICP, comparable companies used
85
+ 2. **Track A table**: VCs who backed comparable companies (with evidence)
86
+ 3. **Track B table**: VCs with published theses about this space (with source)
87
+ 4. **Top 5 deep-dives**: fund overview, why it fits, portfolio in space, how to approach, outreach hook
88
+ 5. **3 outreach hooks**: product-specific openers for cold outreach
89
+
90
+ ## Cost per Run
91
+
92
+ - Firecrawl: ~$0.001 per fetch
93
+ - Tavily: 8 searches at ~$0.01 each = ~$0.08
94
+ - Gemini: 2 calls at ~$0.015 each = ~$0.03
95
+ - Total: ~$0.11 per run
96
+
97
+ ## Project Structure
98
+
99
+ ```
100
+ vc-finder/
101
+ ├── SKILL.md
102
+ ├── README.md
103
+ ├── .env.example
104
+ ├── evals/
105
+ │ └── evals.json
106
+ └── references/
107
+ ├── stage-signals.md
108
+ └── vc-outreach-guide.md
109
+ ```
110
+
111
+ ## License
112
+
113
+ MIT
@@ -0,0 +1,663 @@
1
+ ---
2
+ name: vc-finder
3
+ description: Takes a startup product URL or description, detects the industry and funding stage, identifies 5 comparable funded companies, searches who invested in those companies (Track A), finds VCs who publish investment theses about this space (Track B), and returns a ranked sourced list of relevant investors with deep-dives and outreach hooks. Use when asked to find investors for a startup, identify which VCs fund products like mine, research who backs companies in my space, build a VC target list, or find investor-market fit. Trigger when a user says "find VCs for my startup", "who invests in my space", "build me a VC list", "which funds should I pitch", "find investors for my product", "who backed companies like mine", or "help me find venture capital".
4
+ compatibility: [claude-code, gemini-cli, github-copilot]
5
+ ---
6
+
7
+ # VC Finder
8
+
9
+ Take a product URL or description. Detect industry and stage. Find 5 comparable funded companies. Run two research tracks: who invested in those comparables (Track A), and which VCs publish theses about this space (Track B). Return a sourced, ranked investor list with outreach hooks.
10
+
11
+ ---
12
+
13
+ **Critical rule:** Every VC in Track A must include the specific comparable company they backed as evidence. Every VC in Track B must include the exact article or post title where they stated their thesis. If a VC name did not appear in Tavily search results, do not include them. No hallucinated fund names.
14
+
15
+ ---
16
+
17
+ ## Common Mistakes
18
+
19
+ | The agent will want to... | Why that's wrong |
20
+ |---|---|
21
+ | Add a16z or Sequoia because they are famous | A famous VC without evidence is noise. Only include VCs that appear in Tavily search results for this specific product. Name-dropping wastes the founder's time. |
22
+ | Continue when all 5 Track A searches return 0 results | Zero Track A results means the comparables were wrong or too obscure. Stop, regenerate comparables with broader known names, and retry. Continuing produces an evidence-free list. |
23
+ | Include a Track B VC without citing the article or post | Thesis without a source is indistinguishable from hallucination. The founder cannot verify it and the list loses all credibility. |
24
+ | Detect stage from website aesthetics ("site looks polished") | Stage must come from the specific CTA signals detected in Step 4. Aesthetic guessing sends founders to wrong-stage investors. |
25
+ | Write generic outreach hooks like "highlight your traction" | Every outreach hook must name this specific product's differentiator and a specific VC portfolio signal. Generic hooks are removed by the QA step. |
26
+ | Skip the URL fetch when the user also provides a description | Always fetch the URL. The live page often reveals stage signals (pricing CTAs, customer logos, job openings) that the user's description omits. |
27
+
28
+ ---
29
+
30
+ ## Step 1: Setup Check
31
+
32
+ ```bash
33
+ echo "GEMINI_API_KEY: ${GEMINI_API_KEY:+set}"
34
+ echo "TAVILY_API_KEY: ${TAVILY_API_KEY:+set}"
35
+ FC_STATUS="${FIRECRAWL_API_KEY:+set}"; echo "FIRECRAWL_API_KEY: ${FC_STATUS:-not set, Tavily extract will be used as fallback}"
36
+ ```
37
+
38
+ **If GEMINI_API_KEY is missing:** Stop. Tell the user: "GEMINI_API_KEY is required for product analysis and VC synthesis. Get it at aistudio.google.com. Add it to your .env file."
39
+
40
+ **If TAVILY_API_KEY is missing:** Stop. Tell the user: "TAVILY_API_KEY is required to research VC investments and theses. There is no fallback for this. Get it at app.tavily.com. Free tier: 1000 credits/month (about 125 full runs). Add it to your .env file."
41
+
42
+ **If only FIRECRAWL_API_KEY is missing:** Continue silently. Tavily extract will be used for the URL fetch.
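The three rules above can be sketched as one check that reports which keys are missing without printing their values. This is an illustration only, not part of the skill: the function name `check_keys` and the shape of its return value are assumptions made for the example.

```python
import os

REQUIRED = ("GEMINI_API_KEY", "TAVILY_API_KEY")
OPTIONAL = ("FIRECRAWL_API_KEY",)


def check_keys(env):
    """Return (missing_required, missing_optional) without exposing key values."""
    missing_required = [k for k in REQUIRED if not env.get(k)]
    missing_optional = [k for k in OPTIONAL if not env.get(k)]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = check_keys(os.environ)
    if required:
        # Either required key missing means the run stops.
        print("Stop: missing " + ", ".join(required))
    elif optional:
        # Only Firecrawl missing: continue with the Tavily extract fallback.
        print("Continue: Tavily extract will be used for the URL fetch")
    else:
        print("All keys set")
```

Passing the environment in as a mapping keeps the check easy to exercise with a plain dict.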
43
+
44
+ ---
45
+
46
+ ## Step 2: Gather Input
47
+
48
+ You need:
49
+ - Product URL (required, unless user pastes a product description directly)
50
+ - Optional: target stage hint (pre-seed, seed, series-a, series-b) -- if provided, use it and skip stage detection
51
+ - Optional: geography preference (US, Europe, global) -- defaults to US if not specified
52
+
53
+ **If the user provides only a pasted description (no URL):** Skip Steps 3-4. Go directly to Step 5 with the pasted text as `product_content`. Set `stage_source` to `user_description`.
54
+
55
+ **If neither URL nor description is provided:** Ask: "What is the URL of your product or startup? Or paste a short description: what it does, who it is for, and what stage you are at (pre-seed, seed, Series A)."
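For the description-only path, note that the Step 5 script reads its input from `/tmp/vc-product-raw.md` and `/tmp/vc-stage-signals.json`. One way to satisfy that, sketched here under assumptions, is to write the pasted text and a neutral signals file to those paths first. The helper name `stage_description_input` and the placement of `stage_source` inside the signals file are hypothetical; the skill does not specify where that field lives.

```python
import json
from pathlib import Path


def stage_description_input(description,
                            raw_path="/tmp/vc-product-raw.md",
                            signals_path="/tmp/vc-stage-signals.json"):
    """Write a pasted description to the two files that Step 5 reads."""
    Path(raw_path).write_text(description, encoding="utf-8")
    # No page was fetched, so there are no CTA signals to report.
    signals = {
        "signals": [],
        "dominant_stage": "unknown",
        "confidence": "low",
        "stage_source": "user_description",  # assumed placement of this field
    }
    Path(signals_path).write_text(json.dumps(signals, indent=2), encoding="utf-8")
    return signals
```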
56
+
57
+ Derive product slug from URL for the output filename:
58
+
59
+ ```bash
60
+ PRODUCT_SLUG=$(python3 -c "
61
+ from urllib.parse import urlparse
62
+ url = 'URL_HERE'
63
+ host = urlparse(url).netloc.replace('www.', '')
64
+ print(host.split('.')[0])
65
+ ")
66
+ ```
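The snippet above assumes the URL carries a scheme and that `www.` only ever appears as a prefix. A slightly more defensive variant might look like the sketch below. The function name `product_slug` is an assumption for illustration, not something the skill defines.

```python
from urllib.parse import urlparse


def product_slug(url):
    """Derive a short slug from a URL's hostname, e.g. for an output filename."""
    # urlparse only fills in the host when a scheme or '//' is present,
    # so add one for inputs such as 'example.com/pricing'.
    if "//" not in url:
        url = "https://" + url
    host = (urlparse(url).hostname or "").lower()
    # Strip only a leading 'www.', not every occurrence of the substring.
    if host.startswith("www."):
        host = host[4:]
    return host.split(".")[0]
```

As in the original, a subdomain such as `app.example.com` still yields `app`, so the slug is a filename convenience rather than a reliable product name.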
67
+
68
+ ---
69
+
70
+ ## Step 3: Fetch Product Page
71
+
72
+ **Primary: Firecrawl (if FIRECRAWL_API_KEY is set)**
73
+
74
+ ```bash
75
+ curl -s -X POST https://api.firecrawl.dev/v1/scrape \
76
+ -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
77
+ -H "Content-Type: application/json" \
78
+ -d '{"url": "URL_HERE", "formats": ["markdown"], "onlyMainContent": true}' \
79
+ | python3 -c "
80
+ import sys, json
81
+ d = json.load(sys.stdin)
82
+ content = d.get('data', {}).get('markdown', '') or d.get('markdown', '')
83
+ print(f'Fetched: {len(content)} characters')
84
+ open('/tmp/vc-product-raw.md', 'w').write(content)
85
+ "
86
+ ```
87
+
88
+ **Fallback: Tavily extract (if FIRECRAWL_API_KEY is not set)**
89
+
90
+ ```bash
91
+ curl -s -X POST https://api.tavily.com/extract \
92
+ -H "Content-Type: application/json" \
93
+ -d "{\"api_key\": \"$TAVILY_API_KEY\", \"urls\": [\"URL_HERE\"]}" \
94
+ | python3 -c "
95
+ import sys, json
96
+ d = json.load(sys.stdin)
97
+ content = (d.get('results') or [{}])[0].get('raw_content') or ''
98
+ print(f'Fetched via Tavily extract: {len(content)} characters')
99
+ open('/tmp/vc-product-raw.md', 'w').write(content)
100
+ "
101
+ ```
102
+
103
+ **Step-level checkpoint:**
104
+
105
+ ```bash
106
+ python3 -c "
107
+ content = open('/tmp/vc-product-raw.md').read()
108
+ if len(content) < 200:
109
+ print('ERROR: Page returned fewer than 200 characters.')
110
+ else:
111
+ print(f'Content OK: {len(content)} characters')
112
+ "
113
+ ```
114
+
115
+ **If content < 200 characters:** Stop fetching. Tell the user: "The product page returned no readable content. This usually means the site is JavaScript-rendered and requires a browser. Please paste your product description directly: what it does, who it is for, and what stage you are at."
116
+
117
+ Proceed to Step 5 using the pasted description as `product_content`.
118
+
119
+ ---
120
+
121
+ ## Step 4: Detect Stage Signals Locally (No API)
122
+
123
+ Parse the fetched markdown with regex before any API call. This gives Gemini anchored evidence rather than asking it to guess from aesthetics.
124
+
125
+ ```bash
126
+ python3 << 'PYEOF'
127
+ import re, json
128
+
129
+ content = open('/tmp/vc-product-raw.md').read().lower()
130
+ stage_signals = []
131
+
132
+ # Pre-seed signals
133
+ if re.search(r'join\s+(the\s+)?waitlist|sign\s+up\s+for\s+beta|early\s+access|request\s+(an?\s+)?invite|get\s+notified', content):
134
+ stage_signals.append({'signal': 'waitlist or beta CTA', 'stage_hint': 'pre-seed'})
135
+
136
+ # Seed signals
137
+ if re.search(r'start\s+(your\s+)?free\s+trial|try\s+(it\s+)?for\s+free|request\s+(a\s+)?demo|book\s+(a\s+)?demo|schedule\s+(a\s+)?demo', content):
138
+ stage_signals.append({'signal': 'free trial or demo CTA', 'stage_hint': 'seed'})
139
+
140
+ # Series A signals
141
+ if re.search(r'contact\s+sales|talk\s+to\s+(our\s+)?sales|see\s+pricing|view\s+pricing|plans\s+and\s+pricing', content):
142
+ stage_signals.append({'signal': 'pricing or sales CTA', 'stage_hint': 'series-a'})
143
+ if re.search(r'case\s+stud(y|ies)|customer\s+stor(y|ies)|trusted\s+by\s+[\d,]+|used\s+by\s+[\d,]+', content):
144
+ stage_signals.append({'signal': 'case studies or customer count', 'stage_hint': 'series-a'})
145
+
146
+ # Series A/B signals
147
+ if re.search(r'enterprise\s+(plan|pricing|tier)|we.?re\s+hiring|join\s+our\s+team|open\s+positions', content):
148
+ stage_signals.append({'signal': 'enterprise tier or job openings', 'stage_hint': 'series-a-or-b'})
149
+
150
+ # Funding announcement -- extract directly if present
151
+ funding_match = re.search(
152
+ r'raised\s+\$[\d,.]+\s*[mk]?|series\s+[abc]\s+round|seed\s+round|(\$[\d,.]+\s*[mk]?\s+(?:seed|series\s+[abc]))',
153
+ content
154
+ )
155
+ if funding_match:
156
+ stage_signals.append({'signal': f'funding text: {funding_match.group(0).strip()}', 'stage_hint': 'announced'})
157
+
158
+ # Determine dominant stage
159
+ if not stage_signals:
160
+ dominant = 'unknown'
161
+ elif any(s['stage_hint'] == 'announced' for s in stage_signals):
162
+ dominant = 'announced'
163
+ elif any(s['stage_hint'] == 'series-a-or-b' for s in stage_signals):
164
+ dominant = 'series-a'
165
+ elif any(s['stage_hint'] == 'series-a' for s in stage_signals):
166
+ dominant = 'series-a'
167
+ elif any(s['stage_hint'] == 'seed' for s in stage_signals):
168
+ dominant = 'seed'
169
+ else:
170
+ dominant = 'pre-seed'
171
+
172
+ confidence = 'high' if len(stage_signals) >= 2 else ('medium' if len(stage_signals) == 1 else 'low')
173
+
174
+ result = {'signals': stage_signals, 'dominant_stage': dominant, 'confidence': confidence}
175
+ json.dump(result, open('/tmp/vc-stage-signals.json', 'w'), indent=2)
176
+ print(f'Stage: {dominant} ({confidence} confidence) from {len(stage_signals)} signal(s)')
177
+ for s in stage_signals:
178
+ print(f' - {s["signal"]} -> {s["stage_hint"]}')
179
+ PYEOF
180
+ ```
181
+
182
+ ---
183
+
184
+ ## Step 5: Product Analysis with Gemini
185
+
186
+ ```bash
187
+ python3 << 'PYEOF'
188
+ import json
189
+
190
+ product_content = open('/tmp/vc-product-raw.md').read()[:6000]
191
+ stage_signals = json.load(open('/tmp/vc-stage-signals.json'))
192
+
193
+ request = {
194
+ "system_instruction": {
195
+ "parts": [{
196
+ "text": "You are a venture capital analyst. Analyze a product page and return structured JSON only. No commentary. No em dashes. Vague category labels like 'technology' or 'software' alone are not acceptable at L2 or L3 -- be specific. Comparable companies must be real funded companies with public funding records, well-known enough to appear in press coverage."
197
+ }]
198
+ },
199
+ "contents": [{
200
+ "parts": [{
201
+ "text": f"""Analyze this product page and return a JSON object with exactly these keys:
202
+
203
+ 1. product_name: string
204
+ 2. one_line_description: string -- what it does, for whom, core value prop. Under 20 words. No marketing language.
205
+ 3. industry_taxonomy: object with:
206
+ - l1: top-level (e.g. "software", "fintech", "healthtech", "consumer", "hardware")
207
+ - l2: sector (e.g. "developer tools", "sales technology", "edtech", "logistics software")
208
+ - l3: specific niche (e.g. "CI/CD automation", "outbound prospecting", "last-mile routing")
209
+ 4. icp: object with:
210
+ - buyer_persona: job title (e.g. "VP Engineering", "founder", "sales ops manager")
211
+ - company_type: (e.g. "B2B SaaS", "e-commerce brand", "enterprise IT team")
212
+ - company_size: (e.g. "5-50 employees", "50-500 employees", "enterprise")
213
+ 5. detected_stage: one of: pre-seed, seed, series-a, series-b, unknown
214
+ 6. stage_confidence: one of: high, medium, low
215
+ 7. stage_evidence: one sentence citing exactly which CTA or text on the page drove this classification. Write "no clear signals found" if unknown.
216
+ 8. comparable_companies: array of exactly 5 objects, each with:
217
+ - name: real company name (must have public VC funding records)
218
+ - similarity_reason: one sentence why this company is comparable to the product
219
+ - estimated_stage: their funding stage as of your knowledge cutoff
220
+ 9. geography_bias: one of: US, Europe, global, unclear -- infer from page text
221
+
222
+ Stage signals detected from the page (use as input to your stage classification):
223
+ {json.dumps(stage_signals, indent=2)}
224
+
225
+ Product page content:
226
+ {product_content}"""
227
+ }]
228
+ }],
229
+ "generationConfig": {
230
+ "temperature": 0.2,
231
+ "maxOutputTokens": 3000
232
+ }
233
+ }
234
+
235
+ json.dump(request, open('/tmp/vc-analysis-request.json', 'w'))
236
+ PYEOF
237
+
238
+ curl -s -X POST \
239
+ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=$GEMINI_API_KEY" \
240
+ -H "Content-Type: application/json" \
241
+ -d @/tmp/vc-analysis-request.json \
242
+ | python3 -c "
243
+ import sys, json
244
+ d = json.load(sys.stdin)
245
+ text = d['candidates'][0]['content']['parts'][0]['text'].strip()
246
+ if text.startswith('\`\`\`'):
247
+ text = '\n'.join(text.split('\n')[1:-1])
248
+ analysis = json.loads(text)
249
+ json.dump(analysis, open('/tmp/vc-product-analysis.json', 'w'), indent=2)
250
+ print('Product analysis complete.')
251
+ print('Product:', analysis['product_name'])
252
+ print('Industry:', analysis['industry_taxonomy']['l1'], '>', analysis['industry_taxonomy']['l2'], '>', analysis['industry_taxonomy']['l3'])
253
+ print('Stage:', analysis['detected_stage'], '(' + analysis['stage_confidence'] + ' confidence)')
254
+ print('Comparables:', ', '.join(c['name'] for c in analysis['comparable_companies']))
255
+ "
256
+ ```
257
+
258
+ **If Gemini returns empty or JSON parsing fails:** Retry once with `maxOutputTokens` reduced to 2000. If retry also fails: Stop. Tell the user: "Product analysis failed. Please paste a direct description (3-5 sentences: what it does, who it is for, current stage) and run again."
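To make the "empty or JSON parsing fails" condition concrete, the parsing step can be sketched as a helper that returns `None` instead of raising, so the caller can decide whether to retry. The name `parse_model_json` is hypothetical. The fence handling mirrors the inline parser above but only drops a closing fence when one is actually present.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker, built up to avoid a literal fence here


def parse_model_json(text):
    """Parse a model reply as JSON, tolerating a surrounding Markdown code fence.

    Returns the parsed object, or None if the reply is empty or not valid JSON.
    """
    text = (text or "").strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, which may carry a language tag.
        lines = text.split("\n")[1:]
        # Drop the closing fence only if it is really the last line.
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

A `None` result covers both the empty reply and the truncated reply, which are the two cases the retry rule cares about.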
259
+
260
+ ---
261
+
262
+ ## Step 6: Track A -- Who Invested in Comparable Companies
263
+
264
+ Run 5 Tavily searches, one per comparable. Save all results to a single file.
265
+
266
+ ```bash
267
+ python3 << 'PYEOF'
268
+ import json, os, urllib.request
269
+
270
+ analysis = json.load(open('/tmp/vc-product-analysis.json'))
271
+ comparables = analysis['comparable_companies']
272
+ tavily_key = os.environ.get('TAVILY_API_KEY', '')
273
+ all_track_a = []
274
+
275
+ for comp in comparables:
276
+ company = comp['name']
277
+ query = f'"{company}" investors funding venture capital backed seed series'
278
+
279
+ payload = json.dumps({
280
+ "api_key": tavily_key,
281
+ "query": query,
282
+ "search_depth": "advanced",
283
+ "max_results": 5,
284
+ "include_answer": True
285
+ }).encode()
286
+
287
+ req = urllib.request.Request(
288
+ 'https://api.tavily.com/search',
289
+ data=payload,
290
+ headers={'Content-Type': 'application/json'},
291
+ method='POST'
292
+ )
293
+
294
+ try:
295
+ with urllib.request.urlopen(req, timeout=30) as resp:
296
+ result = json.loads(resp.read())
297
+ all_track_a.append({
298
+ 'comparable_company': company,
299
+ 'similarity_reason': comp['similarity_reason'],
300
+ 'query': query,
301
+ 'answer': result.get('answer', ''),
302
+ 'results': result.get('results', [])
303
+ })
304
+ print(f'Track A - {company}: {len(result.get("results", []))} results')
305
+ except Exception as e:
306
+ print(f'Track A - {company}: FAILED ({e})')
307
+ all_track_a.append({
308
+ 'comparable_company': company,
309
+ 'similarity_reason': comp['similarity_reason'],
310
+ 'query': query,
311
+ 'answer': '',
312
+ 'results': [],
313
+ 'error': str(e)
314
+ })
315
+
316
+ json.dump(all_track_a, open('/tmp/vc-tracka-results.json', 'w'), indent=2)
317
+ print(f'Track A complete. Comparables with results: {sum(1 for r in all_track_a if r.get("results"))}')
318
+ PYEOF
319
+ ```
320
+
321
+ **If all 5 Track A searches return 0 results:** Tell the user: "No funding data found for the comparable companies. This usually means the comparables are too early-stage or obscure for public press coverage. I will retry with broader comparable names." Then re-run Step 5 with a note to Gemini to choose "well-funded companies with significant press coverage" and retry Step 6.
322
+
323
+ If the retry also returns 0 results: proceed to Track B only, and flag this in `data_quality_flags`.
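The decision described above can be sketched as a small function over the list saved to `/tmp/vc-tracka-results.json`. The function name and the three return labels are assumptions chosen for the example; only the rule itself comes from the text.

```python
def track_a_next_step(track_a_results, already_retried=False):
    """Decide what to do after the Track A searches.

    track_a_results has one entry per comparable company,
    each carrying a 'results' list (empty on failure).
    """
    with_results = sum(1 for item in track_a_results if item.get("results"))
    if with_results > 0:
        return "continue"
    # Every comparable came back empty: retry once, then fall back to Track B.
    return "track_b_only" if already_retried else "retry_comparables"
```

A failed search is stored with an empty `results` list, so network errors and genuinely empty searches are treated the same way here.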
324
+
325
+ ---
326
+
327
+ ## Step 7: Track B -- VCs With Investment Theses About This Space
328
+
329
+ Run 3 Tavily searches using the L2 and L3 taxonomy from Step 5.
330
+
331
+ ```bash
332
+ python3 << 'PYEOF'
333
+ import json, os, urllib.request
334
+
335
+ analysis = json.load(open('/tmp/vc-product-analysis.json'))
336
+ l2 = analysis['industry_taxonomy']['l2']
337
+ l3 = analysis['industry_taxonomy']['l3']
338
+ stage = analysis['detected_stage']
339
+ tavily_key = os.environ.get('TAVILY_API_KEY', '')
340
+
341
+ queries = [
342
+ {
343
+ 'name': 'thesis_l3',
344
+ 'query': f'venture capital investment thesis "{l3}" investing 2023 OR 2024 OR 2025'
345
+ },
346
+ {
347
+ 'name': 'thesis_l2',
348
+ 'query': f'VC fund "{l2}" investment thesis portfolio companies'
349
+ },
350
+ {
351
+ 'name': 'stage_space',
352
+ 'query': f'{stage} investors "{l3}" startup venture capital fund'
353
+ }
354
+ ]
355
+
356
+ all_track_b = []
357
+
358
+ for q in queries:
359
+ payload = json.dumps({
360
+ "api_key": tavily_key,
361
+ "query": q['query'],
362
+ "search_depth": "advanced",
363
+ "max_results": 7,
364
+ "include_answer": True
365
+ }).encode()
366
+
367
+ req = urllib.request.Request(
368
+ 'https://api.tavily.com/search',
369
+ data=payload,
370
+ headers={'Content-Type': 'application/json'},
371
+ method='POST'
372
+ )
373
+
374
+ try:
375
+ with urllib.request.urlopen(req, timeout=30) as resp:
376
+ result = json.loads(resp.read())
377
+ all_track_b.append({
378
+ 'query_name': q['name'],
379
+ 'query': q['query'],
380
+ 'answer': result.get('answer', ''),
381
+ 'results': result.get('results', [])
382
+ })
383
+ print(f"Track B - {q['name']}: {len(result.get('results', []))} results")
384
+ except Exception as e:
385
+ print(f"Track B - {q['name']}: FAILED ({e})")
386
+ all_track_b.append({
387
+ 'query_name': q['name'],
388
+ 'query': q['query'],
389
+ 'answer': '',
390
+ 'results': [],
391
+ 'error': str(e)
392
+ })
393
+
394
+ json.dump(all_track_b, open('/tmp/vc-trackb-results.json', 'w'), indent=2)
395
+ PYEOF
396
+ ```
397
+
398
+ **If all 3 Track B searches return 0 results:** Proceed with Track A results only. Note in `data_quality_flags`: "No thesis-led investors found via public search. Try checking Substack manually for VC newsletters covering this niche."
399
+
400
+ ---
401
+
402
+ ## Step 8: Gemini Synthesis -- Rank and Score All VCs
403
+
404
+ ```bash
405
+ python3 << 'PYEOF'
406
+ import json
407
+
408
+ analysis = json.load(open('/tmp/vc-product-analysis.json'))
409
+ track_a = json.load(open('/tmp/vc-tracka-results.json'))
410
+ track_b = json.load(open('/tmp/vc-trackb-results.json'))
411
+
412
+ # Compress results to stay within token limits
413
+ track_a_summary = []
414
+ for item in track_a:
415
+ snippets = [{'title': r.get('title',''), 'url': r.get('url',''), 'content': r.get('content','')[:400]}
416
+ for r in item.get('results', [])[:3]]
417
+ track_a_summary.append({
418
+ 'comparable_company': item['comparable_company'],
419
+ 'similarity_reason': item['similarity_reason'],
420
+ 'answer': item.get('answer', '')[:500],
421
+ 'top_results': snippets
422
+ })
423
+
424
+ track_b_summary = []
425
+ for item in track_b:
426
+ snippets = [{'title': r.get('title',''), 'url': r.get('url',''), 'content': r.get('content','')[:400]}
427
+ for r in item.get('results', [])[:4]]
428
+ track_b_summary.append({
429
+ 'query_name': item['query_name'],
430
+ 'answer': item.get('answer', '')[:500],
431
+ 'top_results': snippets
432
+ })
433
+
434
+ context = {
435
+ 'product': {
436
+ 'name': analysis['product_name'],
437
+ 'description': analysis['one_line_description'],
438
+ 'industry': analysis['industry_taxonomy'],
439
+ 'icp': analysis['icp'],
440
+ 'stage': analysis['detected_stage'],
441
+ 'stage_confidence': analysis['stage_confidence'],
442
+ 'geography': analysis['geography_bias']
443
+ },
444
+ 'track_a_research': track_a_summary,
445
+ 'track_b_research': track_b_summary
446
+ }
447
+
448
+ request = {
449
+ "system_instruction": {
450
+ "parts": [{
451
+ "text": """You are a venture capital research analyst. Synthesize investor research into a sourced, ranked list. Follow these rules exactly:
452
+ 1. Only include VCs whose names appear in the provided Tavily search results. Do not add VCs not mentioned in the data.
453
+ 2. Every Track A VC must have evidence_company: the specific comparable company they backed (required -- omit the VC if you cannot confirm this).
454
+ 3. Every Track B VC must have thesis_source_title: the exact article or page title where they stated their thesis (required -- omit the VC if you cannot confirm this).
455
+ 4. stage_fit_score 1-10: penalize 3 points if the VC's typical stage does not match the product's detected stage.
456
+ 5. space_fit_score 1-10: only give 9-10 if the VC backed 2+ companies in this specific L3 niche.
457
+ 6. check_size: use ranges from search result data only. If not found, write "not in search data".
458
+ 7. approach_method: one of -- cold email, warm intro required, AngelList, application form, Twitter/X DM. Infer from what is publicly known about this fund's intake process.
459
+ 8. outreach_hook: must reference this specific product's differentiator and a named VC portfolio signal or thesis quote. Generic hooks like 'highlight your traction' are not acceptable.
460
+ 9. No em dashes anywhere in output.
461
+ 10. No marketing language."""
462
+ }]
463
+ },
464
+ "contents": [{
465
+ "parts": [{
466
+ "text": f"""Synthesize this VC research for the product below. Return a JSON object with exactly these keys:
467
+
468
+ 1. product_summary: object with name, one_line_description, industry_l1, industry_l2, industry_l3, detected_stage, comparable_companies_used (array of names)
469
+
470
+ 2. track_a_vcs: array of VC objects from Track A research. Each object:
471
+ - fund_name, evidence_company (REQUIRED), evidence_source_url, stage_focus, check_size, thesis_summary (1-2 sentences), stage_fit_score (1-10), space_fit_score (1-10), approach_method
472
+
473
+ 3. track_b_vcs: array of VC objects from Track B research. Each object:
474
+ - fund_name, thesis_source_title (REQUIRED), thesis_source_url, stage_focus, check_size, thesis_summary (1-2 sentences), stage_fit_score (1-10), space_fit_score (1-10), approach_method
475
+
476
+ 4. top_5_deep_dives: array of exactly 5 objects (the 5 highest combined score VCs across both tracks). Each:
477
+ - fund_name, track ("A" or "B"), fund_overview (2-3 sentences), why_fit (2-3 sentences specific to this product's L3 niche), portfolio_in_space (array of 1-3 names from search data only), how_to_approach (specific steps, min 30 chars), outreach_hook (2-3 sentences, product-specific)
478
+
479
+ 5. outreach_hooks: array of exactly 3 objects:
480
+ - hook_type (e.g. "portfolio overlap angle", "thesis language mirror", "comparable exit angle"), hook_text (2-3 sentences a founder would actually send), best_for (which VC type this works for)
481
+
482
+ 6. data_quality_flags: array of strings noting any gaps or low-confidence areas
483
+
484
+ Research data:
485
+ {json.dumps(context, indent=2)}"""
486
+ }]
487
+ }],
488
+ "generationConfig": {
489
+ "temperature": 0.3,
490
+ "maxOutputTokens": 6000
491
+ }
492
+ }
493
+
494
+ json.dump(request, open('/tmp/vc-synthesis-request.json', 'w'))
495
+ print('Synthesis request prepared.')
496
+ PYEOF
497
+
498
+ curl -s -X POST \
499
+ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=$GEMINI_API_KEY" \
500
+ -H "Content-Type: application/json" \
501
+ -d @/tmp/vc-synthesis-request.json \
502
+ | python3 -c "
503
+ import sys, json
504
+ d = json.load(sys.stdin)
505
+ text = d['candidates'][0]['content']['parts'][0]['text'].strip()
506
+ if text.startswith('\`\`\`'):
507
+ text = '\n'.join(text.split('\n')[1:-1])
508
+ result = json.loads(text)
509
+ json.dump(result, open('/tmp/vc-final-list.json', 'w'), indent=2)
510
+ print(f'Synthesis complete. Track A: {len(result.get(\"track_a_vcs\", []))} VCs. Track B: {len(result.get(\"track_b_vcs\", []))} VCs.')
511
+ "
512
+ ```
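The prompt asks Gemini to pick the "5 highest combined score VCs across both tracks" but does not define the combination. The sketch below shows how that selection could be reproduced locally as a cross-check. It assumes the combined score is `stage_fit_score + space_fit_score` and that a fund appearing in both tracks should be counted once. Both are assumptions, and `top_by_combined_score` is a hypothetical helper.

```python
def top_by_combined_score(track_a_vcs, track_b_vcs, n=5):
    """Return the n funds with the highest stage_fit_score + space_fit_score."""
    tagged = [dict(vc, track="A") for vc in track_a_vcs]
    tagged += [dict(vc, track="B") for vc in track_b_vcs]

    def combined(vc):
        # Missing scores count as 0 so incomplete entries sort last.
        return (vc.get("stage_fit_score") or 0) + (vc.get("space_fit_score") or 0)

    best = {}
    for vc in tagged:
        name = vc.get("fund_name")
        # A fund found by both tracks is kept once, at its higher score.
        if name not in best or combined(vc) > combined(best[name]):
            best[name] = vc
    return sorted(best.values(), key=combined, reverse=True)[:n]
```

Comparing this list with the model's `top_5_deep_dives` is one way to spot a deep-dive for a fund that never appeared in either track table.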
513
+
514
+ **If Gemini returns empty or JSON parsing fails:** Retry once with `maxOutputTokens` reduced to 4000. If retry also fails: present whatever partial JSON was returned, mark missing sections `[INCOMPLETE]`, and tell the user: "Synthesis incomplete. The research data may have been too large. Try running again."
515
+
516
+ ---
517
+
518
+ ## Step 9: Self-QA
519
+
520
+ Run before presenting. Remove non-evidenced VCs structurally.
521
+
522
+ ```bash
523
+ python3 << 'PYEOF'
524
+ import json
525
+
526
+ result = json.load(open('/tmp/vc-final-list.json'))
527
+ failures = []
528
+
529
+ # Remove Track A VCs missing evidence_company
530
+ original_a = len(result.get('track_a_vcs', []))
531
+ result['track_a_vcs'] = [v for v in result.get('track_a_vcs', []) if v.get('evidence_company')]
532
+ removed_a = original_a - len(result['track_a_vcs'])
533
+ if removed_a > 0:
534
+     failures.append(f'Removed {removed_a} Track A VC(s) missing evidence_company')
535
+
536
+ # Remove Track B VCs missing thesis_source_title
537
+ original_b = len(result.get('track_b_vcs', []))
538
+ result['track_b_vcs'] = [v for v in result.get('track_b_vcs', []) if v.get('thesis_source_title')]
539
+ removed_b = original_b - len(result['track_b_vcs'])
540
+ if removed_b > 0:
541
+     failures.append(f'Removed {removed_b} Track B VC(s) missing thesis_source_title')
542
+
543
+ # Check top 5 deep dives
544
+ dives = result.get('top_5_deep_dives', [])
545
+ if len(dives) < 5:
546
+     failures.append(f'Only {len(dives)} deep dives (expected 5) -- insufficient search data')
547
+ for dd in dives:
548
+     if not dd.get('how_to_approach') or len(dd.get('how_to_approach', '')) < 30:
549
+         dd['how_to_approach'] = 'Approach method not determinable from search data. Check the fund website directly for application instructions.'
550
+         failures.append(f"Fixed: '{dd.get('fund_name')}' had missing how_to_approach")
551
+
552
+ # Check outreach hooks count
553
+ if len(result.get('outreach_hooks', [])) != 3:
554
+     failures.append(f"Expected 3 outreach hooks, got {len(result.get('outreach_hooks', []))}")
555
+
556
+ # Check for em dashes
557
+ result_str = json.dumps(result, ensure_ascii=False)
558
+ if '\u2014' in result_str:
559
+     result = json.loads(result_str.replace('\u2014', '--'))
560
+     failures.append('Fixed: em dash characters removed from output')
561
+
562
+ # Check for forbidden words
563
+ forbidden = ['powerful', 'robust', 'seamless', 'innovative', 'game-changing', 'streamline', 'leverage', 'transform']
564
+ full_text = json.dumps(result).lower()
565
+ for word in forbidden:
566
+     if word in full_text:
567
+         failures.append(f"Warning: forbidden word '{word}' found in output -- review before presenting")
568
+
569
+ # Ensure data_quality_flags exists
570
+ if 'data_quality_flags' not in result:
571
+     result['data_quality_flags'] = []
572
+ result['data_quality_flags'].extend(failures)
573
+
574
+ json.dump(result, open('/tmp/vc-final-list.json', 'w'), indent=2)
575
+ print(f'QA complete. Issues addressed: {len(failures)}')
576
+ for f in failures:
577
+     print(f' - {f}')
578
+ if not failures:
579
+     print('All QA checks passed.')
580
+ PYEOF
581
+ ```
582
+
583
+ ---
584
+
585
+ ## Step 10: Save and Present Output
586
+
587
+ ```bash
588
+ DATE=$(date +%Y-%m-%d)
589
+ OUTPUT_FILE="docs/vc-intel/${PRODUCT_SLUG}-${DATE}.md"
590
+ mkdir -p docs/vc-intel
591
+ ```
592
+
593
+ Present the final output:
594
+
595
+ ```
596
+ ## VC Finder: [product_name]
597
+ Date: [today] | Stage: [detected_stage] ([stage_confidence] confidence) | Geography: [geography_bias]
598
+
599
+ ---
600
+
601
+ ### Product Analysis
602
+
603
+ What it does: [one_line_description]
604
+ Industry: [l1] > [l2] > [l3]
605
+ Buyer: [buyer_persona] at [company_type], [company_size]
606
+ Comparable companies used for research: [comma-separated list]
607
+
608
+ ---
609
+
610
+ ### Track A: VCs Who Backed Similar Companies
611
+
612
+ *These investors have already written a check in this space.*
613
+
614
+ | Fund | Backed Comparable | Stage Focus | Check Size | Fit Score | Approach |
615
+ |---|---|---|---|---|---|
616
+ [one row per Track A VC, sorted by space_fit_score descending]
617
+
618
+ ---
619
+
620
+ ### Track B: Thesis-Led Investors
621
+
622
+ *These investors are actively publishing about this space.*
623
+
624
+ | Fund | Thesis Source | Stage Focus | Check Size | Fit Score | Approach |
625
+ |---|---|---|---|---|---|
626
+ [one row per Track B VC, sorted by space_fit_score descending]
627
+
628
+ ---
629
+
630
+ ### Top 5 Deep Dives
631
+
632
+ #### [N]. [Fund Name] (Track [A/B])
633
+
634
+ Overview: [fund_overview]
635
+ Why it fits: [why_fit]
636
+ Portfolio in this space: [names, or "Not found in search data"]
637
+ How to approach: [how_to_approach]
638
+ Outreach hook: "[outreach_hook]"
639
+
640
+ [repeat for all 5]
641
+
642
+ ---
643
+
644
+ ### 3 Outreach Hooks for This Product Type
645
+
646
+ **1. [hook_type]**
647
+ [hook_text]
648
+ Best for: [best_for]
649
+
650
+ [repeat for all 3]
651
+
652
+ ---
653
+ Data quality notes: [data_quality_flags, or "None"]
654
+ Saved to: docs/vc-intel/[PRODUCT_SLUG]-[DATE].md
655
+ ```
656
+
657
+ Clean up temp files:
658
+
659
+ ```bash
660
+ rm -f /tmp/vc-product-raw.md /tmp/vc-stage-signals.json /tmp/vc-analysis-request.json \
661
+ /tmp/vc-product-analysis.json /tmp/vc-tracka-results.json /tmp/vc-trackb-results.json \
662
+ /tmp/vc-synthesis-request.json /tmp/vc-final-list.json /tmp/vc-qa-result.json
663
+ ```
@@ -0,0 +1,125 @@
1
+ [
2
+ {
3
+ "id": "eval_001",
4
+ "name": "B2B SaaS product URL: full two-track output with correct stage detection",
5
+ "description": "A B2B SaaS developer tool with a demo CTA. Validates the full 10-step workflow, stage detection from page signals, two-track VC research, and complete output.",
6
+ "input": {
7
+ "prompt": "Find VCs for my startup: https://linear.app",
8
+ "env": {
9
+ "GEMINI_API_KEY": "set",
10
+ "TAVILY_API_KEY": "set",
11
+ "FIRECRAWL_API_KEY": "set"
12
+ }
13
+ },
14
+ "expected_behavior": [
15
+ "Fetches URL via Firecrawl",
16
+ "Step 4 detects 'free trial or demo CTA' signal from page content",
17
+ "Stage outputs 'seed' or 'series-a' with medium or high confidence",
18
+ "Industry taxonomy: software > developer tools > issue tracking or project management",
19
+ "Generates 5 comparable companies (e.g. Jira, Shortcut, Height, Plane, Asana)",
20
+ "Runs 5 Track A Tavily searches using quoted company names",
21
+ "Runs 3 Track B Tavily searches using L2 and L3 taxonomy terms",
22
+ "Every Track A VC in the output has evidence_company field populated",
23
+ "Every Track B VC in the output has thesis_source_title field populated",
24
+ "Self-QA step removes any VCs missing required evidence fields",
25
+ "Output includes exactly 5 deep dives and exactly 3 outreach hooks",
26
+ "Output saved to docs/vc-intel/linear-[date].md"
27
+ ],
28
+ "expected_output": "Full two-track VC list with sourced Track A evidence and Track B thesis citations, 5 deep dives with product-specific outreach hooks, saved to docs/vc-intel/"
29
+ },
30
+ {
31
+ "id": "eval_002",
32
+ "name": "Consumer app: different stage signals, consumer-focused VCs not B2B enterprise funds",
33
+ "description": "A consumer mobile app. Validates that the skill detects consumer-specific signals and returns consumer-focused investors, not enterprise SaaS funds.",
34
+ "input": {
35
+ "prompt": "Find investors for this consumer app: https://www.bereal.com",
36
+ "env": {
37
+ "GEMINI_API_KEY": "set",
38
+ "TAVILY_API_KEY": "set",
39
+ "FIRECRAWL_API_KEY": "set"
40
+ }
41
+ },
42
+ "expected_behavior": [
43
+ "Fetches URL successfully",
44
+ "Industry taxonomy maps to consumer or social media, not B2B SaaS",
45
+ "Comparable companies are consumer social apps, not enterprise tools",
46
+ "Track A searches target consumer social investors (who backed TikTok, Snapchat, Instagram at early stage)",
47
+ "Track B searches use consumer social or UGC thesis terms",
48
+ "VCs in output are consumer-focused funds, not B2B SaaS investors",
49
+ "Outreach hooks are specific to consumer social or photo product type",
50
+ "No B2B enterprise-only investors appear in the output"
51
+ ],
52
+ "expected_output": "VC list with consumer-focused investors, stage-appropriate alignment, no B2B enterprise fund entries"
53
+ },
54
+ {
55
+ "id": "eval_003",
56
+ "name": "Description-only input: skips fetch, goes direct to Gemini analysis",
57
+ "description": "User pastes a product description with no URL. Validates that Steps 3 and 4 are skipped and the skill proceeds directly to Gemini analysis.",
58
+ "input": {
59
+ "prompt": "Find VCs for my startup. Here's what we do: We build an AI-powered legal contract review tool for in-house legal teams at mid-market companies. Our tool flags risky clauses and suggests standard language. We're pre-seed, 3 months post-launch, no pricing page yet. US-focused.",
60
+ "env": {
61
+ "GEMINI_API_KEY": "set",
62
+ "TAVILY_API_KEY": "set",
63
+ "FIRECRAWL_API_KEY": "not set"
64
+ }
65
+ },
66
+ "expected_behavior": [
67
+ "Detects no URL in the input",
68
+ "Skips Step 3: no fetch attempt is made",
69
+ "Skips Step 4: no regex stage detection from page",
70
+ "Passes the pasted description directly to Gemini in Step 5",
71
+ "Stage set to 'pre-seed' with high confidence (user stated it explicitly)",
72
+ "Industry taxonomy: software > legaltech > contract review automation",
73
+ "Generates 5 comparable legaltech companies",
74
+ "Runs full Track A and Track B searches using description-derived comparables and taxonomy",
75
+ "Output note: 'Stage: pre-seed (stated by user)'",
76
+ "data_quality_flags includes note about stage coming from user description not page signals"
77
+ ],
78
+ "expected_output": "Full VC list using description-derived analysis, legaltech-specific investors, stage labeled as user-stated"
79
+ },
80
+ {
81
+ "id": "eval_004",
82
+ "name": "GEMINI_API_KEY missing: immediate stop at Step 1",
83
+ "description": "Validates that the skill stops at Step 1 with exact setup instructions when Gemini key is absent.",
84
+ "input": {
85
+ "prompt": "Find VCs for https://stripe.com",
86
+ "env": {
87
+ "GEMINI_API_KEY": "not set",
88
+ "TAVILY_API_KEY": "set",
89
+ "FIRECRAWL_API_KEY": "set"
90
+ }
91
+ },
92
+ "expected_behavior": [
93
+ "Step 1 detects GEMINI_API_KEY is missing",
94
+ "Stops immediately at Step 1",
95
+ "Tells the user: 'GEMINI_API_KEY is required for product analysis and VC synthesis. Get it at aistudio.google.com. Add it to your .env file.'",
96
+ "Does NOT fetch the URL",
97
+ "Does NOT run any Tavily searches",
98
+ "Does NOT attempt any analysis"
99
+ ],
100
+ "expected_output": "Immediate stop at Step 1 with exact error message including the URL to get the key. No partial output generated."
101
+ },
102
+ {
103
+ "id": "eval_005",
104
+ "name": "TAVILY_API_KEY missing: immediate stop at Step 1, no fallback",
105
+ "description": "Validates that the skill stops at Step 1 when Tavily key is absent, with no fallback path suggested.",
106
+ "input": {
107
+ "prompt": "Who invests in CI/CD automation startups? https://buildkite.com",
108
+ "env": {
109
+ "GEMINI_API_KEY": "set",
110
+ "TAVILY_API_KEY": "not set",
111
+ "FIRECRAWL_API_KEY": "set"
112
+ }
113
+ },
114
+ "expected_behavior": [
115
+ "Step 1 detects GEMINI_API_KEY is set",
116
+ "Step 1 detects TAVILY_API_KEY is missing",
117
+ "Stops immediately at Step 1",
118
+ "Tells the user: 'TAVILY_API_KEY is required to research VC investments and theses. There is no fallback for this. Get it at app.tavily.com. Free tier: 1000 credits/month (about 125 full runs). Add it to your .env file.'",
119
+ "Does NOT fetch the URL",
120
+ "Does NOT run any Gemini analysis",
121
+ "Does NOT suggest any workaround"
122
+ ],
123
+ "expected_output": "Immediate stop at Step 1 with exact error message. Does not proceed past setup check."
124
+ }
125
+ ]
@@ -0,0 +1,98 @@
1
+ # Stage Signals Reference
2
+
3
+ Used by SKILL.md Step 4 to detect funding stage from product page content before any API call.
4
+
5
+ ---
6
+
7
+ ## Signal Detection Table
8
+
9
+ | Signal Pattern (regex) | Stage Hint | Example Page Text |
10
+ |---|---|---|
11
+ | `join (the )?waitlist` | pre-seed | "Join the waitlist for early access" |
12
+ | `sign up for (our \|the )?beta` | pre-seed | "Sign up for our beta program" |
13
+ | `early access` | pre-seed | "Request early access" |
14
+ | `request (an? )?invite` | pre-seed | "Request an invite" |
15
+ | `get notified` | pre-seed | "Get notified when we launch" |
16
+ | `start (your )?free (\d+-day )?trial` | seed | "Start your free 14-day trial" |
17
+ | `try (it )?for free` | seed | "Try for free, no credit card required" |
18
+ | `request (a )?demo` | seed | "Request a demo" |
19
+ | `book (a )?demo` | seed | "Book a demo with our team" |
20
+ | `schedule (a )?(\d+-minute )?demo` | seed | "Schedule a 30-minute demo" |
21
+ | `contact sales` | series-a | "Contact sales for enterprise pricing" |
22
+ | `talk to (our )?sales` | series-a | "Talk to our sales team" |
23
+ | `see pricing` / `view pricing` | series-a | "See pricing" |
24
+ | `plans and pricing` | series-a | "Plans and Pricing" |
25
+ | `case stud(y\|ies)` | series-a | "Read our case studies" |
26
+ | `customer (success )?stor(y\|ies)` | series-a | "Customer success stories" |
27
+ | `trusted by \d+` | series-a | "Trusted by 2,000+ teams" |
28
+ | `enterprise (plan\|pricing\|tier)` | series-a-or-b | "Enterprise plan available" |
29
+ | `we.?re hiring` | series-a-or-b | "We're hiring -- see open roles" |
30
+ | `join our team` | series-a-or-b | "Join our team of 50+" |
31
+ | `raised \$[\d,.]+[mk]?` | announced | "We raised $8M in Series A" |
32
+ | `series [abc] round` | announced | "Series B round closed" |
33
+ | `seed round` | announced | "Seed round led by X" |
34
+
35
+ ---
36
+
37
+ ## Signal Confidence Rules
38
+
39
+ | Signals Found | Confidence |
40
+ |---|---|
41
+ | 2 or more matching signals | high |
42
+ | Exactly 1 matching signal | medium |
43
+ | 0 signals (stage estimated from content alone) | low |
44
+ | Funding announcement text found directly | high (overrides other signals) |
45
+
46
+ ---
47
+
48
+ ## Handling Conflicting Signals
49
+
50
+ Some pages show signals from multiple stages simultaneously (e.g. a pricing page AND a waitlist). Use this resolution order:
51
+
52
+ 1. Funding announcement text wins over all other signals (the stage is known, not inferred)
53
+ 2. If both "pricing page" (Series A) and "free trial" (seed) signals appear: call it seed-to-series-a transition, output `series-a` with medium confidence
54
+ 3. If both "waitlist" (pre-seed) and "demo request" (seed) signals appear: output `seed` -- the product is likely further along than the waitlist implies
55
+ 4. If no signals found at all: output `unknown` with low confidence, pass this to Gemini with a note to infer from product maturity and content
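As a rough sketch, the confidence rules and the resolution order could be combined like this. It assumes case-insensitive matching and uses only a subset of the table's patterns, lightly adapted. Returning the literal stage `announced` for rule 1 and picking the latest matched stage when none of the listed rules applies are both assumptions, not behavior documented here.

```python
import re

# Subset of the signal table; (pattern, stage hint) pairs.
SIGNALS = [
    (r"join (the )?waitlist", "pre-seed"),
    (r"early access", "pre-seed"),
    (r"try (it )?for free", "seed"),
    (r"(request|book) (a )?demo", "seed"),
    (r"contact sales", "series-a"),
    (r"(see|view) pricing", "series-a"),
    (r"raised \$[\d,.]+[mk]?", "announced"),
    (r"seed round", "announced"),
]
ORDER = ["pre-seed", "seed", "series-a"]  # earliest to latest


def detect_stage(page_text: str) -> tuple:
    """Return (stage, confidence) for raw page text."""
    hits = [stage for pattern, stage in SIGNALS
            if re.search(pattern, page_text, re.IGNORECASE)]
    if not hits:
        return "unknown", "low"          # rule 4: nothing matched
    if "announced" in hits:
        return "announced", "high"       # rule 1: funding text wins
    stages = set(hits)
    if {"seed", "series-a"} <= stages:
        return "series-a", "medium"      # rule 2: transition case
    confidence = "high" if len(hits) >= 2 else "medium"
    if {"pre-seed", "seed"} <= stages:
        return "seed", confidence        # rule 3: demo beats waitlist
    # Assumption: otherwise report the latest stage that matched.
    return max(stages, key=ORDER.index), confidence
```

For example, `detect_stage("Join the waitlist or book a demo")` resolves to `seed`, while a page with no matching text falls through to `("unknown", "low")` for Gemini to refine.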
56
+
57
+ ---
58
+
59
+ ## Common Misdetections
60
+
61
+ **"Request demo" from a mature Series B company:** Some large companies keep demo CTAs even after raising Series B. Override signals if the page also shows: enterprise logo bars, "trusted by Fortune 500", or explicit Series B announcement.
62
+
63
+ **Open-source projects:** Often show no stage signals (no pricing, no CTA). Output `unknown`. In the Gemini analysis step, note that the product appears open-source and ask Gemini to infer whether there is a commercial entity behind it.
64
+
65
+ **Startup landing pages with no product live:** A site with only a headline, a value prop paragraph, and an email capture is almost certainly pre-seed. Even without explicit waitlist language, if the page has no product demo and no pricing, output `pre-seed` with medium confidence.
66
+
67
+ ---
68
+
69
+ ## Stage-to-Tavily Query Modifiers
70
+
71
+ These modifiers are added to Track B search queries based on detected stage:
72
+
73
+ | Detected Stage | Query Modifier |
74
+ |---|---|
75
+ | pre-seed | "pre-seed micro VC angel" |
76
+ | seed | "seed fund" |
77
+ | series-a | "series A lead investor" |
78
+ | series-b | "growth stage VC" |
79
+ | unknown | "early stage" (default) |
80
+
81
+ Example Track B query with modifier:
82
+ - L3 = "CI/CD automation", stage = seed
83
+ - Query: `seed fund "CI/CD automation" investment thesis portfolio companies`
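A minimal sketch of how such a query might be assembled. The function name is hypothetical, and two things are assumed from the single example above: that the trailing `investment thesis portfolio companies` terms are fixed, and that stages missing from the modifier table fall back to the `unknown` default.

```python
STAGE_MODIFIERS = {
    "pre-seed": "pre-seed micro VC angel",
    "seed": "seed fund",
    "series-a": "series A lead investor",
    "series-b": "growth stage VC",
    "unknown": "early stage",
}


def build_track_b_query(l3_term: str, stage: str) -> str:
    """Prefix the stage modifier and quote the L3 term, as in the example."""
    # Assumption: unmapped stages use the "unknown" default modifier.
    modifier = STAGE_MODIFIERS.get(stage, STAGE_MODIFIERS["unknown"])
    return f'{modifier} "{l3_term}" investment thesis portfolio companies'
```

Quoting the L3 term keeps the search anchored on the exact niche phrase, while the modifier stays unquoted so it only biases the results toward the right fund type.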
84
+
85
+ ---
86
+
87
+ ## Stage-to-VC-Check-Size Reference
88
+
89
+ Use this to validate stage fit scores in the synthesis step:
90
+
91
+ | Stage | Typical Check Size |
92
+ |---|---|
93
+ | Pre-seed | $25K-$500K |
94
+ | Seed | $500K-$3M |
95
+ | Series A | $3M-$15M |
96
+ | Series B | $15M-$50M |
97
+
98
+ A VC whose typical check size is $20M-$50M should receive a stage fit score of 2 or lower for a pre-seed product, regardless of how well its thesis matches.
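One way to apply this rule mechanically is sketched below. The function name, the generalization from the single pre-seed example to any non-overlapping range, and the default cap of 2 are all assumptions; the scoring scale itself is defined elsewhere in the skill.

```python
# Typical check sizes in USD, taken from the table above.
TYPICAL_CHECK = {
    "pre-seed": (25_000, 500_000),
    "seed": (500_000, 3_000_000),
    "series-a": (3_000_000, 15_000_000),
    "series-b": (15_000_000, 50_000_000),
}


def cap_stage_fit(score: int, stage: str, vc_min: int, vc_max: int, cap: int = 2) -> int:
    """Cap the score when the VC's check range misses the stage's range."""
    low, high = TYPICAL_CHECK[stage]
    overlaps = vc_min <= high and vc_max >= low
    # Assumption: any non-overlapping range is capped, not only pre-seed.
    return score if overlaps else min(score, cap)
```

With these numbers, a fund writing $20M-$50M checks is capped at 2 for a pre-seed product but keeps its original score for a Series B product.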
@@ -0,0 +1,142 @@
1
+ # VC Outreach Guide
2
+
3
+ How to approach the different investor archetypes the skill surfaces. The right approach depends on the VC type, not on the founder's preference.
4
+
5
+ ---
6
+
7
+ ## 5 Investor Archetypes
8
+
9
+ ### Archetype 1: Thesis-First Writers (Track B investors)
10
+
11
+ These VCs publish blog posts, newsletters, or Twitter threads about why they want to invest in a specific space. They are signaling active deal interest.
12
+
13
+ **How to approach:**
14
+ 1. Read the article or post that surfaced them in Track B before reaching out
15
+ 2. Open with a direct reference to their specific thesis language, not a paraphrase: "You wrote in [post title] that you believe [exact phrase from post]. We built exactly that."
16
+ 3. Keep the first message under 100 words. The thesis reference does the work -- do not over-explain.
17
+ 4. Cold email works here. These VCs publish specifically to attract inbound -- they expect cold emails from founders who read their work.
18
+
19
+ **What to avoid:** Do not write "I came across your firm" or "I've been following your work." These are signals that you did not actually read the thesis. Quote it or do not mention it.
20
+
21
+ ---
22
+
23
+ ### Archetype 2: Portfolio-Pattern Investors (Track A investors)
24
+
25
+ These VCs backed a company comparable to yours. They already understand the space, the buyer, and the risk profile.
26
+
27
+ **How to approach:**
28
+ 1. Name the specific portfolio company in your opening: "You backed [Company X] which means you understand [specific pain]."
29
+ 2. Do not position yourself as a competitor to their portfolio company. Instead, find the adjacent or complementary angle: "We're solving the [different part of the workflow] problem that [Company X] doesn't address."
30
+ 3. Cold email or LinkedIn works. Warm intro from a mutual connection in the portfolio company works best.
31
+
32
+ **What to avoid:** Do not say "just like [portfolio company] but better." This forces the investor to choose between two bets and they will default to protecting the existing one.
33
+
34
+ ---
35
+
36
+ ### Archetype 3: Operator-Turned-Investor
37
+
38
+ Former founders or operators who moved into investing. Typically reachable via warm intro from their portfolio company founders.
39
+
40
+ **How to approach:**
41
+ 1. Find a founder in their portfolio via LinkedIn or the fund's website
42
+ 2. Get a warm intro from that founder: "I'm building in the same space you were in at [company] -- would you be willing to introduce me to [investor name]?"
43
+ 3. Operator-investors respond to founder-to-founder intros at 3-5x the rate of cold outreach
44
+
45
+ **What to avoid:** Do not lead with a cold email. This archetype deal-selects heavily from personal networks and responds less to cold outreach, so do not skip the warm path attempt.
46
+
47
+ ---
48
+
49
+ ### Archetype 4: Multi-Stage Generalists with Sector Coverage
50
+
51
+ Large funds with dedicated sector teams. Partners focus on different verticals. Do not email the wrong person.
52
+
53
+ **How to approach:**
54
+ 1. Identify the specific partner who covers your sector from the fund website or LinkedIn. Look for their recent investments in your space.
55
+ 2. Email the sector partner, not the managing partner or the fund's general inbox.
56
+ 3. If you cannot identify the right partner, email a principal or associate first and ask who covers your space. They are more responsive and will route you correctly.
57
+
58
+ **What to avoid:** Do not cold email a managing partner at a large fund with no warm intro. Response rates are near zero. The sector partner path is 10x more effective.
59
+
60
+ ---
61
+
62
+ ### Archetype 5: Scout Program Participants
63
+
64
+ Many top-tier funds (a16z, First Round, Bessemer, etc.) run scout programs -- operators, founders, and angels who refer deals in exchange for carry. Scouts have much higher response rates than GPs at the same fund.
65
+
66
+ **How to approach:**
67
+ 1. Find scouts on Twitter/X or LinkedIn (search "[fund name] scout")
68
+ 2. Scouts are often active founders themselves -- they respond well to peer-to-peer founder outreach
69
+ 3. A positive scout referral carries significant weight internally at the fund
70
+
71
+ **What to avoid:** Do not treat a scout as a gatekeeper to avoid. A scout intro is often more valuable than a cold email to the GP.
72
+
73
+ ---
74
+
75
+ ## Cold Email vs Warm Intro
76
+
77
+ | Method | Typical Response Rate | When to Use |
78
+ |---|---|---|
79
+ | Cold email (generic) | 1-3% | Only if no warm path exists |
80
+ | Cold email (thesis-referenced) | 8-15% | Track B investors who published their thesis |
81
+ | Cold email (portfolio-referenced) | 5-10% | Track A investors with named portfolio evidence |
82
+ | Warm intro from portfolio founder | 30-50% | Always attempt first for Archetype 3 and 4 |
83
+ | Warm intro from mutual angel/advisor | 20-35% | Use your existing cap table to find bridge connections |
84
+
85
+ The warm intro advantage is real. Before sending a cold email to a Track A or Track B investor, spend 10 minutes on LinkedIn finding a second-degree connection through their portfolio companies. One warm intro is worth 10 cold emails.
86
+
87
+ ---
88
+
89
+ ## Application-Form Funds
90
+
91
+ Some funds operate structured intake processes. These require filling out a form, not cold emailing.
92
+
93
+ | Fund Type | How to Engage |
94
+ |---|---|
95
+ | YC | Apply during batch cycle at ycombinator.com/apply |
96
+ | First Round Capital | Submit via firstround.com/funding |
97
+ | Andreessen Horowitz | Public application at a16z.com -- reference their exact investment criteria from their published content |
98
+ | Sequoia | Warm intro strongly preferred; Arc program for early stage |
99
+ | Techstars | Apply to accelerator program for access to their network |
100
+
101
+ For application-form funds: use the exact language from their published investment criteria in your application. They score applications against stated criteria. Generic applications score low regardless of product quality.
102
+
103
+ ---
104
+
105
+ ## Outreach Message Structure
106
+
107
+ For cold email (any archetype):
108
+
109
+ ```
110
+ Subject: [Specific signal reference, under 8 words]
111
+
112
+ [Sentence about THEM -- their portfolio company, their thesis, their recent investment]
113
+
114
+ [One sentence about what you build and who it is for]
115
+
116
+ [One specific data point: metric, customer, or signal]
117
+
118
+ [One-line ask: 20-minute call, or specific question]
119
+
120
+ [Name]
121
+ ```
122
+
123
+ Hard limits:
124
+ - Email body: under 100 words
125
+ - Subject line: under 8 words
126
+ - No attachments in first message
127
+ - No pitch deck link in first message (send on request only)
128
+ - No "I hope this finds you well" or "I wanted to reach out"
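The word-count and phrasing limits are easy to check before sending. The sketch below is a hypothetical helper, not part of the skill; it covers only the limits that can be tested on plain text (attachments and deck links are not checked), and it assumes words are separated by whitespace.

```python
BANNED_PHRASES = ("i hope this finds you well", "i wanted to reach out")


def check_cold_email(subject: str, body: str) -> list:
    """Return a list of hard-limit violations; empty means the draft passes."""
    problems = []
    if len(subject.split()) >= 8:
        problems.append("subject line must be under 8 words")
    if len(body.split()) >= 100:
        problems.append("body must be under 100 words")
    lowered = body.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            problems.append(f"remove filler phrase: '{phrase}'")
    return problems
```

Running a draft through `check_cold_email` and fixing everything it returns keeps the first message inside the limits above without relying on a manual count.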
129
+
130
+ ---
131
+
132
+ ## Red Flags to Check Before Outreach
133
+
134
+ Before emailing any VC from the output list, check:
135
+
136
+ 1. **Recent fund vintage:** A fund that raised in 2019 and has not announced a new fund is likely deployed and not writing new checks. Check Crunchbase or their website for their most recent fund announcement.
137
+
138
+ 2. **Portfolio company failure in your space:** If a VC backed a direct competitor that failed, they may be reluctant to re-invest in the same category. Frame your differentiation clearly.
139
+
140
+ 3. **Check size mismatch:** If the VC's minimum check is $5M and you are raising a $1M seed, the round economics do not work. Do not waste either side's time.
141
+
142
+ 4. **Geography restriction:** Some funds explicitly invest only in US companies or only in European companies. Check their website before outreach.