orangeslice 1.6.0 → 1.7.0
- package/README.md +65 -0
- package/dist/apify.d.ts +57 -0
- package/dist/apify.js +126 -0
- package/dist/cli.js +18 -7
- package/dist/generateObject.d.ts +34 -0
- package/dist/generateObject.js +85 -0
- package/dist/geo.d.ts +50 -0
- package/dist/geo.js +91 -0
- package/dist/index.d.ts +32 -3
- package/dist/index.js +24 -3
- package/dist/serp.d.ts +4 -1
- package/dist/serp.js +2 -2
- package/docs/AGENTS.md +94 -384
- package/docs/apify.md +133 -0
- package/docs/b2b.md +178 -0
- package/docs/browser.md +173 -0
- package/docs/serp.md +167 -0
- package/docs/strategies.md +250 -0
- package/package.json +2 -2
- package/docs/B2B_CROSS_TABLE_TEST_FINDINGS.md +0 -255
- package/docs/B2B_DATABASE.md +0 -314
- package/docs/B2B_DATABASE_TEST_FINDINGS.md +0 -476
- package/docs/B2B_EMPLOYEE_SEARCH.md +0 -697
- package/docs/B2B_GENERALIZATION_RULES.md +0 -220
- package/docs/B2B_NLP_QUERY_MAPPINGS.md +0 -240
- package/docs/B2B_NORMALIZED_VS_DENORMALIZED.md +0 -952
- package/docs/B2B_SCHEMA.md +0 -1042
- package/docs/B2B_SQL_COMPREHENSIVE_TEST_FINDINGS.md +0 -301
- package/docs/B2B_TABLE_INDICES.ts +0 -496
@@ -0,0 +1,250 @@
+# Research Strategies
+
+Patterns for prospecting, enrichment, and social listening.
+
+---
+
+## Prospecting: Two Approaches
+
+### 1. Direct Query with Filters (Preferred)
+
+Use when criteria is directly searchable:
+
+- **Google dorking** — `"AI CRM" site:linkedin.com/company`
+- **B2B database** — industry, company size, funding, job titles
+
+```typescript
+// Direct: Companies in fintech with 50-500 employees
+const companies = await orangeslice.b2b.sql(`
+  SELECT * FROM linkedin_company
+  WHERE industry ILIKE '%fintech%'
+    AND employee_count BETWEEN 50 AND 500
+  LIMIT 100
+`);
+```
+
+### 2. Search → Enrich → Qualify
+
+Use when criteria can't be searched directly:
+
+- "Companies that recently switched CRMs"
+- "Are they actively hiring for this role?"
+- "Do they use [specific tool]?"
+
+```typescript
+// 1. Broad search
+const { results } = await orangeslice.serp.search(
+  `"switched to Salesforce" site:linkedin.com/posts`
+);
+
+// 2. Enrich each result
+for (const result of results) {
+  const companyName = extractCompanyFromPost(result);
+  const company = await orangeslice.b2b.sql(`
+    SELECT * FROM linkedin_company WHERE company_name ILIKE '%${companyName}%'
+  `);
+
+  // 3. Qualify
+  if (company[0]?.employee_count > 100) {
+    // Add to qualified list
+  }
+}
+```
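The enrichment loop above interpolates `companyName` directly into the SQL string, so a name containing a single quote (for example `O'Reilly`) would terminate the literal early and break the query. The sketch below shows one way to guard that interpolation. It assumes that `b2b.sql` accepts only a raw SQL string (the diff shows no parameter binding) and that the backend follows PostgreSQL quoting rules, which the `ILIKE` usage suggests but does not confirm. `escapeSqlLiteral`, `escapeLikePattern`, and `companyNameFilter` are hypothetical helpers, not part of orangeslice.

```typescript
// Hypothetical helpers: none of these names come from the orangeslice package.

/** Doubles single quotes so the value is safe inside a '...' SQL literal. */
function escapeSqlLiteral(value: string): string {
  return value.replace(/'/g, "''");
}

/** Escapes LIKE/ILIKE wildcards so the text matches literally.
 *  Assumes backslash is the pattern escape character (the PostgreSQL default). */
function escapeLikePattern(value: string): string {
  return value.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

/** Builds the WHERE fragment for the enrichment step,
 *  or returns null when no company name could be extracted. */
function companyNameFilter(companyName: string | undefined): string | null {
  const trimmed = companyName?.trim();
  if (!trimmed) return null; // skip the lookup instead of matching '%undefined%'
  const pattern = escapeSqlLiteral(escapeLikePattern(trimmed));
  return `company_name ILIKE '%${pattern}%'`;
}
```

Returning `null` for a missing name lets the caller skip the database round trip entirely, rather than running a query that matches the literal text `undefined`.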
+
+---
+
+## Data Enrichment Pattern
+
+**Standard: Search → Scrape → Extract**
+
+```typescript
+// 1. Find relevant pages
+const { results } = await orangeslice.serp.search(
+  `site:${domain} "practice areas" "medical malpractice"`
+);
+
+// 2. Scrape the top result
+const { markdown } = await orangeslice.firecrawl.scrape(results[0].link);
+
+// 3. Extract structured data
+const data = await orangeslice.generateObject.generate({
+  prompt: `Does this law firm handle medical malpractice cases?\n\n${markdown}`,
+  schema: {
+    type: "object",
+    properties: {
+      handles_med_mal: { type: "boolean" },
+      practice_areas: { type: "array", items: { type: "string" } }
+    }
+  }
+});
+```
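The schema passed to `generateObject.generate` describes the expected shape, but the diff does not show whether the returned value is validated against it. The sketch below adds a runtime check on the caller's side. It assumes the call resolves to the plain object the schema describes. `MedMalResult` and `isMedMalResult` are illustrative names only.

```typescript
// Hypothetical caller-side validation for the schema used in the example above.
interface MedMalResult {
  handles_med_mal: boolean;
  practice_areas: string[];
}

/** Type guard: true only when `value` has both fields with the expected types. */
function isMedMalResult(value: unknown): value is MedMalResult {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const areas = record.practice_areas;
  return (
    typeof record.handles_med_mal === "boolean" &&
    Array.isArray(areas) &&
    areas.every((area) => typeof area === "string")
  );
}
```

A caller could then branch on `isMedMalResult(data)` before reading `data.handles_med_mal`, and treat a failed check as "unknown" rather than "no".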
+
+### When to Use Each Tool
+
+| Use Search → Scrape → Extract | Use `browser.execute` |
+|------------------------------|----------------------|
+| Data spread across unknown pages | Same template across pages |
+| Varied/unknown page structure | Need specific CSS selectors |
+| One-off enrichment | Scraping lists/tables |
+
+---
+
+## Social Listening
+
+Find posts mentioning topics, brands, or competitors.
+
+### Finding Posts: Use Dorking
+
+```
+# LinkedIn posts mentioning topic
+"AI sales tools" site:linkedin.com/posts
+
+# Twitter/X posts
+"competitor name" site:x.com inurl:status
+
+# Reddit discussions
+"product name" site:reddit.com
+```
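The dork strings above all follow the same shape: one or more quoted phrases, then optional `site:` and `inurl:` operators. The sketch below is a small builder for that shape, which may be handy when queries are assembled from user input. `DorkOptions` and `buildDork` are hypothetical and not part of the package. The output is an ordinary search string that would be passed to `serp.search`.

```typescript
// Hypothetical helper: option names are illustrative only.
interface DorkOptions {
  phrases: string[]; // exact phrases, OR-ed together
  site?: string;     // e.g. "linkedin.com/posts"
  inurl?: string;    // e.g. "status"
}

/** Builds a search query of the form `"a" OR "b" site:... inurl:...`. */
function buildDork({ phrases, site, inurl }: DorkOptions): string {
  const quoted = phrases
    .map((phrase) => phrase.trim())
    .filter((phrase) => phrase.length > 0)
    // Strip embedded double quotes so each phrase stays one quoted term.
    .map((phrase) => `"${phrase.replace(/"/g, "")}"`);
  if (quoted.length === 0) throw new Error("at least one phrase is required");

  const parts = [quoted.join(" OR ")];
  if (site) parts.push(`site:${site}`);
  if (inurl) parts.push(`inurl:${inurl}`);
  return parts.join(" ");
}
```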
+
+### Common Problem: Sellers vs. Complainers
+
+Users want people **complaining about** tools. Searches return mostly **people selling** alternatives.
+
+**Solution: Filter with verification**
+
+```typescript
+const { results } = await orangeslice.serp.search(
+  `"hate Salesforce" OR "frustrated with Salesforce" site:linkedin.com/posts`
+);
+
+for (const post of results) {
+  // Get author info
+  const authorUrl = extractAuthorUrl(post.link);
+  const profile = await orangeslice.b2b.sql(`...`);
+
+  // Filter out salespeople
+  if (!profile.headline?.toLowerCase().includes('sales')) {
+    // This is likely a real complainer
+  }
+}
+```
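Two details of the filter above are worth a second look. Elsewhere in this file `b2b.sql` results are indexed as arrays (`company[0]`, `profile[0]`), so `profile.headline` would likely be `undefined` and every author would pass. The substring test `includes('sales')` also matches "Salesforce", which would exclude Salesforce administrators, who are plausible complainers. The sketch below works under the assumption that the query resolves to an array of rows. `ProfileRow`, `SELLER_PATTERNS`, and `isLikelySeller` are hypothetical, and the pattern list is only a starting point.

```typescript
// Hypothetical row shape: only the field used by the filter is modeled.
interface ProfileRow {
  headline?: string | null;
}

// Word-boundary patterns so "Salesforce Administrator" does not count as "sales".
const SELLER_PATTERNS: RegExp[] = [
  /\bsales\b/i,
  /\baccount executive\b/i,
  /\bbusiness development\b/i,
  /\b[sb]dr\b/i,
];

/** True when the first row's headline looks like a sales role.
 *  A missing row or headline is treated as "not a seller". */
function isLikelySeller(rows: ProfileRow[]): boolean {
  const headline = rows[0]?.headline ?? "";
  return SELLER_PATTERNS.some((pattern) => pattern.test(headline));
}
```

Treating a missing profile as "not a seller" keeps unknown authors in the candidate list. A stricter pipeline might drop them instead.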
+
+---
+
+## Company Research Checklist
+
+```typescript
+async function researchCompany(domain: string) {
+  // 1. Basic company info
+  const company = await orangeslice.b2b.sql(`
+    SELECT * FROM linkedin_company WHERE domain = '${domain}'
+  `);
+
+  // 2. Leadership team
+  const leadership = await orangeslice.b2b.sql(`
+    SELECT lp.first_name, lp.last_name, pos.title
+    FROM linkedin_profile lp
+    JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
+    WHERE pos.linkedin_company_id = ${company[0].id}
+      AND pos.end_date IS NULL
+      AND (pos.title ILIKE 'ceo%' OR pos.title ILIKE 'cto%' OR pos.title ILIKE '%founder%')
+    LIMIT 10
+  `);
+
+  // 3. Funding history
+  const funding = await orangeslice.b2b.sql(`
+    SELECT * FROM linkedin_crunchbase_funding
+    WHERE linkedin_company_id = ${company[0].id}
+    ORDER BY announced_date DESC
+  `);
+
+  // 4. Recent news
+  const news = await orangeslice.serp.search(
+    `"${company[0].company_name}" funding OR acquisition`,
+    { tbs: "qdr:m" }
+  );
+
+  // 5. Website content
+  const about = await orangeslice.firecrawl.scrape(`https://${domain}/about`);
+
+  // 6. Current job openings
+  const jobs = await orangeslice.b2b.sql(`
+    SELECT title, locality FROM linkedin_job
+    WHERE linkedin_company_id = ${company[0].id}
+      AND closed_since IS NULL
+    LIMIT 20
+  `);
+
+  return { company, leadership, funding, news, about, jobs };
+}
+```
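`researchCompany` reads `company[0].id` without checking that the first query returned a row, so an unknown domain would throw a `TypeError` before any later step runs. The sketch below is a small guard, assuming `b2b.sql` resolves to an array of row objects as the indexing suggests. `firstRowOrNull` and `CompanyRow` are hypothetical names, and the sample rows are stubs rather than real query output.

```typescript
// Hypothetical row shape covering only the columns the checklist reads.
interface CompanyRow {
  id: number;
  company_name: string;
}

/** Returns the first row of a result set, or null when there is none. */
function firstRowOrNull<T>(rows: readonly T[] | null | undefined): T | null {
  return rows?.[0] ?? null;
}

// Usage sketch with stubbed results instead of a live b2b.sql call.
const foundCompany = firstRowOrNull<CompanyRow>([{ id: 42, company_name: "Example Co" }]);
const missingCompany = firstRowOrNull<CompanyRow>([]);

if (missingCompany === null) {
  // A real caller would return early here instead of building follow-up queries.
  console.log("no company row; skipping leadership, funding, and jobs lookups");
}
```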
+
+---
+
+## Person Research Checklist
+
+```typescript
+async function researchPerson(linkedinUrl: string) {
+  // Extract profile ID from URL
+  const parts = linkedinUrl.split('/in/');
+  const linkedinUserId = parts[1]?.split('/')[0];
+
+  // 1. Profile info
+  const profile = await orangeslice.b2b.sql(`
+    SELECT * FROM linkedin_profile
+    WHERE linkedin_user_id = '${linkedinUserId}'
+  `);
+
+  // 2. Work history
+  const positions = await orangeslice.b2b.sql(`
+    SELECT pos.*, lc.company_name, lc.domain
+    FROM linkedin_profile_position3 pos
+    JOIN linkedin_company lc ON lc.id = pos.linkedin_company_id
+    WHERE pos.linkedin_profile_id = ${profile[0].id}
+    ORDER BY pos.start_date DESC
+  `);
+
+  // 3. Recent activity (posts, mentions)
+  const activity = await orangeslice.serp.search(
+    `"${profile[0].first_name} ${profile[0].last_name}" site:linkedin.com`,
+    { tbs: "qdr:m" }
+  );
+
+  return { profile, positions, activity };
+}
+```
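`researchPerson` extracts the profile slug by splitting on `/in/`. That approach leaves a query string attached when the URL has no trailing slash (`.../in/jane-doe?trk=abc` yields `jane-doe?trk=abc`) and produces `undefined` for non-profile URLs, which is then interpolated into the SQL. The sketch below uses the standard `URL` class instead. `parseLinkedinUserId` is a hypothetical helper, and whether `linkedin_user_id` stores the slug in exactly this form is an assumption.

```typescript
/** Extracts the profile slug from a LinkedIn profile URL.
 *  Returns null for malformed URLs, other hosts, or non-profile paths. */
function parseLinkedinUserId(linkedinUrl: string): string | null {
  let url: URL;
  try {
    url = new URL(linkedinUrl);
  } catch {
    return null; // not an absolute URL
  }

  // Accept linkedin.com and any subdomain such as www.linkedin.com.
  if (!/(^|\.)linkedin\.com$/i.test(url.hostname)) return null;

  // pathname excludes the query string and fragment.
  const slug = url.pathname.match(/^\/in\/([^/]+)/)?.[1];
  if (!slug) return null;

  try {
    return decodeURIComponent(slug);
  } catch {
    return null; // malformed percent-encoding
  }
}
```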
+
+---
+
+## Batch Processing Pattern
+
+For large lists, process in controlled batches:
+
+```typescript
+async function processBatch<T, R>(
+  items: T[],
+  processor: (item: T) => Promise<R>,
+  batchSize = 10
+): Promise<R[]> {
+  const results: R[] = [];
+
+  for (let i = 0; i < items.length; i += batchSize) {
+    const batch = items.slice(i, i + batchSize);
+    const batchResults = await Promise.all(batch.map(processor));
+    results.push(...batchResults);
+
+    // Rate limiting is handled by orangeslice, but add delay between batches
+    if (i + batchSize < items.length) {
+      await new Promise(r => setTimeout(r, 1000));
+    }
+  }
+
+  return results;
+}
+
+// Usage
+const domains = ["stripe.com", "ramp.com", "brex.com", ...];
+const companies = await processBatch(domains, async (domain) => {
+  return orangeslice.b2b.sql(`SELECT * FROM linkedin_company WHERE domain = '${domain}'`);
+});
+```
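`processBatch` uses `Promise.all`, which rejects as soon as any item fails. A single failing lookup therefore aborts the whole run, and the caller loses the results collected so far. A `batchSize` of `0` would also loop forever, because `i` never advances. The variant below is built on `Promise.allSettled` and keeps per-item outcomes. `processBatchSettled` and its `delayMs` parameter are hypothetical additions, not part of the package.

```typescript
/** Processes items in batches and reports each item's outcome
 *  ("fulfilled" with a value or "rejected" with a reason), in input order. */
async function processBatchSettled<T, R>(
  items: T[],
  processor: (item: T) => Promise<R>,
  batchSize = 10,
  delayMs = 1000
): Promise<PromiseSettledResult<R>[]> {
  if (batchSize < 1) throw new RangeError("batchSize must be at least 1");
  const results: PromiseSettledResult<R>[] = [];

  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    const settled = await Promise.allSettled(batch.map((item) => processor(item)));
    results.push(...settled);

    // Pause between batches, but not after the last one.
    if (i + batchSize < items.length && delayMs > 0) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }

  return results;
}
```

Callers can then split the output on `result.status` and retry only the rejected items.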
package/package.json CHANGED
@@ -1,7 +1,7 @@
 {
   "name": "orangeslice",
-  "version": "1.
-  "description": "
+  "version": "1.7.0",
+  "description": "AI agent toolkit: B2B database, SERP, web scraping, browser automation, structured AI output, Apify actors, geocoding",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",
   "bin": {
@@ -1,255 +0,0 @@
-# B2B Cross-Table Query Test Findings
-
-Comprehensive performance comparison between normalized tables (`linkedin_profile`, `linkedin_company`) and denormalized views (`lkd_profile`, `lkd_company`) for cross-table queries.
-
----
-
-## Executive Summary
-
-| Pattern | Normalized | Denormalized | Winner | Speedup |
-| ---------------------------------- | ---------- | ------------ | ------------ | ------- |
-| **Company ID lookup → employees** | 48ms | 279ms | Normalized | 5.8x |
-| **Company name (org) search** | 274ms | 8,600ms | Normalized | 31x |
-| **GIN-indexed org ILIKE** | 430ms | 29,409ms | Normalized | 68x |
-| **Title ILIKE (common term)** | 64ms | 313ms | Normalized | 4.9x |
-| **updated_at filter** | 4ms | 14ms | Normalized | 3.5x |
-| **Company ID direct lookup** | 4ms | 31ms | Normalized | 7.8x |
-| **Headline (rare term)** | 2,530ms | 1,258ms | Denormalized | 2x |
-| **Skill array search** | 216ms | 169ms | Denormalized | 1.3x |
-| **Industry + employee_count** | 742ms | 202ms | Denormalized | 3.7x |
-| **Headline + company size (JOIN)** | 20,205ms | 217ms | Denormalized | 93x |
-| **Multi-skill + company size** | 28,173ms | 1,281ms | Denormalized | 22x |
-| **Skill + company industry** | TIMEOUT | 3,553ms | Denormalized | ∞ |
-| **Complex multi-filter + company** | TIMEOUT | 4,947ms | Denormalized | ∞ |
-| **AI company + SF location** | TIMEOUT | 11,061ms | Denormalized | ∞ |
-
-**Key Finding**: When combining profile text filters (headline, skills) with company constraints (employee_count, industry), **denormalized JOINs are 20-90x faster** and often the only option that completes within timeout.
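The Speedup column in the Executive Summary appears to be the slower timing divided by the faster one. For example, 20,205 ms / 217 ms ≈ 93 and 279 ms / 48 ms ≈ 5.8. The sketch below reproduces that arithmetic for the timed rows. The removed document does not state its rounding or tie rules, so rounding to one decimal place is an assumption, and rows marked TIMEOUT are outside its scope. `compareTimings` is a hypothetical name.

```typescript
type Winner = "Normalized" | "Denormalized" | "Tie";

/** Compares two timings in milliseconds and reports the faster side
 *  plus the slow/fast ratio rounded to one decimal place. */
function compareTimings(
  normalizedMs: number,
  denormalizedMs: number
): { winner: Winner; speedup: number } {
  if (normalizedMs <= 0 || denormalizedMs <= 0) {
    throw new RangeError("timings must be positive");
  }
  const fast = Math.min(normalizedMs, denormalizedMs);
  const slow = Math.max(normalizedMs, denormalizedMs);
  const speedup = Math.round((slow / fast) * 10) / 10;
  const winner: Winner =
    normalizedMs === denormalizedMs
      ? "Tie"
      : normalizedMs < denormalizedMs
        ? "Normalized"
        : "Denormalized";
  return { winner, speedup };
}
```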
-
----
-
-## Critical Pattern: Profile + Company Combined Filters
-
-The most important discovery: **cross-table queries with text filters perform dramatically better with denormalized tables**.
-
-### Normalized Multi-JOIN (Often Fails)
-
-```sql
--- ❌ TIMEOUT or 20+ seconds
-SELECT lp.id, lp.first_name, lp.headline, lc.company_name
-FROM linkedin_profile lp
-JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
-JOIN linkedin_company lc ON lc.id = pos.linkedin_company_id
-WHERE pos.end_date IS NULL
-  AND lp.headline ILIKE '%engineer%'
-  AND lc.employee_count > 1000
-LIMIT 50
--- Result: 20,205ms
-```
-
-### Denormalized JOIN (Fast)
-
-```sql
--- ✅ 217ms - 93x faster
-SELECT lkd.profile_id, lkd.first_name, lkd.headline, lkdc.name
-FROM lkd_profile lkd
-JOIN lkd_company lkdc ON lkdc.linkedin_company_id = lkd.linkedin_company_id
-WHERE lkd.headline ILIKE '%engineer%'
-  AND lkdc.employee_count > 1000
-LIMIT 50
--- Result: 217ms
-```
-
----
-
-## Test Results by Category
-
-### A. Company-First Queries
-
-| Test | Query | Normalized | Denormalized | Winner |
-| ---- | ------------------------------- | ---------- | ------------ | ----------------- |
-| A1 | Employees at company ID | **48ms** | 279ms | Normalized (5.8x) |
-| A2 | Employees by company name (org) | **274ms** | 8,600ms | Normalized (31x) |
-| A3 | Engineers at large companies | **96ms** | 234ms | Normalized (2.4x) |
-
-**Conclusion**: For company-first queries, normalized tables win due to indexed lookups.
-
-### B. Profile-First Queries
-
-| Test | Query | Normalized | Denormalized | Winner |
-| ---- | -------------------------- | ---------- | ------------ | ------------------- |
-| B1 | Python developers | 216ms | **169ms** | Denormalized (1.3x) |
-| B2 | US Data Scientists | 644ms | **557ms** | Denormalized (1.2x) |
-| B3 | Senior engineers + company | 4,535ms | **196ms** | Denormalized (23x) |
-
-**Conclusion**: Simple profile queries are similar; profile + company queries favor denormalized.
-
-### C. Complex Prospecting Queries
-
-| Test | Query | Normalized | Denormalized | Winner |
-| ---- | ----------------------------------------- | ----------- | ------------ | ----------------- |
-| C1 | Decision makers at funded startups | **1,198ms** | 3,124ms | Normalized (2.6x) |
-| C2 | AI company employees in SF | TIMEOUT | **11,061ms** | Denormalized (∞) |
-| C3 | Hybrid (normalized profile + lkd_company) | 9,631ms | - | - |
-
-**Conclusion**: When funding table is used (indexed JOIN), normalized wins. When text filters span tables, denormalized wins.
-
-### D. Company Lookups
-
-| Test | Query | Normalized | Denormalized | Winner |
-| ---- | -------------------------- | ---------- | ------------ | ------------------- |
-| D1 | Company by ID | **4ms** | 31ms | Normalized (7.8x) |
-| D2 | Industry + employee filter | 742ms | **202ms** | Denormalized (3.7x) |
-
-### E. Edge Cases
-
-| Test | Query | Normalized | Denormalized | Winner |
-| ---- | --------------------- | ---------- | ------------ | ------------------- |
-| E1 | Headline (blockchain) | 713ms | **384ms** | Denormalized (1.9x) |
-| E2 | Company description | 144ms | 152ms | Tie |
-
-### F. Verification Tests
-
-| Test | Query | Normalized | Denormalized | Winner |
-| ---- | ---------------------------- | ---------- | ------------ | ------------------- |
-| F1 | Multi-skill + company size | 28,173ms | **1,281ms** | Denormalized (22x) |
-| F2 | Country + org (GIN) | **990ms** | 4,594ms | Normalized (4.6x) |
-| F3 | Title regex + company filter | 434ms | **227ms** | Denormalized (1.9x) |
-
-### G. Index Pattern Tests
-
-| Test | Query | Normalized | Denormalized | Winner |
-| ---- | ------------------------- | ---------- | ------------ | ----------------- |
-| G1 | org ILIKE (GIN indexed) | **430ms** | 29,409ms | Normalized (68x) |
-| G2 | headline ILIKE (no index) | 2,530ms | **1,258ms** | Denormalized (2x) |
-| G3 | title ILIKE | **64ms** | 313ms | Normalized (4.9x) |
-| G4 | updated_at filter | **4ms** | 14ms | Normalized (3.5x) |
-
-### H. Cross-Table JOIN Patterns
-
-| Test | Query | Normalized | Denormalized | Winner |
-| ---- | ------------------------- | ---------- | ------------ | ------------------ |
-| H1 | Headline + employee_count | 20,205ms | **217ms** | Denormalized (93x) |
-| H2 | Skill + company industry | TIMEOUT | **3,553ms** | Denormalized (∞) |
-| H3 | Multi-filter + company | TIMEOUT | **4,947ms** | Denormalized (∞) |
-
----
-
-## Decision Rules for Cross-Table Queries
-
-### Use Normalized (`linkedin_profile` + `linkedin_company` JOINs) When:
-
-1. **Company-first lookup** - Start with company ID, get employees
-2. **GIN-indexed field** - Searching `linkedin_profile.org` (company name)
-3. **Indexed lookups** - `updated_at`, company ID, profile ID
-4. **Title field search** - `linkedin_profile.title` is faster
-5. **Indexed JOIN tables** - `linkedin_crunchbase_funding`, `linkedin_profile_position3` by company
-
-### Use Denormalized (`lkd_profile` JOIN `lkd_company`) When:
-
-1. **Headline + company filter** - 93x faster
-2. **Skill + company constraint** - Normalized times out
-3. **Multi-filter combinations** - 22x faster
-4. **Industry + employee_count** - 3.7x faster
-5. **Text filter spanning profile + company** - Often only option
-
-### Never Use:
-
-1. `lkd_profile.company_name` ILIKE - Use `linkedin_profile.org` (68x faster)
-2. Normalized multi-JOIN with headline filter - Will timeout or be 20s+
-
----
-
-## Recommended Query Patterns
-
-### Pattern 1: Find Employees at Company by Name
-
-```sql
--- ✅ BEST: Use GIN-indexed org field
-SELECT id, first_name, title, headline, org
-FROM linkedin_profile
-WHERE org ILIKE '%Google%'
-LIMIT 50
--- Result: 274ms
-```
-
-### Pattern 2: Find Engineers at Large Companies
-
-```sql
--- ✅ BEST: Denormalized JOIN (93x faster)
-SELECT lkd.profile_id, lkd.first_name, lkd.headline, lkdc.name, lkdc.employee_count
-FROM lkd_profile lkd
-JOIN lkd_company lkdc ON lkdc.linkedin_company_id = lkd.linkedin_company_id
-WHERE lkd.headline ILIKE '%engineer%'
-  AND lkdc.employee_count > 1000
-LIMIT 50
--- Result: 217ms
-```
-
-### Pattern 3: Find People with Skills at Specific Company Types
-
-```sql
--- ✅ BEST: Denormalized (normalized times out)
-SELECT lkd.profile_id, lkd.first_name, lkd.headline, lkdc.name
-FROM lkd_profile lkd
-JOIN lkd_company lkdc ON lkdc.linkedin_company_id = lkd.linkedin_company_id
-WHERE 'Python' = ANY(lkd.skills)
-  AND 'SQL' = ANY(lkd.skills)
-  AND lkdc.employee_count BETWEEN 100 AND 5000
-LIMIT 50
--- Result: 1,281ms (normalized: 28,173ms)
-```
-
-### Pattern 4: Prospecting Query (Profile Criteria + Company Criteria)
-
-```sql
--- ✅ BEST: Denormalized for multi-filter
-SELECT lkd.profile_id, lkd.first_name, lkd.title, lkdc.name, lkdc.employee_count
-FROM lkd_profile lkd
-JOIN lkd_company lkdc ON lkdc.linkedin_company_id = lkd.linkedin_company_id
-WHERE lkd.title ~* '(manager|director|lead)'
-  AND lkdc.employee_count BETWEEN 100 AND 1000
-LIMIT 50
--- Result: 227ms (normalized: 434ms)
-```
-
-### Pattern 5: Decision Makers at Funded Startups
-
-```sql
--- ✅ BEST: Normalized when using indexed funding table
-SELECT DISTINCT lp.id, lp.first_name, lp.title, lc.company_name
-FROM linkedin_profile lp
-JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
-JOIN linkedin_company lc ON lc.id = pos.linkedin_company_id
-JOIN linkedin_crunchbase_funding cf ON cf.linkedin_company_id = lc.id
-WHERE pos.end_date IS NULL
-  AND lp.title ~* '(CEO|CTO|VP|Director|Head)'
-  AND lc.employee_count BETWEEN 10 AND 500
-LIMIT 50
--- Result: 1,198ms
-```
-
----
-
-## Summary: The Cross-Table Golden Rules
-
-1. **Company name search** → Always use `linkedin_profile.org` (GIN indexed, 68x faster)
-2. **Headline/skill + company constraint** → Always use denormalized JOIN (20-93x faster, normalized often times out)
-3. **Company-first lookups** → Use normalized (5-8x faster)
-4. **Indexed table JOINs (funding, positions)** → Normalized is fine
-5. **Multi-filter profile + company** → Denormalized is the only option that works
-
-### Quick Decision:
-
-```
-Need to search by company name?
-└─ YES → Use linkedin_profile.org
-
-Need profile text filter (headline/skills) + company constraint?
-└─ YES → Use lkd_profile JOIN lkd_company
-
-Need company ID lookup or indexed JOIN?
-└─ YES → Use normalized tables
-
-Default for prospecting queries:
-└─ Use lkd_profile JOIN lkd_company
-```
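The Quick Decision tree in the removed document can be read as an ordered series of checks. The sketch below encodes it as a function, assuming the questions are evaluated top to bottom (the original does not state a precedence) and that the final entry is the fallback. These rules come from a file that 1.7.0 deletes, so they may no longer reflect the package's guidance. `QueryTraits`, `TableChoice`, and `chooseTables` are hypothetical names.

```typescript
// Hypothetical description of a query, mirroring the three questions in the tree.
interface QueryTraits {
  searchesByCompanyName: boolean;
  profileTextFilterWithCompanyConstraint: boolean;
  companyIdLookupOrIndexedJoin: boolean;
}

type TableChoice =
  | "linkedin_profile.org"
  | "lkd_profile JOIN lkd_company"
  | "normalized tables";

/** Applies the Quick Decision questions in the order they are listed. */
function chooseTables(traits: QueryTraits): TableChoice {
  if (traits.searchesByCompanyName) return "linkedin_profile.org";
  if (traits.profileTextFilterWithCompanyConstraint) return "lkd_profile JOIN lkd_company";
  if (traits.companyIdLookupOrIndexedJoin) return "normalized tables";
  // Stated default for prospecting queries.
  return "lkd_profile JOIN lkd_company";
}
```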