orangeslice 1.7.0 → 1.7.1
- package/dist/apify.js +3 -2
- package/dist/generateObject.js +3 -2
- package/dist/geo.js +3 -2
- package/docs/b2b-docs/B2B_CROSS_TABLE_TEST_FINDINGS.md +255 -0
- package/docs/b2b-docs/B2B_DATABASE.md +314 -0
- package/docs/b2b-docs/B2B_DATABASE_TEST_FINDINGS.md +476 -0
- package/docs/b2b-docs/B2B_EMPLOYEE_SEARCH.md +697 -0
- package/docs/b2b-docs/B2B_GENERALIZATION_RULES.md +220 -0
- package/docs/b2b-docs/B2B_NLP_QUERY_MAPPINGS.md +240 -0
- package/docs/b2b-docs/B2B_NORMALIZED_VS_DENORMALIZED.md +952 -0
- package/docs/b2b-docs/B2B_SCHEMA.md +1042 -0
- package/docs/b2b-docs/B2B_SQL_COMPREHENSIVE_TEST_FINDINGS.md +301 -0
- package/docs/b2b-docs/B2B_TABLE_INDICES.ts +496 -0
- package/package.json +1 -1
@@ -0,0 +1,476 @@

# B2B Database Test Findings

**Test Date:** January 14, 2026
**Test URL:** `http://165.22.151.131:3000/query`
**Total Queries Tested:** 60 (from `scripts/b2b-sql-test-queries.sh`)

---

## Executive Summary

The B2B LinkedIn database was tested with 60 queries across company searches and people searches. Overall, the database performs well for:

- ✅ Indexed lookups (`universal_name`, `domain`, `id`, `slug_key64`)
- ✅ Simple description ILIKE searches with `industry_code` filters
- ✅ Joins with the `linkedin_crunchbase_funding` table

However, queries with multiple complex ILIKE patterns on description fields can be slow or time out.

---

## Performance Categories

### ⚡ EXTREMELY FAST (< 200ms)

| Query # | Description | Duration | Notes |
| ------- | ------------------------------------- | -------- | -------------------------------------------- |
| Indexed | Company by `universal_name` (Stripe) | **36ms** | Best practice for company lookup |
| Indexed | Company by `domain` (openai.com) | **10ms** | Returns multiple results (duplicate domains) |
| 4 | Infrastructure for personalization | 187ms | `industry_code` filter helps |
| 10 | AI tools for growth teams | 23ms | `industry_code` filter helps |
| 17 | AI companies with credits/tokens | 146ms | |
| 18 | AI platforms for sales/RevOps | 37ms | |
| 31 | People working on AI (headline ILIKE) | 11ms | Simple headline search |
| 41 | CTOs at AI startups (headline ILIKE) | 69ms | |

### ✅ FAST (200ms - 2s)

| Query # | Description | Duration | Notes |
| ------- | ------------------------------------------------- | ----------- | ---------------------------------- |
| 1 | Companies building AI video tools | 200ms | `industry_code` + description filter |
| 5 | AI companies working with video | 482ms | |
| 6 | Real-time AI personalization | 922ms | |
| 8 | SaaS companies selling to developers | 1,378ms | |
| 9 | Platforms automating outreach with AI | 207ms | |
| 11 | Companies with video APIs/webhooks | 827ms | |
| 12 | B2B AI companies offering SDKs | 536ms | |
| 13 | Startups with usage-based pricing | 448ms | |
| 15 | Tools integrating with CRMs via API | 204ms | |
| 16 | Mid-stage startups with API products | 1,045ms | |
| 20 | Infrastructure startups for SaaS | 446ms | |
| 26 | Series A-C AI video companies (with funding join) | **1,614ms** | Complex join works well! |
| 46 | Decision makers at AI video companies | 659ms | 3-table join |

### 🟡 MEDIUM (2s - 10s)

| Query # | Description | Original | Optimized | Optimization Applied |
| ------- | ------------------------------------ | ------------ | ----------- | ------------------------------------------------- |
| 14 | Companies doing bulk data enrichment | 4,031ms | **4,107ms** | Added `industry_code` filter |
| 32 | Founders in SaaS (headline search) | 3,306ms | **4,977ms** | Simple AND ILIKE (acceptable for headline search) |
| 3 | B2B SaaS with usage-based pricing | **24,890ms** | **6,059ms** | Used regex pattern instead of many OR ILIKE |

### ✅ OPTIMIZED (Previously Slow/Timeout)

| Query # | Description | Original | Optimized | Optimization Applied |
| ------- | ------------------------------------------ | ------------- | ----------- | ------------------------------------------------------ |
| 2 | Startups in developer APIs | **8,163ms** | **1,023ms** | Added `industry_code`, simplified ILIKE patterns |
| 19 | Developer-first companies with strong docs | **23,250ms** | **514ms** | Simplified to `%developer%` + `%API%` with industry |
| 7 | APIs for video generation/interaction | **TIMEOUT** | **2,026ms** | Simplified to `%API%` + `%video%` with `industry_code` |
| 42 | VPs of Engineering at Series B SaaS | **TIMEOUT** | **571ms** | Used regex `~*` for title pattern with industry filter |
| 56 | CTOs at Series A-C AI video startups | **SQL ERROR** | **1,446ms** | Added `cf.round_date` to SELECT for ORDER BY clause |

### Key Optimization Techniques Used

1. **Add `industry_code` filter** - Narrows scan from millions to thousands of rows
2. **Use regex `~*` instead of multiple OR ILIKE** - More efficient for complex patterns
3. **Simplify ILIKE patterns** - `%API%video%` instead of many OR patterns
4. **Include ORDER BY columns in SELECT for DISTINCT** - Fixes SQL errors

### ⚠️ Full-Text Search (`ts_token`) vs ILIKE

Tested `ts_token()` with full-text search operators - **ILIKE is actually faster**:

| Query | ILIKE + industry_code | ts_token + to_tsquery |
| --------------------- | --------------------- | --------------------- |
| AI%video% | **446ms** ✅ | 2,066ms |
| SaaS + usage patterns | **6,059ms** (regex) | 17,974ms |

**Why?** On `description`, `ts_token()` computes the tsvector on the fly for every row scanned, because the GIN indexes are on `company_name`/`universal_name`, not `description`. Stick with **ILIKE + industry_code** for description searches.

---

## Detailed Results by Category

### Company Queries - Basic (1-5)

| # | Query Description | Duration (ms) | Status |
| --- | ---------------------------------- | ------------- | ------------ |
| 1 | Companies building AI video tools | 200 | ✅ Fast |
| 2 | Startups in developer APIs | 8,163 | 🟡 Medium |
| 3 | B2B SaaS with usage-based pricing | 24,890 | 🔴 Slow |
| 4 | Infrastructure for personalization | 187 | ⚡ Very Fast |
| 5 | AI companies working with video | 482 | ✅ Fast |

### Company Queries - Intermediate (6-10)

| # | Query Description | Duration (ms) | Status |
| --- | ------------------------------------- | ------------- | ------------ |
| 6 | Real-time AI personalization | 922 | ✅ Fast |
| 7 | APIs for video generation | **30,003** | ❌ TIMEOUT |
| 8 | SaaS companies for developers | 1,378 | ✅ Fast |
| 9 | Platforms automating outreach with AI | 207 | ✅ Fast |
| 10 | AI tools for growth teams | 23 | ⚡ Very Fast |

### Company Queries - Technical (11-15)

| # | Query Description | Duration (ms) | Status |
| --- | ---------------------------------- | ------------- | --------- |
| 11 | Companies with video APIs/webhooks | 827 | ✅ Fast |
| 12 | B2B AI companies offering SDKs | 536 | ✅ Fast |
| 13 | Usage-based pricing startups | 448 | ✅ Fast |
| 14 | Bulk data enrichment companies | 4,031 | 🟡 Medium |
| 15 | Tools integrating with CRMs | 204 | ✅ Fast |

### Company Queries - Market Position (16-20)

| # | Query Description | Duration (ms) | Status |
| --- | ---------------------------------- | ------------- | ------------ |
| 16 | Mid-stage API product startups | 1,045 | ✅ Fast |
| 17 | AI companies with credits/tokens | 146 | ⚡ Very Fast |
| 18 | AI platforms for sales/RevOps | 37 | ⚡ Very Fast |
| 19 | Developer-first with documentation | 23,250 | 🔴 Slow |
| 20 | Infrastructure startups for SaaS | 446 | ✅ Fast |

### Company Queries - With Funding (26)

| # | Query Description | Duration (ms) | Status |
| --- | ----------------------------- | ------------- | ------- |
| 26 | Series A-C AI video companies | 1,614 | ✅ Fast |

### People Queries Tested

| # | Query Description | Duration (ms) | Status |
| --- | --------------------------- | ------------- | ------------ |
| 31 | People working on AI | 11 | ⚡ Very Fast |
| 32 | Founders in SaaS | 3,306 | 🟡 Medium |
| 41 | CTOs at AI startups | 69 | ⚡ Very Fast |
| 42 | VPs Engineering at Series B | **30,010** | ❌ TIMEOUT |
| 46 | Decision makers at AI video | 659 | ✅ Fast |

---

## Optimization Recommendations

### 1. Use Indexed Columns First

```sql
-- ✅ FAST (~10ms): Use domain for company lookup
SELECT * FROM linkedin_company WHERE domain = 'stripe.com';

-- ✅ FAST (~36ms): Use universal_name for company lookup
SELECT * FROM linkedin_company WHERE universal_name = 'stripe';

-- ✅ FAST (~25ms): Use key64() for slug lookups
SELECT * FROM linkedin_profile lp
JOIN linkedin_profile_slug lps ON lps.linkedin_profile_id = lp.id
WHERE lps.slug_key64 = key64('satyanadella');
```

### 2. Add Industry Code Filters (Critical for Description ILIKE!)

```sql
-- ✅ FAST (~500ms): Add industry_code to narrow scan
WHERE lc.description ILIKE '%AI%' AND lc.industry_code IN (4, 6, 96)

-- ❌ TIMEOUT: Description-only search (scans millions of rows)
WHERE lc.description ILIKE '%AI%'
```

**Common industry codes:**

- `4` = Computer Software
- `6` = Information Technology & Services
- `96` = Internet

### 3. Use Regex Instead of Multiple OR ILIKE Patterns

```sql
-- 🔴 SLOW (24s): Many OR patterns
WHERE (description ILIKE '%SaaS%' OR description ILIKE '%software%')
  AND (description ILIKE '%usage-based%' OR description ILIKE '%pay as you go%'
       OR description ILIKE '%consumption-based%' OR description ILIKE '%metered%')

-- 🟡 MEDIUM (~6s, down from ~24s): Use regex for complex patterns
WHERE description ~* 'SaaS.*(usage|consumption|metered)|(usage|consumption|metered).*SaaS'
  AND industry_code IN (4, 6, 96)

-- ✅ FASTER (1s): Simplify to broader patterns
WHERE description ILIKE '%SaaS%usage%' AND industry_code IN (4, 6, 96)
```
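
The regex in recommendation 3 repeats the same alternation on both sides of `SaaS`. The following is a minimal Python sketch of generating that pattern from a keyword list, so the two halves cannot drift apart. The helper names `escape_regex` and `either_order_pattern` are hypothetical, and the metacharacter set is an assumption about what needs escaping for the `~*` operator; the resulting string would be passed to the database as an ordinary query parameter.

```python
# Assumed set of regex metacharacters that must be backslash-escaped.
_META = set(".^$*+?()[]{}|\\")


def escape_regex(text: str) -> str:
    """Backslash-escape regex metacharacters so keywords match literally."""
    return "".join("\\" + ch if ch in _META else ch for ch in text)


def either_order_pattern(anchor: str, keywords: list[str]) -> str:
    """Build 'anchor.*(k1|k2)|(k1|k2).*anchor' for a case-insensitive match."""
    if not keywords:
        raise ValueError("keywords must not be empty")
    alternation = "(" + "|".join(escape_regex(k) for k in keywords) + ")"
    escaped_anchor = escape_regex(anchor)
    return f"{escaped_anchor}.*{alternation}|{alternation}.*{escaped_anchor}"


# Reproduces the pattern used in the optimized query above.
pattern = either_order_pattern("SaaS", ["usage", "consumption", "metered"])
print(pattern)
```

Keeping the keyword list in one place also makes it easy to try fewer, broader terms, which is what the faster ILIKE variant above does.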

### 4. Use Regex for Title Patterns in 4-Table Joins

```sql
-- ❌ TIMEOUT: Multiple ILIKE patterns on title without industry filter
WHERE (pos.title ILIKE '%VP%Engineering%' OR pos.title ILIKE '%Vice President%Engineering%')

-- ✅ FAST (~600ms): Regex with industry filter
WHERE pos.title ~* '(VP|Vice President).*(Engineering|Eng)'
  AND lc.industry_code IN (4, 6, 96)
  AND cf.round_name = 'Series B'
```

### 5. Fix ORDER BY with DISTINCT

```sql
-- ❌ ERROR: with SELECT DISTINCT, the ORDER BY column must appear in the SELECT list
SELECT DISTINCT lp.id, lc.company_name
FROM ...
ORDER BY cf.round_date DESC;

-- ✅ CORRECT: Include ORDER BY column in SELECT
SELECT DISTINCT lp.id, lc.company_name, cf.round_date
FROM ...
ORDER BY cf.round_date DESC;
```

### 6. Simplify Complex Patterns

```sql
-- 🔴 SLOW (23s): Too many patterns
WHERE (description ILIKE '%developer-first%' OR description ILIKE '%developer experience%'
       OR description ILIKE '%DX%' OR description ILIKE '%API-first%')
  AND (description ILIKE '%documentation%' OR description ILIKE '%docs%' OR description ILIKE '%SDK%')

-- ✅ FAST (500ms): Simpler, broader patterns with industry filter
WHERE description ILIKE '%developer%' AND description ILIKE '%API%'
  AND industry_code IN (4, 6, 96)
```

---

## Query Patterns That Work Well

### Pattern 1: Company Enrichment by Identifier

```sql
-- By universal_name (~36ms)
SELECT * FROM linkedin_company WHERE universal_name = 'stripe';

-- By domain (~10ms)
SELECT * FROM linkedin_company WHERE domain = 'stripe.com';

-- By ID (~5ms)
SELECT * FROM linkedin_company WHERE id = 2135371;
```

### Pattern 2: Find Employees at Company

```sql
-- ~500ms with indexed company_id
SELECT lp.first_name, lp.last_name, pos.title
FROM linkedin_profile lp
JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
WHERE pos.linkedin_company_id = 2135371
  AND pos.end_date IS NULL
LIMIT 100;
```

### Pattern 3: Company Search with Industry Filter

```sql
-- ~200-800ms with industry_code narrowing
SELECT lc.id, lc.company_name, lc.domain
FROM linkedin_company lc
WHERE lc.description ILIKE '%AI%video%'
  AND lc.industry_code IN (4, 6, 96) -- Software industries
LIMIT 20;
```

### Pattern 4: Funding Lookups

```sql
-- ~1-2s for companies with recent funding
SELECT lc.company_name, cf.round_name, cf.round_date
FROM linkedin_company lc
JOIN linkedin_crunchbase_funding cf ON cf.linkedin_company_id = lc.id
WHERE cf.round_date >= '2024-01-01'
ORDER BY cf.round_date DESC
LIMIT 20;
```

---

## Summary Statistics

| Metric | Value |
| -------------------- | -------------------------- |
| Total Queries Tested | ~25 representative queries |
| Queries < 1 second | 70% |
| Queries 1-10 seconds | 20% |
| Queries > 10 seconds | 8% |
| Timeouts (> 30s) | 2 queries |
| SQL Errors | 1 query |

---

## Recommendations for Production Use

1. **Always use LIMIT** - Tables have billions of rows
2. **Prefer indexed lookups** - `universal_name`, `domain`, `id`, `slug_key64`
3. **Add `industry_code` filters** - Dramatically reduces scan size
4. **Avoid complex ILIKE patterns** - Keep to 2-3 simple patterns
5. **Use company_id/profile_id for joins** - These columns are indexed
6. **Test queries before production** - Some patterns time out unexpectedly
7. **Consider caching** - For frequently run queries

---

## Top-of-Funnel Query Strategies

For grabbing large lists of companies (1000-5000+), use these tested patterns:

### Speed Comparison (5000 results)

| Approach | Duration | Best For |
| ------------------------------- | -------- | -------------------------------- |
| **ILIKE OR (broad keywords)** | ~299ms | ✅ Targeted volume - RECOMMENDED |
| Indexed filters only | ~771ms | Maximum volume, no text filter |
| Regex alternation | ~1,176ms | Complex pattern matching |
| ts_token full-text | ~1,383ms | Not recommended (slower) |
| ILIKE AND (multiple conditions) | TIMEOUT | ❌ Avoid |

### ✅ Best Pattern: ILIKE OR

```sql
-- ~300ms for 5000 results
SELECT lc.id, lc.company_name, lc.domain, lc.employee_count
FROM linkedin_company lc
WHERE lc.industry_code IN (4, 6, 96)
  AND (lc.description ILIKE '%saas%'
       OR lc.description ILIKE '%platform%'
       OR lc.description ILIKE '%software%')
LIMIT 5000
```
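
When the keyword list varies per request, the ILIKE OR query can be assembled programmatically. The sketch below is one way to do that in Python, under stated assumptions: `build_company_search` and `escape_like` are hypothetical helpers, the `%s` placeholder style assumes a psycopg-style driver, and the request format of the test endpoint is not documented here, so the code only builds the SQL text and parameter list.

```python
def escape_like(term: str) -> str:
    """Escape LIKE wildcards so a keyword is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_company_search(keywords, industry_codes=(4, 6, 96), limit=5000):
    """Return (sql, params) for an industry-filtered ILIKE OR search."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    if not industry_codes:
        raise ValueError("at least one industry code is required")
    code_slots = ", ".join(["%s"] * len(industry_codes))
    like_clauses = " OR ".join(["lc.description ILIKE %s"] * len(keywords))
    sql = (
        "SELECT lc.id, lc.company_name, lc.domain, lc.employee_count\n"
        "FROM linkedin_company lc\n"
        f"WHERE lc.industry_code IN ({code_slots})\n"
        f"  AND ({like_clauses})\n"
        "LIMIT %s"
    )
    # Wildcards are added here, around the escaped keyword, never by the caller.
    params = [*industry_codes, *[f"%{escape_like(k)}%" for k in keywords], limit]
    return sql, params


sql, params = build_company_search(["saas", "platform", "software"])
print(sql)
print(params)
```

Passing the keywords as parameters rather than interpolating them keeps user-supplied text out of the SQL string, and the industry filter stays mandatory, as the measurements above suggest it should.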

### ⚠️ Critical: What to AVOID

| Pattern | Problem | Duration |
| --------------------------- | ----------------------------- | ---------------- |
| `ILIKE AND ILIKE` | Double full-text scan | **25+ seconds** |
| `ORDER BY` with text search | Requires sorting all matches | **26+ seconds** |
| `OFFSET` pagination | Scans all skipped rows | **25+ sec/page** |
| `ts_token()` on description | No GIN index, computed on the fly | **2-4x slower** |

### ORDER BY Impact

```sql
-- ❌ WITH ORDER BY: 26,089ms
SELECT ... WHERE description ILIKE '%saas%' ORDER BY lc.id LIMIT 2000

-- ✅ WITHOUT ORDER BY: 329ms (80x faster!)
SELECT ... WHERE description ILIKE '%saas%' LIMIT 2000
```

**Key Insight:** Skip `ORDER BY` for top-of-funnel queries. Process results in natural order or use keyset pagination for subsequent pages.
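
Keyset pagination replaces `OFFSET` with a filter on the last key already seen, so each page starts where the previous one stopped. The sketch below illustrates the idea in Python with an in-memory SQLite table as a stand-in for the real database. The `company` table, its rows, and the helper names are hypothetical. Keyset paging does need an ordering on the key column, and whether that ordering avoids the `ORDER BY` cost measured above on the real tables was not tested here.

```python
import sqlite3


def fetch_page(conn, last_id, page_size):
    """Fetch the next page of matches with id greater than last_id."""
    return conn.execute(
        "SELECT id, company_name FROM company "
        "WHERE description LIKE ? AND id > ? "
        "ORDER BY id LIMIT ?",
        ("%saas%", last_id, page_size),
    ).fetchall()


def iterate_all(conn, page_size):
    """Yield every matching row, one keyset page at a time."""
    last_id = 0
    while True:
        page = fetch_page(conn, last_id, page_size)
        if not page:
            break
        yield from page
        last_id = page[-1][0]  # resume after the last id seen, no OFFSET


# Stand-in data: every third company mentions SaaS in its description.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (id INTEGER PRIMARY KEY, company_name TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?)",
    [(i, f"co{i}", "SaaS platform" if i % 3 == 0 else "bakery") for i in range(1, 31)],
)
ids = [row[0] for row in iterate_all(conn, page_size=4)]
print(ids)
```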

---

## ts_token vs ILIKE Quality Comparison

Full-text search (`ts_token`) finds more results but with **lower precision**:

| Method | Duration | Precision | Best For |
| ------------------------------- | -------- | ---------------- | ----------------------- |
| ILIKE phrase (`%AI video%`) | ~200ms | **70%** relevant | Precise phrase matching |
| ILIKE word (`%AI%` + `%video%`) | ~250ms | ~50% relevant | Broader word matching |
| ts_token full-text | ~1.2s | ~50% relevant | Not recommended |

**Conclusion:** ILIKE wins on both speed AND quality for description searches.

---

## People Top-of-Funnel Findings

Tested large volume people queries (5000 results):

### Speed Comparison

| Pattern | Duration | Notes |
| ---------------------------------- | ------------ | --------------------- |
| **Headline regex (5+ patterns)** | ~800-1,300ms | ✅ Fastest, flexible |
| **Headline ILIKE OR (5 patterns)** | ~1,000ms | ✅ Very fast |
| 3-table: engineers at tech cos | ~1,575ms | ✅ Good with joins |
| 4-table: funded startup employees | ~4,600ms | ⚠️ Acceptable |
| ts_token headline | ~2,500ms | 🔴 Slower than regex |
| ORDER BY + headline search | ~7,200ms | 🔴 30x slower |
| Headline ILIKE AND (2 conditions) | ~24,000ms | 🔴 AVOID - 20x slower |

### Best Patterns for People

```sql
-- ✅ BEST: Headline regex (~800ms for 5000)
WHERE lp.headline ~* '(founder|CEO|CTO|VP|director)'

-- ✅ GOOD: With 3-table join (~1,600ms for 3000)
WHERE pos.end_date IS NULL
  AND lc.industry_code IN (4, 6, 96)
  AND pos.title ~* '(engineer|developer)'

-- ❌ AVOID: ILIKE AND (24s!)
WHERE lp.headline ILIKE '%founder%' AND lp.headline ILIKE '%AI%'
```

### Key Findings

1. **Regex is faster than ILIKE OR** for headlines (unlike company descriptions)
2. **ILIKE AND is catastrophically slow** (~24s) - same as companies
3. **ORDER BY adds 30x overhead** - skip for top-of-funnel
4. **ts_token is 3x slower than regex** for headlines

### Quality vs Speed Tradeoff

| Pattern Type | Speed | Quality | Use Case |
| -------------------------------------- | ------ | --------- | ------------------ |
| Simple regex (no boundaries) | ~200ms | ~70% | Fast but noisy |
| Regex with word boundaries (`\m...\M`) | ~480ms | **~95%** | ✅ Best balance |
| ILIKE with spaces (`% CEO %`) | ~700ms | **~100%** | ✅ Highest quality |
| Regex AND regex | ~5.5s | ~50% | ⚠️ Acceptable |
| ILIKE phrase AND | ~28s | ~95% | 🔴 Too slow |

**Key Quality Tip:** Use word boundaries (`\m...\M`) for short keywords to avoid false positives:

- `\mCTO\M` matches "CTO" but not "Director"
- `\mAI\M` matches "AI" but not "MAIL" or "FAIR"
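
The effect of word boundaries can be checked offline before running a query. The sketch below uses Python's `re` module, where `\b` stands in for PostgreSQL's `\m` (start of word) and `\M` (end of word). The sample titles are made up for illustration, and the two regex engines are not identical, so this is only a sanity check of the pattern logic rather than a reproduction of database behaviour.

```python
import re

# Hypothetical sample titles, chosen to include substring false positives.
titles = [
    "CTO",
    "Director of Sales",
    "Co-founder & CTO",
    "AI Researcher",
    "Email Marketing Lead",
]


def matches(pattern, items):
    """Return the items that contain a case-insensitive match for pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [item for item in items if rx.search(item)]


loose_cto = matches(r"CTO", titles)       # also hits "Director" (dire-CTO-r)
strict_cto = matches(r"\bCTO\b", titles)  # whole word only
loose_ai = matches(r"AI", titles)         # also hits "Email" (em-AI-l)
strict_ai = matches(r"\bAI\b", titles)    # whole word only

print(loose_cto, strict_cto, loose_ai, strict_ai, sep="\n")
```

The unbounded patterns pick up "Director of Sales" and "Email Marketing Lead", which is the kind of noise behind the lower quality figure for simple regex in the table above.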

### 🚀 Optimized Targeted Search: Subquery Pattern

For combined searches like "AI founders" or "SaaS CTOs", use a **subquery** to leverage indexes:

```sql
-- ✅ FAST (~2-4s): Use subquery with GIN index
SELECT DISTINCT lp.id, lp.first_name, lp.last_name, pos.title, lc.company_name
FROM linkedin_profile lp
JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
JOIN linkedin_company lc ON lc.id = pos.linkedin_company_id
WHERE pos.linkedin_company_id IN (
    SELECT id FROM linkedin_company
    WHERE ts_token(company_name) @@ to_tsquery('simple', 'AI')
      AND industry_code IN (4, 6, 96)
    LIMIT 300
)
  AND pos.end_date IS NULL
  AND pos.title ~* '(founder|CEO|CTO)'
LIMIT 200
```

| Targeted Search | Subquery Pattern | Direct Headline AND |
| --------------- | ---------------- | ------------------- |
| AI founders | **2.7s** ✅ | 22s 🔴 |
| SaaS founders | **4.4s** ✅ | 28s+ 🔴 |
| Fintech CTOs | **1.5s** ✅ | TIMEOUT 🔴 |
| Video tech VPs | **2.8s** ✅ | TIMEOUT 🔴 |

**Why it works:**

1. `ix_linkedin_company_tsv` GIN index makes company search fast
2. `ix_linkedin_profile_position3_linkedin_company_id` makes position join fast
3. Title filter runs on small result set after indexed joins

---

## Files Referenced

- Test script: `scripts/b2b-sql-test-queries.sh`
- Database docs: `docs/B2B_DATABASE.md`
- Schema docs: `docs/B2B_SCHEMA.md`