orangeslice 1.7.2 → 1.7.3
- package/docs/b2b-docs/B2B_CROSS_TABLE_TEST_FINDINGS.md +255 -0
- package/docs/b2b-docs/B2B_DATABASE.md +314 -0
- package/docs/b2b-docs/B2B_DATABASE_TEST_FINDINGS.md +476 -0
- package/docs/b2b-docs/B2B_EMPLOYEE_SEARCH.md +697 -0
- package/docs/b2b-docs/B2B_GENERALIZATION_RULES.md +220 -0
- package/docs/b2b-docs/B2B_NLP_QUERY_MAPPINGS.md +240 -0
- package/docs/b2b-docs/B2B_NORMALIZED_VS_DENORMALIZED.md +952 -0
- package/docs/b2b-docs/B2B_SCHEMA.md +1042 -0
- package/docs/b2b-docs/B2B_SQL_COMPREHENSIVE_TEST_FINDINGS.md +301 -0
- package/docs/b2b-docs/B2B_TABLE_INDICES.ts +496 -0
- package/package.json +1 -1
@@ -0,0 +1,952 @@

# B2B Database: Normalized vs Denormalized Table Selection Guide

Comprehensive test results and decision matrix for choosing between normalized tables (`linkedin_profile`, `linkedin_company`) and denormalized views (`lkd_profile`, `lkd_company`).

---

## Executive Summary

### Profile Tables (`linkedin_profile` vs `lkd_profile`)

| Scenario | Winner | Speed Advantage |
| --- | --- | --- |
| **Simple PK/slug lookups** | Normalized | 3-6x faster |
| **Simple text/ILIKE (common terms)** | Normalized | 2-20x faster |
| **Batch PK lookups (≤20 IDs)** | Normalized | 3x faster |
| **Large result sets (5000+ rows)** | Normalized | 10-17x faster |
| **Indexed filters (updated_at, jobs_count, industry_id)** | Normalized | 3-8x faster |
| **Org/company name search (GIN indexed)** | Normalized | 20x faster |
| **Country filter + headline search** | Denormalized | Works vs TIMEOUT |
| **Multi-filter combinations (2+ filters)** | Denormalized | 2-3x faster |
| **Rare/uncommon headline terms** | Denormalized | 2-2.5x faster |
| **Multiple skills array searches** | Denormalized | 2-3x faster |
| **Numeric + text combos (follower + headline)** | Denormalized | 1.7-2.5x faster |
| **Complex regex patterns** | Denormalized | 1.4-2x faster |
| **Summary text searches** | Denormalized | 1.6x faster |
| **Name searches (first + last)** | Denormalized | 1.5x faster |
| **Getting enriched nested data (seniority, job_function, etc.)** | Denormalized | Only option |

### Company Tables (`linkedin_company` vs `lkd_company`) - Updated Jan 2026

| Scenario | Winner | Speed Advantage |
| --- | --- | --- |
| **Slug lookups via `key64()`** | Normalized | 5-12x faster (4ms) |
| **Domain/ticker/universal_name lookups** | Normalized | Only option (indexed) |
| **Single country filter** | Normalized | 20x faster (9ms vs 228ms) |
| **Industry via industry_code** | Normalized | 136x faster (2ms) |
| **Nested data via JOINs (1 ID)** | Normalized | 7-19x faster |
| **Aggregations (GROUP BY)** | Normalized | Works vs TIMEOUT |
| **Simple slug lookup (no key64)** | Denormalized | Only option (21-47ms) |
| **Compound queries (2+ filters)** | Denormalized | 1.5-2.4x faster |
| **Funding + employee compound** ⚠️ | Denormalized | **17x faster** (70ms) |
| **Country + description combinations** | Denormalized | 2.4x faster |
| **Description ILIKE (rare terms)** | Denormalized | 1.5x faster |
| **Specialties filtering** | Denormalized | 2.2x faster |
| **Pre-formatted nested JSON** | Denormalized | Convenience |

**Key Insights** (from 40+ comprehensive tests):

1. **Text Selectivity Crossover**: Common terms (CEO, engineer) favor normalized (2.4-5.7x faster); rare terms (kubernetes, blockchain) favor denormalized (2-2.8x faster)
2. **Filter Count Scaling**: 1 filter → normalized usually wins; 3+ filters → denormalized wins (2-4x faster)
3. **Filter Location**: Text filter on company side → normalized (company table smaller); text filter on profile side → denormalized (profile table huge)
4. **Cross-Table Pattern**: Profile text + company constraint → denormalized (13-93x faster, normalized often times out)
5. **Companies**: Use `key64()` for slug lookups; `lkd_company` lacks domain/ticker columns

See [B2B_GENERALIZATION_RULES.md](./B2B_GENERALIZATION_RULES.md) for the complete decision matrix.
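
The profile-side insights above can be condensed into a small decision helper. The TypeScript sketch below is illustrative only: `ProfileQuery` and `chooseProfileTable` are hypothetical names that do not come from the package, and the rule order is an assumption read off the insights rather than something measured separately.

```typescript
// Illustrative sketch; these names are hypothetical and not part of the package.
type ProfileQuery = {
  indexedLookup: boolean;        // slug, ID, or updated_at lookup
  filterCount: number;           // number of WHERE predicates
  headlineTermIsRare: boolean;   // e.g. "kubernetes" rather than "engineer"
  hasCompanyConstraint: boolean; // profile text filter combined with a company filter
};

type ProfileTable = "linkedin_profile" | "lkd_profile";

function chooseProfileTable(q: ProfileQuery): ProfileTable {
  // Indexed lookups were fastest on the normalized table.
  if (q.indexedLookup) return "linkedin_profile";
  // Insight 4: profile text + company constraint favored the denormalized view.
  if (q.hasCompanyConstraint) return "lkd_profile";
  // Insight 2: three or more filters favored the denormalized view.
  if (q.filterCount >= 3) return "lkd_profile";
  // Insight 1: rare terms favored denormalized, common terms normalized.
  return q.headlineTermIsRare ? "lkd_profile" : "linkedin_profile";
}
```

A helper like this is only a starting point; the timings in this guide suggest re-checking borderline cases (for example, exactly two filters) against real queries.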

---

## Critical Finding: Country Filtering

The most significant discovery is that **country filtering behaves completely differently** between the two tables:

| Query Pattern | `linkedin_profile` | `lkd_profile` | Result |
| --- | --- | --- | --- |
| `location_country_code = 'US'` + headline ILIKE | **TIMEOUT (30s+)** | - | Fails |
| `country_iso = 'US'` + headline ILIKE | - | **16-22s** | Works |
| `location_country_code = 'US'` + common term (CTO) | 47ms | 104ms | Both work |
| `country_iso = 'US'` + rare term (iOS developer) | TIMEOUT | **16s** | Only denormalized works |

**Rule**: For country-filtered searches with uncommon headline terms, you **must** use `lkd_profile` with `country_iso`.

```sql
-- ✅ WORKS: US-based iOS developers
SELECT profile_id, first_name, headline, locality
FROM lkd_profile
WHERE country_iso = 'US'
  AND headline ~* '(\miOS\M|\mSwift\M).*(developer|engineer)'
LIMIT 100;

-- ❌ TIMEOUT: Comparable search on normalized table
SELECT id, first_name, headline, location_name
FROM linkedin_profile
WHERE location_country_code = 'US'
  AND headline ILIKE '%iOS developer%'
LIMIT 100;
```
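
When the country and headline terms come from user input, the query against `lkd_profile` is better built with bound parameters than by string concatenation. The following TypeScript sketch shows one way to do that. The helper names are hypothetical, and the `$1`/`$2` placeholder style is an assumption (it is the style used by node-postgres); adapt it to whatever client is in use.

```typescript
// Hypothetical helper; the $n placeholder style is an assumption about the client.
type SqlQuery = { text: string; values: (string | number)[] };

function escapeRegex(term: string): string {
  // Escape characters that are special in regular expressions.
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function countryHeadlineQuery(countryIso: string, terms: string[], limit = 100): SqlQuery {
  if (terms.length === 0) throw new Error("at least one headline term is required");
  // \m and \M are PostgreSQL word-boundary anchors, as in the SQL example above.
  const pattern = "(" + terms.map((t) => "\\m" + escapeRegex(t) + "\\M").join("|") + ")";
  return {
    text:
      "SELECT profile_id, first_name, headline, locality " +
      "FROM lkd_profile " +
      "WHERE country_iso = $1 AND headline ~* $2 " +
      "LIMIT $3",
    values: [countryIso, pattern, limit],
  };
}
```

Calling `countryHeadlineQuery("US", ["iOS", "Swift"])` yields the word-boundary alternation from the example, with the country code and limit passed as separate values.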

---

## Compound Query Performance

The key pattern: **compound queries scale better on denormalized views**.

| # of Filters | Normalized | Denormalized | Winner |
| --- | --- | --- | --- |
| 1 filter (indexed) | **Faster** | Slower | Normalized |
| 1 filter (headline) | Faster | Slower | Normalized |
| 2 filters | Similar | Similar | Tie |
| 3+ filters | **Slower** | **Faster** | Denormalized (2-3x) |

### Example: Triple Filter Performance

```sql
-- lkd_profile: 5.8s ✅
SELECT first_name, headline, connection_count
FROM lkd_profile
WHERE connection_count > 500
  AND 'Python' = ANY(skills)
  AND headline ILIKE '%engineer%'
LIMIT 100;

-- linkedin_profile: 18.2s (3.1x slower)
SELECT first_name, headline, connections
FROM linkedin_profile
WHERE connections > 500
  AND 'Python' = ANY(skills)
  AND headline ILIKE '%engineer%'
LIMIT 100;
```
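
The "Nx" figures quoted throughout this guide appear to be the slower time divided by the faster time. As a worked check under that assumption, the hypothetical `speedup` helper below reproduces the 3.1x quoted for the triple-filter example (18.2s vs 5.8s).

```typescript
// Assumed definition: speedup = slower time / faster time, rounded to one decimal place.
function speedup(slowerMs: number, fasterMs: number): number {
  if (fasterMs <= 0) throw new Error("fasterMs must be positive");
  return Math.round((slowerMs / fasterMs) * 10) / 10;
}

// Triple-filter example: 18.2s on linkedin_profile vs 5.8s on lkd_profile.
const tripleFilterSpeedup = speedup(18200, 5800); // 3.1
```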

---

## Test Results: Profile Queries

### 1. Primary Key Lookups

| Query Type | Normalized | Denormalized | Winner |
| --- | --- | --- | --- |
| Single ID lookup | **0.12ms** | 0.66ms | Normalized (5.5x) |
| Batch 5 IDs | 1.84ms | 1.15ms | Tie |
| Batch 20 IDs | **3.46ms** | 11.44ms | Normalized (3.3x) |
| Slug lookup (with JOIN) | **0.18ms** | timeout | Normalized |

```sql
-- RECOMMENDED: Normalized PK lookup
SELECT id, formatted_name, title, org, headline
FROM linkedin_profile WHERE id = ?;

-- AVOID: Denormalized for simple lookups
SELECT profile_id, name, title, company_name, headline
FROM lkd_profile WHERE profile_id = ?;
```

### 2. Text/ILIKE Searches

| Query Type | Normalized | Denormalized | Winner |
| --- | --- | --- | --- |
| Title ILIKE (LIMIT 10) | **24.5ms** | 66.7ms | Normalized (2.7x) |
| Headline multi-term | **99.2ms** | 116.4ms | Normalized (1.2x) |
| Org/company_name ILIKE | **86.7ms** | 1,710ms | Normalized (20x) |
| Compound filter (location + title) | timeout | 283.7ms | Denormalized\* |

\*Note: Normalized query timed out on this particular compound filter

```sql
-- RECOMMENDED: Normalized text search
SELECT id, formatted_name, title
FROM linkedin_profile
WHERE title ILIKE '%software engineer%' LIMIT 10;

-- AVOID for search: Denormalized
SELECT profile_id, name, title
FROM lkd_profile
WHERE title ILIKE '%software engineer%' LIMIT 10;
```

### 3. Getting Nested Data (Experience, Education)

| Query Type | Normalized (with JOINs) | Denormalized | Winner |
| --- | --- | --- | --- |
| Profile + positions (1 ID) | **0.33ms** | 6.42ms | Normalized (19x) |
| Profile + education (1 ID) | 1.22ms | **0.66ms** | Denormalized (1.8x) |
| Profile + positions + education (1 ID) | **0.27ms** | 4.58ms | Normalized (17x) |
| Batch 5 IDs + positions | **0.85ms** | 6.36ms | Normalized (7.5x) |
| Full profile (positions, education, certs) | **0.32ms** | 6.28ms | Normalized (20x) |

```sql
-- RECOMMENDED: Normalized with JOINs for nested data
SELECT lp.id, lp.formatted_name,
       pp.title as job_title, pp.company_name, pp.start_date
FROM linkedin_profile lp
LEFT JOIN linkedin_profile_position3 pp ON pp.linkedin_profile_id = lp.id
WHERE lp.id = ?
ORDER BY pp.start_date DESC NULLS LAST;

-- USE ONLY when you need enriched fields (seniority, job_function, etc.)
SELECT profile_id, name, experience, education
FROM lkd_profile WHERE profile_id = ?;
```

### 4. Filtering on Nested Data

| Query Type | Normalized | Denormalized | Winner |
| --- | --- | --- | --- |
| EXISTS check on positions (known profile) | 1,106.6ms | **2.84ms** | Denormalized (390x)\* |
| Filter positions by title | **17.6ms** | 1,582ms | Normalized (90x) |
| COUNT positions > 5 (HAVING) | **0.99ms** | 94ms | Normalized (95x) |
| jsonb_array_length filter | N/A | 94ms | Normalized preferred |

\*Note: Denormalized wins when starting from a known profile_id and checking JSON fields, but loses badly for scanning/searching within JSON arrays.

```sql
-- RECOMMENDED: Normalized for filtering on nested data
SELECT DISTINCT pp.linkedin_profile_id
FROM linkedin_profile_position3 pp
WHERE pp.title ILIKE '%software engineer%'
LIMIT 10;

-- AVOID: JSON array filtering is very slow
SELECT profile_id, name
FROM lkd_profile
WHERE EXISTS (
  SELECT 1 FROM jsonb_array_elements(experience::jsonb) e
  WHERE (e->>'title') ILIKE '%software engineer%'
) LIMIT 10;
```

### 5. Extended Comparison Tests (2025)

Comprehensive tests across 23 query patterns:

| Test | Query Pattern | `lkd_profile` | `linkedin_profile` | Winner | Speedup |
| --- | --- | --- | --- | --- | --- |
| Slug lookup | slug = 'x' | TIMEOUT | **7ms** | Normalized | ∞ |
| Profile ID lookup | id = X | 82ms | **33ms** | Normalized | 2.5x |
| updated_at filter | updated_at > date | 91ms | **12ms** | Normalized | 7.5x |
| jobs_count filter | jobs_count > 10 | 228ms | **53ms** | Normalized | 4.3x |
| Large results (5000) | headline ILIKE | 5.5s | **324ms** | Normalized | 17x |
| **Name search** | first + last | **2.3s** | 3.5s | Denormalized | 1.5x |
| **Follower + headline** | num > X AND headline | **1.8s** | 3.1s | Denormalized | 1.7x |
| **Multiple skills (3)** | skill1 AND skill2 AND skill3 | **2.5s** | 7s | Denormalized | 2.8x |
| **Skills + headline** | skill AND headline ILIKE | **7s** | 9.7s | Denormalized | 1.4x |
| **Rare term (blockchain)** | headline ILIKE '%blockchain%' | **1.4s** | 3.4s | Denormalized | 2.4x |
| **Rare term (kubernetes)** | headline ILIKE '%kubernetes%' | **1.6s** | 3.1s | Denormalized | 1.9x |
| **Summary search** | summary ILIKE '%startup%' | **612ms** | 966ms | Denormalized | 1.6x |
| **Connection + skills** | connections > X AND skill | **1.6s** | 3.8s | Denormalized | 2.4x |
| **Complex regex** | headline ~\* '(a\|b\|c\|d)' | **335ms** | 465ms | Denormalized | 1.4x |
| **Triple filter** | conn + skill + headline | **5.8s** | 18.2s | Denormalized | 3.1x |
| **Rare term + follower** | tensorflow + follower > X | **9.2s** | 23s | Denormalized | 2.5x |
| **Word boundary regex** | headline ~\* '\mAI\M' | **227ms** | 480ms | Denormalized | 2.1x |
| **Headline + locality** | headline + city ILIKE | **746ms** | 1.95s | Denormalized | 2.6x |

---

## Test Results: Company Queries (January 2026)

> **Note**: `lkd_company` is a VIEW (not a materialized view or table), so it has no indexes and relies on the underlying table indexes.

### 5. Primary Key & Slug Lookups

| Query Type | Normalized | Denormalized | Winner |
| --- | --- | --- | --- |
| Single ID lookup | **3-4ms** | 4-19ms | Normalized (varies) |
| Batch 5 IDs | 4ms | 4ms | Tie |
| Batch 20 IDs | 5ms | 4ms | Tie |
| Slug via raw text | TIMEOUT | TIMEOUT | Both fail |
| Slug via `key64()` + subquery | **4ms** | N/A | Normalized |
| Slug direct on lkd_company | N/A | **21-47ms** | Denormalized only |
| Domain lookup (indexed) | **3ms** | N/A | Normalized only |
| universal_name lookup (indexed) | **2ms** | N/A | Normalized only |
| Ticker lookup (indexed) | **5ms** | N/A | Normalized only |

**Critical: Slug Lookup Strategy**

```sql
-- ✅ FASTEST: Use key64() for indexed slug lookup
SELECT id, company_name FROM linkedin_company
WHERE id = (
  SELECT linkedin_company_id FROM linkedin_company_slug
  WHERE slug_key64 = key64('google') LIMIT 1
);
-- Result: 4ms

-- 🟡 WORKS: Denormalized direct (slower but simple)
SELECT linkedin_company_id, name FROM lkd_company WHERE slug = 'google';
-- Result: 21-47ms

-- ❌ TIMEOUT: Raw slug lookup without key64()
SELECT linkedin_company_id FROM linkedin_company_slug WHERE slug = 'google';
-- Result: 30s+ TIMEOUT
```
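
Because the raw-text slug comparison times out, it can help to route every slug lookup through one function so that the `key64()` form is always the one sent to the database. The TypeScript sketch below is an assumption-laden illustration: `companyBySlugQuery` is a hypothetical name, the `$1` placeholder style depends on the client, and `key64()` is left to run server-side exactly as in the SQL above rather than being reimplemented.

```typescript
// Hypothetical helper; key64() is the database-side function from the SQL example.
type SlugQuery = { text: string; values: string[] };

function companyBySlugQuery(slug: string): SlugQuery {
  const trimmed = slug.trim();
  if (trimmed === "") throw new Error("slug must not be empty");
  return {
    text:
      "SELECT id, company_name FROM linkedin_company " +
      "WHERE id = (" +
      "SELECT linkedin_company_id FROM linkedin_company_slug " +
      "WHERE slug_key64 = key64($1) LIMIT 1)",
    values: [trimmed],
  };
}
```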

### 6. Text Searches

| Query Type | Normalized | Denormalized | Winner |
| --- | --- | --- | --- |
| Company name ILIKE | **691ms** | 872ms | Normalized (1.3x) |
| Description ILIKE (AI) | 396ms | **337ms** | Denormalized (1.2x) |
| Description ILIKE (blockchain) | 380ms | **372ms** | Tie |
| Description ILIKE (ML) | 413ms | **276ms** | Denormalized (1.5x) |
| Company headline ILIKE | **36ms** | 42ms | Tie |
| Regex multiple keywords (AI\|ML) | **128ms** | 158ms | Normalized (1.2x) |
| Specialties array contains | 25.9s | **11.6s** | Denormalized (2.2x) |

### 7. Getting Nested Data (Funding, Locations, Industries)

| Query Type | Normalized (JOINs) | Denormalized | Winner |
| --- | --- | --- | --- |
| Company + funding (1 ID) | **9ms** | 67ms | Normalized (7.4x) |
| Company + locations (1 ID) | **6ms** | 11ms | Normalized (1.8x) |
| Company + industry (1 ID) | **3ms** | 50ms | Normalized (17x) |
| Full company all nested | **3ms** | 58ms | Normalized (19x) |
| Batch 5 IDs + industry | **3ms** | 53ms | Normalized (18x) |

### 8. Filtering on Nested Company Data

| Query Type | Normalized | Denormalized | Winner |
| --- | --- | --- | --- |
| EXISTS funding check | **111ms** | 191ms | Normalized (1.7x) |
| Funding + Employee compound | 1,183ms | **70ms** | Denormalized (17x) ⚠️ |
| Industry name via JSON ILIKE | N/A | 272ms | Denormalized only |
| Industry via industry_code JOIN | **2ms** | N/A | Normalized only |

**⚠️ Critical Finding**: Funding + Employee compound queries are **17x faster** on denormalized.

### 9. Numeric Filters

| Query Type | Normalized | Denormalized | Winner |
| --- | --- | --- | --- |
| employee_count > 1000 | 105ms | **81ms** | Denormalized |
| employee_count BETWEEN 50-200 | **10ms** | 11ms | Tie |
| follower_count > 100000 | **209ms** | 283ms | Normalized (1.4x) |
| founded > 2020 | 7ms | 6ms | Tie |
| founded + employee compound | 82ms | 89ms | Tie |
| updated_at > date (indexed) | **3ms** | 4ms | Normalized |

### 10. Country Filtering

| Query Type | Normalized | Denormalized | Winner |
| --- | --- | --- | --- |
| country_iso only | **9-11ms** | 228ms | Normalized (20x) |
| country_code only | **9ms** | N/A | Normalized only |
| country_iso + employee | 250ms | **158ms** | Denormalized (1.6x) |
| country_iso + description | 322ms | **135ms** | Denormalized (2.4x) |

### 11. Compound Query Performance

| # of Filters | Normalized | Denormalized | Winner |
| --- | --- | --- | --- |
| 1 filter (indexed: country, domain) | **Faster** | Slower | Normalized (9-20x) |
| 1 filter (text: description) | Similar | Similar | Tie |
| 2 filters (employee + name) | 243ms | **229ms** | Tie |
| 2 filters (employee + description) | 162ms | **84ms** | Denormalized (1.9x) |
| 3 filters (country+emp+desc) | 797ms | **372ms** | Denormalized (2.1x) |
| 4 filters (country+emp+foll+desc) | 158ms | **108ms** | Denormalized (1.5x) |
| Double text (name + description) | 176ms | **148ms** | Denormalized (1.2x) |
| Locality + description | **357ms** | 434ms | Normalized (1.2x) |

### 12. Large Result Sets & Aggregations

| Query Type | Normalized | Denormalized | Winner |
| --- | --- | --- | --- |
| 1,000 rows (employee filter) | 16ms | 17ms | Tie |
| 5,000 rows (employee filter) | **20ms** | 27ms | Normalized (1.35x) |
| 10,000 rows (no filter) | 30ms | 29ms | Tie |
| COUNT with filter | **20s** | 21s | Both slow |
| GROUP BY country | **26s** | TIMEOUT | Normalized |

---

## Decision Matrix

### When to Use `linkedin_profile` (Normalized)

✅ **Always use for:**

- Slug lookups (via `key64()` function) - **7ms vs TIMEOUT**
- Profile ID lookups - **2.5x faster**
- `updated_at` filtering - **7.5x faster** (indexed)
- `jobs_count` filtering - **4.3x faster**
- Org/company name search - **20x faster** (GIN indexed)
- Title field searches - **14x faster**
- Large result sets (5000+ rows) - **17x faster**
- JOINs with position/education tables for filtering
- COUNT, GROUP BY, aggregations
- EXISTS checks across many profiles

### When to Use `lkd_profile` (Denormalized)

✅ **Use for:**

- **Country filtering with uncommon terms** - `country_iso = 'US'` works where `location_country_code` times out
- **Multi-filter combinations (2+ filters)** - 2-3x faster
- **Multiple skills array searches** - 2.8x faster
- **Rare/uncommon headline terms** (blockchain, kubernetes, tensorflow) - 2-2.5x faster
- **Numeric + text combos** (follower count + headline) - 1.7-2.5x faster
- **Complex regex patterns** (multi-option, word boundaries) - 1.4-2x faster
- **Summary text searches** - 1.6x faster
- **Name searches** (first + last name) - 1.5x faster
- **Headline + locality combinations** - 2.6x faster
- Enriched experience data (seniority, job_function, employment_type, academic_qualification, inferred_location)
- Building API responses that need complete profile with all nested arrays
- You have a known profile_id and want all nested data in one query
- Checking JSON fields on a single known entity

❌ **Never use for:**

- Slug lookups (use normalized with `key64()`)
- Searching/filtering within JSON arrays (experience titles, etc.)
- Aggregations or counting
- Large result sets without filters

### When to Use `linkedin_company` (Normalized)

✅ **Always use for:**

- **Slug lookups via `key64()`** - 4ms vs 21-47ms (5-12x faster)
- **Domain lookups** - only option (indexed, 3ms)
- **Ticker lookups** - only option (indexed, 5ms)
- **Universal name lookups** - only option (indexed, 2ms)
- **Single country filter** - 9ms vs 228ms (20x faster)
- **Updated_at filtering** - indexed (3ms)
- **Industry filtering via industry_code** - 2ms vs 272ms (136x faster)
- **Aggregations (GROUP BY)** - 26s vs TIMEOUT
- **Complex JOINs** (company → positions → profiles)
- **Getting nested data by known ID** - JOINs are 7-19x faster
- **EXISTS checks on funding** - 1.7x faster

### When to Use `lkd_company` (Denormalized)

✅ **Use for:**

- **Compound queries (2+ filters)** - 1.5-2.4x faster
- **Funding + employee compound** - 17x faster (70ms vs 1,183ms) ⚠️
- **Country + description combinations** - 2.4x faster
- **Country + employee combinations** - 1.6x faster
- **Description searches (rare terms)** - 1.5x faster for ML/AI terms
- **Specialties filtering** - 2.2x faster (11.6s vs 25.9s)
- **Simple slug lookups** without `key64()` - works (21-47ms) vs TIMEOUT
- Pre-formatted `industries`, `locations`, `crunchbase_funding` JSON
- Building API responses with complete company data in one query

❌ **Never use for:**

- Domain/ticker/universal_name lookups (columns don't exist)
- Single indexed filter lookups (country_iso, industry_code) - 20x slower
- Aggregations (COUNT, GROUP BY) - slow (COUNT, 21s) or TIMEOUT (GROUP BY)
- Complex JOINs with other tables
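
The company-side lists above can be expressed as a small routing function. This TypeScript sketch is illustrative: `CompanyQuery` and `chooseCompanyTable` are hypothetical names, and the precedence between rules is an assumption (aggregations and keyed lookups are checked first because the denormalized view either times out or lacks the columns).

```typescript
// Illustrative sketch; these names are hypothetical and not part of the package.
type CompanyQuery = {
  lookupBy?: "id" | "slug" | "domain" | "ticker" | "universal_name";
  filterCount: number;                      // number of WHERE predicates
  isAggregation: boolean;                   // COUNT / GROUP BY
  combinesFundingAndEmployeeCount: boolean; // the 17x case from the tests
};

type CompanyTable = "linkedin_company" | "lkd_company";

function chooseCompanyTable(q: CompanyQuery): CompanyTable {
  // GROUP BY timed out on the view, so aggregations stay on the base table.
  if (q.isAggregation) return "linkedin_company";
  // Keyed lookups are indexed on the base table (slug via key64()).
  if (q.lookupBy !== undefined) return "linkedin_company";
  // Funding + employee compound was far faster on the view.
  if (q.combinesFundingAndEmployeeCount) return "lkd_company";
  // Two or more filters favored the view; single indexed filters did not.
  return q.filterCount >= 2 ? "lkd_company" : "linkedin_company";
}
```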

---

## Cross-Table Query Performance (Profile + Company)

**Critical Finding**: When combining profile text filters (headline, skills) with company constraints, **denormalized JOINs are 20-93x faster** and often the only option that completes.

### Performance Comparison

| Query Pattern | Normalized Multi-JOIN | Denormalized JOIN | Winner |
| --- | --- | --- | --- |
| Headline + company size | 20,205ms | **217ms** | Denormalized (93x) |
| Multi-skill + company size | 28,173ms | **1,281ms** | Denormalized (22x) |
| Skill + company industry | TIMEOUT | **3,553ms** | Denormalized (∞) |
| Senior engineers + company | 4,535ms | **196ms** | Denormalized (23x) |
| Company ID → employees | **48ms** | 279ms | Normalized (5.8x) |
| Company name (org) search | **274ms** | 8,600ms | Normalized (31x) |

### Rules for Cross-Table Queries

1. **Company name search** → Always use `linkedin_profile.org` (GIN indexed, 68x faster)
2. **Headline/skill + company constraint** → Always use `lkd_profile JOIN lkd_company` (normalized times out)
3. **Company-first lookups** → Use normalized (5-8x faster)
4. **Multi-filter profile + company** → Denormalized is often the only option

### Example: Engineers at Large Companies

```sql
-- ❌ SLOW: Normalized multi-JOIN (20 seconds)
SELECT lp.id, lp.headline, lc.company_name
FROM linkedin_profile lp
JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
JOIN linkedin_company lc ON lc.id = pos.linkedin_company_id
WHERE pos.end_date IS NULL
  AND lp.headline ILIKE '%engineer%'
  AND lc.employee_count > 1000
LIMIT 50;

-- ✅ FAST: Denormalized JOIN (217ms - 93x faster)
SELECT lkd.profile_id, lkd.headline, lkdc.name
FROM lkd_profile lkd
JOIN lkd_company lkdc ON lkdc.linkedin_company_id = lkd.linkedin_company_id
WHERE lkd.headline ILIKE '%engineer%'
  AND lkdc.employee_count > 1000
LIMIT 50;
```

See [B2B_CROSS_TABLE_TEST_FINDINGS.md](./B2B_CROSS_TABLE_TEST_FINDINGS.md) for full test results.

---

## Query Patterns & Recommendations

### Pattern 1: Find Profiles with Specific Experience

```sql
-- ✅ CORRECT: Use normalized tables
SELECT DISTINCT lp.id, lp.formatted_name, pp.title, pp.company_name
FROM linkedin_profile lp
JOIN linkedin_profile_position3 pp ON pp.linkedin_profile_id = lp.id
WHERE pp.title ILIKE '%product manager%'
  AND pp.linkedin_company_id IN (SELECT id FROM linkedin_company WHERE employee_count > 1000)
LIMIT 100;

-- ❌ WRONG: Don't search JSON arrays
SELECT profile_id, name FROM lkd_profile
WHERE EXISTS (SELECT 1 FROM jsonb_array_elements(experience::jsonb) e
              WHERE (e->>'title') ILIKE '%product manager%');
```

### Pattern 2: Get Complete Profile for Display

```sql
-- ✅ Option A: Denormalized (when you need enriched fields)
SELECT profile_id, name, title, company_name, headline,
       experience, education, certifications, skills
FROM lkd_profile WHERE profile_id = ?;

-- ✅ Option B: Normalized with multiple queries (faster total time)
SELECT * FROM linkedin_profile WHERE id = ?;
SELECT * FROM linkedin_profile_position3 WHERE linkedin_profile_id = ? ORDER BY start_date DESC;
SELECT * FROM linkedin_profile_education2 WHERE linkedin_profile_id = ?;
```
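
Option B issues three independent statements, so a caller can dispatch them together and assemble the result afterwards. The TypeScript sketch below shows that shape under stated assumptions: `QueryFn` is a hypothetical stand-in for whatever database client is in use, and the `$1` placeholder style replaces the `?` markers from the SQL above.

```typescript
// `QueryFn` is a hypothetical stand-in for the real database client.
type Row = Record<string, unknown>;
type QueryFn = (text: string, values: unknown[]) => Promise<Row[]>;

async function fetchProfileNormalized(query: QueryFn, profileId: number) {
  // The three statements do not depend on each other, so they are issued together.
  const [profiles, positions, education] = await Promise.all([
    query("SELECT * FROM linkedin_profile WHERE id = $1", [profileId]),
    query(
      "SELECT * FROM linkedin_profile_position3 WHERE linkedin_profile_id = $1 ORDER BY start_date DESC",
      [profileId],
    ),
    query("SELECT * FROM linkedin_profile_education2 WHERE linkedin_profile_id = $1", [profileId]),
  ]);
  // A missing profile row is reported as null rather than undefined.
  return { profile: profiles[0] ?? null, positions, education };
}
```

Whether the statements actually run concurrently depends on the client: with a connection pool they can, while a single connection will typically queue them.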

### Pattern 3: Find Companies with Funding

```sql
-- ✅ CORRECT: Use normalized with EXISTS
SELECT lc.id, lc.company_name, lc.employee_count
FROM linkedin_company lc
WHERE EXISTS (SELECT 1 FROM linkedin_crunchbase_funding cf WHERE cf.linkedin_company_id = lc.id)
  AND lc.employee_count BETWEEN 50 AND 500
LIMIT 100;

-- ❌ WRONG: Don't filter on JSON
SELECT linkedin_company_id, name FROM lkd_company
WHERE crunchbase_funding IS NOT NULL AND crunchbase_funding::text != 'null';
```

### Pattern 4: Company Lookup with Full Data

```sql
-- ✅ CORRECT: Denormalized for complete company
SELECT linkedin_company_id, name, description, employee_count,
       industries, locations, crunchbase_funding
FROM lkd_company WHERE linkedin_company_id = ?;

-- OR: Normalized with JOINs (faster but more code)
SELECT lc.*,
       array_agg(DISTINCT ca.address) as addresses,
       array_agg(DISTINCT cf.round_name) as funding_rounds
FROM linkedin_company lc
LEFT JOIN linkedin_company_address2 ca ON ca.linkedin_company_id = lc.id
LEFT JOIN linkedin_crunchbase_funding cf ON cf.linkedin_company_id = lc.id
WHERE lc.id = ?
GROUP BY lc.id;
```

---

## Summary: The Golden Rules

### Profile Rules

1. **For indexed lookups (slug, ID, updated_at)**: Always use normalized tables
2. **For country + headline filtering**: Use `lkd_profile` with `country_iso` (normalized times out)
3. **For single filters on common terms**: Use normalized tables (faster)
4. **For multi-filter combinations (2+ filters)**: Use denormalized views (2-3x faster)
5. **For rare/uncommon headline terms**: Use denormalized views (2-2.5x faster)
6. **For multiple skills searches**: Use denormalized views (2.8x faster)
7. **For nested data**: Normalized with JOINs unless you need enriched fields
8. **For aggregations**: Always normalized
9. **Never**: Filter on JSON array contents in denormalized views

### Company Rules (Updated Jan 2026)

1. **For slug lookups**: Use `key64()` with normalized (4ms) or denormalized direct (21-47ms)
2. **For domain/ticker/universal_name lookups**: Normalized only (indexed, 2-5ms)
3. **For single indexed filters (country_iso, industry_code)**: Normalized (20-136x faster)
4. **For compound queries (2+ filters)**: Denormalized (1.5-2.4x faster)
5. **For funding + other filters**: Denormalized (17x faster) ⚠️
6. **For nested data by known ID**: Normalized JOINs (7-19x faster)
7. **For aggregations (COUNT, GROUP BY)**: Normalized only (denormalized times out)
8. **For pre-formatted JSON response**: Denormalized (convenience)

### Cross-Table Rules (Profile + Company Queries)

1. **Company name search**: Always use `linkedin_profile.org` (GIN indexed, 68x faster)
2. **Headline/skill + company constraint**: Use `lkd_profile JOIN lkd_company` (20-93x faster)
3. **Company-first lookup → employees**: Use normalized (5-8x faster)
4. **Multi-filter profile + company**: Denormalized JOIN is often the only option that works
|
|
590
|
+
|
|
591
|
+
### Quick Decision Flowchart
|
|
592
|
+
|
|
593
|
+
```
Query has indexed lookup (slug, ID, updated_at)?
  └─ YES → Use linkedin_profile (normalized)

Query needs profile + company data together?
  └─ YES:
      Searching by company name?
        └─ YES → Use linkedin_profile.org (GIN indexed, 68x faster)
      Profile text filter (headline/skill) + company constraint?
        └─ YES → Use lkd_profile JOIN lkd_company (20-93x faster)
      Company-first lookup?
        └─ YES → Use normalized JOINs (5-8x faster)

Query has country filter + uncommon headline term?
  └─ YES → Use lkd_profile (denormalized) - normalized will TIMEOUT

Query has 3+ combined filters?
  └─ YES → Use lkd_profile (denormalized) - 2-3x faster

Query searches rare headline terms (iOS, kubernetes, blockchain)?
  └─ YES → Use lkd_profile (denormalized) - 2-2.5x faster

Query has multiple skills?
  └─ YES → Use lkd_profile (denormalized) - 2.8x faster

Query needs large result set (5000+)?
  └─ YES → Use linkedin_profile (normalized) - 17x faster

Default:
  └─ Use linkedin_profile (normalized)
```
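
The flowchart can also be read as a first-match-wins function. The sketch below is one possible TypeScript encoding of it. The `ProfileQueryShape` fields and the `chooseProfileSource` name are hypothetical and not part of the package, and the top-to-bottom precedence is an assumption, since the flowchart does not say which branch wins when several apply.

```typescript
// Hypothetical description of a profile query; field names are illustrative.
interface ProfileQueryShape {
  indexedLookup: boolean; // slug, ID, or updated_at lookup
  needsCompanyData: boolean; // profile + company data together
  companyNameSearch: boolean;
  textFilterWithCompanyConstraint: boolean; // headline/skill + company constraint
  companyFirstLookup: boolean;
  countryWithUncommonHeadline: boolean;
  combinedFilterCount: number;
  rareHeadlineTerm: boolean;
  multipleSkills: boolean;
  expectedRows: number;
}

type ProfileSource =
  | "linkedin_profile"
  | "linkedin_profile.org"
  | "lkd_profile JOIN lkd_company"
  | "normalized JOINs"
  | "lkd_profile";

// Walks the flowchart top to bottom and returns the first matching branch.
function chooseProfileSource(q: ProfileQueryShape): ProfileSource {
  if (q.indexedLookup) return "linkedin_profile";
  if (q.needsCompanyData) {
    if (q.companyNameSearch) return "linkedin_profile.org";
    if (q.textFilterWithCompanyConstraint) return "lkd_profile JOIN lkd_company";
    if (q.companyFirstLookup) return "normalized JOINs";
  }
  if (q.countryWithUncommonHeadline) return "lkd_profile";
  if (q.combinedFilterCount >= 3) return "lkd_profile";
  if (q.rareHeadlineTerm) return "lkd_profile";
  if (q.multipleSkills) return "lkd_profile";
  // Large result sets (5000+) and the default case both land on normalized.
  return "linkedin_profile";
}

// Usage: a US-only search for an uncommon headline term.
const example: ProfileQueryShape = {
  indexedLookup: false,
  needsCompanyData: false,
  companyNameSearch: false,
  textFilterWithCompanyConstraint: false,
  companyFirstLookup: false,
  countryWithUncommonHeadline: true,
  combinedFilterCount: 2,
  rareHeadlineTerm: true,
  multipleSkills: false,
  expectedRows: 100,
};
console.log(chooseProfileSource(example)); // prints "lkd_profile"
```

Keeping the branches in the same order as the flowchart makes it easy to audit the function against the text when the measurements are updated.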

---

## Common Query Patterns

### Pattern 5: Country-Filtered Headline Search (NEW)

```sql
-- ✅ CORRECT: Use lkd_profile for country + uncommon term
SELECT profile_id, first_name, last_name, headline, locality, url
FROM lkd_profile
WHERE country_iso = 'US'
  AND headline ~* '(\miOS\M|\mSwift\M).*(developer|engineer)'
LIMIT 100
-- Result: 20s, 100 rows ✅

-- ❌ WRONG: linkedin_profile times out
SELECT id, first_name, last_name, headline, location_name
FROM linkedin_profile
WHERE location_country_code = 'US'
  AND headline ILIKE '%iOS developer%'
LIMIT 100
-- Result: TIMEOUT after 30s ❌
```

### Pattern 6: Multi-Skill Search

```sql
-- ✅ CORRECT: Use lkd_profile for multiple skills (2.8x faster)
SELECT profile_id, first_name, headline, skills[1:5]
FROM lkd_profile
WHERE 'Python' = ANY(skills)
  AND 'SQL' = ANY(skills)
  AND 'Data Analysis' = ANY(skills)
LIMIT 50
-- Result: 2.5s

-- 🟡 WORKS but slower: linkedin_profile
SELECT id, first_name, headline, skills[1:5]
FROM linkedin_profile
WHERE 'Python' = ANY(skills)
  AND 'SQL' = ANY(skills)
  AND 'Data Analysis' = ANY(skills)
LIMIT 50
-- Result: 7s (2.8x slower)
```
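
When the skill list comes from user input, the `= ANY(skills)` conditions above can be generated rather than hand-written. The following is a minimal sketch, assuming a client that accepts `$1`-style positional parameters (as node-postgres does); whether the query endpoint used in these tests supports parameters is not stated in this guide. `buildMultiSkillQuery` and `MultiSkillQuery` are hypothetical names.

```typescript
// Hypothetical shape for a parameterized query: SQL text plus bound values.
interface MultiSkillQuery {
  text: string;
  values: string[];
}

// Builds the lkd_profile multi-skill query from Pattern 6 with one
// "$n = ANY(skills)" condition per requested skill.
function buildMultiSkillQuery(skills: string[], limit: number = 50): MultiSkillQuery {
  if (skills.length === 0) {
    throw new Error("at least one skill is required");
  }
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error("limit must be a positive integer");
  }
  const conditions = skills.map((_, i) => `$${i + 1} = ANY(skills)`);
  const text =
    "SELECT profile_id, first_name, headline, skills[1:5]\n" +
    "FROM lkd_profile\n" +
    `WHERE ${conditions.join("\n  AND ")}\n` +
    `LIMIT ${limit}`;
  return { text, values: skills.slice() };
}

// Usage: the same three skills as the hand-written query above.
const multiSkill = buildMultiSkillQuery(["Python", "SQL", "Data Analysis"]);
console.log(multiSkill.text);
console.log(multiSkill.values);
```

Binding the skill names as parameters keeps user-supplied strings out of the SQL text; only the validated integer `limit` is interpolated.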

### Pattern 7: Rare Headline Term Search

```sql
-- ✅ FASTER: Use lkd_profile for rare terms (2.4x faster)
SELECT profile_id, first_name, headline
FROM lkd_profile
WHERE headline ILIKE '%blockchain%'
LIMIT 100
-- Result: 1.4s

-- 🟡 WORKS but slower: linkedin_profile
SELECT id, first_name, headline
FROM linkedin_profile
WHERE headline ILIKE '%blockchain%'
LIMIT 100
-- Result: 3.4s (2.4x slower)
```

### Pattern 8: Triple Filter Combination

```sql
-- ✅ MUCH FASTER: Use lkd_profile for 3+ filters (3.1x faster)
SELECT profile_id, first_name, headline, connection_count
FROM lkd_profile
WHERE connection_count > 500
  AND 'Python' = ANY(skills)
  AND headline ILIKE '%engineer%'
LIMIT 100
-- Result: 5.8s

-- 🟡 WORKS but much slower: linkedin_profile
SELECT id, first_name, headline, connections
FROM linkedin_profile
WHERE connections > 500
  AND 'Python' = ANY(skills)
  AND headline ILIKE '%engineer%'
LIMIT 100
-- Result: 18.2s (3.1x slower)
```

---

## Company Query Patterns (Updated Jan 2026)

### Pattern 9: Company Slug Lookup

```sql
-- ✅ FASTEST: Use key64() for indexed lookup (4ms)
SELECT id, company_name, employee_count
FROM linkedin_company
WHERE id = (
  SELECT linkedin_company_id FROM linkedin_company_slug
  WHERE slug_key64 = key64('google') LIMIT 1
)

-- 🟡 WORKS: Denormalized direct (21-47ms) - simpler but 5-12x slower
SELECT linkedin_company_id, name, employee_count
FROM lkd_company WHERE slug = 'google'

-- ❌ TIMEOUT: Raw slug without key64()
SELECT linkedin_company_id FROM linkedin_company_slug WHERE slug = 'google'
-- Result: 30s+ TIMEOUT
```
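
Because the raw `slug` comparison times out, it can help to route every slug lookup through one helper that always emits the `key64()` form. The sketch below is one way to do that in TypeScript. `buildCompanySlugLookup` and `SlugLookupQuery` are hypothetical names, and the `$1` placeholder assumes a node-postgres-style client; how `key64()` treats case or surrounding whitespace is not documented here, so the helper passes the slug through unchanged.

```typescript
// Hypothetical shape for a parameterized query: SQL text plus bound values.
interface SlugLookupQuery {
  text: string;
  values: string[];
}

// Emits the indexed slug lookup from Pattern 9, with the slug bound as $1
// so the key64() path is used on every call.
function buildCompanySlugLookup(slug: string): SlugLookupQuery {
  if (slug.length === 0) {
    throw new Error("slug must not be empty");
  }
  const text =
    "SELECT id, company_name, employee_count\n" +
    "FROM linkedin_company\n" +
    "WHERE id = (\n" +
    "  SELECT linkedin_company_id FROM linkedin_company_slug\n" +
    "  WHERE slug_key64 = key64($1) LIMIT 1\n" +
    ")";
  return { text, values: [slug] };
}

// Usage: the same lookup as the hand-written example above.
const slugLookup = buildCompanySlugLookup("google");
console.log(slugLookup.text);
console.log(slugLookup.values);
```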

### Pattern 10: Company Domain/Ticker Lookup

```sql
-- ✅ CORRECT: Only normalized has these indexed columns
SELECT id, company_name, domain FROM linkedin_company WHERE domain = 'google.com'
-- Result: 3ms

SELECT id, company_name, ticker FROM linkedin_company WHERE ticker = 'GOOGL'
-- Result: 5ms

SELECT id, company_name FROM linkedin_company WHERE universal_name = 'google'
-- Result: 2ms

-- ❌ WRONG: lkd_company does NOT have domain/ticker columns
```

### Pattern 11: Company Compound Filters (2+ filters)

```sql
-- ✅ FASTER: Use lkd_company for compound queries (1.9-2.4x faster)
SELECT linkedin_company_id, name
FROM lkd_company
WHERE country_iso = 'US'
  AND employee_count > 100
  AND description ILIKE '%software%'
LIMIT 50
-- Result: 135-372ms

-- 🟡 WORKS but slower: linkedin_company
SELECT id, company_name
FROM linkedin_company
WHERE country_iso = 'US'
  AND employee_count > 100
  AND description ILIKE '%software%'
LIMIT 50
-- Result: 247-797ms (2.1x slower)
```

### Pattern 12: Companies with Funding + Filters ⚠️

```sql
-- ✅ MUCH FASTER: Use lkd_company for funding + other filters (17x faster!)
SELECT linkedin_company_id, name, crunchbase_funding
FROM lkd_company
WHERE employee_count > 50
  AND crunchbase_funding IS NOT NULL
  AND crunchbase_funding::text != 'null'
  AND crunchbase_funding::text != '[]'
LIMIT 50
-- Result: 70ms

-- ❌ VERY SLOW: Normalized with EXISTS
SELECT lc.id, lc.company_name
FROM linkedin_company lc
WHERE lc.employee_count > 50
  AND EXISTS (SELECT 1 FROM linkedin_crunchbase_funding cf WHERE cf.linkedin_company_id = lc.id)
LIMIT 50
-- Result: 1,183ms (17x slower)
```

### Pattern 13: Company Industry Lookup

```sql
-- ✅ FASTEST: Use normalized with industry_code (2ms)
SELECT id, company_name, industry_code
FROM linkedin_company
WHERE industry_code = 4
LIMIT 100

-- 🟡 ALTERNATIVE: Denormalized JSON search (272ms) - 136x slower
SELECT linkedin_company_id, name
FROM lkd_company
WHERE industries::text ILIKE '%Software Development%'
LIMIT 100
```

### Pattern 14: Company Aggregations

```sql
-- ✅ CORRECT: Only use normalized for aggregations
SELECT country_iso, COUNT(*)
FROM linkedin_company
WHERE country_iso IS NOT NULL
GROUP BY country_iso
ORDER BY COUNT(*) DESC
LIMIT 10
-- Result: 26s (slow but works)

-- ❌ TIMEOUT: Denormalized for aggregations
SELECT country_iso, COUNT(*) FROM lkd_company ...
-- Result: TIMEOUT (30s+)
```

---

## Appendix: Fields Only Available in Denormalized Views

### `lkd_profile` Exclusive Fields

**Key columns for filtering:**

| Column | Type | Notes |
| --------------- | ---- | ---------------------------------------------------------------- |
| `country_iso` | text | **Fast country filter** (use instead of `location_country_code`) |
| `country_name` | text | Full country name |
| `industry_name` | text | Denormalized industry name |
| `url` | text | Full profile URL |

**JSON fields with enriched data:**

- `experience` - JSON array with enriched fields per position:
  - `seniority[]` - Array of {id, seniority} objects
  - `job_function[]` - Array of {id, job_function} objects
  - `employment_type[]` - Array of {id, job_employment_type} objects
  - `academic_qualification[]` - Array of {id, academic_qualification} objects
  - `inferred_location` - Geocoded {latitude, longitude, formatted_address, country_iso, admin_district, locality}
- `education` - JSON array of education records
- `certifications` - JSON array
- `courses` - JSON array
- `projects` - JSON array
- `volunteering` - JSON array
- `patents` - JSON array
- `awards` - JSON array
- `publications` - JSON array
- `languages` - JSON array
- `recommendations` - JSON array
- `test_scores` - JSON array
- `articles` - JSON array
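
Since the golden rules advise against filtering on JSON array contents in SQL, one option is to fetch the rows first and read the enriched `experience` fields on the client. The sketch below types a subset of those fields and collects the distinct seniority labels for one profile. The property names follow the list above; the `id` types, the optionality of each field, and the sample values are assumptions made for illustration.

```typescript
// Assumed shapes for entries inside lkd_profile.experience.
interface SeniorityTag {
  id: number;
  seniority: string;
}
interface JobFunctionTag {
  id: number;
  job_function: string;
}
interface EmploymentTypeTag {
  id: number;
  job_employment_type: string;
}
interface ExperienceEntry {
  seniority?: SeniorityTag[];
  job_function?: JobFunctionTag[];
  employment_type?: EmploymentTypeTag[];
}

// Returns the distinct seniority labels across all positions,
// in the order they first appear.
function collectSeniorityLabels(experience: ExperienceEntry[]): string[] {
  const labels: string[] = [];
  for (const position of experience) {
    for (const tag of position.seniority || []) {
      if (labels.indexOf(tag.seniority) === -1) {
        labels.push(tag.seniority);
      }
    }
  }
  return labels;
}

// Usage with made-up sample data (not taken from the database).
const sampleExperience: ExperienceEntry[] = [
  { seniority: [{ id: 1, seniority: "Senior" }] },
  { seniority: [{ id: 1, seniority: "Senior" }, { id: 2, seniority: "Manager" }] },
  {},
];
console.log(collectSeniorityLabels(sampleExperience)); // prints [ 'Senior', 'Manager' ]
```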

### `lkd_company` Exclusive Fields

- `industries[]` - Pre-joined industry data with {id, name, primary}
- `locations[]` - Pre-joined with inferred_location geocoding
- `crunchbase_funding[]` - Formatted funding rounds with URLs
- `naics_codes[]` - Pre-joined NAICS code data
- `inferred_location` - Geocoded primary location

---

## Appendix: Column Comparison

### Key Filtering Columns

| Purpose | `linkedin_profile` | `lkd_profile` | Notes |
| --------------- | ----------------------- | ------------------------------- | ------------------------------------------- |
| Profile ID | `id` | `profile_id` | Both indexed |
| Country code | `location_country_code` | `country_iso` | **lkd_profile works with headline filters** |
| Location string | `location_name` | `locality` | Similar performance |
| Current company | `org` | `company_name` | `org` has GIN index |
| Current title | `title` | `title` | Similar |
| Headline | `headline` | `headline` | Similar |
| Skills | `skills` (array) | `skills` (array) | lkd_profile faster for multi-skill |
| Connections | `connections` | `connection_count` | Similar |
| Followers | `num_followers` | `follower_count` | Similar |
| Industry | `linkedin_industry_id` | `industry_id` + `industry_name` | lkd_profile has name |
| Updated | `updated_at` | `updated_at` | linkedin_profile indexed |
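
Code that switches between the two profile sources has to rename columns as it goes. The sketch below captures the table above as a lookup from `linkedin_profile` column names to their `lkd_profile` equivalents. `PROFILE_COLUMN_MAP` and `toDenormalizedColumn` are hypothetical names; the Industry row is mapped to `industry_id` only, since `industry_name` has no normalized counterpart in the table.

```typescript
// linkedin_profile column -> lkd_profile column, per the comparison table.
const PROFILE_COLUMN_MAP: { [normalized: string]: string } = {
  id: "profile_id",
  location_country_code: "country_iso",
  location_name: "locality",
  org: "company_name",
  title: "title",
  headline: "headline",
  skills: "skills",
  connections: "connection_count",
  num_followers: "follower_count",
  linkedin_industry_id: "industry_id",
  updated_at: "updated_at",
};

// Translates a normalized column name, failing loudly on unknown columns
// rather than passing them through unchanged.
function toDenormalizedColumn(column: string): string {
  if (!Object.prototype.hasOwnProperty.call(PROFILE_COLUMN_MAP, column)) {
    throw new Error(`no lkd_profile equivalent known for column "${column}"`);
  }
  return PROFILE_COLUMN_MAP[column];
}

// Usage: rewrite a normalized select list for lkd_profile.
const normalizedColumns = ["id", "first_name_placeholder"].slice(0, 1).concat(["headline", "connections"]);
console.log(normalizedColumns.map(toDenormalizedColumn).join(", "));
// prints "profile_id, headline, connection_count"
```

Throwing on unknown columns is a deliberate choice in this sketch: a silently unmapped name would only surface later as a database error.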

### Indexes Available

**`linkedin_profile` indexes:**

- `linkedin_profile_pkey` - Primary key on `id`
- `ix_linkedin_profile_org_tsv` - GIN on `org` (full-text)
- `linkedin_profile_updated_at_idx` - on `updated_at`
- `ix_linkedin_profile_linkedin_user_id` - on `linkedin_user_id`

**`lkd_profile`:** No indexes (denormalized view), but optimized for compound queries

### Company Column Comparison

| Purpose | `linkedin_company` | `lkd_company` | Notes |
| -------------- | -------------------- | --------------------- | ----------------------------------- |
| Company ID | `id` | `linkedin_company_id` | Both fast for lookups |
| Name | `company_name` | `name` | Similar |
| Slug | N/A (use slug table) | `slug` | **lkd_company has it directly** |
| Domain | `domain` | N/A | **Only normalized has domain** |
| Ticker | `ticker` | `ticker` | Both have it |
| Universal name | `universal_name` | N/A | **Only normalized (indexed)** |
| Country | `country_iso` | `country_iso` | Normalized faster for single filter |
| Locality | `locality` | `locality` | Similar |
| Employee count | `employee_count` | `employee_count` | Similar |
| Follower count | `follower_count` | `follower_count` | Similar |
| Founded | `founded` | `founded_year` | Similar |
| Industry | `industry_code` | `industries` (JSON) | Normalized indexed (136x faster) |
| Funding | N/A (use JOIN) | `crunchbase_funding` | JSON pre-formatted |
| Locations | N/A (use JOIN) | `locations` (JSON) | JSON pre-formatted |

### Company Indexes Available

**`linkedin_company` indexes:**

- `linkedin_company_pkey` - Primary key on `id`
- `ix_linkedin_company_domain` - on `domain`
- `ix_linkedin_company_ticker` - on `ticker`
- `linkedin_company_universal_name_ix` - on `universal_name`
- `ix_linkedin_company_tsv` - GIN on company_name + universal_name (full-text)
- `ix_linkedin_company_company_id` - on `company_id`
- `ix_linkedin_company_max_snapshot_id` - on `max_snapshot_id`

**`linkedin_company_slug` indexes:**

- `linkedin_company_slug_pk` - Primary key
- `linkedin_company_slug_slug_key64_uniq` - on `slug_key64` (**use with key64() function**)
- `linkedin_company_slug_linkedin_company_id_ix` - on `linkedin_company_id`

**`lkd_company`:** No indexes (it's a VIEW, not a materialized view)

---

## Test Methodology (Jan 2026)

Tests were conducted against a live B2B database via the `http://165.22.151.131:3000/query` endpoint.

- Each query was run multiple times to verify consistency
- Timings include network latency (remote server)
- Results show `duration_ms` as reported by the database
- TIMEOUT = 30+ seconds (query cancelled)
- Tests covered: ID lookups, slug lookups, text searches, compound filters, nested data, aggregations
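
The speed ratios quoted throughout this guide can be reproduced from the raw timings. The sketch below shows one way to summarize repeated runs and derive a ratio. `median`, `speedup`, and `isTimeout` are hypothetical helpers, and using the median of repeated runs is an assumption: the guide does not say how multiple runs were combined. The inputs in the usage lines are timings reported in Patterns 12, 13, and 8.

```typescript
// Queries at or beyond this duration are treated as TIMEOUT (30+ seconds).
const TIMEOUT_MS = 30000;

function isTimeout(durationMs: number): boolean {
  return durationMs >= TIMEOUT_MS;
}

// Median of repeated duration_ms samples; less sensitive to one slow run
// than the mean.
function median(samples: number[]): number {
  if (samples.length === 0) {
    throw new Error("at least one sample is required");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Ratio of the slower timing to the faster one, rounded to one decimal place.
function speedup(slowMs: number, fastMs: number): number {
  if (fastMs <= 0) {
    throw new Error("fastMs must be positive");
  }
  return Math.round((slowMs / fastMs) * 10) / 10;
}

// Usage with timings quoted in this guide.
console.log(speedup(1183, 70)); // Pattern 12: prints 16.9, reported as 17x
console.log(speedup(272, 2)); // Pattern 13: prints 136
console.log(speedup(18200, 5800)); // Pattern 8: prints 3.1
console.log(isTimeout(26000)); // Pattern 14 normalized aggregation: prints false
```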