orangeslice 1.0.0
- package/README.md +107 -0
- package/dist/b2b.d.ts +30 -0
- package/dist/b2b.js +89 -0
- package/dist/index.d.ts +29 -0
- package/dist/index.js +28 -0
- package/dist/queue.d.ts +9 -0
- package/dist/queue.js +48 -0
- package/docs/B2B_CROSS_TABLE_TEST_FINDINGS.md +255 -0
- package/docs/B2B_DATABASE.md +314 -0
- package/docs/B2B_DATABASE_TEST_FINDINGS.md +476 -0
- package/docs/B2B_EMPLOYEE_SEARCH.md +697 -0
- package/docs/B2B_GENERALIZATION_RULES.md +220 -0
- package/docs/B2B_NLP_QUERY_MAPPINGS.md +240 -0
- package/docs/B2B_NORMALIZED_VS_DENORMALIZED.md +952 -0
- package/docs/B2B_SCHEMA.md +1042 -0
- package/docs/B2B_SQL_COMPREHENSIVE_TEST_FINDINGS.md +301 -0
- package/docs/B2B_TABLE_INDICES.ts +496 -0
- package/package.json +33 -0

@@ -0,0 +1,301 @@

# B2B SQL Comprehensive Test Findings

**Test Date:** January 14, 2026
**Test Script:** `scripts/b2b-sql-comprehensive-queries.sh`
**Total Queries:** 80
**Runtime:** ~7 minutes

## Executive Summary

We ran 80 SQL queries against the B2B database to validate the documented query patterns and measure their performance. Key findings: several query patterns time out at ~30 seconds, some fields are sparsely populated, and company lookups have gotchas (shared domains, similar names).

---

## ⚡ Fast Indexed Lookups (< 100ms) - All Working

Six of the seven lookups finished in under 100ms. The domain lookup took ~638ms and returned multiple companies.

| Query                               | Expected | Actual | Status                        |
| ----------------------------------- | -------- | ------ | ----------------------------- |
| Company by domain                   | ~10ms    | ~638ms | ⚠️ Returns multiple companies |
| Company by universal_name           | ~36ms    | ~18ms  | ✅ Faster than expected       |
| Company by slug using key64()       | ~25ms    | ~20ms  | ✅                            |
| Profile by slug using key64()       | ~25ms    | ~21ms  | ✅                            |
| People at company by ID             | ~35ms    | ~23ms  | ✅                            |
| Title ILIKE after company_id filter | ~35-65ms | ~21ms  | ✅ Faster than expected       |
| Title regex after company_id filter | ~30ms    | ~84ms  | ✅ Slower than expected       |

### Key Finding: Multiple Companies Share Domains

Query #1 returned **28 companies** with domain `stripe.com`, including unrelated businesses that use Stripe for payments. Always order by `employee_count DESC` and take the first row to find the main company.

```sql
-- ✅ CORRECT: Order by employee_count to get main company
SELECT lc.id, lc.company_name, lc.employee_count
FROM linkedin_company lc
WHERE lc.domain = 'stripe.com'
ORDER BY lc.employee_count DESC NULLS LAST
LIMIT 1;
-- Returns: id = 2135371, company_name = "Stripe", employee_count = 13809
```
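
When all rows for a domain have already been fetched (for example, to inspect the 28 matches), the same selection can be made client-side. The following is a minimal TypeScript sketch, not part of the package API: `CompanyRow` and `pickMainCompany` are hypothetical names, and the row shape only mirrors the three columns selected above. Only the Stripe row comes from the findings; the other sample rows are invented for illustration.

```typescript
// Hypothetical row shape mirroring the columns selected in the SQL above.
interface CompanyRow {
  id: number;
  company_name: string;
  employee_count: number | null;
}

// Client-side analogue of ORDER BY employee_count DESC NULLS LAST LIMIT 1:
// returns the row with the largest employee_count, ranking null below any number.
function pickMainCompany(rows: CompanyRow[]): CompanyRow | undefined {
  let best: CompanyRow | undefined;
  for (const row of rows) {
    const size = row.employee_count ?? Number.NEGATIVE_INFINITY;
    const bestSize = best?.employee_count ?? Number.NEGATIVE_INFINITY;
    if (best === undefined || size > bestSize) {
      best = row;
    }
  }
  return best;
}

const sampleRows: CompanyRow[] = [
  { id: 1, company_name: "Example Shop", employee_count: 12 },      // invented row
  { id: 2135371, company_name: "Stripe", employee_count: 13809 },   // from the findings
  { id: 2, company_name: "Example Agency", employee_count: null },  // invented row
];

console.log(pickMainCompany(sampleRows)?.company_name); // "Stripe"
```
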

---

## ⚠️ Company Lookup Gotcha: Similar Names

**Query #2** (`universal_name = 'anthropic'`) returned the **wrong company**:

| Field          | Returned                      | Expected            |
| -------------- | ----------------------------- | ------------------- |
| id             | 7936402                       | (Anthropic AI's ID) |
| company_name   | Anthropic                     | Anthropic           |
| domain         | anthropic.io                  | anthropic.com       |
| employee_count | 3                             | ~3,500              |
| description    | "early stage investment fund" | AI research company |

**Lesson:** Similar company names exist. Use **domain lookup** for well-known companies, or verify results by employee count and description.
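
The verification step in the lesson above can be automated with a small check on the returned row. This is a hedged TypeScript sketch under stated assumptions: `CompanyCandidate`, `Expectation`, and `matchesExpectation` are hypothetical names, and the `minEmployees` threshold is an arbitrary value chosen by the caller, not something the database provides.

```typescript
// Hypothetical shape of a looked-up company row (fields taken from the table above).
interface CompanyCandidate {
  id: number;
  company_name: string;
  domain: string | null;
  employee_count: number | null;
}

// What the caller believes about the company it is looking for.
interface Expectation {
  domain: string;       // e.g. "anthropic.com"
  minEmployees: number; // rough lower bound, chosen by the caller
}

// Returns true only if both the domain and the approximate size match.
function matchesExpectation(c: CompanyCandidate, e: Expectation): boolean {
  const domainOk = (c.domain ?? "").toLowerCase() === e.domain.toLowerCase();
  const sizeOk = (c.employee_count ?? 0) >= e.minEmployees;
  return domainOk && sizeOk;
}

// The row that Query #2 actually returned.
const returnedRow: CompanyCandidate = {
  id: 7936402,
  company_name: "Anthropic",
  domain: "anthropic.io",
  employee_count: 3,
};

console.log(matchesExpectation(returnedRow, { domain: "anthropic.com", minEmployees: 1000 })); // false
```
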

---

## ✅ Description Search with industry_code (~128-1,938ms)

All four queries with an `industry_code IN (4, 6, 96)` filter completed without timing out. Three finished in under 700ms; #9 took ~1.9s:

| Query | Description                    | Duration |
| ----- | ------------------------------ | -------- |
| #8    | AI video companies             | ~385ms   |
| #9    | Personalization infrastructure | ~1,938ms |
| #10   | AI platforms for sales/RevOps  | ~128ms   |
| #11   | Developer API companies        | ~677ms   |

---

## ✅ Top-of-Funnel Queries (up to 5,000 results)

| Query | Pattern                            | Duration | Results |
| ----- | ---------------------------------- | -------- | ------- |
| #12   | ILIKE OR + industry_code           | ~300ms   | 5,000   |
| #13   | Indexed filters only               | ~770ms   | 5,000   |
| #16   | Headline regex                     | ~800ms   | 5,000   |
| #17   | Word boundaries for short keywords | ~480ms   | 2,000   |

---

## ✅ Funding Queries (19-55ms)

All funding queries completed in under 60ms:

| Query | Description               | Duration |
| ----- | ------------------------- | -------- |
| #20   | Recently funded (2024+)   | ~28ms    |
| #21   | Series A companies        | ~34ms    |
| #22   | Companies funded by a16z  | ~55ms    |
| #23   | Seed stage with amount    | ~19ms    |

---

## ✅ Industry Reference Lookups (~15-27ms)

Query #24 revealed actual industry names:

| Code   | Name                              | Company Count |
| ------ | --------------------------------- | ------------- |
| 48     | Construction                      | 1,959,385     |
| 44     | Real Estate                       | 1,796,025     |
| **96** | **IT Services and IT Consulting** | **1,786,164** |
| 27     | Retail                            | 1,507,617     |
| 80     | Advertising Services              | 1,331,885     |

**Note:** Industry code 96 is "IT Services and IT Consulting", not "Internet" as previously documented.

---

## ✅ Employee Growth Metrics (~25-126ms)

| Query | Description               | Duration |
| ----- | ------------------------- | -------- |
| #26   | Fast-growing (>50% 12mo)  | ~41ms    |
| #27   | Hypergrowth (>20% 3mo)    | ~126ms   |
| #28   | Growth with LinkedIn data | ~25ms    |

---

## 🚀 Subquery Pattern - Mixed Results

| Query | Keyword        | Duration     | Status     |
| ----- | -------------- | ------------ | ---------- |
| #29   | "AI" founders  | ~5,302ms     | ✅ Works   |
| #30   | "fintech" CTOs | **30,017ms** | ❌ TIMEOUT |

**Finding:** The subquery pattern works for **common keywords** (AI, SaaS) but may **time out for rare keywords** (fintech).

---

## ✅ 3-Table and 4-Table Joins

| Query | Description                                | Duration     | Status     |
| ----- | ------------------------------------------ | ------------ | ---------- |
| #31   | Engineers at tech companies                | ~232ms       | ✅         |
| #32   | Decision makers at AI video companies      | ~2,472ms     | ✅         |
| #33   | Engineers at Series A startups             | ~44ms        | ✅         |
| #34   | VPs Engineering at Series B                | ~209ms       | ✅         |
| #35   | CTOs at Series A-C AI companies + ORDER BY | **30,076ms** | ❌ TIMEOUT |

**Finding:** 4-table joins work well with regex + industry_code, but **adding ORDER BY causes a timeout**.
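
One possible workaround, not tested in this run, is to drop the `ORDER BY` from the join and sort the returned rows in application code. The TypeScript sketch below assumes this approach is acceptable for the caller; `sortDescNullsLast`, `PersonRow`, and the `follower_count` field are hypothetical names used only for illustration. Keep in mind that a `LIMIT` without `ORDER BY` returns an arbitrary subset, so this orders only the rows that were fetched.

```typescript
// Returns a sorted copy of `rows`, largest key first, with null keys at the end.
function sortDescNullsLast<T>(rows: T[], key: (row: T) => number | null): T[] {
  return rows.slice().sort((a, b) => {
    const ka = key(a);
    const kb = key(b);
    if (ka === null && kb === null) return 0;
    if (ka === null) return 1;  // a goes after b
    if (kb === null) return -1; // b goes after a
    return kb - ka;             // descending
  });
}

// Hypothetical result rows from an unordered 4-table join.
interface PersonRow {
  handle: string;
  follower_count: number | null;
}

const people: PersonRow[] = [
  { handle: "a", follower_count: 10 },
  { handle: "b", follower_count: null },
  { handle: "c", follower_count: 50 },
];

const ranked = sortDescNullsLast(people, (p) => p.follower_count);
console.log(ranked.map((p) => p.handle).join(",")); // "c,a,b"
```
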

---

## ❌ Education Queries - Large Company Filter Timeout

| Query | Description               | Duration     | Status     |
| ----- | ------------------------- | ------------ | ---------- |
| #36   | Stanford alumni at Google | **30,118ms** | ❌ TIMEOUT |
| #37   | Recent MBA graduates      | ~165ms       | ✅         |
| #38   | CS degree holders         | ~41ms        | ✅         |

**Finding:** Combining an education filter with a large-company filter (Google, `company_id` 1441) times out. Filter by school only.

---

## ✅ Skills, Certifications, Articles

| Query | Description                     | Duration         |
| ----- | ------------------------------- | ---------------- |
| #39   | Python + ML skills              | ~979ms           |
| #40   | Full-stack (React + Node)       | ~22,172ms (slow) |
| #41   | AWS certified                   | ~47ms            |
| #42   | Google Cloud certified          | ~92ms            |
| #43   | AI thought leaders (>100 likes) | ~651ms           |
| #44   | Prolific writers (5+ articles)  | ~137ms           |

---

## ✅ Job Postings

| Query | Description                       | Duration     | Status     |
| ----- | --------------------------------- | ------------ | ---------- |
| #45   | Jobs at Stripe by company_id      | ~18ms        | ✅         |
| #46   | High-paying engineering (>200k)   | ~122ms       | ✅         |
| #47   | Remote jobs 2024 + location ILIKE | **30,016ms** | ❌ TIMEOUT |

**Finding:** Job queries by company_id are fast. Location ILIKE with date filter times out.

---

## ❌ Company Posts - Content ILIKE Timeout

| Query | Description                | Duration     | Status     |
| ----- | -------------------------- | ------------ | ---------- |
| #48   | Companies posting about AI | **30,132ms** | ❌ TIMEOUT |
| #49   | Viral posts (>1000 likes)  | ~263ms       | ✅         |

**Finding:** Content ILIKE on company posts times out. Filter by likes/engagement works.

---

## ❌ Sparse/Unavailable Data

| Query | Description                      | Duration  | Results | Status           |
| ----- | -------------------------------- | --------- | ------- | ---------------- |
| #50   | Companies with ML/AI specialties | ~27,307ms | 0       | ❌ Sparse        |
| #52   | Companies with has_careers=true  | ~28,281ms | 0       | ❌ Not populated |
| #63   | Work modality options            | ~17ms     | 0       | ❌ Empty table   |
| #64   | Country reference (tech hubs)    | ~18ms     | 0       | ❌ Query issue   |
| #73   | Multilingual professionals       | ~67ms     | 0       | ❌ Sparse        |
| #78   | Profiles with GitHub links       | ~19ms     | 0       | ❌ Sparse        |

---

## ❌ Alumni Queries - Complex Joins Timeout

| Query | Description                         | Duration     | Status     |
| ----- | ----------------------------------- | ------------ | ---------- |
| #54   | Ex-Google at startups               | **30,077ms** | ❌ TIMEOUT |
| #55   | Former employees at Stripe (simple) | ~104ms       | ✅         |

**Finding:** Simple alumni queries (former at ONE company) work. Complex "ex-X now at Y" queries time out.
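
A complex "ex-X now at Y" query can be split into two simple queries whose results are combined in application code. The TypeScript sketch below shows only the combining step and assumes each query returns a list of numeric profile ids; `intersectIds` and the sample ids are hypothetical, and the two queries themselves are not shown.

```typescript
// Returns the ids present in both lists, preserving the order of `first`.
function intersectIds(first: number[], second: number[]): number[] {
  const lookup = new Set<number>(second);
  return first.filter((id) => lookup.has(id));
}

// Invented ids standing in for the results of two separate, simple queries:
const formerAtCompanyX = [101, 102, 103, 104]; // query 1: former employees of company X
const currentlyAtStartups = [104, 250, 102];   // query 2: people currently at startups

// Profiles that satisfy both conditions (ids 102 and 104 in this sample).
const exXNowAtStartups = intersectIds(formerAtCompanyX, currentlyAtStartups);
console.log(exXNowAtStartups.join(","));
```

This trades one heavy join for two indexed lookups plus an in-memory set intersection, which stays cheap as long as both id lists fit comfortably in memory.
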

---

## ✅ Aggregations

| Query | Description                   | Duration |
| ----- | ----------------------------- | -------- |
| #56   | Role distribution at OpenAI   | ~5,761ms |
| #57   | Average funding by round type | ~334ms   |

---

## ❌ Influencer/Ranking Queries Timeout

| Query | Description                | Duration     | Status         |
| ----- | -------------------------- | ------------ | -------------- |
| #58   | LinkedIn influencers       | **30,017ms** | ❌ Not indexed |
| #59   | High follower count (>50k) | ~3,963ms     | ✅             |
| #68   | Fortune 500 companies      | **30,014ms** | ❌ Not indexed |
| #69   | Inc Magazine ranked        | **30,265ms** | ❌ Not indexed |

**Finding:** The boolean `influencer = true` filter and the rank columns on the `company` table are not indexed.

---

## ✅ Other Working Queries

| Query | Description                     | Duration     | Status                 |
| ----- | ------------------------------- | ------------ | ---------------------- |
| #53   | Companies with multiple offices | ~1,034ms     | ✅                     |
| #70   | Tech/education volunteers       | ~137ms       | ✅                     |
| #71   | Innovation/excellence awards    | ~45ms        | ✅                     |
| #72   | Highly recommended (>20)        | ~691ms       | ✅                     |
| #74   | Patent holders                  | ~42ms        | ✅                     |
| #75   | AI/ML patent holders            | ~1,919ms     | ✅                     |
| #76   | Published researchers           | ~40ms        | ✅                     |
| #77   | People with AI/ML projects      | ~46ms        | ✅                     |
| #79   | Full company enrichment         | ~29ms        | ✅                     |
| #80   | Full profile enrichment         | **30,024ms** | ❌ (specific ID issue) |

---

## Summary of Query Patterns

### ✅ SAFE Patterns (Use These)

1. **Indexed lookups**: domain, universal_name, slug_key64, linkedin_company_id
2. **Title ILIKE after company_id filter**: ~21ms
3. **Description ILIKE with industry_code**: ~128-677ms for three of four queries (~1,938ms worst case)
4. **Funding joins**: ~19-55ms
5. **Simple alumni queries** (former at ONE company): ~104ms
6. **3-table joins with industry_code**: ~232-2,472ms
7. **4-table joins with regex + industry_code (NO ORDER BY)**: ~44-209ms
8. **Headline regex for top-of-funnel**: ~800ms for 5,000 results

### ❌ AVOID Patterns (Will Time Out)

1. **Subquery pattern with rare keywords** (fintech, niche terms)
2. **4-table joins with ORDER BY**
3. **Education + large company filter** (Stanford at Google)
4. **Complex alumni queries** (ex-X now at startups)
5. **Job posting location ILIKE + date filter**
6. **Company posts content ILIKE**
7. **Boolean filters**: `influencer = true`
8. **Rank filters**: `rank_fortune IS NOT NULL`
9. **ILIKE AND ILIKE** (two conditions on same field)
10. **ORDER BY with text search**
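
Some of the patterns in the list above can be caught before a query is sent by scanning the SQL text. The TypeScript sketch below is a rough heuristic under stated assumptions, not a SQL parser: `RISK_RULES` and `findRiskyPatterns` are hypothetical names, the rank rule assumes rank columns share the `rank_` prefix seen in `rank_fortune`, and the ILIKE rule flags any two ILIKE conditions joined by AND without checking that they target the same field, so false positives are possible.

```typescript
interface RiskRule {
  label: string;
  pattern: RegExp;
}

// Heuristic text patterns for items 7-10 of the AVOID list.
const RISK_RULES: RiskRule[] = [
  { label: "boolean influencer filter", pattern: /\binfluencer\s*=\s*true\b/i },
  // Assumption: rank columns are named rank_<something>, as in rank_fortune.
  { label: "rank filter", pattern: /\brank_\w+\s+is\s+not\s+null\b/i },
  { label: "ILIKE AND ILIKE", pattern: /\bilike\b[^;]*?\band\b[^;]*?\bilike\b/i },
  { label: "ORDER BY with text search", pattern: /\bilike\b[^;]*\border\s+by\b/i },
];

// Returns the labels of every rule whose pattern appears in the SQL text.
function findRiskyPatterns(sql: string): string[] {
  return RISK_RULES.filter((rule) => rule.pattern.test(sql)).map((rule) => rule.label);
}

// `t` is a placeholder table name, not a table from the B2B schema.
console.log(findRiskyPatterns("SELECT id FROM t WHERE rank_fortune IS NOT NULL").join(", "));
// "rank filter"
```
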

### ⚠️ Sparse Data (May Return 0 Results)

- `specialties` array
- `has_careers` boolean
- `linkedin_work_modality` table
- Language proficiency joins
- URL resources (GitHub links)

---

## Recommendations

1. **Always order by `employee_count DESC`** when looking up companies by domain
2. **Verify company identity** by domain, not just universal_name
3. **Use broader keywords** in subquery patterns (avoid rare terms)
4. **Skip ORDER BY** in 4-table joins
5. **Split complex alumni queries** into multiple simpler queries
6. **Use likes/engagement filters** for company posts, not content ILIKE
7. **Avoid sparse fields** unless specifically needed