orangeslice 1.6.1 → 1.7.0
- package/dist/apify.d.ts +57 -0
- package/dist/apify.js +126 -0
- package/dist/cli.js +18 -7
- package/dist/generateObject.d.ts +34 -0
- package/dist/generateObject.js +85 -0
- package/dist/geo.d.ts +50 -0
- package/dist/geo.js +91 -0
- package/dist/index.d.ts +32 -3
- package/dist/index.js +24 -3
- package/docs/AGENTS.md +94 -384
- package/docs/apify.md +133 -0
- package/docs/b2b.md +178 -0
- package/docs/browser.md +173 -0
- package/docs/serp.md +167 -0
- package/docs/strategies.md +250 -0
- package/package.json +2 -2
- package/docs/B2B_CROSS_TABLE_TEST_FINDINGS.md +0 -255
- package/docs/B2B_DATABASE.md +0 -314
- package/docs/B2B_DATABASE_TEST_FINDINGS.md +0 -476
- package/docs/B2B_EMPLOYEE_SEARCH.md +0 -697
- package/docs/B2B_GENERALIZATION_RULES.md +0 -220
- package/docs/B2B_NLP_QUERY_MAPPINGS.md +0 -240
- package/docs/B2B_NORMALIZED_VS_DENORMALIZED.md +0 -952
- package/docs/B2B_SCHEMA.md +0 -1042
- package/docs/B2B_SQL_COMPREHENSIVE_TEST_FINDINGS.md +0 -301
- package/docs/B2B_TABLE_INDICES.ts +0 -496
package/docs/B2B_DATABASE.md
DELETED
@@ -1,314 +0,0 @@
-# B2B Database Guide
-
-A comprehensive B2B data enrichment database with LinkedIn profiles, companies, job postings, and funding data.
-
-## Database Scale
-
-| Table | Estimated Rows |
-| ----------------------------- | ---------------- |
-| `linkedin_profile` | **1.15 billion** |
-| `linkedin_profile_position3` | **2.6 billion** |
-| `linkedin_job` | **1.48 billion** |
-| `linkedin_profile_education2` | **965 million** |
-| `linkedin_profile_slug` | **1.14 billion** |
-| `person` | **1.32 billion** |
-
----
-
-## Access Permissions
-
-### ✅ Tables WITH Access (48 tables)
-
-| Category | Tables |
-| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| **Core LinkedIn** | `linkedin_profile`, `linkedin_company`, `linkedin_job`, `linkedin_profile_position3`, `linkedin_profile_education2` |
-| **Slugs** | `linkedin_profile_slug`, `linkedin_company_slug` |
-| **Funding** | `linkedin_crunchbase_funding` |
-| **Reference** | `linkedin_industry`, `linkedin_school`, `linkedin_language`, `linkedin_specialty`, `linkedin_work_modality` |
-| **Profile Details** | `linkedin_profile_certification`, `linkedin_profile_award`, `linkedin_profile_project`, `linkedin_profile_volunteer_experience`, `linkedin_profile_recommendation2`, `linkedin_profile_test_scores`, `linkedin_profile_language_proficiency` |
-| **Company Details** | `linkedin_company_address2`, `linkedin_company_post` |
-| **Content** | `linkedin_article`, `linkedin_patent`, `linkedin_project`, `linkedin_publication2` |
-| **Jobs** | `job_title`, `job_function`, `job_seniority`, `job_employment_type`, `job_academic_qualification` |
-| **Geography** | `country`, `locality`, `naics_code` |
-| **Other** | `company`, `company_type`, `person`, `language_proficiency` |
-
-### ❌ Tables WITHOUT Access (84 tables)
-
-| Category | Tables (Permission Denied) |
-| --------------------- | ----------------------------------------------------------------- |
-| **Contact Data** | `email_address`, `email_address_linkedin_profile`, `phone_number` |
-| **Company Reference** | `company_size`, `company_location`, `company_country` |
-| **Geography** | `linkedin_geo` |
-| **Indeed** | All `indeed_*` tables |
-| **Domain/Web** | `domain`, `host`, `domain_traffic_estimate`, `web_tag` |
-| **App Store** | `ios_sdk`, `play_store_sdk`, `itunes_*`, `play_store_*` |
-
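The access lists in the deleted guide could be spot-checked from a live session. The sketch below assumes the database is PostgreSQL (the guide's use of `ILIKE`, `pg_indexes`, and `pg_class` suggests so) and that the named tables are visible on the current `search_path`. `has_table_privilege` is a built-in PostgreSQL function; the four table names are taken from the two lists purely for illustration.

```sql
-- Sketch: check SELECT access for a few tables as the current user.
-- Assumes PostgreSQL and that every listed table exists;
-- has_table_privilege raises an error for a name it cannot resolve.
SELECT t.table_name,
       has_table_privilege(t.table_name, 'SELECT') AS can_select
FROM (VALUES
        ('linkedin_profile'),
        ('linkedin_company'),
        ('email_address'),
        ('phone_number')
     ) AS t(table_name)
ORDER BY t.table_name;
```

If the lists still hold, the `linkedin_*` rows should report `true` and the contact-data rows `false`.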
----
-
-## Top Industries
-
-| ID | Industry | Companies |
-| --- | ------------------------------------ | --------- |
-| 48 | Construction | 1.96M |
-| 44 | Real Estate | 1.80M |
-| 96 | IT Services and IT Consulting | 1.79M |
-| 27 | Retail | 1.51M |
-| 80 | Advertising Services | 1.33M |
-| 11 | Business Consulting and Services | 1.32M |
-| 4 | **Software Development** | 1.19M |
-| 6 | Technology, Information and Internet | 617K |
-
----
-
-## GTM Query Examples
-
-### 🏢 Company Enrichment
-
-**Find company by universal_name (FAST ~300ms)**
-
-```sql
-SELECT id, company_name, domain, website, employee_count,
-       locality, country_code, description
-FROM linkedin_company
-WHERE universal_name = 'stripe';
-```
-
-**Find company by domain (FAST ~500ms)**
-
-```sql
-SELECT id, company_name, universal_name, employee_count, locality
-FROM linkedin_company
-WHERE domain = 'openai.com';
-```
-
-**Full company enrichment with industry**
-
-```sql
-SELECT lc.company_name, lc.domain, lc.employee_count, lc.locality,
-       lc.description, li.name as industry_name
-FROM linkedin_company lc
-LEFT JOIN linkedin_industry li ON li.id = lc.industry_code
-WHERE lc.universal_name = 'openai';
-```
-
-### 👤 Profile Lookup
-
-**Find profile by LinkedIn slug (FAST ~400ms - use key64)**
-
-```sql
-SELECT lp.first_name, lp.last_name, lp.headline, lp.location_name
-FROM linkedin_profile lp
-JOIN linkedin_profile_slug lps ON lps.linkedin_profile_id = lp.id
-WHERE lps.slug_key64 = key64('satyanadella');
-```
-
-**Find company by LinkedIn slug (FAST ~400ms - use key64)**
-
-```sql
-SELECT lc.id, lc.company_name, lc.domain, lc.employee_count
-FROM linkedin_company lc
-JOIN linkedin_company_slug lcs ON lcs.linkedin_company_id = lc.id
-WHERE lcs.slug_key64 = key64('meta');
-```
-
-### 👥 Find Employees at a Company
-
-**Current employees at a company**
-
-```sql
-SELECT lp.first_name, lp.last_name, lp.headline, pos.title
-FROM linkedin_profile lp
-JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
-WHERE pos.linkedin_company_id = 2135371 -- Stripe's ID
-  AND pos.end_date IS NULL
-LIMIT 100;
-```
-
-**Headcount by role at a company**
-
-```sql
-SELECT pos.title, COUNT(*) as count
-FROM linkedin_profile_position3 pos
-WHERE pos.linkedin_company_id = 11130470 -- OpenAI
-  AND pos.end_date IS NULL
-GROUP BY pos.title
-ORDER BY count DESC
-LIMIT 20;
-```
-
-### 🎯 Find Decision Makers
-
-**VPs of Sales at mid-to-large companies**
-
-```sql
-SELECT lp.first_name, lp.last_name, lp.headline,
-       pos.title, lc.company_name, lc.employee_count
-FROM linkedin_profile lp
-JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
-JOIN linkedin_company lc ON lc.id = pos.linkedin_company_id
-WHERE pos.title ILIKE '%vp%sales%'
-  AND pos.end_date IS NULL
-  AND lc.employee_count > 50
-LIMIT 20;
-```
-
-**C-suite at mid-size software companies (~10 seconds)**
-
-```sql
-SELECT lp.first_name, lp.last_name, pos.title,
-       lc.company_name, lc.employee_count
-FROM linkedin_profile lp
-JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
-JOIN linkedin_company lc ON lc.id = pos.linkedin_company_id
-WHERE pos.end_date IS NULL
-  AND lc.employee_count BETWEEN 100 AND 500
-  AND lc.industry_code = 4 -- Software Development
-  AND (pos.title ILIKE 'ceo%' OR pos.title ILIKE 'cto%'
-       OR pos.title ILIKE 'cfo%' OR pos.title ILIKE 'chief%')
-LIMIT 20;
-```
-
-### 🔍 Find People by Skills
-
-```sql
-SELECT lp.first_name, lp.last_name, lp.headline,
-       lp.skills[1:5] as top_skills, lp.location_name
-FROM linkedin_profile lp
-WHERE 'Python' = ANY(lp.skills)
-  AND 'Machine Learning' = ANY(lp.skills)
-LIMIT 20;
-```
-
-### 🎓 Find Alumni
-
-```sql
-SELECT lp.first_name, lp.last_name, lp.headline,
-       edu.school_name, edu.degree
-FROM linkedin_profile lp
-JOIN linkedin_profile_education2 edu ON edu.linkedin_profile_id = lp.id
-WHERE edu.school_name ILIKE '%stanford%'
-LIMIT 20;
-```
-
-### 📍 Find People by Location
-
-```sql
-SELECT lp.first_name, lp.last_name, lp.headline, lp.location_name
-FROM linkedin_profile lp
-WHERE lp.location_name ILIKE '%san francisco%'
-LIMIT 20;
-```
-
-### 💼 Job Postings
-
-**Recent jobs at a company with salary info**
-
-```sql
-SELECT lj.title, lj.location, lj.salary_range,
-       lj.salary_min, lj.salary_max, lj.posted_date, lj.applicants
-FROM linkedin_job lj
-WHERE lj.linkedin_company_id = 2135371 -- Stripe
-ORDER BY lj.posted_date DESC NULLS LAST
-LIMIT 20;
-```
-
-### 💰 Companies with Funding
-
-```sql
-SELECT lc.company_name, lc.domain, lc.employee_count,
-       cf.round_name, cf.round_date, cf.round_amount,
-       cf.investor_names[1:3] as top_investors
-FROM linkedin_company lc
-JOIN linkedin_crunchbase_funding cf ON cf.linkedin_company_id = lc.id
-WHERE cf.round_date >= '2024-01-01'
-ORDER BY cf.round_date DESC
-LIMIT 20;
-```
-
----
-
-## Performance Guide
-
-### ✅ Fast Queries (indexed lookups)
-
-| Lookup Method | Speed | Example |
-| ---------------------------------- | ------ | ---------------------------------- |
-| Company by `universal_name` | ~300ms | `WHERE universal_name = 'stripe'` |
-| Company by `domain` | ~500ms | `WHERE domain = 'stripe.com'` |
-| Company by `id` | ~300ms | `WHERE id = 2135371` |
-| Profile/Company by `slug_key64` | ~400ms | `WHERE slug_key64 = key64('meta')` |
-| Positions by `linkedin_company_id` | ~500ms | `WHERE linkedin_company_id = X` |
-| Positions by `linkedin_profile_id` | ~500ms | `WHERE linkedin_profile_id = X` |
-| Jobs by `linkedin_company_id` | ~1s | `WHERE linkedin_company_id = X` |
-| Education by `linkedin_profile_id` | ~500ms | `WHERE linkedin_profile_id = X` |
-
-### ⚠️ Slow Queries (full table scans)
-
-| Query Type | Speed | Reason |
-| ----------------------------------- | --------- | ------------------------ |
-| Company by `company_name ILIKE` | Timeout | No index on company_name |
-| Profile by `slug` (without key64) | Timeout | No index on raw slug |
-| Profile by `headline ILIKE` | Very slow | Full text scan |
-| Aggregate queries across industries | 60s+ | Large table joins |
-| Profile by `location_name ILIKE` | Slow | No index |
-
-### 🔑 Key Functions
-
-```sql
--- Convert slug to indexed key64 for fast lookups
-key64('stripe') -- Returns bigint for index lookup
-
--- Example: Fast profile lookup
-WHERE slug_key64 = key64('satyanadella')
-
--- Example: Fast company lookup
-WHERE slug_key64 = key64('meta')
-```
-
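Taken together, the fast and slow tables in the deleted guide imply a two-step pattern: resolve an entity through an indexed key first, then filter the large tables by id. The sketch below illustrates that pattern using only tables, columns, and the `key64` function that appear in the guide. The slug `'stripe'` and the id `2135371` are the guide's own example values; that this slug resolves to this id is an assumption.

```sql
-- Step 1: resolve the company through the indexed slug key
-- instead of a company_name ILIKE scan (listed as a timeout).
SELECT lc.id, lc.company_name
FROM linkedin_company lc
JOIN linkedin_company_slug lcs ON lcs.linkedin_company_id = lc.id
WHERE lcs.slug_key64 = key64('stripe')
LIMIT 1;

-- Step 2: use the id from step 1 for the indexed positions lookup.
-- 2135371 is the id the guide gives for Stripe; substitute the value
-- actually returned by step 1.
SELECT pos.title, COUNT(*) AS headcount
FROM linkedin_profile_position3 pos
WHERE pos.linkedin_company_id = 2135371
  AND pos.end_date IS NULL          -- NULL end_date = current position
GROUP BY pos.title
ORDER BY headcount DESC
LIMIT 20;
```

Keeping the two steps as separate statements means each one matches a row in the fast table, which makes it easier to see which step is responsible if a query turns out slow.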
----
-
-## Common Gotchas
-
-1. **Use `company_name` not `name`** - The column is `company_name` on linkedin_company
-2. **Use `slug_key64` for slug lookups** - Raw `slug` column is not indexed
-3. **Use `linkedin_profile_position3`** - position2 is legacy
-4. **Use `industry_code`** - Not `linkedin_industry_id` on linkedin_company
-5. **NULL `end_date` = current position** - Check for current employees
-6. **Email/phone tables are restricted** - Cannot access contact data directly
-7. **Always use LIMIT** - Tables are massive
-8. **Use `ILIKE` for case-insensitive** - Names vary in casing
-
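To illustrate gotchas 1, 3, 4, 5, and 7 working together, the following sketch lists current employees of one company along with its industry name. It relies only on columns that appear in the guide's own queries; the `'stripe'` value and the limit of 50 are arbitrary choices for the example.

```sql
-- Sketch combining several gotchas from the list above.
SELECT lc.company_name,                 -- gotcha 1: company_name, not name
       li.name AS industry_name,
       lp.first_name, lp.last_name, pos.title
FROM linkedin_company lc
LEFT JOIN linkedin_industry li
       ON li.id = lc.industry_code      -- gotcha 4: industry_code
JOIN linkedin_profile_position3 pos     -- gotcha 3: position3, not position2
       ON pos.linkedin_company_id = lc.id
JOIN linkedin_profile lp
       ON lp.id = pos.linkedin_profile_id
WHERE lc.universal_name = 'stripe'      -- indexed lookup per the fast table
  AND pos.end_date IS NULL              -- gotcha 5: current positions only
LIMIT 50;                               -- gotcha 7: always bound the result
```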
----
-
-## Schema Exploration
-
-```sql
--- List accessible tables
-SELECT table_name
-FROM information_schema.table_privileges
-WHERE grantee = 'jzt2be9botwq' AND privilege_type = 'SELECT'
-ORDER BY table_name;
-
--- Describe a table
-SELECT column_name, data_type, is_nullable
-FROM information_schema.columns
-WHERE table_name = 'linkedin_profile';
-
--- Check table indexes
-SELECT indexname, indexdef
-FROM pg_indexes
-WHERE tablename = 'linkedin_company';
-
--- Check table row counts
-SELECT relname AS table_name, reltuples::bigint AS estimated_rows
-FROM pg_class
-WHERE relkind = 'r' AND relnamespace = 'public'::regnamespace
-ORDER BY reltuples DESC
-LIMIT 25;
-```
-
----
-
-## Related Documentation
-
-- **[B2B_SCHEMA.md](./B2B_SCHEMA.md)** - Complete schema reference with all columns and indexes