orangeslice 1.6.1 → 1.7.0

@@ -1,314 +0,0 @@
- # B2B Database Guide
-
- A comprehensive B2B data enrichment database with LinkedIn profiles, companies, job postings, and funding data.
-
- ## Database Scale
-
- | Table | Estimated Rows |
- | ----------------------------- | ---------------- |
- | `linkedin_profile` | **1.15 billion** |
- | `linkedin_profile_position3` | **2.6 billion** |
- | `linkedin_job` | **1.48 billion** |
- | `linkedin_profile_education2` | **965 million** |
- | `linkedin_profile_slug` | **1.14 billion** |
- | `person` | **1.32 billion** |
-
- ---
-
- ## Access Permissions
-
- ### ✅ Tables WITH Access (48 tables)
-
- | Category | Tables |
- | ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
- | **Core LinkedIn** | `linkedin_profile`, `linkedin_company`, `linkedin_job`, `linkedin_profile_position3`, `linkedin_profile_education2` |
- | **Slugs** | `linkedin_profile_slug`, `linkedin_company_slug` |
- | **Funding** | `linkedin_crunchbase_funding` |
- | **Reference** | `linkedin_industry`, `linkedin_school`, `linkedin_language`, `linkedin_specialty`, `linkedin_work_modality` |
- | **Profile Details** | `linkedin_profile_certification`, `linkedin_profile_award`, `linkedin_profile_project`, `linkedin_profile_volunteer_experience`, `linkedin_profile_recommendation2`, `linkedin_profile_test_scores`, `linkedin_profile_language_proficiency` |
- | **Company Details** | `linkedin_company_address2`, `linkedin_company_post` |
- | **Content** | `linkedin_article`, `linkedin_patent`, `linkedin_project`, `linkedin_publication2` |
- | **Jobs** | `job_title`, `job_function`, `job_seniority`, `job_employment_type`, `job_academic_qualification` |
- | **Geography** | `country`, `locality`, `naics_code` |
- | **Other** | `company`, `company_type`, `person`, `language_proficiency` |
-
- ### ❌ Tables WITHOUT Access (84 tables)
-
- | Category | Tables (Permission Denied) |
- | --------------------- | ----------------------------------------------------------------- |
- | **Contact Data** | `email_address`, `email_address_linkedin_profile`, `phone_number` |
- | **Company Reference** | `company_size`, `company_location`, `company_country` |
- | **Geography** | `linkedin_geo` |
- | **Indeed** | All `indeed_*` tables |
- | **Domain/Web** | `domain`, `host`, `domain_traffic_estimate`, `web_tag` |
- | **App Store** | `ios_sdk`, `play_store_sdk`, `itunes_*`, `play_store_*` |
-
- ---
-
- ## Top Industries
-
- | ID | Industry | Companies |
- | --- | ------------------------------------ | --------- |
- | 48 | Construction | 1.96M |
- | 44 | Real Estate | 1.80M |
- | 96 | IT Services and IT Consulting | 1.79M |
- | 27 | Retail | 1.51M |
- | 80 | Advertising Services | 1.33M |
- | 11 | Business Consulting and Services | 1.32M |
- | 4 | **Software Development** | 1.19M |
- | 6 | Technology, Information and Internet | 617K |
-
- ---
-
- ## GTM Query Examples
-
- ### 🏢 Company Enrichment
-
- **Find company by universal_name (FAST ~300ms)**
-
- ```sql
- SELECT id, company_name, domain, website, employee_count,
-        locality, country_code, description
- FROM linkedin_company
- WHERE universal_name = 'stripe';
- ```
-
- **Find company by domain (FAST ~500ms)**
-
- ```sql
- SELECT id, company_name, universal_name, employee_count, locality
- FROM linkedin_company
- WHERE domain = 'openai.com';
- ```
-
- **Full company enrichment with industry**
-
- ```sql
- SELECT lc.company_name, lc.domain, lc.employee_count, lc.locality,
-        lc.description, li.name as industry_name
- FROM linkedin_company lc
- LEFT JOIN linkedin_industry li ON li.id = lc.industry_code
- WHERE lc.universal_name = 'openai';
- ```
-
- ### 👤 Profile Lookup
-
- **Find profile by LinkedIn slug (FAST ~400ms - use key64)**
-
- ```sql
- SELECT lp.first_name, lp.last_name, lp.headline, lp.location_name
- FROM linkedin_profile lp
- JOIN linkedin_profile_slug lps ON lps.linkedin_profile_id = lp.id
- WHERE lps.slug_key64 = key64('satyanadella');
- ```
-
- **Find company by LinkedIn slug (FAST ~400ms - use key64)**
-
- ```sql
- SELECT lc.id, lc.company_name, lc.domain, lc.employee_count
- FROM linkedin_company lc
- JOIN linkedin_company_slug lcs ON lcs.linkedin_company_id = lc.id
- WHERE lcs.slug_key64 = key64('meta');
- ```
-
- ### 👥 Find Employees at a Company
-
- **Current employees at a company**
-
- ```sql
- SELECT lp.first_name, lp.last_name, lp.headline, pos.title
- FROM linkedin_profile lp
- JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
- WHERE pos.linkedin_company_id = 2135371 -- Stripe's ID
-   AND pos.end_date IS NULL
- LIMIT 100;
- ```
-
- **Headcount by role at a company**
-
- ```sql
- SELECT pos.title, COUNT(*) as count
- FROM linkedin_profile_position3 pos
- WHERE pos.linkedin_company_id = 11130470 -- OpenAI
-   AND pos.end_date IS NULL
- GROUP BY pos.title
- ORDER BY count DESC
- LIMIT 20;
- ```
-
- ### 🎯 Find Decision Makers
-
- **VPs of Sales at mid-to-large companies**
-
- ```sql
- SELECT lp.first_name, lp.last_name, lp.headline,
-        pos.title, lc.company_name, lc.employee_count
- FROM linkedin_profile lp
- JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
- JOIN linkedin_company lc ON lc.id = pos.linkedin_company_id
- WHERE pos.title ILIKE '%vp%sales%'
-   AND pos.end_date IS NULL
-   AND lc.employee_count > 50
- LIMIT 20;
- ```
-
- **C-suite at mid-size software companies (~10 seconds)**
-
- ```sql
- SELECT lp.first_name, lp.last_name, pos.title,
-        lc.company_name, lc.employee_count
- FROM linkedin_profile lp
- JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
- JOIN linkedin_company lc ON lc.id = pos.linkedin_company_id
- WHERE pos.end_date IS NULL
-   AND lc.employee_count BETWEEN 100 AND 500
-   AND lc.industry_code = 4 -- Software Development
-   AND (pos.title ILIKE 'ceo%' OR pos.title ILIKE 'cto%'
-        OR pos.title ILIKE 'cfo%' OR pos.title ILIKE 'chief%')
- LIMIT 20;
- ```
-
- ### 🔍 Find People by Skills
-
- ```sql
- SELECT lp.first_name, lp.last_name, lp.headline,
-        lp.skills[1:5] as top_skills, lp.location_name
- FROM linkedin_profile lp
- WHERE 'Python' = ANY(lp.skills)
-   AND 'Machine Learning' = ANY(lp.skills)
- LIMIT 20;
- ```
-
- ### 🎓 Find Alumni
-
- ```sql
- SELECT lp.first_name, lp.last_name, lp.headline,
-        edu.school_name, edu.degree
- FROM linkedin_profile lp
- JOIN linkedin_profile_education2 edu ON edu.linkedin_profile_id = lp.id
- WHERE edu.school_name ILIKE '%stanford%'
- LIMIT 20;
- ```
-
- ### 📍 Find People by Location
-
- ```sql
- SELECT lp.first_name, lp.last_name, lp.headline, lp.location_name
- FROM linkedin_profile lp
- WHERE lp.location_name ILIKE '%san francisco%'
- LIMIT 20;
- ```
-
- ### 💼 Job Postings
-
- **Recent jobs at a company with salary info**
-
- ```sql
- SELECT lj.title, lj.location, lj.salary_range,
-        lj.salary_min, lj.salary_max, lj.posted_date, lj.applicants
- FROM linkedin_job lj
- WHERE lj.linkedin_company_id = 2135371 -- Stripe
- ORDER BY lj.posted_date DESC NULLS LAST
- LIMIT 20;
- ```
-
- ### 💰 Companies with Funding
-
- ```sql
- SELECT lc.company_name, lc.domain, lc.employee_count,
-        cf.round_name, cf.round_date, cf.round_amount,
-        cf.investor_names[1:3] as top_investors
- FROM linkedin_company lc
- JOIN linkedin_crunchbase_funding cf ON cf.linkedin_company_id = lc.id
- WHERE cf.round_date >= '2024-01-01'
- ORDER BY cf.round_date DESC
- LIMIT 20;
- ```
-
- ---
-
- ## Performance Guide
-
- ### ✅ Fast Queries (indexed lookups)
-
- | Lookup Method | Speed | Example |
- | ---------------------------------- | ------ | ---------------------------------- |
- | Company by `universal_name` | ~300ms | `WHERE universal_name = 'stripe'` |
- | Company by `domain` | ~500ms | `WHERE domain = 'stripe.com'` |
- | Company by `id` | ~300ms | `WHERE id = 2135371` |
- | Profile/Company by `slug_key64` | ~400ms | `WHERE slug_key64 = key64('meta')` |
- | Positions by `linkedin_company_id` | ~500ms | `WHERE linkedin_company_id = X` |
- | Positions by `linkedin_profile_id` | ~500ms | `WHERE linkedin_profile_id = X` |
- | Jobs by `linkedin_company_id` | ~1s | `WHERE linkedin_company_id = X` |
- | Education by `linkedin_profile_id` | ~500ms | `WHERE linkedin_profile_id = X` |
-
- ### ⚠️ Slow Queries (full table scans)
-
- | Query Type | Speed | Reason |
- | ----------------------------------- | --------- | ------------------------ |
- | Company by `company_name ILIKE` | Timeout | No index on company_name |
- | Profile by `slug` (without key64) | Timeout | No index on raw slug |
- | Profile by `headline ILIKE` | Very slow | Full text scan |
- | Aggregate queries across industries | 60s+ | Large table joins |
- | Profile by `location_name ILIKE` | Slow | No index |
-
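The two tables above suggest a general pattern: resolve an entity to its numeric ID through an indexed column first, then filter the large tables by that ID instead of pattern-matching on a name. The query below is a minimal sketch of that pattern using only tables and columns that appear elsewhere in this guide. It assumes `universal_name = 'stripe'` matches a single company (the `LIMIT 1` guards against duplicates), and its runtime has not been measured.

```sql
-- Sketch: replace a slow `company_name ILIKE '%stripe%'` search with
-- an indexed universal_name lookup, then filter positions by company ID.
SELECT lp.first_name, lp.last_name, pos.title
FROM linkedin_profile_position3 pos
JOIN linkedin_profile lp ON lp.id = pos.linkedin_profile_id
WHERE pos.linkedin_company_id = (
        -- Indexed lookup (listed as ~300ms in the fast-queries table)
        SELECT id
        FROM linkedin_company
        WHERE universal_name = 'stripe'
        LIMIT 1
      )
  AND pos.end_date IS NULL   -- current positions only
LIMIT 100;
```

If the subquery returns no row, the comparison evaluates to NULL and the query returns an empty result rather than an error.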
- ### 🔑 Key Functions
-
- ```sql
- -- Convert slug to indexed key64 for fast lookups
- key64('stripe') -- Returns bigint for index lookup
-
- -- Example: Fast profile lookup
- WHERE slug_key64 = key64('satyanadella')
-
- -- Example: Fast company lookup
- WHERE slug_key64 = key64('meta')
- ```
-
- ---
-
- ## Common Gotchas
-
- 1. **Use `company_name` not `name`** - The column is `company_name` on linkedin_company
- 2. **Use `slug_key64` for slug lookups** - Raw `slug` column is not indexed
- 3. **Use `linkedin_profile_position3`** - position2 is legacy
- 4. **Use `industry_code`** - Not `linkedin_industry_id` on linkedin_company
- 5. **NULL `end_date` = current position** - Check for current employees
- 6. **Email/phone tables are restricted** - Cannot access contact data directly
- 7. **Always use LIMIT** - Tables are massive
- 8. **Use `ILIKE` for case-insensitive** - Names vary in casing
-
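Several of the points above can be combined in a single query. The following is only a sketch assembled from columns used elsewhere in this guide. It assumes `stripe` is a valid company slug, and it has not been timed against the real tables.

```sql
-- Sketch: current employees of a company identified by its LinkedIn slug.
SELECT lc.company_name,                  -- gotcha 1: company_name, not name
       li.name AS industry_name,
       lp.first_name, lp.last_name, pos.title
FROM linkedin_company_slug lcs
JOIN linkedin_company lc ON lc.id = lcs.linkedin_company_id
LEFT JOIN linkedin_industry li
       ON li.id = lc.industry_code       -- gotcha 4: industry_code
JOIN linkedin_profile_position3 pos      -- gotcha 3: position3 table
       ON pos.linkedin_company_id = lc.id
JOIN linkedin_profile lp ON lp.id = pos.linkedin_profile_id
WHERE lcs.slug_key64 = key64('stripe')   -- gotcha 2: indexed slug_key64
  AND pos.end_date IS NULL               -- gotcha 5: current positions
LIMIT 20;                                -- gotcha 7: always bound the result
```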
- ---
-
- ## Schema Exploration
-
- ```sql
- -- List accessible tables
- SELECT table_name
- FROM information_schema.table_privileges
- WHERE grantee = 'jzt2be9botwq' AND privilege_type = 'SELECT'
- ORDER BY table_name;
-
- -- Describe a table
- SELECT column_name, data_type, is_nullable
- FROM information_schema.columns
- WHERE table_name = 'linkedin_profile';
-
- -- Check table indexes
- SELECT indexname, indexdef
- FROM pg_indexes
- WHERE tablename = 'linkedin_company';
-
- -- Check table row counts
- SELECT relname AS table_name, reltuples::bigint AS estimated_rows
- FROM pg_class
- WHERE relkind = 'r' AND relnamespace = 'public'::regnamespace
- ORDER BY reltuples DESC
- LIMIT 25;
- ```
-
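As a complement to the `table_privileges` query above, PostgreSQL's built-in `has_table_privilege` function checks a single table for the current session user, which avoids hard-coding a grantee name. This is a sketch, assuming the table names from the Access Permissions section exist and are visible on the current search path. The function raises an error if the named table does not exist.

```sql
-- Sketch: check SELECT access for the current user, one table at a time.
SELECT has_table_privilege('linkedin_company', 'SELECT') AS can_read_company;

-- A table listed under "Tables WITHOUT Access" would be expected to
-- return false here.
SELECT has_table_privilege('email_address', 'SELECT') AS can_read_email;
```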
- ---
-
- ## Related Documentation
-
- - **[B2B_SCHEMA.md](./B2B_SCHEMA.md)** - Complete schema reference with all columns and indexes