orangeslice 1.6.0 → 1.7.0

@@ -0,0 +1,250 @@
1
+ # Research Strategies
2
+
3
+ Patterns for prospecting, enrichment, and social listening.
4
+
5
+ ---
6
+
7
+ ## Prospecting: Two Approaches
8
+
9
+ ### 1. Direct Query with Filters (Preferred)
10
+
11
+ Use when the criteria are directly searchable:
12
+
13
+ - **Google dorking** — `"AI CRM" site:linkedin.com/company`
14
+ - **B2B database** — industry, company size, funding, job titles
15
+
16
+ ```typescript
17
+ // Direct: Companies in fintech with 50-500 employees
18
+ const companies = await orangeslice.b2b.sql(`
19
+ SELECT * FROM linkedin_company
20
+ WHERE industry ILIKE '%fintech%'
21
+ AND employee_count BETWEEN 50 AND 500
22
+ LIMIT 100
23
+ `);
24
+ ```
25
+
26
+ ### 2. Search → Enrich → Qualify
27
+
28
+ Use when criteria can't be searched directly:
29
+
30
+ - "Companies that recently switched CRMs"
31
+ - "Are they actively hiring for this role?"
32
+ - "Do they use [specific tool]?"
33
+
34
+ ```typescript
35
+ // 1. Broad search
36
+ const { results } = await orangeslice.serp.search(
37
+ `"switched to Salesforce" site:linkedin.com/posts`
38
+ );
39
+
40
+ // 2. Enrich each result
41
+ for (const result of results) {
42
+ const companyName = extractCompanyFromPost(result);
43
+ const company = await orangeslice.b2b.sql(`
44
+ SELECT * FROM linkedin_company WHERE company_name ILIKE '%${companyName}%'
45
+ `);
46
+
47
+ // 3. Qualify
48
+ if (company[0]?.employee_count > 100) {
49
+ // Add to qualified list
50
+ }
51
+ }
52
+ ```
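The enrichment step above interpolates `companyName`, which comes from scraped post text, directly into the SQL string. A name containing a single quote would break the query, and `%` or `_` would act as wildcards. The following is a minimal sketch of one way to build the `ILIKE` pattern more safely. It assumes `orangeslice.b2b.sql` accepts only a raw SQL string (as in the examples here) and that the backend follows PostgreSQL quoting rules with backslash as the default `LIKE` escape character, which the use of `ILIKE` suggests but the source does not confirm. `toIlikeContains` is a hypothetical helper, not part of orangeslice.

```typescript
// Hypothetical helper (not an orangeslice API): builds a quoted
// '%value%' literal for use after ILIKE, assuming PostgreSQL-style rules.
function toIlikeContains(value: string): string {
  const escaped = value
    .replace(/\\/g, "\\\\")   // escape the LIKE escape character first
    .replace(/[%_]/g, "\\$&") // neutralize LIKE wildcards in the input
    .replace(/'/g, "''");     // double single quotes for the SQL literal
  return `'%${escaped}%'`;
}

// Usage (sketch): ... WHERE company_name ILIKE ${toIlikeContains(companyName)}
```

If the SDK offers parameterized queries, those would be preferable to manual escaping; nothing in this diff shows whether it does.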
53
+
54
+ ---
55
+
56
+ ## Data Enrichment Pattern
57
+
58
+ **Standard: Search → Scrape → Extract**
59
+
60
+ ```typescript
61
+ // 1. Find relevant pages
62
+ const { results } = await orangeslice.serp.search(
63
+ `site:${domain} "practice areas" "medical malpractice"`
64
+ );
65
+
66
+ // 2. Scrape the top result
67
+ const { markdown } = await orangeslice.firecrawl.scrape(results[0].link);
68
+
69
+ // 3. Extract structured data
70
+ const data = await orangeslice.generateObject.generate({
71
+ prompt: `Does this law firm handle medical malpractice cases?\n\n${markdown}`,
72
+ schema: {
73
+ type: "object",
74
+ properties: {
75
+ handles_med_mal: { type: "boolean" },
76
+ practice_areas: { type: "array", items: { type: "string" } }
77
+ }
78
+ }
79
+ });
80
+ ```
81
+
82
+ ### When to Use Each Tool
83
+
84
+ | Use Search → Scrape → Extract | Use `browser.execute` |
85
+ |------------------------------|----------------------|
86
+ | Data spread across unknown pages | Same template across pages |
87
+ | Varied/unknown page structure | Need specific CSS selectors |
88
+ | One-off enrichment | Scraping lists/tables |
89
+
90
+ ---
91
+
92
+ ## Social Listening
93
+
94
+ Find posts mentioning topics, brands, or competitors.
95
+
96
+ ### Finding Posts: Use Dorking
97
+
98
+ ```
99
+ # LinkedIn posts mentioning topic
100
+ "AI sales tools" site:linkedin.com/posts
101
+
102
+ # Twitter/X posts
103
+ "competitor name" site:x.com inurl:status
104
+
105
+ # Reddit discussions
106
+ "product name" site:reddit.com
107
+ ```
108
+
109
+ ### Common Problem: Sellers vs. Complainers
110
+
111
+ Users want people **complaining about** tools. Searches return mostly **people selling** alternatives.
112
+
113
+ **Solution: Filter with verification**
114
+
115
+ ```typescript
116
+ const { results } = await orangeslice.serp.search(
117
+ `"hate Salesforce" OR "frustrated with Salesforce" site:linkedin.com/posts`
118
+ );
119
+
120
+ for (const post of results) {
121
+ // Get author info
122
+ const authorUrl = extractAuthorUrl(post.link);
123
+ const profile = await orangeslice.b2b.sql(`...`);
124
+
125
+ // Filter out salespeople
126
+   if (!profile[0]?.headline?.toLowerCase().includes('sales')) {
127
+ // This is likely a real complainer
128
+ }
129
+ }
130
+ ```
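A plain substring check for `sales` also matches headlines such as "Salesforce Administrator", which is exactly the kind of author a Salesforce complaint search is trying to keep. The sketch below matches whole words instead. Only the `sales` keyword comes from the example above; the other keywords and the name `looksLikeSeller` are illustrative assumptions and should be tuned to the audience.

```typescript
// Hypothetical keyword list; only "sales" appears in the original example.
const SELLER_KEYWORDS = ["sales", "account executive", "business development", "sdr", "bdr"];

// Returns true when the headline contains a seller keyword as a whole word.
function looksLikeSeller(headline: string | null | undefined): boolean {
  if (!headline) return false; // unknown headline: not classified as a seller
  const text = headline.toLowerCase();
  return SELLER_KEYWORDS.some((kw) => new RegExp(`\\b${kw}\\b`).test(text));
}
```

Treating a missing headline as "not a seller" mirrors the behavior of the loop above; a stricter pipeline might skip such posts instead.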
131
+
132
+ ---
133
+
134
+ ## Company Research Checklist
135
+
136
+ ```typescript
137
+ async function researchCompany(domain: string) {
138
+ // 1. Basic company info
139
+ const company = await orangeslice.b2b.sql(`
140
+ SELECT * FROM linkedin_company WHERE domain = '${domain}'
141
+ `);
142
+
143
+ // 2. Leadership team
144
+ const leadership = await orangeslice.b2b.sql(`
145
+ SELECT lp.first_name, lp.last_name, pos.title
146
+ FROM linkedin_profile lp
147
+ JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
148
+ WHERE pos.linkedin_company_id = ${company[0].id}
149
+ AND pos.end_date IS NULL
150
+ AND (pos.title ILIKE 'ceo%' OR pos.title ILIKE 'cto%' OR pos.title ILIKE '%founder%')
151
+ LIMIT 10
152
+ `);
153
+
154
+ // 3. Funding history
155
+ const funding = await orangeslice.b2b.sql(`
156
+ SELECT * FROM linkedin_crunchbase_funding
157
+ WHERE linkedin_company_id = ${company[0].id}
158
+ ORDER BY announced_date DESC
159
+ `);
160
+
161
+ // 4. Recent news
162
+ const news = await orangeslice.serp.search(
163
+ `"${company[0].company_name}" funding OR acquisition`,
164
+ { tbs: "qdr:m" }
165
+ );
166
+
167
+ // 5. Website content
168
+ const about = await orangeslice.firecrawl.scrape(`https://${domain}/about`);
169
+
170
+ // 6. Current job openings
171
+ const jobs = await orangeslice.b2b.sql(`
172
+ SELECT title, locality FROM linkedin_job
173
+ WHERE linkedin_company_id = ${company[0].id}
174
+ AND closed_since IS NULL
175
+ LIMIT 20
176
+ `);
177
+
178
+ return { company, leadership, funding, news, about, jobs };
179
+ }
180
+ ```
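`researchCompany` reads `company[0].id` without checking that the first query returned a row, so an unknown domain fails with a `TypeError` while the second query string is being built. A small guard makes that failure explicit. This sketch assumes `orangeslice.b2b.sql` resolves to an array of rows, as the `company[0]` accesses in the examples imply; `firstRowOrThrow` is a hypothetical helper.

```typescript
// Hypothetical helper: returns the first row or fails with a clear message.
function firstRowOrThrow<T>(rows: T[], what: string): T {
  const row = rows[0];
  if (row === undefined) {
    throw new Error(`No ${what} found`);
  }
  return row;
}

// Usage (sketch): const company = firstRowOrThrow(rows, `company for ${domain}`);
```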
181
+
182
+ ---
183
+
184
+ ## Person Research Checklist
185
+
186
+ ```typescript
187
+ async function researchPerson(linkedinUrl: string) {
188
+ // Extract profile ID from URL
189
+ const parts = linkedinUrl.split('/in/');
190
+ const linkedinUserId = parts[1]?.split('/')[0];
191
+
192
+ // 1. Profile info
193
+ const profile = await orangeslice.b2b.sql(`
194
+ SELECT * FROM linkedin_profile
195
+ WHERE linkedin_user_id = '${linkedinUserId}'
196
+ `);
197
+
198
+ // 2. Work history
199
+ const positions = await orangeslice.b2b.sql(`
200
+ SELECT pos.*, lc.company_name, lc.domain
201
+ FROM linkedin_profile_position3 pos
202
+ JOIN linkedin_company lc ON lc.id = pos.linkedin_company_id
203
+ WHERE pos.linkedin_profile_id = ${profile[0].id}
204
+ ORDER BY pos.start_date DESC
205
+ `);
206
+
207
+ // 3. Recent activity (posts, mentions)
208
+ const activity = await orangeslice.serp.search(
209
+ `"${profile[0].first_name} ${profile[0].last_name}" site:linkedin.com`,
210
+ { tbs: "qdr:m" }
211
+ );
212
+
213
+ return { profile, positions, activity };
214
+ }
215
+ ```
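The `split('/in/')` approach above keeps any query string or fragment in the ID (for example `jane-doe?trk=abc`), and yields `undefined` for non-profile URLs, which is then interpolated into the SQL as the text `undefined`. A minimal sketch of a stricter extraction follows; the function name is hypothetical and the regular expression only assumes the `/in/<id>` path shape used in the example.

```typescript
// Hypothetical helper: pulls the profile slug out of a LinkedIn profile URL.
// Returns null when the URL has no /in/<id> segment.
function extractLinkedinUserId(linkedinUrl: string): string | null {
  const match = linkedinUrl.match(/\/in\/([^/?#]+)/);
  return match?.[1] ?? null;
}
```

Callers can then return early on `null` instead of querying for a profile that cannot exist.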
216
+
217
+ ---
218
+
219
+ ## Batch Processing Pattern
220
+
221
+ For large lists, process in controlled batches:
222
+
223
+ ```typescript
224
+ async function processBatch<T, R>(
225
+ items: T[],
226
+ processor: (item: T) => Promise<R>,
227
+ batchSize = 10
228
+ ): Promise<R[]> {
229
+ const results: R[] = [];
230
+
231
+ for (let i = 0; i < items.length; i += batchSize) {
232
+ const batch = items.slice(i, i + batchSize);
233
+ const batchResults = await Promise.all(batch.map(processor));
234
+ results.push(...batchResults);
235
+
236
+ // Rate limiting is handled by orangeslice, but add delay between batches
237
+ if (i + batchSize < items.length) {
238
+ await new Promise(r => setTimeout(r, 1000));
239
+ }
240
+ }
241
+
242
+ return results;
243
+ }
244
+
245
+ // Usage
246
+ const domains = ["stripe.com", "ramp.com", "brex.com" /* ... */];
247
+ const companies = await processBatch(domains, async (domain) => {
248
+ return orangeslice.b2b.sql(`SELECT * FROM linkedin_company WHERE domain = '${domain}'`);
249
+ });
250
+ ```
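Because `processBatch` uses `Promise.all`, a single rejected call rejects the whole run and the results collected so far are lost. The variant below is a sketch that records per-item success or failure with `Promise.allSettled`. The names `processBatchSettled`, `Settled`, and the `delayMs` parameter are assumptions added for illustration; the batching and delay logic otherwise follows the function above.

```typescript
// Per-item outcome: either the processor's value or the error it threw.
type Settled<T, R> =
  | { item: T; ok: true; value: R }
  | { item: T; ok: false; error: unknown };

async function processBatchSettled<T, R>(
  items: T[],
  processor: (item: T) => Promise<R>,
  batchSize = 10,
  delayMs = 1000
): Promise<Settled<T, R>[]> {
  const out: Settled<T, R>[] = [];

  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // allSettled waits for every call in the batch, even if some reject
    const settled = await Promise.allSettled(batch.map((item) => processor(item)));

    settled.forEach((s, j) => {
      const item = batch[j] as T;
      out.push(
        s.status === "fulfilled"
          ? { item, ok: true, value: s.value }
          : { item, ok: false, error: s.reason }
      );
    });

    // Same pause between batches as in processBatch
    if (i + batchSize < items.length) {
      await new Promise((r) => setTimeout(r, delayMs));
    }
  }

  return out;
}
```

Failed items can then be filtered with `out.filter((r) => !r.ok)` and retried in a later pass.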
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "orangeslice",
3
- "version": "1.6.0",
4
- "description": "Turn any AI agent into a B2B sales research assistant with 1B+ LinkedIn profiles",
3
+ "version": "1.7.0",
4
+ "description": "AI agent toolkit: B2B database, SERP, web scraping, browser automation, structured AI output, Apify actors, geocoding",
5
5
  "main": "dist/index.js",
6
6
  "types": "dist/index.d.ts",
7
7
  "bin": {
@@ -1,255 +0,0 @@
1
- # B2B Cross-Table Query Test Findings
2
-
3
- Comprehensive performance comparison between normalized tables (`linkedin_profile`, `linkedin_company`) and denormalized views (`lkd_profile`, `lkd_company`) for cross-table queries.
4
-
5
- ---
6
-
7
- ## Executive Summary
8
-
9
- | Pattern | Normalized | Denormalized | Winner | Speedup |
10
- | ---------------------------------- | ---------- | ------------ | ------------ | ------- |
11
- | **Company ID lookup → employees** | 48ms | 279ms | Normalized | 5.8x |
12
- | **Company name (org) search** | 274ms | 8,600ms | Normalized | 31x |
13
- | **GIN-indexed org ILIKE** | 430ms | 29,409ms | Normalized | 68x |
14
- | **Title ILIKE (common term)** | 64ms | 313ms | Normalized | 4.9x |
15
- | **updated_at filter** | 4ms | 14ms | Normalized | 3.5x |
16
- | **Company ID direct lookup** | 4ms | 31ms | Normalized | 7.8x |
17
- | **Headline (rare term)** | 2,530ms | 1,258ms | Denormalized | 2x |
18
- | **Skill array search** | 216ms | 169ms | Denormalized | 1.3x |
19
- | **Industry + employee_count** | 742ms | 202ms | Denormalized | 3.7x |
20
- | **Headline + company size (JOIN)** | 20,205ms | 217ms | Denormalized | 93x |
21
- | **Multi-skill + company size** | 28,173ms | 1,281ms | Denormalized | 22x |
22
- | **Skill + company industry** | TIMEOUT | 3,553ms | Denormalized | ∞ |
23
- | **Complex multi-filter + company** | TIMEOUT | 4,947ms | Denormalized | ∞ |
24
- | **AI company + SF location** | TIMEOUT | 11,061ms | Denormalized | ∞ |
25
-
26
- **Key Finding**: When combining profile text filters (headline, skills) with company constraints (employee_count, industry), **denormalized JOINs are 20-90x faster** and often the only option that completes within timeout.
27
-
28
- ---
29
-
30
- ## Critical Pattern: Profile + Company Combined Filters
31
-
32
- The most important discovery: **cross-table queries with text filters perform dramatically better with denormalized tables**.
33
-
34
- ### Normalized Multi-JOIN (Often Fails)
35
-
36
- ```sql
37
- -- ❌ TIMEOUT or 20+ seconds
38
- SELECT lp.id, lp.first_name, lp.headline, lc.company_name
39
- FROM linkedin_profile lp
40
- JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
41
- JOIN linkedin_company lc ON lc.id = pos.linkedin_company_id
42
- WHERE pos.end_date IS NULL
43
- AND lp.headline ILIKE '%engineer%'
44
- AND lc.employee_count > 1000
45
- LIMIT 50
46
- -- Result: 20,205ms
47
- ```
48
-
49
- ### Denormalized JOIN (Fast)
50
-
51
- ```sql
52
- -- ✅ 217ms - 93x faster
53
- SELECT lkd.profile_id, lkd.first_name, lkd.headline, lkdc.name
54
- FROM lkd_profile lkd
55
- JOIN lkd_company lkdc ON lkdc.linkedin_company_id = lkd.linkedin_company_id
56
- WHERE lkd.headline ILIKE '%engineer%'
57
- AND lkdc.employee_count > 1000
58
- LIMIT 50
59
- -- Result: 217ms
60
- ```
61
-
62
- ---
63
-
64
- ## Test Results by Category
65
-
66
- ### A. Company-First Queries
67
-
68
- | Test | Query | Normalized | Denormalized | Winner |
69
- | ---- | ------------------------------- | ---------- | ------------ | ----------------- |
70
- | A1 | Employees at company ID | **48ms** | 279ms | Normalized (5.8x) |
71
- | A2 | Employees by company name (org) | **274ms** | 8,600ms | Normalized (31x) |
72
- | A3 | Engineers at large companies | **96ms** | 234ms | Normalized (2.4x) |
73
-
74
- **Conclusion**: For company-first queries, normalized tables win due to indexed lookups.
75
-
76
- ### B. Profile-First Queries
77
-
78
- | Test | Query | Normalized | Denormalized | Winner |
79
- | ---- | -------------------------- | ---------- | ------------ | ------------------- |
80
- | B1 | Python developers | 216ms | **169ms** | Denormalized (1.3x) |
81
- | B2 | US Data Scientists | 644ms | **557ms** | Denormalized (1.2x) |
82
- | B3 | Senior engineers + company | 4,535ms | **196ms** | Denormalized (23x) |
83
-
84
- **Conclusion**: Simple profile queries are similar; profile + company queries favor denormalized.
85
-
86
- ### C. Complex Prospecting Queries
87
-
88
- | Test | Query | Normalized | Denormalized | Winner |
89
- | ---- | ----------------------------------------- | ----------- | ------------ | ----------------- |
90
- | C1 | Decision makers at funded startups | **1,198ms** | 3,124ms | Normalized (2.6x) |
91
- | C2 | AI company employees in SF | TIMEOUT | **11,061ms** | Denormalized (∞) |
92
- | C3 | Hybrid (normalized profile + lkd_company) | 9,631ms | - | - |
93
-
94
- **Conclusion**: When the funding table is used (indexed JOIN), normalized wins. When text filters span tables, denormalized wins.
95
-
96
- ### D. Company Lookups
97
-
98
- | Test | Query | Normalized | Denormalized | Winner |
99
- | ---- | -------------------------- | ---------- | ------------ | ------------------- |
100
- | D1 | Company by ID | **4ms** | 31ms | Normalized (7.8x) |
101
- | D2 | Industry + employee filter | 742ms | **202ms** | Denormalized (3.7x) |
102
-
103
- ### E. Edge Cases
104
-
105
- | Test | Query | Normalized | Denormalized | Winner |
106
- | ---- | --------------------- | ---------- | ------------ | ------------------- |
107
- | E1 | Headline (blockchain) | 713ms | **384ms** | Denormalized (1.9x) |
108
- | E2 | Company description | 144ms | 152ms | Tie |
109
-
110
- ### F. Verification Tests
111
-
112
- | Test | Query | Normalized | Denormalized | Winner |
113
- | ---- | ---------------------------- | ---------- | ------------ | ------------------- |
114
- | F1 | Multi-skill + company size | 28,173ms | **1,281ms** | Denormalized (22x) |
115
- | F2 | Country + org (GIN) | **990ms** | 4,594ms | Normalized (4.6x) |
116
- | F3 | Title regex + company filter | 434ms | **227ms** | Denormalized (1.9x) |
117
-
118
- ### G. Index Pattern Tests
119
-
120
- | Test | Query | Normalized | Denormalized | Winner |
121
- | ---- | ------------------------- | ---------- | ------------ | ----------------- |
122
- | G1 | org ILIKE (GIN indexed) | **430ms** | 29,409ms | Normalized (68x) |
123
- | G2 | headline ILIKE (no index) | 2,530ms | **1,258ms** | Denormalized (2x) |
124
- | G3 | title ILIKE | **64ms** | 313ms | Normalized (4.9x) |
125
- | G4 | updated_at filter | **4ms** | 14ms | Normalized (3.5x) |
126
-
127
- ### H. Cross-Table JOIN Patterns
128
-
129
- | Test | Query | Normalized | Denormalized | Winner |
130
- | ---- | ------------------------- | ---------- | ------------ | ------------------ |
131
- | H1 | Headline + employee_count | 20,205ms | **217ms** | Denormalized (93x) |
132
- | H2 | Skill + company industry | TIMEOUT | **3,553ms** | Denormalized (∞) |
133
- | H3 | Multi-filter + company | TIMEOUT | **4,947ms** | Denormalized (∞) |
134
-
135
- ---
136
-
137
- ## Decision Rules for Cross-Table Queries
138
-
139
- ### Use Normalized (`linkedin_profile` + `linkedin_company` JOINs) When:
140
-
141
- 1. **Company-first lookup** - Start with company ID, get employees
142
- 2. **GIN-indexed field** - Searching `linkedin_profile.org` (company name)
143
- 3. **Indexed lookups** - `updated_at`, company ID, profile ID
144
- 4. **Title field search** - `linkedin_profile.title` is faster
145
- 5. **Indexed JOIN tables** - `linkedin_crunchbase_funding`, `linkedin_profile_position3` by company
146
-
147
- ### Use Denormalized (`lkd_profile` JOIN `lkd_company`) When:
148
-
149
- 1. **Headline + company filter** - 93x faster
150
- 2. **Skill + company constraint** - Normalized times out
151
- 3. **Multi-filter combinations** - 22x faster
152
- 4. **Industry + employee_count** - 3.7x faster
153
- 5. **Text filter spanning profile + company** - Often only option
154
-
155
- ### Never Use:
156
-
157
- 1. `lkd_profile.company_name` ILIKE - Use `linkedin_profile.org` (68x faster)
158
- 2. Normalized multi-JOIN with headline filter - Will timeout or be 20s+
159
-
160
- ---
161
-
162
- ## Recommended Query Patterns
163
-
164
- ### Pattern 1: Find Employees at Company by Name
165
-
166
- ```sql
167
- -- ✅ BEST: Use GIN-indexed org field
168
- SELECT id, first_name, title, headline, org
169
- FROM linkedin_profile
170
- WHERE org ILIKE '%Google%'
171
- LIMIT 50
172
- -- Result: 274ms
173
- ```
174
-
175
- ### Pattern 2: Find Engineers at Large Companies
176
-
177
- ```sql
178
- -- ✅ BEST: Denormalized JOIN (93x faster)
179
- SELECT lkd.profile_id, lkd.first_name, lkd.headline, lkdc.name, lkdc.employee_count
180
- FROM lkd_profile lkd
181
- JOIN lkd_company lkdc ON lkdc.linkedin_company_id = lkd.linkedin_company_id
182
- WHERE lkd.headline ILIKE '%engineer%'
183
- AND lkdc.employee_count > 1000
184
- LIMIT 50
185
- -- Result: 217ms
186
- ```
187
-
188
- ### Pattern 3: Find People with Skills at Specific Company Types
189
-
190
- ```sql
191
- -- ✅ BEST: Denormalized (normalized times out)
192
- SELECT lkd.profile_id, lkd.first_name, lkd.headline, lkdc.name
193
- FROM lkd_profile lkd
194
- JOIN lkd_company lkdc ON lkdc.linkedin_company_id = lkd.linkedin_company_id
195
- WHERE 'Python' = ANY(lkd.skills)
196
- AND 'SQL' = ANY(lkd.skills)
197
- AND lkdc.employee_count BETWEEN 100 AND 5000
198
- LIMIT 50
199
- -- Result: 1,281ms (normalized: 28,173ms)
200
- ```
201
-
202
- ### Pattern 4: Prospecting Query (Profile Criteria + Company Criteria)
203
-
204
- ```sql
205
- -- ✅ BEST: Denormalized for multi-filter
206
- SELECT lkd.profile_id, lkd.first_name, lkd.title, lkdc.name, lkdc.employee_count
207
- FROM lkd_profile lkd
208
- JOIN lkd_company lkdc ON lkdc.linkedin_company_id = lkd.linkedin_company_id
209
- WHERE lkd.title ~* '(manager|director|lead)'
210
- AND lkdc.employee_count BETWEEN 100 AND 1000
211
- LIMIT 50
212
- -- Result: 227ms (normalized: 434ms)
213
- ```
214
-
215
- ### Pattern 5: Decision Makers at Funded Startups
216
-
217
- ```sql
218
- -- ✅ BEST: Normalized when using indexed funding table
219
- SELECT DISTINCT lp.id, lp.first_name, lp.title, lc.company_name
220
- FROM linkedin_profile lp
221
- JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
222
- JOIN linkedin_company lc ON lc.id = pos.linkedin_company_id
223
- JOIN linkedin_crunchbase_funding cf ON cf.linkedin_company_id = lc.id
224
- WHERE pos.end_date IS NULL
225
- AND lp.title ~* '(CEO|CTO|VP|Director|Head)'
226
- AND lc.employee_count BETWEEN 10 AND 500
227
- LIMIT 50
228
- -- Result: 1,198ms
229
- ```
230
-
231
- ---
232
-
233
- ## Summary: The Cross-Table Golden Rules
234
-
235
- 1. **Company name search** → Always use `linkedin_profile.org` (GIN indexed, 68x faster)
236
- 2. **Headline/skill + company constraint** → Always use denormalized JOIN (20-93x faster, normalized often times out)
237
- 3. **Company-first lookups** → Use normalized (5-8x faster)
238
- 4. **Indexed table JOINs (funding, positions)** → Normalized is fine
239
- 5. **Multi-filter profile + company** → Denormalized is the only option that works
240
-
241
- ### Quick Decision:
242
-
243
- ```
244
- Need to search by company name?
245
- └─ YES → Use linkedin_profile.org
246
-
247
- Need profile text filter (headline/skills) + company constraint?
248
- └─ YES → Use lkd_profile JOIN lkd_company
249
-
250
- Need company ID lookup or indexed JOIN?
251
- └─ YES → Use normalized tables
252
-
253
- Default for prospecting queries:
254
- └─ Use lkd_profile JOIN lkd_company
255
- ```
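The quick-decision tree above can be sketched as a small function, which may be useful when an agent has to pick tables programmatically. The type names, field names, and return labels are hypothetical; the order of the checks and the default mirror the tree as written in this removed document, not any orangeslice API.

```typescript
type TableChoice = "linkedin_profile.org" | "denormalized" | "normalized";

// Hypothetical description of what a query needs.
interface QueryShape {
  companyNameSearch?: boolean; // searching by company name
  profileTextFilter?: boolean; // headline or skills filter
  companyConstraint?: boolean; // employee_count, industry, and similar
  indexedLookup?: boolean;     // company ID lookup or indexed JOIN
}

function chooseTables(q: QueryShape): TableChoice {
  if (q.companyNameSearch) return "linkedin_profile.org";
  if (q.profileTextFilter && q.companyConstraint) return "denormalized"; // lkd_profile JOIN lkd_company
  if (q.indexedLookup) return "normalized";
  return "denormalized"; // default for prospecting queries
}
```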