orangeslice 2.1.0 → 2.1.2
- package/README.md +10 -0
- package/dist/cli.js +34 -2
- package/dist/crunchbase.d.ts +10 -0
- package/dist/crunchbase.js +13 -0
- package/dist/index.d.ts +6 -0
- package/dist/index.js +7 -1
- package/docs/integrations/gmail/index.md +1 -1
- package/docs/integrations/gmail/sendEmail.md +13 -13
- package/docs/prospecting/index.md +24 -16
- package/docs/services/company/linkedin/search.md +32 -0
- package/docs/services/crunchbase/search.md +337 -0
- package/docs/services/index.md +1 -1
- package/docs/services/person/linkedin/search.md +32 -0
- package/package.json +1 -1
- package/docs/providers/predictleads/openapi.json +0 -13209
- package/docs/services/healthcare/npi.md +0 -190
|
@@ -0,0 +1,337 @@
|
|
|
1
|
+
---
|
|
2
|
+
description: Search Crunchbase with SQL
|
|
3
|
+
---
|
|
4
|
+
|
|
5
|
+
# Crunchbase Search
|
|
6
|
+
|
|
7
|
+
Run SQL against `public.crunchbase_scraper_lean` for startup/company prospecting.
|
|
8
|
+
|
|
9
|
+
```typescript
|
|
10
|
+
const rows = await services.crunchbase.search({
|
|
11
|
+
sql: `
|
|
12
|
+
SELECT name, website_url, linkedin_url
|
|
13
|
+
FROM public.crunchbase_scraper_lean
|
|
14
|
+
WHERE operating_status = 'active'
|
|
15
|
+
LIMIT 25
|
|
16
|
+
`
|
|
17
|
+
});
|
|
18
|
+
|
|
19
|
+
// rows: Record<string, unknown>[]
|
|
20
|
+
return rows;
|
|
21
|
+
```
|
|
22
|
+
|
|
23
|
+
## Contract (Hard Rules)
|
|
24
|
+
|
|
25
|
+
- Query **only** `public.crunchbase_scraper_lean`.
|
|
26
|
+
- **Only one statement** is allowed.
|
|
27
|
+
- **Only SELECT** queries are allowed (`WITH ... SELECT` is fine).
|
|
28
|
+
- Always include `LIMIT` (recommended `<= 100`).
|
|
29
|
+
- This is an external service path, not `ctx.sql()`.
|
|
30
|
+
- Each returned row costs 1 credit (the reserve estimate is derived from `LIMIT`).
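
The rules above can be pre-checked on the client before any credits are reserved. The following is a minimal sketch and not part of the orangeslice package: `checkCrunchbaseSql` and `SqlCheck` are hypothetical names, and the checks are plain text heuristics (for instance, a semicolon inside a string literal would be misread as a second statement). The service itself remains the authority on what it accepts.

```typescript
// Hypothetical client-side pre-check; the service enforces its own rules.
const ALLOWED_TABLE = "public.crunchbase_scraper_lean";

interface SqlCheck {
  ok: boolean;
  problems: string[];
  // Upper bound on credits (1 per returned row) read from LIMIT; null if no LIMIT was found.
  maxCredits: number | null;
}

function checkCrunchbaseSql(sql: string): SqlCheck {
  const problems: string[] = [];
  // Drop one trailing semicolon so "... LIMIT 10;" is not flagged as two statements.
  const body = sql.trim().replace(/;\s*$/, "");

  if (body.includes(";")) problems.push("only one statement is allowed");
  if (!/^(select|with)\b/i.test(body)) problems.push("only SELECT queries are allowed");
  // Presence check only: this does not prove that no other table is referenced.
  if (!body.toLowerCase().includes(ALLOWED_TABLE)) {
    problems.push(`query must read from ${ALLOWED_TABLE}`);
  }

  // Keep the last LIMIT in the text, which is the outer one for typical queries.
  const limitPattern = /\blimit\s+(\d+)/gi;
  let maxCredits: number | null = null;
  let match: RegExpExecArray | null;
  while ((match = limitPattern.exec(body)) !== null) {
    maxCredits = Number(match[1]);
  }
  if (maxCredits === null) problems.push("LIMIT is required");

  return { ok: problems.length === 0, problems, maxCredits };
}
```

A caller might run `checkCrunchbaseSql(sql)` first and pass the query to `services.crunchbase.search({ sql })` only when `ok` is true; `maxCredits` mirrors the 1-credit-per-row rule as an upper bound.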
|
|
31
|
+
|
|
32
|
+
## Return Type
|
|
33
|
+
|
|
34
|
+
`services.crunchbase.search()` returns rows directly:
|
|
35
|
+
|
|
36
|
+
```typescript
|
|
37
|
+
Record<string, unknown>[]
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
No `{ rows, count }` envelope.
|
|
41
|
+
|
|
42
|
+
```typescript
|
|
43
|
+
const rows = await services.crunchbase.search({ sql: "SELECT name FROM public.crunchbase_scraper_lean LIMIT 10" });
|
|
44
|
+
const count = rows.length;
|
|
45
|
+
```
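
Because each row arrives as `Record<string, unknown>`, callers usually narrow the values before using them. The sketch below shows one way to do that for a query selecting `name`, `website_url`, and `linkedin_url`; the `CompanyLink` shape and the `textOrNull` and `toCompanyLinks` helpers are illustrative assumptions, not part of the package.

```typescript
// Hypothetical target shape for rows selecting name, website_url, linkedin_url.
interface CompanyLink {
  name: string | null;
  websiteUrl: string | null;
  linkedinUrl: string | null;
}

// Return the value if it is a string, otherwise null (covers SQL NULL and missing keys).
function textOrNull(value: unknown): string | null {
  return typeof value === "string" ? value : null;
}

// Map untyped rows to the typed shape without trusting any field blindly.
function toCompanyLinks(rows: Record<string, unknown>[]): CompanyLink[] {
  return rows.map((row) => ({
    name: textOrNull(row["name"]),
    websiteUrl: textOrNull(row["website_url"]),
    linkedinUrl: textOrNull(row["linkedin_url"]),
  }));
}
```

All three columns are nullable `text` in the schema, which is why every field in the sketch is typed `string | null`.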
|
|
46
|
+
|
|
47
|
+
## Live Schema (Verified)
|
|
48
|
+
|
|
49
|
+
Source of truth: live DB introspection of `public.crunchbase_scraper_lean`.
|
|
50
|
+
|
|
51
|
+
| Column | Type | Nullable |
|
|
52
|
+
| ---------------------------- | ------------- | -------- |
|
|
53
|
+
| `id` | `bigint` | no |
|
|
54
|
+
| `uuid` | `text` | yes |
|
|
55
|
+
| `name` | `text` | yes |
|
|
56
|
+
| `link` | `text` | yes |
|
|
57
|
+
| `type` | `text` | yes |
|
|
58
|
+
| `operating_status` | `text` | yes |
|
|
59
|
+
| `company_type` | `text` | yes |
|
|
60
|
+
| `short_description` | `text` | yes |
|
|
61
|
+
| `description` | `text` | yes |
|
|
62
|
+
| `website_url` | `text` | yes |
|
|
63
|
+
| `linkedin_url` | `text` | yes |
|
|
64
|
+
| `twitter_url` | `text` | yes |
|
|
65
|
+
| `facebook_url` | `text` | yes |
|
|
66
|
+
| `contact_email` | `text` | yes |
|
|
67
|
+
| `phone_number` | `text` | yes |
|
|
68
|
+
| `hq_postal_code` | `text` | yes |
|
|
69
|
+
| `primary_category` | `text` | yes |
|
|
70
|
+
| `categories` | `jsonb` | no |
|
|
71
|
+
| `category_groups` | `jsonb` | no |
|
|
72
|
+
| `location_identifiers` | `jsonb` | no |
|
|
73
|
+
| `location_group_identifiers` | `jsonb` | no |
|
|
74
|
+
| `num_employees_enum` | `integer` | yes |
|
|
75
|
+
| `revenue_range` | `text` | yes |
|
|
76
|
+
| `funding_stage` | `text` | yes |
|
|
77
|
+
| `funding_total_usd` | `numeric` | yes |
|
|
78
|
+
| `last_funding_total_usd` | `numeric` | yes |
|
|
79
|
+
| `last_funding_type` | `text` | yes |
|
|
80
|
+
| `last_funding_date` | `date` | yes |
|
|
81
|
+
| `num_funding_rounds` | `integer` | yes |
|
|
82
|
+
| `num_investors` | `integer` | yes |
|
|
83
|
+
| `num_lead_investors` | `integer` | yes |
|
|
84
|
+
| `rank_org_company` | `integer` | yes |
|
|
85
|
+
| `rank_org` | `integer` | yes |
|
|
86
|
+
| `rank_delta_d7` | `integer` | yes |
|
|
87
|
+
| `rank_delta_d30` | `integer` | yes |
|
|
88
|
+
| `rank_delta_d90` | `integer` | yes |
|
|
89
|
+
| `growth_score_tier` | `text` | yes |
|
|
90
|
+
| `heat_score_tier` | `text` | yes |
|
|
91
|
+
| `ipo_status` | `text` | yes |
|
|
92
|
+
| `went_public_on` | `date` | yes |
|
|
93
|
+
| `imported_at` | `timestamptz` | no |
|
|
94
|
+
|
|
95
|
+
## Enum Catalog (Verified Distinct Values)
|
|
96
|
+
|
|
97
|
+
These values were observed in live production data.
|
|
98
|
+
|
|
99
|
+
### `operating_status`
|
|
100
|
+
|
|
101
|
+
- `active`
|
|
102
|
+
- `closed`
|
|
103
|
+
|
|
104
|
+
### `company_type`
|
|
105
|
+
|
|
106
|
+
- `for_profit`
|
|
107
|
+
- `non_profit`
|
|
108
|
+
|
|
109
|
+
### `type`
|
|
110
|
+
|
|
111
|
+
- `organization`
|
|
112
|
+
|
|
113
|
+
### `funding_stage`
|
|
114
|
+
|
|
115
|
+
- `seed`
|
|
116
|
+
- `early_stage_venture`
|
|
117
|
+
- `m_and_a`
|
|
118
|
+
- `late_stage_venture`
|
|
119
|
+
- `ipo`
|
|
120
|
+
|
|
121
|
+
### `last_funding_type`
|
|
122
|
+
|
|
123
|
+
- `seed`
|
|
124
|
+
- `series_a`
|
|
125
|
+
- `series_b`
|
|
126
|
+
- `series_c`
|
|
127
|
+
|
|
128
|
+
### `revenue_range`
|
|
129
|
+
|
|
130
|
+
- `r_00000000`
|
|
131
|
+
- `r_00001000`
|
|
132
|
+
- `r_00010000`
|
|
133
|
+
- `r_00050000`
|
|
134
|
+
- `r_00100000`
|
|
135
|
+
- `r_00500000`
|
|
136
|
+
- `r_01000000`
|
|
137
|
+
- `r_10000000`
|
|
138
|
+
|
|
139
|
+
### `growth_score_tier`
|
|
140
|
+
|
|
141
|
+
- `c100_high`
|
|
142
|
+
- `c200_medium`
|
|
143
|
+
- `c300_low`
|
|
144
|
+
|
|
145
|
+
### `heat_score_tier`
|
|
146
|
+
|
|
147
|
+
- `c100_high`
|
|
148
|
+
- `c200_medium`
|
|
149
|
+
- `c300_low`
|
|
150
|
+
|
|
151
|
+
### `ipo_status`
|
|
152
|
+
|
|
153
|
+
- `private`
|
|
154
|
+
- `public`
|
|
155
|
+
- `delisted`
|
|
156
|
+
|
|
157
|
+
### `num_employees_enum`
|
|
158
|
+
|
|
159
|
+
Column exists, but currently sparse/null in this dataset.
|
|
160
|
+
|
|
161
|
+
## JSONB Array Fields
|
|
162
|
+
|
|
163
|
+
`categories`, `category_groups`, `location_identifiers`, and `location_group_identifiers` are `jsonb` arrays.
|
|
164
|
+
|
|
165
|
+
Do **not** treat them as `text[]` with `&& ARRAY[...]::text[]`.
|
|
166
|
+
Use `jsonb_array_elements_text(...)` with `EXISTS`, for example:
|
|
167
|
+
|
|
168
|
+
```sql
|
|
169
|
+
AND EXISTS (
|
|
170
|
+
SELECT 1
|
|
171
|
+
FROM jsonb_array_elements_text(categories) AS c(category)
|
|
172
|
+
WHERE category IN ('Health Care', 'Biotechnology')
|
|
173
|
+
)
|
|
174
|
+
```
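
When the tag or location values come from user input, the `EXISTS` clause can be generated rather than hand-written. The following is a sketch under stated assumptions: `jsonbArrayHasAny` and `sqlLiteral` are hypothetical helpers, the column union simply mirrors the four `jsonb` array columns listed above, and literals are escaped by doubling single quotes, which assumes PostgreSQL's default `standard_conforming_strings` setting.

```typescript
// The four jsonb array columns named in this section.
const JSONB_ARRAY_COLUMNS = [
  "categories",
  "category_groups",
  "location_identifiers",
  "location_group_identifiers",
] as const;
type JsonbArrayColumn = (typeof JSONB_ARRAY_COLUMNS)[number];

// Quote a value as a SQL string literal by doubling embedded single quotes.
function sqlLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Build an EXISTS predicate that is true when the jsonb array contains any of the values.
function jsonbArrayHasAny(column: JsonbArrayColumn, values: string[]): string {
  if (values.length === 0) throw new Error("values must not be empty");
  const list = values.map(sqlLiteral).join(", ");
  return `EXISTS (SELECT 1 FROM jsonb_array_elements_text(${column}) AS e(item) WHERE item IN (${list}))`;
}
```

The returned string can be appended to a `WHERE` clause with `AND`, in the same position as the hand-written snippet above.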
|
|
175
|
+
|
|
176
|
+
## Recommended Query Patterns
|
|
177
|
+
|
|
178
|
+
| Pattern | Why |
|
|
179
|
+
| ------------------------------------------------------- | ---------------------------------- |
|
|
180
|
+
| Equality / `IN` filters on enum columns | Fast and stable |
|
|
181
|
+
| Date windows on `last_funding_date` | Strong recency control |
|
|
182
|
+
| Numeric ranges on `funding_total_usd` | Good segmentation |
|
|
183
|
+
| `EXISTS + jsonb_array_elements_text` for tags/locations | Works with current schema |
|
|
184
|
+
| Explicit narrow column lists | Lower payload and faster execution |
|
|
185
|
+
|
|
186
|
+
## Banned / Avoided Patterns
|
|
187
|
+
|
|
188
|
+
| Pattern | Why | Better Alternative |
|
|
189
|
+
| ---------------------------------------------------------------------------- | ----------------------------------- | --------------------------------------------------- |
|
|
190
|
+
| Missing `LIMIT` | Unbounded scans + excessive credits | Always add `LIMIT` |
|
|
191
|
+
| `SELECT *` for production pulls | Larger payload and cost | Select only needed columns |
|
|
192
|
+
| Leading-wildcard scans on long text (`ILIKE '%term%'`) across broad dataset | Expensive text scans | Use enum/date/range filters first, then narrow text |
|
|
193
|
+
| Heavy aggregations (`COUNT(*)`, `DISTINCT`, wide `GROUP BY`) on large slices | Slow and expensive | Pull scoped rows, aggregate in code |
|
|
194
|
+
| Unscoped global sorts on large sets | Expensive sort operations | Filter first, sort smaller result sets |
|
|
195
|
+
| Multi-table joins for routine prospecting | More planner risk and latency | Stay on lean table only |
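
The "aggregate in code" alternative in the table above can be as small as a single pass over the returned rows. This is a minimal sketch; `countBy` is a hypothetical helper, and grouping SQL `NULL` under the label `"(null)"` is an arbitrary choice made for illustration.

```typescript
// Count rows per distinct value of one column, entirely in application code.
function countBy(rows: Record<string, unknown>[], column: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    const raw = row[column];
    // Group NULL and missing values under one placeholder key.
    const key = raw === null || raw === undefined ? "(null)" : String(raw);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

For example, after pulling a scoped, `LIMIT`-bounded slice that includes `funding_stage`, `countBy(rows, "funding_stage")` yields the per-stage totals without running `GROUP BY` on the service side.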
|
|
196
|
+
|
|
197
|
+
## Canonical Prospecting Queries
|
|
198
|
+
|
|
199
|
+
### 1) US early-stage SaaS/AI, currently active
|
|
200
|
+
|
|
201
|
+
```sql
|
|
202
|
+
SELECT
|
|
203
|
+
name,
|
|
204
|
+
website_url,
|
|
205
|
+
linkedin_url,
|
|
206
|
+
funding_stage,
|
|
207
|
+
num_employees_enum,
|
|
208
|
+
last_funding_date
|
|
209
|
+
FROM public.crunchbase_scraper_lean
|
|
210
|
+
WHERE operating_status = 'active'
|
|
211
|
+
AND funding_stage IN ('seed', 'early_stage_venture')
|
|
212
|
+
AND EXISTS (
|
|
213
|
+
SELECT 1
|
|
214
|
+
FROM jsonb_array_elements_text(categories) AS c(category)
|
|
215
|
+
WHERE category IN ('SaaS', 'Artificial Intelligence (AI)')
|
|
216
|
+
)
|
|
217
|
+
AND EXISTS (
|
|
218
|
+
SELECT 1
|
|
219
|
+
FROM jsonb_array_elements_text(location_identifiers) AS l(location)
|
|
220
|
+
WHERE location = 'United States'
|
|
221
|
+
)
|
|
222
|
+
LIMIT 100;
|
|
223
|
+
```
|
|
224
|
+
|
|
225
|
+
### 2) Recently funded (last 12 months)
|
|
226
|
+
|
|
227
|
+
```sql
|
|
228
|
+
SELECT
|
|
229
|
+
name,
|
|
230
|
+
website_url,
|
|
231
|
+
last_funding_type,
|
|
232
|
+
last_funding_date,
|
|
233
|
+
last_funding_total_usd,
|
|
234
|
+
funding_total_usd
|
|
235
|
+
FROM public.crunchbase_scraper_lean
|
|
236
|
+
WHERE operating_status = 'active'
|
|
237
|
+
AND last_funding_date >= CURRENT_DATE - INTERVAL '12 months'
|
|
238
|
+
AND last_funding_type IN ('seed', 'series_a', 'series_b')
|
|
239
|
+
ORDER BY last_funding_date DESC NULLS LAST
|
|
240
|
+
LIMIT 100;
|
|
241
|
+
```
|
|
242
|
+
|
|
243
|
+
### 3) Bay Area fintech companies with meaningful funding
|
|
244
|
+
|
|
245
|
+
```sql
|
|
246
|
+
SELECT
|
|
247
|
+
name,
|
|
248
|
+
website_url,
|
|
249
|
+
funding_stage,
|
|
250
|
+
funding_total_usd,
|
|
251
|
+
num_employees_enum
|
|
252
|
+
FROM public.crunchbase_scraper_lean
|
|
253
|
+
WHERE operating_status = 'active'
|
|
254
|
+
AND EXISTS (
|
|
255
|
+
SELECT 1
|
|
256
|
+
FROM jsonb_array_elements_text(categories) AS c(category)
|
|
257
|
+
WHERE category IN ('FinTech', 'Financial Services')
|
|
258
|
+
)
|
|
259
|
+
AND EXISTS (
|
|
260
|
+
SELECT 1
|
|
261
|
+
FROM jsonb_array_elements_text(location_group_identifiers) AS g(location_group)
|
|
262
|
+
WHERE location_group = 'San Francisco Bay Area'
|
|
263
|
+
)
|
|
264
|
+
AND funding_total_usd >= 5000000
|
|
265
|
+
LIMIT 75;
|
|
266
|
+
```
|
|
267
|
+
|
|
268
|
+
### 4) Non-profits with health focus
|
|
269
|
+
|
|
270
|
+
```sql
|
|
271
|
+
SELECT
|
|
272
|
+
name,
|
|
273
|
+
website_url,
|
|
274
|
+
company_type,
|
|
275
|
+
categories,
|
|
276
|
+
location_identifiers
|
|
277
|
+
FROM public.crunchbase_scraper_lean
|
|
278
|
+
WHERE company_type = 'non_profit'
|
|
279
|
+
AND EXISTS (
|
|
280
|
+
SELECT 1
|
|
281
|
+
FROM jsonb_array_elements_text(categories) AS c(category)
|
|
282
|
+
WHERE category ILIKE ANY (ARRAY['%health%', '%medical%', '%biotech%', '%pharma%', '%telemedicine%'])
|
|
283
|
+
)
|
|
284
|
+
LIMIT 100;
|
|
285
|
+
```
|
|
286
|
+
|
|
287
|
+
### 5) Healthtech seed to series B (safe column set)
|
|
288
|
+
|
|
289
|
+
```sql
|
|
290
|
+
SELECT
|
|
291
|
+
name,
|
|
292
|
+
website_url,
|
|
293
|
+
linkedin_url,
|
|
294
|
+
short_description,
|
|
295
|
+
funding_stage,
|
|
296
|
+
last_funding_type,
|
|
297
|
+
last_funding_date,
|
|
298
|
+
funding_total_usd,
|
|
299
|
+
num_employees_enum,
|
|
300
|
+
categories,
|
|
301
|
+
location_identifiers,
|
|
302
|
+
num_investors,
|
|
303
|
+
num_funding_rounds
|
|
304
|
+
FROM public.crunchbase_scraper_lean
|
|
305
|
+
WHERE operating_status = 'active'
|
|
306
|
+
AND last_funding_type IN ('seed', 'series_a', 'series_b')
|
|
307
|
+
AND EXISTS (
|
|
308
|
+
SELECT 1
|
|
309
|
+
FROM jsonb_array_elements_text(categories) AS c(category)
|
|
310
|
+
WHERE category ILIKE ANY (ARRAY['%health%', '%medical%', '%biotech%', '%pharma%', '%telemedicine%'])
|
|
311
|
+
)
|
|
312
|
+
ORDER BY last_funding_date DESC NULLS LAST
|
|
313
|
+
LIMIT 100;
|
|
314
|
+
```
|
|
315
|
+
|
|
316
|
+
## Usage Pattern (Spreadsheet Code)
|
|
317
|
+
|
|
318
|
+
```typescript
|
|
319
|
+
const rows = await services.crunchbase.search({
|
|
320
|
+
sql: `
|
|
321
|
+
SELECT name, website_url, linkedin_url
|
|
322
|
+
FROM public.crunchbase_scraper_lean
|
|
323
|
+
WHERE operating_status = 'active'
|
|
324
|
+
LIMIT 20
|
|
325
|
+
`
|
|
326
|
+
});
|
|
327
|
+
|
|
328
|
+
// rows is already an array of objects
|
|
329
|
+
return rows;
|
|
330
|
+
```
|
|
331
|
+
|
|
332
|
+
## Troubleshooting
|
|
333
|
+
|
|
334
|
+
- `column "..." does not exist` -> you are using an old or nonexistent column name; check the column list under "Live Schema (Verified)".
|
|
335
|
+
- `only public.crunchbase_scraper_lean is allowed` -> query references a disallowed table.
|
|
336
|
+
- `only SELECT queries are allowed` -> remove `INSERT/UPDATE/DELETE`, keep read-only SQL.
|
|
337
|
+
- Empty results with no error -> usually value casing mismatch (use lowercase enum values like `active`, `series_a`).
|
package/docs/services/index.md
CHANGED
|
@@ -2,12 +2,12 @@
|
|
|
2
2
|
- **apify**: Run any of 10,000+ Apify actors for web scraping, social media, e-commerce, and more.
|
|
3
3
|
- **browser**: Kernel browser automation - spin up cloud browsers, execute Playwright code, take screenshots. **Use this for scraping structured lists of repeated data** (e.g., product listings, search results, table rows) where you know the DOM structure. Also ideal for **intercepting network requests** to discover underlying APIs, then paginate those APIs directly in your code (faster & cheaper than clicking through pages). Perfect for JS-heavy sites that don't work with simple HTTP scraping.
|
|
4
4
|
- **company**: company data (getting employees at the company, getting company data, getting open jobs).
|
|
5
|
+
- **crunchbase**: SQL search over the lean Crunchbase company table (`public.crunchbase_scraper_lean`) for startup prospecting.
|
|
5
6
|
- **person**: finding a person's LinkedIn URL, enriching it from LinkedIn, contact info, and searching for specific people / groups on LinkedIn
|
|
6
7
|
- **geo**: parsing address
|
|
7
8
|
- **googleMaps**: search businesses via Google Maps.
|
|
8
9
|
- **email**: send transactional notification emails through Orange Slice's managed sender.
|
|
9
10
|
- **scrape**: website scraper, sitemap scraper
|
|
10
11
|
- **web**: SERP
|
|
11
|
-
- **healthcare**: Query the NPI (National Provider Identifier) database for healthcare organizations by specialty, location, or name. Contains 1.8M+ providers.
|
|
12
12
|
- **predictLeads**: company intelligence datasets (financing events, technologies, products, job openings, news, and related company data).
|
|
13
13
|
- **guides**: agent notes & operational docs (see [Error Handling Cheatsheet](../error-handling-cheatsheet.md))
|
|
@@ -207,6 +207,38 @@ JOIN linkedin_company lc ON ...
|
|
|
207
207
|
- **Use `lp` alias** for person tables
|
|
208
208
|
- **Default to US**: `lp.location_country_code = 'US'`
|
|
209
209
|
|
|
210
|
+
## Return Type
|
|
211
|
+
|
|
212
|
+
`services.person.linkedin.search()` returns an object envelope:
|
|
213
|
+
|
|
214
|
+
```typescript
|
|
215
|
+
{
|
|
216
|
+
rows: Record<string, unknown>[];
|
|
217
|
+
count: number;
|
|
218
|
+
}
|
|
219
|
+
```
|
|
220
|
+
|
|
221
|
+
- `rows`: Result rows from your SQL query, with exactly the columns you selected.
|
|
222
|
+
- `count`: Number of rows returned in `rows`.
|
|
223
|
+
|
|
224
|
+
Example:
|
|
225
|
+
|
|
226
|
+
```typescript
|
|
227
|
+
const searchResult = await services.person.linkedin.search({
|
|
228
|
+
sql: `
|
|
229
|
+
SELECT
|
|
230
|
+
lp.first_name,
|
|
231
|
+
lp.last_name,
|
|
232
|
+
lp.public_profile_url AS lp_linkedin_url
|
|
233
|
+
FROM linkedin_profile lp
|
|
234
|
+
WHERE lp.location_country_code = 'US'
|
|
235
|
+
LIMIT 10
|
|
236
|
+
`
|
|
237
|
+
});
|
|
238
|
+
|
|
239
|
+
return searchResult.rows; // Most spreadsheet snippets should return rows
|
|
240
|
+
```
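
Since `services.crunchbase.search()` returns a bare array while this search returns a `{ rows, count }` envelope, code that handles both may want one normalizing step. The sketch below assumes only the two shapes documented in this diff; `toRows`, `Row`, and `SearchResult` are hypothetical names.

```typescript
type Row = Record<string, unknown>;
// Either a bare row array or the { rows, count } envelope.
type SearchResult = Row[] | { rows: Row[]; count: number };

// Return the row array regardless of which shape the service produced.
function toRows(result: SearchResult): Row[] {
  return Array.isArray(result) ? result : result.rows;
}
```

With this helper, a snippet can end in `return toRows(searchResult);` whichever service produced the result.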
|
|
241
|
+
|
|
210
242
|
---
|
|
211
243
|
|
|
212
244
|
## Table Aliases
|