orangeslice 2.1.4 → 2.2.0
- package/README.md +3 -1
- package/dist/careers.d.ts +47 -0
- package/dist/careers.js +11 -0
- package/dist/cli.js +0 -0
- package/dist/expansion.js +21 -10
- package/dist/index.d.ts +14 -0
- package/dist/index.js +20 -2
- package/dist/ocean.d.ts +166 -0
- package/dist/ocean.js +23 -0
- package/docs/data-enrichement/index.md +10 -2
- package/docs/integrations/gmail/createDraft.md +54 -0
- package/docs/integrations/gmail/fetchEmails.md +50 -0
- package/docs/integrations/gmail/fetchMessageByMessageId.md +36 -0
- package/docs/integrations/gmail/fetchMessageByThreadId.md +37 -0
- package/docs/integrations/gmail/getProfile.md +37 -0
- package/docs/integrations/gmail/index.md +19 -2
- package/docs/integrations/gmail/listLabels.md +34 -0
- package/docs/integrations/gmail/replyToThread.md +51 -0
- package/docs/integrations/hubspot/createWebhookFlow.md +137 -0
- package/docs/integrations/hubspot/index.md +8 -2
- package/docs/integrations/hubspot/updateFlow.md +13 -14
- package/docs/integrations/hubspot/updateWebhookFlow.md +52 -0
- package/docs/integrations/index.md +64 -2
- package/docs/integrations/salesforce/index.md +4 -0
- package/docs/integrations/salesforce/request.md +58 -0
- package/docs/services/ai/generateObject.ts +7 -3
- package/docs/services/ai/generateText.ts +7 -3
- package/docs/services/builtWith/index.md +2 -2
- package/docs/services/company/findCareersPage.md +137 -0
- package/docs/services/company/findCareersPage.ts +37 -0
- package/docs/services/company/linkedin/enrich.md +47 -2
- package/docs/services/company/scrapeCareersPage.md +150 -0
- package/docs/services/index.md +1 -1
- package/docs/services/person/linkedin/findUrl.md +2 -2
- package/docs/services/web/search.md +29 -14
- package/docs/triggers-runtime.md +26 -5
- package/package.json +1 -1
package/docs/services/company/findCareersPage.md
|
@@ -0,0 +1,137 @@
|
|
|
1
|
+
# Find Company Careers Page
|
|
2
|
+
|
|
3
|
+
Resolve a company's official careers page and, when possible, return the underlying ATS jobs page instead.
|
|
4
|
+
|
|
5
|
+
This is best when you have a company website or a specific company page and want the canonical place to browse jobs.
|
|
6
|
+
|
|
7
|
+
## Input Parameters
|
|
8
|
+
|
|
9
|
+
Provide **one** of:
|
|
10
|
+
|
|
11
|
+
| Parameter | Type | Required | Description |
|
|
12
|
+
| --------- | -------- | -------- | ------------------------------------------------------------------ |
|
|
13
|
+
| `website` | `string` | No | Company website or page URL, e.g. `stripe.com` or `https://ro.co/` |
|
|
14
|
+
| `url` | `string` | No | Alias for `website` |
|
|
15
|
+
|
|
16
|
+
**Optional:**
|
|
17
|
+
|
|
18
|
+
| Parameter | Type | Required | Description |
|
|
19
|
+
| --------- | -------- | -------- | ------------------------------------ |
|
|
20
|
+
| `timeout` | `string` | No | Batch timeout override, e.g. `"30m"` |
|
|
21
|
+
|
|
22
|
+
## Output
|
|
23
|
+
|
|
24
|
+
```typescript
|
|
25
|
+
{
|
|
26
|
+
inputUrl: string;
|
|
27
|
+
normalizedWebsiteUrl: string;
|
|
28
|
+
careerPageUrl: string | null;
|
|
29
|
+
pageType: "ats" | "official" | "not_found";
|
|
30
|
+
atsProvider: string | null;
|
|
31
|
+
detectionMethod:
|
|
32
|
+
| "input-ats"
|
|
33
|
+
| "homepage-ats-link"
|
|
34
|
+
| "homepage-careers-link"
|
|
35
|
+
| "deterministic-candidate"
|
|
36
|
+
| "candidate-ats-link"
|
|
37
|
+
| "embedded-ats"
|
|
38
|
+
| "candidate-redirect"
|
|
39
|
+
| "ats-unverified"
|
|
40
|
+
| "not-found";
|
|
41
|
+
checkedUrls: string[];
|
|
42
|
+
}
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
## Examples
|
|
46
|
+
|
|
47
|
+
### Basic Careers Lookup
|
|
48
|
+
|
|
49
|
+
```typescript
|
|
50
|
+
const result = await services.company.findCareersPage({
|
|
51
|
+
website: row.website
|
|
52
|
+
});
|
|
53
|
+
|
|
54
|
+
return result.careerPageUrl;
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
### Prefer ATS When Available
|
|
58
|
+
|
|
59
|
+
```typescript
|
|
60
|
+
const result = await services.company.findCareersPage({
|
|
61
|
+
website: "https://plaid.com"
|
|
62
|
+
});
|
|
63
|
+
|
|
64
|
+
return {
|
|
65
|
+
url: result.careerPageUrl,
|
|
66
|
+
type: result.pageType,
|
|
67
|
+
ats: result.atsProvider
|
|
68
|
+
};
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
### Handle Not Found
|
|
72
|
+
|
|
73
|
+
```typescript
|
|
74
|
+
const result = await services.company.findCareersPage({
|
|
75
|
+
website: row.website
|
|
76
|
+
});
|
|
77
|
+
|
|
78
|
+
if (result.pageType === "not_found") {
|
|
79
|
+
return null;
|
|
80
|
+
}
|
|
81
|
+
|
|
82
|
+
return result.careerPageUrl;
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
### Debug Why a Result Was Chosen
|
|
86
|
+
|
|
87
|
+
```typescript
|
|
88
|
+
const result = await services.company.findCareersPage({
|
|
89
|
+
website: row.website
|
|
90
|
+
});
|
|
91
|
+
|
|
92
|
+
return {
|
|
93
|
+
careerPageUrl: result.careerPageUrl,
|
|
94
|
+
pageType: result.pageType,
|
|
95
|
+
atsProvider: result.atsProvider,
|
|
96
|
+
detectionMethod: result.detectionMethod,
|
|
97
|
+
checkedUrls: result.checkedUrls
|
|
98
|
+
};
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
## What It Detects
|
|
102
|
+
|
|
103
|
+
- Official careers pages like `https://company.com/careers`
|
|
104
|
+
- Careers subdomains like `https://careers.company.com/`
|
|
105
|
+
- ATS boards when discoverable from the company site
|
|
106
|
+
- Embedded/wrapped ATS pages when the company site hosts the jobs UI directly
|
|
107
|
+
|
|
108
|
+
Common ATS providers currently recognized include:
|
|
109
|
+
|
|
110
|
+
- `ashby`
|
|
111
|
+
- `greenhouse`
|
|
112
|
+
- `lever`
|
|
113
|
+
- `workday`
|
|
114
|
+
- `icims`
|
|
115
|
+
- `gem`
|
|
116
|
+
- `kula`
|
|
117
|
+
- `breezy`
|
|
118
|
+
- `bamboohr`
|
|
119
|
+
- `rippling`
|
|
120
|
+
- `personio`
|
|
121
|
+
- `phenom`
|
|
122
|
+
- `smartrecruiters`
|
|
123
|
+
- `successfactors`
|
|
124
|
+
- `jobvite`
|
|
125
|
+
- `recruitee`
|
|
126
|
+
- `teamtailor`
|
|
127
|
+
- `indeed`
|
|
128
|
+
- `bestjobs`
|
|
129
|
+
- `ejobs`
|
|
130
|
+
|
|
131
|
+
## Key Rules
|
|
132
|
+
|
|
133
|
+
1. **Pass the company website when possible** - homepage/company URLs usually produce the best canonical result.
|
|
134
|
+
2. **ATS is preferred over generic careers pages** - if the company site clearly points to an ATS board, that board is returned.
|
|
135
|
+
3. **Deep location/provider pages can still work** - the resolver attempts to collapse some subdomains and detail pages back to the parent organization's careers site.
|
|
136
|
+
4. **`pageType: "official"` is still a success** - many enterprises host jobs on branded careers portals instead of a third-party ATS URL.
|
|
137
|
+
5. **Use `checkedUrls` for debugging** - when a result looks wrong or missing, inspect the visited candidates.
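The rules above can be read as a small decision over `pageType`. The following is a minimal sketch, assuming only the result shape documented in this file; `CareersResult` is a trimmed local copy of that shape and `summarizeCareersResult` is a hypothetical helper, not part of orangeslice.

```typescript
// Trimmed local copy of the documented findCareersPage result (assumption: only these fields are needed).
type CareersResult = {
  careerPageUrl: string | null;
  pageType: "ats" | "official" | "not_found";
  atsProvider: string | null;
  detectionMethod: string;
  checkedUrls: string[];
};

// Hypothetical helper: turns a result into a one-line summary for logging or a debug column.
function summarizeCareersResult(result: CareersResult): string {
  // Treat a missing URL as "not found" even if pageType says otherwise.
  if (result.pageType === "not_found" || result.careerPageUrl === null) {
    return `not found after checking ${result.checkedUrls.length} URL(s)`;
  }
  if (result.pageType === "ats") {
    return `ATS board (${result.atsProvider ?? "unknown provider"}): ${result.careerPageUrl}`;
  }
  // "official" is still a success: a branded careers page on the company's own site.
  return `official careers page: ${result.careerPageUrl}`;
}
```

Keeping `"official"` on the success path mirrors rule 4, and the `checkedUrls` count in the failure message gives a quick hint about how far discovery got before giving up.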
|
package/docs/services/company/findCareersPage.ts
|
@@ -0,0 +1,37 @@
|
|
|
1
|
+
interface FindCareersPageResult {
|
|
2
|
+
/** The original website or URL input */
|
|
3
|
+
inputUrl: string;
|
|
4
|
+
/** Canonical homepage/base URL used during discovery */
|
|
5
|
+
normalizedWebsiteUrl: string;
|
|
6
|
+
/** Best careers page URL found, or null when none was found */
|
|
7
|
+
careerPageUrl: string | null;
|
|
8
|
+
/** Whether the result points to an ATS board, an official careers page, or nothing */
|
|
9
|
+
pageType: "ats" | "official" | "not_found";
|
|
10
|
+
/** ATS provider when pageType is "ats" */
|
|
11
|
+
atsProvider: string | null;
|
|
12
|
+
/** How the page was discovered */
|
|
13
|
+
detectionMethod:
|
|
14
|
+
| "input-ats"
|
|
15
|
+
| "homepage-ats-link"
|
|
16
|
+
| "homepage-careers-link"
|
|
17
|
+
| "deterministic-candidate"
|
|
18
|
+
| "candidate-ats-link"
|
|
19
|
+
| "embedded-ats"
|
|
20
|
+
| "candidate-redirect"
|
|
21
|
+
| "ats-unverified"
|
|
22
|
+
| "not-found";
|
|
23
|
+
/** URLs checked while searching */
|
|
24
|
+
checkedUrls: string[];
|
|
25
|
+
}
|
|
26
|
+
|
|
27
|
+
/**
|
|
28
|
+
* Find the best careers page for a company website.
|
|
29
|
+
* Accepts a homepage URL/domain and returns either a canonical ATS board URL
|
|
30
|
+
* or an official careers page on the company site.
|
|
31
|
+
*/
|
|
32
|
+
type findCareersPage = (params: {
|
|
33
|
+
/** Company website or homepage URL */
|
|
34
|
+
website?: string;
|
|
35
|
+
/** Alias for website. Provide website or url. */
|
|
36
|
+
url?: string;
|
|
37
|
+
}) => Promise<FindCareersPageResult>;
|
package/docs/services/company/linkedin/enrich.md
CHANGED
|
@@ -49,7 +49,6 @@ interface B2BCompany {
|
|
|
49
49
|
company_size: string | null; // Size range label
|
|
50
50
|
ticker: string | null; // Stock ticker
|
|
51
51
|
logo: string | null; // Logo URL
|
|
52
|
-
specialties: string[] | null; // Company specialties
|
|
53
52
|
twitter_handle: string | null; // Twitter handle
|
|
54
53
|
linkedin_url: string | null; // Full LinkedIn URL
|
|
55
54
|
created_at: string | null; // Record created timestamp
|
|
@@ -57,7 +56,15 @@ interface B2BCompany {
|
|
|
57
56
|
}
|
|
58
57
|
```
|
|
59
58
|
|
|
60
|
-
>
|
|
59
|
+
> Important: LinkedIn company `industry` / `industries` coverage in the B2B DB is very sparse and often too weak for enrichment. These fields may be `null`, generic, stale, or missing even when the company record exists. Treat them as lookup metadata only, not as a high-confidence classification source for enrichment workflows.
|
|
60
|
+
>
|
|
61
|
+
> Preferred pattern for enrichment/classification:
|
|
62
|
+
>
|
|
63
|
+
> 1. Start from the company `domain` when available
|
|
64
|
+
> 2. `services.scrape.website(...)` the company site or a relevant subpage
|
|
65
|
+
> 3. `services.ai.generateObject(...)` to classify the company from the scraped content
|
|
66
|
+
>
|
|
67
|
+
> Use LinkedIn enrich primarily for fast lookup fields like company identity, URL, headcount, location, and description. Do **not** build industry enrichment pipelines that depend mainly on LinkedIn `industry`.
|
|
61
68
|
|
|
62
69
|
### Extended (`extended: true`) - `B2BCompanyExtended`
|
|
63
70
|
|
|
@@ -283,6 +290,44 @@ return {
|
|
|
283
290
|
};
|
|
284
291
|
```
|
|
285
292
|
|
|
293
|
+
### Classify Industry from Domain, Not LinkedIn
|
|
294
|
+
|
|
295
|
+
If your goal is enrichment or categorization, prefer the company website over LinkedIn `industry`:
|
|
296
|
+
|
|
297
|
+
```typescript
|
|
298
|
+
const company = await services.company.linkedin.enrich({
|
|
299
|
+
domain: row.domain
|
|
300
|
+
});
|
|
301
|
+
|
|
302
|
+
const { markdown } = await services.scrape.website({
|
|
303
|
+
url: `https://${row.domain}`
|
|
304
|
+
});
|
|
305
|
+
|
|
306
|
+
const { object } = await services.ai.generateObject({
|
|
307
|
+
prompt: `
|
|
308
|
+
Classify this company based on its website content.
|
|
309
|
+
|
|
310
|
+
Do not rely on LinkedIn industry because it is sparse and often too generic.
|
|
311
|
+
Use LinkedIn only as lightweight context for identity verification.
|
|
312
|
+
|
|
313
|
+
Domain: ${row.domain}
|
|
314
|
+
LinkedIn name: ${company?.name ?? "unknown"}
|
|
315
|
+
LinkedIn description: ${company?.description ?? "unknown"}
|
|
316
|
+
|
|
317
|
+
Website content:
|
|
318
|
+
${markdown}
|
|
319
|
+
`,
|
|
320
|
+
schema: z.object({
|
|
321
|
+
industry: z.string().nullable(),
|
|
322
|
+
subindustry: z.string().nullable(),
|
|
323
|
+
businessModel: z.string().nullable(),
|
|
324
|
+
confidence: z.enum(["low", "medium", "high"])
|
|
325
|
+
})
|
|
326
|
+
});
|
|
327
|
+
|
|
328
|
+
return object;
|
|
329
|
+
```
|
|
330
|
+
|
|
286
331
|
### Handle Missing Companies
|
|
287
332
|
|
|
288
333
|
```typescript
|
package/docs/services/company/scrapeCareersPage.md
|
@@ -0,0 +1,150 @@
|
|
|
1
|
+
# Scrape ATS Careers Page
|
|
2
|
+
|
|
3
|
+
Extract a standardized list of jobs from a supported **official ATS-hosted** careers page without using a browser when possible.
|
|
4
|
+
|
|
5
|
+
This is best when you already have an ATS careers page URL, or when you first resolved one with `services.company.findCareersPage` and now want the actual jobs.
|
|
6
|
+
|
|
7
|
+
## Input Parameters
|
|
8
|
+
|
|
9
|
+
Provide **one** of:
|
|
10
|
+
|
|
11
|
+
| Parameter | Type | Required | Description |
|
|
12
|
+
| ---------------- | -------- | -------- | ----------------------------------------------------------------------------------------------- |
|
|
13
|
+
| `careersPageUrl` | `string` | No | Official ATS board URL or ATS job/detail URL, e.g. `https://job-boards.greenhouse.io/anthropic` |
|
|
14
|
+
| `url` | `string` | No | Alias for `careersPageUrl` |
|
|
15
|
+
|
|
16
|
+
**Optional:**
|
|
17
|
+
|
|
18
|
+
| Parameter | Type | Required | Description |
|
|
19
|
+
| --------- | -------- | -------- | ------------------------------------ |
|
|
20
|
+
| `timeout` | `string` | No | Batch timeout override, e.g. `"30m"` |
|
|
21
|
+
|
|
22
|
+
## Output
|
|
23
|
+
|
|
24
|
+
```typescript
|
|
25
|
+
{
|
|
26
|
+
status: "success" | "unsupported_url" | "unsupported_provider";
|
|
27
|
+
inputUrl: string;
|
|
28
|
+
normalizedBoardUrl: string | null;
|
|
29
|
+
atsProvider: string | null;
|
|
30
|
+
companyName: string | null;
|
|
31
|
+
source: "api" | "html" | null;
|
|
32
|
+
totalJobs: number;
|
|
33
|
+
jobs: Array<{
|
|
34
|
+
id: string;
|
|
35
|
+
title: string;
|
|
36
|
+
url: string;
|
|
37
|
+
applyUrl: string | null;
|
|
38
|
+
location: string | null;
|
|
39
|
+
locations: string[];
|
|
40
|
+
department: string | null;
|
|
41
|
+
team: string | null;
|
|
42
|
+
employmentType: string | null;
|
|
43
|
+
workplaceType: string | null;
|
|
44
|
+
postedAt: string | null;
|
|
45
|
+
postedText: string | null;
|
|
46
|
+
requisitionId: string | null;
|
|
47
|
+
}>;
|
|
48
|
+
checkedUrls: string[];
|
|
49
|
+
supportedProviders: string[];
|
|
50
|
+
message: string | null;
|
|
51
|
+
}
|
|
52
|
+
```
|
|
53
|
+
|
|
54
|
+
## Examples
|
|
55
|
+
|
|
56
|
+
### Scrape Jobs From a Known ATS Board
|
|
57
|
+
|
|
58
|
+
```typescript
|
|
59
|
+
const result = await services.company.scrapeCareersPage({
|
|
60
|
+
careersPageUrl: "https://job-boards.greenhouse.io/anthropic"
|
|
61
|
+
});
|
|
62
|
+
|
|
63
|
+
return result.jobs;
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
### Resolve Then Scrape
|
|
67
|
+
|
|
68
|
+
```typescript
|
|
69
|
+
const careers = await services.company.findCareersPage({
|
|
70
|
+
website: row.website
|
|
71
|
+
});
|
|
72
|
+
|
|
73
|
+
if (!careers.careerPageUrl || careers.pageType !== "ats") {
|
|
74
|
+
return [];
|
|
75
|
+
}
|
|
76
|
+
|
|
77
|
+
const jobs = await services.company.scrapeCareersPage({
|
|
78
|
+
careersPageUrl: careers.careerPageUrl
|
|
79
|
+
});
|
|
80
|
+
|
|
81
|
+
return jobs.jobs;
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
### Return Lightweight Job Summaries
|
|
85
|
+
|
|
86
|
+
```typescript
|
|
87
|
+
const result = await services.company.scrapeCareersPage({
|
|
88
|
+
careersPageUrl: row.careers_page
|
|
89
|
+
});
|
|
90
|
+
|
|
91
|
+
return result.jobs.map((job) => ({
|
|
92
|
+
title: job.title,
|
|
93
|
+
location: job.location,
|
|
94
|
+
department: job.department,
|
|
95
|
+
url: job.url
|
|
96
|
+
}));
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
### Handle Unsupported Providers Gracefully
|
|
100
|
+
|
|
101
|
+
```typescript
|
|
102
|
+
const result = await services.company.scrapeCareersPage({
|
|
103
|
+
careersPageUrl: row.careers_page
|
|
104
|
+
});
|
|
105
|
+
|
|
106
|
+
if (result.status !== "success") {
|
|
107
|
+
return {
|
|
108
|
+
status: result.status,
|
|
109
|
+
provider: result.atsProvider,
|
|
110
|
+
message: result.message
|
|
111
|
+
};
|
|
112
|
+
}
|
|
113
|
+
|
|
114
|
+
return result.totalJobs;
|
|
115
|
+
```
|
|
116
|
+
|
|
117
|
+
### Pass a Job Detail URL
|
|
118
|
+
|
|
119
|
+
```typescript
|
|
120
|
+
const result = await services.company.scrapeCareersPage({
|
|
121
|
+
careersPageUrl: "https://jobs.lever.co/mistral/2a357282-9d44-4b41-a249-c75ffe878ce2"
|
|
122
|
+
});
|
|
123
|
+
|
|
124
|
+
return {
|
|
125
|
+
board: result.normalizedBoardUrl,
|
|
126
|
+
jobs: result.totalJobs
|
|
127
|
+
};
|
|
128
|
+
```
|
|
129
|
+
|
|
130
|
+
## Supported Providers
|
|
131
|
+
|
|
132
|
+
Current browser-free implementations:
|
|
133
|
+
|
|
134
|
+
- `ashby`
|
|
135
|
+
- `breezy`
|
|
136
|
+
- `greenhouse`
|
|
137
|
+
- `lever`
|
|
138
|
+
- `recruitee`
|
|
139
|
+
- `rippling`
|
|
140
|
+
- `smartrecruiters`
|
|
141
|
+
- `workable`
|
|
142
|
+
- `workday`
|
|
143
|
+
|
|
144
|
+
## Key Rules
|
|
145
|
+
|
|
146
|
+
1. **Use this for official ATS pages** - this endpoint is not meant for generic `company.com/careers` pages unless they are clearly hosted by a supported ATS.
|
|
147
|
+
2. **Prefer resolving first when starting from a company website** - use `services.company.findCareersPage` to find the canonical ATS URL, then pass that into this scraper.
|
|
148
|
+
3. **Job/detail URLs are okay** - supported ATS detail URLs are normalized back to the board before scraping.
|
|
149
|
+
4. **Treat `unsupported_provider` as expected** - it means the input was a recognized ATS, but this scraper does not implement that provider yet.
|
|
150
|
+
5. **Use `checkedUrls` for debugging** - when counts or mappings look off, inspect the URLs that were actually queried.
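Rules 1, 2 and 4 suggest checking a resolved careers page before spending a scrape call on it. A minimal sketch of that pre-check follows. It assumes the provider strings returned by `findCareersPage` match the identifiers in the Supported Providers list above, which this document does not state explicitly. `BROWSER_FREE_PROVIDERS` is a snapshot of that list and `shouldScrape` is a hypothetical helper.

```typescript
// Snapshot of the "Supported Providers" list in this document.
// Assumption: the `supportedProviders` field in a live response is the authority and may differ.
const BROWSER_FREE_PROVIDERS = new Set<string>([
  "ashby",
  "breezy",
  "greenhouse",
  "lever",
  "recruitee",
  "rippling",
  "smartrecruiters",
  "workable",
  "workday",
]);

// Only the fields of the findCareersPage result that the check needs.
type ResolvedCareers = {
  careerPageUrl: string | null;
  pageType: "ats" | "official" | "not_found";
  atsProvider: string | null;
};

// Hypothetical helper: true only for ATS pages whose provider appears in the snapshot list.
function shouldScrape(careers: ResolvedCareers): boolean {
  return (
    careers.pageType === "ats" &&
    careers.careerPageUrl !== null &&
    careers.atsProvider !== null &&
    BROWSER_FREE_PROVIDERS.has(careers.atsProvider)
  );
}
```

A `false` here is only a hint. Calling the scraper anyway and handling `unsupported_url` or `unsupported_provider` remains the reliable path, since the hard-coded list can go stale.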
|
package/docs/services/index.md
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
- **ai**: AI helpers (summaries, classifications, scoring).
|
|
2
2
|
- **apify**: Run any of 10,000+ Apify actors for web scraping, social media, e-commerce, and more.
|
|
3
3
|
- **browser**: Kernel browser automation - spin up cloud browsers, execute Playwright code, take screenshots. **Use this for scraping structured lists of repeated data** (e.g., product listings, search results, table rows) where you know the DOM structure. Also ideal for **intercepting network requests** to discover underlying APIs, then paginate those APIs directly in your code (faster & cheaper than clicking through pages). Perfect for JS-heavy sites that don't work with simple HTTP scraping.
|
|
4
|
-
- **company**: company data (getting employees at the company, getting company data, getting open jobs).
|
|
4
|
+
- **company**: company data (getting employees at the company, finding careers pages, getting company data, getting open jobs).
|
|
5
5
|
- **crunchbase**: SQL search over the lean Crunchbase company table (`public.crunchbase_scraper_lean`) for startup prospecting.
|
|
6
6
|
- **person**: finding a person's LinkedIn URL, enriching it from LinkedIn, contact info, and searching for specific people / groups on LinkedIn
|
|
7
7
|
- **geo**: parsing address
|
package/docs/services/person/linkedin/findUrl.md
CHANGED
|
@@ -1,4 +1,4 @@
|
|
|
1
|
-
/\*_ Credits: 2 for the
|
|
1
|
+
/\*_ Credits: 2 for the search path, or 50 when reverse-email lookup is used. Charged only if a valid URL is returned. _/
|
|
2
2
|
|
|
3
3
|
/\*\*
|
|
4
4
|
|
|
@@ -15,6 +15,6 @@
|
|
|
15
15
|
keyword?: string;
|
|
16
16
|
/\*_ Location string (e.g., city, state, country) to narrow search results _/
|
|
17
17
|
location?: string;
|
|
18
|
-
/\*_ Email address.
|
|
18
|
+
/\*_ Email address. For work emails, the service may infer the name from the email, try search with that + the email domain, validate the result against B2B current-company domain data, then fall back to reverse-email lookup. _/
|
|
19
19
|
email?: string;
|
|
20
20
|
}) => Promise<string | undefined>;
|
package/docs/services/web/search.md
CHANGED
|
@@ -67,6 +67,22 @@ Web search returns URLs based on keywords, **not confirmed matches**. Scrape the
|
|
|
67
67
|
|
|
68
68
|
---
|
|
69
69
|
|
|
70
|
+
## Company Subpage Discovery Rule
|
|
71
|
+
|
|
72
|
+
To find pages on a company's own website, **never search Google by company name**. Always start from the verified domain and dork with `site:` plus `inurl:` hints.
|
|
73
|
+
|
|
74
|
+
```ts
|
|
75
|
+
await services.web.search({ query: "site:stripe.com inurl:team OR inurl:about OR inurl:careers" });
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
Never do this for subpage discovery:
|
|
79
|
+
|
|
80
|
+
```ts
|
|
81
|
+
await services.web.search({ query: '"Stripe" careers' });
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
---
|
|
85
|
+
|
|
70
86
|
## Parallel Query Permutations
|
|
71
87
|
|
|
72
88
|
Always run multiple query variations for better coverage.
|
|
@@ -87,17 +103,17 @@ const allResults = await services.web.batchSearch({
|
|
|
87
103
|
const uniqueLinks = [...new Set(allResults.flatMap((r) => r.results.map((x) => x.link)))];
|
|
88
104
|
```
|
|
89
105
|
|
|
90
|
-
| Use Case | Permutation Ideas
|
|
91
|
-
| -------------- |
|
|
92
|
-
| Person search | Full name, initials, nicknames, with/without middle name
|
|
93
|
-
| Company search |
|
|
94
|
-
| Title search | CEO/Founder/Chief, VP/Director, formal/informal titles
|
|
106
|
+
| Use Case | Permutation Ideas |
|
|
107
|
+
| -------------- | --------------------------------------------------------------------------- |
|
|
108
|
+
| Person search | Full name, initials, nicknames, with/without middle name |
|
|
109
|
+
| Company search | Prefer verified domain first; use name variants only for off-site discovery |
|
|
110
|
+
| Title search | CEO/Founder/Chief, VP/Director, formal/informal titles |
|
|
95
111
|
|
|
96
112
|
---
|
|
97
113
|
|
|
98
114
|
## Google Dorking
|
|
99
115
|
|
|
100
|
-
Use `site:` and `inurl:` to target specific platforms.
|
|
116
|
+
Use `site:` and `inurl:` to target specific platforms and verified company domains.
|
|
101
117
|
|
|
102
118
|
| Platform | Dork | Example |
|
|
103
119
|
| ------------------ | --------------------------- | -------------------------------------------- |
|
|
@@ -107,20 +123,19 @@ Use `site:` and `inurl:` to target specific platforms.
|
|
|
107
123
|
| Reddit | `site:reddit.com` | `site:reddit.com/r/sales "cold email"` |
|
|
108
124
|
|
|
109
125
|
```ts
|
|
110
|
-
// Find company
|
|
111
|
-
const
|
|
126
|
+
// Find company subpages from a verified domain
|
|
127
|
+
const domain = "stripe.com";
|
|
112
128
|
const queries = [
|
|
113
|
-
`
|
|
114
|
-
`
|
|
115
|
-
`
|
|
116
|
-
`
|
|
117
|
-
`stripe.com site:linkedin.com/in`
|
|
129
|
+
`site:${domain} inurl:team OR inurl:about OR inurl:leadership`,
|
|
130
|
+
`site:${domain} inurl:careers OR inurl:jobs`,
|
|
131
|
+
`site:${domain} inurl:blog OR inurl:news OR inurl:press`,
|
|
132
|
+
`site:${domain} inurl:contact OR inurl:locations`
|
|
118
133
|
];
|
|
119
134
|
|
|
120
135
|
const results = await services.web.batchSearch({
|
|
121
136
|
queries: queries.map((query) => ({ query }))
|
|
122
137
|
});
|
|
123
|
-
const
|
|
138
|
+
const subpages = [...new Set(results.flatMap((r) => r.results.map((x) => x.link)))];
|
|
124
139
|
```
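The domain-first queries above assume a clean bare domain, but row data often holds full URLs. The following is a minimal sketch of normalizing the input before building a `site:` query. `toBareDomain` and `subpageQuery` are hypothetical helpers, not part of orangeslice, and the normalization is deliberately simple (for example, it leaves a port number in place).

```typescript
// Hypothetical helper: reduce "https://www.Stripe.com/about?x=1" to "stripe.com".
function toBareDomain(input: string): string {
  return input
    .trim()
    .toLowerCase()
    .replace(/^[a-z][a-z0-9+.-]*:\/\//, "") // drop a leading scheme such as https://
    .replace(/^www\./, "") // drop a leading www.
    .replace(/[/?#].*$/, ""); // drop path, query string, and fragment
}

// Hypothetical helper: build a domain-scoped query with optional inurl: hints.
function subpageQuery(input: string, hints: string[]): string {
  const domain = toBareDomain(input);
  const hintPart = hints.map((hint) => `inurl:${hint}`).join(" OR ");
  return hintPart ? `site:${domain} ${hintPart}` : `site:${domain}`;
}
```

The resulting string can be passed as the `query` value in the `services.web.search` and `services.web.batchSearch` calls shown earlier.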
|
|
125
140
|
|
|
126
141
|
---
|
package/docs/triggers-runtime.md
CHANGED
|
@@ -5,11 +5,21 @@ description: Trigger mental model + runtime API + webhook shape access. Read bef
|
|
|
5
5
|
|
|
6
6
|
# Triggers Runtime (Agent)
|
|
7
7
|
|
|
8
|
-
##
|
|
8
|
+
## Core Rule: Triggers Push Data, Columns Do Work
|
|
9
9
|
|
|
10
|
-
|
|
11
|
-
|
|
12
|
-
|
|
10
|
+
**Triggers are thin data ingesters, NOT processing engines.** Push bare minimum data into rows; all enrichment, AI, scoring, and transformation belongs in code columns.
|
|
11
|
+
|
|
12
|
+
The spreadsheet is a live workspace, not a log. Columns make work visible, debuggable, retryable per-row, and composable across data sources. Processing inside triggers hides failures in run logs.
|
|
13
|
+
|
|
14
|
+
**Trigger code should only:**
|
|
15
|
+
|
|
16
|
+
- Parse/extract fields from `ctx.trigger.payload`
|
|
17
|
+
- Light dedup (`SELECT` to check if row exists)
|
|
18
|
+
- `addRows()` with raw data + `{ run: true }` to kick off columns
|
|
19
|
+
- Paginate via self-invocation for large ingestion
|
|
20
|
+
- Route to the right sheet based on payload type
|
|
21
|
+
|
|
22
|
+
**Never put in trigger code:** `services.*` enrichment, AI calls, scoring, or per-row transformation loops — the sheet IS the loop.
|
|
13
23
|
|
|
14
24
|
## Webhook Data Model (from DB schema)
|
|
15
25
|
|
|
@@ -175,11 +185,22 @@ interface Ctx {
|
|
|
175
185
|
## Minimal Example
|
|
176
186
|
|
|
177
187
|
```ts
|
|
188
|
+
// Inspect recent webhook payloads to understand shape
|
|
178
189
|
const t = await ctx.triggers.byName("Inbound Lead Webhook");
|
|
179
190
|
const recent = await t.webhooks.list({ limit: 10 });
|
|
180
191
|
const samplePayload = recent[0]?.payload;
|
|
181
192
|
|
|
193
|
+
// Trigger code: push minimal data, let columns do the work
|
|
182
194
|
await t.update({
|
|
183
|
-
code: `
|
|
195
|
+
code: `
|
|
196
|
+
const leads = Array.isArray(ctx.trigger.payload) ? ctx.trigger.payload : [ctx.trigger.payload];
|
|
197
|
+
const sheet = await ctx.sheet("Inbound Leads");
|
|
198
|
+
await sheet.addRows(
|
|
199
|
+
leads.map(l => ({ "Name": l.name, "Email": l.email, "Source": "webhook" })),
|
|
200
|
+
{ run: true }
|
|
201
|
+
);
|
|
202
|
+
return { pushed: leads.length };
|
|
203
|
+
`
|
|
184
204
|
});
|
|
205
|
+
// Then create enrichment columns on "Inbound Leads" to do the actual work
|
|
185
206
|
```
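The payload-to-rows step inside the trigger code above can be kept as a small pure function, which keeps the trigger thin and makes the mapping easy to check in isolation. This is a sketch under assumptions: the `name` and `email` payload fields and the `Name`, `Email`, `Source` column names come from the example above, while `LeadRow` and `payloadToRows` are hypothetical names.

```typescript
// Row shape matching the columns used in the example above.
type LeadRow = { Name: string | null; Email: string | null; Source: string };

// Hypothetical helper: accept a single object or an array, skip non-object entries,
// and copy only the raw fields. No enrichment happens here; that belongs in columns.
function payloadToRows(payload: unknown): LeadRow[] {
  const items: unknown[] = Array.isArray(payload) ? payload : [payload];
  return items
    .filter(
      (item): item is Record<string, unknown> => typeof item === "object" && item !== null
    )
    .map((item) => {
      const name = item["name"];
      const email = item["email"];
      return {
        Name: typeof name === "string" ? name : null,
        Email: typeof email === "string" ? email : null,
        Source: "webhook",
      };
    });
}
```

Rows with missing fields are kept with `null` values rather than dropped, so a malformed lead stays visible in the sheet instead of disappearing inside the trigger run.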
|