salesprompter-cli 0.1.1 → 0.1.2

package/README.md CHANGED
@@ -2,53 +2,206 @@
2
2
 
3
3
  `salesprompter-cli` is a JSON-first command line interface for running a practical sales workflow:
4
4
 
5
+ - Authenticate via Salesprompter app backend
5
6
  - Define an ideal customer profile
7
+ - Resolve a target account from a domain
6
8
  - Generate seed leads
7
9
  - Enrich leads
8
10
  - Score leads
9
11
  - Sync leads into CRM and outreach systems
12
+ - Analyze upstream lead-list and domain-enrichment bottlenecks
13
+ - Replace opaque Pipedream logic with deterministic CLI workflows
10
14
 
11
- The first version is intentionally local-first. It uses deterministic heuristics and dry-run sync providers so the command surface and data contracts are stable before real APIs are attached.
15
+ ## Integration Contract
12
16
 
13
- When you pass `--company-domain`, the CLI generates modeled contacts for that target account. These are synthetic leads for workflow testing, not verified real contacts.
17
+ This CLI is not a standalone toy. It is a production integration surface for the Salesprompter app.
18
+
19
+ - The Domain Finder flow in this repository is a real end-to-end test case for Salesprompter app integration.
20
+ - CLI behavior should be treated as an app contract: auth, BigQuery execution, artifacts, and writeback semantics must remain stable.
21
+ - Changes to domain selection, writeback, or audit logic should always be validated with:
22
+   - CLI tests (`npm test`)
23
+   - BigQuery-backed runs (`domainfinder:run:bq`, `domainfinder:audit-existing:bq`)
24
+   - before/after delta checks (`domainfinder:audit-delta`)
25
+
26
+ The current version is account-first. It resolves a company from a domain, then generates contacts for that account through provider interfaces. The default providers are still fallback heuristics, but the JSON contracts are now stable for real company and people-data integrations.
27
+
28
+ When the output `mode` is `fallback`, the leads are modeled contacts for workflow testing, not verified real contacts. A real provider path should return `mode: "real"` using the same JSON shape.
29
+
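As a minimal sketch of how a downstream script might act on that field, the snippet below refuses to hand modeled contacts to a sync step. Only `mode` and its two values (`fallback`, `real`) come from the README. The `leads` property on the result object and the function name `requireRealLeads` are assumptions made for illustration.

```javascript
// Hypothetical guard: only `mode` ("fallback" | "real") is documented in the
// README; the `leads` property name is an assumption about the output shape.
function requireRealLeads(result) {
  if (result.mode !== "real") {
    throw new Error(`refusing to sync: leads were produced in "${result.mode}" mode`);
  }
  return result.leads;
}

const modeled = { mode: "fallback", leads: [{ contactName: "Modeled Contact" }] };
const verified = { mode: "real", leads: [{ contactName: "Verified Contact" }] };

console.log(requireRealLeads(verified).length);
try {
  requireRealLeads(modeled);
} catch (error) {
  console.log(error.message);
}
```

Because a real provider is expected to return the same JSON shape, a guard like this would not need to change when the provider does.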
30
+ All non-auth commands require a logged-in CLI session. This gives you one identity model across Salesprompter app, CLI, and Chrome extension.
31
+
32
+ Global output flags:
33
+
34
+ - `--json`: compact machine-readable JSON (optimized for agent/LLM parsers)
35
+ - `--quiet`: suppress successful stdout payloads (errors still surface)
36
+
37
+ ## Auth and Session
38
+
39
+ The CLI stores a local session file at `~/.config/salesprompter/auth-session.json` (or `SALESPROMPTER_CONFIG_DIR`).
40
+
41
+ ```bash
42
+ # Device flow through Salesprompter backend (recommended)
43
+ salesprompter auth:login
44
+
45
+ # Direct login using app-issued bearer token
46
+ salesprompter auth:login --token "$SALESPROMPTER_TOKEN"
47
+
48
+ # Verify active identity with backend
49
+ salesprompter auth:whoami --verify
50
+
51
+ # Clear local session
52
+ salesprompter auth:logout
53
+ ```
54
+
55
+ Environment variables:
56
+
57
+ - `SALESPROMPTER_API_BASE_URL`: override backend URL (default `https://app.salesprompter.com`)
58
+ - `SALESPROMPTER_CONFIG_DIR`: override local config dir
59
+ - `SALESPROMPTER_SKIP_AUTH=1`: bypass auth guard (tests/dev only)
60
+ - `INSTANTLY_API_KEY`: required for real `sync:outreach --target instantly`
61
+ - `INSTANTLY_CAMPAIGN_ID`: default campaign id for Instantly sync
62
+ - `SALESPROMPTER_INSTANTLY_BASE_URL`: override Instantly API base URL (tests/local proxies)
63
+
64
+ App compatibility:
65
+
66
+ - If your app exposes `/api/cli/auth/device/start` and `/api/cli/auth/device/poll`, use `salesprompter auth:login`.
67
+ - If device auth is not enabled, create a CLI token from the app endpoint `POST /api/cli/auth/token` and use:
68
+
69
+ ```bash
70
+ salesprompter auth:login --token "<token-from-app>"
71
+ ```
14
72
 
15
73
  ## Why this shape works for humans and LLMs
16
74
 
17
75
  - Every command reads and writes plain JSON.
18
- - Output is machine-readable and composable.
76
+ - Output is machine-readable and composable (`--json` for compact transport).
19
77
  - Domain contracts are explicit and validated with `zod`.
20
78
  - External integrations are behind narrow provider interfaces.
79
+ - Lead generation reports which provider and mode produced the result.
80
+ - Workflow bottlenecks become inspectable artifacts instead of hidden Pipedream state.
81
+ - Real prospect lookup can be normalized straight into the CLI lead schema with `--lead-out`.
21
82
 
22
83
  ## Commands
23
84
 
24
85
  ```bash
25
86
  salesprompter icp:define --name "EU SaaS RevOps" \
87
+ --description "RevOps and sales leaders in European growth-stage software companies" \
26
88
  --industries "Software,Financial Services" \
27
89
  --company-sizes "50-199,200-499" \
28
90
  --regions "Europe" \
91
+ --countries "DE,NL,GB" \
29
92
  --titles "Head of Revenue Operations,VP Sales" \
30
93
  --required-signals "recent funding,growing outbound team" \
94
+ --keywords "revenue operations,outbound,sales tooling" \
31
95
  --out ./data/icp.json
32
96
 
97
+ salesprompter auth:login
98
+ salesprompter auth:whoami --verify
99
+ salesprompter icp:vendor --vendor deel --market dach --out ./data/deel-icp.json
100
+ salesprompter leads:lookup:bq --icp ./data/deel-icp.json --limit 100 --execute --out ./data/deel-leads-raw.json --lead-out ./data/deel-leads.json
101
+ salesprompter leads:enrich --in ./data/deel-leads.json --out ./data/deel-enriched.json
102
+ salesprompter leads:score --icp ./data/deel-icp.json --in ./data/deel-enriched.json --out ./data/deel-scored.json
103
+ salesprompter sync:outreach --target instantly --in ./data/deel-scored.json --campaign-id "$INSTANTLY_CAMPAIGN_ID"
104
+ salesprompter sync:outreach --target instantly --in ./data/deel-scored.json --campaign-id "$INSTANTLY_CAMPAIGN_ID" --apply
105
+ salesprompter icp:from-historical-queries:bq --vendor deel --market dach --out ./data/deel-icp-historical.json --report-out ./data/deel-historical-report.json
106
+ salesprompter leadlists:funnel:bq --vendor deel --market dach --out ./data/deel-leadlists-funnel.json
107
+ salesprompter domainfinder:backlog:bq --market dach --out ./data/deel-domainfinder-backlog.json
108
+ salesprompter domainfinder:candidates:bq --market dach --limit 500 --out ./data/domain-candidates.json --sql-out ./data/domain-candidates.sql
109
+ salesprompter domainfinder:input-sql --market dach --out ./data/domainFinder_input_v2.sql
110
+ salesprompter domainfinder:select --in ./data/domain-candidates.json --out ./data/domain-decisions.json
111
+ salesprompter domainfinder:audit --in ./data/domain-decisions.json --out ./data/domain-audit.json
112
+ salesprompter domainfinder:compare-pipedream --in ./data/domain-candidates.json --out ./data/domain-comparison.json
113
+ salesprompter domainfinder:audit-existing:bq --market dach --out ./data/domain-existing-audit.json
114
+ salesprompter domainfinder:audit-delta --before ./data/domain-existing-audit-before.json --after ./data/domain-existing-audit-after.json --out ./data/domain-existing-audit-delta.json
115
+ salesprompter domainfinder:repair-existing:bq --market dach --mode conservative --limit 5000 --out ./data/domain-repair.sql --trace-id salesprompter-cli-repair-dach
116
+ salesprompter domainfinder:writeback-sql --in ./data/domain-decisions.json --out ./data/domain-writeback.sql --trace-id salesprompter-cli-dach-20260308
117
+ salesprompter domainfinder:writeback:bq --in ./data/domain-decisions.json --out ./data/domain-writeback.sql --trace-id salesprompter-cli-dach-20260308
118
+ salesprompter domainfinder:run:bq --market dach --limit 500 --out-dir ./data/domainfinder-run --trace-id salesprompter-cli-dach-20260308
119
+ salesprompter account:resolve --domain deel.com --company-name Deel --out ./data/deel-account.json
33
120
  salesprompter leads:generate --icp ./data/icp.json --count 5 --out ./data/leads.json
34
- salesprompter leads:generate --icp ./data/icp.json --count 5 --company-domain deel.com --company-name Deel --out ./data/deel-leads.json
121
+ salesprompter leads:generate --icp ./data/icp.json --count 5 --domain deel.com --company-name Deel --out ./data/deel-leads.json
35
122
  salesprompter leads:enrich --in ./data/leads.json --out ./data/enriched.json
36
123
  salesprompter leads:score --icp ./data/icp.json --in ./data/enriched.json --out ./data/scored.json
124
+ salesprompter leads:lookup:bq --icp ./data/deel-icp.json --limit 100
125
+ salesprompter queries:analyze:bq --search-kind sales-people --include-function "Human Resources" --out ./data/hr-query-report.json
37
126
  salesprompter sync:crm --target hubspot --in ./data/scored.json
38
127
  salesprompter sync:outreach --target instantly --in ./data/scored.json
39
128
  ```
40
129
 
130
+ ## Domain Finder Migration
131
+
132
+ The original Pipedream `domainFinder` workflow was doing three things:
133
+
134
+ 1. fetch a small input set from BigQuery
135
+ 2. ask OpenAI / Hunter for candidate domains
136
+ 3. pick a domain and write it back
137
+
138
+ The CLI now models that logic directly and improves it:
139
+
140
+ - `domainfinder:backlog:bq` measures whether the backlog is being starved before enrichment
141
+ - `domainfinder:candidates:bq` fetches real candidate rows from BigQuery for backlog companies
142
+ - `domainfinder:input-sql` generates a replacement input view driven by `linkedin_companies`, not `leadPool_new`
143
+ - `domainfinder:select` applies deterministic domain selection rules
144
+ - `domainfinder:audit` turns decisions into a review queue and writeback summary
145
+ - `domainfinder:compare-pipedream` quantifies how often the old selector would disagree with the improved selector
146
+ - `domainfinder:audit-existing:bq` measures current warehouse-visible mismatches and bad chosen domains
147
+ - `domainfinder:audit-delta` compares two audit snapshots and reports metric deltas
148
+ - `domainfinder:repair-existing:bq` generates (and optionally executes) targeted repair writes with selectable mode:
149
+   - `conservative`: only missing/blacklisted chosen domains
150
+   - `aggressive`: missing/blacklisted plus all mismatches
151
+   - `mismatch-only`: only mismatched chosen domains
152
+ - `domainfinder:writeback-sql` emits conservative SQL for `domainFinder_output`
153
+ - `domainfinder:writeback:bq` can execute that writeback in BigQuery when explicitly asked
154
+ - `domainfinder:run:bq` runs the full candidate -> decision -> audit -> writeback pipeline and stores all artifacts
155
+
156
+ Improved selection policy:
157
+
158
+ 1. prefer `domain_linkedin` when present and not blacklisted
159
+ 2. otherwise prefer `website_linkedin` root domain when present and not blacklisted
160
+ 3. otherwise choose the non-blacklisted candidate with the highest Hunter email count
161
+ 4. otherwise fall back to the first non-null candidate
162
+
163
+ This removes the earlier failure mode where OpenAI or Hunter could override a good LinkedIn domain purely because of a higher Hunter count.
164
+
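The four rules above can be sketched as a single function. This is an illustration under stated assumptions, not the package's implementation: the input field names (`domainLinkedin`, `websiteLinkedin`, `candidates`, `hunterEmailCount`), the `reason` labels, the blacklist contents, and the naive `rootDomain` helper are all hypothetical. Only the ordering of the rules is taken from the README.

```javascript
// Naive normalization: drops scheme, "www.", and anything after the host.
// It does not consult the public suffix list, so subdomains are kept as-is.
function rootDomain(website) {
  return website
    .trim()
    .toLowerCase()
    .replace(/^[a-z]+:\/\//, "")
    .replace(/^www\./, "")
    .replace(/[\/:?#].*$/, "");
}

// Hypothetical input shape; only the rule order comes from the README.
function selectDomain(input, blacklist) {
  const allowed = (domain) => Boolean(domain) && !blacklist.has(domain);

  // 1. The LinkedIn domain wins when present and not blacklisted.
  if (allowed(input.domainLinkedin)) {
    return { domain: input.domainLinkedin, reason: "domain_linkedin" };
  }
  // 2. Otherwise the root domain of the LinkedIn website.
  const websiteRoot = input.websiteLinkedin ? rootDomain(input.websiteLinkedin) : null;
  if (allowed(websiteRoot)) {
    return { domain: websiteRoot, reason: "website_linkedin" };
  }
  // 3. Otherwise the non-blacklisted candidate with the highest Hunter email count.
  const ranked = input.candidates
    .filter((candidate) => allowed(candidate.domain))
    .sort((a, b) => b.hunterEmailCount - a.hunterEmailCount);
  if (ranked.length > 0) {
    return { domain: ranked[0].domain, reason: "hunter_email_count" };
  }
  // 4. Otherwise the first non-null candidate (read literally, this one may be blacklisted).
  const first = input.candidates.find((candidate) => candidate.domain != null);
  return first
    ? { domain: first.domain, reason: "first_candidate" }
    : { domain: null, reason: "no-domain" };
}

const blacklist = new Set(["linktr.ee"]);
const decision = selectDomain(
  {
    domainLinkedin: "acme.de",
    websiteLinkedin: "https://www.acme.de/karriere",
    candidates: [{ domain: "acme-jobs.com", hunterEmailCount: 120 }]
  },
  blacklist
);
console.log(decision);
```

In the example the LinkedIn domain is chosen even though another candidate has a high Hunter count, which is the behavior the policy is meant to guarantee.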
165
+ Writeback policy:
166
+
167
+ - write to `SalesPrompter.domainFinder_output`, which is the source feeding `linkedin_companies.domain`
168
+ - exclude `no-domain` decisions
169
+ - exclude blacklisted domains from generated writeback SQL
170
+ - preserve batch provenance through `trace_id`
171
+
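A minimal sketch of the writeback filter described in the list above. The decision shape (`companyId`, `domain`, `reason`) and the function name are hypothetical, and the sketch produces plain row objects rather than SQL. The three rules (skip `no-domain`, skip blacklisted domains, carry a `trace_id`) mirror the policy.

```javascript
// Hypothetical decision shape; the filtering rules follow the writeback policy.
function buildWritebackRows(decisions, blacklist, traceId) {
  return decisions
    // Exclude `no-domain` decisions.
    .filter((decision) => decision.reason !== "no-domain" && decision.domain != null)
    // Exclude blacklisted domains.
    .filter((decision) => !blacklist.has(decision.domain))
    // Preserve batch provenance through trace_id.
    .map((decision) => ({
      companyId: decision.companyId,
      domain: decision.domain,
      trace_id: traceId
    }));
}

const rows = buildWritebackRows(
  [
    { companyId: "c1", domain: "acme.de", reason: "domain_linkedin" },
    { companyId: "c2", domain: null, reason: "no-domain" },
    { companyId: "c3", domain: "linktr.ee", reason: "first_candidate" }
  ],
  new Set(["linktr.ee"]),
  "salesprompter-cli-dach-20260308"
);
console.log(rows);
```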
41
172
  ## Development
42
173
 
43
174
  ```bash
44
175
  npm install
45
176
  npm run build
46
177
  node ./dist/cli.js --help
178
+ npm test
47
179
  ```
48
180
 
181
+ BigQuery project selection:
182
+
183
+ - The CLI runs `bq query` with `--project_id` from `BQ_PROJECT_ID`.
184
+ - Fallback order: `BQ_PROJECT_ID` -> `GOOGLE_CLOUD_PROJECT` -> `GCLOUD_PROJECT` -> `icpidentifier`.
185
+
186
+ ## Real Deel Flow
187
+
188
+ When Deel is the vendor you sell for, do not use `--domain deel.com`: that flag targets contacts at Deel itself rather than Deel's prospects.
189
+
190
+ Use this path instead:
191
+
192
+ 1. `salesprompter icp:vendor --vendor deel --market dach --out ./data/deel-icp.json`
193
+ 2. `salesprompter leads:lookup:bq --icp ./data/deel-icp.json --limit 100 --execute --out ./data/deel-leads-raw.json --lead-out ./data/deel-leads.json`
194
+ 3. `salesprompter leads:enrich --in ./data/deel-leads.json --out ./data/deel-enriched.json`
195
+ 4. `salesprompter leads:score --icp ./data/deel-icp.json --in ./data/deel-enriched.json --out ./data/deel-scored.json`
196
+ 5. `salesprompter sync:outreach --target instantly --in ./data/deel-scored.json --campaign-id "$INSTANTLY_CAMPAIGN_ID"`
197
+ 6. Add `--apply` only when the dry-run output looks correct.
198
+
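The steps above can be sketched as a list of argument arrays, which keeps the dry-run default explicit. Every command and flag is copied from the numbered steps. The function `buildDeelFlow`, its option names, and the example campaign id are hypothetical.

```javascript
// Sketch: the documented Deel flow as argv arrays (commands and flags are
// taken from the README steps; the wrapper itself is hypothetical).
function buildDeelFlow({ dataDir, campaignId, apply = false }) {
  const file = (name) => `${dataDir}/${name}`;
  const sync = [
    "sync:outreach", "--target", "instantly",
    "--in", file("deel-scored.json"),
    "--campaign-id", campaignId
  ];
  return [
    ["icp:vendor", "--vendor", "deel", "--market", "dach", "--out", file("deel-icp.json")],
    ["leads:lookup:bq", "--icp", file("deel-icp.json"), "--limit", "100", "--execute",
      "--out", file("deel-leads-raw.json"), "--lead-out", file("deel-leads.json")],
    ["leads:enrich", "--in", file("deel-leads.json"), "--out", file("deel-enriched.json")],
    ["leads:score", "--icp", file("deel-icp.json"), "--in", file("deel-enriched.json"),
      "--out", file("deel-scored.json")],
    // Dry run by default; `--apply` is appended only on explicit opt-in.
    apply ? [...sync, "--apply"] : sync
  ];
}

const steps = buildDeelFlow({ dataDir: "./data", campaignId: "example-campaign-id" });
for (const args of steps) {
  console.log(["salesprompter", ...args].join(" "));
}
```

Each array could be passed to `execFile("salesprompter", args)` from `node:child_process`, so no argument needs shell quoting.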
49
199
  ## Next integrations
50
200
 
51
- - Replace `HeuristicLeadProvider` with Apollo, Clay, LinkedIn, or custom data providers.
201
+ - Replace `HeuristicCompanyProvider` with a real account lookup provider.
202
+ - Replace `HeuristicPeopleSearchProvider` with Apollo, Clay, LinkedIn, or custom people-data providers.
52
203
  - Replace `HeuristicEnrichmentProvider` with enrichment APIs such as Clearbit, FullEnrich, or custom LLM workflows.
53
204
  - Replace `DryRunSyncProvider` with real HubSpot, Salesforce, Pipedrive, Instantly, Apollo, or Outreach clients.
54
- - Add prompt-oriented commands so an LLM can ask for ICP refinement or lead prioritization directly through the CLI.
205
+ - Add provider selection and credentials for a first real `--domain deel.com` workflow.
206
+ - Replace configurable `bq` field mapping with a typed adapter per warehouse schema.
207
+ - Add a real candidate-fetch command that reads domain candidates from BigQuery and feeds them into `domainfinder:select`.
package/dist/auth.js ADDED
@@ -0,0 +1,242 @@
1
+ import os from "node:os";
2
+ import path from "node:path";
3
+ import { access, mkdir, readFile, rm, writeFile } from "node:fs/promises";
4
+ import { z } from "zod";
5
+ const DEFAULT_API_BASE_URL = "https://app.salesprompter.com";
6
+ const CLIENT_HEADER = "salesprompter-cli/0.2";
7
+ const DEFAULT_DEVICE_POLL_INTERVAL_SECONDS = 3;
8
+ const DEFAULT_DEVICE_TIMEOUT_SECONDS = 180;
9
+ const UserSchema = z.object({
10
+ id: z.string().min(1),
11
+ email: z.string().email(),
12
+ name: z.string().min(1).optional()
13
+ });
14
+ const AuthSessionSchema = z.object({
15
+ accessToken: z.string().min(1),
16
+ refreshToken: z.string().min(1).optional(),
17
+ apiBaseUrl: z.string().url(),
18
+ user: UserSchema,
19
+ expiresAt: z.string().datetime().optional(),
20
+ createdAt: z.string().datetime()
21
+ });
22
+ const DeviceStartResponseSchema = z.object({
23
+ deviceCode: z.string().min(1),
24
+ userCode: z.string().min(1),
25
+ verificationUrl: z.string().url().optional(),
26
+ verificationUri: z.string().url().optional(),
27
+ intervalSeconds: z.number().int().min(1).optional(),
28
+ expiresInSeconds: z.number().int().min(30).optional()
29
+ });
30
+ const DevicePollPendingSchema = z.object({
31
+ status: z.literal("pending")
32
+ });
33
+ const DevicePollDeniedSchema = z.object({
34
+ status: z.enum(["denied", "expired"])
35
+ });
36
+ const DevicePollAuthorizedSchema = z.object({
37
+ status: z.literal("authorized"),
38
+ accessToken: z.string().min(1),
39
+ refreshToken: z.string().min(1).optional(),
40
+ expiresAt: z.string().datetime().optional(),
41
+ user: UserSchema
42
+ });
43
+ const DevicePollResponseSchema = z.union([
44
+ DevicePollPendingSchema,
45
+ DevicePollDeniedSchema,
46
+ DevicePollAuthorizedSchema
47
+ ]);
48
+ const WhoAmIResponseSchema = z
49
+ .union([
50
+ z.object({
51
+ user: UserSchema,
52
+ expiresAt: z.string().datetime().optional()
53
+ }),
54
+ z.object({
55
+ id: z.string().min(1),
56
+ email: z.string().email(),
57
+ name: z.string().min(1).optional(),
58
+ expiresAt: z.string().datetime().optional()
59
+ }),
60
+ z.object({
61
+ data: z.object({
62
+ user: UserSchema,
63
+ expiresAt: z.string().datetime().optional()
64
+ })
65
+ })
66
+ ])
67
+ .transform((value) => {
68
+ if ("data" in value) {
69
+ return value.data;
70
+ }
71
+ if ("user" in value) {
72
+ return value;
73
+ }
74
+ return {
75
+ user: {
76
+ id: value.id,
77
+ email: value.email,
78
+ name: value.name
79
+ },
80
+ expiresAt: value.expiresAt
81
+ };
82
+ });
83
+ function getConfigDir() {
84
+ const override = process.env.SALESPROMPTER_CONFIG_DIR?.trim();
85
+ if (override !== undefined && override.length > 0) {
86
+ return override;
87
+ }
88
+ return path.join(os.homedir(), ".config", "salesprompter");
89
+ }
90
+ function getSessionPath() {
91
+ return path.join(getConfigDir(), "auth-session.json");
92
+ }
93
+ function normalizeApiBaseUrl(apiBaseUrl) {
94
+ const value = (apiBaseUrl ?? process.env.SALESPROMPTER_API_BASE_URL ?? DEFAULT_API_BASE_URL).trim();
95
+ return value.replace(/\/+$/, "");
96
+ }
97
+ async function hasSessionFile() {
98
+ try {
99
+ await access(getSessionPath());
100
+ return true;
101
+ }
102
+ catch {
103
+ return false;
104
+ }
105
+ }
106
+ async function httpJson(url, init, schema) {
107
+ const response = await fetch(url, init);
108
+ const text = await response.text();
109
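+ if (!response.ok) {
109
+ throw new Error(`request failed (${response.status}) for ${url}`); // checked before parsing so a non-JSON error body (such as an HTML 404 page) cannot mask the status
110
+ }
111
+ const payload = text.length > 0 ? JSON.parse(text) : {};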
113
+ return schema.parse(payload);
114
+ }
115
+ function hasExpired(expiresAt) {
116
+ if (expiresAt === undefined) {
117
+ return false;
118
+ }
119
+ return Date.now() >= Date.parse(expiresAt);
120
+ }
121
+ export async function readAuthSession() {
122
+ if (!(await hasSessionFile())) {
123
+ return null;
124
+ }
125
+ const content = await readFile(getSessionPath(), "utf8");
126
+ const parsed = JSON.parse(content);
127
+ return AuthSessionSchema.parse(parsed);
128
+ }
129
+ export async function writeAuthSession(session) {
130
+ const sessionPath = getSessionPath();
131
+ await mkdir(path.dirname(sessionPath), { recursive: true });
132
+ await writeFile(sessionPath, `${JSON.stringify(session, null, 2)}\n`, "utf8");
133
+ }
134
+ export async function clearAuthSession() {
135
+ await rm(getSessionPath(), { force: true });
136
+ }
137
+ export async function requireAuthSession() {
138
+ const session = await readAuthSession();
139
+ if (session === null) {
140
+ throw new Error("not logged in. Run `salesprompter auth:login`.");
141
+ }
142
+ if (hasExpired(session.expiresAt)) {
143
+ throw new Error("session expired. Run `salesprompter auth:login`.");
144
+ }
145
+ return session;
146
+ }
147
+ export async function verifySession(session) {
148
+ const apiBaseUrl = normalizeApiBaseUrl(session.apiBaseUrl);
149
+ const response = await httpJson(`${apiBaseUrl}/api/cli/auth/me`, {
150
+ method: "GET",
151
+ headers: {
152
+ Authorization: `Bearer ${session.accessToken}`,
153
+ "X-Salesprompter-Client": CLIENT_HEADER
154
+ }
155
+ }, WhoAmIResponseSchema);
156
+ return AuthSessionSchema.parse({
157
+ ...session,
158
+ apiBaseUrl,
159
+ user: response.user,
160
+ expiresAt: response.expiresAt ?? session.expiresAt
161
+ });
162
+ }
163
+ export async function loginWithToken(token, apiBaseUrl) {
164
+ const normalizedApiBaseUrl = normalizeApiBaseUrl(apiBaseUrl);
165
+ const response = await httpJson(`${normalizedApiBaseUrl}/api/cli/auth/me`, {
166
+ method: "GET",
167
+ headers: {
168
+ Authorization: `Bearer ${token}`,
169
+ "X-Salesprompter-Client": CLIENT_HEADER
170
+ }
171
+ }, WhoAmIResponseSchema);
172
+ const session = AuthSessionSchema.parse({
173
+ accessToken: token,
174
+ apiBaseUrl: normalizedApiBaseUrl,
175
+ user: response.user,
176
+ expiresAt: response.expiresAt,
177
+ createdAt: new Date().toISOString()
178
+ });
179
+ await writeAuthSession(session);
180
+ return session;
181
+ }
182
+ export async function loginWithDeviceFlow(options) {
183
+ const apiBaseUrl = normalizeApiBaseUrl(options?.apiBaseUrl);
184
+ const timeoutSeconds = options?.timeoutSeconds ?? DEFAULT_DEVICE_TIMEOUT_SECONDS;
185
+ let start;
186
+ try {
187
+ start = await httpJson(`${apiBaseUrl}/api/cli/auth/device/start`, {
188
+ method: "POST",
189
+ headers: {
190
+ "Content-Type": "application/json",
191
+ "X-Salesprompter-Client": CLIENT_HEADER
192
+ },
193
+ body: JSON.stringify({ client: "salesprompter-cli" })
194
+ }, DeviceStartResponseSchema);
195
+ }
196
+ catch (error) {
197
+ const message = error instanceof Error ? error.message : String(error);
198
+ if (message.includes("(404)")) {
199
+ throw new Error("device login is not configured on this Salesprompter app. Generate a CLI token in the app and run `salesprompter auth:login --token <token>`.");
200
+ }
201
+ throw error;
202
+ }
203
+ const verificationUrl = start.verificationUrl ?? start.verificationUri;
204
+ if (verificationUrl === undefined) {
205
+ throw new Error("device start response missing verification url");
206
+ }
207
+ const pollIntervalMs = (start.intervalSeconds ?? DEFAULT_DEVICE_POLL_INTERVAL_SECONDS) * 1000;
208
+ const deadline = Date.now() + Math.max(timeoutSeconds, 30) * 1000;
209
+ while (Date.now() < deadline) {
210
+ const poll = await httpJson(`${apiBaseUrl}/api/cli/auth/device/poll`, {
211
+ method: "POST",
212
+ headers: {
213
+ "Content-Type": "application/json",
214
+ "X-Salesprompter-Client": CLIENT_HEADER
215
+ },
216
+ body: JSON.stringify({ deviceCode: start.deviceCode })
217
+ }, DevicePollResponseSchema);
218
+ if (poll.status === "authorized") {
219
+ const session = AuthSessionSchema.parse({
220
+ accessToken: poll.accessToken,
221
+ refreshToken: poll.refreshToken,
222
+ apiBaseUrl,
223
+ user: poll.user,
224
+ expiresAt: poll.expiresAt,
225
+ createdAt: new Date().toISOString()
226
+ });
227
+ await writeAuthSession(session);
228
+ return { session, verificationUrl, userCode: start.userCode };
229
+ }
230
+ if (poll.status === "denied") {
231
+ throw new Error("login denied");
232
+ }
233
+ if (poll.status === "expired") {
234
+ throw new Error("device login expired");
235
+ }
236
+ await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
237
+ }
238
+ throw new Error("timed out waiting for device login");
239
+ }
240
+ export function shouldBypassAuth() {
241
+ return process.env.SALESPROMPTER_SKIP_AUTH === "1";
242
+ }
@@ -0,0 +1,194 @@
1
+ import { execFile } from "node:child_process";
2
+ import { promisify } from "node:util";
3
+ const execFileAsync = promisify(execFile);
4
+ const DEFAULT_BQ_PROJECT_ID = process.env.BQ_PROJECT_ID ?? process.env.GOOGLE_CLOUD_PROJECT ?? process.env.GCLOUD_PROJECT ?? "icpidentifier";
5
+ function escapeSqlString(value) {
6
+ return value.replaceAll("\\", "\\\\").replaceAll("'", "\\'");
7
+ }
8
+ function lowerQuoted(value) {
9
+ return `'${escapeSqlString(value.trim().toLowerCase())}'`;
10
+ }
11
+ function upperQuoted(value) {
12
+ return `'${escapeSqlString(value.trim().toUpperCase())}'`;
13
+ }
14
+ function buildContainsClause(field, values) {
15
+ const normalized = values.map((value) => value.trim()).filter((value) => value.length > 0);
16
+ if (normalized.length === 0) {
17
+ return null;
18
+ }
19
+ const clauses = normalized.map((value) => `LOWER(CAST(${field} AS STRING)) LIKE ${lowerQuoted(`%${value}%`)}`);
20
+ return `(${clauses.join(" OR ")})`;
21
+ }
22
+ function buildInClause(field, values) {
23
+ const normalized = values.map((value) => value.trim().toLowerCase()).filter((value) => value.length > 0);
24
+ if (normalized.length === 0) {
25
+ return null;
26
+ }
27
+ return `LOWER(CAST(${field} AS STRING)) IN (${normalized.map(lowerQuoted).join(", ")})`;
28
+ }
29
+ function buildCountryClause(field, values) {
30
+ const normalized = values.map((value) => value.trim().toUpperCase()).filter((value) => value.length > 0);
31
+ if (normalized.length === 0) {
32
+ return null;
33
+ }
34
+ return `UPPER(CAST(${field} AS STRING)) IN (${normalized.map(upperQuoted).join(", ")})`;
35
+ }
36
+ function buildCompanySizeClause(field, buckets) {
37
+ const normalized = buckets.map((value) => value.trim()).filter((value) => value.length > 0);
38
+ if (normalized.length === 0) {
39
+ return null;
40
+ }
41
+ const bucketMap = {
42
+ "1-49": ["1-10", "11-50"],
43
+ "50-199": ["51-200"],
44
+ "200-499": ["201-500"],
45
+ "500+": ["501-1000", "1001-5000", "5001-10.000", "10.000+"]
46
+ };
47
+ const expanded = normalized.flatMap((bucket) => bucketMap[bucket] ?? [bucket]);
48
+ const clauses = expanded.map((bucket) => `LOWER(CAST(${field} AS STRING)) = ${lowerQuoted(bucket)}`);
49
+ return `(${clauses.join(" OR ")})`;
50
+ }
51
+ function requireString(row, field) {
52
+ const value = String(row[field] ?? "").trim();
53
+ if (value.length === 0) {
54
+ throw new Error(`BigQuery lead row missing required field: ${field}`);
55
+ }
56
+ return value;
57
+ }
58
+ function employeeCountFromBucket(bucket) {
59
+ switch (bucket.trim()) {
60
+ case "1-10":
61
+ return 10;
62
+ case "11-50":
63
+ return 30;
64
+ case "51-200":
65
+ return 125;
66
+ case "201-500":
67
+ return 350;
68
+ case "501-1000":
69
+ return 750;
70
+ case "1001-5000":
71
+ return 2500;
72
+ case "5001-10.000":
73
+ return 7500;
74
+ case "10.000+":
75
+ return 10000;
76
+ default:
77
+ return 250;
78
+ }
79
+ }
80
+ function deriveRegion(country, region) {
81
+ const normalizedRegion = region.trim();
82
+ if (normalizedRegion.length > 0) {
83
+ return normalizedRegion;
84
+ }
85
+ const normalizedCountry = country.trim().toUpperCase();
86
+ if (["DE", "AT", "CH"].includes(normalizedCountry)) {
87
+ return "DACH";
88
+ }
89
+ return normalizedCountry.length > 0 ? normalizedCountry : "unknown";
90
+ }
91
+ export function normalizeBigQueryLeadRows(rows) {
92
+ return rows.map((row) => {
93
+ const firstName = String(row.firstName ?? "").trim();
94
+ const lastName = String(row.lastName ?? "").trim();
95
+ const contactName = [firstName, lastName].filter((value) => value.length > 0).join(" ");
96
+ if (contactName.length === 0) {
97
+ throw new Error("BigQuery lead row missing required name fields");
98
+ }
99
+ const companySize = requireString(row, "companySize");
100
+ const country = String(row.country ?? "").trim();
101
+ const region = String(row.region ?? "").trim();
102
+ return {
103
+ companyName: requireString(row, "companyName"),
104
+ domain: requireString(row, "domain"),
105
+ industry: requireString(row, "industry"),
106
+ region: deriveRegion(country, region),
107
+ employeeCount: employeeCountFromBucket(companySize),
108
+ contactName,
109
+ title: requireString(row, "title"),
110
+ email: requireString(row, "email"),
111
+ source: "bigquery-leadpool",
112
+ signals: []
113
+ };
114
+ });
115
+ }
116
+ export function buildBigQueryLeadLookupSql(icp, options) {
117
+ const filters = [
118
+ buildContainsClause(options.titleField, icp.titles),
119
+ buildInClause(options.industryField, icp.industries),
120
+ options.regionField ? buildInClause(options.regionField, icp.regions) : null,
121
+ buildCountryClause(options.countryField, icp.countries),
122
+ buildCompanySizeClause(options.companySizeField, icp.companySizes)
123
+ ].filter((value) => value !== null);
124
+ if (options.useSalesprompterGuards) {
125
+ filters.push(`${options.emailField} IS NOT NULL`, `COALESCE(email_invalid, FALSE) = FALSE`, `COALESCE(company_blacklisted, FALSE) = FALSE`, `COALESCE(company_inSequence, FALSE) = FALSE`, `COALESCE(contact_replied, FALSE) = FALSE`, `COALESCE(contact_bounced, FALSE) = FALSE`, `COALESCE(jobTitle_blacklisted, FALSE) = FALSE`);
126
+ }
127
+ const keywordSearchExpression = options.keywordFields.length > 0
128
+ ? `CONCAT(${options.keywordFields.map((field) => `COALESCE(CAST(${field} AS STRING), '')`).join(", ' ', ")})`
129
+ : null;
130
+ const keywordClause = keywordSearchExpression ? buildContainsClause(keywordSearchExpression, icp.keywords) : null;
131
+ if (keywordClause !== null) {
132
+ filters.push(keywordClause);
133
+ }
134
+ const excludedKeywordClause = keywordSearchExpression ? buildContainsClause(keywordSearchExpression, icp.excludedKeywords) : null;
135
+ if (excludedKeywordClause !== null) {
136
+ filters.push(`NOT ${excludedKeywordClause}`);
137
+ }
138
+ if (options.additionalWhere !== undefined && options.additionalWhere.trim().length > 0) {
139
+ filters.push(`(${options.additionalWhere.trim()})`);
140
+ }
141
+ const whereClause = filters.length > 0 ? filters.join("\n AND ") : "TRUE";
142
+ return [
143
+ "SELECT",
144
+ ` ${options.companyField} AS companyName,`,
145
+ ` ${options.domainField} AS domain,`,
146
+ ` ${options.titleField} AS title,`,
147
+ ` ${options.firstNameField} AS firstName,`,
148
+ ` ${options.lastNameField} AS lastName,`,
149
+ ` ${options.emailField} AS email,`,
150
+ ` ${options.industryField} AS industry,`,
151
+ ` ${options.companySizeField} AS companySize,`,
152
+ ` ${options.countryField} AS country,`,
153
+ options.regionField ? ` ${options.regionField} AS region` : " CAST(NULL AS STRING) AS region",
154
+ `FROM \`${options.table}\``,
155
+ "WHERE",
156
+ ` ${whereClause}`,
157
+ `LIMIT ${options.limit}`
158
+ ].join("\n");
159
+ }
160
+ export async function runBigQueryQuery(sql, options = {}) {
161
+ const args = [
162
+ "query",
163
+ "--use_legacy_sql=false",
164
+ "--format=prettyjson",
165
+ `--project_id=${DEFAULT_BQ_PROJECT_ID}`
166
+ ];
167
+ if (options.maxRows !== undefined) {
168
+ args.push(`--max_rows=${options.maxRows}`);
169
+ }
170
+ args.push(sql);
171
+ const { stdout } = await execFileAsync("bq", args);
172
+ return JSON.parse(stdout);
173
+ }
174
+ export async function executeBigQuerySql(sql, options = {}) {
175
+ const args = [
176
+ "query",
177
+ "--use_legacy_sql=false",
178
+ "--format=prettyjson",
179
+ `--project_id=${DEFAULT_BQ_PROJECT_ID}`
180
+ ];
181
+ if (options.maxRows !== undefined) {
182
+ args.push(`--max_rows=${options.maxRows}`);
183
+ }
184
+ args.push(sql);
185
+ const { stdout } = await execFileAsync("bq", args);
186
+ return stdout.trim();
187
+ }
188
+ export async function runBigQueryRows(sql, options = {}) {
189
+ const result = await runBigQueryQuery(sql, options);
190
+ if (!Array.isArray(result)) {
191
+ throw new Error("expected BigQuery query result to be an array");
192
+ }
193
+ return result;
194
+ }
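`buildContainsClause` above interpolates each value into a `LIKE` pattern after quoting it, so a `%` or `_` inside a title or keyword would act as a wildcard. If literal matching is wanted, one possible hardening is to escape those characters before quoting. The sketch below is not part of the package: `escapeLikePattern` and `containsLiteral` are hypothetical helpers, and `escapeSqlString` is copied from the diff so the block runs on its own. It relies on BigQuery's `LIKE` treating a backslash as the escape character, with the backslash doubled inside a regular string literal.

```javascript
// Hypothetical helper, not part of the package: neutralize LIKE wildcards
// by prefixing `\`, `%`, and `_` with a backslash.
function escapeLikePattern(value) {
  return value.replace(/[\\%_]/g, (match) => `\\${match}`);
}

// Copied from the diff above so the sketch is self-contained.
function escapeSqlString(value) {
  return value.replaceAll("\\", "\\\\").replaceAll("'", "\\'");
}

// Builds one case-insensitive "contains" predicate with literal matching.
function containsLiteral(field, value) {
  const pattern = `%${escapeLikePattern(value.trim().toLowerCase())}%`;
  return `LOWER(CAST(${field} AS STRING)) LIKE '${escapeSqlString(pattern)}'`;
}

console.log(containsLiteral("jobTitle", "100% Remote"));
```

The surrounding `%` wildcards are added after escaping, so they keep their "contains" meaning while the user-supplied `%` is matched literally.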