npm - flashrev-ai-enrich - Versions diffs - 1.0.0 - Mend

flashrev-ai-enrich 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (28)

package/CHANGELOG.md +33 -0
package/LICENSE +21 -0
package/README.md +334 -0
package/bin/flashrev-ai-enrich.js +23 -0
package/examples/company-profile.job.json +14 -0
package/examples/leads.csv +4 -0
package/examples/person-email-unlock.job.json +13 -0
package/examples/sample-prospects-enriched.csv +8 -0
package/examples/sample-prospects.csv +8 -0
package/package.json +42 -0
package/skills/flashrev-ai-enrich/SKILL.md +212 -0
package/skills/flashrev-ai-enrich/agents/openai.yaml +3 -0
package/skills/flashrev-ai-enrich/references/api_contract.md +165 -0
package/src/args.js +118 -0
package/src/billing.js +54 -0
package/src/capabilities.js +2473 -0
package/src/cli.js +435 -0
package/src/config.js +114 -0
package/src/csv.js +81 -0
package/src/customer-api.js +101 -0
package/src/estimate.js +64 -0
package/src/flashrev-client.js +338 -0
package/src/job.js +126 -0
package/src/prompt-router.js +144 -0
package/src/runner.js +269 -0
package/src/table.js +41 -0
package/src/tabular.js +17 -0
package/src/utils.js +104 -0

package/skills/flashrev-ai-enrich/SKILL.md ADDED

@@ -0,0 +1,212 @@
+---
+name: flashrev-ai-enrich
+description: Use this skill when an AI agent needs to enrich a CSV lead list through the flashrev-ai-enrich npm CLI (v1.0+). Triggers on requests involving list enrichment, filling missing company/person fields, verifying emails or phones, unlocking contact emails or phone numbers, finding company CEOs / executives / industry / employees / LinkedIn posts, matching companies or people to FlashRev IDs, Google search / news / maps lookups, scraping a single page, or running an LLM over each row. The CLI is a structured tool — agents should call `flashrev-ai-enrich schema` to discover the 34 capabilities, then invoke `run` with `--capability <funcName> --map ...` directly; `--prompt "..."` exists for ad-hoc human users and costs 1 extra token per invocation. All enrichment decisions and token deductions are owned by the FlashRev backend; the CLI never calls external data providers directly except for the special `customer_api` capability. Dry-run estimates and the 10-row sample preview must be completed before live runs unless the user passed `--yes`. Agents should invoke with `FLASHREV_ENRICH_AI_MODE=1` (or `--ai-mode`) so list outputs (`tokens` / `schema` / `token-history`) and error envelopes are JSON-structured.
+---
+# FlashRev AI Enrich
+Use the `flashrev-ai-enrich` CLI to enrich CSV lead lists through FlashRev. The CLI does not send outreach messages. It reads CSV files, maps CSV columns to FlashRev capability inputs, estimates token cost via dry-run, previews enriched sample rows, then writes an enriched CSV.
+## Commands
+```
+flashrev-ai-enrich init [--force]                              Write default config
+flashrev-ai-enrich doctor [--no-api]                            Self-check Node / config / API
+flashrev-ai-enrich tokens [--json]                              Show balance / total / used / plan
+flashrev-ai-enrich token-history [--from YYYY-MM-DD] [--to YYYY-MM-DD] [--limit N] [--json]
+                                                                Show consumption log (auto-paginates)
+flashrev-ai-enrich schema [--json]                              List 34 capabilities (synced from backend at runtime)
+flashrev-ai-enrich dry-run  --source leads.csv (--capability ID | --prompt "...") [--map ...] [--output ...]
+                                                                Estimate without running enrichment (--prompt still spends 1 routing token)
+flashrev-ai-enrich run      --source leads.csv --out X.csv (--capability ID | --prompt "...") [--yes] [--concurrency N] [--sample-size N]
+                                                                Real enrichment with sample preview. --prompt routes to a funcName via run_llm (1 extra token)
+```
+## Required confirmations before real `run`
+1. User has a FlashRev account with available tokens (`flashrev-ai-enrich tokens` → `remaining > 0`).
+2. `FLASHREV_API_KEY` env var is set (generated from https://info.flashlabs.ai/settings/privateApps).
+3. Source CSV path and output CSV path are confirmed.
+4. Either `--capability ID` (from `flashrev-ai-enrich schema`) or `--prompt "<intent>"` is confirmed. Agents should prefer `--capability ID` directly; `--prompt` is for ad-hoc human use because it costs 1 extra token to route through `run_llm`.
+5. Input mappings (`--map flashrev_field=csv_column`) cover at least one capability rule. Skipped only when `--prompt` is used and the LLM returns valid mappings (still subject to rule validation afterwards).
+6. Output mappings (`--output csv_col=response_field`) or `--output-fields` are confirmed. Skipped under `--prompt` if the LLM returned mappings, but always required for dynamic-output capabilities (e.g., `run_llm`, `scrape_and_extract`).
+7. `dry-run` first to see estimated token cost and effective concurrency.
+8. Do not proceed past the sample preview (default 10 rows, configurable via `--sample-size N`) unless the user approves or `--yes` is set.
+## Input modes
+### A. CSV mode (typical batch)
+```bash
+flashrev-ai-enrich run \
+  --source leads.csv --out leads.enriched.csv \
+  --capability enrich_email \
+  --map first_name=first_name --map last_name=last_name --map company_name=company \
+  --output verified_email=verified_business_email \
+  --yes
+```
+`--map` takes `capability_input_field=csv_column` (which CSV column feeds each capability input); `--output` takes `csv_output_column=response_field` (which backend response field fills each new CSV column).
+### B. Inline mode (single row test, no CSV)
+```bash
+flashrev-ai-enrich run \
+  --capability verify_email \
+  --input email=ada@example.com \
+  --output ok=deliverable_email \
+  --out out.csv --yes
+```
+In inline mode the `--input key=value` pairs are auto-mapped (no need for `--map`).
+### C. Job file (for repeatable presets)
+```bash
+flashrev-ai-enrich run --source leads.csv --out out.csv --job enrich.job.json --yes
+```
+Job file shape:
+```json
+{
+  "capability": "enrich_email",
+  "inputMapping": {
+    "first_name":  "first_name",
+    "last_name":   "last_name",
+    "company_name": "company"
+  },
+  "outputs": {
+    "verified_business_email":  "verified_business_email",
+    "all_verified_business_emails": "all_verified_business_emails"
+  }
+}
+```
+### D. Prompt routing mode (ad-hoc human use; costs 1 extra token)
+Skip `--capability` and describe the intent in natural language. The CLI sends the prompt + CSV columns + capability registry to `run_llm`, which returns JSON `{ funcName, inputMapping, outputMapping, reasoning }`; the CLI prints a Routing-decision block and then runs the resulting job through the normal dry-run / sample / run pipeline.
+```bash
+flashrev-ai-enrich run --source leads.csv --out leads.enriched.csv \
+  --prompt "for each row, take the email column and verify it is a deliverable business email" \
+  --yes
+```
+Rules of thumb when writing prompts:
+- Name the CSV column explicitly ("take the **email** column"); vague prompts make the LLM return empty mappings.
+- Describe the business outcome, not the capability name ("find the CEO" beats "use get_company_ceo").
+- One capability per prompt — the LLM picks exactly one funcName.
+- `--map` / `--output` on the command line override the LLM's choices; use them to lock specific columns while letting the LLM pick the capability.
+- `--capability X --prompt "..."` together: `--capability` wins, `--prompt` is ignored with a stderr warning (no routing token charged).
+- Unroutable prompts (e.g., "make me a sandwich") exit non-zero with the LLM's reasoning printed; zero rows run.
+Agents calling this CLI should usually skip prompt routing entirely — `schema` + explicit `--capability ID` is cheaper, faster, and deterministic. Prompt routing is for humans at a terminal.
+## Status semantics (output CSV columns)
+Every output CSV gets `flashrev_enrich_status` and `flashrev_enrich_error` columns:
+| status | meaning |
+|---|---|
+| `success` | Got business data; charged per capability `unitPriceToken`. |
+| `cached` | Hit `unlock_contact` dedup (same `person_id` already unlocked). 0 tokens. |
+| `no_data` | Backend returned 200 but the requested output fields are empty / null. 0 tokens. |
+| `failed` | HTTP error from backend, retries exhausted. 0 tokens. |
+`Failed` count > 0 with `Tokens used` > 0 means some rows got SOMETHING from backend (charged) but not the specific output fields the user asked for.
+## Cost reporting
+`Summary` line in `run` output prints `(balance before → balance after)` — that delta is the **authoritative** amount charged for the row enrichments. Each row's individual `cost.tokens` reported by backend may be slightly off under high concurrency (known limitation; `token-history` is always exact).
+When `--prompt` is used, the Routing-decision block prints its own `routing cost: 1 token(s)` line. That 1 token is **not** included in the `Summary` `balance before → after` delta, since routing happens before the balance snapshot. Total user cost per `--prompt` run = 1 routing token + (rows × capability unitPriceToken).
+## Special capability: `customer_api`
+`customer_api` does NOT call FlashRev backend — the CLI fetches the user-provided URL locally and parses the response. 0 tokens.
+Inputs (via `--map <field>=<csv_col>` or `--input <field>=<value>`):
+| field | required | default | notes |
+|---|---|---|---|
+| `url` | yes | — | target URL (alias: `endpoint`) |
+| `method` | no | `GET` | HTTP method |
+| `headers` | no | `{}` | JSON object of HTTP headers |
+| `body` | no | — | string (sent as-is) or object (JSON-serialized; Content-Type defaults to application/json) |
+| `params` | no | — | object of query-string params; appended to `url` |
+| `timeout` | no | `30000` | milliseconds before AbortError |
+The response JSON (or `{ text }` wrapper for non-JSON) becomes the row's enrichment data; map output columns via `--output csv_col=response_field` as usual. Useful for mixing 3rd-party APIs into the same enrichment workflow.
+## Date format
+`--from` and `--to` accept `YYYY-MM-DD`. They are interpreted in the local timezone. `--to` alone makes the CLI paginate through history until it covers the date range (up to 2000 records).
+## Safety rules
+- Never print or store `FLASHREV_API_KEY` in generated artifacts.
+- Prefer the `FLASHREV_API_KEY` env var over `--api-key`.
+- Treat email / phone enrichment (`enrich_email` / `enrich_phone`) as paid unlock operations.
+- If `tokens` returns `remaining: 0`, tell the user to recharge before running.
+- Do not describe or expose FlashRev backend data sources, routing, or internal service names to end users.
+- Never overwrite the source CSV (CLI refuses `--source == --out`).
+- Preserve row-level errors in `flashrev_enrich_status` and `flashrev_enrich_error` columns.
+## Failure handling
+- `402 Insufficient tokens` → run terminates; tell user to recharge.
+- `401` / `403` → invalid API key; verify `FLASHREV_API_KEY`.
+- `429 Rate limit` → CLI auto-retries with exponential backoff (500ms / 1s / 2s, up to 3 retries = 4 total attempts).
+- `503` / `504` → backend timeout/unavailable; auto-retried with the same schedule as 429.
+- Any other 4xx/5xx on a row → that single row is marked `failed`, batch continues.
+- `--prompt` routing failure (LLM returns non-JSON, unknown funcName, or `run_llm` itself errors) → CLI exits non-zero **before** enrichment starts, prints the LLM's reasoning. Suggest the user retry with `--capability ID`.
+- `--prompt` routed to a capability but `Input mapping does not satisfy <funcName>` → the LLM returned empty / wrong mapping; rerun with a more explicit prompt (name the CSV column) or use `--map` to override.
+## Workflow recipe
+```bash
+# 1. (first time) write config
+flashrev-ai-enrich init
+export FLASHREV_API_KEY="sk_xxxx"   # from info.flashlabs.ai/settings/privateApps
+# 2. verify
+flashrev-ai-enrich doctor
+# 3. browse capabilities and pick one
+flashrev-ai-enrich schema | less
+# 4. (optional) check balance
+flashrev-ai-enrich tokens
+# 5. estimate cost
+flashrev-ai-enrich dry-run --source leads.csv \
+  --capability enrich_email \
+  --map first_name=first_name --map last_name=last_name --map company_name=company
+# 6. real run with sample preview
+flashrev-ai-enrich run --source leads.csv --out out.csv \
+  --capability enrich_email \
+  --map first_name=first_name --map last_name=last_name --map company_name=company \
+  --output verified_email=verified_business_email
+# (preview shown, type 'y' to continue, or pass --yes to auto-confirm)
+# 7. audit spend
+flashrev-ai-enrich token-history --from 2026-05-01
+```
+### Shortcut for ad-hoc human use (prompt routing)
+When the user does not know the capability name and is willing to spend 1 extra token to let the LLM pick:
+```bash
+# dry-run only routes (1 token) — no enrichment
+flashrev-ai-enrich dry-run --source leads.csv \
+  --prompt "find the CEO of each company"
+# real run: 1 routing token + N rows
+flashrev-ai-enrich run --source leads.csv --out out.csv \
+  --prompt "find the CEO of each company" --yes
+```
+Agents should skip this and pass `--capability` directly.
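
Note on the retry schedule in SKILL.md's "Failure handling" section: the text describes automatic retries on `429`, `503` and `504` with delays of 500ms / 1s / 2s (3 retries, 4 attempts in total). The sketch below shows one way such a loop could look. It is not taken from the package's `flashrev-client.js`, which is not shown in this diff; `withRetry`, `send` and `wait` are hypothetical names used only for illustration.

```javascript
// Hypothetical sketch of the documented retry schedule (not the package's code).
const RETRYABLE_STATUS = new Set([429, 503, 504]);
const BACKOFF_MS = [500, 1000, 2000]; // 3 retries => 4 attempts in total

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `send` is any async function resolving to an object with a numeric `status`.
// `wait` is injectable so the schedule can be exercised without real delays.
async function withRetry(send, { delays = BACKOFF_MS, wait = sleep } = {}) {
  let response = await send();
  for (const delay of delays) {
    // Stop as soon as the status is not one of the retryable codes.
    if (!RETRYABLE_STATUS.has(response.status)) break;
    await wait(delay);
    response = await send();
  }
  // May still carry a retryable status if every attempt failed.
  return response;
}
```

Under this sketch, a caller would treat a response that still carries `429`, `503` or `504` after the last attempt as exhausted and mark the row `failed`, matching the status table in SKILL.md.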

package/skills/flashrev-ai-enrich/agents/openai.yaml ADDED

@@ -0,0 +1,3 @@
+display_name: FlashRev AI Enrich
+short_description: Enrich FlashRev CSV lead lists with dry-run estimates and sample previews.
+default_prompt: Run FlashRev AI Enrich on a CSV using a dry-run first, then enrich after approval.

package/skills/flashrev-ai-enrich/references/api_contract.md ADDED

@@ -0,0 +1,165 @@
+# FlashRev AI Enrich API Contract
+The wire format this CLI relies on. Anything beyond what is documented here is FlashRev internal and may change without notice.
+## Base URL & auth
+| Item | Value |
+|---|---|
+| Base URL | `https://open-ai-api.flashlabs.ai` |
+| Auth header | `X-API-Key: <private app key>` |
+| Key issuance | https://info.flashlabs.ai/settings/privateApps |
+| Path prefix | All CLI-facing routes share the `/flashrev/...` prefix |
+The API key is exchanged for an authenticated session by the FlashRev gateway. The CLI never sees or handles internal tokens.
+## Endpoints used by CLI
+### 1. Token balance — `GET /flashrev/api/v2/oauth/me`
+Response:
+```json
+{
+  "code": 200,
+  "data": {
+    "companyId": 1000001,
+    "newCreditFlag": "Y",
+    "limit": {
+      "tokenTotal": 1121000,
+      "tokenCost": 917206.5,
+      "tokenCategoryRemain": { "SUBSCRIPTION": 0, "GIFT": 0, "ADDON": 289974 }
+    },
+    "vip": { "packageName": "...", "...": "..." }
+  }
+}
+```
+CLI computes `remaining = tokenTotal - tokenCost`.
+### 2. Token history — `POST /flashrev/api/v2/commodity/token/transaction/list`
+Request body:
+```json
+{ "page": 1, "pageSize": 100, "transactionType": 2 }
+```
+`transactionType: 2` = consumption (1 = top-up).
+Response:
+```json
+{
+  "code": 200,
+  "data": {
+    "list": [
+      { "createdAt": "2026-05-29 06:59:56", "featId": "unlock_contact",
+        "featName": "Verify Email Address", "tokenAmount": 1, "unit": "Run",
+        "quantity": 1, "transactionType": 2 }
+    ],
+    "total": 100, "page": 1, "pageSize": 100
+  }
+}
+```
+The endpoint does not currently accept date filters; the CLI paginates and filters locally by `createdAt`. CLI cap: 20 pages × 100 = 2000 records.
+### 3. Capability registry — `GET /flashrev/api/v1/enrich/configs`
+Returns the active capability list with pricing and shape.
+```json
+{
+  "code": 200,
+  "data": [
+    {
+      "funcName": "enrich_email",
+      "displayName": "Enrich Person -> Get Emails",
+      "featId": "unlock_contact",
+      "unitPriceToken": 2,
+      "concurrency": 10,
+      "inputColumn": [
+        { "key": "first_name", "name": "First Name" },
+        { "key": "last_name",  "name": "Last Name" }
+      ],
+      "outputColumn": ["verified_business_email", "all_verified_business_emails", "..."],
+      "rules": [
+        ["first_name", "last_name", "company_name"],
+        ["person_linkedin_url"],
+        ["email"]
+      ]
+    }
+  ]
+}
+```
+### 4. Enrich — `POST /flashrev/api/v1/enrich/run`
+Per-row enrichment, synchronous.
+Request body (snake_case on the wire):
+```json
+{
+  "func_name": "enrich_email",
+  "input": {
+    "first_name": "Ada",
+    "last_name": "Lovelace",
+    "company_name": "Acme"
+  },
+  "row_id": "row-42"
+}
+```
+`row_id` is optional and used by the CLI for tracing only.
+Response (success):
+```json
+{
+  "code": 200,
+  "msg": "OK",
+  "data": {
+    "code": 200,
+    "msg": "Successful",
+    "data": {
+      "verified_business_email": "ada@acme.com",
+      "all_verified_business_emails": ["ada@acme.com"],
+      "verified_personal_email": ""
+    },
+    "cost": { "tokens": 2, "cached": false }
+  }
+}
+```
+The response uses a nested `data` wrapper; the CLI's `normalizeEnrichResponse` handles both flat and nested shapes.
+`cost.tokens` may be slightly inaccurate under high concurrency; the values returned by `token/transaction/list` are always exact.
+### 5. Error codes
+HTTP-level (gateway / transport):
+| Code | Meaning |
+|---|---|
+| 401 | Invalid or revoked `X-API-Key` |
+| 402 | Insufficient tokens |
+| 429 | Rate limit exceeded (CLI auto-retries with exponential backoff) |
+| 503 | Service temporarily unavailable |
+| 504 | Upstream timeout |
+Business-level (HTTP 200; the outcome is carried by the inner `code` and `data`):
+| Inner code | Meaning |
+|---|---|
+| 200 + `data` populated | Real enrichment, charged at `unitPriceToken` |
+| 200 + `data` empty / requested fields blank | No data for this lead; CLI marks `no_data`, not charged |
+| 422 | Input validation failed (e.g., missing required input combo) |
+| 4xx other | Request rejected |
+## Deduction semantics
+- **Pre-check**: The balance is checked before any downstream call. Insufficient → `402` immediately, no charge.
+- **Rate limit**: A per-`funcName` quota is enforced server-side. Overflow → `429`; the CLI retries with backoff.
+- **Charge on success only**: A row is billed only when the response carries real business data. Empty / 4xx / 5xx responses are not billed.
+- **Dedup**: Contact-unlock capabilities (`enrich_email`, `enrich_phone`) consult an unlock cache. Repeat unlocks of the same person return cached data at 0 tokens (`cost.cached: true`).
+## customer_api
+`customer_api` is a special capability whose backend route is empty. The CLI fetches the user-provided URL locally and parses the response. Token cost is always 0. Use it to mix third-party data sources into the same enrichment workflow.
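
Note on the enrich response shape in api_contract.md: the contract shows a nested `data` wrapper and says the CLI accepts both flat and nested shapes, while SKILL.md maps outcomes to `success`, `cached`, `no_data` and `failed`. The sketch below shows how those two pieces could fit together. It is an illustration under stated assumptions, not the package's `normalizeEnrichResponse` (which is not shown in this diff). `unwrapEnrichResponse` and `classifyRow` are hypothetical names, and treating any inner `code` other than 200 as `failed` is an assumption.

```javascript
// Hypothetical sketch; names and the "inner code != 200 => failed" rule are assumptions.
function unwrapEnrichResponse(body) {
  const outer = body && body.data;
  // Nested shape: the payload sits under data.data, with data.code and data.cost beside it.
  const nested = Boolean(outer) && typeof outer === "object" && "data" in outer && "code" in outer;
  const inner = nested ? outer : body;
  return {
    code: inner?.code,
    fields: inner?.data ?? {},
    cost: inner?.cost ?? { tokens: 0, cached: false }
  };
}

// A field counts as blank when it is null, undefined, "" or an empty array.
const isBlank = (value) =>
  value == null || value === "" || (Array.isArray(value) && value.length === 0);

function classifyRow(body, requestedFields) {
  const { code, fields, cost } = unwrapEnrichResponse(body);
  if (code !== 200) return "failed";
  if (cost.cached) return "cached";
  const hasData = requestedFields.some((name) => !isBlank(fields[name]));
  return hasData ? "success" : "no_data";
}
```

Called with the success response shown in the contract and `["verified_business_email"]`, `classifyRow` yields `success`; asking only for `verified_personal_email`, which is an empty string in that sample, yields `no_data`.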

package/src/args.js ADDED

@@ -0,0 +1,118 @@
+// Flags that NEVER take a value — they are always boolean. Listing them here
+// prevents the greedy `--flag <next-token>` consumption from swallowing what
+// is actually a command name or positional argument
+// (e.g. `--ai-mode tokens` previously parsed as { aiMode: "tokens" }).
+const BOOLEAN_FLAGS = new Set([
+  "ai-mode",
+  "yes",
+  "y",
+  "json",
+  "dry-run",
+  "help",
+  "h",
+  "version",
+  "v",
+  "force",
+  "skip-balance-check",
+  "save-api-key"
+]);
+export function parseArgv(argv) {
+  const positionals = [];
+  const flags = {};
+  for (let index = 0; index < argv.length; index += 1) {
+    const token = argv[index];
+    if (token === "--") {
+      positionals.push(...argv.slice(index + 1));
+      break;
+    }
+    if (token.startsWith("--no-")) {
+      setFlag(flags, toCamel(token.slice(5)), false);
+      continue;
+    }
+    if (token.startsWith("--")) {
+      const raw = token.slice(2);
+      const equalsIndex = raw.indexOf("=");
+      if (equalsIndex >= 0) {
+        setFlag(flags, toCamel(raw.slice(0, equalsIndex)), coerceValue(raw.slice(equalsIndex + 1)));
+        continue;
+      }
+      const key = toCamel(raw);
+      if (BOOLEAN_FLAGS.has(raw)) {
+        setFlag(flags, key, true);
+        continue;
+      }
+      const next = argv[index + 1];
+      if (next !== undefined && !next.startsWith("-")) {
+        setFlag(flags, key, coerceValue(next));
+        index += 1;
+      } else {
+        setFlag(flags, key, true);
+      }
+      continue;
+    }
+    if (token.startsWith("-") && token.length > 1) {
+      for (const shortFlag of token.slice(1)) setFlag(flags, shortFlag, true);
+      continue;
+    }
+    positionals.push(token);
+  }
+  return { positionals, flags };
+}
+export function readFlag(flags, names, fallback = undefined) {
+  const keys = Array.isArray(names) ? names : [names];
+  for (const key of keys) {
+    if (Object.prototype.hasOwnProperty.call(flags, key)) {
+      const value = flags[key];
+      return Array.isArray(value) ? value[value.length - 1] : value;
+    }
+  }
+  return fallback;
+}
+export function readFlagList(flags, names) {
+  const keys = Array.isArray(names) ? names : [names];
+  const values = [];
+  for (const key of keys) {
+    const value = flags[key];
+    if (Array.isArray(value)) values.push(...value);
+    else if (value !== undefined && value !== true && value !== "") values.push(value);
+  }
+  return values;
+}
+export function requireFlag(flags, names, label) {
+  const value = readFlag(flags, names);
+  if (value === undefined || value === true || value === "") {
+    throw new Error(`Missing required flag: --${label || names}`);
+  }
+  return value;
+}
+function setFlag(flags, key, value) {
+  if (Object.prototype.hasOwnProperty.call(flags, key)) {
+    flags[key] = Array.isArray(flags[key]) ? [...flags[key], value] : [flags[key], value];
+    return;
+  }
+  flags[key] = value;
+}
+function toCamel(value) {
+  return value.replace(/-([a-z0-9])/g, (_, char) => char.toUpperCase());
+}
+function coerceValue(value) {
+  if (value === "true") return true;
+  if (value === "false") return false;
+  if (/^-?\d+(\.\d+)?$/.test(value)) return Number(value);
+  return value;
+}
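
Note on repeated flags in args.js: `setFlag` turns a flag that appears more than once into an array, so `--map first_name=first_name --map company_name=company` leaves `flags.map` as `["first_name=first_name", "company_name=company"]`, and `readFlagList` returns that list (or a one-element list when the flag appears once). The code that converts those `key=value` strings into a mapping object is not part of this file. The sketch below shows one plausible way to do it; `pairsToMapping` is a hypothetical helper, not a function from the package.

```javascript
// Hypothetical helper: turns ["field=column", ...] into { field: "column" }.
// Splits on the FIRST "=" only, so values such as URLs with query strings survive intact.
function pairsToMapping(pairs) {
  const mapping = {};
  for (const pair of pairs) {
    const text = String(pair);
    const equalsIndex = text.indexOf("=");
    if (equalsIndex <= 0) {
      throw new Error(`Expected key=value, got: ${text}`);
    }
    const key = text.slice(0, equalsIndex).trim();
    mapping[key] = text.slice(equalsIndex + 1).trim();
  }
  return mapping;
}
```

Splitting on the first `=` mirrors how `parseArgv` itself handles `--flag=value`, where everything after the first `=` is kept as the value.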

package/src/billing.js ADDED

@@ -0,0 +1,54 @@
+/**
+ * Unit price fallback.
+ *
+ * Real prices are maintained in the backend admin table (api_product_feature) and
+ * delivered via the `unitPriceToken` field returned by /flashrev/api/v1/enrich/configs.
+ * The CLI fetches this once at startup and attaches unitPriceToken to each capability.
+ *
+ * This local table is only used when /configs fetch fails or when the capability
+ * registry does not include unitPriceToken — values are approximate, intended for
+ * dry-run estimation only.
+ */
+const FALLBACK_BY_FEAT_ID = {
+  enrich_company: 1,
+  enrich_company_profile: 1,
+  enrich_prospect: 1,
+  unlock_contact: 2,
+  verify_email_address: 1,
+  verify_phone_number: 1,
+  match_company_ceo: 2,
+  match_company_executives: 2,
+  enrich_flashagent: 2,
+  run_llm_with_deep_research: 5,
+  run_llm_with_deep_research_by_flashrev: 5,
+  run_llm_with_deep_research_by_perplexity: 5,
+  custom_api_request: 1
+};
+const FALLBACK_DEFAULT = 1;
+/**
+ * Estimated unit price (tokens per call) for a capability.
+ * Priority: capability.unitPriceToken (synced from remote) > featId fallback > default 1.
+ */
+export function unitPriceFor(capability) {
+  if (!capability) return FALLBACK_DEFAULT;
+  if (capability.unitPriceToken != null) return Number(capability.unitPriceToken) || FALLBACK_DEFAULT;
+  if (capability.featId && FALLBACK_BY_FEAT_ID[capability.featId]) {
+    return FALLBACK_BY_FEAT_ID[capability.featId];
+  }
+  return FALLBACK_DEFAULT;
+}
+/**
+ * Billing summary shown by dry-run / estimate.
+ */
+export function billingForCapability(capability) {
+  const cost = unitPriceFor(capability);
+  return {
+    action: capability?.displayName || capability?.name || capability?.id || "Enrich",
+    cost,
+    unit: "Run",
+    source: capability?.unitPriceToken != null ? "remote" : "fallback"
+  };
+}
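
Note on how the unit price feeds a run estimate: SKILL.md states that the total cost of a `--prompt` run is 1 routing token plus rows × the capability's `unitPriceToken`. The sketch below turns that sentence into arithmetic. It is an illustration only and not the package's `estimate.js`, which is not shown in this diff; `estimateMaxTokens` is a hypothetical name. The result is best read as an upper bound, since the status table documents `cached`, `no_data` and `failed` rows as costing 0 tokens.

```javascript
// Hypothetical helper; the formula is the one stated in SKILL.md's "Cost reporting" section.
function estimateMaxTokens({ rows, unitPriceToken, usesPrompt = false }) {
  if (!Number.isInteger(rows) || rows < 0) {
    throw new RangeError("rows must be a non-negative integer");
  }
  // Prompt routing is documented as one extra token per invocation, not per row.
  const routingTokens = usesPrompt ? 1 : 0;
  return routingTokens + rows * unitPriceToken;
}
```

For example, 8 rows at 2 tokens each come to at most 16 tokens with `--capability`, and 17 with `--prompt`.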