fullstackgtm 0.25.0 → 0.25.1

package/CHANGELOG.md CHANGED
@@ -5,6 +5,24 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
  and the project adheres to [Semantic Versioning](https://semver.org/).
  The path to 1.0 is planned in [docs/roadmap-to-1.0.md](./docs/roadmap-to-1.0.md).
 
+ ## [0.25.1] — 2026-06-12
+
+ Docs-sync release — no code changes.
+
+ ### Fixed
+
+ - README, INSTALL_FOR_AGENTS.md, the agent skill, and docs/api.md corrected
+ against the shipped surface: the MCP tool list now enumerates all 8 tools
+ (read-only vs gated), the builtin rule count is 12, docs/api.md gains a
+ Schedule section and the `schedule` command in its CLI list, the README
+ cites the 612-run CRM-ops benchmark and the `diff --fail-on-new-findings`
+ CI gate, the bulk-update section covers `!~` / `--create-task` /
+ `--force-archive-duplicates`, and the skill's verb map completes the
+ `schedule`, `market`, and `plans` rows and adds `report`.
+ - The 0.23.0 entry below is amended retroactively: `dedupe`, `reassign`,
+ `fix`, and `--set <field>=from:<sourceField>` shipped in 0.23.0 without a
+ changelog record.
+
  ## [0.25.0] — 2026-06-12
 
  ### Added
  ### Added
@@ -136,6 +154,29 @@ everything that shipped 0.19–0.23. (No code changes.)
  - Every `enrich` subcommand catches `--help`/`-h` before config load,
  credential resolution, or any network call. No scheduling/cron logic —
  that is the horizontal schedule layer's job (docs/schedule.md).
+ - **Four task-shaped verbs** (entry added retroactively — these shipped in
+ 0.23.0 without a changelog record). The 612-run benchmark's gated-agent
+ failures clustered into four missing verbs; all four compile to plans
+ through the existing plan → approve → apply gate — nothing writes directly.
+ - `dedupe <account|contact|deal> --key <domain|email|name>` — duplicate
+ groups by normalized identity key, one `merge_records` operation per
+ group with a deterministic survivor (`richest` = most populated data
+ fields, ties to lowest id; `oldest` = lowest id). High risk, approval
+ required; merges are irreversible on apply.
+ - `reassign --from <ownerId> --to <ownerId>` — the ownership-handoff
+ playbook: one bulk-update-style plan per object type, extra `--where`
+ scoping account-lifted for deals/contacts, `--except-deal-stage`
+ excluding the stage AND records whose account has an open deal in it —
+ re-verified per record at apply time.
+ - `fix --rule <id>` — one-shot composite: audit one rule → save → suggest
+ → approve only suggestion-backed values at the confidence bar → apply
+ (`--yes` required), with a stage-by-stage summary.
+ - `bulk-update --set <field>=from:<sourceField>` — per-record derived
+ values resolved from the filter view (relational sources like
+ `account.ownerId` included); empty-source records are skipped and
+ counted, never guessed. Plus the `--archive` duplicate guard: archiving
+ a record that shares its identity key with another is refused and
+ pointed at `dedupe`, overridable with `--force-archive-duplicates`.
 
  ## [0.22.0] — 2026-06-12
 
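The `dedupe` entry in the hunk above describes one deterministic survivor per duplicate group. A minimal TypeScript sketch of that selection rule follows. It assumes a hypothetical `CrmRecord` shape, assumes that "lowest id" means numeric order for all-digit ids and string order otherwise, and uses a simple trim-and-lowercase normalization; `groupByKey` and `pickSurvivor` are illustrative names, not the package's API.

```typescript
// Hypothetical record shape for illustration only.
interface CrmRecord {
  id: string;
  fields: Record<string, string | null | undefined>;
}

type SurvivorStrategy = "richest" | "oldest";

// Assumption: all-digit ids compare numerically, anything else as strings.
function compareIds(a: string, b: string): number {
  const numeric = /^\d+$/;
  if (numeric.test(a) && numeric.test(b)) return Number(a) - Number(b);
  return a < b ? -1 : a > b ? 1 : 0;
}

// Counts fields that hold a non-blank value.
function populatedCount(record: CrmRecord): number {
  return Object.values(record.fields).filter(
    (value) => value !== null && value !== undefined && value.trim() !== "",
  ).length;
}

// Groups records by a normalized key and keeps only groups with duplicates.
function groupByKey(records: CrmRecord[], key: string): CrmRecord[][] {
  const groups = new Map<string, CrmRecord[]>();
  for (const record of records) {
    const raw = record.fields[key];
    if (raw === null || raw === undefined || raw.trim() === "") continue;
    const normalized = raw.trim().toLowerCase();
    const bucket = groups.get(normalized);
    if (bucket === undefined) groups.set(normalized, [record]);
    else bucket.push(record);
  }
  return Array.from(groups.values()).filter((group) => group.length > 1);
}

// "richest": most populated fields, ties to lowest id. "oldest": lowest id.
function pickSurvivor(group: CrmRecord[], strategy: SurvivorStrategy): CrmRecord {
  if (group.length === 0) throw new Error("cannot pick a survivor from an empty group");
  return group.reduce((best, candidate) => {
    if (strategy === "richest") {
      const diff = populatedCount(candidate) - populatedCount(best);
      if (diff !== 0) return diff > 0 ? candidate : best;
    }
    return compareIds(candidate.id, best.id) < 0 ? candidate : best;
  });
}
```

Because the comparison never depends on input order, the same group yields the same survivor on every run, which is what lets a plan be reviewed before anything irreversible is applied.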
@@ -3,6 +3,10 @@
  Deterministic install-and-verify steps. Every command is non-interactive, every
  check has an expected output, and nothing here writes to a CRM.
 
+ If your harness supports agent skills, `npx skills add fullstackgtm/core`
+ installs the compact operating guide; this document remains the deterministic
+ install-and-verify path.
+
  ## 1. Install
 
  ```bash
@@ -60,6 +64,11 @@ page texts — every span is checked character-for-character against the stored
  capture, and paraphrased quotes are rejected. In non-interactive contexts the
  CLI never prompts — it fails with this guidance.
 
+ Apollo enrichment (`enrich append --source apollo`) needs `APOLLO_API_KEY` in
+ the environment, or have the human run `echo "$KEY" | fullstackgtm login apollo`
+ once. Without it, `enrich ingest <file> --source clay` still stages push-style
+ data keyless.
+
  Provider prerequisites (what the human must create, and which scopes) are in
  the README's **"Connect your CRM"** section: HubSpot needs a private app with
  four `crm.objects.*.read` scopes (plus write scopes only for `apply`);
@@ -111,8 +120,12 @@ If the working directory's project already has the peers in its node_modules,
  the server resolves them from there (peer-dependency semantics) — so this
  works from inside existing projects too.
 
- Tools exposed over stdio: `fullstackgtm_audit` (read-only),
- `fullstackgtm_rules`, `fullstackgtm_apply` (requires `approvedOperationIds`).
+ Tools exposed over stdio read-only: `fullstackgtm_audit`,
+ `fullstackgtm_rules`, `fullstackgtm_suggest`, `fullstackgtm_call_parse`,
+ `fullstackgtm_resolve`, `fullstackgtm_market_worksheet`. Gated:
+ `fullstackgtm_apply` (requires explicit `approvedOperationIds`),
+ `fullstackgtm_market_observe` (every quoted span is verified against the
+ stored captures before anything is appended).
 
  ## Troubleshooting
 
package/README.md CHANGED
@@ -127,7 +127,7 @@ fullstackgtm reassign --from 411 --to 902 --except-deal-stage closing --save #
  fullstackgtm fix --rule missing-deal-owner --provider hubspot --yes # audit one rule → suggest → approve → apply, one command
  ```
 
- `bulk-update` filters the snapshot (`=`, `!=`, `~` substring, `:empty`/`:notempty`, `|` any-of, relational pseudo-fields like `account.domain` or `openDealStages`) into a dry-run patch plan — and **the full filter is re-verified per record at apply time**, with mid-apply rechecks, so a record that stopped matching between audit and apply is skipped, not clobbered. Equality filters double as preconditions; `--require` adds explicit ones; `--guard` asserts cross-record conditions; `--max-operations` caps blast radius. `--set field=from:<sourceField>` derives values per record; `--archive` refuses records whose identity key (account domain, contact email) is shared with another record — that's a duplicate, and duplicates are merged with `dedupe`, not archived around.
+ `bulk-update` filters the snapshot (`=`, `!=`, `~` substring, `!~` not-substring, `:empty`/`:notempty`, `|` any-of, relational pseudo-fields like `account.domain` or `openDealStages`) into a dry-run patch plan — and **the full filter is re-verified per record at apply time**, with mid-apply rechecks, so a record that stopped matching between audit and apply is skipped, not clobbered. Equality filters double as preconditions; `--require` adds explicit ones; `--guard` asserts cross-record conditions; `--max-operations` caps blast radius. `--set field=from:<sourceField>` derives values per record; `--create-task <text>` is the third change mode, emitting approval-gated `create_task` operations instead of field writes; `--archive` refuses records whose identity key (account domain, contact email) is shared with another record — that's a duplicate, and duplicates are merged with `dedupe`, not archived around (`--force-archive-duplicates` overrides that refusal explicitly).
 
  `dedupe` finds duplicate groups by normalized identity key and emits one `merge_records` operation per group with a deterministic survivor (`richest` = most populated fields, ties to lowest id; `oldest`). Merges stay irreversible-and-therefore-low-confidence-capped on approval, exactly like merge suggestions from the audit. `reassign` is the ownership-handoff playbook: one plan per object type, extra scoping account-lifted to deals and contacts, and `--except-deal-stage` excludes both deals in that stage and every record whose account has an open deal in it. `fix` is the one-shot composite for a single rule: audit → save → suggest → approve suggestion-backed operations at the confidence bar → with `--yes`, apply and print the stage-by-stage summary; without it, stop after approval and print the apply command.
 
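The `bulk-update` paragraph in the hunk above says the full filter is re-verified per record at apply time, so a record that stopped matching is skipped rather than overwritten. A minimal TypeScript sketch of that pattern follows. All names here (`PlannedPatch`, `applyWithRecheck`, `fetchCurrent`, `write`) are hypothetical, and the filter is modeled as a plain predicate instead of the CLI's filter syntax.

```typescript
// Hypothetical shapes for illustration only.
interface CrmRecord {
  id: string;
  fields: Record<string, string | undefined>;
}

interface PlannedPatch {
  recordId: string;
  field: string;
  after: string;
}

interface ApplyResult {
  applied: string[];
  skipped: string[];
}

type Filter = (record: CrmRecord) => boolean;

// Re-reads each record immediately before its write and re-runs the same
// filter that selected it at plan time. Missing or no-longer-matching
// records are skipped and reported, never written.
function applyWithRecheck(
  plan: PlannedPatch[],
  filter: Filter,
  fetchCurrent: (id: string) => CrmRecord | undefined,
  write: (patch: PlannedPatch) => void,
): ApplyResult {
  const result: ApplyResult = { applied: [], skipped: [] };
  for (const patch of plan) {
    const current = fetchCurrent(patch.recordId);
    if (current === undefined || !filter(current)) {
      result.skipped.push(patch.recordId);
      continue;
    }
    write(patch);
    result.applied.push(patch.recordId);
  }
  return result;
}
```

Checking just before each write, instead of once for the whole plan, narrows the window in which a concurrent change could slip between verification and mutation.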
@@ -210,12 +210,17 @@ fullstackgtm audit --input snap.json --rules stale-deal --stale-days 45 --json
 
  # Gate a nightly CI job or agent run on hygiene: exit 2 if findings ≥ threshold
  fullstackgtm audit --provider hubspot --fail-on warning
+
+ # Gate CI on hygiene drift instead: exit 2 only when a NEW (rule, record) finding appears
+ fullstackgtm diff --before old.json --after new.json --fail-on-new-findings
  ```
 
  - Finding and operation ids are **stable hashes** of rule + record, so two runs over the same data produce identical ids — agents can diff plans, track findings across runs, and approve operations by id without re-parsing.
  - `--demo` (with `--seed`) generates a realistic mid-market CRM with injected real-world failure modes — departed owners, unlinked deals, orphan accounts, stale pipeline — so agents and CI can exercise the full snapshot → audit → apply pipeline with zero credentials.
  - Exit codes: `0` success, `1` error, `2` findings at/above `--fail-on`.
 
+ "Built for agents" is measured, not asserted: a 612-run benchmark (17 scenarios × 3 tool-surface arms × 4 trials, deterministic graders over final CRM state, τ-bench-style pass^k) shows the gated CLI surface beating raw CRM-API access on completion-under-policy for every model tested. Full matrix and methodology: [the leaderboard](./evals/crm/leaderboard/RESULTS.md).
+
  ## Authentication: CLI-first, browser only at the consent moment
 
  Credential resolution is a ladder — the first rung that yields a token wins:
@@ -297,7 +302,7 @@ The Stripe connector only reads customers and subscriptions, and `apply` is read
  | Concept | What it is |
  |---|---|
  | **Canonical snapshot** | Provider-independent view of users, accounts, contacts, deals, activities. Records carry `identities` — `(provider, externalId)` claims — so the same real-world entity can be tracked across several systems. |
- | **Audit rule** | A deterministic function `(context) => { findings, operations }`. Eleven built-ins cover orphan accounts, ownerless/unlinked/amount-less deals, past close dates, stale pipeline, duplicates, and more — `fullstackgtm rules` lists them all. Write your own in ~10 lines. |
+ | **Audit rule** | A deterministic function `(context) => { findings, operations }`. Twelve built-ins cover orphan accounts, ownerless/unlinked/amount-less deals, past close dates, stale pipeline, duplicates, and more — `fullstackgtm rules` lists them all. Write your own in ~10 lines. |
  | **Patch plan** | The dry-run output of an audit: findings plus typed patch operations with before/after values, reasons, risk levels, and approval flags. Always a proposal, never a mutation. |
  | **Connector** | A provider adapter: `fetchSnapshot()` for reads, optional `applyOperation()` for writes. HubSpot and Salesforce reference connectors ship in the package; connectors never drop records they can't fully resolve — the audit flags them instead. |
  | **Patch plan run** | The audit record of one apply attempt: per-operation applied/failed/skipped results. |
@@ -396,7 +401,13 @@ Or configure any MCP client (Cursor, Claude Desktop, …) with:
  }
  ```
 
- Exposes `fullstackgtm_audit` (read-only; sample, demo, file, or live provider sources with optional rule scoping), `fullstackgtm_rules` (rule discovery), and `fullstackgtm_apply` (requires explicit `approvedOperationIds`) over stdio. Tokens stored via `fullstackgtm login` are picked up automatically — the env var is only needed when no stored login exists.
+ Eight tools are exposed over stdio.
+
+ **Read-only:** `fullstackgtm_audit` (sample, demo, file, or live provider sources with optional rule scoping), `fullstackgtm_rules` (rule discovery), `fullstackgtm_suggest` (deterministic placeholder values with confidence + reasons), `fullstackgtm_call_parse` (transcripts → provenance-marked segments, insights, and evidence), `fullstackgtm_resolve` (the create gate: exists / ambiguous / safe_to_create), and `fullstackgtm_market_worksheet` (the classification packet for one vendor: claims, judging rules, captured page texts).
+
+ **Gated:** `fullstackgtm_apply` (requires explicit `approvedOperationIds`; placeholders still need value overrides) and `fullstackgtm_market_observe` (verifies every quoted span against the stored captures before appending — nothing is stored unless the whole set passes).
+
+ Tokens stored via `fullstackgtm login` are picked up automatically — the env var is only needed when no stored login exists.
 
  ## Safety model
 
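The `fullstackgtm_market_observe` description in the hunk above says every quoted span is verified against the stored captures and nothing is stored unless the whole set passes. A minimal TypeScript sketch of that all-or-nothing check follows. It assumes that verification is an exact substring match against the stored capture text; `QuotedSpan`, `verifySpans`, and `appendObservations` are hypothetical names, not the package's API.

```typescript
// Hypothetical shapes for illustration only.
interface QuotedSpan {
  captureId: string;
  quote: string;
}

interface VerificationResult {
  ok: boolean;
  rejected: QuotedSpan[];
}

// A span passes only if its quote appears verbatim in the referenced
// capture. Unknown captures, empty quotes, and paraphrases are rejected.
function verifySpans(spans: QuotedSpan[], captures: Map<string, string>): VerificationResult {
  const rejected = spans.filter((span) => {
    const text = captures.get(span.captureId);
    return text === undefined || span.quote.length === 0 || !text.includes(span.quote);
  });
  return { ok: rejected.length === 0, rejected };
}

// Appends the whole batch or nothing: one bad span blocks every span in it.
function appendObservations(
  store: QuotedSpan[],
  spans: QuotedSpan[],
  captures: Map<string, string>,
): VerificationResult {
  const result = verifySpans(spans, captures);
  if (result.ok) store.push(...spans);
  return result;
}
```

Verifying the batch before touching the store means a partially valid submission leaves no trace, so stored observations can all be traced back to captured text.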
package/docs/api.md CHANGED
@@ -21,7 +21,7 @@ release.
  - `GtmAuditRule` — `{ id, title, description, category?, evaluate(context) }`; the public extension point.
  - `GtmRuleContext` — `{ snapshot, policy, index }` with the prebuilt O(n) `GtmSnapshotIndex`.
  - `auditSnapshot(snapshot, policy?, rules?)` → `PatchPlan`.
- - `builtinAuditRules` (11 rules) plus each rule exported individually.
+ - `builtinAuditRules` (12 rules) plus each rule exported individually.
  - **Determinism guarantee**: identical inputs produce identical findings and operations with identical ids (`auditFindingId`, `patchOperationId` are stable hashes of rule + record).
 
  ## Patch plans and application
@@ -62,7 +62,9 @@ Commands: `login` / `logout`, `snapshot`, `audit`, `report`, `diff`, `merge`, `p
  `bulk-update`, `dedupe`, `reassign`, `fix`,
  `market` (`init` / `capture` / `classify` / `worksheet` / `observe` / `fronts` /
  `axes` / `overlay` / `scale` / `report` / `refresh`),
- `enrich` (`append` / `refresh` / `ingest` / `status`), `rules`, `profiles`, `doctor`.
+ `enrich` (`append` / `refresh` / `ingest` / `status`),
+ `schedule` (`add` / `list` / `remove` / `enable` / `disable` / `run` /
+ `install` / `uninstall` / `status`), `rules`, `profiles`, `doctor`.
  Exit codes: `0` success · `1` error · `2` findings/regressions at the requested gate
  (`--fail-on`, `--fail-on-new-findings`). `--json` everywhere; JSON output shapes are stable.
 
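The exit-code line in the hunk above lists `--fail-on-new-findings` as a gate, and the README describes it as exiting `2` only when a new (rule, record) finding appears. A minimal TypeScript sketch of that comparison follows. The `Finding` shape and the function names are hypothetical, and findings are keyed directly on rule id plus record id instead of the package's hashed ids.

```typescript
// Hypothetical finding shape for illustration only.
interface Finding {
  ruleId: string;
  recordId: string;
}

// Builds a composite key; the NUL separator avoids accidental collisions
// such as ("a", "bc") versus ("ab", "c").
function findingKey(finding: Finding): string {
  return `${finding.ruleId}\u0000${finding.recordId}`;
}

// Returns findings present in `after` whose (rule, record) pair is absent
// from `before`. Findings that disappeared are not reported.
function newFindings(before: Finding[], after: Finding[]): Finding[] {
  const seen = new Set(before.map(findingKey));
  return after.filter((finding) => !seen.has(findingKey(finding)));
}

// Exit code 2 when drift introduced at least one new finding, else 0.
function gateExitCode(before: Finding[], after: Finding[]): 0 | 2 {
  return newFindings(before, after).length > 0 ? 2 : 0;
}
```

Gating on new pairs only lets a CI job stay green on a known backlog while still failing as soon as hygiene regresses.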
@@ -115,6 +117,30 @@ dependency-free CSV intake; the Apollo client (`createApolloClient`,
  `pullApolloRecords`, 429-aware with `Retry-After`) is the first `api`-kind
  source.
 
+ ## Schedule
+
+ The horizontal scheduler: a declarative schedule-entry store, a
+ dependency-free 5-field cron parser, and the read/plan-side `SCHEDULABLE`
+ allowlist. `validateSchedulableArgv` enforces the allowlist at `schedule add`
+ time and re-checks it at run time (`tokenizeCommand` splits the quoted command
+ string — tokenization, never shell). `apply` is schedulable only as
+ `apply --plan-id <id>`, with the plan's `approved` status re-checked at every
+ firing — an unapproved plan records a `plan_not_approved` no-op run
+ (`ScheduleRunRecord.noopReason`) instead of executing.
+
+ - Entries: `ScheduleEntry` (`ScheduleProvider` is `"local"` for now),
+ `scheduleId`, `ScheduleStore` / `createFileScheduleStore`.
+ - Run history: `ScheduleRunRecord` (`ScheduleRunTrigger`: `cron` | `manual`),
+ `ScheduleRunStore` / `createFileScheduleRunStore`, with `schedulesPath` /
+ `scheduleRunsDir` for the profile-scoped file layout.
+ - Cron: `parseCron` → `CronExpression`, `cronMatches`, `nextCronFiring`,
+ `expectedFirings`, `computeMissedFirings` (status and missed-firing
+ visibility — local cron has no catch-up).
+ - Local provider: `schedule install` renders enabled entries into a
+ sentinel-managed crontab block — `crontabSentinels`, `renderManagedBlock`,
+ `replaceManagedBlock`, and `systemCrontabIo` behind the injectable
+ `CrontabIo` seam (tests never touch a real crontab).
+
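The Schedule section added above says the quoted command string is tokenized without a shell and then checked against an allowlist, with `apply` permitted only as `apply --plan-id <id>`. A minimal TypeScript sketch of that idea follows. The allowlist contents, the strict three-token form for `apply`, and the names `tokenize` and `validateArgv` are assumptions for illustration; they are not the package's `tokenizeCommand`, `validateSchedulableArgv`, or `SCHEDULABLE`.

```typescript
// Splits on unquoted whitespace and strips matching quotes. No variable
// expansion, globbing, pipes, or command substitution: shell
// metacharacters are just ordinary characters inside a token.
function tokenize(command: string): string[] {
  const tokens: string[] = [];
  let current = "";
  let inToken = false;
  let quote: '"' | "'" | null = null;
  for (let i = 0; i < command.length; i++) {
    const ch = command.charAt(i);
    if (quote !== null) {
      if (ch === quote) quote = null;
      else current += ch;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      inToken = true;
    } else if (ch === " " || ch === "\t") {
      if (inToken) {
        tokens.push(current);
        current = "";
        inToken = false;
      }
    } else {
      current += ch;
      inToken = true;
    }
  }
  if (quote !== null) throw new Error("unterminated quote");
  if (inToken) tokens.push(current);
  return tokens;
}

// Hypothetical read/plan-side allowlist; the real set is not shown in this diff.
const ALLOWED_VERBS = ["audit", "snapshot", "diff", "report"];

// Returns null when the argv is schedulable, otherwise a rejection reason.
function validateArgv(argv: string[]): string | null {
  const verb = argv[0];
  if (verb === undefined) return "empty command";
  if (verb === "apply") {
    const planId = argv[2];
    const wellFormed =
      argv.length === 3 && argv[1] === "--plan-id" && planId !== undefined && planId !== "";
    return wellFormed ? null : "apply is schedulable only as: apply --plan-id <id>";
  }
  return ALLOWED_VERBS.indexOf(verb) !== -1 ? null : `verb not schedulable: ${verb}`;
}
```

Running the same validation at add time and again at run time, as the section describes, means a stored entry that was edited by hand still cannot smuggle in a write verb.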
  ## Market map
 
  Newer surface (0.16–0.23); shapes are settling toward the 1.0 contract. A live
@@ -78,18 +78,21 @@ values win, **merges cannot be undone**, and a record stops merging after
  250 cumulative merges. Salesforce merge is SOAP/Apex only (no REST), only
  Lead/Contact/Account/Case, max 3 records per call.
 
- **The gap:** our three duplicate rules (`duplicate-account-domain`,
- `duplicate-contact-email`, `duplicate-open-deal`) detect groups but emit
- only merge-review *tasks* detection without remediation.
+ **The gap (closed in 0.12):** our three duplicate rules
+ (`duplicate-account-domain`, `duplicate-contact-email`,
+ `duplicate-open-deal`) used to detect groups but emit only merge-review
+ *tasks* — detection without remediation.
 
- **The plan (0.12):** a `merge_records` operation type —
+ **Shipped (0.12):** a `merge_records` operation type —
  `requires_human_survivor_selection` placeholder, survivor heuristics in
  `suggest` (ordered, evidence-based: most engagements → oldest → most
  complete, each with a written reason), high risk, approval required, with
  the irreversibility called out in the plan text. The dry-run plan is the
  preview every commercial tool charges for; the pre-apply snapshot is the
  loser-record archive. HubSpot first; Salesforce merge documented as
- unsupported until an Apex path justifies itself.
+ unsupported until an Apex path justifies itself. 0.23 added `dedupe` as a
+ first-class verb over the same operation type (groups → one governed merge
+ per group, deterministic survivor).
 
  ## D — Delete/Archive: the exit ramp
 
@@ -131,5 +134,7 @@ Lessons from auditing our own apply path:
  | 0.11.1 | Fix our own faucet: resolve-first `create:` + plan-scoped dedup, HubSpot association-aware CAS for `link_record`, domain normalization in `duplicate-account-domain`, `create_task` idempotency token |
  | 0.12 (shipped) | `merge_records` (HubSpot contacts/companies/deals) + survivor suggestions capped at low confidence; the three duplicate rules emit governed merges instead of review tasks |
  | 0.15 (shipped) | `resolve` gate (CLI/lib/MCP, gate exit codes), provenance capture (`hs_object_source*` → `RecordProvenance`) + attribution in duplicate findings, self-stamped creates |
- | 0.16 | prevention-posture checks (native duplicate rules active? unique-value properties defined?) · live targeted resolve lookups |
+ | 0.16 (shipped the market map instead) | prevention-posture checks (native duplicate rules active? unique-value properties defined?) and live targeted resolve lookups were slated here but did not ship — they remain future work; 0.16 went to the market map layer |
+ | 0.23 (shipped) | `dedupe <object> --key <domain\|email\|name>` — the Remediate layer as a first-class verb: duplicate groups by normalized identity key, one governed `merge_records` per group, deterministic survivor (`richest`/`oldest`) |
+ | 0.24 (shipped) | schedule layer — recurring Detect: the nightly watch recipe becomes a declared cadence (`schedule add "audit --provider hubspot --save" --cron "0 2 * * *"`); read/plan-side allowlist only, scheduling never auto-approves |
  | docs | The nightly watch recipe (existing flags, documented as CRM CI) |
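The 0.24 row above schedules an audit with the cron expression `0 2 * * *`. As a reading aid, here is a minimal TypeScript sketch of 5-field cron matching. It follows conventional crontab semantics (lists, ranges, steps, `7` as Sunday, and the rule that day-of-month and day-of-week are OR-ed when both are restricted) and evaluates in local time; whether the package's own `parseCron` and `cronMatches` behave identically is not stated in this diff, and `parseFiveFieldCron` and `matchesCron` are hypothetical names.

```typescript
interface ParsedCron {
  minute: Set<number>;
  hour: Set<number>;
  dayOfMonth: Set<number>;
  month: Set<number>;
  dayOfWeek: Set<number>;
  dayOfMonthIsWildcard: boolean;
  dayOfWeekIsWildcard: boolean;
}

function toInt(text: string): number {
  if (!/^\d+$/.test(text)) throw new Error(`not a number: "${text}"`);
  return Number(text);
}

// Expands one field ("*", "5", "1-5", "*/15", "0,30", "10/5") into a value set.
function expandField(field: string, min: number, max: number): Set<number> {
  const values = new Set<number>();
  for (const part of field.split(",")) {
    const slash = part.indexOf("/");
    const rangePart = slash === -1 ? part : part.slice(0, slash);
    const step = slash === -1 ? 1 : toInt(part.slice(slash + 1));
    if (step < 1) throw new Error(`bad step in "${part}"`);
    let lo = min;
    let hi = max;
    if (rangePart !== "*") {
      const dash = rangePart.indexOf("-");
      lo = toInt(dash === -1 ? rangePart : rangePart.slice(0, dash));
      // "N/step" runs from N to the field maximum; a bare "N" is just N.
      hi = dash === -1 ? (slash === -1 ? lo : max) : toInt(rangePart.slice(dash + 1));
    }
    if (lo < min || hi > max || lo > hi) throw new Error(`out of range: "${part}"`);
    for (let value = lo; value <= hi; value += step) values.add(value);
  }
  return values;
}

function parseFiveFieldCron(expression: string): ParsedCron {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error(`expected 5 fields, got ${fields.length}`);
  const [minute, hour, dom, month, dow] = fields as [string, string, string, string, string];
  const dayOfWeek = expandField(dow, 0, 7);
  if (dayOfWeek.has(7)) {
    dayOfWeek.delete(7); // 7 is an alias for Sunday (0)
    dayOfWeek.add(0);
  }
  return {
    minute: expandField(minute, 0, 59),
    hour: expandField(hour, 0, 23),
    dayOfMonth: expandField(dom, 1, 31),
    month: expandField(month, 1, 12),
    dayOfWeek,
    dayOfMonthIsWildcard: dom.startsWith("*"),
    dayOfWeekIsWildcard: dow.startsWith("*"),
  };
}

// Checks whether `when` (local time, minute resolution) is a firing time.
function matchesCron(cron: ParsedCron, when: Date): boolean {
  if (!cron.minute.has(when.getMinutes())) return false;
  if (!cron.hour.has(when.getHours())) return false;
  if (!cron.month.has(when.getMonth() + 1)) return false;
  const domOk = cron.dayOfMonth.has(when.getDate());
  const dowOk = cron.dayOfWeek.has(when.getDay());
  if (cron.dayOfMonthIsWildcard || cron.dayOfWeekIsWildcard) return domOk && dowOk;
  return domOk || dowOk; // both restricted: either one may match
}
```

With this sketch, `0 2 * * *` matches 02:00 on every day and no other minute, which is the nightly cadence the table row describes.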
@@ -106,6 +106,33 @@ The original thesis: GTM data disagrees across systems.
  - Docs site with the operating-model registry as browsable reference.
  - Performance pass: streaming snapshots for very large orgs.
 
+ ## 0.10 → 0.25 — the layers, as shipped
+
+ The plan above ended at the freeze; what shipped next grew the surface
+ outward, one layer per release, each consolidating before the next expanded:
+
+ - **0.11** — the suggest chain: deterministic placeholder values with
+ confidence + reasons, `plans approve --values-from`.
+ - **0.12** — governed merge: `merge_records` (HubSpot contacts / companies /
+ deals), survivor suggestions capped at low confidence.
+ - **0.13–0.14** — call intelligence: `call parse|score|link|plan`, LLM
+ extraction behind the bring-your-own-key seam, deterministic baseline,
+ provenance-marked insights.
+ - **0.15** — the Prevent layer: the `resolve` create gate plus record-source
+ provenance and attribution.
+ - **0.16–0.22** — the market map: content-addressed captures, classification
+ with mechanical span verification, front states and drift, axis discovery,
+ overlay directives, scale estimation, the field report.
+ - **0.19 / 0.23** — governed write verbs (`bulk-update`, then `dedupe`,
+ `reassign`, `fix`) and the enrich layer (Apollo / Clay, fill-blanks-only
+ plans).
+ - **0.24** — the schedule layer: horizontal cron, read/plan-side allowlist,
+ scheduling never auto-approves.
+ - **0.25** — agent skill distribution (`npx skills add fullstackgtm/core`).
+
+ The known-gaps list below predates these layers and has been re-verified
+ against the 0.25 surface: still accurate, still open.
+
  ## Known real-portal gaps to close before 1.0
 
  Found by exercising the published package as a fresh RevOps user with a real
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "fullstackgtm",
- "version": "0.25.0",
+ "version": "0.25.1",
  "description": "Open-source agentic GTM ops framework: canonical GTM data model, pluggable deterministic audits, reviewable dry-run patch plans, approval-gated write-back with conflict detection, and cross-system entity resolution. HubSpot, Salesforce, and Stripe connectors included.",
  "license": "Apache-2.0",
  "author": "Full Stack GTM",
@@ -48,7 +48,8 @@ Credentials resolve in order: `--token-env <NAME>` → ambient env
  In a sandbox prefer the first two. LLM-powered verbs (`call parse`, `call
  score`, `market classify`) take `ANTHROPIC_API_KEY`/`OPENAI_API_KEY`, or use
  their deterministic/worksheet fallbacks — the CLI never prompts when
- non-interactive.
+ non-interactive. `--profile <name>` (or `FULLSTACKGTM_PROFILE`) scopes
+ credentials AND stored plans per client org.
 
  ## Verb map
 
@@ -61,9 +62,10 @@ non-interactive.
  | `fix --rule <id>` | audit one rule → suggest → approve at the confidence bar → apply only with `--yes` |
  | `call parse\|score\|link\|plan` | Transcripts → evidence-quoted insights, rubric scorecards, deal linking, governed next-step writes |
  | `enrich append\|refresh\|ingest\|status` | Governed enrichment (Apollo pull / Clay ingest), fill-blanks-only plans |
- | `market capture\|classify\|worksheet\|observe\|fronts\|axes\|overlay\|scale\|report\|refresh` | Competitive category map; evidence quotes verified verbatim against stored captures |
- | `schedule add\|install\|run\|status` | Horizontal cron; read/plan-side allowlist only — scheduling NEVER auto-approves |
- | `plans list\|approve` / `snapshot` / `rules` / `doctor` | Plan lifecycle, raw snapshots, rule registry, machine state |
+ | `market init\|capture\|classify\|worksheet\|observe\|fronts\|axes\|overlay\|scale\|report\|refresh` | Competitive category map; evidence quotes verified verbatim against stored captures |
+ | `schedule add\|list\|remove\|enable\|disable\|run\|install\|uninstall\|status` | Horizontal cron; read/plan-side allowlist only — scheduling NEVER auto-approves |
+ | `report` | Client-ready audit deliverable (markdown or self-contained HTML) |
+ | `plans list\|show\|approve\|reject` / `snapshot` / `rules` / `doctor` | Plan lifecycle, raw snapshots, rule registry, machine state |
 
  All write-shaped verbs produce plans; none writes outside approve → apply.
  Add `--json` for machine-readable output on any command.