npm - fullstackgtm - Versions diffs - 0.25.0 → 0.25.1 - Mend

fullstackgtm 0.25.0 → 0.25.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/CHANGELOG.md +41 -0
package/INSTALL_FOR_AGENTS.md +15 -2
package/README.md +14 -3
package/docs/api.md +28 -2
package/docs/crm-health-lifecycle.md +11 -6
package/docs/roadmap-to-1.0.md +27 -0
package/package.json +1 -1
package/skills/fullstackgtm/SKILL.md +6 -4

package/CHANGELOG.md CHANGED

@@ -5,6 +5,24 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and the project adheres to [Semantic Versioning](https://semver.org/).
 The path to 1.0 is planned in [docs/roadmap-to-1.0.md](./docs/roadmap-to-1.0.md).
+## [0.25.1] — 2026-06-12
+Docs-sync release — no code changes.
+### Fixed
+- README, INSTALL_FOR_AGENTS.md, the agent skill, and docs/api.md corrected
+  against the shipped surface: the MCP tool list now enumerates all 8 tools
+  (read-only vs gated), the builtin rule count is 12, docs/api.md gains a
+  Schedule section and the `schedule` command in its CLI list, the README
+  cites the 612-run CRM-ops benchmark and the `diff --fail-on-new-findings`
+  CI gate, the bulk-update section covers `!~` / `--create-task` /
+  `--force-archive-duplicates`, and the skill's verb map completes the
+  `schedule`, `market`, and `plans` rows and adds `report`.
+- The 0.23.0 entry below is amended retroactively: `dedupe`, `reassign`,
+  `fix`, and `--set <field>=from:<sourceField>` shipped in 0.23.0 without a
+  changelog record.
 ## [0.25.0] — 2026-06-12
 ### Added
@@ -136,6 +154,29 @@ everything that shipped 0.19–0.23. (No code changes.)
   - Every `enrich` subcommand catches `--help`/`-h` before config load,
     credential resolution, or any network call. No scheduling/cron logic —
     that is the horizontal schedule layer's job (docs/schedule.md).
+- **Four task-shaped verbs** (entry added retroactively — these shipped in
+  0.23.0 without a changelog record). The 612-run benchmark's gated-agent
+  failures clustered into four missing verbs; all four compile to plans
+  through the existing plan → approve → apply gate — nothing writes directly.
+  - `dedupe <account|contact|deal> --key <domain|email|name>` — duplicate
+    groups by normalized identity key, one `merge_records` operation per
+    group with a deterministic survivor (`richest` = most populated data
+    fields, ties to lowest id; `oldest` = lowest id). High risk, approval
+    required; merges are irreversible on apply.
+  - `reassign --from <ownerId> --to <ownerId>` — the ownership-handoff
+    playbook: one bulk-update-style plan per object type, extra `--where`
+    scoping account-lifted for deals/contacts, `--except-deal-stage`
+    excluding the stage AND records whose account has an open deal in it —
+    re-verified per record at apply time.
+  - `fix --rule <id>` — one-shot composite: audit one rule → save → suggest
+    → approve only suggestion-backed values at the confidence bar → apply
+    (`--yes` required), with a stage-by-stage summary.
+  - `bulk-update --set <field>=from:<sourceField>` — per-record derived
+    values resolved from the filter view (relational sources like
+    `account.ownerId` included); empty-source records are skipped and
+    counted, never guessed. Plus the `--archive` duplicate guard: archiving
+    a record that shares its identity key with another is refused and
+    pointed at `dedupe`, overridable with `--force-archive-duplicates`.
 ## [0.22.0] — 2026-06-12
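
The survivor rule in the `dedupe` entry above (`richest` = most populated data fields, ties to lowest id; `oldest` = lowest id) can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: `SketchRecord`, `normalizeKey`, `groupDuplicates`, and `pickSurvivor` are hypothetical names, normalization is assumed to be trim plus lowercase, and ids are assumed to be numeric strings.

```typescript
// Hypothetical record shape for illustration; not the package's canonical type.
interface SketchRecord {
  id: string;
  fields: Record<string, string | null | undefined>;
}

type SurvivorStrategy = "richest" | "oldest";

// Assumption: normalization is trim + lowercase; the real rules are not shown in the diff.
function normalizeKey(value: string): string {
  return value.trim().toLowerCase();
}

// Count fields that carry a non-empty value.
function populatedCount(record: SketchRecord): number {
  return Object.values(record.fields).filter(
    (v) => v !== null && v !== undefined && v !== "",
  ).length;
}

// Group records by normalized identity key; only groups of 2+ are duplicates.
function groupDuplicates(records: SketchRecord[], keyField: string): SketchRecord[][] {
  const groups = new Map<string, SketchRecord[]>();
  for (const record of records) {
    const raw = record.fields[keyField];
    if (!raw) continue; // a record without a key cannot be grouped
    const key = normalizeKey(raw);
    const group = groups.get(key) ?? [];
    group.push(record);
    groups.set(key, group);
  }
  return [...groups.values()].filter((g) => g.length > 1);
}

// Deterministic survivor: same group and strategy always yield the same record.
function pickSurvivor(group: SketchRecord[], strategy: SurvivorStrategy): SketchRecord {
  const sorted = [...group].sort((a, b) => {
    if (strategy === "richest") {
      const diff = populatedCount(b) - populatedCount(a);
      if (diff !== 0) return diff;
    }
    // Assumption: "lowest id" means lowest numeric value.
    return Number(a.id) - Number(b.id);
  });
  return sorted[0];
}
```

Because the comparison never depends on input order, re-running the sketch over the same records picks the same survivor, which is the property the changelog entry calls deterministic.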

package/INSTALL_FOR_AGENTS.md CHANGED

@@ -3,6 +3,10 @@
 Deterministic install-and-verify steps. Every command is non-interactive, every
 check has an expected output, and nothing here writes to a CRM.
+If your harness supports agent skills, `npx skills add fullstackgtm/core`
+installs the compact operating guide; this document remains the deterministic
+install-and-verify path.
 ## 1. Install
 ```bash
@@ -60,6 +64,11 @@ page texts — every span is checked character-for-character against the stored
 capture, and paraphrased quotes are rejected. In non-interactive contexts the
 CLI never prompts — it fails with this guidance.
+Apollo enrichment (`enrich append --source apollo`) needs `APOLLO_API_KEY` in
+the environment, or have the human run `echo "$KEY" | fullstackgtm login apollo`
+once. Without it, `enrich ingest <file> --source clay` still stages push-style
+data keyless.
 Provider prerequisites (what the human must create, and which scopes) are in
 the README's **"Connect your CRM"** section: HubSpot needs a private app with
 four `crm.objects.*.read` scopes (plus write scopes only for `apply`);
@@ -111,8 +120,12 @@ If the working directory's project already has the peers in its node_modules,
 the server resolves them from there (peer-dependency semantics) — so this
 works from inside existing projects too.
-Tools exposed over stdio: `fullstackgtm_audit` (read-only),
-`fullstackgtm_rules`, `fullstackgtm_apply` (requires `approvedOperationIds`).
+Tools exposed over stdio — read-only: `fullstackgtm_audit`,
+`fullstackgtm_rules`, `fullstackgtm_suggest`, `fullstackgtm_call_parse`,
+`fullstackgtm_resolve`, `fullstackgtm_market_worksheet`. Gated:
+`fullstackgtm_apply` (requires explicit `approvedOperationIds`),
+`fullstackgtm_market_observe` (every quoted span is verified against the
+stored captures before anything is appended).
 ## Troubleshooting
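
The gate on `fullstackgtm_market_observe` in the hunk above (every quoted span verified against the stored captures before anything is appended) amounts to an all-or-nothing check. A minimal sketch, assuming a hypothetical `QuotedSpan` shape and an in-memory capture map; none of these names come from the package:

```typescript
// Hypothetical shapes for illustration only.
interface QuotedSpan {
  captureId: string;
  quote: string;
}

type VerifyResult = { ok: true } | { ok: false; rejected: QuotedSpan[] };

// A span passes only if its quote appears verbatim in the referenced capture.
function verifySpans(spans: QuotedSpan[], captures: Map<string, string>): VerifyResult {
  const rejected = spans.filter((span) => {
    const text = captures.get(span.captureId);
    return text === undefined || span.quote.length === 0 || !text.includes(span.quote);
  });
  return rejected.length === 0 ? { ok: true } : { ok: false, rejected };
}

// All-or-nothing append: one failing span blocks the whole batch.
function appendObservations(
  store: QuotedSpan[],
  spans: QuotedSpan[],
  captures: Map<string, string>,
): VerifyResult {
  const result = verifySpans(spans, captures);
  if (result.ok) store.push(...spans);
  return result;
}
```

Verbatim substring matching is what makes a paraphrased quote fail: any changed character breaks the match.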

package/README.md CHANGED

@@ -127,7 +127,7 @@ fullstackgtm reassign --from 411 --to 902 --except-deal-stage closing --save   #
 fullstackgtm fix --rule missing-deal-owner --provider hubspot --yes  # audit one rule → suggest → approve → apply, one command
 ```
-`bulk-update` filters the snapshot (`=`, `!=`, `~` substring, `:empty`/`:notempty`, `|` any-of, relational pseudo-fields like `account.domain` or `openDealStages`) into a dry-run patch plan — and **the full filter is re-verified per record at apply time**, with mid-apply rechecks, so a record that stopped matching between audit and apply is skipped, not clobbered. Equality filters double as preconditions; `--require` adds explicit ones; `--guard` asserts cross-record conditions; `--max-operations` caps blast radius. `--set field=from:<sourceField>` derives values per record; `--archive` refuses records whose identity key (account domain, contact email) is shared with another record — that's a duplicate, and duplicates are merged with `dedupe`, not archived around.
+`bulk-update` filters the snapshot (`=`, `!=`, `~` substring, `!~` not-substring, `:empty`/`:notempty`, `|` any-of, relational pseudo-fields like `account.domain` or `openDealStages`) into a dry-run patch plan — and **the full filter is re-verified per record at apply time**, with mid-apply rechecks, so a record that stopped matching between audit and apply is skipped, not clobbered. Equality filters double as preconditions; `--require` adds explicit ones; `--guard` asserts cross-record conditions; `--max-operations` caps blast radius. `--set field=from:<sourceField>` derives values per record; `--create-task <text>` is the third change mode, emitting approval-gated `create_task` operations instead of field writes; `--archive` refuses records whose identity key (account domain, contact email) is shared with another record — that's a duplicate, and duplicates are merged with `dedupe`, not archived around (`--force-archive-duplicates` overrides that refusal explicitly).
 `dedupe` finds duplicate groups by normalized identity key and emits one `merge_records` operation per group with a deterministic survivor (`richest` = most populated fields, ties to lowest id; `oldest`). Merges stay irreversible-and-therefore-low-confidence-capped on approval, exactly like merge suggestions from the audit. `reassign` is the ownership-handoff playbook: one plan per object type, extra scoping account-lifted to deals and contacts, and `--except-deal-stage` excludes both deals in that stage and every record whose account has an open deal in it. `fix` is the one-shot composite for a single rule: audit → save → suggest → approve suggestion-backed operations at the confidence bar → with `--yes`, apply and print the stage-by-stage summary; without it, stop after approval and print the apply command.
@@ -210,12 +210,17 @@ fullstackgtm audit --input snap.json --rules stale-deal --stale-days 45 --json
 # Gate a nightly CI job or agent run on hygiene: exit 2 if findings ≥ threshold
 fullstackgtm audit --provider hubspot --fail-on warning
+# Gate CI on hygiene drift instead: exit 2 only when a NEW (rule, record) finding appears
+fullstackgtm diff --before old.json --after new.json --fail-on-new-findings
 ```
 - Finding and operation ids are **stable hashes** of rule + record, so two runs over the same data produce identical ids — agents can diff plans, track findings across runs, and approve operations by id without re-parsing.
 - `--demo` (with `--seed`) generates a realistic mid-market CRM with injected real-world failure modes — departed owners, unlinked deals, orphan accounts, stale pipeline — so agents and CI can exercise the full snapshot → audit → apply pipeline with zero credentials.
 - Exit codes: `0` success, `1` error, `2` findings at/above `--fail-on`.
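
The `--fail-on-new-findings` gate shown above fires only when a (rule, record) pair appears that was absent from the earlier run. A rough sketch of that comparison, assuming a hypothetical `Finding` shape with `ruleId` and `recordId`; the real JSON shape of a saved audit is not shown in this diff:

```typescript
// Hypothetical finding shape; the package's actual output shape may differ.
interface Finding {
  ruleId: string;
  recordId: string;
}

// JSON-encode the pair so ("a", "b:c") and ("a:b", "c") cannot collide.
function findingKey(f: Finding): string {
  return JSON.stringify([f.ruleId, f.recordId]);
}

// Findings present in `after` whose (rule, record) pair is absent from `before`.
function newFindings(before: Finding[], after: Finding[]): Finding[] {
  const seen = new Set(before.map(findingKey));
  return after.filter((f) => !seen.has(findingKey(f)));
}

// Mirrors the documented exit codes: 0 on success, 2 when the gate trips.
function gateExitCode(before: Finding[], after: Finding[]): 0 | 2 {
  return newFindings(before, after).length > 0 ? 2 : 0;
}
```

Under this reading, a finding that persists between runs does not trip the gate, and neither does one that disappears; only drift in the wrong direction does.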
+"Built for agents" is measured, not asserted: a 612-run benchmark (17 scenarios × 3 tool-surface arms × 4 trials, deterministic graders over final CRM state, τ-bench-style pass^k) shows the gated CLI surface beating raw CRM-API access on completion-under-policy for every model tested. Full matrix and methodology: [the leaderboard](./evals/crm/leaderboard/RESULTS.md).
 ## Authentication: CLI-first, browser only at the consent moment
 Credential resolution is a ladder — the first rung that yields a token wins:
@@ -297,7 +302,7 @@ The Stripe connector only reads customers and subscriptions, and `apply` is read
 | Concept | What it is |
 |---|---|
 | **Canonical snapshot** | Provider-independent view of users, accounts, contacts, deals, activities. Records carry `identities` — `(provider, externalId)` claims — so the same real-world entity can be tracked across several systems. |
-| **Audit rule** | A deterministic function `(context) => { findings, operations }`. Eleven built-ins cover orphan accounts, ownerless/unlinked/amount-less deals, past close dates, stale pipeline, duplicates, and more — `fullstackgtm rules` lists them all. Write your own in ~10 lines. |
+| **Audit rule** | A deterministic function `(context) => { findings, operations }`. Twelve built-ins cover orphan accounts, ownerless/unlinked/amount-less deals, past close dates, stale pipeline, duplicates, and more — `fullstackgtm rules` lists them all. Write your own in ~10 lines. |
 | **Patch plan** | The dry-run output of an audit: findings plus typed patch operations with before/after values, reasons, risk levels, and approval flags. Always a proposal, never a mutation. |
 | **Connector** | A provider adapter: `fetchSnapshot()` for reads, optional `applyOperation()` for writes. HubSpot and Salesforce reference connectors ship in the package; connectors never drop records they can't fully resolve — the audit flags them instead. |
 | **Patch plan run** | The audit record of one apply attempt: per-operation applied/failed/skipped results. |
@@ -396,7 +401,13 @@ Or configure any MCP client (Cursor, Claude Desktop, …) with:
 }
 ```
-Exposes `fullstackgtm_audit` (read-only; sample, demo, file, or live provider sources with optional rule scoping), `fullstackgtm_rules` (rule discovery), and `fullstackgtm_apply` (requires explicit `approvedOperationIds`) over stdio. Tokens stored via `fullstackgtm login` are picked up automatically — the env var is only needed when no stored login exists.
+Eight tools are exposed over stdio.
+**Read-only:** `fullstackgtm_audit` (sample, demo, file, or live provider sources with optional rule scoping), `fullstackgtm_rules` (rule discovery), `fullstackgtm_suggest` (deterministic placeholder values with confidence + reasons), `fullstackgtm_call_parse` (transcripts → provenance-marked segments, insights, and evidence), `fullstackgtm_resolve` (the create gate: exists / ambiguous / safe_to_create), and `fullstackgtm_market_worksheet` (the classification packet for one vendor: claims, judging rules, captured page texts).
+**Gated:** `fullstackgtm_apply` (requires explicit `approvedOperationIds`; placeholders still need value overrides) and `fullstackgtm_market_observe` (verifies every quoted span against the stored captures before appending — nothing is stored unless the whole set passes).
+Tokens stored via `fullstackgtm login` are picked up automatically — the env var is only needed when no stored login exists.
 ## Safety model

package/docs/api.md CHANGED

@@ -21,7 +21,7 @@ release.
 - `GtmAuditRule` — `{ id, title, description, category?, evaluate(context) }`; the public extension point.
 - `GtmRuleContext` — `{ snapshot, policy, index }` with the prebuilt O(n) `GtmSnapshotIndex`.
 - `auditSnapshot(snapshot, policy?, rules?)` → `PatchPlan`.
-- `builtinAuditRules` (11 rules) plus each rule exported individually.
+- `builtinAuditRules` (12 rules) plus each rule exported individually.
 - **Determinism guarantee**: identical inputs produce identical findings and operations with identical ids (`auditFindingId`, `patchOperationId` are stable hashes of rule + record).
 ## Patch plans and application
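
The determinism guarantee in the hunk above (ids as stable hashes of rule + record) can be illustrated with a small dependency-free sketch. The FNV-1a-style hash here is a stand-in chosen for brevity; the diff does not say which hash `auditFindingId` uses, and `stableFindingId` is a hypothetical name.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units. A stand-in only:
// the package's actual hash function is not shown in this diff.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical id builder: the id depends only on rule id and record id,
// so two audits over the same data produce the same ids.
function stableFindingId(ruleId: string, recordId: string): string {
  return `finding_${fnv1a(JSON.stringify([ruleId, recordId]))}`;
}
```

Because nothing time-based or random feeds the id, an agent can approve an operation by id in one run and expect the same id to name the same (rule, record) pair in the next.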
@@ -62,7 +62,9 @@ Commands: `login` / `logout`, `snapshot`, `audit`, `report`, `diff`, `merge`, `p
 `bulk-update`, `dedupe`, `reassign`, `fix`,
 `market` (`init` / `capture` / `classify` / `worksheet` / `observe` / `fronts` /
 `axes` / `overlay` / `scale` / `report` / `refresh`),
-`enrich` (`append` / `refresh` / `ingest` / `status`), `rules`, `profiles`, `doctor`.
+`enrich` (`append` / `refresh` / `ingest` / `status`),
+`schedule` (`add` / `list` / `remove` / `enable` / `disable` / `run` /
+`install` / `uninstall` / `status`), `rules`, `profiles`, `doctor`.
 Exit codes: `0` success · `1` error · `2` findings/regressions at the requested gate
 (`--fail-on`, `--fail-on-new-findings`). `--json` everywhere; JSON output shapes are stable.
@@ -115,6 +117,30 @@ dependency-free CSV intake; the Apollo client (`createApolloClient`,
 `pullApolloRecords`, 429-aware with `Retry-After`) is the first `api`-kind
 source.
+## Schedule
+The horizontal scheduler: a declarative schedule-entry store, a
+dependency-free 5-field cron parser, and the read/plan-side `SCHEDULABLE`
+allowlist. `validateSchedulableArgv` enforces the allowlist at `schedule add`
+time and re-checks it at run time (`tokenizeCommand` splits the quoted command
+string — tokenization, never shell). `apply` is schedulable only as
+`apply --plan-id <id>`, with the plan's `approved` status re-checked at every
+firing — an unapproved plan records a `plan_not_approved` no-op run
+(`ScheduleRunRecord.noopReason`) instead of executing.
+- Entries: `ScheduleEntry` (`ScheduleProvider` is `"local"` for now),
+  `scheduleId`, `ScheduleStore` / `createFileScheduleStore`.
+- Run history: `ScheduleRunRecord` (`ScheduleRunTrigger`: `cron` | `manual`),
+  `ScheduleRunStore` / `createFileScheduleRunStore`, with `schedulesPath` /
+  `scheduleRunsDir` for the profile-scoped file layout.
+- Cron: `parseCron` → `CronExpression`, `cronMatches`, `nextCronFiring`,
+  `expectedFirings`, `computeMissedFirings` (status and missed-firing
+  visibility — local cron has no catch-up).
+- Local provider: `schedule install` renders enabled entries into a
+  sentinel-managed crontab block — `crontabSentinels`, `renderManagedBlock`,
+  `replaceManagedBlock`, and `systemCrontabIo` behind the injectable
+  `CrontabIo` seam (tests never touch a real crontab).
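
As a rough illustration of what a dependency-free 5-field cron parser and matcher involve, here is a minimal sketch. It is not the package's `parseCron` or `cronMatches`: `parseCronSketch`, `cronMatchesSketch`, and `SketchCron` are hypothetical names, only `*`, numbers, ranges, lists, and steps are supported, matching uses local time, and all five fields must match (classic cron instead ORs day-of-month and day-of-week when both are restricted).

```typescript
// Hypothetical parsed form: one set of allowed values per field.
interface SketchCron {
  minute: Set<number>;
  hour: Set<number>;
  dayOfMonth: Set<number>;
  month: Set<number>;
  dayOfWeek: Set<number>;
}

// Inclusive bounds for minute, hour, day-of-month, month, day-of-week.
const RANGES: Array<[number, number]> = [[0, 59], [0, 23], [1, 31], [1, 12], [0, 6]];

function toInt(text: string): number {
  if (!/^\d+$/.test(text)) throw new Error(`invalid cron number: "${text}"`);
  return Number(text);
}

// Expand one field ("*", "5", "1-5", "*/15", "1,15", "0-30/10") into its values.
function expandField(field: string, min: number, max: number): Set<number> {
  const values = new Set<number>();
  for (const part of field.split(",")) {
    const [rangePart, stepPart] = part.split("/");
    const step = stepPart === undefined ? 1 : toInt(stepPart);
    let lo = min;
    let hi = max;
    if (rangePart !== "*") {
      const [a, b] = rangePart.split("-");
      lo = toInt(a);
      hi = b === undefined ? lo : toInt(b);
    }
    if (step < 1 || lo < min || hi > max || lo > hi) {
      throw new Error(`invalid cron field: "${field}"`);
    }
    for (let v = lo; v <= hi; v += step) values.add(v);
  }
  return values;
}

function parseCronSketch(expr: string): SketchCron {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error("expected exactly 5 cron fields");
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields.map((f, i) =>
    expandField(f, RANGES[i][0], RANGES[i][1]),
  );
  return { minute, hour, dayOfMonth, month, dayOfWeek };
}

// Simplification: every field must match, evaluated in local time.
function cronMatchesSketch(cron: SketchCron, date: Date): boolean {
  return (
    cron.minute.has(date.getMinutes()) &&
    cron.hour.has(date.getHours()) &&
    cron.dayOfMonth.has(date.getDate()) &&
    cron.month.has(date.getMonth() + 1) &&
    cron.dayOfWeek.has(date.getDay())
  );
}
```

With the nightly expression from the lifecycle doc, `cronMatchesSketch(parseCronSketch("0 2 * * *"), new Date(2026, 5, 12, 2, 0))` evaluates to `true`, while the same call one minute later evaluates to `false`.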
 ## Market map
 Newer surface (0.16–0.23); shapes are settling toward the 1.0 contract. A live

package/docs/crm-health-lifecycle.md CHANGED

@@ -78,18 +78,21 @@ values win, **merges cannot be undone**, and a record stops merging after
 250 cumulative merges. Salesforce merge is SOAP/Apex only (no REST), only
 Lead/Contact/Account/Case, max 3 records per call.
-**The gap:** our three duplicate rules (`duplicate-account-domain`,
-`duplicate-contact-email`, `duplicate-open-deal`) detect groups but emit
-only merge-review *tasks* — detection without remediation.
+**The gap (closed in 0.12):** our three duplicate rules
+(`duplicate-account-domain`, `duplicate-contact-email`,
+`duplicate-open-deal`) used to detect groups but emit only merge-review
+*tasks* — detection without remediation.
-**The plan (0.12):** a `merge_records` operation type —
+**Shipped (0.12):** a `merge_records` operation type —
 `requires_human_survivor_selection` placeholder, survivor heuristics in
 `suggest` (ordered, evidence-based: most engagements → oldest → most
 complete, each with a written reason), high risk, approval required, with
 the irreversibility called out in the plan text. The dry-run plan is the
 preview every commercial tool charges for; the pre-apply snapshot is the
 loser-record archive. HubSpot first; Salesforce merge documented as
-unsupported until an Apex path justifies itself.
+unsupported until an Apex path justifies itself. 0.23 added `dedupe` as a
+first-class verb over the same operation type (groups → one governed merge
+per group, deterministic survivor).
 ## D — Delete/Archive: the exit ramp
@@ -131,5 +134,7 @@ Lessons from auditing our own apply path:
 | 0.11.1 | Fix our own faucet: resolve-first `create:` + plan-scoped dedup, HubSpot association-aware CAS for `link_record`, domain normalization in `duplicate-account-domain`, `create_task` idempotency token |
 | 0.12 (shipped) | `merge_records` (HubSpot contacts/companies/deals) + survivor suggestions capped at low confidence; the three duplicate rules emit governed merges instead of review tasks |
 | 0.15 (shipped) | `resolve` gate (CLI/lib/MCP, gate exit codes), provenance capture (`hs_object_source*` → `RecordProvenance`) + attribution in duplicate findings, self-stamped creates |
-| 0.16 | prevention-posture checks (native duplicate rules active? unique-value properties defined?) · live targeted resolve lookups |
+| 0.16 (shipped the market map instead) | prevention-posture checks (native duplicate rules active? unique-value properties defined?) and live targeted resolve lookups were slated here but did not ship — they remain future work; 0.16 went to the market map layer |
+| 0.23 (shipped) | `dedupe <object> --key <domain\|email\|name>` — the Remediate layer as a first-class verb: duplicate groups by normalized identity key, one governed `merge_records` per group, deterministic survivor (`richest`/`oldest`) |
+| 0.24 (shipped) | schedule layer — recurring Detect: the nightly watch recipe becomes a declared cadence (`schedule add "audit --provider hubspot --save" --cron "0 2 * * *"`); read/plan-side allowlist only, scheduling never auto-approves |
 | docs | The nightly watch recipe (existing flags, documented as CRM CI) |

package/docs/roadmap-to-1.0.md CHANGED

@@ -106,6 +106,33 @@ The original thesis: GTM data disagrees across systems.
 - Docs site with the operating-model registry as browsable reference.
 - Performance pass: streaming snapshots for very large orgs.
+## 0.10 → 0.25 — the layers, as shipped
+The plan above ended at the freeze; what shipped next grew the surface
+outward, one layer per release, each consolidating before the next expanded:
+- **0.11** — the suggest chain: deterministic placeholder values with
+  confidence + reasons, `plans approve --values-from`.
+- **0.12** — governed merge: `merge_records` (HubSpot contacts / companies /
+  deals), survivor suggestions capped at low confidence.
+- **0.13–0.14** — call intelligence: `call parse|score|link|plan`, LLM
+  extraction behind the bring-your-own-key seam, deterministic baseline,
+  provenance-marked insights.
+- **0.15** — the Prevent layer: the `resolve` create gate plus record-source
+  provenance and attribution.
+- **0.16–0.22** — the market map: content-addressed captures, classification
+  with mechanical span verification, front states and drift, axis discovery,
+  overlay directives, scale estimation, the field report.
+- **0.19 / 0.23** — governed write verbs (`bulk-update`, then `dedupe`,
+  `reassign`, `fix`) and the enrich layer (Apollo / Clay, fill-blanks-only
+  plans).
+- **0.24** — the schedule layer: horizontal cron, read/plan-side allowlist,
+  scheduling never auto-approves.
+- **0.25** — agent skill distribution (`npx skills add fullstackgtm/core`).
+The known-gaps list below predates these layers and has been re-verified
+against the 0.25 surface: still accurate, still open.
 ## Known real-portal gaps to close before 1.0
 Found by exercising the published package as a fresh RevOps user with a real

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "fullstackgtm",
-  "version": "0.25.0",
+  "version": "0.25.1",
   "description": "Open-source agentic GTM ops framework: canonical GTM data model, pluggable deterministic audits, reviewable dry-run patch plans, approval-gated write-back with conflict detection, and cross-system entity resolution. HubSpot, Salesforce, and Stripe connectors included.",
   "license": "Apache-2.0",
   "author": "Full Stack GTM",

package/skills/fullstackgtm/SKILL.md CHANGED

@@ -48,7 +48,8 @@ Credentials resolve in order: `--token-env <NAME>` → ambient env
 In a sandbox prefer the first two. LLM-powered verbs (`call parse`, `call
 score`, `market classify`) take `ANTHROPIC_API_KEY`/`OPENAI_API_KEY`, or use
 their deterministic/worksheet fallbacks — the CLI never prompts when
-non-interactive.
+non-interactive. `--profile <name>` (or `FULLSTACKGTM_PROFILE`) scopes
+credentials AND stored plans per client org.
 ## Verb map
@@ -61,9 +62,10 @@ non-interactive.
 | `fix --rule <id>` | audit one rule → suggest → approve at the confidence bar → apply only with `--yes` |
 | `call parse\|score\|link\|plan` | Transcripts → evidence-quoted insights, rubric scorecards, deal linking, governed next-step writes |
 | `enrich append\|refresh\|ingest\|status` | Governed enrichment (Apollo pull / Clay ingest), fill-blanks-only plans |
-| `market capture\|classify\|worksheet\|observe\|fronts\|axes\|overlay\|scale\|report\|refresh` | Competitive category map; evidence quotes verified verbatim against stored captures |
-| `schedule add\|install\|run\|status` | Horizontal cron; read/plan-side allowlist only — scheduling NEVER auto-approves |
-| `plans list\|approve` / `snapshot` / `rules` / `doctor` | Plan lifecycle, raw snapshots, rule registry, machine state |
+| `market init\|capture\|classify\|worksheet\|observe\|fronts\|axes\|overlay\|scale\|report\|refresh` | Competitive category map; evidence quotes verified verbatim against stored captures |
+| `schedule add\|list\|remove\|enable\|disable\|run\|install\|uninstall\|status` | Horizontal cron; read/plan-side allowlist only — scheduling NEVER auto-approves |
+| `report` | Client-ready audit deliverable (markdown or self-contained HTML) |
+| `plans list\|show\|approve\|reject` / `snapshot` / `rules` / `doctor` | Plan lifecycle, raw snapshots, rule registry, machine state |
 All write-shaped verbs produce plans; none writes outside approve → apply.
 Add `--json` for machine-readable output on any command.