thumbgate 1.18.0 → 1.19.0

@@ -1,6 +1,6 @@
  {
  "name": "thumbgate-marketplace",
- "version": "1.18.0",
+ "version": "1.19.0",
  "owner": {
  "name": "Igor Ganapolsky",
  "email": "ig5973700@gmail.com"
@@ -13,7 +13,7 @@
  "source": "npm",
  "package": "thumbgate"
  },
- "version": "1.18.0",
+ "version": "1.19.0",
  "author": {
  "name": "Igor Ganapolsky"
  },
@@ -1,7 +1,7 @@
  {
  "name": "thumbgate",
  "description": "Type 👍 or 👎 on any agent action. ThumbGate captures it, distills a lesson, and blocks the pattern from repeating. One thumbs-down = the agent physically cannot make that mistake again. 33 pre-action checks, budget enforcement, self-protection, and NIST/SOC2 compliance tags.",
- "version": "1.18.0",
+ "version": "1.19.0",
  "author": {
  "name": "Igor Ganapolsky"
  },
@@ -1,6 +1,6 @@
  {
  "name": "thumbgate",
- "version": "1.18.0",
+ "version": "1.19.0",
  "description": "ThumbGate — 👍👎 feedback that teaches your AI agent. Thumbs down a mistake, it never happens again.",
  "homepage": "https://thumbgate-production.up.railway.app",
  "transport": "stdio",
package/README.md CHANGED
@@ -8,7 +8,7 @@
 
  **Your AI coding bill has a leak.**
 
- **Stop paying $ for the same AI mistake.**
+ **Stop paying for the same AI mistake twice.**
 
  Every retry loop, every hallucinated import, every *"let me try a different approach"* — those are billable tokens on every LLM vendor's bill. Thumbs-down once; ThumbGate blocks that exact mistake on every future call. Across Claude Code, Cursor, Codex, Gemini, Amp, Cline, OpenCode — any MCP-compatible agent, forever.
 
@@ -24,6 +24,14 @@ Under the hood: your thumbs-down becomes one of your **Pre-Action Checks** that
 
  ---
 
+ > *"A better dashboard doesn't make the agents more reliable. The hard part isn't visibility. It's trust."*
+ >
+ > — **Rob May**, CEO & co-founder, Neurometric AI, quoted in [The New Stack](https://thenewstack.io/claude-code-agent-view/) on Anthropic's Claude Code Agent View (May 2026).
+ >
+ > ThumbGate is the open-source layer that makes the trust part real: PreToolUse gates, thumbs-down to rule, audit trail on every interception.
+
+ ---
+
  ## 🎬 90-second demo
 
  Watch the force-push scenario: agent tries to `git push --force`, one thumbs-down, next session it's blocked — zero tokens spent on the repeat.
@@ -107,13 +115,26 @@ ThumbGate operates as a 4-layer enforcement stack between your AI agent and your
  Your thumbs-up/down reactions are captured via MCP protocol, CLI, or the ChatGPT GPT surface. Each reaction is stored as a structured lesson with context, timestamp, and severity.
 
  ### Layer 2: Check Engine
- The check engine converts lessons into enforceable rules using pattern matching, semantic similarity (via LanceDB vectors), and Thompson Sampling for adaptive rule selection. Rules stay in local ThumbGate runtime state.
+ The check engine converts lessons into enforceable rules. **The runtime gate decision is deterministic** literal pattern match AST match → scoped rule lookup. No LLM call on the enforcement path.
+
+ Where retrieval is needed (an agent is about to run a destructive command not on the literal block list, but semantically similar to one we've blocked before), ThumbGate uses local CPU-only `bge-small` embeddings via LanceDB's built-in pipeline. No external API call, no inference cost beyond CPU. So **"no LLM in enforcement"** holds: the gate decision uses no LLM; the rule corpus is just searchable via local embeddings.
+
+ **Thompson Sampling tunes per-rule confidence weights** for soft-gating rules so high-noise rules quiet down and high-signal rules sharpen. It never decides *whether* a rule fires — a hard rule like "block `git push --force` on main" always fires deterministically. Bandit exploration would be terrifying for hard rules; we don't do it.
+
+ Rules stay in local ThumbGate runtime state.
 
  ### Layer 3: Pre-Action Interception
  Before any agent action executes, ThumbGate's `PreToolUse` hook intercepts the command and evaluates it against all active checks. This happens at the MCP protocol level — the agent physically cannot bypass it.
 
- ### Layer 4: Multi-Agent Distribution
- Checks are distributed across all connected agents via MCP stdio protocol. One correction in Claude Code protects Cursor, Codex, Gemini CLI, Cline, and any MCP-compatible agent.
+ ### Layer 4: Multi-Agent Distribution (the actual moat vs hand-rolled hooks)
+ Claude Code already ships `permissions.deny` and `PreToolUse` hooks. Cursor and Codex have their own. So why ThumbGate over a hand-written hook?
+
+ Two things hand-written hooks structurally cannot do:
+
+ 1. **Cross-agent propagation.** A `permissions.deny` pattern lives in one agent's config and stays there. ThumbGate's checks distribute across every connected agent over MCP stdio — thumbs-down once in Cursor, the same pattern blocks on Claude Code, Codex, Gemini CLI, Cline, OpenCode, Amp in the next session, no copy-paste between configs.
+ 2. **Learning loop.** A hand-written hook covers exactly the patterns you wrote. ThumbGate promotes every thumbs-down into a fresh rule, tunes existing rules' confidence weights from outcomes (Thompson Sampling, see Layer 2), and pulls semantically-near patterns into scope via local embeddings. The rule corpus sharpens without an operator hand-writing a regex for every new mistake shape.
+
+ Hand-rolled hooks are the right tool for a small, static denylist you maintain by hand. ThumbGate is the right tool when you want corrections from any agent to harden every agent automatically.
 
  Prompt engineering still matters, but it is only the starting point. ThumbGate adds prompt evaluation on top: proof lanes, benchmarks, and self-heal checks tell you whether your prompt and workflow actually held up under execution instead of leaving you to guess from vibes. Run `npx thumbgate eval --from-feedback --write-report=.thumbgate/prompt-eval-proof.md` to turn real thumbs-up/down feedback into reusable eval cases and a buyer-ready proof report.
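Reviewer note on the Layer 2 change in this hunk: the split between a deterministic hard gate and tunable soft-rule weights can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the rule shape, the `gate`, `confidence`, and `recordOutcome` names, and the Beta(alpha, beta) bookkeeping are hypothetical and are not taken from the ThumbGate source. To keep the example deterministic it uses the posterior mean as the weight rather than drawing a Thompson sample.

```javascript
// Hypothetical sketch only: rule shapes and function names are illustrative,
// not ThumbGate's actual API.

// Hard rules block on a pattern match. Soft rules carry Beta(alpha, beta)
// counts that only scale how strongly a warning is surfaced.
const rules = [
  { id: 'no-force-push', kind: 'hard', pattern: /\bgit\s+push\b.*(--force|-f)\b/ },
  { id: 'avoid-rm-rf', kind: 'soft', pattern: /\brm\s+-rf\b/, alpha: 8, beta: 2 },
];

// Posterior mean of Beta(alpha, beta), used here as the confidence weight.
function confidence(rule) {
  return rule.alpha / (rule.alpha + rule.beta);
}

// Outcome update for soft rules: helpful warnings raise alpha, noisy ones
// raise beta. Hard rules are returned unchanged and are never tuned.
function recordOutcome(rule, wasHelpful) {
  if (rule.kind !== 'soft') return rule;
  return wasHelpful
    ? { ...rule, alpha: rule.alpha + 1 }
    : { ...rule, beta: rule.beta + 1 };
}

// Deterministic gate: first matching rule wins, no sampling, no model call.
function gate(command, activeRules) {
  for (const rule of activeRules) {
    if (!rule.pattern.test(command)) continue;
    if (rule.kind === 'hard') return { decision: 'block', ruleId: rule.id };
    return { decision: 'warn', ruleId: rule.id, weight: confidence(rule) };
  }
  return { decision: 'allow' };
}
```

In this sketch the weight only affects soft rules, which mirrors the README's claim that tuning never decides whether a hard rule fires.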
 
@@ -423,6 +444,7 @@ Pro ($19/mo or $149/yr) removes the rule cap and adds history-aware lesson recal
 
  ## Docs
 
+ - [**ThumbGate for Federal Agencies**](docs/FEDERAL.md) — pilot-ready posture, NIST 800-53 control mapping, OMB M-24-10 / EO 14110 alignment, ThumbGate-Core gov deployment mode, public/Core boundary invariants. Landing page: [thumbgate.ai/federal](https://thumbgate-production.up.railway.app/federal).
  - [First Dollar Playbook](docs/FIRST_DOLLAR_PLAYBOOK.md) — turning one painful workflow into the next booked pilot
  - [Commercial Truth](docs/COMMERCIAL_TRUTH.md) — pricing, claims, what we don't say
  - [Changeset Strategy](docs/CHANGESET_STRATEGY.md) — release notes and version bump enforcement
@@ -2,13 +2,13 @@
  "mcpServers": {
  "thumbgate": {
  "command": "npx",
- "args": ["--yes", "--package", "thumbgate@1.18.0", "thumbgate", "serve"]
+ "args": ["--yes", "--package", "thumbgate@1.19.0", "thumbgate", "serve"]
  }
  },
  "hooks": {
  "preToolUse": {
  "command": "npx",
- "args": ["--yes", "--package", "thumbgate@1.18.0", "thumbgate", "gate-check"]
+ "args": ["--yes", "--package", "thumbgate@1.19.0", "thumbgate", "gate-check"]
  }
  }
  }
@@ -216,7 +216,7 @@ const {
  finalizeSession: finalizeFeedbackSession,
  } = require('../../scripts/feedback-session');
 
- const SERVER_INFO = { name: 'thumbgate-mcp', version: '1.18.0' };
+ const SERVER_INFO = { name: 'thumbgate-mcp', version: '1.19.0' };
  const COMMERCE_CATEGORIES = [
  'product_recommendation',
  'brand_compliance',
@@ -7,7 +7,7 @@
  "npx",
  "--yes",
  "--package",
- "thumbgate@1.18.0",
+ "thumbgate@1.19.0",
  "thumbgate",
  "serve"
  ],
@@ -76,9 +76,40 @@
  "medianLatencyMs",
  "costPerAnalysisUsd"
  ]
+ },
+ "tokenizer-brittleness": {
+ "label": "Tokenizer brittleness and byte-level robustness",
+ "summary": "Evaluate models for malformed JSONL, Unicode confusables, stack traces, secrets, SQL snippets, file paths, and code-symbol-heavy inputs before routing log, code, or security workloads.",
+ "desiredStrengths": ["tokenizer-free", "byte-level", "log-robustness", "code-symbols", "security-scanning", "fast-inference"],
+ "targetContextWindow": 64000,
+ "benchmarkCommands": [
+ "npx thumbgate model-candidates --workload=tokenizer-brittleness --json",
+ "node --test tests/model-candidates.test.js --test-name-pattern=tokenizer",
+ "node scripts/gate-eval.js run"
+ ],
+ "metrics": [
+ "caseCoverage",
+ "symbolPreservationRate",
+ "secretDetectionRecall",
+ "jsonlRepairAccuracy",
+ "medianLatencyMs",
+ "memoryBandwidthEstimate"
+ ]
  }
  },
  "candidates": [
+ {
+ "id": "research/fast-byte-latent-transformer",
+ "vendor": "Meta + Stanford + University of Washington",
+ "family": "blt",
+ "provider": "research",
+ "model": "fast-byte-latent-transformer",
+ "contextWindow": 64000,
+ "costClass": "medium",
+ "researchOnly": true,
+ "strengths": ["tokenizer-free", "byte-level", "log-robustness", "code-symbols", "security-scanning", "fast-inference"],
+ "notes": "Research-only candidate inspired by Fast BLT. Use as an evaluation target for tokenizer-free robustness and memory-bandwidth planning; do not route production traffic until a maintained runtime and model weights exist."
+ },
  {
  "id": "self-hosted/deepseek-v4-flash-sglang",
  "vendor": "DeepSeek",
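Reviewer note on the hunk above: one plausible reading of how `desiredStrengths`, `targetContextWindow`, and `researchOnly` interact is sketched below. The field names come from the JSON in this diff; the `rankCandidates` function, its `allowResearch` option, the overlap-count ranking, and the second candidate entry are hypothetical and were written only for this illustration. The package's real `model-candidates` logic is not shown in the diff and may differ.

```javascript
// Hypothetical sketch: field names mirror the JSON in the diff, but the
// ranking rule and function name are assumptions, not the package's code.
const STRENGTHS = ['tokenizer-free', 'byte-level', 'log-robustness',
  'code-symbols', 'security-scanning', 'fast-inference'];

const workload = { desiredStrengths: STRENGTHS, targetContextWindow: 64000 };

const candidates = [
  {
    id: 'research/fast-byte-latent-transformer',
    contextWindow: 64000,
    researchOnly: true,
    strengths: STRENGTHS,
  },
  {
    // Made-up entry so the filter has something to keep.
    id: 'example/hypothetical-production-model',
    contextWindow: 128000,
    strengths: ['fast-inference', 'code-symbols'],
  },
];

// Keep candidates whose context window covers the workload, drop
// research-only entries unless explicitly allowed, then rank by how many
// desired strengths each candidate lists.
function rankCandidates(wl, pool, { allowResearch = false } = {}) {
  return pool
    .filter((c) => c.contextWindow >= wl.targetContextWindow)
    .filter((c) => allowResearch || !c.researchOnly)
    .map((c) => ({
      id: c.id,
      matched: c.strengths.filter((s) => wl.desiredStrengths.includes(s)).length,
    }))
    .sort((a, b) => b.matched - a.matched);
}
```

Calling `rankCandidates(workload, candidates)` leaves the research entry out, which matches the candidate's own note about not routing production traffic to it; passing `{ allowResearch: true }` keeps it as an evaluation target.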
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "thumbgate",
- "version": "1.18.0",
+ "version": "1.19.0",
  "description": "ThumbGate self-improving agent governance: thumbs-up/down turns every mistake into a prevention rule and blocks repeat patterns. 33 pre-action checks, budget enforcement, and self-protection for Claude Code, Cursor, Codex, Gemini CLI, and Amp.",
  "homepage": "https://thumbgate-production.up.railway.app",
  "repository": {
@@ -39,6 +39,7 @@
  "public/codex-plugin.html",
  "public/compare.html",
  "public/dashboard.html",
+ "public/federal.html",
  "public/guide.html",
  "public/index.html",
  "public/learn.html",
@@ -46,6 +47,7 @@
  "public/numbers.html",
  "public/pro.html",
  "scripts/access-anomaly-detector.js",
+ "scripts/activation-tracker.js",
  "scripts/agent-audit-trace.js",
  "scripts/agent-design-governance.js",
  "scripts/agent-memory-lifecycle.js",
@@ -152,6 +154,7 @@
  "scripts/mcp-policy.js",
  "scripts/mcp-transport-strategy.js",
  "scripts/memory-firewall.js",
+ "scripts/memory-scope-readiness.js",
  "scripts/meta-agent-loop.js",
  "scripts/model-access-eligibility.js",
  "scripts/model-migration-readiness.js",
@@ -163,6 +166,7 @@
  "scripts/otel-declarative-config.js",
  "scripts/operational-integrity.js",
  "scripts/perplexity-client.js",
+ "scripts/plausible-server-events.js",
  "scripts/pr-manager.js",
  "scripts/private-core-boundary.js",
  "scripts/pro-local-dashboard.js",
@@ -323,7 +327,16 @@
  "social:prospect:bluesky": "node scripts/social-bluesky-prospecting.js",
  "social:prospect:bluesky:dry": "node scripts/social-bluesky-prospecting.js --dry-run",
  "social:reply-publish:bluesky:dry": "node scripts/social-reply-monitor-bluesky.js --publish-approved --dry-run",
- "test": "npm run test:schema && npm run test:loop && npm run test:dpo && npm run test:kto && npm run test:api && npm run test:proof && npm run test:e2e && npm run test:rlaif && npm run test:attribution && npm run test:quality && npm run test:intelligence && npm run test:training-export && npm run test:deployment && npm run test:operational-integrity && npm run test:workflow && npm run test:billing && npm run test:cli && npm run test:watcher && npm run test:autoresearch && npm run test:ops && npm run test:session-analyzer && npm run test:tessl && npm run test:gates && npm run test:evoskill && npm run test:gates-hardening && npm run test:workers && npm run test:social-analytics && npm run test:memalign && npm run test:xmemory-lite && npm run test:filesystem-search && npm run test:zernio && npm run test:platform-limits && npm run test:post-video && npm run test:post-everywhere-instagram && npm run test:post-everywhere-channels && npm run test:post-everywhere-zernio-default && npm run test:zernio-canonical-pollers && npm run test:zernio-status && npm run test:obsidian-export && npm run test:lesson-db && npm run test:lesson-rotation && npm run test:memory-dedup && npm run test:feedback-quality && npm run test:sync-version && npm run test:check-congruence && npm run test:tool-registry && npm run test:feedback-to-rules && npm run test:memory-firewall && npm run test:belief-update && npm run test:hosted-config && npm run test:operational-summary && npm run test:operational-dashboard && npm run test:operator-artifacts && npm run test:operator-key-auth && npm run test:cloudflare-sandbox && npm run test:mcp-config && npm run test:plan-gate && npm run test:pulse && npm run test:semantic-layer && npm run test:data-pipeline && npm run test:optimize-context && npm run test:principle-extractor && npm run test:analytics-window && npm run test:funnel-analytics && npm run test:experiment-tracker && npm run test:build-metadata && npm run test:context-engine && npm run test:hf-papers 
&& npm run test:marketing-experiment && npm run test:seo-gsd && npm run test:verify-run && npm run test:export-dpo-pairs && npm run test:export-hf-dataset && npm run test:license && npm run test:bot-detector && npm run test:postinstall && npm run test:funnel-invariants && npm run test:cli-telemetry && npm run test:pro-parity && npm run test:model-tier-router && npm run test:computer-use-firewall && npm run test:skill-exporter && npm run test:statusline && npm run test:evolution && npm run test:org-dashboard && npm run test:multi-hop-recall && npm run test:synthetic-dpo && npm run test:thumbgate-skill && npm run test:learn-hub && npm run test:feedback-fallback && npm run test:metaclaw && npm run test:server-lock && npm run test:control-tower && npm run test:pii-scanner && npm run test:data-governance && npm run test:lesson-inference && npm run test:semantic-dedup && npm run test:fs-utils && npm run test:cli-schema && npm run test:explore && npm run test:lesson-reranker && npm run test:lesson-retrieval && npm run test:cross-encoder && npm run test:reflector-agent && npm run test:feedback-session && npm run test:feedback-history-distiller && npm run test:hallucination-detector && npm run test:history-distiller && npm run test:predictive-insights && npm run test:prove-predictive-insights && npm run test:statusbar-cli && npm run test:generate-instagram-card && npm run test:instagram-thumbgate-post && npm run test:publish-instagram-thumbgate && npm run test:lesson-synthesis && npm run test:lesson-canonical && npm run test:background-governance && npm run test:memory-migration && npm run test:prompt-dlp && npm run test:ephemeral-store && npm run test:agent-security && npm run test:skill-progressive && npm run test:per-step-scoring && npm run test:weekly-auto-post && npm run test:social-post-hourly && npm run test:social-quality-gate && npm run test:a2ui-engine && npm run test:gate-satisfy && npm run test:money-watcher && npm run test:budget && npm run test:quick-start && 
npm run test:utm && npm run test:product-feedback && npm run test:feedback-root-consolidator && npm run test:engagement-audit && npm run test:install-growth-automation && npm run test:publish-thumbgate-launch && npm run test:community-course-platform-launch-kit && npm run test:reconcile-thumbgate-campaign && npm run test:reddit-publisher && npm run test:schedule-thumbgate-campaign && npm run test:social-reply-monitor && npm run test:social-dedupe-cleanup && npm run test:sync-launch-assets && npm run test:ai-search-visibility && npm run test:perplexity && npm run test:security-scanner && npm run test:llm-client && npm run test:managed-lesson-agent && npm run test:self-distill && npm run test:meta-agent && npm run test:harness-selector && npm run test:thumbgate-bench && npm run test:seo-guides && npm run test:enforcement-loop && npm run test:cli-agent-experience && npm run test:bot-detection && npm run test:checkout-bot-guard && npm run test:session-health && npm run test:session-episodes && npm run test:spec-gate && npm run test:decision-trace && npm run test:dashboard-insights && npm run test:telemetry-tracked-link-slug && npm run test:prompt-eval && npm run test:demo-voiceover && npm run test:gate-coherence && npm run test:gate-eval && npm run test:high-roi && npm run test:public-static-assets && npm run test:token-savings && npm run test:numbers-page && npm run test:workflow-gate-checkpoint && npm run test:lesson-export-import && npm run test:landing-page-claims && npm run test:competitive-positioning-marketing && npm run test:medium-weekly && npm run test:dashboard-deeplink-e2e && npm run test:public-package-parity && npm run test:token-savings-dashboard && npm run test:cursor-wiring && npm run test:pretooluse-injection && npm run test:recent-corrective-context && npm run test:durability-step && npm run test:mailer && npm run test:brand-assets && npm run test:enforcement-teeth && npm run test:bayes-optimal-gate && npm run test:swarm-coordinator && npm run 
test:session-report && npm run test:agent-reasoning-traces && npm run test:judge-reward && npm run test:llm-behavior-monitor && npm run test:prompting-os && npm run test:single-use-credential-gate && npm run test:structured-prompt-driven && npm run test:require-evidence-gate && npm run test:rule-validator && npm run test:bluesky-atproto && npm run test:social-reply-monitor-bluesky && npm run test:bluesky-delete-replies && npm run test:architect-kit-memory-bridge && npm run test:sonar-review-hotspots && npm run test:actionable-remediations && npm run test:gemini-embedding-policy && npm run test:agent-design-governance && npm run test:public-core-boundary",
+ "test:python": "python3 -m pytest tests/*.py",
+ "test": "npm run test:python && npm run test:schema && npm run test:loop && npm run test:dpo && npm run test:kto && npm run test:api && npm run test:proof && npm run test:e2e && npm run test:rlaif && npm run test:attribution && npm run test:quality && npm run test:intelligence && npm run test:training-export && npm run test:deployment && npm run test:operational-integrity && npm run test:workflow && npm run test:billing && npm run test:cli && npm run test:watcher && npm run test:autoresearch && npm run test:ops && npm run test:session-analyzer && npm run test:tessl && npm run test:gates && npm run test:evoskill && npm run test:gates-hardening && npm run test:workers && npm run test:social-analytics && npm run test:memalign && npm run test:xmemory-lite && npm run test:filesystem-search && npm run test:zernio && npm run test:platform-limits && npm run test:post-video && npm run test:post-everywhere-instagram && npm run test:post-everywhere-channels && npm run test:post-everywhere-zernio-default && npm run test:zernio-canonical-pollers && npm run test:zernio-status && npm run test:obsidian-export && npm run test:lesson-db && npm run test:lesson-rotation && npm run test:memory-dedup && npm run test:feedback-quality && npm run test:sync-version && npm run test:check-congruence && npm run test:tool-registry && npm run test:feedback-to-rules && npm run test:memory-firewall && npm run test:memory-scope-readiness && npm run test:belief-update && npm run test:hosted-config && npm run test:operational-summary && npm run test:operational-dashboard && npm run test:operator-artifacts && npm run test:operator-key-auth && npm run test:cloudflare-sandbox && npm run test:mcp-config && npm run test:plan-gate && npm run test:pulse && npm run test:semantic-layer && npm run test:data-pipeline && npm run test:optimize-context && npm run test:principle-extractor && npm run test:analytics-window && npm run test:funnel-analytics && npm run test:experiment-tracker && npm run 
test:build-metadata && npm run test:context-engine && npm run test:hf-papers && npm run test:marketing-experiment && npm run test:seo-gsd && npm run test:verify-run && npm run test:export-dpo-pairs && npm run test:export-hf-dataset && npm run test:license && npm run test:bot-detector && npm run test:audit-pr-bot-contamination && npm run test:stripe-bootstrap-saas-catalog && npm run test:postinstall && npm run test:funnel-invariants && npm run test:cli-telemetry && npm run test:pro-parity && npm run test:model-tier-router && npm run test:computer-use-firewall && npm run test:skill-exporter && npm run test:statusline && npm run test:evolution && npm run test:org-dashboard && npm run test:multi-hop-recall && npm run test:synthetic-dpo && npm run test:thumbgate-skill && npm run test:learn-hub && npm run test:feedback-fallback && npm run test:metaclaw && npm run test:server-lock && npm run test:control-tower && npm run test:pii-scanner && npm run test:data-governance && npm run test:lesson-inference && npm run test:semantic-dedup && npm run test:fs-utils && npm run test:cli-schema && npm run test:explore && npm run test:lesson-reranker && npm run test:lesson-retrieval && npm run test:cross-encoder && npm run test:reflector-agent && npm run test:feedback-session && npm run test:feedback-history-distiller && npm run test:hallucination-detector && npm run test:history-distiller && npm run test:predictive-insights && npm run test:prove-predictive-insights && npm run test:statusbar-cli && npm run test:generate-instagram-card && npm run test:instagram-thumbgate-post && npm run test:publish-instagram-thumbgate && npm run test:lesson-synthesis && npm run test:lesson-canonical && npm run test:background-governance && npm run test:memory-migration && npm run test:prompt-dlp && npm run test:ephemeral-store && npm run test:agent-security && npm run test:skill-progressive && npm run test:per-step-scoring && npm run test:weekly-auto-post && npm run test:social-post-hourly && npm run 
test:social-quality-gate && npm run test:a2ui-engine && npm run test:gate-satisfy && npm run test:money-watcher && npm run test:budget && npm run test:quick-start && npm run test:utm && npm run test:product-feedback && npm run test:feedback-root-consolidator && npm run test:engagement-audit && npm run test:install-growth-automation && npm run test:publish-thumbgate-launch && npm run test:community-course-platform-launch-kit && npm run test:reconcile-thumbgate-campaign && npm run test:reddit-publisher && npm run test:schedule-thumbgate-campaign && npm run test:social-reply-monitor && npm run test:social-dedupe-cleanup && npm run test:sync-launch-assets && npm run test:ai-search-visibility && npm run test:perplexity && npm run test:security-scanner && npm run test:llm-client && npm run test:managed-lesson-agent && npm run test:self-distill && npm run test:meta-agent && npm run test:harness-selector && npm run test:thumbgate-bench && npm run test:seo-guides && npm run test:enforcement-loop && npm run test:cli-agent-experience && npm run test:bot-detection && npm run test:checkout-bot-guard && npm run test:checkout-pro-confirmation-gate && npm run test:session-health && npm run test:session-episodes && npm run test:spec-gate && npm run test:decision-trace && npm run test:dashboard-insights && npm run test:telemetry-tracked-link-slug && npm run test:prompt-eval && npm run test:demo-voiceover && npm run test:gate-coherence && npm run test:gate-eval && npm run test:high-roi && npm run test:public-static-assets && npm run test:token-savings && npm run test:numbers-page && npm run test:workflow-gate-checkpoint && npm run test:lesson-export-import && npm run test:landing-page-claims && npm run test:competitive-positioning-marketing && npm run test:medium-weekly && npm run test:dashboard-deeplink-e2e && npm run test:public-package-parity && npm run test:token-savings-dashboard && npm run test:cursor-wiring && npm run test:pretooluse-injection && npm run 
test:recent-corrective-context && npm run test:durability-step && npm run test:mailer && npm run test:brand-assets && npm run test:enforcement-teeth && npm run test:bayes-optimal-gate && npm run test:swarm-coordinator && npm run test:session-report && npm run test:agent-reasoning-traces && npm run test:judge-reward && npm run test:llm-behavior-monitor && npm run test:prompting-os && npm run test:single-use-credential-gate && npm run test:structured-prompt-driven && npm run test:require-evidence-gate && npm run test:rule-validator && npm run test:bluesky-atproto && npm run test:social-reply-monitor-bluesky && npm run test:bluesky-delete-replies && npm run test:architect-kit-memory-bridge && npm run test:sonar-review-hotspots && npm run test:actionable-remediations && npm run test:gemini-embedding-policy && npm run test:agent-design-governance && npm run test:public-core-boundary && npm run test:hook-stop-verify-deploy && npm run test:hook-stop-anti-claim && npm run test:plausible-server-events && npm run test:activation-tracker && npm run test:unified-revenue-rollup && npm run test:conversion-rate-stats && npm run test:external-customer-audit && npm run test:telemetry-export",
+ "test:hook-stop-verify-deploy": "node --test tests/hook-stop-verify-deploy.test.js",
+ "test:hook-stop-anti-claim": "node --test tests/hook-stop-anti-claim.test.js",
+ "test:plausible-server-events": "node --test tests/plausible-server-events.test.js",
+ "test:activation-tracker": "node --test tests/activation-tracker.test.js",
+ "test:unified-revenue-rollup": "node --test tests/unified-revenue-rollup.test.js",
+ "test:conversion-rate-stats": "node --test tests/conversion-rate-stats.test.js",
+ "test:external-customer-audit": "node --test tests/external-customer-audit.test.js",
+ "test:telemetry-export": "node --test tests/telemetry-export.test.js",
  "test:swarm-coordinator": "node --test tests/swarm-coordinator.test.js",
  "test:session-report": "node --test tests/session-report.test.js",
  "test:agent-reasoning-traces": "node --test tests/agent-reasoning-traces.test.js tests/agent-stack-survival-audit.test.js",
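Reviewer note on the hunk above: the aggregate `test` script now chains several newly added `test:*` entries, so a chained name without a matching definition would fail only at run time. A minimal sketch of a consistency check follows. The `missingChainedScripts` helper is hypothetical and is not part of the package, and the small `scripts` object is a trimmed stand-in for the real `package.json` with one definition deliberately left out.

```javascript
// Hypothetical helper: lists script names that an entry script chains via
// `npm run <name>` but that are not defined in the same scripts object.
function missingChainedScripts(scripts, entry = 'test') {
  const chained = [...(scripts[entry] || '').matchAll(/npm run ([\w:.-]+)/g)]
    .map((match) => match[1]);
  return chained.filter(
    (name) => !Object.prototype.hasOwnProperty.call(scripts, name),
  );
}

// Trimmed stand-in for package.json "scripts"; `test:activation-tracker`
// is intentionally undefined here so the check has something to report.
const scripts = {
  test: 'npm run test:python && npm run test:telemetry-export && npm run test:activation-tracker',
  'test:python': 'python3 -m pytest tests/*.py',
  'test:telemetry-export': 'node --test tests/telemetry-export.test.js',
};

const missing = missingChainedScripts(scripts);
```

Against the real file, the same function could be fed `require('./package.json').scripts`; an empty result would mean every chained name resolves.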
@@ -348,6 +361,7 @@
  "test:prompt-eval": "node --test tests/prompt-eval.test.js",
  "eval:feedback": "node scripts/prompt-eval.js --from-feedback",
  "eval:feedback-quality": "python3 scripts/feedback_quality_eval.py",
+ "eval:classifier": "python3 scripts/eval_gate_classifier.py",
  "test:decision-trace": "node --test tests/decision-trace.test.js",
  "test:feedback-fallback": "node --test tests/feedback-fallback.test.js",
  "test:metaclaw": "node --test tests/metaclaw-features.test.js",
@@ -367,6 +381,7 @@
  "test:learn-hub": "node --test tests/learn-hub.test.js",
  "test:feedback-to-rules": "node --test tests/feedback-to-rules.test.js",
  "test:memory-firewall": "node --test tests/memory-firewall.test.js",
+ "test:memory-scope-readiness": "node --test tests/memory-scope-readiness.test.js",
  "test:belief-update": "node --test tests/belief-update.test.js",
  "test:hosted-config": "node --test tests/hosted-config.test.js",
  "test:operational-summary": "node --test tests/operational-summary.test.js",
@@ -405,10 +420,10 @@
  "test:kto": "node --test tests/export-kto.test.js",
  "test:api": "node --test --test-concurrency=1 tests/api-server.test.js tests/api-events-sse.test.js tests/api-auth-config.test.js tests/mcp-server.test.js tests/adapters.test.js tests/openapi-parity.test.js tests/budget-guard.test.js tests/context-manager.test.js tests/contextfs.test.js tests/job-api.test.js tests/pack-templates.test.js tests/dashboard.test.js tests/dashboard-render-spec.test.js tests/dashboard-html.test.js tests/agent-readiness.test.js tests/mcp-policy.test.js tests/subagent-profiles.test.js tests/intent-router.test.js tests/internal-agent-bootstrap.test.js tests/lesson-search.test.js tests/thumbgate-search.test.js tests/document-intake.test.js tests/rubric-engine.test.js tests/self-healing-check.test.js tests/self-heal.test.js tests/feedback-schema.test.js tests/thompson-sampling.test.js tests/feedback-sequences.test.js tests/diversity-tracking.test.js tests/vector-store.test.js tests/gemini-embedding-policy.test.js tests/feedback-attribution.test.js tests/hybrid-feedback-context.test.js tests/loop-closure.test.js tests/code-reasoning.test.js tests/feedback-loop.test.js tests/feedback-inbox-read.test.js tests/feedback-to-memory.test.js tests/test-coverage.test.js tests/version-metadata.test.js tests/claude-mcpb.test.js tests/claude-codex-bridge.test.js tests/cursor-plugin.test.js tests/codex-plugin.test.js tests/ide-marketplace-extensions.test.js tests/telemetry-analytics.test.js tests/public-landing.test.js tests/lessons-page.test.js tests/pro-landing.test.js tests/local-model-profile.test.js tests/risk-scorer.test.js tests/context-compaction.test.js tests/reminder-engine.test.js tests/post-to-x.test.js tests/verification-loop.test.js tests/async-job-runner.test.js tests/commerce-quality.test.js tests/recall-limit.test.js tests/problem-detail.test.js tests/natural-language-harness.test.js tests/settings-hierarchy.test.js",
  "test:proof": "node --test tests/prove-adapters.test.js tests/prove-attribution.test.js tests/prove-cloudflare-sandbox.test.js tests/prove-data-quality.test.js tests/prove-intelligence.test.js tests/prove-lancedb.test.js tests/prove-loop-closure.test.js tests/prove-training-export.test.js tests/prove-local-intelligence.test.js tests/prove-workflow-contract.test.js tests/prove-autoresearch.test.js tests/prove-claim-verification.test.js tests/prove-data-pipeline.test.js tests/prove-evolution.test.js tests/prove-harnesses.test.js tests/prove-packaged-runtime.test.js tests/prove-runtime.test.js tests/prove-seo-gsd.test.js tests/prove-settings.test.js tests/prove-xmemory.test.js && node --test tests/prove-automation.test.js",
408
- "test:e2e": "node --test tests/e2e-pipeline.test.js tests/e2e-product-flows.test.js tests/e2e-coverage-contract.test.js",
423
+ "test:e2e": "node --test tests/e2e-pipeline.test.js tests/e2e-product-flows.test.js tests/e2e-coverage-contract.test.js tests/interaction-model-e2e.test.js",
409
424
  "test:rlaif": "node --test tests/rlaif-self-audit.test.js tests/dpo-optimizer.test.js tests/meta-policy.test.js tests/agent-reward-model.test.js",
410
425
  "test:attribution": "node --test tests/feedback-attribution.test.js tests/hybrid-feedback-context.test.js",
411
- "test:quality": "node --test tests/validate-feedback.test.js tests/feedback-quality-eval-python.test.js",
426
+ "test:quality": "node --test tests/validate-feedback.test.js tests/feedback-quality-eval-python.test.js tests/eval-gate-classifier.test.js",
412
427
  "test:intelligence": "node --test tests/intelligence.test.js",
413
428
  "test:training-export": "node --test tests/training-export.test.js tests/databricks-export.test.js",
  "test:deployment": "node --test tests/deployment.test.js tests/deploy-policy.test.js tests/publish-decision.test.js tests/changeset-check.test.js tests/release-notes.test.js tests/sonarcloud-workflow.test.js tests/package-boundary.test.js tests/public-package-boundary.test.js tests/revenue-observability-workflow.test.js",
@@ -488,8 +503,11 @@
  "test:zernio-status": "node --test tests/zernio-status.test.js",
  "test:license": "node --test tests/license.test.js",
  "test:bot-detector": "node --test tests/bot-detector.test.js",
+ "test:audit-pr-bot-contamination": "node --test tests/audit-pr-bot-contamination.test.js",
+ "test:stripe-bootstrap-saas-catalog": "node --test tests/stripe-bootstrap-saas-catalog.test.js",
  "test:bot-detection": "node --test tests/bot-detection.test.js",
  "test:checkout-bot-guard": "node --test tests/checkout-bot-guard.test.js",
+ "test:checkout-pro-confirmation-gate": "node --test tests/checkout-pro-confirmation-gate.test.js",
  "test:postinstall": "node --test tests/postinstall.test.js",
  "test:funnel-invariants": "node --test tests/funnel-invariants.test.js",
  "test:cli-telemetry": "node --test tests/cli-telemetry.test.js",
@@ -599,7 +617,8 @@
  "test:gate-eval": "node --test tests/gate-eval.test.js",
  "gate-eval:ci": "node scripts/gate-eval.js run",
  "test:ai-engineering-stack-guardrails": "node --test tests/ai-engineering-stack-guardrails.test.js",
- "test:high-roi": "node --test tests/high-roi.test.js tests/model-candidates.test.js tests/autonomous-workflow.test.js tests/high-roi-agent-workflows.test.js tests/code-graph-guardrails.test.js tests/proxy-pointer-rag-guardrails.test.js tests/rag-precision-guardrails.test.js tests/ai-engineering-stack-guardrails.test.js tests/long-running-agent-context-guardrails.test.js tests/reasoning-efficiency-guardrails.test.js tests/deepseek-v4-runtime-guardrails.test.js tests/upstream-contribution-engine.test.js tests/proactive-agent-eval-guardrails.test.js tests/reward-hacking-guardrails.test.js tests/chatgpt-ads-readiness-pack.test.js tests/oss-pr-opportunity-scout.test.js tests/agent-design-governance.test.js tests/gemini-embedding-policy.test.js tests/openclaw-agent-governance-kit.test.js",
+ "test:interaction-model": "node --test tests/interaction-model.test.js tests/interaction-model-e2e.test.js",
+ "test:high-roi": "node --test tests/high-roi.test.js tests/model-candidates.test.js tests/autonomous-workflow.test.js tests/high-roi-agent-workflows.test.js tests/interaction-model.test.js tests/interaction-model-e2e.test.js tests/code-graph-guardrails.test.js tests/proxy-pointer-rag-guardrails.test.js tests/rag-precision-guardrails.test.js tests/ai-engineering-stack-guardrails.test.js tests/long-running-agent-context-guardrails.test.js tests/reasoning-efficiency-guardrails.test.js tests/deepseek-v4-runtime-guardrails.test.js tests/upstream-contribution-engine.test.js tests/proactive-agent-eval-guardrails.test.js tests/reward-hacking-guardrails.test.js tests/chatgpt-ads-readiness-pack.test.js tests/oss-pr-opportunity-scout.test.js tests/agent-design-governance.test.js tests/gemini-embedding-policy.test.js tests/openclaw-agent-governance-kit.test.js",
  "test:public-static-assets": "node --test tests/public-static-assets.test.js",
  "test:token-savings": "node --test tests/token-savings.test.js",
  "test:numbers-page": "node --test tests/numbers-page.test.js",
@@ -676,7 +695,7 @@
  "node": ">=18.18.0"
  },
  "dependencies": {
- "@anthropic-ai/sdk": "0.92.0",
+ "@anthropic-ai/sdk": "0.95.2",
  "@google/genai": "1.49.0",
  "@huggingface/transformers": "^4.2.0",
  "@lancedb/lancedb": "^0.27.2",
@@ -255,6 +255,12 @@
  <p><a href="/compare/agentix-labs" class="cta">Read ThumbGate vs Agentix Labs</a></p>
  </div>
 
+ <div class="card">
+ <h3>Already using a supply-chain scanner?</h3>
+ <p>HEIDI (Meterian) catches AI assistants suggesting vulnerable npm/pip/maven packages — the supply-chain half of AI coding safety. ThumbGate is the behavior half: it blocks the agent's tool call before it fires a known-bad pattern. Different threat surfaces, both local-first, both free at base. Run both.</p>
+ <p><a href="/compare/heidi" class="cta">Read ThumbGate vs HEIDI</a></p>
+ </div>
+
  <h2>How It Works</h2>
  <div class="step-grid">
  <div class="step-card">