thumbgate 1.17.0 → 1.19.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "thumbgate-marketplace",
3
- "version": "1.17.0",
3
+ "version": "1.19.0",
4
4
  "owner": {
5
5
  "name": "Igor Ganapolsky",
6
6
  "email": "ig5973700@gmail.com"
@@ -13,7 +13,7 @@
13
13
  "source": "npm",
14
14
  "package": "thumbgate"
15
15
  },
16
- "version": "1.17.0",
16
+ "version": "1.19.0",
17
17
  "author": {
18
18
  "name": "Igor Ganapolsky"
19
19
  },
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "thumbgate",
3
3
  "description": "Type 👍 or 👎 on any agent action. ThumbGate captures it, distills a lesson, and blocks the pattern from repeating. One thumbs-down = the agent physically cannot make that mistake again. 33 pre-action checks, budget enforcement, self-protection, and NIST/SOC2 compliance tags.",
4
- "version": "1.17.0",
4
+ "version": "1.19.0",
5
5
  "author": {
6
6
  "name": "Igor Ganapolsky"
7
7
  },
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "thumbgate",
3
- "version": "1.17.0",
3
+ "version": "1.19.0",
4
4
  "description": "ThumbGate — 👍👎 feedback that teaches your AI agent. Thumbs down a mistake, it never happens again.",
5
5
  "homepage": "https://thumbgate-production.up.railway.app",
6
6
  "transport": "stdio",
package/README.md CHANGED
@@ -8,7 +8,7 @@
8
8
 
9
9
  **Your AI coding bill has a leak.**
10
10
 
11
- **Stop paying $ for the same AI mistake.**
11
+ **Stop paying for the same AI mistake twice.**
12
12
 
13
13
  Every retry loop, every hallucinated import, every *"let me try a different approach"* — those are billable tokens on every LLM vendor's bill. Thumbs-down once; ThumbGate blocks that exact mistake on every future call. Across Claude Code, Cursor, Codex, Gemini, Amp, Cline, OpenCode — any MCP-compatible agent, forever.
14
14
 
@@ -24,6 +24,14 @@ Under the hood: your thumbs-down becomes one of your **Pre-Action Checks** that
24
24
 
25
25
  ---
26
26
 
27
+ > *"A better dashboard doesn't make the agents more reliable. The hard part isn't visibility. It's trust."*
28
+ >
29
+ > — **Rob May**, CEO & co-founder, Neurometric AI, quoted in [The New Stack](https://thenewstack.io/claude-code-agent-view/) on Anthropic's Claude Code Agent View (May 2026).
30
+ >
31
+ > ThumbGate is the open-source layer that makes the trust part real: PreToolUse gates, thumbs-down to rule, audit trail on every interception.
32
+
33
+ ---
34
+
27
35
  ## 🎬 90-second demo
28
36
 
29
37
  Watch the force-push scenario: agent tries to `git push --force`, one thumbs-down, next session it's blocked — zero tokens spent on the repeat.
@@ -107,13 +115,26 @@ ThumbGate operates as a 4-layer enforcement stack between your AI agent and your
107
115
  Your thumbs-up/down reactions are captured via MCP protocol, CLI, or the ChatGPT GPT surface. Each reaction is stored as a structured lesson with context, timestamp, and severity.
108
116
 
109
117
  ### Layer 2: Check Engine
110
- The check engine converts lessons into enforceable rules using pattern matching, semantic similarity (via LanceDB vectors), and Thompson Sampling for adaptive rule selection. Rules stay in local ThumbGate runtime state.
118
+ The check engine converts lessons into enforceable rules. **The runtime gate decision is deterministic** literal pattern match AST match → scoped rule lookup. No LLM call on the enforcement path.
119
+
120
+ Where retrieval is needed (an agent is about to run a destructive command not on the literal block list, but semantically similar to one we've blocked before), ThumbGate uses local CPU-only `bge-small` embeddings via LanceDB's built-in pipeline. No external API call, no inference cost beyond CPU. So **"no LLM in enforcement"** holds: the gate decision uses no LLM; the rule corpus is just searchable via local embeddings.
121
+
122
+ **Thompson Sampling tunes per-rule confidence weights** for soft-gating rules so high-noise rules quiet down and high-signal rules sharpen. It never decides *whether* a rule fires — a hard rule like "block `git push --force` on main" always fires deterministically. Bandit exploration would be terrifying for hard rules; we don't do it.
123
+
124
+ Rules stay in local ThumbGate runtime state.
111
125
 
112
126
  ### Layer 3: Pre-Action Interception
113
127
  Before any agent action executes, ThumbGate's `PreToolUse` hook intercepts the command and evaluates it against all active checks. This happens at the MCP protocol level — the agent physically cannot bypass it.
114
128
 
115
- ### Layer 4: Multi-Agent Distribution
116
- Checks are distributed across all connected agents via MCP stdio protocol. One correction in Claude Code protects Cursor, Codex, Gemini CLI, Cline, and any MCP-compatible agent.
129
+ ### Layer 4: Multi-Agent Distribution (the actual moat vs hand-rolled hooks)
130
+ Claude Code already ships `permissions.deny` and `PreToolUse` hooks. Cursor and Codex have their own. So why ThumbGate over a hand-written hook?
131
+
132
+ Two things hand-written hooks structurally cannot do:
133
+
134
+ 1. **Cross-agent propagation.** A `permissions.deny` pattern lives in one agent's config and stays there. ThumbGate's checks distribute across every connected agent over MCP stdio — thumbs-down once in Cursor, the same pattern blocks on Claude Code, Codex, Gemini CLI, Cline, OpenCode, Amp in the next session, no copy-paste between configs.
135
+ 2. **Learning loop.** A hand-written hook covers exactly the patterns you wrote. ThumbGate promotes every thumbs-down into a fresh rule, tunes existing rules' confidence weights from outcomes (Thompson Sampling, see Layer 2), and pulls semantically-near patterns into scope via local embeddings. The rule corpus sharpens without an operator hand-writing a regex for every new mistake shape.
136
+
137
+ Hand-rolled hooks are the right tool for a small, static denylist you maintain by hand. ThumbGate is the right tool when you want corrections from any agent to harden every agent automatically.
117
138
 
118
139
  Prompt engineering still matters, but it is only the starting point. ThumbGate adds prompt evaluation on top: proof lanes, benchmarks, and self-heal checks tell you whether your prompt and workflow actually held up under execution instead of leaving you to guess from vibes. Run `npx thumbgate eval --from-feedback --write-report=.thumbgate/prompt-eval-proof.md` to turn real thumbs-up/down feedback into reusable eval cases and a buyer-ready proof report.
119
140
 
@@ -423,6 +444,7 @@ Pro ($19/mo or $149/yr) removes the rule cap and adds history-aware lesson recal
423
444
 
424
445
  ## Docs
425
446
 
447
+ - [**ThumbGate for Federal Agencies**](docs/FEDERAL.md) — pilot-ready posture, NIST 800-53 control mapping, OMB M-24-10 / EO 14110 alignment, ThumbGate-Core gov deployment mode, public/Core boundary invariants. Landing page: [thumbgate.ai/federal](https://thumbgate-production.up.railway.app/federal).
426
448
  - [First Dollar Playbook](docs/FIRST_DOLLAR_PLAYBOOK.md) — turning one painful workflow into the next booked pilot
427
449
  - [Commercial Truth](docs/COMMERCIAL_TRUTH.md) — pricing, claims, what we don't say
428
450
  - [Changeset Strategy](docs/CHANGESET_STRATEGY.md) — release notes and version bump enforcement
@@ -2,13 +2,13 @@
2
2
  "mcpServers": {
3
3
  "thumbgate": {
4
4
  "command": "npx",
5
- "args": ["--yes", "--package", "thumbgate@1.17.0", "thumbgate", "serve"]
5
+ "args": ["--yes", "--package", "thumbgate@1.19.0", "thumbgate", "serve"]
6
6
  }
7
7
  },
8
8
  "hooks": {
9
9
  "preToolUse": {
10
10
  "command": "npx",
11
- "args": ["--yes", "--package", "thumbgate@1.17.0", "thumbgate", "gate-check"]
11
+ "args": ["--yes", "--package", "thumbgate@1.19.0", "thumbgate", "gate-check"]
12
12
  }
13
13
  }
14
14
  }
@@ -216,7 +216,7 @@ const {
216
216
  finalizeSession: finalizeFeedbackSession,
217
217
  } = require('../../scripts/feedback-session');
218
218
 
219
- const SERVER_INFO = { name: 'thumbgate-mcp', version: '1.17.0' };
219
+ const SERVER_INFO = { name: 'thumbgate-mcp', version: '1.19.0' };
220
220
  const COMMERCE_CATEGORIES = [
221
221
  'product_recommendation',
222
222
  'brand_compliance',
@@ -7,7 +7,7 @@
7
7
  "npx",
8
8
  "--yes",
9
9
  "--package",
10
- "thumbgate@1.17.0",
10
+ "thumbgate@1.19.0",
11
11
  "thumbgate",
12
12
  "serve"
13
13
  ],
@@ -76,9 +76,40 @@
76
76
  "medianLatencyMs",
77
77
  "costPerAnalysisUsd"
78
78
  ]
79
+ },
80
+ "tokenizer-brittleness": {
81
+ "label": "Tokenizer brittleness and byte-level robustness",
82
+ "summary": "Evaluate models for malformed JSONL, Unicode confusables, stack traces, secrets, SQL snippets, file paths, and code-symbol-heavy inputs before routing log, code, or security workloads.",
83
+ "desiredStrengths": ["tokenizer-free", "byte-level", "log-robustness", "code-symbols", "security-scanning", "fast-inference"],
84
+ "targetContextWindow": 64000,
85
+ "benchmarkCommands": [
86
+ "npx thumbgate model-candidates --workload=tokenizer-brittleness --json",
87
+ "node --test tests/model-candidates.test.js --test-name-pattern=tokenizer",
88
+ "node scripts/gate-eval.js run"
89
+ ],
90
+ "metrics": [
91
+ "caseCoverage",
92
+ "symbolPreservationRate",
93
+ "secretDetectionRecall",
94
+ "jsonlRepairAccuracy",
95
+ "medianLatencyMs",
96
+ "memoryBandwidthEstimate"
97
+ ]
79
98
  }
80
99
  },
81
100
  "candidates": [
101
+ {
102
+ "id": "research/fast-byte-latent-transformer",
103
+ "vendor": "Meta + Stanford + University of Washington",
104
+ "family": "blt",
105
+ "provider": "research",
106
+ "model": "fast-byte-latent-transformer",
107
+ "contextWindow": 64000,
108
+ "costClass": "medium",
109
+ "researchOnly": true,
110
+ "strengths": ["tokenizer-free", "byte-level", "log-robustness", "code-symbols", "security-scanning", "fast-inference"],
111
+ "notes": "Research-only candidate inspired by Fast BLT. Use as an evaluation target for tokenizer-free robustness and memory-bandwidth planning; do not route production traffic until a maintained runtime and model weights exist."
112
+ },
82
113
  {
83
114
  "id": "self-hosted/deepseek-v4-flash-sglang",
84
115
  "vendor": "DeepSeek",
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "thumbgate",
3
- "version": "1.17.0",
3
+ "version": "1.19.0",
4
4
  "description": "ThumbGate self-improving agent governance: thumbs-up/down turns every mistake into a prevention rule and blocks repeat patterns. 33 pre-action checks, budget enforcement, and self-protection for Claude Code, Cursor, Codex, Gemini CLI, and Amp.",
5
5
  "homepage": "https://thumbgate-production.up.railway.app",
6
6
  "repository": {
@@ -39,6 +39,7 @@
39
39
  "public/codex-plugin.html",
40
40
  "public/compare.html",
41
41
  "public/dashboard.html",
42
+ "public/federal.html",
42
43
  "public/guide.html",
43
44
  "public/index.html",
44
45
  "public/learn.html",
@@ -46,6 +47,7 @@
46
47
  "public/numbers.html",
47
48
  "public/pro.html",
48
49
  "scripts/access-anomaly-detector.js",
50
+ "scripts/activation-tracker.js",
49
51
  "scripts/agent-audit-trace.js",
50
52
  "scripts/agent-design-governance.js",
51
53
  "scripts/agent-memory-lifecycle.js",
@@ -110,6 +112,7 @@
110
112
  "scripts/feedback-loop.js",
111
113
  "scripts/feedback-paths.js",
112
114
  "scripts/feedback-quality.js",
115
+ "scripts/feedback_quality_eval.py",
113
116
  "scripts/feedback-schema.js",
114
117
  "scripts/feedback-session.js",
115
118
  "scripts/feedback-to-rules.js",
@@ -151,6 +154,7 @@
151
154
  "scripts/mcp-policy.js",
152
155
  "scripts/mcp-transport-strategy.js",
153
156
  "scripts/memory-firewall.js",
157
+ "scripts/memory-scope-readiness.js",
154
158
  "scripts/meta-agent-loop.js",
155
159
  "scripts/model-access-eligibility.js",
156
160
  "scripts/model-migration-readiness.js",
@@ -162,6 +166,7 @@
162
166
  "scripts/otel-declarative-config.js",
163
167
  "scripts/operational-integrity.js",
164
168
  "scripts/perplexity-client.js",
169
+ "scripts/plausible-server-events.js",
165
170
  "scripts/pr-manager.js",
166
171
  "scripts/private-core-boundary.js",
167
172
  "scripts/pro-local-dashboard.js",
@@ -322,7 +327,16 @@
322
327
  "social:prospect:bluesky": "node scripts/social-bluesky-prospecting.js",
323
328
  "social:prospect:bluesky:dry": "node scripts/social-bluesky-prospecting.js --dry-run",
324
329
  "social:reply-publish:bluesky:dry": "node scripts/social-reply-monitor-bluesky.js --publish-approved --dry-run",
325
- "test": "npm run test:schema && npm run test:loop && npm run test:dpo && npm run test:kto && npm run test:api && npm run test:proof && npm run test:e2e && npm run test:rlaif && npm run test:attribution && npm run test:quality && npm run test:intelligence && npm run test:training-export && npm run test:deployment && npm run test:operational-integrity && npm run test:workflow && npm run test:billing && npm run test:cli && npm run test:watcher && npm run test:autoresearch && npm run test:ops && npm run test:session-analyzer && npm run test:tessl && npm run test:gates && npm run test:evoskill && npm run test:gates-hardening && npm run test:workers && npm run test:social-analytics && npm run test:memalign && npm run test:xmemory-lite && npm run test:filesystem-search && npm run test:zernio && npm run test:platform-limits && npm run test:post-video && npm run test:post-everywhere-instagram && npm run test:post-everywhere-channels && npm run test:post-everywhere-zernio-default && npm run test:zernio-canonical-pollers && npm run test:zernio-status && npm run test:obsidian-export && npm run test:lesson-db && npm run test:lesson-rotation && npm run test:memory-dedup && npm run test:feedback-quality && npm run test:sync-version && npm run test:check-congruence && npm run test:tool-registry && npm run test:feedback-to-rules && npm run test:memory-firewall && npm run test:belief-update && npm run test:hosted-config && npm run test:operational-summary && npm run test:operational-dashboard && npm run test:operator-artifacts && npm run test:operator-key-auth && npm run test:cloudflare-sandbox && npm run test:mcp-config && npm run test:plan-gate && npm run test:pulse && npm run test:semantic-layer && npm run test:data-pipeline && npm run test:optimize-context && npm run test:principle-extractor && npm run test:analytics-window && npm run test:funnel-analytics && npm run test:experiment-tracker && npm run test:build-metadata && npm run test:context-engine && npm run test:hf-papers && npm run test:marketing-experiment && npm run test:seo-gsd && npm run test:verify-run && npm run test:export-dpo-pairs && npm run test:export-hf-dataset && npm run test:license && npm run test:bot-detector && npm run test:postinstall && npm run test:funnel-invariants && npm run test:cli-telemetry && npm run test:pro-parity && npm run test:model-tier-router && npm run test:computer-use-firewall && npm run test:skill-exporter && npm run test:statusline && npm run test:evolution && npm run test:org-dashboard && npm run test:multi-hop-recall && npm run test:synthetic-dpo && npm run test:thumbgate-skill && npm run test:learn-hub && npm run test:feedback-fallback && npm run test:metaclaw && npm run test:server-lock && npm run test:control-tower && npm run test:pii-scanner && npm run test:data-governance && npm run test:lesson-inference && npm run test:semantic-dedup && npm run test:fs-utils && npm run test:cli-schema && npm run test:explore && npm run test:lesson-reranker && npm run test:lesson-retrieval && npm run test:cross-encoder && npm run test:reflector-agent && npm run test:feedback-session && npm run test:feedback-history-distiller && npm run test:hallucination-detector && npm run test:history-distiller && npm run test:predictive-insights && npm run test:prove-predictive-insights && npm run test:statusbar-cli && npm run test:generate-instagram-card && npm run test:instagram-thumbgate-post && npm run test:publish-instagram-thumbgate && npm run test:lesson-synthesis && npm run test:lesson-canonical && npm run test:background-governance && npm run test:memory-migration && npm run test:prompt-dlp && npm run test:ephemeral-store && npm run test:agent-security && npm run test:skill-progressive && npm run test:per-step-scoring && npm run test:weekly-auto-post && npm run test:social-post-hourly && npm run test:social-quality-gate && npm run test:a2ui-engine && npm run test:gate-satisfy && npm run test:money-watcher && npm run test:budget && npm run test:quick-start && npm run test:utm && npm run test:product-feedback && npm run test:feedback-root-consolidator && npm run test:engagement-audit && npm run test:install-growth-automation && npm run test:publish-thumbgate-launch && npm run test:community-course-platform-launch-kit && npm run test:reconcile-thumbgate-campaign && npm run test:reddit-publisher && npm run test:schedule-thumbgate-campaign && npm run test:social-reply-monitor && npm run test:social-dedupe-cleanup && npm run test:sync-launch-assets && npm run test:ai-search-visibility && npm run test:perplexity && npm run test:security-scanner && npm run test:llm-client && npm run test:managed-lesson-agent && npm run test:self-distill && npm run test:meta-agent && npm run test:harness-selector && npm run test:thumbgate-bench && npm run test:seo-guides && npm run test:enforcement-loop && npm run test:cli-agent-experience && npm run test:bot-detection && npm run test:checkout-bot-guard && npm run test:session-health && npm run test:session-episodes && npm run test:spec-gate && npm run test:decision-trace && npm run test:dashboard-insights && npm run test:telemetry-tracked-link-slug && npm run test:prompt-eval && npm run test:demo-voiceover && npm run test:gate-coherence && npm run test:gate-eval && npm run test:high-roi && npm run test:public-static-assets && npm run test:token-savings && npm run test:numbers-page && npm run test:workflow-gate-checkpoint && npm run test:lesson-export-import && npm run test:landing-page-claims && npm run test:competitive-positioning-marketing && npm run test:medium-weekly && npm run test:dashboard-deeplink-e2e && npm run test:public-package-parity && npm run test:token-savings-dashboard && npm run test:cursor-wiring && npm run test:pretooluse-injection && npm run test:recent-corrective-context && npm run test:durability-step && npm run test:mailer && npm run test:brand-assets && npm run test:enforcement-teeth && npm run test:bayes-optimal-gate && npm run test:swarm-coordinator && npm run test:session-report && npm run test:agent-reasoning-traces && npm run test:judge-reward && npm run test:llm-behavior-monitor && npm run test:prompting-os && npm run test:single-use-credential-gate && npm run test:structured-prompt-driven && npm run test:require-evidence-gate && npm run test:rule-validator && npm run test:bluesky-atproto && npm run test:social-reply-monitor-bluesky && npm run test:bluesky-delete-replies && npm run test:architect-kit-memory-bridge && npm run test:sonar-review-hotspots && npm run test:actionable-remediations && npm run test:gemini-embedding-policy && npm run test:agent-design-governance && npm run test:public-core-boundary",
330
+ "test:python": "python3 -m pytest tests/*.py",
331
+ "test": "npm run test:python && npm run test:schema && npm run test:loop && npm run test:dpo && npm run test:kto && npm run test:api && npm run test:proof && npm run test:e2e && npm run test:rlaif && npm run test:attribution && npm run test:quality && npm run test:intelligence && npm run test:training-export && npm run test:deployment && npm run test:operational-integrity && npm run test:workflow && npm run test:billing && npm run test:cli && npm run test:watcher && npm run test:autoresearch && npm run test:ops && npm run test:session-analyzer && npm run test:tessl && npm run test:gates && npm run test:evoskill && npm run test:gates-hardening && npm run test:workers && npm run test:social-analytics && npm run test:memalign && npm run test:xmemory-lite && npm run test:filesystem-search && npm run test:zernio && npm run test:platform-limits && npm run test:post-video && npm run test:post-everywhere-instagram && npm run test:post-everywhere-channels && npm run test:post-everywhere-zernio-default && npm run test:zernio-canonical-pollers && npm run test:zernio-status && npm run test:obsidian-export && npm run test:lesson-db && npm run test:lesson-rotation && npm run test:memory-dedup && npm run test:feedback-quality && npm run test:sync-version && npm run test:check-congruence && npm run test:tool-registry && npm run test:feedback-to-rules && npm run test:memory-firewall && npm run test:memory-scope-readiness && npm run test:belief-update && npm run test:hosted-config && npm run test:operational-summary && npm run test:operational-dashboard && npm run test:operator-artifacts && npm run test:operator-key-auth && npm run test:cloudflare-sandbox && npm run test:mcp-config && npm run test:plan-gate && npm run test:pulse && npm run test:semantic-layer && npm run test:data-pipeline && npm run test:optimize-context && npm run test:principle-extractor && npm run test:analytics-window && npm run test:funnel-analytics && npm run test:experiment-tracker && npm run test:build-metadata && npm run test:context-engine && npm run test:hf-papers && npm run test:marketing-experiment && npm run test:seo-gsd && npm run test:verify-run && npm run test:export-dpo-pairs && npm run test:export-hf-dataset && npm run test:license && npm run test:bot-detector && npm run test:audit-pr-bot-contamination && npm run test:stripe-bootstrap-saas-catalog && npm run test:postinstall && npm run test:funnel-invariants && npm run test:cli-telemetry && npm run test:pro-parity && npm run test:model-tier-router && npm run test:computer-use-firewall && npm run test:skill-exporter && npm run test:statusline && npm run test:evolution && npm run test:org-dashboard && npm run test:multi-hop-recall && npm run test:synthetic-dpo && npm run test:thumbgate-skill && npm run test:learn-hub && npm run test:feedback-fallback && npm run test:metaclaw && npm run test:server-lock && npm run test:control-tower && npm run test:pii-scanner && npm run test:data-governance && npm run test:lesson-inference && npm run test:semantic-dedup && npm run test:fs-utils && npm run test:cli-schema && npm run test:explore && npm run test:lesson-reranker && npm run test:lesson-retrieval && npm run test:cross-encoder && npm run test:reflector-agent && npm run test:feedback-session && npm run test:feedback-history-distiller && npm run test:hallucination-detector && npm run test:history-distiller && npm run test:predictive-insights && npm run test:prove-predictive-insights && npm run test:statusbar-cli && npm run test:generate-instagram-card && npm run test:instagram-thumbgate-post && npm run test:publish-instagram-thumbgate && npm run test:lesson-synthesis && npm run test:lesson-canonical && npm run test:background-governance && npm run test:memory-migration && npm run test:prompt-dlp && npm run test:ephemeral-store && npm run test:agent-security && npm run test:skill-progressive && npm run test:per-step-scoring && npm run test:weekly-auto-post && npm run test:social-post-hourly && npm run test:social-quality-gate && npm run test:a2ui-engine && npm run test:gate-satisfy && npm run test:money-watcher && npm run test:budget && npm run test:quick-start && npm run test:utm && npm run test:product-feedback && npm run test:feedback-root-consolidator && npm run test:engagement-audit && npm run test:install-growth-automation && npm run test:publish-thumbgate-launch && npm run test:community-course-platform-launch-kit && npm run test:reconcile-thumbgate-campaign && npm run test:reddit-publisher && npm run test:schedule-thumbgate-campaign && npm run test:social-reply-monitor && npm run test:social-dedupe-cleanup && npm run test:sync-launch-assets && npm run test:ai-search-visibility && npm run test:perplexity && npm run test:security-scanner && npm run test:llm-client && npm run test:managed-lesson-agent && npm run test:self-distill && npm run test:meta-agent && npm run test:harness-selector && npm run test:thumbgate-bench && npm run test:seo-guides && npm run test:enforcement-loop && npm run test:cli-agent-experience && npm run test:bot-detection && npm run test:checkout-bot-guard && npm run test:checkout-pro-confirmation-gate && npm run test:session-health && npm run test:session-episodes && npm run test:spec-gate && npm run test:decision-trace && npm run test:dashboard-insights && npm run test:telemetry-tracked-link-slug && npm run test:prompt-eval && npm run test:demo-voiceover && npm run test:gate-coherence && npm run test:gate-eval && npm run test:high-roi && npm run test:public-static-assets && npm run test:token-savings && npm run test:numbers-page && npm run test:workflow-gate-checkpoint && npm run test:lesson-export-import && npm run test:landing-page-claims && npm run test:competitive-positioning-marketing && npm run test:medium-weekly && npm run test:dashboard-deeplink-e2e && npm run test:public-package-parity && npm run test:token-savings-dashboard && npm run test:cursor-wiring && npm run test:pretooluse-injection && npm run test:recent-corrective-context && npm run test:durability-step && npm run test:mailer && npm run test:brand-assets && npm run test:enforcement-teeth && npm run test:bayes-optimal-gate && npm run test:swarm-coordinator && npm run test:session-report && npm run test:agent-reasoning-traces && npm run test:judge-reward && npm run test:llm-behavior-monitor && npm run test:prompting-os && npm run test:single-use-credential-gate && npm run test:structured-prompt-driven && npm run test:require-evidence-gate && npm run test:rule-validator && npm run test:bluesky-atproto && npm run test:social-reply-monitor-bluesky && npm run test:bluesky-delete-replies && npm run test:architect-kit-memory-bridge && npm run test:sonar-review-hotspots && npm run test:actionable-remediations && npm run test:gemini-embedding-policy && npm run test:agent-design-governance && npm run test:public-core-boundary && npm run test:hook-stop-verify-deploy && npm run test:hook-stop-anti-claim && npm run test:plausible-server-events && npm run test:activation-tracker && npm run test:unified-revenue-rollup && npm run test:conversion-rate-stats && npm run test:external-customer-audit && npm run test:telemetry-export",
332
+ "test:hook-stop-verify-deploy": "node --test tests/hook-stop-verify-deploy.test.js",
333
+ "test:hook-stop-anti-claim": "node --test tests/hook-stop-anti-claim.test.js",
334
+ "test:plausible-server-events": "node --test tests/plausible-server-events.test.js",
335
+ "test:activation-tracker": "node --test tests/activation-tracker.test.js",
336
+ "test:unified-revenue-rollup": "node --test tests/unified-revenue-rollup.test.js",
337
+ "test:conversion-rate-stats": "node --test tests/conversion-rate-stats.test.js",
338
+ "test:external-customer-audit": "node --test tests/external-customer-audit.test.js",
339
+ "test:telemetry-export": "node --test tests/telemetry-export.test.js",
326
340
  "test:swarm-coordinator": "node --test tests/swarm-coordinator.test.js",
327
341
  "test:session-report": "node --test tests/session-report.test.js",
328
342
  "test:agent-reasoning-traces": "node --test tests/agent-reasoning-traces.test.js tests/agent-stack-survival-audit.test.js",
@@ -346,6 +360,8 @@
346
360
  "test:telemetry-tracked-link-slug": "node --test tests/telemetry-tracked-link-slug.test.js",
347
361
  "test:prompt-eval": "node --test tests/prompt-eval.test.js",
348
362
  "eval:feedback": "node scripts/prompt-eval.js --from-feedback",
363
+ "eval:feedback-quality": "python3 scripts/feedback_quality_eval.py",
364
+ "eval:classifier": "python3 scripts/eval_gate_classifier.py",
349
365
  "test:decision-trace": "node --test tests/decision-trace.test.js",
350
366
  "test:feedback-fallback": "node --test tests/feedback-fallback.test.js",
351
367
  "test:metaclaw": "node --test tests/metaclaw-features.test.js",
@@ -365,6 +381,7 @@
365
381
  "test:learn-hub": "node --test tests/learn-hub.test.js",
366
382
  "test:feedback-to-rules": "node --test tests/feedback-to-rules.test.js",
367
383
  "test:memory-firewall": "node --test tests/memory-firewall.test.js",
384
+ "test:memory-scope-readiness": "node --test tests/memory-scope-readiness.test.js",
368
385
  "test:belief-update": "node --test tests/belief-update.test.js",
369
386
  "test:hosted-config": "node --test tests/hosted-config.test.js",
370
387
  "test:operational-summary": "node --test tests/operational-summary.test.js",
@@ -403,10 +420,10 @@
403
420
  "test:kto": "node --test tests/export-kto.test.js",
404
421
  "test:api": "node --test --test-concurrency=1 tests/api-server.test.js tests/api-events-sse.test.js tests/api-auth-config.test.js tests/mcp-server.test.js tests/adapters.test.js tests/openapi-parity.test.js tests/budget-guard.test.js tests/context-manager.test.js tests/contextfs.test.js tests/job-api.test.js tests/pack-templates.test.js tests/dashboard.test.js tests/dashboard-render-spec.test.js tests/dashboard-html.test.js tests/agent-readiness.test.js tests/mcp-policy.test.js tests/subagent-profiles.test.js tests/intent-router.test.js tests/internal-agent-bootstrap.test.js tests/lesson-search.test.js tests/thumbgate-search.test.js tests/document-intake.test.js tests/rubric-engine.test.js tests/self-healing-check.test.js tests/self-heal.test.js tests/feedback-schema.test.js tests/thompson-sampling.test.js tests/feedback-sequences.test.js tests/diversity-tracking.test.js tests/vector-store.test.js tests/gemini-embedding-policy.test.js tests/feedback-attribution.test.js tests/hybrid-feedback-context.test.js tests/loop-closure.test.js tests/code-reasoning.test.js tests/feedback-loop.test.js tests/feedback-inbox-read.test.js tests/feedback-to-memory.test.js tests/test-coverage.test.js tests/version-metadata.test.js tests/claude-mcpb.test.js tests/claude-codex-bridge.test.js tests/cursor-plugin.test.js tests/codex-plugin.test.js tests/ide-marketplace-extensions.test.js tests/telemetry-analytics.test.js tests/public-landing.test.js tests/lessons-page.test.js tests/pro-landing.test.js tests/local-model-profile.test.js tests/risk-scorer.test.js tests/context-compaction.test.js tests/reminder-engine.test.js tests/post-to-x.test.js tests/verification-loop.test.js tests/async-job-runner.test.js tests/commerce-quality.test.js tests/recall-limit.test.js tests/problem-detail.test.js tests/natural-language-harness.test.js tests/settings-hierarchy.test.js",
405
422
  "test:proof": "node --test tests/prove-adapters.test.js tests/prove-attribution.test.js tests/prove-cloudflare-sandbox.test.js tests/prove-data-quality.test.js tests/prove-intelligence.test.js tests/prove-lancedb.test.js tests/prove-loop-closure.test.js tests/prove-training-export.test.js tests/prove-local-intelligence.test.js tests/prove-workflow-contract.test.js tests/prove-autoresearch.test.js tests/prove-claim-verification.test.js tests/prove-data-pipeline.test.js tests/prove-evolution.test.js tests/prove-harnesses.test.js tests/prove-packaged-runtime.test.js tests/prove-runtime.test.js tests/prove-seo-gsd.test.js tests/prove-settings.test.js tests/prove-xmemory.test.js && node --test tests/prove-automation.test.js",
406
- "test:e2e": "node --test tests/e2e-pipeline.test.js tests/e2e-product-flows.test.js tests/e2e-coverage-contract.test.js",
423
+ "test:e2e": "node --test tests/e2e-pipeline.test.js tests/e2e-product-flows.test.js tests/e2e-coverage-contract.test.js tests/interaction-model-e2e.test.js",
407
424
  "test:rlaif": "node --test tests/rlaif-self-audit.test.js tests/dpo-optimizer.test.js tests/meta-policy.test.js tests/agent-reward-model.test.js",
408
425
  "test:attribution": "node --test tests/feedback-attribution.test.js tests/hybrid-feedback-context.test.js",
409
- "test:quality": "node --test tests/validate-feedback.test.js",
426
+ "test:quality": "node --test tests/validate-feedback.test.js tests/feedback-quality-eval-python.test.js tests/eval-gate-classifier.test.js",
410
427
  "test:intelligence": "node --test tests/intelligence.test.js",
411
428
  "test:training-export": "node --test tests/training-export.test.js tests/databricks-export.test.js",
412
429
  "test:deployment": "node --test tests/deployment.test.js tests/deploy-policy.test.js tests/publish-decision.test.js tests/changeset-check.test.js tests/release-notes.test.js tests/sonarcloud-workflow.test.js tests/package-boundary.test.js tests/public-package-boundary.test.js tests/revenue-observability-workflow.test.js",
@@ -486,8 +503,11 @@
486
503
  "test:zernio-status": "node --test tests/zernio-status.test.js",
487
504
  "test:license": "node --test tests/license.test.js",
488
505
  "test:bot-detector": "node --test tests/bot-detector.test.js",
506
+ "test:audit-pr-bot-contamination": "node --test tests/audit-pr-bot-contamination.test.js",
507
+ "test:stripe-bootstrap-saas-catalog": "node --test tests/stripe-bootstrap-saas-catalog.test.js",
489
508
  "test:bot-detection": "node --test tests/bot-detection.test.js",
490
509
  "test:checkout-bot-guard": "node --test tests/checkout-bot-guard.test.js",
510
+ "test:checkout-pro-confirmation-gate": "node --test tests/checkout-pro-confirmation-gate.test.js",
491
511
  "test:postinstall": "node --test tests/postinstall.test.js",
492
512
  "test:funnel-invariants": "node --test tests/funnel-invariants.test.js",
493
513
  "test:cli-telemetry": "node --test tests/cli-telemetry.test.js",
@@ -597,7 +617,8 @@
597
617
  "test:gate-eval": "node --test tests/gate-eval.test.js",
598
618
  "gate-eval:ci": "node scripts/gate-eval.js run",
599
619
  "test:ai-engineering-stack-guardrails": "node --test tests/ai-engineering-stack-guardrails.test.js",
600
- "test:high-roi": "node --test tests/high-roi.test.js tests/model-candidates.test.js tests/autonomous-workflow.test.js tests/high-roi-agent-workflows.test.js tests/code-graph-guardrails.test.js tests/proxy-pointer-rag-guardrails.test.js tests/rag-precision-guardrails.test.js tests/ai-engineering-stack-guardrails.test.js tests/long-running-agent-context-guardrails.test.js tests/reasoning-efficiency-guardrails.test.js tests/deepseek-v4-runtime-guardrails.test.js tests/upstream-contribution-engine.test.js tests/proactive-agent-eval-guardrails.test.js tests/reward-hacking-guardrails.test.js tests/chatgpt-ads-readiness-pack.test.js tests/oss-pr-opportunity-scout.test.js tests/agent-design-governance.test.js tests/gemini-embedding-policy.test.js tests/openclaw-agent-governance-kit.test.js",
620
+ "test:interaction-model": "node --test tests/interaction-model.test.js tests/interaction-model-e2e.test.js",
621
+ "test:high-roi": "node --test tests/high-roi.test.js tests/model-candidates.test.js tests/autonomous-workflow.test.js tests/high-roi-agent-workflows.test.js tests/interaction-model.test.js tests/interaction-model-e2e.test.js tests/code-graph-guardrails.test.js tests/proxy-pointer-rag-guardrails.test.js tests/rag-precision-guardrails.test.js tests/ai-engineering-stack-guardrails.test.js tests/long-running-agent-context-guardrails.test.js tests/reasoning-efficiency-guardrails.test.js tests/deepseek-v4-runtime-guardrails.test.js tests/upstream-contribution-engine.test.js tests/proactive-agent-eval-guardrails.test.js tests/reward-hacking-guardrails.test.js tests/chatgpt-ads-readiness-pack.test.js tests/oss-pr-opportunity-scout.test.js tests/agent-design-governance.test.js tests/gemini-embedding-policy.test.js tests/openclaw-agent-governance-kit.test.js",
601
622
  "test:public-static-assets": "node --test tests/public-static-assets.test.js",
602
623
  "test:token-savings": "node --test tests/token-savings.test.js",
603
624
  "test:numbers-page": "node --test tests/numbers-page.test.js",
@@ -674,9 +695,9 @@
674
695
  "node": ">=18.18.0"
675
696
  },
676
697
  "dependencies": {
677
- "@anthropic-ai/sdk": "0.92.0",
698
+ "@anthropic-ai/sdk": "0.95.2",
678
699
  "@google/genai": "1.49.0",
679
- "@huggingface/transformers": "^4.1.0",
700
+ "@huggingface/transformers": "^4.2.0",
680
701
  "@lancedb/lancedb": "^0.27.2",
681
702
  "apache-arrow": "^18.1.0",
682
703
  "better-sqlite3": "^12.9.0",
@@ -692,7 +713,7 @@
692
713
  },
693
714
  "mcpName": "io.github.IgorGanapolsky/thumbgate",
694
715
  "devDependencies": {
695
- "@changesets/changelog-github": "^0.6.0",
716
+ "@changesets/changelog-github": "^0.7.0",
696
717
  "@changesets/cli": "^2.31.0",
697
718
  "c8": "^11.0.0",
698
719
  "undici": "^8.2.0"
@@ -255,6 +255,12 @@
255
255
  <p><a href="/compare/agentix-labs" class="cta">Read ThumbGate vs Agentix Labs</a></p>
256
256
  </div>
257
257
 
258
+ <div class="card">
259
+ <h3>Already using a supply-chain scanner?</h3>
260
+ <p>HEIDI (Meterian) catches AI assistants suggesting vulnerable npm/pip/maven packages — the supply-chain half of AI coding safety. ThumbGate is the behavior half: it blocks the agent's tool call before it fires a known-bad pattern. Different threat surfaces, both local-first, both free at base. Run both.</p>
261
+ <p><a href="/compare/heidi" class="cta">Read ThumbGate vs HEIDI</a></p>
262
+ </div>
263
+
258
264
  <h2>How It Works</h2>
259
265
  <div class="step-grid">
260
266
  <div class="step-card">