thumbgate 1.17.0 → 1.19.0
- package/.claude-plugin/marketplace.json +2 -2
- package/.claude-plugin/plugin.json +1 -1
- package/.well-known/mcp/server-card.json +1 -1
- package/README.md +26 -4
- package/adapters/claude/.mcp.json +2 -2
- package/adapters/mcp/server-stdio.js +1 -1
- package/adapters/opencode/opencode.json +1 -1
- package/config/model-candidates.json +31 -0
- package/package.json +29 -8
- package/public/compare.html +6 -0
- package/public/federal.html +375 -0
- package/public/guide.html +2 -2
- package/public/index.html +49 -19
- package/public/learn.html +28 -0
- package/public/numbers.html +2 -2
- package/public/pro.html +4 -4
- package/scripts/activation-tracker.js +127 -0
- package/scripts/auto-promote-gates.js +4 -1
- package/scripts/feedback-loop.js +14 -1
- package/scripts/feedback-to-rules.js +11 -1
- package/scripts/feedback_quality_eval.py +725 -0
- package/scripts/memory-scope-readiness.js +315 -0
- package/scripts/plausible-server-events.js +162 -0
- package/scripts/seo-gsd.js +75 -2
- package/scripts/statusline-links.js +2 -0
- package/scripts/telemetry-analytics.js +1 -0
- package/src/api/server.js +557 -12
@@ -1,6 +1,6 @@
  {
  "name": "thumbgate-marketplace",
- "version": "1.
+ "version": "1.19.0",
  "owner": {
  "name": "Igor Ganapolsky",
  "email": "ig5973700@gmail.com"
@@ -13,7 +13,7 @@
  "source": "npm",
  "package": "thumbgate"
  },
- "version": "1.
+ "version": "1.19.0",
  "author": {
  "name": "Igor Ganapolsky"
  },
@@ -1,7 +1,7 @@
  {
  "name": "thumbgate",
  "description": "Type 👍 or 👎 on any agent action. ThumbGate captures it, distills a lesson, and blocks the pattern from repeating. One thumbs-down = the agent physically cannot make that mistake again. 33 pre-action checks, budget enforcement, self-protection, and NIST/SOC2 compliance tags.",
- "version": "1.
+ "version": "1.19.0",
  "author": {
  "name": "Igor Ganapolsky"
  },
@@ -1,6 +1,6 @@
  {
  "name": "thumbgate",
- "version": "1.
+ "version": "1.19.0",
  "description": "ThumbGate — 👍👎 feedback that teaches your AI agent. Thumbs down a mistake, it never happens again.",
  "homepage": "https://thumbgate-production.up.railway.app",
  "transport": "stdio",
package/README.md
CHANGED
@@ -8,7 +8,7 @@

  **Your AI coding bill has a leak.**

- **Stop paying
+ **Stop paying for the same AI mistake twice.**

  Every retry loop, every hallucinated import, every *"let me try a different approach"* — those are billable tokens on every LLM vendor's bill. Thumbs-down once; ThumbGate blocks that exact mistake on every future call. Across Claude Code, Cursor, Codex, Gemini, Amp, Cline, OpenCode — any MCP-compatible agent, forever.

@@ -24,6 +24,14 @@ Under the hood: your thumbs-down becomes one of your **Pre-Action Checks** that

  ---

+ > *"A better dashboard doesn't make the agents more reliable. The hard part isn't visibility. It's trust."*
+ >
+ > — **Rob May**, CEO & co-founder, Neurometric AI, quoted in [The New Stack](https://thenewstack.io/claude-code-agent-view/) on Anthropic's Claude Code Agent View (May 2026).
+ >
+ > ThumbGate is the open-source layer that makes the trust part real: PreToolUse gates, thumbs-down to rule, audit trail on every interception.
+
+ ---
+
  ## 🎬 90-second demo

  Watch the force-push scenario: agent tries to `git push --force`, one thumbs-down, next session it's blocked — zero tokens spent on the repeat.
@@ -107,13 +115,26 @@ ThumbGate operates as a 4-layer enforcement stack between your AI agent and your
  Your thumbs-up/down reactions are captured via MCP protocol, CLI, or the ChatGPT GPT surface. Each reaction is stored as a structured lesson with context, timestamp, and severity.

  ### Layer 2: Check Engine
- The check engine converts lessons into enforceable rules
+ The check engine converts lessons into enforceable rules. **The runtime gate decision is deterministic** — literal pattern match → AST match → scoped rule lookup. No LLM call on the enforcement path.
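The ordered, deterministic lookup that the added Layer 2 text describes might look roughly like the sketch below. It is not ThumbGate's code: `literalBlocks`, `scopedRules`, and `gateDecision` are hypothetical names, the AST stage is left out for brevity, and the only point is that a fixed stage order with first-hit-wins gives the same decision for the same input.

```javascript
// Hypothetical rule data; none of these names come from ThumbGate itself.
const literalBlocks = new Set(['git push --force']);
const scopedRules = [
  { scope: 'main', pattern: /^git\s+push\b.*--force\b/, action: 'block' },
];

// Evaluate stages in a fixed order; the first hit wins, so the same input
// always yields the same decision. The AST stage is omitted in this sketch.
function gateDecision(command, scope) {
  const normalized = command.trim().replace(/\s+/g, ' ');
  if (literalBlocks.has(normalized)) {
    return { action: 'block', stage: 'literal' };
  }
  const rule = scopedRules.find(
    (r) => r.scope === scope && r.pattern.test(normalized)
  );
  if (rule) {
    return { action: rule.action, stage: 'scoped-rule' };
  }
  return { action: 'allow', stage: 'none' };
}

console.log(gateDecision('git  push --force', 'main'));
console.log(gateDecision('git push origin main --force', 'main'));
console.log(gateDecision('git status', 'main'));
```

No randomness or model call appears anywhere on this path, which is the property the README text claims for enforcement.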
+
+ Where retrieval is needed (an agent is about to run a destructive command not on the literal block list, but semantically similar to one we've blocked before), ThumbGate uses local CPU-only `bge-small` embeddings via LanceDB's built-in pipeline. No external API call, no inference cost beyond CPU. So **"no LLM in enforcement"** holds: the gate decision uses no LLM; the rule corpus is just searchable via local embeddings.
+
+ **Thompson Sampling tunes per-rule confidence weights** for soft-gating rules so high-noise rules quiet down and high-signal rules sharpen. It never decides *whether* a rule fires — a hard rule like "block `git push --force` on main" always fires deterministically. Bandit exploration would be terrifying for hard rules; we don't do it.
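As a rough illustration of the Thompson Sampling idea in the added paragraph, the sketch below draws a confidence weight for a soft rule from a Beta posterior over its outcome counts and leaves hard rules untouched. The rule fields (`hard`, `helpful`, `noisy`) and both function names are assumptions made for this example, not ThumbGate's schema. The sampler relies on a standard fact: for positive integers a and b, the a-th smallest of a + b - 1 independent uniform draws is Beta(a, b) distributed.

```javascript
// Sample from Beta(a, b) for positive integers a and b using order statistics:
// the a-th smallest of (a + b - 1) independent uniforms follows Beta(a, b).
function sampleBeta(a, b, rng = Math.random) {
  const draws = Array.from({ length: a + b - 1 }, () => rng());
  draws.sort((x, y) => x - y);
  return draws[a - 1];
}

// Hypothetical rule record: `helpful` and `noisy` are outcome counts.
function confidenceWeight(rule, rng = Math.random) {
  if (rule.hard) return 1; // hard rules bypass sampling entirely
  // Beta(1, 1) prior updated with the observed outcomes.
  return sampleBeta(1 + rule.helpful, 1 + rule.noisy, rng);
}

const hardRule = { id: 'no-force-push', hard: true, helpful: 0, noisy: 50 };
const softRule = { id: 'prefer-small-diffs', hard: false, helpful: 8, noisy: 2 };

console.log(confidenceWeight(hardRule)); // always 1
console.log(confidenceWeight(softRule)); // a random weight between 0 and 1
```

A soft rule with many helpful outcomes tends to draw high weights, while a noisy one tends to draw low weights. The hard rule never enters the sampler, which matches the statement that sampling does not decide whether a hard rule fires.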
+
+ Rules stay in local ThumbGate runtime state.

  ### Layer 3: Pre-Action Interception
  Before any agent action executes, ThumbGate's `PreToolUse` hook intercepts the command and evaluates it against all active checks. This happens at the MCP protocol level — the agent physically cannot bypass it.

- ### Layer 4: Multi-Agent Distribution
-
+ ### Layer 4: Multi-Agent Distribution (the actual moat vs hand-rolled hooks)
+ Claude Code already ships `permissions.deny` and `PreToolUse` hooks. Cursor and Codex have their own. So why ThumbGate over a hand-written hook?
+
+ Two things hand-written hooks structurally cannot do:
+
+ 1. **Cross-agent propagation.** A `permissions.deny` pattern lives in one agent's config and stays there. ThumbGate's checks distribute across every connected agent over MCP stdio — thumbs-down once in Cursor, the same pattern blocks on Claude Code, Codex, Gemini CLI, Cline, OpenCode, Amp in the next session, no copy-paste between configs.
+ 2. **Learning loop.** A hand-written hook covers exactly the patterns you wrote. ThumbGate promotes every thumbs-down into a fresh rule, tunes existing rules' confidence weights from outcomes (Thompson Sampling, see Layer 2), and pulls semantically-near patterns into scope via local embeddings. The rule corpus sharpens without an operator hand-writing a regex for every new mistake shape.
+
+ Hand-rolled hooks are the right tool for a small, static denylist you maintain by hand. ThumbGate is the right tool when you want corrections from any agent to harden every agent automatically.

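The cross-agent propagation claim in the added Layer 4 text can be pictured with a small in-process sketch: a thumbs-down recorded by one agent becomes a rule in a shared store, and every other agent's pre-action check reads that same store. Everything here is assumed for illustration (`sharedRules`, `promoteFeedback`, `preActionCheck`, and the feedback shape), and a plain `Map` stands in for whatever ThumbGate actually distributes over MCP stdio.

```javascript
// Hypothetical shared store: one Map that every agent adapter reads.
const sharedRules = new Map();

// Collapse whitespace so trivially different spellings share one key.
function normalize(command) {
  return command.trim().replace(/\s+/g, ' ');
}

// Turn a thumbs-down into a rule keyed by the normalized command.
function promoteFeedback(feedback) {
  if (feedback.signal !== 'down') return null;
  const key = normalize(feedback.command);
  const rule = sharedRules.get(key) || { pattern: key, sources: [], hits: 0 };
  if (!rule.sources.includes(feedback.agent)) rule.sources.push(feedback.agent);
  sharedRules.set(key, rule);
  return rule;
}

// Any agent's pre-action check consults the same store.
function preActionCheck(agent, command) {
  const rule = sharedRules.get(normalize(command));
  if (!rule) return { agent, allowed: true };
  rule.hits += 1;
  return { agent, allowed: false, learnedFrom: rule.sources };
}

promoteFeedback({ agent: 'cursor', signal: 'down', command: 'rm -rf  build' });
console.log(preActionCheck('claude-code', 'rm -rf build'));
console.log(preActionCheck('codex', 'ls'));
```

The rule was learned from `cursor` but blocks the same command for `claude-code`, with no per-agent configuration copied by hand.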
  Prompt engineering still matters, but it is only the starting point. ThumbGate adds prompt evaluation on top: proof lanes, benchmarks, and self-heal checks tell you whether your prompt and workflow actually held up under execution instead of leaving you to guess from vibes. Run `npx thumbgate eval --from-feedback --write-report=.thumbgate/prompt-eval-proof.md` to turn real thumbs-up/down feedback into reusable eval cases and a buyer-ready proof report.

@@ -423,6 +444,7 @@ Pro ($19/mo or $149/yr) removes the rule cap and adds history-aware lesson recal

  ## Docs

+ - [**ThumbGate for Federal Agencies**](docs/FEDERAL.md) — pilot-ready posture, NIST 800-53 control mapping, OMB M-24-10 / EO 14110 alignment, ThumbGate-Core gov deployment mode, public/Core boundary invariants. Landing page: [thumbgate.ai/federal](https://thumbgate-production.up.railway.app/federal).
  - [First Dollar Playbook](docs/FIRST_DOLLAR_PLAYBOOK.md) — turning one painful workflow into the next booked pilot
  - [Commercial Truth](docs/COMMERCIAL_TRUTH.md) — pricing, claims, what we don't say
  - [Changeset Strategy](docs/CHANGESET_STRATEGY.md) — release notes and version bump enforcement
@@ -2,13 +2,13 @@
  "mcpServers": {
  "thumbgate": {
  "command": "npx",
- "args": ["--yes", "--package", "thumbgate@1.
+ "args": ["--yes", "--package", "thumbgate@1.19.0", "thumbgate", "serve"]
  }
  },
  "hooks": {
  "preToolUse": {
  "command": "npx",
- "args": ["--yes", "--package", "thumbgate@1.
+ "args": ["--yes", "--package", "thumbgate@1.19.0", "thumbgate", "gate-check"]
  }
  }
  }
@@ -216,7 +216,7 @@ const {
  finalizeSession: finalizeFeedbackSession,
  } = require('../../scripts/feedback-session');

- const SERVER_INFO = { name: 'thumbgate-mcp', version: '1.
+ const SERVER_INFO = { name: 'thumbgate-mcp', version: '1.19.0' };
  const COMMERCE_CATEGORIES = [
  'product_recommendation',
  'brand_compliance',
@@ -76,9 +76,40 @@
  "medianLatencyMs",
  "costPerAnalysisUsd"
  ]
+ },
+ "tokenizer-brittleness": {
+ "label": "Tokenizer brittleness and byte-level robustness",
+ "summary": "Evaluate models for malformed JSONL, Unicode confusables, stack traces, secrets, SQL snippets, file paths, and code-symbol-heavy inputs before routing log, code, or security workloads.",
+ "desiredStrengths": ["tokenizer-free", "byte-level", "log-robustness", "code-symbols", "security-scanning", "fast-inference"],
+ "targetContextWindow": 64000,
+ "benchmarkCommands": [
+ "npx thumbgate model-candidates --workload=tokenizer-brittleness --json",
+ "node --test tests/model-candidates.test.js --test-name-pattern=tokenizer",
+ "node scripts/gate-eval.js run"
+ ],
+ "metrics": [
+ "caseCoverage",
+ "symbolPreservationRate",
+ "secretDetectionRecall",
+ "jsonlRepairAccuracy",
+ "medianLatencyMs",
+ "memoryBandwidthEstimate"
+ ]
  }
  },
  "candidates": [
+ {
+ "id": "research/fast-byte-latent-transformer",
+ "vendor": "Meta + Stanford + University of Washington",
+ "family": "blt",
+ "provider": "research",
+ "model": "fast-byte-latent-transformer",
+ "contextWindow": 64000,
+ "costClass": "medium",
+ "researchOnly": true,
+ "strengths": ["tokenizer-free", "byte-level", "log-robustness", "code-symbols", "security-scanning", "fast-inference"],
+ "notes": "Research-only candidate inspired by Fast BLT. Use as an evaluation target for tokenizer-free robustness and memory-bandwidth planning; do not route production traffic until a maintained runtime and model weights exist."
+ },
  {
  "id": "self-hosted/deepseek-v4-flash-sglang",
  "vendor": "DeepSeek",
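One way to read the new workload and candidate entries together is as input to a simple matching step. The sketch below is an assumption about how such data could be consumed, not the behavior of `thumbgate model-candidates`: `rankCandidates` and its `allowResearch` option are hypothetical, and the second candidate is invented for contrast. It scores each candidate by the fraction of `desiredStrengths` it covers and, by default, drops entries flagged `researchOnly`, in line with the note about not routing production traffic to the research candidate.

```javascript
// Fields copied from the diff above, trimmed to what this sketch uses.
const workload = {
  desiredStrengths: ['tokenizer-free', 'byte-level', 'log-robustness',
    'code-symbols', 'security-scanning', 'fast-inference'],
  targetContextWindow: 64000,
};
const candidates = [
  {
    id: 'research/fast-byte-latent-transformer',
    contextWindow: 64000,
    researchOnly: true,
    strengths: ['tokenizer-free', 'byte-level', 'log-robustness',
      'code-symbols', 'security-scanning', 'fast-inference'],
  },
  // Hypothetical second entry, invented here for contrast.
  {
    id: 'example/general-model',
    contextWindow: 128000,
    researchOnly: false,
    strengths: ['fast-inference', 'code-symbols'],
  },
];

// Hypothetical scoring: fraction of desired strengths a candidate covers.
function rankCandidates(wl, list, { allowResearch = false } = {}) {
  return list
    .filter((c) => c.contextWindow >= wl.targetContextWindow)
    .filter((c) => allowResearch || !c.researchOnly)
    .map((c) => ({
      id: c.id,
      score: wl.desiredStrengths.filter((s) => c.strengths.includes(s)).length
        / wl.desiredStrengths.length,
    }))
    .sort((a, b) => b.score - a.score);
}

console.log(rankCandidates(workload, candidates));
console.log(rankCandidates(workload, candidates, { allowResearch: true }));
```

With the default options only the invented general model is returned. Passing `allowResearch: true` puts the research candidate first because it covers every desired strength.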
package/package.json
CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "thumbgate",
- "version": "1.
+ "version": "1.19.0",
  "description": "ThumbGate self-improving agent governance: thumbs-up/down turns every mistake into a prevention rule and blocks repeat patterns. 33 pre-action checks, budget enforcement, and self-protection for Claude Code, Cursor, Codex, Gemini CLI, and Amp.",
  "homepage": "https://thumbgate-production.up.railway.app",
  "repository": {
@@ -39,6 +39,7 @@
  "public/codex-plugin.html",
  "public/compare.html",
  "public/dashboard.html",
+ "public/federal.html",
  "public/guide.html",
  "public/index.html",
  "public/learn.html",
@@ -46,6 +47,7 @@
  "public/numbers.html",
  "public/pro.html",
  "scripts/access-anomaly-detector.js",
+ "scripts/activation-tracker.js",
  "scripts/agent-audit-trace.js",
  "scripts/agent-design-governance.js",
  "scripts/agent-memory-lifecycle.js",
@@ -110,6 +112,7 @@
  "scripts/feedback-loop.js",
  "scripts/feedback-paths.js",
  "scripts/feedback-quality.js",
+ "scripts/feedback_quality_eval.py",
  "scripts/feedback-schema.js",
  "scripts/feedback-session.js",
  "scripts/feedback-to-rules.js",
@@ -151,6 +154,7 @@
  "scripts/mcp-policy.js",
  "scripts/mcp-transport-strategy.js",
  "scripts/memory-firewall.js",
+ "scripts/memory-scope-readiness.js",
  "scripts/meta-agent-loop.js",
  "scripts/model-access-eligibility.js",
  "scripts/model-migration-readiness.js",
@@ -162,6 +166,7 @@
  "scripts/otel-declarative-config.js",
  "scripts/operational-integrity.js",
  "scripts/perplexity-client.js",
+ "scripts/plausible-server-events.js",
  "scripts/pr-manager.js",
  "scripts/private-core-boundary.js",
  "scripts/pro-local-dashboard.js",
@@ -322,7 +327,16 @@
  "social:prospect:bluesky": "node scripts/social-bluesky-prospecting.js",
  "social:prospect:bluesky:dry": "node scripts/social-bluesky-prospecting.js --dry-run",
  "social:reply-publish:bluesky:dry": "node scripts/social-reply-monitor-bluesky.js --publish-approved --dry-run",
- "test": "
+ "test:python": "python3 -m pytest tests/*.py",
|
331
|
+
"test": "npm run test:python && npm run test:schema && npm run test:loop && npm run test:dpo && npm run test:kto && npm run test:api && npm run test:proof && npm run test:e2e && npm run test:rlaif && npm run test:attribution && npm run test:quality && npm run test:intelligence && npm run test:training-export && npm run test:deployment && npm run test:operational-integrity && npm run test:workflow && npm run test:billing && npm run test:cli && npm run test:watcher && npm run test:autoresearch && npm run test:ops && npm run test:session-analyzer && npm run test:tessl && npm run test:gates && npm run test:evoskill && npm run test:gates-hardening && npm run test:workers && npm run test:social-analytics && npm run test:memalign && npm run test:xmemory-lite && npm run test:filesystem-search && npm run test:zernio && npm run test:platform-limits && npm run test:post-video && npm run test:post-everywhere-instagram && npm run test:post-everywhere-channels && npm run test:post-everywhere-zernio-default && npm run test:zernio-canonical-pollers && npm run test:zernio-status && npm run test:obsidian-export && npm run test:lesson-db && npm run test:lesson-rotation && npm run test:memory-dedup && npm run test:feedback-quality && npm run test:sync-version && npm run test:check-congruence && npm run test:tool-registry && npm run test:feedback-to-rules && npm run test:memory-firewall && npm run test:memory-scope-readiness && npm run test:belief-update && npm run test:hosted-config && npm run test:operational-summary && npm run test:operational-dashboard && npm run test:operator-artifacts && npm run test:operator-key-auth && npm run test:cloudflare-sandbox && npm run test:mcp-config && npm run test:plan-gate && npm run test:pulse && npm run test:semantic-layer && npm run test:data-pipeline && npm run test:optimize-context && npm run test:principle-extractor && npm run test:analytics-window && npm run test:funnel-analytics && npm run test:experiment-tracker && npm run 
test:build-metadata && npm run test:context-engine && npm run test:hf-papers && npm run test:marketing-experiment && npm run test:seo-gsd && npm run test:verify-run && npm run test:export-dpo-pairs && npm run test:export-hf-dataset && npm run test:license && npm run test:bot-detector && npm run test:audit-pr-bot-contamination && npm run test:stripe-bootstrap-saas-catalog && npm run test:postinstall && npm run test:funnel-invariants && npm run test:cli-telemetry && npm run test:pro-parity && npm run test:model-tier-router && npm run test:computer-use-firewall && npm run test:skill-exporter && npm run test:statusline && npm run test:evolution && npm run test:org-dashboard && npm run test:multi-hop-recall && npm run test:synthetic-dpo && npm run test:thumbgate-skill && npm run test:learn-hub && npm run test:feedback-fallback && npm run test:metaclaw && npm run test:server-lock && npm run test:control-tower && npm run test:pii-scanner && npm run test:data-governance && npm run test:lesson-inference && npm run test:semantic-dedup && npm run test:fs-utils && npm run test:cli-schema && npm run test:explore && npm run test:lesson-reranker && npm run test:lesson-retrieval && npm run test:cross-encoder && npm run test:reflector-agent && npm run test:feedback-session && npm run test:feedback-history-distiller && npm run test:hallucination-detector && npm run test:history-distiller && npm run test:predictive-insights && npm run test:prove-predictive-insights && npm run test:statusbar-cli && npm run test:generate-instagram-card && npm run test:instagram-thumbgate-post && npm run test:publish-instagram-thumbgate && npm run test:lesson-synthesis && npm run test:lesson-canonical && npm run test:background-governance && npm run test:memory-migration && npm run test:prompt-dlp && npm run test:ephemeral-store && npm run test:agent-security && npm run test:skill-progressive && npm run test:per-step-scoring && npm run test:weekly-auto-post && npm run test:social-post-hourly && npm run 
test:social-quality-gate && npm run test:a2ui-engine && npm run test:gate-satisfy && npm run test:money-watcher && npm run test:budget && npm run test:quick-start && npm run test:utm && npm run test:product-feedback && npm run test:feedback-root-consolidator && npm run test:engagement-audit && npm run test:install-growth-automation && npm run test:publish-thumbgate-launch && npm run test:community-course-platform-launch-kit && npm run test:reconcile-thumbgate-campaign && npm run test:reddit-publisher && npm run test:schedule-thumbgate-campaign && npm run test:social-reply-monitor && npm run test:social-dedupe-cleanup && npm run test:sync-launch-assets && npm run test:ai-search-visibility && npm run test:perplexity && npm run test:security-scanner && npm run test:llm-client && npm run test:managed-lesson-agent && npm run test:self-distill && npm run test:meta-agent && npm run test:harness-selector && npm run test:thumbgate-bench && npm run test:seo-guides && npm run test:enforcement-loop && npm run test:cli-agent-experience && npm run test:bot-detection && npm run test:checkout-bot-guard && npm run test:checkout-pro-confirmation-gate && npm run test:session-health && npm run test:session-episodes && npm run test:spec-gate && npm run test:decision-trace && npm run test:dashboard-insights && npm run test:telemetry-tracked-link-slug && npm run test:prompt-eval && npm run test:demo-voiceover && npm run test:gate-coherence && npm run test:gate-eval && npm run test:high-roi && npm run test:public-static-assets && npm run test:token-savings && npm run test:numbers-page && npm run test:workflow-gate-checkpoint && npm run test:lesson-export-import && npm run test:landing-page-claims && npm run test:competitive-positioning-marketing && npm run test:medium-weekly && npm run test:dashboard-deeplink-e2e && npm run test:public-package-parity && npm run test:token-savings-dashboard && npm run test:cursor-wiring && npm run test:pretooluse-injection && npm run 
test:recent-corrective-context && npm run test:durability-step && npm run test:mailer && npm run test:brand-assets && npm run test:enforcement-teeth && npm run test:bayes-optimal-gate && npm run test:swarm-coordinator && npm run test:session-report && npm run test:agent-reasoning-traces && npm run test:judge-reward && npm run test:llm-behavior-monitor && npm run test:prompting-os && npm run test:single-use-credential-gate && npm run test:structured-prompt-driven && npm run test:require-evidence-gate && npm run test:rule-validator && npm run test:bluesky-atproto && npm run test:social-reply-monitor-bluesky && npm run test:bluesky-delete-replies && npm run test:architect-kit-memory-bridge && npm run test:sonar-review-hotspots && npm run test:actionable-remediations && npm run test:gemini-embedding-policy && npm run test:agent-design-governance && npm run test:public-core-boundary && npm run test:hook-stop-verify-deploy && npm run test:hook-stop-anti-claim && npm run test:plausible-server-events && npm run test:activation-tracker && npm run test:unified-revenue-rollup && npm run test:conversion-rate-stats && npm run test:external-customer-audit && npm run test:telemetry-export",
|
+ "test:hook-stop-verify-deploy": "node --test tests/hook-stop-verify-deploy.test.js",
+ "test:hook-stop-anti-claim": "node --test tests/hook-stop-anti-claim.test.js",
+ "test:plausible-server-events": "node --test tests/plausible-server-events.test.js",
+ "test:activation-tracker": "node --test tests/activation-tracker.test.js",
+ "test:unified-revenue-rollup": "node --test tests/unified-revenue-rollup.test.js",
+ "test:conversion-rate-stats": "node --test tests/conversion-rate-stats.test.js",
+ "test:external-customer-audit": "node --test tests/external-customer-audit.test.js",
+ "test:telemetry-export": "node --test tests/telemetry-export.test.js",
  "test:swarm-coordinator": "node --test tests/swarm-coordinator.test.js",
  "test:session-report": "node --test tests/session-report.test.js",
  "test:agent-reasoning-traces": "node --test tests/agent-reasoning-traces.test.js tests/agent-stack-survival-audit.test.js",
@@ -346,6 +360,8 @@
  "test:telemetry-tracked-link-slug": "node --test tests/telemetry-tracked-link-slug.test.js",
  "test:prompt-eval": "node --test tests/prompt-eval.test.js",
  "eval:feedback": "node scripts/prompt-eval.js --from-feedback",
+ "eval:feedback-quality": "python3 scripts/feedback_quality_eval.py",
+ "eval:classifier": "python3 scripts/eval_gate_classifier.py",
  "test:decision-trace": "node --test tests/decision-trace.test.js",
  "test:feedback-fallback": "node --test tests/feedback-fallback.test.js",
  "test:metaclaw": "node --test tests/metaclaw-features.test.js",
@@ -365,6 +381,7 @@
  "test:learn-hub": "node --test tests/learn-hub.test.js",
  "test:feedback-to-rules": "node --test tests/feedback-to-rules.test.js",
  "test:memory-firewall": "node --test tests/memory-firewall.test.js",
+ "test:memory-scope-readiness": "node --test tests/memory-scope-readiness.test.js",
  "test:belief-update": "node --test tests/belief-update.test.js",
  "test:hosted-config": "node --test tests/hosted-config.test.js",
  "test:operational-summary": "node --test tests/operational-summary.test.js",
@@ -403,10 +420,10 @@
  "test:kto": "node --test tests/export-kto.test.js",
  "test:api": "node --test --test-concurrency=1 tests/api-server.test.js tests/api-events-sse.test.js tests/api-auth-config.test.js tests/mcp-server.test.js tests/adapters.test.js tests/openapi-parity.test.js tests/budget-guard.test.js tests/context-manager.test.js tests/contextfs.test.js tests/job-api.test.js tests/pack-templates.test.js tests/dashboard.test.js tests/dashboard-render-spec.test.js tests/dashboard-html.test.js tests/agent-readiness.test.js tests/mcp-policy.test.js tests/subagent-profiles.test.js tests/intent-router.test.js tests/internal-agent-bootstrap.test.js tests/lesson-search.test.js tests/thumbgate-search.test.js tests/document-intake.test.js tests/rubric-engine.test.js tests/self-healing-check.test.js tests/self-heal.test.js tests/feedback-schema.test.js tests/thompson-sampling.test.js tests/feedback-sequences.test.js tests/diversity-tracking.test.js tests/vector-store.test.js tests/gemini-embedding-policy.test.js tests/feedback-attribution.test.js tests/hybrid-feedback-context.test.js tests/loop-closure.test.js tests/code-reasoning.test.js tests/feedback-loop.test.js tests/feedback-inbox-read.test.js tests/feedback-to-memory.test.js tests/test-coverage.test.js tests/version-metadata.test.js tests/claude-mcpb.test.js tests/claude-codex-bridge.test.js tests/cursor-plugin.test.js tests/codex-plugin.test.js tests/ide-marketplace-extensions.test.js tests/telemetry-analytics.test.js tests/public-landing.test.js tests/lessons-page.test.js tests/pro-landing.test.js tests/local-model-profile.test.js tests/risk-scorer.test.js tests/context-compaction.test.js tests/reminder-engine.test.js tests/post-to-x.test.js tests/verification-loop.test.js tests/async-job-runner.test.js tests/commerce-quality.test.js tests/recall-limit.test.js tests/problem-detail.test.js tests/natural-language-harness.test.js tests/settings-hierarchy.test.js",
  "test:proof": "node --test tests/prove-adapters.test.js tests/prove-attribution.test.js tests/prove-cloudflare-sandbox.test.js tests/prove-data-quality.test.js tests/prove-intelligence.test.js tests/prove-lancedb.test.js tests/prove-loop-closure.test.js tests/prove-training-export.test.js tests/prove-local-intelligence.test.js tests/prove-workflow-contract.test.js tests/prove-autoresearch.test.js tests/prove-claim-verification.test.js tests/prove-data-pipeline.test.js tests/prove-evolution.test.js tests/prove-harnesses.test.js tests/prove-packaged-runtime.test.js tests/prove-runtime.test.js tests/prove-seo-gsd.test.js tests/prove-settings.test.js tests/prove-xmemory.test.js && node --test tests/prove-automation.test.js",
- "test:e2e": "node --test tests/e2e-pipeline.test.js tests/e2e-product-flows.test.js tests/e2e-coverage-contract.test.js",
+ "test:e2e": "node --test tests/e2e-pipeline.test.js tests/e2e-product-flows.test.js tests/e2e-coverage-contract.test.js tests/interaction-model-e2e.test.js",
  "test:rlaif": "node --test tests/rlaif-self-audit.test.js tests/dpo-optimizer.test.js tests/meta-policy.test.js tests/agent-reward-model.test.js",
  "test:attribution": "node --test tests/feedback-attribution.test.js tests/hybrid-feedback-context.test.js",
- "test:quality": "node --test tests/validate-feedback.test.js",
+ "test:quality": "node --test tests/validate-feedback.test.js tests/feedback-quality-eval-python.test.js tests/eval-gate-classifier.test.js",
  "test:intelligence": "node --test tests/intelligence.test.js",
  "test:training-export": "node --test tests/training-export.test.js tests/databricks-export.test.js",
  "test:deployment": "node --test tests/deployment.test.js tests/deploy-policy.test.js tests/publish-decision.test.js tests/changeset-check.test.js tests/release-notes.test.js tests/sonarcloud-workflow.test.js tests/package-boundary.test.js tests/public-package-boundary.test.js tests/revenue-observability-workflow.test.js",
@@ -486,8 +503,11 @@
  "test:zernio-status": "node --test tests/zernio-status.test.js",
  "test:license": "node --test tests/license.test.js",
  "test:bot-detector": "node --test tests/bot-detector.test.js",
+ "test:audit-pr-bot-contamination": "node --test tests/audit-pr-bot-contamination.test.js",
+ "test:stripe-bootstrap-saas-catalog": "node --test tests/stripe-bootstrap-saas-catalog.test.js",
  "test:bot-detection": "node --test tests/bot-detection.test.js",
  "test:checkout-bot-guard": "node --test tests/checkout-bot-guard.test.js",
+ "test:checkout-pro-confirmation-gate": "node --test tests/checkout-pro-confirmation-gate.test.js",
  "test:postinstall": "node --test tests/postinstall.test.js",
  "test:funnel-invariants": "node --test tests/funnel-invariants.test.js",
  "test:cli-telemetry": "node --test tests/cli-telemetry.test.js",
@@ -597,7 +617,8 @@
  "test:gate-eval": "node --test tests/gate-eval.test.js",
  "gate-eval:ci": "node scripts/gate-eval.js run",
  "test:ai-engineering-stack-guardrails": "node --test tests/ai-engineering-stack-guardrails.test.js",
- "test:
+ "test:interaction-model": "node --test tests/interaction-model.test.js tests/interaction-model-e2e.test.js",
+ "test:high-roi": "node --test tests/high-roi.test.js tests/model-candidates.test.js tests/autonomous-workflow.test.js tests/high-roi-agent-workflows.test.js tests/interaction-model.test.js tests/interaction-model-e2e.test.js tests/code-graph-guardrails.test.js tests/proxy-pointer-rag-guardrails.test.js tests/rag-precision-guardrails.test.js tests/ai-engineering-stack-guardrails.test.js tests/long-running-agent-context-guardrails.test.js tests/reasoning-efficiency-guardrails.test.js tests/deepseek-v4-runtime-guardrails.test.js tests/upstream-contribution-engine.test.js tests/proactive-agent-eval-guardrails.test.js tests/reward-hacking-guardrails.test.js tests/chatgpt-ads-readiness-pack.test.js tests/oss-pr-opportunity-scout.test.js tests/agent-design-governance.test.js tests/gemini-embedding-policy.test.js tests/openclaw-agent-governance-kit.test.js",
  "test:public-static-assets": "node --test tests/public-static-assets.test.js",
  "test:token-savings": "node --test tests/token-savings.test.js",
  "test:numbers-page": "node --test tests/numbers-page.test.js",
@@ -674,9 +695,9 @@
  "node": ">=18.18.0"
  },
  "dependencies": {
- "@anthropic-ai/sdk": "0.
+ "@anthropic-ai/sdk": "0.95.2",
  "@google/genai": "1.49.0",
- "@huggingface/transformers": "^4.
+ "@huggingface/transformers": "^4.2.0",
  "@lancedb/lancedb": "^0.27.2",
  "apache-arrow": "^18.1.0",
  "better-sqlite3": "^12.9.0",
@@ -692,7 +713,7 @@
  },
  "mcpName": "io.github.IgorGanapolsky/thumbgate",
  "devDependencies": {
- "@changesets/changelog-github": "^0.
+ "@changesets/changelog-github": "^0.7.0",
  "@changesets/cli": "^2.31.0",
  "c8": "^11.0.0",
  "undici": "^8.2.0"
package/public/compare.html
CHANGED
@@ -255,6 +255,12 @@
  <p><a href="/compare/agentix-labs" class="cta">Read ThumbGate vs Agentix Labs</a></p>
  </div>

+ <div class="card">
+ <h3>Already using a supply-chain scanner?</h3>
+ <p>HEIDI (Meterian) catches AI assistants suggesting vulnerable npm/pip/maven packages — the supply-chain half of AI coding safety. ThumbGate is the behavior half: it blocks the agent's tool call before it fires a known-bad pattern. Different threat surfaces, both local-first, both free at base. Run both.</p>
+ <p><a href="/compare/heidi" class="cta">Read ThumbGate vs HEIDI</a></p>
+ </div>
+
  <h2>How It Works</h2>
  <div class="step-grid">
  <div class="step-card">