thumbgate 1.27.7 → 1.27.9
- package/.well-known/llms.txt +1 -2
- package/README.md +0 -2
- package/bin/cli.js +259 -78
- package/package.json +12 -18
- package/public/blog.html +30 -0
- package/public/compare/adopt-ai.html +219 -0
- package/public/compare/agentix-labs.html +197 -0
- package/public/compare/ai-experience-orchestration.html +216 -0
- package/public/compare/anthropic-claude-for-legal.html +260 -0
- package/public/compare/anthropic-containment.html +280 -0
- package/public/compare/arcade.html +175 -0
- package/public/compare/arcjet.html +239 -0
- package/public/compare/bumblebee.html +307 -0
- package/public/compare/claude-code-hooks.html +294 -0
- package/public/compare/databricks-unity-ai-gateway.html +215 -0
- package/public/compare/fallow.html +351 -0
- package/public/compare/heidi.html +233 -0
- package/public/compare/mem0.html +342 -0
- package/public/compare/oak-and-sparrow-gatekeeper.html +289 -0
- package/public/compare/rein.html +236 -0
- package/public/compare/sigmashake.html +256 -0
- package/public/compare/speclock.html +342 -0
- package/public/compare.html +2 -0
- package/public/guides/agent-harness-optimization.html +342 -0
- package/public/guides/agentic-web-governance.html +406 -0
- package/public/guides/ai-agent-governance-sprint.html +415 -0
- package/public/guides/ai-agent-pre-action-approval-gates.html +401 -0
- package/public/guides/ai-agent-workflow-migration-checklist.html +392 -0
- package/public/guides/ai-deployment-readiness.html +415 -0
- package/public/guides/ai-mode-ads-agent-governance.html +401 -0
- package/public/guides/ai-search-topical-presence.html +342 -0
- package/public/guides/autoresearch-agent-safety.html +342 -0
- package/public/guides/background-agent-governance.html +358 -0
- package/public/guides/best-tools-stop-ai-agents-breaking-production.html +363 -0
- package/public/guides/browser-automation-safety.html +342 -0
- package/public/guides/chatgpt-ads-trust.html +353 -0
- package/public/guides/claude-code-feedback.html +339 -0
- package/public/guides/claude-code-prevent-repeated-mistakes.html +161 -0
- package/public/guides/claude-code-skills-guardrails.html +343 -0
- package/public/guides/claude-desktop.html +356 -0
- package/public/guides/code-knowledge-graph-guardrails.html +365 -0
- package/public/guides/codex-cli-guardrails.html +339 -0
- package/public/guides/cursor-agent-guardrails.html +339 -0
- package/public/guides/cursor-prevent-repeated-mistakes.html +161 -0
- package/public/guides/database-agent-safety.html +406 -0
- package/public/guides/deepseek-v4-runtime-guardrails.html +346 -0
- package/public/guides/developer-machine-supply-chain-guardrails.html +358 -0
- package/public/guides/gcp-mcp-guardrails.html +147 -0
- package/public/guides/gemini-cli-feedback-memory.html +339 -0
- package/public/guides/gpt-5-5-model-evaluation.html +358 -0
- package/public/guides/internal-ai-engineering-stack-guardrails.html +348 -0
- package/public/guides/long-running-agent-context-management.html +346 -0
- package/public/guides/mcp-tool-governance.html +401 -0
- package/public/guides/multica-thumbgate-setup.html +134 -0
- package/public/guides/native-messaging-host-security.html +342 -0
- package/public/guides/policy-engine-pre-action-gates.html +346 -0
- package/public/guides/pre-action-checks.html +342 -0
- package/public/guides/pretooluse-hooks-vs-advisory-prompt-rules.html +342 -0
- package/public/guides/prompt-tricks-to-workflow-rules.html +365 -0
- package/public/guides/proxy-pointer-rag-guardrails.html +352 -0
- package/public/guides/rag-precision-tuning-guardrails.html +352 -0
- package/public/guides/reasoning-compression-guardrails.html +346 -0
- package/public/guides/relational-knowledge-ai-recommendations.html +342 -0
- package/public/guides/roo-code-alternative-cline.html +339 -0
- package/public/guides/semantic-programmatic-seo-guardrails.html +352 -0
- package/public/guides/seo-agent-skills-guardrails.html +344 -0
- package/public/guides/stop-repeated-ai-agent-mistakes.html +342 -0
- package/public/index.html +10 -48
- package/public/learn/ac-dc-runtime-enforcement.html +277 -0
- package/public/learn/agent-harness-pattern.html +181 -0
- package/public/learn/agent-swarms-shared-gates.html +173 -0
- package/public/learn/agentic-enterprise-context-brain.html +117 -0
- package/public/learn/agentic-os-team-governance.html +146 -0
- package/public/learn/ai-agent-governance.html +158 -0
- package/public/learn/ai-agent-persistent-memory.html +211 -0
- package/public/learn/background-agent-control-layer.html +184 -0
- package/public/learn/claude-code-goal-with-rubrics.html +205 -0
- package/public/learn/codex-role-plugins-need-governance.html +125 -0
- package/public/learn/cost-aware-agent-gate-routing.html +173 -0
- package/public/learn/databricks-unity-ai-gateway-runtime-governance.html +157 -0
- package/public/learn/deterministic-agent-workflows.html +185 -0
- package/public/learn/feedback-loop-vs-decision-layer.html +283 -0
- package/public/learn/from-prototype-to-production.html +223 -0
- package/public/learn/learn.css +51 -0
- package/public/learn/mcp-pre-action-checks-explained.html +172 -0
- package/public/learn/pretix-stripe-connect-marketplaces.html +161 -0
- package/public/learn/regulated-agent-execution-boundary.html +196 -0
- package/public/learn/spec-driven-development.html +168 -0
- package/public/learn/stop-ai-agent-force-push.html +134 -0
- package/public/learn/vibe-coding-safety-net.html +142 -0
- package/public/learn.html +6 -50
- package/public/pro.html +6 -6
- package/scripts/cli-schema.js +10 -22
- package/scripts/dashboard-chat.js +1 -2
- package/scripts/document-intake.js +49 -1
- package/scripts/gemini-embedding-policy.js +1 -2
- package/scripts/hosted-config.js +12 -0
- package/scripts/plausible-domain-config.js +1 -3
- package/scripts/reddit-browser-notification-watch.js +230 -0
- package/scripts/seo-gsd.js +0 -239
- package/scripts/vector-store.js +0 -44
- package/scripts/workspace-evolver.js +2 -62
- package/src/api/server.js +124 -335
- package/adapters/policy-engine/ethicore-guardian-client.js +0 -68
- package/adapters/policy-engine/thumbgate-policy-engine-adapter.js +0 -260
- package/scripts/hook-stop-anti-claim.js +0 -227
|
@@ -0,0 +1,211 @@
|
|
|
1
|
+
<!DOCTYPE html>
|
|
2
|
+
<html lang="en">
|
|
3
|
+
<head>
|
|
4
|
+
<meta charset="UTF-8">
|
|
5
|
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
|
6
|
+
<title>How to Give Your AI Coding Agent Persistent Memory Across Sessions — ThumbGate</title>
|
|
7
|
+
<script defer data-domain="thumbgate.ai" src="https://plausible.io/js/script.js"></script>
|
|
8
|
+
<meta name="description" content="AI coding agents forget everything when a session ends. Learn how to give Claude Code, Cursor, Codex, and Gemini persistent memory using an MCP memory server that survives restarts.">
|
|
9
|
+
<meta name="keywords" content="ai agent memory, persistent memory, claude code memory, cursor agent memory, MCP memory server, session persistence, agent context, episodic memory, semantic memory">
|
|
10
|
+
<meta property="og:title" content="How to Give Your AI Coding Agent Persistent Memory Across Sessions">
|
|
11
|
+
<meta property="og:description" content="Context windows are ephemeral. Real memory persists. Here is how to build durable memory for any MCP-compatible AI coding agent.">
|
|
12
|
+
<meta property="og:type" content="article">
|
|
13
|
+
<meta property="og:url" content="https://thumbgate.ai/learn/ai-agent-persistent-memory">
|
|
14
|
+
<link rel="canonical" href="https://thumbgate.ai/learn/ai-agent-persistent-memory">
|
|
15
|
+
|
|
16
|
+
<script type="application/ld+json">
|
|
17
|
+
{
|
|
18
|
+
"@context": "https://schema.org",
|
|
19
|
+
"@type": "TechArticle",
|
|
20
|
+
"headline": "How to Give Your AI Coding Agent Persistent Memory Across Sessions",
|
|
21
|
+
"description": "AI coding agents forget everything when a session ends. Learn how to give Claude Code, Cursor, Codex, and Gemini persistent memory using an MCP memory server that survives restarts.",
|
|
22
|
+
"author": {
|
|
23
|
+
"@type": "Person",
|
|
24
|
+
"name": "Igor Ganapolsky",
|
|
25
|
+
"url": "https://github.com/IgorGanapolsky"
|
|
26
|
+
},
|
|
27
|
+
"publisher": {
|
|
28
|
+
"@type": "Organization",
|
|
29
|
+
"name": "ThumbGate",
|
|
30
|
+
"url": "https://thumbgate.ai"
|
|
31
|
+
},
|
|
32
|
+
"datePublished": "2026-04-02",
|
|
33
|
+
"dateModified": "2026-04-02",
|
|
34
|
+
"mainEntityOfPage": "https://thumbgate.ai/learn/ai-agent-persistent-memory",
|
|
35
|
+
"about": [
|
|
36
|
+
{"@type": "Thing", "name": "ai agent memory"},
|
|
37
|
+
{"@type": "Thing", "name": "persistent memory"},
|
|
38
|
+
{"@type": "Thing", "name": "MCP memory server"},
|
|
39
|
+
{"@type": "Thing", "name": "session persistence"}
|
|
40
|
+
]
|
|
41
|
+
}
|
|
42
|
+
</script>
|
|
43
|
+
|
|
44
|
+
<link rel="stylesheet" href="/learn/learn.css">
|
|
45
|
+
<style>
|
|
46
|
+
table { width: 100%; border-collapse: collapse; margin: 1rem 0; }
|
|
47
|
+
th, td { text-align: left; padding: 0.6rem 0.8rem; border-bottom: 1px solid var(--border); font-size: 0.9rem; }
|
|
48
|
+
th { color: var(--cyan); font-weight: 600; }
|
|
49
|
+
.memory-row td:first-child { color: var(--green); font-weight: 500; }
|
|
50
|
+
</style>
|
|
51
|
+
</head>
|
|
52
|
+
<body>
|
|
53
|
+
|
|
54
|
+
<nav>
|
|
55
|
+
<a href="/" class="brand"><img src="/assets/brand/thumbgate-mark-inline.svg" alt="ThumbGate" class="logo-mark" width="28" height="28"><span class="logo-text">ThumbGate</span></a>
|
|
56
|
+
<a href="/guide">Setup Guide</a>
|
|
57
|
+
<a href="/learn">Learn</a>
|
|
58
|
+
<a href="/dashboard">Dashboard</a>
|
|
59
|
+
<a href="https://github.com/IgorGanapolsky/ThumbGate" target="_blank" rel="noopener">GitHub</a>
|
|
60
|
+
</nav>
|
|
61
|
+
|
|
62
|
+
<div class="container">
|
|
63
|
+
<div class="breadcrumb"><a href="/learn">Learn</a> / AI Agent Persistent Memory</div>
|
|
64
|
+
<h1>How to Give Your AI Coding Agent Persistent Memory Across Sessions</h1>
|
|
65
|
+
<p style="color:var(--muted);">6 min read · For developers using Claude Code, Cursor, Codex, or Gemini who are tired of re-explaining context every session</p>
|
|
66
|
+
|
|
67
|
+
<div class="tldr"><strong>TL;DR:</strong> Your AI agent forgets everything between sessions. Give it a SQLite+FTS5 memory that stores lessons, retrieves relevant context, and blocks known-bad actions automatically.</div>
|
|
68
|
+
|
|
69
|
+
<h2>The problem: agents forget everything when you close the tab</h2>
|
|
70
|
+
<p>You spend twenty minutes explaining your codebase to your AI coding agent. You tell it about the monorepo structure, the deployment conventions, the one branch it must never force-push to. The session ends. You come back tomorrow and it has no memory of any of it.</p>
|
|
71
|
+
<p>You are not doing anything wrong. This is how context windows work. Every session starts with a blank slate. The agent has no continuity of experience — no record of past mistakes, no accumulated knowledge of your project, no recollection of the rules you established last week.</p>
|
|
72
|
+
<p>The frustration is real and widespread. Developers using Claude Code, Cursor, Codex, and Gemini all hit the same wall. The agents are capable — they just cannot remember.</p>
|
|
73
|
+
|
|
74
|
+
<div class="callout">
|
|
75
|
+
<strong>The distinction that matters:</strong> A context window holds information for one session. Memory holds information across sessions. Most agents have the former. Almost none have the latter by default.
|
|
76
|
+
</div>
|
|
77
|
+
|
|
78
|
+
<h2>Why context windows are not memory</h2>
|
|
79
|
+
<p>Context windows are large and getting larger. That solves a different problem. A big context window means the agent can reason over more information at once within a single session. It does not mean that information survives when the session ends.</p>
|
|
80
|
+
<p>Think of the difference this way: a context window is RAM — fast, capacious, gone when the power cuts. Memory is disk — slower to query, but persistent. You need both. Right now, AI coding agents only ship with RAM.</p>
|
|
81
|
+
<p>The consequences compound over time. An agent with no persistent memory will:</p>
|
|
82
|
+
<ul>
|
|
83
|
+
<li>Repeat mistakes it made last week because it has no record of them</li>
|
|
84
|
+
<li>Re-ask you for project conventions it has already learned once</li>
|
|
85
|
+
<li>Ignore prevention rules you painstakingly wrote into a prompt — because that prompt is gone</li>
|
|
86
|
+
<li>Treat every session as if it is the first day on the job</li>
|
|
87
|
+
</ul>
|
|
88
|
+
<p>Stuffing facts into a <code>CLAUDE.md</code> file helps, but it is a manual workaround. You are the memory. You remember what to put in the file. You update it when things change. That is not a solution; it hands a machine problem back to a human.</p>
|
|
89
|
+
|
|
90
|
+
<div class="callout callout-red">
|
|
91
|
+
<strong>The hidden cost:</strong> Re-explaining context is not just annoying. Every token you spend re-establishing what the agent already knew is a token not spent on the actual task. And re-explained rules are still just prompt rules — the agent can reason around them.
|
|
92
|
+
</div>
|
|
93
|
+
|
|
94
|
+
<h2>Three types of agent memory</h2>
|
|
95
|
+
<p>Cognitive science distinguishes several memory types. The same taxonomy maps cleanly onto what AI coding agents need. Here is how each type works and what it looks like in practice:</p>
|
|
96
|
+
|
|
97
|
+
<table>
|
|
98
|
+
<thead>
|
|
99
|
+
<tr>
|
|
100
|
+
<th>Memory Type</th>
|
|
101
|
+
<th>What It Stores</th>
|
|
102
|
+
<th>Concrete Example</th>
|
|
103
|
+
</tr>
|
|
104
|
+
</thead>
|
|
105
|
+
<tbody>
|
|
106
|
+
<tr class="memory-row">
|
|
107
|
+
<td>Episodic</td>
|
|
108
|
+
<td>Records of specific past events — what happened, when, and what the outcome was</td>
|
|
109
|
+
<td>The agent tried to force-push to main. You gave thumbs-down. That event is stored with context, timestamp, and failure description.</td>
|
|
110
|
+
</tr>
|
|
111
|
+
<tr class="memory-row">
|
|
112
|
+
<td>Semantic</td>
|
|
113
|
+
<td>Generalised knowledge extracted from episodes — rules, patterns, facts about the world</td>
|
|
114
|
+
<td>From multiple thumbs-down events, the system derives: "force-pushing to main causes broken deploys in this repo." That becomes a prevention rule.</td>
|
|
115
|
+
</tr>
|
|
116
|
+
<tr class="memory-row">
|
|
117
|
+
<td>Procedural</td>
|
|
118
|
+
<td>Encoded behaviours — checks that fire before actions without requiring the agent to reason about them</td>
|
|
119
|
+
<td>A PreToolUse hook that checks every <code>git push</code> command against the prevention rule and blocks the dangerous pattern automatically.</td>
|
|
120
|
+
</tr>
|
|
121
|
+
</tbody>
|
|
122
|
+
</table>
|
|
123
|
+
|
|
124
|
+
<p>Most "persistent memory" proposals for AI agents stop at episodic: they store a log of past conversations. That is useful, but insufficient. The signal gets diluted in a sea of raw events. What agents need is the full pipeline: episodes promote to semantic rules, semantic rules compile into procedural checks.</p>
|
|
125
|
+
|
|
126
|
+
<h2>How ThumbGate implements persistent memory</h2>
|
|
127
|
+
<p>ThumbGate is built around this three-tier memory architecture. Here is each layer in concrete terms.</p>
|
|
128
|
+
|
|
129
|
+
<h3>Episodic layer: the feedback log</h3>
|
|
130
|
+
<p>Every thumbs-up or thumbs-down you give an agent action is written to a structured feedback log. Each entry captures the tool call that was made, the context at the time, what worked or went wrong, and any tags you add. The log is append-only and survives across sessions.</p>
|
|
131
|
+
<p>In the current Claude auto-capture hook, a vague thumbs-down can borrow context from up to 8 previously recorded entries, plus the failed tool call, before it is promoted. Accepted feedback also opens a linked 60-second follow-up session so later corrections stay attached to the same memory trace instead of fragmenting into duplicates.</p>
|
|
132
|
+
|
|
133
|
+
<pre><code># Thumbs-down: record a specific failure
node .claude/scripts/feedback/capture-feedback.js \
  --feedback=down \
  --context="deploying to production" \
  --what-went-wrong="agent ran db migration without backup" \
  --what-to-change="always checkpoint before schema changes" \
  --tags="database,migrations,safety"</code></pre>
|
|
140
|
+
|
|
141
|
+
<p>You do not need to write this manually for every interaction. The MCP server captures tool calls automatically. Manual feedback is for adding nuance that the agent could not observe on its own.</p>
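<p>To make the append-only log concrete, here is a minimal Python sketch of what one entry could look like. It is an illustration, not ThumbGate's implementation: the function names and the JSON field names (<code>whatWentWrong</code>, <code>whatToChange</code>) are assumptions chosen to mirror the CLI flags above, and the real log schema may differ.</p>

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_feedback(log_path, feedback, context,
                    what_went_wrong="", what_to_change="", tags=()):
    """Append one feedback event as a single JSON line (hypothetical schema)."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "feedback": feedback,  # "up" or "down"
        "context": context,
        "whatWentWrong": what_went_wrong,
        "whatToChange": what_to_change,
        "tags": list(tags),
    }
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds to the end of the file; earlier lines are untouched.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def read_feedback(log_path):
    """Load every recorded event, oldest first."""
    with Path(log_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    # Write to a throwaway directory so the demo never touches a real log.
    demo_log = Path(tempfile.mkdtemp()) / "feedback-log.jsonl"
    append_feedback(
        demo_log, "down", "deploying to production",
        what_went_wrong="agent ran db migration without backup",
        what_to_change="always checkpoint before schema changes",
        tags=["database", "migrations", "safety"],
    )
    print(read_feedback(demo_log))
```

<p>One JSON object per line keeps each write independent, so a new session can read the full history back without any earlier entry having been rewritten.</p>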
|
|
142
|
+
|
|
143
|
+
<h3>Semantic layer: the lesson database</h3>
|
|
144
|
+
<p>Raw feedback events are processed into a SQLite database with full-text search (FTS5). This is not a flat file; it is a queryable knowledge store. When a new session starts, the system retrieves the lessons that best match the current task, ranked by relevance rather than by recency.</p>
|
|
145
|
+
<p>The FTS5 index means retrieval is fast even as the database grows. You are not loading the entire history into context. You are loading the lessons most likely to matter right now. That is the difference between a knowledge base and a memory dump.</p>
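<p>The retrieval idea can be sketched with Python's built-in <code>sqlite3</code> module, assuming the local SQLite build includes FTS5. The table name <code>lessons</code>, its columns, and the helper functions are hypothetical, and the sketch uses FTS5's built-in <code>rank</code> ordering as a stand-in; how ThumbGate actually scores lessons is not shown here.</p>

```python
import sqlite3


def open_lesson_store(path=":memory:"):
    """Open a lesson database backed by an FTS5 virtual table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS lessons USING fts5(title, body, tags)"
    )
    return conn


def add_lesson(conn, title, body, tags=""):
    conn.execute(
        "INSERT INTO lessons (title, body, tags) VALUES (?, ?, ?)",
        (title, body, tags),
    )
    conn.commit()


def relevant_lessons(conn, task, limit=3):
    """Return the titles of the lessons that best match the task description."""
    words = task.split()
    if not words:
        return []
    # Quote each word so punctuation in the task is not parsed as FTS5 query syntax.
    query = " OR ".join('"{}"'.format(word.replace('"', '""')) for word in words)
    rows = conn.execute(
        "SELECT title FROM lessons WHERE lessons MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    store = open_lesson_store()
    add_lesson(store, "Back up before migrations",
               "always checkpoint the database before schema changes",
               "database,migrations,safety")
    add_lesson(store, "Never force-push to main",
               "force-pushing to main causes broken deploys in this repo",
               "git")
    print(relevant_lessons(store, "schema change on the database"))
```

<p>Only the top few matches are returned, which is the point of the paragraph above: the session loads a handful of relevant lessons, not the whole history.</p>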
|
|
146
|
+
|
|
147
|
+
<h3>Procedural layer: prevention rules and checks</h3>
|
|
148
|
+
<p>Promoted lessons generate prevention rules in <code>prevention-rules.md</code>. Rules are not prompt instructions — they are checked by a PreToolUse hook that fires before every tool call. The agent cannot reason around a check. The check runs outside the agent's context.</p>
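<p>A check of this kind can be pictured as a plain function that sees the proposed tool call and returns a verdict. The Python sketch below is an assumption-laden illustration: the <code>PreventionRule</code> structure, the rule name, and the regular expressions are invented for the example and are not the format of <code>prevention-rules.md</code> or of any agent's hook protocol.</p>

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PreventionRule:
    name: str
    tool: str        # tool the rule applies to, e.g. "Bash"
    all_of: tuple    # regular expressions that must ALL match the command
    reason: str


# Hypothetical rule: a git push that carries a force flag and mentions main.
RULES = (
    PreventionRule(
        name="no-force-push-main",
        tool="Bash",
        all_of=(
            r"\bgit\s+push\b",
            r"(^|\s)(--force\S*|-f)(\s|$)",
            r"\bmain\b",
        ),
        reason="force-pushing to main causes broken deploys in this repo",
    ),
)


def check_tool_call(tool, command, rules=RULES):
    """Return ("block", reason) if any rule matches, otherwise ("allow", "")."""
    for rule in rules:
        if rule.tool == tool and all(re.search(p, command) for p in rule.all_of):
            return "block", rule.reason
    return "allow", ""


if __name__ == "__main__":
    print(check_tool_call("Bash", "git push --force origin main"))
    print(check_tool_call("Bash", "git push origin main"))
```

<p>The function never consults the model. It only inspects the command string, which is why a check like this keeps working even if the prompt that motivated it is long gone.</p>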
|
|
149
|
+
|
|
150
|
+
<div class="callout callout-green">
|
|
151
|
+
<strong>The promotion pipeline:</strong> Thumbs-down event → feedback log entry → lesson promoted to SQLite → prevention rule generated → PreToolUse check active for every future session, with no additional setup.
|
|
152
|
+
</div>
|
|
153
|
+
|
|
154
|
+
<h2>Thompson Sampling for memory-informed decisions</h2>
|
|
155
|
+
<p>Not every prevention rule has the same confidence level. A rule derived from one thumbs-down event is weaker than a rule reinforced by a dozen. ThumbGate uses Thompson Sampling — a multi-armed bandit algorithm — to handle this uncertainty.</p>
|
|
156
|
+
<p>For each check, the system maintains a Beta distribution over outcomes. As thumbs-up and thumbs-down feedback accumulates, the distribution tightens. A check with high confidence becomes a hard block. A check still gathering signal issues a warning and lets the agent reconsider.</p>
|
|
157
|
+
<p>This matters for memory because it means the system learns your preferences rather than requiring you to manually tune thresholds. You give feedback. The check calibrates. The agent adapts.</p>
|
|
158
|
+
<p>Thompson Sampling also prevents over-blocking. If a pattern that was once dangerous stops being a problem (because the codebase changed, or you updated your workflow), thumbs-up feedback on future calls will shift the distribution back toward allowing. Memory is not one-way.</p>
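<p>The mechanics can be sketched in a few lines of Python. This is a generic Beta-Bernoulli Thompson Sampling example, not ThumbGate's code: the uniform Beta(1, 1) prior, the <code>CheckStats</code> class, and the 0.8 and 0.5 thresholds are all assumptions made for illustration.</p>

```python
import random
from dataclasses import dataclass


@dataclass
class CheckStats:
    harmful: int = 0  # thumbs-down: the flagged action really was a problem
    fine: int = 0     # thumbs-up: the flagged action turned out to be acceptable

    def record(self, feedback):
        if feedback == "down":
            self.harmful += 1
        elif feedback == "up":
            self.fine += 1
        else:
            raise ValueError("feedback must be 'up' or 'down'")

    def posterior(self):
        # Beta(1, 1) is a uniform prior; each observation adds one pseudo-count.
        return 1 + self.harmful, 1 + self.fine

    def mean(self):
        alpha, beta = self.posterior()
        return alpha / (alpha + beta)


def decide(stats, rng, block_at=0.8, warn_at=0.5):
    """Sample a plausible harm probability from the posterior and act on it."""
    alpha, beta = stats.posterior()
    sampled = rng.betavariate(alpha, beta)
    if sampled >= block_at:
        return "block"
    if sampled >= warn_at:
        return "warn"
    return "allow"


if __name__ == "__main__":
    rng = random.Random(7)
    stats = CheckStats()
    for feedback in ["down", "down", "up", "down"]:
        stats.record(feedback)
    print(stats.posterior(), round(stats.mean(), 2), decide(stats, rng))
```

<p>With little feedback the sampled value swings widely, so the same check may warn on one call and block on another. As counts accumulate the samples cluster around the posterior mean, and enough thumbs-up feedback pulls that mean back below the thresholds.</p>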
|
|
159
|
+
|
|
160
|
+
<h2>Setup: persistent memory in two minutes</h2>
|
|
161
|
+
<p>ThumbGate ships as an MCP server. Any agent that speaks MCP — Claude Code, Cursor, Codex, Gemini, Amp, OpenCode — can connect to it. You initialize once and the memory layer is active for every subsequent session.</p>
|
|
162
|
+
|
|
163
|
+
<pre><code>npx thumbgate init</code></pre>
|
|
164
|
+
|
|
165
|
+
<p>That command sets up:</p>
|
|
166
|
+
<ul>
|
|
167
|
+
<li>The SQLite+FTS5 lesson database in <code>.claude/memory/</code></li>
|
|
168
|
+
<li>The feedback log at <code>.claude/memory/feedback/feedback-log.jsonl</code></li>
|
|
169
|
+
<li>Prevention rules at <code>.claude/memory/feedback/prevention-rules.md</code></li>
|
|
170
|
+
<li>A PreToolUse hook that runs the checks before every tool call</li>
|
|
171
|
+
<li>The MCP server adapter for your agent runtime</li>
|
|
172
|
+
</ul>
|
|
173
|
+
|
|
174
|
+
<p>After init, your agent starts each session with context assembled from relevant past lessons. It does not start blank. It starts informed.</p>
|
|
175
|
+
|
|
176
|
+
<h3>What memory looks like on day one vs. day thirty</h3>
|
|
177
|
+
<p>On day one, the database is empty. The agent behaves the same as it always has. You give feedback on its actions.</p>
|
|
178
|
+
<p>By day thirty, the database has accumulated dozens of lessons. The agent's context at session start includes the most relevant ones. Prevention rules have tightened around patterns that caused problems. Patterns that worked have been reinforced. The agent makes fewer mistakes — not because it was retrained, but because the checks learned from your feedback.</p>
|
|
179
|
+
<p>That is persistent memory in practice: not a bigger context window, not a longer system prompt, but a feedback loop that accumulates signal and converts it into durable enforcement.</p>
|
|
180
|
+
|
|
181
|
+
<div class="cta-box">
|
|
182
|
+
<h2 style="color:var(--text);font-size:1.3rem;margin:0 0 8px;">Give your agent memory that survives restarts</h2>
|
|
183
|
+
<p>One command. Works with Claude Code, Cursor, Codex, Gemini, Amp, and any MCP-compatible agent.</p>
|
|
184
|
+
<div class="cta-install">$ npx thumbgate init</div>
|
|
185
|
+
<div class="cta-actions" aria-label="Paid ThumbGate options">
|
|
186
|
+
<a class="cta-link" href="/checkout/pro?utm_source=learn&utm_medium=persistent_memory_article&utm_campaign=memory_to_pro&cta_id=learn_persistent_memory_pro&cta_placement=article_cta&plan_id=pro&billing_cycle=monthly">Get Pro — $19/mo or $149/yr</a>
|
|
187
|
+
<a class="cta-link cta-link-secondary" rel="nofollow noopener noreferrer" target="_blank" href="https://buy.stripe.com/00w14neyUcXA5pL5e33sI0e">Pay $499 diagnostic</a>
|
|
188
|
+
<a class="cta-link cta-link-secondary" href="/#workflow-sprint-intake">Send workflow first</a>
|
|
189
|
+
</div>
|
|
190
|
+
<p style="font-size:0.85rem;margin-top:0.9rem;">Free is enough for a solo proof. Pro adds dashboard, recall, lesson search, unlimited captures/rules, and DPO export. Use the diagnostic when a team needs a workflow review before rollout.</p>
|
|
191
|
+
</div>
|
|
192
|
+
|
|
193
|
+
<div class="related">
|
|
194
|
+
<h3>Related articles</h3>
|
|
195
|
+
<a href="/learn/agent-harness-pattern">The Agent Harness Pattern: Why Your AI Needs a Seatbelt →</a>
|
|
196
|
+
<a href="/learn/mcp-pre-action-checks-explained">MCP Pre-Action Checks Explained →</a>
|
|
197
|
+
<a href="/learn/stop-ai-agent-force-push">How to Stop AI Agents From Force-Pushing to Main →</a>
|
|
198
|
+
<a href="/learn/vibe-coding-safety-net">The Vibe Coding Safety Net You Are Missing →</a>
|
|
199
|
+
</div>
|
|
200
|
+
</div>
|
|
201
|
+
|
|
202
|
+
|
|
203
|
+
<div class="sticky-cta">
|
|
204
|
+
<span style="color:var(--muted)">Try it now:</span>
|
|
205
|
+
<code>npx thumbgate init</code>
|
|
206
|
+
<a href="/checkout/pro?utm_source=learn&utm_medium=sticky&utm_campaign=memory_to_pro&cta_id=learn_persistent_memory_sticky_pro&cta_placement=sticky&plan_id=pro&billing_cycle=monthly">Pro →</a>
|
|
207
|
+
<a href="https://github.com/IgorGanapolsky/ThumbGate" target="_blank" rel="noopener">GitHub →</a>
|
|
208
|
+
</div>
|
|
209
|
+
<script src="/js/buyer-intent.js"></script>
|
|
210
|
+
</body>
|
|
211
|
+
</html>
|
|
@@ -0,0 +1,184 @@
|
|
|
1
|
+
<!DOCTYPE html>
|
|
2
|
+
<html lang="en">
|
|
3
|
+
<head>
|
|
4
|
+
<meta charset="UTF-8">
|
|
5
|
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
|
6
|
+
<title>Background Agents Need a Control Layer Outside the Model - ThumbGate</title>
|
|
7
|
+
<script defer data-domain="thumbgate.ai" src="https://plausible.io/js/script.js"></script>
|
|
8
|
+
<meta name="description" content="Engineering teams are shipping background agents, but the operating system around them is the real bottleneck. Learn where ThumbGate fits: pre-action controls, evidence, and local enforcement outside the model context.">
|
|
9
|
+
<meta name="keywords" content="background agents, AI SDLC, agent control layer, pre-action checks, agent governance, agent operating system, AI software engineering agents, ThumbGate">
|
|
10
|
+
<meta property="og:title" content="Background Agents Need a Control Layer Outside the Model">
|
|
11
|
+
<meta property="og:description" content="Agents can produce work. Production teams still need triggers, isolated runs, context, visibility, and controls. ThumbGate owns the controls and evidence layer.">
|
|
12
|
+
<meta property="og:type" content="article">
|
|
13
|
+
<meta property="og:url" content="https://thumbgate.ai/learn/background-agent-control-layer">
|
|
14
|
+
<link rel="canonical" href="https://thumbgate.ai/learn/background-agent-control-layer">
|
|
15
|
+
<link rel="stylesheet" href="/learn/learn.css">
|
|
16
|
+
<script type="application/ld+json">
|
|
17
|
+
{
|
|
18
|
+
"@context": "https://schema.org",
|
|
19
|
+
"@type": "TechArticle",
|
|
20
|
+
"headline": "Background Agents Need a Control Layer Outside the Model",
|
|
21
|
+
"description": "A practical map of the control and evidence layer teams need around background AI software engineering agents.",
|
|
22
|
+
"author": {
|
|
23
|
+
"@type": "Person",
|
|
24
|
+
"name": "Igor Ganapolsky",
|
|
25
|
+
"url": "https://github.com/IgorGanapolsky"
|
|
26
|
+
},
|
|
27
|
+
"publisher": {
|
|
28
|
+
"@type": "Organization",
|
|
29
|
+
"name": "ThumbGate",
|
|
30
|
+
"url": "https://thumbgate.ai"
|
|
31
|
+
},
|
|
32
|
+
"datePublished": "2026-05-25",
|
|
33
|
+
"dateModified": "2026-05-25",
|
|
34
|
+
"mainEntityOfPage": "https://thumbgate.ai/learn/background-agent-control-layer",
|
|
35
|
+
"about": [
|
|
36
|
+
{ "@type": "Thing", "name": "background agents" },
|
|
37
|
+
{ "@type": "Thing", "name": "AI SDLC" },
|
|
38
|
+
{ "@type": "Thing", "name": "pre-action checks" },
|
|
39
|
+
{ "@type": "Thing", "name": "agent governance" }
|
|
40
|
+
]
|
|
41
|
+
}
|
|
42
|
+
</script>
|
|
43
|
+
<style>
|
|
44
|
+
table { width: 100%; border-collapse: collapse; margin: 1rem 0; }
|
|
45
|
+
th, td { text-align: left; padding: 0.7rem 0.8rem; border-bottom: 1px solid var(--border); vertical-align: top; font-size: 0.92rem; }
|
|
46
|
+
th { color: var(--cyan); font-weight: 700; }
|
|
47
|
+
.layer strong { color: var(--green); }
|
|
48
|
+
.mini-grid { display: grid; grid-template-columns: repeat(2, minmax(0, 1fr)); gap: 1rem; margin: 1.25rem 0; }
|
|
49
|
+
.mini-card { background: var(--bg-card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
|
|
50
|
+
.mini-card h3 { margin-top: 0; color: var(--text); }
|
|
51
|
+
.mini-card p { color: var(--muted); }
|
|
52
|
+
@media (max-width: 700px) { .mini-grid { grid-template-columns: 1fr; } }
|
|
53
|
+
</style>
|
|
54
|
+
</head>
|
|
55
|
+
<body>
|
|
56
|
+
<nav>
|
|
57
|
+
<a href="/" class="brand"><img src="/assets/brand/thumbgate-mark-inline.svg" alt="ThumbGate" class="logo-mark" width="28" height="28"><span class="logo-text">ThumbGate</span></a>
|
|
58
|
+
<a href="/guide">Setup Guide</a>
|
|
59
|
+
<a href="/learn">Learn</a>
|
|
60
|
+
<a href="/dashboard">Dashboard</a>
|
|
61
|
+
<a href="https://github.com/IgorGanapolsky/ThumbGate" target="_blank" rel="noopener">GitHub</a>
|
|
62
|
+
</nav>
|
|
63
|
+
|
|
64
|
+
<div class="container">
|
|
65
|
+
<div class="breadcrumb"><a href="/learn">Learn</a> / Background Agent Control Layer</div>
|
|
66
|
+
<h1>Background agents need a control layer outside the model.</h1>
|
|
67
|
+
<p style="color:var(--muted);">5 min read · For engineering leaders moving from individual AI coding to team-scale background agents</p>
|
|
68
|
+
|
|
69
|
+
<div class="tldr"><strong>TL;DR:</strong> The hard part is no longer proving an agent can open a pull request. The hard part is the system around it: what starts the work, where it runs, what context it gets, what evidence it leaves, and which controls run outside the model context.</div>
|
|
70
|
+
|
|
71
|
+
<h2>The buyer problem changed</h2>
|
|
72
|
+
<p>Most AI coding rollouts begin with individual acceleration. A developer uses Claude Code, Cursor, Codex, Gemini, or a background agent and gets faster at producing diffs. Then the team hits the next bottleneck: review queues, CI pressure, release routing, credential scope, and unclear accountability.</p>
|
|
73
|
+
<p>That is the moment agent adoption stops being a prompt-engineering problem and becomes an operating-system problem. The question shifts from "can the agent write code?" to "can the organization safely receive, constrain, inspect, and improve the work?"</p>
|
|
74
|
+
|
|
75
|
+
<div class="callout">
|
|
76
|
+
<strong>ThumbGate's position:</strong> memory and context help the agent know more. A control layer decides what the agent is allowed to do next, records why, and turns repeated failures into enforceable rules.
|
|
77
|
+
</div>
|
|
78
|
+
|
|
79
|
+
<h2>The five layers around a production agent</h2>
|
|
80
|
+
<p>For teams running agents beyond the IDE, the system usually decomposes into five layers. ThumbGate does not need to replace all five. It wins by being the enforcement and evidence layer that composes with the rest.</p>
|
|
81
|
+
|
|
82
|
+
<table>
|
|
83
|
+
<thead>
|
|
84
|
+
<tr>
|
|
85
|
+
<th>Layer</th>
|
|
86
|
+
<th>Buyer question</th>
|
|
87
|
+
<th>ThumbGate role</th>
|
|
88
|
+
</tr>
|
|
89
|
+
</thead>
|
|
90
|
+
<tbody>
|
|
91
|
+
<tr class="layer">
|
|
92
|
+
<td><strong>Triggers</strong></td>
|
|
93
|
+
<td>What starts the work: ticket, PR, incident, CVE, scheduled migration, or human request?</td>
|
|
94
|
+
<td>Attach a contract: repo scope, allowed tools, done criteria, review threshold, and blocked-action policy.</td>
|
|
95
|
+
</tr>
|
|
96
|
+
<tr class="layer">
|
|
97
|
+
<td><strong>Isolated runs</strong></td>
|
|
98
|
+
<td>Where does the agent execute, and which credentials, repos, network paths, and files can it touch?</td>
|
|
99
|
+
<td>Run pre-action checks in the execution boundary before privileged tools fire.</td>
|
|
100
|
+
</tr>
|
|
101
|
+
<tr class="layer">
|
|
102
|
+
<td><strong>Context</strong></td>
|
|
103
|
+
<td>What does the agent need beyond the prompt: ownership, CI logs, docs, conventions, and prior failures?</td>
|
|
104
|
+
<td>Promote feedback and failures into local lessons, then compile trusted lessons into rules.</td>
|
|
105
|
+
</tr>
|
|
106
|
+
<tr class="layer">
|
|
107
|
+
<td><strong>Visibility</strong></td>
|
|
108
|
+
<td>What evidence can reviewers inspect: logs, diffs, tests, blocked actions, overrides, and decisions?</td>
|
|
109
|
+
<td>Emit structured evidence for allow, warn, block, override, and handoff decisions.</td>
|
|
110
|
+
</tr>
|
|
111
|
+
<tr class="layer">
|
|
112
|
+
<td><strong>Controls</strong></td>
|
|
113
|
+
<td>Which governance rules live outside the model so the agent cannot reason around them?</td>
|
|
114
|
+
<td>Enforce PreToolUse gates, policy bundles, local allowlists, and repeated-failure prevention rules.</td>
|
|
115
|
+
</tr>
|
|
116
|
+
</tbody>
|
|
117
|
+
</table>
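<p>The contract in the Triggers row can be pictured as a small, immutable record attached to each run. The Python sketch below is hypothetical: the <code>RunContract</code> class, its field names, and the interpretation of the review threshold as a changed-file count are assumptions for illustration, not a ThumbGate data format.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunContract:
    trigger: str                 # what started the work, e.g. "CVE advisory"
    repo_scope: tuple            # repositories the run may touch
    allowed_tools: frozenset     # tools the agent may call
    done_criteria: tuple         # conditions that must hold before completion
    review_threshold: int        # changed-file count above which a human reviews
    blocked_actions: frozenset = frozenset()

    def permits(self, tool, repo, action=""):
        """True only if the tool, repo, and action are all inside the contract."""
        return (
            tool in self.allowed_tools
            and repo in self.repo_scope
            and action not in self.blocked_actions
        )

    def needs_review(self, changed_files):
        return changed_files > self.review_threshold


if __name__ == "__main__":
    contract = RunContract(
        trigger="CVE advisory",
        repo_scope=("payments-api",),
        allowed_tools=frozenset({"Read", "Edit", "Bash"}),
        done_criteria=("tests pass", "advisory no longer reported"),
        review_threshold=5,
        blocked_actions=frozenset({"publish-package"}),
    )
    print(contract.permits("Edit", "payments-api"))
    print(contract.permits("Bash", "payments-api", action="publish-package"))
```

<p>Freezing the record means the agent's own run cannot widen its scope after the trigger has fired.</p>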
|
|
118
|
+
|
|
119
|
+
<h2>Why controls outside context matter</h2>
|
|
120
|
+
<p>A prompt rule is useful until the model forgets it, compresses it away, misunderstands it, or decides a new situation is an exception. A pre-action control does not depend on the model remembering the rule. It sees the proposed tool call and returns allow, warn, block, or review.</p>
|
|
121
|
+
<p>That is a different category of safety. It is not a bigger memory. It is a runtime boundary.</p>
|
|
122
|
+
|
|
123
|
+
<div class="mini-grid">
|
|
124
|
+
<div class="mini-card">
|
|
125
|
+
<h3>Memory answers what happened</h3>
|
|
126
|
+
<p>It stores prior runs, feedback, conventions, and task context so the agent stops starting from zero.</p>
|
|
127
|
+
</div>
|
|
128
|
+
<div class="mini-card">
|
|
129
|
+
<h3>Controls answer whether this may run</h3>
|
|
130
|
+
<p>They block known-bad actions before execution and preserve proof that the decision happened.</p>
|
|
131
|
+
</div>
|
|
132
|
+
</div>
|
|
133
|
+
|
|
134
|
+
<h2>The high-ROI starting workflows</h2>
|
|
135
|
+
<p>Start with work that already has clear, verifiable criteria. That gives the control layer a concrete success standard instead of a vague promise.</p>
|
|
136
|
+
<ul>
|
|
137
|
+
<li><strong>CVE remediation:</strong> trigger from advisory, limit repo scope, run tests, block unsafe dependency changes, create PR evidence.</li>
|
|
138
|
+
<li><strong>CI/CD migrations:</strong> enforce branch, environment, and secret boundaries before the agent edits pipelines.</li>
|
|
139
|
+
<li><strong>Test generation:</strong> require failing-before/passing-after proof before marking a run complete.</li>
|
|
140
|
+
<li><strong>Documentation updates:</strong> block edits that cite unsupported claims, stale endpoints, or missing proof links.</li>
|
|
141
|
+
<li><strong>Legal or regulated intake:</strong> block advice-shaped responses, confidential egress, and unapproved model calls before they happen.</li>
|
|
142
|
+
</ul>
|
|
143
|
+
|
|
144
|
+
<h2>How ThumbGate fits next to background-agent platforms</h2>
|
|
145
|
+
<p>Background-agent platforms provide orchestration, environments, and fleet execution. ThumbGate should not pretend to replace that stack. It should attach as the local enforcement and proof layer across agents, models, repos, and workflows.</p>
|
|
146
|
+
<p>The integration shape is simple: when an agent proposes an action, ThumbGate evaluates the action against local rules and prior failures. If the action is safe, it proceeds and logs evidence. If the action matches a known-bad pattern, it blocks or routes to review before the tool runs.</p>
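The integration shape described above can be sketched as a thin wrapper around tool execution. This is a hedged illustration only: `gateToolCall`, the `knownBad` pattern list, and the evidence entry fields are hypothetical names chosen for the example, not ThumbGate's real interface.

```javascript
// Hypothetical sketch: function and field names are illustrative only.
// knownBad holds patterns learned from prior failures, each with a routing
// action ("block" or "review") and the lesson it came from.
function gateToolCall(call, knownBad, runTool, evidence) {
  const hit = knownBad.find((p) => p.pattern.test(call.input));
  const entry = {
    tool: call.tool,
    input: call.input,
    decision: hit ? hit.action : "allow",
    matchedLesson: hit ? hit.lesson : null,
    at: new Date().toISOString(),
  };
  // Evidence is recorded whether or not the tool ends up running.
  evidence.push(entry);
  if (hit) {
    // The tool is never invoked on a known-bad match.
    return { ran: false, decision: entry.decision };
  }
  return { ran: true, decision: "allow", result: runTool(call) };
}
```

Recording the entry before the branch means blocked and allowed actions leave the same kind of trail for a reviewer.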
|
|
147
|
+
|
|
148
|
+
<div class="callout callout-green">
|
|
149
|
+
<strong>Sales line:</strong> If your team already has agents, ThumbGate helps you ship the system around them: pre-action controls, reviewable evidence, and local rules that survive model churn.
|
|
150
|
+
</div>
|
|
151
|
+
|
|
152
|
+
<h2>What to show in a buyer demo</h2>
|
|
153
|
+
<ol>
|
|
154
|
+
<li>One trigger with a clear contract: repo, task, allowed tools, and done criteria.</li>
|
|
155
|
+
<li>One proposed risky action stopped before execution.</li>
|
|
156
|
+
<li>One safe action allowed with evidence attached.</li>
|
|
157
|
+
<li>One failure converted into a reusable rule.</li>
|
|
158
|
+
<li>One export that a reviewer, security lead, or risk officer can inspect later.</li>
|
|
159
|
+
</ol>
|
|
160
|
+
|
|
161
|
+
<div class="cta-box">
|
|
162
|
+
<h2 style="color:var(--text);font-size:1.3rem;margin:0 0 8px;">Add the control layer before agents scale</h2>
|
|
163
|
+
<p>Install local pre-action gates, then decide which workflows deserve hosted evidence, team rules, and audit exports.</p>
|
|
164
|
+
<div class="cta-install">$ npx thumbgate init</div>
|
|
165
|
+
</div>
|
|
166
|
+
|
|
167
|
+
<div class="related">
|
|
168
|
+
<h3>Related articles</h3>
|
|
169
|
+
<a href="/learn/mcp-pre-action-checks-explained">MCP Pre-Action Checks Explained →</a>
|
|
170
|
+
<a href="/learn/ai-agent-persistent-memory">AI Agent Persistent Memory →</a>
|
|
171
|
+
<a href="/learn/regulated-agent-execution-boundary">Regulated Agent Execution Boundary →</a>
|
|
172
|
+
<a href="/learn/ac-dc-runtime-enforcement">AC/DC Runtime Enforcement →</a>
|
|
173
|
+
<a href="/learn/feedback-loop-vs-decision-layer">The Feedback Loop vs the Decision Layer →</a>
|
|
174
|
+
<a href="/ai-malpractice-prevention">Pre-Execution Controls for Legal AI Agents →</a>
|
|
175
|
+
</div>
|
|
176
|
+
</div>
|
|
177
|
+
|
|
178
|
+
<div class="sticky-cta">
|
|
179
|
+
<span style="color:var(--muted)">Try it now:</span>
|
|
180
|
+
<code>npx thumbgate init</code>
|
|
181
|
+
<a href="https://github.com/IgorGanapolsky/ThumbGate" target="_blank" rel="noopener">GitHub →</a>
|
|
182
|
+
</div>
|
|
183
|
+
</body>
|
|
184
|
+
</html>
|
|
@@ -0,0 +1,205 @@
|
|
|
1
|
+
<!DOCTYPE html>
|
|
2
|
+
<html lang="en">
|
|
3
|
+
<head>
|
|
4
|
+
<meta charset="UTF-8">
|
|
5
|
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
|
6
|
+
<title>Claude Code /goal vs Todo: The 4-Field Pattern That Actually Holds — ThumbGate</title>
|
|
7
|
+
<script defer data-domain="thumbgate.ai" src="https://plausible.io/js/script.js"></script>
|
|
8
|
+
<meta name="description" content="Treating /goal like a todo wastes Claude Code. The pattern that holds: clear goal, measurable success, shown proof, hard limits. Maps directly to ThumbGate's rubric-engine for enforced verification.">
|
|
9
|
+
<meta name="keywords" content="claude code /goal, claude code goal command, agent goal pattern, measurable success criteria, rubric engine, AI agent verification, ThumbGate rubric">
|
|
10
|
+
<meta property="og:title" content="Claude Code /goal vs Todo: The 4-Field Pattern That Actually Holds">
|
|
11
|
+
<meta property="og:description" content="The /goal command becomes ten times more useful when you treat it as a verifiable contract, not a todo. Here is the 4-field pattern and how to enforce it.">
|
|
12
|
+
<meta property="og:type" content="article">
|
|
13
|
+
<meta property="og:url" content="https://thumbgate.ai/learn/claude-code-goal-with-rubrics">
|
|
14
|
+
<link rel="canonical" href="https://thumbgate.ai/learn/claude-code-goal-with-rubrics">
|
|
15
|
+
|
|
16
|
+
<script type="application/ld+json">
|
|
17
|
+
{
|
|
18
|
+
"@context": "https://schema.org",
|
|
19
|
+
"@type": "TechArticle",
|
|
20
|
+
"headline": "Claude Code /goal vs Todo: The 4-Field Pattern That Actually Holds",
|
|
21
|
+
"description": "Treating /goal like a todo wastes Claude Code. The pattern that holds: clear goal, measurable success, shown proof, hard limits — enforced by ThumbGate rubrics.",
|
|
22
|
+
"author": {
|
|
23
|
+
"@type": "Person",
|
|
24
|
+
"name": "Igor Ganapolsky",
|
|
25
|
+
"url": "https://github.com/IgorGanapolsky"
|
|
26
|
+
},
|
|
27
|
+
"publisher": {
|
|
28
|
+
"@type": "Organization",
|
|
29
|
+
"name": "ThumbGate",
|
|
30
|
+
"url": "https://thumbgate.ai"
|
|
31
|
+
},
|
|
32
|
+
"datePublished": "2026-05-18",
|
|
33
|
+
"dateModified": "2026-05-18",
|
|
34
|
+
"mainEntityOfPage": "https://thumbgate.ai/learn/claude-code-goal-with-rubrics",
|
|
35
|
+
"about": [
|
|
36
|
+
{"@type": "Thing", "name": "Claude Code /goal command"},
|
|
37
|
+
{"@type": "Thing", "name": "AI agent rubric"},
|
|
38
|
+
{"@type": "Thing", "name": "verifiable AI agent outcomes"}
|
|
39
|
+
]
|
|
40
|
+
}
|
|
41
|
+
</script>
|
|
42
|
+
|
|
43
|
+
<link rel="stylesheet" href="/learn/learn.css">
|
|
44
|
+
<style>
|
|
45
|
+
table { width: 100%; border-collapse: collapse; margin: 1rem 0; }
|
|
46
|
+
th, td { text-align: left; padding: 0.6rem 0.8rem; border-bottom: 1px solid var(--border); font-size: 0.9rem; vertical-align: top; }
|
|
47
|
+
th { color: var(--cyan); font-weight: 600; }
|
|
48
|
+
.mapping-row td:first-child { color: var(--green); font-weight: 500; }
|
|
49
|
+
pre.example { background: var(--bg-raised); padding: 14px 16px; border-radius: 8px; font-size: 13px; border-left: 3px solid var(--cyan); margin: 12px 0; overflow-x: auto; }
|
|
50
|
+
</style>
|
|
51
|
+
</head>
|
|
52
|
+
<body>
|
|
53
|
+
|
|
54
|
+
<nav>
|
|
55
|
+
<a href="/" class="brand"><img src="/assets/brand/thumbgate-mark-inline.svg" alt="ThumbGate" class="logo-mark" width="28" height="28"><span class="logo-text">ThumbGate</span></a>
|
|
56
|
+
<a href="/guide">Setup Guide</a>
|
|
57
|
+
<a href="/learn">Learn</a>
|
|
58
|
+
<a href="/dashboard">Dashboard</a>
|
|
59
|
+
<a href="https://github.com/IgorGanapolsky/ThumbGate" target="_blank" rel="noopener">GitHub</a>
|
|
60
|
+
</nav>
|
|
61
|
+
|
|
62
|
+
<div class="container">
|
|
63
|
+
<div class="breadcrumb"><a href="/learn">Learn</a> / Claude Code /goal With Rubrics</div>
|
|
64
|
+
<h1>Claude Code /goal vs Todo: The 4-Field Pattern That Actually Holds</h1>
|
|
65
|
+
<p style="color:var(--muted);">6 min read · For developers using Claude Code's /goal command in production</p>
|
|
66
|
+
|
|
67
|
+
<div class="tldr"><strong>TL;DR:</strong> Treating <code>/goal</code> like a todo wastes the command. The pattern that holds: <strong>clear goal, measurable success, shown proof, hard limits.</strong> That's a 4-field rubric — the same shape ThumbGate's rubric-engine enforces at gate-fire time. Pair them and the agent cannot fake completion.</div>
|
|
68
|
+
|
|
69
|
+
<h2>The todo-shaped /goal anti-pattern</h2>
|
|
70
|
+
<p>A common usage of Claude Code's <code>/goal</code> command looks like this:</p>
|
|
71
|
+
|
|
72
|
+
<pre class="example">/goal fix the auth bug</pre>
|
|
73
|
+
|
|
74
|
+
<p>This is a todo, not a goal. Neither you nor the agent can tell when the work is actually done. The agent tends to declare success after the first plausible-looking change, you discover it was wrong an hour later, and the same conversation re-enters the loop.</p>
|
|
75
|
+
|
|
76
|
+
<div class="callout">
|
|
77
|
+
<strong>The verifiable-contract reframe:</strong> A goal is not a wish. It is a contract with a clear outcome, a measurable success criterion, an inspectable proof, and a stop condition. Anything short of all four is a todo.
|
|
78
|
+
</div>
|
|
79
|
+
|
|
80
|
+
<h2>The 4-field pattern</h2>
|
|
81
|
+
<p>Each field maps to a specific failure mode the agent commits when the field is missing:</p>
|
|
82
|
+
|
|
83
|
+
<table>
|
|
84
|
+
<thead>
|
|
85
|
+
<tr>
|
|
86
|
+
<th>Field</th>
|
|
87
|
+
<th>What it answers</th>
|
|
88
|
+
<th>Failure mode if missing</th>
|
|
89
|
+
</tr>
|
|
90
|
+
</thead>
|
|
91
|
+
<tbody>
|
|
92
|
+
<tr class="mapping-row">
|
|
93
|
+
<td>1. Clear goal</td>
|
|
94
|
+
<td>What is the outcome, in one sentence, that someone outside the project would understand?</td>
|
|
95
|
+
<td>Scope creep. Agent expands the task to whatever it can find.</td>
|
|
96
|
+
</tr>
|
|
97
|
+
<tr class="mapping-row">
|
|
98
|
+
<td>2. Measurable success</td>
|
|
99
|
+
<td>What single check, run after the work, returns 0 / pass / a specific number?</td>
|
|
100
|
+
<td>"Looks done." Agent declares completion on optimistic intermediate signals.</td>
|
|
101
|
+
</tr>
|
|
102
|
+
<tr class="mapping-row">
|
|
103
|
+
<td>3. Shown proof</td>
|
|
104
|
+
<td>What output, file, or screenshot will be in the final message proving the check ran?</td>
|
|
105
|
+
<td>Hallucinated completion. Agent reports a green test that was never executed.</td>
|
|
106
|
+
</tr>
|
|
107
|
+
<tr class="mapping-row">
|
|
108
|
+
<td>4. Hard limits</td>
|
|
109
|
+
<td>What is the deadline, retry cap, or scope cliff that stops the work even if not yet "done"?</td>
|
|
110
|
+
<td>Infinite spin. Agent keeps trying variants past any reasonable budget.</td>
|
|
111
|
+
</tr>
|
|
112
|
+
</tbody>
|
|
113
|
+
</table>
|
|
114
|
+
|
|
115
|
+
<h2>Concrete /goal phrasing</h2>
|
|
116
|
+
<p>Same task, todo-shape vs goal-shape:</p>
|
|
117
|
+
|
|
118
|
+
<pre class="example"><strong>Todo shape (anti-pattern):</strong>
|
|
119
|
+
/goal fix the auth bug
|
|
120
|
+
|
|
121
|
+
<strong>Goal shape:</strong>
|
|
122
|
+
/goal fix the auth bug
|
|
123
|
+
success: npm test -- --testPathPattern=auth returns exit 0 with 12 passing
|
|
124
|
+
proof: paste the final 'PASS auth' line of the test output
|
|
125
|
+
limit: stop after 3 implementation attempts; if still failing, file a /thumbsdown</pre>
|
|
126
|
+
|
|
127
|
+
<p>The agent now has a contract it cannot fake. The success criterion is mechanical (exit code + count). The proof is inspectable (a literal line of output). The limit is a stop condition that triggers a feedback capture instead of a fifth wasted attempt.</p>
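The goal-shaped contract above is mechanical enough to check in code. The sketch below is one possible reading of it, under stated assumptions: the `contract` object, the `judgeAttempt` helper, and the Jest-style summary lines it parses (`PASS ...` and `Tests: N passed`) are all hypothetical choices for illustration, not something Claude Code or ThumbGate defines.

```javascript
// Hypothetical sketch of checking the goal-shaped contract above.
// Assumes Jest-style output with a "PASS <file>" line and a
// "Tests: <n> passed, ..." summary line.
const contract = {
  goal: "fix the auth bug",
  expectedExitCode: 0,
  expectedPassing: 12,
  proofPattern: /^PASS\b.*auth.*$/m, // the literal line to paste as proof
  maxAttempts: 3,
};

// run: { exitCode, output, attempt } describing one test invocation.
function judgeAttempt(run, c = contract) {
  const summary = /^Tests:\s+(\d+) passed/m.exec(run.output);
  const passingCount = summary ? Number(summary[1]) : 0;
  const proof = c.proofPattern.exec(run.output);
  const success =
    run.exitCode === c.expectedExitCode &&
    passingCount === c.expectedPassing &&
    proof !== null;
  if (success) return { status: "done", proof: proof[0] };
  // Limit reached: stop and hand off to feedback instead of retrying.
  if (run.attempt >= c.maxAttempts) return { status: "stop", proof: null };
  return { status: "retry", proof: null };
}
```

A run only counts as done when the exit code, the passing count, and the proof line all agree, which mirrors the three mechanical fields of the contract.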
|
|
128
|
+
|
|
129
|
+
<h2>Where ThumbGate plugs in</h2>
|
|
130
|
+
<p>The 4-field pattern is exactly the shape of a rubric. ThumbGate's <code>scripts/rubric-engine.js</code> evaluates each completed agent task against a 4-field rubric:</p>
|
|
131
|
+
|
|
132
|
+
<table>
|
|
133
|
+
<thead>
|
|
134
|
+
<tr>
|
|
135
|
+
<th>Claude Code /goal field</th>
|
|
136
|
+
<th>ThumbGate rubric field</th>
|
|
137
|
+
<th>What ThumbGate does</th>
|
|
138
|
+
</tr>
|
|
139
|
+
</thead>
|
|
140
|
+
<tbody>
|
|
141
|
+
<tr class="mapping-row">
|
|
142
|
+
<td>Clear goal</td>
|
|
143
|
+
<td><code>rubric.goal</code></td>
|
|
144
|
+
<td>Stored as the canonical task description in the feedback log</td>
|
|
145
|
+
</tr>
|
|
146
|
+
<tr class="mapping-row">
|
|
147
|
+
<td>Measurable success</td>
|
|
148
|
+
<td><code>rubric.verification.check</code></td>
|
|
149
|
+
<td>The check is run before the agent's "done" claim is promoted to memory</td>
|
|
150
|
+
</tr>
|
|
151
|
+
<tr class="mapping-row">
|
|
152
|
+
<td>Shown proof</td>
|
|
153
|
+
<td><code>rubric.verification.evidence</code></td>
|
|
154
|
+
<td>Captured into the lesson DB; missing proof = no promotion</td>
|
|
155
|
+
</tr>
|
|
156
|
+
<tr class="mapping-row">
|
|
157
|
+
<td>Hard limits</td>
|
|
158
|
+
<td><code>rubric.budget</code></td>
|
|
159
|
+
<td>Tied to <code>budget-guard.js</code>: when the limit is hit, ThumbGate forces a thumbs-down capture and surfaces it for review</td>
|
|
160
|
+
</tr>
|
|
161
|
+
</tbody>
|
|
162
|
+
</table>
|
|
163
|
+
|
|
164
|
+
<div class="callout callout-green">
|
|
165
|
+
<strong>Two layers, same shape:</strong> Claude Code's <code>/goal</code> sets the contract at task start. ThumbGate's rubric-engine enforces it at task end. The agent cannot promote a "done" memory unless every rubric field has proof attached.
|
|
166
|
+
</div>
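As a rough illustration of the "no promotion without every field" rule, the sketch below checks a rubric object for the four fields named in the table. The field paths follow the table, but the object layout and the `missingRubricFields` and `canPromote` helpers are assumptions made for this example and are not the contents of `scripts/rubric-engine.js`.

```javascript
// Hypothetical sketch: field paths follow the table above, but the object
// layout and these helpers are assumptions, not rubric-engine.js itself.
function missingRubricFields(rubric) {
  const required = {
    "goal": rubric?.goal,
    "verification.check": rubric?.verification?.check,
    "verification.evidence": rubric?.verification?.evidence,
    "budget": rubric?.budget,
  };
  // A field counts as missing when it is absent, null, or an empty string.
  return Object.entries(required)
    .filter(([, value]) => value === undefined || value === null || value === "")
    .map(([name]) => name);
}

// A "done" claim is promotable only when nothing is missing.
function canPromote(rubric) {
  return missingRubricFields(rubric).length === 0;
}

const example = {
  goal: "fix the auth bug",
  verification: { check: "npm test -- --testPathPattern=auth", evidence: "" },
  budget: { maxAttempts: 3 },
};
console.log(missingRubricFields(example));
```

Returning the list of missing fields, rather than a bare boolean, gives the reviewer something specific to act on when a promotion is refused.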
|
|
167
|
+
|
|
168
|
+
<h2>What this prevents in practice</h2>
|
|
169
|
+
<p>Three concrete agent failure modes the paired pattern blocks:</p>
|
|
170
|
+
|
|
171
|
+
<ul>
|
|
172
|
+
<li><strong>Test-skipping.</strong> Agent reports "tests pass" but never ran them. The rubric's <code>verification.check</code> runs the actual command and refuses the "done" promotion if the exit code is non-zero.</li>
|
|
173
|
+
<li><strong>Optimistic completion.</strong> Agent declares success on the first plausible change. The proof field requires an inspectable artifact (test output line, screenshot, exit code) that has to exist before promotion.</li>
|
|
174
|
+
<li><strong>Budget runaway.</strong> Agent retries silently across 8+ attempts. The hard limit triggers a budget-guard block and forces a feedback capture, so the same pattern can be caught on the next prompt.</li>
|
|
175
|
+
</ul>
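The budget runaway case in the list above comes down to a retry cap with a forced handoff. A minimal sketch, assuming a synchronous `attemptFn` that reports an exit code: the `runWithBudget` name, the callback, and the `"thumbs-down"` signal string are hypothetical and do not describe ThumbGate's `budget-guard.js`.

```javascript
// Hypothetical sketch of a retry cap; not ThumbGate's budget-guard.js.
// attemptFn(attempt) performs one attempt and returns { exitCode }.
function runWithBudget(attemptFn, maxAttempts, onBudgetExceeded) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const result = attemptFn(attempt);
    if (result.exitCode === 0) {
      return { status: "done", attempts: attempt };
    }
  }
  // Budget spent: capture feedback instead of silently trying again.
  onBudgetExceeded({ attempts: maxAttempts, signal: "thumbs-down" });
  return { status: "blocked", attempts: maxAttempts };
}
```

Because the loop bound is fixed up front, the ninth silent retry cannot happen; the only exits are a passing check or an explicit feedback capture.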
|
|
176
|
+
|
|
177
|
+
<h2>Five minutes to wire it</h2>
|
|
178
|
+
<p>In your project root:</p>
|
|
179
|
+
|
|
180
|
+
<pre><code>npx thumbgate init</code></pre>
|
|
181
|
+
|
|
182
|
+
<p>Then, in any conversation, use the goal-shape phrasing above. ThumbGate's rubric-engine runs as part of the PreToolUse hook chain; no extra config required. The first time the agent tries to promote a "done" claim without proof, you'll see the gate fire in your dashboard.</p>
|
|
183
|
+
|
|
184
|
+
<div class="cta-box">
|
|
185
|
+
<h2 style="color:var(--text);font-size:1.3rem;margin:0 0 8px;">Pair /goal with a rubric the agent cannot fake.</h2>
|
|
186
|
+
<p>Works with Claude Code, Cursor, Codex, Gemini, Amp, OpenCode, and any MCP-compatible agent.</p>
|
|
187
|
+
<div class="cta-install">$ npx thumbgate init</div>
|
|
188
|
+
</div>
|
|
189
|
+
|
|
190
|
+
<div class="related">
|
|
191
|
+
<h3>Related articles</h3>
|
|
192
|
+
<a href="/learn/agent-harness-pattern">The Agent Harness Pattern: Why Your AI Needs a Seatbelt →</a>
|
|
193
|
+
<a href="/learn/agent-swarms-shared-gates">Agent Swarms: One Gate Layer, Every Model →</a>
|
|
194
|
+
<a href="/learn/mcp-pre-action-checks-explained">MCP Pre-Action Checks Explained →</a>
|
|
195
|
+
</div>
|
|
196
|
+
</div>
|
|
197
|
+
|
|
198
|
+
|
|
199
|
+
<div class="sticky-cta">
|
|
200
|
+
<span style="color:var(--muted)">Try it now:</span>
|
|
201
|
+
<code>npx thumbgate init</code>
|
|
202
|
+
<a href="https://github.com/IgorGanapolsky/ThumbGate" target="_blank" rel="noopener">GitHub →</a>
|
|
203
|
+
</div>
|
|
204
|
+
</body>
|
|
205
|
+
</html>
|