verifiable-thinking-mcp 0.4.1 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +125 -128
  2. package/dist/index.js +536 -0
  3. package/package.json +2 -1
package/README.md CHANGED
@@ -1,56 +1,33 @@
  <div align="center">

- # Verifiable Thinking
+ <img src="assets/header.svg" alt="Verifiable Thinking MCP" width="800" />

- **Your LLM is confidently wrong 40% of the time on reasoning questions.**<br>
- **This fixes that.**
+ **Your LLM is confidently wrong 40% of the time on reasoning questions. This fixes that.**

- [![npm](https://img.shields.io/npm/v/verifiable-thinking-mcp?color=blue)](https://www.npmjs.com/package/verifiable-thinking-mcp)
- [![CI](https://img.shields.io/github/actions/workflow/status/CoderDayton/verifiable-thinking-mcp/ci.yml?label=CI)](https://github.com/CoderDayton/verifiable-thinking-mcp/actions)
+ [![npm version](https://img.shields.io/npm/v/verifiable-thinking-mcp?color=blue&label=npm)](https://www.npmjs.com/package/verifiable-thinking-mcp)
+ [![CI](https://img.shields.io/github/actions/workflow/status/CoderDayton/verifiable-thinking-mcp/ci.yml?label=CI)](https://github.com/CoderDayton/verifiable-thinking-mcp/actions/workflows/ci.yml)
  [![codecov](https://codecov.io/gh/CoderDayton/verifiable-thinking-mcp/branch/main/graph/badge.svg)](https://codecov.io/gh/CoderDayton/verifiable-thinking-mcp)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

- [Why This Exists](#why-this-exists) [Quick Start](#quick-start) [Features](#features) [vs Sequential Thinking](#vs-sequential-thinking)
+ *15 trap patterns detected in <1ms. No LLM calls. Just pattern matching.*
+
+ [Quick Start](#quick-start) • [Features](#features) • [Trap Detection](#trap-detection) • [API](#tools)

  </div>

  ---

- ## The Problem
-
- Ask Claude or GPT this:
-
- > *A bat and ball cost $1.10. The bat costs $1 more than the ball. How much does the ball cost?*
-
- **40% of the time, it answers $0.10.** Confidently. With reasoning. And it's wrong.
-
- The correct answer is $0.05 (because $0.05 + $1.05 = $1.10).
-
- This isn't a cherry-picked example. LLMs fail predictably on cognitive traps:
- - Lily pad doubling problems
- - Monty Hall scenarios
- - Base rate fallacies
- - Gambler's fallacy questions
-
- They fail because they pattern-match to *similar-looking* problems instead of reasoning through the actual structure.
-
- ## The Solution
-
  ```
- ┌─────────────────────────────────────────────────────────────────┐
- │  "A bat and ball cost $1.10. The bat costs $1 more..."          │
- │                              ↓                                  │
- │  TRAP DETECTED: additive_system                                 │
- │  ⚠️ Don't subtract $1 from $1.10. Set up: x + (x+1) = 1.10      │
- │                              ↓                                  │
- │  LLM receives warning BEFORE reasoning starts                   │
- │                              ↓                                  │
- │  Answer: $0.05 ✓                                                │
- └─────────────────────────────────────────────────────────────────┘
+ ┌────────────────────────────────────────────────────────────────┐
+ │  "A bat and ball cost $1.10. The bat costs $1 more..."         │
+ │                              ↓                                 │
+ │  TRAP DETECTED: additive_system                                │
+ │  > Don't subtract $1 from $1.10. Set up: x + (x+1) = 1.10      │
+ │                              ↓                                 │
+ │  Answer: $0.05 (not $0.10)                                     │
+ └────────────────────────────────────────────────────────────────┘
  ```

- **Verifiable Thinking** detects 15 cognitive trap patterns in <1ms and warns the LLM before it starts reasoning. No extra LLM calls. Just pattern matching.
-
  ## Quick Start

  ```bash
@@ -70,137 +47,159 @@ Add to Claude Desktop (`claude_desktop_config.json`):
  }
  ```

- That's it. Claude now has trap detection built in.
+ ## Features

- ## Why This Exists
+ | | |
+ |---|---|
+ | 🎯 **Trap Detection** | 15 patterns (bat-ball, Monty Hall, base rate) caught before reasoning starts |
+ | ⚔️ **Auto-Challenge** | Forces counterarguments when confidence >95%—no more overconfident wrong answers |
+ | 🔍 **Contradiction Detection** | Catches "Let x=5" then "Now x=10" across steps |
+ | 🌿 **Hypothesis Branching** | Explore alternatives, auto-detects when branches confirm/refute |
+ | 🔢 **Local Math** | Evaluates expressions without LLM round-trips |
+ | 🗜️ **Smart Compression** | 56.8% token savings with query-aware CPC compression |
+ | ⚡ **Real Token Counting** | Tiktoken integration—3,922× cache speedup, zero estimation error |

- I got tired of LLMs being confidently wrong.
+ ## Token Efficiency

- Not wrong about obscure facts—wrong about basic math and logic. The kind of problems where a human who *thought carefully* would get it right, but an LLM pattern-matches to the wrong template and produces a confident, well-reasoned, incorrect answer.
+ Every operation counts. Verifiable Thinking uses **real token counting** (tiktoken) and **intelligent compression** to cut costs by 50-60% without sacrificing reasoning quality.

- The MCP ecosystem had "Sequential Thinking"—a tool that helps LLMs think step-by-step. But step-by-step reasoning doesn't help if you're reasoning toward the wrong answer from the start.
+ ```typescript
+ // Traditional reasoning: ~1,350 tokens for 10-step chain
+ // Verifiable Thinking: ~580 tokens (56.8% savings)

- So I built this. **22,000+ lines of code. 1,831 tests. 15 trap detectors.** All to catch the patterns that make LLMs fail.
+ // Real token counting (not estimation)
+ countTokens("What is 2+2?") // → 7 tokens (not 3)
+ // Cache speedup: 3,922× faster on repeated strings

- ## Features
+ // Compress before processing (not just storage)
+ scratchpad({
+   operation: "step",
+   thought: "Long analysis...", // 135 tokens → 72 tokens
+   compress: true
+ })

- | Feature | What It Does | Why It Matters |
- |---------|--------------|----------------|
- | **Trap Detection** | 15 cognitive trap patterns detected in <1ms | Warns LLM *before* it reasons toward wrong answer |
- | **Auto-Challenge** | Forces counterarguments when confidence >95% | Catches overconfident mistakes |
- | **Contradiction Detection** | Spots "Let x=5" then "Now x=10" in reasoning chains | Prevents reasoning drift |
- | **Confidence Tracking** | Monitors per-step and chain-average confidence | Flags suspiciously stable overconfidence |
- | **Local Math** | Evaluates expressions without LLM calls | Catches arithmetic errors instantly |
- | **Budget Control** | Token tracking with soft/hard limits | Prevents runaway reasoning chains |
+ // Budget controls
+ scratchpad({
+   warn_at_tokens: 2000, // Soft warning
+   hard_limit_tokens: 5000 // Hard stop
+ })
+ ```

- <details>
- <summary><strong>All 15 Trap Patterns</strong></summary>
-
- | Pattern | Classic Example | The Trap |
- |---------|-----------------|----------|
- | `additive_system` | Bat and ball | Subtract instead of solve equations |
- | `nonlinear_growth` | Lily pad doubling | Linear interpolation on exponential |
- | `rate_pattern` | 5 machines, 5 minutes | Incorrect scaling |
- | `harmonic_mean` | Round-trip average speed | Arithmetic mean for rates |
- | `independence` | Coin flip sequence | Gambler's fallacy |
- | `pigeonhole` | Socks in the dark | Underestimate worst case |
- | `base_rate` | Medical test accuracy | Ignore prevalence |
- | `factorial_counting` | Trailing zeros in n! | Simple division |
- | `clock_overlap` | Hour/minute hand overlaps | Assume exactly 12 |
- | `conditional_probability` | Given/if probability | Ignore conditioning |
- | `conjunction_fallacy` | Linda the bank teller | More detail = more likely |
- | `monty_hall` | Door switching game | 50/50 fallacy after reveal |
- | `anchoring` | Estimation after priming | Irrelevant number influence |
- | `sunk_cost` | Should I continue? | Past investment bias |
- | `framing_effect` | "Save 200" vs "400 die" | Gain/loss framing |
+ **At scale:** 1,000 reasoning chains/day = **$4,193/year saved** (at GPT-4o pricing).

- </details>
+ See [`docs/token-optimization.md`](docs/token-optimization.md) for architecture details and benchmarks.

  ## How It Works

  ```typescript
- // Step 1: Start reasoning—trap detection runs automatically
+ // Start with a question—trap detection runs automatically
  scratchpad({
    operation: "step",
-   question: "A bat and ball cost $1.10. The bat costs $1 more than the ball...",
-   thought: "Let me work this out systematically",
-   confidence: 0.8
- })
- // → Returns trap_analysis: { pattern: "additive_system", warning: "..." }
-
- // Step 2: Continue reasoning with the warning in context
- scratchpad({
-   operation: "step",
-   thought: "Setting up equations: ball = x, bat = x + 1.00",
+   question: "A bat and ball cost $1.10...",
+   thought: "Let ball = x, bat = x + 1.00",
    confidence: 0.9
  })
+ // → Returns trap_analysis warning

- // Step 3: Complete—auto spot-check validates answer
- scratchpad({
-   operation: "complete",
-   final_answer: "$0.05"
- })
- // Returns validation result
+ // High confidence? Auto-challenge kicks in
+ scratchpad({ operation: "step", thought: "...", confidence: 0.96 })
+ // → Returns challenge_suggestion: "What if your assumption is wrong?"
+
+ // Complete with spot-check
+ scratchpad({ operation: "complete", final_answer: "$0.05" })
  ```

- ## vs Sequential Thinking
+ ## Trap Detection

- | | Sequential Thinking | Verifiable Thinking |
- |---|:---:|:---:|
- | Trap detection | | 15 patterns |
- | Auto-challenge | | |
- | Contradiction detection | | |
- | Confidence tracking | | |
- | Local compute | | |
- | Token budgets | ❌ | ✓ |
- | Lines of code | ~100 | 22,000+ |
- | Tests | ? | 1,831 |
+ | Pattern | What It Catches |
+ |---------|-----------------|
+ | `additive_system` | Bat-ball, widget-gadget (subtract instead of solve) |
+ | `nonlinear_growth` | Lily pad doubling (linear interpolation) |
+ | `monty_hall` | Door switching (50/50 fallacy) |
+ | `base_rate` | Medical tests (ignoring prevalence) |
+ | `independence` | Coin flips (gambler's fallacy) |

- Sequential Thinking helps you think step-by-step.<br>
- Verifiable Thinking catches you when you're stepping in the wrong direction.
+ <details>
+ <summary>All 15 patterns</summary>
+
+ | Pattern | Trap |
+ |---------|------|
+ | `additive_system` | Subtract instead of solve |
+ | `nonlinear_growth` | Linear interpolation |
+ | `rate_pattern` | Incorrect scaling |
+ | `harmonic_mean` | Arithmetic mean for rates |
+ | `independence` | Gambler's fallacy |
+ | `pigeonhole` | Underestimate worst case |
+ | `base_rate` | Ignore prevalence |
+ | `factorial_counting` | Simple division |
+ | `clock_overlap` | Assume 12 overlaps |
+ | `conditional_probability` | Ignore conditioning |
+ | `conjunction_fallacy` | More detail = more likely |
+ | `monty_hall` | 50/50 after reveal |
+ | `anchoring` | Irrelevant number influence |
+ | `sunk_cost` | Past investment bias |
+ | `framing_effect` | Gain/loss framing |

- [Full comparison →](docs/competitive-analysis.md)
+ </details>

- ## API Reference
+ ## Tools

- <details>
- <summary><strong>scratchpad operations</strong></summary>
+ **`scratchpad`** — the main tool with 11 operations:

- | Operation | Purpose |
- |-----------|---------|
+ | Operation | What It Does |
+ |-----------|--------------|
  | `step` | Add reasoning step (trap priming on first) |
  | `complete` | Finalize with auto spot-check |
  | `revise` | Fix earlier step |
  | `branch` | Explore alternative path |
  | `challenge` | Force adversarial self-check |
  | `navigate` | View history/branches |
- | `spot_check` | Manual trap validation |
- | `hint` | Progressive algebraic help |
- | `mistakes` | Detect common errors |
- | `augment` | Evaluate math expressions |
- | `override` | Force-commit after failure |
-
- </details>

  <details>
- <summary><strong>Session management</strong></summary>
+ <summary>All operations</summary>

- - `list_sessions` List all active sessions
- - `get_session` — Get session details
- - `clear_session` Delete a session
- - `compress` CPC-style context compression
+ | Operation | Purpose |
+ |-----------|---------|
+ | `step` | Add reasoning step |
+ | `complete` | Finalize chain |
+ | `revise` | Fix earlier step |
+ | `branch` | Alternative path |
+ | `challenge` | Adversarial self-check |
+ | `navigate` | View history |
+ | `spot_check` | Manual trap check |
+ | `hint` | Progressive simplification |
+ | `mistakes` | Algebraic error detection |
+ | `augment` | Compute math expressions |
+ | `override` | Force-commit failed step |

  </details>

+ **Other tools:** `list_sessions`, `get_session`, `clear_session`, `compress`
+
+ ## vs Sequential Thinking MCP
+
+ | | Sequential Thinking | Verifiable Thinking |
+ |---|---|---|
+ | Trap detection | ❌ | 15 patterns |
+ | Auto-challenge | ❌ | >95% confidence |
+ | Contradiction detection | ❌ | ✅ |
+ | Confidence tracking | ❌ | Per-step + chain |
+ | Local compute | ❌ | ✅ |
+ | Token budgets | ❌ | Soft + hard limits |
+ | Real token counting | ❌ | Tiktoken (3,922× cache speedup) |
+ | Compression | ❌ | 56.8% token savings |
+
+ Sequential Thinking is ~100 lines. This is 22,000+ with 1,831 tests.
+
+ See [`docs/competitive-analysis.md`](docs/competitive-analysis.md) for full breakdown.
+
  ## Development

  ```bash
  git clone https://github.com/CoderDayton/verifiable-thinking-mcp.git
  cd verifiable-thinking-mcp && bun install
-
- bun run dev # MCP Inspector
+ bun run dev # Interactive MCP Inspector
  bun test # 1,831 tests
- bun run build # Production bundle
  ```

  ## License
@@ -211,8 +210,6 @@ MIT

  <div align="center">

- **[Report Bug](https://github.com/CoderDayton/verifiable-thinking-mcp/issues) · [Request Feature](https://github.com/CoderDayton/verifiable-thinking-mcp/issues) · [Discussions](https://github.com/CoderDayton/verifiable-thinking-mcp/discussions)**
-
- *Built because LLMs shouldn't be confidently wrong.*
+ **[Report Bug](https://github.com/CoderDayton/verifiable-thinking-mcp/issues) · [Request Feature](https://github.com/CoderDayton/verifiable-thinking-mcp/issues)**

  </div>
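The README in this diff describes trap detection as pure pattern matching: 15 patterns, no LLM calls, and a warning issued before reasoning starts. The detector's source is not part of this diff. The TypeScript below is therefore only a minimal sketch that assumes each trap is recognised by a few regex cues that must all match the question. `detectTraps`, `solveAdditiveSystem`, the `TrapRule` shape and both rules are hypothetical illustrations. They are not the package's API or its actual patterns.

```typescript
// Hypothetical sketch of pattern-matching trap detection; NOT the package's real detector.
// Assumption: a trap fires when every one of its regex cues matches the question text.

interface TrapRule {
  pattern: string; // trap name, borrowed from the README's pattern table
  cues: RegExp[]; // all cues must match for the rule to fire
  warning: string; // text surfaced to the LLM before it starts reasoning
}

interface TrapHit {
  pattern: string;
  warning: string;
}

// Two illustrative rules only; the package's 15 real patterns are not shown in this diff.
const RULES: TrapRule[] = [
  {
    pattern: "additive_system",
    cues: [/\bcosts?\b.*\$\d/i, /\bmore than\b/i],
    warning: "Don't subtract the difference from the total. Set up: x + (x + d) = total",
  },
  {
    pattern: "monty_hall",
    cues: [/\bdoors?\b/i, /\b(switch|stay)\b/i],
    warning: "The host's reveal is not random; in the standard game, switching wins 2/3 of the time.",
  },
];

// Plain string matching with non-global regexes (stateless .test), so no model calls are made.
function detectTraps(question: string): TrapHit[] {
  return RULES.filter((rule) => rule.cues.every((cue) => cue.test(question))).map(
    ({ pattern, warning }) => ({ pattern, warning }),
  );
}

// Solves ball + (ball + difference) = total, working in integer cents to avoid float rounding.
function solveAdditiveSystem(totalCents: number, differenceCents: number): number {
  return (totalCents - differenceCents) / 2;
}

const batAndBall =
  "A bat and ball cost $1.10. The bat costs $1 more than the ball. How much does the ball cost?";
console.log(detectTraps(batAndBall).map((hit) => hit.pattern).join(", ")); // additive_system
console.log(solveAdditiveSystem(110, 100)); // 5 (cents)
```

Integer cents keep the bat-and-ball check exact. `solveAdditiveSystem(110, 100)` returns 5, which matches the README's $0.05. The floating-point form `(1.10 - 1) / 2` does not evaluate to exactly `0.05`.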