entroplain 0.2.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,476 +1,478 @@
1
- # Entroplain
2
-
3
- **Entropy-based early exit for efficient agent reasoning.**
4
-
5
- Stop burning tokens. Know when your agent has finished thinking.
6
-
7
- ---
8
-
9
- ## What It Does
10
-
11
- Entroplain monitors your LLM's **predictive entropy** — the uncertainty in its output distribution — to detect when reasoning has converged.
12
-
13
- ```text
14
- High entropy → Model is searching, exploring, uncertain
15
- Low entropy → Model is confident, converged, ready to output
16
- ```
17
-
18
- **Key insight:** Reasoning follows a multi-modal entropy trajectory. Local minima ("valleys") mark reasoning milestones. Exit at the right valley, save 40-60% compute with minimal accuracy loss.
19
-
20
- ---
21
-
22
- ## Quick Start
23
-
24
- ### Install
25
-
26
- ```bash
27
- # Python (pip)
28
- pip install entroplain
29
-
30
- # Node.js (npm)
31
- npm install entroplain
32
- ```
33
-
34
- ### Requirements
35
-
36
- **Python:** 3.8+
37
-
38
- **Node.js:** 18+
39
-
40
- **For cloud providers:** Set API keys via environment variables:
41
-
42
- ```bash
43
- export OPENAI_API_KEY=sk-...
44
- export ANTHROPIC_API_KEY=sk-ant-...
45
- export NVIDIA_API_KEY=nvapi-...
46
- ```
47
-
48
- **For local models:** Install [Ollama](https://ollama.ai) or [llama.cpp](https://github.com/ggerganov/llama.cpp)
49
-
50
- ---
51
-
52
- ## 🚀 Works With Any Agent (Proxy Method)
53
-
54
- The **proxy** is the easiest way to use Entroplain with OpenClaw, Claude Code, or any other agent framework:
55
-
56
- ### How It Works
57
-
58
- ```
59
- Your Agent → Proxy (localhost:8765) → Real API
60
-
61
-
62
- Entropy Monitor
63
-
64
-
65
- Early Exit Check
66
- ```
67
-
68
- The proxy intercepts all LLM API calls, monitors entropy, and terminates streams when reasoning converges.
69
-
70
- ### Setup (One-Time)
71
-
72
- ```bash
73
- # Install with proxy support
74
- pip install entroplain[proxy]
75
-
76
- # Start the proxy
77
- entroplain-proxy --port 8765 --log-entropy
78
-
79
- # Point your agent to the proxy
80
- export OPENAI_BASE_URL=http://localhost:8765/v1
81
-
82
- # or for NVIDIA:
83
- export NVIDIA_BASE_URL=http://localhost:8765/v1
84
-
85
- # or for Anthropic:
86
- export ANTHROPIC_BASE_URL=http://localhost:8765/v1
87
- ```
88
-
89
- That's it! Now run your agent normally and entropy monitoring is automatic.
90
-
91
- ### Proxy Options
92
-
93
- ```bash
94
- # Monitor only, don't exit early
95
- entroplain-proxy --port 8765 --no-early-exit
96
-
97
- # Custom thresholds
98
- entroplain-proxy --port 8765 --entropy-threshold 0.2 --min-valleys 3
99
-
100
- # Enable cost tracking
101
- entroplain-proxy --port 8765 --model gpt-4o --log-entropy
102
-
103
- # Launch dashboard
104
- entroplain-dashboard --port 8050
105
- ```
106
-
107
- ---
108
-
109
- ## 🎯 Dashboard
110
-
111
- Real-time entropy visualization:
112
-
113
- ```bash
114
- # Start the dashboard
115
- entroplain-dashboard --port 8050
116
-
117
- # Open in browser
118
- open http://localhost:8050
119
- ```
120
-
121
- The dashboard shows:
122
- - **Live entropy curve** with valley markers
123
- - **Token count** and valleys detected
124
- - **Cost savings** in real-time
125
- - **Status badges** (active/idle/exited)
126
-
127
- ---
128
-
129
- ## 💰 Cost Tracking
130
-
131
- Track actual savings from early exit:
132
-
133
- ```python
134
- from entroplain import CostTracker
135
-
136
- tracker = CostTracker(model="gpt-4o")
137
- tracker.track_input(100) # 100 input tokens
138
- tracker.track_output(50) # 50 output tokens
139
- tracker.set_full_estimate(150) # Would have been 150
140
-
141
- estimate = tracker.get_estimate()
142
- print(f"Saved ${estimate.cost_saved_usd:.4f} ({estimate.savings_percent:.1f}%)")
143
- ```
144
-
145
- **Supported pricing:** GPT-4o, GPT-4-turbo, Claude 4, Llama 3.1 (NVIDIA), or custom rates.
146
-
147
- ---
148
-
149
- ## Direct Usage (Python)
150
-
151
- If you want more control, use Entroplain directly:
152
-
153
- ```python
154
- from entroplain import EntropyMonitor, NVIDIAProvider
155
-
156
- monitor = EntropyMonitor()
157
- provider = NVIDIAProvider()
158
-
159
- for token in provider.stream_with_entropy(
160
- model="meta/llama-3.1-70b-instruct",
161
- messages=[{"role": "user", "content": "Solve: x^2 = 16"}]
162
- ):
163
- monitor.track(token.token, token.entropy)
164
- print(token.token, end="")
165
-
166
- if monitor.should_exit():
167
- print("\n[Early exit - reasoning converged]")
168
- break
169
-
170
- print(f"\nStats: {monitor.get_stats()}")
171
- ```
172
-
173
- ---
174
-
175
- ## How It Works
176
-
177
- ### 1. Track Entropy Per Token
178
-
179
- Every token has an entropy value derived from the model's output distribution:
180
-
181
- ```python
182
- entropy = -sum(p * log2(p) for p in probabilities if p > 0)
183
- ```
184
-
185
- ### 2. Detect Valleys
186
-
187
- Local minima in the entropy trajectory indicate reasoning milestones:
188
-
189
- ```text
190
- Entropy: 0.8 → 0.6 → 0.3* → 0.5 → 0.2* → 0.1*
191
- ↑ ↑
192
- Valley 1 Valley 2
193
- ```
194
-
195
- ### 3. Exit at the Right Moment
196
-
197
- When valley count plateaus and velocity stabilizes, reasoning is complete.
198
-
199
- ---
200
-
201
- ## Exit Strategies
202
-
203
- Choose how Entroplain detects convergence:
204
-
205
- | Strategy | Description |
206
- |----------|-------------|
207
- | `combined` | Entropy low OR valleys plateau, AND velocity stable (default) |
208
- | `valleys_plateau` | Exit when reasoning milestones stabilize |
209
- | `entropy_drop` | Exit when model confidence is high |
210
- | `velocity_zero` | Exit when entropy stops changing |
211
- | `repetition` | Exit when model starts repeating itself |
212
- | `confidence` | Exit when top token prob > 95% for N tokens |
213
-
214
- ```python
215
- monitor = EntropyMonitor(
216
- exit_condition="repetition", # or "confidence", "combined", etc.
217
- repetition_threshold=0.3, # Exit when 30% of recent tokens repeat
218
- )
219
- ```
220
-
221
- ---
222
-
223
- ## Experimental Evidence
224
-
225
- Tested on Llama-3.1-70b via NVIDIA API:
226
-
227
- | Difficulty | Avg Valleys | Avg Entropy | Avg Velocity |
228
- |------------|-------------|-------------|--------------|
229
- | Easy | 61.3 | 0.3758 | 0.4852 |
230
- | Medium | 53.0 | 0.3267 | 0.4394 |
231
- | Hard | 70.2 | 0.2947 | 0.4095 |
232
-
233
- **Finding:** Hard problems have more entropy valleys (70.2 vs 61.3) — valleys correlate with reasoning complexity.
234
-
235
- ---
236
-
237
- ## Platform Support
238
-
239
- | Platform | Support | How to Enable |
240
- |----------|---------|---------------|
241
- | **Local (llama.cpp, Ollama)** | Full | Built-in, no config |
242
- | **OpenAI** | ✅ Yes | `logprobs: true` |
243
- | **Anthropic Claude** | ✅ Yes (Claude 4) | `logprobs: True` |
244
- | **Google Gemini** | ✅ Yes | `response_logprobs=True` |
245
- | **NVIDIA NIM** | ✅ Yes | `logprobs: true` |
246
- | **OpenRouter** | ⚠️ Partial | ~23% of models support it |
247
-
248
- ---
249
-
250
- ## Integration Examples
251
-
252
- ### OpenAI / NVIDIA / OpenRouter
253
-
254
- ```python
255
- from openai import OpenAI
256
- from entroplain import EntropyMonitor
257
-
258
- client = OpenAI()
259
- monitor = EntropyMonitor()
260
-
261
- response = client.chat.completions.create(
262
- model="gpt-4o",
263
- messages=[{"role": "user", "content": "Solve this step by step..."}],
264
- logprobs=True,
265
- top_logprobs=5,
266
- stream=True
267
- )
268
-
269
- for chunk in response:
270
- if chunk.choices[0].delta.content:
271
- token = chunk.choices[0].delta.content
272
- entropy = monitor.calculate_entropy(chunk.choices[0].logprobs)
273
-
274
- if monitor.should_exit():
275
- print("\n[Early exit — reasoning converged]")
276
- break
277
-
278
- print(token, end="")
279
- ```
280
-
281
- ### Ollama (Local)
282
-
283
- ```python
284
- import ollama
285
- from entroplain import EntropyMonitor
286
-
287
- monitor = EntropyMonitor()
288
-
289
- response = ollama.generate(
290
- model="llama3.1",
291
- prompt="Think through this carefully...",
292
- options={"num_ctx": 4096}
293
- )
294
-
295
- for token_data in response.get("token_probs", []):
296
- entropy = monitor.calculate_from_logits(token_data["logits"])
297
- monitor.track(token_data["token"], entropy)
298
- ```
299
-
300
- ### Anthropic Claude
301
-
302
- ```python
303
- from anthropic import Anthropic
304
- from entroplain import EntropyMonitor
305
-
306
- client = Anthropic()
307
- monitor = EntropyMonitor()
308
-
309
- with client.messages.stream(
310
- model="claude-sonnet-4-20250514",
311
- max_tokens=1024,
312
- messages=[{"role": "user", "content": "Analyze this..."}],
313
- ) as stream:
314
- for text in stream.text_stream:
315
- entropy = monitor.get_entropy()
316
-
317
- if monitor.should_exit():
318
- break
319
-
320
- print(text, end="", flush=True)
321
- ```
322
-
323
- ---
324
-
325
- ## CLI
326
-
327
- ```bash
328
- # Analyze a prompt's entropy trajectory
329
- entroplain analyze "What is 2+2?" --model gpt-4o
330
-
331
- # Stream with early exit
332
- entroplain stream "Explain quantum computing" --exit-on-converge
333
-
334
- # Run the proxy (works with any agent)
335
- entroplain-proxy --port 8765 --log-entropy --model gpt-4o
336
-
337
- # Launch the dashboard
338
- entroplain-dashboard --port 8050
339
-
340
- # Benchmark entropy patterns
341
- entroplain benchmark --problems gsm8k --output results.json
342
- ```
343
-
344
- ---
345
-
346
- ## API Reference
347
-
348
- ### `EntropyMonitor`
349
-
350
- ```python
351
- class EntropyMonitor:
352
- def __init__(
353
- self,
354
- entropy_threshold: float = 0.15,
355
- min_valleys: int = 2,
356
- velocity_threshold: float = 0.05,
357
- min_tokens: int = 50,
358
- exit_condition: str = "combined"
359
- ):
360
- ...
361
-
362
- def track(self, token: str, entropy: float, confidence: float = 0.0) -> EntropyPoint:
363
- """Track a token and its entropy value."""
364
-
365
- def should_exit(self) -> bool:
366
- """Determine if reasoning has converged."""
367
-
368
- def get_valleys(self) -> List[Tuple[int, float]]:
369
- """Get all entropy valleys (local minima)."""
370
-
371
- def get_stats(self) -> Dict:
372
- """Get current statistics."""
373
-
374
- def reset(self) -> None:
375
- """Clear all tracked data."""
376
- ```
377
-
378
- ### `CostTracker`
379
-
380
- ```python
381
- class CostTracker:
382
- def __init__(self, model: str = "default"):
383
- ...
384
-
385
- def track_input(self, tokens: int):
386
- """Track input tokens."""
387
-
388
- def track_output(self, tokens: int):
389
- """Track output tokens."""
390
-
391
- def set_full_estimate(self, tokens: int):
392
- """Set estimated output if no early exit."""
393
-
394
- def get_estimate(self) -> CostEstimate:
395
- """Get cost estimate with savings."""
396
- ```
397
-
398
- ### `EntropyProxy`
399
-
400
- ```bash
401
- # Run the proxy
402
- entroplain-proxy --port 8765 --log-entropy --model gpt-4o
403
-
404
- # Options
405
- --entropy-threshold 0.15 # Exit threshold
406
- --min-valleys 2 # Minimum valleys
407
- --no-early-exit # Monitor only, don't exit
408
- --log-entropy # Log entropy values
409
- --model gpt-4o # Model for cost tracking
410
- --no-cost-tracking # Disable cost tracking
411
- ```
412
-
413
- ---
414
-
415
- ## Research
416
-
417
- ### Paper
418
-
419
- See [`paper.md`](./paper.md) for the full research proposal:
420
-
421
- **"Entropy-Based Early Exit for Efficient Agent Reasoning"**
422
-
423
- ### Key Findings
424
-
425
- 1. **H1 Supported:** Entropy valleys correlate with reasoning complexity (70.2 valleys for hard problems vs 61.3 for easy)
426
- 2. **H2 Supported:** Entropy velocity differs by difficulty (0.4852 easy vs 0.4095 hard)
427
- 3. **Potential:** 40-60% compute reduction with 95%+ accuracy retention
428
-
429
- ### Citation
430
-
431
- ```bibtex
432
- @software{entroplain2026,
433
- title = {Entroplain: Entropy-Based Early Exit for Efficient Agent Reasoning},
434
- author = {Entroplain Contributors},
435
- year = {2026},
436
- url = {https://github.com/entroplain/entroplain}
437
- }
438
- ```
439
-
440
- ---
441
-
442
- ## Contributing
443
-
444
- We welcome contributions! See [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.
445
-
446
- ### Development Setup
447
-
448
- ```bash
449
- git clone https://github.com/entroplain/entroplain.git
450
- cd entroplain
451
- pip install -e ".[dev]"
452
- pytest
453
- ```
454
-
455
- ---
456
-
457
- ## License
458
-
459
- MIT License — see [LICENSE](./LICENSE) for details.
460
-
461
- ---
462
-
463
- ## Links
464
-
465
- - **PyPI:** https://pypi.org/project/entroplain/
466
- - **npm:** https://www.npmjs.com/package/entroplain
467
- - **GitHub:** https://github.com/entroplain/entroplain
468
- - **Issues:** https://github.com/entroplain/entroplain/issues
469
-
470
- ---
471
-
472
- ## Acknowledgments
473
-
474
- - Research inspired by early exit architectures in transformers
475
- - Experimental validation using NVIDIA NIM API
476
- - Built for the agent-first future of AI
1
+ # Entroplain
2
+
3
+ **Entropy-based early exit for efficient agent reasoning.**
4
+
5
+ Stop burning tokens. Know when your agent has finished thinking.
6
+
7
+ 🌐 **Website:** https://entroplain.vercel.app/
8
+
9
+ ---
10
+
11
+ ## What It Does
12
+
13
+ Entroplain monitors your LLM's **predictive entropy** — the uncertainty in its output distribution — to detect when reasoning has converged.
14
+
15
+ ```text
16
+ High entropy → Model is searching, exploring, uncertain
17
+ Low entropy → Model is confident, converged, ready to output
18
+ ```
19
+
20
+ **Key insight:** Reasoning follows a multi-modal entropy trajectory. Local minima ("valleys") mark reasoning milestones. Exit at the right valley, save 40-60% compute with minimal accuracy loss.
21
+
22
+ ---
23
+
24
+ ## Quick Start
25
+
26
+ ### Install
27
+
28
+ ```bash
29
+ # Python (pip)
30
+ pip install entroplain
31
+
32
+ # Node.js (npm)
33
+ npm install entroplain
34
+ ```
35
+
36
+ ### Requirements
37
+
38
+ **Python:** 3.8+
39
+
40
+ **Node.js:** 18+
41
+
42
+ **For cloud providers:** Set API keys via environment variables:
43
+
44
+ ```bash
45
+ export OPENAI_API_KEY=sk-...
46
+ export ANTHROPIC_API_KEY=sk-ant-...
47
+ export NVIDIA_API_KEY=nvapi-...
48
+ ```
49
+
50
+ **For local models:** Install [Ollama](https://ollama.ai) or [llama.cpp](https://github.com/ggerganov/llama.cpp)
51
+
52
+ ---
53
+
54
+ ## 🚀 Works With Any Agent (Proxy Method)
55
+
56
+ The **proxy** is the easiest way to use Entroplain with OpenClaw, Claude Code, or any other agent framework:
57
+
58
+ ### How It Works
59
+
60
+ ```
61
+ Your Agent → Proxy (localhost:8765) → Real API
62
+
63
+
64
+ Entropy Monitor
65
+
66
+
67
+ Early Exit Check
68
+ ```
69
+
70
+ The proxy intercepts all LLM API calls, monitors entropy, and terminates streams when reasoning converges.
71
+
72
+ ### Setup (One-Time)
73
+
74
+ ```bash
75
+ # Install with proxy support
76
+ pip install entroplain[proxy]
77
+
78
+ # Start the proxy
79
+ entroplain-proxy --port 8765 --log-entropy
80
+
81
+ # Point your agent to the proxy
82
+ export OPENAI_BASE_URL=http://localhost:8765/v1
83
+
84
+ # or for NVIDIA:
85
+ export NVIDIA_BASE_URL=http://localhost:8765/v1
86
+
87
+ # or for Anthropic:
88
+ export ANTHROPIC_BASE_URL=http://localhost:8765/v1
89
+ ```
90
+
91
+ That's it! Now run your agent normally and entropy monitoring is automatic.
92
+
93
+ ### Proxy Options
94
+
95
+ ```bash
96
+ # Monitor only, don't exit early
97
+ entroplain-proxy --port 8765 --no-early-exit
98
+
99
+ # Custom thresholds
100
+ entroplain-proxy --port 8765 --entropy-threshold 0.2 --min-valleys 3
101
+
102
+ # Enable cost tracking
103
+ entroplain-proxy --port 8765 --model gpt-4o --log-entropy
104
+
105
+ # Launch dashboard
106
+ entroplain-dashboard --port 8050
107
+ ```
108
+
109
+ ---
110
+
111
+ ## 🎯 Dashboard
112
+
113
+ Real-time entropy visualization:
114
+
115
+ ```bash
116
+ # Start the dashboard
117
+ entroplain-dashboard --port 8050
118
+
119
+ # Open in browser
120
+ open http://localhost:8050
121
+ ```
122
+
123
+ The dashboard shows:
124
+ - **Live entropy curve** with valley markers
125
+ - **Token count** and valleys detected
126
+ - **Cost savings** in real-time
127
+ - **Status badges** (active/idle/exited)
128
+
129
+ ---
130
+
131
+ ## 💰 Cost Tracking
132
+
133
+ Track actual savings from early exit:
134
+
135
+ ```python
136
+ from entroplain import CostTracker
137
+
138
+ tracker = CostTracker(model="gpt-4o")
139
+ tracker.track_input(100) # 100 input tokens
140
+ tracker.track_output(50) # 50 output tokens
141
+ tracker.set_full_estimate(150) # Would have been 150
142
+
143
+ estimate = tracker.get_estimate()
144
+ print(f"Saved ${estimate.cost_saved_usd:.4f} ({estimate.savings_percent:.1f}%)")
145
+ ```
146
+
147
+ **Supported pricing:** GPT-4o, GPT-4-turbo, Claude 4, Llama 3.1 (NVIDIA), or custom rates.
148
+
149
+ ---
150
+
151
+ ## Direct Usage (Python)
152
+
153
+ If you want more control, use Entroplain directly:
154
+
155
+ ```python
156
+ from entroplain import EntropyMonitor, NVIDIAProvider
157
+
158
+ monitor = EntropyMonitor()
159
+ provider = NVIDIAProvider()
160
+
161
+ for token in provider.stream_with_entropy(
162
+ model="meta/llama-3.1-70b-instruct",
163
+ messages=[{"role": "user", "content": "Solve: x^2 = 16"}]
164
+ ):
165
+ monitor.track(token.token, token.entropy)
166
+ print(token.token, end="")
167
+
168
+ if monitor.should_exit():
169
+ print("\n[Early exit - reasoning converged]")
170
+ break
171
+
172
+ print(f"\nStats: {monitor.get_stats()}")
173
+ ```
174
+
175
+ ---
176
+
177
+ ## How It Works
178
+
179
+ ### 1. Track Entropy Per Token
180
+
181
+ Every token has an entropy value derived from the model's output distribution:
182
+
183
+ ```python
184
+ entropy = -sum(p * log2(p) for p in probabilities if p > 0)
185
+ ```
186
+
187
+ ### 2. Detect Valleys
188
+
189
+ Local minima in the entropy trajectory indicate reasoning milestones:
190
+
191
+ ```text
192
+ Entropy: 0.8 → 0.6 → 0.3* → 0.5 → 0.2* → 0.1*
193
+ ↑ ↑
194
+ Valley 1 Valley 2
195
+ ```
196
+
197
+ ### 3. Exit at the Right Moment
198
+
199
+ When valley count plateaus and velocity stabilizes, reasoning is complete.
200
+
201
+ ---
202
+
203
+ ## Exit Strategies
204
+
205
+ Choose how Entroplain detects convergence:
206
+
207
+ | Strategy | Description |
208
+ |----------|-------------|
209
+ | `combined` | Entropy low OR valleys plateau, AND velocity stable (default) |
210
+ | `valleys_plateau` | Exit when reasoning milestones stabilize |
211
+ | `entropy_drop` | Exit when model confidence is high |
212
+ | `velocity_zero` | Exit when entropy stops changing |
213
+ | `repetition` | Exit when model starts repeating itself |
214
+ | `confidence` | Exit when top token prob > 95% for N tokens |
215
+
216
+ ```python
217
+ monitor = EntropyMonitor(
218
+ exit_condition="repetition", # or "confidence", "combined", etc.
219
+ repetition_threshold=0.3, # Exit when 30% of recent tokens repeat
220
+ )
221
+ ```
222
+
223
+ ---
224
+
225
+ ## Experimental Evidence
226
+
227
+ Tested on Llama-3.1-70b via NVIDIA API:
228
+
229
+ | Difficulty | Avg Valleys | Avg Entropy | Avg Velocity |
230
+ |------------|-------------|-------------|--------------|
231
+ | Easy | 61.3 | 0.3758 | 0.4852 |
232
+ | Medium | 53.0 | 0.3267 | 0.4394 |
233
+ | Hard | 70.2 | 0.2947 | 0.4095 |
234
+
235
+ **Finding:** Hard problems have more entropy valleys (70.2 vs 61.3) — valleys correlate with reasoning complexity.
236
+
237
+ ---
238
+
239
+ ## Platform Support
240
+
241
+ | Platform | Support | How to Enable |
242
+ |----------|---------|---------------|
243
+ | **Local (llama.cpp, Ollama)** | ✅ Full | Built-in, no config |
244
+ | **OpenAI** | ✅ Yes | `logprobs: true` |
245
+ | **Anthropic Claude** | ✅ Yes (Claude 4) | `logprobs: True` |
246
+ | **Google Gemini** | Yes | `response_logprobs=True` |
247
+ | **NVIDIA NIM** | ✅ Yes | `logprobs: true` |
248
+ | **OpenRouter** | ⚠️ Partial | ~23% of models support it |
249
+
250
+ ---
251
+
252
+ ## Integration Examples
253
+
254
+ ### OpenAI / NVIDIA / OpenRouter
255
+
256
+ ```python
257
+ from openai import OpenAI
258
+ from entroplain import EntropyMonitor
259
+
260
+ client = OpenAI()
261
+ monitor = EntropyMonitor()
262
+
263
+ response = client.chat.completions.create(
264
+ model="gpt-4o",
265
+ messages=[{"role": "user", "content": "Solve this step by step..."}],
266
+ logprobs=True,
267
+ top_logprobs=5,
268
+ stream=True
269
+ )
270
+
271
+ for chunk in response:
272
+ if chunk.choices[0].delta.content:
273
+ token = chunk.choices[0].delta.content
274
+ entropy = monitor.calculate_entropy(chunk.choices[0].logprobs)
275
+
276
+ if monitor.should_exit():
277
+ print("\n[Early exit — reasoning converged]")
278
+ break
279
+
280
+ print(token, end="")
281
+ ```
282
+
283
+ ### Ollama (Local)
284
+
285
+ ```python
286
+ import ollama
287
+ from entroplain import EntropyMonitor
288
+
289
+ monitor = EntropyMonitor()
290
+
291
+ response = ollama.generate(
292
+ model="llama3.1",
293
+ prompt="Think through this carefully...",
294
+ options={"num_ctx": 4096}
295
+ )
296
+
297
+ for token_data in response.get("token_probs", []):
298
+ entropy = monitor.calculate_from_logits(token_data["logits"])
299
+ monitor.track(token_data["token"], entropy)
300
+ ```
301
+
302
+ ### Anthropic Claude
303
+
304
+ ```python
305
+ from anthropic import Anthropic
306
+ from entroplain import EntropyMonitor
307
+
308
+ client = Anthropic()
309
+ monitor = EntropyMonitor()
310
+
311
+ with client.messages.stream(
312
+ model="claude-sonnet-4-20250514",
313
+ max_tokens=1024,
314
+ messages=[{"role": "user", "content": "Analyze this..."}],
315
+ ) as stream:
316
+ for text in stream.text_stream:
317
+ entropy = monitor.get_entropy()
318
+
319
+ if monitor.should_exit():
320
+ break
321
+
322
+ print(text, end="", flush=True)
323
+ ```
324
+
325
+ ---
326
+
327
+ ## CLI
328
+
329
+ ```bash
330
+ # Analyze a prompt's entropy trajectory
331
+ entroplain analyze "What is 2+2?" --model gpt-4o
332
+
333
+ # Stream with early exit
334
+ entroplain stream "Explain quantum computing" --exit-on-converge
335
+
336
+ # Run the proxy (works with any agent)
337
+ entroplain-proxy --port 8765 --log-entropy --model gpt-4o
338
+
339
+ # Launch the dashboard
340
+ entroplain-dashboard --port 8050
341
+
342
+ # Benchmark entropy patterns
343
+ entroplain benchmark --problems gsm8k --output results.json
344
+ ```
345
+
346
+ ---
347
+
348
+ ## API Reference
349
+
350
+ ### `EntropyMonitor`
351
+
352
+ ```python
353
+ class EntropyMonitor:
354
+ def __init__(
355
+ self,
356
+ entropy_threshold: float = 0.15,
357
+ min_valleys: int = 2,
358
+ velocity_threshold: float = 0.05,
359
+ min_tokens: int = 50,
360
+ exit_condition: str = "combined"
361
+ ):
362
+ ...
363
+
364
+ def track(self, token: str, entropy: float, confidence: float = 0.0) -> EntropyPoint:
365
+ """Track a token and its entropy value."""
366
+
367
+ def should_exit(self) -> bool:
368
+ """Determine if reasoning has converged."""
369
+
370
+ def get_valleys(self) -> List[Tuple[int, float]]:
371
+ """Get all entropy valleys (local minima)."""
372
+
373
+ def get_stats(self) -> Dict:
374
+ """Get current statistics."""
375
+
376
+ def reset(self) -> None:
377
+ """Clear all tracked data."""
378
+ ```
379
+
380
+ ### `CostTracker`
381
+
382
+ ```python
383
+ class CostTracker:
384
+ def __init__(self, model: str = "default"):
385
+ ...
386
+
387
+ def track_input(self, tokens: int):
388
+ """Track input tokens."""
389
+
390
+ def track_output(self, tokens: int):
391
+ """Track output tokens."""
392
+
393
+ def set_full_estimate(self, tokens: int):
394
+ """Set estimated output if no early exit."""
395
+
396
+ def get_estimate(self) -> CostEstimate:
397
+ """Get cost estimate with savings."""
398
+ ```
399
+
400
+ ### `EntropyProxy`
401
+
402
+ ```bash
403
+ # Run the proxy
404
+ entroplain-proxy --port 8765 --log-entropy --model gpt-4o
405
+
406
+ # Options
407
+ --entropy-threshold 0.15 # Exit threshold
408
+ --min-valleys 2 # Minimum valleys
409
+ --no-early-exit # Monitor only, don't exit
410
+ --log-entropy # Log entropy values
411
+ --model gpt-4o # Model for cost tracking
412
+ --no-cost-tracking # Disable cost tracking
413
+ ```
414
+
415
+ ---
416
+
417
+ ## Research
418
+
419
+ ### Paper
420
+
421
+ See [`paper.md`](./paper.md) for the full research proposal:
422
+
423
+ **"Entropy-Based Early Exit for Efficient Agent Reasoning"**
424
+
425
+ ### Key Findings
426
+
427
+ 1. **H1 Supported:** Entropy valleys correlate with reasoning complexity (70.2 valleys for hard problems vs 61.3 for easy)
428
+ 2. **H2 Supported:** Entropy velocity differs by difficulty (0.4852 easy vs 0.4095 hard)
429
+ 3. **Potential:** 40-60% compute reduction with 95%+ accuracy retention
430
+
431
+ ### Citation
432
+
433
+ ```bibtex
434
+ @software{entroplain2026,
435
+ title = {Entroplain: Entropy-Based Early Exit for Efficient Agent Reasoning},
436
+ author = {Entroplain Contributors},
437
+ year = {2026},
438
+ url = {https://github.com/entroplain/entroplain}
439
+ }
440
+ ```
441
+
442
+ ---
443
+
444
+ ## Contributing
445
+
446
+ We welcome contributions! See [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.
447
+
448
+ ### Development Setup
449
+
450
+ ```bash
451
+ git clone https://github.com/entroplain/entroplain.git
452
+ cd entroplain
453
+ pip install -e ".[dev]"
454
+ pytest
455
+ ```
456
+
457
+ ---
458
+
459
+ ## License
460
+
461
+ MIT License — see [LICENSE](./LICENSE) for details.
462
+
463
+ ---
464
+
465
+ ## Links
466
+
467
+ - **PyPI:** https://pypi.org/project/entroplain/
468
+ - **npm:** https://www.npmjs.com/package/entroplain
469
+ - **GitHub:** https://github.com/entroplain/entroplain
470
+ - **Issues:** https://github.com/entroplain/entroplain/issues
471
+
472
+ ---
473
+
474
+ ## Acknowledgments
475
+
476
+ - Research inspired by early exit architectures in transformers
477
+ - Experimental validation using NVIDIA NIM API
478
+ - Built for the agent-first future of AI