npm - entroplain - Versions diffs - 0.1.1 → 0.2.1 - Mend

entroplain 0.1.1 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (37) hide show

package/26.0.1 +0 -0
package/CONTRIBUTING.md +103 -103
package/DEPLOY.md +41 -0
package/README.md +478 -389
package/dist/entroplain-0.2.2-py3-none-any.whl +0 -0
package/dist/entroplain-0.2.2.tar.gz +0 -0
package/dist/entroplain-0.2.3-py3-none-any.whl +0 -0
package/dist/entroplain-0.2.3.tar.gz +0 -0
package/docs/AGENT_USAGE.md +178 -178
package/docs/USAGE.md +302 -302
package/entroplain/__init__.py +32 -33
package/entroplain/cost_tracker.py +231 -0
package/entroplain/dashboard.py +480 -0
package/entroplain/monitor.py +390 -272
package/entroplain/providers.py +626 -626
package/entroplain/proxy.py +561 -278
package/entroplain/shared_state.py +72 -0
package/entroplain-proxy +0 -0
package/package.json +47 -44
package/paper.md +299 -0
package/pip +0 -0
package/pyproject.toml +96 -89
package/scripts/setup.bat +89 -0
package/scripts/setup.sh +98 -0
package/test_nvidia.py +56 -0
package/test_proxy.py +16 -0
package/vercel.json +6 -0
package/website/README.md +14 -0
package/website/app/globals.css +88 -0
package/website/app/layout.tsx +34 -0
package/website/app/page.tsx +537 -0
package/website/package-lock.json +520 -0
package/website/package.json +25 -0
package/website/tsconfig.json +40 -0
package/website/vercel.json +3 -0
package/dist/entroplain-0.1.1-py3-none-any.whl +0 -0
package/dist/entroplain-0.1.1.tar.gz +0 -0

package/README.md CHANGED Viewed

@@ -1,389 +1,478 @@
-# Entroplain
-**Entropy-based early exit for efficient agent reasoning.**
-Stop burning tokens. Know when your agent has finished thinking.
----
-## What It Does
-Entroplain monitors your LLM's **predictive entropy** — the uncertainty in its output distribution — to detect when reasoning has converged.
-```text
-High entropy → Model is searching, exploring, uncertain
-Low entropy  → Model is confident, converged, ready to output
-```
-**Key insight:** Reasoning follows a multi-modal entropy trajectory. Local minima ("valleys") mark reasoning milestones. Exit at the right valley, save 40-60% compute with minimal accuracy loss.
----
-## Quick Start
-### Install
-```bash
-# Python (pip)
-pip install entroplain
-# Node.js (npm)
-npm install entroplain
-```
-### Requirements
-**Python:** 3.8+
-**Node.js:** 18+
-**For cloud providers:** Set API keys via environment variables:
-```bash
-export OPENAI_API_KEY=sk-...
-export ANTHROPIC_API_KEY=sk-ant-...
-export NVIDIA_API_KEY=nvapi-...
-```
-**For local models:** Install [Ollama](https://ollama.ai) or [llama.cpp](https://github.com/ggerganov/llama.cpp)
-### Use with Any Agent
-```python
-from entroplain import EntropyMonitor
-monitor = EntropyMonitor()
-# Stream tokens with entropy tracking
-async for token, entropy in monitor.stream(agent.generate()):
-    print(f"{token} (entropy: {entropy:.3f})")
-    # Detect reasoning convergence
-    if monitor.is_converged():
-        break  # Early exit — reasoning complete
-```
----
-## How It Works
-### 1. Track Entropy Per Token
-Every token has an entropy value derived from the model's output distribution:
-```python
-entropy = -sum(p * log2(p) for p in probabilities if p > 0)
-```
-### 2. Detect Valleys
-Local minima in the entropy trajectory indicate reasoning milestones:
-```text
-Entropy: 0.8 → 0.6 → 0.3* → 0.5 → 0.2* → 0.1*
-                       ↑           ↑
-                    Valley 1    Valley 2
-```
-### 3. Exit at the Right Moment
-When valley count plateaus and velocity stabilizes, reasoning is complete.
----
-## Experimental Evidence
-Tested on Llama-3.1-70b via NVIDIA API:
-| Difficulty | Avg Valleys | Avg Entropy | Avg Velocity |
-|------------|-------------|-------------|--------------|
-| Easy       | 61.3        | 0.3758      | 0.4852       |
-| Medium     | 53.0        | 0.3267      | 0.4394       |
-| Hard       | 70.2        | 0.2947      | 0.4095       |
-**Finding:** Hard problems have more entropy valleys (70.2 vs 61.3) — valleys correlate with reasoning complexity.
----
-## Platform Support
-| Platform | Support | How to Enable |
-|----------|---------|---------------|
-| **Local (llama.cpp, Ollama)** | ✅ Full | Built-in, no config |
-| **OpenAI** | ✅ Yes | `logprobs: true` |
-| **Anthropic Claude** | ✅ Yes (Claude 4) | `logprobs: True` |
-| **Google Gemini** | ✅ Yes | `response_logprobs=True` |
-| **NVIDIA NIM** | ✅ Yes | `logprobs: true` |
-| **OpenRouter** | ⚠️ Partial | ~23% of models support it |
----
-## Integration Examples
-### OpenAI / NVIDIA / OpenRouter
-```python
-from openai import OpenAI
-from entroplain import EntropyMonitor
-client = OpenAI()
-monitor = EntropyMonitor()
-response = client.chat.completions.create(
-    model="gpt-4o",
-    messages=[{"role": "user", "content": "Solve this step by step..."}],
-    logprobs=True,
-    top_logprobs=5,
-    stream=True
-)
-for chunk in response:
-    if chunk.choices[0].delta.content:
-        token = chunk.choices[0].delta.content
-        entropy = monitor.calculate_entropy(chunk.choices[0].logprobs)
-        if monitor.should_exit():
-            print("\n[Early exit — reasoning converged]")
-            break
-        print(token, end="")
-```
-### Ollama (Local)
-```python
-import ollama
-from entroplain import EntropyMonitor
-monitor = EntropyMonitor()
-# Ollama exposes logits for local models
-response = ollama.generate(
-    model="llama3.1",
-    prompt="Think through this carefully...",
-    options={"num_ctx": 4096}
-)
-# Direct access to token probabilities
-for token_data in response.get("token_probs", []):
-    entropy = monitor.calculate_from_logits(token_data["logits"])
-    monitor.track(token_data["token"], entropy)
-```
-### Anthropic Claude
-```python
-from anthropic import Anthropic
-from entroplain import EntropyMonitor
-client = Anthropic()
-monitor = EntropyMonitor()
-with client.messages.stream(
-    model="claude-sonnet-4-20250514",
-    max_tokens=1024,
-    messages=[{"role": "user", "content": "Analyze this..."}],
-) as stream:
-    for text in stream.text_stream:
-        entropy = monitor.get_entropy()
-        if monitor.should_exit():
-            break
-        print(text, end="", flush=True)
-```
-### Agent Frameworks
-**OpenClaw:**
-```python
-# In your agent config
-entropy_monitor:
-  enabled: true
-  exit_threshold: 0.15  # Exit when entropy drops below this
-  min_valleys: 3        # Require at least N reasoning milestones
-```
-**Claude Code:**
-```json
-{
-  "hooks": {
-    "on_token": "entroplain.hooks.track_entropy",
-    "on_converge": "entroplain.hooks.early_exit"
-  }
-}
-```
----
-## Configuration
-### Environment Variables
-```bash
-# For cloud providers
-ENTROPPLAIN_OPENAI_API_KEY=sk-...
-ENTROPPLAIN_ANTHROPIC_API_KEY=sk-ant-...
-ENTROPPLAIN_NVIDIA_API_KEY=nvapi-...
-# For local models
-ENTROPPLAIN_LOCAL_PROVIDER=ollama  # or llama.cpp
-ENTROPPLAIN_LOCAL_MODEL=llama3.1
-```
-### Exit Conditions
-```python
-monitor = EntropyMonitor(
-    # Exit when entropy drops below threshold
-    entropy_threshold=0.15,
-    # Require minimum valleys before exit
-    min_valleys=2,
-    # Exit when velocity stabilizes (change < this)
-    velocity_threshold=0.05,
-    # Don't exit before N tokens
-    min_tokens=50,
-    # Custom exit condition
-    exit_condition="valleys_plateau"  # or "entropy_drop", "velocity_zero"
-)
-```
----
-## CLI Usage
-```bash
-# Analyze a prompt's entropy trajectory
-entroplain analyze "What is 2+2?" --model gpt-4o
-# Stream with early exit
-entroplain stream "Solve this step by step: x^2 = 16" --exit-on-converge
-# Benchmark entropy patterns
-entroplain benchmark --problems gsm8k --output results.json
-# Visualize entropy trajectory
-entroplain visualize results.json --output entropy_plot.png
-```
----
-## API Reference
-### `EntropyMonitor`
-```python
-class EntropyMonitor:
-    def __init__(
-        self,
-        entropy_threshold: float = 0.15,
-        min_valleys: int = 2,
-        velocity_threshold: float = 0.05,
-        min_tokens: int = 50
-    ): ...
-    def calculate_entropy(self, logprobs: List[float]) -> float:
-        """Calculate Shannon entropy from log probabilities."""
-    def track(self, token: str, entropy: float) -> None:
-        """Track a token and its entropy value."""
-    def get_valleys(self) -> List[Tuple[int, float]]:
-        """Get all entropy valleys (local minima)."""
-    def get_velocity(self) -> float:
-        """Get current entropy velocity (rate of change)."""
-    def should_exit(self) -> bool:
-        """Determine if reasoning has converged."""
-    def is_converged(self) -> bool:
-        """Alias for should_exit()."""
-    def get_trajectory(self) -> List[float]:
-        """Get full entropy trajectory."""
-    def reset(self) -> None:
-        """Clear all tracked data."""
-```
-### `calculate_entropy(logprobs)`
-```python
-from entroplain import calculate_entropy
-# From log probabilities
-entropy = calculate_entropy([-0.5, -2.1, -0.1, -5.2])
-# Returns: 0.847
-# From probabilities
-entropy = calculate_entropy([0.6, 0.125, 0.9, 0.005], from_probs=True)
-```
----
-## Research
-### Paper
-See [`paper.md`](./paper.md) for the full research proposal: **"Entropy-Based Early Exit for Efficient Agent Reasoning"**
-### Key Findings
-1. **H1 Supported:** Entropy valleys correlate with reasoning complexity (70.2 valleys for hard problems vs 61.3 for easy)
-2. **H2 Supported:** Entropy velocity differs by difficulty (0.4852 easy vs 0.4095 hard)
-3. **Potential:** 40-60% compute reduction with 95%+ accuracy retention
-### Citation
-```bibtex
-@software{entroplain2026,
-  title = {Entroplain: Entropy-Based Early Exit for Efficient Agent Reasoning},
-  author = {Entroplain Contributors},
-  year = {2026},
-  url = {https://github.com/entroplain/entroplain}
-}
-```
----
-## Roadmap
-- [ ] v0.1.0 — Core entropy tracking (Python)
-- [ ] v0.2.0 — Multi-provider support (OpenAI, Anthropic, Gemini, NVIDIA)
-- [ ] v0.3.0 — Local model support (llama.cpp, Ollama)
-- [ ] v0.4.0 — Agent framework integrations (OpenClaw, Claude Code)
-- [ ] v0.5.0 — JavaScript/Node.js SDK
-- [ ] v1.0.0 — Production release with benchmarks
----
-## Contributing
-We welcome contributions! See [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.
-### Development Setup
-```bash
-git clone https://github.com/entroplain/entroplain.git
-cd entroplain
-pip install -e ".[dev]"
-pytest
-```
----
-## License
-MIT License — see [LICENSE](./LICENSE) for details.
----
-## Acknowledgments
-- Research inspired by early exit architectures in transformers
-- Experimental validation using NVIDIA NIM API
-- Built for the agent-first future of AI
+# Entroplain
+**Entropy-based early exit for efficient agent reasoning.**
+Stop burning tokens. Know when your agent has finished thinking.
+🌐 **Website:** https://entroplain.vercel.app/
+---
+## What It Does
+Entroplain monitors your LLM's **predictive entropy** — the uncertainty in its output distribution — to detect when reasoning has converged.
+```text
+High entropy → Model is searching, exploring, uncertain
+Low entropy → Model is confident, converged, ready to output
+```
+**Key insight:** Reasoning follows a multi-modal entropy trajectory. Local minima ("valleys") mark reasoning milestones. Exit at the right valley, save 40-60% compute with minimal accuracy loss.
+---
+## Quick Start
+### Install
+```bash
+# Python (pip)
+pip install entroplain
+# Node.js (npm)
+npm install entroplain
+```
+### Requirements
+**Python:** 3.8+
+**Node.js:** 18+
+**For cloud providers:** Set API keys via environment variables:
+```bash
+export OPENAI_API_KEY=sk-...
+export ANTHROPIC_API_KEY=sk-ant-...
+export NVIDIA_API_KEY=nvapi-...
+```
+**For local models:** Install [Ollama](https://ollama.ai) or [llama.cpp](https://github.com/ggerganov/llama.cpp)
+---
+## 🚀 Works With Any Agent (Proxy Method)
+The **proxy** is the easiest way to use Entroplain with OpenClaw, Claude Code, or any other agent framework:
+### How It Works
+```
+Your Agent → Proxy (localhost:8765) → Real API
+               │
+               ▼
+         Entropy Monitor
+               │
+               ▼
+         Early Exit Check
+```
+The proxy intercepts all LLM API calls, monitors entropy, and terminates streams when reasoning converges.
+### Setup (One-Time)
+```bash
+# Install with proxy support
+pip install entroplain[proxy]
+# Start the proxy
+entroplain-proxy --port 8765 --log-entropy
+# Point your agent to the proxy
+export OPENAI_BASE_URL=http://localhost:8765/v1
+# or for NVIDIA:
+export NVIDIA_BASE_URL=http://localhost:8765/v1
+# or for Anthropic:
+export ANTHROPIC_BASE_URL=http://localhost:8765/v1
+```
+That's it! Now run your agent normally and entropy monitoring is automatic.
+### Proxy Options
+```bash
+# Monitor only, don't exit early
+entroplain-proxy --port 8765 --no-early-exit
+# Custom thresholds
+entroplain-proxy --port 8765 --entropy-threshold 0.2 --min-valleys 3
+# Enable cost tracking
+entroplain-proxy --port 8765 --model gpt-4o --log-entropy
+# Launch dashboard
+entroplain-dashboard --port 8050
+```
+---
+## 🎯 Dashboard
+Real-time entropy visualization:
+```bash
+# Start the dashboard
+entroplain-dashboard --port 8050
+# Open in browser
+open http://localhost:8050
+```
+The dashboard shows:
+- **Live entropy curve** with valley markers
+- **Token count** and valleys detected
+- **Cost savings** in real-time
+- **Status badges** (active/idle/exited)
+---
+## 💰 Cost Tracking
+Track actual savings from early exit:
+```python
+from entroplain import CostTracker
+tracker = CostTracker(model="gpt-4o")
+tracker.track_input(100)   # 100 input tokens
+tracker.track_output(50)   # 50 output tokens
+tracker.set_full_estimate(150)  # Would have been 150
+estimate = tracker.get_estimate()
+print(f"Saved ${estimate.cost_saved_usd:.4f} ({estimate.savings_percent:.1f}%)")
+```
+**Supported pricing:** GPT-4o, GPT-4-turbo, Claude 4, Llama 3.1 (NVIDIA), or custom rates.
+---
+## Direct Usage (Python)
+If you want more control, use Entroplain directly:
+```python
+from entroplain import EntropyMonitor, NVIDIAProvider
+monitor = EntropyMonitor()
+provider = NVIDIAProvider()
+for token in provider.stream_with_entropy(
+    model="meta/llama-3.1-70b-instruct",
+    messages=[{"role": "user", "content": "Solve: x^2 = 16"}]
+):
+    monitor.track(token.token, token.entropy)
+    print(token.token, end="")
+    if monitor.should_exit():
+        print("\n[Early exit - reasoning converged]")
+        break
+print(f"\nStats: {monitor.get_stats()}")
+```
+---
+## How It Works
+### 1. Track Entropy Per Token
+Every token has an entropy value derived from the model's output distribution:
+```python
+entropy = -sum(p * log2(p) for p in probabilities if p > 0)
+```
+### 2. Detect Valleys
+Local minima in the entropy trajectory indicate reasoning milestones:
+```text
+Entropy: 0.8 → 0.6 → 0.3* → 0.5 → 0.2* → 0.1*
+                      ↑             ↑
+                  Valley 1      Valley 2
+```
+### 3. Exit at the Right Moment
+When valley count plateaus and velocity stabilizes, reasoning is complete.
+---
+## Exit Strategies
+Choose how Entroplain detects convergence:
+| Strategy | Description |
+|----------|-------------|
+| `combined` | Entropy low OR valleys plateau, AND velocity stable (default) |
+| `valleys_plateau` | Exit when reasoning milestones stabilize |
+| `entropy_drop` | Exit when model confidence is high |
+| `velocity_zero` | Exit when entropy stops changing |
+| `repetition` | Exit when model starts repeating itself |
+| `confidence` | Exit when top token prob > 95% for N tokens |
+```python
+monitor = EntropyMonitor(
+    exit_condition="repetition",  # or "confidence", "combined", etc.
+    repetition_threshold=0.3,      # Exit when 30% of recent tokens repeat
+)
+```
+---
+## Experimental Evidence
+Tested on Llama-3.1-70b via NVIDIA API:
+| Difficulty | Avg Valleys | Avg Entropy | Avg Velocity |
+|------------|-------------|-------------|--------------|
+| Easy       | 61.3        | 0.3758      | 0.4852       |
+| Medium     | 53.0        | 0.3267      | 0.4394       |
+| Hard       | 70.2        | 0.2947      | 0.4095       |
+**Finding:** Hard problems have more entropy valleys (70.2 vs 61.3) — valleys correlate with reasoning complexity.
+---
+## Platform Support
+| Platform | Support | How to Enable |
+|----------|---------|---------------|
+| **Local (llama.cpp, Ollama)** | ✅ Full | Built-in, no config |
+| **OpenAI** | ✅ Yes | `logprobs: true` |
+| **Anthropic Claude** | ✅ Yes (Claude 4) | `logprobs: True` |
+| **Google Gemini** | ✅ Yes | `response_logprobs=True` |
+| **NVIDIA NIM** | ✅ Yes | `logprobs: true` |
+| **OpenRouter** | ⚠️ Partial | ~23% of models support it |
+---
+## Integration Examples
+### OpenAI / NVIDIA / OpenRouter
+```python
+from openai import OpenAI
+from entroplain import EntropyMonitor
+client = OpenAI()
+monitor = EntropyMonitor()
+response = client.chat.completions.create(
+    model="gpt-4o",
+    messages=[{"role": "user", "content": "Solve this step by step..."}],
+    logprobs=True,
+    top_logprobs=5,
+    stream=True
+)
+for chunk in response:
+    if chunk.choices[0].delta.content:
+        token = chunk.choices[0].delta.content
+        entropy = monitor.calculate_entropy(chunk.choices[0].logprobs)
+        if monitor.should_exit():
+            print("\n[Early exit — reasoning converged]")
+            break
+        print(token, end="")
+```
+### Ollama (Local)
+```python
+import ollama
+from entroplain import EntropyMonitor
+monitor = EntropyMonitor()
+response = ollama.generate(
+    model="llama3.1",
+    prompt="Think through this carefully...",
+    options={"num_ctx": 4096}
+)
+for token_data in response.get("token_probs", []):
+    entropy = monitor.calculate_from_logits(token_data["logits"])
+    monitor.track(token_data["token"], entropy)
+```
+### Anthropic Claude
+```python
+from anthropic import Anthropic
+from entroplain import EntropyMonitor
+client = Anthropic()
+monitor = EntropyMonitor()
+with client.messages.stream(
+    model="claude-sonnet-4-20250514",
+    max_tokens=1024,
+    messages=[{"role": "user", "content": "Analyze this..."}],
+) as stream:
+    for text in stream.text_stream:
+        entropy = monitor.get_entropy()
+        if monitor.should_exit():
+            break
+        print(text, end="", flush=True)
+```
+---
+## CLI
+```bash
+# Analyze a prompt's entropy trajectory
+entroplain analyze "What is 2+2?" --model gpt-4o
+# Stream with early exit
+entroplain stream "Explain quantum computing" --exit-on-converge
+# Run the proxy (works with any agent)
+entroplain-proxy --port 8765 --log-entropy --model gpt-4o
+# Launch the dashboard
+entroplain-dashboard --port 8050
+# Benchmark entropy patterns
+entroplain benchmark --problems gsm8k --output results.json
+```
+---
+## API Reference
+### `EntropyMonitor`
+```python
+class EntropyMonitor:
+    def __init__(
+        self,
+        entropy_threshold: float = 0.15,
+        min_valleys: int = 2,
+        velocity_threshold: float = 0.05,
+        min_tokens: int = 50,
+        exit_condition: str = "combined"
+    ):
+        ...
+    def track(self, token: str, entropy: float, confidence: float = 0.0) -> EntropyPoint:
+        """Track a token and its entropy value."""
+    def should_exit(self) -> bool:
+        """Determine if reasoning has converged."""
+    def get_valleys(self) -> List[Tuple[int, float]]:
+        """Get all entropy valleys (local minima)."""
+    def get_stats(self) -> Dict:
+        """Get current statistics."""
+    def reset(self) -> None:
+        """Clear all tracked data."""
+```
+### `CostTracker`
+```python
+class CostTracker:
+    def __init__(self, model: str = "default"):
+        ...
+    def track_input(self, tokens: int):
+        """Track input tokens."""
+    def track_output(self, tokens: int):
+        """Track output tokens."""
+    def set_full_estimate(self, tokens: int):
+        """Set estimated output if no early exit."""
+    def get_estimate(self) -> CostEstimate:
+        """Get cost estimate with savings."""
+```
+### `EntropyProxy`
+```bash
+# Run the proxy
+entroplain-proxy --port 8765 --log-entropy --model gpt-4o
+# Options
+--entropy-threshold 0.15    # Exit threshold
+--min-valleys 2             # Minimum valleys
+--no-early-exit             # Monitor only, don't exit
+--log-entropy               # Log entropy values
+--model gpt-4o              # Model for cost tracking
+--no-cost-tracking          # Disable cost tracking
+```
+---
+## Research
+### Paper
+See [`paper.md`](./paper.md) for the full research proposal:
+**"Entropy-Based Early Exit for Efficient Agent Reasoning"**
+### Key Findings
+1. **H1 Supported:** Entropy valleys correlate with reasoning complexity (70.2 valleys for hard problems vs 61.3 for easy)
+2. **H2 Supported:** Entropy velocity differs by difficulty (0.4852 easy vs 0.4095 hard)
+3. **Potential:** 40-60% compute reduction with 95%+ accuracy retention
+### Citation
+```bibtex
+@software{entroplain2026,
+  title = {Entroplain: Entropy-Based Early Exit for Efficient Agent Reasoning},
+  author = {Entroplain Contributors},
+  year = {2026},
+  url = {https://github.com/entroplain/entroplain}
+}
+```
+---
+## Contributing
+We welcome contributions! See [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.
+### Development Setup
+```bash
+git clone https://github.com/entroplain/entroplain.git
+cd entroplain
+pip install -e ".[dev]"
+pytest
+```
+---
+## License
+MIT License — see [LICENSE](./LICENSE) for details.
+---
+## Links
+- **PyPI:** https://pypi.org/project/entroplain/
+- **npm:** https://www.npmjs.com/package/entroplain
+- **GitHub:** https://github.com/entroplain/entroplain
+- **Issues:** https://github.com/entroplain/entroplain/issues
+---
+## Acknowledgments
+- Research inspired by early exit architectures in transformers
+- Experimental validation using NVIDIA NIM API
+- Built for the agent-first future of AI