tokenshrink-0.1.0.tar.gz

--- /dev/null
+++ tokenshrink-0.1.0/.gitignore
@@ -0,0 +1,38 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ .venv/
+ venv/
+ ENV/
+
+ # IDE
+ .idea/
+ .vscode/
+ *.swp
+ *.swo
+
+ # Index (local)
+ .tokenshrink/
+
+ # OS
+ .DS_Store
+ Thumbs.db
--- /dev/null
+++ tokenshrink-0.1.0/LICENSE
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Musashi
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
--- /dev/null
+++ tokenshrink-0.1.0/PKG-INFO
@@ -0,0 +1,255 @@
+ Metadata-Version: 2.4
+ Name: tokenshrink
+ Version: 0.1.0
+ Summary: Cut your AI costs 50-80%. FAISS retrieval + LLMLingua compression.
+ Project-URL: Homepage, https://tokenshrink.dev
+ Project-URL: Repository, https://github.com/MusashiMiyamoto1-cloud/tokenshrink
+ Project-URL: Documentation, https://tokenshrink.dev/docs
+ Author-email: Musashi <musashimiyamoto1@icloud.com>
+ License-Expression: MIT
+ License-File: LICENSE
+ Keywords: agents,ai,compression,context,cost-reduction,faiss,llm,llmlingua,rag,tokens
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Requires-Python: >=3.10
+ Requires-Dist: faiss-cpu>=1.7.4
+ Requires-Dist: numpy>=1.24.0
+ Requires-Dist: sentence-transformers>=2.2.0
+ Provides-Extra: all
+ Requires-Dist: llmlingua>=0.2.0; extra == 'all'
+ Requires-Dist: pytest>=7.0.0; extra == 'all'
+ Requires-Dist: ruff>=0.1.0; extra == 'all'
+ Provides-Extra: compression
+ Requires-Dist: llmlingua>=0.2.0; extra == 'compression'
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == 'dev'
+ Requires-Dist: ruff>=0.1.0; extra == 'dev'
+ Description-Content-Type: text/markdown
+
+ # TokenShrink
+
+ **Cut your AI costs 50-80%.** FAISS semantic retrieval + LLMLingua compression.
+
+ Stop loading entire files into your prompts. Load only what's relevant, compressed.
+
+ ## Quick Start
+
+ ```bash
+ pip install tokenshrink
+
+ # Index your docs
+ tokenshrink index ./docs
+
+ # Get compressed context
+ tokenshrink query "What are the API limits?" --compress
+ ```
+
+ ## Why TokenShrink?
+
+ | Without | With TokenShrink |
+ |---------|------------------|
+ | Load entire file (5000 tokens) | Load relevant chunks (200 tokens) |
+ | $0.15 per query | $0.03 per query |
+ | Slow responses | Fast responses |
+ | Hit context limits | Stay under limits |
+
+ **Real numbers:** 50-80% token reduction on typical RAG workloads.
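+
+ As a sanity check on the table above, a quick cost calculation, assuming roughly $0.03 per 1K input tokens (GPT-4-class pricing; an assumption, and rates vary by model):
+
+ ```python
+ PRICE_PER_1K_INPUT = 0.03  # USD; assumed GPT-4-class input rate
+
+ full_file = 5000 * PRICE_PER_1K_INPUT / 1000    # $0.15 per query on raw files
+ chunks_only = 200 * PRICE_PER_1K_INPUT / 1000   # under a cent for the context alone
+ print(f"${full_file:.2f} vs ${chunks_only:.3f} per query (before question/output tokens)")
+ ```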
+
+ ## Installation
+
+ ```bash
+ # Basic (retrieval only)
+ pip install tokenshrink
+
+ # With compression (recommended; quote the extra so your shell doesn't expand the brackets)
+ pip install "tokenshrink[compression]"
+ ```
+
+ ## Usage
+
+ ### CLI
+
+ ```bash
+ # Index files
+ tokenshrink index ./docs
+ tokenshrink index ./src --extensions .py,.md
+
+ # Query (retrieval only)
+ tokenshrink query "How do I authenticate?"
+
+ # Query with compression
+ tokenshrink query "How do I authenticate?" --compress
+
+ # View stats
+ tokenshrink stats
+
+ # JSON output (for scripts)
+ tokenshrink query "question" --json
+ ```
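+
+ For scripting, the `--json` flag can be consumed from Python via a subprocess; a sketch, noting that the payload's field names below are assumptions rather than documented schema:
+
+ ```python
+ import json
+ import subprocess
+
+ out = subprocess.run(
+     ["tokenshrink", "query", "What are the API limits?", "--json"],
+     capture_output=True, text=True, check=True,
+ ).stdout
+ payload = json.loads(out)
+ print(payload.get("context", ""))  # "context" is a hypothetical field name
+ ```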
+
+ ### Python API
+
+ ```python
+ from tokenshrink import TokenShrink
+
+ # Initialize
+ ts = TokenShrink()
+
+ # Index your files
+ ts.index("./docs")
+
+ # Get compressed context
+ result = ts.query("What are the rate limits?")
+
+ print(result.context)  # Ready for your LLM
+ print(result.savings)  # "Saved 65% (1200 → 420 tokens)"
+ print(result.sources)  # ["api.md", "limits.md"]
+ ```
+
+ ### Integration Examples
+
+ **With OpenAI:**
+
+ ```python
+ from tokenshrink import TokenShrink
+ from openai import OpenAI
+
+ ts = TokenShrink()
+ ts.index("./knowledge")
+
+ client = OpenAI()
+
+ def ask(question: str) -> str:
+     # Get relevant, compressed context
+     ctx = ts.query(question)
+
+     response = client.chat.completions.create(
+         model="gpt-4",
+         messages=[
+             {"role": "system", "content": f"Context:\n{ctx.context}"},
+             {"role": "user", "content": question},
+         ],
+     )
+
+     print(f"Token savings: {ctx.savings}")
+     return response.choices[0].message.content
+
+ answer = ask("What's the refund policy?")
+ ```
+
+ **With LangChain:**
+
+ ```python
+ from tokenshrink import TokenShrink
+ from langchain.llms import OpenAI
+ from langchain.prompts import PromptTemplate
+
+ ts = TokenShrink()
+ ts.index("./docs")
+
+ def get_context(query: str) -> str:
+     result = ts.query(query)
+     return result.context
+
+ # Use in your chain
+ template = PromptTemplate(
+     input_variables=["context", "question"],
+     template="Context:\n{context}\n\nQuestion: {question}",
+ )
+
+ llm = OpenAI()
+ question = "How do I authenticate?"
+ answer = llm(template.format(context=get_context(question), question=question))
+ ```
+
+ ## How It Works
+
+ ```
+ ┌──────────┐     ┌───────────┐     ┌────────────┐
+ │  Files   │ ──► │  Indexer  │ ──► │ FAISS Index│
+ └──────────┘     │  (MiniLM) │     └────────────┘
+                  └───────────┘            │
+                                           ▼
+ ┌──────────┐     ┌───────────┐     ┌────────────┐
+ │ Question │ ──► │  Search   │ ──► │  Relevant  │
+ └──────────┘     │           │     │   Chunks   │
+                  └───────────┘     └────────────┘
+                                           │
+                                           ▼
+                                   ┌────────────────┐
+                                   │   Compressor   │
+                                   │ (LLMLingua-2)  │
+                                   └────────────────┘
+                                           │
+                                           ▼
+                                   ┌────────────────┐
+                                   │   Optimized    │
+                                   │    Context     │
+                                   └────────────────┘
+ ```
+
+ 1. **Index**: Chunks your files, creates embeddings with MiniLM
+ 2. **Search**: Finds relevant chunks via semantic similarity
+ 3. **Compress**: Removes redundancy while preserving meaning
+
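+ To make steps 1-2 concrete, here is a minimal retrieval sketch built directly on sentence-transformers and FAISS, the libraries TokenShrink depends on. It illustrates the idea, not TokenShrink's actual internals:
+
+ ```python
+ import faiss
+ from sentence_transformers import SentenceTransformer
+
+ # Step 1: embed chunks with MiniLM and add them to a FAISS index.
+ chunks = [
+     "Rate limit: 100 requests per minute per API key.",
+     "Authenticate by sending an Authorization header.",
+ ]
+ model = SentenceTransformer("all-MiniLM-L6-v2")
+ embeddings = model.encode(chunks, normalize_embeddings=True)
+ index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product = cosine on normalized vectors
+ index.add(embeddings)
+
+ # Step 2: embed the question and take the nearest chunks.
+ query = model.encode(["What are the rate limits?"], normalize_embeddings=True)
+ scores, ids = index.search(query, 1)
+ print(chunks[ids[0][0]], scores[0][0])
+ ```
+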
+ ## Configuration
+
+ ```python
+ ts = TokenShrink(
+     index_dir=".tokenshrink",   # Where to store the index
+     model="all-MiniLM-L6-v2",   # Embedding model
+     chunk_size=512,             # Words per chunk
+     chunk_overlap=50,           # Overlap between chunks
+     device="auto",              # auto, mps, cuda, cpu
+     compression=True,           # Enable LLMLingua
+ )
+ ```
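+
+ `chunk_size` and `chunk_overlap` count words, not tokens. A sketch of what word-window chunking with overlap looks like (an illustration of the settings, not necessarily TokenShrink's exact splitter):
+
+ ```python
+ def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
+     """Split text into word windows; consecutive chunks share `chunk_overlap` words."""
+     words = text.split()
+     if not words:
+         return []
+     step = chunk_size - chunk_overlap
+     return [" ".join(words[i:i + chunk_size]) for i in range(0, max(len(words) - chunk_overlap, 1), step)]
+ ```
+
+ The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.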
+
+ ## Supported File Types
+
+ Default: `.md`, `.txt`, `.py`, `.json`, `.yaml`, `.yml`
+
+ Custom:
+ ```bash
+ tokenshrink index ./src --extensions .py,.ts,.js,.md
+ ```
+
+ ## Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | Index 1000 files | ~30 seconds |
+ | Search latency | <50 ms |
+ | Compression | ~200 ms |
+ | Token reduction | 50-80% |
+
+ ## Requirements
+
+ - Python 3.10+
+ - 4GB RAM (8GB for compression)
+ - Apple Silicon: MPS acceleration
+ - NVIDIA: CUDA acceleration (both are picked up automatically with `device="auto"`; see the sketch below)
+
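+ A minimal sketch of what `device="auto"` selection presumably boils down to with PyTorch (the exact logic is an assumption):
+
+ ```python
+ import torch
+
+ def pick_device() -> str:
+     """Prefer CUDA, then Apple-Silicon MPS, else fall back to CPU."""
+     if torch.cuda.is_available():
+         return "cuda"
+     if torch.backends.mps.is_available():
+         return "mps"
+     return "cpu"
+ ```
+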
+ ## FAQ
+
+ **Q: Do I need LLMLingua?**
+ A: No. Retrieval works without it (loading only relevant chunks already saves 60-70%). Add compression for an extra 20-30%.
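+
+ To see what the compression stage does on its own, LLMLingua-2 can be called directly. A sketch following the llmlingua package docs (the model name is the documented LLMLingua-2 default, not necessarily the one TokenShrink loads):
+
+ ```python
+ from llmlingua import PromptCompressor
+
+ compressor = PromptCompressor(
+     model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
+     use_llmlingua2=True,
+ )
+ long_context = "Our API allows up to 100 requests per minute. " * 50
+ result = compressor.compress_prompt(long_context, rate=0.33)  # keep roughly a third of the tokens
+ print(result["compressed_prompt"])
+ ```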
+
+ **Q: Does it work with non-English?**
+ A: Retrieval works well with multilingual content. Compression is English-optimized.
+
+ **Q: How do I update the index?**
+ A: Just run `tokenshrink index` again. It detects changed files automatically.
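+
+ One common way such change detection works is by comparing content hashes against a manifest written on the previous run; a hypothetical sketch (the manifest path and layout are made up, not TokenShrink's documented format):
+
+ ```python
+ import hashlib
+ import json
+ from pathlib import Path
+
+ def changed_files(root: str, manifest_path: str = ".tokenshrink/manifest.json") -> list[Path]:
+     """Return files whose content hash differs from the recorded one."""
+     manifest = json.loads(Path(manifest_path).read_text()) if Path(manifest_path).exists() else {}
+     changed = []
+     for f in Path(root).rglob("*"):
+         if f.is_file() and f.suffix in {".md", ".txt", ".py", ".json", ".yaml", ".yml"}:
+             digest = hashlib.sha256(f.read_bytes()).hexdigest()
+             if manifest.get(str(f)) != digest:
+                 changed.append(f)
+     return changed
+ ```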
+
+ ## Uninstall
+
+ ```bash
+ pip uninstall tokenshrink
+ rm -rf .tokenshrink  # Remove local index
+ ```
+
+ ---
+
+ Built by [Musashi](https://github.com/MusashiMiyamoto1-cloud) · Part of [Agent Guard](https://agentguard.co)