tokenshrink-0.1.0.tar.gz

--- /dev/null
+++ tokenshrink-0.1.0/.gitignore
@@ -0,0 +1,38 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ .venv/
+ venv/
+ ENV/
+
+ # IDE
+ .idea/
+ .vscode/
+ *.swp
+ *.swo
+
+ # Index (local)
+ .tokenshrink/
+
+ # OS
+ .DS_Store
+ Thumbs.db
--- /dev/null
+++ tokenshrink-0.1.0/LICENSE
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Musashi
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
--- /dev/null
+++ tokenshrink-0.1.0/PKG-INFO
@@ -0,0 +1,255 @@
+ Metadata-Version: 2.4
+ Name: tokenshrink
+ Version: 0.1.0
+ Summary: Cut your AI costs 50-80%. FAISS retrieval + LLMLingua compression.
+ Project-URL: Homepage, https://tokenshrink.dev
+ Project-URL: Repository, https://github.com/MusashiMiyamoto1-cloud/tokenshrink
+ Project-URL: Documentation, https://tokenshrink.dev/docs
+ Author-email: Musashi <musashimiyamoto1@icloud.com>
+ License-Expression: MIT
+ License-File: LICENSE
+ Keywords: agents,ai,compression,context,cost-reduction,faiss,llm,llmlingua,rag,tokens
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Requires-Python: >=3.10
+ Requires-Dist: faiss-cpu>=1.7.4
+ Requires-Dist: numpy>=1.24.0
+ Requires-Dist: sentence-transformers>=2.2.0
+ Provides-Extra: all
+ Requires-Dist: llmlingua>=0.2.0; extra == 'all'
+ Requires-Dist: pytest>=7.0.0; extra == 'all'
+ Requires-Dist: ruff>=0.1.0; extra == 'all'
+ Provides-Extra: compression
+ Requires-Dist: llmlingua>=0.2.0; extra == 'compression'
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == 'dev'
+ Requires-Dist: ruff>=0.1.0; extra == 'dev'
+ Description-Content-Type: text/markdown
+
+ # TokenShrink
+
+ **Cut your AI costs 50-80%.** FAISS semantic retrieval + LLMLingua compression.
+
+ Stop loading entire files into your prompts. Load only what's relevant, compressed.
+
+ ## Quick Start
+
+ ```bash
+ pip install tokenshrink
+
+ # Index your docs
+ tokenshrink index ./docs
+
+ # Get compressed context
+ tokenshrink query "What are the API limits?" --compress
+ ```
+
+ ## Why TokenShrink?
+
+ | Without | With TokenShrink |
+ |---------|------------------|
+ | Load entire file (5000 tokens) | Load relevant chunks (200 tokens) |
+ | $0.15 per query | $0.03 per query |
+ | Slow responses | Fast responses |
+ | Hit context limits | Stay under limits |
+
+ **Real numbers:** 50-80% token reduction on typical RAG workloads.
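+
+ As a sanity check on the table above, a quick cost calculation, assuming roughly $0.03 per 1K input tokens (GPT-4-class pricing; an assumption, and rates vary by model):
+
+ ```python
+ PRICE_PER_1K_INPUT = 0.03  # USD; assumed GPT-4-class input rate
+
+ full_file = 5000 * PRICE_PER_1K_INPUT / 1000    # $0.15 per query on raw files
+ chunks_only = 200 * PRICE_PER_1K_INPUT / 1000   # under a cent for the context alone
+ print(f"${full_file:.2f} vs ${chunks_only:.3f} per query (before question/output tokens)")
+ ```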
+
+ ## Installation
+
+ ```bash
+ # Basic (retrieval only)
+ pip install tokenshrink
+
+ # With compression (recommended; quote the extra so your shell doesn't expand the brackets)
+ pip install "tokenshrink[compression]"
+ ```
+
+ ## Usage
+
+ ### CLI
+
+ ```bash
+ # Index files
+ tokenshrink index ./docs
+ tokenshrink index ./src --extensions .py,.md
+
+ # Query (retrieval only)
+ tokenshrink query "How do I authenticate?"
+
+ # Query with compression
+ tokenshrink query "How do I authenticate?" --compress
+
+ # View stats
+ tokenshrink stats
+
+ # JSON output (for scripts)
+ tokenshrink query "question" --json
+ ```
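+
+ For scripting, the `--json` flag can be consumed from Python via a subprocess; a sketch, noting that the payload's field names below are assumptions rather than documented schema:
+
+ ```python
+ import json
+ import subprocess
+
+ out = subprocess.run(
+     ["tokenshrink", "query", "What are the API limits?", "--json"],
+     capture_output=True, text=True, check=True,
+ ).stdout
+ payload = json.loads(out)
+ print(payload.get("context", ""))  # "context" is a hypothetical field name
+ ```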
+
+ ### Python API
+
+ ```python
+ from tokenshrink import TokenShrink
+
+ # Initialize
+ ts = TokenShrink()
+
+ # Index your files
+ ts.index("./docs")
+
+ # Get compressed context
+ result = ts.query("What are the rate limits?")
+
+ print(result.context)  # Ready for your LLM
+ print(result.savings)  # "Saved 65% (1200 → 420 tokens)"
+ print(result.sources)  # ["api.md", "limits.md"]
+ ```
+
+ ### Integration Examples
+
+ **With OpenAI:**
+
+ ```python
+ from tokenshrink import TokenShrink
+ from openai import OpenAI
+
+ ts = TokenShrink()
+ ts.index("./knowledge")
+
+ client = OpenAI()
+
+ def ask(question: str) -> str:
+     # Get relevant, compressed context
+     ctx = ts.query(question)
+
+     response = client.chat.completions.create(
+         model="gpt-4",
+         messages=[
+             {"role": "system", "content": f"Context:\n{ctx.context}"},
+             {"role": "user", "content": question},
+         ],
+     )
+
+     print(f"Token savings: {ctx.savings}")
+     return response.choices[0].message.content
+
+ answer = ask("What's the refund policy?")
+ ```
+
+ **With LangChain:**
+
+ ```python
+ from tokenshrink import TokenShrink
+ from langchain.llms import OpenAI
+ from langchain.prompts import PromptTemplate
+
+ ts = TokenShrink()
+ ts.index("./docs")
+
+ def get_context(query: str) -> str:
+     result = ts.query(query)
+     return result.context
+
+ # Use in your chain
+ template = PromptTemplate(
+     input_variables=["context", "question"],
+     template="Context:\n{context}\n\nQuestion: {question}",
+ )
+
+ llm = OpenAI()
+ question = "How do I authenticate?"
+ answer = llm(template.format(context=get_context(question), question=question))
+ ```
+
+ ## How It Works
+
+ ```
+ ┌──────────┐     ┌───────────┐     ┌────────────┐
+ │  Files   │ ──► │  Indexer  │ ──► │ FAISS Index│
+ └──────────┘     │  (MiniLM) │     └────────────┘
+                  └───────────┘            │
+                                           ▼
+ ┌──────────┐     ┌───────────┐     ┌────────────┐
+ │ Question │ ──► │  Search   │ ──► │  Relevant  │
+ └──────────┘     │           │     │   Chunks   │
+                  └───────────┘     └────────────┘
+                                           │
+                                           ▼
+                                   ┌────────────────┐
+                                   │   Compressor   │
+                                   │ (LLMLingua-2)  │
+                                   └────────────────┘
+                                           │
+                                           ▼
+                                   ┌────────────────┐
+                                   │   Optimized    │
+                                   │    Context     │
+                                   └────────────────┘
+ ```
+
+ 1. **Index**: Chunks your files, creates embeddings with MiniLM
+ 2. **Search**: Finds relevant chunks via semantic similarity
+ 3. **Compress**: Removes redundancy while preserving meaning
+
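+ To make steps 1-2 concrete, here is a minimal retrieval sketch built directly on sentence-transformers and FAISS, the libraries TokenShrink depends on. It illustrates the idea, not TokenShrink's actual internals:
+
+ ```python
+ import faiss
+ from sentence_transformers import SentenceTransformer
+
+ # Step 1: embed chunks with MiniLM and add them to a FAISS index.
+ chunks = [
+     "Rate limit: 100 requests per minute per API key.",
+     "Authenticate by sending an Authorization header.",
+ ]
+ model = SentenceTransformer("all-MiniLM-L6-v2")
+ embeddings = model.encode(chunks, normalize_embeddings=True)
+ index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product = cosine on normalized vectors
+ index.add(embeddings)
+
+ # Step 2: embed the question and take the nearest chunks.
+ query = model.encode(["What are the rate limits?"], normalize_embeddings=True)
+ scores, ids = index.search(query, 1)
+ print(chunks[ids[0][0]], scores[0][0])
+ ```
+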
+ ## Configuration
+
+ ```python
+ ts = TokenShrink(
+     index_dir=".tokenshrink",   # Where to store the index
+     model="all-MiniLM-L6-v2",   # Embedding model
+     chunk_size=512,             # Words per chunk
+     chunk_overlap=50,           # Overlap between chunks
+     device="auto",              # auto, mps, cuda, cpu
+     compression=True,           # Enable LLMLingua
+ )
+ ```
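+
+ `chunk_size` and `chunk_overlap` count words, not tokens. A sketch of what word-window chunking with overlap looks like (an illustration of the settings, not necessarily TokenShrink's exact splitter):
+
+ ```python
+ def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
+     """Split text into word windows; consecutive chunks share `chunk_overlap` words."""
+     words = text.split()
+     if not words:
+         return []
+     step = chunk_size - chunk_overlap
+     return [" ".join(words[i:i + chunk_size]) for i in range(0, max(len(words) - chunk_overlap, 1), step)]
+ ```
+
+ The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.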
+
+ ## Supported File Types
+
+ Default: `.md`, `.txt`, `.py`, `.json`, `.yaml`, `.yml`
+
+ Custom:
+ ```bash
+ tokenshrink index ./src --extensions .py,.ts,.js,.md
+ ```
+
+ ## Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | Index 1000 files | ~30 seconds |
+ | Search latency | <50 ms |
+ | Compression | ~200 ms |
+ | Token reduction | 50-80% |
+
+ ## Requirements
+
+ - Python 3.10+
+ - 4GB RAM (8GB for compression)
+ - Apple Silicon: MPS acceleration
+ - NVIDIA: CUDA acceleration (both are picked up automatically with `device="auto"`; see the sketch below)
+
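+ A minimal sketch of what `device="auto"` selection presumably boils down to with PyTorch (the exact logic is an assumption):
+
+ ```python
+ import torch
+
+ def pick_device() -> str:
+     """Prefer CUDA, then Apple-Silicon MPS, else fall back to CPU."""
+     if torch.cuda.is_available():
+         return "cuda"
+     if torch.backends.mps.is_available():
+         return "mps"
+     return "cpu"
+ ```
+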
+ ## FAQ
+
+ **Q: Do I need LLMLingua?**
+ A: No. Retrieval works without it (loading only relevant chunks already saves 60-70%). Add compression for an extra 20-30%.
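+
+ To see what the compression stage does on its own, LLMLingua-2 can be called directly. A sketch following the llmlingua package docs (the model name is the documented LLMLingua-2 default, not necessarily the one TokenShrink loads):
+
+ ```python
+ from llmlingua import PromptCompressor
+
+ compressor = PromptCompressor(
+     model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
+     use_llmlingua2=True,
+ )
+ long_context = "Our API allows up to 100 requests per minute. " * 50
+ result = compressor.compress_prompt(long_context, rate=0.33)  # keep roughly a third of the tokens
+ print(result["compressed_prompt"])
+ ```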
+
+ **Q: Does it work with non-English?**
+ A: Retrieval works well with multilingual content. Compression is English-optimized.
+
+ **Q: How do I update the index?**
+ A: Just run `tokenshrink index` again. It detects changed files automatically.
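+
+ One common way such change detection works is by comparing content hashes against a manifest written on the previous run; a hypothetical sketch (the manifest path and layout are made up, not TokenShrink's documented format):
+
+ ```python
+ import hashlib
+ import json
+ from pathlib import Path
+
+ def changed_files(root: str, manifest_path: str = ".tokenshrink/manifest.json") -> list[Path]:
+     """Return files whose content hash differs from the recorded one."""
+     manifest = json.loads(Path(manifest_path).read_text()) if Path(manifest_path).exists() else {}
+     changed = []
+     for f in Path(root).rglob("*"):
+         if f.is_file() and f.suffix in {".md", ".txt", ".py", ".json", ".yaml", ".yml"}:
+             digest = hashlib.sha256(f.read_bytes()).hexdigest()
+             if manifest.get(str(f)) != digest:
+                 changed.append(f)
+     return changed
+ ```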
+
+ ## Uninstall
+
+ ```bash
+ pip uninstall tokenshrink
+ rm -rf .tokenshrink  # Remove local index
+ ```
+
+ ---
+
+ Built by [Musashi](https://github.com/MusashiMiyamoto1-cloud) · Part of [Agent Guard](https://agentguard.co)