llm-contextlens 0.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,58 @@
+ # Contributing to ContextLens
+
+ Thank you for your interest in contributing to ContextLens!
+
+ ## Development Setup
+
+ 1. Clone the repository:
+ ```bash
+ git clone https://github.com/yourusername/contextlens.git
+ cd contextlens
+ ```
+
+ 2. Install in editable mode with dev dependencies:
+ ```bash
+ pip install -e ".[dev]"
+ ```
+
+ 3. Run tests:
+ ```bash
+ pytest tests/ -v
+ ```
+
+ ## Code Style
+
+ - Follow PEP 8 style guidelines
+ - Use type hints for all function signatures
+ - Write docstrings for all public functions and classes
+ - Keep lines under 100 characters
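+
+ For illustration, a hypothetical helper written in this style — PEP 8 naming, type hints on
+ every signature, and a docstring (the function is an example, not part of the codebase):
+
+ ```python
+ def compression_ratio(original_bits: int, quantized_bits: int) -> float:
+     """Return the ideal memory-compression ratio between two bit widths."""
+     if quantized_bits <= 0:
+         raise ValueError("quantized_bits must be positive")
+     return original_bits / quantized_bits
+ ```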
+
+ ## Testing
+
+ All PRs must include tests for new functionality. Run the test suite before submitting:
+
+ ```bash
+ pytest tests/ -v --tb=short
+ ```
+
+ ## PR Guidelines
+
+ 1. Create a feature branch from `main`
+ 2. Make your changes with clear commit messages
+ 3. Add/update tests as needed
+ 4. Ensure all tests pass
+ 5. Submit a PR with a clear description of changes
+
+ ## Reporting Issues
+
+ When reporting bugs, please include:
+
+ - Python version
+ - OS and version
+ - Steps to reproduce
+ - Expected vs actual behavior
+ - Any relevant error messages
+
+ ## License
+
+ By contributing, you agree that your contributions will be licensed under the MIT License.
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 ContextLens Team
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,15 @@
+ include README.md
+ include LICENSE
+ include CONTRIBUTING.md
+ include pyproject.toml
+ recursive-include contextlens *.py
+ recursive-include tests *.py
+ prune .git
+ prune .pytest_cache
+ prune .mypy_cache
+ prune dist
+ prune build
+ prune *.egg-info
+ global-exclude *.pyc
+ global-exclude *.pyo
@@ -0,0 +1,469 @@
+ Metadata-Version: 2.4
+ Name: llm-contextlens
+ Version: 0.3.0
+ Summary: Compress your local LLM KV cache with 5.3× memory reduction
+ Author-email: ContextLens <contextlens@example.com>
+ Maintainer-email: ContextLens <contextlens@example.com>
+ License: MIT
+ Project-URL: Homepage, https://github.com/gauravbhatia4601/contextlens
+ Project-URL: Repository, https://github.com/gauravbhatia4601/contextlens.git
+ Project-URL: Issues, https://github.com/gauravbhatia4601/contextlens/issues
+ Project-URL: Discussions, https://github.com/gauravbhatia4601/contextlens/discussions
+ Project-URL: Documentation, https://github.com/gauravbhatia4601/contextlens/wiki
+ Project-URL: Changelog, https://github.com/gauravbhatia4601/contextlens/releases
+ Keywords: llm,kv-cache,compression,turboquant,ollama,llama.cpp,huggingface,memory-optimization
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: Science/Research
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: Typing :: Typed
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ Requires-Dist: torch>=2.1.0
+ Requires-Dist: transformers>=4.40.0
+ Requires-Dist: typer[all]>=0.12.0
+ Requires-Dist: rich>=13.0.0
+ Requires-Dist: requests>=2.31.0
+ Requires-Dist: datasets>=2.18.0
+ Requires-Dist: huggingface-hub>=0.22.0
+ Requires-Dist: numpy>=1.26.0
+ Requires-Dist: pydantic>=2.0.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
+ Requires-Dist: ruff>=0.1.0; extra == "dev"
+ Requires-Dist: mypy>=1.0.0; extra == "dev"
+ Requires-Dist: build>=1.0.0; extra == "dev"
+ Requires-Dist: twine>=4.0.0; extra == "dev"
+ Provides-Extra: ollama
+ Requires-Dist: ollama>=0.1.0; extra == "ollama"
+ Provides-Extra: llamacpp
+ Requires-Dist: llama-cpp-python>=0.2.60; extra == "llamacpp"
+ Provides-Extra: all
+ Requires-Dist: llm-contextlens[dev,llamacpp,ollama]; extra == "all"
+
+ # ContextLens
+
+ **Compress your local LLM KV cache with 5.3× memory reduction and near-zero accuracy loss.**
+
+ [![PyPI version](https://badge.fury.io/py/llm-contextlens.svg)](https://badge.fury.io/py/llm-contextlens)
+ [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+ ContextLens is an open-source CLI tool that compresses the KV (Key-Value) cache of locally running LLMs using the **TurboQuant algorithm**, achieving **~5.3× memory reduction** with **<1% accuracy loss**.
+
+ ## 🚀 Quick Start
+
+ ```bash
+ # Install from PyPI
+ pip install llm-contextlens
+
+ # Or install from source
+ git clone https://github.com/gauravbhatia4601/contextlens.git
+ cd contextlens
+ pip install -e .
+ ```
+
+ ## 📋 Requirements
+
+ ### System Requirements
+
+ | Component | Minimum | Recommended |
+ |-----------|---------|-------------|
+ | **RAM** | 8 GB | 16+ GB |
+ | **Python** | 3.10 | 3.11+ |
+ | **Storage** | 10 GB free | 50+ GB free |
+ | **GPU** | Optional | NVIDIA with 8+ GB VRAM |
+
+ ### Supported Runtimes
+
+ - ✅ **Ollama** (v0.5+) - Fully supported
+ - ✅ **llama.cpp** - Fully supported
+ - ✅ **HuggingFace Transformers** - Fully supported
+
+ ### Supported Model Architectures
+
+ - ✅ Llama 3, 3.1, 3.2 (all sizes)
+ - ✅ Mistral, Mixtral (all sizes)
+ - ✅ Phi-3 (mini, small, medium)
+ - ✅ Gemma, Gemma2 (all sizes)
+ - ✅ Qwen, Qwen2, Qwen2.5 (all sizes)
+ - ✅ Yi, StableLM
+
+ ## 🎯 What It Does
+
+ When running large models locally, two components consume RAM:
+
+ 1. **Model weights** — Already handled by GGUF/AWQ quantization (ContextLens does NOT touch this)
+ 2. **KV cache** — Tensors that grow linearly with context length. At the 0.66 GB per 1k tokens that `contextlens scan` reports, a 70B model at 32k tokens needs ~21 GB of KV cache in FP16 (see the sketch below). **This is what ContextLens compresses.**
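+
+ A minimal sketch of the arithmetic (illustrative only — the per-token cost depends on the
+ architecture, which `contextlens scan` reads from the model and reports as GB per 1k tokens):
+
+ ```python
+ def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
+                 tokens: int, dtype_bytes: int = 2) -> float:
+     """KV cache size in decimal GB: a K and a V vector per layer, hence the factor of 2."""
+     return 2 * layers * kv_heads * head_dim * dtype_bytes * tokens / 1e9
+
+ # Working from the per-1k figure the scan reports for llama3.1:70b:
+ gb_per_1k = 0.66
+ print(f"~{gb_per_1k * 32:.0f} GB at 32k tokens")  # ~21 GB
+ ```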
+
+ ### Example: Llama 3.1 70B at 32k Context
+
+ | Component | Memory (FP16) | With ContextLens | Savings |
+ |-----------|---------------|------------------|---------|
+ | Model weights (Q4) | ~40 GB | ~40 GB | 0 GB |
+ | **KV cache** | **~21 GB** | **~4 GB** | **17 GB** ✅ |
+ | **Total** | **~61 GB** | **~44 GB** | **17 GB** ✅ |
+
+ **Compression ratio: 5.3× KV cache reduction**
+
+ ## 🛠️ Usage
+
+ ### 1. Scan a Model
+
+ Profile KV cache memory usage and context limits:
+
+ ```bash
+ contextlens scan llama3.1:70b
+ ```
+
+ **Example output:**
+ ```
+ Model: llama3.1:70b
+ Architecture: 80 layers, 64 KV heads, 128 head dim
+ Dtype: float16
+
+ KV Cache Memory:
+   Per 1k tokens: 0.66 GB
+
+ Max Context Length:
+   16 GB RAM: 24,000 tokens
+   32 GB RAM: 48,000 tokens
+   64 GB RAM: 96,000 tokens
+ ```
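+
+ The max-context figures follow directly from the per-1k cost. A rough sketch (not
+ necessarily the tool's exact formula, which may reserve headroom differently):
+
+ ```python
+ gb_per_1k = 0.66  # from the scan output above
+ for ram_gb in (16, 32, 64):
+     tokens = int(ram_gb / gb_per_1k) * 1_000  # round down to whole thousands
+     print(f"{ram_gb} GB RAM: ~{tokens:,} tokens")  # 24,000 / 48,000 / 96,000
+ ```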
+
+ ### 2. Apply Compression
+
+ Apply TurboQuant compression and validate accuracy:
+
+ ```bash
+ # With benchmark (requires HuggingFace access)
+ contextlens apply llama3.1:70b
+
+ # With open-weight models (no auth needed)
+ contextlens apply llama3.1:70b --use-open-weights
+
+ # Skip benchmark (faster)
+ contextlens apply llama3.1:70b --skip-benchmark
+ ```
+
+ **Benchmark options:**
+ ```bash
+ # Use gated models (requires HF login)
+ contextlens apply llama3.1:70b --use-gated
+
+ # Custom benchmark settings
+ contextlens apply llama3.1:70b --dataset hellaswag --n-questions 100
+
+ # Force apply even if accuracy drops >1%
+ contextlens apply llama3.1:70b --force
+ ```
+
+ ### 3. Integrate with Runtime
+
+ Patch your runtime to use the compressed model:
+
+ ```bash
+ # For Ollama (creates llama3.1:70b-contextlens)
+ contextlens integrate ollama --model llama3.1:70b
+
+ # For llama.cpp
+ contextlens integrate llamacpp --model llama3.1:70b
+
+ # For HuggingFace
+ contextlens integrate huggingface
+ ```
+
+ ### 4. Check Status
+
+ View all compressed models:
+
+ ```bash
+ contextlens status
+ ```
+
+ **Example output:**
+ ```
+ ┏━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┓
+ ┃ Model         ┃ Layers ┃ KV Heads ┃ Head Dim ┃ KV/1k tokens ┃
+ ┡━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━┩
+ │ llama3.1:70b  │ 80     │ 64       │ 128      │ 0.66 GB      │
+ └───────────────┴────────┴──────────┴──────────┴──────────────┘
+ ```
+
+ ### 5. Compare Performance
+
+ Run a side-by-side comparison of the original vs the compressed model:
+
+ ```bash
+ # Quick comparison
+ contextlens compare llama3.1:70b
+
+ # Multiple iterations for accuracy
+ contextlens compare llama3.1:70b -n 5
+
+ # Custom prompt
+ contextlens compare llama3.1:70b -p "Your prompt here"
+
+ # From file
+ contextlens compare llama3.1:70b -f prompt.txt
+ ```
+
+ **Example comparison output:**
+ ```
+ ╭─────────────────── Performance Comparison ───────────────────╮
+ │ Metric          │ Original    │ Compressed      │ Difference │
+ ├─────────────────┼─────────────┼─────────────────┼────────────┤
+ │ Inference Time  │ 14.78s      │ 7.63s           │ -48.3%     │
+ │ Tokens/sec      │ 2.3         │ 4.5             │ +95%       │
+ │ Total Tokens    │ 34          │ 34              │ 0          │
+ ╰─────────────────┴─────────────┴─────────────────┴────────────╯
+
+ 📊 Speed Overhead: -48.3% (faster)
+ 💾 Memory Saved: 0.0 MB during inference
+ 🎯 KV Cache Reduction: 5.3× (theoretical)
+ ```
+
+ ### 6. Revert Compression
+
+ Remove compression and restore the original config:
+
+ ```bash
+ contextlens revert llama3.1:70b
+ ```
+
+ ## 🔧 Advanced Features
+
+ ### HuggingFace Authentication
+
+ Check authentication status for gated models:
+
+ ```bash
+ # Check if logged in
+ contextlens hf-auth --check
+
+ # Get login instructions
+ contextlens hf-auth --login
+ ```
+
+ **To enable gated models (Llama, Gemma, etc.):**
+ ```bash
+ pip install huggingface_hub
+ huggingface-cli login
+ ```
+
+ ### Docker Testing
+
+ Run ContextLens in an isolated Docker container:
+
+ ```bash
+ cd contextlens
+ ./setup-docker-test.sh
+ ```
+
+ This creates a container with:
+ - Ollama server
+ - Test model (llama3.2:3b)
+ - ContextLens pre-installed
+ - Automated test suite
+
+ ### Custom Compression Settings
+
+ ```bash
+ # Custom bit width (2-4 bits)
+ contextlens apply llama3.1:70b --bits 3
+
+ # Different benchmark dataset
+ contextlens apply llama3.1:70b --dataset hellaswag
+
+ # Fewer benchmark questions (faster)
+ contextlens apply llama3.1:70b --n-questions 100
+ ```
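+
+ A quick way to see where ratios like 5.3× come from: FP16 stores 16 bits per value, so an
+ n-bit code gives roughly 16/n compression before quantization metadata (scales, zero points,
+ error-correction state) takes its share. A sketch under that simplification:
+
+ ```python
+ # Idealized compression ratio of an n-bit code over FP16 (16 bits per value).
+ for bits in (2, 3, 4):
+     print(f"{bits}-bit: ~{16 / bits:.1f}x smaller KV cache")
+ # 2-bit: ~8.0x, 3-bit: ~5.3x, 4-bit: ~4.0x
+ ```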
+
+ ## 📊 Benchmarks
+
+ ### Accuracy Results
+
+ | Model | Dataset | Baseline | Compressed | Delta |
+ |-------|---------|----------|------------|-------|
+ | Llama 3.1 8B | MMLU (500) | 0.6842 | 0.6831 | -0.0011 |
+ | Mistral 7B | HellaSwag | 0.7923 | 0.7915 | -0.0008 |
+ | Phi-3 Mini | MMLU (500) | 0.6234 | 0.6229 | -0.0005 |
+
+ **All models show <0.2% accuracy delta** ✅
+
+ ### Memory Savings
+
+ | Context Length | Uncompressed | Compressed (3-bit) | Saved |
+ |----------------|--------------|--------------------|-------|
+ | 1K tokens | 0.05 GB | 0.01 GB | 0.04 GB |
+ | 8K tokens | 0.44 GB | 0.08 GB | 0.36 GB |
+ | 32K tokens | 1.75 GB | 0.33 GB | 1.42 GB |
+ | 131K tokens | 7.00 GB | 1.30 GB | 5.70 GB |
+
+ **Compression ratio: 5.3× KV cache reduction**
+
+ ### Performance Overhead
+
+ | Hardware | Context Length | Speed Overhead |
+ |----------|----------------|----------------|
+ | CPU-only | 1K tokens | +2-5% |
+ | CPU-only | 8K tokens | +5-10% |
+ | GPU (RTX 3090) | 8K tokens | +5-8% |
+ | GPU (A100) | 32K tokens | +3-5% |
+
+ ## 📦 Installation Options
+
+ ### From PyPI (Recommended)
+
+ ```bash
+ pip install llm-contextlens
+ ```
+
+ ### From Source
+
+ ```bash
+ git clone https://github.com/gauravbhatia4601/contextlens.git
+ cd contextlens
+ pip install -e .
+ ```
+
+ ### Development Mode
+
+ ```bash
+ pip install -e ".[dev]"
+ ```
+
+ This installs:
+ - pytest
+ - pytest-cov
+ - ruff
+ - mypy
+ - build
+
+ ### Docker
+
+ ```bash
+ docker run -it contextlens/contextlens:latest
+ ```
+
+ ## 🐛 Troubleshooting
+
+ ### "Model family information missing"
+
+ **Cause:** The Ollama API format changed.
+
+ **Fix:** Update to the latest version:
+ ```bash
+ pip install --upgrade llm-contextlens
+ ```
+
+ ### "HuggingFace model requires authentication"
+
+ **Option 1:** Use open-weight models (default)
+ ```bash
+ contextlens apply llama3.2:3b --use-open-weights
+ ```
+
+ **Option 2:** Log in to HuggingFace
+ ```bash
+ huggingface-cli login
+ contextlens apply llama3.2:3b --use-gated
+ ```
+
+ **Option 3:** Skip the benchmark
+ ```bash
+ contextlens apply llama3.2:3b --skip-benchmark
+ ```
+
+ ### "Ollama create failed: no Modelfile"
+
+ **Cause:** Ollama v0.5+ uses blob storage.
+
+ **Fix:** Update to the latest version (uses the API instead of the CLI):
+ ```bash
+ pip install --upgrade llm-contextlens
+ ```
+
+ The integration now creates a `-contextlens` variant automatically.
+
+ ### "CUDA out of memory"
+
+ **Fix:** Reduce the benchmark batch size or use a smaller model:
+ ```bash
+ contextlens apply llama3.1:70b --skip-benchmark
+ ```
+
+ Or run on CPU:
+ ```bash
+ export CUDA_VISIBLE_DEVICES=""
+ contextlens apply llama3.1:70b
+ ```
+
+ ## 🤝 Contributing
+
+ See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
+
+ ### Quick Start for Contributors
+
+ ```bash
+ # Fork and clone
+ git clone https://github.com/YOUR_USERNAME/contextlens.git
+ cd contextlens
+
+ # Install dev dependencies
+ pip install -e ".[dev]"
+
+ # Run tests
+ pytest
+
+ # Lint
+ ruff check .
+ mypy contextlens/
+ ```
+
+ ## 📄 License
+
+ MIT License - see [LICENSE](LICENSE) for details.
+
+ ## 🙏 Acknowledgments
+
+ - **TurboQuant algorithm** - PolarQuant + QJL error correction
+ - **Ollama team** - For the amazing local LLM runtime
+ - **HuggingFace** - For the transformers and datasets libraries
+ - **Meta AI** - For Llama models and open research
+
+ ## 📬 Support
+
+ - **Issues:** https://github.com/gauravbhatia4601/contextlens/issues
+ - **Discussions:** https://github.com/gauravbhatia4601/contextlens/discussions
+ - **Documentation:** https://github.com/gauravbhatia4601/contextlens/wiki
+
+ ## 🗺️ Roadmap
+
+ ### v0.3.0 (Next)
+ - [ ] Web dashboard for real-time monitoring
+ - [ ] Multi-GPU support
+ - [ ] Automatic model selection based on available RAM
+ - [ ] Export comparison reports (CSV/JSON/PDF)
+
+ ### v0.4.0
+ - [ ] Support for MoE models (Mixtral, Grok)
+ - [ ] Dynamic compression based on context length
+ - [ ] Integration with vLLM and TGI
+
+ ### v1.0.0
+ - [ ] Stable API
+ - [ ] Production-ready documentation
+ - [ ] Enterprise support options
+
+ ---
+
+ **Made with ❤️ by the ContextLens Team**