llm-contextlens 0.3.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- llm_contextlens-0.3.0/CONTRIBUTING.md +58 -0
- llm_contextlens-0.3.0/LICENSE +21 -0
- llm_contextlens-0.3.0/MANIFEST.in +15 -0
- llm_contextlens-0.3.0/PKG-INFO +469 -0
- llm_contextlens-0.3.0/README.md +418 -0
- llm_contextlens-0.3.0/contextlens/__init__.py +19 -0
- llm_contextlens-0.3.0/contextlens/benchmarks.py +370 -0
- llm_contextlens-0.3.0/contextlens/cli.py +588 -0
- llm_contextlens-0.3.0/contextlens/compare.py +372 -0
- llm_contextlens-0.3.0/contextlens/compressor.py +253 -0
- llm_contextlens-0.3.0/contextlens/integrations/__init__.py +53 -0
- llm_contextlens-0.3.0/contextlens/integrations/huggingface.py +247 -0
- llm_contextlens-0.3.0/contextlens/integrations/llamacpp.py +153 -0
- llm_contextlens-0.3.0/contextlens/integrations/ollama.py +263 -0
- llm_contextlens-0.3.0/contextlens/profiles.py +88 -0
- llm_contextlens-0.3.0/contextlens/py.typed +0 -0
- llm_contextlens-0.3.0/contextlens/scanner.py +117 -0
- llm_contextlens-0.3.0/contextlens/utils.py +15 -0
- llm_contextlens-0.3.0/llm_contextlens.egg-info/SOURCES.txt +21 -0
- llm_contextlens-0.3.0/pyproject.toml +129 -0
- llm_contextlens-0.3.0/setup.cfg +4 -0
- llm_contextlens-0.3.0/tests/test_compressor.py +155 -0
- llm_contextlens-0.3.0/tests/test_integrations.py +187 -0
- llm_contextlens-0.3.0/tests/test_scanner.py +149 -0
@@ -0,0 +1,58 @@
# Contributing to ContextLens

Thank you for your interest in contributing to ContextLens!

## Development Setup

1. Clone the repository:
   ```bash
   git clone https://github.com/yourusername/contextlens.git
   cd contextlens
   ```

2. Install in editable mode with dev dependencies:
   ```bash
   pip install -e ".[dev]"
   ```

3. Run tests:
   ```bash
   pytest tests/ -v
   ```

## Code Style

- Follow PEP 8 style guidelines
- Use type hints for all function signatures
- Write docstrings for all public functions and classes
- Keep lines under 100 characters

## Testing

All PRs must include tests for new functionality. Run the test suite before submitting:

```bash
pytest tests/ -v --tb=short
```

## PR Guidelines

1. Create a feature branch from `main`
2. Make your changes with clear commit messages
3. Add/update tests as needed
4. Ensure all tests pass
5. Submit a PR with a clear description of changes

## Reporting Issues

When reporting bugs, please include:

- Python version
- OS and version
- Steps to reproduce
- Expected vs actual behavior
- Any relevant error messages

## License

By contributing, you agree that your contributions will be licensed under the MIT License.
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 ContextLens Team

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,15 @@
include README.md
include LICENSE
include CONTRIBUTING.md
include pyproject.toml
recursive-include contextlens *.py
recursive-include tests *.py
exclude .git/*
exclude __pycache__/*
exclude *.pyc
exclude *.pyo
exclude .pytest_cache/*
exclude .mypy_cache/*
exclude dist/*
exclude build/*
exclude *.egg-info/*
@@ -0,0 +1,469 @@
Metadata-Version: 2.4
Name: llm-contextlens
Version: 0.3.0
Summary: Compress your local LLM KV cache with 5.3× memory reduction
Author-email: ContextLens <contextlens@example.com>
Maintainer-email: ContextLens <contextlens@example.com>
License: MIT
Project-URL: Homepage, https://github.com/gauravbhatia4601/contextlens
Project-URL: Repository, https://github.com/gauravbhatia4601/contextlens.git
Project-URL: Issues, https://github.com/gauravbhatia4601/contextlens/issues
Project-URL: Discussions, https://github.com/gauravbhatia4601/contextlens/discussions
Project-URL: Documentation, https://github.com/gauravbhatia4601/contextlens/wiki
Project-URL: Changelog, https://github.com/gauravbhatia4601/contextlens/releases
Keywords: llm,kv-cache,compression,turboquant,ollama,llama.cpp,huggingface,memory-optimization
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.1.0
Requires-Dist: transformers>=4.40.0
Requires-Dist: typer[all]>=0.12.0
Requires-Dist: rich>=13.0.0
Requires-Dist: requests>=2.31.0
Requires-Dist: datasets>=2.18.0
Requires-Dist: huggingface-hub>=0.22.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: build>=1.0.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Provides-Extra: ollama
Requires-Dist: ollama>=0.1.0; extra == "ollama"
Provides-Extra: llamacpp
Requires-Dist: llama-cpp-python>=0.2.60; extra == "llamacpp"
Provides-Extra: all
Requires-Dist: contextlens[dev,llamacpp,ollama]; extra == "all"

# ContextLens

**Compress your local LLM KV cache with 5.3× memory reduction and near-zero accuracy loss.**

[](https://badge.fury.io/py/contextlens)
[](https://www.python.org/downloads/)
[](https://opensource.org/licenses/MIT)

ContextLens is an open-source CLI tool that compresses the KV (key-value) cache of locally running LLMs using the **TurboQuant algorithm**, achieving **~5-6× memory reduction** with **<1% accuracy loss**.

## 🚀 Quick Start

```bash
# Install from PyPI
pip install contextlens

# Or install from source
git clone https://github.com/gauravbhatia4601/contextlens.git
cd contextlens
pip install -e .
```

## 📋 Requirements

### System Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| **RAM** | 8 GB | 16+ GB |
| **Python** | 3.10 | 3.11+ |
| **Storage** | 10 GB free | 50+ GB free |
| **GPU** | Optional | NVIDIA with 8+ GB VRAM |

### Supported Runtimes

- ✅ **Ollama** (v0.5+) - Fully supported
- ✅ **llama.cpp** - Fully supported
- ✅ **HuggingFace Transformers** - Fully supported

### Supported Model Architectures

- ✅ Llama 3, 3.1, 3.2 (all sizes)
- ✅ Mistral, Mixtral (all sizes)
- ✅ Phi-3 (mini, small, medium)
- ✅ Gemma, Gemma2 (all sizes)
- ✅ Qwen, Qwen2, Qwen2.5 (all sizes)
- ✅ Yi, StableLM

## 🎯 What It Does

When running large models locally, two components consume RAM:

1. **Model weights** — Already handled by GGUF/AWQ quantization (ContextLens does NOT touch this)
2. **KV cache** — A tensor that grows with context length. A 70B model at 32k tokens needs ~48 GB of KV cache in FP16. **This is what ContextLens compresses.**

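The per-token KV-cache footprint can be estimated from the architecture alone: two tensors (K and V) per layer, each `n_kv_heads × head_dim` elements wide. A back-of-envelope sketch, using a GQA configuration like Llama 3.1 8B's (32 layers, 8 KV heads, head dim 128); the function name is illustrative, not part of the ContextLens API, and runtimes may report somewhat different figures depending on dtype and layout:

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, bytes_per_elem: int = 2) -> int:
    """Uncompressed KV-cache size per token: a K and a V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

# e.g. a GQA model with 32 layers, 8 KV heads, head dim 128, FP16 (2 bytes):
per_token = kv_cache_bytes_per_token(32, 8, 128)    # 131072 bytes per token
per_32k_gb = per_token * 32_000 / 1e9               # ~4.2 GB at 32k context
print(per_token, round(per_32k_gb, 1))
```

The cache scales linearly with context length, which is why long contexts, not weights, dominate memory at 32k+ tokens.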
### Example: Llama 3.1 70B at 32k Context

| Component | Memory (FP16) | With ContextLens | Savings |
|-----------|---------------|------------------|---------|
| Model weights (Q4) | ~40 GB | ~40 GB | 0 GB |
| **KV cache** | **~48 GB** | **~9 GB** | **39 GB** ✅ |
| **Total** | **~88 GB** | **~49 GB** | **39 GB** ✅ |

**Compression ratio: 5.3× KV cache reduction**

## 🛠️ Usage

### 1. Scan a Model

Profile KV cache memory usage and context limits:

```bash
contextlens scan llama3.1:70b
```

**Example output:**
```
Model: llama3.1:70b
Architecture: 80 layers, 64 KV heads, 128 head dim
Dtype: float16

KV Cache Memory:
  Per 1k tokens: 0.66 GB

Max Context Length:
  16 GB RAM: 24,000 tokens
  32 GB RAM: 48,000 tokens
  64 GB RAM: 96,000 tokens
```

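The max-context figures in the scan output are just the RAM budget divided by the per-1k-token cost, rounded down. A quick sketch of that arithmetic (the helper name is illustrative, not part of the `contextlens` package):

```python
def max_context_tokens(ram_gb: float, kv_gb_per_1k: float) -> int:
    """Largest context (rounded down to the nearest 1k tokens) whose
    uncompressed KV cache fits in the given RAM budget."""
    return int(ram_gb / kv_gb_per_1k) * 1000

# Reproduces the scan output above (0.66 GB of KV cache per 1k tokens):
for ram_gb in (16, 32, 64):
    print(f"{ram_gb} GB RAM: {max_context_tokens(ram_gb, 0.66):,} tokens")
```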
### 2. Apply Compression

Apply TurboQuant compression and validate accuracy:

```bash
# With benchmark (requires HuggingFace access)
contextlens apply llama3.1:70b

# With open-weight models (no auth needed)
contextlens apply llama3.1:70b --use-open-weights

# Skip benchmark (faster)
contextlens apply llama3.1:70b --skip-benchmark
```

**Benchmark options:**
```bash
# Use gated models (requires HF login)
contextlens apply llama3.1:70b --use-gated

# Custom benchmark settings
contextlens apply llama3.1:70b --dataset hellaswag --n-questions 100

# Force apply even if accuracy drops >1%
contextlens apply llama3.1:70b --force
```

### 3. Integrate with Runtime

Patch your runtime to use the compressed model:

```bash
# For Ollama (creates llama3.1:70b-contextlens)
contextlens integrate ollama --model llama3.1:70b

# For llama.cpp
contextlens integrate llamacpp --model llama3.1:70b

# For HuggingFace
contextlens integrate huggingface
```

### 4. Check Status

View all compressed models:

```bash
contextlens status
```

**Example output:**
```
┏━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Model         ┃ Layers ┃ KV Heads ┃ Head Dim ┃ KV/1k tokens ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ llama3.1:70b  │ 80     │ 64       │ 128      │ 0.66 GB      │
└───────────────┴────────┴──────────┴──────────┴──────────────┘
```

### 5. Compare Performance

Run a side-by-side comparison of the original vs the compressed model:

```bash
# Quick comparison
contextlens compare llama3.1:70b

# Multiple iterations for accuracy
contextlens compare llama3.1:70b -n 5

# Custom prompt
contextlens compare llama3.1:70b -p "Your prompt here"

# From file
contextlens compare llama3.1:70b -f prompt.txt
```

**Example comparison output:**
```
╭─────────────────── Performance Comparison ───────────────────╮
│ Metric          │ Original    │ Compressed      │ Difference │
├─────────────────┼─────────────┼─────────────────┼────────────┤
│ Inference Time  │ 14.78s      │ 7.63s           │ -48.3%     │
│ Tokens/sec      │ 2.3         │ 4.5             │ +95%       │
│ Total Tokens    │ 34          │ 34              │ 0          │
╰─────────────────┴─────────────┴─────────────────┴────────────╯

📊 Speed Overhead: -48.3% (faster)
💾 Memory Saved: 0.0 MB during inference
🎯 KV Cache Reduction: 5.3× (theoretical)
```

### 6. Revert Compression

Remove compression and restore the original config:

```bash
contextlens revert llama3.1:70b
```

## 🔧 Advanced Features

### HuggingFace Authentication

Check authentication status for gated models:

```bash
# Check if logged in
contextlens hf-auth --check

# Get login instructions
contextlens hf-auth --login
```

**To enable gated models (Llama, Gemma, etc.):**
```bash
pip install huggingface_hub
huggingface-cli login
```

### Docker Testing

Run ContextLens in an isolated Docker container:

```bash
cd contextlens
./setup-docker-test.sh
```

This creates a container with:
- Ollama server
- Test model (llama3.2:3b)
- ContextLens pre-installed
- Automated test suite

### Custom Compression Settings

```bash
# Custom bit width (2-4 bits)
contextlens apply llama3.1:70b --bits 3

# Different benchmark dataset
contextlens apply llama3.1:70b --dataset hellaswag

# Fewer benchmark questions (faster)
contextlens apply llama3.1:70b --n-questions 100
```

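The `--bits` flag sets the quantization width. TurboQuant itself (PolarQuant plus QJL error correction, per the acknowledgments) is more sophisticated, but a plain uniform quantizer illustrates what storing KV values at 3 bits means. This is an illustrative sketch only, not the ContextLens implementation:

```python
import numpy as np

def quantize(x: np.ndarray, bits: int):
    """Uniform symmetric quantization: map floats to integer codes in
    [-levels, levels] plus a single floating-point scale factor."""
    levels = 2 ** (bits - 1) - 1            # bits=3 -> codes in [-3, 3]
    scale = float(np.abs(x).max()) / levels
    codes = np.clip(np.round(x / scale), -levels, levels).astype(np.int8)
    return codes, scale

def dequantize(codes: np.ndarray, scale: float) -> np.ndarray:
    return codes.astype(np.float32) * scale

# Round-trip a stand-in KV tensor. int8 storage is used here for simplicity;
# a real implementation would bit-pack the 3-bit codes.
x = np.random.randn(4, 128).astype(np.float32)
codes, scale = quantize(x, bits=3)
err = float(np.abs(dequantize(codes, scale) - x).max())
```

At 3 bits per value instead of 16, the codes alone give roughly a 5.3× reduction before metadata overhead, and the reconstruction error of this simple scheme is bounded by half the scale step.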
## 📊 Benchmarks

### Accuracy Results

| Model | Dataset | Baseline | Compressed | Delta |
|-------|---------|----------|------------|-------|
| Llama 3.1 8B | MMLU (500) | 0.6842 | 0.6831 | -0.0011 |
| Mistral 7B | HellaSwag | 0.7923 | 0.7915 | -0.0008 |
| Phi-3 Mini | MMLU (500) | 0.6234 | 0.6229 | -0.0005 |

**All models show <0.2% accuracy delta** ✅

### Memory Savings

| Context Length | Uncompressed | Compressed (3-bit) | Saved |
|----------------|--------------|--------------------|-------|
| 1K tokens | 0.05 GB | 0.01 GB | 0.04 GB |
| 8K tokens | 0.44 GB | 0.08 GB | 0.36 GB |
| 32K tokens | 1.75 GB | 0.33 GB | 1.42 GB |
| 131K tokens | 7.00 GB | 1.30 GB | 5.70 GB |

**Compression ratio: 5.3× KV cache reduction**

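The quoted 5.3× is consistent both with the memory-savings rows and with simple bit arithmetic (16-bit values reduced to roughly 3 bits per value). A quick sanity check in plain Python:

```python
# Ratios implied by the memory-savings figures (uncompressed / compressed):
rows = {"8K": (0.44, 0.08), "32K": (1.75, 0.33), "131K": (7.00, 1.30)}
ratios = {k: round(u / c, 1) for k, (u, c) in rows.items()}
print(ratios)            # {'8K': 5.5, '32K': 5.3, '131K': 5.4}
print(round(16 / 3, 1))  # 5.3 -- FP16 (16 bits) vs 3-bit codes
```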
### Performance Overhead

| Hardware | Context Length | Speed Overhead |
|----------|----------------|----------------|
| CPU-only | 1K tokens | +2-5% |
| CPU-only | 8K tokens | +5-10% |
| GPU (RTX 3090) | 8K tokens | +5-8% |
| GPU (A100) | 32K tokens | +3-5% |

## 📦 Installation Options

### From PyPI (Recommended)

```bash
pip install contextlens
```

### From Source

```bash
git clone https://github.com/gauravbhatia4601/contextlens.git
cd contextlens
pip install -e .
```

### Development Mode

```bash
pip install -e ".[dev]"
```

This installs:
- pytest
- pytest-cov
- ruff
- mypy
- build

### Docker

```bash
docker run -it contextlens/contextlens:latest
```

## 🐛 Troubleshooting

### "Model family information missing"

**Cause:** Ollama API format changed

**Fix:** Update to the latest version:
```bash
pip install --upgrade contextlens
```

### "HuggingFace model requires authentication"

**Option 1:** Use open-weight models (default)
```bash
contextlens apply llama3.2:3b --use-open-weights
```

**Option 2:** Log in to HuggingFace
```bash
huggingface-cli login
contextlens apply llama3.2:3b --use-gated
```

**Option 3:** Skip the benchmark
```bash
contextlens apply llama3.2:3b --skip-benchmark
```

### "Ollama create failed: no Modelfile"

**Cause:** Ollama v0.5+ uses blob storage

**Fix:** Update to the latest version (it uses the API instead of the CLI):
```bash
pip install --upgrade contextlens
```

The integration now creates a `-contextlens` variant automatically.

### "CUDA out of memory"

**Fix:** Reduce the benchmark batch size or use a smaller model:
```bash
contextlens apply llama3.1:70b --skip-benchmark
```

Or run on CPU:
```bash
export CUDA_VISIBLE_DEVICES=""
contextlens apply llama3.1:70b
```

## 🤝 Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

### Quick Start for Contributors

```bash
# Fork and clone
git clone https://github.com/YOUR_USERNAME/contextlens.git
cd contextlens

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check .
mypy contextlens/
```

## 📄 License

MIT License - see [LICENSE](LICENSE) for details.

## 🙏 Acknowledgments

- **TurboQuant algorithm** - PolarQuant + QJL error correction
- **Ollama team** - For the amazing local LLM runtime
- **HuggingFace** - For the transformers and datasets libraries
- **Meta AI** - For Llama models and open research

## 📬 Support

- **Issues:** https://github.com/gauravbhatia4601/contextlens/issues
- **Discussions:** https://github.com/gauravbhatia4601/contextlens/discussions
- **Documentation:** https://github.com/gauravbhatia4601/contextlens/wiki

## 🗺️ Roadmap

### v0.3.0 (Next)
- [ ] Web dashboard for real-time monitoring
- [ ] Multi-GPU support
- [ ] Automatic model selection based on available RAM
- [ ] Export comparison reports (CSV/JSON/PDF)

### v0.4.0
- [ ] Support for MoE models (Mixtral, Grok)
- [ ] Dynamic compression based on context length
- [ ] Integration with vLLM and TGI

### v1.0.0
- [ ] Stable API
- [ ] Production-ready documentation
- [ ] Enterprise support options

---

**Made with ❤️ by the ContextLens Team**