quantbenchx 0.3.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- quantbenchx-0.3.0/.github/workflows/ci.yml +15 -0
- quantbenchx-0.3.0/.gitignore +2 -0
- quantbenchx-0.3.0/CHANGELOG.md +25 -0
- quantbenchx-0.3.0/PKG-INFO +213 -0
- quantbenchx-0.3.0/README.md +182 -0
- quantbenchx-0.3.0/assets/layerwise.svg +158 -0
- quantbenchx-0.3.0/assets/profile.svg +182 -0
- quantbenchx-0.3.0/examples/profile_model.py +65 -0
- quantbenchx-0.3.0/examples/quality_analysis.py +75 -0
- quantbenchx-0.3.0/examples/recommend_quant.py +56 -0
- quantbenchx-0.3.0/generate_svgs.py +98 -0
- quantbenchx-0.3.0/pyproject.toml +63 -0
- quantbenchx-0.3.0/src/quantbenchx/__init__.py +132 -0
- quantbenchx-0.3.0/src/quantbenchx/_types.py +220 -0
- quantbenchx-0.3.0/src/quantbenchx/bandwidth.py +290 -0
- quantbenchx-0.3.0/src/quantbenchx/cli.py +153 -0
- quantbenchx-0.3.0/src/quantbenchx/compare.py +101 -0
- quantbenchx-0.3.0/src/quantbenchx/imatrix.py +201 -0
- quantbenchx-0.3.0/src/quantbenchx/layerwise.py +167 -0
- quantbenchx-0.3.0/src/quantbenchx/matrix.py +289 -0
- quantbenchx-0.3.0/src/quantbenchx/perplexity.py +168 -0
- quantbenchx-0.3.0/src/quantbenchx/predict.py +125 -0
- quantbenchx-0.3.0/src/quantbenchx/profile.py +301 -0
- quantbenchx-0.3.0/src/quantbenchx/py.typed +0 -0
- quantbenchx-0.3.0/src/quantbenchx/recommend.py +240 -0
- quantbenchx-0.3.0/src/quantbenchx/report.py +171 -0
- quantbenchx-0.3.0/tests/test_bandwidth.py +191 -0
- quantbenchx-0.3.0/tests/test_cli.py +95 -0
- quantbenchx-0.3.0/tests/test_compare.py +90 -0
- quantbenchx-0.3.0/tests/test_edge_cases.py +282 -0
- quantbenchx-0.3.0/tests/test_imatrix.py +239 -0
- quantbenchx-0.3.0/tests/test_layerwise.py +113 -0
- quantbenchx-0.3.0/tests/test_matrix.py +206 -0
- quantbenchx-0.3.0/tests/test_perplexity.py +240 -0
- quantbenchx-0.3.0/tests/test_predict.py +100 -0
- quantbenchx-0.3.0/tests/test_profile.py +210 -0
- quantbenchx-0.3.0/tests/test_recommend.py +188 -0
- quantbenchx-0.3.0/tests/test_report.py +102 -0
- quantbenchx-0.3.0/tests/test_types.py +146 -0
@@ -0,0 +1,15 @@
+name: CI
+on: [push, pull_request]
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+      - run: pip install -e ".[cli]" pytest
+      - run: pytest tests/ -v
@@ -0,0 +1,25 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [0.3.0] - 2026-04-10
+
+### Added
+- Quantization format comparison matrix via `matrix.py`
+- Memory bandwidth estimation in `bandwidth.py`
+
+## [0.2.0] - 2026-04-10
+
+### Added
+- Perplexity-based quality scoring
+- GGUF imatrix analysis
+- Quantization recommendation engine
+
+## [0.1.0] - 2026-04-10
+
+### Added
+- Initial release: quantization quality analyzer
+
@@ -0,0 +1,213 @@
+Metadata-Version: 2.4
+Name: quantbenchx
+Version: 0.3.0
+Summary: Quantization quality analyzer — pure-Python GGUF/safetensors parsing, layerwise analysis, quality prediction. Zero deps.
+Project-URL: Homepage, https://github.com/stef41/quantbenchx
+Project-URL: Repository, https://github.com/stef41/quantbenchx
+Project-URL: Issues, https://github.com/stef41/quantbenchx/issues
+Author: Zacharie B
+License: Apache-2.0
+Keywords: analysis,benchmark,compression,evaluation,gguf,llm,model,quality,quantization,safetensors
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: Apache Software License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Typing :: Typed
+Requires-Python: >=3.9
+Provides-Extra: all
+Requires-Dist: click>=8.0; extra == 'all'
+Requires-Dist: rich>=13.0; extra == 'all'
+Provides-Extra: cli
+Requires-Dist: click>=8.0; extra == 'cli'
+Requires-Dist: rich>=13.0; extra == 'cli'
+Description-Content-Type: text/markdown
+
+# quantbenchx
+
+[![CI](https://github.com/stef41/quantbenchx/actions/workflows/ci.yml/badge.svg)](https://github.com/stef41/quantbenchx/actions/workflows/ci.yml)
+[![Python 3.9+](https://img.shields.io/badge/python-3.9%2B-blue.svg)](https://www.python.org/downloads/)
+[![License: Apache-2.0](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE)
+
+**Quantization quality analyzer for LLMs.** Pure-Python GGUF and safetensors parsing, layerwise sensitivity analysis, quality prediction, and mixed-quantization recommendations — zero dependencies.
+
+Point quantbenchx at any `.gguf` or `.safetensors` file and get an instant quality report: dtype distribution, estimated perplexity impact, layer sensitivity scores, and mixed-precision recommendations.
+
+<p align="center">
+  <img src="assets/profile.svg" width="700" alt="quantbenchx profile report" />
+</p>
+
+## Why quantbenchx?
+
+| Problem | quantbenchx Solution |
+|---|---|
+| "Is Q4_K_M good enough for my use case?" | Estimated perplexity delta + risk level |
+| No way to inspect GGUF internals without llama.cpp | Pure-Python parser — just `pip install` |
+| Which layers are most sensitive to quantization? | Layerwise sensitivity scoring with position awareness |
+| Choosing between Q4_K_M, Q5_K_S, Q6_K, etc. | Side-by-side format comparison with rankings |
+| Mixed-precision quant is complex to configure | Automated recommendations targeting your bpw budget |
+
+## Installation
+
+```bash
+pip install quantbenchx          # zero dependencies
+pip install "quantbenchx[cli]"   # + click, rich for terminal UI
+pip install "quantbenchx[all]"   # everything
+```
+
+## Quick Start
+
+### 1. Profile a quantized model
+
+```python
+from quantbenchx import profile_gguf, estimate_quality
+
+profile = profile_gguf("Meta-Llama-3-8B-Q4_K_M.gguf")
+
+print(f"Model: {profile.name}")
+print(f"Size: {profile.size_gb:.2f} GB")
+print(f"Avg bits/weight: {profile.quant.avg_bits_per_weight:.2f}")
+print(f"Compression: {profile.compression_ratio:.1f}x vs FP32")
+
+quality = estimate_quality(profile)
+print(f"Risk level: {quality.risk_level}")
+print(f"Est. perplexity delta: +{quality.estimated_perplexity_delta:.4f}")
+```
+
+### 2. Layerwise analysis
+
+<p align="center">
+  <img src="assets/layerwise.svg" width="700" alt="quantbenchx layerwise analysis" />
+</p>
+
+```python
+from quantbenchx import profile_gguf, analyze_layers, layer_sensitivity
+
+profile = profile_gguf("model.gguf")
+
+# Get sensitivity scores
+sensitivity = layer_sensitivity(profile)
+for layer_name, score in sorted(sensitivity.items(), key=lambda x: -x[1])[:5]:
+    print(f"  {layer_name}: {score:.3f}")
+
+# Full layerwise breakdown
+for row in analyze_layers(profile):
+    print(f"{row['name']:30s} {row['avg_bits_per_weight']:5.2f} bpw  sens={row['sensitivity']:.3f}")
+```
+
+### 3. Compare quantization formats
+
+```python
+from quantbenchx import profile_gguf, compare_profiles, compare_formats
+
+q4 = profile_gguf("model-Q4_K_M.gguf")
+q5 = profile_gguf("model-Q5_K_M.gguf")
+q8 = profile_gguf("model-Q8_0.gguf")
+
+# Pairwise comparison
+diff = compare_profiles(q4, q8)
+print(f"Size delta: {diff['size_delta_bytes'] / 1e9:.2f} GB")
+print(f"BPW delta: {diff['bpw_delta']:.2f}")
+
+# Multi-format ranking
+ranking = compare_formats([q4, q5, q8])
+for row in ranking["ranking"]:
+    print(f"  #{row['rank']} {row['name']} — {row['avg_bpw']:.2f} bpw, {row['size_gb']:.2f} GB")
+```
+
+### 4. Mixed-quantization recommendations
+
+```python
+from quantbenchx import profile_gguf, recommend_mixed_quant
+
+profile = profile_gguf("model.gguf")
+rec = recommend_mixed_quant(profile, target_bpw=4.5)
+
+print(f"Target: {rec['target_bpw']} bpw → Estimated: {rec['estimated_avg_bpw']} bpw")
+print(f"High precision: {rec['n_high_precision_layers']} layers ({rec['high_quant']})")
+print(f"Low precision: {rec['n_low_precision_layers']} layers ({rec['low_quant']})")
+```
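To build intuition for what a recommendation like this involves, here is a toy allocator that keeps the most sensitive layers at higher precision while the parameter-weighted average bpw stays within budget. Everything in it is illustrative: the layer names, parameter counts, sensitivities, and the greedy strategy are invented for the sketch, not quantbenchx's actual algorithm.

```python
# Toy mixed-precision allocator (illustrative only; hypothetical layer data).
LOW_BPW, HIGH_BPW = 4.0, 6.5   # e.g. a Q4-class vs a Q6-class format
TARGET_BPW = 4.5

layers = [  # (name, n_params, sensitivity) — invented numbers
    ("blk.0.attn_v", 4_000_000, 0.91),
    ("blk.0.ffn_down", 11_000_000, 0.44),
    ("blk.1.attn_v", 4_000_000, 0.63),
    ("blk.1.ffn_down", 11_000_000, 0.38),
]
total_params = sum(n for _, n, _ in layers)

def avg_bpw_of(assign):
    # Parameter-weighted average bits per weight of an assignment.
    return sum(assign[name] * n for name, n, _ in layers) / total_params

# Start everything at low precision, then promote layers in descending
# sensitivity order as long as the average stays within the target budget.
assign = {name: LOW_BPW for name, _, _ in layers}
for name, _, _ in sorted(layers, key=lambda t: -t[2]):
    candidate = {**assign, name: HIGH_BPW}
    if avg_bpw_of(candidate) <= TARGET_BPW:
        assign = candidate

print(assign)
print(f"avg = {avg_bpw_of(assign):.2f} bpw (target {TARGET_BPW})")
```

Whatever strategy is used, the binding constraint is the one `recommend_mixed_quant` reports: the size-weighted average of per-layer bpw must land near `target_bpw`.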
+
+### 5. Predict quality for any bpw
+
+```python
+from quantbenchx import perplexity_delta
+
+for bpw in [8.0, 6.0, 5.0, 4.5, 4.0, 3.5, 3.0, 2.0]:
+    delta = perplexity_delta(bpw)
+    print(f"  {bpw:.1f} bpw → +{delta:.4f} perplexity")
+```
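As a rough mental model of such a curve, perplexity damage tends to grow roughly exponentially as bits per weight drop. The coefficients below are invented for illustration and are not quantbenchx's fitted values:

```python
import math

def toy_perplexity_delta(bpw, a=2.5, b=1.4):
    # Illustrative curve: damage decays exponentially with bits per weight,
    # so each extra bit buys a constant multiplicative reduction in delta.
    return a * math.exp(-b * bpw)

for bpw in (8.0, 6.0, 4.5, 3.0, 2.0):
    print(f"{bpw:.1f} bpw -> +{toy_perplexity_delta(bpw):.4f}")
```

Under this shape, dropping from 4.0 to 3.0 bpw multiplies the delta by e^b ≈ 4, which matches the common observation that quality falls off steeply below roughly 4 bpw.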
+
+## CLI
+
+```bash
+# Profile a GGUF or safetensors file
+quantbenchx profile model.gguf
+
+# Markdown output
+quantbenchx profile model.gguf --markdown
+
+# Save JSON report
+quantbenchx profile model.gguf -o report.json
+
+# Compare two files
+quantbenchx compare model-Q4.gguf model-Q8.gguf
+
+# Layerwise analysis
+quantbenchx layers model.gguf
+
+# Mixed-quant recommendation
+quantbenchx recommend model.gguf --target-bpw 4.5
+```
+
+## Supported Formats
+
+| Format | Parser | Status |
+|---|---|---|
+| GGUF (v2, v3) | Pure Python — reads header only | Full support |
+| safetensors | Pure Python — reads JSON header only | Full support |
+
+### Supported Dtypes
+
+Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_1, Q4_K_S, Q4_K_M, Q5_0, Q5_1, Q5_K_S, Q5_K_M, Q6_K, Q8_0, IQ1_S, IQ2_XXS, IQ3_XXS, IQ4_XS, F16, BF16, F32
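Both parsers can stay pure Python because only the fixed-layout file header is read, never the tensor data. A minimal sketch of the two on-disk layouts, based on the public safetensors and GGUF format descriptions (this is not quantbenchx's own code):

```python
import json
import os
import struct
import tempfile

def read_safetensors_header(path):
    # safetensors: u64 little-endian header length, then that many bytes
    # of JSON mapping tensor names to {dtype, shape, data_offsets}.
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n))

def read_gguf_counts(path):
    # GGUF: 4-byte magic "GGUF", u32 version, u64 tensor count,
    # u64 metadata key/value count (all little-endian).
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
        return version, n_tensors, n_kv

# Demo: write a tiny fake safetensors file and parse its header back.
header = {"w": {"dtype": "F16", "shape": [2, 2], "data_offsets": [0, 8]}}
blob = json.dumps(header).encode()
path = os.path.join(tempfile.gettempdir(), "demo.safetensors")
with open(path, "wb") as f:
    f.write(struct.pack("<Q", len(blob)) + blob + b"\x00" * 8)

meta = read_safetensors_header(path)
print(meta["w"]["dtype"], meta["w"]["shape"])  # F16 [2, 2]
```

Because only the header is parsed, profiling a multi-gigabyte model reads kilobytes, not gigabytes.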
+
+## Architecture
+
+```
+quantbenchx/
+├── _types.py      # DType, TensorInfo, LayerInfo, ModelProfile, QualityEstimate
+├── profile.py     # Pure-Python GGUF & safetensors parsers
+├── layerwise.py   # Layer sensitivity analysis, mixed-quant recommendations
+├── imatrix.py     # GGUF imatrix analysis
+├── perplexity.py  # Perplexity-based quality scoring
+├── compare.py     # Cross-format and pairwise comparisons
+├── matrix.py      # Quantization format comparison matrix
+├── bandwidth.py   # Memory bandwidth estimation
+├── predict.py     # Quality estimation from bits-per-weight curves
+├── recommend.py   # Quantization recommendation engine
+├── report.py      # JSON/text/rich/markdown formatting
+└── cli.py         # Click CLI interface
+```
+
+## See Also
+
+Part of the **stef41 LLM toolkit** — open-source tools for every stage of the LLM lifecycle:
+
+| Project | What it does |
+|---------|-------------|
+| [tokonomics](https://github.com/stef41/tokonomix) | Token counting & cost management for LLM APIs |
+| [datacrux](https://github.com/stef41/datacruxai) | Training data quality — dedup, PII, contamination |
+| [castwright](https://github.com/stef41/castwright) | Synthetic instruction data generation |
+| [datamix](https://github.com/stef41/datamix) | Dataset mixing & curriculum optimization |
+| [toksight](https://github.com/stef41/toksight) | Tokenizer analysis & comparison |
+| [trainpulse](https://github.com/stef41/trainpulse) | Training health monitoring |
+| [ckpt](https://github.com/stef41/ckptkit) | Checkpoint inspection, diffing & merging |
+| [infermark](https://github.com/stef41/infermark) | Inference benchmarking |
+| [modeldiff](https://github.com/stef41/modeldiffx) | Behavioral regression testing |
+| [vibesafe](https://github.com/stef41/vibesafex) | AI-generated code safety scanner |
+| [injectionguard](https://github.com/stef41/injectionguard) | Prompt injection detection |
+
+## License
+
+Apache 2.0