compressed-tensors 0.5.0__py3-none-any.whl → 0.7.0__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- compressed_tensors/__init__.py +1 -0
- compressed_tensors/base.py +2 -0
- compressed_tensors/compressors/__init__.py +6 -12
- compressed_tensors/compressors/base.py +137 -9
- compressed_tensors/compressors/helpers.py +6 -6
- compressed_tensors/compressors/model_compressors/__init__.py +17 -0
- compressed_tensors/compressors/{model_compressor.py → model_compressors/model_compressor.py} +99 -43
- compressed_tensors/compressors/quantized_compressors/__init__.py +18 -0
- compressed_tensors/compressors/{naive_quantized.py → quantized_compressors/base.py} +64 -62
- compressed_tensors/compressors/quantized_compressors/naive_quantized.py +140 -0
- compressed_tensors/compressors/quantized_compressors/pack_quantized.py +211 -0
- compressed_tensors/compressors/sparse_compressors/__init__.py +18 -0
- compressed_tensors/compressors/sparse_compressors/base.py +110 -0
- compressed_tensors/compressors/{dense.py → sparse_compressors/dense.py} +3 -3
- compressed_tensors/compressors/{sparse_bitmask.py → sparse_compressors/sparse_bitmask.py} +14 -59
- compressed_tensors/compressors/sparse_quantized_compressors/__init__.py +16 -0
- compressed_tensors/compressors/{marlin_24.py → sparse_quantized_compressors/marlin_24.py} +3 -3
- compressed_tensors/config/base.py +6 -1
- compressed_tensors/linear/__init__.py +13 -0
- compressed_tensors/linear/compressed_linear.py +87 -0
- compressed_tensors/quantization/__init__.py +1 -0
- compressed_tensors/quantization/cache.py +201 -0
- compressed_tensors/quantization/lifecycle/apply.py +63 -9
- compressed_tensors/quantization/lifecycle/calibration.py +7 -7
- compressed_tensors/quantization/lifecycle/compressed.py +3 -1
- compressed_tensors/quantization/lifecycle/forward.py +126 -44
- compressed_tensors/quantization/lifecycle/frozen.py +6 -1
- compressed_tensors/quantization/lifecycle/helpers.py +0 -20
- compressed_tensors/quantization/lifecycle/initialize.py +138 -55
- compressed_tensors/quantization/observers/__init__.py +1 -0
- compressed_tensors/quantization/observers/base.py +54 -14
- compressed_tensors/quantization/observers/min_max.py +8 -0
- compressed_tensors/quantization/observers/mse.py +162 -0
- compressed_tensors/quantization/quant_args.py +102 -24
- compressed_tensors/quantization/quant_config.py +14 -2
- compressed_tensors/quantization/quant_scheme.py +12 -13
- compressed_tensors/quantization/utils/helpers.py +44 -19
- compressed_tensors/utils/__init__.py +1 -0
- compressed_tensors/utils/helpers.py +30 -1
- compressed_tensors/utils/offload.py +14 -2
- compressed_tensors/utils/permute.py +70 -0
- compressed_tensors/utils/safetensors_load.py +2 -0
- compressed_tensors/utils/semi_structured_conversions.py +1 -0
- compressed_tensors/version.py +1 -1
- {compressed_tensors-0.5.0.dist-info → compressed_tensors-0.7.0.dist-info}/METADATA +35 -23
- compressed_tensors-0.7.0.dist-info/RECORD +59 -0
- {compressed_tensors-0.5.0.dist-info → compressed_tensors-0.7.0.dist-info}/WHEEL +1 -1
- compressed_tensors/compressors/pack_quantized.py +0 -219
- compressed_tensors-0.5.0.dist-info/RECORD +0 -48
- {compressed_tensors-0.5.0.dist-info → compressed_tensors-0.7.0.dist-info}/LICENSE +0 -0
- {compressed_tensors-0.5.0.dist-info → compressed_tensors-0.7.0.dist-info}/top_level.txt +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: compressed-tensors
-Version: 0.
+Version: 0.7.0
 Summary: Library for utilization of compressed safetensors of neural network models
 Home-page: https://github.com/neuralmagic/compressed-tensors
 Author: Neuralmagic, Inc.
@@ -8,44 +8,56 @@ Author-email: support@neuralmagic.com
 License: Apache 2.0
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: torch
+Requires-Dist: torch>=1.7.0
 Requires-Dist: transformers
-Requires-Dist:
-
+Requires-Dist: pydantic>=2.0
+Provides-Extra: accelerate
+Requires-Dist: accelerate; extra == "accelerate"
 Provides-Extra: dev
-Requires-Dist: black
-Requires-Dist: isort
-Requires-Dist: wheel
-Requires-Dist: flake8
-Requires-Dist: pytest
-Requires-Dist: nbconvert
+Requires-Dist: black==22.12.0; extra == "dev"
+Requires-Dist: isort==5.8.0; extra == "dev"
+Requires-Dist: wheel>=0.36.2; extra == "dev"
+Requires-Dist: flake8>=3.8.3; extra == "dev"
+Requires-Dist: pytest>=6.0.0; extra == "dev"
+Requires-Dist: nbconvert>=7.16.3; extra == "dev"
 
-#
+# compressed-tensors
 
-
+The `compressed-tensors` library extends the [safetensors](https://github.com/huggingface/safetensors) format, providing a versatile and efficient way to store and manage compressed tensor data. This library supports various quantization and sparsity schemes, making it a unified format for handling different model optimizations like GPTQ, AWQ, SmoothQuant, INT8, FP8, SparseGPT, and more.
 
-##
+## Why `compressed-tensors`?
 
-
+As model compression becomes increasingly important for efficient deployment of LLMs, the landscape of quantization and compression techniques has become increasingly fragmented.
+Each method often comes with its own storage format and loading procedures, making it challenging to work with multiple techniques or switch between them.
+`compressed-tensors` addresses this by providing a single, extensible format that can represent a wide variety of compression schemes.
 
-
+* **Unified Checkpoint Format**: Supports various compression schemes in a single, consistent format.
+* **Wide Compatibility**: Works with popular quantization methods like GPTQ, SmoothQuant, and FP8. See [llm-compressor](https://github.com/vllm-project/llm-compressor)
+* **Flexible Quantization Support**:
+* Weight-only quantization (e.g., W4A16, W8A16, WnA16)
+* Activation quantization (e.g., W8A8)
+* KV cache quantization
+* Non-uniform schemes (different layers can be quantized in different ways!)
+* **Sparsity Support**: Handles both unstructured and semi-structured (e.g., 2:4) sparsity patterns.
+* **Open-Source Integration**: Designed to work seamlessly with Hugging Face models and PyTorch.
 
-
-- Quantized -> due to their low precision representation.
-
-### Introduce an elegant interface to save/load compressed tensors
-
-The library provides the user with the ability to compress/decompress tensors. The properties of tensors are defined by human-readable configs, allowing the users to understand the compression format at a quick glance.
+This allows developers and researchers to easily experiment with composing different quantization methods, simplify model deployment pipelines, and reduce the overhead of supporting multiple compression formats in inference engines.
 
 ## Installation
 
-###
+### From [PyPI](https://pypi.org/project/compressed-tensors)
 
+Stable release:
 ```bash
 pip install compressed-tensors
 ```
 
-
+Nightly release:
+```bash
+pip install compressed-tensors-nightly
+```
+
+### From Source
 
 ```bash
 git clone https://github.com/neuralmagic/compressed-tensors
@@ -0,0 +1,59 @@
+compressed_tensors/__init__.py,sha256=UtKmifNeBCSE2TZSAfduVNNzHY-3V7bLjZ7n7RuXLOE,812
+compressed_tensors/base.py,sha256=73HYH7HY7O2roC89yG_piPFnZwrBfn_i7HmKl90SKc0,875
+compressed_tensors/version.py,sha256=RTYptXdV8f4QbYCRQ13eGeEsq4grNJs6EXgejoZl9EE,1585
+compressed_tensors/compressors/__init__.py,sha256=smSygTSfcfuujRrAXDc6uZm4L_ccV1tWZewqVnOb4lM,825
+compressed_tensors/compressors/base.py,sha256=D9TNwQcjanDiAHODPbg8JUqc66e3j50rctY7A708NEs,6743
+compressed_tensors/compressors/helpers.py,sha256=OK6qxX9j3bHwF9JfIYSGMgBJe2PWjlTA3byXKCJaTIQ,5431
+compressed_tensors/compressors/model_compressors/__init__.py,sha256=5RGGPFu4YqEt_aOdFSQYFYFDjcZFJN0CsMqRtDZz3Js,666
+compressed_tensors/compressors/model_compressors/model_compressor.py,sha256=XJgPsq8KiDfiR4e8bSI38lmoOd2ApqRk1aPcXS2obqY,15600
+compressed_tensors/compressors/quantized_compressors/__init__.py,sha256=09UJq68Pht6Bf-4iP9xYl3tetKsncNPHD8IAGbePsr4,714
+compressed_tensors/compressors/quantized_compressors/base.py,sha256=K1KOnS6Y8nUA1-HN7VhyfsDc01nilW0WfXMUhuD-l8w,5954
+compressed_tensors/compressors/quantized_compressors/naive_quantized.py,sha256=Mmfr-hap-4zw7CzE1mXi0UirknqGidNxw38GGWVgTqM,4916
+compressed_tensors/compressors/quantized_compressors/pack_quantized.py,sha256=9H8UrG5v1GRtslLjOEiUM2dnyxJnR-HJmlsFezQs_r0,7706
+compressed_tensors/compressors/sparse_compressors/__init__.py,sha256=i2TESH27l7KXeOhJ6hShIoI904XX96l-cRQiMR6MAaU,704
+compressed_tensors/compressors/sparse_compressors/base.py,sha256=Ua4rUSGyucEs-YJI5z3oIUF-zqQLrFsQ9f-qKasEdUM,4410
+compressed_tensors/compressors/sparse_compressors/dense.py,sha256=lSKNWRx6H7aUqaJj1j4qbXk8Gkm1UohbnvW1Rvq6Ra4,1284
+compressed_tensors/compressors/sparse_compressors/sparse_bitmask.py,sha256=4fKwCG7ZM8mUtSnjPvubzEHl-mTnxMzwjmcs7L43WLY,6622
+compressed_tensors/compressors/sparse_quantized_compressors/__init__.py,sha256=4f_cwcKXB1nVVMoiKgTFAc8jAPjPLElo-Df_EDm1_xw,675
+compressed_tensors/compressors/sparse_quantized_compressors/marlin_24.py,sha256=akqE7eW8CLTslpWRxERaZ8R0TSm1lS7D1bgZXKL0xi8,9427
+compressed_tensors/config/__init__.py,sha256=ZBqWn3r6ku1qfmlHHYp0mQueY0i7Pwhr9rbQk9dDlMc,704
+compressed_tensors/config/base.py,sha256=BNTFKy12isY7qblwxdi_R1f00EzgrNOXLrfxqLCPT8w,1903
+compressed_tensors/config/dense.py,sha256=NgSxnFCnckU9-iunxEaqiFwqgdO7YYxlWKR74jNbjks,1317
+compressed_tensors/config/sparse_bitmask.py,sha256=pZUboRNZTu6NajGOQEFExoPknak5ynVAUeiiYpS1Gt8,1308
+compressed_tensors/linear/__init__.py,sha256=fH6rjBYAxuwrTzBTlTjTgCYNyh6TCvCqajCz4Im4YrA,617
+compressed_tensors/linear/compressed_linear.py,sha256=0jTTf6XxOAjAYs3tvFtgiNMAO4W10sSeR-pdH2M413g,3218
+compressed_tensors/quantization/__init__.py,sha256=nWP_fsl6Nn0ksEgZPzerGiETdvF-ZfNwPnwGlRiR5pY,805
+compressed_tensors/quantization/cache.py,sha256=vnBB5zasO_XpHomZvzUPVVbzyCz2VgebsHePm0kANzY,6831
+compressed_tensors/quantization/quant_args.py,sha256=73KevZXHyrkMCT_3CxbYHz70fI3i-wcF8NvN0wsBPK4,8271
+compressed_tensors/quantization/quant_config.py,sha256=NCiMvUMnnz5kTyAkDylxjtEGQnjgsIYIeNR2zyHEdTQ,10371
+compressed_tensors/quantization/quant_scheme.py,sha256=uFgp6ECU6ZkHWkeKlAVAzZTLDbrTrzPSPrY23eJluaw,5931
+compressed_tensors/quantization/lifecycle/__init__.py,sha256=MXE2E7GfIfRRfhrdGy2Og3AZOz5N59B0ZGFcsD89y6c,821
+compressed_tensors/quantization/lifecycle/apply.py,sha256=czaayvpeUYyWRJhO_klffw6esptOgA9sBKL5TWQcRdw,15805
+compressed_tensors/quantization/lifecycle/calibration.py,sha256=IuLeRkVQPrMxkMcIjr4OMFlIUMHkqjH4qAxC2KiUBGw,2673
+compressed_tensors/quantization/lifecycle/compressed.py,sha256=Fj9n66IN0EWsOAkBHg3O0GlOQpxstqjCcs0ttzMXrJ0,2296
+compressed_tensors/quantization/lifecycle/forward.py,sha256=eLup6QDRUUp_Ozcas7RDRLIXBWjFbxn5gWbcAIJEGlw,15715
+compressed_tensors/quantization/lifecycle/frozen.py,sha256=NiJw7NP7pcT6idWFa8vksgiLoT8oQ975e57S4QfD2QQ,1874
+compressed_tensors/quantization/lifecycle/helpers.py,sha256=C0mhy2vJ0fCjVeN4kFNhw8Eq1wkteBGHiZ36RVLThRY,944
+compressed_tensors/quantization/lifecycle/initialize.py,sha256=4_YG7jKl7d2-Cy58pOkMtInFRhvYahxYchesWMPxPVM,8862
+compressed_tensors/quantization/observers/__init__.py,sha256=4Sa7rqi5RB_S5bPO8KmncETiqDsoMBhwP37arlQym8s,764
+compressed_tensors/quantization/observers/base.py,sha256=5ovQicWPYHjIxr6-EkQ4lgOX0PpI9g23iSzKpxjM1Zg,8420
+compressed_tensors/quantization/observers/helpers.py,sha256=s_A23Qa_BLfOdHJCN5bm-qPWkhjjj_RIVrhSp1Y9Dtk,4211
+compressed_tensors/quantization/observers/memoryless.py,sha256=jH_c6K3gxf4W3VNXQ7tbnP-J_86QTrEfjBn6Kh1C-H8,2165
+compressed_tensors/quantization/observers/min_max.py,sha256=sQXqU3z-voxIDfR_9mQzwQUflZj2sASm_G8CYaXntFw,3865
+compressed_tensors/quantization/observers/mse.py,sha256=Aeh-253Vbab1F8cYuBiGNn4OXWJ67wXQ_JVfl3mu2a8,6034
+compressed_tensors/quantization/utils/__init__.py,sha256=VdtEmP0bvuND_IGQnyqUPc5lnFp-1_yD7StKSX4x80w,656
+compressed_tensors/quantization/utils/helpers.py,sha256=y4LEyC2oUd876ZMdALWKGH3Ct5EgBJZV4id_NUjTGH8,9531
+compressed_tensors/registry/__init__.py,sha256=FwLSNYqfIrb5JD_6OK_MT4_svvKTN_nEhpgQlQvGbjI,658
+compressed_tensors/registry/registry.py,sha256=fxjOjh2wklCvJhQxwofdy-zV8q7MkQ85SLG77nml2iA,11890
+compressed_tensors/utils/__init__.py,sha256=gS4gSU2pwcAbsKj-6YMaqhm25udFy6ISYaWBf-myRSM,808
+compressed_tensors/utils/helpers.py,sha256=hWGIR0W7ENHwdC7wW2SQJJiCF9-xOu_u3fY2RzLyYg4,4101
+compressed_tensors/utils/offload.py,sha256=d9q8LNe8HyF8tOjgjA7QGLD3HRysmNp0d8eBbdqBgIM,4089
+compressed_tensors/utils/permutations_24.py,sha256=kx6fsfDHebx94zsSzhXGyCyuC9sVyah6BUUir_StT28,2530
+compressed_tensors/utils/permute.py,sha256=V6tJLKo3Syccj-viv4F7ZKZgJeCB-hl-dK8RKI_kBwI,2355
+compressed_tensors/utils/safetensors_load.py,sha256=m08ANVuTBxQdoa6LufDgcNJ7wCLDJolyZljB8VEybAU,8578
+compressed_tensors/utils/semi_structured_conversions.py,sha256=XKNffPum54kPASgqKzgKvyeqWPAkair2XEQXjkp7ho8,13489
+compressed_tensors-0.7.0.dist-info/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
+compressed_tensors-0.7.0.dist-info/METADATA,sha256=Lgcl4rU8ifo0PY-FrurFApAkuTD9HBeJohuULjVqebs,6782
+compressed_tensors-0.7.0.dist-info/WHEEL,sha256=eOLhNAGa2EW3wWl_TU484h7q1UNgy0JXjjoqKoxAAQc,92
+compressed_tensors-0.7.0.dist-info/top_level.txt,sha256=w2i-GyPs2s1UwVxvutSvN_lM22SXC2hQFBmoMcPnV7Y,19
+compressed_tensors-0.7.0.dist-info/RECORD,,
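Each RECORD entry has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed, as is conventional for wheels. The following is a minimal Python sketch of how such an entry could be recomputed, assuming the file contents are already in memory; `record_entry` is a hypothetical helper name and is not part of compressed-tensors or pip.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build a wheel-style RECORD line for a file (hypothetical helper)."""
    digest = hashlib.sha256(data).digest()
    # URL-safe base64, with the "=" padding stripped
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


# An empty file always hashes to the same well-known digest.
print(record_entry("pkg/__init__.py", b""))
```

Comparing digests this way is one way to see why LICENSE and top_level.txt are reported as unchanged: their digests and sizes are identical in the 0.5.0 and 0.7.0 RECORD listings.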
@@ -1,219 +0,0 @@
-# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import logging
-import math
-from typing import Dict, Generator, Tuple
-
-import numpy as np
-import torch
-from compressed_tensors.compressors import Compressor
-from compressed_tensors.config import CompressionFormat
-from compressed_tensors.quantization import QuantizationArgs
-from compressed_tensors.quantization.lifecycle.forward import dequantize, quantize
-from compressed_tensors.quantization.utils import can_quantize
-from compressed_tensors.utils import get_nested_weight_mappings, merge_names
-from safetensors import safe_open
-from torch import Tensor
-from tqdm import tqdm
-
-
-__all__ = ["PackedQuantizationCompressor", "pack_to_int32", "unpack_from_int32"]
-
-_LOGGER: logging.Logger = logging.getLogger(__name__)
-
-
-@Compressor.register(name=CompressionFormat.pack_quantized.value)
-class PackedQuantizationCompressor(Compressor):
-    """
-    Compresses a quantized model by packing every eight 4-bit weights into an int32
-    """
-
-    COMPRESSION_PARAM_NAMES = [
-        "weight_packed",
-        "weight_scale",
-        "weight_zero_point",
-        "weight_shape",
-    ]
-
-    def compress(
-        self,
-        model_state: Dict[str, Tensor],
-        names_to_scheme: Dict[str, QuantizationArgs],
-        **kwargs,
-    ) -> Dict[str, Tensor]:
-        """
-        Compresses a dense state dict
-
-        :param model_state: state dict of uncompressed model
-        :param names_to_scheme: quantization args for each quantized weight, needed for
-            quantize function to calculate bit depth
-        :return: compressed state dict
-        """
-        compressed_dict = {}
-        weight_suffix = ".weight"
-        _LOGGER.debug(
-            f"Compressing model with {len(model_state)} parameterized layers..."
-        )
-
-        for name, value in tqdm(model_state.items(), desc="Compressing model"):
-            if name.endswith(weight_suffix):
-                prefix = name[: -(len(weight_suffix))]
-                scale = model_state.get(merge_names(prefix, "weight_scale"), None)
-                zp = model_state.get(merge_names(prefix, "weight_zero_point"), None)
-                shape = torch.tensor(value.shape)
-                if scale is not None and zp is not None:
-                    # weight is quantized, compress it
-                    quant_args = names_to_scheme[prefix]
-                    if can_quantize(value, quant_args):
-                        # convert weight to an int if not already compressed
-                        value = quantize(
-                            x=value,
-                            scale=scale,
-                            zero_point=zp,
-                            args=quant_args,
-                            dtype=torch.int8,
-                        )
-                    value = pack_to_int32(value.cpu(), quant_args.num_bits)
-                    compressed_dict[merge_names(prefix, "weight_shape")] = shape
-                    compressed_dict[merge_names(prefix, "weight_packed")] = value
-                    continue
-
-            elif name.endswith("zero_point"):
-                if torch.all(value == 0):
-                    # all zero_points are 0, no need to include in
-                    # compressed state_dict
-                    continue
-
-            compressed_dict[name] = value.to("cpu")
-
-        return compressed_dict
-
-    def decompress(
-        self,
-        path_to_model_or_tensors: str,
-        names_to_scheme: Dict[str, QuantizationArgs],
-        device: str = "cpu",
-    ) -> Generator[Tuple[str, Tensor], None, None]:
-        """
-        Reads a compressed state dict located at path_to_model_or_tensors
-        and returns a generator for sequentially decompressing back to a
-        dense state dict
-
-        :param path_to_model_or_tensors: path to compressed safetensors model (directory with
-            one or more safetensors files) or compressed tensors file
-        :param device: optional device to load intermediate weights into
-        :return: generator of (name, decompressed weight) pairs
-        """
-        weight_mappings = get_nested_weight_mappings(
-            path_to_model_or_tensors, self.COMPRESSION_PARAM_NAMES
-        )
-        for weight_name in weight_mappings.keys():
-            weight_data = {}
-            for param_name, safe_path in weight_mappings[weight_name].items():
-                weight_data["num_bits"] = names_to_scheme.get(weight_name).num_bits
-                full_name = merge_names(weight_name, param_name)
-                with safe_open(safe_path, framework="pt", device=device) as f:
-                    weight_data[param_name] = f.get_tensor(full_name)
-
-            if "weight_scale" in weight_data:
-                zero_point = weight_data.get("weight_zero_point", None)
-                scale = weight_data["weight_scale"]
-                weight = weight_data["weight_packed"]
-                num_bits = weight_data["num_bits"]
-                original_shape = torch.Size(weight_data["weight_shape"])
-                unpacked = unpack_from_int32(weight, num_bits, original_shape)
-                decompressed = dequantize(
-                    x_q=unpacked,
-                    scale=scale,
-                    zero_point=zero_point,
-                )
-                yield merge_names(weight_name, "weight"), decompressed
-
-
-def pack_to_int32(value: torch.Tensor, num_bits: int) -> torch.Tensor:
-    """
-    Packs a tensor of quantized weights stored in int8 into int32s with padding
-
-    :param value: tensor to pack
-    :param num_bits: number of bits used to store underlying data
-    :returns: packed int32 tensor
-    """
-    if value.dtype is not torch.int8:
-        raise ValueError("Tensor must be quantized to torch.int8 before packing")
-
-    if num_bits > 8:
-        raise ValueError("Packing is only supported for less than 8 bits")
-
-    # convert to unsigned for packing
-    offset = pow(2, num_bits) // 2
-    value = (value + offset).to(torch.uint8)
-    value = value.cpu().numpy().astype(np.uint32)
-    pack_factor = 32 // num_bits
-
-    # pad input tensor and initialize packed output
-    packed_size = math.ceil(value.shape[1] / pack_factor)
-    packed = np.zeros((value.shape[0], packed_size), dtype=np.uint32)
-    padding = packed.shape[1] * pack_factor - value.shape[1]
-    value = np.pad(value, pad_width=[(0, 0), (0, padding)], constant_values=0)
-
-    # pack values
-    for i in range(pack_factor):
-        packed |= value[:, i::pack_factor] << num_bits * i
-
-    # convert back to signed and torch
-    packed = np.ascontiguousarray(packed).view(np.int32)
-    return torch.from_numpy(packed)
-
-
-def unpack_from_int32(
-    value: torch.Tensor, num_bits: int, shape: torch.Size
-) -> torch.Tensor:
-    """
-    Unpacks a tensor of packed int32 weights into individual int8s, maintaining
-    their original bit range
-
-    :param value: tensor to unpack
-    :param num_bits: number of bits to unpack each data point into
-    :param shape: shape to unpack into, used to remove padding
-    :returns: unpacked int8 tensor
-    """
-    if value.dtype is not torch.int32:
-        raise ValueError(
-            f"Expected {torch.int32} but got {value.dtype}, Aborting unpack."
-        )
-
-    if num_bits > 8:
-        raise ValueError("Unpacking is only supported for less than 8 bits")
-
-    # convert packed input to unsigned numpy
-    value = value.numpy().view(np.uint32)
-    pack_factor = 32 // num_bits
-
-    # unpack
-    mask = pow(2, num_bits) - 1
-    unpacked = np.zeros((value.shape[0], value.shape[1] * pack_factor))
-    for i in range(pack_factor):
-        unpacked[:, i::pack_factor] = (value >> (num_bits * i)) & mask
-
-    # remove padding
-    original_row_size = int(shape[1])
-    unpacked = unpacked[:, :original_row_size]
-
-    # bits are packed in unsigned format, reformat to signed
-    # update the value range from unsigned to signed
-    offset = pow(2, num_bits) // 2
-    unpacked = (unpacked.astype(np.int16) - offset).astype(np.int8)
-
-    return torch.from_numpy(unpacked)
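The packing scheme used by the removed `pack_to_int32` / `unpack_from_int32` pair can be illustrated without torch or numpy. The following is a simplified pure-Python sketch, not the library's implementation: it works on a flat list rather than a 2-D tensor, and `pack_words` / `unpack_words` are hypothetical names. It shifts signed values into the unsigned range, zero-pads to a whole number of 32-bit words, and places value `i` of each group at bit offset `num_bits * i`.

```python
from typing import List


def pack_words(values: List[int], num_bits: int = 4) -> List[int]:
    """Pack signed num_bits-wide integers into unsigned 32-bit words."""
    if not 1 <= num_bits <= 8:
        raise ValueError("num_bits must be between 1 and 8")
    offset = (1 << num_bits) // 2   # shifts the signed range to unsigned
    pack_factor = 32 // num_bits    # values stored per 32-bit word
    shifted = [v + offset for v in values]
    shifted += [0] * (-len(shifted) % pack_factor)  # zero-pad the last word
    words = []
    for start in range(0, len(shifted), pack_factor):
        word = 0
        for i, v in enumerate(shifted[start:start + pack_factor]):
            word |= v << (num_bits * i)
        words.append(word)
    return words


def unpack_words(words: List[int], count: int, num_bits: int = 4) -> List[int]:
    """Inverse of pack_words; count is the original length (drops padding)."""
    offset = (1 << num_bits) // 2
    pack_factor = 32 // num_bits
    mask = (1 << num_bits) - 1
    values = []
    for word in words:
        for i in range(pack_factor):
            values.append(((word >> (num_bits * i)) & mask) - offset)
    return values[:count]


values = [-8, -1, 0, 7, 3, -4, 2, 1, 5]  # nine signed 4-bit values
words = pack_words(values, num_bits=4)   # eight values fill one word, the ninth starts a second
assert unpack_words(words, len(values), num_bits=4) == values
```

As in the removed code, the original length (there, the stored `weight_shape`) has to be kept alongside the packed data, because the padding cannot be told apart from real values after unpacking.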
@@ -1,48 +0,0 @@
-compressed_tensors/__init__.py,sha256=SV1csvHUVCd8kHXz6UDZim1HZ_fAVG3vfk-j_4Bb6hY,789
-compressed_tensors/base.py,sha256=Mq4mfVQcJhNpha-BXzpOfpmFIdl01o09BJE7D2oQ_00,796
-compressed_tensors/version.py,sha256=FIBA21q-DEUbdp_Zie9KwkE5xE_plFRmUQoWtEVn2Kw,1585
-compressed_tensors/compressors/__init__.py,sha256=wmX4VnkUTS63xBwK5-6w8FP78bNZpcdcqvf2KOEC5E4,1133
-compressed_tensors/compressors/base.py,sha256=-rqT2h9G2iwDkwrVj0d0jxxn9h0dccJA1mqOzVEkwGM,2144
-compressed_tensors/compressors/dense.py,sha256=xcWECjcRY4INN6jC7vHx5wvUX3NmnKlxA9SVE1A6m2Q,1267
-compressed_tensors/compressors/helpers.py,sha256=k9avlkmeYj6vkOAvl-MgcixtP7ib24SCfhzZ-RusXfw,5403
-compressed_tensors/compressors/marlin_24.py,sha256=e7fGUyZbjUpA5VUMCPxqcYPGNiwoDKupHJaXWCoVKRw,9410
-compressed_tensors/compressors/model_compressor.py,sha256=b7jPE4czwP9uulIZML5qUQAvQaQzElwzUGwat7jlpgI,13352
-compressed_tensors/compressors/naive_quantized.py,sha256=6_1wuTF96-lw-UzzrsiEX_ipciKiQQJoZ8uotVwtbyQ,5569
-compressed_tensors/compressors/pack_quantized.py,sha256=tnhqvkko6fIaTywI2JNvh5lE2xXWKJ_hYShv_s6C9Vk,8506
-compressed_tensors/compressors/sparse_bitmask.py,sha256=kiDwBlFV0sJGLcIdDYxIiuF64ccgwDfqq1hWRQThYDc,8647
-compressed_tensors/config/__init__.py,sha256=ZBqWn3r6ku1qfmlHHYp0mQueY0i7Pwhr9rbQk9dDlMc,704
-compressed_tensors/config/base.py,sha256=caSZ7xZ_kgcHRMXZ5hM1i6TKbgY__CkiSjZ93imHZQ0,1562
-compressed_tensors/config/dense.py,sha256=NgSxnFCnckU9-iunxEaqiFwqgdO7YYxlWKR74jNbjks,1317
-compressed_tensors/config/sparse_bitmask.py,sha256=pZUboRNZTu6NajGOQEFExoPknak5ynVAUeiiYpS1Gt8,1308
-compressed_tensors/quantization/__init__.py,sha256=83J5bPB7PavN2TfCoW7_vEDhfYpm4TDrqYO9vdSQ5bk,760
-compressed_tensors/quantization/quant_args.py,sha256=Vc_tWSTcbZZsMJlACpLq4JEPvGx87izc8VEx-mcXjoM,5621
-compressed_tensors/quantization/quant_config.py,sha256=NpVu8YJ4Xw2pIQW_PGaNaml8kx1bUnxkvb0jBYWbKdE,9971
-compressed_tensors/quantization/quant_scheme.py,sha256=_RKOFJI0T5xJVBLX63UeYkSY4EFAecsBnqzUIVBjeU0,6014
-compressed_tensors/quantization/lifecycle/__init__.py,sha256=MXE2E7GfIfRRfhrdGy2Og3AZOz5N59B0ZGFcsD89y6c,821
-compressed_tensors/quantization/lifecycle/apply.py,sha256=sopev9kYAGyLR07ltINR1lpfjwYqx1RbMSiRxMvW6MQ,13607
-compressed_tensors/quantization/lifecycle/calibration.py,sha256=bCTOb7QLf4knQVhrWDgYzl6ka0Xyjg85JegImMD3qpw,2634
-compressed_tensors/quantization/lifecycle/compressed.py,sha256=VreB10xPwgSLQQlTu20UCrFpRS--cA7-lx5s7nrPPrg,2247
-compressed_tensors/quantization/lifecycle/forward.py,sha256=6PSXYcf-R1dOY8zsuIWnBaoyARNymYc3-qvV6-L7SlI,12397
-compressed_tensors/quantization/lifecycle/frozen.py,sha256=h1XYt89MouBTf3jTYLG_6OdFxIu5q2N8tPjsy6J4E6Y,1726
-compressed_tensors/quantization/lifecycle/helpers.py,sha256=xDkM3yVpGVnwAdg2aUOmrlDPaOksi-bavSQ5mMeOQlk,1651
-compressed_tensors/quantization/lifecycle/initialize.py,sha256=oCD8pgmHT3lW5J7zdsSN3YzEQIhTfE7M01R5Wb0wpck,5801
-compressed_tensors/quantization/observers/__init__.py,sha256=DNH31NQYrIBBcmHsMyFA6whh4pbRsLwuNa6L8AeXaGc,745
-compressed_tensors/quantization/observers/base.py,sha256=2WO7N2eyXf1r1gxVidos1bUS5o7pcrpug4gQgHIazrQ,6794
-compressed_tensors/quantization/observers/helpers.py,sha256=s_A23Qa_BLfOdHJCN5bm-qPWkhjjj_RIVrhSp1Y9Dtk,4211
-compressed_tensors/quantization/observers/memoryless.py,sha256=jH_c6K3gxf4W3VNXQ7tbnP-J_86QTrEfjBn6Kh1C-H8,2165
-compressed_tensors/quantization/observers/min_max.py,sha256=UK7zCMzxv9GGn6BflBxdajV20RiWaCY2RHcvZodCP1w,3669
-compressed_tensors/quantization/utils/__init__.py,sha256=VdtEmP0bvuND_IGQnyqUPc5lnFp-1_yD7StKSX4x80w,656
-compressed_tensors/quantization/utils/helpers.py,sha256=YjXABJQUnelof-z7qcwck6fnrFLh4uMSrOmPiqNp_RY,8591
-compressed_tensors/registry/__init__.py,sha256=FwLSNYqfIrb5JD_6OK_MT4_svvKTN_nEhpgQlQvGbjI,658
-compressed_tensors/registry/registry.py,sha256=fxjOjh2wklCvJhQxwofdy-zV8q7MkQ85SLG77nml2iA,11890
-compressed_tensors/utils/__init__.py,sha256=rvbIJlvdKYn4iX7r3KP6peCbU5uyMzgxwhsQstLoMxQ,785
-compressed_tensors/utils/helpers.py,sha256=d3yP9ViQ8R3GzMHfohxNlaokzyrRuj2PyjxWAJZmSws,3156
-compressed_tensors/utils/offload.py,sha256=BL7_cNAHTKbSta179R5R4ASk6oXuZhTJDY4D_8Lv2OE,3717
-compressed_tensors/utils/permutations_24.py,sha256=kx6fsfDHebx94zsSzhXGyCyuC9sVyah6BUUir_StT28,2530
-compressed_tensors/utils/safetensors_load.py,sha256=0MheXwx1jeY12PeISppiSIZHs6rmN2YddwPpFb9V67I,8527
-compressed_tensors/utils/semi_structured_conversions.py,sha256=g1EZHzdv-ko7ufPX430dp7wE33o6FWJXuSP4zZydCu0,13488
-compressed_tensors-0.5.0.dist-info/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
-compressed_tensors-0.5.0.dist-info/METADATA,sha256=3-76mQrjlvd_t6rAENTROg331QC-00aR31tgIerjgIs,5677
-compressed_tensors-0.5.0.dist-info/WHEEL,sha256=cpQTJ5IWu9CdaPViMhC9YzF8gZuS5-vlfoFihTBC86A,91
-compressed_tensors-0.5.0.dist-info/top_level.txt,sha256=w2i-GyPs2s1UwVxvutSvN_lM22SXC2hQFBmoMcPnV7Y,19
-compressed_tensors-0.5.0.dist-info/RECORD,,
File without changes
File without changes