compressed-tensors 0.5.0__py3-none-any.whl → 0.7.0__py3-none-any.whl

This diff shows the changes between two publicly released versions of the package, as they appear in their public registry. It is provided for informational purposes only.
Files changed (51)
  1. compressed_tensors/__init__.py +1 -0
  2. compressed_tensors/base.py +2 -0
  3. compressed_tensors/compressors/__init__.py +6 -12
  4. compressed_tensors/compressors/base.py +137 -9
  5. compressed_tensors/compressors/helpers.py +6 -6
  6. compressed_tensors/compressors/model_compressors/__init__.py +17 -0
  7. compressed_tensors/compressors/{model_compressor.py → model_compressors/model_compressor.py} +99 -43
  8. compressed_tensors/compressors/quantized_compressors/__init__.py +18 -0
  9. compressed_tensors/compressors/{naive_quantized.py → quantized_compressors/base.py} +64 -62
  10. compressed_tensors/compressors/quantized_compressors/naive_quantized.py +140 -0
  11. compressed_tensors/compressors/quantized_compressors/pack_quantized.py +211 -0
  12. compressed_tensors/compressors/sparse_compressors/__init__.py +18 -0
  13. compressed_tensors/compressors/sparse_compressors/base.py +110 -0
  14. compressed_tensors/compressors/{dense.py → sparse_compressors/dense.py} +3 -3
  15. compressed_tensors/compressors/{sparse_bitmask.py → sparse_compressors/sparse_bitmask.py} +14 -59
  16. compressed_tensors/compressors/sparse_quantized_compressors/__init__.py +16 -0
  17. compressed_tensors/compressors/{marlin_24.py → sparse_quantized_compressors/marlin_24.py} +3 -3
  18. compressed_tensors/config/base.py +6 -1
  19. compressed_tensors/linear/__init__.py +13 -0
  20. compressed_tensors/linear/compressed_linear.py +87 -0
  21. compressed_tensors/quantization/__init__.py +1 -0
  22. compressed_tensors/quantization/cache.py +201 -0
  23. compressed_tensors/quantization/lifecycle/apply.py +63 -9
  24. compressed_tensors/quantization/lifecycle/calibration.py +7 -7
  25. compressed_tensors/quantization/lifecycle/compressed.py +3 -1
  26. compressed_tensors/quantization/lifecycle/forward.py +126 -44
  27. compressed_tensors/quantization/lifecycle/frozen.py +6 -1
  28. compressed_tensors/quantization/lifecycle/helpers.py +0 -20
  29. compressed_tensors/quantization/lifecycle/initialize.py +138 -55
  30. compressed_tensors/quantization/observers/__init__.py +1 -0
  31. compressed_tensors/quantization/observers/base.py +54 -14
  32. compressed_tensors/quantization/observers/min_max.py +8 -0
  33. compressed_tensors/quantization/observers/mse.py +162 -0
  34. compressed_tensors/quantization/quant_args.py +102 -24
  35. compressed_tensors/quantization/quant_config.py +14 -2
  36. compressed_tensors/quantization/quant_scheme.py +12 -13
  37. compressed_tensors/quantization/utils/helpers.py +44 -19
  38. compressed_tensors/utils/__init__.py +1 -0
  39. compressed_tensors/utils/helpers.py +30 -1
  40. compressed_tensors/utils/offload.py +14 -2
  41. compressed_tensors/utils/permute.py +70 -0
  42. compressed_tensors/utils/safetensors_load.py +2 -0
  43. compressed_tensors/utils/semi_structured_conversions.py +1 -0
  44. compressed_tensors/version.py +1 -1
  45. {compressed_tensors-0.5.0.dist-info → compressed_tensors-0.7.0.dist-info}/METADATA +35 -23
  46. compressed_tensors-0.7.0.dist-info/RECORD +59 -0
  47. {compressed_tensors-0.5.0.dist-info → compressed_tensors-0.7.0.dist-info}/WHEEL +1 -1
  48. compressed_tensors/compressors/pack_quantized.py +0 -219
  49. compressed_tensors-0.5.0.dist-info/RECORD +0 -48
  50. {compressed_tensors-0.5.0.dist-info → compressed_tensors-0.7.0.dist-info}/LICENSE +0 -0
  51. {compressed_tensors-0.5.0.dist-info → compressed_tensors-0.7.0.dist-info}/top_level.txt +0 -0
{compressed_tensors-0.5.0.dist-info → compressed_tensors-0.7.0.dist-info}/METADATA
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: compressed-tensors
- Version: 0.5.0
+ Version: 0.7.0
  Summary: Library for utilization of compressed safetensors of neural network models
  Home-page: https://github.com/neuralmagic/compressed-tensors
  Author: Neuralmagic, Inc.
@@ -8,44 +8,56 @@ Author-email: support@neuralmagic.com
  License: Apache 2.0
  Description-Content-Type: text/markdown
  License-File: LICENSE
- Requires-Dist: torch >=1.7.0
+ Requires-Dist: torch>=1.7.0
  Requires-Dist: transformers
- Requires-Dist: accelerate
- Requires-Dist: pydantic >=2.0
+ Requires-Dist: pydantic>=2.0
+ Provides-Extra: accelerate
+ Requires-Dist: accelerate; extra == "accelerate"
  Provides-Extra: dev
- Requires-Dist: black ==22.12.0 ; extra == 'dev'
- Requires-Dist: isort ==5.8.0 ; extra == 'dev'
- Requires-Dist: wheel >=0.36.2 ; extra == 'dev'
- Requires-Dist: flake8 >=3.8.3 ; extra == 'dev'
- Requires-Dist: pytest >=6.0.0 ; extra == 'dev'
- Requires-Dist: nbconvert >=7.16.3 ; extra == 'dev'
+ Requires-Dist: black==22.12.0; extra == "dev"
+ Requires-Dist: isort==5.8.0; extra == "dev"
+ Requires-Dist: wheel>=0.36.2; extra == "dev"
+ Requires-Dist: flake8>=3.8.3; extra == "dev"
+ Requires-Dist: pytest>=6.0.0; extra == "dev"
+ Requires-Dist: nbconvert>=7.16.3; extra == "dev"

- # compressed_tensors
+ # compressed-tensors

- This repository extends a [safetensors](https://github.com/huggingface/safetensors) format to efficiently store sparse and/or quantized tensors on disk. `compressed-tensors` format supports multiple compression types to minimize the disk space and facilitate the tensor manipulation.
+ The `compressed-tensors` library extends the [safetensors](https://github.com/huggingface/safetensors) format, providing a versatile and efficient way to store and manage compressed tensor data. This library supports various quantization and sparsity schemes, making it a unified format for handling different model optimizations like GPTQ, AWQ, SmoothQuant, INT8, FP8, SparseGPT, and more.

- ## Motivation
+ ## Why `compressed-tensors`?

- ### Reduce disk space by saving sparse tensors in a compressed format
+ As model compression becomes ever more important for the efficient deployment of LLMs, the landscape of quantization and compression techniques has become increasingly fragmented.
+ Each method often comes with its own storage format and loading procedure, making it challenging to work with multiple techniques or switch between them.
+ `compressed-tensors` addresses this by providing a single, extensible format that can represent a wide variety of compression schemes.

- The compressed format stores the data much more efficiently by taking advantage of two properties of tensors:
+ * **Unified Checkpoint Format**: Supports various compression schemes in a single, consistent format.
+ * **Wide Compatibility**: Works with popular quantization methods like GPTQ, SmoothQuant, and FP8; see [llm-compressor](https://github.com/vllm-project/llm-compressor).
+ * **Flexible Quantization Support**:
+   * Weight-only quantization (e.g., W4A16, W8A16, WnA16)
+   * Activation quantization (e.g., W8A8)
+   * KV cache quantization
+   * Non-uniform schemes (different layers can be quantized in different ways!)
+ * **Sparsity Support**: Handles both unstructured and semi-structured (e.g., 2:4) sparsity patterns.
+ * **Open-Source Integration**: Designed to work seamlessly with Hugging Face models and PyTorch.

- - Sparse tensors -> due to a large number of entries that are equal to zero.
- - Quantized -> due to their low precision representation.
-
- ### Introduce an elegant interface to save/load compressed tensors
-
- The library provides the user with the ability to compress/decompress tensors. The properties of tensors are defined by human-readable configs, allowing the users to understand the compression format at a quick glance.
+ This allows developers and researchers to easily experiment with composing different quantization methods, simplify model deployment pipelines, and reduce the overhead of supporting multiple compression formats in inference engines.

  ## Installation

- ### Pip
+ ### From [PyPI](https://pypi.org/project/compressed-tensors)

+ Stable release:
  ```bash
  pip install compressed-tensors
  ```

- ### From source
+ Nightly release:
+ ```bash
+ pip install compressed-tensors-nightly
+ ```
+
+ ### From Source

  ```bash
  git clone https://github.com/neuralmagic/compressed-tensors
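The scheme names in the rewritten README (W4A16, W8A8, and so on) are all variants of the same affine quantize/dequantize relationship implemented by the library's `quantize` and `dequantize` lifecycle functions (imported by the compressor shown further below). Here is a minimal torch-only sketch of the weight-only round trip, assuming symmetric per-tensor INT8; `fake_quantize_int8` is an illustrative helper, not part of the library's API:

```python
import torch

def fake_quantize_int8(weight: torch.Tensor) -> torch.Tensor:
    """Quantize to int8 and immediately dequantize (illustration only)."""
    q_max = 127  # symmetric int8 schemes use the range [-127, 127]
    scale = weight.abs().max() / q_max           # one scale for the whole tensor
    q = torch.clamp(torch.round(weight / scale), -q_max, q_max).to(torch.int8)
    return q.to(weight.dtype) * scale            # dequantize: w_hat = q * scale

w = torch.randn(64, 64)
w_hat = fake_quantize_int8(w)
assert (w - w_hat).abs().max() <= w.abs().max() / 254 + 1e-6  # error <= scale / 2
```

A W4A16 scheme follows the same pattern with a 4-bit integer range and per-group scales, while the activations stay in 16-bit floating point.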
compressed_tensors-0.7.0.dist-info/RECORD
@@ -0,0 +1,59 @@
+ compressed_tensors/__init__.py,sha256=UtKmifNeBCSE2TZSAfduVNNzHY-3V7bLjZ7n7RuXLOE,812
+ compressed_tensors/base.py,sha256=73HYH7HY7O2roC89yG_piPFnZwrBfn_i7HmKl90SKc0,875
+ compressed_tensors/version.py,sha256=RTYptXdV8f4QbYCRQ13eGeEsq4grNJs6EXgejoZl9EE,1585
+ compressed_tensors/compressors/__init__.py,sha256=smSygTSfcfuujRrAXDc6uZm4L_ccV1tWZewqVnOb4lM,825
+ compressed_tensors/compressors/base.py,sha256=D9TNwQcjanDiAHODPbg8JUqc66e3j50rctY7A708NEs,6743
+ compressed_tensors/compressors/helpers.py,sha256=OK6qxX9j3bHwF9JfIYSGMgBJe2PWjlTA3byXKCJaTIQ,5431
+ compressed_tensors/compressors/model_compressors/__init__.py,sha256=5RGGPFu4YqEt_aOdFSQYFYFDjcZFJN0CsMqRtDZz3Js,666
+ compressed_tensors/compressors/model_compressors/model_compressor.py,sha256=XJgPsq8KiDfiR4e8bSI38lmoOd2ApqRk1aPcXS2obqY,15600
+ compressed_tensors/compressors/quantized_compressors/__init__.py,sha256=09UJq68Pht6Bf-4iP9xYl3tetKsncNPHD8IAGbePsr4,714
+ compressed_tensors/compressors/quantized_compressors/base.py,sha256=K1KOnS6Y8nUA1-HN7VhyfsDc01nilW0WfXMUhuD-l8w,5954
+ compressed_tensors/compressors/quantized_compressors/naive_quantized.py,sha256=Mmfr-hap-4zw7CzE1mXi0UirknqGidNxw38GGWVgTqM,4916
+ compressed_tensors/compressors/quantized_compressors/pack_quantized.py,sha256=9H8UrG5v1GRtslLjOEiUM2dnyxJnR-HJmlsFezQs_r0,7706
+ compressed_tensors/compressors/sparse_compressors/__init__.py,sha256=i2TESH27l7KXeOhJ6hShIoI904XX96l-cRQiMR6MAaU,704
+ compressed_tensors/compressors/sparse_compressors/base.py,sha256=Ua4rUSGyucEs-YJI5z3oIUF-zqQLrFsQ9f-qKasEdUM,4410
+ compressed_tensors/compressors/sparse_compressors/dense.py,sha256=lSKNWRx6H7aUqaJj1j4qbXk8Gkm1UohbnvW1Rvq6Ra4,1284
+ compressed_tensors/compressors/sparse_compressors/sparse_bitmask.py,sha256=4fKwCG7ZM8mUtSnjPvubzEHl-mTnxMzwjmcs7L43WLY,6622
+ compressed_tensors/compressors/sparse_quantized_compressors/__init__.py,sha256=4f_cwcKXB1nVVMoiKgTFAc8jAPjPLElo-Df_EDm1_xw,675
+ compressed_tensors/compressors/sparse_quantized_compressors/marlin_24.py,sha256=akqE7eW8CLTslpWRxERaZ8R0TSm1lS7D1bgZXKL0xi8,9427
+ compressed_tensors/config/__init__.py,sha256=ZBqWn3r6ku1qfmlHHYp0mQueY0i7Pwhr9rbQk9dDlMc,704
+ compressed_tensors/config/base.py,sha256=BNTFKy12isY7qblwxdi_R1f00EzgrNOXLrfxqLCPT8w,1903
+ compressed_tensors/config/dense.py,sha256=NgSxnFCnckU9-iunxEaqiFwqgdO7YYxlWKR74jNbjks,1317
+ compressed_tensors/config/sparse_bitmask.py,sha256=pZUboRNZTu6NajGOQEFExoPknak5ynVAUeiiYpS1Gt8,1308
+ compressed_tensors/linear/__init__.py,sha256=fH6rjBYAxuwrTzBTlTjTgCYNyh6TCvCqajCz4Im4YrA,617
+ compressed_tensors/linear/compressed_linear.py,sha256=0jTTf6XxOAjAYs3tvFtgiNMAO4W10sSeR-pdH2M413g,3218
+ compressed_tensors/quantization/__init__.py,sha256=nWP_fsl6Nn0ksEgZPzerGiETdvF-ZfNwPnwGlRiR5pY,805
+ compressed_tensors/quantization/cache.py,sha256=vnBB5zasO_XpHomZvzUPVVbzyCz2VgebsHePm0kANzY,6831
+ compressed_tensors/quantization/quant_args.py,sha256=73KevZXHyrkMCT_3CxbYHz70fI3i-wcF8NvN0wsBPK4,8271
+ compressed_tensors/quantization/quant_config.py,sha256=NCiMvUMnnz5kTyAkDylxjtEGQnjgsIYIeNR2zyHEdTQ,10371
+ compressed_tensors/quantization/quant_scheme.py,sha256=uFgp6ECU6ZkHWkeKlAVAzZTLDbrTrzPSPrY23eJluaw,5931
+ compressed_tensors/quantization/lifecycle/__init__.py,sha256=MXE2E7GfIfRRfhrdGy2Og3AZOz5N59B0ZGFcsD89y6c,821
+ compressed_tensors/quantization/lifecycle/apply.py,sha256=czaayvpeUYyWRJhO_klffw6esptOgA9sBKL5TWQcRdw,15805
+ compressed_tensors/quantization/lifecycle/calibration.py,sha256=IuLeRkVQPrMxkMcIjr4OMFlIUMHkqjH4qAxC2KiUBGw,2673
+ compressed_tensors/quantization/lifecycle/compressed.py,sha256=Fj9n66IN0EWsOAkBHg3O0GlOQpxstqjCcs0ttzMXrJ0,2296
+ compressed_tensors/quantization/lifecycle/forward.py,sha256=eLup6QDRUUp_Ozcas7RDRLIXBWjFbxn5gWbcAIJEGlw,15715
+ compressed_tensors/quantization/lifecycle/frozen.py,sha256=NiJw7NP7pcT6idWFa8vksgiLoT8oQ975e57S4QfD2QQ,1874
+ compressed_tensors/quantization/lifecycle/helpers.py,sha256=C0mhy2vJ0fCjVeN4kFNhw8Eq1wkteBGHiZ36RVLThRY,944
+ compressed_tensors/quantization/lifecycle/initialize.py,sha256=4_YG7jKl7d2-Cy58pOkMtInFRhvYahxYchesWMPxPVM,8862
+ compressed_tensors/quantization/observers/__init__.py,sha256=4Sa7rqi5RB_S5bPO8KmncETiqDsoMBhwP37arlQym8s,764
+ compressed_tensors/quantization/observers/base.py,sha256=5ovQicWPYHjIxr6-EkQ4lgOX0PpI9g23iSzKpxjM1Zg,8420
+ compressed_tensors/quantization/observers/helpers.py,sha256=s_A23Qa_BLfOdHJCN5bm-qPWkhjjj_RIVrhSp1Y9Dtk,4211
+ compressed_tensors/quantization/observers/memoryless.py,sha256=jH_c6K3gxf4W3VNXQ7tbnP-J_86QTrEfjBn6Kh1C-H8,2165
+ compressed_tensors/quantization/observers/min_max.py,sha256=sQXqU3z-voxIDfR_9mQzwQUflZj2sASm_G8CYaXntFw,3865
+ compressed_tensors/quantization/observers/mse.py,sha256=Aeh-253Vbab1F8cYuBiGNn4OXWJ67wXQ_JVfl3mu2a8,6034
+ compressed_tensors/quantization/utils/__init__.py,sha256=VdtEmP0bvuND_IGQnyqUPc5lnFp-1_yD7StKSX4x80w,656
+ compressed_tensors/quantization/utils/helpers.py,sha256=y4LEyC2oUd876ZMdALWKGH3Ct5EgBJZV4id_NUjTGH8,9531
+ compressed_tensors/registry/__init__.py,sha256=FwLSNYqfIrb5JD_6OK_MT4_svvKTN_nEhpgQlQvGbjI,658
+ compressed_tensors/registry/registry.py,sha256=fxjOjh2wklCvJhQxwofdy-zV8q7MkQ85SLG77nml2iA,11890
+ compressed_tensors/utils/__init__.py,sha256=gS4gSU2pwcAbsKj-6YMaqhm25udFy6ISYaWBf-myRSM,808
+ compressed_tensors/utils/helpers.py,sha256=hWGIR0W7ENHwdC7wW2SQJJiCF9-xOu_u3fY2RzLyYg4,4101
+ compressed_tensors/utils/offload.py,sha256=d9q8LNe8HyF8tOjgjA7QGLD3HRysmNp0d8eBbdqBgIM,4089
+ compressed_tensors/utils/permutations_24.py,sha256=kx6fsfDHebx94zsSzhXGyCyuC9sVyah6BUUir_StT28,2530
+ compressed_tensors/utils/permute.py,sha256=V6tJLKo3Syccj-viv4F7ZKZgJeCB-hl-dK8RKI_kBwI,2355
+ compressed_tensors/utils/safetensors_load.py,sha256=m08ANVuTBxQdoa6LufDgcNJ7wCLDJolyZljB8VEybAU,8578
+ compressed_tensors/utils/semi_structured_conversions.py,sha256=XKNffPum54kPASgqKzgKvyeqWPAkair2XEQXjkp7ho8,13489
+ compressed_tensors-0.7.0.dist-info/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
+ compressed_tensors-0.7.0.dist-info/METADATA,sha256=Lgcl4rU8ifo0PY-FrurFApAkuTD9HBeJohuULjVqebs,6782
+ compressed_tensors-0.7.0.dist-info/WHEEL,sha256=eOLhNAGa2EW3wWl_TU484h7q1UNgy0JXjjoqKoxAAQc,92
+ compressed_tensors-0.7.0.dist-info/top_level.txt,sha256=w2i-GyPs2s1UwVxvutSvN_lM22SXC2hQFBmoMcPnV7Y,19
+ compressed_tensors-0.7.0.dist-info/RECORD,,
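Each RECORD entry above has the form `path,sha256=<digest>,<size>`, where the digest is an unpadded urlsafe-base64 SHA-256 of the file, per the wheel specification. A small sketch for checking one entry against an unpacked wheel (the path and expected digest for `version.py` are taken from the listing above):

```python
import base64
import hashlib

def record_digest(path: str) -> str:
    """Compute a RECORD-style digest: unpadded urlsafe-base64 SHA-256."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

# after unpacking the 0.7.0 wheel into the current directory:
# assert record_digest("compressed_tensors/version.py") == (
#     "RTYptXdV8f4QbYCRQ13eGeEsq4grNJs6EXgejoZl9EE"
# )
```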
{compressed_tensors-0.5.0.dist-info → compressed_tensors-0.7.0.dist-info}/WHEEL
@@ -1,5 +1,5 @@
  Wheel-Version: 1.0
- Generator: setuptools (70.1.0)
+ Generator: bdist_wheel (0.44.0)
  Root-Is-Purelib: true
  Tag: py3-none-any

compressed_tensors/compressors/pack_quantized.py
@@ -1,219 +0,0 @@
- # Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
- #
- # Licensed under the Apache License, Version 2.0 (the "License");
- # you may not use this file except in compliance with the License.
- # You may obtain a copy of the License at
- #
- #    http://www.apache.org/licenses/LICENSE-2.0
- #
- # Unless required by applicable law or agreed to in writing,
- # software distributed under the License is distributed on an "AS IS" BASIS,
- # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- # See the License for the specific language governing permissions and
- # limitations under the License.
-
- import logging
- import math
- from typing import Dict, Generator, Tuple
-
- import numpy as np
- import torch
- from compressed_tensors.compressors import Compressor
- from compressed_tensors.config import CompressionFormat
- from compressed_tensors.quantization import QuantizationArgs
- from compressed_tensors.quantization.lifecycle.forward import dequantize, quantize
- from compressed_tensors.quantization.utils import can_quantize
- from compressed_tensors.utils import get_nested_weight_mappings, merge_names
- from safetensors import safe_open
- from torch import Tensor
- from tqdm import tqdm
-
-
- __all__ = ["PackedQuantizationCompressor", "pack_to_int32", "unpack_from_int32"]
-
- _LOGGER: logging.Logger = logging.getLogger(__name__)
-
-
- @Compressor.register(name=CompressionFormat.pack_quantized.value)
- class PackedQuantizationCompressor(Compressor):
-     """
-     Compresses a quantized model by packing every eight 4-bit weights into an int32
-     """
-
-     COMPRESSION_PARAM_NAMES = [
-         "weight_packed",
-         "weight_scale",
-         "weight_zero_point",
-         "weight_shape",
-     ]
-
-     def compress(
-         self,
-         model_state: Dict[str, Tensor],
-         names_to_scheme: Dict[str, QuantizationArgs],
-         **kwargs,
-     ) -> Dict[str, Tensor]:
-         """
-         Compresses a dense state dict
-
-         :param model_state: state dict of uncompressed model
-         :param names_to_scheme: quantization args for each quantized weight, needed for
-             quantize function to calculate bit depth
-         :return: compressed state dict
-         """
-         compressed_dict = {}
-         weight_suffix = ".weight"
-         _LOGGER.debug(
-             f"Compressing model with {len(model_state)} parameterized layers..."
-         )
-
-         for name, value in tqdm(model_state.items(), desc="Compressing model"):
-             if name.endswith(weight_suffix):
-                 prefix = name[: -(len(weight_suffix))]
-                 scale = model_state.get(merge_names(prefix, "weight_scale"), None)
-                 zp = model_state.get(merge_names(prefix, "weight_zero_point"), None)
-                 shape = torch.tensor(value.shape)
-                 if scale is not None and zp is not None:
-                     # weight is quantized, compress it
-                     quant_args = names_to_scheme[prefix]
-                     if can_quantize(value, quant_args):
-                         # convert weight to an int if not already compressed
-                         value = quantize(
-                             x=value,
-                             scale=scale,
-                             zero_point=zp,
-                             args=quant_args,
-                             dtype=torch.int8,
-                         )
-                     value = pack_to_int32(value.cpu(), quant_args.num_bits)
-                     compressed_dict[merge_names(prefix, "weight_shape")] = shape
-                     compressed_dict[merge_names(prefix, "weight_packed")] = value
-                     continue
-
-             elif name.endswith("zero_point"):
-                 if torch.all(value == 0):
-                     # all zero_points are 0, no need to include in
-                     # compressed state_dict
-                     continue
-
-             compressed_dict[name] = value.to("cpu")
-
-         return compressed_dict
-
-     def decompress(
-         self,
-         path_to_model_or_tensors: str,
-         names_to_scheme: Dict[str, QuantizationArgs],
-         device: str = "cpu",
-     ) -> Generator[Tuple[str, Tensor], None, None]:
-         """
-         Reads a compressed state dict located at path_to_model_or_tensors
-         and returns a generator for sequentially decompressing back to a
-         dense state dict
-
-         :param path_to_model_or_tensors: path to compressed safetensors model
-             (directory with one or more safetensors files) or compressed tensors file
-         :param names_to_scheme: quantization args for each quantized weight
-         :param device: optional device to load intermediate weights into
-         :return: generator yielding the weights of the dense state dict
-         """
-         weight_mappings = get_nested_weight_mappings(
-             path_to_model_or_tensors, self.COMPRESSION_PARAM_NAMES
-         )
-         for weight_name in weight_mappings.keys():
-             weight_data = {}
-             for param_name, safe_path in weight_mappings[weight_name].items():
-                 weight_data["num_bits"] = names_to_scheme.get(weight_name).num_bits
-                 full_name = merge_names(weight_name, param_name)
-                 with safe_open(safe_path, framework="pt", device=device) as f:
-                     weight_data[param_name] = f.get_tensor(full_name)
-
-             if "weight_scale" in weight_data:
-                 zero_point = weight_data.get("weight_zero_point", None)
-                 scale = weight_data["weight_scale"]
-                 weight = weight_data["weight_packed"]
-                 num_bits = weight_data["num_bits"]
-                 original_shape = torch.Size(weight_data["weight_shape"])
-                 unpacked = unpack_from_int32(weight, num_bits, original_shape)
-                 decompressed = dequantize(
-                     x_q=unpacked,
-                     scale=scale,
-                     zero_point=zero_point,
-                 )
-                 yield merge_names(weight_name, "weight"), decompressed
-
-
- def pack_to_int32(value: torch.Tensor, num_bits: int) -> torch.Tensor:
-     """
-     Packs a tensor of quantized weights stored in int8 into int32s with padding
-
-     :param value: tensor to pack
-     :param num_bits: number of bits used to store underlying data
-     :returns: packed int32 tensor
-     """
-     if value.dtype is not torch.int8:
-         raise ValueError("Tensor must be quantized to torch.int8 before packing")
-
-     if num_bits > 8:
-         raise ValueError("Packing is only supported for 8 or fewer bits")
-
-     # convert to unsigned for packing
-     offset = pow(2, num_bits) // 2
-     value = (value + offset).to(torch.uint8)
-     value = value.cpu().numpy().astype(np.uint32)
-     pack_factor = 32 // num_bits
-
-     # pad input tensor and initialize packed output
-     packed_size = math.ceil(value.shape[1] / pack_factor)
-     packed = np.zeros((value.shape[0], packed_size), dtype=np.uint32)
-     padding = packed.shape[1] * pack_factor - value.shape[1]
-     value = np.pad(value, pad_width=[(0, 0), (0, padding)], constant_values=0)
-
-     # pack values
-     for i in range(pack_factor):
-         packed |= value[:, i::pack_factor] << num_bits * i
-
-     # convert back to signed and torch
-     packed = np.ascontiguousarray(packed).view(np.int32)
-     return torch.from_numpy(packed)
-
-
- def unpack_from_int32(
-     value: torch.Tensor, num_bits: int, shape: torch.Size
- ) -> torch.Tensor:
-     """
-     Unpacks a tensor of packed int32 weights into individual int8s, maintaining
-     their original bit range
-
-     :param value: tensor to unpack
-     :param num_bits: number of bits to unpack each data point into
-     :param shape: shape to unpack into, used to remove padding
-     :returns: unpacked int8 tensor
-     """
-     if value.dtype is not torch.int32:
-         raise ValueError(
-             f"Expected {torch.int32} but got {value.dtype}. Aborting unpack."
-         )
-
-     if num_bits > 8:
-         raise ValueError("Unpacking is only supported for 8 or fewer bits")
-
-     # convert packed input to unsigned numpy
-     value = value.numpy().view(np.uint32)
-     pack_factor = 32 // num_bits
-
-     # unpack
-     mask = pow(2, num_bits) - 1
-     unpacked = np.zeros((value.shape[0], value.shape[1] * pack_factor))
-     for i in range(pack_factor):
-         unpacked[:, i::pack_factor] = (value >> (num_bits * i)) & mask
-
-     # remove padding
-     original_row_size = int(shape[1])
-     unpacked = unpacked[:, :original_row_size]
-
-     # bits are packed in unsigned format, reformat to signed
-     # update the value range from unsigned to signed
-     offset = pow(2, num_bits) // 2
-     unpacked = (unpacked.astype(np.int16) - offset).astype(np.int8)
-
-     return torch.from_numpy(unpacked)
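This file was not dropped outright: per entry 11 in the file list, its logic moved to `compressed_tensors/compressors/quantized_compressors/pack_quantized.py`. For reference, here is a self-contained round trip through the offset/shift/mask arithmetic of `pack_to_int32` and `unpack_from_int32` above, fixed at `num_bits=4` and sized to exactly one int32 word so the padding path is skipped (variable names are mine, not the library's):

```python
import numpy as np
import torch

num_bits = 4
pack_factor = 32 // num_bits   # eight 4-bit values per int32 word
offset = 2 ** num_bits // 2    # 8: maps the signed range [-8, 7] to [0, 15]
mask = 2 ** num_bits - 1       # 0b1111

vals = torch.tensor([[-8, -1, 0, 1, 2, 3, 7, -4]], dtype=torch.int8)

# pack: shift to unsigned, then OR each value into its 4-bit slot
unsigned = (vals + offset).to(torch.uint8).numpy().astype(np.uint32)
packed = np.zeros((1, unsigned.shape[1] // pack_factor), dtype=np.uint32)
for i in range(pack_factor):
    packed |= unsigned[:, i::pack_factor] << (num_bits * i)

# unpack: mask each slot back out, then re-center to signed
unpacked = np.zeros_like(unsigned)
for i in range(pack_factor):
    unpacked[:, i::pack_factor] = (packed >> (num_bits * i)) & mask
restored = torch.from_numpy((unpacked.astype(np.int16) - offset).astype(np.int8))

assert torch.equal(restored, vals)  # the packing round trip is lossless
```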
compressed_tensors-0.5.0.dist-info/RECORD
@@ -1,48 +0,0 @@
- compressed_tensors/__init__.py,sha256=SV1csvHUVCd8kHXz6UDZim1HZ_fAVG3vfk-j_4Bb6hY,789
- compressed_tensors/base.py,sha256=Mq4mfVQcJhNpha-BXzpOfpmFIdl01o09BJE7D2oQ_00,796
- compressed_tensors/version.py,sha256=FIBA21q-DEUbdp_Zie9KwkE5xE_plFRmUQoWtEVn2Kw,1585
- compressed_tensors/compressors/__init__.py,sha256=wmX4VnkUTS63xBwK5-6w8FP78bNZpcdcqvf2KOEC5E4,1133
- compressed_tensors/compressors/base.py,sha256=-rqT2h9G2iwDkwrVj0d0jxxn9h0dccJA1mqOzVEkwGM,2144
- compressed_tensors/compressors/dense.py,sha256=xcWECjcRY4INN6jC7vHx5wvUX3NmnKlxA9SVE1A6m2Q,1267
- compressed_tensors/compressors/helpers.py,sha256=k9avlkmeYj6vkOAvl-MgcixtP7ib24SCfhzZ-RusXfw,5403
- compressed_tensors/compressors/marlin_24.py,sha256=e7fGUyZbjUpA5VUMCPxqcYPGNiwoDKupHJaXWCoVKRw,9410
- compressed_tensors/compressors/model_compressor.py,sha256=b7jPE4czwP9uulIZML5qUQAvQaQzElwzUGwat7jlpgI,13352
- compressed_tensors/compressors/naive_quantized.py,sha256=6_1wuTF96-lw-UzzrsiEX_ipciKiQQJoZ8uotVwtbyQ,5569
- compressed_tensors/compressors/pack_quantized.py,sha256=tnhqvkko6fIaTywI2JNvh5lE2xXWKJ_hYShv_s6C9Vk,8506
- compressed_tensors/compressors/sparse_bitmask.py,sha256=kiDwBlFV0sJGLcIdDYxIiuF64ccgwDfqq1hWRQThYDc,8647
- compressed_tensors/config/__init__.py,sha256=ZBqWn3r6ku1qfmlHHYp0mQueY0i7Pwhr9rbQk9dDlMc,704
- compressed_tensors/config/base.py,sha256=caSZ7xZ_kgcHRMXZ5hM1i6TKbgY__CkiSjZ93imHZQ0,1562
- compressed_tensors/config/dense.py,sha256=NgSxnFCnckU9-iunxEaqiFwqgdO7YYxlWKR74jNbjks,1317
- compressed_tensors/config/sparse_bitmask.py,sha256=pZUboRNZTu6NajGOQEFExoPknak5ynVAUeiiYpS1Gt8,1308
- compressed_tensors/quantization/__init__.py,sha256=83J5bPB7PavN2TfCoW7_vEDhfYpm4TDrqYO9vdSQ5bk,760
- compressed_tensors/quantization/quant_args.py,sha256=Vc_tWSTcbZZsMJlACpLq4JEPvGx87izc8VEx-mcXjoM,5621
- compressed_tensors/quantization/quant_config.py,sha256=NpVu8YJ4Xw2pIQW_PGaNaml8kx1bUnxkvb0jBYWbKdE,9971
- compressed_tensors/quantization/quant_scheme.py,sha256=_RKOFJI0T5xJVBLX63UeYkSY4EFAecsBnqzUIVBjeU0,6014
- compressed_tensors/quantization/lifecycle/__init__.py,sha256=MXE2E7GfIfRRfhrdGy2Og3AZOz5N59B0ZGFcsD89y6c,821
- compressed_tensors/quantization/lifecycle/apply.py,sha256=sopev9kYAGyLR07ltINR1lpfjwYqx1RbMSiRxMvW6MQ,13607
- compressed_tensors/quantization/lifecycle/calibration.py,sha256=bCTOb7QLf4knQVhrWDgYzl6ka0Xyjg85JegImMD3qpw,2634
- compressed_tensors/quantization/lifecycle/compressed.py,sha256=VreB10xPwgSLQQlTu20UCrFpRS--cA7-lx5s7nrPPrg,2247
- compressed_tensors/quantization/lifecycle/forward.py,sha256=6PSXYcf-R1dOY8zsuIWnBaoyARNymYc3-qvV6-L7SlI,12397
- compressed_tensors/quantization/lifecycle/frozen.py,sha256=h1XYt89MouBTf3jTYLG_6OdFxIu5q2N8tPjsy6J4E6Y,1726
- compressed_tensors/quantization/lifecycle/helpers.py,sha256=xDkM3yVpGVnwAdg2aUOmrlDPaOksi-bavSQ5mMeOQlk,1651
- compressed_tensors/quantization/lifecycle/initialize.py,sha256=oCD8pgmHT3lW5J7zdsSN3YzEQIhTfE7M01R5Wb0wpck,5801
- compressed_tensors/quantization/observers/__init__.py,sha256=DNH31NQYrIBBcmHsMyFA6whh4pbRsLwuNa6L8AeXaGc,745
- compressed_tensors/quantization/observers/base.py,sha256=2WO7N2eyXf1r1gxVidos1bUS5o7pcrpug4gQgHIazrQ,6794
- compressed_tensors/quantization/observers/helpers.py,sha256=s_A23Qa_BLfOdHJCN5bm-qPWkhjjj_RIVrhSp1Y9Dtk,4211
- compressed_tensors/quantization/observers/memoryless.py,sha256=jH_c6K3gxf4W3VNXQ7tbnP-J_86QTrEfjBn6Kh1C-H8,2165
- compressed_tensors/quantization/observers/min_max.py,sha256=UK7zCMzxv9GGn6BflBxdajV20RiWaCY2RHcvZodCP1w,3669
- compressed_tensors/quantization/utils/__init__.py,sha256=VdtEmP0bvuND_IGQnyqUPc5lnFp-1_yD7StKSX4x80w,656
- compressed_tensors/quantization/utils/helpers.py,sha256=YjXABJQUnelof-z7qcwck6fnrFLh4uMSrOmPiqNp_RY,8591
- compressed_tensors/registry/__init__.py,sha256=FwLSNYqfIrb5JD_6OK_MT4_svvKTN_nEhpgQlQvGbjI,658
- compressed_tensors/registry/registry.py,sha256=fxjOjh2wklCvJhQxwofdy-zV8q7MkQ85SLG77nml2iA,11890
- compressed_tensors/utils/__init__.py,sha256=rvbIJlvdKYn4iX7r3KP6peCbU5uyMzgxwhsQstLoMxQ,785
- compressed_tensors/utils/helpers.py,sha256=d3yP9ViQ8R3GzMHfohxNlaokzyrRuj2PyjxWAJZmSws,3156
- compressed_tensors/utils/offload.py,sha256=BL7_cNAHTKbSta179R5R4ASk6oXuZhTJDY4D_8Lv2OE,3717
- compressed_tensors/utils/permutations_24.py,sha256=kx6fsfDHebx94zsSzhXGyCyuC9sVyah6BUUir_StT28,2530
- compressed_tensors/utils/safetensors_load.py,sha256=0MheXwx1jeY12PeISppiSIZHs6rmN2YddwPpFb9V67I,8527
- compressed_tensors/utils/semi_structured_conversions.py,sha256=g1EZHzdv-ko7ufPX430dp7wE33o6FWJXuSP4zZydCu0,13488
- compressed_tensors-0.5.0.dist-info/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
- compressed_tensors-0.5.0.dist-info/METADATA,sha256=3-76mQrjlvd_t6rAENTROg331QC-00aR31tgIerjgIs,5677
- compressed_tensors-0.5.0.dist-info/WHEEL,sha256=cpQTJ5IWu9CdaPViMhC9YzF8gZuS5-vlfoFihTBC86A,91
- compressed_tensors-0.5.0.dist-info/top_level.txt,sha256=w2i-GyPs2s1UwVxvutSvN_lM22SXC2hQFBmoMcPnV7Y,19
- compressed_tensors-0.5.0.dist-info/RECORD,,