PyPI - neural-compressor - Versions diffs - 3.2__tar.gz → 3.3__tar.gz - Mend

neural-compressor 3.2__tar.gz → 3.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (603)

{neural_compressor-3.2 → neural_compressor-3.3}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
-Metadata-Version: 2.1
+Metadata-Version: 2.2
 Name: neural_compressor
-Version: 3.2
+Version: 3.3
 Summary: Repository of Intel® Neural Compressor
 Home-page: https://github.com/intel/neural-compressor
 Author: Intel AIPT Team
@@ -43,6 +43,18 @@ Requires-Dist: py-cpuinfo; extra == "tf"
 Requires-Dist: pydantic; extra == "tf"
 Requires-Dist: pyyaml; extra == "tf"
 Requires-Dist: tensorflow; extra == "tf"
+Dynamic: author
+Dynamic: author-email
+Dynamic: classifier
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: home-page
+Dynamic: keywords
+Dynamic: license
+Dynamic: provides-extra
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
 <div align="center">
@@ -51,7 +63,7 @@ Intel® Neural Compressor
 <h3> An open-source Python library supporting popular model compression techniques on all mainstream deep learning frameworks (TensorFlow, PyTorch, and ONNX Runtime)</h3>
 [![python](https://img.shields.io/badge/python-3.8%2B-blue)](https://github.com/intel/neural-compressor)
-[![version](https://img.shields.io/badge/release-3.2-green)](https://github.com/intel/neural-compressor/releases)
+[![version](https://img.shields.io/badge/release-3.3-green)](https://github.com/intel/neural-compressor/releases)
 [![license](https://img.shields.io/badge/license-Apache%202-blue)](https://github.com/intel/neural-compressor/blob/master/LICENSE)
 [![coverage](https://img.shields.io/badge/coverage-85%25-green)](https://github.com/intel/neural-compressor)
 [![Downloads](https://static.pepy.tech/personalized-badge/neural-compressor?period=total&units=international_system&left_color=grey&right_color=green&left_text=downloads)](https://pepy.tech/project/neural-compressor)
@@ -78,55 +90,33 @@ support AMD CPU, ARM CPU, and NVidia GPU through ONNX Runtime with limited testi
 * [2024/07] Performance optimizations and usability improvements on [client-side](./docs/source/3x/client_quant.md).
 ## Installation
+Choose the necessary framework dependencies to install based on your deploy environment.
 ### Install Framework
-#### Install torch for CPU
-```Shell
-pip install torch --index-url https://download.pytorch.org/whl/cpu
+* [Install intel_extension_for_pytorch for CPU](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/)
+* [Install intel_extension_for_pytorch for XPU](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/)
+* [Use Docker Image with torch installed for HPU](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#bare-metal-fresh-os-single-click)
+  **Note**: There is a version mapping between Intel Neural Compressor and Gaudi Software Stack, please refer to this [table](./docs/source/3x/gaudi_version_map.md) and make sure to use a matched combination.
+* [Install torch for other platform](https://pytorch.org/get-started/locally)
+* [Install TensorFlow](https://www.tensorflow.org/install)
+### Install Neural Compressor from pypi
 ```
-#### Use Docker Image with torch installed for HPU
-https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#bare-metal-fresh-os-single-click
-> **Note**:
-> There is a version mapping between Intel Neural Compressor and Gaudi Software Stack, please refer to this [table](./docs/source/3x/gaudi_version_map.md) and make sure to use a matched combination.
-#### Install torch/intel_extension_for_pytorch for Intel GPU
-https://intel.github.io/intel-extension-for-pytorch/index.html#installation
-#### Install torch for other platform
-https://pytorch.org/get-started/locally
-#### Install tensorflow
-```Shell
-pip install tensorflow
-```
-### Install from pypi
-```Shell
 # Install 2.X API + Framework extension API + PyTorch dependency
 pip install neural-compressor[pt]
 # Install 2.X API + Framework extension API + TensorFlow dependency
 pip install neural-compressor[tf]
-```
-> **Note**:
-> Further installation methods can be found under [Installation Guide](./docs/source/installation_guide.md). check out our [FAQ](./docs/source/faq.md) for more details.
+```
+**Note**: Further installation methods can be found under [Installation Guide](./docs/source/installation_guide.md). check out our [FAQ](./docs/source/faq.md) for more details.
 ## Getting Started
+After successfully installing these packages, try your first quantization program. **Following example code demonstrates FP8 Quantization**, it is supported by Intel Gaudi2 AI Accelerator.
+To try on Intel Gaudi2, docker image with Gaudi Software Stack is recommended, please refer to following script for environment setup. More details can be found in [Gaudi Guide](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#launch-docker-image-that-was-built).
-Setting up the environment:
-```bash
-pip install "neural-compressor>=2.3" "transformers>=4.34.0" torch torchvision
+Run a container with an interactive shell, [more info](https://docs.habana.ai/en/latest/Installation_Guide/Additional_Installation/Docker_Installation.html#docker-installation)
 ```
-After successfully installing these packages, try your first quantization program.
-### [FP8 Quantization](./docs/source/3x/PT_FP8Quant.md)
-Following example code demonstrates FP8 Quantization, it is supported by Intel Gaudi2 AI Accelerator.
-To try on Intel Gaudi2, docker image with Gaudi Software Stack is recommended, please refer to following script for environment setup. More details can be found in [Gaudi Guide](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#launch-docker-image-that-was-built).
-```bash
-# Run a container with an interactive shell
-docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.19.0/ubuntu24.04/habanalabs/pytorch-installer-2.5.1:latest
+docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.20.0/ubuntu24.04/habanalabs/pytorch-installer-2.6.0:latest
 ```
-Run the example:
+Run the example,
 ```python
 from neural_compressor.torch.quantization import (
     FP8Config,
@@ -148,12 +138,10 @@ model = convert(model)
 output = model(torch.randn(1, 3, 224, 224).to("hpu")).to("cpu")
 print(output.shape)
-```
-### Weight-Only Large Language Model Loading (LLMs)
-Following example code demonstrates weight-only large language model loading on Intel Gaudi2 AI Accelerator.
+```
+More [FP8 quantization doc](./docs/source/3x/PT_FP8Quant.md).
+**Following example code demonstrates weight-only large language model loading** on Intel Gaudi2 AI Accelerator.
 ```python
 from neural_compressor.torch.quantization import load
@@ -165,10 +153,7 @@ model = load(
     torch_dtype=torch.bfloat16,
 )
 ```
-**Note:**
-Intel Neural Compressor will convert the model format from auto-gptq to hpu format on the first load and save hpu_model.safetensors to the local cache directory for the next load. So it may take a while to load for the first time.
+**Note:** Intel Neural Compressor will convert the model format from auto-gptq to hpu format on the first load and save hpu_model.safetensors to the local cache directory for the next load. So it may take a while to load for the first time.
 ## Documentation
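
Reviewer note: the `Dynamic:` fields added to PKG-INFO in the first hunk arrive with the bump to `Metadata-Version: 2.2`. As a minimal sketch, the header block can be read with the standard-library email parser, since core metadata uses an RFC 822 style header layout. The sample text below is a shortened, hand-written excerpt, not the full file.

```python
from email.parser import HeaderParser

# Shortened, hand-written excerpt of a PKG-INFO header block (illustrative only).
PKG_INFO_SAMPLE = """\
Metadata-Version: 2.2
Name: neural_compressor
Version: 3.3
Dynamic: author
Dynamic: summary
"""

# HeaderParser stops at the first blank line and ignores the message body.
metadata = HeaderParser().parsestr(PKG_INFO_SAMPLE)

# A field that occurs several times is collected with get_all().
dynamic_fields = metadata.get_all("Dynamic", [])

print(metadata["Metadata-Version"], metadata["Version"], dynamic_fields)
```

Running the same parser over the 3.2 file would presumably return an empty list for `Dynamic`, which is one quick way to tell the two metadata versions apart.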

{neural_compressor-3.2 → neural_compressor-3.3}/README.md RENAMED

@@ -5,7 +5,7 @@ Intel® Neural Compressor
 <h3> An open-source Python library supporting popular model compression techniques on all mainstream deep learning frameworks (TensorFlow, PyTorch, and ONNX Runtime)</h3>
 [![python](https://img.shields.io/badge/python-3.8%2B-blue)](https://github.com/intel/neural-compressor)
-[![version](https://img.shields.io/badge/release-3.2-green)](https://github.com/intel/neural-compressor/releases)
+[![version](https://img.shields.io/badge/release-3.3-green)](https://github.com/intel/neural-compressor/releases)
 [![license](https://img.shields.io/badge/license-Apache%202-blue)](https://github.com/intel/neural-compressor/blob/master/LICENSE)
 [![coverage](https://img.shields.io/badge/coverage-85%25-green)](https://github.com/intel/neural-compressor)
 [![Downloads](https://static.pepy.tech/personalized-badge/neural-compressor?period=total&units=international_system&left_color=grey&right_color=green&left_text=downloads)](https://pepy.tech/project/neural-compressor)
@@ -32,55 +32,33 @@ support AMD CPU, ARM CPU, and NVidia GPU through ONNX Runtime with limited testi
 * [2024/07] Performance optimizations and usability improvements on [client-side](./docs/source/3x/client_quant.md).
 ## Installation
+Choose the necessary framework dependencies to install based on your deploy environment.
 ### Install Framework
-#### Install torch for CPU
-```Shell
-pip install torch --index-url https://download.pytorch.org/whl/cpu
+* [Install intel_extension_for_pytorch for CPU](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/)
+* [Install intel_extension_for_pytorch for XPU](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/)
+* [Use Docker Image with torch installed for HPU](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#bare-metal-fresh-os-single-click)
+  **Note**: There is a version mapping between Intel Neural Compressor and Gaudi Software Stack, please refer to this [table](./docs/source/3x/gaudi_version_map.md) and make sure to use a matched combination.
+* [Install torch for other platform](https://pytorch.org/get-started/locally)
+* [Install TensorFlow](https://www.tensorflow.org/install)
+### Install Neural Compressor from pypi
 ```
-#### Use Docker Image with torch installed for HPU
-https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#bare-metal-fresh-os-single-click
-> **Note**:
-> There is a version mapping between Intel Neural Compressor and Gaudi Software Stack, please refer to this [table](./docs/source/3x/gaudi_version_map.md) and make sure to use a matched combination.
-#### Install torch/intel_extension_for_pytorch for Intel GPU
-https://intel.github.io/intel-extension-for-pytorch/index.html#installation
-#### Install torch for other platform
-https://pytorch.org/get-started/locally
-#### Install tensorflow
-```Shell
-pip install tensorflow
-```
-### Install from pypi
-```Shell
 # Install 2.X API + Framework extension API + PyTorch dependency
 pip install neural-compressor[pt]
 # Install 2.X API + Framework extension API + TensorFlow dependency
 pip install neural-compressor[tf]
-```
-> **Note**:
-> Further installation methods can be found under [Installation Guide](./docs/source/installation_guide.md). check out our [FAQ](./docs/source/faq.md) for more details.
+```
+**Note**: Further installation methods can be found under [Installation Guide](./docs/source/installation_guide.md). check out our [FAQ](./docs/source/faq.md) for more details.
 ## Getting Started
+After successfully installing these packages, try your first quantization program. **Following example code demonstrates FP8 Quantization**, it is supported by Intel Gaudi2 AI Accelerator.
+To try on Intel Gaudi2, docker image with Gaudi Software Stack is recommended, please refer to following script for environment setup. More details can be found in [Gaudi Guide](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#launch-docker-image-that-was-built).
-Setting up the environment:
-```bash
-pip install "neural-compressor>=2.3" "transformers>=4.34.0" torch torchvision
+Run a container with an interactive shell, [more info](https://docs.habana.ai/en/latest/Installation_Guide/Additional_Installation/Docker_Installation.html#docker-installation)
 ```
-After successfully installing these packages, try your first quantization program.
-### [FP8 Quantization](./docs/source/3x/PT_FP8Quant.md)
-Following example code demonstrates FP8 Quantization, it is supported by Intel Gaudi2 AI Accelerator.
-To try on Intel Gaudi2, docker image with Gaudi Software Stack is recommended, please refer to following script for environment setup. More details can be found in [Gaudi Guide](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#launch-docker-image-that-was-built).
-```bash
-# Run a container with an interactive shell
-docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.19.0/ubuntu24.04/habanalabs/pytorch-installer-2.5.1:latest
+docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.20.0/ubuntu24.04/habanalabs/pytorch-installer-2.6.0:latest
 ```
-Run the example:
+Run the example,
 ```python
 from neural_compressor.torch.quantization import (
     FP8Config,
@@ -102,12 +80,10 @@ model = convert(model)
 output = model(torch.randn(1, 3, 224, 224).to("hpu")).to("cpu")
 print(output.shape)
-```
-### Weight-Only Large Language Model Loading (LLMs)
-Following example code demonstrates weight-only large language model loading on Intel Gaudi2 AI Accelerator.
+```
+More [FP8 quantization doc](./docs/source/3x/PT_FP8Quant.md).
+**Following example code demonstrates weight-only large language model loading** on Intel Gaudi2 AI Accelerator.
 ```python
 from neural_compressor.torch.quantization import load
@@ -119,10 +95,7 @@ model = load(
     torch_dtype=torch.bfloat16,
 )
 ```
-**Note:**
-Intel Neural Compressor will convert the model format from auto-gptq to hpu format on the first load and save hpu_model.safetensors to the local cache directory for the next load. So it may take a while to load for the first time.
+**Note:** Intel Neural Compressor will convert the model format from auto-gptq to hpu format on the first load and save hpu_model.safetensors to the local cache directory for the next load. So it may take a while to load for the first time.
 ## Documentation

{neural_compressor-3.2 → neural_compressor-3.3}/neural_compressor/adaptor/adaptor.py RENAMED

@@ -49,7 +49,7 @@ class Adaptor(object):
     @abstractmethod
     def quantize(self, tune_cfg, model, dataloader, q_func=None):
-        """The function is used to do calibration and quanitization in post-training quantization.
+        """The function is used to do calibration and quantization in post-training quantization.
         Args:
             tune_cfg(dict): The chosen tuning configuration.

{neural_compressor-3.2 → neural_compressor-3.3}/neural_compressor/adaptor/mxnet.py RENAMED

@@ -59,7 +59,7 @@ class MxNetAdaptor(Adaptor):
     @dump_elapsed_time("Pass quantize model")
     def quantize(self, tune_cfg, nc_model, dataloader, q_func=None):
-        """The function is used to do MXNet calibration and quanitization in post-training
+        """The function is used to do MXNet calibration and quantization in post-training
            quantization.
         Args:

{neural_compressor-3.2 → neural_compressor-3.3}/neural_compressor/adaptor/onnxrt.py RENAMED

@@ -252,7 +252,7 @@ class ONNXRUNTIMEAdaptor(Adaptor):
     @dump_elapsed_time("Pass quantize model")
     def quantize(self, tune_cfg, model, data_loader, q_func=None):
-        """The function is used to do calibration and quanitization in post-training
+        """The function is used to do calibration and quantization in post-training
            quantization.
         Args:
@@ -1853,7 +1853,7 @@ class ONNXRT_WeightOnlyAdaptor(ONNXRUNTIMEAdaptor):
     @dump_elapsed_time("Pass quantize model")
     def quantize(self, tune_cfg, model, data_loader, q_func=None):
-        """The function is used to do calibration and quanitization in post-training
+        """The function is used to do calibration and quantization in post-training
            quantization.
         Args:

{neural_compressor-3.2 → neural_compressor-3.3}/neural_compressor/adaptor/tf_utils/graph_util.py RENAMED

@@ -212,7 +212,7 @@ class GraphAnalyzer:
             return self._search_patterns(patterns)
     def _search_patterns(self, input_pattern):
-        """Search user specified patterns on internal grpah structure.
+        """Search user specified patterns on internal graph structure.
         Args:
             input_pattern (list): The element of the pattern list could be string/list/tuple.

{neural_compressor-3.2 → neural_compressor-3.3}/neural_compressor/adaptor/torch_utils/gptq.py RENAMED

@@ -718,10 +718,12 @@ class GPTQuantizer(object):
                         for n, p in sub_layer.named_parameters():
                             param_name = full_layer_name + "." + n
                             if n == "weight":
-                                set_module_tensor_to_device(self.model, param_name, self.device, Q)
+                                set_module_tensor_to_device(self.model, param_name, self.device, Q, dtype=Q.dtype)
                             else:
                                 value = load_value(self.model, param_name, model_path)
-                                set_module_tensor_to_device(self.model, param_name, self.device, value)
+                                set_module_tensor_to_device(
+                                    self.model, param_name, self.device, value, dtype=value.dtype
+                                )
                         # sub_layer.weight.data = Q
                         torch.save(sub_layer.state_dict(), LWQ_WORKSPACE + f"/{full_layer_name}.pt")
                         clean_module_weight(sub_layer)
@@ -745,6 +747,8 @@ class GPTQuantizer(object):
             for j in range(len(self.dataloader)):
                 cache_keyword_batch = self.gather_single_batch_from_dict(self.cache_key_arguments, j)
                 cache_positional_batch = self.gather_single_batch_from_list(self.cache_positional_arguments, j)
+                # breakpoint()
+                # transformer_block = transformer_block.to(getattr(torch, self.model.config.torch_dtype))
                 out = transformer_block(*cache_positional_batch, **cache_keyword_batch)
                 out = self.track_hidden_states(out)
                 outs.append(out)

{neural_compressor-3.2 → neural_compressor-3.3}/neural_compressor/adaptor/torch_utils/hawq_metric.py RENAMED

@@ -23,14 +23,11 @@ import copy
 import logging
 import numpy as np
-import torch.nn
-import torch.nn as nn
 from torch.quantization.quantize_fx import fuse_fx
 logger = logging.getLogger(__name__)
 from typing import Any, Callable, Dict, List, Optional, Set, Union
-import torch
 import tqdm

{neural_compressor-3.2 → neural_compressor-3.3}/neural_compressor/adaptor/torch_utils/layer_wise_quant/utils.py RENAMED

@@ -221,7 +221,7 @@ def load_module(model, module_name, path, device="cpu"):
     for n, p in module.named_parameters():
         param_name = module_name + "." + n
         value = load_value(model, param_name, path)
-        set_module_tensor_to_device(model, param_name, device, value)
+        set_module_tensor_to_device(model, param_name, device, value, dtype=value.dtype)
 def register_weight_hooks(model, path, device="cpu", clean_weight=True, saved_path=None):
@@ -239,7 +239,7 @@ def register_weight_hooks(model, path, device="cpu", clean_weight=True, saved_pa
                     value = state_dict[n]
                 else:
                     value = load_value(model, param_name, path)
-                set_module_tensor_to_device(model, param_name, device, value)
+                set_module_tensor_to_device(model, param_name, device, value, dtype=value.dtype)
         return hook

{neural_compressor-3.2 → neural_compressor-3.3}/neural_compressor/common/base_config.py RENAMED

@@ -18,6 +18,7 @@
 from __future__ import annotations
+import copy
 import inspect
 import json
 import os
@@ -539,6 +540,7 @@ class BaseConfig(ABC):
                 tuning_param_pair = dict(zip(tuning_param_name_lst, params_values))
                 tmp_params_dict = {**not_tuning_param_pair, **tuning_param_pair}
                 new_config = self.__class__(**tmp_params_dict)
+                new_config.local_config = copy.deepcopy(self.local_config)
                 logger.info(new_config.to_dict())
                 config_list.append(new_config)
         logger.info("Expanded the %s and got %d configs.", self.__class__.name, len(config_list))
@@ -629,9 +631,13 @@ class BaseConfig(ABC):
         """
         if not isinstance(other, type(self)):
             return False
-        return self.params_list == other.params_list and all(
+        params_equal = self.params_list == other.params_list and all(
             getattr(self, str(attr)) == getattr(other, str(attr)) for attr in self.params_list
         )
+        local_config_equal = self.local_config == other.local_config
+        global_config_equal = self.global_config == other.global_config
+        return params_equal and local_config_equal and global_config_equal
 class ComposableConfig(BaseConfig):
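
Reviewer note: the two hunks above change what "equal" means for a config and what an expanded config carries with it. The sketch below illustrates the idea with a hypothetical `ToyConfig` class. It is not the library's `BaseConfig`, and the `set_local` and `expand` helpers are simplified stand-ins invented for this example.

```python
import copy


class ToyConfig:
    """Hypothetical stand-in for a tunable config (not the library's BaseConfig)."""

    params_list = ["bits", "group_size"]

    def __init__(self, bits=4, group_size=32):
        self.bits = bits
        self.group_size = group_size
        self.local_config = {}  # per-operator overrides, keyed by operator name
        self.global_config = None

    def set_local(self, op_name, config):
        self.local_config[op_name] = config
        return self

    def __eq__(self, other):
        if not isinstance(other, type(self)):
            return False
        params_equal = all(getattr(self, a) == getattr(other, a) for a in self.params_list)
        # Two configs with identical top-level params but different
        # per-operator overrides must not compare equal.
        return (
            params_equal
            and self.local_config == other.local_config
            and self.global_config == other.global_config
        )

    def expand(self, bits_candidates):
        """Build one config per candidate, each with its own copy of the overrides."""
        configs = []
        for bits in bits_candidates:
            new_config = type(self)(bits=bits, group_size=self.group_size)
            new_config.local_config = copy.deepcopy(self.local_config)
            configs.append(new_config)
        return configs


base = ToyConfig(bits=4).set_local("lm_head", ToyConfig(bits=8))
plain = ToyConfig(bits=4)
expanded = base.expand([4, 8])
```

With a params-only comparison, `base` and `plain` would compare equal even though `base` overrides `lm_head`. Copying `local_config` during expansion keeps that override in every candidate, and the deep copy means later edits to one candidate do not leak into the others.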

{neural_compressor-3.2 → neural_compressor-3.3}/neural_compressor/common/version.py RENAMED

@@ -15,4 +15,4 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """Intel® Neural Compressor: An open-source Python library supporting popular model compression techniques."""
-__version__ = "3.2"
+__version__ = "3.3"

{neural_compressor-3.2 → neural_compressor-3.3}/neural_compressor/data/datasets/dummy_dataset_v2.py RENAMED

@@ -236,7 +236,7 @@ class SparseDummyDataset(IterableDataset):  # pragma: no cover
                 self.label_shape = len(self.dense_shape) * self.label_shape
             assert len(self.label_shape) == len(
                 self.dense_shape
-            ), "length of dense_shape should be euqal to length of label_shape"
+            ), "length of dense_shape should be equal to length of label_shape"
             self.label_dim = len(self.label_shape)
         self.input_dim = 1 if isinstance(dense_shape, tuple) else len(dense_shape)

{neural_compressor-3.2 → neural_compressor-3.3}/neural_compressor/evaluation/lm_eval/accuracy.py RENAMED

@@ -39,7 +39,6 @@ from typing import Union
 import lm_eval
 import numpy as np
 from lm_eval import evaluator, utils
-from lm_eval.loggers import WandbLogger
 from lm_eval.tasks import TaskManager
 from lm_eval.utils import make_table, simple_parse_args_string
@@ -67,6 +66,17 @@ def _handle_non_serializable(o):
 def cli_evaluate(args) -> None:
     if args.wandb_args:
+        try:
+            # For 0.4.3 and above
+            from lm_eval.loggers import WandbLogger
+        except ImportError:
+            try:
+                # For 0.4.2
+                from lm_eval.logging_utils import WandbLogger
+            except ImportError:
+                raise ImportError("Import of WandbLogger failed. Please install wandb to use this feature.")
+        except Exception as e:
+            raise RuntimeError(f"An unexpected error occurred: {e}")
         wandb_logger = WandbLogger(**simple_parse_args_string(args.wandb_args))
     eval_logger = utils.eval_logger
@@ -200,6 +210,7 @@ def cli_evaluate(args) -> None:
         )
     lm.pad_to_buckets = args.pad_to_buckets
     lm.buckets = args.buckets
+    lm.add_bos_token = args.add_bos_token
     results = evaluator.simple_evaluate(
         model=lm,
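
Reviewer note: the `WandbLogger` hunk above moves a top-level import into a nested try/except that probes two module paths. The same pattern can be written as a small loop. The helper below is a hypothetical sketch, not part of the package, and the usage line exercises it with a standard-library module so it runs without `lm_eval` installed.

```python
import importlib


def import_first_available(module_names, attr_name):
    """Return `attr_name` from the first module in `module_names` that provides it."""
    errors = []
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{module_name}: {exc}")
    raise ImportError(f"Could not import {attr_name!r}; tried " + "; ".join(errors))


# Illustration only: the first candidate does not exist, the second one does.
loads = import_first_available(["no_such_module_xyz", "json"], "loads")
print(loads('{"a": 1}'))
```

Applied to this hunk, the candidate list would be `["lm_eval.loggers", "lm_eval.logging_utils"]` with `"WandbLogger"` as the attribute, in the order given by the diff's own version comments.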

{neural_compressor-3.2 → neural_compressor-3.3}/neural_compressor/evaluation/lm_eval/utils.py RENAMED

@@ -20,6 +20,8 @@ try:
 except:
     _hpex_available = False
+from neural_compressor.common import logger
 class LMEvalParser:
     def __init__(
@@ -50,6 +52,7 @@ class LMEvalParser:
         trust_remote_code=False,
         pad_to_buckets=None,  # used by HPU to align input length for performance.
         buckets=[32, 64, 128, 256, 512, 1024, 2048, 4096],  # used by HPU to limit input length range.
+        add_bos_token=False,
     ):
         self.model = model
         self.tasks = tasks
@@ -83,3 +86,17 @@ class LMEvalParser:
         else:
             self.pad_to_buckets = pad_to_buckets
         self.buckets = buckets
+        self.add_bos_token = add_bos_token
+        self._post_init()
+    def _check_add_bos_token(self):
+        if not self.add_bos_token:
+            logger.warning(
+                (
+                    "`add_bos_token` is set to False. "
+                    "If the model was trained or fine-tuned with a BOS token, this may lead to incorrect results."
+                )
+            )
+    def _post_init(self):
+        self._check_add_bos_token()
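
Reviewer note: the new warning concerns whether a beginning-of-sequence (BOS) token is prepended before evaluation. The toy encoder below is a hypothetical sketch of that difference. The vocabulary, the `BOS_ID` value, and `toy_encode` are invented for illustration and do not reflect how the package or any real tokenizer is implemented.

```python
BOS_ID = 1  # hypothetical beginning-of-sequence token id


def toy_encode(text, vocab, add_bos_token=False):
    """Map whitespace-separated words to ids, optionally prepending BOS_ID."""
    ids = [vocab[word] for word in text.split()]
    return [BOS_ID] + ids if add_bos_token else ids


vocab = {"hello": 5, "world": 6}
without_bos = toy_encode("hello world", vocab)
with_bos = toy_encode("hello world", vocab, add_bos_token=True)

print(without_bos)
print(with_bos)
```

A model that always saw the leading BOS id during training receives a different input sequence when it is omitted, which is the mismatch the warning message describes.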

{neural_compressor-3.2 → neural_compressor-3.3}/neural_compressor/tensorflow/quantization/utils/graph_util.py RENAMED

@@ -212,7 +212,7 @@ class GraphAnalyzer:
             return self._search_patterns(patterns)
     def _search_patterns(self, input_pattern):
-        """Search user specified patterns on internal grpah structure.
+        """Search user specified patterns on internal graph structure.
         Args:
             input_pattern (list): The element of the pattern list could be string/list/tuple.

{neural_compressor-3.2 → neural_compressor-3.3}/neural_compressor/torch/algorithms/fp8_quant/_core/common.py RENAMED

@@ -13,39 +13,24 @@
 # limitations under the License.
 import functools
-import importlib.util
 import json
 import os
 import numpy as np
 import torch
+from enum import Enum, auto
-from .._quant_common.helper_modules import *
 from .._quant_common.quant_config import get_hqt_config
 from ..utils.logger import logger
-from neural_compressor.torch.algorithms.fp8_quant.model_configs import (
-    ModuleInfo,
-    ModuleConfig,
-    ModuleType,
-    ModuleExtraConfig,
-    get_patched_module_table,
-    get_patched_module_type_table,
-)
-from neural_compressor.torch.utils.auto_accelerator import auto_detect_accelerator
-deepspeed_exists = False
-if importlib.util.find_spec("deepspeed"):  # check if deepspeed is installed
-    deepspeed_exists = True
+from neural_compressor.torch.algorithms.fp8_quant.model_configs import ModuleConfig
 UNMEASURED_MODELS = "UnmeasuredModels"
-_mod_types = {
-    "linear": ModuleType(1, ["weight"], 1, False),
-    "matmul": ModuleType(2, [], 1, False),
-    "kv_cache": ModuleType(1, [], 1, False),
-    "softmax": ModuleType(1, [], 1, True),
-    "fused_sdpa": ModuleType(3, [], 2, True),
-}
+class QuantTensorType(Enum):
+    MEASUREMENTS = auto()
+    CONST = auto()
+    DYNAMIC = auto()
 class ShapeList:
@@ -196,73 +181,6 @@ format_functions = {
 format_functions_rec = lambda k: functools.partial(rec_fn, fn=format_functions[k])
-_mod_default_dict = {
-    "Matmul": ModuleInfo("matmul", PatchedMatmul),
-    "Linear": ModuleInfo("linear", PatchedLinear),
-    "RowParallelLinear": ModuleInfo("linear", PatchedRowParallelLinear),
-    "ColumnParallelLinear": ModuleInfo("linear", PatchedColumnParallelLinear),
-    "MergedColumnParallelLinear": ModuleInfo("linear", PatchedColumnParallelLinear),
-    "QKVParallelLinear": ModuleInfo("linear", PatchedColumnParallelLinear),
-    "FalconLinear": ModuleInfo("linear", PatchedLinear),
-    "KVCache": ModuleInfo("kv_cache", PatchedKVCache),
-    "VLLMKVCache": ModuleInfo("kv_cache", PatchedVLLMKVCache),
-    "Conv2d": ModuleInfo("linear", PatchedConv2d),
-    "LoRACompatibleLinear": ModuleInfo("linear", PatchedLoRACompatibleLinear),
-    "LoRACompatibleConv": ModuleInfo("linear", PatchedLoRACompatibleConv),
-    "Softmax": ModuleInfo("softmax", PatchedSoftmax),
-    "ModuleFusedSDPA": ModuleInfo("fused_sdpa", PatchedModuleFusedSDPA),
-    "MoeMatmul": ModuleInfo("linear", PatchedMoeMatmul),
-    "ReplicatedLinear": ModuleInfo("linear", PatchedReplicatedLinear),
-    "FusedMoE": ModuleInfo("linear", PatchedMixtralMoE, False),
-}
-if deepspeed_exists:
-    _mod_default_dict.update(
-        {
-            "LinearLayer": ModuleInfo("linear", PatchedLinear),
-            "LinearAllreduce": ModuleInfo("linear", PatchedLinearAllReduce),
-            "ScopedLinearAllReduce": ModuleInfo("linear", PatchedLinearAllReduce),
-            "LmHeadLinearAllreduce": ModuleInfo("linear", PatchedLmHeadLinearAllreduce),
-        }
-    )
-@functools.lru_cache(maxsize=None)
-def _import_hpu_modules():
-    from neural_compressor.torch.algorithms.fp8_quant.patched_module_base import (
-        PATCHED_MODULE_TABLE, PATCHED_MODULE_TYPES_TABLE
-    )
-    cur_accelerator = auto_detect_accelerator()
-    if not cur_accelerator.current_device_name().startswith("hpu"):
-        return
-    PATCHED_MODULE_TABLE["hpu"].update(_mod_default_dict)
-    PATCHED_MODULE_TYPES_TABLE["hpu"].update(_mod_types)
-_import_hpu_modules()
-mod_default_dict = get_patched_module_table()
-mod_types = get_patched_module_type_table()
-def get_white_list():
-    return list(mod_default_dict.keys())
-class ModInstInfo:
-    def __init__(self, name, parent):
-        self.name = name
-        self.parent = parent
-parent_child_mod_dict = {}
-def generate_model_info(model):
-    def create_mod_info_recursion(parent):
-        for name, mod in parent.named_children():
-            parent_child_mod_dict[mod] = ModInstInfo(name, parent)
-            create_mod_info_recursion(mod)
-    create_mod_info_recursion(model)
 def get_device_type_for_scales(mod):
     config = get_hqt_config(mod).cfg
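
Reviewer note: this file's first hunk adds a `QuantTensorType` enum built with `enum.auto()`. The diff does not show where it is consumed, so the sketch below only demonstrates how such an enum behaves. The `describe` function and its wording are assumptions made for illustration.

```python
from enum import Enum, auto


class QuantTensorType(Enum):
    MEASUREMENTS = auto()
    CONST = auto()
    DYNAMIC = auto()


def describe(tensor_type):
    """Hypothetical dispatch on the enum member; the descriptions are illustrative."""
    descriptions = {
        QuantTensorType.MEASUREMENTS: "derived from measurements",
        QuantTensorType.CONST: "fixed constant",
        QuantTensorType.DYNAMIC: "computed at run time",
    }
    return descriptions[tensor_type]


# Members can be looked up by name and are singletons, so `is` comparisons are safe.
member = QuantTensorType["CONST"]
print(member, member.value, describe(member))
```

Using `auto()` avoids hand-numbering the members: with the default `Enum` behavior they receive 1, 2, 3 in definition order.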
