ista-daslab-optimizers 1.1.5__tar.gz → 1.1.6__tar.gz

This diff compares the contents of two publicly released versions of the package as they appear in their public registries. It is provided for informational purposes only.
Files changed (39)
  1. {ista_daslab_optimizers-1.1.5/ista_daslab_optimizers.egg-info → ista_daslab_optimizers-1.1.6}/PKG-INFO +3 -4
  2. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/README.md +2 -3
  3. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/tools.py +2 -0
  4. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6/ista_daslab_optimizers.egg-info}/PKG-INFO +3 -4
  5. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/pyproject.toml +1 -1
  6. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/LICENSE +0 -0
  7. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/MANIFEST.in +0 -0
  8. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/__init__.py +0 -0
  9. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/acdc/__init__.py +0 -0
  10. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/acdc/acdc.py +0 -0
  11. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/acdc/wd_scheduler.py +0 -0
  12. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/dense_mfac/__init__.py +0 -0
  13. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/dense_mfac/dense_core_mfac.py +0 -0
  14. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/dense_mfac/dense_mfac.py +0 -0
  15. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/micro_adam/__init__.py +0 -0
  16. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/micro_adam/micro_adam.py +0 -0
  17. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/sparse_mfac/__init__.py +0 -0
  18. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/sparse_mfac/sparse_core_mfac_w_ef.py +0 -0
  19. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/sparse_mfac/sparse_mfac.py +0 -0
  20. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers.egg-info/SOURCES.txt +0 -0
  21. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers.egg-info/dependency_links.txt +0 -0
  22. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers.egg-info/requires.txt +0 -0
  23. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers.egg-info/top_level.txt +0 -0
  24. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/dense_mfac/dense_mfac.cpp +0 -0
  25. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/dense_mfac/dense_mfac_kernel.cu +0 -0
  26. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/micro_adam/micro_adam.cpp +0 -0
  27. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/micro_adam/micro_adam_asymm_block_quant.cu +0 -0
  28. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/micro_adam/micro_adam_asymm_block_quant_inv.cu +0 -0
  29. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/micro_adam/micro_adam_update.cu +0 -0
  30. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/sparse_mfac/sparse_mfac.cpp +0 -0
  31. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/sparse_mfac/sparse_mfac_LCG_kernel.cu +0 -0
  32. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/sparse_mfac/sparse_mfac_SP_kernel.cu +0 -0
  33. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/sparse_mfac_pruner/sparse_mfac_pruner.cpp +0 -0
  34. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/sparse_mfac_pruner/sparse_mfac_pruner.cu +0 -0
  35. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/tools/tools.cpp +0 -0
  36. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/tools/tools_kernel.cu +0 -0
  37. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/utils.h +0 -0
  38. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/setup.cfg +0 -0
  39. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/setup.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: ista_daslab_optimizers
- Version: 1.1.5
+ Version: 1.1.6
  Summary: Deep Learning optimizers developed in the Distributed Algorithms and Systems group (DASLab) @ Institute of Science and Technology Austria (ISTA)
  Author-email: Ionut-Vlad Modoranu <ionut-vlad.modoranu@ist.ac.at>
  Maintainer-email: Ionut-Vlad Modoranu <ionut-vlad.modoranu@ist.ac.at>
@@ -291,6 +291,8 @@ optimizer = MicroAdam(
  # Versions summary:

  ---
+ - **1.1.6** @ February 19th, 2025:
+   - do not update parameters that have a `None` gradient in the `update_model` method from `tools.py`; this is useful when using M-FAC for models with more than one classification head in the Continual Learning framework
  - **1.1.5** @ February 19th, 2025:
    - adapted `DenseMFAC` for Continual Learning models that have one feature extractor block and a list of classification heads. The issue was that the model size included the feature extractor backbone and all classification heads, while in practice only one classification head is used for training and inference. This caused size-mismatch errors at runtime in the `DenseCoreMFAC` module, because the runtime gradient had fewer entries than the entire model. When using `DenseMFAC` in such settings, set `optimizer.model_size` to the correct size after calling the constructor; the `DenseCoreMFAC` object will then be created automatically in the `step` function.
  - **1.1.3** @ September 5th, 2024:
@@ -301,12 +303,9 @@ optimizer = MicroAdam(
    the expense of another call to `Qinv` and `Q` (and implicitly quantization statistics computation).
  - ***[1.0.2]:*** added FSDP-compatible implementation by initializing the parameter states in the `update_step` method
    instead of MicroAdam constructor
-
  - **1.0.1** @ June 27th, 2024:
    - removed version in dependencies to avoid conflicts with llm-foundry
-
  - **1.0.0** @ June 20th, 2024:
    - changed minimum required Python version to 3.8+ and torch to 2.3.0+
-
  - **0.0.1** @ June 13th, 2024:
    - added initial version of the package for Python 3.9+ and torch 2.3.1+
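The 1.1.5 entry above tells users to override `optimizer.model_size` after constructing `DenseMFAC` when the wrapped model carries several classification heads but only one is trained at a time. Below is a minimal sketch of that pattern; the `MultiHeadNet` module, the `task_id` bookkeeping, and the `DenseMFAC(model.parameters(), lr=...)` call are illustrative assumptions rather than the package's documented API, so the exact constructor arguments may differ.

```python
import torch
from ista_daslab_optimizers import DenseMFAC  # import path assumed from the package layout

# Hypothetical continual-learning model: a shared backbone plus one head per task.
class MultiHeadNet(torch.nn.Module):
    def __init__(self, num_tasks: int = 3):
        super().__init__()
        self.backbone = torch.nn.Linear(128, 64)
        self.heads = torch.nn.ModuleList([torch.nn.Linear(64, 10) for _ in range(num_tasks)])

    def forward(self, x, task_id: int):
        # Only one head is active per forward pass, so only it receives gradients.
        return self.heads[task_id](self.backbone(x))

model = MultiHeadNet()
task_id = 0

# The optimizer sees every parameter, but the runtime gradient only covers the
# backbone and the active head -- the size mismatch described in the 1.1.5 note.
optimizer = DenseMFAC(model.parameters(), lr=1e-3)  # remaining hyperparameters omitted

# The prescribed fix: set model_size to the number of parameters that will actually
# be trained; DenseCoreMFAC is then created lazily inside step().
trained = list(model.backbone.parameters()) + list(model.heads[task_id].parameters())
optimizer.model_size = sum(p.numel() for p in trained)
```

With this override in place, the gradient vector assembled at `step` time matches the buffers allocated by `DenseCoreMFAC`, which is what removes the runtime size-mismatch error.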
@@ -66,6 +66,8 @@ optimizer = MicroAdam(
  # Versions summary:

  ---
+ - **1.1.6** @ February 19th, 2025:
+   - do not update parameters that have a `None` gradient in the `update_model` method from `tools.py`; this is useful when using M-FAC for models with more than one classification head in the Continual Learning framework
  - **1.1.5** @ February 19th, 2025:
    - adapted `DenseMFAC` for Continual Learning models that have one feature extractor block and a list of classification heads. The issue was that the model size included the feature extractor backbone and all classification heads, while in practice only one classification head is used for training and inference. This caused size-mismatch errors at runtime in the `DenseCoreMFAC` module, because the runtime gradient had fewer entries than the entire model. When using `DenseMFAC` in such settings, set `optimizer.model_size` to the correct size after calling the constructor; the `DenseCoreMFAC` object will then be created automatically in the `step` function.
  - **1.1.3** @ September 5th, 2024:
@@ -76,12 +78,9 @@ optimizer = MicroAdam(
    the expense of another call to `Qinv` and `Q` (and implicitly quantization statistics computation).
  - ***[1.0.2]:*** added FSDP-compatible implementation by initializing the parameter states in the `update_step` method
    instead of MicroAdam constructor
-
  - **1.0.1** @ June 27th, 2024:
    - removed version in dependencies to avoid conflicts with llm-foundry
-
  - **1.0.0** @ June 20th, 2024:
    - changed minimum required Python version to 3.8+ and torch to 2.3.0+
-
  - **0.0.1** @ June 13th, 2024:
    - added initial version of the package for Python 3.9+ and torch 2.3.1+
@@ -134,6 +134,8 @@ def update_model(params, update, weight_decay=0, alpha=None, multiply_wd_w_lr=Fa
  lr = group['lr']
  wd = group.get('weight_decay', weight_decay) # if the param groups do not have weight decay, then use the externally provided one
  for p in group['params']:
+     if p.grad is None:
+         continue
      u = update[count:(count + p.numel())].reshape(p.shape).to(p.device)
      if wd > 0:
          if multiply_wd_w_lr:
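The hunk above is the entire 1.1.6 change: `update_model` now skips parameters whose gradient is `None` (for instance the inactive classification heads of a continual-learning model), so only parameters that actually produced a gradient consume a slice of the flat `update` vector. Below is a self-contained sketch of the same pattern; `apply_flat_update` is a simplified stand-in that drops the weight-decay, `alpha`, and `multiply_wd_w_lr` handling of the real `tools.update_model`, and the offset bookkeeping shown is an assumption about how `count` advances.

```python
import torch

@torch.no_grad()
def apply_flat_update(param_groups, update, default_lr=1e-3):
    """Simplified stand-in for tools.update_model: apply a flat update vector to
    the parameters in optimizer-style param groups, skipping any parameter whose
    .grad is None (the 1.1.6 behaviour)."""
    count = 0
    for group in param_groups:
        lr = group.get('lr', default_lr)
        for p in group['params']:
            if p.grad is None:          # 1.1.6: leave gradient-less params untouched
                continue
            u = update[count:count + p.numel()].reshape(p.shape).to(p.device)
            p.add_(u, alpha=-lr)        # plain descent step on this slice
            count += p.numel()          # advance only for params that consumed a slice

# Toy usage: two parameters, only the first one carries a gradient.
a = torch.nn.Parameter(torch.zeros(4))
b = torch.nn.Parameter(torch.zeros(3))     # e.g. an inactive classification head
a.grad = torch.ones(4)
flat_update = torch.ones(4)                # covers only the gradient-bearing params
apply_flat_update([{'params': [a, b], 'lr': 0.1}], flat_update)
print(a)                                   # updated
print(b)                                   # unchanged, because b.grad is None
```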
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: ista_daslab_optimizers
- Version: 1.1.5
+ Version: 1.1.6
  Summary: Deep Learning optimizers developed in the Distributed Algorithms and Systems group (DASLab) @ Institute of Science and Technology Austria (ISTA)
  Author-email: Ionut-Vlad Modoranu <ionut-vlad.modoranu@ist.ac.at>
  Maintainer-email: Ionut-Vlad Modoranu <ionut-vlad.modoranu@ist.ac.at>
@@ -291,6 +291,8 @@ optimizer = MicroAdam(
  # Versions summary:

  ---
+ - **1.1.6** @ February 19th, 2025:
+   - do not update parameters that have a `None` gradient in the `update_model` method from `tools.py`; this is useful when using M-FAC for models with more than one classification head in the Continual Learning framework
  - **1.1.5** @ February 19th, 2025:
    - adapted `DenseMFAC` for Continual Learning models that have one feature extractor block and a list of classification heads. The issue was that the model size included the feature extractor backbone and all classification heads, while in practice only one classification head is used for training and inference. This caused size-mismatch errors at runtime in the `DenseCoreMFAC` module, because the runtime gradient had fewer entries than the entire model. When using `DenseMFAC` in such settings, set `optimizer.model_size` to the correct size after calling the constructor; the `DenseCoreMFAC` object will then be created automatically in the `step` function.
  - **1.1.3** @ September 5th, 2024:
@@ -301,12 +303,9 @@ optimizer = MicroAdam(
    the expense of another call to `Qinv` and `Q` (and implicitly quantization statistics computation).
  - ***[1.0.2]:*** added FSDP-compatible implementation by initializing the parameter states in the `update_step` method
    instead of MicroAdam constructor
-
  - **1.0.1** @ June 27th, 2024:
    - removed version in dependencies to avoid conflicts with llm-foundry
-
  - **1.0.0** @ June 20th, 2024:
    - changed minimum required Python version to 3.8+ and torch to 2.3.0+
-
  - **0.0.1** @ June 13th, 2024:
    - added initial version of the package for Python 3.9+ and torch 2.3.1+
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

  [project]
  name='ista_daslab_optimizers'
- version='1.1.5'
+ version='1.1.6'
  dependencies = [
      "torch", # >=2.3.1",
      "torchaudio", # >=2.3.1",