ista-daslab-optimizers 1.1.5__tar.gz → 1.1.6__tar.gz

This diff compares the contents of two publicly released versions of the package as they appear in their public registries. It is provided for informational purposes only.
Files changed (39)
  1. {ista_daslab_optimizers-1.1.5/ista_daslab_optimizers.egg-info → ista_daslab_optimizers-1.1.6}/PKG-INFO +3 -4
  2. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/README.md +2 -3
  3. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/tools.py +2 -0
  4. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6/ista_daslab_optimizers.egg-info}/PKG-INFO +3 -4
  5. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/pyproject.toml +1 -1
  6. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/LICENSE +0 -0
  7. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/MANIFEST.in +0 -0
  8. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/__init__.py +0 -0
  9. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/acdc/__init__.py +0 -0
  10. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/acdc/acdc.py +0 -0
  11. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/acdc/wd_scheduler.py +0 -0
  12. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/dense_mfac/__init__.py +0 -0
  13. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/dense_mfac/dense_core_mfac.py +0 -0
  14. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/dense_mfac/dense_mfac.py +0 -0
  15. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/micro_adam/__init__.py +0 -0
  16. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/micro_adam/micro_adam.py +0 -0
  17. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/sparse_mfac/__init__.py +0 -0
  18. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/sparse_mfac/sparse_core_mfac_w_ef.py +0 -0
  19. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers/sparse_mfac/sparse_mfac.py +0 -0
  20. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers.egg-info/SOURCES.txt +0 -0
  21. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers.egg-info/dependency_links.txt +0 -0
  22. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers.egg-info/requires.txt +0 -0
  23. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/ista_daslab_optimizers.egg-info/top_level.txt +0 -0
  24. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/dense_mfac/dense_mfac.cpp +0 -0
  25. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/dense_mfac/dense_mfac_kernel.cu +0 -0
  26. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/micro_adam/micro_adam.cpp +0 -0
  27. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/micro_adam/micro_adam_asymm_block_quant.cu +0 -0
  28. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/micro_adam/micro_adam_asymm_block_quant_inv.cu +0 -0
  29. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/micro_adam/micro_adam_update.cu +0 -0
  30. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/sparse_mfac/sparse_mfac.cpp +0 -0
  31. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/sparse_mfac/sparse_mfac_LCG_kernel.cu +0 -0
  32. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/sparse_mfac/sparse_mfac_SP_kernel.cu +0 -0
  33. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/sparse_mfac_pruner/sparse_mfac_pruner.cpp +0 -0
  34. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/sparse_mfac_pruner/sparse_mfac_pruner.cu +0 -0
  35. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/tools/tools.cpp +0 -0
  36. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/tools/tools_kernel.cu +0 -0
  37. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/kernels/utils.h +0 -0
  38. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/setup.cfg +0 -0
  39. {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.6}/setup.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: ista_daslab_optimizers
- Version: 1.1.5
+ Version: 1.1.6
  Summary: Deep Learning optimizers developed in the Distributed Algorithms and Systems group (DASLab) @ Institute of Science and Technology Austria (ISTA)
  Author-email: Ionut-Vlad Modoranu <ionut-vlad.modoranu@ist.ac.at>
  Maintainer-email: Ionut-Vlad Modoranu <ionut-vlad.modoranu@ist.ac.at>
@@ -291,6 +291,8 @@ optimizer = MicroAdam(
  # Versions summary:

  ---
+ - **1.1.6** @ February 19th, 2025:
+   - do not update parameters that have a `None` gradient in the `update_model` method from `tools.py`; this is useful when using M-FAC for models with more than one classification head in the Continual Learning framework
  - **1.1.5** @ February 19th, 2025:
    - adapted `DenseMFAC` for Continual Learning models that have one feature extractor block and a list of classification heads. The issue was that the model size included the feature extractor backbone and all classification heads, while in practice only one classification head is used for training and inference. This caused size-mismatch errors at runtime in the `DenseCoreMFAC` module, because the runtime gradient had fewer entries than the entire model. When using `DenseMFAC` in such settings, set `optimizer.model_size` to the correct size after calling the constructor; the `DenseCoreMFAC` object will then be created automatically in the `step` function.
  - **1.1.3** @ September 5th, 2024:
@@ -301,12 +303,9 @@ optimizer = MicroAdam(
    the expense of another call to `Qinv` and `Q` (and implicitly quantization statistics computation).
  - ***[1.0.2]:*** added FSDP-compatible implementation by initializing the parameter states in the `update_step` method
    instead of MicroAdam constructor
-
  - **1.0.1** @ June 27th, 2024:
    - removed version in dependencies to avoid conflicts with llm-foundry
-
  - **1.0.0** @ June 20th, 2024:
    - changed minimum required Python version to 3.8+ and torch to 2.3.0+
-
  - **0.0.1** @ June 13th, 2024:
    - added initial version of the package for Python 3.9+ and torch 2.3.1+
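The 1.1.5 entry above tells users to override `optimizer.model_size` after constructing `DenseMFAC` when the wrapped model carries several classification heads but only one is trained at a time. Below is a minimal sketch of that pattern; the `MultiHeadNet` module, the `task_id` bookkeeping, and the `DenseMFAC(model.parameters(), lr=...)` call are illustrative assumptions rather than the package's documented API, so the exact constructor arguments may differ.

```python
import torch
from ista_daslab_optimizers import DenseMFAC  # import path assumed from the package layout

# Hypothetical continual-learning model: a shared backbone plus one head per task.
class MultiHeadNet(torch.nn.Module):
    def __init__(self, num_tasks: int = 3):
        super().__init__()
        self.backbone = torch.nn.Linear(128, 64)
        self.heads = torch.nn.ModuleList([torch.nn.Linear(64, 10) for _ in range(num_tasks)])

    def forward(self, x, task_id: int):
        # Only one head is active per forward pass, so only it receives gradients.
        return self.heads[task_id](self.backbone(x))

model = MultiHeadNet()
task_id = 0

# The optimizer sees every parameter, but the runtime gradient only covers the
# backbone and the active head -- the size mismatch described in the 1.1.5 note.
optimizer = DenseMFAC(model.parameters(), lr=1e-3)  # remaining hyperparameters omitted

# The prescribed fix: set model_size to the number of parameters that will actually
# be trained; DenseCoreMFAC is then created lazily inside step().
trained = list(model.backbone.parameters()) + list(model.heads[task_id].parameters())
optimizer.model_size = sum(p.numel() for p in trained)
```

With this override in place, the gradient vector assembled at `step` time matches the buffers allocated by `DenseCoreMFAC`, which is what removes the runtime size-mismatch error.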
@@ -66,6 +66,8 @@ optimizer = MicroAdam(
  # Versions summary:

  ---
+ - **1.1.6** @ February 19th, 2025:
+   - do not update parameters that have a `None` gradient in the `update_model` method from `tools.py`; this is useful when using M-FAC for models with more than one classification head in the Continual Learning framework
  - **1.1.5** @ February 19th, 2025:
    - adapted `DenseMFAC` for Continual Learning models that have one feature extractor block and a list of classification heads. The issue was that the model size included the feature extractor backbone and all classification heads, while in practice only one classification head is used for training and inference. This caused size-mismatch errors at runtime in the `DenseCoreMFAC` module, because the runtime gradient had fewer entries than the entire model. When using `DenseMFAC` in such settings, set `optimizer.model_size` to the correct size after calling the constructor; the `DenseCoreMFAC` object will then be created automatically in the `step` function.
  - **1.1.3** @ September 5th, 2024:
@@ -76,12 +78,9 @@ optimizer = MicroAdam(
    the expense of another call to `Qinv` and `Q` (and implicitly quantization statistics computation).
  - ***[1.0.2]:*** added FSDP-compatible implementation by initializing the parameter states in the `update_step` method
    instead of MicroAdam constructor
-
  - **1.0.1** @ June 27th, 2024:
    - removed version in dependencies to avoid conflicts with llm-foundry
-
  - **1.0.0** @ June 20th, 2024:
    - changed minimum required Python version to 3.8+ and torch to 2.3.0+
-
  - **0.0.1** @ June 13th, 2024:
    - added initial version of the package for Python 3.9+ and torch 2.3.1+
@@ -134,6 +134,8 @@ def update_model(params, update, weight_decay=0, alpha=None, multiply_wd_w_lr=Fa
  lr = group['lr']
  wd = group.get('weight_decay', weight_decay) # if the param groups do not have weight decay, then use the externally provided one
  for p in group['params']:
+     if p.grad is None:
+         continue
      u = update[count:(count + p.numel())].reshape(p.shape).to(p.device)
      if wd > 0:
          if multiply_wd_w_lr:
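The hunk above is the entire 1.1.6 change: `update_model` now skips parameters whose gradient is `None` (for instance the inactive classification heads of a continual-learning model), so only parameters that actually produced a gradient consume a slice of the flat `update` vector. Below is a self-contained sketch of the same pattern; `apply_flat_update` is a simplified stand-in that drops the weight-decay, `alpha`, and `multiply_wd_w_lr` handling of the real `tools.update_model`, and the offset bookkeeping shown is an assumption about how `count` advances.

```python
import torch

@torch.no_grad()
def apply_flat_update(param_groups, update, default_lr=1e-3):
    """Simplified stand-in for tools.update_model: apply a flat update vector to
    the parameters in optimizer-style param groups, skipping any parameter whose
    .grad is None (the 1.1.6 behaviour)."""
    count = 0
    for group in param_groups:
        lr = group.get('lr', default_lr)
        for p in group['params']:
            if p.grad is None:          # 1.1.6: leave gradient-less params untouched
                continue
            u = update[count:count + p.numel()].reshape(p.shape).to(p.device)
            p.add_(u, alpha=-lr)        # plain descent step on this slice
            count += p.numel()          # advance only for params that consumed a slice

# Toy usage: two parameters, only the first one carries a gradient.
a = torch.nn.Parameter(torch.zeros(4))
b = torch.nn.Parameter(torch.zeros(3))     # e.g. an inactive classification head
a.grad = torch.ones(4)
flat_update = torch.ones(4)                # covers only the gradient-bearing params
apply_flat_update([{'params': [a, b], 'lr': 0.1}], flat_update)
print(a)                                   # updated
print(b)                                   # unchanged, because b.grad is None
```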
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: ista_daslab_optimizers
- Version: 1.1.5
+ Version: 1.1.6
  Summary: Deep Learning optimizers developed in the Distributed Algorithms and Systems group (DASLab) @ Institute of Science and Technology Austria (ISTA)
  Author-email: Ionut-Vlad Modoranu <ionut-vlad.modoranu@ist.ac.at>
  Maintainer-email: Ionut-Vlad Modoranu <ionut-vlad.modoranu@ist.ac.at>
@@ -291,6 +291,8 @@ optimizer = MicroAdam(
  # Versions summary:

  ---
+ - **1.1.6** @ February 19th, 2025:
+   - do not update parameters that have a `None` gradient in the `update_model` method from `tools.py`; this is useful when using M-FAC for models with more than one classification head in the Continual Learning framework
  - **1.1.5** @ February 19th, 2025:
    - adapted `DenseMFAC` for Continual Learning models that have one feature extractor block and a list of classification heads. The issue was that the model size included the feature extractor backbone and all classification heads, while in practice only one classification head is used for training and inference. This caused size-mismatch errors at runtime in the `DenseCoreMFAC` module, because the runtime gradient had fewer entries than the entire model. When using `DenseMFAC` in such settings, set `optimizer.model_size` to the correct size after calling the constructor; the `DenseCoreMFAC` object will then be created automatically in the `step` function.
  - **1.1.3** @ September 5th, 2024:
@@ -301,12 +303,9 @@ optimizer = MicroAdam(
    the expense of another call to `Qinv` and `Q` (and implicitly quantization statistics computation).
  - ***[1.0.2]:*** added FSDP-compatible implementation by initializing the parameter states in the `update_step` method
    instead of MicroAdam constructor
-
  - **1.0.1** @ June 27th, 2024:
    - removed version in dependencies to avoid conflicts with llm-foundry
-
  - **1.0.0** @ June 20th, 2024:
    - changed minimum required Python version to 3.8+ and torch to 2.3.0+
-
  - **0.0.1** @ June 13th, 2024:
    - added initial version of the package for Python 3.9+ and torch 2.3.1+
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

  [project]
  name='ista_daslab_optimizers'
- version='1.1.5'
+ version='1.1.6'
  dependencies = [
      "torch", # >=2.3.1",
      "torchaudio", # >=2.3.1",