ista-daslab-optimizers 1.1.5.tar.gz → 1.1.7.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {ista_daslab_optimizers-1.1.5/ista_daslab_optimizers.egg-info → ista_daslab_optimizers-1.1.7}/PKG-INFO +27 -12
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/README.md +24 -10
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers/__init__.py +2 -0
- ista_daslab_optimizers-1.1.7/ista_daslab_optimizers/ista_optimizer/__init__.py +5 -0
- ista_daslab_optimizers-1.1.7/ista_daslab_optimizers/ista_optimizer/ista_optimizer.py +36 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers/tools.py +4 -1
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7/ista_daslab_optimizers.egg-info}/PKG-INFO +27 -12
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers.egg-info/SOURCES.txt +2 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers.egg-info/requires.txt +1 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/pyproject.toml +3 -1
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/LICENSE +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/MANIFEST.in +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers/acdc/__init__.py +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers/acdc/acdc.py +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers/acdc/wd_scheduler.py +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers/dense_mfac/__init__.py +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers/dense_mfac/dense_core_mfac.py +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers/dense_mfac/dense_mfac.py +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers/micro_adam/__init__.py +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers/micro_adam/micro_adam.py +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers/sparse_mfac/__init__.py +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers/sparse_mfac/sparse_core_mfac_w_ef.py +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers/sparse_mfac/sparse_mfac.py +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers.egg-info/dependency_links.txt +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers.egg-info/top_level.txt +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/dense_mfac/dense_mfac.cpp +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/dense_mfac/dense_mfac_kernel.cu +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/micro_adam/micro_adam.cpp +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/micro_adam/micro_adam_asymm_block_quant.cu +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/micro_adam/micro_adam_asymm_block_quant_inv.cu +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/micro_adam/micro_adam_update.cu +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/sparse_mfac/sparse_mfac.cpp +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/sparse_mfac/sparse_mfac_LCG_kernel.cu +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/sparse_mfac/sparse_mfac_SP_kernel.cu +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/sparse_mfac_pruner/sparse_mfac_pruner.cpp +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/sparse_mfac_pruner/sparse_mfac_pruner.cu +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/tools/tools.cpp +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/tools/tools_kernel.cu +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/utils.h +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/setup.cfg +0 -0
- {ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/setup.py +0 -0
{ista_daslab_optimizers-1.1.5/ista_daslab_optimizers.egg-info → ista_daslab_optimizers-1.1.7}/PKG-INFO RENAMED
@@ -1,6 +1,6 @@
-Metadata-Version: 2.
+Metadata-Version: 2.1
 Name: ista_daslab_optimizers
-Version: 1.1.5
+Version: 1.1.7
 Summary: Deep Learning optimizers developed in the Distributed Algorithms and Systems group (DASLab) @ Institute of Science and Technology Austria (ISTA)
 Author-email: Ionut-Vlad Modoranu <ionut-vlad.modoranu@ist.ac.at>
 Maintainer-email: Ionut-Vlad Modoranu <ionut-vlad.modoranu@ist.ac.at>
@@ -222,6 +222,7 @@ Requires-Dist: gpustat
 Requires-Dist: timm
 Requires-Dist: einops
 Requires-Dist: psutil
+Requires-Dist: fast-hadamard-transform
 
 # ISTA DAS Lab Optimization Algorithms Package
 This repository contains optimization algorithms for Deep Learning developed by
@@ -240,6 +241,9 @@ The repository contains code for the following optimizers published by DASLab @
 - **MicroAdam**:
   - paper: [MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence](https://arxiv.org/abs/2405.15593)
   - official repository: [GitHub](https://github.com/IST-DASLab/MicroAdam)
+- **Trion / DCT-AdamW**:
+  - paper: [FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models](https://arxiv.org/abs/2505.17967v3)
+  - code: [GitHub](https://github.com/IST-DASLab/ISTA-DASLab-Optimizers/tree/main/ista_daslab_optimizers/fft_low_rank)
 
 ### Installation
 To use the latest stable version of this repository, you can install via pip:
@@ -261,7 +265,8 @@ source install.sh
 
 ## How to use optimizers?
 
-In this repository we provide a minimal working example for CIFAR-10 for optimizers `acdc`,
+In this repository we provide a minimal working example for CIFAR-10 for optimizers `acdc`,
+`dense_mfac`, `sparse_mfac` and `micro_adam`:
 ```shell
 cd examples/cifar10
 OPTIMIZER=micro_adam # or any other optimizer listed above
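For orientation, the optimizers in this package follow the standard `torch.optim.Optimizer` interface (the new `ISTAOptimizer` base class later in this diff subclasses it directly), so they drop into an ordinary PyTorch training loop. The sketch below uses `torch.optim.SGD` as a stand-in, since the exact constructor arguments of MicroAdam and the other DASLab optimizers are not reproduced in this diff; the toy model and synthetic batch are purely illustrative.

```python
# Minimal training-loop sketch. torch.optim.SGD stands in for a DASLab optimizer
# (e.g. MicroAdam), whose constructor arguments are not shown in this diff.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))  # toy CIFAR-10-sized model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)

# One synthetic batch; a real run would iterate over the CIFAR-10 DataLoader
# from the examples/cifar10 script mentioned above.
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
print(loss.item())
```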
@@ -291,22 +296,32 @@ optimizer = MicroAdam(
 # Versions summary:
 
 ---
+- **1.1.7** @ October 8th, 2025:
+  - added code for `Trion & DCT-AdamW`
+- **1.1.6** @ February 19th, 2025:
+  - do not update the parameters that have `None` gradient in method `update_model` from `tools.py`.
+    This is useful when using M-FAC for models with more than one classification head in the Continual Learning framework.
 - **1.1.5** @ February 19th, 2025:
-  - adapted `DenseMFAC` for a model with multiple classification heads for Continual Learning where
+  - adapted `DenseMFAC` for a model with multiple classification heads for Continual Learning where
+    we have one feature extractor block and a list of classification heads. The issue was related to
+    the model size, which included the feature extractor backbone and all classification heads, but
+    in practice only one classification head will be used for training and inference. This caused some
+    size mismatch errors at runtime in the `DenseCoreMFAC` module because the gradient at runtime had
+    fewer entries than the entire model. When using `DenseMFAC` for such settings, set `optimizer.model_size`
+    to the correct size after calling the constructor and the `DenseCoreMFAC` object will be created
+    automatically in the `step` function.
 - **1.1.3** @ September 5th, 2024:
   - allow using `SparseCoreMFACwithEF` separately by importing it in `sparse_mfac.__init__.py`
 - **1.1.2** @ August 1st, 2024:
-  - ***[1.1.0]:*** added support to densify the final update: introduced parameter alpha that controls
-    (EF) to be integrated into the update to make it dense. Finally, the
-    the expense of another call to `Qinv` and `Q` (and
-
-
-
+  - ***[1.1.0]:*** added support to densify the final update: introduced parameter alpha that controls
+    the fraction of error feedback (EF) to be integrated into the update to make it dense. Finally, the
+    fraction alpha will be discarded from the EF at the expense of another call to `Qinv` and `Q` (and
+    implicitly quantization statistics computation).
+  - ***[1.0.2]:*** added FSDP-compatible implementation by initializing the parameter states in the
+    `update_step` method instead of MicroAdam constructor
 - **1.0.1** @ June 27th, 2024:
   - removed version in dependencies to avoid conflicts with llm-foundry
-
 - **1.0.0** @ June 20th, 2024:
   - changed minimum required Python version to 3.8+ and torch to 2.3.0+
-
 - **0.0.1** @ June 13th, 2024:
   - added initial version of the package for Python 3.9+ and torch 2.3.1+
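The 1.1.5 note above describes a manual sizing step for `DenseMFAC` in the multi-head continual-learning setting. A minimal sketch of that sizing, under an assumed backbone-plus-heads layout (the `DenseMFAC` constructor call is omitted here because its signature is not part of this diff), could look like:

```python
# Sketch of the sizing described in the 1.1.5 note. The model layout is hypothetical:
# one shared feature extractor and several task-specific classification heads.
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU())
heads = nn.ModuleList([nn.Linear(256, 10) for _ in range(5)])  # one head per task
active_task = 0

# Only the backbone plus the currently active head are trained, so the optimizer
# must be sized for these parameters, not for the full module with all heads.
trainable = list(backbone.parameters()) + list(heads[active_task].parameters())
model_size = sum(p.numel() for p in trainable)

# Per the 1.1.5 note, after constructing the optimizer one would set
#     optimizer.model_size = model_size
# and DenseCoreMFAC is then created with this size on the first step() call.
print(model_size)
```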
{ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/README.md RENAMED
@@ -15,6 +15,9 @@ The repository contains code for the following optimizers published by DASLab @
 - **MicroAdam**:
   - paper: [MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence](https://arxiv.org/abs/2405.15593)
   - official repository: [GitHub](https://github.com/IST-DASLab/MicroAdam)
+- **Trion / DCT-AdamW**:
+  - paper: [FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models](https://arxiv.org/abs/2505.17967v3)
+  - code: [GitHub](https://github.com/IST-DASLab/ISTA-DASLab-Optimizers/tree/main/ista_daslab_optimizers/fft_low_rank)
 
 ### Installation
 To use the latest stable version of this repository, you can install via pip:
@@ -36,7 +39,8 @@ source install.sh
 
 ## How to use optimizers?
 
-In this repository we provide a minimal working example for CIFAR-10 for optimizers `acdc`,
+In this repository we provide a minimal working example for CIFAR-10 for optimizers `acdc`,
+`dense_mfac`, `sparse_mfac` and `micro_adam`:
 ```shell
 cd examples/cifar10
 OPTIMIZER=micro_adam # or any other optimizer listed above
@@ -66,22 +70,32 @@ optimizer = MicroAdam(
 # Versions summary:
 
 ---
+- **1.1.7** @ October 8th, 2025:
+  - added code for `Trion & DCT-AdamW`
+- **1.1.6** @ February 19th, 2025:
+  - do not update the parameters that have `None` gradient in method `update_model` from `tools.py`.
+    This is useful when using M-FAC for models with more than one classification head in the Continual Learning framework.
 - **1.1.5** @ February 19th, 2025:
-  - adapted `DenseMFAC` for a model with multiple classification heads for Continual Learning where
+  - adapted `DenseMFAC` for a model with multiple classification heads for Continual Learning where
+    we have one feature extractor block and a list of classification heads. The issue was related to
+    the model size, which included the feature extractor backbone and all classification heads, but
+    in practice only one classification head will be used for training and inference. This caused some
+    size mismatch errors at runtime in the `DenseCoreMFAC` module because the gradient at runtime had
+    fewer entries than the entire model. When using `DenseMFAC` for such settings, set `optimizer.model_size`
+    to the correct size after calling the constructor and the `DenseCoreMFAC` object will be created
+    automatically in the `step` function.
 - **1.1.3** @ September 5th, 2024:
   - allow using `SparseCoreMFACwithEF` separately by importing it in `sparse_mfac.__init__.py`
 - **1.1.2** @ August 1st, 2024:
-  - ***[1.1.0]:*** added support to densify the final update: introduced parameter alpha that controls
-    (EF) to be integrated into the update to make it dense. Finally, the
-    the expense of another call to `Qinv` and `Q` (and
-
-
-
+  - ***[1.1.0]:*** added support to densify the final update: introduced parameter alpha that controls
+    the fraction of error feedback (EF) to be integrated into the update to make it dense. Finally, the
+    fraction alpha will be discarded from the EF at the expense of another call to `Qinv` and `Q` (and
+    implicitly quantization statistics computation).
+  - ***[1.0.2]:*** added FSDP-compatible implementation by initializing the parameter states in the
+    `update_step` method instead of MicroAdam constructor
 - **1.0.1** @ June 27th, 2024:
   - removed version in dependencies to avoid conflicts with llm-foundry
-
 - **1.0.0** @ June 20th, 2024:
   - changed minimum required Python version to 3.8+ and torch to 2.3.0+
-
 - **0.0.1** @ June 13th, 2024:
   - added initial version of the package for Python 3.9+ and torch 2.3.1+
ista_daslab_optimizers-1.1.7/ista_daslab_optimizers/ista_optimizer/ista_optimizer.py ADDED
@@ -0,0 +1,36 @@
+import torch
+
+class ISTAOptimizer(torch.optim.Optimizer):
+    def __init__(self, params, lr, weight_decay):
+        super().__init__(params, dict(lr=lr, weight_decay=weight_decay))
+        self.lr = lr
+        self.weight_decay = weight_decay
+        self.optim_steps = 0
+
+    def loop_params(self, check_grad=True):
+        for group in self.param_groups:
+            for p in group['params']:
+                if check_grad:
+                    if p.grad is None: continue
+                yield group, self.state[p], p
+
+    @torch.no_grad()
+    def init_optimizer_states(self):
+        raise NotImplementedError
+
+    @torch.no_grad()
+    def optimizer_step(self):
+        raise NotImplementedError
+
+    @torch.no_grad()
+    def step(self, closure=None):
+        self.optim_steps += 1
+
+        loss = None
+        if closure is not None:
+            with torch.enable_grad():
+                loss = closure()
+
+        self.optimizer_step()
+
+        return loss
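To illustrate how the new base class is intended to be extended, here is a minimal, hypothetical subclass (a plain SGD rule with decoupled weight decay, not an optimizer shipped by the package) that uses only the hooks shown above: `loop_params`, `optimizer_step`, and the `lr`/`weight_decay` group entries. The import uses the full module path and assumes version 1.1.7 is installed; the re-exports added in the new `__init__.py` files are not shown in this diff.

```python
import torch
# Full module path; shorter imports may exist via the new __init__.py (not shown here).
from ista_daslab_optimizers.ista_optimizer.ista_optimizer import ISTAOptimizer

class ToySGD(ISTAOptimizer):
    """Hypothetical subclass: plain SGD with decoupled weight decay."""

    @torch.no_grad()
    def init_optimizer_states(self):
        # This toy rule keeps no per-parameter state, so there is nothing to initialize.
        pass

    @torch.no_grad()
    def optimizer_step(self):
        # loop_params() skips parameters whose gradient is None (check_grad=True by default).
        for group, _state, p in self.loop_params():
            if group['weight_decay'] > 0:
                p.mul_(1.0 - group['lr'] * group['weight_decay'])  # decoupled weight decay
            p.add_(p.grad, alpha=-group['lr'])                      # gradient descent step

# Toy usage: one linear layer, one synthetic batch, one optimizer step.
model = torch.nn.Linear(4, 2)
opt = ToySGD(model.parameters(), lr=0.1, weight_decay=0.01)
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()
```

Note that, in the base class shown above, `step()` calls `optimizer_step()` but not `init_optimizer_states()`; the latter is only a hook for subclasses that need per-parameter state.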
{ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers/tools.py RENAMED
@@ -134,6 +134,8 @@ def update_model(params, update, weight_decay=0, alpha=None, multiply_wd_w_lr=Fa
         lr = group['lr']
         wd = group.get('weight_decay', weight_decay)  # if the param groups do not have weight decay, then use the externally provided one
         for p in group['params']:
+            if p.grad is None:
+                continue
             u = update[count:(count + p.numel())].reshape(p.shape).to(p.device)
             if wd > 0:
                 if multiply_wd_w_lr:
@@ -212,4 +214,5 @@ class KernelVersionsManager:
         return self.LCG_BLOCKS_THREADS[self.version_LCG][self.BLOCK_INDEX]
 
     def get_LCG_threads(self):
-        return self.LCG_BLOCKS_THREADS[self.version_LCG][self.THREAD_INDEX]
+        return self.LCG_BLOCKS_THREADS[self.version_LCG][self.THREAD_INDEX]
+
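The 1.1.6 change above guards `update_model` so that parameters without a gradient are skipped when a flat update vector is applied. A standalone sketch of that pattern (a hypothetical helper, not the package's `update_model` itself) on a toy two-head model:

```python
import torch
import torch.nn as nn

def apply_flat_update(params, update, lr):
    """Hypothetical helper mirroring the 1.1.6 pattern: walk a flat update vector
    across parameters and skip those whose gradient is None."""
    count = 0
    for p in params:
        if p.grad is None:
            # Unused parameters (e.g. inactive classification heads) consume no slice
            # of the update vector, which only covers parameters that got gradients.
            continue
        u = update[count:count + p.numel()].reshape(p.shape).to(p.device)
        with torch.no_grad():
            p.add_(u, alpha=-lr)
        count += p.numel()

# Toy setup: a shared backbone and two heads; only head_a receives gradients.
backbone, head_a, head_b = nn.Linear(4, 4), nn.Linear(4, 2), nn.Linear(4, 2)
head_a(backbone(torch.randn(3, 4))).sum().backward()  # head_b grads stay None

active = list(backbone.parameters()) + list(head_a.parameters())
flat_update = torch.cat([p.grad.reshape(-1) for p in active])
all_params = active + list(head_b.parameters())
apply_flat_update(all_params, flat_update, lr=0.1)
```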
{ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7/ista_daslab_optimizers.egg-info}/PKG-INFO RENAMED
@@ -1,6 +1,6 @@
-Metadata-Version: 2.
+Metadata-Version: 2.1
 Name: ista_daslab_optimizers
-Version: 1.1.5
+Version: 1.1.7
 Summary: Deep Learning optimizers developed in the Distributed Algorithms and Systems group (DASLab) @ Institute of Science and Technology Austria (ISTA)
 Author-email: Ionut-Vlad Modoranu <ionut-vlad.modoranu@ist.ac.at>
 Maintainer-email: Ionut-Vlad Modoranu <ionut-vlad.modoranu@ist.ac.at>
@@ -222,6 +222,7 @@ Requires-Dist: gpustat
 Requires-Dist: timm
 Requires-Dist: einops
 Requires-Dist: psutil
+Requires-Dist: fast-hadamard-transform
 
 # ISTA DAS Lab Optimization Algorithms Package
 This repository contains optimization algorithms for Deep Learning developed by
@@ -240,6 +241,9 @@ The repository contains code for the following optimizers published by DASLab @
 - **MicroAdam**:
   - paper: [MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence](https://arxiv.org/abs/2405.15593)
   - official repository: [GitHub](https://github.com/IST-DASLab/MicroAdam)
+- **Trion / DCT-AdamW**:
+  - paper: [FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models](https://arxiv.org/abs/2505.17967v3)
+  - code: [GitHub](https://github.com/IST-DASLab/ISTA-DASLab-Optimizers/tree/main/ista_daslab_optimizers/fft_low_rank)
 
 ### Installation
 To use the latest stable version of this repository, you can install via pip:
@@ -261,7 +265,8 @@ source install.sh
 
 ## How to use optimizers?
 
-In this repository we provide a minimal working example for CIFAR-10 for optimizers `acdc`,
+In this repository we provide a minimal working example for CIFAR-10 for optimizers `acdc`,
+`dense_mfac`, `sparse_mfac` and `micro_adam`:
 ```shell
 cd examples/cifar10
 OPTIMIZER=micro_adam # or any other optimizer listed above
@@ -291,22 +296,32 @@ optimizer = MicroAdam(
 # Versions summary:
 
 ---
+- **1.1.7** @ October 8th, 2025:
+  - added code for `Trion & DCT-AdamW`
+- **1.1.6** @ February 19th, 2025:
+  - do not update the parameters that have `None` gradient in method `update_model` from `tools.py`.
+    This is useful when using M-FAC for models with more than one classification head in the Continual Learning framework.
 - **1.1.5** @ February 19th, 2025:
-  - adapted `DenseMFAC` for a model with multiple classification heads for Continual Learning where
+  - adapted `DenseMFAC` for a model with multiple classification heads for Continual Learning where
+    we have one feature extractor block and a list of classification heads. The issue was related to
+    the model size, which included the feature extractor backbone and all classification heads, but
+    in practice only one classification head will be used for training and inference. This caused some
+    size mismatch errors at runtime in the `DenseCoreMFAC` module because the gradient at runtime had
+    fewer entries than the entire model. When using `DenseMFAC` for such settings, set `optimizer.model_size`
+    to the correct size after calling the constructor and the `DenseCoreMFAC` object will be created
+    automatically in the `step` function.
 - **1.1.3** @ September 5th, 2024:
   - allow using `SparseCoreMFACwithEF` separately by importing it in `sparse_mfac.__init__.py`
 - **1.1.2** @ August 1st, 2024:
-  - ***[1.1.0]:*** added support to densify the final update: introduced parameter alpha that controls
-    (EF) to be integrated into the update to make it dense. Finally, the
-    the expense of another call to `Qinv` and `Q` (and
-
-
-
+  - ***[1.1.0]:*** added support to densify the final update: introduced parameter alpha that controls
+    the fraction of error feedback (EF) to be integrated into the update to make it dense. Finally, the
+    fraction alpha will be discarded from the EF at the expense of another call to `Qinv` and `Q` (and
+    implicitly quantization statistics computation).
+  - ***[1.0.2]:*** added FSDP-compatible implementation by initializing the parameter states in the
+    `update_step` method instead of MicroAdam constructor
 - **1.0.1** @ June 27th, 2024:
   - removed version in dependencies to avoid conflicts with llm-foundry
-
 - **1.0.0** @ June 20th, 2024:
   - changed minimum required Python version to 3.8+ and torch to 2.3.0+
-
 - **0.0.1** @ June 13th, 2024:
   - added initial version of the package for Python 3.9+ and torch 2.3.1+
{ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers.egg-info/SOURCES.txt RENAMED
@@ -27,6 +27,8 @@ ista_daslab_optimizers/acdc/wd_scheduler.py
 ista_daslab_optimizers/dense_mfac/__init__.py
 ista_daslab_optimizers/dense_mfac/dense_core_mfac.py
 ista_daslab_optimizers/dense_mfac/dense_mfac.py
+ista_daslab_optimizers/ista_optimizer/__init__.py
+ista_daslab_optimizers/ista_optimizer/ista_optimizer.py
 ista_daslab_optimizers/micro_adam/__init__.py
 ista_daslab_optimizers/micro_adam/micro_adam.py
 ista_daslab_optimizers/sparse_mfac/__init__.py
{ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/pyproject.toml RENAMED
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name='ista_daslab_optimizers'
-version='1.1.5'
+version='1.1.7'
 dependencies = [
     "torch", # >=2.3.1",
     "torchaudio", # >=2.3.1",
@@ -15,6 +15,8 @@ dependencies = [
     "timm", # >=1.0.3",
     "einops", # >=0.7.0",
     "psutil", # >=5.9.8",
+    "fast-hadamard-transform",
+    # "fast-hadamard-transform @ git+https://github.com/Dao-AILab/fast-hadamard-transform.git",
 ]
 requires-python = '>= 3.8'
 authors = [
{ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/ista_daslab_optimizers/acdc/acdc.py RENAMED: file without changes
{ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/dense_mfac/dense_mfac.cpp RENAMED: file without changes
{ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/micro_adam/micro_adam.cpp RENAMED: file without changes
{ista_daslab_optimizers-1.1.5 → ista_daslab_optimizers-1.1.7}/kernels/sparse_mfac/sparse_mfac.cpp RENAMED: file without changes
All other renamed files listed above with +0 -0 are without changes.