adv-optm 1.2.2.tar.gz → 2.1.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {adv_optm-1.2.2 → adv_optm-2.1.0}/PKG-INFO +39 -13
- {adv_optm-1.2.2 → adv_optm-2.1.0}/README.md +38 -12
- {adv_optm-1.2.2 → adv_optm-2.1.0}/adv_optm/__init__.py +3 -1
- adv_optm-2.1.0/adv_optm/optim/AdaMuon_adv.py +544 -0
- {adv_optm-1.2.2 → adv_optm-2.1.0}/adv_optm/optim/AdamW_adv.py +132 -121
- {adv_optm-1.2.2 → adv_optm-2.1.0}/adv_optm/optim/Adopt_adv.py +151 -152
- {adv_optm-1.2.2 → adv_optm-2.1.0}/adv_optm/optim/Lion_Prodigy_adv.py +143 -102
- {adv_optm-1.2.2 → adv_optm-2.1.0}/adv_optm/optim/Lion_adv.py +110 -71
- adv_optm-2.1.0/adv_optm/optim/Muon_adv.py +482 -0
- {adv_optm-1.2.2 → adv_optm-2.1.0}/adv_optm/optim/Prodigy_adv.py +172 -156
- adv_optm-2.1.0/adv_optm/optim/SignSGD_adv.py +245 -0
- {adv_optm-1.2.2 → adv_optm-2.1.0}/adv_optm/optim/Simplified_AdEMAMix.py +85 -64
- {adv_optm-1.2.2 → adv_optm-2.1.0}/adv_optm/optim/__init__.py +3 -1
- {adv_optm-1.2.2 → adv_optm-2.1.0}/adv_optm/util/Kourkoutas.py +72 -41
- adv_optm-2.1.0/adv_optm/util/Muon_AuxAdam.py +163 -0
- adv_optm-2.1.0/adv_optm/util/Muon_util.py +322 -0
- {adv_optm-1.2.2 → adv_optm-2.1.0}/adv_optm/util/OrthoGrad.py +9 -4
- adv_optm-2.1.0/adv_optm/util/__init__.py +0 -0
- adv_optm-2.1.0/adv_optm/util/factorization_util.py +105 -0
- adv_optm-2.1.0/adv_optm/util/lion_k.py +53 -0
- adv_optm-2.1.0/adv_optm/util/param_update.py +164 -0
- adv_optm-2.1.0/adv_optm/util/update_util.py +24 -0
- {adv_optm-1.2.2 → adv_optm-2.1.0}/adv_optm.egg-info/PKG-INFO +39 -13
- {adv_optm-1.2.2 → adv_optm-2.1.0}/adv_optm.egg-info/SOURCES.txt +8 -6
- {adv_optm-1.2.2 → adv_optm-2.1.0}/setup.py +1 -1
- adv_optm-1.2.2/adv_optm/optim/AdaMuon_adv.py +0 -729
- adv_optm-1.2.2/adv_optm/optim/Muon_adv.py +0 -730
- adv_optm-1.2.2/adv_optm/util/BF16_Stochastic_Rounding.py +0 -65
- adv_optm-1.2.2/adv_optm/util/Effective_Shape.py +0 -8
- adv_optm-1.2.2/adv_optm/util/NNMF.py +0 -18
- adv_optm-1.2.2/adv_optm/util/Newton_Schulz.py +0 -87
- adv_optm-1.2.2/adv_optm/util/One_Bit_Boolean.py +0 -22
- adv_optm-1.2.2/adv_optm/util/__init__.py +0 -13
- {adv_optm-1.2.2 → adv_optm-2.1.0}/LICENSE +0 -0
- {adv_optm-1.2.2 → adv_optm-2.1.0}/adv_optm.egg-info/dependency_links.txt +0 -0
- {adv_optm-1.2.2 → adv_optm-2.1.0}/adv_optm.egg-info/requires.txt +0 -0
- {adv_optm-1.2.2 → adv_optm-2.1.0}/adv_optm.egg-info/top_level.txt +0 -0
- {adv_optm-1.2.2 → adv_optm-2.1.0}/setup.cfg +0 -0
{adv_optm-1.2.2 → adv_optm-2.1.0}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: adv_optm
-Version: 1.2.2
+Version: 2.1.0
 Summary: A family of highly efficient, lightweight yet powerful optimizers.
 Home-page: https://github.com/Koratahiu/Advanced_Optimizers
 Author: Koratahiu
@@ -35,6 +35,32 @@ A comprehensive, all-in-one collection of optimization algorithms for deep learn
 
 [](https://pypi.org/project/adv_optm/)
 
+## 🔥 What's New
+
+### in 2.0.x
+
+* Implemented torch.compile for all advanced optimizers. Enable it via `compiled_optimizer=True` to fuse and optimize the optimizer step path.
+* Improved 1-bit factored mode via `nnmf_factor=True`.
+* Various improvements across the optimizers.
+
+### in 1.2.x
+* Added **advanced variants** of the [Muon optimizer](https://kellerjordan.github.io/posts/muon/) with **features** and **settings** from recent papers.
+
+| Optimizer | Description |
+|---|---|
+| `Muon_adv` | Advanced Muon implementation with CANS, NorMuon, low-rank orthogonalization, and other features. |
+| `AdaMuon_adv` | Advanced AdaMuon implementation, combining Muon's geometry with Adam-like adaptive scaling and sign-based orthogonalization. |
+
+> *Documentation coming soon.*
+
+* Implemented [Cautious Weight Decay](https://arxiv.org/abs/2510.12402) for all advanced optimizers.
+
+* Improved parameter updates and weight decay for **BF16** with **stochastic rounding**: updates are now accumulated in **float32** and rounded once at the end.
+
+* Fused and in-place operations are used wherever possible in all advanced optimizers.
+
+* **Prodigy variants** are now **50% faster** by [avoiding CUDA syncs](https://github.com/Koratahiu/Advanced_Optimizers/pull/5). Thanks to **@dxqb**!
+
 ---
 
 ## 📦 Installation
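To make the 2.0.x items in the hunk above concrete, here is a minimal sketch of how the two flags might be passed. The keyword names `compiled_optimizer` and `nnmf_factor` come from the changelog text; the choice of `AdamW_adv`, the `lr` value, and the rest of the call are illustrative assumptions, not taken from this diff.

```python
import torch
from adv_optm import AdamW_adv  # assumed export; any of the advanced optimizers should accept these flags

model = torch.nn.Linear(512, 512)

optimizer = AdamW_adv(
    model.parameters(),
    lr=1e-4,                  # placeholder value
    compiled_optimizer=True,  # 2.0.x: fuse/optimize the step path with torch.compile
    nnmf_factor=True,         # 2.0.x: 1-bit factored optimizer state via rank-1 NNMF
)

loss = model(torch.randn(8, 512)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```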
@@ -52,7 +78,7 @@ This library integrates multiple state-of-the-art optimization techniques valida
 ### **Memory-Efficient Optimization (SMMF-inspired)**
 - **Paper**: [SMMF: Square-Matricized Momentum Factorization](https://arxiv.org/abs/2412.08894)
 - **Approach**: Uses rank-1 non-negative matrix factorization with reconstruction cycle (factor → reconstruct → update → factor)
-- **Innovation**: 
+- **Innovation**:
   - First moment split into **1-bit sign + absolute value**
   - Final storage: **four factored vectors + one 1-bit sign state**
   - Preserves Adam-like update quality with drastically reduced memory
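As a rough illustration of the factor → reconstruct → update → factor cycle and the 1-bit sign split described in this hunk (a simplified sketch, not the package's actual code):

```python
import torch

def factor_rank1(mat: torch.Tensor):
    # One-shot rank-1 non-negative factorization: mat >= 0 is summarized by its
    # row sums and column sums, so only two vectors are stored between steps.
    return mat.sum(dim=1), mat.sum(dim=0)

def reconstruct_rank1(r: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
    # outer(r, c) / total reproduces a rank-1 approximation of the original matrix
    # (exact if the matrix was rank-1 to begin with).
    return torch.outer(r, c) / r.sum().clamp_min(1e-30)

def first_moment_step(r, c, sign_bits, grad, beta1=0.9):
    m = reconstruct_rank1(r, c)              # reconstruct |m| from two vectors
    m = torch.where(sign_bits, m, -m)        # reapply the 1-bit sign state
    m = beta1 * m + (1 - beta1) * grad       # ordinary EMA update in full precision
    r, c = factor_rank1(m.abs())             # re-factor the magnitude
    sign_bits = m > 0                        # store the sign as a 1-bit tensor
    return r, c, sign_bits, m                # m drives the update, then is discarded
```

The second moment can be factored the same way without a sign state, since it is non-negative, which is how the "four factored vectors + one 1-bit sign state" storage figure comes about.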
@@ -110,7 +136,7 @@ This library integrates multiple state-of-the-art optimization techniques valida
 
 ## 🛠️ Comprehensive Feature Guide
 
-### A. Universal Safe Features 
+### A. Universal Safe Features
 *These features work with all optimizers and are generally safe to enable.*
 
 | Feature | Description | Recommended Usage | Performance Impact | Theoretical Basis | Compatibility |
@@ -164,7 +190,7 @@ This library integrates multiple state-of-the-art optimization techniques valida
 | `beta1` | 0.99 | Controls accumulator memory length:<br>• Small BS: **0.99–0.9999**<br>• Large BS: **0.9** |
 | `Grad α` | 100 | Most critical parameter:<br>• Inversely scales with batch size<br>• **100–10** for small BS (≤32)<br>• **1–0.1** for large BS (≥512) |
 
-> ⚠️ **Critical**: Requires **~100x smaller learning rate** than AdamW (e.g., 1e-6 vs 1e-4). 
+> ⚠️ **Critical**: Requires **~100x smaller learning rate** than AdamW (e.g., 1e-6 vs 1e-4).
 > For `Prodigy_Adv`, set `initial_d` to:
 > - **LoRA**: `1e-8`
 > - **Full FT**: `1e-10`
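For concreteness, the two recommendations in the quoted block (a ~100x smaller learning rate than AdamW, and a tiny `initial_d` for the Prodigy variant on LoRA runs) might translate into calls like the following. The class names are inferred from the file list in this diff and the constructor signatures are assumptions, not the package's documented API.

```python
import torch
from adv_optm import Prodigy_adv, Simplified_AdEMAMix  # names inferred from the file list

model = torch.nn.Linear(64, 64)

# Simplified-AdEMAMix-style accumulator: roughly 100x smaller LR than an AdamW baseline.
opt_ademamix = Simplified_AdEMAMix(model.parameters(), lr=1e-6)

# Prodigy variant on a LoRA fine-tune: lr=1.0 is the usual Prodigy convention,
# and the d estimate starts very small (1e-10 would be the full fine-tune value).
opt_prodigy = Prodigy_adv(model.parameters(), lr=1.0, initial_d=1e-8)
```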
@@ -180,7 +206,7 @@ This library integrates multiple state-of-the-art optimization techniques valida
 - Automatically clips updates to **[-2, 2]**, preventing destabilizing jumps.
 - **Highly recommended** for `Adopt_Adv`, which is prone to instability without clipping.
 
-> 📚 **Reference**: 
+> 📚 **Reference**:
 > - Paper: https://arxiv.org/abs/2407.05872
 > - Code: https://github.com/lucidrains/adam-atan2-pytorch
 
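The bound comes from replacing Adam's division by √v̂ + ε with an atan2 form. A minimal sketch of that idea, using the constants from the linked adam-atan2 code (a = 1.27, b = 1.0); how the package wires this into its optimizers may differ.

```python
import torch

def atan2_update(exp_avg: torch.Tensor, exp_avg_sq: torch.Tensor,
                 a: float = 1.27, b: float = 1.0) -> torch.Tensor:
    # Classic Adam direction: exp_avg / (exp_avg_sq.sqrt() + eps) -- unbounded, needs eps.
    # atan2 variant: a * atan2(m, b * sqrt(v)) is bounded by a * pi/2 ≈ 2, so every
    # per-coordinate update lands in roughly [-2, 2] and no epsilon is required.
    return a * torch.atan2(exp_avg, b * exp_avg_sq.sqrt())
```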
@@ -192,8 +218,8 @@ This library integrates multiple state-of-the-art optimization techniques valida
 
 Instead of using a fixed β₂ (e.g., 0.999 or 0.95), it **dynamically modulates β₂ per layer** based on a bounded *sunspike ratio*:
 
-- **During gradient bursts** → β₂ ↓ toward `Lower β₂` → faster reaction 
-- **During calm phases** → β₂ ↑ toward `The Selected β₂` → stronger smoothing 
+- **During gradient bursts** → β₂ ↓ toward `Lower β₂` → faster reaction
+- **During calm phases** → β₂ ↑ toward `The Selected β₂` → stronger smoothing
 
 This is especially effective for **noisy training, small batch sizes, and high learning rates**, where gradient norms shift abruptly due to noise or aggressive LR schedules.
 
@@ -206,17 +232,17 @@ This is especially effective for **noisy training, small batch sizes, and high l
 
 > 💡 **Best Practice**: Set `K_warmup_steps` equal to your standard LR warmup steps. During warmup, the optimizer uses the static `beta2`; adaptation begins only after warmup ends.
 
-> 📚 **Reference**: 
-> - Paper: [Kourkoutas-β: A Sunspike-Driven Adam Optimizer with Desert Flair](https://arxiv.org/abs/2508.12996) 
+> 📚 **Reference**:
+> - Paper: [Kourkoutas-β: A Sunspike-Driven Adam Optimizer with Desert Flair](https://arxiv.org/abs/2508.12996)
 > - Code: [kbeta](https://github.com/sck-at-ucy/kbeta)
 
 ---
 
 ## 📚 References
 
-1. [Revisiting BFloat16 Training](https://arxiv.org/abs/2010.06192) 
-2. [SMMF: Square-Matricized Momentum Factorization](https://arxiv.org/abs/2412.08894) 
-3. [The AdEMAMix Optimizer](https://arxiv.org/abs/2409.03137) 
-4. [Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD](https://arxiv.org/abs/2502.02431) 
+1. [Revisiting BFloat16 Training](https://arxiv.org/abs/2010.06192)
+2. [SMMF: Square-Matricized Momentum Factorization](https://arxiv.org/abs/2412.08894)
+3. [The AdEMAMix Optimizer](https://arxiv.org/abs/2409.03137)
+4. [Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD](https://arxiv.org/abs/2502.02431)
 6. [Kourkoutas-β: A Sunspike-Driven Adam Optimizer with Desert Flair](https://arxiv.org/abs/2508.12996)
 7. [Scaling Exponents Across Parameterizations and Optimizers](https://arxiv.org/abs/2407.05872)
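A compact sketch of the per-layer β₂ modulation described in the Kourkoutas-β hunks above. The bounded sunspike ratio and the interpolation between `Lower β₂` and the configured `beta2` follow the wording in this section; the exact normalization of gradient norms here is an assumption and differs in detail from the paper and the package.

```python
import torch

def kourkoutas_beta2(grad: torch.Tensor, norm_ema: float,
                     beta2: float = 0.999, beta2_lo: float = 0.88,
                     ema_decay: float = 0.95, eps: float = 1e-8):
    """Return this layer's beta2 for the current step and the updated norm EMA."""
    g_norm = grad.norm().item()
    norm_ema = ema_decay * norm_ema + (1.0 - ema_decay) * g_norm  # recent "calm" scale
    raw = g_norm / (norm_ema + eps)
    sunspike = raw / (1.0 + raw)  # bounded to [0, 1)
    # Gradient burst -> sunspike near 1 -> beta2 pulled down toward beta2_lo (react faster).
    # Calm phase     -> sunspike small  -> beta2 stays near the configured value (smooth more).
    return beta2 - (beta2 - beta2_lo) * sunspike, norm_ema
```

Per the best-practice note above, during the first `K_warmup_steps` the optimizer would simply return the static `beta2` instead of calling this.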
{adv_optm-1.2.2 → adv_optm-2.1.0}/README.md

@@ -4,6 +4,32 @@ A comprehensive, all-in-one collection of optimization algorithms for deep learn
 
 [](https://pypi.org/project/adv_optm/)
 
+## 🔥 What's New
+
+### in 2.0.x
+
+* Implemented torch.compile for all advanced optimizers. Enable it via `compiled_optimizer=True` to fuse and optimize the optimizer step path.
+* Improved 1-bit factored mode via `nnmf_factor=True`.
+* Various improvements across the optimizers.
+
+### in 1.2.x
+* Added **advanced variants** of the [Muon optimizer](https://kellerjordan.github.io/posts/muon/) with **features** and **settings** from recent papers.
+
+| Optimizer | Description |
+|---|---|
+| `Muon_adv` | Advanced Muon implementation with CANS, NorMuon, low-rank orthogonalization, and other features. |
+| `AdaMuon_adv` | Advanced AdaMuon implementation, combining Muon's geometry with Adam-like adaptive scaling and sign-based orthogonalization. |
+
+> *Documentation coming soon.*
+
+* Implemented [Cautious Weight Decay](https://arxiv.org/abs/2510.12402) for all advanced optimizers.
+
+* Improved parameter updates and weight decay for **BF16** with **stochastic rounding**: updates are now accumulated in **float32** and rounded once at the end.
+
+* Fused and in-place operations are used wherever possible in all advanced optimizers.
+
+* **Prodigy variants** are now **50% faster** by [avoiding CUDA syncs](https://github.com/Koratahiu/Advanced_Optimizers/pull/5). Thanks to **@dxqb**!
+
 ---
 
 ## 📦 Installation
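The BF16 item above (accumulate the update in float32, round once at the end) relies on stochastic rounding, which version 1.2.2 shipped in the now-removed `BF16_Stochastic_Rounding.py`. A self-contained sketch of that rounding step; the helper name and exact bit manipulation here are illustrative, not the package's internal API.

```python
import torch

@torch.no_grad()
def copy_stochastic_(target_bf16: torch.Tensor, source_fp32: torch.Tensor) -> None:
    """Copy an fp32 tensor into a bf16 tensor using stochastic rounding."""
    # BF16 keeps only the top 16 bits of an fp32 value. Adding uniform noise to the
    # 16 bits that will be discarded, then truncating, rounds up with probability
    # proportional to how close the value is to the next representable bf16 number.
    bits = source_fp32.view(dtype=torch.int32).clone()
    bits.add_(torch.randint_like(bits, 0, 1 << 16))
    bits.bitwise_and_(-65536)  # zero the low 16 bits (mask 0xFFFF0000)
    target_bf16.copy_(bits.view(dtype=torch.float32))

# Typical shape of the final step: params live in bf16, the update was accumulated in fp32.
param = torch.randn(4, dtype=torch.bfloat16)
new_value = param.float() - 1e-4 * torch.randn(4)  # placeholder fp32 update
copy_stochastic_(param, new_value)
```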
@@ -21,7 +47,7 @@ This library integrates multiple state-of-the-art optimization techniques valida
 ### **Memory-Efficient Optimization (SMMF-inspired)**
 - **Paper**: [SMMF: Square-Matricized Momentum Factorization](https://arxiv.org/abs/2412.08894)
 - **Approach**: Uses rank-1 non-negative matrix factorization with reconstruction cycle (factor → reconstruct → update → factor)
-- **Innovation**: 
+- **Innovation**:
   - First moment split into **1-bit sign + absolute value**
   - Final storage: **four factored vectors + one 1-bit sign state**
   - Preserves Adam-like update quality with drastically reduced memory
@@ -79,7 +105,7 @@ This library integrates multiple state-of-the-art optimization techniques valida
 
 ## 🛠️ Comprehensive Feature Guide
 
-### A. Universal Safe Features 
+### A. Universal Safe Features
 *These features work with all optimizers and are generally safe to enable.*
 
 | Feature | Description | Recommended Usage | Performance Impact | Theoretical Basis | Compatibility |
@@ -133,7 +159,7 @@ This library integrates multiple state-of-the-art optimization techniques valida
 | `beta1` | 0.99 | Controls accumulator memory length:<br>• Small BS: **0.99–0.9999**<br>• Large BS: **0.9** |
 | `Grad α` | 100 | Most critical parameter:<br>• Inversely scales with batch size<br>• **100–10** for small BS (≤32)<br>• **1–0.1** for large BS (≥512) |
 
-> ⚠️ **Critical**: Requires **~100x smaller learning rate** than AdamW (e.g., 1e-6 vs 1e-4). 
+> ⚠️ **Critical**: Requires **~100x smaller learning rate** than AdamW (e.g., 1e-6 vs 1e-4).
 > For `Prodigy_Adv`, set `initial_d` to:
 > - **LoRA**: `1e-8`
 > - **Full FT**: `1e-10`
@@ -149,7 +175,7 @@ This library integrates multiple state-of-the-art optimization techniques valida
 - Automatically clips updates to **[-2, 2]**, preventing destabilizing jumps.
 - **Highly recommended** for `Adopt_Adv`, which is prone to instability without clipping.
 
-> 📚 **Reference**: 
+> 📚 **Reference**:
 > - Paper: https://arxiv.org/abs/2407.05872
 > - Code: https://github.com/lucidrains/adam-atan2-pytorch
 
@@ -161,8 +187,8 @@ This library integrates multiple state-of-the-art optimization techniques valida
 
 Instead of using a fixed β₂ (e.g., 0.999 or 0.95), it **dynamically modulates β₂ per layer** based on a bounded *sunspike ratio*:
 
-- **During gradient bursts** → β₂ ↓ toward `Lower β₂` → faster reaction 
-- **During calm phases** → β₂ ↑ toward `The Selected β₂` → stronger smoothing 
+- **During gradient bursts** → β₂ ↓ toward `Lower β₂` → faster reaction
+- **During calm phases** → β₂ ↑ toward `The Selected β₂` → stronger smoothing
 
 This is especially effective for **noisy training, small batch sizes, and high learning rates**, where gradient norms shift abruptly due to noise or aggressive LR schedules.
 
@@ -175,17 +201,17 @@ This is especially effective for **noisy training, small batch sizes, and high l
 
 > 💡 **Best Practice**: Set `K_warmup_steps` equal to your standard LR warmup steps. During warmup, the optimizer uses the static `beta2`; adaptation begins only after warmup ends.
 
-> 📚 **Reference**: 
-> - Paper: [Kourkoutas-β: A Sunspike-Driven Adam Optimizer with Desert Flair](https://arxiv.org/abs/2508.12996) 
+> 📚 **Reference**:
+> - Paper: [Kourkoutas-β: A Sunspike-Driven Adam Optimizer with Desert Flair](https://arxiv.org/abs/2508.12996)
 > - Code: [kbeta](https://github.com/sck-at-ucy/kbeta)
 
 ---
 
 ## 📚 References
 
-1. [Revisiting BFloat16 Training](https://arxiv.org/abs/2010.06192) 
-2. [SMMF: Square-Matricized Momentum Factorization](https://arxiv.org/abs/2412.08894) 
-3. [The AdEMAMix Optimizer](https://arxiv.org/abs/2409.03137) 
-4. [Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD](https://arxiv.org/abs/2502.02431) 
+1. [Revisiting BFloat16 Training](https://arxiv.org/abs/2010.06192)
+2. [SMMF: Square-Matricized Momentum Factorization](https://arxiv.org/abs/2412.08894)
+3. [The AdEMAMix Optimizer](https://arxiv.org/abs/2409.03137)
+4. [Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD](https://arxiv.org/abs/2502.02431)
 6. [Kourkoutas-β: A Sunspike-Driven Adam Optimizer with Desert Flair](https://arxiv.org/abs/2508.12996)
 7. [Scaling Exponents Across Parameterizations and Optimizers](https://arxiv.org/abs/2407.05872)
{adv_optm-1.2.2 → adv_optm-2.1.0}/adv_optm/__init__.py

@@ -7,6 +7,7 @@ from .optim import (
     Lion_Prodigy_adv,
     Muon_adv,
     AdaMuon_adv,
+    SignSGD_adv,
 )
 
 __all__ = [
@@ -18,6 +19,7 @@ __all__ = [
     "Lion_Prodigy_adv",
     "Muon_adv",
     "AdaMuon_adv",
+    "SignSGD_adv",
 ]
 
-__version__ = "1.2.2"
+__version__ = "2.1.0"