torchzero 0.1.7__py3-none-any.whl → 0.1.8__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
torchzero/core/module.py CHANGED
@@ -212,6 +212,22 @@ class OptimizerModule(TensorListOptimizer, ABC): # type:ignore
        if self._initialized: return super().__repr__()
        return f"uninitialized {self.__class__.__name__}()"

+     def state_dict(self):
+         state_dict = {}
+         state_dict['__self__'] = super().state_dict()
+         for k,v in self.children.items():
+             state_dict[k] = v.state_dict()
+         return state_dict
+
+     def load_state_dict(self, state_dict: dict[str, Any]) -> None:
+         super().load_state_dict(state_dict['__self__'])
+         for k, v in self.children.items():
+             if k in state_dict:
+                 v.load_state_dict(state_dict[k])
+             else:
+                 warnings.warn(f"Tried to load state dict for {k}: {v.__class__.__name__}, but it is not present in state_dict with {list(state_dict.keys()) = }")
+
+
    def set_params(self, params: ParamsT):
        """
        Set parameters to this module. Use this to set per-parameter group settings.
torchzero/optim/modular.py CHANGED
@@ -2,6 +2,7 @@ from collections import abc
 import warnings
 from inspect import cleandoc
 import torch
+ from typing import Any

 from ..core import OptimizerModule, TensorListOptimizer, OptimizationVars, _Chain, _Chainable
 from ..utils.python_tools import flatten
@@ -67,6 +68,21 @@ class Modular(TensorListOptimizer):
        for hook in module.post_init_hooks:
            hook(self, module)

+     def state_dict(self):
+         state_dict = {}
+         state_dict['__self__'] = super().state_dict()
+         for i,v in enumerate(self.unrolled_modules):
+             state_dict[str(i)] = v.state_dict()
+         return state_dict
+
+     def load_state_dict(self, state_dict: dict[str, Any]) -> None:
+         super().load_state_dict(state_dict['__self__'])
+         for i,v in enumerate(self.unrolled_modules):
+             if str(i) in state_dict:
+                 v.load_state_dict(state_dict[str(i)])
+             else:
+                 warnings.warn(f"Tried to load state dict for {i}th module: {v.__class__.__name__}, but it is not present in state_dict with {list(state_dict.keys()) = }")
+
    def get_lr_module(self, last=True) -> OptimizerModule:
        """
        Retrieves the module in the chain that controls the learning rate.
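Together these hunks add nested checkpointing: `state_dict()` stores the optimizer's own state under the `'__self__'` key plus one entry per child module (or per unrolled module for `Modular`), and `load_state_dict()` restores each entry by key, warning rather than raising when a key is missing. A minimal usage sketch, assuming the `tz.Modular` API shown in the README below and an already-constructed `model`:

```py
import torch
import torchzero as tz

opt = tz.Modular(model.parameters(), tz.m.Adam(), tz.m.LR(1e-3))

# ... train for a while, then checkpoint optimizer state alongside the model ...
torch.save({"model": model.state_dict(), "opt": opt.state_dict()}, "checkpoint.pt")

# later: rebuild the same module chain, then restore its state;
# missing module keys only produce a warning, they do not raise
ckpt = torch.load("checkpoint.pt")
model.load_state_dict(ckpt["model"])
opt.load_state_dict(ckpt["opt"])
```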
@@ -0,0 +1,130 @@
+ Metadata-Version: 2.2
+ Name: torchzero
+ Version: 0.1.8
+ Summary: Modular optimization library for PyTorch.
+ Author-email: Ivan Nikishev <nkshv2@gmail.com>
+ License: MIT License
+
+ Copyright (c) 2024 inikishev
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+
+ Project-URL: Homepage, https://github.com/inikishev/torchzero
+ Project-URL: Repository, https://github.com/inikishev/torchzero
+ Project-URL: Issues, https://github.com/inikishev/torchzero/isses
+ Keywords: optimization,optimizers,torch,neural networks,zeroth order,second order
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: torch
+ Requires-Dist: numpy
+ Requires-Dist: typing_extensions
+
+ ![example workflow](https://github.com/inikishev/torchzero/actions/workflows/tests.yml/badge.svg)
+
+ # torchzero
+
+ `torchzero` implements a large number of optimization modules that can be chained together to create custom optimizers:
+
+ ```py
+ import torchzero as tz
+
+ optimizer = tz.Modular(
+     model.parameters(),
+     tz.m.Adam(),
+     tz.m.Cautious(),
+     tz.m.LR(1e-3),
+     tz.m.WeightDecay(1e-4)
+ )
+
+ # standard training loop
+ for inputs, targets in dataset:
+     loss = criterion(model(inputs), targets)
+     optimizer.zero_grad()
+     loss.backward()
+     optimizer.step()
+ ```
+
+ Each module takes the output of the previous module and applies a further transformation. This modular design avoids redundant code, such as reimplementing cautioning, orthogonalization, Laplacian smoothing, etc. for every optimizer. It is also easy to experiment with grafting, interpolation between different optimizers, and perhaps some weirder combinations like nested momentum.
+
+ Modules are not limited to gradient transformations. They can perform other operations like line searches, exponential moving average (EMA) and stochastic weight averaging (SWA), gradient accumulation, gradient approximation, and more.
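As a concrete illustration of that flexibility, below is a sketch of a gradient-free chain that first estimates gradients with randomized finite differences and then applies momentum, update clipping, and a learning rate. The module names `RandomizedFDM`, `NesterovMomentum`, and `ClipNorm` are taken from the 0.1.7 README further down in this diff; their exact names under `tz.m` in 0.1.8 should be verified against the documentation. A chain like this needs the closure described in the next section.

```py
import torchzero as tz

optimizer = tz.Modular(
    model.parameters(),
    tz.m.RandomizedFDM(),        # approximate gradients via randomized finite differences
    tz.m.NesterovMomentum(0.9),
    tz.m.ClipNorm(4),            # placed after momentum, so it clips the update, not the gradient
    tz.m.LR(1e-3),
)

optimizer.step(closure)  # gradient approximation requires a closure (see the next section)
```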
+
+ There are over 100 modules, all accessible within the `tz.m` namespace. For example, the Adam update rule is available as `tz.m.Adam`. A complete list of modules is available in the [documentation](https://torchzero.readthedocs.io/en/latest/autoapi/torchzero/modules/index.html).
+
+ ## Closure
+
+ Some modules and optimizers in torchzero, particularly line-search methods and gradient approximation modules, require a closure function. This is similar to how `torch.optim.LBFGS` works in PyTorch. In torchzero, the closure needs to accept a boolean `backward` argument (though the argument can have any name). When `backward=True`, the closure should zero out old gradients using `optimizer.zero_grad()` and compute new gradients using `loss.backward()`.
+
+ ```py
+ def closure(backward = True):
+     preds = model(inputs)
+     loss = loss_fn(preds, targets)
+
+     if backward:
+         optimizer.zero_grad()
+         loss.backward()
+     return loss
+
+ optimizer.step(closure)
+ ```
+
+ If you intend to use gradient-free methods, the `backward` argument is still required in the closure; simply leave it unused. Gradient-free and gradient-approximation methods always call the closure with `backward=False`.
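For example, a closure for purely gradient-free use can simply accept and ignore the flag (a sketch reusing the `model`, `inputs`, `targets`, and `loss_fn` names from above):

```py
def closure(backward=True):
    # accepted for API compatibility but never used:
    # gradient-free methods always pass backward=False
    preds = model(inputs)
    return loss_fn(preds, targets)
```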
+
+ All built-in PyTorch optimizers, as well as most custom ones, accept a closure too, so the code above works with other optimizers out of the box and you can switch between optimizers without rewriting your training loop.
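For instance, the same closure can be passed to a built-in optimizer unchanged; built-in optimizers call it with no arguments, so `backward` falls back to its default of `True` (a sketch):

```py
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
optimizer.step(closure)  # called internally as closure(), so gradients are computed
```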
+
+ # Documentation
+
+ For more information on how to create, use, and extend torchzero modules, please refer to the documentation at [torchzero.readthedocs.io](https://torchzero.readthedocs.io/en/latest/index.html).
+
+ # Extra
+
+ Some other optimization-related tools in torchzero:
+
+ ### scipy.optimize.minimize wrapper
+
+ A `scipy.optimize.minimize` wrapper with support for both the gradient and the Hessian via batched autograd:
+
+ ```py
+ from torchzero.optim.wrappers.scipy import ScipyMinimize
+ opt = ScipyMinimize(model.parameters(), method = 'trust-krylov')
+ ```
+
+ Use it like any other closure-based optimizer, but make sure the closure accepts a `backward` argument. Note that it performs a full minimization on each step.
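A usage sketch, reusing the `closure` defined in the Closure section above:

```py
# a single step() runs a complete scipy.optimize.minimize call,
# repeatedly evaluating the closure as needed
opt.step(closure)
```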
+
+ ### Nevergrad wrapper
+
+ [Nevergrad](https://github.com/facebookresearch/nevergrad) is an optimization library by Facebook with an enormous number of gradient-free methods.
+
+ ```py
+ from torchzero.optim.wrappers.nevergrad import NevergradOptimizer
+ opt = NevergradOptimizer(model.parameters(), ng.optimizers.NGOptBase, budget = 1000)
+ ```
+
+ Use it like any other closure-based optimizer, but make sure the closure accepts a `backward` argument.
+
+ ### NLopt wrapper
+
+ [NLopt](https://nlopt.readthedocs.io/en/latest/NLopt_Algorithms/) is another optimization library, similar to scipy.optimize.minimize, with a large number of both gradient-based and gradient-free methods.
+
+ ```py
+ from torchzero.optim.wrappers.nlopt import NLOptOptimizer
+ opt = NLOptOptimizer(model.parameters(), 'LD_TNEWTON_PRECOND_RESTART', maxeval = 1000)
+ ```
+
+ Use it like any other closure-based optimizer, but make sure the closure accepts a `backward` argument. Note that it performs a full minimization on each step.
@@ -1,7 +1,7 @@
 torchzero/__init__.py,sha256=CCIYfhGNYMnRP_cdXL7DgocxkEWYUZYgB3Sf1T5tdYY,203
 torchzero/tensorlist.py,sha256=V9m5zJ44PtiJxTOiM7cADxYaWK-ogv5Tk_KnyQqm1oo,41601
 torchzero/core/__init__.py,sha256=hab7HAep0JIVeJ-EQhcOAB9oKIIo2MuCVn7yS3BFVYA,266
- torchzero/core/module.py,sha256=Dhf8Rn6zKpzzO3rzyxzJnRAe_tOKS35AgH-FQQKR16A,22362
+ torchzero/core/module.py,sha256=iFgRX_PY2Y0WD8_w5q6ygry1VhvpsE3jO-ySS_YQkuQ,23023
 torchzero/core/tensorlist_optimizer.py,sha256=MenRzyNPQJWx26j4Tj2vbX2pEcRu_26HQPIpUu_nsFc,9957
 torchzero/modules/__init__.py,sha256=5f8kt2mMn1eo9YcjXc3ESW-bqMQIRf646V3zlr8UAO4,571
 torchzero/modules/adaptive/__init__.py,sha256=YBVDXCosr4-C-GFCoreHS3DFyHiYMhCbOWgdhVVaZ_E,161
@@ -69,7 +69,7 @@ torchzero/modules/weight_averaging/__init__.py,sha256=nJJRs68AV2G4rGwiiHNRfm6Xmt
 torchzero/modules/weight_averaging/ema.py,sha256=xdVnKC8PxCQCec4ad3ncvznvVsQM5O6EVfsOKsRr18k,2854
 torchzero/modules/weight_averaging/swa.py,sha256=syio5qq1vf76d5bAJU-I8Zc8U12bR2n8C9hEznCgr7s,6764
 torchzero/optim/__init__.py,sha256=vk6pIYJHWAGYJMdtJ1otsmVph-pdL5HwBg-CTeBCGso,253
- torchzero/optim/modular.py,sha256=tiuKXpLqq6jWwWWrfC_QU9XkErl2Saaw_xGau0frwzQ,6060
+ torchzero/optim/modular.py,sha256=tg-VUcxDAhBzM4uR7SFkf5I2hV_SScfqXCbZpvV1Yzc,6788
 torchzero/optim/experimental/__init__.py,sha256=RqNzJu5mVl3T0u7cf4TBzSiA20M1kxTZVYWjSVhEHuU,585
 torchzero/optim/experimental/experimental.py,sha256=tMHZVbEXm3s6mMr7unFSvk_Jks3uAaAG0fzsH6gr098,10928
 torchzero/optim/experimental/ray_search.py,sha256=GYyssL64D6RiImrZ2tchoZJ04x9rX-Bp1y2nQXEGxX0,2662
@@ -97,8 +97,8 @@ torchzero/utils/compile.py,sha256=pYEyX8P26iCb_hFqAXC8IP2SSQrRfC7ZDhXS0vVCsfY,12
 torchzero/utils/derivatives.py,sha256=koLmuUcVcX41SrH_9rvfJyMXyHyocNLuZ-C8Kr2B7hk,4844
 torchzero/utils/python_tools.py,sha256=kkyDhoP695HhapfKrdjcrRbRAbcvB0ArP1pkxuVUlf0,1192
 torchzero/utils/torch_tools.py,sha256=sSBY5Bmk9LOAgPtaq-6TK4wDgPXsg6FIWxv8CVDx82k,3580
- torchzero-0.1.7.dist-info/LICENSE,sha256=r9ZciAoZoqKC_FNADE0ORukj1p1XhLXEbegdsAyqhJs,1087
- torchzero-0.1.7.dist-info/METADATA,sha256=kA1ASlxBXLMOh_T2pgyk2OHsI50sOwguDfDmjg8fgAU,6096
- torchzero-0.1.7.dist-info/WHEEL,sha256=In9FTNxeP60KnTkGw7wk6mJPYd_dQSjEZmXdBdMCI-8,91
- torchzero-0.1.7.dist-info/top_level.txt,sha256=isztuDR1ZGo8p2tORLa-vNuomcbLj7Xd208lhd-pVPs,10
- torchzero-0.1.7.dist-info/RECORD,,
+ torchzero-0.1.8.dist-info/LICENSE,sha256=r9ZciAoZoqKC_FNADE0ORukj1p1XhLXEbegdsAyqhJs,1087
+ torchzero-0.1.8.dist-info/METADATA,sha256=gtnLFgZ4XQwI7GO5U_p8fUFrt1X8V_Lyv9bvISGnSro,6058
+ torchzero-0.1.8.dist-info/WHEEL,sha256=In9FTNxeP60KnTkGw7wk6mJPYd_dQSjEZmXdBdMCI-8,91
+ torchzero-0.1.8.dist-info/top_level.txt,sha256=isztuDR1ZGo8p2tORLa-vNuomcbLj7Xd208lhd-pVPs,10
+ torchzero-0.1.8.dist-info/RECORD,,
@@ -1,120 +0,0 @@
- Metadata-Version: 2.2
- Name: torchzero
- Version: 0.1.7
- Summary: Modular optimization library for PyTorch.
- Author-email: Ivan Nikishev <nkshv2@gmail.com>
- License: MIT License
-
- Copyright (c) 2024 inikishev
-
- Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including without limitation the rights
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- copies of the Software, and to permit persons to whom the Software is
- furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in all
- copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- SOFTWARE.
-
- Project-URL: Homepage, https://github.com/inikishev/torchzero
- Project-URL: Repository, https://github.com/inikishev/torchzero
- Project-URL: Issues, https://github.com/inikishev/torchzero/isses
- Keywords: optimization,optimizers,torch,neural networks,zeroth order,second order
- Requires-Python: >=3.10
- Description-Content-Type: text/markdown
- License-File: LICENSE
- Requires-Dist: torch
- Requires-Dist: numpy
- Requires-Dist: typing_extensions
-
- ![example workflow](https://github.com/inikishev/torchzero/actions/workflows/tests.yml/badge.svg)
-
- # torchzero
-
- This is a work-in-progress optimizers library for pytorch with composable zeroth, first, second order and quasi newton methods, gradient approximation, line searches and a whole lot of other stuff.
-
- Most optimizers are modular, meaning you can chain them like this:
-
- ```py
- optimizer = torchzero.optim.Modular(model.parameters(), [*list of modules*])`
- ```
-
- For example you might use `[ClipNorm(4), LR(1e-3), NesterovMomentum(0.9)]` for standard SGD with gradient clipping and nesterov momentum. Move `ClipNorm` to the end to clip the update instead of the gradients. If you don't have access to gradients, add a `RandomizedFDM()` at the beginning to approximate them via randomized finite differences. Add `Cautious()` to make the optimizer cautious.
-
- Each new module takes previous module update and works on it. That way there is no need to reimplement stuff like laplacian smoothing for all optimizers, and it is easy to experiment with grafting, interpolation between different optimizers, and perhaps some weirder combinations like nested momentum.
-
- # How to use
-
- All modules are defined in `torchzero.modules`. You can generally mix and match them however you want. Some pre-made optimizers are available in `torchzero.optim`.
-
- Some optimizers require closure, which should look like this:
-
- ```py
- def closure(backward = True):
-     preds = model(inputs)
-     loss = loss_fn(preds, targets)
-
-     # if you can't call loss.backward(), and instead use gradient-free methods,
-     # they always call closure with backward=False.
-     # so you can remove the part below, but keep the unused backward argument.
-     if backward:
-         optimizer.zero_grad()
-         loss.backward()
-     return loss
-
- optimizer.step(closure)
- ```
-
- This closure will also work with all built in pytorch optimizers, including LBFGS, all optimizers in this library, as well as most custom ones.
-
- # Contents
-
- Docs are available at [torchzero.readthedocs.io](https://torchzero.readthedocs.io/en/latest/). A preliminary list of all modules is available here <https://torchzero.readthedocs.io/en/latest/autoapi/torchzero/modules/index.html#classes>. Some of the implemented algorithms:
-
- - SGD/Rprop/RMSProp/AdaGrad/Adam as composable modules. They are also tested to exactly match built in pytorch versions.
- - Cautious Optimizers (<https://huggingface.co/papers/2411.16085>)
- - Optimizer grafting (<https://openreview.net/forum?id=FpKgG31Z_i9>)
- - Laplacian smoothing (<https://arxiv.org/abs/1806.06317>)
- - Polyak momentum, nesterov momentum
- - Gradient norm and value clipping, gradient normalization
- - Gradient centralization (<https://arxiv.org/abs/2004.01461>)
- - Learning rate droput (<https://pubmed.ncbi.nlm.nih.gov/35286266/>).
- - Forward gradient (<https://arxiv.org/abs/2202.08587>)
- - Gradient approximation via finite difference or randomized finite difference, which includes SPSA, RDSA, FDSA and Gaussian smoothing (<https://arxiv.org/abs/2211.13566v3>)
- - Various line searches
- - Exact Newton's method (with Levenberg-Marquardt regularization), newton with hessian approximation via finite difference, subspace finite differences newton.
- - Directional newton via one additional forward pass
-
- All modules should be quite fast, especially on models with many different parameters, due to `_foreach` operations.
-
- I am getting to the point where I can start focusing on good docs and tests. As of now, the code should be considered experimental, untested and subject to change, so feel free but be careful if using this for actual project.
-
- # Wrappers
-
- ### scipy.optimize.minimize wrapper
-
- scipy.optimize.minimize wrapper with support for both gradient and hessian via batched autograd
-
- ```py
- from torchzero.optim.wrappers.scipy import ScipyMinimize
- opt = ScipyMinimize(model.parameters(), method = 'trust-krylov')
- ```
-
- Use as any other optimizer (make sure closure accepts `backward` argument like one from **How to use**). Note that it performs full minimization on each step.
-
- ### Nevergrad wrapper
-
- ```py
- opt = NevergradOptimizer(bench.parameters(), ng.optimizers.NGOptBase, budget = 1000)
- ```
-
- Use as any other optimizer (make sure closure accepts `backward` argument like one from **How to use**).