barbor-1.0.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- barbor-1.0.0/LICENSE +21 -0
- barbor-1.0.0/PKG-INFO +56 -0
- barbor-1.0.0/README.md +24 -0
- barbor-1.0.0/barbor/__init__.py +2 -0
- barbor-1.0.0/barbor/momentum.py +36 -0
- barbor-1.0.0/barbor/optimizer.py +242 -0
- barbor-1.0.0/barbor/restart.py +90 -0
- barbor-1.0.0/barbor/stepsize.py +89 -0
- barbor-1.0.0/barbor/utils.py +42 -0
- barbor-1.0.0/barbor.egg-info/PKG-INFO +56 -0
- barbor-1.0.0/barbor.egg-info/SOURCES.txt +14 -0
- barbor-1.0.0/barbor.egg-info/dependency_links.txt +1 -0
- barbor-1.0.0/barbor.egg-info/not-zip-safe +1 -0
- barbor-1.0.0/barbor.egg-info/top_level.txt +1 -0
- barbor-1.0.0/setup.cfg +4 -0
- barbor-1.0.0/setup.py +47 -0
barbor-1.0.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Jing Lin
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
barbor-1.0.0/PKG-INFO
ADDED
@@ -0,0 +1,56 @@
+Metadata-Version: 2.1
+Name: barbor
+Version: 1.0.0
+Summary: A gradient optimization library based on the Barzilai-Borwein method.
+Home-page: https://github.com/linjing-lab/barbor
+Download-URL: https://github.com/linjing-lab/barbor/tags
+Author: 林景
+Author-email: linjing010729@163.com
+License: MIT
+Project-URL: Source, https://github.com/linjing-lab/barbor/tree/main/barbor/
+Project-URL: Tracker, https://github.com/linjing-lab/barbor/issues
+Platform: UNKNOWN
+Classifier: Development Status :: 5 - Production/Stable
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Information Technology
+Classifier: Intended Audience :: Science/Research
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Topic :: Scientific/Engineering
+Classifier: Topic :: Scientific/Engineering :: Mathematics
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Topic :: Software Development
+Classifier: Topic :: Software Development :: Libraries
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Description-Content-Type: text/markdown
+License-File: LICENSE
+
+# barbor
+
+A gradient optimization library based on the Barzilai-Borwein method.
+
+## description
+
+This PyTorch implementation of the Barzilai-Borwein (BB) gradient descent optimizer goes beyond standard first-order methods: it adapts the step size to approximate second-order curvature information without computing an explicit Hessian, addressing a fundamental limitation of fixed-learning-rate gradient descent.
+
+The implementation provides two complementary step size strategies: BB1 (α = s·s / s·y) and BB2 (α = s·y / y·y), where s is the parameter change and y is the gradient difference between consecutive iterations. These formulas capture local curvature, letting the optimizer adjust its step size to the problem geometry. The default alternating strategy switches between the two variants, leveraging their complementary strengths: BB1 tends to be more stable, while BB2 can converge faster.
+
+A key feature is the adaptive restart mechanism, which guards against divergence on non-convex landscapes. The code implements three restart conditions: gradient orthogonality (s and y become nearly orthogonal), negative gradient correlation (consecutive gradients point in opposite directions), or a combination of both. When progress stalls, the optimizer resets to the initial learning rate, escaping regions of poor curvature.
+
+The implementation also integrates momentum (both standard and Nesterov variants) with the BB framework, combining momentum's acceleration with BB's curvature awareness. Numerical safeguards, including regularization parameters, step size clamping, and division-by-zero protection, keep the method robust across diverse optimization landscapes.
+
+Beyond the core algorithm, the optimizer exposes diagnostic tools for monitoring convergence behavior, including real-time step size tracking, gradient correlation metrics, and convergence statistics. This transparency lets users understand the adaptive behavior and tune it accordingly.
+
+The combination of curvature-aware step sizing, restart conditions, momentum integration, and robust numerical handling makes this implementation particularly valuable for non-convex problems where traditional methods struggle with learning rate selection and convergence stability.
+
+## install barbor
+
+```bash
+pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu121
+pip install barbor
+```
+
barbor-1.0.0/README.md
ADDED
@@ -0,0 +1,24 @@
+# barbor
+
+A gradient optimization library based on the Barzilai-Borwein method.
+
+## description
+
+This PyTorch implementation of the Barzilai-Borwein (BB) gradient descent optimizer goes beyond standard first-order methods: it adapts the step size to approximate second-order curvature information without computing an explicit Hessian, addressing a fundamental limitation of fixed-learning-rate gradient descent.
+
+The implementation provides two complementary step size strategies: BB1 (α = s·s / s·y) and BB2 (α = s·y / y·y), where s is the parameter change and y is the gradient difference between consecutive iterations. These formulas capture local curvature, letting the optimizer adjust its step size to the problem geometry. The default alternating strategy switches between the two variants, leveraging their complementary strengths: BB1 tends to be more stable, while BB2 can converge faster.
+
+A key feature is the adaptive restart mechanism, which guards against divergence on non-convex landscapes. The code implements three restart conditions: gradient orthogonality (s and y become nearly orthogonal), negative gradient correlation (consecutive gradients point in opposite directions), or a combination of both. When progress stalls, the optimizer resets to the initial learning rate, escaping regions of poor curvature.
+
+The implementation also integrates momentum (both standard and Nesterov variants) with the BB framework, combining momentum's acceleration with BB's curvature awareness. Numerical safeguards, including regularization parameters, step size clamping, and division-by-zero protection, keep the method robust across diverse optimization landscapes.
+
+Beyond the core algorithm, the optimizer exposes diagnostic tools for monitoring convergence behavior, including real-time step size tracking, gradient correlation metrics, and convergence statistics. This transparency lets users understand the adaptive behavior and tune it accordingly.
+
+The combination of curvature-aware step sizing, restart conditions, momentum integration, and robust numerical handling makes this implementation particularly valuable for non-convex problems where traditional methods struggle with learning rate selection and convergence stability.
+
+## install barbor
+
+```bash
+pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu121
+pip install barbor
+```
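Before the source files, here is a minimal usage sketch (not part of the package) that drives the `barbor` optimizer class from `barbor/optimizer.py` below on a toy problem; the quadratic objective, tensor shapes, and hyperparameters are illustrative assumptions:

```python
import torch
from barbor.optimizer import barbor  # optimizer class defined in barbor/optimizer.py below

# Toy objective: minimize ||x - 3||^2, whose unique minimizer is x = 3.
x = torch.zeros(10, requires_grad=True)
opt = barbor([x], lr=0.1, method='alternating')

def closure():
    opt.zero_grad()
    loss = torch.sum((x - 3.0) ** 2)
    loss.backward()
    return loss

for _ in range(50):
    loss = opt.step(closure)

print(loss.item())  # approaches 0 as the BB step sizes adapt to the quadratic
```

On this quadratic the first BB step already recovers the exact inverse curvature (α = 0.5 for a gradient of 2(x − 3)), which is the behavior the description above refers to.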
barbor-1.0.0/barbor/momentum.py
ADDED
@@ -0,0 +1,36 @@
+import torch
+
+def apply_momentum(
+    param: torch.Tensor,
+    grad: torch.Tensor,
+    alpha: torch.Tensor,
+    state: dict,
+    momentum: float,
+    dampening: float,
+    nesterov: bool
+):
+    """Apply momentum to parameter update
+
+    Args:
+        param: Parameter tensor to update
+        grad: Gradient of parameter
+        alpha: Step size
+        state: Optimizer state for the parameter
+        momentum: Momentum factor
+        dampening: Momentum dampening factor
+        nesterov: Whether to use Nesterov momentum
+    """
+    if 'momentum_buffer' not in state:
+        state['momentum_buffer'] = torch.zeros_like(param)
+
+    buf = state['momentum_buffer']
+
+    if nesterov:
+        # Nesterov momentum
+        grad_corrected = grad.add(buf, alpha=momentum)
+        param.data.add_(grad_corrected, alpha=-alpha.item() if torch.is_tensor(alpha) else -alpha)
+        buf.mul_(momentum).add_(grad, alpha=1 - dampening)
+    else:
+        # Standard momentum
+        buf.mul_(momentum).add_(grad, alpha=1 - dampening)
+        param.data.add_(buf, alpha=-alpha.item() if torch.is_tensor(alpha) else -alpha)
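As a quick illustration of the buffer semantics in `apply_momentum` (a hedged sketch; the tensors and hyperparameters are made-up values, not package defaults):

```python
import torch
from barbor.momentum import apply_momentum

param = torch.ones(3)
grad = torch.tensor([0.5, -0.5, 1.0])
state = {}  # momentum_buffer is created lazily on the first call

# Standard heavy-ball update: buf = 0.9 * buf + 1.0 * grad, then param -= 0.1 * buf
apply_momentum(param, grad, alpha=0.1, state=state,
               momentum=0.9, dampening=0.0, nesterov=False)
print(param)                     # tensor([0.9500, 1.0500, 0.9000])
print(state['momentum_buffer'])  # equals grad after the first call (buffer started at zero)
```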
barbor-1.0.0/barbor/optimizer.py
ADDED
@@ -0,0 +1,242 @@
+import torch
+from typing import Optional, Callable, Dict, List, Tuple
+from .stepsize import compute_step_size
+from .restart import check_restart_condition
+from .momentum import apply_momentum
+from .utils import validate_arguments, compute_dot_products
+
+class barbor(torch.optim.Optimizer):
+    """Barzilai-Borwein Gradient Descent Method
+
+    Args:
+        params: Parameters to optimize
+        lr: Initial learning rate (default: 1.0)
+        method: Step size calculation method, options: 'bb1', 'bb2', 'alternating' (default: 'alternating')
+        gamma: Regularization parameter to prevent zero step size (default: 1e-8)
+        safe_guard: Step size safety guard factor (default: 1e-8)
+        min_step: Minimum step size (default: 1e-8)
+        max_step: Maximum step size (default: 1e8)
+        adaptive_restart: Whether to use adaptive restart (default: True)
+        restart_condition: Restart condition, options: 'gradient', 'angle', 'both' (default: 'both')
+        restart_tol: Restart condition tolerance (default: 0.9)
+        momentum: Momentum parameter (default: 0.0)
+        dampening: Momentum dampening (default: 0.0)
+        nesterov: Whether to use Nesterov momentum (default: False)
+    """
+
+    def __init__(
+        self,
+        params,
+        lr: float = 1.0,
+        method: str = 'alternating',
+        gamma: float = 1e-8,
+        safe_guard: float = 1e-8,
+        min_step: float = 1e-8,
+        max_step: float = 1e8,
+        adaptive_restart: bool = True,
+        restart_condition: str = 'both',
+        restart_tol: float = 0.9,
+        momentum: float = 0.0,
+        dampening: float = 0.0,
+        nesterov: bool = False
+    ):
+        # Validate input arguments
+        validate_arguments(lr, method, restart_condition, momentum, dampening)
+
+        defaults = dict(
+            lr=lr,
+            method=method,
+            gamma=gamma,
+            safe_guard=safe_guard,
+            min_step=min_step,
+            max_step=max_step,
+            adaptive_restart=adaptive_restart,
+            restart_condition=restart_condition,
+            restart_tol=restart_tol,
+            momentum=momentum,
+            dampening=dampening,
+            nesterov=nesterov
+        )
+        super().__init__(params, defaults)
+
+        # Initialize state for each parameter group
+        self._initialize_states()
+
+    def _initialize_states(self):
+        """Initialize optimizer states for all parameters"""
+        for group in self.param_groups:
+            for p in group['params']:
+                self._initialize_param_state(p, group['lr'], group['momentum'])
+
+    def _initialize_param_state(self, p, lr: float, momentum: float):
+        """Initialize state for a single parameter"""
+        state = self.state[p]
+        state.setdefault('step', 0)
+        state.setdefault('prev_param', torch.zeros_like(p))
+        state.setdefault('prev_grad', torch.zeros_like(p))
+        state.setdefault('alpha', torch.tensor(lr, device=p.device))
+        state.setdefault('prev_alpha', torch.tensor(lr, device=p.device))
+
+        if momentum > 0:
+            state.setdefault('momentum_buffer', torch.zeros_like(p))
+
+    @torch.no_grad()
+    def step(self, closure: Optional[Callable[[], float]] = None):
+        """Perform a single optimization step
+
+        Args:
+            closure: A callable that recomputes the loss and returns the loss
+
+        Returns:
+            Loss value (if closure is provided)
+        """
+        loss = None
+        if closure is not None:
+            with torch.enable_grad():
+                loss = closure()
+
+        for group in self.param_groups:
+            for p in group['params']:
+                if p.grad is None:
+                    continue
+
+                self._update_parameter(p, group)
+
+        return loss
+
+    def _update_parameter(self, p, group: dict):
+        """Update a single parameter"""
+        grad = p.grad
+        if grad.is_sparse:
+            raise RuntimeError('BarzilaiBorwein does not support sparse gradients')
+
+        state = self.state[p]
+        step = state['step']
+
+        if step == 0:
+            # First step: use initial learning rate
+            new_alpha = torch.tensor(group['lr'], device=p.device)
+            restart = False
+        else:
+            # Compute new step size
+            s, y = self._compute_updates(p, state)
+            restart = self._should_restart(s, y, grad, state, group)
+            new_alpha = self._compute_new_step_size(s, y, state, group, restart)
+
+        # Update parameter with new step size
+        self._apply_update(p, grad, state, new_alpha, group)
+
+        # Save state for next iteration
+        self._save_state(p, grad, state, new_alpha)
+        state['step'] += 1
+
+    def _compute_updates(self, p, state: dict) -> Tuple[torch.Tensor, torch.Tensor]:
+        """Compute parameter and gradient updates"""
+        s = p.data - state['prev_param']
+        y = p.grad - state['prev_grad']
+        return s, y
+
+    def _should_restart(self, s: torch.Tensor, y: torch.Tensor,
+                        grad: torch.Tensor, state: dict, group: dict) -> bool:
+        """Check if restart condition is met"""
+        if not group['adaptive_restart'] or state['step'] <= 1:
+            return False
+
+        return check_restart_condition(
+            s, y, grad, state['prev_grad'],
+            group['restart_condition'], group['restart_tol']
+        )
+
+    def _compute_new_step_size(self, s: torch.Tensor, y: torch.Tensor,
+                               state: dict, group: dict, restart: bool) -> torch.Tensor:
+        """Compute new step size"""
+        if restart:
+            return torch.tensor(group['lr'], device=s.device)
+
+        s_dot_s, s_dot_y, y_dot_y = compute_dot_products(s, y)
+
+        new_alpha = compute_step_size(
+            s_dot_s, s_dot_y, y_dot_y,
+            state['step'], group['method'],
+            group['gamma'], group['safe_guard'],
+            s.device
+        )
+
+        # Clip step size
+        return torch.clamp(new_alpha, group['min_step'], group['max_step'])
+
+    def _apply_update(self, p, grad: torch.Tensor, state: dict,
+                      alpha: torch.Tensor, group: dict):
+        """Apply parameter update"""
+        state['prev_alpha'] = state['alpha']
+        state['alpha'] = alpha
+
+        if group['momentum'] > 0:
+            apply_momentum(
+                p, grad, alpha, state,
+                group['momentum'], group['dampening'], group['nesterov']
+            )
+        else:
+            p.data.add_(grad, alpha=-alpha)
+
+    def _save_state(self, p, grad: torch.Tensor, state: dict, alpha: torch.Tensor):
+        """Save current state for next iteration"""
+        state['prev_param'].copy_(p.data)
+        state['prev_grad'].copy_(grad)
+
+    def get_step_sizes(self) -> List[float]:
+        """Get current step sizes for all parameters"""
+        alphas = []
+        for group in self.param_groups:
+            for p in group['params']:
+                state = self.state[p]
+                if 'alpha' in state:
+                    alpha = state['alpha']
+                    alphas.append(alpha.item() if torch.is_tensor(alpha) else alpha)
+        return alphas
+
+    def reset_step_sizes(self, alpha: float = 1.0):
+        """Reset step sizes for all parameters"""
+        for group in self.param_groups:
+            for p in group['params']:
+                state = self.state[p]
+                device = p.device
+                state['alpha'] = torch.tensor(alpha, device=device)
+                state['prev_alpha'] = torch.tensor(alpha, device=device)
+
+    def get_gradient_history_info(self) -> List[Tuple[float, float, float]]:
+        """Get gradient history information
+
+        Returns:
+            List of (s·s, s·y, y·y) values for each parameter
+        """
+        info = []
+        for group in self.param_groups:
+            for p in group['params']:
+                state = self.state[p]
+                if 'prev_grad' in state and p.grad is not None:
+                    s, y = self._compute_updates(p, state)
+                    s_dot_s, s_dot_y, y_dot_y = compute_dot_products(s, y)
+                    info.append((s_dot_s.item(), s_dot_y.item(), y_dot_y.item()))
+        return info
+
+    def get_convergence_info(self) -> Dict[str, List[float]]:
+        """Get convergence information"""
+        info = {
+            'step_sizes': [],
+            'gradient_norms': [],
+            'step_norms': []
+        }
+        for group in self.param_groups:
+            for p in group['params']:
+                state = self.state[p]
+                if 'alpha' in state:
+                    alpha = state['alpha']
+                    info['step_sizes'].append(alpha.item() if torch.is_tensor(alpha) else alpha)
+                if p.grad is not None:
+                    grad_norm = torch.norm(p.grad).item()
+                    info['gradient_norms'].append(grad_norm)
+                if 'prev_param' in state:
+                    step_norm = torch.norm(p.data - state['prev_param']).item()
+                    info['step_norms'].append(step_norm)
+        return info
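The diagnostic helpers at the end of the class can be polled between steps; a short sketch on a toy quadratic (illustrative values, assuming the package layout above):

```python
import torch
from barbor.optimizer import barbor

w = torch.randn(5, requires_grad=True)
opt = barbor([w], lr=1.0, method='bb1')

for _ in range(3):
    opt.zero_grad()
    torch.sum(w ** 2).backward()  # gradient is 2w, so BB1 should settle near 0.5
    opt.step()

print(opt.get_step_sizes())    # one adaptive step size per parameter tensor
info = opt.get_convergence_info()
print(info['gradient_norms'])  # current gradient norm per parameter tensor
print(info['step_norms'])      # distance from the last saved iterate
```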
barbor-1.0.0/barbor/restart.py
ADDED
@@ -0,0 +1,90 @@
+import torch
+from typing import Union
+from enum import Enum
+
+class RestartCondition(Enum):
+    GRADIENT = 'gradient'
+    ANGLE = 'angle'
+    BOTH = 'both'
+
+def check_restart_condition(
+    s: torch.Tensor,
+    y: torch.Tensor,
+    grad: torch.Tensor,
+    prev_grad: torch.Tensor,
+    condition: Union[str, RestartCondition],
+    tol: float = 0.9
+) -> bool:
+    """Check if restart condition is met
+
+    Restart conditions help prevent the BB method from diverging on non-convex problems
+
+    Args:
+        s: Parameter difference (x_k - x_{k-1})
+        y: Gradient difference (∇f_k - ∇f_{k-1})
+        grad: Current gradient
+        prev_grad: Previous gradient
+        condition: Restart condition type
+        tol: Tolerance for restart condition
+
+    Returns:
+        True if restart condition is met
+    """
+    if isinstance(condition, str):
+        condition = RestartCondition(condition.lower())
+
+    if condition == RestartCondition.GRADIENT:
+        return _check_gradient_condition(s, y, tol)
+    elif condition == RestartCondition.ANGLE:
+        return _check_angle_condition(grad, prev_grad, tol)
+    elif condition == RestartCondition.BOTH:
+        return _check_both_conditions(s, y, grad, prev_grad, tol)
+    else:
+        raise ValueError(f"Unsupported restart condition: {condition}")
+
+def _check_gradient_condition(s: torch.Tensor, y: torch.Tensor, tol: float) -> bool:
+    """Check gradient-based restart condition"""
+    s_norm = torch.norm(s)
+    y_norm = torch.norm(y)
+
+    if s_norm < 1e-12 or y_norm < 1e-12:
+        return False
+
+    cos_theta = torch.abs(torch.sum(s * y)) / (s_norm * y_norm)
+    return cos_theta < tol
+
+def _check_angle_condition(grad: torch.Tensor, prev_grad: torch.Tensor, tol: float) -> bool:
+    """Check angle-based restart condition"""
+    grad_norm = torch.norm(grad)
+    prev_grad_norm = torch.norm(prev_grad)
+
+    if grad_norm < 1e-12 or prev_grad_norm < 1e-12:
+        return False
+
+    cos_phi = torch.sum(grad * prev_grad) / (grad_norm * prev_grad_norm)
+    return cos_phi < -tol
+
+def _check_both_conditions(
+    s: torch.Tensor,
+    y: torch.Tensor,
+    grad: torch.Tensor,
+    prev_grad: torch.Tensor,
+    tol: float
+) -> bool:
+    """Check both restart conditions"""
+    restart1, restart2 = False, False
+
+    s_norm = torch.norm(s)
+    y_norm = torch.norm(y)
+    grad_norm = torch.norm(grad)
+    prev_grad_norm = torch.norm(prev_grad)
+
+    if s_norm > 1e-12 and y_norm > 1e-12:
+        cos_theta = torch.abs(torch.sum(s * y)) / (s_norm * y_norm)
+        restart1 = cos_theta < tol
+
+    if grad_norm > 1e-12 and prev_grad_norm > 1e-12:
+        cos_phi = torch.sum(grad * prev_grad) / (grad_norm * prev_grad_norm)
+        restart2 = cos_phi < -tol
+
+    return restart1 or restart2
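A quick check of the two conditions with hand-built tensors (illustrative values only):

```python
import torch
from barbor.restart import check_restart_condition

s = torch.tensor([1.0, 0.0])            # parameter difference
y = torch.tensor([0.0, 1.0])            # gradient difference, orthogonal to s
grad = torch.tensor([1.0, 1.0])
prev_grad = torch.tensor([-1.0, -1.0])  # gradient reversed direction

# |cos(s, y)| = 0 < 0.9, so the 'gradient' condition fires
print(check_restart_condition(s, y, grad, prev_grad, 'gradient', tol=0.9))  # tensor(True)

# cos(grad, prev_grad) = -1 < -0.9, so the 'angle' condition fires as well
print(check_restart_condition(s, y, grad, prev_grad, 'angle', tol=0.9))     # tensor(True)
```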
barbor-1.0.0/barbor/stepsize.py
ADDED
@@ -0,0 +1,89 @@
+import torch
+from typing import Union
+from enum import Enum
+
+class StepSizeMethod(Enum):
+    BB1 = 'bb1'
+    BB2 = 'bb2'
+    ALTERNATING = 'alternating'
+
+def compute_step_size(
+    s_dot_s: torch.Tensor,
+    s_dot_y: torch.Tensor,
+    y_dot_y: torch.Tensor,
+    step: int,
+    method: Union[str, StepSizeMethod],
+    gamma: float = 1e-8,
+    safe_guard: float = 1e-8,
+    device: torch.device = None
+) -> torch.Tensor:
+    """Compute step size using the Barzilai-Borwein method
+
+    Args:
+        s_dot_s: s·s where s = x_k - x_{k-1}
+        s_dot_y: s·y where y = ∇f_k - ∇f_{k-1}
+        y_dot_y: y·y
+        step: Current iteration number
+        method: Step size calculation method
+        gamma: Regularization parameter
+        safe_guard: Numerical safety parameter
+        device: Device for tensor creation
+
+    Returns:
+        Computed step size
+    """
+    if isinstance(method, str):
+        method = StepSizeMethod(method.lower())
+
+    if method == StepSizeMethod.BB1:
+        return _bb1_step(s_dot_s, s_dot_y, gamma, safe_guard, device)
+    elif method == StepSizeMethod.BB2:
+        return _bb2_step(s_dot_y, y_dot_y, gamma, safe_guard, device)
+    elif method == StepSizeMethod.ALTERNATING:
+        # Alternate between BB1 and BB2
+        if step % 2 == 1:
+            return _bb1_step(s_dot_s, s_dot_y, gamma, safe_guard, device)
+        else:
+            return _bb2_step(s_dot_y, y_dot_y, gamma, safe_guard, device)
+    else:
+        raise ValueError(f"Unsupported step size calculation method: {method}")
+
+def _bb1_step(
+    s_dot_s: torch.Tensor,
+    s_dot_y: torch.Tensor,
+    gamma: float,
+    safe_guard: float,
+    device: torch.device
+) -> torch.Tensor:
+    """Compute BB1 step size"""
+    if torch.abs(s_dot_y) < safe_guard:
+        if torch.abs(s_dot_s) < safe_guard:
+            return torch.tensor(1.0, device=device)
+        else:
+            return s_dot_s.clone()
+
+    denominator = s_dot_y + gamma
+    if denominator <= 0:
+        denominator = torch.abs(denominator) + gamma
+
+    return s_dot_s / denominator
+
+def _bb2_step(
+    s_dot_y: torch.Tensor,
+    y_dot_y: torch.Tensor,
+    gamma: float,
+    safe_guard: float,
+    device: torch.device
+) -> torch.Tensor:
+    """Compute BB2 step size"""
+    if torch.abs(y_dot_y) < safe_guard:
+        if torch.abs(s_dot_y) < safe_guard:
+            return torch.tensor(1.0, device=device)
+        else:
+            return s_dot_y.clone()
+
+    denominator = y_dot_y + gamma
+    if denominator <= 0:
+        denominator = torch.abs(denominator) + gamma
+
+    return s_dot_y / denominator
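A worked numeric example of the two formulas and the alternating schedule (toy dot products, default regularization):

```python
import torch
from barbor.stepsize import compute_step_size

# Suppose s·s = 4, s·y = 2, y·y = 2 between two iterates.
s_dot_s = torch.tensor(4.0)
s_dot_y = torch.tensor(2.0)
y_dot_y = torch.tensor(2.0)

# BB1: s·s / s·y = 4 / 2 ≈ 2.0 (up to the gamma regularizer)
print(compute_step_size(s_dot_s, s_dot_y, y_dot_y, step=1, method='bb1'))
# BB2: s·y / y·y = 2 / 2 ≈ 1.0
print(compute_step_size(s_dot_s, s_dot_y, y_dot_y, step=1, method='bb2'))
# 'alternating' picks BB1 on odd steps and BB2 on even steps
print(compute_step_size(s_dot_s, s_dot_y, y_dot_y, step=2, method='alternating'))  # BB2 branch
```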
barbor-1.0.0/barbor/utils.py
ADDED
@@ -0,0 +1,42 @@
+import torch
+
+def validate_arguments(
+    lr: float,
+    method: str,
+    restart_condition: str,
+    momentum: float,
+    dampening: float
+):
+    """Validate optimizer arguments
+
+    Args:
+        lr: Learning rate
+        method: Step size method
+        restart_condition: Restart condition
+        momentum: Momentum parameter
+        dampening: Dampening parameter
+
+    Raises:
+        ValueError: If any argument is invalid
+    """
+    if lr <= 0.0:
+        raise ValueError(f"Learning rate must be positive: {lr}")
+    valid_methods = ['bb1', 'bb2', 'alternating']
+    if method not in valid_methods:
+        raise ValueError(f"Unsupported step size calculation method: {method}. "
+                         f"Must be one of {valid_methods}")
+    valid_restart_conditions = ['gradient', 'angle', 'both']
+    if restart_condition not in valid_restart_conditions:
+        raise ValueError(f"Unsupported restart condition: {restart_condition}. "
+                         f"Must be one of {valid_restart_conditions}")
+    if momentum < 0.0:
+        raise ValueError(f"Momentum must be non-negative: {momentum}")
+    if dampening < 0.0:
+        raise ValueError(f"Dampening must be non-negative: {dampening}")
+
+def compute_dot_products(s: torch.Tensor, y: torch.Tensor) -> tuple:
+    """Compute dot products for BB step size calculation"""
+    s_dot_s = torch.sum(s * s)
+    s_dot_y = torch.sum(s * y)
+    y_dot_y = torch.sum(y * y)
+    return s_dot_s, s_dot_y, y_dot_y
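The validator fails fast before the optimizer is constructed; for example (illustrative arguments):

```python
from barbor.utils import validate_arguments

# Valid hyperparameters pass silently
validate_arguments(lr=0.5, method='bb1', restart_condition='both',
                   momentum=0.9, dampening=0.0)

# An unknown method name raises ValueError with the list of valid options
try:
    validate_arguments(lr=0.5, method='bb3', restart_condition='both',
                       momentum=0.9, dampening=0.0)
except ValueError as err:
    print(err)
```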
barbor-1.0.0/barbor.egg-info/PKG-INFO
ADDED
@@ -0,0 +1,56 @@
(56 added lines, identical to barbor-1.0.0/PKG-INFO above)
barbor-1.0.0/barbor.egg-info/SOURCES.txt
ADDED
@@ -0,0 +1,14 @@
+LICENSE
+README.md
+setup.py
+barbor/__init__.py
+barbor/momentum.py
+barbor/optimizer.py
+barbor/restart.py
+barbor/stepsize.py
+barbor/utils.py
+barbor.egg-info/PKG-INFO
+barbor.egg-info/SOURCES.txt
+barbor.egg-info/dependency_links.txt
+barbor.egg-info/not-zip-safe
+barbor.egg-info/top_level.txt
barbor-1.0.0/barbor.egg-info/dependency_links.txt
ADDED
@@ -0,0 +1 @@
+
barbor-1.0.0/barbor.egg-info/not-zip-safe
ADDED
@@ -0,0 +1 @@
+
barbor-1.0.0/barbor.egg-info/top_level.txt
ADDED
@@ -0,0 +1 @@
+barbor
barbor-1.0.0/setup.cfg
ADDED
barbor-1.0.0/setup.py
ADDED
@@ -0,0 +1,47 @@
+from setuptools import setup
+from barbor import __version__
+
+try:
+    with open('README.md', 'r', encoding='utf-8') as fp:
+        _long_description = fp.read()
+except FileNotFoundError:
+    _long_description = ''
+
+setup(
+    name='barbor',  # pkg_name
+    packages=['barbor',],
+    version=__version__,  # version number
+    description="A gradient optimization library based on the Barzilai-Borwein method.",
+    author='林景',
+    author_email='linjing010729@163.com',
+    license='MIT',
+    url='https://github.com/linjing-lab/barbor',
+    download_url='https://github.com/linjing-lab/barbor/tags',
+    long_description=_long_description,
+    long_description_content_type='text/markdown',
+    include_package_data=True,
+    zip_safe=False,
+    setup_requires=['setuptools>=18.0', 'wheel'],
+    project_urls={
+        'Source': 'https://github.com/linjing-lab/barbor/tree/main/barbor/',
+        'Tracker': 'https://github.com/linjing-lab/barbor/issues',
+    },
+    classifiers=[
+        'Development Status :: 5 - Production/Stable',
+        'Intended Audience :: Developers',
+        'Intended Audience :: Information Technology',
+        'Intended Audience :: Science/Research',
+        'Programming Language :: Python :: 3.8',
+        'Programming Language :: Python :: 3.9',
+        'Programming Language :: Python :: 3.10',
+        'Programming Language :: Python :: 3.11',
+        'Programming Language :: Python :: 3.12',
+        'License :: OSI Approved :: MIT License',
+        'Topic :: Scientific/Engineering',
+        'Topic :: Scientific/Engineering :: Mathematics',
+        'Topic :: Scientific/Engineering :: Artificial Intelligence',
+        'Topic :: Software Development',
+        'Topic :: Software Development :: Libraries',
+        'Topic :: Software Development :: Libraries :: Python Modules',
+    ],
+)