barbor-1.0.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
barbor-1.0.0/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Jing Lin
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
barbor-1.0.0/PKG-INFO ADDED
@@ -0,0 +1,56 @@
+ Metadata-Version: 2.1
+ Name: barbor
+ Version: 1.0.0
+ Summary: The gradient optimization library with the Barzilai-Borwein method.
+ Home-page: https://github.com/linjing-lab/barbor
+ Download-URL: https://github.com/linjing-lab/barbor/tags
+ Author: 林景
+ Author-email: linjing010729@163.com
+ License: MIT
+ Project-URL: Source, https://github.com/linjing-lab/barbor/tree/main/barbor/
+ Project-URL: Tracker, https://github.com/linjing-lab/barbor/issues
+ Platform: UNKNOWN
+ Classifier: Development Status :: 5 - Production/Stable
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: Information Technology
+ Classifier: Intended Audience :: Science/Research
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Topic :: Scientific/Engineering
+ Classifier: Topic :: Scientific/Engineering :: Mathematics
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Classifier: Topic :: Software Development
+ Classifier: Topic :: Software Development :: Libraries
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+
+ # barbor
+
+ The gradient optimization library with the Barzilai-Borwein method.
+
+ ## description
+
+ This PyTorch implementation of the Barzilai-Borwein (BB) gradient descent optimizer extends standard first-order methods with an adaptive step size that approximates second-order curvature information without explicitly computing the Hessian, addressing a fundamental limitation of fixed-learning-rate gradient descent.
+
+ The implementation provides two complementary step size formulas: BB1 (α = s·s / s·y) and BB2 (α = s·y / y·y), where s is the change in parameters and y is the change in gradients between iterations. These formulas capture local curvature, letting the optimizer adjust its step size to the problem geometry. The default alternating strategy switches between the two variants to exploit their complementary strengths: BB1 tends to be more stable, while BB2 can converge faster.
+
+ A key feature is the adaptive restart mechanism, which guards against divergence in non-convex landscapes. The code implements three restart conditions: gradient orthogonality (when s and y become nearly orthogonal), negative gradient correlation (when consecutive gradients point in opposite directions), or a combination of both. When progress stalls, the optimizer resets to the initial learning rate, escaping regions where the curvature estimate is unreliable.
+
+ The implementation also integrates momentum support (both standard and Nesterov variants) with the BB framework, combining momentum's acceleration with BB's curvature awareness. Numerical safeguards, including regularization parameters, step size clamping, and division-by-zero protection, keep the method robust across diverse optimization landscapes.
+
+ Beyond the core algorithm, the optimizer exposes diagnostic tools for monitoring convergence behavior, including step size tracking, gradient correlation metrics, and convergence statistics, so users can observe the adaptive behavior and tune it with confidence.
+
+ Together, curvature-aware step sizing, restart conditions, momentum integration, and careful numerical handling make this implementation particularly useful for non-convex problems where traditional methods struggle with learning rate selection and convergence stability.
+
+ ## install barbor
+
+ ```bash
+ pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu121
+ pip install barbor
+ ```
+
barbor-1.0.0/README.md ADDED
@@ -0,0 +1,24 @@
+ # barbor
+
+ The gradient optimization library with the Barzilai-Borwein method.
+
+ ## description
+
+ This PyTorch implementation of the Barzilai-Borwein (BB) gradient descent optimizer extends standard first-order methods with an adaptive step size that approximates second-order curvature information without explicitly computing the Hessian, addressing a fundamental limitation of fixed-learning-rate gradient descent.
+
+ The implementation provides two complementary step size formulas: BB1 (α = s·s / s·y) and BB2 (α = s·y / y·y), where s is the change in parameters and y is the change in gradients between iterations. These formulas capture local curvature, letting the optimizer adjust its step size to the problem geometry. The default alternating strategy switches between the two variants to exploit their complementary strengths: BB1 tends to be more stable, while BB2 can converge faster.
+
+ A key feature is the adaptive restart mechanism, which guards against divergence in non-convex landscapes. The code implements three restart conditions: gradient orthogonality (when s and y become nearly orthogonal), negative gradient correlation (when consecutive gradients point in opposite directions), or a combination of both. When progress stalls, the optimizer resets to the initial learning rate, escaping regions where the curvature estimate is unreliable.
+
+ The implementation also integrates momentum support (both standard and Nesterov variants) with the BB framework, combining momentum's acceleration with BB's curvature awareness. Numerical safeguards, including regularization parameters, step size clamping, and division-by-zero protection, keep the method robust across diverse optimization landscapes.
+
+ Beyond the core algorithm, the optimizer exposes diagnostic tools for monitoring convergence behavior, including step size tracking, gradient correlation metrics, and convergence statistics, so users can observe the adaptive behavior and tune it with confidence.
+
+ Together, curvature-aware step sizing, restart conditions, momentum integration, and careful numerical handling make this implementation particularly useful for non-convex problems where traditional methods struggle with learning rate selection and convergence stability.
+
+ ## install barbor
+
+ ```bash
+ pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu121
+ pip install barbor
+ ```
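
For illustration, a minimal usage sketch of the optimizer described above; the toy quadratic objective and tensor size are assumptions, while the constructor arguments, the closure-based step(), and get_step_sizes() come from the source in barbor/optimizer.py below.

```python
# Minimal usage sketch (objective and shapes are illustrative assumptions).
import torch
from barbor import barbor

x = torch.randn(10, requires_grad=True)           # parameter to optimize
optimizer = barbor([x], lr=1.0, method='alternating',
                   momentum=0.9, nesterov=True)   # arguments from the source

def closure():
    optimizer.zero_grad()
    loss = torch.sum((x - 1.0) ** 2)              # toy quadratic objective
    loss.backward()
    return loss

for _ in range(50):
    loss = optimizer.step(closure)

print(loss.item(), optimizer.get_step_sizes())    # current BB step sizes
```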
barbor-1.0.0/barbor/__init__.py ADDED
@@ -0,0 +1,2 @@
+ from .optimizer import barbor
+ __version__ = '1.0.0'
barbor-1.0.0/barbor/momentum.py ADDED
@@ -0,0 +1,36 @@
+ import torch
+
+ def apply_momentum(
+     param: torch.Tensor,
+     grad: torch.Tensor,
+     alpha: torch.Tensor,
+     state: dict,
+     momentum: float,
+     dampening: float,
+     nesterov: bool
+ ):
+     """Apply momentum to parameter update
+
+     Args:
+         param: Parameter tensor to update
+         grad: Gradient of parameter
+         alpha: Step size
+         state: Optimizer state for the parameter
+         momentum: Momentum factor
+         dampening: Momentum dampening factor
+         nesterov: Whether to use Nesterov momentum
+     """
+     if 'momentum_buffer' not in state:
+         state['momentum_buffer'] = torch.zeros_like(param)
+
+     buf = state['momentum_buffer']
+
+     if nesterov:
+         # Nesterov momentum: correct the gradient with the current buffer
+         grad_corrected = grad.add(buf, alpha=momentum)
+         param.data.add_(grad_corrected, alpha=-alpha.item() if torch.is_tensor(alpha) else -alpha)
+         buf.mul_(momentum).add_(grad, alpha=1 - dampening)
+     else:
+         # Standard momentum: refresh the buffer, then step along it
+         buf.mul_(momentum).add_(grad, alpha=1 - dampening)
+         param.data.add_(buf, alpha=-alpha.item() if torch.is_tensor(alpha) else -alpha)
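
A small sketch of calling apply_momentum directly; the tensors and hyperparameter values are assumptions. It shows the in-place parameter update and the 'momentum_buffer' entry the function maintains in the state dict.

```python
# Illustrative call (values are assumptions); apply_momentum mutates param in place.
import torch
from barbor.momentum import apply_momentum

param = torch.ones(3)
grad = torch.tensor([0.5, -0.2, 0.1])
state = {}  # the optimizer normally supplies this per-parameter dict

apply_momentum(param, grad, alpha=0.1, state=state,
               momentum=0.9, dampening=0.0, nesterov=False)
print(param)                       # param - 0.1 * buffer; buffer == grad on first call
print(state['momentum_buffer'])    # retained for the next call
```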
barbor-1.0.0/barbor/optimizer.py ADDED
@@ -0,0 +1,242 @@
+ import torch
+ from typing import Optional, Callable, Dict, List, Tuple
+ from .stepsize import compute_step_size
+ from .restart import check_restart_condition
+ from .momentum import apply_momentum
+ from .utils import validate_arguments, compute_dot_products
+
+ class barbor(torch.optim.Optimizer):
+     """Barzilai-Borwein Gradient Descent Method
+
+     Args:
+         params: Parameters to optimize
+         lr: Initial learning rate (default: 1.0)
+         method: Step size calculation method, options: 'bb1', 'bb2', 'alternating' (default: 'alternating')
+         gamma: Regularization parameter to prevent zero step size (default: 1e-8)
+         safe_guard: Step size safety guard factor (default: 1e-8)
+         min_step: Minimum step size (default: 1e-8)
+         max_step: Maximum step size (default: 1e8)
+         adaptive_restart: Whether to use adaptive restart (default: True)
+         restart_condition: Restart condition, options: 'gradient', 'angle', 'both' (default: 'both')
+         restart_tol: Restart condition tolerance (default: 0.9)
+         momentum: Momentum parameter (default: 0.0)
+         dampening: Momentum dampening (default: 0.0)
+         nesterov: Whether to use Nesterov momentum (default: False)
+     """
+
+     def __init__(
+         self,
+         params,
+         lr: float = 1.0,
+         method: str = 'alternating',
+         gamma: float = 1e-8,
+         safe_guard: float = 1e-8,
+         min_step: float = 1e-8,
+         max_step: float = 1e8,
+         adaptive_restart: bool = True,
+         restart_condition: str = 'both',
+         restart_tol: float = 0.9,
+         momentum: float = 0.0,
+         dampening: float = 0.0,
+         nesterov: bool = False
+     ):
+         # Validate input arguments
+         validate_arguments(lr, method, restart_condition, momentum, dampening)
+
+         defaults = dict(
+             lr=lr,
+             method=method,
+             gamma=gamma,
+             safe_guard=safe_guard,
+             min_step=min_step,
+             max_step=max_step,
+             adaptive_restart=adaptive_restart,
+             restart_condition=restart_condition,
+             restart_tol=restart_tol,
+             momentum=momentum,
+             dampening=dampening,
+             nesterov=nesterov
+         )
+         super().__init__(params, defaults)
+
+         # Initialize state for each parameter group
+         self._initialize_states()
+
+     def _initialize_states(self):
+         """Initialize optimizer states for all parameters"""
+         for group in self.param_groups:
+             for p in group['params']:
+                 self._initialize_param_state(p, group['lr'], group['momentum'])
+
+     def _initialize_param_state(self, p, lr: float, momentum: float):
+         """Initialize state for a single parameter"""
+         state = self.state[p]
+         state.setdefault('step', 0)
+         state.setdefault('prev_param', torch.zeros_like(p))
+         state.setdefault('prev_grad', torch.zeros_like(p))
+         state.setdefault('alpha', torch.tensor(lr, device=p.device))
+         state.setdefault('prev_alpha', torch.tensor(lr, device=p.device))
+
+         if momentum > 0:
+             state.setdefault('momentum_buffer', torch.zeros_like(p))
+
+     @torch.no_grad()
+     def step(self, closure: Optional[Callable[[], float]] = None):
+         """Perform a single optimization step
+
+         Args:
+             closure: A callable that recomputes the loss and returns the loss
+
+         Returns:
+             Loss value (if closure is provided)
+         """
+         loss = None
+         if closure is not None:
+             with torch.enable_grad():
+                 loss = closure()
+
+         for group in self.param_groups:
+             for p in group['params']:
+                 if p.grad is None:
+                     continue
+
+                 self._update_parameter(p, group)
+
+         return loss
+
+     def _update_parameter(self, p, group: dict):
+         """Update a single parameter"""
+         grad = p.grad
+         if grad.is_sparse:
+             raise RuntimeError('BarzilaiBorwein does not support sparse gradients')
+
+         state = self.state[p]
+         step = state['step']
+
+         if step == 0:
+             # First step: use initial learning rate
+             new_alpha = torch.tensor(group['lr'], device=p.device)
+             restart = False
+         else:
+             # Compute new step size
+             s, y = self._compute_updates(p, state)
+             restart = self._should_restart(s, y, grad, state, group)
+             new_alpha = self._compute_new_step_size(s, y, state, group, restart)
+
+         # Update parameter with new step size
+         self._apply_update(p, grad, state, new_alpha, group)
+
+         # Save state for next iteration
+         self._save_state(p, grad, state, new_alpha)
+         state['step'] += 1
+
+     def _compute_updates(self, p, state: dict) -> Tuple[torch.Tensor, torch.Tensor]:
+         """Compute parameter and gradient updates"""
+         s = p.data - state['prev_param']
+         y = p.grad - state['prev_grad']
+         return s, y
+
+     def _should_restart(self, s: torch.Tensor, y: torch.Tensor,
+                         grad: torch.Tensor, state: dict, group: dict) -> bool:
+         """Check if restart condition is met"""
+         if not group['adaptive_restart'] or state['step'] <= 1:
+             return False
+
+         return check_restart_condition(
+             s, y, grad, state['prev_grad'],
+             group['restart_condition'], group['restart_tol']
+         )
+
+     def _compute_new_step_size(self, s: torch.Tensor, y: torch.Tensor,
+                                state: dict, group: dict, restart: bool) -> torch.Tensor:
+         """Compute new step size"""
+         if restart:
+             return torch.tensor(group['lr'], device=s.device)
+
+         s_dot_s, s_dot_y, y_dot_y = compute_dot_products(s, y)
+
+         new_alpha = compute_step_size(
+             s_dot_s, s_dot_y, y_dot_y,
+             state['step'], group['method'],
+             group['gamma'], group['safe_guard'],
+             s.device
+         )
+
+         # Clip step size
+         return torch.clamp(new_alpha, group['min_step'], group['max_step'])
+
+     def _apply_update(self, p, grad: torch.Tensor, state: dict,
+                       alpha: torch.Tensor, group: dict):
+         """Apply parameter update"""
+         state['prev_alpha'] = state['alpha']
+         state['alpha'] = alpha
+
+         if group['momentum'] > 0:
+             apply_momentum(
+                 p, grad, alpha, state,
+                 group['momentum'], group['dampening'], group['nesterov']
+             )
+         else:
+             p.data.add_(grad, alpha=-alpha.item() if torch.is_tensor(alpha) else -alpha)
+
+     def _save_state(self, p, grad: torch.Tensor, state: dict, alpha: torch.Tensor):
+         """Save current state for next iteration"""
+         state['prev_param'].copy_(p.data)
+         state['prev_grad'].copy_(grad)
+
+     def get_step_sizes(self) -> List[float]:
+         """Get current step sizes for all parameters"""
+         alphas = []
+         for group in self.param_groups:
+             for p in group['params']:
+                 state = self.state[p]
+                 if 'alpha' in state:
+                     alpha = state['alpha']
+                     alphas.append(alpha.item() if torch.is_tensor(alpha) else alpha)
+         return alphas
+
+     def reset_step_sizes(self, alpha: float = 1.0):
+         """Reset step sizes for all parameters"""
+         for group in self.param_groups:
+             for p in group['params']:
+                 state = self.state[p]
+                 device = p.device
+                 state['alpha'] = torch.tensor(alpha, device=device)
+                 state['prev_alpha'] = torch.tensor(alpha, device=device)
+
+     def get_gradient_history_info(self) -> List[Tuple[float, float, float]]:
+         """Get gradient history information
+
+         Returns:
+             List of (s·s, s·y, y·y) values for each parameter
+         """
+         info = []
+         for group in self.param_groups:
+             for p in group['params']:
+                 state = self.state[p]
+                 if 'prev_grad' in state and p.grad is not None:
+                     s, y = self._compute_updates(p, state)
+                     s_dot_s, s_dot_y, y_dot_y = compute_dot_products(s, y)
+                     info.append((s_dot_s.item(), s_dot_y.item(), y_dot_y.item()))
+         return info
+
+     def get_convergence_info(self) -> Dict[str, List[float]]:
+         """Get convergence information"""
+         info = {
+             'step_sizes': [],
+             'gradient_norms': [],
+             'step_norms': []
+         }
+         for group in self.param_groups:
+             for p in group['params']:
+                 state = self.state[p]
+                 if 'alpha' in state:
+                     alpha = state['alpha']
+                     info['step_sizes'].append(alpha.item() if torch.is_tensor(alpha) else alpha)
+                 if p.grad is not None:
+                     grad_norm = torch.norm(p.grad).item()
+                     info['gradient_norms'].append(grad_norm)
+                 if 'prev_param' in state:
+                     step_norm = torch.norm(p.data - state['prev_param']).item()
+                     info['step_norms'].append(step_norm)
+         return info
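
A short sketch of the diagnostics API on a toy problem; the parameter, objective, and loop are assumptions, while the method names come from the class above.

```python
# Sketch: polling the diagnostics between steps (setup is an assumption).
import torch
from barbor import barbor

w = torch.zeros(5, requires_grad=True)
opt = barbor([w], lr=0.5, method='bb1')

for _ in range(3):
    opt.zero_grad()
    torch.sum((w - 2.0) ** 2).backward()
    opt.step()

print(opt.get_step_sizes())             # current BB step size per parameter
print(opt.get_convergence_info())       # step sizes, gradient norms, step norms
print(opt.get_gradient_history_info())  # raw (s.s, s.y, y.y) per parameter
opt.reset_step_sizes(alpha=0.5)         # manual reset if step sizes blow up
```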
barbor-1.0.0/barbor/restart.py ADDED
@@ -0,0 +1,90 @@
+ import torch
+ from typing import Union
+ from enum import Enum
+
+ class RestartCondition(Enum):
+     GRADIENT = 'gradient'
+     ANGLE = 'angle'
+     BOTH = 'both'
+
+ def check_restart_condition(
+     s: torch.Tensor,
+     y: torch.Tensor,
+     grad: torch.Tensor,
+     prev_grad: torch.Tensor,
+     condition: Union[str, RestartCondition],
+     tol: float = 0.9
+ ) -> bool:
+     """Check if restart condition is met
+
+     Restart conditions help prevent the BB method from diverging on non-convex problems
+
+     Args:
+         s: Parameter difference (x_k - x_{k-1})
+         y: Gradient difference (∇f_k - ∇f_{k-1})
+         grad: Current gradient
+         prev_grad: Previous gradient
+         condition: Restart condition type
+         tol: Tolerance for restart condition
+
+     Returns:
+         True if restart condition is met
+     """
+     if isinstance(condition, str):
+         condition = RestartCondition(condition.lower())
+
+     if condition == RestartCondition.GRADIENT:
+         return _check_gradient_condition(s, y, tol)
+     elif condition == RestartCondition.ANGLE:
+         return _check_angle_condition(grad, prev_grad, tol)
+     elif condition == RestartCondition.BOTH:
+         return _check_both_conditions(s, y, grad, prev_grad, tol)
+     else:
+         raise ValueError(f"Unsupported restart condition: {condition}")
+
+ def _check_gradient_condition(s: torch.Tensor, y: torch.Tensor, tol: float) -> bool:
+     """Check gradient-based restart condition"""
+     s_norm = torch.norm(s)
+     y_norm = torch.norm(y)
+
+     if s_norm < 1e-12 or y_norm < 1e-12:
+         return False
+
+     cos_theta = torch.abs(torch.sum(s * y)) / (s_norm * y_norm)
+     return cos_theta < tol
+
+ def _check_angle_condition(grad: torch.Tensor, prev_grad: torch.Tensor, tol: float) -> bool:
+     """Check angle-based restart condition"""
+     grad_norm = torch.norm(grad)
+     prev_grad_norm = torch.norm(prev_grad)
+
+     if grad_norm < 1e-12 or prev_grad_norm < 1e-12:
+         return False
+
+     cos_phi = torch.sum(grad * prev_grad) / (grad_norm * prev_grad_norm)
+     return cos_phi < -tol
+
+ def _check_both_conditions(
+     s: torch.Tensor,
+     y: torch.Tensor,
+     grad: torch.Tensor,
+     prev_grad: torch.Tensor,
+     tol: float
+ ) -> bool:
+     """Check both restart conditions"""
+     restart1, restart2 = False, False
+
+     s_norm = torch.norm(s)
+     y_norm = torch.norm(y)
+     grad_norm = torch.norm(grad)
+     prev_grad_norm = torch.norm(prev_grad)
+
+     if s_norm > 1e-12 and y_norm > 1e-12:
+         cos_theta = torch.abs(torch.sum(s * y)) / (s_norm * y_norm)
+         restart1 = cos_theta < tol
+
+     if grad_norm > 1e-12 and prev_grad_norm > 1e-12:
+         cos_phi = torch.sum(grad * prev_grad) / (grad_norm * prev_grad_norm)
+         restart2 = cos_phi < -tol
+
+     return restart1 or restart2
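
A quick numeric check of the conditions above (the vectors are assumptions): orthogonal s and y trip the 'gradient' condition, and a sign-flipped gradient trips the 'angle' condition.

```python
# Numeric check of the restart conditions (vectors are assumptions).
import torch
from barbor.restart import check_restart_condition

s = torch.tensor([1.0, 0.0])
y = torch.tensor([0.0, 1.0])   # s orthogonal to y: |cos theta| = 0 < tol
g = torch.tensor([1.0, 1.0])
pg = -g                        # reversed gradient: cos phi = -1 < -tol

print(bool(check_restart_condition(s, y, g, pg, 'gradient', tol=0.9)))  # True
print(bool(check_restart_condition(s, y, g, -pg, 'angle', tol=0.9)))    # False: cos phi = 1
print(bool(check_restart_condition(s, y, g, pg, 'both', tol=0.9)))      # True: either fires
```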
barbor-1.0.0/barbor/stepsize.py ADDED
@@ -0,0 +1,89 @@
+ import torch
+ from typing import Optional, Union
+ from enum import Enum
+
+ class StepSizeMethod(Enum):
+     BB1 = 'bb1'
+     BB2 = 'bb2'
+     ALTERNATING = 'alternating'
+
+ def compute_step_size(
+     s_dot_s: torch.Tensor,
+     s_dot_y: torch.Tensor,
+     y_dot_y: torch.Tensor,
+     step: int,
+     method: Union[str, StepSizeMethod],
+     gamma: float = 1e-8,
+     safe_guard: float = 1e-8,
+     device: Optional[torch.device] = None
+ ) -> torch.Tensor:
+     """Compute step size using Barzilai-Borwein method
+
+     Args:
+         s_dot_s: s·s where s = x_k - x_{k-1}
+         s_dot_y: s·y where y = ∇f_k - ∇f_{k-1}
+         y_dot_y: y·y
+         step: Current iteration number
+         method: Step size calculation method
+         gamma: Regularization parameter
+         safe_guard: Numerical safety parameter
+         device: Device for tensor creation
+
+     Returns:
+         Computed step size
+     """
+     if isinstance(method, str):
+         method = StepSizeMethod(method.lower())
+
+     if method == StepSizeMethod.BB1:
+         return _bb1_step(s_dot_s, s_dot_y, gamma, safe_guard, device)
+     elif method == StepSizeMethod.BB2:
+         return _bb2_step(s_dot_y, y_dot_y, gamma, safe_guard, device)
+     elif method == StepSizeMethod.ALTERNATING:
+         # Alternate between BB1 and BB2
+         if step % 2 == 1:
+             return _bb1_step(s_dot_s, s_dot_y, gamma, safe_guard, device)
+         else:
+             return _bb2_step(s_dot_y, y_dot_y, gamma, safe_guard, device)
+     else:
+         raise ValueError(f"Unsupported step size calculation method: {method}")
+
+ def _bb1_step(
+     s_dot_s: torch.Tensor,
+     s_dot_y: torch.Tensor,
+     gamma: float,
+     safe_guard: float,
+     device: torch.device
+ ) -> torch.Tensor:
+     """Compute BB1 step size"""
+     if torch.abs(s_dot_y) < safe_guard:
+         if torch.abs(s_dot_s) < safe_guard:
+             return torch.tensor(1.0, device=device)
+         else:
+             return s_dot_s.clone()
+
+     denominator = s_dot_y + gamma
+     if denominator <= 0:
+         denominator = torch.abs(denominator) + gamma
+
+     return s_dot_s / denominator
+
+ def _bb2_step(
+     s_dot_y: torch.Tensor,
+     y_dot_y: torch.Tensor,
+     gamma: float,
+     safe_guard: float,
+     device: torch.device
+ ) -> torch.Tensor:
+     """Compute BB2 step size"""
+     if torch.abs(y_dot_y) < safe_guard:
+         if torch.abs(s_dot_y) < safe_guard:
+             return torch.tensor(1.0, device=device)
+         else:
+             return s_dot_y.clone()
+
+     denominator = y_dot_y + gamma
+     if denominator <= 0:
+         denominator = torch.abs(denominator) + gamma
+
+     return s_dot_y / denominator
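
A worked example of the two formulas (the dot-product values are assumptions): with s·s = 4, s·y = 2, y·y = 2, BB1 gives α = 4/2 ≈ 2 and BB2 gives α = 2/2 ≈ 1; at an odd step the alternating method picks BB1.

```python
# Worked BB1/BB2 example (inputs are assumptions).
import torch
from barbor.stepsize import compute_step_size

s_dot_s = torch.tensor(4.0)
s_dot_y = torch.tensor(2.0)
y_dot_y = torch.tensor(2.0)

bb1 = compute_step_size(s_dot_s, s_dot_y, y_dot_y, step=1, method='bb1')
bb2 = compute_step_size(s_dot_s, s_dot_y, y_dot_y, step=1, method='bb2')
alt = compute_step_size(s_dot_s, s_dot_y, y_dot_y, step=1, method='alternating')

print(bb1.item(), bb2.item(), alt.item())  # ~2.0, ~1.0, ~2.0 (odd step -> BB1)
```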
barbor-1.0.0/barbor/utils.py ADDED
@@ -0,0 +1,42 @@
+ import torch
+
+ def validate_arguments(
+     lr: float,
+     method: str,
+     restart_condition: str,
+     momentum: float,
+     dampening: float
+ ):
+     """Validate optimizer arguments
+
+     Args:
+         lr: Learning rate
+         method: Step size method
+         restart_condition: Restart condition
+         momentum: Momentum parameter
+         dampening: Dampening parameter
+
+     Raises:
+         ValueError: If any argument is invalid
+     """
+     if lr <= 0.0:
+         raise ValueError(f"Learning rate must be positive: {lr}")
+     valid_methods = ['bb1', 'bb2', 'alternating']
+     if method not in valid_methods:
+         raise ValueError(f"Unsupported step size calculation method: {method}. "
+                          f"Must be one of {valid_methods}")
+     valid_restart_conditions = ['gradient', 'angle', 'both']
+     if restart_condition not in valid_restart_conditions:
+         raise ValueError(f"Unsupported restart condition: {restart_condition}. "
+                          f"Must be one of {valid_restart_conditions}")
+     if momentum < 0.0:
+         raise ValueError(f"Momentum must be non-negative: {momentum}")
+     if dampening < 0.0:
+         raise ValueError(f"Dampening must be non-negative: {dampening}")
+
+ def compute_dot_products(s: torch.Tensor, y: torch.Tensor) -> tuple:
+     """Compute dot products for BB step size calculation"""
+     s_dot_s = torch.sum(s * s)
+     s_dot_y = torch.sum(s * y)
+     y_dot_y = torch.sum(y * y)
+     return s_dot_s, s_dot_y, y_dot_y
barbor-1.0.0/barbor.egg-info/PKG-INFO ADDED
@@ -0,0 +1,56 @@
+ Metadata-Version: 2.1
+ Name: barbor
+ Version: 1.0.0
+ Summary: The gradient optimization library with the Barzilai-Borwein method.
+ Home-page: https://github.com/linjing-lab/barbor
+ Download-URL: https://github.com/linjing-lab/barbor/tags
+ Author: 林景
+ Author-email: linjing010729@163.com
+ License: MIT
+ Project-URL: Source, https://github.com/linjing-lab/barbor/tree/main/barbor/
+ Project-URL: Tracker, https://github.com/linjing-lab/barbor/issues
+ Platform: UNKNOWN
+ Classifier: Development Status :: 5 - Production/Stable
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: Information Technology
+ Classifier: Intended Audience :: Science/Research
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Topic :: Scientific/Engineering
+ Classifier: Topic :: Scientific/Engineering :: Mathematics
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Classifier: Topic :: Software Development
+ Classifier: Topic :: Software Development :: Libraries
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+
+ # barbor
+
+ The gradient optimization library with the Barzilai-Borwein method.
+
+ ## description
+
+ This PyTorch implementation of the Barzilai-Borwein (BB) gradient descent optimizer extends standard first-order methods with an adaptive step size that approximates second-order curvature information without explicitly computing the Hessian, addressing a fundamental limitation of fixed-learning-rate gradient descent.
+
+ The implementation provides two complementary step size formulas: BB1 (α = s·s / s·y) and BB2 (α = s·y / y·y), where s is the change in parameters and y is the change in gradients between iterations. These formulas capture local curvature, letting the optimizer adjust its step size to the problem geometry. The default alternating strategy switches between the two variants to exploit their complementary strengths: BB1 tends to be more stable, while BB2 can converge faster.
+
+ A key feature is the adaptive restart mechanism, which guards against divergence in non-convex landscapes. The code implements three restart conditions: gradient orthogonality (when s and y become nearly orthogonal), negative gradient correlation (when consecutive gradients point in opposite directions), or a combination of both. When progress stalls, the optimizer resets to the initial learning rate, escaping regions where the curvature estimate is unreliable.
+
+ The implementation also integrates momentum support (both standard and Nesterov variants) with the BB framework, combining momentum's acceleration with BB's curvature awareness. Numerical safeguards, including regularization parameters, step size clamping, and division-by-zero protection, keep the method robust across diverse optimization landscapes.
+
+ Beyond the core algorithm, the optimizer exposes diagnostic tools for monitoring convergence behavior, including step size tracking, gradient correlation metrics, and convergence statistics, so users can observe the adaptive behavior and tune it with confidence.
+
+ Together, curvature-aware step sizing, restart conditions, momentum integration, and careful numerical handling make this implementation particularly useful for non-convex problems where traditional methods struggle with learning rate selection and convergence stability.
+
+ ## install barbor
+
+ ```bash
+ pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu121
+ pip install barbor
+ ```
+
barbor-1.0.0/barbor.egg-info/SOURCES.txt ADDED
@@ -0,0 +1,14 @@
+ LICENSE
+ README.md
+ setup.py
+ barbor/__init__.py
+ barbor/momentum.py
+ barbor/optimizer.py
+ barbor/restart.py
+ barbor/stepsize.py
+ barbor/utils.py
+ barbor.egg-info/PKG-INFO
+ barbor.egg-info/SOURCES.txt
+ barbor.egg-info/dependency_links.txt
+ barbor.egg-info/not-zip-safe
+ barbor.egg-info/top_level.txt
barbor-1.0.0/barbor.egg-info/dependency_links.txt ADDED
@@ -0,0 +1 @@
+
barbor-1.0.0/barbor.egg-info/top_level.txt ADDED
@@ -0,0 +1 @@
+ barbor
barbor-1.0.0/setup.cfg ADDED
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
barbor-1.0.0/setup.py ADDED
@@ -0,0 +1,47 @@
+ from setuptools import setup
+ from barbor import __version__
+
+ try:
+     with open('README.md', 'r', encoding='utf-8') as fp:
+         _long_description = fp.read()
+ except FileNotFoundError:
+     _long_description = ''
+
+ setup(
+     name='barbor', # pkg_name
+     packages=['barbor',],
+     version=__version__, # version number
+     description="The gradient optimization library with the Barzilai-Borwein method.",
+     author='林景',
+     author_email='linjing010729@163.com',
+     license='MIT',
+     url='https://github.com/linjing-lab/barbor',
+     download_url='https://github.com/linjing-lab/barbor/tags',
+     long_description=_long_description,
+     long_description_content_type='text/markdown',
+     include_package_data=True,
+     zip_safe=False,
+     setup_requires=['setuptools>=18.0', 'wheel'],
+     project_urls={
+         'Source': 'https://github.com/linjing-lab/barbor/tree/main/barbor/',
+         'Tracker': 'https://github.com/linjing-lab/barbor/issues',
+     },
+     classifiers=[
+         'Development Status :: 5 - Production/Stable',
+         'Intended Audience :: Developers',
+         'Intended Audience :: Information Technology',
+         'Intended Audience :: Science/Research',
+         'Programming Language :: Python :: 3.8',
+         'Programming Language :: Python :: 3.9',
+         'Programming Language :: Python :: 3.10',
+         'Programming Language :: Python :: 3.11',
+         'Programming Language :: Python :: 3.12',
+         'License :: OSI Approved :: MIT License',
+         'Topic :: Scientific/Engineering',
+         'Topic :: Scientific/Engineering :: Mathematics',
+         'Topic :: Scientific/Engineering :: Artificial Intelligence',
+         'Topic :: Software Development',
+         'Topic :: Software Development :: Libraries',
+         'Topic :: Software Development :: Libraries :: Python Modules',
+     ],
+ )