growl-reg 0.1.1__tar.gz

@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Matheus Lopes Carrijo

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,2 @@
include README.md
recursive-include examples *
@@ -0,0 +1,307 @@
Metadata-Version: 2.4
Name: growl_reg
Version: 0.1.1
Summary: GrOWL regression estimator with OWL, OSCAR, and Lasso variants
Author: Matheus Lopes Carrijo
License-Expression: MIT
Project-URL: Homepage, https://github.com/matheuscarrijo/growl_reg
Project-URL: Repository, https://github.com/matheuscarrijo/growl_reg
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21.0
Requires-Dist: scikit-learn>=1.0
Requires-Dist: matplotlib>=3.0.0
Dynamic: license-file

[![PyPI version](https://img.shields.io/pypi/v/growl-reg)](https://pypi.org/project/growl-reg/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python >=3.8](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/)

# 🧮 Group Ordered Weighted $\ell_1$ (GrOWL) Norm

This repository provides a Python implementation of regression regularized by the
**Group Ordered Weighted $\ell_1$ (GrOWL) norm**. The estimator is computed with a
**proximal gradient method** accelerated by the **Fast Iterative Shrinkage-Thresholding
Algorithm (FISTA)**. It solves the following general optimization problem:

$$
\min_{B} \frac{1}{2n} \lVert Y - XB \rVert_F^2 +
\sum_i w_i \space \lVert \beta\_{[i], \cdot}\rVert_2 \space, \quad \quad \quad (1)
$$

where
- $X \in \mathbb{R}^{n \times r}$ is the design matrix,
- $Y \in \mathbb{R}^{n \times p}$ is the matrix of response variables,
- $B \in \mathbb{R}^{r \times p}$ is the coefficient matrix to be estimated,
- $\beta\_{[i], \cdot}$ denotes the $i$-th largest row of $B$ in terms of
its $\ell_2$-norm, and
- $w \in \mathbb{R}^r$ is a vector of non-negative, non-increasing weights.

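As a quick numerical reference, the sketch below evaluates objective (1) with NumPy. The helper names `growl_penalty` and `growl_objective` are hypothetical and are not part of the package API. The package's own cost computation (for example, inside `growl_fista()`) may apply a different scaling.

```python
import numpy as np


def growl_penalty(B: np.ndarray, w: np.ndarray) -> float:
    """GrOWL penalty: sum_i w_i * ||beta_[i],.||_2, with row norms sorted in non-increasing order."""
    row_norms = np.linalg.norm(B, axis=1)    # l2-norm of each row of B
    sorted_norms = np.sort(row_norms)[::-1]  # largest row norm first
    return float(np.dot(w, sorted_norms))


def growl_objective(X: np.ndarray, Y: np.ndarray, B: np.ndarray, w: np.ndarray) -> float:
    """Objective (1): squared Frobenius loss scaled by 1/(2n), plus the GrOWL penalty."""
    n = X.shape[0]
    residual = Y - X @ B
    return float(0.5 / n * np.sum(residual ** 2) + growl_penalty(B, w))


B = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])  # row norms: 5, 0, 1
w = np.array([2.0, 1.0, 0.5])                       # non-negative, non-increasing
print(growl_penalty(B, w))                          # 2*5 + 1*1 + 0.5*0 = 11.0
```

The largest weight multiplies the largest row norm. This pairing is what distinguishes GrOWL from a plain weighted group-lasso penalty with fixed per-row weights.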

This regularization problem was introduced by Oswal et al. (2016). It is a
multi-task ($p > 1$) version of the standard ($p=1$) Ordered Weighted
$\ell_1$ (OWL) norm, introduced independently by Zeng and Figueiredo (2014a)
and Bogdan et al. (2013).

Because the GrOWL penalty is non-smooth, problem (1) has no closed-form solution.
However, the objective is convex (the weights are non-negative and non-increasing),
so it can be solved reliably with proximal optimization algorithms. Specifically,
we use the proximal gradient method accelerated with the **Fast Iterative
Shrinkage-Thresholding Algorithm (FISTA)** of Beck and Teboulle (2009). Readers
unfamiliar with proximal algorithms are referred to Parikh and Boyd (2013).

---

---

## 📐 Mathematical Background

<!-- The standard ($p = 1$) Ordered Weighted $\ell_1$ (OWL) regularization
problem can be written as

$$
\min_{\beta} \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \sum_i w_i |\beta\_{[i]}|,
$$

where $w$ is as before but now $p=1$, so that $y := Y \in
\mathbb{R}^{n \times 1}$ and $\beta := B \in \mathbb{R}^{r \times 1}$,
with $\beta\_{[i]}$ being the component of $\beta$ with the $i$-th largest absolute value. -->

Since the penalty term in (1) is non-smooth, we solve the problem with a proximal
algorithm. Its key ingredient is the proximal operator of the GrOWL norm, defined
for $V \in \mathbb{R}^{r \times p}$ as

$$
\mathrm{prox}_G(V) = \mathrm{arg\,min}_B \space \frac{1}{2} \lVert B - V \rVert_F^2 +
\sum_i w_i \space \lVert \beta\_{[i], \cdot} \rVert_2.
$$

The GrOWL proximal operator can be expressed in terms of the proximal operator of
the standard OWL norm ($p=1$), denoted by $\mathrm{prox}\_{\Omega_w}$, through the
following result:

---

**Theorem 4 from Oswal et al. (2016).**
Let $\tilde{v}_i = \lVert v\_{i,\cdot}\rVert_2$ for $i = 1, \dots, r$. Then
$\mathrm{prox}_G(V) = \hat{V}$, where the $i$-th row of $\hat{V}$ is given by

$$
\hat{\mathbf{v}}\_{i,\cdot} =
\left(\mathrm{prox}\_{\Omega_w}(\tilde{\mathbf{v}}) \right)_i \times
\frac{\mathbf{v}\_{i,\cdot}}{\lVert \mathbf{v}\_{i,\cdot} \rVert_2},
$$

with the convention that a zero row of $V$ is mapped to a zero row of $\hat{V}$.

---

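As a sanity check on Theorem 4, consider the special case of constant weights, $w_1 = \cdots = w_r = \lambda \geq 0$. The OWL norm then reduces to $\lambda \lVert \cdot \rVert_1$, whose proximal operator is elementwise soft-thresholding. Because the entries $\tilde{v}_i$ are non-negative, the theorem recovers the familiar row-wise (group-lasso) shrinkage for every nonzero row:

$$
\left(\mathrm{prox}_{\Omega_w}(\tilde{\mathbf{v}})\right)_i = \max(\tilde{v}_i - \lambda, 0)
\quad \Longrightarrow \quad
\hat{\mathbf{v}}_{i,\cdot} = \max\left(1 - \frac{\lambda}{\lVert \mathbf{v}_{i,\cdot} \rVert_2}, 0\right) \mathbf{v}_{i,\cdot}.
$$

With strictly decreasing weights, the shrinkage applied to each row instead depends on its rank among the row norms.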

The formulation of $\mathrm{prox}\_{\Omega_w}$ is given in equation (24) of
Zeng and Figueiredo (2014b):

$$
\mathrm{prox}\_{\Omega_w}(\mathbf{\tilde{v}}) =
\mathrm{sign}(\mathbf{\tilde{v}}) \odot \left( \mathbf{P}(|\mathbf{\tilde{v}}|)^T
\mathrm{proj}\_{\mathbb{R}\_+^r} \left( \mathrm{proj}\_{\mathcal{K}\_m}
(|\mathbf{\tilde{v}}|\_{\downarrow} - \mathbf{w}) \right) \right),
$$

where
- $\mathrm{sign}(\mathbf{\tilde{v}})$ denotes the elementwise sign
of vector $\mathbf{\tilde{v}}$.
- $\odot$ is the Hadamard (elementwise) product.
- $\mathbf{P}(|\mathbf{\tilde{v}}|)$ is the permutation matrix that
sorts the absolute values $|\mathbf{\tilde{v}}|$ in non-increasing order,
i.e., $|\mathbf{\tilde{v}}|\_{\downarrow} = \mathbf{P}(|\mathbf{\tilde{v}}|)
|\mathbf{\tilde{v}}|$.
- $\mathrm{proj}\_{\mathcal{K}_m}$ is the Euclidean projection
onto the monotone cone $\mathcal{K}\_m =$ {$\mathbf{x}
\in \mathbb{R}^r : x_1 \geq x_2 \geq \cdots \geq x_r$},
implemented using the Pool Adjacent Violators (PAV) algorithm.
- $\mathrm{proj}\_{\mathbb{R}_+^r}$ is the Euclidean projection
onto the non-negative orthant, i.e., it replaces negative values by zero
(clipping).
- $\mathbf{w}$ is a weight vector satisfying $w_1 \geq w_2 \geq
\cdots \geq w_r \geq 0$.
- $|\mathbf{\tilde{v}}|\_{\downarrow}$ denotes the absolute values of
$\mathbf{\tilde{v}}$ sorted in non-increasing order.

In Theorem 4, $\mathbf{\tilde{v}}$ collects row norms, so its entries are non-negative.
The sign factor therefore equals one, except for zero entries, which remain zero.


We use **FISTA** (Beck and Teboulle, 2009), an accelerated first-order
method for problems of the form

$$
\min_{B} f(B) + g(B),
$$

where $f$ is convex and differentiable with a Lipschitz continuous gradient,
and $g$ is convex (possibly non-smooth) with a proximal operator that can be
computed efficiently.

In our case:
- $f(B) := \frac{1}{2n} \lVert Y - XB \rVert_F^2$ is the smooth loss, and
- $g(B) := \sum_i w_i \space \lVert \beta\_{[i], \cdot}\rVert_2$ is the GrOWL
penalty.

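The gradient of the smooth loss and its Lipschitz constant, used in the steps that follow, come from a short, standard least-squares calculation. It relies on $\lVert X^\top X \rVert_2 = \lVert X \rVert_2^2$ and $\lVert A M \rVert_F \leq \lVert A \rVert_2 \lVert M \rVert_F$:

$$
\nabla f(B) = \frac{1}{n} X^\top (XB - Y), \qquad
\lVert \nabla f(B_1) - \nabla f(B_2) \rVert_F
= \frac{1}{n} \lVert X^\top X (B_1 - B_2) \rVert_F
\leq \frac{\lVert X \rVert_2^2}{n} \lVert B_1 - B_2 \rVert_F .
$$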

FISTA alternates gradient steps on $f$ with proximal steps on $g$, and adds a
Nesterov-type momentum update to accelerate convergence. With the standard
initialization $Z^{(0)} = B^{(0)}$ and $t_0 = 1$, each iteration consists of:

1. **Gradient step:**

$$
V^{(k)} = Z^{(k)} - \frac{1}{L} \nabla f(Z^{(k)}),
$$

where $L$ is the Lipschitz constant of $\nabla f$, computed as
$L = \lVert X \rVert_2^2/n$, and $\lVert X \rVert_2$ denotes the spectral norm of the matrix $X$.

2. **Proximal step using the GrOWL operator:**

$$
B^{(k+1)} = \mathrm{prox}_{G/L} (V^{(k)}),
$$

i.e., the GrOWL proximal operator defined above with the weights $\mathbf{w}$
replaced by $\mathbf{w}/L$, because the step size $1/L$ also scales the penalty.
It is evaluated as described earlier, using the Pool Adjacent Violators (PAV)
algorithm for isotonic regression and then restoring the original signs and order.

3. **Nesterov momentum step:**

$$
t_{k+1} = \frac{1}{2} \left(1 + \sqrt{1 + 4t_k^2} \right), \quad
Z^{(k+1)} = B^{(k+1)} + \left( \frac{t_k - 1}{t_{k+1}} \right) (B^{(k+1)} - B^{(k)}).
$$

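The three steps can be sketched as a generic NumPy loop that takes the proximal map as an argument. The sketch assumes a callable `prox(V, step)` that returns $\mathrm{prox}_{\text{step} \cdot g}(V)$. The names `fista_sketch` and `group_soft_threshold` are hypothetical and do not reflect the signature of the package's `growl_fista()`. The demo proximal map is the constant-weight special case of GrOWL (row-wise group soft-thresholding). Any GrOWL proximal map with weights scaled by `step` could be plugged in instead. For brevity, the loop uses only the iterate-change stopping rule.

```python
import numpy as np
from typing import Callable

ProxFn = Callable[[np.ndarray, float], np.ndarray]


def fista_sketch(X: np.ndarray, Y: np.ndarray, prox: ProxFn,
                 max_iter: int = 500, tol: float = 1e-10) -> np.ndarray:
    """FISTA for min_B (1/2n)||Y - XB||_F^2 + g(B), where prox(V, step) = prox_{step*g}(V)."""
    n, r = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n        # Lipschitz constant: spectral norm squared over n
    B = np.zeros((r, Y.shape[1]))            # B^(0)
    Z, t = B.copy(), 1.0                     # Z^(0) = B^(0), t_0 = 1
    for _ in range(max_iter):
        grad = X.T @ (X @ Z - Y) / n         # gradient of the smooth loss at Z^(k)
        B_next = prox(Z - grad / L, 1.0 / L) # gradient step, then prox with penalty scaled by 1/L
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t ** 2))
        Z = B_next + ((t - 1.0) / t_next) * (B_next - B)  # Nesterov momentum
        # Stop when successive iterates barely change (Frobenius norm, relative to ||B||).
        converged = np.linalg.norm(B_next - B) <= tol * max(1.0, np.linalg.norm(B))
        B, t = B_next, t_next
        if converged:
            break
    return B


def group_soft_threshold(lam: float) -> ProxFn:
    """prox of step * lam * sum_i ||row_i||_2, i.e., GrOWL with all weights equal to lam."""
    def prox(V: np.ndarray, step: float) -> np.ndarray:
        norms = np.linalg.norm(V, axis=1, keepdims=True)
        # step*lam / ||row||; zero rows get an infinite ratio so that they stay zero.
        ratio = np.divide(step * lam, norms, out=np.full_like(norms, np.inf), where=norms > 0)
        return V * np.maximum(1.0 - ratio, 0.0)
    return prox
```

When $X^\top X / n = I$, the problem decouples: a single proximal step applied to $X^\top Y / n$ already yields the exact solution. This makes that case a convenient correctness check for any FISTA implementation.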

The algorithm stops once convergence is detected according to one of three
user-selected stopping criteria:
- Absolute change in the objective value,
- Relative change in the objective value, or
- Frobenius norm of the difference between successive iterates.

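The three stopping rules can be sketched as a single check. This assumes the caller tracks the previous and current objective values and iterates. The function name `has_converged` and the criterion labels (`"abs_cost"`, `"rel_cost"`, `"solution"`) are hypothetical. The option names and default tolerances used by `growl_fista()` are not documented here.

```python
import numpy as np


def has_converged(cost_prev: float, cost_curr: float,
                  B_prev: np.ndarray, B_curr: np.ndarray,
                  criterion: str = "rel_cost", tol: float = 1e-6) -> bool:
    """Return True when the selected stopping rule is met (illustrative labels only)."""
    if criterion == "abs_cost":
        # Absolute change in the objective value.
        return abs(cost_prev - cost_curr) <= tol
    if criterion == "rel_cost":
        # Relative change, guarded against division by a zero objective.
        return abs(cost_prev - cost_curr) <= tol * max(abs(cost_prev), np.finfo(float).tiny)
    if criterion == "solution":
        # Frobenius norm of the difference between successive iterates.
        return float(np.linalg.norm(B_curr - B_prev, ord="fro")) <= tol
    raise ValueError(f"unknown criterion: {criterion!r}")
```

The relative rule does not depend on the scale of the objective. The iterate-based rule is more informative when the objective is nearly flat around the solution.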

This FISTA procedure is implemented in the function `growl_fista()` in
`fista_solver.py`. The function accepts either a manually specified weight vector
or one parameterized by `lambda_1`, `lambda_2`, and `ramp_size`, and it returns the
estimated coefficient matrix together with the cost history.

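How `growl_fista()` maps `lambda_1`, `lambda_2`, and `ramp_size` to a weight vector is not spelled out here. As an illustration only, the sketch below builds the classic OSCAR weights, $w_i = \lambda_1 + \lambda_2 (r - i)$, since OSCAR is one of the variants named in the package summary. It then checks numerically that the resulting OWL norm equals OSCAR's pairwise form. The function names and the roles given to `lambda_1` and `lambda_2` are assumptions, and `ramp_size` is not modeled.

```python
import numpy as np


def oscar_weights(r: int, lambda_1: float, lambda_2: float) -> np.ndarray:
    """OSCAR weights in OWL form: w_i = lambda_1 + lambda_2 * (r - i), i = 1, ..., r."""
    i = np.arange(1, r + 1)
    return lambda_1 + lambda_2 * (r - i)


def owl_norm(beta: np.ndarray, w: np.ndarray) -> float:
    """OWL norm: sum_i w_i * |beta|_[i], with |beta| sorted in non-increasing order."""
    return float(np.dot(w, np.sort(np.abs(beta))[::-1]))


def oscar_pairwise(beta: np.ndarray, lambda_1: float, lambda_2: float) -> float:
    """OSCAR penalty: lambda_1 * ||beta||_1 + lambda_2 * sum_{j<k} max(|beta_j|, |beta_k|)."""
    a = np.abs(beta)
    pair_max = sum(max(a[j], a[k]) for j in range(len(a)) for k in range(j + 1, len(a)))
    return float(lambda_1 * a.sum() + lambda_2 * pair_max)


beta = np.array([0.5, -2.0, 1.0])
w = oscar_weights(3, lambda_1=1.0, lambda_2=0.5)           # [2.0, 1.5, 1.0]
print(owl_norm(beta, w), oscar_pairwise(beta, 1.0, 0.5))  # 6.0 6.0
```

Setting `lambda_2 = 0` gives constant weights, which recovers the Lasso (for $p = 1$) or group-lasso-style (for $p > 1$) penalty.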

The proximal operator evaluations are implemented in the functions `prox_owl()` and
`prox_growl()` in `prox_operator.py`.

---

---

## 🗂 Repository Structure

Below are the main modules in this project and what they do:

1. **`__init__.py`**
This file is part of the `growl/` package and exposes the `GrowlRegressor` class
as the main interface. It enables clean imports such as:

```python
from growl import GrowlRegressor
```

2. **`base.py`**
Contains the main class `GrowlRegressor`, a `scikit-learn`-compatible estimator that
implements GrOWL regression. This class provides:

- `.fit(X, Y)` to estimate coefficients using the GrOWL penalty
- `.predict(X)` for in-sample or out-of-sample predictions
- Integration with `GridSearchCV`
- Optional centering of `X` and `Y` when `fit_intercept=True`
- Storage of the coefficient matrix `coef_` and optimization history `cost_history_`

3. **`prox_operator.py`**
Implements the proximal operators required for optimization:
- `prox_owl(v, w)`: Evaluates the proximal operator of the OWL penalty.
- `prox_growl(V, w)`: Evaluates the proximal operator of the GrOWL penalty.

4. **`fista_solver.py`**
Implements the FISTA-based routine used to solve the GrOWL-regularized
least-squares problem. This module includes:

- `growl_fista(...)`: A solver using Nesterov's acceleration
- Weight vector construction based on `lambda_1`, `lambda_2`, and `ramp_size`
- Convergence monitoring based on cost, relative cost, or solution change
- Optional scaling of the objective function to improve numerical stability

5. **`growl_example.py`**
Located in the `examples/` folder, this script demonstrates the usage of
`GrowlRegressor`:

- Grid search over hyperparameters (`lambda_1`, `lambda_2`, `ramp_size`)
- Visual comparisons between:
  - True vs estimated coefficients
  - GrOWL vs MultiTaskLasso (for pooled regression)
  - GrOWL (OWL style) vs Lasso (for standard regression)
- Plots showing grouping behavior and coefficient shrinkage

To run the example, use:
```bash
python examples/growl_example.py
```

---

---

## ⚙️ Setup

**Install the package from PyPI:**

```bash
pip install growl_reg
```

---

---

## 📚 References

Beck, A. and Teboulle, M. (2009). "A fast iterative shrinkage-thresholding algorithm
for linear inverse problems". _SIAM Journal on Imaging Sciences_, 2(1), 183–202.

Bogdan, M., van den Berg, E., Su, W., and Candès, E. (2013). "Statistical
estimation and testing via the sorted $\ell_1$ norm". arXiv preprint
[arXiv:1310.1969v2](https://arxiv.org/abs/1310.1969).

Oswal, U., Cox, C., Ralph, M. A. L., Rogers, T., and Nowak, R. (2016).
"Representational similarity learning with application to brain networks".
_Proceedings of the 33rd International Conference on Machine Learning_,
New York, NY, USA. JMLR: W\&CP volume 48.

Parikh, N. and Boyd, S. (2013). "Proximal algorithms". _Foundations and Trends
in Optimization_, 1(3), 123–231.

Zeng, X. and Figueiredo, M. (2014a). "Decreasing weighted sorted $\ell_1$
regularization". arXiv preprint
[arXiv:1404.3184v1](https://arxiv.org/abs/1404.3184).

Zeng, X. and Figueiredo, M. (2014b). "The ordered weighted $\ell_1$ norm: atomic
formulation, projections, and algorithms". arXiv preprint
[arXiv:1409.4271v5](https://arxiv.org/abs/1409.4271).

---

## 📑 Citation

If you use `growl_reg` in your work, please cite it as:

Matheus Lopes Carrijo. "GrOWL Regression Estimator (Python package)." 2025.
Available at: https://github.com/matheuscarrijo/growl_reg

Or use the following BibTeX entry:

```bibtex
@misc{carrijo2025growl,
  author       = {Carrijo, M. L.},
  title        = {GrOWL Regression Estimator (Python Package)},
  year         = {2025},
  howpublished = {https://github.com/matheuscarrijo/growl_reg},
  note         = {Version 0.1.1}
}
```
