growl-reg 0.1.0__tar.gz
- growl_reg-0.1.0/LICENSE +21 -0
- growl_reg-0.1.0/MANIFEST.in +2 -0
- growl_reg-0.1.0/PKG-INFO +306 -0
- growl_reg-0.1.0/README.md +290 -0
- growl_reg-0.1.0/examples/growl_example.py +167 -0
- growl_reg-0.1.0/growl/__init__.py +3 -0
- growl_reg-0.1.0/growl/base.py +118 -0
- growl_reg-0.1.0/growl/fista_solver.py +211 -0
- growl_reg-0.1.0/growl/prox_operator.py +105 -0
- growl_reg-0.1.0/growl_reg.egg-info/PKG-INFO +306 -0
- growl_reg-0.1.0/growl_reg.egg-info/SOURCES.txt +15 -0
- growl_reg-0.1.0/growl_reg.egg-info/dependency_links.txt +1 -0
- growl_reg-0.1.0/growl_reg.egg-info/requires.txt +3 -0
- growl_reg-0.1.0/growl_reg.egg-info/top_level.txt +1 -0
- growl_reg-0.1.0/pyproject.toml +23 -0
- growl_reg-0.1.0/setup.cfg +4 -0
- growl_reg-0.1.0/setup.py +3 -0
growl_reg-0.1.0/LICENSE
ADDED
MIT License

Copyright (c) 2025 Matheus Lopes Carrijo

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

growl_reg-0.1.0/PKG-INFO
ADDED
Metadata-Version: 2.4
Name: growl_reg
Version: 0.1.0
Summary: GrOWL regression estimator with OWL, OSCAR, and Lasso variants
Author: Matheus Lopes Carrijo
License-Expression: MIT
Project-URL: Homepage, https://github.com/matheuscarrijo/growl_reg
Project-URL: Repository, https://github.com/matheuscarrijo/growl_reg
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21.0
Requires-Dist: scikit-learn>=1.0
Requires-Dist: matplotlib>=3.0.0
Dynamic: license-file

[](https://pypi.org/project/growl-reg/)
[](LICENSE)
[](https://www.python.org/)

# 🧮 Group Ordered Weighted $\ell_1$ (GrOWL) Norm

This repository provides a Python implementation of regression regularized by the
**Group Ordered Weighted $\ell_1$ (GrOWL) Norm**, fitted with a proximal gradient
method accelerated by the **Fast Iterative Shrinkage-Thresholding Algorithm (FISTA)**.
It solves the following general optimization problem:

$$
\min_{B} \frac{1}{2n} \lVert Y - XB \rVert_F^2 +
\sum_i w_i \space \lVert \beta\_{[i], \cdot}\rVert_2 \space, \quad \quad \quad (1)
$$

where
- $X \in \mathbb{R}^{n \times r}$ is the design matrix,
- $Y \in \mathbb{R}^{n \times p}$ is the matrix of response variables,
- $B \in \mathbb{R}^{r \times p}$ is the coefficient matrix to be estimated,
- $\beta\_{[i], \cdot}$ denotes the row of $B$ with the $i$-th largest
$\ell_2$-norm, and
- $w \in \mathbb{R}^r$ is a vector of non-negative, non-increasing weights.
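
To make the notation in (1) concrete, the objective can be evaluated directly with NumPy. This is a sketch for checking the definitions, not part of the package; `growl_penalty` and `growl_objective` are hypothetical helper names.

```python
import numpy as np


def growl_penalty(B: np.ndarray, w: np.ndarray) -> float:
    """GrOWL penalty: sum_i w_i * ||beta_[i],.||_2 with row norms sorted descending."""
    row_norms = np.linalg.norm(B, axis=1)      # l2-norm of each row of B (length r)
    sorted_norms = np.sort(row_norms)[::-1]    # largest row norm first
    return float(np.dot(w, sorted_norms))      # largest weight pairs with largest norm


def growl_objective(X: np.ndarray, Y: np.ndarray, B: np.ndarray, w: np.ndarray) -> float:
    """Objective (1): (1/2n) ||Y - XB||_F^2 + GrOWL penalty."""
    n = X.shape[0]
    residual = Y - X @ B
    return float(np.sum(residual ** 2) / (2 * n) + growl_penalty(B, w))


B = np.array([[3.0, 4.0], [0.0, 1.0], [1.0, 0.0]])  # row norms 5, 1, 1
w = np.array([2.0, 1.0, 0.0])                        # non-negative, non-increasing
print(growl_penalty(B, w))  # 2*5 + 1*1 + 0*1 = 11.0
```

Because the row norms are sorted before weighting, permuting the rows of $B$ leaves the penalty unchanged.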

This regularization problem was introduced by Oswal et al. (2016). It is the
multi-task ($p > 1$) version of the standard ($p=1$) Ordered Weighted
$\ell_1$ (OWL) Norm, introduced independently by Zeng and Figueiredo (2014a)
and Bogdan et al. (2013).

Because the GrOWL penalty is non-smooth, this problem has no closed-form
solution. However, the objective function is convex, so efficient proximal
optimization algorithms can compute the solution reliably. Specifically, we use
the proximal gradient method with the **Fast Iterative Shrinkage-Thresholding
Algorithm (FISTA)** of Beck and Teboulle (2009). Readers who are not familiar
with proximal algorithms are referred to Parikh and Boyd (2013).

---

---

## Mathematical Background

<!-- The standard ($p = 1$) Ordered Weighted $\ell_1$ (OWL) regularization
problem can be written as

$$
\min_{\beta} \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \sum_i w_i |\beta\_{[i]}|,
$$

where $w$ is as before but now $p=1$, so that $y := Y \in
\mathbb{R}^{n \times 1}$ and $\beta := B \in \mathbb{R}^{r \times 1}$,
with $\beta\_{[i]}$ being the component of $\beta$ with the $i$-th largest
absolute value. -->

Since the penalty term in (1) is non-smooth, the problem is solved with a
proximal algorithm, which requires the proximal operator of the GrOWL norm:

$$
\mathrm{prox}_G(V) = \mathrm{arg\,min}_B \space \frac{1}{2} \lVert B - V \rVert_F^2 +
\sum_i w_i \space \lVert \beta\_{[i], \cdot} \rVert_2.
$$

The proximal operator of GrOWL can be computed from the proximal operator of
the standard OWL norm ($p=1$), denoted by $\mathrm{prox}\_{\Omega_w}$, using the
following result:

---

**Theorem 4 from Oswal et al. (2016).**
Let $\tilde{v}_i = \lVert \mathbf{v}\_{i,\cdot}\rVert_2$ for $i = 1, \ldots, r$, where
$\mathbf{v}\_{i,\cdot}$ is the $i$-th row of $V \in \mathbb{R}^{r \times p}$. Then
$\mathrm{prox}_G(V) = \hat{V}$, where the $i$-th row of $\hat{V}$ is given by

$$
\hat{\mathbf{v}}\_{i,\cdot} =
\left(\mathrm{prox}\_{\Omega_w}(\tilde{\mathbf{v}}) \right)_i \times
\frac{\mathbf{v}\_{i,\cdot}}{\lVert \mathbf{v}\_{i,\cdot} \rVert_2},
$$

with $\hat{\mathbf{v}}\_{i,\cdot} = 0$ whenever $\mathbf{v}\_{i,\cdot} = 0$.

---
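
Theorem 4 turns the GrOWL prox into an OWL prox applied to the vector of row norms, followed by a row-wise rescaling. The sketch below shows that structure with NumPy. The names `prox_growl_from_owl` and `prox_owl_equal_weights` are hypothetical, and the OWL prox passed in covers only the equal-weights special case ($w_1 = \cdots = w_r$), where $\mathrm{prox}\_{\Omega_w}$ reduces to soft-thresholding; the general case is the formula of Zeng and Figueiredo (2014b) discussed below.

```python
from typing import Callable

import numpy as np


def prox_growl_from_owl(
    V: np.ndarray,
    w: np.ndarray,
    prox_owl: Callable[[np.ndarray, np.ndarray], np.ndarray],
) -> np.ndarray:
    """Theorem 4: apply the OWL prox to the row norms of V, then rescale each row."""
    row_norms = np.linalg.norm(V, axis=1)           # tilde v_i = ||v_{i,.}||_2
    new_norms = prox_owl(row_norms, w)              # (prox_{Omega_w}(tilde v))_i
    # new_norm_i / ||v_{i,.}||, with zero rows mapped to zero (avoids 0 / 0)
    scale = np.divide(new_norms, row_norms,
                      out=np.zeros_like(row_norms), where=row_norms > 0)
    return V * scale[:, None]


def prox_owl_equal_weights(v: np.ndarray, w: np.ndarray) -> np.ndarray:
    """OWL prox for the special case w_1 = ... = w_r = lam: soft-thresholding."""
    lam = w[0]
    if not np.allclose(w, lam):
        raise ValueError("this shortcut only holds for equal weights")
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


V = np.array([[3.0, 4.0], [0.6, 0.8], [0.0, 0.0]])  # row norms 5, 1, 0
print(prox_growl_from_owl(V, np.full(3, 1.0), prox_owl_equal_weights))
```

Here the first row keeps its direction while its norm shrinks from 5 to 4, and the second row (norm 1) is set to zero. With equal weights this is row-wise group soft-thresholding, the proximal operator of the multi-task (group) Lasso penalty.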

The formulation of $\mathrm{prox}\_{\Omega_w}$ is given in equation (24) of
Zeng and Figueiredo (2014b):

$$
\mathrm{prox}\_{\Omega_w}(\mathbf{\tilde{v}}) =
\mathrm{sign}(\mathbf{\tilde{v}}) \odot \left( \mathbf{P}(|\mathbf{\tilde{v}}|)^T
\mathrm{proj}\_{\mathbb{R}\_+^r} \left( \mathrm{proj}\_{\mathcal{K}\_m}
(|\mathbf{\tilde{v}}|\_{\downarrow} - \mathbf{w}) \right) \right),
$$

where
- $\mathrm{sign}(\mathbf{\tilde{v}})$ denotes the elementwise sign
of vector $\mathbf{\tilde{v}}$ (in Theorem 4 the entries of $\mathbf{\tilde{v}}$
are row norms and hence non-negative).
- $\odot$ is the Hadamard (elementwise) product.
- $\mathbf{P}(|\mathbf{\tilde{v}}|)$ is the permutation matrix that
sorts the absolute values $|\mathbf{\tilde{v}}|$ in non-increasing order,
i.e., $|\mathbf{\tilde{v}}|\_{\downarrow} = \mathbf{P}(|\mathbf{\tilde{v}}|)
|\mathbf{\tilde{v}}|$.
- $\mathrm{proj}\_{\mathcal{K}_m}$ is the Euclidean projection
onto the monotone cone $\mathcal{K}\_m =$ {$\mathbf{x}
\in \mathbb{R}^r : x_1 \geq x_2 \geq \cdots \geq x_r$},
implemented using the Pool Adjacent Violators (PAV) algorithm.
- $\mathrm{proj}\_{\mathbb{R}_+^r}$ is the Euclidean projection
onto the nonnegative orthant, i.e., it replaces negative values by zero
(clipping).
- $\mathbf{w}$ is a weight vector satisfying $w_1 \geq w_2 \geq
\cdots \geq w_r \geq 0$.
- $|\mathbf{\tilde{v}}|\_{\downarrow}$ denotes the absolute values of
$\mathbf{\tilde{v}}$ sorted in non-increasing order.
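
A direct NumPy transcription of this formula, with a from-scratch PAV for $\mathrm{proj}\_{\mathcal{K}\_m}$, is sketched below. It illustrates the formula only and is not the package's `prox_owl()`; the helper `pav_nonincreasing` is a hypothetical name.

```python
import numpy as np


def pav_nonincreasing(y: np.ndarray) -> np.ndarray:
    """Euclidean projection of y onto K_m = {x : x_1 >= x_2 >= ... >= x_r} (PAV)."""
    sums, counts = [], []           # one entry per pooled block
    for value in y:
        sums.append(float(value))
        counts.append(1)
        # pool while the newest block's mean exceeds the previous one (order violated)
        while len(sums) > 1 and sums[-2] / counts[-2] < sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    means = np.array(sums) / np.array(counts)
    return np.repeat(means, counts)  # every member of a block takes the block mean


def prox_owl(v: np.ndarray, w: np.ndarray) -> np.ndarray:
    """OWL prox following eq. (24) of Zeng and Figueiredo (2014b)."""
    abs_v = np.abs(v)
    order = np.argsort(-abs_v, kind="stable")  # P(|v|): sort |v| in non-increasing order
    z = pav_nonincreasing(abs_v[order] - w)    # proj onto K_m of |v|_down - w
    z = np.maximum(z, 0.0)                     # proj onto the nonnegative orthant
    out = np.empty_like(z)
    out[order] = z                             # apply P^T to undo the sort
    return np.sign(v) * out                    # restore the signs


print(prox_owl(np.array([1.0, 3.0]), np.array([2.5, 0.0])))  # [0.75 0.75]
```

In this call the two entries collapse to the common value 0.75 because PAV pooled them, which is the clustering effect that distinguishes OWL weights from the plain Lasso.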

We use **FISTA** (Beck and Teboulle, 2009), an accelerated first-order
method designed for problems of the form

$$
\min_{B} f(B) + g(B),
$$

where $f$ is convex and differentiable with a Lipschitz continuous gradient,
and $g$ is convex (possibly non-smooth) with a proximal operator that can be
computed efficiently.

In our case:
- $f(B) := \frac{1}{2n} \lVert Y - XB \rVert_F^2$ is the smooth loss,
- $g(B) := \sum_i w_i \space \lVert \beta\_{[i], \cdot}\rVert_2$ is the GrOWL
penalty.

FISTA alternates between gradient steps on $f$ and proximal steps on $g$,
with a Nesterov-type momentum update to accelerate convergence. Each
iteration consists of:

1. **Gradient step:**

$$
V^{(k)} = Z^{(k)} - \frac{1}{L} \nabla f(Z^{(k)}), \qquad
\nabla f(Z) = -\frac{1}{n} X^T (Y - XZ),
$$

where $L$ is the Lipschitz constant of $\nabla f$, computed as
$L = \lVert X \rVert_2^2/n$, where $\lVert X \rVert_2$ denotes the spectral norm
(largest singular value) of the matrix $X$.

2. **Proximal step using the GrOWL operator:**

$$
B^{(k+1)} = \mathrm{prox}\_{G/L} (V^{(k)}),
$$

where $\mathrm{prox}\_{G/L}$ is the GrOWL proximal operator with the weights $w$
replaced by $w/L$, which accounts for the step size $1/L$. It is computed as
described earlier, using the Pool Adjacent Violators (PAV) algorithm for
isotonic regression and then restoring the original signs and order.

3. **Nesterov momentum step:**

$$
t_{k+1} = \frac{1}{2} \left(1 + \sqrt{1 + 4t_k^2} \right), \quad
Z^{(k+1)} = B^{(k+1)} + \left( \frac{t_k - 1}{t_{k+1}} \right) (B^{(k+1)} - B^{(k)}),
$$

with the standard initialization $Z^{(0)} = B^{(0)}$ and $t_0 = 1$.
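
The gradient and step size in step 1 follow from the least-squares loss: $\nabla f(B) = -\frac{1}{n} X^T (Y - XB)$ and $L = \lVert X \rVert_2^2 / n$. A minimal NumPy sketch, with hypothetical helper names, computes both and takes one gradient step on random data:

```python
import numpy as np


def grad_f(X: np.ndarray, Y: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Gradient of f(B) = (1/2n) ||Y - XB||_F^2, i.e. -(1/n) X^T (Y - XB)."""
    n = X.shape[0]
    return -X.T @ (Y - X @ B) / n


def lipschitz_constant(X: np.ndarray) -> float:
    """L = ||X||_2^2 / n, with ||X||_2 the spectral norm (largest singular value)."""
    n = X.shape[0]
    return float(np.linalg.norm(X, 2) ** 2 / n)


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4))   # n = 20 samples, r = 4 predictors
Y = rng.standard_normal((20, 3))   # p = 3 responses
B = rng.standard_normal((4, 3))
L = lipschitz_constant(X)
V = B - grad_f(X, Y, B) / L        # gradient step of FISTA with step size 1/L
```

`np.linalg.norm(X, 2)` returns the largest singular value of a 2-D array, which is the spectral norm used in the formula for $L$.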

The algorithm continues until convergence is detected, based on one of three
user-defined stopping criteria:
- Absolute change in objective value,
- Relative change in objective value,
- Frobenius norm of the difference between successive iterates.
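
Putting the three steps together, a compact NumPy sketch of the FISTA loop follows. It is not the package's `growl_fista()`: the function name `fista_growl_sketch` is hypothetical, it assumes the start $B^{(0)} = 0$, uses only the relative-change stopping rule, and re-implements the GrOWL prox inline (via PAV) so the block runs on its own.

```python
import numpy as np


def _pav_nonincreasing(y):
    """Projection onto {x : x_1 >= ... >= x_r} by Pool Adjacent Violators."""
    sums, counts = [], []
    for value in y:
        sums.append(float(value))
        counts.append(1)
        while len(sums) > 1 and sums[-2] / counts[-2] < sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()  # merge a violating pair of blocks
            sums[-1] += s
            counts[-1] += c
    return np.repeat(np.array(sums) / np.array(counts), counts)


def _prox_growl(V, w):
    """GrOWL prox (Theorem 4): OWL prox of the row norms, then rescale the rows."""
    norms = np.linalg.norm(V, axis=1)
    order = np.argsort(-norms, kind="stable")      # sort row norms, largest first
    shrunk = np.empty_like(norms)
    shrunk[order] = np.maximum(_pav_nonincreasing(norms[order] - w), 0.0)
    scale = np.divide(shrunk, norms, out=np.zeros_like(norms), where=norms > 0)
    return V * scale[:, None]


def fista_growl_sketch(X, Y, w, max_iter=1000, tol=1e-8):
    """Minimize (1/2n)||Y - XB||_F^2 + sum_i w_i ||beta_[i],.||_2 with FISTA.

    Stops when the relative change in the objective falls below `tol`.
    """
    n, r = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n              # Lipschitz constant of grad f

    def objective(M):
        penalty = np.dot(w, np.sort(np.linalg.norm(M, axis=1))[::-1])
        return np.sum((Y - X @ M) ** 2) / (2 * n) + penalty

    B = np.zeros((r, Y.shape[1]))                  # B^(0) = 0 (an assumption)
    Z, t = B.copy(), 1.0                           # Z^(0) = B^(0), t_0 = 1
    costs = [objective(B)]
    for _ in range(max_iter):
        grad = -X.T @ (Y - X @ Z) / n              # 1. gradient of f at Z
        B_new = _prox_growl(Z - grad / L, w / L)   # 2. prox step with weights w / L
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t ** 2))
        Z = B_new + ((t - 1.0) / t_new) * (B_new - B)  # 3. momentum step
        B, t = B_new, t_new
        costs.append(objective(B))
        if abs(costs[-2] - costs[-1]) <= tol * max(abs(costs[-2]), 1e-12):
            break
    return B, costs


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
B_true = np.zeros((6, 2))
B_true[:2] = 1.0                                   # two active rows
Y = X @ B_true + 0.1 * rng.standard_normal((100, 2))
B_hat, costs = fista_growl_sketch(X, Y, w=np.linspace(0.2, 0.05, 6))
```

Because FISTA is not a monotone method, the objective can tick upward between iterations; a relative-change rule on the objective is therefore a heuristic, which is one reason to offer the iterate-difference criterion as an alternative.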

This FISTA procedure is implemented in the function `growl_fista()` in the file
`fista_solver.py`. The function supports flexible weight vector definitions
(set manually or parameterized via `lambda_1`, `lambda_2`, and `ramp_size`) and
returns the estimated coefficient matrix along with the cost history.
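
The README does not spell out how `lambda_1`, `lambda_2`, and `ramp_size` map to $w$, so the sketch below only covers the two standard special cases named in the package summary, assuming they follow the usual definitions: Lasso-type constant weights $w_i = \lambda_1$, and OSCAR weights $w_i = \lambda_1 + \lambda_2 (r - i)$ as described by Zeng and Figueiredo (2014b). The function name `oscar_weights` is hypothetical, and `ramp_size` is deliberately not modeled.

```python
import numpy as np


def oscar_weights(r: int, lambda_1: float, lambda_2: float) -> np.ndarray:
    """OSCAR-type weights w_i = lambda_1 + lambda_2 * (r - i), for i = 1..r.

    lambda_2 = 0 gives constant (Lasso / group-Lasso) weights.
    The result is non-negative and non-increasing, as GrOWL requires.
    """
    if lambda_1 < 0 or lambda_2 < 0:
        raise ValueError("lambda_1 and lambda_2 must be non-negative")
    i = np.arange(1, r + 1)
    return lambda_1 + lambda_2 * (r - i)


w = oscar_weights(4, lambda_1=1.0, lambda_2=0.5)
print(w)  # [2.5 2.  1.5 1. ]
```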

The proximal operator evaluations are implemented in the functions `prox_owl()` and
`prox_growl()` in the file `prox_operator.py`.

---

---

## Repository Structure

Below are the main modules in this project and their roles:

1. **`__init__.py`**
This file is part of the `growl/` package and exposes the `GrowlRegressor` class
as the main interface for the package. It enables clean imports such as:

```python
from growl import GrowlRegressor
```

2. **`base.py`**
Contains the main class `GrowlRegressor`, a `scikit-learn` compatible estimator that
implements GrOWL regression. This class provides:

- `.fit(X, Y)` to estimate coefficients using the GrOWL penalty
- `.predict(X)` for in-sample or out-of-sample predictions
- Integration with `GridSearchCV`
- Optional centering of `X` and `Y` when `fit_intercept=True`
- Storage of the coefficient matrix `coef_` and optimization history `cost_history_`

3. **`prox_operator.py`**
Implements the proximal operators required for optimization:
- `prox_owl(v, w)`: evaluates the proximal operator of the OWL penalty.
- `prox_growl(V, w)`: evaluates the proximal operator of the GrOWL penalty.

4. **`fista_solver.py`**
Implements the FISTA-based optimization routine used to solve the GrOWL-regularized
least-squares problem. This module includes:

- `growl_fista(...)`: a solver using Nesterov's acceleration
- Weight vector construction based on `lambda_1`, `lambda_2`, and `ramp_size`
- Convergence monitoring based on cost, relative cost, or solution change
- Optional scaling of the objective function to improve numerical stability

5. **`growl_example.py`**
Located in the `examples/` folder, this script demonstrates the usage of
the `GrowlRegressor`:

- Grid search over hyperparameters (`lambda_1`, `lambda_2`, `ramp_size`)
- Visual comparisons between:
  - True vs estimated coefficients
  - GrOWL vs MultiTaskLasso (for pooled regression)
  - GrOWL (OWL style) vs Lasso (for standard regression)
- Plots showing grouping behavior and coefficient shrinkage

To run the example, use:
```bash
python examples/growl_example.py
```

---

---

## ⚙️ Setup

**Install the package from PyPI:**

```bash
pip install growl_reg
```

---

---

## References

Beck, A. and Teboulle, M. "A fast iterative shrinkage-thresholding algorithm
for linear inverse problems". _SIAM Journal on Imaging Sciences_, vol. 2, no. 1,
pp. 183–202, 2009.

Bogdan, M., van den Berg, E., Su, W. and Candès, E. "Statistical
estimation and testing via the sorted $\ell_1$ norm". arXiv preprint
[arXiv:1310.1969v2](https://arxiv.org/abs/1310.1969), 2013.

Oswal, U., Cox, C., Ralph, M. A. L., Rogers, T. and Nowak, R.
"Representational Similarity Learning with Application to Brain Networks".
_Proceedings of the 33rd International Conference on Machine Learning_,
New York, NY, USA. JMLR: W\&CP volume 48, 2016.

Parikh, N. and Boyd, S. "Proximal algorithms". _Foundations and Trends
in Optimization_, 1(3):123–231, 2013.

Zeng, X. and Figueiredo, M. "Decreasing weighted sorted $\ell_1$
regularization". arXiv preprint
[arXiv:1404.3184v1](https://arxiv.org/abs/1404.3184), 2014a.

Zeng, X. and Figueiredo, M. "The ordered weighted $\ell_1$ norm: atomic
formulation, projections, and algorithms". arXiv preprint
[arXiv:1409.4271v5](https://arxiv.org/abs/1409.4271), 2014b.

---

## Citation

If you use `growl_reg` in your work, please cite it as:

Matheus Lopes Carrijo. "GrOWL Regression Estimator (Python package)." 2025.
Available at: https://github.com/matheuscarrijo/growl_reg

Or use the following BibTeX entry:

```bibtex
@misc{carrijo2025growl,
  author = {Carrijo, M. L.},
  title = {GrOWL Regression Estimator (Python Package)},
  year = {2025},
  howpublished = {https://github.com/matheuscarrijo/growl_reg},
  note = {Version 0.1.0}
}
```