python-gls 0.1.0__tar.gz

Files changed (36)
  1. python_gls-0.1.0/LICENSE +21 -0
  2. python_gls-0.1.0/PKG-INFO +361 -0
  3. python_gls-0.1.0/README.md +335 -0
  4. python_gls-0.1.0/pyproject.toml +46 -0
  5. python_gls-0.1.0/python_gls/__init__.py +29 -0
  6. python_gls-0.1.0/python_gls/_parametrization.py +137 -0
  7. python_gls-0.1.0/python_gls/correlation/__init__.py +29 -0
  8. python_gls-0.1.0/python_gls/correlation/ar1.py +67 -0
  9. python_gls-0.1.0/python_gls/correlation/arma.py +118 -0
  10. python_gls-0.1.0/python_gls/correlation/base.py +125 -0
  11. python_gls-0.1.0/python_gls/correlation/car1.py +92 -0
  12. python_gls-0.1.0/python_gls/correlation/comp_symm.py +69 -0
  13. python_gls-0.1.0/python_gls/correlation/spatial.py +190 -0
  14. python_gls-0.1.0/python_gls/correlation/symm.py +85 -0
  15. python_gls-0.1.0/python_gls/likelihood.py +302 -0
  16. python_gls-0.1.0/python_gls/model.py +511 -0
  17. python_gls-0.1.0/python_gls/results.py +223 -0
  18. python_gls-0.1.0/python_gls/variance/__init__.py +19 -0
  19. python_gls-0.1.0/python_gls/variance/base.py +101 -0
  20. python_gls-0.1.0/python_gls/variance/comb.py +82 -0
  21. python_gls-0.1.0/python_gls/variance/const_power.py +50 -0
  22. python_gls-0.1.0/python_gls/variance/exp.py +50 -0
  23. python_gls-0.1.0/python_gls/variance/fixed.py +46 -0
  24. python_gls-0.1.0/python_gls/variance/ident.py +84 -0
  25. python_gls-0.1.0/python_gls/variance/power.py +52 -0
  26. python_gls-0.1.0/python_gls.egg-info/PKG-INFO +361 -0
  27. python_gls-0.1.0/python_gls.egg-info/SOURCES.txt +34 -0
  28. python_gls-0.1.0/python_gls.egg-info/dependency_links.txt +1 -0
  29. python_gls-0.1.0/python_gls.egg-info/requires.txt +9 -0
  30. python_gls-0.1.0/python_gls.egg-info/top_level.txt +1 -0
  31. python_gls-0.1.0/setup.cfg +4 -0
  32. python_gls-0.1.0/tests/test_against_r.py +515 -0
  33. python_gls-0.1.0/tests/test_correlation.py +178 -0
  34. python_gls-0.1.0/tests/test_model.py +192 -0
  35. python_gls-0.1.0/tests/test_stress.py +1078 -0
  36. python_gls-0.1.0/tests/test_variance.py +103 -0
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Bruno Abrahao
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,361 @@
1
+ Metadata-Version: 2.4
2
+ Name: python-gls
3
+ Version: 0.1.0
4
+ Summary: GLS estimator with learned correlation and variance structures (Python equivalent of R's nlme::gls)
5
+ Author: Bruno Abrahao
6
+ License-Expression: MIT
7
+ Classifier: Development Status :: 3 - Alpha
8
+ Classifier: Intended Audience :: Science/Research
9
+ Classifier: Programming Language :: Python :: 3
10
+ Classifier: Programming Language :: Python :: 3.10
11
+ Classifier: Programming Language :: Python :: 3.11
12
+ Classifier: Programming Language :: Python :: 3.12
13
+ Classifier: Topic :: Scientific/Engineering :: Mathematics
14
+ Requires-Python: >=3.10
15
+ Description-Content-Type: text/markdown
16
+ License-File: LICENSE
17
+ Requires-Dist: numpy>=1.24
18
+ Requires-Dist: scipy>=1.10
19
+ Requires-Dist: pandas>=2.0
20
+ Requires-Dist: formulaic>=1.0
21
+ Provides-Extra: dev
22
+ Requires-Dist: pytest>=7.0; extra == "dev"
23
+ Requires-Dist: pytest-cov>=4.0; extra == "dev"
24
+ Requires-Dist: statsmodels>=0.14; extra == "dev"
25
+ Dynamic: license-file
26
+
27
+ # python-gls
28
+
29
+ **GLS with learned correlation and variance structures for Python.**
30
+
31
+ The missing Python equivalent of R's `nlme::gls()`. Unlike `statsmodels.GLS` (which requires you to supply a pre-computed covariance matrix), `python-gls` *estimates* the correlation and variance structure from your data via maximum likelihood (ML) or restricted maximum likelihood (REML) — exactly like R's `nlme::gls()`.
32
+
33
+ ## Why this library?
34
+
35
+ If you work with **panel data**, **repeated measures**, **longitudinal studies**, or **clustered observations**, your errors are probably correlated and possibly heteroscedastic. Ignoring this gives you wrong standard errors and misleading p-values.
36
+
37
+ R has had `nlme::gls()` for 25+ years. Python hasn't had an equivalent. Until now.
38
+
39
+ | Feature | `statsmodels.GLS` | `python-gls` | R `nlme::gls` |
40
+ |---|---|---|---|
41
+ | Estimate correlation from data | No (manual Omega) | Yes | Yes |
42
+ | AR(1), compound symmetry, etc. | No | Yes (11 structures) | Yes |
43
+ | Heteroscedastic variance models | No | Yes (6 functions) | Yes |
44
+ | ML / REML estimation | No | Yes | Yes |
45
+ | R-style formulas | No | Yes | Yes |
46
+
47
+ ## Installation
48
+
49
+ ```bash
50
+ pip install python-gls
51
+ ```
52
+
53
+ Or from source:
54
+
55
+ ```bash
56
+ git clone https://github.com/brunoabrahao/python-gls.git
57
+ cd python-gls
58
+ pip install -e ".[dev]"
59
+ ```
60
+
61
+ ## Quick Start
62
+
63
+ ```python
64
+ from python_gls import GLS
65
+ from python_gls.correlation import CorAR1
66
+ from python_gls.variance import VarIdent
67
+
68
+ result = GLS.from_formula(
69
+ "response ~ treatment + time",
70
+ data=df,
71
+ correlation=CorAR1(), # Learn AR(1) correlation
72
+ variance=VarIdent("group"), # Learn group-specific variances
73
+ groups="subject", # Define independent clusters
74
+ ).fit()
75
+
76
+ print(result.summary())
77
+ print(f"Estimated AR(1) phi: {result.correlation_params[0]:.3f}")
78
+ ```
79
+
80
+ Example output (from a fit whose predictors are named `treatment` and `x`):
81
+
82
+ ```
83
+ ==============================================================================
84
+ Generalized Least Squares Results
85
+ ==============================================================================
86
+ Method: REML Log-Likelihood: -615.0544
87
+ No. Observations: 500 AIC: 1240.1088
88
+ Df Model: 2 BIC: 1261.1818
89
+ Df Residuals: 497 Sigma^2: 0.984576
90
+ Converged: Yes Iterations: 6
91
+ ------------------------------------------------------------------------------
92
+ coef std err t P>|t| [0.025 0.975]
93
+ ------------------------------------------------------------------------------
94
+ Intercept 1.0368 0.1069 9.7013 0.0000 0.8268 1.2468
95
+ treatment 0.6465 0.1428 4.5272 0.0000 0.3659 0.9271
96
+ x 1.9734 0.0323 61.0960 0.0000 1.9099 2.0368
97
+ ==============================================================================
98
+ Correlation Structure: CorAR1
99
+ Parameters: [0.61312872]
100
+ ```
101
+
102
+ ## Correlation Structures
103
+
104
+ All correlation structures are in `python_gls.correlation`.
105
+
106
+ ### Temporal / Serial Correlation
107
+
108
+ | Class | R Equivalent | Parameters | Description |
109
+ |---|---|---|---|
110
+ | `CorAR1(phi=None)` | `corAR1()` | 1 | First-order autoregressive. R[i,j] = phi^|i-j| |
111
+ | `CorARMA(p, q)` | `corARMA(p, q)` | p + q | ARMA(p,q) autocorrelation |
112
+ | `CorCAR1(phi=None)` | `corCAR1()` | 1 | Continuous-time AR(1) for irregular spacing |
113
+ | `CorCompSymm(rho=None)` | `corCompSymm()` | 1 | Exchangeable / compound symmetry. All pairs equal rho |
114
+ | `CorSymm(dim=None)` | `corSymm()` | d(d-1)/2 | General unstructured. Free correlation for every pair |
115
+
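The two simplest structures in the table can be written out directly from their formulas. The following is a minimal NumPy sketch, not the library's implementation; the helper names `ar1_corr` and `comp_symm_corr` are hypothetical and exist only for illustration.

```python
import numpy as np


def ar1_corr(phi: float, m: int) -> np.ndarray:
    """AR(1) correlation matrix for m equally spaced observations: R[i, j] = phi ** |i - j|."""
    idx = np.arange(m)
    return phi ** np.abs(idx[:, None] - idx[None, :])


def comp_symm_corr(rho: float, m: int) -> np.ndarray:
    """Compound symmetry: 1 on the diagonal, rho for every off-diagonal pair."""
    return np.full((m, m), rho) + (1.0 - rho) * np.eye(m)


R_ar1 = ar1_corr(0.5, 4)       # correlation decays with lag: 1, 0.5, 0.25, 0.125
R_cs = comp_symm_corr(0.3, 4)  # every pair shares the same correlation
```

AR(1) lets correlation decay with the distance in time, while compound symmetry treats all observations within a group as equally correlated.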
116
+ ### Spatial Correlation
117
+
118
+ | Class | R Equivalent | Parameters | Description |
119
+ |---|---|---|---|
120
+ | `CorExp(range_param, nugget=False)` | `corExp()` | 1-2 | Exponential: exp(-d/range) |
121
+ | `CorGaus(range_param, nugget=False)` | `corGaus()` | 1-2 | Gaussian: exp(-(d/range)^2) |
122
+ | `CorLin(range_param, nugget=False)` | `corLin()` | 1-2 | Linear: max(1 - d/range, 0) |
123
+ | `CorRatio(range_param, nugget=False)` | `corRatio()` | 1-2 | Rational quadratic: 1/(1 + (d/range)^2) |
124
+ | `CorSpher(range_param, nugget=False)` | `corSpher()` | 1-2 | Spherical: cubic polynomial, zero beyond range |
125
+
126
+ All spatial structures accept an optional `nugget` flag (default `False`). With `nugget=True`, a nugget effect is included, which allows a discontinuity in the correlation at distance zero.
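
The formulas in the table map distances to correlations. The sketch below evaluates a few of them with NumPy on the three coordinates used in the usage example. All function names are hypothetical, and the nugget convention shown (off-diagonal correlations scaled by `1 - nugget`, as in R's `nlme`) is an assumption about how this library handles it.

```python
import numpy as np


def pairwise_distances(coords: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of rows in coords."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def exp_corr(d, range_param):
    return np.exp(-d / range_param)


def gaus_corr(d, range_param):
    return np.exp(-((d / range_param) ** 2))


def lin_corr(d, range_param):
    return np.maximum(1.0 - d / range_param, 0.0)


def ratio_corr(d, range_param):
    return 1.0 / (1.0 + (d / range_param) ** 2)


def with_nugget(R: np.ndarray, nugget: float) -> np.ndarray:
    """Assumed convention: shrink off-diagonal entries by (1 - nugget), keep the diagonal at 1."""
    out = (1.0 - nugget) * R
    np.fill_diagonal(out, 1.0)
    return out


coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
D = pairwise_distances(coords)
R_exp = exp_corr(D, range_param=10.0)
R_lin = lin_corr(D, range_param=1.2)   # pairs farther apart than 1.2 get correlation 0
R_exp_nugget = with_nugget(R_exp, nugget=0.1)
```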
127
+
128
+ ### Usage
129
+
130
+ ```python
131
+ from python_gls.correlation import CorAR1, CorSymm, CorExp
132
+
133
+ # Serial: AR(1) with optional initial value
134
+ cor = CorAR1(phi=0.5)
135
+
136
+ # Unstructured: all pairs free
137
+ cor = CorSymm() # dimension inferred from data
138
+
139
+ # Spatial: set coordinates per group (this line needs `import numpy as np`)
140
+ cor = CorExp(range_param=10.0, nugget=True)
141
+ cor.set_coordinates(group_id=0, coords=np.array([[0,0], [1,0], [0,1]]))
142
+ ```
143
+
144
+ ## Variance Functions
145
+
146
+ All variance functions are in `python_gls.variance`.
147
+
148
+ | Class | R Equivalent | Parameters | Description |
149
+ |---|---|---|---|
150
+ | `VarIdent(group_var)` | `varIdent(form=~1\|group)` | G-1 | Different variance per group level |
151
+ | `VarPower(covariate)` | `varPower(form=~cov)` | 1 | sd = |v|^delta |
152
+ | `VarExp(covariate)` | `varExp(form=~cov)` | 1 | sd = exp(delta * v) |
153
+ | `VarConstPower(covariate)` | `varConstPower(form=~cov)` | 2 | sd = (c + |v|^delta) |
154
+ | `VarFixed(weights_var)` | `varFixed(~cov)` | 0 | Pre-specified weights (not estimated) |
155
+ | `VarComb(*varfuncs)` | `varComb(...)` | sum | Product of multiple variance functions |
156
+
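Each row of the table defines a standard-deviation multiplier per observation. The sketch below evaluates those formulas with NumPy; the function names and the dictionary of per-group ratios are hypothetical and only illustrate the formulas, not the library's internals.

```python
import numpy as np


def var_power_sd(v, delta):
    """VarPower: sd = |v| ** delta."""
    return np.abs(v) ** delta


def var_exp_sd(v, delta):
    """VarExp: sd = exp(delta * v)."""
    return np.exp(delta * v)


def var_const_power_sd(v, c, delta):
    """VarConstPower: sd = c + |v| ** delta."""
    return c + np.abs(v) ** delta


def var_ident_sd(groups, ratios):
    """VarIdent: one multiplier per group level; levels missing from ratios act as the reference (1.0)."""
    return np.array([ratios.get(g, 1.0) for g in groups])


x = np.array([1.0, 2.0, 4.0, 8.0])
group = ["a", "a", "b", "b"]

sd_power = var_power_sd(x, delta=0.5)
sd_ident = var_ident_sd(group, {"b": 2.0})  # G - 1 = 1 free parameter for 2 levels
sd_comb = sd_ident * sd_power               # VarComb: product of the component multipliers
```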
157
+ ### Usage
158
+
159
+ ```python
160
+ from python_gls.variance import VarIdent, VarPower, VarComb
161
+
162
+ # Different variance for treatment vs. control
163
+ var = VarIdent("treatment_group")
164
+
165
+ # Variance increases with fitted values
166
+ var = VarPower("fitted_values")
167
+
168
+ # Combine: group-specific + covariate-dependent
169
+ var = VarComb(VarIdent("group"), VarPower("x"))
170
+ ```
171
+
172
+ ## API Reference
173
+
174
+ ### `GLS` Class
175
+
176
+ #### Construction
177
+
178
+ ```python
179
+ # From formula (recommended)
180
+ model = GLS.from_formula(
181
+ formula, # R-style formula: "y ~ x1 + x2"
182
+ data, # pandas DataFrame
183
+ correlation=None, # CorStruct instance
184
+ variance=None, # VarFunc instance
185
+ groups=None, # str: column name for groups
186
+ method="REML", # "ML" or "REML"
187
+ )
188
+
189
+ # From arrays
190
+ model = GLS(
191
+ endog=y, # response vector
192
+ exog=X, # design matrix (include intercept column)
193
+ correlation=None,
194
+ variance=None,
195
+ groups=None, # array of group labels
196
+ method="REML",
197
+ )
198
+ ```
199
+
200
+ #### Fitting
201
+
202
+ ```python
203
+ result = model.fit(
204
+ maxiter=200, # max optimization iterations
205
+ tol=1e-8, # convergence tolerance
206
+ verbose=False, # print optimization progress
207
+ )
208
+ ```
209
+
210
+ ### `GLSResults` Class
211
+
212
+ | Property / Method | Type | Description |
213
+ |---|---|---|
214
+ | `params` | Series | Estimated coefficients |
215
+ | `bse` | Series | Standard errors |
216
+ | `tvalues` | Series | t-statistics |
217
+ | `pvalues` | Series | Two-sided p-values |
218
+ | `conf_int(alpha=0.05)` | DataFrame | Confidence intervals |
219
+ | `sigma2` | float | Estimated residual variance |
220
+ | `loglik` | float | Log-likelihood at convergence |
221
+ | `aic` | float | Akaike Information Criterion |
222
+ | `bic` | float | Bayesian Information Criterion |
223
+ | `resid` | array | Residuals (y - X*beta) |
224
+ | `fittedvalues` | array | Fitted values (X*beta) |
225
+ | `correlation_params` | array | Estimated correlation parameters |
226
+ | `variance_params` | array | Estimated variance parameters |
227
+ | `cov_params_func()` | DataFrame | Covariance matrix of beta |
228
+ | `summary()` | str | Formatted results table |
229
+ | `converged` | bool | Optimization convergence status |
230
+ | `n_iter` | int | Number of iterations |
231
+ | `method` | str | "ML" or "REML" |
232
+
233
+ ## How It Works
234
+
235
+ ### The Statistical Model
236
+
237
+ GLS models the response as:
238
+
239
+ **y = X*beta + epsilon**, where **Var(epsilon) = sigma^2 * Omega**
240
+
241
+ The covariance matrix Omega is block-diagonal by group:
242
+
243
+ **Omega_g = A_g^{1/2} R_g A_g^{1/2}**
244
+
245
+ where:
246
+ - **R_g** is the correlation matrix (from the correlation structure)
247
+ - **A_g** is a diagonal matrix of variance weights (from the variance function)
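
As a small worked illustration of the block formula, the sketch below assembles one group's covariance block from an AR(1) correlation matrix and a vector of standard-deviation multipliers. The values of `phi` and `sd` are made up for the example.

```python
import numpy as np

m, phi = 4, 0.6
idx = np.arange(m)
R_g = phi ** np.abs(idx[:, None] - idx[None, :])   # AR(1) correlation matrix

sd = np.array([1.0, 1.0, 2.0, 2.0])                # diagonal of A_g^{1/2} (illustrative values)
A_half = np.diag(sd)

Omega_g = A_half @ R_g @ A_half                    # Omega_g = A_g^{1/2} R_g A_g^{1/2}

# Because A_g^{1/2} is diagonal, the same block is an elementwise product.
Omega_g_fast = R_g * np.outer(sd, sd)
```

The diagonal of `Omega_g` holds the squared multipliers, and the off-diagonal entries are the correlations scaled by the two observations' standard deviations.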
248
+
249
+ ### Estimation
250
+
251
+ 1. **OLS initial fit** to get starting residuals
252
+ 2. **Initialize** correlation and variance parameters from residuals
253
+ 3. **Optimize** profile log-likelihood over correlation/variance parameters using L-BFGS-B. At each step, beta and sigma^2 are profiled out analytically.
254
+ 4. **Compute** final GLS estimates at the converged parameters:
255
+ - beta = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
256
+ - Cov(beta) = sigma^2 (X' Omega^{-1} X)^{-1}
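
Step 4 can be sketched with NumPy for a known Omega. This is an illustrative computation on simulated data, not the library's code: it whitens the model with the Cholesky factor of Omega so that ordinary least squares on the transformed data yields the GLS estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
X = np.column_stack([np.ones(N), rng.normal(size=N)])

# A single AR(1) block with phi = 0.5 stands in for the converged Omega.
idx = np.arange(N)
Omega = 0.5 ** np.abs(idx[:, None] - idx[None, :])
L = np.linalg.cholesky(Omega)                       # Omega = L @ L.T
y = X @ np.array([1.0, 2.0]) + L @ rng.normal(size=N)

# Whitening: L^{-1} y = L^{-1} X beta + e, with Var(e) = sigma^2 * I.
Xw = np.linalg.solve(L, X)
yw = np.linalg.solve(L, y)

beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)      # (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
resid_w = yw - Xw @ beta
k = X.shape[1]
sigma2 = resid_w @ resid_w / (N - k)                # REML-style divisor
cov_beta = sigma2 * np.linalg.inv(Xw.T @ Xw)        # sigma^2 (X' Omega^{-1} X)^{-1}
bse = np.sqrt(np.diag(cov_beta))
```

Solving against the Cholesky factor avoids forming Omega^{-1} explicitly, which is usually the more numerically stable route.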
257
+
258
+ ### Key Design Decisions
259
+
260
+ **Spherical parametrization** for `CorSymm`: The unstructured correlation matrix is parametrized via angles that map to a Cholesky factor, guaranteeing positive-definiteness without constrained optimization. Based on [Pinheiro & Bates (1996)](https://doi.org/10.1007/BF00140873).
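
The idea can be sketched as follows: each row of a lower-triangular factor is a unit vector written in spherical coordinates, so the product of the factor with its transpose has a unit diagonal and is positive definite whenever every angle lies strictly between 0 and pi. The function name `corr_from_angles` and the row-by-row ordering of the angles are assumptions for illustration; the library's internal ordering may differ.

```python
import numpy as np


def corr_from_angles(angles, d: int) -> np.ndarray:
    """Map d*(d-1)/2 angles in (0, pi) to a d x d correlation matrix."""
    angles = np.asarray(angles, dtype=float)
    if angles.size != d * (d - 1) // 2:
        raise ValueError("expected d*(d-1)/2 angles")
    L = np.zeros((d, d))
    L[0, 0] = 1.0
    pos = 0
    for i in range(1, d):
        theta = angles[pos:pos + i]          # row i uses i angles
        pos += i
        sin_prod = 1.0
        for j in range(i):
            L[i, j] = np.cos(theta[j]) * sin_prod
            sin_prod *= np.sin(theta[j])
        L[i, i] = sin_prod                   # positive, so L is a valid Cholesky factor
    return L @ L.T


R = corr_from_angles([np.pi / 3, np.pi / 4, 2 * np.pi / 3], d=3)
```

An optimizer can then search over the angles freely (for example through a transform onto the real line) without ever producing an invalid correlation matrix.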
261
+
262
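+ **Block-diagonal inversion**: Omega is inverted per group, at a cost of O(n*m^3), rather than as a full matrix, at a cost of O(N^3), where n is the number of groups, m is the group size, and N = n*m is the total number of observations (assuming equal-sized groups).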
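
A short NumPy check of why per-group inversion is valid: the inverse of a block-diagonal matrix is the block-diagonal matrix of the block inverses. The group count, group size, and AR(1) parameters below are made up for the example.

```python
import numpy as np

n_groups, m = 3, 4
idx = np.arange(m)

# One AR(1) block per group, each with its own illustrative phi.
blocks = [phi ** np.abs(idx[:, None] - idx[None, :]) for phi in (0.2, 0.5, 0.8)]

N = n_groups * m
Omega = np.zeros((N, N))
inv_by_block = np.zeros((N, N))
for g, block in enumerate(blocks):
    s = slice(g * m, (g + 1) * m)
    Omega[s, s] = block
    inv_by_block[s, s] = np.linalg.inv(block)   # n small m x m inversions

inv_full = np.linalg.inv(Omega)                 # one large N x N inversion
same = np.allclose(inv_full, inv_by_block)
```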
263
+
264
+ **REML**: Restricted maximum likelihood integrates the fixed effects out of the likelihood, which corrects the downward bias that ML variance estimates incur from estimating beta. This is the default, matching R's `nlme::gls()`.
265
+
266
+ ## Formula Syntax
267
+
268
+ Powered by [formulaic](https://github.com/matthewwardrop/formulaic), supporting:
269
+
270
+ ```python
271
+ # Simple linear
272
+ "y ~ x1 + x2"
273
+
274
+ # Categorical variables
275
+ "y ~ C(treatment)"
276
+
277
+ # Interactions
278
+ "y ~ x1 * x2" # x1 + x2 + x1:x2
279
+ "y ~ x1 : x2" # just the interaction
280
+
281
+ # Transformations
282
+ "y ~ np.log(x1) + x2"
283
+
284
+ # Remove intercept
285
+ "y ~ x1 + x2 - 1"
286
+ ```
287
+
288
+ ## ML vs. REML
289
+
290
+ | | ML | REML |
291
+ |---|---|---|
292
+ | Variance estimate | Biased downward (divides by N) | Bias-corrected (divides by N-k, where k = number of fixed-effect coefficients) |
293
+ | Default in R's gls | No | Yes |
294
+ | Default here | No | Yes |
295
+ | Use for model comparison | AIC/BIC of nested & non-nested models | Only models with same fixed effects |
296
+ | `method=` | `"ML"` | `"REML"` |
297
+
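The difference in the variance estimate comes down to the divisor. The sketch below uses simulated data with Omega equal to the identity, so GLS reduces to OLS; this is a simplifying assumption that isolates the N versus N-k divisors.

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 20, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=N)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
rss = resid @ resid

sigma2_ml = rss / N          # ML: ignores the k degrees of freedom spent on beta
sigma2_reml = rss / (N - k)  # REML: accounts for them
```

With few observations relative to the number of coefficients the gap is noticeable; it vanishes as N grows with k fixed.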
298
+ ## Translating from R
299
+
300
+ ### R code → Python equivalent
301
+
302
+ ```r
303
+ # R
304
+ library(nlme)
305
+ m <- gls(y ~ x1 + x2,
306
+ data = df,
307
+ correlation = corAR1(form = ~1|subject),
308
+ weights = varIdent(form = ~1|group),
309
+ method = "REML")
310
+ summary(m)
311
+ intervals(m)
312
+ ```
313
+
314
+ ```python
315
+ # Python
316
+ from python_gls import GLS
317
+ from python_gls.correlation import CorAR1
318
+ from python_gls.variance import VarIdent
319
+
320
+ r = GLS.from_formula(
321
+ "y ~ x1 + x2",
322
+ data=df,
323
+ correlation=CorAR1(),
324
+ variance=VarIdent("group"),
325
+ groups="subject",
326
+ method="REML",
327
+ ).fit()
328
+
329
+ print(r.summary())
330
+ print(r.conf_int())
331
+ ```
332
+
333
+ ### Parameter name mapping
334
+
335
+ | R | Python | Notes |
336
+ |---|---|---|
337
+ | `corAR1(form=~1\|subject)` | `CorAR1(), groups="subject"` | Groups specified at model level |
338
+ | `corCompSymm(form=~1\|id)` | `CorCompSymm(), groups="id"` | |
339
+ | `corSymm(form=~1\|id)` | `CorSymm(), groups="id"` | |
340
+ | `corExp(form=~x+y\|id)` | `CorExp(); cor.set_coordinates(...)` | Coordinates set per group |
341
+ | `varIdent(form=~1\|group)` | `VarIdent("group")` | Group variable as string |
342
+ | `varPower(form=~fitted)` | `VarPower("fitted")` | Covariate name as string |
343
+ | `method="REML"` | `method="REML"` | Same |
344
+
345
+ ## Dependencies
346
+
347
+ - **numpy** >= 1.24
348
+ - **scipy** >= 1.10
349
+ - **pandas** >= 2.0
350
+ - **formulaic** >= 1.0
351
+
352
+ ## Testing
353
+
354
+ ```bash
355
+ pip install -e ".[dev]"
356
+ pytest
357
+ ```
358
+
359
+ ## License
360
+
361
+ MIT