mgcv-rust 0.2.2__cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
mgcv_rust/__init__.py ADDED
@@ -0,0 +1,5 @@
1
+ from .mgcv_rust import *
2
+
3
+ __doc__ = mgcv_rust.__doc__
4
+ if hasattr(mgcv_rust, "__all__"):
5
+ __all__ = mgcv_rust.__all__
@@ -0,0 +1,291 @@
1
+ Metadata-Version: 2.4
2
+ Name: mgcv-rust
3
+ Version: 0.2.2
4
+ Classifier: Development Status :: 3 - Alpha
5
+ Classifier: Intended Audience :: Science/Research
6
+ Classifier: License :: OSI Approved :: MIT License
7
+ Classifier: Programming Language :: Python :: 3
8
+ Classifier: Programming Language :: Python :: 3.8
9
+ Classifier: Programming Language :: Python :: 3.9
10
+ Classifier: Programming Language :: Python :: 3.10
11
+ Classifier: Programming Language :: Python :: 3.11
12
+ Classifier: Programming Language :: Python :: 3.12
13
+ Classifier: Programming Language :: Rust
14
+ Classifier: Programming Language :: Python :: Implementation :: CPython
15
+ Classifier: Topic :: Scientific/Engineering
16
+ Requires-Dist: numpy>=1.20.0
17
+ License-File: LICENSE
18
+ Summary: Rust implementation of Generalized Additive Models with Python bindings
19
+ Keywords: statistics,machine-learning,gam,regression,splines
20
+ Author-email: Aleksander Jaworski <ale.jaworski@gmail.com>
21
+ Requires-Python: >=3.8
22
+ Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
23
+ Project-URL: Homepage, https://github.com/AlekJaworski/nn_exploring
24
+ Project-URL: Repository, https://github.com/AlekJaworski/nn_exploring
25
+ Project-URL: Issues, https://github.com/AlekJaworski/nn_exploring/issues
26
+
27
+ # mgcv_rust: Generalized Additive Models in Rust
28
+
29
+ A Rust implementation of Generalized Additive Models (GAMs) with automatic smoothing parameter selection using REML (Restricted Maximum Likelihood) and the PiRLS (Penalized Iteratively Reweighted Least Squares) algorithm, inspired by R's `mgcv` package.
30
+
31
+ ## Features
32
+
33
+ - **Multiple Distribution Families**: Gaussian, Binomial, Poisson, and Gamma
34
+ - **Flexible Basis Functions**:
35
+ - Cubic B-splines with natural boundary conditions
36
+ - Thin plate splines for smooth multivariate regression
37
+ - **Automatic Smoothing**:
38
+ - REML (Restricted Maximum Likelihood) criterion
39
+ - GCV (Generalized Cross-Validation) criterion
40
+ - **PiRLS Algorithm**: Efficient fitting via Penalized Iteratively Reweighted Least Squares
41
+ - **Pure Rust**: No external BLAS/LAPACK dependencies
42
+ - **Test-Driven Development**: Comprehensive test suite with 20+ passing tests
43
+
44
+ ## Installation
45
+
46
+ Add to your `Cargo.toml`:
47
+
48
+ ```toml
49
+ [dependencies]
50
+ mgcv_rust = { path = "." }
51
+ ndarray = "0.16"
52
+ ```
53
+
54
+ ## Quick Start
55
+
56
+ ### Python (Recommended)
57
+
58
+ ```python
59
+ import numpy as np
60
+ from mgcv_rust import GAM
61
+
62
+ # Generate data: y = sin(2πx) + noise
63
+ X = np.random.uniform(0, 1, (500, 2))
64
+ y = np.sin(2 * np.pi * X[:, 0]) + 0.5 * (X[:, 1] - 0.5)**2
65
+
66
+ # Fit GAM with automatic smooth setup
67
+ gam = GAM()
68
+ result = gam.fit(X, y, k=[10, 15]) # That's it!
69
+
70
+ print(f"Lambda values: {result['lambda']}")
71
+ print(f"Deviance: {result['deviance']}")
72
+
73
+ # Make predictions
74
+ X_test = np.random.uniform(0, 1, (100, 2))
75
+ predictions = gam.predict(X_test)
76
+ ```
77
+
78
+ **Performance**: 1.5x - 65x faster than R's mgcv (problem-dependent)
79
+
80
+ See `API_SIMPLIFICATION.md` for more details on the simplified Python API.
81
+
82
+ ### Rust
83
+
84
+ ```rust
85
+ use mgcv_rust::{GAM, Family, SmoothTerm, OptimizationMethod};
86
+ use ndarray::{Array1, Array2};
87
+
88
+ fn main() -> Result<(), Box<dyn std::error::Error>> {
89
+ // Generate data: y = sin(2πx) + noise
90
+ let n = 100;
91
+ let x_data: Vec<f64> = (0..n).map(|i| i as f64 / n as f64).collect();
92
+ let y_data: Vec<f64> = x_data
93
+ .iter()
94
+ .map(|&xi| (2.0 * std::f64::consts::PI * xi).sin() + noise())
95
+ .collect();
96
+
97
+ let x = Array1::from_vec(x_data);
98
+ let y = Array1::from_vec(y_data);
99
+ let x_matrix = x.into_shape((n, 1))?;
100
+
101
+ // Create GAM with cubic spline smooth
102
+ let mut gam = GAM::new(Family::Gaussian);
103
+ let smooth = SmoothTerm::cubic_spline("x".to_string(), 20, 0.0, 1.0)?;
104
+ gam.add_smooth(smooth);
105
+
106
+ // Fit with REML smoothing parameter selection
107
+ gam.fit(
108
+ &x_matrix,
109
+ &y,
110
+ OptimizationMethod::REML,
111
+ 5, // max outer iterations
112
+ 50, // max inner iterations (PiRLS)
113
+ 1e-4 // convergence tolerance
114
+ )?;
115
+
116
+ // Make predictions
117
+ let predictions = gam.predict(&x_test)?;
118
+
119
+ Ok(())
120
+ }
121
+ ```
122
+
123
+ ## Architecture
124
+
125
+ ### Core Components
126
+
127
+ 1. **`basis.rs`**: Basis function implementations
128
+ - `CubicSpline`: Cubic B-spline basis with configurable knots
129
+ - `ThinPlateSpline`: Radial basis functions for smooth regression
130
+
131
+ 2. **`penalty.rs`**: Penalty matrix construction
132
+ - Second derivative penalties for smoothness
133
+ - Supports multiple penalty types per basis
134
+
135
+ 3. **`pirls.rs`**: Penalized IRLS fitting algorithm
136
+ - Implements PiRLS for GLMs with penalties
137
+ - Supports all standard GLM families
138
+ - Automatic weight computation and convergence checking
139
+
140
+ 4. **`reml.rs`**: Smoothing parameter selection
141
+ - REML criterion for optimal smoothing
142
+ - GCV criterion as alternative
143
+ - Log-determinant computations
144
+
145
+ 5. **`smooth.rs`**: Smoothing parameter optimization
146
+ - Coordinate descent optimization
147
+ - Grid search for initialization
148
+ - Works in log-space for numerical stability
149
+
150
+ 6. **`gam.rs`**: Main GAM model interface
151
+ - Combines all components
152
+ - Handles multiple smooth terms
153
+ - Outer loop for lambda optimization
154
+
155
+ 7. **`linalg.rs`**: Linear algebra operations
156
+ - Gaussian elimination with partial pivoting
157
+ - Matrix inversion via Gauss-Jordan
158
+ - Determinant computation via LU decomposition
159
+
160
+ ## Mathematical Background
161
+
162
+ ### GAM Model
163
+
164
+ ```
165
+ g(E[Y]) = β₀ + f₁(x₁) + f₂(x₂) + ... + fₚ(xₚ)
166
+ ```
167
+
168
+ Where:
169
+ - `g()` is the link function
170
+ - `fᵢ()` are smooth functions represented by basis expansions
171
+ - Each smooth is penalized by `λᵢ ∫ (f''ᵢ(x))² dx`
172
+
173
+ ### PiRLS Algorithm
174
+
175
+ 1. Initialize: η = g(y)
176
+ 2. Until convergence:
177
+ - Compute μ = g⁻¹(η)
178
+ - Compute weights: w = (g'(μ))² / V(μ)
179
+ - Compute working response: z = η + (y - μ) / g'(μ)
180
+ - Solve: β = (X'WX + λS)⁻¹ X'Wz
181
+ - Update: η = Xβ
182
+
183
+ ### REML Criterion
184
+
185
+ ```
186
+ REML(λ) = n·log(RSS) + log|X'WX + λS| - log|S|
187
+ ```
188
+
189
+ Minimized with respect to λ to select optimal smoothing parameters.
190
+
191
+ ## Examples
192
+
193
+ See `examples/simple_gam.rs` for a complete working example:
194
+
195
+ ```bash
196
+ cargo run --example simple_gam --release
197
+ ```
198
+
199
+ ## Project Structure
200
+
201
+ ```
202
+ ├── src/ # Core Rust library code
203
+ ├── examples/ # Rust usage examples
204
+ ├── benches/ # Rust benchmarks
205
+ ├── tests/ # Rust tests
206
+ ├── scripts/ # Testing and benchmarking scripts
207
+ │ ├── python/ # Python scripts
208
+ │ │ ├── tests/ # Python test scripts
209
+ │ │ └── benchmarks/ # Python benchmark scripts
210
+ │ └── r/ # R scripts
211
+ │ ├── tests/ # R test scripts
212
+ │ └── benchmarks/ # R benchmark scripts
213
+ ├── docs/ # Documentation and analysis
214
+ └── test_data/ # Test data and results
215
+ ```
216
+
217
+ ## Testing
218
+
219
+ Run the Rust test suite:
220
+
221
+ ```bash
222
+ cargo test
223
+ ```
224
+
225
+ All 20 tests should pass, covering:
226
+ - Basis function evaluation
227
+ - Penalty matrix construction
228
+ - Linear algebra operations
229
+ - REML/GCV criteria
230
+ - PiRLS convergence
231
+ - Full GAM fitting pipeline
232
+
233
+ Additional tests and benchmarks are available in the `scripts/` directory.
234
+
235
+ ## Implementation Notes
236
+
237
+ - **TDD Approach**: Every feature was implemented with tests first
238
+ - **No External Dependencies**: Custom linear algebra to avoid BLAS/LAPACK issues
239
+ - **Numerical Stability**: Operations performed in log-space where appropriate
240
+ - **Extensible Design**: Easy to add new basis types, families, or criteria
241
+
242
+ ## Limitations & Future Work
243
+
244
+ - Smoothing parameter optimization could be improved with better algorithms (e.g., Newton-Raphson)
245
+ - Eigendecomposition for handling penalty null spaces more rigorously
246
+ - Confidence intervals and standard errors
247
+ - Model diagnostics and residual analysis
248
+ - Tensor product smooths for multivariate terms
249
+ - Parallel processing for large datasets
250
+
251
+ ## References
252
+
253
+ - Wood, S.N. (2017). Generalized Additive Models: An Introduction with R (2nd ed.). Chapman and Hall/CRC.
254
+ - Wood, S.N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. JRSS-B, 73(1), 3-36.
255
+
256
+ ## License
257
+
258
+ MIT License - see LICENSE file for details
259
+
260
+ ## Author
261
+
262
+ Implemented as a Rust port of R's mgcv package core functionality.
263
+
264
+ ## Update: REML Implementation Fixed! ✅
265
+
266
+ **You were absolutely right** - the REML implementation had bugs that caused it to always select λ ≈ 0.
267
+
268
+ ### What Was Wrong
269
+
270
+ 1. **Singular Penalty Handling**: REML was incorrectly handling rank-deficient penalty matrices, setting `log|S| = 0` which broke the criterion
271
+ 2. **Lambda Passing**: Optimization was passing `λ = 1.0` with pre-multiplied penalties, confusing the `rank(S)*log(λ)` term
272
+ 3. **Insufficient Data**: Examples used n=30 with p=15 (ratio 2:1), which is too small for REML/GCV
273
+
274
+ ### What Was Fixed
275
+
276
+ 1. **REML Criterion**: Now correctly uses `log|λS| = rank(S)*log(λ) + constant`
277
+ 2. **Optimization**: Passes actual λ values to criterion functions
278
+ 3. **Data Size**: Increased to n=300 for proper n/p ratio (20:1)
279
+ 4. **REML Search**: Uses fine grid search (gradient descent had issues)
280
+
281
+ ### Current Performance (n=300)
282
+
283
+ ```
284
+ GCV: λ = 0.067, Test RMSE = 0.480 ✅
285
+ REML: λ = 0.058, Test RMSE = 0.480 ✅
286
+ ```
287
+
288
+ Both methods now select nearly optimal smoothing parameters!
289
+
290
+ See `IMPLEMENTATION_SUMMARY.md` for complete details.
291
+
@@ -0,0 +1,6 @@
1
+ mgcv_rust-0.2.2.dist-info/METADATA,sha256=mpOFDcgGxrv_D_aSazMkDFDEiLNYijGz_HSFdJPO99U,9340
2
+ mgcv_rust-0.2.2.dist-info/WHEEL,sha256=sgi_Ui9BKkTXfCemBDwUkdsWKtVg5--W9WBHQKZVLI0,147
3
+ mgcv_rust-0.2.2.dist-info/licenses/LICENSE,sha256=-fsWaMApsPlQ8ZHs8KqNesMpqgrf8YyQxHT2m7orIuc,1070
4
+ mgcv_rust/__init__.py,sha256=RBB4OFS2FkQdFzXiLESi1-4cDxXwQ9fnJLeCQg37Fyo,119
5
+ mgcv_rust/mgcv_rust.cpython-310-x86_64-linux-gnu.so,sha256=wjJZxtb-IUKt4zmJs__Zs836l-vlycJjs5ii7iUhZb4,1512640
6
+ mgcv_rust-0.2.2.dist-info/RECORD,,
@@ -0,0 +1,5 @@
1
+ Wheel-Version: 1.0
2
+ Generator: maturin (1.10.2)
3
+ Root-Is-Purelib: false
4
+ Tag: cp310-cp310-manylinux_2_17_x86_64
5
+ Tag: cp310-cp310-manylinux2014_x86_64
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2018 philosorapt0r
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.