linreg-core 0.3.0

package/README.md ADDED
@@ -0,0 +1,555 @@
1
+ # linreg-core
2
+
3
+ [![CI](https://github.com/jesse-anderson/linreg-core/actions/workflows/ci.yml/badge.svg)](https://github.com/jesse-anderson/linreg-core/actions/workflows/ci.yml)
4
+ [![Coverage](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/jesse-anderson/linreg-core/main/.github/coverage-badge.json)](https://github.com/jesse-anderson/linreg-core/actions/workflows/ci.yml)
5
+ [![License](https://img.shields.io/badge/license-MIT%20OR%20Apache--2.0-blue)](LICENSE-MIT)
6
+ [![Crates.io](https://img.shields.io/crates/v/linreg-core?color=orange)](https://crates.io/crates/linreg-core)
7
+ [![docs.rs](https://img.shields.io/badge/docs.rs-linreg__core-green)](https://docs.rs/linreg-core)
8
+
9
+ ## Installation
10
+
11
+ ```bash
12
+ # npm
13
+ npm install linreg-core
14
+
15
+ # yarn
16
+ yarn add linreg-core
17
+
18
+ # pnpm
19
+ pnpm add linreg-core
20
+ ```
21
+
22
+ ---
23
+
24
+ A lightweight, self-contained linear regression library written in Rust. Compiles to WebAssembly for browser use or runs as a native Rust crate.
25
+
26
+ **Key design principle:** All linear algebra and statistical distribution functions are implemented from scratch — no external math libraries required. This keeps binary sizes small and makes the crate highly portable.
27
+
28
+ ## Features
29
+
30
+ ### Regression Methods
31
+ - **OLS Regression:** Coefficients, standard errors, t-statistics, p-values, confidence intervals
32
+ - **Ridge Regression:** L2-regularized regression with optional standardization
33
+ - **Lasso Regression:** L1-regularized regression via coordinate descent
34
+ - **Elastic Net:** Combined L1 + L2 regularization for variable selection with multicollinearity handling
35
+ - **Lambda Path Generation:** Create regularization paths for cross-validation
36
+
37
+ ### Model Statistics
38
+ - R-squared, Adjusted R-squared, F-statistic, F-test p-value
39
+ - Residuals, fitted values, leverage (hat matrix diagonal)
40
+ - Mean Squared Error (MSE)
41
+ - Variance Inflation Factor (VIF) for multicollinearity detection
42
+
43
+ ### Diagnostic Tests
44
+ | Category | Tests |
45
+ |----------|-------|
46
+ | **Linearity** | Rainbow Test, Harvey-Collier Test, RESET Test |
47
+ | **Heteroscedasticity** | Breusch-Pagan (Koenker variant), White Test (R & Python methods) |
48
+ | **Normality** | Jarque-Bera, Shapiro-Wilk (n ≤ 5000), Anderson-Darling |
49
+ | **Autocorrelation** | Durbin-Watson, Breusch-Godfrey (higher-order) |
50
+ | **Influence** | Cook's Distance |
51
+
52
+ ### Dual Target
53
+ - Browser (WASM) and server (native Rust)
54
+ - Optional domain restriction for WASM builds
55
+
56
+ ## Quick Start
57
+
58
+ ### Native Rust
59
+
60
+ Add to your `Cargo.toml`:
61
+
62
+ ```toml
63
+ [dependencies]
64
+ linreg-core = { version = "0.3", default-features = false }
65
+ ```
66
+
67
+ #### OLS Regression
68
+
69
+ ```rust
70
+ use linreg_core::core::ols_regression;
71
+
72
+ fn main() -> Result<(), linreg_core::Error> {
73
+ let y = vec![2.5, 3.7, 4.2, 5.1, 6.3];
74
+ let x = vec![vec![1.0, 2.0, 3.0, 4.0, 5.0]];
75
+ let names = vec!["Intercept".to_string(), "X1".to_string()];
76
+
77
+ let result = ols_regression(&y, &x, &names)?;
78
+
79
+ println!("Coefficients: {:?}", result.coefficients);
80
+ println!("R-squared: {:.4}", result.r_squared);
81
+ println!("F-statistic: {:.4}", result.f_statistic);
82
+
83
+ Ok(())
84
+ }
85
+ ```
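For a single predictor, the quantities printed in the OLS example above reduce to closed-form sums. The following is a minimal, std-only sketch of that arithmetic. It is not the crate's implementation (the Implementation Notes state that QR decomposition is used), and the function name `simple_ols` is hypothetical.

```rust
/// Fits y = b0 + b1 * x by least squares and returns (b0, b1, r_squared).
/// Hypothetical helper for illustration; not part of linreg-core.
fn simple_ols(x: &[f64], y: &[f64]) -> (f64, f64, f64) {
    assert_eq!(x.len(), y.len(), "x and y must have the same length");
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;

    // Centered sums of squares and cross-products.
    let sxx: f64 = x.iter().map(|xi| (xi - mean_x).powi(2)).sum();
    let syy: f64 = y.iter().map(|yi| (yi - mean_y).powi(2)).sum();
    let sxy: f64 = x
        .iter()
        .zip(y)
        .map(|(xi, yi)| (xi - mean_x) * (yi - mean_y))
        .sum();

    let slope = sxy / sxx;
    let intercept = mean_y - slope * mean_x;
    // R-squared = explained sum of squares / total sum of squares.
    let r_squared = slope * sxy / syy;
    (intercept, slope, r_squared)
}
```

With the data from the example above (`x = [1, 2, 3, 4, 5]`, `y = [2.5, 3.7, 4.2, 5.1, 6.3]`), this gives an intercept of 1.66 and a slope of 0.9.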
86
+
87
+ #### Ridge Regression
88
+
89
+ ```rust,no_run
90
+ use linreg_core::regularized::{ridge_fit, RidgeFitOptions};
91
+ use linreg_core::linalg::Matrix;
92
+
93
+ fn main() -> Result<(), linreg_core::Error> {
94
+ let y = vec![2.5, 3.7, 4.2, 5.1, 6.3];
95
+ // Matrix: 5 rows × 2 cols (intercept + 1 predictor), row-major order
96
+ let x = Matrix::new(5, 2, vec![
97
+ 1.0, 1.0, // row 0: intercept, x1
98
+ 1.0, 2.0, // row 1
99
+ 1.0, 3.0, // row 2
100
+ 1.0, 4.0, // row 3
101
+ 1.0, 5.0, // row 4
102
+ ]);
103
+
104
+ let options = RidgeFitOptions {
105
+ lambda: 1.0,
106
+ standardize: true,
107
+ intercept: true,
108
+ };
109
+
110
+ let result = ridge_fit(&x, &y, &options)?;
111
+
112
+ println!("Intercept: {}", result.intercept);
113
+ println!("Coefficients: {:?}", result.coefficients);
114
+
115
+ Ok(())
116
+ }
117
+ ```
118
+
119
+ #### Lasso Regression
120
+
121
+ ```rust,no_run
122
+ use linreg_core::regularized::{lasso_fit, LassoFitOptions};
123
+ use linreg_core::linalg::Matrix;
124
+
125
+ fn main() -> Result<(), linreg_core::Error> {
126
+ let y = vec![2.5, 3.7, 4.2, 5.1, 6.3];
127
+ // Matrix: 5 rows × 3 cols (intercept + 2 predictors), row-major order
128
+ let x = Matrix::new(5, 3, vec![
129
+ 1.0, 1.0, 0.5, // row 0: intercept, x1, x2
130
+ 1.0, 2.0, 1.0, // row 1
131
+ 1.0, 3.0, 1.5, // row 2
132
+ 1.0, 4.0, 2.0, // row 3
133
+ 1.0, 5.0, 2.5, // row 4
134
+ ]);
135
+
136
+ let options = LassoFitOptions {
137
+ lambda: 0.1,
138
+ standardize: true,
139
+ intercept: true,
140
+ ..Default::default() // uses default max_iter=1000, tol=1e-7
141
+ };
142
+
143
+ let result = lasso_fit(&x, &y, &options)?;
144
+
145
+ println!("Intercept: {}", result.intercept);
146
+ println!("Coefficients: {:?}", result.coefficients);
147
+ println!("Non-zero coefficients: {}", result.n_nonzero);
148
+
149
+ Ok(())
150
+ }
151
+ ```
152
+
153
+ #### Elastic Net Regression
154
+
155
+ ```rust,no_run
156
+ use linreg_core::regularized::{elastic_net_fit, ElasticNetOptions};
157
+ use linreg_core::linalg::Matrix;
158
+
159
+ fn main() -> Result<(), linreg_core::Error> {
160
+ let y = vec![2.5, 3.7, 4.2, 5.1, 6.3];
161
+ // Matrix: 5 rows × 3 cols (intercept + 2 predictors), row-major order
162
+ let x = Matrix::new(5, 3, vec![
163
+ 1.0, 1.0, 0.5, // row 0: intercept, x1, x2
164
+ 1.0, 2.0, 1.0, // row 1
165
+ 1.0, 3.0, 1.5, // row 2
166
+ 1.0, 4.0, 2.0, // row 3
167
+ 1.0, 5.0, 2.5, // row 4
168
+ ]);
169
+
170
+ let options = ElasticNetOptions {
171
+ lambda: 0.1,
172
+ alpha: 0.5, // 0 = Ridge, 1 = Lasso, 0.5 = balanced
173
+ standardize: true,
174
+ intercept: true,
175
+ ..Default::default()
176
+ };
177
+
178
+ let result = elastic_net_fit(&x, &y, &options)?;
179
+
180
+ println!("Intercept: {}", result.intercept);
181
+ println!("Coefficients: {:?}", result.coefficients);
182
+ println!("Non-zero coefficients: {}", result.n_nonzero);
183
+
184
+ Ok(())
185
+ }
186
+ ```
187
+
188
+ ### WebAssembly (Browser)
189
+
190
+ Build with wasm-pack:
191
+
192
+ ```bash
193
+ wasm-pack build --release --target web
194
+ ```
195
+
196
+ #### OLS in JavaScript
197
+
198
+ ```javascript
199
+ import init, { ols_regression } from 'linreg-core';
200
+
201
+ async function run() {
202
+ await init();
203
+
204
+ const y = [1, 2, 3, 4, 5];
205
+ const x = [[1, 2, 3, 4, 5]];
206
+ const names = ["Intercept", "X1"];
207
+
208
+ const resultJson = ols_regression(
209
+ JSON.stringify(y),
210
+ JSON.stringify(x),
211
+ JSON.stringify(names)
212
+ );
213
+
214
+ const result = JSON.parse(resultJson);
215
+ console.log("Coefficients:", result.coefficients);
216
+ console.log("R-squared:", result.r_squared);
217
+ }
218
+
219
+ run();
220
+ ```
221
+
222
+ #### Ridge Regression in JavaScript
223
+
224
+ ```javascript
225
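+ // Assumes init() has been awaited and y and x are already defined; x needs one array per named predictor (here X1 and X2)
+ const result = JSON.parse(ridge_regression(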
226
+ JSON.stringify(y),
227
+ JSON.stringify(x),
228
+ JSON.stringify(["Intercept", "X1", "X2"]),
229
+ 1.0, // lambda
230
+ true // standardize
231
+ ));
232
+
233
+ console.log("Coefficients:", result.coefficients);
234
+ ```
235
+
236
+ #### Lasso Regression in JavaScript
237
+
238
+ ```javascript
239
+ const result = JSON.parse(lasso_regression(
240
+ JSON.stringify(y),
241
+ JSON.stringify(x),
242
+ JSON.stringify(["Intercept", "X1", "X2"]),
243
+ 0.1, // lambda
244
+ true, // standardize
245
+ 100000, // max_iter
246
+ 1e-7 // tol
247
+ ));
248
+
249
+ console.log("Coefficients:", result.coefficients);
250
+ console.log("Non-zero coefficients:", result.n_nonzero);
251
+ ```
252
+
253
+ #### Elastic Net Regression in JavaScript
254
+
255
+ ```javascript
256
+ const result = JSON.parse(elastic_net_regression(
257
+ JSON.stringify(y),
258
+ JSON.stringify(x),
259
+ JSON.stringify(["Intercept", "X1", "X2"]),
260
+ 0.1, // lambda
261
+ 0.5, // alpha (0 = Ridge, 1 = Lasso, 0.5 = balanced)
262
+ true, // standardize
263
+ 100000, // max_iter
264
+ 1e-7 // tol
265
+ ));
266
+
267
+ console.log("Coefficients:", result.coefficients);
268
+ console.log("Non-zero coefficients:", result.n_nonzero);
269
+ ```
270
+
271
+ #### Lambda Path Generation
272
+
273
+ ```javascript
274
+ const path = JSON.parse(make_lambda_path(
275
+ JSON.stringify(y),
276
+ JSON.stringify(x),
277
+ 100, // n_lambda
278
+ 0.01 // lambda_min_ratio (as fraction of lambda_max)
279
+ ));
280
+
281
+ console.log("Lambda sequence:", path.lambda_path);
282
+ console.log("Lambda max:", path.lambda_max);
283
+ ```
284
+
285
+ ## Diagnostic Tests
286
+
287
+ ### Native Rust
288
+
289
+ ```rust
290
+ use linreg_core::diagnostics::{
291
+ breusch_pagan_test, durbin_watson_test, jarque_bera_test,
292
+ rainbow_test, RainbowMethod
293
+ };
294
+
295
+ fn main() -> Result<(), linreg_core::Error> {
296
+ let y = vec![/* your data */];
297
+ let x = vec![vec![/* predictor 1 */], vec![/* predictor 2 */]];
298
+
299
+ // Heteroscedasticity
300
+ let bp = breusch_pagan_test(&y, &x)?;
301
+ println!("Breusch-Pagan: LM={:.4}, p={:.4}", bp.statistic, bp.p_value);
302
+
303
+ // Autocorrelation
304
+ let dw = durbin_watson_test(&y, &x)?;
305
+ println!("Durbin-Watson: {:.4}", dw.statistic);
306
+
307
+ // Normality
308
+ let jb = jarque_bera_test(&y, &x)?;
309
+ println!("Jarque-Bera: JB={:.4}, p={:.4}", jb.statistic, jb.p_value);
310
+
311
+ // Linearity
312
+ let rainbow = rainbow_test(&y, &x, 0.5, RainbowMethod::R)?;
313
+ println!("Rainbow: F={:.4}, p={:.4}",
314
+ rainbow.r_result.as_ref().unwrap().statistic,
315
+ rainbow.r_result.as_ref().unwrap().p_value);
316
+
317
+ Ok(())
318
+ }
319
+ ```
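Two of the statistics above have short textbook definitions in terms of the model residuals. The sketch below computes them directly from a residual slice, as an illustration of what the numbers mean. The function names are hypothetical and the code is not taken from the crate, whose functions take `y` and `x` instead.

```rust
/// Durbin-Watson statistic: sum of squared successive residual differences
/// divided by the residual sum of squares. Values near 2 suggest no
/// first-order autocorrelation.
fn durbin_watson(residuals: &[f64]) -> f64 {
    let num: f64 = residuals
        .windows(2)
        .map(|w| (w[1] - w[0]).powi(2))
        .sum();
    let den: f64 = residuals.iter().map(|e| e * e).sum();
    num / den
}

/// Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4), where S is the sample
/// skewness and K the sample kurtosis (biased central moments).
fn jarque_bera(residuals: &[f64]) -> f64 {
    let n = residuals.len() as f64;
    let mean = residuals.iter().sum::<f64>() / n;
    // k-th central moment of the residuals.
    let moment = |k: i32| residuals.iter().map(|e| (e - mean).powi(k)).sum::<f64>() / n;
    let (m2, m3, m4) = (moment(2), moment(3), moment(4));
    let skewness = m3 / m2.powf(1.5);
    let kurtosis = m4 / (m2 * m2);
    n / 6.0 * (skewness * skewness + (kurtosis - 3.0).powi(2) / 4.0)
}
```

For the alternating residuals `[1, -1, 1, -1]`, the Durbin-Watson statistic is 3.0, which points toward negative autocorrelation.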
320
+
321
+ ### WebAssembly
322
+
323
+ All diagnostic tests are available in WASM:
324
+
325
+ ```javascript
326
+ // Rainbow test
327
+ const rainbow = JSON.parse(rainbow_test(
328
+ JSON.stringify(y),
329
+ JSON.stringify(x),
330
+ 0.5, // fraction
331
+ "r" // method: "r", "python", or "both"
332
+ ));
333
+
334
+ // Harvey-Collier test
335
+ const hc = JSON.parse(harvey_collier_test(
336
+ JSON.stringify(y),
337
+ JSON.stringify(x)
338
+ ));
339
+
340
+ // Breusch-Pagan test
341
+ const bp = JSON.parse(breusch_pagan_test(
342
+ JSON.stringify(y),
343
+ JSON.stringify(x)
344
+ ));
345
+
346
+ // White test (method selection: "r", "python", or "both")
347
+ const white = JSON.parse(white_test(
348
+ JSON.stringify(y),
349
+ JSON.stringify(x),
350
+ "r" // method: "r", "python", or "both"
351
+ ));
352
+
353
+ // White test - R-specific method (no method parameter)
354
+ const whiteR = JSON.parse(r_white_test(
355
+ JSON.stringify(y),
356
+ JSON.stringify(x)
357
+ ));
358
+
359
+ // White test - Python-specific method (no method parameter)
360
+ const whitePy = JSON.parse(python_white_test(
361
+ JSON.stringify(y),
362
+ JSON.stringify(x)
363
+ ));
364
+
365
+ // Jarque-Bera test
366
+ const jb = JSON.parse(jarque_bera_test(
367
+ JSON.stringify(y),
368
+ JSON.stringify(x)
369
+ ));
370
+
371
+ // Durbin-Watson test
372
+ const dw = JSON.parse(durbin_watson_test(
373
+ JSON.stringify(y),
374
+ JSON.stringify(x)
375
+ ));
376
+
377
+ // Shapiro-Wilk test
378
+ const sw = JSON.parse(shapiro_wilk_test(
379
+ JSON.stringify(y),
380
+ JSON.stringify(x)
381
+ ));
382
+
383
+ // Anderson-Darling test
384
+ const ad = JSON.parse(anderson_darling_test(
385
+ JSON.stringify(y),
386
+ JSON.stringify(x)
387
+ ));
388
+
389
+ // Cook's Distance
390
+ const cd = JSON.parse(cooks_distance_test(
391
+ JSON.stringify(y),
392
+ JSON.stringify(x)
393
+ ));
394
+
395
+ // RESET test (functional form)
396
+ const reset = JSON.parse(reset_test(
397
+ JSON.stringify(y),
398
+ JSON.stringify(x),
399
+ JSON.stringify([2, 3]), // powers (array of powers to test)
400
+ "fitted" // type: "fitted", "regressor", or "princomp"
401
+ ));
402
+
403
+ // Breusch-Godfrey test (higher-order autocorrelation)
404
+ const bg = JSON.parse(breusch_godfrey_test(
405
+ JSON.stringify(y),
406
+ JSON.stringify(x),
407
+ 1, // order (1 = first-order autocorrelation)
408
+ "chisq" // test_type: "chisq" or "f"
409
+ ));
410
+ ```
411
+
412
+ ## Statistical Utilities (WASM)
413
+
414
+ ```javascript
415
+ // Student's t CDF: P(T <= t)
416
+ const tCDF = get_t_cdf(1.96, 20);
417
+
418
+ // Critical t-value for two-tailed test
419
+ const tCrit = get_t_critical(0.05, 20);
420
+
421
+ // Normal inverse CDF (probit)
422
+ const zScore = get_normal_inverse(0.975);
423
+
424
+ // Descriptive statistics (all return JSON strings)
425
+ const mean = JSON.parse(stats_mean(JSON.stringify([1, 2, 3, 4, 5])));
426
+ const variance = JSON.parse(stats_variance(JSON.stringify([1, 2, 3, 4, 5])));
427
+ const stddev = JSON.parse(stats_stddev(JSON.stringify([1, 2, 3, 4, 5])));
428
+ const median = JSON.parse(stats_median(JSON.stringify([1, 2, 3, 4, 5])));
429
+ const quantile = JSON.parse(stats_quantile(JSON.stringify([1, 2, 3, 4, 5]), 0.5));
430
+ const correlation = JSON.parse(stats_correlation(
431
+ JSON.stringify([1, 2, 3, 4, 5]),
432
+ JSON.stringify([2, 4, 6, 8, 10])
433
+ ));
434
+ ```
435
+
436
+ ### CSV Parsing (WASM)
437
+
438
+ ```javascript
439
+ // Parse CSV content - returns headers, data rows, and numeric column names
440
+ const csv = parse_csv(csvContent);
441
+ const parsed = JSON.parse(csv);
442
+ console.log("Headers:", parsed.headers);
443
+ console.log("Numeric columns:", parsed.numeric_columns);
444
+ ```
445
+
446
+ ### Helper Functions (WASM)
447
+
448
+ ```javascript
449
+ // Get library version
450
+ const version = get_version(); // e.g., "0.3.0"
451
+
452
+ // Verify WASM is working
453
+ const msg = test(); // "Rust WASM is working!"
454
+ ```
455
+
456
+ ## Feature Flags
457
+
458
+ | Feature | Default | Description |
459
+ |---------|---------|-------------|
460
+ | `wasm` | Yes | Enables WASM bindings and browser support |
461
+ | `validation` | No | Includes test data for validation tests |
462
+
463
+ For native Rust without WASM overhead:
464
+
465
+ ```toml
466
+ linreg-core = { version = "0.3", default-features = false }
467
+ ```
468
+
469
+ ## Regularization Path
470
+
471
+ Generate a sequence of lambda values for regularization path analysis:
472
+
473
+ ```rust,no_run
474
+ use linreg_core::regularized::{make_lambda_path, LambdaPathOptions};
475
+ use linreg_core::linalg::Matrix;
476
+
477
+ // Assume x is your standardized design matrix and y is centered
478
+ let x = Matrix::new(100, 5, vec![0.0; 500]);
479
+ let y = vec![0.0; 100];
480
+
481
+ let options = LambdaPathOptions {
482
+ nlambda: 100,
483
+ lambda_min_ratio: Some(0.01),
484
+ alpha: 1.0, // Lasso
485
+ ..Default::default()
486
+ };
487
+
488
+ let lambdas = make_lambda_path(&x, &y, &options, None, Some(0));
489
+
490
+ // Use each lambda for cross-validation or plotting regularization paths
491
+ for &lambda in lambdas.iter() {
492
+ // Fit model with this lambda
493
+ // ...
494
+ }
495
+ ```
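A lambda path in the glmnet style is usually a log-spaced sequence running from `lambda_max` (the smallest penalty at which every coefficient is zero) down to `lambda_max * lambda_min_ratio`. The sketch below shows that construction under stated assumptions: predictors are standardized, `y` is centered, and `alpha > 0`. The function names and the column-wise data layout are hypothetical, and the crate's `make_lambda_path` may differ in detail.

```rust
/// Smallest lambda at which all lasso / elastic net coefficients are zero,
/// assuming standardized predictor columns and a centered response.
/// `columns` holds one Vec per predictor (hypothetical layout).
fn lambda_max(columns: &[Vec<f64>], y: &[f64], alpha: f64) -> f64 {
    let n = y.len() as f64;
    columns
        .iter()
        .map(|col| col.iter().zip(y).map(|(x, yi)| x * yi).sum::<f64>().abs())
        .fold(0.0, f64::max)
        / (n * alpha)
}

/// Log-spaced, decreasing sequence from `lambda_max` to `lambda_max * min_ratio`.
fn log_spaced_path(lambda_max: f64, min_ratio: f64, n_lambda: usize) -> Vec<f64> {
    assert!(n_lambda >= 2, "need at least two lambda values");
    let step = min_ratio.ln() / (n_lambda as f64 - 1.0);
    (0..n_lambda)
        .map(|i| lambda_max * (step * i as f64).exp())
        .collect()
}
```

Log spacing places more values near the small-lambda end, where the fitted coefficients change fastest.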
496
+
497
+ ## Domain Security (WASM)
498
+
499
+ Optional domain restriction via build-time environment variable:
500
+
501
+ ```bash
502
+ LINREG_DOMAIN_RESTRICT=example.com,mysite.com wasm-pack build --release --target web
503
+ ```
504
+
505
+ When the variable is not set (the default), all domains are allowed. When it is set, only the listed domains can use the WASM module.
506
+
507
+ ## Validation
508
+
509
+ Results are validated against R (`lmtest`, `car`, `skedastic`, `nortest`, `glmnet`) and Python (`statsmodels`, `scipy`, `sklearn`). See the `verification/` directory for test scripts and reference outputs.
510
+
511
+ ### Running Tests
512
+
513
+ ```bash
514
+ # Unit tests
515
+ cargo test
516
+
517
+ # WASM tests
518
+ wasm-pack test --node
519
+
520
+ # All tests including doctests
521
+ cargo test --all-features
522
+ ```
523
+
524
+ ## Implementation Notes
525
+
526
+ ### Regularization
527
+
528
+ The Ridge, Lasso, and Elastic Net implementations follow the glmnet formulation, where α mixes the L2 and L1 penalties:
529
+
530
+ ```
531
+ minimize (1/(2n)) * Σ(yᵢ - β₀ - xᵢᵀβ)² + λ * [(1 - α) * ||β||₂² / 2 + α * ||β||₁]
532
+ ```
533
+
534
+ - **Ridge** (α = 0): Closed-form solution with (X'X + λI)⁻¹X'y
535
+ - **Lasso** (α = 1): Coordinate descent algorithm
536
+
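The objective above can be sketched in a few lines. The code below evaluates the penalty term and shows the soft-thresholding step that coordinate descent applies to one coefficient at a time. It assumes each predictor is standardized so that (1/n) Σ xᵢⱼ² = 1, as in the glmnet paper; the function names are hypothetical and this is not the crate's code.

```rust
/// Soft-thresholding operator S(z, g) = sign(z) * max(|z| - g, 0).
fn soft_threshold(z: f64, gamma: f64) -> f64 {
    if z > gamma {
        z - gamma
    } else if z < -gamma {
        z + gamma
    } else {
        0.0
    }
}

/// One coordinate-descent update for a standardized predictor.
/// `rho` is (1/n) * sum of x_ij times the partial residual that excludes
/// predictor j's current contribution.
fn coordinate_update(rho: f64, lambda: f64, alpha: f64) -> f64 {
    soft_threshold(rho, lambda * alpha) / (1.0 + lambda * (1.0 - alpha))
}

/// Penalty term: lambda * [(1 - alpha) * ||beta||_2^2 / 2 + alpha * ||beta||_1].
fn elastic_net_penalty(beta: &[f64], lambda: f64, alpha: f64) -> f64 {
    let l1: f64 = beta.iter().map(|b| b.abs()).sum();
    let l2_sq: f64 = beta.iter().map(|b| b * b).sum();
    lambda * ((1.0 - alpha) * l2_sq / 2.0 + alpha * l1)
}
```

With α = 1 the update is pure soft-thresholding, which is what sets Lasso coefficients exactly to zero; with α = 0 it only shrinks the coefficient by a factor of 1 / (1 + λ).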
537
+ ### Numerical Precision
538
+
539
+ - QR decomposition used throughout for numerical stability
540
+ - Anderson-Darling uses Abramowitz & Stegun 7.1.26 for normal CDF (differs from R's Cephes by ~1e-6)
541
+ - Shapiro-Wilk implements Royston's 1995 algorithm matching R's implementation
542
+
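For reference, Abramowitz & Stegun formula 7.1.26 is a five-term rational approximation of the error function with a maximum absolute error of about 1.5e-7. A minimal sketch of a normal CDF built on it follows; the function names are hypothetical and the crate's own code may be organized differently.

```rust
use std::f64::consts::SQRT_2;

/// erf(x) via Abramowitz & Stegun 7.1.26 (max absolute error about 1.5e-7).
fn erf_as_7_1_26(x: f64) -> f64 {
    const P: f64 = 0.3275911;
    const A: [f64; 5] = [
        0.254829592,
        -0.284496736,
        1.421413741,
        -1.453152027,
        1.061405429,
    ];
    let ax = x.abs();
    let t = 1.0 / (1.0 + P * ax);
    // Horner evaluation of a1*t + a2*t^2 + ... + a5*t^5.
    let poly = A.iter().rev().fold(0.0_f64, |acc, &a| (acc + a) * t);
    let value = 1.0 - poly * (-ax * ax).exp();
    // erf is odd, so mirror the result for negative inputs.
    if x >= 0.0 { value } else { -value }
}

/// Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
fn normal_cdf(z: f64) -> f64 {
    0.5 * (1.0 + erf_as_7_1_26(z / SQRT_2))
}
```

The approximation error of this formula is the reason small differences from R's results are expected for Anderson-Darling.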
543
+ ### Known Limitations
544
+
545
+ - Harvey-Collier test may fail on high-VIF datasets (VIF > 5) due to numerical instability in recursive residuals
546
+ - Shapiro-Wilk limited to n ≤ 5000 (matching R's limitation)
547
+ - White test may differ from R on collinear datasets due to numerical precision in near-singular matrices
548
+
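To make the VIF threshold above concrete: VIFⱼ = 1 / (1 − Rⱼ²), where Rⱼ² comes from regressing predictor j on the other predictors, so VIF > 5 corresponds to Rⱼ² > 0.8. With exactly two predictors, Rⱼ² is the squared Pearson correlation between them. The sketch below covers only that two-predictor case; the function names are hypothetical and this is not the crate's VIF routine.

```rust
/// Pearson correlation coefficient between two equally long samples.
fn pearson_r(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "samples must have the same length");
    let n = a.len() as f64;
    let mean_a = a.iter().sum::<f64>() / n;
    let mean_b = b.iter().sum::<f64>() / n;
    let sab: f64 = a
        .iter()
        .zip(b)
        .map(|(x, y)| (x - mean_a) * (y - mean_b))
        .sum();
    let saa: f64 = a.iter().map(|x| (x - mean_a).powi(2)).sum();
    let sbb: f64 = b.iter().map(|y| (y - mean_b).powi(2)).sum();
    sab / (saa * sbb).sqrt()
}

/// VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2).
/// Grows without bound as the predictors approach perfect collinearity.
fn vif_two_predictors(x1: &[f64], x2: &[f64]) -> f64 {
    let r = pearson_r(x1, x2);
    1.0 / (1.0 - r * r)
}
```

For example, predictors with a correlation of 0.8 have a VIF of about 2.78, well below the threshold of 5 mentioned above.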
549
+ ## Disclaimer
550
+
551
+ This library is under active development and has not reached 1.0 stability. While outputs are validated against R and Python implementations, **do not use this library for critical applications** (medical, financial, safety-critical systems) without independent verification. See the [LICENSE](LICENSE-MIT) for full terms. The software is provided "as is" without warranty of any kind.
552
+
553
+ ## License
554
+
555
+ Dual-licensed under [MIT](LICENSE-MIT) or [Apache-2.0](LICENSE-APACHE).