liesel-gam 0.0.4__py3-none-any.whl → 0.0.6a4__py3-none-any.whl
- liesel_gam/__about__.py +1 -1
- liesel_gam/__init__.py +38 -1
- liesel_gam/builder/__init__.py +8 -0
- liesel_gam/builder/builder.py +2003 -0
- liesel_gam/builder/category_mapping.py +158 -0
- liesel_gam/builder/consolidate_bases.py +105 -0
- liesel_gam/builder/registry.py +561 -0
- liesel_gam/constraint.py +107 -0
- liesel_gam/dist.py +541 -1
- liesel_gam/kernel.py +18 -7
- liesel_gam/plots.py +946 -0
- liesel_gam/predictor.py +59 -20
- liesel_gam/var.py +1508 -126
- liesel_gam-0.0.6a4.dist-info/METADATA +559 -0
- liesel_gam-0.0.6a4.dist-info/RECORD +18 -0
- {liesel_gam-0.0.4.dist-info → liesel_gam-0.0.6a4.dist-info}/WHEEL +1 -1
- liesel_gam-0.0.4.dist-info/METADATA +0 -160
- liesel_gam-0.0.4.dist-info/RECORD +0 -11
- {liesel_gam-0.0.4.dist-info → liesel_gam-0.0.6a4.dist-info}/licenses/LICENSE +0 -0

liesel_gam-0.0.6a4.dist-info/METADATA
@@ -0,0 +1,559 @@

Metadata-Version: 2.4
Name: liesel_gam
Version: 0.0.6a4
Summary: Functionality for Generalized Additive Models in Liesel
Author: Johannes Brachem
License-File: LICENSE
Keywords: machine-learning,statistics
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Requires-Python: <3.14,>=3.13
Requires-Dist: formulaic>=1.2.1
Requires-Dist: liesel>=0.4.1
Requires-Dist: smoothcon>=0.0.9
Description-Content-Type: text/markdown

# Bayesian Generalized Additive Models in Liesel

[PyPI](https://pypi.org/project/liesel_gam) · [pre-commit](https://github.com/liesel-devs/liesel_gam/actions/workflows/pre-commit.yml) · [notebooks](https://github.com/liesel-devs/liesel_gam/tree/main/notebooks) · [pytest](https://github.com/liesel-devs/liesel_gam/actions/workflows/pytest.yml)

This title is short and catchy, but does not convey the full range of models covered by this library. We could also say:

- Bayesian Generalized Additive Models for **Location, Scale, and Shape** (and beyond)
- Bayesian **Structured Additive Distributional Regression**

This library provides functionality to make the setup of generalized additive models in [Liesel](https://github.com/liesel-devs/liesel) convenient. It uses [ryp](https://github.com/Wainberg/ryp) to obtain basis and penalty matrices from the R package [mgcv](https://cran.r-project.org/web/packages/mgcv/index.html), and relies on [formulaic](https://github.com/matthewwardrop/formulaic) to parse Wilkinson formulas, known to many from the formula syntax in R.

A little syntax teaser:

```python
import tensorflow_probability.substrates.jax.distributions as tfd
import liesel.model as lsl
import liesel_gam as gam

tb = gam.TermBuilder.from_df(data)  # data: a pandas DataFrame

loc = gam.AdditivePredictor(name="loc")

loc += tb.lin("x1 + x2*x3 + C(x4, contr.sum)")  # Linear term
loc += tb.ps("x5", k=20)  # P-spline
loc += tb.tf(  # Full tensor product
    tb.ps("x6", k=8),  # first marginal
    tb.ps("x7", k=8),  # second marginal
)

y = lsl.Var.new_obs(
    data["y"].to_numpy(),
    distribution=lsl.Dist(tfd.Normal, loc=loc, scale=...),
    name="y",
)

model = lsl.Model([y])
```

As a Liesel addon, `liesel_gam` gives you:

- a lot of freedom to use existing building blocks to create new models via Liesel,
- visualization of your models as directed acyclic graphs via Liesel,
- just-in-time compilation for speed via [JAX](https://github.com/jax-ml/jax),
- automatic differentiation via [JAX](https://github.com/jax-ml/jax),
- access to different samplers like Hamiltonian Monte Carlo (HMC), the No-U-Turn sampler (NUTS), the iteratively reweighted least squares sampler (IWLS), Gibbs sampling, and general Metropolis-Hastings sampling via Liesel.

## Disclaimer

This library is experimental and under active development. That means:

- The API cannot be considered stable. If you depend on this library, pin the version.
- Testing has not been extensive as of now. Please check and verify!
- There is currently no documentation beyond this readme.

This library comes with no warranty or guarantees.

## Installation

You can install `liesel_gam` from PyPI:

```bash
pip install liesel_gam
```

You can also install the development version from GitHub via pip:

```bash
pip install git+https://github.com/liesel-devs/liesel_gam.git
```

## Contents

- [Short usage illustration](https://github.com/liesel-devs/liesel_gam?tab=readme-ov-file#illustrations)
- [Example notebooks](https://github.com/liesel-devs/liesel_gam?tab=readme-ov-file#example-notebooks)
- [Plotting](https://github.com/liesel-devs/liesel_gam?tab=readme-ov-file#plotting-functionality)
- [Details](https://github.com/liesel-devs/liesel_gam?tab=readme-ov-file#more-information)
  - [Customize the intercepts](https://github.com/liesel-devs/liesel_gam?tab=readme-ov-file#customize-the-intercepts)
  - [Define priors for lin terms](https://github.com/liesel-devs/liesel_gam?tab=readme-ov-file#define-priors-for-lin-terms)
  - [Use different MCMC kernels like HMC/NUTS](https://github.com/liesel-devs/liesel_gam?tab=readme-ov-file#use-different-mcmc-kernels-like-hmcnuts)
  - [Use different priors and MCMC kernels for variance parameters](https://github.com/liesel-devs/liesel_gam?tab=readme-ov-file#use-different-priors-and-mcmc-kernels-for-variance-parameters)
  - [Compose terms to build new models](https://github.com/liesel-devs/liesel_gam?tab=readme-ov-file#compose-terms-to-build-new-models)
  - [Use a custom basis function](https://github.com/liesel-devs/liesel_gam?tab=readme-ov-file#use-a-custom-basis-function)
  - [Use a custom basis matrix directly](https://github.com/liesel-devs/liesel_gam?tab=readme-ov-file#use-a-custom-basis-matrix-directly)
  - [Noncentered parameterization](https://github.com/liesel-devs/liesel_gam?tab=readme-ov-file#noncentered-parameterization)
  - [Extract a basis](https://github.com/liesel-devs/liesel_gam?tab=readme-ov-file#extract-a-basis-directly)
  - [Extract a column from the data frame as a variable](https://github.com/liesel-devs/liesel_gam?tab=readme-ov-file#extract-a-column-from-the-data-frame-as-a-variable)
- [Overview of smooth terms available in liesel_gam](https://github.com/liesel-devs/liesel_gam?tab=readme-ov-file#overview-of-smooth-terms-available-in-liesel_gam)
- [Acknowledgements](https://github.com/liesel-devs/liesel_gam?tab=readme-ov-file#acknowledgements)

## Illustrations

These are pseudo-code illustrations without real data. For full examples, please consider the [notebooks](https://github.com/liesel-devs/liesel_gam/blob/main/notebooks).

### Imports

```python
import tensorflow_probability.substrates.jax.distributions as tfd
import jax.numpy as jnp

import liesel.model as lsl
import liesel.goose as gs

import liesel_gam as gam

data = ...  # assuming data is a pandas DataFrame object
```

### Additive predictors and response model

First, we set up the response model. The `gam.AdditivePredictor` classes are containers for `lsl.Var` objects, which can be added using the `+=` operator, as we will see. By default, each `gam.AdditivePredictor` includes an intercept with a constant prior, but you are free to pass any `lsl.Var` as the intercept during initialization.

```python
loc_pred = gam.AdditivePredictor("mu")
scale_pred = gam.AdditivePredictor("sigma", inv_link=jnp.exp)

y = lsl.Var.new_obs(
    value=...,
    distribution=lsl.Dist(tfd.Normal, loc=loc_pred, scale=scale_pred),
    name="y",
)
```

### TermBuilder

Next, we initialize a `gam.TermBuilder`. This class helps you set up structured additive regression terms from a dataframe.

```python
tb = gam.TermBuilder.from_df(data)
```

### TermBuilder.lin: Linear terms from formulas

Using the TermBuilder, we can now start adding terms to our predictors. For example, to add a linear effect we can use `gam.TermBuilder.lin`, which allows us to use Wilkinson formulas as implemented in [formulaic](https://matthewwardrop.github.io/formulaic/latest/guides/grammar/).

Note that formulaic allows you to set up several smooth bases, and these specifications are supported by `liesel_gam`. If you use them, be aware that smooths set up via formulaic in the `lin` term will *not* be equipped with any regularizing priors. They will be fully unpenalized smooths. In almost all cases, you will want to use penalized smooths. The `gam.TermBuilder` offers dedicated methods for setting up penalized smooths, see below.

```python
loc_pred += tb.lin("x1 + x2*x3 + C(x4, contr.sum)")
scale_pred += tb.lin("x1")  # using a simpler model for the scale predictor here
```
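
To make the caveat above concrete, here is a minimal sketch, assuming formulaic's built-in `bs` (B-spline) transform: the resulting basis columns enter the model as ordinary, unpenalized regressors.

```python
# Unpenalized B-spline columns built by formulaic inside a linear term.
# No variance parameter and no regularizing prior are attached here,
# in contrast to tb.ps("x5", k=20).
loc_pred += tb.lin("bs(x5, df=5)")
```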

### TermBuilder.ps: Penalized smooth terms

Next, we add a smooth term.

```python
loc_pred += tb.ps("x5", k=20)
```

### MCMC algorithm setup

Finally, we build the Liesel model.

```python
model = lsl.Model([y])
```

For MCMC sampling, we then set up an engine builder. All terms returned by `gam.TermBuilder` come with default `inference` specifications that provide a `gs.IWLSKernel` for the coefficients of each term. Smoothing parameters receive a default prior of `InverseGamma(concentration=1.0, scale=0.005)` and a corresponding `gs.GibbsKernel`. This setup allows you to get first results quickly.

```python
eb = gs.LieselMCMC(model).get_engine_builder(seed=42, num_chains=4)

eb.set_duration(
    warmup_duration=1000,
    posterior_duration=1000,
    term_duration=200,
    posterior_thinning=2,
)
engine = eb.build()
```
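
Once the engine is built, sampling and summarizing follow the usual Liesel workflow; a short sketch, assuming `liesel.goose`'s standard engine API (`sample_all_epochs`, `get_results`, `Summary`):

```python
engine.sample_all_epochs()      # run warmup and posterior sampling
results = engine.get_results()  # collect the sampling results
summary = gs.Summary(results)   # posterior summaries and diagnostics

# posterior draws, e.g. for the plotting functions shown below
samples = results.get_posterior_samples()
```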

## Example notebooks

This repository includes a number of notebooks with minimal examples for using different smooth terms. You can find them here: [Notebooks](https://github.com/liesel-devs/liesel_gam/tree/main/notebooks)

## Plotting functionality

`liesel_gam` comes with some default plotting functions building on the wonderful [plotnine](https://plotnine.org), which brings a ggplot2-like syntax to Python.

The current plotting functions are:

- `gam.plot_1d_smooth`: For plotting univariate smooths.
- `gam.plot_2d_smooth`: For plotting bivariate smooths.
- `gam.plot_regions`: For plotting discrete spatial effects like Markov random fields or spatially organized random intercepts.
- `gam.plot_forest`: For plotting discrete effects like random intercepts and Markov random fields.
- `gam.plot_1d_smooth_clustered`: For plotting clustered smooths, including random slopes and smooths with a random scalar.
- `gam.plot_polys`: General function for plotting discrete spatial regions.

Example usage:

```python
gam.plot_1d_smooth(
    term=model.vars["ps(x)"],  # the Term object, here retrieved from the model
    samples=samples,           # the MCMC samples drawn via liesel.goose
)
```

*(Example plot produced by `gam.plot_1d_smooth`.)*

## More information

### Customize the intercepts

By default, a `gam.AdditivePredictor` comes with an intercept that receives a constant prior and is sampled with `gs.IWLSKernel`. You can override this default in several ways.

You can turn off the default by passing `intercept=False`:

```python
loc_pred = gam.AdditivePredictor("mu", intercept=False)
```

You can also pass a custom variable, which gives you Liesel's full freedom in specifying the prior and inference for the intercept:

```python
loc_intercept = lsl.Var.new_param(
    value=0.0,
    distribution=lsl.Dist(tfd.Normal, loc=0.0, scale=100.0),
    inference=gs.MCMCSpec(gs.NUTSKernel),
    name="mu_intercept",
)

loc_pred = gam.AdditivePredictor("mu", intercept=loc_intercept)
```

### Define priors for `lin` terms

The regression coefficients of a `lin` term receive a constant prior by default. You can customize the prior by passing a `lsl.Dist` to the `prior` argument:

```python
loc_pred += tb.lin(
    "x1 + x2*x3 + C(x4, contr.sum)",
    prior=lsl.Dist(tfd.Normal, loc=0.0, scale=100.0),
)
```

### Sample `lin` terms and intercepts in one joint block

By default, each term initialized by the `gam.TermBuilder` receives its own kernel, and this includes the intercept. That means that, in the blocked MCMC algorithm employed by Liesel, there will be one block for the intercept of each predictor and separate blocks for other terms.

However, there may be cases in which you want intercepts and linear terms to be sampled jointly by a single sampler. You can achieve this by customizing the `inference` arguments and using the `kernel_group` argument of `gs.MCMCSpec`:

```python
loc_intercept = lsl.Var.new_param(
    value=0.0,
    inference=gs.MCMCSpec(gs.IWLSKernel, kernel_group="loc_lin"),
    name="mu_intercept",
)

loc_pred = gam.AdditivePredictor("mu", intercept=loc_intercept)

loc_pred += tb.lin(
    "x1 + x2*x3 + C(x4, contr.sum)",
    inference=gs.MCMCSpec(gs.IWLSKernel, kernel_group="loc_lin"),
)
```

### Use different MCMC kernels like HMC/NUTS

The default MCMC kernel for all terms created by a `gam.TermBuilder` is `gs.IWLSKernel`. You are free to override this default by supplying your own `liesel.goose.MCMCSpec` object in the `inference` argument:

```python
import liesel.goose as gs

loc_pred += tb.ps("x5", k=20, inference=gs.MCMCSpec(gs.NUTSKernel))
```

This way, you can also use Liesel to set up general custom Metropolis-Hastings kernels ([gs.MHKernel](https://docs.liesel-project.org/en/latest/generated/liesel.goose.MHKernel.html)) or Gibbs kernels ([gs.GibbsKernel](https://docs.liesel-project.org/en/latest/generated/liesel.goose.GibbsKernel.html)).
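
If a kernel needs tuning arguments, they can be routed through the spec; a hedged sketch, assuming `gs.MCMCSpec` forwards a `kernel_kwargs` dictionary to the kernel's constructor and that Liesel's NUTS kernel accepts a dual-averaging target acceptance rate:

```python
# Assumption: kernel_kwargs is forwarded to the kernel constructor;
# da_target_accept is assumed to be the dual-averaging target
# acceptance rate of the NUTS kernel. Check the liesel.goose docs.
spec = gs.MCMCSpec(gs.NUTSKernel, kernel_kwargs={"da_target_accept": 0.9})
loc_pred += tb.ps("x5", k=20, inference=spec)
```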

### Use different priors and MCMC kernels for variance parameters

The variance parameters in the priors of penalized smooths are controlled with the `scale` argument, which accepts `gam.VarIGPrior` and `lsl.Var` objects, but also simple floats.

- The default `scale=gam.VarIGPrior(1.0, 0.005)` will set up an inverse gamma prior for the smooth's variance parameter, with parameters `concentration=1.0` and `scale=0.005`. The variance parameter will then automatically receive a fitting Gibbs kernel, since the full conditional is known in this case.
- More generally, `scale=gam.VarIGPrior(a, b)` sets up an inverse gamma prior with parameters `concentration=a` and `scale=b`. Again, the variance parameter automatically receives a fitting Gibbs kernel.
- If you pass a float, this is taken as the scale parameter and held fixed.
- You can also pass a custom `lsl.Var`. In this case, it is your responsibility to define a fitting `inference` specification. For example, to set up a term with a half-normal prior on the scale parameter, and sampling of the log scale via NUTS:

```python
import tensorflow_probability.substrates.jax.bijectors as tfb

scale_x5 = lsl.Var.new_param(
    1.0,
    distribution=lsl.Dist(tfd.HalfNormal, scale=20.0),
    name="scale_x5",
)

scale_x5.transform(
    bijector=tfb.Exp(),
    inference=gs.MCMCSpec(gs.NUTSKernel),
    name="log_scale_x5",
)

loc_pred += tb.ps("x5", k=20, scale=scale_x5)
```
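
For completeness, the simpler options from the list above written out in code; these restate the `scale` semantics described there (pick one per term):

```python
# Default: InverseGamma(concentration=1.0, scale=0.005) prior
# with an automatic Gibbs kernel.
loc_pred += tb.ps("x5", k=20, scale=gam.VarIGPrior(1.0, 0.005))

# Custom inverse gamma hyperparameters (concentration=a, scale=b),
# also with an automatic Gibbs kernel.
loc_pred += tb.ps("x5", k=20, scale=gam.VarIGPrior(0.01, 0.01))

# A plain float fixes the scale parameter; it is not sampled.
loc_pred += tb.ps("x5", k=20, scale=1.0)
```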

### Compose terms to build new models

Since `gam.TermBuilder` returns objects that are subclasses of `lsl.Var`, you can use them as building blocks for more sophisticated models. For example, to build a varying coefficient model, you can do the following:

```python
x1_var = tb.registry.get_obs("x1")
x2_smooth = tb.ps("x2", k=10)

term = lsl.Var.new_calc(
    lambda x, by: x * by,
    x=x1_var,
    by=x2_smooth,
    name="x1*ps(x2)",
)

loc_pred += term
```

In fact, this is essentially how the `gam.TermBuilder.vc` method is implemented.

### Use a custom basis function

If you have a custom basis function and a penalty matrix, you can supply them directly to the TermBuilder.

```python
import jax

def custom_basis_fn(x: jax.Array) -> jax.Array:
    # code for your custom basis goes here

    # x has shape (n, k), where k is the number of columns in the input data
    # that this term depends on

    # the return value needs to be an array of shape (n, p),
    # where n is the number of observations for x and p is the dimension
    # of the penalty matrix corresponding to this basis
    ...

custom_penalty = ...  # your custom penalty of shape (p, p)

loc_pred += tb.f(
    # here, we supply two covariates; they will be concatenated into
    # x = jnp.stack([x6, x7], axis=-1) for passing to basis_fn
    "x6", "x7",
    basis_fn=custom_basis_fn,
    penalty=custom_penalty,
)
```

Implementing a custom basis via a basis function is advantageous, because it enables us to simply pass the covariates that this basis relies on directly to `lsl.Model.predict` for predictions:

```python
model = lsl.Model([y])  # a lsl.Model that contains an .f term

new_x6 = ...  # 1d array with new data for x6
new_x7 = ...  # 1d array with new data for x7
model.predict(newdata={"x6": new_x6, "x7": new_x7}, predict=["f1(x6,x7)"])
```

### Use a custom basis matrix directly

If you have a custom basis matrix and a penalty matrix, you can initialize a `liesel_gam.Basis` object and, building on it, a `liesel_gam.Term` directly:

```python
custom_basis = gam.Basis(
    value=...,    # your basis matrix of shape (n, p) goes here
    penalty=...,  # your penalty matrix of shape (p, p) goes here
    xname="x8",   # the name of the basis object will by default be B(xname); here: B(x8)
)

custom_term = gam.Term.f(
    basis=custom_basis,
    scale=gam.VarIGPrior(1.0, 0.005),  # also accepts any scalar-valued lsl.Var object
    fname="h",  # name of the term will be fname(basis.x.name), so here: h(x8)
)

loc_pred += custom_term  # we still need to add the term to our predictor
```

Be aware that, if you go this route, the `lsl.Model` does *not* know how to construct your basis from input data. So to predict at new values, you will have to provide a full basis matrix:

```python
model = lsl.Model([y])  # a lsl.Model that contains your custom term

new_custom_basis = ...  # your (m, p) array, the basis matrix at which you want to predict
model.predict(newdata={"x8": new_custom_basis}, predict=["h(x8)"])
```

### Noncentered parameterization

Sometimes sampling from the posterior can be facilitated by sampling from a reparameterized model, particularly using a "noncentered" parameterization (see the [Stan documentation](https://mc-stan.org/docs/2_18/stan-users-guide/reparameterization-section.html)).

Consider the model $x \sim N(0, \sigma^2)$. Noncentered parameterization means that, instead of sampling $x$ and $\sigma^2$ directly, we rewrite the model as $x = \sigma \tilde{x}$, where $\tilde{x} \sim N(0, 1)$, and draw samples of $\tilde{x}$ and $\sigma^2$.

For many terms in `liesel_gam` you can enable a noncentered parameterization by setting a corresponding argument to `True`:

```python
loc_pred += tb.ps("x5", k=20, noncentered=True)
```

### Extract a basis directly

Sometimes you just want a certain basis matrix, for instance to build your own term. You can do that by using the `gam.BasisBuilder`, which is available from the `TermBuilder` object. The method names of the `gam.BasisBuilder` correspond to the term initialization methods on the `TermBuilder`.

```python
tb = gam.TermBuilder.from_df(data)

s_basis = tb.bases.ps(
    "x5",
    k=20,

    # whether sum-to-zero constraints should be applied by reparameterizing the basis
    absorb_cons=True,

    # whether the penalty matrix corresponding to this basis should be reparameterized
    # into a diagonal matrix, with a corresponding reparameterization for the basis
    diagonal_penalty=True,

    # whether the penalty matrix corresponding to this basis should be scaled
    scale_penalty=True,
)
```

The `Basis` object then gives you access to its own value through `Basis.value`, to the basis function through `Basis.value_node.function`, and to its penalty through `Basis.penalty`.
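
A basis extracted this way can feed the `gam.Term.f` constructor shown in the custom-basis-matrix section above; a short sketch, reusing the `s_basis` object from the previous snippet:

```python
# Build a penalized term from the extracted basis and add it to a predictor;
# gam.Term.f and the scale semantics are described in the sections above.
custom_term = gam.Term.f(
    basis=s_basis,
    scale=gam.VarIGPrior(1.0, 0.005),
    fname="g",  # term name becomes g(x5)
)

loc_pred += custom_term
```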

### Extract a column from the data frame as a variable

If you simply want to turn a column of your data frame into a `lsl.Var` object, you can use the `gam.PandasRegistry` attached to the `TermBuilder`:

```python
tb = gam.TermBuilder.from_df(data)

x1_var = tb.registry.get_obs("x1")
x2_var = tb.registry.get_numerical_obs("x2")
x3_var = tb.registry.get_boolean_obs("x3")
x4_var, mapping = tb.registry.get_categorical_obs("x4")
```

`tb.registry.get_categorical_obs` returns a variable object that represents the categories in the corresponding column of the data frame as numeric codes for compatibility with JAX. For that reason, it also returns a `gam.CategoryMapping` object that enables you to convert back and forth between category labels and numeric category codes.
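
To make the round trip concrete, here is a sketch; the method names `to_labels` and `to_codes` are hypothetical placeholders for whatever conversion methods `gam.CategoryMapping` actually exposes:

```python
# x4_var.value holds integer category codes, suitable for JAX.
codes = x4_var.value

# Hypothetical method names -- check the gam.CategoryMapping
# documentation for the actual API.
labels = mapping.to_labels(codes)        # numeric codes -> original labels
codes_again = mapping.to_codes(labels)   # original labels -> numeric codes
```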

The registry has some more convenience methods on offer that we do not list here. Check out the documentation for more.

## Overview of smooth terms available in `liesel_gam`

`gam.TermBuilder` offers a range of smooth terms as dedicated methods, including (a few are shown in the sketch after this list):

- Linear effects
  - `.lin`: Linear and categorical effects, conveniently defined via formulas.
- Univariate smooths
  - `.ps`: Penalized B-splines, i.e. P-splines.
  - `.cp`: Cyclic P-splines.
  - `.bs`: B-splines.
  - `.cr`: Cubic regression splines.
  - `.cs`: Cubic regression splines with shrinkage.
  - `.cc`: Cyclic cubic regression splines.
- Uni- or multivariate smooths
  - `.tp`: Thin plate splines.
  - `.ts`: Thin plate splines with a null space penalty.
  - `.kriging`: Gaussian process kriging.
- Multivariate smooths, initialized directly from marginal smooths
  - `.tf`: Full tensor products, including main effects. Similar to `mgcv::te`, but with a different API.
  - `.tx`: Tensor product interaction without main effects, appropriate when the main effects are also added to the model. Similar to `mgcv::ti`, but with a different API.
- Discrete and further composite terms
  - `.mrf`: Markov random fields (discrete-region spatial effects).
  - `.ri`: Random intercept terms.
  - `.rs`: Random slope terms.
  - `.vc`: Varying coefficient terms.
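
A few of these in use, to show the pattern; a hedged sketch, assuming the univariate methods share the `(colname, k=...)` signature of `tb.ps` used throughout this readme, that `.tp` takes several column names, and that `.tx` is built from marginals like `.tf` in the teaser; `"month"` is a hypothetical column:

```python
# Signatures assumed analogous to tb.ps and tb.tf as shown above.
loc_pred += tb.cp("month", k=10)  # cyclic P-spline for a periodic covariate
loc_pred += tb.tp("x6", "x7")     # thin plate spline over two covariates
loc_pred += tb.tx(                # tensor product interaction, no main effects
    tb.ps("x6", k=8),
    tb.ps("x7", k=8),
)
```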

## Acknowledgements

Liesel is being developed by Paul Wiemann, Hannes Riebl, Johannes Brachem and Gianmarco Callegher with support from Thomas Kneib. We are grateful to the German Research Foundation (DFG) for funding the development through grant KN 922/11-1.

liesel_gam-0.0.6a4.dist-info/RECORD
@@ -0,0 +1,18 @@

liesel_gam/__about__.py,sha256=KqOvLbZHgmnjob0TAaxqhncxOcXgkr-DoZkTx8zrLCw,25
liesel_gam/__init__.py,sha256=-PIODn0ztSBXDhwHMmZ0a_CGJ1gHP42bXtL0CUlCDwU,2120
liesel_gam/constraint.py,sha256=bPBek6I070B2fIMNzlvjqnsImEtBC4Ob2J8se9a70zw,3263
liesel_gam/dist.py,sha256=Jq5Vx2Fszi8qGIkm9GTAXaK6-CaQp7RF1YLuWXOT6EQ,21771
liesel_gam/kernel.py,sha256=_4yjIe-hE6N8ssbSiBzYpu5jIWnJakBfZi_Kg3lFpBs,1760
liesel_gam/plots.py,sha256=OpBDj_vERCtU5-Gn59ohrd0PntCjxcJcSzCYw2GRvxU,29295
liesel_gam/predictor.py,sha256=kI7150tDdH_QMcDQzFdYuKSpwFc8GtBsrarWkBswe-Q,2655
liesel_gam/roles.py,sha256=eZeuZI5YccNzlrgqOR5ltREB4dRBV4k4afZt9701doM,335
liesel_gam/var.py,sha256=FrBT4yKTUdZX5gG4NTcvJKJwrzhOzeaRqKPODnVYsGY,52389
liesel_gam/builder/__init__.py,sha256=v8yxolp4T_DJowY_z_0VO5qjYw6lzD_spkJtX8di-dU,392
liesel_gam/builder/builder.py,sha256=0JSQiAdsL0lVTTThkEOxb2KeDNQ6eW5SVFxnlq-Mg-I,64959
liesel_gam/builder/category_mapping.py,sha256=Cfn2ZIT-4kek7WkArOg1Yis9gxgyPkXyNklcMrM9cqY,5406
liesel_gam/builder/consolidate_bases.py,sha256=t_-vQX8kkL3x9BxgcfnkJrZUv4VW7oCDMw2q7n0TLaQ,3608
liesel_gam/builder/registry.py,sha256=f8iHazQtOA_qeCBc9ezyrUNa5hja1nWLYJVw13X7BBg,19294
liesel_gam-0.0.6a4.dist-info/METADATA,sha256=DNa9HLRyXnzIwINCICW7LbFOktra6rQF7vVu7vD_Po0,21690
liesel_gam-0.0.6a4.dist-info/WHEEL,sha256=WLgqFyCfm_KASv4WHyYy0P3pM_m7J5L9k2skdKLirC8,87
liesel_gam-0.0.6a4.dist-info/licenses/LICENSE,sha256=pjhYbDHmDl8Gms9kI5nPaJoWte2QGB0F6Cwa1r9jsQ0,1063
liesel_gam-0.0.6a4.dist-info/RECORD,,