bayesian-pricing 0.2.2__tar.gz → 0.2.3__tar.gz
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/PKG-INFO +69 -42
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/README.md +65 -40
- bayesian_pricing-0.2.3/benchmarks/benchmark.py +286 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/pyproject.toml +4 -2
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/src/bayesian_pricing/_utils.py +32 -5
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/.github/workflows/tests.yml +0 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/.gitignore +0 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/CITATION.cff +0 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/LICENSE +0 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/notebooks/01_hierarchical_frequency_demo.py +0 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/notebooks/bayesian_pricing_demo.py +0 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/notebooks/benchmark.py +0 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/src/bayesian_pricing/__init__.py +0 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/src/bayesian_pricing/diagnostics.py +0 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/src/bayesian_pricing/frequency.py +0 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/src/bayesian_pricing/relativities.py +0 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/src/bayesian_pricing/severity.py +0 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/tests/conftest.py +0 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/tests/test_frequency.py +0 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/tests/test_relativities.py +0 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/tests/test_severity.py +0 -0
- {bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/uv.lock +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: bayesian-pricing
-Version: 0.2.2
+Version: 0.2.3
 Summary: Hierarchical Bayesian models for insurance pricing thin-data segments
 Project-URL: Homepage, https://github.com/burning-cost/bayesian-pricing
 Project-URL: Repository, https://github.com/burning-cost/bayesian-pricing
@@ -18,6 +18,7 @@ Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Office/Business :: Financial
 Classifier: Topic :: Scientific/Engineering :: Mathematics
 Requires-Python: >=3.10
 Requires-Dist: arviz<1.0,>=0.17
@@ -32,25 +33,47 @@ Provides-Extra: numpyro
 Requires-Dist: jax>=0.4; extra == 'numpyro'
 Requires-Dist: numpyro>=0.13; extra == 'numpyro'
 Provides-Extra: pymc
-Requires-Dist:
+Requires-Dist: numpy>=2.0; extra == 'pymc'
+Requires-Dist: pymc>=5.8; extra == 'pymc'
 Description-Content-Type: text/markdown
 
 # bayesian-pricing
 
-[](https://github.com/burning-cost/bayesian-pricing/actions/workflows/tests.yml)
 [](https://pypi.org/project/bayesian-pricing/)
-](https://pypi.org/project/bayesian-pricing/)
+[]()
+[]()
 
-Hierarchical Bayesian models for insurance pricing thin-data segments.
+Hierarchical Bayesian models for insurance pricing thin-data segments — when your rating grid has more cells than your book has claims, partial pooling gives you credible estimates where every other method gives you noise or nothing.
+
+---
+
+## Why bother
+
+Benchmarked against raw segment estimates (observed claims / exposure, no shrinkage) on synthetic UK motor data with a known DGP: 20 occupation classes crossed with 3 vehicle groups (60 rating cells), ranging from 20 to 800 policy-years of exposure.
+
+| Segment type | RMSE — raw estimates | RMSE — Bayesian | Improvement |
+|---|---|---|---|
+| Thin occupations (20-50 py) | Higher | Lower | Typically 20-40% |
+| Thick occupations (300-800 py) | Lower | Comparable | Small or neutral |
+| All segments combined | — | — | Typically 10-25% |
+
+RMSE is measured against the known DGP ground truth. Thin cells see the largest improvement because partial pooling replaces noise-dominated raw estimates with a data-weighted blend of the cell mean and the grand mean. Thick cells are largely unaffected — their credibility factor approaches 1.
+
+---
+
+[Run on Databricks](https://github.com/burning-cost/burning-cost-examples/blob/main/notebooks/01_hierarchical_frequency_demo.py)
+
+---
 
 ## The problem
 
-UK personal lines rating operates on multi-dimensional grids. A typical motor model has driver age
+UK personal lines rating operates on multi-dimensional grids. A typical motor model has driver age x NCD x vehicle group x postcode area x occupation. That is potentially 4.5 million rating cells. With 1 million policies, most cells are either empty or contain fewer than 30 observations.
 
 Standard approaches all fail at thin cells:
 
 - **Saturated GLM**: one coefficient per cell. Overfits noise. A cell with 3 claims gets a relativity of 3/expected, which is meaningless.
-- **Main-effects GLM**: forces multiplicativity. A young driver in a sports car has a rate exactly equal to young-driver-relativity
+- **Main-effects GLM**: forces multiplicativity. A young driver in a sports car has a rate exactly equal to young-driver-relativity x sports-car-relativity. Reality is super-multiplicative and the model cannot detect it.
 - **Ridge/LASSO GLM**: uniform regularisation regardless of exposure. A cell with 5,000 policy-years gets the same shrinkage as one with 20 policy-years. Wrong.
 - **GBM with min_data_in_leaf**: refuses to split on thin cells. Cannot borrow strength from related cells. No calibrated uncertainty.
 
@@ -58,6 +81,8 @@ The correct answer is **partial pooling**: thin segments borrow strength from re
 
 Under Normal-Normal conjugacy, partial pooling is exactly Bühlmann-Straub credibility. This library generalises it to Poisson (frequency) and Gamma (severity) likelihoods, with multiple crossed random effects.
 
+---
+
 ## Install
 
 ```bash
@@ -74,6 +99,8 @@ pip install "bayesian-pricing[numpyro]"
 uv add "bayesian-pricing[numpyro]"
 ```
 
+---
+
 ## Usage
 
 Input is segment-level sufficient statistics — one row per rating cell, with exposure and claim count. This is the practical production design: aggregate your book to rating cells first, then run the model. A book with 500k policies typically has 5,000–20,000 non-empty rating cells. The model operates on those cells, making NUTS feasible on a standard machine.
@@ -113,7 +140,7 @@ preds = model.predict()
 print(preds)
 # veh_group age_band mean p5 p50 p95 credibility_factor
 # Supermini 17-21 0.1234 0.0812 0.1201 0.1731 0.38
-# Sports 17-21 0.1891 0.1102 0.1845 0.2881 0.21
+# Sports 17-21 0.1891 0.1102 0.1845 0.2881 0.21 <- thin
 # ...
 
 # Variance components: how much does each factor drive frequency?
@@ -140,7 +167,7 @@ print(veh_table.table)
 thin = rel.thin_segments(credibility_threshold=0.3)
 print(thin)
 # factor level credibility_factor relativity
-# veh_group Sports-17-21 0.18 1.84
+# veh_group Sports-17-21 0.18 1.84 <- sparse cell, wide CI
 
 # Export for Excel / rate system import
 summary_df = rel.summary() # long format: factor, level, relativity, CI, credibility
@@ -212,11 +239,33 @@ ppc = posterior_predictive_check(model, claim_count_col="claims")
 | Method | When to use | Speed | Accuracy |
 |---|---|---|---|
 | `SamplerConfig(method="pathfinder")` | Model development, prior sensitivity | Minutes | Good approximation |
-| `SamplerConfig(method="nuts")` | Final production estimates | 20
+| `SamplerConfig(method="nuts")` | Final production estimates | 20-60 min | Exact (asymptotically) |
 | `SamplerConfig(nuts_sampler="numpyro")` | Large portfolios, GPU available | Fast on GPU | Exact |
 
 For portfolios with more than 50k rating cells, consider the two-stage approach: fit a GBM on the full book, extract segment-level residuals, then run the Bayesian model on the residuals. The GBM captures dense-cell signal; the Bayesian model handles thin-cell pooling.
 
+---
+
+## Performance
+
+Benchmarked against **raw segment estimates** (observed claims / exposure per cell, no shrinkage) on synthetic UK motor data with a known data-generating process: 20 occupation classes crossed with 3 vehicle groups (60 rating cells), with occupations ranging from 20 to 800 policy-years of exposure. The DGP uses crossed Normal random effects on log frequency (sigma_occupation = 0.35, sigma_veh_group = 0.25), producing a realistic mix of thin and thick cells.
+
+| Segment type | RMSE — raw | RMSE — Bayesian | Expected improvement |
+|---|---|---|---|
+| Thin occupations (20-50 py) | higher | lower | typically 20-40% |
+| Thick occupations (300-800 py) | lower | comparable | small or neutral |
+| All segments | — | — | typically 10-25% |
+
+RMSE is measured against the known DGP ground truth, not holdout — which is the right comparison when you want to assess bias rather than aggregate prediction error. Results are labelled "expected" because the exact numbers depend on the random seed and the sampler; the pattern is consistent across seeds.
+
+The shrinkage diagnostic confirms the theoretical prediction: credibility factors are strongly positively correlated with log(occupation exposure), meaning thin occupations receive heavier shrinkage toward the grand mean than thick ones — automatically, without any manual specification of which segments are reliable.
+
+Variance component recovery: the estimated sigma parameters are expected to be in the right ballpark of the true values (0.35 and 0.25), though pathfinder VI may underestimate posterior variance slightly. Use NUTS for production rate tables.
+
+Run `notebooks/benchmark.py` on Databricks to reproduce.
+
+---
+
 ## Relationship to Bühlmann-Straub credibility
 
 The Bühlmann-Straub credibility premium is the exact posterior mean of a hierarchical model under Normal-Normal conjugacy. This library generalises that result:
@@ -231,6 +280,8 @@ The Bühlmann-Straub credibility premium is the exact posterior mean of a hierar
 
 For single-factor pricing with many groups (e.g., scheme pricing), Bühlmann-Straub is computationally trivial and entirely adequate. Use this library when you need multiple crossed random effects, non-Normal likelihoods, or full posterior uncertainty.
 
+---
+
 ## Design decisions
 
 **Non-centered parameterisation throughout.** The centered version (`u_i ~ Normal(0, sigma)`) creates funnel geometry in the posterior when sigma is small — which is exactly the case for well-regularised insurance models. HMC cannot traverse the funnel efficiently. The non-centered version decouples the raw offsets from the scale and eliminates this problem. See Twiecki (2017) for the clearest exposition.
@@ -243,14 +294,13 @@ For single-factor pricing with many groups (e.g., scheme pricing), Bühlmann-Str
 
 **PyMC optional.** The library parses and validates data without PyMC. Tests for the data layer run in CI without it. This makes the library usable in environments where PyMC is hard to install.
 
+---
+
 ## Read more
 
 [Partial Pooling for Thin Rating Cells](https://burning-cost.github.io/2026/03/06/bayesian-hierarchical-models-for-thin-data-pricing.html) — why every other approach fails thin segments and how hierarchical Bayesian models solve it.
 
-
-## Databricks Notebook
-
-A ready-to-run Databricks notebook benchmarking this library against standard approaches is available in [burning-cost-examples](https://github.com/burning-cost/burning-cost-examples/blob/main/notebooks/bayesian_pricing_demo.py).
+---
 
 ## Related libraries
 
@@ -260,40 +310,17 @@ A ready-to-run Databricks notebook benchmarking this library against standard ap
 | [insurance-interactions](https://github.com/burning-cost/insurance-interactions) | Automated detection of missing GLM interactions — the complementary question to thin-cell regularisation |
 | [insurance-datasets](https://github.com/burning-cost/insurance-datasets) | Synthetic UK motor and home datasets with known DGPs — useful for validating that the model recovers true parameters |
 | [insurance-monitoring](https://github.com/burning-cost/insurance-monitoring) | Model monitoring: PSI and A/E drift detection for tracking when the deployed model needs a refit |
+| [insurance-spatial](https://github.com/burning-cost/insurance-spatial) | BYM2 spatial territory ratemaking — applies the same partial-pooling logic to geographic factors via ICAR structure |
+| [insurance-thin-data](https://github.com/burning-cost/insurance-thin-data) | Transfer learning for sparse segments — complementary approach when source-target transfer is more appropriate than hierarchical pooling |
 
-[All Burning Cost libraries
-
-## Performance
-
-Benchmarked against **raw segment estimates** (observed claims / exposure per cell, no shrinkage) on synthetic UK motor data with a known data-generating process: 20 occupation classes crossed with 3 vehicle groups (60 rating cells), with occupations ranging from 20 to 800 policy-years of exposure. The DGP uses crossed Normal random effects on log frequency (sigma_occupation = 0.35, sigma_veh_group = 0.25), producing a realistic mix of thin and thick cells.
-
-| Segment type | RMSE — raw | RMSE — Bayesian | Expected improvement |
-|---|---|---|---|
-| Thin occupations (20–50 py) | higher | lower | typically 20–40% |
-| Thick occupations (300–800 py) | lower | comparable | small or neutral |
-| All segments | — | — | typically 10–25% |
-
-RMSE is measured against the known DGP ground truth, not holdout — which is the right comparison when you want to assess bias rather than aggregate prediction error. Results are labelled "expected" because the exact numbers depend on the random seed and the sampler; the pattern is consistent across seeds.
+[All Burning Cost libraries](https://burning-cost.github.io)
 
-
-
-Variance component recovery: the estimated sigma parameters are expected to be in the right ballpark of the true values (0.35 and 0.25), though pathfinder VI may underestimate posterior variance slightly. Use NUTS for production rate tables.
-
-Run `notebooks/benchmark.py` on Databricks to reproduce.
+---
 
 ## References
 
-1. Bühlmann, H. (1967). Experience rating and credibility. *ASTIN Bulletin*, 4(3), 199
+1. Bühlmann, H. (1967). Experience rating and credibility. *ASTIN Bulletin*, 4(3), 199-207.
 2. Gelman et al. (2013). *Bayesian Data Analysis*, 3rd ed. Chapter 5.
 3. Ohlsson, E. (2008). Combining generalised linear models and credibility models. *Scandinavian Actuarial Journal*.
 4. Krapu et al. (2023). Flexible hierarchical risk modeling for large insurance data via NumPyro. *arXiv:2312.07432*.
 5. Twiecki, T. (2017). Why hierarchical models are awesome, tricky, and Bayesian. twiecki.io.
-
-## Related Libraries
-
-| Library | What it does |
-|---------|-------------|
-| [insurance-spatial](https://github.com/burning-cost/insurance-spatial) | BYM2 spatial territory ratemaking — applies the same partial-pooling logic to geographic factors via ICAR structure |
-| [insurance-credibility](https://github.com/burning-cost/insurance-credibility) | Bühlmann-Straub credibility — the closed-form special case of this library under Normal-Normal conjugacy |
-| [insurance-thin-data](https://github.com/burning-cost/insurance-thin-data) | Transfer learning for sparse segments — complementary approach when source-target transfer is more appropriate than hierarchical pooling |
-
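Note on the PKG-INFO change above: the `pymc` and `numpyro` extras are optional, and the README states that the data layer works without PyMC. A minimal sketch of how calling code might check which sampler backend is installed before fitting is shown here. The helper names `backend_available` and `pick_backend` are hypothetical and are not part of the package; only the standard-library call `importlib.util.find_spec` is relied on.

```python
import importlib.util


def backend_available(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


def pick_backend(preferred=("pymc", "numpyro")):
    """Return the first available module name from `preferred`, else None.

    The preference order is an assumption made for this sketch.
    """
    for name in preferred:
        if backend_available(name):
            return name
    return None


backend = pick_backend()
if backend is None:
    # The extra name comes from the Provides-Extra entries in the metadata above.
    print('No sampler backend found; try: pip install "bayesian-pricing[pymc]"')
else:
    print(f"Sampler backend available: {backend}")
```

Checking with `find_spec` avoids importing a heavy dependency just to learn whether it is present, which fits environments where PyMC is hard to install.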
@@ -1,19 +1,40 @@
 # bayesian-pricing
 
-[](https://github.com/burning-cost/bayesian-pricing/actions/workflows/tests.yml)
 [](https://pypi.org/project/bayesian-pricing/)
-](https://pypi.org/project/bayesian-pricing/)
+[]()
+[]()
 
-Hierarchical Bayesian models for insurance pricing thin-data segments.
+Hierarchical Bayesian models for insurance pricing thin-data segments — when your rating grid has more cells than your book has claims, partial pooling gives you credible estimates where every other method gives you noise or nothing.
+
+---
+
+## Why bother
+
+Benchmarked against raw segment estimates (observed claims / exposure, no shrinkage) on synthetic UK motor data with a known DGP: 20 occupation classes crossed with 3 vehicle groups (60 rating cells), ranging from 20 to 800 policy-years of exposure.
+
+| Segment type | RMSE — raw estimates | RMSE — Bayesian | Improvement |
+|---|---|---|---|
+| Thin occupations (20-50 py) | Higher | Lower | Typically 20-40% |
+| Thick occupations (300-800 py) | Lower | Comparable | Small or neutral |
+| All segments combined | — | — | Typically 10-25% |
+
+RMSE is measured against the known DGP ground truth. Thin cells see the largest improvement because partial pooling replaces noise-dominated raw estimates with a data-weighted blend of the cell mean and the grand mean. Thick cells are largely unaffected — their credibility factor approaches 1.
+
+---
+
+[Run on Databricks](https://github.com/burning-cost/burning-cost-examples/blob/main/notebooks/01_hierarchical_frequency_demo.py)
+
+---
 
 ## The problem
 
-UK personal lines rating operates on multi-dimensional grids. A typical motor model has driver age
+UK personal lines rating operates on multi-dimensional grids. A typical motor model has driver age x NCD x vehicle group x postcode area x occupation. That is potentially 4.5 million rating cells. With 1 million policies, most cells are either empty or contain fewer than 30 observations.
 
 Standard approaches all fail at thin cells:
 
 - **Saturated GLM**: one coefficient per cell. Overfits noise. A cell with 3 claims gets a relativity of 3/expected, which is meaningless.
-- **Main-effects GLM**: forces multiplicativity. A young driver in a sports car has a rate exactly equal to young-driver-relativity
+- **Main-effects GLM**: forces multiplicativity. A young driver in a sports car has a rate exactly equal to young-driver-relativity x sports-car-relativity. Reality is super-multiplicative and the model cannot detect it.
 - **Ridge/LASSO GLM**: uniform regularisation regardless of exposure. A cell with 5,000 policy-years gets the same shrinkage as one with 20 policy-years. Wrong.
 - **GBM with min_data_in_leaf**: refuses to split on thin cells. Cannot borrow strength from related cells. No calibrated uncertainty.
 
@@ -21,6 +42,8 @@ The correct answer is **partial pooling**: thin segments borrow strength from re
 
 Under Normal-Normal conjugacy, partial pooling is exactly Bühlmann-Straub credibility. This library generalises it to Poisson (frequency) and Gamma (severity) likelihoods, with multiple crossed random effects.
 
+---
+
 ## Install
 
 ```bash
@@ -37,6 +60,8 @@ pip install "bayesian-pricing[numpyro]"
 uv add "bayesian-pricing[numpyro]"
 ```
 
+---
+
 ## Usage
 
 Input is segment-level sufficient statistics — one row per rating cell, with exposure and claim count. This is the practical production design: aggregate your book to rating cells first, then run the model. A book with 500k policies typically has 5,000–20,000 non-empty rating cells. The model operates on those cells, making NUTS feasible on a standard machine.
@@ -76,7 +101,7 @@ preds = model.predict()
 print(preds)
 # veh_group age_band mean p5 p50 p95 credibility_factor
 # Supermini 17-21 0.1234 0.0812 0.1201 0.1731 0.38
-# Sports 17-21 0.1891 0.1102 0.1845 0.2881 0.21
+# Sports 17-21 0.1891 0.1102 0.1845 0.2881 0.21 <- thin
 # ...
 
 # Variance components: how much does each factor drive frequency?
@@ -103,7 +128,7 @@ print(veh_table.table)
 thin = rel.thin_segments(credibility_threshold=0.3)
 print(thin)
 # factor level credibility_factor relativity
-# veh_group Sports-17-21 0.18 1.84
+# veh_group Sports-17-21 0.18 1.84 <- sparse cell, wide CI
 
 # Export for Excel / rate system import
 summary_df = rel.summary() # long format: factor, level, relativity, CI, credibility
@@ -175,11 +200,33 @@ ppc = posterior_predictive_check(model, claim_count_col="claims")
 | Method | When to use | Speed | Accuracy |
 |---|---|---|---|
 | `SamplerConfig(method="pathfinder")` | Model development, prior sensitivity | Minutes | Good approximation |
-| `SamplerConfig(method="nuts")` | Final production estimates | 20
+| `SamplerConfig(method="nuts")` | Final production estimates | 20-60 min | Exact (asymptotically) |
 | `SamplerConfig(nuts_sampler="numpyro")` | Large portfolios, GPU available | Fast on GPU | Exact |
 
 For portfolios with more than 50k rating cells, consider the two-stage approach: fit a GBM on the full book, extract segment-level residuals, then run the Bayesian model on the residuals. The GBM captures dense-cell signal; the Bayesian model handles thin-cell pooling.
 
+---
+
+## Performance
+
+Benchmarked against **raw segment estimates** (observed claims / exposure per cell, no shrinkage) on synthetic UK motor data with a known data-generating process: 20 occupation classes crossed with 3 vehicle groups (60 rating cells), with occupations ranging from 20 to 800 policy-years of exposure. The DGP uses crossed Normal random effects on log frequency (sigma_occupation = 0.35, sigma_veh_group = 0.25), producing a realistic mix of thin and thick cells.
+
+| Segment type | RMSE — raw | RMSE — Bayesian | Expected improvement |
+|---|---|---|---|
+| Thin occupations (20-50 py) | higher | lower | typically 20-40% |
+| Thick occupations (300-800 py) | lower | comparable | small or neutral |
+| All segments | — | — | typically 10-25% |
+
+RMSE is measured against the known DGP ground truth, not holdout — which is the right comparison when you want to assess bias rather than aggregate prediction error. Results are labelled "expected" because the exact numbers depend on the random seed and the sampler; the pattern is consistent across seeds.
+
+The shrinkage diagnostic confirms the theoretical prediction: credibility factors are strongly positively correlated with log(occupation exposure), meaning thin occupations receive heavier shrinkage toward the grand mean than thick ones — automatically, without any manual specification of which segments are reliable.
+
+Variance component recovery: the estimated sigma parameters are expected to be in the right ballpark of the true values (0.35 and 0.25), though pathfinder VI may underestimate posterior variance slightly. Use NUTS for production rate tables.
+
+Run `notebooks/benchmark.py` on Databricks to reproduce.
+
+---
+
 ## Relationship to Bühlmann-Straub credibility
 
 The Bühlmann-Straub credibility premium is the exact posterior mean of a hierarchical model under Normal-Normal conjugacy. This library generalises that result:
@@ -194,6 +241,8 @@ The Bühlmann-Straub credibility premium is the exact posterior mean of a hierar
 
 For single-factor pricing with many groups (e.g., scheme pricing), Bühlmann-Straub is computationally trivial and entirely adequate. Use this library when you need multiple crossed random effects, non-Normal likelihoods, or full posterior uncertainty.
 
+---
+
 ## Design decisions
 
 **Non-centered parameterisation throughout.** The centered version (`u_i ~ Normal(0, sigma)`) creates funnel geometry in the posterior when sigma is small — which is exactly the case for well-regularised insurance models. HMC cannot traverse the funnel efficiently. The non-centered version decouples the raw offsets from the scale and eliminates this problem. See Twiecki (2017) for the clearest exposition.
@@ -206,14 +255,13 @@ For single-factor pricing with many groups (e.g., scheme pricing), Bühlmann-Str
 
 **PyMC optional.** The library parses and validates data without PyMC. Tests for the data layer run in CI without it. This makes the library usable in environments where PyMC is hard to install.
 
+---
+
 ## Read more
 
 [Partial Pooling for Thin Rating Cells](https://burning-cost.github.io/2026/03/06/bayesian-hierarchical-models-for-thin-data-pricing.html) — why every other approach fails thin segments and how hierarchical Bayesian models solve it.
 
-
-## Databricks Notebook
-
-A ready-to-run Databricks notebook benchmarking this library against standard approaches is available in [burning-cost-examples](https://github.com/burning-cost/burning-cost-examples/blob/main/notebooks/bayesian_pricing_demo.py).
+---
 
 ## Related libraries
 
@@ -223,40 +271,17 @@ A ready-to-run Databricks notebook benchmarking this library against standard ap
 | [insurance-interactions](https://github.com/burning-cost/insurance-interactions) | Automated detection of missing GLM interactions — the complementary question to thin-cell regularisation |
 | [insurance-datasets](https://github.com/burning-cost/insurance-datasets) | Synthetic UK motor and home datasets with known DGPs — useful for validating that the model recovers true parameters |
 | [insurance-monitoring](https://github.com/burning-cost/insurance-monitoring) | Model monitoring: PSI and A/E drift detection for tracking when the deployed model needs a refit |
+| [insurance-spatial](https://github.com/burning-cost/insurance-spatial) | BYM2 spatial territory ratemaking — applies the same partial-pooling logic to geographic factors via ICAR structure |
+| [insurance-thin-data](https://github.com/burning-cost/insurance-thin-data) | Transfer learning for sparse segments — complementary approach when source-target transfer is more appropriate than hierarchical pooling |
 
-[All Burning Cost libraries
-
-## Performance
-
-Benchmarked against **raw segment estimates** (observed claims / exposure per cell, no shrinkage) on synthetic UK motor data with a known data-generating process: 20 occupation classes crossed with 3 vehicle groups (60 rating cells), with occupations ranging from 20 to 800 policy-years of exposure. The DGP uses crossed Normal random effects on log frequency (sigma_occupation = 0.35, sigma_veh_group = 0.25), producing a realistic mix of thin and thick cells.
-
-| Segment type | RMSE — raw | RMSE — Bayesian | Expected improvement |
-|---|---|---|---|
-| Thin occupations (20–50 py) | higher | lower | typically 20–40% |
-| Thick occupations (300–800 py) | lower | comparable | small or neutral |
-| All segments | — | — | typically 10–25% |
-
-RMSE is measured against the known DGP ground truth, not holdout — which is the right comparison when you want to assess bias rather than aggregate prediction error. Results are labelled "expected" because the exact numbers depend on the random seed and the sampler; the pattern is consistent across seeds.
+[All Burning Cost libraries](https://burning-cost.github.io)
 
-
-
-Variance component recovery: the estimated sigma parameters are expected to be in the right ballpark of the true values (0.35 and 0.25), though pathfinder VI may underestimate posterior variance slightly. Use NUTS for production rate tables.
-
-Run `notebooks/benchmark.py` on Databricks to reproduce.
+---
 
 ## References
 
-1. Bühlmann, H. (1967). Experience rating and credibility. *ASTIN Bulletin*, 4(3), 199
+1. Bühlmann, H. (1967). Experience rating and credibility. *ASTIN Bulletin*, 4(3), 199-207.
 2. Gelman et al. (2013). *Bayesian Data Analysis*, 3rd ed. Chapter 5.
 3. Ohlsson, E. (2008). Combining generalised linear models and credibility models. *Scandinavian Actuarial Journal*.
 4. Krapu et al. (2023). Flexible hierarchical risk modeling for large insurance data via NumPyro. *arXiv:2312.07432*.
 5. Twiecki, T. (2017). Why hierarchical models are awesome, tricky, and Bayesian. twiecki.io.
-
-## Related Libraries
-
-| Library | What it does |
-|---------|-------------|
-| [insurance-spatial](https://github.com/burning-cost/insurance-spatial) | BYM2 spatial territory ratemaking — applies the same partial-pooling logic to geographic factors via ICAR structure |
-| [insurance-credibility](https://github.com/burning-cost/insurance-credibility) | Bühlmann-Straub credibility — the closed-form special case of this library under Normal-Normal conjugacy |
-| [insurance-thin-data](https://github.com/burning-cost/insurance-thin-data) | Transfer learning for sparse segments — complementary approach when source-target transfer is more appropriate than hierarchical pooling |
-
@@ -0,0 +1,286 @@
|
|
|
1
|
+
"""
|
|
2
|
+
Benchmark: bayesian-pricing hierarchical partial pooling vs raw segment experience
|
|
3
|
+
vs portfolio average for insurance thin-data segments.
|
|
4
|
+
|
|
5
|
+
The claim: when your rating table has sparse cells — common in personal lines
|
|
6
|
+
once you cross three or more factors — raw segment experience is wildly noisy
|
|
7
|
+
for thin cells, and applying the portfolio average is too crude for the cells
|
|
8
|
+
that have real data. Hierarchical Bayesian partial pooling finds the correct
|
|
9
|
+
middle ground: borrow from the portfolio for thin cells, trust your own data
|
|
10
|
+
for dense cells.
|
|
11
|
+
|
|
12
|
+
This is the sparse-cell problem in motor pricing. A book with 50k policies
|
|
13
|
+
across 30 regions can easily have 8 regions with fewer than 200 policies each.
|
|
14
|
+
Raw experience for those thin regions is dominated by sampling noise. You can
|
|
15
|
+
see this in any triangulation: one bad year in a thin region swings the
|
|
16
|
+
indicated rate by 40-60%.
|
|
17
|
+
|
|
18
|
+
Setup:
|
|
19
|
+
- 50,000 synthetic motor policies, Poisson claim frequency
|
|
20
|
+
- 30 regions with realistic exposure imbalance (8 "thin" regions under 200 policies)
|
|
21
|
+
- Known true claim rates per region (lognormal: Normal draws on the log scale)
|
|
22
|
+
- Three approaches compared at segment level:
|
|
23
|
+
(1) Raw experience: claims/exposure per segment
|
|
24
|
+
(2) Portfolio average: grand mean applied to every segment
|
|
25
|
+
(3) bayesian-pricing: HierarchicalFrequency with partial pooling
|
|
26
|
+
|
|
27
|
+
Expected output:
|
|
28
|
+
- Thin segments (< 200 policies): Bayesian MAE << raw experience MAE
|
|
29
|
+
- Thick segments (> 1000 policies): Bayesian MAE ≈ raw experience MAE (correctly trusts data)
|
|
30
|
+
- Portfolio average: mediocre everywhere — misses genuine regional variation
|
|
31
|
+
- Shrinkage is visible: thin segment Bayesian estimates lie between raw and portfolio mean
|
|
32
|
+
|
|
33
|
+
Run:
|
|
34
|
+
python benchmarks/benchmark.py
|
|
35
|
+
|
|
36
|
+
Install:
|
|
37
|
+
pip install bayesian-pricing pymc numpy pandas
|
|
38
|
+
"""
|
|
39
|
+
|
|
40
|
+
from __future__ import annotations
|
|
41
|
+
|
|
42
|
+
import sys
|
|
43
|
+
import time
|
|
44
|
+
import warnings
|
|
45
|
+
|
|
46
|
+
import numpy as np
|
|
47
|
+
import pandas as pd
|
|
48
|
+
|
|
49
|
+
warnings.filterwarnings("ignore")
|
|
50
|
+
|
|
51
|
+
BENCHMARK_START = time.time()
|
|
52
|
+
|
|
53
|
+
print("=" * 70)
|
|
54
|
+
print("Benchmark: bayesian-pricing hierarchical pooling vs raw vs portfolio avg")
|
|
55
|
+
print("=" * 70)
|
|
56
|
+
print()
|
|
57
|
+
|
|
58
|
+
try:
|
|
59
|
+
from bayesian_pricing import HierarchicalFrequency, BayesianRelativities
|
|
60
|
+
print("bayesian-pricing imported OK")
|
|
61
|
+
except ImportError as e:
|
|
62
|
+
print(f"ERROR: Could not import bayesian-pricing: {e}")
|
|
63
|
+
print("Install with: pip install bayesian-pricing")
|
|
64
|
+
sys.exit(1)
|
|
65
|
+
|
|
66
|
+
# ---------------------------------------------------------------------------
|
|
67
|
+
# Data-generating process
|
|
68
|
+
# ---------------------------------------------------------------------------
|
|
69
|
+
|
|
70
|
+
RNG = np.random.default_rng(42)
|
|
71
|
+
|
|
72
|
+
N_POLICIES = 50_000
|
|
73
|
+
N_REGIONS = 30
|
|
74
|
+
TRUE_PORTFOLIO_RATE = 0.08 # 8% claim frequency
|
|
75
|
+
|
|
76
|
+
print(f"DGP: {N_POLICIES:,} policies, {N_REGIONS} regions")
|
|
77
|
+
print(f" True portfolio claim frequency: {TRUE_PORTFOLIO_RATE:.1%}")
|
|
78
|
+
print()
|
|
79
|
+
|
|
80
|
+
# True region-level claim rates: lognormal, i.e. Normal on the log scale
|
|
81
|
+
# log(rate_i) ~ Normal(log(portfolio_mean), sigma=0.4)
|
|
82
|
+
# This gives genuine heterogeneity: sigma=0.4 on the log scale puts a typical region within a factor of exp(0.4) ≈ 1.5 of the median rate
|
|
83
|
+
region_log_rates = RNG.normal(np.log(TRUE_PORTFOLIO_RATE), 0.4, N_REGIONS)
|
|
84
|
+
true_rates = np.exp(region_log_rates)
|
|
85
|
+
|
|
86
|
+
# Exposure distribution: realistic imbalance (Pareto-like)
|
|
87
|
+
# 8 regions have under 200 policies (thin), rest are medium/thick
|
|
88
|
+
exposure_weights = np.concatenate([
|
|
89
|
+
RNG.uniform(50, 190, 8), # thin: 8 regions
|
|
90
|
+
RNG.uniform(500, 2000, 12), # medium: 12 regions
|
|
91
|
+
RNG.uniform(2000, 8000, 10), # thick: 10 regions
|
|
92
|
+
])
|
|
93
|
+
exposure_weights = exposure_weights / exposure_weights.sum()
|
|
94
|
+
|
|
95
|
+
# Assign policies to regions
|
|
96
|
+
region_assignments = RNG.choice(N_REGIONS, size=N_POLICIES, p=exposure_weights)
|
|
97
|
+
|
|
98
|
+
# Simulate exposures and claims
|
|
99
|
+
policy_exposure = RNG.uniform(0.3, 1.0, N_POLICIES) # policy-years
|
|
100
|
+
true_rate_per_policy = true_rates[region_assignments]
|
|
101
|
+
claim_counts = RNG.poisson(true_rate_per_policy * policy_exposure)
|
|
102
|
+
|
|
103
|
+
# Aggregate to segment level (region)
|
|
104
|
+
seg_data = pd.DataFrame({
|
|
105
|
+
"region": region_assignments,
|
|
106
|
+
"claims": claim_counts,
|
|
107
|
+
"exposure": policy_exposure,
|
|
108
|
+
})
|
|
109
|
+
seg = seg_data.groupby("region").agg(
|
|
110
|
+
claims=("claims", "sum"),
|
|
111
|
+
exposure=("exposure", "sum"),
|
|
112
|
+
n_policies=("claims", "count"),
|
|
113
|
+
).reset_index()
|
|
114
|
+
seg["true_rate"] = true_rates[seg["region"].values]
|
|
115
|
+
seg["raw_rate"] = seg["claims"] / seg["exposure"]
|
|
116
|
+
|
|
117
|
+
print(f"Segment statistics:")
|
|
118
|
+
print(f" Total policies: {N_POLICIES:,}")
|
|
119
|
+
print(f" Total claims: {claim_counts.sum():,}")
|
|
120
|
+
print(f" Portfolio raw rate: {claim_counts.sum() / policy_exposure.sum():.4f}")
|
|
121
|
+
print()
|
|
122
|
+
|
|
123
|
+
# Classify by exposure tier
|
|
124
|
+
seg["tier"] = "thick"
|
|
125
|
+
seg.loc[seg["n_policies"] < 200, "tier"] = "thin"
|
|
126
|
+
seg.loc[(seg["n_policies"] >= 200) & (seg["n_policies"] < 1000), "tier"] = "medium"
|
|
127
|
+
|
|
128
|
+
tier_counts = seg["tier"].value_counts()
|
|
129
|
+
print(f"Segment tiers:")
|
|
130
|
+
for t in ["thin", "medium", "thick"]:
|
|
131
|
+
n = tier_counts.get(t, 0)
|
|
132
|
+
avg_policies = seg.loc[seg["tier"] == t, "n_policies"].mean() if n > 0 else 0
|
|
133
|
+
print(f" {t:>6}: {n:>2} regions, avg {avg_policies:>6.0f} policies/region")
|
|
134
|
+
print()
|
|
135
|
+
|
|
136
|
+
# ---------------------------------------------------------------------------
|
|
137
|
+
# Baselines: raw segment experience and portfolio average
|
|
138
|
+
# ---------------------------------------------------------------------------
|
|
139
|
+
|
|
140
|
+
print("-" * 70)
|
|
141
|
+
print("BASELINE 1: Raw segment experience (claims / exposure)")
|
|
142
|
+
print("-" * 70)
|
|
143
|
+
print()
|
|
144
|
+
|
|
145
|
+
portfolio_mean = seg["claims"].sum() / seg["exposure"].sum()
|
|
146
|
+
seg["portfolio_rate"] = portfolio_mean
|
|
147
|
+
|
|
148
|
+
# MAE by tier
|
|
149
|
+
for tier in ["thin", "medium", "thick"]:
|
|
150
|
+
mask = seg["tier"] == tier
|
|
151
|
+
if not mask.any():
|
|
152
|
+
continue
|
|
153
|
+
mae_raw = np.mean(np.abs(seg.loc[mask, "raw_rate"] - seg.loc[mask, "true_rate"]))
|
|
154
|
+
mae_portfolio = np.mean(np.abs(seg.loc[mask, "portfolio_rate"] - seg.loc[mask, "true_rate"]))
|
|
155
|
+
print(f" {tier.capitalize():>6} segments — raw MAE: {mae_raw:.4f} | portfolio MAE: {mae_portfolio:.4f}")
|
|
156
|
+
|
|
157
|
+
print()
|
|
158
|
+
|
|
159
|
+
# ---------------------------------------------------------------------------
|
|
160
|
+
# Library: Bayesian hierarchical partial pooling
|
|
161
|
+
# ---------------------------------------------------------------------------
|
|
162
|
+
|
|
163
|
+
print("-" * 70)
|
|
164
|
+
print("LIBRARY: bayesian-pricing HierarchicalFrequency (partial pooling)")
|
|
165
|
+
print("-" * 70)
|
|
166
|
+
print()
|
|
167
|
+
|
|
168
|
+
# Use pathfinder (fast VI) for benchmark speed
|
|
169
|
+
from bayesian_pricing.frequency import SamplerConfig
|
|
170
|
+
|
|
171
|
+
sampler_cfg = SamplerConfig(
|
|
172
|
+
method="pathfinder",
|
|
173
|
+
draws=500,
|
|
174
|
+
random_seed=42,
|
|
175
|
+
)
|
|
176
|
+
|
|
177
|
+
model = HierarchicalFrequency(
|
|
178
|
+
group_cols=["region"],
|
|
179
|
+
prior_mean_rate=TRUE_PORTFOLIO_RATE,
|
|
180
|
+
variance_prior_sigma=0.3,
|
|
181
|
+
)
|
|
182
|
+
|
|
183
|
+
t0 = time.time()
|
|
184
|
+
model.fit(
|
|
185
|
+
seg[["region", "claims", "exposure"]],
|
|
186
|
+
claim_count_col="claims",
|
|
187
|
+
exposure_col="exposure",
|
|
188
|
+
sampler_config=sampler_cfg,
|
|
189
|
+
)
|
|
190
|
+
fit_time = time.time() - t0
|
|
191
|
+
print(f" Model fit time: {fit_time:.1f}s (pathfinder approximation)")
|
|
192
|
+
|
|
193
|
+
# Posterior predictive means
|
|
194
|
+
predictions = model.predict() # Polars DataFrame
|
|
195
|
+
# predictions has columns: region, posterior_mean, posterior_sd, hdi_3%, hdi_97%
|
|
196
|
+
import polars as pl
|
|
197
|
+
|
|
198
|
+
preds_pd = predictions.to_pandas()
|
|
199
|
+
seg = seg.merge(preds_pd[["region", "posterior_mean"]], on="region", how="left")
|
|
200
|
+
seg["bayesian_rate"] = seg["posterior_mean"]
|
|
201
|
+
|
|
202
|
+
print()
|
|
203
|
+
|
|
204
|
+
# ---------------------------------------------------------------------------
|
|
205
|
+
# Comparison: MAE by tier
|
|
206
|
+
# ---------------------------------------------------------------------------
|
|
207
|
+
|
|
208
|
+
print("=" * 70)
|
|
209
|
+
print("SUMMARY: MAE by segment density tier")
|
|
210
|
+
print("=" * 70)
|
|
211
|
+
print()
|
|
212
|
+
|
|
213
|
+
print(f" {'Tier':<8} {'n_seg':>5} {'Raw MAE':>10} {'Portfolio MAE':>14} {'Bayesian MAE':>13} {'Best':>8}")
|
|
214
|
+
print(f" {'-'*8} {'-'*5} {'-'*10} {'-'*14} {'-'*13} {'-'*8}")
|
|
215
|
+
|
|
216
|
+
for tier in ["thin", "medium", "thick"]:
|
|
217
|
+
mask = seg["tier"] == tier
|
|
218
|
+
if not mask.any():
|
|
219
|
+
continue
|
|
220
|
+
n = mask.sum()
|
|
221
|
+
mae_raw = np.mean(np.abs(seg.loc[mask, "raw_rate"] - seg.loc[mask, "true_rate"]))
|
|
222
|
+
mae_portfolio = np.mean(np.abs(seg.loc[mask, "portfolio_rate"] - seg.loc[mask, "true_rate"]))
|
|
223
|
+
mae_bayes = np.mean(np.abs(seg.loc[mask, "bayesian_rate"] - seg.loc[mask, "true_rate"]))
|
|
224
|
+
|
|
225
|
+
best_mae = min(mae_raw, mae_portfolio, mae_bayes)
|
|
226
|
+
if mae_bayes == best_mae:
|
|
227
|
+
best = "Bayesian"
|
|
228
|
+
elif mae_raw == best_mae:
|
|
229
|
+
best = "Raw"
|
|
230
|
+
else:
|
|
231
|
+
best = "Portfolio"
|
|
232
|
+
|
|
233
|
+
print(f" {tier.capitalize():<8} {n:>5} {mae_raw:>10.4f} {mae_portfolio:>14.4f} {mae_bayes:>13.4f} {best:>8}")
|
|
234
|
+
|
|
235
|
+
print()
|
|
236
|
+
|
|
237
|
+
# Overall MAE
|
|
238
|
+
mae_raw_all = np.mean(np.abs(seg["raw_rate"] - seg["true_rate"]))
|
|
239
|
+
mae_portfolio_all = np.mean(np.abs(seg["portfolio_rate"] - seg["true_rate"]))
|
|
240
|
+
mae_bayes_all = np.mean(np.abs(seg["bayesian_rate"] - seg["true_rate"]))
|
|
241
|
+
|
|
242
|
+
print(f" {'All':<8} {len(seg):>5} {mae_raw_all:>10.4f} {mae_portfolio_all:>14.4f} {mae_bayes_all:>13.4f}")
|
|
243
|
+
print()
|
|
244
|
+
|
|
245
|
+
# Shrinkage diagnostic: show thin segments explicitly
|
|
246
|
+
print("SHRINKAGE DIAGNOSTIC: thin segments (partial pooling in action)")
|
|
247
|
+
print(f" Portfolio mean: {portfolio_mean:.4f}")
|
|
248
|
+
print()
|
|
249
|
+
thin_segs = seg[seg["tier"] == "thin"].sort_values("n_policies")
|
|
250
|
+
print(f" {'Region':>7} {'Policies':>9} {'True rate':>10} {'Raw rate':>10} {'Bayesian':>10} {'Shrinkage':>10}")
|
|
251
|
+
print(f" {'-'*7} {'-'*9} {'-'*10} {'-'*10} {'-'*10} {'-'*10}")
|
|
252
|
+
|
|
253
|
+
for _, row in thin_segs.iterrows():
|
|
254
|
+
# Shrinkage = fraction of the way from raw toward portfolio mean
|
|
255
|
+
raw_dev = row["raw_rate"] - portfolio_mean
|
|
256
|
+
bayes_dev = row["bayesian_rate"] - portfolio_mean
|
|
257
|
+
if abs(raw_dev) > 1e-6:
|
|
258
|
+
shrinkage = 1 - (bayes_dev / raw_dev)
|
|
259
|
+
else:
|
|
260
|
+
shrinkage = float("nan")
|
|
261
|
+
print(f" {int(row['region']):>7} {int(row['n_policies']):>9} {row['true_rate']:>10.4f} "
|
|
262
|
+
f"{row['raw_rate']:>10.4f} {row['bayesian_rate']:>10.4f} {shrinkage:>10.1%}")
|
|
263
|
+
|
|
264
|
+
print()
|
|
265
|
+
|
|
266
|
+
# Variance components
|
|
267
|
+
vc = model.variance_components()
|
|
268
|
+
print("VARIANCE COMPONENTS (how heterogeneous are the regions?)")
|
|
269
|
+
print(vc)
|
|
270
|
+
print()
|
|
271
|
+
|
|
272
|
+
print("INTERPRETATION")
|
|
273
|
+
print(f" Thin segments (<200 policies): Bayesian MAE is substantially lower")
|
|
274
|
+
print(f" than raw experience because thin cells get heavy shrinkage toward")
|
|
275
|
+
print(f" the portfolio mean. The raw rate for an 80-policy region reflects")
|
|
276
|
+
print(f" Poisson noise as much as genuine risk — Bayesian weights that correctly.")
|
|
277
|
+
print()
|
|
278
|
+
print(f" Thick segments (>1000 policies): Bayesian MAE ≈ raw MAE. The model")
|
|
279
|
+
print(f" correctly trusts dense experience and applies minimal shrinkage.")
|
|
280
|
+
print(f" Portfolio average wastes this information entirely.")
|
|
281
|
+
print()
|
|
282
|
+
print(f" This is not a modelling trick. It is the correct Bayesian solution")
|
|
283
|
+
print(f" to the bias-variance tradeoff for heterogeneous segment sizes.")
|
|
284
|
+
|
|
285
|
+
elapsed = time.time() - BENCHMARK_START
|
|
286
|
+
print(f"\nBenchmark completed in {elapsed:.1f}s")
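The shrinkage column printed by the benchmark can be read against the closed-form credibility blend. The sketch below is not the library's implementation: it replaces the hierarchical model with a fixed credibility weight `Z = exposure / (exposure + k)`, where `k` is a hypothetical tuning constant and the claim and exposure figures are made up for illustration. It shows that, for such a blend, the benchmark's diagnostic `1 - bayes_dev / raw_dev` is exactly `1 - Z`.

```python
import numpy as np


def credibility_shrink(claims, exposure, k):
    """Blend each segment's raw rate with the portfolio rate.

    Z = exposure / (exposure + k) is the credibility weight. k is a
    hypothetical constant chosen by hand here, not a bayesian-pricing
    parameter.
    """
    claims = np.asarray(claims, dtype=float)
    exposure = np.asarray(exposure, dtype=float)
    portfolio_rate = claims.sum() / exposure.sum()
    raw_rate = claims / exposure
    z = exposure / (exposure + k)
    shrunk_rate = z * raw_rate + (1.0 - z) * portfolio_rate
    return raw_rate, shrunk_rate, z, portfolio_rate


# Illustrative numbers only: a thin, a medium and a thick segment.
claims = [2, 30, 400]
exposure = [50.0, 500.0, 5000.0]
raw_rate, shrunk_rate, z, portfolio_rate = credibility_shrink(
    claims, exposure, k=200.0
)

# Same diagnostic as the benchmark: fraction of the raw deviation removed.
shrinkage = 1.0 - (shrunk_rate - portfolio_rate) / (raw_rate - portfolio_rate)

for e, r, s, sh in zip(exposure, raw_rate, shrunk_rate, shrinkage):
    print(f"exposure={e:>7.0f}  raw={r:.4f}  shrunk={s:.4f}  shrinkage={sh:.1%}")
```

In a Gamma-Poisson model with prior Gamma(α, β) on the claim rate, the posterior mean is (α + claims) / (β + exposure), which is this same blend with `k = β` and prior mean α/β. The sketch substitutes the empirical portfolio rate for the prior mean. A hierarchical model differs mainly in that the amount of pooling is estimated from the data rather than fixed in advance.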
|
|
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
|
|
|
4
4
|
|
|
5
5
|
[project]
|
|
6
6
|
name = "bayesian-pricing"
|
|
7
|
-
version = "0.2.
|
|
7
|
+
version = "0.2.3"
|
|
8
8
|
description = "Hierarchical Bayesian models for insurance pricing thin-data segments"
|
|
9
9
|
readme = "README.md"
|
|
10
10
|
requires-python = ">=3.10"
|
|
@@ -32,6 +32,7 @@ classifiers = [
|
|
|
32
32
|
"Programming Language :: Python :: 3.11",
|
|
33
33
|
"Programming Language :: Python :: 3.12",
|
|
34
34
|
"Topic :: Scientific/Engineering :: Mathematics",
|
|
35
|
+
"Topic :: Office/Business :: Financial",
|
|
35
36
|
]
|
|
36
37
|
dependencies = [
|
|
37
38
|
"numpy>=1.24",
|
|
@@ -43,7 +44,8 @@ dependencies = [
|
|
|
43
44
|
|
|
44
45
|
[project.optional-dependencies]
|
|
45
46
|
pymc = [
|
|
46
|
-
"
|
|
47
|
+
"numpy>=2.0", # PyMC 5.8+ / pytensor 2.18+ require numpy 2.x
|
|
48
|
+
"pymc>=5.8",
|
|
47
49
|
]
|
|
48
50
|
numpyro = [
|
|
49
51
|
"numpyro>=0.13",
|
|
@@ -36,8 +36,35 @@ def _to_pandas(data: DataFrameLike) -> pd.DataFrame:
|
|
|
36
36
|
)
|
|
37
37
|
|
|
38
38
|
|
|
39
|
+
def _check_numpy_for_pymc() -> None:
|
|
40
|
+
"""Raise a clear RuntimeError if numpy is too old for PyMC 5.8+.
|
|
41
|
+
|
|
42
|
+
PyMC 5.8 and later depend on pytensor 2.18+ which requires numpy>=2.0.
|
|
43
|
+
Environments with pinned numpy (e.g. Databricks serverless, some conda
|
|
44
|
+
setups) will install bayesian-pricing[pymc] successfully but fail at import
|
|
45
|
+
time with an opaque AttributeError. This check surfaces the issue clearly.
|
|
46
|
+
"""
|
|
47
|
+
numpy_version = tuple(int(x) for x in np.__version__.split(".")[:2])
|
|
48
|
+
if numpy_version < (2, 0):
|
|
49
|
+
raise RuntimeError(
|
|
50
|
+
f"PyMC 5.8+ requires numpy>=2.0, but numpy {np.__version__} is installed.\n\n"
|
|
51
|
+
"On most systems, upgrading numpy is straightforward:\n\n"
|
|
52
|
+
" pip install 'numpy>=2.0'\n\n"
|
|
53
|
+
"On Databricks serverless, numpy is locked at the system level and cannot\n"
|
|
54
|
+
"be upgraded. To use bayesian-pricing on Databricks serverless, use a\n"
|
|
55
|
+
"Databricks ML Runtime cluster (14.3+) which ships with numpy 2.x, or\n"
|
|
56
|
+
"install the NumPyro backend instead:\n\n"
|
|
57
|
+
" pip install 'bayesian-pricing[numpyro]'\n\n"
|
|
58
|
+
"See: https://github.com/burning-cost/bayesian-pricing#installation"
|
|
59
|
+
)
|
|
60
|
+
|
|
61
|
+
|
|
39
62
|
def _check_pymc() -> None:
|
|
40
|
-
"""Raise a helpful
|
|
63
|
+
"""Raise a helpful error if PyMC is not installed or incompatible."""
|
|
64
|
+
# Check numpy version first — the PyMC import itself will fail with a
|
|
65
|
+
# confusing AttributeError if numpy < 2.0, so we pre-empt it here.
|
|
66
|
+
_check_numpy_for_pymc()
|
|
67
|
+
|
|
41
68
|
try:
|
|
42
69
|
import pymc # noqa: F401
|
|
43
70
|
except ImportError:
|
|
@@ -45,11 +72,11 @@ def _check_pymc() -> None:
|
|
|
45
72
|
"PyMC is required for fitting Bayesian models. Install it with:\n\n"
|
|
46
73
|
" uv add pymc\n\n"
|
|
47
74
|
"Or install this package with the pymc extras:\n\n"
|
|
48
|
-
" uv add bayesian-pricing[pymc]\n\n"
|
|
49
|
-
"PyMC requires
|
|
50
|
-
"
|
|
75
|
+
" uv add 'bayesian-pricing[pymc]'\n\n"
|
|
76
|
+
"PyMC requires numpy>=2.0. See the installation notes above if\n"
|
|
77
|
+
"you are in an environment with a locked numpy version.\n\n"
|
|
51
78
|
"For GPU acceleration (large portfolios), install with NumPyro backend:\n"
|
|
52
|
-
" uv add bayesian-pricing[numpyro]"
|
|
79
|
+
" uv add 'bayesian-pricing[numpyro]'"
|
|
53
80
|
)
|
|
54
81
|
|
|
55
82
|
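The guard in `_check_numpy_for_pymc` converts the first two dot-separated components of `np.__version__` with `int`. A slightly more defensive variant is sketched below, assuming only the standard library. The helper names `major_minor` and `numpy_is_too_old` are hypothetical and not part of bayesian-pricing. The regular expression reads the leading `major.minor` digits, so a suffix attached to the second component cannot raise inside the guard itself.

```python
import re


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def numpy_is_too_old(version: str, minimum: tuple[int, int] = (2, 0)) -> bool:
    """Return True when the version is below the required (major, minor)."""
    return major_minor(version) < minimum


# Example strings only; in the real check the input would be np.__version__.
for candidate in ["1.26.4", "2.0.0rc1", "2.1.0.dev0"]:
    print(candidate, "too old" if numpy_is_too_old(candidate) else "ok")
```

Tuple comparison keeps the check readable, and passing the version string in as an argument makes the helper testable without installing a particular numpy build.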
{bayesian_pricing-0.2.2 → bayesian_pricing-0.2.3}/notebooks/01_hierarchical_frequency_demo.py
RENAMED
|
File without changes